Test that LM queries are being made correctly

- [ ] are logprobs being computed from the results correctly?
- [ ] is spacing between sentences sensible / consistent during reorderings?
- [ ] `max_tokens` should be set tiny since we're not actually using the generation results
- [ ] could there be better prompts for the LM (vs. the "write a short essay using this outline" prompt that's there now)?
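A minimal sketch of the first two checks, for discussion only. It assumes the LM response exposes a list of per-token logprobs in which the first echoed token is `None` (as the legacy OpenAI completions endpoint does with `echo` enabled). `join_sentences` and `sequence_logprob` are hypothetical names, not functions from this repo.

```python
from typing import Optional, Sequence


def join_sentences(sentences: Sequence[str]) -> str:
    """Join sentences with exactly one space, regardless of input order or padding."""
    # Strip each sentence first so a reordering cannot introduce double
    # spaces or drop the space between two sentences.
    return " ".join(s.strip() for s in sentences if s.strip())


def sequence_logprob(token_logprobs: Sequence[Optional[float]]) -> float:
    """Sum per-token logprobs, skipping None entries."""
    # Assumption: the first echoed token has no preceding context, so its
    # logprob is reported as None and must not be counted.
    return sum(lp for lp in token_logprobs if lp is not None)


if __name__ == "__main__":
    print(repr(join_sentences(["  First point. ", "Second point.  "])))
    print(sequence_logprob([None, -0.5, -1.5]))
```

Because only the scores of the echoed prompt are needed in this setup, `max_tokens` can be as small as the endpoint allows; whether `0` is accepted should be checked against the API actually in use.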

It would be helpful to have some test cases that don't need an OpenAI API call (e.g., just paste in a prior response).
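One way this could look, as a sketch: keep a saved response as a JSON fixture and run the logprob extraction against it. The fixture below is hand-written with made-up numbers, not a real API response, and `logprob_from_response` is a hypothetical helper; the `choices[0].logprobs.token_logprobs` layout is assumed from the legacy completions response format.

```python
import json

# Hypothetical fixture: in practice, paste a real saved response here.
SAVED_RESPONSE = """
{
  "choices": [
    {
      "logprobs": {
        "tokens": ["Hello", ",", " world"],
        "token_logprobs": [null, -0.25, -1.75]
      }
    }
  ]
}
"""


def logprob_from_response(response: dict) -> float:
    """Total logprob of the echoed text in a completions-style response."""
    token_logprobs = response["choices"][0]["logprobs"]["token_logprobs"]
    # The first token has no context, so its entry is null/None.
    return sum(lp for lp in token_logprobs if lp is not None)


def test_logprob_from_saved_response() -> None:
    response = json.loads(SAVED_RESPONSE)
    assert abs(logprob_from_response(response) - (-2.0)) < 1e-9


if __name__ == "__main__":
    test_logprob_from_saved_response()
    print("ok")
```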

It would also be helpful to visualize what's going on. For this issue, at least log the LM queries in a form that is easy to display. A fuller visual representation is probably worth having in general, but that is a separate issue.
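A rough sketch of the logging part, assuming the code calls the LM through a single function that takes a prompt plus keyword parameters. `make_logged_query` and `dump_log` are hypothetical names, and the JSON Lines output is just one format that would be easy to display later.

```python
import json
import time
from typing import Any, Callable, Dict, List


def make_logged_query(query_fn: Callable[..., Any], log: List[Dict[str, Any]]) -> Callable[..., Any]:
    """Wrap query_fn so every call is appended to log before returning."""

    def logged(prompt: str, **params: Any) -> Any:
        start = time.time()
        response = query_fn(prompt, **params)
        log.append(
            {
                "prompt": prompt,
                "params": params,
                "response": response,
                "seconds": time.time() - start,
            }
        )
        return response

    return logged


def dump_log(log: List[Dict[str, Any]], path: str) -> None:
    """Write one JSON record per line (assumes responses are JSON-serializable)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in log:
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # Stand-in for the real LM call, so the sketch runs without an API key.
    def fake_query(prompt: str, **params: Any) -> Dict[str, Any]:
        return {"echo": prompt, "params": params}

    queries: List[Dict[str, Any]] = []
    query = make_logged_query(fake_query, queries)
    query("Outline: A. B.", max_tokens=1)
    print(len(queries), queries[0]["prompt"])
```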

Test that LM queries are being made correctly #8
