It would be helpful to have some test cases that didn't need an OpenAI API call (just paste in a prior response).
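One way to do this, sketched under assumptions: keep a pasted response as a fixture and hand it to the code under test through a small fake client. The names `FakeLM`, `complete`, and `total_logprob` are hypothetical, and the response layout below is only an illustration, not necessarily what the real API returns for this project.

```python
import json

# Hypothetical stand-in for a response pasted from a prior API call.
# The field layout is an assumption made for illustration.
CANNED_RESPONSE = json.loads("""
{"choices": [{"text": " world",
              "logprobs": {"tokens": [" world"], "token_logprobs": [-0.25]}}]}
""")


class FakeLM:
    """Returns a stored response instead of calling the API."""

    def __init__(self, response):
        self.response = response
        self.calls = []  # every request is recorded for later assertions

    def complete(self, **request):
        self.calls.append(request)
        return self.response


def total_logprob(lm, prompt):
    """Example code under test: sums token logprobs of the first choice."""
    response = lm.complete(prompt=prompt, max_tokens=1)
    return sum(response["choices"][0]["logprobs"]["token_logprobs"])
```

A test can then construct `FakeLM(CANNED_RESPONSE)` and check both the computed value and the recorded requests, with no network access involved.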
It would be helpful to visualize what's going on. For this issue, at least log the LM queries in a form that is easy to display. In general we probably want a more visual representation of what's going on; the full version of that is a separate issue.
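As a minimal sketch of the logging part, a wrapper can record each request/response pair and render the history as plain text. The names `logged` and `format_history` are hypothetical, and the wrapped function is assumed to take keyword arguments including `prompt`.

```python
import json
import logging

logger = logging.getLogger("lm_queries")


def logged(query_fn, history):
    """Wrap query_fn so each request/response pair is logged and stored."""
    def wrapper(**request):
        response = query_fn(**request)
        record = {"request": request, "response": response}
        history.append(record)
        # default=str keeps logging from failing on non-JSON values
        logger.info("LM query: %s", json.dumps(record, default=str))
        return response
    return wrapper


def format_history(history):
    """Render recorded queries as numbered lines for quick display."""
    return "\n".join(
        f"[{i}] {rec['request'].get('prompt')!r} -> {rec['response']!r}"
        for i, rec in enumerate(history, start=1)
    )
```

Keeping the history as a plain list of dicts means a later, more visual tool could consume the same records without changing the call sites.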
max_tokens should be set tiny, since we're not actually using the generation results.
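A sketch of what setting max_tokens tiny could look like when building the request. The helper `build_request` and the placeholder model name are hypothetical. Whether a value of 0 is accepted depends on the endpoint, so 1 is used here as a conservative choice.

```python
def build_request(prompt, model="MODEL_NAME"):
    """Build request parameters for a call whose generated text is discarded."""
    return {
        "model": model,    # placeholder, not a real model name
        "prompt": prompt,
        "max_tokens": 1,   # keep generation minimal; the output is ignored
    }
```

Centralizing the parameters in one helper also makes the value easy to assert on in an offline test.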