Scope out question generation #3

@kcarnold

  • Collect a bunch of example inputs and outputs from last summer's exploratory work
    • We have one or two examples already.
  • Wrangle these examples into something we can use to train (fine-tune) a LM
    • we could start with the approach of the Interviewer even though it's not perfect...
  • Pick an LM that we can feasibly run inference on
    • OpenAI API?
    • One of the existing open-source ones we've used (maybe an encoder-decoder one like flan-ul2)
    • One of the new batch of open-source models (LLaMA etc.), perhaps fine-tuned
  • Fine-tune the LM to generate questions like our examples
  • Collect ranking data on LM generations
    • Build a Streamlit app for this? There may already be an app that people are using; e.g., any open-source ChatGPT replication project (Vicuna, Alpaca, ...; see the llama.cpp repo) will have something like this. See also https://arxiv.org/abs/2204.05862
  • Use ranking data to optimize the LM
  • Deploy the optimized LM as an API that the frontend can access.
    • We've already got an API for the interviewer model.
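
For the ranking-data step, one possible storage shape is pairwise preferences, which is roughly what preference-based optimization setups consume. The sketch below is only an illustration: the function name `ranking_to_pairs` and the `prompt` / `chosen` / `rejected` keys are assumptions, not something we have settled on.

```python
from itertools import combinations


def ranking_to_pairs(prompt, ranked_generations):
    """Expand one ranked list (best first) into pairwise preference records.

    Hypothetical helper: the record keys are an assumed convention.
    """
    pairs = []
    # combinations() preserves input order, so the first item of each pair
    # is always the one ranked higher by the annotator.
    for better, worse in combinations(ranked_generations, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs


example_pairs = ranking_to_pairs(
    "Cats sleep a lot.",
    ["Why do cats sleep so much?", "How long do cats sleep?", "Cats?"],
)
```

A ranking of n generations yields n*(n-1)/2 pairs, so even a small ranking app produces a fair amount of comparison data per annotation.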

Design a simple format for input and output. For example, the input is document_text, cursor_position, and an optional question_type; the output is question, start_position, and end_position (where positions are character offsets from the beginning of the text).
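
A minimal sketch of what that format could look like as Python dataclasses. The field names come from the description above; the class names, the validation rules, and the choice to treat end_position as exclusive (so the span is `text[start:end]`) are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class QuestionRequest:
    document_text: str
    cursor_position: int  # character offset into document_text
    question_type: Optional[str] = None

    def __post_init__(self):
        # The cursor may sit at the very end of the text, hence <=.
        if not 0 <= self.cursor_position <= len(self.document_text):
            raise ValueError("cursor_position is outside document_text")


@dataclass
class QuestionResponse:
    question: str
    start_position: int  # inclusive character offset
    end_position: int    # exclusive character offset (assumed convention)

    def span(self, document_text: str) -> str:
        """Return the part of the document the question refers to."""
        if not 0 <= self.start_position <= self.end_position <= len(document_text):
            raise ValueError("span is outside document_text")
        return document_text[self.start_position:self.end_position]


doc = "Cats sleep a lot. Dogs bark."
request = QuestionRequest(document_text=doc, cursor_position=17)
response = QuestionResponse("Why do cats sleep so much?", 0, 17)

# Both objects serialize to plain JSON for the API.
request_json = json.dumps(asdict(request))
highlighted = response.span(doc)
```

Keeping positions as plain character offsets means the frontend can highlight the span without knowing anything about the model's tokenization.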
