
Feat/prompt eval #435

Open
MaryChen68 wants to merge 2 commits into main from feat/prompt-eval

@MaryChen68

Contributor
  • Runs every prompt type against a set of test documents and prints the outputs, so you can eyeball whether the LLM is producing good results.

  • When LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_BASE_URL are all set in .env, every run is also logged to Langfuse, so you can compare prompt versions in the Langfuse UI.

  • Usage:

    • Eyeball mode:
        uv run python eval_prompts.py

    • Compare two models side by side in the terminal:
        uv run python eval_prompts.py --compare gpt-4o gpt-4o-mini

    • Use a different LLM model:
        uv run python eval_prompts.py --model <model_name>

    • Send results to Langfuse as an experiment (the dataset is auto-created if it is missing):
        uv run python eval_prompts.py --experiment <experiment_name>

    • Example (run with gpt-5.4 and log the results as an experiment named gpt-5.4):
        uv run python eval_prompts.py --model gpt-5.4 --experiment gpt-5.4
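
The workflow described above (run every prompt type on every test document, optionally with two models, and log to Langfuse only when its keys are configured) can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the code in this PR: PROMPTS, TEST_DOCUMENTS, DEFAULT_MODEL, call_llm, and all helper names are illustrative assumptions, "--compare wins over --model" is an assumed precedence rule, and the Langfuse upload is left as a placeholder because the description does not say which SDK calls the script makes.

```python
"""Hypothetical sketch of an eval_prompts.py-style runner; not the PR's actual code."""
import argparse
import os
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical prompt templates and test documents; the real ones live in the app.
PROMPTS = {
    "summary": "Summarize this document:\n{document}",
    "questions": "List three questions a reader might ask about:\n{document}",
}
TEST_DOCUMENTS = ["First test document.", "Second test document."]
DEFAULT_MODEL = "gpt-4o-mini"  # hypothetical default model
LANGFUSE_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_BASE_URL")

LLMCall = Callable[[str, str], str]  # (model, prompt text) -> completion text


def langfuse_enabled(env: Mapping[str, str]) -> bool:
    """True only when all three Langfuse settings are present and non-empty."""
    return all(env.get(name) for name in LANGFUSE_VARS)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Eyeball LLM outputs for every prompt type.")
    parser.add_argument("--model", help="model to run instead of the default")
    parser.add_argument("--compare", nargs=2, metavar=("MODEL_A", "MODEL_B"),
                        help="run two models and print their outputs together")
    parser.add_argument("--experiment", metavar="NAME",
                        help="also send results to Langfuse as an experiment")
    return parser


def select_models(args: argparse.Namespace) -> list[str]:
    # Assumption: --compare takes precedence over --model.
    if args.compare:
        return list(args.compare)
    return [args.model or DEFAULT_MODEL]


def run_eval(prompts: Mapping[str, str], documents: Sequence[str],
             models: Sequence[str], call_llm: LLMCall) -> list[dict]:
    """Run every prompt type on every document with every selected model."""
    results = []
    for prompt_name, template in prompts.items():
        for doc_index, document in enumerate(documents):
            row = {"prompt": prompt_name, "doc": doc_index}
            for model in models:
                row[model] = call_llm(model, template.format(document=document))
            results.append(row)
    return results


def print_results(results: Sequence[dict], models: Sequence[str]) -> None:
    # Group outputs by (prompt, document) so the models' answers appear together.
    for row in results:
        print(f"=== {row['prompt']} / doc {row['doc']} ===")
        for model in models:
            print(f"--- {model} ---\n{row[model]}")


def main(argv: Optional[Sequence[str]], call_llm: LLMCall,
         env: Mapping[str, str] = os.environ) -> list[dict]:
    # Assumes .env has already been loaded into the environment
    # (for example with python-dotenv's load_dotenv()).
    args = build_parser().parse_args(argv)
    models = select_models(args)
    results = run_eval(PROMPTS, TEST_DOCUMENTS, models, call_llm)
    print_results(results, models)
    if langfuse_enabled(env):
        # Placeholder: the real Langfuse tracing / experiment upload is not
        # described in the PR, so no Langfuse SDK calls are shown here.
        target = f"experiment {args.experiment!r}" if args.experiment else "traces"
        print(f"[Langfuse enabled: would log {len(results)} rows as {target}]")
    elif args.experiment:
        print("Langfuse keys missing; skipping the experiment upload.")
    return results
```

For example, `main(["--compare", "gpt-4o", "gpt-4o-mini"], call_llm=my_llm)`, where `my_llm` is your own (hypothetical) wrapper around an LLM client. Passing the LLM call in as a function keeps the loop testable with a fake model and keeps provider details out of the evaluation logic.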
    

MaryChen68 requested a review from kcarnold on May 29, 2026 19:19
