feat: expose evaluator row status for LLM evaluators #11333
SeCuReDmE-main-dev wants to merge 2 commits into
@SeCuReDmE-main-dev is attempting to deploy a commit to the deepset Team on Vercel. A member of the Team first needs to authorize it.
Phase 2 base work is complete and stacked on top of this Phase 1 branch. Downstream stacked PR: SeCuReDmE-main-dev#1
Phase 3 base work is complete and the stacked progression now reaches explicit HITL decision contracts. Downstream stacked PR: SeCuReDmE-main-dev#2
Pause boundary: awaiting maintainer feedback
All three stacked phases are now validated and documented:
All branches pass
No further work will be performed until maintainers respond. The next action depends entirely on your feedback, whether that is naming guidance, scope adjustment, acceptance, or rejection. The corresponding RFC is #11332. Happy to answer any questions.
@coderabbitai review
Summary
This is Phase 1 for RFC #11332.
It adds a narrow top-level `evaluation_statuses` output to `LLMEvaluator` results so callers can distinguish rows that were successfully evaluated from rows that failed during generation or parsing when `raise_on_failure=False` is used.
Scope
Included:
- `LLMEvaluator` now returns `evaluation_statuses` alongside existing `results` and `meta`.
- Successfully evaluated rows are marked as `evaluated`.
- Rows that fail when `raise_on_failure=False` are marked as `error`.
- `ContextRelevanceEvaluator` and `FaithfulnessEvaluator` pass through the new status list while preserving their current `nan` behavior.
- A release note is added under `releasenotes/notes`.

Intentionally excluded:
- `EvaluationRunResult` reporting changes;
- no `indeterminate` status yet;

Phase 2 and Phase 3 remain dependent on maintainer feedback from the RFC and this PR.
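
For reviewers who want a concrete picture of how a caller might consume the new output, here is a minimal sketch. It does not import Haystack and is not the PR's implementation. The dictionary shape, the `score` field, the use of `None` for a failed row, and the helper names `split_by_status` and `mean_score_over_evaluated` are all assumptions made for illustration. Only the `evaluation_statuses` key and the `evaluated` / `error` values come from the description above.

```python
import math
from typing import Any

# Hypothetical evaluator output. The per-row payload and the None placeholder
# for the failed row are assumptions, not the PR's confirmed format.
example_output: dict[str, Any] = {
    "results": [{"score": 1.0}, None, {"score": 0.0}],
    "evaluation_statuses": ["evaluated", "error", "evaluated"],
}


def split_by_status(output: dict[str, Any]) -> tuple[list[int], list[int]]:
    """Return (evaluated_indices, error_indices) for an evaluator output."""
    statuses = output["evaluation_statuses"]
    if len(statuses) != len(output["results"]):
        raise ValueError("evaluation_statuses must align one-to-one with results")
    evaluated = [i for i, s in enumerate(statuses) if s == "evaluated"]
    errors = [i for i, s in enumerate(statuses) if s == "error"]
    return evaluated, errors


def mean_score_over_evaluated(output: dict[str, Any]) -> float:
    """Average 'score' over evaluated rows only; nan when no row was evaluated."""
    evaluated, _ = split_by_status(output)
    if not evaluated:
        return math.nan
    return sum(output["results"][i]["score"] for i in evaluated) / len(evaluated)


evaluated_rows, error_rows = split_by_status(example_output)
print(evaluated_rows, error_rows)  # [0, 2] [1]
print(mean_score_over_evaluated(example_output))  # 0.5
```

Keeping the status list separate from `results` means a caller can filter failed rows by index without having to infer failure from a `nan` score or an empty row.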
Tests
Results:
- 38 passed, 2 skipped
- `ruff check`: passed
- `ruff format --check`: passed
- `reno lint`: passed