feat: expose evaluator row status for LLM evaluators #11333

Open
SeCuReDmE-main-dev wants to merge 2 commits into deepset-ai:main from SeCuReDmE-main-dev:feature/haystack-evaluator-uncertainty-phase1

@SeCuReDmE-main-dev SeCuReDmE-main-dev commented May 18, 2026

Summary

This is Phase 1 for RFC #11332.

It adds a narrow top-level evaluation_statuses output to LLMEvaluator results so callers can distinguish rows that were successfully evaluated from rows that failed during generation or parsing when raise_on_failure=False is used.
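
As a rough illustration of how a caller might consume the new output, the sketch below filters rows by status before aggregating. The `output` dictionary is hand-written rather than produced by Haystack; its exact shape (a `score` key per row, `None` for failed rows) is an assumption based on the description above, and `split_by_status` is a hypothetical helper.

```python
from typing import Any


def split_by_status(output: dict[str, Any]) -> tuple[list[dict], list[int]]:
    """Return (successfully evaluated results, indices of failed rows)."""
    evaluated, failed = [], []
    for index, (result, status) in enumerate(
        zip(output["results"], output["evaluation_statuses"])
    ):
        if status == "evaluated":
            evaluated.append(result)
        else:
            failed.append(index)
    return evaluated, failed


# Hand-written stand-in for a run with raise_on_failure=False (assumed shape).
output = {
    "results": [{"score": 1}, None, {"score": 0}],
    "evaluation_statuses": ["evaluated", "error", "evaluated"],
}

evaluated, failed = split_by_status(output)
mean_score = sum(r["score"] for r in evaluated) / len(evaluated)
print(mean_score, failed)  # 0.5 [1]
```

Keeping the statuses in a separate top-level list means existing code that only reads `results` and `meta` is unaffected.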

Scope

Included:

  • LLMEvaluator now returns evaluation_statuses alongside the existing results and meta.
  • Successfully parsed rows are marked as evaluated.
  • Generation and parsing failures that continue under raise_on_failure=False are marked as error.
  • ContextRelevanceEvaluator and FaithfulnessEvaluator pass through the new status list while preserving their current nan behavior.
  • Focused tests for successful rows, invalid JSON, and generation failures.
  • Release note added under releasenotes/notes.
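
The status rules listed above can be sketched as a small loop. This is not the code in this PR: `evaluate_rows` and the `generate` callable are hypothetical stand-ins for the evaluator and its generator, and only the `evaluated` / `error` values and the `raise_on_failure` flag come from the description.

```python
import json
from typing import Callable, Optional


def evaluate_rows(
    prompts: list[str],
    generate: Callable[[str], str],
    raise_on_failure: bool = True,
) -> dict[str, list]:
    """Run `generate` once per prompt and record one status per row."""
    results: list[Optional[dict]] = []
    statuses: list[str] = []
    for prompt in prompts:
        try:
            reply = generate(prompt)    # may raise: generation failure
            parsed = json.loads(reply)  # may raise: invalid JSON
        except Exception:
            if raise_on_failure:
                raise
            # Keep the row so indices stay aligned with the inputs.
            results.append(None)
            statuses.append("error")
            continue
        results.append(parsed)
        statuses.append("evaluated")
    return {"results": results, "evaluation_statuses": statuses}


def fake_generate(prompt: str) -> str:
    """Toy generator: raises on one prompt, returns invalid JSON on another."""
    if prompt == "boom":
        raise RuntimeError("generation failed")
    if prompt == "garbled":
        return "not json"
    return '{"score": 1}'


out = evaluate_rows(["ok", "garbled", "boom"], fake_generate, raise_on_failure=False)
print(out["evaluation_statuses"])  # ['evaluated', 'error', 'error']
```

With `raise_on_failure=True` the same loop re-raises on the first failing row, so no status list is returned at all.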

Intentionally excluded:

  • no EvaluationRunResult reporting changes;
  • no retriever, agent, router, HITL, or governance changes;
  • no indeterminate status yet;
  • no public neutrosophic naming in code.

Phase 2 and Phase 3 remain dependent on maintainer feedback from the RFC and this PR.

Tests

C:\Users\jeans\.local\bin\uv.exe run pytest test/components/evaluators/test_llm_evaluator.py test/components/evaluators/test_context_relevance_evaluator.py test/components/evaluators/test_faithfulness_evaluator.py -q
C:\Users\jeans\.local\bin\uv.exe run ruff check haystack/components/evaluators test/components/evaluators
C:\Users\jeans\.local\bin\uv.exe run ruff format --check haystack/components/evaluators test/components/evaluators
C:\Users\jeans\.local\bin\uv.exe run reno lint .

Results:

  • 38 passed, 2 skipped
  • ruff check: passed
  • ruff format --check: passed
  • reno lint: passed

@SeCuReDmE-main-dev SeCuReDmE-main-dev requested a review from a team as a code owner May 18, 2026 04:04
@SeCuReDmE-main-dev SeCuReDmE-main-dev requested review from anakin87 and removed request for a team May 18, 2026 04:04
vercel Bot commented May 18, 2026

@SeCuReDmE-main-dev is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant commented May 18, 2026

CLA assistant check
All committers have signed the CLA.

@SeCuReDmE-main-dev
Author

Phase 2 base work is complete and stacked on top of this Phase 1 branch.

Downstream stacked PR: SeCuReDmE-main-dev#1
Outcome: BM25 now has an opt-in retrieval confidence metadata path via Document.meta.
Boundary: this stays BM25-only and does not redefine global retriever score semantics.
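
For readers who have not opened the stacked PR, here is a minimal sketch of what an opt-in metadata path could look like. Everything in it is hypothetical: `Doc` is a stand-in for a retrieved document, and the `include_confidence` flag, the `retrieval_confidence` key, and the max-score normalization are illustrative assumptions, not the stacked PR's actual names or formula.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    content: str
    score: Optional[float] = None
    meta: dict[str, Any] = field(default_factory=dict)


def attach_confidence(docs: list[Doc], include_confidence: bool = False) -> list[Doc]:
    """Optionally write a normalized confidence into meta; scores stay untouched."""
    if not include_confidence:
        return docs  # default path: no metadata is added
    top = max((d.score or 0.0) for d in docs) if docs else 0.0
    for d in docs:
        # Assumed normalization: score relative to the best hit in this result set.
        d.meta["retrieval_confidence"] = (d.score or 0.0) / top if top > 0 else 0.0
    return docs


plain = attach_confidence([Doc("a", 4.0), Doc("b", 1.0)])
tagged = attach_confidence([Doc("a", 4.0), Doc("b", 1.0)], include_confidence=True)
print([d.meta for d in plain])
print([d.meta["retrieval_confidence"] for d in tagged])
```

Writing to `meta` rather than changing `score` is what keeps the behavior additive: callers that never set the flag see identical documents.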

@SeCuReDmE-main-dev
Author

Phase 3 base work is complete and the stacked progression now reaches explicit HITL decision contracts.

Downstream stacked PR: SeCuReDmE-main-dev#2
Outcome: ToolExecutionDecision now carries explicit approved / modified / rejected status semantics.
Boundary: this stops at HITL contract enrichment and does not introduce Phase 4 runtime/governance changes.
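
To make the three-way status concrete, here is a sketch under stated assumptions. `DecisionStatus`, `Decision`, and `decide` are hypothetical stand-ins, not the fields of ToolExecutionDecision or the stacked PR's code; only the approved / modified / rejected values come from the comment above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class DecisionStatus(str, Enum):
    APPROVED = "approved"
    MODIFIED = "modified"
    REJECTED = "rejected"


@dataclass
class Decision:
    """Hypothetical decision record carrying an explicit status."""
    tool_name: str
    status: DecisionStatus
    final_params: Optional[dict[str, Any]] = None

    @property
    def should_execute(self) -> bool:
        return self.status is not DecisionStatus.REJECTED


def decide(
    tool_name: str,
    requested: dict[str, Any],
    reviewed: Optional[dict[str, Any]],
) -> Decision:
    """`reviewed` is None for a rejection, else the params the reviewer signed off on."""
    if reviewed is None:
        return Decision(tool_name, DecisionStatus.REJECTED)
    status = DecisionStatus.APPROVED if reviewed == requested else DecisionStatus.MODIFIED
    return Decision(tool_name, status, reviewed)


approved = decide("search", {"q": "cats"}, {"q": "cats"})
modified = decide("search", {"q": "cats"}, {"q": "dogs"})
rejected = decide("search", {"q": "cats"}, None)
print(approved.status.value, modified.status.value, rejected.status.value)
```

An explicit status lets downstream code tell "ran with edited parameters" apart from "ran as requested" without comparing parameter dictionaries itself.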

@SeCuReDmE-main-dev
Author

Pause boundary — awaiting maintainer feedback

All three stacked phases are now validated and documented:

All branches pass ruff check and ruff format. All new fields are opt-in or additive with no breaking changes.

No further work will be performed until maintainers respond. The next action depends entirely on your feedback — whether that is naming guidance, scope adjustment, acceptance, or rejection.

The corresponding RFC is #11332. Happy to answer any questions.

@SeCuReDmE-main-dev
Author

@coderabbitai review
