fix(evals): Exact match float int mismatch #1365

Merged
andrei-rusu merged 2 commits into main from fix/andreiru/exact_match_float_int on Feb 24, 2026

Conversation

@andrei-rusu
Collaborator

No description provided.

@github-actions bot added the labels test:uipath-langchain (Triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (Triggers tests in the uipath-llamaindex-python repository) on Feb 24, 2026
@andrei-rusu force-pushed the fix/andreiru/exact_match_float_int branch from daeeba0 to 2c5a528 on February 24, 2026 11:38
@andrei-rusu force-pushed the fix/andreiru/exact_match_float_int branch from 2c5a528 to f3224b7 on February 24, 2026 13:46
@andrei-rusu force-pushed the fix/andreiru/exact_match_float_int branch from f4c6b30 to 73395d3 on February 24, 2026 13:50
@andrei-rusu merged commit f3218b5 into main on Feb 24, 2026
95 checks passed
@andrei-rusu deleted the fix/andreiru/exact_match_float_int branch on February 24, 2026 14:04
@cristipufu
Member

This fix doesn't cover the case where target_output_key is "*" (the default).

When target_output_key is "*", _get_actual_output / _get_expected_output return the full dict, so the flow becomes:

actual_output = str({"result": 27.0})    # "{'result': 27.0}"
expected_output = str({"result": 27})     # "{'result': 27}"

float("{'result': 27.0}")  # ValueError → falls back to string comparison → mismatch

The float() try/except only helps when a specific key is extracted (scalar value). With "*" the whole dict is stringified and float() always fails, so it's the same broken path as before.
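For anyone who wants to see both paths side by side, here is a minimal sketch. The function `exact_match` is a hypothetical stand-in for the evaluator's comparison, not the actual code from this PR; it only assumes the stringify-then-float() flow described above.

```python
def exact_match(actual, expected) -> bool:
    """Hypothetical stand-in: stringify both sides, try float(), else compare strings."""
    actual_str, expected_str = str(actual), str(expected)
    try:
        # Numeric path: only succeeds when both strings parse as numbers.
        return float(actual_str) == float(expected_str)
    except ValueError:
        # Fallback path: plain string comparison.
        return actual_str == expected_str


# A specific key was extracted, so both sides are scalars: the numeric path applies.
scalar_ok = exact_match(27.0, 27)

# target_output_key == "*": the whole dict is stringified, so float() raises ValueError
# and the differing string forms are compared instead.
dict_ok = exact_match({"result": 27.0}, {"result": 27})

print(scalar_ok, dict_ok)
```

Under these assumptions the scalar comparison matches and the dict comparison does not, which is the gap described above.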

Repro: any evaluator JSON with "targetOutputKey": "*" comparing {"result": 27} expected vs {"result": 27.0} actual → scores 0%.

The fix probably needs deep structural comparison with numeric normalization for the dict case, e.g. recursively normalizing int/float before comparing.
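A rough sketch of what that recursive normalization could look like. The names `normalize_numbers` and `deep_equal` are illustrative, not existing helpers in the repository, and the sketch assumes outputs are JSON-like (dicts with string keys, lists, scalars). It compares a canonical JSON form so the result stays consistent with the current string-based flow and does not depend on key order.

```python
import json
from typing import Any


def normalize_numbers(value: Any) -> Any:
    """Recursively convert int/float leaves to float; leave everything else alone."""
    if isinstance(value, bool):
        # bool is a subclass of int, so handle it first and leave it untouched.
        return value
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, dict):
        return {key: normalize_numbers(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_numbers(item) for item in value]
    return value


def deep_equal(actual: Any, expected: Any) -> bool:
    """Compare canonical JSON forms of the normalized values (assumes JSON-like data)."""
    canonical_actual = json.dumps(normalize_numbers(actual), sort_keys=True)
    canonical_expected = json.dumps(normalize_numbers(expected), sort_keys=True)
    return canonical_actual == canonical_expected


print(deep_equal({"result": 27.0}, {"result": 27}))
```

Two caveats with this approach: very large ints lose precision when converted to float, and whether True should match 1 is a policy decision (this sketch treats them as different).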


Labels

claude-code-assisted
test:uipath-langchain (Triggers tests in the uipath-langchain-python repository)
test:uipath-llamaindex (Triggers tests in the uipath-llamaindex-python repository)

3 participants