chore: Dev to Main Merge #623
Merged
…rkflow Applies the changes from Dependabot PR #589 onto dev so they reach the dev branch ahead of the upstream PR (which targets main). Refs: ADO #44960 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Applies the changes from Dependabot PR #595 onto dev so they reach the dev branch ahead of the upstream PR (which targets main). Refs: ADO #44960 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Applies the changes from Dependabot PR #596 onto dev so they reach the dev branch ahead of the upstream PR (which targets main). Refs: ADO #44960 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…uv.lock Applies the changes from Dependabot PR #597 onto dev so they reach the dev branch ahead of the upstream PR (which targets main). Refs: ADO #44960 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Regenerated uv.lock after merging dev to incorporate both dependabot upgrades and agent-framework 1.3.0 changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ial in python application.
…ores

Root cause: When the evaluate step couldn't compute any per-field confidence (e.g. logprobs unavailable on reasoning models like gpt-5/o1/o3, or image-only flow with no Content Understanding signal), save_handler emitted entity_score=0.0, schema_score=0.0. These `0.0`s flowed through Cosmos -> API -> UI and rendered as `0%` (red), indistinguishable from a genuine zero confidence.

Fix: Treat `total_evaluated_fields_count == 0` (or no comparison items) as *unavailable* and propagate `None` through the ContentProcessor, ContentProcessorAPI and ContentProcessorWorkflow models. The frontend percentage cell renderer now shows `N/A` for null/undefined and `0%` only for a genuine numeric zero.

Files changed:
- ContentProcessor: save_handler.py (extracted _derive_aggregate_scores helper)
- ContentProcessor: content_process.py default scores -> None
- ContentProcessorAPI: ContentProcess + Content_Process default scores -> None
- ContentProcessorWorkflow: ContentProcessRecord + Content_Process default scores -> None
- ContentProcessorWorkflow: document_process_executor preserves None instead of coercing to 0.0
- ContentProcessorWeb: ProcessQueueGridTypes types scores nullable; ProcessQueueGrid passes undefined for null/undefined; CustomCellRender renders `N/A` when valueText is null/undefined and only `...` while still processing

Tests:
- New: ContentProcessor/tests/unit/pipeline/test_save_handler_scores.py (5 cases: valid scores, missing per-field signal, no comparison items, genuine zero, all-fields-above-threshold)
- Updated existing default-value tests in Workflow + src/tests to assert None
- Added tests for explicit zero preservation and Failed status -> None
Per feedback: Completed runs must always show a meaningful number; Failed runs and genuine zeros stay at 0%.
- save_handler._derive_aggregate_scores picks the best available signal: (1) probabilistic confidence when logprobs are available; (2) structural completeness (filled fields / total) when there are no logprobs (reasoning models, image-only flow); (3) 0.0 when there is no extraction data at all.
- _is_filled_value heuristic: None/empty/whitespace count as not filled; descends into nested dicts/lists.
- Reverted models from Optional[float]=None back to default 0.0.
- Reverted frontend: no N/A path; renders 0% for null/missing scores.
- 15 new tests covering all 3 paths + _is_filled_value heuristic.
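The three-path selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in save_handler.py: the function names, the argument shapes, the use of a plain mean for the probabilistic path, and counting only top-level fields for completeness are all hypothetical.

```python
from typing import Any, Optional


def is_filled_value(value: Any) -> bool:
    """Hypothetical stand-in for _is_filled_value.

    None, empty strings, and whitespace-only strings are not filled;
    dicts and lists are filled only if some nested value is filled.
    """
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, dict):
        return any(is_filled_value(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(is_filled_value(v) for v in value)
    return True  # other scalars, including 0 and False, count as filled


def derive_aggregate_score(
    extracted: Optional[dict[str, Any]],
    field_confidences: Optional[list[float]] = None,
) -> float:
    """Pick the best available signal (assumed aggregation rules)."""
    # Path 1: probabilistic confidence when a per-field signal exists.
    # A genuine zero ([0.0, 0.0]) is still a signal and stays 0.0.
    if field_confidences:
        return sum(field_confidences) / len(field_confidences)
    # Path 2: structural completeness = filled top-level fields / total.
    if extracted:
        filled = sum(1 for v in extracted.values() if is_filled_value(v))
        return filled / len(extracted)
    # Path 3: no extraction data at all.
    return 0.0


# Example payload: 2 of 4 top-level fields carry content.
sample = {
    "name": "Ada",
    "address": {"city": " ", "zip": None},
    "items": [],
    "total": 0,
}
structural = derive_aggregate_score(sample)
```

With this shape, a Completed run on a reasoning model without logprobs gets a completeness ratio, while a run with no extracted data still falls through to 0.0.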
…ation
- F401: drop unused sync DefaultAzureCredential import in 3 credential util files (sync flow now raises RuntimeError; AsyncDefaultAzureCredential is still used).
- W293/E122: fix blank-line whitespace and continuation-line indentation in ContentProcessorWorkflow/src/utils/credential_util.py.
fix: Psl entity score
Avijit-Microsoft
approved these changes
Jun 16, 2026
Pull request overview
This pull request refactors extraction quality scoring in the Content Processor pipeline to avoid misleading 0.0 scores for Completed runs without probabilistic confidence, while also tightening Azure credential selection behavior, updating dependencies, and hardening the UI display of scores.
Changes:
- Refactors `SaveHandler` aggregate score derivation to select probabilistic confidence when available, otherwise fall back to structural completeness.
- Removes sync fallback to `DefaultAzureCredential` in credential utilities (raising a clear error instead) and updates app initialization to use the utility.
- Updates score semantics documentation/models, bumps dependencies (`idna`, `authlib`), and adds/updates unit tests + UI handling for missing scores.
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/tests/ContentProcessorWorkflow/utils/test_credential_util_extended.py | Updates tests to expect a RuntimeError when no credential options succeed. |
| src/tests/ContentProcessorWorkflow/services/test_content_process_models.py | Clarifies score defaults in model default tests. |
| src/tests/ContentProcessorWorkflow/repositories/test_claim_process_model.py | Clarifies score defaults in repository model tests. |
| src/tests/ContentProcessor/utils/test_azure_credential_utils.py | Updates tests to expect RuntimeError when no credentials are available. |
| src/tests/ContentProcessor/utils/test_azure_credential_utils_extended.py | Updates extended credential tests for the new “raise on failure” behavior. |
| src/ContentProcessorWorkflow/uv.lock | Bumps authlib lock entry to 1.6.12. |
| src/ContentProcessorWorkflow/tests/unit/services/test_content_process_models.py | Adds assertions/tests around preserving explicit 0.0 scores. |
| src/ContentProcessorWorkflow/tests/unit/repositories/test_claim_process_model.py | Adds tests ensuring explicit 0.0 and failure-default 0.0 behavior. |
| src/ContentProcessorWorkflow/src/utils/credential_util.py | Changes sync credential selection to raise when no auth is available. |
| src/ContentProcessorWorkflow/src/steps/document_process/executor/document_process_executor.py | Centralizes safe coercion of score values from poll payloads. |
| src/ContentProcessorWorkflow/src/repositories/model/claim_process.py | Updates field descriptions to document new score semantics. |
| src/ContentProcessorWorkflow/src/libs/base/application_base.py | Switches initialization to use get_azure_credential() instead of DefaultAzureCredential. |
| src/ContentProcessorWorkflow/src/libs/azure/app_configuration.py | Requires an explicit credential instead of implicitly defaulting. |
| src/ContentProcessorWorkflow/pyproject.toml | Bumps authlib to 1.6.12. |
| src/ContentProcessorWeb/src/Pages/DefaultPage/Components/ProcessQueueGrid/ProcessQueueGridTypes.ts | Updates score field documentation (and should align types with nullish handling). |
| src/ContentProcessorWeb/src/Pages/DefaultPage/Components/ProcessQueueGrid/ProcessQueueGrid.tsx | Handles null/undefined scores in the grid rendering path. |
| src/ContentProcessorAPI/requirements.txt | Bumps idna to 3.15. |
| src/ContentProcessorAPI/app/routers/models/contentprocessor/claim_process.py | Updates API model field descriptions to explain new score semantics. |
| src/ContentProcessorAPI/app/libs/base/application_base.py | Switches initialization to use get_azure_credential() utility. |
| src/ContentProcessor/tests/unit/pipeline/test_save_handler_scores.py | Adds comprehensive unit tests for new aggregate scoring logic. |
| src/ContentProcessor/src/libs/utils/credential_util.py | Changes sync credential selection to raise when no auth is available. |
| src/ContentProcessor/src/libs/utils/azure_credential_utils.py | Changes sync credential selection to raise when no auth is available. |
| src/ContentProcessor/src/libs/pipeline/handlers/save_handler.py | Implements _derive_aggregate_scores + _is_filled_value structural fallback scoring. |
| src/ContentProcessor/requirements.txt | Bumps idna to 3.15. |
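
As a rough illustration of the "safe coercion of score values from poll payloads" noted for document_process_executor.py, a helper along these lines would keep explicit zeros and map missing or malformed values to the 0.0 default. The name `coerce_score`, the payload keys, and the handling of booleans and non-finite values are assumptions, not taken from the PR.

```python
import math
from typing import Any


def coerce_score(raw: Any, default: float = 0.0) -> float:
    """Convert a score from a poll payload to float, or use the default."""
    # Booleans are rejected explicitly because float(True) would be 1.0.
    if raw is None or isinstance(raw, bool):
        return default
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default  # e.g. "n/a", dicts, lists
    # NaN and infinities are treated as missing.
    return value if math.isfinite(value) else default


# Hypothetical poll payload with one missing and one valid score.
payload = {"entity_score": None, "schema_score": "0.42"}
entity_score = coerce_score(payload.get("entity_score"))
schema_score = coerce_score(payload.get("schema_score"))
```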
Roopan-Microsoft
approved these changes
Jun 16, 2026
🎉 This PR is included in version 2.1.2 🎉
The release is available on GitHub release.
Your semantic-release bot 📦🚀
Purpose
This pull request introduces a major refactor to how extraction quality scores are calculated, surfaced, and described in the Content Processor pipeline. It replaces the previous scoring logic with a robust, well-tested system that distinguishes between probabilistic confidence and structural completeness, ensuring that completed runs without logprobs are scored meaningfully rather than as zero. It also improves Azure authentication error handling and updates dependency versions.
Extraction Score Calculation and API Improvements:
- Refactored `SaveHandler` to use a new `_derive_aggregate_scores` method, which selects between probabilistic confidence and a structural completeness fallback, ensuring completed runs without logprobs get a meaningful score rather than 0.0. Also added a helper `_is_filled_value` to robustly determine if a field is filled. [1] [2] [3]
- Updated the descriptions of `entity_score` and `schema_score` in the API model to clarify the new scoring semantics for consumers.

Dependency and Azure Authentication Handling:
- Updated the `idna` dependency to version 3.15 in both requirements files for consistency and security. [1] [2]
- Removed the fallback to `DefaultAzureCredential` in Azure credential utilities; now, if CLI and managed identity authentication fail, a clear error is raised, prompting explicit user action. [1] [2] [3] [4]
- Updated application initialization to use the credential utility instead of `DefaultAzureCredential` directly. [1] [2]

Frontend Robustness:
- Updated the grid to handle `null` or `undefined` scores gracefully, always displaying "0" instead of crashing or showing blank. [1] [2]

Does this introduce a breaking change?
Golden Path Validation
Deployment Validation
What to Check
Verify that the following are valid
Other Information