feat(observer): harden Collector against malformed JSON payloads #171
Merged
Root cause: goal board_worker has zero executor successes; 14 improve/goal tasks recycle (promote->dispatch->fail->reblock), throttled by the hourly 4/4 rate gate and the Claude session limit (external quota). propose created=0 (candidates duplicate 39 queued tasks), so execution throughput, not proposal, is the bottleneck.

Affected repo: OperationsCenter (board/queue state only; no code change). board-unblock drained Blocked 14->1, repopulated R4AI->13. Escalation 3860f469 updated with new evidence; no duplicate task. Golden invariants: 15 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…age 2 implementation

Implement strict JSON schema validation and type checking across all 12 JSON parsing entry points in the observer subsystem to improve resilience against corrupted artifacts.

## Changes

### New Components

- src/operations_center/observer/validation.py: Comprehensive validation library with:
  - ParseErrorMetadata: Structured error tracking for signals
  - ArtifactValidator: Base class with type/enum/range checkers and safe nested access
  - Collector-specific validators (ExecutionOutcome, Request, ValidationHistory, DependencyReport, LintItem)
- src/operations_center/observer/security_logging.py: Enhanced logging helpers for validation errors
- tests/observer/test_collectors_hardening/: Comprehensive test suite with 57+ test cases

### Updated Collectors (Phase 1 & 2)

**Phase 1 (Crash Prevention)**

- dependency_drift.py: Fixed critical crash at line 19 by adding try/except around json.loads() and read_text()
  - Parse errors logged at DEBUG level (expected transient failures)
  - Structure validation errors logged at WARNING level (unexpected schema violations)
  - Returns unavailable signal on any parse error

**Phase 2 (Consistency)**

- execution_health.py: Added ExecutionOutcomeValidator + RequestValidator + ValidationHistoryValidator
  - Validates control_outcome.json, request.json, validation.json structures
  - Enforces required fields and type checks before processing
  - Gracefully skips malformed artifacts and continues
- validation_history.py: Same validators as execution_health.py
  - Consistent error handling across both file-based artifact collectors
- lint_signal.py: Added LintItemValidator for ruff output
  - Validates individual lint issue structures before collection
  - Type checks nested location.start.line/column before use
- type_check.py: Enhanced safe_get() for nested property extraction
  - Safely accesses range.start.line without crashing on missing/wrong types
  - Logs validation errors at debug level for graceful recovery

### Updated Models (Signal Definitions)

- models.py: Added parse_errors: ParseErrorMetadata to signal types:
  - ExecutionHealthSignal, DependencyDriftSignal, ValidationHistorySignal
  - LintSignal, TypeSignal
  - Tracks total_errors, error_categories, last_error_type/msg for operator visibility

### Error Handling Architecture

- Two-stage validation: Parse layer (JSON→Python) + Structure layer (Python→Validated)
- Consistent logging: DEBUG for parse errors, WARNING for structure errors
- Recovery strategies:
  - File-based collectors: Skip malformed artifacts, continue processing
  - Subprocess collectors: Return unavailable signal on parse error
- All collectors now handle 12 vulnerability vectors from Stage 0 analysis:
  - Silent failures on parse errors
  - Unhandled crashes (dependency_drift.py priority fix)
  - Missing post-parse type validation
  - Missing required field checks
  - Type mismatches and invalid enums
  - Nested structure validation failures

## Test Coverage

- test_validation_helpers.py: 22 tests validating all validator classes
  - Type checks, enum validation, range checks, nested access, required fields
  - Each validator tested with valid and invalid inputs
- test_dependency_drift.py: 16 tests for crash fix and edge cases
  - Malformed JSON no longer crashes (CRITICAL FIX)
  - Parse errors logged correctly
  - Structure errors detected and logged
  - Unicode/encoding errors handled gracefully
- test_execution_health.py: 19 tests for mixed scenarios
  - Malformed outcome/request/validation files skipped gracefully
  - Type mismatches caught before processing
  - Repo key filtering preserves correct runs
  - Multiple valid+invalid runs processed correctly

## Acceptance Criteria ✅

- [x] Schema validation logic implemented for all 6 JSON-parsing collectors
- [x] All required fields enforced with explicit error messages
- [x] Type coercion and boundary checks in place (ranges, enums, nested access)
- [x] Code reviewed and ready for merge
- [x] Test suite created (57+ test cases covering parse/structure/edge cases)
- [x] Crash vulnerability fixed and tested
- [x] Error metadata visible in signal models

## Backward Compatibility

- All changes additive (new validators, new fields in models)
- Existing behavior unchanged for valid artifacts
- Graceful degradation for malformed artifacts (skip/unavailable instead of crash)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
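For readers unfamiliar with the two-stage approach described in this commit, here is a minimal sketch of how a parse layer and a structure layer might fit together for a file-based collector. It is not the code from validation.py: the logger name, the `load_artifact` helper, the `status` field, and this particular `safe_get` signature are all assumptions made for illustration.

```python
import json
import logging
from pathlib import Path
from typing import Any, Optional

logger = logging.getLogger("observer.sketch")  # hypothetical logger name


def safe_get(data: Any, *keys: str, expected_type: type = object) -> Optional[Any]:
    """Walk nested dicts; return None on a missing key or a wrong type."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    # bool is a subclass of int, so reject it when an int is expected.
    if expected_type is int and isinstance(current, bool):
        return None
    return current if isinstance(current, expected_type) else None


def load_artifact(path: Path) -> Optional[dict]:
    """Stage 1 parses the file, stage 2 checks its shape; None means 'skip'."""
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Parse/IO failures are treated as expected and logged quietly.
        logger.debug("parse failed for %s: %s", path, exc)
        return None
    # "status" is a hypothetical required field used only for this sketch.
    if not isinstance(raw, dict) or not isinstance(raw.get("status"), str):
        logger.warning("unexpected structure in %s", path)
        return None
    return raw
```

A caller iterating over run directories would skip any artifact for which `load_artifact` returns `None` and continue with the rest, which corresponds to the "skip malformed artifacts, continue processing" recovery strategy listed for file-based collectors.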
Mark stages 2-4 as complete in operational tracking files.
- Set error_type to ErrorCategory values (parse_error, io_error, structure_error) instead of exception class names
- Include exception class name in error_msg for debugging context
- Fixes alert condition filtering logic that now correctly matches error categories
- Ensures all three logging methods (parse, io, structure) consistently populate error_type
Use 'structure_error' instead of 'StructureValidationError' to be consistent with parse_error and io_error naming
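The naming convention in these two commits can be illustrated with a small sketch: `error_type` carries a stable category value that alert filters can match, while the exception class name moves into `error_msg`. The field names follow the PR description; the enum member names and the `record` method are hypothetical and may differ from the real ParseErrorMetadata.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ErrorCategory(str, Enum):
    # Values taken from the commit message; member names are assumed.
    PARSE = "parse_error"
    IO = "io_error"
    STRUCTURE = "structure_error"


@dataclass
class ParseErrorMetadata:
    total_errors: int = 0
    error_categories: Dict[str, int] = field(default_factory=dict)
    last_error_type: Optional[str] = None
    last_error_msg: Optional[str] = None

    def record(self, category: ErrorCategory, exc: Exception) -> None:
        """Hypothetical helper: count the error and remember the latest one."""
        key = category.value
        self.total_errors += 1
        self.error_categories[key] = self.error_categories.get(key, 0) + 1
        # Category goes in error_type; the class name is kept for debugging.
        self.last_error_type = key
        self.last_error_msg = "%s: %s" % (type(exc).__name__, exc)
```

With this shape, an alert condition can compare `last_error_type` against a fixed set of three strings without needing to know which exception classes each collector raises.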
Avoid f-strings in logging calls to preserve lazy formatting semantics. Include exception class name in error_msg using % formatting.
%s formatting converts objects to strings automatically, so explicit str() calls are unnecessary.
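The difference between lazy %-style logging arguments and f-strings can be demonstrated with a short sketch. The `Expensive` class and the logger name are hypothetical and exist only to count how often the object is rendered to a string.

```python
import logging

logger = logging.getLogger("observer.lazy_sketch")  # hypothetical logger name
logger.setLevel(logging.WARNING)  # DEBUG records are filtered out


class Expensive:
    """Counts how many times it is converted to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "expensive"


payload = Expensive()

# Lazy: the arguments are only formatted if the record will be emitted,
# and %s calls str() on them, so no explicit str() is needed.
logger.debug("parse error (%s): %s", "JSONDecodeError", payload)
lazy_renders = payload.renders

# Eager: the f-string is built before logging checks the level.
logger.debug(f"parse error: {payload}")
eager_renders = payload.renders - lazy_renders
```

Because DEBUG is disabled here, the lazy call never renders `payload`, while the f-string renders it once even though nothing is logged.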
Summary
Implement strict JSON schema validation and type checking across all 12 JSON parsing entry points in the observer subsystem to improve resilience against corrupted artifacts.
This PR completes Stage 2 of the JSON hardening initiative.
Key Changes
Critical Bug Fix
New Validation Layer
Error Handling
Test Coverage
Acceptance Criteria
Backward Compatibility