test: add unit tests for FrameTracer and DivergenceDetector #211
Open
acailic wants to merge 3 commits into
Makes `evidence` an optional keyword argument (default `None`, treated as `[]`) in `RecordingMixin.record_decision`. All existing callers already pass evidence explicitly, so this is non-breaking. Also adds lightweight drift-event collection to `record_decision` and wires `_drift_events`/`_drift_compare_index` onto `TraceContext.restore` so the previously-skipped drift-emission test now passes.

Closes #205

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
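For reviewers, the optional-`evidence` behaviour described in this commit can be sketched roughly as follows. This is an illustration only: `RecordingSketch`, the `reasoning` parameter, and the shape of the returned dict are assumptions made for the example, not the actual `RecordingMixin` code.

```python
from typing import Any, Optional


class RecordingSketch:
    """Hypothetical stand-in for RecordingMixin; not the project's real class."""

    def __init__(self) -> None:
        self.decisions: list[dict[str, Any]] = []

    def record_decision(
        self,
        reasoning: str,
        confidence: float,
        chosen_action: str,
        evidence: Optional[list[Any]] = None,
    ) -> dict[str, Any]:
        # None means "no evidence". A fresh list is built on every call so
        # that separate calls never share one mutable default.
        event = {
            "reasoning": reasoning,
            "confidence": confidence,
            "chosen_action": chosen_action,
            "evidence": list(evidence) if evidence is not None else [],
        }
        self.decisions.append(event)
        return event
```

Using `None` as the default rather than `[]` avoids the shared-mutable-default pitfall, which is presumably why the commit treats `None` as an empty list.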
…ison fixes

- Add `*` after `chosen_action` in `record_decision` to make `evidence` and remaining params keyword-only, preventing accidental positional use and protecting existing positional callers
- Use clamped `event.confidence` instead of raw `confidence` in drift event_dict to match what is actually persisted
- Add `action` alias alongside `chosen_action` in drift event_dict so baselines using either key are matched
- Advance `_drift_compare_index` to the next decision event in the baseline (skipping non-decision events) to prevent index misalignment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
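The fixes in this commit can be sketched in isolation as below. Everything here is a hedged illustration rather than the project's code: the free function `record_decision`, the `[0.0, 1.0]` clamp range, the `event_type` key, and the `"decision"` value are all assumptions chosen for the example.

```python
from typing import Any, Optional


def clamp_confidence(value: float) -> float:
    """Clamp to [0.0, 1.0]; the range is an assumption for illustration."""
    return max(0.0, min(1.0, value))


def record_decision(
    reasoning: str,
    confidence: float,
    chosen_action: str,
    *,  # everything after this marker must be passed by keyword
    evidence: Optional[list[Any]] = None,
) -> dict[str, Any]:
    """Build a drift-style event dict from the clamped value, with both keys."""
    return {
        "reasoning": reasoning,
        "confidence": clamp_confidence(confidence),
        "chosen_action": chosen_action,
        "action": chosen_action,  # alias so either baseline key matches
        "evidence": list(evidence) if evidence is not None else [],
    }


def next_decision_index(
    baseline: list[dict[str, Any]], start: int
) -> Optional[int]:
    """Return the index of the first decision event at or after `start`."""
    for i in range(start, len(baseline)):
        # "event_type" and "decision" are hypothetical names.
        if baseline[i].get("event_type") == "decision":
            return i
    return None
```

Scanning forward for the next decision event, instead of incrementing the index by one, keeps the comparison aligned when the baseline interleaves tool calls or other events between decisions.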
82 tests covering TokenUsage arithmetic, FrameEvent/FrameLifetimeTrace serialization, build_frame_tree, cost breakdown, FrameCaptureContext, capture_function_call decorator, to_dict/from_dict round-trips, DivergenceType/Severity enums, DivergencePoint/SessionComparison, detect_divergences, compare_session_structures, analyze_temporal_divergence, and analyze_behavioral_divergence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
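As a rough picture of what the arithmetic and round-trip tests look like, here is a minimal pytest-style sketch. `TokenUsageSketch` is a hypothetical stand-in: its field names and its `from_dict` classmethod are assumptions, and the real `TokenUsage` in `frame_tracer.py` may differ.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class TokenUsageSketch:
    # Hypothetical fields; not taken from frame_tracer.py.
    input_tokens: int = 0
    output_tokens: int = 0

    def __add__(self, other: "TokenUsageSketch") -> "TokenUsageSketch":
        # Field-wise sum, returning a new instance.
        return TokenUsageSketch(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
        )

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "TokenUsageSketch":
        return cls(**data)


def test_add_sums_fieldwise() -> None:
    total = TokenUsageSketch(10, 5) + TokenUsageSketch(1, 2)
    assert total == TokenUsageSketch(11, 7)


def test_to_dict_from_dict_round_trip() -> None:
    original = TokenUsageSketch(3, 4)
    assert TokenUsageSketch.from_dict(original.to_dict()) == original
```

Each test checks one property, so a failure points directly at either the arithmetic or the serialization.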
Summary
- `tests/test_frame_tracer_divergence.py` with 82 unit tests covering zero-coverage modules `frame_tracer.py` and `divergence_detector.py`

Coverage
FrameTracer
- `TokenUsage` arithmetic (`__add__`) and `to_dict()`
- `ExceptionInfo` construction and serialization
- `FrameEvent` dataclass defaults and `to_dict()` (with/without token_usage, exception, children)
- `FrameLifetimeTrace` construction and `to_dict()`
- `build_frame_tree()`: empty, single root, parent-child, multi-root
- `get_frame_by_id()`, `get_frames_at_depth()`, `filter_frames_by_name()`
- `get_cost_breakdown()`: duration aggregation, error counts, token sums
- `FrameCaptureContext`: add_frame, enter/exit depth tracking, build_trace, parent-child linking
- `capture_function_call` decorator: passthrough (no context), capture args/return, exception capture, nested calls
- `to_dict`/`from_dict` round-trips (empty, with frames, with exception)

DivergenceDetector
- `DivergenceType` and `DivergenceSeverity` enum string values
- `DivergencePoint` dataclass and `to_dict()` (with/without timestamp, metadata, event IDs)
- `SessionComparison` defaults and `to_dict()`
- `detect_divergences()`: both empty, session ID extraction, identical sessions, count divergences, summary keys, score bounding, one empty session, JSON serializable output
- `compare_session_structures()`: key presence, full similarity, empty vs empty, empty vs nonempty, event distribution counts
- `analyze_temporal_divergence()`: empty, one empty, same timing, duration difference detection, key presence
- `analyze_behavioral_divergence()`: empty, decision/tool counts, same behavior, key presence, tool divergence detection

Test plan
- `pytest -q tests/test_frame_tracer_divergence.py` → 82 passed
- `ruff check .` → all checks passed
- `pytest -q` → 2756 passed, 25 skipped

🤖 Generated with Claude Code