Add comprehensive methodology verification tests for CallawaySantAnna #105
Conversation
Create `tests/test_methodology_callaway.py` with 46 tests covering:
- Phase 1: Equation verification (hand-calculated ATT formula match)
- Phase 2: R benchmark comparison (`did::att_gt()` alignment)
- Phase 3: Edge case tests (all REGISTRY.md documented cases)
- Phase 4: SE formula verification (analytical vs bootstrap)

Update METHODOLOGY_REVIEW.md to mark CallawaySantAnna as Complete, with detailed verification results and documented deviations from the R package.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
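The Phase 1 "hand-calculated ATT formula" idea can be sketched as a tiny self-contained check: the Callaway and Sant'Anna group-time effect ATT(g, t) for a never-treated comparison is a 2x2 difference-in-differences against the pre-treatment base period g-1. The helper name and numbers below are illustrative, not the actual test code.

```python
# Hand-calculable sketch of the group-time ATT (illustrative, not the PR's code):
#   ATT(g, t) = [E(Y_t | G=g) - E(Y_{g-1} | G=g)] - [E(Y_t | C) - E(Y_{g-1} | C)]
# i.e. the treated group's change from the base period minus the controls' change.

def att_gt_by_hand(y_treat_base, y_treat_t, y_ctrl_base, y_ctrl_t):
    """2x2 DiD: outcome change for the treated group minus change for controls."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(y_treat_t) - mean(y_treat_base)) - (mean(y_ctrl_t) - mean(y_ctrl_base))

# Worked example: treated units rise by 3 on average, controls by 1,
# so the hand-calculated ATT(g, t) is 2.
att = att_gt_by_hand(
    y_treat_base=[1.0, 2.0], y_treat_t=[4.0, 5.0],
    y_ctrl_base=[1.0, 1.0],  y_ctrl_t=[2.0, 2.0],
)
assert att == 2.0
```

A test in this style pins the estimator to numbers a reviewer can verify with pencil and paper, independent of any library.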
Overall assessment covering: Executive Summary, Methodology, Code Quality, Performance, Maintainability, Tech Debt, Security, Documentation/Tests.
Fix Rust Webb bootstrap weights to match NumPy implementation:
- Correct values: ±√(3/2), ±1, ±√(1/2) (was using wrong values)
- Correct probabilities: [1,2,3,3,2,1]/12 (was uniform)
- Add 3 Rust unit tests for Webb weight verification
- Both backends now produce variance ≈ 0.833

Add lazy R availability fixture to avoid import-time latency:
- New tests/conftest.py with session-scoped r_available fixture
- Support DIFF_DIFF_R=skip environment variable
- Test collection now completes in <1s (was ~2s with subprocess)

Improve test performance:
- Add @pytest.mark.slow marker for thorough bootstrap tests
- Reduce bootstrap iterations from 199 to 99 where sufficient
- Add slow marker definition to pyproject.toml

Documentation updates:
- METHODOLOGY_REVIEW.md: Correct Webb variance to 0.833
- TODO.md: Log 7 pre-existing NaN handling issues
- CLAUDE.md: Document Rust test troubleshooting (PyO3 linking)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Tests not run (review only).
…ements

- Fix Webb weights to use equal probabilities (1/6 each), matching R's did package
  - Python: removed weighted probs from staggered_bootstrap.py
  - Rust: changed to uniform selection in bootstrap.rs
  - Now Var(w) = 1.0 instead of 0.833, fixing ~8.5% SE underestimation
- Add suppressMessages() to R benchmark tests to prevent JSON parse errors
- Exclude slow tests by default via pytest addopts (-m 'not slow')
- Update METHODOLOGY_REVIEW.md to reflect Webb alignment with R
- Update test expectations for Webb variance (1.0 instead of 0.833)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
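The two variance figures in this thread (0.833 before the fix, 1.0 after) can be verified directly. This is a standalone numeric check, not code from the PR: Webb's six-point distribution uses values ±√(3/2), ±1, ±√(1/2), and only the equal-probability version gives a unit-variance wild-bootstrap multiplier.

```python
# Verify both Webb variance claims: the skewed [1,2,3,3,2,1]/12 probabilities
# give E[w^2] = 10/12 ≈ 0.833, while equal 1/6 probabilities give mean 0 and
# variance exactly 1.
import math

values = [-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
          math.sqrt(0.5), 1.0, math.sqrt(1.5)]

skewed = [1/12, 2/12, 3/12, 3/12, 2/12, 1/12]   # the incorrect probabilities
uniform = [1/6] * 6                              # R's did / fwildclusterboot

def moments(probs):
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * v**2 for p, v in zip(probs, values)) - mean**2
    return mean, var

assert abs(moments(skewed)[1] - 10/12) < 1e-12    # ≈ 0.833 (too small)
assert abs(moments(uniform)[1] - 1.0) < 1e-12     # exactly 1
```

A multiplier variance below 1 shrinks every bootstrap draw, which is exactly the ~8.5% SE underestimation (√0.833 ≈ 0.913) the commit describes.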
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
…coverage

- Add detailed Webb distribution documentation to REGISTRY.md with citation
  - Table of all bootstrap weight distributions (Rademacher, Mammen, Webb)
  - Explicit values, probabilities, and properties
  - Reference to Webb (2023) working paper
- Fix R test path handling for cross-platform compatibility
  - Convert backslashes to forward slashes before embedding in R script
  - Use normalizePath() in R for proper path resolution
- Document how to run slow tests in pyproject.toml
  - Add comment explaining pytest -m slow and pytest -m '' options
  - Update marker description for clarity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
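The backslash fix can be illustrated with a minimal sketch. The `r_script_for` helper below is hypothetical; the idea from the commit is that backslashes act as escape characters inside R string literals, so a Windows path must be converted to forward slashes before being spliced into generated R code, with `normalizePath()` handling resolution on the R side.

```python
# Hypothetical sketch (not the PR's code): embed a file path safely into a
# generated R script. R accepts forward slashes on all platforms, so a plain
# string replace avoids broken escapes like "C:\data" in the R source.
def r_script_for(csv_path: str) -> str:
    posix = csv_path.replace("\\", "/")  # backslashes -> forward slashes
    return f'df <- read.csv(normalizePath("{posix}"))\n'

script = r_script_for("C:\\data\\mpdta.csv")
assert "\\" not in script
assert "C:/data/mpdta.csv" in script
```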
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
…mparison tests

- Fix Webb weights in utils.py to use equal probabilities (1/6 each), matching staggered_bootstrap.py, the Rust backend, and R's did package. This gives Var(w) = 1.0 for consistency across all weight distributions.
- Add MPDTA-based strict R comparison tests (TestMPDTARComparison):
  - Export R's mpdta data to ensure identical input for Python and R
  - Test overall ATT, SE, and group-time effects with <1% tolerance
  - Validates methodology alignment with R's did::att_gt()

Note: pyproject.toml, the R suppressMessages() test, and METHODOLOGY_REVIEW.md were already correct from previous commits.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
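A minimal sketch of a <1% relative-tolerance comparison against R's output. The helper name and the numbers are illustrative, not taken from the test suite:

```python
# Illustrative sketch of a strict R-comparison check: Python and R estimates
# must agree to within 1% relative error.
def within_rel_tol(python_est: float, r_est: float, tol: float = 0.01) -> bool:
    """Relative agreement: |py - r| / |r| < tol (1% by default)."""
    return abs(python_est - r_est) < tol * abs(r_est)

# Hypothetical numbers: an estimate within 0.5% of R's passes, 2% off fails.
assert within_rel_tol(1.0, 1.005)
assert not within_rel_tol(1.0, 1.02)
```

Feeding both implementations the identical exported mpdta CSV is what makes such a tight tolerance meaningful; any remaining gap is then attributable to the methodology, not the input data.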
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Address PR #105 round 4 feedback: add a verification note referencing fwildclusterboot's C++ source code as evidence that our Webb weights implementation (±√1.5, ±1, ±√0.5 with equal probabilities) matches established R packages. Also clarify why some documentation shows simplified values (±1.5, ±1, ±0.5) versus the actual square-root values used in implementations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
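A quick standalone check of why the square-root values are the ones that must appear in code: the simplified ±1.5, ±1, ±0.5 sometimes shown in documentation are the squares of the actual weights, and using them directly as weights would inflate the bootstrap variance above 1.

```python
# Compare variances for a symmetric six-point distribution with equal
# probabilities (each magnitude appears as +v and -v, so it suffices to
# average v^2 over the three magnitudes).
import math

sqrt_vals = [math.sqrt(1.5), 1.0, math.sqrt(0.5)]   # actual Webb weights (±)
simplified = [1.5, 1.0, 0.5]                         # documentation shorthand

var_sqrt = sum(v**2 for v in sqrt_vals) / 3          # (1.5 + 1 + 0.5) / 3 = 1
var_simplified = sum(v**2 for v in simplified) / 3   # 3.5 / 3 ≈ 1.167

assert abs(var_sqrt - 1.0) < 1e-12   # correct: Var(w) = 1
assert var_simplified > 1.0          # shorthand misused as weights: too large
```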
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Remove `-m 'not slow'` filter from pytest addopts so slow tests (like the analytical-vs-bootstrap SE convergence test) run by default. Developers can still use `pytest -m 'not slow'` for faster local runs. Addresses PR #105 feedback about CI test coverage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
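A sketch of what the resulting pyproject.toml section might look like; the exact contents are assumed, not taken from the PR:

```toml
[tool.pytest.ini_options]
# Slow tests (e.g. the analytical-vs-bootstrap SE convergence test) run by
# default so CI exercises them; use `pytest -m 'not slow'` locally to skip.
# (addopts previously included "-m 'not slow'".)
markers = [
    "slow: thorough but long-running tests (bootstrap convergence, R benchmarks)",
]
```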
- Add fwildclusterboot verification to Webb weights section in REGISTRY.md (the authoritative methodology documentation, not just METHODOLOGY_REVIEW.md)
- Update R dependency check in conftest.py to also verify the jsonlite package (used by methodology tests for JSON output from R)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Summary

Methodology references (required if estimator / math changes)

Validation
- tests/test_methodology_callaway.py (new - 46 tests)

Security / privacy

Generated with Claude Code