Skip to content

Conversation

@igerber
Copy link
Owner

@igerber igerber commented Jan 24, 2026

Summary

  • Fix empty cell edge case documentation: empty cells cause rank deficiency (handled per rank_deficient_action), not ValueError as previously documented
  • Mark DifferenceInDifferences requirements checklist as verified
  • Add comprehensive methodology tests for DifferenceInDifferences estimator
  • Update methodology review document with verification findings

Methodology references (required if estimator / math changes)

  • Method name(s): DifferenceInDifferences (basic 2x2 DiD)
  • Paper / source link(s): N/A - documentation correction only, no methodology changes
  • Any intentional deviations from the source (and why): None - this PR corrects the documentation to match actual (correct) R-style behavior

Validation

  • Tests added/updated: tests/test_methodology_did.py (new comprehensive methodology test suite)
  • Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

- Fix empty cell edge case documentation: empty cells cause rank deficiency
  (handled per rank_deficient_action), not ValueError as previously documented
- Mark DifferenceInDifferences requirements checklist as verified
- Add comprehensive methodology tests for DifferenceInDifferences estimator
- Update methodology review document with verification findings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Copy link

Overall Assessment

  • ⚠️ Needs changes

Executive Summary

  • P1: Singleton-cluster handling is documented as “dropped with warning” and marked verified, but the implementation doesn’t drop or warn and the new tests don’t assert that behavior.
  • P3: The empty-cell test docstring still quotes the old “ValueError” behavior after the registry update.
  • No estimator code changes; the added methodology tests otherwise align with the DiD formula and R comparisons.

Methodology

  • Severity: P1 | Impact: The registry states singleton clusters are dropped with a warning, and the review now marks that edge case as verified, but the cluster-robust vcov path keeps all clusters with no warning and the new singleton test only asserts “no error.” This is a methodology/documentation mismatch that can mislead users about SE handling for singleton clusters. | Concrete fix: Either implement singleton-cluster drop + warning in diff_diff/linalg.py and assert it in tests, or update the registry and review notes to reflect the current behavior. References: docs/methodology/REGISTRY.md:63, diff_diff/linalg.py:783, tests/test_methodology_did.py:926, METHODOLOGY_REVIEW.md:67, METHODOLOGY_REVIEW.md:88

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P3 | Impact: The empty-cell test docstring still quotes the old registry behavior (“raise ValueError”) after the registry change, which can confuse maintenance and future reviewers. | Concrete fix: Update the docstring to reflect rank-deficiency handling or remove the outdated quote. Reference: tests/test_methodology_did.py:894

- Update REGISTRY.md singleton cluster description to match actual
  implementation (included in variance, not dropped with warning)
- Remove outdated REGISTRY.md quote from empty-cell test docstring
- Update METHODOLOGY_REVIEW.md to reflect REGISTRY.md correction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Jan 24, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: 7713bb2ee53fb2aaeacf3bd75123ae80977b93f6


Overall assessment: ⚠️ Needs changes

Executive Summary

  • REGISTRY’s singleton-cluster variance note conflicts with the implemented cluster‑robust formula for DifferenceInDifferences.
  • The new absorb R² test compares within‑R² to overall R²; this expectation is not valid and may be flaky.
  • significance_stars test omits the “.” case even though the API can return it.
  • R benchmark tests spin up multiple R processes; if R is installed, this can slow CI noticeably.

Methodology

  • P1 Impact: The REGISTRY now states singleton clusters “contribute 0 to cluster variance,” but the implementation aggregates cluster scores for all clusters (including singletons), so the contribution is generally non‑zero; this misstates the variance behavior for DifferenceInDifferences. Fix: update the REGISTRY text to describe the current sandwich estimator (singletons included with no special handling), or change the computation if you intend to drop or zero‑weight singletons. docs/methodology/REGISTRY.md:L58-L64, diff_diff/linalg.py:L790-L813

Code Quality

  • No issues found.

Performance

  • P3 Impact: Each R comparison test launches a fresh Rscript, which can add significant wall time when R is available. Fix: cache R results in a session-scoped fixture or consolidate into a single R invocation per dataset. tests/test_methodology_did.py:L373-L643

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • P2 Impact: test_absorb_improves_r_squared compares R² from a within‑transformed model (absorb) against the untransformed model; these are not comparable metrics and the assertion can be misleading or flaky. Fix: compare like‑for‑like (e.g., compute within‑R² for both) or drop the monotonicity expectation and just assert r_squared is finite. tests/test_methodology_did.py:L1243-L1266, diff_diff/estimators.py:L231-L316
  • P3 Impact: test_significance_stars_property disallows “.” even though the public API returns it for 0.05 < p < 0.1; this could fail if the test data changes. Fix: include “.” in the accepted set or compare against _get_significance_stars output. tests/test_methodology_did.py:L1512-L1519, diff_diff/results.py:L183-L199

Verifies REGISTRY.md documented behavior: "Singleton clusters (one
observation): included in variance estimation; contribute to meat
matrix via (residual² × X'X), same as larger clusters"

Addresses PR #106 round 2 feedback: closes consistency loop between
implementation, documentation, and tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Jan 24, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: 820dd783cfbacf2cbc23436d2603491f06314a2e


Methodology

  • P1 — Impact: DifferenceInDifferences registry now says singleton clusters contribute 0 to cluster-robust variance, but the implementation sums per-cluster scores so singletons contribute non-zero terms; this misdocuments variance/SE behavior and conflicts with the new singleton-cluster test. Fix: update the registry wording to match the sandwich estimator contribution for singleton clusters (or change the implementation to drop/zero them if that is intended). refs: docs/methodology/REGISTRY.md:63, diff_diff/linalg.py:803, tests/test_linalg.py:451

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • No issues found.

Overall Assessment

  • ⚠️ Needs changes

Executive Summary

  • Singleton-cluster variance behavior is misdocumented versus implementation/tests.
  • No estimator logic changes; updates are documentation and DiD methodology tests.
  • R benchmarks remain optional and gated by fixest availability.

igerber and others added 2 commits January 24, 2026 16:46
REGISTRY.md incorrectly stated singleton clusters "contribute 0 to
cluster variance component". The implementation (linalg.py:803-813)
actually includes singletons via the standard sandwich formula:
(residual² × X'X), same as larger clusters.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Updates METHODOLOGY_REVIEW.md:
- Sync edge case description with corrected REGISTRY.md singleton cluster
  documentation

Updates tests/test_methodology_did.py:
- Fix R² comparison test: Replace test_absorb_improves_r_squared with
  test_absorb_produces_valid_results since within-transformed R² is not
  comparable to untransformed R² (different variance scales)
- Add "." to significance_stars assertion (p < 0.1 significance level)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Jan 24, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: ab5add0dae081a0a1066f5bd0383593c0a851408


Overall Assessment: ✅ Looks good

Executive Summary

  • Registry updates for DiD edge cases align with current rank‑deficiency handling and cluster‑robust computation; no methodology deviations detected.
  • The new DiD methodology suite is broad (ATT formula, R benchmarks, edge cases, FE equivalence).
  • Performance risk: repeated Rscript invocations and 99–199 bootstrap iterations per test are unmarked slow and may lengthen CI when R is available.
  • Minor doc clarity: singleton cluster variance wording could be more explicit about per‑cluster score outer products.

Methodology

  • None found; updates in docs/methodology/REGISTRY.md match current behavior.

Code Quality

  • None found.

Performance

  • P2 | Impact: Multiple Rscript calls per dataset plus repeated wild‑bootstrap runs can materially increase CI time when R is installed. | Fix: cache R results with session‑scoped fixtures and/or mark these tests @pytest.mark.slow (or reduce bootstrap iterations for tests). (tests/test_methodology_did.py:L373-L653, tests/test_methodology_did.py:L787-L883)

Maintainability

  • None found.

Tech Debt

  • None found.

Security

  • None found.

Documentation/Tests

  • P3 | Impact: The singleton‑cluster description could be misread as using global X'X instead of per‑cluster score outer products, which may confuse readers. | Fix: clarify the formula to u_i^2 X_i X_i' (or score_g score_g' for n_g=1). (docs/methodology/REGISTRY.md:L58-L64)

Use unambiguous notation u_i² X_i X_i' instead of (residual² × X'X)
to make clear this is the per-observation outer product, not the
global X'X matrix.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber igerber merged commit 0e23f3e into main Jan 24, 2026
4 checks passed
@igerber igerber deleted the docs/registry-did-verification branch January 24, 2026 22:32
@igerber igerber mentioned this pull request Jan 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants