Update REGISTRY.md with verified DifferenceInDifferences behavior #106

igerber · 2026-01-24T20:24:10Z

Summary

Fix empty cell edge case documentation: empty cells cause rank deficiency (handled per rank_deficient_action), not ValueError as previously documented
Mark DifferenceInDifferences requirements checklist as verified
Add comprehensive methodology tests for DifferenceInDifferences estimator
Update methodology review document with verification findings

Methodology references (required if estimator / math changes)

Method name(s): DifferenceInDifferences (basic 2x2 DiD)
Paper / source link(s): N/A - documentation correction only, no methodology changes
Any intentional deviations from the source (and why): None - this PR corrects the documentation to match actual (correct) R-style behavior

Validation

Tests added/updated: tests/test_methodology_did.py (new comprehensive methodology test suite)
Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

- Fix empty cell edge case documentation: empty cells cause rank deficiency (handled per rank_deficient_action), not ValueError as previously documented - Mark DifferenceInDifferences requirements checklist as verified - Add comprehensive methodology tests for DifferenceInDifferences estimator - Update methodology review document with verification findings Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-01-24T20:36:38Z

Overall Assessment

⚠️ Needs changes

Executive Summary

P1: Singleton-cluster handling is documented as “dropped with warning” and marked verified, but the implementation doesn’t drop or warn and the new tests don’t assert that behavior.
P3: The empty-cell test docstring still quotes the old “ValueError” behavior after the registry update.
No estimator code changes; the added methodology tests otherwise align with the DiD formula and R comparisons.

Methodology

Severity: P1 | Impact: The registry states singleton clusters are dropped with a warning, and the review now marks that edge case as verified, but the cluster-robust vcov path keeps all clusters with no warning and the new singleton test only asserts “no error.” This is a methodology/documentation mismatch that can mislead users about SE handling for singleton clusters. | Concrete fix: Either implement singleton-cluster drop + warning in diff_diff/linalg.py and assert it in tests, or update the registry and review notes to reflect the current behavior. References: docs/methodology/REGISTRY.md:63, diff_diff/linalg.py:783, tests/test_methodology_did.py:926, METHODOLOGY_REVIEW.md:67, METHODOLOGY_REVIEW.md:88

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P3 | Impact: The empty-cell test docstring still quotes the old registry behavior (“raise ValueError”) after the registry change, which can confuse maintenance and future reviewers. | Concrete fix: Update the docstring to reflect rank-deficiency handling or remove the outdated quote. Reference: tests/test_methodology_did.py:894

- Update REGISTRY.md singleton cluster description to match actual implementation (included in variance, not dropped with warning) - Remove outdated REGISTRY.md quote from empty-cell test docstring - Update METHODOLOGY_REVIEW.md to reflect REGISTRY.md correction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-24T21:08:25Z

/ai-review

github-actions · 2026-01-24T21:17:36Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7713bb2ee53fb2aaeacf3bd75123ae80977b93f6

Overall assessment: ⚠️ Needs changes

Executive Summary

REGISTRY’s singleton-cluster variance note conflicts with the implemented cluster‑robust formula for DifferenceInDifferences.
The new absorb R² test compares within‑R² to overall R²; this expectation is not valid and may be flaky.
significance_stars test omits the “.” case even though the API can return it.
R benchmark tests spin up multiple R processes; if R is installed, this can slow CI noticeably.

Methodology

P1 Impact: The REGISTRY now states singleton clusters “contribute 0 to cluster variance,” but the implementation aggregates cluster scores for all clusters (including singletons), so the contribution is generally non‑zero; this misstates the variance behavior for DifferenceInDifferences. Fix: update the REGISTRY text to describe the current sandwich estimator (singletons included with no special handling), or change the computation if you intend to drop or zero‑weight singletons. docs/methodology/REGISTRY.md:L58-L64, diff_diff/linalg.py:L790-L813

Code Quality

No issues found.

Performance

P3 Impact: Each R comparison test launches a fresh Rscript, which can add significant wall time when R is available. Fix: cache R results in a session-scoped fixture or consolidate into a single R invocation per dataset. tests/test_methodology_did.py:L373-L643

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

P2 Impact: test_absorb_improves_r_squared compares R² from a within‑transformed model (absorb) against the untransformed model; these are not comparable metrics and the assertion can be misleading or flaky. Fix: compare like‑for‑like (e.g., compute within‑R² for both) or drop the monotonicity expectation and just assert r_squared is finite. tests/test_methodology_did.py:L1243-L1266, diff_diff/estimators.py:L231-L316
P3 Impact: test_significance_stars_property disallows “.” even though the public API returns it for 0.05 < p < 0.1; this could fail if the test data changes. Fix: include “.” in the accepted set or compare against _get_significance_stars output. tests/test_methodology_did.py:L1512-L1519, diff_diff/results.py:L183-L199

Verifies REGISTRY.md documented behavior: "Singleton clusters (one observation): included in variance estimation; contribute to meat matrix via (residual² × X'X), same as larger clusters" Addresses PR #106 round 2 feedback: closes consistency loop between implementation, documentation, and tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-24T21:32:25Z

/ai-review

github-actions · 2026-01-24T21:40:40Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 820dd783cfbacf2cbc23436d2603491f06314a2e

Methodology

P1 — Impact: DifferenceInDifferences registry now says singleton clusters contribute 0 to cluster-robust variance, but the implementation sums per-cluster scores so singletons contribute non-zero terms; this misdocuments variance/SE behavior and conflicts with the new singleton-cluster test. Fix: update the registry wording to match the sandwich estimator contribution for singleton clusters (or change the implementation to drop/zero them if that is intended). refs: docs/methodology/REGISTRY.md:63, diff_diff/linalg.py:803, tests/test_linalg.py:451

Code Quality

No issues found.

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

No issues found.

Overall Assessment

⚠️ Needs changes

Executive Summary

Singleton-cluster variance behavior is misdocumented versus implementation/tests.
No estimator logic changes; updates are documentation and DiD methodology tests.
R benchmarks remain optional and gated by fixest availability.

REGISTRY.md incorrectly stated singleton clusters "contribute 0 to cluster variance component". The implementation (linalg.py:803-813) actually includes singletons via the standard sandwich formula: (residual² × X'X), same as larger clusters. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Updates METHODOLOGY_REVIEW.md: - Sync edge case description with corrected REGISTRY.md singleton cluster documentation Updates tests/test_methodology_did.py: - Fix R² comparison test: Replace test_absorb_improves_r_squared with test_absorb_produces_valid_results since within-transformed R² is not comparable to untransformed R² (different variance scales) - Add "." to significance_stars assertion (p < 0.1 significance level) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-24T21:48:28Z

/ai-review

github-actions · 2026-01-24T21:59:22Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ab5add0dae081a0a1066f5bd0383593c0a851408

Overall Assessment: ✅ Looks good

Executive Summary

Registry updates for DiD edge cases align with current rank‑deficiency handling and cluster‑robust computation; no methodology deviations detected.
The new DiD methodology suite is broad (ATT formula, R benchmarks, edge cases, FE equivalence).
Performance risk: repeated Rscript invocations and 99–199 bootstrap iterations per test are unmarked slow and may lengthen CI when R is available.
Minor doc clarity: singleton cluster variance wording could be more explicit about per‑cluster score outer products.

Methodology

None found; updates in docs/methodology/REGISTRY.md match current behavior.

Code Quality

None found.

Performance

P2 | Impact: Multiple Rscript calls per dataset plus repeated wild‑bootstrap runs can materially increase CI time when R is installed. | Fix: cache R results with session‑scoped fixtures and/or mark these tests @pytest.mark.slow (or reduce bootstrap iterations for tests). (tests/test_methodology_did.py:L373-L653, tests/test_methodology_did.py:L787-L883)

Maintainability

None found.

Tech Debt

None found.

Security

None found.

Documentation/Tests

P3 | Impact: The singleton‑cluster description could be misread as using global X'X instead of per‑cluster score outer products, which may confuse readers. | Fix: clarify the formula to u_i^2 X_i X_i' (or score_g score_g' for n_g=1). (docs/methodology/REGISTRY.md:L58-L64)

Use unambiguous notation u_i² X_i X_i' instead of (residual² × X'X) to make clear this is the per-observation outer product, not the global X'X matrix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber and others added 2 commits January 24, 2026 16:46

Clarify singleton cluster formula notation in REGISTRY.md

11c664e

Use unambiguous notation u_i² X_i X_i' instead of (residual² × X'X) to make clear this is the per-observation outer product, not the global X'X matrix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber merged commit 0e23f3e into main Jan 24, 2026
4 checks passed

igerber deleted the docs/registry-did-verification branch January 24, 2026 22:32

igerber mentioned this pull request Jan 24, 2026

Bump version to 2.1.6 #107

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update REGISTRY.md with verified DifferenceInDifferences behavior #106

Update REGISTRY.md with verified DifferenceInDifferences behavior #106

igerber commented Jan 24, 2026

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

igerber commented Jan 24, 2026

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

igerber commented Jan 24, 2026

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

igerber commented Jan 24, 2026

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Update REGISTRY.md with verified DifferenceInDifferences behavior #106

Update REGISTRY.md with verified DifferenceInDifferences behavior #106

Conversation

igerber commented Jan 24, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

igerber commented Jan 24, 2026

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

igerber commented Jan 24, 2026

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

igerber commented Jan 24, 2026

Uh oh!

github-actions bot commented Jan 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants