Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 37 additions & 5 deletions METHODOLOGY_REVIEW.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ Each estimator in diff-diff should be periodically reviewed to ensure:

| Estimator | Module | R Reference | Status | Last Review |
|-----------|--------|-------------|--------|-------------|
| DifferenceInDifferences | `estimators.py` | `fixest::feols()` | Not Started | - |
| DifferenceInDifferences | `estimators.py` | `fixest::feols()` | **Complete** | 2026-01-24 |
| MultiPeriodDiD | `estimators.py` | `fixest::feols()` | Not Started | - |
| TwoWayFixedEffects | `twfe.py` | `fixest::feols()` | Not Started | - |
| CallawaySantAnna | `staggered.py` | `did::att_gt()` | **Complete** | 2026-01-24 |
Expand Down Expand Up @@ -51,14 +51,46 @@ Each estimator in diff-diff should be periodically reviewed to ensure:
| Module | `estimators.py` |
| Primary Reference | Wooldridge (2010), Angrist & Pischke (2009) |
| R Reference | `fixest::feols()` |
| Status | Not Started |
| Last Review | - |
| Status | **Complete** |
| Last Review | 2026-01-24 |

**Verified Components:**
- [x] ATT formula: Double-difference of cell means matches regression interaction coefficient
- [x] R comparison: ATT matches `fixest::feols()` within 1e-3 tolerance
- [x] R comparison: SE (HC1 robust) matches within 5%
- [x] R comparison: P-value matches within 0.01
- [x] R comparison: Confidence intervals overlap
- [x] R comparison: Cluster-robust SE matches within 10%
- [x] R comparison: Fixed effects (absorb) matches `feols(...|unit)` within 1%
- [x] Wild bootstrap inference (Rademacher, Mammen, Webb weights)
- [x] Formula interface (`y ~ treated * post`)
- [x] All REGISTRY.md edge cases tested

**Test Coverage:**
- 53 methodology verification tests in `tests/test_methodology_did.py`
- 123 existing tests in `tests/test_estimators.py`
- R benchmark tests (skip if R not available)

**R Comparison Results:**
- ATT matches within 1e-3 (R JSON truncation limits precision)
- HC1 SE matches within 5%
- Cluster-robust SE matches within 10%
- Fixed effects results match within 1%

**Corrections Made:**
- (None yet)
- (None - implementation verified correct)

**Outstanding Concerns:**
- (None yet)
- R comparison precision limited by JSON output truncation (4 decimal places)
- Consider improving R script to output full precision for tighter tolerances

**Edge Cases Verified:**
1. Empty cells: Produces rank deficiency warning (expected behavior)
2. Singleton clusters: Included in variance estimation, contribute via residuals (corrected REGISTRY.md)
3. Rank deficiency: All three modes (warn/error/silent) working
4. Non-binary treatment/time: Raises ValueError as expected
5. No variation in treatment/time: Raises ValueError as expected
6. Missing values: Raises ValueError as expected

---

Expand Down
15 changes: 9 additions & 6 deletions docs/methodology/REGISTRY.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,8 +56,11 @@ where τ is the ATT.
- Optional: Wild cluster bootstrap for small number of clusters

*Edge cases:*
- Empty cells (e.g., no treated-pre observations) raise ValueError
- Singleton groups in clustering are dropped with warning
- Empty cells (e.g., no treated-pre observations) cause rank deficiency, handled per `rank_deficient_action` setting
- With "warn" (default): emits warning, sets NaN for affected coefficients
- With "error": raises ValueError
- With "silent": continues silently with NaN coefficients
- Singleton clusters (one observation): included in variance estimation; contribute to meat matrix via u_i² X_i X_i' (same formula as larger clusters with n_g=1)
- Rank-deficient design matrix (collinearity): warns and sets NA for dropped coefficients (R-style, matches `lm()`)
- Tolerance: `1e-07` (matches R's `qr()` default), relative to largest diagonal element of R in QR decomposition
- Controllable via `rank_deficient_action` parameter: "warn" (default), "error", or "silent"
Expand All @@ -67,10 +70,10 @@ where τ is the ATT.
- Stata: `reghdfe` or manual regression with interaction

**Requirements checklist:**
- [ ] Treatment and time indicators are binary 0/1 with variation
- [ ] ATT equals coefficient on interaction term
- [ ] Wild bootstrap supports Rademacher, Mammen, Webb weight distributions
- [ ] Formula interface parses `y ~ treated * post` correctly
- [x] Treatment and time indicators are binary 0/1 with variation
- [x] ATT equals coefficient on interaction term
- [x] Wild bootstrap supports Rademacher, Mammen, Webb weight distributions
- [x] Formula interface parses `y ~ treated * post` correctly

---

Expand Down
44 changes: 44 additions & 0 deletions tests/test_linalg.py
Original file line number Diff line number Diff line change
Expand Up @@ -420,6 +420,50 @@ def test_single_cluster_error(self):
with pytest.raises(ValueError, match="at least 2 clusters"):
solve_ols(X, y, cluster_ids=cluster_ids)

def test_singleton_clusters_included_in_variance(self):
"""Test that singleton clusters contribute to variance estimation.

REGISTRY.md documents: "Singleton clusters (one observation): included
in variance estimation; contribute to meat matrix via (residual² × X'X),
same as larger clusters"

This test verifies that behavior.
"""
np.random.seed(42)
n = 100
X = np.column_stack([np.ones(n), np.random.randn(n)])
y = X @ np.array([1.0, 2.0]) + np.random.randn(n)

# Create clusters: one large cluster (50 obs), 50 singleton clusters
# Total: 51 clusters, 50 of which are singletons
cluster_ids = np.concatenate([
np.zeros(50), # Large cluster (id=0)
np.arange(1, 51) # 50 singleton clusters (ids 1-50)
])

coef, resid, vcov = solve_ols(X, y, cluster_ids=cluster_ids)

# Basic validity checks
assert vcov.shape == (2, 2), "VCoV should be 2x2"
assert np.all(np.isfinite(vcov)), "VCoV should have finite values"
assert np.allclose(vcov, vcov.T), "VCoV should be symmetric"

# Variance should be positive (singletons contribute, not zero)
assert vcov[0, 0] > 0, "Intercept variance should be positive"
assert vcov[1, 1] > 0, "Slope variance should be positive"

# Compare to case without singletons (only large clusters)
# With fewer clusters, variance should be DIFFERENT (not necessarily larger)
cluster_ids_no_singletons = np.concatenate([
np.zeros(50), # Cluster 0
np.ones(50) # Cluster 1
])
_, _, vcov_no_singletons = solve_ols(X, y, cluster_ids=cluster_ids_no_singletons)

# The two variance estimates should differ (singletons change the calculation)
assert not np.allclose(vcov, vcov_no_singletons), \
"Singleton clusters should affect variance estimation"


class TestComputeRobustVcov:
"""Tests for compute_robust_vcov function."""
Expand Down
Loading