From 45c7b22f811fdf5f0f237c240dcc6d9d3f09ff90 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 23 Jun 2026 09:16:13 -0400 Subject: [PATCH 1/3] fix: route TripleDifference power to panel DGP when n_periods > 2 simulate_power/simulate_mde/simulate_sample_size silently ignored n_periods for DDD because the _EstimatorProfile hard-coded the cross-sectional generate_ddd_data. Route to generate_ddd_panel_data when n_periods > 2, honoring n_periods/ treatment_period and sizing the panel by n_units directly. simulate_sample_size switches from the multiple-of-8 grid to a continuous step-1 search on the panel path. Emit a UserWarning when the estimator lacks cluster="unit" (panel SEs are anti-conservative and overstate power); reject the cross-sectional-only n_per_cell key with a clear ValueError. treatment_fraction stays inert (balanced 2x2x2). Docs: REGISTRY.md PowerAnalysis notes + tutorial 06 + CHANGELOG [Unreleased]; removes the resolved TODO row. Co-Authored-By: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 12 ++ TODO.md | 1 - diff_diff/power.py | 159 +++++++++++++++++++++++-- docs/methodology/REGISTRY.md | 5 +- docs/tutorials/06_power_analysis.ipynb | 6 +- tests/test_power.py | 130 ++++++++++++++++++-- 6 files changed, 288 insertions(+), 25 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 69984936..0c61e49b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed - **`LinearRegression.get_se()` / `get_inference()` no longer return a `NaN` standard error from a tiny-negative variance artifact.** A high-leverage / degenerate coefficient (e.g. an absorbed-FE dummy near-collinear with the treatment, whose Bell-McCaffrey Satterthwaite DOF already hits the noise-floor guard) can have a CR2/HC variance of ~0 (≈1e-32) whose vcov diagonal lands just-below-zero under BLAS-dependent float rounding; `np.sqrt` of the negative then produced a `NaN` SE **nondeterministically** — passing single-threaded but failing under the parallel pure-Python full-suite run (`tests/test_methodology_wls_cr2.py::TestLinearRegressionFENanGuardEndToEnd::test_did_absorbed_fe_lr_inference_nan_for_guarded_coefs`). Both SE sites now clamp the vcov diagonal at 0, so the SE is finite (0 for a genuinely-zero variance), deterministic, and BLAS-independent. **No change for any positive variance** (the clamp is a no-op there); only the previously-`NaN` degenerate case is affected. +- **`TripleDifference` power analysis now honors `n_periods > 2`.** `simulate_power`, + `simulate_mde`, and `simulate_sample_size` previously routed DDD to the + cross-sectional 2×2×2 `generate_ddd_data` regardless of `n_periods` (emitting an + "n_periods ignored" warning). They now route to the panel DGP + `generate_ddd_panel_data` when `n_periods > 2`, honoring `n_periods`/`treatment_period` + and sizing the panel by `n_units` directly (the sample-size search switches from the + multiple-of-8 grid to a continuous step-1 search). Because `simulate_power` defaults + to `n_periods=4`, the default DDD power call now uses the panel DGP. The panel DGP has + within-unit serial correlation, so construct the estimator as + `TripleDifference(cluster="unit")` for valid power — a `UserWarning` fires otherwise. + `treatment_fraction` remains inert (balanced 2×2×2); pass `group_frac`/`partition_frac` + via `data_generator_kwargs`. See `docs/methodology/REGISTRY.md` §PowerAnalysis. ## [3.5.2] - 2026-06-08 diff --git a/TODO.md b/TODO.md index 045f6fe1..c1546529 100644 --- a/TODO.md +++ b/TODO.md @@ -93,7 +93,6 @@ Deferred items from PR reviews that were not addressed before merge. | Survey sandwich SE is not exactly invariant to zero-weight (subpopulation / padded) rows: the shared `_compute_stratified_psu_meat` finite-sample correction counts zero-weight units as PSUs (an `n_psu/(n_psu-1)`-style factor), so adding zero-weight rows shifts the SE by a second-order amount (~2e-4 relative in the EfficientDiD e2e). The point estimate is exactly invariant and the weighted scores of zero-weight rows are already zero — only the DOF correction's PSU count includes them. Cross-cutting across all survey-enabled estimators; fix by counting only positive-weight PSUs in the correction. | `survey.py` (`_compute_stratified_psu_meat`) | PR-B follow-up | Low | | ImputationDiD: leave-one-out (LOO) conservative-variance refinement (BJS 2024 Supplementary Appendix A.9) not implemented — a finite-sample improvement to the auxiliary-model residuals that reduces overfitting of `tau_tilde_g` to `epsilon`. The asymptotic Theorem-3 variance is implemented and matches R `didimputation` (which also omits LOO by default). | `imputation.py` | imputation-validation follow-up | Low | | TROP: extend Wave 4's `_setup_trop_data` helper to also cover the duplicated bootstrap resampling loop in `_bootstrap_variance` / `_bootstrap_variance_global` (~40 LoC dedup; mirrors the data-setup helper pattern with a `fit_callable` parameter for the per-draw refit step). | `trop_local.py`, `trop_global.py` | follow-up | Low | -| TripleDifference power auto-routing: `power.simulate_power` ignores `n_periods` for DDD because `_ddd_dgp_kwargs` is hard-coded to the cross-sectional `generate_ddd_data`. Now that `generate_ddd_panel_data` exists (Wave 4), add a new `_EstimatorProfile` registry entry (or extend the existing one) to route to the panel DGP when `n_periods > 2`. | `power.py`, `prep_dgp.py` | follow-up | Low | | StaggeredTripleDifference R cross-validation: CSV fixtures not committed (gitignored); tests skip without local R + triplediff. Commit fixtures or generate deterministically. | `tests/test_methodology_staggered_triple_diff.py` | #245 | Medium | | StaggeredTripleDifference R parity: benchmark only tests no-covariate path (xformla=~1). Add covariate-adjusted scenarios and aggregation SE parity assertions. | `benchmarks/R/benchmark_staggered_triplediff.R` | #245 | Medium | | StaggeredTripleDifference: per-cohort group-effect SEs include WIF (conservative vs R's wif=NULL). Documented in REGISTRY. Could override mixin for exact R match. | `staggered_triple_diff.py` | #245 | Low | diff --git a/diff_diff/power.py b/diff_diff/power.py index 84d098a8..dd85d130 100644 --- a/diff_diff/power.py +++ b/diff_diff/power.py @@ -267,6 +267,29 @@ def _ddd_dgp_kwargs( ) +def _ddd_panel_dgp_kwargs( + n_units: int, + n_periods: int, + treatment_effect: float, + treatment_fraction: float, + treatment_period: int, + sigma: float, +) -> Dict[str, Any]: + # Panel DDD DGP (n_periods > 2). `n_units` maps directly (no 8-cell //8 + # rounding). `treatment_fraction` is intentionally NOT mapped — DDD is a + # balanced factorial, so group_frac/partition_frac are left at the DGP + # default 0.5. Omitting group_frac/partition_frac here also keeps them out + # of the _PROTECTED_DGP_KEYS collision check, so a user can still override + # the group/partition split via data_generator_kwargs. + return dict( + n_units=n_units, + n_periods=n_periods, + treatment_period=treatment_period, + treatment_effect=treatment_effect, + noise_sd=sigma, + ) + + # -- Fit kwargs builders ------------------------------------------------------ @@ -320,6 +343,19 @@ def _ddd_fit_kwargs( return dict(outcome="outcome", group="group", partition="partition", time="time") +def _ddd_panel_fit_kwargs( + data: pd.DataFrame, + n_units: int, + n_periods: int, + treatment_period: int, +) -> Dict[str, Any]: + # Panel DDD: time="post" is generate_ddd_panel_data's derived binary + # pre/post indicator (vs the cross-sectional "time"). Clustering is NOT a + # fit kwarg — it resolves from the estimator's cluster="unit" attribute + # against the DGP's "unit" column. + return dict(outcome="outcome", group="group", partition="partition", time="post") + + def _trop_fit_kwargs( data: pd.DataFrame, n_units: int, @@ -686,6 +722,55 @@ def _check_ddd_dgp_compat( ) +def _check_ddd_panel_dgp_compat( + estimator: Any, + treatment_fraction: float, + data_generator_kwargs: Optional[Dict[str, Any]], +) -> None: + """Compat checks for the panel DDD power path (``n_periods > 2``). + + Unlike the cross-sectional ``_check_ddd_dgp_compat``, ``n_periods`` and + ``treatment_period`` are honored here (no warning). ``treatment_fraction`` + is still inert (the panel DGP is a balanced 2×2×2). The key addition is the + clustering caveat: ``generate_ddd_panel_data`` has within-unit serial + correlation, so unclustered SEs overstate power. + """ + overrides = data_generator_kwargs or {} + if "n_per_cell" in overrides: + raise ValueError( + "data_generator_kwargs contains 'n_per_cell', a cross-sectional " + "generate_ddd_data parameter. The panel DDD power path " + "(n_periods > 2) uses generate_ddd_panel_data, which sizes the panel " + "by n_units directly. Control the design via n_units and the " + "group_frac / partition_frac data_generator_kwargs instead." + ) + + if treatment_fraction != 0.5: + warnings.warn( + f"treatment_fraction={treatment_fraction} is ignored for " + f"TripleDifference power: generate_ddd_panel_data uses a balanced " + f"2×2×2 design (group_frac=partition_frac=0.5). Pass group_frac / " + f"partition_frac via data_generator_kwargs to vary the split.", + UserWarning, + stacklevel=2, + ) + + # The panel DGP hard-names its unit column "unit", and TripleDifference + # resolves clustering from `self.cluster` against a same-named data column, + # so on this auto-DGP path the only correct value is the literal "unit" — + # this is NOT a general clustering check. + if getattr(estimator, "cluster", None) != "unit": + warnings.warn( + "TripleDifference power on the panel DGP (n_periods > 2) has " + "within-unit serial correlation, so unclustered standard errors are " + "anti-conservative and overstate power. Construct the estimator as " + 'TripleDifference(cluster="unit") so the reported power reflects ' + "cluster-robust (Liang-Zeger CR1) inference.", + UserWarning, + stacklevel=2, + ) + + def _check_sdid_placebo_data( data: pd.DataFrame, estimator: Any, @@ -833,6 +918,28 @@ def _get_registry() -> Dict[str, _EstimatorProfile]: return _ESTIMATOR_REGISTRY +def _ddd_panel_profile() -> "_EstimatorProfile": + """Profile for panel DDD power simulations (n_periods > 2). + + Routes TripleDifference power analysis to ``generate_ddd_panel_data`` + (which honors ``n_periods``/``treatment_period``) instead of the + cross-sectional 2x2x2 ``generate_ddd_data``. See ``simulate_power``. + """ + from diff_diff.prep import generate_ddd_panel_data + + # min_n=16 is intentionally lower than the cross-sectional DDD min_n=64 + # (which counts 8 cells x n_per_cell); the panel DGP maps n_units directly + # and only requires n_units >= 4, so 16 gives >=1 unit per (group,partition) + # cell with margin. + return _EstimatorProfile( + default_dgp=generate_ddd_panel_data, + dgp_kwargs_builder=_ddd_panel_dgp_kwargs, + fit_kwargs_builder=_ddd_panel_fit_kwargs, + result_extractor=_extract_simple, + min_n=16, + ) + + @dataclass class PowerResults: """ @@ -2007,6 +2114,21 @@ def simulate_power( use_custom_dgp = data_generator is not None use_survey_dgp = survey_config is not None + # Route DDD power to the panel DGP when n_periods > 2. The cross-sectional + # 2x2x2 generate_ddd_data ignores n_periods; generate_ddd_panel_data honors + # it. Swapping the profile here (before the collision check and the DDD + # compat warnings) makes every downstream consumer (dgp_kwargs_builder, + # default_dgp, fit_kwargs_builder, result_extractor, min_n) use the panel + # variant automatically. + use_ddd_panel = ( + estimator_name == "TripleDifference" + and n_periods > 2 + and not use_custom_dgp + and not use_survey_dgp + ) + if use_ddd_panel: + profile = _ddd_panel_profile() + # --- Survey config validation --- if use_survey_dgp: assert survey_config is not None # for type narrowing @@ -2161,7 +2283,13 @@ def simulate_power( ) # Warn if DDD design inputs are silently ignored - if estimator_name == "TripleDifference" and not use_custom_dgp: + if use_ddd_panel: + # Panel path honors n_periods/treatment_period; n_units maps directly + # (no //8 rounding). Different compat surface than the cross-sectional + # 2x2x2 design. + _check_ddd_panel_dgp_compat(estimator, treatment_fraction, data_generator_kwargs) + effective_n_units = None + elif estimator_name == "TripleDifference" and not use_custom_dgp: _check_ddd_dgp_compat( n_units, n_periods, @@ -2707,8 +2835,10 @@ def simulate_mde( estimator_name = type(estimator).__name__ search_path: List[Dict[str, float]] = [] - # Compute effective N for DDD (N is fixed throughout MDE search) - if estimator_name == "TripleDifference" and data_generator is None: + # Compute effective N for DDD (N is fixed throughout MDE search). Only the + # cross-sectional 2x2x2 DGP (n_periods <= 2) rounds n_units to a multiple of + # 8; the panel DGP (n_periods > 2) maps n_units directly, so report None. + if estimator_name == "TripleDifference" and data_generator is None and n_periods <= 2: effective_n_units = _ddd_effective_n(n_units, data_generator_kwargs) else: effective_n_units = None @@ -2925,15 +3055,24 @@ def simulate_sample_size( estimator_name = type(estimator).__name__ search_path: List[Dict[str, float]] = [] - # Determine min_n from registry + # Determine min_n from registry. DDD splits cross-sectional (n_periods <= 2, + # 2x2x2 factorial) from panel (n_periods > 2, generate_ddd_panel_data). registry = _get_registry() profile = registry.get(estimator_name) - min_n = profile.min_n if profile is not None else 20 + is_ddd = estimator_name == "TripleDifference" and data_generator is None + is_ddd_panel = is_ddd and n_periods > 2 + if is_ddd_panel: + min_n = _ddd_panel_profile().min_n + else: + min_n = profile.min_n if profile is not None else 20 - # DDD grid snapping: bisection candidates must be multiples of 8 - is_ddd_grid = estimator_name == "TripleDifference" and data_generator is None + # Grid snapping: the cross-sectional 2x2x2 DDD DGP rounds n_units to a + # multiple of 8, so bisection candidates must snap to that grid. The panel + # DGP maps n_units directly → continuous (step-1) search like every other + # estimator. + is_ddd_grid = is_ddd and not is_ddd_panel grid_step = 8 if is_ddd_grid else 1 - convergence_threshold = grid_step + 1 # 9 for DDD, 2 for others + convergence_threshold = grid_step + 1 # 9 for cross-sectional DDD, 2 otherwise if is_ddd_grid and data_generator_kwargs and "n_per_cell" in data_generator_kwargs: raise ValueError( @@ -2991,7 +3130,9 @@ def _power_at_n(n: int) -> float: ) # --- Bracket --- - abs_min = 16 if is_ddd_grid else 4 + # Both DDD paths (cross-sectional and panel) want a >=16 floor so the 8 + # G×P×T cells are populated with margin; everything else floors at 4. + abs_min = 16 if is_ddd else 4 if survey_config is not None: abs_min = max(abs_min, survey_config.min_viable_n) if n_range is not None: diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index ce725563..afb2eeea 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -3369,9 +3369,10 @@ where δ is the target effect. - Within-unit equicorrelation ρ: higher ρ *lowers* the DiD variance (and hence the MDE and required N) because the difference-in-differences cancels the shared within-unit component (Burlig Eq. 2 equicorrelated case) — the opposite of survey/cluster design effects (`deff`), which inflate variance (see the `deff` Note below) - Unequal allocation: optimal is often 50-50 but depends on costs - **Note:** `data_generator_kwargs` keys that overlap with registry-managed simulation inputs (`treatment_effect`, `noise_sd`, `n_units`, `n_periods`, `treatment_fraction`, `treatment_period`, `n_pre`, `n_post`) are rejected with `ValueError` to prevent silent desync between the DGP and result metadata. `n_pre` and `n_post` are derived from `treatment_period` and `n_periods` in factor-model DGPs (SyntheticDiD, TROP); the 3-way intersection check naturally scopes the rejection to those estimators only. Use the corresponding `simulate_power()` parameters directly, or pass a custom `data_generator` to override the DGP entirely. -- **Note:** `simulate_sample_size()` rejects `n_per_cell` in `data_generator_kwargs` for `TripleDifference` because `n_per_cell` is derived from `n_units` (the search variable). A fixed override would freeze the effective sample size across bisection iterations, making the search degenerate. Use `simulate_power()` with a fixed `n_per_cell` override instead, or pass a custom `data_generator`. +- **Note:** For the **cross-sectional** `TripleDifference` path (`n_periods ≤ 2`), `simulate_sample_size()` rejects `n_per_cell` in `data_generator_kwargs` because `n_per_cell` is derived from `n_units` (the search variable). A fixed override would freeze the effective sample size across bisection iterations, making the search degenerate. Use `simulate_power()` with a fixed `n_per_cell` override instead, or pass a custom `data_generator`. (On the panel path — `n_periods > 2` — `n_per_cell` is rejected upstream in `simulate_power` because the panel DGP does not accept it; see the panel routing Note below.) - **Note:** The simulation-based power registry (`simulate_power`, `simulate_mde`, `simulate_sample_size`) uses a single-cohort staggered DGP by default. Estimators configured with `control_group="not_yet_treated"`, `clean_control="strict"`, or `anticipation>0` will receive a `UserWarning` because the default DGP does not match their identification strategy. Users must supply `data_generator_kwargs` (e.g., `cohort_periods=[2, 4]`, `never_treated_frac=0.0`) or a custom `data_generator` to match the estimator design. -- **Note:** The `TripleDifference` registry adapter uses `generate_ddd_data`, a fixed 2×2×2 factorial DGP (group × partition × time). The `n_periods`, `treatment_period`, and `treatment_fraction` parameters are ignored — DDD always simulates 2 periods with balanced groups. `n_units` is mapped to `n_per_cell = max(2, n_units // 8)` (effective total N = `n_per_cell × 8`), so non-multiples of 8 are rounded down and values below 16 are clamped to 16. A `UserWarning` is emitted when simulation inputs differ from the effective DDD design. When rounding occurs, all result objects (`SimulationPowerResults`, `SimulationMDEResults`, `SimulationSampleSizeResults`) set `effective_n_units` to the actual sample size used; it is `None` when no rounding occurred. `simulate_sample_size()` snaps bisection candidates to multiples of 8 so that `required_n` is always a realizable DDD sample size. Passing `n_per_cell` in `data_generator_kwargs` suppresses the effective-N rounding warning but not warnings for ignored parameters (`n_periods`, `treatment_period`, `treatment_fraction`). +- **Note:** The `TripleDifference` registry adapter routes by `n_periods`. For `n_periods ≤ 2` it uses `generate_ddd_data`, a fixed 2×2×2 factorial DGP (group × partition × time): the `n_periods`, `treatment_period`, and `treatment_fraction` parameters are ignored — it always simulates 2 periods with balanced groups. `n_units` is mapped to `n_per_cell = max(2, n_units // 8)` (effective total N = `n_per_cell × 8`), so non-multiples of 8 are rounded down and values below 16 are clamped to 16. A `UserWarning` is emitted when simulation inputs differ from the effective DDD design. When rounding occurs, all result objects (`SimulationPowerResults`, `SimulationMDEResults`, `SimulationSampleSizeResults`) set `effective_n_units` to the actual sample size used; it is `None` when no rounding occurred. `simulate_sample_size()` snaps bisection candidates to multiples of 8 so that `required_n` is always a realizable DDD sample size. Passing `n_per_cell` in `data_generator_kwargs` suppresses the effective-N rounding warning but not warnings for ignored parameters (`treatment_period`, `treatment_fraction`). +- **Note:** For `n_periods > 2`, `TripleDifference` power routes to `generate_ddd_panel_data` (a balanced panel with time-invariant `group`/`partition` and a derived `post` indicator), which **honors** `n_periods` and `treatment_period`. `n_units` maps directly to panel units (no `// 8` rounding; `effective_n_units` is `None`), and `simulate_sample_size()` uses a continuous (step-1) search. `treatment_fraction` is still ignored (the design is a balanced 2×2×2 with `group_frac = partition_frac = 0.5`); pass `group_frac`/`partition_frac` via `data_generator_kwargs` to vary the split, and a `UserWarning` fires if `treatment_fraction ≠ 0.5`. `n_per_cell` (a cross-sectional-only key) raises `ValueError`. **Clustering caveat:** the panel DGP has within-unit serial correlation, so unclustered standard errors are anti-conservative and overstate power — construct the estimator as `TripleDifference(cluster="unit")` (Liang-Zeger CR1; the panel DGP's unit column is named `"unit"`). A `UserWarning` is emitted when the estimator lacks `cluster="unit"`. The point estimate is invariant to clustering; the inference contract is not. - **Note:** The analytical power methods (`PowerAnalysis.power/mde/sample_size` and the `compute_power/compute_mde/compute_sample_size` convenience functions) accept a `deff` parameter (survey design effect, default 1.0). This inflates variance multiplicatively: `Var(ATT) *= deff`, and inflates required sample size: `n_total *= deff`. The `deff` parameter is **not redundant** with `rho` (intra-cluster correlation): `rho` models within-unit (serial) equicorrelation in panel data via the Burlig (2020) Eq. 2 equicorrelated factor `(1/m + 1/r)(1 - rho)`, while `deff` models the survey design effect from stratified multi-stage sampling (clustering + unequal weighting). A survey panel study may need both. Values `deff > 0` are accepted; `deff < 1.0` (net variance reduction, e.g., from stratification gain) emits a warning. - **Note:** `simulate_power()` catches a narrow set of exception types — `ValueError`, `numpy.linalg.LinAlgError`, `KeyError`, `RuntimeError`, `ZeroDivisionError` — raised inside the per-simulation fit and result-extraction block, increments a per-effect failure counter, and skips the replicate. Programming errors (`TypeError`, `AttributeError`, `NameError`, `IndexError`, etc.) are allowed to propagate so that bugs in the estimator or custom result extractor surface loudly instead of being absorbed as simulation failures. The primary-effect failure count is surfaced on the result object as `SimulationPowerResults.n_simulation_failures`; a `UserWarning` still fires when the failure rate exceeds 10% for any effect size, and all-failed runs raise `RuntimeError`. This replaces the prior bare `except Exception` that swallowed root causes and kept the counter internal to the function (axis C — silent fallback — under the Phase 2 audit). - **Note:** `SurveyPowerConfig._build_survey_design()` no longer caches its return value in `self._cached_survey_design`. Reassigning `config.survey_design` (either replacing a user-supplied `SurveyDesign` with another, or toggling between `None` and a user-supplied design) after the first call used to silently return the stale cached design; the method now returns the live `self.survey_design` (or the default construction when `None`) every call. Other config fields (`n_strata`, `icc`, `weight_variation`, etc.) never influenced the returned design, so the staleness surface was specifically `survey_design` reassignment. Construction is microseconds — the cache never earned its complexity. Axis-J finding #28 in the Phase 2 silent-failures audit. diff --git a/docs/tutorials/06_power_analysis.ipynb b/docs/tutorials/06_power_analysis.ipynb index c104ff97..208df307 100644 --- a/docs/tutorials/06_power_analysis.ipynb +++ b/docs/tutorials/06_power_analysis.ipynb @@ -425,13 +425,13 @@ { "cell_type": "markdown", "id": "6qpu05hi18s", - "source": "### Triple Difference\n\n`TripleDifference` uses a fixed 2×2×2 factorial design (group × partition × time). Sample sizes are **rounded via `n_per_cell = max(2, n_units // 8)`**, so the minimum effective N is 16 (2 units per cell × 8 cells). The `effective_n_units` field in results tracks any rounding. Note that `simulate_sample_size()` uses a higher search floor of 64 from the registry.", + "source": "### Triple Difference\n\n`TripleDifference` power routes by `n_periods`:\n\n- **`n_periods ≤ 2`** → the cross-sectional 2×2×2 factorial DGP (`generate_ddd_data`, group × partition × time). Sample sizes are **rounded via `n_per_cell = max(2, n_units // 8)`**, so the minimum effective N is 16 (2 units per cell × 8 cells), and the `effective_n_units` field tracks any rounding. `simulate_sample_size()` snaps `required_n` to multiples of 8.\n- **`n_periods > 2`** → the **panel** DGP (`generate_ddd_panel_data`), which honors `n_periods`/`treatment_period`. `n_units` maps directly to panel units (no rounding; `effective_n_units` is `None`), and `simulate_sample_size()` searches a continuous grid. The panel DGP has within-unit serial correlation, so construct the estimator as **`TripleDifference(cluster=\"unit\")`** — otherwise the unclustered SEs overstate power and a `UserWarning` is emitted. `treatment_fraction` is inert (balanced 2×2×2); pass `group_frac`/`partition_frac` via `data_generator_kwargs` to vary the split.\n\nThe example below uses the panel path (`n_periods=6`):", "metadata": {} }, { "cell_type": "code", "id": "91uackfwqp", - "source": "# Triple Difference — n_units snaps to multiples of 8\nddd = TripleDifference()\n\nddd_results = simulate_power(\n estimator=ddd,\n n_units=64,\n treatment_effect=3.0,\n sigma=2.0,\n n_simulations=100,\n seed=42,\n progress=False,\n)\n\nprint(ddd_results.summary())\nif ddd_results.effective_n_units is not None:\n print(f\"\\nEffective N (after grid rounding): {ddd_results.effective_n_units}\")", + "source": "# Triple Difference — panel path (n_periods > 2 routes to generate_ddd_panel_data).\n# cluster=\"unit\" is required for valid inference on panel-generated data.\nddd = TripleDifference(cluster=\"unit\")\n\nddd_results = simulate_power(\n estimator=ddd,\n n_units=64,\n n_periods=6,\n treatment_period=3,\n treatment_effect=3.0,\n sigma=2.0,\n n_simulations=100,\n seed=42,\n progress=False,\n)\n\nprint(ddd_results.summary())\n# effective_n_units is None on the panel path (n_units maps directly).\nprint(f\"\\nEffective N: {ddd_results.effective_n_units}\")", "metadata": {}, "execution_count": null, "outputs": [] @@ -439,7 +439,7 @@ { "cell_type": "markdown", "id": "6kb8ovmue4m", - "source": "### Supported Estimators\n\nThe following 12 estimators are supported by the simulation power analysis registry. Each is automatically paired with the correct data-generating process:\n\n| DGP Family | Estimators | Min N |\n|---|---|---|\n| **Basic DiD** (`generate_did_data`) | DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | 20 |\n| **Staggered** (`generate_staggered_data`) | CallawaySantAnna, SunAbraham, ImputationDiD, TwoStageDiD, StackedDiD, EfficientDiD | 40 |\n| **Factor Model** (`generate_factor_data`) | TROP, SyntheticDiD | 30 |\n| **Triple Difference** (`generate_ddd_data`) | TripleDifference | 16* |\n\n\\* DDD effective N rounds to `max(2, n_units // 8) * 8` with minimum 16. `simulate_sample_size()` uses a higher search floor of 64.\n\n> **Note:** `ContinuousDiD` is not in the registry because continuous/dose-response treatments require a different DGP structure. `BaconDecomposition` and `HonestDiD` are diagnostic/sensitivity tools rather than treatment effect estimators. For unsupported estimators, you can pass a custom `data_generator` and `result_extractor` (see Section 11).", + "source": "### Supported Estimators\n\nThe following 12 estimators are supported by the simulation power analysis registry. Each is automatically paired with the correct data-generating process:\n\n| DGP Family | Estimators | Min N |\n|---|---|---|\n| **Basic DiD** (`generate_did_data`) | DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | 20 |\n| **Staggered** (`generate_staggered_data`) | CallawaySantAnna, SunAbraham, ImputationDiD, TwoStageDiD, StackedDiD, EfficientDiD | 40 |\n| **Factor Model** (`generate_factor_data`) | TROP, SyntheticDiD | 30 |\n| **Triple Difference** (`generate_ddd_data` / `generate_ddd_panel_data`) | TripleDifference | 16* |\n\n\\* DDD routes by `n_periods`: for `n_periods ≤ 2` the cross-sectional `generate_ddd_data` rounds effective N to `max(2, n_units // 8) * 8` (minimum 16) and `simulate_sample_size()` uses a search floor of 64; for `n_periods > 2` the panel `generate_ddd_panel_data` maps `n_units` directly (use `TripleDifference(cluster=\"unit\")`).\n\n> **Note:** `ContinuousDiD` is not in the registry because continuous/dose-response treatments require a different DGP structure. `BaconDecomposition` and `HonestDiD` are diagnostic/sensitivity tools rather than treatment effect estimators. For unsupported estimators, you can pass a custom `data_generator` and `result_extractor` (see Section 11).", "metadata": {} }, { diff --git a/tests/test_power.py b/tests/test_power.py index 61259c47..22e20fff 100644 --- a/tests/test_power.py +++ b/tests/test_power.py @@ -1110,13 +1110,13 @@ def test_triple_difference(self): self._assert_valid_result(result, "TripleDifference") def test_ddd_warns_ignored_params(self): - """TripleDifference warns when simulation params don't match DDD design.""" - with pytest.warns(UserWarning, match="n_periods=6 is ignored"): + """Cross-sectional DDD (n_periods<=2) warns when params don't match the design.""" + with pytest.warns(UserWarning, match="treatment_fraction=0.3 is ignored"): simulate_power( TripleDifference(), n_units=80, - n_periods=6, - treatment_period=3, + n_periods=2, + treatment_period=1, treatment_fraction=0.3, n_simulations=2, seed=42, @@ -1191,13 +1191,14 @@ def custom_dgp(**kwargs): ) def test_ddd_no_warn_n_per_cell_override(self): - """n_per_cell override suppresses rounding warning but not ignored-param warnings.""" - with pytest.warns(UserWarning, match="n_periods=6 is ignored"): + """Cross-sectional: n_per_cell suppresses the rounding warning but not ignored-param warnings.""" + with pytest.warns(UserWarning, match="treatment_fraction=0.3 is ignored"): simulate_power( TripleDifference(), n_units=80, - n_periods=6, + n_periods=2, treatment_period=1, + treatment_fraction=0.3, data_generator_kwargs=dict(n_per_cell=10), n_simulations=2, seed=42, @@ -1250,6 +1251,111 @@ def test_ddd_power_effective_n_aligned(self): assert result.to_dict()["effective_n_units"] is None assert "Effective sample size" not in result.summary() + # --- Panel DDD routing (n_periods > 2 → generate_ddd_panel_data) --- + + def test_ddd_panel_routing(self): + """n_periods>2 routes DDD power to the panel DGP and honors n_periods.""" + with warnings.catch_warnings(record=True) as caught: + warnings.simplefilter("always") + result = simulate_power( + TripleDifference(cluster="unit"), + n_units=80, + n_periods=6, + treatment_period=3, + treatment_fraction=0.5, + n_simulations=10, + seed=42, + progress=False, + ) + self._assert_valid_result(result, "TripleDifference") + # Panel path honors n_periods/treatment_period (no "ignored" warning), + # and cluster="unit" suppresses the clustering caveat. + msgs = [str(w.message) for w in caught if issubclass(w.category, UserWarning)] + assert not any("ignored" in m for m in msgs), msgs + assert not any("overstate power" in m for m in msgs), msgs + + def test_ddd_panel_warns_without_cluster(self): + """Panel DDD power warns when the estimator lacks cluster='unit'.""" + with pytest.warns(UserWarning, match="overstate power"): + simulate_power( + TripleDifference(), + n_units=80, + n_periods=6, + treatment_period=3, + treatment_fraction=0.5, + n_simulations=5, + seed=42, + progress=False, + ) + + def test_ddd_panel_no_warn_with_cluster(self): + """No warning on the panel path with cluster='unit' and a balanced split.""" + with warnings.catch_warnings(): + warnings.simplefilter("error") + simulate_power( + TripleDifference(cluster="unit"), + n_units=80, + n_periods=6, + treatment_period=3, + treatment_fraction=0.5, + n_simulations=2, + seed=42, + progress=False, + ) + + def test_ddd_panel_rejects_n_per_cell(self): + """n_per_cell is cross-sectional-only; the panel path rejects it clearly.""" + with pytest.raises(ValueError, match="n_per_cell"): + simulate_power( + TripleDifference(cluster="unit"), + n_units=80, + n_periods=6, + treatment_period=3, + data_generator_kwargs=dict(n_per_cell=10), + n_simulations=2, + seed=42, + progress=False, + ) + + @pytest.mark.slow + def test_ddd_panel_mde(self): + """simulate_mde routes to the panel DGP for n_periods>2 (effective_n_units=None).""" + result = simulate_mde( + TripleDifference(cluster="unit"), + n_units=80, + n_periods=6, + treatment_period=3, + n_simulations=5, + effect_range=(0.5, 5.0), + seed=42, + progress=False, + ) + assert isinstance(result, SimulationMDEResults) + assert result.mde > 0 + assert result.effective_n_units is None + + @pytest.mark.slow + def test_ddd_panel_sample_size(self): + """simulate_sample_size uses a continuous (step-1) search on the panel path.""" + result = simulate_sample_size( + TripleDifference(cluster="unit"), + n_periods=6, + treatment_period=3, + treatment_effect=1.0, + n_simulations=10, + n_range=(20, 160), + seed=3, + progress=False, + ) + assert isinstance(result, SimulationSampleSizeResults) + assert result.required_n > 0 + # Panel path is NOT snapped to the cross-sectional multiple-of-8 grid: + # the bisection must be able to explore non-multiples of 8. + assert result.search_path + assert any( + int(step["n_units"]) % 8 != 0 for step in result.search_path + ), "panel sample-size search should explore non-multiples of 8 (step-1 grid)" + @pytest.mark.slow def test_ddd_mde(self): """simulate_mde works for TripleDifference.""" @@ -2104,11 +2210,15 @@ def test_allow_cohort_periods_override(self): ) def test_allow_n_per_cell_override(self): - """n_per_cell is not a protected key — no collision for DDD.""" - # Should not raise (n_per_cell is in DDD builder output but not - # in _PROTECTED_DGP_KEYS, so 3-way intersection is empty) + """n_per_cell is not a protected key — no collision for cross-sectional DDD.""" + # Should not raise (n_per_cell is in the cross-sectional DDD builder + # output but not in _PROTECTED_DGP_KEYS, so 3-way intersection is empty). + # n_periods=2 pins the cross-sectional path; on the panel path + # (n_periods > 2) n_per_cell is rejected (see test_ddd_panel_rejects_n_per_cell). simulate_power( TripleDifference(), + n_periods=2, + treatment_period=1, n_simulations=2, seed=42, progress=False, From 3e8093c1f4a93108e80e9d4551df84bad972eb1d Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 23 Jun 2026 09:28:28 -0400 Subject: [PATCH 2/3] fix(power): split-aware viable floor for panel DDD sample-size search MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses local codex P1/P2: simulate_sample_size advertised group_frac/ partition_frac overrides for the panel DDD path but hard-coded the search floor at 16, which is infeasible for skewed splits (generate_ddd_panel_data requires every (group,partition) cell non-empty — e.g. 0.1/0.1 needs n>=55). Add _ddd_panel_viable_min_n() (mirrors the DGP's rounded stratified allocation) and use it for min_n/abs_min on the panel path; raise a clear ValueError when an n_range upper bound is below that floor. Regression tests for the unbalanced split + low n_range guard. Co-Authored-By: Claude Opus 4.8 (1M context) --- diff_diff/power.py | 61 ++++++++++++++++++++++++++++++++++++++++++--- tests/test_power.py | 37 +++++++++++++++++++++++++++ 2 files changed, 94 insertions(+), 4 deletions(-) diff --git a/diff_diff/power.py b/diff_diff/power.py index dd85d130..91c09d59 100644 --- a/diff_diff/power.py +++ b/diff_diff/power.py @@ -672,6 +672,36 @@ def _ddd_effective_n( return eff if eff != n_units else None +def _ddd_panel_cells_populated(n: int, group_frac: float, partition_frac: float) -> bool: + """Whether ``generate_ddd_panel_data`` would populate all 4 (group,partition) + cells at ``n`` units. Mirrors the rounded stratified allocation + non-empty + validation in ``prep_dgp.generate_ddd_panel_data`` (kept in lockstep).""" + if n < 4: + return False + n_g1 = int(round(n * group_frac)) + n_g0 = n - n_g1 + n_p1_g0 = int(round(n_g0 * partition_frac)) + n_p1_g1 = int(round(n_g1 * partition_frac)) + cells = (n_g0 - n_p1_g0, n_p1_g0, n_g1 - n_p1_g1, n_p1_g1) + return min(cells) >= 1 + + +def _ddd_panel_viable_min_n( + group_frac: float, partition_frac: float, floor: int = 16, search_max: int = 100000 +) -> int: + """Smallest ``n_units`` for which ``generate_ddd_panel_data`` populates all + four (group,partition) cells under the given split, floored at ``floor``. + + For the balanced default (0.5/0.5) this is 4, so the result is ``floor``; + skewed splits (e.g. 0.1/0.1) need more units before every cell is non-empty, + so the sample-size search must bracket above this value (the registry + documents group_frac/partition_frac as data_generator_kwargs overrides).""" + for n in range(4, search_max + 1): + if _ddd_panel_cells_populated(n, group_frac, partition_frac): + return max(floor, n) + return max(floor, search_max) + + def _check_ddd_dgp_compat( n_units: int, n_periods: int, @@ -3062,7 +3092,16 @@ def simulate_sample_size( is_ddd = estimator_name == "TripleDifference" and data_generator is None is_ddd_panel = is_ddd and n_periods > 2 if is_ddd_panel: - min_n = _ddd_panel_profile().min_n + # The panel DGP requires every (group,partition) cell non-empty, which + # for a skewed group_frac/partition_frac override needs more than the + # default 16-unit floor. Bracket above the viable minimum so the search + # never probes an infeasible n (which would raise in the DGP). + _ddd_overrides = data_generator_kwargs or {} + min_n = _ddd_panel_viable_min_n( + _ddd_overrides.get("group_frac", 0.5), + _ddd_overrides.get("partition_frac", 0.5), + floor=_ddd_panel_profile().min_n, + ) else: min_n = profile.min_n if profile is not None else 20 @@ -3130,11 +3169,25 @@ def _power_at_n(n: int) -> float: ) # --- Bracket --- - # Both DDD paths (cross-sectional and panel) want a >=16 floor so the 8 - # G×P×T cells are populated with margin; everything else floors at 4. - abs_min = 16 if is_ddd else 4 + # Cross-sectional DDD wants a >=16 floor so the 8 G×P×T cells are populated; + # the panel path uses its split-aware viable floor (min_n above, which is + # >=16 and higher for skewed group_frac/partition_frac); everything else + # floors at 4. + if is_ddd_panel: + abs_min = min_n + elif is_ddd: + abs_min = 16 + else: + abs_min = 4 if survey_config is not None: abs_min = max(abs_min, survey_config.min_viable_n) + if is_ddd_panel and n_range is not None and n_range[1] < abs_min: + raise ValueError( + f"n_range upper bound ({n_range[1]}) is below the minimum panel-DDD " + f"sample size ({abs_min}) needed to populate all (group, partition) " + f"cells for group_frac/partition_frac. Raise the upper bound or move " + f"the split closer to 0.5." + ) if n_range is not None: lo, hi = _snap_n(n_range[0], "up", floor=abs_min), _snap_n( n_range[1], "down", floor=abs_min diff --git a/tests/test_power.py b/tests/test_power.py index 22e20fff..383d0435 100644 --- a/tests/test_power.py +++ b/tests/test_power.py @@ -38,6 +38,7 @@ _basic_fit_kwargs, _ddd_dgp_kwargs, _ddd_fit_kwargs, + _ddd_panel_viable_min_n, _extract_multiperiod, _extract_simple, _extract_staggered, @@ -1356,6 +1357,42 @@ def test_ddd_panel_sample_size(self): int(step["n_units"]) % 8 != 0 for step in result.search_path ), "panel sample-size search should explore non-multiples of 8 (step-1 grid)" + @pytest.mark.slow + def test_ddd_panel_sample_size_unbalanced_split(self): + """Unbalanced group_frac/partition_frac override raises the panel floor so + the search never probes an infeasible (empty-cell) n.""" + result = simulate_sample_size( + TripleDifference(cluster="unit"), + n_periods=6, + treatment_period=3, + treatment_effect=2.0, + n_simulations=8, + seed=1, + progress=False, + data_generator_kwargs={"group_frac": 0.1, "partition_frac": 0.1}, + ) + assert isinstance(result, SimulationSampleSizeResults) + assert result.required_n > 0 + # Every probed n must populate all 4 (group, partition) cells (>= the + # split-aware viable floor); none should hit the infeasible n=16 default. + viable_floor = _ddd_panel_viable_min_n(0.1, 0.1) + assert viable_floor > 16 # skewed split needs more than the balanced floor + assert all(int(step["n_units"]) >= viable_floor for step in result.search_path) + + def test_ddd_panel_sample_size_low_n_range_raises(self): + """An n_range whose upper bound is below the split-aware viable floor raises clearly.""" + with pytest.raises(ValueError, match="below the minimum panel-DDD"): + simulate_sample_size( + TripleDifference(cluster="unit"), + n_periods=6, + treatment_period=3, + n_range=(8, 20), + n_simulations=2, + seed=1, + progress=False, + data_generator_kwargs={"group_frac": 0.1, "partition_frac": 0.1}, + ) + @pytest.mark.slow def test_ddd_mde(self): """simulate_mde works for TripleDifference.""" From e059454157b27d0abe4e86e2a4142ac1e1ae8086 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 23 Jun 2026 09:49:45 -0400 Subject: [PATCH 3/3] fix(power): validate group_frac/partition_frac in panel-DDD viable-floor helper Addresses CI codex P2: _ddd_panel_viable_min_n() now validates group_frac/ partition_frac in (0, 1) up front (matching generate_ddd_panel_data) and raises clearly if no feasible n is found within search_max, so an out-of-range split surfaces the DGP's clear message instead of a misleading "n_range below the minimum" bracketing error. Adds a unit test for the validation. Also clarifies the tutorial-06 DDD support-table footnote (P3): simulate_sample_size starts at the registry floor of 64, but an explicit n_range can reach the 16-unit floor. Co-Authored-By: Claude Opus 4.8 (1M context) --- diff_diff/power.py | 13 ++++++++++++- docs/tutorials/06_power_analysis.ipynb | 2 +- tests/test_power.py | 8 ++++++++ 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/diff_diff/power.py b/diff_diff/power.py index 91c09d59..08aae4d2 100644 --- a/diff_diff/power.py +++ b/diff_diff/power.py @@ -696,10 +696,21 @@ def _ddd_panel_viable_min_n( skewed splits (e.g. 0.1/0.1) need more units before every cell is non-empty, so the sample-size search must bracket above this value (the registry documents group_frac/partition_frac as data_generator_kwargs overrides).""" + # Validate up front (matching generate_ddd_panel_data) so an out-of-range + # split raises the same clear message here — before the bracketing logic can + # surface a misleading "n_range below the minimum" error downstream. + if not (0.0 < group_frac < 1.0): + raise ValueError(f"group_frac must be in (0, 1); got {group_frac}.") + if not (0.0 < partition_frac < 1.0): + raise ValueError(f"partition_frac must be in (0, 1); got {partition_frac}.") for n in range(4, search_max + 1): if _ddd_panel_cells_populated(n, group_frac, partition_frac): return max(floor, n) - return max(floor, search_max) + raise ValueError( + f"No panel-DDD sample size <= {search_max} populates all four " + f"(group, partition) cells for group_frac={group_frac}, " + f"partition_frac={partition_frac}; move the split closer to 0.5." + ) def _check_ddd_dgp_compat( diff --git a/docs/tutorials/06_power_analysis.ipynb b/docs/tutorials/06_power_analysis.ipynb index 208df307..31cdae1e 100644 --- a/docs/tutorials/06_power_analysis.ipynb +++ b/docs/tutorials/06_power_analysis.ipynb @@ -439,7 +439,7 @@ { "cell_type": "markdown", "id": "6kb8ovmue4m", - "source": "### Supported Estimators\n\nThe following 12 estimators are supported by the simulation power analysis registry. Each is automatically paired with the correct data-generating process:\n\n| DGP Family | Estimators | Min N |\n|---|---|---|\n| **Basic DiD** (`generate_did_data`) | DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | 20 |\n| **Staggered** (`generate_staggered_data`) | CallawaySantAnna, SunAbraham, ImputationDiD, TwoStageDiD, StackedDiD, EfficientDiD | 40 |\n| **Factor Model** (`generate_factor_data`) | TROP, SyntheticDiD | 30 |\n| **Triple Difference** (`generate_ddd_data` / `generate_ddd_panel_data`) | TripleDifference | 16* |\n\n\\* DDD routes by `n_periods`: for `n_periods ≤ 2` the cross-sectional `generate_ddd_data` rounds effective N to `max(2, n_units // 8) * 8` (minimum 16) and `simulate_sample_size()` uses a search floor of 64; for `n_periods > 2` the panel `generate_ddd_panel_data` maps `n_units` directly (use `TripleDifference(cluster=\"unit\")`).\n\n> **Note:** `ContinuousDiD` is not in the registry because continuous/dose-response treatments require a different DGP structure. `BaconDecomposition` and `HonestDiD` are diagnostic/sensitivity tools rather than treatment effect estimators. For unsupported estimators, you can pass a custom `data_generator` and `result_extractor` (see Section 11).", + "source": "### Supported Estimators\n\nThe following 12 estimators are supported by the simulation power analysis registry. Each is automatically paired with the correct data-generating process:\n\n| DGP Family | Estimators | Min N |\n|---|---|---|\n| **Basic DiD** (`generate_did_data`) | DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | 20 |\n| **Staggered** (`generate_staggered_data`) | CallawaySantAnna, SunAbraham, ImputationDiD, TwoStageDiD, StackedDiD, EfficientDiD | 40 |\n| **Factor Model** (`generate_factor_data`) | TROP, SyntheticDiD | 30 |\n| **Triple Difference** (`generate_ddd_data` / `generate_ddd_panel_data`) | TripleDifference | 16* |\n\n\\* DDD routes by `n_periods`: for `n_periods ≤ 2` the cross-sectional `generate_ddd_data` rounds effective N to `max(2, n_units // 8) * 8` (minimum 16) and `simulate_sample_size()` starts its bisection at the registry floor of 64 (an explicit `n_range` can still go down to the 16-unit realizability floor); for `n_periods > 2` the panel `generate_ddd_panel_data` maps `n_units` directly (use `TripleDifference(cluster=\"unit\")`).\n\n> **Note:** `ContinuousDiD` is not in the registry because continuous/dose-response treatments require a different DGP structure. `BaconDecomposition` and `HonestDiD` are diagnostic/sensitivity tools rather than treatment effect estimators. For unsupported estimators, you can pass a custom `data_generator` and `result_extractor` (see Section 11).", "metadata": {} }, { diff --git a/tests/test_power.py b/tests/test_power.py index 383d0435..913f913c 100644 --- a/tests/test_power.py +++ b/tests/test_power.py @@ -1393,6 +1393,14 @@ def test_ddd_panel_sample_size_low_n_range_raises(self): data_generator_kwargs={"group_frac": 0.1, "partition_frac": 0.1}, ) + def test_ddd_panel_viable_min_n_validates_split(self): + """The viable-floor helper rejects out-of-range fractions with the DGP's + clear message (not a misleading n_range/bracketing error).""" + with pytest.raises(ValueError, match=r"group_frac must be in \(0, 1\)"): + _ddd_panel_viable_min_n(0.0, 0.5) + with pytest.raises(ValueError, match=r"partition_frac must be in \(0, 1\)"): + _ddd_panel_viable_min_n(0.5, 1.0) + @pytest.mark.slow def test_ddd_mde(self): """simulate_mde works for TripleDifference."""