is_pregnant stochastic draw silently dropped by extended_cps formula filter #576

@baogorek

Problem

cps.py assigns is_pregnant as a stochastic draw using CDC/Census state-level pregnancy rates, because CPS does not collect pregnancy data. However, extended_cps.py has a _drop_formula_variables step that removes any variable with a formula, adds, or subtracts defined in policyengine-us.

is_pregnant has adds: ['current_pregnancies'] in the country package, so it gets silently dropped. The result is that is_pregnant is absent from every downstream dataset (extended, stratified, source-imputed), and calibration targets for pregnancy fail as "impossible."
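The failure mode can be reproduced with a minimal sketch. This is not the actual extended_cps.py code: `VariableSpec`, `is_computed`, and this version of `drop_formula_variables` are hypothetical stand-ins that assume the filter only checks whether the country package defines a formula, adds, or subtracts for each variable.

```python
from dataclasses import dataclass, field


@dataclass
class VariableSpec:
    """Hypothetical stand-in for a country-package variable definition."""
    name: str
    has_formula: bool = False
    adds: list = field(default_factory=list)
    subtracts: list = field(default_factory=list)


def is_computed(spec):
    # A variable counts as "computed" if the engine can derive it.
    return spec.has_formula or bool(spec.adds) or bool(spec.subtracts)


def drop_formula_variables(data, specs):
    """Return (kept, dropped): kept maps names to values, dropped lists removed names."""
    kept, dropped = {}, []
    for name, values in data.items():
        spec = specs.get(name)
        if spec is not None and is_computed(spec):
            dropped.append(name)  # removed without any warning to the caller
        else:
            kept[name] = values
    return kept, dropped


specs = {
    "is_pregnant": VariableSpec("is_pregnant", adds=["current_pregnancies"]),
    "age": VariableSpec("age"),
}
# Illustrative values only; is_pregnant here stands in for the stochastic draw.
data = {"is_pregnant": [True, False, False], "age": [29, 41, 7]}

kept, dropped = drop_formula_variables(data, specs)
print(sorted(kept), dropped)
```

Under these assumptions the imputed is_pregnant column disappears even though cps.py set it on purpose, which is why downstream calibration has nothing to adjust.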

Current workaround

_drop_formula_variables has a _KEEP_FORMULA_VARS set for exceptions. Adding is_pregnant to it fixes this specific case, but it is a hard-coded allowlist escape hatch: any future stochastic input that happens to have a formula/adds/subtracts in the country package will be silently dropped too unless someone remembers to add it to the set.
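A rough sketch of why the keep-set is fragile, assuming the filter simply skips names in the set. `KEEP_FORMULA_VARS` mirrors the `_KEEP_FORMULA_VARS` name from this issue; `COMPUTED`, `filter_with_keep_set`, and `some_future_imputed_flag` are hypothetical and exist only for illustration.

```python
# Mirrors the _KEEP_FORMULA_VARS exception set described above.
KEEP_FORMULA_VARS = {"is_pregnant"}

# Hypothetical: names the country package defines with formula/adds/subtracts.
COMPUTED = {"is_pregnant", "some_future_imputed_flag"}


def filter_with_keep_set(data, computed, keep):
    """Drop computed variables unless they are listed in the keep-set."""
    return {
        name: values
        for name, values in data.items()
        if name not in computed or name in keep
    }


# Illustrative values only.
data = {
    "is_pregnant": [True, False],
    "some_future_imputed_flag": [False, True],  # also imputed, but not in the keep-set
    "age": [29, 41],
}

filtered = filter_with_keep_set(data, COMPUTED, KEEP_FORMULA_VARS)
print(sorted(filtered))
```

The second imputed variable is lost in exactly the same way is_pregnant was, because nothing ties the keep-set to what cps.py actually writes.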

Underlying design tension

The data package (cps.py) deliberately sets is_pregnant as an input variable. The country package (policyengine-us) defines it with adds: ['current_pregnancies'], meaning the engine wants to compute it from components. Both are reasonable:

  • Data package perspective: CPS lacks pregnancy data, so we impute it stochastically and store it as an input that calibration can then fine-tune.
  • Country package perspective: is_pregnant is derived from current_pregnancies via adds, so the engine should recompute it.

The question is: when the data package provides a value for a variable that the country package also computes, which should win?

Options

  1. Hard-code exceptions (_KEEP_FORMULA_VARS) — current approach, fragile
  2. Don't drop variables that were explicitly set in cps.py — flip the logic so intentional data inputs are preserved
  3. Coordinate with policyengine-us — perhaps is_pregnant shouldn't have adds if it's meant to be a data input, or current_pregnancies should be the stored variable instead
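Option 2 could look roughly like the sketch below. It assumes the data package can report which variables it set deliberately; the `explicit_inputs` argument, the returned `overrides` list, and `hypothetical_derived_total` are all hypothetical names, not existing code.

```python
def drop_formula_variables(data, computed, explicit_inputs):
    """Drop computed variables unless the data package set them on purpose.

    Returns (kept, overrides), where overrides lists computed variables
    that were preserved because they are explicit data inputs.
    """
    kept, overrides = {}, []
    for name, values in data.items():
        if name in computed and name not in explicit_inputs:
            continue  # derived value: let the engine recompute it
        if name in computed:
            overrides.append(name)  # data input wins; record it for review
        kept[name] = values
    return kept, overrides


# Hypothetical: names the country package defines with formula/adds/subtracts.
computed = {"is_pregnant", "hypothetical_derived_total"}
# Hypothetical: names cps.py assigned deliberately.
explicit_inputs = {"is_pregnant", "age"}

# Illustrative values only.
data = {
    "is_pregnant": [True, False],
    "age": [29, 41],
    "hypothetical_derived_total": [100.0, 250.0],
}

kept, overrides = drop_formula_variables(data, computed, explicit_inputs)
print(sorted(kept), overrides)
```

Returning the list of overrides makes each conflict between the data package and the country package visible, so it can be logged or reviewed instead of resolved silently in either direction.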
