_drop_formula_variables drops employment_income, zeroing it in CI builds #571

@baogorek

Description

Bug

_drop_formula_variables() in ExtendedCPS (introduced in PR #554, commit 7025f6a) drops employment_income and self_employment_income because they have adds. But the raw CPS H5 only stores employment_income, not employment_income_before_lsr, so the income data is silently lost during the ExtendedCPS build.

Chain of failure (CI builds from scratch)

  1. CPS_2024 H5 stores employment_income
  2. ExtendedCPS_2024.generate() calls _drop_formula_variables() → drops employment_income (has adds)
  3. ExtendedCPS H5 now has no employment income variable
  4. EnhancedCPS_2024 loads ExtendedCPS → no employment income
  5. Microsimulation computes employment_income via adds → needs employment_income_before_lsr → not in H5 → defaults to 0

Same issue for self_employment_income / self_employment_income_before_lsr.
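
The chain above can be reproduced with a toy model. This is a minimal sketch, not the actual ExtendedCPS or Microsimulation code: the `Variable` class, the `VARIABLES` registry, and the `drop_formula_variables` and `compute` helpers are all hypothetical stand-ins, and plain dicts stand in for the H5 files.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Variable:
    """Hypothetical stand-in for a variable definition."""
    name: str
    adds: list = field(default_factory=list)
    formula: Optional[Callable] = None


# Toy registry: employment_income is defined via adds, its component is a pure input.
VARIABLES = {
    "employment_income": Variable(
        "employment_income", adds=["employment_income_before_lsr"]
    ),
    "employment_income_before_lsr": Variable("employment_income_before_lsr"),
}


def drop_formula_variables(data):
    """Mimic the reported behavior: drop anything with adds or a formula."""
    return {
        name: values
        for name, values in data.items()
        if not (VARIABLES[name].adds or VARIABLES[name].formula)
    }


def compute(name, data):
    """Use stored values if present, else sum the adds; missing inputs default to 0."""
    if name in data:
        return sum(data[name])
    return sum(compute(component, data) for component in VARIABLES[name].adds)


# Step 1: the raw CPS stores employment_income only.
cps = {"employment_income": [50_000.0, 30_000.0]}
# Steps 2-3: the drop step removes it, leaving no employment income variable.
extended = drop_formula_variables(cps)
# Steps 4-5: the simulation falls back to adds, whose input is absent.
total_before = compute("employment_income", cps)
total_after = compute("employment_income", extended)
```

In this toy setup `total_before` reflects the stored values, while `total_after` collapses to zero because nothing was ever stored under employment_income_before_lsr.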

Evidence

# CPS H5 contents
employment_income: shape=(50692,), sum=1.84e+09        # ← present
employment_income_before_lsr: NOT IN H5                # ← missing

# _drop_formula_variables behavior
employment_income:            would_be_dropped=True     # has adds
employment_income_before_lsr: would_be_dropped=False    # input variable

Why it's not caught locally

Local EnhancedCPS data downloaded from HuggingFace was built before _drop_formula_variables existed, so it still has employment_income_before_lsr. Only fresh CI builds (which regenerate from scratch) hit this.

Discovered via

PR #570's new sanity tests caught employment_income summing to 0 in CI.

Possible fixes

  • Don't drop variables with only adds/subtracts (only drop those with actual formulas)
  • Or rename employment_income → employment_income_before_lsr before the drop step
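
Both options can be sketched on plain dicts. This is illustrative only and assumes nothing about the real ExtendedCPS internals: `HAS_ADDS`, `HAS_FORMULA`, `RENAMES`, `drop_formula_only`, and `rename_then_drop` are hypothetical names, and `income_tax` is just an example of a variable with a real formula.

```python
# Hypothetical metadata: which variables have adds and which have a real formula.
HAS_ADDS = {"employment_income", "self_employment_income"}
HAS_FORMULA = {"income_tax"}  # illustrative name only

# Hypothetical rename map for the second option.
RENAMES = {
    "employment_income": "employment_income_before_lsr",
    "self_employment_income": "self_employment_income_before_lsr",
}


def drop_formula_only(data):
    """Option 1: drop only variables with a real formula; keep adds-only ones."""
    return {k: v for k, v in data.items() if k not in HAS_FORMULA}


def rename_then_drop(data):
    """Option 2: move stored values to their input names, then drop as before."""
    renamed = {RENAMES.get(k, k): v for k, v in data.items()}
    return {k: v for k, v in renamed.items() if k not in HAS_ADDS | HAS_FORMULA}


cps = {
    "employment_income": [50_000.0],
    "self_employment_income": [5_000.0],
    "income_tax": [7_000.0],
}
option_1 = drop_formula_only(cps)
option_2 = rename_then_drop(cps)
```

Option 1 leaves the stored values under their original names, so the stored data takes precedence over the adds. Option 2 keeps the existing drop rule but makes sure the values survive under the input-variable names that the adds expect.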
