Closed
Description
Bug
_drop_formula_variables() in ExtendedCPS (introduced in PR #554, commit 7025f6a) drops employment_income and self_employment_income because both define adds. But the raw CPS H5 stores only employment_income, not employment_income_before_lsr, so the income data is silently lost during the ExtendedCPS build.
Chain of failure (CI builds from scratch)
- CPS_2024 H5 stores employment_income
- ExtendedCPS_2024.generate() calls _drop_formula_variables() → drops employment_income (has adds)
- ExtendedCPS H5 now has no employment income variable
- EnhancedCPS_2024 loads ExtendedCPS → no employment income
- Microsimulation computes employment_income via adds → needs employment_income_before_lsr → not in H5 → defaults to 0
Same issue for self_employment_income / self_employment_income_before_lsr.
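The chain above can be reproduced in a toy model. This is only a sketch: the VARIABLES metadata, the plain-dict dataset, the helper names, and the income values are all made up for illustration and are not the actual ExtendedCPS or Microsimulation code.

```python
# Toy model of the build chain. Every structure here is hypothetical.
VARIABLES = {
    "employment_income": {"adds": ["employment_income_before_lsr"]},
    "employment_income_before_lsr": {},  # plain input variable
    "self_employment_income": {"adds": ["self_employment_income_before_lsr"]},
    "self_employment_income_before_lsr": {},  # plain input variable
}


def is_computed(name):
    """True if the variable has adds, subtracts, or a formula."""
    meta = VARIABLES[name]
    return bool(meta.get("adds") or meta.get("subtracts") or meta.get("formula"))


def drop_formula_variables(data):
    """Mimic the drop step: remove every computed variable from the dataset."""
    return {name: value for name, value in data.items() if not is_computed(name)}


def calculate(name, data):
    """Use stored data if present, else sum the adds, else default to 0."""
    if name in data:
        return data[name]
    adds = VARIABLES[name].get("adds")
    if adds:
        return sum(calculate(component, data) for component in adds)
    return 0.0  # missing input variable defaults to zero


# Raw CPS stores the aggregate names only (toy values).
cps = {"employment_income": 100.0, "self_employment_income": 20.0}

extended = drop_formula_variables(cps)             # both variables are dropped
result = calculate("employment_income", extended)  # falls through to the default
print(extended, result)
```

In this toy model the dataset is empty after the drop step, so the computed total falls back to the default, which matches the symptom reported here.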
Evidence
# CPS H5 contents
employment_income: shape=(50692,), sum=1.84e+09 # ← present
employment_income_before_lsr: NOT IN H5 # ← missing
# _drop_formula_variables behavior
employment_income: would_be_dropped=True # has adds
employment_income_before_lsr: would_be_dropped=False # input variable
Why it's not caught locally
Local EnhancedCPS data downloaded from HuggingFace was built before _drop_formula_variables existed, so it still has employment_income_before_lsr. Only fresh CI builds (which regenerate from scratch) hit this.
Discovered via
PR #570's new sanity tests caught employment_income summing to 0 in CI.
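A sanity check of this kind might look roughly like the sketch below. It is not the code from PR #570: the function name, the REQUIRED list, and the totals are hypothetical, and a real test would compute the totals from the built dataset.

```python
# Hypothetical sanity check: flag key income variables whose total is zero.
REQUIRED = ["employment_income", "self_employment_income"]


def zero_total_variables(totals, required):
    """Return the required variables whose total is missing or exactly zero."""
    return [name for name in required if totals.get(name, 0.0) == 0]


# Toy totals standing in for sums computed from a freshly built dataset.
healthy_build = {"employment_income": 100.0, "self_employment_income": 20.0}
broken_build = {"employment_income": 0.0, "self_employment_income": 0.0}

healthy_failures = zero_total_variables(healthy_build, REQUIRED)
broken_failures = zero_total_variables(broken_build, REQUIRED)
print(healthy_failures, broken_failures)
```

Running a check like this against a from-scratch build, rather than against previously downloaded data, is what exposes the problem.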
Possible fixes
- Don't drop variables with only adds/subtracts (only drop those with actual formulas)
- Or rename employment_income → employment_income_before_lsr before the drop step
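Both options can be sketched on a toy dataset. The metadata dict, the function names, and example_formula_variable are assumptions made for illustration. The real _drop_formula_variables() works on the actual variable definitions and may need a different shape.

```python
# Hypothetical variable metadata. "formula": True marks a real formula.
VARIABLES = {
    "employment_income": {"adds": ["employment_income_before_lsr"]},
    "employment_income_before_lsr": {},
    "example_formula_variable": {"formula": True},  # made-up name
}

# Aggregate name -> input name, taken from the two pairs named in this issue.
RENAMES = {
    "employment_income": "employment_income_before_lsr",
    "self_employment_income": "self_employment_income_before_lsr",
}


def drop_true_formula_variables(data, variables):
    """Option 1: drop only variables with a formula; keep adds/subtracts ones."""
    return {
        name: value
        for name, value in data.items()
        if not variables.get(name, {}).get("formula")
    }


def rename_to_inputs(data, renames):
    """Option 2: store aggregates under their input names before the drop step."""
    renamed = {}
    for name, value in data.items():
        target = renames.get(name, name)
        # Do not overwrite an input variable that is already stored.
        if target != name and target in data:
            target = name
        renamed[target] = value
    return renamed


raw = {"employment_income": 100.0, "example_formula_variable": 5.0}
option_1 = drop_true_formula_variables(raw, VARIABLES)
option_2 = rename_to_inputs({"employment_income": 100.0}, RENAMES)
print(option_1, option_2)
```

Option 1 keeps the stored aggregate in place, while option 2 moves the data to the input variable so the existing drop logic can stay unchanged.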