
employment_income is zero in published enhanced CPS datasets #573

@baogorek


Problem

The published enhanced_cps_2024.h5 on HuggingFace produces employment_income.sum() == 0 because of two independent bugs.

Bug A: _drop_formula_variables drops CPS income without preserving raw data

PR #554 (commit 7025f6a) introduced _drop_formula_variables() in ExtendedCPS, which drops all variables with adds/subtracts. This correctly drops employment_income (which has adds: [employment_income_before_lsr, employment_income_behavioral_response]), but the CPS raw data stores income under employment_income directly — employment_income_before_lsr was never written to the H5.

When PolicyEngine tries to recompute employment_income via its adds formula, it needs employment_income_before_lsr, finds nothing, and defaults to 0. Same for self_employment_income.
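The failure mode can be reproduced in a minimal sketch. Everything here is a hypothetical stand-in written for illustration (the `ADDS` table, `drop_formula_variables`, and `compute` are not the PolicyEngine or ExtendedCPS API); it only assumes, as described above, that a missing input variable defaults to zero.

```python
# Hypothetical stand-ins for illustration; not the real PolicyEngine API.
ADDS = {
    "employment_income": [
        "employment_income_before_lsr",
        "employment_income_behavioral_response",
    ],
}


def drop_formula_variables(data):
    """Drop every stored variable that has an adds formula."""
    return {name: values for name, values in data.items() if name not in ADDS}


def compute(name, data, n):
    """Return stored values, else sum the adds components, else zeros."""
    if name in data:
        return data[name]
    if name in ADDS:
        total = [0.0] * n
        for component in ADDS[name]:
            values = compute(component, data, n)
            total = [t + v for t, v in zip(total, values)]
        return total
    return [0.0] * n  # a missing input variable defaults to 0


# The raw CPS data stores income directly under the aggregate name.
raw = {"employment_income": [50000.0, 0.0, 72000.0]}

stored = drop_formula_variables(raw)  # the income values are lost here
total_after_drop = sum(compute("employment_income", stored, 3))
```

In this sketch `stored` ends up empty, so both adds components fall back to zeros and the recomputed total is zero as well.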

Bug B: create_sparse_ecps indentation error deletes populated variables

In create_sparse_ecps() (small_enhanced_cps.py), lines 128-129 have the del data[variable] cleanup check inside the inner for time_period loop instead of after it. For formula variables with no known periods, the inner loop never executes, so empty data[variable] = {} dicts survive and get written as empty H5 groups. The sibling function create_small_ecps() at lines 51-52 has the correct indentation.
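The effect of the indentation can be seen in a simplified sketch. This is not the code from small_enhanced_cps.py: the function names and the `known` mapping (variable name to a dict of known periods) are assumptions made for illustration, and only the placement of the emptiness check mirrors the description above.

```python
def collect_buggy(known):
    """Emptiness check sits inside the inner loop, as described for create_sparse_ecps."""
    data = {}
    for variable, periods in known.items():
        data[variable] = {}
        for time_period, values in periods.items():
            data[variable][time_period] = values
            # Never true at this point, and never reached when there are no periods.
            if len(data[variable]) == 0:
                del data[variable]
    return data


def collect_fixed(known):
    """Emptiness check runs once per variable, after the inner loop."""
    data = {}
    for variable, periods in known.items():
        data[variable] = {}
        for time_period, values in periods.items():
            data[variable][time_period] = values
        if len(data[variable]) == 0:
            del data[variable]
    return data


# A formula variable with no known periods next to a populated variable.
known = {"employment_income": {}, "age": {2024: [30, 40]}}

buggy = collect_buggy(known)
fixed = collect_fixed(known)
```

With the check inside the loop, the empty `employment_income` entry survives in `buggy`; with the check dedented, `fixed` contains only `age`.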

Relationship to existing issues

Fix

  • Bug A: Before dropping formula variables, rename CPS-stored aggregate variables to their input-variable equivalents (e.g. employment_income → employment_income_before_lsr). This preserves the raw data under the correct name so the adds formula can recompute the aggregate.
  • Bug B: Dedent the empty-dict cleanup in create_sparse_ecps to match create_small_ecps.
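A rough sketch of the Bug A rename step, under stated assumptions: `rename_then_drop`, `RENAMES`, and `FORMULA_VARIABLES` are hypothetical names chosen for illustration, and the mapping lists only the one pair named above. The actual fix may be structured differently.

```python
# Hypothetical mapping; only the pair named in this issue is listed.
RENAMES = {"employment_income": "employment_income_before_lsr"}

# Hypothetical set of variables that have an adds/subtracts formula.
FORMULA_VARIABLES = {"employment_income"}


def rename_then_drop(data, renames, formula_variables):
    """Move aggregate values to their input variable, then drop formula variables."""
    out = dict(data)
    for aggregate, input_name in renames.items():
        # Do not overwrite an input variable that is already stored.
        if aggregate in out and input_name not in out:
            out[input_name] = out.pop(aggregate)
    return {k: v for k, v in out.items() if k not in formula_variables}


raw = {"employment_income": [50000.0, 0.0, 72000.0], "age": [30, 40, 50]}
cleaned = rename_then_drop(raw, RENAMES, FORMULA_VARIABLES)
```

After this step the income values live under `employment_income_before_lsr`, so an adds formula over that variable has real data to sum, and the aggregate name itself is no longer stored.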
