Context
PWBM's 2020 tax-data methodology paper (Ricco, "Data Processing for PWBM's Tax Module") §2.3.2 calibrates CPS tax-unit counts to PUF filer counts within 14 joint demographic cells:
| | 0 dep | 1 dep | 2 dep | 3+ dep | 65+ |
|---|---|---|---|---|---|
| Single | ✓ | ✓ | ✓ | — | ✓ |
| Married | ✓ | ✓ | ✓ | ✓ | ✓ |
| HOH | — | ✓ | ✓ | ✓ | ✓ |
| Dependents | — | — | — | — | ✓ |
(Table 4 of the PWBM paper.)
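As a rough illustration of the cell structure, the 14 checked cells can be written out as a lookup. This is only a sketch: the labels, the `assign_cell` helper, and the precedence rule (65+ overrides the dependent count, and dependent filers all land in their one checked cell) are assumptions made here, not definitions taken from the PWBM paper or from policyengine-us-data.

```python
from typing import Optional, Tuple

# The 14 checked cells, as (filing_status, column) pairs.
# Labels are illustrative only (hypothetical names).
CELLS = {
    ("single", "0"), ("single", "1"), ("single", "2"), ("single", "65+"),
    ("married", "0"), ("married", "1"), ("married", "2"),
    ("married", "3+"), ("married", "65+"),
    ("hoh", "1"), ("hoh", "2"), ("hoh", "3+"), ("hoh", "65+"),
    ("dependent", "65+"),
}


def assign_cell(
    filing_status: str, num_dependents: int, is_65_plus: bool
) -> Optional[Tuple[str, str]]:
    """Return the cell for a tax unit, or None if the combination is unchecked.

    Assumption: the 65+ column takes precedence over the dependent count,
    and dependent filers always map to their single checked cell.
    """
    if filing_status == "dependent" or is_65_plus:
        column = "65+"
    elif num_dependents >= 3:
        column = "3+"
    else:
        column = str(num_dependents)
    cell = (filing_status, column)
    return cell if cell in CELLS else None
```

Units that fall in an unchecked combination (for example, single with 3+ dependents) return `None` here; how such units should be handled is left open.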
This matters because the growth rate in return counts within cells diverges from the overall growth rate. Getting cell-level counts wrong means errors land on the intensive margin (average income) when they should be on the extensive margin (number of filers).
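A small arithmetic sketch of that point, with made-up numbers: if a cell's aggregate income is matched but its unit count is too low, the implied average income absorbs the error.

```python
def implied_average(total_income: float, n_units: float) -> float:
    """Average income implied by an aggregate and a unit count."""
    return total_income / n_units


# All figures below are hypothetical, chosen only to show the mechanism.
total_agi = 50e9            # aggregate AGI in the cell, matched by calibration
true_filers = 1_000_000     # filer count the cell should have
modeled_filers = 800_000    # filer count the data actually has

true_avg = implied_average(total_agi, true_filers)
modeled_avg = implied_average(total_agi, modeled_filers)

# A 20% shortfall in units shows up as a 25% overstatement of average income.
overstatement = modeled_avg / true_avg - 1.0
```

Here `true_avg` is 50,000 and `modeled_avg` is 62,500: the count error has moved onto the intensive margin.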
Current state in policyengine-us-data
`calibration/target_config.yaml` calibrates `tax_unit_count` along one dimension at a time, with a single 2D exception:
- `tax_unit_count` × `adjusted_gross_income` (by AGI bucket)
- `tax_unit_count` × `aca_ptc`, `refundable_ctc`, `non_refundable_ctc`, `total_self_employment_income`
- `tax_unit_count` × `adjusted_gross_income`, `refundable_ctc` (2D)
There are no joint targets on (filing_status × num_dependents × age_65plus). The closest coverage comes from the CTC counts (which imply filing status only indirectly) and the open AGI-by-filing-status CTC issue #717.
Proposal
Add SOI-based calibration targets for tax_unit_count stratified on the 14-cell matrix above. Source: IRS SOI Table 1.2 (returns by filing status and size of AGI) and Table 2.5 (returns by size of AGI and number of exemptions). Both publish filer counts at exactly this granularity.
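To make the target concrete, here is a minimal sketch of the quantity each cell target would constrain: the weighted tax-unit count in a cell, compared with an external count. The record layout (`cell`, `weight`), the helper names, and all numbers are hypothetical; this is not the policyengine-us-data calibration code.

```python
from collections import defaultdict


def weighted_cell_counts(units):
    """Sum tax-unit weights per cell; each unit is a dict with 'cell' and 'weight'."""
    counts = defaultdict(float)
    for unit in units:
        counts[unit["cell"]] += unit["weight"]
    return dict(counts)


def relative_errors(counts, targets):
    """Relative gap between modeled and target counts for each targeted cell."""
    return {
        cell: counts.get(cell, 0.0) / target - 1.0
        for cell, target in targets.items()
    }


# Hypothetical microdata: three tax units in two cells.
units = [
    {"cell": ("single", "1"), "weight": 100.0},
    {"cell": ("single", "1"), "weight": 150.0},
    {"cell": ("married", "0"), "weight": 400.0},
]
# Hypothetical external filer counts for the same cells.
soi_targets = {("single", "1"): 500.0, ("married", "0"): 200.0}

counts = weighted_cell_counts(units)
errors = relative_errors(counts, soi_targets)
```

With these inputs the single, 1 dep cell is 50% under target and the married, 0 dep cell is 100% over, which is the kind of gap a joint target would penalize.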
Why this helps
- Forces CPS → PUF demographic alignment that is currently implicit
- Catches Census TAX_ID assignment errors (e.g., CPS units that "should" be HOH but Census assigned as single)
- Removes a class of CTC/EITC mismatches that currently flow through the income-by-AGI targets
- Matches PWBM's diagnostic that CPS/SOI ratios vary from 0.47 (single, 1 dep) to 1.79 (married, 0 deps), i.e., cell-level discrepancies are large
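For a sense of scale, the two ratios quoted above can be turned into the weight adjustment each cell would need under simple post-stratification. This is a sketch only: post-stratification is used here as a stand-in to show magnitudes, it is not claimed to be how policyengine-us-data or PWBM reweights, and `poststrat_factor` is a hypothetical helper.

```python
# CPS/SOI count ratios quoted from the PWBM diagnostic.
cps_soi_ratio = {
    ("single", "1"): 0.47,
    ("married", "0"): 1.79,
}


def poststrat_factor(ratio: float) -> float:
    """Multiplier on CPS weights that would bring a cell's count to the SOI count."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


factors = {cell: poststrat_factor(r) for cell, r in cps_soi_ratio.items()}
```

Weights in the single, 1 dep cell would need to roughly double (about 2.13×), while weights in the married, 0 dep cell would need to shrink to a little over half (about 0.56×).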