Context
policyengine_us_data/datasets/cps/cps.py:433-438 sets would_file_taxes_voluntarily using a flat 5% rate across all CPS tax units not taking up EITC:
    voluntary_filing_rate = 0.05
    rng = seeded_rng("would_file_taxes_voluntarily")
    data["would_file_taxes_voluntarily"] = ~data["takes_up_eitc"] & (
        rng.random(n_tax_units) < voluntary_filing_rate
    )
The parameter parameters/take_up/voluntary_filing.yaml is also a single scalar.
Problem
Elective filing is not a uniform random process. PWBM (Ricco 2020, §2.3.1) notes:
> The second is a probabilistic module that models elective filing among CPS records not flagged as filers in the first step. This portion is aimed primarily at selecting low-income wage earners with children, a group that generally qualifies for refundable credits such as the Earned Income Tax Credit and the Additional Child Tax Credit. The elective filing probabilities are calibrated such that the number of CPS filers by demographic group more closely matches that of the PUF.
PE's flat 5% likely over-recruits voluntary filers from demographic groups that rarely file (high-income seniors on Social Security only, for example) and under-recruits from low-income families with children who systematically file to claim refundable credits.
Since PE separately handles EITC take-up, the remaining elective-filer population is mostly:
- ACTC claimants (filing even without sufficient EITC)
- state refundable credit claimants
- filers recovering over-withheld wages
- voluntary filers for non-refund reasons (habit, documentation, state requirement)
These populations are not uniformly distributed across demographics.
Proposal
Replace the flat rate with a probability conditioned on:
- presence of children under 17
- wage income bracket (especially sub-$30k)
- withholding-likely employment status (W-2 wages present vs. 1099/self-employed only)
- age bracket (young adults filing to reclaim withheld taxes vs. elderly non-filers)
Calibrate the probability function so that the CPS filer count matches SOI filer counts within the demographic cells used in the PWBM-style stratification (see companion issue on filer-cell calibration).
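One way to read the calibration step: within each demographic cell, set the elective-filing probability to the share of not-yet-filing tax units needed to close the gap to the cell's filer-count target. The sketch below assumes an integer cell index per tax unit and a per-cell weighted target; the function name, argument names, and toy numbers are all hypothetical and not part of the existing codebase.

```python
import numpy as np


def calibrate_cell_rates(cell, weight, already_files, target_filers):
    """Per-cell elective-filing probability that closes the gap to a target.

    cell          : int array, hypothetical demographic cell index per tax unit
    weight        : float array, tax-unit weights
    already_files : bool array, units flagged as filers by earlier steps
    target_filers : float array, weighted filer-count target per cell
    """
    n_cells = len(target_filers)
    # Weighted count of units already filing in each cell.
    filers = np.bincount(cell, weights=weight * already_files, minlength=n_cells)
    # Weighted count of units still available to become elective filers.
    pool = np.bincount(cell, weights=weight * ~already_files, minlength=n_cells)
    gap = np.asarray(target_filers, dtype=float) - filers
    # Cells with an empty pool get rate 0 rather than a division error.
    rate = np.divide(gap, pool, out=np.zeros(n_cells), where=pool > 0)
    # A probability cannot absorb a negative gap or a gap larger than the pool.
    return np.clip(rate, 0.0, 1.0)


# Toy inputs: two cells, unit weights, made-up targets.
cell = np.array([0, 0, 0, 0, 1, 1])
weight = np.ones(6)
already_files = np.array([True, False, False, False, True, True])
rates = calibrate_cell_rates(cell, weight, already_files, np.array([2.5, 2.0]))
```

Clipping means a cell whose target cannot be reached from the available pool is left for the downstream calibration pass to reconcile.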
Implementation notes
- The calibration pass already knows per-group filer counts as targets; this issue is about the imputation step, so that calibration isn't fighting an obviously-wrong initial allocation
- parameters/take_up/voluntary_filing.yaml would become a table keyed on the conditioning variables (or a small logistic model stored as parameters)
- The "refund-seeking" intent captured by EITC takeup would remain separate; this only handles the non-EITC voluntary filers
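As a rough illustration of the table-keyed variant, the imputation could look up a per-unit probability and keep the existing exclusion of EITC take-up units. This is a minimal sketch: the two conditioning flags, the rate values, and the function name are hypothetical placeholders, and `np.random.default_rng` stands in for the repository's `seeded_rng` helper.

```python
import numpy as np


def impute_voluntary_filing(rates, has_child_under_17, wages_under_30k,
                            takes_up_eitc, rng):
    """Draw would_file_taxes_voluntarily from a table of cell probabilities.

    rates is a 2x2 array indexed by [has_child_under_17, wages_under_30k];
    a real table would carry more conditioning variables.
    """
    # Look up each tax unit's probability from its cell.
    p = rates[has_child_under_17.astype(int), wages_under_30k.astype(int)]
    # Same structure as the current code, with a per-unit rate
    # in place of the flat scalar.
    return ~takes_up_eitc & (rng.random(len(p)) < p)


# Placeholder rates for illustration only, not estimates.
example_rates = np.array([[0.02, 0.05],
                          [0.05, 0.30]])
rng = np.random.default_rng(0)  # stand-in for seeded_rng(...)
flags = impute_voluntary_filing(
    example_rates,
    has_child_under_17=np.array([True, False, True]),
    wages_under_30k=np.array([True, True, False]),
    takes_up_eitc=np.array([False, False, True]),
    rng=rng,
)
```

A logistic model stored as parameters would replace only the lookup line; the draw and the EITC exclusion would stay the same.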
References