Summary
The extended CPS build retrains all QRF models from scratch every time, even when only calibration weights change. Since the QRF imputation depends on source CPS + PUF data (not weights), the fitted models could be cached and reused.
Current cost
- 85+ variable sequential QRF on ~20K PUF records: ~30-60 min
- Additional QRF calls for weeks_unemployed, retirement contributions, SS sub-components
- This runs on every `make data` or CI build
Proposed approach
- Serialize fitted QRF models (e.g. pickle/joblib) keyed by a hash of the training data
- On rebuild, check if source data hash matches cached model — if so, skip training and just predict
- microimpute could potentially support this natively (save/load fitted models)
- Could also cache the full `extended_cps_2024.h5` and only rebuild when CPS/PUF inputs change
Context
Related to the sequential QRF migration in #594 — now that all 85 variables are in a single fit() call, caching the one fitted model would skip the entire training phase.