Proximal L1 solver (method=prox, l1_lambda) + honest method label#184
Proximal L1 solver (method=prox, l1_lambda) + honest method label#184MaxGhenis wants to merge 2 commits into
Conversation
MaxGhenis
left a comment
There was a problem hiding this comment.
Review: PolicyEngine/populace PR #184
Recommendation: REQUEST_CHANGES
Findings
- [CRITICAL] The L1 path does not implement the objective it documents and records.
In packages/populace-calibrate/src/populace/calibrate/solve.py, _optimize_proximal documents the objective as capped weighted-MAPE plus l1_lambda * mean(w_i / w0_i), and result metadata records l1_penalty="mean_initial_weight_ratio_abs". But the update at lines 685-690 normalizes the smooth gradient step by RMS, making the effective step size learning_rate / rms, while the soft-threshold is fixed at learning_rate * l1_lambda. For the documented mean penalty, the threshold should use the same effective step size and the mean divisor, approximately (learning_rate / rms) * l1_lambda / n. The current update is therefore not the proximal operator for the recorded objective; it behaves like an unnormalized heuristic whose lambda scale changes with record count and gradient scale. That is especially risky because this PR is meant to supply a solver/provenance path for l0-paper comparisons.
Relevant lines: packages/populace-calibrate/src/populace/calibrate/solve.py:627, :632, :648, :652, :685, :690, :1289.
Suggested fix: choose one contract and make code, docs, metadata, and tests agree. If the contract is mean(w/w0), use the actual effective step in the prox shrink and divide by n. If the desired behavior is a budget heuristic, rename the metadata away from an objective penalty and stop claiming it minimizes target_loss + lambda * mean(r).
Test Gap
The new tests show that prox can produce exact zeros and that pruning is monotone on one fixture. They do not test the L1 scale/objective contract. Add a small deterministic one-step prox test or a duplicate-record invariance test for the declared mean penalty.
Other Notes
The PR is mergeable but the branch is three commits behind the current #182 base. It should be rebased after the solver issue is fixed. No GitHub checks were reported for this branch during review; local focused test_solve.py passed 37 tests.
The method arg accepted "apg"/"adam" but both ran torch Adam on log-weights, so manifests could record method="apg" for an Adam run. Adam also cannot perform the soft-thresholding L1 needs. This adds method="prox" for proximal gradient on weight ratios r=w/w0, rejects the misleading apg alias, and guards l1_lambda so it can only be used with the prox path. The L1 contract is explicit: l1_lambda multiplies mean(w/w0). The prox shrink uses the same effective smooth-step size and divides by n, so the soft-threshold matches the recorded mean-initial-weight-ratio objective rather than an unnormalized sum penalty. Tests cover sparse selection, lambda monotonicity, apg rejection, prox-only l1_lambda, and a one-step zero-gradient case that pins the mean-ratio prox scale. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
42c1a58 to
258e7eb
Compare
|
Fixed the review finding and pushed the update to #184. What changed:
Verification:
|
There was a problem hiding this comment.
Follow-up after the pushed fix: the L1 proximal update now matches the documented objective, l1_lambda * mean(w / w0), because the shrink threshold uses the same effective smooth step size and divides by record count. The new one-step regression test covers the scale contract that the original review flagged.
Validated locally with pytest packages/populace-calibrate/tests/test_solve.py -q and pytest packages/populace-calibrate/tests -q. I am leaving this as a comment review rather than approving my own pushed fix.
MaxGhenis
left a comment
There was a problem hiding this comment.
Follow-up after the second review pass: I pushed 99b4f36 to address the remaining solver-contract issues. method="prox" now rejects l2_lambda > 0 instead of recording an L2 objective it does not optimize, and the L1 scale tests now cover both the zero-gradient / n shrink and a nonzero-gradient case that pins use of the RMS-normalized effective step.
Validated with uv run --package populace-calibrate --group dev python -m pytest packages/populace-calibrate/tests/test_solve.py -q, uv run --package populace-calibrate --group dev python -m pytest packages/populace-calibrate/tests -q, uv run --no-sync ruff check packages/populace-calibrate/src/populace/calibrate/solve.py packages/populace-calibrate/tests/test_solve.py, and git diff --check. Existing torch sparse warnings only.
Proximal L1 solver + honest
methodlabelAdds the L1 selection path the l0-paper comparison needs, and fixes the
apgmislabel surfaced while building it.The
apgbugcalibrate(method=...)accepted"apg"/"adam"but both ran torch Adam on log-weights —methodwas validated, written into the run manifest, then never passed to_optimize. So every default run recordedmethod: "apg"for an Adam run (false provenance), and the docstring claimed "Adam is the accelerated proximal gradient," which is wrong (Adam has no prox step). Not a results bug, but a real mislabel — and it actively blocks L1, which needs a proximal operator.Fix:
method ∈ {"adam", "prox"}, default"adam";"apg"is rejected. No back-compat alias (no caller passed it).The L1 path (
method="prox",l1_lambda)Proximal gradient (ISTA-style) minimizing the same capped weighted-MAPE loss plus
l1_lambda · mean(w / w0):r = w / w0(O(1) scale) so the step is well-conditioned.r ← max(r - eta · l1_lambda / n, 0), sol1_lambdais the coefficient on the recorded mean initial-weight-ratio objective, not an unnormalized sum penalty.Validation
Focused checks passed:
ruff format/ruff checkon the touched solver/test files.pytest packages/populace-calibrate/tests/test_solve.py -q→ 38 passed.pytest packages/populace-calibrate/tests -q→ 99 passed, with existing torch sparse warnings.New tests cover sparse selection, lambda monotonicity,
apgrejection,l1_lambdarequiringmethod="prox", and a one-step zero-gradient case that pins thel1_lambda · mean(w / w0)prox scaling.On the L2 concentration fixture after the mean-penalty scaling fix:
l1=0 → 160survivors,0.5 → 106,1.0 → 79,2.0 → 57; larger penalties correctly error if all weights are zeroed.Based on
codex/target-alignment-audit-20260624atdcdde76, so it stacks on the current formula-owned aggregate export fix.