Merged
3 changes: 3 additions & 0 deletions app/.gitignore
@@ -33,3 +33,6 @@ src/data-summary.json
public/data/

.cache/

# Local benchmark exports stay untracked; builds resolve src/data.artifact.json
src/data.json
1 change: 0 additions & 1 deletion app/src/data.json

This file was deleted.

30 changes: 16 additions & 14 deletions docs/artifacts.md
@@ -56,20 +56,22 @@ without uploading.
2. `app/src/data.artifact.json` — download the asset, verify its sha256,
cache under `app/.cache/`, and refuse hash mismatches.
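
A minimal sketch of the resolve step in item 2, assuming the pointer carries the `sha256`, `url`, and `asset` fields that the manifest's `published_dashboard_artifact` entry shows. The `resolve_artifact` name, the injected `fetch` callable, and the cache file naming are hypothetical and are not taken from the project's implementation.

```python
import hashlib
from pathlib import Path
from typing import Callable


def resolve_artifact(
    pointer: dict,
    cache_dir: Path,
    fetch: Callable[[str], bytes],
) -> Path:
    """Return a verified local copy of the artifact named by `pointer`.

    Hypothetical helper: `fetch` stands in for whatever downloads the
    release asset (for example a thin wrapper around urllib.request).
    """
    expected = pointer["sha256"]
    # Key the cache by hash so a new pointer never reuses a stale file.
    cached = cache_dir / f"{expected}-{pointer['asset']}"

    if cached.exists():
        data = cached.read_bytes()
    else:
        data = fetch(pointer["url"])

    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        # Refuse mismatches and drop any corrupt cache entry.
        cached.unlink(missing_ok=True)
        raise ValueError(f"sha256 mismatch: expected {expected}, got {actual}")

    if not cached.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)
    return cached
```

Injecting `fetch` keeps the verification logic testable offline; a real build would likely pass a function that performs the HTTP download.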

## Cutover plan

The committed `app/src/data.json` and the pointer currently coexist; the
committed file wins so in-flight refresh branches are unaffected. Once the
current refresh cycle lands:

1. Refresh flow becomes: export run → `policybench publish-dashboard --tag
dashboard-data-<date>` → commit the pointer (a 9-line diff instead of a
57MB blob).
2. Delete `app/src/data.json` from the repo; builds resolve via the pointer.
3. `paper/snapshot/<date>/` copies stay committed and frozen — they are the
manuscript's evidence base, not the live site's data path. A future
snapshot may pin release artifacts by sha256 in its manifest instead of
committing copies.
## Refresh flow (post-cutover)

`app/src/data.json` is no longer committed (and is gitignored); builds resolve
the pointer. A data refresh is:

1. Export the run locally (`policybench export-full-run` writes
`app/src/data.json` in your working tree).
2. `policybench publish-dashboard --tag dashboard-data-<date>` — validates,
uploads the release asset, rewrites the pointer.
3. Commit the pointer plus a snapshot-manifest update pinning the new
artifact's sha256 (tests enforce pointer == manifest pin == the combined
committed run exports).
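
As a rough illustration of what "rewrites the pointer" in step 2 could produce, the sketch below derives pointer fields from the exported bytes. The `build_pointer` name is hypothetical, the URL template is inferred from the release URL recorded in the snapshot manifest, and including `bytes` in the pointer is an assumption (the tests in this change only compare `sha256`, `tag`, `asset`, and `url`).

```python
import hashlib

# Assumption: release assets follow the URL shape seen in the snapshot manifest.
RELEASE_URL = (
    "https://github.com/PolicyEngine/policybench/releases/download/{tag}/{asset}"
)


def build_pointer(payload: bytes, tag: str, asset: str = "dashboard-data.json") -> dict:
    """Describe a published payload by tag, asset name, URL, hash, and size."""
    return {
        "tag": tag,
        "asset": asset,
        "url": RELEASE_URL.format(tag=tag, asset=asset),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "bytes": len(payload),  # assumed field, mirrors the manifest entry
    }
```

Because every field is derived from the tag and the payload bytes, two refreshes of the same export yield an identical pointer, which is what lets the tests compare it field by field against the manifest pin.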

`paper/snapshot/<date>/` copies stay committed and frozen — they are the
manuscript's evidence base, and the integrity tests rebuild the published
payload from them.

History rewriting to reclaim the existing ~250MB of data.json blobs is
intentionally out of scope: it would invalidate every open fork and PR, and
6 changes: 4 additions & 2 deletions docs/paper.md
@@ -12,8 +12,10 @@ It builds against:
scenarios, reference outputs, impact summaries, frozen run-level dashboard
exports under `runs/`, the rendered PDF/web manuscript hashes, and the
`manifest.json` provenance index.
- `app/src/data.json` — the live site export. In this snapshot it is required
to equal the committed source-run dashboard exports under `runs/`.
- `app/src/data.artifact.json` — the committed pointer to the published
live-site payload (a GitHub release asset). The manifest pins its sha256,
which must equal the combined source-run dashboard exports under `runs/`
(machine-checked by tests/test_snapshot_artifacts.py).

## Rendered outputs

3 changes: 2 additions & 1 deletion docs/results.md
@@ -18,7 +18,8 @@ policybench analyze --output-dir results/local/analysis
These commands are for local scratch analysis. Published dashboard exports are
assembled from dated country/model batch directories with
`policybench export-full-run`; do not treat a single local `analyze` run as the
source for `app/src/data.json`.
source for the published dashboard payload (`policybench publish-dashboard`
uploads it and commits the pointer `app/src/data.artifact.json`).

`reference-outputs` writes PolicyEngine reference outputs rather than
administrative truth.
9 changes: 8 additions & 1 deletion paper/snapshot/20260501/manifest.json
@@ -4,7 +4,7 @@
"us": "tax year 2026",
"uk": "fiscal year 2026-27"
},
"live_dashboard_note": "For this snapshot, app/src/data.json is required to equal the committed source run data.json files listed under source_run_artifacts. Future live-site refreshes should update the snapshot manifest or use a new snapshot directory.",
"live_dashboard_note": "The live dashboard payload is a published release asset; the committed pointer app/src/data.artifact.json must reference the artifact pinned under published_dashboard_artifact, which equals the combined export of the source run data.json files listed under source_run_artifacts. Future refreshes publish a new artifact (policybench publish-dashboard) and either update this manifest or use a new snapshot directory.",
"source_run_labels": {
"us": "us_full_run_20260513_policyengine_4_4_4_nested_outputs",
"uk": "uk_full_run_20260513_policyengine_4_4_4_nested_outputs"
@@ -236,5 +236,12 @@
"path": "policybench/population_weights.json",
"sha256": "31f603a8989f3784938de71854d5c3bf10a62baa33d6cdb85a3acbc726d54360",
"note": "Output weights for the household and aggregate scoring views. US weights use the full Enhanced CPS; UK weights use the full enhanced FRS. This weighting source is separate from the public UK calibrated transfer dataset used for the benchmark scenarios. These weights are fixed for scoring the 100-household snapshot."
},
"published_dashboard_artifact": {
"tag": "dashboard-data-20260520",
"asset": "dashboard-data.json",
"url": "https://github.com/PolicyEngine/policybench/releases/download/dashboard-data-20260520/dashboard-data.json",
"sha256": "4686fe4c74c41ae107f163b18d40ebed6c675c4432be1bf9fe67c61a64194d71",
"bytes": 57432435
}
}
44 changes: 40 additions & 4 deletions tests/test_snapshot_artifacts.py
@@ -108,12 +108,20 @@ def _snapshot_country_payloads(manifest: dict) -> dict[str, dict]:
return payloads


def test_committed_app_payload_matches_frozen_source_run_export():
def test_published_dashboard_artifact_matches_frozen_source_run_export():
"""The published payload must equal the combined frozen run exports.

The dashboard blob is no longer committed, so this recombines the
committed per-country run exports exactly as export_full_run serializes
them and checks the bytes hash to the manifest's published-artifact pin —
the same equality the old committed-blob comparison enforced, offline.
"""
manifest = json.loads((SNAPSHOT_DIR / "manifest.json").read_text())
expected_payload = {"countries": _snapshot_country_payloads(manifest)}
app_payload = json.loads((ROOT / "app" / "src" / "data.json").read_text())
combined_bytes = json.dumps(expected_payload).encode("utf-8")
digest = hashlib.sha256(combined_bytes).hexdigest()

assert app_payload == expected_payload
assert digest == manifest["published_dashboard_artifact"]["sha256"]


def _aggregate_scenario_metric(country_payload: dict, metric: str) -> dict[str, float]:
@@ -162,7 +170,8 @@ def _aggregate_scenario_metric(country_payload: dict, metric: str) -> dict[str,


def test_scenario_row_scores_reproduce_committed_model_stats():
app_payload = json.loads((ROOT / "app" / "src" / "data.json").read_text())
manifest = json.loads((SNAPSHOT_DIR / "manifest.json").read_text())
app_payload = {"countries": _snapshot_country_payloads(manifest)}

metric_pairs = {
"score": "score",
@@ -286,3 +295,30 @@ def test_snapshot_deviation_audit_annotations_are_complete_and_final():
audited["failure_source"].value_counts().to_dict()
== expected_sources[country]
)


def test_dashboard_pointer_matches_published_snapshot_artifact():
"""The committed artifact pointer must reference the snapshot's pinned
dashboard payload — the machine-checked version of live_dashboard_note."""
manifest = json.loads((SNAPSHOT_DIR / "manifest.json").read_text())
pinned = manifest["published_dashboard_artifact"]
pointer = json.loads((ROOT / "app" / "src" / "data.artifact.json").read_text())
assert pointer["sha256"] == pinned["sha256"]
assert pointer["tag"] == pinned["tag"]
assert pointer["asset"] == pinned["asset"]
assert pointer["url"] == pinned["url"]


def test_dashboard_blob_is_not_committed():
"""data.json is a published artifact, not source; only the pointer is
committed (local exports are gitignored)."""
import subprocess

tracked = subprocess.run(
["git", "ls-files", "app/src/data.json"],
capture_output=True,
text=True,
cwd=ROOT,
check=True,
).stdout.strip()
assert tracked == ""
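
The hash comparison in `test_published_dashboard_artifact_matches_frozen_source_run_export` only holds if the test serializes the payload byte-for-byte the way the exporter did. The standalone sketch below uses a made-up payload and a hypothetical `payload_digest` helper to show how sensitive the digest is to `json.dumps` options and key order; it does not reflect the exporter's actual settings.

```python
import hashlib
import json


def payload_digest(payload: dict, **dumps_kwargs) -> str:
    """sha256 of the payload as serialized by json.dumps with the given options."""
    text = json.dumps(payload, **dumps_kwargs)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Illustrative data only, not a real dashboard payload.
payload = {"countries": {"us": {"score": 1.0}, "uk": {"score": 0.5}}}

default = payload_digest(payload)
compact = payload_digest(payload, separators=(",", ":"))
reordered = payload_digest(
    {"countries": {"uk": {"score": 0.5}, "us": {"score": 1.0}}}
)

print(default == payload_digest(payload))  # True: serialization is deterministic
print(default == compact)                  # False: separators differ
print(default == reordered)                # False: key insertion order differs
```

If the exporter ever changes its separators, indentation, or key ordering, the recombined digest will stop matching the pin even though the data is unchanged, so the test and the exporter need to share one serialization path.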