fix(validate): primary-dim filter + GCP usageUnit normalization (#57, #59, #61) #68

Merged
sofq merged 1 commit into `main` from `fix/validate-gcp-primary-dim` on May 5, 2026

@sofq sofq commented May 5, 2026

Summary

Two validator bugs causing chronic false-positive drift on `gcp-gcs`, `gcp-run`, `gcp-functions`. Both verified against live Cloud Billing API.

Bug 1 — fan-in dimensions can't be re-fetched

`gcp-gcs` ingest stores 3 dims under one SKU id (storage + read-ops + write-ops); ops prices are fanned in from a different global SKU. `gcp-run` / `gcp-functions` similarly carry memory-gb-second + requests under the cpu-second SKU id. When the sampler picks a non-primary dim, the validator looks up the upstream SKU and reads its `tieredRates[0].unitPrice` — which only matches the primary dim. Different number → reported as drift.

Fix: `PRIMARY_DIMENSIONS` in `pipeline/validate/driver.py`. Non-primary samples are filtered out before revalidation. (Fan-in dims need a follow-up: validate against their actual source SKUs.)
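
A minimal sketch of what the dimension filter might look like. Only the name `PRIMARY_DIMENSIONS` comes from this PR. The mapping shape (shard name to a single dimension name), the dimension strings, the sample row layout, and the helper `filter_to_primary` are all assumptions for illustration, not the actual contents of `driver.py`.

```python
from typing import Iterable

# Assumed shape: shard name -> the one dimension whose price comes directly
# from the SKU id stored on the row. Dimension names are illustrative.
PRIMARY_DIMENSIONS: dict[str, str] = {
    "gcp-gcs": "storage",
    "gcp-run": "cpu-second",
    "gcp-functions": "cpu-second",
}


def filter_to_primary(shard: str, samples: Iterable[dict]) -> list[dict]:
    """Keep only samples whose dimension can be re-fetched from the row's SKU id.

    Shards without an entry are passed through unchanged (hypothetical choice).
    """
    primary = PRIMARY_DIMENSIONS.get(shard)
    if primary is None:
        return list(samples)
    return [s for s in samples if s["dimension"] == primary]


# Hypothetical sampled rows: three dimensions stored under one SKU id.
samples = [
    {"sku_id": "sku-a", "dimension": "storage", "price": 0.02},
    {"sku_id": "sku-a", "dimension": "read-ops", "price": 0.0004},
    {"sku_id": "sku-a", "dimension": "write-ops", "price": 0.005},
]
kept = filter_to_primary("gcp-gcs", samples)
```

In this sketch the fan-in rows (`read-ops`, `write-ops`) are dropped before revalidation, so the validator never compares them against the storage SKU's `tieredRates[0].unitPrice`.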

Bug 2 — validator didn't mirror ingest unit normalization

Ingest divides each unit price by the divisor returned by `parse_usage_unit`. For `GiBy.d` that divisor is 1/30.4375, which amounts to a ×30.4375 day→month conversion. The validator instead read the raw `tieredRates[0].unitPrice`, so two early-delete-fee SKUs in `gcp-gcs` were drifting by exactly 30.4375×.

Fix: `_USAGE_UNIT_DIVISORS` in `pipeline/validate/gcp.py` mirroring the table in `pipeline/ingest/gcp_common.py`.
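
The normalization can be sketched as follows. The table name `_USAGE_UNIT_DIVISORS` and the `GiBy.d` divisor of 1/30.4375 come from this PR. The helper `normalize_unit_price`, the fall-through divisor of 1.0 for units not in the table, and the sample price are assumptions, and the real table in `pipeline/ingest/gcp_common.py` may hold more entries.

```python
import math

DAYS_PER_MONTH = 30.4375  # average month length: 365.25 / 12

# Assumed single-entry table; the real one mirrors ingest and may be larger.
_USAGE_UNIT_DIVISORS = {
    "GiBy.d": 1 / DAYS_PER_MONTH,  # per-day price -> per-month price
}


def normalize_unit_price(raw_price: float, usage_unit: str) -> float:
    """Divide the raw upstream price by the unit's divisor, as ingest does.

    Units without an entry are left unchanged (hypothetical default).
    """
    return raw_price / _USAGE_UNIT_DIVISORS.get(usage_unit, 1.0)


# Hypothetical per-day price for an early-delete fee.
raw = 0.001
normalized = normalize_unit_price(raw, "GiBy.d")
# Ratio between what ingest stored and what the validator used to compare against.
drift_ratio = normalized / raw
```

Here `drift_ratio` works out to 30.4375, which matches the factor by which the two early-delete-fee SKUs were reported as drifting.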

End-to-end verification

Ran the validator locally against the published 2026.05.04 catalog using ADC creds:

| Shard | Before | After |
| --- | --- | --- |
| gcp-gcs | 17 drift | 0 drift, 0 missing |
| gcp-run | 7 drift | 0 drift, 0 missing |
| gcp-functions | 9 drift | 0 drift, 0 missing |

Test plan

  • `uv run pytest pipeline/tests/` — 599 passed.
  • New `test_gcp_validator_applies_usage_unit_divisor` covers the GiBy.d normalization.
  • New `test_driver_filters_to_primary_dimensions` covers the dimension filter.
  • Local end-to-end run against live API for all three shards.

Closes #57, #59, #61.

Two real validator bugs producing chronic false-positive drift on
gcp-gcs, gcp-run, gcp-functions.

1. Fan-in dimensions were sampled but couldn't be re-fetched.
   gcp-gcs storage rows carry fanned-in global ops prices (read-ops,
   write-ops) under the storage SKU id; gcp-run / gcp-functions
   carry memory-gb-second + requests under the cpu-second SKU id.
   The validator looks up the upstream SKU and reads its
   tieredRates[0].unitPrice — which only matches the primary
   dimension. Add PRIMARY_DIMENSIONS in driver.py so non-primary
   dimensions are filtered out before revalidation. Fan-in dims will
   need separate validation against their actual source SKUs (follow-up).

2. Validator did not mirror ingest's usageUnit normalization.
   For SKUs with usageUnit='GiBy.d' (e.g. early-delete fees), ingest
   converts day → month (×30.4375) but validator compared against the
   raw per-day price, producing 30.4375× drift. Add
   _USAGE_UNIT_DIVISORS in validate/gcp.py mirroring the table in
   ingest/gcp_common.py.

Verified end-to-end against the live GCP Cloud Billing API for
gcp-gcs / gcp-run / gcp-functions: 0 drift, 0 missing.

Closes #57, #59, #61.
@sofq sofq merged commit 89ee462 into main May 5, 2026
20 checks passed
@sofq sofq deleted the fix/validate-gcp-primary-dim branch May 5, 2026 09:01