fix(validate): primary-dim filter + GCP usageUnit normalization (#57, #59, #61) #68
Merged
Two real validator bugs producing chronic false-positive drift on gcp-gcs, gcp-run, gcp-functions.

1. Fan-in dimensions were sampled but couldn't be re-fetched. gcp-gcs storage rows carry fanned-in global ops prices (read-ops, write-ops) under the storage SKU id; gcp-run / gcp-functions carry memory-gb-second + requests under the cpu-second SKU id. The validator looks up the upstream SKU and reads its tieredRates[0].unitPrice, which only matches the primary dimension. Add PRIMARY_DIMENSIONS in driver.py so non-primary dimensions are filtered out before revalidation. Fan-in dims will need separate validation against their actual source SKUs (follow-up).

2. Validator did not mirror ingest's usageUnit normalization. For SKUs with usageUnit='GiBy.d' (e.g. early-delete fees), ingest converts day → month (×30.4375) but validator compared against the raw per-day price, producing 30.4375× drift. Add _USAGE_UNIT_DIVISORS in validate/gcp.py mirroring the table in ingest/gcp_common.py.

Verified end-to-end against the live GCP Cloud Billing API for gcp-gcs / gcp-run / gcp-functions: 0 drift, 0 missing.

Closes #57, #59, #61.
Summary
Two validator bugs causing chronic false-positive drift on `gcp-gcs`, `gcp-run`, `gcp-functions`. Both verified against live Cloud Billing API.
Bug 1 — fan-in dimensions can't be re-fetched
`gcp-gcs` ingest stores 3 dims under one SKU id (storage + read-ops + write-ops); ops prices are fanned in from a different global SKU. `gcp-run` / `gcp-functions` similarly carry memory-gb-second + requests under the cpu-second SKU id. When the sampler picks a non-primary dim, the validator looks up the upstream SKU and reads its `tieredRates[0].unitPrice` — which only matches the primary dim. Different number → reported as drift.
Fix: `PRIMARY_DIMENSIONS` in `pipeline/validate/driver.py`. Non-primary samples are filtered out before revalidation. (Fan-in dims need a follow-up: validate against their actual source SKUs.)
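A minimal sketch of what the filter might look like. Only the name `PRIMARY_DIMENSIONS` comes from this PR; the mapping shape, the dimension strings `storage` and `cpu-second`, the sample dict layout, and the helper `filter_primary` are assumptions for illustration, not the code in `driver.py`.

```python
# Assumed shape: service name -> the single dimension that is priced by the
# SKU id stored on the row. Any other dimension on the row is fanned in
# from a different SKU and cannot be re-fetched under this id.
PRIMARY_DIMENSIONS = {
    "gcp-gcs": "storage",
    "gcp-run": "cpu-second",
    "gcp-functions": "cpu-second",
}


def filter_primary(service, samples):
    """Keep only samples whose dimension is the service's primary one.

    `samples` is assumed to be a list of dicts with a "dimension" key.
    Services without an entry in PRIMARY_DIMENSIONS pass through unchanged.
    """
    primary = PRIMARY_DIMENSIONS.get(service)
    if primary is None:
        return list(samples)
    return [s for s in samples if s["dimension"] == primary]
```

With this shape, a `gcp-gcs` sample for `read-ops` is dropped before revalidation and is never compared against the storage SKU's price.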
Bug 2 — validator didn't mirror ingest unit normalization
Ingest divides each price by the `parse_usage_unit` divisor. For `GiBy.d` that divisor is 1/30.4375, so the per-day price is multiplied by 30.4375 to give a per-month price. The validator read the raw `tieredRates[0].unitPrice` instead, so two early-delete-fee SKUs in gcp-gcs were drifting by exactly 30.4375×.
Fix: `_USAGE_UNIT_DIVISORS` in `pipeline/validate/gcp.py` mirroring the table in `pipeline/ingest/gcp_common.py`.
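The normalization can be sketched as below. The name `_USAGE_UNIT_DIVISORS` and the `GiBy.d` factor of 30.4375 come from this PR; the `GiBy.mo` entry, the default divisor of 1.0 for unknown units, and the helper `normalized_unit_price` are assumptions, and the real table in `pipeline/ingest/gcp_common.py` may cover more units.

```python
DAYS_PER_MONTH = 30.4375  # average month length used for day -> month conversion

# Assumed subset of the divisor table: upstream usageUnit -> divisor applied
# to the raw unit price so that every price is expressed per month.
_USAGE_UNIT_DIVISORS = {
    "GiBy.mo": 1.0,                    # already per month
    "GiBy.d": 1.0 / DAYS_PER_MONTH,    # per day: dividing multiplies by 30.4375
}


def normalized_unit_price(raw_unit_price, usage_unit):
    """Apply the same divisor ingest applies, so both sides compare like with like.

    Units missing from the table are left unchanged (an assumption of this sketch).
    """
    divisor = _USAGE_UNIT_DIVISORS.get(usage_unit, 1.0)
    return raw_unit_price / divisor
```

For example, a raw per-day price of 0.001 for a `GiBy.d` SKU normalizes to about 0.0304 per month, which is the value ingest would have stored.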
End-to-end verification
Ran the validator locally against the published 2026.05.04 catalog using ADC creds:
Test plan
Closes #57, #59, #61.