docs: add Index Authority Receipts for IFC evidence #270
Maurice Witten (blocksifrdev) wants to merge 3 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a880e2b115
```python
if float(economics.get("storage_reduction_x", 0)) < float(policy.get("min_storage_reduction_x", 0)):
    return "REQUIRE_DENSE_FALLBACK"

if float(economics.get("single_query_speedup_x", 0)) < float(policy.get("min_single_query_speedup_x", 0)):
```
Recompute ratios before applying policy thresholds
When verifying a receipt whose raw bytes/latencies don’t match the derived storage_reduction_x/single_query_speedup_x, this branch authorizes based only on the supplied derived numbers. For example, a receipt can set baseline.bytes_per_vector == candidate_bytes_per_vector and equal latencies, but inflate both *_x fields above the policy thresholds and still get ALLOW_INDEX_FIRST. Since those ratios are included alongside their source values in the receipt, the verifier should recompute or at least cross-check them before using them for authorization.
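A minimal sketch of that cross-check, assuming the verifier already has the raw byte and latency values in hand. `checked_ratio` is a hypothetical helper and the 0.02 tolerance is only a default. The key design choice is that it returns the recomputed ratio, so later threshold checks never authorize on the declared number.

```python
import math


def checked_ratio(numerator: float, denominator: float, declared: float,
                  name: str, tol: float = 0.02) -> float:
    """Hypothetical helper: recompute a ratio from its raw source values and
    reject the receipt if the declared ratio drifts beyond `tol`."""
    if not (math.isfinite(numerator) and math.isfinite(denominator)) or denominator <= 0:
        raise ValueError(f"{name}: raw values must be finite with a positive denominator")
    expected = numerator / denominator
    if not math.isfinite(declared) or abs(declared - expected) > tol:
        raise ValueError(f"{name} mismatch: declared {declared}, recomputed {expected:.4f}")
    # Return the recomputed value so threshold checks never rely on `declared`.
    return expected


# The case from the comment: equal bytes/vector, but an inflated declared ratio.
try:
    checked_ratio(384.0, 384.0, declared=8.0, name="storage_reduction_x")
except ValueError as err:
    print(err)  # storage_reduction_x mismatch: declared 8.0, recomputed 1.0000
```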
```python
args = parser.parse_args()

data = load_json(args.receipt)
errors = shape_errors(data)
```
Reject schema-invalid receipts before computing decisions
The verifier’s only validation here is the custom shape_errors() subset, so receipts that violate the checked-in JSON schema either crash later or are still authorized. For example, omitting the schema-required decision.policy raises a traceback in compute_decision, while strings for numeric fields or extra properties are accepted even though the schema rejects them. Since this command is the documented verifier for machine-readable receipts, it should run full schema validation or mirror the required/type/additionalProperties checks before computing authorization.
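One dependency-free option is to mirror the three schema keywords named here in plain Python. This is a rough sketch: `SPEC` is an illustrative fragment covering two sections, not the checked-in schema, and `schema_errors` is a hypothetical name.

```python
# Illustrative subset only; a real SPEC would be derived from the schema file.
SPEC = {
    "economics": {
        "storage_reduction_x": "number",
        "single_query_speedup_x": "number",
    },
    "decision": {
        "recommended": "string",
        "policy": "object",
    },
}

TYPE_CHECKS = {
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "object": lambda v: isinstance(v, dict),
}


def schema_errors(receipt: dict) -> list:
    """Mirror "required", "type", and "additionalProperties": false."""
    errors = []
    for section, fields in SPEC.items():
        obj = receipt.get(section)
        if not isinstance(obj, dict):
            errors.append(f"{section}: required object")
            continue
        for key, kind in fields.items():
            if key not in obj:
                errors.append(f"{section}.{key}: required")
            elif not TYPE_CHECKS[kind](obj[key]):
                errors.append(f"{section}.{key}: expected {kind}")
        for extra in sorted(set(obj) - set(fields)):
            errors.append(f"{section}.{extra}: additional property not allowed")
    return errors
```

With this in place, a receipt missing `decision.policy` or carrying `"5"` for a speedup is reported as a list of errors before any decision logic runs, instead of raising a traceback or being accepted.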
```python
if float(economics.get("single_query_speedup_x", 0)) < float(policy.get("min_single_query_speedup_x", 0)):
    return "REQUIRE_DENSE_FALLBACK"

return "ALLOW_INDEX_FIRST"
```
Handle the HNSW-comparison decision state
The schema and docs advertise REQUIRE_HNSW_COMPARISON as a valid decision, but compute_decision has no path that can return it; after the existing checks pass, every receipt falls through to ALLOW_INDEX_FIRST. A receipt for the documented regime where graph/ANN comparison is required will therefore always fail with a decision mismatch, so the verifier needs a policy/scope predicate for that state or the state should not be accepted as valid.
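A sketch of one way the state could become reachable, assuming the verifier has already reduced its checks to a few booleans. The argument names and the order of the checks are assumptions; `require_hnsw_comparison_for_parallel_claims` is used here as the policy flag that gates the new branch.

```python
def compute_decision(*, economics_ok: bool, quality_ok: bool, scoped: bool,
                     claims_parallel: bool, has_hnsw_comparison: bool,
                     policy: dict) -> str:
    """Hypothetical decision function in which all four advertised states
    are reachable."""
    if not scoped:
        return "DENY_UNSCOPED_CLAIM"
    if not (economics_ok and quality_ok):
        return "REQUIRE_DENSE_FALLBACK"
    # Only a parallel/throughput claim without comparison evidence lands here.
    if (policy.get("require_hnsw_comparison_for_parallel_claims", True)
            and claims_parallel and not has_hnsw_comparison):
        return "REQUIRE_HNSW_COMPARISON"
    return "ALLOW_INDEX_FIRST"
```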
```python
ifc = data["ifc"]
evidence = data["evidence"]
economics = data["economics"]
policy = data["decision"]["policy"]
```
Require verifier-owned acceptance policy
Because the verifier reads the policy thresholds from the receipt being evaluated, a schema-valid receipt can authorize itself by lowering min_storage_reduction_x/min_single_query_speedup_x to zero or disabling the quality requirement, even when the reported speedup and storage reduction are below any meaningful bar. For an authorization verifier, these acceptance rules need to come from the verifier configuration or fixed minimums rather than the untrusted evidence packet itself.
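A minimal sketch of verifier-owned policy, under the assumption that operators may adjust thresholds through their own configuration while the receipt supplies evidence only. `DEFAULT_POLICY`, `FLOORS`, and `load_policy` are hypothetical, and every number is a placeholder rather than an ordvec acceptance criterion.

```python
from typing import Optional

# Hypothetical verifier-owned defaults (placeholder values).
DEFAULT_POLICY = {
    "min_storage_reduction_x": 4.0,
    "min_single_query_speedup_x": 1.25,
    "max_quality_delta_loss": 0.01,
}
# Hard floors that no operator override may go below (also placeholders).
FLOORS = {"min_storage_reduction_x": 1.0, "min_single_query_speedup_x": 1.0}


def load_policy(operator_overrides: Optional[dict] = None) -> dict:
    """Build the acceptance policy from verifier defaults plus optional
    operator overrides (for example, parsed from a --policy file).
    The receipt's own decision.policy is never consulted."""
    policy = dict(DEFAULT_POLICY)
    if operator_overrides:
        unknown = set(operator_overrides) - set(DEFAULT_POLICY)
        if unknown:
            raise ValueError(f"unknown policy keys: {sorted(unknown)}")
        policy.update(operator_overrides)
    for key, floor in FLOORS.items():
        if policy[key] < floor:
            raise ValueError(f"{key}={policy[key]} is below the verifier floor {floor}")
    return policy
```

A receipt can still record the thresholds its author had in mind, but in this design they are informational only; the decision is always computed against `load_policy(...)`.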
Signed-off-by: blocksifrdev <maurice@blocksifr.com>
a880e2b to d302dd7
Thank you for your contribution! We will review the PR as soon as we have the bandwidth. 🙏🏻
🟡 Acceptable — Core concept is sound, but there are gaps to address before merging.
The Index Authority Receipt pattern is a good fit for ordvec's evidence story. However, the implementation has security and completeness issues that need resolution.
Key issues:
- Missing schema file (documentation says it exists)
- Self-signed policy thresholds (verifier should own acceptance policy)
- Unreachable REQUIRE_HNSW_COMPARISON decision
- No test coverage for the verifier
- No CI validation that receipts stay valid
See inline comments for details.
Workflow run: https://github.com/Project-Navi/ordvec/actions/runs/27879938562
```python
except Exception as e:
    die(f"cannot read receipt: {e}")

for k in ["schema","subject","baseline","ifc","evidence","economics","decision","scope","limitations"]:
```
🟠 Important: Missing schema validation. The PR description lists schemas/caif/ordvec-index-authority.v0.1.schema.json, but this file does not exist in the PR. The verifier only checks field presence and schema-string equality, not JSON Schema validation, so a malformed receipt will either crash later or pass silently. Either remove the schema reference from the PR description, or add the schema file and validate receipts against it before processing. Note that jsonschema is a third-party package, not part of the standard library, so using it adds a dependency; mirroring the required/type/additionalProperties checks in plain Python avoids that.
```python
if abs(econ["storage_reduction_x"] - expected_storage) > 0.02:
    die("storage_reduction_x mismatch")

expected_speedup = econ["single_query_latency_ms"]["baseline"] / econ["single_query_latency_ms"]["candidate"]
```
🟠 Important: Self-signed policy thresholds — The verifier reads acceptance policy (min_storage_reduction_x, min_single_query_speedup_x) from the receipt being evaluated. A receipt can authorize itself by setting min_storage_reduction_x: 0.01. The policy thresholds should be verifier-owned, not receipt-owned. Options:
- Move thresholds to a separate verifier config file
- Require thresholds to be above documented minimums
- Document explicitly that this is a self-certifying receipt system, not a trusted verifier
```python
if abs(econ["single_query_speedup_x"] - expected_speedup) > 0.02:
    die("single_query_speedup_x mismatch")

decision = "ALLOW_INDEX_FIRST"
```
🟠 Important: REQUIRE_HNSW_COMPARISON is unreachable — The compute_decision logic can only return ALLOW_INDEX_FIRST, REQUIRE_DENSE_FALLBACK, or DENY_UNSCOPED_CLAIM. REQUIRE_HNSW_COMPARISON is never assigned. Either:
- Add a code path that returns it (e.g., when highly parallel threaded serving is claimed but no HNSW comparison exists)
- Remove it from the advertised schema/decision set
- Document why it exists as a valid decision but is not reachable in this implementation
```python
print(f"single_query_speedup: {econ['single_query_speedup_x']}x")
print(f"receipt_hash: {sha(r)}")

if decision != r["decision"]["recommended"]:
```
🟡 Suggestion: Add test coverage — The verifier has no tests. At minimum, add tests for:
- Valid receipt passes
- Missing fields are rejected
- Computed metrics that don't match declared values are rejected
- The decision mismatch exit code (3)
Example location: tests/verify_index_authority_test.py
```markdown
- `examples/caif/trec-covid-sign-rq2.index-authority.json`
- `tools/verify_index_authority.py`

## What this does not do
```
🟡 Documentation mismatch — The "What this includes" section lists schemas/caif/ordvec-index-authority.v0.1.schema.json but this file is not present in the PR. Either add the schema file or remove it from this list.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e501704594
```python
if not isinstance(limitations, list):
    die("limitations must be a list")

decision = "ALLOW_INDEX_FIRST"
```
Reject receipts where IFC is disabled
A receipt can set ifc.enabled to false or leave ifc.compute_path empty and still reach the default ALLOW_INDEX_FIRST decision because the verifier never reads the ifc object after requiring it exists. That allows a receipt which explicitly says no index-first path is enabled to verify successfully as long as the benchmark metrics pass, which undermines the authorization this tool is meant to provide.
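A sketch of the missing check, assuming `compute_path` is a string identifier; only the `enabled` and `compute_path` field names come from the comment above, and `ifc_errors` is a hypothetical helper.

```python
def ifc_errors(ifc: object) -> list:
    """Hypothetical check: index-first authorization requires an explicitly
    enabled IFC block with a non-empty compute path."""
    if not isinstance(ifc, dict):
        return ["ifc must be an object"]
    errors = []
    # `is not True` rejects false, missing, and truthy non-bool values alike.
    if ifc.get("enabled") is not True:
        errors.append("ifc.enabled must be true")
    path = ifc.get("compute_path")
    if not isinstance(path, str) or not path.strip():
        errors.append("ifc.compute_path must be a non-empty string")
    return errors
```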
```python
has_hnsw_comparison = (
    e.get("compared_against_hnsw") is True
    or isinstance(e.get("hnsw_comparison"), dict)
)
```
Validate HNSW comparison evidence before allowing claims
For an applies_to value containing production/parallel serving terms, an empty hnsw_comparison object or a bare compared_against_hnsw: true makes has_hnsw_comparison true and bypasses REQUIRE_HNSW_COMPARISON. That lets a receipt with no HNSW metrics or artifacts verify as ALLOW_INDEX_FIRST in exactly the policy-protected context, so this should require concrete comparison fields rather than just a marker.
```python
quality_loss = baseline_score - candidate_score
quality_too_low = quality_loss > float(policy["max_quality_delta_loss"])
outside_bootstrap_noise = e["within_bootstrap_noise"] is not True
```
Allow significant quality improvements
When candidate_score is higher than baseline_score but within_bootstrap_noise is false because the improvement is statistically significant, this flag still forces REQUIRE_DENSE_FALLBACK despite there being no quality loss and the max_quality_delta_loss policy passing. Only quality losses outside the allowed/noise envelope should block index-first authorization.
```python
if not isinstance(value, (int, float)) or isinstance(value, bool):
    die(f"{label}.{key} must be a number")
return float(value)
```
Reject non-finite numeric fields
Python's JSON parser accepts NaN, and this check treats float('nan') as a valid number. If a receipt puts NaN in declared fields like storage_reduction_x or single_query_speedup_x, the later mismatch and threshold comparisons all evaluate false, so the verifier can still print verified: true with non-finite metrics.
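Python's `json` module accepts the non-standard tokens `NaN`, `Infinity`, and `-Infinity` by default, and every ordered comparison against `float("nan")` is false, which is why the threshold checks silently pass. A sketch of two layers of defense, with hypothetical helper names: reject the tokens at parse time through the `parse_constant` hook of `json.loads`, and still call `math.isfinite` on each field, because an overflowing literal such as `1e400` parses to `inf` without going through that hook.

```python
import json
import math


def _reject_constant(name: str):
    # json.loads calls this only for the tokens NaN, Infinity and -Infinity.
    raise ValueError(f"non-finite JSON constant {name!r} is not allowed")


def load_receipt_text(text: str) -> dict:
    return json.loads(text, parse_constant=_reject_constant)


def require_finite_number(obj: dict, key: str, label: str) -> float:
    value = obj.get(key)
    if not isinstance(value, (int, float)) or isinstance(value, bool):
        raise ValueError(f"{label}.{key} must be a number")
    if not math.isfinite(value):
        raise ValueError(f"{label}.{key} must be finite")
    return float(value)
```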
```python
scope_missing = not applies_to or not does_not_claim
limitations_missing = not limitations
```
Treat blank scope entries as missing
With require_scope and require_limitations enabled, this only checks whether the lists are truthy, so applies_to: [""], does_not_claim: [""], and limitations: [""] still authorize as a scoped claim. That lets a receipt omit any meaningful workload envelope while satisfying the default policy; validate that these lists contain non-empty string entries.
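A sketch of the stricter check, with `require_nonblank_strings` as a hypothetical helper that would be applied to `scope.applies_to`, `scope.does_not_claim`, and `limitations` in turn.

```python
def require_nonblank_strings(value, label: str) -> list:
    """Hypothetical check: `value` must be a non-empty list whose entries are
    all strings containing at least one non-whitespace character."""
    if not isinstance(value, list) or not value:
        raise ValueError(f"{label} must be a non-empty list")
    for i, item in enumerate(value):
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"{label}[{i}] must be a non-blank string")
    # Return normalized entries so later keyword matching sees trimmed text.
    return [item.strip() for item in value]
```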
```python
claims_parallel_or_production = any(
    marker in claims_text
    for marker in ["parallel", "threaded", "production", "prod", "serving", "online"]
)
```
Require HNSW only for parallel claims
The policy flag is specifically require_hnsw_comparison_for_parallel_claims, and the receipt text says HNSW is required for highly parallel threaded serving claims, but these markers also trigger on any production, serving, or online scope. A receipt scoped to single-query production serving would therefore be rejected with REQUIRE_HNSW_COMPARISON even though it is not making a parallel-throughput claim.
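A rough sketch of narrower matching, assuming the claim text is the list of `applies_to` entries. The marker set is an assumption: it keeps only terms that signal parallel or concurrent throughput, uses word boundaries so markers do not match inside unrelated words, and excludes "single-threaded" with a lookbehind.

```python
import re

# Hypothetical marker set. "production", "prod", "serving" and "online" are
# deliberately absent, so single-query production scopes do not trigger it.
PARALLEL_CLAIM = re.compile(
    r"(?<!single-)\b(?:parallel|threaded|concurrent|concurrency)\b",
    re.IGNORECASE,
)


def claims_parallel_throughput(applies_to: list) -> bool:
    return any(PARALLEL_CLAIM.search(entry) for entry in applies_to)
```

Keyword matching remains a heuristic either way; a structured scope field that declares the concurrency level would be harder to game and harder to trip by accident.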
Addressed.
Changes:
- Reject receipts unless IFC is explicitly enabled and compute_path is non-empty.
- Reject non-finite numeric fields, including NaN.
- Require scope and limitation lists to contain non-empty strings.
- Allow statistically significant quality improvements instead of forcing dense fallback.
- Require concrete HNSW comparison evidence for parallel/threaded/concurrent throughput claims.
- Narrowed HNSW requirement so generic single-query production serving does not trigger REQUIRE_HNSW_COMPARISON.
Receipts now carry evidence only; verifier-owned policy remains the authority boundary.
You have reached your Codex usage limits. You can see your limits in the Codex usage dashboard.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a6c18c60b
```python
economics_too_weak = (
    declared_storage < float(policy["min_storage_reduction_x"])
    or declared_speedup < float(policy["min_single_query_speedup_x"])
)
```
Use recomputed ratios for threshold decisions
Even after the ratio cross-check was added, the verifier still applies policy thresholds to the receipt's declared ratios rather than the recomputed expected_storage/expected_speedup. Because the mismatch tolerance is ±0.02, a receipt whose true speedup is 1.24 can declare 1.2501, pass the cross-check, and get ALLOW_INDEX_FIRST against the default 1.25 threshold even though the benchmark data is below policy. Make the decision from the recomputed ratios, or compare the thresholds against them.
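A sketch of deciding on the recomputed values, with hypothetical argument names; the threshold keys match the policy fields quoted above. Declared ratios can still be cross-checked for consistency, but they never feed the decision.

```python
def economics_too_weak(baseline_bytes: float, candidate_bytes: float,
                       baseline_latency_ms: float, candidate_latency_ms: float,
                       policy: dict) -> bool:
    """Hypothetical sketch: apply policy thresholds to ratios recomputed from
    the raw measurements instead of the receipt's declared ratios."""
    storage_reduction_x = baseline_bytes / candidate_bytes
    single_query_speedup_x = baseline_latency_ms / candidate_latency_ms
    return (storage_reduction_x < policy["min_storage_reduction_x"]
            or single_query_speedup_x < policy["min_single_query_speedup_x"])
```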
```python
has_metric_pair = any(
    isinstance(h.get(a), (int, float))
    and isinstance(h.get(b), (int, float))
    and math.isfinite(float(h.get(a)))
    and math.isfinite(float(h.get(b)))
```
In the current implementation, these isinstance(..., (int, float)) checks count booleans as numbers, so a parallel-claim receipt with a non-empty artifact plus baseline_latency_ms: true and candidate_latency_ms: false satisfies has_metric_pair and skips REQUIRE_HNSW_COMPARISON without any real HNSW measurements. Exclude bool (and apply the same numeric validation to the nested latency fields) before treating the comparison as concrete.
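A sketch of a stricter predicate, assuming the comparison lives under `evidence.hnsw_comparison` with an `artifact` reference and flat metric fields. `baseline_latency_ms`/`candidate_latency_ms` come from this comment; the `*_qps` pair and the helper names are assumptions.

```python
import math

# Hypothetical metric pairs; any one complete pair counts as concrete evidence.
METRIC_PAIRS = (("baseline_latency_ms", "candidate_latency_ms"),
                ("baseline_qps", "candidate_qps"))


def is_real_number(value) -> bool:
    # bool is a subclass of int, so it must be excluded explicitly.
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and math.isfinite(value))


def has_concrete_hnsw_comparison(evidence: dict) -> bool:
    h = evidence.get("hnsw_comparison")
    if not isinstance(h, dict):
        return False  # a bare compared_against_hnsw: true is only a marker
    artifact = h.get("artifact")
    if not isinstance(artifact, str) or not artifact.strip():
        return False
    return any(is_real_number(h.get(a)) and is_real_number(h.get(b))
               for a, b in METRIC_PAIRS)
```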
```python
"min_storage_reduction_x",
"min_single_query_speedup_x",
"max_quality_delta_loss",
```
Validate policy thresholds before comparison
The policy input is only checked for key presence here, and the later float(...) conversions accept JSON strings such as "NaN"; comparisons against NaN are false, so a custom --policy with "min_storage_reduction_x": "NaN" or "max_quality_delta_loss": "NaN" can make receipts with weak economics or large quality loss verify as ALLOW_INDEX_FIRST. Validate these policy fields as finite non-boolean numbers before computing the decision.
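Note that `float("NaN")` happily converts the string, so wrapping a value in `float(...)` is not validation. A sketch of checking the three policy fields up front, with `validate_policy` as a hypothetical name:

```python
import math

POLICY_NUMERIC_KEYS = (
    "min_storage_reduction_x",
    "min_single_query_speedup_x",
    "max_quality_delta_loss",
)


def validate_policy(policy: dict) -> dict:
    """Reject missing, non-numeric, boolean, or non-finite thresholds before
    any comparison uses them."""
    for key in POLICY_NUMERIC_KEYS:
        value = policy.get(key)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"policy.{key} must be a number, got {value!r}")
        if not math.isfinite(value):
            raise ValueError(f"policy.{key} must be finite")
    return policy
```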
```python
quality_loss = max(0.0, baseline_score - candidate_score)
outside_bootstrap_noise = e["within_bootstrap_noise"] is not True
quality_too_low = quality_loss > float(policy["max_quality_delta_loss"]) and outside_bootstrap_noise
```
Enforce max quality-loss cap even within noise
Because this combines the loss cap and bootstrap flag with and, any receipt that sets within_bootstrap_noise to true bypasses max_quality_delta_loss entirely. For example, with the default policy a candidate score far below baseline still verifies as ALLOW_INDEX_FIRST as long as the receipt marks it within bootstrap noise; the configured maximum loss should remain a hard cap, with the noise flag only affecting losses inside that cap or a separate policy check.
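One reasonable reading of this feedback, sketched with hypothetical names: improvements count as zero loss, the configured maximum is a hard cap that the noise flag cannot override, and statistically significant losses below the cap are handled by a separate, opt-in policy switch.

```python
def quality_blocks_index_first(baseline_score: float, candidate_score: float,
                               within_bootstrap_noise: bool,
                               max_quality_delta_loss: float,
                               block_significant_loss_below_cap: bool = False) -> bool:
    """Hypothetical quality gate; returns True when index-first should be
    refused on quality grounds."""
    loss = max(0.0, baseline_score - candidate_score)  # improvements are zero loss
    if loss > max_quality_delta_loss:
        return True  # hard cap: within_bootstrap_noise cannot override it
    if loss > 0 and within_bootstrap_noise is not True:
        # A significant loss inside the cap is a separate, opt-in policy decision.
        return block_significant_loss_below_cap
    return False
```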
Summary
Adds an optional CAIF-style Index Authority Receipt for ordvec benchmark evidence.
The goal is to make ordvec's index-first retrieval evidence machine-readable: quality delta, bytes/vector, latency regime, benchmark scope, limitations, fallback conditions, and a deterministic receipt hash.
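A minimal sketch of how a deterministic receipt hash can be computed, assuming SHA-256 over a canonical JSON encoding; the verifier's actual canonicalization is not described here, so `receipt_hash` is hypothetical.

```python
import hashlib
import json


def receipt_hash(receipt: dict) -> str:
    """Hypothetical deterministic hash: SHA-256 over canonical JSON
    (sorted keys, no insignificant whitespace, NaN/Infinity rejected)."""
    canonical = json.dumps(receipt, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False, allow_nan=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorted keys and fixed separators are enough for a single Python verifier, but they do not pin down float formatting across languages; a cross-language verifier would want a stricter scheme such as the JSON Canonicalization Scheme (RFC 8785).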
Why
ordvec already has a strong index-first compute story: compressed ordinal/sign retrieval can preserve retrieval quality under stated benchmark scopes while reducing storage and latency.
This PR adds a small evidence packet and verifier so downstream systems can answer:
What this includes
- docs/INDEX_AUTHORITY_RECEIPTS.md
- schemas/caif/ordvec-index-authority.v0.1.schema.json
- examples/caif/trec-covid-sign-rq2.index-authority.json
- tools/verify_index_authority.py

What this does not do
- Cargo.toml

Verification
Expected output includes:
Scope
The example uses existing public README benchmark values and preserves the stated limitations around dataset, encoder, corpus size, batch/threading regime, HNSW comparison, and larger-corpus claims.
Framing
Benchmarks should not only report performance.
They should authorize compute paths within a defined evidence envelope.