docs: add Index Authority Receipts for IFC evidence by blocksifrdev · Pull Request #270 · Project-Navi/ordvec

blocksifrdev · 2026-06-20T13:53:43Z

Summary

Adds an optional CAIF-style Index Authority Receipt for ordvec benchmark evidence.

The goal is to make ordvec's index-first retrieval evidence machine-readable: quality delta, bytes/vector, latency regime, benchmark scope, limitations, fallback conditions, and a deterministic receipt hash.

Why

ordvec already has a strong index-first compute story: compressed ordinal/sign retrieval can preserve retrieval quality under stated benchmark scopes while reducing storage and latency.

This PR adds a small evidence packet and verifier so downstream systems can answer:

Is this compressed/index-first retrieval path authorized to answer before dense compute for this stated workload scope?

What this includes

docs/INDEX_AUTHORITY_RECEIPTS.md
schemas/caif/ordvec-index-authority.v0.1.schema.json
examples/caif/trec-covid-sign-rq2.index-authority.json
tools/verify_index_authority.py

What this does not do

Does not change Rust code
Does not change Cargo.toml
Does not add runtime dependencies
Does not add CI requirements
Does not claim new benchmark results
Does not add signing, key management, or deployment trust policy

Verification

python3 tools/verify_index_authority.py examples/caif/trec-covid-sign-rq2.index-authority.json

Expected output includes:

decision: ALLOW_INDEX_FIRST
quality_within_bootstrap_noise: true
storage_reduction: 10.67x
single_query_speedup: 105.66x
receipt_hash: sha256:...
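
For readers wondering how a deterministic receipt hash can be produced, the following is a minimal sketch and not the code in tools/verify_index_authority.py. It assumes the hash covers the whole receipt, serialized as canonical JSON (sorted keys, compact separators). The function name `receipt_hash` is hypothetical.

```python
import hashlib
import json


def receipt_hash(receipt: dict) -> str:
    """Hash a receipt through a canonical JSON encoding (assumed scheme)."""
    canonical = json.dumps(
        receipt,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace differences
        ensure_ascii=False,
        allow_nan=False,         # refuse NaN/Infinity, which are not valid JSON
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


# Two dicts with the same content but different key order hash identically.
a = {"schema": "example", "economics": {"storage_reduction_x": 10.67}}
b = {"economics": {"storage_reduction_x": 10.67}, "schema": "example"}
print(receipt_hash(a) == receipt_hash(b))
```

Sorting keys and fixing separators is what makes the hash independent of how the JSON file happens to be formatted on disk.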

Scope

The example uses existing public README benchmark values and preserves the stated limitations around dataset, encoder, corpus size, batch/threading regime, HNSW comparison, and larger-corpus claims.

Framing

Benchmarks should not only report performance.

They should authorize compute paths within a defined evidence envelope.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a880e2b115

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

chatgpt-codex-connector · 2026-06-20T13:59:16Z

+    if float(economics.get("storage_reduction_x", 0)) < float(policy.get("min_storage_reduction_x", 0)):
+        return "REQUIRE_DENSE_FALLBACK"
+
+    if float(economics.get("single_query_speedup_x", 0)) < float(policy.get("min_single_query_speedup_x", 0)):


Recompute ratios before applying policy thresholds

When verifying a receipt whose raw bytes/latencies don’t match the derived storage_reduction_x/single_query_speedup_x, this branch authorizes based only on the supplied derived numbers. For example, a receipt can set baseline.bytes_per_vector == candidate_bytes_per_vector and equal latencies, but inflate both *_x fields above the policy thresholds and still get ALLOW_INDEX_FIRST. Since those ratios are included alongside their source values in the receipt, the verifier should recompute or at least cross-check them before using them for authorization.
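
A cross-check along these lines might look like the sketch below. The 0.02 tolerance mirrors the value visible in a later diff hunk in this thread. Placing `candidate_bytes_per_vector` under `economics` is an assumption about the receipt layout, and `check_ratio` and `cross_check` are hypothetical helper names.

```python
import math

TOLERANCE = 0.02  # matches the tolerance seen in a later hunk of this PR


def check_ratio(declared: float, numerator: float, denominator: float, label: str) -> None:
    """Raise if the declared ratio disagrees with the recomputed one."""
    if denominator <= 0:
        raise ValueError(f"{label}: denominator must be positive")
    expected = numerator / denominator
    if not math.isfinite(declared) or abs(declared - expected) > TOLERANCE:
        raise ValueError(f"{label} mismatch: declared {declared}, recomputed {expected:.2f}")


def cross_check(receipt: dict) -> None:
    """Recompute both derived ratios from the raw values in the receipt."""
    econ = receipt["economics"]
    check_ratio(
        econ["storage_reduction_x"],
        receipt["baseline"]["bytes_per_vector"],
        econ["candidate_bytes_per_vector"],  # assumed location of this field
        "storage_reduction_x",
    )
    latency = econ["single_query_latency_ms"]
    check_ratio(
        econ["single_query_speedup_x"],
        latency["baseline"],
        latency["candidate"],
        "single_query_speedup_x",
    )
```

With this in place, a receipt that declares equal raw bytes and latencies but inflated `*_x` fields fails before any policy threshold is consulted.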

chatgpt-codex-connector · 2026-06-20T13:59:16Z

+    args = parser.parse_args()
+
+    data = load_json(args.receipt)
+    errors = shape_errors(data)


Reject schema-invalid receipts before computing decisions

The verifier’s only validation here is the custom shape_errors() subset, so receipts that violate the checked-in JSON schema either crash later or are still authorized. For example, omitting the schema-required decision.policy raises a traceback in compute_decision, while strings for numeric fields or extra properties are accepted even though the schema rejects them. Since this command is the documented verifier for machine-readable receipts, it should run full schema validation or mirror the required/type/additionalProperties checks before computing authorization.
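
If full schema validation is not available, the required/type/additionalProperties checks can be mirrored by hand. The sketch below does this for one object using only the standard library. `ECONOMICS_SPEC` is a hypothetical slice of the schema, not its actual contents, and `structural_errors` is an illustrative name.

```python
# Hypothetical slice of the schema: every listed key is required and numeric.
ECONOMICS_SPEC = ("storage_reduction_x", "single_query_speedup_x")


def structural_errors(obj, required_numeric, label):
    """Collect required/type/extra-property violations for one object."""
    if not isinstance(obj, dict):
        return [f"{label} must be an object"]
    errors = []
    for key in required_numeric:
        if key not in obj:
            errors.append(f"{label}.{key} is required")
        # bool is a subclass of int in Python, so exclude it explicitly
        elif isinstance(obj[key], bool) or not isinstance(obj[key], (int, float)):
            errors.append(f"{label}.{key} must be a number")
    for key in obj:
        if key not in required_numeric:
            errors.append(f"{label}.{key} is not allowed")
    return errors


print(structural_errors({"storage_reduction_x": "10.67", "extra": 1}, ECONOMICS_SPEC, "economics"))
```

Running the checks before `compute_decision` means a missing `decision.policy` or a stringly-typed number surfaces as a reported error rather than a traceback or a silent pass.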

chatgpt-codex-connector · 2026-06-20T13:59:16Z

+    if float(economics.get("single_query_speedup_x", 0)) < float(policy.get("min_single_query_speedup_x", 0)):
+        return "REQUIRE_DENSE_FALLBACK"
+
+    return "ALLOW_INDEX_FIRST"


Handle the HNSW-comparison decision state

The schema and docs advertise REQUIRE_HNSW_COMPARISON as a valid decision, but compute_decision has no path that can return it; after the existing checks pass, every receipt falls through to ALLOW_INDEX_FIRST. A receipt for the documented regime where graph/ANN comparison is required will therefore always fail with a decision mismatch, so the verifier needs a policy/scope predicate for that state or the state should not be accepted as valid.

chatgpt-codex-connector · 2026-06-20T13:59:16Z

+    ifc = data["ifc"]
+    evidence = data["evidence"]
+    economics = data["economics"]
+    policy = data["decision"]["policy"]


Require verifier-owned acceptance policy

Because the verifier reads the policy thresholds from the receipt being evaluated, a schema-valid receipt can authorize itself by lowering min_storage_reduction_x/min_single_query_speedup_x to zero or disabling the quality requirement, even when the reported speedup and storage reduction are below any meaningful bar. For an authorization verifier, these acceptance rules need to come from the verifier configuration or fixed minimums rather than the untrusted evidence packet itself.
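
One way to picture verifier-owned policy is the sketch below, where thresholds live in the verifier and the receipt's own `decision.policy` is never read. The numeric minimums are placeholders chosen for illustration, not project policy, and `VERIFIER_POLICY` and `decide` are hypothetical names.

```python
# Hypothetical verifier-owned minimums; placeholder values, not project policy.
VERIFIER_POLICY = {
    "min_storage_reduction_x": 2.0,
    "min_single_query_speedup_x": 2.0,
}


def decide(economics: dict, policy: dict = VERIFIER_POLICY) -> str:
    """Apply thresholds supplied by the verifier, never by the receipt."""
    if economics["storage_reduction_x"] < policy["min_storage_reduction_x"]:
        return "REQUIRE_DENSE_FALLBACK"
    if economics["single_query_speedup_x"] < policy["min_single_query_speedup_x"]:
        return "REQUIRE_DENSE_FALLBACK"
    return "ALLOW_INDEX_FIRST"


# A receipt that tries to authorize itself by zeroing its own thresholds.
receipt = {
    "economics": {"storage_reduction_x": 1.1, "single_query_speedup_x": 1.0},
    "decision": {"policy": {"min_storage_reduction_x": 0, "min_single_query_speedup_x": 0}},
}
print(decide(receipt["economics"]))  # the receipt's policy block is ignored
```

The thresholds could equally come from a config file passed on the command line; the point is only that they originate outside the evidence packet.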

Signed-off-by: blocksifrdev <maurice@blocksifr.com>

Fieldnote-Echo · 2026-06-20T16:39:09Z

Thank you for your contribution! We will review the PR as soon as we have the bandwidth. 🙏🏻

github-actions

🟡 Acceptable — Core concept is sound, but there are gaps to address before merging.

The Index Authority Receipt pattern is a good fit for ordvec's evidence story. However, the implementation has security and completeness issues that need resolution.

Key issues:

Missing schema file (documentation says it exists)
Self-signed policy thresholds (verifier should own acceptance policy)
Unreachable REQUIRE_HNSW_COMPARISON decision
No test coverage for the verifier
No CI validation that receipts stay valid

See inline comments for details.

Workflow run: https://github.com/Project-Navi/ordvec/actions/runs/27879938562

github-actions · 2026-06-20T18:26:49Z

+    except Exception as e:
+        die(f"cannot read receipt: {e}")
+
+    for k in ["schema","subject","baseline","ifc","evidence","economics","decision","scope","limitations"]:


🟠 Important: Missing schema validation — The PR description lists schemas/caif/ordvec-index-authority.v0.1.schema.json but this file does not exist in the PR. The verifier only checks field presence and schema string equality, not JSON Schema validation. A malformed receipt will either crash later or pass silently. Either remove the schema reference from the PR description, or add the schema file and validate receipts against it before processing. Note that the jsonschema package is third-party rather than part of the Python standard library, so using it would add a tooling dependency; mirroring the schema's required/type checks by hand avoids that.

github-actions · 2026-06-20T18:26:49Z

+    if abs(econ["storage_reduction_x"] - expected_storage) > 0.02:
+        die("storage_reduction_x mismatch")
+
+    expected_speedup = econ["single_query_latency_ms"]["baseline"] / econ["single_query_latency_ms"]["candidate"]


🟠 Important: Self-signed policy thresholds — The verifier reads acceptance policy (min_storage_reduction_x, min_single_query_speedup_x) from the receipt being evaluated. A receipt can authorize itself by setting min_storage_reduction_x: 0.01. The policy thresholds should be verifier-owned, not receipt-owned. Options:

Move thresholds to a separate verifier config file

Require thresholds to be above documented minimums

Document explicitly that this is a self-certifying receipt system, not a trusted verifier

github-actions · 2026-06-20T18:26:49Z

+    if abs(econ["single_query_speedup_x"] - expected_speedup) > 0.02:
+        die("single_query_speedup_x mismatch")
+
+    decision = "ALLOW_INDEX_FIRST"


🟠 Important: REQUIRE_HNSW_COMPARISON is unreachable — The compute_decision logic can only return ALLOW_INDEX_FIRST, REQUIRE_DENSE_FALLBACK, or DENY_UNSCOPED_CLAIM. REQUIRE_HNSW_COMPARISON is never assigned. Either:

Add a code path that returns it (e.g., when highly parallel threaded serving is claimed but no HNSW comparison exists)

Remove it from the advertised schema/decision set

Document why it exists as a valid decision but is not reachable in this implementation

github-actions · 2026-06-20T18:26:49Z

+    print(f"single_query_speedup: {econ['single_query_speedup_x']}x")
+    print(f"receipt_hash: {sha(r)}")
+
+    if decision != r["decision"]["recommended"]:


🟡 Suggestion: Add test coverage — The verifier has no tests. At minimum, add tests for:

Valid receipt passes

Missing fields are rejected

Computed metrics that don't match declared values are rejected

The decision mismatch exit code (3)

Example location: tests/verify_index_authority_test.py

github-actions · 2026-06-20T18:26:49Z

+- `examples/caif/trec-covid-sign-rq2.index-authority.json`
+- `tools/verify_index_authority.py`
+
+## What this does not do


🟡 Documentation mismatch — The "What this includes" section lists schemas/caif/ordvec-index-authority.v0.1.schema.json but this file is not present in the PR. Either add the schema file or remove it from this list.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e501704594

chatgpt-codex-connector · 2026-06-21T17:54:58Z

+    if not isinstance(limitations, list):
+        die("limitations must be a list")
+
+    decision = "ALLOW_INDEX_FIRST"


Reject receipts where IFC is disabled

A receipt can set ifc.enabled to false or leave ifc.compute_path empty and still reach the default ALLOW_INDEX_FIRST decision because the verifier never reads the ifc object after requiring it exists. That allows a receipt which explicitly says no index-first path is enabled to verify successfully as long as the benchmark metrics pass, which undermines the authorization this tool is meant to provide.
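
A guard for this case could look like the following sketch. It assumes `ifc.compute_path` is a string; if the schema defines it as a list or object, the emptiness check would need to change accordingly. `ifc_errors` is a hypothetical helper name.

```python
def ifc_errors(receipt: dict) -> list:
    """Return reasons the ifc block cannot authorize an index-first path."""
    ifc = receipt.get("ifc")
    if not isinstance(ifc, dict):
        return ["ifc must be an object"]
    errors = []
    # Require the literal boolean True, not merely a truthy value.
    if ifc.get("enabled") is not True:
        errors.append("ifc.enabled must be true")
    path = ifc.get("compute_path")
    if not isinstance(path, str) or not path.strip():
        errors.append("ifc.compute_path must be a non-empty string")
    return errors


print(ifc_errors({"ifc": {"enabled": False, "compute_path": ""}}))
```

Any non-empty result would be treated as a hard failure before the default ALLOW_INDEX_FIRST decision is reached.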

chatgpt-codex-connector · 2026-06-21T17:54:58Z

+    has_hnsw_comparison = (
+        e.get("compared_against_hnsw") is True
+        or isinstance(e.get("hnsw_comparison"), dict)
+    )


Validate HNSW comparison evidence before allowing claims

For an applies_to value containing production/parallel serving terms, an empty hnsw_comparison object or a bare compared_against_hnsw: true makes has_hnsw_comparison true and bypasses REQUIRE_HNSW_COMPARISON. That lets a receipt with no HNSW metrics or artifacts verify as ALLOW_INDEX_FIRST in exactly the policy-protected context, so this should require concrete comparison fields rather than just a marker.
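
Requiring concrete fields rather than a marker might be sketched as below. The field names inside `hnsw_comparison` (`quality_score`, `single_query_latency_ms`, `artifact`) are hypothetical, since the thread does not show what a real comparison record contains.

```python
import math

# Hypothetical numeric fields a comparison record is assumed to carry.
NUMERIC_FIELDS = ("quality_score", "single_query_latency_ms")


def has_concrete_hnsw_comparison(evidence: dict) -> bool:
    """True only when the comparison object carries usable metrics and an artifact."""
    comparison = evidence.get("hnsw_comparison")
    if not isinstance(comparison, dict):
        return False  # a bare compared_against_hnsw flag is not evidence
    for key in NUMERIC_FIELDS:
        value = comparison.get(key)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not math.isfinite(float(value)):
            return False
    artifact = comparison.get("artifact")  # e.g. a path or URL to the raw results
    return isinstance(artifact, str) and bool(artifact.strip())


print(has_concrete_hnsw_comparison({"compared_against_hnsw": True}))
print(has_concrete_hnsw_comparison({"hnsw_comparison": {}}))
```

Both calls above describe the bypasses named in this comment: a bare flag and an empty object.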

chatgpt-codex-connector · 2026-06-21T17:54:59Z

+
+    quality_loss = baseline_score - candidate_score
+    quality_too_low = quality_loss > float(policy["max_quality_delta_loss"])
+    outside_bootstrap_noise = e["within_bootstrap_noise"] is not True


Allow significant quality improvements

When candidate_score is higher than baseline_score but within_bootstrap_noise is false because the improvement is statistically significant, this flag still forces REQUIRE_DENSE_FALLBACK despite there being no quality loss and the max_quality_delta_loss policy passing. Only quality losses outside the allowed/noise envelope should block index-first authorization.
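
One reading of this suggestion is sketched below: the noise flag only matters when the candidate is actually worse than the baseline. The function name `quality_blocks_index_first` is hypothetical, and treating "loss exceeds the policy maximum" and "loss outside bootstrap noise" as independent blockers is an assumption carried over from the hunk above.

```python
def quality_blocks_index_first(
    baseline_score: float,
    candidate_score: float,
    within_bootstrap_noise: bool,
    max_quality_delta_loss: float,
) -> bool:
    """Return True when quality evidence should force a dense fallback."""
    quality_loss = baseline_score - candidate_score
    if quality_loss <= 0:
        # Candidate matches or beats the baseline: never a quality failure,
        # even if the improvement is statistically significant.
        return False
    if quality_loss > max_quality_delta_loss:
        return True
    # A real loss inside the policy bound must still be within noise.
    return within_bootstrap_noise is not True


# Significant improvement: within_bootstrap_noise is False, yet nothing is lost.
print(quality_blocks_index_first(0.70, 0.75, False, 0.01))
```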

chatgpt-codex-connector · 2026-06-21T17:54:59Z

+    if not isinstance(value, (int, float)) or isinstance(value, bool):
+        die(f"{label}.{key} must be a number")
+    return float(value)


Reject non-finite numeric fields

Python's JSON parser accepts NaN, and this check treats float('nan') as a valid number. If a receipt puts NaN in declared fields like storage_reduction_x or single_query_speedup_x, the later mismatch and threshold comparisons all evaluate false, so the verifier can still print verified: true with non-finite metrics.
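
A sketch of two complementary guards follows. `json.loads` accepts a `parse_constant` hook that is called for `NaN`, `Infinity`, and `-Infinity`, and `math.isfinite` catches values such as `1e999` that overflow to infinity during ordinary float parsing. The helper names `load_strict` and `finite_number` are hypothetical.

```python
import json
import math


def _reject_constant(token: str):
    """Called by the json module for NaN, Infinity, and -Infinity."""
    raise ValueError(f"non-finite JSON constant not allowed: {token}")


def load_strict(text: str):
    """Parse JSON while refusing the non-standard non-finite constants."""
    return json.loads(text, parse_constant=_reject_constant)


def finite_number(value, label: str) -> float:
    """Validate that value is a real, finite number and return it as float."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{label} must be a number")
    try:
        number = float(value)
    except OverflowError:
        raise ValueError(f"{label} is too large") from None
    if not math.isfinite(number):
        raise ValueError(f"{label} must be finite")
    return number
```

The second guard is what closes the hole described here: any comparison involving NaN evaluates to false, so the value has to be rejected before it reaches the mismatch and threshold checks.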

chatgpt-codex-connector · 2026-06-21T17:54:59Z

+    scope_missing = not applies_to or not does_not_claim
+    limitations_missing = not limitations


Treat blank scope entries as missing

With require_scope and require_limitations enabled, this only checks whether the lists are truthy, so applies_to: [""], does_not_claim: [""], and limitations: [""] still authorize as a scoped claim. That lets a receipt omit any meaningful workload envelope while satisfying the default policy; validate that these lists contain non-empty string entries.
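
The stricter check could be as small as the sketch below, where `meaningful_entries` is a hypothetical helper that replaces the plain truthiness test on each list.

```python
def meaningful_entries(value) -> bool:
    """True for a non-empty list whose items are all non-blank strings."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(item, str) and bool(item.strip()) for item in value)
    )


scope = {"applies_to": [""], "does_not_claim": ["   "]}
scope_missing = not (
    meaningful_entries(scope.get("applies_to"))
    and meaningful_entries(scope.get("does_not_claim"))
)
print(scope_missing)
```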

chatgpt-codex-connector · 2026-06-21T17:54:59Z

+    claims_parallel_or_production = any(
+        marker in claims_text
+        for marker in ["parallel", "threaded", "production", "prod", "serving", "online"]
+    )


Require HNSW only for parallel claims

The policy flag is specifically require_hnsw_comparison_for_parallel_claims, and the receipt text says HNSW is required for highly parallel threaded serving claims, but these markers also trigger on any production, serving, or online scope. A receipt scoped to single-query production serving would therefore be rejected with REQUIRE_HNSW_COMPARISON even though it is not making a parallel-throughput claim.
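
A narrower predicate might match only parallelism vocabulary, as in this sketch. The marker list (parallel, threaded, concurrent) is an assumption about which words signal a throughput claim, and keyword matching on free text is itself a heuristic. `claims_parallel_throughput` is a hypothetical name.

```python
import re

# Words assumed to signal a parallel-throughput claim. "production",
# "serving", and "online" are deliberately excluded.
PARALLEL_MARKERS = re.compile(
    r"\b(parallel|(multi-?)?threaded|concurrent)\b",
    re.IGNORECASE,
)


def claims_parallel_throughput(applies_to: list) -> bool:
    """True when any scope entry uses parallelism vocabulary."""
    return any(
        isinstance(entry, str) and PARALLEL_MARKERS.search(entry) is not None
        for entry in applies_to
    )


print(claims_parallel_throughput(["single-query production serving"]))
print(claims_parallel_throughput(["highly parallel threaded serving"]))
```

A structured field in the receipt, such as an explicit concurrency level, would be more robust than text matching if the schema can accommodate one.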

blocksifrdev · 2026-06-21T18:00:12Z

Addressed. Changes:

- Reject receipts unless IFC is explicitly enabled and compute_path is non-empty.
- Reject non-finite numeric fields, including NaN.
- Require scope and limitation lists to contain non-empty strings.
- Allow statistically significant quality improvements instead of forcing dense fallback.
- Require concrete HNSW comparison evidence for parallel/threaded/concurrent throughput claims.
- Narrowed HNSW requirement so generic single-query production serving does not trigger REQUIRE_HNSW_COMPARISON.

Receipts now carry evidence only; verifier-owned policy remains the authority boundary.
