[HAI] Persistent vs Sample Level Manifest protocol

**Clarify Context Contract and Schema Convention: Dataset-Level vs. Sample-Level Information**

## Short summary
Section 1.1 of the HAIG API Standardisation Proposal establishes a context contract and schema convention, but it is incomplete: what information is permissible at each context level, and the practical schema conventions that follow from that, remain to be resolved.

## What is the idea or problem?
Section 1.1 of the HAIG API Standardisation Proposal defines two context levels (sample-level and dataset-level) and gives an initial indication of what belongs at each, but the specification is incomplete. Two related but separable questions need to be resolved:

**Part 1 — Context Contract: what is permitted at each level**

What can persist at Context Level 2 (dataset-level) and what must remain at Context Level 1 (sample-level) is use-case dependent — the appropriate level is not an intrinsic property of the information itself but depends on how and where it is consumed. Two examples:

- **Annotation cache**: appropriate as persistent state in deployment, but in evaluation annotations must be released one at a time so that an algorithm cannot align its output with reference labels through a backdoor
- **Semantic label dictionary** (#1868): if translated into the payload by the front-end, it need only exist at the sample level; if translation is deferred to the backend, the dictionary itself may need to be accessible at the dataset level

The `image_cache` is an example of something unambiguously dataset-level regardless of use-case.
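One way to make the use-case dependence explicit is a lookup keyed on both the item and the scenario in which it is consumed. The following is a minimal sketch only: the names `ContextLevel`, `CONTRACT`, and `permitted_level`, the scenario strings, and the `"*"` wildcard are all hypothetical and are not taken from the proposal.

```python
from enum import Enum


class ContextLevel(Enum):
    SAMPLE = 1   # Context Level 1 (sample-level)
    DATASET = 2  # Context Level 2 (dataset-level)


# Hypothetical contract table: (item, scenario) -> permitted level.
# "*" marks an item whose level does not depend on the scenario.
CONTRACT = {
    ("annotation_cache", "deployment"): ContextLevel.DATASET,
    ("annotation_cache", "evaluation"): ContextLevel.SAMPLE,
    # Dictionary already translated into the payload by the front-end.
    ("semantic_label_dictionary", "frontend_translation"): ContextLevel.SAMPLE,
    # Translation deferred to the backend: may need dataset-level access.
    ("semantic_label_dictionary", "backend_translation"): ContextLevel.DATASET,
    ("image_cache", "*"): ContextLevel.DATASET,
}


def permitted_level(item: str, scenario: str) -> ContextLevel:
    """Return the level permitted for `item` when consumed in `scenario`."""
    if (item, scenario) in CONTRACT:
        return CONTRACT[(item, scenario)]
    if (item, "*") in CONTRACT:
        return CONTRACT[(item, "*")]
    raise KeyError(f"no contract entry for {item!r} in scenario {scenario!r}")
```

Keying on the pair rather than on the item alone reflects the point above that the level is not an intrinsic property of the information; an unresolved combination raises an error instead of silently defaulting to either level.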

**Part 2 — Schema Convention: what is actually included and how**

The schema is a reflection of the context contract — given what is permitted, it declares what is included. However, practical convention may deviate from the natural context level of information for reasons of simplicity. For example, a channel-to-acquisition-protocol mapping may be dataset-level by nature but passed at the sample level to minimise overhead.
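The channel-mapping example could look roughly like the sketch below, in which information that is dataset-level by nature is copied into every sample payload so that the receiver needs no persistent state. All class, field, and function names here (`DatasetContext`, `SamplePayload`, `channel_protocols`, `build_sample_payload`) are illustrative assumptions, not part of the proposed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetContext:
    dataset_id: str
    # Dataset-level by nature: one mapping shared by all samples.
    channel_protocols: dict[str, str]


@dataclass(frozen=True)
class SamplePayload:
    sample_id: str
    image_ref: str
    # Repeated in each sample by convention, so the payload is self-contained.
    channel_protocols: dict[str, str]


def build_sample_payload(
    ctx: DatasetContext, sample_id: str, image_ref: str
) -> SamplePayload:
    """Build a per-sample payload that carries its own copy of the mapping."""
    return SamplePayload(
        sample_id=sample_id,
        image_ref=image_ref,
        channel_protocols=dict(ctx.channel_protocols),  # copy, not a shared reference
    )
```

The trade-off in this sketch is a small amount of duplicated data per request in exchange for not having to store or look up dataset-level state on the receiving side.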

## Why does it matter?
Without a complete context contract, the schema cannot be correctly specified, and downstream architectural decisions — such as what requires persistent storage vs. stateless per-request passing — cannot be made reliably. This affects algorithm designers, front-end engineers, and evaluation framework maintainers.

## Any context, examples, or references?
- HAIG API Standardisation Proposal, Section 1.1 — Context and schema contract
- Related issue #1868 (semantic label dictionary)

## How would you like to be involved?
- [ ] I can contribute code or documentation
- [ ] I can test or provide feedback
- [ ] I want to follow the discussion
- [ ] I have other ideas or expertise to share: _____________


[HAI] Persistent vs Sample Level Manifest protocol #1904
