docs: document and stabilize token/cost field semantics by zhongxuanwang-nv · Pull Request #317 · NVIDIA/NeMo-Relay

zhongxuanwang-nv · 2026-06-26T00:46:52Z

Overview

Document and stabilize NeMo Relay's LLM token and cost field semantics (RELAY-243). This freezes the current behavior as a documented contract and locks it with characterization tests. There is no runtime behavior change.

I confirm this contribution is my own work, or I have the right to submit it under this project's license.
I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

Adds a canonical Token and Cost Field Semantics section to docs/integrate-into-frameworks/provider-response-codecs.mdx:

- Usage and CostEstimate field reference (names, units, optionality).
- Per-provider token normalization table (OpenAI Chat / OpenAI Responses / Anthropic → Usage).
- Granularity (per call; only ATIF final_metrics aggregates), "missing ≠ zero", and "Relay does not convert currencies".
- Exporter field-mapping table across ATOF / ATIF / OpenInference / OpenTelemetry, stated as an intentional contract: OpenTelemetry is cost-only and currency-aware; ATIF and OpenInference are USD-only.
- A stability subsection (stable as of ATOF 0.1 / ATIF-v1.7 / pricing catalog version: 1; additive-only) plus the documented limitations.
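For readers who want the normalization and "missing ≠ zero" rules in concrete form, here is a minimal sketch. The `Usage` field names, the `ProviderUsage` enum, and both helper functions are hypothetical and only illustrate the idea; the provider-side names (`prompt_tokens`, `completion_tokens`, `input_tokens`, `output_tokens`, `cache_read_input_tokens`) are the ones the provider APIs report. The real field names and units are in the docs page.

```rust
/// Hypothetical normalized record (illustrative names, not Relay's actual `Usage`).
/// Every counter is optional: `None` means "not reported", never zero.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
    pub cache_read_tokens: Option<u64>,
}

/// Provider-reported counters, keyed by the names each API uses.
pub enum ProviderUsage {
    OpenAiChat { prompt_tokens: Option<u64>, completion_tokens: Option<u64> },
    OpenAiResponses { input_tokens: Option<u64>, output_tokens: Option<u64> },
    Anthropic {
        input_tokens: Option<u64>,
        output_tokens: Option<u64>,
        cache_read_input_tokens: Option<u64>,
    },
}

/// Map provider counters onto the normalized record without inventing values.
pub fn normalize(provider: &ProviderUsage) -> Usage {
    match *provider {
        ProviderUsage::OpenAiChat { prompt_tokens, completion_tokens } => Usage {
            input_tokens: prompt_tokens,
            output_tokens: completion_tokens,
            cache_read_tokens: None,
        },
        ProviderUsage::OpenAiResponses { input_tokens, output_tokens } => Usage {
            input_tokens,
            output_tokens,
            cache_read_tokens: None,
        },
        ProviderUsage::Anthropic { input_tokens, output_tokens, cache_read_input_tokens } => Usage {
            input_tokens,
            output_tokens,
            cache_read_tokens: cache_read_input_tokens,
        },
    }
}

/// Sum per-call values; the result stays `None` when no call reported the field.
pub fn sum_reported<I: IntoIterator<Item = Option<u64>>>(values: I) -> Option<u64> {
    values
        .into_iter()
        .flatten()
        .fold(None, |acc, v| Some(acc.unwrap_or(0) + v))
}
```

Keeping every counter as `Option<u64>` lets an aggregate distinguish "no call reported this field" from "the calls reported zero", which is the distinction "missing ≠ zero" is meant to protect.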

Brief field pointers + back-links were added to the OpenTelemetry, OpenInference, and ATIF exporter pages. Cost policy is stated once on the canonical page, per the runtime-contract docs convention (projections do not redefine policy). A self-contained Known Issues entry documents that ATIF derives token/cost from the raw event payload rather than the codec annotation, so codec-only usage/cost appears in OpenTelemetry/OpenInference but not in ATIF; aligning ATIF is deferred to a follow-up.
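To make the cost policy concrete, the sketch below contrasts a currency-aware projection with a USD-only one. The `CostEstimate` shape and both function names are hypothetical. The choice to omit a non-USD cost from a USD-only output is an assumption made for illustration (the canonical page is the authority on what ATIF and OpenInference actually do); the only rule taken from this PR is that Relay never converts currencies.

```rust
/// Hypothetical cost record (illustrative names, not Relay's actual `CostEstimate`).
#[derive(Debug, Clone, PartialEq)]
pub struct CostEstimate {
    pub total: f64,
    /// Currency code such as "USD" or "EUR".
    pub currency: String,
}

/// Currency-aware projection: pass the amount and its currency through unchanged.
pub fn project_currency_aware(cost: &CostEstimate) -> (f64, String) {
    (cost.total, cost.currency.clone())
}

/// USD-only projection. ASSUMPTION: a non-USD cost is omitted rather than
/// converted, because no exchange rate is ever applied.
pub fn project_usd_only(cost: &CostEstimate) -> Option<f64> {
    if cost.currency.eq_ignore_ascii_case("USD") {
        Some(cost.total)
    } else {
        None
    }
}
```

Under this assumption, a EUR-priced call would still carry its cost in a currency-aware output, while a USD-only output would show no cost at all instead of a silently converted number.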

Two characterization tests lock the freeze:

- OpenTelemetry LLM end events emit cost only: exactly the two nemo_relay.llm.cost.* keys, with no token-count or gen_ai.* attributes.
- Usage ignores unmodeled provider subfields (forward-compat: no serde catch-all).
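The shape of the cost-only lock can be sketched as a plain check over attribute keys. This is not the test in `otel_tests.rs`; `check_cost_only` is a hypothetical helper, and the key suffixes used in the usage note (`total`, `currency`) are placeholders, since this description only names the `nemo_relay.llm.cost.` prefix.

```rust
/// Prefix shared by the cost attributes named in this PR.
const COST_PREFIX: &str = "nemo_relay.llm.cost.";

/// Hypothetical helper: accept an attribute set only if it consists of
/// exactly two keys and every key carries the cost prefix.
pub fn check_cost_only(keys: &[&str]) -> Result<(), String> {
    for key in keys {
        if !key.starts_with(COST_PREFIX) {
            // Token counts and `gen_ai.*` attributes are rejected here.
            return Err(format!("unexpected attribute: {}", key));
        }
    }
    if keys.len() != 2 {
        return Err(format!("expected 2 cost attributes, found {}", keys.len()));
    }
    Ok(())
}
```

With placeholder keys, `check_cost_only(&["nemo_relay.llm.cost.total", "nemo_relay.llm.cost.currency"])` passes, while adding `gen_ai.usage.input_tokens` to the slice fails. That mirrors the perturbation described under Testing: emitting any token attribute should break the test.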

Existing tests already cover the remaining projections, per-provider mapping, reasoning-tokens-in-api_specific, and the USD-only/currency-aware cost behavior, so no duplicate tests were added.

Testing: targeted cargo test (both new tests pass; perturbing the OpenTelemetry exporter to emit a token attribute makes the cost-only test fail as intended, confirming the lock bites), just docs-linkcheck (0 errors), and pre-commit (SPDX, markdown linkcheck, cargo fmt/clippy/check) all pass.

Follow-up (separate ticket): align ATIF token/cost extraction with the codec-normalized annotated_response.usage so codec-only usage reaches ATIF step/final metrics; currently raw-output-sourced.
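The gap the follow-up targets can be illustrated with a small sketch. All names here are hypothetical (`LlmEndEvent`, `raw_usage`, `annotated_usage`, and the three functions), and the fallback to the raw payload in the "aligned" variant is an assumption; the follow-up ticket will decide the actual precedence.

```rust
/// Hypothetical usage counters (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
}

/// Hypothetical LLM end event carrying two possible usage sources.
pub struct LlmEndEvent {
    /// Usage found in the raw provider output, if the payload carries it.
    pub raw_usage: Option<Usage>,
    /// Usage attached by the response codec (the normalized annotation).
    pub annotated_usage: Option<Usage>,
}

/// Current ATIF-style behavior as described: only the raw payload is consulted.
pub fn usage_raw_sourced(event: &LlmEndEvent) -> Option<Usage> {
    event.raw_usage
}

/// OpenTelemetry/OpenInference-style behavior as described: the annotation is used.
pub fn usage_annotation_sourced(event: &LlmEndEvent) -> Option<Usage> {
    event.annotated_usage
}

/// One possible aligned behavior. ASSUMPTION: prefer the annotation and
/// fall back to the raw payload when no annotation exists.
pub fn usage_aligned(event: &LlmEndEvent) -> Option<Usage> {
    event.annotated_usage.or(event.raw_usage)
}
```

For an event whose usage exists only in the codec annotation, the raw-sourced path returns `None` while the other two return the counters, which is the codec-only case the Known Issues entry describes.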

Where should the reviewer start?

docs/integrate-into-frameworks/provider-response-codecs.mdx — the new Token and Cost Field Semantics section (the exporter field-mapping table and the Stability subsection are the core contract). Then crates/core/tests/unit/observability/otel_tests.rs::llm_end_emits_cost_only_no_token_or_gen_ai_attributes.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Closes RELAY-243

Summary by CodeRabbit

Documentation
- Clarified how token and cost values are represented across observability outputs.
- Added a central reference for token/cost field meanings, including cache token handling and currency details.
- Updated examples for ATIF, OpenInference, and OpenTelemetry exports.
Tests
- Added coverage to ensure unknown usage fields are ignored while known token fields are preserved.
- Added coverage to confirm OpenTelemetry emits cost attributes without token-count attributes.

Add a canonical "Token and Cost Field Semantics" section to the provider response codecs page: a Usage and CostEstimate field reference, the per-provider token normalization table, an exporter field-mapping table (ATOF/ATIF/OpenInference/OpenTelemetry), and a stability contract.

Add brief field pointers from the OpenTelemetry, OpenInference, and ATIF exporter pages, and a Known Issues entry noting ATIF derives token/cost from the raw event payload rather than the codec annotation.

Lock the contract with two characterization tests: OpenTelemetry LLM end events emit cost only (no token-count or gen_ai attributes), and Usage ignores unmodeled provider subfields. Existing tests already cover the other projections and the USD-only/currency-aware cost behavior. No runtime behavior change.

Signed-off-by: Zhongxuan Wang <daniewang@nvidia.com>

coderabbitai · 2026-06-26T00:47:01Z

Walkthrough

The PR adds tests and documentation for LLM token and cost field handling across codec normalization and observability exporters. It also records an ATIF limitation: ATIF derives metrics from the raw event payload instead of the codec annotation.

Changes

Token and cost semantics

| Layer / File(s) | Summary |
| --- | --- |
| Usage and cost contract `docs/integrate-into-frameworks/provider-response-codecs.mdx`, `crates/core/tests/unit/codec/response_tests.rs` | `provider-response-codecs` adds token/cost field semantics and cost-field definitions, and the new `Usage` codec test confirms unknown provider subfields are dropped on round-trip serialization. |
| OpenTelemetry emission `crates/core/tests/unit/observability/otel_tests.rs`, `docs/integrate-into-frameworks/provider-response-codecs.mdx` | The new OTEL test asserts cost-only LLM end attributes for an unannotated response, and the exporter mapping section records how token and cost fields are projected across observability outputs. |
| Exporter docs and known issue `docs/observability-plugin/atif.mdx`, `docs/observability-plugin/openinference.mdx`, `docs/observability-plugin/opentelemetry.mdx`, `docs/about-nemo-relay/release-notes/known-issues.mdx` | ATIF, OpenInference, and OpenTelemetry docs add expected output field names, and the release note records the ATIF raw-payload metric limitation. |
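The round-trip behavior in the first row can be sketched without any dependencies. The actual test in `response_tests.rs` presumably goes through serde; this stand-in uses plain key/value pairs instead, and the `Usage` fields, both function names, and the `future_audio_tokens` key are hypothetical.

```rust
/// Hypothetical usage record with only the modeled counters.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
}

/// Parse provider fields, keeping modeled counters and ignoring the rest.
pub fn usage_from_fields(fields: &[(&str, u64)]) -> Usage {
    let mut usage = Usage::default();
    for &(name, value) in fields {
        match name {
            "input_tokens" => usage.input_tokens = Some(value),
            "output_tokens" => usage.output_tokens = Some(value),
            // Unmodeled subfield: there is no catch-all map, so it is dropped.
            _ => {}
        }
    }
    usage
}

/// Serialize back to fields; only modeled, reported counters are emitted.
pub fn usage_to_fields(usage: &Usage) -> Vec<(&'static str, u64)> {
    let mut out = Vec::new();
    if let Some(v) = usage.input_tokens {
        out.push(("input_tokens", v));
    }
    if let Some(v) = usage.output_tokens {
        out.push(("output_tokens", v));
    }
    out
}
```

Parsing a payload that carries an extra `future_audio_tokens` entry and serializing the result yields only the two known counters, so a new provider subfield cannot leak into downstream outputs until it is modeled explicitly.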

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

- NVIDIA/NeMo-Relay#291: Shares the Usage normalization and observability attribute emission path covered by this PR’s codec and OTEL tests.
- NVIDIA/NeMo-Relay#300: Also changes OpenTelemetry LLM end-event emission and cost/token extraction.
- NVIDIA/NeMo-Relay#304: Covers the same token/cost normalization and exporter mapping behavior described by these tests and docs.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title uses valid Conventional Commits format and accurately summarizes the documentation-focused change. |
| Description check | ✅ Passed | The description matches the template well, with overview, details, reviewer start, and related issue information filled in. |

github-actions · 2026-06-26T00:51:10Z

Fern docs preview: https://nvidia-preview-pull-request-317.docs.buildwithfern.com/nemo/relay

zhongxuanwang-nv · 2026-06-26T01:12:59Z

@coderabbitai review

coderabbitai · 2026-06-26T01:13:03Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

github-actions Bot added the size:M (PR is medium), Documentation (documentation-related), and lang:rust (PR changes/introduces Rust code) labels Jun 26, 2026

copy-pr-bot Bot temporarily deployed to fern June 26, 2026 00:47 Inactive

zhongxuanwang-nv self-assigned this Jun 26, 2026

zhongxuanwang-nv added this to the 0.5 milestone Jun 26, 2026

zhongxuanwang-nv added the DO NOT MERGE (PR should not be merged; see PR for details) label Jun 26, 2026

willkill07 changed the title ~~docs: document and stabilize token/cost field semantics (RELAY-243)~~ docs: document and stabilize token/cost field semantics Jun 26, 2026

zhongxuanwang-nv wants to merge 1 commit into NVIDIA:main from zhongxuanwang-nv:docs/relay-243-token-cost-semantics
