docs: document and stabilize token/cost field semantics#317

Draft
zhongxuanwang-nv wants to merge 1 commit into
NVIDIA:mainfrom
zhongxuanwang-nv:docs/relay-243-token-cost-semantics

@zhongxuanwang-nv zhongxuanwang-nv commented Jun 26, 2026

Member

Overview

Document and stabilize NeMo Relay's LLM token and cost field semantics (RELAY-243). This freezes the current behavior as a documented contract and locks it with characterization tests. There is no runtime behavior change.

  • I confirm this contribution is my own work, or I have the right to submit it under this project's license.
  • I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

Adds a canonical Token and Cost Field Semantics section to docs/integrate-into-frameworks/provider-response-codecs.mdx:

  • Usage and CostEstimate field reference (names, units, optionality).
  • Per-provider token normalization table (OpenAI Chat / OpenAI Responses / Anthropic → Usage).
  • Granularity (per call; only ATIF final_metrics aggregates), "missing ≠ zero", and "Relay does not convert currencies".
  • Exporter field-mapping table across ATOF / ATIF / OpenInference / OpenTelemetry, stated as intentional contract — OpenTelemetry is cost-only and currency-aware; ATIF and OpenInference are USD-only.
  • A stability subsection (stable as of ATOF 0.1 / ATIF-v1.7 / pricing catalog version: 1; additive-only) plus the documented limitations.
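The normalization and "missing ≠ zero" points above can be sketched in a few lines. This is a minimal illustration only, not Relay's actual types: `Usage`, `ProviderUsage`, and `normalize` are hypothetical names, and only the provider-side counter names (`prompt_tokens`/`completion_tokens` for OpenAI Chat, `input_tokens`/`output_tokens` for OpenAI Responses and Anthropic) come from the providers' public APIs. Whether Relay derives a total for Anthropic is not stated here, so the sketch leaves it unset.

```rust
/// Hypothetical normalized usage record; field names are illustrative only.
/// `None` means "the provider did not report this", which is distinct from `Some(0)`.
#[derive(Debug, Clone, PartialEq)]
struct Usage {
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
    total_tokens: Option<u64>,
}

/// Provider usage payloads, reduced to the counters used in this sketch.
enum ProviderUsage {
    /// OpenAI Chat Completions reports prompt/completion/total counters.
    OpenAiChat {
        prompt_tokens: Option<u64>,
        completion_tokens: Option<u64>,
        total_tokens: Option<u64>,
    },
    /// OpenAI Responses reports input/output/total counters.
    OpenAiResponses {
        input_tokens: Option<u64>,
        output_tokens: Option<u64>,
        total_tokens: Option<u64>,
    },
    /// Anthropic Messages reports input/output counters and no total.
    Anthropic {
        input_tokens: Option<u64>,
        output_tokens: Option<u64>,
    },
}

/// Map provider-specific names onto the common record without inventing values:
/// an absent counter stays `None` and is never coerced to zero.
fn normalize(p: ProviderUsage) -> Usage {
    match p {
        ProviderUsage::OpenAiChat { prompt_tokens, completion_tokens, total_tokens } => Usage {
            input_tokens: prompt_tokens,
            output_tokens: completion_tokens,
            total_tokens,
        },
        ProviderUsage::OpenAiResponses { input_tokens, output_tokens, total_tokens } => Usage {
            input_tokens,
            output_tokens,
            total_tokens,
        },
        // Assumption: no total is synthesized when the provider omits one.
        ProviderUsage::Anthropic { input_tokens, output_tokens } => Usage {
            input_tokens,
            output_tokens,
            total_tokens: None,
        },
    }
}
```

Using `Option<u64>` rather than a bare integer is what lets a consumer tell "not reported" apart from "reported as zero".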

Brief field pointers + back-links were added to the OpenTelemetry, OpenInference, and ATIF exporter pages. Cost policy is stated once on the canonical page, per the runtime-contract docs convention (projections do not redefine policy). A self-contained Known Issues entry documents that ATIF derives token/cost from the raw event payload rather than the codec annotation, so codec-only usage/cost appears in OpenTelemetry/OpenInference but not in ATIF; aligning ATIF is deferred to a follow-up.
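The Known Issues entry is easier to picture with a toy model. The sketch below is not Relay code: `LlmEndEvent`, `atif_usage`, and `annotation_usage` are hypothetical names that only illustrate the two sources described above, the raw event payload and the codec annotation.

```rust
/// Hypothetical usage record, reduced to two counters for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Usage {
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
}

/// Hypothetical LLM end event carrying both possible usage sources.
struct LlmEndEvent {
    /// Usage found directly in the raw provider payload, if any.
    raw_payload_usage: Option<Usage>,
    /// Usage attached by a response codec (the annotation), if any.
    annotated_usage: Option<Usage>,
}

/// Models the current ATIF behavior described above: read the raw payload only.
fn atif_usage(event: &LlmEndEvent) -> Option<Usage> {
    event.raw_payload_usage
}

/// Models exporters that read the codec annotation instead.
fn annotation_usage(event: &LlmEndEvent) -> Option<Usage> {
    event.annotated_usage
}
```

For an event where only the codec supplied usage, `annotation_usage` returns a value while `atif_usage` returns `None`, which is the gap the follow-up ticket is meant to close.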

Two characterization tests lock the freeze:

  • OpenTelemetry LLM end events emit cost only — exactly the two nemo_relay.llm.cost.* keys, with no token-count or gen_ai.* attributes.
  • Usage ignores unmodeled provider subfields (forward-compat: no serde catch-all).
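The shape of the cost-only lock can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the test in this PR: `check_cost_only` is a hypothetical helper, the attribute map stands in for whatever the exporter actually produces, and the "contains `token`" check is a heuristic chosen for the sketch. Only the `nemo_relay.llm.cost.` prefix, the count of two, and the `gen_ai.` prefix come from the description above.

```rust
use std::collections::BTreeMap;

/// Prefix shared by the cost attributes named in the PR description.
const COST_PREFIX: &str = "nemo_relay.llm.cost.";

/// Hypothetical helper: passes only when the attribute set carries exactly two
/// cost keys and no token-count or `gen_ai.*` attributes.
fn check_cost_only(attrs: &BTreeMap<String, String>) -> Result<(), String> {
    let cost_keys = attrs.keys().filter(|k| k.starts_with(COST_PREFIX)).count();
    if cost_keys != 2 {
        return Err(format!("expected 2 cost keys, found {cost_keys}"));
    }
    if let Some(k) = attrs.keys().find(|k| k.starts_with("gen_ai.")) {
        return Err(format!("unexpected gen_ai attribute: {k}"));
    }
    // Assumption: any key mentioning "token" counts as a token-count attribute.
    if let Some(k) = attrs.keys().find(|k| k.contains("token")) {
        return Err(format!("unexpected token attribute: {k}"));
    }
    Ok(())
}
```

A check of this kind fails as soon as an exporter change starts emitting a token attribute, which is the perturbation described under Testing.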

Existing tests already cover the remaining projections, per-provider mapping, reasoning-tokens-in-api_specific, and the USD-only/currency-aware cost behavior, so no duplicate tests were added.

Testing: targeted cargo test (both new tests pass; perturbing the OpenTelemetry exporter to emit a token attribute makes the cost-only test fail as intended, confirming the lock bites), just docs-linkcheck (0 errors), and pre-commit (SPDX, markdown linkcheck, cargo fmt/clippy/check) all pass.

Follow-up (separate ticket): align ATIF token/cost extraction with the codec-normalized annotated_response.usage so codec-only usage reaches ATIF step/final metrics; currently raw-output-sourced.

Where should the reviewer start?

docs/integrate-into-frameworks/provider-response-codecs.mdx — the new Token and Cost Field Semantics section (the exporter field-mapping table and the Stability subsection are the core contract). Then crates/core/tests/unit/observability/otel_tests.rs::llm_end_emits_cost_only_no_token_or_gen_ai_attributes.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Closes RELAY-243

Summary by CodeRabbit

  • Documentation

    • Clarified how token and cost values are represented across observability outputs.
    • Added a central reference for token/cost field meanings, including cache token handling and currency details.
    • Updated examples for ATIF, OpenInference, and OpenTelemetry exports.
  • Tests

    • Added coverage to ensure unknown usage fields are ignored while known token fields are preserved.
    • Added coverage to confirm OpenTelemetry emits cost attributes without token-count attributes.

Add a canonical "Token and Cost Field Semantics" section to the provider response codecs page: a Usage and CostEstimate field reference, the per-provider token normalization table, an exporter field-mapping table (ATOF/ATIF/OpenInference/OpenTelemetry), and a stability contract. Add brief field pointers from the OpenTelemetry, OpenInference, and ATIF exporter pages, and a Known Issues entry noting ATIF derives token/cost from the raw event payload rather than the codec annotation.

Lock the contract with two characterization tests: OpenTelemetry LLM end events emit cost only (no token-count or gen_ai attributes), and Usage ignores unmodeled provider subfields. Existing tests already cover the other projections and the USD-only/currency-aware cost behavior.

No runtime behavior change.

Signed-off-by: Zhongxuan Wang <daniewang@nvidia.com>

coderabbitai Bot commented Jun 26, 2026

Walkthrough

The PR adds tests and documentation for LLM token and cost field handling across codec normalization and observability exporters. It also records an ATIF limitation that derives metrics from raw event payloads instead of codec annotation.

Changes

Token and cost semantics

| Layer / File(s) | Summary |
| --- | --- |
| Usage and cost contract: docs/integrate-into-frameworks/provider-response-codecs.mdx, crates/core/tests/unit/codec/response_tests.rs | provider-response-codecs adds token/cost field semantics and cost-field definitions, and the new Usage codec test confirms unknown provider subfields are dropped on round-trip serialization. |
| OpenTelemetry emission: crates/core/tests/unit/observability/otel_tests.rs, docs/integrate-into-frameworks/provider-response-codecs.mdx | The new OTEL test asserts cost-only LLM end attributes for an unannotated response, and the exporter mapping section records how token and cost fields are projected across observability outputs. |
| Exporter docs and known issue: docs/observability-plugin/atif.mdx, docs/observability-plugin/openinference.mdx, docs/observability-plugin/opentelemetry.mdx, docs/about-nemo-relay/release-notes/known-issues.mdx | ATIF, OpenInference, and OpenTelemetry docs add expected output field names, and the release note records the ATIF raw-payload metric limitation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NeMo-Relay#291: Shares the Usage normalization and observability attribute emission path covered by this PR’s codec and OTEL tests.
  • NVIDIA/NeMo-Relay#300: Also changes OpenTelemetry LLM end-event emission and cost/token extraction.
  • NVIDIA/NeMo-Relay#304: Covers the same token/cost normalization and exporter mapping behavior described by these tests and docs.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title uses valid Conventional Commits format and accurately summarizes the documentation-focused change. |
| Description check | ✅ Passed | The description matches the template well, with overview, details, reviewer start, and related issue information filled in. |

github-actions Bot added the size:M (PR is medium), Documentation (documentation-related), and lang:rust (PR changes/introduces Rust code) labels Jun 26, 2026
zhongxuanwang-nv self-assigned this Jun 26, 2026
zhongxuanwang-nv added this to the 0.5 milestone Jun 26, 2026
zhongxuanwang-nv added the DO NOT MERGE (PR should not be merged; see PR for details) label Jun 26, 2026
@zhongxuanwang-nv

Member Author

@coderabbitai review

coderabbitai Bot commented Jun 26, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

willkill07 changed the title from "docs: document and stabilize token/cost field semantics (RELAY-243)" to "docs: document and stabilize token/cost field semantics" Jun 26, 2026

Labels

DO NOT MERGE (PR should not be merged; see PR for details), Documentation (documentation-related), lang:rust (PR changes/introduces Rust code), size:M (PR is medium)