Skip to content

refactor(telemetry)!: move backend tracing onto plugin/hook pattern#1181

Open
ajbozarth wants to merge 1 commit into
generative-computing:mainfrom
ajbozarth:feat/enhanced-tracing
Open

refactor(telemetry)!: move backend tracing onto plugin/hook pattern#1181
ajbozarth wants to merge 1 commit into
generative-computing:mainfrom
ajbozarth:feat/enhanced-tracing

Conversation

@ajbozarth
Copy link
Copy Markdown
Contributor

@ajbozarth ajbozarth commented Jun 1, 2026

Pull Request

Issue

Fixes #1045, fixes #1046, fixes #1047. Phase 1 of #444. Also closes out item 2 of #909.

Description

Phase 1 of the tracing epic. Migrates backend span emission from inline calls scattered across five backends onto a BackendTracingPlugin that subscribes to the existing generation_* hooks (chat path) and new generation_batch_* hooks (raw path). Spans stay live on the OTel context across the API call so nested instrumentation (httpx, langchain) parents under the backend span.

Reviewer call-outs:

  1. Breaking: env var rename. MELLEA_TRACE_*MELLEA_TRACES_* (plural), aligned with OTEL_EXPORTER_OTLP_TRACES_ENDPOINT and MELLEA_METRICS_*. Adds MELLEA_TRACES_ENABLED umbrella flag and opt-in MELLEA_TRACES_OTLP. No deprecation shim — tracing was still pre-1.0 surface.
  2. Breaking: gen_ai.system removed in favor of gen_ai.provider.name. Cleans up the dual emission introduced by feat(telemetry): close five OTel GenAI semantic convention emission gaps (#1035) #1036.
  3. Additive: new hook surface. generation_batch_pre_call / generation_batch_post_call / generation_batch_error for plugin authors. Raw path now emits one span per generate_from_raw call (per OTel GenAI semconv), not one per MOT.
  4. Architectural: spans now live on the OTel context during API calls. Chat and raw paths both keep the backend span active across the network call, so nested instrumentation parents under it instead of floating at the root.
  5. Rebase artifact: ModelOutputThunk.cancel_generation (added upstream by feat(stdlib): add stream_with_chunking() with per-chunk validation (#901) #942 in 9e8a9636) had its body rewritten to fire GENERATION_ERROR instead of calling the removed _telemetry_span API. Reviewers may notice as "extra surface", but it's rebase-driven, not refactor-driven.

BREAKING CHANGE: Public telemetry API surface narrowed:

  • MELLEA_TRACE_* env vars renamed to plural MELLEA_TRACES_* with no deprecation shim.
  • The deprecated gen_ai.system span attribute is removed; consumers should read gen_ai.provider.name instead.
  • The split helpers is_application_tracing_enabled() and is_backend_tracing_enabled() are replaced by a single is_tracing_enabled().
  • The application-tracing helpers add_span_event, start_backend_span, end_backend_span, and trace_backend are removed from mellea.telemetry. Backend spans are now emitted by the BackendTracingPlugin automatically; application spans use trace_application (unchanged).
  • The mellea.telemetry.backend_instrumentation module is deleted along with its exports (start_generate_span, instrument_generate_from_raw, record_token_usage, record_response_metadata, finalize_backend_span).

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

Phase 1 of the tracing epic: migrates backend span emission from inline
calls scattered across five backends onto a BackendTracingPlugin that
subscribes to the existing generation_* hooks for chat and new
generation_batch_* hooks for raw. Spans stay live on the OTel context
across the API call so nested instrumentation parents under them.

- Renames MELLEA_TRACE_* env vars to plural MELLEA_TRACES_* and
  introduces MELLEA_TRACES_ENABLED umbrella flag, opt-in
  MELLEA_TRACES_OTLP, and signal-specific
  OTEL_EXPORTER_OTLP_TRACES_ENDPOINT. No deprecation shim.
- Drops deprecated gen_ai.system attribute in favor of
  gen_ai.provider.name.
- Prefixes application-span attributes with mellea. for consistency
  with backend spans and the existing logging/metrics pillars.
- Eagerly initialises the tracer provider and registers the
  BackendTracingPlugin at module import when MELLEA_TRACES_ENABLED is
  truthy. Tests reset module state and call _setup_tracing() to re-init
  after env-var changes, removing the need for importlib.reload.
- Adds generation_batch_pre_call/post_call/error hook types so raw-path
  emits one span per generate_from_raw call (matching OTel GenAI
  semconv) rather than one per MOT.
- Deletes mellea/telemetry/backend_instrumentation.py; backends no
  longer import from mellea.telemetry (except .context).
- Removes the _telemetry_span round-trip on mot._meta and the
  tracing-specific error block in core/base.py; error-path span
  closure lives in the plugin's generation_error hook.

Closes generative-computing#1045, generative-computing#1046, generative-computing#1047 (Phase 1 of generative-computing#444).

BREAKING CHANGE: Public telemetry API surface narrowed:

- MELLEA_TRACE_* env vars renamed to plural MELLEA_TRACES_* with no
  deprecation shim.
- The deprecated gen_ai.system span attribute is removed; consumers
  should read gen_ai.provider.name instead.
- The split helpers is_application_tracing_enabled() and
  is_backend_tracing_enabled() are replaced by a single
  is_tracing_enabled().
- The application-tracing helpers add_span_event, start_backend_span,
  end_backend_span, and trace_backend are removed from
  mellea.telemetry. Backend spans are now emitted by the
  BackendTracingPlugin automatically; application spans use
  trace_application (unchanged).
- The mellea.telemetry.backend_instrumentation module is deleted along
  with its exports (start_generate_span, instrument_generate_from_raw,
  record_token_usage, record_response_metadata, finalize_backend_span).

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
@ajbozarth
Copy link
Copy Markdown
Contributor Author

Follow-up work spawned by this PR (not part of the tracing epic):

@ajbozarth
Copy link
Copy Markdown
Contributor Author

I also opened #1180 which fixes a bug in metrics that also applies here. (ie that needs to merge first)

Copy link
Copy Markdown
Contributor

@jakelorocco jakelorocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! I just have a few questions and a few places I think extra comments would be helpful.

try:
# Chat path is single-action; sequences[0] is the only sequence.
last_token = hf_output.sequences[0][-1].item()
eos = self._tokenizer.eos_token_id
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically don't we allow the user to specify stop sequences as well through model options? Should we be grabbing those as well?

Comment thread mellea/core/base.py
self._start: datetime.datetime | None = None
self._first_chunk_received: bool = False
self._generate_log: GenerateLog | None = None
self._generation_id: str | None = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment here that this is different than the response_id in GenerationMetadata? I think they are similar enough that we should add commentary.

Comment on lines +49 to +51
generation_id: Correlation identifier matching the corresponding
pre_call payload's `generation_id` for the same request, or
`None` when the firing site did not generate one.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add an explanation about generation_id being the Mellea identifier (and not the provider identifier) here as well please?

Comment on lines +45 to +46
@hook("generation_pre_call")
async def on_pre_call(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a note explaining why pre_call isn't fire and forget but the others are?

Comment on lines +74 to +76
finish_backend_span_success(
payload.generation_id, operation="chat", usage=gen.usage, mot=mot, gen=gen
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if someone only registers the start_backend_span function and these finish span functions never run?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

2 participants