Skip to content

feat(otel): Implement distributed tracing#3218

Open
logaretm wants to merge 4 commits intoredis:masterfrom
logaretm:feat/otel-tracing
Open

feat(otel): Implement distributed tracing#3218
logaretm wants to merge 4 commits intoredis:masterfrom
logaretm:feat/otel-tracing

Conversation

@logaretm
Copy link
Copy Markdown
Contributor

@logaretm logaretm commented Apr 2, 2026

This PR builds on #3195 by adding OTEL traces implementation by consuming the events emitted by the tracing channels, you can test it out with a simple tool I built.

CleanShot 2026-04-02 at 13 06 43@2x

I followed the implementation spirit of the OTEL metrics, so it gets enabled with a config and works in a very similar manner to metrics, I added the neccessary tests and documentations as well.

I tested a OTEL exporter app and it produced the spans you see above when running:

OpenTelemetry.init({
  tracing: {
    enabled: true,
    enableConnectionSpans: true,
  },
});

await tracer.startActiveSpan("api.handleRequest", async (outerSpan) => {
  await client.set("req:counter", "0");

  await tracer.startActiveSpan("api.authenticate", async (authSpan) => {
    await client.get("user:1:name"); // simulate auth lookup
    await client.set("session:abc", "user:1");
    await client.expire("session:abc", 3600);
    authSpan.end();
  });

  await tracer.startActiveSpan("api.fetchData", async (dataSpan) => {
    await client.incr("req:counter");

    await tracer.startActiveSpan("api.fetchData.cache", async (cacheSpan) => {
      await client.get("cache:products");
      // cache miss, set it
      await client.set("cache:products", JSON.stringify(["widget", "gadget"]));
      await client.expire("cache:products", 300);
      cacheSpan.end();
    });

    dataSpan.end();
  });

  outerSpan.end();
});

Note

Medium Risk
Adds new tracing instrumentation hooked into diagnostics_channel TracingChannels and relies on OpenTelemetry internal AsyncLocalStorage binding for context propagation, which could affect span correctness/performance in opt-in paths.

Overview
Adds OpenTelemetry tracing support alongside existing metrics by introducing OTelTracing and wiring it into OpenTelemetry.init({ tracing: ... }) with configurable command filtering, optional connection spans, and span error attribution.

Implements span creation for commands, batch operations (MULTI/PIPELINE) with parent/child relationships, and optional connect() spans, plus a new test suite validating span attributes and lifecycle.

Updates docs and examples to cover tracing (docs/otel-tracing.md, examples/otel-tracing.js) and refreshes package dev deps/lockfile to include tracing SDK/context packages.

Written by Cursor Bugbot for commit 4337ded. This will update automatically on new commits. Configure here.

logaretm and others added 4 commits April 2, 2026 12:44
Adds built-in OpenTelemetry distributed tracing (spans) alongside the
existing metrics implementation. Spans are created by subscribing to the
same TracingChannels that metrics use, keeping the core code unchanged.

Context propagation uses the bindStore hack on OTel's internal
AsyncLocalStorage so that parent-child span relationships work
automatically (e.g. MULTI > individual commands).

Span coverage:
- Command spans (SET, GET, etc.) with db.query.text, db.namespace
- Batch spans (MULTI/PIPELINE) with db.operation.batch.size
- Connection + connection wait spans (opt-in via config)
- Error attributes: error.type, db.response.status_code

All attributes follow OTel semantic conventions for database spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 2, 2026 17:15
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds OpenTelemetry distributed tracing support to the @redis/client package by subscribing to the existing Diagnostics Channel TracingChannel events (commands, batches, and optionally connect), alongside docs/tests/examples to validate and demonstrate span output.

Changes:

  • Introduces OTelTracing implementation that creates OTel spans from node-redis tracing channels (command, batch, connect).
  • Extends OTel config/types to include TracingConfig and adds new semantic attribute (db.operation.batch.size).
  • Adds tracing documentation + runnable example, and a new test suite validating span creation, error recording, and parent/child relationships.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
README.md Updates OpenTelemetry section to include tracing and links to tracing docs/examples.
packages/client/package.json Adds dev dependencies needed for tracing tests/examples (context manager + trace SDK base).
packages/client/lib/opentelemetry/types.ts Adds TracingConfig and db.operation.batch.size attribute constant.
packages/client/lib/opentelemetry/tracing.ts New tracing subscriber that creates spans from Diagnostics Channel TracingChannel.
packages/client/lib/opentelemetry/tracing.spec.ts New tests covering command/batch/connect spans, filtering, and context propagation.
packages/client/lib/opentelemetry/index.ts Wires tracing into OpenTelemetry.init when enabled and exports tracing types/class.
package-lock.json Lockfile updates for added dependencies.
examples/README.md Adds otel-tracing.js to examples list.
examples/otel-tracing.js New runnable tracing example.
docs/otel-tracing.md New tracing documentation and configuration reference.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +88 to +98
static init({
api,
config,
}: {
api: typeof import('@opentelemetry/api');
config?: TracingConfig;
}) {
if (OTelTracing.#initialized) return;
OTelTracing.#instance = new OTelTracing(api, config);
OTelTracing.#initialized = true;
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TracingConfig.enabled is not respected inside OTelTracing.init/constructor (spans will be subscribed/created even if someone calls OTelTracing.init({ ..., config: { enabled: false } })). Either enforce the flag in OTelTracing.init (return early when disabled) or remove enabled from TracingConfig to avoid a misleading configuration surface.

Copilot uses AI. Check for mistakes.
Comment on lines +137 to +147
if (this.#otelStorage) {
// Bind OTel's ALS to the start channel for context propagation
(tc as any).start.bindStore(this.#otelStorage, (data: WithSpan & any) => {
const desc = onStart(data);
if (!desc) return context.active();

const parentCtx = context.active();
const span = this.#tracer.startSpan(desc.name, desc.options, parentCtx);
data[SPAN_KEY] = span;
return trace.setSpan(parentCtx, span);
});
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tc.start.bindStore / unbindStore are invoked via any without checking they exist. On Node versions where TracingChannel exists but bindStore/unbindStore are unavailable or have different semantics, this will throw during initialization/teardown. Add a runtime guard (e.g., check for functions before calling) and fall back to the WeakMap approach when binding isn’t supported.

Copilot uses AI. Check for mistakes.
Comment on lines +151 to +156
ctx[SPAN_KEY]?.end();
},
error: (ctx: WithSpan & { error?: Error }) => {
const span = ctx[SPAN_KEY];
if (span) {
this.#endSpanWithError(span, ctx.error!);
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the bound-store path, error uses ctx.error! and asyncEnd/error leave the span attached to the context object. If ctx.error is unexpectedly missing this will throw, and retaining the ended span on ctx can keep references alive longer than needed. Prefer guarding when error is falsy and deleting ctx[SPAN_KEY] after ending to reduce retention.

Suggested change
ctx[SPAN_KEY]?.end();
},
error: (ctx: WithSpan & { error?: Error }) => {
const span = ctx[SPAN_KEY];
if (span) {
this.#endSpanWithError(span, ctx.error!);
const span = ctx[SPAN_KEY];
if (span) {
span.end();
delete ctx[SPAN_KEY];
}
},
error: (ctx: WithSpan & { error?: Error }) => {
const span = ctx[SPAN_KEY];
if (span) {
const error = ctx.error;
if (error) {
this.#endSpanWithError(span, error);
} else {
span.end();
}
delete ctx[SPAN_KEY];

Copilot uses AI. Check for mistakes.
],
});

context.setGlobalContextManager(new AsyncLocalStorageContextManager());
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsyncLocalStorageContextManager needs to be enabled (e.g. new AsyncLocalStorageContextManager().enable()) before setting it as the global context manager; otherwise context propagation (and therefore parent/child span relationships) may not work as documented.

Suggested change
context.setGlobalContextManager(new AsyncLocalStorageContextManager());
const contextManager = new AsyncLocalStorageContextManager().enable();
context.setGlobalContextManager(contextManager);

Copilot uses AI. Check for mistakes.
],
});

context.setGlobalContextManager(new AsyncLocalStorageContextManager());
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AsyncLocalStorageContextManager snippet is missing .enable(). Without enabling, OpenTelemetry context propagation can be disabled, so Redis spans may not become children of active application spans as described later in this doc.

Suggested change
context.setGlobalContextManager(new AsyncLocalStorageContextManager());
context.setGlobalContextManager(
new AsyncLocalStorageContextManager().enable()
);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

if (span) {
this.#endSpanWithError(span, ctx.error!);
}
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span ended twice on error in ALS path

Medium Severity

In the ALS code path, when a traced operation rejects, Node.js TracingChannel.tracePromise publishes both the error and asyncEnd events. The error handler calls #endSpanWithError, which calls span.end(), and then the asyncEnd handler calls ctx[SPAN_KEY]?.end() again — double-ending the span. The WeakMap fallback path correctly avoids this by calling spans.delete(ctx) in the error handler, but the ALS path never clears ctx[SPAN_KEY]. This produces an OTel SDK warning ("Can not end a Span that has already been ended") for every errored Redis command.

Additional Locations (1)
Fix in Cursor Fix in Web

includeCommands: Record<string, true>;
excludeCommands: Record<string, true>;
hasIncludeCommands: boolean;
hasExcludeCommands: boolean;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field hasExcludeCommands is assigned but never read

Low Severity

hasExcludeCommands is declared in TracingOptions, computed and assigned in the constructor, but never read anywhere. The #isCommandExcluded method only uses hasIncludeCommands, includeCommands, and excludeCommands — the hasExcludeCommands shortcut is dead code.

Additional Locations (1)
Fix in Cursor Fix in Web

@PavelPashov
Copy link
Copy Markdown
Contributor

@logaretm thank you, I'm gonna take a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants