Skip to content

Add sourcemap upload failure metrics#392

Open
watson wants to merge 2 commits into
masterfrom
watson/DEBUG-5301/add-upload-metrics
Open

Add sourcemap upload failure metrics#392
watson wants to merge 2 commits into
masterfrom
watson/DEBUG-5301/add-upload-metrics

Conversation

@watson

@watson watson commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

What and why?

Add opt-in Datadog metrics for Error Tracking sourcemap upload retries and final upload failures. These metrics make transient sourcemap intake issues and exhausted retry failures observable without parsing CI logs.

The metrics are emitted only when the existing metrics plugin is enabled, so customers using Error Tracking without build metrics keep the same default experience.

How?

Sourcemap uploads record retry and failure counts in memory during each upload phase, then add those counts to a shared internal metrics collector on the global context.

The metrics plugin owns the actual metrics pipeline: it includes collected upload metrics in the existing metrics: {} payload, applies the normal filtering, prefixing, tags, custom metrics hook, and Datadog send behavior, and awaits the send. It also flushes any metrics collected late at true end so fallback sourcemap uploads keep the same awaited behavior.

Metrics added before the metrics plugin prefix is applied:

  • sourcemaps.upload.retry
  • sourcemaps.upload.failure

Tags include bundler, plugin version, service, site, status code, error type, and CI job context when available.

Testing

  • yarn workspace @dd/metrics-plugin typecheck
  • yarn workspace @dd/error-tracking-plugin typecheck
  • yarn test:unit packages/plugins/error-tracking/src/sourcemaps/sender.test.ts packages/plugins/error-tracking/src/index.test.ts packages/plugins/metrics/src/index.test.ts

@watson watson requested a review from yoannmoinet as a code owner June 2, 2026 04:44

watson commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@watson watson force-pushed the watson/DEBUG-5301/add-upload-metrics branch from e8766cd to 4376beb Compare June 2, 2026 04:51
Add metrics for Error Tracking sourcemap upload retries and final upload
failures. These make intake instability observable without parsing CI logs.

Emit the metrics only when the existing metrics plugin is enabled, preserving the
default behavior for customers who use Error Tracking without build metrics.

Batch metric emission after each upload phase and keep metric publishing
non-fatal so sourcemap uploads are not affected by telemetry failures.

Include tags for bundler, plugin version, service, site, status code, error type,
and CI job context when available.
@watson watson force-pushed the watson/DEBUG-5301/add-upload-metrics branch from 4376beb to 6a47213 Compare June 2, 2026 04:56
@watson watson requested a review from a team June 2, 2026 08:49
@watson watson requested a review from a team as a code owner June 3, 2026 12:08
@datadog-prod-us1-5

datadog-prod-us1-5 Bot commented Jun 3, 2026

Copy link
Copy Markdown

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: d1d3c15 | Docs | Datadog PR Page | Give us feedback!

Replace the sourcemap-specific metrics sender with a shared internal metrics
collector exposed on the global context. This lets error-tracking add upload
retry and failure metrics while the metrics plugin still owns filtering,
prefixing, hooks, and sending.

Flush any collected metrics at true end so late sourcemap uploads keep the old
awaited send behavior.
@watson watson force-pushed the watson/DEBUG-5301/add-upload-metrics branch from 2b943c4 to d1d3c15 Compare June 3, 2026 12:27
Comment on lines +50 to +57
`bundler:${context.bundlerName}`,
`plugin_version:${context.version}`,
`service:${options.service}`,
`site:${context.site}`,
...(process.env.CI_JOB_NAME ? [`jobname:${normalizeTagValue(process.env.CI_JOB_NAME)}`] : []),
...(process.env.BRANCH_TYPE
? [`branchtype:${normalizeTagValue(process.env.BRANCH_TYPE)}`]
: []),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like these should be default for all metrics we send through the metrics plugin.
Or at least, it feels weird to me to have different metrics definition between base metrics and these new metrics.
It should be made through an unified pipeline.

Also, CI_JOB_NAME and BRANCH_TYPE are pretty custom to be included here.
Aren't all of these already handled on the user's side (web-ui)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of these helpers are only used for sourcemaps metrics but are made in a generic way, as-in are not really sourcemaps specific.

They should instead be part of the main metrics pipeline we already have.
So all the metrics we send can benefit from it.

Comment on lines -72 to +113
const allMetrics = new Set([...universalMetrics, ...pluginMetrics, ...loaderMetrics]);
const allMetrics = new Set([
...universalMetrics,
...pluginMetrics,
...loaderMetrics,
...stores.metrics,
]);
stores.metrics.clear();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should change this part, and only rely on the "flush" mechanism for these metrics added at build time.

Comment on lines +173 to +175
async asyncTrueEnd() {
await sendCollectedMetrics();
},

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great to have this flush mechanism here.
But it reveals a design flaw in how we run the asyncTrueEnd hook.

Right now, the sourcemaps are also uploaded part of the asyncTrueEnd hook, so we may end up with a race condition as there is no enforced order during the trueEnd hooks.

Maybe we could introduce a new flush hook that would execute after the asyncTrueEnd/syncTrueEnd hooks to ensure there is no race condition for this specific use case.

It conflicts with the trueEnd semantics though 😔 but I don't really have a better idea for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants