
Reduce allocation rate when marshaling OTLP data #11296

Open
mcculls wants to merge 1 commit into master from mcculls/otlp-buffer-performance

Conversation

@mcculls mcculls (Contributor) commented May 6, 2026

What Does This Do

Marshals proto messages into a single prepending buffer instead of many small byte arrays, and moves the decision of whether to export a span into OtlpTraceCollector (avoiding re-allocations).
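For readers unfamiliar with the technique: a prepending buffer writes bytes back to front, so by the time a nested protobuf message has been written, its length is already known and the length-delimited header can be prepended directly. The sketch below illustrates the general idea only; PrependingBuffer and all method names here are hypothetical, not the classes introduced by this PR.

```java
// Illustrative sketch only — hypothetical names, not the PR's actual buffer class.
// Bytes are written back-to-front, so a nested message's length prefix can be
// written as soon as its body is in place, with no temporary per-message arrays.
import java.util.Arrays;

final class PrependingBuffer {
  private byte[] buf = new byte[256];
  private int pos = buf.length; // write cursor moves backwards

  void prependBytes(byte[] bytes) {
    ensure(bytes.length);
    pos -= bytes.length;
    System.arraycopy(bytes, 0, buf, pos, bytes.length);
  }

  // Varints are emitted forward into a reserved region, since their
  // least-significant 7-bit group comes first on the wire.
  void prependVarint(long value) {
    int groups = 1;
    for (long v = value >>> 7; v != 0; v >>>= 7) groups++;
    ensure(groups);
    pos -= groups;
    long v = value;
    for (int i = 0; i < groups; i++) {
      int b = (int) (v & 0x7F);
      v >>>= 7;
      buf[pos + i] = (byte) (i < groups - 1 ? b | 0x80 : b); // continuation bit
    }
  }

  // Prepend the header of a length-delimited field (wire type 2) once the
  // message body has already been prepended.
  void prependLengthDelimitedHeader(int fieldNumber, int bodyLength) {
    prependVarint(bodyLength);
    prependVarint(((long) fieldNumber << 3) | 2);
  }

  int size() { return buf.length - pos; }

  byte[] toByteArray() { return Arrays.copyOfRange(buf, pos, buf.length); }

  private void ensure(int extra) {
    while (pos < extra) { // grow until there is room at the front
      byte[] bigger = new byte[buf.length * 2];
      int used = size();
      System.arraycopy(buf, pos, bigger, bigger.length - used, used);
      pos = bigger.length - used;
      buf = bigger;
    }
  }
}
```

Marshaling a nested message then becomes: prepend its fields, take the size() delta, and prepend one header for the enclosing field, all in a single backwards pass with no intermediate byte arrays.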

Motivation

Avoid O(n) allocations when serializing spans, metrics, or logs.

Additional Notes

Tests have been updated and cleaned up with the assistance of Claude.

Contributor Checklist

Jira ticket: [PROJ-IDENT]

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR level. For more information, see this doc.

@mcculls mcculls added type: feature request tag: performance Performance related changes inst: opentelemetry OpenTelemetry instrumentation labels May 6, 2026
@mcculls mcculls force-pushed the mcculls/otlp-buffer-performance branch 2 times, most recently from c4d131f to d6ff0d7 on May 6, 2026 at 21:44
@mcculls mcculls marked this pull request as ready for review May 6, 2026 23:59
@mcculls mcculls requested a review from a team as a code owner May 6, 2026 23:59
@mcculls mcculls requested a review from dougqh May 6, 2026 23:59
@chatgpt-codex-connector chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6ff0d706b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@mcculls mcculls force-pushed the mcculls/otlp-buffer-performance branch from d6ff0d7 to 46e852c on May 7, 2026 at 09:15
@mcculls mcculls requested a review from a team as a code owner May 7, 2026 09:15
@mcculls mcculls requested review from mhlidd and removed request for a team May 7, 2026 09:15
Commit: Marshal proto messages into a single prepending buffer, instead of many small byte-arrays, and move decision whether to export a span into OtlpTraceCollector (avoids re-allocations)
@mcculls mcculls force-pushed the mcculls/otlp-buffer-performance branch from 46e852c to 6f5caa3 on May 7, 2026 at 10:19
@dougqh dougqh (Contributor) left a comment


Overall, looks good to me.

There was one thing that Claude pointed out...

  1. Behavior change in sampling-priority / process-tag emission (medium). Worth confirming intent (see the sketch below).
     • Old code: if (i == 0 || i == len - 1) metaWriter.includeSamplingTags(); runs before visitSpan(spans[i]). Because visitSpan writes the previous span via completeSpan, the flag actually applies to span i-1 once consumed. Net effect across two traces [a,b]+[c,d]: a, b, c get _sampling_priority_v1; d does not. Process tags go on a.
     • New code: sampling tag on the last-written span of each trace boundary (a and c in the same example), plus process+sampling tags on whichever span completeScope finalizes (the one that ends up first in the payload). So b and d lose their sampling tags relative to the old behavior.

     The functional invariant "≥1 span per trace carries sampling priority" still holds, and the new placement is arguably cleaner (the old emission on the second-to-last span looked accidental). But there's no test asserting which spans carry these tags, so a downstream agent expecting redundancy across spans wouldn't catch a regression here. Recommend confirming with the agent-side OTLP→Datadog ingest team that the new placement is sufficient.
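To make the off-by-one concrete, here is a tiny runnable simulation of the deferred-write behavior described above. The structure is hypothetical; the real MetaWriter / visitSpan / completeSpan code is more involved.

```java
// Hypothetical simulation of the old emission logic, NOT the actual marshaler.
public class SamplingTagDemo {
  static boolean samplingFlag = false;
  static String buffered = null; // span waiting to be completed

  static void completeBuffered() {
    if (buffered != null) {
      // completeSpan consumes whatever flag is pending *now*, i.e. one span late
      System.out.println(buffered + (samplingFlag ? " +_sampling_priority_v1" : ""));
      samplingFlag = false;
      buffered = null;
    }
  }

  static void writeTrace(String[] spans) {
    for (int i = 0; i < spans.length; i++) {
      if (i == 0 || i == spans.length - 1) {
        samplingFlag = true; // metaWriter.includeSamplingTags()
      }
      completeBuffered();   // visitSpan first completes span i-1 ...
      buffered = spans[i];  // ... then buffers span i
    }
  }

  public static void main(String[] args) {
    writeTrace(new String[] {"a", "b"});
    writeTrace(new String[] {"c", "d"});
    completeBuffered(); // final flush
  }
}
```

Running it prints a, b, and c with _sampling_priority_v1 and d without, matching the old net effect described in the review.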

I haven't fully thought this through myself, but I thought it good to double check.
If you think it is fine, then I'm okay with it.

@mcculls mcculls (Contributor, Author) commented May 7, 2026

> I haven't fully thought this through myself, but I thought it good to double check.
> If you think it is fine, then I'm okay with it.

Yes, the old code was a relic of porting the Datadog protocol code over; the new code cleans that up by removing the "last span" notion. (Note that some of the expected OTLP behaviour is still being defined, but this passes the current system tests.)

@mcculls mcculls added this pull request to the merge queue May 8, 2026
@dd-octo-sts dd-octo-sts (Contributor, Bot) commented May 8, 2026

/merge

@gh-worker-devflow-routing-ef8351 (Bot) commented May 8, 2026

View all feedback in the Devflow UI.

2026-05-08 00:06:41 UTC ℹ️ Start processing command /merge


2026-05-08 00:06:45 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 1h (p90).


2026-05-08 02:07:07 UTC MergeQueue: The build pipeline has timed out

The merge request has been interrupted because build 0 took longer than expected. The current limit for the base branch 'master' is 120 minutes.

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 8, 2026

Labels

inst: opentelemetry (OpenTelemetry instrumentation) · tag: performance (Performance related changes) · type: feature request
