Support OTLP runtime metrics with OTel-native naming #11318
Draft
What Does This Do

Adds an OTLP runtime-metrics path that emits JVM runtime metrics with OTel semantic-convention names (`jvm.*`) through the agent's `MeterProvider`, instead of the proprietary DogStatsD names (`jvm.heap_memory`, `jvm.thread_count`, …).

When the three flags below are set together, `JvmOtlpRuntimeMetrics.start()` is invoked from `Agent.installDatadogTracer()` and registers 15 instruments backed by `java.lang.management` MXBean callbacks. They flow through the existing OTLP exporter: no new transport, no JMXFetch.

| Flag | Required value | Default |
| --- | --- | --- |
| `DD_RUNTIME_METRICS_ENABLED` | `true` | `true` |
| `DD_METRICS_OTEL_ENABLED` | `true` | `false` |
| `DD_METRICS_OTEL_EXPORTER` | `otlp` | |

Instruments registered (15 total, `Recommended` + `Development` per the OTel JVM semconv):

- `jvm.memory.used`, `jvm.memory.committed`, `jvm.memory.limit`, `jvm.memory.init`, `jvm.memory.used_after_last_gc`
- `jvm.buffer.memory.used`, `jvm.buffer.memory.limit`, `jvm.buffer.count`
- `jvm.thread.count`
- `jvm.class.loaded`, `jvm.class.count`, `jvm.class.unloaded`
- `jvm.cpu.time`, `jvm.cpu.count`, `jvm.cpu.recent_utilization`

`jvm.gc.duration` is intentionally deferred. The spec requires a Histogram of per-collection pause durations, but `GarbageCollectorMXBean` only exposes cumulative collection time. Populating the histogram requires either subscribing to `GarbageCollectionNotificationInfo` via JMX (blocked by the bootstrap-class-loading constraints in `docs/bootstrap_design_guidelines.md`) or consuming JFR `GarbageCollection` events. Tracked as a follow-up.

Motivation
Customers running with `DD_METRICS_OTEL_EXPORTER=otlp` route their telemetry to an OTel collector. There may not be a Datadog Agent on the path, and therefore nothing listening on the DogStatsD socket. Today the tracer's runtime metrics still emit through DogStatsD with proprietary names (`jvm.heap_memory`, …), so in those deployments runtime metrics silently go nowhere.

This change emits the same runtime metric data as OTLP instruments with OTel semantic-convention names through the OTel `MeterProvider`, so it travels the same OTLP pipeline the customer already configured. Customers who haven't opted into OTLP metrics see no change: the existing DogStatsD path is untouched.

Additional Notes
- `start()` is single-shot: an `AtomicBoolean` CAS guards against re-entry from re-init, and on failure we log and stop (partial registration is worse than a silent retry).
- Uses `java.lang.management.*` plus `com.sun.management.OperatingSystemMXBean` for CPU. CPU instruments are skipped at registration time on JVMs where the `com.sun` bean isn't present. No `javax.management.*` is touched, keeping the constraints in `docs/bootstrap_design_guidelines.md` intact.
- `JvmOtlpRuntimeMetrics` is registered in `META-INF/native-image/.../reflect-config.json` (using its post-shadow-relocation FQN, `datadog.trace.bootstrap.otel.shim.metrics.JvmOtlpRuntimeMetrics`) so AOT/native-image builds can resolve it reflectively from `Agent.java`.
- `JvmOtlpRuntimeMetricsTest` (JUnit 5) covers instrument surface, attribute keys (`jvm.memory.type=heap|non_heap`), positive values for live metrics (`jvm.memory.used`, `jvm.thread.count`), and idempotency of repeated `start()` calls.
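
For reviewers who want the shape of the startup path without opening the diff, here is a minimal JDK-only sketch of the behaviour described above: the three-flag gate, the single-shot `AtomicBoolean` guard, MXBean-backed callbacks, and skipping CPU when the `com.sun` bean is absent. It is not the code in this PR. `RuntimeMetricsSketch`, `CALLBACKS`, `otlpRuntimeMetricsEnabled`, and `usedBytes` are hypothetical names, a plain map stands in for the OTel `MeterProvider`, the flag defaults are assumed, and memory is summed per `jvm.memory.type` rather than reported per pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;

// Illustrative sketch only; this is NOT the PR's JvmOtlpRuntimeMetrics class.
final class RuntimeMetricsSketch {
  private static final AtomicBoolean STARTED = new AtomicBoolean(false);

  // Hypothetical stand-in for the MeterProvider: instrument name -> value callback.
  static final Map<String, LongSupplier> CALLBACKS = new LinkedHashMap<>();

  // Assumed gate: all three flags must line up; the defaults used here are assumptions.
  static boolean otlpRuntimeMetricsEnabled(Map<String, String> env) {
    boolean runtime = Boolean.parseBoolean(env.getOrDefault("DD_RUNTIME_METRICS_ENABLED", "true"));
    boolean otel = Boolean.parseBoolean(env.getOrDefault("DD_METRICS_OTEL_ENABLED", "false"));
    boolean otlp = "otlp".equals(env.get("DD_METRICS_OTEL_EXPORTER"));
    return runtime && otel && otlp;
  }

  // Single-shot: only the caller that wins the CAS registers. Returns true for that caller.
  static boolean start() {
    if (!STARTED.compareAndSet(false, true)) {
      return false; // already started (or already failed); never retried
    }
    try {
      CALLBACKS.put("jvm.memory.used{jvm.memory.type=heap}", () -> usedBytes(MemoryType.HEAP));
      CALLBACKS.put("jvm.memory.used{jvm.memory.type=non_heap}", () -> usedBytes(MemoryType.NON_HEAP));
      CALLBACKS.put("jvm.thread.count", () -> ManagementFactory.getThreadMXBean().getThreadCount());

      // The CPU callback is skipped when the platform bean does not implement the com.sun interface.
      java.lang.management.OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
      if (os instanceof com.sun.management.OperatingSystemMXBean) {
        com.sun.management.OperatingSystemMXBean sunOs = (com.sun.management.OperatingSystemMXBean) os;
        // Raw nanoseconds from the bean; unit conversion is omitted in this sketch.
        CALLBACKS.put("jvm.cpu.time", sunOs::getProcessCpuTime);
      }
    } catch (RuntimeException | LinkageError e) {
      // Log and stop; STARTED stays true so a later call does not retry.
      System.err.println("runtime metrics registration failed: " + e);
    }
    return true;
  }

  // Sums used bytes across all memory pools of the given type.
  static long usedBytes(MemoryType type) {
    long total = 0;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
      if (pool.getType() == type && usage != null) {
        total += usage.getUsed();
      }
    }
    return total;
  }

  public static void main(String[] args) {
    if (otlpRuntimeMetricsEnabled(System.getenv())) {
      start();
    }
    CALLBACKS.forEach((name, cb) -> System.out.println(name + " = " + cb.getAsLong()));
  }
}
```

Run with `DD_METRICS_OTEL_ENABLED=true DD_METRICS_OTEL_EXPORTER=otlp` in the environment to see the callbacks print. A second `start()` call returns `false` and leaves the registered set unchanged, which is the idempotency property the test bullet above refers to.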