fix(llmobs): openai-java payload mapping for responses, tool metadata, and prompt tracking#10644
Benchmarks
Startup
Parameters
See matching parameters
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 62 metrics, 9 unstable metrics.
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~661ea70f3f, baseline=1.61.0-SNAPSHOT~5580c61ac4
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.062 s) : 0, 1062146
Total [baseline] (8.946 s) : 0, 8945766
Agent [candidate] (1.063 s) : 0, 1063058
Total [candidate] (8.897 s) : 0, 8897056
section iast
Agent [baseline] (1.231 s) : 0, 1231042
Total [baseline] (9.577 s) : 0, 9577256
Agent [candidate] (1.228 s) : 0, 1227961
Total [candidate] (9.564 s) : 0, 9563849
gantt
title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~661ea70f3f, baseline=1.61.0-SNAPSHOT~5580c61ac4
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.208 ms) : 0, 1208
crashtracking [candidate] (1.212 ms) : 0, 1212
BytebuddyAgent [baseline] (632.573 ms) : 0, 632573
BytebuddyAgent [candidate] (633.335 ms) : 0, 633335
AgentMeter [baseline] (29.625 ms) : 0, 29625
AgentMeter [candidate] (29.57 ms) : 0, 29570
GlobalTracer [baseline] (258.037 ms) : 0, 258037
GlobalTracer [candidate] (258.161 ms) : 0, 258161
AppSec [baseline] (31.807 ms) : 0, 31807
AppSec [candidate] (31.969 ms) : 0, 31969
Debugger [baseline] (59.747 ms) : 0, 59747
Debugger [candidate] (59.697 ms) : 0, 59697
Remote Config [baseline] (602.128 µs) : 0, 602
Remote Config [candidate] (583.05 µs) : 0, 583
Telemetry [baseline] (8.081 ms) : 0, 8081
Telemetry [candidate] (8.845 ms) : 0, 8845
Flare Poller [baseline] (4.257 ms) : 0, 4257
Flare Poller [candidate] (3.518 ms) : 0, 3518
section iast
crashtracking [baseline] (1.202 ms) : 0, 1202
crashtracking [candidate] (1.222 ms) : 0, 1222
BytebuddyAgent [baseline] (798.555 ms) : 0, 798555
BytebuddyAgent [candidate] (797.664 ms) : 0, 797664
AgentMeter [baseline] (11.383 ms) : 0, 11383
AgentMeter [candidate] (11.405 ms) : 0, 11405
GlobalTracer [baseline] (248.238 ms) : 0, 248238
GlobalTracer [candidate] (247.671 ms) : 0, 247671
IAST [baseline] (25.407 ms) : 0, 25407
IAST [candidate] (25.413 ms) : 0, 25413
AppSec [baseline] (26.553 ms) : 0, 26553
AppSec [candidate] (26.439 ms) : 0, 26439
Debugger [baseline] (69.053 ms) : 0, 69053
Debugger [candidate] (67.291 ms) : 0, 67291
Remote Config [baseline] (531.369 µs) : 0, 531
Remote Config [candidate] (519.34 µs) : 0, 519
Telemetry [baseline] (10.261 ms) : 0, 10261
Telemetry [candidate] (10.542 ms) : 0, 10542
Flare Poller [baseline] (3.69 ms) : 0, 3690
Flare Poller [candidate] (3.754 ms) : 0, 3754
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~661ea70f3f, baseline=1.61.0-SNAPSHOT~5580c61ac4
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.056 s) : 0, 1055849
Total [baseline] (11.13 s) : 0, 11129717
Agent [candidate] (1.073 s) : 0, 1072978
Total [candidate] (11.337 s) : 0, 11337301
section appsec
Agent [baseline] (1.248 s) : 0, 1247733
Total [baseline] (11.188 s) : 0, 11187973
Agent [candidate] (1.248 s) : 0, 1248251
Total [candidate] (11.131 s) : 0, 11131266
section iast
Agent [baseline] (1.229 s) : 0, 1228838
Total [baseline] (11.4 s) : 0, 11399785
Agent [candidate] (1.228 s) : 0, 1228338
Total [candidate] (11.36 s) : 0, 11359730
section profiling
Agent [baseline] (1.192 s) : 0, 1191915
Total [baseline] (11.009 s) : 0, 11008741
Agent [candidate] (1.184 s) : 0, 1184053
Total [candidate] (11.081 s) : 0, 11080559
gantt
title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~661ea70f3f, baseline=1.61.0-SNAPSHOT~5580c61ac4
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.206 ms) : 0, 1206
crashtracking [candidate] (1.23 ms) : 0, 1230
BytebuddyAgent [baseline] (628.457 ms) : 0, 628457
BytebuddyAgent [candidate] (638.698 ms) : 0, 638698
AgentMeter [baseline] (29.438 ms) : 0, 29438
AgentMeter [candidate] (29.967 ms) : 0, 29967
GlobalTracer [baseline] (256.757 ms) : 0, 256757
GlobalTracer [candidate] (260.161 ms) : 0, 260161
AppSec [baseline] (31.65 ms) : 0, 31650
AppSec [candidate] (32.308 ms) : 0, 32308
Debugger [baseline] (60.267 ms) : 0, 60267
Debugger [candidate] (61.198 ms) : 0, 61198
Remote Config [baseline] (582.959 µs) : 0, 583
Remote Config [candidate] (595.351 µs) : 0, 595
Telemetry [baseline] (8.0 ms) : 0, 8000
Telemetry [candidate] (8.181 ms) : 0, 8181
Flare Poller [baseline] (3.471 ms) : 0, 3471
Flare Poller [candidate] (4.301 ms) : 0, 4301
section appsec
crashtracking [baseline] (1.217 ms) : 0, 1217
crashtracking [candidate] (1.195 ms) : 0, 1195
BytebuddyAgent [baseline] (658.46 ms) : 0, 658460
BytebuddyAgent [candidate] (658.919 ms) : 0, 658919
AgentMeter [baseline] (12.166 ms) : 0, 12166
AgentMeter [candidate] (12.095 ms) : 0, 12095
GlobalTracer [baseline] (258.464 ms) : 0, 258464
GlobalTracer [candidate] (258.347 ms) : 0, 258347
AppSec [baseline] (177.991 ms) : 0, 177991
AppSec [candidate] (178.556 ms) : 0, 178556
Debugger [baseline] (66.216 ms) : 0, 66216
Debugger [candidate] (66.058 ms) : 0, 66058
Remote Config [baseline] (638.592 µs) : 0, 639
Remote Config [candidate] (627.869 µs) : 0, 628
Telemetry [baseline] (8.389 ms) : 0, 8389
Telemetry [candidate] (8.292 ms) : 0, 8292
Flare Poller [baseline] (3.582 ms) : 0, 3582
Flare Poller [candidate] (3.581 ms) : 0, 3581
IAST [baseline] (24.211 ms) : 0, 24211
IAST [candidate] (24.187 ms) : 0, 24187
section iast
crashtracking [baseline] (1.182 ms) : 0, 1182
crashtracking [candidate] (1.186 ms) : 0, 1186
BytebuddyAgent [baseline] (796.816 ms) : 0, 796816
BytebuddyAgent [candidate] (796.903 ms) : 0, 796903
AgentMeter [baseline] (11.375 ms) : 0, 11375
AgentMeter [candidate] (11.408 ms) : 0, 11408
GlobalTracer [baseline] (247.492 ms) : 0, 247492
GlobalTracer [candidate] (247.512 ms) : 0, 247512
AppSec [baseline] (27.257 ms) : 0, 27257
AppSec [candidate] (26.423 ms) : 0, 26423
Debugger [baseline] (69.491 ms) : 0, 69491
Debugger [candidate] (70.449 ms) : 0, 70449
Remote Config [baseline] (533.328 µs) : 0, 533
Remote Config [candidate] (523.227 µs) : 0, 523
Telemetry [baseline] (9.727 ms) : 0, 9727
Telemetry [candidate] (9.119 ms) : 0, 9119
Flare Poller [baseline] (3.541 ms) : 0, 3541
Flare Poller [candidate] (3.311 ms) : 0, 3311
IAST [baseline] (25.364 ms) : 0, 25364
IAST [candidate] (25.36 ms) : 0, 25360
section profiling
ProfilingAgent [baseline] (94.247 ms) : 0, 94247
ProfilingAgent [candidate] (94.238 ms) : 0, 94238
crashtracking [baseline] (1.177 ms) : 0, 1177
crashtracking [candidate] (1.157 ms) : 0, 1157
BytebuddyAgent [baseline] (688.585 ms) : 0, 688585
BytebuddyAgent [candidate] (683.463 ms) : 0, 683463
AgentMeter [baseline] (9.087 ms) : 0, 9087
AgentMeter [candidate] (9.078 ms) : 0, 9078
GlobalTracer [baseline] (217.102 ms) : 0, 217102
GlobalTracer [candidate] (215.69 ms) : 0, 215690
AppSec [baseline] (32.268 ms) : 0, 32268
AppSec [candidate] (32.142 ms) : 0, 32142
Debugger [baseline] (65.681 ms) : 0, 65681
Debugger [candidate] (64.979 ms) : 0, 64979
Remote Config [baseline] (572.119 µs) : 0, 572
Remote Config [candidate] (562.643 µs) : 0, 563
Telemetry [baseline] (7.755 ms) : 0, 7755
Telemetry [candidate] (7.664 ms) : 0, 7664
Flare Poller [baseline] (4.204 ms) : 0, 4204
Flare Poller [candidate] (4.225 ms) : 0, 4225
Profiling [baseline] (94.805 ms) : 0, 94805
Profiling [candidate] (94.817 ms) : 0, 94817
Load
Parameters
See matching parameters
Summary
Found 1 performance improvement and 2 performance regressions! Performance is the same for 18 metrics, 15 unstable metrics.
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~661ea70f3f, baseline=1.61.0-SNAPSHOT~5580c61ac4
dateFormat X
axisFormat %s
section baseline
no_agent (1.178 ms) : 1167, 1189
. : milestone, 1178,
iast (3.024 ms) : 2984, 3064
. : milestone, 3024,
iast_FULL (5.861 ms) : 5803, 5920
. : milestone, 5861,
iast_GLOBAL (3.635 ms) : 3572, 3698
. : milestone, 3635,
profiling (2.39 ms) : 2367, 2412
. : milestone, 2390,
tracing (1.778 ms) : 1762, 1793
. : milestone, 1778,
section candidate
no_agent (1.212 ms) : 1200, 1224
. : milestone, 1212,
iast (3.224 ms) : 3179, 3269
. : milestone, 3224,
iast_FULL (5.934 ms) : 5874, 5994
. : milestone, 5934,
iast_GLOBAL (3.566 ms) : 3501, 3630
. : milestone, 3566,
profiling (2.042 ms) : 2024, 2061
. : milestone, 2042,
tracing (1.792 ms) : 1777, 1807
. : milestone, 1792,
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~661ea70f3f, baseline=1.61.0-SNAPSHOT~5580c61ac4
dateFormat X
axisFormat %s
section baseline
no_agent (16.795 ms) : 16635, 16955
. : milestone, 16795,
appsec (18.917 ms) : 18726, 19108
. : milestone, 18917,
code_origins (18.247 ms) : 18064, 18430
. : milestone, 18247,
iast (17.787 ms) : 17609, 17965
. : milestone, 17787,
profiling (18.689 ms) : 18503, 18875
. : milestone, 18689,
tracing (18.129 ms) : 17947, 18310
. : milestone, 18129,
section candidate
no_agent (17.098 ms) : 16927, 17269
. : milestone, 17098,
appsec (18.665 ms) : 18474, 18856
. : milestone, 18665,
code_origins (17.847 ms) : 17670, 18025
. : milestone, 17847,
iast (17.525 ms) : 17353, 17698
. : milestone, 17525,
profiling (18.391 ms) : 18204, 18578
. : milestone, 18391,
tracing (18.746 ms) : 18557, 18935
. : milestone, 18746,
Dacapo
Parameters
See matching parameters
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 2 unstable metrics.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~661ea70f3f, baseline=1.61.0-SNAPSHOT~5580c61ac4
dateFormat X
axisFormat %s
section baseline
no_agent (14.808 s) : 14808000, 14808000
. : milestone, 14808000,
appsec (14.711 s) : 14711000, 14711000
. : milestone, 14711000,
iast (18.493 s) : 18493000, 18493000
. : milestone, 18493000,
iast_GLOBAL (18.016 s) : 18016000, 18016000
. : milestone, 18016000,
profiling (15.504 s) : 15504000, 15504000
. : milestone, 15504000,
tracing (14.872 s) : 14872000, 14872000
. : milestone, 14872000,
section candidate
no_agent (15.496 s) : 15496000, 15496000
. : milestone, 15496000,
appsec (14.996 s) : 14996000, 14996000
. : milestone, 14996000,
iast (18.438 s) : 18438000, 18438000
. : milestone, 18438000,
iast_GLOBAL (17.796 s) : 17796000, 17796000
. : milestone, 17796000,
profiling (14.991 s) : 14991000, 14991000
. : milestone, 14991000,
tracing (15.121 s) : 15121000, 15121000
. : milestone, 15121000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~661ea70f3f, baseline=1.61.0-SNAPSHOT~5580c61ac4
dateFormat X
axisFormat %s
section baseline
no_agent (1.487 ms) : 1475, 1499
. : milestone, 1487,
appsec (2.526 ms) : 2472, 2580
. : milestone, 2526,
iast (2.261 ms) : 2193, 2330
. : milestone, 2261,
iast_GLOBAL (2.318 ms) : 2249, 2388
. : milestone, 2318,
profiling (2.091 ms) : 2037, 2146
. : milestone, 2091,
tracing (2.067 ms) : 2014, 2120
. : milestone, 2067,
section candidate
no_agent (1.481 ms) : 1470, 1493
. : milestone, 1481,
appsec (3.828 ms) : 3602, 4055
. : milestone, 3828,
iast (2.277 ms) : 2208, 2346
. : milestone, 2277,
iast_GLOBAL (2.314 ms) : 2245, 2383
. : milestone, 2314,
profiling (2.547 ms) : 2330, 2763
. : milestone, 2547,
tracing (2.068 ms) : 2015, 2121
. : milestone, 2068,
5cd257e to cbd6226
…wthTestOpenAiLlmInteractions::test_completion
…teractions::test_chat_completion_tool_call
…d with python openai instrumentation and system-tests
… with variables + chat_template, longest-first overlap handling) and support map-based LLM input serialization (messages + prompt) in LLMObs mapper. Also filter empty instruction messages to match system-test expectations.
…st and return [image] (not empty) when stripped input_image URLs are missing, aligning mixed-input chat_template output with expected behavior.
…output.messages from request params so existing error-span tests pass.
…ol_definitions tags
…JSON argument parsing and remove duplicate manual parsing logic from ResponseDecorator.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0c879ba692
What Does This Do
Aligns OpenAI Java LLMObs span payloads with expected intake/system-test schema by:
- … _ml_obs_tag.integration, _ml_obs_tag.source, _ml_obs_tag.ddtrace.version, _ml_obs_tag.error, _ml_obs_tag.error_type
- … model_name (and stable placeholder output where applicable) is set on error paths for chat/completions/embeddings/responses.
- … input.prompt, variables, chat_template)
- … tool_definitions)
- … stream, tool_choice, text.verbosity, etc.)
- … JsonValueUtils.
- … _dd map with span/trace ids
- … meta.error
- … input serialization (messages + prompt)
- … tool_definitions into meta.

Motivation
OpenAI/LLMObs system tests exposed schema and tag mismatches in Java payloads (especially response spans, tool metadata, error mapping, and prompt tracking structure). This change brings Java output in line with expected LLMObs intake contract and behavior.
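The commit notes mention building a chat_template from variables with "longest-first overlap handling". The PR's actual code is not shown on this page, so the following is only a minimal sketch of that idea under stated assumptions: the class name ChatTemplateSketch, the method toTemplate, and the {{name}} placeholder syntax are all hypothetical and are not taken from the change itself. It scans the rendered text once and, at each position, tries the longest variable value first, so a short value cannot split a longer value that contains it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the class added by this PR.
final class ChatTemplateSketch {
  private ChatTemplateSketch() {}

  /**
   * Rebuilds a template from rendered text by replacing each variable value
   * with a {{name}} placeholder (placeholder syntax is an assumption).
   */
  static String toTemplate(String rendered, Map<String, String> variables) {
    // Keep only usable values; null or empty values can never match.
    List<Map.Entry<String, String>> entries = new ArrayList<>();
    for (Map.Entry<String, String> e : variables.entrySet()) {
      if (e.getValue() != null && !e.getValue().isEmpty()) {
        entries.add(e);
      }
    }
    // Longest values first: when two values overlap, the longer one wins.
    entries.sort(
        Comparator.comparingInt((Map.Entry<String, String> e) -> e.getValue().length())
            .reversed());

    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < rendered.length()) {
      boolean matched = false;
      for (Map.Entry<String, String> e : entries) {
        if (rendered.startsWith(e.getValue(), i)) {
          out.append("{{").append(e.getKey()).append("}}");
          i += e.getValue().length();
          matched = true;
          break;
        }
      }
      if (!matched) {
        // No variable value starts here; copy the character through unchanged.
        out.append(rendered.charAt(i));
        i++;
      }
    }
    return out.toString();
  }
}
```

With variables city = "New York" and state = "New York State", the text "I live in New York State, near New York." becomes "I live in {{state}}, near {{city}}." in this sketch. A single left-to-right scan is used instead of repeated String.replace calls so that text inside an already emitted placeholder is never matched again.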
Additional Notes
openai-java-3.0 min version updated from 3.0.0 to 3.0.1.
DataDog/dd-apm-test-agent#280
DataDog/system-tests#6364
Contributor Checklist
- … type: and (comp: or inst:) labels in addition to any other useful labels
- … close, fix, or any linking keywords when referencing an issue. Use solves instead, and assign the PR milestone to the issue

Jira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.