Fix flaky KafkaClientDataStreamsDisabledForkedTest batch consume test by bm1549 · Pull Request #10797 · DataDog/dd-trace-java

bm1549 · 2026-03-10T19:54:29Z

What Does This Do

Fixes flaky Kafka tests in KafkaClientDataStreamsDisabledForkedTest by replacing SORT_TRACES_BY_ID with SORT_TRACES_BY_START and using dynamic parent lookup instead of hardcoded positional trace references.

Motivation

Multiple test methods use SORT_TRACES_BY_ID which sorts traces by their root span's spanId. With SEQUENTIAL ID generation (the test default), this happens to match creation order. With RANDOM IDs (production behavior), the sort order is non-deterministic, breaking hardcoded consumer-to-producer trace mappings.

Root Cause

SORT_TRACES_BY_ID sorts by span ID, which is only deterministic with sequential IDs. The test hardcodes positional mappings like trace(1)→trace(0)[6] that assume a specific sort order. With RANDOM IDs, these mappings break.
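
To make the root cause concrete, here is a minimal Java sketch of the two sort orders. It is not the test framework's code: `Trace`, `BY_ID`, and `BY_START` are hypothetical stand-ins for a trace and for what SORT_TRACES_BY_ID and SORT_TRACES_BY_START are described as doing, and the ID and timestamp values are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TraceSortDemo {
  /** Hypothetical stand-in for a trace: a label, its root span ID, and its start time. */
  static final class Trace {
    final String name;
    final long rootSpanId;
    final long startNanos;

    Trace(String name, long rootSpanId, long startNanos) {
      this.name = name;
      this.rootSpanId = rootSpanId;
      this.startNanos = startNanos;
    }
  }

  // Assumed equivalents of the two sort modes named in the PR description.
  static final Comparator<Trace> BY_ID = Comparator.comparingLong((Trace t) -> t.rootSpanId);
  static final Comparator<Trace> BY_START = Comparator.comparingLong((Trace t) -> t.startNanos);

  /** Builds a producer trace that starts before a consumer trace, with the given root span IDs. */
  static List<Trace> traces(long producerId, long consumerId) {
    List<Trace> result = new ArrayList<>();
    result.add(new Trace("producer", producerId, 100L));
    result.add(new Trace("consumer", consumerId, 200L));
    return result;
  }

  /** Returns the trace names in the order produced by the given comparator. */
  static List<String> namesSortedBy(List<Trace> traces, Comparator<Trace> order) {
    List<Trace> copy = new ArrayList<>(traces);
    copy.sort(order);
    List<String> names = new ArrayList<>();
    for (Trace t : copy) {
      names.add(t.name);
    }
    return names;
  }

  public static void main(String[] args) {
    // Sequential IDs: ID order happens to equal creation order.
    System.out.println(namesSortedBy(traces(1L, 2L), BY_ID)); // [producer, consumer]
    // Random IDs where the consumer drew the smaller ID: ID order flips.
    System.out.println(namesSortedBy(traces(9_000L, 42L), BY_ID)); // [consumer, producer]
    // Start-time order is unaffected by the ID values.
    System.out.println(namesSortedBy(traces(9_000L, 42L), BY_START)); // [producer, consumer]
  }
}
```

With sequential IDs both comparators agree, which is why the positional assertions pass under the test default and only break once IDs are random.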

Fix

SORT_TRACES_BY_START: Producer traces always start before consumer traces, giving deterministic ordering by start time regardless of ID strategy.
Dynamic parent matching: For tests with multiple consumer traces (batch consume, backwards iteration), dynamically match each consumer span's parentId to the correct producer span instead of assuming positional indices.
KafkaClientDsmDisabledRandomIdsForkedTest: New test class that uses RANDOM IDs to reproduce the issue deterministically.
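
The dynamic parent matching described above can be sketched roughly as follows. This is an illustration in plain Java, not the Groovy assertion code changed in this PR: the `Span` class, `indexBySpanId`, and `findParent` are hypothetical names, and the IDs are invented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ParentMatchingDemo {
  /** Hypothetical minimal span: only the fields needed for parent matching. */
  static final class Span {
    final long spanId;
    final long parentId;

    Span(long spanId, long parentId) {
      this.spanId = spanId;
      this.parentId = parentId;
    }
  }

  /** Indexes the producer spans by span ID so consumers can look up their parent directly. */
  static Map<Long, Span> indexBySpanId(List<Span> producerSpans) {
    Map<Long, Span> byId = new HashMap<>();
    for (Span producer : producerSpans) {
      byId.put(producer.spanId, producer);
    }
    return byId;
  }

  /** Finds the producer span whose spanId equals the consumer's parentId, or fails loudly. */
  static Span findParent(Span consumer, Map<Long, Span> producersById) {
    Span parent = producersById.get(consumer.parentId);
    if (parent == null) {
      throw new IllegalStateException("No producer span with id " + consumer.parentId);
    }
    return parent;
  }

  public static void main(String[] args) {
    // Producer spans with arbitrary (random-looking) IDs under a common parent.
    List<Span> producers = Arrays.asList(new Span(71L, 1L), new Span(13L, 1L), new Span(58L, 1L));
    // Consumer spans arrive in an order unrelated to the producer order.
    List<Span> consumers = Arrays.asList(new Span(900L, 58L), new Span(901L, 71L), new Span(902L, 13L));

    Map<Long, Span> producersById = indexBySpanId(producers);
    for (Span consumer : consumers) {
      Span parent = findParent(consumer, producersById);
      System.out.println("consumer " + consumer.spanId + " -> producer " + parent.spanId);
    }
  }
}
```

Because the lookup is keyed on `parentId`, the result does not depend on where either trace lands after sorting, which is the property the positional `trace(1)→trace(0)[6]` style of mapping lacked.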

Affected Test Methods

test spring kafka template produce and batch consume — dynamic parent matching
test spring kafka template produce and consume — SORT_TRACES_BY_START
test pass through tombstone — SORT_TRACES_BY_START
test records(TopicPartition) kafka consume — SORT_TRACES_BY_START
test records(TopicPartition).subList kafka consume — SORT_TRACES_BY_START
test records(TopicPartition).forEach kafka consume — SORT_TRACES_BY_START
test iteration backwards over ConsumerRecords — SORT_TRACES_BY_START + dynamic parent matching

Additional Notes

Only the !hasQueueSpan() branches use dynamic matching (the hasQueueSpan() branches are unchanged)
All changes are in test assertion logic only — no production code modified

Jira ticket: N/A

🤖 Generated with Claude Code

pr-commenter · 2026-03-10T20:27:19Z

Kafka / producer-benchmark

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date	1773849990	1773855291
git_commit_sha	`9352dfa`	`24d96fd`

See matching parameters

	Baseline	Candidate
ci_job_date	1773856361	1773856361
ci_job_id	1518697788	1518697788
ci_pipeline_id	103339041	103339041
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
jdkVersion	11.0.25	11.0.25
jmhVersion	1.36	1.36
jvm	/usr/lib/jvm/java-11-openjdk-amd64/bin/java	/usr/lib/jvm/java-11-openjdk-amd64/bin/java
jvmArgs	-Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant	-Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
vmName	OpenJDK 64-Bit Server VM	OpenJDK 64-Bit Server VM
vmVersion	11.0.25+9-post-Ubuntu-1ubuntu122.04	11.0.25+9-post-Ubuntu-1ubuntu122.04

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.

See unchanged results

scenario	Δ mean throughput
scenario:not-instrumented/KafkaProduceBenchmark.benchProduce	same
scenario:only-tracing-dsm-disabled-benchmarks/KafkaProduceBenchmark.benchProduce	same
scenario:only-tracing-dsm-enabled-benchmarks/KafkaProduceBenchmark.benchProduce	same

pr-commenter · 2026-03-10T20:38:59Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date	1773849990	1773961072
git_commit_sha	`9352dfa`	`f100979`
release_version	1.61.0-SNAPSHOT~9352dfa345	1.61.0-SNAPSHOT~f10097968b

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1773963119	1773963119
ci_job_id	1524004066	1524004066
ci_pipeline_id	103634932	103634932
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-1-qqf19qee 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-1-qqf19qee 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module	Agent	Agent
parent	None	None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 64 metrics, 7 unstable metrics.

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.063 s) : 0, 1062643
Total [baseline] (8.883 s) : 0, 8882623
Agent [candidate] (1.066 s) : 0, 1066388
Total [candidate] (8.897 s) : 0, 8896926
section iast
Agent [baseline] (1.227 s) : 0, 1226852
Total [baseline] (9.571 s) : 0, 9570764
Agent [candidate] (1.226 s) : 0, 1225992
Total [candidate] (9.6 s) : 0, 9599541

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.063 s	-
Agent	iast	1.227 s	164.209 ms (15.5%)
Total	tracing	8.883 s	-
Total	iast	9.571 s	688.141 ms (7.7%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.066 s	-
Agent	iast	1.226 s	159.605 ms (15.0%)
Total	tracing	8.897 s	-
Total	iast	9.6 s	702.614 ms (7.9%)

gantt
    title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.208 ms) : 0, 1208
crashtracking [candidate] (1.207 ms) : 0, 1207
BytebuddyAgent [baseline] (632.462 ms) : 0, 632462
BytebuddyAgent [candidate] (633.048 ms) : 0, 633048
AgentMeter [baseline] (29.393 ms) : 0, 29393
AgentMeter [candidate] (29.478 ms) : 0, 29478
GlobalTracer [baseline] (258.043 ms) : 0, 258043
GlobalTracer [candidate] (259.457 ms) : 0, 259457
AppSec [baseline] (31.538 ms) : 0, 31538
AppSec [candidate] (31.852 ms) : 0, 31852
Debugger [baseline] (59.444 ms) : 0, 59444
Debugger [candidate] (59.96 ms) : 0, 59960
Remote Config [baseline] (581.363 µs) : 0, 581
Remote Config [candidate] (584.55 µs) : 0, 585
Telemetry [baseline] (8.034 ms) : 0, 8034
Telemetry [candidate] (8.738 ms) : 0, 8738
Flare Poller [baseline] (5.76 ms) : 0, 5760
Flare Poller [candidate] (5.797 ms) : 0, 5797
section iast
crashtracking [baseline] (1.216 ms) : 0, 1216
crashtracking [candidate] (1.186 ms) : 0, 1186
BytebuddyAgent [baseline] (795.929 ms) : 0, 795929
BytebuddyAgent [candidate] (795.283 ms) : 0, 795283
AgentMeter [baseline] (11.341 ms) : 0, 11341
AgentMeter [candidate] (11.274 ms) : 0, 11274
GlobalTracer [baseline] (247.486 ms) : 0, 247486
GlobalTracer [candidate] (247.231 ms) : 0, 247231
AppSec [baseline] (26.488 ms) : 0, 26488
AppSec [candidate] (26.453 ms) : 0, 26453
Debugger [baseline] (69.136 ms) : 0, 69136
Debugger [candidate] (70.052 ms) : 0, 70052
Remote Config [baseline] (528.454 µs) : 0, 528
Remote Config [candidate] (533.945 µs) : 0, 534
Telemetry [baseline] (9.722 ms) : 0, 9722
Telemetry [candidate] (9.212 ms) : 0, 9212
Flare Poller [baseline] (3.494 ms) : 0, 3494
Flare Poller [candidate] (3.414 ms) : 0, 3414
IAST [baseline] (25.316 ms) : 0, 25316
IAST [candidate] (25.299 ms) : 0, 25299

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.062 s) : 0, 1061900
Total [baseline] (11.125 s) : 0, 11125064
Agent [candidate] (1.061 s) : 0, 1061302
Total [candidate] (11.056 s) : 0, 11055821
section appsec
Agent [baseline] (1.249 s) : 0, 1248696
Total [baseline] (11.117 s) : 0, 11116621
Agent [candidate] (1.246 s) : 0, 1246224
Total [candidate] (11.083 s) : 0, 11082961
section iast
Agent [baseline] (1.239 s) : 0, 1239293
Total [baseline] (11.403 s) : 0, 11403024
Agent [candidate] (1.241 s) : 0, 1241336
Total [candidate] (11.316 s) : 0, 11315630
section profiling
Agent [baseline] (1.185 s) : 0, 1184835
Total [baseline] (11.071 s) : 0, 11070964
Agent [candidate] (1.184 s) : 0, 1183520
Total [candidate] (11.029 s) : 0, 11029413

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.062 s	-
Agent	appsec	1.249 s	186.796 ms (17.6%)
Agent	iast	1.239 s	177.392 ms (16.7%)
Agent	profiling	1.185 s	122.935 ms (11.6%)
Total	tracing	11.125 s	-
Total	appsec	11.117 s	-8.443 ms (-0.1%)
Total	iast	11.403 s	277.96 ms (2.5%)
Total	profiling	11.071 s	-54.101 ms (-0.5%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.061 s	-
Agent	appsec	1.246 s	184.921 ms (17.4%)
Agent	iast	1.241 s	180.033 ms (17.0%)
Agent	profiling	1.184 s	122.217 ms (11.5%)
Total	tracing	11.056 s	-
Total	appsec	11.083 s	27.14 ms (0.2%)
Total	iast	11.316 s	259.809 ms (2.3%)
Total	profiling	11.029 s	-26.408 ms (-0.2%)

gantt
    title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.186 ms) : 0, 1186
crashtracking [candidate] (1.207 ms) : 0, 1207
BytebuddyAgent [baseline] (630.262 ms) : 0, 630262
BytebuddyAgent [candidate] (630.836 ms) : 0, 630836
AgentMeter [baseline] (29.425 ms) : 0, 29425
AgentMeter [candidate] (29.319 ms) : 0, 29319
GlobalTracer [baseline] (259.05 ms) : 0, 259050
GlobalTracer [candidate] (257.889 ms) : 0, 257889
AppSec [baseline] (31.991 ms) : 0, 31991
AppSec [candidate] (31.788 ms) : 0, 31788
Debugger [baseline] (60.904 ms) : 0, 60904
Debugger [candidate] (60.57 ms) : 0, 60570
Remote Config [baseline] (593.059 µs) : 0, 593
Remote Config [candidate] (589.611 µs) : 0, 590
Telemetry [baseline] (8.068 ms) : 0, 8068
Telemetry [candidate] (8.801 ms) : 0, 8801
Flare Poller [baseline] (4.288 ms) : 0, 4288
Flare Poller [candidate] (4.243 ms) : 0, 4243
section appsec
crashtracking [baseline] (1.19 ms) : 0, 1190
crashtracking [candidate] (1.178 ms) : 0, 1178
BytebuddyAgent [baseline] (658.448 ms) : 0, 658448
BytebuddyAgent [candidate] (657.174 ms) : 0, 657174
AgentMeter [baseline] (12.012 ms) : 0, 12012
AgentMeter [candidate] (12.014 ms) : 0, 12014
GlobalTracer [baseline] (258.78 ms) : 0, 258780
GlobalTracer [candidate] (258.437 ms) : 0, 258437
AppSec [baseline] (178.35 ms) : 0, 178350
AppSec [candidate] (177.933 ms) : 0, 177933
Debugger [baseline] (66.679 ms) : 0, 66679
Debugger [candidate] (66.384 ms) : 0, 66384
Remote Config [baseline] (616.15 µs) : 0, 616
Remote Config [candidate] (619.901 µs) : 0, 620
Telemetry [baseline] (8.417 ms) : 0, 8417
Telemetry [candidate] (8.356 ms) : 0, 8356
Flare Poller [baseline] (3.628 ms) : 0, 3628
Flare Poller [candidate] (3.652 ms) : 0, 3652
IAST [baseline] (24.251 ms) : 0, 24251
IAST [candidate] (24.166 ms) : 0, 24166
section iast
crashtracking [baseline] (1.228 ms) : 0, 1228
crashtracking [candidate] (1.214 ms) : 0, 1214
BytebuddyAgent [baseline] (803.864 ms) : 0, 803864
BytebuddyAgent [candidate] (805.692 ms) : 0, 805692
AgentMeter [baseline] (11.625 ms) : 0, 11625
AgentMeter [candidate] (11.654 ms) : 0, 11654
GlobalTracer [baseline] (249.51 ms) : 0, 249510
GlobalTracer [candidate] (249.998 ms) : 0, 249998
AppSec [baseline] (26.842 ms) : 0, 26842
AppSec [candidate] (27.071 ms) : 0, 27071
Debugger [baseline] (71.106 ms) : 0, 71106
Debugger [candidate] (70.657 ms) : 0, 70657
Remote Config [baseline] (536.84 µs) : 0, 537
Remote Config [candidate] (525.952 µs) : 0, 526
Telemetry [baseline] (9.245 ms) : 0, 9245
Telemetry [candidate] (9.157 ms) : 0, 9157
Flare Poller [baseline] (3.357 ms) : 0, 3357
Flare Poller [candidate] (3.337 ms) : 0, 3337
IAST [baseline] (25.605 ms) : 0, 25605
IAST [candidate] (25.734 ms) : 0, 25734
section profiling
crashtracking [baseline] (1.173 ms) : 0, 1173
crashtracking [candidate] (1.209 ms) : 0, 1209
BytebuddyAgent [baseline] (684.02 ms) : 0, 684020
BytebuddyAgent [candidate] (683.527 ms) : 0, 683527
AgentMeter [baseline] (8.63 ms) : 0, 8630
AgentMeter [candidate] (8.651 ms) : 0, 8651
GlobalTracer [baseline] (215.584 ms) : 0, 215584
GlobalTracer [candidate] (215.643 ms) : 0, 215643
AppSec [baseline] (32.211 ms) : 0, 32211
AppSec [candidate] (32.197 ms) : 0, 32197
Debugger [baseline] (66.071 ms) : 0, 66071
Debugger [candidate] (65.767 ms) : 0, 65767
Remote Config [baseline] (568.535 µs) : 0, 569
Remote Config [candidate] (559.235 µs) : 0, 559
Telemetry [baseline] (7.742 ms) : 0, 7742
Telemetry [candidate] (7.712 ms) : 0, 7712
Flare Poller [baseline] (3.456 ms) : 0, 3456
Flare Poller [candidate] (3.442 ms) : 0, 3442
ProfilingAgent [baseline] (94.565 ms) : 0, 94565
ProfilingAgent [candidate] (94.048 ms) : 0, 94048
Profiling [baseline] (95.137 ms) : 0, 95137
Profiling [candidate] (94.607 ms) : 0, 94607

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date	1773849990	1773961072
git_commit_sha	`9352dfa`	`f100979`
release_version	1.61.0-SNAPSHOT~9352dfa345	1.61.0-SNAPSHOT~f10097968b

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1773963670	1773963670
ci_job_id	1524004067	1524004067
ci_pipeline_id	103634932	103634932
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-4fqzougn 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-4fqzougn 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 1 performance improvements and 1 performance regressions! Performance is the same for 19 metrics, 15 unstable metrics.

scenario	Δ mean agg_http_req_duration_p50	Δ mean agg_http_req_duration_p95	Δ mean throughput	candidate mean agg_http_req_duration_p50	candidate mean agg_http_req_duration_p95	candidate mean throughput	baseline mean agg_http_req_duration_p50	baseline mean agg_http_req_duration_p95	baseline mean throughput
scenario:load:petclinic:no_agent:high_load	better [-2.036ms; -0.761ms] or [-11.343%; -4.238%]	unsure [-3.201ms; -0.345ms] or [-10.632%; -1.146%]	unstable [-13.702op/s; +48.827op/s] or [-5.410%; +19.278%]	16.552ms	28.335ms	270.844op/s	17.950ms	30.108ms	253.281op/s
scenario:load:petclinic:profiling:high_load	worse [+0.727ms; +1.605ms] or [+4.026%; +8.887%]	unsure [+236.422µs; +1638.891µs] or [+0.799%; +5.537%]	unstable [-42.361op/s; +15.111op/s] or [-16.713%; +5.962%]	19.222ms	30.536ms	239.844op/s	18.057ms	29.598ms	253.469op/s

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345
    dateFormat X
    axisFormat %s
section baseline
no_agent (18.426 ms) : 18238, 18614
.   : milestone, 18426,
appsec (18.433 ms) : 18248, 18618
.   : milestone, 18433,
code_origins (17.957 ms) : 17776, 18138
.   : milestone, 17957,
iast (18.179 ms) : 17995, 18362
.   : milestone, 18179,
profiling (18.413 ms) : 18228, 18598
.   : milestone, 18413,
tracing (17.753 ms) : 17574, 17932
.   : milestone, 17753,
section candidate
no_agent (17.225 ms) : 17051, 17399
.   : milestone, 17225,
appsec (18.339 ms) : 18156, 18522
.   : milestone, 18339,
code_origins (18.003 ms) : 17823, 18183
.   : milestone, 18003,
iast (17.954 ms) : 17775, 18132
.   : milestone, 17954,
profiling (19.464 ms) : 19273, 19656
.   : milestone, 19464,
tracing (18.554 ms) : 18364, 18745
.   : milestone, 18554,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	18.426 ms [18.238 ms, 18.614 ms]	-
appsec	18.433 ms [18.248 ms, 18.618 ms]	7.01 µs (0.0%)
code_origins	17.957 ms [17.776 ms, 18.138 ms]	-469.376 µs (-2.5%)
iast	18.179 ms [17.995 ms, 18.362 ms]	-247.571 µs (-1.3%)
profiling	18.413 ms [18.228 ms, 18.598 ms]	-13.333 µs (-0.1%)
tracing	17.753 ms [17.574 ms, 17.932 ms]	-673.375 µs (-3.7%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	17.225 ms [17.051 ms, 17.399 ms]	-
appsec	18.339 ms [18.156 ms, 18.522 ms]	1.114 ms (6.5%)
code_origins	18.003 ms [17.823 ms, 18.183 ms]	777.629 µs (4.5%)
iast	17.954 ms [17.775 ms, 18.132 ms]	728.124 µs (4.2%)
profiling	19.464 ms [19.273 ms, 19.656 ms]	2.239 ms (13.0%)
tracing	18.554 ms [18.364 ms, 18.745 ms]	1.329 ms (7.7%)

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.17 ms) : 1159, 1181
.   : milestone, 1170,
iast (3.053 ms) : 3013, 3092
.   : milestone, 3053,
iast_FULL (5.921 ms) : 5861, 5981
.   : milestone, 5921,
iast_GLOBAL (3.431 ms) : 3379, 3482
.   : milestone, 3431,
profiling (2.187 ms) : 2165, 2209
.   : milestone, 2187,
tracing (1.821 ms) : 1804, 1837
.   : milestone, 1821,
section candidate
no_agent (1.178 ms) : 1167, 1190
.   : milestone, 1178,
iast (3.156 ms) : 3109, 3202
.   : milestone, 3156,
iast_FULL (5.863 ms) : 5803, 5923
.   : milestone, 5863,
iast_GLOBAL (3.529 ms) : 3471, 3587
.   : milestone, 3529,
profiling (2.222 ms) : 2200, 2244
.   : milestone, 2222,
tracing (1.785 ms) : 1770, 1800
.   : milestone, 1785,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.17 ms [1.159 ms, 1.181 ms]	-
iast	3.053 ms [3.013 ms, 3.092 ms]	1.883 ms (161.0%)
iast_FULL	5.921 ms [5.861 ms, 5.981 ms]	4.751 ms (406.2%)
iast_GLOBAL	3.431 ms [3.379 ms, 3.482 ms]	2.261 ms (193.3%)
profiling	2.187 ms [2.165 ms, 2.209 ms]	1.017 ms (87.0%)
tracing	1.821 ms [1.804 ms, 1.837 ms]	650.762 µs (55.6%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.178 ms [1.167 ms, 1.19 ms]	-
iast	3.156 ms [3.109 ms, 3.202 ms]	1.978 ms (167.8%)
iast_FULL	5.863 ms [5.803 ms, 5.923 ms]	4.685 ms (397.6%)
iast_GLOBAL	3.529 ms [3.471 ms, 3.587 ms]	2.351 ms (199.5%)
profiling	2.222 ms [2.2 ms, 2.244 ms]	1.044 ms (88.6%)
tracing	1.785 ms [1.77 ms, 1.8 ms]	606.547 µs (51.5%)

Dacapo

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date	1773849990	1773961072
git_commit_sha	`9352dfa`	`f100979`
release_version	1.61.0-SNAPSHOT~9352dfa345	1.61.0-SNAPSHOT~f10097968b

See matching parameters

	Baseline	Candidate
application	biojava	biojava
ci_job_date	1773963278	1773963278
ci_job_id	1524004068	1524004068
ci_pipeline_id	103634932	103634932
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-1-qljq6kwc 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-1-qljq6kwc 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 2 unstable metrics.

Execution time for biojava

gantt
    title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345
    dateFormat X
    axisFormat %s
section baseline
no_agent (14.783 s) : 14783000, 14783000
.   : milestone, 14783000,
appsec (14.687 s) : 14687000, 14687000
.   : milestone, 14687000,
iast (18.642 s) : 18642000, 18642000
.   : milestone, 18642000,
iast_GLOBAL (18.301 s) : 18301000, 18301000
.   : milestone, 18301000,
profiling (15.015 s) : 15015000, 15015000
.   : milestone, 15015000,
tracing (14.886 s) : 14886000, 14886000
.   : milestone, 14886000,
section candidate
no_agent (15.496 s) : 15496000, 15496000
.   : milestone, 15496000,
appsec (14.724 s) : 14724000, 14724000
.   : milestone, 14724000,
iast (18.384 s) : 18384000, 18384000
.   : milestone, 18384000,
iast_GLOBAL (18.027 s) : 18027000, 18027000
.   : milestone, 18027000,
profiling (15.165 s) : 15165000, 15165000
.   : milestone, 15165000,
tracing (14.779 s) : 14779000, 14779000
.   : milestone, 14779000,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	14.783 s [14.783 s, 14.783 s]	-
appsec	14.687 s [14.687 s, 14.687 s]	-96.0 ms (-0.6%)
iast	18.642 s [18.642 s, 18.642 s]	3.859 s (26.1%)
iast_GLOBAL	18.301 s [18.301 s, 18.301 s]	3.518 s (23.8%)
profiling	15.015 s [15.015 s, 15.015 s]	232.0 ms (1.6%)
tracing	14.886 s [14.886 s, 14.886 s]	103.0 ms (0.7%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.496 s [15.496 s, 15.496 s]	-
appsec	14.724 s [14.724 s, 14.724 s]	-772.0 ms (-5.0%)
iast	18.384 s [18.384 s, 18.384 s]	2.888 s (18.6%)
iast_GLOBAL	18.027 s [18.027 s, 18.027 s]	2.531 s (16.3%)
profiling	15.165 s [15.165 s, 15.165 s]	-331.0 ms (-2.1%)
tracing	14.779 s [14.779 s, 14.779 s]	-717.0 ms (-4.6%)

Execution time for tomcat

gantt
    title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.47 ms) : 1459, 1481
.   : milestone, 1470,
appsec (3.728 ms) : 3511, 3944
.   : milestone, 3728,
iast (2.25 ms) : 2181, 2319
.   : milestone, 2250,
iast_GLOBAL (2.29 ms) : 2221, 2359
.   : milestone, 2290,
profiling (2.504 ms) : 2277, 2731
.   : milestone, 2504,
tracing (2.049 ms) : 1996, 2102
.   : milestone, 2049,
section candidate
no_agent (1.475 ms) : 1463, 1486
.   : milestone, 1475,
appsec (3.814 ms) : 3594, 4034
.   : milestone, 3814,
iast (2.249 ms) : 2180, 2317
.   : milestone, 2249,
iast_GLOBAL (2.29 ms) : 2221, 2359
.   : milestone, 2290,
profiling (2.486 ms) : 2334, 2638
.   : milestone, 2486,
tracing (2.06 ms) : 2006, 2113
.   : milestone, 2060,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.47 ms [1.459 ms, 1.481 ms]	-
appsec	3.728 ms [3.511 ms, 3.944 ms]	2.258 ms (153.6%)
iast	2.25 ms [2.181 ms, 2.319 ms]	780.175 µs (53.1%)
iast_GLOBAL	2.29 ms [2.221 ms, 2.359 ms]	819.755 µs (55.8%)
profiling	2.504 ms [2.277 ms, 2.731 ms]	1.034 ms (70.3%)
tracing	2.049 ms [1.996 ms, 2.102 ms]	578.785 µs (39.4%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.475 ms [1.463 ms, 1.486 ms]	-
appsec	3.814 ms [3.594 ms, 4.034 ms]	2.339 ms (158.6%)
iast	2.249 ms [2.18 ms, 2.317 ms]	773.963 µs (52.5%)
iast_GLOBAL	2.29 ms [2.221 ms, 2.359 ms]	815.609 µs (55.3%)
profiling	2.486 ms [2.334 ms, 2.638 ms]	1.012 ms (68.6%)
tracing	2.06 ms [2.006 ms, 2.113 ms]	585.038 µs (39.7%)

pr-commenter · 2026-03-10T20:39:39Z

Kafka / consumer-benchmark

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date	1773849990	1773961072
git_commit_sha	`9352dfa`	`f100979`

See matching parameters

	Baseline	Candidate
ci_job_date	1773962427	1773962427
ci_job_id	1524004072	1524004072
ci_pipeline_id	103634932	103634932
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
jdkVersion	11.0.25	11.0.25
jmhVersion	1.36	1.36
jvm	/usr/lib/jvm/java-11-openjdk-amd64/bin/java	/usr/lib/jvm/java-11-openjdk-amd64/bin/java
jvmArgs	-Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant	-Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
vmName	OpenJDK 64-Bit Server VM	OpenJDK 64-Bit Server VM
vmVersion	11.0.25+9-post-Ubuntu-1ubuntu122.04	11.0.25+9-post-Ubuntu-1ubuntu122.04

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.

See unchanged results

scenario	Δ mean throughput
scenario:not-instrumented/KafkaConsumerBenchmark.benchConsume	same
scenario:only-tracing-dsm-disabled-benchmarks/KafkaConsumerBenchmark.benchConsume	unsure [-12911.355op/s; -1878.325op/s] or [-4.212%; -0.613%]
scenario:only-tracing-dsm-enabled-benchmarks/KafkaConsumerBenchmark.benchConsume	same

The test mapped consumer traces to producer spans by positional index after SORT_TRACES_BY_ID sorting. Since trace IDs are random, the consumer-to-producer mapping was non-deterministic, causing intermittent `span.parentId == parent.spanId` assertion failures. Fix by dynamically finding each consumer span's actual parent producer span via parentId matching instead of relying on sort order.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use SEQUENTIAL id.generation.strategy in the DSM-disabled Kafka test to force a deterministic sort order for SORT_TRACES_BY_ID. Sequential IDs sort traces in creation order, which differs from the reverse mapping the original positional code assumed. This proves the dynamic parent lookup fix handles any trace ordering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…test

Remove injectSysConfig("id.generation.strategy", "SEQUENTIAL") which did not actually trigger the flake. Add KafkaClientDsmDisabledRandomIdsForkedTest that overrides idGenerationStrategyName() to "RANDOM", matching production behavior. With RANDOM IDs, SORT_TRACES_BY_ID produces non-deterministic order, causing the original positional consumer-to-producer mapping to fail ~95% of the time.

Switch the batch consume test to SORT_TRACES_BY_START so the parent trace (started before any consumer receives messages) is always at index 0. The dynamic parent lookup fix handles any ordering of the 3 consumer traces.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


bm1549 · 2026-03-18T17:36:52Z

...va-agent/instrumentation/kafka/kafka-clients-0.11/src/test/groovy/KafkaClientTestBase.groovy

+
+/**
+ * Reproduces the flake in "test spring kafka template produce and batch consume"
+ * by using RANDOM IDs (instead of the default SEQUENTIAL used in tests).
+ *
+ * Root cause: The test's assertTraces(4, SORT_TRACES_BY_ID) sorts traces by
+ * localRootSpan.spanId, then hardcodes positional mappings between consumer and
+ * producer traces. With SEQUENTIAL IDs (the test default), both the producer span
+ * finish order within trace(0) and the consumer trace sort order are driven by the
+ * same Kafka internal ordering, so the mapping happens to be consistent.
+ *
+ * With RANDOM IDs (as used in production), the sort order becomes non-deterministic.
+ * There are 3! = 6 possible orderings for the 3 consumer traces, and only 1 matches
+ * the hardcoded mapping. The dynamic parent lookup fix handles any ordering.
+ */
+class KafkaClientDsmDisabledRandomIdsForkedTest extends KafkaClientDataStreamsDisabledForkedTest {
+  @Override
+  protected String idGenerationStrategyName() {
+    return "RANDOM"
+  }
+}


Suggested change

/**
 * Reproduces the flake in "test spring kafka template produce and batch consume"
 * by using RANDOM IDs (instead of the default SEQUENTIAL used in tests).
 *
 * Root cause: The test's assertTraces(4, SORT_TRACES_BY_ID) sorts traces by
 * localRootSpan.spanId, then hardcodes positional mappings between consumer and
 * producer traces. With SEQUENTIAL IDs (the test default), both the producer span
 * finish order within trace(0) and the consumer trace sort order are driven by the
 * same Kafka internal ordering, so the mapping happens to be consistent.
 *
 * With RANDOM IDs (as used in production), the sort order becomes non-deterministic.
 * There are 3! = 6 possible orderings for the 3 consumer traces, and only 1 matches
 * the hardcoded mapping. The dynamic parent lookup fix handles any ordering.
 */
class KafkaClientDsmDisabledRandomIdsForkedTest extends KafkaClientDataStreamsDisabledForkedTest {
  @Override
  protected String idGenerationStrategyName() {
    return "RANDOM"
  }
}

I'd suggest leaving it in since it provides a more representative test case, but I'm happy to remove it if folks think otherwise
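
The "3! = 6 possible orderings, only 1 matches" count in the class comment can be checked with a small enumeration. This is a sketch in plain Java with hypothetical names (`permutations`, `countMatching`); the integers 0, 1, 2 simply label the three consumer traces and are not taken from the test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class OrderingCountDemo {
  /** Returns every ordering of the given items. */
  static List<List<Integer>> permutations(List<Integer> items) {
    List<List<Integer>> out = new ArrayList<>();
    permute(new ArrayList<>(items), 0, out);
    return out;
  }

  // Classic swap-based recursion: fix one position at a time, then backtrack.
  private static void permute(List<Integer> items, int start, List<List<Integer>> out) {
    if (start == items.size()) {
      out.add(new ArrayList<>(items));
      return;
    }
    for (int i = start; i < items.size(); i++) {
      Collections.swap(items, start, i);
      permute(items, start + 1, out);
      Collections.swap(items, start, i);
    }
  }

  /** Counts the orderings that equal the one a hardcoded positional mapping would assume. */
  static long countMatching(List<List<Integer>> orderings, List<Integer> expected) {
    long matches = 0;
    for (List<Integer> ordering : orderings) {
      if (ordering.equals(expected)) {
        matches++;
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<Integer> consumerTraces = Arrays.asList(0, 1, 2);
    List<List<Integer>> orderings = permutations(consumerTraces);
    System.out.println("orderings: " + orderings.size()); // orderings: 6
    System.out.println("matching:  " + countMatching(orderings, consumerTraces)); // matching:  1
  }
}
```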

Update 6 more test methods that use SORT_TRACES_BY_ID with hardcoded positional trace references to use SORT_TRACES_BY_START, so they work with the KafkaClientDsmDisabledRandomIdsForkedTest that uses RANDOM span IDs. For the backwards iteration test with 9 traces, also use dynamic parent matching since consumer trace ordering may vary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bm1549 added the labels type: bug (Bug report and fix), tag: flaky test (Flaky tests), inst: kafka (Kafka instrumentation), tag: ai generated (Largely based on code generated by an AI or LLM), and tag: no release notes (Changes to exclude from release notes) on Mar 10, 2026

bm1549 force-pushed the brian.marks/fix-kafka-dsm-disabled-flaky-test-v2 branch from 79b8bde to 106ceb0 Compare March 10, 2026 20:50

bm1549 marked this pull request as ready for review March 10, 2026 22:00

bm1549 requested review from a team as code owners March 10, 2026 22:00

bm1549 force-pushed the brian.marks/fix-kafka-dsm-disabled-flaky-test-v2 branch from 106ceb0 to dbe89c4 Compare March 17, 2026 11:01

bm1549 and others added 3 commits March 17, 2026 11:12

bm1549 force-pushed the brian.marks/fix-kafka-dsm-disabled-flaky-test-v2 branch from c6cb1d4 to 45636bc Compare March 17, 2026 15:13

Merge branch 'master' into brian.marks/fix-kafka-dsm-disabled-flaky-test-v2 (24d96fd)

bm1549 wants to merge 5 commits into master from brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
