Fix flaky KafkaClientDataStreamsDisabledForkedTest batch consume test#10797

Open
bm1549 wants to merge 5 commits into master from brian.marks/fix-kafka-dsm-disabled-flaky-test-v2

bm1549 (Contributor) commented Mar 10, 2026

What Does This Do

Fixes flaky Kafka tests in KafkaClientDataStreamsDisabledForkedTest by replacing SORT_TRACES_BY_ID with SORT_TRACES_BY_START and using dynamic parent lookup instead of hardcoded positional trace references.

Motivation

Multiple test methods use SORT_TRACES_BY_ID which sorts traces by their root span's spanId. With SEQUENTIAL ID generation (the test default), this happens to match creation order. With RANDOM IDs (production behavior), the sort order is non-deterministic, breaking hardcoded consumer-to-producer trace mappings.

Root Cause

SORT_TRACES_BY_ID sorts by span ID, which is only deterministic with sequential IDs. The test hardcodes positional mappings like trace(1)→trace(0)[6] that assume a specific sort order. With RANDOM IDs, these mappings break.
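
To illustrate why the ID sort only looks stable under sequential IDs, here is a minimal, self-contained Java sketch. It does not use the tracer's test utilities: `RootSpan`, `sample`, and the two sort helpers are hypothetical stand-ins, and the ID and start-time values are made up. It also simplifies by giving the three consumer traces distinct, increasing start times, which the real test does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TraceSortSketch {
  /** Hypothetical stand-in for a trace's root span; not a dd-trace-java type. */
  public static final class RootSpan {
    public final String name;
    public final long spanId;
    public final long startNanos;

    public RootSpan(String name, long spanId, long startNanos) {
      this.name = name;
      this.spanId = spanId;
      this.startNanos = startNanos;
    }
  }

  /** One producer trace followed by three consumer traces, using the given root span IDs. */
  public static List<RootSpan> sample(long[] ids) {
    return Arrays.asList(
        new RootSpan("producer", ids[0], 100L),
        new RootSpan("consumer-1", ids[1], 200L),
        new RootSpan("consumer-2", ids[2], 300L),
        new RootSpan("consumer-3", ids[3], 400L));
  }

  /** Sorts a copy of the traces with the given comparator and returns the names in order. */
  private static List<String> names(List<RootSpan> traces, Comparator<RootSpan> order) {
    List<RootSpan> sorted = new ArrayList<>(traces);
    sorted.sort(order);
    List<String> out = new ArrayList<>();
    for (RootSpan span : sorted) {
      out.add(span.name);
    }
    return out;
  }

  /** Mimics the idea of sorting traces by root span ID. */
  public static List<String> namesSortedById(List<RootSpan> traces) {
    return names(traces, Comparator.comparingLong((RootSpan s) -> s.spanId));
  }

  /** Mimics the idea of sorting traces by root span start time. */
  public static List<String> namesSortedByStart(List<RootSpan> traces) {
    return names(traces, Comparator.comparingLong((RootSpan s) -> s.startNanos));
  }

  public static void main(String[] args) {
    // Sequential IDs grow in creation order, so both sorts agree.
    List<RootSpan> sequential = sample(new long[] {1L, 2L, 3L, 4L});
    // Arbitrary IDs (made-up values) carry no ordering information.
    List<RootSpan> random = sample(new long[] {9007L, 312L, 7741L, 55L});

    System.out.println("sequential, by id:    " + namesSortedById(sequential));
    System.out.println("sequential, by start: " + namesSortedByStart(sequential));
    System.out.println("random, by id:        " + namesSortedById(random));
    System.out.println("random, by start:     " + namesSortedByStart(random));
  }
}
```

With the made-up random IDs, the ID sort puts the producer trace last, so any assertion that expects it at index 0 would fail, while the start-time sort keeps it first in both cases.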

Fix

  1. SORT_TRACES_BY_START: Producer traces always start before consumer traces, giving deterministic ordering by start time regardless of ID strategy.
  2. Dynamic parent matching: For tests with multiple consumer traces (batch consume, backwards iteration), dynamically match each consumer span's parentId to the correct producer span instead of assuming positional indices.
  3. KafkaClientDsmDisabledRandomIdsForkedTest: New test class that uses RANDOM IDs so the original failure reproduces reliably (the commit notes put it at about 95% of runs) instead of intermittently.
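
The dynamic parent matching in item 2 can be sketched roughly as follows. This is an illustrative Java example, not the Groovy assertion code from the PR: the `Span` class, the `findParent` helper, and all span names and IDs are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

public class ParentLookupSketch {
  /** Hypothetical minimal span; the real test works with the tracer's own span type. */
  public static final class Span {
    public final String name;
    public final long spanId;
    public final long parentId;

    public Span(String name, long spanId, long parentId) {
      this.name = name;
      this.spanId = spanId;
      this.parentId = parentId;
    }
  }

  /** Finds the producer span whose spanId equals the consumer span's parentId. */
  public static Span findParent(List<Span> producerTrace, Span consumer) {
    for (Span candidate : producerTrace) {
      if (candidate.spanId == consumer.parentId) {
        return candidate;
      }
    }
    throw new NoSuchElementException("no producer span with id " + consumer.parentId);
  }

  public static void main(String[] args) {
    List<Span> producerTrace = Arrays.asList(
        new Span("produce-a", 11L, 1L),
        new Span("produce-b", 12L, 1L),
        new Span("produce-c", 13L, 1L));

    // Consumer traces arrive in an arbitrary order; no index is assumed.
    List<Span> consumers = Arrays.asList(
        new Span("consume-c", 23L, 13L),
        new Span("consume-a", 21L, 11L),
        new Span("consume-b", 22L, 12L));

    for (Span consumer : consumers) {
      Span parent = findParent(producerTrace, consumer);
      System.out.println(consumer.name + " -> " + parent.name);
    }
  }
}
```

Because the lookup is keyed on `parentId`, the assertion holds for any ordering of the consumer traces, which is the property the positional mapping lacked.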

Affected Test Methods

  • test spring kafka template produce and batch consume — dynamic parent matching
  • test spring kafka template produce and consume — SORT_TRACES_BY_START
  • test pass through tombstone — SORT_TRACES_BY_START
  • test records(TopicPartition) kafka consume — SORT_TRACES_BY_START
  • test records(TopicPartition).subList kafka consume — SORT_TRACES_BY_START
  • test records(TopicPartition).forEach kafka consume — SORT_TRACES_BY_START
  • test iteration backwards over ConsumerRecords — SORT_TRACES_BY_START + dynamic parent matching

Additional Notes

  • Only the !hasQueueSpan() branches use dynamic matching (the hasQueueSpan() branches are unchanged)
  • All changes are in test assertion logic only — no production code modified

Jira ticket: N/A

🤖 Generated with Claude Code

bm1549 added the type: bug, tag: flaky test, inst: kafka, tag: ai generated, and tag: no release notes labels Mar 10, 2026

pr-commenter bot commented Mar 10, 2026

Kafka / producer-benchmark

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date 1773849990 1773855291
git_commit_sha 9352dfa 24d96fd
See matching parameters
Baseline Candidate
ci_job_date 1773856361 1773856361
ci_job_id 1518697788 1518697788
ci_pipeline_id 103339041 103339041
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
jdkVersion 11.0.25 11.0.25
jmhVersion 1.36 1.36
jvm /usr/lib/jvm/java-11-openjdk-amd64/bin/java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
jvmArgs -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
vmName OpenJDK 64-Bit Server VM OpenJDK 64-Bit Server VM
vmVersion 11.0.25+9-post-Ubuntu-1ubuntu122.04 11.0.25+9-post-Ubuntu-1ubuntu122.04

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.

See unchanged results
scenario Δ mean throughput
scenario:not-instrumented/KafkaProduceBenchmark.benchProduce same
scenario:only-tracing-dsm-disabled-benchmarks/KafkaProduceBenchmark.benchProduce same
scenario:only-tracing-dsm-enabled-benchmarks/KafkaProduceBenchmark.benchProduce same


pr-commenter bot commented Mar 10, 2026

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date 1773849990 1773961072
git_commit_sha 9352dfa f100979
release_version 1.61.0-SNAPSHOT~9352dfa345 1.61.0-SNAPSHOT~f10097968b
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1773963119 1773963119
ci_job_id 1524004066 1524004066
ci_pipeline_id 103634932 103634932
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-qqf19qee 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-qqf19qee 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 64 metrics, 7 unstable metrics.

Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.063 s) : 0, 1062643
Total [baseline] (8.883 s) : 0, 8882623
Agent [candidate] (1.066 s) : 0, 1066388
Total [candidate] (8.897 s) : 0, 8896926
section iast
Agent [baseline] (1.227 s) : 0, 1226852
Total [baseline] (9.571 s) : 0, 9570764
Agent [candidate] (1.226 s) : 0, 1225992
Total [candidate] (9.6 s) : 0, 9599541
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.063 s -
Agent iast 1.227 s 164.209 ms (15.5%)
Total tracing 8.883 s -
Total iast 9.571 s 688.141 ms (7.7%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.066 s -
Agent iast 1.226 s 159.605 ms (15.0%)
Total tracing 8.897 s -
Total iast 9.6 s 702.614 ms (7.9%)
gantt
    title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.208 ms) : 0, 1208
crashtracking [candidate] (1.207 ms) : 0, 1207
BytebuddyAgent [baseline] (632.462 ms) : 0, 632462
BytebuddyAgent [candidate] (633.048 ms) : 0, 633048
AgentMeter [baseline] (29.393 ms) : 0, 29393
AgentMeter [candidate] (29.478 ms) : 0, 29478
GlobalTracer [baseline] (258.043 ms) : 0, 258043
GlobalTracer [candidate] (259.457 ms) : 0, 259457
AppSec [baseline] (31.538 ms) : 0, 31538
AppSec [candidate] (31.852 ms) : 0, 31852
Debugger [baseline] (59.444 ms) : 0, 59444
Debugger [candidate] (59.96 ms) : 0, 59960
Remote Config [baseline] (581.363 µs) : 0, 581
Remote Config [candidate] (584.55 µs) : 0, 585
Telemetry [baseline] (8.034 ms) : 0, 8034
Telemetry [candidate] (8.738 ms) : 0, 8738
Flare Poller [baseline] (5.76 ms) : 0, 5760
Flare Poller [candidate] (5.797 ms) : 0, 5797
section iast
crashtracking [baseline] (1.216 ms) : 0, 1216
crashtracking [candidate] (1.186 ms) : 0, 1186
BytebuddyAgent [baseline] (795.929 ms) : 0, 795929
BytebuddyAgent [candidate] (795.283 ms) : 0, 795283
AgentMeter [baseline] (11.341 ms) : 0, 11341
AgentMeter [candidate] (11.274 ms) : 0, 11274
GlobalTracer [baseline] (247.486 ms) : 0, 247486
GlobalTracer [candidate] (247.231 ms) : 0, 247231
AppSec [baseline] (26.488 ms) : 0, 26488
AppSec [candidate] (26.453 ms) : 0, 26453
Debugger [baseline] (69.136 ms) : 0, 69136
Debugger [candidate] (70.052 ms) : 0, 70052
Remote Config [baseline] (528.454 µs) : 0, 528
Remote Config [candidate] (533.945 µs) : 0, 534
Telemetry [baseline] (9.722 ms) : 0, 9722
Telemetry [candidate] (9.212 ms) : 0, 9212
Flare Poller [baseline] (3.494 ms) : 0, 3494
Flare Poller [candidate] (3.414 ms) : 0, 3414
IAST [baseline] (25.316 ms) : 0, 25316
IAST [candidate] (25.299 ms) : 0, 25299
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.062 s) : 0, 1061900
Total [baseline] (11.125 s) : 0, 11125064
Agent [candidate] (1.061 s) : 0, 1061302
Total [candidate] (11.056 s) : 0, 11055821
section appsec
Agent [baseline] (1.249 s) : 0, 1248696
Total [baseline] (11.117 s) : 0, 11116621
Agent [candidate] (1.246 s) : 0, 1246224
Total [candidate] (11.083 s) : 0, 11082961
section iast
Agent [baseline] (1.239 s) : 0, 1239293
Total [baseline] (11.403 s) : 0, 11403024
Agent [candidate] (1.241 s) : 0, 1241336
Total [candidate] (11.316 s) : 0, 11315630
section profiling
Agent [baseline] (1.185 s) : 0, 1184835
Total [baseline] (11.071 s) : 0, 11070964
Agent [candidate] (1.184 s) : 0, 1183520
Total [candidate] (11.029 s) : 0, 11029413
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.062 s -
Agent appsec 1.249 s 186.796 ms (17.6%)
Agent iast 1.239 s 177.392 ms (16.7%)
Agent profiling 1.185 s 122.935 ms (11.6%)
Total tracing 11.125 s -
Total appsec 11.117 s -8.443 ms (-0.1%)
Total iast 11.403 s 277.96 ms (2.5%)
Total profiling 11.071 s -54.101 ms (-0.5%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.061 s -
Agent appsec 1.246 s 184.921 ms (17.4%)
Agent iast 1.241 s 180.033 ms (17.0%)
Agent profiling 1.184 s 122.217 ms (11.5%)
Total tracing 11.056 s -
Total appsec 11.083 s 27.14 ms (0.2%)
Total iast 11.316 s 259.809 ms (2.3%)
Total profiling 11.029 s -26.408 ms (-0.2%)
gantt
    title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.186 ms) : 0, 1186
crashtracking [candidate] (1.207 ms) : 0, 1207
BytebuddyAgent [baseline] (630.262 ms) : 0, 630262
BytebuddyAgent [candidate] (630.836 ms) : 0, 630836
AgentMeter [baseline] (29.425 ms) : 0, 29425
AgentMeter [candidate] (29.319 ms) : 0, 29319
GlobalTracer [baseline] (259.05 ms) : 0, 259050
GlobalTracer [candidate] (257.889 ms) : 0, 257889
AppSec [baseline] (31.991 ms) : 0, 31991
AppSec [candidate] (31.788 ms) : 0, 31788
Debugger [baseline] (60.904 ms) : 0, 60904
Debugger [candidate] (60.57 ms) : 0, 60570
Remote Config [baseline] (593.059 µs) : 0, 593
Remote Config [candidate] (589.611 µs) : 0, 590
Telemetry [baseline] (8.068 ms) : 0, 8068
Telemetry [candidate] (8.801 ms) : 0, 8801
Flare Poller [baseline] (4.288 ms) : 0, 4288
Flare Poller [candidate] (4.243 ms) : 0, 4243
section appsec
crashtracking [baseline] (1.19 ms) : 0, 1190
crashtracking [candidate] (1.178 ms) : 0, 1178
BytebuddyAgent [baseline] (658.448 ms) : 0, 658448
BytebuddyAgent [candidate] (657.174 ms) : 0, 657174
AgentMeter [baseline] (12.012 ms) : 0, 12012
AgentMeter [candidate] (12.014 ms) : 0, 12014
GlobalTracer [baseline] (258.78 ms) : 0, 258780
GlobalTracer [candidate] (258.437 ms) : 0, 258437
AppSec [baseline] (178.35 ms) : 0, 178350
AppSec [candidate] (177.933 ms) : 0, 177933
Debugger [baseline] (66.679 ms) : 0, 66679
Debugger [candidate] (66.384 ms) : 0, 66384
Remote Config [baseline] (616.15 µs) : 0, 616
Remote Config [candidate] (619.901 µs) : 0, 620
Telemetry [baseline] (8.417 ms) : 0, 8417
Telemetry [candidate] (8.356 ms) : 0, 8356
Flare Poller [baseline] (3.628 ms) : 0, 3628
Flare Poller [candidate] (3.652 ms) : 0, 3652
IAST [baseline] (24.251 ms) : 0, 24251
IAST [candidate] (24.166 ms) : 0, 24166
section iast
crashtracking [baseline] (1.228 ms) : 0, 1228
crashtracking [candidate] (1.214 ms) : 0, 1214
BytebuddyAgent [baseline] (803.864 ms) : 0, 803864
BytebuddyAgent [candidate] (805.692 ms) : 0, 805692
AgentMeter [baseline] (11.625 ms) : 0, 11625
AgentMeter [candidate] (11.654 ms) : 0, 11654
GlobalTracer [baseline] (249.51 ms) : 0, 249510
GlobalTracer [candidate] (249.998 ms) : 0, 249998
AppSec [baseline] (26.842 ms) : 0, 26842
AppSec [candidate] (27.071 ms) : 0, 27071
Debugger [baseline] (71.106 ms) : 0, 71106
Debugger [candidate] (70.657 ms) : 0, 70657
Remote Config [baseline] (536.84 µs) : 0, 537
Remote Config [candidate] (525.952 µs) : 0, 526
Telemetry [baseline] (9.245 ms) : 0, 9245
Telemetry [candidate] (9.157 ms) : 0, 9157
Flare Poller [baseline] (3.357 ms) : 0, 3357
Flare Poller [candidate] (3.337 ms) : 0, 3337
IAST [baseline] (25.605 ms) : 0, 25605
IAST [candidate] (25.734 ms) : 0, 25734
section profiling
crashtracking [baseline] (1.173 ms) : 0, 1173
crashtracking [candidate] (1.209 ms) : 0, 1209
BytebuddyAgent [baseline] (684.02 ms) : 0, 684020
BytebuddyAgent [candidate] (683.527 ms) : 0, 683527
AgentMeter [baseline] (8.63 ms) : 0, 8630
AgentMeter [candidate] (8.651 ms) : 0, 8651
GlobalTracer [baseline] (215.584 ms) : 0, 215584
GlobalTracer [candidate] (215.643 ms) : 0, 215643
AppSec [baseline] (32.211 ms) : 0, 32211
AppSec [candidate] (32.197 ms) : 0, 32197
Debugger [baseline] (66.071 ms) : 0, 66071
Debugger [candidate] (65.767 ms) : 0, 65767
Remote Config [baseline] (568.535 µs) : 0, 569
Remote Config [candidate] (559.235 µs) : 0, 559
Telemetry [baseline] (7.742 ms) : 0, 7742
Telemetry [candidate] (7.712 ms) : 0, 7712
Flare Poller [baseline] (3.456 ms) : 0, 3456
Flare Poller [candidate] (3.442 ms) : 0, 3442
ProfilingAgent [baseline] (94.565 ms) : 0, 94565
ProfilingAgent [candidate] (94.048 ms) : 0, 94048
Profiling [baseline] (95.137 ms) : 0, 95137
Profiling [candidate] (94.607 ms) : 0, 94607

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date 1773849990 1773961072
git_commit_sha 9352dfa f100979
release_version 1.61.0-SNAPSHOT~9352dfa345 1.61.0-SNAPSHOT~f10097968b
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1773963670 1773963670
ci_job_id 1524004067 1524004067
ci_pipeline_id 103634932 103634932
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-4fqzougn 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-4fqzougn 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 1 performance improvements and 1 performance regressions! Performance is the same for 19 metrics, 15 unstable metrics.

scenario:load:petclinic:no_agent:high_load
  • Δ mean agg_http_req_duration_p50: better, [-2.036ms; -0.761ms] or [-11.343%; -4.238%]
  • Δ mean agg_http_req_duration_p95: unsure, [-3.201ms; -0.345ms] or [-10.632%; -1.146%]
  • Δ mean throughput: unstable, [-13.702op/s; +48.827op/s] or [-5.410%; +19.278%]
  • candidate mean: agg_http_req_duration_p50 16.552ms, agg_http_req_duration_p95 28.335ms, throughput 270.844op/s
  • baseline mean: agg_http_req_duration_p50 17.950ms, agg_http_req_duration_p95 30.108ms, throughput 253.281op/s
scenario:load:petclinic:profiling:high_load
  • Δ mean agg_http_req_duration_p50: worse, [+0.727ms; +1.605ms] or [+4.026%; +8.887%]
  • Δ mean agg_http_req_duration_p95: unsure, [+236.422µs; +1638.891µs] or [+0.799%; +5.537%]
  • Δ mean throughput: unstable, [-42.361op/s; +15.111op/s] or [-16.713%; +5.962%]
  • candidate mean: agg_http_req_duration_p50 19.222ms, agg_http_req_duration_p95 30.536ms, throughput 239.844op/s
  • baseline mean: agg_http_req_duration_p50 18.057ms, agg_http_req_duration_p95 29.598ms, throughput 253.469op/s
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345
    dateFormat X
    axisFormat %s
section baseline
no_agent (18.426 ms) : 18238, 18614
.   : milestone, 18426,
appsec (18.433 ms) : 18248, 18618
.   : milestone, 18433,
code_origins (17.957 ms) : 17776, 18138
.   : milestone, 17957,
iast (18.179 ms) : 17995, 18362
.   : milestone, 18179,
profiling (18.413 ms) : 18228, 18598
.   : milestone, 18413,
tracing (17.753 ms) : 17574, 17932
.   : milestone, 17753,
section candidate
no_agent (17.225 ms) : 17051, 17399
.   : milestone, 17225,
appsec (18.339 ms) : 18156, 18522
.   : milestone, 18339,
code_origins (18.003 ms) : 17823, 18183
.   : milestone, 18003,
iast (17.954 ms) : 17775, 18132
.   : milestone, 17954,
profiling (19.464 ms) : 19273, 19656
.   : milestone, 19464,
tracing (18.554 ms) : 18364, 18745
.   : milestone, 18554,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 18.426 ms [18.238 ms, 18.614 ms] -
appsec 18.433 ms [18.248 ms, 18.618 ms] 7.01 µs (0.0%)
code_origins 17.957 ms [17.776 ms, 18.138 ms] -469.376 µs (-2.5%)
iast 18.179 ms [17.995 ms, 18.362 ms] -247.571 µs (-1.3%)
profiling 18.413 ms [18.228 ms, 18.598 ms] -13.333 µs (-0.1%)
tracing 17.753 ms [17.574 ms, 17.932 ms] -673.375 µs (-3.7%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 17.225 ms [17.051 ms, 17.399 ms] -
appsec 18.339 ms [18.156 ms, 18.522 ms] 1.114 ms (6.5%)
code_origins 18.003 ms [17.823 ms, 18.183 ms] 777.629 µs (4.5%)
iast 17.954 ms [17.775 ms, 18.132 ms] 728.124 µs (4.2%)
profiling 19.464 ms [19.273 ms, 19.656 ms] 2.239 ms (13.0%)
tracing 18.554 ms [18.364 ms, 18.745 ms] 1.329 ms (7.7%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.17 ms) : 1159, 1181
.   : milestone, 1170,
iast (3.053 ms) : 3013, 3092
.   : milestone, 3053,
iast_FULL (5.921 ms) : 5861, 5981
.   : milestone, 5921,
iast_GLOBAL (3.431 ms) : 3379, 3482
.   : milestone, 3431,
profiling (2.187 ms) : 2165, 2209
.   : milestone, 2187,
tracing (1.821 ms) : 1804, 1837
.   : milestone, 1821,
section candidate
no_agent (1.178 ms) : 1167, 1190
.   : milestone, 1178,
iast (3.156 ms) : 3109, 3202
.   : milestone, 3156,
iast_FULL (5.863 ms) : 5803, 5923
.   : milestone, 5863,
iast_GLOBAL (3.529 ms) : 3471, 3587
.   : milestone, 3529,
profiling (2.222 ms) : 2200, 2244
.   : milestone, 2222,
tracing (1.785 ms) : 1770, 1800
.   : milestone, 1785,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.17 ms [1.159 ms, 1.181 ms] -
iast 3.053 ms [3.013 ms, 3.092 ms] 1.883 ms (161.0%)
iast_FULL 5.921 ms [5.861 ms, 5.981 ms] 4.751 ms (406.2%)
iast_GLOBAL 3.431 ms [3.379 ms, 3.482 ms] 2.261 ms (193.3%)
profiling 2.187 ms [2.165 ms, 2.209 ms] 1.017 ms (87.0%)
tracing 1.821 ms [1.804 ms, 1.837 ms] 650.762 µs (55.6%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.178 ms [1.167 ms, 1.19 ms] -
iast 3.156 ms [3.109 ms, 3.202 ms] 1.978 ms (167.8%)
iast_FULL 5.863 ms [5.803 ms, 5.923 ms] 4.685 ms (397.6%)
iast_GLOBAL 3.529 ms [3.471 ms, 3.587 ms] 2.351 ms (199.5%)
profiling 2.222 ms [2.2 ms, 2.244 ms] 1.044 ms (88.6%)
tracing 1.785 ms [1.77 ms, 1.8 ms] 606.547 µs (51.5%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date 1773849990 1773961072
git_commit_sha 9352dfa f100979
release_version 1.61.0-SNAPSHOT~9352dfa345 1.61.0-SNAPSHOT~f10097968b
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1773963278 1773963278
ci_job_id 1524004068 1524004068
ci_pipeline_id 103634932 103634932
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-qljq6kwc 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-qljq6kwc 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 2 unstable metrics.

Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345
    dateFormat X
    axisFormat %s
section baseline
no_agent (14.783 s) : 14783000, 14783000
.   : milestone, 14783000,
appsec (14.687 s) : 14687000, 14687000
.   : milestone, 14687000,
iast (18.642 s) : 18642000, 18642000
.   : milestone, 18642000,
iast_GLOBAL (18.301 s) : 18301000, 18301000
.   : milestone, 18301000,
profiling (15.015 s) : 15015000, 15015000
.   : milestone, 15015000,
tracing (14.886 s) : 14886000, 14886000
.   : milestone, 14886000,
section candidate
no_agent (15.496 s) : 15496000, 15496000
.   : milestone, 15496000,
appsec (14.724 s) : 14724000, 14724000
.   : milestone, 14724000,
iast (18.384 s) : 18384000, 18384000
.   : milestone, 18384000,
iast_GLOBAL (18.027 s) : 18027000, 18027000
.   : milestone, 18027000,
profiling (15.165 s) : 15165000, 15165000
.   : milestone, 15165000,
tracing (14.779 s) : 14779000, 14779000
.   : milestone, 14779000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 14.783 s [14.783 s, 14.783 s] -
appsec 14.687 s [14.687 s, 14.687 s] -96.0 ms (-0.6%)
iast 18.642 s [18.642 s, 18.642 s] 3.859 s (26.1%)
iast_GLOBAL 18.301 s [18.301 s, 18.301 s] 3.518 s (23.8%)
profiling 15.015 s [15.015 s, 15.015 s] 232.0 ms (1.6%)
tracing 14.886 s [14.886 s, 14.886 s] 103.0 ms (0.7%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.496 s [15.496 s, 15.496 s] -
appsec 14.724 s [14.724 s, 14.724 s] -772.0 ms (-5.0%)
iast 18.384 s [18.384 s, 18.384 s] 2.888 s (18.6%)
iast_GLOBAL 18.027 s [18.027 s, 18.027 s] 2.531 s (16.3%)
profiling 15.165 s [15.165 s, 15.165 s] -331.0 ms (-2.1%)
tracing 14.779 s [14.779 s, 14.779 s] -717.0 ms (-4.6%)
Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~f10097968b, baseline=1.61.0-SNAPSHOT~9352dfa345
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.47 ms) : 1459, 1481
.   : milestone, 1470,
appsec (3.728 ms) : 3511, 3944
.   : milestone, 3728,
iast (2.25 ms) : 2181, 2319
.   : milestone, 2250,
iast_GLOBAL (2.29 ms) : 2221, 2359
.   : milestone, 2290,
profiling (2.504 ms) : 2277, 2731
.   : milestone, 2504,
tracing (2.049 ms) : 1996, 2102
.   : milestone, 2049,
section candidate
no_agent (1.475 ms) : 1463, 1486
.   : milestone, 1475,
appsec (3.814 ms) : 3594, 4034
.   : milestone, 3814,
iast (2.249 ms) : 2180, 2317
.   : milestone, 2249,
iast_GLOBAL (2.29 ms) : 2221, 2359
.   : milestone, 2290,
profiling (2.486 ms) : 2334, 2638
.   : milestone, 2486,
tracing (2.06 ms) : 2006, 2113
.   : milestone, 2060,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.47 ms [1.459 ms, 1.481 ms] -
appsec 3.728 ms [3.511 ms, 3.944 ms] 2.258 ms (153.6%)
iast 2.25 ms [2.181 ms, 2.319 ms] 780.175 µs (53.1%)
iast_GLOBAL 2.29 ms [2.221 ms, 2.359 ms] 819.755 µs (55.8%)
profiling 2.504 ms [2.277 ms, 2.731 ms] 1.034 ms (70.3%)
tracing 2.049 ms [1.996 ms, 2.102 ms] 578.785 µs (39.4%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.475 ms [1.463 ms, 1.486 ms] -
appsec 3.814 ms [3.594 ms, 4.034 ms] 2.339 ms (158.6%)
iast 2.249 ms [2.18 ms, 2.317 ms] 773.963 µs (52.5%)
iast_GLOBAL 2.29 ms [2.221 ms, 2.359 ms] 815.609 µs (55.3%)
profiling 2.486 ms [2.334 ms, 2.638 ms] 1.012 ms (68.6%)
tracing 2.06 ms [2.006 ms, 2.113 ms] 585.038 µs (39.7%)


pr-commenter bot commented Mar 10, 2026

Kafka / consumer-benchmark

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master brian.marks/fix-kafka-dsm-disabled-flaky-test-v2
git_commit_date 1773849990 1773961072
git_commit_sha 9352dfa f100979
See matching parameters
Baseline Candidate
ci_job_date 1773962427 1773962427
ci_job_id 1524004072 1524004072
ci_pipeline_id 103634932 103634932
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
jdkVersion 11.0.25 11.0.25
jmhVersion 1.36 1.36
jvm /usr/lib/jvm/java-11-openjdk-amd64/bin/java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
jvmArgs -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant
vmName OpenJDK 64-Bit Server VM OpenJDK 64-Bit Server VM
vmVersion 11.0.25+9-post-Ubuntu-1ubuntu122.04 11.0.25+9-post-Ubuntu-1ubuntu122.04

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.

See unchanged results
scenario Δ mean throughput
scenario:not-instrumented/KafkaConsumerBenchmark.benchConsume same
scenario:only-tracing-dsm-disabled-benchmarks/KafkaConsumerBenchmark.benchConsume unsure [-12911.355op/s; -1878.325op/s] or [-4.212%; -0.613%]
scenario:only-tracing-dsm-enabled-benchmarks/KafkaConsumerBenchmark.benchConsume same

bm1549 force-pushed the brian.marks/fix-kafka-dsm-disabled-flaky-test-v2 branch from 79b8bde to 106ceb0 March 10, 2026 20:50
bm1549 marked this pull request as ready for review March 10, 2026 22:00
bm1549 requested review from a team as code owners March 10, 2026 22:00
bm1549 force-pushed the brian.marks/fix-kafka-dsm-disabled-flaky-test-v2 branch from 106ceb0 to dbe89c4 March 17, 2026 11:01
bm1549 and others added 3 commits March 17, 2026 11:12
The test mapped consumer traces to producer spans by positional index
after SORT_TRACES_BY_ID sorting. Since trace IDs are random, the
consumer-to-producer mapping was non-deterministic, causing intermittent
`span.parentId == parent.spanId` assertion failures.

Fix by dynamically finding each consumer span's actual parent producer
span via parentId matching instead of relying on sort order.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use SEQUENTIAL id.generation.strategy in the DSM-disabled Kafka test to
force a deterministic sort order for SORT_TRACES_BY_ID. Sequential IDs
sort traces in creation order, which differs from the reverse mapping
the original positional code assumed. This proves the dynamic parent
lookup fix handles any trace ordering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…test

Remove injectSysConfig("id.generation.strategy", "SEQUENTIAL") which did not
actually trigger the flake. Add KafkaClientDsmDisabledRandomIdsForkedTest that
overrides idGenerationStrategyName() to "RANDOM", matching production behavior.

With RANDOM IDs, SORT_TRACES_BY_ID produces non-deterministic order, causing
the original positional consumer-to-producer mapping to fail ~95% of the time.
Switch the batch consume test to SORT_TRACES_BY_START so the parent trace
(started before any consumer receives messages) is always at index 0. The
dynamic parent lookup fix handles any ordering of the 3 consumer traces.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bm1549 force-pushed the brian.marks/fix-kafka-dsm-disabled-flaky-test-v2 branch from c6cb1d4 to 45636bc March 17, 2026 15:13
Comment on lines +1517 to +1537

/**
 * Reproduces the flake in "test spring kafka template produce and batch consume"
 * by using RANDOM IDs (instead of the default SEQUENTIAL used in tests).
 *
 * Root cause: The test's assertTraces(4, SORT_TRACES_BY_ID) sorts traces by
 * localRootSpan.spanId, then hardcodes positional mappings between consumer and
 * producer traces. With SEQUENTIAL IDs (the test default), both the producer span
 * finish order within trace(0) and the consumer trace sort order are driven by the
 * same Kafka internal ordering, so the mapping happens to be consistent.
 *
 * With RANDOM IDs (as used in production), the sort order becomes non-deterministic.
 * There are 3! = 6 possible orderings for the 3 consumer traces, and only 1 matches
 * the hardcoded mapping. The dynamic parent lookup fix handles any ordering.
 */
class KafkaClientDsmDisabledRandomIdsForkedTest extends KafkaClientDataStreamsDisabledForkedTest {
  @Override
  protected String idGenerationStrategyName() {
    return "RANDOM"
  }
}
Suggested change (on the class quoted above)

I'd suggest leaving it in since it provides a more representative test case, but I'm happy to remove it if folks think otherwise
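
For reference, the "3! = 6 orderings, only 1 matches" count from the class comment can be checked with a small Java sketch. It is illustrative only: the class and method names are hypothetical, and it treats the hardcoded mapping as the identity ordering of the three consumer traces. It says nothing about how likely each ordering is in a real run.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderingCountSketch {
  /** Returns every ordering (permutation) of the indices 0..n-1. */
  public static List<int[]> permutations(int n) {
    List<int[]> out = new ArrayList<>();
    permute(new int[n], new boolean[n], 0, out);
    return out;
  }

  private static void permute(int[] current, boolean[] used, int depth, List<int[]> out) {
    if (depth == current.length) {
      out.add(current.clone());
      return;
    }
    for (int i = 0; i < current.length; i++) {
      if (used[i]) {
        continue;
      }
      used[i] = true;
      current[depth] = i;
      permute(current, used, depth + 1, out);
      used[i] = false;
    }
  }

  /** Counts orderings where position i holds consumer i, i.e. the fixed positional mapping. */
  public static int matchingFixedMapping(int n) {
    int matches = 0;
    for (int[] ordering : permutations(n)) {
      boolean identity = true;
      for (int i = 0; i < n; i++) {
        if (ordering[i] != i) {
          identity = false;
          break;
        }
      }
      if (identity) {
        matches++;
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    int consumers = 3;
    System.out.println(permutations(consumers).size() + " orderings, "
        + matchingFixedMapping(consumers) + " matches the fixed mapping");
    // prints "6 orderings, 1 matches the fixed mapping"
  }
}
```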

Update 6 more test methods that use SORT_TRACES_BY_ID with hardcoded
positional trace references to use SORT_TRACES_BY_START, so they work
with the KafkaClientDsmDisabledRandomIdsForkedTest that uses RANDOM
span IDs. For the backwards iteration test with 9 traces, also use
dynamic parent matching since consumer trace ordering may vary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels

inst: kafka, tag: ai generated, tag: flaky test, tag: no release notes, type: bug
