
@amarziali
Contributor

@amarziali amarziali commented Nov 3, 2025

What Does This Do

This PR introduces a set of queue implementations that replace the JCTools-based queues, eliminating direct use of sun.misc.Unsafe and providing full compatibility with Java 9+ runtimes through the VarHandle API.

The goal is to match the high-performance concurrent queue behavior of JCTools while relying only on supported, standard Java mechanisms.

A new Queues factory class is introduced to dynamically select the optimal queue implementation based on the Java runtime environment:

  • On Java 9 and newer, the factory instantiates the new VarHandle-based queues
  • On Java 1.8, it falls back to the existing JCTools-based queues to maintain backward compatibility and performance consistency.
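A minimal sketch of the version-based selection described above. It assumes the decision is made once, from the `java.specification.version` system property (which reports "1.8" on Java 8 and "9", "11", "17", ... afterwards). `QueueFactorySketch`, `majorVersion` and `newQueue` are hypothetical names, not the PR's `Queues` API, and the two factory parameters stand in for the VarHandle-based and JCTools-based constructors.

```java
import java.util.Queue;
import java.util.function.IntFunction;

// Hypothetical sketch of runtime-based queue selection; not the PR's Queues class.
final class QueueFactorySketch {
  private QueueFactorySketch() {}

  /** Maps java.specification.version to a major version: "1.8" -> 8, "11" -> 11. */
  static int majorVersion(String spec) {
    if (spec.startsWith("1.")) {
      return Integer.parseInt(spec.substring(2)); // Java 8 and earlier report "1.x"
    }
    int dot = spec.indexOf('.');
    return Integer.parseInt(dot < 0 ? spec : spec.substring(0, dot));
  }

  // Decided once, at class initialization, for the running JVM.
  static final boolean VARHANDLES_AVAILABLE =
      majorVersion(System.getProperty("java.specification.version")) >= 9;

  /** Uses the Java 9+ factory when VarHandles exist, otherwise the Java 8 fallback. */
  static <E> Queue<E> newQueue(
      int capacity, IntFunction<Queue<E>> java9Plus, IntFunction<Queue<E>> java8Fallback) {
    return VARHANDLES_AVAILABLE ? java9Plus.apply(capacity) : java8Fallback.apply(capacity);
  }
}
```

For example, `QueueFactorySketch.<String>newQueue(1024, ArrayBlockingQueue::new, ArrayBlockingQueue::new)` runs on any JDK, with `ArrayBlockingQueue` standing in for both real implementations. The system property is read instead of calling `Runtime.version()` because that method does not exist on Java 8.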

Introduced Classes Summary

  • SpscArrayQueueVarHandle (Single-Producer / Single-Consumer): lock-free SPSC queue using VarHandles with acquire/release semantics. When possible, it eliminates redundant volatile reads via cached head and tail.
  • SpmcArrayQueueVarHandle (Single-Producer / Multiple-Consumer): lock-free SPMC queue supporting concurrent consumers using atomic head updates (CAS). Uses consumerLimit caching to reduce volatile contention.
  • MpscArrayQueueVarHandle<E> (Multiple-Producer / Single-Consumer): lock-free MPSC queue where producers claim slots via CAS on TAIL_HANDLE. Maintains a producerLimit to minimize volatile head reads.
  • MpscBlockingConsumerArrayQueueVarHandle<E> (Multiple-Producer / Single-Consumer, blocking): extends MPSC with blocking consumer support. Uses CONSUMER_THREAD_HANDLE to park/unpark the waiting consumer efficiently.
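A simplified sketch of the SPSC technique from the first bullet, assuming a power-of-two capacity and no cache-line padding. `SpscRingSketch` is a hypothetical class, not the PR's `SpscArrayQueueVarHandle`: the producer publishes each element with a release store of `tail`, the consumer returns slots with a release store of `head`, and each side pays for an acquire load of the other side's index only when its cached copy makes the queue look full or empty.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical, unpadded SPSC ring buffer illustrating cached indices with acquire/release.
final class SpscRingSketch<E> {
  private static final VarHandle HEAD;
  private static final VarHandle TAIL;

  static {
    try {
      MethodHandles.Lookup lookup = MethodHandles.lookup();
      HEAD = lookup.findVarHandle(SpscRingSketch.class, "head", long.class);
      TAIL = lookup.findVarHandle(SpscRingSketch.class, "tail", long.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private final Object[] buffer;
  private final int mask;
  private long head; // next index to consume; written only by the consumer
  private long tail; // next index to fill; written only by the producer
  private long cachedHead; // producer's last observed head (producer thread only)
  private long cachedTail; // consumer's last observed tail (consumer thread only)

  SpscRingSketch(int capacity) {
    if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
      throw new IllegalArgumentException("capacity must be a power of two");
    }
    buffer = new Object[capacity];
    mask = capacity - 1;
  }

  /** Must only be called from the single producer thread. */
  boolean offer(E e) {
    if (e == null) throw new NullPointerException();
    long t = tail; // plain read: the producer is the only writer of tail
    if (t - cachedHead >= buffer.length) {
      // Looks full: only now pay for an acquire load of the consumer's index.
      cachedHead = (long) HEAD.getAcquire(this);
      if (t - cachedHead >= buffer.length) return false;
    }
    buffer[(int) (t & mask)] = e;
    TAIL.setRelease(this, t + 1); // makes the element write visible to an acquiring consumer
    return true;
  }

  /** Must only be called from the single consumer thread. */
  @SuppressWarnings("unchecked")
  E poll() {
    long h = head; // plain read: the consumer is the only writer of head
    if (h >= cachedTail) {
      // Looks empty: only now pay for an acquire load of the producer's index.
      cachedTail = (long) TAIL.getAcquire(this);
      if (h >= cachedTail) return null;
    }
    int slot = (int) (h & mask);
    E e = (E) buffer[slot];
    buffer[slot] = null; // drop the reference so the element can be collected
    HEAD.setRelease(this, h + 1); // returns the slot to the producer
    return e;
  }
}
```

In the common case, `offer` and `poll` read only their own index plainly and issue a single release store, which is where the savings over reading the other side's index on every call come from.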

Memory Padding

All queue state fields (head, tail, cached limits, etc.) are cache-line padded to prevent false sharing between producers and consumers.
This keeps hot fields written by different threads off the same cache line, minimizing cache-line invalidations and improving throughput under contention.
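One common way to implement such padding, used by JCTools, is a class hierarchy in which each hot field sits between classes holding unused `long` fields. The sketch below is hypothetical (the class names are not the PR's) and assumes 64-byte cache lines and HotSpot's practice of laying out superclass fields before subclass fields; the Java language guarantees neither.

```java
// Hypothetical padding sketch; assumes 64-byte cache lines and HotSpot field layout.
abstract class PadHead {
  long p00, p01, p02, p03, p04, p05, p06, p07; // 64 bytes in front of head
}

abstract class HeadField extends PadHead {
  volatile long head; // consumer-owned hot field
}

abstract class PadBetween extends HeadField {
  long p10, p11, p12, p13, p14, p15, p16, p17; // keeps head and tail on different lines
}

abstract class TailField extends PadBetween {
  volatile long tail; // producer-owned hot field
}

final class PaddedIndices extends TailField {
  long p20, p21, p22, p23, p24, p25, p26, p27; // shields tail from neighboring objects
}
```

On a given JVM, the resulting layout can be inspected with JOL, e.g. `ClassLayout.parseClass(PaddedIndices.class).toPrintable()`.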

Memory Fence Semantics

The memory ordering mode of each access was chosen explicitly to minimize volatile overhead while preserving the required visibility guarantees:

  • setRelease / getAcquire for publishing and consuming elements: provides correct inter-thread ordering without full fences.
  • setOpaque / getOpaque for relaxed head/tail updates: avoids synchronization costs where no ordering is required.
  • getVolatile only where a full fence is actually needed (e.g. when refreshing cached limits because the queue might be full or empty).
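A small hypothetical example of the first two access modes, using a single-slot handoff between one publisher and one consumer rather than a full queue. The release store of the slot pairs with the consumer's acquire load, so everything the publisher wrote before publishing is visible to the consumer; the opaque counter only needs reads to be coherent, without ordering any other memory access. `SlotHandoff` is an illustrative name, not part of the PR.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical single-slot handoff between ONE publisher thread and ONE consumer thread.
final class SlotHandoff {
  private static final VarHandle SLOT = MethodHandles.arrayElementVarHandle(Object[].class);
  private static final VarHandle COUNT;

  static {
    try {
      COUNT = MethodHandles.lookup().findVarHandle(SlotHandoff.class, "count", long.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private final Object[] slot = new Object[1];
  private long count; // number of publications; written only by the publisher

  /** Publisher thread only: returns false while the previous message is still unread. */
  boolean publish(Object message) {
    if (SLOT.getAcquire(slot, 0) != null) {
      return false;
    }
    // Release store: writes made before this line are visible to a consumer whose
    // acquire load of the slot observes this message.
    SLOT.setRelease(slot, 0, message);
    // Opaque is enough for a statistic: readers need a coherent value, not ordering.
    COUNT.setOpaque(this, (long) COUNT.getOpaque(this) + 1);
    return true;
  }

  /** Consumer thread only: returns the pending message, or null if there is none. */
  Object take() {
    Object message = SLOT.getAcquire(slot, 0); // pairs with the publisher's setRelease
    if (message != null) {
      SLOT.setRelease(slot, 0, (Object) null); // hands the slot back to the publisher
    }
    return message;
  }

  /** Any thread: an approximate publication count for monitoring. */
  long publishedCount() {
    return (long) COUNT.getOpaque(this);
  }
}
```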

Queue Benchmark Results (ops/us)

Note: the SPSC benchmark shows contention on the slow path (i.e. when the queue is full or empty). This should happen less frequently in our use case. Increasing the queue size (and hence reducing the probability that it is full) yields good performance.

MPSCBlockingConsumer Queue Benchmark (ops/us)

Implementation Capacity Total Consume Produce
JCTools 1024 41,149 30,661 10,488
VarHandle 1024 258,074 246,683 11,391
JCTools 65536 32,413 24,680 7,733
VarHandle 65536 224,982 217,498 7,485

MPSC Queue Benchmark (ops/us)

Implementation Capacity Total Consume Produce
JCTools 1024 41,784 31,070 10,715
VarHandle 1024 238,609 222,383 16,226
JCTools 65536 39,589 32,370 7,219
VarHandle 65536 262,729 250,627 12,102

SPSC Queue Benchmark (ops/us)

Implementation Capacity Total Consume Produce
JCTools 1024 259,418 129,694 129,724
VarHandle 1024 101,007 72,542 28,465
JCTools 65536 537,111 268,577 268,534
VarHandle 65536 353,161 191,188 161,973

Takeaways:

  • MPSC queues can be replaced with a VarHandle equivalent. In some cases, the new implementation performs even better.
  • JCTools still outperforms the VarHandle implementation for SPSC queues; the latter could perhaps be optimized further.

Room for future improvements

In high-throughput scenarios where multiple producers compete for queue space, contention on the CAS operation can become a bottleneck.

One idea to mitigate this: when the queue is likely not full, a getAndAdd can be used instead of a CAS to claim slots, since getAndAdd never fails. This lets multiple producers advance the tail index with less atomic contention. When the queue is nearly full, however, getAndAdd cannot be used safely (a producer could claim a slot beyond capacity and cannot give it back), so the classic CAS loop (slow path) is used instead.
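A sketch of this hybrid claim, under clearly stated assumptions: `HybridMpscSketch` is hypothetical (not the PR's `MpscArrayQueueVarHandle`), it switches to the CAS loop once the queue is at least half full (an arbitrary threshold), and a producer whose getAndAdd claim overshoots the capacity waits for the consumer instead of failing, since such a claim cannot be returned.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical MPSC queue illustrating a getAndAdd fast path with a CAS slow path.
final class HybridMpscSketch<E> {
  private static final VarHandle HEAD;
  private static final VarHandle TAIL;
  private static final VarHandle SLOTS = MethodHandles.arrayElementVarHandle(Object[].class);

  static {
    try {
      MethodHandles.Lookup lookup = MethodHandles.lookup();
      HEAD = lookup.findVarHandle(HybridMpscSketch.class, "head", long.class);
      TAIL = lookup.findVarHandle(HybridMpscSketch.class, "tail", long.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private final Object[] buffer;
  private final int mask;
  private final long fastPathLimit; // assumption: fast path only while less than half full
  private long head; // next index to consume; written only by the consumer
  private long tail; // next index to claim; shared by all producers

  HybridMpscSketch(int capacity) {
    if (capacity < 2 || Integer.bitCount(capacity) != 1) {
      throw new IllegalArgumentException("capacity must be a power of two >= 2");
    }
    buffer = new Object[capacity];
    mask = capacity - 1;
    fastPathLimit = capacity / 2;
  }

  /** Any producer thread. */
  boolean offer(E e) {
    if (e == null) throw new NullPointerException();
    long h = (long) HEAD.getAcquire(this);
    long t = (long) TAIL.getVolatile(this);
    if (t - h < fastPathLimit) {
      // Fast path: getAndAdd always succeeds, so contending producers never retry.
      long claimed = (long) TAIL.getAndAdd(this, 1L);
      // Concurrent claims may overshoot the check above; such a claim cannot be
      // returned, so wait until the consumer has freed this slot.
      while (claimed - (long) HEAD.getAcquire(this) >= buffer.length) {
        Thread.onSpinWait();
      }
      SLOTS.setRelease(buffer, (int) (claimed & mask), e);
      return true;
    }
    // Slow path near capacity: claim with CAS only after confirming the slot is free.
    while (true) {
      t = (long) TAIL.getVolatile(this);
      h = (long) HEAD.getAcquire(this);
      if (t - h >= buffer.length) return false; // full
      if (TAIL.compareAndSet(this, t, t + 1)) {
        SLOTS.setRelease(buffer, (int) (t & mask), e);
        return true;
      }
    }
  }

  /** Single consumer thread only. */
  @SuppressWarnings("unchecked")
  E poll() {
    long h = head; // plain read: only the consumer writes head
    int slot = (int) (h & mask);
    Object e = SLOTS.getAcquire(buffer, slot);
    if (e == null) {
      if (h == (long) TAIL.getVolatile(this)) return null; // genuinely empty
      do { // slot claimed but its element not yet stored: spin until it appears
        Thread.onSpinWait();
        e = SLOTS.getAcquire(buffer, slot);
      } while (e == null);
    }
    buffer[slot] = null; // ordered before the head update by the release store below
    HEAD.setRelease(this, h + 1); // frees the slot for producers
    return (E) e;
  }
}
```

The trade-off shows in the fast path: `offer` may spin while the queue is briefly over-claimed, whereas the CAS path always returns `false` promptly when the queue is full.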

Motivation

Additional Notes

Contributor Checklist

Jira ticket: [PROJ-IDENT]


@pr-commenter

pr-commenter bot commented Nov 3, 2025

Debugger benchmarks

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
ci_job_date 1762350659 1762351004
end_time 2025-11-05T13:52:20 2025-11-05T13:58:05
git_branch master andrea.marziali/remove-jctools-queues
git_commit_sha 8db72c0 a880c67
start_time 2025-11-05T13:51:00 2025-11-05T13:56:45
Matching parameters
Baseline Candidate
ci_job_id 1217030429 1217030429
ci_pipeline_id 81317491 81317491
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
git_commit_date 1762349967 1762349967

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 5 unstable metrics.

Unchanged results
scenario Δ mean agg_http_req_duration_min Δ mean agg_http_req_duration_p50 Δ mean agg_http_req_duration_p75 Δ mean agg_http_req_duration_p99 Δ mean throughput
scenario:noprobe unstable
[-16.952µs; +19.877µs] or [-6.074%; +7.122%]
unstable
[-27.156µs; +31.309µs] or [-8.553%; +9.861%]
unstable
[-36.996µs; +42.158µs] or [-11.187%; +12.748%]
unstable
[-142.099µs; +61.104µs] or [-13.747%; +5.911%]
same
scenario:basic same same same unstable
[-226.500µs; -8.890µs] or [-25.732%; -1.010%]
same
scenario:loop unsure
[+0.257µs; +4.484µs] or [+0.003%; +0.051%]
unsure
[-7.436µs; -0.629µs] or [-0.083%; -0.007%]
unsure
[-8.505µs; -0.543µs] or [-0.094%; -0.006%]
same same
Request duration reports for reports
gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (317.508 µs) : 291, 344
.   : milestone, 318,
basic (294.053 µs) : 287, 301
.   : milestone, 294,
loop (8.959 ms) : 8956, 8963
.   : milestone, 8959,
section candidate
noprobe (319.585 µs) : 290, 349
.   : milestone, 320,
basic (293.196 µs) : 286, 300
.   : milestone, 293,
loop (8.955 ms) : 8952, 8958
.   : milestone, 8955,
  • baseline results
Scenario Request median duration [CI 0.99]
noprobe 317.508 µs [291.339 µs, 343.677 µs]
basic 294.053 µs [286.681 µs, 301.425 µs]
loop 8.959 ms [8.956 ms, 8.963 ms]
  • candidate results
Scenario Request median duration [CI 0.99]
noprobe 319.585 µs [290.19 µs, 348.98 µs]
basic 293.196 µs [286.34 µs, 300.053 µs]
loop 8.955 ms [8.952 ms, 8.958 ms]

@pr-commenter

pr-commenter bot commented Nov 3, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master andrea.marziali/remove-jctools-queues
git_commit_date 1764344838 1764345098
git_commit_sha 17c7fcf 9a902be
release_version 1.57.0-SNAPSHOT~17c7fcf3e9 1.57.0-SNAPSHOT~9a902be485
Matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1764346917 1764346917
ci_job_id 1262017328 1262017328
ci_pipeline_id 84364713 84364713
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-mxxtzcsw 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-mxxtzcsw 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 3 performance improvements and 0 performance regressions! Performance is the same for 52 metrics, 10 unstable metrics.

scenario Δ mean execution_time candidate mean execution_time baseline mean execution_time
scenario:startup:insecure-bank:iast:Flare Poller better
[-7.114ms; -6.569ms] or [-65.496%; -60.474%]
4.021ms 10.863ms
scenario:startup:petclinic:iast:Flare Poller better
[-6.966ms; -6.462ms] or [-64.882%; -60.181%]
4.023ms 10.737ms
scenario:startup:petclinic:profiling:Debugger better
[-456.469µs; -182.922µs] or [-6.685%; -2.679%]
6.508ms 6.828ms
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.051 s) : 0, 1050647
Total [baseline] (8.702 s) : 0, 8702333
Agent [candidate] (1.054 s) : 0, 1053780
Total [candidate] (8.695 s) : 0, 8694955
section iast
Agent [baseline] (1.204 s) : 0, 1203661
Total [baseline] (9.395 s) : 0, 9394979
Agent [candidate] (1.196 s) : 0, 1195813
Total [candidate] (9.354 s) : 0, 9354072
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.051 s -
Agent iast 1.204 s 153.014 ms (14.6%)
Total tracing 8.702 s -
Total iast 9.395 s 692.646 ms (8.0%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.054 s -
Agent iast 1.196 s 142.033 ms (13.5%)
Total tracing 8.695 s -
Total iast 9.354 s 659.117 ms (7.6%)
gantt
    title insecure-bank - break down per module: candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.501 ms) : 0, 1501
crashtracking [candidate] (1.49 ms) : 0, 1490
BytebuddyAgent [baseline] (705.353 ms) : 0, 705353
BytebuddyAgent [candidate] (711.982 ms) : 0, 711982
GlobalTracer [baseline] (249.125 ms) : 0, 249125
GlobalTracer [candidate] (245.517 ms) : 0, 245517
AppSec [baseline] (31.986 ms) : 0, 31986
AppSec [candidate] (32.292 ms) : 0, 32292
Debugger [baseline] (6.425 ms) : 0, 6425
Debugger [candidate] (6.398 ms) : 0, 6398
Remote Config [baseline] (676.635 µs) : 0, 677
Remote Config [candidate] (691.263 µs) : 0, 691
Telemetry [baseline] (16.688 ms) : 0, 16688
Telemetry [candidate] (11.081 ms) : 0, 11081
Flare Poller [baseline] (4.111 ms) : 0, 4111
Flare Poller [candidate] (9.399 ms) : 0, 9399
section iast
crashtracking [baseline] (1.484 ms) : 0, 1484
crashtracking [candidate] (1.491 ms) : 0, 1491
BytebuddyAgent [baseline] (839.472 ms) : 0, 839472
BytebuddyAgent [candidate] (844.407 ms) : 0, 844407
GlobalTracer [baseline] (239.787 ms) : 0, 239787
GlobalTracer [candidate] (233.853 ms) : 0, 233853
AppSec [baseline] (30.888 ms) : 0, 30888
AppSec [candidate] (29.319 ms) : 0, 29319
Debugger [baseline] (6.173 ms) : 0, 6173
Debugger [candidate] (6.002 ms) : 0, 6002
Remote Config [baseline] (635.14 µs) : 0, 635
Remote Config [candidate] (624.26 µs) : 0, 624
Telemetry [baseline] (8.074 ms) : 0, 8074
Telemetry [candidate] (7.935 ms) : 0, 7935
Flare Poller [baseline] (10.863 ms) : 0, 10863
Flare Poller [candidate] (4.021 ms) : 0, 4021
IAST [baseline] (31.36 ms) : 0, 31360
IAST [candidate] (32.955 ms) : 0, 32955
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.057 s) : 0, 1057036
Total [baseline] (10.788 s) : 0, 10788279
Agent [candidate] (1.049 s) : 0, 1048628
Total [candidate] (10.773 s) : 0, 10773229
section appsec
Agent [baseline] (1.223 s) : 0, 1222705
Total [baseline] (10.889 s) : 0, 10889247
Agent [candidate] (1.221 s) : 0, 1221015
Total [candidate] (10.874 s) : 0, 10874446
section iast
Agent [baseline] (1.195 s) : 0, 1195366
Total [baseline] (11.167 s) : 0, 11166550
Agent [candidate] (1.184 s) : 0, 1184296
Total [candidate] (11.173 s) : 0, 11173407
section profiling
Agent [baseline] (1.196 s) : 0, 1196480
Total [baseline] (10.915 s) : 0, 10914884
Agent [candidate] (1.193 s) : 0, 1193101
Total [candidate] (10.902 s) : 0, 10902174
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.057 s -
Agent appsec 1.223 s 165.669 ms (15.7%)
Agent iast 1.195 s 138.329 ms (13.1%)
Agent profiling 1.196 s 139.444 ms (13.2%)
Total tracing 10.788 s -
Total appsec 10.889 s 100.968 ms (0.9%)
Total iast 11.167 s 378.272 ms (3.5%)
Total profiling 10.915 s 126.606 ms (1.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.049 s -
Agent appsec 1.221 s 172.387 ms (16.4%)
Agent iast 1.184 s 135.669 ms (12.9%)
Agent profiling 1.193 s 144.473 ms (13.8%)
Total tracing 10.773 s -
Total appsec 10.874 s 101.217 ms (0.9%)
Total iast 11.173 s 400.178 ms (3.7%)
Total profiling 10.902 s 128.945 ms (1.2%)
gantt
    title petclinic - break down per module: candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.495 ms) : 0, 1495
crashtracking [candidate] (1.471 ms) : 0, 1471
BytebuddyAgent [baseline] (711.52 ms) : 0, 711520
BytebuddyAgent [candidate] (707.738 ms) : 0, 707738
GlobalTracer [baseline] (249.308 ms) : 0, 249308
GlobalTracer [candidate] (245.056 ms) : 0, 245056
AppSec [baseline] (32.159 ms) : 0, 32159
AppSec [candidate] (32.124 ms) : 0, 32124
Debugger [baseline] (6.428 ms) : 0, 6428
Debugger [candidate] (6.382 ms) : 0, 6382
Remote Config [baseline] (671.387 µs) : 0, 671
Remote Config [candidate] (671.318 µs) : 0, 671
Telemetry [baseline] (14.913 ms) : 0, 14913
Telemetry [candidate] (10.281 ms) : 0, 10281
Flare Poller [baseline] (5.61 ms) : 0, 5610
Flare Poller [candidate] (10.178 ms) : 0, 10178
section appsec
crashtracking [baseline] (1.473 ms) : 0, 1473
crashtracking [candidate] (1.47 ms) : 0, 1470
BytebuddyAgent [baseline] (728.101 ms) : 0, 728101
BytebuddyAgent [candidate] (731.141 ms) : 0, 731141
GlobalTracer [baseline] (239.975 ms) : 0, 239975
GlobalTracer [candidate] (235.947 ms) : 0, 235947
IAST [baseline] (24.746 ms) : 0, 24746
IAST [candidate] (24.635 ms) : 0, 24635
AppSec [baseline] (174.278 ms) : 0, 174278
AppSec [candidate] (174.001 ms) : 0, 174001
Debugger [baseline] (6.307 ms) : 0, 6307
Debugger [candidate] (6.118 ms) : 0, 6118
Remote Config [baseline] (683.246 µs) : 0, 683
Remote Config [candidate] (685.189 µs) : 0, 685
Telemetry [baseline] (8.218 ms) : 0, 8218
Telemetry [candidate] (8.183 ms) : 0, 8183
Flare Poller [baseline] (3.977 ms) : 0, 3977
Flare Poller [candidate] (4.044 ms) : 0, 4044
section iast
crashtracking [baseline] (1.483 ms) : 0, 1483
crashtracking [candidate] (1.491 ms) : 0, 1491
BytebuddyAgent [baseline] (833.751 ms) : 0, 833751
BytebuddyAgent [candidate] (834.347 ms) : 0, 834347
GlobalTracer [baseline] (238.528 ms) : 0, 238528
GlobalTracer [candidate] (233.959 ms) : 0, 233959
IAST [baseline] (32.75 ms) : 0, 32750
IAST [candidate] (33.388 ms) : 0, 33388
AppSec [baseline] (28.637 ms) : 0, 28637
AppSec [candidate] (27.083 ms) : 0, 27083
Debugger [baseline] (6.038 ms) : 0, 6038
Debugger [candidate] (6.001 ms) : 0, 6001
Remote Config [baseline] (611.783 µs) : 0, 612
Remote Config [candidate] (593.025 µs) : 0, 593
Telemetry [baseline] (7.979 ms) : 0, 7979
Telemetry [candidate] (7.891 ms) : 0, 7891
Flare Poller [baseline] (10.737 ms) : 0, 10737
Flare Poller [candidate] (4.023 ms) : 0, 4023
section profiling
crashtracking [baseline] (1.427 ms) : 0, 1427
crashtracking [candidate] (1.451 ms) : 0, 1451
BytebuddyAgent [baseline] (732.71 ms) : 0, 732710
BytebuddyAgent [candidate] (734.4 ms) : 0, 734400
GlobalTracer [baseline] (222.019 ms) : 0, 222019
GlobalTracer [candidate] (217.783 ms) : 0, 217783
AppSec [baseline] (32.14 ms) : 0, 32140
AppSec [candidate] (32.331 ms) : 0, 32331
Debugger [baseline] (6.828 ms) : 0, 6828
Debugger [candidate] (6.508 ms) : 0, 6508
Remote Config [baseline] (700.953 µs) : 0, 701
Remote Config [candidate] (716.138 µs) : 0, 716
Telemetry [baseline] (16.456 ms) : 0, 16456
Telemetry [candidate] (16.472 ms) : 0, 16472
Flare Poller [baseline] (4.129 ms) : 0, 4129
Flare Poller [candidate] (4.134 ms) : 0, 4134
ProfilingAgent [baseline] (110.996 ms) : 0, 110996
ProfilingAgent [candidate] (109.961 ms) : 0, 109961
Profiling [baseline] (111.631 ms) : 0, 111631
Profiling [candidate] (110.623 ms) : 0, 110623

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master andrea.marziali/remove-jctools-queues
git_commit_date 1764344838 1764345098
git_commit_sha 17c7fcf 9a902be
release_version 1.57.0-SNAPSHOT~17c7fcf3e9 1.57.0-SNAPSHOT~9a902be485
Matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1764347418 1764347418
ci_job_id 1262017330 1262017330
ci_pipeline_id 84364713 84364713
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-vpizpu5p 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-vpizpu5p 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 4 performance improvements and 4 performance regressions! Performance is the same for 11 metrics, 17 unstable metrics.

scenario Δ mean agg_http_req_duration_p50 Δ mean agg_http_req_duration_p95 Δ mean throughput candidate mean agg_http_req_duration_p50 candidate mean agg_http_req_duration_p95 candidate mean throughput baseline mean agg_http_req_duration_p50 baseline mean agg_http_req_duration_p95 baseline mean throughput
scenario:load:insecure-bank:iast:high_load worse
[+168.703µs; +253.184µs] or [+6.713%; +10.075%]
worse
[+370.853µs; +695.788µs] or [+5.070%; +9.512%]
unstable
[-243.368op/s; +44.243op/s] or [-17.144%; +3.117%]
2.724ms 7.848ms 1320.000op/s 2.513ms 7.315ms 1419.562op/s
scenario:load:insecure-bank:iast_FULL:high_load better
[-316.023µs; -106.488µs] or [-5.999%; -2.022%]
same
[-615.548µs; +32.558µs] or [-4.961%; +0.262%]
unstable
[-59.279op/s; +105.404op/s] or [-7.559%; +13.441%]
5.056ms 12.117ms 807.281op/s 5.268ms 12.408ms 784.219op/s
scenario:load:petclinic:profiling:high_load worse
[+0.869ms; +1.854ms] or [+4.777%; +10.196%]
worse
[+1.045ms; +2.649ms] or [+3.535%; +8.958%]
unstable
[-39.782op/s; +9.657op/s] or [-15.818%; +3.840%]
19.544ms 31.423ms 236.438op/s 18.183ms 29.575ms 251.500op/s
scenario:load:petclinic:iast:high_load better
[-1308.411µs; -551.847µs] or [-7.080%; -2.986%]
unsure
[-1.916ms; -0.525ms] or [-6.351%; -1.741%]
unstable
[-10.721op/s; +39.658op/s] or [-4.353%; +16.103%]
17.551ms 28.948ms 260.750op/s 18.481ms 30.169ms 246.281op/s
scenario:load:petclinic:code_origins:high_load better
[-2.048ms; -1.279ms] or [-10.864%; -6.783%]
better
[-2.547ms; -1.285ms] or [-8.405%; -4.240%]
unstable
[-4.108op/s; +47.921op/s] or [-1.682%; +19.620%]
17.191ms 28.391ms 266.156op/s 18.854ms 30.307ms 244.250op/s
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.2 ms) : 1188, 1211
.   : milestone, 1200,
iast (3.231 ms) : 3185, 3277
.   : milestone, 3231,
iast_FULL (5.897 ms) : 5837, 5956
.   : milestone, 5897,
iast_GLOBAL (3.72 ms) : 3660, 3781
.   : milestone, 3720,
profiling (2.004 ms) : 1986, 2022
.   : milestone, 2004,
tracing (1.829 ms) : 1813, 1844
.   : milestone, 1829,
section candidate
no_agent (1.201 ms) : 1189, 1212
.   : milestone, 1201,
iast (3.472 ms) : 3423, 3520
.   : milestone, 3472,
iast_FULL (5.723 ms) : 5666, 5780
.   : milestone, 5723,
iast_GLOBAL (3.582 ms) : 3530, 3634
.   : milestone, 3582,
profiling (2.134 ms) : 2116, 2153
.   : milestone, 2134,
tracing (1.816 ms) : 1800, 1831
.   : milestone, 1816,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.2 ms [1.188 ms, 1.211 ms] -
iast 3.231 ms [3.185 ms, 3.277 ms] 2.032 ms (169.4%)
iast_FULL 5.897 ms [5.837 ms, 5.956 ms] 4.697 ms (391.6%)
iast_GLOBAL 3.72 ms [3.66 ms, 3.781 ms] 2.52 ms (210.1%)
profiling 2.004 ms [1.986 ms, 2.022 ms] 804.678 µs (67.1%)
tracing 1.829 ms [1.813 ms, 1.844 ms] 629.1 µs (52.4%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.201 ms [1.189 ms, 1.212 ms] -
iast 3.472 ms [3.423 ms, 3.52 ms] 2.271 ms (189.1%)
iast_FULL 5.723 ms [5.666 ms, 5.78 ms] 4.522 ms (376.6%)
iast_GLOBAL 3.582 ms [3.53 ms, 3.634 ms] 2.382 ms (198.3%)
profiling 2.134 ms [2.116 ms, 2.153 ms] 933.663 µs (77.8%)
tracing 1.816 ms [1.8 ms, 1.831 ms] 614.867 µs (51.2%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9
    dateFormat X
    axisFormat %s
section baseline
no_agent (17.824 ms) : 17638, 18011
.   : milestone, 17824,
appsec (18.674 ms) : 18481, 18867
.   : milestone, 18674,
code_origins (19.115 ms) : 18925, 19305
.   : milestone, 19115,
iast (18.952 ms) : 18760, 19145
.   : milestone, 18952,
profiling (18.56 ms) : 18376, 18744
.   : milestone, 18560,
tracing (18.807 ms) : 18619, 18995
.   : milestone, 18807,
section candidate
no_agent (19.045 ms) : 18851, 19240
.   : milestone, 19045,
appsec (18.367 ms) : 18181, 18554
.   : milestone, 18367,
code_origins (17.53 ms) : 17358, 17703
.   : milestone, 17530,
iast (17.898 ms) : 17718, 18078
.   : milestone, 17898,
profiling (19.752 ms) : 19547, 19956
.   : milestone, 19752,
tracing (18.84 ms) : 18646, 19034
.   : milestone, 18840,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 17.824 ms [17.638 ms, 18.011 ms] -
appsec 18.674 ms [18.481 ms, 18.867 ms] 849.698 µs (4.8%)
code_origins 19.115 ms [18.925 ms, 19.305 ms] 1.291 ms (7.2%)
iast 18.952 ms [18.76 ms, 19.145 ms] 1.128 ms (6.3%)
profiling 18.56 ms [18.376 ms, 18.744 ms] 735.371 µs (4.1%)
tracing 18.807 ms [18.619 ms, 18.995 ms] 982.494 µs (5.5%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 19.045 ms [18.851 ms, 19.24 ms] -
appsec 18.367 ms [18.181 ms, 18.554 ms] -678.018 µs (-3.6%)
code_origins 17.53 ms [17.358 ms, 17.703 ms] -1.515 ms (-8.0%)
iast 17.898 ms [17.718 ms, 18.078 ms] -1.148 ms (-6.0%)
profiling 19.752 ms [19.547 ms, 19.956 ms] 706.306 µs (3.7%)
tracing 18.84 ms [18.646 ms, 19.034 ms] -205.506 µs (-1.1%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master andrea.marziali/remove-jctools-queues
git_commit_date 1764344838 1764345098
git_commit_sha 17c7fcf 9a902be
release_version 1.57.0-SNAPSHOT~17c7fcf3e9 1.57.0-SNAPSHOT~9a902be485
Matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1764347140 1764347140
ci_job_id 1262017332 1262017332
ci_pipeline_id 84364713 84364713
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-2-yuiicez9 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-2-yuiicez9 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.

Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.404 s) : 15404000, 15404000
.   : milestone, 15404000,
appsec (14.613 s) : 14613000, 14613000
.   : milestone, 14613000,
iast (18.56 s) : 18560000, 18560000
.   : milestone, 18560000,
iast_GLOBAL (17.961 s) : 17961000, 17961000
.   : milestone, 17961000,
profiling (14.806 s) : 14806000, 14806000
.   : milestone, 14806000,
tracing (14.554 s) : 14554000, 14554000
.   : milestone, 14554000,
section candidate
no_agent (15.362 s) : 15362000, 15362000
.   : milestone, 15362000,
appsec (14.667 s) : 14667000, 14667000
.   : milestone, 14667000,
iast (18.461 s) : 18461000, 18461000
.   : milestone, 18461000,
iast_GLOBAL (18.006 s) : 18006000, 18006000
.   : milestone, 18006000,
profiling (14.889 s) : 14889000, 14889000
.   : milestone, 14889000,
tracing (14.759 s) : 14759000, 14759000
.   : milestone, 14759000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.404 s [15.404 s, 15.404 s] -
appsec 14.613 s [14.613 s, 14.613 s] -791.0 ms (-5.1%)
iast 18.56 s [18.56 s, 18.56 s] 3.156 s (20.5%)
iast_GLOBAL 17.961 s [17.961 s, 17.961 s] 2.557 s (16.6%)
profiling 14.806 s [14.806 s, 14.806 s] -598.0 ms (-3.9%)
tracing 14.554 s [14.554 s, 14.554 s] -850.0 ms (-5.5%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.362 s [15.362 s, 15.362 s] -
appsec 14.667 s [14.667 s, 14.667 s] -695.0 ms (-4.5%)
iast 18.461 s [18.461 s, 18.461 s] 3.099 s (20.2%)
iast_GLOBAL 18.006 s [18.006 s, 18.006 s] 2.644 s (17.2%)
profiling 14.889 s [14.889 s, 14.889 s] -473.0 ms (-3.1%)
tracing 14.759 s [14.759 s, 14.759 s] -603.0 ms (-3.9%)
Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.472 ms) : 1460, 1483
.   : milestone, 1472,
appsec (3.646 ms) : 3433, 3860
.   : milestone, 3646,
iast (2.205 ms) : 2140, 2269
.   : milestone, 2205,
iast_GLOBAL (2.252 ms) : 2187, 2317
.   : milestone, 2252,
profiling (2.076 ms) : 2023, 2130
.   : milestone, 2076,
tracing (2.038 ms) : 1987, 2088
.   : milestone, 2038,
section candidate
no_agent (1.476 ms) : 1464, 1487
.   : milestone, 1476,
appsec (3.703 ms) : 3485, 3922
.   : milestone, 3703,
iast (2.202 ms) : 2138, 2266
.   : milestone, 2202,
iast_GLOBAL (2.252 ms) : 2187, 2317
.   : milestone, 2252,
profiling (2.067 ms) : 2015, 2119
.   : milestone, 2067,
tracing (2.045 ms) : 1995, 2096
.   : milestone, 2045,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.472 ms [1.46 ms, 1.483 ms] -
appsec 3.646 ms [3.433 ms, 3.86 ms] 2.175 ms (147.8%)
iast 2.205 ms [2.14 ms, 2.269 ms] 732.779 µs (49.8%)
iast_GLOBAL 2.252 ms [2.187 ms, 2.317 ms] 780.218 µs (53.0%)
profiling 2.076 ms [2.023 ms, 2.13 ms] 604.676 µs (41.1%)
tracing 2.038 ms [1.987 ms, 2.088 ms] 565.89 µs (38.4%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.476 ms [1.464 ms, 1.487 ms] -
appsec 3.703 ms [3.485 ms, 3.922 ms] 2.228 ms (151.0%)
iast 2.202 ms [2.138 ms, 2.266 ms] 726.576 µs (49.2%)
iast_GLOBAL 2.252 ms [2.187 ms, 2.317 ms] 776.393 µs (52.6%)
profiling 2.067 ms [2.015 ms, 2.119 ms] 591.001 µs (40.1%)
tracing 2.045 ms [1.995 ms, 2.096 ms] 569.823 µs (38.6%)

@amarziali amarziali force-pushed the andrea.marziali/remove-jctools-queues branch 6 times, most recently from 229f67a to 374d13d Compare November 7, 2025 14:59
@amarziali amarziali changed the title Removes jctools usage for lock-free queues. Replace JCTools queues with VarHandle-based implementations for Java 9+ Nov 10, 2025
@amarziali amarziali force-pushed the andrea.marziali/remove-jctools-queues branch 2 times, most recently from 21e0a65 to 259eeb5 Compare November 10, 2025 15:25
@amarziali amarziali marked this pull request as ready for review November 10, 2025 16:20
@amarziali amarziali requested review from a team as code owners November 10, 2025 16:20
@amarziali amarziali requested a review from mcculls November 10, 2025 16:20
@github-actions
Contributor

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

  • Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

@amarziali amarziali requested a review from dougqh November 10, 2025 16:20
@amarziali amarziali added type: enhancement Enhancements and improvements comp: core Tracer core labels Nov 10, 2025
@amarziali amarziali force-pushed the andrea.marziali/remove-jctools-queues branch from 9e7acbe to b2850b3 Compare November 12, 2025 09:01
@franz1981

Hi @amarziali, I am one of the developers of JCTools, and we would be super happy to bring a VarHandle-based variant into our library as well.
I can see the value (faster feedback, different ownership) of having a stripped-down version of our dependency (which can actually still be obtained via shading...), but I believe it would be of great value to the community if we could join efforts... plus, we love contributions ☺️

Note: JCTools is at the very core of other frameworks, including Netty, which will soon hit the "no Unsafe" JVM barrier.
I have recently been working hard to improve Netty in this respect, and JCTools is one of the key missing pieces there too.
This means that contributing to JCTools would bring enormous value to Netty and to many other very impactful projects as well.
🙏

@amarziali amarziali force-pushed the andrea.marziali/remove-jctools-queues branch 3 times, most recently from 85b0dcd to fc49419 Compare November 17, 2025 12:28
@amarziali amarziali force-pushed the andrea.marziali/remove-jctools-queues branch from fc49419 to 183fc37 Compare November 19, 2025 17:27
@amarziali amarziali force-pushed the andrea.marziali/remove-jctools-queues branch from 183fc37 to 9a902be Compare November 28, 2025 15:51
Contributor

@bric3 bric3 left a comment


First batch of comments, questions, open thoughts...

Comment on lines +23 to +24
// -1 in two's complement = 0xFFFF_FFFF (all bits set to 1). Unsigned right-shifting by n: (-1
// >>> n) produces a mask of (32 - n) one-bits.
Contributor


suggestion:

Avoids a meaningless Spotless reformat.

Suggested change
// -1 in two's complement = 0xFFFF_FFFF (all bits set to 1). Unsigned right-shifting by n: (-1
// >>> n) produces a mask of (32 - n) one-bits.
// -1 in two's complement = 0xFFFF_FFFF (all bits set to 1).
// Unsigned right-shifting by n: (-1 >>> n) produces a mask
// of (32 - n) one-bits.
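The arithmetic in that comment can be checked with a small sketch. Assuming the mask is used to size a power-of-two ring buffer (the surrounding PR code is not shown here), `-1 >>> Integer.numberOfLeadingZeros(capacity - 1)` yields a mask covering every significant bit of `capacity - 1`, and adding one rounds the capacity up to a power of two. `PowerOfTwo` and its methods are hypothetical names.

```java
// Hypothetical helpers illustrating the "-1 >>> n" mask comment; not the PR's code.
final class PowerOfTwo {
  private PowerOfTwo() {}

  /** Mask of (32 - n) low one-bits, for 0 <= n <= 31 (Java masks the shift distance to 5 bits). */
  static int lowBitsMask(int n) {
    return -1 >>> n;
  }

  /** Smallest power of two >= capacity, for 1 <= capacity <= 2^30. */
  static int roundUp(int capacity) {
    if (capacity < 1 || capacity > (1 << 30)) {
      throw new IllegalArgumentException("capacity out of range: " + capacity);
    }
    // capacity == 1 is special: numberOfLeadingZeros(0) is 32, and -1 >>> 32 == -1.
    // Otherwise, if capacity - 1 has k significant bits, the shift keeps exactly
    // k one-bits, and adding 1 gives 2^k.
    return capacity == 1 ? 1 : (-1 >>> Integer.numberOfLeadingZeros(capacity - 1)) + 1;
  }
}
```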

Comment on lines 21 to 23
tasks.withType<Javadoc>().configureEach() {
javadocTool = javaToolchains.javadocToolFor(java.toolchain)
}
Contributor


thought: Do we need Javadoc for a submodule?

Contributor Author

@amarziali amarziali Dec 2, 2025


Good question. To be fair, I copy-pasted it from another module in the same utils folder. I can probably get rid of it.

Comment on lines +22 to +31
/*
Benchmark (capacity) Mode Cnt Score Error Units
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest 1024 thrpt 41,149 ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest:consume 1024 thrpt 30,661 ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest:produce 1024 thrpt 10,488 ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest 65536 thrpt 32,413 ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest:consume 65536 thrpt 24,680 ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest:produce 65536 thrpt 7,733 ops/us
*/
@BenchmarkMode(Mode.Throughput)
Contributor


thought: Might it be worth having benchmarks on both aarch64 and x86_64?

There might be differences in how VarHandle accesses translate to machine code on each.

Contributor Author


As far as I know, fences are cheaper on x86_64, so running the JMH benchmark on the worst-case architecture should already give a meaningful picture. Just keep in mind that benchmarks of this type tend to be very sensitive to machine conditions because the operations are so small. My goal was only to get a rough estimate of the expected behaviour, not to push the microbenchmarking further. In realistic scenarios, I’m more interested in understanding the impact on the host application, and that’s already covered by the macro benchmarks we have.
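The reply refers to the cost of memory-ordering fences. The minimal sketch below shows the release/acquire handoff that the queues build on. The class and field names are hypothetical. On x86_64, whose memory model is already close to total store order, `setRelease`/`getAcquire` usually compile to ordinary stores and loads. On aarch64 they typically need dedicated store-release/load-acquire instructions or barriers, so the ordering cost can differ between the two architectures.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public final class ReleaseAcquireHandoff {
  private static final VarHandle READY;

  static {
    try {
      READY = MethodHandles.lookup()
          .findVarHandle(ReleaseAcquireHandoff.class, "ready", boolean.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private int payload;   // plain field, published by the release store below
  private boolean ready; // accessed only through READY

  void publish(int value) {
    payload = value;              // plain write
    READY.setRelease(this, true); // release: the payload write cannot move after this store
  }

  /** Spins until the payload has been published, then returns it. */
  int awaitPayload() {
    while (!(boolean) READY.getAcquire(this)) { // acquire: later reads cannot move before it
      Thread.onSpinWait();
    }
    return payload; // observes the value written before setRelease
  }

  public static void main(String[] args) throws InterruptedException {
    ReleaseAcquireHandoff h = new ReleaseAcquireHandoff();
    Thread consumer = new Thread(() -> System.out.println(h.awaitPayload()));
    consumer.start();
    h.publish(42);
    consumer.join();
  }
}
```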

Comment on lines +20 to +28
/*
Benchmark (capacity) Mode Cnt Score Error Units
MPSCQueueBenchmark.queueTest 1024 thrpt 238,609 ops/us
MPSCQueueBenchmark.queueTest:consume 1024 thrpt 222,383 ops/us
MPSCQueueBenchmark.queueTest:produce 1024 thrpt 16,226 ops/us
MPSCQueueBenchmark.queueTest 65536 thrpt 262,729 ops/us
MPSCQueueBenchmark.queueTest:consume 65536 thrpt 250,627 ops/us
MPSCQueueBenchmark.queueTest:produce 65536 thrpt 12,102 ops/us
*/
Contributor

suggestion: Remove the tabs and fix the alignment.

Suggested change
/*
Benchmark (capacity) Mode Cnt Score Error Units
MPSCQueueBenchmark.queueTest 1024 thrpt 238,609 ops/us
MPSCQueueBenchmark.queueTest:consume 1024 thrpt 222,383 ops/us
MPSCQueueBenchmark.queueTest:produce 1024 thrpt 16,226 ops/us
MPSCQueueBenchmark.queueTest 65536 thrpt 262,729 ops/us
MPSCQueueBenchmark.queueTest:consume 65536 thrpt 250,627 ops/us
MPSCQueueBenchmark.queueTest:produce 65536 thrpt 12,102 ops/us
*/
/*
Benchmark (capacity) Mode Cnt Score Error Units
MPSCQueueBenchmark.queueTest 1024 thrpt 238,609 ops/us
MPSCQueueBenchmark.queueTest:consume 1024 thrpt 222,383 ops/us
MPSCQueueBenchmark.queueTest:produce 1024 thrpt 16,226 ops/us
MPSCQueueBenchmark.queueTest 65536 thrpt 262,729 ops/us
MPSCQueueBenchmark.queueTest:consume 65536 thrpt 250,627 ops/us
MPSCQueueBenchmark.queueTest:produce 65536 thrpt 12,102 ops/us
*/

Objects.requireNonNull(e);

// JCTools makes the same local copy so that the JIT can optimise the accesses
final Object[] localBuffer = this.buffer;
Contributor

suggestion:

Suggested change
final Object[] localBuffer = this.buffer;
@SuppressWarnings("UnnecessaryLocalVariable")
final Object[] localBuffer = this.buffer;
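The minimal sketch below shows the local-copy pattern, with hypothetical class and method names. The field is read once into a final local. This lets the JIT keep the array reference in a register instead of re-reading `this.buffer` around the volatile accesses. The annotation only silences an IDE inspection that flags the copy as redundant.

```java
import java.util.Objects;

public final class LocalCopyExample {
  private final Object[] buffer;
  private final int mask;
  private volatile long tail; // single writer; volatile so readers can observe progress

  LocalCopyExample(int powerOfTwoCapacity) {
    this.buffer = new Object[powerOfTwoCapacity];
    this.mask = powerOfTwoCapacity - 1;
  }

  /** Single-producer append that wraps around and overwrites; only illustrates the local copy. */
  void append(Object e) {
    Objects.requireNonNull(e);
    // Read the field once; later accesses use the local instead of reloading this.buffer.
    @SuppressWarnings("UnnecessaryLocalVariable")
    final Object[] localBuffer = this.buffer;
    final long t = tail;               // volatile read
    localBuffer[(int) (t & mask)] = e; // store into the slot through the local copy
    tail = t + 1;                      // volatile write publishes the slot
  }

  Object slot(int i) {
    return buffer[i & mask];
  }
}
```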

import java.lang.invoke.VarHandle;

/** A padded, volatile long sequence value designed to minimize false sharing. */
public final class PaddedSequence extends LongRhsPadding {
Contributor

note: It doesn't always seem to be used as a "sequence".
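The minimal sketch below builds a padded long from a padding class hierarchy. Only `LongRhsPadding` and `PaddedSequence` appear in the diff. The `LongLhsPadding`/`LongValue` layers, the field names and the accessor methods are assumptions. The sketch relies on HotSpot generally laying out superclass fields before subclass fields. The JVM specification does not guarantee that layout.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical layering: 7 longs (56 bytes) on each side of the value keep it off
// cache lines shared with neighbouring hot fields, assuming 64-byte cache lines.
abstract class LongLhsPadding {
  @SuppressWarnings("unused")
  private long p0, p1, p2, p3, p4, p5, p6;
}

abstract class LongValue extends LongLhsPadding {
  protected volatile long value;
}

abstract class LongRhsPadding extends LongValue {
  @SuppressWarnings("unused")
  private long q0, q1, q2, q3, q4, q5, q6;
}

public final class PaddedSequence extends LongRhsPadding {
  private static final VarHandle VALUE;

  static {
    try {
      VALUE = MethodHandles.lookup().findVarHandle(LongValue.class, "value", long.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  public PaddedSequence(long initial) {
    VALUE.setRelease(this, initial);
  }

  public long getAcquire() {
    return (long) VALUE.getAcquire(this);
  }

  public void setRelease(long v) {
    VALUE.setRelease(this, v);
  }

  public boolean compareAndSet(long expected, long next) {
    return VALUE.compareAndSet(this, expected, next);
  }
}
```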

@SuppressWarnings("unused")
private long r0, r1, r2, r3, r4, r5, r6;

public BaseQueue(int requestedCapacity) {
Contributor

nitpick:

Suggested change
public BaseQueue(int requestedCapacity) {
public BaseQueue(int requestedCapacity) {
if (requestedCapacity < 1) { throw new IllegalArgumentException("Size needs to be at least 1"); }
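The minimal sketch below adds the suggested validation to a constructor. It also makes an assumption the diff does not show: that, like JCTools, the queue rounds the requested capacity up to a power of two so that slot indices can be computed with a mask instead of `%`. `CapacitySketch`, `roundToPowerOfTwo` and `index` are hypothetical names.

```java
public final class CapacitySketch {
  final int capacity;
  final int mask;

  CapacitySketch(int requestedCapacity) {
    if (requestedCapacity < 1) {
      throw new IllegalArgumentException("Size needs to be at least 1");
    }
    if (requestedCapacity > (1 << 30)) {
      throw new IllegalArgumentException("Size must be at most 2^30");
    }
    this.capacity = roundToPowerOfTwo(requestedCapacity);
    this.mask = capacity - 1; // all low bits set, e.g. 1023 for a capacity of 1024
  }

  /** Smallest power of two >= value, valid for 1 <= value <= 2^30. */
  static int roundToPowerOfTwo(int value) {
    return 1 << (32 - Integer.numberOfLeadingZeros(value - 1));
  }

  /** Maps an ever-increasing sequence number to an array slot. */
  int index(long sequence) {
    return (int) (sequence & mask);
  }
}
```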

public MpscArrayQueueVarHandle(int requestedCapacity) {
super(requestedCapacity);
this.producerLimit = new PaddedSequence(capacity);
;
Contributor

Suggested change
;
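The constructor seeds `producerLimit` with `capacity`. The minimal sketch below shows how such a cached limit is typically used in an MPSC `offer`, following the approach of JCTools' MPSC array queue. Producers claim a slot with a CAS on the tail. They re-read the consumer's head only when the tail reaches the cached limit. Apart from the `producerLimit` idea, everything here is an assumption: the class name, the field names, and the omission of padding.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Objects;

public final class MpscSketch<E> {
  private static final VarHandle HEAD, TAIL, LIMIT;
  private static final VarHandle SLOT = MethodHandles.arrayElementVarHandle(Object[].class);

  static {
    try {
      MethodHandles.Lookup l = MethodHandles.lookup();
      HEAD = l.findVarHandle(MpscSketch.class, "head", long.class);
      TAIL = l.findVarHandle(MpscSketch.class, "tail", long.class);
      LIMIT = l.findVarHandle(MpscSketch.class, "producerLimit", long.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private final Object[] buffer;
  private final int mask;
  private volatile long head;          // written only by the consumer
  private volatile long tail;          // claimed by producers via CAS
  private volatile long producerLimit; // cached (head + capacity)

  public MpscSketch(int powerOfTwoCapacity) {
    buffer = new Object[powerOfTwoCapacity];
    mask = powerOfTwoCapacity - 1;
    producerLimit = powerOfTwoCapacity; // empty queue: the tail may advance a full capacity
  }

  public boolean offer(E e) {
    Objects.requireNonNull(e);
    long limit = (long) LIMIT.getVolatile(this);
    long t;
    do {
      t = (long) TAIL.getVolatile(this);
      if (t >= limit) {
        // Cached limit exhausted: refresh it from the consumer's head (the contended read).
        long h = (long) HEAD.getAcquire(this);
        limit = h + buffer.length;
        if (t >= limit) {
          return false; // the queue is really full
        }
        LIMIT.setRelease(this, limit); // a stale value here is only conservative, never unsafe
      }
    } while (!TAIL.compareAndSet(this, t, t + 1));
    SLOT.setRelease(buffer, (int) (t & mask), e); // publish the element into the claimed slot
    return true;
  }

  /** Single consumer only. */
  @SuppressWarnings("unchecked")
  public E poll() {
    long h = head; // volatile read; only this consumer writes head
    int i = (int) (h & mask);
    E e = (E) SLOT.getAcquire(buffer, i);
    if (e == null) {
      return null; // empty, or a producer has claimed the slot but not written it yet
    }
    SLOT.setRelease(buffer, i, null);
    HEAD.setRelease(this, h + 1);
    return e;
  }
}
```

Unlike JCTools, this sketch's `poll` returns `null` when a producer has claimed a slot but not yet filled it, instead of spinning until the element appears.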

MetricWriter writer,
Queue<Batch> batchPool,
MpscCompoundQueue<InboxItem> inbox,
NonBlockingQueue<InboxItem> inbox,
Contributor

thought: I wonder if it makes sense to communicate the properties of this queue through its type here, i.e. multiple producers, single consumer?
That way, readers of this code don't have to work out the relationship between the consumers and the producers.

Contributor Author

It would require creating too many additional types, which I deliberately want to avoid. Functionally, it wouldn’t change anything. I think it’s sufficient for the reader to simply check which queue variant is requested at creation time.

long cachedHead = 0L; // Local cache of head to reduce volatile reads

int spinCycles = 0;
boolean parkOnSpin = (Thread.currentThread().getId() & 1) == 0;
Contributor

Might be worth a longer comment here?
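The source does not explain the thread-id parity check. What follows is only a guess at its intent, written as the kind of commented backoff loop the reviewer asks for. A thread first spins briefly. It then backs off: even-id threads park for a short time and odd-id threads yield, so that contending threads do not all back off the same way. `SpinThenBackoff`, `awaitTrue`, `SPIN_LIMIT` and `PARK_NANOS` are hypothetical.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

public final class SpinThenBackoff {
  private static final int SPIN_LIMIT = 100;     // hypothetical tuning value
  private static final long PARK_NANOS = 1_000L; // hypothetical 1 microsecond backoff

  /** Waits until the condition holds, escalating from busy-spinning to yielding or parking. */
  static void awaitTrue(BooleanSupplier condition) {
    int spinCycles = 0;
    // Split threads by id parity so that contending threads do not all back off the same way:
    // even ids park briefly, odd ids yield. (Assumed intent; not confirmed by the source.)
    boolean parkOnSpin = (Thread.currentThread().getId() & 1) == 0;
    while (!condition.getAsBoolean()) {
      if (spinCycles < SPIN_LIMIT) {
        spinCycles++;
        Thread.onSpinWait();               // CPU hint while the wait is expected to be short
      } else if (parkOnSpin) {
        LockSupport.parkNanos(PARK_NANOS); // give up the CPU for a short time
      } else {
        Thread.yield();                    // let other runnable threads go first
      }
    }
  }
}
```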

@amarziali force-pushed the andrea.marziali/remove-jctools-queues branch from 9a902be to c8499a4 December 5, 2025 19:00
@amarziali requested a review from a team as a code owner December 5, 2025 19:00
Labels

comp: core Tracer core type: enhancement Enhancements and improvements
