Replace JCTools queues with VarHandle-based implementations for Java 9+ #9896
Debugger benchmarks
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 5 unstable metrics.
[Gantt chart: reports - request duration [CI 0.99], baseline vs. candidate, for noprobe, basic, and loop]
Benchmarks
Startup
Summary: Found 3 performance improvements and 0 performance regressions! Performance is the same for 52 metrics, 10 unstable metrics.
[Gantt charts: startup time for insecure-bank (tracing, iast) and petclinic (tracing, appsec, iast, profiling), shown as global startup overhead and as a breakdown per module; candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9]
Load
Summary: Found 4 performance improvements and 4 performance regressions! Performance is the same for 11 metrics, 17 unstable metrics.
[Gantt charts: request duration [CI 0.99], baseline vs. candidate, for insecure-bank (no_agent, iast, iast_FULL, iast_GLOBAL, profiling, tracing) and petclinic (no_agent, appsec, code_origins, iast, profiling, tracing); candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9]
Dacapo
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.
[Gantt charts: execution time [CI 0.99], baseline vs. candidate, for biojava and tomcat (no_agent, appsec, iast, iast_GLOBAL, profiling, tracing); candidate=1.57.0-SNAPSHOT~9a902be485, baseline=1.57.0-SNAPSHOT~17c7fcf3e9]
Hi @amarziali, I am one of the developers of JCTools, and we would be super happy to bring a VarHandle generation variant into our lib as well. Note: JCTools is at the very core of other frameworks, including Netty, which will soon hit the "no Unsafe world" JVM barrier.
bric3
left a comment
First batch of comments, questions, open thoughts...
// -1 in two's complement = 0xFFFF_FFFF (all bits set to 1). Unsigned right-shifting by n: (-1
// >>> n) produces a mask of (32 - n) one-bits.
suggestion: Avoids a meaningless Spotless reformat
- // -1 in two's complement = 0xFFFF_FFFF (all bits set to 1). Unsigned right-shifting by n: (-1
- // >>> n) produces a mask of (32 - n) one-bits.
+ // -1 in two's complement = 0xFFFF_FFFF (all bits set to 1).
+ // Unsigned right-shifting by n: (-1 >>> n) produces a mask
+ // of (32 - n) one-bits.
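To make the arithmetic in that code comment concrete, here is a small sketch. The class and method names are illustrative only and do not come from the PR. Note that Java masks an int shift distance to its low five bits, so the relation only holds for n between 0 and 31.

```java
// Illustrative only: MaskDemo and lowOnes are hypothetical names, not part of the PR.
final class MaskDemo {

  /** Returns an int whose low (32 - n) bits are set, for 0 <= n <= 31. */
  static int lowOnes(int n) {
    // -1 is 0xFFFF_FFFF; the unsigned shift fills the top n bits with zeros.
    return -1 >>> n;
  }

  public static void main(String[] args) {
    for (int n = 0; n < 32; n++) {
      int mask = lowOnes(n);
      System.out.println(n + " -> " + Integer.toBinaryString(mask)
          + " (" + Integer.bitCount(mask) + " one-bits)");
    }
  }
}
```

For example, `lowOnes(28)` leaves only the four lowest bits set.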
utils/queue-utils/build.gradle.kts
tasks.withType<Javadoc>().configureEach() {
  javadocTool = javaToolchains.javadocToolFor(java.toolchain)
}
thought: Do we need Javadoc for a submodule?
Good question. To be fair, I copy-pasted it from another one in the same utils folder. I can perhaps get rid of
/*
Benchmark                                                    (capacity)   Mode  Cnt    Score  Error   Units
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest                1024  thrpt        41,149         ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest:consume        1024  thrpt        30,661         ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest:produce        1024  thrpt        10,488         ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest               65536  thrpt        32,413         ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest:consume       65536  thrpt        24,680         ops/us
JctoolsMPSCBlockingConsumerQueueBenchmark.queueTest:produce       65536  thrpt         7,733         ops/us
*/
@BenchmarkMode(Mode.Throughput)
thought: It might be worth having benchmarks on both aarch64 and x86_64?
There might be differences in how VarHandle operations translate to machine code.
As far as I know, fences are cheaper on x86_64, so running the JMH benchmark on the worst-case architecture should already give a meaningful picture. Just keep in mind that benchmarks of this type tend to be very sensitive to machine conditions because the operations are so small. My goal was only to get a rough estimate of the expected behaviour, not to push the microbenchmarking further. In realistic scenarios, I’m more interested in understanding the impact on the host application, and that’s already covered by the macro benchmarks we have.
/*
Benchmark                             (capacity)   Mode  Cnt    Score  Error   Units
MPSCQueueBenchmark.queueTest                1024  thrpt       238,609         ops/us
MPSCQueueBenchmark.queueTest:consume        1024  thrpt       222,383         ops/us
MPSCQueueBenchmark.queueTest:produce        1024  thrpt        16,226         ops/us
MPSCQueueBenchmark.queueTest               65536  thrpt       262,729         ops/us
MPSCQueueBenchmark.queueTest:consume       65536  thrpt       250,627         ops/us
MPSCQueueBenchmark.queueTest:produce       65536  thrpt        12,102         ops/us
*/
suggestion: Removed tabs + alignment
/*
Benchmark                             (capacity)   Mode  Cnt    Score  Error   Units
MPSCQueueBenchmark.queueTest                1024  thrpt       238,609         ops/us
MPSCQueueBenchmark.queueTest:consume        1024  thrpt       222,383         ops/us
MPSCQueueBenchmark.queueTest:produce        1024  thrpt        16,226         ops/us
MPSCQueueBenchmark.queueTest               65536  thrpt       262,729         ops/us
MPSCQueueBenchmark.queueTest:consume       65536  thrpt       250,627         ops/us
MPSCQueueBenchmark.queueTest:produce       65536  thrpt        12,102         ops/us
*/
Objects.requireNonNull(e);

// jctools does the same local copy to have the jitter optimise the accesses
final Object[] localBuffer = this.buffer;
suggestion:
- final Object[] localBuffer = this.buffer;
+ @SuppressWarnings("UnnecessaryLocalVariable")
+ final Object[] localBuffer = this.buffer;
import java.lang.invoke.VarHandle;

/** A padded, volatile long sequence value designed to minimize false sharing. */
public final class PaddedSequence extends LongRhsPadding {
note: It doesn't always seem to be used as a "sequence".
@SuppressWarnings("unused")
private long r0, r1, r2, r3, r4, r5, r6;

public BaseQueue(int requestedCapacity) {
nitpick:
- public BaseQueue(int requestedCapacity) {
+ public BaseQueue(int requestedCapacity) {
+   if (requestedCapacity < 1) {
+     throw new IllegalArgumentException("Size needs to be at least 1");
+   }
public MpscArrayQueueVarHandle(int requestedCapacity) {
  super(requestedCapacity);
  this.producerLimit = new PaddedSequence(capacity);
  ;
- ;
MetricWriter writer,
Queue<Batch> batchPool,
- MpscCompoundQueue<InboxItem> inbox,
+ NonBlockingQueue<InboxItem> inbox,
thought: I wonder if it makes sense to communicate the properties of this queue via the type here, i.e. multiple producers, single consumer?
That way readers of this code don't have to work out the relationship between the consumers and producers.
It would require creating too many additional types, which I deliberately want to avoid. Functionally, it wouldn’t change anything. I think it’s sufficient for the reader to simply check which queue variant is requested at creation time.
long cachedHead = 0L; // Local cache of head to reduce volatile reads

int spinCycles = 0;
boolean parkOnSpin = (Thread.currentThread().getId() & 1) == 0;
Might be worth a larger comment here?
This reverts commit 14cc597.
What Does This Do
This PR introduces a set of queue implementations in order to replace the JCTools-based queues, eliminating direct usage of sun.misc.Unsafe and providing full compatibility with Java 9+ runtimes through the VarHandle API.
The goal is to achieve similar high-performance concurrent queue behavior as JCTools while using supported, standard Java mechanisms.
A new Queues factory class is introduced to dynamically select the optimal queue implementation based on the Java runtime environment:

Introduced Classes Summary
SpscArrayQueueVarHandle
SpmcArrayQueueVarHandle: [...] consumerLimit caching to reduce volatile contention.
MpscArrayQueueVarHandle<E>: [...] TAIL_HANDLE. Maintains a producerLimit to minimize volatile head reads.
MpscBlockingConsumerArrayQueueVarHandle<E>: [...] CONSUMER_THREAD_HANDLE to park/unpark the waiting consumer efficiently.

Memory Padding
All queue state fields (head, tail, cached limits, etc.) are cache-line padded to prevent false sharing between producers and consumers.
This ensures that frequently accessed hot fields do not reside on the same cache line across threads, minimizing cache invalidations and improving throughput under contention.
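As a rough illustration of the padding idea, the sketch below surrounds one hot long field with unused longs by splitting the layout across a small class hierarchy. All class names here are hypothetical (the PR's own types, such as PaddedSequence, are not reproduced), and the sketch assumes 64-byte cache lines and a JVM that keeps superclass fields ahead of subclass fields, which the Java language specification does not guarantee.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical names. Seven longs (56 bytes) on each side of the hot field,
// assuming 64-byte cache lines.
class PadLeft {
  @SuppressWarnings("unused")
  long l0, l1, l2, l3, l4, l5, l6;
}

class HotValue extends PadLeft {
  volatile long value; // the only field that is actually read and written
}

final class PaddedCounter extends HotValue {
  @SuppressWarnings("unused")
  long r0, r1, r2, r3, r4, r5, r6;

  private static final VarHandle VALUE;

  static {
    try {
      VALUE = MethodHandles.lookup().findVarHandle(HotValue.class, "value", long.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  /** Acquire read: later reads and writes cannot be reordered before it. */
  long getAcquire() {
    return (long) VALUE.getAcquire(this);
  }

  /** Release write: earlier reads and writes cannot be reordered after it. */
  void setRelease(long newValue) {
    VALUE.setRelease(this, newValue);
  }
}
```

Two such counters placed next to each other in memory should then not share a cache line for their hot field, at the cost of roughly 112 extra bytes per instance.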
Memory Fence Semantics
Memory fences were explicitly chosen for each access type to minimize volatile overhead while maintaining correct visibility guarantees:
- setRelease/getAcquire for publishing and consuming elements: provides correct inter-thread ordering without full barriers.
- setOpaque/getOpaque for relaxed head/tail updates: avoids unnecessary synchronization costs where ordering is not required.
- getVolatile only used when full memory fences are really required (e.g. refreshing limits to ensure visibility when the queue might be full or empty).

Queue Benchmark Results (ops/us)
Note: The SPSC benchmark shows contention on the slow path (i.e. when the queue is full or empty). This should happen less frequently in our case. Increasing the queue size (hence reducing the probability that it is full) shows good performance.
MPSCBlockingConsumer Queue Benchmark (ops/us)
MPSC Queue Benchmark (ops/us)
SPSC Queue Benchmark (ops/us)
Takeaways:
Room for future improvements
In high-throughput scenarios where multiple producers compete for queue space, contention on the CAS operation can become a bottleneck.
One idea to mitigate this: when the queue is likely not full, a getAndAdd operation can be used instead of a CAS to claim slots, since it never fails. This optimization allows multiple producers to advance the tail index with reduced atomic contention. However, when the queue is nearly full, getAndAdd cannot be used reliably, so the classic CAS loop (slow path) is used instead.
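For context, the classic CAS loop mentioned above can be sketched as a minimal bounded MPSC queue. This is not the PR's MpscArrayQueueVarHandle: the class name TinyMpscQueue is hypothetical, there is no padding, and no cached producerLimit (the head is re-read with getVolatile on every attempt). It only shows the baseline that a getAndAdd fast path would try to improve on, together with the release/acquire publication of elements described under Memory Fence Semantics.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/** Hypothetical minimal bounded MPSC queue; a sketch, not the PR's implementation. */
final class TinyMpscQueue<E> {
  private static final VarHandle HEAD;
  private static final VarHandle TAIL;
  private static final VarHandle SLOT = MethodHandles.arrayElementVarHandle(Object[].class);

  static {
    try {
      MethodHandles.Lookup lookup = MethodHandles.lookup();
      HEAD = lookup.findVarHandle(TinyMpscQueue.class, "head", long.class);
      TAIL = lookup.findVarHandle(TinyMpscQueue.class, "tail", long.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private final Object[] buffer;
  private final int mask;
  @SuppressWarnings("unused") private long head; // written only by the consumer
  @SuppressWarnings("unused") private long tail; // claimed by producers via CAS

  TinyMpscQueue(int capacity) {
    if (capacity < 1 || Integer.bitCount(capacity) != 1) {
      throw new IllegalArgumentException("capacity must be a power of two");
    }
    buffer = new Object[capacity];
    mask = capacity - 1;
  }

  /** Any thread: claim a slot with a CAS loop, then publish the element. */
  boolean offer(E e) {
    if (e == null) throw new NullPointerException();
    long t;
    do {
      t = (long) TAIL.getVolatile(this);
      long h = (long) HEAD.getVolatile(this);
      if (t - h >= buffer.length) return false; // queue is full
    } while (!TAIL.compareAndSet(this, t, t + 1)); // fails and retries under contention
    SLOT.setRelease(buffer, (int) (t & mask), e); // publish after the claim
    return true;
  }

  /** Single consumer thread only. */
  @SuppressWarnings("unchecked")
  E poll() {
    long h = (long) HEAD.getOpaque(this); // only this thread ever writes head
    int i = (int) (h & mask);
    Object e = SLOT.getAcquire(buffer, i);
    if (e == null) {
      if (h == (long) TAIL.getVolatile(this)) return null; // queue is empty
      do { // a producer has claimed the slot but not published yet
        Thread.onSpinWait();
        e = SLOT.getAcquire(buffer, i);
      } while (e == null);
    }
    SLOT.setRelease(buffer, i, (Object) null); // free the slot before moving head
    HEAD.setRelease(this, h + 1);
    return (E) e;
  }
}
```

Every producer that loses the compareAndSet race has to re-read both indices and try again, which is the contention the paragraph above describes. A getAndAdd on the tail would always succeed, but it cannot be undone if the claimed index turns out to be beyond the free space, which is why that variant needs a separate safe path when the queue is nearly full.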