Fix virtual thread propagation on thread remounts#10887
Fix virtual thread propagation on thread remounts#10887
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8069c80ae9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
...rc/main/java/datadog/trace/instrumentation/java/lang/jdk21/VirtualThreadInstrumentation.java
Outdated
Show resolved
Hide resolved
PerfectSlayer
left a comment
There was a problem hiding this comment.
Thanks for the reproducer. What's the overhead from moving from State to ConcurrentState?
| // Sleeping should park and later remount the virtual thread. | ||
| Thread.sleep(10); |
There was a problem hiding this comment.
🔨 issue: This is not deterministic. You can't use Thread.sleep() and expect the VM to unmount the thread. This will be a source of flakiness.
There was a problem hiding this comment.
well, not really of flakiness, because if it's not unmounted then the trace and span IDs will stay the same, but "reverse flakiness" in the sense that if it's broken, we may not see it
There was a problem hiding this comment.
Unfortunately, depending on what is exposed, we may not have a way to make this fully reliable. For similar threading / scheduling issues, I took the approach of just repeating the test many times across multiple core / threads.
Alternatively, you could also add in a signal that the relevant code was actually triggered, so we can verify that this is working.
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 68 metrics, 3 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~a67163a58b, baseline=1.61.0-SNAPSHOT~9f2354e364
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.06 s) : 0, 1060262
Total [baseline] (8.848 s) : 0, 8847670
Agent [candidate] (1.056 s) : 0, 1055839
Total [candidate] (8.795 s) : 0, 8795088
section iast
Agent [baseline] (1.238 s) : 0, 1238170
Total [baseline] (9.552 s) : 0, 9552045
Agent [candidate] (1.237 s) : 0, 1236867
Total [candidate] (9.558 s) : 0, 9558030
gantt
title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~a67163a58b, baseline=1.61.0-SNAPSHOT~9f2354e364
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.203 ms) : 0, 1203
crashtracking [candidate] (1.187 ms) : 0, 1187
BytebuddyAgent [baseline] (631.966 ms) : 0, 631966
BytebuddyAgent [candidate] (629.004 ms) : 0, 629004
AgentMeter [baseline] (29.369 ms) : 0, 29369
AgentMeter [candidate] (29.136 ms) : 0, 29136
GlobalTracer [baseline] (257.969 ms) : 0, 257969
GlobalTracer [candidate] (257.282 ms) : 0, 257282
AppSec [baseline] (31.78 ms) : 0, 31780
AppSec [candidate] (31.615 ms) : 0, 31615
Debugger [baseline] (59.657 ms) : 0, 59657
Debugger [candidate] (59.459 ms) : 0, 59459
Remote Config [baseline] (594.614 µs) : 0, 595
Remote Config [candidate] (584.89 µs) : 0, 585
Telemetry [baseline] (8.016 ms) : 0, 8016
Telemetry [candidate] (8.053 ms) : 0, 8053
Flare Poller [baseline] (3.517 ms) : 0, 3517
Flare Poller [candidate] (3.487 ms) : 0, 3487
section iast
crashtracking [baseline] (1.207 ms) : 0, 1207
crashtracking [candidate] (1.23 ms) : 0, 1230
BytebuddyAgent [baseline] (804.463 ms) : 0, 804463
BytebuddyAgent [candidate] (803.318 ms) : 0, 803318
AgentMeter [baseline] (11.557 ms) : 0, 11557
AgentMeter [candidate] (11.592 ms) : 0, 11592
GlobalTracer [baseline] (249.145 ms) : 0, 249145
GlobalTracer [candidate] (248.866 ms) : 0, 248866
IAST [baseline] (25.48 ms) : 0, 25480
IAST [candidate] (25.447 ms) : 0, 25447
AppSec [baseline] (26.767 ms) : 0, 26767
AppSec [candidate] (26.663 ms) : 0, 26663
Debugger [baseline] (70.186 ms) : 0, 70186
Debugger [candidate] (70.326 ms) : 0, 70326
Remote Config [baseline] (531.37 µs) : 0, 531
Remote Config [candidate] (529.452 µs) : 0, 529
Telemetry [baseline] (9.161 ms) : 0, 9161
Telemetry [candidate] (9.235 ms) : 0, 9235
Flare Poller [baseline] (3.338 ms) : 0, 3338
Flare Poller [candidate] (3.361 ms) : 0, 3361
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~a67163a58b, baseline=1.61.0-SNAPSHOT~9f2354e364
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.059 s) : 0, 1058865
Total [baseline] (11.061 s) : 0, 11061036
Agent [candidate] (1.056 s) : 0, 1055803
Total [candidate] (11.003 s) : 0, 11003154
section appsec
Agent [baseline] (1.245 s) : 0, 1244894
Total [baseline] (11.107 s) : 0, 11107430
Agent [candidate] (1.245 s) : 0, 1245425
Total [candidate] (11.134 s) : 0, 11134272
section iast
Agent [baseline] (1.228 s) : 0, 1228383
Total [baseline] (11.328 s) : 0, 11328166
Agent [candidate] (1.228 s) : 0, 1227779
Total [candidate] (11.294 s) : 0, 11293940
section profiling
Agent [baseline] (1.185 s) : 0, 1184873
Total [baseline] (10.947 s) : 0, 10947371
Agent [candidate] (1.183 s) : 0, 1183068
Total [candidate] (10.996 s) : 0, 10996040
gantt
title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~a67163a58b, baseline=1.61.0-SNAPSHOT~9f2354e364
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.202 ms) : 0, 1202
crashtracking [candidate] (1.196 ms) : 0, 1196
BytebuddyAgent [baseline] (628.979 ms) : 0, 628979
BytebuddyAgent [candidate] (628.433 ms) : 0, 628433
AgentMeter [baseline] (29.195 ms) : 0, 29195
AgentMeter [candidate] (29.092 ms) : 0, 29092
GlobalTracer [baseline] (257.775 ms) : 0, 257775
GlobalTracer [candidate] (257.1 ms) : 0, 257100
AppSec [baseline] (31.705 ms) : 0, 31705
AppSec [candidate] (31.65 ms) : 0, 31650
Debugger [baseline] (60.337 ms) : 0, 60337
Debugger [candidate] (60.162 ms) : 0, 60162
Remote Config [baseline] (586.402 µs) : 0, 586
Remote Config [candidate] (588.733 µs) : 0, 589
Telemetry [baseline] (7.975 ms) : 0, 7975
Telemetry [candidate] (8.044 ms) : 0, 8044
Flare Poller [baseline] (4.998 ms) : 0, 4998
Flare Poller [candidate] (3.513 ms) : 0, 3513
section appsec
crashtracking [baseline] (1.196 ms) : 0, 1196
crashtracking [candidate] (1.182 ms) : 0, 1182
BytebuddyAgent [baseline] (657.357 ms) : 0, 657357
BytebuddyAgent [candidate] (657.486 ms) : 0, 657486
AgentMeter [baseline] (12.012 ms) : 0, 12012
AgentMeter [candidate] (11.956 ms) : 0, 11956
GlobalTracer [baseline] (257.598 ms) : 0, 257598
GlobalTracer [candidate] (257.969 ms) : 0, 257969
IAST [baseline] (24.091 ms) : 0, 24091
IAST [candidate] (24.146 ms) : 0, 24146
AppSec [baseline] (177.737 ms) : 0, 177737
AppSec [candidate] (177.677 ms) : 0, 177677
Debugger [baseline] (66.056 ms) : 0, 66056
Debugger [candidate] (66.222 ms) : 0, 66222
Remote Config [baseline] (631.833 µs) : 0, 632
Remote Config [candidate] (618.559 µs) : 0, 619
Telemetry [baseline] (8.318 ms) : 0, 8318
Telemetry [candidate] (8.297 ms) : 0, 8297
Flare Poller [baseline] (3.614 ms) : 0, 3614
Flare Poller [candidate] (3.577 ms) : 0, 3577
section iast
crashtracking [baseline] (1.193 ms) : 0, 1193
crashtracking [candidate] (1.184 ms) : 0, 1184
BytebuddyAgent [baseline] (797.496 ms) : 0, 797496
BytebuddyAgent [candidate] (796.123 ms) : 0, 796123
AgentMeter [baseline] (11.346 ms) : 0, 11346
AgentMeter [candidate] (11.31 ms) : 0, 11310
GlobalTracer [baseline] (247.023 ms) : 0, 247023
GlobalTracer [candidate] (247.42 ms) : 0, 247420
IAST [baseline] (25.248 ms) : 0, 25248
IAST [candidate] (25.386 ms) : 0, 25386
AppSec [baseline] (26.401 ms) : 0, 26401
AppSec [candidate] (26.486 ms) : 0, 26486
Debugger [baseline] (70.563 ms) : 0, 70563
Debugger [candidate] (70.613 ms) : 0, 70613
Remote Config [baseline] (538.682 µs) : 0, 539
Remote Config [candidate] (532.587 µs) : 0, 533
Telemetry [baseline] (9.212 ms) : 0, 9212
Telemetry [candidate] (9.322 ms) : 0, 9322
Flare Poller [baseline] (3.372 ms) : 0, 3372
Flare Poller [candidate] (3.318 ms) : 0, 3318
section profiling
crashtracking [baseline] (1.171 ms) : 0, 1171
crashtracking [candidate] (1.155 ms) : 0, 1155
BytebuddyAgent [baseline] (684.544 ms) : 0, 684544
BytebuddyAgent [candidate] (683.857 ms) : 0, 683857
AgentMeter [baseline] (8.65 ms) : 0, 8650
AgentMeter [candidate] (8.599 ms) : 0, 8599
GlobalTracer [baseline] (215.837 ms) : 0, 215837
GlobalTracer [candidate] (215.349 ms) : 0, 215349
AppSec [baseline] (32.312 ms) : 0, 32312
AppSec [candidate] (32.236 ms) : 0, 32236
Debugger [baseline] (65.194 ms) : 0, 65194
Debugger [candidate] (65.612 ms) : 0, 65612
Remote Config [baseline] (566.541 µs) : 0, 567
Remote Config [candidate] (561.462 µs) : 0, 561
Telemetry [baseline] (8.482 ms) : 0, 8482
Telemetry [candidate] (7.699 ms) : 0, 7699
Flare Poller [baseline] (3.481 ms) : 0, 3481
Flare Poller [candidate] (3.482 ms) : 0, 3482
ProfilingAgent [baseline] (93.855 ms) : 0, 93855
ProfilingAgent [candidate] (93.792 ms) : 0, 93792
Profiling [baseline] (94.423 ms) : 0, 94423
Profiling [candidate] (94.35 ms) : 0, 94350
LoadParameters
See matching parameters
SummaryFound 1 performance improvements and 0 performance regressions! Performance is the same for 19 metrics, 16 unstable metrics.
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~a67163a58b, baseline=1.61.0-SNAPSHOT~9f2354e364
dateFormat X
axisFormat %s
section baseline
no_agent (18.991 ms) : 18797, 19186
. : milestone, 18991,
appsec (18.534 ms) : 18346, 18722
. : milestone, 18534,
code_origins (18.642 ms) : 18455, 18830
. : milestone, 18642,
iast (17.724 ms) : 17548, 17900
. : milestone, 17724,
profiling (20.483 ms) : 20282, 20685
. : milestone, 20483,
tracing (17.911 ms) : 17737, 18084
. : milestone, 17911,
section candidate
no_agent (18.436 ms) : 18244, 18628
. : milestone, 18436,
appsec (18.719 ms) : 18528, 18909
. : milestone, 18719,
code_origins (17.862 ms) : 17686, 18037
. : milestone, 17862,
iast (17.822 ms) : 17645, 17999
. : milestone, 17822,
profiling (19.694 ms) : 19494, 19895
. : milestone, 19694,
tracing (17.518 ms) : 17345, 17691
. : milestone, 17518,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~a67163a58b, baseline=1.61.0-SNAPSHOT~9f2354e364
dateFormat X
axisFormat %s
section baseline
no_agent (1.192 ms) : 1180, 1204
. : milestone, 1192,
iast (3.193 ms) : 3146, 3241
. : milestone, 3193,
iast_FULL (5.664 ms) : 5608, 5720
. : milestone, 5664,
iast_GLOBAL (3.564 ms) : 3503, 3625
. : milestone, 3564,
profiling (2.208 ms) : 2187, 2229
. : milestone, 2208,
tracing (1.752 ms) : 1738, 1767
. : milestone, 1752,
section candidate
no_agent (1.18 ms) : 1169, 1191
. : milestone, 1180,
iast (3.177 ms) : 3133, 3220
. : milestone, 3177,
iast_FULL (5.667 ms) : 5611, 5724
. : milestone, 5667,
iast_GLOBAL (3.654 ms) : 3594, 3714
. : milestone, 3654,
profiling (2.129 ms) : 2106, 2152
. : milestone, 2129,
tracing (1.844 ms) : 1827, 1862
. : milestone, 1844,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~a67163a58b, baseline=1.61.0-SNAPSHOT~9f2354e364
dateFormat X
axisFormat %s
section baseline
no_agent (15.046 s) : 15046000, 15046000
. : milestone, 15046000,
appsec (14.668 s) : 14668000, 14668000
. : milestone, 14668000,
iast (18.077 s) : 18077000, 18077000
. : milestone, 18077000,
iast_GLOBAL (17.751 s) : 17751000, 17751000
. : milestone, 17751000,
profiling (15.233 s) : 15233000, 15233000
. : milestone, 15233000,
tracing (14.863 s) : 14863000, 14863000
. : milestone, 14863000,
section candidate
no_agent (15.356 s) : 15356000, 15356000
. : milestone, 15356000,
appsec (14.767 s) : 14767000, 14767000
. : milestone, 14767000,
iast (18.017 s) : 18017000, 18017000
. : milestone, 18017000,
iast_GLOBAL (17.93 s) : 17930000, 17930000
. : milestone, 17930000,
profiling (15.369 s) : 15369000, 15369000
. : milestone, 15369000,
tracing (15.084 s) : 15084000, 15084000
. : milestone, 15084000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~a67163a58b, baseline=1.61.0-SNAPSHOT~9f2354e364
dateFormat X
axisFormat %s
section baseline
no_agent (1.473 ms) : 1462, 1485
. : milestone, 1473,
appsec (3.801 ms) : 3578, 4023
. : milestone, 3801,
iast (2.251 ms) : 2181, 2321
. : milestone, 2251,
iast_GLOBAL (2.297 ms) : 2227, 2367
. : milestone, 2297,
profiling (2.068 ms) : 2013, 2123
. : milestone, 2068,
tracing (2.057 ms) : 2003, 2111
. : milestone, 2057,
section candidate
no_agent (1.469 ms) : 1457, 1481
. : milestone, 1469,
appsec (3.777 ms) : 3558, 3997
. : milestone, 3777,
iast (2.248 ms) : 2179, 2317
. : milestone, 2248,
iast_GLOBAL (2.284 ms) : 2214, 2353
. : milestone, 2284,
profiling (2.081 ms) : 2025, 2137
. : milestone, 2081,
tracing (2.053 ms) : 1999, 2107
. : milestone, 2053,
|
I didn't check, is there an existing benchmark I could run, or do I need to create one ? |
It was one of the feedback when we start introducing VT support that there is a CPU overhead. Some customers are even disabling it for that reason.
I think so. But the more support we had, the bigger the overhead is. And VT may impact many customers / use case. |
What Does This Do
Follow up on #10040
The code was storing and retrieving context on the first activation, but then it was lost, while a virtual thread can be paused and resumed many times.
Switching to a ConcurrentState allows recovering the context several times, to keep following the thread.
Motivation
Potential fix for #9984
Additional Notes
Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueJira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.