feat(debugger): add special handling for very large collections/objects #6912
Conversation
```js
snapshot.captures = {
  lines: { [probe.location.lines[0]]: { locals: state } }
}
if (captureErrors.length > 0) {
```
I think it's safe if we init `captureErrors` with `[]` or use `captureErrors?.length` to avoid undefined behavior when `numberOfProbesWithSnapshots === 0` (if that's possible)
Good call! I fixed it in the latest commit 👍
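The defensive pattern discussed in this thread can be sketched as follows. This is a minimal illustration, not the actual dd-trace-js code; `collectSnapshots` and its argument are hypothetical names:

```javascript
// Hypothetical sketch: initializing captureErrors to an empty array makes
// the later length check safe even when no probe produced a snapshot.
function collectSnapshots (numberOfProbesWithSnapshots) {
  const captureErrors = [] // always an array, never undefined

  for (let i = 0; i < numberOfProbesWithSnapshots; i++) {
    // ...snapshot collection would push any errors onto captureErrors here...
  }

  // Safe even when numberOfProbesWithSnapshots === 0. The alternative,
  // captureErrors?.length, instead guards the case where the variable
  // might never have been assigned at all.
  return captureErrors.length > 0 ? captureErrors : undefined
}
```

Either fix avoids reading `.length` off `undefined`; initializing to `[]` is the simpler of the two when the variable is always declared in the same scope.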
```js
// Consider if we could set errors just for the part of the scope chain that throws during collection.
log.error('[debugger:devtools_client] Error getting local state for call frame', err)
return returnError
if (opts.ctx.deadlineReached === true) break // TODO: Bad UX; Variables in remaining scopes are silently dropped
```
This should be in a `finally` block, no?
Q: why not add an error to `captureErrors` when we reach the deadline?
> This should be in a `finally` block, no?
Since we're not re-throwing inside the catch block, having no finally block is functionally equivalent to having one: the first line of code after the end of the catch block is executed after either the try or the catch block finishes.
> Q: why not add an error to `captureErrors` when we reach the deadline?
If the calling code sees any errors in the captureErrors array, they will be treated as permanent errors and future capturing of snapshots for this probe will be disabled. We don't want that: reaching the deadline is not a permanent error, just a normal stopgap to keep the overhead low. So we don't add an error to the captureErrors array when we reach the deadline.
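The behavior described above can be sketched like this. The function and clock object are illustrative, not the real implementation: the point is that reaching the deadline breaks out of the scope traversal without pushing anything onto `captureErrors`, since capture errors are treated as permanent and would disable future snapshots for the probe.

```javascript
// Hypothetical sketch: stop traversing scopes once the deadline is reached,
// but do NOT record a capture error, because capture errors permanently
// disable snapshot collection for the probe.
function traverseScopes (scopes, ctx) {
  const collected = []
  for (const scope of scopes) {
    if (ctx.deadlineReached === true) break // remaining scopes are silently dropped
    collected.push(scope.name)
    if (ctx.clock.now >= ctx.deadline) ctx.deadlineReached = true
  }
  return collected
}
```

With a deadline that has already passed, only the first scope is collected and `ctx.captureErrors` stays empty, so subsequent snapshots for the probe remain enabled.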
Handle edge cases where an object has a lot of properties, or where collections (arrays/sets/maps) have a lot of elements. In those cases, a single Chrome DevTools Protocol `Debugger.getProperties` request will take a long time to serialize their properties/elements, which can significantly overshoot the time budget (as ongoing CDP requests are allowed to complete before aborting).

This is dealt with in the following way:

- Skip enumerating large collections (arrays/maps/sets) by parsing the size from their descriptions; mark them with `notCapturedReason: "collectionSize"` and record the size.
- Large objects are still collected the first time, but objects with more than 500 properties set a `fatalSnapshotError` to prevent future snapshot attempts for that probe and emit an error to the diagnostics endpoint.
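The collection-skipping step can be sketched as follows. This is a hypothetical illustration (the function name, regex, and constant name are not from the PR): CDP object descriptions for collections embed their size, e.g. `"Array(100000)"` or `"Set(3)"`, so the size can be read without enumerating a single element.

```javascript
// Hypothetical sketch: parse the collection size out of the CDP description
// string instead of enumerating entries via Debugger.getProperties.
const LARGE_COLLECTION_SKIP_THRESHOLD = 500 // threshold mentioned in this PR

function maybeSkipLargeCollection (description) {
  const match = /^(?:Array|Map|Set|WeakMap|WeakSet)\((\d+)\)/.exec(description)
  if (match === null) return null // not a collection description
  const size = Number(match[1])
  if (size > LARGE_COLLECTION_SKIP_THRESHOLD) {
    // Skip enumeration entirely; just record why and how big it was.
    return { notCapturedReason: 'collectionSize', size }
  }
  return null // small enough to enumerate normally
}
```

Parsing the description is cheap and avoids issuing the expensive CDP request at all for oversized collections.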
```js
// Trim the number of properties on the object if there's too many.
const size = result.length
if (size > LARGE_OBJECT_SKIP_THRESHOLD) {
  opts.ctx.captureErrors.push(new Error(
```
Out of curiosity, do these errors get bubbled up to customers? It feels like this should fall into a "warning" category, since there isn't really an action for them to take, unless we want to disable probes that hit too many of these conditions.
They are added as evaluation errors to the probe result: both to the current probe result and to all subsequent probe results of the current probe version, as future invocations of this probe will have snapshot collection disabled. In the future we need a better way to communicate errors like these to the backend, but right now it's either as evaluation errors or as an error on the diagnostics endpoint. The latter would not work, as it would be cleared a few milliseconds later when the probe result is sent.
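The persistence of these evaluation errors can be sketched as follows. `ProbeState` and its API are hypothetical names for illustration; the real code lives in the devtools client. Once a fatal snapshot error is recorded for a probe version, it is attached to the current result and every subsequent result, and snapshot collection is skipped from then on.

```javascript
// Hypothetical sketch: the first oversized object is still captured, but it
// records a fatal error that disables snapshots for all future invocations
// of this probe version, and that error is attached to every result.
class ProbeState {
  constructor (threshold = 500) {
    this.threshold = threshold
    this.fatalSnapshotError = null
  }

  collectSnapshot (propertyCount) {
    if (this.fatalSnapshotError !== null) {
      // Snapshot collection is disabled; keep reporting the same error.
      return { snapshot: null, evaluationErrors: [this.fatalSnapshotError] }
    }
    if (propertyCount > this.threshold) {
      this.fatalSnapshotError = `Object has too many properties (threshold: ${this.threshold})`
      // This first oversized object is still captured; future attempts are not.
      return { snapshot: 'captured', evaluationErrors: [this.fatalSnapshotError] }
    }
    return { snapshot: 'captured', evaluationErrors: [] }
  }
}
```

The first call with an oversized object returns a snapshot plus the error; every later call, even for small objects, returns no snapshot but the same evaluation error.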
tylfin left a comment:
LGTM

What does this PR do?
When a snapshot is being collected as part of a Dynamic Instrumentation/Live Debugger line probe, the following new logic has been added:

- Skip enumerating large collections (arrays/maps/sets); mark them with `notCapturedReason: 'Large collection with too many elements (skip threshold: 500)'` and record their size.

Error reporting
For the case where an object with more than 500 properties was detected, an evaluation error will be added to the probe result, letting the user know that no future snapshots will be collected for this version of the probe within the current process. This evaluation error will also be added to all future probe results of this version of the probe within the current process.
Motivation
Handle edge cases where an object has a lot of properties, or where collections (arrays/sets/maps) have a lot of elements.
In those cases, a single Chrome DevTools Protocol `Debugger.getProperties` request will take a long time to serialize their properties/elements, which can significantly overshoot the time budget (as ongoing CDP requests are allowed to complete before aborting).

Additional Notes
The implementation in this PR has the following downsides:
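The time-budget overshoot described in the Motivation section can be sketched as follows. The task/clock model here is illustrative, not the actual scheduler: the key point is that the deadline is only checked between requests, so a request already in flight (such as one large `Debugger.getProperties` call) always runs to completion, even if that blows past the budget.

```javascript
// Hypothetical sketch: a single slow task can overshoot the budget by its
// full duration, because the deadline is only checked *between* tasks.
function collectWithBudget (tasks, budgetMs, clock) {
  const deadline = clock.now + budgetMs
  const results = []
  for (const task of tasks) {
    if (clock.now >= deadline) break // only checked before starting a task
    results.push(task(clock)) // a started task is never aborted mid-flight
  }
  return results
}
```

With a 100 ms budget and a first task that takes 150 ms, the slow task completes (a 50 ms overshoot) and no further tasks are started, which is exactly why serializing one huge object can dominate the capture time.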