Skip to content

Conversation

@watson
Copy link
Collaborator

@watson watson commented Nov 13, 2025

What does this PR do?

When a snapshot is being collected as part of a Dynamic Instrumentation/Live Debugger line probe, the following new logic has been added:

  • Skip enumerating collections with more than 500 elements/entries. Mark with notCapturedReason: 'Large collection with too many elements (skip threshold: 500)' and record size.
  • Collect object with more then 500 properties the first time a probe encounters them but flag the probe to prevent future snapshot attempts for that probe.

Error reporting

For the case where an object with more than 500 properties was detected, an evaluation error will be added to the probe result, letting the user know that no future snapshots will be collected for this version of the probe within the current process. This evaluation error will also be added to all future probe results of this version of the probe within the current process.

Motivation

Handle edge cases where an object has a lot of properties, or where collections (arrays/sets/maps) has a lot of elements.

In those cases, a single Chrome DevTools Protocol Debugger.getProperties request will take a long time to serialize their properties/elements, which can siginificantly overshoot the time budget (as ongoing CDP requests are allowed to complete before aborting).

Additional Notes

The implementation in this PR has the following downsides:

  • The threshold of 500 was chosen by looking at avg. CDP request times on my own machine, but might not be the right number on other hardware.
  • A user might see a significant overhead the first time a probe tries to snapshot a very large object with a lot of properties on the same level. However, it's a one time overhead (only reappearing if the Node.js process is restarted and the probe is updated/re-initialized).
  • A snapshot will never contain any entries/elements from collections that has a size larger than 500 entries/elements.

@watson watson self-assigned this Nov 13, 2025
Copy link
Collaborator Author

watson commented Nov 13, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Copy link

github-actions bot commented Nov 13, 2025

Overall package size

Self size: 13.43 MB
Deduped: 113.62 MB
No deduping: 128.64 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB | | @datadog/native-appsec | 10.3.0 | 20.73 MB | 20.74 MB | | @datadog/pprof | 5.12.0 | 11.19 MB | 11.57 MB | | @datadog/native-iast-taint-tracking | 4.1.0 | 9.01 MB | 9.02 MB | | @opentelemetry/resources | 1.30.1 | 557.67 kB | 7.71 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.5.4 | 2.95 MB | 5.83 MB | | @datadog/wasm-js-rewriter | 5.0.1 | 2.82 MB | 3.53 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api-logs | 0.208.0 | 199.48 kB | 1.42 MB | | @opentelemetry/api | 1.9.0 | 1.22 MB | 1.22 MB | | jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB | | import-in-the-middle | 1.15.0 | 127.66 kB | 856.24 kB | | lru-cache | 10.4.3 | 804.3 kB | 804.3 kB | | @datadog/openfeature-node-server | 0.2.0 | 118.51 kB | 437.19 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.2.1 | 163.06 kB | 163.06 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | @isaacs/ttlcache | 2.1.2 | 90.79 kB | 90.79 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 7.0.5 | 63.38 kB | 63.38 kB | | istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.3 | 23.74 kB | 23.74 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | mutexify | 1.4.0 | 5.71 kB | 8.74 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB | | escape-string-regexp | 5.0.0 | 3.66 kB | 3.66 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@datadog-official
Copy link

datadog-official bot commented Nov 13, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: d42205f | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch from b11baaa to bae04ee Compare November 13, 2025 18:21
@codecov
Copy link

codecov bot commented Nov 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.94%. Comparing base (0bb1f17) to head (d42205f).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6912   +/-   ##
=======================================
  Coverage   84.94%   84.94%           
=======================================
  Files         514      514           
  Lines       21754    21754           
=======================================
  Hits        18478    18478           
  Misses       3276     3276           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from b9b7fa5 to ddd5ec0 Compare November 14, 2025 10:22
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch 2 times, most recently from 70a9f50 to eb9dfa6 Compare November 14, 2025 10:27
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from ddd5ec0 to bfd99a1 Compare November 14, 2025 10:27
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch from eb9dfa6 to c66996e Compare November 14, 2025 10:31
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch 2 times, most recently from 65c4e4b to 4a970c8 Compare November 14, 2025 11:31
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch from c66996e to 22ae693 Compare November 14, 2025 11:31
@pr-commenter
Copy link

pr-commenter bot commented Nov 14, 2025

Benchmarks

Benchmark execution time: 2025-12-02 09:53:21

Comparing candidate commit d42205f in PR branch watson/DEBUG-4727/handle-large-objects with baseline commit 0bb1f17 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 294 metrics, 26 unstable metrics.

@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch 4 times, most recently from 76733ce to 96bba09 Compare November 15, 2025 05:18
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from 4a970c8 to 6f67548 Compare November 15, 2025 05:18
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch from 96bba09 to c680828 Compare November 15, 2025 05:41
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from 6f67548 to 495678c Compare November 15, 2025 05:41
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch from c680828 to f42bb33 Compare November 17, 2025 10:43
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from 495678c to cd88600 Compare November 17, 2025 10:43
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch from f42bb33 to f9c32b5 Compare November 17, 2025 10:51
@watson watson force-pushed the watson/DEBUG-4727/debugger-snapshot-time-budget branch from cd88600 to 8b643eb Compare November 17, 2025 10:51
@watson watson changed the base branch from watson/DEBUG-4727/debugger-snapshot-time-budget to graphite-base/6912 November 17, 2025 14:36
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch 5 times, most recently from 6599c43 to 7b21825 Compare November 21, 2025 12:26
@watson watson marked this pull request as ready for review November 21, 2025 12:44
@watson watson requested review from a team as code owners November 21, 2025 12:44
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch 2 times, most recently from 57f64ef to a1e166b Compare November 21, 2025 15:13
@watson watson added the debugger Dynamic Instrumentation & Live Debugger label Nov 21, 2025
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch 5 times, most recently from 3644a2d to 1e27a7b Compare November 29, 2025 07:40
snapshot.captures = {
lines: { [probe.location.lines[0]]: { locals: state } }
}
if (captureErrors.length > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safe if we init captureErrors with [] or use captureErrors?.length to avoid undefined behavior when numberOfProbesWithSnapshots === 0 (if this possible)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I fixed it in the latest commit 👍

// Consider if we could set errors just for the part of the scope chain that throws during collection.
log.error('[debugger:devtools_client] Error getting local state for call frame', err)
return returnError
if (opts.ctx.deadlineReached === true) break // TODO: Bad UX; Variables in remaining scopes are silently dropped
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in a finally block no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why not adding an error to captureErrors when we reach the deadline?

Copy link
Collaborator Author

@watson watson Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in a finally block no?

Since we're not re-throwing inside of the catch block, having no finally block vs having a finally block is functionally equivalent: The first line of code after the end of the catch block is going to be executed after either the try or the catch block successfully ends.

Q: why not adding an error to captureErrors when we reach the deadline?

If the calling code sees any errors in the captureErrors array they will be treated as permanent errors and future capturing of snapshots for this probe will be disabled. We don't want that, as reaching the deadline is not seen as a permanent error, but just as a normal stop-gab to keep the overhead low. So we don't want to add an error to the captureErrors array in case we reach the deadline.

Handle edge cases where an object has a lot of properties, or where
collections (arrays/sets/maps) has a lot of elements.

In those cases, a single Chrome DevTools Protocol
`Debugger.getProperties` request will take a long time to serialize
their properties/elements, which can siginificantly overshoot the time
budget (as ongoing CDP requests are allowed to complete before aborting)

This is dealt with in the following way:

- Skip enumerating large collections (arrays/maps/sets) by parsing size from
  descriptions; mark with notCapturedReason: "collectionSize" and record size.
- Large objects are still collected the first time, but objects with
  >500 properties set a fatalSnapshotError to prevent future snapshot
  attempts for that probe and emits error to the diagnostics endpoint.
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch from 1e27a7b to 957638a Compare December 2, 2025 08:46
@watson watson force-pushed the watson/DEBUG-4727/handle-large-objects branch from 782a2f4 to d42205f Compare December 2, 2025 09:42
// Trim the number of properties on the object if there's too many.
const size = result.length
if (size > LARGE_OBJECT_SKIP_THRESHOLD) {
opts.ctx.captureErrors.push(new Error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, do these errors get bubbled to customers? It feels like this should fall into a "warning" category since there isn't really an action for them to take unless we want to disable probes that hit too many of these conditions

Copy link
Collaborator Author

@watson watson Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are added as evaluation errors to the probe result. Both to the current probe result and to all subsequent probe results of the current probe version, as future invocations of this probe will have snapshot collection disabled. In the future we need a better way to communicate errors like these to the backend, but right now it's either as evaluation errors or as a error on the diagnostics endpoint. The latter would not work as it would be cleared the a few milliseconds later as the probe result is sent.

Copy link
Member

@tylfin tylfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

debugger Dynamic Instrumentation & Live Debugger semver-minor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants