Store ClickHouse advanced-queries metric definitions as JSON by AAraKKe · Pull Request #23829 · DataDog/integrations-core

AAraKKe · 2026-05-25T13:55:55Z

What does this PR do?

Moves the four clickhouse/datadog_checks/clickhouse/advanced_queries/system_*.py modules to compact JSON files under clickhouse/datadog_checks/clickhouse/data/. The check loads them once on first run via check_initializations. Runtime dict shape and emitted metric names are byte-identical to before.

Motivation

advanced_queries/system_events.py alone was ~172 KB, with the per-entry 'name': '<prefix>.<key>' and 'type': '<type>' fields repeated for every metric. Combined, the four modules shipped ~250 KB of redundant dict literals. Storing the data as JSON and deriving those fields at load time drops that to ~64 KB and removes the matching .pyc companions, which helps the agent stay within its static-quality-gates budget.
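
As a rough illustration of the derived-fields idea, the sketch below expands a compact document into the repeated per-metric entries. The `prefix`/`items` layout, the `expand_items` name, and the sample keys are assumptions made for this example; the actual JSON schema is not shown in this description.

```python
import json

# Hypothetical compact document: the prefix is stored once and keys are
# grouped by metric type, instead of repeating 'name' and 'type' per entry.
COMPACT = json.loads("""
{
  "prefix": "example.events",
  "items": {
    "gauge": ["Query", "SelectQuery"],
    "monotonic_gauge": ["InsertQuery"]
  }
}
""")


def expand_items(compact: dict) -> dict:
    """Rebuild {'<key>': {'name': '<prefix>.<key>', 'type': '<type>'}} entries."""
    prefix = compact["prefix"]
    expanded = {}
    for metric_type, keys in compact["items"].items():
        for key in keys:
            expanded[key] = {"name": f"{prefix}.{key}", "type": metric_type}
    # Sort by key so the rebuilt mapping has a stable order.
    return dict(sorted(expanded.items()))


if __name__ == "__main__":
    for key, entry in expand_items(COMPACT).items():
        print(key, entry)
```

The saving comes from storing each key once: the name and type strings are recomputed at load time rather than shipped for every metric.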

Size reduction

Installed footprint of clickhouse/datadog_checks/clickhouse/ (source .py + bytecode .pyc + data files, as the agent stores it on disk after install) measured against master:

Component	master	this PR	Δ
`.py` source	562,620 B	316,689 B	−245,931 B (−240 KB)
`.pyc` bytecode	469,483 B	301,952 B	−167,531 B (−164 KB)
`.json` data	0	64,044 B	+64,044 B
other	19,913 B	19,913 B	0
total	1,052,016 B (1.00 MB)	702,598 B (686 KB)	−349,418 B (−341 KB)

The .pyc drop accounts for nearly half of the net win (167,531 B of 349,418 B): Python's marshal representation of the ~1,600-entry dict literals across the four old modules is substantial. JSON ships as plain bytes with no bytecode companion.
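
The exact measurement procedure is not given here. One way to reproduce a per-type breakdown like the table above is to walk the installed package directory and sum file sizes by suffix. The function names below are hypothetical and this is only a sketch of that approach.

```python
from collections import Counter
from pathlib import Path

TRACKED_SUFFIXES = {".py", ".pyc", ".json"}


def footprint_by_suffix(root: Path) -> Counter:
    """Sum file sizes in bytes under root, bucketed by file suffix."""
    sizes = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Anything that is not source, bytecode, or JSON data counts as "other".
            bucket = path.suffix if path.suffix in TRACKED_SUFFIXES else "other"
            sizes[bucket] += path.stat().st_size
    return sizes


def report(root: Path) -> None:
    """Print one line per bucket plus the total, in bytes."""
    sizes = footprint_by_suffix(root)
    for bucket in (".py", ".pyc", ".json", "other"):
        print(f"{bucket:>6}  {sizes[bucket]:>12,} B")
    print(f" total  {sum(sizes.values()):>12,} B")
```

Running `report()` against the same directory on master and on the PR branch, after installation so that the .pyc files exist, would give the two columns to compare.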

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

This PR has been created and validated using the paired-review skill from agent-integrations. Ready for human review.

The four advanced_queries Python modules shipped ~250 KB of redundant dict literals (per-entry 'name' was always '<prefix>.<key>', and every entry repeated its type). Move that data to compact per-system-table JSON files under datadog_checks/clickhouse/data/ and build the QueryManager-shaped dicts at runtime. The check registers a check_initializations callable so the JSON files are parsed once on the first check run. Module attributes like advanced_queries.SystemMetrics remain available through __getattr__ backed by the same cache, so tests that read those attributes directly keep working. The metric generator emits the new JSON format directly; the three system_*.tpl templates and the four old Python modules are removed.

github-actions · 2026-05-25T13:56:39Z

⚠️ Major version bump
The changelog type `changed` or `removed` was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the `fixed` or `added` type instead.

codecov · 2026-05-25T14:00:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.16%. Comparing base (ae6846b) to head (a9c1a57).
⚠️ Report is 19 commits behind head on master.

datadog-official · 2026-05-25T14:01:31Z

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 93.19%

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: a9c1a57

- Replace the initializer(check) factory with a plain warm_cache callable; the closure never depended on the check instance.
- Restore __all__ on the package so the public surface is explicit.
- Rename _cache to cache per the AGENTS.md module-name rule.
- Mirror Python's default AttributeError format in __getattr__.
- Tighten return-type annotations on load, _build_items, and __getattr__.
- Raise loudly in generate_queries when a metric type appears with mixed scaled/unscaled entries instead of silently producing a wrong container.
- Reclass the changelog from .changed to .fixed: the refactor preserves runtime behaviour byte-for-byte, so a patch bump is the right semver.
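
The "raise loudly" guard for mixed entries might look like the sketch below. It assumes that unscaled entries of a metric type collapse into a list of keys and scaled entries into a key-to-scale mapping. That container layout, the `group_entries` name, and the `(metric_type, key, scale)` tuple shape are all assumptions for illustration and are not taken from the generator.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

Container = Union[List[str], Dict[str, str]]


def group_entries(entries: Iterable[Tuple[str, str, Optional[str]]]) -> Dict[str, Container]:
    """Group (metric_type, key, scale) entries; scale is None when unscaled."""
    grouped: Dict[str, Container] = {}
    for metric_type, key, scale in entries:
        scaled = scale is not None
        if metric_type not in grouped:
            # The first entry of a type decides which container that type uses.
            grouped[metric_type] = {} if scaled else []
        container = grouped[metric_type]
        if isinstance(container, dict) != scaled:
            # Fail instead of silently dropping the scale or mixing shapes.
            raise ValueError(
                f"metric type {metric_type!r} mixes scaled and unscaled entries (at {key!r})"
            )
        if isinstance(container, dict):
            container[key] = scale
        else:
            container.append(key)
    return grouped
```

Failing at generation time keeps a malformed data file from ever being committed, which is cheaper than detecting the wrong container when the check loads it.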

- Guard warm_cache with an explicit "key not in cache" check so the JSON files are read at most once per process, even when __getattr__ populated the cache first.
- Wrap load() exceptions as RuntimeError so a missing or malformed data file produces an actionable message in the check-init failure.
- Discriminate the two JSON shapes on positive presence of "columns" rather than absence of "items".
- Reclass the changelog as .changed: this is a significant internal refactor, not a bug fix, so .fixed was misleading.

- Wrap KeyError in load()'s error path so a malformed JSON file (one that parses but is missing expected keys) raises the same context-rich RuntimeError as a missing or invalid file.
- Remove SYSTEM_ERRORS_SPEC and generate_system_errors() from the generator. The system_errors data is static; the committed data/system_errors.json is the single source of truth, and the other three queries continue to be generator-driven from ClickHouse source.

- Comment the load() branch that handles system_errors so the asymmetry is named at the call site instead of requiring readers to audit the JSON files.
- Widen load()'s except tuple with TypeError and AttributeError so a malformed JSON shape (e.g. items shipped as a list) still raises the wrapped RuntimeError with the file name rather than leaking the bare underlying exception.
- Narrow _build_items's compact parameter type to dict[str, list[str] | dict[str, str]] to mirror the producer annotation in generate_metrics.py.
- Rename the generator's Template dataclass to FileTemplate so it doesn't silently shadow string.Template if the stdlib import is ever reintroduced.
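
The error-wrapping pattern from the last few commits can be sketched as follows. The `load_names` function, the `prefix`/`items` layout, and the exact exception tuple are assumptions for this example; the point is only that every failure mode surfaces as one RuntimeError that names the file and keeps the original exception as its cause.

```python
import json
from pathlib import Path
from typing import Dict


def load_names(path: Path) -> Dict[str, str]:
    """Read a compact JSON file and derive '<prefix>.<key>' names."""
    try:
        document = json.loads(path.read_text(encoding="utf-8"))
        prefix = document["prefix"]
        return {
            key: f"{prefix}.{key}"
            for keys in document["items"].values()
            for key in keys
        }
    except (OSError, ValueError, KeyError, TypeError, AttributeError) as exc:
        # OSError: missing or unreadable file; ValueError: invalid JSON
        # (json.JSONDecodeError is a subclass); KeyError: missing keys;
        # TypeError/AttributeError: wrong shape, e.g. "items" given as a list.
        raise RuntimeError(
            f"Failed to load metric definitions from {path.name}: {exc!r}"
        ) from exc
```

Using `raise ... from exc` preserves the cause chain, so the traceback still shows the underlying error below the wrapped message.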

- Rename advanced_queries.cache to _cache so the underscore signals it as module-internal mutable state (PEP 8 module-private). The module's __all__ already advertises only the four System* names.
- Rephrase the load() inline comment around the columns shortcut so it names the discriminating shape rather than the system_errors file, which would mislead if a second verbatim file were added later.
- Replace the single-member Templates enum in the generator with a TESTS_METRICS_TEMPLATE constant; the enum was a vestige from when it held the three QUERY_* templates that now live in QUERY_SPECS.

Adds tests/test_advanced_queries.py covering the new loader logic:
- module-level __getattr__ resolution + caching
- compact format: source/match column shape, sorted items, name derivation including the dotted-key edge case (jemalloc.epoch)
- verbatim format: system_errors columns pass through with the boolean: true tag preserved
- RuntimeError wrap on every malformed-JSON path the load() except tuple is meant to cover (missing file, invalid JSON, items as list, items as scalar, missing required keys) with the cause chain preserved
- warm_cache populates every known name and is idempotent

iliakur · 2026-05-26T10:54:26Z

@@ -0,0 +1 @@
+Store advanced-queries metric definitions as JSON loaded on first check run.


If this doesn't break existing customers, we need a different changelog type. `fixed` suggests itself, with disk usage optimization stated as a clear goal.

If there's no actual change to behavior and we only refactor to shave off disk usage, I'd even consider no changelog to be fine.

True, I didn't notice the `changed` type. I focused on the fix and completely missed that this was wrong. Updated.

Review from dkirov-dd is dismissed. Related teams and files:

agent-integrations
- clickhouse/changelog.d/23829.fixed

The advanced_queries package is now organised around one named pattern: the SQL-returns-(value, metric_name)-and-dispatches-via-lookup-table shape that SystemEvents, SystemMetrics, and SystemAsynchronousMetrics all share. The compact JSON files exist specifically to compress that pattern.
- Rename load() to load_match_query(); _build_items() to _expand_match_items(); NAMES to MATCH_QUERIES; _cache to _match_query_cache. Names now say what the loader does.
- Inline SystemErrors as a plain Python literal in __init__.py. Its shape doesn't fit the bulk-match pattern, so the JSON compression has nothing to compress; data/system_errors.json is removed and the verbatim-columns branch in load() goes away with it.
- Add a top-level docstring that describes the JSON schema, names the generator that produces the files, and points operators at the hatch run metrics:generate command.
- Add clickhouse/AGENTS.md (with a CLAUDE.md @AGENTS.md indirection) giving anyone opening this directory a short orientation note plus the "don't hand-edit the JSON files" warning that JSON has no comment syntax to carry.
- Update tests/test_advanced_queries.py for the rename and add coverage that SystemErrors stays out of the match-query cache.

Runtime dict shape and metric names are byte-identical to before; verified by diffing the four module-attribute dumps against the pre-refactor master.

Review from iliakur is dismissed. Related teams and files:

agent-integrations
- clickhouse/AGENTS.md
- clickhouse/CLAUDE.md
- clickhouse/datadog_checks/clickhouse/advanced_queries/__init__.py
- clickhouse/tests/test_advanced_queries.py

dd-octo-sts · 2026-05-26T17:03:06Z

Validation Report

All 21 validations passed.

Validation	Description	Status
`agent-reqs`	Verify check versions match the Agent requirements file	✅
`ci`	Validate CI configuration and Codecov settings	✅
`codeowners`	Validate every integration has a CODEOWNERS entry	✅
`config`	Validate default configuration files against spec.yaml	✅
`dep`	Verify dependency pins are consistent and Agent-compatible	✅
`http`	Validate integrations use the HTTP wrapper correctly	✅
`imports`	Validate check imports do not use deprecated modules	✅
`integration-style`	Validate check code style conventions	✅
`jmx-metrics`	Validate JMX metrics definition files and config	✅
`labeler`	Validate PR labeler config matches integration directories	✅
`legacy-signature`	Validate no integration uses the legacy Agent check signature	✅
`license-headers`	Validate Python files have proper license headers	✅
`licenses`	Validate third-party license attribution list	✅
`metadata`	Validate metadata.csv metric definitions	✅
`models`	Validate configuration data models match spec.yaml	✅
`openmetrics`	Validate OpenMetrics integrations disable the metric limit	✅
`package`	Validate Python package metadata and naming	✅
`qa-label`	Validate the pull request declares whether it needs QA for the next Agent release	✅
`readmes`	Validate README files have required sections	✅
`saved-views`	Validate saved view JSON file structure and fields	✅
`version`	Validate version consistency between package and changelog	✅

AAraKKe · 2026-05-27T08:55:47Z

Run quality gates manually with this commit and they now pass: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1715292102

…#23829)

* Store ClickHouse advanced-queries metric definitions as JSON
* Add changelog entry
* Simplify advanced_queries loader and tighten the generator
* Tighten the advanced_queries loader
* Drop system_errors from the generator and widen load() error wrap
* Tighten the advanced_queries loader and rename the generator's Template
* Rename module cache to _cache and drop the one-member Templates enum
* Test the advanced_queries loader directly
* Move changelog to fixed
* Scope the advanced_queries loader to bulk match queries

d6365cc

AAraKKe added the qa/skip-qa label May 25, 2026

dd-octo-sts Bot added the integration/clickhouse label May 25, 2026

Add changelog entry

d7fd9e6

AAraKKe marked this pull request as ready for review May 25, 2026 15:06

AAraKKe requested review from a team as code owners May 25, 2026 15:06

dd-octo-sts Bot added team/agent-integrations team/database-monitoring-agent labels May 25, 2026

AAraKKe added 5 commits May 25, 2026 17:40

dkirov-dd previously approved these changes May 26, 2026

iliakur requested changes May 26, 2026

Move changelog to fixed

d4f460f

iliakur previously approved these changes May 26, 2026

AAraKKe added 2 commits May 26, 2026 14:37

Merge branch 'master' into aarakke/shrink-clickhouse-aq

5559e55

AAraKKe requested review from a team as code owners May 26, 2026 17:01

dd-octo-sts Bot added team/documentation team/database-monitoring labels May 26, 2026

jeff-morgan-dd self-assigned this May 26, 2026

jeff-morgan-dd approved these changes May 26, 2026

sethsamuel approved these changes May 28, 2026

AAraKKe added this pull request to the merge queue May 28, 2026

Merged via the queue into master with commit d6365cc May 28, 2026
63 of 68 checks passed

AAraKKe deleted the aarakke/shrink-clickhouse-aq branch May 28, 2026 13:21

dd-octo-sts Bot added this to the 7.81.0 milestone May 28, 2026

AAraKKe merged 11 commits into master from aarakke/shrink-clickhouse-aq

5 participants
