Skip to content

Store ClickHouse advanced-queries metric definitions as JSON#23829

Merged
AAraKKe merged 11 commits into
masterfrom
aarakke/shrink-clickhouse-aq
May 28, 2026
Merged

Store ClickHouse advanced-queries metric definitions as JSON#23829
AAraKKe merged 11 commits into
masterfrom
aarakke/shrink-clickhouse-aq

Conversation

@AAraKKe
Copy link
Copy Markdown
Collaborator

@AAraKKe AAraKKe commented May 25, 2026

What does this PR do?

Moves the four clickhouse/datadog_checks/clickhouse/advanced_queries/system_*.py modules to compact JSON files under clickhouse/datadog_checks/clickhouse/data/. The check loads them once on first run via check_initializations. Runtime dict shape and emitted metric names are byte-identical to before.

Motivation

advanced_queries/system_events.py alone was ~172 KB, with the per-entry 'name': '<prefix>.<key>' and 'type': '<type>' fields repeated for every metric. Combined, the four modules shipped ~250 KB of redundant dict literals. JSON-with-derived-fields drops that to ~64 KB and removes the matching .pyc companions, contributing to the agent's static-quality-gates budget.

Size reduction

Installed footprint of clickhouse/datadog_checks/clickhouse/ (source .py + bytecode .pyc + data files, as the agent stores it on disk after install) measured against master:

master this PR Δ
.py source 562,620 B 316,689 B −245,931 B (−240 KB)
.pyc bytecode 469,483 B 301,952 B −167,531 B (−164 KB)
.json data 0 64,044 B +64,044 B
other 19,913 B 19,913 B 0
total 1,052,016 B (1.00 MB) 702,598 B (686 KB) −349,418 B (−341 KB)

The .pyc drop is the larger half of the win: Python's marshal representation of the ~1,600-entry dict literals across the four old modules is substantial. JSON ships as plain bytes with no bytecode companion.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

This PR has been created and validated using the paired-review skill from agent-integrations. Ready for human review.

The four advanced_queries Python modules shipped ~250 KB of redundant
dict literals (per-entry 'name' was always '<prefix>.<key>', and every
entry repeated its type). Move that data to compact per-system-table
JSON files under datadog_checks/clickhouse/data/ and build the
QueryManager-shaped dicts at runtime.

The check registers a check_initializations callable so the JSON files
are parsed once on the first check run. Module attributes like
advanced_queries.SystemMetrics remain available through __getattr__
backed by the same cache, so tests that read those attributes directly
keep working.

The metric generator emits the new JSON format directly; the three
system_*.tpl templates and the four old Python modules are removed.
@AAraKKe AAraKKe added the qa/skip-qa Automatically skip this PR for the next QA label May 25, 2026
@github-actions
Copy link
Copy Markdown
Contributor

⚠️ Major version bump
The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.16%. Comparing base (ae6846b) to head (a9c1a57).
⚠️ Report is 19 commits behind head on master.

Additional details and impacted files
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@datadog-official
Copy link
Copy Markdown
Contributor

datadog-official Bot commented May 25, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 93.19%

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: a9c1a57 | Docs | Datadog PR Page | Give us feedback!

- Replace the initializer(check) factory with a plain warm_cache callable;
  the closure never depended on the check instance.
- Restore __all__ on the package so the public surface is explicit.
- Rename _cache to cache per the AGENTS.md module-name rule.
- Mirror Python's default AttributeError format in __getattr__.
- Tighten return-type annotations on load, _build_items, and __getattr__.
- Raise loudly in generate_queries when a metric type appears with mixed
  scaled/unscaled entries instead of silently producing a wrong container.
- Reclass the changelog from .changed to .fixed: the refactor preserves
  runtime behaviour byte-for-byte, so a patch bump is the right semver.
@AAraKKe AAraKKe marked this pull request as ready for review May 25, 2026 15:06
@AAraKKe AAraKKe requested review from a team as code owners May 25, 2026 15:06
AAraKKe added 5 commits May 25, 2026 17:40
- Guard warm_cache with an explicit "key not in cache" check so the
  JSON files are read at most once per process, even when __getattr__
  populated the cache first.
- Wrap load() exceptions as RuntimeError so a missing or malformed
  data file produces an actionable message in the check-init failure.
- Discriminate the two JSON shapes on positive presence of "columns"
  rather than absence of "items".
- Reclass the changelog as .changed: this is a significant internal
  refactor, not a bug fix, so .fixed was misleading.
- Wrap KeyError in load()'s error path so a malformed JSON file (one
  that parses but is missing expected keys) raises the same context-rich
  RuntimeError as a missing or invalid file.
- Remove SYSTEM_ERRORS_SPEC and generate_system_errors() from the
  generator. The system_errors data is static; the committed
  data/system_errors.json is the single source of truth, and the other
  three queries continue to be generator-driven from ClickHouse source.
- Comment the load() branch that handles system_errors so the asymmetry
  is named at the call site instead of requiring readers to audit the
  JSON files.
- Widen load()'s except tuple with TypeError and AttributeError so a
  malformed JSON shape (e.g. items shipped as a list) still raises the
  wrapped RuntimeError with the file name rather than leaking the bare
  underlying exception.
- Narrow _build_items's compact parameter type to
  dict[str, list[str] | dict[str, str]] to mirror the producer
  annotation in generate_metrics.py.
- Rename the generator's Template dataclass to FileTemplate so it
  doesn't silently shadow string.Template if the stdlib import is ever
  reintroduced.
- Rename advanced_queries.cache to _cache so the underscore signals
  it as module-internal mutable state (PEP 8 module-private). The
  module's __all__ already advertises only the four System* names.
- Rephrase the load() inline comment around the columns shortcut so it
  names the discriminating shape rather than the system_errors file,
  which would mislead if a second verbatim file were added later.
- Replace the single-member Templates enum in the generator with a
  TESTS_METRICS_TEMPLATE constant; the enum was a vestige from when it
  held the three QUERY_* templates that now live in QUERY_SPECS.
Adds tests/test_advanced_queries.py covering the new loader logic:
- module-level __getattr__ resolution + caching
- compact format: source/match column shape, sorted items, name
  derivation including the dotted-key edge case (jemalloc.epoch)
- verbatim format: system_errors columns pass through with the
  boolean: true tag preserved
- RuntimeError wrap on every malformed-JSON path the load() except
  tuple is meant to cover (missing file, invalid JSON, items as list,
  items as scalar, missing required keys) with the cause chain
  preserved
- warm_cache populates every known name and is idempotent
dkirov-dd
dkirov-dd previously approved these changes May 26, 2026
@@ -0,0 +1 @@
Store advanced-queries metric definitions as JSON loaded on first check run.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this doesn't break existing customers, we need a different changelog format. fixed suggests itself with disk usage optimization stated as a clear goal.

If there's no actual change to behavior and we only refactor to shave off disk usage, I'd even consider no changelog to be fine.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, didn't notice the changed. I focused on the fix and completely missed this was wrong. Updated.

@temporal-github-worker-1 temporal-github-worker-1 Bot dismissed dkirov-dd’s stale review May 26, 2026 10:59

Review from dkirov-dd is dismissed. Related teams and files:

  • agent-integrations
    • clickhouse/changelog.d/23829.fixed
iliakur
iliakur previously approved these changes May 26, 2026
AAraKKe added 2 commits May 26, 2026 14:37
The advanced_queries package is now organised around one named pattern:
the SQL-returns-(value, metric_name)-and-dispatches-via-lookup-table
shape that SystemEvents, SystemMetrics, and SystemAsynchronousMetrics
all share. The compact JSON files exist specifically to compress that
pattern.

- Rename load() to load_match_query(); _build_items() to
  _expand_match_items(); NAMES to MATCH_QUERIES; _cache to
  _match_query_cache. Names now say what the loader does.
- Inline SystemErrors as a plain Python literal in __init__.py. Its
  shape doesn't fit the bulk-match pattern, so the JSON compression has
  nothing to compress; data/system_errors.json is removed and the
  verbatim-columns branch in load() goes away with it.
- Add a top-level docstring that describes the JSON schema, names the
  generator that produces the files, and points operators at the
  hatch run metrics:generate command.
- Add clickhouse/AGENTS.md (with a CLAUDE.md @AGENTS.md indirection)
  giving anyone opening this directory a short orientation note plus
  the "don't hand-edit the JSON files" warning that JSON has no comment
  syntax to carry.
- Update tests/test_advanced_queries.py for the rename and add coverage
  that SystemErrors stays out of the match-query cache.

Runtime dict shape and metric names are byte-identical to before;
verified by diffing the four module-attribute dumps against the
pre-refactor master.
@AAraKKe AAraKKe requested review from a team as code owners May 26, 2026 17:01
@temporal-github-worker-1 temporal-github-worker-1 Bot dismissed iliakur’s stale review May 26, 2026 17:01

Review from iliakur is dismissed. Related teams and files:

  • agent-integrations
    • clickhouse/AGENTS.md
    • clickhouse/CLAUDE.md
    • clickhouse/datadog_checks/clickhouse/advanced_queries/init.py
    • clickhouse/tests/test_advanced_queries.py
@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented May 26, 2026

Validation Report

All 21 validations passed.

Show details
Validation Description Status
agent-reqs Verify check versions match the Agent requirements file
ci Validate CI configuration and Codecov settings
codeowners Validate every integration has a CODEOWNERS entry
config Validate default configuration files against spec.yaml
dep Verify dependency pins are consistent and Agent-compatible
http Validate integrations use the HTTP wrapper correctly
imports Validate check imports do not use deprecated modules
integration-style Validate check code style conventions
jmx-metrics Validate JMX metrics definition files and config
labeler Validate PR labeler config matches integration directories
legacy-signature Validate no integration uses the legacy Agent check signature
license-headers Validate Python files have proper license headers
licenses Validate third-party license attribution list
metadata Validate metadata.csv metric definitions
models Validate configuration data models match spec.yaml
openmetrics Validate OpenMetrics integrations disable the metric limit
package Validate Python package metadata and naming
qa-label Validate the pull request declares whether it needs QA for the next Agent release
readmes Validate README files have required sections
saved-views Validate saved view JSON file structure and fields
version Validate version consistency between package and changelog

View full run

@jeff-morgan-dd jeff-morgan-dd self-assigned this May 26, 2026
@AAraKKe
Copy link
Copy Markdown
Collaborator Author

AAraKKe commented May 27, 2026

Run quality gates manually with this commit and they now pass: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1715292102

@AAraKKe AAraKKe added this pull request to the merge queue May 28, 2026
Merged via the queue into master with commit d6365cc May 28, 2026
63 of 68 checks passed
@AAraKKe AAraKKe deleted the aarakke/shrink-clickhouse-aq branch May 28, 2026 13:21
@dd-octo-sts dd-octo-sts Bot added this to the 7.81.0 milestone May 28, 2026
github-actions Bot pushed a commit to ConnectionMaster/integrations-core that referenced this pull request May 28, 2026
…#23829)

* Store ClickHouse advanced-queries metric definitions as JSON

The four advanced_queries Python modules shipped ~250 KB of redundant
dict literals (per-entry 'name' was always '<prefix>.<key>', and every
entry repeated its type). Move that data to compact per-system-table
JSON files under datadog_checks/clickhouse/data/ and build the
QueryManager-shaped dicts at runtime.

The check registers a check_initializations callable so the JSON files
are parsed once on the first check run. Module attributes like
advanced_queries.SystemMetrics remain available through __getattr__
backed by the same cache, so tests that read those attributes directly
keep working.

The metric generator emits the new JSON format directly; the three
system_*.tpl templates and the four old Python modules are removed.

* Add changelog entry

* Simplify advanced_queries loader and tighten the generator

- Replace the initializer(check) factory with a plain warm_cache callable;
  the closure never depended on the check instance.
- Restore __all__ on the package so the public surface is explicit.
- Rename _cache to cache per the AGENTS.md module-name rule.
- Mirror Python's default AttributeError format in __getattr__.
- Tighten return-type annotations on load, _build_items, and __getattr__.
- Raise loudly in generate_queries when a metric type appears with mixed
  scaled/unscaled entries instead of silently producing a wrong container.
- Reclass the changelog from .changed to .fixed: the refactor preserves
  runtime behaviour byte-for-byte, so a patch bump is the right semver.

* Tighten the advanced_queries loader

- Guard warm_cache with an explicit "key not in cache" check so the
  JSON files are read at most once per process, even when __getattr__
  populated the cache first.
- Wrap load() exceptions as RuntimeError so a missing or malformed
  data file produces an actionable message in the check-init failure.
- Discriminate the two JSON shapes on positive presence of "columns"
  rather than absence of "items".
- Reclass the changelog as .changed: this is a significant internal
  refactor, not a bug fix, so .fixed was misleading.

* Drop system_errors from the generator and widen load() error wrap

- Wrap KeyError in load()'s error path so a malformed JSON file (one
  that parses but is missing expected keys) raises the same context-rich
  RuntimeError as a missing or invalid file.
- Remove SYSTEM_ERRORS_SPEC and generate_system_errors() from the
  generator. The system_errors data is static; the committed
  data/system_errors.json is the single source of truth, and the other
  three queries continue to be generator-driven from ClickHouse source.

* Tighten the advanced_queries loader and rename the generator's Template

- Comment the load() branch that handles system_errors so the asymmetry
  is named at the call site instead of requiring readers to audit the
  JSON files.
- Widen load()'s except tuple with TypeError and AttributeError so a
  malformed JSON shape (e.g. items shipped as a list) still raises the
  wrapped RuntimeError with the file name rather than leaking the bare
  underlying exception.
- Narrow _build_items's compact parameter type to
  dict[str, list[str] | dict[str, str]] to mirror the producer
  annotation in generate_metrics.py.
- Rename the generator's Template dataclass to FileTemplate so it
  doesn't silently shadow string.Template if the stdlib import is ever
  reintroduced.

* Rename module cache to _cache and drop the one-member Templates enum

- Rename advanced_queries.cache to _cache so the underscore signals
  it as module-internal mutable state (PEP 8 module-private). The
  module's __all__ already advertises only the four System* names.
- Rephrase the load() inline comment around the columns shortcut so it
  names the discriminating shape rather than the system_errors file,
  which would mislead if a second verbatim file were added later.
- Replace the single-member Templates enum in the generator with a
  TESTS_METRICS_TEMPLATE constant; the enum was a vestige from when it
  held the three QUERY_* templates that now live in QUERY_SPECS.

* Test the advanced_queries loader directly

Adds tests/test_advanced_queries.py covering the new loader logic:
- module-level __getattr__ resolution + caching
- compact format: source/match column shape, sorted items, name
  derivation including the dotted-key edge case (jemalloc.epoch)
- verbatim format: system_errors columns pass through with the
  boolean: true tag preserved
- RuntimeError wrap on every malformed-JSON path the load() except
  tuple is meant to cover (missing file, invalid JSON, items as list,
  items as scalar, missing required keys) with the cause chain
  preserved
- warm_cache populates every known name and is idempotent

* Move changelog to fixed

* Scope the advanced_queries loader to bulk match queries

The advanced_queries package is now organised around one named pattern:
the SQL-returns-(value, metric_name)-and-dispatches-via-lookup-table
shape that SystemEvents, SystemMetrics, and SystemAsynchronousMetrics
all share. The compact JSON files exist specifically to compress that
pattern.

- Rename load() to load_match_query(); _build_items() to
  _expand_match_items(); NAMES to MATCH_QUERIES; _cache to
  _match_query_cache. Names now say what the loader does.
- Inline SystemErrors as a plain Python literal in __init__.py. Its
  shape doesn't fit the bulk-match pattern, so the JSON compression has
  nothing to compress; data/system_errors.json is removed and the
  verbatim-columns branch in load() goes away with it.
- Add a top-level docstring that describes the JSON schema, names the
  generator that produces the files, and points operators at the
  hatch run metrics:generate command.
- Add clickhouse/AGENTS.md (with a CLAUDE.md @AGENTS.md indirection)
  giving anyone opening this directory a short orientation note plus
  the "don't hand-edit the JSON files" warning that JSON has no comment
  syntax to carry.
- Update tests/test_advanced_queries.py for the rename and add coverage
  that SystemErrors stays out of the match-query cache.

Runtime dict shape and metric names are byte-identical to before;
verified by diffing the four module-attribute dumps against the
pre-refactor master. d6365cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants