Store ClickHouse advanced-queries metric definitions as JSON#23829
Conversation
The four advanced_queries Python modules shipped ~250 KB of redundant dict literals (per-entry 'name' was always '<prefix>.<key>', and every entry repeated its type). Move that data to compact per-system-table JSON files under datadog_checks/clickhouse/data/ and build the QueryManager-shaped dicts at runtime. The check registers a check_initializations callable so the JSON files are parsed once on the first check run. Module attributes like advanced_queries.SystemMetrics remain available through __getattr__ backed by the same cache, so tests that read those attributes directly keep working. The metric generator emits the new JSON format directly; the three system_*.tpl templates and the four old Python modules are removed.
|
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files🚀 New features to boost your workflow:
|
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: a9c1a57 | Docs | Datadog PR Page | Give us feedback! |
- Replace the initializer(check) factory with a plain warm_cache callable; the closure never depended on the check instance. - Restore __all__ on the package so the public surface is explicit. - Rename _cache to cache per the AGENTS.md module-name rule. - Mirror Python's default AttributeError format in __getattr__. - Tighten return-type annotations on load, _build_items, and __getattr__. - Raise loudly in generate_queries when a metric type appears with mixed scaled/unscaled entries instead of silently producing a wrong container. - Reclass the changelog from .changed to .fixed: the refactor preserves runtime behaviour byte-for-byte, so a patch bump is the right semver.
- Guard warm_cache with an explicit "key not in cache" check so the JSON files are read at most once per process, even when __getattr__ populated the cache first. - Wrap load() exceptions as RuntimeError so a missing or malformed data file produces an actionable message in the check-init failure. - Discriminate the two JSON shapes on positive presence of "columns" rather than absence of "items". - Reclass the changelog as .changed: this is a significant internal refactor, not a bug fix, so .fixed was misleading.
- Wrap KeyError in load()'s error path so a malformed JSON file (one that parses but is missing expected keys) raises the same context-rich RuntimeError as a missing or invalid file. - Remove SYSTEM_ERRORS_SPEC and generate_system_errors() from the generator. The system_errors data is static; the committed data/system_errors.json is the single source of truth, and the other three queries continue to be generator-driven from ClickHouse source.
- Comment the load() branch that handles system_errors so the asymmetry is named at the call site instead of requiring readers to audit the JSON files. - Widen load()'s except tuple with TypeError and AttributeError so a malformed JSON shape (e.g. items shipped as a list) still raises the wrapped RuntimeError with the file name rather than leaking the bare underlying exception. - Narrow _build_items's compact parameter type to dict[str, list[str] | dict[str, str]] to mirror the producer annotation in generate_metrics.py. - Rename the generator's Template dataclass to FileTemplate so it doesn't silently shadow string.Template if the stdlib import is ever reintroduced.
- Rename advanced_queries.cache to _cache so the underscore signals it as module-internal mutable state (PEP 8 module-private). The module's __all__ already advertises only the four System* names. - Rephrase the load() inline comment around the columns shortcut so it names the discriminating shape rather than the system_errors file, which would mislead if a second verbatim file were added later. - Replace the single-member Templates enum in the generator with a TESTS_METRICS_TEMPLATE constant; the enum was a vestige from when it held the three QUERY_* templates that now live in QUERY_SPECS.
Adds tests/test_advanced_queries.py covering the new loader logic: - module-level __getattr__ resolution + caching - compact format: source/match column shape, sorted items, name derivation including the dotted-key edge case (jemalloc.epoch) - verbatim format: system_errors columns pass through with the boolean: true tag preserved - RuntimeError wrap on every malformed-JSON path the load() except tuple is meant to cover (missing file, invalid JSON, items as list, items as scalar, missing required keys) with the cause chain preserved - warm_cache populates every known name and is idempotent
| @@ -0,0 +1 @@ | |||
| Store advanced-queries metric definitions as JSON loaded on first check run. | |||
There was a problem hiding this comment.
If this doesn't break existing customers, we need a different changelog format. fixed suggests itself with disk usage optimization stated as a clear goal.
If there's no actual change to behavior and we only refactor to shave off disk usage, I'd even consider no changelog to be fine.
There was a problem hiding this comment.
True, didn't notice the changed. I focused on the fix and completely missed this was wrong. Updated.
Review from dkirov-dd is dismissed. Related teams and files:
- agent-integrations
- clickhouse/changelog.d/23829.fixed
The advanced_queries package is now organised around one named pattern: the SQL-returns-(value, metric_name)-and-dispatches-via-lookup-table shape that SystemEvents, SystemMetrics, and SystemAsynchronousMetrics all share. The compact JSON files exist specifically to compress that pattern. - Rename load() to load_match_query(); _build_items() to _expand_match_items(); NAMES to MATCH_QUERIES; _cache to _match_query_cache. Names now say what the loader does. - Inline SystemErrors as a plain Python literal in __init__.py. Its shape doesn't fit the bulk-match pattern, so the JSON compression has nothing to compress; data/system_errors.json is removed and the verbatim-columns branch in load() goes away with it. - Add a top-level docstring that describes the JSON schema, names the generator that produces the files, and points operators at the hatch run metrics:generate command. - Add clickhouse/AGENTS.md (with a CLAUDE.md @AGENTS.md indirection) giving anyone opening this directory a short orientation note plus the "don't hand-edit the JSON files" warning that JSON has no comment syntax to carry. - Update tests/test_advanced_queries.py for the rename and add coverage that SystemErrors stays out of the match-query cache. Runtime dict shape and metric names are byte-identical to before; verified by diffing the four module-attribute dumps against the pre-refactor master.
Review from iliakur is dismissed. Related teams and files:
- agent-integrations
- clickhouse/AGENTS.md
- clickhouse/CLAUDE.md
- clickhouse/datadog_checks/clickhouse/advanced_queries/init.py
- clickhouse/tests/test_advanced_queries.py
Validation ReportAll 21 validations passed. Show details
|
|
Run quality gates manually with this commit and they now pass: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1715292102 |
…#23829) * Store ClickHouse advanced-queries metric definitions as JSON The four advanced_queries Python modules shipped ~250 KB of redundant dict literals (per-entry 'name' was always '<prefix>.<key>', and every entry repeated its type). Move that data to compact per-system-table JSON files under datadog_checks/clickhouse/data/ and build the QueryManager-shaped dicts at runtime. The check registers a check_initializations callable so the JSON files are parsed once on the first check run. Module attributes like advanced_queries.SystemMetrics remain available through __getattr__ backed by the same cache, so tests that read those attributes directly keep working. The metric generator emits the new JSON format directly; the three system_*.tpl templates and the four old Python modules are removed. * Add changelog entry * Simplify advanced_queries loader and tighten the generator - Replace the initializer(check) factory with a plain warm_cache callable; the closure never depended on the check instance. - Restore __all__ on the package so the public surface is explicit. - Rename _cache to cache per the AGENTS.md module-name rule. - Mirror Python's default AttributeError format in __getattr__. - Tighten return-type annotations on load, _build_items, and __getattr__. - Raise loudly in generate_queries when a metric type appears with mixed scaled/unscaled entries instead of silently producing a wrong container. - Reclass the changelog from .changed to .fixed: the refactor preserves runtime behaviour byte-for-byte, so a patch bump is the right semver. * Tighten the advanced_queries loader - Guard warm_cache with an explicit "key not in cache" check so the JSON files are read at most once per process, even when __getattr__ populated the cache first. - Wrap load() exceptions as RuntimeError so a missing or malformed data file produces an actionable message in the check-init failure. - Discriminate the two JSON shapes on positive presence of "columns" rather than absence of "items". - Reclass the changelog as .changed: this is a significant internal refactor, not a bug fix, so .fixed was misleading. * Drop system_errors from the generator and widen load() error wrap - Wrap KeyError in load()'s error path so a malformed JSON file (one that parses but is missing expected keys) raises the same context-rich RuntimeError as a missing or invalid file. - Remove SYSTEM_ERRORS_SPEC and generate_system_errors() from the generator. The system_errors data is static; the committed data/system_errors.json is the single source of truth, and the other three queries continue to be generator-driven from ClickHouse source. * Tighten the advanced_queries loader and rename the generator's Template - Comment the load() branch that handles system_errors so the asymmetry is named at the call site instead of requiring readers to audit the JSON files. - Widen load()'s except tuple with TypeError and AttributeError so a malformed JSON shape (e.g. items shipped as a list) still raises the wrapped RuntimeError with the file name rather than leaking the bare underlying exception. - Narrow _build_items's compact parameter type to dict[str, list[str] | dict[str, str]] to mirror the producer annotation in generate_metrics.py. - Rename the generator's Template dataclass to FileTemplate so it doesn't silently shadow string.Template if the stdlib import is ever reintroduced. * Rename module cache to _cache and drop the one-member Templates enum - Rename advanced_queries.cache to _cache so the underscore signals it as module-internal mutable state (PEP 8 module-private). The module's __all__ already advertises only the four System* names. - Rephrase the load() inline comment around the columns shortcut so it names the discriminating shape rather than the system_errors file, which would mislead if a second verbatim file were added later. - Replace the single-member Templates enum in the generator with a TESTS_METRICS_TEMPLATE constant; the enum was a vestige from when it held the three QUERY_* templates that now live in QUERY_SPECS. * Test the advanced_queries loader directly Adds tests/test_advanced_queries.py covering the new loader logic: - module-level __getattr__ resolution + caching - compact format: source/match column shape, sorted items, name derivation including the dotted-key edge case (jemalloc.epoch) - verbatim format: system_errors columns pass through with the boolean: true tag preserved - RuntimeError wrap on every malformed-JSON path the load() except tuple is meant to cover (missing file, invalid JSON, items as list, items as scalar, missing required keys) with the cause chain preserved - warm_cache populates every known name and is idempotent * Move changelog to fixed * Scope the advanced_queries loader to bulk match queries The advanced_queries package is now organised around one named pattern: the SQL-returns-(value, metric_name)-and-dispatches-via-lookup-table shape that SystemEvents, SystemMetrics, and SystemAsynchronousMetrics all share. The compact JSON files exist specifically to compress that pattern. - Rename load() to load_match_query(); _build_items() to _expand_match_items(); NAMES to MATCH_QUERIES; _cache to _match_query_cache. Names now say what the loader does. - Inline SystemErrors as a plain Python literal in __init__.py. Its shape doesn't fit the bulk-match pattern, so the JSON compression has nothing to compress; data/system_errors.json is removed and the verbatim-columns branch in load() goes away with it. - Add a top-level docstring that describes the JSON schema, names the generator that produces the files, and points operators at the hatch run metrics:generate command. - Add clickhouse/AGENTS.md (with a CLAUDE.md @AGENTS.md indirection) giving anyone opening this directory a short orientation note plus the "don't hand-edit the JSON files" warning that JSON has no comment syntax to carry. - Update tests/test_advanced_queries.py for the rename and add coverage that SystemErrors stays out of the match-query cache. Runtime dict shape and metric names are byte-identical to before; verified by diffing the four module-attribute dumps against the pre-refactor master. d6365cc
What does this PR do?
Moves the four
clickhouse/datadog_checks/clickhouse/advanced_queries/system_*.pymodules to compact JSON files underclickhouse/datadog_checks/clickhouse/data/. The check loads them once on first run viacheck_initializations. Runtime dict shape and emitted metric names are byte-identical to before.Motivation
advanced_queries/system_events.pyalone was ~172 KB, with the per-entry'name': '<prefix>.<key>'and'type': '<type>'fields repeated for every metric. Combined, the four modules shipped ~250 KB of redundant dict literals. JSON-with-derived-fields drops that to ~64 KB and removes the matching.pyccompanions, contributing to the agent's static-quality-gates budget.Size reduction
Installed footprint of
clickhouse/datadog_checks/clickhouse/(source.py+ bytecode.pyc+ data files, as the agent stores it on disk after install) measured againstmaster:.pysource.pycbytecode.jsondataThe
.pycdrop is the larger half of the win: Python'smarshalrepresentation of the ~1,600-entry dict literals across the four old modules is substantial. JSON ships as plain bytes with no bytecode companion.Review checklist (to be filled by reviewers)
qa/requiredif this PR needs QA validation, orqa/skip-qaif it does not. Exactly one of the two is required.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is mergedThis PR has been created and validated using the paired-review skill from agent-integrations. Ready for human review.