Skip to content

fix(common-utils): wrap Date partition columns in toDate() inside CTE-using queries (HDX-4247)#2291

Open
dhable wants to merge 4 commits into
mainfrom
claude/fix-pattern-date-pk-hdx-4247
Open

fix(common-utils): wrap Date partition columns in toDate() inside CTE-using queries (HDX-4247)#2291
dhable wants to merge 4 commits into
mainfrom
claude/fix-pattern-date-pk-hdx-4247

Conversation

@dhable
Copy link
Copy Markdown
Contributor

@dhable dhable commented May 15, 2026

Summary

Event Patterns (and several other CTE-using panels) silently return No results found against any source whose partition/primary key is a Date column, when the query window does not cross midnight UTC. This happens for the entire class of sources that follow the OpenTelemetry ClickHouse exporter default of partitioning by event_date / EventDate.

Root cause

In timeFilterExpr, the column-metadata lookup is skipped whenever the chart config contains a subquery CTE:

const columnMeta =
  hasSubqueryCte(withClauses) || toStartOf || isToDateExpr
    ? null
    : await metadata.getColumn({...});

The skip is too aggressive. A subquery CTE in WITH does not imply that FROM references the CTE — chart configs routinely inject a subquery CTE (e.g. pattern sampling) while the outer FROM still hits a real base table. When that base-table column is Date-typed, the lookup is the only signal that tells the filter to wrap bounds in toDate(...). Without it, ClickHouse promotes Date -> DateTime at midnight and any window not crossing midnight UTC returns zero rows.

Fix

Tighten the skip to (hasSubqueryCte(withClauses) && !databaseName), matching the established guard pattern already used twice in the same file (line 524 and line 842) — databaseName is the existing signal that FROM is a CTE alias rather than a real table. Also consolidate the lookup-skip and warning-suppression into one skipColumnLookup boolean so they cannot desync.

Behavioral diff:

withClauses databaseName Before After
subquery CTE unset (FROM is CTE) skip lookup skip lookup (unchanged)
subquery CTE set (FROM is real table) skip lookup (bug) do lookup (fixed)
no CTE / alias only set do lookup do lookup (unchanged)

Impact

  • Event Patterns panel now returns rows on Date-PK sources (primary motivation).
  • Silently improves any other path that calls timeFilterExpr with a subquery CTE against a base table — heatmap tile, value-distribution autocomplete, services dashboard.
  • No behavior change for sources whose primary key is DateTime / DateTime64.

Test

Added a regression test inside describe('timeFilterExpr', …) that pins the fix:

  • timestampValueExpression: 'date' (mocked as Date-typed)
  • databaseName: 'default', tableName: 'target_table' (real base table)
  • A subquery CTE in with: [{ name: 'tableStats', sql: ... }]
  • Asserts the generated SQL wraps both bounds in toDate(fromUnixTimestamp64Milli(...)).

The test fails on main and passes with this change.

Screenshots or video

N/A — non-UI change in @hyperdx/common-utils. The user-visible effect (Event Patterns rendering rows instead of "No results found") is downstream of this SQL generation.

How to test on Vercel preview

N/A — non-UI change. To exercise manually:

  1. Point HyperDX at a source whose partition key is a Date column (e.g. an OTel ClickHouse exporter table with event_date Date).
  2. Open the Event Patterns panel with a time range narrower than a full UTC day (e.g. last 15 minutes).
  3. Confirm patterns now render. Without the fix the panel shows "No results found".

References

…-using queries (HDX-4247)

When a chart config injects a subquery CTE (e.g. for pattern sampling) into
the outer query against a base table whose partition key includes a Date
column, the column-metadata lookup was skipped wholesale, leaving the time
filter without the required toDate() wrapper on the RHS. ClickHouse then
promotes Date -> DateTime at midnight, excluding the entire day's rows for
any window that doesn't include midnight UTC.

Pair the CTE skip with !databaseName (matching the established guard
pattern at renderChartConfig.ts:524 and :842), and tie the warning to a
single computed skipColumnLookup boolean so the two conditions cannot
desync.

Fixes Event Patterns 'No results found', and silently improves heatmap,
value-distribution autocomplete, and services dashboard on Date-PK sources.
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 15, 2026

🦋 Changeset detected

Latest commit: 9c08e7f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@hyperdx/common-utils Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 15, 2026 9:14pm

Request Review

@github-actions github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label May 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

🔵 Tier 2 — Low Risk

Small, isolated change with no API route or data model modifications.

Why this tier:

  • Standard feature/fix — introduces new logic or modifies core functionality

Additional context: agent branch (claude/fix-pattern-date-pk-hdx-4247) — change small enough to qualify for Tier 2

Review process: AI review + quick human skim (target: 5–15 min). Reviewer validates AI assessment and checks for domain-specific concerns.
SLA: Resolve within 4 business hours.

Stats
  • Production files changed: 1
  • Production lines changed: 35 (+ 39 in test files, excluded from tier calculation)
  • Branch: claude/fix-pattern-date-pk-hdx-4247
  • Author: dhable

To override this classification, remove the review/tier-2 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

<!-- claude-code-review -->

PR Review

✅ No critical issues found.

Targeted, well-scoped fix for HDX-4247. The skip condition is correctly tightened from hasSubqueryCte(withClauses) alone to hasSubqueryCte(withClauses) && !databaseName, so a subquery CTE in WITH no longer suppresses the base-table column lookup when FROM still references a real table. Consolidating into skipColumnLookup removes the prior risk of the lookup-skip and the warning-suppression desyncing.

Minor observations (non-blocking):

  • 📝 PR description says the new guard "matches the established guard pattern" at renderChartConfig.ts:524 and :842, but those use || (skip if either condition) while this one uses && (skip only if both). The shape differs intentionally — existing guards skip an optimization (safe in either case) while this one gates required type detection (only safe to skip when there's no base table) — so the logic is correct, just not an exact copy of the prior pattern. Worth tweaking the description for accuracy, no code change needed.
  • ✅ Regression test in renderChartConfig.test.ts:1740-1777 correctly exercises the previously-broken path: databaseName set + subquery CTE in with + Date-typed getColumn mock, asserting both bounds wrap in toDate(...). Relies on the beforeEach at :1666-1672 that returns Date for column name date — that linkage is implicit but consistent with the sibling test directly above it.
  • 🧪 No new behavior added for the (!hasSubqueryCte && !databaseName) shape, which is unchanged from before — metadata.getColumn is still called with databaseName: undefined and presumably returns nothing. Not introduced by this PR, just worth noting it's preserved.

Changeset entry, scope (@hyperdx/common-utils patch), and message are appropriate.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

E2E Test Results

All tests passed • 177 passed • 3 skipped • 1270s

Status Count
✅ Passed 177
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

Deep Review

✅ No critical issues found. The narrowing of the skip condition is correct, the precedent at renderChartConfig.ts:524/:847 (intentionally) stays looser for the materialized-fields path, and the new test pins the regression. Two observability/reliability smells are worth a follow-up, and a few coverage gaps.

🟡 P2 — recommended

  • packages/common-utils/src/core/renderChartConfig.ts:714-721 — The metadata.getColumn call is now newly reachable when hasSubqueryCte(withClauses) && databaseName is truthy; it has no try/catch, so a transient ClickHouse failure on the DESCRIBE (network blip, in-flight schema change, auth-token rotation) will reject the promise and fail the chart render for Event Patterns, heatmap tile, services dashboard, and value-distribution autocomplete — all paths that previously short-circuited around this lookup.

    • Fix: Wrap the getColumn call in try/catch matching the getTableMetadata precedent at lines 671-686, and fall back to columnMeta = null on rejection so the filter degrades to DateTime semantics instead of failing the whole render.
    • ce-adversarial-reviewer, ce-correctness-reviewer
  • packages/common-utils/src/core/renderChartConfig.ts:727-731 — The consolidated console.warn('Column ${col} not found ...') now fires in a strictly broader population than before: any chart whose timestampValueExpression is a function-wrapped expression that isn't toStartOf* or toDate(...) (e.g. coalesce(EventDate, toDate(now())), fromUnixTimestamp64Milli(ts_ms), assumeNotNull(ts)) will now log a misleading "column not found" warning on every render, because the lookup is attempted with the full expression as the column name.

    • Fix: Either extend skipColumnLookup to short-circuit when col contains non-identifier characters (e.g. /[(\s,]/.test(col)`), or rewrite the warning text to say the expression isn't a bare column reference rather than implying a missing column.
    • ce-adversarial-reviewer, ce-kieran-typescript-reviewer
  • packages/common-utils/src/__tests__/renderChartConfig.test.ts:1740-1776 — The new test pins the happy path but leaves two adjacent branches of the new skipColumnLookup predicate unasserted: the getColumn-rejects failure mode (relevant to the P2 above) and the hasSubqueryCte && !databaseName "still-skip" branch that the fix relies on for unchanged behavior.

    • Fix: Add a sibling test that mocks mockMetadata.getColumn to reject and asserts the function still resolves with a degraded DateTime-form filter, plus a negative test with databaseName: '' and a subquery CTE that asserts the bounds are not wrapped in toDate(...).
    • ce-correctness-reviewer, ce-testing-reviewer, ce-kieran-typescript-reviewer
🔵 P3 nitpicks (3)
  • packages/common-utils/src/core/renderChartConfig.ts:710!databaseName treats '', undefined, null, 0, and false as "CTE alias", but the codebase convention is specifically the empty-string sentinel; external @hyperdx/common-utils consumers passing undefined against a CTE config would silently bypass the lookup again, reintroducing the very regression this PR fixes.

    • Fix: Tighten the guard to databaseName === '' to match the established sentinel exactly, and consider tightening the sibling guards at lines 524 and 847 in a follow-up.
    • ce-adversarial-reviewer, ce-kieran-typescript-reviewer
  • packages/common-utils/src/core/renderChartConfig.ts:703-708 — The comment block explains why the new && !databaseName form is correct but does not acknowledge that the parallel guards at lines 524 and 847 deliberately stay looser (||, not &&) for the materialized-fields lookup, so a reader cross-referencing the three sites will assume one of them is wrong.

    • Fix: Add one sentence to the comment noting that the materialized-fields guard at 524/847 intentionally keeps the looser shape because that path is an optimization, not a correctness signal.
    • ce-adversarial-reviewer, ce-maintainability-reviewer
  • packages/common-utils/src/__tests__/renderChartConfig.test.ts:21 — The suite-wide jest.spyOn(console, 'warn').mockImplementation(() => {}) swallows all warnings, so a future regression that drops the !skipColumnLookup guard and starts spamming the misleading warning above would pass silently.

    • Fix: Use a per-test jest.spyOn with mockClear() in the new test and assert expect(console.warn).not.toHaveBeenCalled() for the CTE-alias-skipped case.
    • ce-testing-reviewer

Reviewers (6): ce-correctness-reviewer, ce-testing-reviewer, ce-maintainability-reviewer, ce-kieran-typescript-reviewer, ce-adversarial-reviewer, ce-project-standards-reviewer.

Testing gaps:

  • No assertion that timeFilterExpr degrades gracefully when metadata.getColumn rejects.
  • No negative test pinning hasSubqueryCte && !databaseName → lookup still skipped.
  • No coverage for non-bare-column timestampValueExpression paths combined with a subquery CTE.

@github-actions
Copy link
Copy Markdown
Contributor

<!-- claude-code-review -->

PR Review

✅ No critical issues found.

Targeted, minimal fix with a regression test that fails on main and passes here. The skipColumnLookup consolidation correctly removes the desync risk between the lookup skip and the warning suppression (renderChartConfig.ts:709-731). The new test (renderChartConfig.test.ts:1740) properly relies on the nested beforeEach at :1666 that maps column === 'date' to { type: 'Date' }, so it exercises the real Date-type code path. No regression for sources whose primary key is DateTime/DateTime64, and the no-CTE path is unaffected.

Minor (non-blocking) nit:

  • 📝 PR description says the new guard "matches the established pattern" at L524/L847, but those use OR (hasSubqueryCte || !databaseName) while this uses AND (hasSubqueryCte && !databaseName). The spirit (using !databaseName as the CTE-alias signal) is similar, but the conditions are not equivalent — the materialized-fields lookups intentionally skip whenever any subquery CTE is present, while this time-filter lookup must run even with a subquery CTE as long as FROM is a real base table. Worth tightening the wording for future readers, but the code is correct.

@github-actions
Copy link
Copy Markdown
Contributor

PR Review

✅ No critical issues found.

Targeted, well-scoped fix. The new skipColumnLookup boolean correctly consolidates the lookup-skip and warning-suppression conditions so they can't desync, and the truth table in the PR description matches the implementation at renderChartConfig.ts:709-712. Regression test asserts the Date-PK + subquery-CTE + base-table case generates toDate(...) bounds.

Minor notes (non-blocking):

  • ℹ️ The PR description claims the fix "matches the established guard pattern" at lines 524/847, but those use hasSubqueryCte(...) || !databaseName (skip if either is true) while this fix uses && (skip only if both are true). The new logic is actually more precise for this use case — base-table column metadata can be looked up even when a subquery CTE exists — so the divergence is intentional, but the PR wording is misleading.
  • ℹ️ Consider a parallel test asserting the databaseName: undefined + subquery-CTE path still skips the lookup (the "unchanged" row in the truth table) to lock in both halves of the behavior. Optional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge review/tier-2 Low risk — AI review + quick human skim

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant