Skip to content

feat: filters are not search aware by default; accelerated by MV#2272

Open
knudtty wants to merge 9 commits into
mainfrom
aaron/filters-mv-usage
Open

feat: filters are not search aware by default; accelerated by MV#2272
knudtty wants to merge 9 commits into
mainfrom
aaron/filters-mv-usage

Conversation

@knudtty
Copy link
Copy Markdown
Contributor

@knudtty knudtty commented May 13, 2026

Summary

Filters now are not search aware by default. They can/will use metadata MVs if available.

How to test on local

Steps:

  1. Start up a fresh HyperDX Instance via yarn dev
  2. View the filters loading faster
  3. Adjust the filters value in the team settings

References

  • Linear Issue: Closes HDX-4227

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 13, 2026

🦋 Changeset detected

Latest commit: 8c474a1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
Name Type
@hyperdx/common-utils Patch
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 15, 2026 2:00pm

Request Review

@github-actions github-actions Bot added the review/tier-3 Standard — full human review required label May 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Critical-path files (1):
    • packages/api/src/routers/api/team.ts
  • Cross-layer change: touches frontend (packages/app) + backend (packages/api) + shared utils (packages/common-utils)

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 17
  • Production lines changed: 813 (+ 318 in test files, excluded from tier calculation)
  • Branch: aaron/filters-mv-usage
  • Author: knudtty

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

E2E Test Results

All tests passed • 176 passed • 3 skipped • 1251s

Status Count
✅ Passed 176
❌ Failed 0
⚠️ Flaky 5
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

PR Review

  • ⚠️ useAllFieldsAndValues queryKey omits timestampValueExpression (packages/app/src/hooks/useMetadata.tsx:429-439) but the value is threaded into getAllFieldsAndValuesgetMapValues/getAllFields fallback, which uses it to build the time filter. If the source's timestampValueExpression changes while DB/table/connection stay constant, React Query will serve stale data. → Add timestampValueExpression to the queryKey.
  • ⚠️ showAllValues uses useState(true) (packages/app/src/components/DBSearchPageFilters.tsx:1128) while every sibling filter-settings toggle in the same component (isSharedFiltersVisible, showFilterCounts, isFiltersExpanded, isSharedFiltersExpanded) uses useLocalStorage. The preference will reset on every reload. → Switch to useLocalStorage with a stable key (e.g. hdx-filters-show-all-values).
  • ⚠️ Cache key for getMapKeys/getMapValues interpolates raw dateRange.getTime() values when metadataMVs isn't available (packages/common-utils/src/core/metadata.ts:1302-1312, 1405-1409), unlike the rollup branch which aligns to granularity. Every distinct millisecond-precision dateRange (e.g. on tail refresh) creates a new in-process cache entry → unbounded growth in long-lived sessions. → Either align the suffix to a coarser bucket or rely solely on the React Query cache here.
  • ℹ️ TeamClickHouseSettingsUpdateSchema allows null to unset; the controller iterates Object.entries(settings) and builds $set/$unset dynamically. This is safe because Zod's default .object() strips unknown keys before reaching the controller — worth a brief comment to make that invariant explicit for future readers, but not a blocker.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

Deep Review

🔴 P0/P1 — must fix

  • packages/common-utils/src/core/metadata.ts:386-541getMapKeys uses the same cacheKey for both the rollup path (line 388, this.cache.getOrFetch<string[]>(cacheKey, ...)) and the main-table fallback (line 512), so when the rollup returns [] (empty rollup, no data in window, freshly-enabled MVs, or the caught error at line 425), the cached empty array shadows the fallback and the fallback fn never runs.
    • Fix: Use a distinct cache key for the fallback path (e.g., append .fallback), or use cache.get/cache.set instead of getOrFetch on line 512 so the rollup's empty result does not poison the fallback lookup.
    • correctness, adversarial
  • packages/common-utils/src/core/metadata.ts:1368-1395getAllKeyValues removed the prior try/catch that swallowed rollup query failures and returned [], so a rollup timeout (max_execution_time: 15) or ClickHouse error now propagates out to the UI as a red notification with an empty filter panel instead of falling back to per-key getMapValues.
    • Fix: Wrap the rollup query in try/catch (or wrap the entire if (metadataMVs && dateRange) block) and on failure fall through to the no-rollup branch that fans out to getMapValues.
    • reliability, correctness

🟡 P2 — recommended

  • packages/common-utils/src/core/metadata.ts:1502-1542getAllFieldsAndValues rollup query has no error handling; a thrown rollup query (15-30s timeout or CH error) bypasses the intended getAllFields + getMapValues fallback at line 1544 and surfaces as a user-visible error.
    • Fix: Wrap the rollup getOrFetch call in try/catch and treat thrown rollup queries the same as empty results.
    • reliability
  • packages/app/src/components/DBSearchPageFilters.tsx:1128-1418extraFacets populated by loadMoreFilterValuesForKey in all mode (unfiltered values from rollup) persists when the user toggles showAllValues off, so exact mode then renders unfiltered values mixed with query-filtered ones with no visual distinction; clicking one of those entries produces a filter that yields zero results.
    • Fix: Clear extraFacets (and loadMoreLoadingKeys) on filterMode change, or partition extraFacets by mode (e.g. { all: {...}, exact: {...} }).
    • adversarial
  • packages/common-utils/src/core/metadata.ts:1502-1514getAllFieldsAndValues orders by ColumnIdentifier = 'NativeColumn' DESC, ColumnIdentifier, Key then applies LIMIT maxKeys, so on a logs source with several map columns (LogAttributes, ResourceAttributes, ScopeAttributes, SpanAttributes) and maxKeys = 100, alphabetically-later columns (e.g., SpanAttributes) are dropped entirely from filter discovery and the user has no way to reach them in all mode (keysToFetch is [] in that branch).
    • Fix: Order by sum(count) DESC for the secondary sort so high-volume keys are retained under LIMIT, and consider topK(maxValuesPerKey)(Value, count) over groupUniqArray so per-key values surface in frequency order rather than non-deterministic insertion order.
    • adversarial
  • packages/app/src/components/DBSearchPageFilters.tsx:1318-1361loadMoreFilterValuesForKey only calls console.error on failure; the loading spinner clears in finally and the user sees no notification, no inline error, no retry — the button returns to idle as though there were simply no more values.
    • Fix: Surface failures via notifications.show({ color: 'red', ... }) matching the existing error handler at line 1196, or set inline error state for the key.
    • reliability
  • packages/app/src/hooks/useMetadata.tsx:428-439useAllFieldsAndValues passes timestampValueExpression to getAllFieldsAndValues (line 450) where it shapes the fallback SQL, but omits it from the React Query queryKey; two callers sharing (db, table, connection, dateRange) with different timestampValueExpression collide and the second receives stale data. Sibling useMultipleAllFields correctly includes it (line 145).
    • Fix: Add timestampValueExpression to the queryKey array.
    • correctness, kieran-typescript, api-contract
  • packages/common-utils/src/types.ts:1148-1163TeamClickHouseSettingsUpdateSchema is a hand-copied parallel of TeamClickHouseSettingsSchema with .nullable() added to each field; adding a new ClickHouse setting requires touching three places that must stay in lock-step (base schema, update schema, the inline mutation generic in packages/app/src/api.ts:367-374), and z.object() silently strips unknown keys, so a forgotten field would cause PATCH /clickhouse-settings to return 200 with no effect.
    • Fix: Derive the update schema from the base (e.g., extend or transform every field to .nullable().optional()) and re-export the inferred type for api.ts to consume instead of redefining.
    • api-contract, kieran-typescript, maintainability
  • packages/common-utils/src/core/metadata.ts:1534-1574${row.ColumnIdentifier}['${row.Key}'] reconstructs bracket-notation key expressions without escaping the inner Key, so an OTel/log attribute key containing ' or \ (legal — attribute keys are arbitrary strings) produces a string that parseKeyPath (line 1773) then misparses, causing the Load-More callback to issue a lookup with a truncated Key and silently return empty.
    • Fix: Escape ' and \ when building the bracket form, and make parseKeyPath symmetric — or return a structured {column, mapKey} representation and only stringify at the SQL boundary with proper parameterization.
    • adversarial
  • packages/common-utils/src/core/metadata.ts:633-725getMapValues sets max_rows_to_read but no max_execution_time, while the rollup paths it backs (getAllKeyValues, getAllFieldsAndValues) set 15-30s caps; when the fallback fans out (up to maxKeys parallel SELECT DISTINCT scans on the main table when the rollup is cold or empty), a single panel render can occupy ClickHouse threads for minutes.
    • Fix: Add max_execution_time + timeout_overflow_mode: 'break' to the getMapValues query options consistent with the rollup queries.
    • reliability
  • packages/common-utils/src/core/metadata.ts:1419-1449getAllKeyValues forwards signal to the rollup query but the fallback getMapValues calls inside Promise.all do not accept a signal and never abort; when React Query cancels (mode switch, unmount, dateRange change) the fallback queries continue running against ClickHouse.
    • Fix: Add signal?: AbortSignal to getMapValues and propagate it through both fallback branches in getAllKeyValues and getAllFieldsAndValues.
    • correctness, reliability
  • packages/common-utils/src/__tests__/metadata.test.ts — The PR adds new public methods (getAllFieldsAndValues) and changes the signature of getAllKeyValues to batched (keyExpressions[]{key,value}[]), but neither method is exercised by tests; the rollup-empty → per-key getMapValues fallback inside getAllKeyValues is the exact path that exhibits the cache-collision and try/catch-removal regressions above and has no coverage.
    • Fix: Add tests for getAllFieldsAndValues (rollup happy path, rollup-empty fallback, native-vs-map key reconstruction) and for getAllKeyValues (batched IN-tuple SQL, per-key fallback fan-out, empty keyExpressions early return, rollup throws → fallback fires).
    • testing, correctness, kieran-typescript
  • packages/api/src/controllers/team.ts:94-114updateTeamClickhouseSettings now branches on null$unset vs value → $set, but no test covers it; a regression that flipped null to $set: null (writing null into Mongo) or treated undefined as $unset (clobbering existing settings on partial updates) would not be caught.
    • Fix: Add controller-level tests asserting that { key: null } yields $unset and { key: value } yields $set, plus a router integration test for PATCH /clickhouse-settings confirming reset-to-default removes the persisted field.
    • testing, api-contract
🔵 P3 nitpicks (10)
  • packages/api/src/controllers/team.ts:94-114$set and $unset are typed as Record<string, any>, dropping the guarantees of the new TeamClickHouseSettingsUpdate type at the controller boundary.
    • Fix: Use mongoose.UpdateQuery<ITeam> (or narrower Partial<Record<keyof TeamClickHouseSettingsUpdate, ...>>) for the update accumulator.
  • packages/app/src/api.ts:363-381 — Mutation body for useUpdateClickhouseSettings is a fourth hand-rolled copy of the settings shape with no compile-time link to the zod schema.
    • Fix: Replace the inline body with TeamClickHouseSettingsUpdate imported from common-utils.
  • packages/app/src/hooks/useMetadata.tsx:421Omit<UseQueryOptions<any, Error>, 'queryKey'> widens TQueryFnData to any, allowing mismatched initialData/select to slip past type checking.
    • Fix: Use Omit<UseQueryOptions<{ key: string; value: string[] }[], Error>, 'queryKey'>.
  • packages/common-utils/src/core/metadata.ts:1773-1783parseKeyPath splits on the first [' and the trailing '], returning a malformed second component for multi-bracket inputs like A['b']['c'].
    • Fix: Either restrict input to single-bracket map keys explicitly, or parse all bracket segments and return string[] matching the expression structure.
  • packages/common-utils/src/core/metadata.ts:386-1492 — The five-line "get aligned date range + start/end bucket expressions + Timestamp BETWEEN" block is duplicated across getMapKeys, getAllKeyValues, and getAllFieldsAndValues with subtle leading-AND differences.
    • Fix: Extract a buildRollupTimeFilter(metadataMVs, dateRange, { withLeadingAnd }) helper returning { alignedDateRange, timeFilter }.
  • packages/app/src/components/DBSearchPageFilters.tsx:1128-1373 — The showAllValues toggle now drives three parallel hooks, two merged loading flags, two error sources, and a dual-branch loadMoreFilterValuesForKey inside an already 2000-line component.
    • Fix: Extract a useFilterFacets({ mode, ... }) hook that owns both pipelines and returns one { facets, isLoading, isFetching, error, loadMore } surface.
  • packages/app/src/components/TeamSettings/TeamQueryConfigSection.tsx:325-329 — The "Filter Keys Fetch Limit" UI hard-codes defaultValue={DEFAULT_FILTER_KEYS_FETCH_LIMIT} (20) and displays "20 keys" even when the actual runtime default for MV-enabled sources is DEFAULT_FILTER_KEYS_FETCH_LIMIT_WITH_MVS (100), so admins see a value that does not match what their searches use.
    • Fix: Surface the effective default (MV-aware) in displayValue, or remove the per-source default branching and centralize one team-level default.
  • packages/app/src/hooks/useMetadata.tsx:424-465useAllFieldsAndValues fires before me loads (fieldMetadataDisabled defaults to false while loading), so a team with the setting on still issues one wasted query at mount; sibling useMultipleGetKeyValues correctly gates on !isLoadingMe.
    • Fix: Add !isLoadingMe (or isFetched) to the enabled predicate.
  • packages/common-utils/src/core/metadata.ts:370-373, 652-656 — The ${dateRange[0].getTime()}-${dateRange[1].getTime()}-${timestampValueExpression} cache-key construction is duplicated inline in getMapKeys and getMapValues.
    • Fix: Extract buildDateRangeCacheSuffix(dateRange, timestampValueExpression) so the format is defined once.
  • packages/common-utils/src/core/metadata.ts:370-378, 652-656 — When dateRange is provided but timestampValueExpression is missing (or vice versa), the cache-key suffix is empty and the SQL omits the time filter; two semantically different calls (Today vs LastMonth, both lacking timestampValueExpression) collide on the same cache slot, and the result is unfiltered across the entire table.
    • Fix: Require both together at the type level (e.g., merge into a single optional { dateRange, timestampValueExpression } object), or include dateRange in the cache key unconditionally.

Reviewers (9): correctness, testing, maintainability, project-standards, performance, api-contract, reliability, adversarial, kieran-typescript, security.

Testing gaps:

  • No tests for getAllFieldsAndValues (rollup happy path, rollup-empty fallback, native-vs-map key reconstruction).
  • No tests for the new batched getAllKeyValues signature, per-key fallback, or cache-key collision properties.
  • No controller unit tests for updateTeamClickhouseSettings $set / $unset / mixed branches.
  • No router integration test for PATCH /clickhouse-settings confirming null$unset persists.
  • No coverage for the showAllValues / filterMode toggle, the merged loading/error flags, or dual-mode loadMoreFilterValuesForKey.
  • useAutoCompleteOptions test mocks useSource to undefined only — the populated-source path that this PR adds is never exercised.

@github-actions github-actions Bot added review/tier-4 Critical — deep review + domain expert sign-off and removed review/tier-3 Standard — full human review required labels May 14, 2026
@knudtty knudtty requested review from a team and teeohhem and removed request for a team May 15, 2026 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-4 Critical — deep review + domain expert sign-off

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant