Add programmatic search endpoint for logs/traces#2258

Open
vinzee wants to merge 1 commit into
hyperdxio:mainfrom
vinzee:va/programmatic-search-api

Contributor

@vinzee vinzee commented May 11, 2026

Summary

Motivation

The existing external API (/api/v2/charts/series) only supports aggregated time-series queries. There is no REST API to retrieve individual log or trace rows - the only programmatic way to do this today is through the MCP tool (hyperdx_query displayType:"search"), which is scoped to LLM tooling and not usable from scripts, CI pipelines, or backend services.

This change fills this gap by adding POST /api/v2/search - a new external API endpoint for fetching raw log and trace rows programmatically using familiar HTTP semantics, without having to write raw ClickHouse SQL or manage a direct database connection.

Implementation Details

The endpoint mirrors the "search" panel mode in the HyperDX UI and routes through the same runConfigTile execution path used by the hyperdx_query MCP tool (displayType: "search"), so it automatically inherits all the same query optimizations:

  • Named attribute columns are rewritten to their indexed materialized column equivalents at query time.
  • Source-configured PREWHERE / partition pruning is applied.
  • Results are ordered by timestamp descending.

Also extends runConfigTile to accept an offset option (previously hardcoded to 0) to support pagination.

Screenshots or video

N/A - API-only change, no UI impact.

How to test

  1. Get a source ID: GET /api/v2/sources
  2. Query logs with Lucene:
curl -s -X POST http://localhost:8080/api/v2/search \
  -H "Authorization: Bearer $HDX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "sourceId": "<logs-source-id>",
    "startTime": "2026-05-10T00:00:00Z",
    "endTime":   "2026-05-10T01:00:00Z",
    "where": "SeverityText:\"ERROR\"",
    "whereLanguage": "lucene",
    "select": [
      { "valueExpression": "Timestamp" },
      { "valueExpression": "SeverityText" },
      { "valueExpression": "Body" }
    ],
    "orderBy": "Timestamp ASC",
    "maxResults": 100,
    "offset": 0
  }' | jq .
  3. Query traces with SQL:
curl -s -X POST http://localhost:8080/api/v2/search \
  -H "Authorization: Bearer $HDX_API_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<'EOF' | jq .
{
  "sourceId": "<traces-source-id>",
  "startTime": "2026-05-10T00:00:00Z",
  "endTime":   "2026-05-10T01:00:00Z",
  "where": "SpanAttributes['http.status_code'] = '500' AND Duration > 1000000000",
  "whereLanguage": "sql",
  "columns": "Timestamp,TraceId,SpanName,ServiceName,Duration",
  "maxResults": 100
}
EOF
  4. Verify the response shape is { "data": [...], "rows": N }.
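
For scripted use, the offset pagination exercised by the curl examples might be driven by a loop like the one below. This is a minimal TypeScript sketch, not code from the PR: `SearchRequest`, `SearchResponse`, `PostFn`, and `fetchAllRows` are hypothetical names, the field names are copied from the request bodies above, and the response is assumed to have the `{ data, rows }` shape from the last step. The HTTP call is injected so the loop can be exercised without a running server.

```typescript
// Hypothetical request/response types, inferred from the curl examples above.
interface SearchRequest {
  sourceId: string;
  startTime: string;
  endTime: string;
  where: string;
  whereLanguage: "lucene" | "sql";
  maxResults: number;
  offset: number;
}

interface SearchResponse {
  data: Record<string, unknown>[];
  rows: number;
}

// The transport is injected; in a script it would wrap an HTTP POST to /api/v2/search.
type PostFn = (body: SearchRequest) => Promise<SearchResponse>;

// Pages through results by advancing `offset` until a short page is returned
// or `maxPages` is reached (a guard against unbounded loops).
async function fetchAllRows(
  post: PostFn,
  base: Omit<SearchRequest, "offset">,
  maxPages = 10,
): Promise<Record<string, unknown>[]> {
  const all: Record<string, unknown>[] = [];
  for (let page = 0; page < maxPages; page++) {
    const offset = page * base.maxResults;
    const res = await post({ ...base, offset });
    all.push(...res.data);
    if (res.data.length < base.maxResults) break; // short page: no more rows
  }
  return all;
}
```

A real `post` implementation would send the body with the `Authorization: Bearer` and `Content-Type: application/json` headers shown in the curl commands.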

References

  • Linear Issue: n/a
  • Related PRs: n/a


changeset-bot Bot commented May 11, 2026

🦋 Changeset detected

Latest commit: 013eb62

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Minor
@hyperdx/app Minor
@hyperdx/otel-collector Minor


vercel Bot commented May 11, 2026

@vinzee is attempting to deploy a commit to the HyperDX Team on Vercel.

A member of the Team first needs to authorize it.

Contributor

github-actions Bot commented May 11, 2026

PR Review

  • MCP hyperdx_query displayType:"search" regression in packages/api/src/mcp/tools/query/helpers.ts:181-195 — the searchOverrides block was removed, so MCP search queries now have no default LIMIT, no default ORDER BY, and pass select: '' straight through (renderChartConfig then emits SELECT FROM …, a syntax error) when the caller omits columns. Previously defaults were limit: 50, orderBy: timestamp DESC, select: defaultTableSelect || '*'. → Either reapply search-specific defaults in runConfigTile, or route the MCP search path through buildSearchChartConfig/runSearchConfig so both surfaces share the new utility.

  • Dead options in runConfigTile — the signature was widened to { maxResults?: number; offset?: number } but neither field is ever read in the new body (compare helpers.ts:109 with the removed searchOverrides). The MCP caller in mcp/tools/query/index.ts:144-146 still passes { maxResults: input.maxResults }, so it is silently ignored. → Wire the options into the chart-config limit, or drop them from the signature and the caller.

  • ⚠️ PR description mismatch — the body's curl examples use "columns": "...", but the implemented field is select (search.ts:182). The "how to test" snippets will 400 against the merged code. → Update PR body to use select.

  • ⚠️ Regex-based subquery blacklist is fragile (search.ts:141) — /;|(?<!\w)SELECT\s/i is bypassable via comment splits (SE/**/LECT), and ClickHouse will still let an authenticated caller read other tables their DB user has rights to via plain references. The validation reads as a security boundary but isn't one. → Reframe the comment as "best-effort guardrail, real isolation comes from the team DB user" or replace with a tokenizing parse.

  • ⚠️ Test coverage gap for the MCP regression in mcp/__tests__/queryTool.test.ts:194-218: the test only asserts result.isError is falsy and doesn't run a search-without-columns case. If isError isn't surfaced on ClickHouse 500s, the broken SQL path slips through CI. → Add a search-without-columns test and assert non-empty data.

Style/convention items look good overall: the OpenAPI block matches the Zod schema, status-code mapping is exhaustive, the ClickHouse user-input error allow-list is sensible, the new test file covers validation/auth/cross-team isolation/pagination thoroughly, and the changeset is present.
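
To make the subquery-denylist concern above concrete, the sketch below compares the pattern quoted in the review with one possible tightening (strip comments first, then match `SELECT` on a word boundary). It is illustrative only: `stripSqlComments`, `isRejectedOriginal`, and `isRejectedTightened` are hypothetical helpers, not functions from the PR, and the patterns are built with `new RegExp` purely to keep the example portable across compiler targets.

```typescript
// Pattern quoted in the review: /;|(?<!\w)SELECT\s/i
const ORIGINAL_PATTERN = new RegExp(";|(?<!\\w)SELECT\\s", "i");

// Hypothetical tightened variant: word boundary instead of trailing whitespace.
const TIGHTENED_PATTERN = new RegExp(";|(?<!\\w)SELECT\\b", "i");

// Naive comment stripping: replaces block and line comments with a space.
// It does not understand string literals, so it is a guardrail, not a parser.
function stripSqlComments(sql: string): string {
  return sql
    .replace(/\/\*[\s\S]*?\*\//g, " ") // /* block comments */
    .replace(/--[^\n]*/g, " "); // -- line comments
}

function isRejectedOriginal(expr: string): boolean {
  return ORIGINAL_PATTERN.test(expr);
}

function isRejectedTightened(expr: string): boolean {
  return TIGHTENED_PATTERN.test(stripSqlComments(expr));
}

// Inputs the original pattern lets through but the tightened check rejects.
const bypassForms = ["SELECT/**/x", "SELECT(1)", "SELECT--x\n1"];
const stillAccepted = bypassForms.filter((e) => !isRejectedOriginal(e));
const nowRejected = bypassForms.filter((e) => isRejectedTightened(e));
```

Even the tightened check remains a best-effort guardrail; as the review notes, real isolation comes from the permissions of the team's database user.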

Contributor

github-actions Bot commented May 11, 2026

Deep Review

🔴 P0/P1 -- must fix

  • packages/api/src/routers/external-api/__tests__/search.test.ts:17 -- relative-path imports are out of alphabetical order (controllers/user placed after models/*), and simple-import-sort/imports is 'error' in packages/api/eslint.config.mjs, so make ci-lint will fail and block merge.
    • Fix: Reorder the relative imports so '../../../controllers/user' sorts before '../../../fixtures' and '../../../models/*' within the relative-import group.
    • project-standards

🟡 P2 -- recommended

  • packages/api/src/mcp/tools/query/helpers.ts:246 -- the Markdown-tile branch in runConfigTile now returns isError: true, code: 'INTERNAL' where it previously returned a non-error { content: [...] }, so existing MCP consumers like mcp/tools/dashboards/queryTile.ts will surface Markdown tiles as tool failures.
    • Fix: Keep the prior non-error envelope for the Markdown branch, or introduce a distinct NOT_QUERYABLE code that MCP clients can treat as informational rather than as a tool error.
    • adversarial
  • packages/api/src/routers/external-api/v2/search.ts:145 -- DISALLOWED_COLUMNS_PATTERN = /;|(?<!\w)SELECT\s/i is bypassed by comment-separated forms (SELECT/**/x, SELECT--x\n1) and parenthesis-attached forms (SELECT(1)), so authenticated callers can execute scalar subqueries against system.* and other tables their team's CH user can read, even though the OpenAPI declares such subqueries rejected.
    • Fix: Strip SQL comments before testing, tighten the keyword anchor to (?<!\w)SELECT\b/i, or replace the substring denylist with a structural allowlist of identifier / map-lookup / scalar-function tokens.
    • security, adversarial
  • packages/api/src/routers/external-api/v2/search.ts:401 -- the response has no byte cap; with maxResults: 2000 on a source whose rows carry multi-MB Body/SpanAttributes payloads, a single request can serialize hundreds of MB to multi-GB into the Express res.json buffer and pressure both API heap and CH memory.
    • Fix: Inject max_result_bytes and max_memory_usage into the ClickHouse querySettings on the search path and/or measure the running JSON byte budget in the handler and truncate before res.json.
    • reliability, performance, adversarial
  • packages/api/src/mcp/tools/query/helpers.ts:294 -- when a source has no max_execution_time setting or it is 0, searchRequestTimeout is left undefined and the ClickHouse client falls back to its default requestTimeout of 3_600_000 ms, so a stuck search query can hold an HTTP connection and CH slot for up to an hour.
    • Fix: Clamp searchRequestTimeout to a project default (for example 60 s default, 5-10 min ceiling) regardless of the source max_execution_time setting.
    • reliability
  • packages/api/src/mcp/tools/query/helpers.ts:120 -- resolveSearchOrderBy deliberately omits the UI's tableMetadata.sorting_key fold-in that optimizeDefaultOrderBy does, so sources with a composite sort key like (toStartOfDay(Timestamp), Timestamp) produce a different row ordering between the search panel and the new /api/v2/search.
    • Fix: Move the derivation into @hyperdx/common-utils as a function that takes an optional sortingKey, and pass the cached value from getTableMetadata so the API and the UI share one implementation.
    • correctness, maintainability
  • packages/api/src/mcp/tools/query/helpers.ts:86 -- the new if (startDate > endDate) guard returns 400 for cases that previously silently returned an empty result; MCP callers (hyperdx_query, hyperdx_query_tile) that passed a future startTime without endTime and relied on the implicit endTime = now default now receive an error.
    • Fix: Apply the strict-greater guard only when both startTime and endTime are caller-provided, or document the behavior change in the changeset and update the MCP tool descriptions.
    • correctness
  • packages/api/src/routers/external-api/v2/search.ts:388 -- the documented { message: 'Query execution failed' } 5xx redaction envelope has no test; a refactor that accidentally inlines result.content[0]?.text into a 500 response would not be caught.
    • Fix: Add a test that forces a 500 (e.g., point the source at an unreachable CH host) and asserts both the status code and that the body equals { message: 'Query execution failed' }.
    • testing, reliability
  • packages/api/src/routers/external-api/v2/search.ts:411 -- the 11 codes in CH_USER_INPUT_ERRORS (SYNTAX_ERROR, UNKNOWN_IDENTIFIER, etc.) are the boundary between 400 and 500 responses but no test exercises any of them, so a regression that routes user-input errors to 500 (or vice versa) would pass CI.
    • Fix: Add a test that triggers a real CH SYNTAX_ERROR or UNKNOWN_IDENTIFIER via a malformed columns or where value and asserts status 400 with a ${chType}: prefix.
    • testing, previous-comments
  • packages/api/src/routers/external-api/v2/search.ts:363 -- the handler builds a synthetic externalDashboardTileSchemaWithId with grid fields (x: 0, y: 0, w: 24, h: 6, name: 'Search') just to satisfy the schema, so a future required field on the dashboard-tile schema will silently break /api/v2/search even though it has no dashboard semantics.
    • Fix: Extract a narrower internal entrypoint in helpers.ts that takes (teamId, source, where, whereLanguage, select, dateRange, { maxResults, offset }) and have runConfigTile reuse it instead of going through the external tile schema.
    • maintainability, api-contract
  • packages/api/src/routers/external-api/v2/search.ts:390 -- the handler uses console.error('[search] ...', ...) for all three error log sites while sibling v2 routers (sources.ts, webhooks.ts, dashboards.ts) use the shared @/utils/logger, so search errors bypass structured logging and any downstream alerting tied to the logger.
    • Fix: Import the project logger and replace the three console.error calls with structured logger.error({ ... }, '...') matching the pattern in sibling v2 routers.
    • project-standards, reliability, kieran-typescript
  • packages/api/src/routers/external-api/v2/search.ts:299 -- response error envelope is { message: string } here while /api/v2/charts/series uses { error: string }, so v2 ships two error shapes that clients must special-case per endpoint.
    • Fix: Pick one v2 error envelope (alerts/sources/dashboards already use { message }) and either reference the shared #/components/schemas/Error schema here instead of redefining it inline, or open a follow-up to migrate charts.ts.
    • api-contract, project-standards
  • packages/api/src/routers/external-api/v2/search.ts:162 -- startTime/endTime accept ISO strings while /api/v2/charts/series accepts millisecond integers via millisecondTimestampSchema, so a single v2 client has to translate between two time encodings inside the same namespace.
    • Fix: Pick one v2 time-input convention and either align this endpoint with charts.ts or open a follow-up to migrate the older route.
    • api-contract
  • packages/api/src/routers/external-api/v2/search.ts:41 -- OpenAPI advertises sourceId as plain type: string and startTime/endTime as format: date-time, but zod requires ObjectId.isValid(sourceId) and new Date(...) happily parses non-ISO inputs like '2024', so generated SDK clients see broader/narrower acceptance than the server actually enforces.
    • Fix: Tighten zod (.regex(/^[a-fA-F0-9]{24}$/), z.string().datetime()) so the runtime contract matches the published OpenAPI, or loosen the OpenAPI text to match the server's actual acceptance.
    • api-contract
  • packages/api/src/routers/external-api/__tests__/search.test.ts:48 -- the suite covers only a SourceKind.Log source with a single-column timestampValueExpression: 'Timestamp' and a single Lucene equality clause; trace sources, multi-part timestamps, orderByExpression overrides, displayedTimestampValueExpression, Lucene AND/OR/NOT/grouped clauses, and SQL map-lookup syntax (LogAttributes['key'], SpanAttributes['key']) are all untested.
    • Fix: Add a trace-source test, a Lucene compound-clause test, a SQL map-lookup test, and at least one resolveSearchOrderBy advanced-branch test (orderByExpression set, or multi-part timestamp producing (a, b) DESC).
    • testing, correctness, previous-comments
  • packages/api/src/mcp/tools/query/helpers.ts:330 -- the search branch falls back to options?.maxResults ?? 50, while the v2 zod schema defaults maxResults to 100; the helper fallback is dead today but diverges from the documented default and silently changes behavior for any future caller that skips the validator.
    • Fix: Align the helper fallback to ?? 100, or delete it since the only current caller always supplies a default.
    • previous-comments
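
The request-timeout item above (helpers.ts:294) suggests clamping the client timeout rather than leaving it undefined. One way that could look is sketched below; `resolveSearchTimeoutMs` and the three constants are hypothetical, with the 60 s default and 5 min ceiling taken from the range the review proposes, and the 2 s grace period assumed so ClickHouse can return its own error first.

```typescript
// Hypothetical project defaults (the review suggests "60 s default, 5-10 min ceiling").
const DEFAULT_TIMEOUT_MS = 60_000;
const MAX_TIMEOUT_MS = 5 * 60_000;
const GRACE_MS = 2_000; // assumed headroom above the server-side limit

// `rawValue` is the source's max_execution_time setting in seconds, if present.
function resolveSearchTimeoutMs(rawValue: string | number | undefined): number {
  const seconds = rawValue === undefined ? NaN : Number(rawValue);
  if (!isFinite(seconds) || seconds <= 0) {
    // Unset, 0, or non-numeric: fall back to the default instead of `undefined`.
    return DEFAULT_TIMEOUT_MS;
  }
  // Otherwise sit slightly above the server limit, but never beyond the ceiling.
  return Math.min(seconds * 1000 + GRACE_MS, MAX_TIMEOUT_MS);
}
```

With these assumed constants, a source configured with `max_execution_time = 30` gets a 32 s client timeout, while an unset or zero setting gets 60 s instead of the client's one-hour default.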
🔵 P3 nitpicks (13)
  • packages/api/src/routers/external-api/v2/search.ts:2 -- splitAndTrimWithBracket is imported but never referenced in this file (the actual user is helpers.ts).
    • Fix: Remove the unused import; the same prior-review comment already requested it.
    • correctness, maintainability, project-standards, api-contract, kieran-typescript, adversarial, previous-comments
  • packages/api/src/routers/external-api/v2/search.ts:387 -- if ('isError' in result) is a negative discriminant; if a future refactor lets the helper return TileMcpResult here, result.data.length on the success branch will throw.
    • Fix: Use a positive discriminant if ('isRaw' in result && Array.isArray(result.data)) and fall through to a structured 500 otherwise.
  • packages/api/src/mcp/tools/query/helpers.ts:211 -- isRawSqlSavedChartConfig(config as never) casts the external tile config to never to feed a guard that expects SavedChartConfig; this disables narrowing entirely.
    • Fix: Use the existing isRawSqlExternalTileConfig helper from @/routers/external-api/v2/utils/dashboards and drop the as never.
  • packages/api/src/mcp/tools/query/helpers.ts:357 -- } as ChartConfigWithDateRange plus an eslint-disable @typescript-eslint/no-unsafe-type-assertion papers over the fact that buildSearchChartConfig returns dateRange as Partial<DateRange>.
    • Fix: Set dateRange: [startDate, endDate] explicitly in the object literal and replace the cast with satisfies ChartConfigWithDateRange.
  • packages/api/src/routers/external-api/v2/search.ts:404 -- ((err.cause as Record<string, unknown> | undefined)?.type ?? 'UNKNOWN') as string accepts non-string .type values silently.
    • Fix: Replace the trailing as string with a typeof rawType === 'string' ? rawType : 'UNKNOWN' narrow.
  • packages/api/src/routers/external-api/v2/search.ts:371 -- displayType: 'search' as const hard-codes the enum literal.
    • Fix: Import and use DisplayType.Search from @hyperdx/common-utils/dist/types.
  • packages/api/src/mcp/tools/query/helpers.ts:33 -- TileErrorCode, TileError, TileRawResult, TileMcpResult are declared inline in an MCP-tagged module but consumed across the module boundary by the v2 REST router.
    • Fix: Move the four types into a sibling mcp/tools/query/types.ts (or a neutral services/ module) and import from there in both the MCP helper and search.ts.
  • packages/api/src/routers/external-api/v2/search.ts:7 -- the REST router imports parseTimeRange, runConfigTile, and TileErrorCode from @/mcp/tools/query/helpers, reverse-coupling the external API to MCP internals.
    • Fix: Hoist those symbols into a neutral module (e.g., services/query or routers/external-api/v2/utils/) and import from there.
  • packages/api/src/mcp/tools/query/helpers.ts:165 -- the derived ORDER BY <timestamp> DESC has no stable tiebreaker, so on busy sources where many rows share a Timestamp value the same row can surface on two pages of offset pagination or be skipped between them.
    • Fix: Append a stable secondary key (for example the source's primary-key column or a row hash) to the derived ORDER BY clause, or document this limitation alongside the existing "prefer timestamp-cursor pagination" note.
  • packages/api/src/routers/external-api/v2/search.ts:1 -- new search.ts is 437 lines and modified helpers.ts is 445 lines, both over the project's 300-line guidance.
    • Fix: Extract the OpenAPI JSDoc blocks into a sibling search.openapi.ts and split helpers.ts into time.ts, runConfigTile.ts, orderBy.ts modules.
  • packages/api/src/routers/external-api/v2/search.ts:414 -- the 400 path returns ${chType}: ${safeMsg}, which on UNKNOWN_IDENTIFIER/THERE_IS_NO_COLUMN echoes CH's suggestion list of nearby column or table names.
    • Fix: Return only the chType plus a generic hint to the caller and keep safeMsg in the server-side log.
  • packages/api/src/mcp/tools/query/helpers.ts:380 -- (result as { data: unknown[] }).data as Record<string, unknown>[] double-casts without verifying that each row is an object.
    • Fix: Type the cast as unknown[] and narrow each row, or assert every(r => r != null && typeof r === 'object') before returning.
  • packages/api/src/routers/external-api/v2/search.ts:428 -- console.error('[search] unexpected error', err) logs the raw error, which on driver-level failures can include connection metadata or rendered SQL via err.cause/err.config.
    • Fix: Log only err.name, the first line of err.message, and a truncated stack — matching the redaction style used for the ClickHouseQueryError branch on the previous lines.
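
The `.type` narrowing nitpick above (search.ts:404) can be illustrated with a small helper. This is a sketch under the assumption that the error's `cause` is an arbitrary unknown value; `readChErrorType` is a hypothetical name, not a function in the PR.

```typescript
// Extracts a ClickHouse error type from an unknown `cause`, falling back to
// 'UNKNOWN' when the value is missing or its `type` field is not a string.
function readChErrorType(cause: unknown): string {
  if (typeof cause !== "object" || cause === null) return "UNKNOWN";
  const rawType = (cause as { type?: unknown }).type;
  return typeof rawType === "string" ? rawType : "UNKNOWN";
}
```

Unlike a trailing `as string`, the `typeof` check means a numeric or object-valued `type` cannot leak into the status-code mapping as if it were a string.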

Reviewers (11): correctness, security, testing, maintainability, project-standards, api-contract, reliability, adversarial, kieran-typescript, performance, previous-comments.

Testing gaps:

  • 5xx redaction envelope ({ message: 'Query execution failed' }) is unverified.
  • None of the 11 CH_USER_INPUT_ERRORS codes are exercised, so the 400-vs-500 boundary is untested.
  • Trace sources, multi-part timestampValueExpression, orderByExpression overrides, and displayedTimestampValueExpression branches in resolveSearchOrderBy are uncovered.
  • Lucene AND/OR/NOT, grouped clauses, and quoted-string values are not asserted.
  • SQL map-lookup syntax (LogAttributes['key'], SpanAttributes['key']) is documented but untested.
  • requestTimeout derivation from a source's max_execution_time setting (including '0' and non-numeric values) is unasserted.
  • Off-by-one pagination boundary (offset >= matching-row-count) and network-failure 500 path (unreachable CH host) are untested.
  • DISALLOWED_COLUMNS_PATTERN bypass cases ((SELECT/**/x ...), SELECT(x), SELECT--x\n1) are not in the validator test set.
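
The 400-vs-500 boundary listed above could be pinned down with a small table-driven check. The sketch below is an assumption-laden stand-in for the handler logic: `USER_INPUT_ERRORS` contains only the two codes the review names explicitly (the real `CH_USER_INPUT_ERRORS` list has 11), and `toErrorResponse` is a hypothetical function that mirrors the behavior described in the review (a `${chType}: ` prefix on 400s, a fixed redacted message on 5xx).

```typescript
// Hypothetical subset of the allow-list; only these two codes are named in the review.
const USER_INPUT_ERRORS = new Set<string>(["SYNTAX_ERROR", "UNKNOWN_IDENTIFIER"]);

interface ErrorResponse {
  status: number;
  body: { message: string };
}

// User-input errors become 400 with the ClickHouse type as a prefix;
// everything else is redacted to a generic 500.
function toErrorResponse(chType: string, safeMsg: string): ErrorResponse {
  if (USER_INPUT_ERRORS.has(chType)) {
    return { status: 400, body: { message: `${chType}: ${safeMsg}` } };
  }
  return { status: 500, body: { message: "Query execution failed" } };
}

// Table of (input type, expected status) pairs a test could iterate over.
const cases: Array<[string, number]> = [
  ["SYNTAX_ERROR", 400],
  ["UNKNOWN_IDENTIFIER", 400],
  ["NETWORK_ERROR", 500], // hypothetical non-user-input code
];
const mismatches = cases.filter(
  ([chType, status]) => toErrorResponse(chType, "details").status !== status,
);
```

An integration test in the PR's own suite would trigger these codes through real ClickHouse queries; this sketch only shows the shape of the assertion.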

@vinzee vinzee force-pushed the va/programmatic-search-api branch 3 times, most recently from 2ef528f to c894469 Compare May 12, 2026 01:39
]);

/**
* @openapi

Contributor

Please run cd packages/api && yarn docgen and commit the generated changes from packages/api/openapi.json so that these specs appear in packages/api/openapi.json

Comment on lines +80 to +103
* columns:
* type: string
* maxLength: 4096
* default: ""
* description: |
* Comma-separated list of ClickHouse column expressions to include in
* each result row. When omitted the source's default select expression
* is used.
*
* Each entry is a ClickHouse SQL expression executed under the team's
* database user. Semicolons and subqueries (SELECT keyword) are
* rejected; use column references, map lookups, or function calls only.
*
* Column naming:
* - Top-level schema columns use PascalCase: Timestamp, Body,
* SeverityText, ServiceName, TraceId, Duration, StatusCode.
* - Materialized attribute columns (preferred -- have skip indices):
* pipedream.pipeline_name, k8s.pod.name, sim.scenario_id, etc.
* - Raw map lookups (slower, no index):
* ResourceAttributes['key'], LogAttributes['key'],
* SpanAttributes['key'].
*
* HyperDX rewrites known attribute column names to their materialized
* equivalents automatically; you can still pass the logical name.

Contributor

Some minor suggestions here:

  • select instead of columns to match the internal naming and the UI
  • Removed note on top-level column naming because we support custom schemas which may not match that convention
Suggested change
* select:
* type: string
* maxLength: 4096
* default: ""
* description: |
* Comma-separated list of ClickHouse column expressions to include in
* each result row. When omitted, the source's default select expression
* is used.
*
* Each entry is a ClickHouse SQL expression executed under the team's
* database user. Semicolons and subqueries (SELECT keyword) are
* rejected; use column references, map lookups, or function calls only.
*
* HyperDX rewrites map key lookups to materialized column accesses when possible:
* - Materialized attribute columns (preferred when available):
* pipedream.pipeline_name, __hdx_materialized_k8s.pod.name, sim.scenario_id, etc.
* - Raw map lookups (slower, no index):
* ResourceAttributes['pipedream.pipeline_name'], LogAttributes['k8s.pod.name'],
* SpanAttributes['sim.scenario_id'].

Comment on lines +354 to +355
const { sourceId, startTime, endTime, where, whereLanguage, columns, maxResults, offset } =
req.body;

Contributor

There are a few lint errors in this file, please run make dev-lint to fix.

Comment on lines +159 to +161
(implicitDateTimePrefixes.some(prefix => key.startsWith(prefix)) ||
timestampParts.includes(key) ||
displayedExpr === key)

Contributor

I don't quite follow this check - is there a way that a candidate could not either be part of timestampParts or equal to displayedExpr? If not, maybe we only need the de-dupe check here, and we can get rid of implicitDateTimePrefixes entirely?

Contributor Author

good catch. removed

): Promise<TileMcpResult | TileError>;

// Implementation — broader signature, not exposed to callers.
export async function runConfigTile(

Contributor

There are a lot of changes to this function which are specific to search tiles now. In the interest of keeping functions simpler, would it make sense to just have a function specifically for search configs which encapsulates all of the search-specific branches added here? That would also eliminate some of the special handling for skipTrim and raw sql as well.

Something like the following, which could then be in its own file since it need not be specific to MCP helpers:

export type SearchErrorCode = 'SOURCE_NOT_FOUND' | 'CONNECTION_NOT_FOUND';

type SearchError = {
  isError: true;
  code: SearchErrorCode;
  message: string;
};

type SearchResults = {
  isError: false;
  data: Record<string, unknown>[];
};

export async function runSearchConfig({
  teamId,
  config,
  startDate,
  endDate,
  maxResults,
  offset,
}: {
  teamId: string;
  config: ExternalDashboardSearchChartConfig;
  startDate: Date;
  endDate: Date;
  maxResults: number;
  offset: number;
}): Promise<SearchResults | SearchError> {
  const source = await getSource(teamId, config.sourceId);
  if (!source) {
    return {
      isError: true as const,
      code: 'SOURCE_NOT_FOUND',
      message: `Source not found: ${config.sourceId}`,
    };
  }

  const connection = await getConnectionById(
    teamId,
    source.connection.toString(),
    true,
  );
  if (!connection) {
    return {
      isError: true,
      code: 'CONNECTION_NOT_FOUND',
      message: `Connection not found for source: ${config.sourceId}`,
    };
  }

  // Set client-side HTTP timeout slightly above the source's
  // max_execution_time so CH can return a clean error first.
  // value=0 means no server limit — leave requestTimeout unset in that case.
  const maxExecSourceSetting = source.querySettings?.find(
    s => s.setting === 'max_execution_time',
  );
  const maxExecSeconds = maxExecSourceSetting
    ? Number(maxExecSourceSetting.value)
    : NaN;
  const searchRequestTimeout =
    maxExecSeconds > 0 && isFinite(maxExecSeconds)
      ? maxExecSeconds * 1000 + 2_000
      : undefined;

  const clickhouseClient = new ClickhouseClient({
    host: connection.host,
    username: connection.username,
    password: connection.password,
    ...(searchRequestTimeout != null
      ? { requestTimeout: searchRequestTimeout }
      : {}),
  });

  const searchBase = buildSearchChartConfig(source, {
    where: typeof config.where === 'string' ? config.where : '',
    whereLanguage: config.whereLanguage ?? 'lucene',
    select: config.select ?? null,
    displayType: DisplayType.Search,
    orderBy: resolveSearchOrderBy(source),
    dateRange: [startDate, endDate],
  });

  const chartConfig: ChartConfigWithDateRange = {
    ...searchBase,
    connection: source.connection.toString(),
    limit: {
      limit: maxResults ?? 50,
      offset: offset ?? 0,
    },
  };

  const metadata = getMetadata(clickhouseClient);
  const result = await clickhouseClient.queryChartConfig({
    config: chartConfig,
    metadata,
    querySettings: source.querySettings,
  });

  return { isError: false, data: result.data };
}

That could also simplify the /search handler, removing the need to add unnecessary tile-related fields:

...
const search = externalDashboardSearchChartConfigSchema.parse({
        displayType: 'search' as const,
        sourceId,
        select: columns,
        where,
        whereLanguage,
      });

      const result = await runSearchConfig({
        teamId: teamId.toString(),
        config: search,
        startDate,
        endDate,
        maxResults,
        offset,
      });

      if (result.isError) {
        const status = codeToStatus(result.code);
        if (status >= 500) {
          console.error('[search] runConfigTile error', result.message);
          return res.status(status).json({ message: 'Query execution failed' });
        }
        return res
          .status(status)
          .json({ message: result.message ?? 'Not found' });
      }

Contributor Author

makes sense. moved them to packages/api/src/routers/external-api/v2/utils/search.ts

) as string;
const safeMsg =
(err.message.split('\n')[0] ?? '').slice(0, 300) || 'Query error';
console.error('[search] ClickHouse query error', chType, safeMsg);

Contributor

Please use the pino logger instead of the console logger:

logger.error({ chType, safeMsg }, '[search] ClickHouse query error');
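
The `safeMsg` expression in the lines under review keeps only the first line of the ClickHouse message, capped at 300 characters. Pulled out as a helper (the name `toSafeMessage` is hypothetical, not from the PR), the redaction can be unit-tested independently of whichever logger is used:

```typescript
// Keeps the first line of an error message, truncated to `maxLen` characters.
// Falls back to a generic label when the result would be empty.
function toSafeMessage(message: string, maxLen = 300): string {
  return (message.split("\n")[0] ?? "").slice(0, maxLen) || "Query error";
}
```

The resulting string is what would be passed as `safeMsg` in the structured `logger.error` call suggested above.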

'Timestamp',
] as const;

function resolveSearchOrderBy(source: TSource): string {

Contributor

Not necessary for this PR, but it could be nice to also expose orderBy as a field in the request payload in the future.

Contributor Author

makes sense. added it!

Introduce a new V2 API endpoint to query raw log and trace data.
This allows users to search events programmatically using Lucene
or SQL syntax, with support for column selection and pagination.
@vinzee vinzee force-pushed the va/programmatic-search-api branch from c894469 to 013eb62 Compare May 16, 2026 00:55
Contributor Author

vinzee commented May 16, 2026

@pulpdrew addressed all comments
