Add programmatic search endpoint for logs/traces #2258
Conversation
🦋 Changeset detected. Latest commit: 013eb62. The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
@vinzee is attempting to deploy a commit to the HyperDX Team on Vercel. A member of the team first needs to authorize it.
PR Review
Style/convention items look good overall: the OpenAPI block matches the Zod schema, status-code mapping is exhaustive, the ClickHouse user-input error allow-list is sensible, the new test file covers validation/auth/cross-team isolation/pagination thoroughly, and the changeset is present.
Deep Review

🔴 P0/P1 -- must fix
🟡 P2 -- recommended
🔵 P3 -- nitpicks (13)

Reviewers (11): correctness, security, testing, maintainability, project-standards, api-contract, reliability, adversarial, kieran-typescript, performance, previous-comments. Testing gaps:
Force-pushed 2ef528f to c894469.
    ]);

    /**
     * @openapi
Please run `cd packages/api && yarn docgen` and commit the generated changes to packages/api/openapi.json so that these specs appear there.
    *     columns:
    *       type: string
    *       maxLength: 4096
    *       default: ""
    *       description: |
    *         Comma-separated list of ClickHouse column expressions to include in
    *         each result row. When omitted the source's default select expression
    *         is used.
    *
    *         Each entry is a ClickHouse SQL expression executed under the team's
    *         database user. Semicolons and subqueries (SELECT keyword) are
    *         rejected; use column references, map lookups, or function calls only.
    *
    *         Column naming:
    *         - Top-level schema columns use PascalCase: Timestamp, Body,
    *           SeverityText, ServiceName, TraceId, Duration, StatusCode.
    *         - Materialized attribute columns (preferred -- have skip indices):
    *           pipedream.pipeline_name, k8s.pod.name, sim.scenario_id, etc.
    *         - Raw map lookups (slower, no index):
    *           ResourceAttributes['key'], LogAttributes['key'],
    *           SpanAttributes['key'].
    *
    *         HyperDX rewrites known attribute column names to their materialized
    *         equivalents automatically; you can still pass the logical name.
Some minor suggestions here:

- `select` instead of `columns` to match the internal naming and the UI
- Removed note on top-level column naming because we support custom schemas which may not match that convention
Suggested change:

    -  *     columns:
    -  *       type: string
    -  *       maxLength: 4096
    -  *       default: ""
    -  *       description: |
    -  *         Comma-separated list of ClickHouse column expressions to include in
    -  *         each result row. When omitted the source's default select expression
    -  *         is used.
    -  *
    -  *         Each entry is a ClickHouse SQL expression executed under the team's
    -  *         database user. Semicolons and subqueries (SELECT keyword) are
    -  *         rejected; use column references, map lookups, or function calls only.
    -  *
    -  *         Column naming:
    -  *         - Top-level schema columns use PascalCase: Timestamp, Body,
    -  *           SeverityText, ServiceName, TraceId, Duration, StatusCode.
    -  *         - Materialized attribute columns (preferred -- have skip indices):
    -  *           pipedream.pipeline_name, k8s.pod.name, sim.scenario_id, etc.
    -  *         - Raw map lookups (slower, no index):
    -  *           ResourceAttributes['key'], LogAttributes['key'],
    -  *           SpanAttributes['key'].
    -  *
    -  *         HyperDX rewrites known attribute column names to their materialized
    -  *         equivalents automatically; you can still pass the logical name.
    +  *     select:
    +  *       type: string
    +  *       maxLength: 4096
    +  *       default: ""
    +  *       description: |
    +  *         Comma-separated list of ClickHouse column expressions to include in
    +  *         each result row. When omitted, the source's default select expression
    +  *         is used.
    +  *
    +  *         Each entry is a ClickHouse SQL expression executed under the team's
    +  *         database user. Semicolons and subqueries (SELECT keyword) are
    +  *         rejected; use column references, map lookups, or function calls only.
    +  *
    +  *         HyperDX rewrites map key lookups to materialized column accesses when possible:
    +  *         - Materialized attribute columns (preferred when available):
    +  *           pipedream.pipeline_name, __hdx_materialized_k8s.pod.name, sim.scenario_id, etc.
    +  *         - Raw map lookups (slower, no index):
    +  *           ResourceAttributes['pipedream.pipeline_name'], LogAttributes['k8s.pod.name'],
    +  *           SpanAttributes['sim.scenario_id'].
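For illustration only, a request body exercising the `select` field as described (assuming the suggested `columns` → `select` rename is adopted) might look like the following; the source id, time range, and expressions here are made-up values, not taken from the PR:

```json
{
  "sourceId": "65f0c0ffee0000000000abcd",
  "startTime": 1700000000000,
  "endTime": 1700003600000,
  "where": "ServiceName:checkout AND SeverityText:error",
  "whereLanguage": "lucene",
  "select": "Timestamp, Body, ServiceName, ResourceAttributes['k8s.pod.name']",
  "maxResults": 50,
  "offset": 0
}
```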
    const { sourceId, startTime, endTime, where, whereLanguage, columns, maxResults, offset } =
      req.body;
There are a few lint errors in this file; please run `make dev-lint` to fix them.
    (implicitDateTimePrefixes.some(prefix => key.startsWith(prefix)) ||
      timestampParts.includes(key) ||
      displayedExpr === key)
I don't quite follow this check: is there a case where a candidate is covered by the prefix test but is neither part of timestampParts nor equal to displayedExpr? If not, maybe we only need the de-dupe check here, and we can get rid of implicitDateTimePrefixes entirely?
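To make the suggestion concrete, here is a minimal sketch of the simplified check, assuming the reviewer is right that the prefix list is redundant. The names (`shouldAddTimestampColumn`, `selectedKeys`) are hypothetical illustrations, not the actual identifiers in the PR:

```typescript
// Hypothetical sketch: if every candidate is already covered by
// timestampParts membership or equality with displayedExpr, the
// implicitDateTimePrefixes allow-list can be dropped and only the
// de-duplication check remains.
function shouldAddTimestampColumn(
  key: string,
  timestampParts: string[],
  displayedExpr: string,
  selectedKeys: Set<string>, // columns already in the select list
): boolean {
  // de-dupe: skip columns that are already selected
  if (selectedKeys.has(key)) return false;
  // membership test replaces the prefix allow-list
  return timestampParts.includes(key) || displayedExpr === key;
}
```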
    ): Promise<TileMcpResult | TileError>;

    // Implementation — broader signature, not exposed to callers.
    export async function runConfigTile(
There are a lot of changes to this function which are specific to search tiles now. In the interest of keeping functions simpler, would it make sense to just have a function specifically for search configs which encapsulates all of the search-specific branches added here? That would also eliminate some of the special handling for skipTrim and raw sql as well.
Something like the following, which could then be in its own file since it need not be specific to MCP helpers:
    export type SearchErrorCode = 'SOURCE_NOT_FOUND' | 'CONNECTION_NOT_FOUND';

    type SearchError = {
      isError: true;
      code: SearchErrorCode;
      message: string;
    };

    type SearchResults = {
      isError: false;
      data: Record<string, unknown>[];
    };

    export async function runSearchConfig({
      teamId,
      config,
      startDate,
      endDate,
      maxResults,
      offset,
    }: {
      teamId: string;
      config: ExternalDashboardSearchChartConfig;
      startDate: Date;
      endDate: Date;
      maxResults: number;
      offset: number;
    }): Promise<SearchResults | SearchError> {
      const source = await getSource(teamId, config.sourceId);
      if (!source) {
        return {
          isError: true as const,
          code: 'SOURCE_NOT_FOUND',
          message: `Source not found: ${config.sourceId}`,
        };
      }

      const connection = await getConnectionById(
        teamId,
        source.connection.toString(),
        true,
      );
      if (!connection) {
        return {
          isError: true,
          code: 'CONNECTION_NOT_FOUND',
          message: `Connection not found for source: ${config.sourceId}`,
        };
      }

      // Set client-side HTTP timeout slightly above the source's
      // max_execution_time so CH can return a clean error first.
      // value=0 means no server limit — leave requestTimeout unset in that case.
      const maxExecSourceSetting = source.querySettings?.find(
        s => s.setting === 'max_execution_time',
      );
      const maxExecSeconds = maxExecSourceSetting
        ? Number(maxExecSourceSetting.value)
        : NaN;
      const searchRequestTimeout =
        maxExecSeconds > 0 && isFinite(maxExecSeconds)
          ? maxExecSeconds * 1000 + 2_000
          : undefined;

      const clickhouseClient = new ClickhouseClient({
        host: connection.host,
        username: connection.username,
        password: connection.password,
        ...(searchRequestTimeout != null
          ? { requestTimeout: searchRequestTimeout }
          : {}),
      });

      const searchBase = buildSearchChartConfig(source, {
        where: typeof config.where === 'string' ? config.where : '',
        whereLanguage: config.whereLanguage ?? 'lucene',
        select: config.select ?? null,
        displayType: DisplayType.Search,
        orderBy: resolveSearchOrderBy(source),
        dateRange: [startDate, endDate],
      });

      const chartConfig: ChartConfigWithDateRange = {
        ...searchBase,
        connection: source.connection.toString(),
        limit: {
          limit: maxResults ?? 50,
          offset: offset ?? 0,
        },
      };

      const metadata = getMetadata(clickhouseClient);
      const result = await clickhouseClient.queryChartConfig({
        config: chartConfig,
        metadata,
        querySettings: source.querySettings,
      });

      return { isError: false, data: result.data };
    }

That could also simplify the /search handler, removing the need to add unnecessary tile-related fields:
    ...
    const search = externalDashboardSearchChartConfigSchema.parse({
      displayType: 'search' as const,
      sourceId,
      select: columns,
      where,
      whereLanguage,
    });

    const result = await runSearchConfig({
      teamId: teamId.toString(),
      config: search,
      startDate,
      endDate,
      maxResults,
      offset,
    });

    if (result.isError) {
      const status = codeToStatus(result.code);
      if (status >= 500) {
        console.error('[search] runConfigTile error', result.message);
        return res.status(status).json({ message: 'Query execution failed' });
      }
      return res
        .status(status)
        .json({ message: result.message ?? 'Not found' });
    }
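The handler sketch above calls a `codeToStatus` helper that is not spelled out in the review. A minimal version, assuming both error codes map to 404 since each indicates a missing team-scoped resource, might look like this; the actual mapping in the PR may differ:

```typescript
// Hypothetical helper mapping the sketch's SearchErrorCode values to
// HTTP status codes.
type SearchErrorCode = 'SOURCE_NOT_FOUND' | 'CONNECTION_NOT_FOUND';

function codeToStatus(code: SearchErrorCode): number {
  switch (code) {
    case 'SOURCE_NOT_FOUND':
    case 'CONNECTION_NOT_FOUND':
      // both indicate a missing team-scoped resource
      return 404;
    default:
      // fallback for any error codes added in the future
      return 500;
  }
}
```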
makes sense. moved them to packages/api/src/routers/external-api/v2/utils/search.ts
    ) as string;
    const safeMsg =
      (err.message.split('\n')[0] ?? '').slice(0, 300) || 'Query error';
    console.error('[search] ClickHouse query error', chType, safeMsg);
Please use the pino logger instead of the console logger:
    logger.error({ chType, safeMsg }, '[search] ClickHouse query error');
      'Timestamp',
    ] as const;

    function resolveSearchOrderBy(source: TSource): string {
Not necessary for this PR, but it could be nice to also expose orderBy as a field in the request payload in the future.
makes sense. added it!
Introduce a new V2 API endpoint to query raw log and trace data. This allows users to search events programmatically using Lucene or SQL syntax, with support for column selection and pagination.
Force-pushed c894469 to 013eb62.
@pulpdrew addressed all comments
Summary

Motivation

The existing external API (/api/v2/charts/series) only supports aggregated time-series queries. There is no REST API to retrieve individual log or trace rows; the only programmatic way to do this today is through the MCP tool (hyperdx_query with displayType: "search"), which is scoped to LLM tooling and not usable from scripts, CI pipelines, or backend services.

This change fills that gap by adding POST /api/v2/search, a new external API endpoint for fetching raw log and trace rows programmatically using familiar HTTP semantics, without having to write raw ClickHouse SQL or manage a direct database connection.

Implementation Details
The endpoint mirrors the "search" panel mode in the HyperDX UI and routes through the same runConfigTile execution path used by the hyperdx_query MCP tool (displayType: "search"), so it automatically inherits the same query optimizations.
Also extends runConfigTile to accept an offset option (previously hardcoded to 0) to support pagination.
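A client can page through results by combining maxResults with the new offset option. The sketch below shows one way to compute page payloads; `buildSearchPage` and the epoch-millisecond time fields are illustrative assumptions, not part of the PR:

```typescript
// Hypothetical client-side helper for paginating POST /api/v2/search.
interface SearchRequest {
  sourceId: string;
  startTime: number; // assumed epoch milliseconds
  endTime: number;
  where: string;
  whereLanguage: 'lucene' | 'sql';
  maxResults: number;
  offset: number;
}

function buildSearchPage(
  base: Omit<SearchRequest, 'maxResults' | 'offset'>,
  pageSize: number,
  pageIndex: number, // zero-based page number
): SearchRequest {
  // page N starts at row pageSize * N
  return { ...base, maxResults: pageSize, offset: pageSize * pageIndex };
}
```

Each successive page reuses the same query and time range, only advancing the offset, so results stay consistent as long as the time range is fixed.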
N/A - API-only change, no UI impact.
How to test

Use GET /api/v2/sources to find a sourceId, then call POST /api/v2/search; a successful response has the shape { "data": [...], "rows": N }.

References