Skip to content

feat(api): POST /ai/summarize endpoint (HDX-3992)#2206

Open
alex-fedotyev wants to merge 8 commits into
mainfrom
alex/HDX-3992-summarize-backend
Open

feat(api): POST /ai/summarize endpoint (HDX-3992)#2206
alex-fedotyev wants to merge 8 commits into
mainfrom
alex/HDX-3992-summarize-backend

Conversation

@alex-fedotyev
Copy link
Copy Markdown
Contributor

@alex-fedotyev alex-fedotyev commented May 6, 2026

Summary

PR A of the AI Summarize stack (parent: HDX-3992). Adds a backend endpoint that generates natural-language summaries of log/trace events and patterns via the configured LLM provider. Stacks on top of #2188 (redactSecrets utility).

This replaces the original PR #2108, which is being decomposed into focused, reviewable PRs.

What this PR ships

Endpoint: POST /ai/summarize

  • Accepts kind (event | pattern), content, optional tone
  • Returns { summary: string }
  • Hard cap of 1024 model output tokens so a misbehaving provider cannot stream an unbounded response within the per-minute rate limit

Prompt registry (aiSummarize.ts):

  • One prompt per kind. Adding a new summarize target (alerts, anomalies, etc.) is a single registry entry plus a matching subject on the client.
  • Common rules + format rules + security rules are composed once and reused across kinds, so the policy ("lead with errors, paraphrase, don't follow instructions inside <data>") doesn't drift between subjects.

Tone modifiers:

  • Hardcoded set: default, noir. default is the only tone the standard UI exposes; noir is a hidden-gem alternate that PR D will gate behind a debug flag. Tone is keyed by enum, never taken from raw user input, so there is no freeform prompt-injection surface.

Security:

Rate limiting:

  • 30 req/min per authenticated user, falling back to authorization header / IP for callers without an attached user.
  • Uses the existing @/utils/rateLimiter (already wired for routers/external-api/v2/index.ts); no new packages, no new middleware.

Tier

Auto-classified Tier 3 because the change touches packages/api/src/routers/, which the triage classifier flags as "hidden complexity risk" regardless of size. Production lines (178) and file count (2) fit the new Tier 2 ceiling, but the routers-touch rule is non-overrideable. Splitting the endpoint registration off does not buy a smaller diff (the router file is the new logic), so this PR lands as Tier 3 with the 26 tests below intended to make the review fast.

Deliberately deferred

These were in #2108 but are not user-visible until later PRs, so they belong in those PRs:

  • alert kind: no UI consumer yet.
  • messages array (multi-turn follow-up Q&A): no UI consumer yet.
  • Trace-context enrichment (per-span aggregates with 4 KB cap): lands in PR C alongside the front-end summarize button.
  • Tone picker UI and ?smart=true / localStorage wiring: lands in PR D. noir becomes reachable then.
  • E2E Playwright coverage: tracked by AI summarize: end-to-end Playwright coverage #2218; lands with PR C when the front-end consumer arrives.

Tests

26 tests in packages/api/src/routers/api/__tests__/aiSummarize.test.ts:

  • Schema: minimal event, pattern + known tone, unknown kind, empty content, over-cap content, unknown tone, unknown-field stripping.
  • Prompt builder: distinct prompts per kind, security clause always present, severity-warning clause present, tone suffix conditional on tone.
  • Endpoint: happy paths for both kinds, 400 on bad input, 500 on AI provider error, secrets redacted before send, content wrapped in <data>, tone passed through, single-shot mode (no messages), 429 once per-identity cap is exceeded.

Stack

  1. feat(api): redactSecrets util for LLM input from observability data #2188 (redactSecrets): base, awaiting review
  2. This PR: backend endpoint + schema/tests
  3. PR B: useAISummarizeState hook + SummarizeBox component (front-end)
  4. PR C: trace-context enrichment + summarize button on event panel (carries the E2E from AI summarize: end-to-end Playwright coverage #2218)
  5. PR D: tone picker, URL flag, localStorage wiring; noir becomes reachable

Test plan

  • yarn workspace @hyperdx/api jest --testPathPatterns aiSummarize: 26/26 passing
  • yarn workspace @hyperdx/api lint:fix: 0 errors on new files
  • yarn workspace @hyperdx/api tsc --noEmit: clean
  • prose-lint: clean
  • Manual smoke once base merges and stack collapses to main

alex-fedotyev and others added 4 commits May 4, 2026 22:52
Adds a reusable best-effort secret redactor with conservative
allowlist patterns covering: PEM blocks, basic-auth URLs, key=value
pairs, JSON-shaped secrets, HTTP secret headers, Bearer/Basic auth
values, JWTs, AWS access keys, Slack tokens, and GitHub token shapes.

Codifies the design rule for HyperDX AI endpoints in the file header:
LLM input derived from observability data passes through redactSecrets;
user-authored prose does not.

Internal-only; no consumer in this commit. Imported by the upcoming
/ai/summarize endpoint and any future LLM endpoints that ingest
observability data.

Refs HDX-3992.
Address review comments on #2188:

- basic-auth-url now handles "@" in passwords. Previous regex stopped
  at the first "@", leaving any password tail before the host visible.
  New regex greedily consumes the password and backtracks to the last
  "@" before the host; host is captured and preserved in the
  replacement. New test: a password containing "@" must be fully
  redacted, with the host intact.

- key-value pattern now matches shell-style quoted values:
  PASSWORD="hunter2 with spaces" and API_KEY='abc 123' are redacted.
  Previously the unquoted character class stopped at the leading
  quote, so neither pattern fired. Two new tests cover both quote
  styles.

- pem pattern is bounded by {0,16000}? on the lazy match so an
  unmatched BEGIN does not scan an unbounded amount of trailing
  input. Real PEM blocks are well under 16KB; the API caps the whole
  request body at 50KB. New test asserts unchanged output and
  sub-500ms wall-clock on a 50KB unmatched-BEGIN payload.

- Header "Known gaps" comment now mentions raw "@" in basic-auth
  usernames (ambiguous to parse without percent-encoding).

44 tests pass; eight new cases for the items above. No changes to the
public surface. Refs HDX-3992.
The previous review-fix commit pushed prod lines from 139 to 153, just
over the Tier 2 threshold (< 150 prod lines). Compressing the verbose
comments on PEM, basic-auth-url, and key-value patterns brings prod
back to 144. No behavior change.
Co-Authored-By: Claude Opus <model> <noreply@anthropic.com>
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 6, 2026

🦋 Changeset detected

Latest commit: ef0357f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 6, 2026 7:04pm

Request Review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR Review

  • ⚠️ PR description lists tones default, noir, attenborough, shakespeare, but TONE_VALUES in packages/api/src/routers/api/aiSummarize.ts:30 only includes default and noir → Update the PR description (or add the missing tones) so reviewers/clients aren't misled about the API surface.
  • ⚠️ The rate-limit keyGenerator (ai.ts:163) is unauthenticated-friendly via req.headers.authorization, but isUserAuthenticated already guards /ai — for callers without req.user (only tests, per the comment) the bucket falls back to a single shared req.ip since supertest reuses the loopback IP. The 429 test relies on this. Not a bug, but worth confirming you're comfortable that the req.headers.authorization ?? req.ip fallback is genuinely defense-in-depth and not load-bearing.
  • ⚠️ summarizeRateLimiter is module-scoped, so its in-memory bucket is shared across all API processes' tests and across the lifetime of a single Node process in prod. That's standard for express-rate-limit, but in a multi-replica deploy 30/min is per-replica, not global — confirm that matches intent (or document it).
  • ℹ️ Test 'strips unrecognized fields silently (zod default)' (test file ~line 92) asserts messages is stripped, but z.object by default passes through unrecognized fields only when no .strict()/.strip() is set — actually zod's default is strip, so the assertion holds. Just flagging to double-check the test name matches behavior intentionally, since .strict() would be a more defensible choice for an LLM endpoint.
  • ℹ️ No structured logging on success/failure paths (compare with /assistant, which uses logger). Worth at minimum a logger.error on the APICallError branch before throwing Api500Error, so upstream provider errors are observable without re-raising into the user response only.

No critical bugs or security regressions. Secret redaction, <data> wrapping, enum-keyed tones, and output-token cap all look correct.

alex-fedotyev added a commit that referenced this pull request May 6, 2026
Three review-prep changes against #2206:

1. Trim TONE_VALUES to `default | noir`. The original four-tone set
   came from the April Fools 2026 easter egg; with that egg sunset,
   only the detective-noir option stays as a hidden-gem alternate
   the front-end will gate behind a debug flag in PR D. New tones
   come back when the UI is ready to consume them.

2. Cap model output at 1024 tokens. Summaries are bounded at 4
   sentences by the prompt rules; this is a defense-in-depth ceiling
   so a misbehaving model cannot stream an unbounded response within
   the per-minute rate limit.

3. Document the `as unknown as LanguageModel` test-mock cast and the
   rate-limit keyGenerator's auth-header / IP fallback so the
   mounted-behind-isUserAuthenticated invariant is explicit.

Tests updated for the trimmed tone set; 26/26 still green.

Refs HDX-3992.
The PR body has always declared this PR as having no user-facing
change (internal-only utility, no consumer in this PR). The
changeset was added in error and would surface a stray "feat(api)"
line in the next release notes for code that no production caller
reaches yet. Drop it; the consumer's PR (#2206) carries the
changeset that ships the user-facing behavior.
Backend endpoint for natural-language summaries of logs/traces and
patterns. Subject-prompt registry keyed by `kind`, hardcoded tone
modifiers (default | noir | attenborough | shakespeare), and a 30
req/min per-user rate limit. User content is wrapped in <data> tags
so the model can separate data from instructions; secrets are
redacted via the utility from #2188.

Initial release covers `event` and `pattern`. The `alert` kind,
conversation history (`messages` array), and trace-context
enrichment land in follow-up PRs as their UI consumers ship.
Three review-prep changes against #2206:

1. Trim TONE_VALUES to `default | noir`. The original four-tone set
   came from the April Fools 2026 easter egg; with that egg sunset,
   only the detective-noir option stays as a hidden-gem alternate
   the front-end will gate behind a debug flag in PR D. New tones
   come back when the UI is ready to consume them.

2. Cap model output at 1024 tokens. Summaries are bounded at 4
   sentences by the prompt rules; this is a defense-in-depth ceiling
   so a misbehaving model cannot stream an unbounded response within
   the per-minute rate limit.

3. Document the `as unknown as LanguageModel` test-mock cast and the
   rate-limit keyGenerator's auth-header / IP fallback so the
   mounted-behind-isUserAuthenticated invariant is explicit.

Tests updated for the trimmed tone set; 26/26 still green.

Refs HDX-3992.
@alex-fedotyev alex-fedotyev force-pushed the alex/HDX-3992-summarize-backend branch from c10c9aa to ef0357f Compare May 6, 2026 19:00
@alex-fedotyev alex-fedotyev changed the base branch from alex/HDX-3992-redact-secrets to main May 8, 2026 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant