v0.7.16: security hardening, db o11y and profiling, settings UI #5245
* perf(trigger): cap concurrency on background DB tasks
* test(trigger): update schedule concurrency assertion to 30
…cryption (#5236) - Add dispatch-latency / trigger-age instrumentation: capture webhook receipt time + Slack x-slack-request-timestamp at the route and log structured dispatchLatencyMs + triggerAgeMs before execution, surfacing the pre-execution latency that per-block timings cannot see (Slack trigger_id expires at 3s). - Guard the effective-env fetch in verifyProviderAuth: only fetch+decrypt when the handler verifies auth AND the providerConfig references env vars ({{VAR}}), avoiding a needless DB read/decrypt on the synchronous pre-ack path. The guard scope exactly matches resolveProviderConfigEnvVars, so resolution is identical.
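The env-fetch guard described above can be sketched as a pure predicate. This is a minimal illustration, not the repository's code: `referencesEnvVars` and `shouldFetchEffectiveEnv` are hypothetical names, and scanning nested objects and arrays is an assumption about how far the `{{VAR}}` check reaches.

```typescript
// Matches a {{VAR}} style reference anywhere inside a string value.
const ENV_VAR_REFERENCE = /\{\{[^{}]+\}\}/;

// Hypothetical helper: true if any string inside the config contains {{VAR}}.
function referencesEnvVars(value: unknown): boolean {
  if (typeof value === "string") return ENV_VAR_REFERENCE.test(value);
  if (Array.isArray(value)) return value.some((item) => referencesEnvVars(item));
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    return Object.keys(record).some((key) => referencesEnvVars(record[key]));
  }
  return false;
}

// Hypothetical guard: only pay for the DB read + decrypt when both conditions hold.
function shouldFetchEffectiveEnv(handlerVerifiesAuth: boolean, providerConfig: unknown): boolean {
  return handlerVerifiesAuth && referencesEnvVars(providerConfig);
}
```

A caller on the pre-ack path would check `shouldFetchEffectiveEnv(...)` first and skip the fetch entirely when it returns false.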
* fix(connectors): harden Zendesk connector against SSRF
  Route the Zendesk connector through the SSRF-safe secureFetchWithRetry (DNS-resolve + IP-pin + per-redirect revalidation) instead of the plain fetchWithRetry, and validate the user-supplied subdomain against a strict DNS-label pattern before building the base URL. Matches the GitLab/Sentry/Obsidian/S3 precedent.
* fix(connectors): retry transient DNS failures in secureFetchWithRetry
  secureFetchWithValidation throws a validation error before the request when a hostname temporarily fails to resolve. Classify that transient DNS failure as retryable so secureFetchWithRetry mirrors the old fetchWithRetry network-retry behavior, while keeping the deterministic blocked-IP SSRF rejection non-retryable.
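The subdomain check can be illustrated with a short sketch. The regex below is the standard single DNS label shape (1 to 63 characters, alphanumeric at both ends, hyphens allowed inside); the function name `buildZendeskBaseUrl` and the `/api/v2` suffix are assumptions for illustration, not taken from the connector.

```typescript
// One DNS label: 1-63 chars, letters/digits/hyphens, no leading or trailing hyphen.
const DNS_LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i;

// Hypothetical helper: reject anything that is not a single DNS label
// before it is interpolated into the host part of the URL.
function buildZendeskBaseUrl(subdomain: string): string {
  const trimmed = subdomain.trim();
  if (!DNS_LABEL.test(trimmed)) {
    throw new Error("Invalid Zendesk subdomain");
  }
  return `https://${trimmed.toLowerCase()}.zendesk.com/api/v2`;
}
```

Because dots, slashes, `@`, and `:` are all outside the allowed character set, an input such as `evil.example/` cannot redirect the request to another host. The SSRF-safe fetch still revalidates the resolved IP, so this check is a first line of defense rather than the only one.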
* perf(db): drive Postgres pool size + application_name from per-role profiles
  Replace ad-hoc DB_APP_NAME sizing with a per-role profile map keyed by SIM_DB_ROLE (web/trigger/realtime), defaulting to web. Trigger machines open a small pool instead of 15 to avoid PgBouncer connection exhaustion. Also size realtime's separate socketDb pool down to 10.
* fix(db): throw on invalid SIM_DB_ROLE instead of silently using web pools
* fix(db): use Object.hasOwn for SIM_DB_ROLE validation to avoid prototype keys
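A rough sketch of the per-role profile lookup follows. The pool sizes and application names are illustrative placeholders, and `resolveDbProfile` is a hypothetical name. The own-property check uses `Object.prototype.hasOwnProperty.call`, which behaves like the `Object.hasOwn` mentioned in the commit but also compiles on older TypeScript targets.

```typescript
type DbRole = "web" | "trigger" | "realtime";

interface PoolProfile {
  max: number;
  applicationName: string;
}

// Illustrative values only; the real profile numbers are not stated in full in the PR.
const POOL_PROFILES: Record<DbRole, PoolProfile> = {
  web: { max: 15, applicationName: "sim-web" },
  trigger: { max: 5, applicationName: "sim-trigger" },
  realtime: { max: 10, applicationName: "sim-realtime" },
};

// Hypothetical resolver: unset role defaults to web, anything unknown throws.
function resolveDbProfile(rawRole: string | undefined): PoolProfile {
  const role = rawRole === undefined ? "web" : rawRole;
  // An own-property check keeps inherited keys such as "toString" or
  // "constructor" from passing validation, which a plain `role in POOL_PROFILES` would allow.
  if (!Object.prototype.hasOwnProperty.call(POOL_PROFILES, role)) {
    throw new Error(`Invalid SIM_DB_ROLE: ${role}`);
  }
  return POOL_PROFILES[role as DbRole];
}
```

Throwing on an unknown role surfaces a misconfigured machine at startup instead of letting it quietly open web-sized pools.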
…tion DoS (#5240) Knowledge-base ingestion downloaded an attacker-controlled external fileUrl with no byte cap: downloadFileFromUrl defaults maxBytes to MAX_SAFE_INTEGER, so the streaming reader buffered the entire response into memory uncapped. An authenticated user could OOM the processing worker by pointing fileUrl at a server that streams an unbounded body. Wire the documented 100MB file-size limit (MAX_FILE_SIZE) into the ingestion download helper. The existing stream limiter aborts the read once the cap is exceeded and rejects up front on an oversized Content-Length, so the body is never fully buffered.
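The two checks described above (reject an oversized Content-Length up front, then abort mid-stream once the cap is exceeded) can be sketched with a small accumulator. `ByteCappedBuffer` is a hypothetical class for illustration; only the `MAX_FILE_SIZE` name and its 100MB value come from the commit message.

```typescript
// 100MB cap, as documented in the commit message.
const MAX_FILE_SIZE = 100 * 1024 * 1024;

// Hypothetical accumulator: collects streamed chunks but never holds more than maxBytes.
class ByteCappedBuffer {
  private chunks: Uint8Array[] = [];
  private total = 0;

  constructor(private readonly maxBytes: number) {}

  // Up-front rejection when the server declares an oversized body.
  static assertContentLength(header: string | null, maxBytes: number): void {
    if (header === null) return; // no declared length: rely on the streaming check
    const declared = Number(header);
    if (isFinite(declared) && declared > maxBytes) {
      throw new Error(`Declared size ${declared} exceeds limit ${maxBytes}`);
    }
  }

  // Streaming check: refuse the chunk that would cross the cap, before storing it.
  push(chunk: Uint8Array): void {
    if (this.total + chunk.byteLength > this.maxBytes) {
      throw new Error(`Download exceeded limit of ${this.maxBytes} bytes`);
    }
    this.chunks.push(chunk);
    this.total += chunk.byteLength;
  }

  toBytes(): Uint8Array {
    const out = new Uint8Array(this.total);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return out;
  }
}
```

A reader loop would call `ByteCappedBuffer.assertContentLength(response.headers.get("content-length"), MAX_FILE_SIZE)` once, then `push` each chunk and cancel the stream when `push` throws. The streaming check matters because Content-Length can be absent or wrong.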
…ory exhaustion (#5239) * fix(file-parsers): guard OOXML parsers against decompression-bomb memory exhaustion Pre-inspect the ZIP central directory of xlsx/docx/pptx buffers and reject archives whose declared expanded size (>1 GiB) or compression ratio (>150x) exceeds safe bounds, before SheetJS/mammoth/officeparser inflate them. The existing pipeline only capped the compressed input (100 MB), which does not bound decompressed size, so a crafted zip bomb could expand to many GB and OOM the worker. * fix(file-parsers): fail closed on unverifiable ZIP-shaped OOXML archives Address review: the guard previously no-opped (fell through to the decompression library) whenever the central directory could not be parsed, and findEocdOffset accepted the first backward EOCD signature without checking it sat at the buffer tail. A crafted archive with a decoy EOCD or an unsupported directory layout could bypass the size limits. - findEocdOffset now requires the EOCD comment length to place the record exactly at the buffer tail, defeating decoy signatures planted in the trailing region. - assertOoxmlArchiveWithinLimits now fails closed: a ZIP-shaped buffer (local file header / EOCD magic) whose central directory cannot be parsed is rejected rather than passed through. Genuine non-ZIP inputs (legacy OLE .xls/.doc, plaintext) still no-op and defer to the downstream parser.
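The tail-anchored EOCD search can be sketched as below. The record layout (signature `0x06054b50`, 22 fixed bytes, 16-bit comment length at offset 20) is from the ZIP format; the function body is an illustrative reconstruction under that layout, not the repository's `findEocdOffset`, and it ignores ZIP64.

```typescript
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as little-endian uint32
const EOCD_FIXED_SIZE = 22;        // size of the record without its trailing comment
const MAX_COMMENT_LENGTH = 0xffff; // comment length is a uint16

// Returns the offset of the end-of-central-directory record, or -1.
// A candidate is accepted only if its declared comment length makes the
// record end exactly at the end of the buffer.
function findEocdOffset(buf: Uint8Array): number {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const lowest = Math.max(0, buf.byteLength - EOCD_FIXED_SIZE - MAX_COMMENT_LENGTH);
  for (let i = buf.byteLength - EOCD_FIXED_SIZE; i >= lowest; i--) {
    if (view.getUint32(i, true) !== EOCD_SIGNATURE) continue;
    const commentLength = view.getUint16(i + 20, true);
    if (i + EOCD_FIXED_SIZE + commentLength === buf.byteLength) return i;
    // Otherwise this is a decoy signature inside the trailing region: keep scanning.
  }
  return -1;
}
```

With the offset in hand, a guard in the spirit of the commit would walk the central directory, sum the declared uncompressed sizes, and reject the archive when the total or the ratio to the compressed size crosses its limits. A ZIP-shaped buffer for which this function returns -1 would be rejected rather than passed through.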
…nent (#5235) * chore(data-drains): remove settings callout and unused InfoNote component * improvement(data-retention): convert policy editor from modal to full-surface page Mirror the access-control group-detail pattern: clicking a retention policy now drills into a full-surface PolicyDetail page (back chip + dirty-gated header Save/Discard + Remove) instead of an xl modal. Form fields use SettingsSection (Workspaces / Retention / PII redaction) and the page chrome matches group-detail exactly. The unsaved-changes confirm stays a small modal. * improvement(settings): shared save/discard + unsaved-changes guard Consolidate every editable settings surface onto one stack instead of per-page custom logic: - SaveDiscardActions: the canonical dirty-gated Discard+Save chip pair - useSettingsUnsavedGuard: syncs local dirty into useSettingsDirtyStore (so the sidebar section-switch confirm applies) + provides guardBack/UnsavedChangesModal for detail sub-views' back chip - useSettingsBeforeUnload: a single beforeunload in the settings shell Migrate whitelabeling, sso, access-control group-detail, data-retention, and secrets-manager (drops its duplicate beforeunload). Deletes the hand-rolled 'Unsaved changes' modals; the leave-confirm standardizes on Keep editing / Discard. Documents the pattern in the settings rule + add-settings-page skill. * fix(sso): reset originalFormData on discard/save so dirty state clears handleDiscard and the post-save cleanup reset formData to DEFAULT but left originalFormData on the edit snapshot, so hasChanges stayed true after leaving the form — leaking a stuck-dirty state into the shared settings guard. Reset originalFormData alongside formData in both leave paths (handleEdit re-seeds both on re-entry). * fix(settings): auto-dismiss unsaved-changes modal when page goes clean useSettingsUnsavedGuard stashed a deferred leave + opened UnsavedChangesModal when back was pressed while dirty, but never cleared them if isDirty later became false. 
Confirming Discard could then run a stale leave with no unsaved edits. Clear the pending leave and close the modal in the dirty-sync effect whenever isDirty is false.
…5241)
* fix(copilot): gate post-tool output writes behind write permission
  The Copilot/Mothership executor runs three post-tool output-redirection sinks (maybeWriteOutputToFile, maybeWriteOutputToTable, maybeWriteReadCsvToTable) that persist a tool's result into the workspace. They were gated only on identity (workspaceId + userId), not on permission. Because function_execute/user_table/read are read-allowed for execution (absent from WRITE_ACTIONS in tools/server/router.ts), a read-only collaborator could drive the agent to durably create/overwrite workspace files and insert/overwrite table rows via output declarations. This is a function-level authorization bypass (CWE-862) that the dedicated write tools correctly reject.
  Add a shared denyOutputWriteWithoutWritePermission guard built on the canonical permissionSatisfies predicate and apply it to all three sinks, once a write is actually intended, so read-only principals get the same Permission denied outcome as the dedicated mutation tools.
* fix(copilot): move file output write-permission gate after no-op skip branches
  Address Cursor review: in maybeWriteOutputToFile the gate ran before the sandbox-export skip branch (which returns the result unchanged without writing), so a read-only caller with a sandbox files payload was denied even though no workspace write would occur. Move the check to immediately before writeWorkspaceFileByPath so it only fires when a write is actually performed.
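The ordering of the gate relative to the no-op branch can be shown in a compact sketch. The function names mirror those in the commit message, but every signature, the `Permission` ranking, and the injected `writeFile` callback are assumptions made for illustration.

```typescript
type Permission = "read" | "write" | "admin";

const PERMISSION_RANK: Record<Permission, number> = { read: 0, write: 1, admin: 2 };

interface ToolResult {
  success: boolean;
  output?: unknown;
  error?: string;
}

// Assumed shape of the canonical predicate: higher levels satisfy lower ones.
function permissionSatisfies(actual: Permission | null, required: Permission): boolean {
  return actual !== null && PERMISSION_RANK[actual] >= PERMISSION_RANK[required];
}

// Returns a denial result, or null when the write may proceed.
function denyOutputWriteWithoutWritePermission(actual: Permission | null): ToolResult | null {
  return permissionSatisfies(actual, "write") ? null : { success: false, error: "Permission denied" };
}

// Hypothetical sink: the gate sits after the skip branch and directly before the write.
function maybeWriteOutputToFile(
  result: ToolResult,
  outputPath: string | undefined,
  permission: Permission | null,
  writeFile: (path: string, content: string) => void,
): ToolResult {
  if (!outputPath) return result; // nothing to write, so read-only callers pass through
  const denied = denyOutputWriteWithoutWritePermission(permission);
  if (denied) return denied;
  writeFile(outputPath, JSON.stringify(result.output));
  return result;
}
```

Placing the check immediately before the write keeps the two properties from the commits together: read-only principals can never cause a workspace write, and they are not denied on paths where no write would happen.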
…rop token (#5243) GET /api/credential-sets/[id]/invite listed every invitation row — including the bearer token — to any org member, matching neither its sibling methods (POST/DELETE enforce admin/owner) nor the self-scoped /invitations endpoint. A non-privileged member could harvest a null-email invite token and self-join the credential set via POST /api/credential-sets/invite/[token]. - Add the admin/owner role gate to GET, matching POST/DELETE on the same route - Project explicit columns (drop token) so the secret is never returned to the management list; the creating admin still receives it via the create response
… skip redundant actor lookup (#5242) * improvement(execution): stop rewriting execution snapshots on reuse + skip redundant actor lookup - SnapshotService.createSnapshotWithDeduplication: switch the per-execution dedup write from onConflictDoUpdate(set state_data) to onConflictDoNothing + a conditional select. A (workflowId, stateHash) row is byte-identical by hash, so rewriting the full state jsonb every run only churned a dead tuple + TOAST/WAL under Postgres MVCC. The reuse path (the common case) now performs no write. - preprocessExecution: add an optional resolvedActorUserId so a caller that already resolved the billing actor upstream can skip the redundant workspace billed-account lookup. The ban/usage/rate/archived gates still run against the actor — only the resolution is reused, never a gate. The webhook background job passes the route-resolved payload.userId. * fix(webhooks): scope actor reuse to inline execution only Addresses review: a queued/Trigger.dev webhook can outlive a workspace billed-account change, so reusing the route-resolved actor there could gate against a stale account. Set resolvedActorUserId only on the in-process inline payload (sub-second after resolution); queued and persisted payloads omit it, so the background pass re-resolves the current billed account. Gates unchanged. * docs(webhooks): convert inline comments on actor-reuse to TSDoc * fix(logs): keep snapshot dedup a single atomic upsert (no select race) Addresses review: the DO NOTHING + follow-up select could fail if cleanup deletes the conflicting (orphaned, aged) snapshot between the no-op insert and the select. Revert to one atomic upsert but SET only state_hash, so RETURNING always yields the row (no race) while the unchanged TOASTed state_data jsonb is still not rewritten under MVCC — keeping the per-execution write tiny.
PR Summary (Cursor Bugbot): Medium Risk
Overview
Execution and infra: Trigger.dev tasks for workflow, webhook, and resume runs get env-configurable queue concurrency caps; default schedule concurrency drops 50 → 30 and the realtime Postgres pool 15 → 10, with …
Blocks/registry: Block data moves to …
Settings UX: Shared …
Reviewed by Cursor Bugbot for commit 2686f04.
Greptile Summary
This PR hardens several security-sensitive paths and updates settings and execution infrastructure.
Confidence Score: 4/5
The whitelabeling discard path can keep and later save a discarded uploaded image.
File: apps/sim/ee/whitelabeling/components/whitelabeling-settings.tsx
Last reviewed commit: "improvement(execution): stop rewriting e..."
#5223) * perf(dev): SIM_DEV_MINIMAL_REGISTRY mode to slash local dev-server RAM Adds a dev-only escape hatch (`bun run dev:minimal`, or `dev:full:minimal` with the realtime server): when SIM_DEV_MINIMAL_REGISTRY=1, a Turbopack/webpack resolve-alias swaps the two heavy registries for tiny curated variants — `@/tools/registry` → 2 tools, `@/blocks/registry-maps` → ~20 core blocks. The shared workspace layout drags the full ~247-tool registry (~2,074 modules) into every route via providers/utils → tools/params, and the editor/executor pull all ~268 block configs; aliasing both stops Turbopack from compiling those graphs at all. To make the blocks alias clean, the heavy block import maps move out of registry.ts into registry-maps.ts (registry.ts keeps only its accessors, importing the maps); its public API is unchanged and full builds/tests use the full maps. The alias is gated on isDev + the flag and is never applied in production. Measured (Turbopack dev, authenticated, /logs): peak next-server RSS ~16 GB → ~4.7 GB, compile 4.9 min → ~18 s; the workflow editor route similarly drops to ~5 GB / ~17 s. Only http_request + function_execute and the curated core blocks work in minimal mode; unset the flag for the full set. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * perf(dev): add table block to minimal registry; doc block registration in registry-maps Adds the Table block to the SIM_DEV_MINIMAL_REGISTRY curated set so the tables surface works under dev:minimal. Updates the integration skills/rules and CLAUDE.md to point block registration at blocks/registry-maps.ts (the BLOCK_REGISTRY / BLOCK_META_REGISTRY maps), reflecting that registry.ts now holds only the accessor functions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(dev): rename dev:full:minimal → dev:full:minimal-registry Matches the dev:full:* formatting and makes the suffix self-explanatory — it is the registry that's minimal, not the dev stack. 
Cursor Bugbot has reviewed your changes and found 1 potential issue.
… tool versions (#5246)
* improvement(clickhouse): expand block templates and skills, normalize tool versions
  - Expand ClickHouseBlockMeta templates from 3 to 9 (schema docs, table maintenance, partition retention, long-running-query alerts, table provisioning, storage growth report)
  - Add document-schema and maintain-tables skills (now 5, all grounded in tools.access)
  - Normalize tool version '1.0' to '1.0.0' across all 26 tools for repo consistency
* improvement(clickhouse): enforce explicit safeguards in destructive-op guidance
  Address Greptile P1 review: tighten the partition-retention and kill-query templates/skill so agent guidance requires an explicit retention cutoff and elapsed-time threshold, lists/verifies targets first, defaults to alert-only, and never drops a partition or kills a query without confirmation.
* improvement(clickhouse): make long-running-query template alert-only
  Address Greptile P1: the kill-query path is a low-level primitive (like drop_table/delete) and shouldn't carry tool-specific kill policy. Remove the autonomous-kill suggestion from the scheduled template so no shipped template steers an agent toward killing a query unattended; alert a human instead. The kill tool stays available for explicit manual use.
After clicking Edit on an existing SSO provider, the only header action was SaveDiscardActions, which renders nothing when the form is clean — so there was no way to leave edit mode back to the read-only summary without first changing a field or navigating away. Render a Cancel chip when isEditing && !hasChanges (the dirty-gated Discard already exits when there are changes).
…ite amplification (#5248) * improvement(logs): move per-block progress markers to Redis to cut write amplification Per-block lastStartedBlock/lastCompletedBlock markers were persisted via a jsonb_set UPDATE on workflow_execution_logs on every block start and complete (~2N UPDATEs per run) — the heaviest write query in the DB. These are live progress breadcrumbs with no DB-polling consumer (live progress comes from the executor over WebSocket); their only durable value is a breadcrumb folded into the final record. Behind the redis-progress-markers flag, markers now live in Redis during the run and are folded into the single terminal UPDATE at completion, dropping per-run row UPDATEs from ~2N+1 to 1. - New progress-markers module: HASH execution:progress:{id}, atomic Lua monotonic-guard writes preserving the existing <= ordering, reservation-aligned TTL backstop, graceful no-op when Redis is unavailable - Deterministic GC: cleared at every terminal/pause boundary; TTL covers crashes - Flag resolved once per logging session so a run never mixes write paths - Fold markers into the completion record (Redis wins, falls back to row markers) - Merge live markers for in-flight detail reads - Extract shared getExecutionReservationTtlMs so marker and admission-slot TTLs share one source of truth * fix(logs): SQL fallback when Redis marker write fails, fold markers on force-fail, validate marker shape Addresses review feedback on the redis-progress-markers PR: - persistLast* now falls back to the jsonb_set UPDATE when Redis is unavailable or the write fails (setLast* returns whether it persisted), so a marker is never dropped when the flag is on without a healthy Redis. - markExecutionAsFailed folds live Redis markers into execution_data before clearing, so the last-started/last-completed breadcrumb survives the force-fail path. - getProgressMarkers validates marker shape (rebuilds from typed fields), so a stale or wrong-shaped Redis value can never reach API consumers. 
* chore(logs): convert inline marker comments to TSDoc * fix(logs): preserve markers when the completion read fails getProgressMarkers now returns null on a Redis read error (vs {} for genuinely empty). completeWorkflowExecution and markExecutionAsFailed skip clearProgressMarkers when the read returns null, so a transient read error at completion no longer wipes markers that are still durably in Redis — the TTL reclaims them instead. * fix(logs): resolve marker store split-brain by latest-timestamp-wins + drain on force-fail - When a Redis marker write falls back to SQL, Redis and the row can each hold a marker for a different block; reads/folds previously preferred Redis unconditionally and could pick a stale value. Now the completion fold, the in-flight detail read, and the force-fail fold all pick the marker with the later timestamp (pickLatestStartedMarker/pickLatestCompletedMarker; markExecutionAsFailed uses a monotonic SQL guard). - markAsFailed now drains pending per-block marker writes (not just the completion promise) before folding, so a force-fail racing onBlockStart/onBlockComplete still captures the latest breadcrumb. * fix(logs): harden Lua marker guard against non-table decoded values Guard the monotonic-check index with type(decoded) == 'table' so a corrupted Redis field that decodes to a non-table (e.g. a number) can't error the eval; our write path only ever stores JSON objects, so this is defense-in-depth. * perf(logs): skip completion Redis read/clear when markers went to SQL completeWorkflowExecution now takes readProgressMarkers (the session's resolved marker mode); when the flag is off it skips the per-completion HGETALL+DEL entirely instead of probing a key that was never written. Sticky to the session so it stays flip-safe (an execution that wrote to Redis always folds+clears Redis). Non-session callers default to true (safe read-and-fold). Also hardened the Lua guard with type(decoded)=='table'.
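The latest-timestamp-wins rule for the split-brain case can be sketched as a small pure function. The marker shape (`blockId`, `startedAt` as an ISO string) and the tie-break toward Redis are assumptions; only the name `pickLatestStartedMarker` and the rule itself come from the commit message.

```typescript
// Assumed marker shape for illustration.
interface StartedMarker {
  blockId: string;
  startedAt: string; // ISO-8601 timestamp
}

// Picks whichever store holds the more recent marker. When only one store
// has a marker, that one wins; on an exact tie the Redis copy is kept.
function pickLatestStartedMarker(
  redisMarker: StartedMarker | undefined,
  rowMarker: StartedMarker | undefined,
): StartedMarker | undefined {
  if (!redisMarker) return rowMarker;
  if (!rowMarker) return redisMarker;
  return Date.parse(rowMarker.startedAt) > Date.parse(redisMarker.startedAt)
    ? rowMarker
    : redisMarker;
}
```

This covers the scenario from the commit: if a Redis write fails for block B and falls back to SQL, Redis still holds block A's older marker, and comparing timestamps selects B's row marker instead of preferring Redis unconditionally.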
updateWebhookProviderConfig built a DB-side merge with jsonb operators
(COALESCE(provider_config, '{}'::jsonb) || $1::jsonb), but the
provider_config column is json, not jsonb. Postgres cannot apply jsonb
merge operators to a json column, so every polling state write failed
with "could not convert type jsonb to json" — silently breaking
historyId/lastCheckedTimestamp/pageToken/lastSeenGuids persistence for
all polling webhooks (Gmail, RSS, Google Sheets/Drive, Outlook, IMAP)
since the atomic-merge change landed.
Cast the column to jsonb for the || / - merge and cast the result back
to json for storage, matching the existing pattern in subscription.ts.
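The cast-merge-cast pattern described above looks roughly like the statement below, held in a TypeScript constant. The table name `webhook`, the `id` column, and the parameter positions are assumptions; only the casts around `provider_config` reflect the fix as described.

```typescript
// Illustrative parameterized statement: $1 is the JSON patch, $2 the webhook id.
// The json column is cast to jsonb so the || merge operator applies, and the
// merged value is cast back to json to match the column type.
const MERGE_PROVIDER_CONFIG_SQL = `
UPDATE webhook
SET provider_config = (
  COALESCE(provider_config::jsonb, '{}'::jsonb) || $1::jsonb
)::json
WHERE id = $2
`;
```

Doing the merge in a single statement keeps it atomic on the database side, so concurrent polling writes do not overwrite each other's keys the way a read-modify-write in application code could.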
…oy (#5250) When a deploy activates a new version, superseded versions' webhooks are removed by a separate, best-effort CLEANUP_INACTIVE outbox event. When that event is lost/dead-letters, old-version webhooks linger as is_active orphans that fetchActiveWebhooks skips (version mismatch), so they silently stop polling (~515 webhooks across ~130 workflows in prod). Run the existing cleanupInactiveDeploymentVersions synchronously in the SYNC_ACTIVE handler, right after the active version's webhooks/schedules are registered, falling back to the deferred outbox event only if the inline pass throws. This reuses the existing guarded cleanup, which re-checks each version is still inactive before tearing anything down (so it never touches the active version) and runs strictly after registration (so a teardown failure can't block it).
