Skip to content

improvement(db): add session statement/lock timeouts; simplify KB doc tx#4593

Open
TheodoreSpeaks wants to merge 60 commits into
stagingfrom
fix/threshold-billing-pool-deadlock
Open

improvement(db): add session statement/lock timeouts; simplify KB doc tx#4593
TheodoreSpeaks wants to merge 60 commits into
stagingfrom
fix/threshold-billing-pool-deadlock

Conversation

@TheodoreSpeaks
Copy link
Copy Markdown
Collaborator

Summary

  • Add server-side lock_timeout=5s / statement_timeout=30s session defaults via connection: {...} startup params in packages/db/db.ts. Converts silent pool wedges into loud server-side cancellations.
  • Workspace archival tx keeps its own ceiling via SET LOCAL statement_timeout='5min' and lock_timeout='30s' — rare admin op stays atomic without tripping the new global default.
  • KB document creation (bulk + single) drops the db.transaction + SELECT 1 ... FOR UPDATE wrapper. A single bulk db.insert(...).values([...]) is atomic by Postgres natively; FK fails loud on concurrent KB delete. Side effect: removes a processDocumentTags-uses-db-inside-tx deadlock surface.

Type of Change

  • Bug fix

Testing

Tested manually. bun run lint clean. bun run check:api-validation:strict passes. Vitest: workspace lifecycle 2/2, billing 29/29, knowledge 43/43, webhook trigger 17/17.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

waleedlatif1 and others added 30 commits April 3, 2026 23:30
…ership workflow edits via sockets, ui improvements
…ration, signup method feature flags, SSO improvements
* feat(posthog): Add tracking on mothership abort (#4023)

Co-authored-by: Theodore Li <theo@sim.ai>

* fix(login): fix captcha headers for manual login  (#4025)

* fix(signup): fix turnstile key loading

* fix(login): fix captcha header passing

* Catch user already exists, remove login form captcha
…nts, secrets performance, polling refactors, drag resources in mothership
…endar triggers, docs updates, integrations/models pages improvements
…mat, logs performance improvements

fix(csp): add missing analytics domains, remove unsafe-eval, fix workspace CSP gap (#4179)
fix(landing): return 404 for invalid dynamic route slugs (#4182)
improvement(seo): optimize sitemaps, robots.txt, and core web vitals across sim and docs (#4170)
fix(gemini): support structured output with tools on Gemini 3 models (#4184)
feat(brightdata): add Bright Data integration with 8 tools (#4183)
fix(mothership): fix superagent credentials (#4185)
fix(logs): close sidebar when selected log disappears from filtered list; cleanup (#4186)
v0.6.46: mothership streaming fixes, brightdata integration
TheodoreSpeaks and others added 23 commits April 24, 2026 16:59
v0.6.57: mothership reliability, ashby refactor, tables row count, copilot id fix, bun upgrade
…rizations, mothership positional table row insertion, CI improvements, org-external users, file viewer improvements
v0.6.62: fix new copilot chat creation and selection on refresh
…ixes, db query optimizations, contract boundaries code hygiene, CORS, toast improvements, tables infinite query, executor robustness, reranker support
…ogs block, parallel-in-loop wall clock, gpt-image-2
…s, logs panel width, tables UI/DB decoupling

v0.6.67: VFS upload fix, posthog/copilot correlation, exa date filters, logs panel width, tables UI/DB decoupling
…ering upgrades, data drains, security hardening, paginated dropdowns
…ntegrations, robots.txt update, workday hardening
v0.6.72: billing pool contention fix
…personation fixes, md rendering, doc/pdf/pptx generation improvements
…pentelemetry updates, data drains to snowflake, blob, datadog, bigquery
…ip md polish

v0.6.75: scheduler claim-budget drain, helm chart hardening, mothership md polish
@vercel
Copy link
Copy Markdown

vercel Bot commented May 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment May 14, 2026 2:50am

Request Review

@gitguardian
Copy link
Copy Markdown

gitguardian Bot commented May 14, 2026

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
29606901 Triggered Generic High Entropy Secret a54dcbe apps/sim/providers/utils.test.ts View secret
32763747 Triggered Generic Password 3e9849b helm/sim/tests/validators_test.yaml View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@cursor
Copy link
Copy Markdown

cursor Bot commented May 14, 2026

PR Summary

Medium Risk
Introduces global Postgres lock_timeout/statement_timeout defaults, which can cause previously-long or lock-waiting queries to start failing and needs validation in production workloads. Also changes knowledge-base document creation semantics by removing a transaction/lock, so knowledge_base.updatedAt is no longer updated atomically with the insert.

Overview
Adds default Postgres session guards in packages/db/db.ts (lock_timeout=5s, statement_timeout=30s) so lock waits and long-running queries fail fast instead of wedging connections.

Updates workspace archival to SET LOCAL higher per-transaction timeouts (5min statement, 30s lock) to keep this rare, large operation reliable under the new global defaults.

Simplifies knowledge-base document creation (createDocumentRecords, createSingleDocument) by dropping the explicit transaction and SELECT ... FOR UPDATE lock, relying on FK constraints/atomic inserts and reducing deadlock/pool checkout overhead.

Reviewed by Cursor Bugbot for commit 0ad209f. Bugbot is set up for automated code reviews on this repo. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR adds session-level lock_timeout (5 s) and statement_timeout (30 s) to the Postgres connection pool, overrides them per-transaction for the workspace archival path, and removes the db.transaction + SELECT … FOR UPDATE wrappers from both KB document creation functions to eliminate a processDocumentTags-inside-transaction deadlock surface.

  • packages/db/db.ts: Adds connection: { lock_timeout: 5000, statement_timeout: 30000 } startup params; numeric ms values are correctly interpreted by PostgreSQL without units.
  • lifecycle.ts: Wraps the archive transaction with SET LOCAL overrides (5 min / 30 s), correctly scoped so the session defaults apply everywhere else.
  • documents/service.ts: Drops db.transaction from createDocumentRecords and createSingleDocument; the stated rationale that "FK fails loud on concurrent KB delete" is accurate only for hard-deletes — KB deletion here is exclusively soft (deletedAt), leaving a TOCTOU window where documents can be silently inserted into an already-deleted KB.

Confidence Score: 3/5

The connection-timeout and workspace-archival changes are clean and safe; the document-creation simplification removes a real deadlock surface but the claimed FK protection does not apply to soft-deletes, leaving orphaned documents possible on concurrent KB deletion.

The document-service refactor solves a real deadlock problem, but KB deletion is exclusively soft in this codebase so the race window is real — documents can be inserted into a just-deleted KB without any error. The insert and updatedAt update are also now two separate uncommitted statements, so a mid-flight interruption leaves a minor inconsistency.

apps/sim/lib/knowledge/documents/service.ts — the TOCTOU window and non-atomic insert/update deserve a second look before merging.

Important Files Changed

Filename Overview
apps/sim/lib/knowledge/documents/service.ts Removes db.transaction + SELECT FOR UPDATE from both createDocumentRecords and createSingleDocument; eliminates the deadlock surface but opens a TOCTOU window because KB deletion is soft-only and the FK constraint cannot catch a concurrent soft-delete.
packages/db/db.ts Adds session-level lock_timeout=5s and statement_timeout=30s via postgres.js connection startup params; values are numeric milliseconds, which PostgreSQL correctly interprets without units as milliseconds. Clean and well-commented.
apps/sim/lib/workspaces/lifecycle.ts Adds SET LOCAL overrides at the start of the archive transaction to extend statement_timeout to 5min and lock_timeout to 30s; correctly scoped to the transaction via SET LOCAL.
apps/sim/lib/workspaces/lifecycle.test.ts Adds tx.execute mock for the new SET LOCAL calls in the archive transaction; minimal but covers the changed interface.

Reviews (1): Last reviewed commit: "improvement(db): add session statement/l..." | Re-trigger Greptile

Comment on lines +769 to +777
// No tx wrapper: the bulk `db.insert(...).values([...])` is a single statement
// and atomic by Postgres. The KB FK constraint fails loud if the KB is
// concurrently deleted, so an explicit FOR UPDATE lock is unnecessary and
// doubles per-call pool checkouts.
const kb = await db
.select({ id: knowledgeBase.id })
.from(knowledgeBase)
.where(and(eq(knowledgeBase.id, knowledgeBaseId), isNull(knowledgeBase.deletedAt)))
.limit(1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 FK guard doesn't cover soft-deletes

The comment claims "KB FK constraint fails loud on concurrent KB delete," but KB deletion in this codebase is always a soft-delete (deletedAt = now) — confirmed by knowledgeService.ts ("Soft deleted knowledge base"). Since the knowledge_base row is never hard-deleted, the FK constraint on document.knowledgeBaseId never fires when a KB is concurrently soft-deleted.

The old SELECT ... FOR UPDATE locked the KB row for the duration of the transaction, serializing any concurrent soft-delete attempt. Without that lock, a concurrent UPDATE knowledge_base SET deletedAt = now can succeed between the SELECT on line 773 and the db.insert on line 850, leaving documents silently inserted into an already-deleted KB.

Comment on lines +849 to +858
if (documentRecords.length > 0) {
await db.insert(document).values(documentRecords)
logger.info(
`[${requestId}] Bulk created ${documentRecords.length} document records in knowledge base ${knowledgeBaseId}`
)

return returnData
})
await db
.update(knowledgeBase)
.set({ updatedAt: now })
.where(eq(knowledgeBase.id, knowledgeBaseId))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Non-atomic insert + timestamp update

The db.insert(document) and the subsequent db.update(knowledgeBase).set({ updatedAt }) run as two separate statements with no enclosing transaction. If the process is interrupted after the insert succeeds but before the update runs, documents will exist in Postgres with a stale knowledge_base.updatedAt. This applies identically to the createSingleDocument path at line 1326–1331.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0ad209f. Configure here.

.limit(1)
// No tx wrapper: the bulk `db.insert(...).values([...])` is a single statement
// and atomic by Postgres. The KB FK constraint fails loud if the KB is
// concurrently deleted, so an explicit FOR UPDATE lock is unnecessary and
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FK constraint doesn't guard against soft-deleted KB race

Medium Severity

The comments claim "The KB FK constraint fails loud if the KB is concurrently deleted," but knowledge bases are only ever soft-deleted (setting deletedAt), never physically removed. The FK on document.knowledgeBaseId references knowledgeBase.id — since the row still exists after a soft-delete, the constraint is always satisfied and never rejects an insert. Removing the old FOR UPDATE lock + transaction means a concurrent soft-delete between the isNull(deletedAt) check and the db.insert(document) now silently succeeds, creating orphaned documents inside a deleted KB. The bulk path is especially exposed because processDocumentTags performs additional async DB calls that widen the race window.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0ad209f. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants