Merged
61 commits
0b9019d
v0.6.23: MCP fixes, remove local state in favor of server state, moth…
waleedlatif1 Apr 4, 2026
a54dcbe
v0.6.24: copilot feedback wiring, captcha fixes
waleedlatif1 Apr 4, 2026
28af223
v0.6.25: cloudwatch, cloudformation, live kb sync, linear fixes, post…
waleedlatif1 Apr 5, 2026
d889f32
v0.6.26: ui improvements, multiple response blocks, docx previews, ol…
waleedlatif1 Apr 5, 2026
316bc8c
v0.6.27: new triggers, mothership improvements, files archive, queuei…
waleedlatif1 Apr 7, 2026
3f508e4
v0.6.28: new docs, delete confirmation standardization, dagster integ…
waleedlatif1 Apr 7, 2026
d6ec115
v0.6.29: login improvements, posthog telemetry (#4026)
TheodoreSpeaks Apr 7, 2026
d7da35b
v0.6.30: slack trigger enhancements, connectors performance improveme…
waleedlatif1 Apr 8, 2026
cf233bb
v0.6.31: elevenlabs voice, trigger.dev fixes, cloud whitelabeling for…
waleedlatif1 Apr 8, 2026
f8f3758
v0.6.32: BYOK fixes, ui improvements, cloudwatch tools, jsm tools ext…
waleedlatif1 Apr 9, 2026
3c8bb40
v0.6.33: polling improvements, jsm forms tools, credentials reactquer…
waleedlatif1 Apr 9, 2026
d33acf4
v0.6.34: trigger.dev fixes, CI speedup, atlassian error extractor
waleedlatif1 Apr 9, 2026
4f40c4c
v0.6.35: additional jira fields, HITL docs, logs cleanup efficiency
waleedlatif1 Apr 10, 2026
cbfab1c
v0.6.36: new chunkers, sockets state machine, google sheets/drive/cal…
waleedlatif1 Apr 11, 2026
4309d06
v0.6.37: audit logs page, isolated-vm worker rotation, permission gro…
waleedlatif1 Apr 12, 2026
8b57476
v0.6.38: models page
waleedlatif1 Apr 12, 2026
e3d0e74
v0.6.39: billing fixes, tools audit, landing fix
waleedlatif1 Apr 13, 2026
0ac0539
v0.6.40: mothership tool loop, new skills, agiloft, STS, IAM integrat…
waleedlatif1 Apr 14, 2026
3838b6e
v0.6.41: webhooks fix, workers removal
waleedlatif1 Apr 14, 2026
fc07922
v0.6.42: mothership nested file reads, search modal improvements
waleedlatif1 Apr 14, 2026
3a1b1a8
v0.6.43: mothership billing idempotency, env var resolution fixes
waleedlatif1 Apr 14, 2026
46ffc49
v0.6.44: streamdown, mothership intelligence, excel extension
waleedlatif1 Apr 15, 2026
010435c
v0.6.45: superagent, csp, brightdata integration, gemini response for…
Sg312 Apr 15, 2026
c0bc62c
Merge pull request #4190 from simstudioai/staging
icecrasher321 Apr 16, 2026
387cc97
v0.6.46: mothership queueing, web vitals
waleedlatif1 Apr 16, 2026
2dbc7fd
v0.6.47: files focusing, documentation, opus 4.7
waleedlatif1 Apr 16, 2026
8a50f18
v0.6.48: import csv into tables, subflow fixes, CSP updates
waleedlatif1 Apr 16, 2026
dcf3302
v0.6.49: deploy sockets event, resolver, logs improvements, monday.co…
waleedlatif1 Apr 17, 2026
bc09865
v0.6.50: ppt/doc/pdf worker isolation, docs, chat, sidebar improvements
icecrasher321 Apr 18, 2026
5f56e46
v0.6.51: tables improvements, billing fixes, 404 pages, code hygiene
waleedlatif1 Apr 20, 2026
ca3bbf1
v0.6.52: data retention, docs updates, slack manifest generator, secu…
waleedlatif1 Apr 22, 2026
bbf400f
v0.6.53: permissions groups migration, docs updates
waleedlatif1 Apr 22, 2026
7c619e7
Merge pull request #4261 from simstudioai/staging
icecrasher321 Apr 22, 2026
64cfda5
v0.6.54: mothership tracing, db pool size increase
icecrasher321 Apr 22, 2026
7ca736a
v0.6.55: standardize monorepo conventions, api key hash, thinking tex…
waleedlatif1 Apr 23, 2026
6066fc1
v0.6.56: data retention improvements, tables column double click resi…
waleedlatif1 Apr 24, 2026
3422f64
Merge pull request #4285 from simstudioai/staging
waleedlatif1 Apr 24, 2026
595c4c3
Merge pull request #4293 from simstudioai/staging
TheodoreSpeaks Apr 24, 2026
d6c1bc2
v0.6.58: queue abort state machine improvement, contributing guide
icecrasher321 Apr 25, 2026
58a3ae2
v0.6.59: gpt 5.5, security hardening, parallel subagents rendering
icecrasher321 Apr 27, 2026
489f2d3
v0.6.60: copilot security improvements, slack canvas ops, retention j…
icecrasher321 Apr 27, 2026
6aa3fe3
v0.6.61: SAP integration, live URLs for browser use, 5xx error catego…
icecrasher321 Apr 29, 2026
ecbf5e5
Merge pull request #4342 from simstudioai/staging
TheodoreSpeaks Apr 29, 2026
2aaf2b7
v0.6.62: firecrawl parse, new gmail tools, trace improvements, tool f…
waleedlatif1 May 2, 2026
d445b9c
v0.6.63: knowledgebase UI, folder search in mothership
waleedlatif1 May 2, 2026
4bc6a17
v0.6.64: table limits env vars, workspace files improvements, integra…
waleedlatif1 May 3, 2026
5be12f8
v0.6.65: memory fix, image uploads in files
waleedlatif1 May 3, 2026
4253e57
v0.6.66: child trace spans, reranker controls, attachment previews, l…
waleedlatif1 May 5, 2026
8d6b615
v0.6.67: VFS upload fix, posthog/copilot correlation, exa date filter…
TheodoreSpeaks May 5, 2026
efcd51a
v0.6.68: atlassian service accounts, 30 day wait block, markdown rend…
waleedlatif1 May 6, 2026
8d934f3
v0.6.69: security hardening, nextjs upgrade, SAP Concur, Emailbison i…
waleedlatif1 May 7, 2026
5ea80a8
v0.6.70: legacy workflow sanitization
icecrasher321 May 7, 2026
3cc581e
v0.6.71: build error fix
icecrasher321 May 7, 2026
273e608
Merge pull request #4496 from simstudioai/staging
TheodoreSpeaks May 7, 2026
07b8f1b
v0.6.72: tables improvements, search and replace, logs with files, im…
waleedlatif1 May 9, 2026
dcaf3e9
v0.6.73: zustand v5 migration fix
icecrasher321 May 9, 2026
6aeb981
v0.6.74: security hardening, workers recycling, next-mdx-remote and o…
waleedlatif1 May 12, 2026
3e9849b
v0.6.75: scheduler claim-budget drain, helm chart hardening, mothersh…
TheodoreSpeaks May 12, 2026
64d855a
v0.6.76: helm updates, media centering, lazy loading, security hardening
waleedlatif1 May 13, 2026
0ad209f
improvement(db): add session statement/lock timeouts; simplify KB doc tx
TheodoreSpeaks May 14, 2026
512e887
fix(knowledge): close soft-delete TOCTOU on KB document insert
TheodoreSpeaks May 14, 2026
313 changes: 213 additions & 100 deletions apps/sim/lib/knowledge/documents/service.ts
@@ -748,6 +748,126 @@ async function processDocumentsWithTrigger(
}
}

interface NewDocumentRow {
id: string
knowledgeBaseId: string
filename: string
fileUrl: string
fileSize: number
mimeType: string
chunkCount: number
tokenCount: number
characterCount: number
processingStatus: 'pending'
enabled: boolean
uploadedAt: Date
tag1: string | null
tag2: string | null
tag3: string | null
tag4: string | null
tag5: string | null
tag6: string | null
tag7: string | null
number1: number | null
number2: number | null
number3: number | null
number4: number | null
number5: number | null
date1: Date | null
date2: Date | null
boolean1: boolean | null
boolean2: boolean | null
boolean3: boolean | null
}

/**
* Insert N document rows IF the parent knowledge base is still alive
* (`deleted_at IS NULL`) at the statement's MVCC snapshot. Returns the
* number of rows actually inserted.
*
* Knowledge bases are soft-deleted, so a normal FK can't catch a concurrent
* delete — the KB row physically remains. We do the existence check and the
* insert in a single statement via INSERT...SELECT...WHERE EXISTS, which
* Postgres evaluates atomically. No transaction or row lock required, no
* race window between check and insert.
*
* Returns 0 if the KB was soft-deleted; caller throws.
*/
async function insertDocumentsIfKbAlive(
rows: NewDocumentRow[],
knowledgeBaseId: string
): Promise<number> {
if (rows.length === 0) return 0

// jsonb_to_recordset declares the column types once, so we don't need to
// cast every parameter individually to keep Postgres' type inference happy
// when nullable columns end up all-NULL across the batch.
const jsonRows = rows.map((d) => ({
id: d.id,
knowledge_base_id: d.knowledgeBaseId,
filename: d.filename,
file_url: d.fileUrl,
file_size: d.fileSize,
mime_type: d.mimeType,
chunk_count: d.chunkCount,
token_count: d.tokenCount,
character_count: d.characterCount,
processing_status: d.processingStatus,
enabled: d.enabled,
uploaded_at: d.uploadedAt.toISOString(),
tag1: d.tag1,
tag2: d.tag2,
tag3: d.tag3,
tag4: d.tag4,
tag5: d.tag5,
tag6: d.tag6,
tag7: d.tag7,
number1: d.number1,
number2: d.number2,
number3: d.number3,
number4: d.number4,
number5: d.number5,
date1: d.date1?.toISOString() ?? null,
date2: d.date2?.toISOString() ?? null,
boolean1: d.boolean1,
boolean2: d.boolean2,
boolean3: d.boolean3,
}))

const result = await db.execute(sql`
INSERT INTO document (
id, knowledge_base_id, filename, file_url, file_size, mime_type,
chunk_count, token_count, character_count, processing_status, enabled, uploaded_at,
tag1, tag2, tag3, tag4, tag5, tag6, tag7,
number1, number2, number3, number4, number5,
date1, date2,
boolean1, boolean2, boolean3
)
SELECT
id, knowledge_base_id, filename, file_url, file_size, mime_type,
chunk_count, token_count, character_count, processing_status, enabled, uploaded_at,
tag1, tag2, tag3, tag4, tag5, tag6, tag7,
number1, number2, number3, number4, number5,
date1, date2,
boolean1, boolean2, boolean3
FROM jsonb_to_recordset(${JSON.stringify(jsonRows)}::jsonb) AS x(
id text, knowledge_base_id text, filename text, file_url text, file_size integer, mime_type text,
chunk_count integer, token_count integer, character_count integer, processing_status text, enabled boolean, uploaded_at timestamp,
tag1 text, tag2 text, tag3 text, tag4 text, tag5 text, tag6 text, tag7 text,
number1 double precision, number2 double precision, number3 double precision, number4 double precision, number5 double precision,
date1 timestamp, date2 timestamp,
boolean1 boolean, boolean2 boolean, boolean3 boolean
)
WHERE EXISTS (
SELECT 1 FROM knowledge_base
WHERE id = ${knowledgeBaseId} AND deleted_at IS NULL
)
RETURNING id
`)

return Array.from(result).length
}
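The race this single-statement insert closes can be made concrete with a toy model. The sketch below is illustrative only (the store, names, and types are hypothetical, not the real schema): it contrasts a two-step check-then-insert, where a concurrent soft-delete can land between the steps, with an indivisible check-and-insert like the `INSERT...SELECT...WHERE EXISTS` statement, which Postgres evaluates against a single snapshot.

```typescript
// Toy model of the soft-delete TOCTOU that insertDocumentsIfKbAlive closes.
// The store and names here are illustrative, not the real schema.
interface Kb {
  id: string
  deletedAt: Date | null
}

class ToyStore {
  kbs = new Map<string, Kb>()
  docs: string[] = []

  softDelete(kbId: string): void {
    const kb = this.kbs.get(kbId)
    if (kb) kb.deletedAt = new Date()
  }

  // Racy: the liveness check and the insert are two separate steps.
  racyInsert(kbId: string, docId: string, interleaved: () => void): number {
    const alive = this.kbs.get(kbId)?.deletedAt === null // step 1: check
    interleaved() // a concurrent soft-delete can land here
    if (!alive) return 0
    this.docs.push(docId) // step 2: insert against a now-stale check
    return 1
  }

  // Atomic: check and insert are one indivisible step, mirroring how Postgres
  // evaluates INSERT...SELECT...WHERE EXISTS within a single statement.
  atomicInsert(kbId: string, docId: string): number {
    if (this.kbs.get(kbId)?.deletedAt !== null) return 0
    this.docs.push(docId)
    return 1
  }
}

const store = new ToyStore()
store.kbs.set('kb-1', { id: 'kb-1', deletedAt: null })

// Racy path: the row is inserted even though the KB was deleted mid-flight.
const racy = store.racyInsert('kb-1', 'doc-1', () => store.softDelete('kb-1'))

// Atomic path: the same (already applied) delete makes the insert a no-op.
const atomic = store.atomicInsert('kb-1', 'doc-2')

console.log(racy, atomic) // 1 (orphan row slipped through), 0 (refused)
```

The caller's contract matches the real function: a return of 0 means the KB is gone and the caller should throw, rather than silently orphaning document rows under a soft-deleted parent.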

export async function createDocumentRecords(
documents: Array<{
filename: string
@@ -766,99 +886,102 @@ export async function createDocumentRecords(
knowledgeBaseId: string,
requestId: string
): Promise<DocumentData[]> {
return await db.transaction(async (tx) => {
await tx.execute(sql`SELECT 1 FROM knowledge_base WHERE id = ${knowledgeBaseId} FOR UPDATE`)

const kb = await tx
.select({ id: knowledgeBase.id })
.from(knowledgeBase)
.where(and(eq(knowledgeBase.id, knowledgeBaseId), isNull(knowledgeBase.deletedAt)))
.limit(1)
// Cheap upfront existence check so the common KB-not-found path fails fast
// before we burn CPU on tag processing. The atomic insert below is the
// race-safe guard against a concurrent KB soft-delete in the small window
// between this check and the insert.
const kb = await db
.select({ id: knowledgeBase.id })
.from(knowledgeBase)
.where(and(eq(knowledgeBase.id, knowledgeBaseId), isNull(knowledgeBase.deletedAt)))
.limit(1)

if (kb.length === 0) {
throw new Error('Knowledge base not found')
}
if (kb.length === 0) {
throw new Error('Knowledge base not found')
}

const now = new Date()
const documentRecords = []
const returnData: DocumentData[] = []
const now = new Date()
const documentRecords: NewDocumentRow[] = []
const returnData: DocumentData[] = []

for (const docData of documents) {
const documentId = generateId()
for (const docData of documents) {
const documentId = generateId()

let processedTags: Partial<ProcessedDocumentTags> = {}
let processedTags: Partial<ProcessedDocumentTags> = {}

if (docData.documentTagsData) {
try {
const tagData = JSON.parse(docData.documentTagsData)
if (Array.isArray(tagData)) {
processedTags = await processDocumentTags(knowledgeBaseId, tagData, requestId)
}
} catch (error) {
if (error instanceof SyntaxError) {
logger.warn(`[${requestId}] Failed to parse documentTagsData for bulk document:`, error)
} else {
throw error
}
if (docData.documentTagsData) {
try {
const tagData = JSON.parse(docData.documentTagsData)
if (Array.isArray(tagData)) {
processedTags = await processDocumentTags(knowledgeBaseId, tagData, requestId)
}
} catch (error) {
if (error instanceof SyntaxError) {
logger.warn(`[${requestId}] Failed to parse documentTagsData for bulk document:`, error)
} else {
throw error
}
}
}

const newDocument = {
id: documentId,
knowledgeBaseId,
filename: docData.filename,
fileUrl: docData.fileUrl,
fileSize: docData.fileSize,
mimeType: docData.mimeType,
chunkCount: 0,
tokenCount: 0,
characterCount: 0,
processingStatus: 'pending' as const,
enabled: true,
uploadedAt: now,
tag1: processedTags.tag1 ?? docData.tag1 ?? null,
tag2: processedTags.tag2 ?? docData.tag2 ?? null,
tag3: processedTags.tag3 ?? docData.tag3 ?? null,
tag4: processedTags.tag4 ?? docData.tag4 ?? null,
tag5: processedTags.tag5 ?? docData.tag5 ?? null,
tag6: processedTags.tag6 ?? docData.tag6 ?? null,
tag7: processedTags.tag7 ?? docData.tag7 ?? null,
number1: processedTags.number1 ?? null,
number2: processedTags.number2 ?? null,
number3: processedTags.number3 ?? null,
number4: processedTags.number4 ?? null,
number5: processedTags.number5 ?? null,
date1: processedTags.date1 ?? null,
date2: processedTags.date2 ?? null,
boolean1: processedTags.boolean1 ?? null,
boolean2: processedTags.boolean2 ?? null,
boolean3: processedTags.boolean3 ?? null,
}

documentRecords.push(newDocument)
returnData.push({
documentId,
filename: docData.filename,
fileUrl: docData.fileUrl,
fileSize: docData.fileSize,
mimeType: docData.mimeType,
})
const newDocument = {
id: documentId,
knowledgeBaseId,
filename: docData.filename,
fileUrl: docData.fileUrl,
fileSize: docData.fileSize,
mimeType: docData.mimeType,
chunkCount: 0,
tokenCount: 0,
characterCount: 0,
processingStatus: 'pending' as const,
enabled: true,
uploadedAt: now,
tag1: processedTags.tag1 ?? docData.tag1 ?? null,
tag2: processedTags.tag2 ?? docData.tag2 ?? null,
tag3: processedTags.tag3 ?? docData.tag3 ?? null,
tag4: processedTags.tag4 ?? docData.tag4 ?? null,
tag5: processedTags.tag5 ?? docData.tag5 ?? null,
tag6: processedTags.tag6 ?? docData.tag6 ?? null,
tag7: processedTags.tag7 ?? docData.tag7 ?? null,
number1: processedTags.number1 ?? null,
number2: processedTags.number2 ?? null,
number3: processedTags.number3 ?? null,
number4: processedTags.number4 ?? null,
number5: processedTags.number5 ?? null,
date1: processedTags.date1 ?? null,
date2: processedTags.date2 ?? null,
boolean1: processedTags.boolean1 ?? null,
boolean2: processedTags.boolean2 ?? null,
boolean3: processedTags.boolean3 ?? null,
}

if (documentRecords.length > 0) {
await tx.insert(document).values(documentRecords)
logger.info(
`[${requestId}] Bulk created ${documentRecords.length} document records in knowledge base ${knowledgeBaseId}`
)
documentRecords.push(newDocument)
returnData.push({
documentId,
filename: docData.filename,
fileUrl: docData.fileUrl,
fileSize: docData.fileSize,
mimeType: docData.mimeType,
})
}

await tx
.update(knowledgeBase)
.set({ updatedAt: now })
.where(eq(knowledgeBase.id, knowledgeBaseId))
if (documentRecords.length > 0) {
const insertedCount = await insertDocumentsIfKbAlive(documentRecords, knowledgeBaseId)
if (insertedCount === 0) {
throw new Error('Knowledge base not found')
}
logger.info(
`[${requestId}] Bulk created ${insertedCount} document records in knowledge base ${knowledgeBaseId}`
)

return returnData
})
await db
.update(knowledgeBase)
.set({ updatedAt: now })
.where(eq(knowledgeBase.id, knowledgeBaseId))
}

return returnData
}
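Each row built above resolves its tag columns through a small precedence chain: a typed value from `processDocumentTags` wins, the raw value passed by the caller is the fallback, and NULL is the default. A minimal standalone sketch of that merge (the function and key names are hypothetical, reduced to three tags for brevity):

```typescript
// Hypothetical standalone version of the `processedTags.tagN ?? docData.tagN
// ?? null` chain used when building each document row above.
type TagKey = 'tag1' | 'tag2' | 'tag3'
const TAG_KEYS: TagKey[] = ['tag1', 'tag2', 'tag3']

function mergeTags(
  processed: Partial<Record<TagKey, string>>,
  raw: Partial<Record<TagKey, string>>
): Record<TagKey, string | null> {
  const out = {} as Record<TagKey, string | null>
  for (const key of TAG_KEYS) {
    // ?? (not ||) so an empty-string tag still wins over the fallback.
    out[key] = processed[key] ?? raw[key] ?? null
  }
  return out
}

const merged = mergeTags({ tag1: 'typed' }, { tag1: 'raw', tag2: 'raw-2' })
console.log(merged) // { tag1: 'typed', tag2: 'raw-2', tag3: null }
```

Using nullish coalescing rather than `||` is the relevant design choice: only `null`/`undefined` fall through, so legitimate falsy values survive the merge.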

export interface TagFilterCondition {
@@ -1297,7 +1420,7 @@ export async function createSingleDocument(
}
}

const newDocument = {
const newDocument: NewDocumentRow = {
id: documentId,
knowledgeBaseId,
filename: documentData.filename,
@@ -1307,31 +1430,21 @@
chunkCount: 0,
tokenCount: 0,
characterCount: 0,
processingStatus: 'pending',
enabled: true,
uploadedAt: now,
...processedTags,
}

await db.transaction(async (tx) => {
await tx.execute(sql`SELECT 1 FROM knowledge_base WHERE id = ${knowledgeBaseId} FOR UPDATE`)

const kb = await tx
.select({ id: knowledgeBase.id })
.from(knowledgeBase)
.where(and(eq(knowledgeBase.id, knowledgeBaseId), isNull(knowledgeBase.deletedAt)))
.limit(1)

if (kb.length === 0) {
throw new Error('Knowledge base not found')
}

await tx.insert(document).values(newDocument)
const insertedCount = await insertDocumentsIfKbAlive([newDocument], knowledgeBaseId)
if (insertedCount === 0) {
throw new Error('Knowledge base not found')
}

await tx
.update(knowledgeBase)
.set({ updatedAt: now })
.where(eq(knowledgeBase.id, knowledgeBaseId))
})
await db
.update(knowledgeBase)
.set({ updatedAt: now })
.where(eq(knowledgeBase.id, knowledgeBaseId))
logger.info(`[${requestId}] Document created: ${documentId} in knowledge base ${knowledgeBaseId}`)

return newDocument as {
1 change: 1 addition & 0 deletions apps/sim/lib/workspaces/lifecycle.test.ts
@@ -55,6 +55,7 @@ describe('workspace lifecycle', () => {
})

const tx = {
execute: vi.fn().mockResolvedValue([]),
select: vi.fn().mockReturnValue({
from: vi.fn().mockReturnValue({
where: vi.fn().mockResolvedValue([{ id: 'kb-1' }]),
7 changes: 7 additions & 0 deletions apps/sim/lib/workspaces/lifecycle.ts
@@ -49,6 +49,13 @@ export async function archiveWorkspace(
.where(eq(workflowMcpServer.workspaceId, workspaceId))

await db.transaction(async (tx) => {
// Workspace archival is a rare admin/cleanup operation that touches every
// child table; on large workspaces it can exceed the 30s session default.
// Override per-tx with a generous ceiling — if it ever runs longer than
// this something is genuinely wrong.
await tx.execute(sql`SET LOCAL statement_timeout = '5min'`)
await tx.execute(sql`SET LOCAL lock_timeout = '30s'`)

await tx
.update(knowledgeBase)
.set({
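The reason `SET LOCAL` is safe on a pooled connection is its scoping: the overridden value lives only until the transaction ends, then reverts to the session default. A toy model of that scoping (illustrative only, not the postgres.js or Drizzle API):

```typescript
// Toy model of SET LOCAL scoping: a value set inside a transaction is
// discarded at COMMIT/ROLLBACK, so the 5min ceiling above never leaks into
// later queries on the same pooled connection.
class ToySession {
  settings = new Map<string, string>([['statement_timeout', '30s']])

  transaction<T>(fn: (tx: ToySession) => T): T {
    const saved = new Map(this.settings) // snapshot at BEGIN
    try {
      return fn(this)
    } finally {
      this.settings = saved // SET LOCAL values dropped when the tx ends
    }
  }

  setLocal(name: string, value: string): void {
    this.settings.set(name, value)
  }
}

const session = new ToySession()
let insideTx = ''
session.transaction((tx) => {
  tx.setLocal('statement_timeout', '5min')
  insideTx = tx.settings.get('statement_timeout')!
})
const afterTx = session.settings.get('statement_timeout')!
console.log(insideTx, afterTx) // '5min' inside the transaction, '30s' after
```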
8 changes: 8 additions & 0 deletions packages/db/db.ts
@@ -13,6 +13,14 @@ const postgresClient = postgres(connectionString, {
connect_timeout: 30,
max: 30,
onnotice: () => {},
// Server-side guards. lock_timeout cancels a query waiting on a row lock for
// >5s (e.g. another tx holding `SELECT ... FOR UPDATE`). statement_timeout
// cancels any query running >30s. Heavy paths that legitimately need longer
// (table service bulk JSONB rewrites) override per-tx with `SET LOCAL`.
connection: {
lock_timeout: 5_000,
statement_timeout: 30_000,
},
})

export const db = drizzle(postgresClient, { schema })