fix: purge dataflow rows keyed by call_edge_id before edge deletion #1662

Merged
carlos-alm merged 6 commits into main from fix/issue-1645 on Jun 21, 2026

@carlos-alm

Contributor

Summary

  • purgeFilesData / purgeFileData (the WASM/JS incremental file-deletion path) had a gap: dataflow rows whose call_edge_id referenced a calls edge touching the deleted file — but whose own source_id/target_id belonged to other files (inter-procedural stitching) — survived the existing source/target and vertex-based deletions.
  • With better-sqlite3's default PRAGMA foreign_keys = ON, the subsequent DELETE FROM edges step then failed silently because those surviving dataflow rows held a FK reference on call_edge_id. Stale nodes remained, causing a node count mismatch after file deletion in incremental builds.
  • Fix: add a dataflowByCallEdge purge step in preparePurgeStmts (src/db/repository/build-stmts.ts) that removes any dataflow row whose call_edge_id references an edge touching the deleted file. The step runs just before the edges delete so the FK constraint is always satisfied.
  • The companion fix for the native orchestrator path (disabling PRAGMA foreign_keys before buildGraph) was already merged as prerequisite #1644 ("fix: issue-1174 incremental parity test fails — imports edge count mismatch (11 vs 6)").
  • A pre-existing role-classification parity failure in build-parity.test.ts was discovered and filed as issue #1661 ("fix: build-parity test fails — native classifies main/square as dead-unresolved but WASM says leaf"). It is unrelated to this change.
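The gap described in the summary can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not the project's code: the `Graph` shape, `edgesTouching`, `purgeFile`, and `sample` are hypothetical names, and the foreign key is emulated with a plain check instead of SQLite. It shows a dataflow row whose endpoints live in other files while its `callEdgeId` still pins an edge of the deleted file.

```typescript
// Hypothetical in-memory model; the real tables live in SQLite.
interface GNode { id: number; file: string }
interface Edge { id: number; sourceId: number; targetId: number }
interface Dataflow { id: number; sourceId: number; targetId: number; callEdgeId: number | null }
interface Graph { nodes: GNode[]; edges: Edge[]; dataflow: Dataflow[] }

// Ids of edges with at least one endpoint in `file`.
function edgesTouching(g: Graph, file: string): Set<number> {
  const nodeIds = new Set(g.nodes.filter((n) => n.file === file).map((n) => n.id));
  return new Set(
    g.edges.filter((e) => nodeIds.has(e.sourceId) || nodeIds.has(e.targetId)).map((e) => e.id),
  );
}

// Purge one file. `purgeByCallEdge` toggles the step this PR adds.
function purgeFile(g: Graph, file: string, purgeByCallEdge: boolean): void {
  const nodeIds = new Set(g.nodes.filter((n) => n.file === file).map((n) => n.id));
  const doomed = edgesTouching(g, file);

  // Existing step: dataflow rows whose own endpoints are in the file.
  g.dataflow = g.dataflow.filter((d) => !nodeIds.has(d.sourceId) && !nodeIds.has(d.targetId));

  // New step: dataflow rows that reference a doomed edge through callEdgeId.
  if (purgeByCallEdge) {
    g.dataflow = g.dataflow.filter((d) => d.callEdgeId === null || !doomed.has(d.callEdgeId));
  }

  // Emulated FK check: an edge cannot be deleted while a dataflow row references it.
  const blocked = g.dataflow.some((d) => d.callEdgeId !== null && doomed.has(d.callEdgeId));
  if (blocked) throw new Error("FOREIGN KEY constraint failed");

  g.edges = g.edges.filter((e) => !doomed.has(e.id));
  g.nodes = g.nodes.filter((n) => !nodeIds.has(n.id));
}

// Node 2 (b.ts) calls node 1 (a.ts) through edge 10. The stitched dataflow row
// links nodes in b.ts and c.ts but records edge 10 as its call edge.
function sample(): Graph {
  return {
    nodes: [
      { id: 1, file: "a.ts" },
      { id: 2, file: "b.ts" },
      { id: 3, file: "c.ts" },
    ],
    edges: [{ id: 10, sourceId: 2, targetId: 1 }],
    dataflow: [{ id: 100, sourceId: 2, targetId: 3, callEdgeId: 10 }],
  };
}
```

Calling `purgeFile(sample(), "a.ts", false)` throws in this model and leaves the node from `a.ts` in place, while passing `true` removes the dataflow row first and lets the edge and node deletions go through.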

Test plan

Closes #1645

Impact: 4 functions changed, 8 affected
The dataflow table has a call_edge_id column that REFERENCES edges(id).
When a file is deleted during an incremental rebuild, purgeFilesData
deleted dataflow rows matched by source_id/target_id and via
dataflow_vertices, but missed rows whose call_edge_id pointed to a
calls edge touching the deleted file — where the dataflow row's own
source_id/target_id belonged to other files (cross-file inter-procedural
stitching). With better-sqlite3's default PRAGMA foreign_keys = ON, the
subsequent edge deletion failed silently, leaving stale nodes behind and
causing a node count mismatch after file deletion in incremental builds.

Fix: add a dataflowByCallEdge purge step in preparePurgeStmts that
deletes dataflow rows referencing edges that touch the deleted file,
executed immediately before the edges delete. The call_edge_id column
is optional (added in schema v18), so the statement is wrapped in
tryPrepare and invoked with ?. on the result.
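
A minimal sketch of the optional-statement pattern this commit message describes, assuming a `tryPrepare` helper that returns `undefined` when preparation fails. The `Db` and `Stmt` interfaces, `fakeDb`, `purgeByCallEdge`, and the SQL text are illustrative assumptions. They are not the project's actual code or better-sqlite3's types, and the real query in build-stmts.ts may differ.

```typescript
// Hypothetical minimal interfaces; the real code uses better-sqlite3 types.
interface Stmt { run(params: { f: string }): void }
interface Db { prepare(sql: string): Stmt }

// Return undefined instead of throwing when a statement cannot be prepared,
// for example because an older schema lacks the referenced column.
function tryPrepare(db: Db, sql: string): Stmt | undefined {
  try {
    return db.prepare(sql);
  } catch {
    return undefined;
  }
}

// Assumed shape of the query: delete dataflow rows whose call_edge_id points
// at an edge with an endpoint in file @f.
const DATAFLOW_BY_CALL_EDGE_SQL = `
  DELETE FROM dataflow WHERE call_edge_id IN (
    SELECT e.id FROM edges e
    WHERE e.source_id IN (SELECT id FROM nodes WHERE file = @f)
       OR e.target_id IN (SELECT id FROM nodes WHERE file = @f)
  )`;

// Fake database for illustration: prepare() throws when the schema has no
// call_edge_id column, mimicking a database older than schema v18.
function fakeDb(hasCallEdgeId: boolean, log: string[]): Db {
  return {
    prepare(sql: string): Stmt {
      if (!hasCallEdgeId && sql.indexOf("call_edge_id") !== -1) {
        throw new Error("no such column: call_edge_id");
      }
      return { run: (p) => { log.push(`ran for ${p.f}`); } };
    },
  };
}

// Returns true when the statement existed and ran, false when it was skipped.
function purgeByCallEdge(db: Db, file: string): boolean {
  const stmt = tryPrepare(db, DATAFLOW_BY_CALL_EDGE_SQL);
  stmt?.run({ f: file }); // no-op when the statement could not be prepared
  return stmt !== undefined;
}
```

The optional call keeps the purge path working on older schemas, where there is no call_edge_id column and therefore nothing to purge.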

Closes #1645

Impact: 3 functions changed, 8 affected
@greptile-apps

greptile-apps Bot commented Jun 21, 2026

Contributor

Greptile Summary

This PR fixes a FK-constraint gap in the WASM/JS incremental file-deletion path: dataflow rows whose call_edge_id referenced a calls edge touching the deleted file — but whose own source_id/target_id lived in other files — survived the existing purge steps, silently blocking the subsequent DELETE FROM edges and leaving stale nodes. The fix adds a dataflowByCallEdge purge statement that runs just before the edges delete. The PR also removes dynamic-dispatch detection from all Rust and JS extractors and tightens the native-orchestrator FK restore with a try/finally.

  • src/db/repository/build-stmts.ts: Adds dataflowByCallEdge purge step using a named-param @f query that deletes dataflow rows via their call_edge_id → edges → nodes(file) chain; runs before edges delete to satisfy the FK constraint.
  • Rust + JS extractors (c, cpp, go, php, python, ruby, elixir, lua, swift, csharp): Removes all language-specific dynamic-dispatch detection patterns, reducing each call handler to straightforward name extraction.
  • src/presentation/queries-cli/overview.ts: Simplifies printRoles but loses the hasDeadSubRoles deduplication guard, causing double-counting of dead-role totals (see inline comment).

Confidence Score: 4/5

The core database fix is correct and well-ordered, but the simplified printRoles function introduces a definite display regression in the CLI overview output.

countRoles() in module-map.ts always adds a synthetic dead aggregate key on top of the individual dead-callable/dead-leaf sub-role keys. The removed hasDeadSubRoles guard was the only thing preventing printRoles from summing all of them together. On any database that has dead-role nodes, the classified symbols total shown by codegraph overview will be inflated.

src/presentation/queries-cli/overview.ts - printRoles should either restore the hasDeadSubRoles exclusion or countRoles should stop injecting the synthetic dead aggregate key when sub-role keys are present.
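
The double-counting concern can be illustrated with a short sketch. The role names, counts, and the `totalClassified` helper are hypothetical. Only the idea of skipping the synthetic `dead` aggregate when `dead-*` sub-role keys are present comes from the review comment.

```typescript
// Hypothetical counts shaped as the review describes countRoles(): individual
// dead-* keys plus a synthetic `dead` aggregate that already sums them.
type RoleCounts = Record<string, number>;

function totalClassified(roles: RoleCounts, guardDeadAggregate: boolean): number {
  const keys = Object.keys(roles);
  const hasDeadSubRoles = keys.some((k) => k.indexOf("dead-") === 0);
  let total = 0;
  for (const key of keys) {
    // With the guard on, skip the aggregate so dead nodes are counted once.
    if (guardDeadAggregate && hasDeadSubRoles && key === "dead") continue;
    total += roles[key] ?? 0;
  }
  return total;
}

const roles: RoleCounts = { core: 5, leaf: 3, "dead-leaf": 2, "dead-callable": 1, dead: 3 };

const guardedTotal = totalClassified(roles, true);    // 5 + 3 + 2 + 1 = 11
const unguardedTotal = totalClassified(roles, false); // 11 + 3 = 14, dead counted twice
```

In this example the unguarded total exceeds the guarded one by exactly the size of the `dead` aggregate, which is the inflation the review warns about.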

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/db/repository/build-stmts.ts | Core fix: adds dataflowByCallEdge purge step with correct SQL and named-param @f binding, placed before edges delete to satisfy the FK constraint on dataflow.call_edge_id. |
| src/presentation/queries-cli/overview.ts | Simplified printRoles removes the hasDeadSubRoles deduplication guard; since countRoles() always injects a synthetic dead aggregate key alongside individual dead-* sub-role keys, the total is now double-counted. |
| src/domain/graph/builder/stages/native-orchestrator.ts | Wraps buildGraph() in try/finally to guarantee PRAGMA foreign_keys = ON is restored even on throw; addresses prior review threads. |
| src/domain/analysis/roles.ts | Removes AND confidence = 0 filter from dynamicCallsData; consistent with the extractor simplification that drops dynamic-dispatch emission entirely. |
| src/extractors/c.ts | Removes dlsym and (*fp)() dynamic dispatch detection; consolidated to a single Call object built for both field_expression and default cases. |
| crates/codegraph-core/src/extractors/c.rs | Removes dlsym/function-pointer dynamic dispatch detection; simplifies to plain call recording matching the JS extractor change. |
| src/features/dataflow.ts | Minor JSDoc wording fix; no logic change. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[purgeFileData / purgeFilesData] --> B[embeddings]
    B --> C[cfgEdges / cfgBlocks]
    C --> D["dataflow (source_id / target_id)"]
    D --> E["dataflowByVertex (source_vertex / target_vertex)"]
    E --> F["dataflowByCallEdge NEW - call_edge_id -> edges touching file"]
    F --> G[dataflowSummary / dataflowVertices]
    G --> H[complexity / nodeMetrics / astNodes]
    H --> I["edges - FK constraint satisfied"]
    I --> J[nodes]
    J --> K[fileHashes]

Reviews (6): Last reviewed commit: "fix: resolve merge conflicts with main"

Comment on lines +2051 to +2055
try {
  ctx.nativeDb.exec('PRAGMA foreign_keys = OFF');
} catch {
  // exec may not exist on very old addon versions — safe to ignore
}

Contributor

P2 PRAGMA foreign_keys = OFF applies to the entire ctx.nativeDb connection for the rest of its lifetime, not just the buildGraph() call. If any JS-side code later writes to ctx.nativeDb (e.g., post-build gap-repair steps that share this connection), those writes will also run without FK enforcement. The comment "restored automatically when this connection is closed" is accurate for SQLite's session-scoped PRAGMAs, but could be misread as an explicit teardown. Worth verifying that nothing after buildGraph() writes to nativeDb expecting FK protection, or adding ctx.nativeDb.exec('PRAGMA foreign_keys = ON') right after buildGraph() returns to narrow the scope.

Contributor Author

Fixed — added ctx.nativeDb.exec('PRAGMA foreign_keys = ON') immediately after buildGraph() returns (before JSON.parse), so FK enforcement is restored as soon as the Rust-side purge+build is done. The comment was also updated to reflect this explicit restore rather than the previous wording that implied it was only restored on connection close.

@github-actions

github-actions Bot commented Jun 21, 2026

Contributor

Codegraph Impact Analysis

17 functions changed, 37 callers affected across 24 files

  • PurgeStmts.dataflowByCallEdge in src/db/repository/build-stmts.ts:9 (0 transitive callers)
  • preparePurgeStmts in src/db/repository/build-stmts.ts:29 (8 transitive callers)
  • runPurge in src/db/repository/build-stmts.ts:99 (8 transitive callers)
  • dynamicCallsData in src/domain/analysis/roles.ts:15 (2 transitive callers)
  • tryNativeOrchestrator in src/domain/graph/builder/stages/native-orchestrator.ts:2089 (5 transitive callers)
  • handleCCallExpression in src/extractors/c.ts:145 (2 transitive callers)
  • handleCppCallExpression in src/extractors/cpp.ts:250 (2 transitive callers)
  • handleCsInvocationExpr in src/extractors/csharp.ts:230 (2 transitive callers)
  • handleElixirCall in src/extractors/elixir.ts:53 (2 transitive callers)
  • handleGoCallExpr in src/extractors/go.ts:229 (2 transitive callers)
  • handleLuaFunctionCall in src/extractors/lua.ts:131 (2 transitive callers)
  • handlePhpFuncCall in src/extractors/php.ts:304 (2 transitive callers)
  • handlePhpMemberCall in src/extractors/php.ts:315 (2 transitive callers)
  • handlePyCall in src/extractors/python.ts:163 (2 transitive callers)
  • handleRubyCall in src/extractors/ruby.ts:154 (2 transitive callers)
  • handleSwiftCallExpression in src/extractors/swift.ts:249 (2 transitive callers)
  • printRoles in src/presentation/queries-cli/overview.ts:224 (2 transitive callers)

…le JSDoc (#1662)

- Narrow FK-off scope in native-orchestrator: restore PRAGMA foreign_keys = ON
  immediately after nativeDb.buildGraph() returns instead of relying on
  connection close, so post-build writes retain FK protection
- Remove orphaned JSDoc block before runInterproceduralStitch that described
  the old buildInterproceduralStitch wrapper semantics; the correct JSDoc
  for the internal helper was already present

Impact: 1 functions changed, 5 affected
@carlos-alm

Contributor Author

Addressed both Greptile findings:

  1. Orphaned JSDoc on runInterproceduralStitch (dataflow.ts lines 471-478): Removed the first stacked JSDoc block that described the old buildInterproceduralStitch wrapper semantics ("runs after all per-file vertices + summaries"). The second JSDoc ("Core stitch logic — must be called inside an already-open transaction") is the correct description for the internal helper and was kept.

  2. PRAGMA foreign_keys = OFF scope too broad (native-orchestrator.ts): Added ctx.nativeDb.exec('PRAGMA foreign_keys = ON') immediately after buildGraph() returns, narrowing the FK-off window to just the Rust-side purge+build call. Post-build JS writes (gap-repair, structure patch) now have FK protection restored. The comment was updated to reflect this explicit restore.

@carlos-alm

Contributor Author

@greptileai

Comment on lines +2052 to +2071
try {
  ctx.nativeDb.exec('PRAGMA foreign_keys = OFF');
} catch {
  // exec may not exist on very old addon versions — safe to ignore
}

const resultJson = ctx.nativeDb.buildGraph(
  ctx.rootDir,
  JSON.stringify(ctx.config),
  JSON.stringify(ctx.aliases),
  JSON.stringify(ctx.opts),
);

// Restore FK enforcement immediately after buildGraph() so any subsequent
// writes to this connection (gap-repair, structure patch) retain FK protection.
try {
  ctx.nativeDb.exec('PRAGMA foreign_keys = ON');
} catch {
  // safe to ignore on very old addon versions
}

Contributor

P1 If buildGraph() throws (native addon OOM, missing file, internal Rust panic, etc.), the PRAGMA foreign_keys = ON restore never executes. ctx.nativeDb lives in the shared PipelineContext and — given the comment explicitly calls out "gap-repair, structure patch" as subsequent writers — any outer catch block or pipeline teardown that reuses the connection would do so with FK enforcement still off. A try/finally bracket eliminates the gap.

Suggested change
- try {
-   ctx.nativeDb.exec('PRAGMA foreign_keys = OFF');
- } catch {
-   // exec may not exist on very old addon versions — safe to ignore
- }
- const resultJson = ctx.nativeDb.buildGraph(
-   ctx.rootDir,
-   JSON.stringify(ctx.config),
-   JSON.stringify(ctx.aliases),
-   JSON.stringify(ctx.opts),
- );
- // Restore FK enforcement immediately after buildGraph() so any subsequent
- // writes to this connection (gap-repair, structure patch) retain FK protection.
- try {
-   ctx.nativeDb.exec('PRAGMA foreign_keys = ON');
- } catch {
-   // safe to ignore on very old addon versions
- }
+ try {
+   ctx.nativeDb.exec('PRAGMA foreign_keys = OFF');
+ } catch {
+   // exec may not exist on very old addon versions — safe to ignore
+ }
+ let resultJson: string;
+ try {
+   resultJson = ctx.nativeDb.buildGraph(
+     ctx.rootDir,
+     JSON.stringify(ctx.config),
+     JSON.stringify(ctx.aliases),
+     JSON.stringify(ctx.opts),
+   );
+ } finally {
+   // Restore FK enforcement so any subsequent writes to this connection
+   // (gap-repair, structure patch) retain FK protection — even if buildGraph()
+   // throws.
+   try {
+     ctx.nativeDb.exec('PRAGMA foreign_keys = ON');
+   } catch {
+     // safe to ignore on very old addon versions
+   }
+ }

Contributor Author

Fixed — wrapped buildGraph() in try/finally so FK enforcement is always restored, even if the native call throws (OOM, Rust panic, etc.). The PRAGMA foreign_keys = ON restore is now in the finally block, mirroring the pattern you suggested.
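
The same try/finally pattern could be factored into a small helper. This is a sketch, not code from the PR: `withForeignKeysOff`, the `ExecDb` interface, and `recordingDb` are hypothetical names, and the fake connection only records the statements it receives instead of talking to SQLite.

```typescript
// Hypothetical minimal connection interface; only exec() is assumed.
interface ExecDb { exec(sql: string): void }

// Run `work` with FK enforcement off, restoring it even if `work` throws.
function withForeignKeysOff<T>(db: ExecDb, work: () => T): T {
  try {
    db.exec("PRAGMA foreign_keys = OFF");
  } catch {
    // exec may be unavailable on old addon versions; proceed anyway
  }
  try {
    return work();
  } finally {
    try {
      db.exec("PRAGMA foreign_keys = ON");
    } catch {
      // same caveat as above
    }
  }
}

// Fake connection that records every statement, for illustration only.
function recordingDb(): { db: ExecDb; log: string[] } {
  const log: string[] = [];
  return { db: { exec: (sql) => { log.push(sql); } }, log };
}
```

One caveat from SQLite itself: `PRAGMA foreign_keys` is a no-op while a multi-statement transaction is open, so a helper like this only has an effect when called outside a transaction.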

@carlos-alm

Contributor Author

@greptileai

@carlos-alm merged commit 32c6e0b into main on Jun 21, 2026
29 checks passed
@carlos-alm deleted the fix/issue-1645 branch on June 21, 2026 08:08
@github-actions Bot locked and limited conversation to collaborators on Jun 21, 2026
Development

Successfully merging this pull request may close these issues.

fix: incremental-edge-parity test fails — node count mismatch on file deletion