feat(dynamic-calls): phase 5 — Go MethodByName + C/C++ function pointer/dlsym#1655
Conversation
…on pointer detection
Go (go.ts + go.rs):
- v.MethodByName("name") — extracts string literal as reflection kind, resolves
to top-level function (100% recall for literal names)
- v.MethodByName(variable) — computed-key kind, sink edge
C/C++ (c.ts + cpp.ts + c.rs):
- (*fp)(args) function pointer dereference — unresolved-dynamic sink edge
- dlsym(handle, "symbol") with literal — reflection kind, resolves if same unit
- dlsym(handle, variable) — unresolved-dynamic sink edge
Fixtures:
- dynamic-go: MethodByName("Greet") → 100% P/R; variable → 1 flagged
- dynamic-c: dlsym("greet") → 100% P/R; (*fp)() → 1 flagged
FFI tests: 30 total (added 4 for Go/C MethodByName/dlsym/function-pointer)
docs check acknowledged
Impact: 16 functions changed, 10 affected
Greptile SummaryPhase 5 extends dynamic dispatch detection to Go (
Confidence Score: 5/5Safe to merge; new extraction logic is well-tested, benchmarks at 100% P/R, and the two minor observations are edge-case inconsistencies that do not affect real-world Go or C/C++ codebases. The changes follow the established Phase 2–4 pattern exactly. Both TypeScript and Rust extractors are updated in lockstep, 30/30 FFI tests pass, and two new benchmark fixtures confirm end-to-end resolution. The only findings are a defensive-coding inconsistency in get_go_arg and a degenerate-input divergence for MethodByName with an empty string, neither of which is reachable with valid Go or C source. crates/codegraph-core/src/extractors/go.rs — minor inconsistency in the new get_go_arg helper worth a one-line fix before the next phase lands on top of it. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[call_expression node] --> B{fn_node kind}
B -->|identifier| C{dlsym or dlvsym?}
C -->|yes| D{second arg is string literal?}
D -->|yes non-empty| E[emit reflection - name = literal value]
D -->|no or empty| F[emit unresolved-dynamic]
C -->|no| G[emit static call - name = fn_name]
B -->|parenthesized_expression or pointer_expression| H[emit unresolved-dynamic - fp pattern]
B -->|selector_expression Go only| I{field == MethodByName?}
I -->|yes| J{arg 0 is string literal?}
J -->|yes non-empty| K[emit reflection - name = stripped literal]
J -->|variable or empty| L[emit computed-key - keyExpr = arg text]
I -->|no| M[emit static method call]
B -->|field_expression C struct| N[emit static call with receiver]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[call_expression node] --> B{fn_node kind}
B -->|identifier| C{dlsym or dlvsym?}
C -->|yes| D{second arg is string literal?}
D -->|yes non-empty| E[emit reflection - name = literal value]
D -->|no or empty| F[emit unresolved-dynamic]
C -->|no| G[emit static call - name = fn_name]
B -->|parenthesized_expression or pointer_expression| H[emit unresolved-dynamic - fp pattern]
B -->|selector_expression Go only| I{field == MethodByName?}
I -->|yes| J{arg 0 is string literal?}
J -->|yes non-empty| K[emit reflection - name = stripped literal]
J -->|variable or empty| L[emit computed-key - keyExpr = arg text]
I -->|no| M[emit static method call]
B -->|field_expression C struct| N[emit static call with receiver]
Reviews (7): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile |
| /* Fixture: C dynamic dispatch patterns | ||
| * (*fp)(args) → flagged as unresolved-dynamic | ||
| * dlsym(handle, "symbol") → flagged as unresolved-dynamic (cross-lib) | ||
| */ | ||
| #include <stdio.h> | ||
| #include <dlfcn.h> | ||
|
|
There was a problem hiding this comment.
Fixture header comment contradicts actual behavior
The file header says dlsym(handle, "symbol") → flagged as unresolved-dynamic (cross-lib), but both src/extractors/c.ts and crates/codegraph-core/src/extractors/c.rs resolve a string-literal dlsym call as dynamicKind: 'reflection'. The expected-edges.json correctly documents this as resolving to greet(), so the comment is the odd one out and will mislead future readers.
There was a problem hiding this comment.
Fixed — corrected the header comment in dispatch.c. The comment now accurately reflects that dlsym with a string literal resolves as reflection (matching the expected-edges.json), not as unresolved-dynamic.
| // For simplicity, flag as unresolved (symbol might not be in the codebase) | ||
| ctx.calls.push({ | ||
| name: '<dynamic:unresolved>', | ||
| line: callLine, | ||
| dynamic: true, | ||
| dynamicKind: 'unresolved-dynamic', | ||
| }); | ||
| return; | ||
| } | ||
| if (call.name) ctx.calls.push(call); | ||
|
|
||
| if (fnName) ctx.calls.push({ name: fnName, line: callLine }); | ||
| } | ||
|
|
There was a problem hiding this comment.
dlsym with a string literal silently downgrades to unresolved-dynamic in C++ while C resolves it
src/extractors/c.ts extracts dlsym(handle, "greet") as dynamicKind: 'reflection' and resolves it to the named function. The C++ extractor always emits unresolved-dynamic, losing resolution for string-literal arguments. Mixed-language projects using dlsym in .cpp files (the common case on Linux) will get no edge where the C extractor would produce one. If this is intentional (name mangling concerns), a note in the comment explaining the reasoning would avoid confusion.
There was a problem hiding this comment.
Fixed — added full dlsym string-literal resolution to the C++ extractor (both src/extractors/cpp.ts and crates/codegraph-core/src/extractors/cpp.rs), mirroring the C extractor. A string-literal dlsym argument now resolves as reflection; a variable argument falls back to unresolved-dynamic. Also added (*fp)(args) function-pointer detection to the Rust C++ extractor. The comment explains that extern "C" symbols are not mangled so string-literal resolution is valid.
| edges.push(ComputedEdge { | ||
| source_id: caller_id, | ||
| target_id: fc.file_node_id, | ||
| kind: "calls".to_string(), | ||
| confidence: 0.0, | ||
| dynamic: 1, | ||
| dynamic_kind: Some(dk.clone()), | ||
| }); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| emit_hierarchy_edges(ctx, file_input, fc.rel_path, edges); |
There was a problem hiding this comment.
First-byte dedup key is fragile — a future flag-only kind sharing a first letter would silently skip sink edges
dk.as_bytes()[0] is used as the third element of seen_sink_edges, meaning the set treats two kinds with the same leading byte as the same discriminator. Currently "eval" (e), "computed-key" (c), and "unresolved-dynamic" (u) are all distinct. However, "computed-literal" also starts with 'c'. If it were ever promoted to a flag-only kind — or a new kind like "call-indirect" was added — the dedup would incorrectly suppress a sink edge that should be emitted. Using the full string (as the JS engine already does via `${caller.id}:${fc.file_node_id}:${dk}`) would be safer.
There was a problem hiding this comment.
Already fixed — the merge from main resolved this. build_edges.rs now uses HashSet<(u32, u32, String)> with dk.clone() as the full string key instead of the first byte, matching the JS engine's approach.
Impact: 29 functions changed, 165 affected
…on to C++ extractor (#1655) Impact: 2 functions changed, 3 affected
…on to C++ extractor (#1655) Impact: 2 functions changed, 3 affected
| // Back-fill dynamic_kind for flagged sink edges emitted by the native engine. | ||
| // Include dynamic_kind in the WHERE clause so two sink edges from the same caller | ||
| // to the same file node with different kinds don't clobber each other. | ||
| if (dynamicKindRows.length > 0) { | ||
| const stmt = db.prepare( | ||
| "UPDATE edges SET dynamic_kind = ? WHERE kind = 'calls' AND source_id = ? AND target_id = ? AND (dynamic_kind IS NULL OR dynamic_kind = ?)", | ||
| ); | ||
| for (const r of dynamicKindRows) stmt.run(r[6], r[0], r[1], r[6]); | ||
| } |
There was a problem hiding this comment.
New UPDATE block is missing
confidence = 0.0 AND dynamic = 1 guard
The pre-existing block at line 1665 correctly scopes to sink edges with AND confidence = 0.0 AND dynamic = 1. The new block added in this PR omits those predicates, so it will match any calls edge between the same (source_id, target_id) where dynamic_kind IS NULL — including ordinary high-confidence static call edges. In a codebase where runFoo calls greet() both directly and via dlsym(handle, "greet"), the static call edge (dynamic=NULL, confidence=0.8) would be incorrectly tagged with dynamic_kind = 'reflection' after this UPDATE runs. This taints the static-call metadata for all such pairs.
There was a problem hiding this comment.
Already addressed — the UPDATE block at this location includes AND confidence = 0.0 AND dynamic = 1 (visible in the current code at line 1667). The fix was applied via the 'fix: correct dispatch.c comment and add dlsym string-literal resolution to C++ extractor' commit. Static call edges sharing a (source_id, target_id) pair with a dynamic-dispatch row are not affected.
Codegraph Impact Analysis14 functions changed → 9 callers affected across 3 files
|
ea88009 to
c278409
Compare
Impact: 3 functions changed, 8 affected
| /// Get the first non-punctuation argument from a Ruby call node's argument list. | ||
| fn get_first_ruby_arg<'a>(node: &Node<'a>) -> Option<Node<'a>> { | ||
| let args = node.child_by_field_name("arguments")?; | ||
| for i in 0..args.child_count() { | ||
| let child = args.child(i)?; | ||
| match child.kind() { | ||
| "(" | ")" | "," => continue, | ||
| _ => return Some(child), | ||
| } | ||
| } | ||
| None | ||
| } |
There was a problem hiding this comment.
get_first_ruby_arg returns splat_argument nodes while the TypeScript mirror in ruby.ts skips them. For obj.send(*args, ...), the Rust extractor sees the splat_argument as the first argument, falls into the _ arm, and emits computed-key with keyExpr: "*args". The TypeScript extractor skips it, finds no remaining argument, and emits unresolved-dynamic. The two engines produce a different call kind for the same Ruby source.
| /// Get the first non-punctuation argument from a Ruby call node's argument list. | |
| fn get_first_ruby_arg<'a>(node: &Node<'a>) -> Option<Node<'a>> { | |
| let args = node.child_by_field_name("arguments")?; | |
| for i in 0..args.child_count() { | |
| let child = args.child(i)?; | |
| match child.kind() { | |
| "(" | ")" | "," => continue, | |
| _ => return Some(child), | |
| } | |
| } | |
| None | |
| } | |
| /// Get the first non-punctuation argument from a Ruby call node's argument list. | |
| fn get_first_ruby_arg<'a>(node: &Node<'a>) -> Option<Node<'a>> { | |
| let args = node.child_by_field_name("arguments")?; | |
| for i in 0..args.child_count() { | |
| let child = args.child(i)?; | |
| match child.kind() { | |
| "(" | ")" | "," | "splat_argument" => continue, | |
| _ => return Some(child), | |
| } | |
| } | |
| None | |
| } |
Impact: 2 functions changed, 4 affected
Summary
Stacked on PRs #1629-#1654. Phase 5 adds Go and C/C++ dynamic dispatch detection.
v.MethodByName("Greet")reflectionv.MethodByName(name)computed-key(*fp)(args)unresolved-dynamicdlsym(handle, "symbol")reflectiondlsym(handle, var)unresolved-dynamicAll mirrored in Rust extractors (go.rs, c.rs).
Benchmark results
dynamic-go: 100% P/R —MethodByName("Greet")resolves (Go functions unqualified in DB)dynamic-c: 100% P/R —dlsym(handle, "greet")resolves (C functions unqualified)Test plan
dynamic-gobenchmark: 100% P/R, 1 flagged (computed-key)dynamic-cbenchmark: 100% P/R, 1 flagged (unresolved-dynamic from *fp)cargo checkcleanSame pattern as Kotlin/Python/Ruby
Go and C use unqualified function names in the DB → literal
MethodByName/dlsymresolve correctly without type-aware lookup.