fix(ingestion): resolve Python multi-line + src-layout imports (cross-module edges) #170
Merged
Two bugs in Python import handling left cross-module CALLS/IMPORTS edges
unresolved, so a multi-file Python package read as disconnected islands
(field-report Issue 1, Bugs A + B).
Bug A — extractPyImports (providers/python.ts) was line-based: a multi-line
parenthesized import `from m import (\n a,\n b,\n)` matched the from-regex
on the first line only with rest='(' → 0 names → the whole import was
silently dropped. black/ruff wrap every long import list this way, so most
real modules lost their imports. Fix: joinLogicalLines() collapses physical
lines into logical ones across an open paren or a trailing backslash before
matching.
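
The line-joining step described for Bug A could look roughly like the sketch below. This is not the code from providers/python.ts: the name `joinLogicalLinesSketch` is hypothetical, and the sketch assumes comments can be cut at the first `#` and that parentheses never appear inside string literals, which a real implementation would need to handle.

```typescript
// Hypothetical sketch of joining physical lines into logical lines.
// Assumptions (not from the PR): comments are cut at the first '#',
// and string literals containing '(' / ')' / '#' are not handled.
function joinLogicalLinesSketch(source: string): string[] {
  const logical: string[] = [];
  let pending = "";
  let depth = 0; // count of currently open parentheses

  for (const raw of source.split(/\r?\n/)) {
    // Drop a trailing comment so it does not leak into the joined line.
    const hash = raw.indexOf("#");
    let line = (hash >= 0 ? raw.slice(0, hash) : raw).trimEnd();

    // A trailing backslash continues the logical line onto the next one.
    const continued = line.endsWith("\\");
    if (continued) line = line.slice(0, -1).trimEnd();

    // Track paren depth so a line break inside (...) does not end the line.
    for (let i = 0; i < line.length; i++) {
      if (line[i] === "(") depth += 1;
      else if (line[i] === ")") depth = Math.max(0, depth - 1);
    }

    pending = pending === "" ? line : `${pending} ${line.trim()}`;

    // Emit only when no paren is open and no backslash continuation is pending.
    if (depth === 0 && !continued) {
      logical.push(pending);
      pending = "";
    }
  }
  if (pending !== "") logical.push(pending); // unterminated trailing fragment
  return logical;
}
```

With this shape, a wrapped `from m import (` block reaches the per-line regex as one line carrying all of its names, so the existing from-regex no longer sees `rest='('` on its own.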
Bug B — resolveImportTarget (pipeline/phases/parse.ts) only handled ./ ../ /
specifiers, so a dotted ABSOLUTE src-layout import (`pkg.client` →
src/pkg/client.py) never resolved and was emitted as an <external> stub. Fix:
resolveDottedAbsoluteImport() converts dots→slashes and probes the module at
the repo root and under detected src roots (discoverSourceRoots), gated to
namespace-import languages (Python). A dotted specifier that resolves to no
in-repo file is still treated as third-party (external stub) as before.
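
The dotted-import probing described for Bug B can be sketched as follows. This is an illustration under stated assumptions, not the code from pipeline/phases/parse.ts: the name `resolveDottedSketch`, the in-memory `repoFiles` set (instead of filesystem access), and the extra `__init__.py` package probe are all hypothetical choices made to keep the example self-contained.

```typescript
// Hypothetical sketch of resolving a dotted absolute Python import.
// Assumptions (not from the PR): repo files are given as a set of
// repo-relative POSIX paths, and packages are probed via __init__.py.
function resolveDottedSketch(
  specifier: string,
  repoFiles: ReadonlySet<string>,
  sourceRoots: readonly string[],
): string | undefined {
  // Relative (./, ../) and path-like specifiers are handled elsewhere.
  if (specifier.startsWith(".") || specifier.includes("/")) return undefined;

  // pkg.client -> pkg/client
  const rel = specifier.split(".").join("/");

  // Probe the repo root first ("" prefix), then each detected source root.
  for (const root of ["", ...sourceRoots]) {
    const base = root === "" ? rel : `${root}/${rel}`;
    for (const candidate of [`${base}.py`, `${base}/__init__.py`]) {
      if (repoFiles.has(candidate)) return candidate;
    }
  }
  // No in-repo match: the caller keeps treating it as third-party.
  return undefined;
}

// Usage: a src-layout package resolves; an unknown module stays external.
const files = new Set(["src/pkg/__init__.py", "src/pkg/client.py"]);
const hit = resolveDottedSketch("pkg.client", files, ["src"]);
const miss = resolveDottedSketch("requests.adapters", files, ["src"]);
```

Returning `undefined` for an unmatched specifier mirrors the behavior described above, where a dotted import with no in-repo file still becomes an external stub.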
Verified end-to-end on ngs-research-agent: zero <external> stubs for
ngs_research_agent.client; real file→file IMPORTS edges (mcp_server→client,
test_mcp_server→client) now land. Regression tests: multi-line/backslash
import extraction (python.test.ts) and a src-layout dotted-import → file
IMPORTS edge with no <external> stub (parse.test.ts). Ingestion tests pass
604/604; tsc + biome clean.
Summary

Two bugs in Python import handling left cross-module `CALLS`/`IMPORTS` edges unresolved, so a multi-file Python package read as disconnected islands in the graph. Both are part of the field-report's Issue 1 (Bugs A + B) and are the foundation that cross-module `context`/`impact` depend on.

Bug A: multi-line parenthesized imports silently dropped

`extractPyImports` was line-based. A multi-line `from m import (\n a,\n b,\n)` matched the from-regex on the first line only with `rest = "("` → 0 names → the whole import discarded. black/ruff wrap every long import list this way, so most real modules lost their imports entirely.

Fix: `joinLogicalLines()` collapses physical lines into logical lines across an open paren (depth count) or a trailing backslash, before the per-line regex runs.

Bug B: src-layout dotted absolute imports stub as `<external>`

`resolveImportTarget` only handled `./`, `../`, `/` specifiers. A dotted absolute import (`pkg.client` → `src/pkg/client.py`) never resolved, so it was emitted as a `CodeElement:<external>` stub instead of linking the real file.

Fix: `resolveDottedAbsoluteImport()` converts dots→slashes and probes the module at the repo root and under detected src-layout roots (`discoverSourceRoots`), gated to `namespace` import-semantics languages (Python). A dotted specifier that resolves to no in-repo file is still treated as third-party (external stub), unchanged.

Test plan

- Verified end-to-end on ngs-research-agent: zero `<external>` stubs for `ngs_research_agent.client`; real file→file IMPORTS edges (`mcp_server.py → client.py`, `test_mcp_server.py → client.py`) now land.
- Regression test (`python.test.ts`): multi-line parenthesized + backslash-continued + single-line-parens import extraction.
- Regression test (`parse.test.ts`): a src-layout dotted import produces a file IMPORTS edge and no `<external>` stub.
- `@opencodehub/ingestion` 604/604; `tsc` + `biome` clean.

Relationship to PR #167

#167 fixed the `ingest-sarif` node-clobber that hid the host Function node. This PR fixes the import resolution so cross-module edges between real nodes actually bind. Together they restore trustworthy cross-module blast-radius for Python.