fix(ingestion): resolve Python multi-line + src-layout imports (cross-module edges)#170

Merged
theagenticguy merged 5 commits into main from fix/python-import-parser on May 29, 2026

@theagenticguy
Owner

Summary

Two bugs in Python import handling left cross-module CALLS/IMPORTS edges unresolved, so a multi-file Python package read as disconnected islands in the graph. Both are part of the field report's Issue 1 (Bugs A + B) and are the foundation that cross-module context/impact depend on.

Bug A — multi-line parenthesized imports silently dropped

extractPyImports was line-based. A multi-line from m import (\n a,\n b,\n) matched the from-regex on the first line only with rest = "(" → 0 names → the whole import discarded. black/ruff wrap every long import list this way, so most real modules lost their imports entirely.

Fix: joinLogicalLines() collapses physical lines into logical lines across an open paren (depth count) or a trailing backslash, before the per-line regex runs.
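The joining step described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the signature, the naive `#` comment stripping (which ignores `#` inside string literals), and the choice to join continuation lines with a single space are all assumptions.

```typescript
// Hypothetical sketch of logical-line joining; the real joinLogicalLines()
// in providers/python.ts may differ in signature and edge-case handling.
function joinLogicalLines(source: string): string[] {
  const logical: string[] = [];
  let buffer = "";
  let pending = false; // true while the current logical line is still open
  let depth = 0; // number of unclosed "(" seen so far

  const physicalLines = source.split(/\r?\n/);
  for (const physical of physicalLines) {
    // Assumption: strip trailing comments naively so parens in them are not counted.
    const hash = physical.indexOf("#");
    let line = (hash >= 0 ? physical.slice(0, hash) : physical).replace(/\s+$/, "");

    // A trailing backslash continues the logical line onto the next physical line.
    const continued = line.charAt(line.length - 1) === "\\";
    if (continued) line = line.slice(0, -1).replace(/\s+$/, "");

    // Track paren depth so an open "(" keeps the logical line open.
    for (let i = 0; i < line.length; i++) {
      const ch = line.charAt(i);
      if (ch === "(") depth++;
      else if (ch === ")" && depth > 0) depth--;
    }

    buffer = pending ? `${buffer} ${line.trim()}` : line;
    pending = depth > 0 || continued;
    if (!pending) {
      logical.push(buffer);
      buffer = "";
    }
  }
  if (pending) logical.push(buffer); // unterminated continuation at end of file
  return logical;
}

const joined = joinLogicalLines(
  "from m import (\n    a,\n    b,\n)\nimport os\nfrom x import y, \\\n    z",
);
console.log(joined);
// → ["from m import ( a, b, )", "import os", "from x import y, z"]
```

With the import collapsed onto one logical line, a per-line from-regex sees the full name list instead of a bare `(`.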

Bug B — src-layout dotted absolute imports stub as <external>

resolveImportTarget only handled ./, ../, and / specifiers. A dotted absolute import (pkg.client → src/pkg/client.py) never resolved, so it was emitted as a CodeElement:<external> stub instead of linking to the real file.

Fix: resolveDottedAbsoluteImport() converts dots to slashes and probes for the module at the repo root and under detected src-layout roots (discoverSourceRoots). The lookup is gated to languages with namespace import semantics (Python). A dotted specifier that resolves to no in-repo file is still treated as third-party (external stub); that behavior is unchanged.
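The probing order can be illustrated with a small sketch. Everything here is hypothetical: the signature, the in-memory set of repo-relative paths standing in for filesystem access, and the extra `__init__.py` package probe are assumptions for illustration, not the implementation in pipeline/phases/parse.ts.

```typescript
// Hypothetical sketch of dotted absolute import resolution.
// repoFiles: repo-relative POSIX paths; sourceRoots: e.g. ["src"], as a
// discoverSourceRoots-style step might return (assumed shape).
function resolveDottedAbsoluteImport(
  specifier: string,
  repoFiles: ReadonlySet<string>,
  sourceRoots: readonly string[],
): string | null {
  // Relative or path-like specifiers are left to the existing resolver.
  if (specifier === "" || specifier.charAt(0) === "." || specifier.indexOf("/") >= 0) {
    return null;
  }
  const modulePath = specifier.split(".").join("/"); // pkg.client -> pkg/client

  // Probe the repo root first (""), then each detected source root.
  const roots = [""].concat(sourceRoots);
  for (const root of roots) {
    const base = root === "" ? modulePath : `${root}/${modulePath}`;
    // Assumption: try the module file, then a package __init__.py.
    for (const candidate of [`${base}.py`, `${base}/__init__.py`]) {
      if (repoFiles.has(candidate)) return candidate;
    }
  }
  return null; // no in-repo match: caller keeps the <external> stub
}

const files = new Set(["src/pkg/client.py", "src/pkg/__init__.py", "tools/run.py"]);
console.log(resolveDottedAbsoluteImport("pkg.client", files, ["src"])); // → "src/pkg/client.py"
console.log(resolveDottedAbsoluteImport("requests.adapters", files, ["src"])); // → null
```

Returning `null` for an unmatched specifier keeps the third-party case on the existing external-stub path, matching the unchanged behavior described above.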

Test plan

  • End-to-end on ngs-research-agent: zero <external> stubs for ngs_research_agent.client; real file→file IMPORTS edges (mcp_server.py → client.py, test_mcp_server.py → client.py) now land.
  • Bug A regression (python.test.ts): multi-line parenthesized + backslash-continued + single-line-parens import extraction.
  • Bug B regression (parse.test.ts): a src-layout dotted import produces a file IMPORTS edge and no <external> stub.
  • @opencodehub/ingestion 604/604; tsc + biome clean.

Relationship to PR #167

#167 fixed the ingest-sarif node-clobber that hid the host Function node. This PR fixes the import resolution so cross-module edges between real nodes actually bind. Together they restore trustworthy cross-module blast-radius for Python.

…-module edges)

Two bugs in Python import handling left cross-module CALLS/IMPORTS edges
unresolved, so a multi-file Python package read as disconnected islands
(field-report Issue 1, Bugs A + B).

Bug A — extractPyImports (providers/python.ts) was line-based: a multi-line
parenthesized import `from m import (\n a,\n b,\n)` matched the from-regex
on the first line only with rest='(' → 0 names → the whole import was
silently dropped. black/ruff wrap every long import list this way, so most
real modules lost their imports. Fix: joinLogicalLines() collapses physical
lines into logical ones across an open paren or a trailing backslash before
matching.

Bug B — resolveImportTarget (pipeline/phases/parse.ts) only handled ./ ../ /
specifiers, so a dotted ABSOLUTE src-layout import (`pkg.client` →
src/pkg/client.py) never resolved and was emitted as an <external> stub. Fix:
resolveDottedAbsoluteImport() converts dots→slashes and probes the module at
the repo root and under detected src roots (discoverSourceRoots), gated to
namespace-import languages (Python). A dotted specifier that resolves to no
in-repo file is still treated as third-party (external stub) as before.

Verified end-to-end on ngs-research-agent: zero <external> stubs for
ngs_research_agent.client; real file→file IMPORTS edges (mcp_server→client,
test_mcp_server→client) now land. Regression tests: multi-line/backslash
import extraction (python.test.ts) and a src-layout dotted-import → file
IMPORTS edge with no <external> stub (parse.test.ts). Ingestion 604/604,
tsc + biome clean.
@theagenticguy theagenticguy enabled auto-merge (squash) May 29, 2026 21:26
@theagenticguy theagenticguy merged commit a56544a into main May 29, 2026
43 of 45 checks passed
@theagenticguy theagenticguy deleted the fix/python-import-parser branch May 29, 2026 21:59