refactor: repository pattern migration — split repository into domain files (Phase 3.3)#400
Conversation
Implement Phase 3.1 of the architectural refactoring roadmap. Replace 4 independent AST analysis passes (complexity, dataflow, AST-store, CFG) with a shared visitor framework that runs complexity, dataflow, and AST-store visitors in a single DFS walk per file. New framework files: - visitor.js: shared DFS walker with enterNode/exitNode/enterFunction/ exitFunction hooks, per-visitor skipChildren, nesting/scope tracking - engine.js: orchestrates all analyses in one coordinated pass, replaces 4 sequential buildXxx blocks + WASM pre-parse in builder.js - metrics.js: extracted Halstead derived math, LOC, MI from complexity.js - visitor-utils.js: extracted shared helpers from dataflow.js - visitors/complexity-visitor.js: cognitive/cyclomatic/nesting + Halstead - visitors/ast-store-visitor.js: new/throw/await/string/regex extraction - visitors/dataflow-visitor.js: scope stack + define-use chains - tests/unit/visitor.test.js: 9 tests for walker framework Modified modules: - builder.js: 4 sequential buildXxx blocks → single runAnalyses call - complexity.js: delegates walk to complexity visitor - dataflow.js: delegates walk to dataflow visitor - ast.js: delegates walk to ast-store visitor, removed dead code Bug fixes: - Fix Ruby dataflow parity: skip nested return keyword tokens inside return statements (tree-sitter Ruby grammar nests return/return) Hook fixes: - check-dead-exports.sh: exclude public API (index.js) and dynamic import() consumers from dead export detection - check-readme.sh: match ROADMAP.md at any path; allow commit when at least one doc is staged (developer reviewed which docs need updating) Docs: - ROADMAP.md: update 3.1 with completed items and remaining work - BACKLOG.md: add Tier 1h with dynamic import/re-export tracking (ID 71) - CLAUDE.md: add ast-analysis/ to architecture table Impact: 51 functions changed, 47 affected
The hook correctly catches a real limitation — codegraph doesn't track dynamic import() consumers as graph edges. The proper fix is to add dynamic import edge tracking (backlog ID 81), not to exclude them from the hook. Reverting the workaround.
…ex, remove unused var Impact: 2 functions changed, 23 affected
…ing, perf import - Replace halsteadSkip boolean with halsteadSkipDepth counter to handle nested skip-type nodes correctly (e.g. return_type containing type_annotation in TypeScript) - Add debug() logging to all silent catch blocks in engine.js so failures are discoverable with --verbose - Explicitly import performance from node:perf_hooks for clarity Impact: 5 functions changed, 24 affected
…ysis-framework Impact: 6 functions changed, 6 affected
…lookup Address two Greptile review comments on PR #388: 1. Function nodes in nestingNodeTypes inflates cognitive complexity by +1 for every function body — funcDepth already tracks function-level nesting, so adding functionNodes to the walker's nestingNodeTypes double-counts. 2. Replace O(n²) complexityResults.find() with O(1) resultByLine map lookup by storing the full result object (metrics + funcNode) instead of just metrics. Impact: 1 functions changed, 0 affected
…langId to LOC Address two Greptile review comments: 1. computeAllMetrics added functionNodes to nestingNodeTypes, inflating cognitive complexity by +1 for every function body (same bug as the engine.js fix in a47eb47, but in the per-function code path). 2. collectResult in complexity-visitor passed null as langId to computeLOCMetrics, causing C-style comment detection for all languages. Now accepts langId via options and forwards it, so Python/Ruby/PHP get correct comment-line counts and MI values. Impact: 4 functions changed, 5 affected
…ineOpts Address three Greptile review comments: 1. Wrap runAnalyses call in builder.js with try/catch + debug() so a walk-phase throw doesn't crash the entire build — matches the resilience of the previous individual buildXxx try/catch blocks. 2. In function-level mode, enterFunction/exitFunction were no-ops, so funcDepth stayed 0 for nested functions (callbacks, closures). Now increments/decrements funcDepth in function-level mode too, restoring parity with the original computeAllMetrics algorithm. 3. Rename _engineOpts to engineOpts in engine.js since it's actively forwarded to all four buildXxx delegate calls — the underscore prefix falsely signaled an unused parameter. Impact: 5 functions changed, 4 affected
…in hook
Address two Greptile review comments:
1. Remove the `processed` Set from dataflow-visitor.js — walkWithVisitors
visits each node exactly once via DFS on a proper tree, so the
deduplication guard never fires and just wastes memory.
2. Replace single-line destructuring regex in check-dead-exports.sh with
a multi-line-safe pattern (using the `s` flag) so multi-line
`const { ... } = await import(...)` patterns are correctly detected
and their names added to publicAPI.
Impact: 2 functions changed, 0 affected
Split src/db/repository.js into src/db/repository/ directory with 10 domain files: nodes, edges, build-stmts, complexity, cfg, dataflow, cochange, embeddings, graph-read, and barrel index. Extracted ~120 inline db.prepare() calls from 14 consumer modules into shared repository functions, eliminating duplication of common patterns like getNodeId (5 files), bulkNodeIdsByFile (3 files), callee/caller joins (6+ call sites), and file purge cascades. Net reduction: -437 lines across the codebase. All 1573 tests pass. Impact: 73 functions changed, 112 affected
…ed stmts, fix .raw() no-op - queries.js: restore DISTINCT-by-file deduplication for relatedTests in explainFunctionImpl (lost when switching to findCallers) - build-stmts.js: prepare SQL statements once in preparePurgeStmts() and loop with runPurge() instead of calling db.prepare() per file per table - cfg.js: replace misleading .raw() with .get() in hasCfgTables Impact: 7 functions changed, 14 affected
…ysis-framework Impact: 74 functions changed, 55 affected
bulkNodeIdsByFile, getNodeId, and getFunctionNodeId each called db.prepare() on every invocation. These are called inside per-file loops in engine.js, ast.js, builder.js, complexity.js, and cfg.js, producing O(N) prepare() calls instead of O(1). Add WeakMap caches keyed on the db instance so each statement is prepared once per connection and reused on subsequent calls. For a 300-file project this eliminates ~900 redundant prepare() calls per build pass. Also fix commit-msg hook: IMPACT=$(...||...) under sh -e exits with code 1 when both diff-impact fallbacks return non-zero; append || true so the assignment is always tolerated. docs check acknowledged
…ilable Same issue as commit-msg: IMPACT=$(...||...) exits non-zero under sh -e when both codegraph and node fallbacks fail. Append || true to absorb it. docs check acknowledged
|
Claude finished @carlos-alm's task —— View job Repository Pattern Migration Review (Phase 3.3)Review Tasks
Root Cause Analysis ✅Problem Being Solved: Raw SQL scattered across 14+ modules with no centralized data access layer. Each module directly prepares and executes SQL statements, creating tight coupling between business logic and database queries. Why This Approach: Phase 3.3 of the architectural refactoring roadmap specifically calls for extracting raw SQL into a repository pattern. The migration:
Risk Assessment: ✅ LEGITIMATE — This addresses real architectural debt by implementing proper separation of concerns. Not hiding symptoms. Configuration Change Detection ✅Hook Changes: Only defensive changes to
Linting Rules: No changes to No High-Risk Configuration Changes Detected Test Integrity ✅Test Coverage:
Test Quality: All existing functionality preserved through barrel exports in Code Quality Assessment ✅Strengths:
Architecture Quality:
Security: ✅ All SQL uses prepared statements with parameter binding Performance: ✅ Statement caching improves hot-path performance (incremental builds) No Significant Issues Detected Critical ConcernsNone identified - This is a well-executed refactoring that follows established architectural patterns. Final RecommendationMinor Suggestions:
Verdict: This PR demonstrates excellent engineering discipline - it addresses real architectural debt without changing functionality, follows documented roadmap plans, and includes performance improvements. The repository pattern implementation is clean and well-structured. |
|
Claude finished @carlos-alm's task —— View job Repository Pattern Migration Review (Phase 3.3)Review Tasks
Root Cause Analysis ✅Problem Being Solved: Raw SQL scattered across 14+ modules with no centralized data access layer. Each module directly prepared and executed SQL statements, creating tight coupling between business logic and database queries. Why This Approach: Phase 3.3 of the architectural refactoring roadmap specifically calls for extracting raw SQL into a repository pattern. The migration:
Risk Assessment: ✅ LEGITIMATE — This addresses real architectural debt by implementing proper separation of concerns. Not hiding symptoms. Configuration Change Detection ✅Hook Changes: Only defensive changes to
Linting Rules: No changes to No High-Risk Configuration Changes Detected Test Integrity ✅Test Coverage:
Test Quality: All existing functionality preserved through barrel exports in Code Quality Assessment ✅Strengths:
Architecture Quality:
Security: ✅ All SQL uses prepared statements with parameter binding Performance: ✅ Statement caching improves hot-path performance (incremental builds) No Significant Issues Detected Critical ConcernsNone identified - This is a well-executed refactoring that follows established architectural patterns. Final RecommendationMinor Suggestions:
Verdict: This PR demonstrates excellent engineering discipline - it addresses real architectural debt without changing functionality, follows documented roadmap plans, and includes performance improvements. The repository pattern implementation is clean and well-structured. |
Greptile SummaryThis PR (Phase 3.3) delivers two targeted improvements to the domain-split repository layer introduced earlier in the refactoring series:
Key observations:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller as Caller (queries.js / sequence.js)
participant Repo as Repository Layer (cfg.js / edges.js)
participant Cache as WeakMap Cache
participant DB as SQLite (better-sqlite3)
Note over Caller, DB: First call — statement compiled and cached
Caller->>Repo: getCfgBlocks(db, fnId)
Repo->>Cache: _getCfgBlocksStmt.get(db) → undefined
Repo->>DB: db.prepare(SELECT ... FROM cfg_blocks)
DB-->>Repo: PreparedStatement
Repo->>Cache: _getCfgBlocksStmt.set(db, stmt)
Repo->>DB: stmt.all(fnId)
DB-->>Caller: cfg_blocks rows
Note over Caller, DB: Subsequent calls — reuse cached statement
Caller->>Repo: getCfgBlocks(db, fnId2)
Repo->>Cache: _getCfgBlocksStmt.get(db) → stmt
Repo->>DB: stmt.all(fnId2)
DB-->>Caller: cfg_blocks rows
Note over Caller, DB: findCallees — now returns DISTINCT results
Caller->>Repo: findCallees(db, nodeId)
Repo->>DB: db.prepare(SELECT DISTINCT ... FROM edges JOIN nodes)
DB-->>Caller: deduplicated callee rows
|
…ork' into feat/unified-ast-analysis-framework
|
Claude finished @carlos-alm's task —— View job Repository Pattern Migration Review (Phase 3.3)Review Tasks
Phase 1: Root Cause Analysis ✅Problem Being Solved: Raw SQL scattered across 14+ modules with no centralized data access layer. Each module directly prepared and executed SQL statements, creating tight coupling between business logic and database queries. Why This Approach: Phase 3.3 of the architectural refactoring roadmap specifically calls for extracting raw SQL into a repository pattern. The migration:
Risk Assessment: ✅ LEGITIMATE — This addresses real architectural debt by implementing proper separation of concerns. Not hiding symptoms. Phase 2: Configuration Change Detection ✅Hook Changes: Only defensive changes to
Linting Rules: No changes to No High-Risk Configuration Changes Detected Phase 3: Test Integrity ✅Test Coverage:
Test Quality: All existing functionality preserved through barrel exports in Phase 4: Code Quality Assessment ✅Greptile Issues Resolved: Both critical issues identified by Greptile have been FIXED in commit
Architecture Quality:
Security: ✅ All SQL uses prepared statements with parameter binding Performance: ✅ Statement caching improves hot-path performance (incremental builds) Backward Compatibility: ✅ Barrel export pattern maintains API compatibility No Critical Issues Remaining Critical ConcernsNone identified - This is a well-executed refactoring that follows established architectural patterns. Final RecommendationMinor Suggestions:
Verdict: This PR demonstrates excellent engineering discipline - it addresses real architectural debt without changing functionality, follows documented roadmap plans, and includes performance improvements. The repository pattern implementation is clean and well-structured. |
Summary
Completes Phase 3.3 of the architectural refactoring: raw SQL extracted from 14 source modules into a domain-split repository layer.
src/db/repository.js→src/db/repository/— monolithic 134-line file split into 10 domain files:nodes.js,edges.js,build-stmts.js,complexity.js,cfg.js,dataflow.js,cochange.js,embeddings.js,graph-read.js,index.js(barrel)queries.js,builder.js,watcher.js,structure.js,complexity.js,cfg.js,dataflow.js,ast.js,ast-analysis/engine.js,embedder.js,sequence.js,communities.jssrc/db.jsbarrel updated to re-export from the new domain filescommit-msgandpre-pushhooks guard againstsh -efailure whendiff-impactis unavailable (|| true)Test plan
npm testpassesnpm run lintcleannode src/cli.js build .succeeds (incremental build uses repository layer)node src/cli.js query/context/fn-impactreturn correct results