refactor: Replace progressStore with file-based IndexStatus#40
refactor: Replace progressStore with file-based IndexStatus#40
Conversation
- Remove progressStore (preRegister, update, carry-over, flusher) - Remove IndexingProgressSummary, BuildIndexingProgress, formatAge, buildIndexingMessage - Remove auto-resume from SearchCode/HybridSearchCode (redundant with DetectContext) - Remove resumeAttempts field from Engine - Add IndexStatus/LangStatus/SaveIndexStatus/LoadIndexStatus in pkg/indexer/ - Indexer writes OnDisk/Changed/Processed via Progress callback - Tools read status directly from .ragcode/index_status.json - Fix TestDetectNoMarkers with AllowedRoots isolation
There was a problem hiding this comment.
Pull request overview
Replaces the in-memory progressStore indexing progress mechanism (with preRegister, flusher goroutine, carry-over logic) with a simpler file-based IndexStatus system in pkg/indexer/index_status.go. MCP tools now read status directly from {workspaceRoot}/.ragcode/index_status.json via indexer.LoadIndexStatus(). This is a significant simplification that removes ~1000 lines of complex concurrency code.
Changes:
- New
pkg/indexer/index_status.gowithIndexStatus/LangStatusstructs andSaveIndexStatus/LoadIndexStatusfunctions, replacing the oldengine/index_progress.gowith itsprogressStore, flusher goroutine, and deep-copy logic - All MCP tools updated to use the new
IndexingStatusfield (backed byindexer.IndexStatus) instead ofIndexingProgress(backed byIndexingProgressSummary), with the JSON tag preserved as"indexing_progress"for backward compatibility - Removed auto-resume logic from
SearchCode/HybridSearchCodeand theresumeAttemptsthrottle, as workspace re-indexing is now triggered viaDetectContext
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
pkg/indexer/index_status.go |
New file: IndexStatus/LangStatus types, SaveIndexStatus, LoadIndexStatus |
pkg/indexer/index_status_test.go |
New file: round-trip and missing-file tests |
internal/service/engine/engine.go |
Remove progressStore/resumeAttempts, add GetIndexStatus, wire Progress callback with file-based status |
internal/service/engine/index_progress.go |
Deleted: old progressStore and all related types/functions |
internal/service/engine/index_progress_test.go |
Deleted: tests for removed progressStore |
internal/service/tools/response.go |
Replace IndexingProgressSummary with IndexingStatus, rename helper to ContextFromWorkspaceWithStatus |
internal/service/tools/smart_search_pipeline.go |
Use LoadIndexStatus, replace dynamic fallback note with static string |
internal/service/tools/smart_search.go |
Simplify indexing error messages, remove progress attachment |
internal/service/tools/find_usages.go |
Switch to GetIndexStatus/ContextFromWorkspaceWithStatus |
internal/service/tools/call_hierarchy.go |
Switch to GetIndexStatus/ContextFromWorkspaceWithStatus |
internal/service/tools/list_package_exports.go |
Switch to ContextFromWorkspaceWithStatus, remove idx-based status override |
internal/service/tools/index_workspace.go |
Use LoadIndexStatus directly |
internal/service/tools/skills.go |
Set IndexingStatus: nil |
internal/service/tools/evaluate_ragcode.go |
Remove unused wctxID/wctxRoot, set IndexingStatus: nil |
internal/service/tools/read_file_context.go |
Set IndexingStatus: nil |
internal/service/tools/tests/health_metrics_test.go |
Remove tests for deleted types/functions |
internal/service/engine/engine_searchcode_test.go |
Remove auto-resume test |
internal/service/engine/engine_nonblocking_search_test.go |
Remove progress.stop() cleanup |
internal/service/engine/engine_fallback_search_test.go |
Remove progress.stop() cleanup |
internal/service/engine/engine_sticky_test.go |
Remove progress.stop() cleanup |
pkg/workspace/detector/detector_test.go |
Isolate test to avoid picking up .ragcode markers from parent dirs |
cmd/rag-code-mcp/main.go |
Version bump to 2.1.63 |
The Progress callback received totalFiles = len(changedFiles), which only counts modified files needing re-indexing. This was incorrectly assigned to OnDisk, causing on_disk: 1 when only 1 file changed — despite 232 Go files and 655 docs on disk. Fix: - Call CountAllFiles() once before the language loop for real disk totals - Pre-populate index_status.json with on_disk counts at indexing start - Use diskTotal (pre-counted) for OnDisk, totalFiles for Changed - Languages with 0 changed files now correctly show their disk totals
- Fix nil panic in ContextFromWorkspaceWithStatus when wctx is nil (#7) - Fix indentation in smart_search_pipeline.go (#1) - Use loaded idx instead of nil in call_hierarchy.go and find_usages.go (#3, #9) - Add backward-compat comment on JSON tag mismatch (#6) - Create fresh IndexStatus when LoadIndexStatus returns nil (#8) - Populate Elapsed field at completed/failed transitions (#2) - Throttle progress I/O writes to every 10 files (#4) - Fix test cleanup for .ragcode dir in TempDir
- Removed the 'State' field ('starting', 'running', 'completed', 'failed') from IndexStatus entirely.
- This state was misleading for AI consumers, especially during incremental re-indexing (which reset state to 'starting' even if the index was 99% complete), causing AI agents to prematurely abandon tools.
- Simplified engine.go progress callbacks and terminal states to only log timestamps and errors, rather than a potentially confusing overall state keyword.
- Updated related tests to match the simplified struct.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (2)
internal/service/tools/call_hierarchy.go:136
- The check
if idx != nilat line 133 is now semantically incorrect. With the old in-memoryprogressStore,GetIndexProgressreturned nil when no indexing was active. With the new file-based system,GetIndexStatusreads from disk and returns non-nil even for a completed previous run (the file persists). This means the response will always beindexing_in_progresswhen no collections exist but a previous run left anindex_status.jsonfile — even if indexing completed hours ago.
This needs to check the actual state (e.g., EndedAt is empty and Error is empty) to determine if indexing is truly in progress. Without a State field on IndexStatus, you'd need something like: if idx != nil && idx.EndedAt == "" && idx.Error == "".
if idx != nil {
resp.Status = "indexing_in_progress"
resp.Data = map[string]any{"indexing": idx}
}
internal/service/tools/find_usages.go:105
- Same issue as in
call_hierarchy.go:if idx != nilat line 103 will now be true even for a completed previous indexing run (the file persists on disk), incorrectly changing the status fromindexing_requiredtoindexing_in_progress. This check needs to verify that indexing is actually ongoing (e.g.,idx.EndedAt == "").
if idx != nil {
resp.Status = "indexing_in_progress"
}
BUG-001 (list_package_exports): normalize full import path to short package name before querying Qdrant. The index stores 'indexer', not 'github.com/doITmagic/rag-code-mcp/pkg/indexer'. BUG-003 (Go parser): go/doc automatically moves constructor/loader functions (NewX, LoadX) that return *T from docPkg.Funcs into docPkg.Types[T].Funcs. The parser only iterated typ.Methods, so these functions were silently dropped and never written to the vector index. Fix: add a typ.Funcs loop in AnalyzePackage() after the methods loop. Affected symbols confirmed missing from Qdrant before fix: LoadIndexStatus, NewService, NewState, LoadState (pkg/indexer) Tests: expanded analyzer_test.go to use real pkg/indexer code as fixture with expectations anchored to the Qdrant DB snapshot (25 points, 2026-03-09). Added regression tests for BUG-003, IsPublic correctness, signature accuracy, and line coverage.
Engine (BUG-004): - StartIndexingAsync now queues recreate=true as pendingOverflow when a job is already running, instead of silently dropping the request - Fix all flaky engine test cleanups: properly wait for background goroutines from BOTH engine instances with time.Sleep before TempDir removal - Add tests: TestStartIndexingAsyncRecreateQueues/StartsImmediately Python parser (treesitter.go): - Add patchExceptAs workaround for gotreesitter v0.6.0 broken AST on except-as - Extract module-level variables/constants (extractAssignment/extractAssignmentDirect) - Extract class variables from class body blocks - Extract function/method calls for Code Graph relations (rag_find_usages) - Detect generators via nodeContainsType(yield) - Parse metaclass= keyword arguments in class bases - Refactor docstring extraction with stripDocstringQuotes helper - Handle gotreesitter putting string nodes directly in blocks (no wrapper) Python parser (extract.go): - Refactor getIndentation to use tagged switch
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (2)
internal/service/tools/call_hierarchy.go:136
- Same bug as in
find_usages.go:idxis loaded from the persistedindex_status.jsonfile. Once any indexing run has completed, the file exists andidx != nilis always true — causing the response status to be incorrectly set to"indexing_in_progress"even when no indexing job is active.
The condition on line 133 should check whether indexing is actually in progress (e.g., via ActiveIndexingJobs() or checking idx.EndedAt == "") rather than just checking file existence.
idx := t.engine.GetIndexStatus(wctx.Root)
visited := make(map[string]bool)
rootNode := &CallNode{Name: symbolName}
// Try to find root symbol info
rootRes := t.findSymbolInfo(ctx, wctx.ID, symbolName)
if rootRes != nil {
rootNode.Type, _ = rootRes.Point.Payload["type"].(string)
rootNode.FilePath, _ = rootRes.Point.Payload["file_path"].(string)
rootNode.Package, _ = rootRes.Point.Payload["package"].(string)
} else {
// If nothing is indexed yet, ExactSearchPolyglot will return ErrNoCollectionsFound.
// Signal indexing status instead of returning an empty hierarchy.
_, sErr := t.engine.ExactSearchPolyglot(ctx, wctx.ID, map[string]interface{}{"name": symbolName}, 1)
var noCollections *engine.ErrNoCollectionsFound
if errors.As(sErr, &noCollections) {
resp := ToolResponse{
Status: "indexing_required",
Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete call hierarchy results.", wctx.Root),
Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
}
if idx != nil {
resp.Status = "indexing_in_progress"
resp.Data = map[string]any{"indexing": idx}
}
internal/service/tools/find_usages.go:105
- Bug:
idxis now loaded from the persistedindex_status.jsonfile viaGetIndexStatus(). Unlike the oldGetIndexProgress()which returned non-nil only when an active in-memory job was running, the file persists after indexing completes. This meansidx != nilwill always be true once any indexing run has occurred, causing the status to incorrectly change from"indexing_required"to"indexing_in_progress"even when indexing finished long ago.
To fix: either check if an indexing job is currently active (via ActiveIndexingJobs() or indexingJobs.Load(wctx.ID)), or check a specific field on the status (e.g., s.EndedAt == "") before treating it as "in progress".
idx := t.engine.GetIndexStatus(wctx.Root)
allResults, err := t.engine.ExactSearchPolyglot(ctx, wctx.ID, filter, 100)
if err != nil {
var noCollections *engine.ErrNoCollectionsFound
if errors.As(err, &noCollections) {
resp := ToolResponse{
Status: "indexing_required",
Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete results.", wctx.Root),
Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
}
if idx != nil {
resp.Status = "indexing_in_progress"
}
This addresses issues where indexing large files (e.g., barou.sql) caused the host system to freeze due to host CPU/GPU starvation and excessive GC pressure. - Fix Ollama throttling bug in indexer service by correctly using a 150ms delay instead of 10ms. - Prevent GC thrashing in treesitter parser by evaluating byte sizes instead of allocating strings for every AST node. - Truncate massive leaf nodes (>8KB) to prevent crashing the Ollama embedding API.
This addresses issues where indexing large files (e.g., barou.sql) caused the host system to freeze due to host CPU/GPU starvation and excessive GC pressure. - Fix Ollama throttling bug in indexer service by correctly using a 150ms delay instead of 10ms. - Prevent GC thrashing in treesitter parser by evaluating byte sizes instead of allocating strings for every AST node. - Truncate massive leaf nodes (>8KB) to prevent crashing the Ollama embedding API.
Export IsInvalidRoot from the watch package and apply it as a safety check at the very start of StartIndexingAsync, before any job registration or SaveIndexStatus call. This prevents accidental indexing of dangerous paths such as the user home directory (~), filesystem root (/), or /tmp — which would cause .ragcode/index_status.json to be written outside any real workspace. - pkg/workspace/watch: isInvalidRoot → IsInvalidRoot (exported + docstring) - internal/service/engine: guard added as first check in StartIndexingAsync
Critical fixes: - Populate IndexingStatus in tool responses (was nil) for ListSkillsTool, InstallSkillTool, EvaluateRagCodeTool, ReadFileContextTool, SmartSearchTool, ListPackageExportsTool — use ContextFromWorkspaceWithStatus consistently - fix(engine): preserve Languages map during incremental indexing in StartIndexingAsync (was overwriting with empty object) - fix(engine): extract finalizeIndexStatus helper to eliminate duplicated EndedAt/Elapsed/Error finalization logic in success and error branches - fix(engine): Progress callback — eliminate LoadIndexStatus (disk read + JSON unmarshal) on every tick; keep single *IndexStatus in-memory and only call SaveIndexStatus (atomic write) for disk flush every 10 files - fix(indexer): SaveIndexStatus uses atomic write-to-temp-then-rename to prevent concurrent readers seeing partial JSON Hidden from AI consumers: - LangStatus.Changed field now json:"-" — AI sees only on_disk and processed Cleanup: - smart_search_pipeline.go: fix extra blank lines and restore missing return statement after buildResponseMeta refactor - treesitter.go: replace invalid issues/TBD link with descriptive comment - watcher.go: clarify IsInvalidRoot doc comment (~ is not expanded by filepath.Clean; rejection is via os.UserHomeDir()) - BUGS.md: mark BUG-003 as Fixed (PR #40) - SUGGESTIONS.md: translate to English, update with current State-field status - analyzer_test.go: remove stale Qdrant DB snapshot references from comments - extract.go: fix getIndentation break → return to exit for-loop Tests: - analyzer_test.go: relax exact line number assertions to > 0 - treesitter_test.go: add 7 new tests for patchExceptAs, call extraction (Code Graph), module-level vars/constants, class vars, IsGenerator - treesitter.go: fix extractClassVarsFromBlock to handle assignment nodes placed directly in block without expression_statement wrapper
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 35 out of 36 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
internal/service/tools/find_usages.go:105
- In the ErrNoCollectionsFound branch, resp.Status is switched to "indexing_in_progress" whenever idx != nil. With the new file-based IndexStatus, idx will be non-nil for any workspace that has ever written index_status.json (even if indexing already completed), so this can misreport "indexing_in_progress". Consider keying this on a real in-progress signal (e.g., idx.EndedAt == "" / idx.Error == "" or checking Engine.ActiveIndexingJobs for wctx.ID) instead of mere file existence.
idx := t.engine.GetIndexStatus(wctx.Root)
allResults, err := t.engine.ExactSearchPolyglot(ctx, wctx.ID, filter, 100)
if err != nil {
var noCollections *engine.ErrNoCollectionsFound
if errors.As(err, &noCollections) {
resp := ToolResponse{
Status: "indexing_required",
Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete results.", wctx.Root),
Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
}
if idx != nil {
resp.Status = "indexing_in_progress"
}
internal/service/tools/call_hierarchy.go:136
- The "indexing_in_progress" status is currently set whenever idx != nil, but IndexStatus will be non-nil for any workspace that has an index_status.json from a previous run. This can incorrectly report indexing as in progress when indexing is actually completed (or stale). Consider switching this condition to something that reflects an active run (e.g., idx.EndedAt == "" and idx.Error == "" / checking Engine.ActiveIndexingJobs for wctx.ID).
idx := t.engine.GetIndexStatus(wctx.Root)
visited := make(map[string]bool)
rootNode := &CallNode{Name: symbolName}
// Try to find root symbol info
rootRes := t.findSymbolInfo(ctx, wctx.ID, symbolName)
if rootRes != nil {
rootNode.Type, _ = rootRes.Point.Payload["type"].(string)
rootNode.FilePath, _ = rootRes.Point.Payload["file_path"].(string)
rootNode.Package, _ = rootRes.Point.Payload["package"].(string)
} else {
// If nothing is indexed yet, ExactSearchPolyglot will return ErrNoCollectionsFound.
// Signal indexing status instead of returning an empty hierarchy.
_, sErr := t.engine.ExactSearchPolyglot(ctx, wctx.ID, map[string]interface{}{"name": symbolName}, 1)
var noCollections *engine.ErrNoCollectionsFound
if errors.As(sErr, &noCollections) {
resp := ToolResponse{
Status: "indexing_required",
Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete call hierarchy results.", wctx.Root),
Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
}
if idx != nil {
resp.Status = "indexing_in_progress"
resp.Data = map[string]any{"indexing": idx}
}
- index_status.Processed was being overwritten with session-only doneFiles, causing incremental re-indexes to reset the counter (e.g. 232 → 1) - Now captures baseProcessed before each language run and accumulates: Processed = baseProcessed + doneFiles - Full re-index (recreate=true or totalFiles >= diskTotal) resets base to 0 - Register JavaScript parser in daemon imports
- fix(engine): propagate recreate=true through pendingOverflow queue When a recreate request is queued while indexing, tryStartPendingIndex now correctly passes recreate=true to the follow-up job. - fix(index_status): Windows-safe atomic rename fallback os.Rename fails on Windows when dest exists; now falls back to remove-then-rename. - fix(docs/treesitter): avoid materializing huge strings for massive leaf nodes — slice source bytes directly when nodeLen > 8192. - cleanup(tests): remove empty t.Cleanup block in fallback search test.
Description
Replaces the old, complex
progressStoreindexing progress mechanism with a simple file-basedIndexStatussystem.Problem
The previous system had two separate file walks (one for counting, one for indexing), an in-memory
progressStorewithpreRegister/update/carry-over/flusherlogic, and multiple intermediate structs (IndexingProgressSummary,LangProgressItem). This made progress reporting confusing, inaccurate, and hard to maintain.Solution
{workspaceRoot}/.ragcode/index_status.json— a simple JSON file written by the indexer during processingProgresscallback inpkg/indexer/service.gowritesOnDisk,Changed,Processedcounts per language directly to the status fileindexer.LoadIndexStatus()— no intermediate transformationsstarting→running(first file processed) →completed/failedWhat was removed
progressStore(preRegister, update, carry-over, flusher goroutine)IndexingProgressSummary,LangProgressItem,BuildIndexingProgress,formatAge,buildIndexingMessageSearchCode/HybridSearchCode(redundant withDetectContextwhich already triggers re-indexing on first tool call)resumeAttemptsfield from Engine structWhat was added
pkg/indexer/index_status.go—IndexStatus,LangStatus,SaveIndexStatus,LoadIndexStatuspkg/indexer/index_status_test.go— round-trip and missing file testsengine.IndexWorkspaceFiles changed (22 files, +208 / -1182 lines)
engine/index_progress.go→pkg/indexer/index_status.goengine.go,response.go,smart_search.go,smart_search_pipeline.go,index_workspace.go,list_package_exports.go,find_usages.go,call_hierarchy.go,evaluate_ragcode.go,skills.go,read_file_context.goengine_searchcode_test.go,engine_*_test.go,health_metrics_test.go,detector_test.goType of change
Checklist:
go fmt ./...go test ./...and they pass