-
Notifications
You must be signed in to change notification settings - Fork 2.9k
fix: race conditions in subtask delegation system #11379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
All previously flagged issues are resolved. No new issues found.
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues. |
daniel-lxs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is a well-reasoned fix. The root cause analysis is correct -- saveClineMessages rebuilding the full HistoryItem (including status) via taskMetadata() was clobbering delegation state transitions. The five fixes address distinct failure modes coherently.
A few observations:
- The double-write pattern (globalState + per-task file) could diverge, but since
getTaskWithIdalways merges the file on top, it's self-healing on the next read. Acceptable tradeoff. - The early clear of
delegationInProgressinreopenParentFromDelegationbeforeresumeAfterDelegation()is a nice detail -- avoids deadlock when the resumed parent immediately re-delegates. A brief inline comment explaining why it's not just relying on thefinallywould help future readers (the existing comment is good but could be slightly more explicit about thefinallyalso setting it). - Note: PR has merge conflicts to resolve.
- Expand inline comment in reopenParentFromDelegation explaining why delegationInProgress is cleared before resume and how the finally block acts as a safety net on error paths. - Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>) so DELEGATION_META_KEYS cannot silently drift from the interface. - Remove unused initialStatus from CreateTaskOptions (packages/types). The field was never consumed; delegation status is persisted via saveDelegationMeta / updateTaskHistory instead.
Two issues found in the current revision.
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues. |
hannesrudolph
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review: fix/delegation-race-conditions
Solid work addressing the core race conditions in the delegation system. The delegationInProgress mutex, per-task delegation metadata files, retry-once pattern in AttemptCompletionTool, and the pWaitFor timeout are all well-motivated improvements.
Summary of findings
| Priority | Count | Description |
|---|---|---|
| P1 | 1 | Unsafe partial HistoryItem cast |
| P2 | 3 | Value validation gap, silent metadata failure, missing test coverage |
| P3 | 1 | Minor redundancy |
All 7 previously-raised threads are resolved and not repeated here.
- Remove initialStatus from taskMetadata rebuild to prevent saveClineMessages from clobbering delegation status on every save - Add per-task delegation metadata files (delegationMeta.ts) as cross-process-safe source of truth using safeWriteJson with proper-lockfile - Add delegationInProgress reentrancy guard to prevent UI actions from interleaving with async delegation transitions - Wrap child.start() in try/catch with parent status repair on failure - Cancel debouncedEmitTokenUsage in Task.dispose() to prevent zombie callbacks
FileContextTracker already owns task_metadata.json for files_in_context data. Using the same file for delegation metadata causes mutual data destruction on full overwrites. Add a separate delegationMetadata entry to GlobalFileNames and update delegationMeta read/write to use it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The delegation guard flag was held throughout the parent's resumeAfterDelegation() await. If the resumed parent immediately used new_task, delegateParentAndOpenChild would fail with "Delegation already in progress". Clear the flag before resuming since the metadata transition is complete at that point. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove initialStatus field from TaskOptions and Task class (dead after taskMetadata() no longer consumes it) - Remove initialStatus from callers in ClineProvider.ts (createTaskWithHistoryItem, delegateParentAndOpenChild) - Update comment blocks in delegateParentAndOpenChild and reopenParentFromDelegation to reflect new status-setting mechanism (explicit updateTaskHistory call instead of initialStatus in taskMetadata) - Update test assertion in provider-delegation.spec.ts
- Expand inline comment in reopenParentFromDelegation explaining why delegationInProgress is cleared before resume and how the finally block acts as a safety net on error paths. - Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>) so DELEGATION_META_KEYS cannot silently drift from the interface. - Remove unused initialStatus from CreateTaskOptions (packages/types). The field was never consumed; delegation status is persisted via saveDelegationMeta / updateTaskHistory instead.
Addresses review feedback: showTaskWithId, deleteTaskWithId, and cancelTask now show an informational message when skipped during delegation, instead of silently dropping the user action with only a log call.
- Move pushToolResult after successful delegation (Fix 1) - Add retry + parent status repair on delegation failure (Fix 2) - Pass skipDelegationRepair to removeClineFromStack (Fix 3) - Re-read parent metadata before write to fix TOCTOU race (Fix 4) - Fix misleading "Failed to get history" error message (Fix 5)
…errors catchable - Change undefined to null for cleared delegation fields in saveDelegationMeta calls so JSON.stringify preserves them and getTaskWithId can clear stale globalState values - Update DelegationMeta type to allow null for clearable fields - Replace individual field guards in getTaskWithId with Object.entries loop that converts null → undefined when applying to historyItem - Change Task.start() return type to Promise<void> | void so callers can attach .catch() handlers for async errors - Replace unreachable try/catch in delegateParentAndOpenChild with .catch() on the fire-and-forget start Promise - Update test mock to return Promise.resolve() from start()
1d050ae to
03d4b44
Compare
| }) | ||
| await provider.persistDelegationMeta(parentTaskId, { | ||
| status: "active", | ||
| awaitingChildId: undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awaitingChildId: undefined here will be stripped by JSON.stringify inside safeWriteJson, so the delegation meta file won't contain the key at all. If the file previously had awaitingChildId set (from the delegation), this repair path cannot clear it. After an extension restart (globalState loss), getTaskWithId merges the stale file value back onto the history item. All other repair paths in ClineProvider.ts (lines 527, 3472, 3683) correctly use null for this field. This should be null to match.
| awaitingChildId: undefined, | |
| awaitingChildId: null, |
Fix it with Roo Code or mention @roomote and request a fix.
…cy, test coverage - Add vscode.window.showWarningMessage when delegation metadata save fails (#3) - Hoist globalStoragePath in delegateParentAndOpenChild to avoid duplication (#5) - Change awaitingChildId: undefined to null for JSON serialization (#11) - Add 3 delegation flow tests: retry success, parent repair, repair failure (#4)
Summary
Comprehensive fix for race conditions and error handling gaps in the subtask delegation system. Addresses multiple failure modes that could leave parent tasks permanently stuck in "delegated" status, causing nested subtasks to hang.
Changes
Fix 1: Remove status from taskMetadata rebuild
taskMetadata.ts: RemovedinitialStatusparameter and its spread into the outputHistoryItemTask.ts: RemovedinitialStatusfrom thetaskMetadata()call insaveClineMessages()ClineProvider.ts: Added explicitupdateTaskHistorycall to set child initial "active" status after creationFix 2: Persist delegation metadata to per-task files
delegationMeta.ts:readDelegationMeta()/saveDelegationMeta()usingsafeWriteJson(cross-process atomic writes)delegation_metadata.jsonfile (avoids collision withFileContextTracker)getTaskWithId(): Merges per-task delegation file ontohistoryItemafter reading fromglobalStateRecord<keyof Required<DelegationMeta>, true>nullinstead ofundefinedfor cleared fields (survives JSON serialization)Fix 3: Add delegation-in-progress guard
delegateParentAndOpenChild()andreopenParentFromDelegation()wrapped intry/finallythat sets/clearsdelegationInProgressshowTaskWithId(),cancelTask(),deleteTaskWithId()return early with user notification whendelegationInProgressis truedelegationInProgressbeforeresumeAfterDelegation()to prevent deadlock in nested chainsFix 4: Error handling on child.start()
Task.start()now returnsPromise<void> | voidso callers can attach.catch()handlersdelegateParentAndOpenChild()attaches.catch()on the fire-and-forget Promise for parent status repairFix 5: Cancel debounce in dispose()
this.debouncedEmitTokenUsage.cancel()inTask.dispose()Fix 6: Delegation failure recovery in AttemptCompletionTool
pushToolResultafter successfulreopenParentFromDelegation(was premature)awaitingChildIdpushToolResultbug: returnstruefrom catch block to prevent fallthroughFix 7: TOCTOU race fixes in reopenParentFromDelegation
skipDelegationRepair: truetoremoveClineFromStack()when closing childupdateTaskHistorywrite (eliminates stale-snapshot overwrites)Fix 8: Harden against abort races and infinite hangs
presentAssistantMessage()after lock release.catch(() => {})on all 11 fire-and-forgetpresentAssistantMessage()callspWaitFor(userMessageContentReady)with diagnostic loggingpushToolResultinNewTaskTool(reopenParentFromDelegation handles injection)Test Plan
delegationMeta.spec.ts— 9 tests for read/write/backward-compat/key-filteringprovider-delegation.spec.ts— updated assertions for new delegation flowhistory-resume-delegation.spec.ts— 11 tests with delegation metadata supportremoveClineFromStack-delegation.spec.ts— 8 tests with delegation metadata mocknewTaskTool.spec.ts— updated to reflect removed orphaned pushToolResultImportant
Refactored task delegation system to persist delegation metadata separately from task metadata, remove initialStatus from task creation, and add error recovery mechanisms for failed delegations.
Delegation Metadata Persistence
delegationMetamodule (src/core/task-persistence/delegationMeta.ts):DelegationMetainterface to track delegation status, delegated recipient, child task relationships, and completion information.readDelegationMeta()andsaveDelegationMeta()functions for per-task delegation metadata storage with validation and error handling.ClineProvider:delegationInProgressflag to prevent concurrent operations during delegation.delegateParentAndOpenChild()to persist delegation metadata to both globalState and per-task files, with error recovery if child startup fails.reopenParentFromDelegation()to persist delegation metadata for both parent and child tasks.getTaskWithId().persistDelegationMeta()method to expose metadata persistence through the provider interface.Task Creation and Status Management
initialStatusparameter:initialStatusproperty fromCreateTaskOptionsinterface inpackages/types/src/task.ts.initialStatusfromTaskMetadataOptionstype andtaskMetadata()function insrc/core/task-persistence/taskMetadata.ts.initialStatusfromTaskOptionsinterface andTaskclass constructor insrc/core/task/Task.ts.Error Recovery and Delegation Failure Handling
AttemptCompletionTool:pushToolResultcall to execute only after successful delegation.presentAssistantMessage():Promise-Based Task Startup
Task.start()method:voidtoPromise<void> | voidand made it return the result ofstartTask.pWaitForcall inrecursivelyMakeClineRequestswith 60 second timeout and error logging.debouncedEmitTokenUsagein cleanup/abort logic to prevent zombie callbacks.delegateParentAndOpenChild()inClineProvider:Tool Updates
NewTaskTool:delegateParentAndOpenChildcall.pushToolResultcall that was reflecting delegation in tool result (now handled byreopenParentFromDelegation).DelegationProviderinterface:updateTaskHistory()method for updating task history with optional broadcast flag.persistDelegationMeta()method for persisting delegation metadata.Test Updates
newTaskTool.spec.ts:pushToolResultcalls instead of calls with "Delegated to child task" messages.provider-delegation.spec.ts,history-resume-delegation.spec.ts, andremoveClineFromStack-delegation.spec.ts:delegationMetamodule withreadDelegationMetaandsaveDelegationMetafunctions.contextProxyproperty withglobalStorageUrito provider mocks.createTaskcall expectations to removeinitialStatusparameter.updateTaskHistorycall count expectations and assertions.src/core/task-persistence/__tests__/delegationMeta.spec.ts:readDelegationMeta()andsaveDelegationMeta()functions.Configuration
globalFileNames.ts:delegationMetadataproperty with value"delegation_metadata.json"for global delegation metadata storage.This description was created by
for 1d050ae. You can customize this summary. It will automatically update as commits are pushed.