@hannesrudolph
Collaborator

@hannesrudolph hannesrudolph commented Feb 10, 2026

Summary

Comprehensive fix for race conditions and error handling gaps in the subtask delegation system. Addresses multiple failure modes that could leave parent tasks permanently stuck in "delegated" status, causing nested subtasks to hang.

Changes

Fix 1: Remove status from taskMetadata rebuild

  • taskMetadata.ts: Removed initialStatus parameter and its spread into the output HistoryItem
  • Task.ts: Removed initialStatus from the taskMetadata() call in saveClineMessages()
  • ClineProvider.ts: Added explicit updateTaskHistory call to set child initial "active" status after creation

Fix 2: Persist delegation metadata to per-task files

  • New delegationMeta.ts: readDelegationMeta() / saveDelegationMeta() using safeWriteJson (cross-process atomic writes)
  • Uses separate delegation_metadata.json file (avoids collision with FileContextTracker)
  • getTaskWithId(): Merges per-task delegation file onto historyItem after reading from globalState
  • Compile-time key sync guard via Record<keyof Required<DelegationMeta>, true>
  • Uses null instead of undefined for cleared fields (survives JSON serialization)
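
A minimal sketch of the compile-time key guard and the unknown-key filtering described in Fix 2. The field `delegatedToId` and the helper `pickKnownKeys` are hypothetical names for illustration; the actual `DelegationMeta` interface in the PR may have different fields, and this sketch does not validate values.

```typescript
// Hypothetical shape: only `status` and `awaitingChildId` are named in the PR text.
interface DelegationMeta {
  status?: "active" | "delegated" | "completed";
  awaitingChildId?: string | null; // null means "explicitly cleared"
  delegatedToId?: string | null; // assumed field name
}

type DelegationMetaKey = keyof Required<DelegationMeta>;

// Compile-time guard: adding a field to DelegationMeta without listing it here,
// or listing a key that is not on the interface, fails type checking.
const DELEGATION_META_KEY_MAP: Record<DelegationMetaKey, true> = {
  status: true,
  awaitingChildId: true,
  delegatedToId: true,
};

const DELEGATION_META_KEYS = Object.keys(DELEGATION_META_KEY_MAP) as DelegationMetaKey[];

// Keeps only known keys from parsed JSON; values are passed through unchecked.
function pickKnownKeys(raw: unknown): Partial<Record<DelegationMetaKey, unknown>> {
  const out: Partial<Record<DelegationMetaKey, unknown>> = {};
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return out;
  }
  const record = raw as Record<string, unknown>;
  for (const key of DELEGATION_META_KEYS) {
    if (key in record) {
      out[key] = record[key];
    }
  }
  return out;
}
```

Deriving the runtime key list from the `Record`-typed map is one way to keep the two in sync: the list cannot silently drift from the interface because the map itself stops compiling first.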

Fix 3: Add delegation-in-progress guard

  • delegateParentAndOpenChild() and reopenParentFromDelegation() wrapped in try/finally that sets/clears delegationInProgress
  • showTaskWithId(), cancelTask(), deleteTaskWithId() return early with user notification when delegationInProgress is true
  • Early clear of delegationInProgress before resumeAfterDelegation() to prevent deadlock in nested chains
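
The guard in Fix 3 could be sketched roughly as follows. `DelegationGuard`, `run`, and `tryUiAction` are hypothetical names, not the provider's API. One deliberate difference from a plain `finally` reset: this sketch tracks whether the flag was already released, so the safety net cannot clear a flag that a nested delegation has since set.

```typescript
class DelegationGuard {
  private delegationInProgress = false;

  get busy(): boolean {
    return this.delegationInProgress;
  }

  // Runs a metadata transition under the flag. The callback receives a
  // release function so it can clear the flag before a long-running resume step.
  async run<T>(transition: (releaseEarly: () => void) => Promise<T>): Promise<T> {
    if (this.delegationInProgress) {
      throw new Error("Delegation already in progress");
    }
    this.delegationInProgress = true;
    let released = false;
    const release = (): void => {
      if (!released) {
        released = true;
        this.delegationInProgress = false;
      }
    };
    try {
      return await transition(release);
    } finally {
      release(); // safety net for error paths; no-op after an early release
    }
  }

  // UI actions return early with a notification instead of interleaving.
  tryUiAction(name: string, notify: (message: string) => void): boolean {
    if (this.delegationInProgress) {
      notify(`${name} skipped: delegation in progress`);
      return false;
    }
    return true;
  }
}
```

A caller in the style of `reopenParentFromDelegation()` would invoke `releaseEarly()` once the metadata transition is written and only then await the resume step, so a resumed parent that immediately delegates again is not rejected.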

Fix 4: Error handling on child.start()

  • Task.start() now returns Promise<void> | void so callers can attach .catch() handlers
  • delegateParentAndOpenChild() attaches .catch() on the fire-and-forget Promise for parent status repair
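
To illustrate why Fix 4 needs the return-type change: a `try/catch` around a call that is not awaited only sees synchronous throws, so an async failure inside the started task never reaches it. The sketch below assumes a hypothetical helper `startChildWithRepair`; the real code attaches `.catch()` inline in `delegateParentAndOpenChild()`.

```typescript
type StartResult = Promise<void> | void;

// Starts the child and routes both sync and async failures to repairParent.
// The returned promise exists only so callers and tests can observe completion;
// production code would typically discard it (fire-and-forget).
function startChildWithRepair(
  start: () => StartResult,
  repairParent: (error: unknown) => Promise<void> | void,
): Promise<void> {
  let started: StartResult;
  try {
    started = start();
  } catch (error) {
    // Synchronous throw from start() itself.
    return Promise.resolve(repairParent(error));
  }
  // Promise.resolve() normalizes the `void` case, so .catch() is always available.
  return Promise.resolve(started).catch((error: unknown) => repairParent(error));
}
```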

Fix 5: Cancel debounce in dispose()

  • Added this.debouncedEmitTokenUsage.cancel() in Task.dispose()
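
A small sketch of why the cancel matters. The hand-rolled `debounce` below is a stand-in for whatever debounce utility the project uses (assumed to expose a `cancel()` method, as the PR text implies); `TaskSketch` is a hypothetical class, not the real `Task`.

```typescript
// Trailing-edge debounce with cancel() and pending(), written out so the
// example is self-contained.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const call = (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
  const cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };
  const pending = (): boolean => timer !== undefined;
  return Object.assign(call, { cancel, pending });
}

class TaskSketch {
  emitted = 0;
  private readonly debouncedEmitTokenUsage = debounce(() => {
    this.emitted += 1;
  }, 50);

  onTokenUsage(): void {
    this.debouncedEmitTokenUsage();
  }

  hasPendingEmit(): boolean {
    return this.debouncedEmitTokenUsage.pending();
  }

  dispose(): void {
    // Without this, the timer would still fire after disposal and emit
    // from a task that no longer exists (a "zombie callback").
    this.debouncedEmitTokenUsage.cancel();
  }
}
```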

Fix 6: Delegation failure recovery in AttemptCompletionTool

  • Moved pushToolResult after successful reopenParentFromDelegation (was premature)
  • Added retry logic: retries delegation once after 500ms delay
  • Added parent status repair on failure: restores parent to "active", clears awaitingChildId
  • Fixed double pushToolResult bug: returns true from catch block to prevent fallthrough
  • Updated misleading error message from "Failed to get history" to "Delegation failed"
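
The control flow in Fix 6 might look roughly like this. `DelegationDeps`, `delegateToParentSketch`, the success message, and the `retryDelayMs` parameter are assumptions for illustration; only the retry-once, repair-on-failure, and return-`true` behaviors come from the description above.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

interface DelegationDeps {
  reopenParent: () => Promise<void>; // stands in for reopenParentFromDelegation
  repairParent: () => Promise<void>; // restore "active", clear awaitingChildId
  pushToolResult: (message: string) => void;
  retryDelayMs?: number; // 500 in the PR; overridable here for tests
}

// Returns true when the tool call has been fully handled, so the caller
// must not fall through to the normal completion flow.
async function delegateToParentSketch(deps: DelegationDeps): Promise<boolean> {
  const delay = deps.retryDelayMs ?? 500;
  try {
    try {
      await deps.reopenParent();
    } catch {
      // First failure: wait, then retry exactly once.
      await sleep(delay);
      await deps.reopenParent();
    }
  } catch (error) {
    try {
      await deps.repairParent();
    } catch {
      // Repair is best effort; a real implementation would log this.
    }
    const reason = error instanceof Error ? error.message : String(error);
    deps.pushToolResult(`Delegation failed: ${reason}`);
    return true; // exactly one tool result was pushed; stop here
  }
  // Pushed only after delegation succeeded (hypothetical message text).
  deps.pushToolResult("Result delivered to parent task");
  return true;
}
```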

Fix 7: TOCTOU race fixes in reopenParentFromDelegation

  • Passes skipDelegationRepair: true to removeClineFromStack() when closing child
  • Fresh re-read of parent metadata before updateTaskHistory write (eliminates stale-snapshot overwrites)
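
The stale-snapshot problem in Fix 7 can be shown with an in-memory store. `HistoryStore`, `HistoryItemSketch`, and `reopenParentSketch` are hypothetical; the real code reads and writes through the provider's task history.

```typescript
interface HistoryItemSketch {
  id: string;
  status: string;
  awaitingChildId?: string;
  tokensIn: number;
}

class HistoryStore {
  private readonly items = new Map<string, HistoryItemSketch>();

  get(id: string): HistoryItemSketch | undefined {
    const item = this.items.get(id);
    return item ? { ...item } : undefined; // copy, so callers hold snapshots
  }

  set(item: HistoryItemSketch): void {
    this.items.set(item.id, { ...item });
  }
}

async function reopenParentSketch(
  store: HistoryStore,
  parentId: string,
  closeChild: () => Promise<void>,
): Promise<void> {
  // Other writers may update the parent while this await is pending, so a
  // snapshot taken before it would be stale by the time it is written back.
  await closeChild();

  // Re-read immediately before the write and change only the delegation fields.
  const fresh = store.get(parentId);
  if (!fresh) {
    throw new Error(`Parent task ${parentId} not found`);
  }
  store.set({ ...fresh, status: "active", awaitingChildId: undefined });
}
```

Because there is no `await` between the re-read and the write, nothing else in the same process can interleave there; cross-process safety would still need the file-level locking mentioned under Fix 2.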

Fix 8: Harden against abort races and infinite hangs

  • Added abort early-exit in presentAssistantMessage() after lock release
  • Added .catch(() => {}) on all 11 fire-and-forget presentAssistantMessage() calls
  • Added 60s timeout on pWaitFor(userMessageContentReady) with diagnostic logging
  • Removed orphaned pushToolResult in NewTaskTool (reopenParentFromDelegation handles injection)
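
The timeout in Fix 8 can be approximated with a generic helper built on `Promise.race`. This is a stand-in sketch, not the PR's code: `withTimeout` is a hypothetical name, and the PR applies the timeout to its `pWaitFor(...)` call rather than wrapping it like this.

```typescript
// Rejects with a descriptive error if `work` has not settled within `ms`.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms}ms`));
    }, ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so a fast result does not leave it running.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A caller would catch the rejection, log the diagnostic, and abort the wait instead of hanging indefinitely, for example `await withTimeout(ready, 60_000, "userMessageContentReady")` where `ready` is whatever promise signals readiness.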

Test Plan

  • delegationMeta.spec.ts — 9 tests for read/write/backward-compat/key-filtering
  • provider-delegation.spec.ts — updated assertions for new delegation flow
  • history-resume-delegation.spec.ts — 11 tests with delegation metadata support
  • removeClineFromStack-delegation.spec.ts — 8 tests with delegation metadata mock
  • newTaskTool.spec.ts — updated to reflect removed orphaned pushToolResult
  • All delegation tests pass (37/37)
  • Full test suite: 5557 passed

Important

Refactored task delegation system to persist delegation metadata separately from task metadata, remove initialStatus from task creation, and add error recovery mechanisms for failed delegations.

Delegation Metadata Persistence

  • New delegationMeta module (src/core/task-persistence/delegationMeta.ts):
    • Introduced DelegationMeta interface to track delegation status, delegated recipient, child task relationships, and completion information.
    • Implemented readDelegationMeta() and saveDelegationMeta() functions for per-task delegation metadata storage with validation and error handling.
    • Added compile-time safeguard to keep interface keys synchronized with known keys set.
  • Updated ClineProvider:
    • Added delegationInProgress flag to prevent concurrent operations during delegation.
    • Modified delegateParentAndOpenChild() to persist delegation metadata to both globalState and per-task files, with error recovery if child startup fails.
    • Modified reopenParentFromDelegation() to persist delegation metadata for both parent and child tasks.
    • Added logic to merge per-task delegation metadata into task history items in getTaskWithId().
    • Added persistDelegationMeta() method to expose metadata persistence through the provider interface.

Task Creation and Status Management

  • Removed initialStatus parameter:
    • Removed initialStatus property from CreateTaskOptions interface in packages/types/src/task.ts.
    • Removed initialStatus from TaskMetadataOptions type and taskMetadata() function in src/core/task-persistence/taskMetadata.ts.
    • Removed initialStatus from TaskOptions interface and Task class constructor in src/core/task/Task.ts.
    • Task status is now persisted separately via delegation metadata instead of during initial creation.

Error Recovery and Delegation Failure Handling

  • Enhanced AttemptCompletionTool:
    • Added retry logic with 500ms delay on first delegation failure.
    • Implemented comprehensive error handling that repairs parent task status by resetting it to active state and clearing delegation metadata when both delegation attempts fail.
    • Updated error message to be more concise and indicate fallthrough to standalone completion.
    • Moved pushToolResult call to execute only after successful delegation.
  • Improved presentAssistantMessage():
    • Added early exit check after unlocking presentation lock to handle aborted tasks during tool execution.
    • Added catch handlers to recursive calls to suppress unhandled promise rejections.

Promise-Based Task Startup

  • Updated Task.start() method:
    • Changed return type from void to Promise<void> | void and made it return the result of startTask.
    • Added timeout handling to pWaitFor call in recursivelyMakeClineRequests with 60 second timeout and error logging.
    • Added cancellation of debouncedEmitTokenUsage in cleanup/abort logic to prevent zombie callbacks.
  • Updated delegateParentAndOpenChild() in ClineProvider:
    • Modified child task startup to handle promise-based return and implement error recovery.

Tool Updates

  • NewTaskTool:
    • Removed assignment of return value from delegateParentAndOpenChild call.
    • Removed pushToolResult call that was reflecting delegation in tool result (now handled by reopenParentFromDelegation).
    • Added comments explaining tool result injection behavior.
  • DelegationProvider interface:
    • Added updateTaskHistory() method for updating task history with optional broadcast flag.
    • Added persistDelegationMeta() method for persisting delegation metadata.

Test Updates

  • Updated test expectations in newTaskTool.spec.ts:
    • Changed 7 test assertions to expect no pushToolResult calls instead of calls with "Delegated to child task" messages.
  • Enhanced test mocks in provider-delegation.spec.ts, history-resume-delegation.spec.ts, and removeClineFromStack-delegation.spec.ts:
    • Added mocks for delegationMeta module with readDelegationMeta and saveDelegationMeta functions.
    • Added contextProxy property with globalStorageUri to provider mocks.
    • Updated createTask call expectations to remove initialStatus parameter.
    • Updated updateTaskHistory call count expectations and assertions.
  • New test file src/core/task-persistence/__tests__/delegationMeta.spec.ts:
    • Comprehensive test coverage for readDelegationMeta() and saveDelegationMeta() functions.
    • Tests for missing files, valid metadata parsing, filtering unknown keys, invalid JSON, and edge cases.

Configuration

  • Updated globalFileNames.ts:
    • Added delegationMetadata property with value "delegation_metadata.json" for global delegation metadata storage.

This description was created by Ellipsis for 1d050ae.

@dosubot dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files) and bug (Something isn't working) labels Feb 10, 2026
@roomote
Contributor

roomote bot commented Feb 10, 2026

All previously flagged issues are resolved. No new issues found.

  • File collision: saveDelegationMeta() and FileContextTracker.saveTaskMetadata() both do full overwrites to the same task_metadata.json file with incompatible data structures, causing mutual data loss. -- Fixed: delegation metadata now uses a dedicated delegation_metadata.json file.
  • Dead code in public API: initialStatus was removed from TaskOptions and the Task class but still existed in CreateTaskOptions. -- Fixed: initialStatus removed from CreateTaskOptions in packages/types/src/task.ts.
  • undefined delegation fields not surviving JSON round-trip: saveDelegationMeta writes undefined values (e.g. awaitingChildId: undefined) which JSON.stringify silently strips. -- Fixed: clearable fields now use null instead of undefined, and getTaskWithId converts null back to undefined on read.
  • child.start() error handling is unreachable: Task.start() is synchronous and fires startTask() without awaiting. -- Fixed: return value is now captured as startPromise and error handling is attached via .catch().
  • Double pushToolResult on delegation failure fallthrough: When both delegation attempts fail in delegateToParent, pushToolResult(toolError(...)) is called, then return false falls through to the normal completion ask flow. -- Fixed: changed to return true so the caller exits immediately after pushing the error tool result.
  • awaitingChildId: undefined in AttemptCompletionTool repair: persistDelegationMeta at line 219 passes awaitingChildId: undefined which gets stripped by JSON.stringify, failing to clear a previously-set value in the delegation file after extension restart. All other repair paths correctly use null. -- Fixed: changed to null.

@hannesrudolph hannesrudolph marked this pull request as draft February 10, 2026 20:58
Member

@daniel-lxs daniel-lxs left a comment

Overall this is a well-reasoned fix. The root cause analysis is correct -- saveClineMessages rebuilding the full HistoryItem (including status) via taskMetadata() was clobbering delegation state transitions. The five fixes address distinct failure modes coherently.

A few observations:

  • The double-write pattern (globalState + per-task file) could diverge, but since getTaskWithId always merges the file on top, it's self-healing on the next read. Acceptable tradeoff.
  • The early clear of delegationInProgress in reopenParentFromDelegation before resumeAfterDelegation() is a nice detail -- avoids deadlock when the resumed parent immediately re-delegates. A brief inline comment explaining why it's not just relying on the finally would help future readers (the existing comment is good but could be slightly more explicit about the finally also setting it).
  • Note: PR has merge conflicts to resolve.

hannesrudolph added a commit that referenced this pull request Feb 10, 2026
- Expand inline comment in reopenParentFromDelegation explaining why
  delegationInProgress is cleared before resume and how the finally
  block acts as a safety net on error paths.
- Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>)
  so DELEGATION_META_KEYS cannot silently drift from the interface.
- Remove unused initialStatus from CreateTaskOptions (packages/types).
  The field was never consumed; delegation status is persisted via
  saveDelegationMeta / updateTaskHistory instead.
@hannesrudolph hannesrudolph marked this pull request as ready for review February 11, 2026 00:04
@dosubot dosubot bot added the lgtm (This PR has been approved by a maintainer) label Feb 11, 2026
@roomote
Contributor

roomote bot commented Feb 11, 2026

Two issues found in the current revision.

  • undefined delegation fields not surviving JSON round-trip: saveDelegationMeta writes undefined values (e.g. awaitingChildId: undefined) which JSON.stringify silently strips. The delegation file cannot clear fields previously set in globalState, undermining its "source of truth" role.
  • child.start() error handling is unreachable: Task.start() is synchronous and fires startTask() without awaiting. The try/catch around child.start() in delegateParentAndOpenChild cannot catch async errors -- the parent-repair code inside it is dead code.

Collaborator Author

@hannesrudolph hannesrudolph left a comment

Code Review: fix/delegation-race-conditions

Solid work addressing the core race conditions in the delegation system. The delegationInProgress mutex, per-task delegation metadata files, retry-once pattern in AttemptCompletionTool, and the pWaitFor timeout are all well-motivated improvements.

Summary of findings

| Priority | Count | Description |
| --- | --- | --- |
| P1 | 1 | Unsafe partial HistoryItem cast |
| P2 | 3 | Value validation gap, silent metadata failure, missing test coverage |
| P3 | 1 | Minor redundancy |

All 7 previously-raised threads are resolved and not repeated here.

hannesrudolph and others added 2 commits February 11, 2026 09:56
- Remove initialStatus from taskMetadata rebuild to prevent saveClineMessages
  from clobbering delegation status on every save
- Add per-task delegation metadata files (delegationMeta.ts) as cross-process-safe
  source of truth using safeWriteJson with proper-lockfile
- Add delegationInProgress reentrancy guard to prevent UI actions from
  interleaving with async delegation transitions
- Wrap child.start() in try/catch with parent status repair on failure
- Cancel debouncedEmitTokenUsage in Task.dispose() to prevent zombie callbacks

FileContextTracker already owns task_metadata.json for files_in_context
data. Using the same file for delegation metadata causes mutual data
destruction on full overwrites. Add a separate delegationMetadata entry
to GlobalFileNames and update delegationMeta read/write to use it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
hannesrudolph and others added 9 commits February 11, 2026 09:56
The delegation guard flag was held throughout the parent's
resumeAfterDelegation() await. If the resumed parent immediately
used new_task, delegateParentAndOpenChild would fail with
"Delegation already in progress". Clear the flag before resuming
since the metadata transition is complete at that point.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Remove initialStatus field from TaskOptions and Task class (dead after
  taskMetadata() no longer consumes it)
- Remove initialStatus from callers in ClineProvider.ts
  (createTaskWithHistoryItem, delegateParentAndOpenChild)
- Update comment blocks in delegateParentAndOpenChild and
  reopenParentFromDelegation to reflect new status-setting mechanism
  (explicit updateTaskHistory call instead of initialStatus in taskMetadata)
- Update test assertion in provider-delegation.spec.ts

- Expand inline comment in reopenParentFromDelegation explaining why
  delegationInProgress is cleared before resume and how the finally
  block acts as a safety net on error paths.
- Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>)
  so DELEGATION_META_KEYS cannot silently drift from the interface.
- Remove unused initialStatus from CreateTaskOptions (packages/types).
  The field was never consumed; delegation status is persisted via
  saveDelegationMeta / updateTaskHistory instead.

Addresses review feedback: showTaskWithId, deleteTaskWithId, and cancelTask
now show an informational message when skipped during delegation, instead of
silently dropping the user action with only a log call.

- Move pushToolResult after successful delegation (Fix 1)
- Add retry + parent status repair on delegation failure (Fix 2)
- Pass skipDelegationRepair to removeClineFromStack (Fix 3)
- Re-read parent metadata before write to fix TOCTOU race (Fix 4)
- Fix misleading "Failed to get history" error message (Fix 5)

…errors catchable

- Change undefined to null for cleared delegation fields in saveDelegationMeta
  calls so JSON.stringify preserves them and getTaskWithId can clear stale
  globalState values
- Update DelegationMeta type to allow null for clearable fields
- Replace individual field guards in getTaskWithId with Object.entries loop
  that converts null → undefined when applying to historyItem
- Change Task.start() return type to Promise<void> | void so callers can
  attach .catch() handlers for async errors
- Replace unreachable try/catch in delegateParentAndOpenChild with .catch()
  on the fire-and-forget start Promise
- Update test mock to return Promise.resolve() from start()
@hannesrudolph hannesrudolph force-pushed the fix/delegation-race-conditions branch from 1d050ae to 03d4b44 Compare February 11, 2026 16:59
})
await provider.persistDelegationMeta(parentTaskId, {
status: "active",
awaitingChildId: undefined,
Contributor

awaitingChildId: undefined here will be stripped by JSON.stringify inside safeWriteJson, so the delegation meta file won't contain the key at all. If the file previously had awaitingChildId set (from the delegation), this repair path cannot clear it. After an extension restart (globalState loss), getTaskWithId merges the stale file value back onto the history item. All other repair paths in ClineProvider.ts (lines 527, 3472, 3683) correctly use null for this field. This should be null to match.

Suggested change
- awaitingChildId: undefined,
+ awaitingChildId: null,
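
The serialization behavior behind this suggestion can be reproduced in isolation. In the sketch below, `applyMeta` and the two type aliases are hypothetical stand-ins for the merge that `getTaskWithId` performs; only the `JSON.stringify` behavior is standard.

```typescript
type StoredMeta = { status?: string; awaitingChildId?: string | null };
type HistoryFields = { status?: string; awaitingChildId?: string };

// JSON.stringify omits properties whose value is undefined, but keeps null.
const withUndefined = JSON.stringify({ status: "active", awaitingChildId: undefined });
const withNull = JSON.stringify({ status: "active", awaitingChildId: null });

// Applies file metadata on top of a history item. A key that is present with
// null clears the field; a key that is absent leaves the old value in place.
function applyMeta(item: HistoryFields, meta: StoredMeta): HistoryFields {
  const merged: HistoryFields = { ...item };
  if ("status" in meta) {
    merged.status = meta.status;
  }
  if ("awaitingChildId" in meta) {
    merged.awaitingChildId = meta.awaitingChildId ?? undefined;
  }
  return merged;
}

const stale: HistoryFields = { status: "delegated", awaitingChildId: "child-1" };
const afterUndefined = applyMeta(stale, JSON.parse(withUndefined) as StoredMeta);
const afterNull = applyMeta(stale, JSON.parse(withNull) as StoredMeta);
```

Here `withUndefined` is `{"status":"active"}` and `withNull` is `{"status":"active","awaitingChildId":null}`, so `afterUndefined.awaitingChildId` is still `"child-1"` while `afterNull.awaitingChildId` is `undefined`.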

…cy, test coverage

- Add vscode.window.showWarningMessage when delegation metadata save fails (#3)
- Hoist globalStoragePath in delegateParentAndOpenChild to avoid duplication (#5)
- Change awaitingChildId: undefined to null for JSON serialization (#11)
- Add 3 delegation flow tests: retry success, parent repair, repair failure (#4)

Labels

bug (Something isn't working), lgtm (This PR has been approved by a maintainer), size:XXL (This PR changes 1000+ lines, ignoring generated files)

2 participants