Checks
Strands Version
1.35.0
Python Version
3.13.9
Operating System
macOS 26.4
Installation Method
other
Steps to Reproduce
```python
import asyncio
import json
from collections.abc import AsyncGenerator
from typing import Any, override

from strands import Agent, tool
from strands.models import Model
from strands.types.content import Messages
from strands.types.streaming import StreamEvent
from strands.types.tools import ToolSpec


class TwoToolModel(Model):
    """Model that emits two parallel tool_use blocks on the first turn, then ends."""

    def __init__(self) -> None:
        self.turn = 0

    @override
    def update_config(self, **model_config: Any) -> None:
        pass

    @override
    def get_config(self) -> Any:
        return {}

    @override
    def structured_output(
        self, output_model: Any, prompt: Messages, system_prompt: str | None = None, **kwargs: Any
    ) -> AsyncGenerator[Any, None]:
        raise NotImplementedError

    @override
    async def stream(
        self,
        messages: Messages,
        tool_specs: list[ToolSpec] | None = None,
        system_prompt: str | None = None,
        **kwargs: Any,
    ) -> AsyncGenerator[StreamEvent, None]:
        self.turn += 1
        yield StreamEvent(messageStart={"role": "assistant"})
        if self.turn == 1:
            for tid, name in [("id-slow", "slow_tool"), ("id-fast", "fast_tool")]:
                yield StreamEvent(
                    contentBlockStart={"start": {"toolUse": {"name": name, "toolUseId": tid}}},
                )
                yield StreamEvent(contentBlockDelta={"delta": {"toolUse": {"input": json.dumps({})}}})
                yield StreamEvent(contentBlockStop={})
            yield StreamEvent(messageStop={"stopReason": "tool_use"})
        else:
            yield StreamEvent(contentBlockStart={"contentBlockIndex": 0, "start": {}})
            yield StreamEvent(contentBlockDelta={"contentBlockIndex": 0, "delta": {"text": "done"}})
            yield StreamEvent(contentBlockStop={"contentBlockIndex": 0})
            yield StreamEvent(messageStop={"stopReason": "end_turn"})


@tool(name="slow_tool", description="sleeps briefly and returns")
async def slow_tool() -> str:
    await asyncio.sleep(0.05)
    return "slow done"


@tool(name="fast_tool", description="returns immediately")
async def fast_tool() -> str:
    return "fast done"


async def main() -> None:
    agent = Agent(model=TwoToolModel(), tools=[slow_tool, fast_tool])
    _ = await agent.invoke_async("call both tools")

    # Find the user message with the tool_result blocks
    tool_result_message = next(
        m for m in agent.messages
        if m.get("role") == "user" and any("toolResult" in b for b in m.get("content", []))
    )
    ids = [b["toolResult"]["toolUseId"] for b in tool_result_message["content"] if "toolResult" in b]
    print(f"tool_result order in next-turn prompt: {ids}")
    # Expected: ['id-slow', 'id-fast'] — matches the assistant's toolUse emission order
    # Actual: ['id-fast', 'id-slow'] — fast_tool finished first, so it was appended first


asyncio.run(main())
```
Expected Behavior
The tool_result blocks in the follow-up user message should appear in the same order as the toolUse blocks in the preceding assistant message. That order is deterministic (it comes from the model's output) and stable across runs, which is a prerequisite for byte-stable prompts.
Actual Behavior
tool_result blocks appear in tool-completion order. With the reproducer above this is deterministically inverted (fast_tool finishes before slow_tool), but in general the ordering is scheduler-dependent and varies run to run when the tools have similar completion times.
Additional Context
Byte-stable prompts are a load-bearing assumption for:
- Anthropic's server-side prompt caching — cache entries are keyed on the exact prompt prefix. A reordering of tool_result blocks in a turn invalidates every cache entry that would otherwise have been reused for the rest of the conversation.
- Client-side request/response caching — any workflow that hashes prompts to deduplicate LLM calls (replay caches used by CI, offline test runs, determinism harnesses) will miss on every run, because the scheduler coin-flip picks a different ordering.
- Reproducible agent trajectories — when cached replays fall through to live LLM calls, the new responses differ, and the agent's decision path forks. We hit this in a test suite where a single concurrent tool_use at turn 10 caused two subsets of otherwise-identical tests to end up on entirely different agent trajectories (16 vs 18 turns, different tool sequences, different final verdicts).
In our case this manifested as "two back-to-back runs of the same test suite, with no code changes, produced different prompt hashes and a new live LLM request against what was supposed to be a fully-cached offline run."
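To make the cache-key sensitivity concrete, here is a stdlib-only sketch. The message shapes mirror the reproducer above, and `prompt_hash` is a hypothetical client-side helper (not a Strands API) standing in for whatever keying a replay cache does:

```python
import hashlib
import json

def prompt_hash(messages: list[dict]) -> str:
    # Hash a canonical JSON serialization of the conversation so far.
    # sort_keys normalizes dict key order, but list order (content blocks) is preserved.
    return hashlib.sha256(json.dumps(messages, sort_keys=True).encode()).hexdigest()

# The same two tool results in the two orderings the scheduler can produce.
run_a = [{"role": "user", "content": [
    {"toolResult": {"toolUseId": "id-slow", "content": [{"text": "slow done"}]}},
    {"toolResult": {"toolUseId": "id-fast", "content": [{"text": "fast done"}]}},
]}]
run_b = [{"role": "user", "content": [
    {"toolResult": {"toolUseId": "id-fast", "content": [{"text": "fast done"}]}},
    {"toolResult": {"toolUseId": "id-slow", "content": [{"text": "slow done"}]}},
]}]

print(prompt_hash(run_a) == prompt_hash(run_b))  # False: every reorder is a cache miss
```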
The bug is in ConcurrentToolExecutor (src/strands/tools/executors/concurrent.py) combined with ToolExecutor._stream_with_trace (src/strands/tools/executors/_executor.py).
ConcurrentToolExecutor._execute launches one asyncio.Task per tool_use, passing the same shared tool_results: list[ToolResult] to every task:
```python
for task_id, tool_use in enumerate(tool_uses):
    tasks.append(
        asyncio.create_task(
            self._task(
                agent,
                tool_use,
                tool_results,  # ← shared list
                ...
            )
        )
    )
```
Each task's _stream_with_trace appends to that shared list when its tool finishes:
```python
yield ToolResultEvent(after_event.result)
tool_results.append(after_event.result)  # ← append order = scheduler completion order
return
```
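The same race can be shown with nothing but the standard library. This is a minimal sketch (names are illustrative, not Strands APIs) of why a shared list appended to from concurrent tasks ends up in completion order rather than launch order:

```python
import asyncio

async def worker(name: str, delay: float, results: list[str]) -> None:
    # Appends when the "tool" finishes, analogous to _stream_with_trace.
    await asyncio.sleep(delay)
    results.append(name)

async def run() -> list[str]:
    results: list[str] = []
    # Tasks are created in request order: slow first, fast second.
    await asyncio.gather(
        asyncio.create_task(worker("id-slow", 0.05, results)),
        asyncio.create_task(worker("id-fast", 0.0, results)),
    )
    return results

order = asyncio.run(run())
print(order)  # ['id-fast', 'id-slow']: completion order, not request order
```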
Then event_loop.py serializes the list in whatever order the scheduler left it:
```python
# src/strands/event_loop/event_loop.py
tool_result_message: Message = {
    "role": "user",
    "content": [{"toolResult": result} for result in tool_results],
}
```
SequentialToolExecutor does not have this problem — it iterates tool_uses in request order and each tool appends to tool_results serially, producing request-order output.
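For what it's worth, concurrency and request-order output are not in tension: the completion-ordered results could be re-sorted into toolUse emission order before the message is built. A hedged sketch with a hypothetical helper, not the library's API:

```python
from typing import Any

def reorder_tool_results(
    tool_uses: list[dict[str, Any]], tool_results: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    # Map each toolUseId to its position in the assistant's toolUse emission order,
    # then sort the completion-ordered results back into that request order.
    request_order = {tu["toolUseId"]: i for i, tu in enumerate(tool_uses)}
    return sorted(tool_results, key=lambda r: request_order[r["toolUseId"]])

tool_uses = [{"toolUseId": "id-slow"}, {"toolUseId": "id-fast"}]
completion_order = [{"toolUseId": "id-fast"}, {"toolUseId": "id-slow"}]
print(reorder_tool_results(tool_uses, completion_order))
# [{'toolUseId': 'id-slow'}, {'toolUseId': 'id-fast'}]
```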
Possible Solution
No response
Related Issues