swarm - switch to handoff node only after current node stops #1147

pgrayy · 2025-11-07T03:53:50Z

Description

Make the handoff node the current node only after the current node finishes. Currently, we make the switch in the middle of the current node's execution. It is important to fix this for two reasons:

We emit the AfterNodeCallEvent with the current node id while state.current_node is already set to the handoff node. This is going to cause customer confusion.
If the current node runs a tool that is interrupted concurrently with the handoff tool, the swarm state will be invalid. The swarm state needs a reference to the real current node so that users can properly respond to its interrupts and resume execution.
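
For illustration, here is a minimal sketch of the deferred switch described above. It is a toy model, not the swarm.py implementation: ToyNode, ToyState, handoff_tool, and run are hypothetical names, and only the handoff_node field mirrors a name from this PR. The handoff tool just records the target, and the loop swaps it in only after the current node has finished, so the node id reported at the after-node point matches state.current_node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyNode:
    node_id: str


@dataclass
class ToyState:
    current_node: ToyNode
    # Set by the handoff tool, consumed by the loop once the node finishes.
    handoff_node: ToyNode | None = None
    node_history: list[ToyNode] = field(default_factory=list)


def handoff_tool(state: ToyState, target: ToyNode) -> None:
    """Record the requested handoff without touching current_node."""
    state.handoff_node = target


def run(state: ToyState, plan: dict[str, ToyNode]) -> list[tuple[str, str]]:
    """Run nodes until no handoff is pending.

    `plan` maps a node id to the node it hands off to (a stand-in for an
    agent deciding to call the handoff tool). Returns pairs of
    (finished node id, state.current_node id) captured where an
    after-node event would be emitted.
    """
    after_events = []
    while True:
        node = state.current_node
        if node.node_id in plan:
            handoff_tool(state, plan[node.node_id])  # happens mid-execution
        state.node_history.append(node)
        after_events.append((node.node_id, state.current_node.node_id))
        if state.handoff_node is None:
            break
        # The node has finished: only now switch to the handoff node.
        state.current_node, state.handoff_node = state.handoff_node, None
    return after_events


researcher, writer = ToyNode("researcher"), ToyNode("writer")
state = ToyState(current_node=researcher)
events = run(state, {"researcher": writer})
print(events)  # [('researcher', 'researcher'), ('writer', 'writer')]
```

Both ids agree in every pair, which is the invariant the first reason above asks for.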

Related Issues

#204

Documentation PR

Implementation detail

Type of Change

Bug fix

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran hatch run prepare: Relying on existing unit tests
I ran hatch test tests_integ/test_multiagent_swarm.py

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov · 2025-11-07T03:55:03Z

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/strands/multiagent/swarm.py	90.00%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

pgrayy · 2025-11-07T03:58:31Z

src/strands/multiagent/swarm.py

                    logger.debug("reason=<%s> | stopping execution", reason)
                    break

-                # Get current node


I removed a few inline comments because I felt the code was already self-explanatory.

pgrayy · 2025-11-07T04:00:19Z

src/strands/multiagent/swarm.py

                    self.state.node_history.append(current_node)
-
-                    #  After self.state add current node, swarm state finish updating, we persist here
                    self.hooks.invoke_callbacks(AfterNodeCallEvent(self, current_node.node_id, invocation_state))


To reiterate, setting self.state.current_node = handoff_node in the handoff tool means that AfterNodeCallEvent is emitted with a current node_id that does not match the self.state.current_node.node_id.

Also, for supporting interrupts, we can't have self.state.current_node update to the handoff node if the current node is interrupted.
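
A rough sketch of the interrupt case, using a toy state rather than the real swarm classes. Outcome, ToyState, and finish_node are hypothetical names, and keeping the pending handoff around across an interrupt is an assumption made for the example, not something this PR states. The point is only that current_node stays on the interrupted node so a resume targets it.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    COMPLETED = "completed"
    INTERRUPTED = "interrupted"


@dataclass
class ToyState:
    current_node: str
    handoff_node: str | None = None


def finish_node(state: ToyState, outcome: Outcome) -> str:
    """Apply a pending handoff only if the current node actually completed.

    Returns the id of the node that should run (or resume) next.
    """
    if outcome is Outcome.INTERRUPTED:
        # Keep pointing at the interrupted node so the caller can respond
        # to its interrupts. Assumption: the pending handoff is left as is.
        return state.current_node
    if state.handoff_node is not None:
        state.current_node, state.handoff_node = state.handoff_node, None
    return state.current_node


state = ToyState(current_node="researcher", handoff_node="writer")
resume_target = finish_node(state, Outcome.INTERRUPTED)
next_target = finish_node(state, Outcome.COMPLETED)
```

After the interrupted call, resume_target is still "researcher". The switch to "writer" only happens on the later completed call.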

pgrayy · 2025-11-07T04:14:41Z

src/strands/session/session_manager.py


        registry.add_callback(MultiAgentInitializedEvent, lambda event: self.initialize_multi_agent(event.source))
-        registry.add_callback(AfterNodeCallEvent, lambda event: self.sync_multi_agent(event.source))
+        registry.add_callback(BeforeNodeCallEvent, lambda event: self.sync_multi_agent(event.source))


Let's say we have successfully executed one node and are now executing the handoff node. If we crash on the handoff node, we would be left in different states depending on which event we persist on:

AfterNodeCallEvent: Current node is not set to the handoff node in session because the handoff node hasn't yet emitted its AfterNodeCallEvent. This means if we resume after crashing on the handoff node, we will be starting again from the first node.

BeforeNodeCallEvent: Current node is set to the handoff node in session because the handoff node already emitted its BeforeNodeCallEvent. This means if we resume after crashing on the handoff node, we will be starting again from the handoff node.

In short, persisting on AfterNodeCallEvent only made sense when setting the current node to the handoff in the handoff tool.

Does this change also apply to Graph?
Initialized -> Before -> 1st node -> After -> Before -> 2nd node -> After, so for the 2nd node (the handoff node), before/after is kind of the same?
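
To make the crash scenario from this thread concrete, here is a small simulation under simplifying assumptions: the session stores only the current node id, and simulate is a hypothetical helper, not part of session_manager.py. It walks the nodes in order, persists on either the before or the after event, and returns what the session holds when a node crashes mid-execution.

```python
from __future__ import annotations


def simulate(persist_on: str, crash_on: str, order: list[str]) -> str | None:
    """Return the node id stored in the toy session when `crash_on` crashes.

    persist_on: "before" or "after", i.e. which event writes
    current_node to the session.
    """
    session_current = None
    for node_id in order:
        # With the delayed switch, current_node changes only between nodes.
        current_node = node_id
        if persist_on == "before":
            session_current = current_node
        if node_id == crash_on:
            # Crash mid-node: the after event never fires for this node.
            return session_current
        if persist_on == "after":
            session_current = current_node
    return session_current


order = ["researcher", "writer"]
resume_after = simulate("after", crash_on="writer", order=order)
resume_before = simulate("before", crash_on="writer", order=order)
print(resume_after, resume_before)  # researcher writer
```

Persisting on the after event resumes from the first node, while persisting on the before event resumes from the handoff node, which matches the two cases described above.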


pgrayy · 2025-11-09T17:09:11Z

src/strands/multiagent/swarm.py

-                    #  After self.state add current node, swarm state finish updating, we persist here
                    await self.hooks.invoke_callbacks_async(
                        AfterNodeCallEvent(self, current_node.node_id, invocation_state)
                    )


To reiterate, setting self.state.current_node = handoff_node in the handoff tool means that AfterNodeCallEvent is emitted with a current node_id that does not match the self.state.current_node.node_id.

Also, for supporting interrupts, we can't have self.state.current_node update to the handoff node if the current node is interrupted.

Now current_node holds the next node to execute:
next_nodes = ([self.state.current_node.node_id] if self.state.completion_status == Status.EXECUTING and self.state.current_node else [])

JackYPCOnline · 2025-11-10T16:22:58Z

src/strands/multiagent/swarm.py

    # Total metrics across all agents
    accumulated_metrics: Metrics = field(default_factory=lambda: Metrics(latencyMs=0))
    execution_time: int = 0  # Total execution time in milliseconds
+    handoff_node: SwarmNode | None = None  # The agent to execute next


If we want to introduce this handoff_node as the next node, then we also need to update the persist logic for mapping next_nodes_to_execute.

Could you elaborate? I tested that the persisting logic continues to work as expected. What makes this work the same is syncing the swarm on BeforeNodeCallEvent as opposed to AfterNodeCallEvent. I make a comment about this down below.

JackYPCOnline · 2025-11-10T17:02:00Z

src/strands/multiagent/swarm.py

+                        current_node = self.state.handoff_node
+
+                        self.state.handoff_node = None
+                        self.state.current_node = current_node


Here self.state.current_node is what was previously the next node, so this is actually the same node and only a naming change, right?

swarm - switch to handoff node only after current node stops

1e58e59

github-actions bot added the size/s label Nov 7, 2025

pgrayy temporarily deployed to auto-approve November 7, 2025 03:54 — with GitHub Actions Inactive

pgrayy commented Nov 7, 2025


pgrayy marked this pull request as ready for review November 7, 2025 04:15

mkmeral previously approved these changes Nov 9, 2025


Merge branch 'main' of https://github.com/strands-agents/sdk-python i…

de03fee

…nto swarm-handoff-delayed

pgrayy dismissed mkmeral’s stale review via de03fee November 9, 2025 17:08

github-actions bot added size/s and removed size/s labels Nov 9, 2025

pgrayy had a problem deploying to auto-approve November 9, 2025 17:08 — with GitHub Actions Failure

pgrayy commented Nov 9, 2025


pgrayy requested a review from mkmeral November 10, 2025 16:06

JackYPCOnline reviewed Nov 10, 2025

