feat: Add GraphAgent for directed-graph workflow orchestration#4582

Open
drahnreb wants to merge 1 commit into google:main from
drahnreb:feat/graph-agent-pr1

@drahnreb
Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

2. Or, if no issue exists, describe the change:

Problem:
ADK lacks a general-purpose directed-graph orchestrator. Users cannot express conditional branching, cycles, or arbitrary DAG topologies with the existing SequentialAgent/ParallelAgent/LoopAgent.

Solution:
Add a GraphAgent engine for directed-graph workflows, with conditional routing, cyclic execution, state management with reducers, typed events, streaming, callbacks, rewind, resumability, OpenTelemetry tracing, evaluation metrics, and CLI visualization of GraphAgent topologies. Includes 9 sample agents and design documentation.
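To make the routing model concrete, here is a toy sketch in plain Python. Everything in it (`MiniGraph`, `add_node`, `add_edge`, `run`, the `END` sentinel, the dict-based state) is hypothetical and is not the GraphAgent API added by this PR; it only shows how conditional edges plus a step limit allow a review loop that the fixed sequential/parallel/loop patterns do not cover on their own.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

State = Dict[str, Any]
NodeFn = Callable[[State], State]
Condition = Callable[[State], bool]

END = "__end__"  # hypothetical sentinel for this sketch only


class MiniGraph:
    """Toy directed-graph runner (hypothetical; not the GraphAgent API)."""

    def __init__(self, start: str, max_steps: int = 50) -> None:
        self.start = start
        self.max_steps = max_steps  # guard against unbounded cycles
        self.nodes: Dict[str, NodeFn] = {}
        self.edges: Dict[str, List[Tuple[str, Optional[Condition]]]] = {}

    def add_node(self, name: str, fn: NodeFn) -> None:
        self.nodes[name] = fn
        self.edges.setdefault(name, [])

    def add_edge(
        self, src: str, dst: str, condition: Optional[Condition] = None
    ) -> None:
        # Edges are checked in insertion order; the first match wins.
        self.edges[src].append((dst, condition))

    def run(self, state: State) -> Tuple[State, List[str]]:
        path: List[str] = []
        current = self.start
        for _ in range(self.max_steps):
            path.append(current)
            # A node returns updates that are merged into a new state dict.
            state = {**state, **self.nodes[current](state)}
            # In this sketch, a node with no matching edge ends the run.
            current = next(
                (d for d, c in self.edges[current] if c is None or c(state)),
                END,
            )
            if current == END:
                return state, path
        raise RuntimeError("max_steps exceeded; possible unbounded cycle")


# Usage: draft -> review, looping back to draft until approved.
graph = MiniGraph(start="draft")
graph.add_node("draft", lambda s: {"attempts": s.get("attempts", 0) + 1})
graph.add_node("review", lambda s: {"approved": s["attempts"] >= 3})
graph.add_edge("draft", "review")
graph.add_edge("review", "draft", condition=lambda s: not s["approved"])
graph.add_edge("review", END, condition=lambda s: s["approved"])
final_state, path = graph.run({})
```

The `max_steps` limit is what keeps a cyclic graph safe when its exit condition is never met.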

What's included:

  • src/google/adk/agents/graph/ (core files: graph_agent.py, graph_node.py, graph_edge.py, graph_state.py, graph_events.py, graph_export.py, graph_rewind.py, graph_agent_config.py, graph_agent_state.py, graph_telemetry.py, callbacks.py, evaluation_metrics.py, state_utils.py)
  • src/google/adk/telemetry/graph_tracing.py
  • src/google/adk/cli/agent_graph.py — CLI viz extension (core GraphAgent rendering)
  • 12 test files, ~370 tests
  • 9 samples (graph_agent_basic, graph_agent_react_pattern, graph_agent_dynamic_queue, examples 01-03/07-08/15)
  • graph_agent_design.md

Part 1 of 5 — see tracking issue #4581.

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.
pytest tests/unittests/agents/test_graph_*.py -v — 12 test files, ~370 tests ✅
pytest tests/unittests/telemetry/test_graph_tracing.py -v ✅
pytest tests/unittests/cli/test_agent_graph.py -v — 5 CLI viz tests ✅

Manual End-to-End (E2E) Tests:

All 9 sample agents import and instantiate successfully.

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

This is part 1 of a 5-PR stack introducing GraphAgent. Subsequent PRs add: graph patterns (PR 2), parallel execution (PR 3), interrupt service (PR 4), checkpoint service (PR 5).

@google-cla
google-cla bot commented Feb 22, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@adk-bot
Collaborator

adk-bot commented Feb 22, 2026

Response from ADK Triaging Agent

Hello @drahnreb, thank you for your contribution!

Before we can merge this pull request, you'll need to sign a Contributor License Agreement (CLA). You can do so by following the instructions at https://cla.developers.google.com/.

For more information, please see the contribution guidelines.

Thanks!

@gemini-code-assist
Contributor

Summary of Changes

Hello @drahnreb, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the foundational GraphAgent to the Agent Development Kit, providing a robust solution for orchestrating sophisticated, state-dependent workflows. It empowers developers to design agents that can dynamically adapt their execution paths based on real-time conditions, a significant advancement over previous fixed-pattern agents. This initial release includes core graph mechanics, comprehensive examples, and detailed design documentation, laying the groundwork for future enhancements in parallel execution, interrupt handling, and checkpointing.

Highlights

  • New GraphAgent Introduced: A new GraphAgent has been added to the ADK, enabling the creation and orchestration of complex directed-graph workflows with conditional routing, cyclic execution, and advanced state management capabilities.
  • Enhanced Workflow Control: The GraphAgent addresses limitations of existing agents by allowing runtime decisions based on state, supporting conditional branching, loops, and arbitrary Directed Acyclic Graph (DAG) topologies.
  • Comprehensive Feature Set: Key features include state management with reducers, typed events, streaming, callbacks, rewind functionality, resumability, OpenTelemetry tracing, evaluation metrics, and CLI graph visualization.
  • Extensive Examples and Documentation: Nine new sample agents and a detailed design document (graph_agent_design.md) have been included to demonstrate various patterns and architectural insights, such as ReAct, dynamic task queues, and enhanced routing.
  • ADK Integration and Conformance: The GraphAgent is built as a proper BaseAgent, integrating seamlessly with ADK's event system, session services, and resumability framework, ensuring native compatibility and extensibility.

Changelog
  • contributing/docs/graph_agent_design.md
    • Added a detailed design document for the new GraphAgent.
  • contributing/samples/graph_agent_basic/README.md
    • Added a README for the basic conditional routing GraphAgent example.
  • contributing/samples/graph_agent_basic/agent.py
    • Added Python code for a basic GraphAgent example demonstrating conditional routing.
  • contributing/samples/graph_agent_basic/root_agent.yaml
    • Added YAML configuration for the basic GraphAgent example.
  • contributing/samples/graph_agent_dynamic_queue/README.md
    • Added a README for the dynamic task queue GraphAgent example.
  • contributing/samples/graph_agent_dynamic_queue/agent.py
    • Added Python code for a GraphAgent example demonstrating a dynamic task queue pattern.
  • contributing/samples/graph_agent_react_pattern/README.md
    • Added a README for the ReAct pattern GraphAgent example.
  • contributing/samples/graph_agent_react_pattern/agent.py
    • Added Python code for a GraphAgent example implementing the ReAct pattern.
  • contributing/samples/graph_examples/01_basic/__init__.py
    • Added an initialization file for the basic GraphAgent example.
  • contributing/samples/graph_examples/01_basic/agent.py
    • Added Python code for a basic GraphAgent workflow example.
  • contributing/samples/graph_examples/02_conditional_routing/__init__.py
    • Added an initialization file for the conditional routing example.
  • contributing/samples/graph_examples/02_conditional_routing/agent.py
    • Added Python code for a GraphAgent example demonstrating conditional routing.
  • contributing/samples/graph_examples/03_cyclic_execution/__init__.py
    • Added an initialization file for the cyclic execution example.
  • contributing/samples/graph_examples/03_cyclic_execution/agent.py
    • Added Python code for a GraphAgent example demonstrating cyclic execution.
  • contributing/samples/graph_examples/07_callbacks/__init__.py
    • Added an initialization file for the node callbacks example.
  • contributing/samples/graph_examples/07_callbacks/agent.py
    • Added Python code for a GraphAgent example demonstrating node callbacks.
  • contributing/samples/graph_examples/08_rewind/__init__.py
    • Added an initialization file for the rewind example.
  • contributing/samples/graph_examples/08_rewind/agent.py
    • Added Python code for a GraphAgent example demonstrating rewind integration.
  • contributing/samples/graph_examples/15_enhanced_routing/__init__.py
    • Added an initialization file for the enhanced routing example.
  • contributing/samples/graph_examples/15_enhanced_routing/agent.py
    • Added Python code for a GraphAgent example demonstrating enhanced routing patterns.
  • contributing/samples/graph_examples/README.md
    • Added a comprehensive README for all GraphAgent examples.
  • contributing/samples/graph_examples/__init__.py
    • Added an initialization file for the graph examples directory.
  • contributing/samples/graph_examples/example_utils.py
    • Added utility functions for GraphAgent examples.
  • contributing/samples/graph_examples/run_example.py
    • Added a script to run GraphAgent examples with optional tracing and LLM mode.
  • src/google/adk/__init__.py
    • Removed Context from the __all__ export list.
  • src/google/adk/agents/__init__.py
    • Added GraphAgent-related imports and removed Context from the __all__ export list.
  • src/google/adk/agents/graph/__init__.py
    • Added a new initialization file for the graph module, exposing GraphAgent components.
  • src/google/adk/agents/graph/callbacks.py
    • Added a new file defining callback contexts and types for graph observability.
  • src/google/adk/agents/graph/evaluation_metrics.py
    • Added a new file defining custom evaluation metrics for GraphAgent workflows.
  • src/google/adk/agents/graph/graph_agent.py
    • Added the core GraphAgent implementation, including AST-safe condition parsing, node execution logic, and telemetry integration.
  • src/google/adk/agents/graph/graph_agent_config.py
    • Added a new file defining Pydantic config schemas for GraphAgent components.
  • src/google/adk/agents/graph/graph_agent_state.py
    • Added a new file defining GraphAgentState for tracking GraphAgent execution state.
  • src/google/adk/agents/graph/graph_edge.py
    • Added a new file defining EdgeCondition for conditional routing in GraphAgent.
  • src/google/adk/agents/graph/graph_events.py
    • Added a new file defining typed events for GraphAgent execution streaming.
  • src/google/adk/agents/graph/graph_export.py
    • Added a new file containing functions for exporting graph structure and execution data.
  • src/google/adk/agents/graph/graph_node.py
    • Added a new file defining GraphNode for wrapping agents and functions within GraphAgent.
  • src/google/adk/agents/graph/graph_rewind.py
    • Added a new file providing rewind functionality for GraphAgent execution.
  • src/google/adk/agents/graph/graph_state.py
    • Added a new file defining GraphState for managing domain data within GraphAgent.
  • src/google/adk/agents/graph/graph_telemetry.py
    • Added a new file defining telemetry mixins for GraphAgent observability.
  • src/google/adk/agents/graph/state_utils.py
    • Added a new file with reusable state parsing utilities for GraphAgent.
  • src/google/adk/cli/agent_graph.py
    • Updated the CLI tool to support visualization of GraphAgent topologies.
  • src/google/adk/telemetry/__init__.py
    • Updated the __all__ export list in the telemetry module.
  • src/google/adk/telemetry/graph_tracing.py
    • Added a new file for OpenTelemetry instrumentation specific to GraphAgent.
  • tests/unittests/agents/test_graph_agent.py
    • Added comprehensive unit tests for the GraphAgent implementation.
  • tests/unittests/agents/test_graph_agent_config.py
    • Added unit tests for GraphAgent configuration validation.
  • tests/unittests/agents/test_graph_agent_validation.py
    • Added unit tests for GraphAgent validation features, including duplicate names and auto-defaulting.
  • tests/unittests/agents/test_graph_callbacks.py
    • Added unit tests for GraphAgent callback infrastructure.
  • tests/unittests/agents/test_graph_convenience_api.py
    • Added unit tests for GraphAgent's convenience API methods.
  • tests/unittests/agents/test_graph_evaluation.py
    • Added unit tests for GraphAgent evaluation metrics.
  • tests/unittests/agents/test_graph_evaluation_integration.py
    • Added integration tests for GraphAgent evaluation with intermediate data extraction.
  • tests/unittests/agents/test_graph_resumability.py
    • Added unit tests for GraphAgent's resumability integration.
  • tests/unittests/agents/test_graph_rewind.py
    • Added unit tests for GraphAgent's rewind functionality.
  • tests/unittests/agents/test_graph_routing.py
    • Added unit tests for enhanced graph routing features like priority, weight, and fallback.
  • tests/unittests/agents/test_graph_state.py
    • Added unit tests for GraphState accessors and state utility functions.
  • tests/unittests/agents/test_graph_state_management.py
    • Added comprehensive state management tests for GraphAgent.
  • tests/unittests/agents/test_graph_telemetry_config.py
    • Added unit tests for GraphAgent telemetry configuration.
  • tests/unittests/cli/test_agent_graph.py
    • Updated unit tests for the CLI agent graph visualization to include GraphAgent.
  • tests/unittests/telemetry/test_graph_tracing.py
    • Added unit tests for GraphAgent's OpenTelemetry instrumentation.
Activity
  • This pull request is the first part of a 5-PR stack to introduce the GraphAgent feature.
  • All unit tests for the new GraphAgent and related components have been added and passed locally.
  • Manual end-to-end tests for all 9 sample agents included in this PR have been successfully executed.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces GraphAgent for directed-graph workflow orchestration, significantly enhancing ADK's capabilities by enabling conditional routing, cyclic execution, and advanced state management. The changes include core agent implementation, configuration schemas, telemetry integration, and a comprehensive set of examples and design documentation. The new GraphAgent addresses limitations of existing sequential, parallel, and loop agents by allowing dynamic decision-making based on runtime state. The design document is well-structured and clearly explains the motivation, use cases, architecture, and capabilities of GraphAgent. The examples cover various features, including basic workflows, conditional routing, cyclic execution, callbacks, and rewind integration, demonstrating the flexibility and power of the new agent. The addition of telemetry and evaluation metrics further strengthens the observability and testability of graph-based workflows. Overall, this is a substantial and well-thought-out addition to the ADK framework.

new_state.data[self.name] = []
new_state.data[self.name].append(output)
elif self.reducer == StateReducer.SUM:
new_state.data[self.name] = new_state.data.get(self.name, 0) + output
Contributor

high

The new_state.data.get(self.name, 0) + output operation for StateReducer.SUM assumes output is a numeric type. However, agent outputs are typically strings. If output is a string, this will result in a TypeError. Consider adding a type conversion (e.g., int(output) or float(output)) or explicitly documenting that output_mapper should handle type conversion before reaching the reducer.

Author

@drahnreb drahnreb Feb 22, 2026

Fixed. SUM reducer now infers zero-value defaults via type(output)() and raises TypeError on actual type mismatches.
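A minimal sketch of the fix as described: infer the additive identity with `type(output)()` and let a genuine type mismatch surface as `TypeError`. The function name `sum_reduce` and the plain-dict state are assumptions; the PR's reducer operates on `GraphState` and may differ in detail.

```python
from typing import Any, Dict


def sum_reduce(data: Dict[str, Any], key: str, output: Any) -> Dict[str, Any]:
    """Hypothetical SUM reducer: accumulate `output` under `key`.

    Returns a new dict; the input mapping is left untouched.
    """
    if key in data:
        current = data[key]
    else:
        # Infer the additive identity from the output's type:
        # int() -> 0, float() -> 0.0, str() -> "", list() -> [].
        current = type(output)()
    try:
        total = current + output
    except TypeError as exc:
        # Surface a clear error instead of silently coercing types.
        raise TypeError(
            f"SUM reducer for {key!r}: cannot add "
            f"{type(output).__name__} to {type(current).__name__}"
        ) from exc
    return {**data, key: total}


scores = sum_reduce({}, "score", 2)
scores = sum_reduce(scores, "score", 3)
log = sum_reduce(sum_reduce({}, "log", "a"), "log", "b")
```

With this approach string outputs concatenate and numeric outputs add, so the reducer no longer assumes a numeric `0` default.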


from .base_agent import BaseAgent
from .context import Context
from .graph import END
Contributor

medium

The Context import was removed. Ensure that no other modules within google.adk.agents or external modules relying on google.adk.agents still expect Context to be available directly from this __init__.py.

Author

@drahnreb drahnreb Feb 22, 2026

Fixed: Context is imported and exported in __init__.py.

# Warn if agent name shadows the graph itself (find_agent would
# return the graph, not the sub_agent). Allow it so existing code
# is not broken, but log a warning.
if agent.name == self.name:
Contributor

medium

The warning message mentions find_agent() will return the graph, not the node agent. This implies a potential shadowing issue. While a warning is present, it might be better to prevent this scenario entirely or provide a clearer mechanism for disambiguation if both names are allowed to coexist.

Author

@drahnreb drahnreb Feb 22, 2026

Fixed: Now raises ValueError at graph build time (add_node()) if agent name collides with GraphAgent name.
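A sketch of build-time name validation in the spirit of this fix. `GraphBuilder`, its `nodes` mapping, and the duplicate-node check are hypothetical illustrations, not the PR's `add_node()` signature.

```python
from typing import Dict, Optional


class GraphBuilder:
    """Hypothetical builder illustrating build-time name validation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.nodes: Dict[str, Optional[str]] = {}  # node name -> agent name

    def add_node(self, node_name: str, agent_name: Optional[str] = None) -> None:
        if node_name in self.nodes:
            raise ValueError(f"Duplicate node name: {node_name!r}")
        # A sub-agent sharing the graph's name would be shadowed by the graph
        # itself in name-based lookups (e.g. find_agent), so reject it early.
        if agent_name is not None and agent_name == self.name:
            raise ValueError(
                f"Agent name {agent_name!r} collides with graph name {self.name!r}"
            )
        self.nodes[node_name] = agent_name


builder = GraphBuilder("planner")
builder.add_node("research", agent_name="researcher")
```

Failing at construction time turns a silent lookup ambiguity into an immediate, actionable error.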

graph.add_node("task_dispatcher", function=dynamic_task_dispatcher)

# Loop back to dispatcher while tasks remain.
# Check task_queue directly (mutated in-place by the function node).
Contributor

medium

The comment mentions that task_queue is mutated in-place by the function node, but the dynamic_task_dispatcher function returns a new dictionary. This might be confusing. Clarify whether the task_queue in state.data is directly modified or if the returned dictionary is used to update the state.

Author

Fixed: Comment updated to 'updated via output_mapper return value'.
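To illustrate the clarified comment, here is a sketch where the dispatcher returns a new dict and a mapper merges it into the state, so the loop-back condition sees the updated queue without any in-place mutation. `dynamic_task_dispatcher` is the sample's function name, but the body shown here, `output_mapper`, and `tasks_remain` are hypothetical.

```python
from typing import Any, Dict, List

State = Dict[str, Any]


def dynamic_task_dispatcher(state: State) -> State:
    """Pop one task without mutating the incoming state (hypothetical body)."""
    queue: List[str] = list(state.get("task_queue", []))  # copy, not in-place
    task = queue.pop(0)
    done = state.get("completed", []) + [task.upper()]  # new list
    return {"task_queue": queue, "completed": done}


def output_mapper(state: State, output: State) -> State:
    # The returned dict replaces matching keys in a fresh state dict.
    return {**state, **output}


def tasks_remain(state: State) -> bool:
    # Condition for the loop-back edge dispatcher -> dispatcher.
    return bool(state.get("task_queue"))


state: State = {"task_queue": ["fetch", "parse", "store"]}
initial = state
while True:
    state = output_mapper(state, dynamic_task_dispatcher(state))
    if not tasks_remain(state):
        break
```

Because each step produces a new state, `initial` still holds the original queue afterwards. That is the behavior the updated comment describes.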

Comment on lines 212 to 218
```
[150ms] ✅ Fetched data from products_db (100ms)
[150ms] ✅ Fetched data from users_db (150ms)
[200ms] ✅ Fetched data from orders_db (200ms)

Total: ~200ms (vs 450ms sequential)
Speedup: ~2.25x
```
Contributor

medium

The output examples with [ms] are hardcoded and might become outdated. Consider using a dynamic way to generate these or removing them if they are not strictly necessary for the documentation.

Author

Acknowledged. These are sample outputs for documentation context.
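The speedup quoted in that sample output comes from overlapped I/O. When the fetches run concurrently, total time is bounded by the slowest fetch (200ms) rather than the sum (100 + 150 + 200 = 450ms), giving 450 / 200 = 2.25x. Here is a minimal sketch that reproduces the effect with plain `asyncio` (not the parallel execution that PR 3 adds to GraphAgent). `fetch`, `fetch_all`, and the latencies are illustrative assumptions.

```python
import asyncio
import time
from typing import Dict, List, Tuple

# Illustrative latencies (seconds) mirroring the sample output.
LATENCIES: Dict[str, float] = {
    "products_db": 0.10,
    "users_db": 0.15,
    "orders_db": 0.20,
}


async def fetch(source: str, delay: float) -> Tuple[str, float]:
    await asyncio.sleep(delay)  # stand-in for real I/O
    return source, delay


async def fetch_all() -> List[Tuple[str, float]]:
    # gather() runs the coroutines concurrently, so wall time tracks the
    # slowest fetch (about 0.20 s) rather than the sum (0.45 s).
    return await asyncio.gather(*(fetch(s, d) for s, d in LATENCIES.items()))


start = time.perf_counter()
results = asyncio.run(fetch_all())
elapsed = time.perf_counter() - start
sequential_ms = sum(LATENCIES.values()) * 1000
print(f"Total: {elapsed * 1000:.0f}ms (vs {sequential_ms:.0f}ms sequential)")
```

Timings printed this way are measured on each run rather than typed into the docs, though exact values will vary by machine.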

Returns:
New graph state with output merged
"""
new_state = GraphState(data=state.data.copy())
Contributor

medium

The new_state = GraphState(data=state.data.copy()) line performs a shallow copy of the data dictionary. If state.data contains mutable objects (like nested dictionaries or lists), modifying them in new_state will also affect the original state. This can lead to unexpected side effects. Consider using copy.deepcopy(state.data) if nested mutable objects are expected and true isolation is required.

Author

@drahnreb drahnreb Feb 22, 2026

Fixed: Changed from shallow copy to deepcopy(state.data) in _default_output_mapper for proper nested state isolation.
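The difference between the original `.copy()` and the `deepcopy` fix, shown on a plain dict (the PR applies it to `GraphState.data`):

```python
import copy
from typing import Any, Dict

original: Dict[str, Any] = {"nested": {"key": "value"}, "items": [1, 2]}

shallow = original.copy()       # new top-level dict, values still shared
deep = copy.deepcopy(original)  # nested dicts and lists copied recursively

shallow["nested"]["key"] = "modified-by-shallow"
deep["items"].append(3)

print(original["nested"]["key"])  # modified-by-shallow (shared reference)
print(original["items"])          # [1, 2] (deep copy is isolated)
```

The trade-off is cost: `copy.deepcopy` walks the whole structure on every node execution, which is the performance concern raised in the final review of this thread.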

Comment on lines 244 to 245
- **Core implementation**: ~1,231 lines
- **Tests**: 82 tests, 2,665 lines
Contributor

medium

The lines of code and tests mentioned here might become outdated as the project evolves. Consider using a dynamic way to generate these numbers or removing them if they are not strictly necessary for the design document.

Author

Acknowledged.

Comment on lines +191 to +193
# Check if weights are all the same (default behavior: first match)
weights = [e.weight for _, e in matching_edges]
all_same_weight = len(set(weights)) == 1
Contributor

medium

The all_same_weight check is good. If all weights are 0, however, total_weight is also 0, so random.random() * total_weight always yields 0.0 and a cumulative-weight scan has no interval to select (the multiplication itself does not raise). The current code handles total_weight == 0 by picking the first matching edge, which is a reasonable fallback. However, it would be clearer to check for total_weight == 0 explicitly and earlier, or to add a comment explaining this edge case.

Author

Fixed: Added guard for total_weight == 0 with clarifying comment.
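A sketch of priority-ordered, weighted edge selection with the zero-weight guard. `Edge`, `select_edge`, and the choice to weight across all matching edges are assumptions, not the PR's implementation. Note that `random.random() * 0` evaluates to `0.0` rather than raising. The guard matters because a cumulative scan over all-zero weights has nothing to land on.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

State = Dict[str, Any]


@dataclass
class Edge:
    """Hypothetical edge record; field names mirror the review discussion."""
    target: str
    condition: Optional[Callable[[State], bool]] = None
    priority: int = 0
    weight: float = 1.0


def select_edge(edges: List[Edge], state: State, rng: random.Random) -> Optional[str]:
    # Highest priority first; ties keep insertion order via the index.
    ordered = sorted(enumerate(edges), key=lambda x: (-x[1].priority, x[0]))
    matching: List[Tuple[int, Edge]] = [
        (i, e) for i, e in ordered if e.condition is None or e.condition(state)
    ]
    if not matching:
        return None
    weights = [e.weight for _, e in matching]
    total_weight = sum(weights)
    # Equal weights (including all zero) or a non-positive total:
    # fall back to the first match in priority order.
    if len(set(weights)) == 1 or total_weight <= 0:
        return matching[0][1].target
    r = rng.random() * total_weight  # r in [0, total_weight)
    cumulative = 0.0
    for _, edge in matching:
        cumulative += edge.weight
        if r < cumulative:
            return edge.target
    return matching[-1][1].target  # guard against float rounding
```

Zero-weight edges are never picked in the weighted branch, since they add an empty interval to the cumulative scan.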

Comment on lines +1209 to +1218
def test_validate_ast_keyword_args():
    """Line 149: Keyword arguments in safe method calls are validated."""
    from google.adk.agents.graph.graph_agent import _parse_condition_string

    # .get() with keyword arg — should pass validation
    fn = _parse_condition_string("data.get(key='x')")
    state = GraphState(data={"x": "val"})
    # Python's dict.get() doesn't accept 'key' kwarg, so it will raise at eval
    # but the AST validation itself should succeed
    assert fn(state) is False  # eval error → returns False
Contributor

medium

The test test_validate_ast_keyword_args correctly identifies that keyword arguments to data.get() are not directly supported by Python's dict.get(). While the AST validation passes, the runtime evaluation will fail. This highlights a potential area for improvement in either the validation logic (to disallow unsupported kwargs) or the GraphState's get methods (to handle kwargs gracefully).

Author

Acknowledged. The test documents that keyword args are allowed in AST validation for dict.get(key, default) patterns.
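For context on that exchange: `dict.get` accepts positional arguments only, so `data.get(key='x')` passes an AST check but fails at evaluation. One option the reviewer hints at is rejecting such calls up front. `has_get_kwargs` below is a hypothetical pre-check, not part of the PR.

```python
import ast

data = {"x": "val"}
positional = data.get("x", "default")  # positional arguments work

try:
    data.get(key="x")  # keyword arguments are rejected by dict.get
    kwarg_rejected = False
except TypeError:
    kwarg_rejected = True


def has_get_kwargs(expr: str) -> bool:
    """Hypothetical pre-check: does the condition call .get() with kwargs?"""
    tree = ast.parse(expr, mode="eval")
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "get"
        and bool(node.keywords)
        for node in ast.walk(tree)
    )
```

Running a check like this during parsing would report the problem when the condition is defined, rather than returning `False` silently at runtime.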

Comment on lines 474 to 493
"""Document shallow copy limitation with nested dicts.

This test documents that GraphNode._default_output_mapper uses .copy()
which is a shallow copy. For nested structures, this can cause issues.
"""
state1 = GraphState(data={"nested": {"key": "value"}})

# Shallow copy (what GraphNode does)
state2 = GraphState(data=state1.data.copy())

# Modify nested structure in state2
state2.data["nested"]["key"] = "modified"

# BUG: state1 is also modified (shared reference)
# This is a known limitation of shallow copy
assert state1.data["nested"]["key"] == "modified" # Unintended side effect

# NOTE: parallel.py uses deepcopy to avoid this issue
# For regular sequential execution, users should avoid nested mutations

Contributor

medium

The test test_state_nested_dict_shallow_copy_limitation is crucial for documenting the shallow copy limitation with nested mutable objects. It clearly demonstrates that modifying a nested dictionary in a copied state will unintentionally affect the original state. This highlights a potential source of bugs and emphasizes the need for deep copying in such scenarios.

Author

Fixed: Test updated — now tests deepcopy isolation instead of shallow copy limitation.

drahnreb force-pushed the feat/graph-agent-pr1 branch from 8c0afaf to b64a3dd (February 22, 2026 10:57)
@drahnreb
Author

drahnreb commented Feb 22, 2026

Addressing review feedback

Force-pushed with the following fixes:

Critical:

  • AST dunder attribute escape: Added if node.attr.startswith("_"): raise ValueError(...) to block sandbox escape via state.__class__.__init__.__globals__.

High:

  • Shallow copy → deepcopy: _default_output_mapper now uses deepcopy(state.data) for proper nested state isolation.
  • Agent name collision: Changed from warning to ValueError at add_node() time.

Medium:

  • O(N²) → O(1): Replaced path.count() with node_invocations dict lookup.
  • effective_config: All 3 _get_next_node_with_telemetry call sites now pass effective_config.
  • Edge sort-once: Added _sorted_edges_cache with invalidation on add_edge().
  • Zero-weight edge: Added clarifying comment about ZeroDivisionError guard.
  • Context import: Restored in both __init__.py files.
  • logger.error: All sites include exc_info=True.

drahnreb force-pushed the feat/graph-agent-pr1 branch 4 times, most recently from 31de828 to 1414a38 (February 22, 2026 16:34)
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The GraphAgent implementation provides a robust foundation for directed-graph workflows in ADK, supporting conditional routing, cyclic execution, and state management. The integration with ADK's resumability and telemetry systems is well-designed. However, there are some security concerns regarding the use of eval in condition parsing, and opportunities to improve performance in path tracking and edge evaluation. Additionally, ensuring deep state isolation between nodes would enhance reliability, especially for complex workflows.

Comment on lines 139 to 141
elif isinstance(node, ast.Attribute):
# Allow attribute access on safe names only
_validate_condition_ast(node.value)
Contributor

critical (security)

The AST validation for attribute access is incomplete. It currently only validates the object being accessed (node.value) but does not check the attribute name itself (node.attr). This allows access to potentially dangerous dunder attributes like __class__, which could be used to escape the sandbox via eval (e.g., state.__class__.__init__.__globals__).

Suggested change
- elif isinstance(node, ast.Attribute):
-     # Allow attribute access on safe names only
-     _validate_condition_ast(node.value)
+ elif isinstance(node, ast.Attribute):
+     if node.attr.startswith("_"):
+         raise ValueError(f"Unsafe attribute access: '{node.attr}'")
+     _validate_condition_ast(node.value)

Author

Fixed: Added dunder attribute blocking in _validate_condition_ast.
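Here is a minimal sketch of whitelist-based condition validation with the dunder check in place. `validate_condition`, `compile_condition`, and the allowed node, name, and method sets are assumptions for illustration. The PR's `_validate_condition_ast` and `_parse_condition_string` may accept a different grammar. Emptying `__builtins__` is not a sandbox on its own. The AST whitelist is what blocks escapes such as `data.__class__`.

```python
import ast
from typing import Any, Callable, Dict

# Hypothetical whitelist (Python 3.9+ AST shapes).
_ALLOWED_NODES = (
    ast.Expression, ast.Compare, ast.BoolOp, ast.UnaryOp, ast.Name, ast.Load,
    ast.Attribute, ast.Subscript, ast.Constant, ast.Call, ast.keyword,
    ast.And, ast.Or, ast.Not, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt,
    ast.GtE, ast.In, ast.NotIn,
)
_ALLOWED_NAMES = {"data"}
_ALLOWED_METHODS = {"get"}


def validate_condition(expr: str) -> ast.Expression:
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"Disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in _ALLOWED_NAMES:
            raise ValueError(f"Unknown name: {node.id!r}")
        # Block _private and __dunder__ attributes (__class__, __globals__, ...).
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError(f"Unsafe attribute access: {node.attr!r}")
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Attribute)
            and node.func.attr in _ALLOWED_METHODS
        ):
            raise ValueError("Only whitelisted method calls are allowed")
    return tree


def compile_condition(expr: str) -> Callable[[Dict[str, Any]], bool]:
    code = compile(validate_condition(expr), "<condition>", "eval")

    def check(data: Dict[str, Any]) -> bool:
        # No builtins are exposed to the evaluated expression.
        return bool(eval(code, {"__builtins__": {}}, {"data": data}))

    return check


retry = compile_condition('data.get("tries", 0) < 3')
```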

Returns:
New graph state with output merged
"""
new_state = GraphState(data=state.data.copy())
Contributor

high

The default output mapper uses a shallow copy of the state data. If the state contains nested dictionaries or lists, multiple nodes might inadvertently share and mutate the same objects, leading to unpredictable side effects and race conditions. Using a deep copy ensures proper state isolation between nodes. Since GraphState is a Pydantic model, model_copy(deep=True) is the recommended approach.

Suggested change
- new_state = GraphState(data=state.data.copy())
+ new_state = state.model_copy(deep=True)

Author

Fixed: Changed to deepcopy(state.data) in _default_output_mapper.

Comment on lines 1247 to 1250
if (
    agent_state.path.count(current_node_name) > 1
    and current_node.agent
):
Contributor

medium

Using list.count() inside the execution loop results in $O(N^2)$ complexity relative to the total number of steps in the workflow. For long-running cyclic graphs, this can become a performance bottleneck. Since agent_state.node_invocations already tracks visits per node, you can check its length for an $O(1)$ lookup.

Suggested change
- if (
-     agent_state.path.count(current_node_name) > 1
-     and current_node.agent
- ):
+ if (
+     len(agent_state.node_invocations.get(current_node_name, [])) > 1
+     and current_node.agent
+ ):

Author

Fixed: Replaced path.count() with O(1) node_invocations lookup.
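A sketch of why the per-node index helps. `ExecutionTracker`, `record`, and `is_revisit` are hypothetical names. Only `path` and `node_invocations` mirror attributes mentioned in the review.

```python
from collections import defaultdict
from typing import DefaultDict, List


class ExecutionTracker:
    """Hypothetical tracker: per-node visit lists give O(1) revisit checks."""

    def __init__(self) -> None:
        self.path: List[str] = []
        self.node_invocations: DefaultDict[str, List[int]] = defaultdict(list)

    def record(self, node: str) -> None:
        step = len(self.path)
        self.path.append(node)
        self.node_invocations[node].append(step)  # remember when it ran

    def is_revisit(self, node: str) -> bool:
        # len() on the per-node list is O(1); path.count(node) scans the
        # whole path each step, making a long cyclic run O(N^2) overall.
        # .get() avoids inserting empty lists for unseen nodes.
        return len(self.node_invocations.get(node, [])) > 1


tracker = ExecutionTracker()
for name in ["plan", "act", "plan", "finish"]:
    tracker.record(name)
```

The extra memory is one integer per step, and it also records when each visit happened. That can be useful for rewind-style features.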

Comment on lines +1382 to +1384
raise ValueError(
f"Node {current_node_name} has no outgoing edges and is not"
" an end node"
Contributor

medium

effective_config is calculated at the start of _run_async_impl but not passed to _get_next_node_with_telemetry. This means edge evaluation telemetry might not respect inherited configuration from parent agents (e.g., sampling rates or additional attributes).

          next_node_name = self._get_next_node_with_telemetry(
              current_node, state, effective_config=effective_config
          )
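A minimal sketch of the pattern this fix follows: resolve the inherited configuration once, then pass the resolved object to every telemetry call site. The `TelemetryConfig` fields and the merge rule here (the agent's own values override the parent's where they are set) are hypothetical illustrations, not ADK's actual configuration schema:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TelemetryConfig:
    """Hypothetical telemetry settings; the field names are illustrative, not ADK's."""

    sampling_rate: Optional[float] = None
    extra_attributes: Dict[str, Any] = field(default_factory=dict)


def resolve_effective_config(
    parent: TelemetryConfig, own: TelemetryConfig
) -> TelemetryConfig:
    # The agent's own values win where set; attribute dicts merge with own keys on top.
    return TelemetryConfig(
        sampling_rate=(
            own.sampling_rate if own.sampling_rate is not None else parent.sampling_rate
        ),
        extra_attributes={**parent.extra_attributes, **own.extra_attributes},
    )


def edge_span_attributes(
    source: str, target: str, effective_config: TelemetryConfig
) -> Dict[str, Any]:
    # Each edge-evaluation call site receives the already-resolved config
    # instead of re-reading the agent's own, non-inherited settings.
    return {
        "edge": f"{source}->{target}",
        "sampling_rate": effective_config.sampling_rate,
        **effective_config.extra_attributes,
    }


parent_cfg = TelemetryConfig(sampling_rate=0.1, extra_attributes={"team": "search"})
graph_cfg = TelemetryConfig(extra_attributes={"graph": "router"})
effective = resolve_effective_config(parent_cfg, graph_cfg)  # computed once per run
print(edge_span_attributes("classify", "answer", effective))
# {'edge': 'classify->answer', 'sampling_rate': 0.1, 'team': 'search', 'graph': 'router'}
```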

Author

Fixed: All 3 call sites now pass effective_config.

Comment on lines 175 to 176
sorted_edges = sorted(indexed_edges, key=lambda x: (-x[1].priority, x[0]))

Contributor

medium

Edges are sorted by priority on every call to get_next_node. Since edge priorities are typically static, sorting them every time a transition occurs is inefficient. It would be better to maintain the edges in a sorted list or sort them once when they are added to the node.

Author

Fixed: Added _sorted_edges_cache with invalidation on add_edge().
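A minimal sketch of a sorted-edge cache with invalidation, assuming a hypothetical `Node`/`Edge` pair rather than the PR's actual classes. The sort key mirrors the one quoted above (highest priority first, ties broken by insertion order). The cache is rebuilt only after `add_edge()` has cleared it:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


def always(state: Dict[str, Any]) -> bool:
    return True


@dataclass
class Edge:
    target: str
    priority: int = 0
    condition: Callable[[Dict[str, Any]], bool] = always


@dataclass
class Node:
    """Hypothetical node that caches its priority-sorted edges."""

    name: str
    edges: List[Edge] = field(default_factory=list)
    _sorted_edges_cache: Optional[List[Edge]] = field(default=None, repr=False)

    def add_edge(self, edge: Edge) -> None:
        self.edges.append(edge)
        self._sorted_edges_cache = None  # invalidate on every mutation

    def sorted_edges(self) -> List[Edge]:
        if self._sorted_edges_cache is None:
            # Same key as the PR's sort: -priority, then original index.
            indexed = sorted(enumerate(self.edges), key=lambda p: (-p[1].priority, p[0]))
            self._sorted_edges_cache = [edge for _, edge in indexed]
        return self._sorted_edges_cache

    def next_node(self, state: Dict[str, Any]) -> Optional[str]:
        # First edge (in priority order) whose condition holds wins.
        for edge in self.sorted_edges():
            if edge.condition(state):
                return edge.target
        return None


router = Node("router")
router.add_edge(Edge("fallback", priority=0))
router.add_edge(Edge("escalate", priority=10, condition=lambda s: s.get("risk", 0) > 0.8))
print(router.next_node({"risk": 0.9}))  # escalate
print(router.next_node({"risk": 0.1}))  # fallback
```

A consequence of this design is that any code path that appends to `edges` directly must also remember to clear the cache, which is one more reason to route all mutations through `add_edge()`.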

@drahnreb force-pushed the feat/graph-agent-pr1 branch 3 times, most recently from 965a398 to b39cbab (February 22, 2026 23:43)
@drahnreb (Author)

@gemini-code-assist please re-review

Contributor

@gemini-code-assist (bot) left a comment

Code Review

The PR introduces GraphAgent, a powerful addition to the ADK workflow agents, enabling directed-graph orchestration with conditional routing, cycles, and state management. The implementation is comprehensive, including telemetry, callbacks, and visualization support. However, there are some performance concerns regarding state management (use of deepcopy and full-state iteration for deltas) and some robustness issues in the state reduction logic and configuration parsing that should be addressed to ensure scalability and reliability.

Comment on lines +1350 to +1356
  if (
      not _k.startswith("_")
      and _k not in _GRAPH_INTERNAL_KEYS
      and ctx.session.state.get(_k) != _v
  ):
    delta[_k] = _v
if delta:
Contributor

medium

Calculating the state delta by iterating over all keys in state.data and performing a comparison with ctx.session.state after every node execution can be inefficient, especially as the state grows or contains complex objects. This O(N) operation per iteration could become a performance bottleneck. Consider tracking modified keys within GraphState to optimize this sync.
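A minimal sketch of the suggested dirty-key tracking. It assumes a hypothetical `TrackedState` wrapper that records every key assigned through `__setitem__`. When syncing, only those keys are compared against the session, so the cost scales with the number of writes rather than with the size of the state. The filtering rules (skip `_`-prefixed and internal keys, skip values equal to the session copy) mirror the snippet above:

```python
from typing import Any, Dict, FrozenSet, Optional, Set


class TrackedState:
    """Hypothetical GraphState variant that remembers which keys were assigned."""

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        self._data: Dict[str, Any] = dict(data or {})
        self._dirty: Set[str] = set()

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._dirty.add(key)  # record the write; reads are never tracked

    def pop_delta(
        self,
        session_state: Dict[str, Any],
        internal_keys: FrozenSet[str] = frozenset(),
    ) -> Dict[str, Any]:
        # Only keys assigned since the last sync are compared.
        delta = {}
        for key in self._dirty:
            if key.startswith("_") or key in internal_keys:
                continue
            value = self._data[key]
            if session_state.get(key) != value:
                delta[key] = value
        self._dirty.clear()
        return delta


state = TrackedState({"query": "q", "answer": None, "_scratch": 0})
state["answer"] = "42"
state["_scratch"] = 1  # private key: never synced
state["query"] = "q"  # assigned, but equal to the session copy
print(state.pop_delta({"query": "q", "answer": None}))  # {'answer': '42'}
print(state.pop_delta({"query": "q", "answer": None}))  # {}: nothing written since
```

One caveat is that in-place mutation of a nested value (for example, `state["items"].append(x)`) bypasses `__setitem__`. A design like this therefore needs copy-on-write values or an explicit "mark dirty" hook.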

Comment on lines +1609 to +1613
name=node_config.name,
agent=sub_agents[0] if sub_agents else None,
function=function,
)
graph.add_node(node)
Contributor

medium

The from_config method only uses the first agent from the sub_agents list in the node configuration. If a user provides multiple sub-agents for a single node in the YAML, the others will be silently ignored. It would be better to validate that only one sub-agent is provided or update the logic to handle multiple agents if that was the intent.

References
  1. Ensure that configuration parsing handles all provided data or explicitly validates against unsupported inputs to prevent silent failures.
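A minimal sketch of the "validate instead of silently dropping" option, assuming a hypothetical `NodeConfig` shape for a parsed YAML node entry. A node with more than one sub-agent is rejected with an explicit error. With zero or one sub-agent, it behaves like the current `sub_agents[0] if sub_agents else None` expression:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class NodeConfig:
    """Hypothetical shape of a node entry parsed from YAML."""

    name: str
    sub_agents: List[Any] = field(default_factory=list)


def select_node_agent(node_config: NodeConfig) -> Optional[Any]:
    # Reject the ambiguous case explicitly instead of ignoring extra agents.
    if len(node_config.sub_agents) > 1:
        raise ValueError(
            f"Node {node_config.name!r} declares {len(node_config.sub_agents)} "
            "sub_agents; a graph node accepts at most one. Split them into "
            "separate nodes connected by edges."
        )
    return node_config.sub_agents[0] if node_config.sub_agents else None
```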

Comment on lines +1634 to +1636
if edge_config.source_node in graph.nodes:
  graph.nodes[edge_config.source_node].edges.append(edge)
  graph.nodes[edge_config.source_node]._sorted_edges_cache = None
Contributor

medium

Appending the edge directly to the node's edges list bypasses the validation logic and duplicate edge checks implemented in the add_edge method. It is safer to use the public API for graph construction.

References
  1. Prefer using public APIs that encapsulate validation logic over direct member manipulation.
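A minimal sketch of a config loader that goes through the public API, assuming a hypothetical `Graph` whose only edge-mutation path is `add_edge()`. The duplicate rule used here (a repeated source/target pair) is an assumption; the PR's actual duplicate check may differ. Unlike the quoted `if edge_config.source_node in graph.nodes:` guard, an unknown node raises an error instead of being skipped silently:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    priority: int = 0


@dataclass
class Graph:
    """Hypothetical graph whose only edge-mutation path is add_edge()."""

    nodes: Set[str] = field(default_factory=set)
    edges: Dict[str, List[Edge]] = field(default_factory=dict)

    def add_node(self, name: str) -> None:
        self.nodes.add(name)
        self.edges.setdefault(name, [])

    def add_edge(self, edge: Edge) -> None:
        # Validation lives in one place, so every caller (including loaders) gets it.
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.nodes:
                raise ValueError(f"Unknown node {endpoint!r} in edge {edge.source}->{edge.target}")
        if any(e.target == edge.target for e in self.edges[edge.source]):
            raise ValueError(f"Duplicate edge {edge.source}->{edge.target}")
        self.edges[edge.source].append(edge)


def load_edges(graph: Graph, edge_specs: List[Dict[str, Any]]) -> None:
    # A from_config-style loop that calls add_edge() instead of edges.append().
    for spec in edge_specs:
        graph.add_edge(Edge(spec["source"], spec["target"], spec.get("priority", 0)))
```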

  Returns:
    New graph state with output merged
  """
  new_state = GraphState(data=deepcopy(state.data))
Contributor

medium

Using deepcopy on the entire state data for every node execution can be very expensive in terms of memory and CPU, especially for large states or graphs with many iterations. Consider using a shallow copy (state.data.copy()) and ensuring that reducers (like APPEND) create new collection instances instead of mutating them in place to maintain isolation.

References
  1. Avoid expensive operations like deepcopy in tight loops or frequent execution paths if shallow copies or immutable patterns can achieve the same goal.
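A minimal sketch of the copy-on-write alternative the comment describes. It assumes a hypothetical two-reducer enum and a plain dict for the state data. The mapping is copied shallowly, and the APPEND reducer builds a new list instead of mutating the shared one. Untouched nested values remain shared, so this only isolates nodes if they never mutate nested values in place, which is the trade-off against the earlier deep-copy suggestion:

```python
from enum import Enum
from typing import Any, Dict


class Reducer(Enum):
    OVERWRITE = "overwrite"
    APPEND = "append"


def merge_output(data: Dict[str, Any], key: str, output: Any, reducer: Reducer) -> Dict[str, Any]:
    """Hypothetical copy-on-write merge: shallow-copies the mapping, never mutates values."""
    new_data = dict(data)  # top-level copy only; nested values are still shared
    if reducer is Reducer.APPEND:
        # Build a fresh list rather than calling .append() on the shared one.
        new_data[key] = [*data.get(key, []), output]
    else:
        new_data[key] = output
    return new_data


before = {"messages": ["hi"], "config": {"mode": "fast"}}
after = merge_output(before, "messages", "hello", Reducer.APPEND)
print(before["messages"])  # ['hi']
print(after["messages"])  # ['hi', 'hello']
print(after["config"] is before["config"])  # True: untouched values are shared, not copied
```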

existing = new_state.data.get(self.name)
if existing is None:
  # Infer zero-value from output type: "" for str, 0 for int/float, [] for list
  existing = type(output)()
Contributor

medium

Using type(output)() to infer a zero-value for the SUM reducer is risky. If output is None (which can happen with function nodes), type(None)() simply returns None and the subsequent addition raises a TypeError. For a type without a zero-argument constructor (like some Pydantic models with required fields), the instantiation itself fails. Consider adding explicit handling for common types or a null check.

References
  1. Ensure that dynamic type instantiation handles edge cases like NoneType or types without default constructors.
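A minimal sketch of a more defensive SUM reducer. It assumes the reducer receives the existing value (or None) and the new output. For the types the inline comment lists, zero + x == x, so the first write can adopt the output directly and no zero-value has to be constructed. A None output leaves the current value unchanged:

```python
from typing import Any


def sum_reduce(existing: Any, output: Any) -> Any:
    """Hypothetical SUM reducer that never calls type(output)()."""
    if output is None:
        # Nothing to add (e.g. a function node returned None): keep the current value.
        return existing
    if existing is None:
        # First write: "" + s, 0 + n and [] + xs all equal the output itself.
        return output
    return existing + output


class Score:
    """Example type with no zero-argument constructor."""

    def __init__(self, v: float) -> None:
        self.v = v

    def __add__(self, other: "Score") -> "Score":
        return Score(self.v + other.v)


print(sum_reduce(None, 3))  # 3
print(sum_reduce(3, 4))  # 7
print(sum_reduce("ab", "cd"))  # abcd
print(sum_reduce([1], [2, 3]))  # [1, 2, 3]
print(sum_reduce(10, None))  # 10
print(sum_reduce(None, Score(2)).v)  # 2
```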

Add GraphAgent for building directed-graph workflows with conditional
routing, cyclic execution, state management with reducers, typed events,
streaming, callbacks, rewind, resumability, telemetry with OpenTelemetry
tracing, evaluation metrics, and CLI graph visualization for GraphAgent
topologies. Includes samples and design documentation.
@drahnreb force-pushed the feat/graph-agent-pr1 branch from b39cbab to ef2722d (February 23, 2026 13:38)
@drahnreb (Author)

@gemini-code-assist please re-review

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces the GraphAgent, a powerful new feature for orchestrating complex, non-linear workflows. The implementation is comprehensive, covering core graph logic, state management, conditional routing, resumability, and telemetry. The addition of numerous well-documented examples and tests is commendable and greatly aids in understanding the new capabilities.

My review focuses on a few minor improvements in the example code for robustness and clarity. The core implementation of GraphAgent and its related components appears solid and well-designed.

  • In graph_agent_dynamic_queue/agent.py and graph_agent_react_pattern/agent.py, the logic for retrieving the final session state could be made more robust to avoid using stale data if the session fetch fails.
  • In graph_examples/03_cyclic_execution/agent.py, the code to determine the final counter value is a bit complex and could be simplified.
  • In graph_examples/15_enhanced_routing/agent.py, the output mapper for storing the score could be improved to parse the value from the agent's output instead of being hardcoded, making the LLM mode example more realistic.

Overall, this is an excellent contribution that significantly enhances the ADK's workflow orchestration capabilities.

Comment on lines +286 to +288
final_session = fresh_session or session
final_data = final_session.state.get("graph_data", {})
final_state = GraphState(data=final_data) if final_data else GraphState()
Contributor

medium

The fallback to the original session object in final_session = fresh_session or session could lead to incorrect final statistics being printed. The session object holds a stale copy of the state from before the runner.run_async call, as noted in the comment on line 245. If session_service.get_session were to fail and return None, this would fall back to the stale data, silently printing wrong results.

A more robust approach would be to handle the None case explicitly.

Suggested change
final_session = fresh_session or session
final_data = final_session.state.get("graph_data", {})
final_state = GraphState(data=final_data) if final_data else GraphState()
if not fresh_session:
  print("Error: Could not fetch final session state.")
  return
final_data = fresh_session.state.get("graph_data", {})
final_state = GraphState(data=final_data) if final_data else GraphState()

Comment on lines +200 to +201
final_data = (fresh_session or session).state.get("graph_data", {})
final_state = GraphState(data=final_data)
Contributor

medium

The fallback to the original session object in (fresh_session or session) could lead to incorrect final state being displayed if get_session fails and returns None. The original session object contains stale data from before the graph execution.

It would be more robust to handle the possibility of fresh_session being None explicitly.

Suggested change
final_data = (fresh_session or session).state.get("graph_data", {})
final_state = GraphState(data=final_data)
if not fresh_session:
  print("\nError: Could not fetch final session state.")
  return
final_data = fresh_session.state.get("graph_data", {})
final_state = GraphState(data=final_data)

Comment on lines +283 to +298
final_counter = session.state.get("counter")
if final_counter is None:
  graph_data_raw = session.state.get("graph_data")
  if graph_data_raw:
    try:
      data = (
          json.loads(graph_data_raw)
          if isinstance(graph_data_raw, str)
          else graph_data_raw
      )
      final_counter = data.get("counter", 0)
    except (json.JSONDecodeError, TypeError):
      final_counter = 0

if final_counter is None:
  final_counter = 0
Contributor

medium

The logic to retrieve the final_counter is a bit complex and contains some redundancy, making it hard to follow. It handles multiple cases (direct state, graph_data as string, graph_data as dict) but can be simplified for better readability and maintainability.

Consider refactoring this block to be more direct. For example, you could default final_counter to 0 and then update it if found in either session.state or the parsed graph_data.

  final_counter = session.state.get("counter")
  if final_counter is None:
    final_counter = 0  # Default value
    graph_data_raw = session.state.get("graph_data")
    if isinstance(graph_data_raw, str):
        try:
            graph_data_raw = json.loads(graph_data_raw)
        except json.JSONDecodeError:
            graph_data_raw = {}
    if isinstance(graph_data_raw, dict):
        final_counter = graph_data_raw.get("counter", 0)

Comment on lines +213 to +216
def store_score(output, state):
  new_state = GraphState(data=state.data.copy())
  new_state.data["risk_score"] = 0.85  # Score from analyze agent
  return new_state
Contributor

medium

The store_score output mapper currently hardcodes the risk_score to 0.85. While this is consistent with the value passed to create_agents_priority, it makes the example less robust. In LLM mode, it ignores the actual output from the model. In deterministic mode, it's redundant because the ScoreAgent already sets risk_score in the session state, which is then automatically synced to the graph state.

A better implementation would parse the score from the agent's output, making the example more realistic, especially for the LLM use case.

  def store_score(output, state):
    import re
    new_state = GraphState(data=state.data.copy())
    # For deterministic mode, ScoreAgent sets this in session state, which is synced.
    # For LLM mode, we parse it from the output string.
    match = re.search(r"score:\s*([\d.]+)", str(output))
    if match:
        new_state.data["risk_score"] = float(match.group(1))
    return new_state


Labels

core [Component] This issue is related to the core interface and implementation


Development

Successfully merging this pull request may close these issues.

feat: GraphAgent — directed-graph workflow orchestration for ADK

2 participants