115 commits
6412a5d
feat(gepa): add tool description optimization for multi-agent systems
Ju-usc Oct 10, 2025
cf0be4f
style: fix ruff formatting (trailing whitespace)
Ju-usc Oct 10, 2025
aa53fe2
style: apply ruff formatting fixes
Ju-usc Oct 10, 2025
045c6cf
feat(gepa): implement tool-specific proposer for tool descriptions
Ju-usc Oct 10, 2025
c4f2041
docs(gepa): clean up multi-agent example code
Ju-usc Oct 10, 2025
260ca80
refactor(gepa): simplify tool reflective dataset with ReAct context r…
Ju-usc Oct 11, 2025
04f7e3d
fix(gepa): unify custom proposer routing for tools
Ju-usc Oct 12, 2025
f92e184
docs(gepa): clarify tool reflection prompt
Ju-usc Oct 12, 2025
7178869
test: streamline GEPA tool optimization tests
Ju-usc Oct 12, 2025
e34703b
fix(gepa): streamline tool proposer formatting
Ju-usc Oct 12, 2025
3f05311
test(gepa): drop legacy dummy tool fixture
Ju-usc Oct 12, 2025
4df9ce5
docs(gepa): add tool-specific reflection prompt and metric example
Ju-usc Oct 12, 2025
4296ccf
docs(gepa): fix implementation details with accurate code flow
Ju-usc Oct 13, 2025
ea1204a
docs(gepa): remove backward compatibility note
Ju-usc Oct 13, 2025
48d5cd6
docs(gepa): improve usage examples with optimization visualization
Ju-usc Oct 13, 2025
548d9b6
docs(gepa): add design rationale comments for tool context sharing
Ju-usc Oct 13, 2025
e61d0a1
docs(gepa): add tool optimization links to overview and parameter docs
Ju-usc Oct 13, 2025
5c95412
docs(gepa): refine tool optimization scenarios and remove implementat…
Ju-usc Oct 13, 2025
19d7717
docs(gepa): clarify future work section in code comments
Ju-usc Oct 13, 2025
9ce5fe4
refactor(gepa): unify ReAct optimization as single module
Ju-usc Oct 24, 2025
91331d0
test(gepa): add end-to-end ReAct module optimization test
Ju-usc Oct 24, 2025
3418b59
fix(gepa): enable arg description optimization for ReAct tools
Ju-usc Oct 24, 2025
b26d39a
chore: remove legacy test_gepa_tool_optimization.py
Ju-usc Oct 24, 2025
2791b5c
fix: restore accidentally removed score mismatch warning
Ju-usc Oct 24, 2025
8e63c62
test: update fixture after arg description optimization fix
Ju-usc Oct 25, 2025
7a9d2f3
fix(test): use JSON-based hashing for cross-version fixture stability
Ju-usc Oct 25, 2025
cd0de57
refactor(gepa): rename optimize_tool_descriptions to optimize_react_c…
Ju-usc Oct 26, 2025
67bb739
docs(gepa): improve 'What is optimize_react_components?' section
Ju-usc Oct 26, 2025
b3026a7
docs(gepa): replace outdated tool-specific prompt with actual ReAct o…
Ju-usc Oct 26, 2025
4e107aa
docs(gepa): simplify 'How It Works' section with accurate routing beh…
Ju-usc Oct 26, 2025
78547e7
docs(gepa): remove outdated Implementation Details section
Ju-usc Oct 26, 2025
7fa829b
docs(gepa): replace theoretical scenarios with real user pain points
Ju-usc Oct 26, 2025
da0e7bc
docs(gepa): fix usage examples reference to match updated scenarios
Ju-usc Oct 26, 2025
e51158d
docs(gepa): update inspect section to show all 4 ReAct components wit…
Ju-usc Oct 26, 2025
776ab9b
docs(gepa): rewrite Section 8 with accurate custom proposer behavior …
Ju-usc Oct 26, 2025
ec6bb7b
fix(gepa): fix top-level ReAct module lookup and remove tool name san…
Ju-usc Oct 27, 2025
b6cc67b
refactor(gepa): unify ReAct module key handling and use constant
Ju-usc Oct 28, 2025
1206f38
test(gepa): add ReAct module detection tests for nested structures
Ju-usc Oct 28, 2025
333cbbf
test(gepa): add comprehensive ReAct detection and reconstruction tests
Ju-usc Oct 28, 2025
a50552a
test(gepa): add reflective dataset tests for multi-agent trajectory v…
Ju-usc Oct 28, 2025
965b157
test(gepa): verify tool arg descriptions propagate to args schema
Ju-usc Oct 29, 2025
5ddc6d3
fix(gepa): propagate arg_desc updates to tool.args for prompt rendering
Ju-usc Oct 29, 2025
2269de5
test(gepa): remove fixture-based test and unused dependencies
Ju-usc Oct 29, 2025
17456f0
test(gepa): remove unused fixture file
Ju-usc Oct 29, 2025
c884c18
style: fix ruff linting issues (import formatting, whitespace, bare e…
Ju-usc Oct 31, 2025
82dee25
refactor(test): rename setup_spy_for_base_program to setup_capture_fo…
Ju-usc Oct 31, 2025
ca84b9d
docs(gepa): clarify why Tool.func uses placeholder lambda in proposer
Ju-usc Oct 31, 2025
2eb8986
refactor(gepa): make all ReAct components optional with None default …
Ju-usc Oct 31, 2025
9f37ac1
docs(gepa): clarify 'LM' as 'reflection LM' in comments for precision
Ju-usc Oct 31, 2025
bd4cdac
refactor(gepa): refine reflection prompt to guide concise, focused Re…
Ju-usc Oct 31, 2025
0ad4077
docs(gepa): revise ReAct metric example to be general and extensible
Ju-usc Oct 31, 2025
ef5563e
docs(gepa): replace custom proposer example with reference to ReActMo…
Ju-usc Oct 31, 2025
1b10b65
docs(gepa): make custom proposer section more approachable and clear
Ju-usc Oct 31, 2025
675a0cd
docs(gepa): update ReAct reflection prompt to match current implement…
Ju-usc Nov 1, 2025
4a4d209
feat(gepa): warn when ReAct modules detected but optimization disabled
Ju-usc Nov 3, 2025
d84842f
test(gepa): fix DummyLM configuration and remove exception swallowing
Ju-usc Nov 9, 2025
bb28f5f
test(gepa): add failing tests for generic tool optimization
Ju-usc Nov 9, 2025
a590e46
refactor(gepa): rename optimize_react_components to enable_tool_optim…
Ju-usc Nov 9, 2025
6aceaf5
refactor(gepa): extract nested function to private method
Ju-usc Nov 9, 2025
7a5bf05
feat(gepa): detect tool-using predictors via type checking
Ju-usc Nov 9, 2025
12b01ed
test(gepa): update ReAct tests for predictor-name-based keys
Ju-usc Nov 10, 2025
265896c
test(gepa): use explicit predictor keys in tool optimization tests
Ju-usc Nov 10, 2025
fe19dac
feat(gepa): extract tools from runtime traces
Ju-usc Nov 10, 2025
38dd7cb
feat(gepa): detect tool-using predictors at compile time
Ju-usc Nov 10, 2025
7f05a73
refactor(gepa): use predictor identity for ReAct detection
Ju-usc Nov 10, 2025
0a6016d
test(gepa): refactor ReAct tests to use dynamic predictor names
Ju-usc Nov 10, 2025
a635768
refactor(gepa): generalize proposer to support both ReAct and tool mo…
Ju-usc Nov 10, 2025
e35603a
refactor(gepa): eliminate create-delete pattern in base_program build
Ju-usc Nov 10, 2025
ecb3726
refactor(gepa): eliminate ReAct coupling in build_program
Ju-usc Nov 11, 2025
d3693c9
refactor(gepa): apply code cleanup principles consistently
Ju-usc Nov 11, 2025
a086646
refactor(gepa): unify config extraction patterns
Ju-usc Nov 11, 2025
0cecb75
refactor(gepa): remove verbose logs and consolidate comments
Ju-usc Nov 11, 2025
9592c50
docs(gepa): clarify ReAct trace workaround with TODO
Ju-usc Nov 12, 2025
76d7af5
test(gepa): remove deprecated ReAct-specific tests and refactor tool …
Ju-usc Nov 13, 2025
ac66e05
feat(gepa): add assertion for ReAct two-predictor design
Ju-usc Nov 13, 2025
3ec4ada
test(gepa): add DSPy ReAct design docs and improve test consistency
Ju-usc Nov 13, 2025
b679ba2
fix(test): remove trailing whitespace and extra blank lines
Ju-usc Nov 13, 2025
02aa151
refactor(gepa): clarify tool proposer output field descriptions
Ju-usc Nov 14, 2025
d37e433
Merge branch 'main' into feature/tool-description-optimization
Ju-usc Nov 14, 2025
d8b7c66
refactor(gepa): treat args as canonical for tool arg descriptions
Ju-usc Nov 14, 2025
f62a68e
refactor(gepa): tolerate missing arg descriptions when applying tool …
Ju-usc Nov 14, 2025
e031409
refactor(gepa): use args as sole source of tool arg descriptions
Ju-usc Nov 14, 2025
a133545
test(gepa): drop arg_desc expectations from tool optimization tests
Ju-usc Nov 14, 2025
b1e4f3d
refactor(gepa): refine reflection prompts for tool optimization
Ju-usc Nov 19, 2025
7f81e88
refactor(gepa): improve tool extraction robustness and observability
Ju-usc Nov 19, 2025
f267ccc
refactor(gepa): simplify initialization logic
Ju-usc Nov 19, 2025
28ceb70
refactor(gepa): remove ReAct trace workaround
Ju-usc Nov 19, 2025
d8275ef
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
deeb010
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
4bcc714
chore: restore .gitignore to match main
Ju-usc Nov 19, 2025
4b872d7
docs(gepa): document tool optimization flag in overview
Ju-usc Nov 19, 2025
5129586
docs(gepa): clarify enable_tool_optimization and custom proposers
Ju-usc Nov 19, 2025
ebe4221
docs(gepa): update tool module optimization prompt to match actual code
Ju-usc Nov 20, 2025
2133b0b
docs(gepa): update How Tool Optimization Works section
Ju-usc Nov 20, 2025
9c05b6a
docs(gepa): update When to Use Tool Optimization section
Ju-usc Nov 20, 2025
ec9241b
docs(gepa): update custom proposers section for tool optimization
Ju-usc Nov 20, 2025
46d8f5e
docs(gepa): update usage examples with correct tool patterns and inte…
Ju-usc Nov 20, 2025
5d33fc6
docs(gepa): remove redundant metrics section
Ju-usc Nov 20, 2025
b564029
refactor(gepa): use absolute import for ToolModuleProposer
Ju-usc Nov 20, 2025
13209f5
docs(gepa): update tool optimization doc link
Ju-usc Nov 20, 2025
09990a6
docs(gepa): replace eval() example with get_weather tool
Ju-usc Nov 29, 2025
33fc771
fix(gepa): change ReAct detection log from warning to info
Ju-usc Dec 2, 2025
fa72fc0
refactor(gepa): extract _propose_component_texts as private method
Ju-usc Dec 2, 2025
2a15e56
refactor(gepa): TODO out generic tool module optimization, keep ReAct…
Ju-usc Dec 2, 2025
59f23e5
refactor(gepa): remove generic tool module detection, keep ReAct only
Ju-usc Dec 2, 2025
68d7021
refactor(gepa): improve naming and extract tool update methods
Ju-usc Dec 2, 2025
d99ba1d
refactor(gepa): remove unused TOOL_MODULE_PREFIX and rename to tool_c…
Ju-usc Dec 2, 2025
3fd9a0a
refactor(gepa): rename ToolModuleProposer to ToolProposer
Ju-usc Dec 2, 2025
7d64e7a
docs(gepa): update tool optimization docs for ReAct-only support
Ju-usc Dec 2, 2025
4b3ee18
refactor(gepa): unify prefix to TOOL_MODULE_PREFIX for all tool-using…
Ju-usc Dec 2, 2025
3a5fb7f
docs(gepa): remove CustomAgent example, keep ReAct only
Ju-usc Dec 2, 2025
0e75d8c
docs(gepa): update enable_tool_optimization docstring for ReAct-only …
Ju-usc Dec 2, 2025
734fbdf
test(gepa): remove generic tool tests, keep ReAct-only tests
Ju-usc Dec 2, 2025
1fb15ba
refactor(gepa): use local ToolProposer variable, update docs for ReAc…
Ju-usc Dec 2, 2025
da2f6d0
docs(gepa): update tool optimization docs for ReAct-only support
Ju-usc Dec 2, 2025
227 changes: 227 additions & 0 deletions docs/docs/api/optimizers/GEPA/GEPA_Advanced.md
@@ -443,3 +443,230 @@ gepa = dspy.GEPA(
auto="medium"
)
```

## Tool Optimization

### What is enable_tool_optimization?

When `enable_tool_optimization=True`, GEPA jointly optimizes `dspy.ReAct` modules: predictor instructions, tool descriptions, and tool argument descriptions are updated together rather than tuned in isolation. This lets the optimizer improve both when a tool is called and how it is used, driven by the same execution traces and feedback that power core GEPA.

### Usage and constraints

- **Expose tools as `dspy.Tool` in signatures and examples.** GEPA only optimizes tools that are passed into your modules as `dspy.Tool` objects.
- **Treat `Tool.name` as a stable identifier.** GEPA uses `Tool.name` to attach improved tool and argument descriptions, so reusing the same name for different tools makes them share the same text updates (see the sketch after this list).
- **Avoid custom tools named `"finish"`.** The built-in ReAct `"finish"` tool is reserved and excluded from optimization. Custom tools with the name `"finish"` are also not optimized.
- **Custom instruction proposers handle all modules and tool updates.** When you provide an `instruction_proposer`, GEPA routes every optimized module through your proposer instead of the built-in instruction proposer. If `enable_tool_optimization=True`, modules that call tools are still included, and your proposer is also responsible for updating their tool descriptions and argument descriptions.
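
For example, here is a minimal sketch (module, tool, and field names are illustrative) of how a shared `Tool.name` behaves across two agents:

```python
import dspy

def search_web(query: str) -> str:
    """Return search results for a query."""
    return f"Results for: {query}"

# Both agents expose the same logical tool under the same `Tool.name`, so GEPA
# attaches the same optimized description and argument descriptions to both.
# Give the tools distinct names (e.g. "news_search" vs. "paper_search") if they
# should receive independent updates.
shared_tool = dspy.Tool(search_web, name="search_web", desc="Search the web")

class TwoAgents(dspy.Module):
    def __init__(self):
        super().__init__()
        self.researcher = dspy.ReAct("question -> notes", tools=[shared_tool])
        self.writer = dspy.ReAct("question, notes -> answer", tools=[shared_tool])

    def forward(self, question):
        notes = self.researcher(question=question).notes
        return self.writer(question=question, notes=notes)
```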

### Tool Module Optimization Prompt

GEPA uses `ToolProposer` to optimize ReAct modules when `enable_tool_optimization=True`. For each module, the proposer builds a dynamic signature from the base `GenerateImprovedToolModuleDescriptionsFromFeedback` signature shown below, then appends output fields for each tool description and each tool argument description in that module. For ReAct modules, the proposer also appends input and output fields for the extract instruction.

```python
class GenerateImprovedToolModuleDescriptionsFromFeedback(dspy.Signature):
    """I provided an assistant with predictor instructions and tool descriptions,
    but its performance needs improvement based on the examples_with_feedback below.

    Your task is to propose better predictor instructions, tool descriptions, and
    tool argument descriptions that address the issues shown in these examples.
    Focus on reinforcing patterns that clearly improve the assistant's performance
    on similar tasks, rather than rewriting everything from scratch unless necessary.
    These components are progressively optimized - refine only what needs to change.

    Analyze the examples_with_feedback to identify success and failure patterns,
    and write improved instructions and descriptions at their appropriate level
    of abstraction and/or specificity, so that each layer plays a clear,
    complementary role without unnecessary repetition or verbosity unless
    redundancy clearly helps the assistant's performance.
    """

    current_predictor_instruction = dspy.InputField(
        desc="Current instruction guiding the predictor"
    )
    current_tools = dspy.InputField(
        annotation=list[dspy.Tool],
        desc="Available tools with their complete schemas"
    )
    examples_with_feedback = dspy.InputField(
        desc="Execution examples with feedback showing successes and failures"
    )

    improved_predictor_instruction: str | None = dspy.OutputField(
        desc="Improved instruction for the predictor",
        default=None
    )

    # GEPA appends output fields dynamically for each tool and argument:
    # - improved_tool_{name}_desc with desc="Improved description of tool '{name}'"
    # - improved_tool_{name}_arg_{param}_desc with desc="Improved description of the argument '{param}' of tool '{name}'"
    # For ReAct modules, GEPA also appends:
    # - current_extract_instruction (input) with desc="Current instruction for extraction predictor"
    # - improved_extract_instruction (output) with desc="Improved instruction for extraction"
```

The reflection LM uses this dynamically built signature to jointly propose updates across predictor instructions, tool descriptions, and argument descriptions based on execution feedback. Updates are coordinated rather than made in isolation: the LM sees all current components together and can selectively update any subset by returning new text, or return `None` to keep a component unchanged.
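
As a rough illustration of that mechanism (a hypothetical sketch, not the actual `ToolProposer` code), the per-tool output fields could be appended to the base signature with `Signature.append`:

```python
import dspy

def build_tool_module_signature(tools: list[dspy.Tool]) -> type[dspy.Signature]:
    """Hypothetical sketch: append one output field per tool description and per
    tool argument description, following the naming pattern documented above."""
    sig = GenerateImprovedToolModuleDescriptionsFromFeedback
    for tool in tools:
        sig = sig.append(
            f"improved_tool_{tool.name}_desc",
            dspy.OutputField(desc=f"Improved description of tool '{tool.name}'", default=None),
            type_=str | None,
        )
        for arg_name in tool.args:
            sig = sig.append(
                f"improved_tool_{tool.name}_arg_{arg_name}_desc",
                dspy.OutputField(
                    desc=f"Improved description of the argument '{arg_name}' of tool '{tool.name}'",
                    default=None,
                ),
                type_=str | None,
            )
    return sig
```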

### How Tool Optimization Works

When `enable_tool_optimization=True`, GEPA:

1. **Discovers ReAct modules** - Identifies `dspy.ReAct` modules and their associated tools
2. **Treats them as joint optimization units** - Instead of only optimizing predictor instructions, GEPA optimizes predictor instructions and tool descriptions together as a coordinated set; for ReAct this includes both the react and extract instructions
3. **Routes to specialized proposer** - Separates components by type and routes them appropriately:
- **With custom `instruction_proposer`**: Your custom proposer receives both ReAct modules and plain predictors, and is responsible for updating all components
- **With default proposer**: Plain predictors use the default instruction proposer; ReAct modules use `ToolProposer`, which employs the dynamic signature mechanism described above
4. **Optimizes jointly** - `ToolProposer` improves predictor instructions and tool descriptions together based on execution feedback, coordinating updates across all components rather than tuning them in isolation
5. **Applies updates** - Improved instructions update predictor signatures; improved tool descriptions and argument descriptions update all `dspy.Tool` objects with matching tool names throughout the program

Modules without tools (like `dspy.Predict` or `dspy.ChainOfThought`) continue using standard GEPA instruction-only optimization.
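
For instance, in a program that mixes a plain predictor with a ReAct agent (names below are illustrative), only the ReAct module and its tools are routed to `ToolProposer`; the classifier keeps instruction-only optimization:

```python
import dspy

def lookup_docs(topic: str) -> str:
    """Return documentation text for a topic."""
    return f"Docs for {topic}"

class SupportBot(dspy.Module):
    def __init__(self):
        super().__init__()
        # Plain predictor: optimized with standard instruction-only GEPA.
        self.classify = dspy.ChainOfThought("ticket -> category")
        # ReAct module: react/extract instructions and tool descriptions optimized jointly.
        self.resolve = dspy.ReAct(
            "ticket, category -> resolution",
            tools=[dspy.Tool(lookup_docs, name="lookup_docs", desc="Look up product docs")],
        )

    def forward(self, ticket):
        category = self.classify(ticket=ticket).category
        return self.resolve(ticket=ticket, category=category)
```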

### When to Use Tool Optimization

Enable `enable_tool_optimization=True` when tools are central to your program's behavior and you want GEPA to optimize predictor instructions and tool descriptions jointly. Common scenarios:

1. **Wrong tool selection** - Predictor with `search` and `weather` tools keeps searching when it should check weather, or vice versa. GEPA refines predictor instructions and tool descriptions to clarify when to use each tool.

2. **Underused tools** - Predictor responds "I don't know" without using available tools that could answer the question. GEPA improves predictor instructions to be more proactive about tool usage.

3. **Tool call loops** - Agent keeps calling `web_search` multiple times with similar queries instead of synthesizing information. GEPA improves instructions to encourage synthesis and tool descriptions to clarify when searches are sufficient.

4. **Extraction failures (ReAct)** - Agent executes tools correctly but fails to extract the final answer from the trajectory. GEPA improves extract instruction to better identify and format answers from tool outputs.

5. **Multi-agent delegation** - Parent agent has delegation tools to specialized sub-agents but doesn't understand when to use each. GEPA optimizes instructions and tool descriptions across both parent and sub-agent modules for coherent delegation.

See the usage example below for tool-using programs.
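
That example passes a `my_metric` function to GEPA. A minimal sketch of such a metric, following GEPA's convention of returning a score plus textual feedback (the `answer` fields are assumptions about your examples; adapt to your task):

```python
import dspy

def my_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    """Score a prediction and explain the result so GEPA's reflection LM can learn from it."""
    correct = gold.answer.lower() in pred.answer.lower()
    if correct:
        feedback = "Correct answer."
    else:
        feedback = (
            f"Expected '{gold.answer}' but got '{pred.answer}'. "
            "Check whether the right tool was called with useful arguments."
        )
    return dspy.Prediction(score=1.0 if correct else 0.0, feedback=feedback)
```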

### Usage Example

```python
import dspy

def search_web(query: str) -> str:
return f"Search results for: {query}"

def get_weather(city: str) -> str:
"""Get the current weather for a city."""
return f"The weather in {city} is sunny and 75°F"

# Create tools with basic descriptions
search_tool = dspy.Tool(search_web, name="search_web", desc="Search tool")
weather_tool = dspy.Tool(get_weather, name="get_weather", desc="Weather tool")

program = dspy.ReAct("question -> answer", tools=[search_tool, weather_tool])

# Enable tool optimization
gepa = dspy.GEPA(
metric=my_metric,
reflection_lm=dspy.LM(model="gpt-5-mini"),
enable_tool_optimization=True,
auto="medium"
)

optimized_program = gepa.compile(program, trainset=train_examples, valset=val_examples)
```

### Inspecting Optimized Programs

View optimization results and metadata (requires `track_stats=True`):

```python
# High-level optimization metadata
optimized_program.detailed_results
```

Access optimized instructions and tool descriptions directly:

```python
# Predictor instructions
for name, predictor in optimized_program.named_predictors():
    print(f"{name}: {predictor.signature.instructions}")

# Tool descriptions and argument descriptions
for tool_name, tool in optimized_program.tools.items():
    print(f"{tool_name}: {tool.desc}")
    for arg_name, arg_schema in tool.args.items():
        print(f"  {arg_name}: {arg_schema.get('description', 'N/A')}")
```
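
If your ReAct agent is nested inside a larger module, you can walk sub-modules to find its tools; a small sketch using DSPy's `named_sub_modules()`:

```python
# Walk nested sub-modules and print tool descriptions for each ReAct agent.
for path, sub_module in optimized_program.named_sub_modules():
    if isinstance(sub_module, dspy.ReAct):
        for tool_name, tool in sub_module.tools.items():
            print(f"{path}.{tool_name}: {tool.desc}")
```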

### Custom Instruction Proposers with Tool Optimization

When you provide a custom `instruction_proposer`, GEPA routes **all components** to your proposer, including both plain predictors and ReAct modules. Your proposer must handle both.

**What your proposer receives:**

- **Plain predictors**: instruction strings keyed by predictor name
- **Tool modules (ReAct)**: JSON strings keyed by the module identifier `f"{TOOL_MODULE_PREFIX}:{extract_predictor_name}"`, containing predictor instructions and tool schemas

**Your proposer's responsibilities:**

```python
import json
from dspy.teleprompt.gepa.gepa_utils import TOOL_MODULE_PREFIX

def custom_proposer(candidate, reflective_dataset, components_to_update):
    """Custom instruction proposer for GEPA with tool optimization.

    Args:
        candidate: dict[str, str] - All components in the program
            {
                "predictor_name": "instruction string",
                "tool_module:extract_name": '{"react_name": "...", "extract_name": "...", "tools": {...}}'
            }
        reflective_dataset: dict[str, list[dict]] - Execution examples with feedback per component
        components_to_update: list[str] - Component keys to optimize in this call

    Returns:
        dict[str, str]: Improved instructions for components_to_update keys only
    """
    improved_components = {}

    for component_key in components_to_update:
        if component_key.startswith(TOOL_MODULE_PREFIX):
            config = json.loads(candidate[component_key])
            # Example: {"react_pred": "react instruction", "extract_pred": "extract instruction", "tools": {...}}

            # Predictor instructions are the string-valued entries; the "tools" entry is a dict and is skipped
            predictor_keys = [k for k, v in config.items() if isinstance(v, str)]
            for pred_name in predictor_keys:
                config[pred_name] = "improved predictor instruction"

            # Update tool descriptions and argument descriptions
            for tool_name, tool_info in config.get("tools", {}).items():
                tool_info["desc"] = "improved tool description"
                for arg_name in tool_info.get("args", {}):
                    tool_info["args"][arg_name]["description"] = "improved argument description"

            improved_components[component_key] = json.dumps(config)
        else:
            # Plain predictor: improve instruction string only
            improved_components[component_key] = "improved instruction"

    return improved_components
```

Your proposer can use any optimization approach: custom prompts, LM calls, heuristics, or rule-based logic.
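
For example, a hypothetical sketch of swapping the placeholder strings above for a reflection-LM call on plain predictors (the signature string and model name are illustrative):

```python
import dspy

# Hypothetical helper: ask a reflection LM for a better instruction instead of
# returning a hard-coded placeholder string.
propose = dspy.Predict("current_instruction, examples_with_feedback -> improved_instruction")

def improve_instruction(instruction: str, examples: list[dict]) -> str:
    with dspy.context(lm=dspy.LM(model="gpt-5-mini")):  # any reflection LM you prefer
        return propose(
            current_instruction=instruction,
            examples_with_feedback=str(examples),
        ).improved_instruction
```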

**ReAct module JSON structure:**

```json
{
"react_name": "react instruction",
"extract_name": "extract instruction",
"tools": {
"search": {
"desc": "...",
"args": {"query": {"type": "string", "description": "..."}}
}
}
}
```

**What to update:**
- `config[predictor_name] = "proposed predictor instruction"`
- `config["tools"][tool_name]["desc"] = "proposed tool description"`
- `config["tools"][tool_name]["args"][arg_name]["description"] = "proposed argument description"`

**What to preserve:**
- `config["tools"][tool_name]["args"][arg_name]["type"]` and other schema metadata (changing these breaks the tool since they must match the underlying function's parameter types)

See [`ToolProposer`](https://github.com/stanfordnlp/dspy/blob/main/dspy/teleprompt/gepa/instruction_proposal.py) for reference.
6 changes: 6 additions & 0 deletions docs/docs/api/optimizers/GEPA/overview.md
@@ -117,6 +117,12 @@ Practical Recipe for GEPA-Friendly Feedback:
- **Multi-Objective Tasks** (e.g., PUPA): Decompose aggregate scores to reveal contributions from each objective, highlighting tradeoffs (e.g., quality vs. privacy).
- **Stacked Pipelines** (e.g., code generation: parse → compile → run → profile → evaluate): Expose stage-specific failures; natural-language traces often suffice for LLM self-correction.

## Tool Optimization with GEPA

When `enable_tool_optimization=True`, GEPA jointly optimizes `dspy.ReAct` modules. This lets the optimizer update predictor instructions and tool descriptions/argument descriptions together, based on execution traces and feedback, instead of keeping tool behavior fixed.

For details, examples, and the underlying design (tool discovery, naming requirements, and interaction with custom instruction proposers), see [Tool Optimization](GEPA_Advanced.md#tool-optimization).

## Custom Instruction Proposal

For advanced customization of GEPA's instruction proposal mechanism, including custom instruction proposers and component selectors, see [Advanced Features](GEPA_Advanced.md).
88 changes: 86 additions & 2 deletions dspy/teleprompt/gepa/gepa.py
@@ -1,4 +1,5 @@
import inspect
import json
import logging
import random
from dataclasses import dataclass
@@ -9,8 +10,15 @@
from gepa.proposer.reflective_mutation.base import ReflectionComponentSelector

from dspy.clients.lm import LM
from dspy.predict.react import ReAct
from dspy.primitives import Example, Module, Prediction
from dspy.teleprompt.gepa.gepa_utils import DspyAdapter, DSPyTrace, PredictorFeedbackFn, ScoreWithFeedback
from dspy.teleprompt.gepa.gepa_utils import (
    TOOL_MODULE_PREFIX,
    DspyAdapter,
    DSPyTrace,
    PredictorFeedbackFn,
    ScoreWithFeedback,
)
from dspy.teleprompt.teleprompt import Teleprompter
from dspy.utils.annotation import experimental

@@ -273,6 +281,11 @@ def metric(
warn_on_score_mismatch: GEPA (currently) expects the metric to return the same module-level score when
called with and without the pred_name. This flag (defaults to True) determines whether a warning is
raised if a mismatch in module-level and predictor-level score is detected.
enable_tool_optimization: Whether to enable joint optimization of dspy.ReAct modules.
When enabled, GEPA jointly optimizes predictor instructions and tool descriptions together
for dspy.ReAct modules. See the
[Tool Optimization guide](https://dspy.ai/api/optimizers/GEPA/GEPA_Advanced/#tool-optimization)
for details on when to use this feature and how it works. Default is False.
seed: The random seed to use for reproducibility. Default is 0.
gepa_kwargs: (Optional) Additional keyword arguments to pass directly to [gepa.optimize](https://github.com/gepa-ai/gepa/blob/main/src/gepa/api.py).
Useful for accessing advanced GEPA features not directly exposed through DSPy's GEPA interface.
@@ -355,6 +368,7 @@ def __init__(
wandb_init_kwargs: dict[str, Any] | None = None,
track_best_outputs: bool = False,
warn_on_score_mismatch: bool = True,
enable_tool_optimization: bool = False,
use_mlflow: bool = False,
# Reproducibility
seed: int | None = 0,
@@ -417,6 +431,7 @@ def __init__(
self.wandb_api_key = wandb_api_key
self.wandb_init_kwargs = wandb_init_kwargs
self.warn_on_score_mismatch = warn_on_score_mismatch
self.enable_tool_optimization = enable_tool_optimization
self.use_mlflow = use_mlflow

if track_best_outputs:
@@ -546,11 +561,80 @@ def feedback_fn(
reflection_lm=self.reflection_lm,
custom_instruction_proposer=self.custom_instruction_proposer,
warn_on_score_mismatch=self.warn_on_score_mismatch,
enable_tool_optimization=self.enable_tool_optimization,
reflection_minibatch_size=self.reflection_minibatch_size,
)

# Instantiate GEPA with the simpler adapter-based API
base_program = {name: pred.signature.instructions for name, pred in student.named_predictors()}
base_program = {}

# First, process ReAct modules to claim their predictors
if self.enable_tool_optimization:
    for module_path, module in student.named_sub_modules():
        if not isinstance(module, ReAct):
            continue

        # Verify DSPy's two-predictor ReAct design
        assert hasattr(module, "extract") and hasattr(module.extract, "predict"), \
            f"ReAct module '{module_path}' missing extract.predict - DSPy design may have changed"

        # Get predictor names via object identity
        extract_predictor = module.extract.predict
        react_predictor = module.react
        extract_predictor_name = None
        react_predictor_name = None
        for name, pred in student.named_predictors():
            if pred is extract_predictor:
                extract_predictor_name = name
            elif pred is react_predictor:
                react_predictor_name = name

        # Use extract.predict as the key since it is the target predictor for feedback lookup
        module_key = f"{TOOL_MODULE_PREFIX}:{extract_predictor_name}"

        # Build JSON config with dynamic predictor names as keys
        config = {
            react_predictor_name: react_predictor.signature.instructions,
            extract_predictor_name: extract_predictor.signature.instructions,
            "tools": {
                tool_name: {
                    "desc": tool.desc,
                    "args": tool.args,
                }
                for tool_name, tool in module.tools.items()
                if tool_name != "finish"  # Skip the built-in finish tool
            }
        }

        base_program[module_key] = json.dumps(config, indent=2)
else:
    # Log an info message when ReAct modules are found but tool optimization is disabled
    for module_path, module in student.named_sub_modules():
        if isinstance(module, ReAct):
            logger.info(
                f"Detected ReAct module at '{module_path}'. Consider using "
                "`enable_tool_optimization=True` to jointly optimize react instructions, "
                "extract instructions, tool descriptions, and tool argument descriptions."
            )

Review thread on this logger call:
- Collaborator: "This should be a info instead warning IMO, unless we can articulate that involving tools in the optimization works strictly better than not involving tools."
- Author (Ju-usc): "addressed!"

# Then, process individual predictors (skip if already part of a module config)
for name, pred in student.named_predictors():
    if self.enable_tool_optimization:
        # Skip if predictor is part of a ReAct module config
        found = False
        for key, val in base_program.items():
            if key.startswith(TOOL_MODULE_PREFIX):
                config = json.loads(val)
                if name in config:
                    found = True
                    break

        if found:
            continue

    # Add regular predictor
    base_program[name] = pred.signature.instructions

gepa_result: GEPAResult = optimize(
    seed_candidate=base_program,
    trainset=trainset,