115 commits
6412a5d
feat(gepa): add tool description optimization for multi-agent systems
Ju-usc Oct 10, 2025
cf0be4f
style: fix ruff formatting (trailing whitespace)
Ju-usc Oct 10, 2025
aa53fe2
style: apply ruff formatting fixes
Ju-usc Oct 10, 2025
045c6cf
feat(gepa): implement tool-specific proposer for tool descriptions
Ju-usc Oct 10, 2025
c4f2041
docs(gepa): clean up multi-agent example code
Ju-usc Oct 10, 2025
260ca80
refactor(gepa): simplify tool reflective dataset with ReAct context r…
Ju-usc Oct 11, 2025
04f7e3d
fix(gepa): unify custom proposer routing for tools
Ju-usc Oct 12, 2025
f92e184
docs(gepa): clarify tool reflection prompt
Ju-usc Oct 12, 2025
7178869
test: streamline GEPA tool optimization tests
Ju-usc Oct 12, 2025
e34703b
fix(gepa): streamline tool proposer formatting
Ju-usc Oct 12, 2025
3f05311
test(gepa): drop legacy dummy tool fixture
Ju-usc Oct 12, 2025
4df9ce5
docs(gepa): add tool-specific reflection prompt and metric example
Ju-usc Oct 12, 2025
4296ccf
docs(gepa): fix implementation details with accurate code flow
Ju-usc Oct 13, 2025
ea1204a
docs(gepa): remove backward compatibility note
Ju-usc Oct 13, 2025
48d5cd6
docs(gepa): improve usage examples with optimization visualization
Ju-usc Oct 13, 2025
548d9b6
docs(gepa): add design rationale comments for tool context sharing
Ju-usc Oct 13, 2025
e61d0a1
docs(gepa): add tool optimization links to overview and parameter docs
Ju-usc Oct 13, 2025
5c95412
docs(gepa): refine tool optimization scenarios and remove implementat…
Ju-usc Oct 13, 2025
19d7717
docs(gepa): clarify future work section in code comments
Ju-usc Oct 13, 2025
9ce5fe4
refactor(gepa): unify ReAct optimization as single module
Ju-usc Oct 24, 2025
91331d0
test(gepa): add end-to-end ReAct module optimization test
Ju-usc Oct 24, 2025
3418b59
fix(gepa): enable arg description optimization for ReAct tools
Ju-usc Oct 24, 2025
b26d39a
chore: remove legacy test_gepa_tool_optimization.py
Ju-usc Oct 24, 2025
2791b5c
fix: restore accidentally removed score mismatch warning
Ju-usc Oct 24, 2025
8e63c62
test: update fixture after arg description optimization fix
Ju-usc Oct 25, 2025
7a9d2f3
fix(test): use JSON-based hashing for cross-version fixture stability
Ju-usc Oct 25, 2025
cd0de57
refactor(gepa): rename optimize_tool_descriptions to optimize_react_c…
Ju-usc Oct 26, 2025
67bb739
docs(gepa): improve 'What is optimize_react_components?' section
Ju-usc Oct 26, 2025
b3026a7
docs(gepa): replace outdated tool-specific prompt with actual ReAct o…
Ju-usc Oct 26, 2025
4e107aa
docs(gepa): simplify 'How It Works' section with accurate routing beh…
Ju-usc Oct 26, 2025
78547e7
docs(gepa): remove outdated Implementation Details section
Ju-usc Oct 26, 2025
7fa829b
docs(gepa): replace theoretical scenarios with real user pain points
Ju-usc Oct 26, 2025
da0e7bc
docs(gepa): fix usage examples reference to match updated scenarios
Ju-usc Oct 26, 2025
e51158d
docs(gepa): update inspect section to show all 4 ReAct components wit…
Ju-usc Oct 26, 2025
776ab9b
docs(gepa): rewrite Section 8 with accurate custom proposer behavior …
Ju-usc Oct 26, 2025
ec6bb7b
fix(gepa): fix top-level ReAct module lookup and remove tool name san…
Ju-usc Oct 27, 2025
b6cc67b
refactor(gepa): unify ReAct module key handling and use constant
Ju-usc Oct 28, 2025
1206f38
test(gepa): add ReAct module detection tests for nested structures
Ju-usc Oct 28, 2025
333cbbf
test(gepa): add comprehensive ReAct detection and reconstruction tests
Ju-usc Oct 28, 2025
a50552a
test(gepa): add reflective dataset tests for multi-agent trajectory v…
Ju-usc Oct 28, 2025
965b157
test(gepa): verify tool arg descriptions propagate to args schema
Ju-usc Oct 29, 2025
5ddc6d3
fix(gepa): propagate arg_desc updates to tool.args for prompt rendering
Ju-usc Oct 29, 2025
2269de5
test(gepa): remove fixture-based test and unused dependencies
Ju-usc Oct 29, 2025
17456f0
test(gepa): remove unused fixture file
Ju-usc Oct 29, 2025
c884c18
style: fix ruff linting issues (import formatting, whitespace, bare e…
Ju-usc Oct 31, 2025
82dee25
refactor(test): rename setup_spy_for_base_program to setup_capture_fo…
Ju-usc Oct 31, 2025
ca84b9d
docs(gepa): clarify why Tool.func uses placeholder lambda in proposer
Ju-usc Oct 31, 2025
2eb8986
refactor(gepa): make all ReAct components optional with None default …
Ju-usc Oct 31, 2025
9f37ac1
docs(gepa): clarify 'LM' as 'reflection LM' in comments for precision
Ju-usc Oct 31, 2025
bd4cdac
refactor(gepa): refine reflection prompt to guide concise, focused Re…
Ju-usc Oct 31, 2025
0ad4077
docs(gepa): revise ReAct metric example to be general and extensible
Ju-usc Oct 31, 2025
ef5563e
docs(gepa): replace custom proposer example with reference to ReActMo…
Ju-usc Oct 31, 2025
1b10b65
docs(gepa): make custom proposer section more approachable and clear
Ju-usc Oct 31, 2025
675a0cd
docs(gepa): update ReAct reflection prompt to match current implement…
Ju-usc Nov 1, 2025
4a4d209
feat(gepa): warn when ReAct modules detected but optimization disabled
Ju-usc Nov 3, 2025
d84842f
test(gepa): fix DummyLM configuration and remove exception swallowing
Ju-usc Nov 9, 2025
bb28f5f
test(gepa): add failing tests for generic tool optimization
Ju-usc Nov 9, 2025
a590e46
refactor(gepa): rename optimize_react_components to enable_tool_optim…
Ju-usc Nov 9, 2025
6aceaf5
refactor(gepa): extract nested function to private method
Ju-usc Nov 9, 2025
7a5bf05
feat(gepa): detect tool-using predictors via type checking
Ju-usc Nov 9, 2025
12b01ed
test(gepa): update ReAct tests for predictor-name-based keys
Ju-usc Nov 10, 2025
265896c
test(gepa): use explicit predictor keys in tool optimization tests
Ju-usc Nov 10, 2025
fe19dac
feat(gepa): extract tools from runtime traces
Ju-usc Nov 10, 2025
38dd7cb
feat(gepa): detect tool-using predictors at compile time
Ju-usc Nov 10, 2025
7f05a73
refactor(gepa): use predictor identity for ReAct detection
Ju-usc Nov 10, 2025
0a6016d
test(gepa): refactor ReAct tests to use dynamic predictor names
Ju-usc Nov 10, 2025
a635768
refactor(gepa): generalize proposer to support both ReAct and tool mo…
Ju-usc Nov 10, 2025
e35603a
refactor(gepa): eliminate create-delete pattern in base_program build
Ju-usc Nov 10, 2025
ecb3726
refactor(gepa): eliminate ReAct coupling in build_program
Ju-usc Nov 11, 2025
d3693c9
refactor(gepa): apply code cleanup principles consistently
Ju-usc Nov 11, 2025
a086646
refactor(gepa): unify config extraction patterns
Ju-usc Nov 11, 2025
0cecb75
refactor(gepa): remove verbose logs and consolidate comments
Ju-usc Nov 11, 2025
9592c50
docs(gepa): clarify ReAct trace workaround with TODO
Ju-usc Nov 12, 2025
76d7af5
test(gepa): remove deprecated ReAct-specific tests and refactor tool …
Ju-usc Nov 13, 2025
ac66e05
feat(gepa): add assertion for ReAct two-predictor design
Ju-usc Nov 13, 2025
3ec4ada
test(gepa): add DSPy ReAct design docs and improve test consistency
Ju-usc Nov 13, 2025
b679ba2
fix(test): remove trailing whitespace and extra blank lines
Ju-usc Nov 13, 2025
02aa151
refactor(gepa): clarify tool proposer output field descriptions
Ju-usc Nov 14, 2025
d37e433
Merge branch 'main' into feature/tool-description-optimization
Ju-usc Nov 14, 2025
d8b7c66
refactor(gepa): treat args as canonical for tool arg descriptions
Ju-usc Nov 14, 2025
f62a68e
refactor(gepa): tolerate missing arg descriptions when applying tool …
Ju-usc Nov 14, 2025
e031409
refactor(gepa): use args as sole source of tool arg descriptions
Ju-usc Nov 14, 2025
a133545
test(gepa): drop arg_desc expectations from tool optimization tests
Ju-usc Nov 14, 2025
b1e4f3d
refactor(gepa): refine reflection prompts for tool optimization
Ju-usc Nov 19, 2025
7f81e88
refactor(gepa): improve tool extraction robustness and observability
Ju-usc Nov 19, 2025
f267ccc
refactor(gepa): simplify initialization logic
Ju-usc Nov 19, 2025
28ceb70
refactor(gepa): remove ReAct trace workaround
Ju-usc Nov 19, 2025
d8275ef
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
deeb010
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
4bcc714
chore: restore .gitignore to match main
Ju-usc Nov 19, 2025
4b872d7
docs(gepa): document tool optimization flag in overview
Ju-usc Nov 19, 2025
5129586
docs(gepa): clarify enable_tool_optimization and custom proposers
Ju-usc Nov 19, 2025
ebe4221
docs(gepa): update tool module optimization prompt to match actual code
Ju-usc Nov 20, 2025
2133b0b
docs(gepa): update How Tool Optimization Works section
Ju-usc Nov 20, 2025
9c05b6a
docs(gepa): update When to Use Tool Optimization section
Ju-usc Nov 20, 2025
ec9241b
docs(gepa): update custom proposers section for tool optimization
Ju-usc Nov 20, 2025
46d8f5e
docs(gepa): update usage examples with correct tool patterns and inte…
Ju-usc Nov 20, 2025
5d33fc6
docs(gepa): remove redundant metrics section
Ju-usc Nov 20, 2025
b564029
refactor(gepa): use absolute import for ToolModuleProposer
Ju-usc Nov 20, 2025
13209f5
docs(gepa): update tool optimization doc link
Ju-usc Nov 20, 2025
09990a6
docs(gepa): replace eval() example with get_weather tool
Ju-usc Nov 29, 2025
33fc771
fix(gepa): change ReAct detection log from warning to info
Ju-usc Dec 2, 2025
fa72fc0
refactor(gepa): extract _propose_component_texts as private method
Ju-usc Dec 2, 2025
2a15e56
refactor(gepa): TODO out generic tool module optimization, keep ReAct…
Ju-usc Dec 2, 2025
59f23e5
refactor(gepa): remove generic tool module detection, keep ReAct only
Ju-usc Dec 2, 2025
68d7021
refactor(gepa): improve naming and extract tool update methods
Ju-usc Dec 2, 2025
d99ba1d
refactor(gepa): remove unused TOOL_MODULE_PREFIX and rename to tool_c…
Ju-usc Dec 2, 2025
3fd9a0a
refactor(gepa): rename ToolModuleProposer to ToolProposer
Ju-usc Dec 2, 2025
7d64e7a
docs(gepa): update tool optimization docs for ReAct-only support
Ju-usc Dec 2, 2025
4b3ee18
refactor(gepa): unify prefix to TOOL_MODULE_PREFIX for all tool-using…
Ju-usc Dec 2, 2025
3a5fb7f
docs(gepa): remove CustomAgent example, keep ReAct only
Ju-usc Dec 2, 2025
0e75d8c
docs(gepa): update enable_tool_optimization docstring for ReAct-only …
Ju-usc Dec 2, 2025
734fbdf
test(gepa): remove generic tool tests, keep ReAct-only tests
Ju-usc Dec 2, 2025
1fb15ba
refactor(gepa): use local ToolProposer variable, update docs for ReAc…
Ju-usc Dec 2, 2025
da2f6d0
docs(gepa): update tool optimization docs for ReAct-only support
Ju-usc Dec 2, 2025
227 changes: 227 additions & 0 deletions docs/docs/api/optimizers/GEPA/GEPA_Advanced.md
@@ -443,3 +443,230 @@ gepa = dspy.GEPA(
auto="medium"
)
```

## Tool Optimization

### What is enable_tool_optimization?

When `enable_tool_optimization=True`, GEPA jointly optimizes `dspy.ReAct` modules: predictor instructions, tool descriptions, and tool argument descriptions are updated together rather than tuned in isolation. This lets the optimizer improve both when a tool is called and how it is used, driven by the same execution traces and feedback that power core GEPA.

### Usage and constraints

- **Expose tools as `dspy.Tool` in signatures and examples.** GEPA only optimizes tools that are passed into your modules as `dspy.Tool` objects.
- **Treat `Tool.name` as a stable identifier.** GEPA uses `Tool.name` to attach improved tool and argument descriptions, so reusing the same name for different tools makes them share the same text updates (see the sketch after this list).
- **Avoid custom tools named `"finish"`.** The built-in ReAct `"finish"` tool is reserved and excluded from optimization. Custom tools with the name `"finish"` are also not optimized.
- **Custom instruction proposers handle all modules and tool updates.** When you provide an `instruction_proposer`, GEPA routes every optimized module through your proposer instead of the built-in instruction proposer. If `enable_tool_optimization=True`, modules that call tools are still included, and your proposer is also responsible for updating their tool descriptions and argument descriptions.
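
For example, here is a minimal sketch (module, tool, and field names are illustrative) of how a shared `Tool.name` behaves across two agents:

```python
import dspy

def search_web(query: str) -> str:
    """Return search results for a query."""
    return f"Results for: {query}"

# Both agents expose the same logical tool under the same `Tool.name`, so GEPA
# attaches the same optimized description and argument descriptions to both.
# Give the tools distinct names (e.g. "news_search" vs. "paper_search") if they
# should receive independent updates.
shared_tool = dspy.Tool(search_web, name="search_web", desc="Search the web")

class TwoAgents(dspy.Module):
    def __init__(self):
        super().__init__()
        self.researcher = dspy.ReAct("question -> notes", tools=[shared_tool])
        self.writer = dspy.ReAct("question, notes -> answer", tools=[shared_tool])

    def forward(self, question):
        notes = self.researcher(question=question).notes
        return self.writer(question=question, notes=notes)
```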

### Tool Module Optimization Prompt

GEPA uses `ToolProposer` to optimize ReAct modules when `enable_tool_optimization=True`. For each module, the proposer builds a dynamic signature from the base `GenerateImprovedToolModuleDescriptionsFromFeedback` signature shown below, then appends output fields for each tool description and each tool argument description in that module. For ReAct modules, the proposer also appends input and output fields for the extract instruction.

```python
class GenerateImprovedToolModuleDescriptionsFromFeedback(dspy.Signature):
    """I provided an assistant with predictor instructions and tool descriptions,
    but its performance needs improvement based on the examples_with_feedback below.

    Your task is to propose better predictor instructions, tool descriptions, and
    tool argument descriptions that address the issues shown in these examples.
    Focus on reinforcing patterns that clearly improve the assistant's performance
    on similar tasks, rather than rewriting everything from scratch unless necessary.
    These components are progressively optimized - refine only what needs to change.

    Analyze the examples_with_feedback to identify success and failure patterns,
    and write improved instructions and descriptions at their appropriate level
    of abstraction and/or specificity, so that each layer plays a clear,
    complementary role without unnecessary repetition or verbosity unless
    redundancy clearly helps the assistant's performance.
    """

    current_predictor_instruction = dspy.InputField(
        desc="Current instruction guiding the predictor"
    )
    current_tools = dspy.InputField(
        annotation=list[dspy.Tool],
        desc="Available tools with their complete schemas"
    )
    examples_with_feedback = dspy.InputField(
        desc="Execution examples with feedback showing successes and failures"
    )

    improved_predictor_instruction: str | None = dspy.OutputField(
        desc="Improved instruction for the predictor",
        default=None
    )

    # GEPA appends output fields dynamically for each tool and argument:
    # - improved_tool_{name}_desc with desc="Improved description of tool '{name}'"
    # - improved_tool_{name}_arg_{param}_desc with desc="Improved description of the argument '{param}' of tool '{name}'"
    # For ReAct modules, GEPA also appends:
    # - current_extract_instruction (input) with desc="Current instruction for extraction predictor"
    # - improved_extract_instruction (output) with desc="Improved instruction for extraction"
```

The reflection LM uses this dynamically built signature to jointly propose updates across predictor instructions, tool descriptions, and argument descriptions based on execution feedback. Updates are coordinated rather than made in isolation: the LM sees all current components together and can selectively update any subset by returning new text, or return `None` to keep a component unchanged.
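
As a rough illustration of that mechanism (a hypothetical sketch, not the actual `ToolProposer` code), the per-tool output fields could be appended to the base signature with `Signature.append`:

```python
import dspy

def build_tool_module_signature(tools: list[dspy.Tool]) -> type[dspy.Signature]:
    """Hypothetical sketch: append one output field per tool description and per
    tool argument description, following the naming pattern documented above."""
    sig = GenerateImprovedToolModuleDescriptionsFromFeedback
    for tool in tools:
        sig = sig.append(
            f"improved_tool_{tool.name}_desc",
            dspy.OutputField(desc=f"Improved description of tool '{tool.name}'", default=None),
            type_=str | None,
        )
        for arg_name in tool.args:
            sig = sig.append(
                f"improved_tool_{tool.name}_arg_{arg_name}_desc",
                dspy.OutputField(
                    desc=f"Improved description of the argument '{arg_name}' of tool '{tool.name}'",
                    default=None,
                ),
                type_=str | None,
            )
    return sig
```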

### How Tool Optimization Works

When `enable_tool_optimization=True`, GEPA:

1. **Discovers ReAct modules** - Identifies `dspy.ReAct` modules and their associated tools
2. **Treats them as joint optimization units** - Instead of only optimizing predictor instructions, GEPA optimizes predictor instructions and tool descriptions together as a coordinated set; for ReAct this includes both the react and extract instructions
3. **Routes to specialized proposer** - Separates components by type and routes them appropriately:
- **With custom `instruction_proposer`**: Your custom proposer receives both ReAct modules and plain predictors, and is responsible for updating all components
- **With default proposer**: Plain predictors use the default instruction proposer; ReAct modules use `ToolProposer`, which employs the dynamic signature mechanism described above
4. **Optimizes jointly** - `ToolProposer` improves predictor instructions and tool descriptions together based on execution feedback, coordinating updates across all components rather than tuning them in isolation
5. **Applies updates** - Improved instructions update predictor signatures; improved tool descriptions and argument descriptions update all `dspy.Tool` objects with matching tool names throughout the program

Modules without tools (like `dspy.Predict` or `dspy.ChainOfThought`) continue using standard GEPA instruction-only optimization.
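
For instance, in a program that mixes a plain predictor with a ReAct agent (names below are illustrative), only the ReAct module and its tools are routed to `ToolProposer`; the classifier keeps instruction-only optimization:

```python
import dspy

def lookup_docs(topic: str) -> str:
    """Return documentation text for a topic."""
    return f"Docs for {topic}"

class SupportBot(dspy.Module):
    def __init__(self):
        super().__init__()
        # Plain predictor: optimized with standard instruction-only GEPA.
        self.classify = dspy.ChainOfThought("ticket -> category")
        # ReAct module: react/extract instructions and tool descriptions optimized jointly.
        self.resolve = dspy.ReAct(
            "ticket, category -> resolution",
            tools=[dspy.Tool(lookup_docs, name="lookup_docs", desc="Look up product docs")],
        )

    def forward(self, ticket):
        category = self.classify(ticket=ticket).category
        return self.resolve(ticket=ticket, category=category)
```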

### When to Use Tool Optimization

Enable `enable_tool_optimization=True` when tools are central to your program's behavior and you want GEPA to optimize predictor instructions and tool descriptions jointly. Common scenarios:

1. **Wrong tool selection** - Predictor with `search` and `weather` tools keeps searching when it should check weather, or vice versa. GEPA refines predictor instructions and tool descriptions to clarify when to use each tool.

2. **Underused tools** - Predictor responds "I don't know" without using available tools that could answer the question. GEPA improves predictor instructions to be more proactive about tool usage.

3. **Tool call loops** - Agent keeps calling `web_search` multiple times with similar queries instead of synthesizing information. GEPA improves instructions to encourage synthesis and tool descriptions to clarify when searches are sufficient.

4. **Extraction failures (ReAct)** - Agent executes tools correctly but fails to extract the final answer from the trajectory. GEPA improves extract instruction to better identify and format answers from tool outputs.

5. **Multi-agent delegation** - Parent agent has delegation tools to specialized sub-agents but doesn't understand when to use each. GEPA optimizes instructions and tool descriptions across both parent and sub-agent modules for coherent delegation.

See the usage example below for tool-using programs.
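
That example passes a `my_metric` function to GEPA. A minimal sketch of such a metric, following GEPA's convention of returning a score plus textual feedback (the `answer` fields are assumptions about your examples; adapt to your task):

```python
import dspy

def my_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    """Score a prediction and explain the result so GEPA's reflection LM can learn from it."""
    correct = gold.answer.lower() in pred.answer.lower()
    if correct:
        feedback = "Correct answer."
    else:
        feedback = (
            f"Expected '{gold.answer}' but got '{pred.answer}'. "
            "Check whether the right tool was called with useful arguments."
        )
    return dspy.Prediction(score=1.0 if correct else 0.0, feedback=feedback)
```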

### Usage Example

```python
import dspy

def search_web(query: str) -> str:
return f"Search results for: {query}"

def get_weather(city: str) -> str:
"""Get the current weather for a city."""
return f"The weather in {city} is sunny and 75°F"

# Create tools with basic descriptions
search_tool = dspy.Tool(search_web, name="search_web", desc="Search tool")
weather_tool = dspy.Tool(get_weather, name="get_weather", desc="Weather tool")

program = dspy.ReAct("question -> answer", tools=[search_tool, weather_tool])

# Enable tool optimization
gepa = dspy.GEPA(
metric=my_metric,
reflection_lm=dspy.LM(model="gpt-5-mini"),
enable_tool_optimization=True,
auto="medium"
)

optimized_program = gepa.compile(program, trainset=train_examples, valset=val_examples)
```

### Inspecting Optimized Programs

View optimization results and metadata (requires `track_stats=True`):

```python
# High-level optimization metadata
optimized_program.detailed_results
```

Access optimized instructions and tool descriptions directly:

```python
# Predictor instructions
for name, predictor in optimized_program.named_predictors():
    print(f"{name}: {predictor.signature.instructions}")

# Tool descriptions and argument descriptions
for tool_name, tool in optimized_program.tools.items():
    print(f"{tool_name}: {tool.desc}")
    for arg_name, arg_schema in tool.args.items():
        print(f"  {arg_name}: {arg_schema.get('description', 'N/A')}")
```
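
If your ReAct agent is nested inside a larger module, you can walk sub-modules to find its tools; a small sketch using DSPy's `named_sub_modules()`:

```python
# Walk nested sub-modules and print tool descriptions for each ReAct agent.
for path, sub_module in optimized_program.named_sub_modules():
    if isinstance(sub_module, dspy.ReAct):
        for tool_name, tool in sub_module.tools.items():
            print(f"{path}.{tool_name}: {tool.desc}")
```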

### Custom Instruction Proposers with Tool Optimization

When you provide a custom `instruction_proposer`, GEPA routes **all components** to your proposer, including both plain predictors and ReAct modules. Your proposer must handle both.

**What your proposer receives:**

- **Plain predictors**: instruction strings keyed by predictor name
- **Tool modules (ReAct)**: JSON strings keyed by the module identifier `f"{TOOL_MODULE_PREFIX}:{extract_predictor_name}"`, containing predictor instructions and tool schemas

**Your proposer's responsibilities:**

```python
import json
from dspy.teleprompt.gepa.gepa_utils import TOOL_MODULE_PREFIX

def custom_proposer(candidate, reflective_dataset, components_to_update):
    """Custom instruction proposer for GEPA with tool optimization.

    Args:
        candidate: dict[str, str] - All components in the program
            {
                "predictor_name": "instruction string",
                "tool_module:extract_name": '{"react_name": "...", "extract_name": "...", "tools": {...}}'
            }
        reflective_dataset: dict[str, list[dict]] - Execution examples with feedback per component
        components_to_update: list[str] - Component keys to optimize in this call

    Returns:
        dict[str, str]: Improved instructions for components_to_update keys only
    """
    improved_components = {}

    for component_key in components_to_update:
        if component_key.startswith(TOOL_MODULE_PREFIX):
            config = json.loads(candidate[component_key])
            # Example: {"react_pred": "react instruction", "extract_pred": "extract instruction", "tools": {...}}

            # Predictor instructions are the string-valued entries; the "tools" entry is a dict and is skipped
            predictor_keys = [k for k, v in config.items() if isinstance(v, str)]
            for pred_name in predictor_keys:
                config[pred_name] = "improved predictor instruction"

            # Update tool descriptions and argument descriptions
            for tool_name, tool_info in config.get("tools", {}).items():
                tool_info["desc"] = "improved tool description"
                for arg_name in tool_info.get("args", {}):
                    tool_info["args"][arg_name]["description"] = "improved argument description"

            improved_components[component_key] = json.dumps(config)
        else:
            # Plain predictor: improve instruction string only
            improved_components[component_key] = "improved instruction"

    return improved_components
```

Your proposer can use any optimization approach: custom prompts, LM calls, heuristics, or rule-based logic.
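
For example, a hypothetical sketch of swapping the placeholder strings above for a reflection-LM call on plain predictors (the signature string and model name are illustrative):

```python
import dspy

# Hypothetical helper: ask a reflection LM for a better instruction instead of
# returning a hard-coded placeholder string.
propose = dspy.Predict("current_instruction, examples_with_feedback -> improved_instruction")

def improve_instruction(instruction: str, examples: list[dict]) -> str:
    with dspy.context(lm=dspy.LM(model="gpt-5-mini")):  # any reflection LM you prefer
        return propose(
            current_instruction=instruction,
            examples_with_feedback=str(examples),
        ).improved_instruction
```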

**ReAct module JSON structure:**

```json
{
"react_name": "react instruction",
"extract_name": "extract instruction",
"tools": {
"search": {
"desc": "...",
"args": {"query": {"type": "string", "description": "..."}}
}
}
}
```

**What to update:**
- `config[predictor_name] = "proposed predictor instruction"`
- `config["tools"][tool_name]["desc"] = "proposed tool description"`
- `config["tools"][tool_name]["args"][arg_name]["description"] = "proposed argument description"`

**What to preserve:**
- `config["tools"][tool_name]["args"][arg_name]["type"]` and other schema metadata (changing these breaks the tool since they must match the underlying function's parameter types)

See [`ToolProposer`](https://github.com/stanfordnlp/dspy/blob/main/dspy/teleprompt/gepa/instruction_proposal.py) for reference.
6 changes: 6 additions & 0 deletions docs/docs/api/optimizers/GEPA/overview.md
@@ -117,6 +117,12 @@ Practical Recipe for GEPA-Friendly Feedback:
- **Multi-Objective Tasks** (e.g., PUPA): Decompose aggregate scores to reveal contributions from each objective, highlighting tradeoffs (e.g., quality vs. privacy).
- **Stacked Pipelines** (e.g., code generation: parse → compile → run → profile → evaluate): Expose stage-specific failures; natural-language traces often suffice for LLM self-correction.

## Tool Optimization with GEPA

When `enable_tool_optimization=True`, GEPA jointly optimizes `dspy.ReAct` modules. This lets the optimizer update predictor instructions and tool descriptions/argument descriptions together, based on execution traces and feedback, instead of keeping tool behavior fixed.

For details, examples, and the underlying design (tool discovery, naming requirements, and interaction with custom instruction proposers), see [Tool Optimization](GEPA_Advanced.md#tool-optimization).

## Custom Instruction Proposal

For advanced customization of GEPA's instruction proposal mechanism, including custom instruction proposers and component selectors, see [Advanced Features](GEPA_Advanced.md).
88 changes: 86 additions & 2 deletions dspy/teleprompt/gepa/gepa.py
@@ -1,4 +1,5 @@
import inspect
import json
import logging
import random
from dataclasses import dataclass
@@ -9,8 +10,15 @@
from gepa.proposer.reflective_mutation.base import ReflectionComponentSelector

from dspy.clients.lm import LM
from dspy.predict.react import ReAct
from dspy.primitives import Example, Module, Prediction
from dspy.teleprompt.gepa.gepa_utils import DspyAdapter, DSPyTrace, PredictorFeedbackFn, ScoreWithFeedback
from dspy.teleprompt.gepa.gepa_utils import (
    TOOL_MODULE_PREFIX,
    DspyAdapter,
    DSPyTrace,
    PredictorFeedbackFn,
    ScoreWithFeedback,
)
from dspy.teleprompt.teleprompt import Teleprompter
from dspy.utils.annotation import experimental

@@ -273,6 +281,11 @@ def metric(
warn_on_score_mismatch: GEPA (currently) expects the metric to return the same module-level score when
called with and without the pred_name. This flag (defaults to True) determines whether a warning is
raised if a mismatch in module-level and predictor-level score is detected.
enable_tool_optimization: Whether to enable joint optimization of dspy.ReAct modules.
When enabled, GEPA jointly optimizes predictor instructions and tool descriptions together
for dspy.ReAct modules. See the
[Tool Optimization guide](https://dspy.ai/api/optimizers/GEPA/GEPA_Advanced/#tool-optimization)
for details on when to use this feature and how it works. Default is False.
seed: The random seed to use for reproducibility. Default is 0.
gepa_kwargs: (Optional) Additional keyword arguments to pass directly to [gepa.optimize](https://github.com/gepa-ai/gepa/blob/main/src/gepa/api.py).
Useful for accessing advanced GEPA features not directly exposed through DSPy's GEPA interface.
@@ -355,6 +368,7 @@ def __init__(
wandb_init_kwargs: dict[str, Any] | None = None,
track_best_outputs: bool = False,
warn_on_score_mismatch: bool = True,
enable_tool_optimization: bool = False,
use_mlflow: bool = False,
# Reproducibility
seed: int | None = 0,
@@ -417,6 +431,7 @@ def __init__(
self.wandb_api_key = wandb_api_key
self.wandb_init_kwargs = wandb_init_kwargs
self.warn_on_score_mismatch = warn_on_score_mismatch
self.enable_tool_optimization = enable_tool_optimization
self.use_mlflow = use_mlflow

if track_best_outputs:
@@ -546,11 +561,80 @@ def feedback_fn(
reflection_lm=self.reflection_lm,
custom_instruction_proposer=self.custom_instruction_proposer,
warn_on_score_mismatch=self.warn_on_score_mismatch,
enable_tool_optimization=self.enable_tool_optimization,
reflection_minibatch_size=self.reflection_minibatch_size,
)

# Instantiate GEPA with the simpler adapter-based API
base_program = {name: pred.signature.instructions for name, pred in student.named_predictors()}
base_program = {}

# First, process ReAct modules to claim their predictors
if self.enable_tool_optimization:
    for module_path, module in student.named_sub_modules():
        if not isinstance(module, ReAct):
            continue

        # Verify DSPy's two-predictor ReAct design
        assert hasattr(module, "extract") and hasattr(module.extract, "predict"), \
            f"ReAct module '{module_path}' missing extract.predict - DSPy design may have changed"

        # Get predictor names via object identity
        extract_predictor = module.extract.predict
        react_predictor = module.react
        extract_predictor_name = None
        react_predictor_name = None
        for name, pred in student.named_predictors():
            if pred is extract_predictor:
                extract_predictor_name = name
            elif pred is react_predictor:
                react_predictor_name = name

        # Use extract.predict as the key since it is the target predictor for feedback lookup
        module_key = f"{TOOL_MODULE_PREFIX}:{extract_predictor_name}"

        # Build JSON config with dynamic predictor names as keys
        config = {
            react_predictor_name: react_predictor.signature.instructions,
            extract_predictor_name: extract_predictor.signature.instructions,
            "tools": {
                tool_name: {
                    "desc": tool.desc,
                    "args": tool.args,
                }
                for tool_name, tool in module.tools.items()
                if tool_name != "finish"  # Skip the built-in finish tool
            }
        }

        base_program[module_key] = json.dumps(config, indent=2)
else:
    # Log an info message when ReAct modules are found but tool optimization is disabled
    for module_path, module in student.named_sub_modules():
        if isinstance(module, ReAct):
            logger.info(
                f"Detected ReAct module at '{module_path}'. Consider using "
                "`enable_tool_optimization=True` to jointly optimize react instructions, "
                "extract instructions, tool descriptions, and tool argument descriptions."
            )

Review thread on this logger call:
- Collaborator: "This should be a info instead warning IMO, unless we can articulate that involving tools in the optimization works strictly better than not involving tools."
- Author (Ju-usc): "addressed!"

# Then, process individual predictors (skip if already part of a module config)
for name, pred in student.named_predictors():
    if self.enable_tool_optimization:
        # Skip if predictor is part of a ReAct module config
        found = False
        for key, val in base_program.items():
            if key.startswith(TOOL_MODULE_PREFIX):
                config = json.loads(val)
                if name in config:
                    found = True
                    break

        if found:
            continue

    # Add regular predictor
    base_program[name] = pred.signature.instructions

gepa_result: GEPAResult = optimize(
    seed_candidate=base_program,
    trainset=trainset,