feat(gepa): add tool description optimization for multi-agent systems #8928
Conversation
- Add `optimize_tool_descriptions` parameter (default `False`) to GEPA
- Extract tool descriptions from all nested modules via `named_sub_modules()`
- Apply optimized descriptions in `DspyAdapter.build_program()`
- Enables holistic optimization of tools across main and subagent modules
- Tests: 4 new tests, all 16 pass (4 new + 12 existing)
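The step "Extract tool descriptions from all nested modules via `named_sub_modules()`" could be sketched roughly as below; `named_sub_modules()` matches the traversal named in the commit message, while the `tools` attribute access, the key format, and the stand-in objects are assumptions for illustration, not the PR's actual code:

```python
from types import SimpleNamespace

def collect_tool_descriptions(program):
    """Gather {component_key: description} for every tool found in nested modules."""
    descriptions = {}
    for name, module in program.named_sub_modules():
        # Assumed: tool-using modules expose their tools as a dict attribute.
        tools = getattr(module, "tools", None) or {}
        for tool_name, tool in tools.items():
            descriptions[f"{name}.tools.{tool_name}"] = tool.desc
    return descriptions

# Minimal stand-in for a program with one nested tool-using module.
search_tool = SimpleNamespace(desc="Search the web.")
agent = SimpleNamespace(tools={"search": search_tool})
program = SimpleNamespace(named_sub_modules=lambda: [("self.agent", agent)])

collect_tool_descriptions(program)
```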
Apologies for accidentally closing #8927. Thank you for the thorough review, @LakshyAAAgrawal! I'll address your feedback:
I'll start working on items 1 and 2 and update the PR soon. Please let me know if you have any specific preferences for the tutorial format!
Thanks a lot! For the tutorial, I think you can follow the current GEPA tutorial format (load a dataset, show an example from the dataset, build a dspy program, evaluate the baseline program on the testset, run GEPA with the new optimization settings, show the optimized program's prompts and tool descriptions, and finally evaluate the optimized program). Hopefully we should be able to see a nice and large gain on agentic tasks with this amazing contribution by you!
- Add `ToolProposer` with `GenerateImprovedToolDescription` signature
- Implement routing logic to separate tools from signatures
- Tools use `ToolProposer`; signatures use custom or parent default
- Backward compatible: preserves existing `custom_instruction_proposer` behavior
- Add test verifying routing splits components correctly
- Define tool functions outside class for clarity
- Match structure of simple ReAct example
- Add clear comments explaining architecture
- Make code more readable and maintainable
Hi @LakshyAAAgrawal, I've implemented the tool-specific proposer as requested! Here's what's included:

1. Tool-Specific Proposer Implementation ✅
2. Documentation ✅
Reflection Prompt Design: Before I create a short tutorial (item #3), would you have any feedback on:
Any feedback would be helpful before I invest time in the tutorial. Thank you!
Wait, there is a bug in the implementation; working on a fix. The test also has to be fixed.
…euse

Tools now copy ReAct's reflective data with tool-specific annotation instead of complex trajectory extraction. This 15-line approach reuses ReAct's existing context (thoughts, tool calls, observations) and adds focused annotation for each tool.

Implementation:
- Tools receive full ReAct reflective examples (same trajectory context)
- Feedback prefixed `[Optimizing tool: 'X']` for focused optimization
- Reflection LM sees complete multi-step execution traces per tool

Benefits:
- Simpler: 15 lines vs 70+ line extraction approach
- Reuses code: no duplicate trajectory formatting logic
- Same context: tools see full ReAct execution traces
- Clean: removed all debug output

Tests:
- 4 focused tests following GEPA patterns (removed 1 redundant)
- 226KB fixture with 34 LM + 6 reflection calls
- All tests passing with gpt-5-nano traces

Documentation:
- Updated GEPA_Advanced.md with implementation details
- Explains reflective dataset construction approach
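The annotation step described in this commit (copying ReAct's reflective examples and prefixing the feedback per tool) can be sketched as follows; the function name and example schema are illustrative, not the PR's exact code:

```python
def annotate_for_tool(react_examples, tool_name):
    """Copy ReAct reflective examples, prefixing the feedback for one tool."""
    annotated = []
    for example in react_examples:
        copied = dict(example)  # shallow copy so the original example is untouched
        copied["Feedback"] = f"[Optimizing tool: '{tool_name}'] {example['Feedback']}"
        annotated.append(copied)
    return annotated

examples = [{"Inputs": "q1", "Feedback": "Correct answer."}]
annotate_for_tool(examples, "search")
```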
> The `optimize_tool_descriptions` parameter enables GEPA to optimize tool descriptions in addition to signature instructions. This is particularly valuable for ReAct agents and other tool-using systems, where the quality of tool descriptions directly impacts the agent's ability to select appropriate tools for each task.
> Unlike signature instructions that guide reasoning strategies, tool descriptions serve a fundamentally different purpose: they help agents decide **which tool to use** in a given situation. GEPA recognizes this categorical difference and applies a specialized reflection prompt tailored for tool selection decisions.
which tool to use, when to use it, and how to use it. All three are captured by the description.
Let's avoid the word "fundamentally". One can imagine that all of the tool descriptions can be (and many times are) simply included in the system prompt itself.
Please also add a corresponding entry in GEPA Overview, that links to this file/section.
> Consider enabling `optimize_tool_descriptions=True` when:
> - **Building ReAct agents**: ReAct agents rely on tool descriptions to make action selection decisions
One should consider using this when they use `dspy.Tool` anywhere in the DSPy program. Here are a few scenarios for using `dspy.Tool`:
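For concreteness, here is a stand-in mirroring the fields this PR reads off a tool (`name`, `desc`, `args`); the description and argument descriptions are exactly the strings tool optimization would rewrite. `MockTool` is a hypothetical substitute for illustration, not dspy's actual class:

```python
class MockTool:
    """Stand-in for a dspy.Tool-like object (assumed fields: func, name, desc, args)."""
    def __init__(self, func, name, desc, args):
        self.func, self.name, self.desc, self.args = func, name, desc, args

def search(query: str) -> str:
    return f"results for {query}"

search_tool = MockTool(
    func=search,
    name="search",
    desc="Search the web for up-to-date information.",  # the string GEPA would optimize
    args={"query": {"description": "The search query string."}},  # arg descriptions, also optimizable
)
```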
> **Note:** Tool optimization is fully backward compatible. Existing programs without tools, or with `optimize_tool_descriptions=False`, continue to work exactly as before.
I don't think we need to inform users about backward compatibility here. It should be implicit that there should be no behaviour changes for any program not containing dspy.Tool.
`dspy/teleprompt/gepa/gepa.py` (outdated)
```python
        raised if a mismatch in module-level and predictor-level score is detected.
    optimize_tool_descriptions: Whether to optimize tool descriptions for modules with tools
        (e.g., ReAct agents). When enabled, tool descriptions are included in the optimization
        process alongside signature instructions. Default is False.
```
Add a link to GEPA Advanced/Tool section
`dspy/teleprompt/gepa/gepa_utils.py` (outdated)
```python
    )

    self.propose_new_texts = custom_propose_new_texts
elif self.optimize_tool_descriptions:
```
Edge case: what should happen when the user provides both a custom proposer and enables `optimize_tool_descriptions`?
`dspy/teleprompt/gepa/gepa_utils.py` (outdated)
```python
# Handle signature components - replicate proposer's default behavior
sig_texts = {}
if sig_components:
    from gepa.strategies.instruction_proposal import InstructionProposalSignature
```
This is a slight deviation from this PR, but would be a large enhancement (feel free to ignore):
- Create 2 fields, `self.instruction_proposal_signature` and `self.tool_proposer`, which are initialized to the default `InstructionProposalSignature` and `ToolProposerSignature`.
- Take an argument from `dspy.GEPA` that can override the default signature values.
`dspy/teleprompt/gepa/gepa_utils.py` (outdated)
```python
# Second pass: Process tools by copying ReAct data with annotation
react_module_name = None
for name in ret_d.keys():
    if "react" in name.lower():
```
Is this robust? Might it be better to use isinstance or some other way?
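The reviewer's suggestion could be sketched like this; `ReActStub` is a stand-in for `dspy.ReAct`, and only the `isinstance`-based detection pattern (rather than matching `"react"` in the name) is the point:

```python
class ReActStub:
    """Stand-in for dspy.ReAct; only the type check matters here."""
    pass

class OtherModule:
    pass

def find_react_module(named_modules):
    """Return the first (name, module) pair that is a ReAct instance, else None."""
    for name, module in named_modules:
        if isinstance(module, ReActStub):  # robust to arbitrary attribute names
            return name, module
    return None

modules = [("self.summarizer", OtherModule()), ("self.agent", ReActStub())]
find_react_module(modules)
```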
> Your task is to write a better description for this tool.
> Read the examples carefully and identify patterns in when the tool was used successfully versus when it was misused or overlooked. Identify any domain-specific information about the tool's capabilities or appropriate usage that may not be available to the assistant in the future. The assistant may have developed effective patterns for tool selection - if so, ensure the tool description supports those patterns.
Tool use. Also suggest identifying any failure modes of the tool?
Dear @Ju-usc, this is a great PR. Thanks a lot! I have tried to be overly critical and made too many nits. Feel free to ignore if you disagree with something. Let me know if you'd like me to address anything! Regarding the meta prompt, overall I think it looks great. However, I suggest that as you build the tutorial, you may find that the reflection prompt needs tweaking, or the content exposed in reflective_dataset for the tool may be lacking or need improvement. This is going to be an empirical exercise, which will guide what works in the reflection meta prompts. Looking forward to the tutorial on this too! You may already have thoughts about what you'd like to show in the tutorial, but if not, you may consider building off (https://kargarisaac.medium.com/building-and-optimizing-multi-agent-rag-systems-with-dspy-and-gepa-2b88b5838ce2) by @kargarisaac.
- Add `GenerateImprovedToolDescriptionFromFeedback` signature documentation
- Include tool-aware metric example showing trajectory access
- Document tool prefix annotation in feedback
- Note `component_selector` applies to both signatures and tools
- Fix 'fundamentally' language per reviewer feedback
- Separate Pass 1 (predictor examples) and Pass 2 (tool aggregation)
- Clarify Generated Outputs includes full trajectory for ReAct
- Fix feedback annotation format to `[Tool 'name' from 'predictor_key']`
- Add Component Identification & Proposer Routing section
- Explain dual-proposer independence (custom proposer doesn't affect tool proposer)
- Use consistent terminology: 'predictor' and 'signature instructions'
Improve instructions for the reflection LM to focus on reinforcing successful patterns and providing progressively optimized updates for predictor instructions and tool descriptions.
Move tool extraction logic to evaluate() loop for immediate capture. Fix overwrite risk by merging discovered tools with existing config. Improve logging and docstrings for better maintainability.
Move helper function outside loop and simplify predictor deduplication check by validating keys before parsing JSON.
Use standard trace selection logic (prioritizing failures) for all modules including ReAct. The extractor logic workaround is no longer needed as we handle aggregated duplicates differently.
@LakshyAAAgrawal @chenmoneygithub Thanks again for the thoughtful feedback; I've pushed toward being as generic as safely possible while preserving ReAct behavior.

Core idea: optimize tools jointly with the predictor that uses them. For generic tool modules, I detect predictors with tool-typed inputs. Here are my thoughts on a few design choices. Feel free to comment on these or anything else:

Ran an experiment with nested ReAct + custom tool module: https://gist.github.com/Ju-usc/80b9918fe07288204579df735e084cb4 Happy to iterate!
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
```python
current_module_config = json.loads(candidate[module_key])

# Predictor keys: 1 for tool modules, 2 for ReAct modules (extra extract predictor)
predictor_keys = [k for k, v in current_module_config.items() if isinstance(v, str)]
```
**Copilot AI** (Nov 29, 2025)
Potential IndexError if predictor_keys is empty. Add validation to ensure at least one predictor key exists before accessing index 0. Consider: if not predictor_keys: logger.warning(...); continue or raising a more descriptive error.
Suggested change:

```python
predictor_keys = [k for k, v in current_module_config.items() if isinstance(v, str)]
if not predictor_keys:
    logger.warning(f"No predictor keys found for module '{module_key}'. Skipping.")
    continue
```
I don't think this is needed - config is built internally by GEPA and always contains predictor keys. Edge case seems impossible.
```python
for tool_name, tool_info in current_tools_dict.items():
    # Update tool description if LM proposed a change
    improved_tool_desc = getattr(result, f"improved_tool_{tool_name}_desc", None)
    if improved_tool_desc is not None:
        tool_info["desc"] = improved_tool_desc

    # Update arg descriptions if LM proposed changes
    for arg_name in tool_info["args"].keys():
        improved_tool_arg_desc = getattr(result, f"improved_tool_{tool_name}_arg_{arg_name}_desc", None)
        if improved_tool_arg_desc is not None:
            tool_info["args"][arg_name]["description"] = improved_tool_arg_desc

    improved_module_config["tools"][tool_name] = tool_info
```
**Copilot AI** (Nov 29, 2025)
Mutating the input data structure. tool_info is a reference to a dict in current_tools_dict (from line 460), so modifications on lines 464 and 470 mutate the original candidate data. This can cause unintended side effects across GEPA iterations. Create a deep copy: import copy and tool_info = copy.deepcopy(tool_info) after line 460.
I don't think this is an issue - candidate[module_key] is a json string, so json.loads() creates a new dict. Mutations don't affect the original.
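A quick check of that point: `json.loads` builds a fresh object graph on every call, so mutating the parsed dict cannot affect the original JSON string stored in the candidate:

```python
import json

# The candidate stores module config as a JSON string, as described above.
candidate = {"module": '{"tools": {"search": {"desc": "old"}}}'}

config = json.loads(candidate["module"])   # fresh dict, independent of the string
config["tools"]["search"]["desc"] = "new"  # mutate the parsed copy

json.loads(candidate["module"])["tools"]["search"]["desc"]  # still "old"
```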
Summary
Addresses #8706 which requested GEPA to optimize tool descriptions.
When `enable_tool_optimization=True`, GEPA jointly optimizes signature instructions and tool descriptions. All components are optimized together based on shared execution traces, enabling the reflection LM to see how components work together.
Backward compatible: `enable_tool_optimization=False` (default) preserves existing behavior.

Issue
Closes #8706
Changes

Core Implementation

- `enable_tool_optimization` parameter on GEPA (default `False`)
- Tool detection for `dspy.Tool`, `list[dspy.Tool]`, `dict[str, Tool]`
- `ToolModuleProposer`: specialized proposer with dynamic signature generation for each tool and argument
- ReAct detected via `isinstance()` check; includes both react and extract predictors
- Routing: tool modules → `ToolModuleProposer`; regular predictors → default/custom proposer
- Optimized descriptions applied to `dspy.Tool` objects by matching `tool.name`
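The routing bullet above might look like the following sketch; the `tool:` key prefix and function name are assumptions for illustration, not the PR's actual naming:

```python
def route_components(components):
    """Split candidate components: tool entries go to the tool proposer,
    everything else to the default/custom instruction proposer."""
    tools = {k: v for k, v in components.items() if k.startswith("tool:")}
    signatures = {k: v for k, v in components.items() if not k.startswith("tool:")}
    return tools, signatures

components = {
    "predict.instructions": "Answer the question step by step.",
    "tool:search": "Search the web.",
}
route_components(components)
```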
9 tests covering:

- `build_program`
- `None`

Documentation
- `GEPA_Advanced.md`: tool optimization guide with usage examples
- `overview.md`: brief introduction linking to advanced guide

Usage Example
ReAct Agent
Custom Tool-Using Predictor
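The usage snippets for the two headings above were not captured in this view. A minimal sketch of enabling the flag on a ReAct agent, based on the `enable_tool_optimization` parameter named in this PR and the standard GEPA call pattern; the model names, metric, and dataset are placeholders, not part of the PR:

```python
import dspy

# Sketch only: models, metric, and data are illustrative placeholders.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

def search(query: str) -> str:
    """Search the web for up-to-date information."""
    ...

agent = dspy.ReAct("question -> answer", tools=[dspy.Tool(search)])

def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    return float(gold.answer.lower() in pred.answer.lower())

trainset = [dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question")]

gepa = dspy.GEPA(
    metric=metric,
    auto="light",
    reflection_lm=dspy.LM("openai/gpt-4o"),
    enable_tool_optimization=True,  # default False
)
optimized = gepa.compile(agent, trainset=trainset, valset=trainset)
```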
Key Features

- `None` for unchanged components