Commit 88ae78a
Add GRPO Optimizer to DSPy (#8171)
* D1 for GRPO
* Improve type for arbor
* Add temp test script for grpo
* Add note about assumption of same inputs to all predictors
* Disable LM cache in GRPO
* Add support for valset
* Add configurable variable module invocation handling strategy
* Noah's dspy.LM changes and dspy.ArborProvider implementation
* Add latest arbor changes
* First working grpo version
* Add modules
* Add training args in initialize
* Fix grpo
* Add batches
* Update finetuning infra
* Revise server interface
* Update example script
* Move temporary interface to a separate file
* Add LM level reinforce interface
* Update testing script
* Update api_base access for finetune
* Style check
* Style check all
* Add Test script with MATH dataset
* Ensure grpo trainer does not crash due to format issues (temporary fix)
* Add error log
* Fix termination
* Delete temp files
* Add diff
* Add model update endpoint support
* Remove experimental flag
* Remove extra files
* Add GRPO error resiliency to prevent parsing failures from causing crashes
* Param Passthrough and Consistent Tutorial Script (#3)
* Add param passthrough and default banking77 tutorial
* Add more threads
* Update banking tutorial
---------
Co-authored-by: Noah Ziems <nziems2@nziems2@nd.edu>
* Lower beta param for banking tutorial
* Add warning on no training data
* Add train logging to GRPO
* Add max_prompt_length and max_completion_length support
* fix litellm retries
* no jsonadapter
* fix errors
* fix tests
* fix tests
* add the retry strategy back
* Add working implementation of format errors and negative rewards
* Fix bugs in validation
* Add validation logic to grpo
* Add more supported args
* Support max grad norm
* Add Train Shuffling logic
* Add lora support
* Add soft format rewards
* Disable provide_traceback in all GRPO-invoked evaluations
* Remove temporary tutorial script
* Revert classification finetuning tutorial
* Comment out json adapter test
* Fix ruff errors
* Add teacher (#8)
* Modify teacher preparation logic
* Re-add teachers to GRPO
* Style fix
* Update tutorial script
* Housekeeping
* Revert number of train steps
* Address PR comments
* Add wandb support for GRPO training runs
* Add completion logging
* Add logging steps support
* update report_to to be default none
* Add max_context_length
* Fix num_samples_per_input computation
* Checkpointing Endpoints (#10)
* Fix typo
* Fix checkpoint url
* fix merge conflict leftover
* shorten the warning message in json adapter
* fix the error piping
---------
Co-authored-by: Lakshya A Agrawal <lakshyaaagrawal@berkeley.edu>
Co-authored-by: Dilara Soylu <21346670+dilarasoylu@users.noreply.github.com>
Co-authored-by: Noah Ziems <nziems2@nziems2@nd.edu>
Co-authored-by: chenmoneygithub <chen.qian@databricks.com>

1 parent 637d759 commit 88ae78a
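Several bullets above (format errors receiving negative rewards, resiliency to parsing failures) concern how GRPO (Group Relative Policy Optimization) scores sampled completions. In the standard formulation, each input gets a group of sampled completions, and each completion's advantage is its reward normalized against the mean and standard deviation of its own group. The following is a minimal sketch of that idea, not the code in this commit: it assumes a completion that failed to parse is reported with a reward of `None`, and `FORMAT_FAILURE_SCORE` is a hypothetical penalty constant chosen for illustration.

```python
import math
from typing import Optional, Sequence

# Hypothetical penalty assigned to completions whose output could not be parsed.
FORMAT_FAILURE_SCORE = -1.0


def group_advantages(rewards: Sequence[Optional[float]], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages for the completions sampled for ONE input.

    A reward of None marks a completion that failed to parse (an assumption
    of this sketch); it is scored with FORMAT_FAILURE_SCORE instead of
    aborting the training step.
    """
    scores = [FORMAT_FAILURE_SCORE if r is None else float(r) for r in rewards]
    if not scores:
        return []
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    # Normalize against the group itself; a group with identical rewards
    # yields zero advantage for every completion.
    return [(s - mean) / (std + eps) for s in scores]


# Four completions for one input: two correct, one wrong, one unparseable.
print(group_advantages([1.0, 0.0, None, 1.0]))
```

Because the penalty is folded into the group statistics, an unparseable completion ends up with the lowest advantage in its group rather than crashing the run, which is one plausible reading of the "negative rewards" bullet.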
11 files changed (+1338, -37 lines) in:
- docs/docs/tutorials/classification_finetuning
- dspy
  - adapters
  - clients
  - dsp/utils
  - teleprompt
- tests/teleprompt