[Doc] Sleep mode documentation #28357
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com>
Documentation preview: https://vllm--28357.org.readthedocs.build/en/28357/
Code Review
This pull request updates the sleep mode documentation with examples for level 2 sleep. While the Python API example is correct, the new HTTP API example for level 2 sleep has an incorrect sequence of commands that would cause it to fail. I've provided a critical review comment with a code suggestion to correct the command order and ensure the documentation is accurate and helpful for users.
💡 Codex Review
Here are some automated review suggestions for this pull request.
docs/features/sleep_mode.md (outdated)
```bash
curl -X POST 'http://localhost:8000/sleep?level=2'
curl -X POST 'http://localhost:8000/collective_rpc' -H 'Content-Type: application/json' -d '{"method":"reload_weights"}'
curl -X POST 'http://localhost:8000/wake_up'
```
Level-2 HTTP flow is documented in the wrong order
The new level-2 HTTP example calls `POST /collective_rpc` to reload weights before the engine is woken up, and it never specifies the required `tags`. After a level-2 sleep, all weight and KV cache allocations are unmapped; invoking `reload_weights` while the engine is still sleeping attempts to write into freed GPU buffers and will fail or leave the server in an undefined state. The Python example above correctly calls `/wake_up?tags=weights` first, then reloads the weights, optionally followed by `/wake_up?tags=kv_cache`. The HTTP flow should follow the same sequence, with the tag parameters, so that users who copy the commands do not crash the service.
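The sequence the review describes (wake the weights, reload them, then wake the KV cache) can be sketched in the same `bash` style as the snippet under review. This is a minimal sketch, not the merged documentation: it assumes the server was started with `--enable-sleep-mode` and with `VLLM_SERVER_DEV_MODE=1` so that `/sleep`, `/wake_up` and `/collective_rpc` are exposed, and the `VLLM_URL` variable and the `level2_sleep` / `level2_wake_and_reload` function names are hypothetical, introduced only for illustration.

```bash
# Hypothetical helpers for one level-2 sleep/wake cycle against a vLLM server.
# Assumption: the server runs with --enable-sleep-mode and VLLM_SERVER_DEV_MODE=1.
VLLM_URL="${VLLM_URL:-http://localhost:8000}"

level2_sleep() {
  # Level 2 releases the GPU memory held by both the weights and the KV cache.
  curl -fsS -X POST "${VLLM_URL}/sleep?level=2" || return 1
}

level2_wake_and_reload() {
  # 1. Re-allocate GPU memory for the weights only; the KV cache stays released.
  curl -fsS -X POST "${VLLM_URL}/wake_up?tags=weights" || return 1

  # 2. Load the weights into the newly allocated buffers.
  curl -fsS -X POST "${VLLM_URL}/collective_rpc" \
    -H 'Content-Type: application/json' \
    -d '{"method":"reload_weights"}' || return 1

  # 3. Re-allocate the KV cache so the engine can serve requests again.
  curl -fsS -X POST "${VLLM_URL}/wake_up?tags=kv_cache" || return 1
}
```

Splitting the cycle into two functions leaves room for whatever uses the freed GPU memory between sleeping and waking. Because `curl -f` exits non-zero on HTTP errors, `level2_wake_and_reload` stops at the first failed request instead of reloading weights into memory that was never re-allocated.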
youkaichao left a comment:
You can also link the blog post: https://blog.vllm.ai/2025/10/26/sleep-mode.html
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
tjtanaa left a comment:
LGTM
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: Amir Balwel <amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: Amir Balwel <amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Updating sleep mode documentation with examples for level 2 sleep