@iAmir97 iAmir97 commented Nov 9, 2025

Updating sleep mode documentation with examples for level 2 sleep


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison of results, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com>

mergify bot commented Nov 9, 2025

Documentation preview: https://vllm--28357.org.readthedocs.build/en/28357/

@mergify mergify bot added the documentation label Nov 9, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the sleep mode documentation with examples for level 2 sleep. While the Python API example is correct, the new HTTP API example for level 2 sleep has an incorrect sequence of commands that would cause it to fail. I've provided a critical review comment with a code suggestion to correct the command order and ensure the documentation is accurate and helpful for users.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 97 to 100
```bash
curl -X POST 'http://localhost:8000/sleep?level=2'
curl -X POST 'http://localhost:8000/collective_rpc' -H 'Content-Type: application/json' -d '{"method":"reload_weights"}'
curl -X POST 'http://localhost:8000/wake_up'
```

P1: Level-2 HTTP flow is documented in the wrong order

The new level‑2 HTTP example calls POST /collective_rpc to reload weights before the engine is woken up and never specifies the required tags. After a level‑2 sleep all weight and KV cache allocations are unmapped; invoking reload_weights while the engine is still sleeping attempts to write into freed GPU buffers and will fail or leave the server in an undefined state. The Python example above correctly performs /wake_up?tags=weights first and then reloads weights, optionally followed by /wake_up?tags=kv_cache. The HTTP flow should follow the same sequence with tag parameters so users do not crash the service when copying the commands.
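For reference, the ordering this comment asks for could be sketched as a small shell script. This is a sketch under assumptions, not the snippet that was eventually merged: the endpoint paths and tag names are taken from the comment above, while `BASE_URL`, `DRY_RUN`, `post`, and `level2_cycle` are hypothetical names introduced only for illustration. It defaults to a dry run that prints each request, so nothing is sent unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the level-2 sleep/wake ordering described in the review comment.
# BASE_URL, DRY_RUN, post, and level2_cycle are illustrative names, not vLLM APIs.
BASE_URL="${BASE_URL:-http://localhost:8000}"
DRY_RUN="${DRY_RUN:-1}"

# In dry-run mode, print the request; otherwise send it with curl.
post() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "POST $*"
  else
    curl --fail --silent --show-error -X POST "$@"
  fi
}

# Each step runs only if the previous one succeeded.
level2_cycle() {
  # 1. Put the engine into level-2 sleep.
  post "$BASE_URL/sleep?level=2" &&
  # 2. Wake up the weights allocation first, so there are buffers to load into.
  post "$BASE_URL/wake_up?tags=weights" &&
  # 3. Reload the weights now that the buffers are mapped again.
  post "$BASE_URL/collective_rpc" \
    -H 'Content-Type: application/json' \
    -d '{"method":"reload_weights"}' &&
  # 4. Wake up the KV cache (described as optional in the comment above).
  post "$BASE_URL/wake_up?tags=kv_cache"
}

level2_cycle
```

With the default `DRY_RUN=1` the script only lists the four requests in order, which makes the intended sequence easy to check before pointing it at a running server.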

@youkaichao youkaichao left a comment

iAmir97 and others added 3 commits November 9, 2025 13:51
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com>

@tjtanaa tjtanaa left a comment

LGTM

@tjtanaa tjtanaa added the ready label Nov 11, 2025
@tjtanaa tjtanaa enabled auto-merge (squash) November 11, 2025 06:33
@tjtanaa tjtanaa merged commit a7adbc6 into vllm-project:main Nov 11, 2025
9 checks passed
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: Amir Balwel <amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: Amir Balwel <amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>