feat(serve): add --generation-config CLI for server sampling defaults by lvhan028 · Pull Request #4708 · InternLM/lmdeploy

lvhan028 · 2026-06-25T08:58:49Z

Motivation

LMDeploy's OpenAI-compatible API server previously relied on hard-coded protocol defaults for sampling parameters (e.g. temperature=0.7, top_k=40). This made it difficult to align server behavior with a model's own HuggingFace generation_config.json, which many models ship with recommended decoding settings.

vLLM exposes a similar --generation-config flag to load HF generation defaults at startup. This PR brings comparable behavior to LMDeploy:

- Load a model's HF generation_config.json as server-side defaults when users omit sampling fields in requests.
- Allow opting out via --generation-config lmdeploy to keep LMDeploy/GenerationConfig defaults.
- Support loading from a custom folder path when needed.

To support proper merge semantics, protocol sampling fields are changed to default to None (meaning "not specified by the user") instead of fixed values. Unspecified fields fall back to HF defaults (if loaded) and then to GenerationConfig dataclass defaults.
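
The merge order described above (request value, then HF default, then dataclass default) can be sketched roughly as follows. This is an illustration only, not the code in this PR: `SamplingDefaults` is a hypothetical stand-in for LMDeploy's GenerationConfig that carries just the two defaults quoted in this description, and `merge_sampling` is a hypothetical helper name.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass
class SamplingDefaults:
    """Hypothetical stand-in for GenerationConfig (only two fields shown)."""
    temperature: float = 0.8
    top_k: int = 50


def merge_sampling(request: Dict[str, Any],
                   hf_defaults: Optional[Dict[str, Any]] = None) -> SamplingDefaults:
    """Resolve each field with priority: request > HF config > dataclass default."""
    merged: Dict[str, Any] = {}
    # Apply the lower-priority layer first so the request layer overwrites it.
    for layer in (hf_defaults or {}, request):
        # None means "not specified" and must not shadow a lower layer.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return replace(SamplingDefaults(), **merged)


hf = {"temperature": 0.6, "top_k": 20}
# temperature comes from the request, top_k falls back to the HF value.
print(merge_sampling({"temperature": 0.2, "top_k": None}, hf))
```

With no HF defaults loaded and an empty request, the result is simply the dataclass defaults.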

Modification

CLI

Add --generation-config to lmdeploy serve api_server (default: auto).
- auto: load generation_config.json from the model path via HuggingFace GenerationConfig.from_pretrained().
- lmdeploy: do not load HF config; use LMDeploy defaults only.
- <path>: load from a custom directory.
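
As a rough sketch of how the three flag values could map to a set of defaults: the helper below is hypothetical and, to stay self-contained, reads generation_config.json directly with the standard library instead of calling HuggingFace GenerationConfig.from_pretrained(). It also assumes the model path is a local directory and that a missing file simply means "no HF defaults".

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def resolve_generation_defaults(flag: str, model_path: str) -> Optional[Dict[str, Any]]:
    """Map a --generation-config value to a dict of defaults, or None.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if flag == "lmdeploy":
        return None  # opt out: use LMDeploy defaults only
    # "auto" looks in the model directory; anything else is a custom folder.
    folder = Path(model_path if flag == "auto" else flag)
    config_file = folder / "generation_config.json"
    if not config_file.is_file():
        return None  # assumption: a missing file means no HF defaults
    return json.loads(config_file.read_text(encoding="utf-8"))
```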

Core module (`lmdeploy/serve/core/generation_config.py`)

Introduce a small helper module

API server integration

- Parse --generation-config once at startup and store the result in VariableInterface.default_gen_config.
- Route chat completions, completions, generate, Responses API, and Anthropic endpoints through build_generation_config() for consistent merge behavior.
- Remove redundant same-name pass-through kwargs at call sites; keep only renamed (stop → stop_words, seed → random_seed), computed (logprobs), or raw-json fields (migration_request, with_cache, preserve_cache) in extra_kwargs.
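
To illustrate the renamed fields that remain in extra_kwargs, here is a minimal sketch. `build_extra_kwargs` is a hypothetical name; only the two renames (stop → stop_words, seed → random_seed) come from this description, and normalizing a single stop string into a list is an assumption.

```python
from typing import Any, Dict, List, Optional, Union


def build_extra_kwargs(stop: Optional[Union[str, List[str]]] = None,
                       seed: Optional[int] = None) -> Dict[str, Any]:
    """Translate request field names to the GenerationConfig names in the PR."""
    extra: Dict[str, Any] = {}
    if stop is not None:
        # stop -> stop_words (assumption: a bare string becomes a one-item list)
        extra["stop_words"] = [stop] if isinstance(stop, str) else list(stop)
    if seed is not None:
        # seed -> random_seed
        extra["random_seed"] = seed
    return extra


print(build_extra_kwargs(stop="###", seed=7))
```

Same-name fields such as temperature would not appear here, since they flow through the shared merge path instead.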

Protocol changes

Set sampling-related fields in OpenAI/Responses/Anthropic protocols to None defaults (e.g. temperature, top_p, top_k, repetition_penalty, min_p) so "user did not send" can be distinguished from an explicit value.
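
A small sketch of why None defaults matter: with a fixed default such as 0.7, an omitted temperature and an explicit temperature=0.7 look identical on the server. The class below is a hypothetical standard-library stand-in (the real protocols are Pydantic models), and `explicit_overrides` is an illustrative name.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ChatRequestSketch:
    """Hypothetical request model with None meaning "not sent by the user"."""
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    top_k: Optional[int] = None


def explicit_overrides(request: ChatRequestSketch) -> Dict[str, Any]:
    """Keep only the fields the client actually sent (non-None)."""
    return {k: v for k, v in asdict(request).items() if v is not None}


# An explicit 0.7 is preserved; omitted fields stay out of the overrides
# and can therefore fall back to server-side defaults.
print(explicit_overrides(ChatRequestSketch(temperature=0.7)))
```

Note that the check is `is not None` rather than truthiness, so an explicit temperature of 0.0 still counts as a user-supplied value.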

Behavior notes

- When --generation-config lmdeploy is set and the user omits sampling params, defaults come from GenerationConfig (temperature=0.8, top_k=50, etc.), not the old protocol defaults (0.7 / 40).
- max_new_tokens is not taken from the HF config; it is resolved from max_completion_tokens / max_tokens on the request, with an engine-level fallback when neither is set.
- do_sample=True is always set for serving requests.
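
The max_new_tokens rule could look roughly like the sketch below. `resolve_max_new_tokens` is a hypothetical name, and the precedence between the two request fields is an assumption: this description does not say which one wins when both are sent.

```python
from typing import Optional


def resolve_max_new_tokens(max_completion_tokens: Optional[int] = None,
                           max_tokens: Optional[int] = None) -> Optional[int]:
    """Return the request-level token limit, or None to defer to the engine.

    Assumption: max_completion_tokens takes precedence over max_tokens.
    The HF generation config is deliberately not consulted here.
    """
    if max_completion_tokens is not None:
        return max_completion_tokens
    return max_tokens


print(resolve_max_new_tokens(max_tokens=256))
```

A None result stands for "unset", which is where the engine-level fallback would apply.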

Load HuggingFace generation_config.json as server-side defaults when requests omit sampling fields, with merge priority request > HF config > GenerationConfig defaults. Filter unsupported HF keys before building GenerationConfig, extract explicit request overrides via exclude_unset, and align /generate sampling protocol defaults with other endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
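
Filtering unsupported HF keys before constructing the config might be done along these lines. `GenConfigSketch` is a hypothetical two-field stand-in for GenerationConfig and `filter_supported` is an illustrative name; the real dataclass has many more fields.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class GenConfigSketch:
    """Hypothetical stand-in for GenerationConfig (only two fields shown)."""
    temperature: float = 0.8
    top_k: int = 50


def filter_supported(hf_config: Dict[str, Any]) -> Dict[str, Any]:
    """Drop HF keys that the target dataclass does not declare."""
    supported = {f.name for f in fields(GenConfigSketch)}
    return {k: v for k, v in hf_config.items() if k in supported}


# Token-id keys are common in generation_config.json but are not sampling
# fields of the sketch dataclass, so they are filtered out.
hf_config = {"temperature": 0.6, "top_k": 20, "bos_token_id": 1, "pad_token_id": 0}
print(GenConfigSketch(**filter_supported(hf_config)))
```

Without the filter, passing the raw HF dict to the dataclass constructor would raise a TypeError on the first unknown key.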

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

grimoire · 2026-07-02T12:43:25Z

+    # while leaving plain Pydantic defaults available for server defaults.
+    return {
+        key: value
+        for key, value in request.model_dump(exclude_unset=True).items()


Since ResponsesRequest uses extra='allow' at lmdeploy/serve/openai/responses/protocol.py:36, /v1/responses clients can send unsupported GenerationConfig fields such as return_ppl, with_cache, migration_request, output_logits, or stop_token_ids, and these flow into the engine. I confirmed that return_ppl=True, with_cache=True, and stop_token_ids=[1] become active in GenerationConfig. Please restrict extraction to declared request fields or to an endpoint-specific allowlist.

from agent.
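
One way the allowlist suggestion might look, as a sketch only: the field set and the function name below are hypothetical, and the input dict stands in for what request.model_dump(exclude_unset=True) would return on a model that allows extra fields.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical allowlist for a single endpoint; the real set would be
# chosen per endpoint by the maintainers.
RESPONSES_SAMPLING_FIELDS: FrozenSet[str] = frozenset(
    {"temperature", "top_p", "top_k"})


def restrict_overrides(dumped: Dict[str, Any],
                       allowed: FrozenSet[str]) -> Dict[str, Any]:
    """Keep only allowlisted keys from a dumped request payload."""
    return {k: v for k, v in dumped.items() if k in allowed}


# Extra fields sent by a client are dropped before reaching the engine.
payload = {"temperature": 0.3, "return_ppl": True,
           "with_cache": True, "stop_token_ids": [1]}
print(restrict_overrides(payload, RESPONSES_SAMPLING_FIELDS))
```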

Copilot AI review requested due to automatic review settings June 25, 2026 08:58

lvhan028 added the improvement label Jun 25, 2026

Copilot started reviewing on behalf of lvhan028 June 25, 2026 08:59

lvhan028 force-pushed the feat/generation-config-cli branch from 4d9cfa8 to 1cb9465 July 2, 2026 03:29

lvhan028 force-pushed the feat/generation-config-cli branch from 7e46bed to 2ba421b July 2, 2026 06:26

lvhan028 requested review from Copilot and grimoire July 2, 2026 06:27

Copilot started reviewing on behalf of lvhan028 July 2, 2026 06:27

Copilot AI reviewed Jul 2, 2026

Comment thread lmdeploy/serve/core/generation_config.py

grimoire reviewed Jul 2, 2026

lvhan028 wants to merge 1 commit into InternLM:main from lvhan028:feat/generation-config-cli
