
[Optimization] Streaming requests return complete special tokens. #6998

Open
luukunn wants to merge 3 commits into PaddlePaddle:develop from luukunn:return_token_ids

Conversation

@luukunn (Collaborator) commented Mar 24, 2026

Motivation

💡 If this PR is a cherry pick, the PR title must follow the format: add the [Cherry-Pick] label at the very beginning and append the original PR ID at the end, e.g. [Cherry-Pick][CI] Add check trigger and logic (#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but their meaning must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, please explain why in this PR.
  • Provide accuracy results.
  • If this PR targets a release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings March 24, 2026 13:25
@paddle-bot

paddle-bot bot commented Mar 24, 2026

Thanks for your contribution!

Copilot AI (Contributor) left a comment


Pull request overview

This PR optimizes OpenAI-compatible streaming responses: when an engine output is marked as skipped, the corresponding token ids are still returned if the client has enabled return_token_ids, so that the stream of special/complete tokens is not lost in streaming scenarios.

Changes:

  • In completion streaming: skip an output only when it is skipped and return_token_ids=False; otherwise return empty text together with the token_ids.
  • In chat completion streaming: adjust the skipped branch in the same way, returning empty content (text/multimodal) for skipped frames.
  • Move the tool_calls detection earlier, so that even a skipped frame correctly affects the final finish_reason decision.
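The adjusted skip branch can be sketched as follows (a minimal Python sketch; `build_stream_text` and the shape of the `output` dict are illustrative stand-ins, not the actual FastDeploy serving code):

```python
def build_stream_text(output: dict, return_token_ids: bool):
    """Return None to drop the chunk, or the text payload to stream.

    A skipped frame is dropped only when the client did not request
    token ids; otherwise an empty-text chunk (carrying token ids) is sent.
    """
    if output["skipped"] and not return_token_ids:
        return None
    return "" if output["skipped"] else (output["text"] or "")
```

With this shape, a skipped frame yields `None` (no chunk) for ordinary clients and an empty string (chunk with token ids only) for clients that asked for token ids.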

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| fastdeploy/entrypoints/openai/serving_completion.py | Adjusts the skipped handling in streaming so that token ids are still emitted (with empty text) when return_token_ids=True. |
| fastdeploy/entrypoints/openai/serving_chat.py | Adjusts the skipped handling and content filling in chat streaming (including the multimodal branch). |


```diff
 if response_processor.enable_multimodal_content():
-    delta_message.multimodal_content = output["multipart"]
+    delta_message.multimodal_content = [{}] if output["skipped"] else output["multipart"]
```

Copilot AI Mar 24, 2026


When multimodal output is enabled and output["skipped"] is True, setting delta_message.multimodal_content to [{}] returns a multipart item that lacks the fields agreed upon upstream (for example, in response_processors the text part is {type: "text", text: ...}), which may break client-side parsing of the multimodal_content structure. Consider returning a structurally consistent placeholder item for skipped frames as well (e.g. type "text" with an empty text string), and then attaching completion_token_ids to it.

Suggested change

```diff
-delta_message.multimodal_content = [{}] if output["skipped"] else output["multipart"]
+delta_message.multimodal_content = (
+    [{"type": "text", "text": ""}] if output["skipped"] else output["multipart"]
+)
```
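The suggested placeholder behavior could be factored out like this (hypothetical helper for illustration, not part of the PR):

```python
def build_multimodal_content(output: dict) -> list:
    """Return a multipart list whose items always follow the
    {type, text} contract, even for skipped frames."""
    if output["skipped"]:
        # structurally consistent placeholder; the caller attaches token ids
        return [{"type": "text", "text": ""}]
    return output["multipart"]
```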

Comment on lines 563 to 566
```diff
 delta_message = CompletionResponseStreamChoice(
     index=idx,
-    text=output["text"],
+    text="" if output["skipped"] else (output["text"] or ""),
     prompt_token_ids=None,
```

Copilot AI Mar 24, 2026


The PR description is still the unfilled template (Motivation/Modifications/Usage/Accuracy Tests are empty), and the title format should drop the quotes and add a space after the tag (e.g. [Optimization] Streaming ...); otherwise it will be hard to trace the purpose and verification of this change later. Please add: why streaming should return the token_ids of skipped tokens, the expected behavior and compatibility notes, and at least one reproducible command or test case.

Comment on lines +560 to 566
```diff
+if output["skipped"] and not request.return_token_ids:
+    continue
+
 delta_message = CompletionResponseStreamChoice(
     index=idx,
-    text=output["text"],
+    text="" if output["skipped"] else (output["text"] or ""),
     prompt_token_ids=None,
```

Copilot AI Mar 24, 2026


This adds the output["skipped"] and not request.return_token_ids branch; the behavior change is that when return_token_ids=True, a chunk with empty text (carrying token_ids) is still sent even for skipped frames. The existing streaming cases in tests/entrypoints/openai/test_serving_completion.py only cover skipped=False; consider adding skipped=True scenarios (covering both return_token_ids=True and return_token_ids=False) to guard against regressions (e.g. accidentally skipping again and losing the token_ids).
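The two missing scenarios could be exercised along these lines (hypothetical helper mirroring the new branch; `chunk_for` and the dict shapes are not from the real test fixtures):

```python
def chunk_for(output: dict, return_token_ids: bool):
    """Mirror the new branch: drop skipped frames only when
    token ids were not requested; otherwise emit empty text."""
    if output["skipped"] and not return_token_ids:
        return None
    chunk = {"text": "" if output["skipped"] else (output["text"] or "")}
    if return_token_ids:
        chunk["completion_token_ids"] = output["token_ids"]
    return chunk

skipped = {"skipped": True, "text": "<eos>", "token_ids": [2]}
assert chunk_for(skipped, return_token_ids=False) is None
assert chunk_for(skipped, return_token_ids=True) == {"text": "", "completion_token_ids": [2]}
```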

Copilot generated this review using guidance from repository custom instructions.
Comment on lines 423 to +426
```diff
 if response_processor.enable_multimodal_content():
-    delta_message.multimodal_content = output["multipart"]
+    delta_message.multimodal_content = [{}] if output["skipped"] else output["multipart"]
 else:
-    delta_message.content = output["text"]
+    delta_message.content = "" if output["skipped"] else (output["text"] or "")
```

Copilot AI Mar 24, 2026


For the skipped + multimodal + return_token_ids combination, the current implementation produces a multipart item without type/text (carrying only completion_token_ids). tests/entrypoints/openai/test_serving_chat.py does cover skipped=True, but it does not assert the structural consistency of the multipart item. Consider adding/updating a test here asserting that the multimodal_content of a skipped chunk contains at least type: "text" with an empty text, plus completion_token_ids.
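The structural assertion the comment asks for could look like this (hypothetical sketch; the dict literal stands in for a skipped chunk's `multimodal_content`):

```python
def check_skipped_multipart(parts: list) -> bool:
    """True iff every item keeps the {type, text} contract and
    carries completion_token_ids, as a skipped chunk should."""
    return all(
        part.get("type") == "text"
        and part.get("text") == ""
        and isinstance(part.get("completion_token_ids"), list)
        for part in parts
    )

# what a skipped multimodal chunk should look like
assert check_skipped_multipart([{"type": "text", "text": "", "completion_token_ids": [42]}])
assert not check_skipped_multipart([{}])  # a bare placeholder fails the contract
```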

Copilot generated this review using guidance from repository custom instructions.
@codecov-commenter

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@6cff780). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/entrypoints/openai/serving_chat.py 66.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6998   +/-   ##
==========================================
  Coverage           ?   73.84%           
==========================================
  Files              ?      399           
  Lines              ?    56045           
  Branches           ?     8849           
==========================================
  Hits               ?    41385           
  Misses             ?    11732           
  Partials           ?     2928           
Flag Coverage Δ
GPU 73.84% <80.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.



