[bugfix] Gemma4 suffix: drop trailing newline by shanhaoli · Pull Request #9440 · modelscope/ms-swift

shanhaoli · 2026-05-28T10:34:23Z

Summary

Gemma4TemplateMeta.suffix was ['<turn|>\n'], causing the base template's _add_dynamic_eos to reopen supervision on both <turn|> (106) and \n (107) at the end of each response.
The trailing \n belongs to the next user turn's prompt; the official Gemma IT model is never trained to predict it (inferred by testing on IT checkpoint), so supervising it is spurious.
Change suffix to ['<turn|>']. chat_sep stays ['<turn|>\n'] so the multi-turn wire format is unchanged — only the supervised position set shrinks by one token per response.

Test plan

Encode a Gemma4 sample and confirm labels ends with [..., <answer_token>, 106] (only <turn|> supervised) instead of [..., <answer_token>, 106, 107].
Multi-turn conversation still tokenizes with <turn|>\n between turns (verify input_ids unchanged).

🤖 Generated with Claude Code

…arn it The base template's `_add_dynamic_eos` reopens supervision on tokens matching `suffix_tokens_id` after each response. With `suffix = ['<turn|>\n']`, both `<turn|>` (106) and `\n` (107) get supervised, but `\n` belongs to the next user turn's prompt and the official IT model is never trained to predict it. Keep `chat_sep = ['<turn|>\n']` so the multi-turn wire format is unchanged; only `suffix` is shortened to `['<turn|>']`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request updates the Gemma4TemplateMeta class in swift/template/templates/gemma.py by removing the trailing newline from the suffix field, changing it from ['<turn|> '] to ['<turn|>']. There are no review comments to address, and I have no additional feedback to provide.

Jintao-Huang · 2026-05-28T16:13:23Z

This is fine and consistent with the Jinja template.
The Jinja template also includes a trailing '\n'.

shanhaoli · 2026-05-29T02:46:24Z

This is fine and consistent with the Jinja template. The Jinja template also includes a trailing '\n'.

Yes, the Jinja template would add '<turn|>\n' to every model responses. This is why in code, chat_sep remain unchanged as '<turn|>\n'.

This PR argues that learning the extra '\n' is unnecessary, which is consistent with previous implementations such as Gemma and DeepSeek.

Jintao-Huang · 2026-05-29T02:58:35Z

Either implementation is fine — it has no impact on training outcomes.

shanhaoli · 2026-06-01T04:08:49Z

It has impact on training due to learning the extra newline. Though I agree that it is a minor impact.
Feel free to close the PR.

gemini-code-assist Bot reviewed May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[bugfix] Gemma4 suffix: drop trailing newline#9440

[bugfix] Gemma4 suffix: drop trailing newline#9440
shanhaoli wants to merge 1 commit into
modelscope:mainfrom
shanhaoli:fix/gemma4-suffix

shanhaoli commented May 28, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Jintao-Huang commented May 28, 2026

Uh oh!

shanhaoli commented May 29, 2026

Uh oh!

Jintao-Huang commented May 29, 2026

Uh oh!

shanhaoli commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

shanhaoli commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Jintao-Huang commented May 28, 2026

Uh oh!

shanhaoli commented May 29, 2026

Uh oh!

Jintao-Huang commented May 29, 2026

Uh oh!

shanhaoli commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

shanhaoli commented May 28, 2026 •

edited

Loading