Skip to content

[bugfix] Gemma4 suffix: drop trailing newline#9440

Open
shanhaoli wants to merge 1 commit into
modelscope:mainfrom
shanhaoli:fix/gemma4-suffix
Open

[bugfix] Gemma4 suffix: drop trailing newline#9440
shanhaoli wants to merge 1 commit into
modelscope:mainfrom
shanhaoli:fix/gemma4-suffix

Conversation

@shanhaoli
Copy link
Copy Markdown

@shanhaoli shanhaoli commented May 28, 2026

Summary

  • Gemma4TemplateMeta.suffix was ['<turn|>\n'], causing the base template's _add_dynamic_eos to reopen supervision on both <turn|> (106) and \n (107) at the end of each response.
  • The trailing \n belongs to the next user turn's prompt; the official Gemma IT model is never trained to predict it (inferred by testing on IT checkpoint), so supervising it is spurious.
  • Change suffix to ['<turn|>']. chat_sep stays ['<turn|>\n'] so the multi-turn wire format is unchanged — only the supervised position set shrinks by one token per response.

Test plan

  • Encode a Gemma4 sample and confirm labels ends with [..., <answer_token>, 106] (only <turn|> supervised) instead of [..., <answer_token>, 106, 107].
  • Multi-turn conversation still tokenizes with <turn|>\n between turns (verify input_ids unchanged).

🤖 Generated with Claude Code

…arn it

The base template's `_add_dynamic_eos` reopens supervision on tokens
matching `suffix_tokens_id` after each response. With
`suffix = ['<turn|>\n']`, both `<turn|>` (106) and `\n` (107) get
supervised, but `\n` belongs to the next user turn's prompt and the
official IT model is never trained to predict it.

Keep `chat_sep = ['<turn|>\n']` so the multi-turn wire format is
unchanged; only `suffix` is shortened to `['<turn|>']`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the Gemma4TemplateMeta class in swift/template/templates/gemma.py by removing the trailing newline from the suffix field, changing it from ['<turn|> '] to ['<turn|>']. There are no review comments to address, and I have no additional feedback to provide.

@Jintao-Huang
Copy link
Copy Markdown
Collaborator

This is fine and consistent with the Jinja template.
The Jinja template also includes a trailing '\n'.

@shanhaoli
Copy link
Copy Markdown
Author

This is fine and consistent with the Jinja template. The Jinja template also includes a trailing '\n'.

Yes, the Jinja template would add '<turn|>\n' to every model responses. This is why in code, chat_sep remain unchanged as '<turn|>\n'.

This PR argues that learning the extra '\n' is unnecessary, which is consistent with previous implementations such as Gemma and DeepSeek.

@Jintao-Huang
Copy link
Copy Markdown
Collaborator

Either implementation is fine — it has no impact on training outcomes.

@shanhaoli
Copy link
Copy Markdown
Author

It has impact on training due to learning the extra newline. Though I agree that it is a minor impact.
Feel free to close the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants