server: reserve context budget for DSML tool calls #105

Draft
gmontana wants to merge 1 commit into antirez:main from gmontana:fix/tool-call-context-budget

@gmontana

Summary

Mitigates #48 for the ds4 DeepSeek V4 Flash tool-enabled chat path by reserving the last 256 decode tokens for DSML tool-call closure.

For requests with tools, ordinary text generation stops at a soft limit before the hard context limit. If generation is already inside a DSML tool call, or is at the soft limit with a partial tool-start marker at the end of the generated text, decoding can use the reserve.
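
The budget decision described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function names, parameters, and the `<tool>` marker used in the usage example are hypothetical. The sketch also assumes that a trailing partial marker may keep decoding anywhere at or past the soft limit, which is slightly more permissive than "at the soft limit".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOOL_RESERVE_TOKENS 256 /* reserve size stated in this PR */

/* Hypothetical helper: true if `text` ends with a non-empty proper
 * prefix of `marker`, i.e. a tool-start marker that has begun but
 * has not been fully emitted yet. */
bool ends_with_partial_marker(const char *text, const char *marker) {
    size_t tlen = strlen(text), mlen = strlen(marker);
    if (mlen == 0) return false;
    size_t max = (mlen - 1 < tlen) ? mlen - 1 : tlen;
    for (size_t n = max; n > 0; n--) {
        if (memcmp(text + tlen - n, marker, n) == 0) return true;
    }
    return false;
}

/* Hypothetical decision function: may one more token be decoded at
 * position `pos`, given a hard context limit of `hard_limit` tokens? */
bool may_decode(int pos, int hard_limit, bool has_tools, bool in_tool_call,
                const char *generated, const char *marker) {
    if (pos >= hard_limit) return false;   /* hard limit always wins */
    if (!has_tools) return true;           /* no reserve without tools */

    int soft_limit = hard_limit - TOOL_RESERVE_TOKENS;
    if (soft_limit < 0) soft_limit = 0;
    if (pos < soft_limit) return true;     /* ordinary text budget */

    /* Inside the reserve: only tool-call progress may continue. */
    return in_tool_call || ends_with_partial_marker(generated, marker);
}
```

With a hard limit of 1000 tokens the soft limit is 744, so `may_decode(744, 1000, true, false, "hello", "<tool>")` refuses to continue plain text, while the same call with `in_tool_call` set, or with generated text ending in `<to`, is allowed to spend the reserve.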

Notes

This is intentionally a small server-side budget guard, not constrained decoding. When no tool call is in progress, tool-enabled chats may finish with finish_reason=length up to 256 tokens before the hard context limit. Oversized tool arguments can still reach the hard limit and trigger the existing unterminated-tool-call backstop.

The KV continued-checkpoint gate is left with its previous condition; this PR only changes decode budget decisions.

Validation

  • git diff --check
  • make ds4_test
  • ./ds4_test --server
  • make ds4-server

I did not add a full model-backed reproduction test; the new coverage is a focused server-unit test for the decode budget logic and soft-limit transition.

For tool-enabled DeepSeek V4 chat requests, keep ordinary text generation out of the final 256 context tokens and allow that reserve only while a DSML tool call or partial tool-start marker is in progress.

This mitigates antirez#48 without adding constrained decoding. Oversized tool arguments can still reach the hard context limit.