
refactor: rename quant policy to kv cache dtype #4718

Draft
CUHKSZzxy wants to merge 1 commit into InternLM:main from CUHKSZzxy:refactor/kv-cache-dtype


CUHKSZzxy commented Jun 29, 2026

Collaborator

Summary

  • Rename the KV-cache quantization selector from quant_policy to kv_cache_dtype.
  • Replace the corresponding CLI flag with --kv-cache-dtype and keep aliases such as auto, int4, int8, fp8, fp8_e4m3, fp8_e5m2, and turbo_quant.
  • Update the PyTorch and TurboMind config plumbing, attention metadata, KV-cache kernels, docs, autotest configs, and focused tests.
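
To illustrate the rename, here is a minimal sketch of how a string-valued kv_cache_dtype could be validated while still accepting the old integer quant_policy values. It is not the code in this PR: the helper name normalize_kv_cache_dtype, the two module-level tables, and the choice to map legacy 0 to auto are assumptions made for illustration. Only the alias names come from the summary above, and the legacy values 0, 4, and 8 follow the previously documented quant_policy meanings (no quantization, int4, int8).

```python
# Hypothetical sketch; names and mapping choices here are illustrative assumptions.

# Alias names as listed in the PR summary.
KV_CACHE_DTYPES = (
    "auto", "int4", "int8", "fp8", "fp8_e4m3", "fp8_e5m2", "turbo_quant",
)

# Assumption: legacy quant_policy 0 corresponds to "auto" (no KV-cache quantization).
LEGACY_QUANT_POLICY = {0: "auto", 4: "int4", 8: "int8"}


def normalize_kv_cache_dtype(value):
    """Return a canonical lowercase dtype name, or raise on unknown input."""
    if isinstance(value, bool):
        # bool is a subclass of int; reject it explicitly to avoid surprises.
        raise TypeError("kv_cache_dtype must be a string or a legacy integer")
    if isinstance(value, int):
        if value not in LEGACY_QUANT_POLICY:
            raise ValueError(f"unknown legacy quant_policy: {value}")
        return LEGACY_QUANT_POLICY[value]
    # Accept CLI-style spellings such as "FP8-E4M3".
    name = str(value).strip().lower().replace("-", "_")
    if name not in KV_CACHE_DTYPES:
        raise ValueError(f"unknown kv_cache_dtype: {value!r}")
    return name


if __name__ == "__main__":
    print(normalize_kv_cache_dtype(8))
    print(normalize_kv_cache_dtype("FP8-E4M3"))
```

Keeping the accepted names in one tuple means the CLI choices and the config validation could share a single source of truth, which is one way to avoid the two drifting apart.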

Motivation

This is a follow-up to the KV-cache FP8 quantization PR.

  • quant_policy is ambiguous across weight, activation, and KV-cache quantization.
  • This option selects the KV-cache storage dtype/layout.
  • kv_cache_dtype aligns with vLLM and SGLang naming.

Validation

  • Pre-commit checks on the touched files.
  • Focused pytest coverage for the FP8 KV-cache dtype config path and FA3 attention path.
  • CUDA KV-cache kernel tests for fill, flatten, and paged attention.

Assistance

Assisted by Codex + GPT-5.5 xHigh Fast; reviewed manually.

