refactor: rename quant policy to kv cache dtype #4718

Draft

CUHKSZzxy wants to merge 1 commit into InternLM:main from CUHKSZzxy:refactor/kv-cache-dtype

CUHKSZzxy commented Jun 29, 2026

Collaborator

Summary

Rename the KV-cache quantization selector from quant_policy to kv_cache_dtype.
Replace the CLI flag with --kv-cache-dtype and keep aliases such as auto, int4, int8, fp8, fp8_e4m3, fp8_e5m2, and turbo_quant.
Update the PyTorch and TurboMind config plumbing, attention metadata, KV-cache kernels, docs, autotest configs, and focused tests.
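As a rough illustration of the renamed option, the sketch below wires a --kv-cache-dtype flag with the aliases listed above into a standalone argparse parser. It is not the code from this PR: build_parser, from_legacy_quant_policy, the "auto" default, and the integer-to-name mapping for old quant_policy values are all assumptions made for the example.

```python
import argparse

# Aliases named in the summary above. The summary says "such as", so the
# real list of accepted values may be longer.
KV_CACHE_DTYPES = (
    "auto", "int4", "int8", "fp8", "fp8_e4m3", "fp8_e5m2", "turbo_quant",
)

# ASSUMPTION: how legacy integer quant_policy values might map onto the new
# names. This table is illustrative and is not taken from the PR.
LEGACY_QUANT_POLICY = {0: "auto", 4: "int4", 8: "int8"}


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser exposing only the renamed flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="kv-cache-demo")
    parser.add_argument(
        "--kv-cache-dtype",
        type=str,
        default="auto",  # ASSUMPTION: "auto" is the default value
        choices=KV_CACHE_DTYPES,
        help="Storage dtype/layout used for the KV cache.",
    )
    return parser


def from_legacy_quant_policy(quant_policy: int) -> str:
    """Translate an old integer quant_policy into a kv_cache_dtype name."""
    try:
        return LEGACY_QUANT_POLICY[quant_policy]
    except KeyError:
        raise ValueError(f"unsupported quant_policy: {quant_policy!r}") from None
```

With this sketch, build_parser().parse_args(["--kv-cache-dtype", "fp8_e5m2"]) yields a namespace whose kv_cache_dtype attribute is "fp8_e5m2", and an unknown value is rejected by argparse before any engine config is built.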

Motivation

This is a follow-up to the KV-cache FP8 quantization PR.

The name quant_policy is ambiguous: it could refer to weight, activation, or KV-cache quantization.
This option selects the KV-cache storage dtype/layout.
kv_cache_dtype aligns with vLLM and SGLang naming.

Validation

Pre-commit checks on the touched files.
Focused pytest coverage for the FP8 KV-cache dtype config path and FA3 attention path.
CUDA KV-cache kernel tests for fill, flatten, and paged attention.

Assistance

Assisted with Codex + GPT-5.5 xHigh Fast, reviewed manually


refactor: rename quant policy to kv cache dtype

5dfe256

