Collaborator

@coolkp coolkp commented Nov 13, 2025

Changes

  • Bugfix: when block sizes are not specified, set block_q_dq=None if attention_kernel == "tokamax_flash" else min(q_max_block_size, query.shape[2]).
  • Use the specified q block sizes in cross attention; only overwrite the KV block sizes with safe values.
  • Log when user settings are overridden for tokamax, and automatically apply a safe override of the block size config for tokamax.
  • Add a test that checks the 'tokamax_flash' and 'flash' attention variants.
  • Document block sizes and their usage.
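
The block_q_dq fallback in the first bullet can be sketched as below. This is a minimal illustration, not the code from this PR: the helper name `resolve_block_q_dq`, its parameters, and the warning text are hypothetical, and it assumes `query.shape[2]` is the query sequence length.

```python
import logging
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def resolve_block_q_dq(
    attention_kernel: str,
    q_max_block_size: int,
    query_shape: Sequence[int],
    user_block_q_dq: Optional[int] = None,
) -> Optional[int]:
  """Hypothetical helper: pick block_q_dq for a given attention kernel."""
  if attention_kernel == "tokamax_flash":
    # Per the PR description, block_q_dq is None for tokamax_flash.
    # Assumption: a user-supplied value is dropped and the override is logged.
    if user_block_q_dq is not None:
      logger.warning(
          "Overriding user block_q_dq=%s with None for tokamax_flash",
          user_block_q_dq,
      )
    return None
  if user_block_q_dq is not None:
    # Assumption: an explicit user value is kept for other kernels.
    return user_block_q_dq
  # Default when unspecified: cap the block at the query length (shape[2]).
  return min(q_max_block_size, query_shape[2])
```

For example, with a hypothetical query of shape `(1, 8, 128, 64)` and `q_max_block_size=512`, the "flash" kernel would get a block of 128, while "tokamax_flash" would get `None`.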

…ect when 'tokamax_flash' requested

Signed-off-by: Kunjan Patel <kunjanp@google.com>

## How block sizes matter for perfomance and accuracy

Block sizes key to saturating HBM bandwidth and ensuring maximum possible overlap of computation on cores with HBM use and VMEM to VREG. It is highly reccomended to tune them.
Collaborator

nit - misspelled recommended.

@coolkp coolkp merged commit 4896870 into main Nov 14, 2025
2 of 3 checks passed
2 participants