Extend v2 gemm #754

Open
coderfeli wants to merge 10 commits into main from extend_v2_gemm

Conversation

@coderfeli

Collaborator

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

coderfeli and others added 7 commits June 26, 2026 07:32
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- preshuffle_gemm_v2: fx.recast_iter(Int8, ...) instead of building an
  explicit PointerType (per @sjfeng1999)
- universal.py: restore the Current atom state / soffset docstring note

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bind the A (read) and C (store) buffer descriptors to the actual M extent
(num_records) so that, for ragged M, blocks covering rows past M have their
out-of-bounds (OOB) loads and stores dropped instead of faulting or writing
past the allocation. This matches the v1 (preshuffle_gemm.py) behavior.
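
The effect of binding a descriptor to the real M extent can be illustrated with a small CPU-side model. This is only a sketch, not the kernel code from this PR: `BoundedBuffer` and `copy_rows_tiled` are hypothetical names, and the model assumes that out-of-range loads return zero and out-of-range stores are dropped, which is how the commit describes the intended behavior.

```python
class BoundedBuffer:
    """Toy model of a range-checked buffer descriptor (hypothetical helper)."""

    def __init__(self, storage, num_records):
        self.storage = storage          # backing allocation (flat list)
        self.num_records = num_records  # number of valid elements

    def load(self, index):
        # Assumed behavior: out-of-range reads return 0 instead of faulting.
        if 0 <= index < self.num_records:
            return self.storage[index]
        return 0

    def store(self, index, value):
        # Assumed behavior: out-of-range writes are silently dropped.
        if 0 <= index < self.num_records:
            self.storage[index] = value


def copy_rows_tiled(src, dst, m, n, tile_m):
    """Copy an m x n row-major matrix in blocks of tile_m rows.

    The last block may cover rows past m; it issues those accesses anyway
    and relies on the bounded buffers to absorb them.
    """
    num_blocks = (m + tile_m - 1) // tile_m
    for block in range(num_blocks):
        for r in range(block * tile_m, (block + 1) * tile_m):
            for c in range(n):
                dst.store(r * n + c, src.load(r * n + c))


if __name__ == "__main__":
    m, n, tile_m = 33, 4, 16               # ragged M: 3 blocks cover rows 0..47
    src_data = [float(i) for i in range(m * n)]
    dst_data = [-1.0] * (48 * n)           # allocation larger than m rows
    src = BoundedBuffer(src_data, m * n)   # num_records tied to the real M
    dst = BoundedBuffer(dst_data, m * n)
    copy_rows_tiled(src, dst, m, n, tile_m)
    print(dst_data[:m * n] == src_data)                 # valid rows copied
    print(all(x == -1.0 for x in dst_data[m * n:]))     # rows past M untouched
```

If `num_records` were instead set to the padded extent (48 rows here), the last block would overwrite rows 33..47 of the destination, which is the failure mode the commit is guarding against.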

Test: the shared harness now allocates guard rows past M (sentinel-filled)
and asserts nothing is written beyond row M. Adds test_v2_preshuffle_c_store_oob
covering the v2 kernel (fp8/int8/fp16/bf16) on a ragged M=33 shape.
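
The guard-row check described above can be sketched in plain Python. This is an illustration of the technique only, not the shared harness from this PR: `SENTINEL`, `alloc_with_guard`, `guard_rows_intact`, and the stand-in kernel `fill_rows` are all hypothetical names chosen for the example.

```python
SENTINEL = -12345.0  # arbitrary marker, assumed never produced by the kernel


def alloc_with_guard(m, n, guard_rows):
    """Flat row-major buffer of (m + guard_rows) x n, fully sentinel-filled."""
    return [SENTINEL] * ((m + guard_rows) * n)


def guard_rows_intact(buf, m, n):
    """Return True if no element at or beyond row m was overwritten."""
    return all(x == SENTINEL for x in buf[m * n:])


def fill_rows(out, m, n, tile_m, clamp):
    """Stand-in kernel: writes 1.0 to every row, one tile_m-row block at a time.

    With clamp=False the last block writes past row m, which is the bug
    the guard rows are meant to expose.
    """
    num_blocks = (m + tile_m - 1) // tile_m
    for block in range(num_blocks):
        for r in range(block * tile_m, (block + 1) * tile_m):
            if clamp and r >= m:
                continue  # drop rows past M
            for c in range(n):
                out[r * n + c] = 1.0


if __name__ == "__main__":
    m, n, tile_m, guard = 33, 4, 16, 16   # ragged M=33, as in the new test

    good = alloc_with_guard(m, n, guard)
    fill_rows(good, m, n, tile_m, clamp=True)
    assert guard_rows_intact(good, m, n)      # nothing written beyond row M

    bad = alloc_with_guard(m, n, guard)
    fill_rows(bad, m, n, tile_m, clamp=False)
    assert not guard_rows_intact(bad, m, n)   # the overrun is detected
```

The guard region has to be at least as tall as the overhang of the last block (15 rows for M=33 with 16-row blocks); otherwise an unclamped kernel could write past the test allocation itself rather than into the sentinel rows.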

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

1 participant