Hello,
We noticed that the reference function in float8_blockwise_gemm is torch._scaled_mm. However, torch._scaled_mm may itself call cublasLtMatmul, which is the same routine that the TE blockwise GEMM calls.
So I'm confused: if the cuBLASLt GEMM returned a wrong result, the reference and the tested path would both be wrong in the same way, and this unit test would still pass. What is the purpose of this unit test?
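
To make the concern concrete, here is a minimal sketch of what I would consider an independent reference: dequantize the block-scaled operands in high precision, then do a plain matrix multiply that never touches cuBLASLt. This is plain Python for illustration only, not TE or PyTorch code. The function names, the row-wise scale layout (one scale per row per K-block), and the choice to store B as N x K are all assumptions of mine, not the layout the actual test uses.

```python
def dequantize_blockwise(q, scales, block):
    """Hypothetical helper: q[i][j] is scaled by scales[i][j // block]."""
    return [
        [q[i][j] * scales[i][j // block] for j in range(len(q[i]))]
        for i in range(len(q))
    ]


def reference_gemm(a_q, a_scales, b_q, b_scales, block):
    """Compute A @ B^T from block-scaled inputs without any BLAS call.

    a_q is M x K and b_q is N x K (assumed layout), so both operands
    are blocked along K. Accumulation uses Python floats (float64).
    """
    a = dequantize_blockwise(a_q, a_scales, block)
    b = dequantize_blockwise(b_q, b_scales, block)
    return [
        [sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
        for row_a in a
    ]


# Tiny example: M=1, N=2, K=4, block size 2 along K.
a_q = [[1, 2, 3, 4]]
a_scales = [[0.5, 2.0]]
b_q = [[1, 1, 1, 1], [2, 0, 0, 1]]
b_scales = [[1.0, 1.0], [1.0, 0.5]]
out = reference_gemm(a_q, a_scales, b_q, b_scales, block=2)
```

A reference like this shares no code path with the kernel under test, so a wrong result from the BLAS GEMM would show up as a mismatch instead of being reproduced on both sides.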