
Conversation

@rolfmorel
Contributor

@rolfmorel rolfmorel commented Oct 7, 2025

A torch-mlir converter, as basic as can be, for the level1 and level2 KernelBench kernels. The convert-kernel-bench-to-mlir.py script does the conversion and dumps the results in the cache/level1 and cache/level2 folders alongside the script.

56 of the 200 kernels are filtered out as they either crash torch-mlir or yield very big .mlir files. This ignore_list is meant to be amended as these issues get addressed, e.g. by altering init_inputs on a per-kernel basis.

The conversion script sticks to outputting just linalg for now. It does perform some basic post-processing of torch-mlir's output, namely running the -linalg-specialize-generic-ops pass.
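For reference, a minimal sketch of the per-kernel flow described above (the convert_kernel helper name and the exact pipeline string are illustrative, not lifted from the script):

from pathlib import Path

from mlir import passmanager
from torch_mlir import fx


def convert_kernel(module, out_path: Path):
    # KernelBench kernels expose Model, get_inputs() and get_init_inputs().
    model = module.Model(*module.get_init_inputs())
    example_inputs = module.get_inputs()

    # Export through torch-mlir's FX importer, lowering to linalg-on-tensors.
    mlir_module = fx.export_and_import(
        model, *example_inputs, output_type="linalg-on-tensors"
    )

    # Basic post-processing: rewrite recognizable linalg.generic ops into
    # their named-op equivalents.
    with mlir_module.context:
        pm = passmanager.PassManager.parse(
            "builtin.module(linalg-specialize-generic-ops)"
        )
        pm.run(mlir_module.operation)

    out_path.write_text(str(mlir_module))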

from pathlib import Path

from mlir import ir, passmanager
from torch_mlir import fx

kernels_as_pytorch_folder = Path(__file__).parent / "KernelBench" / "KernelBench"
Member

Since this depends on where the repository was cloned by the bash script, perhaps that last step (the clone) could be done in this script as well?

Contributor Author

@rolfmorel rolfmorel Oct 8, 2025

I am not sure.

Doing a git clone in either script feels unclean. I also don't like the idea of it being a submodule, as that seems to imply you have to clone KernelBench to do anything useful with lighthouse. It seems to me KernelBench will be just one source of ingress compute graphs of interest, so it may make sense to allow users/CI to opt in to which paths they want to run tests with. What's the right mechanism for that? I am not sure.

Member

KernelBench is NOT an ingress. Torch-MLIR is.

We now have three PRs that work with the FX importer, none of which uses the others. We should have one FX importer script that the others use.

Contributor Author

The importer impasse has been resolved.

Whether the KernelBench submodule and converter script should live in this "ingress" directory is a matter of taste. I will defer to anyone who suggests a better path.

if not all(
    hasattr(module, a) for a in ("Model", "get_inputs", "get_init_inputs")
):
    print(f"Error: module in file {kernel_pytorch_file} not a proper benchmark")
Member

Do we want to record the error so that the script returns non-zero at the end upon any such continue?

Contributor Author

I believe the uncaught exception raised by the following line will terminate the whole script and exit with a non-zero status.

If we prefer to perform a more graceful exit, let me know.

My take is that an exception being raised here is truly unexpected and hence would provide us valuable info in case a user is able to include it in their report.
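For illustration, a sketch of the more graceful variant discussed here (kernel_files and load_kernel_module are hypothetical stand-ins, not names from the script):

import sys

failures = []
for kernel_pytorch_file in kernel_files:  # kernel_files: assumed list of paths
    module = load_kernel_module(kernel_pytorch_file)  # hypothetical loader
    if not all(
        hasattr(module, a) for a in ("Model", "get_inputs", "get_init_inputs")
    ):
        print(f"Error: module in file {kernel_pytorch_file} not a proper benchmark")
        failures.append(kernel_pytorch_file)
        continue
    ...  # convert as usual

# Exit non-zero if anything was flagged, so CI surfaces the problem.
if failures:
    sys.exit(1)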

Basic as can be torch-mlir converter for the level1 and level2
KernelBench kernels. The `convert-kernel-bench-to-mlir.py` script does
the conversion and dumps the results in the `cache/level1` and
`cache/level2` folders.

Relies on pre-packaged mlir and torch-mlir wheels, as this PR considers
dealing with versioning and packaging an orthogonal matter to getting
ingress up and running.

About 55 of the 200 kernels are filtered out as they either crash
torch-mlir or yield very big .mlir files. This ignore_list is meant to
be amended as these issues get addressed, e.g. by altering init_inputs
on a per-kernel basis.

The conversion script sticks to outputting just linalg for now. It does
perform some basic post-processing of torch-mlir's output, namely
running the -linalg-specialize-generic-ops pass.
@rolfmorel rolfmorel force-pushed the users/rolfmorel/kernelbench-ingress branch from 63b8240 to 7b2309a November 9, 2025 22:26
@rolfmorel
Contributor Author

I thought I'd leave the following here:

$ time uv run convert-kernel-bench-to-mlir.py                                                                    
Processing: level1/100_HingeLoss.py                                                                                                                         
Processing: level1/10_3D_tensor_matrix_multiplication.py                                                                                                    
Processing: level1/11_4D_tensor_matrix_multiplication.py                                                                                                    
Skipping: level1/12_Matmul_with_diagonal_matrices_.py
...
Processing: level2/96_ConvTranspose3d_Multiply_Max_GlobalAvgPool_Clamp.py
Skipping: level2/97_Matmul_BatchNorm_BiasAdd_Divide_Swish.py
Skipping: level2/98_Matmul_AvgPool_GELU_Scale_Max.py
Skipping: level2/99_Matmul_GELU_Softmax.py
Skipping: level2/9_Matmul_Subtract_Multiply_ReLU.py

real    6m15.501s
user    5m29.552s
sys     1m24.632s
$ ls -l cache/* | grep .mlir | wc -l
144

That is, even with the worst offenders filtered out, using vanilla torch-mlir to convert these 144 simple NNs is still terribly slow. I expect this is in no small part due to the huge `dialect_resources: { builtin: { torch_tensor_...float...: "0x040...` binary blobs that get tacked onto the IR. We need to find a way to get torch-mlir to do something more sensible for us.
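One thing worth experimenting with, as a sketch: serialize with large constants elided rather than via str(mlir_module). This assumes the mlir Python bindings in use expose large_elements_limit on get_asm; whether it also elides the dialect_resources blobs depends on the MLIR version, so treat it as an experiment rather than a fix.

# Elide large constants when writing the .mlir file (threshold is arbitrary).
asm = mlir_module.operation.get_asm(large_elements_limit=16)
out_path.write_text(asm)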

@rolfmorel rolfmorel marked this pull request as ready for review November 9, 2025 22:46