[ingress][pytorch] Basic KernelBench to MLIR conversion #5
Conversation
```python
from mlir import ir, passmanager
from torch_mlir import fx

kernels_as_pytorch_folder = Path(__file__).parent / "KernelBench" / "KernelBench"
```
Since this depends on where the repo was cloned by the bash script, perhaps that last step (the clone) could be done in this script as well?
I am not sure.
Doing a git clone in either script feels unclean. I also don't like the idea of making it a submodule, as that seems to imply you have to clone KernelBench to do anything useful with lighthouse. It seems to me KernelBench will be just one source of ingress compute graphs of interest, so it may make sense to allow users/CI to opt in to which paths they want to run tests with. What's the right mechanism for that? I am not sure.
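For what it's worth, here is a minimal sketch of what an opt-in, in-script clone could look like. The `LIGHTHOUSE_FETCH_KERNELBENCH` variable is a hypothetical knob invented for illustration, and the repository URL is my assumption of where KernelBench lives; neither is part of this PR:

```python
import os
import subprocess
from pathlib import Path

kernels_as_pytorch_folder = Path(__file__).parent / "KernelBench" / "KernelBench"

def ensure_kernelbench() -> None:
    """Clone KernelBench next to this script, but only if the user opted in."""
    repo_root = kernels_as_pytorch_folder.parent
    if repo_root.exists():
        return
    # Hypothetical opt-in knob; this PR does not define it.
    if os.environ.get("LIGHTHOUSE_FETCH_KERNELBENCH") != "1":
        raise SystemExit(
            f"KernelBench not found at {repo_root}; "
            "set LIGHTHOUSE_FETCH_KERNELBENCH=1 to clone it automatically."
        )
    # Assumed upstream location of KernelBench.
    subprocess.run(
        ["git", "clone", "--depth=1",
         "https://github.com/ScalingIntelligence/KernelBench.git", str(repo_root)],
        check=True,
    )
```

This keeps the clone out of CI paths that don't ask for it, which is one possible answer to the opt-in question above.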
KernelBench is NOT an ingress. Torch-MLIR is.
We now have three PRs that work with the FX importer, none of which reuses the others. We should have one FX importer script that the others use.
The importer impasse has been resolved.
Whether the KernelBench submodule and converter script should live in this "ingress" directory is a matter of taste. I will defer to anyone who suggests a better path.
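As a sketch of the shared-importer idea: a small helper along these lines could be the single FX importer script the three PRs share. The helper name and layout are assumptions; `fx.export_and_import` and its `output_type` argument come from torch-mlir itself, though the exact keyword set may differ across versions:

```python
import torch
from torch_mlir import fx

def import_to_linalg(model: torch.nn.Module, example_inputs):
    """Export a PyTorch module via the FX importer and lower it to linalg.

    Hypothetical shared entry point; each ingress script would call this
    instead of driving torch_mlir.fx itself.
    """
    model.eval()
    return fx.export_and_import(
        model,
        *example_inputs,
        output_type="linalg-on-tensors",
    )
```

Each KernelBench file would then reduce to roughly `module = import_to_linalg(Model(*get_init_inputs()), get_inputs())`.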
```python
if not all(
    hasattr(module, a) for a in ("Model", "get_inputs", "get_init_inputs")
):
    print(f"Error: module in file {kernel_pytorch_file} not a proper benchmark")
```
Do we want to record the error so the script returns non-zero at the end after any such continue?
I believe the uncaught exception raised by the following line will terminate the whole script and make it exit with a non-zero status.
If we prefer to perform a more graceful exit, let me know.
My take is that an exception being raised here is truly unexpected and hence would provide us valuable info in case a user is able to include it in their report.
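If we do prefer the graceful variant, a minimal sketch would carry a failure flag across the loop and report it once at the end. The `kernel_files` iterable and `load_module` helper are placeholders for whatever the script actually does:

```python
import sys

had_error = False  # set when any benchmark file is malformed
for kernel_pytorch_file in kernel_files:  # placeholder iterable
    module = load_module(kernel_pytorch_file)  # placeholder loader
    if not all(
        hasattr(module, a) for a in ("Model", "get_inputs", "get_init_inputs")
    ):
        print(f"Error: module in file {kernel_pytorch_file} not a proper benchmark")
        had_error = True
        continue
    ...  # convert the kernel as usual

# Exit non-zero if anything was malformed, so CI notices.
sys.exit(1 if had_error else 0)
```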
Force-pushed fa1141e to 63b8240.
A basic-as-can-be torch-mlir converter for the level1 and level2 KernelBench kernels. The `convert-kernel-bench-to-mlir.py` script does the conversion and dumps the results in the `cache/level1` and `cache/level2` folders. It relies on pre-packaged mlir and torch-mlir wheels, as this PR considers versioning and packaging an orthogonal matter to getting ingress up and running. About 55 of the 200 kernels are filtered out as they either crash torch-mlir or yield very big .mlir files. This ignore_list is meant to be amended as these issues get addressed, e.g. by altering init_inputs on a per-kernel basis. The conversion script sticks to outputting just linalg for now. As it does this, it does some basic post-processing of torch-mlir's output, namely it runs the `-linalg-specialize-generic-ops` pass.
Force-pushed 63b8240 to 7b2309a.
I thought to leave the following here:

```
$ time uv run convert-kernel-bench-to-mlir.py
Processing: level1/100_HingeLoss.py
Processing: level1/10_3D_tensor_matrix_multiplication.py
Processing: level1/11_4D_tensor_matrix_multiplication.py
Skipping: level1/12_Matmul_with_diagonal_matrices_.py
...
Processing: level2/96_ConvTranspose3d_Multiply_Max_GlobalAvgPool_Clamp.py
Skipping: level2/97_Matmul_BatchNorm_BiasAdd_Divide_Swish.py
Skipping: level2/98_Matmul_AvgPool_GELU_Scale_Max.py
Skipping: level2/99_Matmul_GELU_Softmax.py
Skipping: level2/9_Matmul_Subtract_Multiply_ReLU.py

real    6m15.501s
user    5m29.552s
sys     1m24.632s

$ ls -l cache/* | grep .mlir | wc -l
144
```

That is, even with the worst offenders filtered out, using vanilla torch-mlir to convert these 144 simple NNs is still terribly slow. I expect this is in no small part due to the huge …
A basic-as-can-be torch-mlir converter for the level1 and level2 KernelBench kernels. The `convert-kernel-bench-to-mlir.py` script does the conversion and dumps the results in the `cache/level1` and `cache/level2` folders alongside the script. 56 of the 200 kernels are filtered out as they either crash torch-mlir or yield very big .mlir files. This ignore_list is meant to be amended as these issues get addressed, e.g. by altering init_inputs on a per-kernel basis.
The conversion script sticks to outputting just linalg for now. As it does this, it does some basic post-processing of torch-mlir's output, namely it runs the `-linalg-specialize-generic-ops` pass.
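For context, that post-processing can be expressed with the MLIR Python bindings the script already imports. This is a sketch: the pipeline nesting under `func.func` is my assumption about how the pass is anchored, not a quote of the script:

```python
from mlir import ir, passmanager

def specialize_generics(module: ir.Module) -> ir.Module:
    """Rewrite linalg.generic ops into named linalg ops where possible."""
    with module.context:
        pm = passmanager.PassManager.parse(
            "builtin.module(func.func(linalg-specialize-generic-ops))"
        )
        pm.run(module.operation)
    return module
```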