[AMX] syscall to enable AMX instructions on real hardware #6909
frengels wants to merge 12 commits into halide:main from
This seems to be a syscall that you make once, and it enables AMX usage for the entire process? I'm not sure Halide should be responsible for that if so. If it were scoped, so that you had to enable and disable AMX access around every use of it, then it would make sense to inject that into our generated code. Without scoping, shouldn't the user be responsible for making sure their process has access to AMX? We could perhaps provide a user-callable function for it in the runtime module. The reason I'm wary is that it looks like this will only work on Linux x86-64 (e.g. it hardcodes a syscall number), and it will actively fail on Windows, macOS, etc. Aren't people going to use AMX on Windows boxes at some point?
| modules.push_back(get_initmod_x86_avx512_ll(c)); | ||
| } | ||
| if (t.has_feature(Target::AVX512_SapphireRapids)) { | ||
| modules.push_back(get_initmod_amx(c, bits_64, debug)); |
This could inject Linux-specific code on non-Linux platforms.
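To make the concern concrete, here is a minimal sketch of the kind of guard that would avoid it. The `OS` enum, `TargetInfo` struct, and `should_link_linux_amx_module` are stand-ins invented for illustration; they are not Halide's `Target` API, and the real check would live next to the `has_feature` test shown in the diff.

```cpp
// Stand-in types for illustration only; these are NOT Halide's Target API.
enum class OS { Linux, Windows, MacOS };

struct TargetInfo {
    OS os;         // operating system being targeted
    int bits;      // pointer width of the target (32 or 64)
    bool has_amx;  // whether the target requests AMX code generation
};

// Hypothetical predicate: only pull in a Linux-specific AMX runtime module
// when the target is actually 64-bit Linux with AMX requested.
bool should_link_linux_amx_module(const TargetInfo &t) {
    return t.has_amx && t.os == OS::Linux && t.bits == 64;
}
```

With a guard like this, a Windows or macOS target with AMX enabled would simply not receive the Linux-only module, leaving room for a per-OS equivalent later.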
|
I totally agree with Andrew's request for more info on how this works. To clarify the Linux-specific nature of this: the syscall being Linux-only is not a showstopper, but the code does need to indicate that this is the case, and if other systems require similar system calls, we would likely want to mirror the support for those platforms (or at least set up the file naming and such to allow for it). E.g. see how
|
It is correct that you only need to perform the syscall once to enable AMX for the entire process. I agree that it should be called by the user, then. I'll work on scoping the code correctly to x86_64 Linux.
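For readers following along, the one-time request can be sketched roughly as below. This is a minimal sketch, assuming Linux 5.16 or newer on x86-64. `request_amx_permission` is a hypothetical name, not necessarily the function this PR adds, and the two constants are written out locally (matching the values of `ARCH_REQ_XCOMP_PERM` and `XFEATURE_XTILEDATA` in the kernel's x86 headers) because older libc headers may not define them.

```cpp
#include <cerrno>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

// Local copies of kernel constants (assumed to match the Linux x86 headers).
constexpr long kArchReqXcompPerm = 0x1023;        // ARCH_REQ_XCOMP_PERM
constexpr unsigned long kXfeatureXtiledata = 18;  // XFEATURE_XTILEDATA

// Hypothetical helper: asks the kernel to allow this process to use AMX
// tile data. Returns 0 on success and -1 on failure, with errno set.
// On anything other than x86-64 Linux it always fails with ENOSYS.
int request_amx_permission() {
#if defined(__linux__) && defined(__x86_64__)
    long rc = syscall(SYS_arch_prctl, kArchReqXcompPerm, kXfeatureXtiledata);
    return rc == 0 ? 0 : -1;
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

Because the permission is per-process, a host application would call this once at startup, before running any pipeline compiled with AMX, and fall back to a non-AMX pipeline if it returns -1.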
| WEAK void halide_use_jit_module(); | ||
| WEAK void halide_release_jit_module(); | ||
|
|
||
| WEAK int halide_enable_amx(); |
If we expect users to call this, I think you want the declaration in HalideRuntime.h instead.
I'm having some trouble manually calling the function from outside of a kernel. I think it's because the runtime is only linked into the kernel's object file, I'm guessing I'd have to explicitly link the runtime into the hosting binary as well?
Whenever an AMX instruction is generated, ReqPerm is set to true. As a result, a later lowering step inserts the appropriate calls to enable AMX instructions for a kernel that supports this.
Rename the file to indicate that this is Linux-specific code. Additionally, make sure the module is only available for Linux.
This name reflects the intention better
|
You might need to add a reference to it in the list in src/runtime/runtime_api.cpp to stop it from getting dead-stripped when not linking a standalone runtime. |
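The general technique behind that suggestion can be sketched as follows: taking a function's address in a table marked as used gives the linker a reason to keep it. This illustrates the idea only and does not reproduce the actual contents of src/runtime/runtime_api.cpp; `my_runtime_enable_feature` and `kept_alive_functions` are hypothetical names, and `__attribute__((used))` assumes GCC or Clang.

```cpp
extern "C" {

// Hypothetical runtime entry point that nothing in this file calls directly.
int my_runtime_enable_feature() {
    return 0;
}

// Referencing the function's address from a table marked `used` keeps both
// the table and the function from being discarded as dead code.
__attribute__((used)) void *kept_alive_functions[] = {
    reinterpret_cast<void *>(&my_runtime_enable_feature),
};

}  // extern "C"
```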
| # `-fno-rtti` is necessary to allow us to use classes with virtual functions in the runtime code | ||
| RUNTIME_CXX_FLAGS = -std=c++17 -O3 -fno-vectorize -ffreestanding -fno-blocks -fno-exceptions -fno-unwind-tables -fno-threadsafe-statics -fno-rtti | ||
|
|
||
| $(BUILD_DIR)/initmod.amx_x86_32.ll: $(SRC_DIR)/runtime/amx.cpp $(BUILD_DIR)/clang_ok |
FYI, these changes/additions will also need to be made in src/runtime/CMakeLists.txt. We support CMake on all our platforms, whereas Make is supported on only a small percentage of them.
The current work on AMX with #6582 has made it possible to generate AMX instructions for various tile allocations on the emulator. However, for the real hardware there's a syscall required to enable the feature in the kernel. I've used the following (5.16 is the lowest kernel version supported) as inspiration and tested the result on Intel's DevCloud.

A few concerns I have:
syscall will enable the instructions on real hardware but trigger a fault in the emulator.
syscall?
src/runtime/x86_cpu_features.cpp?
halide_default_can_use_target_features.
CpuFeatures if the CPU could possibly support AMX.

Looking forward to any feedback.
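On the question of detecting whether the CPU could possibly support AMX, one option is a CPUID query. The sketch below is not taken from this PR: `edx_reports_amx_tile` and `cpu_reports_amx_tile` are hypothetical names, and it assumes GCC or Clang on x86 for `<cpuid.h>`. It checks the AMX-TILE bit (bit 24 of EDX in CPUID leaf 7, subleaf 0).

```cpp
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_HAVE_CPUID_H 1
#include <cpuid.h>
#endif

// CPUID.(EAX=7,ECX=0):EDX bit 24 reports AMX-TILE support.
constexpr uint32_t kAmxTileBit = 1u << 24;

// Pure helper so the bit test can be checked without real hardware.
bool edx_reports_amx_tile(uint32_t edx) {
    return (edx & kAmxTileBit) != 0;
}

// Hypothetical probe: true only if CPUID says the CPU implements AMX-TILE.
// Returns false on non-x86 targets or when leaf 7 is unavailable.
bool cpu_reports_amx_tile() {
#ifdef SKETCH_HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return edx_reports_amx_tile(edx);
#else
    return false;
#endif
}
```

A positive result only says the CPU implements AMX. Whether the process may actually execute AMX instructions still depends on the OS enabling the tile state components in XCR0 (bits 17 and 18) and, on Linux, on the permission request succeeding.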