Temporary Fix for FP16 -> FP8 conversion failure on -0.0 #2387
base: main
Conversation
guangyey left a comment
One question: how did you identify this as a compiler issue? Was a reproducer found, or a regression detected in a particular compiler version?
@guangyey Thanks for the question. We found that this issue does not occur with an explicit fp16 -> fp32 -> fp8 conversion; with the direct fp16 -> fp8 conversion, however, -0.0 comes out as NaN. The key difference between the two cases is that the explicit conversion is submitted as two kernels, while the direct conversion is submitted as a single kernel in which some optimizations apply. This conjecture has been confirmed by a local reproducer.
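For context, here is a minimal host-side sketch of the two routes using the c10 types involved. This alone does not reproduce the failure, which only manifests when the conversion is fused into a single SYCL device kernel; the snippet just shows the shape of the two paths:

```cpp
#include <c10/util/Half.h>
#include <c10/util/Float8_e4m3fn.h>

int main() {
  c10::Half neg_zero(-0.0f);

  // Direct route: Half -> Float8. On the device this can be fused into a
  // single kernel, which is where the -0.0 -> NaN miscompile shows up.
  c10::Float8_e4m3fn direct(neg_zero);

  // Explicit route: Half -> fp32 (bit-level software decode) -> Float8.
  // Submitted as separate steps, this avoids the problematic fusion.
  float widened = c10::detail::fp16_ieee_to_fp32_value(neg_zero.x);
  c10::Float8_e4m3fn via_fp32(widened);

  return 0;
}
```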
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
The change under review adds the following specializations:

```cpp
template <>
struct CastScalarFunc<Half, Float8_e4m3fn> {
  C10_HOST_DEVICE Float8_e4m3fn operator()(Half src_val) const {
    return Float8_e4m3fn(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
  }
};

template <>
struct CastScalarFunc<Half, Float8_e4m3fnuz> {
  C10_HOST_DEVICE Float8_e4m3fnuz operator()(Half src_val) const {
    return Float8_e4m3fnuz(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
  }
};

template <>
struct CastScalarFunc<Half, Float8_e5m2> {
  C10_HOST_DEVICE Float8_e5m2 operator()(Half src_val) const {
    return Float8_e5m2(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
  }
};

template <>
struct CastScalarFunc<Half, Float8_e5m2fnuz> {
  C10_HOST_DEVICE Float8_e5m2fnuz operator()(Half src_val) const {
    return Float8_e5m2fnuz(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
  }
};
```
Copilot AI commented on Dec 1, 2025
The four template specializations contain duplicated logic (an identical implementation pattern with only the return type differing). Consider extracting this into a helper function template or macro to reduce code duplication and improve maintainability. For example, a helper template could be:

```cpp
template <typename Float8Type>
Float8Type half_to_float8(Half src_val) {
  return Float8Type(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
}
```
Suggested change: replace the four full specializations

```cpp
template <>
struct CastScalarFunc<Half, Float8_e4m3fn> {
  C10_HOST_DEVICE Float8_e4m3fn operator()(Half src_val) const {
    return Float8_e4m3fn(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
  }
};

template <>
struct CastScalarFunc<Half, Float8_e4m3fnuz> {
  C10_HOST_DEVICE Float8_e4m3fnuz operator()(Half src_val) const {
    return Float8_e4m3fnuz(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
  }
};

template <>
struct CastScalarFunc<Half, Float8_e5m2> {
  C10_HOST_DEVICE Float8_e5m2 operator()(Half src_val) const {
    return Float8_e5m2(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
  }
};

template <>
struct CastScalarFunc<Half, Float8_e5m2fnuz> {
  C10_HOST_DEVICE Float8_e5m2fnuz operator()(Half src_val) const {
    return Float8_e5m2fnuz(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
  }
};
```

with a helper function template plus a partial specialization:

```cpp
// Helper function template for Half to Float8_* conversion
template <typename Float8Type>
C10_HOST_DEVICE Float8Type half_to_float8(Half src_val) {
  return Float8Type(c10::detail::fp16_ieee_to_fp32_value(src_val.x));
}

// Partial specialization for CastScalarFunc<Half, Float8Type>
template <typename Float8Type>
struct CastScalarFunc<Half, Float8Type> {
  C10_HOST_DEVICE Float8Type operator()(Half src_val) const {
    return half_to_float8<Float8Type>(src_val);
  }
};
```
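One caveat worth noting: as written, the partial specialization over an unconstrained Float8Type would match any destination type, not only the four float8 formats. The macro route the comment also mentions keeps full specializations while still removing the duplication; a sketch (the macro name is illustrative, not from the codebase):

```cpp
// Hypothetical macro stamping out one full specialization per float8
// destination type, so only the listed types are affected.
#define DEFINE_HALF_TO_FLOAT8_CAST(F8)                             \
  template <>                                                      \
  struct CastScalarFunc<Half, F8> {                                \
    C10_HOST_DEVICE F8 operator()(Half src_val) const {            \
      return F8(c10::detail::fp16_ieee_to_fp32_value(src_val.x));  \
    }                                                              \
  };

DEFINE_HALF_TO_FLOAT8_CAST(Float8_e4m3fn)
DEFINE_HALF_TO_FLOAT8_CAST(Float8_e4m3fnuz)
DEFINE_HALF_TO_FLOAT8_CAST(Float8_e5m2)
DEFINE_HALF_TO_FLOAT8_CAST(Float8_e5m2fnuz)
#undef DEFINE_HALF_TO_FLOAT8_CAST
```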
#2219
To temporarily work around the issue where FP16's -0.0 is erroneously converted to NaN during certain fusion passes (fp16 -> fp32 -> fp8), this change avoids the sycl::half data type in the intermediate conversion step.
The bypass prevents the problematic fusion from occurring, ensuring correct handling of negative zero until the compiler error is fixed.
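For reference, a small standalone sketch (plain C++, independent of the patch) of why the bit-level decode handles negative zero correctly: fp16 -0.0 is 0x8000, and widening a zero to fp32 only relocates the sign bit, so the result is -0.0, not NaN:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  const uint16_t fp16_neg_zero = 0x8000;  // sign=1, exponent=0, mantissa=0

  // For a zero, widening fp16 -> fp32 just moves the sign bit to bit 31;
  // the exponent and mantissa fields stay all-zero.
  const uint32_t fp32_bits = static_cast<uint32_t>(fp16_neg_zero) << 16;

  float widened;
  std::memcpy(&widened, &fp32_bits, sizeof(widened));
  std::printf("widened value: %f (bits 0x%08x)\n", widened, fp32_bits);
  // Prints: widened value: -0.000000 (bits 0x80000000)
  return 0;
}
```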