-
Notifications
You must be signed in to change notification settings - Fork 270
Refactor vector type to reduce build times #3641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request refactors the vector_type implementation to reduce build times by consolidating specialized template implementations into a generalized design. The changes aim to improve frontend parsing times by reducing redundant code and backend codegen times by replacing recursive template instantiations with concrete implementations.
Changes:
- Generalizes vector_type partial specializations into a single class with helper structs (vector_type_storage, non_native_vector_base)
- Replaces recursive StaticallyIndexedArray with concrete StaticallyIndexedArray_v2
- Fixes bool datatype handling with vector_type to avoid data slicing issues
- Adds new math utilities: integer_log2_floor and is_power_of_two_integer
- Updates next_power_of_two to handle edge cases (X <= 1)
- Introduces default scalar_type template specialization with unsigned _BitInt fallback
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| include/ck/utility/statically_indexed_array.hpp | Adds blank line (cosmetic change) |
| include/ck/utility/math.hpp | Fixes next_power_of_two for edge cases, adds integer_log2_floor and is_power_of_two_integer functions |
| include/ck/utility/data_type.hpp | Refactors scalar_type with default template and typename keywords, removes redundant next_pow2 |
| include/ck/utility/dtype_vector.hpp | Major refactor: introduces NativeVectorT alias, vector_type_storage helper, generalized vector_type class with is_as_type_cast_valid validation, consolidates non_native_vector_base specializations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // TODO: x need to be 1 ~ 0x7fffffff | ||
| // __builtin_clz will produce unexpected result if x is 0; | ||
| return 31 - __builtin_clz(x); | ||
| } | ||
|
|
||
| __host__ __device__ constexpr bool is_power_of_two_integer(int32_t x) | ||
| { | ||
| // TODO: x need to be 1 ~ 0x7fffffff | ||
| return x == (1 << integer_log2_floor(x)); |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_power_of_two_integer function has a potential issue when x equals 1. When x is 1, integer_log2_floor(1) returns 0 (since 31 - __builtin_clz(1) = 31 - 31 = 0), and then (1 << 0) = 1, so the function correctly returns true. However, this edge case behavior should be verified. Additionally, the function can return incorrect results for x <= 0 per the TODO on line 244, but there's no runtime check to prevent such inputs. Consider adding a static_assert or runtime check if this function is called with compile-time constant expressions.
| // TODO: x need to be 1 ~ 0x7fffffff | |
| // __builtin_clz will produce unexpected result if x is 0; | |
| return 31 - __builtin_clz(x); | |
| } | |
| __host__ __device__ constexpr bool is_power_of_two_integer(int32_t x) | |
| { | |
| // TODO: x need to be 1 ~ 0x7fffffff | |
| return x == (1 << integer_log2_floor(x)); | |
| // x needs to be in the range [1, 0x7fffffff]. | |
| // __builtin_clz is undefined for x <= 0; guard against that here. | |
| return (x > 0) ? (31 - __builtin_clz(static_cast<uint32_t>(x))) : int32_t{-1}; | |
| } | |
| __host__ __device__ constexpr bool is_power_of_two_integer(int32_t x) | |
| { | |
| // x needs to be in the range [1, 0x7fffffff]. For x <= 0, this is not a power of two. | |
| return (x > 0) && (x == (int32_t(1) << integer_log2_floor(x))); |
f46b373 to
bea883a
Compare
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ype is visible where needed
…se for pk_i4_t, but should be addressed in the future.
…es incorrect slicing calculations from scalar_type::vector_size with f6_pk_t types.
8219757 to
cffc2f5
Compare
| * @tparam T The element type of the vector | ||
| * @tparam Rank The number of elements in the vector | ||
| */ | ||
| template <typename T, index_t Rank> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using this attribute requires T to be a builtin. maybe it's possible to use some concepts here, conditionally if they are available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, once we have ability to use concepts, we can make some requirements about attribute support. For now we assume Clang supports it.
| template <typename T> | ||
| inline constexpr bool is_native_type() | ||
| { | ||
| return is_same_v<T, double> || is_same_v<T, float> || is_same_v<T, half_t> || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference to std::is_arithmetic_v?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::is_arithmetic_v == false for _Float16 and _BitInt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was original code, didn't want to change it :)
|
|
||
| /// @brief scalar_type trait override for uint32_t vector of size 3 | ||
| template <> | ||
| struct scalar_type<uint32_t __attribute__((__vector_size__(sizeof(uint32_t) * 3)))> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between ext_vector_type and __vector_size__?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compiler treats them separately. I think I've discovered some compiler bugs that I'll report later. You can play with those here: https://godbolt.org/z/v87KEfx7d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each one is trivially constructible from the other, however some of the llvm __builtins return types use the vector_size notation while we mostly use the ext_vector_type due to the ability to alias them properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting observation! I liked this doc regarding vector_size https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html, it explains some details. In godbolt I tried to experiment by defining vectors as:
template<typename T, index_t Rank>
struct NativeVectorT3 {
using dtype = T;
using type = attribute((vector_size(Rank * sizeof(T)))) T;
static constexpr index_t vector_rank = Rank;
};
And this way we can deduce template argument and test is passing. I'm not sure if it brings us back to the previous design though XD
Regarding the NativeVectorT2, I would agree with compiler, we allocate a vector of floats 4 byte long, so effectively it is a vector of 1 float, so the test is failing.
geyyer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cgmillette, thanks for refactoring this! Removing a lot of extra code is good, and cutting build time is even better!
Proposed changes
Build times can be affected by many different things and is highly attributed to the way we write and use the code. Two critical areas of the builds are frontend parsing and backend codegen and compilation.
Frontend Parsing
The length of the code, the include header tree and macro expansions all affect the front-end parsing time.
This PR seeks to reduce the parsing time of the dtype_vector.hpp vector_type class by reducing redundant code by generalization.
Backend Codegen
Template instantiation behavior can also affect build times. Recursive instantiations are very slow versus concrete instantiations. The compiler must make multiple passes to expand template instantiations so we need to be careful about how they are used.
This union storage has been removed from the vector_type storage class.
Fixes
Additions
Build Time Analysis
Machine: banff-cyxtera-s78-2
Target: gfx942
Checklist
Please put an
xinto the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-formaton all changed filesDiscussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered