-
Notifications
You must be signed in to change notification settings - Fork 270
[CK-Tile] Improve cshuffle epilogue mfma tile coverage #3701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
tenpercent
wants to merge
21
commits into
develop
Choose a base branch
from
cshuffle-epilogue-tests
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+545
−167
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ions This commit expands the CShuffleEpilogue test coverage with: - Parameterized tests for multiple tile configurations - Half precision (FP16) tests with 5 different tile sizes - FP8 tests with 7 different tile sizes and MFMA configurations - Fix for acc_buffer initialization to fill entire buffer - Improved test organization with TileConfig template - Support for OCP FP8 flag in host code Test configurations cover various wave distributions (1x4, 2x2, 4x1) and XDL sizes to ensure correctness across different hardware patterns. Co-Authored-By: Claude <noreply@anthropic.com>
Split the parameterized tests into separate compilation units to reduce build times and enable parallel compilation: - test_cshuffle_epilogue_fp16.cpp: 5 FP16 configs + scale tests (100 LOC) - test_cshuffle_epilogue_fp8.cpp: 7 FP8 configs (34 LOC) - test_cshuffle_epilogue_common.hpp: Shared test infrastructure (84 LOC) This allows the FP16 and FP8 test suites to compile in parallel, significantly reducing overall build time compared to a single monolithic test file. Co-Authored-By: Claude <noreply@anthropic.com>
- Remove 'using namespace' directive from header file - Add explicit ck_tile:: namespace qualifications in header - Keep 'using namespace ck_tile' in .cpp files for convenience This fixes compilation with -Werror -Wheader-hygiene flags. Co-Authored-By: Claude <noreply@anthropic.com>
…P8 flag Revert the config.hpp host code change and instead follow the established pattern used in other test directories by adding the FP8 compile option directly in CMakeLists.txt when CK_USE_OCP_FP8 is enabled. Co-Authored-By: Claude <noreply@anthropic.com>
Use exactly representable integer values instead of floating-point for unique test data. Since cshuffle only permutes data without arithmetic, this allows exact EXPECT_EQ comparisons instead of EXPECT_FLOAT_EQ. - 16-bit types: Use values 1 to output_size (exactly representable) - 8-bit types: Use (index % 255) + 1 to stay in [1, 255] range Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
…omparison Refactor the CShuffleEpilogue tests to properly verify permutation correctness using set-based comparison instead of checking single values. The tests now: - Verify output has correct size and multiple unique values - Check values are in valid range (FP16: positive, FP8: >10 unique values) - For scale tests, compare scaled vs unscaled outputs to verify scaling Co-Authored-By: Claude <noreply@anthropic.com>
Remove duplicated configuration constants from scale tests by reusing the existing HalfConfig_256x256_2x2x1_32x32x8 type alias. Co-Authored-By: Claude <noreply@anthropic.com>
Move CShuffleEpilogueScaleTest (RowCol and Tensor scale tests) from test_cshuffle_epilogue_fp16.cpp into a dedicated test_cshuffle_epilogue_scale target for better test organization and faster incremental builds. Co-Authored-By: Claude <noreply@anthropic.com>
Extract reusable helpers for better code organization: - Add verification namespace with named constants (kMaxFP8Value, kMinUniqueFP8Values, kScaleEpsilon) - Add MakeProblem<Config> type alias to construct problem types from TileConfig, eliminating 11-line duplication in scale tests - Add convert_and_sort_output() helper for reusable output conversion - Add run_scale_comparison_test() helper to bundle unscaled/scaled test execution - Extract verify_permutation_output() for reusable verification logic - Simplify BasicTest from ~45 lines to ~8 lines using new helpers Co-Authored-By: Claude <noreply@anthropic.com>
…e tests - Fix memory leak: add hipFree for m_scale/n_scale buffers in scale tests - Remove unused struct members from CShuffleEpilogueTestResult - Add CK_INSTANTIATE_TYPED_TEST_SUITE macro to reduce pragma duplication - Remove unnecessary :: prefix from MakeProblem usage Co-Authored-By: Claude <noreply@anthropic.com>
Extract scale-specific logic from run_cshuffle_epilogue_test into dedicated helper functions to improve maintainability and readability. Changes: - Add DeviceScaleBuffers RAII wrapper for automatic memory management - Implements move semantics for efficient resource transfer - Prevents memory leaks via automatic cleanup in destructor - Extract buffer allocation into helper functions: - allocate_rowcol_scale_buffers() for per-row/column scaling - allocate_tensor_scale_buffers() for uniform tensor scaling - Extract kernel launch into separate functions: - launch_kernel_with_rowcol_scale() - launch_kernel_with_tensor_scale() - launch_kernel_without_scale() - Add HIP_CHECK_ERROR for all HIP API calls (hipGetLastError, hipDeviceSynchronize) - Use (void) cast for hipFree in destructor/move-assignment (cannot throw) - Reduce main function complexity from 82 to 52 lines (37% reduction) Benefits: - Clear separation of concerns (allocation, launch, cleanup) - Each helper has single, well-defined purpose - Safer resource management via RAII - More consistent error handling across all HIP calls - Easier to extend with new scale types Co-Authored-By: Claude <noreply@anthropic.com>
…fication Update verification logic to properly handle floating-point types: - Change convert_and_sort_output to return std::vector<float> instead of uint32_t - Use tolerance-based uniqueness counting in verify_permutation_output - Replace EXPECT_EQ with EXPECT_NEAR for tensor scale comparison Co-Authored-By: Claude <noreply@anthropic.com>
Change assertion from "at least half" to "all rows" for RowCol scaling verification. If scaling works correctly, every row should show the expected behavior. Co-Authored-By: Claude <noreply@anthropic.com>
Add tests for MFMA tile sizes 16x16x128 and 32x32x64 which are only available on gfx950. The new test file is conditionally compiled when GPU_TARGETS includes gfx950. Co-Authored-By: Claude <noreply@anthropic.com>
Add MfmaDataType template parameter to TileConfig to allow testing FP8-specific tile sizes (KPerXdl=32,64,128) with half_t output. This avoids FP8 range limitations and eliminates special verification logic for 8-bit types. - Use half_t output with fp8_t MFMA type in FP8 tests - Remove kMaxFP8Value and kMinUniqueFP8Values constants - Simplify value generation (no modulo wrapping needed) - Unify verification logic for all output types Co-Authored-By: Claude <noreply@anthropic.com>
Add 1x4 and 4x1 warp layouts in addition to existing 2x2 layout for gfx950-specific tile sizes (16x16x128, 32x32x64). Co-Authored-By: Claude <noreply@anthropic.com>
Add NaN check and clarify that C-shuffle epilogue broadcasts values from the distributed accumulator tile, so duplicates are expected. Verification now checks for: correct size, no NaN, no zeros, and multiple unique values. Co-Authored-By: Claude <noreply@anthropic.com>
…ize unique values The previous verification only checked for more than 1 unique value. Since each thread generates unique values (thread_id * buffer_size + i + 1), we should expect at least kBlockSize unique values in the output. This stricter check verifies that all threads contributed to the output, providing better test coverage for the C-shuffle epilogue correctness. Co-Authored-By: Claude <noreply@anthropic.com>
Define kTestScaleFactor, kIdentityScale, and kScaledColIndex constants in the verification namespace to eliminate hardcoded magic numbers. This ensures the scale factors used in buffer allocation match those used in test verification, making the tests more maintainable. Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR enhances the CShuffleEpilogue test suite by expanding test coverage with parameterized configurations and organizing tests by data type (FP16 and FP8) into separate executables.
Changes
1. Expanded test coverage with parameterized configurations
2. Sharded tests into FP16 and FP8 executables
test_cshuffle_epilogue.cppinto:test_cshuffle_epilogue_fp16.cpp- FP16/BF16 data type teststest_cshuffle_epilogue_fp8.cpp- FP8 data type teststest_cshuffle_epilogue_common.hppto share common test infrastructure3. Follow CMake pattern for FP8 compile flags
test/ck_tile/epilogue/CMakeLists.txtto conditionally add-DCK_TILE_USE_OCP_FP8compile option whenCK_USE_OCP_FP8is enabledtest/ck_tile/data_type,test/ck_tile/gemm, etc.)4. Fixed header hygiene
Test plan
test_ck_tile_cshuffle_epilogue_fp16,test_ck_tile_cshuffle_epilogue_fp8)