From 8d15604ce58a3c31b108876d00921511a7e8f354 Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Tue, 20 Jan 2026 10:50:52 -0500 Subject: [PATCH 1/2] first commit of intra/interwave doc --- docs/conceptual/CK-Tile-intra-inter-wave.rst | 19 +++++++++++++++++++ docs/index.rst | 1 + docs/sphinx/_toc.yml.in | 2 ++ 3 files changed, 22 insertions(+) create mode 100644 docs/conceptual/CK-Tile-intra-inter-wave.rst diff --git a/docs/conceptual/CK-Tile-intra-inter-wave.rst b/docs/conceptual/CK-Tile-intra-inter-wave.rst new file mode 100644 index 00000000000..4bc85c8f0dd --- /dev/null +++ b/docs/conceptual/CK-Tile-intra-inter-wave.rst @@ -0,0 +1,19 @@ +.. meta:: + :description: Intrawave and interwave scheduling with CK Tile + :keywords: composable kernel, CK, CK Tile, ROCm, API, scheduling, intrawave, interwave + +************************************************************ +Intrawave and interwave scheduling with CK Tile +************************************************************ + +Two different scheduling pipelines are available to use with CK Tile's GEMM implementation. + +The interwave and intrawave scheduling pipelines coordinate waves in K dimension accumulation loops. Whether to use the interwave or intrawave pipeline depends on whether the workload is memory-bound or compute-bound. + +In interwave scheduling, the K dimension is separated into chunks. The same chunk is loaded into each wave and all the waves run the same operation on the chunk. The operation is only run once all the waves have loaded the chunk, and the next chunk is loaded only after all the waves have finished running their operation. + +Because all the waves are synchronized, memory accesses are coordinated and the cache hit rate is optimized, interwave scheduling is best for memory-bound workloads. + +In intrawave scheduling, the full k dimensions are loaded into each wave. The waves then all run their operations on the entire K dimension independently and without synchronization. The CU then interleaves the instructions from all the waves. + +Because the CU has flexibility in scheduling operations, intrawave scheduling is best for compute-bound workloads. diff --git a/docs/index.rst b/docs/index.rst index 865914ab4ce..73b07a7a8c1 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -25,6 +25,7 @@ The Composable Kernel repository is located at `https://github.com/ROCm/composab * :doc:`Composable Kernel structure <./conceptual/Composable-Kernel-structure>` * :doc:`Composable Kernel mathematical basis <./conceptual/Composable-Kernel-math>` + * :doc:`CK Tile interwave and intrawave scheduling <./conceptual/CK-Tile-intra-inter-wave>` * :doc:`CK Tile conceptual documentation <./conceptual/ck_tile/index>` .. grid-item-card:: Tutorials diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index c82e07ced8f..11845b47166 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -18,6 +18,8 @@ subtrees: title: Composable Kernel structure - file: conceptual/Composable-Kernel-math.rst title: Composable Kernel mathematical basis + - file: conceptual/CK-Tile-intra-inter-wave.rst + title: CK Tile pipeline scheduling - file: conceptual/ck_tile/index.rst title: CK Tile conceptual documentation From a8ef07180136273b38948fc8f1af2e7a04e25871 Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Tue, 27 Jan 2026 15:20:38 -0500 Subject: [PATCH 2/2] [CK] first commit of intra/interwave scheduling doc --- docs/conceptual/CK-Tile-intra-inter-wave.rst | 13 ++++++++++--- docs/index.rst | 1 + docs/sphinx/_toc.yml.in | 2 ++ 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/docs/conceptual/CK-Tile-intra-inter-wave.rst b/docs/conceptual/CK-Tile-intra-inter-wave.rst index 4bc85c8f0dd..1d9e634c0fa 100644 --- a/docs/conceptual/CK-Tile-intra-inter-wave.rst +++ b/docs/conceptual/CK-Tile-intra-inter-wave.rst @@ -10,10 +10,17 @@ Two different scheduling pipelines are available to use with CK Tile's GEMM impl The interwave and intrawave scheduling pipelines coordinate waves in K dimension accumulation loops. Whether to use the interwave or intrawave pipeline depends on whether the workload is memory-bound or compute-bound. -In interwave scheduling, the K dimension is separated into chunks. The same chunk is loaded into each wave and all the waves run the same operation on the chunk. The operation is only run once all the waves have loaded the chunk, and the next chunk is loaded only after all the waves have finished running their operation. +In interwave scheduling, the K dimension is separated into chunks. The same chunk is loaded into each wave. When the chunk has been loaded into all the waves, the same operation is run on the chunk. -Because all the waves are synchronized, memory accesses are coordinated and the cache hit rate is optimized, interwave scheduling is best for memory-bound workloads. +Once all the waves have completed the operation, the next chunk is loaded into the waves. -In intrawave scheduling, the full k dimensions are loaded into each wave. The waves then all run their operations on the entire K dimension independently and without synchronization. The CU then interleaves the instructions from all the waves. +Because all the waves are synchronized, memory accesses are coordinated, and the cache hit rate is optimized, interwave scheduling is best for memory-bound workloads. + +In intrawave scheduling, the full K dimension is loaded into each wave. Each wave runs its own operation on the K dimension independently of the other waves, and without any synchronization with the other waves. The compute unit (CU) is responsible for interleaving the independent operations. Because the CU has flexibility in scheduling operations, intrawave scheduling is best for compute-bound workloads. + +An example of both interwave and intrawave scheduling can be found in |gemm_utils.hpp|_, which is part of the `GEMM with CK Tile example `_. + +.. |gemm_utils.hpp| replace:: ``gemm_utils.hpp`` +.. _gemm_utils.hpp: https://github.com/ROCm/composable_kernel/blob/develop/example/ck_tile/03_gemm/gemm_utils.hpp#L37 \ No newline at end of file diff --git a/docs/index.rst b/docs/index.rst index 31d8484dc20..6744318e51f 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -25,6 +25,7 @@ The Composable Kernel repository is located at `https://github.com/ROCm/composab * :doc:`Composable Kernel structure <./conceptual/Composable-Kernel-structure>` * :doc:`Composable Kernel mathematical basis <./conceptual/Composable-Kernel-math>` + * :doc:`CK Tile intrawave and interwave scheduling <../conceptual/CK-Tile-intra-inter-wave>` * :doc:`CK Tile conceptual documentation <./conceptual/ck_tile/CK-tile-index>` .. grid-item-card:: Tutorials diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index 90592879c09..a74b8cd363b 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -18,6 +18,8 @@ subtrees: title: Structure - file: conceptual/Composable-Kernel-math.rst title: Mathematical basis + - file: conceptual/CK-Tile-intra-inter-wave.rst + title: Intrawave and interwave scheduling - file: conceptual/ck_tile/CK-tile-index.rst title: CK Tile conceptual documentation