Add opus C++ template library documentation and unit tests#2004

Closed
sunway513 wants to merge 3 commits into ROCm:main from sunway513:docs/opus-guide

Collaborator

@sunway513 sunway513 commented Feb 9, 2026

Summary

  • Add comprehensive documentation guide for the opus single-header C++ template library (csrc/include/opus/opus.hpp)
  • Covers type system, compile-time constants, containers (seq/tuple/array/vector), layout system, vectorized gmem/smem access, MFMA operations, tiled MMA, distributed tensor views (p_dim/y_dim), and utility functions
  • Includes real usage examples from AITER kernels (topk, rmsnorm_quant, cache, quant)
  • All function names, type names, file paths, and MFMA aliases verified against the actual source code
  • 16 CPU unit tests (compile-time constexpr/static_assert): number, seq, array, tuple, vector, slice, layout, static_for, type_traits, underscore, embed, packed_types, adaptor, mfma_types, warp_size, functional
  • 16 GPU unit tests (device-side kernel execution on gfx942): type conversions (fp16/bf16/fp8), math ops, DPP warp operations, GMEM buffer load/store, SMEM/LDS load/store, MFMA intrinsics (16x16x16, 32x32x8, accumulator, scaled), mfma_adaptor shapes/layouts, swap_ab, tiled MMA, FP8 pack/unpack, container folding, s_waitcnt

Test plan

  • All 16 CPU tests pass (hipcc -std=c++20 -O2)
  • All 16 GPU tests pass on MI300X (gfx942)
  • Build and run: bash csrc/include/opus/run_test.sh
  • Verify doc guide accuracy against opus source

Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a comprehensive Markdown guide documenting the opus single-header C++ template library (used by AITER HIP kernels) with sections covering types, compile-time constants, containers, layouts, gmem/smem access, MFMA/tiled MMA, distributed tensor views, and utilities.

Changes:

  • Introduces a new docs/opus_guide.md documentation guide for opus.
  • Adds usage examples and API references for gmem/smem, MFMA, tiled MMA, and partition/layout utilities.
  • Documents how Opus is used in several AITER kernels and helper utilities.

// Compile-time constants
auto n = 42_I; // number<42>
auto s = seq<2, 4, 8>{}; // Compile-time integer sequence

Copilot AI Feb 9, 2026

In the Quick Reference snippet, auto s is first used for seq<2, 4, 8>{} but later reused for make_smem(...) in the same code block, which makes the snippet fail to compile as-written. Rename one of the variables or split these examples into separate scopes/snippets so readers can copy/paste reliably.

Suggested change:
- auto s = seq<2, 4, 8>{}; // Compile-time integer sequence
+ auto seq_vals = seq<2, 4, 8>{}; // Compile-time integer sequence

Comment on lines 61 to 62
auto s = make_smem(smem);
auto val = s.load<4>(offset);

Copilot AI Feb 9, 2026

In the Quick Reference snippet, val is declared earlier as fp16_t val; and then redeclared here as auto val = s.load<4>(offset);, which will not compile in a single scope. Use distinct names (e.g., val0 / smem_val) or separate the examples into different snippets/scopes.

Suggested change:
- auto s = make_smem(smem);
- auto val = s.load<4>(offset);
+ auto smem_view = make_smem(smem);
+ auto smem_val = smem_view.load<4>(offset);
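
The other option mentioned in these comments, separate scopes instead of renaming, could look roughly like the sketch below. Everything here is a stand-in: `seq`, `smem_view`, `make_smem`, and `quick_reference_demo` are hypothetical definitions written for this example (host-side, plain `float` data) and are not the opus types or signatures.

```cpp
#include <array>
#include <cstddef>

namespace sketch {  // hypothetical namespace, not part of opus

// Stand-in for a compile-time integer sequence.
template <int... Is>
struct seq {};

// Stand-in for a shared-memory view with a vectorized load (assumed shape).
struct smem_view {
    const float* base;

    // Load N consecutive elements starting at `offset`.
    template <std::size_t N>
    std::array<float, N> load(std::size_t offset) const {
        std::array<float, N> out{};
        for (std::size_t i = 0; i < N; ++i) out[i] = base[offset + i];
        return out;
    }
};

inline smem_view make_smem(const float* p) { return smem_view{p}; }

// Each example lives in its own block, so the names `s` and `val`
// can be reused without a redeclaration error.
inline float quick_reference_demo(const float* smem, std::size_t offset) {
    {
        auto s = seq<2, 4, 8>{};  // compile-time sequence example
        (void)s;                  // silence unused-variable warnings
    }
    {
        auto s = make_smem(smem);      // a different `s`, in a new scope
        auto val = s.load<4>(offset);  // loads 4 elements
        return val[0] + val[3];
    }
}

}  // namespace sketch
```

Renaming (as in the suggested change) keeps a snippet flat and copy/paste friendly; braces keep the short names but add nesting. Either approach avoids the redeclaration.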

Comment on lines 396 to 399
| Value | Constant | Meaning |
|-------|----------|---------|
| 0 | `RT` | Default (return temporal) |
| 3 | `GROUP_NT` | Group non-temporal — hints that data won't be reused |

Copilot AI Feb 9, 2026

The aux section references constants RT and GROUP_NT, but these identifiers are not defined in opus/opus.hpp (they appear to be defined in csrc/include/aiter_opus_plus.h under namespace aiter). Consider either removing the constant-name column, qualifying them as aiter::RT / aiter::GROUP_NT, or documenting aux purely as a numeric immediate passed to the underlying buffer intrinsic.

Suggested change:
- | Value | Constant | Meaning |
- |-------|----------|---------|
- | 0 | `RT` | Default (return temporal) |
- | 3 | `GROUP_NT` | Group non-temporal — hints that data won't be reused |
+ | Value | Meaning |
+ |-------|---------|
+ | 0 | Default (temporal) load/store behavior |
+ | 3 | Group non-temporal — hints that data won't be reused |
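
To make the "numeric immediate" framing concrete, the following is a minimal host-side sketch of a load whose `aux` hint is a compile-time template parameter. The names `AUX_DEFAULT`, `AUX_GROUP_NT`, and `load_with_hint` are hypothetical and chosen for this example only; the values 0 and 3 come from the table above, and the function body is a plain array read rather than a real buffer intrinsic.

```cpp
namespace sketch {  // hypothetical namespace, not part of opus or aiter

// Named aliases for the two immediates listed in the table (names are assumptions).
inline constexpr int AUX_DEFAULT  = 0;  // default (temporal) behavior
inline constexpr int AUX_GROUP_NT = 3;  // group non-temporal hint

// Stand-in for a load that takes its cache hint as a template argument.
// On a GPU, the hint would be forwarded to the underlying buffer intrinsic;
// here it only gates compilation and the load is an ordinary read.
template <int aux = AUX_DEFAULT>
constexpr int load_with_hint(const int* data, int offset) {
    static_assert(aux == AUX_DEFAULT || aux == AUX_GROUP_NT,
                  "this sketch only models the two documented aux values");
    return data[offset];
}

}  // namespace sketch
```

A call such as `sketch::load_with_hint<sketch::AUX_GROUP_NT>(ptr, i)` and a call with the bare immediate `load_with_hint<3>(ptr, i)` are equivalent in this sketch, which is why documenting `aux` purely numerically remains accurate even if the named constants live in a different header.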

Comment on lines 120 to 122
| `i16_t` / `u16_t` | `int16_t` / `uint16_t` | 16-bit integers |
| `i8_t` / `u8_t` | `int8_t` / `uint8_t` | 8-bit integers |

Copilot AI Feb 9, 2026

The scalar-type table implies u16_t is always available, but in opus.hpp it is only registered when __clang_major__ >= 20 (ROCm 7+). Please note this conditional availability (or avoid listing u16_t as a guaranteed type) so users on older toolchains don’t get compile errors.

Suggested change:
- | `i16_t` / `u16_t` | `int16_t` / `uint16_t` | 16-bit integers |
- | `i8_t` / `u8_t` | `int8_t` / `uint8_t` | 8-bit integers |
+ | `i16_t` | `int16_t` | 16-bit signed integer |
+ | `u16_t`* | `uint16_t` | 16-bit unsigned integer (only when `__clang_major__ >= 20`, e.g. ROCm 7+) |
+ | `i8_t` / `u8_t` | `int8_t` / `uint8_t` | 8-bit integers |
+ \* `u16_t` is conditionally registered in `opus.hpp` and is only available on toolchains with `__clang_major__ >= 20` (ROCm 7+). On older ROCm/Clang versions, `u16_t` is not provided.
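
For readers who need code that builds on both older and newer toolchains, one way to guard against the conditional availability is sketched below. The `has_u16` flag and the `storage16_t` alias are hypothetical names introduced for this example; only the `__clang_major__ >= 20` condition is taken from the comment above, and the sketch uses standard `<cstdint>` types rather than the opus aliases.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of opus

// Mirror the condition described above: true only on Clang 20 or newer.
#if defined(__clang_major__) && __clang_major__ >= 20
inline constexpr bool has_u16 = true;
#else
inline constexpr bool has_u16 = false;
#endif

using i16_t = std::int16_t;

// Use unsigned 16-bit storage when the toolchain condition holds,
// otherwise fall back to the signed type of the same width.
using storage16_t = std::conditional_t<has_u16, std::uint16_t, i16_t>;

// Either way the storage is 16 bits wide, so layouts stay the same.
static_assert(sizeof(storage16_t) == sizeof(i16_t));

}  // namespace sketch
```

Code written against `storage16_t` compiles on either side of the version boundary; code that needs true unsigned semantics can additionally check `has_u16` with `if constexpr` or a `static_assert`.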

@sunway513 sunway513 marked this pull request as draft February 9, 2026 04:11
@sunway513 sunway513 changed the title Add opus C++ template library documentation guide Add opus C++ template library documentation and unit tests Feb 9, 2026
@sunway513 sunway513 marked this pull request as ready for review February 9, 2026 15:28
@sunway513 sunway513 requested a review from carlushuang February 9, 2026 15:28
@carlushuang
Collaborator

#2017 => refactored here :)

- Documentation guide for the opus micro STD library
- 16 standalone test groups compiled with hipcc (no GTest dependency)
- Test coverage: number, seq, array, tuple, vector, slice, layout,
  static_for, type_traits, underscore, embed, packed_types, adaptor,
  mfma_types, warp_size, functional
- GPU tests covering: type conversions (fp16/bf16/fp8), math ops (max/min/med3),
  DPP warp operations (mov_dpp/upd_dpp), GMEM buffer load/store (scalar + vec),
  SMEM/LDS load/store (scalar + vec), MFMA intrinsics (16x16x16, 32x32x8,
  accumulator chaining, scaled values), mfma_adaptor device-side shapes/layouts,
  swap_ab adaptor, tiled MMA (2x2x1 expansion), FP8 pack/unpack, container
  folding, and s_waitcnt synchronization
- Fix duplicate variable names in Quick Reference snippets (s, val)
- Clarify aux template parameter constants (RT/GROUP_NT from aiter_opus_plus.h)
- Note u16_t conditional availability (ROCm 7+ / clang >= 20)