
Conversation

@mangguo321 (Contributor)

Details:

  • Move the transpose functions from executor_pa.cpp to transpose.hpp so they can be reused by both xattention and executor_pa.cpp. Modify the transpose_16NxK logic to handle tails.
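A minimal sketch of the tail-handling pattern mentioned above (hypothetical name and signature, assuming a plain strided transpose; the actual kernels are vectorized and precision-specific):

```cpp
#include <cstddef>

// Illustrative only: handle rows in full 16-row blocks, then fall back to a
// scalar loop for the remaining N % 16 rows (the "tail").
template <typename T>
void transpose_16NxK_sketch(T* dst, const T* src, size_t N, size_t K,
                            size_t dst_stride, size_t src_stride) {
    size_t n = 0;
    for (; n + 16 <= N; n += 16) {  // full 16-row blocks
        for (size_t i = 0; i < 16; ++i)
            for (size_t k = 0; k < K; ++k)
                dst[k * dst_stride + n + i] = src[(n + i) * src_stride + k];
    }
    for (; n < N; ++n)  // tail: fewer than 16 rows left
        for (size_t k = 0; k < K; ++k)
            dst[k * dst_stride + n] = src[n * src_stride + k];
}
```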

Tickets:

@mangguo321 mangguo321 requested review from a team as code owners December 3, 2025 03:31
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Dec 3, 2025

@zhangYiIntel zhangYiIntel left a comment


LGTM

mangguo321 (Contributor Author) commented Dec 9, 2025

Tested on EMR: no regression in performance or accuracy.
[benchmark results image]

@yuxu42 yuxu42 requested a review from Copilot December 11, 2025 08:17

Copilot AI left a comment


Pull request overview

This PR refactors transpose functions from executor_pa.cpp to a shared transpose.hpp header for reuse across multiple components. The changes enable better code organization and add new parameters to the transpose function signature to support quantization features.

  • Moved three transpose_16NxK template overloads from executor_pa.cpp to transpose.hpp
  • Updated function signatures to include tmp, group_size, and quant_key_bychannel parameters
  • Modified all call sites to pass the additional parameters (including nullptr for the unused tmp parameter)
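As a rough illustration of the extended call shape described above, a hedged sketch follows; the stub only mirrors the parameter set named in this review (dst, src, tmp, sizes/strides, group_size, quant_key_bychannel), and the exact parameter order is an assumption rather than a copy of transpose.hpp:

```cpp
#include <cstddef>
#include <cstdint>

// Stub only: fixes the call shape for illustration. The real overloads live in
// transpose.hpp and are templated on destination type and source precision.
void transpose_16NxK(uint32_t* dst, const uint32_t* src, uint32_t* tmp,
                     size_t N, size_t K, size_t block_size,
                     size_t dst_stride, size_t src_stride,
                     size_t group_size, bool quant_key_bychannel) {
    (void)dst; (void)src; (void)tmp; (void)N; (void)K; (void)block_size;
    (void)dst_stride; (void)src_stride; (void)group_size; (void)quant_key_bychannel;
}

int main() {
    uint32_t d[16 * 16] = {};
    uint32_t s[16 * 16] = {};
    // Call sites that do not use the new features pass neutral values:
    transpose_16NxK(d, s,
                    nullptr,   // tmp: unused scratch buffer
                    16, 16, 16, 16, 16,
                    0,         // group_size: no grouped quantization
                    false);    // quant_key_bychannel: by-channel key quantization off
}
```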

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • src/plugins/intel_cpu/src/nodes/kernels/scaled_attn/transpose.hpp: added three transpose_16NxK template overloads moved from executor_pa.cpp, including support for quantized types (i8, u8, u4)
  • src/plugins/intel_cpu/src/nodes/kernels/scaled_attn/xattention.hpp: updated calls to transpose_16NxK to include the new parameters (nullptr for tmp, 0 for group_size, false for quant_key_bychannel)
  • src/plugins/intel_cpu/src/nodes/kernels/scaled_attn/executor_pa.cpp: removed the transpose_16NxK definitions that were moved to transpose.hpp, added an include for transpose.hpp


transpose_16NxK<uint32_t, ov::element::u32>(d, s, N, K >> 1, block_size, dst_stride, src_stride >> 1);
transpose_16NxK<uint32_t, ov::element::u32>(d,
                                            s,
                                            reinterpret_cast<uint32_t*>(0),

Copilot AI Dec 11, 2025


Using reinterpret_cast<uint32_t*>(0) to represent nullptr is non-idiomatic and less clear. Replace with nullptr or static_cast<uint32_t*>(nullptr) for better readability.

Suggested change:
- reinterpret_cast<uint32_t*>(0),
+ nullptr,

@maxnick maxnick left a comment

This is a valid comment.
@mangguo321 , could you please explicitly give your opinion on this?

mangguo321 (Contributor Author) replied:

> This is a valid comment. @mangguo321 , could you please explicitly give your opinion on this?

Hi @maxnick, this code was originally implemented in executor_pa.cpp and moved here without modification. The original intent is unclear, but I think we can update it to use nullptr for now.
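For reference, a minimal standalone illustration of the point (the consume function is a hypothetical stand-in for a parameter whose pointer type is already fixed, as tmp is here):

```cpp
#include <cstdint>

// Hypothetical sink whose parameter type is fixed, like the tmp argument above.
void consume(uint32_t* tmp) { (void)tmp; }

int main() {
    consume(reinterpret_cast<uint32_t*>(0));  // legal, but obscures the intent
    consume(nullptr);                         // idiomatic null pointer, converts implicitly
}
```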

}
transpose_16NxK<TDST, precision_of<TDST>::value>(dst,
                                                 tmp,
                                                 reinterpret_cast<TDST*>(0),

Copilot AI Dec 11, 2025


Using reinterpret_cast<TDST*>(0) to represent nullptr is non-idiomatic and less clear. Replace with nullptr for better readability.

Suggested change:
- reinterpret_cast<TDST*>(0),
+ nullptr,

mangguo321 (Contributor Author) replied:

Replaced with nullptr.

@maxnick maxnick self-assigned this Dec 11, 2025
@maxnick maxnick added this to the 2026.0 milestone Dec 11, 2025
attn_dequant_by_channel_kernel<TDST,
                               SRC_PREC>(s, t, N, K, K / sub_byte_multiplier, src_stride, p_scales, p_zps);
} else {
    static_assert(SRC_PREC == ov::element::i8, "i8 doesn't support by-channel quantization");
Contributor left a comment

It fails for types other than i8, but the error message suggests that i8 is the unsupported one. Should the condition be `SRC_PREC != ov::element::i8`?

mangguo321 (Contributor Author) replied:

Yes, you are right. I've fixed it in the latest commit, thanks!
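A minimal standalone sketch of the agreed fix (the enum and function are placeholders, not the actual executor code):

```cpp
// Placeholder standing in for the ov::element precisions involved here.
enum class Prec { i8, u8, u4 };

template <Prec SRC_PREC>
void dequant_key_bychannel_sketch() {
    // Original form: fires for every precision other than i8, contradicting its message.
    // static_assert(SRC_PREC == Prec::i8, "i8 doesn't support by-channel quantization");

    // Fixed form: condition and message agree -- only an i8 instantiation is rejected.
    static_assert(SRC_PREC != Prec::i8, "i8 doesn't support by-channel quantization");
}

int main() {
    dequant_key_bychannel_sketch<Prec::u8>();   // compiles
    // dequant_key_bychannel_sketch<Prec::i8>(); // would now fail with the intended message
}
```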


Labels

category: CPU OpenVINO CPU plugin
