@mcfi commented Nov 14, 2025

Summary:
This change adds a vectorized requantize_ for Arm64 using NEON intrinsics:

  1. The newly added NEON intrinsics follow the approach of the existing AVX2 code (a rough sketch of the pattern follows this summary).
  2. The scalar loop was moved into a new function, requantize_i8dw_ref_, to make the code more readable and testable.
  3. New tests verify that requantize_ and requantize_i8dw_ref_ produce identical results.

Differential Revision: D86216347
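
The summary above is terse, so below is a minimal, self-contained sketch of the pattern it describes: a scalar reference requantization loop plus a NEON-vectorized loop that converts int32 accumulators to uint8 and falls back to the scalar path for the tail, followed by a small check that the two agree. Everything in the sketch (function names, signatures, the scale/zero-point rounding scheme) is an illustrative assumption, not FBGEMM's actual requantize_ / requantize_i8dw_ref_ implementation.

```cpp
// Hypothetical sketch only: names, signatures, and the rounding/clamping
// scheme are illustrative, not FBGEMM's actual requantize_ /
// requantize_i8dw_ref_ API.
#include <arm_neon.h>

#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

// Scalar reference loop, in the spirit of requantize_i8dw_ref_:
// out[i] = clamp(round(acc[i] * scale) + zero_point, 0, 255).
void requantize_ref(const int32_t* acc, uint8_t* out, int n,
                    float scale, int32_t zero_point) {
  for (int i = 0; i < n; ++i) {
    int32_t r =
        static_cast<int32_t>(std::nearbyintf(acc[i] * scale)) + zero_point;
    out[i] = static_cast<uint8_t>(std::min(255, std::max(0, r)));
  }
}

// Vectorized variant: 4 accumulators per iteration with NEON, scalar tail.
void requantize_neon(const int32_t* acc, uint8_t* out, int n,
                     float scale, int32_t zero_point) {
  const float32x4_t vscale = vdupq_n_f32(scale);
  const int32x4_t vzp = vdupq_n_s32(zero_point);
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    int32x4_t a = vld1q_s32(acc + i);
    // Multiply by scale in float, round to nearest even, add the zero point.
    float32x4_t f = vmulq_f32(vcvtq_f32_s32(a), vscale);
    int32x4_t r = vaddq_s32(vcvtnq_s32_f32(f), vzp);
    // Saturating narrow: int32 -> int16 -> uint8 (clamps to [0, 255]).
    int16x4_t r16 = vqmovn_s32(r);
    uint8x8_t r8 = vqmovun_s16(vcombine_s16(r16, r16));
    uint8_t tmp[8];
    vst1_u8(tmp, r8);
    std::memcpy(out + i, tmp, 4);  // keep only the 4 valid lanes
  }
  requantize_ref(acc + i, out + i, n - i, scale, zero_point);  // leftover tail
}

// A check along the lines of item 3: both paths must agree byte for byte.
int main() {
  std::mt19937 rng(42);
  std::uniform_int_distribution<int32_t> dist(-100000, 100000);
  for (int n : {1, 7, 16, 1000}) {
    std::vector<int32_t> acc(n);
    for (auto& v : acc) v = dist(rng);
    std::vector<uint8_t> out_ref(n), out_neon(n);
    requantize_ref(acc.data(), out_ref.data(), n, 0.0015f, 3);
    requantize_neon(acc.data(), out_neon.data(), n, 0.0015f, 3);
    assert(out_ref == out_neon);
  }
  return 0;
}
```

The actual kernel presumably mirrors the existing AVX2 code more closely (per-channel scales, wider unrolling, and so on); the sketch only illustrates the vectorize-then-saturating-narrow structure and the vectorized/reference equivalence test.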

meta-codesync bot commented Nov 14, 2025

@mcfi has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86216347.

mcfi added a commit to mcfi/FBGEMM that referenced this pull request Nov 14, 2025
Summary:
Pull Request resolved: pytorch#5130

X-link: https://github.com/facebookresearch/FBGEMM/pull/2132

This change adds a vectorized requantize_ for Arm64 using NEON intrinsics:
1. The newly added NEON intrinsics follow the approach of the existing AVX2 code.
2. The scalar loop was moved into a new function, requantize_i8dw_ref_, to make the code more readable and testable.
3. New tests verify that requantize_ and requantize_i8dw_ref_ produce identical results.

Reviewed By: Nicoshev

Differential Revision: D86216347
meta-codesync bot commented Nov 19, 2025

This pull request has been merged in 643894e.
