@CuiYifeng

This PR removes the workaround for a functionality regression in oneMKL 2025.2 and fixes the error-handling code, which was never executed with 2025.2.

@CuiYifeng CuiYifeng added this to the PT2.10 milestone Nov 10, 2025
@CuiYifeng CuiYifeng added the mkl label Nov 10, 2025
@CuiYifeng CuiYifeng marked this pull request as ready for review December 1, 2025 08:49
Copilot AI review requested due to automatic review settings December 1, 2025 08:49

Copilot AI left a comment

Pull request overview

This PR addresses error handling in BatchLinearAlgebra operations by removing a workaround for a oneMKL 2025.2 regression and fixing indexing bugs in the error handling code. The changes enable proper exception handling that was previously non-functional.

Key Changes:

  • Removed temporary skip list entries for inverse operation tests that were disabled due to oneMKL regression
  • Fixed incorrect indexing in error handling code to use exception IDs instead of loop indices

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/xpu/skip_list_common.py | Removed 8 test cases from the skip list that were temporarily disabled due to the oneMKL regression |
| src/ATen/native/xpu/mkl/BatchLinearAlgebra.cpp | Removed the workaround code block and fixed an indexing bug in the error handlers to use ids[i] instead of i |
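
To make the indexing fix concrete, here is a minimal standalone sketch of the difference between writing an error code at the loop index `i` and writing it at `ids[i]`. It does not use oneMKL or SYCL. The function names and the parallel `ids`/`codes` vectors are assumptions made for illustration, and only the `info[i]` versus `info[ids[i]]` pattern is taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins: ids[i] is the batch position of the i-th failed
// matrix and codes[i] is its error code. These names are illustrative only.

// Pre-fix pattern: the code is stored at the loop index i, which is the
// position of the failure in the exception list, not the matrix that failed.
std::vector<std::int32_t> record_by_loop_index(
    std::size_t batch_size,
    const std::vector<std::int64_t>& ids,
    const std::vector<std::int32_t>& codes) {
  assert(ids.size() == codes.size() && codes.size() <= batch_size);
  std::vector<std::int32_t> info(batch_size, 0);
  for (std::size_t i = 0; i < codes.size(); ++i) {
    info[i] = codes[i];
  }
  return info;
}

// Post-fix pattern: the code is stored at the failing matrix's own slot.
std::vector<std::int32_t> record_by_matrix_id(
    std::size_t batch_size,
    const std::vector<std::int64_t>& ids,
    const std::vector<std::int32_t>& codes) {
  assert(ids.size() == codes.size());
  std::vector<std::int32_t> info(batch_size, 0);
  for (std::size_t i = 0; i < ids.size(); ++i) {
    assert(ids[i] >= 0 && static_cast<std::size_t>(ids[i]) < batch_size);
    info[static_cast<std::size_t>(ids[i])] = codes[i];
  }
  return info;
}
```

For a batch of six matrices where matrices 2 and 5 fail with codes 3 and 1, the first function yields `{3, 1, 0, 0, 0, 0}` and the second yields `{0, 0, 3, 0, 0, 1}`. Only the second attributes each failure to the matrix that produced it.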

"\nDetail: ",
e.detail());
- info_cpu[i] = e.info();
+ info_cpu[ids[i]] = e.info();
Copilot AI Dec 1, 2025

Potential out-of-bounds access: ids[i] is used to index info_cpu, but there's no validation that ids[i] is within the bounds of info_cpu. This could cause a segfault or memory corruption if the batch exception contains an invalid matrix ID.
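
One way the suggested validation could look is sketched below as a small standalone helper. The name `set_info_checked` and its signature are hypothetical and are not part of the PR. In the actual file a check macro such as `TORCH_CHECK` might be the more natural choice, and that is likewise an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: store `code` at info[id] only when `id` is a valid
// index. Returns false and leaves `info` untouched for a negative or
// too-large id, so the caller can decide how to report the bad id.
bool set_info_checked(std::vector<std::int32_t>& info,
                      std::int64_t id,
                      std::int32_t code) {
  if (id < 0 || static_cast<std::size_t>(id) >= info.size()) {
    return false;
  }
  info[static_cast<std::size_t>(id)] = code;
  return true;
}
```

Checking the sign before casting keeps the intent explicit when the id type is signed, rather than relying on a negative value wrapping to a large unsigned number in the comparison.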

} catch (const sycl::exception& e) {
TORCH_WARN("Caught SYCL exception:\nWhat: ", e.what(), "\nInfo: -1");
- info_cpu[i] = -1;
+ info_cpu[ids[i]] = -1;
Copilot AI Dec 1, 2025

Potential out-of-bounds access: Same issue as above - ids[i] is used to index info_cpu without bounds checking. Consider adding validation that ids[i] < info_cpu.size() before indexing.
