Refine error_handle for BatchLinearAlgebra Ops #2321
base: main
Pull request overview
This PR addresses error handling in BatchLinearAlgebra operations by removing a workaround for a oneMKL 2025.2 regression and fixing indexing bugs in the error handling code. The changes enable proper exception handling that was previously non-functional.
Key Changes:
- Removed temporary skip list entries for inverse operation tests that were disabled due to oneMKL regression
- Fixed incorrect indexing in error handling code to use exception IDs instead of loop indices
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/xpu/skip_list_common.py | Removed 8 test cases from skip list that were temporarily disabled due to oneMKL regression |
| src/ATen/native/xpu/mkl/BatchLinearAlgebra.cpp | Removed workaround code block and fixed indexing bug in error handlers to use ids[i] instead of i |
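
The indexing fix described in the table can be illustrated with a minimal, standalone sketch. It does not use oneMKL or the real code in BatchLinearAlgebra.cpp: `BatchFailure`, `record_by_loop_index`, and `record_by_matrix_id` are hypothetical names introduced only for this example, and the sketch assumes every reported matrix ID is a valid batch index. It shows why writing the status at the loop position marks the wrong matrix when only some batch entries fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one failure reported by a batched call:
// `matrix_id` is the position of the failing matrix in the batch,
// `info` is the status code reported for that matrix.
struct BatchFailure {
  int64_t matrix_id;
  int64_t info;
};

// Buggy variant: treats the position in the failure list as the batch index.
std::vector<int64_t> record_by_loop_index(
    int64_t batch_size, const std::vector<BatchFailure>& failures) {
  std::vector<int64_t> info(static_cast<std::size_t>(batch_size), 0);
  for (std::size_t i = 0; i < failures.size(); ++i) {
    info[i] = failures[i].info;
  }
  return info;
}

// Fixed variant: writes the status at the ID of the matrix that failed.
// Assumes each matrix_id lies in [0, batch_size).
std::vector<int64_t> record_by_matrix_id(
    int64_t batch_size, const std::vector<BatchFailure>& failures) {
  std::vector<int64_t> info(static_cast<std::size_t>(batch_size), 0);
  for (std::size_t i = 0; i < failures.size(); ++i) {
    info[static_cast<std::size_t>(failures[i].matrix_id)] = failures[i].info;
  }
  return info;
}
```

With a batch of four matrices where only matrix 2 fails, the buggy variant flags matrix 0, while the fixed variant flags matrix 2.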
      "\nDetail: ",
      e.detail());
    - info_cpu[i] = e.info();
    + info_cpu[ids[i]] = e.info();
Copilot
AI
Dec 1, 2025
Potential out-of-bounds access: ids[i] is used to index info_cpu, but there's no validation that ids[i] is within the bounds of info_cpu. This could cause a segfault or memory corruption if the batch exception contains an invalid matrix ID.
      } catch (const sycl::exception& e) {
        TORCH_WARN("Caught SYCL exception:\nWhat: ", e.what(), "\nInfo: -1");
    -   info_cpu[i] = -1;
    +   info_cpu[ids[i]] = -1;
Copilot
AI
Dec 1, 2025
Potential out-of-bounds access: Same issue as above - ids[i] is used to index info_cpu without bounds checking. Consider adding validation that ids[i] < info_cpu.size() before indexing.
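
One way the suggested validation could look is sketched below. This is an assumption-laden illustration, not the code in the PR: `set_info_checked` is a hypothetical helper, and `info` is modeled as a `std::vector<int64_t>`, which may not match the actual type of `info_cpu`. Because the ID is treated as a signed value here, the sketch also rejects negative IDs in addition to the upper-bound check the review mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: stores `value` at info[id] only when `id` is a valid
// index. Returns false and leaves `info` untouched otherwise.
bool set_info_checked(std::vector<int64_t>& info, int64_t id, int64_t value) {
  if (id < 0 || static_cast<std::size_t>(id) >= info.size()) {
    return false;  // invalid matrix ID: skip the write instead of corrupting memory
  }
  info[static_cast<std::size_t>(id)] = value;
  return true;
}
```

A caller could log a warning or raise an error when the helper returns false. Which of the two is appropriate in the real error handlers is a design decision the PR discussion leaves open.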
This PR removes the workaround for a functional regression in oneMKL 2025.2 and fixes the error-handling code, which has never been executed with 2025.2.