Update guided matmul example by chuanyuf · Pull Request #2762 · oneapi-src/oneAPI-samples

chuanyuf · 2026-06-26T15:01:19Z

Existing Sample Changes

Description

Updates the guided debugging samples for oneAPI 2026.0 tools and drivers.
Raises the CMake minimum version to 3.5, updates the README documents, and ensures memory is released after use.

No new functions are added in this PR.

Fixes Issue#

External Dependencies

List any external dependencies created as a result of this change.

Type of change

Please delete options that are not relevant. Add a 'X' to the one that is applicable.

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Implement fixes for ONSAM Jiras

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

- [x] Command Line
- [ ] oneapi-cli
- [ ] Visual Studio
- [ ] Eclipse IDE
- [ ] VSCode

The compiler flags `-Wall -Wformat-security -Werror=format-security` were used when compiling.

2025.3 Guided Debugging Sample updates

Updates for oneAPI 2026.0 tools and drivers

Copilot

Pull request overview

This PR updates the guided matrix-multiplication debugging samples for Intel® oneAPI 2026.0 by refreshing documentation/tooling requirements, bumping the CMake minimum version, and adding USM deallocation to reduce leaks in the sample code.

Changes:

- Add `sycl::free(...)` cleanup for USM allocations across multiple guided samples.
- Update sample READMEs for oneAPI 2026.0 tool/driver versions and revise guided-debug instructions/output snippets.
- Bump `cmake_minimum_required` from 3.4 to 3.5 for the affected samples.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 10 comments.

Show a summary per file

| File | Description |
| --- | --- |
| Tools/ApplicationDebugger/guided_matrix_mult_SLMSize/src/2_matrix_mul.cpp | Adds USM frees after `q.wait()` for the “working” variant. |
| Tools/ApplicationDebugger/guided_matrix_mult_SLMSize/src/1_matrix_mul_SLM_size.cpp | Adds USM frees after `q.wait()` for the SLM-size failure variant. |
| Tools/ApplicationDebugger/guided_matrix_mult_SLMSize/README.md | Updates prerequisites, error messages, and guided-debug narrative for newer runtimes/tools. |
| Tools/ApplicationDebugger/guided_matrix_mult_SLMSize/CMakeLists.txt | Raises CMake minimum version to 3.5. |
| Tools/ApplicationDebugger/guided_matrix_mult_RaceCondition/README.md | Updates prerequisites and guided-debug instructions/output. |
| Tools/ApplicationDebugger/guided_matrix_mult_RaceCondition/CMakeLists.txt | Raises CMake minimum version to 3.5. |
| Tools/ApplicationDebugger/guided_matrix_mult_InvalidContexts/src/2_matrix_mul.cpp | Adds device enumeration output and USM frees. |
| Tools/ApplicationDebugger/guided_matrix_mult_InvalidContexts/src/1_matrix_mul_invalid_contexts.cpp | Adds device enumeration/output, device-specific queue selection, and conditional free behavior for the tutorial. |
| Tools/ApplicationDebugger/guided_matrix_mult_InvalidContexts/README.md | Significant refresh of tutorial steps, additional scenarios (ASAN/bonus), and updated prerequisites. |
| Tools/ApplicationDebugger/guided_matrix_mult_InvalidContexts/CMakeLists.txt | Raises CMake minimum version to 3.5. |
| Tools/ApplicationDebugger/guided_matrix_mult_Exceptions/src/3_matrix_mul.cpp | Adds USM frees after `q.wait()`. |
| Tools/ApplicationDebugger/guided_matrix_mult_Exceptions/src/2_matrix_mul_multi_offload.cpp | Adds USM frees after `q.wait()`. |
| Tools/ApplicationDebugger/guided_matrix_mult_Exceptions/src/1_matrix_mul_null_pointer.cpp | Adds USM frees after `q.wait()`. |
| Tools/ApplicationDebugger/guided_matrix_mult_Exceptions/README.md | Updates prerequisites and guided-debug narrative for newer runtime behavior. |
| Tools/ApplicationDebugger/guided_matrix_mult_Exceptions/CMakeLists.txt | Raises CMake minimum version to 3.5. |
| Tools/ApplicationDebugger/guided_matrix_mult_BadBuffers/src/b2_matrix_mul_usm.cpp | Adds USM frees after `q.wait()`. |
| Tools/ApplicationDebugger/guided_matrix_mult_BadBuffers/src/b1_matrix_mul_null_usm.cpp | Adds USM frees after `q.wait()`. |
| Tools/ApplicationDebugger/guided_matrix_mult_BadBuffers/README.md | Expands tutorial with device-side AddressSanitizer guidance and updates prerequisites. |
| Tools/ApplicationDebugger/guided_matrix_mult_BadBuffers/CMakeLists.txt | Raises CMake minimum version to 3.5. |


+#ifdef BAD_FREE
+    device selected_device = devices[0];
+#else
+    device selected_device = devices[1];
+#endif


+    // Be very specific about the device to use.
+    queue q(devices[0]);



    property_list propList = property_list{property::queue::enable_profiling()};

+    std::vector<sycl::device> devices = sycl::device::get_devices();
+    cout << "Devices:" << std::endl;
+
+    for (size_t index = 0; index < devices.size(); index++){
+       std::string device_name = devices[index].get_info<sycl::info::device::name>();
+       std::string device_driver = devices[index].get_info<sycl::info::device::driver_version>();
+       std::string sycl_version = devices[index].get_info<sycl::info::device::version>();
+       std::string vendor = devices[index].get_info<sycl::info::device::vendor>();
+       std::string backend = devices[index].get_info<sycl::info::device::backend_version>();
+       std::cout << "  [" << index << "] " << device_name << ", "  << sycl_version  << " [" << device_driver
+                << "] " << backend << ",  " << vendor <<  std::endl;
+    }
+
    queue q(default_selector_v);



    q.wait();
+
+    sycl::free(dev_a, q);


+   [opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1550 OpenCL 3.0 NEO  [25.18.33578]
+   [opencl:cpu][opencl:3] Intel(R) OpenCL, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz OpenCL 3.0 (Build 0) [2023.16.7.0.21_160000]
+   ```
+   > **Note:** If you have only one `[level_zero:gpu]` device listed, or the order differs from the above, the main example below may not work. Try to follow through anyway, and then try the bonus sample at the end of this document, which should work regardless of system configuration.


 ### Identify the Problem without Code Inspection

-You must have already built the [Unified Tracing and Profiling Tool](#getting-the-tracing-and-profiling-tool). Once you have built the utility, you can start it before your program (similar to using GBD).
+You need to build the [Unified Tracing and Profiling Tool](#getting-the-tracing-and-profiling-tool) before completing this section. Once you have built the utility, you can start it before your program (similar to using GDB).


+   101     queue q2(devicecontext, selected_device);
+   102     float * dev_c = sycl::malloc_device<float>(M*P, q2);
+   ```
+   As is hopefully obvious from the previous example, the problem is that we are trying to free memory allocated in SYCL queue `q2`, which has a different device context from SYCL queue `q`, even though under the covers they point to the same hardware device.


   ```

-Similarly, we specify targeting the CPU, which sometimes can avoid problems in your code that are specific to offloading to the GPU.
+Similarly, we can force the program to run on the CPU, which sometimes can avoid problems in your code that are specific to offloading to the GPU.


+#### Debugging the Problem

-Why did we try with multiple backends?   If one had shown correct or incorrect results, and one had crashed, we might be facing a race condition that only occasionally manifests as something that goes terribly wrong.  Or one of the backbends might have a bug.  But here all three crash, so it's likely the program is doing something illegal to memory.  The host CPU is a particularly good place to test for illegal memory accesses, because the CPU never allows pointers with an address within a few kilobytes of address 0x0, while this may be legally allocated memory on the GPU.
+Why did we try with multiple backends? If one had shown correct or incorrect results, and one had crashed, we might be facing a race condition that only occasionally manifests when something goes terribly wrong. Or one of the backends might have a bug while the others do not. But here all three crash, so it's likely the program is doing something illegal to memory. The host CPU is a particularly good place to test for illegal memory accesses, because the CPU never allows pointers with an address within a few kilobytes of address `0x0`, while this may be legally allocated memory on the GPU.


   ```

-   We used the form of `parallel_for` that takes the `nd_range`, which specifies the global iteration range (163850) and the local work-group size (10) like so:  `nd_range<1>{{163850}, {10}}`. The first line above shows the workgroup size (`groupSizeX = 10 groupSizeY = 1 groupSizeZ = 1`), and the second shows how many total workgroups will be needed to process the global iteration range (`{16385, 1, 1}`).
+   At line 106 we used the form of `parallel_for` that takes the `nd_range`, which specifies the global iteration range (163850) and the local work-group size (10) like so: `nd_range<1>{{163850}, {10}}`. The first line above shows the work-group size (`groupSizeX = 0xa groupSizeY = 0x1 groupSizeZ = 0x1`), and the second shows how many total work-groups will be needed to process the global iteration range (`{16385, 1, 1}`).


chuanyuf added 2 commits June 25, 2026 07:47

Merge pull request #2700 from CharlesCongdon/cc_tutupdate20253

ee8133c

2025.3 Guided Debugging Sample updates

Merge pull request #2747 from CharlesCongdon/cc_tutupdate20260

c9cb28b

Updates for oneAPI 2026.0 tools and drivers

chuanyuf assigned e-zorina Jun 26, 2026

chuanyuf requested a review from Copilot June 26, 2026 15:02

Copilot started reviewing on behalf of chuanyuf June 26, 2026 15:02 View session

Copilot AI reviewed Jun 26, 2026

chuanyuf wants to merge 2 commits into main from update_guided_matmul_example
