Parallel pytato PyOpenCL array context #358

Draft
majosm wants to merge 4 commits into inducer:main from majosm:pytato-parallel

@majosm
Collaborator

@majosm majosm commented May 22, 2026

Adds PytatoParallelPyOpenCLArrayContext, which is like PytatoPyOpenCLArrayContext but parallelizes over the device. Based on @kaushikcfd's work in #216. This version generalizes that code to work with fused loops.

The basic procedure is:

  1. Collect instructions that share inames into sets.
  2. Find the "outer" inames (i.e., inames shared between all instructions) in each of those sets.
  3. Parallelize over those.
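The first two steps above can be sketched in plain Python. This is only an illustration, not the code in this PR: the instruction-to-inames mapping is a hypothetical stand-in for what a loopy kernel would provide, the function names `group_by_shared_inames` and `outer_inames` are made up for the example, and it assumes that "share inames" means the iname sets overlap, directly or through a chain of other instructions.

```python
def group_by_shared_inames(insn_inames):
    """Step 1: collect instruction ids whose iname sets overlap (transitively).

    insn_inames maps an instruction id to the set of inames it sits inside.
    """
    groups = []  # list of (instruction ids, union of their inames)
    for insn_id, inames in insn_inames.items():
        ids = {insn_id}
        seen = set(inames)
        remaining = []
        for other_ids, other_inames in groups:
            if seen & other_inames:
                # Overlapping inames: fold the existing group into this one.
                ids |= other_ids
                seen |= other_inames
            else:
                remaining.append((other_ids, other_inames))
        remaining.append((ids, seen))
        groups = remaining
    return [ids for ids, _ in groups]


def outer_inames(group, insn_inames):
    """Step 2: inames shared by every instruction in the group."""
    return frozenset.intersection(
        *(frozenset(insn_inames[insn_id]) for insn_id in group))


# Hypothetical instructions and inames, chosen only for illustration.
example_insn_inames = {
    "insn_a": {"iel", "idof"},
    "insn_b": {"iel", "idof", "j"},
    "insn_c": {"iel"},
    "insn_d": {"k"},
}

example_groups = group_by_shared_inames(example_insn_inames)
example_outer = [outer_inames(g, example_insn_inames) for g in example_groups]
```

Here `insn_a`, `insn_b`, and `insn_c` land in one set with `iel` as the only outer iname, and `insn_d` forms its own set with outer iname `k`. Step 3 would then map those outer inames to parallel axes, which is not shown here.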

cc @lukeolson

⚠️ Currently WIP. TODO:

@majosm majosm force-pushed the pytato-parallel branch 20 times, most recently from 7248dee to 3fed163 Compare May 29, 2026 20:38
@majosm majosm force-pushed the pytato-parallel branch 7 times, most recently from 648760a to cdabe82 Compare May 30, 2026 04:42
@majosm majosm force-pushed the pytato-parallel branch from cdabe82 to ed5a9c7 Compare May 30, 2026 04:47
@majosm
Collaborator Author

majosm commented May 30, 2026

@inducer FWIW, I found a workaround for the Intel issues. The failures seem to be caused by a buffer overflow, possibly from bad handling of the partially filled workgroups at the tail ends of loops. Adding some padding at the end of the buffers via a custom allocator makes the failures go away (I tried it on several Intel CL versions, each of which was previously failing on a different set of tests). It's not an ideal solution, but at least it avoids having to fiddle with versions or skip tests. Thoughts?
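For readers unfamiliar with the idea, a padding allocator can be sketched as a thin wrapper that enlarges every request before forwarding it. This is not the allocator used in this PR: the class name `PaddedAllocator`, the default pad size, and the `host_allocator` stand-in are all hypothetical. The sketch assumes only that the wrapped allocator is a callable taking a size in bytes, which is the convention PyOpenCL allocators follow.

```python
class PaddedAllocator:
    """Wrap an allocator callable so every request gets extra trailing bytes.

    Hypothetical helper for illustration; the pad size is an arbitrary choice.
    """

    def __init__(self, base_allocator, padding_nbytes=256):
        if padding_nbytes < 0:
            raise ValueError("padding_nbytes must be non-negative")
        self.base_allocator = base_allocator
        self.padding_nbytes = padding_nbytes

    def __call__(self, nbytes):
        # Ask for more than was requested so that writes slightly past the
        # logical end of the buffer still land in memory owned by it.
        return self.base_allocator(nbytes + self.padding_nbytes)


def host_allocator(nbytes):
    """Stand-in for a device allocator so the sketch runs without OpenCL."""
    return bytearray(nbytes)


padded = PaddedAllocator(host_allocator, padding_nbytes=64)
buf = padded(100)  # the underlying allocation is 100 + 64 bytes
```

The trade-off is that the overflow is hidden rather than fixed, and every allocation costs a little extra memory.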
