Handle reductions in get_insn_access_map #1009
I can't request review on this repo, so @kaushikcfd @inducer this is probably ready for a glance.

I should mention: this is more or less my first time doing anything nontrivial with One thing I'm not clear on is whether I should be using
Pull request overview
This PR updates loop-fusion dependency analysis so access maps can be computed for instructions containing reductions whose domains are separate from the surrounding inames.
Changes:
- Replaces single instruction access-map collection with per-reduction-level access maps.
- Updates loop fusion to project and union multiple access maps before dependence checks.
- Adds a regression test covering fusion with an inner reduction in a separate domain.
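The "project, then union" step in the second bullet can be pictured with a small library-free model. This is only a sketch under stated assumptions, not loopy's implementation: access maps are modeled as plain Python sets of (iname assignment, array index) pairs instead of `isl.Map` objects, and the helper names `project_onto` and `union_projected` are hypothetical.

```python
def project_onto(access_map, keep):
    """Drop every iname not in `keep` from each point of the modeled map."""
    return {
        (frozenset((name, val) for name, val in point if name in keep), index)
        for point, index in access_map
    }


def union_projected(maps_by_level, fused_iname):
    """Project each per-level map onto the fused iname, then union them."""
    result = set()
    for access_map in maps_by_level.values():
        result |= project_onto(access_map, {fused_iname})
    return result


# Hypothetical data: reads of u[iel] outside any reduction, and reads of
# u[iel] inside a reduction over idof. The two maps use different inames,
# which stands in for "different spaces" in isl terms.
maps_by_level = {
    frozenset(): {(frozenset({("iel", i)}), i) for i in range(3)},
    frozenset({"idof"}): {
        (frozenset({("iel", i), ("idof", j)}), i)
        for i in range(3)
        for j in range(2)
    },
}

combined = union_projected(maps_by_level, "iel")
```

In this model the maps only become directly comparable once the reduction iname has been projected away, which is why the union is taken after projection rather than before.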
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| loopy/kernel/tools.py | Adds reduction-aware instruction access-map collection and exposes map unioning. |
| loopy/transform/loop_fusion.py | Consumes multiple access maps and unions them after projection. |
| test/test_loop_fusion.py | Adds regression coverage for fusing loops when a dependent read occurs inside a reduction. |
        self, expr: object, domain: isl.Set) -> dict[frozenset[str], isl.Map]:
    return {}
Currently, `get_insn_access_map` only passes the inames from `within_inames` in the call to `get_access_map`. As a result, if reductions are present, `get_access_map` may fail (edit: it only happens if the reduction domain is separate from the element domain; otherwise `knl.get_inames_domain()` picks up the reduction).

This causes `_compute_isinfusible_via_access_map` to return `True` whenever reductions are present (code), which subsequently prevents `get_kennedy_unweighted_fusion_candidates` from fusing loops that it potentially could.

A real-world example of this can be seen in the generated code for a DG operator here. I've annotated with `# DESCRIPTION:` what DG operations are being done in each loop in the device code. The first 3 loop nests are performing face-local work on the interior faces (the `iel` loop with 5641 iterations). Since the operations are face-local, the `iel` loops should be able to be fused, but the reductions over the `idof` axes are preventing that from happening due to this issue.

This PR changes `get_insn_access_map` to `get_insn_access_maps`. It now computes separate access maps for accesses outside of reductions and for each distinct reduction level present, by traversing the instruction and updating the domain for reductions accordingly (AFAIK, they cannot all be unioned together due to the different spaces involved). It additionally modifies `_compute_isinfusible_via_access_map` to do the unioning after projection, once it's safe to do so.

With this change, I'm now seeing fusion of the element loops as expected: code.
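The traversal described above, which groups accesses by the reduction level they occur in, might look roughly like the following. This is a minimal sketch on a toy expression tree, not the PR's code: the `Subscript`, `Reduction`, and `Sum` classes and the `collect_accesses` function are hypothetical stand-ins, and keying the result by the frozenset of enclosing reduction inames is an assumption about how the levels are distinguished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscript:
    array: str
    index: tuple  # names of the inames used as indices


@dataclass(frozen=True)
class Reduction:
    inames: tuple  # reduction inames introduced by this node
    body: object


@dataclass(frozen=True)
class Sum:
    children: tuple


def collect_accesses(expr, level=frozenset(), out=None):
    """Group array accesses by the set of reduction inames enclosing them."""
    if out is None:
        out = {}
    if isinstance(expr, Subscript):
        out.setdefault(level, []).append(expr)
    elif isinstance(expr, Reduction):
        # Entering a reduction extends the current level with its inames.
        collect_accesses(expr.body, level | frozenset(expr.inames), out)
    elif isinstance(expr, Sum):
        for child in expr.children:
            collect_accesses(child, level, out)
    return out


# Toy instruction body: a[iel] + reduce(idof, b[iel, idof])
expr = Sum((
    Subscript("a", ("iel",)),
    Reduction(("idof",), Subscript("b", ("iel", "idof"))),
))
accesses = collect_accesses(expr)
```

Each key in `accesses` then identifies one group of accesses that would need its own domain (the surrounding inames plus that level's reduction inames) before an access map could be built for it.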