
[Intel] GroupMemoryBarrierWithGroupSync inconsistent #445

@inbelic


Consider the following HLSL compute shader:

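```hlsl
RWStructuredBuffer<uint4> Out : register(u0);

groupshared uint4 SharedData;

[numthreads(128,4,1)]
void main(uint3 ThreadID : SV_GroupThreadID) {
  // A single thread zero-initializes the shared vector.
  if (ThreadID.x == 0 && ThreadID.y == 0) {
    SharedData = 0;
  }
  GroupMemoryBarrierWithGroupSync();

  // On iteration I, exactly one thread per row y (the one with x == I)
  // increments component y; the barrier orders successive iterations.
  for (uint I = 0; I < 128; I++) {
    if (ThreadID.x == I) {
      SharedData[ThreadID.y] = SharedData[ThreadID.y] + 1;
    }
    GroupMemoryBarrierWithGroupSync();
  }

  // The threads in column x == 0 copy the accumulated components out.
  if (ThreadID.x == 0) {
    Out[0][ThreadID.y] = SharedData[ThreadID.y];
  }
}
```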

We would expect Out[0] = {128, 128, 128, 128}, and this is the result observed on WARP, NVIDIA, and AMD when the shader is compiled with DXC.
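To make the expected value concrete, here is a minimal CPU model of the shader, written as a sketch rather than as part of the test suite. It assumes that each GroupMemoryBarrierWithGroupSync() acts as a full group-wide barrier that makes all prior groupshared writes visible to every thread; the names `simulate_group` and `seed` are hypothetical. Between two barriers, only the four threads with x == I are active and each touches a distinct component, so shuffling the execution order within a phase does not change the result.

```python
import itertools
import random

NUM_X, NUM_Y = 128, 4  # mirrors [numthreads(128,4,1)]


def simulate_group(seed=0):
    """Model one thread group under the assumed barrier semantics."""
    rng = random.Random(seed)
    threads = list(itertools.product(range(NUM_X), range(NUM_Y)))  # (x, y)

    # Thread (0, 0) writes SharedData = 0; the first barrier publishes it.
    shared = [0] * 4

    for i in range(NUM_X):
        # Arbitrary execution order for the threads within this phase.
        rng.shuffle(threads)
        for x, y in threads:
            if x == i:
                # Only one thread per component is active in this phase.
                shared[y] = shared[y] + 1
        # End of phase: models the barrier at the bottom of the loop.

    out = [0] * 4
    for x, y in threads:
        if x == 0:
            out[y] = shared[y]
    return out


print(simulate_group())  # [128, 128, 128, 128]
```

Under these assumptions every component is incremented exactly once per loop iteration, so the model returns 128 for each of the four components regardless of the seed.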

On Intel only, the output is {0, 0, 0, 0}, as demonstrated here.

Because the same DXC-compiled shader produces the expected result on the other devices, this is suspected to be a runtime (driver) bug specific to Intel.

This issue tracks further investigation to confirm that this is the case. For reference, see the ComponentAccumulationDataRace test case, introduced here.

Labels: driver-bug (Bugs that are likely or confirmed GPU driver bugs)