Commit 5de6650

Merge pull request #29 from ArielG-NV/coherent-ptr-amendment
Amend coherent-pointer proposal as per new guidelines of design requested
2 parents 0139947 + 71fd320 commit 5de6650

2 files changed

+333
-221
lines changed

Lines changed: 333 additions & 0 deletions
@@ -0,0 +1,333 @@
1+
# SP\#031: Coherent Pointer Operations & Pointer Access
2+
3+
## Status
4+
5+
Status: Design Review
6+
Implementation:
7+
Author: Ariel Glasroth
8+
Reviewer:
9+
10+
## Background
11+
12+
### Introduction
13+
14+
GPUs have a concept known as coherent operations. Coherent operations flush caches on reads and writes so that when **thread A** modifies memory, **thread B** can read that memory and see the changes **thread A** made. Not every cache needs to be flushed: if a user wants coherence at `WorkGroup` scope, only the cache levels up to `WorkGroup` memory need to be flushed.
15+
16+
Additionally, pointers have a notion of 'access': a pointer can be marked read-only or read-write. A read-only pointer cannot be used to modify the data it points to; a read-write pointer allows both reading and writing that data.
17+
18+
### Prior Implementations Of Coherence
19+
20+
* HLSL – `globallycoherent` keyword can be added to declarations (`globallycoherent RWStructuredBuffer<T> buffer`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is device memory. `groupshared` objects are likely coherent, but the specification does not say.
21+
* GLSL – `coherent` keyword can be added to declarations (`coherent uniform image2D img`). This keyword ensures coherence with all operations to a tagged object. Memory scope of coherence is unspecified. Objects tagged with `shared` are [implicitly](https://www.khronos.org/opengl/wiki/Compute_Shader) `coherent`.
22+
* Metal – `memory_coherence::memory_coherence_device` is a generic argument to buffers. This argument ensures coherence with all operations to a tagged object. Memory scope of coherence is device memory.
23+
* WGSL – [All operations](https://www.w3.org/TR/WGSL/#private-vs-non-private) are coherent.
24+
25+
### SPIR-V Support For Coherence
26+
27+
Originally, SPIR-V supported coherent objects through the `Coherent` decoration. Modern SPIR-V (`VulkanMemoryModel`) instead exposes coherence as a **per operation** feature: the operands `MakePointerAvailable`, `MakePointerVisible`, `MakeTexelAvailable`, and `MakeTexelVisible` are added to individual load and store operations. Users must additionally specify the memory scope to which an operation is coherent.
28+
29+
`MakePointerAvailable` is for non-texture memory stores: `OpStore`, `OpCooperativeMatrixStoreKHR`, and `OpCooperativeVectorStoreNV`.
30+
31+
`MakePointerVisible` is for non-texture memory loads: `OpLoad`, `OpCooperativeMatrixLoadKHR`, and `OpCooperativeVectorLoadNV`.
32+
33+
`MakeTexelAvailableKHR` is for texture memory stores: `OpImageWrite`.
34+
35+
`MakeTexelVisibleKHR` is for texture memory loads: `OpImageRead` and `OpImageSparseRead`.
36+
37+
Additionally, `MakePointer{Visible,Available}` support usage in `OpCopyMemory` and `OpCopyMemorySized`.
38+
39+
### Example
40+
41+
The simplest use case for this feature can be modeled as follows: (1) **thread1** and **thread2** both read from and write to the same `RWStructuredBuffer`. (2) **thread1** performs a non-coherent `OpStore` into the buffer. (3) If **thread2** then performs an `OpLoad` on the buffer, it may not see the change **thread1** made, for two reasons:
42+
43+
1) **thread1** does not promise its writes are visible to other threads. Cached writes may not immediately flush to device memory.
44+
2) **thread2** may load from a cache rather than from device memory. It then does not see the new value, because the new value was written to device memory and not to that intermediate cache.
45+
46+
If we add `MakePointerAvailable` to the `OpStore` and `MakePointerVisible` to the `OpLoad`, both with the memory scope `QueueFamily`, the problem is solved, because changes are flushed to a memory scope shared by the two threads. This may also be faster than flushing to `Device` scope, since `QueueFamily` is a tighter scope than `Device`.
47+
48+
### Prior Implementations Of Access
49+
50+
* C/C++ – `const int* ptr` or `int const*` both mean the underlying data pointed to is constant
51+
* This will be equivalent to `Access::Read`
52+
* C/C++ – `int* const` means the value of the pointer is constant
53+
* This will be equivalent to `const Ptr<T, ...> ptr`
54+
55+
### Compiler Support For Coherence
56+
57+
This is currently planned to be a high-level concept which does not map to anything in SPIR-V.
58+
59+
## Proposed Solution
60+
61+
### Frontend For Pointer Access
62+
63+
Pointer Access will be implemented through a new generic-argument `Access access` on our `Ptr` data-type.
64+
We will also expose pointer `AddressSpace`s.
65+
66+
```c#
67+
enum Access : uint64_t
68+
{
69+
ReadWrite = 0,
70+
Read = 1
71+
//...
72+
}
73+
74+
enum AddressSpace : uint64_t
75+
{
76+
Generic = 0x7fffffff,
77+
// Corresponds to SPIR-V's SpvStorageClassPrivate
78+
ThreadLocal = 1,
79+
Global,
80+
// Corresponds to SPIR-V's SpvStorageClassWorkgroup
81+
GroupShared,
82+
// Corresponds to SPIR-V's SpvStorageClassUniform
83+
Uniform,
84+
// specific address space for payload data in metal
85+
MetalObjectData,
86+
// Corresponds to SPIR-V's SpvStorageClassInput
87+
Input,
88+
// Same as `Input`, but used for builtin input variables
89+
BuiltinInput,
90+
// Corresponds to SPIR-V's SpvStorageClassOutput
91+
Output,
92+
// Same as `Output`, but used for builtin output variables
93+
BuiltinOutput,
94+
// Corresponds to SPIR-V's SpvStorageClassTaskPayloadWorkgroupEXT
95+
TaskPayloadWorkgroup,
96+
// Corresponds to SPIR-V's SpvStorageClassFunction
97+
Function,
98+
// Corresponds to SPIR-V's SpvStorageClassStorageBuffer
99+
StorageBuffer,
100+
// Corresponds to SPIR-V's SpvStorageClassPushConstant
101+
PushConstant,
102+
// Corresponds to SPIR-V's SpvStorageClassRayPayloadKHR
103+
RayPayloadKHR,
104+
// Corresponds to SPIR-V's SpvStorageClassIncomingRayPayloadKHR
105+
IncomingRayPayload,
106+
// Corresponds to SPIR-V's SpvStorageClassCallableDataKHR
107+
CallableDataKHR,
108+
// Corresponds to SPIR-V's SpvStorageClassIncomingCallableDataKHR
109+
IncomingCallableData,
110+
// Corresponds to SPIR-V's SpvStorageClassHitObjectAttributeNV
111+
HitObjectAttribute,
112+
// Corresponds to SPIR-V's SpvStorageClassHitAttributeKHR
113+
HitAttribute,
114+
// Corresponds to SPIR-V's SpvStorageClassShaderRecordBufferKHR
115+
ShaderRecordBuffer,
116+
// Corresponds to SPIR-V's SpvStorageClassUniformConstant
117+
UniformConstant,
118+
// Corresponds to SPIR-V's SpvStorageClassImage
119+
Image,
120+
// Represents a SPIR-V specialization constant
121+
SpecializationConstant,
122+
// Corresponds to SPIR-V's SpvStorageClassNodePayloadAMDX
123+
NodePayloadAMDX,
124+
// Default address space for a user-defined pointer
125+
UserPointer = 0x100000001ULL,
126+
};
127+
128+
__generic<T, Access access = Access::ReadWrite, AddressSpace addrSpace = AddressSpace::UserPointer>
129+
struct Ptr
130+
{
131+
//...
132+
}
133+
```
134+
135+
If a pointer is `Access::Read`, a user program may only read from the given pointer. If a pointer is `Access::ReadWrite`, a user program may read from a given pointer or write to it.
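
As a rough illustration of the rule above, the following sketch uses the `Ptr` declaration from this proposal. The helper names `sumValues` and `fillValues` are hypothetical, and the rejected line is shown only as a comment, since the exact diagnostic is not specified here.

```c#
// Hypothetical helper: reads through a read-only pointer.
int sumValues(Ptr<int, Access::Read> src, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += src[i];      // reading is allowed for Access::Read
    // src[0] = 0;            // would be rejected: writing through Access::Read
    return total;
}

// Hypothetical helper: writes through a read-write pointer.
void fillValues(Ptr<int, Access::ReadWrite> dst, int count, int value)
{
    for (int i = 0; i < count; i++)
        dst[i] = value;       // reading and writing are both allowed
}
```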
136+
137+
### Frontend For Coherent Pointer Operations
138+
139+
We propose to implement coherence on a per-operation level for SPIR-V targets. This will be accomplished through new intrinsic methods to handle coherent load/store.
140+
141+
```c#
142+
public enum MemoryScope : int32_t
143+
{
144+
CrossDevice = 0,
145+
Device,
146+
Workgroup,
147+
Subgroup,
148+
Invocation,
149+
QueueFamily,
150+
ShaderCall,
151+
//...
152+
}
153+
154+
//// Ptr<>
155+
156+
// `ptr` is the pointer to load from.
157+
// The `int alignment` parameter controls the alignment to load from a pointer with.
158+
// The `MemoryScope scope` parameter controls the memory scope that an operation is coherent to.
159+
[ForceInline]
160+
[require(SPV_KHR_vulkan_memory_model)]
161+
__generic<T, Access access, AddressSpace addrSpace>
162+
T loadCoherent(Ptr<T, access, addrSpace> ptr, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
163+
164+
// `ptr` is the dst for the store.
165+
// `val` is the value to store into `ptr`.
166+
// The `int alignment` parameter controls the alignment to store to a pointer with.
167+
// The `MemoryScope scope` parameter controls the memory scope that an operation is coherent to.
168+
[ForceInline]
169+
[require(SPV_KHR_vulkan_memory_model)]
170+
__generic<T, AddressSpace addrSpace>
171+
void storeCoherent(Ptr<T, Access::ReadWrite, addrSpace> ptr, T val, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
172+
173+
//// CoopVec<>
174+
175+
// Return a `CoopVec`, loaded from `ptr`.
176+
[ForceInline]
177+
[require(SPV_KHR_vulkan_memory_model, cooperative_vector)]
178+
__generic<T : __BuiltinArithmeticType, let N : int, Access access, AddressSpace addrSpace>
179+
CoopVec<T, N> coopVecLoadCoherent(Ptr<T, access, addrSpace> ptr, int offset, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
180+
181+
// As a method to `CoopVec`, keep consistent with how `CoopVec` is defined
182+
struct CoopVec<T, ...>
183+
{
184+
...
185+
// Store this `CoopVec` into `ptr`.
186+
[ForceInline]
187+
[require(SPV_KHR_vulkan_memory_model, cooperative_vector)]
188+
__generic<AddressSpace addrSpace>
189+
void storeCoherent(Ptr<T, Access::ReadWrite, addrSpace> ptr, int offset, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
190+
...
191+
}
192+
193+
//// CoopMat<>
194+
195+
// Return a `CoopMat`, loaded from `ptr`.
196+
[ForceInline]
197+
[require(SPV_KHR_vulkan_memory_model, cooperative_matrix)]
198+
__generic<
199+
T : __BuiltinArithmeticType,
200+
let S : MemoryScope,
201+
let M : int,
202+
let N : int,
203+
let R : CoopMatMatrixUse,
204+
let matrixLayout : CoopMatMatrixLayout,
205+
let access : Access,
206+
let addrSpace : AddressSpace>
207+
CoopMat<T, S, M, N, R> coopMatLoadCoherent(Ptr<T, access, addrSpace> ptr, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
208+
209+
// As a method to `CoopMat`, keep consistent with how `CoopMat` is defined
210+
struct CoopMat<T, ...>
211+
{
212+
...
213+
// Store this `CoopMat` into `ptr`.
214+
[ForceInline]
215+
[require(SPV_KHR_vulkan_memory_model, cooperative_matrix)]
216+
__generic<AddressSpace addrSpace>
217+
void storeCoherent(Ptr<T, Access::ReadWrite, addrSpace> ptr, uint element, uint stride, int alignment, constexpr MemoryScope scope = MemoryScope::Device);
218+
...
219+
}
220+
```
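
A minimal usage sketch of the `Ptr<>` intrinsics declared above, revisiting the two-thread example from the Background section. The function names `publish` and `observe` are hypothetical, and the alignment value `4` assumes a 4-byte `int`; the signatures are the ones proposed in this document, not an existing API.

```c#
// Hypothetical producer: makes the written value available at QueueFamily scope.
[require(SPV_KHR_vulkan_memory_model)]
void publish(Ptr<int> slot, int value)
{
    // Arguments: destination, value, alignment in bytes (assumed 4), memory scope.
    storeCoherent(slot, value, 4, MemoryScope::QueueFamily);
}

// Hypothetical consumer: makes writes at QueueFamily scope visible to this load.
[require(SPV_KHR_vulkan_memory_model)]
int observe(Ptr<int, Access::Read> slot)
{
    // A read-only pointer is sufficient, since `loadCoherent` accepts any `Access`.
    return loadCoherent(slot, 4, MemoryScope::QueueFamily);
}
```

Coherence here only covers availability and visibility of the value. Knowing that the store has already happened still requires separate synchronization between the two threads, which this sketch leaves out.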
221+
222+
### Support For Coherent Workgroup Memory
223+
224+
Any coherent operation through a pointer to a `groupshared` object is coherent. Slang does not currently support pointers to `groupshared` memory, so this proposal will extend the existing `AddressSpace::GroupShared` implementation for pointers as needed.
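
A possible shape for such an operation, assuming the `groupshared` pointer support described here is in place. `publishToGroup` is a hypothetical helper, and the alignment `4` again assumes a 4-byte `int`.

```c#
// Hypothetical helper: coherent store through a pointer in the GroupShared address space.
[require(SPV_KHR_vulkan_memory_model)]
void publishToGroup(Ptr<int, Access::ReadWrite, AddressSpace::GroupShared> slot, int value)
{
    // Workgroup scope is the tightest scope that covers all threads sharing `groupshared` memory.
    storeCoherent(slot, value, 4, MemoryScope::Workgroup);
}
```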
225+
226+
### Casting Pointers
227+
228+
All pointers can be cast to each other. Casting must be explicit.
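
The proposal does not fix a cast syntax, so the sketch below assumes a constructor-style cast, similar to the form used in the alternative designs later in this document. `readFirst` is a hypothetical function.

```c#
int readFirst(Ptr<int, Access::ReadWrite> rw)
{
    // Assumed syntax: explicit, constructor-style cast to a read-only pointer.
    Ptr<int, Access::Read> ro = Ptr<int, Access::Read>(rw);
    // Ptr<int, Access::Read> implicitRo = rw; // would be rejected: casts must be explicit
    return ro[0];
}
```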
229+
230+
### Banned keyword usage
231+
232+
The following keyword use is disallowed:
233+
* `globallycoherent T*`
234+
* `coherent T*`
235+
* `const T*`, `T const*`, and `T* const`
236+
* `Ptr<const T>`, `Ptr<coherent T>`, and `Ptr<globallycoherent T>`
237+
238+
### Explicitly allowed keywords
239+
240+
`const Ptr<int>` is permitted. It means the `Ptr` itself is constant: the address the pointer holds will not change.
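
A short sketch of the distinction, with `bumpFirst` as a hypothetical function. The pointee stays writable because `Access` defaults to `Access::ReadWrite`; only the pointer value is fixed.

```c#
void bumpFirst(Ptr<int> base)
{
    const Ptr<int> p = base;  // the address held by `p` cannot change
    p[0] = p[0] + 1;          // allowed: the pointee is still Access::ReadWrite
    // p = p + 1;             // would be rejected: `p` itself is const
}
```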
241+
242+
### Order of Implementation
243+
244+
* Frontend for pointer changes
245+
* Logic for pointer access
246+
* Support casting explicitly between pointers
247+
* Disallow `globallycoherent T*` and `coherent T*`
248+
* Disallow `const T*`, `T const*`, and `T* const`
249+
* Support for coherent buffers and textures
250+
* Support for workgroup memory pointers
251+
* Support for coherent workgroup memory
252+
* Support for coherent cooperative matrix & cooperative vector
253+
254+
### Potential Next Steps
255+
256+
`coopVecLoadCoherent` and `coopMatLoadCoherent` should have a member-function version inside `CoopVec`/`CoopMat` (`loadCoherent`).
257+
258+
## Alternative Designs Considered
259+
260+
1. Using special methods (part of the `Ptr` type) to access coherent-operation functionality
261+
262+
```c#
263+
T* ptr1 = bufferPtr1;
264+
T* ptr2 = bufferPtr2;
265+
var loadedData = coherentLoad(ptr1, scope = MemoryScope::Device);
266+
coherentStore(ptr2, loadedData, scope = MemoryScope::Device);
267+
```
268+
269+
2. Tagging types as coherent through a modifier
270+
271+
```c#
272+
// Not allowed:
273+
globallycoherent RWStructuredBuffer<T> bufferPtr1 : register(u0);
274+
275+
cbuffer PtrBuffer
276+
{
277+
// We only allow coherent on pointers
278+
globallycoherent int* bufferPtr1;
279+
}
280+
```
281+
282+
3. ‘OOP’ approach, get a `CoherentPtr` from a regular `Ptr`. Any operation on a `CoherentPtr` will use the Coherent variant of a store/load.
283+
284+
```c#
285+
[require(SPV_KHR_vulkan_memory_model)]
286+
void computeMain()
287+
{
288+
int* ptr = gmemBuffer;
289+
CoherentPtr<int, MemoryScope::Workgroup> ptr_workgroup = CoherentPtr<int, MemoryScope::Workgroup>(gmemBuffer);
290+
CoherentPtr<int, MemoryScope::Device> ptr_device = CoherentPtr<int, MemoryScope::Device>(gmemBuffer);
291+
292+
293+
ptr_workgroup[0] = output[1];
294+
ptr_workgroup = ptr_workgroup + 1;
295+
output[2] = ptr_workgroup[0];
296+
ptr_workgroup = ptr_workgroup - 1;
297+
output[3] = ptr_workgroup[3];
298+
299+
300+
ptr[10] = 10;
301+
302+
ptr_device = ptr_device + 3;
303+
gmemBuffer[0] = 10;
304+
ptr_device[3] = output[3];
305+
}
306+
```
307+
308+
4. Modifier with parameter to specify memory-scope
309+
310+
```c#
311+
cbuffer PtrBuffer
312+
{
313+
int* bufferPtr1;
314+
}
315+
int main()
316+
{
317+
coherent<WorkgroupMemory> int* bufferPtrWorkgroup = bufferPtr1;
318+
coherent<Device> int* bufferPtrDevice = bufferPtr1;
319+
}
320+
```
321+
322+
5. Coherence as a generic argument
323+
324+
```c#
325+
typedef Ptr<int, AddressSpace::UserPointer, Access::ReadWrite, CoherentScope::Device> DeviceCoherentPtrInt;
326+
int main()
327+
{
328+
DeviceCoherentPtrInt ptr = DeviceCoherentPtrInt(&processMemory[id.x]);
329+
output[id] = ptr[id];
330+
}
331+
```
332+
333+
##
