
Commit 9bc0fc0

Merge pull request #24 from ArielG-NV/coherent-pointers
#30 - Coherent Pointers & Pointer Access Proposal
2 parents 3de73d8 + e69a7e4 commit 9bc0fc0

1 file changed (225 additions, 0 deletions)

proposals/030-coherent-pointers.md

# SP\#030: Coherent Pointers & Pointer Access

## Status

Status: Design Review

Implementation:

Author: Ariel Glasroth

Reviewer:

## Background

### Introduction

GPUs support coherent memory operations. A coherent operation flushes caches on reads and writes so that when **thread A** modifies memory, **thread B** can read that memory and see all of the changes thread A made. Not every cache level has to be flushed: coherence is tied to a memory scope. If a user only needs coherence within `Workgroup` memory, only the cache levels up to the `Workgroup` scope need to be flushed.

This proposal also allows pointers to be marked as read-only or read/write. A read-only pointer cannot be used to modify the data it points to.

### Prior Implementations

* HLSL: the `globallycoherent` keyword can be added to declarations (`globallycoherent RWStructuredBuffer<T> buffer`). This keyword makes all operations on the tagged object coherent. The memory scope of coherence is device memory. `groupshared` objects are likely coherent, but the specification does not say so.
* GLSL: the `coherent` keyword can be added to declarations (`coherent uniform image2D img`). This keyword makes all operations on the tagged object coherent. The memory scope of coherence is unspecified. Objects tagged with `shared` are [implicitly](https://www.khronos.org/opengl/wiki/Compute_Shader) `coherent`.
* Metal: `memory_coherence::memory_coherence_device` is a generic argument to buffers. This argument makes all operations on the tagged object coherent. The memory scope of coherence is device memory.
* WGSL: [all operations](https://www.w3.org/TR/WGSL/#private-vs-non-private) are coherent.

### SPIR-V Support For Coherence

Originally, SPIR-V supported coherent objects through the `Coherent` decoration. Modern SPIR-V (`VulkanMemoryModel`) instead exposes coherence **per operation**: coherence is requested by adding the memory operands `MakePointerAvailable` and `MakePointerVisible`, or the image operands `MakeTexelAvailable` and `MakeTexelVisible`, to individual load and store instructions. Each of these operands also takes a memory scope that specifies the scope to which the operation is coherent.

`MakePointerAvailable` is for stores to non-texture memory: `OpStore`, `OpCooperativeMatrixStoreKHR`, and `OpCooperativeVectorStoreNV`.

`MakePointerVisible` is for loads from non-texture memory: `OpLoad`, `OpCooperativeMatrixLoadKHR`, and `OpCooperativeVectorLoadNV`.

`MakeTexelAvailable` is for texture stores: `OpImageWrite`.

`MakeTexelVisible` is for texture loads: `OpImageRead` and `OpImageSparseRead`.

Additionally, `MakePointerAvailable` and `MakePointerVisible` can be used with `OpCopyMemory` and `OpCopyMemorySized`.
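
The operand rules above can be condensed into a small lookup. The following C++ sketch is a model for illustration, not Slang compiler code: `AccessInstr` and `coherentOperandBits` are hypothetical names, while the bit values come from the Memory Operands and Image Operands tables of the SPIR-V specification. It also encodes the SPIR-V requirement that `MakePointerAvailable`/`MakePointerVisible` be combined with `NonPrivatePointer`, and `MakeTexelAvailable`/`MakeTexelVisible` with `NonPrivateTexel`. The memory scope itself is a separate `<id>` operand and is not modeled here.

```cpp
#include <cstdint>

// Operand bit values from the SPIR-V specification.
namespace spv {
constexpr uint32_t MemoryOperandsMakePointerAvailable = 0x8;
constexpr uint32_t MemoryOperandsMakePointerVisible = 0x10;
constexpr uint32_t MemoryOperandsNonPrivatePointer = 0x20;
constexpr uint32_t ImageOperandsMakeTexelAvailable = 0x100;
constexpr uint32_t ImageOperandsMakeTexelVisible = 0x200;
constexpr uint32_t ImageOperandsNonPrivateTexel = 0x400;
}  // namespace spv

// Hypothetical grouping of the instructions a coherent access can lower to.
enum class AccessInstr {
    Store,       // OpStore, OpCooperativeMatrixStoreKHR, OpCooperativeVectorStoreNV
    Load,        // OpLoad, OpCooperativeMatrixLoadKHR, OpCooperativeVectorLoadNV
    ImageWrite,  // OpImageWrite
    ImageRead,   // OpImageRead, OpImageSparseRead
};

// Returns the operand bits a coherent access adds to the instruction.
// Each MakePointer*/MakeTexel* bit is paired with its required NonPrivate* bit.
constexpr uint32_t coherentOperandBits(AccessInstr instr) {
    switch (instr) {
    case AccessInstr::Store:
        return spv::MemoryOperandsMakePointerAvailable | spv::MemoryOperandsNonPrivatePointer;
    case AccessInstr::Load:
        return spv::MemoryOperandsMakePointerVisible | spv::MemoryOperandsNonPrivatePointer;
    case AccessInstr::ImageWrite:
        return spv::ImageOperandsMakeTexelAvailable | spv::ImageOperandsNonPrivateTexel;
    case AccessInstr::ImageRead:
        return spv::ImageOperandsMakeTexelVisible | spv::ImageOperandsNonPrivateTexel;
    }
    return 0;
}
```

For example, a coherent `OpStore` carries the memory-operand mask `0x28` (`MakePointerAvailable | NonPrivatePointer`), followed by the scope `<id>`.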

### Example

A simple use case for this feature is the following: (1) **thread1** and **thread2** both read from and write to the same `RWStructuredBuffer`. (2) **thread1** writes to the buffer with a non-coherent `OpStore`. (3) If **thread2** then reads the same location with an `OpLoad`, it may not see the change **thread1** made, for two reasons:

1) **thread1** does not promise that its writes are visible to other threads. Cached writes may not be flushed to device memory immediately.
2) **thread2** may load from a cache rather than from device memory. Even after the new value reaches device memory, **thread2** will not see it, because its intermediate cache still holds the old value.

If we add `MakePointerAvailable` to the `OpStore` and `MakePointerVisible` to the `OpLoad`, both with the memory scope `QueueFamily`, we solve this problem, because changes are flushed to a memory scope shared by the two threads. This may also be faster than flushing to `Device` scope, since `QueueFamily` is a narrower scope than `Device`.
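
The reasoning in this example (choose a scope that both threads share, and prefer the narrowest one) can be sketched as below. This is a hypothetical C++ model: `InvocationPlacement`, `ScopeRank`, and `narrowestSharedScope` are illustrative names, and `ScopeRank` encodes only the inclusion order Invocation, Subgroup, Workgroup, QueueFamily, Device, CrossDevice (narrowest to widest), not the numeric SPIR-V `Scope` values. `ShaderCall` is left out because it does not fit this simple chain.

```cpp
#include <cstdint>

// Memory scopes in inclusion order, narrowest first. These ranks are NOT the
// numeric SPIR-V Scope enumerants; they only encode which scope contains which.
enum class ScopeRank : int {
    Invocation = 0,
    Subgroup,
    Workgroup,
    QueueFamily,
    Device,
    CrossDevice,
};

// Hypothetical description of where an invocation executes.
// Each ID only needs to be unique within its parent level.
struct InvocationPlacement {
    uint32_t device;
    uint32_t queueFamily;
    uint32_t workgroup;
    uint32_t subgroup;
    uint32_t invocation;
};

// Returns the narrowest scope containing both invocations, comparing IDs from
// the outermost level inward: the first level that differs decides the answer.
constexpr ScopeRank narrowestSharedScope(const InvocationPlacement& a,
                                         const InvocationPlacement& b) {
    if (a.device != b.device) return ScopeRank::CrossDevice;
    if (a.queueFamily != b.queueFamily) return ScopeRank::Device;
    if (a.workgroup != b.workgroup) return ScopeRank::QueueFamily;
    if (a.subgroup != b.subgroup) return ScopeRank::Workgroup;
    if (a.invocation != b.invocation) return ScopeRank::Subgroup;
    return ScopeRank::Invocation;
}
```

Two threads in different workgroups of the same dispatch on one device get `ScopeRank::QueueFamily`, the scope chosen in the example.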

## Proposed Solution

### Frontend For Coherent Pointers & Pointer Access

We propose implementing coherence at a per-operation level, for SPIR-V targets only. This will be accomplished by adding a new generic parameter, `CoherentScope coherentScope`, to `Ptr`.

We also propose a new generic parameter, `Access access`, that specifies whether a pointer is read-only.

```c#
public enum CoherentScope
{
    NotCoherent = 0xFF,
    CrossDevice = MemoryScope::CrossDevice,
    Device = MemoryScope::Device,
    Workgroup = MemoryScope::Workgroup,
    Subgroup = MemoryScope::Subgroup,
    Invocation = MemoryScope::Invocation,
    QueueFamily = MemoryScope::QueueFamily,
    ShaderCall = MemoryScope::ShaderCallKHR,
    //...
}

public enum Access
{
    ReadWrite = 0,
    Read = 1
}

__generic<T, uint64_t addrSpace = AddressSpace::UserPointer, Access access = Access::ReadWrite, CoherentScope coherentScope = CoherentScope::NotCoherent>
struct Ptr
{
    ...
}
```

If `coherentScope` is not `CoherentScope::NotCoherent`, every memory access through the pointer is coherent to the specified memory scope (for example, `CoherentScope::Device` is coherent to the `Device` memory scope).
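
Because every `CoherentScope` value except `NotCoherent` reuses a `MemoryScope` value, lowering an access only has to check for the sentinel. Below is a minimal C++ model, assuming Slang's `MemoryScope` values match the SPIR-V `Scope` enumerants listed in the SPIR-V specification; `SpvScope` and `coherenceScopeOperand` are hypothetical names.

```cpp
#include <cstdint>
#include <optional>

// SPIR-V Scope enumerant values, as listed in the SPIR-V specification.
enum class SpvScope : uint32_t {
    CrossDevice = 0,
    Device = 1,
    Workgroup = 2,
    Subgroup = 3,
    Invocation = 4,
    QueueFamily = 5,
    ShaderCallKHR = 6,
};

// C++ mirror of the proposed CoherentScope enum. Every value except the
// NotCoherent sentinel reuses a scope value (assumed to match SPIR-V's).
enum class CoherentScope : uint32_t {
    NotCoherent = 0xFF,
    CrossDevice = static_cast<uint32_t>(SpvScope::CrossDevice),
    Device = static_cast<uint32_t>(SpvScope::Device),
    Workgroup = static_cast<uint32_t>(SpvScope::Workgroup),
    Subgroup = static_cast<uint32_t>(SpvScope::Subgroup),
    Invocation = static_cast<uint32_t>(SpvScope::Invocation),
    QueueFamily = static_cast<uint32_t>(SpvScope::QueueFamily),
    ShaderCall = static_cast<uint32_t>(SpvScope::ShaderCallKHR),
};

// Returns the scope to attach to MakePointerAvailable/MakePointerVisible for an
// access through a pointer with this CoherentScope, or std::nullopt when the
// access should be emitted as an ordinary, non-coherent load/store.
constexpr std::optional<SpvScope> coherenceScopeOperand(CoherentScope scope) {
    if (scope == CoherentScope::NotCoherent) {
        return std::nullopt;
    }
    return static_cast<SpvScope>(static_cast<uint32_t>(scope));
}
```

Under these assumptions, a `NotCoherent` pointer produces a plain `OpLoad`/`OpStore`, while any other scope supplies the scope operand for the coherent operands.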

If `access` is `Access::ReadWrite`, the pointer can both read and write the data it points to.
If `access` is `Access::Read`, the pointer can only read the data it points to.
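
The `Access` rule amounts to a compile-time restriction on stores. The C++17 sketch below is an analogy rather than the Slang implementation: `PtrModel` and `CanStore` are hypothetical names, and SFINAE removes `store` from overload resolution for `Access::Read` pointers (when its template argument is left at the default), while `load` stays available for both kinds.

```cpp
#include <type_traits>
#include <utility>

enum class Access { ReadWrite = 0, Read = 1 };

// Hypothetical C++ stand-in for Ptr<T, ..., access, ...>.
template <typename T, Access A = Access::ReadWrite>
struct PtrModel {
    T* raw;

    // Loads are allowed for every Access value.
    T load() const { return *raw; }

    // SFINAE: with the default template argument, this overload drops out of
    // overload resolution for Access::Read pointers.
    template <Access X = A, std::enable_if_t<X == Access::ReadWrite, int> = 0>
    void store(const T& value) const { *raw = value; }
};

// Trait that is true when P has a callable store(const T&).
template <typename P, typename T, typename = void>
struct CanStore : std::false_type {};

template <typename P, typename T>
struct CanStore<P, T,
                std::void_t<decltype(std::declval<const P&>().store(std::declval<const T&>()))>>
    : std::true_type {};

static_assert(CanStore<PtrModel<int, Access::ReadWrite>, int>::value,
              "read/write pointers can store");
static_assert(!CanStore<PtrModel<int, Access::Read>, int>::value,
              "read-only pointers cannot store");
```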
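
We will also provide a type alias for user convenience. Its scope defaults to `CoherentScope::Device`, so `CoherentPtr<T>` is a device-coherent read/write pointer.

```c#
__generic<T, CoherentScope coherentScope = CoherentScope::Device, Access access = Access::ReadWrite>
typealias CoherentPtr = Ptr<T, AddressSpace::UserPointer, access, coherentScope>;
```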

### Support For Coherent Buffers and Textures

Any access to a buffer or texture through a coherent pointer is coherent.

```c#
RWStructuredBuffer<int> val; // Textures work as well.
CoherentPtr<int, CoherentScope::Device> p = &val[0];
*p = 10;              // coherent store
p = p + 10;
int b = *p;           // coherent load
int c = val[10] + *p; // coherent and non-coherent accesses may be mixed
```

### Support For Coherent Workgroup Memory

Any access to a `groupshared` object through a coherent pointer is coherent. Since Slang does not currently support pointers to `groupshared` memory, this proposal will extend the existing `AddressSpace::GroupShared` implementation for pointers as needed.

### Support For Coherent Cooperative Matrix & Cooperative Vector

`CoopVec` and `CoopMat` move data between their own storage and other objects through `CoopVec::Load`, `CoopVec::Store`, `CoopMat::Load`, and `CoopMat::Store`. Because of this design, we will add coherent operations to `CoopVec` and `CoopMat` by modifying these four methods to perform coherent operations when given a `CoherentPtr` as a parameter. The syntax for calling these methods will not change.

### Support Casting Pointers With Different `CoherentScope`

We will allow pointers with different `CoherentScope` values to be explicitly cast to each other. For example, `CoherentPtr<int, CoherentScope::Device>` can be explicitly cast to `CoherentPtr<int, CoherentScope::Workgroup>`.

### Casting and Pointer Access

We will not allow casting between pointers with different `Access` values.
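
The two casting rules can be modeled in C++ with a converting constructor that is `explicit` (scope changes must be spelled out) and only accepts a source with the same `Access` (access changes are rejected). This is a hypothetical sketch with simplified copies of the proposal's enums; `PtrModel` and the aliases are illustrative names, not Slang code.

```cpp
#include <type_traits>

enum class Access { ReadWrite, Read };
enum class CoherentScope { NotCoherent, CrossDevice, Device, Workgroup, Subgroup, Invocation, QueueFamily };

// Hypothetical model of Ptr<T, ..., access, coherentScope>.
template <typename T, Access A, CoherentScope S>
struct PtrModel {
    T* raw = nullptr;

    PtrModel() = default;
    explicit PtrModel(T* p) : raw(p) {}

    // Explicit cast from a pointer with the same T and Access but any scope.
    // A source with a different Access does not match this signature.
    template <CoherentScope S2>
    explicit PtrModel(const PtrModel<T, A, S2>& other) : raw(other.raw) {}
};

using DevicePtr = PtrModel<int, Access::ReadWrite, CoherentScope::Device>;
using WorkgroupPtr = PtrModel<int, Access::ReadWrite, CoherentScope::Workgroup>;
using ReadOnlyWorkgroupPtr = PtrModel<int, Access::Read, CoherentScope::Workgroup>;

// Scope changes need an explicit cast...
static_assert(std::is_constructible<WorkgroupPtr, DevicePtr>::value, "explicit scope cast allowed");
static_assert(!std::is_convertible<DevicePtr, WorkgroupPtr>::value, "no implicit scope cast");
// ...and Access changes are rejected entirely.
static_assert(!std::is_constructible<ReadOnlyWorkgroupPtr, DevicePtr>::value, "no Access cast");
```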

### Banned Keywords

HLSL-style `globallycoherent T*` and GLSL-style `coherent T*` will be disallowed.

### Order of Implementation

* Frontend for coherent pointers & pointer access
* Logic for pointer access
* Support for coherent buffers and textures
* Support for casting between pointers with different `CoherentScope`
* Support for workgroup memory pointers
* Support for coherent workgroup memory
* Support for coherent cooperative matrix & cooperative vector
* Disallow `globallycoherent T*` and `coherent T*`

## Future Work

### Supporting Aligned Loads

Users may want coherent loads that also use a specific alignment. This will be supported through the `[Align(ALIGNMENT)]` decoration.

```c#
[Align(ALIGNMENT)]
struct MyType {...}

MyType* p = ...;
...
let a = *p;       // should be an aligned load
let b = p.member; // should be an aligned load, with alignment derived from both `MyType` and `member`'s type
```

When loading data through a pointer `p`, Slang will honor the alignment and emit an `OpLoad` with the SPIR-V `Aligned` memory operand, passing `ALIGNMENT` as its argument. This will work alongside coherent pointers.
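
To show how alignment and coherence combine on one instruction, the following C++ sketch builds the memory-operand words of an `OpLoad`. It relies on the SPIR-V rule that the extra operands of a memory-operand mask appear in increasing bit order, so the `Aligned` literal precedes the `MakePointerVisible` scope `<id>`. `loadMemoryOperands` is a hypothetical helper, and the scope is assumed to already exist as a constant `<id>`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Memory-operand bits from the SPIR-V specification.
namespace spv {
constexpr uint32_t MemoryOperandsAligned = 0x2;
constexpr uint32_t MemoryOperandsMakePointerVisible = 0x10;
constexpr uint32_t MemoryOperandsNonPrivatePointer = 0x20;
}  // namespace spv

// Hypothetical helper: builds the words that follow the Pointer operand of an
// OpLoad. `alignment` would come from [Align(...)]; `scopeId` is the <id> of the
// constant holding the memory scope, present only for coherent pointers.
std::vector<uint32_t> loadMemoryOperands(std::optional<uint32_t> alignment,
                                         std::optional<uint32_t> scopeId) {
    // SPIR-V requires the Aligned literal to be a power of two.
    assert(!alignment || (*alignment != 0 && (*alignment & (*alignment - 1)) == 0));

    uint32_t mask = 0;
    if (alignment) mask |= spv::MemoryOperandsAligned;
    if (scopeId) mask |= spv::MemoryOperandsMakePointerVisible | spv::MemoryOperandsNonPrivatePointer;

    std::vector<uint32_t> words;
    if (mask == 0) return words;  // plain OpLoad: no memory operands at all
    words.push_back(mask);
    // Extra operands in increasing bit order: Aligned (0x2) before
    // MakePointerVisible (0x10). NonPrivatePointer (0x20) has no extra operand.
    if (alignment) words.push_back(*alignment);
    if (scopeId) words.push_back(*scopeId);
    return words;
}
```

For a 16-byte-aligned coherent load whose scope constant has `<id>` 42, the result is `{0x32, 16, 42}`.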

### Additional Pointer Arguments

`Volatile` and `Const` are planned features for `Ptr`.

## Alternative Designs Considered

1. Using special methods (part of the `Ptr` type) to access coherent-operation functionality

```c#
T* ptr1 = bufferPtr1;
T* ptr2 = bufferPtr2;
var loadedData = coherentLoad(ptr1, scope = MemoryScope::Device);
coherentStore(ptr2, loadedData, scope = MemoryScope::Device);
```

2. Tagging types as coherent through a modifier

```c#
// Not allowed:
globallycoherent RWStructuredBuffer<T> bufferPtr1 : register(u0);

cbuffer PtrBuffer
{
    // We only allow coherent on pointers
    globallycoherent int* bufferPtr1;
}
```

3. 'OOP' approach: get a `CoherentPtr` from a regular `Ptr`. Any operation on a `CoherentPtr` uses the coherent variant of a store/load.

```c#
[require(SPV_KHR_vulkan_memory_model)]
void computeMain()
{
    int* ptr = gmemBuffer;
    CoherentPtr<int, MemoryScope::Workgroup> ptr_workgroup = CoherentPtr<int, MemoryScope::Workgroup>(gmemBuffer);
    CoherentPtr<int, MemoryScope::Device> ptr_device = CoherentPtr<int, MemoryScope::Device>(gmemBuffer);

    ptr_workgroup[0] = output[1];
    ptr_workgroup = ptr_workgroup + 1;
    output[2] = ptr_workgroup[0];
    ptr_workgroup = ptr_workgroup - 1;
    output[3] = ptr_workgroup[3];

    ptr[10] = 10;

    ptr_device = ptr_device + 3;
    gmemBuffer[0] = 10;
    ptr_device[3] = output[3];
}
```

4. Modifier with a parameter to specify the memory scope

```c#
cbuffer PtrBuffer
{
    int* bufferPtr1;
}
void main()
{
    coherent<WorkgroupMemory> int* bufferPtrWorkgroup = bufferPtr1;
    coherent<Device> int* bufferPtrDevice = bufferPtr1;
}
```