
Add VK_EXT_rasterization_order_attachment_access sample #1492

Merged: marty-johnson59 merged 1 commit into KhronosGroup:main from ellioman:rasterization-order-attachment-access on Mar 9, 2026

@ellioman (Contributor) commented Feb 26, 2026

Description

New extension sample for VK_EXT_rasterization_order_attachment_access, used together with VK_KHR_dynamic_rendering and VK_KHR_dynamic_rendering_local_read.

There is already a sample for VK_KHR_dynamic_rendering_local_read, but it doesn't cover rasterization-order guarantees or intra-draw fragment ordering. The interaction between local reads and ROAA is a common source of confusion: it's not always clear when local reads alone are sufficient and when rasterization-order guarantees are needed. This sample addresses that gap.

The scene is similar to the oit_linked_lists sample but uses a different transparency technique. Instead of per-fragment linked lists, the fragment shader reads the current framebuffer value with subpassLoad() and blends manually. ROAA guarantees that overlapping fragments at the same pixel are processed in primitive order, so the blending is deterministic without CPU sorting.
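The manual blend can be modelled on the CPU to show why primitive order matters. This is a minimal C++ sketch, not the sample's GLSL: `Rgba`, `blend_over` and `resolve_pixel` are hypothetical names, and the standard non-premultiplied "over" operator is an assumption, since the exact blend equation is not stated here. Each loop iteration stands in for one fragment that reads the attachment (as `subpassLoad()` would) and writes the blended result back; iterating in vector order models the primitive-order guarantee.

```cpp
#include <cassert>
#include <vector>

// RGBA colour with non-premultiplied alpha, as a shader might output it.
struct Rgba {
    float r, g, b, a;
};

// "Over" blend: src composited over dst. In the sample this step happens in
// the fragment shader after reading dst from the attachment.
Rgba blend_over(const Rgba& src, const Rgba& dst) {
    assert(src.a >= 0.0f && src.a <= 1.0f);
    const float inv = 1.0f - src.a;
    return {src.r * src.a + dst.r * inv,
            src.g * src.a + dst.g * inv,
            src.b * src.a + dst.b * inv,
            src.a + dst.a * inv};
}

// Resolve one pixel: every fragment reads the value left by the previous
// fragment and writes the blended result (a read-modify-write). Processing
// the vector front to back models "fragments handled in primitive order".
Rgba resolve_pixel(const Rgba& clear, const std::vector<Rgba>& fragments_in_primitive_order) {
    Rgba dst = clear;
    for (const Rgba& src : fragments_in_primitive_order) {
        dst = blend_over(src, dst);
    }
    return dst;
}
```

Blending a 50% red and a 50% blue fragment over an opaque black clear gives (0.25, 0, 0.5, 1) in one order and (0.5, 0, 0.25, 1) in the other, so the result is only reproducible when the processing order is fixed.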

A runtime toggle compares two approaches:

  • ROAA ON: Single instanced draw call. The extension guarantees per-fragment ordering.
  • ROAA OFF: One draw call per sphere, with a pipeline barrier between consecutive draws.
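For the 16x16x16 scene, the command counts of the two paths can be worked out directly. The sketch below assumes one barrier between each pair of consecutive draws (N - 1 barriers for N draws); the sample's own counter may instead record one barrier per draw, and `FrameWork` and `frame_work` are illustrative names rather than the sample's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the per-frame command work for the blend pass.
struct FrameWork {
    std::uint32_t draw_calls;
    std::uint32_t barriers;
};

// ROAA ON: one instanced draw covers every sphere; the extension orders
// overlapping fragments, so no barriers are needed inside the pass.
// ROAA OFF: one draw per sphere, with a barrier between consecutive draws so
// each draw's attachment reads see the previous draw's writes.
FrameWork frame_work(std::uint32_t grid_dim, bool roaa_enabled) {
    const std::uint32_t spheres = grid_dim * grid_dim * grid_dim;
    assert(spheres > 0);
    if (roaa_enabled) {
        return {1, 0};
    }
    return {spheres, spheres - 1};
}
```

Under that assumption, `frame_work(16, false)` yields 4,096 draws and 4,095 barriers, versus one draw and no barriers for `frame_work(16, true)`.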

GPU timing and draw call/barrier counts are displayed in the UI.

With 16x16x16 spheres, ROAA replaces 4,096 per-sphere draw calls, and the barriers between them, with a single instanced draw, which gives a measurable performance win. On a Pixel 6 (Mali-G78): ~37-40 ms/frame with ROAA enabled vs ~45-47 ms/frame disabled. On a Pixel 8 (Mali-G715): ~23 ms/frame vs ~25 ms/frame. The benefit is larger on older GPUs, which is where the extension matters most in practice.
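The reported frame times can be turned into relative savings as follows. This is only arithmetic on the figures above; taking the midpoint of each reported range is an assumption, and the helper names are illustrative.

```cpp
#include <cassert>

// Fraction of frame time saved by switching from ROAA OFF to ROAA ON:
// (t_off - t_on) / t_off.
double relative_saving(double off_ms, double on_ms) {
    assert(off_ms > 0.0 && on_ms > 0.0);
    return (off_ms - on_ms) / off_ms;
}

// Midpoint of a reported range such as "~37-40 ms".
double midpoint(double lo_ms, double hi_ms) {
    return 0.5 * (lo_ms + hi_ms);
}
```

This gives roughly 16% for the Pixel 6, (46 - 38.5) / 46, and 8% for the Pixel 8, (25 - 23) / 25, consistent with the larger benefit on the older GPU.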

Tested on: Vivo X200 (Arm Immortalis-G925 MC12), Google Pixel 8 (Arm Mali-G715), Google Pixel 6 (Arm Mali-G78)

[Screenshot: ROAA_On]

Files added

  • samples/extensions/rasterization_order_attachment_access/ - sample source (.h, .cpp, README.adoc)
  • shaders/rasterization_order_attachment_access/glsl/ - vertex and fragment shaders (background, blend, fullscreen)

Extensions used

  • VK_KHR_dynamic_rendering
  • VK_KHR_dynamic_rendering_local_read
  • VK_EXT_rasterization_order_attachment_access
  • VK_KHR_synchronization2

General Checklist:

Please ensure the following points are checked:

  • My code follows the coding style
  • I have reviewed file licenses
  • I have commented any added functions (in line with Doxygen)
  • I have commented any code that could be hard to understand
  • My changes do not add any new compiler warnings
  • My changes do not add any new validation layer errors or warnings
  • I have used existing framework/helper functions where possible
  • My changes do not add any regressions
  • I have tested every sample to ensure everything runs correctly
  • This PR describes the scope and expected impact of the changes I am making

Note: The Samples CI runs a number of checks including:

  • I have updated the header Copyright to reflect the current year (CI build will fail if Copyright is out of date)
  • My changes build on Windows, Linux, macOS and Android. Otherwise I have documented any exceptions

If this PR contains framework changes:

  • I did a full batch run using the batch command line argument to make sure all samples still work properly

Sample Checklist

If your PR contains a new or modified sample, these further checks must be carried out in addition to the General Checklist:

  • I have tested the sample on at least one compliant Vulkan implementation
  • If the sample is vendor-specific, I have tagged it appropriately
  • I have stated on what implementation the sample has been tested so that others can test on different implementations and platforms
  • Any dependent assets have been merged and published in downstream modules
  • For new samples, I have added a paragraph with a summary to the appropriate chapter in the readme of the folder that the sample belongs to e.g. api samples readme
  • For new samples, I have added a tutorial README.md file to guide users through what they need to know to implement code using this feature. For example, see conditional_rendering
  • For new samples, I have added a link to the Antora navigation so that the sample will be listed at the Vulkan documentation site

Jira Task

https://jira.arm.com/browse/STEGFX-370

@JoseEmilio-ARM requested a review from a team on February 27, 2026 08:45
@gary-sweet (Contributor)

Correctly reports as not supported for Broadcom, but that's about all I can say.

@asuessenbach (Contributor)

> Correctly reports as not supported for Broadcom

Same holds for me on Win11 with an NVIDIA GPU.

As you already have a non-ROAA path included, maybe you could make VK_EXT_RASTERIZATION_ORDER_ATTACHMENT_ACCESS_EXTENSION_NAME and VkPhysicalDeviceRasterizationOrderAttachmentAccessFeaturesEXT::rasterizationOrderColorAttachmentAccess optional?
That would at least show what the sample is supposed to look like without that extension and feature.

ellioman force-pushed the rasterization-order-attachment-access branch from 4a3fe37 to b46bfdd on March 3, 2026 17:54
@ellioman (Contributor, Author) commented Mar 4, 2026

@asuessenbach @gary-sweet

Thanks for taking a look.

I had considered making the sample work on unsupported platforms, but wanted to get feedback before doing that.
Based on your comments, I've now done it: the sample should run on every platform, and on platforms without ROAA support the UI clearly shows that it's not supported.

[Screenshot]
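One way to treat ROAA as optional is to gate the single-draw path on both the extension and the `rasterizationOrderColorAttachmentAccess` feature, falling back to the per-draw barrier path otherwise. The sketch below is a plain C++ model of that decision, not the sample's code: `DeviceCaps`, `BlendPath`, `has_extension` and `select_blend_path` are hypothetical, and in a real application the inputs would come from `vkEnumerateDeviceExtensionProperties` and `vkGetPhysicalDeviceFeatures2`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Value of VK_EXT_RASTERIZATION_ORDER_ATTACHMENT_ACCESS_EXTENSION_NAME.
constexpr const char* kRoaaExtensionName = "VK_EXT_rasterization_order_attachment_access";

// Hypothetical snapshot of device capabilities. In real code the extension
// list comes from vkEnumerateDeviceExtensionProperties and the flag from
// VkPhysicalDeviceRasterizationOrderAttachmentAccessFeaturesEXT, chained
// into vkGetPhysicalDeviceFeatures2.
struct DeviceCaps {
    std::vector<std::string> extensions;
    bool rasterization_order_color_attachment_access;
};

enum class BlendPath {
    RoaaSingleDraw,   // one instanced draw, fragment ordering from ROAA
    PerDrawBarriers   // fallback: one draw per object, barrier between draws
};

bool has_extension(const DeviceCaps& caps, const std::string& name) {
    return std::find(caps.extensions.begin(), caps.extensions.end(), name) != caps.extensions.end();
}

// ROAA is optional: require both the extension and the feature bit,
// otherwise use the barrier-based path (and report "not supported" in the UI).
BlendPath select_blend_path(const DeviceCaps& caps) {
    if (has_extension(caps, kRoaaExtensionName) && caps.rasterization_order_color_attachment_access) {
        return BlendPath::RoaaSingleDraw;
    }
    return BlendPath::PerDrawBarriers;
}
```

When the ROAA path is selected, the feature also has to be enabled at device creation, and the blend pipeline is created with `VK_PIPELINE_COLOR_BLEND_STATE_CREATE_RASTERIZATION_ORDER_ATTACHMENT_ACCESS_BIT_EXT` set in `VkPipelineColorBlendStateCreateInfo::flags`.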

@asuessenbach (Contributor)

> ROAA guarantees that overlapping fragments at the same pixel are processed in primitive order, so the blending is deterministic without CPU sorting.

That is, the rendering is deterministic (in primitive order), but generally not correct, due to a potentially wrong blending order, right?
If so, what is the use case of ROAA? A deterministic but wrong image doesn't sound very valuable.

@ellioman (Contributor, Author) commented Mar 4, 2026

> ROAA guarantees that overlapping fragments at the same pixel are processed in primitive order, so the blending is deterministic without CPU sorting.
>
> That is, the rendering is deterministic (in primitive order), but generally not correct, due to potentially wrong order of blending, right? If so, what is the use case of ROAA? A deterministic but wrong image doesn't sound very valuable.

Good question.

The sample focuses on demonstrating the API setup and runtime behavior of ROAA with dynamic rendering local read, rather than implementing a full compositing or transparency pipeline. The goal was to keep the example minimal and focused on how ROAA and attachment local read work together.

ROAA guarantees deterministic primitive-order access, not automatic depth sorting. Whether blending is correct depends on the order in which primitives are submitted, which is fully under application control.

If primitives are submitted in the intended compositing order, for example back-to-front for traditional alpha blending, ROAA guarantees that order is respected at the pixel level. In that case, the result is both deterministic and correct.

If primitives are not sorted, then the result is still well-defined and deterministic, but order-dependent, exactly like fixed-function blending would be. The key difference is that without ROAA, overlapping fragments processed in a single draw with framebuffer fetch have undefined ordering and may produce frame-to-frame or vendor-dependent variation.
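If correct alpha compositing is wanted on top of ROAA, the application only has to submit primitives back-to-front. Below is a minimal sketch of such a per-object sort, assuming the sorted order is then written into the instance buffer so that instance index follows it (within a single draw, Vulkan's primitive order follows increasing instance index). `Vec3`, `Sphere` and `sort_back_to_front` are hypothetical names, and sorting by centre distance is an approximation that can mis-order intersecting or very differently sized objects.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical per-instance record.
struct Sphere {
    Vec3 center;
    unsigned id;
};

float distance_sq(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Farthest-from-camera first. With ROAA, submitting instances in this order
// makes the primitive-order guarantee produce correct "over" compositing,
// not merely a deterministic result. stable_sort keeps equal-distance
// spheres in their original relative order, so the output is reproducible.
void sort_back_to_front(std::vector<Sphere>& spheres, const Vec3& camera) {
    std::stable_sort(spheres.begin(), spheres.end(),
                     [&camera](const Sphere& a, const Sphere& b) {
                         return distance_sq(a.center, camera) > distance_sq(b.center, camera);
                     });
}
```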

The primary use cases for ROAA are:

  • Enabling programmable blending via framebuffer fetch with well-defined ordering semantics.
  • Avoiding multi-draw plus barrier sequences that would otherwise be required for correct attachment read and write ordering.
  • Preserving tile-local rendering efficiency on tile-based GPUs by keeping the operation in a single draw without forcing tile flushes.
  • Ensuring temporal stability and reproducibility in order-dependent effects.

So the value is not "deterministic but wrong," but rather "deterministic and well-defined," while allowing applications to control ordering explicitly when needed.

If it would make the intent clearer, I can also extend the sample with a simple back-to-front CPU sort to demonstrate traditional alpha-correct compositing on top of ROAA.

@asuessenbach (Contributor)

> So the value is not "deterministic but wrong," but rather "deterministic and well-defined," while allowing applications to control ordering explicitly when needed.

That sounds reasonable. Maybe you should add a few (more) words on ordering for transparency and overdrawing to the readme of this sample.

> If it would make the intent clearer, I can also extend the sample with a simple back-to-front CPU sort to demonstrate traditional alpha-correct compositing on top of ROAA.

You're right, extending the sample with CPU sorting might actually obscure the ROAA usage. So I'd vote for keeping it as small as it is, just with some more text in the readme.

ellioman force-pushed the rasterization-order-attachment-access branch from b46bfdd to 4fa1477 on March 4, 2026 20:52
@ellioman (Contributor, Author) commented Mar 4, 2026

> That sounds reasonable. Maybe you should add a few (more) words on ordering for transparency and overdrawing to the readme of this sample.

Done!
I've added an "Ordering and Transparency" subsection to the README and updated the Scene Setup section to explain that the spheres are intentionally unsorted to keep the sample focused on the API setup and performance comparison.

@ellioman (Contributor, Author) commented Mar 5, 2026

@gary-sweet, when you have a moment, could you take another look at this?
The sample should now run on devices without ROAA support, with a UI message notifying the user.

@gary-sweet (Contributor)

Thanks. This does now run and says that ROAA is not supported.

@SaschaWillems (Collaborator) left a comment


Great sample. Successfully tested it on a Google Pixel 10.

marty-johnson59 merged commit f4f5ce5 into KhronosGroup:main on Mar 9, 2026 (19 checks passed)

6 participants