
Speed up HLSL preprocessing and prepared SPIR-V hot paths#1029

Open
AnastaZIuk wants to merge 37 commits into master from unroll

Conversation


@AnastaZIuk AnastaZIuk commented Mar 24, 2026

Summary

  • advance the DXC pointer to the current unroll-devshFixes line and promote the matching NSC channel
  • reduce Wave preprocess overhead in the hot EX31 HLSL path
  • reduce redundant include lookup and hashing work in the shader-compiler include path
  • add a prepared single-entrypoint fast path to ISPIRVEntryPointTrimmer
  • validate SPIR-V blobs once per unique content hash instead of revalidating the same blob on every hot pipeline-create path
  • thread one IGPUPipelineCache through compute, resolve, ImGui, and fullscreen present in the paired EX31 flow
  • update the Examples pointer to the paired Devsh-Graphics-Programming/Nabla-Examples-and-Tests#262

Root cause

Three costs were stacking on top of each other.

First, the preprocess cost comes from avoidable HLSL include debt in the hot path:

  • path_tracing/concepts.hlsl on the base branch pulls bxdf/common.hlsl only to synthesize a placeholder interaction for Ray::setInteraction; that edge comes from 4d186db76f
  • member_test_macros.hlsl on the base branch uses the umbrella boost/preprocessor.hpp even though this header only needs a narrow subset; that comes from 72972a9d6e
  • the custom Wave include bridge on this path was introduced in 12afd3d42d, which added the custom Boost.Wave context and include-path classes for the HLSL preprocessor; dxc_compile_flags pragma bookkeeping was layered on later in ae4386064cf; subsequent merges, cleanup, depfile plumbing, and backports carried the same path forward, but they are not the semantic origin of the extra per-include work

Second, the base include-loader path pays redundant work before preprocessing reaches DXC. The current disk-backed include body load path in IShaderCompiler.cpp comes from 5ac3b55552 and later loader reshapes like cc37325f28c. Per-lookup content hashing on that path was added in cf9a866623. The current local-first probe order for globally rooted names like nbl/... comes from 1f73d6a707.
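The redundant-work pattern described above can be sketched as a memoized include lookup: the body is read from disk and content-hashed once per unique lookup name, and every later request for the same include hits the cache. This is a minimal illustration with made-up names (`IncludeCache`, `CachedInclude`), not Nabla's actual `IShaderCompiler` API; the real hash is also not `std::hash`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: cache include bodies and their content hashes so a
// header pulled in by many files is read and hashed only once.
struct CachedInclude
{
    std::string contents;
    uint64_t contentHash; // computed once, on first load
};

class IncludeCache
{
public:
    using Loader = std::function<std::optional<std::string>(const std::string&)>;

    explicit IncludeCache(Loader loader) : m_loader(std::move(loader)) {}

    const CachedInclude* get(const std::string& lookupName)
    {
        if (auto found = m_cache.find(lookupName); found != m_cache.end())
            return &found->second; // hit: no disk read, no rehash
        auto body = m_loader(lookupName);
        if (!body)
            return nullptr; // unresolved includes are not cached here
        const uint64_t hash = std::hash<std::string>{}(*body);
        auto [it, inserted] =
            m_cache.emplace(lookupName, CachedInclude{std::move(*body), hash});
        return &it->second; // unordered_map values are address-stable
    }

private:
    Loader m_loader;
    std::unordered_map<std::string, CachedInclude> m_cache;
};
```

A caller that resolves the same include twice pays the loader and the hash only once, which is the shape of the saving this PR targets on the preprocess path.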

Third, the pre-fast-path trimmer always validated and walked the incoming module before it could know whether the requested entrypoint set already matched the prepared shader. The old flow is visible in ISPIRVEntryPointTrimmer.cpp#L104-L246. That shape comes from cfb4bd1da6 and 9f3f823124.
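The fast-path idea is a cheap set comparison up front: if the incoming module is already a prepared shader whose entrypoint set equals the requested set, there is nothing to strip and the validate-and-walk can be skipped. The sketch below uses invented types (`Module`, `canSkipTrim`) rather than the real ISPIRVEntryPointTrimmer interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for a SPIR-V module whose entrypoints are already
// known from a previous trim ("prepared").
struct Module
{
    std::vector<std::string> entryPoints;
    bool prepared = false; // set when a previous trim produced this module
};

// Returns true when trimming would be a no-op: the module was prepared and
// its entrypoint set is exactly the requested set (order-insensitive).
bool canSkipTrim(const Module& m, std::vector<std::string> requested)
{
    if (!m.prepared)
        return false; // unknown provenance: fall through to the full walk
    std::sort(requested.begin(), requested.end());
    auto have = m.entryPoints;
    std::sort(have.begin(), have.end());
    return have == requested; // identical set: nothing to strip
}
```

In the single-entrypoint case this reduces the hot pipeline-create path to one comparison instead of a full module validation and traversal.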

The fullscreen-present helper was introduced in 2b08a15064. In that shape CFullScreenTriangle.cpp#L120 did not yet thread an external pipeline cache, so compute and present could not populate the same cache blob.
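The cache-threading problem reduces to an ownership question: if each helper creates its own pipeline cache internally, compute and present pipelines land in separate cache blobs and cannot reuse each other's compiled state. The sketch below uses placeholder types (`PipelineCache` and the two create functions are invented, not Nabla's or Vulkan's API) to show the caller-provided-cache shape.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Placeholder for a driver pipeline cache blob; real code would hold an
// IGPUPipelineCache / VkPipelineCache handle instead of strings.
struct PipelineCache
{
    std::vector<std::string> entries;
};

// Both creation helpers accept an external cache instead of making their own,
// so every pipeline created in the frame populates the same blob.
void createComputePipeline(PipelineCache* externalCache)
{
    if (externalCache)
        externalCache->entries.push_back("compute");
}

void createPresentPipeline(PipelineCache* externalCache)
{
    if (externalCache)
        externalCache->entries.push_back("present");
}
```

With one shared cache object passed to both helpers, a warm run can serialize and reload a single blob covering the whole compute-plus-present flow.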

What this changes

  • cache and reuse include resolution results in the Wave bridge
  • avoid redundant reload and rehash work in the include loader path
  • route globally rooted includes like nbl/... through the global search path first instead of probing the local source directory first
  • trim token bookkeeping in CWaveStringResolver
  • replace the umbrella Boost include in member_test_macros.hlsl with the narrow Boost headers it actually uses
  • remove redundant public HLSL includes from hot headers and stop pulling bxdf/common.hlsl into path_tracing/concepts.hlsl
  • short-circuit ISPIRVEntryPointTrimmer when the incoming module is already a prepared single-entrypoint shader
  • cache successful validation per unique SPIR-V blob so hot paths keep validation without paying for it again
  • thread an external pipeline cache through FullScreenTriangle so EX31 can share one cache object across compute and present
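The validation-caching bullet above can be sketched as a set of content hashes that have already passed validation: the first create with a given SPIR-V blob pays for validation, later creates with byte-identical content skip it, and failures are deliberately not cached. The class name, the FNV-1a stand-in hash, and the validator callback are all illustrative, not the actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of validate-once-per-unique-content-hash.
class ValidationCache
{
public:
    // `validate` stands in for the real SPIR-V validator invocation.
    template <typename Validator>
    bool validateOnce(const std::vector<uint32_t>& blob, Validator&& validate)
    {
        const uint64_t key = hashBlob(blob);
        if (m_validated.count(key))
            return true; // identical content already passed: skip revalidation
        if (!validate(blob))
            return false; // failures are not cached; retried every time
        m_validated.insert(key);
        return true;
    }

private:
    static uint64_t hashBlob(const std::vector<uint32_t>& blob)
    {
        // FNV-1a over the words; any stable content hash would do here
        uint64_t h = 1469598103934665603ull;
        for (uint32_t w : blob)
        {
            h ^= w;
            h *= 1099511628211ull;
        }
        return h;
    }

    std::unordered_set<uint64_t> m_validated;
};
```

The key design point is that hot paths keep full validation semantics: nothing is ever accepted that did not validate at least once with exactly those bytes.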

Validation

Validation was run on AMD Ryzen 5 5600G with Radeon Graphics (6C/12T).

Local sequential nsc -P measurements on large EX31 path-tracer inputs moved preprocess time on representative heavy rules from roughly 8 s down to roughly 2.5 s, which is about a 3.20x speedup and a 68.75% reduction.

On the paired EX31 branch, the Debug warm-cache path moved first_render_submit_ms from 13952 to 2698, which is a 5.17x speedup and an 80.66% reduction.

Prepared-shader and pipeline-cache validation on the paired EX31 branch is recorded in Devsh-Graphics-Programming/Nabla-Examples-and-Tests#262.

@AnastaZIuk AnastaZIuk changed the title Support EX31 precompiled path tracer fast paths on unroll Reduce HLSL preprocess overhead and speed up prepared SPIR-V hot paths Mar 24, 2026
@AnastaZIuk AnastaZIuk changed the title Reduce HLSL preprocess overhead and speed up prepared SPIR-V hot paths Speed up HLSL preprocessing and prepared SPIR-V hot paths Mar 24, 2026
Comment on lines -683 to +749
-if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName))
-	retVal = std::move(contents);
-else retVal = std::move(trySearchPaths(lookupName));
+if (asset::detail::isGloballyResolvedIncludeName(lookupName))
+{
+	if (auto contents = tryIncludeGenerators(lookupName))
+		retVal = std::move(contents);
+	else if (auto contents = trySearchPaths(lookupName, needHash))
+		retVal = std::move(contents);
+	else retVal = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash);
+}
+else
+{
+	if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash))
+		retVal = std::move(contents);
+	else if (auto contents = tryIncludeGenerators(lookupName))
+		retVal = std::move(contents);
+	else retVal = std::move(trySearchPaths(lookupName, needHash));
+}


explain the reason for this change


you shouldn't try different include generators, the include generators should only be reachable with #include <> and not #include ""

Also why should the precedence of a search path and default include loaders change depending on the path ?

