
Speed up HLSL preprocessing and prepared SPIR-V hot paths#1029

Open
AnastaZIuk wants to merge 37 commits into master from unroll

Conversation


@AnastaZIuk AnastaZIuk commented Mar 24, 2026

Summary

  • advance the DXC pointer to the current unroll-devshFixes line and promote the matching NSC channel
  • reduce Wave preprocess overhead in the hot EX31 HLSL path
  • reduce redundant include lookup and hashing work in the shader-compiler include path
  • add a prepared single-entrypoint fast path to ISPIRVEntryPointTrimmer
  • validate SPIR-V blobs once per unique content hash instead of revalidating the same blob on every hot pipeline-create path
  • thread one IGPUPipelineCache through compute, resolve, ImGui, and fullscreen present in the paired EX31 flow
  • update the Examples pointer to the paired Devsh-Graphics-Programming/Nabla-Examples-and-Tests#262

Root cause

Three costs were stacking on top of each other.

First, the preprocess cost comes from avoidable HLSL include debt in the hot path:

  • path_tracing/concepts.hlsl on the base branch pulls bxdf/common.hlsl only to synthesize a placeholder interaction for Ray::setInteraction; that edge comes from 4d186db76f
  • member_test_macros.hlsl on the base branch uses the umbrella boost/preprocessor.hpp even though this header only needs a narrow subset; that comes from 72972a9d6e
  • the custom Wave include bridge on this path was introduced in 12afd3d42d, which added the custom Boost.Wave context and include-path classes for the HLSL preprocessor; dxc_compile_flags pragma bookkeeping was layered on later in ae4386064cf; subsequent merges, cleanup, depfile plumbing, and backports carried the same path forward, but they are not the semantic origin of the extra per-include work

Second, the base include-loader path pays redundant work before preprocessing reaches DXC. The current disk-backed include body load path in IShaderCompiler.cpp comes from 5ac3b55552 and later loader reshapes like cc37325f28c. Per-lookup content hashing on that path was added in cf9a866623. The current local-first probe order for globally rooted names like nbl/... comes from 1f73d6a707.
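The redundant-work pattern described above can be sketched as a memoized include lookup: the body is read from disk and content-hashed once per unique lookup name, and every later request for the same include hits the cache. This is a minimal illustration with made-up names (`IncludeCache`, `CachedInclude`), not Nabla's actual `IShaderCompiler` API; the real hash is also not `std::hash`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: cache include bodies and their content hashes so a
// header pulled in by many files is read and hashed only once.
struct CachedInclude
{
    std::string contents;
    uint64_t contentHash; // computed once, on first load
};

class IncludeCache
{
public:
    using Loader = std::function<std::optional<std::string>(const std::string&)>;

    explicit IncludeCache(Loader loader) : m_loader(std::move(loader)) {}

    const CachedInclude* get(const std::string& lookupName)
    {
        if (auto found = m_cache.find(lookupName); found != m_cache.end())
            return &found->second; // hit: no disk read, no rehash
        auto body = m_loader(lookupName);
        if (!body)
            return nullptr; // unresolved includes are not cached here
        const uint64_t hash = std::hash<std::string>{}(*body);
        auto [it, inserted] =
            m_cache.emplace(lookupName, CachedInclude{std::move(*body), hash});
        return &it->second; // unordered_map values are address-stable
    }

private:
    Loader m_loader;
    std::unordered_map<std::string, CachedInclude> m_cache;
};
```

A caller that resolves the same include twice pays the loader and the hash only once, which is the shape of the saving this PR targets on the preprocess path.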

Third, the pre-fast-path trimmer always validated and walked the incoming module before it could know whether the requested entrypoint set already matched the prepared shader. The old flow is visible in ISPIRVEntryPointTrimmer.cpp#L104-L246. That shape comes from cfb4bd1da6 and 9f3f823124.
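The fast-path idea is a cheap set comparison up front: if the incoming module is already a prepared shader whose entrypoint set equals the requested set, there is nothing to strip and the validate-and-walk can be skipped. The sketch below uses invented types (`Module`, `canSkipTrim`) rather than the real ISPIRVEntryPointTrimmer interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for a SPIR-V module whose entrypoints are already
// known from a previous trim ("prepared").
struct Module
{
    std::vector<std::string> entryPoints;
    bool prepared = false; // set when a previous trim produced this module
};

// Returns true when trimming would be a no-op: the module was prepared and
// its entrypoint set is exactly the requested set (order-insensitive).
bool canSkipTrim(const Module& m, std::vector<std::string> requested)
{
    if (!m.prepared)
        return false; // unknown provenance: fall through to the full walk
    std::sort(requested.begin(), requested.end());
    auto have = m.entryPoints;
    std::sort(have.begin(), have.end());
    return have == requested; // identical set: nothing to strip
}
```

In the single-entrypoint case this reduces the hot pipeline-create path to one comparison instead of a full module validation and traversal.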

The fullscreen-present helper was introduced in 2b08a15064. In that shape CFullScreenTriangle.cpp#L120 did not yet thread an external pipeline cache, so compute and present could not populate the same cache blob.
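The cache-threading problem reduces to an ownership question: if each helper creates its own pipeline cache internally, compute and present pipelines land in separate cache blobs and cannot reuse each other's compiled state. The sketch below uses placeholder types (`PipelineCache` and the two create functions are invented, not Nabla's or Vulkan's API) to show the caller-provided-cache shape.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Placeholder for a driver pipeline cache blob; real code would hold an
// IGPUPipelineCache / VkPipelineCache handle instead of strings.
struct PipelineCache
{
    std::vector<std::string> entries;
};

// Both creation helpers accept an external cache instead of making their own,
// so every pipeline created in the frame populates the same blob.
void createComputePipeline(PipelineCache* externalCache)
{
    if (externalCache)
        externalCache->entries.push_back("compute");
}

void createPresentPipeline(PipelineCache* externalCache)
{
    if (externalCache)
        externalCache->entries.push_back("present");
}
```

With one shared cache object passed to both helpers, a warm run can serialize and reload a single blob covering the whole compute-plus-present flow.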

What this changes

  • cache and reuse include resolution results in the Wave bridge
  • avoid redundant reload and rehash work in the include loader path
  • route globally rooted includes like nbl/... through the global search path first instead of probing the local source directory first
  • trim token bookkeeping in CWaveStringResolver
  • replace the umbrella Boost include in member_test_macros.hlsl with the narrow Boost headers it actually uses
  • remove redundant public HLSL includes from hot headers and stop pulling bxdf/common.hlsl into path_tracing/concepts.hlsl
  • short-circuit ISPIRVEntryPointTrimmer when the incoming module is already a prepared single-entrypoint shader
  • cache successful validation per unique SPIR-V blob so hot paths keep validation without paying for it again
  • thread an external pipeline cache through FullScreenTriangle so EX31 can share one cache object across compute and present
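The validation-caching bullet above can be sketched as a set of content hashes that have already passed validation: the first create with a given SPIR-V blob pays for validation, later creates with byte-identical content skip it, and failures are deliberately not cached. The class name, the FNV-1a stand-in hash, and the validator callback are all illustrative, not the actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of validate-once-per-unique-content-hash.
class ValidationCache
{
public:
    // `validate` stands in for the real SPIR-V validator invocation.
    template <typename Validator>
    bool validateOnce(const std::vector<uint32_t>& blob, Validator&& validate)
    {
        const uint64_t key = hashBlob(blob);
        if (m_validated.count(key))
            return true; // identical content already passed: skip revalidation
        if (!validate(blob))
            return false; // failures are not cached; retried every time
        m_validated.insert(key);
        return true;
    }

private:
    static uint64_t hashBlob(const std::vector<uint32_t>& blob)
    {
        // FNV-1a over the words; any stable content hash would do here
        uint64_t h = 1469598103934665603ull;
        for (uint32_t w : blob)
        {
            h ^= w;
            h *= 1099511628211ull;
        }
        return h;
    }

    std::unordered_set<uint64_t> m_validated;
};
```

The key design point is that hot paths keep full validation semantics: nothing is ever accepted that did not validate at least once with exactly those bytes.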

Validation

Validation was run on AMD Ryzen 5 5600G with Radeon Graphics (6C/12T).

Local sequential nsc -P measurements on large EX31 path-tracer inputs moved preprocess time on representative heavy rules from roughly 8 s down to roughly 2.5 s, which is about a 3.20x speedup and a 68.75% reduction.

On the paired EX31 branch, the Debug warm-cache path moved first_render_submit_ms from 13952 to 2698, which is a 5.17x speedup and an 80.66% reduction.

Prepared-shader and pipeline-cache validation on the paired EX31 branch is recorded in Devsh-Graphics-Programming/Nabla-Examples-and-Tests#262.

@AnastaZIuk AnastaZIuk changed the title Support EX31 precompiled path tracer fast paths on unroll Reduce HLSL preprocess overhead and speed up prepared SPIR-V hot paths Mar 24, 2026
@AnastaZIuk AnastaZIuk changed the title Reduce HLSL preprocess overhead and speed up prepared SPIR-V hot paths Speed up HLSL preprocessing and prepared SPIR-V hot paths Mar 24, 2026
Comment on lines -683 to +749
-if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName))
-	retVal = std::move(contents);
-else retVal = std::move(trySearchPaths(lookupName));
+if (asset::detail::isGloballyResolvedIncludeName(lookupName))
+{
+	if (auto contents = tryIncludeGenerators(lookupName))
+		retVal = std::move(contents);
+	else if (auto contents = trySearchPaths(lookupName, needHash))
+		retVal = std::move(contents);
+	else retVal = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash);
+}
+else
+{
+	if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash))
+		retVal = std::move(contents);
+	else if (auto contents = tryIncludeGenerators(lookupName))
+		retVal = std::move(contents);
+	else retVal = std::move(trySearchPaths(lookupName, needHash));
+}


explain the reason for this change


you shouldn't try different include generators, the include generators should only be reachable with #include <> and not #include ""

Also why should the precedence of a search path and default include loaders change depending on the path ?

