Speed up HLSL preprocessing and prepared SPIR-V hot paths#1029
Open
AnastaZIuk wants to merge 37 commits intomasterfrom
Open
Speed up HLSL preprocessing and prepared SPIR-V hot paths#1029AnastaZIuk wants to merge 37 commits intomasterfrom
AnastaZIuk wants to merge 37 commits intomasterfrom
Conversation
Comment on lines
-683
to
+749
| if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName)) | ||
| retVal = std::move(contents); | ||
| else retVal = std::move(trySearchPaths(lookupName)); | ||
| if (asset::detail::isGloballyResolvedIncludeName(lookupName)) | ||
| { | ||
| if (auto contents = tryIncludeGenerators(lookupName)) | ||
| retVal = std::move(contents); | ||
| else if (auto contents = trySearchPaths(lookupName, needHash)) | ||
| retVal = std::move(contents); | ||
| else retVal = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash); | ||
| } | ||
| else | ||
| { | ||
| if (auto contents = m_defaultFileSystemLoader->getInclude(requestingSourceDir.string(), lookupName, needHash)) | ||
| retVal = std::move(contents); | ||
| else if (auto contents = tryIncludeGenerators(lookupName)) | ||
| retVal = std::move(contents); | ||
| else retVal = std::move(trySearchPaths(lookupName, needHash)); | ||
| } |
There was a problem hiding this comment.
explain the reason for this change
There was a problem hiding this comment.
you shouldn't try different include generators, the include generators should only be reachable with #include <> a and not #include ""
Also why should the precedence of a search path and default include loaders change depending on the path ?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
unroll-devshFixesline and promote the matching NSC channelISPIRVEntryPointTrimmerIGPUPipelineCachethrough compute, resolve, ImGui, and fullscreen present in the paired EX31 flowRoot cause
Three costs were stacking on top of each other.
First, the preprocess part comes from avoidable HLSL include debt in the hot path:
path_tracing/concepts.hlslon the base branch pullsbxdf/common.hlslonly to synthesize a placeholder interaction forRay::setInteraction; that edge comes from4d186db76fmember_test_macros.hlslon the base branch uses the umbrellaboost/preprocessor.hppeven though this header only needs a narrow subset; that comes from72972a9d6e12afd3d42d, which added the custom Boost.Wave context and include-path classes for the HLSL preprocessor;dxc_compile_flagspragma bookkeeping was later layered on inae4386064cf; later merges, cleanup, depfile plumbing, and backports carried the same path forward but are not the semantic origin of the extra per-include workSecond, the base include-loader path pays redundant work before preprocessing reaches DXC. The current disk-backed include body load path in
IShaderCompiler.cppcomes from5ac3b55552and later loader reshapes likecc37325f28c. Per-lookup content hashing on that path was added incf9a866623. The current local-first probe order for globally rooted names likenbl/...comes from1f73d6a707.Third, the pre-fast-path trimmer always validated and walked the incoming module before it could know whether the requested entrypoint set already matched the prepared shader. The old flow is visible in
ISPIRVEntryPointTrimmer.cpp#L104-L246. That shape comes fromcfb4bd1da6and9f3f823124.The fullscreen-present helper was introduced in
2b08a15064. In that shapeCFullScreenTriangle.cpp#L120did not yet thread an external pipeline cache, so compute and present could not populate the same cache blob.What this changes
nbl/...through the global search path first instead of probing the local source directory firstCWaveStringResolvermember_test_macros.hlslwith the narrow Boost headers it actually usesbxdf/common.hlslintopath_tracing/concepts.hlslISPIRVEntryPointTrimmerwhen the incoming module is already a prepared single-entrypoint shaderFullScreenTriangleso EX31 can share one cache object across compute and presentValidation
Validation was run on AMD Ryzen 5 5600G with Radeon Graphics (6C/12T).
Local sequential
nsc -Pmeasurements on large EX31 path-tracer inputs moved preprocess time on representative heavy rules from roughly8 sdown to roughly2.5 s, which is about a3.20xspeedup and a68.75%reduction.On the paired EX31 branch, the
Debugwarm-cache path movedfirst_render_submit_msfrom13952to2698, which is a5.17xspeedup and an80.66%reduction.Prepared-shader and pipeline-cache validation on the paired EX31 branch is recorded in Devsh-Graphics-Programming/Nabla-Examples-and-Tests#262.