UPSTREAM PR #1261: refactor: move VAE tiling parameters to SDGenerationParams by loci-dev · Pull Request #63 · auroralabs-loci/stable-diffusion.cpp

loci-dev · 2026-02-20T04:16:58Z

Note

Source pull request: leejet/stable-diffusion.cpp#1261

loci-review · 2026-02-20T05:19:46Z

Overview

Analysis of stable-diffusion.cpp refactoring commit (2367bc7: "move VAE tiling parameters to SDGenerationParams") across 48,374 functions shows minimal performance impact. Modified: 78 functions; New: 80; Removed: 80; Unchanged: 48,136.

Binaries Analyzed:

build.bin.sd-server: +0.614% power consumption (515,491.29 nJ → 518,655.31 nJ)
build.bin.sd-cli: +0.706% power consumption (480,109.60 nJ → 483,500.24 nJ)

The refactoring successfully moves VAE tiling parameters from context initialization to per-generation configuration, enabling flexible memory management with acceptable performance trade-offs.
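To make the shape of the change concrete, here is a minimal sketch of tiling settings living on the per-generation parameters instead of the context parameters. Every type, field, and function name below is hypothetical and chosen for illustration; only the member name `vae_tiling_params` echoes the review text. It is not the project's actual declaration.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real project types and fields may differ.
struct TilingParams {
    bool enabled = false;
    int tile_size_x = 0;          // illustrative field
    int tile_size_y = 0;          // illustrative field
    float target_overlap = 0.5f;  // illustrative field
};

// After the refactor the context carries only load-time state.
struct ContextParams {
    const char* model_path = "";
};

// Tiling is now part of each generation request.
struct GenerationParams {
    int width = 512;
    int height = 512;
    TilingParams vae_tiling_params;
};

// Stand-in for a generation call: reports whether the VAE pass would be tiled.
bool would_tile(const ContextParams& ctx, const GenerationParams& gen) {
    (void)ctx;  // the context no longer influences the tiling decision
    assert(gen.width > 0 && gen.height > 0);
    return gen.vae_tiling_params.enabled;
}
```

With this layout, one loaded context can serve a small request without tiling and a large request with tiling enabled, which is the per-generation flexibility the review describes.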

Function Analysis

Configuration Parsing (Initialization Only):

SDContextParams::get_options() improved in both binaries: response time fell 6.6% (sd-server: 279,572ns → 261,119ns; sd-cli: 280,187ns → 261,795ns) and throughput time fell 7.6% to 9.6%, because 4 VAE tiling options were removed. The simplification reduced branching and parsing overhead.

SDGenerationParams::get_options() regressed consistently in both binaries: response time rose 5.95% to 5.96% (sd-server: 306,582ns → 324,830ns; sd-cli: 307,317ns → 325,643ns) and throughput time rose 6.11%, because the same 4 options were added along with their parsing logic. The ~200ns self-time increase reflects the additional option registration overhead.

SDGenerationParams::to_string() (sd-cli) regressed +17.4% throughput time (1,714ns → 2,012ns) from serializing 6 additional vae_tiling_params fields—expected for a diagnostic function.
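The percentages in this group can be re-derived from the raw timings. The helper below is a small sketch for that check; the function names and the 0.05-point tolerance are assumptions made for illustration and are not part of the analysis tooling.

```cpp
#include <cassert>
#include <cmath>

// Relative change from `before` to `after`, in percent.
double percent_change(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

// True when a reported percentage agrees with the raw timings
// to within `tol` percentage points (tolerance is an assumption).
bool matches_report(double before, double after, double reported,
                    double tol = 0.05) {
    return std::fabs(percent_change(before, after) - reported) <= tol;
}
```

For example, `matches_report(279572, 261119, -6.6)` checks the sd-server figure for SDContextParams::get_options(), and `matches_report(1714, 2012, 17.4)` checks the to_string() figure.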

GGML Backend (Model Loading/Inference):

make_block_q4_Kx8 (sd-server) regressed +7.9% (8,126ns → 8,768ns) in both response and throughput time, indicating intrinsic overhead in quantization repacking. Affects model loading, not inference hot path.

forward_mul_mat for block_iq4_nl (sd-server) shows +5.38% response time regression (12,916ns → 13,611ns) while throughput time remains stable (2,390ns), indicating child function slowdown rather than direct implementation changes. This matrix multiplication function is inference-critical, though stable self-time suggests indirect impact.
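The reasoning about forward_mul_mat can be made explicit by separating callee time from self time. The sketch below assumes that response time is inclusive of callees and that throughput time is the function's own self time, which is how the review appears to use the terms; the helper name is hypothetical.

```cpp
#include <cassert>

// Time attributed to callees: inclusive (response) time minus self time.
// Assumes response time = self time + time spent in child functions.
long child_time_ns(long response_ns, long self_ns) {
    assert(response_ns >= self_ns);
    return response_ns - self_ns;
}
```

With the figures above, callee time rises from `child_time_ns(12916, 2390)` = 10,526 ns to `child_time_ns(13611, 2390)` = 11,221 ns, so the entire 695 ns increase sits in child functions and none of it in forward_mul_mat itself.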

Standard Library Optimizations:

Multiple functions improved significantly: std::make_move_iterator -58.6% response time (287ns → 119ns), __gnu_cxx::__normal_iterator::operator+ -42.1% (165ns → 95ns), std::swap -11% (112ns → 100ns), std::__unique -5.8% response time. These compiler optimizations partially offset regressions.

Other analyzed functions (JSON access, regex compilation, vector reallocation) showed minor self-time variations with negligible total execution impact.

Additional Findings

The architectural refactoring achieves its goal of enabling per-generation VAE tiling control with minimal cost. The SDContextParams::get_options() improvement (about 18.4 µs of response time) roughly cancels the SDGenerationParams::get_options() regression (about 18.3 µs), so initialization performance is effectively balanced. Most performance changes affect initialization rather than inference hot paths. The forward_mul_mat regression warrants monitoring in production, although its stable self-time suggests that the function's implementation is unchanged and that the slowdown originates in the GGML functions it calls. Power consumption increases (<1%) are negligible for image generation workloads taking seconds to minutes per image.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

refactor: move VAE tiling parameters to SDGenerationParams

2367bc7

loci-dev temporarily deployed to stable-diffusion-cpp-prod on February 20, 2026 04:17 with GitHub Actions (Inactive)

loci-dev force-pushed the main branch from 10ea7dd to 2f8b672 on February 27, 2026 04:17

loci-dev wants to merge 1 commit into main from loci/pr-1261-sd_refactor_vae_tiling
