@lstein (Collaborator) commented Jan 8, 2026

Summary

The VRAM peak-usage figure included in the invocation performance statistics printed at the end of each generation was not very useful, because it did not indicate how much additional VRAM each node actually consumed. This PR replaces it with a per-node VRAM delta, so you can see when a node's execution increased the allocated VRAM over the course of the generation. Because of the VRAM cache allocation algorithm, the delta can also be negative, when executing a node causes part of a model to be moved back to RAM; I think this is useful information as well.
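For context, a per-node delta can be obtained by sampling the CUDA allocator before and after each node invocation. The following is only a minimal sketch of the idea, not the code in this PR; the helper name and the `node.invoke(context)` call are used purely for illustration:

```python
import torch

def invoke_with_vram_delta(node, context):
    """Run a node and return its output plus the change in allocated VRAM (bytes)."""
    before = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
    output = node.invoke(context)
    after = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
    # The delta is negative when invoking the node caused part of a cached
    # model to be offloaded back to RAM, and zero for pure-CPU nodes.
    return output, after - before
```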

This PR also fixes a bug that was causing the RAM cache size to be reported as 0.00G.
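The 0.00G shows up as the denominator of the "Cache high water mark: used/max" line. A hypothetical illustration of this class of bug (the parameter names below are made up and are not InvokeAI's actual cache API):

```python
def format_high_water_mark(high_water_gb: float, max_cache_gb: float) -> str:
    # If max_cache_gb is read from a field that is never populated (so it
    # defaults to 0.0), the line prints as e.g. "10.07/0.00G" even though
    # the cache has a real, dynamically computed size limit.
    return f"Cache high water mark: {high_water_gb:.2f}/{max_cache_gb:.2f}G"
```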

Related Issues / Discussions

None

QA Instructions

  1. Run the same generation twice before applying this PR. The result will look something like this:
                          Node   Calls   Seconds  VRAM Used
                        string       1    0.000s     9.920G 
                       integer       1    0.000s     9.920G 
                 core_metadata       1    0.000s     9.920G 
          z_image_model_loader       1    0.000s     9.920G 
          z_image_text_encoder       1    0.000s     9.920G 
                       collect       1    0.000s     9.920G 
               z_image_denoise       1    6.219s    10.389G 
                   z_image_l2i       1    0.499s    12.047G  
TOTAL GRAPH EXECUTION TIME:  18.699s
TOTAL GRAPH WALL TIME:  18.702s
RAM used by InvokeAI process: 16.91G (+0.005G)
RAM used to load models: 10.07G
VRAM in use: 10.078G
RAM cache statistics:
   Model cache hits: 6
   Model cache misses: 0
   Models cached: 5
   Models cleared from cache: 0
   Cache high water mark: 10.07/0.00G                                                                       

Notice that even non-GPU operations like "string" seem to be using VRAM. Also notice that the RAM cache size (last line) is shown as 0.00G.

  2. Now apply the PR and run the same generation twice again. The result will look like this:
                          Node   Calls   Seconds VRAM Change
                        string       1    0.001s     +0.000G
                       integer       1    0.000s     +0.000G
                 core_metadata       1    0.000s     +0.000G
                 lora_selector       1    0.000s     +0.000G
                       collect       2    0.000s     +0.000G
          z_image_model_loader       1    0.000s     +0.000G
z_image_lora_collection_loader       1    0.000s     +0.000G
          z_image_text_encoder       1    0.416s     +0.000G
               z_image_denoise       1   17.785s     +0.000G
                   z_image_l2i       1    0.495s     +0.000G
TOTAL GRAPH EXECUTION TIME:  18.699s
TOTAL GRAPH WALL TIME:  18.702s
RAM used by InvokeAI process: 16.91G (+0.005G)
RAM used to load models: 10.07G
VRAM in use: 10.078G
RAM cache statistics:
   Model cache hits: 6
   Model cache misses: 0
   Models cached: 5
   Models cleared from cache: 0
   Cache high water mark: 10.07/12.48G

Since the same nodes are executing, there will ordinarily be no VRAM usage change (unless you are short of cache memory). Notice that the RAM cache size is now correct, and should match the cache size dynamically calculated at startup time.

  3. Now change the generation parameters (for example, switch to a different model) and run a third time. Depending on the model and tensor caching behavior, you should see positive and/or negative VRAM changes; a note on the sign formatting follows this list.
                          Node   Calls   Seconds VRAM Change
             flux_model_loader       1    0.003s     +0.000G
                       integer       1    0.000s     +0.000G
                        string       1    0.000s     +0.000G
                 core_metadata       1    0.000s     +0.000G
             flux_text_encoder       1   12.806s     -0.583G
                       collect       1    0.000s     +0.000G
                  flux_denoise       1   39.379s     -2.793G
               flux_vae_decode       1    1.204s     -0.033G

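For reference, the new column's values can be produced with a sign-aware format so that decreases stand out. A small sketch (not the PR's actual code):

```python
def format_vram_change(delta_bytes: int) -> str:
    # "+" is printed explicitly for zero and positive deltas, "-" for negative
    # ones, matching the "+0.000G" / "-0.583G" style shown above.
    return f"{delta_bytes / (1024 ** 3):+.3f}G"
```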
Merge Plan

Simple merge.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • [ ] Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)
