@lstein (Collaborator) commented Jan 8, 2026

Summary

The VRAM peak-usage figure included in the invocation performance statistics printed at the end of each generation was not very useful, because it did not indicate how much additional VRAM each node actually consumed. This PR replaces it with a per-node VRAM delta, so you can see when a node's execution increased the allocated VRAM over the course of the generation. Because of the VRAM cache allocation algorithm, the delta can also be negative, when executing a node causes part of a model to be moved back to RAM; I think this is useful information as well.
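For context, a per-node delta can be obtained by sampling the CUDA allocator before and after each node invocation. The following is only a minimal sketch of the idea, not the code in this PR; the helper name and the `node.invoke(context)` call are used purely for illustration:

```python
import torch

def invoke_with_vram_delta(node, context):
    """Run a node and return its output plus the change in allocated VRAM (bytes)."""
    before = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
    output = node.invoke(context)
    after = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
    # The delta is negative when invoking the node caused part of a cached
    # model to be offloaded back to RAM, and zero for pure-CPU nodes.
    return output, after - before
```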

This PR also fixes a bug that was causing the RAM cache size to be reported as 0.00G.
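The 0.00G shows up as the denominator of the "Cache high water mark: used/max" line. A hypothetical illustration of this class of bug (the parameter names below are made up and are not InvokeAI's actual cache API):

```python
def format_high_water_mark(high_water_gb: float, max_cache_gb: float) -> str:
    # If max_cache_gb is read from a field that is never populated (so it
    # defaults to 0.0), the line prints as e.g. "10.07/0.00G" even though
    # the cache has a real, dynamically computed size limit.
    return f"Cache high water mark: {high_water_gb:.2f}/{max_cache_gb:.2f}G"
```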

Related Issues / Discussions

None

QA Instructions

  1. Run the same generation twice before applying this PR. The result will look something like this:
                          Node   Calls   Seconds  VRAM Used
                        string       1    0.000s     9.920G 
                       integer       1    0.000s     9.920G 
                 core_metadata       1    0.000s     9.920G 
          z_image_model_loader       1    0.000s     9.920G 
          z_image_text_encoder       1    0.000s     9.920G 
                       collect       1    0.000s     9.920G 
               z_image_denoise       1    6.219s    10.389G 
                   z_image_l2i       1    0.499s    12.047G  
TOTAL GRAPH EXECUTION TIME:  18.699s
TOTAL GRAPH WALL TIME:  18.702s
RAM used by InvokeAI process: 16.91G (+0.005G)
RAM used to load models: 10.07G
VRAM in use: 10.078G
RAM cache statistics:
   Model cache hits: 6
   Model cache misses: 0
   Models cached: 5
   Models cleared from cache: 0
   Cache high water mark: 10.07/0.00G                                                                       

Notice that even non-GPU operations like "string" seem to be using VRAM. Also notice that the RAM cache size (last line) is shown as 0.00G.

  2. Now apply the PR and run the same generation twice again. The result will look like this:
                          Node   Calls   Seconds VRAM Change
                        string       1    0.001s     +0.000G
                       integer       1    0.000s     +0.000G
                 core_metadata       1    0.000s     +0.000G
                 lora_selector       1    0.000s     +0.000G
                       collect       2    0.000s     +0.000G
          z_image_model_loader       1    0.000s     +0.000G
z_image_lora_collection_loader       1    0.000s     +0.000G
          z_image_text_encoder       1    0.416s     +0.000G
               z_image_denoise       1   17.785s     +0.000G
                   z_image_l2i       1    0.495s     +0.000G
TOTAL GRAPH EXECUTION TIME:  18.699s
TOTAL GRAPH WALL TIME:  18.702s
RAM used by InvokeAI process: 16.91G (+0.005G)
RAM used to load models: 10.07G
VRAM in use: 10.078G
RAM cache statistics:
   Model cache hits: 6
   Model cache misses: 0
   Models cached: 5
   Models cleared from cache: 0
   Cache high water mark: 10.07/12.48G

Since the same nodes are executing, there will ordinarily be no VRAM usage change (unless you are short of cache memory). Notice that the RAM cache size is now correct, and should match the cache size dynamically calculated at startup time.

  3. Now change the generation parameters (for example, switch to a different model) and run a third time. Depending on the model and tensor caching behavior, you should see positive and/or negative VRAM changes; a note on the sign formatting follows this list.
                          Node   Calls   Seconds VRAM Change
             flux_model_loader       1    0.003s     +0.000G
                       integer       1    0.000s     +0.000G
                        string       1    0.000s     +0.000G
                 core_metadata       1    0.000s     +0.000G
             flux_text_encoder       1   12.806s     -0.583G
                       collect       1    0.000s     +0.000G
                  flux_denoise       1   39.379s     -2.793G
               flux_vae_decode       1    1.204s     -0.033G

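For reference, the new column's values can be produced with a sign-aware format so that decreases stand out. A small sketch (not the PR's actual code):

```python
def format_vram_change(delta_bytes: int) -> str:
    # "+" is printed explicitly for zero and positive deltas, "-" for negative
    # ones, matching the "+0.000G" / "-0.583G" style shown above.
    return f"{delta_bytes / (1024 ** 3):+.3f}G"
```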
Merge Plan

Simple merge.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • [ ] Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)
