
Propagate global --cache_dir to continuous batching pipeline (#4230) #4329

Open
exzile wants to merge 1 commit into openvinotoolkit:main from exzile:fix/cache-dir-continuous-batching

exzile commented Jun 26, 2026

Summary

Fixes #4230.

The continuous batching servable initializer constructs the GenAI ContinuousBatchingPipeline directly and never applied the server-level --cache_dir (ServerSettings.cacheDir). Unlike the non-CB path — which applies it via ModelInstance::setCacheOptions (ieCore.set_property(ov::cache_dir(...))) — the CB path left model-compilation caching disabled unless the user manually duplicated the value into the node's plugin_config as CACHE_DIR. As a result, .blob/.cl_cache artifacts were never persisted and every server restart fully recompiled the model.

Fix

In ContinuousBatchingServableInitializer::initialize, inject the global cache_dir into the pipeline pluginConfig (keyed by ov::cache_dir.name() == CACHE_DIR) right after parsing the node plugin_config and before constructing the pipeline. An explicit CACHE_DIR in the node's plugin_config remains authoritative.

const std::string& globalCacheDir = Config::instance().cacheDir();
if (!globalCacheDir.empty()) {
    if (properties->pluginConfig.find(ov::cache_dir.name()) == properties->pluginConfig.end()) {
        properties->pluginConfig[ov::cache_dir.name()] = globalCacheDir;
    } // else: explicit node CACHE_DIR wins
}

This also applies to the VLM continuous-batching servable, which reuses this initializer.

Testing

Added a regression test LLMNodeOptionsCacheDirPropagation (run for both the CB and VLM fixtures) covering:

  1. The global --cache_dir is propagated into pluginConfig["CACHE_DIR"] when the node does not set it.
  2. An explicit CACHE_DIR in the node plugin_config takes precedence over the global value.
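The two cases above can be modeled in isolation. The following is a minimal sketch, not the test added in this PR: `PluginConfig` is a hypothetical `std::map` stand-in for the servable's plugin config, `applyGlobalCacheDir` is an illustrative helper name, and the key is assumed to be `CACHE_DIR` as stated in the description.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the pipeline plugin config. The real code keys
// the servable's pluginConfig by ov::cache_dir.name().
using PluginConfig = std::map<std::string, std::string>;

// Assumed to match ov::cache_dir.name(), per the PR description.
const std::string kCacheDirKey = "CACHE_DIR";

// Copies the global cache dir into the config only when the global value is
// non-empty and the node has not set CACHE_DIR itself.
// Returns true if the global value was applied.
bool applyGlobalCacheDir(PluginConfig& config, const std::string& globalCacheDir) {
    if (globalCacheDir.empty()) {
        return false;  // no --cache_dir given, nothing to propagate
    }
    // insert() leaves an existing entry untouched, so an explicit node value wins.
    return config.insert({kCacheDirKey, globalCacheDir}).second;
}
```

With an empty config the helper stores the global value. With a config that already holds `CACHE_DIR`, it returns false and keeps the node's value.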

Built and ran locally on Windows (MSVC) against a real facebook/opt-125m continuous-batching pipeline:

[       OK ] LLMOptionsHttpTest.LLMNodeOptionsCheckPluginConfig
[       OK ] LLMOptionsHttpTest.LLMNodeOptionsCacheDirPropagation
[  PASSED  ] 2 tests.

🤖 Generated with Claude Code

exzile (author) commented Jun 26, 2026

End-to-end verification on GPU

Beyond the unit test, I verified the actual caching behavior on an Intel Arc Pro B70 dGPU (EXPORT_IMPORT capable) serving facebook/opt-125m via continuous batching with device: "GPU" and a --cache_dir pointing at an initially-empty directory:

Run 1 (empty cache):

  • Log confirms the fix path: servable_initializer.cpp] Applying global cache_dir to continuous batching pipeline: .../cachedir
  • Cache dir went from empty → *.cl_cache files + a 130 MB .blob (compiled model)
  • Pipeline init → AVAILABLE: ~1.6 s

Run 2 (restart, populated cache):

  • Same fix path fires
  • No recompile: blob is reused (file unchanged), no new artifacts written
  • Pipeline init → AVAILABLE: ~0.24 s (~6.7× faster cold start)

Without the change the cache directory stays empty on the CB path and every restart recompiles. With it, blobs persist and are reused across restarts as intended.

The continuous batching servable initializer constructs the GenAI
ContinuousBatchingPipeline directly and never applied the server-level
--cache_dir (ServerSettings.cacheDir). Unlike the non-CB path, which
applies it via ModelInstance::setCacheOptions, the CB path left model
compilation caching disabled unless the user duplicated the value into
the node's plugin_config as CACHE_DIR. As a result, .blob/.cl_cache
artifacts were never persisted and every restart fully recompiled the
model.

Inject the global cache_dir into the pipeline plugin config before
constructing the pipeline. An explicit CACHE_DIR in the node's
plugin_config remains authoritative.

Adds a regression test (LLMNodeOptionsCacheDirPropagation) covering both
propagation of the global value and precedence of an explicit node value.

Fixes openvinotoolkit#4230

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
exzile force-pushed the fix/cache-dir-continuous-batching branch from 6fd48db to c45573c on June 27, 2026 01:04

Successfully merging this pull request may close these issues.

--cache_dir not propagated to LLM continuous batching pipeline (regression vs 2025.4 promise)
