Python: Add prompt caching support to Anthropic connector#13947
Vizhy wants to merge 2 commits into microsoft:main
Conversation
Adds AnthropicCacheSettings and a `cache` field on AnthropicChatPromptExecutionSettings
to enable opt-in prompt caching via the Anthropic cache_control API.
When enabled, prepare_settings_dict() injects cache_control blocks on the system
message and the last tool definition before the request is sent. No changes to
AnthropicChatCompletion — caching is fully contained in the settings layer.
Off by default; opt in with cache=AnthropicCacheSettings.on().
Convenience constructors: .on() .off() .system() .tools() .short() .long()
TTL: "5m" -> {"type":"ephemeral"}, "1h" -> {"type":"ephemeral","ttl":3600}
Includes 16 new unit tests and a usage sample at
samples/concepts/caching/anthropic_prompt_caching.py.
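To make the opt-in flow concrete, here is a minimal usage sketch based on the description above (the `max_tokens` value is illustrative; `AnthropicCacheSettings` is the new export this PR adds):

```python
from semantic_kernel.connectors.ai.anthropic import (
    AnthropicCacheSettings,
    AnthropicChatPromptExecutionSettings,
)

# Caching is off by default; opting in is a one-liner on the settings object.
settings = AnthropicChatPromptExecutionSettings(
    max_tokens=1024,
    cache=AnthropicCacheSettings.on(),  # or .on(ttl="1h") for the longer TTL
)
```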
Pull request overview
Adds opt-in prompt caching support to the Python Anthropic connector by introducing a cache settings model and injecting Anthropic cache_control blocks into the serialized request payload (system content block and/or the last tool definition).
Changes:
- Introduces `AnthropicCacheSettings` and exposes it as a public API via `semantic_kernel.connectors.ai.anthropic`.
- Extends `AnthropicChatPromptExecutionSettings` with an excluded `cache` field and injects `cache_control` during `prepare_settings_dict()`.
- Adds unit tests for caching settings/injection behavior and a new sample demonstrating prompt caching usage.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| python/tests/unit/connectors/ai/anthropic/test_anthropic_request_settings.py | Adds unit tests covering cache settings constructors and prepare_settings_dict() injection behavior. |
| python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py | Adds AnthropicCacheSettings, adds cache to execution settings (excluded from serialization), and injects cache_control into outbound payload. |
| python/semantic_kernel/connectors/ai/anthropic/init.py | Exports AnthropicCacheSettings as part of the Anthropic connector public surface. |
| python/samples/concepts/caching/anthropic_prompt_caching.py | Adds a runnable sample demonstrating multi-turn Anthropic prompt caching. |
```python
class AnthropicCacheSettings(BaseModel):
    """Configuration for Anthropic prompt caching.

    Controls which parts of the request receive cache_control injection.
    """
```

```python
if self.cache.cache_tools:
    tools: list[dict[str, Any]] | None = data.get("tools")
    if tools:
        tools = copy.deepcopy(tools)
        tools[-1]["cache_control"] = cache_control
        data["tools"] = tools
```
```python
ctrl = AnthropicCacheSettings.on(ttl="5m")._cache_control()
assert ctrl == {"type": "ephemeral"}


def test_cache_control_1h():
    ctrl = AnthropicCacheSettings.on(ttl="1h")._cache_control()
    assert ctrl == {"type": "ephemeral", "ttl": 3600}
```
Automated Code Review
Reviewers: 4 | Confidence: 92%
✓ Correctness
The PR adds Anthropic prompt caching support with a well-structured `AnthropicCacheSettings` model and `prepare_settings_dict` override. There is one correctness bug: the `_cache_control()` method emits `"ttl": 3600` (an integer) for the 1-hour TTL, but the Anthropic SDK's `CacheControlEphemeralParam` type defines `ttl: Literal["5m", "1h"]` — it expects the string `"1h"`, not an integer. This will cause a runtime API error or silent rejection when 1-hour caching is used. The corresponding tests also assert the wrong expected value, so they pass but do not catch the bug.
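For reference, a self-contained sketch of the fix this finding calls for (the actual fix lands in the follow-up commit described later; storing the TTL on a `ttl` field is an assumption about the model's internals):

```python
from typing import Any, Literal

from pydantic import BaseModel


class AnthropicCacheSettings(BaseModel):
    ttl: Literal["5m", "1h"] = "5m"  # assumption: TTL is stored on a `ttl` field

    def _cache_control(self) -> dict[str, Any]:
        # CacheControlEphemeralParam defines ttl as Literal["5m", "1h"], so the
        # 1-hour variant must carry the string "1h", never the integer 3600.
        if self.ttl == "1h":
            return {"type": "ephemeral", "ttl": "1h"}
        # "5m" is the API default, so the ttl key is omitted.
        return {"type": "ephemeral"}
```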
✓ Security Reliability
This PR adds Anthropic prompt caching support via a new `AnthropicCacheSettings` model and `prepare_settings_dict` override. The implementation is clean from a security and reliability standpoint: TTL values are constrained by `Literal["5m", "1h"]`, the `cache` field is correctly excluded from API serialization (`exclude=True`), tools are deep-copied before mutation to prevent side effects, and edge cases (empty system string, missing tools) are handled properly. No secrets, injection risks, resource leaks, or unsafe deserialization were found.
✓ Test Coverage
The new AnthropicCacheSettings class and its integration into prepare_settings_dict are well-tested, covering factory methods, TTL variants, edge cases (empty system, no tools), mutation protection, and serialization exclusion. However, the PR widens the `system` field type to accept `list[dict[str, Any]]` in addition to `str`, yet there is no test verifying behavior when `system` is passed as a pre-structured list with caching enabled. The code silently skips cache injection in that case (line 158: `isinstance(system, str)`), and a test should document this intended behavior.
✗ Design Approach
I found one design-level issue. The new caching API broadens `system` to accept Anthropic-native block lists, but the caching implementation only injects `cache_control` when `system` is a plain string. That makes the newly supported structured-system form a silent no-op for `cache_system`, which is a contract gap in the core feature rather than a missing edge-case test.
Suggestions
- In python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py:156-159, treat `system` as one normalized content-block sequence for serialization so caching works for both supported input shapes, rather than special-casing only raw strings.
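One way to implement that suggestion, sketched here with a hypothetical `_normalize_system` helper that is not code from the PR:

```python
import copy
from typing import Any


def _normalize_system(system: str | list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Normalize both supported system shapes into Anthropic content blocks."""
    if isinstance(system, str):
        return [{"type": "text", "text": system}]
    # Deep-copy so cache_control injection never mutates the caller's list.
    return copy.deepcopy(system)


# With system normalized to a block list, injection no longer special-cases
# raw strings:
#   blocks = _normalize_system(data["system"])
#   blocks[-1]["cache_control"] = cache_control
#   data["system"] = blocks
```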
Automated review by Vizhy's agents
… blocks
- `_cache_control()` now emits `{"ttl": "1h"}` (string) per the `CacheControlEphemeralParam` spec instead of the integer 3600
- `prepare_settings_dict()` now injects `cache_control` on `list[dict]` system blocks in addition to plain strings, closing the silent no-op design gap
- add test covering cache injection when `system` is pre-structured as `list[dict]`
- update 1h TTL test assertions to match the corrected string value
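A sketch of what the added `list[dict]` test might look like (the exact name and assertions in the PR's test file may differ):

```python
from semantic_kernel.connectors.ai.anthropic import (
    AnthropicCacheSettings,
    AnthropicChatPromptExecutionSettings,
)


def test_cache_injection_with_prestructured_system():
    settings = AnthropicChatPromptExecutionSettings(
        system=[{"type": "text", "text": "You are a helpful assistant."}],
        cache=AnthropicCacheSettings.on(),
    )
    data = settings.prepare_settings_dict()
    # cache_control should land on the last system content block.
    assert data["system"][-1]["cache_control"] == {"type": "ephemeral"}
```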
@microsoft-github-policy-service agree
Thanks for the thorough automated review — two valid issues were caught and both are addressed in the follow-up commit (da6de64):
1. TTL value fix (Correctness)
2. Pre-structured system blocks (Design Approach)

A test covering the pre-structured `list[dict]` system case was added as well.
Motivation and Context
The Anthropic API supports prompt caching via `cache_control` blocks on system messages and tool definitions. On a cache hit, input tokens are billed at 0.1x cost — a significant saving for agentic loops, long system prompts, or large tool catalogs that repeat across calls.
Today the Semantic Kernel Python Anthropic connector has no way to opt into
this feature. Users have to either fork the connector or build their own
wrapper. This PR adds first-class, opt-in caching support so the savings are
available to every SK Python user targeting Anthropic models.
Description
Adds a new `AnthropicCacheSettings` model and wires it into `AnthropicChatPromptExecutionSettings` as an excluded `cache` field. When caching is enabled, `prepare_settings_dict()` injects `cache_control` blocks onto the system message and the last tool definition right before the request is sent to the Anthropic SDK.
Key design choices:
- Off by default; users enable caching explicitly with `cache=AnthropicCacheSettings.on()`.
- No changes to `AnthropicChatCompletion` — the service is unaware of caching, keeping the change surface minimal and existing logic untouched.
- TTL is restricted to `"5m"` or `"1h"`, translated internally to the correct API payload: `{"type": "ephemeral"}` or `{"type": "ephemeral", "ttl": "1h"}`.
- Convenience constructors: `.on()`, `.off()`, `.system()`, `.tools()`, `.short()`, `.long()` (see the sketch below).

Public API additions: `AnthropicCacheSettings`, exported from `semantic_kernel.connectors.ai.anthropic`.
Tests: 17 new unit tests covering all classmethods, both TTL variants, serialization payload shape, and edge cases (empty system string, no tools, pre-structured `list[dict]` system blocks, list mutation safety, `cache` field excluded from serialized output). All 54 existing Anthropic tests continue to pass.
Sample: `samples/concepts/caching/anthropic_prompt_caching.py` — a multi-turn chat demo with caching enabled on a large system prompt, showing the first call writing the cache and subsequent calls reading from it.
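The sample's rough shape, for orientation (a sketch from the description above; the actual file and API key wiring may differ):

```python
import asyncio

from semantic_kernel.connectors.ai.anthropic import (
    AnthropicCacheSettings,
    AnthropicChatCompletion,
    AnthropicChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory

LARGE_SYSTEM_PROMPT = "..."  # a long system prompt, repeated across calls


async def main() -> None:
    service = AnthropicChatCompletion(ai_model_id="claude-haiku-4-5")
    settings = AnthropicChatPromptExecutionSettings(
        max_tokens=1024,
        cache=AnthropicCacheSettings.on(),
    )
    history = ChatHistory(system_message=LARGE_SYSTEM_PROMPT)
    for question in ["First question", "Follow-up question"]:
        history.add_user_message(question)
        # The first call writes the cache; later calls within the TTL read from it.
        reply = await service.get_chat_message_content(history, settings)
        history.add_message(reply)
        print(reply)


asyncio.run(main())
```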
Observed Token Savings
The following traces were captured against `claude-haiku-4-5` using the same prompt, with caching off vs. on. Both runs used `AnthropicCacheSettings.on()` with the 5-minute TTL.
[Screenshots: token usage traces without caching vs. with caching active]
83% reduction in billed input tokens on a warm cache hit. The 9,112 cached
tokens are served at 0.1x cost instead of full input price.
Contribution Checklist