Python: Add prompt caching support to Anthropic connector by Vizhy · Pull Request #13947 · microsoft/semantic-kernel

Vizhy · 2026-05-04T11:57:01Z

Motivation and Context

The Anthropic API supports prompt caching via cache_control blocks on system
messages and tool definitions. On a cache hit, input tokens are billed at 0.1x
cost — a significant saving for agentic loops, long system prompts, or large
tool catalogs that repeat across calls.

Today the Semantic Kernel Python Anthropic connector has no way to opt into
this feature. Users have to either fork the connector or build their own
wrapper. This PR adds first-class, opt-in caching support so the savings are
available to every SK Python user targeting Anthropic models.

Description

Adds a new AnthropicCacheSettings model and wires it into
AnthropicChatPromptExecutionSettings as an excluded cache field. When
caching is enabled, prepare_settings_dict() injects cache_control blocks
onto the system message and the last tool definition right before the request
is sent to the Anthropic SDK.

Key design choices:

Caching is off by default. No behavior change for existing users. Opt in
explicitly with cache=AnthropicCacheSettings.on().
All injection happens in the settings layer. No changes to
AnthropicChatCompletion — the service is unaware of caching, keeping the
change surface minimal and existing logic untouched.
TTL is exposed as "5m" or "1h". Translated internally to the correct
API payload: {"type": "ephemeral"} or {"type": "ephemeral", "ttl": "1h"}.
Convenience constructors cover common patterns:
.on(), .off(), .system(), .tools(), .short(), .long().

Public API additions:

from semantic_kernel.connectors.ai.anthropic import (
    AnthropicCacheSettings,
    AnthropicChatPromptExecutionSettings,
)

# cache system message + tools, 5-minute TTL
settings = AnthropicChatPromptExecutionSettings(
    cache=AnthropicCacheSettings.on(),
)

# cache system message only, 1-hour TTL
settings = AnthropicChatPromptExecutionSettings(
    cache=AnthropicCacheSettings.system(ttl="1h"),
)

Tests: 17 new unit tests covering all classmethods, both TTL variants,
serialization payload shape, and edge cases (empty system string, no tools,
pre-structured list[dict] system blocks, list mutation safety, cache field
excluded from serialized output). All 54 existing Anthropic tests continue to pass.

Sample: samples/concepts/caching/anthropic_prompt_caching.py — a
multi-turn chat demo with caching enabled on a large system prompt, showing
the first call writing the cache and subsequent calls reading from it.

Observed Token Savings

The following traces were captured against claude-haiku-4-5 using the same
prompt, with caching off vs. on. Both runs used AnthropicCacheSettings.on()
with the 5-minute TTL.

Without caching

Input: 12,421 tokens — Cache Read: 0 — Cache Write: 0

Input: 2,085 tokens — Cache Read: 9,112 tokens — Cache Write: 144 tokens

With caching active

83% reduction in billed input tokens on a warm cache hit. The 9,112 cached
tokens are served at 0.1x cost instead of full input price.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄

Adds AnthropicCacheSettings and a `cache` field on AnthropicChatPromptExecutionSettings to enable opt-in prompt caching via the Anthropic cache_control API. When enabled, prepare_settings_dict() injects cache_control blocks on the system message and the last tool definition before the request is sent. No changes to AnthropicChatCompletion — caching is fully contained in the settings layer. Off by default; opt in with cache=AnthropicCacheSettings.on(). Convenience constructors: .on() .off() .system() .tools() .short() .long() TTL: "5m" -> {"type":"ephemeral"}, "1h" -> {"type":"ephemeral","ttl":3600} Includes 16 new unit tests and a usage sample at samples/concepts/caching/anthropic_prompt_caching.py.

Copilot

Pull request overview

Adds opt-in prompt caching support to the Python Anthropic connector by introducing a cache settings model and injecting Anthropic cache_control blocks into the serialized request payload (system content block and/or the last tool definition).

Changes:

Introduces AnthropicCacheSettings and exposes it as a public API via semantic_kernel.connectors.ai.anthropic.
Extends AnthropicChatPromptExecutionSettings with an excluded cache field and injects cache_control during prepare_settings_dict().
Adds unit tests for caching settings/injection behavior and a new sample demonstrating prompt caching usage.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
python/tests/unit/connectors/ai/anthropic/test_anthropic_request_settings.py	Adds unit tests covering cache settings constructors and `prepare_settings_dict()` injection behavior.
python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py	Adds `AnthropicCacheSettings`, adds `cache` to execution settings (excluded from serialization), and injects `cache_control` into outbound payload.
python/semantic_kernel/connectors/ai/anthropic/init.py	Exports `AnthropicCacheSettings` as part of the Anthropic connector public surface.
python/samples/concepts/caching/anthropic_prompt_caching.py	Adds a runnable sample demonstrating multi-turn Anthropic prompt caching.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+class AnthropicCacheSettings(BaseModel):
+    """Configuration for Anthropic prompt caching.
+
+    Controls which parts of the request receive cache_control injection.
+


+        if self.cache.cache_tools:
+            tools: list[dict[str, Any]] | None = data.get("tools")
+            if tools:
+                tools = copy.deepcopy(tools)
+                tools[-1]["cache_control"] = cache_control
+                data["tools"] = tools


+            tools: list[dict[str, Any]] | None = data.get("tools")
+            if tools:
+                tools = copy.deepcopy(tools)
+                tools[-1]["cache_control"] = cache_control


+    ctrl = AnthropicCacheSettings.on(ttl="5m")._cache_control()
+    assert ctrl == {"type": "ephemeral"}
+
+
+def test_cache_control_1h():
+    ctrl = AnthropicCacheSettings.on(ttl="1h")._cache_control()
+    assert ctrl == {"type": "ephemeral", "ttl": 3600}


github-actions

Automated Code Review

Reviewers: 4 | Confidence: 92%

✓ Correctness

The PR adds Anthropic prompt caching support with a well-structured AnthropicCacheSettings model and prepare_settings_dict override. There is one correctness bug: the _cache_control() method emits "ttl": 3600 (an integer) for the 1-hour TTL, but the Anthropic SDK's CacheControlEphemeralParam type defines ttl: Literal["5m", "1h"] — it expects the string "1h", not an integer. This will cause a runtime API error or silent rejection when 1-hour caching is used. The corresponding tests also assert the wrong expected value, so they pass but do not catch the bug.

✓ Security Reliability

This PR adds Anthropic prompt caching support via a new AnthropicCacheSettings model and prepare_settings_dict override. The implementation is clean from a security and reliability standpoint: TTL values are constrained by Literal["5m", "1h"], the cache field is correctly excluded from API serialization (exclude=True), tools are deep-copied before mutation to prevent side effects, and edge cases (empty system string, missing tools) are handled properly. No secrets, injection risks, resource leaks, or unsafe deserialization were found.

✓ Test Coverage

The new AnthropicCacheSettings class and its integration into prepare_settings_dict are well-tested, covering factory methods, TTL variants, edge cases (empty system, no tools), mutation protection, and serialization exclusion. However, the PR widens the system field type to accept list[dict[str, Any]] in addition to str, yet there is no test verifying behavior when system is passed as a pre-structured list with caching enabled. The code silently skips cache injection in that case (line 158: isinstance(system, str)), and a test should document this intended behavior.

✗ Design Approach

I found one design-level issue. The new caching API broadens system to accept Anthropic-native block lists, but the caching implementation only injects cache_control when system is a plain string. That makes the newly supported structured-system form a silent no-op for cache_system, which is a contract gap in the core feature rather than a missing edge-case test.

Suggestions

In python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py:156-159, treat system as one normalized content-block sequence for serialization so caching works for both supported input shapes, rather than special-casing only raw strings.

Automated review by Vizhy's agents

… blocks - _cache_control() now emits {"ttl":"1h"} string per CacheControlEphemeralParam spec instead of integer 3600 - prepare_settings_dict() now injects cache_control on list[dict] system blocks in addition to plain strings, closing the silent no-op design gap - add test covering cache injection when system is pre-structured as list[dict] - update 1h TTL test assertions to match corrected string value

Vizhy · 2026-05-05T09:44:25Z

@microsoft-github-policy-service agree

Vizhy · 2026-05-05T09:44:34Z

Thanks for the thorough automated review — two valid issues were caught and both are addressed in the follow-up commit (da6de64):

1. TTL value fix (Correctness)
_cache_control() now emits {"ttl": "1h"} (string) instead of {"ttl": 3600} (integer), correctly matching the CacheControlEphemeralParam SDK type definition. The corresponding test assertions have been updated to match.

2. Pre-structured system blocks (Design Approach)
prepare_settings_dict() now handles both input shapes for system:

str → wrapped into a single content block with cache_control
list[dict] → cache_control injected on the last block (same pattern used for tools), with a copy.deepcopy to avoid mutation

A test covering the list[dict] case has been added to document this behaviour explicitly.

Copilot AI review requested due to automatic review settings May 4, 2026 11:57

Vizhy requested a review from a team as a code owner May 4, 2026 11:57

moonbox3 added the python Pull requests for the Python Semantic Kernel label May 4, 2026

Copilot started reviewing on behalf of Vizhy May 4, 2026 11:57 View session

Copilot AI reviewed May 4, 2026

View reviewed changes

github-actions Bot reviewed May 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Add prompt caching support to Anthropic connector#13947

Python: Add prompt caching support to Anthropic connector#13947
Vizhy wants to merge 2 commits intomicrosoft:mainfrom
Vizhy:feature/connectors-ai-anthropic-cache

Vizhy commented May 4, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

Vizhy commented May 5, 2026

Uh oh!

Vizhy commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Vizhy commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Description

Observed Token Savings

Contribution Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✓ Test Coverage

✗ Design Approach

Suggestions

Uh oh!

Vizhy commented May 5, 2026

Uh oh!

Vizhy commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Vizhy commented May 4, 2026 •

edited

Loading