Skip to content

Python: Add prompt caching support to Anthropic connector#13947

Open
Vizhy wants to merge 2 commits intomicrosoft:mainfrom
Vizhy:feature/connectors-ai-anthropic-cache
Open

Python: Add prompt caching support to Anthropic connector#13947
Vizhy wants to merge 2 commits intomicrosoft:mainfrom
Vizhy:feature/connectors-ai-anthropic-cache

Conversation

@Vizhy
Copy link
Copy Markdown

@Vizhy Vizhy commented May 4, 2026

Motivation and Context

The Anthropic API supports prompt caching via cache_control blocks on system
messages and tool definitions. On a cache hit, input tokens are billed at 0.1x
cost — a significant saving for agentic loops, long system prompts, or large
tool catalogs that repeat across calls.

Today the Semantic Kernel Python Anthropic connector has no way to opt into
this feature. Users have to either fork the connector or build their own
wrapper. This PR adds first-class, opt-in caching support so the savings are
available to every SK Python user targeting Anthropic models.

Description

Adds a new AnthropicCacheSettings model and wires it into
AnthropicChatPromptExecutionSettings as an excluded cache field. When
caching is enabled, prepare_settings_dict() injects cache_control blocks
onto the system message and the last tool definition right before the request
is sent to the Anthropic SDK.

Key design choices:

  • Caching is off by default. No behavior change for existing users. Opt in
    explicitly with cache=AnthropicCacheSettings.on().
  • All injection happens in the settings layer. No changes to
    AnthropicChatCompletion — the service is unaware of caching, keeping the
    change surface minimal and existing logic untouched.
  • TTL is exposed as "5m" or "1h". Translated internally to the correct
    API payload: {"type": "ephemeral"} or {"type": "ephemeral", "ttl": "1h"}.
  • Convenience constructors cover common patterns:
    .on(), .off(), .system(), .tools(), .short(), .long().

Public API additions:

from semantic_kernel.connectors.ai.anthropic import (
    AnthropicCacheSettings,
    AnthropicChatPromptExecutionSettings,
)

# cache system message + tools, 5-minute TTL
settings = AnthropicChatPromptExecutionSettings(
    cache=AnthropicCacheSettings.on(),
)

# cache system message only, 1-hour TTL
settings = AnthropicChatPromptExecutionSettings(
    cache=AnthropicCacheSettings.system(ttl="1h"),
)

Tests: 17 new unit tests covering all classmethods, both TTL variants,
serialization payload shape, and edge cases (empty system string, no tools,
pre-structured list[dict] system blocks, list mutation safety, cache field
excluded from serialized output). All 54 existing Anthropic tests continue to pass.

Sample: samples/concepts/caching/anthropic_prompt_caching.py — a
multi-turn chat demo with caching enabled on a large system prompt, showing
the first call writing the cache and subsequent calls reading from it.

Observed Token Savings

The following traces were captured against claude-haiku-4-5 using the same
prompt, with caching off vs. on. Both runs used AnthropicCacheSettings.on()
with the 5-minute TTL.

Without caching

Input: 12,421 tokens — Cache Read: 0 — Cache Write: 0

image

Input: 2,085 tokens — Cache Read: 9,112 tokens — Cache Write: 144 tokens

With caching active
image

83% reduction in billed input tokens on a warm cache hit. The 9,112 cached
tokens are served at 0.1x cost instead of full input price.

Contribution Checklist

Adds AnthropicCacheSettings and a `cache` field on AnthropicChatPromptExecutionSettings
to enable opt-in prompt caching via the Anthropic cache_control API.

When enabled, prepare_settings_dict() injects cache_control blocks on the system
message and the last tool definition before the request is sent. No changes to
AnthropicChatCompletion — caching is fully contained in the settings layer.

Off by default; opt in with cache=AnthropicCacheSettings.on().
Convenience constructors: .on() .off() .system() .tools() .short() .long()
TTL: "5m" -> {"type":"ephemeral"}, "1h" -> {"type":"ephemeral","ttl":3600}

Includes 16 new unit tests and a usage sample at
samples/concepts/caching/anthropic_prompt_caching.py.
Copilot AI review requested due to automatic review settings May 4, 2026 11:57
@Vizhy Vizhy requested a review from a team as a code owner May 4, 2026 11:57
@moonbox3 moonbox3 added the python Pull requests for the Python Semantic Kernel label May 4, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds opt-in prompt caching support to the Python Anthropic connector by introducing a cache settings model and injecting Anthropic cache_control blocks into the serialized request payload (system content block and/or the last tool definition).

Changes:

  • Introduces AnthropicCacheSettings and exposes it as a public API via semantic_kernel.connectors.ai.anthropic.
  • Extends AnthropicChatPromptExecutionSettings with an excluded cache field and injects cache_control during prepare_settings_dict().
  • Adds unit tests for caching settings/injection behavior and a new sample demonstrating prompt caching usage.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
python/tests/unit/connectors/ai/anthropic/test_anthropic_request_settings.py Adds unit tests covering cache settings constructors and prepare_settings_dict() injection behavior.
python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py Adds AnthropicCacheSettings, adds cache to execution settings (excluded from serialization), and injects cache_control into outbound payload.
python/semantic_kernel/connectors/ai/anthropic/init.py Exports AnthropicCacheSettings as part of the Anthropic connector public surface.
python/samples/concepts/caching/anthropic_prompt_caching.py Adds a runnable sample demonstrating multi-turn Anthropic prompt caching.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +16 to +20
class AnthropicCacheSettings(BaseModel):
"""Configuration for Anthropic prompt caching.

Controls which parts of the request receive cache_control injection.

Comment on lines +161 to +166
if self.cache.cache_tools:
tools: list[dict[str, Any]] | None = data.get("tools")
if tools:
tools = copy.deepcopy(tools)
tools[-1]["cache_control"] = cache_control
data["tools"] = tools
tools: list[dict[str, Any]] | None = data.get("tools")
if tools:
tools = copy.deepcopy(tools)
tools[-1]["cache_control"] = cache_control
Comment on lines +178 to +184
ctrl = AnthropicCacheSettings.on(ttl="5m")._cache_control()
assert ctrl == {"type": "ephemeral"}


def test_cache_control_1h():
ctrl = AnthropicCacheSettings.on(ttl="1h")._cache_control()
assert ctrl == {"type": "ephemeral", "ttl": 3600}
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 92%

✓ Correctness

The PR adds Anthropic prompt caching support with a well-structured AnthropicCacheSettings model and prepare_settings_dict override. There is one correctness bug: the _cache_control() method emits "ttl": 3600 (an integer) for the 1-hour TTL, but the Anthropic SDK's CacheControlEphemeralParam type defines ttl: Literal["5m", "1h"] — it expects the string "1h", not an integer. This will cause a runtime API error or silent rejection when 1-hour caching is used. The corresponding tests also assert the wrong expected value, so they pass but do not catch the bug.

✓ Security Reliability

This PR adds Anthropic prompt caching support via a new AnthropicCacheSettings model and prepare_settings_dict override. The implementation is clean from a security and reliability standpoint: TTL values are constrained by Literal["5m", "1h"], the cache field is correctly excluded from API serialization (exclude=True), tools are deep-copied before mutation to prevent side effects, and edge cases (empty system string, missing tools) are handled properly. No secrets, injection risks, resource leaks, or unsafe deserialization were found.

✓ Test Coverage

The new AnthropicCacheSettings class and its integration into prepare_settings_dict are well-tested, covering factory methods, TTL variants, edge cases (empty system, no tools), mutation protection, and serialization exclusion. However, the PR widens the system field type to accept list[dict[str, Any]] in addition to str, yet there is no test verifying behavior when system is passed as a pre-structured list with caching enabled. The code silently skips cache injection in that case (line 158: isinstance(system, str)), and a test should document this intended behavior.

✗ Design Approach

I found one design-level issue. The new caching API broadens system to accept Anthropic-native block lists, but the caching implementation only injects cache_control when system is a plain string. That makes the newly supported structured-system form a silent no-op for cache_system, which is a contract gap in the core feature rather than a missing edge-case test.

Suggestions

  • In python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py:156-159, treat system as one normalized content-block sequence for serialization so caching works for both supported input shapes, rather than special-casing only raw strings.

Automated review by Vizhy's agents

… blocks

- _cache_control() now emits {"ttl":"1h"} string per CacheControlEphemeralParam
  spec instead of integer 3600
- prepare_settings_dict() now injects cache_control on list[dict] system blocks
  in addition to plain strings, closing the silent no-op design gap
- add test covering cache injection when system is pre-structured as list[dict]
- update 1h TTL test assertions to match corrected string value
@Vizhy
Copy link
Copy Markdown
Author

Vizhy commented May 5, 2026

@microsoft-github-policy-service agree

@Vizhy
Copy link
Copy Markdown
Author

Vizhy commented May 5, 2026

Thanks for the thorough automated review — two valid issues were caught and both are addressed in the follow-up commit (da6de64):

1. TTL value fix (Correctness)
_cache_control() now emits {"ttl": "1h"} (string) instead of {"ttl": 3600} (integer), correctly matching the CacheControlEphemeralParam SDK type definition. The corresponding test assertions have been updated to match.

2. Pre-structured system blocks (Design Approach)
prepare_settings_dict() now handles both input shapes for system:

  • str → wrapped into a single content block with cache_control
  • list[dict]cache_control injected on the last block (same pattern used for tools), with a copy.deepcopy to avoid mutation

A test covering the list[dict] case has been added to document this behaviour explicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python Pull requests for the Python Semantic Kernel

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants