Python: .NET & Python: [Feature]: Externalize Durable Agents conversation storage to customer-owned stores #6675

Description

@cgillum

Summary

The Durable Agents extension (Microsoft.Agents.AI.DurableTask for .NET, agent-framework-durabletask for Python) currently stores the full conversation history for every agent session inside Durable Task framework (DTF) entity state. This causes two adoption-blocking problems:

  1. Hard ~1 MB size ceiling. The Durable Task Scheduler (DTS) limits a single entity payload to ~1 MB by default. Because each turn re-checkpoints the full conversation history, long-running sessions hit a wall. The LargePayloadStorage interceptor in durabletask-dotnet raises this to ~10 MB but does not remove the underlying unbounded-growth problem.
  2. No customer ownership of conversation data. Conversations live only inside DTF entity state, opaque to the customer's data plane. Compliance, audit, and analytics customers need transcripts in storage they own and operate (e.g., their own Cosmos DB).
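To make the growth problem in item 1 concrete, here is a rough back-of-the-envelope sketch. The 4 KB average turn size and the round 1,000,000-byte limit are illustrative assumptions, not measurements, and the helper names are hypothetical. Because each turn re-checkpoints the full history, the number of turns that fit grows only linearly with the limit, while the total bytes checkpointed grow roughly quadratically.

```python
# Illustrative arithmetic only: avg_turn_bytes and the limits are assumed values.

def turns_until_limit(limit_bytes: int, avg_turn_bytes: int) -> int:
    """Number of turns whose accumulated history still fits under the limit."""
    return limit_bytes // avg_turn_bytes


def total_bytes_checkpointed(turns: int, avg_turn_bytes: int) -> int:
    """Bytes written if every turn re-checkpoints the whole history so far.

    Turn i writes i * avg_turn_bytes, so the sum is avg * n * (n + 1) / 2.
    """
    return avg_turn_bytes * turns * (turns + 1) // 2


ONE_MB = 1_000_000   # assumed round figure for the ~1 MB default
AVG_TURN = 4_000     # assumed average size of one request/response pair

default_turns = turns_until_limit(ONE_MB, AVG_TURN)
raised_turns = turns_until_limit(10 * ONE_MB, AVG_TURN)

print(default_turns)                                      # 250
print(raised_turns)                                       # 2500
print(total_bytes_checkpointed(default_turns, AVG_TURN))  # 125500000
```

Under these assumptions, raising the ceiling tenfold buys ten times as many turns but does not change the shape of the curve, which is why the larger limit is a mitigation rather than a fix.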

Proposed solution

Introduce a pluggable DurableAgentConversationStore extension point that lets customers redirect durable-agent conversation history into a store they own, while preserving today's default in-entity-state behavior. Shipped implementations:

  • EntityStateConversationStore: the default; preserves today's behavior (exactly-once; the store is entity state).
  • A first-class backend store (e.g., Cosmos) that persists an idempotency marker atomically with each turn — recommended for production (exactly-once).
  • ChatHistoryProviderConversationStore / HistoryProviderConversationStore — a low-friction adapter bridging to any existing ChatHistoryProvider (.NET) / HistoryProvider (Python). Documented as at-least-once because the existing provider contract has no correlation-keyed backend primitive.
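The at-least-once caveat on the bridge adapter can be illustrated with a small simulation. This is a sketch under assumptions: `AppendOnlyProvider`, `bridge_commit`, and the separate `completed` marker set are hypothetical stand-ins, not the real provider API or the design in the ADR. The point is only that when the append and the idempotency marker are two separate writes, a failure between them leaves a window in which a retry appends the same turn again.

```python
class AppendOnlyProvider:
    """Hypothetical stand-in for an existing history provider.

    It can only append messages; it has no correlation-keyed write.
    """

    def __init__(self):
        self.messages = []

    def add_messages(self, messages):
        self.messages.extend(messages)


class SimulatedCrash(Exception):
    """Raised to model a worker failure between the two writes."""


def bridge_commit(provider, completed, correlation_id, request, response,
                  crash_before_marker=False):
    # Replay check against a marker kept outside the provider (assumption).
    if correlation_id in completed:
        return
    # Write 1: append to the provider.
    provider.add_messages([request, response])
    if crash_before_marker:
        raise SimulatedCrash("failed after append, before marker")
    # Write 2: record the marker. Not atomic with write 1.
    completed.add(correlation_id)


provider = AppendOnlyProvider()
completed = set()

try:
    bridge_commit(provider, completed, "turn-1", "hi", "hello",
                  crash_before_marker=True)
except SimulatedCrash:
    pass

# The retry cannot see the earlier append, so the turn is written twice.
bridge_commit(provider, completed, "turn-1", "hi", "hello")
print(provider.messages)  # ['hi', 'hello', 'hi', 'hello']
```

A backend that persists the marker atomically with the turn closes this window, which is the distinction the first-class store bullet draws.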

The contract centers on an atomic CommitTurnAsync(correlationId, request, response) primitive plus a TryGetTurnAsync(correlationId) replay check. Read-back is via a new DurableTaskClient.GetAgentConversationHistoryAsync(sessionId) extension method.
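As a rough illustration of how the two primitives might fit together, the sketch below models the contract with an in-memory store. Everything here is hypothetical: the Python names (`commit_turn`, `try_get_turn`, `history`, `run_turn`), the `Turn` shape, and the synchronous signatures are assumptions made for illustration, and the real abstraction may differ. The idea shown is that a commit keyed by correlation ID is idempotent, and that a replay check before invoking the agent avoids re-running a turn that was already committed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Turn:
    correlation_id: str
    request: str
    response: str


class InMemoryConversationStore:
    """Hypothetical in-memory model of the proposed store contract."""

    def __init__(self) -> None:
        # dict preserves insertion order, so it doubles as the transcript.
        self._turns: Dict[str, Turn] = {}

    def try_get_turn(self, correlation_id: str) -> Optional[Turn]:
        """Replay check: return the committed turn, or None if absent."""
        return self._turns.get(correlation_id)

    def commit_turn(self, correlation_id: str, request: str,
                    response: str) -> Turn:
        """Record the turn once; a repeated commit returns the first one."""
        existing = self._turns.get(correlation_id)
        if existing is not None:
            return existing
        turn = Turn(correlation_id, request, response)
        self._turns[correlation_id] = turn
        return turn

    def history(self) -> List[Turn]:
        return list(self._turns.values())


def run_turn(store: InMemoryConversationStore, correlation_id: str,
             request: str, agent: Callable[[str], str]) -> str:
    """Run one turn, skipping the agent call if it was already committed."""
    replayed = store.try_get_turn(correlation_id)
    if replayed is not None:
        return replayed.response
    response = agent(request)
    store.commit_turn(correlation_id, request, response)
    return response


calls: List[str] = []


def fake_agent(request: str) -> str:
    calls.append(request)
    return request.upper()


store = InMemoryConversationStore()
first = run_turn(store, "turn-1", "hello", fake_agent)
second = run_turn(store, "turn-1", "hello", fake_agent)  # simulated replay

print(first, second)         # HELLO HELLO
print(len(calls))            # 1
print(len(store.history()))  # 1
```

In a real backend the check-and-write inside `commit_turn` would need to be a single atomic operation; the in-memory version only models the observable behavior.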

Full design, alternatives, and the idempotency analysis are captured in ADR-0027 (PR to follow).

Out of scope (tracked separately)

  • Entity-level compaction (the contract reserves a ReplaceAsync primitive for it).
  • Per-content externalization of large AIContent items.
  • Per-tool checkpointing inside a turn (to close the tool-double-execution window).
  • Reconciling .NET vs. Python entity-state schema-version handling.
  • A reference-holding external HistoryProvider for Python (none exists today).

Acceptance criteria

  • ADR-0027 reviewed and accepted by deciders.
  • .NET: DurableAgentConversationStore abstraction + default store + first-class backend store + bridge + UseConversationStore option + read-back extension + lazy migration + schema bump to 1.2.0.
  • Python parity for the storage abstraction and worker configuration.
  • Unit + integration tests, including the at-least-once duplicate-window negative test and exactly-once recovery test.
  • Documentation under docs/features/durable-agents/, including the LargePayloadStorage mitigation as a Phase-1 alternative.

References

  • dotnet/src/Microsoft.Agents.AI.DurableTask/AgentEntity.cs — current dual-storage entity.
  • dotnet/src/Microsoft.Agents.AI.Abstractions/ChatHistoryProvider.cs — provider abstraction to bridge.
  • python/packages/durabletask/agent_framework_durabletask/_entities.py — Python entity (same dual-storage pattern).
