New Feature: Deterministic execution, replay, and audit gaps for long-running Semantic Kernel agent workflows #13435

### **Issue Description**
Semantic Kernel is increasingly used to orchestrate multi-step agent workflows that interact with external systems. However, there is currently no deterministic execution model that supports replayability, auditable state transitions, or safe recovery for long-running workflows.

When a workflow partially executes and fails, there is no reliable way to determine:

- Which steps completed
- Which external side effects occurred
- What memory state existed at the time of failure
- How to replay the workflow deterministically

This limits Semantic Kernel adoption in enterprise and regulated environments.

### Scenario description

I'm using Semantic Kernel to orchestrate AI-assisted workflows with these characteristics:
- Multi-step planning and execution
- Tool invocation with side effects (APIs, storage, queues)
- Memory reads and writes during execution
- Long-running execution (minutes to hours)
- Possible restarts, retries, or human-in-the-loop pauses

The workflow must be auditable, replayable, and recoverable.

### Example code

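```
// Illustrative only: this mixes planner-style calls (SequentialPlanner, RunAsync)
// from early Semantic Kernel releases with the newer builder syntax, so it may not
// compile against a single SK version. `endpoint` and `apiKey` are defined elsewhere.
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion("gpt-4", endpoint, apiKey)
    .Build();

var planner = new SequentialPlanner(kernel);

// Ask the planner to turn a natural-language goal into a multi-step plan.
var plan = await planner.CreatePlanAsync("""
    1. Analyze the incoming request
    2. Retrieve customer data
    3. Call external credit service
    4. Persist decision
    5. Notify downstream systems
    """);

// Correlation values made available to every step.
var context = new KernelArguments
{
    ["requestId"] = "REQ-123",
    ["tenantId"] = "TENANT-A"
};

// Runs all steps in a single call; nothing is checkpointed between them.
await kernel.RunAsync(plan, context);
```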

### Failure scenario

If the process crashes or restarts after step 3:

- Step 3 may already have triggered an external side effect.
- Memory may have been mutated.
- Steps 4 and 5 may or may not have executed.
- There is no execution record that clearly captures what happened.

Attempting to rerun the plan risks:

- duplicating side effects,
- violating idempotency guarantees expected by downstream systems,
- producing inconsistent results.


### Expected behavior

Semantic Kernel should provide primitives that allow:

- Explicit execution boundaries and checkpoints
- Deterministic replay of workflows with identical inputs
- Clear distinction between reasoning steps and side-effecting steps
- Auditable execution history tied to memory state
- Safe resume or compensation strategies after partial failure

### Actual behavior

- Execution has no explicit checkpoints.
- Side effects are not tracked or classified.
- Memory mutations are not versioned.
- There is no built-in execution log or replay mechanism.
- Developers must build custom infrastructure around Semantic Kernel to handle these concerns.

### **Proposed solution**

Introduce a first-class execution model for Semantic Kernel.

1. Execution step abstraction - Each step should be explicitly modeled:

```
public record ExecutionStep(
    string StepId,
    string FunctionName,
    ExecutionStepType StepType, // Reasoning | SideEffect | Idempotent
    ExecutionStatus Status,
    DateTimeOffset Timestamp
);
```
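
As a rough illustration of how such step records could answer "which steps completed" after a crash, here is a minimal sketch. The enum members and the `ExecutionJournal` class are assumptions made for this example, not existing Semantic Kernel types, and the journal is in-memory only where a real one would need durable storage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enums: the member names are assumptions for illustration.
public enum ExecutionStepType { Reasoning, SideEffect, Idempotent }
public enum ExecutionStatus { Pending, Started, Completed, Failed }

// Same shape as the record proposed above.
public record ExecutionStep(
    string StepId,
    string FunctionName,
    ExecutionStepType StepType,
    ExecutionStatus Status,
    DateTimeOffset Timestamp);

// Hypothetical append-only journal of step state transitions.
public sealed class ExecutionJournal
{
    private readonly List<ExecutionStep> _entries = new();

    public void Append(ExecutionStep step) => _entries.Add(step);

    // Most recent entry for each step, in the order steps first appeared.
    public IReadOnlyList<ExecutionStep> LatestPerStep() =>
        _entries.GroupBy(e => e.StepId).Select(g => g.Last()).ToList();

    // Side-effecting steps that started but never recorded completion.
    // These are the ones whose external effect is unknown after a crash.
    public IReadOnlyList<ExecutionStep> UncertainSideEffects() =>
        LatestPerStep()
            .Where(s => s.StepType == ExecutionStepType.SideEffect
                     && s.Status == ExecutionStatus.Started)
            .ToList();
}
```

Writing a `Started` entry before invoking a tool and a `Completed` entry afterwards is what makes the "step 3 crashed midway" case visible: the step shows up in `UncertainSideEffects()` instead of silently being rerun.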
 
2. Execution checkpoints - Allow workflows to declare checkpoints:

```
kernel.Options.EnableCheckpoints = true;
kernel.Options.CheckpointInterval = CheckpointInterval.AfterEachStep;
```

Checkpoints should capture:

- memory snapshot
- step execution state
- inputs and outputs
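
One possible shape for such a checkpoint is sketched below, assuming string-valued memory and arguments for simplicity. The `Checkpoint` record and `CheckpointSerializer` helper are hypothetical names; only `System.Text.Json` from the .NET base library is used.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical checkpoint shape covering the three items listed above.
public record Checkpoint(
    string ExecutionId,
    string StepId,
    string StepStatus,                    // step execution state
    Dictionary<string, string> Memory,    // memory snapshot at this step
    Dictionary<string, string> Inputs,
    Dictionary<string, string> Outputs);

public static class CheckpointSerializer
{
    // Serialize to JSON so the checkpoint can be written to any durable store.
    public static string Save(Checkpoint checkpoint) =>
        JsonSerializer.Serialize(checkpoint);

    // Restore a checkpoint; a null payload is treated as corruption.
    public static Checkpoint Load(string json) =>
        JsonSerializer.Deserialize<Checkpoint>(json)
            ?? throw new InvalidOperationException("Checkpoint payload was empty.");
}
```

Keeping the checkpoint a plain serializable record means the storage backend (blob, table, queue) stays an implementation detail of the host application.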

3. Deterministic replay mode - Provide a replay API:

```
await kernel.ReplayAsync(
    executionId: "exec-123",
    ReplayMode.Deterministic
);
```

Replay mode would:

- reuse recorded decisions and tool outputs (when safe),
- avoid re-triggering side effects,
- allow inspection and debugging.
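
The record-then-replay behavior described above could look roughly like the following sketch. `RunMode` and `RecordingToolInvoker` are hypothetical names chosen for this example, and recorded outputs are kept in memory rather than in a durable execution log.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public enum RunMode { Record, Replay }

// Hypothetical wrapper: records tool outputs on the live run and serves
// them back during replay so the underlying tool is not invoked again.
public sealed class RecordingToolInvoker
{
    private readonly Dictionary<string, string> _recorded = new();

    public RunMode Mode { get; set; } = RunMode.Record;

    public async Task<string> InvokeAsync(string stepId, Func<Task<string>> tool)
    {
        if (Mode == RunMode.Replay)
        {
            // Replay never falls through to the live tool: a missing record
            // means the run diverged from the recorded history.
            return _recorded.TryGetValue(stepId, out var saved)
                ? saved
                : throw new InvalidOperationException(
                    $"No recorded output for step '{stepId}'.");
        }

        var output = await tool();
        _recorded[stepId] = output;
        return output;
    }
}
```

Failing loudly on a missing record, rather than calling the tool, is a deliberate choice here: it keeps replay free of side effects even when the recorded history is incomplete.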

4. Side-effect classification for tools - Allow tool authors to declare behavior:

```
[KernelFunction(SideEffect = SideEffectType.External)]
public async Task NotifyDownstreamAsync(...) { }
```

This enables:

- safe retries,
- compensation logic,
- replay without duplication.
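
To show how a declared classification could drive retry decisions, here is a small sketch using a stand-alone attribute. `SideEffectAttribute`, `RetryPolicy`, `CreditTools`, and the `None` and `Idempotent` enum members are assumptions for illustration; the proposal above would carry the same information on `[KernelFunction]` instead.

```csharp
using System;
using System.Reflection;

// Only External appears in the proposal; the other members are assumed.
public enum SideEffectType { None, Idempotent, External }

// Hypothetical stand-alone attribute carrying the classification.
[AttributeUsage(AttributeTargets.Method)]
public sealed class SideEffectAttribute : Attribute
{
    public SideEffectAttribute(SideEffectType kind) => Kind = kind;
    public SideEffectType Kind { get; }
}

public static class RetryPolicy
{
    // Unannotated methods are treated as External, the most conservative choice.
    public static bool CanRetryAutomatically(MethodInfo method)
    {
        var declared = method.GetCustomAttribute<SideEffectAttribute>()?.Kind
                       ?? SideEffectType.External;
        return declared != SideEffectType.External;
    }
}

// Example tool class with one pure step and one side-effecting step.
public class CreditTools
{
    [SideEffect(SideEffectType.None)]
    public string AnalyzeRequest(string request) => request.Trim();

    [SideEffect(SideEffectType.External)]
    public void NotifyDownstream(string decision)
    {
        // Would call an external system in a real tool.
    }
}
```

With this in place, an orchestrator could retry `AnalyzeRequest` freely while routing a failed `NotifyDownstream` to compensation or manual review.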

5. Versioned memory snapshots - Memory should be versioned per execution step:

`var snapshot = kernel.Memory.GetSnapshot(stepId);`

This allows:

- forensic audit,
- regulatory review,
- decision explainability.
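
A minimal sketch of per-step versioning follows, assuming string keys and values. `VersionedMemory` is a hypothetical class, not the Semantic Kernel memory API; it relies on `System.Collections.Immutable` so that earlier snapshots cannot be changed by later writes.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Hypothetical versioned store: every write is attributed to a step and
// produces a new immutable snapshot, so earlier states stay readable.
public sealed class VersionedMemory
{
    private ImmutableDictionary<string, string> _current =
        ImmutableDictionary<string, string>.Empty;

    private readonly Dictionary<string, ImmutableDictionary<string, string>> _byStep = new();

    public void Write(string stepId, string key, string value)
    {
        _current = _current.SetItem(key, value);
        // Overwrites any earlier snapshot for the same step, so the stored
        // snapshot reflects the state after that step's latest write.
        _byStep[stepId] = _current;
    }

    public IReadOnlyDictionary<string, string> GetSnapshot(string stepId) =>
        _byStep.TryGetValue(stepId, out var snapshot)
            ? snapshot
            : throw new KeyNotFoundException(
                $"No snapshot recorded for step '{stepId}'.");
}
```

Because each snapshot is an immutable dictionary, an auditor reading the state at step 2 sees exactly what the workflow saw then, regardless of what step 3 wrote afterwards.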
 


### **Alternatives considered**

1. External workflow engines (Durable Functions, Temporal, Camunda) can provide replay and checkpoints, but they do not address Semantic Kernel-specific needs such as plan/step semantics, tool invocation classification, and memory evolution tied to execution steps. Teams still end up building a custom “control layer” around SK.
2. Application-level logging is insufficient because it does not provide deterministic replay, memory reconstruction, or safe side-effect handling.
