feat(agentserver): Durable Tasks for azure-ai-agentserver-core#46997
Open
RaviPidaparthi wants to merge 33 commits into
…-core

Implements a crash-resilient durable task system with:
- @durable_task decorator with full lifecycle management (start, run, get, cancel, terminate)
- TaskResult[Output] wrapper replacing exception-based suspension handling
- Cooperative cancellation and configurable timeouts
- Configurable retry policies with backoff
- Callable factories for tags, title, and description
- Local in-memory provider for development/testing
- Task streaming support via AsyncIterator
- Lease-based distributed locking
- Ephemeral and persistent task modes
- Task metadata and source provenance tracking

Includes:
- 248 passing tests across 17 test modules
- 3 sample applications (retry, source, streaming)
- Developer guide documentation
- Spec files (001-006) covering all design decisions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- TaskMetadata: add MutableMapping dict protocol (__setitem__, __getitem__, __delitem__, __contains__, __iter__, __len__, keys, values, items) with dirty-tracking on mutations
- Fix cspell CI failures: rename 'sess' abbreviations in _models.py, test_local_provider.py, test_models.py, test_source.py
- CHANGELOG 2.0.0b4: document all durable long-running agent features
- README: add durable agents section with code examples and dev guide link
- Developer guide: update metadata examples to dict-style syntax
- Invocations: bump core dep to >=2.0.0b4, add durable samples changelog
- Specs 001-007 and backlog: all 16 items resolved

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
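A minimal sketch of what a dict-style mapping with dirty-tracking on mutations can look like. The class name `DirtyTrackingMetadata`, the `dirty` attribute, and `mark_clean()` are assumptions for illustration; the PR's `TaskMetadata` may be structured differently.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator


class DirtyTrackingMetadata(MutableMapping):
    """Hypothetical dict-like store that records whether it has unsaved changes."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self.dirty = False  # True after any mutation until mark_clean() is called

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value
        self.dirty = True

    def __delitem__(self, key: str) -> None:
        del self._data[key]
        self.dirty = True

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def mark_clean(self) -> None:
        """Call after a successful flush to the task store."""
        self.dirty = False
```

Subclassing `MutableMapping` supplies `keys`, `values`, `items`, and `__contains__` from the five methods defined here, so only the mutating methods need to set the flag.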
Explain the problem (containers can die), the 4-step durability mechanism (persist → lease → recover → complete), and the net effect before listing what the developer doesn't need to think about. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify that durable tasks are not a checkpoint/replay engine, not a result store, not a stream log, not app-level persistence, and not unbounded storage. Fix misleading 'checkpoint progress' language to 'lightweight progress signals'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify that the framework recovers crashed tasks on container restart automatically, not in response to a caller calling .run() again. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix name default: __qualname__, not 'Function name'
- Add missing ctx.agent_name and ctx.lease_generation to properties table
- Fix recovery description: automatic at startup + on .run()/.start()
- Fix cancel semantics: function returning normally = success, not TaskCancelled
- Update cancel vs terminate table with accurate outcomes
- Fix resume docs: both .run() and .start() handle suspended tasks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Sphinx: remove durable re-exports from core/__init__.py to fix duplicate object description warnings (symbols documented at both core and core.durable levels)
- MyPy: fix 3 type errors (_run.py Future type, _manager.py narrowing)
- Pylint: fix 55 issues across 7 files (docstrings, unused imports, import ordering, complexity suppressions)
- Constitution v1.3.0: add pre-push validation gate (NON-NEGOTIABLE)

All checks pass locally: pylint 10.00/10, mypy clean, sphinx clean, 261 tests passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ng, samples

Steering:
- Full steering implementation with generation model, pending queue, drain logic
- ctx.was_steered, ctx.previous_input, ctx.pending_inputs, ctx.generation
- SteeringQueueFull exception, TaskResult.is_superseded
- Completion-vs-steering race handling with etag
- Crash recovery with drain_in_progress flag

Task listing:
- DurableTask.list(status, session_id) with auto-scoping per function
- Server-side: agent_name, session_id, tag, status filters
- Client-side: source.type filter (until DEV-009 resolved)
- Provider protocol + local provider tag AND filtering

Reserved tag protection:
- _strip_reserved_tags() at all entry points (decorator, callsite, options)
- Framework auto-stamps _durable_task_name tag, always wins

Recovery routing:
- _find_resume_callback() matches source.name first (stable anchor)
- name param documented as stable identity anchor

Other:
- Local provider payload merge fixed to strict shallow (spec §11)
- steering_poll_seconds removed from public API (internal 2s default kept)
- Multi-worker references removed (single-container model)
- Developer guide cleaned of internal implementation details
- Steering spec updated to match implementation
- Samples: durable_claude, durable_copilot, updated durable_langgraph

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ming

Replace hardcoded asyncio.Queue with a pluggable StreamHandler protocol (put/get/close) for the durable task streaming path.

Changes:
- New _stream.py: StreamHandler protocol + QueueStreamHandler default
- Refactored _context.py, _run.py, _manager.py: _stream_queue -> _stream_handler
- Added stream_handler param to start()/run() in _decorator.py
- Updated __init__.py exports
- Updated test_streaming.py and test_sample_e2e.py
- Updated developer guide with Custom Stream Handlers section
- SSE streaming samples and invocations framework updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
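As an illustration of the put/get/close shape described in this commit, here is a minimal sketch of a runtime-checkable protocol with a queue-backed default. The names `SketchStreamHandler` and `SketchQueueHandler`, the method signatures, and the use of `None` from `get()` to signal a closed stream are all assumptions, not the contents of the PR's `_stream.py`.

```python
import asyncio
from typing import Any, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class SketchStreamHandler(Protocol):
    """Hypothetical transport contract: producers put, consumers get, owner closes."""

    async def put(self, item: Any) -> None: ...
    async def get(self) -> Optional[Any]: ...
    async def close(self) -> None: ...


_CLOSED = object()  # internal end-of-stream marker (an assumption of this sketch)


class SketchQueueHandler:
    """Queue-backed handler; get() returns None once the stream has been closed."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def put(self, item: Any) -> None:
        await self._queue.put(item)

    async def get(self) -> Optional[Any]:
        item = await self._queue.get()
        if item is _CLOSED:
            # Re-queue the marker so any later get() also observes the close.
            await self._queue.put(_CLOSED)
            return None
        return item

    async def close(self) -> None:
        await self._queue.put(_CLOSED)


async def drain(handler: SketchStreamHandler) -> List[Any]:
    """Collect items until the handler reports end of stream."""
    items: List[Any] = []
    while (item := await handler.get()) is not None:
        items.append(item)
    return items


async def demo() -> List[Any]:
    handler = SketchQueueHandler()
    await handler.put("chunk-1")
    await handler.put("chunk-2")
    await handler.close()
    return await drain(handler)
```

Because the protocol is structural, a Redis- or WebSocket-backed class would satisfy it without inheriting from anything, as long as it exposes the same three coroutines.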
- Add get_active_run() to DurableTaskManager and DurableTask decorator for late-join stream consumers
- Add comprehensive StreamHandler test suite (12 tests): custom handler dispatch, default behavior, steering carry-over, close on success/failure, error propagation, late-join via get_active_run, protocol conformance
- Fix LangGraph sample to use ctx.stream() instead of private queue
- Update developer guide with late-join consumer documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Recovery and resume paths previously defaulted to QueueStreamHandler, silently losing any custom stream transport. Add stream_handler_factory to the decorator so the framework can reconstruct the correct handler on crash-recovery and resume without a caller.

Resolution order: call-site handler > factory > QueueStreamHandler.

- Add StreamHandlerFactory type alias to _stream.py
- Add stream_handler_factory to DurableTaskOptions and @durable_task
- Thread stream_handler through _start_existing_task (resume/recovery)
- Use factory fallback in both create_and_start and _start_existing_task
- Add 3 tests: factory on fresh, call-site override, factory on recovery
- Update developer guide with factory docs and decorator options table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
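The resolution order stated in this commit (call-site handler, then factory, then the queue default) can be sketched as a small function. The function name `resolve_stream_handler`, the zero-argument factory signature, and the two stand-in classes are assumptions for illustration only.

```python
from typing import Any, Callable, Optional


class DefaultHandler:
    """Stand-in for the default queue-based handler."""


class CustomHandler:
    """Stand-in for a custom transport (for example Redis or WebSocket)."""


def resolve_stream_handler(
    call_site_handler: Optional[Any] = None,
    factory: Optional[Callable[[], Any]] = None,
) -> Any:
    """Pick a handler: explicit call-site value, else factory, else default."""
    if call_site_handler is not None:
        return call_site_handler
    if factory is not None:
        # On crash-recovery there is no caller, so the factory is the only
        # way to rebuild a non-default transport.
        return factory()
    return DefaultHandler()
```

With this ordering, a recovered task that was originally started with a call-site handler still falls back to the factory, which is why the commit registers the factory on the decorator rather than on the call.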
…urable-tasks

# Conflicts:
# sdk/agentserver/.gitignore
# sdk/agentserver/azure-ai-agentserver-core/CHANGELOG.md
# sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_base.py
# sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_tracing.py
# sdk/agentserver/azure-ai-agentserver-core/samples/selfhosted_invocation/selfhosted_invocation.py
# sdk/agentserver/azure-ai-agentserver-core/tests/test_tracing_e2e.py
# sdk/agentserver/azure-ai-agentserver-invocations/CHANGELOG.md
# sdk/agentserver/azure-ai-agentserver-invocations/azure/ai/agentserver/invocations/_invocation.py
# sdk/agentserver/azure-ai-agentserver-invocations/tests/conftest.py
# sdk/agentserver/azure-ai-agentserver-invocations/tests/test_span_parenting.py
# sdk/agentserver/azure-ai-agentserver-invocations/tests/test_tracing.py
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Implements spec-009’s “pluggable stream handler” work for the durable task framework by introducing a StreamHandler protocol with a default QueueStreamHandler, plus related durable-task capabilities (retry, resume route, metadata, samples/tests) and extensive formatting/tidying across tests and samples.
Changes:
- Added a pluggable streaming abstraction (`StreamHandler`, `QueueStreamHandler`, factory type) and wired it into `TaskContext.stream()` and `TaskRun` async iteration.
- Introduced/expanded durable-task building blocks: `TaskResult`, `RetryPolicy`, resume HTTP route, hosted provider client, lease renewal helper, and substantial new test coverage + samples.
- Updated docs/changelogs and reformatted various tests/samples for style consistency.
Reviewed changes
Copilot reviewed 88 out of 92 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_tracing_e2e.py | Formatting-only adjustments (line wrapping/blank lines). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_session_id.py | Formatting-only adjustments (blank lines, wrapped AsyncClient context). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_server_routes.py | Formatting-only adjustments (blank lines). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_request_limits.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_request_id.py | Formatting-only adjustments. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_multimodal_protocol.py | Minor whitespace cleanup and section spacing. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_invoke.py | Formatting-only adjustments (blank lines). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_graceful_shutdown.py | Formatting + wrapped long asserts for readability. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_get_cancel.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_edge_cases.py | Formatting-only adjustments (blank lines). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_decorator_pattern.py | Formatting (wrapped JSONResponse returns). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/streaming_invoke_agent.py | Reformatted token list for readability. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/simple_invoke_agent.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/multiturn_invoke_agent.py | Formatting; JSONResponse construction wrapped. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py | New sample persistence helper (file-backed JSON store). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/requirements.txt | New sample requirements. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/app.py | New durable multiturn sample host wiring. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/agent.py | New durable multiturn sample agent task. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/store.py | New sample persistence helper (file-backed JSON store). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/requirements.txt | New sample requirements (LangGraph + deps). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/app.py | New streaming + steering durable LangGraph host sample. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/store.py | New sample persistence helper (file-backed JSON store). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/requirements.txt | New sample requirements (Copilot SDK, core, Starlette, uvicorn). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/app.py | New durable Copilot host sample with SSE. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/agent.py | New steerable durable Copilot agent sample. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/store.py | New sample persistence helper (file-backed JSON store). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/requirements.txt | New sample requirements (Anthropic SDK + runtime deps). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/app.py | New durable Claude host sample with SSE. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/agent.py | New steerable durable Claude agent sample. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/async_invoke_agent.py | Formatting-only adjustments (wrapped JSON dict literals). |
| sdk/agentserver/azure-ai-agentserver-invocations/CHANGELOG.md | Changelog updates to mention durable samples + dependency bump. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_tracing.py | Formatting-only adjustments. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_startup_logging.py | Formatting-only adjustments and wrapped long lines. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_server_routes.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_logger.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_graceful_shutdown.py | Formatting-only adjustments and wrapped long asserts. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_config.py | Formatting for long function signatures. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_task_result.py | New tests for TaskResult wrapper behavior + guardrails. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_streaming.py | New tests for pluggable stream handler integration. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_source.py | New tests exercising source field persistence. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_retry.py | New tests for RetryPolicy and retry integration. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_resume_route.py | New tests for the resume HTTP route behavior. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_models.py | New tests for durable models/exceptions. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_metadata.py | New tests for dict-like TaskMetadata + flush semantics. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_local_provider.py | New tests for local durable provider CRUD/listing. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_lifecycle.py | New lifecycle automation tests. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_get.py | New tests for DurableTask.get(). |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_entry_mode.py | New tests for ctx.entry_mode across paths. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_decorator.py | New tests for @durable_task decorator/options/type extraction. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_cancellation_timeout.py | New tests for cancellation, timeout, and termination. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_callable_factories.py | New tests for callable factories on tags/description. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/__init__.py | New package init for durable tests. |
| sdk/agentserver/azure-ai-agentserver-core/tests/conftest.py | Formatting-only adjustments. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/requirements.txt | New durable sample requirements. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/durable_streaming.py | New sample demonstrating streaming with durable tasks. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_source/requirements.txt | New durable sample requirements. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_source/durable_source.py | New sample demonstrating source usage. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_retry/requirements.txt | New durable sample requirements. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_retry/durable_retry.py | New sample demonstrating retry policies. |
| sdk/agentserver/azure-ai-agentserver-core/pyproject.toml | Added httpx dependency + optional hosted extras (azure-identity). |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_stream.py | New StreamHandler protocol + default QueueStreamHandler + factory alias. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_run.py | New TaskRun async-iter streaming integration and lifecycle control methods. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_retry.py | New RetryPolicy implementation and presets. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_resume_route.py | New Starlette route for POST /tasks/resume. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_result.py | New TaskResult wrapper class. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_provider.py | New storage provider protocol for durable subsystem. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_metadata.py | New dict-like TaskMetadata with flush/auto-flush. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_lease.py | New lease identity utilities + renewal loop. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_exceptions.py | New durable exception types (failed/suspended/cancelled/etc.). |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_context.py | New TaskContext with stream support and lifecycle fields. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_client.py | New hosted durable task provider httpx client. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/__init__.py | New public durable API exports. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_middleware.py | Formatting-only adjustments for imports/log calls. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_errors.py | Minor formatting simplification. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_config.py | Minor formatting simplification. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/__init__.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-core/README.md | Added durable-task documentation section + link. |
| sdk/agentserver/azure-ai-agentserver-core/CHANGELOG.md | Large changelog entry documenting durable subsystem and other changes. |
| sdk/agentserver/.gitignore | Added .vscode/ ignore. |
Comments suppressed due to low confidence (1)
sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py:1
- For JSON persistence, it’s better to write/read with an explicit encoding (UTF-8) for cross-platform consistency. Consider using `open(fd, "w", encoding="utf-8")` (or `os.fdopen`) and also using `read_text(encoding="utf-8")` in `load()` to avoid platform-default encoding surprises.
Comment on lines +57 to +72

    if initial_delay.total_seconds() < 0:
        raise ValueError(f"initial_delay must be >= 0, got {initial_delay}")
    if max_attempts < 1 and not (
        max_attempts == 1 and initial_delay == timedelta(0)
    ):
        pass  # allow no_retry preset
    if backoff_coefficient < 1.0:
        raise ValueError(
            f"backoff_coefficient must be >= 1.0, got {backoff_coefficient}"
        )
    if max_delay < initial_delay:
        raise ValueError(
            f"max_delay ({max_delay}) must be >= initial_delay ({initial_delay})"
        )
    if max_attempts < 1:
        raise ValueError(f"max_attempts must be >= 1, got {max_attempts}")
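In the lines quoted above, the first `max_attempts` check ends in `pass`, so only the final check can raise. A minimal sketch of the same validation with that no-op branch removed and the original check order kept. The function name `validate_retry_args` and the standalone-function shape are assumptions for illustration, not the PR's `RetryPolicy` code.

```python
from datetime import timedelta


def validate_retry_args(
    max_attempts: int,
    initial_delay: timedelta,
    max_delay: timedelta,
    backoff_coefficient: float,
) -> None:
    """Raise ValueError on the first invalid argument, in the quoted order."""
    if initial_delay.total_seconds() < 0:
        raise ValueError(f"initial_delay must be >= 0, got {initial_delay}")
    if backoff_coefficient < 1.0:
        raise ValueError(
            f"backoff_coefficient must be >= 1.0, got {backoff_coefficient}"
        )
    if max_delay < initial_delay:
        raise ValueError(
            f"max_delay ({max_delay}) must be >= initial_delay ({initial_delay})"
        )
    # A single attempt with zero delay (a no-retry configuration) passes
    # every check above, so no special case is needed for it.
    if max_attempts < 1:
        raise ValueError(f"max_attempts must be >= 1, got {max_attempts}")
```

Keeping the checks in the same order means an input that violates several constraints still reports the same first error as the quoted code.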
Comment on lines +191 to +192

    except Exception as exc:
        if "not found" in str(exc).lower():
Comment on lines +210 to +213

    if task_info.payload and "metadata" in task_info.payload:
        meta_data: dict[str, Any] = task_info.payload["metadata"]
        for key, value in meta_data.items():
            self._metadata.set(key, value)
        and self._flush_callback is not None
        and self._flush_task is None
    ):
        self._flush_task = asyncio.get_event_loop().create_task(
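The fragment above schedules a flush with `asyncio.get_event_loop().create_task(...)`. A hedged sketch of the same coalescing pattern using `asyncio.get_running_loop()`, which raises `RuntimeError` when no loop is running and so fails loudly if called from synchronous code. The `AutoFlusher` class and its method names are hypothetical stand-ins, not the PR's metadata implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AutoFlusher:
    """Hypothetical helper that coalesces mutations into one pending flush task."""

    def __init__(self, flush_callback: Callable[[], Awaitable[None]]) -> None:
        self._flush_callback = flush_callback
        self._flush_task: Optional[asyncio.Task] = None

    def mark_dirty(self) -> None:
        """Schedule a flush unless one is already pending."""
        if self._flush_task is None:
            # get_running_loop() raises RuntimeError outside a running loop.
            loop = asyncio.get_running_loop()
            self._flush_task = loop.create_task(self._flush())

    async def _flush(self) -> None:
        try:
            await self._flush_callback()
        finally:
            # Allow the next mutation to schedule a new flush.
            self._flush_task = None


async def demo() -> int:
    calls = 0

    async def flush() -> None:
        nonlocal calls
        calls += 1

    flusher = AutoFlusher(flush)
    flusher.mark_dirty()
    flusher.mark_dirty()  # coalesced into the already-pending task
    pending = flusher._flush_task
    assert pending is not None
    await pending
    return calls
```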
Comment on lines +60 to +67

    except Exception as exc:  # pylint: disable=broad-exception-caught
        msg = str(exc).lower()
        if "not found" in msg:
            return Response(status_code=404)
        if "not 'suspended'" in msg or "already" in msg or "conflict" in msg:
            return Response(status_code=409)
        logger.error("Resume failed for task %s: %s", task_id, exc, exc_info=True)
        return Response(status_code=500)
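The handler quoted above picks the status code by matching substrings of the exception message. One alternative, sketched here with hypothetical exception classes `TaskNotFoundError` and `TaskConflictError` (neither is confirmed to exist in the PR), is to map exception types to status codes so the result does not depend on message wording.

```python
from typing import Tuple, Type


class TaskNotFoundError(Exception):
    """Hypothetical: the task id does not exist in the store."""


class TaskConflictError(Exception):
    """Hypothetical: the task is not in a state that allows resume."""


# Ordered (exception type, HTTP status) pairs; the first match wins.
_STATUS_BY_ERROR: Tuple[Tuple[Type[Exception], int], ...] = (
    (TaskNotFoundError, 404),
    (TaskConflictError, 409),
)


def status_for(exc: Exception) -> int:
    """Translate an exception into an HTTP status code, defaulting to 500."""
    for error_type, status in _STATUS_BY_ERROR:
        if isinstance(exc, error_type):
            return status
    return 500
```

An unrelated error whose message happens to contain "already" or "not found" would then map to 500 rather than being misreported as 409 or 404.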
### Breaking Changes

| - **`source` parameter removed** — The `source` keyword argument has been removed from `@durable_task()`, `.run()`, `.start()`, and `.options()`. Source provenance is now auto-stamped by the framework and cannot be overridden by developers. Use `tags` for custom metadata. |
- Pin aiohttp>=3.9.0,<4.0.0 to prevent pre-release 4.0.0a1 from being pulled by --pre flag (fails to compile on Python 3.13)
- Disable mindependency for invocations/responses since azure-ai-agentserver-core>=2.0.0b4 is not yet on PyPI
- Disable apistub for core (tool bug with Generic[Input,Output] on 3.10)
- Change task API route from /storage/tasks to /internal/tasks
- Add durable task overview documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
….0a0

- AgentServerHost lifespan now automatically creates and initializes a DurableTaskManager during startup, and shuts it down on exit. This fixes 'DurableTaskManager not initialized' errors when using @durable_task without manual manager setup.
- Pin aiohttp<4.0.0a0 to exclude pre-release 4.0.0a1 which fails to build (missing longintrepr.h) when CI uses --pre flag for nightly builds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Changed HostedDurableTaskProvider base URL from /storage/tasks to /tasks
- Task API integration remains disabled (FOUNDRY_TASK_API_ENABLED=0)
- Includes all durable demo improvements: 12-stage research pipeline, crash recovery, GET reconnect with file fallback, cancel support, supervisor proxy, and updated README with demo script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ointer

Replace hand-crafted @durable_task checkpoint logic with LangGraph StateGraph and AsyncSqliteSaver. This eliminates FileStreamHandler, manual metadata management, and JSONL-based replay.

Key changes:
- agent.py: LangGraph StateGraph with looping research_stage node
- app.py: Simplified HTTP handlers (no durable task framework imports)
- GET handler: replays from checkpoint state instead of JSONL files
- Cancel: asyncio.Event checked at node entry
- requirements.txt: added langgraph, langgraph-checkpoint-sqlite, aiosqlite
- README: updated architecture docs
- .env: committed for deployment config

All 5 test scenarios pass:
- Full 12-stage execution with checkpointing
- Already-complete detection on re-invocation
- Cancel mid-execution (stops at next node boundary)
- Resume after cancel (clears stale cancel flag)
- Unknown thread returns None

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e checkpointer" This reverts commit 4cf120a.
- POST handler returns 202 immediately (fire-and-forget) with invocation_id, session_id, task_id in response body
- GET handler streams SSE with sequential id: N on each event
- Supports last_event_id query param to skip already-seen events on reconnect (platform strips non x-client- headers)
- Crash handler returns 202 then exits asynchronously
- Session ID resolution simplified to use framework config
- Demo client (demo-client.sh): POST→GET flow, client-side skip, LAST_EVENT_ID tracking, logs command for 3rd terminal
- Verified live: crash mid-stage-5 → reconnect resumes from stage 5

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
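The sequential `id: N` numbering and the `last_event_id` skip described in this commit can be sketched as a small generator. The function name `format_sse`, the 1-based numbering, and the single-line `data:` payloads are assumptions for illustration; the sample's actual GET handler is not shown in this conversation.

```python
from typing import Iterable, Iterator


def format_sse(events: Iterable[str], last_event_id: int = 0) -> Iterator[str]:
    """Yield SSE frames with sequential ids, skipping ids the client already saw.

    Assumes each payload is a single line; multi-line payloads would need
    one 'data:' line per line of text.
    """
    for event_id, data in enumerate(events, start=1):
        if event_id <= last_event_id:
            continue  # already delivered before the reconnect
        yield f"id: {event_id}\ndata: {data}\n\n"
```

A client that reconnects with `last_event_id=2` then receives frames starting at id 3, provided the server numbers events the same way on every replay.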
… architecture

- Document POST 202 fire-and-forget flow
- Document GET streaming with last_event_id query param
- Add container logs section (azd ai agent monitor)
- Update manual curl demo steps
- Add 'How it works (client flow)' section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…veness Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds a complete Durable Task Framework to `azure-ai-agentserver-core`, enabling crash-resilient, long-running agents that survive container crashes, OOM kills, and redeployments on Azure AI Foundry Hosted Agents.

Key Capabilities

Core Framework (`@durable_task` decorator)
- `.run()` and `.start()` automatically start, resume, or recover tasks based on their current state in the task store.
- `ctx.entry_mode` tells the function whether it was entered `"fresh"`, `"resumed"` from suspension, or `"recovered"` from a crash.
- `ctx.suspend(output=..., reason=...)` pauses execution for multi-turn agent patterns.
- `run()` and `result()` return `TaskResult[Output]` with `.is_completed` / `.is_suspended` properties.
- Cancellation and timeouts: `ctx.cancel` event, configurable `timeout`, and `terminate()` for forced shutdown.
- Retry policy presets: `.exponential_backoff()`, `.fixed_delay()`, `.linear_backoff()`, `.no_retry()`.
- Source provenance (`type`, `name`, `server_version`).
- `my_task.list(status=...)` returns all tasks for a function, scoped by auto-stamped name tag.

Streaming
- `StreamHandler` protocol: `put()`, `get()`, `close()` async methods. Runtime-checkable.
- `QueueStreamHandler`: default `asyncio.Queue`-based implementation (preserves existing behavior).
- `stream_handler=`: pass a custom handler (e.g., Redis, WebSocket) to `start()` / `run()`.
- `stream_handler_factory` on decorator: ensures crash-recovery reconstructs the correct handler.
- `get_active_run(task_id)`: late-join API for consumers to attach to an already-running task's stream.

Steerable Tasks
- `steerable=True`: enables mid-flight steering where new inputs queue while a task is running.
- `ctx.cancel` is automatically set when new inputs arrive.
- `TaskRun.result()` resolves with `status="superseded"`.

Files Changed

New files
- `durable/_stream.py`: StreamHandler protocol, QueueStreamHandler, StreamHandlerFactory
- `durable/_context.py`: TaskContext with streaming, metadata, suspend, cancel
- `durable/_run.py`: TaskRun async iterator and handle operations
- `durable/_manager.py`: DurableTaskManager lifecycle engine
- `durable/_decorator.py`: `@durable_task` decorator and DurableTask class
- `durable/_retry.py`: RetryPolicy with factory presets
- `durable/_local_provider.py`: In-memory task store for testing
- `durable/_types.py`: Protocol types (TaskStoreProvider, etc.)
- `tests/durable/`: 15+ test modules covering all functionality
- `docs/durable-task-developer-guide.md`: Comprehensive developer guide (1400+ lines)
- `samples/durable_langgraph/`: LangGraph integration sample
- `samples/durable_multiturn/`: Multi-turn suspend/resume sample

Modified files
- `durable/__init__.py`: Public exports
- `CHANGELOG.md`: Release notes for 2.0.0b4

Validation