feat(agentserver): Durable Tasks for azure-ai-agentserver-core #46997

Open
RaviPidaparthi wants to merge 33 commits into main from feature/agentserver-durable-tasks

Member

@RaviPidaparthi RaviPidaparthi commented May 19, 2026

Summary

Adds a complete Durable Task Framework to azure-ai-agentserver-core — enabling crash-resilient, long-running agents that survive container crashes, OOM kills, and redeployments on Azure AI Foundry Hosted Agents.

Key Capabilities

Core Framework (@durable_task decorator)

  • Lifecycle automation: .run() and .start() automatically start, resume, or recover tasks based on their current state in the task store.
  • Entry mode awareness: ctx.entry_mode tells the function whether it was entered "fresh", "resumed" from suspension, or "recovered" from a crash.
  • Suspend & resume: ctx.suspend(output=..., reason=...) pauses execution for multi-turn agent patterns.
  • TaskResult wrapper: run() and result() return TaskResult[Output] with .is_completed / .is_suspended properties.
  • Cancellation & timeout: Cooperative cancel via ctx.cancel event, configurable timeout, and terminate() for forced shutdown.
  • RetryPolicy: Configurable retry with factory presets .exponential_backoff(), .fixed_delay(), .linear_backoff(), .no_retry().
  • Source auto-stamping: Framework automatically stamps every task with provenance metadata (type, name, server_version).
  • TaskMetadata: Dict-like mutable progress metadata with debounced auto-flush to the task store.
  • Task listing: my_task.list(status=...) returns all tasks for a function, scoped by auto-stamped name tag.
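
To make the RetryPolicy presets more concrete, here is a minimal stand-alone sketch of how an exponential backoff schedule could be computed. It is not the framework's implementation: the class name BackoffSketch and the methods delay_for() and delays() are hypothetical, and only the parameter names (max_attempts, initial_delay, backoff_coefficient, max_delay) come from the validation excerpt quoted in the review on this page.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class BackoffSketch:
    """Hypothetical stand-in for a retry policy; not the SDK's RetryPolicy."""

    max_attempts: int = 3
    initial_delay: timedelta = timedelta(seconds=1)
    backoff_coefficient: float = 2.0
    max_delay: timedelta = timedelta(seconds=30)

    def __post_init__(self) -> None:
        # Reject configurations that cannot produce a sensible schedule.
        if self.max_attempts < 1:
            raise ValueError(f"max_attempts must be >= 1, got {self.max_attempts}")
        if self.initial_delay < timedelta(0):
            raise ValueError("initial_delay must be >= 0")
        if self.backoff_coefficient < 1.0:
            raise ValueError("backoff_coefficient must be >= 1.0")
        if self.max_delay < self.initial_delay:
            raise ValueError("max_delay must be >= initial_delay")

    def delay_for(self, attempt: int) -> timedelta:
        """Delay to wait after failed attempt number `attempt` (1-based)."""
        raw = self.initial_delay * (self.backoff_coefficient ** (attempt - 1))
        return min(raw, self.max_delay)

    def delays(self) -> List[timedelta]:
        """Delays between attempts; there is one fewer delay than attempts."""
        return [self.delay_for(n) for n in range(1, self.max_attempts)]
```

In this sketch a coefficient of 1.0 behaves like a fixed delay, and max_attempts=1 yields an empty schedule, which is one plausible reading of a no-retry preset.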

Streaming

  • Pluggable StreamHandler protocol: put(), get(), close() async methods. Runtime-checkable.
  • QueueStreamHandler: Default asyncio.Queue-based implementation (preserves existing behavior).
  • Call-site stream_handler= argument: Pass a custom handler (e.g., Redis, WebSocket) to start()/run().
  • stream_handler_factory on decorator: Ensures crash-recovery reconstructs the correct handler.
  • get_active_run(task_id): Late-join API for consumers to attach to an already-running task's stream.
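
The shape of a put()/get()/close() protocol with a queue-backed default can be sketched as follows. This is an illustration under assumptions, not the code added in durable/_stream.py: the names StreamHandlerSketch, QueueHandlerSketch, StreamClosed, and drain() are hypothetical, and signalling closure with a sentinel item is one possible design choice among several.

```python
import asyncio
from typing import Any, List, Protocol, runtime_checkable


class StreamClosed(Exception):
    """Hypothetical signal that the producer has closed the stream."""


@runtime_checkable
class StreamHandlerSketch(Protocol):
    """Structural type: any object with these three async methods conforms."""

    async def put(self, item: Any) -> None: ...
    async def get(self) -> Any: ...
    async def close(self) -> None: ...


_CLOSED = object()  # sentinel enqueued by close()


class QueueHandlerSketch:
    """asyncio.Queue-backed handler; create it inside a running event loop."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def put(self, item: Any) -> None:
        await self._queue.put(item)

    async def get(self) -> Any:
        item = await self._queue.get()
        if item is _CLOSED:
            raise StreamClosed()
        return item

    async def close(self) -> None:
        await self._queue.put(_CLOSED)


async def drain(handler: StreamHandlerSketch) -> List[Any]:
    """Collect items until the handler reports that the stream is closed."""
    items: List[Any] = []
    while True:
        try:
            items.append(await handler.get())
        except StreamClosed:
            return items
```

Because the protocol is runtime-checkable, isinstance(handler, StreamHandlerSketch) only verifies that the three methods exist; it does not check their signatures or that they are coroutines.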

Steerable Tasks

  • steerable=True: Enables mid-flight steering, where new inputs queue while a task is running.
  • Automatic drain: The framework drains the queue after the function suspends or completes, re-entering with the next input.
  • Cancel signal: ctx.cancel is automatically set when new inputs arrive.
  • Superseded results: The previous generation's TaskRun.result() resolves with status="superseded".
  • Distributed steering: Lease renewal polls for pending inputs from other processes.
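
The interplay of queueing, the cancel signal, and superseded results can be pictured with a small single-process sketch. Everything here is hypothetical (SteeringLoopSketch, steer(), and the tuple-based history are illustration only) and it ignores persistence, leases, and the distributed case entirely.

```python
import asyncio
from collections import deque
from typing import Any, Awaitable, Callable, Deque, List, Tuple

# The task function receives its input and a cooperative cancel event.
AgentFn = Callable[[Any, asyncio.Event], Awaitable[Any]]


class SteeringLoopSketch:
    """Hypothetical steering loop; create it inside a running event loop."""

    def __init__(self) -> None:
        self.pending: Deque[Any] = deque()
        self.cancel = asyncio.Event()

    def steer(self, new_input: Any) -> None:
        # A new input is queued and the running generation is asked to stop.
        self.pending.append(new_input)
        self.cancel.set()

    async def run(self, fn: AgentFn, first_input: Any) -> List[Tuple[int, str, Any]]:
        history: List[Tuple[int, str, Any]] = []
        current, generation = first_input, 0
        while True:
            if not self.pending:
                self.cancel.clear()
            output = await fn(current, self.cancel)
            if not self.pending:
                # Nothing queued: this generation's result is final.
                history.append((generation, "completed", output))
                return history
            # Inputs arrived meanwhile: mark this result superseded and re-enter.
            history.append((generation, "superseded", output))
            current = self.pending.popleft()
            generation += 1
```

A function that honours the cancel event returns early, the loop records that result as superseded, and the next queued input starts a new generation.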

Files Changed

New files

  • durable/_stream.py: StreamHandler protocol, QueueStreamHandler, StreamHandlerFactory
  • durable/_context.py: TaskContext with streaming, metadata, suspend, cancel
  • durable/_run.py: TaskRun async iterator and handle operations
  • durable/_manager.py: DurableTaskManager lifecycle engine
  • durable/_decorator.py: @durable_task decorator and DurableTask class
  • durable/_retry.py: RetryPolicy with factory presets
  • durable/_local_provider.py: In-memory task store for testing
  • durable/_types.py: Protocol types (TaskStoreProvider, etc.)
  • tests/durable/: 15+ test modules covering all functionality
  • docs/durable-task-developer-guide.md: Comprehensive developer guide (1400+ lines)
  • samples/durable_langgraph/: LangGraph integration sample
  • samples/durable_multiturn/: Multi-turn suspend/resume sample

Modified files

  • durable/__init__.py — Public exports
  • CHANGELOG.md — Release notes for 2.0.0b4

Validation

  • 311 core tests passing ✅
  • 186 invocations tests passing ✅
  • Black ✅ | Pylint 9.49/10 ✅ | MyPy 0 issues ✅

RaviPidaparthi and others added 18 commits May 12, 2026 03:44
…-core

Implements a crash-resilient durable task system with:

- @durable_task decorator with full lifecycle management (start, run, get, cancel, terminate)
- TaskResult[Output] wrapper replacing exception-based suspension handling
- Cooperative cancellation and configurable timeouts
- Configurable retry policies with backoff
- Callable factories for tags, title, and description
- Local in-memory provider for development/testing
- Task streaming support via AsyncIterator
- Lease-based distributed locking
- Ephemeral and persistent task modes
- Task metadata and source provenance tracking

Includes:
- 248 passing tests across 17 test modules
- 3 sample applications (retry, source, streaming)
- Developer guide documentation
- Spec files (001-006) covering all design decisions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- TaskMetadata: add MutableMapping dict protocol (__setitem__,
  __getitem__, __delitem__, __contains__, __iter__, __len__, keys,
  values, items) with dirty-tracking on mutations
- Fix cspell CI failures: rename 'sess' abbreviations in _models.py,
  test_local_provider.py, test_models.py, test_source.py
- CHANGELOG 2.0.0b4: document all durable long-running agent features
- README: add durable agents section with code examples and dev guide link
- Developer guide: update metadata examples to dict-style syntax
- Invocations: bump core dep to >=2.0.0b4, add durable samples changelog
- Specs 001-007 and backlog: all 16 items resolved

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
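
The TaskMetadata change described in the commit above (a MutableMapping dict protocol with dirty-tracking on mutations) can be illustrated with a short sketch. The class name DirtyTrackingMetadata, the dirty attribute, and the synchronous flush() method are assumptions made for illustration; the real class also debounces flushes to the task store, which is not modelled here.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator


class DirtyTrackingMetadata(MutableMapping):
    """Hypothetical dict-like container that remembers unsaved changes."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self.dirty = False

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        # Only mark dirty when the stored value actually changes.
        if key not in self._data or self._data[key] != value:
            self._data[key] = value
            self.dirty = True

    def __delitem__(self, key: str) -> None:
        del self._data[key]
        self.dirty = True

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def flush(self) -> Dict[str, Any]:
        """Return a snapshot to persist and clear the dirty flag."""
        self.dirty = False
        return dict(self._data)
```

Subclassing MutableMapping means __contains__, keys(), values(), items(), and update() are derived from the five methods above, so every mutation path goes through the dirty-tracking code.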
Explain the problem (containers can die), the 4-step durability mechanism
(persist → lease → recover → complete), and the net effect before listing
what the developer doesn't need to think about.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
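
The four-step mechanism named in the commit above (persist → lease → recover → complete) can be sketched with an in-memory store. This is a simplified illustration, not the framework's provider: LeaseStoreSketch, TaskRecord, and their methods are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    task_id: str
    status: str = "pending"  # pending -> in_progress -> completed
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0


class LeaseStoreSketch:
    """Hypothetical in-memory task store with expiring leases."""

    def __init__(self) -> None:
        self.tasks: Dict[str, TaskRecord] = {}

    def persist(self, task_id: str) -> TaskRecord:
        # Step 1: record the task before any work starts.
        return self.tasks.setdefault(task_id, TaskRecord(task_id))

    def acquire_lease(self, task_id: str, owner: str, ttl: float, now: float) -> bool:
        # Step 2: claim the task unless another owner holds a live lease.
        record = self.tasks[task_id]
        held_by_other = (
            record.lease_owner not in (None, owner) and record.lease_expires_at > now
        )
        if record.status == "completed" or held_by_other:
            return False
        record.status = "in_progress"
        record.lease_owner = owner
        record.lease_expires_at = now + ttl
        return True

    def recoverable(self, now: float) -> List[str]:
        # Step 3: in-progress tasks whose lease expired were likely orphaned.
        return [
            t.task_id
            for t in self.tasks.values()
            if t.status == "in_progress" and t.lease_expires_at <= now
        ]

    def complete(self, task_id: str, owner: str) -> bool:
        # Step 4: only the current lease holder may mark the task complete.
        record = self.tasks[task_id]
        if record.lease_owner != owner:
            return False
        record.status = "completed"
        record.lease_owner = None
        return True
```

A container that dies stops renewing its lease, so after the lease expires the task shows up in recoverable() and a restarted container can claim it and finish the work.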
Clarify that durable tasks are not a checkpoint/replay engine, not a
result store, not a stream log, not app-level persistence, and not
unbounded storage. Fix misleading 'checkpoint progress' language to
'lightweight progress signals'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify that the framework recovers crashed tasks on container restart
automatically, not in response to a caller calling .run() again.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix name default: __qualname__, not 'Function name'
- Add missing ctx.agent_name and ctx.lease_generation to properties table
- Fix recovery description: automatic at startup + on .run()/.start()
- Fix cancel semantics: function returning normally = success, not TaskCancelled
- Update cancel vs terminate table with accurate outcomes
- Fix resume docs: both .run() and .start() handle suspended tasks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Sphinx: remove durable re-exports from core/__init__.py to fix
  duplicate object description warnings (symbols documented at both
  core and core.durable levels)
- MyPy: fix 3 type errors (_run.py Future type, _manager.py narrowing)
- Pylint: fix 55 issues across 7 files (docstrings, unused imports,
  import ordering, complexity suppressions)
- Constitution v1.3.0: add pre-push validation gate (NON-NEGOTIABLE)

All checks pass locally: pylint 10.00/10, mypy clean, sphinx clean,
261 tests passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ng, samples

Steering:
- Full steering implementation with generation model, pending queue, drain logic
- ctx.was_steered, ctx.previous_input, ctx.pending_inputs, ctx.generation
- SteeringQueueFull exception, TaskResult.is_superseded
- Completion-vs-steering race handling with etag
- Crash recovery with drain_in_progress flag

Task listing:
- DurableTask.list(status, session_id) with auto-scoping per function
- Server-side: agent_name, session_id, tag, status filters
- Client-side: source.type filter (until DEV-009 resolved)
- Provider protocol + local provider tag AND filtering

Reserved tag protection:
- _strip_reserved_tags() at all entry points (decorator, callsite, options)
- Framework auto-stamps _durable_task_name tag, always wins

Recovery routing:
- _find_resume_callback() matches source.name first (stable anchor)
- name param documented as stable identity anchor

Other:
- Local provider payload merge fixed to strict shallow (spec §11)
- steering_poll_seconds removed from public API (internal 2s default kept)
- Multi-worker references removed (single-container model)
- Developer guide cleaned of internal implementation details
- Steering spec updated to match implementation
- Samples: durable_claude, durable_copilot, updated durable_langgraph

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ming

Replace hardcoded asyncio.Queue with a pluggable StreamHandler protocol
(put/get/close) for the durable task streaming path.

Changes:
- New _stream.py: StreamHandler protocol + QueueStreamHandler default
- Refactored _context.py, _run.py, _manager.py: _stream_queue -> _stream_handler
- Added stream_handler param to start()/run() in _decorator.py
- Updated __init__.py exports
- Updated test_streaming.py and test_sample_e2e.py
- Updated developer guide with Custom Stream Handlers section
- SSE streaming samples and invocations framework updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add get_active_run() to DurableTaskManager and DurableTask decorator
  for late-join stream consumers
- Add comprehensive StreamHandler test suite (12 tests):
  custom handler dispatch, default behavior, steering carry-over,
  close on success/failure, error propagation, late-join via
  get_active_run, protocol conformance
- Fix LangGraph sample to use ctx.stream() instead of private queue
- Update developer guide with late-join consumer documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Recovery and resume paths previously defaulted to QueueStreamHandler,
silently losing any custom stream transport. Add stream_handler_factory
to the decorator so the framework can reconstruct the correct handler
on crash-recovery and resume without a caller.

Resolution order: call-site handler > factory > QueueStreamHandler.

- Add StreamHandlerFactory type alias to _stream.py
- Add stream_handler_factory to DurableTaskOptions and @durable_task
- Thread stream_handler through _start_existing_task (resume/recovery)
- Use factory fallback in both create_and_start and _start_existing_task
- Add 3 tests: factory on fresh, call-site override, factory on recovery
- Update developer guide with factory docs and decorator options table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
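
The resolution order stated in the commit above (call-site handler > factory > QueueStreamHandler) amounts to a simple fallback chain. The function below is a sketch under assumptions: resolve_stream_handler is a hypothetical name, and the factory is assumed to be a zero-argument callable, which the commit message does not confirm.

```python
from typing import Any, Callable, Optional


def resolve_stream_handler(
    call_site_handler: Optional[Any],
    factory: Optional[Callable[[], Any]],
    default_factory: Callable[[], Any],
) -> Any:
    """Pick a handler: explicit argument first, then factory, then default."""
    if call_site_handler is not None:
        return call_site_handler
    if factory is not None:
        # On crash recovery there is no caller, so this branch supplies
        # the custom transport that would otherwise be lost.
        return factory()
    return default_factory()
```

On recovery and resume paths no call-site handler exists, so the first branch is skipped and the decorator-level factory decides which transport is rebuilt.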
…urable-tasks

# Conflicts:
#	sdk/agentserver/.gitignore
#	sdk/agentserver/azure-ai-agentserver-core/CHANGELOG.md
#	sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_base.py
#	sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_tracing.py
#	sdk/agentserver/azure-ai-agentserver-core/samples/selfhosted_invocation/selfhosted_invocation.py
#	sdk/agentserver/azure-ai-agentserver-core/tests/test_tracing_e2e.py
#	sdk/agentserver/azure-ai-agentserver-invocations/CHANGELOG.md
#	sdk/agentserver/azure-ai-agentserver-invocations/azure/ai/agentserver/invocations/_invocation.py
#	sdk/agentserver/azure-ai-agentserver-invocations/tests/conftest.py
#	sdk/agentserver/azure-ai-agentserver-invocations/tests/test_span_parenting.py
#	sdk/agentserver/azure-ai-agentserver-invocations/tests/test_tracing.py
@github-actions github-actions Bot added the Hosted Agents sdk/agentserver/* label May 19, 2026
@RaviPidaparthi RaviPidaparthi marked this pull request as ready for review May 19, 2026 19:21
@RaviPidaparthi RaviPidaparthi requested a review from ankitbko as a code owner May 19, 2026 19:21
Copilot AI review requested due to automatic review settings May 19, 2026 19:21
@RaviPidaparthi RaviPidaparthi requested a review from vangarp as a code owner May 19, 2026 19:21
@RaviPidaparthi RaviPidaparthi changed the title from "feat(agentserver): pluggable StreamHandler protocol and stream_handler_factory" to "feat(agentserver): Durable Task Framework for azure-ai-agentserver-core" May 19, 2026
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Implements spec-009’s “pluggable stream handler” work for the durable task framework by introducing a StreamHandler protocol with a default QueueStreamHandler, plus related durable-task capabilities (retry, resume route, metadata, samples/tests) and extensive formatting/tidying across tests and samples.

Changes:

  • Added a pluggable streaming abstraction (StreamHandler, QueueStreamHandler, factory type) and wired it into TaskContext.stream() and TaskRun async iteration.
  • Introduced/expanded durable-task building blocks: TaskResult, RetryPolicy, resume HTTP route, hosted provider client, lease renewal helper, and substantial new test coverage + samples.
  • Updated docs/changelogs and reformatted various tests/samples for style consistency.

Reviewed changes

Copilot reviewed 88 out of 92 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_tracing_e2e.py | Formatting-only adjustments (line wrapping/blank lines). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_session_id.py | Formatting-only adjustments (blank lines, wrapped AsyncClient context). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_server_routes.py | Formatting-only adjustments (blank lines). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_request_limits.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_request_id.py | Formatting-only adjustments. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_multimodal_protocol.py | Minor whitespace cleanup and section spacing. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_invoke.py | Formatting-only adjustments (blank lines). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_graceful_shutdown.py | Formatting + wrapped long asserts for readability. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_get_cancel.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_edge_cases.py | Formatting-only adjustments (blank lines). |
| sdk/agentserver/azure-ai-agentserver-invocations/tests/test_decorator_pattern.py | Formatting (wrapped JSONResponse returns). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/streaming_invoke_agent.py | Reformatted token list for readability. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/simple_invoke_agent.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/multiturn_invoke_agent.py | Formatting; JSONResponse construction wrapped. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py | New sample persistence helper (file-backed JSON store). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/requirements.txt | New sample requirements. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/app.py | New durable multiturn sample host wiring. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/agent.py | New durable multiturn sample agent task. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/store.py | New sample persistence helper (file-backed JSON store). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/requirements.txt | New sample requirements (LangGraph + deps). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/app.py | New streaming + steering durable LangGraph host sample. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/store.py | New sample persistence helper (file-backed JSON store). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/requirements.txt | New sample requirements (Copilot SDK, core, Starlette, uvicorn). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/app.py | New durable Copilot host sample with SSE. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/agent.py | New steerable durable Copilot agent sample. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/store.py | New sample persistence helper (file-backed JSON store). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/requirements.txt | New sample requirements (Anthropic SDK + runtime deps). |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/app.py | New durable Claude host sample with SSE. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_claude/agent.py | New steerable durable Claude agent sample. |
| sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/async_invoke_agent.py | Formatting-only adjustments (wrapped JSON dict literals). |
| sdk/agentserver/azure-ai-agentserver-invocations/CHANGELOG.md | Changelog updates to mention durable samples + dependency bump. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_tracing.py | Formatting-only adjustments. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_startup_logging.py | Formatting-only adjustments and wrapped long lines. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_server_routes.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_logger.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_graceful_shutdown.py | Formatting-only adjustments and wrapped long asserts. |
| sdk/agentserver/azure-ai-agentserver-core/tests/test_config.py | Formatting for long function signatures. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_task_result.py | New tests for TaskResult wrapper behavior + guardrails. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_streaming.py | New tests for pluggable stream handler integration. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_source.py | New tests exercising source field persistence. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_retry.py | New tests for RetryPolicy and retry integration. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_resume_route.py | New tests for the resume HTTP route behavior. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_models.py | New tests for durable models/exceptions. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_metadata.py | New tests for dict-like TaskMetadata + flush semantics. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_local_provider.py | New tests for local durable provider CRUD/listing. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_lifecycle.py | New lifecycle automation tests. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_get.py | New tests for DurableTask.get(). |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_entry_mode.py | New tests for ctx.entry_mode across paths. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_decorator.py | New tests for @durable_task decorator/options/type extraction. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_cancellation_timeout.py | New tests for cancellation, timeout, and termination. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/test_callable_factories.py | New tests for callable factories on tags/description. |
| sdk/agentserver/azure-ai-agentserver-core/tests/durable/__init__.py | New package init for durable tests. |
| sdk/agentserver/azure-ai-agentserver-core/tests/conftest.py | Formatting-only adjustments. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/requirements.txt | New durable sample requirements. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/durable_streaming.py | New sample demonstrating streaming with durable tasks. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_source/requirements.txt | New durable sample requirements. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_source/durable_source.py | New sample demonstrating source usage. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_retry/requirements.txt | New durable sample requirements. |
| sdk/agentserver/azure-ai-agentserver-core/samples/durable_retry/durable_retry.py | New sample demonstrating retry policies. |
| sdk/agentserver/azure-ai-agentserver-core/pyproject.toml | Added httpx dependency + optional hosted extras (azure-identity). |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_stream.py | New StreamHandler protocol + default QueueStreamHandler + factory alias. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_run.py | New TaskRun async-iter streaming integration and lifecycle control methods. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_retry.py | New RetryPolicy implementation and presets. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_resume_route.py | New Starlette route for POST /tasks/resume. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_result.py | New TaskResult wrapper class. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_provider.py | New storage provider protocol for durable subsystem. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_metadata.py | New dict-like TaskMetadata with flush/auto-flush. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_lease.py | New lease identity utilities + renewal loop. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_exceptions.py | New durable exception types (failed/suspended/cancelled/etc.). |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_context.py | New TaskContext with stream support and lifecycle fields. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/_client.py | New hosted durable task provider httpx client. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/__init__.py | New public durable API exports. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_middleware.py | Formatting-only adjustments for imports/log calls. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_errors.py | Minor formatting simplification. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/_config.py | Minor formatting simplification. |
| sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/__init__.py | Minor whitespace cleanup. |
| sdk/agentserver/azure-ai-agentserver-core/README.md | Added durable-task documentation section + link. |
| sdk/agentserver/azure-ai-agentserver-core/CHANGELOG.md | Large changelog entry documenting durable subsystem and other changes. |
| sdk/agentserver/.gitignore | Added .vscode/ ignore. |

Comments suppressed due to low confidence (1)

sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py:1

  • For JSON persistence, it’s better to write/read with an explicit encoding (UTF-8) for cross-platform consistency. Consider using open(fd, "w", encoding="utf-8") (or os.fdopen) and also using read_text(encoding="utf-8") in load() to avoid platform-default encoding surprises.
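
The suggestion above can be sketched as a pair of helpers that always read and write UTF-8. This is not the sample's store.py: the function names save() and load() and the write-to-temp-then-replace pattern are assumptions made for illustration; only the explicit encoding arguments come from the review comment.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def save(path: Path, data: Dict[str, Any]) -> None:
    """Write JSON atomically, with an explicit UTF-8 encoding."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        # os.fdopen forwards encoding= to open(), so the platform default
        # encoding is never consulted.
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False)
        os.replace(tmp_name, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load(path: Path) -> Dict[str, Any]:
    """Read JSON back with the same explicit encoding; empty dict if absent."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

With ensure_ascii=False the file contains raw non-ASCII characters, which is exactly the case where a platform-default encoding such as cp1252 would otherwise cause a mismatch between writer and reader.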

Comment on lines +57 to +72
    if initial_delay.total_seconds() < 0:
        raise ValueError(f"initial_delay must be >= 0, got {initial_delay}")
    if max_attempts < 1 and not (
        max_attempts == 1 and initial_delay == timedelta(0)
    ):
        pass  # allow no_retry preset
    if backoff_coefficient < 1.0:
        raise ValueError(
            f"backoff_coefficient must be >= 1.0, got {backoff_coefficient}"
        )
    if max_delay < initial_delay:
        raise ValueError(
            f"max_delay ({max_delay}) must be >= initial_delay ({initial_delay})"
        )
    if max_attempts < 1:
        raise ValueError(f"max_attempts must be >= 1, got {max_attempts}")

Comment on lines +191 to +192
    except Exception as exc:
        if "not found" in str(exc).lower():

Comment on lines +210 to +213
    if task_info.payload and "metadata" in task_info.payload:
        meta_data: dict[str, Any] = task_info.payload["metadata"]
        for key, value in meta_data.items():
            self._metadata.set(key, value)

        and self._flush_callback is not None
        and self._flush_task is None
    ):
        self._flush_task = asyncio.get_event_loop().create_task(

Comment on lines +60 to +67
    except Exception as exc:  # pylint: disable=broad-exception-caught
        msg = str(exc).lower()
        if "not found" in msg:
            return Response(status_code=404)
        if "not 'suspended'" in msg or "already" in msg or "conflict" in msg:
            return Response(status_code=409)
        logger.error("Resume failed for task %s: %s", task_id, exc, exc_info=True)
        return Response(status_code=500)
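
The excerpt above picks an HTTP status by searching the exception message for substrings. For comparison only, the same 404/409/500 mapping can be keyed on exception types. The sketch assumes the provider raised typed errors; TaskNotFoundError, TaskConflictError, and status_for() are hypothetical names and are not part of the PR.

```python
from typing import Tuple, Type


class TaskNotFoundError(Exception):
    """Hypothetical: the task id does not exist in the store."""


class TaskConflictError(Exception):
    """Hypothetical: the task is not in a state that allows resuming."""


# Ordered (exception type, HTTP status) pairs; first match wins.
_STATUS_BY_ERROR: Tuple[Tuple[Type[Exception], int], ...] = (
    (TaskNotFoundError, 404),
    (TaskConflictError, 409),
)


def status_for(exc: Exception) -> int:
    """Map an exception to an HTTP status without inspecting its message."""
    for error_type, status in _STATUS_BY_ERROR:
        if isinstance(exc, error_type):
            return status
    return 500
```

Matching on types keeps the mapping stable if an error message is reworded, at the cost of requiring the provider layer to raise distinct exception classes.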

### Breaking Changes

- **`source` parameter removed** — The `source` keyword argument has been removed from `@durable_task()`, `.run()`, `.start()`, and `.options()`. Source provenance is now auto-stamped by the framework and cannot be overridden by developers. Use `tags` for custom metadata.
@RaviPidaparthi RaviPidaparthi changed the title from "feat(agentserver): Durable Task Framework for azure-ai-agentserver-core" to "feat(agentserver): Durable Tasks for azure-ai-agentserver-core" May 19, 2026
RaviPidaparthi and others added 3 commits May 19, 2026 21:48
- Pin aiohttp>=3.9.0,<4.0.0 to prevent pre-release 4.0.0a1 from being
  pulled by --pre flag (fails to compile on Python 3.13)
- Disable mindependency for invocations/responses since
  azure-ai-agentserver-core>=2.0.0b4 is not yet on PyPI
- Disable apistub for core (tool bug with Generic[Input,Output] on 3.10)
- Change task API route from /storage/tasks to /internal/tasks
- Add durable task overview documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
….0a0

- AgentServerHost lifespan now automatically creates and initializes
  a DurableTaskManager during startup, and shuts it down on exit.
  This fixes 'DurableTaskManager not initialized' errors when using
  @durable_task without manual manager setup.

- Pin aiohttp<4.0.0a0 to exclude pre-release 4.0.0a1 which fails to
  build (missing longintrepr.h) when CI uses --pre flag for nightly
  builds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Changed HostedDurableTaskProvider base URL from /storage/tasks to /tasks
- Task API integration remains disabled (FOUNDRY_TASK_API_ENABLED=0)
- Includes all durable demo improvements: 12-stage research pipeline,
  crash recovery, GET reconnect with file fallback, cancel support,
  supervisor proxy, and updated README with demo script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RaviPidaparthi and others added 12 commits May 20, 2026 04:13
…ointer

Replace hand-crafted @durable_task checkpoint logic with LangGraph StateGraph
and AsyncSqliteSaver. This eliminates FileStreamHandler, manual metadata
management, and JSONL-based replay.

Key changes:
- agent.py: LangGraph StateGraph with looping research_stage node
- app.py: Simplified HTTP handlers (no durable task framework imports)
- GET handler: replays from checkpoint state instead of JSONL files
- Cancel: asyncio.Event checked at node entry
- requirements.txt: added langgraph, langgraph-checkpoint-sqlite, aiosqlite
- README: updated architecture docs
- .env: committed for deployment config

All 5 test scenarios pass:
- Full 12-stage execution with checkpointing
- Already-complete detection on re-invocation
- Cancel mid-execution (stops at next node boundary)
- Resume after cancel (clears stale cancel flag)
- Unknown thread returns None

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- POST handler returns 202 immediately (fire-and-forget) with
  invocation_id, session_id, task_id in response body
- GET handler streams SSE with sequential id: N on each event
- Supports last_event_id query param to skip already-seen events
  on reconnect (platform strips non x-client- headers)
- Crash handler returns 202 then exits asynchronously
- Session ID resolution simplified to use framework config
- Demo client (demo-client.sh): POST→GET flow, client-side skip,
  LAST_EVENT_ID tracking, logs command for 3rd terminal
- Verified live: crash mid-stage-5 → reconnect resumes from stage 5

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
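
The reconnect behaviour described in the commit above (sequential id: N on each event, with last_event_id used to skip events the client has already seen) can be sketched as a pure generator. The function name sse_events and the choice to start numbering at 1 are assumptions, and payloads are assumed to be single-line strings.

```python
from typing import Iterable, Iterator, Optional


def sse_events(
    payloads: Iterable[str], last_event_id: Optional[int] = None
) -> Iterator[str]:
    """Yield Server-Sent Events frames, skipping ids the client already has."""
    for event_id, payload in enumerate(payloads, start=1):
        if last_event_id is not None and event_id <= last_event_id:
            continue  # already delivered before the reconnect
        # Each SSE frame is a set of "field: value" lines ended by a blank line.
        yield f"id: {event_id}\ndata: {payload}\n\n"
```

Because ids are assigned by position in the replayed sequence, a client that reconnects with the last id it saw receives only the later events, provided the server can replay the same sequence in the same order.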
… architecture

- Document POST 202 fire-and-forget flow
- Document GET streaming with last_event_id query param
- Add container logs section (azd ai agent monitor)
- Update manual curl demo steps
- Add 'How it works (client flow)' section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…veness

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Labels

Hosted Agents sdk/agentserver/*

2 participants