From 44fcdea83011992b3a1da3c8b6ec70b3bc8b27cb Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Tue, 23 Jun 2026 15:47:39 -0400 Subject: [PATCH 01/12] feat(durabletask): add workflow naming helpers (multi-workflow phase 0) Foundation for hosting multiple workflows (and later sub-workflows) on one durable task host. Adds a host-agnostic naming module that derives the stable durable names a hosted workflow registers under. - New `_workflows/naming.py`: - `workflow_orchestrator_name(name)` -> `dafx-{name}` (orchestration name, aligned byte-for-byte with .NET `WorkflowNamingHelper`). - `workflow_name_from_orchestrator(name)` -> reverse, `None` when not prefixed. - `validate_workflow_name(name)` -> rejects empty / malformed / auto-generated `WorkflowBuilder-` names (validate-and-reject rather than silently sanitize, since the name becomes a durable identity and an HTTP route segment). - `is_auto_generated_workflow_name(name)`, `DURABLE_NAME_PREFIX`. - Export the helpers from the package public API. - Mark `WORKFLOW_ORCHESTRATOR_NAME` deprecated in favor of per-workflow names (kept functional; the single-workflow path still uses it until phase 1). - 39 unit tests covering round-trips and validation. 
Design: docs/design/durabletask-multiworkflow-and-subworkflows.md --- ...abletask-multiworkflow-and-subworkflows.md | 515 ++++++++++++++++++ .../agent_framework_durabletask/__init__.py | 12 + .../_workflows/naming.py | 145 +++++ .../_workflows/orchestrator.py | 6 + .../durabletask/tests/test_workflow_naming.py | 107 ++++ 5 files changed, 785 insertions(+) create mode 100644 docs/design/durabletask-multiworkflow-and-subworkflows.md create mode 100644 python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py create mode 100644 python/packages/durabletask/tests/test_workflow_naming.py diff --git a/docs/design/durabletask-multiworkflow-and-subworkflows.md b/docs/design/durabletask-multiworkflow-and-subworkflows.md new file mode 100644 index 00000000000..8415b9609dc --- /dev/null +++ b/docs/design/durabletask-multiworkflow-and-subworkflows.md @@ -0,0 +1,515 @@ +# Durable hosting: multiple workflows and sub-workflows (Python) + +Status: Draft / for discussion +Scope: `python/packages/durabletask` (standalone Durable Task worker) and +`python/packages/azurefunctions` (Azure Functions host) +Related: PR #6418 (standalone Durable Task workflow hosting), core +`agent_framework._workflows` + +This document sketches the work needed to add two capabilities to the Python +durable workflow hosting layer: + +1. **Multiple workflows per host** — register and address more than one workflow + on a single worker / Function App. +2. **Sub-workflows** — run a `Workflow` nested inside another workflow under + durable execution. + +It maps the current single-workflow architecture, summarizes the existing .NET +implementation and the in-process core engine, then proposes a design with +options, recommendations, and a phased work breakdown. Where the .NET approach +is shaped by C#/DI specifics, the Python-specific recommendation is called out. + +--- + +## 1. Current state (single workflow) + +The durable hosting layer today assumes exactly one workflow per host. 
+ +- **Fixed orchestrator name.** `WORKFLOW_ORCHESTRATOR_NAME = "workflow_orchestrator"` + is a module constant in `_workflows/orchestrator.py`, exported from the package + `__all__`. Every workflow registers and starts under this one name. +- **Singular worker registration.** `DurableAIAgentWorker.configure_workflow(workflow)` + stores `self._workflow = workflow` and registers one orchestrator whose + `__name__` is set to the fixed constant. +- **Registration planner.** `plan_workflow_registration(workflow)` walks + `workflow.executors.values()` and classifies each executor: `AgentExecutor` + becomes a durable **entity**, everything else becomes a durable **activity**. + It returns a single `WorkflowRegistrationPlan(agent_executors, activity_executors, + orchestrator_name)`. +- **Global durable names.** Activities and agent entities are named + `dafx-{executor.id}` (`AgentSessionId.to_entity_name` uses the same `dafx-` + prefix). These names are **global per task hub**, so two workflows sharing an + executor id collide. +- **Singular client.** `DurableWorkflowClient.start_workflow()` (and + `run_workflow` / `stream_workflow`) always schedule `WORKFLOW_ORCHESTRATOR_NAME`. + There is no per-workflow targeting. +- **Singular Functions host.** `AgentFunctionApp(workflow=...)` takes one workflow, + registers one orchestrator (`workflow_orchestrator`), per-executor activities + (`dafx-{executor.id}`), and three flat HTTP routes: `workflow/run`, + `workflow/status/{instanceId}`, `workflow/respond/{instanceId}/{requestId}`. + The route-scoping check `_is_workflow_orchestration(status)` compares + `status.name.casefold() == WORKFLOW_ORCHESTRATOR_NAME.casefold()` so a caller + cannot read or inject into unrelated orchestrations in the same hub. + +**No sub-workflow support exists.** Searching both packages for +`subworkflow` / `WorkflowExecutor` / `nested` returns nothing. 
A core +`WorkflowExecutor` (see §3) is not an `AgentExecutor`, so the planner currently +classifies it as a plain non-agent executor and would register it as a single +activity. Its activity body calls `executor.execute(...)`, which runs the +**entire inner workflow in-process inside one activity invocation** via +`WorkflowExecutor.process_workflow` → `self.workflow.run(...)`. That means: + +- inner executors do not become durable activities/entities (no durable replay + for inner steps, inner agent calls are not durable entity calls); +- inner human-in-the-loop (HITL) cannot pause — there is no external-event pump + inside an activity, and the default `propagate_request=False` emits a + `SubWorkflowRequestMessage` to a parent executor that the durable host never + wires up; +- a long inner workflow can exceed activity time limits; +- inner events are not streamed. + +So sub-workflows are effectively unsupported, not merely unoptimized. + +--- + +## 2. .NET reference (alignment baseline) + +The .NET hosting layer already supports both capabilities. Key facts to align +with (or deliberately diverge from): + +- **Multiple workflows, keyed by name.** `AddWorkflow(name, factory)` registers + workflows additively, keyed by `Workflow.Name` (lookup dictionary uses + `StringComparer.OrdinalIgnoreCase`; registration asserts the factory's + `Workflow.Name` matches the key). +- **Per-workflow orchestration name.** `WorkflowNamingHelper.ToOrchestrationFunctionName(name)` + returns `"dafx-" + name`, with a `ToWorkflowName` reverse. The orchestration + name is parameterized, not fixed. +- **Per-workflow HTTP routes.** `workflows/{workflowName}/run`, + `workflows/{workflowName}/status/{runId}`, `workflows/{workflowName}/respond/{runId}`. + Ownership is enforced by `IsOrchestrationOwnedByWorkflow(orchestrationName, + functionName, suffix)` comparing the instance's orchestration name to + `dafx-{routeWorkflowName}`. 
+- **Executor → durable mapping (in the durable host).** Non-agent executor → + durable **activity** `dafx-{executorName}` (dispatched via + `context.CallActivityAsync`, so results are cached in orchestration history and + not re-run on replay); agent executor → durable **entity** + `AgentSessionId.ToEntityName(executorName)`. So in the .NET *durable* host the + executors are durable activities/entities — the same model Python uses — *not* + in-process objects. The dispatch switch lives in `DurableExecutorDispatcher.DispatchAsync`. + Activity registration is deduplicated across workflows by name via a `HashSet`, + and the executor registry is keyed by executor name (first registration wins), + so two workflows that define different executors with the same name **collide** + (a documented constraint, not a fix). +- **Sub-workflows run as durable CHILD ORCHESTRATIONS (not in-process).** In the + *durable* host, `DurableExecutorDispatcher.ExecuteSubWorkflowAsync` dispatches a + sub-workflow node via `context.CallSubOrchestratorAsync("dafx-{innerName}", ...)`. + The child orchestration runs the same superstep loop and its inner executors are + durable activities/entities cached in the *child's* history. Sub-workflow and + request-port bindings are skipped by activity registration precisely because + they use this specialized dispatch. (The `WorkflowHostExecutor` / + `InProcessRunner` path is the **core in-process engine**, a separate runtime; it + is *not* how the durable host runs sub-workflows.) +- **Client retains workflow identity.** `IWorkflowClient.RunAsync(workflow, input, runId)`; + run handles carry `WorkflowName`. + +**Corrected mental model (resolving an earlier mistake in this doc):** in the +.NET *durable* host, non-agent executors are durable activities, agent executors +are durable entities, and sub-workflows are durable child orchestrations via +`CallSubOrchestratorAsync`. 
"Executors run in-process" is true only of the *core +in-process engine*, never the durable host. This means the Python child- +orchestration model for sub-workflows (see §5) is **alignment with .NET, not a +divergence**. + +--- + +## 3. Core in-process model (what we mirror durably) + +The core engine (`agent_framework._workflows`) already models nested workflows +in-process, and the durable layer should mirror its semantics: + +- **`WorkflowExecutor(Executor)`** wraps a `Workflow` as an executor + (`process_workflow` runs `self.workflow.run(input)`), publicly exported from + `agent_framework` along with `SubWorkflowRequestMessage` / + `SubWorkflowResponseMessage`. +- **Request bridging.** Inner `request_info` either propagates to the parent's + own request surface (`propagate_request=True` → `ctx.request_info(...)`) or is + wrapped as a `SubWorkflowRequestMessage` sent to a parent executor + (`propagate_request=False`, the default). Responses route back by `request_id`. +- **Isolation + concurrency.** Each inner run gets an `execution_id`; a + `request_id → execution_id` map routes responses to the correct concurrent run. +- **Checkpointing.** `on_checkpoint_save` / `on_checkpoint_restore` persist the + inner execution contexts and rehydrate pending request-info events. +- **Execution + events.** Pregel-style supersteps; `WorkflowEvent` types include + `output`, `intermediate`, `request_info`, `executor_invoked/completed`, + `superstep_*`, lifecycle/diagnostic. `State` is shared within a workflow and + isolated per workflow instance. + +The durable orchestrator (`run_workflow_orchestrator`) already re-implements the +superstep loop, edge-group routing, fan-in/out, and HITL pause/resume against the +`WorkflowOrchestrationContext` protocol. Sub-workflow support extends this loop to +a new executor category; multi-workflow support parameterizes registration and +naming around it. + +--- + +## 4. 
Part 1 — Multiple workflows per host + +### 4.1 Naming helpers (foundation) + +Replace the fixed constant with a pair of naming helpers that mirror .NET +`WorkflowNamingHelper`, plus a validator: + +```python +DURABLE_NAME_PREFIX = "dafx-" + +def workflow_orchestrator_name(workflow_name: str) -> str: ... # "dafx-{name}" +def workflow_name_from_orchestrator(orchestrator_name: str) -> str | None: ... # reverse; None when not prefixed +def validate_workflow_name(workflow_name: str) -> None: ... # reject empty / malformed / auto-generated names +``` + +Notes: +- This aligns the Python orchestration name scheme with .NET (`dafx-{name}`). +- Names are validated and rejected rather than silently sanitized, because the + name becomes a durable identity and an HTTP route segment. +- `WORKFLOW_ORCHESTRATOR_NAME` stays exported as a **deprecated** alias to keep + the public surface stable; see §6 back-compat. + +### 4.2 Workflow names must be explicit and stable + +`WorkflowBuilder` defaults an unnamed workflow to `f"WorkflowBuilder-{uuid4()}"`. +A random name is regenerated on every process build, which would change the +orchestration function name across worker restarts and **break resume of +in-flight instances**. Therefore: + +- Multi-workflow hosting **requires an explicit, stable `Workflow.name`** (reject + auto-generated `WorkflowBuilder-` names at registration, mirroring .NET's + assert-name-matches-key contract). +- Names are validated against the durable name charset; non-conforming names are + rejected, not sanitized. +- Duplicate names within one host are rejected. + +### 4.3 Durable names (decision: scope workflow-internal names by workflow) + +The orchestration name stays **`dafx-{workflowName}`** (matches .NET; this is the +name the Durable Task tooling/UI keys off). For a workflow's **internal** +executors and agents, the durable names are **scoped by workflow**: + +- non-agent activity: `dafx-{workflowName}-{executorId}` +- agent entity: `dafx-{workflowName}-{executorId}` + +Each workflow registers its own distinctly named activities/entities, each a +closure capturing that workflow's specific executor/agent instance (the same +shape as today's single-workflow code, just with a longer name). 
`(workflow, +executor)` is globally unique, so two co-hosted workflows that reuse an executor +id never collide. + +**Why scope the names instead of resolving a bare name at runtime.** A +`dafx-{executorId}` activity/entity is created by a factory that **captures one +specific instance** (e.g. `__create_agent_entity` → `AgentEntity(agent=agent, +...)`, registered once via `add_entity`; `add_agent` even raises `ValueError` on +a duplicate id). With one global name per executor id, two workflows that define +the same id backed by **different** implementations (different agent +model/instructions/tools, or different executor code) would have one shadow the +other — a workflow silently gets behavior it did not expect. Putting the workflow +name in the durable name removes that foot-gun directly: different names, no +shared registration, plain closures, no per-call workflow lookup. + +This diverges from .NET's *inner* activity/entity names (.NET keeps bare +`dafx-{executorName}` and resolves from a global registry keyed by name, which +keeps the collision as a documented constraint). The divergence is deliberate and +low-cost: the **orchestration** name — the one the DT UI surfaces — is identical +to .NET (`dafx-{workflowName}`); only the inner activity/entity names differ, and +no tooling depends on those strings. + +**Agent state is still isolated by the entity key.** Independent of naming, an +agent entity is addressed by `(name, key)` with `key = ctx.instance_id` +(`_prepare_agent_task` → `AgentSessionId(name=..., key=instance_id)`), so two runs +never share conversation state and each run keeps its own session across turns — +mirroring core. Scoping the *name* fixes *which implementation* runs; the *key* +already isolated *state*. + +**Agent addressing (decision).** Workflow agents stay reachable, just under a +**workflow-qualified** identity rather than a bare one. 
Both registration paths +funnel through the same primitive today — `AgentFunctionApp` calls +`add_agent(agent, entity_id=...)` for `agents=` *and* for each agent extracted +from a workflow, and `DurableAIAgentWorker` does the same in `configure_workflow`. +The only change is the name the planner hands to that primitive for workflow +agents: scoped `{workflowName}-{executorId}` instead of bare `executorId`. So: + +- `agents=` (FunctionApp) / `add_agent(...)` (worker) → **bare** `dafx-{agentName}`, + the standalone HTTP/MCP-addressable surface. +- agents inside a `workflow` → **scoped** `dafx-{workflowName}-{executorId}`, + registered through the *same* primitive, still tracked in the registry. + +Lookup is qualified, so workflow agents do **not** disappear from the surface: + +```python +get_agent("translator") # bare standalone agent +get_agent("translator", workflow_name="orders") # workflow-scoped agent +``` + +We deliberately do **not** add a `workflow_agents=` constructor input. The agents +already live inside the `Workflow` object (each `AgentExecutor` holds its agent), +so a separate map would duplicate that and create a source-of-truth conflict. The +per-workflow agent grouping `{workflow_name: [agent_executors]}` is an *internal* +structure the planner produces and both hosts consume — not a public kwarg. An +agent used both standalone and inside a workflow is registered both ways and +becomes two independent entities (bare + scoped) with separate state, which is the +intended separation. This keeps "workflow step vs standalone callable" an explicit +registration choice while keeping both reachable. + +*Cross-workflow shared* agent memory (one agent that deliberately remembers +across two co-hosted workflows) remains out of scope; it would need an explicit +stable shared entity key rather than `instance_id`. 
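
The scoped naming described in §4.3 can be illustrated with a minimal sketch. This is not code from the patch: `scoped_durable_name` and `agent_registry_key` are hypothetical helpers introduced here only for illustration, and the sketch assumes the same conservative workflow-name charset that the naming module in this patch enforces.

```python
from __future__ import annotations

import re

DURABLE_NAME_PREFIX = "dafx-"
# Assumption: same charset rule as the patch's naming module
# (letter first, then letters/digits/'_'/'-', at most 63 characters).
_WORKFLOW_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_-]{0,62}$")


def scoped_durable_name(workflow_name: str, executor_id: str) -> str:
    """Hypothetical helper: build ``dafx-{workflowName}-{executorId}``."""
    if not _WORKFLOW_NAME_RE.match(workflow_name):
        raise ValueError(f"invalid workflow name: {workflow_name!r}")
    return f"{DURABLE_NAME_PREFIX}{workflow_name}-{executor_id}"


def agent_registry_key(agent_name: str, workflow_name: str | None = None) -> str:
    """Hypothetical lookup key: bare for standalone agents, scoped otherwise."""
    if workflow_name is None:
        return f"{DURABLE_NAME_PREFIX}{agent_name}"
    return scoped_durable_name(workflow_name, agent_name)


# Two workflows reusing the executor id "translator" get distinct durable names.
orders = agent_registry_key("translator", workflow_name="orders")
billing = agent_registry_key("translator", workflow_name="billing")
standalone = agent_registry_key("translator")
```

One point this sketch makes visible: because a workflow name may itself contain `-`, plain concatenation is only unambiguous if executor ids cannot also contain `-` (for example, `orders` + `eu-translator` and `orders-eu` + `translator` produce the same string). Whether executor ids are restricted in that way is not stated here, so this is worth checking before relying on uniqueness.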
+ +### 4.4 Standalone worker changes (`durabletask`) + +- `DurableAIAgentWorker.configure_workflow` becomes **additive**: store + `self._workflows: dict[str, Workflow]` keyed by sanitized name; reject + duplicates and auto-generated names. +- Register one orchestrator per workflow, each a closure capturing its `Workflow`, + with `__name__ = workflow_orchestrator_name(name)`. +- Register that workflow's non-agent activities and agent entities under their + **scoped** names `dafx-{workflowName}-{executorId}` (§4.3), each capturing the + specific executor/agent instance, via the **same** `add_agent` / + activity-registration primitives that standalone `add_agent` uses (only the + name differs). Workflow agents stay tracked in the registry under their + workflow-qualified identity; an agent that should *also* be standalone- + addressable under a bare name is registered separately via `add_agent`. +- `plan_workflow_registration` already returns `orchestrator_name`; extend it to + also group agents/activities per workflow and thread the per-workflow name + through it (the `{workflow_name: [...]}` grouping both hosts consume). + +### 4.5 Client changes (`DurableWorkflowClient`) + +- Add an optional `workflow_name` to `start_workflow` / `run_workflow` / + `stream_workflow`. The client resolves the orchestration name via + `workflow_orchestrator_name(workflow_name)`. +- When the worker hosts exactly one workflow, `workflow_name` may be omitted + (resolves to the sole registered workflow) for ergonomic back-compat. +- Status/HITL methods remain keyed by `instance_id`; add an optional + `workflow_name` used to validate ownership (the instance's orchestration name + must match), mirroring `_is_workflow_orchestration`. + +### 4.6 Azure Functions host changes (`azurefunctions`) + +- `AgentFunctionApp` accepts `workflows: list[Workflow] | dict[str, Workflow]` + (keep singular `workflow=` as a back-compat alias for one entry). 
+- Per workflow: register an orchestrator via + `@function_name(workflow_orchestrator_name(name))` + `@orchestration_trigger`, + register its activities/entities under scoped names + `dafx-{workflowName}-{executorId}` (§4.3), and register **per-workflow routes**: + `workflow/{workflowName}/run`, `workflow/{workflowName}/status/{instanceId}`, + `workflow/{workflowName}/respond/{instanceId}/{requestId}`. +- **Routes are always per-workflow, even for a single workflow** (decision §8). + Keeping the shape constant means downstream callers do not have to change URLs + when an app grows from one workflow to many — the single-workflow case is just + the one-element case of the general shape. (No legacy flat `workflow/run` + aliases; this is still a preview surface.) +- Replace `_is_workflow_orchestration(status)` with + `_is_owned_orchestration(status, workflow_name)` comparing + `status.name == workflow_orchestrator_name(workflow_name)` (the existing + case-insensitive comment already anticipated per-workflow names). +- Workflow agents register through the **same** `add_agent` primitive as + `agents=`, under the scoped name `dafx-{workflowName}-{executorId}`, so they + stay tracked in the registry. `get_agent` gains an optional `workflow_name` to + resolve them: `get_agent(name)` for bare standalone agents, + `get_agent(name, workflow_name=...)` for workflow-scoped agents. Expose an agent + standalone (bare `dafx-{agentName}`) by passing it via `agents=`; an agent used + both ways is registered both ways and yields two independent entities (§4.3). +- No `workflow_agents=` constructor kwarg — the agents already live inside each + `Workflow`; the per-workflow grouping is internal (§4.3). + +--- + +## 5. Part 2 — Sub-workflows + +A `WorkflowExecutor` node in a hosted workflow must run its inner `Workflow` +durably. Three execution models were considered. 
+ +### 5.1 Models considered + +- **Model A — inner workflow inside one activity (status quo).** Register the + `WorkflowExecutor` as a normal activity; its body runs `inner.workflow.run()` + in-process. Simplest, but not durable, cannot pause for inner HITL, and risks + activity timeouts. **Rejected** as the primary model (it is today's accidental, + broken behavior). +- **Model B — child orchestration (recommended).** When the orchestrator reaches + a `WorkflowExecutor` node, it starts the inner workflow as a **durable child + orchestration** (`call_sub_orchestrator(workflow_orchestrator_name(inner_name), + input=...)`) and awaits its result like any other task. The inner workflow's + executors become its own activities/entities; it is independently durable, + checkpointed, observable (own instance id), and can run long without hitting + activity limits. +- **Model C — inlined supersteps.** Recursively drive the inner workflow's + superstep loop inside the parent orchestration generator, scheduling inner + executor activities directly and qualifying inner request ids + (`{subId}.{requestId}`) like .NET ports. Durable and single-instance, but + bloats one orchestration history, prevents independent inner observation, and + re-implements nesting bookkeeping in the generator. **Rejected** as primary + (highest complexity, weakest observability). + +### 5.2 Recommendation: Model B (child orchestration) + +Model B is the natural durable fit for Python because Python's executors already +run as activities/entities driven by the orchestrator — so a sub-workflow is just +**another registered workflow started by a parent instead of by HTTP**. It reuses +the entire Part 1 multi-workflow machinery (named registration, per-workflow +orchestrator, scoped activity/entity names, ownership checks). 
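
As a rough sketch of how the Model B dispatch decision might look, the following uses a stand-in context object rather than a real Durable Task SDK context. `FakeContext`, `dispatch`, and the `payload` parameter name are hypothetical; the `dafx-` names and the `{parent_instance_id}::{executor_id}` child id follow the scheme this document proposes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeContext:
    """Hypothetical stand-in for an orchestration context; records scheduling."""

    instance_id: str
    scheduled: list[tuple[str, str, str | None]] = field(default_factory=list)

    def call_activity(self, name: str, payload: Any) -> None:
        self.scheduled.append(("activity", name, None))

    def call_sub_orchestrator(
        self, name: str, payload: Any, instance_id: str | None = None
    ) -> None:
        self.scheduled.append(("sub_orchestrator", name, instance_id))


def dispatch(
    ctx: FakeContext,
    workflow_name: str,
    executor_id: str,
    inner_workflow_name: str | None,
    payload: Any,
) -> None:
    """Route a message to a child orchestration or to a scoped activity."""
    if inner_workflow_name is not None:
        # Sub-workflow node: start the inner workflow's own orchestrator with
        # a child instance id derived deterministically from the parent.
        child_id = f"{ctx.instance_id}::{executor_id}"
        ctx.call_sub_orchestrator(
            f"dafx-{inner_workflow_name}", payload, instance_id=child_id
        )
    else:
        # Ordinary executor: scoped activity name, as in Part 1.
        ctx.call_activity(f"dafx-{workflow_name}-{executor_id}", payload)


ctx = FakeContext(instance_id="run-1")
dispatch(ctx, "orders", "validate", None, {"order": 1})
dispatch(ctx, "orders", "fulfil", "fulfilment", {"order": 1})
```

The sketch omits agent entities, fan-out counters, and result routing; it only shows that the sub-workflow branch reuses the per-workflow orchestrator name instead of registering the node as an activity.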
+ +**This matches .NET.** The .NET *durable* host dispatches sub-workflow nodes via +`DurableExecutorDispatcher.ExecuteSubWorkflowAsync` → +`context.CallSubOrchestratorAsync("dafx-{innerName}", ...)`, and the child +orchestration runs its own superstep loop with its inner executors as durable +activities/entities in the child's history. Model B is the same approach in +Python (the `WorkflowHostExecutor` / `InProcessRunner` path is the *core in- +process engine*, not the durable host). The tradeoff is more orchestration +instances (extra bookkeeping and more rows in the DT UI) in exchange for true +inner durability, independent inner observability, inner HITL, and no activity- +timeout coupling. + +### 5.3 Required changes + +- **Protocol.** Add `call_sub_orchestrator(name, input, instance_id=None)` to the + `WorkflowOrchestrationContext` protocol, implemented by both adapters + (`DurableTaskWorkflowContext` → `OrchestrationContext.call_sub_orchestrator`; + `AzureFunctionsWorkflowContext` → `DurableOrchestrationContext.call_sub_orchestrator`). + Both underlying SDKs support sub-orchestrations. +- **Planner.** Extend `plan_workflow_registration` to detect + `isinstance(executor, WorkflowExecutor)` and return a new + `subworkflow_executors` category carrying the inner `Workflow`. The host then + (a) **recursively registers** the inner workflow's orchestrator/activities/ + entities, and (b) does **not** register the `WorkflowExecutor` itself as an + activity. +- **Orchestrator routing.** In `run_workflow_orchestrator`'s task-preparation + phase, route a message destined for a `WorkflowExecutor` node to + `ctx.call_sub_orchestrator(...)` instead of an activity task. The child's + result feeds back into edge routing exactly like an activity result (outputs → + messages / final outputs). 
+- **Deterministic child instance ids.** Derive + `f"{parent_instance_id}::{executor_id}"` (append a deterministic counter when a + `WorkflowExecutor` runs on multiple messages in a superstep, e.g. fan-out) for + discoverability and idempotent replay. +- **Recursion bound.** Detect cycles and cap nesting depth (configurable) to + prevent unbounded sub-orchestration trees. +- **Result/output mapping.** Reuse the existing typed-output reconstruction + (`deserialize_workflow_output`) on the child result before routing. + +### 5.4 Sub-workflow HITL + +The inner workflow's `request_info` surfaces in the **child** orchestration's +custom status. Two addressing options: + +- **B1 — direct child addressing.** Expose child instance ids; the responder + posts to `workflow/{innerName}/respond/{childInstanceId}/{requestId}`. Simple; + caller discovers child ids from the parent status (which lists nested pending + requests with their child instance ids). +- **B2 — propagated single surface (recommended, .NET-aligned philosophy).** + Bubble inner pending requests up into the **parent** custom status with + **qualified request ids** (`{executor_path}::{requestId}`), mirroring .NET port + qualification. A response to the parent is routed by stripping the qualifier and + raising the event on the owning child instance. One addressing surface for + arbitrarily deep nesting, at the cost of parent→child response plumbing. + +**Decision: B2 (propagated single surface).** Pending inner requests bubble up +into the **parent** custom status with **qualified request ids** +(`{executor_path}::{requestId}`), mirroring .NET port qualification. A response to +the parent is routed by stripping the qualifier and raising the event on the +owning child instance. This gives one addressing surface for arbitrarily deep +nesting (the caller always talks to the top-level run), at the cost of +parent→child response plumbing. 
It is consistent with the "always per-workflow, +stable surface" routing decision: callers never need to discover child instance +ids. B1 (direct child addressing) is the rejected alternative: it has simpler +plumbing but leaks child instance ids into the caller and changes the surface per +nesting depth. + +--- + +## 6. Cross-cutting concerns + +- **Back-compat / migration (decision: hard switch).** `WORKFLOW_ORCHESTRATOR_NAME` + stays exported as a deprecated alias for source compatibility, but the + single-workflow default orchestration name moves from `workflow_orchestrator` + to `dafx-{name}` with **no runtime alias**. This means **in-flight + single-workflow instances created before the upgrade will not resume** under + the new name. Accepted because durable workflow runs are typically short-lived + and this is a preview surface; operators should drain in-flight workflow + instances before upgrading. (Resolves former open decision; §8.) +- **Determinism.** `call_sub_orchestrator`, `wait_for_external_event`, and timers + are replay-safe; child instance ids must be derived deterministically (no + `uuid4()` in the orchestrator; use `ctx.new_uuid()` or derived ids). +- **Security / route scoping.** Per-workflow ownership checks + (`_is_owned_orchestration`) extend the existing defense so a caller holding an + instance id cannot cross workflow boundaries. Sub-workflow respond endpoints + validate child ownership the same way. +- **Streaming.** `supports_event_streaming` stays host-gated (Azure Functions + off due to the 16 KB custom-status cap). Nested event propagation respects the + same gate. + +--- + +## 7. Phased work breakdown + +Each phase is independently shippable. + +- **Phase 0 — naming + validation.** Add `workflow_orchestrator_name` / + reverse / `validate_workflow_name`; deprecate `WORKFLOW_ORCHESTRATOR_NAME`. + Unit tests for naming round-trips and validation. 
(durabletask) +- **Phase 1 — multiple workflows on the standalone worker.** Additive + `configure_workflow`, per-workflow orchestrators, scoped activity/entity names + `dafx-{workflowName}-{executorId}`, client `workflow_name` targeting + + ownership. Unit + integration tests with two workflows in one hub (including two + workflows that reuse an executor/agent id with different implementations). +- **Phase 2 — multiple workflows on Azure Functions.** `workflows=`, + per-workflow orchestrators/activities/routes (always per-workflow), + `_is_owned_orchestration`. Unit tests + a two-workflow sample. +- **Phase 3 — sub-workflows via child orchestrations.** Protocol + `call_sub_orchestrator` + both adapters; planner `subworkflow_executors` + + recursive registration; orchestrator routing; deterministic child ids; + recursion bound. Unit + integration tests + a nested-workflow sample. +- **Phase 4 — sub-workflow HITL (B2).** Propagate inner pending requests to the + parent custom status with qualified request ids; route a parent response to the + owning child instance by stripping the qualifier. Tests + HITL sub-workflow + sample. +- **Phase 5 — docs, samples, ADR(s).** Promote the multi-workflow and + sub-workflow decisions into ADR(s) under `docs/decisions/`; add README/runbook + updates. + +--- + +## 8. Decisions + +**Resolved:** + +1. **Orchestration naming** (§4.1, §4.3): orchestration name **`dafx-{workflowName}`** + (matches .NET; the name the Durable Task tooling/UI surfaces). +2. **Workflow-internal durable names** (§4.3): scope inner activity/entity names + by workflow — **`dafx-{workflowName}-{executorId}`** (Approach A). Distinct + names per workflow, plain closures, no runtime registry; removes the + same-executor-id collision. Diverges from .NET's bare inner names, but only the + orchestration name (identical to .NET) is UI-surfaced. +3. 
**Multi-workflow route shape on Azure Functions** (§4.6): **always per-workflow + routes**, so downstream callers don't change URLs when an app grows from one + workflow to many. +4. **Sub-workflow execution model** (§5): **Model B (child orchestration via + `call_sub_orchestrator`)**, which is what the .NET durable host does + (`ExecuteSubWorkflowAsync`). Accept more orchestration instances in exchange + for inner durability and observability. +5. **Single-workflow orchestration-name migration** (§6): **hard switch** to + `dafx-{name}` with no runtime alias. Pre-upgrade in-flight instances under + `workflow_orchestrator` won't resume; acceptable for a preview surface. +6. **Sub-workflow HITL addressing** (§5.4): **B2** — propagate inner pending + requests to the parent custom status with qualified request ids; the caller + always responds to the top-level run. +7. **Agent addressing** (§4.3, §4.6): workflow agents register through the **same** + `add_agent` primitive as `agents=`, under the scoped name + `dafx-{workflowName}-{executorId}`, and stay reachable via + `get_agent(name, workflow_name=...)`. Bare `agents=` registration keeps the + standalone `dafx-{agentName}` surface. No `workflow_agents=` kwarg — the + per-workflow grouping is an internal planner structure both hosts consume. + Agent conversation *state* stays isolated by the entity key (`ctx.instance_id`) + regardless of naming. + +**Still open:** + +- **Cross-workflow shared agents** (§4.3): a single agent that intentionally + shares conversation memory across two co-hosted workflows is out of scope; if + wanted later it needs an explicit stable shared entity key rather than + `instance_id`. Flagged as a possible follow-up, not part of this work. 
diff --git a/python/packages/durabletask/agent_framework_durabletask/__init__.py b/python/packages/durabletask/agent_framework_durabletask/__init__.py index bcab531e51f..4a51d476cdd 100644 --- a/python/packages/durabletask/agent_framework_durabletask/__init__.py +++ b/python/packages/durabletask/agent_framework_durabletask/__init__.py @@ -55,6 +55,13 @@ from ._workflows.client import DurableWorkflowClient from ._workflows.context import WorkflowOrchestrationContext from ._workflows.dt_context import DurableTaskWorkflowContext +from ._workflows.naming import ( + DURABLE_NAME_PREFIX, + is_auto_generated_workflow_name, + validate_workflow_name, + workflow_name_from_orchestrator, + workflow_orchestrator_name, +) from ._workflows.orchestrator import WORKFLOW_ORCHESTRATOR_NAME, run_workflow_orchestrator from ._workflows.registration import WorkflowRegistrationPlan, plan_workflow_registration from ._workflows.runner_context import CapturingRunnerContext @@ -68,6 +75,7 @@ __all__ = [ "DEFAULT_MAX_POLL_RETRIES", "DEFAULT_POLL_INTERVAL_SECONDS", + "DURABLE_NAME_PREFIX", "MIMETYPE_APPLICATION_JSON", "MIMETYPE_TEXT_PLAIN", "REQUEST_RESPONSE_FORMAT_JSON", @@ -121,8 +129,12 @@ "deserialize_workflow_output", "ensure_response_format", "execute_workflow_activity", + "is_auto_generated_workflow_name", "load_agent_response", "plan_workflow_registration", "run_agent_coroutine", "run_workflow_orchestrator", + "validate_workflow_name", + "workflow_name_from_orchestrator", + "workflow_orchestrator_name", ] diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py new file mode 100644 index 00000000000..07c65e62e9e --- /dev/null +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py @@ -0,0 +1,145 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Durable naming helpers for hosting MAF Workflows. 
+ +A hosted workflow maps to durable primitives (an orchestration, plus an activity +or entity per executor) whose names must be **stable** across worker restarts: +durable replay only resumes an in-flight orchestration if the orchestration, +activity, and entity names still resolve to the same functions. This module +centralizes how those names are derived from a workflow name so every host (the +Azure Functions host and the standalone durabletask worker) and the client agree +on one scheme. + +Naming scheme (the orchestration name is aligned byte-for-byte with .NET's +``WorkflowNamingHelper``):: + + orchestration: dafx-{workflowName} + non-agent activity: dafx-{workflowName}-{executorId} (wired up in Phase 1) + agent entity: dafx-{workflowName}-{executorId} (wired up in Phase 1) + +The orchestration name is the identifier the Durable Task tooling/UI surfaces, so +it matches .NET exactly. The inner activity/entity names are scoped by workflow in +Python (unlike .NET's bare ``dafx-{executorId}``) so two co-hosted workflows that +reuse an executor id cannot collide. See +``docs/design/durabletask-multiworkflow-and-subworkflows.md`` for the rationale. +""" + +from __future__ import annotations + +import re + +__all__ = [ + "DURABLE_NAME_PREFIX", + "is_auto_generated_workflow_name", + "validate_workflow_name", + "workflow_name_from_orchestrator", + "workflow_orchestrator_name", +] + +# Shared prefix for every durable name this hosting layer registers. Matches +# .NET's ``WorkflowNamingHelper.OrchestrationFunctionPrefix`` and the existing +# ``AgentSessionId.ENTITY_NAME_PREFIX``. +DURABLE_NAME_PREFIX = "dafx-" + +# A workflow name is interpolated into durable orchestration/activity/entity names +# *and* into HTTP route segments (``workflow/{workflowName}/run``), so it must be +# conservative enough to be safe in every position: ASCII letters, digits, '_' or +# '-', starting with a letter, at most 63 characters. 
The length cap leaves room +# for the ``dafx-`` prefix and an ``-{executorId}`` suffix within typical durable +# name limits. +_WORKFLOW_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_-]{0,62}\Z") + +# Names auto-generated by ``WorkflowBuilder`` when the caller does not pass one, +# e.g. ``"WorkflowBuilder-3f2b1c0a-1234-5678-9abc-def012345678"``. They embed a +# fresh ``uuid4`` per process build, so they are not stable identities and must be +# rejected for durable hosting (see :func:`validate_workflow_name`). +_AUTO_GENERATED_NAME_RE = re.compile( + r"^WorkflowBuilder-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$" +) + + +def workflow_orchestrator_name(workflow_name: str) -> str: + """Return the durable orchestration name for a workflow. + + Args: + workflow_name: The workflow's name. Must satisfy + :func:`validate_workflow_name`. + + Returns: + ``"dafx-{workflow_name}"``. + + Raises: + ValueError: If ``workflow_name`` is not a valid, stable workflow name. + """ + validate_workflow_name(workflow_name) + return f"{DURABLE_NAME_PREFIX}{workflow_name}" + + +def workflow_name_from_orchestrator(orchestrator_name: str) -> str | None: + """Recover the workflow name from a durable orchestration name. + + The inverse of :func:`workflow_orchestrator_name`. Intended to be applied to + orchestration names (for example a durable instance's ``status.name``); it + strips the shared :data:`DURABLE_NAME_PREFIX`. + + Args: + orchestrator_name: A durable orchestration name. + + Returns: + The workflow name, or ``None`` if ``orchestrator_name`` does not carry the + expected prefix (so a caller can treat it as "not one of ours"). + """ + if not orchestrator_name.startswith(DURABLE_NAME_PREFIX): + return None + name = orchestrator_name[len(DURABLE_NAME_PREFIX) :] + return name or None + + +def validate_workflow_name(workflow_name: str) -> None: + """Validate that a workflow name is usable as a stable durable identity.
+ + The name is **validated and rejected** rather than silently sanitized. A + workflow name is an identity baked into durable orchestration/activity/entity + names and HTTP routes, so transforming it could either (a) collapse two + distinct names into one and reintroduce the cross-workflow collision this + scheme exists to prevent, or (b) change the resolved name across versions and + break resume of in-flight instances. A loud error is safer than a silent + rename. + + Args: + workflow_name: The candidate name. + + Raises: + ValueError: If the name is empty, an auto-generated ``WorkflowBuilder`` + name, or contains characters outside + ``[A-Za-z][A-Za-z0-9_-]{0,62}``. + """ + if not workflow_name: + raise ValueError("Workflow name must be a non-empty string.") + if is_auto_generated_workflow_name(workflow_name): + raise ValueError( + f"Workflow name '{workflow_name}' is an auto-generated WorkflowBuilder name, which is " + "not stable across restarts. Pass an explicit, stable name to WorkflowBuilder(name=...) " + "before hosting the workflow durably." + ) + if not _WORKFLOW_NAME_RE.match(workflow_name): + raise ValueError( + f"Workflow name '{workflow_name}' is invalid. Use 1-63 characters consisting of ASCII " + "letters, digits, '_' or '-', and starting with a letter." + ) + + +def is_auto_generated_workflow_name(workflow_name: str) -> bool: + """Return whether a name looks like ``WorkflowBuilder``'s auto-generated default. + + ``WorkflowBuilder`` names an otherwise-unnamed workflow + ``f"WorkflowBuilder-{uuid4()}"``, which changes on every process build and is + therefore not a stable durable identity. + + Args: + workflow_name: The candidate name. + + Returns: + ``True`` if the name matches the auto-generated pattern. 
+ """ + return bool(_AUTO_GENERATED_NAME_RE.match(workflow_name)) diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py index fec138fca8c..e211d8c9ae0 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py @@ -73,6 +73,12 @@ # Standalone clients start a configured workflow by scheduling an orchestration # with this name, e.g. # ``client.schedule_new_orchestration(WORKFLOW_ORCHESTRATOR_NAME, input=...)``. +# +# DEPRECATED (multi-workflow migration): this fixed single-workflow name is being +# replaced by per-workflow orchestration names ``dafx-{workflowName}`` derived via +# :func:`agent_framework_durabletask._workflows.naming.workflow_orchestrator_name`. +# It is retained for source compatibility while the single-workflow hosting path +# still uses it; new code should prefer ``workflow_orchestrator_name(name)``. WORKFLOW_ORCHESTRATOR_NAME = "workflow_orchestrator" diff --git a/python/packages/durabletask/tests/test_workflow_naming.py b/python/packages/durabletask/tests/test_workflow_naming.py new file mode 100644 index 00000000000..c09fda21369 --- /dev/null +++ b/python/packages/durabletask/tests/test_workflow_naming.py @@ -0,0 +1,107 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Unit tests for the durable workflow naming helpers. + +These helpers derive the **stable** durable names a hosted workflow registers +under. Stability matters: durable replay resumes an in-flight orchestration only +if the orchestration name still resolves, so the round-trip +(``workflow_orchestrator_name`` ↔ ``workflow_name_from_orchestrator``) and the +validation rules (reject empty / malformed / auto-generated names) are the +contract the multi-workflow hosting builds on. 
+""" + +import uuid + +import pytest + +from agent_framework_durabletask import ( + DURABLE_NAME_PREFIX, + is_auto_generated_workflow_name, + validate_workflow_name, + workflow_name_from_orchestrator, + workflow_orchestrator_name, +) + + +class TestWorkflowOrchestratorName: + """``workflow_orchestrator_name`` derives ``dafx-{name}`` for valid names.""" + + def test_prepends_prefix(self) -> None: + assert workflow_orchestrator_name("orders") == "dafx-orders" + + def test_uses_shared_prefix_constant(self) -> None: + assert workflow_orchestrator_name("orders") == f"{DURABLE_NAME_PREFIX}orders" + + @pytest.mark.parametrize("name", ["a", "Order_Processor", "spam-detection", "wf123"]) + def test_accepts_valid_names(self, name: str) -> None: + assert workflow_orchestrator_name(name) == f"dafx-{name}" + + @pytest.mark.parametrize("name", ["", "1abc", "has space", "bad/char", "emoji😀"]) + def test_rejects_invalid_names(self, name: str) -> None: + with pytest.raises(ValueError): + workflow_orchestrator_name(name) + + +class TestWorkflowNameRoundTrip: + """``workflow_name_from_orchestrator`` inverts ``workflow_orchestrator_name``.""" + + @pytest.mark.parametrize("name", ["orders", "Order_Processor", "spam-detection", "wf123"]) + def test_round_trips(self, name: str) -> None: + orchestrator = workflow_orchestrator_name(name) + assert workflow_name_from_orchestrator(orchestrator) == name + + def test_returns_none_without_prefix(self) -> None: + # A bare orchestration name (no dafx- prefix) is "not one of ours". + assert workflow_name_from_orchestrator("workflow_orchestrator") is None + + def test_returns_none_for_prefix_only(self) -> None: + assert workflow_name_from_orchestrator(DURABLE_NAME_PREFIX) is None + + def test_strips_only_leading_prefix(self) -> None: + # Reverse is meant for orchestration names; it strips just the prefix, so a + # scoped activity-style name returns the remainder verbatim. 
+ assert workflow_name_from_orchestrator("dafx-orders-translator") == "orders-translator" + + +class TestValidateWorkflowName: + """``validate_workflow_name`` rejects unstable / unsafe identities.""" + + @pytest.mark.parametrize("name", ["a", "A", "wf", "Order_Processor", "spam-detection", "x" * 63]) + def test_accepts_valid(self, name: str) -> None: + validate_workflow_name(name) # should not raise + + def test_rejects_empty(self) -> None: + with pytest.raises(ValueError, match="non-empty"): + validate_workflow_name("") + + @pytest.mark.parametrize("name", ["1abc", "-abc", "_abc", "has space", "bad/char", "a.b", "x" * 64]) + def test_rejects_malformed(self, name: str) -> None: + with pytest.raises(ValueError, match="invalid"): + validate_workflow_name(name) + + def test_rejects_auto_generated(self) -> None: + name = f"WorkflowBuilder-{uuid.uuid4()}" + with pytest.raises(ValueError, match="auto-generated"): + validate_workflow_name(name) + + +class TestIsAutoGeneratedWorkflowName: + """``is_auto_generated_workflow_name`` detects WorkflowBuilder defaults.""" + + def test_detects_uuid_default(self) -> None: + assert is_auto_generated_workflow_name(f"WorkflowBuilder-{uuid.uuid4()}") is True + + def test_detects_uppercase_hex_uuid(self) -> None: + assert is_auto_generated_workflow_name(f"WorkflowBuilder-{str(uuid.uuid4()).upper()}") is True + + @pytest.mark.parametrize( + "name", + [ + "orders", + "WorkflowBuilder", + "WorkflowBuilder-not-a-uuid", + "MyWorkflowBuilder-3f2b1c0a-1234-5678-9abc-def012345678", + ], + ) + def test_ignores_explicit_names(self, name: str) -> None: + assert is_auto_generated_workflow_name(name) is False From eb3691ebf4afaa2d7614f6bfe066f6a77fc981ad Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Tue, 23 Jun 2026 16:23:49 -0400 Subject: [PATCH 02/12] feat(durabletask): host multiple workflows per worker with scoped names (phase 1) Enables hosting more than one MAF workflow on a single standalone Durable Task worker, and aligns both hosts on 
workflow-scoped durable names so two co-hosted workflows that reuse an executor id cannot collide. Naming (shared, host-agnostic): - orchestration: dafx-{workflowName} (matches .NET; the name DT tooling surfaces) - non-agent activity / agent entity: dafx-{workflowName}-{executorId} (scoped) - New naming helpers workflow_scoped_executor_id / workflow_executor_activity_name. Standalone worker (agent-framework-durabletask): - configure_workflow is now additive: stores workflows keyed by Workflow.name, rejects duplicate / auto-generated (WorkflowBuilder-) / invalid names, registers one orchestrator per workflow plus its scoped activities/entities. - The shared orchestrator dispatches scoped names derived from workflow.name. - New registered_workflow_names property. Client (DurableWorkflowClient): - Optional default workflow_name on the client; start/run/stream accept a per-call workflow_name and target dafx-{name}. - Opt-in ownership validation on status/HITL methods: when a workflow name is resolvable, an instance whose orchestration name does not match is treated as not-found (status -> None, pending -> [], send_hitl_response / await -> raise), mirroring the Azure Functions route-scoping check. Azure Functions host (agent-framework-azurefunctions): - Registration now uses the same scoped names so the shared orchestrator's dispatch matches (single workflow per app for now; flat workflow/* routes kept). - Workflow name is validated up front; workflow agents register under the scoped entity id; _is_workflow_orchestration scopes to dafx-{workflow.name}. Samples + tests: - Durable Task and Azure Functions workflow samples now name their workflow. - Unit tests cover multi-workflow registration, name validation, client targeting, and ownership; integration tests target the named workflows. WORKFLOW_ORCHESTRATOR_NAME remains exported (deprecated). 
This is a hard switch: in-flight single-workflow instances created before upgrade (under the old workflow_orchestrator name) will not resume. Design: docs/design/durabletask-multiworkflow-and-subworkflows.md --- .../agent_framework_azurefunctions/_app.py | 52 ++++--- .../packages/azurefunctions/tests/test_app.py | 53 ++++--- .../agent_framework_durabletask/_worker.py | 90 ++++++++---- .../_workflows/client.py | 134 ++++++++++++++++-- .../_workflows/naming.py | 36 +++++ .../_workflows/orchestrator.py | 32 ++++- .../integration_tests/test_08_dt_workflow.py | 11 +- .../test_09_dt_workflow_hitl.py | 10 +- .../packages/durabletask/tests/test_worker.py | 77 +++++++++- .../durabletask/tests/test_workflow_client.py | 105 ++++++++++++-- .../09_workflow_shared_state/function_app.py | 2 +- .../function_app.py | 2 +- .../11_workflow_parallel/function_app.py | 2 +- .../12_workflow_hitl/function_app.py | 2 +- .../durabletask/08_workflow/client.py | 10 +- .../durabletask/08_workflow/worker.py | 3 +- .../durabletask/09_workflow_hitl/client.py | 7 +- .../durabletask/09_workflow_hitl/worker.py | 3 +- 18 files changed, 514 insertions(+), 117 deletions(-) diff --git a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py index f46ba895d41..c31d9a9b700 100644 --- a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py +++ b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py @@ -32,7 +32,6 @@ THREAD_ID_HEADER, WAIT_FOR_RESPONSE_FIELD, WAIT_FOR_RESPONSE_HEADER, - WORKFLOW_ORCHESTRATOR_NAME, AgentResponseCallbackProtocol, AgentSessionId, ApiResponseFields, @@ -43,6 +42,12 @@ execute_workflow_activity, plan_workflow_registration, ) +from agent_framework_durabletask._workflows.naming import ( + validate_workflow_name, + workflow_executor_activity_name, + workflow_orchestrator_name, + workflow_scoped_executor_id, +) from 
agent_framework_durabletask._workflows.serialization import strip_pickle_markers from ._entities import create_agent_entity @@ -256,21 +261,25 @@ def __init__( # is shared with the standalone durabletask host via plan_workflow_registration. if workflow: logger.debug("[AgentFunctionApp] Extracting agents from workflow") + # Durable names are derived from the workflow name and must be stable across + # restarts, so reject an unnamed/auto-generated workflow up front. + validate_workflow_name(workflow.name) plan = plan_workflow_registration(workflow) for agent_executor in plan.agent_executors: - # Register each workflow agent through the same surface as a - # standalone agent (so it is tracked in ``agents`` / ``get_agent``), - # but keyed by the executor id the orchestrator dispatches to, so - # AgentExecutor(agent, id=...) works when the id differs from - # agent.name. Mirrors DurableAIAgentWorker.add_agent(entity_id=...). + # Register each workflow agent through the same surface as a standalone + # agent (so it stays tracked in ``agents`` / ``get_agent``), under the + # workflow-scoped entity id ``{workflow}-{executor}`` the orchestrator + # dispatches to. This keeps two co-hosted workflows that reuse an + # executor id from colliding on one global entity name. self.add_agent( agent_executor.agent, callback=self.default_callback, - entity_id=agent_executor.id, + entity_id=workflow_scoped_executor_id(workflow.name, agent_executor.id), ) for executor in plan.activity_executors: - # Set up a Functions activity trigger for each non-agent executor. - self._setup_executor_activity(executor.id) + # Set up a Functions activity trigger for each non-agent executor, + # scoped by workflow name to match the orchestrator's dispatch. 
+ self._setup_executor_activity(workflow.name, executor.id) self._setup_workflow_orchestration() @@ -286,13 +295,14 @@ def __init__( logger.debug("[AgentFunctionApp] Initialization complete") - def _setup_executor_activity(self, executor_id: str) -> None: + def _setup_executor_activity(self, workflow_name: str, executor_id: str) -> None: """Register an activity for executing a specific non-agent executor. Args: + workflow_name: The owning workflow's name (scopes the activity name). executor_id: The ID of the executor to create an activity for. """ - activity_name = f"dafx-{executor_id}" + activity_name = workflow_executor_activity_name(workflow_name, executor_id) logger.debug(f"[AgentFunctionApp] Registering activity '{activity_name}' for executor '{executor_id}'") # Capture executor_id in closure @@ -322,7 +332,11 @@ def executor_activity(inputData: str) -> str: def _setup_workflow_orchestration(self) -> None: """Register the workflow orchestration and related HTTP endpoints.""" + if self.workflow is None: + raise RuntimeError("Workflow not initialized in AgentFunctionApp") + orchestrator_name = workflow_orchestrator_name(self.workflow.name) + @self.function_name(orchestrator_name) @self.orchestration_trigger(context_name="context") def workflow_orchestrator(context: df.DurableOrchestrationContext) -> Any: # type: ignore[type-arg] """Generic orchestrator for running the configured workflow.""" @@ -358,7 +372,7 @@ async def start_workflow_orchestration( return self._build_error_response("Request body is required") client_input = raw_body.decode("utf-8") - instance_id = await client.start_new(WORKFLOW_ORCHESTRATOR_NAME, client_input=client_input) + instance_id = await client.start_new(orchestrator_name, client_input=client_input) base_url = self._build_base_url(req.url) status_url = f"{base_url}/api/workflow/status/{instance_id}" @@ -503,8 +517,7 @@ def _build_base_url(self, request_url: str) -> str: base_url = request_url.rstrip("/") return base_url - @staticmethod 
- def _is_workflow_orchestration(status: Any) -> bool: + def _is_workflow_orchestration(self, status: Any) -> bool: """Return whether a durable orchestration status belongs to this app's workflow. The ``workflow/status`` and ``workflow/respond`` endpoints address instances by @@ -513,14 +526,17 @@ def _is_workflow_orchestration(status: Any) -> bool: orchestrations, and other apps sharing the hub. Without this check a caller holding one instance ID could read another orchestration's status (including pending HITL request payloads) or inject external events into it. Scoping to - ``WORKFLOW_ORCHESTRATOR_NAME`` keeps both endpoints bound to the workflow this - app hosts; anything else is treated as "not found". + the workflow's ``dafx-{name}`` orchestration name keeps both endpoints bound to + the workflow this app hosts; anything else is treated as "not found". The orchestration name is compared case-insensitively so the check stays robust - as workflow orchestrator naming evolves (e.g. per-workflow names). + to host/runtime casing differences. 
""" + if self.workflow is None: + return False + expected = workflow_orchestrator_name(self.workflow.name) name = getattr(status, "name", None) - return isinstance(name, str) and name.casefold() == WORKFLOW_ORCHESTRATOR_NAME.casefold() + return isinstance(name, str) and name.casefold() == expected.casefold() @property def agents(self) -> dict[str, SupportsAgentRun]: diff --git a/python/packages/azurefunctions/tests/test_app.py b/python/packages/azurefunctions/tests/test_app.py index 1e1d361a789..efed9f32375 100644 --- a/python/packages/azurefunctions/tests/test_app.py +++ b/python/packages/azurefunctions/tests/test_app.py @@ -19,10 +19,10 @@ THREAD_ID_HEADER, WAIT_FOR_RESPONSE_FIELD, WAIT_FOR_RESPONSE_HEADER, - WORKFLOW_ORCHESTRATOR_NAME, AgentEntity, AgentEntityStateProviderMixin, DurableAgentState, + workflow_orchestrator_name, ) from agent_framework_azurefunctions import AgentFunctionApp @@ -1332,6 +1332,7 @@ class TestAgentFunctionAppWorkflow: def test_init_with_workflow_stores_workflow(self) -> None: """Test that workflow is stored when provided.""" mock_workflow = Mock() + mock_workflow.name = "test_workflow" mock_workflow.executors = {} with ( @@ -1356,6 +1357,7 @@ def test_init_with_workflow_registers_agent_entity_by_executor_id(self) -> None: mock_executor.id = "custom-executor-id" mock_workflow = Mock() + mock_workflow.name = "orders" mock_workflow.executors = {"custom-executor-id": mock_executor} with ( @@ -1365,17 +1367,17 @@ def test_init_with_workflow_registers_agent_entity_by_executor_id(self) -> None: ): app = AgentFunctionApp(workflow=mock_workflow) - # The entity is registered under the executor id (the dispatch identity). + # The entity is registered under the workflow-scoped dispatch identity. 
setup_entity.assert_called_once() call_args = setup_entity.call_args.args assert call_args[0] is mock_agent - assert call_args[1] == "custom-executor-id" + assert call_args[1] == "orders-custom-executor-id" # Regression guard: the workflow agent must also be tracked on the app's - # normal registration surface, keyed by the executor id, so it appears in - # ``agents`` and is retrievable via ``get_agent`` (as the constructor documents). - assert "custom-executor-id" in app.agents - assert app.agents["custom-executor-id"] is mock_agent + # normal registration surface, keyed by the scoped id, so it appears in + # ``agents`` and is retrievable via ``get_agent``. + assert "orders-custom-executor-id" in app.agents + assert app.agents["orders-custom-executor-id"] is mock_agent def test_init_with_workflow_calls_setup_methods(self) -> None: """Test that workflow setup methods are called.""" @@ -1383,6 +1385,7 @@ def test_init_with_workflow_calls_setup_methods(self) -> None: mock_executor.id = "TestExecutor" mock_workflow = Mock() + mock_workflow.name = "test_workflow" # Include a non-AgentExecutor so _setup_executor_activity is called mock_workflow.executors = {"TestExecutor": mock_executor} @@ -1421,6 +1424,7 @@ def test_init_with_workflow_and_explicit_agent_does_not_raise(self) -> None: mock_executor.id = "SharedAgent" mock_workflow = Mock() + mock_workflow.name = "shared_flow" mock_workflow.executors = {"SharedAgent": mock_executor} with ( @@ -1437,6 +1441,7 @@ def test_init_with_workflow_and_explicit_agent_does_not_raise(self) -> None: def test_build_status_url(self) -> None: """Test _build_status_url constructs correct URL.""" mock_workflow = Mock() + mock_workflow.name = "test_workflow" mock_workflow.executors = {} with ( @@ -1452,6 +1457,7 @@ def test_build_status_url(self) -> None: def test_build_status_url_handles_trailing_slash(self) -> None: """Test _build_status_url handles URLs without /api/ correctly.""" mock_workflow = Mock() + mock_workflow.name = 
"test_workflow" mock_workflow.executors = {} with ( @@ -1524,40 +1530,55 @@ class TestWorkflowOrchestrationScoping: "not found" instead of leaking its status/HITL details or accepting injected events. """ + def _app_for(self, workflow_name: str) -> AgentFunctionApp: + mock_workflow = Mock() + mock_workflow.name = workflow_name + mock_workflow.executors = {} + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + ): + return AgentFunctionApp(workflow=mock_workflow) + @pytest.mark.parametrize( "name", [ - WORKFLOW_ORCHESTRATOR_NAME, - WORKFLOW_ORCHESTRATOR_NAME.upper(), - "Workflow_Orchestrator", # case-insensitive: must match + workflow_orchestrator_name("orders"), # exact dafx-orders + workflow_orchestrator_name("orders").upper(), # case-insensitive: must match + "DAFX-orders", # mixed case prefix ], ) def test_accepts_matching_workflow_orchestration(self, name: str) -> None: + app = self._app_for("orders") status = Mock() status.name = name - assert AgentFunctionApp._is_workflow_orchestration(status) is True + assert app._is_workflow_orchestration(status) is True def test_rejects_none_status(self) -> None: # client.get_status returns None when no instance resolves for the ID. 
- assert AgentFunctionApp._is_workflow_orchestration(None) is False + app = self._app_for("orders") + assert app._is_workflow_orchestration(None) is False def test_rejects_status_without_name(self) -> None: + app = self._app_for("orders") status = Mock() status.name = None - assert AgentFunctionApp._is_workflow_orchestration(status) is False + assert app._is_workflow_orchestration(status) is False @pytest.mark.parametrize( "other_name", [ "SomeUserOrchestration", - "dafx-WeatherAgent", - "workflow_orchestrator_v2", + "dafx-WeatherAgent", # an agent entity, not this workflow's orchestration + "dafx-billing", # a *different* workflow's orchestration + "workflow_orchestrator", # the deprecated fixed name ], ) def test_rejects_other_orchestration_name(self, other_name: str) -> None: + app = self._app_for("orders") status = Mock() status.name = other_name - assert AgentFunctionApp._is_workflow_orchestration(status) is False + assert app._is_workflow_orchestration(status) is False if __name__ == "__main__": diff --git a/python/packages/durabletask/agent_framework_durabletask/_worker.py b/python/packages/durabletask/agent_framework_durabletask/_worker.py index 2e78ebb05a4..7d1a44ba099 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_worker.py +++ b/python/packages/durabletask/agent_framework_durabletask/_worker.py @@ -21,7 +21,13 @@ from ._entities import AgentEntity, DurableTaskEntityStateProvider from ._workflows.activity import execute_workflow_activity from ._workflows.dt_context import DurableTaskWorkflowContext -from ._workflows.orchestrator import WORKFLOW_ORCHESTRATOR_NAME, run_workflow_orchestrator +from ._workflows.naming import ( + validate_workflow_name, + workflow_executor_activity_name, + workflow_orchestrator_name, + workflow_scoped_executor_id, +) +from ._workflows.orchestrator import run_workflow_orchestrator from ._workflows.registration import plan_workflow_registration logger = logging.getLogger("agent_framework.durabletask") @@ 
-81,7 +87,7 @@ def __init__( self._worker = worker self._callback = callback self._registered_agents: dict[str, SupportsAgentRun] = {} - self._workflow: Workflow | None = None + self._workflows: dict[str, Workflow] = {} logger.debug("[DurableAIAgentWorker] Initialized with worker type: %s", type(worker).__name__) def add_agent( @@ -165,6 +171,16 @@ def registered_agent_names(self) -> list[str]: """ return list(self._registered_agents.keys()) + @property + def registered_workflow_names(self) -> list[str]: + """Get the names of all workflows configured on this worker. + + Returns: + List of workflow names (the identities used to derive each workflow's + ``dafx-{name}`` orchestration). + """ + return list(self._workflows.keys()) + # ----------------------------------------------------------------- # Workflow support # ----------------------------------------------------------------- @@ -180,43 +196,63 @@ def configure_workflow( entities, registers non-agent executors as activities, and creates an orchestrator function that drives the workflow graph. + Multiple workflows can be hosted on one worker: call this method once per + workflow. Each workflow is keyed by its :attr:`Workflow.name`, and its + durable primitives are scoped by that name (orchestration + ``dafx-{name}``; activities/entities ``dafx-{name}-{executorId}``) so two + co-hosted workflows that reuse an executor id do not collide. + Args: - workflow: The MAF :class:`Workflow` to register. + workflow: The MAF :class:`Workflow` to register. Must have an explicit, + stable :attr:`Workflow.name` (an auto-generated + ``WorkflowBuilder-`` name is rejected because it is not stable + across restarts and would break durable resume). callback: Optional callback for agent response notifications. + + Raises: + ValueError: If the workflow name is missing, invalid, auto-generated, + or already registered on this worker. 
""" - self._workflow = workflow + workflow_name = workflow.name + validate_workflow_name(workflow_name) + if workflow_name in self._workflows: + raise ValueError(f"Workflow '{workflow_name}' is already registered on this worker.") + self._workflows[workflow_name] = workflow # The "what to register" decision (agent -> entity, non-agent -> activity) # is shared with the Azure Functions host via plan_workflow_registration. plan = plan_workflow_registration(workflow) - # Register agent executors as durable entities. Each entity is keyed by - # the executor's id (the identity the orchestrator dispatches to) so - # AgentExecutor(agent, id=...) works even when the id differs from the - # agent's name. + # Register agent executors as durable entities, scoped by workflow name so + # two workflows that reuse an executor id register distinct entities. The + # entity is keyed by the scoped identity (the same identity the orchestrator + # dispatches to); the entity *key* at run time is the orchestration instance + # id, which keeps conversation state isolated per run. for agent_executor in plan.agent_executors: - if agent_executor.id not in self._registered_agents: - self.add_agent(agent_executor.agent, callback=callback, entity_id=agent_executor.id) + scoped_id = workflow_scoped_executor_id(workflow_name, agent_executor.id) + if scoped_id not in self._registered_agents: + self.add_agent(agent_executor.agent, callback=callback, entity_id=scoped_id) - # Register non-agent executors as durable activities. + # Register non-agent executors as durable activities, scoped by workflow name. for executor in plan.activity_executors: - self._register_executor_activity(executor) + self._register_executor_activity(workflow, executor) - # Register the workflow orchestrator. - self._register_workflow_orchestrator() + # Register this workflow's orchestrator under its per-workflow name. 
+ self._register_workflow_orchestrator(workflow) logger.info( - "[DurableAIAgentWorker] Workflow configured with %d executors (%d agents, %d activities)", + "[DurableAIAgentWorker] Workflow '%s' configured with %d executors (%d agents, %d activities)", + workflow_name, len(workflow.executors), len(plan.agent_executors), len(plan.activity_executors), ) - def _register_executor_activity(self, executor: Any) -> None: - """Register a non-agent executor as a durabletask activity.""" + def _register_executor_activity(self, workflow: Workflow, executor: Any) -> None: + """Register a non-agent executor as a durabletask activity (workflow-scoped).""" captured_executor = executor - captured_workflow = self._workflow - activity_name = f"dafx-{executor.id}" + captured_workflow = workflow + activity_name = workflow_executor_activity_name(workflow.name, executor.id) def executor_activity(ctx: ActivityContext, input_data: str) -> str: return execute_workflow_activity(captured_executor, input_data, captured_workflow) @@ -228,14 +264,12 @@ def executor_activity(ctx: ActivityContext, input_data: str) -> str: self._worker.add_activity(executor_activity) logger.debug("[DurableAIAgentWorker] Registered activity: %s", activity_name) - def _register_workflow_orchestrator(self) -> None: - """Register the workflow orchestrator function with the worker.""" - captured_workflow = self._workflow + def _register_workflow_orchestrator(self, workflow: Workflow) -> None: + """Register a workflow's orchestrator function under its per-workflow name.""" + captured_workflow = workflow + orchestrator_name = workflow_orchestrator_name(workflow.name) def workflow_orchestrator(context: OrchestrationContext, input_data: Any) -> Any: - if captured_workflow is None: - raise RuntimeError("Workflow not configured") - # Pass the deserialized client input straight to the shared engine, which # reconstructs the start executor's declared type (see _coerce_initial_input). 
initial_message = input_data @@ -245,11 +279,11 @@ def workflow_orchestrator(context: OrchestrationContext, input_data: Any) -> Any outputs = yield from run_workflow_orchestrator(dt_ctx, captured_workflow, initial_message, shared_state) return outputs # noqa: B901 - workflow_orchestrator.__name__ = WORKFLOW_ORCHESTRATOR_NAME - workflow_orchestrator.__qualname__ = WORKFLOW_ORCHESTRATOR_NAME + workflow_orchestrator.__name__ = orchestrator_name + workflow_orchestrator.__qualname__ = orchestrator_name self._worker.add_orchestrator(workflow_orchestrator) - logger.debug("[DurableAIAgentWorker] Registered workflow orchestrator") + logger.debug("[DurableAIAgentWorker] Registered workflow orchestrator: %s", orchestrator_name) def __create_agent_entity( self, diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py index 7faaaf10dda..6b1e833127f 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py @@ -19,7 +19,7 @@ from agent_framework import WorkflowEvent from durabletask.client import TaskHubGrpcClient -from .orchestrator import WORKFLOW_ORCHESTRATOR_NAME +from .naming import workflow_orchestrator_name from .serialization import deserialize_workflow_event, deserialize_workflow_output, strip_pickle_markers logger = logging.getLogger("agent_framework.durabletask") @@ -45,52 +45,107 @@ class DurableWorkflowClient: # Create the underlying client client = DurableTaskSchedulerClient(host_address="localhost:8080", taskhub="default") - # Wrap it with the workflow client - workflow_client = DurableWorkflowClient(client) + # Wrap it with the workflow client, defaulting to the workflow named "orders" + workflow_client = DurableWorkflowClient(client, workflow_name="orders") # Start a workflow and wait for its output instance_id = 
workflow_client.start_workflow(input="some input") output = workflow_client.await_workflow_output(instance_id) print(output) + + # A client without a default targets workflows explicitly per call: + multi = DurableWorkflowClient(client) + instance_id = multi.start_workflow(input="...", workflow_name="billing") ``` """ - def __init__(self, client: TaskHubGrpcClient): + def __init__(self, client: TaskHubGrpcClient, *, workflow_name: str | None = None): """Initialize the workflow client wrapper. Args: client: The durabletask client instance to wrap. + workflow_name: Optional default workflow name to target. When set, the + per-call ``workflow_name`` may be omitted. When a worker hosts a + single workflow, set this once here; when it hosts several, either + set a default and override per call, or pass ``workflow_name`` on + each call. """ self._client = client + self._default_workflow_name = workflow_name logger.debug("[DurableWorkflowClient] Initialized with client type: %s", type(client).__name__) - def start_workflow(self, input: Any = None, *, instance_id: str | None = None) -> str: + def _resolve_workflow_name(self, workflow_name: str | None) -> str: + """Resolve the effective workflow name from a per-call value or the default. + + Raises: + ValueError: If neither a per-call ``workflow_name`` nor a constructor + default was provided. + """ + name = workflow_name or self._default_workflow_name + if not name: + raise ValueError( + "No workflow name provided. Pass workflow_name=... (or set a default on " + "DurableWorkflowClient(workflow_name=...)) so the client can target the " + "right orchestration." + ) + return name + + def start_workflow( + self, input: Any = None, *, workflow_name: str | None = None, instance_id: str | None = None + ) -> str: """Start the workflow orchestration registered by ``configure_workflow``. 
- This schedules the orchestrator that ``DurableAIAgentWorker.configure_workflow`` - auto-registers, so callers do not need to know its internal name. + This schedules the orchestration ``dafx-{workflow_name}`` that + ``DurableAIAgentWorker.configure_workflow`` auto-registers, so callers do + not need to know its internal name. Args: input: The initial message/payload for the workflow. + workflow_name: The workflow to start. Optional if a default was set on + the client; required otherwise. instance_id: Optional explicit orchestration instance ID. If omitted, one is generated. Returns: The orchestration instance ID, for use with ``await_workflow_output``. """ + orchestration_name = workflow_orchestrator_name(self._resolve_workflow_name(workflow_name)) new_instance_id = self._client.schedule_new_orchestration( - WORKFLOW_ORCHESTRATOR_NAME, + orchestration_name, input=input, instance_id=instance_id, ) logger.debug("[DurableWorkflowClient] Started workflow instance: %s", new_instance_id) return new_instance_id - def await_workflow_output(self, instance_id: str, *, timeout_seconds: int = 300) -> Any: + def _is_owned_orchestration(self, state: Any, workflow_name: str | None) -> bool: + """Return whether ``state`` belongs to the targeted workflow. + + Ownership validation is opt-in: when neither a per-call ``workflow_name`` + nor a constructor default is set there is nothing to validate against, so + this returns ``True``. When a name is resolvable, the instance's + orchestration name must equal ``dafx-{workflow_name}`` (compared + case-insensitively, mirroring the Azure Functions host's route-scoping + check). This guards against addressing an instance that belongs to a + different workflow on the same task hub. 
+ """ + name = workflow_name or self._default_workflow_name + if not name: + return True + expected = workflow_orchestrator_name(name) + actual = getattr(state, "name", None) + return isinstance(actual, str) and actual.casefold() == expected.casefold() + + def await_workflow_output( + self, instance_id: str, *, workflow_name: str | None = None, timeout_seconds: int = 300 + ) -> Any: """Wait for a workflow orchestration to complete and return its output. Args: instance_id: The instance ID returned by ``start_workflow``. + workflow_name: Optional workflow name; when set (or a client default is + set) the instance's orchestration is validated to belong to that + workflow. timeout_seconds: Maximum time, in seconds, to wait for completion. Returns: @@ -100,11 +155,15 @@ def await_workflow_output(self, instance_id: str, *, timeout_seconds: int = 300) Raises: TimeoutError: If the workflow does not complete within ``timeout_seconds``. RuntimeError: If the workflow completes with a non-successful status. + ValueError: If the instance does not belong to the targeted workflow. """ metadata = self._client.wait_for_orchestration_completion(instance_id, timeout=timeout_seconds) if metadata is None: raise TimeoutError(f"Workflow '{instance_id}' did not complete within {timeout_seconds}s") + if not self._is_owned_orchestration(metadata, workflow_name): + raise ValueError(f"Instance '{instance_id}' does not belong to the targeted workflow.") + status = metadata.runtime_status.name if status != "COMPLETED": raise RuntimeError(f"Workflow '{instance_id}' ended with status {status}: {metadata.serialized_output}") @@ -120,6 +179,7 @@ async def run_workflow( self, input: Any = None, *, + workflow_name: str | None = None, instance_id: str | None = None, wait: bool = True, timeout_seconds: int = 300, @@ -132,6 +192,8 @@ async def run_workflow( Args: input: The initial message/payload for the workflow. + workflow_name: The workflow to start. 
Optional if a default was set on + the client; required otherwise. instance_id: Optional explicit orchestration instance ID. If omitted, one is generated. wait: When ``True`` (default), wait for completion and return the @@ -151,12 +213,16 @@ async def run_workflow( RuntimeError: If ``wait`` is ``True`` and the workflow ends with a non-successful status. """ - new_instance_id = await asyncio.to_thread(self.start_workflow, input, instance_id=instance_id) + new_instance_id = await asyncio.to_thread( + self.start_workflow, input, workflow_name=workflow_name, instance_id=instance_id + ) if not wait: return new_instance_id - return await asyncio.to_thread(self.await_workflow_output, new_instance_id, timeout_seconds=timeout_seconds) + return await asyncio.to_thread( + self.await_workflow_output, new_instance_id, workflow_name=workflow_name, timeout_seconds=timeout_seconds + ) - def get_runtime_status(self, instance_id: str) -> str | None: + def get_runtime_status(self, instance_id: str, *, workflow_name: str | None = None) -> str | None: """Return the workflow's current runtime status name, or ``None`` if unknown. Lets callers distinguish a workflow that is still running or paused for @@ -166,20 +232,27 @@ def get_runtime_status(self, instance_id: str) -> str | None: Args: instance_id: The instance ID returned by ``start_workflow``. + workflow_name: Optional workflow name; when set (or a client default is + set) an instance that does not belong to that workflow returns + ``None`` (treated as "not found"). Returns: The runtime status name (e.g. ``"RUNNING"``, ``"COMPLETED"``), or - ``None`` if no state is available for the instance. + ``None`` if no state is available for the instance or it belongs to a + different workflow. 
""" state = self._client.get_orchestration_state(instance_id) if state is None: return None + if not self._is_owned_orchestration(state, workflow_name): + return None return state.runtime_status.name async def stream_workflow( self, instance_id: str, *, + workflow_name: str | None = None, poll_interval_seconds: float = 1.0, timeout_seconds: int | None = None, ) -> AsyncIterator[WorkflowEvent]: @@ -201,6 +274,9 @@ async def stream_workflow( Args: instance_id: The instance ID returned by ``start_workflow``. + workflow_name: Optional workflow name; when set (or a client default is + set) the instance is validated to belong to that workflow before + streaming. poll_interval_seconds: Delay between status polls. timeout_seconds: Optional overall timeout; ``None`` streams until the workflow reaches a terminal state. @@ -210,14 +286,22 @@ async def stream_workflow( Raises: TimeoutError: If ``timeout_seconds`` elapses before completion. + ValueError: If the instance does not belong to the targeted workflow. """ cursor = 0 terminal_statuses = {"COMPLETED", "FAILED", "TERMINATED"} deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds + ownership_checked = False while True: state = await asyncio.to_thread(self._client.get_orchestration_state, instance_id) + # Validate ownership once, on the first poll that returns state. 
+ if state is not None and not ownership_checked: + if not self._is_owned_orchestration(state, workflow_name): + raise ValueError(f"Instance '{instance_id}' does not belong to the targeted workflow.") + ownership_checked = True + if state is not None and state.serialized_custom_status: try: status = json.loads(state.serialized_custom_status) @@ -240,7 +324,7 @@ async def stream_workflow( await asyncio.sleep(poll_interval_seconds) - def get_pending_hitl_requests(self, instance_id: str) -> list[dict[str, Any]]: + def get_pending_hitl_requests(self, instance_id: str, *, workflow_name: str | None = None) -> list[dict[str, Any]]: """Return the workflow's pending human-in-the-loop (HITL) requests, if any. While a workflow is paused awaiting human input, the orchestrator records the @@ -249,6 +333,9 @@ def get_pending_hitl_requests(self, instance_id: str) -> list[dict[str, Any]]: Args: instance_id: The workflow instance ID returned by ``start_workflow``. + workflow_name: Optional workflow name; when set (or a client default is + set) an instance that does not belong to that workflow returns an + empty list (treated as "not found"). Returns: A list of pending requests. Each entry contains ``request_id``, @@ -258,6 +345,8 @@ def get_pending_hitl_requests(self, instance_id: str) -> list[dict[str, Any]]: state = self._client.get_orchestration_state(instance_id) if state is None or not state.serialized_custom_status: return [] + if not self._is_owned_orchestration(state, workflow_name): + return [] try: custom_status = json.loads(state.serialized_custom_status) @@ -287,7 +376,9 @@ def get_pending_hitl_requests(self, instance_id: str) -> list[dict[str, Any]]: }) return requests - def send_hitl_response(self, instance_id: str, request_id: str, response: Any) -> None: + def send_hitl_response( + self, instance_id: str, request_id: str, response: Any, *, workflow_name: str | None = None + ) -> None: """Send a response to a pending HITL request, resuming the workflow. 
The orchestrator correlates the response by using ``request_id`` as the @@ -298,11 +389,24 @@ def send_hitl_response(self, instance_id: str, request_id: str, response: Any) - request_id: The pending request's ID (from ``get_pending_hitl_requests``). response: The response payload (e.g. a dict matching the expected response type the executor's ``@response_handler`` expects). + workflow_name: Optional workflow name; when set (or a client default is + set) the instance is validated to belong to that workflow before the + event is raised, so a response is never injected into a different + workflow's orchestration. + + Raises: + ValueError: If the instance does not belong to the targeted workflow. Note: The payload is sanitized with ``strip_pickle_markers`` before delivery to neutralize pickle-marker injection, since the worker deserializes it. """ + # Validate ownership before raising the event when a target is resolvable. + if workflow_name or self._default_workflow_name: + state = self._client.get_orchestration_state(instance_id) + if state is None or not self._is_owned_orchestration(state, workflow_name): + raise ValueError(f"Instance '{instance_id}' does not belong to the targeted workflow.") + safe_response = strip_pickle_markers(response) self._client.raise_orchestration_event(instance_id, event_name=request_id, data=safe_response) logger.debug( diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py index 07c65e62e9e..a39fb9cd604 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py @@ -32,8 +32,10 @@ "DURABLE_NAME_PREFIX", "is_auto_generated_workflow_name", "validate_workflow_name", + "workflow_executor_activity_name", "workflow_name_from_orchestrator", "workflow_orchestrator_name", + "workflow_scoped_executor_id", ] # Shared prefix 
for every durable name this hosting layer registers. Matches @@ -95,6 +97,40 @@ def workflow_name_from_orchestrator(orchestrator_name: str) -> str | None: return name or None +def workflow_scoped_executor_id(workflow_name: str, executor_id: str) -> str: + """Return the workflow-scoped identity for an executor. + + Inner executors (non-agent activities and agent entities) are scoped by + workflow so two co-hosted workflows that reuse an ``executor_id`` register and + dispatch to distinct durable primitives instead of colliding on one global + name. This is the **unprefixed** identity (e.g. used as + :class:`~agent_framework_durabletask.AgentSessionId` ``name``, which the entity + layer then prefixes); see :func:`workflow_executor_activity_name` for the full + activity function name. + + Args: + workflow_name: The owning workflow's name. + executor_id: The executor's id within that workflow. + + Returns: + ``"{workflow_name}-{executor_id}"``. + """ + return f"{workflow_name}-{executor_id}" + + +def workflow_executor_activity_name(workflow_name: str, executor_id: str) -> str: + """Return the durable activity function name for a non-agent executor. + + Args: + workflow_name: The owning workflow's name. + executor_id: The executor's id within that workflow. + + Returns: + ``"dafx-{workflow_name}-{executor_id}"``. + """ + return f"{DURABLE_NAME_PREFIX}{workflow_scoped_executor_id(workflow_name, executor_id)}" + + def validate_workflow_name(workflow_name: str) -> None: """Validate that a workflow name is usable as a stable durable identity. 
diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py index e211d8c9ae0..22f7367484c 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py @@ -49,6 +49,7 @@ from agent_framework._workflows._state import State from .context import WorkflowOrchestrationContext +from .naming import workflow_executor_activity_name, workflow_scoped_executor_id from .serialization import ( deserialize_value, reconstruct_to_type, @@ -232,10 +233,19 @@ def _prepare_agent_task( ctx: WorkflowOrchestrationContext, executor_id: str, message: Any, + workflow_name: str, ) -> Any: - """Prepare an agent task for execution via the context adapter.""" + """Prepare an agent task for execution via the context adapter. + + The agent entity is addressed by the workflow-scoped identity + ``{workflow_name}-{executor_id}`` so two co-hosted workflows that reuse an + executor id dispatch to distinct entities (the entity layer prefixes this with + ``dafx-``). The session *key* stays the orchestration instance id, so + conversation state remains isolated per run. + """ message_content = _extract_message_content(message) - return ctx.prepare_agent_task(executor_id, message_content, ctx.instance_id) + scoped_id = workflow_scoped_executor_id(workflow_name, executor_id) + return ctx.prepare_agent_task(scoped_id, message_content, ctx.instance_id) def _prepare_activity_task( @@ -244,8 +254,14 @@ def _prepare_activity_task( message: Any, source_executor_id: str, shared_state_snapshot: dict[str, Any] | None, + workflow_name: str, ) -> Any: - """Prepare an activity task for execution via the context adapter.""" + """Prepare an activity task for execution via the context adapter. 
+ + The activity is dispatched under the workflow-scoped name + ``dafx-{workflow_name}-{executor_id}`` so two co-hosted workflows that reuse an + executor id register and dispatch to distinct activity functions. + """ activity_input = { "executor_id": executor_id, "message": serialize_value(message), @@ -253,7 +269,7 @@ def _prepare_activity_task( "source_executor_ids": [source_executor_id], } activity_input_json = json.dumps(activity_input) - activity_name = f"dafx-{executor_id}" + activity_name = workflow_executor_activity_name(workflow_name, executor_id) return ctx.prepare_activity_task(activity_name, activity_input_json) @@ -655,7 +671,9 @@ def _prepare_all_tasks( agent_messages_by_executor[executor_id].append((executor_id, message, source_executor_id)) else: logger.debug("Preparing activity task: %s", executor_id) - task = _prepare_activity_task(ctx, executor_id, message, source_executor_id, shared_state) + task = _prepare_activity_task( + ctx, executor_id, message, source_executor_id, shared_state, workflow.name + ) all_tasks.append(task) task_metadata_list.append( TaskMetadata( @@ -671,7 +689,7 @@ def _prepare_all_tasks( remaining = messages_list[1:] logger.debug("Preparing agent task: %s", executor_id) - task = _prepare_agent_task(ctx, first_msg[0], first_msg[1]) + task = _prepare_agent_task(ctx, first_msg[0], first_msg[1], workflow.name) all_tasks.append(task) task_metadata_list.append( TaskMetadata( @@ -815,7 +833,7 @@ def publish_live_status(state: str, pending_requests: dict[str, Any] | None = No # Phase 3: Process sequential agent messages for executor_id, message, _source_executor_id in remaining_agent_messages: logger.debug("Processing sequential message for agent: %s", executor_id) - task = _prepare_agent_task(ctx, executor_id, message) + task = _prepare_agent_task(ctx, executor_id, message, workflow.name) agent_response: AgentResponse = yield task logger.debug("Agent %s sequential response completed", executor_id) diff --git 
a/python/packages/durabletask/tests/integration_tests/test_08_dt_workflow.py b/python/packages/durabletask/tests/integration_tests/test_08_dt_workflow.py index 6b544c37918..2d55986f13d 100644 --- a/python/packages/durabletask/tests/integration_tests/test_08_dt_workflow.py +++ b/python/packages/durabletask/tests/integration_tests/test_08_dt_workflow.py @@ -5,7 +5,7 @@ Exercises the standalone (non-Azure-Functions) workflow path: - ``DurableAIAgentWorker.configure_workflow`` auto-registers the agent entities, non-agent executor activities, and the workflow orchestrator. -- A client starts the workflow by scheduling ``WORKFLOW_ORCHESTRATOR_NAME``. +- A client starts the workflow by scheduling its ``dafx-{workflow_name}`` orchestration. - Conditional routing sends spam to a non-agent handler and legitimate email through a second agent and a sender executor. """ @@ -16,7 +16,10 @@ import pytest from durabletask.client import OrchestrationStatus -from agent_framework_durabletask import WORKFLOW_ORCHESTRATOR_NAME, DurableAIAgentClient +from agent_framework_durabletask import DurableAIAgentClient, workflow_orchestrator_name + +# Must match the workflow name in samples/04-hosting/durabletask/08_workflow/worker.py +WORKFLOW_NAME = "email_triage" logging.basicConfig(level=logging.WARNING) @@ -50,7 +53,7 @@ def setup(self, agent_client_factory: type[AgentClientFactoryProtocol], orchestr def test_legitimate_email_drafts_response(self) -> None: """A legitimate email routes through the email agent and is 'sent'.""" instance_id = self.dts_client.schedule_new_orchestration( - orchestrator=WORKFLOW_ORCHESTRATOR_NAME, + orchestrator=workflow_orchestrator_name(WORKFLOW_NAME), input=( "Hi team, just a reminder about our sprint planning meeting tomorrow at 10 AM. " "Please review the agenda in Jira." 
@@ -69,7 +72,7 @@ def test_legitimate_email_drafts_response(self) -> None: def test_spam_email_handled(self) -> None: """A spam email routes to the non-agent spam handler.""" instance_id = self.dts_client.schedule_new_orchestration( - orchestrator=WORKFLOW_ORCHESTRATOR_NAME, + orchestrator=workflow_orchestrator_name(WORKFLOW_NAME), input="URGENT! You've won $1,000,000! Click here now to claim your prize! Limited time offer!", ) diff --git a/python/packages/durabletask/tests/integration_tests/test_09_dt_workflow_hitl.py b/python/packages/durabletask/tests/integration_tests/test_09_dt_workflow_hitl.py index 6ff54222e79..63ce9343df8 100644 --- a/python/packages/durabletask/tests/integration_tests/test_09_dt_workflow_hitl.py +++ b/python/packages/durabletask/tests/integration_tests/test_09_dt_workflow_hitl.py @@ -21,6 +21,9 @@ logging.basicConfig(level=logging.WARNING) +# Must match the workflow name in samples/04-hosting/durabletask/09_workflow_hitl/worker.py +WORKFLOW_NAME = "content_moderation" + # Module-level markers pytestmark = [ pytest.mark.flaky, @@ -38,7 +41,7 @@ def _wait_for_hitl_request( """Poll until the workflow records at least one pending HITL request.""" deadline = time.time() + timeout_seconds while time.time() < deadline: - pending = client.get_pending_hitl_requests(instance_id) + pending = client.get_pending_hitl_requests(instance_id, workflow_name=WORKFLOW_NAME) if pending: return pending time.sleep(2) @@ -55,7 +58,7 @@ def setup(self, workflow_client: DurableWorkflowClient) -> None: def _run_case(self, submission: dict[str, Any], *, approve: bool) -> Any: """Start a moderation case, answer the HITL pause, and return the final output.""" - instance_id = self.client.start_workflow(input=submission) + instance_id = self.client.start_workflow(input=submission, workflow_name=WORKFLOW_NAME) pending = _wait_for_hitl_request(self.client, instance_id) request = pending[0] @@ -66,9 +69,10 @@ def _run_case(self, submission: dict[str, Any], *, approve: bool) 
-> Any: instance_id, request["request_id"], {"approved": approve, "reviewer_notes": "Looks good." if approve else "Violates content policy."}, + workflow_name=WORKFLOW_NAME, ) - return self.client.await_workflow_output(instance_id, timeout_seconds=180) + return self.client.await_workflow_output(instance_id, workflow_name=WORKFLOW_NAME, timeout_seconds=180) def test_hitl_workflow_approval(self) -> None: """Appropriate content is approved after the reviewer says yes.""" diff --git a/python/packages/durabletask/tests/test_worker.py b/python/packages/durabletask/tests/test_worker.py index 315690a821e..91ba8e6105a 100644 --- a/python/packages/durabletask/tests/test_worker.py +++ b/python/packages/durabletask/tests/test_worker.py @@ -179,11 +179,11 @@ def test_add_agent_with_entity_id_registers_under_override( def test_configure_workflow_registers_agent_entity_by_executor_id( self, agent_worker: DurableAIAgentWorker, mock_grpc_worker: Mock ) -> None: - """Workflow agent executors register entities keyed by executor id. + """Workflow agent executors register entities keyed by the workflow-scoped id. - The orchestrator dispatches by executor id, so an - ``AgentExecutor(agent, id=...)`` whose id differs from the agent name must - still be reachable. + The orchestrator dispatches by the scoped identity + ``{workflow_name}-{executor_id}``, so an ``AgentExecutor(agent, id=...)`` whose id + differs from the agent name must still be reachable under that scoped id.
""" from agent_framework import AgentExecutor @@ -194,12 +194,14 @@ def test_configure_workflow_registers_agent_entity_by_executor_id( agent_executor.agent = agent workflow = Mock() + workflow.name = "review" workflow.executors = {"custom-executor-id": agent_executor} agent_worker.configure_workflow(workflow) - assert "custom-executor-id" in agent_worker.registered_agent_names + assert "review-custom-executor-id" in agent_worker.registered_agent_names assert "Reviewer" not in agent_worker.registered_agent_names + assert "custom-executor-id" not in agent_worker.registered_agent_names mock_grpc_worker.add_orchestrator.assert_called_once() def test_configure_workflow_registers_non_agent_executor_as_activity( @@ -212,6 +214,7 @@ def test_configure_workflow_registers_non_agent_executor_as_activity( activity_executor.id = "router-node" workflow = Mock() + workflow.name = "route" workflow.executors = {"router-node": activity_executor} agent_worker.configure_workflow(workflow) @@ -219,6 +222,70 @@ def test_configure_workflow_registers_non_agent_executor_as_activity( assert agent_worker.registered_agent_names == [] mock_grpc_worker.add_activity.assert_called_once() mock_grpc_worker.add_orchestrator.assert_called_once() + # The activity is registered under the workflow-scoped name. 
+ registered_activity = mock_grpc_worker.add_activity.call_args[0][0] + assert registered_activity.__name__ == "dafx-route-router-node" + + +class TestMultiWorkflowRegistration: + """Test hosting multiple workflows on one worker with scoped names.""" + + def _agent_workflow(self, name: str, executor_id: str) -> Mock: + from agent_framework import AgentExecutor + + agent = Mock() + agent.name = "Assistant" + agent_executor = Mock(spec=AgentExecutor) + agent_executor.id = executor_id + agent_executor.agent = agent + + workflow = Mock() + workflow.name = name + workflow.executors = {executor_id: agent_executor} + return workflow + + def test_two_workflows_reusing_executor_id_do_not_collide(self, agent_worker: DurableAIAgentWorker) -> None: + """Two workflows that reuse an executor id register distinct scoped entities.""" + agent_worker.configure_workflow(self._agent_workflow("orders", "assistant")) + agent_worker.configure_workflow(self._agent_workflow("billing", "assistant")) + + assert "orders-assistant" in agent_worker.registered_agent_names + assert "billing-assistant" in agent_worker.registered_agent_names + assert set(agent_worker.registered_workflow_names) == {"orders", "billing"} + + def test_registers_one_orchestrator_per_workflow( + self, agent_worker: DurableAIAgentWorker, mock_grpc_worker: Mock + ) -> None: + """Each configured workflow registers its own orchestrator.""" + agent_worker.configure_workflow(self._agent_workflow("orders", "a")) + agent_worker.configure_workflow(self._agent_workflow("billing", "b")) + + assert mock_grpc_worker.add_orchestrator.call_count == 2 + registered_names = {call.args[0].__name__ for call in mock_grpc_worker.add_orchestrator.call_args_list} + assert registered_names == {"dafx-orders", "dafx-billing"} + + def test_rejects_duplicate_workflow_name(self, agent_worker: DurableAIAgentWorker) -> None: + """Configuring two workflows with the same name is rejected.""" + 
agent_worker.configure_workflow(self._agent_workflow("orders", "a")) + + with pytest.raises(ValueError, match="already registered"): + agent_worker.configure_workflow(self._agent_workflow("orders", "b")) + + def test_rejects_auto_generated_workflow_name(self, agent_worker: DurableAIAgentWorker) -> None: + """A workflow with an auto-generated WorkflowBuilder name is rejected.""" + import uuid + + workflow = self._agent_workflow(f"WorkflowBuilder-{uuid.uuid4()}", "a") + + with pytest.raises(ValueError, match="auto-generated"): + agent_worker.configure_workflow(workflow) + + def test_rejects_invalid_workflow_name(self, agent_worker: DurableAIAgentWorker) -> None: + """A workflow with an invalid name is rejected.""" + workflow = self._agent_workflow("has space", "a") + + with pytest.raises(ValueError, match="invalid"): + agent_worker.configure_workflow(workflow) if __name__ == "__main__": diff --git a/python/packages/durabletask/tests/test_workflow_client.py b/python/packages/durabletask/tests/test_workflow_client.py index 81006e5a828..a80a012bfe2 100644 --- a/python/packages/durabletask/tests/test_workflow_client.py +++ b/python/packages/durabletask/tests/test_workflow_client.py @@ -15,7 +15,7 @@ from agent_framework import WorkflowEvent from agent_framework_durabletask import DurableWorkflowClient -from agent_framework_durabletask._workflows.orchestrator import WORKFLOW_ORCHESTRATOR_NAME +from agent_framework_durabletask._workflows.naming import workflow_orchestrator_name from agent_framework_durabletask._workflows.serialization import serialize_value, serialize_workflow_event @@ -45,14 +45,14 @@ class TestStartWorkflow: def test_start_workflow_schedules_orchestrator( self, workflow_client: DurableWorkflowClient, mock_client: Mock ) -> None: - """start_workflow schedules the auto-registered orchestrator by name.""" + """start_workflow schedules the per-workflow orchestration by name.""" mock_client.schedule_new_orchestration.return_value = "instance-1" - result = 
workflow_client.start_workflow(input="hello") + result = workflow_client.start_workflow(input="hello", workflow_name="orders") assert result == "instance-1" mock_client.schedule_new_orchestration.assert_called_once_with( - WORKFLOW_ORCHESTRATOR_NAME, input="hello", instance_id=None + workflow_orchestrator_name("orders"), input="hello", instance_id=None ) def test_start_workflow_passes_non_string_input_unchanged( @@ -62,7 +62,7 @@ def test_start_workflow_passes_non_string_input_unchanged( mock_client.schedule_new_orchestration.return_value = "instance-2" payload = {"order_id": 42, "items": ["a", "b"]} - workflow_client.start_workflow(input=payload) + workflow_client.start_workflow(input=payload, workflow_name="orders") _, kwargs = mock_client.schedule_new_orchestration.call_args assert kwargs["input"] == payload @@ -73,12 +73,100 @@ def test_start_workflow_forwards_instance_id( """An explicit instance id is forwarded to the underlying client.""" mock_client.schedule_new_orchestration.return_value = "explicit-id" - workflow_client.start_workflow(input="x", instance_id="explicit-id") + workflow_client.start_workflow(input="x", workflow_name="orders", instance_id="explicit-id") _, kwargs = mock_client.schedule_new_orchestration.call_args assert kwargs["instance_id"] == "explicit-id" +class TestWorkflowNameTargeting: + """Resolving the target workflow name from a default or per-call value.""" + + def test_uses_constructor_default(self, mock_client: Mock) -> None: + """A client default workflow name is used when none is passed per call.""" + client = DurableWorkflowClient(mock_client, workflow_name="billing") + mock_client.schedule_new_orchestration.return_value = "i" + + client.start_workflow(input="x") + + mock_client.schedule_new_orchestration.assert_called_once_with( + workflow_orchestrator_name("billing"), input="x", instance_id=None + ) + + def test_per_call_overrides_default(self, mock_client: Mock) -> None: + """A per-call workflow name overrides the constructor 
default.""" + client = DurableWorkflowClient(mock_client, workflow_name="billing") + mock_client.schedule_new_orchestration.return_value = "i" + + client.start_workflow(input="x", workflow_name="orders") + + mock_client.schedule_new_orchestration.assert_called_once_with( + workflow_orchestrator_name("orders"), input="x", instance_id=None + ) + + def test_raises_when_no_name_resolvable(self, workflow_client: DurableWorkflowClient) -> None: + """With no default and no per-call name, starting raises a clear error.""" + with pytest.raises(ValueError, match="No workflow name"): + workflow_client.start_workflow(input="x") + + +class TestOwnershipValidation: + """Opt-in validation that an instance belongs to the targeted workflow.""" + + def test_runtime_status_returns_none_for_foreign_instance(self, mock_client: Mock) -> None: + """A status query scoped to a workflow returns None for a foreign instance.""" + client = DurableWorkflowClient(mock_client, workflow_name="orders") + state = Mock() + state.name = workflow_orchestrator_name("billing") # different workflow + state.runtime_status.name = "RUNNING" + mock_client.get_orchestration_state.return_value = state + + assert client.get_runtime_status("instance-1") is None + + def test_runtime_status_returns_status_for_owned_instance(self, mock_client: Mock) -> None: + """A status query returns the status for an instance of the targeted workflow.""" + client = DurableWorkflowClient(mock_client, workflow_name="orders") + state = Mock() + state.name = workflow_orchestrator_name("orders") + state.runtime_status.name = "RUNNING" + mock_client.get_orchestration_state.return_value = state + + assert client.get_runtime_status("instance-1") == "RUNNING" + + def test_pending_hitl_empty_for_foreign_instance(self, mock_client: Mock) -> None: + """Pending HITL is empty for an instance of a different workflow.""" + client = DurableWorkflowClient(mock_client, workflow_name="orders") + state = Mock() + state.name = 
workflow_orchestrator_name("billing") + state.serialized_custom_status = json.dumps({"pending_requests": {"req-1": {"source_executor_id": "x"}}}) + mock_client.get_orchestration_state.return_value = state + + assert client.get_pending_hitl_requests("instance-1") == [] + + def test_send_hitl_rejects_foreign_instance(self, mock_client: Mock) -> None: + """Sending a HITL response to a foreign instance raises and does not deliver.""" + client = DurableWorkflowClient(mock_client, workflow_name="orders") + state = Mock() + state.name = workflow_orchestrator_name("billing") + mock_client.get_orchestration_state.return_value = state + + with pytest.raises(ValueError, match="does not belong"): + client.send_hitl_response("instance-1", "req-1", {"approved": True}) + + mock_client.raise_orchestration_event.assert_not_called() + + def test_send_hitl_allows_owned_instance(self, mock_client: Mock) -> None: + """Sending a HITL response to an owned instance delivers the event.""" + client = DurableWorkflowClient(mock_client, workflow_name="orders") + state = Mock() + state.name = workflow_orchestrator_name("orders") + mock_client.get_orchestration_state.return_value = state + + client.send_hitl_response("instance-1", "req-1", {"approved": True}) + + mock_client.raise_orchestration_event.assert_called_once() + + class TestAwaitWorkflowOutput: """Test awaiting workflow completion and output.""" @@ -356,11 +444,12 @@ async def test_waits_and_returns_output_by_default( """By default run_workflow starts the workflow and returns its deserialized output.""" mock_client.schedule_new_orchestration.return_value = "instance-1" metadata = Mock() + metadata.name = workflow_orchestrator_name("orders") metadata.runtime_status.name = "COMPLETED" metadata.serialized_output = json.dumps(["done"]) mock_client.wait_for_orchestration_completion.return_value = metadata - result = await workflow_client.run_workflow(input="hello") + result = await workflow_client.run_workflow(input="hello", 
workflow_name="orders") assert result == ["done"] mock_client.schedule_new_orchestration.assert_called_once() @@ -372,7 +461,7 @@ async def test_no_wait_returns_instance_id_without_awaiting( """With wait=False, run_workflow returns the instance id and does not await completion.""" mock_client.schedule_new_orchestration.return_value = "instance-2" - result = await workflow_client.run_workflow(input="hello", wait=False) + result = await workflow_client.run_workflow(input="hello", workflow_name="orders", wait=False) assert result == "instance-2" mock_client.wait_for_orchestration_completion.assert_not_called() diff --git a/python/samples/04-hosting/azure_functions/09_workflow_shared_state/function_app.py b/python/samples/04-hosting/azure_functions/09_workflow_shared_state/function_app.py index d8f2a8b5931..585296cf031 100644 --- a/python/samples/04-hosting/azure_functions/09_workflow_shared_state/function_app.py +++ b/python/samples/04-hosting/azure_functions/09_workflow_shared_state/function_app.py @@ -220,7 +220,7 @@ def _create_workflow() -> Workflow: # False -> submit_to_email_assistant -> email_assistant_agent -> finalize_and_send # True -> handle_spam return ( - WorkflowBuilder(start_executor=store_email) + WorkflowBuilder(name="email_triage_shared_state", start_executor=store_email) .add_edge(store_email, spam_detection_agent) .add_edge(spam_detection_agent, to_detection_result) .add_edge(to_detection_result, submit_to_email_assistant, condition=get_condition(False)) diff --git a/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/function_app.py b/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/function_app.py index d451bd9f6cc..3b75fea04e0 100644 --- a/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/function_app.py +++ b/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/function_app.py @@ -187,7 +187,7 @@ def _create_workflow() -> Workflow: # Build workflow return ( - 
WorkflowBuilder(start_executor=spam_agent) + WorkflowBuilder(name="email_triage", start_executor=spam_agent) .add_switch_case_edge_group( spam_agent, [ diff --git a/python/samples/04-hosting/azure_functions/11_workflow_parallel/function_app.py b/python/samples/04-hosting/azure_functions/11_workflow_parallel/function_app.py index 8c9dd00f1c4..4da134124ac 100644 --- a/python/samples/04-hosting/azure_functions/11_workflow_parallel/function_app.py +++ b/python/samples/04-hosting/azure_functions/11_workflow_parallel/function_app.py @@ -400,7 +400,7 @@ def _create_workflow() -> Workflow: # Build workflow with parallel patterns return ( - WorkflowBuilder(start_executor=input_router) + WorkflowBuilder(name="parallel_review", start_executor=input_router) # Pattern 1: Fan-out to two executors (run in parallel) .add_fan_out_edges( source=input_router, diff --git a/python/samples/04-hosting/azure_functions/12_workflow_hitl/function_app.py b/python/samples/04-hosting/azure_functions/12_workflow_hitl/function_app.py index 5e02faada5e..42af7ddf1c9 100644 --- a/python/samples/04-hosting/azure_functions/12_workflow_hitl/function_app.py +++ b/python/samples/04-hosting/azure_functions/12_workflow_hitl/function_app.py @@ -388,7 +388,7 @@ def _create_workflow() -> Workflow: # input_router -> content_analyzer_agent -> content_analyzer_executor # -> human_review_executor (HITL pause here) -> publish_executor return ( - WorkflowBuilder(start_executor=input_router) + WorkflowBuilder(name="content_moderation", start_executor=input_router) .add_edge(input_router, content_analyzer_agent) .add_edge(content_analyzer_agent, content_analyzer_executor) .add_edge(content_analyzer_executor, human_review_executor) diff --git a/python/samples/04-hosting/durabletask/08_workflow/client.py b/python/samples/04-hosting/durabletask/08_workflow/client.py index 7f8a40ff841..da8adffde85 100644 --- a/python/samples/04-hosting/durabletask/08_workflow/client.py +++ 
b/python/samples/04-hosting/durabletask/08_workflow/client.py @@ -3,9 +3,9 @@ """Client that starts the standalone workflow orchestration and prints the result. The worker (``worker.py``) must be running first. The workflow is started via -``DurableWorkflowClient.start_workflow`` - which schedules the orchestrator that -``DurableAIAgentWorker.configure_workflow`` auto-registers, so the caller never -needs to know its internal name. +``DurableWorkflowClient.start_workflow`` - which schedules the ``dafx-{name}`` +orchestration that ``DurableAIAgentWorker.configure_workflow`` auto-registers for +the workflow named ``email_triage``. Prerequisites: - ``worker.py`` running and connected to the same Durable Task Scheduler. @@ -26,6 +26,8 @@ logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) +WORKFLOW_NAME = "email_triage" + def get_client(taskhub: str | None = None, endpoint: str | None = None) -> DurableTaskSchedulerClient: """Create a configured DurableTaskSchedulerClient.""" @@ -53,7 +55,7 @@ def run_workflow(client: DurableWorkflowClient, email_content: str) -> None: async def main() -> None: """Run the workflow against a legitimate email and a spam email.""" - client = DurableWorkflowClient(get_client()) + client = DurableWorkflowClient(get_client(), workflow_name=WORKFLOW_NAME) logger.info("TEST 1: Legitimate email") run_workflow( diff --git a/python/samples/04-hosting/durabletask/08_workflow/worker.py b/python/samples/04-hosting/durabletask/08_workflow/worker.py index 1f233854a5e..889e767bb57 100644 --- a/python/samples/04-hosting/durabletask/08_workflow/worker.py +++ b/python/samples/04-hosting/durabletask/08_workflow/worker.py @@ -54,6 +54,7 @@ SPAM_AGENT_NAME = "SpamDetectionAgent" EMAIL_AGENT_NAME = "EmailAssistantAgent" +WORKFLOW_NAME = "email_triage" SPAM_DETECTION_INSTRUCTIONS = ( "You are a spam detection assistant that identifies spam emails. 
" @@ -148,7 +149,7 @@ def create_workflow() -> Workflow: email_sender = EmailSenderExecutor(id="email_sender") return ( - WorkflowBuilder(start_executor=spam_agent) + WorkflowBuilder(name=WORKFLOW_NAME, start_executor=spam_agent) .add_switch_case_edge_group( spam_agent, [ diff --git a/python/samples/04-hosting/durabletask/09_workflow_hitl/client.py b/python/samples/04-hosting/durabletask/09_workflow_hitl/client.py index 73d06dde3a8..af31d3c354d 100644 --- a/python/samples/04-hosting/durabletask/09_workflow_hitl/client.py +++ b/python/samples/04-hosting/durabletask/09_workflow_hitl/client.py @@ -33,6 +33,8 @@ logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) +WORKFLOW_NAME = "content_moderation" + def get_client(taskhub: str | None = None, endpoint: str | None = None) -> DurableTaskSchedulerClient: """Create a configured DurableTaskSchedulerClient.""" @@ -67,8 +69,7 @@ def _wait_for_hitl_request( status = client.get_runtime_status(instance_id) if status in terminal_statuses: raise RuntimeError( - f"Workflow instance {instance_id} reached terminal state '{status}' " - "before pausing for human input." + f"Workflow instance {instance_id} reached terminal state '{status}' before pausing for human input." 
) time.sleep(2) raise TimeoutError(f"Timed out waiting for a HITL request on instance {instance_id}") @@ -96,7 +97,7 @@ def run_case(client: DurableWorkflowClient, submission: dict[str, Any], *, appro async def main() -> None: """Run an approved case and a rejected case.""" - client = DurableWorkflowClient(get_client()) + client = DurableWorkflowClient(get_client(), workflow_name=WORKFLOW_NAME) logger.info("CASE 1: Appropriate content (will approve)") run_case( diff --git a/python/samples/04-hosting/durabletask/09_workflow_hitl/worker.py b/python/samples/04-hosting/durabletask/09_workflow_hitl/worker.py index 774482b998f..e8d5c71d82a 100644 --- a/python/samples/04-hosting/durabletask/09_workflow_hitl/worker.py +++ b/python/samples/04-hosting/durabletask/09_workflow_hitl/worker.py @@ -63,6 +63,7 @@ logger = logging.getLogger(__name__) CONTENT_ANALYZER_AGENT_NAME = "ContentAnalyzerAgent" +WORKFLOW_NAME = "content_moderation" CONTENT_ANALYZER_INSTRUCTIONS = ( "You are a content moderation assistant that analyzes user-submitted content for policy compliance. " @@ -285,7 +286,7 @@ def create_workflow() -> Workflow: publish_executor = PublishExecutor() return ( - WorkflowBuilder(start_executor=input_router) + WorkflowBuilder(name=WORKFLOW_NAME, start_executor=input_router) .add_edge(input_router, content_analyzer_agent) .add_edge(content_analyzer_agent, content_analyzer_executor) .add_edge(content_analyzer_executor, human_review_executor) From de06e0be7d847507c1e5138cbd2f075de7686a48 Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Tue, 23 Jun 2026 16:38:07 -0400 Subject: [PATCH 03/12] feat(azurefunctions): host multiple workflows per app with per-workflow routes (phase 2) Completes multi-workflow hosting on the Azure Functions host, building on the shared scoped-naming foundation from the worker phase. 
AgentFunctionApp: - New `workflows=` parameter accepting a list (keyed by each `Workflow.name`) or a name->Workflow mapping; the existing `workflow=` is a single-workflow alias. Both may be combined. Duplicate names and mapping-key/name mismatches are rejected. - Each workflow registers its own `dafx-{name}` orchestration, workflow-scoped activities/entities, and per-workflow HTTP routes: `workflow/{name}/run`, `workflow/{name}/status/{instanceId}`, `workflow/{name}/respond/{instanceId}/{requestId}`. Routes are always per-workflow (even for a single workflow) so callers don't change URLs as an app grows from one workflow to many. - Route ownership check is per-workflow (`_is_owned_orchestration(status, name)`): a leaked instance id for another orchestration -- or another workflow -- is treated as not-found, extending the route-scoping defense. - `get_agent(context, name, workflow_name=...)` resolves a workflow agent under its scoped id; bare `agents=` registration keeps the standalone surface. New `workflows` introspection property; `.workflow` now returns the sole workflow (or None when several are hosted). - Removed the now-unused flat-URL helper `_build_status_url` (handlers inline per-workflow URLs). Samples + tests: - Azure Functions workflow samples (09-12) name their workflow; integration tests target the per-workflow routes. - Unit tests cover multi-workflow registration, duplicate/mapping/auto-name rejection, and per-workflow ownership. Note: sample README / demo.http route docs are updated in the docs phase. 
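For orientation, a minimal standalone sketch of the naming, route, and ownership rules described above. It does not import the package: `orchestrator_name`, `workflow_routes`, and `is_owned` are hypothetical stand-ins written for illustration, and only the `dafx-{name}` prefix, the route shapes, and the case-insensitive name comparison are taken from this change.

```python
from types import SimpleNamespace

# Prefix stated in this series; the helpers below are illustrative stand-ins,
# not the package's real functions.
DURABLE_NAME_PREFIX = "dafx-"


def orchestrator_name(workflow_name: str) -> str:
    """Stand-in for ``workflow_orchestrator_name``: returns ``dafx-{name}``."""
    return f"{DURABLE_NAME_PREFIX}{workflow_name}"


def workflow_routes(workflow_name: str) -> dict[str, str]:
    """Route templates registered for one workflow (assumed shape, per the list above)."""
    base = f"workflow/{workflow_name}"
    return {
        "run": f"{base}/run",
        "status": f"{base}/status/{{instanceId}}",
        "respond": f"{base}/respond/{{instanceId}}/{{requestId}}",
    }


def is_owned(status: object, workflow_name: str) -> bool:
    """Sketch of the per-workflow ownership check: case-insensitive name match."""
    name = getattr(status, "name", None)
    return isinstance(name, str) and name.casefold() == orchestrator_name(workflow_name).casefold()


if __name__ == "__main__":
    print(workflow_routes("orders"))
    # An instance started by another workflow is treated as not found.
    print(is_owned(SimpleNamespace(name=orchestrator_name("billing")), "orders"))
```

Under these assumptions, a status or respond route for `orders` answers 404 for an instance whose orchestration name is `dafx-billing`, which is the behavior the ownership bullet describes.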
Design: docs/design/durabletask-multiworkflow-and-subworkflows.md --- .../agent_framework_azurefunctions/_app.py | 220 +++++++++++------- .../test_09_workflow_shared_state.py | 9 +- .../test_10_workflow_no_shared_state.py | 11 +- .../test_11_workflow_parallel.py | 7 +- .../test_12_workflow_hitl.py | 21 +- .../packages/azurefunctions/tests/test_app.py | 76 ++++-- .../12_workflow_hitl/function_app.py | 11 +- 7 files changed, 234 insertions(+), 121 deletions(-) diff --git a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py index c31d9a9b700..099e8a7de86 100644 --- a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py +++ b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py @@ -193,10 +193,12 @@ def my_orchestration(context): enable_mcp_tool_trigger: Whether MCP tool triggers are created for agents max_poll_retries: Maximum polling attempts when waiting for responses poll_interval_seconds: Delay (seconds) between polling attempts - workflow: Optional Workflow instance for workflow orchestration + workflow: The sole hosted Workflow when exactly one is hosted, else ``None`` + (use :attr:`workflows` to access all hosted workflows by name) """ _agent_metadata: dict[str, AgentMetadata] + _workflows: dict[str, Workflow] enable_health_check: bool enable_http_endpoints: bool enable_mcp_tool_trigger: bool @@ -206,6 +208,7 @@ def __init__( self, agents: list[SupportsAgentRun] | None = None, workflow: Workflow | None = None, + workflows: list[Workflow] | Mapping[str, Workflow] | None = None, http_auth_level: func.AuthLevel = func.AuthLevel.FUNCTION, enable_health_check: bool = True, enable_http_endpoints: bool = True, @@ -217,7 +220,11 @@ def __init__( """Initialize the AgentFunctionApp. :param agents: List of agent instances to register. - :param workflow: Optional Workflow instance to extract agents from and set up orchestration. 
+ :param workflow: Optional single Workflow to host. Convenience alias for + ``workflows=[workflow]``; an app may host several workflows via ``workflows``. + :param workflows: Optional workflows to host, as a list (keyed by each + ``Workflow.name``) or a name->Workflow mapping. Each workflow gets its own + ``dafx-{name}`` orchestration and ``workflow/{name}/...`` HTTP routes. :param http_auth_level: HTTP authentication level (default: ``func.AuthLevel.FUNCTION``). :param enable_health_check: Enable the built-in health check endpoint (default: ``True``). :param enable_http_endpoints: Enable HTTP endpoints for agents (default: ``True``). @@ -238,11 +245,11 @@ def __init__( # Initialize agent metadata dictionary self._agent_metadata = {} + self._workflows: dict[str, Workflow] = {} self.enable_health_check = enable_health_check self.enable_http_endpoints = enable_http_endpoints self.enable_mcp_tool_trigger = enable_mcp_tool_trigger self.default_callback = default_callback - self.workflow = workflow try: retries = int(max_poll_retries) @@ -256,32 +263,14 @@ def __init__( interval = DEFAULT_POLL_INTERVAL_SECONDS self.poll_interval_seconds = interval if interval > 0 else DEFAULT_POLL_INTERVAL_SECONDS - # If workflow is provided, extract agents and set up orchestration. - # The "what to register" decision (agent -> entity, non-agent -> activity) - # is shared with the standalone durabletask host via plan_workflow_registration. - if workflow: - logger.debug("[AgentFunctionApp] Extracting agents from workflow") - # Durable names are derived from the workflow name and must be stable across - # restarts, so reject an unnamed/auto-generated workflow up front. 
- validate_workflow_name(workflow.name) - plan = plan_workflow_registration(workflow) - for agent_executor in plan.agent_executors: - # Register each workflow agent through the same surface as a standalone - # agent (so it stays tracked in ``agents`` / ``get_agent``), under the - # workflow-scoped entity id ``{workflow}-{executor}`` the orchestrator - # dispatches to. This keeps two co-hosted workflows that reuse an - # executor id from colliding on one global entity name. - self.add_agent( - agent_executor.agent, - callback=self.default_callback, - entity_id=workflow_scoped_executor_id(workflow.name, agent_executor.id), - ) - for executor in plan.activity_executors: - # Set up a Functions activity trigger for each non-agent executor, - # scoped by workflow name to match the orchestrator's dispatch. - self._setup_executor_activity(workflow.name, executor.id) + # Register each hosted workflow. ``workflow=`` is a convenience alias for a + # single-element ``workflows``; both may be combined. + for wf in self._collect_workflows(workflow, workflows): + self._register_workflow(wf) - self._setup_workflow_orchestration() + # Back-compat: expose the sole workflow as ``.workflow`` when exactly one is + # hosted (multi-workflow apps must address workflows by name). + self.workflow = next(iter(self._workflows.values())) if len(self._workflows) == 1 else None if agents: # Register all provided agents @@ -295,17 +284,82 @@ def __init__( logger.debug("[AgentFunctionApp] Initialization complete") - def _setup_executor_activity(self, workflow_name: str, executor_id: str) -> None: + def _collect_workflows( + self, + workflow: Workflow | None, + workflows: list[Workflow] | Mapping[str, Workflow] | None, + ) -> list[Workflow]: + """Combine the ``workflow`` alias and ``workflows`` into a single list. + + A name->Workflow mapping must agree with each ``Workflow.name`` so a single + identity drives the durable names and HTTP routes.
+ + Raises: + ValueError: If a mapping key disagrees with its ``Workflow.name``. + """ + collected: list[Workflow] = [] + if workflow is not None: + collected.append(workflow) + if isinstance(workflows, Mapping): + for key, wf in workflows.items(): + if key != wf.name: + raise ValueError(f"workflows mapping key '{key}' does not match Workflow.name '{wf.name}'.") + collected.append(wf) + elif workflows is not None: + collected.extend(workflows) + return collected + + def _register_workflow(self, workflow: Workflow) -> None: + """Register one workflow's durable primitives and HTTP routes. + + The "what to register" decision (agent -> entity, non-agent -> activity) is + shared with the standalone durabletask host via ``plan_workflow_registration``. + + Raises: + ValueError: If the workflow name is missing/invalid/auto-generated, or a + workflow with the same name is already registered. + """ + # Durable names are derived from the workflow name and must be stable across + # restarts, so reject an unnamed/auto-generated workflow up front. + validate_workflow_name(workflow.name) + if workflow.name in self._workflows: + raise ValueError(f"Workflow '{workflow.name}' is already registered on this app.") + self._workflows[workflow.name] = workflow + + logger.debug("[AgentFunctionApp] Registering workflow '%s'", workflow.name) + plan = plan_workflow_registration(workflow) + for agent_executor in plan.agent_executors: + # Register each workflow agent through the same surface as a standalone + # agent (so it stays tracked in ``agents`` / ``get_agent``), under the + # workflow-scoped entity id ``{workflow}-{executor}`` the orchestrator + # dispatches to. This keeps two co-hosted workflows that reuse an executor + # id from colliding on one global entity name. 
+ self.add_agent( + agent_executor.agent, + callback=self.default_callback, + entity_id=workflow_scoped_executor_id(workflow.name, agent_executor.id), + ) + for executor in plan.activity_executors: + # Set up a Functions activity trigger for each non-agent executor, scoped + # by workflow name to match the orchestrator's dispatch. + self._setup_executor_activity(workflow, executor.id) + + self._setup_workflow_orchestration(workflow) + + def _setup_executor_activity(self, workflow: Workflow, executor_id: str) -> None: """Register an activity for executing a specific non-agent executor. Args: - workflow_name: The owning workflow's name (scopes the activity name). + workflow: The owning workflow (scopes the activity name and provides the + executor instance at run time). executor_id: The ID of the executor to create an activity for. """ - activity_name = workflow_executor_activity_name(workflow_name, executor_id) + activity_name = workflow_executor_activity_name(workflow.name, executor_id) logger.debug(f"[AgentFunctionApp] Registering activity '{activity_name}' for executor '{executor_id}'") - # Capture executor_id in closure + # Capture the specific workflow + executor id in the closure so the right + # executor runs even when several workflows are hosted. + captured_workflow = workflow captured_executor_id = executor_id @self.function_name(activity_name) @@ -318,31 +372,30 @@ def executor_activity(inputData: str) -> str: The execution body is shared with the standalone durabletask host via ``execute_workflow_activity``. 
""" - if not self.workflow: - raise RuntimeError("Workflow not initialized in AgentFunctionApp") - - executor = self.workflow.executors.get(captured_executor_id) + executor = captured_workflow.executors.get(captured_executor_id) if not executor: raise ValueError(f"Unknown executor: {captured_executor_id}") - return execute_workflow_activity(executor, inputData, self.workflow) + return execute_workflow_activity(executor, inputData, captured_workflow) # Ensure the function is registered (prevents garbage collection) _ = executor_activity - def _setup_workflow_orchestration(self) -> None: - """Register the workflow orchestration and related HTTP endpoints.""" - if self.workflow is None: - raise RuntimeError("Workflow not initialized in AgentFunctionApp") - orchestrator_name = workflow_orchestrator_name(self.workflow.name) + def _setup_workflow_orchestration(self, workflow: Workflow) -> None: + """Register a workflow's orchestration and its per-workflow HTTP endpoints. + + Routes are scoped by workflow name (``workflow/{name}/run`` etc.) so the URL + shape stays the same whether the app hosts one workflow or many; callers do + not have to change URLs as an app grows. 
+ """ + captured_workflow = workflow + workflow_name = workflow.name + orchestrator_name = workflow_orchestrator_name(workflow_name) @self.function_name(orchestrator_name) @self.orchestration_trigger(context_name="context") def workflow_orchestrator(context: df.DurableOrchestrationContext) -> Any: # type: ignore[type-arg] """Generic orchestrator for running the configured workflow.""" - if self.workflow is None: - raise RuntimeError("Workflow not initialized in AgentFunctionApp") - input_data = context.get_input() # Pass the deserialized client input straight to the shared engine, which @@ -352,11 +405,12 @@ def workflow_orchestrator(context: df.DurableOrchestrationContext) -> Any: # ty # Create local shared state dict for cross-executor state sharing shared_state: dict[str, Any] = {} - outputs = yield from run_workflow_orchestrator(context, self.workflow, initial_message, shared_state) + outputs = yield from run_workflow_orchestrator(context, captured_workflow, initial_message, shared_state) # Durable Functions runtime extracts return value from StopIteration return outputs # noqa: B901 - @self.route(route="workflow/run", methods=["POST"]) + @self.function_name(f"{orchestrator_name}-start") + @self.route(route=f"workflow/{workflow_name}/run", methods=["POST"]) @self.durable_client_input(client_name="client") async def start_workflow_orchestration( req: func.HttpRequest, client: df.DurableOrchestrationClient @@ -375,20 +429,21 @@ async def start_workflow_orchestration( instance_id = await client.start_new(orchestrator_name, client_input=client_input) base_url = self._build_base_url(req.url) - status_url = f"{base_url}/api/workflow/status/{instance_id}" + status_url = f"{base_url}/api/workflow/{workflow_name}/status/{instance_id}" return func.HttpResponse( json.dumps({ "instanceId": instance_id, "statusQueryGetUri": status_url, - "respondUri": f"{base_url}/api/workflow/respond/{instance_id}/{{requestId}}", + "respondUri": 
f"{base_url}/api/workflow/{workflow_name}/respond/{instance_id}/{{requestId}}", "message": "Workflow started", }), status_code=202, mimetype="application/json", ) - @self.route(route="workflow/status/{instanceId}", methods=["GET"]) + @self.function_name(f"{orchestrator_name}-status") + @self.route(route=f"workflow/{workflow_name}/status/{{instanceId}}", methods=["GET"]) @self.durable_client_input(client_name="client") async def get_workflow_status( req: func.HttpRequest, client: df.DurableOrchestrationClient @@ -400,11 +455,11 @@ async def get_workflow_status( status = await client.get_status(instance_id) - # Scope the endpoint to this app's workflow orchestrator. The durable client + # Scope the endpoint to this workflow's orchestrator. The durable client # resolves instance IDs across every orchestration in the task hub, so an ID - # belonging to a different orchestration must be treated as "not found" rather - # than leaking its status (including pending HITL request details). - if not self._is_workflow_orchestration(status): + # belonging to a different orchestration (or a different workflow) must be + # treated as "not found" rather than leaking its status / HITL details. 
+ if not self._is_owned_orchestration(status, workflow_name): return self._build_error_response("Instance not found", status_code=404) # The workflow's yielded outputs are checkpoint-encoded by the shared @@ -442,7 +497,7 @@ async def get_workflow_status( "requestData": req_data.get("data"), # type: ignore[reportUnknownMemberType] "requestType": req_data.get("request_type"), # type: ignore[reportUnknownMemberType] "responseType": req_data.get("response_type"), # type: ignore[reportUnknownMemberType] - "respondUrl": f"{base_url}/api/workflow/respond/{instance_id}/{req_id}", + "respondUrl": f"{base_url}/api/workflow/{workflow_name}/respond/{instance_id}/{req_id}", }) response["pendingHumanInputRequests"] = pending_requests @@ -452,7 +507,8 @@ async def get_workflow_status( mimetype="application/json", ) - @self.route(route="workflow/respond/{instanceId}/{requestId}", methods=["POST"]) + @self.function_name(f"{orchestrator_name}-respond") + @self.route(route=f"workflow/{workflow_name}/respond/{{instanceId}}/{{requestId}}", methods=["POST"]) @self.durable_client_input(client_name="client") async def send_hitl_response(req: func.HttpRequest, client: df.DurableOrchestrationClient) -> func.HttpResponse: """HTTP endpoint to send a response to a pending HITL request. @@ -466,11 +522,11 @@ async def send_hitl_response(req: func.HttpRequest, client: df.DurableOrchestrat if not instance_id or not request_id: return self._build_error_response("Instance ID and Request ID are required.") - # Scope the endpoint to this app's workflow orchestrator before raising an + # Scope the endpoint to this workflow's orchestrator before raising an # external event, so a leaked instance ID cannot be used to inject events into - # a different orchestration in the task hub. + # a different orchestration (or a different workflow) in the task hub. 
status = await client.get_status(instance_id) - if not self._is_workflow_orchestration(status): + if not self._is_owned_orchestration(status, workflow_name): return self._build_error_response("Instance not found", status_code=404) try: @@ -505,11 +561,6 @@ async def send_hitl_response(req: func.HttpRequest, client: df.DurableOrchestrat _ = get_workflow_status _ = send_hitl_response - def _build_status_url(self, request_url: str, instance_id: str) -> str: - """Build the status URL for a workflow instance.""" - base_url = self._build_base_url(request_url) - return f"{base_url}/api/workflow/status/{instance_id}" - def _build_base_url(self, request_url: str) -> str: """Extract the base URL from a request URL.""" base_url, _, _ = request_url.partition("/api/") @@ -517,24 +568,22 @@ def _build_base_url(self, request_url: str) -> str: base_url = request_url.rstrip("/") return base_url - def _is_workflow_orchestration(self, status: Any) -> bool: - """Return whether a durable orchestration status belongs to this app's workflow. + def _is_owned_orchestration(self, status: Any, workflow_name: str) -> bool: + """Return whether a durable orchestration status belongs to the named workflow. - The ``workflow/status`` and ``workflow/respond`` endpoints address instances by - ``instanceId`` alone, but the durable client resolves IDs across *every* - orchestration in the task hub -- agent entities, any user-registered - orchestrations, and other apps sharing the hub. Without this check a caller - holding one instance ID could read another orchestration's status (including - pending HITL request payloads) or inject external events into it. Scoping to - the workflow's ``dafx-{name}`` orchestration name keeps both endpoints bound to - the workflow this app hosts; anything else is treated as "not found". 
+ The ``workflow/{name}/status`` and ``.../respond`` endpoints address instances + by ``instanceId`` alone, but the durable client resolves IDs across *every* + orchestration in the task hub -- agent entities, other workflows on this app, + any user-registered orchestrations, and other apps sharing the hub. Without + this check a caller holding one instance ID could read another orchestration's + status (including pending HITL request payloads) or inject external events into + it. Scoping to the route's ``dafx-{workflow_name}`` orchestration keeps each + endpoint bound to its own workflow; anything else is treated as "not found". The orchestration name is compared case-insensitively so the check stays robust to host/runtime casing differences. """ - if self.workflow is None: - return False - expected = workflow_orchestrator_name(self.workflow.name) + expected = workflow_orchestrator_name(workflow_name) name = getattr(status, "name", None) return isinstance(name, str) and name.casefold() == expected.casefold() @@ -547,6 +596,11 @@ def agents(self) -> dict[str, SupportsAgentRun]: """ return {name: metadata.agent for name, metadata in self._agent_metadata.items()} + @property + def workflows(self) -> dict[str, Workflow]: + """Returns a dict of workflow name to the hosted :class:`Workflow` instances.""" + return dict(self._workflows) + def add_agent( self, agent: SupportsAgentRun, @@ -630,12 +684,18 @@ def get_agent( self, context: AgentOrchestrationContextType, agent_name: str, + workflow_name: str | None = None, ) -> DurableAIAgent[AgentTask]: """Return a DurableAIAgent proxy for a registered agent. Args: context: Durable Functions orchestration context invoking the agent. - agent_name: Name of the agent registered on this app. + agent_name: Name of the agent registered on this app. 
For an agent that + belongs to a hosted workflow, pass ``workflow_name`` to resolve it + under its workflow-scoped identity; for an agent registered standalone + via ``agents=`` / ``add_agent`` use its bare name. + workflow_name: Optional owning workflow name. When given, the agent is + resolved under the scoped id ``{workflow_name}-{agent_name}``. Returns: DurableAIAgent[AgentTask] wrapper bound to the orchestration context. @@ -643,7 +703,9 @@ def get_agent( Raises: ValueError: If the requested agent has not been registered. """ - normalized_name = str(agent_name) + normalized_name = ( + workflow_scoped_executor_id(workflow_name, str(agent_name)) if workflow_name else str(agent_name) + ) if normalized_name not in self._agent_metadata: raise ValueError(f"Agent '{normalized_name}' is not registered with this app.") diff --git a/python/packages/azurefunctions/tests/integration_tests/test_09_workflow_shared_state.py b/python/packages/azurefunctions/tests/integration_tests/test_09_workflow_shared_state.py index ddf6bc0f469..27836d04f8d 100644 --- a/python/packages/azurefunctions/tests/integration_tests/test_09_workflow_shared_state.py +++ b/python/packages/azurefunctions/tests/integration_tests/test_09_workflow_shared_state.py @@ -21,6 +21,9 @@ import pytest +# Must match the workflow name in samples/04-hosting/azure_functions/09_workflow_shared_state/function_app.py +WORKFLOW_NAME = "email_triage_shared_state" + # Module-level markers - applied to all tests in this file pytestmark = [ pytest.mark.flaky, @@ -45,7 +48,7 @@ def test_workflow_with_spam_email(self) -> None: spam_content = "URGENT! You have won $1,000,000! Click here to claim your prize now before it expires!" 
# Start orchestration with spam email - response = self.helper.post_text(f"{self.base_url}/api/workflow/run", spam_content) + response = self.helper.post_text(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", spam_content) assert response.status_code == 202 data = response.json() assert "instanceId" in data @@ -64,7 +67,7 @@ def test_workflow_with_legitimate_email(self) -> None: ) # Start orchestration with legitimate email - response = self.helper.post_text(f"{self.base_url}/api/workflow/run", legitimate_content) + response = self.helper.post_text(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", legitimate_content) assert response.status_code == 202 data = response.json() assert "instanceId" in data @@ -83,7 +86,7 @@ def test_workflow_with_phishing_email(self) -> None: ) # Start orchestration with phishing email - response = self.helper.post_text(f"{self.base_url}/api/workflow/run", phishing_content) + response = self.helper.post_text(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", phishing_content) assert response.status_code == 202 data = response.json() assert "instanceId" in data diff --git a/python/packages/azurefunctions/tests/integration_tests/test_10_workflow_no_shared_state.py b/python/packages/azurefunctions/tests/integration_tests/test_10_workflow_no_shared_state.py index 88739057f03..479b87cf0b7 100644 --- a/python/packages/azurefunctions/tests/integration_tests/test_10_workflow_no_shared_state.py +++ b/python/packages/azurefunctions/tests/integration_tests/test_10_workflow_no_shared_state.py @@ -21,6 +21,9 @@ import pytest +# Must match the workflow name in samples/04-hosting/azure_functions/10_workflow_no_shared_state/function_app.py +WORKFLOW_NAME = "email_triage" + # Module-level markers - applied to all tests in this file pytestmark = [ pytest.mark.flaky, @@ -51,7 +54,7 @@ def test_workflow_with_spam_email(self) -> None: } # Start orchestration - response = self.helper.post_json(f"{self.base_url}/api/workflow/run", payload) + response 
= self.helper.post_json(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", payload) assert response.status_code == 202 data = response.json() assert "instanceId" in data @@ -73,7 +76,7 @@ def test_workflow_with_legitimate_email(self) -> None: } # Start orchestration - response = self.helper.post_json(f"{self.base_url}/api/workflow/run", payload) + response = self.helper.post_json(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", payload) assert response.status_code == 202 data = response.json() assert "instanceId" in data @@ -92,13 +95,13 @@ def test_workflow_status_endpoint(self) -> None: } # Start orchestration - response = self.helper.post_json(f"{self.base_url}/api/workflow/run", payload) + response = self.helper.post_json(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", payload) assert response.status_code == 202 data = response.json() instance_id = data["instanceId"] # Check status using the workflow status endpoint - status_response = self.helper.get(f"{self.base_url}/api/workflow/status/{instance_id}") + status_response = self.helper.get(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/status/{instance_id}") assert status_response.status_code == 200 status = status_response.json() assert "instanceId" in status diff --git a/python/packages/azurefunctions/tests/integration_tests/test_11_workflow_parallel.py b/python/packages/azurefunctions/tests/integration_tests/test_11_workflow_parallel.py index 831ff6f4adc..a8d8364f4ef 100644 --- a/python/packages/azurefunctions/tests/integration_tests/test_11_workflow_parallel.py +++ b/python/packages/azurefunctions/tests/integration_tests/test_11_workflow_parallel.py @@ -23,6 +23,9 @@ import pytest +# Must match the workflow name in samples/04-hosting/azure_functions/11_workflow_parallel/function_app.py +WORKFLOW_NAME = "parallel_review" + # Module-level markers - applied to all tests in this file pytestmark = [ pytest.mark.flaky, @@ -62,14 +65,14 @@ def test_parallel_workflow_end_to_end(self) -> None: } # Start 
the orchestration. - response = self.helper.post_json(f"{self.base_url}/api/workflow/run", payload) + response = self.helper.post_json(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", payload) assert response.status_code == 202 data = response.json() instance_id = data["instanceId"] assert "statusQueryGetUri" in data # The status endpoint reflects the started instance. - status_response = self.helper.get(f"{self.base_url}/api/workflow/status/{instance_id}") + status_response = self.helper.get(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/status/{instance_id}") assert status_response.status_code == 200 assert status_response.json()["instanceId"] == instance_id diff --git a/python/packages/azurefunctions/tests/integration_tests/test_12_workflow_hitl.py b/python/packages/azurefunctions/tests/integration_tests/test_12_workflow_hitl.py index 2b31c17c7a0..aec5e6615e2 100644 --- a/python/packages/azurefunctions/tests/integration_tests/test_12_workflow_hitl.py +++ b/python/packages/azurefunctions/tests/integration_tests/test_12_workflow_hitl.py @@ -31,6 +31,9 @@ pytest.mark.usefixtures("function_app_for_test"), ] +# Must match the workflow name in samples/04-hosting/azure_functions/12_workflow_hitl/function_app.py +WORKFLOW_NAME = "content_moderation" + @pytest.mark.orchestration class TestWorkflowHITL: @@ -46,7 +49,7 @@ def _wait_for_hitl_request(self, instance_id: str, timeout: int = 40) -> dict: """Polls for a pending HITL request.""" start_time = time.time() while time.time() - start_time < timeout: - status_response = self.helper.get(f"{self.base_url}/api/workflow/status/{instance_id}") + status_response = self.helper.get(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/status/{instance_id}") if status_response.status_code == 200: status = status_response.json() pending_requests = status.get("pendingHumanInputRequests", []) @@ -69,7 +72,7 @@ def test_hitl_workflow_approval(self) -> None: } # Start orchestration - response = 
self.helper.post_json(f"{self.base_url}/api/workflow/run", payload) + response = self.helper.post_json(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", payload) assert response.status_code == 202 data = response.json() assert "instanceId" in data @@ -89,7 +92,7 @@ def test_hitl_workflow_approval(self) -> None: # Send approval approval_response = self.helper.post_json( - f"{self.base_url}/api/workflow/respond/{instance_id}/{request_id}", + f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/respond/{instance_id}/{request_id}", {"approved": True, "reviewer_notes": "Content is appropriate and well-written."}, ) assert approval_response.status_code == 200 @@ -112,7 +115,7 @@ def test_hitl_workflow_rejection(self) -> None: } # Start orchestration - response = self.helper.post_json(f"{self.base_url}/api/workflow/run", payload) + response = self.helper.post_json(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", payload) assert response.status_code == 202 data = response.json() instance_id = data["instanceId"] @@ -127,7 +130,7 @@ def test_hitl_workflow_rejection(self) -> None: # Send rejection rejection_response = self.helper.post_json( - f"{self.base_url}/api/workflow/respond/{instance_id}/{request_id}", + f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/respond/{instance_id}/{request_id}", {"approved": False, "reviewer_notes": "Content appears to be spam/scam material."}, ) assert rejection_response.status_code == 200 @@ -150,7 +153,7 @@ def test_hitl_workflow_status_endpoint(self) -> None: } # Start orchestration - response = self.helper.post_json(f"{self.base_url}/api/workflow/run", payload) + response = self.helper.post_json(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", payload) assert response.status_code == 202 data = response.json() instance_id = data["instanceId"] @@ -169,7 +172,7 @@ def test_hitl_workflow_status_endpoint(self) -> None: if pending_requests: request_id = pending_requests[0]["requestId"] self.helper.post_json( - 
f"{self.base_url}/api/workflow/respond/{instance_id}/{request_id}", + f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/respond/{instance_id}/{request_id}", {"approved": True, "reviewer_notes": ""}, ) @@ -189,7 +192,7 @@ def test_hitl_workflow_with_neutral_content(self) -> None: } # Start orchestration - response = self.helper.post_json(f"{self.base_url}/api/workflow/run", payload) + response = self.helper.post_json(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", payload) assert response.status_code == 202 data = response.json() instance_id = data["instanceId"] @@ -203,7 +206,7 @@ def test_hitl_workflow_with_neutral_content(self) -> None: # Approve self.helper.post_json( - f"{self.base_url}/api/workflow/respond/{instance_id}/{request_id}", + f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/respond/{instance_id}/{request_id}", {"approved": True, "reviewer_notes": "Approved after review."}, ) diff --git a/python/packages/azurefunctions/tests/test_app.py b/python/packages/azurefunctions/tests/test_app.py index efed9f32375..035b7b46d87 100644 --- a/python/packages/azurefunctions/tests/test_app.py +++ b/python/packages/azurefunctions/tests/test_app.py @@ -1438,37 +1438,75 @@ def test_init_with_workflow_and_explicit_agent_does_not_raise(self) -> None: assert "SharedAgent" in app.agents - def test_build_status_url(self) -> None: - """Test _build_status_url constructs correct URL.""" + def test_init_with_multiple_workflows_registers_each(self) -> None: + """The workflows= list registers each workflow keyed by name.""" + from agent_framework import Executor + + def _wf(name: str, executor_id: str) -> Mock: + ex = Mock(spec=Executor) + ex.id = executor_id + wf = Mock() + wf.name = name + wf.executors = {executor_id: ex} + return wf + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity") as setup_exec, + patch.object(AgentFunctionApp, "_setup_workflow_orchestration") as setup_orch, + ): + app = AgentFunctionApp(workflows=[_wf("orders", "router"), 
_wf("billing", "router")]) + + assert set(app.workflows) == {"orders", "billing"} + assert app.workflow is None # ambiguous with >1 workflow + assert setup_exec.call_count == 2 + assert setup_orch.call_count == 2 + + def test_init_rejects_duplicate_workflow_name(self) -> None: + """Two workflows with the same name are rejected.""" + from agent_framework import Executor + + def _wf(executor_id: str) -> Mock: + ex = Mock(spec=Executor) + ex.id = executor_id + wf = Mock() + wf.name = "orders" + wf.executors = {executor_id: ex} + return wf + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + pytest.raises(ValueError, match="already registered"), + ): + AgentFunctionApp(workflows=[_wf("a"), _wf("b")]) + + def test_init_rejects_mapping_key_mismatch(self) -> None: + """A workflows mapping whose key disagrees with Workflow.name is rejected.""" mock_workflow = Mock() - mock_workflow.name = "test_workflow" + mock_workflow.name = "orders" mock_workflow.executors = {} with ( patch.object(AgentFunctionApp, "_setup_executor_activity"), patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + pytest.raises(ValueError, match="does not match"), ): - app = AgentFunctionApp(workflow=mock_workflow) - - url = app._build_status_url("http://localhost:7071/api/workflow/run", "instance-123") + AgentFunctionApp(workflows={"wrong_key": mock_workflow}) - assert url == "http://localhost:7071/api/workflow/status/instance-123" + def test_init_rejects_auto_generated_workflow_name(self) -> None: + """An auto-generated WorkflowBuilder name is rejected.""" + import uuid - def test_build_status_url_handles_trailing_slash(self) -> None: - """Test _build_status_url handles URLs without /api/ correctly.""" mock_workflow = Mock() - mock_workflow.name = "test_workflow" + mock_workflow.name = f"WorkflowBuilder-{uuid.uuid4()}" mock_workflow.executors = {} with ( patch.object(AgentFunctionApp, 
"_setup_executor_activity"), patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + pytest.raises(ValueError, match="auto-generated"), ): - app = AgentFunctionApp(workflow=mock_workflow) - - url = app._build_status_url("http://localhost:7071/", "instance-456") - - assert "instance-456" in url + AgentFunctionApp(workflow=mock_workflow) # NOTE: State snapshot/diff tests were moved to durabletask once the activity @@ -1552,18 +1590,18 @@ def test_accepts_matching_workflow_orchestration(self, name: str) -> None: app = self._app_for("orders") status = Mock() status.name = name - assert app._is_workflow_orchestration(status) is True + assert app._is_owned_orchestration(status, "orders") is True def test_rejects_none_status(self) -> None: # client.get_status returns None when no instance resolves for the ID. app = self._app_for("orders") - assert app._is_workflow_orchestration(None) is False + assert app._is_owned_orchestration(None, "orders") is False def test_rejects_status_without_name(self) -> None: app = self._app_for("orders") status = Mock() status.name = None - assert app._is_workflow_orchestration(status) is False + assert app._is_owned_orchestration(status, "orders") is False @pytest.mark.parametrize( "other_name", @@ -1578,7 +1616,7 @@ def test_rejects_other_orchestration_name(self, other_name: str) -> None: app = self._app_for("orders") status = Mock() status.name = other_name - assert app._is_workflow_orchestration(status) is False + assert app._is_owned_orchestration(status, "orders") is False if __name__ == "__main__": diff --git a/python/samples/04-hosting/azure_functions/12_workflow_hitl/function_app.py b/python/samples/04-hosting/azure_functions/12_workflow_hitl/function_app.py index 42af7ddf1c9..e51f2e98a87 100644 --- a/python/samples/04-hosting/azure_functions/12_workflow_hitl/function_app.py +++ b/python/samples/04-hosting/azure_functions/12_workflow_hitl/function_app.py @@ -411,11 +411,12 @@ def launch(durable: bool = True) -> 
AgentFunctionApp | None: """ if durable: # Azure Functions mode with Durable Functions - # The app automatically provides HITL endpoints: - # - POST /api/workflow/run - Start the workflow - # - GET /api/workflow/status/{instanceId} - Check status and pending HITL requests - # - POST /api/workflow/respond/{instanceId}/{requestId} - Send HITL response - # - GET /api/health - Health check + # The app automatically provides per-workflow HITL endpoints (workflow name + # "content_moderation"): + # - POST /api/workflow/content_moderation/run - Start the workflow + # - GET /api/workflow/content_moderation/status/{instanceId} - Status + pending HITL requests + # - POST /api/workflow/content_moderation/respond/{instanceId}/{requestId} - Send HITL response + # - GET /api/health - Health check workflow = _create_workflow() return AgentFunctionApp(workflow=workflow, enable_health_check=True) # Pure MAF mode with DevUI for local development From d3fa3fec4d7c86646a662c2bee44c1af141738e4 Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Tue, 23 Jun 2026 17:43:49 -0400 Subject: [PATCH 04/12] feat(durabletask): sub-workflows via durable child orchestrations (phase 3) Run WorkflowExecutor nodes as durable child orchestrations on both hosts. - Protocol: add call_sub_orchestrator to WorkflowOrchestrationContext, implemented by the durabletask and Azure Functions adapters. - Registration: planner classifies WorkflowExecutor as subworkflow_executors; collect_hosted_workflows walks nested workflows (parent first, deduped by name). Both hosts recursively register every nested workflow's orchestration/agents/activities once; only top-level workflows get HTTP routes. Names validated up front before any registration side effects. 
- Orchestrator: dispatch WorkflowExecutor nodes via call_sub_orchestrator(dafx-{innerName}) with deterministic child instance ids ({instanceId}::{executorId}::{counter}), a trusted-input marker carrying nesting depth (bounded at 25), and outputs routed as messages (default) or parent outputs (allow_direct_output). - Tests: registration/collect, orchestrator prepare/process/unwrap, recursive registration on both hosts. Sample: 11_subworkflow. --- .../agent_framework_azurefunctions/_app.py | 76 +++++-- .../_workflow_af_context.py | 4 + .../packages/azurefunctions/tests/test_app.py | 113 ++++++++++ .../agent_framework_durabletask/__init__.py | 3 +- .../agent_framework_durabletask/_worker.py | 59 ++++- .../_workflows/context.py | 20 ++ .../_workflows/dt_context.py | 3 + .../_workflows/orchestrator.py | 178 ++++++++++++++- .../_workflows/registration.py | 62 ++++- .../tests/test_subworkflow_orchestration.py | 143 ++++++++++++ .../packages/durabletask/tests/test_worker.py | 101 +++++++++ .../tests/test_workflow_registration.py | 67 +++++- .../durabletask/11_subworkflow/README.md | 69 ++++++ .../durabletask/11_subworkflow/client.py | 78 +++++++ .../durabletask/11_subworkflow/worker.py | 211 ++++++++++++++++++ 15 files changed, 1148 insertions(+), 39 deletions(-) create mode 100644 python/packages/durabletask/tests/test_subworkflow_orchestration.py create mode 100644 python/samples/04-hosting/durabletask/11_subworkflow/README.md create mode 100644 python/samples/04-hosting/durabletask/11_subworkflow/client.py create mode 100644 python/samples/04-hosting/durabletask/11_subworkflow/worker.py diff --git a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py index 099e8a7de86..49efc6e43c8 100644 --- a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py +++ b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py @@ -48,6 +48,7 @@ workflow_orchestrator_name, 
workflow_scoped_executor_id, ) +from agent_framework_durabletask._workflows.registration import collect_hosted_workflows from agent_framework_durabletask._workflows.serialization import strip_pickle_markers from ._entities import create_agent_entity @@ -246,6 +247,9 @@ def __init__( # Initialize agent metadata dictionary self._agent_metadata = {} self._workflows: dict[str, Workflow] = {} + # Every workflow whose orchestration has been registered (top-level plus + # nested sub-workflows), so a shared sub-workflow is registered once. + self._registered_orchestrations: set[str] = set() self.enable_health_check = enable_health_check self.enable_http_endpoints = enable_http_endpoints self.enable_mcp_tool_trigger = enable_mcp_tool_trigger @@ -310,22 +314,49 @@ def _collect_workflows( return collected def _register_workflow(self, workflow: Workflow) -> None: - """Register one workflow's durable primitives and HTTP routes. + """Register a top-level workflow's durable primitives and HTTP routes. - The "what to register" decision (agent -> entity, non-agent -> activity) is - shared with the standalone durabletask host via ``plan_workflow_registration``. + The "what to register" decision (agent -> entity, non-agent -> activity, + sub-workflow -> child orchestration) is shared with the standalone + durabletask host via ``plan_workflow_registration``. Nested sub-workflows + have their orchestration primitives registered (deduped by name) so the + parent can drive them as child orchestrations, but only the top-level + workflow gets HTTP ``workflow/{name}/...`` routes. Raises: - ValueError: If the workflow name is missing/invalid/auto-generated, or a - workflow with the same name is already registered. + ValueError: If the workflow (or a nested sub-workflow) name is + missing/invalid/auto-generated, or a top-level workflow with the + same name is already registered. 
""" - # Durable names are derived from the workflow name and must be stable across - # restarts, so reject an unnamed/auto-generated workflow up front. validate_workflow_name(workflow.name) if workflow.name in self._workflows: raise ValueError(f"Workflow '{workflow.name}' is already registered on this app.") + + # Validate the whole composition (top-level plus every nested sub-workflow) + # up front, so an invalid/auto-generated nested name fails before any + # registration side effects leave the app partially configured. + hosted_workflows = list(collect_hosted_workflows(workflow)) + for hosted in hosted_workflows: + validate_workflow_name(hosted.name) + self._workflows[workflow.name] = workflow + # Register orchestration primitives for the top-level workflow and every + # nested sub-workflow (deduped by name). + for hosted in hosted_workflows: + if hosted.name in self._registered_orchestrations: + continue + self._register_workflow_primitives(hosted) + + # HTTP routes are only exposed for the top-level workflow; sub-workflows are + # driven by the parent via call_sub_orchestrator, not addressed directly. + self._register_workflow_routes(workflow) + + def _register_workflow_primitives(self, workflow: Workflow) -> None: + """Register one workflow's entities, activities, and orchestrator (no routes).""" + validate_workflow_name(workflow.name) + self._registered_orchestrations.add(workflow.name) + logger.debug("[AgentFunctionApp] Registering workflow '%s'", workflow.name) plan = plan_workflow_registration(workflow) for agent_executor in plan.agent_executors: @@ -341,7 +372,9 @@ def _register_workflow(self, workflow: Workflow) -> None: ) for executor in plan.activity_executors: # Set up a Functions activity trigger for each non-agent executor, scoped - # by workflow name to match the orchestrator's dispatch. + # by workflow name to match the orchestrator's dispatch. 
WorkflowExecutor + # nodes are not registered here: their inner workflows are registered + # separately and driven as child orchestrations. self._setup_executor_activity(workflow, executor.id) self._setup_workflow_orchestration(workflow) @@ -382,19 +415,19 @@ def executor_activity(inputData: str) -> str: _ = executor_activity def _setup_workflow_orchestration(self, workflow: Workflow) -> None: - """Register a workflow's orchestration and its per-workflow HTTP endpoints. + """Register a workflow's orchestrator function under its ``dafx-{name}`` name. - Routes are scoped by workflow name (``workflow/{name}/run`` etc.) so the URL - shape stays the same whether the app hosts one workflow or many; callers do - not have to change URLs as an app grows. + HTTP routes are registered separately (:meth:`_register_workflow_routes`) and + only for top-level workflows; this orchestrator function is registered for + every hosted workflow (including nested sub-workflows) so the parent can drive + them via ``call_sub_orchestrator``. 
""" captured_workflow = workflow - workflow_name = workflow.name - orchestrator_name = workflow_orchestrator_name(workflow_name) + orchestrator_name = workflow_orchestrator_name(workflow.name) @self.function_name(orchestrator_name) @self.orchestration_trigger(context_name="context") - def workflow_orchestrator(context: df.DurableOrchestrationContext) -> Any: # type: ignore[type-arg] + def workflow_orchestrator(context: df.DurableOrchestrationContext) -> Any: """Generic orchestrator for running the configured workflow.""" input_data = context.get_input() @@ -409,6 +442,19 @@ def workflow_orchestrator(context: df.DurableOrchestrationContext) -> Any: # ty # Durable Functions runtime extracts return value from StopIteration return outputs # noqa: B901 + # Ensure the orchestrator function is registered (prevents garbage collection) + _ = workflow_orchestrator + + def _register_workflow_routes(self, workflow: Workflow) -> None: + """Register a top-level workflow's per-workflow HTTP endpoints. + + Routes are scoped by workflow name (``workflow/{name}/run`` etc.) so the URL + shape stays the same whether the app hosts one workflow or many; callers do + not have to change URLs as an app grows. 
+ """ + workflow_name = workflow.name + orchestrator_name = workflow_orchestrator_name(workflow_name) + @self.function_name(f"{orchestrator_name}-start") @self.route(route=f"workflow/{workflow_name}/run", methods=["POST"]) @self.durable_client_input(client_name="client") diff --git a/python/packages/azurefunctions/agent_framework_azurefunctions/_workflow_af_context.py b/python/packages/azurefunctions/agent_framework_azurefunctions/_workflow_af_context.py index 89df6721956..eaf99a5a91f 100644 --- a/python/packages/azurefunctions/agent_framework_azurefunctions/_workflow_af_context.py +++ b/python/packages/azurefunctions/agent_framework_azurefunctions/_workflow_af_context.py @@ -67,6 +67,10 @@ def prepare_activity_task(self, activity_name: str, input_json: str) -> Any: orchestration_context: Any = self._context return orchestration_context.call_activity(activity_name, input_json) + def call_sub_orchestrator(self, name: str, input: Any, instance_id: str | None = None) -> Any: + orchestration_context: Any = self._context + return orchestration_context.call_sub_orchestrator(name, input_=input, instance_id=instance_id) + # -- Composite tasks ------------------------------------------------------ def task_all(self, tasks: list[Any]) -> Any: diff --git a/python/packages/azurefunctions/tests/test_app.py b/python/packages/azurefunctions/tests/test_app.py index 035b7b46d87..1420c8b3048 100644 --- a/python/packages/azurefunctions/tests/test_app.py +++ b/python/packages/azurefunctions/tests/test_app.py @@ -1509,6 +1509,119 @@ def test_init_rejects_auto_generated_workflow_name(self) -> None: AgentFunctionApp(workflow=mock_workflow) +class TestAgentFunctionAppSubworkflow: + """Test recursive registration of nested sub-workflows on the Functions app.""" + + @staticmethod + def _inner_agent_wf(name: str, executor_id: str) -> tuple[Mock, Mock]: + from agent_framework import AgentExecutor + + agent = Mock() + agent.name = "InnerAssistant" + ex = Mock(spec=AgentExecutor) + ex.agent = 
agent + ex.id = executor_id + wf = Mock() + wf.name = name + wf.executors = {executor_id: ex} + return wf, agent + + @staticmethod + def _outer_wf(name: str, inner: Mock, *, sub_ids: tuple[str, ...] = ("sub",)) -> Mock: + from agent_framework import Executor, WorkflowExecutor + + executors: dict[str, Mock] = {} + for sid in sub_ids: + sub = Mock(spec=WorkflowExecutor) + sub.id = sid + sub.workflow = inner + sub.allow_direct_output = False + executors[sid] = sub + router = Mock(spec=Executor) + router.id = "router" + executors["router"] = router + wf = Mock() + wf.name = name + wf.executors = executors + return wf + + def test_nested_workflow_registers_both_orchestrations(self) -> None: + """An outer workflow registers an orchestration for itself and the inner workflow.""" + inner, _ = self._inner_agent_wf("inner", "agent_node") + outer = self._outer_wf("outer", inner) + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration") as setup_orch, + ): + app = AgentFunctionApp(workflow=outer) + + assert setup_orch.call_count == 2 + registered = {call.args[0].name for call in setup_orch.call_args_list} + assert registered == {"outer", "inner"} + # Only the top-level workflow is tracked as an addressable workflow. 
+ assert set(app.workflows) == {"outer"} + + def test_nested_workflow_registers_inner_agent_scoped(self) -> None: + """The inner workflow's agent entity is registered under the inner-scoped id.""" + inner, inner_agent = self._inner_agent_wf("inner", "agent_node") + outer = self._outer_wf("outer", inner) + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + patch.object(AgentFunctionApp, "_setup_agent_entity") as setup_entity, + ): + app = AgentFunctionApp(workflow=outer) + + setup_entity.assert_called_once() + call_args = setup_entity.call_args.args + assert call_args[0] is inner_agent + assert call_args[1] == "inner-agent_node" + assert "inner-agent_node" in app.agents + + def test_nested_workflow_routes_only_top_level(self) -> None: + """HTTP routes are registered only for the top-level workflow.""" + inner, _ = self._inner_agent_wf("inner", "agent_node") + outer = self._outer_wf("outer", inner) + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + patch.object(AgentFunctionApp, "_register_workflow_routes") as routes, + ): + AgentFunctionApp(workflow=outer) + + routes.assert_called_once() + assert routes.call_args.args[0] is outer + + def test_shared_subworkflow_registered_once(self) -> None: + """A sub-workflow reused by two nodes registers its orchestration only once.""" + inner, _ = self._inner_agent_wf("inner", "agent_node") + outer = self._outer_wf("outer", inner, sub_ids=("sub_a", "sub_b")) + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration") as setup_orch, + ): + AgentFunctionApp(workflow=outer) + + registered = sorted(call.args[0].name for call in setup_orch.call_args_list) + assert registered == ["inner", "outer"] + + def test_nested_workflow_with_invalid_name_is_rejected(self) -> None: + """A 
nested sub-workflow must also have a valid, stable name.""" + inner, _ = self._inner_agent_wf("has space", "agent_node") + outer = self._outer_wf("outer", inner) + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + pytest.raises(ValueError, match="invalid"), + ): + AgentFunctionApp(workflow=outer) + + # NOTE: State snapshot/diff tests were moved to durabletask once the activity # execution body was extracted into the host-agnostic execute_workflow_activity. # See packages/durabletask/tests/test_workflow_activity.py. diff --git a/python/packages/durabletask/agent_framework_durabletask/__init__.py b/python/packages/durabletask/agent_framework_durabletask/__init__.py index 4a51d476cdd..3b4f185bdcd 100644 --- a/python/packages/durabletask/agent_framework_durabletask/__init__.py +++ b/python/packages/durabletask/agent_framework_durabletask/__init__.py @@ -63,7 +63,7 @@ workflow_orchestrator_name, ) from ._workflows.orchestrator import WORKFLOW_ORCHESTRATOR_NAME, run_workflow_orchestrator -from ._workflows.registration import WorkflowRegistrationPlan, plan_workflow_registration +from ._workflows.registration import WorkflowRegistrationPlan, collect_hosted_workflows, plan_workflow_registration from ._workflows.runner_context import CapturingRunnerContext from ._workflows.serialization import deserialize_workflow_output @@ -126,6 +126,7 @@ "WorkflowOrchestrationContext", "WorkflowRegistrationPlan", "__version__", + "collect_hosted_workflows", "deserialize_workflow_output", "ensure_response_format", "execute_workflow_activity", diff --git a/python/packages/durabletask/agent_framework_durabletask/_worker.py b/python/packages/durabletask/agent_framework_durabletask/_worker.py index 7d1a44ba099..8ef949bf93b 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_worker.py +++ b/python/packages/durabletask/agent_framework_durabletask/_worker.py @@ -28,7 +28,7 @@ 
workflow_scoped_executor_id, ) from ._workflows.orchestrator import run_workflow_orchestrator -from ._workflows.registration import plan_workflow_registration +from ._workflows.registration import collect_hosted_workflows, plan_workflow_registration logger = logging.getLogger("agent_framework.durabletask") @@ -88,6 +88,9 @@ def __init__( self._callback = callback self._registered_agents: dict[str, SupportsAgentRun] = {} self._workflows: dict[str, Workflow] = {} + # Every workflow whose orchestration has been registered (top-level plus nested + # sub-workflows), so a sub-workflow shared across the tree is registered once. + self._registered_orchestrations: set[str] = set() logger.debug("[DurableAIAgentWorker] Initialized with worker type: %s", type(worker).__name__) def add_agent( @@ -202,25 +205,58 @@ def configure_workflow( ``dafx-{name}``; activities/entities ``dafx-{name}-{executorId}``) so two co-hosted workflows that reuse an executor id do not collide. + Sub-workflows nest: if the workflow contains + :class:`~agent_framework.WorkflowExecutor` nodes, each inner workflow's + orchestration/agents/activities are registered too (deduped by name) so the + parent can drive them as durable child orchestrations. + Args: workflow: The MAF :class:`Workflow` to register. Must have an explicit, stable :attr:`Workflow.name` (an auto-generated ``WorkflowBuilder-`` name is rejected because it is not stable - across restarts and would break durable resume). + across restarts and would break durable resume). Every nested + sub-workflow must likewise be named. callback: Optional callback for agent response notifications. Raises: - ValueError: If the workflow name is missing, invalid, auto-generated, - or already registered on this worker. + ValueError: If the workflow (or a nested sub-workflow) name is missing, + invalid, or auto-generated, or if the top-level workflow name is + already registered on this worker. 
""" workflow_name = workflow.name validate_workflow_name(workflow_name) if workflow_name in self._workflows: raise ValueError(f"Workflow '{workflow_name}' is already registered on this worker.") + + # Validate the whole composition (top-level plus every nested sub-workflow) + # up front, so an invalid/auto-generated nested name fails before any + # registration side effects leave the worker partially configured. + hosted_workflows = list(collect_hosted_workflows(workflow)) + for hosted in hosted_workflows: + validate_workflow_name(hosted.name) + self._workflows[workflow_name] = workflow - # The "what to register" decision (agent -> entity, non-agent -> activity) - # is shared with the Azure Functions host via plan_workflow_registration. + # Register the top-level workflow and every nested sub-workflow (deduped by + # name), so the parent can drive sub-workflows as durable child orchestrations. + for hosted in hosted_workflows: + if hosted.name in self._registered_orchestrations: + continue + self._register_single_workflow(hosted, callback) + + def _register_single_workflow( + self, + workflow: Workflow, + callback: AgentResponseCallbackProtocol | None, + ) -> None: + """Register one workflow's durable primitives (no recursion into sub-workflows). + + The "what to register" decision (agent -> entity, non-agent -> activity, + sub-workflow -> child orchestration) is shared with the Azure Functions host + via ``plan_workflow_registration``. + """ + validate_workflow_name(workflow.name) + self._registered_orchestrations.add(workflow.name) plan = plan_workflow_registration(workflow) # Register agent executors as durable entities, scoped by workflow name so @@ -229,11 +265,14 @@ def configure_workflow( # dispatches to); the entity *key* at run time is the orchestration instance # id, which keeps conversation state isolated per run. 
for agent_executor in plan.agent_executors: - scoped_id = workflow_scoped_executor_id(workflow_name, agent_executor.id) + scoped_id = workflow_scoped_executor_id(workflow.name, agent_executor.id) if scoped_id not in self._registered_agents: self.add_agent(agent_executor.agent, callback=callback, entity_id=scoped_id) # Register non-agent executors as durable activities, scoped by workflow name. + # WorkflowExecutor nodes are intentionally not registered as activities: their + # inner workflows are registered separately (above, via collect_hosted_workflows) + # and driven as child orchestrations. for executor in plan.activity_executors: self._register_executor_activity(workflow, executor) @@ -241,11 +280,13 @@ def configure_workflow( self._register_workflow_orchestrator(workflow) logger.info( - "[DurableAIAgentWorker] Workflow '%s' configured with %d executors (%d agents, %d activities)", - workflow_name, + "[DurableAIAgentWorker] Workflow '%s' configured with %d executors " + "(%d agents, %d activities, %d sub-workflows)", + workflow.name, len(workflow.executors), len(plan.agent_executors), len(plan.activity_executors), + len(plan.subworkflow_executors), ) def _register_executor_activity(self, workflow: Workflow, executor: Any) -> None: diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/context.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/context.py index 8361f687549..d757d00ecc1 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/context.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/context.py @@ -98,6 +98,26 @@ def prepare_activity_task(self, activity_name: str, input_json: str) -> Any: """ ... + def call_sub_orchestrator(self, name: str, input: Any, instance_id: str | None = None) -> Any: + """Create a yieldable task that runs a nested workflow as a child orchestration. 
+ + Used to drive a :class:`~agent_framework.WorkflowExecutor` node: the inner + workflow runs as its own durable orchestration (named ``dafx-{innerName}``), + independently checkpointed and observable, and its result flows back into + the parent's edge routing like any other executor's output. + + Args: + name: The registered orchestration name to invoke (``dafx-{innerName}``). + input: The JSON-serializable input for the child orchestration. + instance_id: Optional deterministic child instance ID. The orchestrator + derives one from the parent instance so nested runs are discoverable + and replay-safe. + + Returns: + A yieldable task whose result is the child orchestration's output. + """ + ... + def task_all(self, tasks: list[Any]) -> Any: """Create a yieldable composite task that completes when *all* tasks complete. diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/dt_context.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/dt_context.py index e517a757793..7388a0acf59 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/dt_context.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/dt_context.py @@ -66,6 +66,9 @@ def prepare_agent_task(self, executor_id: str, message: str, orchestration_insta def prepare_activity_task(self, activity_name: str, input_json: str) -> Any: return cast(Any, self._context.call_activity(activity_name, input=input_json)) + def call_sub_orchestrator(self, name: str, input: Any, instance_id: str | None = None) -> Any: + return cast(Any, self._context.call_sub_orchestrator(name, input=input, instance_id=instance_id)) + # -- Composite tasks ------------------------------------------------------ def task_all(self, tasks: list[Any]) -> Any: diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py index 
22f7367484c..c666553070a 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py @@ -37,6 +37,7 @@ Message, Workflow, WorkflowConvergenceException, + WorkflowExecutor, ) from agent_framework._workflows._edge import ( Edge, @@ -49,7 +50,7 @@ from agent_framework._workflows._state import State from .context import WorkflowOrchestrationContext -from .naming import workflow_executor_activity_name, workflow_scoped_executor_id +from .naming import workflow_executor_activity_name, workflow_orchestrator_name, workflow_scoped_executor_id from .serialization import ( deserialize_value, reconstruct_to_type, @@ -69,6 +70,17 @@ SOURCE_ORCHESTRATOR = "__orchestrator__" SOURCE_HITL_RESPONSE = "__hitl_response__" +# A WorkflowExecutor node runs its inner workflow as a durable child orchestration. +# The parent passes the node's input wrapped in this marker so the child orchestrator +# can tell a trusted sub-orchestration payload (serialized by the parent) apart from +# untrusted top-level client input, and can track nesting depth to bound recursion. +SUBWORKFLOW_INPUT_KEY = "__subworkflow_input__" +SUBWORKFLOW_DEPTH_KEY = "__subworkflow_depth__" + +# Maximum sub-workflow nesting depth. Mutually-nested workflows (A hosts B hosts A) +# would otherwise spawn child orchestrations without bound; this caps the tree. +MAX_SUBWORKFLOW_DEPTH = 25 + # Name of the auto-generated orchestrator registered by # ``DurableAIAgentWorker.configure_workflow`` (and the Azure Functions host). 
# Standalone clients start a configured workflow by scheduling an orchestration @@ -93,6 +105,7 @@ class TaskType(Enum): AGENT = "agent" ACTIVITY = "activity" + SUBWORKFLOW = "subworkflow" @dataclass @@ -273,6 +286,28 @@ def _prepare_activity_task( return ctx.prepare_activity_task(activity_name, activity_input_json) +def _prepare_subworkflow_task( + ctx: WorkflowOrchestrationContext, + executor: WorkflowExecutor, + message: Any, + child_instance_id: str, + depth: int, +) -> Any: + """Prepare a child-orchestration task that runs a ``WorkflowExecutor``'s inner workflow. + + The inner workflow runs as its own durable orchestration (``dafx-{innerName}``), + so its executors are independently durable/observable. The node's message is + serialized and wrapped in a marker so the child orchestrator reconstructs the + original typed object (trusted internal input) and tracks nesting depth. + """ + inner_orchestration_name = workflow_orchestrator_name(executor.workflow.name) + child_input = { + SUBWORKFLOW_INPUT_KEY: serialize_value(message), + SUBWORKFLOW_DEPTH_KEY: depth + 1, + } + return ctx.call_sub_orchestrator(inner_orchestration_name, child_input, instance_id=child_instance_id) + + # ============================================================================ # Result Processing Helpers # ============================================================================ @@ -342,6 +377,47 @@ def _process_activity_result( ) +def _process_subworkflow_result( + child_result: Any, + executor: WorkflowExecutor, + workflow_outputs: list[Any], +) -> ExecutorResult: + """Process a child orchestration's result into an ``ExecutorResult``. + + The child orchestration returns the inner workflow's outputs (a list of values + already encoded by the inner activity via ``serialize_value``). 
Mirroring the + in-process :class:`~agent_framework.WorkflowExecutor`: + + * ``allow_direct_output`` is ``False`` (default): each inner output becomes a + message routed through the ``WorkflowExecutor`` node's outgoing edges. + * ``allow_direct_output`` is ``True``: each inner output becomes one of the + parent workflow's own outputs. + """ + if isinstance(child_result, list): + outputs: list[Any] = cast("list[Any]", child_result) + elif child_result is None: + outputs = [] + else: + outputs = [child_result] + + sent_messages: list[dict[str, Any]] = [] + if executor.allow_direct_output: + # Inner outputs are already serialized (serialize_value); workflow_outputs + # holds serialized values, so they are directly compatible. + workflow_outputs.extend(outputs) + else: + # Route each inner output as a message from the node; _route_result_messages + # deserializes each "message" value before routing through edge groups. + sent_messages = [{"message": output, "target_id": None, "source_id": executor.id} for output in outputs] + + return ExecutorResult( + executor_id=executor.id, + output_message=None, + activity_result={"sent_messages": sent_messages, "outputs": [], "events": []}, + task_type=TaskType.SUBWORKFLOW, + ) + + # ============================================================================ # Routing Helpers # ============================================================================ @@ -519,6 +595,23 @@ def _select_primary_input_type(executor: Executor) -> type | None: return None +def _try_unwrap_subworkflow_input(raw_value: Any) -> tuple[bool, Any]: + """Detect and unwrap a sub-orchestration input marker. + + Returns ``(True, inner)`` when ``raw_value`` is the parent-supplied marker + payload (see :data:`SUBWORKFLOW_INPUT_KEY`), with ``inner`` reconstructed from + the wrapped, parent-serialized message. Returns ``(False, None)`` otherwise. 
+ + Kept separate from :func:`_coerce_initial_input` so the ``isinstance`` narrowing + here does not leak into that function's untyped ``raw_value`` coercion path. + """ + if isinstance(raw_value, dict): + marker_input = cast("dict[str, Any]", raw_value) + if SUBWORKFLOW_INPUT_KEY in marker_input: + return True, deserialize_value(marker_input[SUBWORKFLOW_INPUT_KEY]) + return False, None + + def _coerce_initial_input(workflow: Workflow, raw_value: Any) -> Any: """Coerce the client's initial workflow input to the start executor's type. @@ -533,7 +626,18 @@ def _coerce_initial_input(workflow: Workflow, raw_value: Any) -> Any: * Other executors get their primary declared input type reconstructed (``dict`` -> Pydantic/dataclass, ``str`` -> ``str``, ...) via :func:`reconstruct_to_type`; union/unannotated types pass through unchanged. + + A sub-orchestration payload (a ``WorkflowExecutor`` invoking this workflow as a + child) carries the node's message wrapped in :data:`SUBWORKFLOW_INPUT_KEY`. That + is trusted internal data the parent produced with :func:`serialize_value`, so it + is reconstructed directly to the original typed object -- mirroring the + in-process ``WorkflowExecutor`` which passes its input straight to the inner + workflow -- without the HTTP-boundary pickle-marker stripping. """ + unwrapped, inner_input = _try_unwrap_subworkflow_input(raw_value) + if unwrapped: + return inner_input + start_executor = workflow.executors.get(workflow.start_executor_id) if start_executor is None: return raw_value @@ -649,12 +753,28 @@ def _prepare_all_tasks( workflow: Workflow, pending_messages: dict[str, list[tuple[Any, str]]], shared_state: dict[str, Any] | None, + depth: int, + subworkflow_counter: list[int], ) -> tuple[list[Any], list[TaskMetadata], list[tuple[str, Any, str]]]: """Prepare all pending tasks for parallel execution. Groups agent messages by executor ID so that only the first message per agent runs in the parallel batch. 
Additional messages to the same agent are returned - for sequential processing. + for sequential processing. A :class:`~agent_framework.WorkflowExecutor` node is + dispatched as a durable child orchestration (one per message), with a + deterministic child instance id derived from the parent so replay is stable. + + Args: + ctx: The orchestration context used to schedule activities, entity calls, + and child orchestrations. + workflow: The workflow whose executors are being dispatched. + pending_messages: Messages to deliver this superstep, grouped by target + executor id, each paired with its source executor id. + shared_state: Optional dict for cross-executor state sharing. + depth: This orchestration's sub-workflow nesting depth, propagated to child + orchestrations so recursion can be bounded. + subworkflow_counter: A single-element mutable counter, persistent across + supersteps, used to derive unique deterministic child instance ids. """ all_tasks: list[Any] = [] task_metadata_list: list[TaskMetadata] = [] @@ -665,10 +785,29 @@ def _prepare_all_tasks( for executor_id, messages_with_sources in pending_messages.items(): executor = workflow.executors[executor_id] is_agent = isinstance(executor, AgentExecutor) + is_subworkflow = isinstance(executor, WorkflowExecutor) for message, source_executor_id in messages_with_sources: if is_agent: agent_messages_by_executor[executor_id].append((executor_id, message, source_executor_id)) + elif is_subworkflow: + # Derive a deterministic, globally-unique child instance id. The counter + # persists across supersteps, so two invocations of the same node (in the + # same or different supersteps, e.g. fan-out) never collide, and the ids + # are stable across orchestration replay. 
+ child_instance_id = f"{ctx.instance_id}::{executor_id}::{subworkflow_counter[0]}" + subworkflow_counter[0] += 1 + logger.debug("Preparing sub-workflow task: %s -> %s", executor_id, child_instance_id) + task = _prepare_subworkflow_task(ctx, executor, message, child_instance_id, depth) + all_tasks.append(task) + task_metadata_list.append( + TaskMetadata( + executor_id=executor_id, + message=message, + source_executor_id=source_executor_id, + task_type=TaskType.SUBWORKFLOW, + ) + ) else: logger.debug("Preparing activity task: %s", executor_id) task = _prepare_activity_task( @@ -732,18 +871,38 @@ def run_workflow_orchestrator( Args: ctx: Host-specific orchestration context adapter. workflow: The MAF Workflow instance to execute. - initial_message: Initial message to send to the start executor. + initial_message: Initial message to send to the start executor. When this + workflow runs as a sub-workflow, this is the parent-supplied marker + payload (see :data:`SUBWORKFLOW_INPUT_KEY`), which also carries the + nesting depth. shared_state: Optional dict for cross-executor state sharing. Returns: List of workflow outputs collected from executor activities. """ + # When invoked as a child orchestration, the initial payload carries the nesting + # depth; bound recursion so mutually-nested workflows cannot spawn unbounded + # child orchestrations. (Top-level runs start at depth 0.) + depth = 0 + if isinstance(initial_message, dict) and SUBWORKFLOW_INPUT_KEY in initial_message: + marker = cast("dict[str, Any]", initial_message) + depth = int(marker.get(SUBWORKFLOW_DEPTH_KEY, 0) or 0) + if depth > MAX_SUBWORKFLOW_DEPTH: + raise RuntimeError( + f"Sub-workflow nesting exceeded the maximum depth of {MAX_SUBWORKFLOW_DEPTH} " + f"(workflow '{workflow.name}'). Check for mutually-nested workflows." 
+ ) + pending_messages: dict[str, list[tuple[Any, str]]] = { workflow.start_executor_id: [(_coerce_initial_input(workflow, initial_message), SOURCE_WORKFLOW_START)] } workflow_outputs: list[Any] = [] iteration = 0 + # Monotonic, replay-stable counter for deriving child orchestration instance ids; + # persists across supersteps so repeated sub-workflow invocations never collide. + subworkflow_counter: list[int] = [0] + # Accumulate workflow events and publish them to the orchestration custom status # after each superstep so an external client can stream progress by polling. # Non-agent executors are run inside a durable activity that captures their events @@ -802,13 +961,14 @@ def publish_live_status(state: str, pending_requests: dict[str, Any] | None = No # Phase 1: Prepare all tasks all_tasks, task_metadata_list, remaining_agent_messages = _prepare_all_tasks( - ctx, workflow, pending_messages, shared_state + ctx, workflow, pending_messages, shared_state, depth, subworkflow_counter ) - # Agents bypass the activity, so synthesize their invoked event here; activity - # executors emit their own events from inside the activity. + # Agents and sub-workflows bypass the per-executor activity, so synthesize their + # invoked event here; activity executors emit their own events from inside the + # activity. 
for task_meta in task_metadata_list: - if task_meta.task_type == TaskType.AGENT: + if task_meta.task_type in (TaskType.AGENT, TaskType.SUBWORKFLOW): emit_event("executor_invoked", task_meta.executor_id) for invoked_executor_id, _invoked_message, _invoked_source in remaining_agent_messages: emit_event("executor_invoked", invoked_executor_id) @@ -825,6 +985,10 @@ def publish_live_status(state: str, pending_requests: dict[str, Any] | None = No if metadata.task_type == TaskType.AGENT: result = _process_agent_response(raw_result, metadata.executor_id, metadata.message) emit_event("executor_completed", metadata.executor_id) + elif metadata.task_type == TaskType.SUBWORKFLOW: + subworkflow_executor = cast(WorkflowExecutor, workflow.executors[metadata.executor_id]) + result = _process_subworkflow_result(raw_result, subworkflow_executor, workflow_outputs) + emit_event("executor_completed", metadata.executor_id) else: result = _process_activity_result(raw_result, metadata.executor_id, shared_state, workflow_outputs) append_activity_events(result.activity_result) diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py index 684f25323a7..9fd260a2265 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py @@ -5,7 +5,9 @@ A MAF :class:`Workflow` is hosted by turning each graph node into a durable primitive: -- each :class:`AgentExecutor` becomes a durable **entity**, and +- each :class:`AgentExecutor` becomes a durable **entity**, +- each :class:`WorkflowExecutor` (a nested sub-workflow) becomes a durable + **child orchestration**, and - each other :class:`Executor` becomes a durable **activity**, driven by a single workflow **orchestrator**. 
@@ -17,13 +19,20 @@ decision so each host applies one consistent plan with its own registration mechanism — analogous to .NET's shared ``DurableWorkflowOptions`` feeding host-specific trigger generation. + +Sub-workflows nest: a hosted workflow may contain :class:`WorkflowExecutor` +nodes whose inner workflows must themselves be registered (their orchestrator, +agents, and activities) so the parent can drive them via +``call_sub_orchestrator``. :func:`collect_hosted_workflows` walks that tree so a +host registers every reachable workflow exactly once. """ from __future__ import annotations +from collections.abc import Iterator from dataclasses import dataclass -from agent_framework import AgentExecutor, Executor, Workflow +from agent_framework import AgentExecutor, Executor, Workflow, WorkflowExecutor from .orchestrator import WORKFLOW_ORCHESTRATOR_NAME @@ -39,12 +48,21 @@ class WorkflowRegistrationPlan: identity the orchestrator dispatches to — which keeps ``AgentExecutor(agent, id=...)`` working when the id differs from ``agent.name``. - activity_executors: Non-agent executors to register as durable activities. - orchestrator_name: The orchestrator name to register and to start runs with. + activity_executors: Non-agent, non-subworkflow executors to register as + durable activities. + subworkflow_executors: :class:`WorkflowExecutor` nodes whose inner + workflows are driven as durable child orchestrations. The node itself + is *not* registered as an activity; its inner workflow is registered + separately (see :func:`collect_hosted_workflows`). + orchestrator_name: Deprecated fixed orchestrator name. Hosts derive the + actual per-workflow name via + :func:`~agent_framework_durabletask._workflows.naming.workflow_orchestrator_name`; + this field is retained for source compatibility only. 
""" agent_executors: list[AgentExecutor] activity_executors: list[Executor] + subworkflow_executors: list[WorkflowExecutor] orchestrator_name: str @@ -56,19 +74,53 @@ def plan_workflow_registration(workflow: Workflow) -> WorkflowRegistrationPlan: Returns: A :class:`WorkflowRegistrationPlan` describing the agent executors - (entities), non-agent executors (activities), and the orchestrator name. + (entities), sub-workflow executors (child orchestrations), the remaining + non-agent executors (activities), and the orchestrator name. """ agent_executors: list[AgentExecutor] = [] activity_executors: list[Executor] = [] + subworkflow_executors: list[WorkflowExecutor] = [] for executor in workflow.executors.values(): if isinstance(executor, AgentExecutor): agent_executors.append(executor) + elif isinstance(executor, WorkflowExecutor): + subworkflow_executors.append(executor) else: activity_executors.append(executor) return WorkflowRegistrationPlan( agent_executors=agent_executors, activity_executors=activity_executors, + subworkflow_executors=subworkflow_executors, orchestrator_name=WORKFLOW_ORCHESTRATOR_NAME, ) + + +def collect_hosted_workflows(workflow: Workflow) -> Iterator[Workflow]: + """Yield ``workflow`` and every nested sub-workflow, deduped by name. + + A host registers the orchestration primitives for each yielded workflow so a + parent orchestration can invoke its sub-workflows as child orchestrations. + Workflows are deduped by :attr:`Workflow.name`: a sub-workflow reused across + the tree (or shared by two top-level workflows) is yielded once. The top-level + ``workflow`` is yielded first. + + Args: + workflow: The top-level workflow to walk. + + Yields: + Each distinct workflow in the nesting tree, parent before child. 
+ """ + seen: set[str] = set() + + def _walk(current: Workflow) -> Iterator[Workflow]: + if current.name in seen: + return + seen.add(current.name) + yield current + plan = plan_workflow_registration(current) + for sub in plan.subworkflow_executors: + yield from _walk(sub.workflow) + + yield from _walk(workflow) diff --git a/python/packages/durabletask/tests/test_subworkflow_orchestration.py b/python/packages/durabletask/tests/test_subworkflow_orchestration.py new file mode 100644 index 00000000000..ec98d0d46f8 --- /dev/null +++ b/python/packages/durabletask/tests/test_subworkflow_orchestration.py @@ -0,0 +1,143 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Unit tests for sub-workflow (child-orchestration) dispatch and result handling. + +A ``WorkflowExecutor`` node runs its inner workflow as a durable child +orchestration. These tests cover the host-side glue: + +* :func:`_prepare_subworkflow_task` wraps the node's message in a trusted-input + marker (carrying nesting depth) and schedules ``dafx-{innerName}``. +* :func:`_process_subworkflow_result` turns the child's outputs into either + routed messages (default) or parent outputs (``allow_direct_output``). +* :func:`_try_unwrap_subworkflow_input` / :func:`_coerce_initial_input` reconstruct + the original typed object on the child side; the orchestrator bounds recursion via depth. 
+""" + +from unittest.mock import Mock + +from agent_framework import WorkflowExecutor + +from agent_framework_durabletask._workflows.orchestrator import ( + SUBWORKFLOW_DEPTH_KEY, + SUBWORKFLOW_INPUT_KEY, + TaskType, + _coerce_initial_input, + _prepare_subworkflow_task, + _process_subworkflow_result, + _try_unwrap_subworkflow_input, +) +from agent_framework_durabletask._workflows.serialization import deserialize_value + + +def _subworkflow_executor(executor_id: str, inner_name: str, *, allow_direct_output: bool = False) -> Mock: + inner = Mock() + inner.name = inner_name + executor = Mock(spec=WorkflowExecutor) + executor.id = executor_id + executor.workflow = inner + executor.allow_direct_output = allow_direct_output + return executor + + +class TestPrepareSubworkflowTask: + """Dispatch of a ``WorkflowExecutor`` node as a child orchestration.""" + + def test_schedules_inner_orchestration_by_scoped_name(self) -> None: + ctx = Mock() + ctx.call_sub_orchestrator.return_value = "task-sentinel" + executor = _subworkflow_executor("sub-node", "inner_wf") + + task = _prepare_subworkflow_task(ctx, executor, "hello", "parent::sub-node::0", depth=0) + + assert task == "task-sentinel" + ctx.call_sub_orchestrator.assert_called_once() + args, kwargs = ctx.call_sub_orchestrator.call_args + assert args[0] == "dafx-inner_wf" + assert kwargs["instance_id"] == "parent::sub-node::0" + + def test_wraps_message_in_marker_with_incremented_depth(self) -> None: + ctx = Mock() + executor = _subworkflow_executor("sub-node", "inner_wf") + + _prepare_subworkflow_task(ctx, executor, "payload", "child-id", depth=3) + + args, _ = ctx.call_sub_orchestrator.call_args + child_input = args[1] + assert child_input[SUBWORKFLOW_DEPTH_KEY] == 4 + # The wrapped payload round-trips back to the original message. 
+ assert deserialize_value(child_input[SUBWORKFLOW_INPUT_KEY]) == "payload" + + +class TestProcessSubworkflowResult: + """Conversion of a child orchestration's outputs into an ``ExecutorResult``.""" + + def test_default_routes_outputs_as_messages(self) -> None: + executor = _subworkflow_executor("sub-node", "inner_wf", allow_direct_output=False) + workflow_outputs: list[object] = [] + + result = _process_subworkflow_result(["a", "b"], executor, workflow_outputs) + + assert result.task_type == TaskType.SUBWORKFLOW + assert workflow_outputs == [] + assert result.activity_result is not None + sent = result.activity_result["sent_messages"] + assert [m["message"] for m in sent] == ["a", "b"] + assert all(m["source_id"] == "sub-node" and m["target_id"] is None for m in sent) + + def test_allow_direct_output_extends_parent_outputs(self) -> None: + executor = _subworkflow_executor("sub-node", "inner_wf", allow_direct_output=True) + workflow_outputs: list[object] = ["existing"] + + result = _process_subworkflow_result(["x", "y"], executor, workflow_outputs) + + assert workflow_outputs == ["existing", "x", "y"] + assert result.activity_result is not None + assert result.activity_result["sent_messages"] == [] + + def test_none_result_produces_no_outputs(self) -> None: + executor = _subworkflow_executor("sub-node", "inner_wf") + workflow_outputs: list[object] = [] + + result = _process_subworkflow_result(None, executor, workflow_outputs) + + assert result.activity_result is not None + assert result.activity_result["sent_messages"] == [] + assert workflow_outputs == [] + + def test_scalar_result_is_wrapped_as_single_output(self) -> None: + executor = _subworkflow_executor("sub-node", "inner_wf", allow_direct_output=True) + workflow_outputs: list[object] = [] + + _process_subworkflow_result("solo", executor, workflow_outputs) + + assert workflow_outputs == ["solo"] + + +class TestSubworkflowInputUnwrap: + """Child-side reconstruction of the parent-supplied marker payload.""" + + 
def test_unwrap_detects_and_reconstructs_marker(self) -> None: + marker = {SUBWORKFLOW_INPUT_KEY: "wrapped", SUBWORKFLOW_DEPTH_KEY: 2} + + unwrapped, inner = _try_unwrap_subworkflow_input(marker) + + assert unwrapped is True + assert inner == "wrapped" + + def test_unwrap_ignores_non_marker_dict(self) -> None: + unwrapped, inner = _try_unwrap_subworkflow_input({"some": "data"}) + + assert unwrapped is False + assert inner is None + + def test_unwrap_ignores_non_dict(self) -> None: + assert _try_unwrap_subworkflow_input("plain") == (False, None) + + def test_coerce_initial_input_returns_unwrapped_inner(self) -> None: + # When the workflow runs as a child, _coerce_initial_input returns the + # reconstructed inner object directly, bypassing start-executor coercion. + workflow = Mock() + workflow.executors = {} + marker = {SUBWORKFLOW_INPUT_KEY: "inner-message", SUBWORKFLOW_DEPTH_KEY: 1} + + assert _coerce_initial_input(workflow, marker) == "inner-message" diff --git a/python/packages/durabletask/tests/test_worker.py b/python/packages/durabletask/tests/test_worker.py index 91ba8e6105a..ec87694e6e2 100644 --- a/python/packages/durabletask/tests/test_worker.py +++ b/python/packages/durabletask/tests/test_worker.py @@ -288,5 +288,106 @@ def test_rejects_invalid_workflow_name(self, agent_worker: DurableAIAgentWorker) agent_worker.configure_workflow(workflow) +class TestSubworkflowRegistration: + """Test recursive registration of nested sub-workflows on one worker.""" + + def _inner_agent_workflow(self, name: str, executor_id: str) -> Mock: + from agent_framework import AgentExecutor + + agent = Mock() + agent.name = "InnerAssistant" + agent_executor = Mock(spec=AgentExecutor) + agent_executor.id = executor_id + agent_executor.agent = agent + + workflow = Mock() + workflow.name = name + workflow.executors = {executor_id: agent_executor} + return workflow + + def _outer_workflow(self, name: str, inner: Mock, *, sub_ids: tuple[str, ...] 
= ("sub",)) -> Mock: + from agent_framework import Executor, WorkflowExecutor + + executors: dict[str, Mock] = {} + for sub_id in sub_ids: + sub = Mock(spec=WorkflowExecutor) + sub.id = sub_id + sub.workflow = inner + sub.allow_direct_output = False + executors[sub_id] = sub + + router = Mock(spec=Executor) + router.id = "router" + executors["router"] = router + + workflow = Mock() + workflow.name = name + workflow.executors = executors + return workflow + + def test_nested_workflow_registers_both_orchestrations( + self, agent_worker: DurableAIAgentWorker, mock_grpc_worker: Mock + ) -> None: + """Configuring an outer workflow registers the inner workflow's orchestration too.""" + inner = self._inner_agent_workflow("inner", "agent_node") + outer = self._outer_workflow("outer", inner) + + agent_worker.configure_workflow(outer) + + registered = {call.args[0].__name__ for call in mock_grpc_worker.add_orchestrator.call_args_list} + assert registered == {"dafx-outer", "dafx-inner"} + + def test_nested_workflow_registers_inner_agent_scoped(self, agent_worker: DurableAIAgentWorker) -> None: + """The inner workflow's agent is registered under the inner-scoped id.""" + inner = self._inner_agent_workflow("inner", "agent_node") + outer = self._outer_workflow("outer", inner) + + agent_worker.configure_workflow(outer) + + assert "inner-agent_node" in agent_worker.registered_agent_names + + def test_subworkflow_node_not_registered_as_activity( + self, agent_worker: DurableAIAgentWorker, mock_grpc_worker: Mock + ) -> None: + """A WorkflowExecutor node is driven as a child orchestration, not an activity.""" + inner = self._inner_agent_workflow("inner", "agent_node") + outer = self._outer_workflow("outer", inner) + + agent_worker.configure_workflow(outer) + + # Only the outer 'router' non-agent executor becomes an activity. 
+ registered_activities = {call.args[0].__name__ for call in mock_grpc_worker.add_activity.call_args_list} + assert registered_activities == {"dafx-outer-router"} + + def test_top_level_names_exclude_nested_workflows(self, agent_worker: DurableAIAgentWorker) -> None: + """``registered_workflow_names`` reports only top-level workflows.""" + inner = self._inner_agent_workflow("inner", "agent_node") + outer = self._outer_workflow("outer", inner) + + agent_worker.configure_workflow(outer) + + assert agent_worker.registered_workflow_names == ["outer"] + + def test_shared_subworkflow_registered_once( + self, agent_worker: DurableAIAgentWorker, mock_grpc_worker: Mock + ) -> None: + """A sub-workflow reused by two nodes registers its orchestration only once.""" + inner = self._inner_agent_workflow("inner", "agent_node") + outer = self._outer_workflow("outer", inner, sub_ids=("sub_a", "sub_b")) + + agent_worker.configure_workflow(outer) + + registered = [call.args[0].__name__ for call in mock_grpc_worker.add_orchestrator.call_args_list] + assert sorted(registered) == ["dafx-inner", "dafx-outer"] + + def test_nested_workflow_with_invalid_name_is_rejected(self, agent_worker: DurableAIAgentWorker) -> None: + """A nested sub-workflow must also have a valid, stable name.""" + inner = self._inner_agent_workflow("has space", "agent_node") + outer = self._outer_workflow("outer", inner) + + with pytest.raises(ValueError, match="invalid"): + agent_worker.configure_workflow(outer) + + if __name__ == "__main__": pytest.main([__file__, "-v", "--tb=short"]) diff --git a/python/packages/durabletask/tests/test_workflow_registration.py b/python/packages/durabletask/tests/test_workflow_registration.py index f9cb9e190ee..390b3981ad3 100644 --- a/python/packages/durabletask/tests/test_workflow_registration.py +++ b/python/packages/durabletask/tests/test_workflow_registration.py @@ -10,9 +10,13 @@ from unittest.mock import Mock -from agent_framework import AgentExecutor, Executor +from 
agent_framework import AgentExecutor, Executor, WorkflowExecutor -from agent_framework_durabletask import WorkflowRegistrationPlan, plan_workflow_registration +from agent_framework_durabletask import ( + WorkflowRegistrationPlan, + collect_hosted_workflows, + plan_workflow_registration, +) from agent_framework_durabletask._workflows.orchestrator import WORKFLOW_ORCHESTRATOR_NAME @@ -31,6 +35,20 @@ def _activity_executor(executor_id: str) -> Mock: return executor +def _subworkflow_executor(executor_id: str, inner_workflow: Mock) -> Mock: + executor = Mock(spec=WorkflowExecutor) + executor.id = executor_id + executor.workflow = inner_workflow + return executor + + +def _workflow(name: str, executors: dict[str, Mock]) -> Mock: + workflow = Mock() + workflow.name = name + workflow.executors = executors + return workflow + + class TestPlanWorkflowRegistration: """Test classification of workflow executors into durable primitives.""" @@ -95,3 +113,48 @@ def test_returns_workflow_registration_plan(self) -> None: assert isinstance(plan, WorkflowRegistrationPlan) assert plan.agent_executors == [] assert plan.activity_executors == [] + + def test_subworkflow_executor_classified_separately(self) -> None: + """A WorkflowExecutor goes to subworkflow_executors, not activities.""" + inner = _workflow("inner", {}) + sub_exec = _subworkflow_executor("sub-node", inner) + activity_exec = _activity_executor("router-node") + workflow = _workflow("outer", {"sub-node": sub_exec, "router-node": activity_exec}) + + plan = plan_workflow_registration(workflow) + + assert plan.subworkflow_executors == [sub_exec] + assert plan.activity_executors == [activity_exec] + assert plan.agent_executors == [] + + +class TestCollectHostedWorkflows: + """Test the recursive walk over nested sub-workflows.""" + + def test_single_workflow_yields_itself(self) -> None: + workflow = _workflow("solo", {"node": _activity_executor("node")}) + + assert [w.name for w in collect_hosted_workflows(workflow)] == ["solo"] 
+ + def test_yields_nested_subworkflows_parent_first(self) -> None: + inner = _workflow("inner", {"leaf": _activity_executor("leaf")}) + sub_exec = _subworkflow_executor("sub", inner) + outer = _workflow("outer", {"sub": sub_exec}) + + assert [w.name for w in collect_hosted_workflows(outer)] == ["outer", "inner"] + + def test_dedupes_shared_subworkflow_by_name(self) -> None: + """A sub-workflow reused by two nodes is yielded once.""" + inner = _workflow("shared", {"leaf": _activity_executor("leaf")}) + sub_a = _subworkflow_executor("a", inner) + sub_b = _subworkflow_executor("b", inner) + outer = _workflow("outer", {"a": sub_a, "b": sub_b}) + + assert [w.name for w in collect_hosted_workflows(outer)] == ["outer", "shared"] + + def test_walks_multiple_levels(self) -> None: + leaf = _workflow("leaf_wf", {"x": _activity_executor("x")}) + mid = _workflow("mid_wf", {"l": _subworkflow_executor("l", leaf)}) + top = _workflow("top_wf", {"m": _subworkflow_executor("m", mid)}) + + assert [w.name for w in collect_hosted_workflows(top)] == ["top_wf", "mid_wf", "leaf_wf"] diff --git a/python/samples/04-hosting/durabletask/11_subworkflow/README.md b/python/samples/04-hosting/durabletask/11_subworkflow/README.md new file mode 100644 index 00000000000..6f6b9f14a3e --- /dev/null +++ b/python/samples/04-hosting/durabletask/11_subworkflow/README.md @@ -0,0 +1,69 @@ +# Composed Workflow (Sub-Workflow) on a Standalone Durable Task Worker + +This sample demonstrates **workflow composition** on a standalone Durable Task +worker: an inner agent-framework `Workflow` is embedded as a node inside an outer +`Workflow` using `WorkflowExecutor`. On the durable host, the inner workflow runs +as its own durable **child orchestration**. + +## Key Concepts Demonstrated + +- Embedding one `Workflow` inside another with + `WorkflowExecutor(inner_workflow, id=...)`. 
+- A single `DurableAIAgentWorker.configure_workflow(outer_workflow)` call walks the + composition and auto-registers a durable orchestration for **each** workflow: + - `dafx-review_pipeline` — the outer workflow. + - `dafx-sentiment_analysis` — the inner workflow, run as a durable **child + orchestration** when the outer workflow reaches the `WorkflowExecutor` node. +- Per-workflow scoping: each workflow's agent executors become durable entities and + its non-agent executors become durable activities, named per workflow so the same + executor id in two workflows never collides. +- Output forwarding: the inner workflow yields a string and, because + `allow_direct_output` is left at its default (`False`), that output is forwarded to + the outer workflow as a message delivered to the `reporter` executor. + +## Composition Layout + +```text +review_pipeline (outer) + intake (executor) + -> sentiment_sub = WorkflowExecutor(sentiment_analysis) + sentiment_agent (agent) -> sentiment_formatter (executor) + -> reporter (executor) +``` + +## Environment Setup + +See the [README.md](../README.md) in the parent directory for environment setup. + +This sample uses Azure AI Foundry credentials: + +- `FOUNDRY_PROJECT_ENDPOINT` +- `FOUNDRY_MODEL` + +It also needs a Durable Task Scheduler. For local development, start the +emulator (defaults to `http://localhost:8080`): + +```bash +docker run -d -p 8080:8080 -p 8082:8082 mcr.microsoft.com/dts/dts-emulator:latest +``` + +## Running the Sample + +Start the worker in one terminal: + +```bash +cd samples/04-hosting/durabletask/11_subworkflow +python worker.py +``` + +In a second terminal, run the client: + +```bash +python client.py +``` + +The client targets only the outer workflow (`review_pipeline`); the sub-workflow +runs automatically as a child orchestration. 
Each review flows: + +`intake` → `sentiment_sub` (child orchestration: `sentiment_agent` → +`sentiment_formatter`) → `reporter` → `"Review analysis complete -> sentiment: ..."`. diff --git a/python/samples/04-hosting/durabletask/11_subworkflow/client.py b/python/samples/04-hosting/durabletask/11_subworkflow/client.py new file mode 100644 index 00000000000..ced423cd3e6 --- /dev/null +++ b/python/samples/04-hosting/durabletask/11_subworkflow/client.py @@ -0,0 +1,78 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Client that starts the composed workflow orchestration and prints the result. + +The worker (``worker.py``) must be running first. Only the *outer* workflow is +started by the client; its embedded sub-workflow runs automatically as a durable +child orchestration when the outer workflow reaches the ``WorkflowExecutor`` node. + +The workflow is started via ``DurableWorkflowClient.start_workflow`` - which +schedules the ``dafx-review_pipeline`` orchestration that +``DurableAIAgentWorker.configure_workflow`` auto-registers for the outer workflow. + +Prerequisites: +- ``worker.py`` running and connected to the same Durable Task Scheduler. +- A Durable Task Scheduler reachable at ``ENDPOINT`` (default ``http://localhost:8080``). +""" + +import asyncio +import logging +import os + +from agent_framework.azure import DurableWorkflowClient +from azure.identity import AzureCliCredential +from dotenv import load_dotenv +from durabletask.azuremanaged.client import DurableTaskSchedulerClient + +load_dotenv() + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +# The client targets the outer workflow; the sub-workflow runs as a child orchestration. 
+WORKFLOW_NAME = "review_pipeline" + + +def get_client(taskhub: str | None = None, endpoint: str | None = None) -> DurableTaskSchedulerClient: + """Create a configured DurableTaskSchedulerClient.""" + taskhub_name = taskhub or os.getenv("TASKHUB", "default") + endpoint_url = endpoint or os.getenv("ENDPOINT", "http://localhost:8080") + + credential = None if endpoint_url == "http://localhost:8080" else AzureCliCredential() + + return DurableTaskSchedulerClient( + host_address=endpoint_url, + secure_channel=endpoint_url != "http://localhost:8080", + taskhub=taskhub_name, + token_credential=credential, + ) + + +def run_workflow(client: DurableWorkflowClient, review: str) -> None: + """Start the outer workflow with a review and wait for the result.""" + instance_id = client.start_workflow(input=review) + logger.info("Started workflow instance: %s", instance_id) + + output = client.await_workflow_output(instance_id) + logger.info("Workflow output: %s", output) + + +async def main() -> None: + """Run the composed workflow against a couple of product reviews.""" + client = DurableWorkflowClient(get_client(), workflow_name=WORKFLOW_NAME) + + logger.info("TEST 1: Positive review") + run_workflow( + client, + "Absolutely love this espresso machine - it heats up fast and the coffee is consistently great.", + ) + + logger.info("TEST 2: Negative review") + run_workflow( + client, + "Disappointed. The device stopped working after two weeks and support never replied.", + ) + + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/python/samples/04-hosting/durabletask/11_subworkflow/worker.py b/python/samples/04-hosting/durabletask/11_subworkflow/worker.py new file mode 100644 index 00000000000..3ba0b0bf5b1 --- /dev/null +++ b/python/samples/04-hosting/durabletask/11_subworkflow/worker.py @@ -0,0 +1,211 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Worker that hosts a MAF Workflow composed of a nested sub-workflow. 
+ +This sample shows workflow *composition* on the Durable Task host. A +``WorkflowExecutor`` embeds an inner workflow as a node inside an outer workflow. +``DurableAIAgentWorker.configure_workflow`` walks the composition and +auto-registers a durable orchestration for *each* workflow: + +- ``dafx-sentiment_analysis`` - the inner workflow, run as a durable **child + orchestration** whenever the outer workflow reaches the ``WorkflowExecutor`` node. +- ``dafx-review_pipeline`` - the outer workflow. + +Each workflow's agent executors become durable entities and its non-agent +executors become durable activities, scoped per workflow so the same executor id +in two workflows never collides. + +Composition layout:: + + review_pipeline (outer) + intake (executor) + -> sentiment_sub = WorkflowExecutor(sentiment_analysis) + sentiment_agent (agent) -> sentiment_formatter (executor) + -> reporter (executor) + +The inner workflow yields a string; because ``allow_direct_output`` is left at its +default (``False``), that output is forwarded to the outer workflow as a message +delivered to ``reporter``, which produces the final result. + +Prerequisites: +- Set ``FOUNDRY_PROJECT_ENDPOINT`` and ``FOUNDRY_MODEL``. +- Sign in with Azure CLI (``az login``) for ``AzureCliCredential``. +- Start a Durable Task Scheduler (e.g. the DTS emulator on ``localhost:8080``). + +Run the worker (this process), then run ``client.py`` in another process. 
+""" + +import asyncio +import logging +import os +from typing import Any + +from agent_framework import ( + Agent, + AgentExecutorResponse, + Executor, + Workflow, + WorkflowBuilder, + WorkflowContext, + WorkflowExecutor, + handler, +) +from agent_framework.azure import DurableAIAgentWorker +from agent_framework.foundry import FoundryChatClient, FoundryChatOptions +from azure.identity import AzureCliCredential +from azure.identity.aio import AzureCliCredential as AsyncAzureCliCredential +from dotenv import load_dotenv +from durabletask.azuremanaged.worker import DurableTaskSchedulerWorker +from pydantic import BaseModel, ValidationError +from typing_extensions import Never + +load_dotenv() + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +SENTIMENT_AGENT_NAME = "SentimentAgent" +INNER_WORKFLOW_NAME = "sentiment_analysis" +OUTER_WORKFLOW_NAME = "review_pipeline" + +SENTIMENT_INSTRUCTIONS = ( + "You classify the sentiment of a customer product review. " + "Return JSON with fields sentiment (one of 'positive', 'neutral', 'negative') " + "and confidence (a number between 0 and 1)." 
+) + + +class SentimentResult(BaseModel): + """Structured output from the sentiment agent.""" + + sentiment: str + confidence: float + + +class SentimentFormatterExecutor(Executor): + """Inner-workflow executor that turns the agent's JSON into a summary line.""" + + @handler + async def format_sentiment(self, agent_response: AgentExecutorResponse, ctx: WorkflowContext[Never, str]) -> None: + text = agent_response.agent_response.text + try: + result = SentimentResult.model_validate_json(text) + summary = f"{result.sentiment} (confidence {result.confidence:.0%})" + except ValidationError: + summary = "unknown (could not parse sentiment)" + await ctx.yield_output(summary) + + +class IntakeExecutor(Executor): + """Outer-workflow entry point that normalizes the review before analysis.""" + + @handler + async def intake(self, review: str, ctx: WorkflowContext[str]) -> None: + normalized = review.strip() + logger.info("Intake received review (%d chars)", len(normalized)) + await ctx.send_message(normalized) + + +class ReporterExecutor(Executor): + """Outer-workflow executor that consumes the sub-workflow's forwarded output.""" + + @handler + async def report(self, sentiment_summary: str, ctx: WorkflowContext[Never, str]) -> None: + await ctx.yield_output(f"Review analysis complete -> sentiment: {sentiment_summary}") + + +def _create_chat_client() -> FoundryChatClient: + """Create an Azure AI Foundry chat client using AzureCliCredential.""" + return FoundryChatClient( + project_endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"], + model=os.environ["FOUNDRY_MODEL"], + credential=AsyncAzureCliCredential(), + ) + + +def create_inner_workflow(chat_client: FoundryChatClient) -> Workflow: + """Build the inner ``sentiment_analysis`` workflow (agent -> formatter).""" + sentiment_agent = Agent( + client=chat_client, + name=SENTIMENT_AGENT_NAME, + instructions=SENTIMENT_INSTRUCTIONS, + default_options=FoundryChatOptions[Any](response_format=SentimentResult), + ) + sentiment_formatter = 
SentimentFormatterExecutor(id="sentiment_formatter") + + return ( + WorkflowBuilder(name=INNER_WORKFLOW_NAME, start_executor=sentiment_agent) + .add_edge(sentiment_agent, sentiment_formatter) + .build() + ) + + +def create_workflow() -> Workflow: + """Build the outer ``review_pipeline`` workflow that embeds the inner workflow.""" + chat_client = _create_chat_client() + inner_workflow = create_inner_workflow(chat_client) + + intake = IntakeExecutor(id="intake") + # WorkflowExecutor embeds the inner workflow as a single node in the outer + # workflow. On the durable host this node runs as a child orchestration. + sentiment_sub = WorkflowExecutor(inner_workflow, id="sentiment_sub") + reporter = ReporterExecutor(id="reporter") + + return ( + WorkflowBuilder(name=OUTER_WORKFLOW_NAME, start_executor=intake) + .add_edge(intake, sentiment_sub) + .add_edge(sentiment_sub, reporter) + .build() + ) + + +def get_worker( + taskhub: str | None = None, endpoint: str | None = None, log_handler: logging.Handler | None = None +) -> DurableTaskSchedulerWorker: + """Create a configured DurableTaskSchedulerWorker.""" + taskhub_name = taskhub or os.getenv("TASKHUB", "default") + endpoint_url = endpoint or os.getenv("ENDPOINT", "http://localhost:8080") + + credential = None if endpoint_url == "http://localhost:8080" else AzureCliCredential() + + return DurableTaskSchedulerWorker( + host_address=endpoint_url, + secure_channel=endpoint_url != "http://localhost:8080", + taskhub=taskhub_name, + token_credential=credential, + log_handler=log_handler, + ) + + +def setup_worker(worker: DurableTaskSchedulerWorker) -> DurableAIAgentWorker: + """Register the outer workflow and its nested sub-workflow on the worker.""" + agent_worker = DurableAIAgentWorker(worker) + + workflow = create_workflow() + # A single call walks the composition: it registers the outer workflow plus + # every nested sub-workflow (here, sentiment_analysis) as its own durable + # orchestration, deduped by workflow name. 
+ agent_worker.configure_workflow(workflow) + logger.info("✓ Configured workflow '%s' with embedded sub-workflow '%s'", OUTER_WORKFLOW_NAME, INNER_WORKFLOW_NAME) + + return agent_worker + + +async def main() -> None: + """Start the worker and block until interrupted.""" + worker = get_worker() + setup_worker(worker) + + logger.info("Worker is ready and listening for work items. Press Ctrl+C to stop.") + try: + worker.start() + while True: + await asyncio.sleep(1) + except KeyboardInterrupt: + logger.info("Worker shutdown initiated") + + logger.info("Worker stopped") + + +if __name__ == "__main__": + asyncio.run(main()) From 78e3a9c62dd34c41c676d71138a57d2d2b7bf660 Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Tue, 23 Jun 2026 18:02:40 -0400 Subject: [PATCH 05/12] feat(durabletask): sub-workflow HITL via qualified request ids (phase 4) Surface a nested sub-workflow's human-in-the-loop request behind the top-level instance (B2 single addressing surface). - Orchestrator records dispatched sub-workflow child instance ids in its custom status (subworkflows map) before suspending in task_all, so the read side can reach a child's pending request while the parent is paused. - Read side (durabletask client get_pending_hitl_requests; AF status route) recurses into nested child statuses, qualifying each nested request id as {executorId}::{requestId} (accumulated for deeper nesting). - Write side (durabletask client send_hitl_response; AF respond route) splits a qualified id on '::', resolves the owning child orchestration via the parent's subworkflows map, and raises the event on the leaf child with the bare request id. Unknown/inactive sub-workflow -> error/404. - Shared SUBWORKFLOW_REQUEST_SEPARATOR ('::') in naming so both hosts and the client agree. respondUrl/respond always targets the top-level instance. - Tests: TestSubworkflowHitl (durabletask client, 7), TestAgentFunctionAppSubworkflowHitl (AF, 7). 
Sample: 12_subworkflow_hitl (HITL pause inside an embedded sub-workflow). --- .../agent_framework_azurefunctions/_app.py | 133 +++++++-- .../packages/azurefunctions/tests/test_app.py | 104 +++++++ .../_workflows/client.py | 123 ++++++-- .../_workflows/naming.py | 10 + .../_workflows/orchestrator.py | 28 +- .../durabletask/tests/test_workflow_client.py | 145 +++++++++ .../durabletask/12_subworkflow_hitl/README.md | 63 ++++ .../durabletask/12_subworkflow_hitl/client.py | 134 +++++++++ .../durabletask/12_subworkflow_hitl/worker.py | 274 ++++++++++++++++++ 9 files changed, 967 insertions(+), 47 deletions(-) create mode 100644 python/samples/04-hosting/durabletask/12_subworkflow_hitl/README.md create mode 100644 python/samples/04-hosting/durabletask/12_subworkflow_hitl/client.py create mode 100644 python/samples/04-hosting/durabletask/12_subworkflow_hitl/worker.py diff --git a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py index 49efc6e43c8..cc5145f6cd9 100644 --- a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py +++ b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py @@ -43,6 +43,7 @@ plan_workflow_registration, ) from agent_framework_durabletask._workflows.naming import ( + SUBWORKFLOW_REQUEST_SEPARATOR, validate_workflow_name, workflow_executor_activity_name, workflow_orchestrator_name, @@ -525,27 +526,29 @@ async def get_workflow_status( "lastUpdatedTime": status.last_updated_time.isoformat() if status.last_updated_time else None, } - # Add pending HITL requests info if available - if ( - (custom_status := status.custom_status) - and isinstance(custom_status, dict) - and (pending_requests_dict := custom_status.get("pending_requests")) # type: ignore - and isinstance(pending_requests_dict, dict) - ): - base_url = self._build_base_url(req.url) - pending_requests: list[dict[str, Any]] = [] - for req_id, req_data in 
pending_requests_dict.items(): # type: ignore - if not isinstance(req_data, dict): - continue - pending_requests.append({ - "requestId": req_id, - "sourceExecutor": req_data.get("source_executor_id"), # type: ignore[reportUnknownMemberType] - "requestData": req_data.get("data"), # type: ignore[reportUnknownMemberType] - "requestType": req_data.get("request_type"), # type: ignore[reportUnknownMemberType] - "responseType": req_data.get("response_type"), # type: ignore[reportUnknownMemberType] - "respondUrl": f"{base_url}/api/workflow/{workflow_name}/respond/{instance_id}/{req_id}", - }) - response["pendingHumanInputRequests"] = pending_requests + # Add pending HITL requests info if available. Requests originating in a + # nested sub-workflow are bubbled up here with a qualified requestId + # ({executorId}::{requestId}); the respondUrl always targets this top-level + # instance, so the caller has a single addressing surface (B2). + custom_status = status.custom_status + if isinstance(custom_status, dict): + gathered = await self._gather_pending_hitl_requests(client, cast("dict[str, Any]", custom_status)) + if gathered: + base_url = self._build_base_url(req.url) + pending_requests: list[dict[str, Any]] = [ + { + "requestId": qualified_id, + "sourceExecutor": req_data.get("source_executor_id"), + "requestData": req_data.get("data"), + "requestType": req_data.get("request_type"), + "responseType": req_data.get("response_type"), + "respondUrl": ( + f"{base_url}/api/workflow/{workflow_name}/respond/{instance_id}/{qualified_id}" + ), + } + for qualified_id, req_data in gathered + ] + response["pendingHumanInputRequests"] = pending_requests return func.HttpResponse( json.dumps(response, default=_json_default), @@ -584,11 +587,19 @@ async def send_hitl_response(req: func.HttpRequest, client: df.DurableOrchestrat # See strip_pickle_markers() docstring for details on the attack vector. 
response_data = strip_pickle_markers(response_data) - # Send the response as an external event - # The request_id is used as the event name for correlation + # A qualified requestId ({executorId}::{requestId}) addresses a request that + # originated in a nested sub-workflow: resolve it to the owning child + # orchestration instance and the bare request id it is waiting on. + resolved = await self._resolve_hitl_target(client, instance_id, request_id) + if resolved is None: + return self._build_error_response("Pending request not found", status_code=404) + target_instance_id, bare_request_id = resolved + + # Send the response as an external event. The (bare) request_id is used as the + # event name for correlation on the owning orchestration instance. await client.raise_event( - instance_id=instance_id, - event_name=request_id, + instance_id=target_instance_id, + event_name=bare_request_id, event_data=response_data, ) @@ -607,6 +618,78 @@ async def send_hitl_response(req: func.HttpRequest, client: df.DurableOrchestrat _ = get_workflow_status _ = send_hitl_response + async def _gather_pending_hitl_requests( + self, + client: "df.DurableOrchestrationClient", + custom_status: dict[str, Any], + *, + prefix: str = "", + ) -> list[tuple[str, dict[str, Any]]]: + """Collect ``(qualifiedRequestId, requestData)`` pairs for an instance and its sub-workflows. + + ``custom_status`` is the already-fetched custom status of the instance at the + current level. Nested sub-workflows (listed in its ``subworkflows`` map) are + fetched by id and recursed into, accumulating an ``{executorId}::`` prefix so a + request deep in the tree carries its full path. Child instances come from the + trusted parent status, so no per-child ownership check is applied (the caller + validated the top-level instance). 
+ """ + gathered: list[tuple[str, dict[str, Any]]] = [] + + pending = custom_status.get("pending_requests") + if isinstance(pending, dict): + for req_id, req_data in cast("dict[str, Any]", pending).items(): + if isinstance(req_data, dict): + gathered.append((f"{prefix}{req_id}", cast("dict[str, Any]", req_data))) + + subworkflows = custom_status.get("subworkflows") + if isinstance(subworkflows, dict): + for executor_id, child_instance_id in cast("dict[str, Any]", subworkflows).items(): + if not isinstance(child_instance_id, str): + continue + child_status = await client.get_status(child_instance_id) + child_custom = child_status.custom_status if child_status else None + if isinstance(child_custom, dict): + gathered.extend( + await self._gather_pending_hitl_requests( + client, + cast("dict[str, Any]", child_custom), + prefix=f"{prefix}{executor_id}{SUBWORKFLOW_REQUEST_SEPARATOR}", + ) + ) + + return gathered + + async def _resolve_hitl_target( + self, + client: "df.DurableOrchestrationClient", + instance_id: str, + request_id: str, + ) -> tuple[str, str] | None: + """Resolve a possibly-qualified request id to ``(owningInstanceId, bareRequestId)``. + + An unqualified id targets ``instance_id`` directly. A qualified id + ``{executorId}::{rest}`` addresses a nested sub-workflow: the executor's child + instance id is read from this instance's ``subworkflows`` custom-status map and + the remainder resolved recursively. Returns ``None`` when a referenced + sub-workflow is not currently active (so the caller can return "not found"). 
+ """ + if SUBWORKFLOW_REQUEST_SEPARATOR not in request_id: + return instance_id, request_id + + executor_id, remainder = request_id.split(SUBWORKFLOW_REQUEST_SEPARATOR, 1) + status = await client.get_status(instance_id) + custom_status = status.custom_status if status else None + if not isinstance(custom_status, dict): + return None + subworkflows = cast("dict[str, Any]", custom_status).get("subworkflows") + if not isinstance(subworkflows, dict): + return None + child_instance_id = cast("dict[str, Any]", subworkflows).get(executor_id) + if not isinstance(child_instance_id, str): + return None + return await self._resolve_hitl_target(client, child_instance_id, remainder) + def _build_base_url(self, request_url: str) -> str: """Extract the base URL from a request URL.""" base_url, _, _ = request_url.partition("/api/") diff --git a/python/packages/azurefunctions/tests/test_app.py b/python/packages/azurefunctions/tests/test_app.py index 1420c8b3048..a2fe2c8d589 100644 --- a/python/packages/azurefunctions/tests/test_app.py +++ b/python/packages/azurefunctions/tests/test_app.py @@ -1732,5 +1732,109 @@ def test_rejects_other_orchestration_name(self, other_name: str) -> None: assert app._is_owned_orchestration(status, "orders") is False +class TestAgentFunctionAppSubworkflowHitl: + """Sub-workflow HITL plumbing: gather nested pending requests and route responses. + + These exercise the host-side helpers the ``workflow/{name}/status`` and + ``.../respond`` routes use to support nested sub-workflows behind a single + top-level addressing surface (B2 qualified request ids). 
+ """ + + @staticmethod + def _app() -> AgentFunctionApp: + mock_workflow = Mock() + mock_workflow.name = "orders" + mock_workflow.executors = {} + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + ): + return AgentFunctionApp(workflow=mock_workflow) + + @staticmethod + def _client(by_instance: dict[str, dict | None]) -> AsyncMock: + """An AsyncMock durable client whose get_status returns a per-instance custom status.""" + + async def _get_status(instance_id: str) -> Mock | None: + if instance_id not in by_instance: + return None + status = Mock() + status.custom_status = by_instance[instance_id] + return status + + client = AsyncMock() + client.get_status.side_effect = _get_status + return client + + async def test_gather_returns_top_level_requests_unqualified(self) -> None: + app = self._app() + client = self._client({}) + custom_status = {"pending_requests": {"top-1": {"source_executor_id": "outer"}}} + + gathered = await app._gather_pending_hitl_requests(client, custom_status) + + assert gathered == [("top-1", {"source_executor_id": "outer"})] + + async def test_gather_qualifies_nested_requests(self) -> None: + app = self._app() + client = self._client({"child-1": {"pending_requests": {"inner-1": {"source_executor_id": "inner"}}}}) + parent_status = { + "pending_requests": {"top-1": {"source_executor_id": "outer"}}, + "subworkflows": {"sub": "child-1"}, + } + + gathered = await app._gather_pending_hitl_requests(client, parent_status) + + ids = {qid for qid, _ in gathered} + assert ids == {"top-1", "sub::inner-1"} + + async def test_gather_accumulates_deep_path(self) -> None: + app = self._app() + client = self._client({ + "child-1": {"subworkflows": {"leaf": "child-2"}}, + "child-2": {"pending_requests": {"deep": {"source_executor_id": "leaf_node"}}}, + }) + parent_status = {"subworkflows": {"mid": "child-1"}} + + gathered = await app._gather_pending_hitl_requests(client, 
parent_status) + + assert [qid for qid, _ in gathered] == ["mid::leaf::deep"] + + async def test_resolve_unqualified_targets_same_instance(self) -> None: + app = self._app() + client = self._client({}) + + resolved = await app._resolve_hitl_target(client, "parent", "req-1") + + assert resolved == ("parent", "req-1") + + async def test_resolve_qualified_targets_child_instance(self) -> None: + app = self._app() + client = self._client({"parent": {"subworkflows": {"sub": "child-1"}}}) + + resolved = await app._resolve_hitl_target(client, "parent", "sub::req-9") + + assert resolved == ("child-1", "req-9") + + async def test_resolve_deeply_qualified_targets_leaf(self) -> None: + app = self._app() + client = self._client({ + "parent": {"subworkflows": {"mid": "child-1"}}, + "child-1": {"subworkflows": {"leaf": "child-2"}}, + }) + + resolved = await app._resolve_hitl_target(client, "parent", "mid::leaf::deep") + + assert resolved == ("child-2", "deep") + + async def test_resolve_unknown_subworkflow_returns_none(self) -> None: + app = self._app() + client = self._client({"parent": {"state": "running"}}) # no subworkflows map + + resolved = await app._resolve_hitl_target(client, "parent", "sub::req-9") + + assert resolved is None + + if __name__ == "__main__": pytest.main([__file__, "-v", "--tb=short"]) diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py index 6b1e833127f..4d142613392 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py @@ -19,7 +19,7 @@ from agent_framework import WorkflowEvent from durabletask.client import TaskHubGrpcClient -from .naming import workflow_orchestrator_name +from .naming import SUBWORKFLOW_REQUEST_SEPARATOR, workflow_orchestrator_name from .serialization import deserialize_workflow_event, 
deserialize_workflow_output, strip_pickle_markers logger = logging.getLogger("agent_framework.durabletask") @@ -341,6 +341,13 @@ def get_pending_hitl_requests(self, instance_id: str, *, workflow_name: str | No A list of pending requests. Each entry contains ``request_id``, ``source_executor_id``, ``data``, ``request_type``, and ``response_type``. Empty if the workflow is not currently waiting for human input. + + Note: + Requests originating in a nested sub-workflow are included with a + **qualified** ``request_id`` (``{executorId}::{requestId}``, nested for + deeper levels). Pass that qualified id straight back to + :meth:`send_hitl_response`; it is routed to the owning child orchestration + automatically, so the caller only ever addresses the top-level instance. """ state = self._client.get_orchestration_state(instance_id) if state is None or not state.serialized_custom_status: @@ -348,32 +355,54 @@ def get_pending_hitl_requests(self, instance_id: str, *, workflow_name: str | No if not self._is_owned_orchestration(state, workflow_name): return [] + return self._collect_pending_hitl_requests(state.serialized_custom_status) + + def _collect_pending_hitl_requests(self, serialized_custom_status: str) -> list[dict[str, Any]]: + """Collect an orchestration's pending requests plus any nested sub-workflow ones. + + Nested requests (discovered via the ``subworkflows`` map the parent records in + its custom status) are qualified with the owning executor id so deeper requests + accumulate a full ``{executorId}::{...}::{requestId}`` path. Child instances are + reached directly by id (already trusted, having come from the parent's status), + so no per-child ownership check is applied. 
+ """ try: - custom_status = json.loads(state.serialized_custom_status) + custom_status = json.loads(serialized_custom_status) except (json.JSONDecodeError, TypeError): return [] - if not isinstance(custom_status, dict): return [] status_dict = cast(dict[str, Any], custom_status) + requests: list[dict[str, Any]] = [] + pending = status_dict.get("pending_requests") - if not isinstance(pending, dict): - return [] - pending_dict = cast(dict[str, Any], pending) + if isinstance(pending, dict): + for request_id, req_data in cast(dict[str, Any], pending).items(): + if not isinstance(req_data, dict): + continue + req = cast(dict[str, Any], req_data) + requests.append({ + "request_id": req.get("request_id", request_id), + "source_executor_id": req.get("source_executor_id"), + "data": req.get("data"), + "request_type": req.get("request_type"), + "response_type": req.get("response_type"), + }) + + subworkflows = status_dict.get("subworkflows") + if isinstance(subworkflows, dict): + for executor_id, child_instance_id in cast(dict[str, str], subworkflows).items(): + if not isinstance(child_instance_id, str): + continue + child_state = self._client.get_orchestration_state(child_instance_id) + if child_state is None or not child_state.serialized_custom_status: + continue + for child_req in self._collect_pending_hitl_requests(child_state.serialized_custom_status): + qualified = dict(child_req) + qualified["request_id"] = f"{executor_id}{SUBWORKFLOW_REQUEST_SEPARATOR}{child_req['request_id']}" + requests.append(qualified) - requests: list[dict[str, Any]] = [] - for request_id, req_data in pending_dict.items(): - if not isinstance(req_data, dict): - continue - req = cast(dict[str, Any], req_data) - requests.append({ - "request_id": req.get("request_id", request_id), - "source_executor_id": req.get("source_executor_id"), - "data": req.get("data"), - "request_type": req.get("request_type"), - "response_type": req.get("response_type"), - }) return requests def send_hitl_response( @@ 
-387,6 +416,9 @@ def send_hitl_response( Args: instance_id: The workflow instance ID. request_id: The pending request's ID (from ``get_pending_hitl_requests``). + May be a **qualified** id (``{executorId}::{requestId}``) for a request + that originated in a nested sub-workflow; it is routed to the owning + child orchestration automatically. response: The response payload (e.g. a dict matching the expected response type the executor's ``@response_handler`` expects). workflow_name: Optional workflow name; when set (or a client default is @@ -395,7 +427,8 @@ def send_hitl_response( workflow's orchestration. Raises: - ValueError: If the instance does not belong to the targeted workflow. + ValueError: If the instance does not belong to the targeted workflow, or a + qualified id references a sub-workflow that is not currently active. Note: The payload is sanitized with ``strip_pickle_markers`` before delivery to @@ -407,8 +440,56 @@ def send_hitl_response( if state is None or not self._is_owned_orchestration(state, workflow_name): raise ValueError(f"Instance '{instance_id}' does not belong to the targeted workflow.") + # A qualified id addresses a nested sub-workflow: resolve it to the owning child + # orchestration instance and the bare request id the child is actually waiting on. 
+ target_instance_id, bare_request_id = self._resolve_hitl_target(instance_id, request_id) + safe_response = strip_pickle_markers(response) - self._client.raise_orchestration_event(instance_id, event_name=request_id, data=safe_response) + self._client.raise_orchestration_event(target_instance_id, event_name=bare_request_id, data=safe_response) logger.debug( - "[DurableWorkflowClient] Sent HITL response for request %s on instance %s", request_id, instance_id + "[DurableWorkflowClient] Sent HITL response for request %s on instance %s", + bare_request_id, + target_instance_id, ) + + def _resolve_hitl_target(self, instance_id: str, request_id: str) -> tuple[str, str]: + """Resolve a possibly-qualified request id to ``(owning_instance_id, bare_request_id)``. + + An unqualified id targets ``instance_id`` directly. A qualified id + ``{executorId}::{rest}`` addresses a nested sub-workflow: the executor's child + instance id is read from this instance's ``subworkflows`` custom-status map and + the remainder is resolved recursively, so arbitrarily deep nesting lands on the + leaf child orchestration and its bare request id. + """ + if SUBWORKFLOW_REQUEST_SEPARATOR not in request_id: + return instance_id, request_id + + executor_id, remainder = request_id.split(SUBWORKFLOW_REQUEST_SEPARATOR, 1) + child_instance_id = self._lookup_subworkflow_instance(instance_id, executor_id) + if child_instance_id is None: + raise ValueError( + f"No active sub-workflow '{executor_id}' found for instance '{instance_id}' " + f"while routing HITL response for request '{request_id}'." + ) + return self._resolve_hitl_target(child_instance_id, remainder) + + def _lookup_subworkflow_instance(self, instance_id: str, executor_id: str) -> str | None: + """Return the child orchestration instance id for ``executor_id``, if active. + + Reads the ``subworkflows`` map (``{executorId: childInstanceId}``) the parent + records in its custom status while dispatching sub-workflow nodes. 
+ """ + state = self._client.get_orchestration_state(instance_id) + if state is None or not state.serialized_custom_status: + return None + try: + custom_status = json.loads(state.serialized_custom_status) + except (json.JSONDecodeError, TypeError): + return None + if not isinstance(custom_status, dict): + return None + subworkflows = cast(dict[str, Any], custom_status).get("subworkflows") + if not isinstance(subworkflows, dict): + return None + child = cast(dict[str, Any], subworkflows).get(executor_id) + return child if isinstance(child, str) else None diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py index a39fb9cd604..03cbcb20c2c 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py @@ -30,6 +30,7 @@ __all__ = [ "DURABLE_NAME_PREFIX", + "SUBWORKFLOW_REQUEST_SEPARATOR", "is_auto_generated_workflow_name", "validate_workflow_name", "workflow_executor_activity_name", @@ -43,6 +44,15 @@ # ``AgentSessionId.ENTITY_NAME_PREFIX``. DURABLE_NAME_PREFIX = "dafx-" +# Separator joining an executor id to a (possibly already-qualified) request id when +# a nested sub-workflow's pending HITL request is bubbled up to the top-level instance +# (B2 single-surface addressing: ``{executorId}::{requestId}``). Both hosts and the +# client must agree on this so a qualified id round-trips: the read side qualifies an +# inner request with it; the respond side splits on it to route the response to the +# owning child orchestration. Executor ids and framework request ids do not contain +# this sequence, so the split is unambiguous. 
+SUBWORKFLOW_REQUEST_SEPARATOR = "::" + # A workflow name is interpolated into durable orchestration/activity/entity names # *and* into HTTP route segments (``workflow/{workflowName}/run``), so it must be # conservative enough to be safe in every position: ASCII letters, digits, '_' or diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py index c666553070a..864a190bcf3 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py @@ -117,6 +117,10 @@ class TaskMetadata: source_executor_id: str task_type: TaskType remaining_messages: list[tuple[str, Any, str]] | None = None + # For SUBWORKFLOW tasks: the deterministic child orchestration instance id. The + # parent records these in its custom status before awaiting the child so the read + # side can reach nested pending HITL requests while the parent is suspended. + child_instance_id: str | None = None @dataclass @@ -806,6 +810,7 @@ def _prepare_all_tasks( message=message, source_executor_id=source_executor_id, task_type=TaskType.SUBWORKFLOW, + child_instance_id=child_instance_id, ) ) else: @@ -934,7 +939,11 @@ def append_activity_events(activity_result: dict[str, Any] | None) -> None: enriched["iteration"] = iteration live_events.append(enriched) - def publish_live_status(state: str, pending_requests: dict[str, Any] | None = None) -> None: + def publish_live_status( + state: str, + pending_requests: dict[str, Any] | None = None, + subworkflows: dict[str, str] | None = None, + ) -> None: # Publish only on live execution so events are not re-emitted on replay # (the custom status set during the first execution already persisted). 
if ctx.is_replaying: @@ -947,6 +956,12 @@ def publish_live_status(state: str, pending_requests: dict[str, Any] | None = No status["events"] = live_events if pending_requests is not None: status["pending_requests"] = pending_requests + # Map of {executorId: childInstanceId} for sub-workflows dispatched this + # superstep. The parent is suspended in task_all while a child waits for human + # input, so recording the child ids here lets the read side discover and + # qualify nested pending requests (B2 single-surface HITL). + if subworkflows: + status["subworkflows"] = subworkflows ctx.set_custom_status(status) fan_in_pending: dict[str, dict[str, list[tuple[Any, str]]]] = { @@ -977,6 +992,17 @@ def publish_live_status(state: str, pending_requests: dict[str, Any] | None = No all_results: list[ExecutorResult] = [] if all_tasks: logger.debug("Executing %d tasks in parallel (agents + activities)", len(all_tasks)) + # Record dispatched sub-workflow child instance ids before suspending in + # task_all. While a nested sub-workflow waits for human input, this parent + # stays suspended here, so its custom status must already carry the child + # ids for the read side to discover and qualify nested pending requests. 
+ active_subworkflows = { + meta.executor_id: meta.child_instance_id + for meta in task_metadata_list + if meta.task_type == TaskType.SUBWORKFLOW and meta.child_instance_id is not None + } + if active_subworkflows: + publish_live_status("running", subworkflows=active_subworkflows) raw_results = yield ctx.task_all(all_tasks) logger.debug("All %d tasks completed", len(all_tasks)) diff --git a/python/packages/durabletask/tests/test_workflow_client.py b/python/packages/durabletask/tests/test_workflow_client.py index a80a012bfe2..3f07702211b 100644 --- a/python/packages/durabletask/tests/test_workflow_client.py +++ b/python/packages/durabletask/tests/test_workflow_client.py @@ -465,3 +465,148 @@ async def test_no_wait_returns_instance_id_without_awaiting( assert result == "instance-2" mock_client.wait_for_orchestration_completion.assert_not_called() + + +class TestSubworkflowHitl: + """Sub-workflow HITL: qualified request ids in/out (B2 single-surface addressing).""" + + @staticmethod + def _states(mock_client: Mock, by_instance: dict[str, dict | None]) -> None: + """Wire get_orchestration_state to return a state per instance id. + + Each value is the custom-status dict for that instance (or None for no + status). ``name`` is unset so ownership validation is skipped (these tests + construct the client without a workflow_name default). 
+ """ + + def _get_state(instance_id: str) -> Mock | None: + if instance_id not in by_instance: + return None + status = by_instance[instance_id] + state = Mock() + state.serialized_custom_status = json.dumps(status) if status is not None else None + return state + + mock_client.get_orchestration_state.side_effect = _get_state + + def test_collects_nested_request_with_qualified_id( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """A request pending in a child sub-workflow surfaces with an {executor}::{id} id.""" + self._states( + mock_client, + { + "parent": {"state": "running", "subworkflows": {"sub": "child-1"}}, + "child-1": { + "state": "waiting_for_human_input", + "pending_requests": {"req-9": {"request_id": "req-9", "source_executor_id": "inner_node"}}, + }, + }, + ) + + requests = workflow_client.get_pending_hitl_requests("parent") + + assert len(requests) == 1 + assert requests[0]["request_id"] == "sub::req-9" + assert requests[0]["source_executor_id"] == "inner_node" + + def test_collects_parent_and_nested_requests_together( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """Top-level and nested pending requests are both returned (nested qualified).""" + self._states( + mock_client, + { + "parent": { + "state": "waiting_for_human_input", + "pending_requests": {"top-1": {"request_id": "top-1", "source_executor_id": "outer_node"}}, + "subworkflows": {"sub": "child-1"}, + }, + "child-1": { + "state": "waiting_for_human_input", + "pending_requests": {"inner-1": {"request_id": "inner-1", "source_executor_id": "inner_node"}}, + }, + }, + ) + + ids = {r["request_id"] for r in workflow_client.get_pending_hitl_requests("parent")} + + assert ids == {"top-1", "sub::inner-1"} + + def test_collects_deeply_nested_request_with_full_path( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """Two levels of nesting accumulate a full {a}::{b}::{id} path.""" + self._states( + 
mock_client, + { + "parent": {"state": "running", "subworkflows": {"mid": "child-1"}}, + "child-1": {"state": "running", "subworkflows": {"leaf": "child-2"}}, + "child-2": { + "state": "waiting_for_human_input", + "pending_requests": {"deep": {"request_id": "deep", "source_executor_id": "leaf_node"}}, + }, + }, + ) + + requests = workflow_client.get_pending_hitl_requests("parent") + + assert [r["request_id"] for r in requests] == ["mid::leaf::deep"] + + def test_send_qualified_response_routes_to_child_instance( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """A qualified id resolves to the owning child instance and bare request id.""" + self._states( + mock_client, + {"parent": {"state": "running", "subworkflows": {"sub": "child-1"}}}, + ) + + workflow_client.send_hitl_response("parent", "sub::req-9", {"approved": True}) + + mock_client.raise_orchestration_event.assert_called_once() + args, kwargs = mock_client.raise_orchestration_event.call_args + assert args[0] == "child-1" + assert kwargs["event_name"] == "req-9" + assert kwargs["data"] == {"approved": True} + + def test_send_deeply_qualified_response_routes_to_leaf( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """A two-level qualified id lands on the leaf child with the bare id.""" + self._states( + mock_client, + { + "parent": {"state": "running", "subworkflows": {"mid": "child-1"}}, + "child-1": {"state": "running", "subworkflows": {"leaf": "child-2"}}, + }, + ) + + workflow_client.send_hitl_response("parent", "mid::leaf::deep", {"ok": 1}) + + args, kwargs = mock_client.raise_orchestration_event.call_args + assert args[0] == "child-2" + assert kwargs["event_name"] == "deep" + + def test_send_qualified_response_unknown_subworkflow_raises( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """A qualified id for an inactive sub-workflow raises and delivers nothing.""" + self._states(mock_client, {"parent": 
{"state": "running"}}) # no subworkflows map + + with pytest.raises(ValueError, match="No active sub-workflow"): + workflow_client.send_hitl_response("parent", "sub::req-9", {"approved": True}) + + mock_client.raise_orchestration_event.assert_not_called() + + def test_unqualified_response_still_targets_named_instance( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """A plain (unqualified) request id targets the given instance directly.""" + self._states(mock_client, {"parent": {"state": "waiting_for_human_input"}}) + + workflow_client.send_hitl_response("parent", "req-1", {"approved": True}) + + args, kwargs = mock_client.raise_orchestration_event.call_args + assert args[0] == "parent" + assert kwargs["event_name"] == "req-1" diff --git a/python/samples/04-hosting/durabletask/12_subworkflow_hitl/README.md b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/README.md new file mode 100644 index 00000000000..c6d1c28c37f --- /dev/null +++ b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/README.md @@ -0,0 +1,63 @@ +# Human-in-the-Loop in a Sub-Workflow (Durable Task Worker) + +This sample combines **workflow composition** (`11_subworkflow`) with +**human-in-the-loop** (`09_workflow_hitl`): the HITL `request_info` pause lives +**inside an inner workflow** that an outer workflow embeds via `WorkflowExecutor`. + +On the durable host the inner workflow runs as its own **child orchestration**, so +its pending request is recorded on the *child* instance. The parent records the +child instance id in its custom status, which lets the client discover the nested +request behind a **single top-level addressing surface**. + +## Key Concepts Demonstrated + +- A HITL pause (`ctx.request_info` / `@response_handler`) inside a sub-workflow. +- `DurableAIAgentWorker.configure_workflow(outer_workflow)` registers a durable + orchestration for each workflow: + - `dafx-moderation_pipeline` — the outer workflow. 
+ - `dafx-human_review` — the inner (HITL) workflow, run as a child orchestration. +- **Qualified request ids:** the nested request surfaces to the client with a + qualified id (`review_sub::{requestId}`). The client posts the response against the + *top-level* instance id, and the host routes it to the owning child orchestration — + so the caller never has to discover child instance ids. + +## Composition Layout + +```text +moderation_pipeline (outer) + intake (executor) + -> review_sub = WorkflowExecutor(human_review) + review_gate (executor: request_info -> response_handler) + -> publish (executor) +``` + +## Environment Setup + +See the [README.md](../README.md) in the parent directory for environment setup. + +This sample uses **no AI agents**, so no model credentials are required. It only +needs a Durable Task Scheduler. For local development, start the emulator (defaults +to `http://localhost:8080`): + +```bash +docker run -d -p 8080:8080 -p 8082:8082 mcr.microsoft.com/dts/dts-emulator:latest +``` + +## Running the Sample + +Start the worker in one terminal: + +```bash +cd samples/04-hosting/durabletask/12_subworkflow_hitl +python worker.py +``` + +In a second terminal, run the client: + +```bash +python client.py +``` + +Each case flows: `intake` → `review_sub` (child orchestration pauses at +`review_gate`) → client responds to the qualified request → `review_gate` resumes → +inner decision forwarded to `publish` → final output. diff --git a/python/samples/04-hosting/durabletask/12_subworkflow_hitl/client.py b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/client.py new file mode 100644 index 00000000000..da2405fd36e --- /dev/null +++ b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/client.py @@ -0,0 +1,134 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Client that drives the composed HITL workflow, responding to a nested sub-workflow request. + +The worker (``worker.py``) must be running first. This client: + +1. 
Starts the *outer* workflow with ``DurableWorkflowClient.start_workflow``. +2. Polls ``get_pending_hitl_requests`` until a request appears. Because the HITL pause + happens inside a sub-workflow, the request surfaces with a **qualified** request id + (``review_sub::{requestId}``). +3. Sends the decision with ``send_hitl_response`` against the *top-level* instance id and + the qualified request id; the host routes it to the owning child orchestration. +4. Reads the final output with ``await_workflow_output``. + +It runs two cases: approved content and rejected content. + +Prerequisites: +- ``worker.py`` running and connected to the same Durable Task Scheduler. +- A Durable Task Scheduler reachable at ``ENDPOINT`` (default ``http://localhost:8080``). +""" + +import asyncio +import logging +import os +import time +from typing import Any + +from agent_framework.azure import DurableWorkflowClient +from azure.identity import AzureCliCredential +from dotenv import load_dotenv +from durabletask.azuremanaged.client import DurableTaskSchedulerClient + +load_dotenv() + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +# The client targets the outer workflow; the HITL pause lives in the sub-workflow. 
+WORKFLOW_NAME = "moderation_pipeline" + + +def get_client(taskhub: str | None = None, endpoint: str | None = None) -> DurableTaskSchedulerClient: + """Create a configured DurableTaskSchedulerClient.""" + taskhub_name = taskhub or os.getenv("TASKHUB", "default") + endpoint_url = endpoint or os.getenv("ENDPOINT", "http://localhost:8080") + + credential = None if endpoint_url == "http://localhost:8080" else AzureCliCredential() + + return DurableTaskSchedulerClient( + host_address=endpoint_url, + secure_channel=endpoint_url != "http://localhost:8080", + taskhub=taskhub_name, + token_credential=credential, + ) + + +def _wait_for_hitl_request( + client: DurableWorkflowClient, instance_id: str, timeout_seconds: int = 60 +) -> list[dict[str, Any]]: + """Poll until the workflow (or one of its sub-workflows) has a pending HITL request. + + Stops early if the workflow reaches a terminal state without pausing, so a + misconfiguration surfaces the real status instead of a misleading timeout. + """ + terminal_statuses = {"COMPLETED", "FAILED", "TERMINATED"} + deadline = time.time() + timeout_seconds + while time.time() < deadline: + pending = client.get_pending_hitl_requests(instance_id) + if pending: + return pending + status = client.get_runtime_status(instance_id) + if status in terminal_statuses: + raise RuntimeError( + f"Workflow instance {instance_id} reached terminal state '{status}' before pausing for human input." + ) + time.sleep(2) + raise TimeoutError(f"Timed out waiting for a HITL request on instance {instance_id}") + + +def run_case(client: DurableWorkflowClient, submission: dict[str, Any], *, approve: bool) -> None: + """Run one moderation case: start, respond to the nested HITL pause, print the result.""" + instance_id = client.start_workflow(input=submission) + logger.info("Started workflow instance: %s", instance_id) + + pending = _wait_for_hitl_request(client, instance_id) + request = pending[0] + # The request id is qualified (e.g. 
"review_sub::") because the pause lives + # in a sub-workflow. We pass it back verbatim against the top-level instance id; + # the host resolves it to the owning child orchestration. + logger.info("Pending HITL request %s from %s", request["request_id"], request["source_executor_id"]) + + decision = { + "approved": approve, + "reviewer_notes": "Looks good." if approve else "Violates content policy.", + } + client.send_hitl_response(instance_id, request["request_id"], decision) + logger.info("Sent decision: approved=%s", approve) + + output = client.await_workflow_output(instance_id) + logger.info("Workflow output: %s", output) + + +async def main() -> None: + """Run an approved case and a rejected case.""" + client = DurableWorkflowClient(get_client(), workflow_name=WORKFLOW_NAME) + + logger.info("CASE 1: Appropriate content (will approve)") + run_case( + client, + { + "content_id": "article-001", + "title": "Introduction to AI in Healthcare", + "body": ( + "Artificial intelligence is improving healthcare by enabling faster diagnosis, " + "personalized treatment plans, and better patient outcomes." + ), + }, + approve=True, + ) + + logger.info("CASE 2: Spammy content (will reject)") + run_case( + client, + { + "content_id": "article-002", + "title": "Get Rich Quick", + "body": "Click here NOW to make $10,000 overnight! GUARANTEED! Limited time offer!", + }, + approve=False, + ) + + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/python/samples/04-hosting/durabletask/12_subworkflow_hitl/worker.py b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/worker.py new file mode 100644 index 00000000000..ecddf6c97c7 --- /dev/null +++ b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/worker.py @@ -0,0 +1,274 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Worker hosting a composed workflow whose Human-in-the-Loop pause lives in a sub-workflow. 
+ +This sample combines composition (``11_subworkflow``) with human-in-the-loop +(``09_workflow_hitl``): the HITL ``request_info`` happens **inside an inner +workflow** that an outer workflow embeds via ``WorkflowExecutor``. On the durable +host the inner workflow runs as its own child orchestration, so its pending request +is recorded on the *child* instance. The parent records the child instance id in its +custom status, which lets the client discover the nested request behind a single +top-level addressing surface. + +``DurableAIAgentWorker.configure_workflow`` walks the composition and registers a +durable orchestration for each workflow: + +- ``dafx-moderation_pipeline`` - the outer workflow. +- ``dafx-human_review`` - the inner workflow (run as a child orchestration), which + contains the HITL pause. + +Composition layout:: + + moderation_pipeline (outer) + intake (executor) + -> review_sub = WorkflowExecutor(human_review) + review_gate (executor: request_info -> response_handler) + -> publish (executor) + +The client sees the inner pending request with a **qualified** request id +(``review_sub::{requestId}``) and posts the response back to the *top-level* +instance; the host routes it to the owning child orchestration automatically. + +Prerequisites: +- Start a Durable Task Scheduler (e.g. the DTS emulator on ``localhost:8080``). + (This sample uses no AI agents, so no model credentials are required.) + +Run the worker (this process), then run ``client.py`` in another process. 
+""" + +import asyncio +import logging +import os +from dataclasses import dataclass + +from agent_framework import ( + Executor, + Workflow, + WorkflowBuilder, + WorkflowContext, + WorkflowExecutor, + handler, + response_handler, +) +from agent_framework.azure import DurableAIAgentWorker +from azure.identity import AzureCliCredential +from dotenv import load_dotenv +from durabletask.azuremanaged.worker import DurableTaskSchedulerWorker +from pydantic import BaseModel +from typing_extensions import Never + +load_dotenv() + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +INNER_WORKFLOW_NAME = "human_review" +OUTER_WORKFLOW_NAME = "moderation_pipeline" + + +# ============================================================================ +# Data Models +# ============================================================================ + + +@dataclass +class ContentSubmission: + """Content submitted for moderation (outer workflow input).""" + + content_id: str + title: str + body: str + + +@dataclass +class HumanApprovalRequest: + """Request surfaced to the human reviewer (carried in the orchestration status).""" + + content_id: str + title: str + body: str + prompt: str + + +class HumanApprovalResponse(BaseModel): + """Response the external client sends back via the HITL response endpoint/method.""" + + approved: bool + reviewer_notes: str = "" + + +@dataclass +class ModerationDecision: + """The inner workflow's output: the human's decision for a submission.""" + + content_id: str + approved: bool + reviewer_notes: str + + +# ============================================================================ +# Inner workflow (contains the HITL pause) +# ============================================================================ + + +class ReviewGateExecutor(Executor): + """Inner-workflow executor that pauses for human approval via request_info.""" + + def __init__(self) -> None: + super().__init__(id="review_gate") + + @handler + async def 
request_review(self, submission: ContentSubmission, ctx: WorkflowContext) -> None: + prompt = ( + f"Please review the following content for publication:\n\n" + f"Title: {submission.title}\n" + f"Content: {submission.body}\n\n" + f"Approve or reject this content." + ) + approval_request = HumanApprovalRequest( + content_id=submission.content_id, + title=submission.title, + body=submission.body, + prompt=prompt, + ) + # Pause the (inner) workflow and wait for a human response. On the durable + # host this pauses the child orchestration running this inner workflow. + await ctx.request_info(request_data=approval_request, response_type=HumanApprovalResponse) + + @response_handler + async def handle_approval_response( + self, + original_request: HumanApprovalRequest, + response: HumanApprovalResponse, + ctx: WorkflowContext[Never, ModerationDecision], + ) -> None: + logger.info( + "Human review received for content %s: approved=%s", + original_request.content_id, + response.approved, + ) + # Yield the decision as the inner workflow's output; the WorkflowExecutor + # forwards it to the outer workflow as a message to the next node. 
+ await ctx.yield_output( + ModerationDecision( + content_id=original_request.content_id, + approved=response.approved, + reviewer_notes=response.reviewer_notes, + ) + ) + + +def create_inner_workflow() -> Workflow: + """Build the inner ``human_review`` workflow (a single HITL gate).""" + review_gate = ReviewGateExecutor() + return WorkflowBuilder(name=INNER_WORKFLOW_NAME, start_executor=review_gate).build() + + +# ============================================================================ +# Outer workflow (embeds the inner workflow) +# ============================================================================ + + +class IntakeExecutor(Executor): + """Outer-workflow entry point that normalizes the submission before review.""" + + def __init__(self) -> None: + super().__init__(id="intake") + + @handler + async def intake(self, submission: ContentSubmission, ctx: WorkflowContext[ContentSubmission]) -> None: + logger.info("Intake received submission %s", submission.content_id) + await ctx.send_message(submission) + + +class PublishExecutor(Executor): + """Outer-workflow executor that consumes the inner workflow's forwarded decision.""" + + def __init__(self) -> None: + super().__init__(id="publish") + + @handler + async def handle_decision(self, decision: ModerationDecision, ctx: WorkflowContext[Never, str]) -> None: + if decision.approved: + message = ( + f"Content '{decision.content_id}' APPROVED and published. " + f"Reviewer notes: {decision.reviewer_notes or 'None'}" + ) + else: + message = f"Content '{decision.content_id}' REJECTED. Reviewer notes: {decision.reviewer_notes or 'None'}" + logger.info(message) + await ctx.yield_output(message) + + +def create_workflow() -> Workflow: + """Build the outer ``moderation_pipeline`` workflow embedding the HITL sub-workflow.""" + inner_workflow = create_inner_workflow() + + intake = IntakeExecutor() + # WorkflowExecutor embeds the inner (HITL) workflow as a single node. 
On the + # durable host this node runs as a child orchestration, and the inner pause + # surfaces to the client as a qualified request id (``review_sub::{requestId}``). + review_sub = WorkflowExecutor(inner_workflow, id="review_sub") + publish = PublishExecutor() + + return ( + WorkflowBuilder(name=OUTER_WORKFLOW_NAME, start_executor=intake) + .add_edge(intake, review_sub) + .add_edge(review_sub, publish) + .build() + ) + + +def get_worker( + taskhub: str | None = None, endpoint: str | None = None, log_handler: logging.Handler | None = None +) -> DurableTaskSchedulerWorker: + """Create a configured DurableTaskSchedulerWorker.""" + taskhub_name = taskhub or os.getenv("TASKHUB", "default") + endpoint_url = endpoint or os.getenv("ENDPOINT", "http://localhost:8080") + + credential = None if endpoint_url == "http://localhost:8080" else AzureCliCredential() + + return DurableTaskSchedulerWorker( + host_address=endpoint_url, + secure_channel=endpoint_url != "http://localhost:8080", + taskhub=taskhub_name, + token_credential=credential, + log_handler=log_handler, + ) + + +def setup_worker(worker: DurableTaskSchedulerWorker) -> DurableAIAgentWorker: + """Register the outer workflow and its nested HITL sub-workflow on the worker.""" + agent_worker = DurableAIAgentWorker(worker) + + workflow = create_workflow() + # A single call registers the outer workflow plus the nested human_review + # sub-workflow (each as its own durable orchestration). + agent_worker.configure_workflow(workflow) + logger.info( + "✓ Configured workflow '%s' with embedded HITL sub-workflow '%s'", + OUTER_WORKFLOW_NAME, + INNER_WORKFLOW_NAME, + ) + + return agent_worker + + +async def main() -> None: + """Start the worker and block until interrupted.""" + worker = get_worker() + setup_worker(worker) + + logger.info("Worker is ready and listening for work items. 
Press Ctrl+C to stop.") + try: + worker.start() + while True: + await asyncio.sleep(1) + except KeyboardInterrupt: + logger.info("Worker shutdown initiated") + + logger.info("Worker stopped") + + +if __name__ == "__main__": + asyncio.run(main()) From 9985c3dda8135f2b23ee774abcb95e50ba174d7c Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Tue, 23 Jun 2026 18:08:34 -0400 Subject: [PATCH 06/12] docs(durabletask): ADR + sample route docs for multi-workflow and sub-workflows (phase 5) - Add ADR-0030 capturing the multi-workflow and sub-workflow hosting decisions (naming, scoped inner names, per-workflow routes, child-orchestration sub-workflows, hard-switch migration, B2 sub-workflow HITL, scoped agent addressing) with considered alternatives; mark the design doc as implemented and link the ADR. - Update Azure Functions workflow samples (09-12) README/demo.http to the per-workflow route shape (workflow/{name}/run|status|respond) introduced in phase 2. - Extend the durabletask sample catalog with the workflow hosting patterns (08-12), including the new 11_subworkflow and 12_subworkflow_hitl samples. 
--- ...abletask-multiworkflow-and-subworkflows.md | 118 ++++++++++++++++++ ...abletask-multiworkflow-and-subworkflows.md | 3 +- .../09_workflow_shared_state/README.md | 4 +- .../09_workflow_shared_state/demo.http | 8 +- .../10_workflow_no_shared_state/README.md | 6 +- .../10_workflow_no_shared_state/demo.http | 6 +- .../11_workflow_parallel/README.md | 4 +- .../11_workflow_parallel/demo.http | 6 +- .../12_workflow_hitl/README.md | 14 +-- .../12_workflow_hitl/demo.http | 18 +-- .../samples/04-hosting/durabletask/README.md | 7 ++ 11 files changed, 160 insertions(+), 34 deletions(-) create mode 100644 docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md diff --git a/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md b/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md new file mode 100644 index 00000000000..0808e34e179 --- /dev/null +++ b/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md @@ -0,0 +1,118 @@ +--- +status: proposed +contact: ahmedmuhsin +date: 2025-06-13 +deciders: ahmedmuhsin +consulted: +informed: +--- + +# Durable Task hosting: multiple workflows per host and sub-workflows + +## Context and Problem Statement + +The Python Durable Task hosting layer (the standalone `DurableAIAgentWorker` and the Azure Functions `AgentFunctionApp`) originally hosted exactly **one** MAF `Workflow` per host, registered under a single fixed orchestration name (`workflow_orchestrator`). Two capabilities were missing relative to the in-process MAF runtime and the .NET durable host: + +1. **Multiple workflows per host** — one worker / one Function app could not host more than one workflow, and two workflows that happened to reuse an executor or agent id would collide on shared durable primitive names. +2. **Sub-workflows (composition)** — MAF's `WorkflowExecutor` embeds one workflow inside another, but the durable hosts had no way to run a nested workflow as a first-class durable unit. 
+ +This ADR records the design decisions for adding both capabilities to the Python durable hosts, keeping them aligned with the .NET durable host where it matters (the Durable Task tooling/UI surface) and with the in-process MAF semantics everywhere else. The full design exploration lives in [`docs/design/durabletask-multiworkflow-and-subworkflows.md`](../design/durabletask-multiworkflow-and-subworkflows.md); this ADR captures the decisions and the considered alternatives. + +## Decision Drivers + +- **Stable durable identities.** Durable replay only resumes an in-flight orchestration if the orchestration/activity/entity names still resolve to the same functions. Names must be stable across restarts and derived deterministically from a workflow name. +- **No accidental collisions.** Two co-hosted workflows that reuse an executor or agent id must not share a durable entity/activity, or one workflow's implementation would silently service another's dispatch. +- **Alignment with .NET on the surfaced identity.** The orchestration name is what the Durable Task tooling/UI shows; it should match .NET's `WorkflowNamingHelper` byte-for-byte. +- **Alignment with in-process MAF semantics.** Sub-workflow output forwarding, HITL request/response, and per-run state isolation should behave the same durably as in-process. +- **Stable caller surface.** HTTP callers should not have to change URLs as an app grows from one workflow to many, or discover internal child orchestration instance ids for nested HITL. +- **Determinism.** Orchestrator code must be replay-safe (deterministic child instance ids, no `uuid4()` in the orchestrator). + +## Considered Options + +The design has several semi-independent decision points; the considered alternatives are grouped by decision below. The chosen options are summarized in the Decision Outcome. + +### Workflow-internal durable names (collision avoidance) + +- **Approach A — scope inner names by workflow** (`dafx-{workflowName}-{executorId}`). 
Distinct names per workflow using plain closures; no runtime registry. + - Good: removes same-executor-id collisions with no extra moving parts. + - Good: each workflow's primitives are independently inspectable. + - Neutral: diverges from .NET's bare `dafx-{executorId}` inner names (but those are not UI-surfaced). +- **Approach B — a runtime registry keyed by (workflow, executor)** mapping to shared handlers. + - Good: closer to .NET's bare inner names. + - Bad: introduces a registry indirection and a stateful lookup on the hot path; more to get wrong on replay. + +### Sub-workflow execution model + +- **Model A — run the inner workflow inside one activity** of the parent orchestration. + - Good: fewest orchestration instances. + - Bad: the inner workflow's executors are not independently durable or observable; HITL inside the inner workflow cannot pause durably. +- **Model B — run the inner workflow as a durable child orchestration** via `call_sub_orchestrator(dafx-{innerName})`. + - Good: matches what the .NET durable host does (`ExecuteSubWorkflowAsync` → child orchestration). + - Good: inner executors are independently durable/observable; inner HITL pauses durably on the child instance. + - Neutral: more orchestration instances on the task hub. + +### Azure Functions route shape for multiple workflows + +- **Always per-workflow routes** (`workflow/{name}/run|status|respond`), even for a single workflow. + - Good: the URL shape never changes as an app grows; callers are stable. + - Neutral: a single-workflow app has a slightly longer URL. +- **Bare routes for one workflow, per-workflow routes only when there are many.** + - Bad: callers must change URLs when a second workflow is added. + +### Single-workflow orchestration-name migration + +- **Hard switch** to `dafx-{name}` with no runtime alias for the old `workflow_orchestrator` name. + - Good: one naming scheme everywhere; no special-case alias code. 
+ - Bad: pre-upgrade in-flight single-workflow instances under `workflow_orchestrator` will not resume. + - Acceptable: durable workflow runs are typically short-lived and this is a preview surface; operators drain in-flight instances before upgrading. `WORKFLOW_ORCHESTRATOR_NAME` remains exported as a deprecated source alias. +- **Dual registration / runtime alias** that resumes both names. + - Good: in-flight instances survive the upgrade. + - Bad: permanent alias-compat code on a preview surface for a low-value case. + +### Sub-workflow HITL addressing + +- **B1 — direct child addressing.** Expose child instance ids; the responder posts to `workflow/{innerName}/respond/{childInstanceId}/{requestId}`. + - Good: simple host plumbing. + - Bad: leaks child instance ids to the caller and changes the addressing surface per nesting depth. +- **B2 — propagated single surface.** Bubble inner pending requests up into the parent custom status with qualified request ids (`{executorId}::{requestId}`); a response to the parent is routed by stripping the qualifier and raising the event on the owning child instance. + - Good: one addressing surface for arbitrarily deep nesting; the caller always talks to the top-level run. + - Good: consistent with the "always per-workflow, stable surface" decision. + - Neutral: requires parent→child response plumbing in the host/client. + +### Workflow agent addressing + +- **Reuse `add_agent` with a scoped entity id** (`dafx-{workflowName}-{executorId}`); workflow agents are reachable via `get_agent(name, workflow_name=...)`. No separate `workflow_agents=` kwarg. + - Good: one registration path; workflow agents appear in `agents` / `get_agent`. + - Good: the per-workflow grouping is an internal planner structure both hosts consume. +- **A separate `workflow_agents=` registration surface.** + - Bad: a parallel registration path and a second public kwarg for what is an internal grouping concern. + +## Decision Outcome + +1. 
**Orchestration naming:** `dafx-{workflowName}` (matches .NET; the UI-surfaced name). +2. **Workflow-internal durable names:** **Approach A** — scope inner activity/entity names by workflow (`dafx-{workflowName}-{executorId}`). +3. **Azure Functions route shape:** **always per-workflow routes** (`workflow/{name}/run|status|respond`). +4. **Sub-workflow execution model:** **Model B** — child orchestration via `call_sub_orchestrator`, matching the .NET durable host. +5. **Single-workflow migration:** **hard switch** to `dafx-{name}` with no runtime alias; `WORKFLOW_ORCHESTRATOR_NAME` stays as a deprecated source alias only. +6. **Sub-workflow HITL addressing:** **B2** — propagate inner pending requests to the parent custom status with qualified request ids; the caller always responds to the top-level run. +7. **Workflow agent addressing:** register through the **same** `add_agent` primitive under the scoped name; reachable via `get_agent(name, workflow_name=...)`; no `workflow_agents=` kwarg. Agent conversation state stays isolated by the entity key (`ctx.instance_id`). + +### Consequences + +- Good: two workflows can be co-hosted on one worker / app and reuse executor and agent ids without colliding; each workflow's durable primitives are independently inspectable. +- Good: sub-workflows are first-class durable units; inner HITL pauses durably and surfaces behind a single top-level addressing surface. +- Good: the orchestration name remains identical to .NET, so the Durable Task tooling/UI is consistent across languages. +- Good: HTTP callers have a stable URL shape and never need to discover internal child instance ids. +- Bad / accepted: pre-upgrade single-workflow instances under `workflow_orchestrator` will not resume after the hard switch. +- Neutral: sub-workflows add orchestration instances to the task hub (one child orchestration per `WorkflowExecutor` invocation). 
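
The naming decisions in items 1 and 2 can be sketched as below. This is a minimal illustration, not the package's `naming.py`. The function names `workflow_orchestrator_name` and `workflow_name_from_orchestrator` and the constant `DURABLE_NAME_PREFIX` mirror the helpers named in the phase 0 commit message, but their bodies and the prefix value are assumptions, and `scoped_primitive_name` is a hypothetical helper introduced only for this example. Only the `dafx-{workflowName}` and `dafx-{workflowName}-{executorId}` shapes come from this ADR.

```python
from __future__ import annotations

# Assumed value: the ADR only shows the prefix inside the name shapes.
DURABLE_NAME_PREFIX = "dafx-"


def workflow_orchestrator_name(workflow_name: str) -> str:
    """Build the UI-surfaced orchestration name: dafx-{workflowName}."""
    return f"{DURABLE_NAME_PREFIX}{workflow_name}"


def workflow_name_from_orchestrator(orchestrator_name: str) -> str | None:
    """Reverse the mapping; return None when the prefix is missing."""
    if not orchestrator_name.startswith(DURABLE_NAME_PREFIX):
        return None
    return orchestrator_name[len(DURABLE_NAME_PREFIX):]


def scoped_primitive_name(workflow_name: str, executor_id: str) -> str:
    """Hypothetical helper for Approach A: dafx-{workflowName}-{executorId}."""
    return f"{workflow_orchestrator_name(workflow_name)}-{executor_id}"


# Two co-hosted workflows reuse the executor id "review_gate" without colliding.
first = scoped_primitive_name("human_review", "review_gate")
second = scoped_primitive_name("moderation_pipeline", "review_gate")
```

Because each name is a pure function of the workflow name and executor id, it is stable across restarts, which is what the "stable durable identities" driver asks for.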
+ +### Out of scope / follow-up + +- **Cross-workflow shared agents.** A single agent that intentionally shares conversation memory across two co-hosted workflows is out of scope. Today, agent state is isolated per run by the entity key (`ctx.instance_id`); intentional sharing would need an explicit stable shared entity key rather than `instance_id`. Flagged as a possible follow-up. + +## More Information + +- Design document: [`docs/design/durabletask-multiworkflow-and-subworkflows.md`](../design/durabletask-multiworkflow-and-subworkflows.md) +- Implementation: Python `agent_framework_durabletask` (standalone worker, client, orchestrator, naming) and `agent_framework_azurefunctions` (`AgentFunctionApp`). +- Samples: `python/samples/04-hosting/durabletask/11_subworkflow` (composition) and `.../12_subworkflow_hitl` (HITL inside a sub-workflow). +- .NET reference: `WorkflowNamingHelper` (orchestration naming) and the durable host's `ExecuteSubWorkflowAsync` (sub-workflow as child orchestration). 
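
The B2 addressing decision (qualified request ids, with the qualifier stripped one level at a time) can be sketched as follows. This is an illustration under stated assumptions, not the host's implementation: `QUALIFIER_SEPARATOR`, `qualify_request_id`, and `split_qualified_request_id` are hypothetical names, and the `::` separator is the one written in this revision of the ADR.

```python
from __future__ import annotations

# Separator as written in this revision of the ADR ({executorId}::{requestId}).
QUALIFIER_SEPARATOR = "::"


def qualify_request_id(executor_id: str, request_id: str) -> str:
    """Prefix an inner request id with the id of the node that owns the child."""
    return f"{executor_id}{QUALIFIER_SEPARATOR}{request_id}"


def split_qualified_request_id(qualified: str) -> tuple[str | None, str]:
    """Strip one qualifier level; return (None, id) when there is no qualifier."""
    executor_id, separator, remainder = qualified.partition(QUALIFIER_SEPARATOR)
    if not separator:
        return None, qualified
    return executor_id, remainder


# One level of nesting, as in the 12_subworkflow_hitl sample's "review_sub" node.
surfaced = qualify_request_id("review_sub", "req-456")
owner, inner_id = split_qualified_request_id(surfaced)
```

Stripping only the first qualifier lets each parent route the remainder to its own child, so the same rule covers deeper nesting. One caveat of this sketch: an unqualified id that itself contains `::` is indistinguishable from a qualified one, which appears to be the collision with `auto::N` request ids that the follow-up hardening commit in this series cites when it changes the separator.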
diff --git a/docs/design/durabletask-multiworkflow-and-subworkflows.md b/docs/design/durabletask-multiworkflow-and-subworkflows.md index 8415b9609dc..c3b36e0b3b6 100644 --- a/docs/design/durabletask-multiworkflow-and-subworkflows.md +++ b/docs/design/durabletask-multiworkflow-and-subworkflows.md @@ -1,6 +1,7 @@ # Durable hosting: multiple workflows and sub-workflows (Python) -Status: Draft / for discussion +Status: Implemented — decisions promoted to +[ADR-0030](../decisions/0030-durabletask-multiworkflow-and-subworkflows.md) Scope: `python/packages/durabletask` (standalone Durable Task worker) and `python/packages/azurefunctions` (Azure Functions host) Related: PR #6418 (standalone Durable Task workflow hosting), core diff --git a/python/samples/04-hosting/azure_functions/09_workflow_shared_state/README.md b/python/samples/04-hosting/azure_functions/09_workflow_shared_state/README.md index 44866739734..6fd2a6b4a9c 100644 --- a/python/samples/04-hosting/azure_functions/09_workflow_shared_state/README.md +++ b/python/samples/04-hosting/azure_functions/09_workflow_shared_state/README.md @@ -69,14 +69,14 @@ Use the `demo.http` file with REST Client extension or curl: ### Test Spam Email ```bash -curl -X POST http://localhost:7071/api/workflow/run \ +curl -X POST http://localhost:7071/api/workflow/email_triage_shared_state/run \ -H "Content-Type: application/json" \ -d '"URGENT! You have won $1,000,000! 
Click here to claim!"' ``` ### Test Legitimate Email ```bash -curl -X POST http://localhost:7071/api/workflow/run \ +curl -X POST http://localhost:7071/api/workflow/email_triage_shared_state/run \ -H "Content-Type: application/json" \ -d '"Hi team, reminder about our meeting tomorrow at 10 AM."' ``` diff --git a/python/samples/04-hosting/azure_functions/09_workflow_shared_state/demo.http b/python/samples/04-hosting/azure_functions/09_workflow_shared_state/demo.http index 48b6a73f727..50f0ef0a68c 100644 --- a/python/samples/04-hosting/azure_functions/09_workflow_shared_state/demo.http +++ b/python/samples/04-hosting/azure_functions/09_workflow_shared_state/demo.http @@ -1,25 +1,25 @@ @endpoint = http://localhost:7071 ### Start the workflow with a spam email -POST {{endpoint}}/api/workflow/run +POST {{endpoint}}/api/workflow/email_triage_shared_state/run Content-Type: application/json "URGENT! You have won $1,000,000! Click here to claim your prize now before it expires!" ### Start the workflow with a legitimate email -POST {{endpoint}}/api/workflow/run +POST {{endpoint}}/api/workflow/email_triage_shared_state/run Content-Type: application/json "Hi team, just a reminder about the sprint planning meeting tomorrow at 10 AM. Please review the agenda items in Jira before the call." ### Start the workflow with another legitimate email -POST {{endpoint}}/api/workflow/run +POST {{endpoint}}/api/workflow/email_triage_shared_state/run Content-Type: application/json "Hello, I wanted to follow up on our conversation from last week regarding the project timeline. Could we schedule a brief call this afternoon to discuss the next steps?" ### Start the workflow with a phishing attempt -POST {{endpoint}}/api/workflow/run +POST {{endpoint}}/api/workflow/email_triage_shared_state/run Content-Type: application/json "Dear Customer, Your account has been compromised! 
Click this link immediately to secure your account: http://totallylegit.suspicious.com/secure" diff --git a/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/README.md b/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/README.md index f5f77f3c911..2de3569f33a 100644 --- a/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/README.md +++ b/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/README.md @@ -72,21 +72,21 @@ Use the `demo.http` file with REST Client extension or curl: ### Test Spam Email ```bash -curl -X POST http://localhost:7071/api/workflow/run \ +curl -X POST http://localhost:7071/api/workflow/email_triage/run \ -H "Content-Type: application/json" \ -d '{"email_id": "test-001", "email_content": "URGENT! You have won $1,000,000! Click here!"}' ``` ### Test Legitimate Email ```bash -curl -X POST http://localhost:7071/api/workflow/run \ +curl -X POST http://localhost:7071/api/workflow/email_triage/run \ -H "Content-Type: application/json" \ -d '{"email_id": "test-002", "email_content": "Hi team, reminder about our meeting tomorrow at 10 AM."}' ``` ### Check Status ```bash -curl http://localhost:7071/api/workflow/status/{instanceId} +curl http://localhost:7071/api/workflow/email_triage/status/{instanceId} ``` ## Expected Output diff --git a/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/demo.http b/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/demo.http index 2c81ddc9bc4..42f24e38b14 100644 --- a/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/demo.http +++ b/python/samples/04-hosting/azure_functions/10_workflow_no_shared_state/demo.http @@ -1,5 +1,5 @@ ### Start Workflow Orchestration - Spam Email -POST http://localhost:7071/api/workflow/run +POST http://localhost:7071/api/workflow/email_triage/run Content-Type: application/json { @@ -10,7 +10,7 @@ Content-Type: application/json ### ### Start Workflow 
Orchestration - Legitimate Email -POST http://localhost:7071/api/workflow/run +POST http://localhost:7071/api/workflow/email_triage/run Content-Type: application/json { @@ -22,7 +22,7 @@ Content-Type: application/json ### Get Workflow Status # Replace {instanceId} with the actual instance ID from the start response -GET http://localhost:7071/api/workflow/status/{instanceId} +GET http://localhost:7071/api/workflow/email_triage/status/{instanceId} ### diff --git a/python/samples/04-hosting/azure_functions/11_workflow_parallel/README.md b/python/samples/04-hosting/azure_functions/11_workflow_parallel/README.md index e12cec00461..669f2579844 100644 --- a/python/samples/04-hosting/azure_functions/11_workflow_parallel/README.md +++ b/python/samples/04-hosting/azure_functions/11_workflow_parallel/README.md @@ -155,7 +155,7 @@ Use the `demo.http` file with REST Client extension or curl: ### Analyze a Document ```bash -curl -X POST http://localhost:7071/api/workflow/run \ +curl -X POST http://localhost:7071/api/workflow/parallel_review/run \ -H "Content-Type: application/json" \ -d '{ "document_id": "doc-001", @@ -165,7 +165,7 @@ curl -X POST http://localhost:7071/api/workflow/run \ ### Check Status ```bash -curl http://localhost:7071/api/workflow/status/{instanceId} +curl http://localhost:7071/api/workflow/parallel_review/status/{instanceId} ``` ## Observing Parallel Execution diff --git a/python/samples/04-hosting/azure_functions/11_workflow_parallel/demo.http b/python/samples/04-hosting/azure_functions/11_workflow_parallel/demo.http index a8ae96e4523..065faf6c742 100644 --- a/python/samples/04-hosting/azure_functions/11_workflow_parallel/demo.http +++ b/python/samples/04-hosting/azure_functions/11_workflow_parallel/demo.http @@ -1,5 +1,5 @@ ### Analyze a document (triggers parallel workflow) -POST http://localhost:7071/api/workflow/run +POST http://localhost:7071/api/workflow/parallel_review/run Content-Type: application/json { @@ -10,7 +10,7 @@ Content-Type: 
application/json ### ### Short document test -POST http://localhost:7071/api/workflow/run +POST http://localhost:7071/api/workflow/parallel_review/run Content-Type: application/json { @@ -21,7 +21,7 @@ Content-Type: application/json ### ### Check workflow status -GET http://localhost:7071/api/workflow/status/{{instanceId}} +GET http://localhost:7071/api/workflow/parallel_review/status/{{instanceId}} ### diff --git a/python/samples/04-hosting/azure_functions/12_workflow_hitl/README.md b/python/samples/04-hosting/azure_functions/12_workflow_hitl/README.md index 68850bea640..090e84c145f 100644 --- a/python/samples/04-hosting/azure_functions/12_workflow_hitl/README.md +++ b/python/samples/04-hosting/azure_functions/12_workflow_hitl/README.md @@ -43,9 +43,9 @@ async def handle_approval_response( | Endpoint | Description | |----------|-------------| -| `POST /api/workflow/run` | Start the workflow | -| `GET /api/workflow/status/{instanceId}` | Check status and pending HITL requests | -| `POST /api/workflow/respond/{instanceId}/{requestId}` | Send human response | +| `POST /api/workflow/content_moderation/run` | Start the workflow | +| `GET /api/workflow/content_moderation/status/{instanceId}` | Check status and pending HITL requests | +| `POST /api/workflow/content_moderation/respond/{instanceId}/{requestId}` | Send human response | | `GET /api/health` | Health check | ### Durable Functions Integration @@ -129,10 +129,10 @@ This launches the DevUI at http://localhost:8096 where you can interact with the Use the `demo.http` file with the VS Code REST Client extension: -1. **Start workflow** - `POST /api/workflow/run` with content payload -2. **Check status** - `GET /api/workflow/status/{instanceId}` to see pending HITL requests -3. **Send response** - `POST /api/workflow/respond/{instanceId}/{requestId}` with approval -4. **Check result** - `GET /api/workflow/status/{instanceId}` to see final output +1. 
**Start workflow** - `POST /api/workflow/content_moderation/run` with content payload +2. **Check status** - `GET /api/workflow/content_moderation/status/{instanceId}` to see pending HITL requests +3. **Send response** - `POST /api/workflow/content_moderation/respond/{instanceId}/{requestId}` with approval +4. **Check result** - `GET /api/workflow/content_moderation/status/{instanceId}` to see final output ## Related Samples diff --git a/python/samples/04-hosting/azure_functions/12_workflow_hitl/demo.http b/python/samples/04-hosting/azure_functions/12_workflow_hitl/demo.http index 9ed4c368c9e..423254964a0 100644 --- a/python/samples/04-hosting/azure_functions/12_workflow_hitl/demo.http +++ b/python/samples/04-hosting/azure_functions/12_workflow_hitl/demo.http @@ -20,7 +20,7 @@ ### This starts the workflow. The AI will analyze the content, then the workflow ### will pause waiting for human approval. -POST http://localhost:7071/api/workflow/run +POST http://localhost:7071/api/workflow/content_moderation/run Content-Type: application/json { @@ -36,7 +36,7 @@ Content-Type: application/json ### ============================================================================ ### This content should trigger higher risk assessment from the AI analyzer. 
-POST http://localhost:7071/api/workflow/run +POST http://localhost:7071/api/workflow/content_moderation/run Content-Type: application/json { @@ -55,7 +55,7 @@ Content-Type: application/json @instanceId = 3130c486c9374e4e87125cbd9a238dfc -GET http://localhost:7071/api/workflow/status/{{instanceId}} +GET http://localhost:7071/api/workflow/content_moderation/status/{{instanceId}} ### ============================================================================ @@ -66,7 +66,7 @@ GET http://localhost:7071/api/workflow/status/{{instanceId}} @requestId = 1682e5f8-0917-4b68-aa04-d4688cfa2e69 -POST http://localhost:7071/api/workflow/respond/{{instanceId}}/{{requestId}} +POST http://localhost:7071/api/workflow/content_moderation/respond/{{instanceId}}/{{requestId}} Content-Type: application/json { @@ -80,7 +80,7 @@ Content-Type: application/json ### ============================================================================ ### Reject the content with feedback. -POST http://localhost:7071/api/workflow/respond/{{instanceId}}/{{requestId}} +POST http://localhost:7071/api/workflow/content_moderation/respond/{{instanceId}}/{{requestId}} Content-Type: application/json { @@ -94,15 +94,15 @@ Content-Type: application/json ### ============================================================================ ### ### Step 1: Start workflow with content -### POST http://localhost:7071/api/workflow/run +### POST http://localhost:7071/api/workflow/content_moderation/run ### -> Returns instanceId: "abc123..." ### ### Step 2: Check status (workflow is waiting for human input) -### GET http://localhost:7071/api/workflow/status/abc123 +### GET http://localhost:7071/api/workflow/content_moderation/status/abc123 ### -> Returns pendingHumanInputRequests with requestId: "req-456..." 
### ### Step 3: Approve content -### POST http://localhost:7071/api/workflow/respond/abc123/req-456 +### POST http://localhost:7071/api/workflow/content_moderation/respond/abc123/req-456 ### { ### "approved": true, ### "reviewer_notes": "Looks good!" @@ -110,7 +110,7 @@ Content-Type: application/json ### -> Returns success ### ### Step 4: Check final status -### GET http://localhost:7071/api/workflow/status/abc123 +### GET http://localhost:7071/api/workflow/content_moderation/status/abc123 ### -> Returns runtimeStatus: "Completed", output: "✅ Content approved..." ### ### ============================================================================ diff --git a/python/samples/04-hosting/durabletask/README.md b/python/samples/04-hosting/durabletask/README.md index 2f35ba65e26..c247965f532 100644 --- a/python/samples/04-hosting/durabletask/README.md +++ b/python/samples/04-hosting/durabletask/README.md @@ -15,6 +15,13 @@ This directory contains samples for durable agent hosting using the Durable Task - **[06_multi_agent_orchestration_conditionals](06_multi_agent_orchestration_conditionals/)**: Implement conditional branching in orchestrations with spam detection and email assistant agents. Demonstrates structured outputs with Pydantic models and activity functions for side effects. - **[07_single_agent_orchestration_hitl](07_single_agent_orchestration_hitl/)**: Human-in-the-loop pattern with external event handling, timeouts, and iterative refinement based on human feedback. Shows long-running workflows with external interactions. +### Workflow Hosting Patterns +- **[08_workflow](08_workflow/)**: Host a MAF `Workflow` as a durable orchestration on a standalone worker via `DurableAIAgentWorker.configure_workflow`. Demonstrates conditional routing and mixing AI agents with non-agent executors. 
+- **[09_workflow_hitl](09_workflow_hitl/)**: A workflow that pauses for human approval using `ctx.request_info` / `@response_handler`, with the client discovering and answering the pending request. +- **[10_workflow_streaming](10_workflow_streaming/)**: Stream a hosted workflow's events as typed `WorkflowEvent` objects by polling the orchestration's custom status. +- **[11_subworkflow](11_subworkflow/)**: Compose workflows by embedding an inner `Workflow` as a node via `WorkflowExecutor`. On the durable host the inner workflow runs as its own child orchestration, and a single `configure_workflow` call registers both. +- **[12_subworkflow_hitl](12_subworkflow_hitl/)**: A human-in-the-loop pause that lives **inside a sub-workflow**. The nested request surfaces to the client with a qualified request id (`{executor}::{requestId}`) behind a single top-level addressing surface. + ## Running the Samples These samples are designed to be run locally in a cloned repository. From 10035b8d3a8e8e06cac589cccb6b361c8cf55271 Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Tue, 23 Jun 2026 20:37:25 -0400 Subject: [PATCH 07/12] fix(durabletask): harden sub-workflow hosting + add sub-workflow integration tests Post-review hardening of the multi-workflow / sub-workflow durable hosting: - Trust boundary: strip the reserved sub-workflow envelope key from untrusted client input at both host boundaries (DurableWorkflowClient.start_workflow and the AF start route) so a forged envelope cannot reach the trusted pickle path. - Nested HITL addressing: qualify nested pending requests by (executorId, ordinal) using a '~' separator (was '::', which collided with core's auto::N functional request ids); the parent status subworkflows map is now a per-executor list so multiple children dispatched in one superstep stay independently addressable. 
- Reject two different workflow instances that share a name (the same instance reused by sibling nodes is still deduped); validate executor ids (separator-free, length-bounded) when hosting durably. - Remove the arbitrary sub-workflow nesting depth cap: a WorkflowExecutor wraps a concrete Workflow so the nesting tree is finite at build time, and the durable instance-id length limit is the natural ceiling (matches .NET, which has none). Tests/samples: - New durabletask integration tests for sub-workflow composition (11) and nested sub-workflow HITL (12); new no-agent AF sub-workflow HITL sample (13) + test. - Exempt no-agent samples from the model-credential gate in both integration conftests so the nested-HITL plumbing is covered deterministically. - Update durabletask sample 12 docs to the new qualified-id format. Validated: 484 unit tests; durabletask integration 08/09/11/12 and AF 12/13 pass against the live emulators; pyright 0 errors; ruff clean. --- ...abletask-multiworkflow-and-subworkflows.md | 7 +- ...abletask-multiworkflow-and-subworkflows.md | 59 +++- .../agent_framework_azurefunctions/_app.py | 112 +++++--- .../tests/integration_tests/conftest.py | 7 +- .../test_13_workflow_subworkflow_hitl.py | 150 ++++++++++ .../packages/azurefunctions/tests/test_app.py | 102 ++++++- .../agent_framework_durabletask/__init__.py | 2 + .../agent_framework_durabletask/_worker.py | 23 +- .../_workflows/client.py | 102 ++++--- .../_workflows/naming.py | 125 +++++++- .../_workflows/orchestrator.py | 73 ++--- .../_workflows/registration.py | 26 +- .../_workflows/serialization.py | 39 +++ .../tests/integration_tests/conftest.py | 4 + .../test_11_dt_subworkflow.py | 74 +++++ .../test_12_dt_subworkflow_hitl.py | 152 ++++++++++ .../tests/test_subworkflow_orchestration.py | 16 +- .../packages/durabletask/tests/test_worker.py | 43 +++ .../durabletask/tests/test_workflow_client.py | 110 ++++++- .../durabletask/tests/test_workflow_naming.py | 63 ++++ 
.../tests/test_workflow_registration.py | 17 ++ .../tests/test_workflow_serialization.py | 28 ++ .../13_subworkflow_hitl/.gitignore | 5 + .../13_subworkflow_hitl/README.md | 70 +++++ .../13_subworkflow_hitl/demo.http | 74 +++++ .../13_subworkflow_hitl/function_app.py | 270 ++++++++++++++++++ .../13_subworkflow_hitl/host.json | 12 + .../local.settings.json.sample | 9 + .../13_subworkflow_hitl/requirements.txt | 11 + .../durabletask/12_subworkflow_hitl/README.md | 2 +- .../durabletask/12_subworkflow_hitl/client.py | 4 +- .../durabletask/12_subworkflow_hitl/worker.py | 4 +- 32 files changed, 1600 insertions(+), 195 deletions(-) create mode 100644 python/packages/azurefunctions/tests/integration_tests/test_13_workflow_subworkflow_hitl.py create mode 100644 python/packages/durabletask/tests/integration_tests/test_11_dt_subworkflow.py create mode 100644 python/packages/durabletask/tests/integration_tests/test_12_dt_subworkflow_hitl.py create mode 100644 python/samples/04-hosting/azure_functions/13_subworkflow_hitl/.gitignore create mode 100644 python/samples/04-hosting/azure_functions/13_subworkflow_hitl/README.md create mode 100644 python/samples/04-hosting/azure_functions/13_subworkflow_hitl/demo.http create mode 100644 python/samples/04-hosting/azure_functions/13_subworkflow_hitl/function_app.py create mode 100644 python/samples/04-hosting/azure_functions/13_subworkflow_hitl/host.json create mode 100644 python/samples/04-hosting/azure_functions/13_subworkflow_hitl/local.settings.json.sample create mode 100644 python/samples/04-hosting/azure_functions/13_subworkflow_hitl/requirements.txt diff --git a/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md b/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md index 0808e34e179..dd1dfa4ee3e 100644 --- a/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md +++ b/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md @@ -74,10 +74,12 @@ The design has several semi-independent 
decision points; the considered alternat - **B1 — direct child addressing.** Expose child instance ids; the responder posts to `workflow/{innerName}/respond/{childInstanceId}/{requestId}`. - Good: simple host plumbing. - Bad: leaks child instance ids to the caller and changes the addressing surface per nesting depth. -- **B2 — propagated single surface.** Bubble inner pending requests up into the parent custom status with qualified request ids (`{executorId}::{requestId}`); a response to the parent is routed by stripping the qualifier and raising the event on the owning child instance. +- **B2 — propagated single surface.** Bubble inner pending requests up into the parent custom status with qualified request ids (`{executorId}~{ordinal}~{requestId}`); a response to the parent is routed by peeling one hop and raising the event on the owning child instance. - Good: one addressing surface for arbitrarily deep nesting; the caller always talks to the top-level run. - Good: consistent with the "always per-workflow, stable surface" decision. + - Good: the `~{ordinal}~` hop indexes the parent's `subworkflows` child list, so a node that dispatches several children in one superstep keeps each addressable. - Neutral: requires parent→child response plumbing in the host/client. + - Note: the separator is `~` (not `::`) because core emits `auto::{index}` request ids for functional `@workflow` HITL; `~` never appears in a core request id and is rejected in executor ids, so qualified ids round-trip unambiguously. ### Workflow agent addressing @@ -94,8 +96,9 @@ The design has several semi-independent decision points; the considered alternat 3. **Azure Functions route shape:** **always per-workflow routes** (`workflow/{name}/run|status|respond`). 4. **Sub-workflow execution model:** **Model B** — child orchestration via `call_sub_orchestrator`, matching the .NET durable host. 5. 
**Single-workflow migration:** **hard switch** to `dafx-{name}` with no runtime alias; `WORKFLOW_ORCHESTRATOR_NAME` stays as a deprecated source alias only. -6. **Sub-workflow HITL addressing:** **B2** — propagate inner pending requests to the parent custom status with qualified request ids; the caller always responds to the top-level run. +6. **Sub-workflow HITL addressing:** **B2** — propagate inner pending requests to the parent custom status with qualified request ids (`{executorId}~{ordinal}~{requestId}`); the caller always responds to the top-level run. 7. **Workflow agent addressing:** register through the **same** `add_agent` primitive under the scoped name; reachable via `get_agent(name, workflow_name=...)`; no `workflow_agents=` kwarg. Agent conversation state stays isolated by the entity key (`ctx.instance_id`). +8. **Hardening:** reject two **different** workflow instances that share a name (the same instance reused by several nodes is deduped); validate executor ids (separator-free, length-bounded); and strip the reserved sub-workflow envelope key from untrusted client input at the host boundary so a forged envelope cannot reach the trusted pickle path. Sub-workflow nesting is **not** capped by a depth counter — the nesting tree is finite at build time and the durable instance-id length limit is the natural ceiling (matching .NET, which imposes no limit). ### Consequences diff --git a/docs/design/durabletask-multiworkflow-and-subworkflows.md b/docs/design/durabletask-multiworkflow-and-subworkflows.md index c3b36e0b3b6..1eb34ceb338 100644 --- a/docs/design/durabletask-multiworkflow-and-subworkflows.md +++ b/docs/design/durabletask-multiworkflow-and-subworkflows.md @@ -184,7 +184,14 @@ in-flight instances**. Therefore: auto-generated `WorkflowBuilder-` names at registration, mirroring .NET's assert-name-matches-key contract). - Names are validated/sanitized to the durable name charset. -- Duplicate names within one host are rejected. 
+- Duplicate names within one host are rejected: two **different** workflow + instances that share a name collide on one `dafx-{name}` orchestration and raise; + the **same** instance reused by several nodes (fan-out) is deduplicated and + registered once. +- Executor ids are validated for durable hosting too: they must be separator-free + (the nested-HITL qualifier, below) and length-bounded, since they are + interpolated into durable activity/entity names and nested child-orchestration + instance ids. ### 4.3 Durable names (decision: scope workflow-internal names by workflow) @@ -385,11 +392,19 @@ timeout coupling. result feeds back into edge routing exactly like an activity result (outputs → messages / final outputs). - **Deterministic child instance ids.** Derive - `f"{parent_instance_id}::{executor_id}"` (append a deterministic counter when a - `WorkflowExecutor` runs on multiple messages in a superstep, e.g. fan-out) for - discoverability and idempotent replay. -- **Recursion bound.** Detect cycles and cap nesting depth (configurable) to - prevent unbounded sub-orchestration trees. + `f"{parent_instance_id}::{executor_id}::{counter}"` (the counter is monotonic + across supersteps, so a `WorkflowExecutor` that runs on multiple messages — e.g. + fan-out — gets a distinct, replay-stable id per child) for discoverability and + idempotent replay. (These are orchestration *instance* ids, distinct from the + HITL *request*-id qualifier below.) +- **No artificial recursion cap.** Nesting depth is *not* bounded by a counter. A + `WorkflowExecutor` wraps a concrete `Workflow` instance, so the nesting tree is + finite and fixed at build time; there is no way to express unbounded runtime + recursion. 
The recursively-derived child instance ids grow with depth, so the + durable backend's instance-id length limit (together with the workflow-name and + executor-id caps) is the natural ceiling for any pathological construction — a + separate magic depth number would only reject legitimate deep compositions. This + also matches .NET, whose durable host imposes no depth limit. - **Result/output mapping.** Reuse the existing typed-output reconstruction (`deserialize_workflow_output`) on the child result before routing. @@ -404,15 +419,16 @@ custom status. Two addressing options: requests with their child instance ids). - **B2 — propagated single surface (recommended, .NET-aligned philosophy).** Bubble inner pending requests up into the **parent** custom status with - **qualified request ids** (`{executor_path}::{requestId}`), mirroring .NET port - qualification. A response to the parent is routed by stripping the qualifier and - raising the event on the owning child instance. One addressing surface for - arbitrarily deep nesting, at the cost of parent→child response plumbing. + **qualified request ids** (`{executorId}~{ordinal}~{requestId}`, nested deeper for + deeper levels), mirroring .NET port qualification. A response to the parent is + routed by peeling one hop at a time and raising the event on the owning child + instance. One addressing surface for arbitrarily deep nesting, at the cost of + parent→child response plumbing. **Decision: B2 (propagated single surface).** Pending inner requests bubble up into the **parent** custom status with **qualified request ids** -(`{executor_path}::{requestId}`), mirroring .NET port qualification. A response to -the parent is routed by stripping the qualifier and raising the event on the +(`{executorId}~{ordinal}~{requestId}`), mirroring .NET port qualification. A +response to the parent is routed by peeling one hop and raising the event on the owning child instance. 
This gives one addressing surface for arbitrarily deep nesting (the caller always talks to the top-level run), at the cost of parent→child response plumbing. It is consistent with the "always per-workflow, @@ -421,6 +437,21 @@ ids. B1 (direct child addressing) is the rejected alternative — simpler plumbi but leaks child instance ids into the caller and changes the surface per nesting depth. +The `~{ordinal}~` hop carries the child's index in the parent's `subworkflows` +status list (`{executorId: [childInstanceId, ...]}`), so a node that dispatches +several children in one superstep keeps each child independently addressable. The +separator is `~` (not `::`) because core emits `auto::{index}` request ids for +functional `@workflow` HITL; a `::` separator would mis-parse those leaf ids, +whereas `~` never appears in a core request id and is rejected in executor ids. + +**Trust boundary.** The sub-orchestration input envelope (which carries the +trusted, parent-serialized child payload) uses a reserved key that the child +orchestrator deserializes via pickle *without* the usual marker-stripping. A +genuine envelope is only ever built internally after the trust boundary, so hosts +strip that reserved key from untrusted client input before scheduling a run — +preventing a forged envelope from smuggling a pickle payload onto the trusted +deserialization path. + --- ## 6. Cross-cutting concerns @@ -463,8 +494,8 @@ Each phase is independently shippable. `_is_owned_orchestration`. Unit tests + a two-workflow sample. - **Phase 3 — sub-workflows via child orchestrations.** Protocol `call_sub_orchestrator` + both adapters; planner `subworkflow_executors` + - recursive registration; orchestrator routing; deterministic child ids; - recursion bound. Unit + integration tests + a nested-workflow sample. + recursive registration; orchestrator routing; deterministic child ids. Unit + + integration tests + a nested-workflow sample. 
- **Phase 4 — sub-workflow HITL (B2).** Propagate inner pending requests to the parent custom status with qualified request ids; route a parent response to the owning child instance by stripping the qualifier. Tests + HITL sub-workflow diff --git a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py index cc5145f6cd9..d62aa72cc69 100644 --- a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py +++ b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py @@ -44,13 +44,15 @@ ) from agent_framework_durabletask._workflows.naming import ( SUBWORKFLOW_REQUEST_SEPARATOR, + split_subworkflow_request_id, + validate_executor_id, validate_workflow_name, workflow_executor_activity_name, workflow_orchestrator_name, workflow_scoped_executor_id, ) from agent_framework_durabletask._workflows.registration import collect_hosted_workflows -from agent_framework_durabletask._workflows.serialization import strip_pickle_markers +from agent_framework_durabletask._workflows.serialization import strip_pickle_markers, strip_subworkflow_markers from ._entities import create_agent_entity from ._errors import IncomingRequestError @@ -249,8 +251,10 @@ def __init__( self._agent_metadata = {} self._workflows: dict[str, Workflow] = {} # Every workflow whose orchestration has been registered (top-level plus - # nested sub-workflows), so a shared sub-workflow is registered once. - self._registered_orchestrations: set[str] = set() + # nested sub-workflows), keyed by name -> the registered instance, so a shared + # sub-workflow is registered once while two different workflows that collide on + # a name are rejected. 
+ self._registered_orchestrations: dict[str, Workflow] = {} self.enable_health_check = enable_health_check self.enable_http_endpoints = enable_http_endpoints self.enable_mcp_tool_trigger = enable_mcp_tool_trigger @@ -334,18 +338,29 @@ def _register_workflow(self, workflow: Workflow) -> None: raise ValueError(f"Workflow '{workflow.name}' is already registered on this app.") # Validate the whole composition (top-level plus every nested sub-workflow) - # up front, so an invalid/auto-generated nested name fails before any + # up front, so an invalid/auto-generated nested name (or an executor id that + # would break durable naming / nested-HITL addressing) fails before any # registration side effects leave the app partially configured. hosted_workflows = list(collect_hosted_workflows(workflow)) for hosted in hosted_workflows: validate_workflow_name(hosted.name) + for executor_id in hosted.executors: + validate_executor_id(executor_id) self._workflows[workflow.name] = workflow # Register orchestration primitives for the top-level workflow and every - # nested sub-workflow (deduped by name). + # nested sub-workflow (deduped by name; a different workflow reusing a name is + # rejected). for hosted in hosted_workflows: - if hosted.name in self._registered_orchestrations: + existing = self._registered_orchestrations.get(hosted.name) + if existing is not None: + if existing is not hosted: + raise ValueError( + f"A different workflow named '{hosted.name}' is already registered on this " + f"app. A workflow name maps to a single durable orchestration " + f"('dafx-{hosted.name}'); rename one of them." 
+ ) continue self._register_workflow_primitives(hosted) @@ -356,7 +371,7 @@ def _register_workflow(self, workflow: Workflow) -> None: def _register_workflow_primitives(self, workflow: Workflow) -> None: """Register one workflow's entities, activities, and orchestrator (no routes).""" validate_workflow_name(workflow.name) - self._registered_orchestrations.add(workflow.name) + self._registered_orchestrations[workflow.name] = workflow logger.debug("[AgentFunctionApp] Registering workflow '%s'", workflow.name) plan = plan_workflow_registration(workflow) @@ -473,6 +488,12 @@ async def start_workflow_orchestration( return self._build_error_response("Request body is required") client_input = raw_body.decode("utf-8") + # Neutralize a forged sub-workflow envelope before scheduling: only an + # internal child dispatch (post trust boundary) may carry those reserved + # keys, so stripping them here keeps untrusted input off the orchestrator's + # trusted-deserialization path (see strip_subworkflow_markers). + client_input = strip_subworkflow_markers(client_input) + instance_id = await client.start_new(orchestrator_name, client_input=client_input) base_url = self._build_base_url(req.url) @@ -528,8 +549,9 @@ async def get_workflow_status( # Add pending HITL requests info if available. Requests originating in a # nested sub-workflow are bubbled up here with a qualified requestId - # ({executorId}::{requestId}); the respondUrl always targets this top-level - # instance, so the caller has a single addressing surface (B2). + # ({executorId}~{ordinal}~{requestId}, nested deeper for deeper levels); the + # respondUrl always targets this top-level instance, so the caller has a + # single addressing surface (B2). 
custom_status = status.custom_status if isinstance(custom_status, dict): gathered = await self._gather_pending_hitl_requests(client, cast("dict[str, Any]", custom_status)) @@ -587,7 +609,7 @@ async def send_hitl_response(req: func.HttpRequest, client: df.DurableOrchestrat # See strip_pickle_markers() docstring for details on the attack vector. response_data = strip_pickle_markers(response_data) - # A qualified requestId ({executorId}::{requestId}) addresses a request that + # A qualified requestId ({executorId}~{ordinal}~{requestId}) addresses a request that # originated in a nested sub-workflow: resolve it to the owning child # orchestration instance and the bare request id it is waiting on. resolved = await self._resolve_hitl_target(client, instance_id, request_id) @@ -628,11 +650,13 @@ async def _gather_pending_hitl_requests( """Collect ``(qualifiedRequestId, requestData)`` pairs for an instance and its sub-workflows. ``custom_status`` is the already-fetched custom status of the instance at the - current level. Nested sub-workflows (listed in its ``subworkflows`` map) are - fetched by id and recursed into, accumulating an ``{executorId}::`` prefix so a - request deep in the tree carries its full path. Child instances come from the - trusted parent status, so no per-child ownership check is applied (the caller - validated the top-level instance). + current level. Nested sub-workflows (listed in its ``subworkflows`` map as + ``{executorId: [childInstanceId, ...]}``) are fetched by id and recursed into, + accumulating an ``{executorId}~{ordinal}~`` prefix so a request deep in the tree + carries its full path and a node with several children this superstep keeps each + child distinctly addressable. Child instances come from the trusted parent + status, so no per-child ownership check is applied (the caller validated the + top-level instance). 
""" gathered: list[tuple[str, dict[str, Any]]] = [] @@ -640,23 +664,31 @@ async def _gather_pending_hitl_requests( if isinstance(pending, dict): for req_id, req_data in cast("dict[str, Any]", pending).items(): if isinstance(req_data, dict): - gathered.append((f"{prefix}{req_id}", cast("dict[str, Any]", req_data))) + typed = cast("dict[str, Any]", req_data) + # Use the request's own id field (the event name the orchestrator + # waits on), falling back to the map key; the durabletask client + # qualifies against the same value so a qualified id round-trips. + bare_id = typed.get("request_id", req_id) + gathered.append((f"{prefix}{bare_id}", typed)) subworkflows = custom_status.get("subworkflows") if isinstance(subworkflows, dict): - for executor_id, child_instance_id in cast("dict[str, Any]", subworkflows).items(): - if not isinstance(child_instance_id, str): - continue - child_status = await client.get_status(child_instance_id) - child_custom = child_status.custom_status if child_status else None - if isinstance(child_custom, dict): - gathered.extend( - await self._gather_pending_hitl_requests( - client, - cast("dict[str, Any]", child_custom), - prefix=f"{prefix}{executor_id}{SUBWORKFLOW_REQUEST_SEPARATOR}", + sep = SUBWORKFLOW_REQUEST_SEPARATOR + for executor_id, child_ids in cast("dict[str, Any]", subworkflows).items(): + children: list[Any] = cast("list[Any]", child_ids) if isinstance(child_ids, list) else [] + for ordinal, child_instance_id in enumerate(children): + if not isinstance(child_instance_id, str): + continue + child_status = await client.get_status(child_instance_id) + child_custom = child_status.custom_status if child_status else None + if isinstance(child_custom, dict): + gathered.extend( + await self._gather_pending_hitl_requests( + client, + cast("dict[str, Any]", child_custom), + prefix=f"{prefix}{executor_id}{sep}{ordinal}{sep}", + ) ) - ) return gathered @@ -668,16 +700,18 @@ async def _resolve_hitl_target( ) -> tuple[str, str] | None: 
"""Resolve a possibly-qualified request id to ``(owningInstanceId, bareRequestId)``. - An unqualified id targets ``instance_id`` directly. A qualified id - ``{executorId}::{rest}`` addresses a nested sub-workflow: the executor's child - instance id is read from this instance's ``subworkflows`` custom-status map and - the remainder resolved recursively. Returns ``None`` when a referenced - sub-workflow is not currently active (so the caller can return "not found"). + An unqualified id (no well-formed hop) targets ``instance_id`` directly. A + qualified id ``{executorId}~{ordinal}~{rest}`` addresses a nested sub-workflow: + the executor's child instance id is read from this instance's ``subworkflows`` + custom-status map (a list selected by ``ordinal``) and the remainder resolved + recursively. Returns ``None`` when a referenced sub-workflow child is not + currently active (so the caller can return "not found"). """ - if SUBWORKFLOW_REQUEST_SEPARATOR not in request_id: + hop = split_subworkflow_request_id(request_id) + if hop is None: return instance_id, request_id - executor_id, remainder = request_id.split(SUBWORKFLOW_REQUEST_SEPARATOR, 1) + executor_id, ordinal, remainder = hop status = await client.get_status(instance_id) custom_status = status.custom_status if status else None if not isinstance(custom_status, dict): @@ -685,7 +719,13 @@ async def _resolve_hitl_target( subworkflows = cast("dict[str, Any]", custom_status).get("subworkflows") if not isinstance(subworkflows, dict): return None - child_instance_id = cast("dict[str, Any]", subworkflows).get(executor_id) + children_raw = cast("dict[str, Any]", subworkflows).get(executor_id) + if not isinstance(children_raw, list): + return None + children = cast("list[Any]", children_raw) + if ordinal < 0 or ordinal >= len(children): + return None + child_instance_id = children[ordinal] if not isinstance(child_instance_id, str): return None return await self._resolve_hitl_target(client, child_instance_id, remainder) 
diff --git a/python/packages/azurefunctions/tests/integration_tests/conftest.py b/python/packages/azurefunctions/tests/integration_tests/conftest.py index 7e17b84cfb5..79bc8a0ac36 100644 --- a/python/packages/azurefunctions/tests/integration_tests/conftest.py +++ b/python/packages/azurefunctions/tests/integration_tests/conftest.py @@ -338,7 +338,12 @@ def _load_and_validate_env(sample_path: Path) -> None: "DURABLE_TASK_SCHEDULER_CONNECTION_STRING", "FUNCTIONS_WORKER_RUNTIME", ] - if sample_path.name == "11_workflow_parallel": + # Samples that host no AI agents need no model credentials (only the DTS emulator + # and Azurite). The suite-level gate still requires *some* LLM config to be present. + no_llm_samples = {"13_subworkflow_hitl"} + if sample_path.name in no_llm_samples: + pass + elif sample_path.name == "11_workflow_parallel": required_env_vars.extend(["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_MODEL"]) else: required_env_vars.extend(["FOUNDRY_PROJECT_ENDPOINT", "FOUNDRY_MODEL"]) diff --git a/python/packages/azurefunctions/tests/integration_tests/test_13_workflow_subworkflow_hitl.py b/python/packages/azurefunctions/tests/integration_tests/test_13_workflow_subworkflow_hitl.py new file mode 100644 index 00000000000..1b94eba3b35 --- /dev/null +++ b/python/packages/azurefunctions/tests/integration_tests/test_13_workflow_subworkflow_hitl.py @@ -0,0 +1,150 @@ +# Copyright (c) Microsoft. All rights reserved. +""" +Integration Tests for the Sub-workflow HITL Sample (13_subworkflow_hitl) + +Tests nested human-in-the-loop through the Azure Functions host: the HITL pause +lives inside an inner workflow embedded via ``WorkflowExecutor``, so the pending +request surfaces at the top-level instance with a **qualified** request id +(``review_sub~0~{requestId}``). The caller responds against the top-level instance +and the host routes it to the owning child orchestration. 
+ +This sample hosts no AI agents, so it exercises the AF nested-HITL plumbing +deterministically (no model latency / variability). + +The function app is automatically started by the test fixture. + +Prerequisites: +- Azurite running for durable orchestrations +- Durable Task Scheduler emulator running on localhost:8080 + +Usage: + uv run pytest packages/azurefunctions/tests/integration_tests/test_13_workflow_subworkflow_hitl.py -v +""" + +import time + +import pytest + +# Module-level markers - applied to all tests in this file +pytestmark = [ + pytest.mark.flaky, + pytest.mark.integration, + pytest.mark.sample("13_subworkflow_hitl"), + pytest.mark.usefixtures("function_app_for_test"), +] + +# Must match the outer workflow name in samples/.../13_subworkflow_hitl/function_app.py +WORKFLOW_NAME = "moderation_pipeline" +# The WorkflowExecutor node id that embeds the inner HITL workflow. +SUBWORKFLOW_NODE_ID = "review_sub" + + +@pytest.mark.orchestration +class TestSubworkflowHITL: + """Tests for the 13_subworkflow_hitl sample (nested HITL behind one surface).""" + + @pytest.fixture(autouse=True) + def _setup(self, base_url: str, sample_helper) -> None: + """Provide the helper and base URL for each test.""" + self.base_url = base_url + self.helper = sample_helper + + def _wait_for_hitl_request(self, instance_id: str, timeout: int = 40) -> dict: + """Poll the top-level status endpoint until a (nested) HITL request appears.""" + start_time = time.time() + while time.time() - start_time < timeout: + status_response = self.helper.get(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/status/{instance_id}") + if status_response.status_code == 200: + status = status_response.json() + if status.get("pendingHumanInputRequests"): + return status + time.sleep(2) + raise AssertionError(f"Timed out waiting for a nested HITL request for instance {instance_id}") + + def _start(self, payload: dict) -> dict: + """Start the outer workflow and return the run response JSON.""" + response = 
self.helper.post_json(f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/run", payload) + assert response.status_code == 202 + return response.json() + + def test_nested_request_surfaces_with_qualified_id(self) -> None: + """The nested pending request is surfaced with a ``review_sub~0~{id}`` qualified id.""" + data = self._start({ + "content_id": "article-100", + "title": "Quarterly Roadmap", + "body": "A summary of the upcoming features planned for the next quarter.", + }) + instance_id = data["instanceId"] + + status = self._wait_for_hitl_request(instance_id) + pending = status.get("pendingHumanInputRequests", []) + assert len(pending) == 1 + request_id = pending[0]["requestId"] + + # The qualifier carries the node id and the child's ordinal (0 for the single + # dispatch), then the inner bare request id: ``review_sub~0~{requestId}``. + expected_prefix = f"{SUBWORKFLOW_NODE_ID}~0~" + assert request_id.startswith(expected_prefix), request_id + assert request_id[len(expected_prefix) :] # non-empty inner id + + # The respondUrl always targets the top-level instance. + assert f"/api/workflow/{WORKFLOW_NAME}/respond/{instance_id}/" in pending[0]["respondUrl"] + + # Drain the pause so the instance does not hang. + approve = self.helper.post_json( + f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/respond/{instance_id}/{request_id}", + {"approved": True, "reviewer_notes": "ok"}, + ) + assert approve.status_code == 200 + self.helper.wait_for_orchestration(data["statusQueryGetUri"]) + + def test_nested_hitl_approval(self) -> None: + """Responding 'approved' to the nested request resumes the outer workflow to APPROVED.""" + data = self._start({ + "content_id": "article-001", + "title": "Introduction to AI in Healthcare", + "body": ( + "Artificial intelligence is improving healthcare by enabling faster diagnosis, " + "personalized treatment plans, and better patient outcomes." 
+ ), + }) + instance_id = data["instanceId"] + + status = self._wait_for_hitl_request(instance_id) + request_id = status["pendingHumanInputRequests"][0]["requestId"] + + approval = self.helper.post_json( + f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/respond/{instance_id}/{request_id}", + {"approved": True, "reviewer_notes": "Looks good."}, + ) + assert approval.status_code == 200 + + final_status = self.helper.wait_for_orchestration(data["statusQueryGetUri"]) + assert final_status["runtimeStatus"] == "Completed" + assert "APPROVED" in str(final_status.get("output")).upper() + + def test_nested_hitl_rejection(self) -> None: + """Responding 'rejected' to the nested request resumes the outer workflow to REJECTED.""" + data = self._start({ + "content_id": "article-002", + "title": "Get Rich Quick", + "body": "Click here NOW to make $10,000 overnight! GUARANTEED! Limited time offer!", + }) + instance_id = data["instanceId"] + + status = self._wait_for_hitl_request(instance_id) + request_id = status["pendingHumanInputRequests"][0]["requestId"] + + rejection = self.helper.post_json( + f"{self.base_url}/api/workflow/{WORKFLOW_NAME}/respond/{instance_id}/{request_id}", + {"approved": False, "reviewer_notes": "Violates content policy."}, + ) + assert rejection.status_code == 200 + + final_status = self.helper.wait_for_orchestration(data["statusQueryGetUri"]) + assert final_status["runtimeStatus"] == "Completed" + assert "REJECTED" in str(final_status.get("output")).upper() + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/python/packages/azurefunctions/tests/test_app.py b/python/packages/azurefunctions/tests/test_app.py index a2fe2c8d589..bedd2cc7117 100644 --- a/python/packages/azurefunctions/tests/test_app.py +++ b/python/packages/azurefunctions/tests/test_app.py @@ -1621,6 +1621,46 @@ def test_nested_workflow_with_invalid_name_is_rejected(self) -> None: ): AgentFunctionApp(workflow=outer) + def 
test_different_subworkflow_sharing_a_name_is_rejected(self) -> None: + """Two different sub-workflow instances that share a name collide and are rejected.""" + inner_a, _ = self._inner_agent_wf("shared", "agent_node") + inner_b, _ = self._inner_agent_wf("shared", "other_node") # different instance, same name + from agent_framework import WorkflowExecutor + + sub_a = Mock(spec=WorkflowExecutor) + sub_a.id = "a" + sub_a.workflow = inner_a + sub_b = Mock(spec=WorkflowExecutor) + sub_b.id = "b" + sub_b.workflow = inner_b + outer = Mock() + outer.name = "outer" + outer.executors = {"a": sub_a, "b": sub_b} + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + pytest.raises(ValueError, match="different workflow"), + ): + AgentFunctionApp(workflow=outer) + + def test_executor_id_with_reserved_separator_is_rejected(self) -> None: + """An executor id containing the nested-HITL separator is rejected at registration.""" + from agent_framework import Executor + + ex = Mock(spec=Executor) + ex.id = "bad~id" + wf = Mock() + wf.name = "orders" + wf.executors = {"bad~id": ex} + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + pytest.raises(ValueError, match="reserved sub-workflow request separator"), + ): + AgentFunctionApp(workflow=wf) + # NOTE: State snapshot/diff tests were moved to durabletask once the activity # execution body was extracted into the host-agnostic execute_workflow_activity. @@ -1676,7 +1716,7 @@ class TestWorkflowOrchestrationScoping: Both endpoints address durable instances by ID only, but the durable client resolves IDs across every orchestration in the task hub (agent entities, user-registered - orchestrations, other apps on the same hub). ``_is_workflow_orchestration`` gates the + orchestrations, other apps on the same hub). 
``_is_owned_orchestration`` gates the endpoints so a leaked instance ID for a different orchestration is treated as "not found" instead of leaking its status/HITL details or accepting injected events. """ @@ -1780,25 +1820,25 @@ async def test_gather_qualifies_nested_requests(self) -> None: client = self._client({"child-1": {"pending_requests": {"inner-1": {"source_executor_id": "inner"}}}}) parent_status = { "pending_requests": {"top-1": {"source_executor_id": "outer"}}, - "subworkflows": {"sub": "child-1"}, + "subworkflows": {"sub": ["child-1"]}, } gathered = await app._gather_pending_hitl_requests(client, parent_status) ids = {qid for qid, _ in gathered} - assert ids == {"top-1", "sub::inner-1"} + assert ids == {"top-1", "sub~0~inner-1"} async def test_gather_accumulates_deep_path(self) -> None: app = self._app() client = self._client({ - "child-1": {"subworkflows": {"leaf": "child-2"}}, + "child-1": {"subworkflows": {"leaf": ["child-2"]}}, "child-2": {"pending_requests": {"deep": {"source_executor_id": "leaf_node"}}}, }) - parent_status = {"subworkflows": {"mid": "child-1"}} + parent_status = {"subworkflows": {"mid": ["child-1"]}} gathered = await app._gather_pending_hitl_requests(client, parent_status) - assert [qid for qid, _ in gathered] == ["mid::leaf::deep"] + assert [qid for qid, _ in gathered] == ["mid~0~leaf~0~deep"] async def test_resolve_unqualified_targets_same_instance(self) -> None: app = self._app() @@ -1810,20 +1850,20 @@ async def test_resolve_unqualified_targets_same_instance(self) -> None: async def test_resolve_qualified_targets_child_instance(self) -> None: app = self._app() - client = self._client({"parent": {"subworkflows": {"sub": "child-1"}}}) + client = self._client({"parent": {"subworkflows": {"sub": ["child-1"]}}}) - resolved = await app._resolve_hitl_target(client, "parent", "sub::req-9") + resolved = await app._resolve_hitl_target(client, "parent", "sub~0~req-9") assert resolved == ("child-1", "req-9") async def 
test_resolve_deeply_qualified_targets_leaf(self) -> None: app = self._app() client = self._client({ - "parent": {"subworkflows": {"mid": "child-1"}}, - "child-1": {"subworkflows": {"leaf": "child-2"}}, + "parent": {"subworkflows": {"mid": ["child-1"]}}, + "child-1": {"subworkflows": {"leaf": ["child-2"]}}, }) - resolved = await app._resolve_hitl_target(client, "parent", "mid::leaf::deep") + resolved = await app._resolve_hitl_target(client, "parent", "mid~0~leaf~0~deep") assert resolved == ("child-2", "deep") @@ -1831,10 +1871,48 @@ async def test_resolve_unknown_subworkflow_returns_none(self) -> None: app = self._app() client = self._client({"parent": {"state": "running"}}) # no subworkflows map - resolved = await app._resolve_hitl_target(client, "parent", "sub::req-9") + resolved = await app._resolve_hitl_target(client, "parent", "sub~0~req-9") assert resolved is None + async def test_multiple_children_of_one_executor_stay_addressable(self) -> None: + app = self._app() + client = self._client({ + "parent": {"subworkflows": {"sub": ["child-1", "child-2"]}}, + "child-1": {"pending_requests": {"r1": {"source_executor_id": "a"}}}, + "child-2": {"pending_requests": {"r2": {"source_executor_id": "b"}}}, + }) + parent_status = {"subworkflows": {"sub": ["child-1", "child-2"]}} + + gathered = await app._gather_pending_hitl_requests(client, parent_status) + assert {qid for qid, _ in gathered} == {"sub~0~r1", "sub~1~r2"} + + # The second child (ordinal 1) resolves distinctly, not shadowed by the first. 
+ resolved = await app._resolve_hitl_target(client, "parent", "sub~1~r2") + assert resolved == ("child-2", "r2") + + async def test_nested_double_colon_leaf_round_trips(self) -> None: + app = self._app() + client = self._client({ + "parent": {"subworkflows": {"sub": ["child-1"]}}, + "child-1": {"pending_requests": {"auto::0": {"request_id": "auto::0", "source_executor_id": "fn"}}}, + }) + parent_status = {"subworkflows": {"sub": ["child-1"]}} + + gathered = await app._gather_pending_hitl_requests(client, parent_status) + assert [qid for qid, _ in gathered] == ["sub~0~auto::0"] + + resolved = await app._resolve_hitl_target(client, "parent", "sub~0~auto::0") + assert resolved == ("child-1", "auto::0") + + async def test_top_level_double_colon_leaf_is_not_nested(self) -> None: + app = self._app() + client = self._client({}) + + resolved = await app._resolve_hitl_target(client, "parent", "auto::0") + + assert resolved == ("parent", "auto::0") + if __name__ == "__main__": pytest.main([__file__, "-v", "--tb=short"]) diff --git a/python/packages/durabletask/agent_framework_durabletask/__init__.py b/python/packages/durabletask/agent_framework_durabletask/__init__.py index 3b4f185bdcd..efc6dda37ef 100644 --- a/python/packages/durabletask/agent_framework_durabletask/__init__.py +++ b/python/packages/durabletask/agent_framework_durabletask/__init__.py @@ -58,6 +58,7 @@ from ._workflows.naming import ( DURABLE_NAME_PREFIX, is_auto_generated_workflow_name, + validate_executor_id, validate_workflow_name, workflow_name_from_orchestrator, workflow_orchestrator_name, @@ -135,6 +136,7 @@ "plan_workflow_registration", "run_agent_coroutine", "run_workflow_orchestrator", + "validate_executor_id", "validate_workflow_name", "workflow_name_from_orchestrator", "workflow_orchestrator_name", diff --git a/python/packages/durabletask/agent_framework_durabletask/_worker.py b/python/packages/durabletask/agent_framework_durabletask/_worker.py index 8ef949bf93b..db36b68afb5 100644 --- 
a/python/packages/durabletask/agent_framework_durabletask/_worker.py +++ b/python/packages/durabletask/agent_framework_durabletask/_worker.py @@ -22,6 +22,7 @@ from ._workflows.activity import execute_workflow_activity from ._workflows.dt_context import DurableTaskWorkflowContext from ._workflows.naming import ( + validate_executor_id, validate_workflow_name, workflow_executor_activity_name, workflow_orchestrator_name, @@ -89,8 +90,10 @@ def __init__( self._registered_agents: dict[str, SupportsAgentRun] = {} self._workflows: dict[str, Workflow] = {} # Every workflow whose orchestration has been registered (top-level plus nested - # sub-workflows), so a sub-workflow shared across the tree is registered once. - self._registered_orchestrations: set[str] = set() + # sub-workflows), keyed by name -> the registered instance, so a sub-workflow + # shared across the tree is registered once while two different workflows that + # collide on a name are rejected. + self._registered_orchestrations: dict[str, Workflow] = {} logger.debug("[DurableAIAgentWorker] Initialized with worker type: %s", type(worker).__name__) def add_agent( @@ -229,18 +232,28 @@ def configure_workflow( raise ValueError(f"Workflow '{workflow_name}' is already registered on this worker.") # Validate the whole composition (top-level plus every nested sub-workflow) - # up front, so an invalid/auto-generated nested name fails before any + # up front, so an invalid/auto-generated nested name (or an executor id that + # would break durable naming / nested-HITL addressing) fails before any # registration side effects leave the worker partially configured. 
hosted_workflows = list(collect_hosted_workflows(workflow)) for hosted in hosted_workflows: validate_workflow_name(hosted.name) + for executor_id in hosted.executors: + validate_executor_id(executor_id) self._workflows[workflow_name] = workflow # Register the top-level workflow and every nested sub-workflow (deduped by # name), so the parent can drive sub-workflows as durable child orchestrations. for hosted in hosted_workflows: - if hosted.name in self._registered_orchestrations: + existing = self._registered_orchestrations.get(hosted.name) + if existing is not None: + if existing is not hosted: + raise ValueError( + f"A different workflow named '{hosted.name}' is already registered on this " + f"worker. A workflow name maps to a single durable orchestration " + f"('dafx-{hosted.name}'); rename one of them." + ) continue self._register_single_workflow(hosted, callback) @@ -256,7 +269,7 @@ def _register_single_workflow( via ``plan_workflow_registration``. """ validate_workflow_name(workflow.name) - self._registered_orchestrations.add(workflow.name) + self._registered_orchestrations[workflow.name] = workflow plan = plan_workflow_registration(workflow) # Register agent executors as durable entities, scoped by workflow name so diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py index 4d142613392..8d0db527298 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py @@ -19,8 +19,17 @@ from agent_framework import WorkflowEvent from durabletask.client import TaskHubGrpcClient -from .naming import SUBWORKFLOW_REQUEST_SEPARATOR, workflow_orchestrator_name -from .serialization import deserialize_workflow_event, deserialize_workflow_output, strip_pickle_markers +from .naming import ( + qualify_subworkflow_request_id, + 
split_subworkflow_request_id, + workflow_orchestrator_name, +) +from .serialization import ( + deserialize_workflow_event, + deserialize_workflow_output, + strip_pickle_markers, + strip_subworkflow_markers, +) logger = logging.getLogger("agent_framework.durabletask") @@ -112,7 +121,11 @@ def start_workflow( orchestration_name = workflow_orchestrator_name(self._resolve_workflow_name(workflow_name)) new_instance_id = self._client.schedule_new_orchestration( orchestration_name, - input=input, + # Neutralize a forged sub-workflow envelope before scheduling: only an + # internal child dispatch (post trust boundary) may carry those reserved + # keys, so stripping them here keeps untrusted input off the orchestrator's + # trusted-deserialization path even if start_workflow is exposed remotely. + input=strip_subworkflow_markers(input), instance_id=instance_id, ) logger.debug("[DurableWorkflowClient] Started workflow instance: %s", new_instance_id) @@ -344,8 +357,8 @@ def get_pending_hitl_requests(self, instance_id: str, *, workflow_name: str | No Note: Requests originating in a nested sub-workflow are included with a - **qualified** ``request_id`` (``{executorId}::{requestId}``, nested for - deeper levels). Pass that qualified id straight back to + **qualified** ``request_id`` (``{executorId}~{ordinal}~{requestId}``, nested + for deeper levels). Pass that qualified id straight back to :meth:`send_hitl_response`; it is routed to the owning child orchestration automatically, so the caller only ever addresses the top-level instance. """ @@ -361,10 +374,12 @@ def _collect_pending_hitl_requests(self, serialized_custom_status: str) -> list[ """Collect an orchestration's pending requests plus any nested sub-workflow ones. Nested requests (discovered via the ``subworkflows`` map the parent records in - its custom status) are qualified with the owning executor id so deeper requests - accumulate a full ``{executorId}::{...}::{requestId}`` path. 
Child instances are - reached directly by id (already trusted, having come from the parent's status), - so no per-child ownership check is applied. + its custom status as ``{executorId: [childInstanceId, ...]}``) are qualified by + ``(executorId, ordinal)`` so deeper requests accumulate a full + ``{executorId}~{ordinal}~...~{requestId}`` path and a node with several children + keeps each one addressable. Child instances are reached directly by id (already + trusted, having come from the parent's status), so no per-child ownership check + is applied. """ try: custom_status = json.loads(serialized_custom_status) @@ -392,16 +407,20 @@ def _collect_pending_hitl_requests(self, serialized_custom_status: str) -> list[ subworkflows = status_dict.get("subworkflows") if isinstance(subworkflows, dict): - for executor_id, child_instance_id in cast(dict[str, str], subworkflows).items(): - if not isinstance(child_instance_id, str): - continue - child_state = self._client.get_orchestration_state(child_instance_id) - if child_state is None or not child_state.serialized_custom_status: - continue - for child_req in self._collect_pending_hitl_requests(child_state.serialized_custom_status): - qualified = dict(child_req) - qualified["request_id"] = f"{executor_id}{SUBWORKFLOW_REQUEST_SEPARATOR}{child_req['request_id']}" - requests.append(qualified) + for executor_id, child_ids in cast(dict[str, Any], subworkflows).items(): + children: list[Any] = cast("list[Any]", child_ids) if isinstance(child_ids, list) else [] + for ordinal, child_instance_id in enumerate(children): + if not isinstance(child_instance_id, str): + continue + child_state = self._client.get_orchestration_state(child_instance_id) + if child_state is None or not child_state.serialized_custom_status: + continue + for child_req in self._collect_pending_hitl_requests(child_state.serialized_custom_status): + qualified = dict(child_req) + qualified["request_id"] = qualify_subworkflow_request_id( + executor_id, ordinal, 
child_req["request_id"] + ) + requests.append(qualified) return requests @@ -416,9 +435,9 @@ def send_hitl_response( Args: instance_id: The workflow instance ID. request_id: The pending request's ID (from ``get_pending_hitl_requests``). - May be a **qualified** id (``{executorId}::{requestId}``) for a request - that originated in a nested sub-workflow; it is routed to the owning - child orchestration automatically. + May be a **qualified** id (``{executorId}~{ordinal}~{requestId}``) for a + request that originated in a nested sub-workflow; it is routed to the + owning child orchestration automatically. response: The response payload (e.g. a dict matching the expected response type the executor's ``@response_handler`` expects). workflow_name: Optional workflow name; when set (or a client default is @@ -455,29 +474,32 @@ def send_hitl_response( def _resolve_hitl_target(self, instance_id: str, request_id: str) -> tuple[str, str]: """Resolve a possibly-qualified request id to ``(owning_instance_id, bare_request_id)``. - An unqualified id targets ``instance_id`` directly. A qualified id - ``{executorId}::{rest}`` addresses a nested sub-workflow: the executor's child - instance id is read from this instance's ``subworkflows`` custom-status map and - the remainder is resolved recursively, so arbitrarily deep nesting lands on the - leaf child orchestration and its bare request id. + An unqualified id (no well-formed hop) targets ``instance_id`` directly. A + qualified id ``{executorId}~{ordinal}~{rest}`` addresses a nested sub-workflow: + the executor's child instance id is read from this instance's ``subworkflows`` + custom-status map (a list selected by ``ordinal``) and the remainder is resolved + recursively, so arbitrarily deep nesting lands on the leaf child orchestration + and its bare request id. 
""" - if SUBWORKFLOW_REQUEST_SEPARATOR not in request_id: + hop = split_subworkflow_request_id(request_id) + if hop is None: return instance_id, request_id - executor_id, remainder = request_id.split(SUBWORKFLOW_REQUEST_SEPARATOR, 1) - child_instance_id = self._lookup_subworkflow_instance(instance_id, executor_id) + executor_id, ordinal, remainder = hop + child_instance_id = self._lookup_subworkflow_instance(instance_id, executor_id, ordinal) if child_instance_id is None: raise ValueError( - f"No active sub-workflow '{executor_id}' found for instance '{instance_id}' " - f"while routing HITL response for request '{request_id}'." + f"No active sub-workflow '{executor_id}' (ordinal {ordinal}) found for instance " + f"'{instance_id}' while routing HITL response for request '{request_id}'." ) return self._resolve_hitl_target(child_instance_id, remainder) - def _lookup_subworkflow_instance(self, instance_id: str, executor_id: str) -> str | None: - """Return the child orchestration instance id for ``executor_id``, if active. + def _lookup_subworkflow_instance(self, instance_id: str, executor_id: str, ordinal: int) -> str | None: + """Return the child orchestration instance id for ``(executor_id, ordinal)``, if active. - Reads the ``subworkflows`` map (``{executorId: childInstanceId}``) the parent - records in its custom status while dispatching sub-workflow nodes. + Reads the ``subworkflows`` map (``{executorId: [childInstanceId, ...]}``) the + parent records in its custom status while dispatching sub-workflow nodes, and + selects the child at ``ordinal`` (its dispatch order this superstep). 
""" state = self._client.get_orchestration_state(instance_id) if state is None or not state.serialized_custom_status: @@ -491,5 +513,11 @@ def _lookup_subworkflow_instance(self, instance_id: str, executor_id: str) -> st subworkflows = cast(dict[str, Any], custom_status).get("subworkflows") if not isinstance(subworkflows, dict): return None - child = cast(dict[str, Any], subworkflows).get(executor_id) + children_raw = cast(dict[str, Any], subworkflows).get(executor_id) + if not isinstance(children_raw, list): + return None + children = cast("list[Any]", children_raw) + if ordinal < 0 or ordinal >= len(children): + return None + child = children[ordinal] return child if isinstance(child, str) else None diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py index 03cbcb20c2c..273f2ed97e1 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py @@ -30,8 +30,12 @@ __all__ = [ "DURABLE_NAME_PREFIX", + "MAX_EXECUTOR_ID_LENGTH", "SUBWORKFLOW_REQUEST_SEPARATOR", "is_auto_generated_workflow_name", + "qualify_subworkflow_request_id", + "split_subworkflow_request_id", + "validate_executor_id", "validate_workflow_name", "workflow_executor_activity_name", "workflow_name_from_orchestrator", @@ -44,14 +48,27 @@ # ``AgentSessionId.ENTITY_NAME_PREFIX``. DURABLE_NAME_PREFIX = "dafx-" -# Separator joining an executor id to a (possibly already-qualified) request id when -# a nested sub-workflow's pending HITL request is bubbled up to the top-level instance -# (B2 single-surface addressing: ``{executorId}::{requestId}``). Both hosts and the -# client must agree on this so a qualified id round-trips: the read side qualifies an -# inner request with it; the respond side splits on it to route the response to the -# owning child orchestration. 
Executor ids and framework request ids do not contain -# this sequence, so the split is unambiguous. -SUBWORKFLOW_REQUEST_SEPARATOR = "::" +# Separator used to qualify a nested sub-workflow's pending HITL request when it is +# bubbled up to the top-level instance (B2 single-surface addressing). A qualified id +# is a path of ``{executorId}~{ordinal}`` hops ending in the leaf's bare request id, +# e.g. ``review~0~approve~1~``. Both hosts and the client must agree on it +# so a qualified id round-trips: the read side prepends hops; the respond side peels +# them to route the response to the owning child orchestration. +# +# ``~`` (RFC 3986 "unreserved", so URL-path-safe) is deliberately **not** ``::``: +# core emits ``auto::{index}`` request ids for functional ``@workflow`` HITL, so a +# ``::`` separator would mis-parse those leaf ids. ``~`` does not appear in core +# request ids (uuid4 or ``auto::N``); executor ids are validated to exclude it (see +# :func:`validate_executor_id`), so only the structural hops carry the separator. +SUBWORKFLOW_REQUEST_SEPARATOR = "~" + +# Upper bound on an executor id's length when a workflow is hosted durably. The id is +# interpolated into durable activity/entity names (``dafx-{workflow}-{executor}``) and, +# for sub-workflow nodes, into recursively-nested child orchestration instance ids +# (``{parent}::{executor}::{n}``). Capping it keeps those derived strings within typical +# durable backend name/id limits; combined with the workflow-name cap, the worst-case +# instance id stays bounded even for deeply-nested sub-workflows. +MAX_EXECUTOR_ID_LENGTH = 128 # A workflow name is interpolated into durable orchestration/activity/entity names # *and* into HTTP route segments (``workflow/{workflowName}/run``), so it must be @@ -189,3 +206,95 @@ def is_auto_generated_workflow_name(workflow_name: str) -> bool: ``True`` if the name matches the auto-generated pattern. 
""" return bool(_AUTO_GENERATED_NAME_RE.match(workflow_name)) + + +def validate_executor_id(executor_id: str) -> None: + """Validate that an executor id is safe to host durably. + + An executor id is interpolated into durable activity/entity names and, for + sub-workflow nodes, into nested child-orchestration instance ids and the + qualified ids used to address nested human-in-the-loop requests. Two properties + must hold: + + * It must not contain :data:`SUBWORKFLOW_REQUEST_SEPARATOR`. That sequence + separates the structural hops of a qualified nested-HITL request id, so an id + containing it would make a qualified id ambiguous and mis-route a response. + * It must be at most :data:`MAX_EXECUTOR_ID_LENGTH` characters, so the durable + names and (recursively nested) instance ids derived from it stay within typical + durable backend limits. + + Args: + executor_id: The executor's id within a hosted workflow. + + Raises: + ValueError: If the id is empty, contains the reserved separator, or is too + long. + """ + if not executor_id: + raise ValueError("Executor id must be a non-empty string.") + if SUBWORKFLOW_REQUEST_SEPARATOR in executor_id: + raise ValueError( + f"Executor id '{executor_id}' contains the reserved sub-workflow request separator " + f"'{SUBWORKFLOW_REQUEST_SEPARATOR}', which is used to address nested human-in-the-loop " + "requests. Rename the executor so its id does not contain that sequence." + ) + if len(executor_id) > MAX_EXECUTOR_ID_LENGTH: + raise ValueError( + f"Executor id '{executor_id[:32]}...' is too long ({len(executor_id)} > " + f"{MAX_EXECUTOR_ID_LENGTH}). Durable activity/entity names and nested instance ids are " + "derived from it; use a shorter id." + ) + + +def qualify_subworkflow_request_id(executor_id: str, ordinal: int, inner_request_id: str) -> str: + """Prepend one sub-workflow hop to a (possibly already-qualified) request id. + + Produces ``{executor_id}~{ordinal}~{inner_request_id}``. 
``ordinal`` selects the + specific child orchestration among several a single ``WorkflowExecutor`` node may + dispatch in one superstep, so two children of the same executor stay distinctly + addressable. ``inner_request_id`` is the child's bare leaf request id or its own + already-qualified path for deeper nesting. + + Args: + executor_id: The sub-workflow node's executor id (separator-free; see + :func:`validate_executor_id`). + ordinal: The child's index in the parent's ``subworkflows`` status list. + inner_request_id: The request id (bare or qualified) within the child. + + Returns: + The qualified request id one level higher. + """ + sep = SUBWORKFLOW_REQUEST_SEPARATOR + return f"{executor_id}{sep}{ordinal}{sep}{inner_request_id}" + + +def split_subworkflow_request_id(request_id: str) -> tuple[str, int, str] | None: + """Peel the outermost sub-workflow hop off a qualified request id. + + The inverse of :func:`qualify_subworkflow_request_id` for a single level. + Returns ``(executor_id, ordinal, remainder)`` where ``remainder`` is the still + (possibly) qualified id one level deeper, or ``None`` when ``request_id`` carries + no well-formed hop -- i.e. it is a bare leaf request id that targets the current + instance directly. A leaf id may itself contain the separator (e.g. core's + ``auto::N`` does not, but a custom id could); because only structural hops use the + ``{executor}~{int-ordinal}~`` shape, a value whose second segment is not an integer + is treated as a bare leaf rather than a hop. + + Args: + request_id: A bare or qualified request id. + + Returns: + ``(executor_id, ordinal, remainder)`` for a qualified id, else ``None``. 
+ """ + sep = SUBWORKFLOW_REQUEST_SEPARATOR + if sep not in request_id: + return None + parts = request_id.split(sep, 2) + if len(parts) < 3: + return None + executor_id, ordinal_str, remainder = parts + try: + ordinal = int(ordinal_str) + except ValueError: + return None + return executor_id, ordinal, remainder diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py index 864a190bcf3..7854390ff4a 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py @@ -52,6 +52,7 @@ from .context import WorkflowOrchestrationContext from .naming import workflow_executor_activity_name, workflow_orchestrator_name, workflow_scoped_executor_id from .serialization import ( + SUBWORKFLOW_INPUT_KEY, deserialize_value, reconstruct_to_type, resolve_type, @@ -71,15 +72,15 @@ SOURCE_HITL_RESPONSE = "__hitl_response__" # A WorkflowExecutor node runs its inner workflow as a durable child orchestration. -# The parent passes the node's input wrapped in this marker so the child orchestrator -# can tell a trusted sub-orchestration payload (serialized by the parent) apart from -# untrusted top-level client input, and can track nesting depth to bound recursion. -SUBWORKFLOW_INPUT_KEY = "__subworkflow_input__" -SUBWORKFLOW_DEPTH_KEY = "__subworkflow_depth__" - -# Maximum sub-workflow nesting depth. Mutually-nested workflows (A hosts B hosts A) -# would otherwise spawn child orchestrations without bound; this caps the tree. -MAX_SUBWORKFLOW_DEPTH = 25 +# The parent wraps the node's input in SUBWORKFLOW_INPUT_KEY (defined alongside the +# trust-boundary sanitizer in serialization.py) so the child orchestrator can tell a +# trusted sub-orchestration payload apart from untrusted top-level client input. 
+# +# Nesting is intentionally *not* capped by a depth counter: a workflow graph cannot +# express unbounded recursion (a WorkflowExecutor wraps a concrete Workflow instance, +# so the nesting tree is finite and fixed at build time), and the recursively-derived +# child instance ids grow with depth, so the durable backend's instance-id length +# limit is the natural ceiling for any pathological construction. # Name of the auto-generated orchestrator registered by # ``DurableAIAgentWorker.configure_workflow`` (and the Azure Functions host). @@ -295,20 +296,16 @@ def _prepare_subworkflow_task( executor: WorkflowExecutor, message: Any, child_instance_id: str, - depth: int, ) -> Any: """Prepare a child-orchestration task that runs a ``WorkflowExecutor``'s inner workflow. The inner workflow runs as its own durable orchestration (``dafx-{innerName}``), so its executors are independently durable/observable. The node's message is serialized and wrapped in a marker so the child orchestrator reconstructs the - original typed object (trusted internal input) and tracks nesting depth. + original typed object (trusted internal input). """ inner_orchestration_name = workflow_orchestrator_name(executor.workflow.name) - child_input = { - SUBWORKFLOW_INPUT_KEY: serialize_value(message), - SUBWORKFLOW_DEPTH_KEY: depth + 1, - } + child_input = {SUBWORKFLOW_INPUT_KEY: serialize_value(message)} return ctx.call_sub_orchestrator(inner_orchestration_name, child_input, instance_id=child_instance_id) @@ -757,7 +754,6 @@ def _prepare_all_tasks( workflow: Workflow, pending_messages: dict[str, list[tuple[Any, str]]], shared_state: dict[str, Any] | None, - depth: int, subworkflow_counter: list[int], ) -> tuple[list[Any], list[TaskMetadata], list[tuple[str, Any, str]]]: """Prepare all pending tasks for parallel execution. @@ -775,8 +771,6 @@ def _prepare_all_tasks( pending_messages: Messages to deliver this superstep, grouped by target executor id, each paired with its source executor id. 
shared_state: Optional dict for cross-executor state sharing. - depth: This orchestration's sub-workflow nesting depth, propagated to child - orchestrations so recursion can be bounded. subworkflow_counter: A single-element mutable counter, persistent across supersteps, used to derive unique deterministic child instance ids. """ @@ -802,7 +796,7 @@ def _prepare_all_tasks( child_instance_id = f"{ctx.instance_id}::{executor_id}::{subworkflow_counter[0]}" subworkflow_counter[0] += 1 logger.debug("Preparing sub-workflow task: %s -> %s", executor_id, child_instance_id) - task = _prepare_subworkflow_task(ctx, executor, message, child_instance_id, depth) + task = _prepare_subworkflow_task(ctx, executor, message, child_instance_id) all_tasks.append(task) task_metadata_list.append( TaskMetadata( @@ -878,26 +872,12 @@ def run_workflow_orchestrator( workflow: The MAF Workflow instance to execute. initial_message: Initial message to send to the start executor. When this workflow runs as a sub-workflow, this is the parent-supplied marker - payload (see :data:`SUBWORKFLOW_INPUT_KEY`), which also carries the - nesting depth. + payload (see :data:`SUBWORKFLOW_INPUT_KEY`). shared_state: Optional dict for cross-executor state sharing. Returns: List of workflow outputs collected from executor activities. """ - # When invoked as a child orchestration, the initial payload carries the nesting - # depth; bound recursion so mutually-nested workflows cannot spawn unbounded - # child orchestrations. (Top-level runs start at depth 0.) - depth = 0 - if isinstance(initial_message, dict) and SUBWORKFLOW_INPUT_KEY in initial_message: - marker = cast("dict[str, Any]", initial_message) - depth = int(marker.get(SUBWORKFLOW_DEPTH_KEY, 0) or 0) - if depth > MAX_SUBWORKFLOW_DEPTH: - raise RuntimeError( - f"Sub-workflow nesting exceeded the maximum depth of {MAX_SUBWORKFLOW_DEPTH} " - f"(workflow '{workflow.name}'). Check for mutually-nested workflows." 
- ) - pending_messages: dict[str, list[tuple[Any, str]]] = { workflow.start_executor_id: [(_coerce_initial_input(workflow, initial_message), SOURCE_WORKFLOW_START)] } @@ -942,7 +922,7 @@ def append_activity_events(activity_result: dict[str, Any] | None) -> None: def publish_live_status( state: str, pending_requests: dict[str, Any] | None = None, - subworkflows: dict[str, str] | None = None, + subworkflows: dict[str, list[str]] | None = None, ) -> None: # Publish only on live execution so events are not re-emitted on replay # (the custom status set during the first execution already persisted). @@ -956,10 +936,11 @@ def publish_live_status( status["events"] = live_events if pending_requests is not None: status["pending_requests"] = pending_requests - # Map of {executorId: childInstanceId} for sub-workflows dispatched this - # superstep. The parent is suspended in task_all while a child waits for human - # input, so recording the child ids here lets the read side discover and - # qualify nested pending requests (B2 single-surface HITL). + # Map of {executorId: [childInstanceId, ...]} for sub-workflows dispatched this + # superstep. A single WorkflowExecutor node can receive several messages in one + # superstep and dispatch one child each, so the value is a list indexed by + # dispatch order; the read side qualifies nested pending requests by + # (executorId, ordinal) so every child stays addressable (B2 single-surface HITL). if subworkflows: status["subworkflows"] = subworkflows ctx.set_custom_status(status) @@ -976,7 +957,7 @@ def publish_live_status( # Phase 1: Prepare all tasks all_tasks, task_metadata_list, remaining_agent_messages = _prepare_all_tasks( - ctx, workflow, pending_messages, shared_state, depth, subworkflow_counter + ctx, workflow, pending_messages, shared_state, subworkflow_counter ) # Agents and sub-workflows bypass the per-executor activity, so synthesize their @@ -996,11 +977,13 @@ def publish_live_status( # task_all. 
While a nested sub-workflow waits for human input, this parent # stays suspended here, so its custom status must already carry the child # ids for the read side to discover and qualify nested pending requests. - active_subworkflows = { - meta.executor_id: meta.child_instance_id - for meta in task_metadata_list - if meta.task_type == TaskType.SUBWORKFLOW and meta.child_instance_id is not None - } + # Grouped as {executorId: [childInstanceId, ...]} in dispatch order so a + # node that dispatches several children this superstep keeps each one + # addressable by its ordinal. + active_subworkflows: dict[str, list[str]] = {} + for meta in task_metadata_list: + if meta.task_type == TaskType.SUBWORKFLOW and meta.child_instance_id is not None: + active_subworkflows.setdefault(meta.executor_id, []).append(meta.child_instance_id) if active_subworkflows: publish_live_status("running", subworkflows=active_subworkflows) raw_results = yield ctx.task_all(all_tasks) diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py index 9fd260a2265..2e3ffa02f2a 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py @@ -102,22 +102,36 @@ def collect_hosted_workflows(workflow: Workflow) -> Iterator[Workflow]: A host registers the orchestration primitives for each yielded workflow so a parent orchestration can invoke its sub-workflows as child orchestrations. - Workflows are deduped by :attr:`Workflow.name`: a sub-workflow reused across - the tree (or shared by two top-level workflows) is yielded once. The top-level - ``workflow`` is yielded first. 
+ Workflows are deduped by :attr:`Workflow.name`: the *same* sub-workflow instance + reused across the tree (or shared by two top-level workflows) is yielded once, + which is the expected fan-out pattern. Two **different** workflow instances that + share a name are rejected, since both would resolve to one durable orchestration + (``dafx-{name}``) and silently shadow each other. The top-level ``workflow`` is + yielded first. Args: workflow: The top-level workflow to walk. Yields: Each distinct workflow in the nesting tree, parent before child. + + Raises: + ValueError: If two different workflow instances in the tree share a name. """ - seen: set[str] = set() + seen: dict[str, Workflow] = {} def _walk(current: Workflow) -> Iterator[Workflow]: - if current.name in seen: + existing = seen.get(current.name) + if existing is not None: + if existing is not current: + raise ValueError( + f"Two different workflows are named '{current.name}'. A workflow name maps to a " + f"single durable orchestration ('dafx-{current.name}'), so names must be unique " + "within a hosted composition. Rename one, or reuse the same Workflow instance if " + "they are meant to be the same sub-workflow." 
+ ) return - seen.add(current.name) + seen[current.name] = current yield current plan = plan_workflow_registration(current) for sub in plan.subworkflow_executors: diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/serialization.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/serialization.py index 0605d32fe92..51b0627a954 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/serialization.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/serialization.py @@ -107,6 +107,45 @@ def strip_pickle_markers(data: Any) -> Any: return data +# ============================================================================ +# Sub-workflow envelope markers (trust boundary) +# ============================================================================ + +# A WorkflowExecutor node runs its inner workflow as a durable child orchestration. +# The parent wraps the node's input in this envelope so the child orchestrator can +# tell a trusted sub-orchestration payload (serialized by the parent, post-boundary, +# via call_sub_orchestrator) apart from untrusted top-level client input. +SUBWORKFLOW_INPUT_KEY = "__subworkflow_input__" + + +def strip_subworkflow_markers(data: Any) -> Any: + """Remove the reserved sub-workflow envelope key from untrusted top-level input. + + The orchestrator treats a top-level input dict carrying :data:`SUBWORKFLOW_INPUT_KEY` + as a *trusted* child-orchestration payload and reconstructs it with + :func:`deserialize_value` (pickle) **without** the usual + :func:`strip_pickle_markers` sanitization, because a genuine envelope is only ever + built internally (post trust boundary) by ``call_sub_orchestrator``. If untrusted + client input could carry that key, an attacker could smuggle a pickle payload + straight into ``pickle.loads`` (RCE). 
+ + Hosts therefore call this on client-supplied workflow input *before* scheduling the + orchestration, so the only way the orchestrator ever sees the envelope is from a + real internal child dispatch. Only the top-level key is removed (that is the only + position where the orchestrator interprets it), leaving the rest of the caller's payload + untouched. + """ + if not isinstance(data, dict): + return data + typed = cast(dict[str, Any], data) + if SUBWORKFLOW_INPUT_KEY not in typed: + return typed + logger.debug("Stripped reserved sub-workflow envelope key from untrusted input.") + cleaned = typed.copy() + cleaned.pop(SUBWORKFLOW_INPUT_KEY, None) + return cleaned + + # ============================================================================ # Serialize / Deserialize # ============================================================================ diff --git a/python/packages/durabletask/tests/integration_tests/conftest.py b/python/packages/durabletask/tests/integration_tests/conftest.py index c6e88547a20..f9590f42827 100644 --- a/python/packages/durabletask/tests/integration_tests/conftest.py +++ b/python/packages/durabletask/tests/integration_tests/conftest.py @@ -354,6 +354,10 @@ def check_sample_env(request: pytest.FixtureRequest) -> None: pytest.fail("Test class must have @pytest.mark.sample() marker") sample_name = cast(str, sample_marker.args[0]) # type: ignore[union-attr] + # Samples that host no AI agents need no model credentials (only the DTS emulator). 
+ no_llm_samples = {"12_subworkflow_hitl"} + if sample_name in no_llm_samples: + return if sample_name == "06_multi_agent_orchestration_conditionals": required_vars = ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_MODEL"] else: diff --git a/python/packages/durabletask/tests/integration_tests/test_11_dt_subworkflow.py b/python/packages/durabletask/tests/integration_tests/test_11_dt_subworkflow.py new file mode 100644 index 00000000000..5a118e5c055 --- /dev/null +++ b/python/packages/durabletask/tests/integration_tests/test_11_dt_subworkflow.py @@ -0,0 +1,74 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Integration tests for the composed sub-workflow sample (11_subworkflow). + +Exercises workflow *composition* on a standalone durabletask worker: +- An outer ``review_pipeline`` embeds an inner ``sentiment_analysis`` workflow via a + ``WorkflowExecutor`` node (``sentiment_sub``). +- ``DurableAIAgentWorker.configure_workflow`` walks the composition and registers a + durable orchestration for each workflow; the inner workflow runs as a child + orchestration when the outer reaches the ``WorkflowExecutor`` node. +- The inner workflow's output (a sentiment summary) is forwarded to the outer + ``reporter`` executor, which produces the final result. + +The inner workflow hosts an AI agent, so these tests require model credentials. 
+""" + +import logging +from typing import Any + +import pytest + +from agent_framework_durabletask import DurableWorkflowClient + +logging.basicConfig(level=logging.WARNING) + +# Must match the outer workflow name in samples/04-hosting/durabletask/11_subworkflow/worker.py +WORKFLOW_NAME = "review_pipeline" + +# Module-level markers +pytestmark = [ + pytest.mark.flaky, + pytest.mark.integration, + pytest.mark.sample("11_subworkflow"), + pytest.mark.integration_test, + pytest.mark.requires_dts, + pytest.mark.requires_azure_openai, +] + + +class TestSubworkflowComposition: + """Composed (outer + inner) workflow execution on a standalone durabletask worker.""" + + @pytest.fixture(autouse=True) + def setup(self, workflow_client: DurableWorkflowClient) -> None: + """Bind the DurableWorkflowClient for the current sample worker.""" + self.client = workflow_client + + def _run(self, review: str) -> Any: + """Run the composed workflow with a review and return its final output.""" + instance_id = self.client.start_workflow(input=review, workflow_name=WORKFLOW_NAME) + return self.client.await_workflow_output(instance_id, workflow_name=WORKFLOW_NAME, timeout_seconds=180) + + def test_positive_review_runs_through_subworkflow(self) -> None: + """A positive review flows through the embedded sentiment sub-workflow to a report.""" + output = self._run( + "Absolutely love this espresso machine - it heats up fast and the coffee is consistently great." + ) + + assert output is not None + # The outer reporter wraps the inner sub-workflow's forwarded sentiment summary. + assert "sentiment" in str(output).lower() + + def test_negative_review_runs_through_subworkflow(self) -> None: + """A negative review also completes the composed pipeline end-to-end.""" + output = self._run( + "Disappointed. The device stopped working after two weeks and support never replied." 
+ ) + + assert output is not None + assert "sentiment" in str(output).lower() + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/python/packages/durabletask/tests/integration_tests/test_12_dt_subworkflow_hitl.py b/python/packages/durabletask/tests/integration_tests/test_12_dt_subworkflow_hitl.py new file mode 100644 index 00000000000..673be16bbee --- /dev/null +++ b/python/packages/durabletask/tests/integration_tests/test_12_dt_subworkflow_hitl.py @@ -0,0 +1,152 @@ +# Copyright (c) Microsoft. All rights reserved. + +"""Integration tests for the composed sub-workflow HITL sample (12_subworkflow_hitl). + +Exercises human-in-the-loop **inside a nested sub-workflow** on a standalone +durabletask worker: +- An outer ``moderation_pipeline`` embeds an inner ``human_review`` workflow via a + ``WorkflowExecutor`` node (``review_sub``); on the durable host the inner workflow + runs as a child orchestration. +- The inner ``review_gate`` pauses via ``request_info``. The pending request surfaces + at the top-level instance with a **qualified** id ``review_sub~0~{requestId}`` (the + ``~{ordinal}~`` hop addresses the specific child the node dispatched). +- The client responds with that qualified id against the *top-level* instance and the + host routes it to the owning child orchestration, resuming to an approved/rejected + outcome. + +This sample hosts **no AI agents**, so it needs only the DTS emulator (no model +credentials), which makes it a deterministic end-to-end check of the nested-HITL +addressing. 
+""" + +import logging +import time +from typing import Any + +import pytest + +from agent_framework_durabletask import DurableWorkflowClient +from agent_framework_durabletask._workflows.naming import SUBWORKFLOW_REQUEST_SEPARATOR + +logging.basicConfig(level=logging.WARNING) + +# Must match the outer workflow name in samples/04-hosting/durabletask/12_subworkflow_hitl/worker.py +WORKFLOW_NAME = "moderation_pipeline" +# The WorkflowExecutor node id that embeds the inner HITL workflow. +SUBWORKFLOW_NODE_ID = "review_sub" + +# Module-level markers. No requires_azure_openai: the sample hosts no agents. +pytestmark = [ + pytest.mark.flaky, + pytest.mark.integration, + pytest.mark.sample("12_subworkflow_hitl"), + pytest.mark.integration_test, + pytest.mark.requires_dts, +] + + +def _wait_for_hitl_request( + client: DurableWorkflowClient, instance_id: str, timeout_seconds: int = 90 +) -> list[dict[str, Any]]: + """Poll until the workflow (or a nested sub-workflow) records a pending HITL request.""" + deadline = time.time() + timeout_seconds + while time.time() < deadline: + pending = client.get_pending_hitl_requests(instance_id, workflow_name=WORKFLOW_NAME) + if pending: + return pending + time.sleep(2) + raise AssertionError(f"Timed out waiting for a nested HITL request on instance {instance_id}") + + +class TestSubworkflowHITL: + """Nested (sub-workflow) human-in-the-loop on a standalone durabletask worker.""" + + @pytest.fixture(autouse=True) + def setup(self, workflow_client: DurableWorkflowClient) -> None: + """Bind the DurableWorkflowClient for the current sample worker.""" + self.client = workflow_client + + def _run_case(self, submission: dict[str, Any], *, approve: bool) -> tuple[dict[str, Any], Any]: + """Start a moderation case, answer the nested HITL pause, return (request, output).""" + instance_id = self.client.start_workflow(input=submission, workflow_name=WORKFLOW_NAME) + + pending = _wait_for_hitl_request(self.client, instance_id) + request = pending[0] + 
+ self.client.send_hitl_response( + instance_id, + request["request_id"], + {"approved": approve, "reviewer_notes": "Looks good." if approve else "Violates content policy."}, + workflow_name=WORKFLOW_NAME, + ) + + output = self.client.await_workflow_output(instance_id, workflow_name=WORKFLOW_NAME, timeout_seconds=180) + return request, output + + def test_nested_request_id_is_qualified_with_ordinal(self) -> None: + """The nested pending request surfaces with a ``review_sub~0~{id}`` qualified id.""" + instance_id = self.client.start_workflow( + input={ + "content_id": "article-100", + "title": "Quarterly Roadmap", + "body": "A summary of the upcoming features planned for the next quarter.", + }, + workflow_name=WORKFLOW_NAME, + ) + + pending = _wait_for_hitl_request(self.client, instance_id) + + assert len(pending) == 1 + request = pending[0] + # The qualifier carries the node id and the child's ordinal (0 for the single + # dispatch), then the inner bare request id: ``review_sub~0~{requestId}``. + expected_prefix = f"{SUBWORKFLOW_NODE_ID}{SUBWORKFLOW_REQUEST_SEPARATOR}0{SUBWORKFLOW_REQUEST_SEPARATOR}" + assert request["request_id"].startswith(expected_prefix), request["request_id"] + # The bare inner id is non-empty after the qualifier. + assert request["request_id"][len(expected_prefix) :] + # The originating executor is the inner workflow's review gate. + assert request["source_executor_id"] == "review_gate" + + # Drain the pause so the worker does not leave the instance hanging. 
+ self.client.send_hitl_response( + instance_id, + request["request_id"], + {"approved": True, "reviewer_notes": "ok"}, + workflow_name=WORKFLOW_NAME, + ) + self.client.await_workflow_output(instance_id, workflow_name=WORKFLOW_NAME, timeout_seconds=180) + + def test_nested_hitl_approval(self) -> None: + """Responding 'approved' to the nested request resumes the outer workflow to APPROVED.""" + _request, output = self._run_case( + { + "content_id": "article-001", + "title": "Introduction to AI in Healthcare", + "body": ( + "Artificial intelligence is improving healthcare by enabling faster diagnosis, " + "personalized treatment plans, and better patient outcomes." + ), + }, + approve=True, + ) + + assert output is not None + assert "APPROVED" in str(output).upper() + + def test_nested_hitl_rejection(self) -> None: + """Responding 'rejected' to the nested request resumes the outer workflow to REJECTED.""" + _request, output = self._run_case( + { + "content_id": "article-002", + "title": "Get Rich Quick", + "body": "Click here NOW to make $10,000 overnight! GUARANTEED! Limited time offer!", + }, + approve=False, + ) + + assert output is not None + assert "REJECTED" in str(output).upper() + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/python/packages/durabletask/tests/test_subworkflow_orchestration.py b/python/packages/durabletask/tests/test_subworkflow_orchestration.py index ec98d0d46f8..5bf8cc19dcd 100644 --- a/python/packages/durabletask/tests/test_subworkflow_orchestration.py +++ b/python/packages/durabletask/tests/test_subworkflow_orchestration.py @@ -6,11 +6,11 @@ orchestration. These tests cover the host-side glue: * :func:`_prepare_subworkflow_task` wraps the node's message in a trusted-input - marker (carrying nesting depth) and schedules ``dafx-{innerName}``. + marker and schedules ``dafx-{innerName}``. 
* :func:`_process_subworkflow_result` turns the child's outputs into either routed messages (default) or parent outputs (``allow_direct_output``). * :func:`_try_unwrap_subworkflow_input` / :func:`_coerce_initial_input` reconstruct - the original typed object on the child side and bound recursion via depth. + the original typed object on the child side. """ from unittest.mock import Mock @@ -18,7 +18,6 @@ from agent_framework import WorkflowExecutor from agent_framework_durabletask._workflows.orchestrator import ( - SUBWORKFLOW_DEPTH_KEY, SUBWORKFLOW_INPUT_KEY, TaskType, _coerce_initial_input, @@ -47,7 +46,7 @@ def test_schedules_inner_orchestration_by_scoped_name(self) -> None: ctx.call_sub_orchestrator.return_value = "task-sentinel" executor = _subworkflow_executor("sub-node", "inner_wf") - task = _prepare_subworkflow_task(ctx, executor, "hello", "parent::sub-node::0", depth=0) + task = _prepare_subworkflow_task(ctx, executor, "hello", "parent::sub-node::0") assert task == "task-sentinel" ctx.call_sub_orchestrator.assert_called_once() @@ -55,15 +54,14 @@ def test_schedules_inner_orchestration_by_scoped_name(self) -> None: assert args[0] == "dafx-inner_wf" assert kwargs["instance_id"] == "parent::sub-node::0" - def test_wraps_message_in_marker_with_incremented_depth(self) -> None: + def test_wraps_message_in_marker(self) -> None: ctx = Mock() executor = _subworkflow_executor("sub-node", "inner_wf") - _prepare_subworkflow_task(ctx, executor, "payload", "child-id", depth=3) + _prepare_subworkflow_task(ctx, executor, "payload", "child-id") args, _ = ctx.call_sub_orchestrator.call_args child_input = args[1] - assert child_input[SUBWORKFLOW_DEPTH_KEY] == 4 # The wrapped payload round-trips back to the original message. 
assert deserialize_value(child_input[SUBWORKFLOW_INPUT_KEY]) == "payload" @@ -117,7 +115,7 @@ class TestSubworkflowInputUnwrap: """Child-side reconstruction of the parent-supplied marker payload.""" def test_unwrap_detects_and_reconstructs_marker(self) -> None: - marker = {SUBWORKFLOW_INPUT_KEY: "wrapped", SUBWORKFLOW_DEPTH_KEY: 2} + marker = {SUBWORKFLOW_INPUT_KEY: "wrapped"} unwrapped, inner = _try_unwrap_subworkflow_input(marker) @@ -138,6 +136,6 @@ def test_coerce_initial_input_returns_unwrapped_inner(self) -> None: # reconstructed inner object directly, bypassing start-executor coercion. workflow = Mock() workflow.executors = {} - marker = {SUBWORKFLOW_INPUT_KEY: "inner-message", SUBWORKFLOW_DEPTH_KEY: 1} + marker = {SUBWORKFLOW_INPUT_KEY: "inner-message"} assert _coerce_initial_input(workflow, marker) == "inner-message" diff --git a/python/packages/durabletask/tests/test_worker.py b/python/packages/durabletask/tests/test_worker.py index ec87694e6e2..efaf32c7eeb 100644 --- a/python/packages/durabletask/tests/test_worker.py +++ b/python/packages/durabletask/tests/test_worker.py @@ -388,6 +388,49 @@ def test_nested_workflow_with_invalid_name_is_rejected(self, agent_worker: Durab with pytest.raises(ValueError, match="invalid"): agent_worker.configure_workflow(outer) + def test_different_subworkflow_sharing_a_name_is_rejected(self, agent_worker: DurableAIAgentWorker) -> None: + """Two different sub-workflow instances that share a name collide and are rejected.""" + from agent_framework import Executor, WorkflowExecutor + + inner_a = self._inner_agent_workflow("shared", "agent_node") + inner_b = self._inner_agent_workflow("shared", "other_node") # different instance, same name + + sub_a = Mock(spec=WorkflowExecutor) + sub_a.id = "a" + sub_a.workflow = inner_a + sub_b = Mock(spec=WorkflowExecutor) + sub_b.id = "b" + sub_b.workflow = inner_b + router = Mock(spec=Executor) + router.id = "router" + outer = Mock() + outer.name = "outer" + outer.executors = {"a": sub_a, 
"b": sub_b, "router": router} + + with pytest.raises(ValueError, match="different workflow"): + agent_worker.configure_workflow(outer) + + def test_executor_id_with_reserved_separator_is_rejected(self, agent_worker: DurableAIAgentWorker) -> None: + """An executor id containing the nested-HITL separator is rejected at registration.""" + workflow = self._agent_workflow_with_executor_id("orders", "bad~id") + + with pytest.raises(ValueError, match="reserved sub-workflow request separator"): + agent_worker.configure_workflow(workflow) + + @staticmethod + def _agent_workflow_with_executor_id(name: str, executor_id: str) -> Mock: + from agent_framework import AgentExecutor + + agent = Mock() + agent.name = "Assistant" + agent_executor = Mock(spec=AgentExecutor) + agent_executor.id = executor_id + agent_executor.agent = agent + workflow = Mock() + workflow.name = name + workflow.executors = {executor_id: agent_executor} + return workflow + if __name__ == "__main__": pytest.main([__file__, "-v", "--tb=short"]) diff --git a/python/packages/durabletask/tests/test_workflow_client.py b/python/packages/durabletask/tests/test_workflow_client.py index 3f07702211b..6da63b89101 100644 --- a/python/packages/durabletask/tests/test_workflow_client.py +++ b/python/packages/durabletask/tests/test_workflow_client.py @@ -67,6 +67,23 @@ def test_start_workflow_passes_non_string_input_unchanged( _, kwargs = mock_client.schedule_new_orchestration.call_args assert kwargs["input"] == payload + def test_start_workflow_strips_forged_subworkflow_envelope( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """Reserved sub-workflow envelope keys in client input are stripped at the boundary. + + Only an internal child dispatch may carry these keys; if untrusted input could, + it would smuggle a payload onto the orchestrator's trusted (pickle) path. 
+ """ + mock_client.schedule_new_orchestration.return_value = "i" + forged = {"__subworkflow_input__": {"__pickled__": "evil", "__type__": "x"}, "real": 1} + + workflow_client.start_workflow(input=forged, workflow_name="orders") + + _, kwargs = mock_client.schedule_new_orchestration.call_args + assert kwargs["input"] == {"real": 1} + assert "__subworkflow_input__" not in kwargs["input"] + def test_start_workflow_forwards_instance_id( self, workflow_client: DurableWorkflowClient, mock_client: Mock ) -> None: @@ -492,11 +509,11 @@ def _get_state(instance_id: str) -> Mock | None: def test_collects_nested_request_with_qualified_id( self, workflow_client: DurableWorkflowClient, mock_client: Mock ) -> None: - """A request pending in a child sub-workflow surfaces with an {executor}::{id} id.""" + """A request pending in a child sub-workflow surfaces with an {executor}~{ordinal}~{id} id.""" self._states( mock_client, { - "parent": {"state": "running", "subworkflows": {"sub": "child-1"}}, + "parent": {"state": "running", "subworkflows": {"sub": ["child-1"]}}, "child-1": { "state": "waiting_for_human_input", "pending_requests": {"req-9": {"request_id": "req-9", "source_executor_id": "inner_node"}}, @@ -507,7 +524,7 @@ def test_collects_nested_request_with_qualified_id( requests = workflow_client.get_pending_hitl_requests("parent") assert len(requests) == 1 - assert requests[0]["request_id"] == "sub::req-9" + assert requests[0]["request_id"] == "sub~0~req-9" assert requests[0]["source_executor_id"] == "inner_node" def test_collects_parent_and_nested_requests_together( @@ -520,7 +537,7 @@ def test_collects_parent_and_nested_requests_together( "parent": { "state": "waiting_for_human_input", "pending_requests": {"top-1": {"request_id": "top-1", "source_executor_id": "outer_node"}}, - "subworkflows": {"sub": "child-1"}, + "subworkflows": {"sub": ["child-1"]}, }, "child-1": { "state": "waiting_for_human_input", @@ -531,17 +548,17 @@ def 
test_collects_parent_and_nested_requests_together( ids = {r["request_id"] for r in workflow_client.get_pending_hitl_requests("parent")} - assert ids == {"top-1", "sub::inner-1"} + assert ids == {"top-1", "sub~0~inner-1"} def test_collects_deeply_nested_request_with_full_path( self, workflow_client: DurableWorkflowClient, mock_client: Mock ) -> None: - """Two levels of nesting accumulate a full {a}::{b}::{id} path.""" + """Two levels of nesting accumulate a full {a}~{i}~{b}~{j}~{id} path.""" self._states( mock_client, { - "parent": {"state": "running", "subworkflows": {"mid": "child-1"}}, - "child-1": {"state": "running", "subworkflows": {"leaf": "child-2"}}, + "parent": {"state": "running", "subworkflows": {"mid": ["child-1"]}}, + "child-1": {"state": "running", "subworkflows": {"leaf": ["child-2"]}}, "child-2": { "state": "waiting_for_human_input", "pending_requests": {"deep": {"request_id": "deep", "source_executor_id": "leaf_node"}}, @@ -551,7 +568,7 @@ def test_collects_deeply_nested_request_with_full_path( requests = workflow_client.get_pending_hitl_requests("parent") - assert [r["request_id"] for r in requests] == ["mid::leaf::deep"] + assert [r["request_id"] for r in requests] == ["mid~0~leaf~0~deep"] def test_send_qualified_response_routes_to_child_instance( self, workflow_client: DurableWorkflowClient, mock_client: Mock @@ -559,10 +576,10 @@ def test_send_qualified_response_routes_to_child_instance( """A qualified id resolves to the owning child instance and bare request id.""" self._states( mock_client, - {"parent": {"state": "running", "subworkflows": {"sub": "child-1"}}}, + {"parent": {"state": "running", "subworkflows": {"sub": ["child-1"]}}}, ) - workflow_client.send_hitl_response("parent", "sub::req-9", {"approved": True}) + workflow_client.send_hitl_response("parent", "sub~0~req-9", {"approved": True}) mock_client.raise_orchestration_event.assert_called_once() args, kwargs = mock_client.raise_orchestration_event.call_args @@ -577,12 +594,12 @@ def 
test_send_deeply_qualified_response_routes_to_leaf( self._states( mock_client, { - "parent": {"state": "running", "subworkflows": {"mid": "child-1"}}, - "child-1": {"state": "running", "subworkflows": {"leaf": "child-2"}}, + "parent": {"state": "running", "subworkflows": {"mid": ["child-1"]}}, + "child-1": {"state": "running", "subworkflows": {"leaf": ["child-2"]}}, }, ) - workflow_client.send_hitl_response("parent", "mid::leaf::deep", {"ok": 1}) + workflow_client.send_hitl_response("parent", "mid~0~leaf~0~deep", {"ok": 1}) args, kwargs = mock_client.raise_orchestration_event.call_args assert args[0] == "child-2" @@ -595,7 +612,7 @@ def test_send_qualified_response_unknown_subworkflow_raises( self._states(mock_client, {"parent": {"state": "running"}}) # no subworkflows map with pytest.raises(ValueError, match="No active sub-workflow"): - workflow_client.send_hitl_response("parent", "sub::req-9", {"approved": True}) + workflow_client.send_hitl_response("parent", "sub~0~req-9", {"approved": True}) mock_client.raise_orchestration_event.assert_not_called() @@ -610,3 +627,66 @@ def test_unqualified_response_still_targets_named_instance( args, kwargs = mock_client.raise_orchestration_event.call_args assert args[0] == "parent" assert kwargs["event_name"] == "req-1" + + def test_multiple_children_of_one_executor_stay_addressable( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """Two children dispatched by one node are qualified by ordinal, not collapsed.""" + self._states( + mock_client, + { + "parent": {"state": "running", "subworkflows": {"sub": ["child-1", "child-2"]}}, + "child-1": { + "state": "waiting_for_human_input", + "pending_requests": {"r1": {"request_id": "r1", "source_executor_id": "a"}}, + }, + "child-2": { + "state": "waiting_for_human_input", + "pending_requests": {"r2": {"request_id": "r2", "source_executor_id": "b"}}, + }, + }, + ) + + ids = {r["request_id"] for r in workflow_client.get_pending_hitl_requests("parent")} + 
assert ids == {"sub~0~r1", "sub~1~r2"} + + # The second child (ordinal 1) is reachable, not shadowed by the first. + workflow_client.send_hitl_response("parent", "sub~1~r2", {"ok": 1}) + args, kwargs = mock_client.raise_orchestration_event.call_args + assert args[0] == "child-2" + assert kwargs["event_name"] == "r2" + + def test_nested_leaf_request_id_with_double_colon_round_trips( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """A functional sub-workflow's ``auto::N`` leaf id survives qualification and routing.""" + self._states( + mock_client, + { + "parent": {"state": "running", "subworkflows": {"sub": ["child-1"]}}, + "child-1": { + "state": "waiting_for_human_input", + "pending_requests": {"auto::0": {"request_id": "auto::0", "source_executor_id": "fn"}}, + }, + }, + ) + + requests = workflow_client.get_pending_hitl_requests("parent") + assert [r["request_id"] for r in requests] == ["sub~0~auto::0"] + + workflow_client.send_hitl_response("parent", "sub~0~auto::0", {"ok": 1}) + args, kwargs = mock_client.raise_orchestration_event.call_args + assert args[0] == "child-1" + assert kwargs["event_name"] == "auto::0" + + def test_top_level_auto_request_id_is_not_treated_as_nested( + self, workflow_client: DurableWorkflowClient, mock_client: Mock + ) -> None: + """A top-level ``auto::N`` id (contains ``::`` but no ``~``) routes to the instance itself.""" + self._states(mock_client, {"parent": {"state": "waiting_for_human_input"}}) + + workflow_client.send_hitl_response("parent", "auto::0", {"approved": True}) + + args, kwargs = mock_client.raise_orchestration_event.call_args + assert args[0] == "parent" + assert kwargs["event_name"] == "auto::0" diff --git a/python/packages/durabletask/tests/test_workflow_naming.py b/python/packages/durabletask/tests/test_workflow_naming.py index c09fda21369..619eafb113d 100644 --- a/python/packages/durabletask/tests/test_workflow_naming.py +++ 
b/python/packages/durabletask/tests/test_workflow_naming.py @@ -17,10 +17,17 @@ from agent_framework_durabletask import ( DURABLE_NAME_PREFIX, is_auto_generated_workflow_name, + validate_executor_id, validate_workflow_name, workflow_name_from_orchestrator, workflow_orchestrator_name, ) +from agent_framework_durabletask._workflows.naming import ( + MAX_EXECUTOR_ID_LENGTH, + SUBWORKFLOW_REQUEST_SEPARATOR, + qualify_subworkflow_request_id, + split_subworkflow_request_id, +) class TestWorkflowOrchestratorName: @@ -54,6 +61,62 @@ def test_returns_none_without_prefix(self) -> None: # A bare orchestration name (no dafx- prefix) is "not one of ours". assert workflow_name_from_orchestrator("workflow_orchestrator") is None + +class TestValidateExecutorId: + """``validate_executor_id`` guards the durable-naming / nested-HITL contract.""" + + @pytest.mark.parametrize("executor_id", ["router", "agent_node", "reviewer-node", "a", "Step1"]) + def test_accepts_ordinary_ids(self, executor_id: str) -> None: + validate_executor_id(executor_id) # does not raise + + def test_rejects_empty(self) -> None: + with pytest.raises(ValueError, match="non-empty"): + validate_executor_id("") + + def test_rejects_id_containing_separator(self) -> None: + bad = f"a{SUBWORKFLOW_REQUEST_SEPARATOR}b" + with pytest.raises(ValueError, match="reserved sub-workflow request separator"): + validate_executor_id(bad) + + def test_rejects_overly_long_id(self) -> None: + with pytest.raises(ValueError, match="too long"): + validate_executor_id("x" * (MAX_EXECUTOR_ID_LENGTH + 1)) + + +class TestSubworkflowRequestIdQualification: + """Round-trip of the ``{executor}~{ordinal}~{leaf}`` qualified-request-id scheme.""" + + def test_separator_is_url_safe_tilde(self) -> None: + # '~' is RFC 3986 unreserved and (unlike '::') never appears in core request ids. 
+ assert SUBWORKFLOW_REQUEST_SEPARATOR == "~" + + def test_qualify_then_split_round_trips(self) -> None: + qualified = qualify_subworkflow_request_id("sub", 2, "req-9") + assert qualified == "sub~2~req-9" + assert split_subworkflow_request_id(qualified) == ("sub", 2, "req-9") + + def test_split_returns_none_for_bare_id(self) -> None: + assert split_subworkflow_request_id("req-9") is None + + def test_split_preserves_double_colon_leaf(self) -> None: + # A functional workflow's ``auto::0`` leaf survives one peel as the remainder. + assert split_subworkflow_request_id("sub~0~auto::0") == ("sub", 0, "auto::0") + + def test_split_treats_double_colon_only_id_as_bare(self) -> None: + # ``auto::0`` has no '~', so it is a bare leaf, not a nested hop. + assert split_subworkflow_request_id("auto::0") is None + + def test_split_treats_non_integer_ordinal_as_bare(self) -> None: + # A value whose second segment is not an integer is not a structural hop. + assert split_subworkflow_request_id("a~b~c") is None + + def test_nested_qualification_round_trips(self) -> None: + deep = qualify_subworkflow_request_id("mid", 0, qualify_subworkflow_request_id("leaf", 1, "deep")) + assert deep == "mid~0~leaf~1~deep" + executor_id, ordinal, remainder = split_subworkflow_request_id(deep) + assert (executor_id, ordinal) == ("mid", 0) + assert split_subworkflow_request_id(remainder) == ("leaf", 1, "deep") + def test_returns_none_for_prefix_only(self) -> None: assert workflow_name_from_orchestrator(DURABLE_NAME_PREFIX) is None diff --git a/python/packages/durabletask/tests/test_workflow_registration.py b/python/packages/durabletask/tests/test_workflow_registration.py index 390b3981ad3..91166d01b29 100644 --- a/python/packages/durabletask/tests/test_workflow_registration.py +++ b/python/packages/durabletask/tests/test_workflow_registration.py @@ -10,6 +10,7 @@ from unittest.mock import Mock +import pytest from agent_framework import AgentExecutor, Executor, WorkflowExecutor from 
agent_framework_durabletask import ( @@ -158,3 +159,19 @@ def test_walks_multiple_levels(self) -> None: top = _workflow("top_wf", {"m": _subworkflow_executor("m", mid)}) assert [w.name for w in collect_hosted_workflows(top)] == ["top_wf", "mid_wf", "leaf_wf"] + + def test_rejects_two_different_workflows_sharing_a_name(self) -> None: + """Two different sub-workflow instances with the same name collide and raise.""" + inner_a = _workflow("shared", {"x": _activity_executor("x")}) + inner_b = _workflow("shared", {"y": _activity_executor("y")}) # different instance, same name + outer = _workflow("outer", {"a": _subworkflow_executor("a", inner_a), "b": _subworkflow_executor("b", inner_b)}) + + with pytest.raises(ValueError, match="different workflows"): + list(collect_hosted_workflows(outer)) + + def test_same_instance_reused_is_deduped_not_rejected(self) -> None: + """The same sub-workflow instance referenced by two nodes (fan-out) is yielded once.""" + inner = _workflow("shared", {"x": _activity_executor("x")}) + outer = _workflow("outer", {"a": _subworkflow_executor("a", inner), "b": _subworkflow_executor("b", inner)}) + + assert [w.name for w in collect_hosted_workflows(outer)] == ["outer", "shared"] diff --git a/python/packages/durabletask/tests/test_workflow_serialization.py b/python/packages/durabletask/tests/test_workflow_serialization.py index 6ab190e5921..80295c9c804 100644 --- a/python/packages/durabletask/tests/test_workflow_serialization.py +++ b/python/packages/durabletask/tests/test_workflow_serialization.py @@ -31,6 +31,7 @@ from pydantic import BaseModel from agent_framework_durabletask._workflows.serialization import ( + SUBWORKFLOW_INPUT_KEY, deserialize_value, deserialize_workflow_event, deserialize_workflow_output, @@ -39,6 +40,7 @@ serialize_value, serialize_workflow_event, strip_pickle_markers, + strip_subworkflow_markers, ) @@ -421,3 +423,29 @@ def test_mixed_safe_and_malicious(self) -> None: } result = strip_pickle_markers(data) assert result == 
{"user_input": "hello", "evil": None, "count": 42} + + +class TestStripSubworkflowMarkers: + """Boundary defence: a forged sub-workflow envelope in untrusted input is removed. + + Only an internal child dispatch (post trust boundary) may carry the reserved + key; if untrusted client input could, it would be treated as a trusted + sub-orchestration payload and reach pickle.loads without sanitization. + """ + + def test_strips_input_key(self) -> None: + data = {SUBWORKFLOW_INPUT_KEY: {"__pickled__": "evil"}, "real": 1} + assert strip_subworkflow_markers(data) == {"real": 1} + + def test_strips_full_forged_envelope(self) -> None: + data = {SUBWORKFLOW_INPUT_KEY: "x"} + assert strip_subworkflow_markers(data) == {} + + def test_preserves_ordinary_dict(self) -> None: + data = {"order_id": 42, "items": ["a", "b"]} + assert strip_subworkflow_markers(data) == data + + def test_preserves_non_dict(self) -> None: + assert strip_subworkflow_markers("hello") == "hello" + assert strip_subworkflow_markers([1, 2]) == [1, 2] + assert strip_subworkflow_markers(None) is None diff --git a/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/.gitignore b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/.gitignore new file mode 100644 index 00000000000..7097fe01703 --- /dev/null +++ b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/.gitignore @@ -0,0 +1,5 @@ +# Local settings - copy from local.settings.json.sample and fill in your values +local.settings.json +__pycache__/ +*.pyc +.venv/ diff --git a/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/README.md b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/README.md new file mode 100644 index 00000000000..c90fb436456 --- /dev/null +++ b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/README.md @@ -0,0 +1,70 @@ +# 13. 
Sub-workflow Human-in-the-Loop (HITL) + +This sample demonstrates a **nested** human-in-the-loop pause: the `request_info` +happens inside an **inner workflow** that an outer workflow embeds via +`WorkflowExecutor`. It combines composition (sample 11 on the durabletask host) with +HITL (sample 12) and runs on Azure Durable Functions. + +Unlike sample 12, this sample hosts **no AI agents**, so it needs only Azurite and +the Durable Task Scheduler emulator — no model credentials. + +## Overview + +``` +moderation_pipeline (outer) + intake (executor) + -> review_sub = WorkflowExecutor(human_review) + review_gate (executor: request_info -> response_handler) <-- HITL pause + -> publish (executor) +``` + +1. **User starts** the outer `moderation_pipeline` workflow with content. +2. **`intake`** normalizes the submission and forwards it. +3. **`review_sub`** runs the inner `human_review` workflow as a **child + orchestration**; its `review_gate` pauses via `request_info`. +4. **The status endpoint** surfaces the nested pending request with a **qualified** + id `review_sub~0~{requestId}`. +5. **The caller responds** against the *top-level* instance with that qualified id; + the host routes it to the owning child orchestration. +6. **The inner workflow resumes**, yields its decision, and the outer `publish` + executor produces the final result. + +## Key Concept: one addressing surface for nested HITL + +On the durable host each `WorkflowExecutor` node runs its inner workflow as its own +child orchestration, so a nested `request_info` is recorded on the *child* instance. 
+`AgentFunctionApp` bubbles those nested requests up into the top-level status with a +**qualified request id**, so the caller only ever addresses the top-level instance: + +| Part | Meaning | +|------|---------| +| `review_sub` | the `WorkflowExecutor` node id that owns the child | +| `0` | the child's ordinal (a node may dispatch several children in one superstep) | +| `{requestId}` | the inner workflow's bare request id | + +The separator is `~` (not `:`), so it never collides with framework-generated +request ids such as functional-workflow `auto::N` ids. + +## Endpoints + +`AgentFunctionApp` exposes routes only for the **top-level** workflow; the inner +workflow is driven as a child orchestration, not addressed directly. + +| Endpoint | Description | +|----------|-------------| +| `POST /api/workflow/moderation_pipeline/run` | Start the workflow | +| `GET /api/workflow/moderation_pipeline/status/{instanceId}` | Status + nested pending HITL requests (qualified ids) | +| `POST /api/workflow/moderation_pipeline/respond/{instanceId}/{requestId}` | Send the human response (use the qualified id) | +| `GET /api/health` | Health check | + +## Running + +1. Start Azurite: `azurite --silent --location .` +2. Start the Durable Task Scheduler emulator on `localhost:8080`. +3. Copy `local.settings.json.sample` to `local.settings.json`. +4. `func start` +5. Drive it with [demo.http](./demo.http): start a run, GET the status to read the + qualified `review_sub~0~{requestId}`, then POST the response to the top-level + instance with that id. + +Run `python function_app.py --maf` for pure MAF mode with DevUI (no Azure Functions). 
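Because the separator is `~`, a qualified request id can be used as a URL path segment in the respond route without percent-encoding. The following is a minimal sketch, assuming the route shape listed under Endpoints and a local `func start` host on port 7071. `build_respond_url` is a hypothetical helper for illustration only and is not part of `agent_framework_azurefunctions`.

```python
from urllib.parse import quote


def build_respond_url(base_url: str, workflow_name: str, instance_id: str, request_id: str) -> str:
    """Build the respond URL for a (possibly qualified) request id.

    Hypothetical helper: the route shape is copied from the Endpoints table,
    and the base URL is an assumption about the local Functions host.
    """
    # safe="" percent-encodes every reserved character, so each value stays a
    # single path segment. '~' is RFC 3986 unreserved and is left untouched
    # (Python 3.7+).
    workflow_segment = quote(workflow_name, safe="")
    instance_segment = quote(instance_id, safe="")
    request_segment = quote(request_id, safe="")
    return f"{base_url.rstrip('/')}/api/workflow/{workflow_segment}/respond/{instance_segment}/{request_segment}"


url = build_respond_url(
    "http://localhost:7071",
    "moderation_pipeline",
    "abc123",  # placeholder instance id
    "review_sub~0~req-9",  # placeholder qualified request id
)
print(url)

# By contrast, a ':'-based separator would need escaping in a path segment.
print(quote("review_sub::req-9", safe=""))
```

The qualified id is passed through verbatim, which matches the guidance in `demo.http` to reuse the `requestId` from the status response as-is.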
diff --git a/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/demo.http b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/demo.http new file mode 100644 index 00000000000..10feb08e818 --- /dev/null +++ b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/demo.http @@ -0,0 +1,74 @@ +### ============================================================================ +### Sub-workflow HITL Sample - Nested Human-in-the-Loop behind one surface +### ============================================================================ +### The HITL pause lives inside an inner workflow (human_review) that the outer +### workflow (moderation_pipeline) embeds via WorkflowExecutor. The nested request +### surfaces with a qualified id (review_sub~0~{requestId}); you respond against the +### top-level instance and the host routes it to the owning child orchestration. +### +### This sample hosts no AI agents, so it needs only Azurite + the DTS emulator. +### +### Prerequisites: +### 1. Start Azurite: azurite --silent --location . +### 2. Start the Durable Task Scheduler emulator (localhost:8080) +### 3. Run: func start +### ============================================================================ + + +### ============================================================================ +### 1. Start the Workflow with Content for Moderation (will approve) +### ============================================================================ +### Starts the outer workflow. It runs intake, then the embedded human_review +### sub-workflow pauses for approval as a child orchestration. + +POST http://localhost:7071/api/workflow/moderation_pipeline/run +Content-Type: application/json + +{ + "content_id": "article-001", + "title": "Introduction to AI in Healthcare", + "body": "Artificial intelligence is improving healthcare by enabling faster diagnosis, personalized treatment plans, and better patient outcomes." 
+} + + +### ============================================================================ +### 2. Start Workflow with Spammy Content (will reject) +### ============================================================================ + +POST http://localhost:7071/api/workflow/moderation_pipeline/run +Content-Type: application/json + +{ + "content_id": "article-002", + "title": "Get Rich Quick", + "body": "Click here NOW to make $10,000 overnight! GUARANTEED! Limited time offer!" +} + + +### ============================================================================ +### 3. Check Workflow Status (shows the nested pending HITL request) +### ============================================================================ +### Replace INSTANCE_ID with the value returned from the run call. The +### pendingHumanInputRequests entry carries a qualified requestId of the form +### "review_sub~0~" because the pause lives in the sub-workflow. + +@instanceId = REPLACE_WITH_INSTANCE_ID + +GET http://localhost:7071/api/workflow/moderation_pipeline/status/{{instanceId}} + + +### ============================================================================ +### 4. Respond to the nested HITL request (approve) +### ============================================================================ +### Use the qualified requestId from the status response verbatim. The host +### resolves it to the owning child orchestration. + +@requestId = REPLACE_WITH_QUALIFIED_REQUEST_ID + +POST http://localhost:7071/api/workflow/moderation_pipeline/respond/{{instanceId}}/{{requestId}} +Content-Type: application/json + +{ + "approved": true, + "reviewer_notes": "Looks good." 
+} diff --git a/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/function_app.py b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/function_app.py new file mode 100644 index 00000000000..bea2887f907 --- /dev/null +++ b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/function_app.py @@ -0,0 +1,270 @@ +# Copyright (c) Microsoft. All rights reserved. +"""Composed workflow whose Human-in-the-Loop pause lives in a nested sub-workflow. + +This sample combines composition with human-in-the-loop on Azure Durable +Functions: the HITL ``request_info`` happens **inside an inner workflow** that an +outer workflow embeds via ``WorkflowExecutor``. On the durable host the inner +workflow runs as its own child orchestration, so its pending request is recorded on +the *child* instance. The parent records the child instance id in its custom status, +which lets the host surface the nested request behind a single top-level addressing +surface. + +``AgentFunctionApp`` walks the composition and registers a durable orchestration for +each workflow, but exposes HTTP routes only for the **top-level** workflow: + +- ``dafx-moderation_pipeline`` - the outer workflow (HTTP routes). +- ``dafx-human_review`` - the inner workflow (run as a child orchestration), which + contains the HITL pause (no direct routes). + +Composition layout:: + + moderation_pipeline (outer) + intake (executor) + -> review_sub = WorkflowExecutor(human_review) + review_gate (executor: request_info -> response_handler) + -> publish (executor) + +The status endpoint surfaces the inner pending request with a **qualified** request +id (``review_sub~0~{requestId}``); the caller posts the response back to the +*top-level* instance and the host routes it to the owning child orchestration +automatically. + +This sample hosts **no AI agents**, so it needs only the Durable Task Scheduler and +Azurite (no model credentials). 
+ +Prerequisites: +- Start Azurite: ``azurite --silent --location .`` +- Start a Durable Task Scheduler emulator on ``localhost:8080``. +- Run: ``func start`` +""" + +import logging +from dataclasses import dataclass + +from agent_framework import ( + Executor, + Workflow, + WorkflowBuilder, + WorkflowContext, + WorkflowExecutor, + handler, + response_handler, +) +from agent_framework_azurefunctions import AgentFunctionApp +from pydantic import BaseModel +from typing_extensions import Never + +logger = logging.getLogger(__name__) + +INNER_WORKFLOW_NAME = "human_review" +OUTER_WORKFLOW_NAME = "moderation_pipeline" + + +# ============================================================================ +# Data Models +# ============================================================================ + + +@dataclass +class ContentSubmission: + """Content submitted for moderation (outer workflow input).""" + + content_id: str + title: str + body: str + + +@dataclass +class HumanApprovalRequest: + """Request surfaced to the human reviewer (carried in the orchestration status).""" + + content_id: str + title: str + body: str + prompt: str + + +class HumanApprovalResponse(BaseModel): + """Response the external client sends back via the HITL response endpoint.""" + + approved: bool + reviewer_notes: str = "" + + +@dataclass +class ModerationDecision: + """The inner workflow's output: the human's decision for a submission.""" + + content_id: str + approved: bool + reviewer_notes: str + + +# ============================================================================ +# Inner workflow (contains the HITL pause) +# ============================================================================ + + +class ReviewGateExecutor(Executor): + """Inner-workflow executor that pauses for human approval via request_info.""" + + def __init__(self) -> None: + super().__init__(id="review_gate") + + @handler + async def request_review(self, submission: ContentSubmission, ctx: WorkflowContext) -> None: + 
prompt = ( + f"Please review the following content for publication:\n\n" + f"Title: {submission.title}\n" + f"Content: {submission.body}\n\n" + f"Approve or reject this content." + ) + approval_request = HumanApprovalRequest( + content_id=submission.content_id, + title=submission.title, + body=submission.body, + prompt=prompt, + ) + # Pause the (inner) workflow and wait for a human response. On the durable + # host this pauses the child orchestration running this inner workflow. + await ctx.request_info(request_data=approval_request, response_type=HumanApprovalResponse) + + @response_handler + async def handle_approval_response( + self, + original_request: HumanApprovalRequest, + response: HumanApprovalResponse, + ctx: WorkflowContext[Never, ModerationDecision], + ) -> None: + logger.info( + "Human review received for content %s: approved=%s", + original_request.content_id, + response.approved, + ) + # Yield the decision as the inner workflow's output; the WorkflowExecutor + # forwards it to the outer workflow as a message to the next node. 
+ await ctx.yield_output( + ModerationDecision( + content_id=original_request.content_id, + approved=response.approved, + reviewer_notes=response.reviewer_notes, + ) + ) + + +def create_inner_workflow() -> Workflow: + """Build the inner ``human_review`` workflow (a single HITL gate).""" + review_gate = ReviewGateExecutor() + return WorkflowBuilder(name=INNER_WORKFLOW_NAME, start_executor=review_gate).build() + + +# ============================================================================ +# Outer workflow (embeds the inner workflow) +# ============================================================================ + + +class IntakeExecutor(Executor): + """Outer-workflow entry point that normalizes the submission before review.""" + + def __init__(self) -> None: + super().__init__(id="intake") + + @handler + async def intake(self, submission: ContentSubmission, ctx: WorkflowContext[ContentSubmission]) -> None: + logger.info("Intake received submission %s", submission.content_id) + await ctx.send_message(submission) + + +class PublishExecutor(Executor): + """Outer-workflow executor that consumes the inner workflow's forwarded decision.""" + + def __init__(self) -> None: + super().__init__(id="publish") + + @handler + async def handle_decision(self, decision: ModerationDecision, ctx: WorkflowContext[Never, str]) -> None: + if decision.approved: + message = ( + f"Content '{decision.content_id}' APPROVED and published. " + f"Reviewer notes: {decision.reviewer_notes or 'None'}" + ) + else: + message = f"Content '{decision.content_id}' REJECTED. Reviewer notes: {decision.reviewer_notes or 'None'}" + logger.info(message) + await ctx.yield_output(message) + + +def _create_workflow() -> Workflow: + """Build the outer ``moderation_pipeline`` workflow embedding the HITL sub-workflow.""" + inner_workflow = create_inner_workflow() + + intake = IntakeExecutor() + # WorkflowExecutor embeds the inner (HITL) workflow as a single node. 
On the + # durable host this node runs as a child orchestration, and the inner pause + # surfaces to the client as a qualified request id (``review_sub~0~{requestId}``). + review_sub = WorkflowExecutor(inner_workflow, id="review_sub") + publish = PublishExecutor() + + return ( + WorkflowBuilder(name=OUTER_WORKFLOW_NAME, start_executor=intake) + .add_edge(intake, review_sub) + .add_edge(review_sub, publish) + .build() + ) + + +# ============================================================================ +# Application Entry Point +# ============================================================================ + + +def launch(durable: bool = True) -> AgentFunctionApp | None: + """Launch the function app or DevUI. + + Args: + durable: If True, returns AgentFunctionApp for Azure Functions. + If False, launches DevUI for local MAF development. + """ + if durable: + # Azure Functions mode. The app automatically provides per-workflow HITL + # endpoints for the top-level workflow ("moderation_pipeline"): + # - POST /api/workflow/moderation_pipeline/run + # - GET /api/workflow/moderation_pipeline/status/{instanceId} + # (surfaces the nested request as review_sub~0~{requestId}) + # - POST /api/workflow/moderation_pipeline/respond/{instanceId}/{requestId} + # - GET /api/health + workflow = _create_workflow() + return AgentFunctionApp(workflow=workflow, enable_health_check=True) + + # Pure MAF mode with DevUI for local development. 
+ from pathlib import Path + + from agent_framework.devui import serve + from dotenv import load_dotenv + + env_path = Path(__file__).parent / ".env" + load_dotenv(dotenv_path=env_path) + + logger.info("Starting Sub-workflow HITL Sample in MAF mode") + logger.info("Available at: http://localhost:8097") + logger.info("\nThis workflow demonstrates:") + logger.info("- Human-in-the-loop inside a nested sub-workflow (WorkflowExecutor)") + logger.info("- Qualified request ids (review_sub~0~{requestId}) behind a single surface") + logger.info("\nFlow: Intake -> WorkflowExecutor(human_review: ReviewGate HITL) -> Publish") + + workflow = _create_workflow() + serve(entities=[workflow], port=8097, auto_open=True) + + return None + + +# Default: Azure Functions mode +# Run with `python function_app.py --maf` for pure MAF mode with DevUI +app = launch(durable=True) + + +if __name__ == "__main__": + import sys + + if "--maf" in sys.argv: + launch(durable=False) diff --git a/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/host.json b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/host.json new file mode 100644 index 00000000000..9e7fd873dda --- /dev/null +++ b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/host.json @@ -0,0 +1,12 @@ +{ + "version": "2.0", + "extensionBundle": { + "id": "Microsoft.Azure.Functions.ExtensionBundle", + "version": "[4.*, 5.0.0)" + }, + "extensions": { + "durableTask": { + "hubName": "%TASKHUB_NAME%" + } + } +} diff --git a/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/local.settings.json.sample b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/local.settings.json.sample new file mode 100644 index 00000000000..04dd252a1ab --- /dev/null +++ b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/local.settings.json.sample @@ -0,0 +1,9 @@ +{ + "IsEncrypted": false, + "Values": { + "AzureWebJobsStorage": "UseDevelopmentStorage=true", + 
"DURABLE_TASK_SCHEDULER_CONNECTION_STRING": "Endpoint=http://localhost:8080;TaskHub=default;Authentication=None", + "TASKHUB_NAME": "default", + "FUNCTIONS_WORKER_RUNTIME": "python" + } +} diff --git a/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/requirements.txt b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/requirements.txt new file mode 100644 index 00000000000..1d98dded06f --- /dev/null +++ b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/requirements.txt @@ -0,0 +1,11 @@ +# Agent Framework packages +# To use the deployed version, uncomment the lines below and comment out the local installation lines +# agent-framework-azurefunctions + +# Local installation (for development and testing) +# Each package must be listed explicitly because pip doesn't resolve uv workspace sources. +# Without explicit entries, pip would fetch transitive dependencies from PyPI instead of local source. +# This sample hosts no AI agents, so it needs only the core + durabletask + azurefunctions packages. +-e ../../../../packages/core # Core framework - base dependency for all packages +-e ../../../../packages/durabletask # Durable Task support - dependency of azurefunctions +-e ../../../../packages/azurefunctions # Azure Functions integration - the main package for this sample diff --git a/python/samples/04-hosting/durabletask/12_subworkflow_hitl/README.md b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/README.md index c6d1c28c37f..0a2a8a76a5c 100644 --- a/python/samples/04-hosting/durabletask/12_subworkflow_hitl/README.md +++ b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/README.md @@ -17,7 +17,7 @@ request behind a **single top-level addressing surface**. - `dafx-moderation_pipeline` — the outer workflow. - `dafx-human_review` — the inner (HITL) workflow, run as a child orchestration. - **Qualified request ids:** the nested request surfaces to the client with a - qualified id (`review_sub::{requestId}`). 
The client posts the response against the + qualified id (`review_sub~0~{requestId}`). The client posts the response against the *top-level* instance id, and the host routes it to the owning child orchestration — so the caller never has to discover child instance ids. diff --git a/python/samples/04-hosting/durabletask/12_subworkflow_hitl/client.py b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/client.py index da2405fd36e..aeb8a9cf7b4 100644 --- a/python/samples/04-hosting/durabletask/12_subworkflow_hitl/client.py +++ b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/client.py @@ -7,7 +7,7 @@ 1. Starts the *outer* workflow with ``DurableWorkflowClient.start_workflow``. 2. Polls ``get_pending_hitl_requests`` until a request appears. Because the HITL pause happens inside a sub-workflow, the request surfaces with a **qualified** request id - (``review_sub::{requestId}``). + (``review_sub~0~{requestId}``). 3. Sends the decision with ``send_hitl_response`` against the *top-level* instance id and the qualified request id; the host routes it to the owning child orchestration. 4. Reads the final output with ``await_workflow_output``. @@ -84,7 +84,7 @@ def run_case(client: DurableWorkflowClient, submission: dict[str, Any], *, appro pending = _wait_for_hitl_request(client, instance_id) request = pending[0] - # The request id is qualified (e.g. "review_sub::") because the pause lives + # The request id is qualified (e.g. "review_sub~0~") because the pause lives # in a sub-workflow. We pass it back verbatim against the top-level instance id; # the host resolves it to the owning child orchestration. 
logger.info("Pending HITL request %s from %s", request["request_id"], request["source_executor_id"]) diff --git a/python/samples/04-hosting/durabletask/12_subworkflow_hitl/worker.py b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/worker.py index ecddf6c97c7..8c4b7bc4b6c 100644 --- a/python/samples/04-hosting/durabletask/12_subworkflow_hitl/worker.py +++ b/python/samples/04-hosting/durabletask/12_subworkflow_hitl/worker.py @@ -26,7 +26,7 @@ -> publish (executor) The client sees the inner pending request with a **qualified** request id -(``review_sub::{requestId}``) and posts the response back to the *top-level* +(``review_sub~0~{requestId}``) and posts the response back to the *top-level* instance; the host routes it to the owning child orchestration automatically. Prerequisites: @@ -207,7 +207,7 @@ def create_workflow() -> Workflow: intake = IntakeExecutor() # WorkflowExecutor embeds the inner (HITL) workflow as a single node. On the # durable host this node runs as a child orchestration, and the inner pause - # surfaces to the client as a qualified request id (``review_sub::{requestId}``). + # surfaces to the client as a qualified request id (``review_sub~0~{requestId}``). review_sub = WorkflowExecutor(inner_workflow, id="review_sub") publish = PublishExecutor() From f419f22bc23d4a2f4df82b8d9e2a86fb394163d8 Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Tue, 23 Jun 2026 21:45:58 -0400 Subject: [PATCH 08/12] fix(durabletask): address PR review feedback on naming, typing, and docs - Unquote df.DurableOrchestrationClient annotations so pyupgrade passes. - Narrow the split_subworkflow_request_id result before unpacking in a naming test so the strict type checkers pass. - Correct the durabletask sample catalog to the {executor}~{ordinal}~{requestId} qualified id format. - Reword the Azure Functions sub-workflow sample intro so it does not imply a difference from a same-numbered sample. - Drop internal shorthand (B2, phase labels) from code comments. 
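The narrowing fix mentioned above can be illustrated with a small, self-contained sketch. The two functions below are stand-ins written to agree with the behaviour asserted in `test_workflow_naming.py`; they are an assumption about the contract, not the implementation in `_workflows/naming.py`. `peel_all` is a hypothetical helper added only to show the `is None` check before unpacking.

```python
from __future__ import annotations

SEPARATOR = "~"  # mirrors SUBWORKFLOW_REQUEST_SEPARATOR as asserted in the tests


def qualify(executor_id: str, ordinal: int, request_id: str) -> str:
    """Prepend one ``{executor}~{ordinal}`` hop to a request id (stand-in)."""
    return f"{executor_id}{SEPARATOR}{ordinal}{SEPARATOR}{request_id}"


def split(request_id: str) -> tuple[str, int, str] | None:
    """Peel one hop, or return None when the id is a bare leaf (stand-in)."""
    parts = request_id.split(SEPARATOR, 2)
    if len(parts) != 3:
        return None
    executor_id, ordinal_text, remainder = parts
    # A non-integer second segment means this is not a structural hop.
    if not executor_id or not (ordinal_text.isascii() and ordinal_text.isdigit()):
        return None
    return executor_id, int(ordinal_text), remainder


def peel_all(request_id: str) -> tuple[list[tuple[str, int]], str]:
    """Peel every hop, returning the hop path and the leaf id (hypothetical)."""
    hops: list[tuple[str, int]] = []
    remainder = request_id
    while True:
        hop = split(remainder)
        if hop is None:  # narrow the Optional before unpacking
            return hops, remainder
        executor_id, ordinal, remainder = hop
        hops.append((executor_id, ordinal))


deep = qualify("mid", 0, qualify("leaf", 1, "deep"))
print(deep)
print(peel_all(deep))
```

Unpacking the result of `split` directly would be flagged by a strict type checker, because the return type includes `None`; binding it to `hop` and checking first is the same pattern the updated test uses.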
--- .../azurefunctions/agent_framework_azurefunctions/_app.py | 6 +++--- .../agent_framework_durabletask/_workflows/naming.py | 6 +++--- .../_workflows/orchestrator.py | 2 +- python/packages/durabletask/tests/test_workflow_naming.py | 4 +++- .../azure_functions/13_subworkflow_hitl/README.md | 8 ++++---- python/samples/04-hosting/durabletask/README.md | 2 +- 6 files changed, 15 insertions(+), 13 deletions(-) diff --git a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py index d62aa72cc69..036ee231c5e 100644 --- a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py +++ b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py @@ -551,7 +551,7 @@ async def get_workflow_status( # nested sub-workflow are bubbled up here with a qualified requestId # ({executorId}~{ordinal}~{requestId}, nested deeper for deeper levels); the # respondUrl always targets this top-level instance, so the caller has a - # single addressing surface (B2). + # single addressing surface. 
custom_status = status.custom_status if isinstance(custom_status, dict): gathered = await self._gather_pending_hitl_requests(client, cast("dict[str, Any]", custom_status)) @@ -642,7 +642,7 @@ async def send_hitl_response(req: func.HttpRequest, client: df.DurableOrchestrat async def _gather_pending_hitl_requests( self, - client: "df.DurableOrchestrationClient", + client: df.DurableOrchestrationClient, custom_status: dict[str, Any], *, prefix: str = "", @@ -694,7 +694,7 @@ async def _gather_pending_hitl_requests( async def _resolve_hitl_target( self, - client: "df.DurableOrchestrationClient", + client: df.DurableOrchestrationClient, instance_id: str, request_id: str, ) -> tuple[str, str] | None: diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py index 273f2ed97e1..d45df96064e 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py @@ -14,8 +14,8 @@ ``WorkflowNamingHelper``):: orchestration: dafx-{workflowName} - non-agent activity: dafx-{workflowName}-{executorId} (wired up in Phase 1) - agent entity: dafx-{workflowName}-{executorId} (wired up in Phase 1) + non-agent activity: dafx-{workflowName}-{executorId} + agent entity: dafx-{workflowName}-{executorId} The orchestration name is the identifier the Durable Task tooling/UI surfaces, so it matches .NET exactly. The inner activity/entity names are scoped by workflow in @@ -49,7 +49,7 @@ DURABLE_NAME_PREFIX = "dafx-" # Separator used to qualify a nested sub-workflow's pending HITL request when it is -# bubbled up to the top-level instance (B2 single-surface addressing). A qualified id +# bubbled up to the top-level instance (one top-level addressing surface). A qualified id # is a path of ``{executorId}~{ordinal}`` hops ending in the leaf's bare request id, # e.g. 
``review~0~approve~1~``. Both hosts and the client must agree on it # so a qualified id round-trips: the read side prepends hops; the respond side peels diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py index 7854390ff4a..e37aa8888e3 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py @@ -940,7 +940,7 @@ def publish_live_status( # superstep. A single WorkflowExecutor node can receive several messages in one # superstep and dispatch one child each, so the value is a list indexed by # dispatch order; the read side qualifies nested pending requests by - # (executorId, ordinal) so every child stays addressable (B2 single-surface HITL). + # (executorId, ordinal) so every child stays addressable behind one top-level surface. if subworkflows: status["subworkflows"] = subworkflows ctx.set_custom_status(status) diff --git a/python/packages/durabletask/tests/test_workflow_naming.py b/python/packages/durabletask/tests/test_workflow_naming.py index 619eafb113d..de3124edd49 100644 --- a/python/packages/durabletask/tests/test_workflow_naming.py +++ b/python/packages/durabletask/tests/test_workflow_naming.py @@ -113,7 +113,9 @@ def test_split_treats_non_integer_ordinal_as_bare(self) -> None: def test_nested_qualification_round_trips(self) -> None: deep = qualify_subworkflow_request_id("mid", 0, qualify_subworkflow_request_id("leaf", 1, "deep")) assert deep == "mid~0~leaf~1~deep" - executor_id, ordinal, remainder = split_subworkflow_request_id(deep) + hop = split_subworkflow_request_id(deep) + assert hop is not None + executor_id, ordinal, remainder = hop assert (executor_id, ordinal) == ("mid", 0) assert split_subworkflow_request_id(remainder) == ("leaf", 1, "deep") diff --git 
a/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/README.md b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/README.md index c90fb436456..cdbbe9b3689 100644 --- a/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/README.md +++ b/python/samples/04-hosting/azure_functions/13_subworkflow_hitl/README.md @@ -2,11 +2,11 @@ This sample demonstrates a **nested** human-in-the-loop pause: the `request_info` happens inside an **inner workflow** that an outer workflow embeds via -`WorkflowExecutor`. It combines composition (sample 11 on the durabletask host) with -HITL (sample 12) and runs on Azure Durable Functions. +`WorkflowExecutor`. It runs on Azure Durable Functions and is the Azure Functions +counterpart of the durabletask `12_subworkflow_hitl` sample. -Unlike sample 12, this sample hosts **no AI agents**, so it needs only Azurite and -the Durable Task Scheduler emulator — no model credentials. +This sample hosts **no AI agents**, so it needs only Azurite and the Durable Task +Scheduler emulator, with no model credentials. ## Overview diff --git a/python/samples/04-hosting/durabletask/README.md b/python/samples/04-hosting/durabletask/README.md index c247965f532..698e1aaefe8 100644 --- a/python/samples/04-hosting/durabletask/README.md +++ b/python/samples/04-hosting/durabletask/README.md @@ -20,7 +20,7 @@ This directory contains samples for durable agent hosting using the Durable Task - **[09_workflow_hitl](09_workflow_hitl/)**: A workflow that pauses for human approval using `ctx.request_info` / `@response_handler`, with the client discovering and answering the pending request. - **[10_workflow_streaming](10_workflow_streaming/)**: Stream a hosted workflow's events as typed `WorkflowEvent` objects by polling the orchestration's custom status. - **[11_subworkflow](11_subworkflow/)**: Compose workflows by embedding an inner `Workflow` as a node via `WorkflowExecutor`. 
On the durable host the inner workflow runs as its own child orchestration, and a single `configure_workflow` call registers both. -- **[12_subworkflow_hitl](12_subworkflow_hitl/)**: A human-in-the-loop pause that lives **inside a sub-workflow**. The nested request surfaces to the client with a qualified request id (`{executor}::{requestId}`) behind a single top-level addressing surface. +- **[12_subworkflow_hitl](12_subworkflow_hitl/)**: A human-in-the-loop pause that lives **inside a sub-workflow**. The nested request surfaces to the client with a qualified request id (`{executor}~{ordinal}~{requestId}`) behind a single top-level addressing surface. ## Running the Samples From c291d02a28d670857eeb2f31911b99e1a99cca84 Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Tue, 23 Jun 2026 22:02:38 -0400 Subject: [PATCH 09/12] fix(durabletask): reject case-insensitive workflow name collisions The route ownership guard compares the durable orchestration name with casefold(), but registration kept raw names as distinct keys. Hosting 'Orders' and 'orders' therefore succeeded while either workflow's status/respond route could operate on the other's instances. Reject case-insensitive name collisions at registration (within a composition via collect_hosted_workflows, and across registration calls via the case-folded _registered_orchestrations map and the top-level guard in both hosts) so the case-folded ownership boundary stays real. Single names of any case remain valid; only collisions are rejected. 
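The collision rule in this commit message can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `_worker.py` or `_app.py`: the `Registry` class and its `register` method are hypothetical names, and plain objects stand in for `Workflow` instances.

```python
class Registry:
    """Hypothetical stand-in for a host's case-folded registration map."""

    def __init__(self) -> None:
        # Keyed by case-folded name -> (name as first registered, instance).
        self._by_folded_name: dict[str, tuple[str, object]] = {}

    def register(self, name: str, workflow: object) -> bool:
        """Register a workflow; return False when the same instance is reused."""
        key = name.casefold()
        existing = self._by_folded_name.get(key)
        if existing is not None:
            existing_name, existing_workflow = existing
            if existing_workflow is not workflow:
                # Different instance, colliding name (possibly case-only).
                raise ValueError(
                    f"A different workflow named '{name}' collides with "
                    f"already-registered '{existing_name}' "
                    "(workflow names are compared case-insensitively)."
                )
            return False  # same instance: dedupe instead of re-registering
        self._by_folded_name[key] = (name, workflow)
        return True
```

Keying the map by `name.casefold()` keeps the registration boundary identical to the boundary the route ownership check enforces, so a name that passes registration cannot alias another workflow at request time.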
--- .../agent_framework_azurefunctions/_app.py | 24 ++++++++------ .../packages/azurefunctions/tests/test_app.py | 23 ++++++++++++++ .../agent_framework_durabletask/_worker.py | 24 ++++++++------ .../_workflows/registration.py | 31 +++++++++++-------- .../packages/durabletask/tests/test_worker.py | 11 +++++++ .../tests/test_workflow_registration.py | 16 +++++++++- 6 files changed, 95 insertions(+), 34 deletions(-) diff --git a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py index 036ee231c5e..5cfdb088554 100644 --- a/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py +++ b/python/packages/azurefunctions/agent_framework_azurefunctions/_app.py @@ -251,9 +251,9 @@ def __init__( self._agent_metadata = {} self._workflows: dict[str, Workflow] = {} # Every workflow whose orchestration has been registered (top-level plus - # nested sub-workflows), keyed by name -> the registered instance, so a shared - # sub-workflow is registered once while two different workflows that collide on - # a name are rejected. + # nested sub-workflows), keyed by case-folded name -> the registered instance, + # so a shared sub-workflow is registered once while two different workflows + # whose names collide (including case-only differences) are rejected. self._registered_orchestrations: dict[str, Workflow] = {} self.enable_health_check = enable_health_check self.enable_http_endpoints = enable_http_endpoints @@ -334,8 +334,11 @@ def _register_workflow(self, workflow: Workflow) -> None: same name is already registered. 
""" validate_workflow_name(workflow.name) - if workflow.name in self._workflows: - raise ValueError(f"Workflow '{workflow.name}' is already registered on this app.") + if any(name.casefold() == workflow.name.casefold() for name in self._workflows): + raise ValueError( + f"Workflow '{workflow.name}' is already registered on this app " + "(workflow names are compared case-insensitively)." + ) # Validate the whole composition (top-level plus every nested sub-workflow) # up front, so an invalid/auto-generated nested name (or an executor id that @@ -353,13 +356,14 @@ def _register_workflow(self, workflow: Workflow) -> None: # nested sub-workflow (deduped by name; a different workflow reusing a name is # rejected). for hosted in hosted_workflows: - existing = self._registered_orchestrations.get(hosted.name) + existing = self._registered_orchestrations.get(hosted.name.casefold()) if existing is not None: if existing is not hosted: raise ValueError( - f"A different workflow named '{hosted.name}' is already registered on this " - f"app. A workflow name maps to a single durable orchestration " - f"('dafx-{hosted.name}'); rename one of them." + f"A different workflow named '{hosted.name}' collides with already-registered " + f"'{existing.name}' on this app. A workflow name maps to a single durable " + f"orchestration ('dafx-{hosted.name}'), compared case-insensitively; rename one " + "of them." 
) continue self._register_workflow_primitives(hosted) @@ -371,7 +375,7 @@ def _register_workflow(self, workflow: Workflow) -> None: def _register_workflow_primitives(self, workflow: Workflow) -> None: """Register one workflow's entities, activities, and orchestrator (no routes).""" validate_workflow_name(workflow.name) - self._registered_orchestrations[workflow.name] = workflow + self._registered_orchestrations[workflow.name.casefold()] = workflow logger.debug("[AgentFunctionApp] Registering workflow '%s'", workflow.name) plan = plan_workflow_registration(workflow) diff --git a/python/packages/azurefunctions/tests/test_app.py b/python/packages/azurefunctions/tests/test_app.py index bedd2cc7117..e33ad40d30d 100644 --- a/python/packages/azurefunctions/tests/test_app.py +++ b/python/packages/azurefunctions/tests/test_app.py @@ -1480,6 +1480,29 @@ def _wf(executor_id: str) -> Mock: ): AgentFunctionApp(workflows=[_wf("a"), _wf("b")]) + def test_init_rejects_case_insensitive_duplicate_workflow_name(self) -> None: + """Workflow names that differ only by case collide and are rejected. + + The route ownership guard folds case, so hosting both ``orders`` and + ``Orders`` would let one workflow's routes reach the other's instances. 
+ """ + from agent_framework import Executor + + def _wf(name: str, executor_id: str) -> Mock: + ex = Mock(spec=Executor) + ex.id = executor_id + wf = Mock() + wf.name = name + wf.executors = {executor_id: ex} + return wf + + with ( + patch.object(AgentFunctionApp, "_setup_executor_activity"), + patch.object(AgentFunctionApp, "_setup_workflow_orchestration"), + pytest.raises(ValueError, match="case-insensitively"), + ): + AgentFunctionApp(workflows=[_wf("orders", "a"), _wf("Orders", "b")]) + def test_init_rejects_mapping_key_mismatch(self) -> None: """A workflows mapping whose key disagrees with Workflow.name is rejected.""" mock_workflow = Mock() diff --git a/python/packages/durabletask/agent_framework_durabletask/_worker.py b/python/packages/durabletask/agent_framework_durabletask/_worker.py index db36b68afb5..a34bafc7759 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_worker.py +++ b/python/packages/durabletask/agent_framework_durabletask/_worker.py @@ -90,9 +90,9 @@ def __init__( self._registered_agents: dict[str, SupportsAgentRun] = {} self._workflows: dict[str, Workflow] = {} # Every workflow whose orchestration has been registered (top-level plus nested - # sub-workflows), keyed by name -> the registered instance, so a sub-workflow - # shared across the tree is registered once while two different workflows that - # collide on a name are rejected. + # sub-workflows), keyed by case-folded name -> the registered instance, so a + # sub-workflow shared across the tree is registered once while two different + # workflows whose names collide (including case-only differences) are rejected. 
self._registered_orchestrations: dict[str, Workflow] = {} logger.debug("[DurableAIAgentWorker] Initialized with worker type: %s", type(worker).__name__) @@ -228,8 +228,11 @@ def configure_workflow( """ workflow_name = workflow.name validate_workflow_name(workflow_name) - if workflow_name in self._workflows: - raise ValueError(f"Workflow '{workflow_name}' is already registered on this worker.") + if any(name.casefold() == workflow_name.casefold() for name in self._workflows): + raise ValueError( + f"Workflow '{workflow_name}' is already registered on this worker " + "(workflow names are compared case-insensitively)." + ) # Validate the whole composition (top-level plus every nested sub-workflow) # up front, so an invalid/auto-generated nested name (or an executor id that @@ -246,13 +249,14 @@ def configure_workflow( # Register the top-level workflow and every nested sub-workflow (deduped by # name), so the parent can drive sub-workflows as durable child orchestrations. for hosted in hosted_workflows: - existing = self._registered_orchestrations.get(hosted.name) + existing = self._registered_orchestrations.get(hosted.name.casefold()) if existing is not None: if existing is not hosted: raise ValueError( - f"A different workflow named '{hosted.name}' is already registered on this " - f"worker. A workflow name maps to a single durable orchestration " - f"('dafx-{hosted.name}'); rename one of them." + f"A different workflow named '{hosted.name}' collides with already-registered " + f"'{existing.name}' on this worker. A workflow name maps to a single durable " + f"orchestration ('dafx-{hosted.name}'), compared case-insensitively; rename one " + "of them." ) continue self._register_single_workflow(hosted, callback) @@ -269,7 +273,7 @@ def _register_single_workflow( via ``plan_workflow_registration``. 
""" validate_workflow_name(workflow.name) - self._registered_orchestrations[workflow.name] = workflow + self._registered_orchestrations[workflow.name.casefold()] = workflow plan = plan_workflow_registration(workflow) # Register agent executors as durable entities, scoped by workflow name so diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py index 2e3ffa02f2a..e3140d1577f 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py @@ -102,12 +102,14 @@ def collect_hosted_workflows(workflow: Workflow) -> Iterator[Workflow]: A host registers the orchestration primitives for each yielded workflow so a parent orchestration can invoke its sub-workflows as child orchestrations. - Workflows are deduped by :attr:`Workflow.name`: the *same* sub-workflow instance - reused across the tree (or shared by two top-level workflows) is yielded once, - which is the expected fan-out pattern. Two **different** workflow instances that - share a name are rejected, since both would resolve to one durable orchestration - (``dafx-{name}``) and silently shadow each other. The top-level ``workflow`` is - yielded first. + Workflows are deduped by :attr:`Workflow.name`, **compared case-insensitively**: + the *same* sub-workflow instance reused across the tree (or shared by two + top-level workflows) is yielded once, which is the expected fan-out pattern. Two + **different** workflow instances whose names collide (including case-only + differences) are rejected, since both would resolve to one durable orchestration + (``dafx-{name}``) -- whose name the route ownership check compares + case-insensitively -- and would silently shadow each other. The top-level + ``workflow`` is yielded first. Args: workflow: The top-level workflow to walk. 
@@ -116,22 +118,25 @@ def collect_hosted_workflows(workflow: Workflow) -> Iterator[Workflow]: Each distinct workflow in the nesting tree, parent before child. Raises: - ValueError: If two different workflow instances in the tree share a name. + ValueError: If two different workflow instances in the tree have colliding + (case-insensitive) names. """ seen: dict[str, Workflow] = {} def _walk(current: Workflow) -> Iterator[Workflow]: - existing = seen.get(current.name) + key = current.name.casefold() + existing = seen.get(key) if existing is not None: if existing is not current: raise ValueError( - f"Two different workflows are named '{current.name}'. A workflow name maps to a " - f"single durable orchestration ('dafx-{current.name}'), so names must be unique " - "within a hosted composition. Rename one, or reuse the same Workflow instance if " - "they are meant to be the same sub-workflow." + f"A different workflow named '{current.name}' collides with '{existing.name}'. A " + f"workflow name maps to a single durable orchestration ('dafx-{current.name}'), " + "compared case-insensitively, so names must be unique within a hosted composition. " + "Rename one, or reuse the same Workflow instance if they are meant to be the same " + "sub-workflow." 
) return - seen[current.name] = current + seen[key] = current yield current plan = plan_workflow_registration(current) for sub in plan.subworkflow_executors: diff --git a/python/packages/durabletask/tests/test_worker.py b/python/packages/durabletask/tests/test_worker.py index efaf32c7eeb..2c1b5ad68a3 100644 --- a/python/packages/durabletask/tests/test_worker.py +++ b/python/packages/durabletask/tests/test_worker.py @@ -271,6 +271,17 @@ def test_rejects_duplicate_workflow_name(self, agent_worker: DurableAIAgentWorke with pytest.raises(ValueError, match="already registered"): agent_worker.configure_workflow(self._agent_workflow("orders", "b")) + def test_rejects_case_insensitive_duplicate_workflow_name(self, agent_worker: DurableAIAgentWorker) -> None: + """Workflow names that differ only by case collide and are rejected. + + The route ownership guard folds case, so allowing both ``orders`` and + ``Orders`` would let one workflow's surface reach the other's instances. + """ + agent_worker.configure_workflow(self._agent_workflow("orders", "a")) + + with pytest.raises(ValueError, match="case-insensitively"): + agent_worker.configure_workflow(self._agent_workflow("Orders", "b")) + def test_rejects_auto_generated_workflow_name(self, agent_worker: DurableAIAgentWorker) -> None: """A workflow with an auto-generated WorkflowBuilder name is rejected.""" import uuid diff --git a/python/packages/durabletask/tests/test_workflow_registration.py b/python/packages/durabletask/tests/test_workflow_registration.py index 91166d01b29..7ecee8710e3 100644 --- a/python/packages/durabletask/tests/test_workflow_registration.py +++ b/python/packages/durabletask/tests/test_workflow_registration.py @@ -166,7 +166,21 @@ def test_rejects_two_different_workflows_sharing_a_name(self) -> None: inner_b = _workflow("shared", {"y": _activity_executor("y")}) # different instance, same name outer = _workflow("outer", {"a": _subworkflow_executor("a", inner_a), "b": _subworkflow_executor("b", inner_b)}) - 
with pytest.raises(ValueError, match="different workflows"): + with pytest.raises(ValueError, match="collides"): + list(collect_hosted_workflows(outer)) + + def test_rejects_case_insensitive_name_collision(self) -> None: + """Two different instances whose names differ only by case collide and raise. + + The route ownership guard compares the durable orchestration name + case-insensitively, so case-only name variants must be rejected here or one + workflow's routes could operate on the other's instances. + """ + inner_a = _workflow("shared", {"x": _activity_executor("x")}) + inner_b = _workflow("Shared", {"y": _activity_executor("y")}) # case-only difference + outer = _workflow("outer", {"a": _subworkflow_executor("a", inner_a), "b": _subworkflow_executor("b", inner_b)}) + + with pytest.raises(ValueError, match="collides"): list(collect_hosted_workflows(outer)) def test_same_instance_reused_is_deduped_not_rejected(self) -> None: From 5df15ef28b731b3fe6c7e9467a85e21617f45902 Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Wed, 24 Jun 2026 13:32:51 -0400 Subject: [PATCH 10/12] docs(durabletask): remove multiworkflow/subworkflow ADR and design docs Drop the ADR and design exploration documents and the dangling docstring reference to them. 
--- ...abletask-multiworkflow-and-subworkflows.md | 121 ---- ...abletask-multiworkflow-and-subworkflows.md | 547 ------------------ .../_workflows/naming.py | 3 +- 3 files changed, 1 insertion(+), 670 deletions(-) delete mode 100644 docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md delete mode 100644 docs/design/durabletask-multiworkflow-and-subworkflows.md diff --git a/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md b/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md deleted file mode 100644 index dd1dfa4ee3e..00000000000 --- a/docs/decisions/0030-durabletask-multiworkflow-and-subworkflows.md +++ /dev/null @@ -1,121 +0,0 @@ ---- -status: proposed -contact: ahmedmuhsin -date: 2025-06-13 -deciders: ahmedmuhsin -consulted: -informed: ---- - -# Durable Task hosting: multiple workflows per host and sub-workflows - -## Context and Problem Statement - -The Python Durable Task hosting layer (the standalone `DurableAIAgentWorker` and the Azure Functions `AgentFunctionApp`) originally hosted exactly **one** MAF `Workflow` per host, registered under a single fixed orchestration name (`workflow_orchestrator`). Two capabilities were missing relative to the in-process MAF runtime and the .NET durable host: - -1. **Multiple workflows per host** — one worker / one Function app could not host more than one workflow, and two workflows that happened to reuse an executor or agent id would collide on shared durable primitive names. -2. **Sub-workflows (composition)** — MAF's `WorkflowExecutor` embeds one workflow inside another, but the durable hosts had no way to run a nested workflow as a first-class durable unit. - -This ADR records the design decisions for adding both capabilities to the Python durable hosts, keeping them aligned with the .NET durable host where it matters (the Durable Task tooling/UI surface) and with the in-process MAF semantics everywhere else. 
The full design exploration lives in [`docs/design/durabletask-multiworkflow-and-subworkflows.md`](../design/durabletask-multiworkflow-and-subworkflows.md); this ADR captures the decisions and the considered alternatives. - -## Decision Drivers - -- **Stable durable identities.** Durable replay only resumes an in-flight orchestration if the orchestration/activity/entity names still resolve to the same functions. Names must be stable across restarts and derived deterministically from a workflow name. -- **No accidental collisions.** Two co-hosted workflows that reuse an executor or agent id must not share a durable entity/activity, or one workflow's implementation would silently service another's dispatch. -- **Alignment with .NET on the surfaced identity.** The orchestration name is what the Durable Task tooling/UI shows; it should match .NET's `WorkflowNamingHelper` byte-for-byte. -- **Alignment with in-process MAF semantics.** Sub-workflow output forwarding, HITL request/response, and per-run state isolation should behave the same durably as in-process. -- **Stable caller surface.** HTTP callers should not have to change URLs as an app grows from one workflow to many, or discover internal child orchestration instance ids for nested HITL. -- **Determinism.** Orchestrator code must be replay-safe (deterministic child instance ids, no `uuid4()` in the orchestrator). - -## Considered Options - -The design has several semi-independent decision points; the considered alternatives are grouped by decision below. The chosen options are summarized in the Decision Outcome. - -### Workflow-internal durable names (collision avoidance) - -- **Approach A — scope inner names by workflow** (`dafx-{workflowName}-{executorId}`). Distinct names per workflow using plain closures; no runtime registry. - - Good: removes same-executor-id collisions with no extra moving parts. - - Good: each workflow's primitives are independently inspectable. 
- - Neutral: diverges from .NET's bare `dafx-{executorId}` inner names (but those are not UI-surfaced). -- **Approach B — a runtime registry keyed by (workflow, executor)** mapping to shared handlers. - - Good: closer to .NET's bare inner names. - - Bad: introduces a registry indirection and a stateful lookup on the hot path; more to get wrong on replay. - -### Sub-workflow execution model - -- **Model A — run the inner workflow inside one activity** of the parent orchestration. - - Good: fewest orchestration instances. - - Bad: the inner workflow's executors are not independently durable or observable; HITL inside the inner workflow cannot pause durably. -- **Model B — run the inner workflow as a durable child orchestration** via `call_sub_orchestrator(dafx-{innerName})`. - - Good: matches what the .NET durable host does (`ExecuteSubWorkflowAsync` → child orchestration). - - Good: inner executors are independently durable/observable; inner HITL pauses durably on the child instance. - - Neutral: more orchestration instances on the task hub. - -### Azure Functions route shape for multiple workflows - -- **Always per-workflow routes** (`workflow/{name}/run|status|respond`), even for a single workflow. - - Good: the URL shape never changes as an app grows; callers are stable. - - Neutral: a single-workflow app has a slightly longer URL. -- **Bare routes for one workflow, per-workflow routes only when there are many.** - - Bad: callers must change URLs when a second workflow is added. - -### Single-workflow orchestration-name migration - -- **Hard switch** to `dafx-{name}` with no runtime alias for the old `workflow_orchestrator` name. - - Good: one naming scheme everywhere; no special-case alias code. - - Bad: pre-upgrade in-flight single-workflow instances under `workflow_orchestrator` will not resume. - - Acceptable: durable workflow runs are typically short-lived and this is a preview surface; operators drain in-flight instances before upgrading. 
`WORKFLOW_ORCHESTRATOR_NAME` remains exported as a deprecated source alias. -- **Dual registration / runtime alias** that resumes both names. - - Good: in-flight instances survive the upgrade. - - Bad: permanent alias-compat code on a preview surface for a low-value case. - -### Sub-workflow HITL addressing - -- **B1 — direct child addressing.** Expose child instance ids; the responder posts to `workflow/{innerName}/respond/{childInstanceId}/{requestId}`. - - Good: simple host plumbing. - - Bad: leaks child instance ids to the caller and changes the addressing surface per nesting depth. -- **B2 — propagated single surface.** Bubble inner pending requests up into the parent custom status with qualified request ids (`{executorId}~{ordinal}~{requestId}`); a response to the parent is routed by peeling one hop and raising the event on the owning child instance. - - Good: one addressing surface for arbitrarily deep nesting; the caller always talks to the top-level run. - - Good: consistent with the "always per-workflow, stable surface" decision. - - Good: the `~{ordinal}~` hop indexes the parent's `subworkflows` child list, so a node that dispatches several children in one superstep keeps each addressable. - - Neutral: requires parent→child response plumbing in the host/client. - - Note: the separator is `~` (not `::`) because core emits `auto::{index}` request ids for functional `@workflow` HITL; `~` never appears in a core request id and is rejected in executor ids, so qualified ids round-trip unambiguously. - -### Workflow agent addressing - -- **Reuse `add_agent` with a scoped entity id** (`dafx-{workflowName}-{executorId}`); workflow agents are reachable via `get_agent(name, workflow_name=...)`. No separate `workflow_agents=` kwarg. - - Good: one registration path; workflow agents appear in `agents` / `get_agent`. - - Good: the per-workflow grouping is an internal planner structure both hosts consume. 
-- **A separate `workflow_agents=` registration surface.** - - Bad: a parallel registration path and a second public kwarg for what is an internal grouping concern. - -## Decision Outcome - -1. **Orchestration naming:** `dafx-{workflowName}` (matches .NET; the UI-surfaced name). -2. **Workflow-internal durable names:** **Approach A** — scope inner activity/entity names by workflow (`dafx-{workflowName}-{executorId}`). -3. **Azure Functions route shape:** **always per-workflow routes** (`workflow/{name}/run|status|respond`). -4. **Sub-workflow execution model:** **Model B** — child orchestration via `call_sub_orchestrator`, matching the .NET durable host. -5. **Single-workflow migration:** **hard switch** to `dafx-{name}` with no runtime alias; `WORKFLOW_ORCHESTRATOR_NAME` stays as a deprecated source alias only. -6. **Sub-workflow HITL addressing:** **B2** — propagate inner pending requests to the parent custom status with qualified request ids (`{executorId}~{ordinal}~{requestId}`); the caller always responds to the top-level run. -7. **Workflow agent addressing:** register through the **same** `add_agent` primitive under the scoped name; reachable via `get_agent(name, workflow_name=...)`; no `workflow_agents=` kwarg. Agent conversation state stays isolated by the entity key (`ctx.instance_id`). -8. **Hardening:** reject two **different** workflow instances that share a name (the same instance reused by several nodes is deduped); validate executor ids (separator-free, length-bounded); and strip the reserved sub-workflow envelope key from untrusted client input at the host boundary so a forged envelope cannot reach the trusted pickle path. Sub-workflow nesting is **not** capped by a depth counter — the nesting tree is finite at build time and the durable instance-id length limit is the natural ceiling (matching .NET, which imposes no limit). 
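The qualified request id format from decision 6 can be sketched as a pair of helpers. This is an illustration only: `qualify` and `split_hop` are hypothetical names whose behavior follows what the tests in this series assert for `qualify_subworkflow_request_id` / `split_subworkflow_request_id`, and the bodies are assumptions rather than the package's implementation.

```python
from __future__ import annotations

SEPARATOR = "~"  # assumed constant name; the ADR fixes only the character


def qualify(executor_id: str, ordinal: int, request_id: str) -> str:
    """Prepend one hop: {executorId}~{ordinal}~{requestId}."""
    return f"{executor_id}{SEPARATOR}{ordinal}{SEPARATOR}{request_id}"


def split_hop(qualified: str) -> tuple[str, int, str] | None:
    """Peel the outermost hop, or return None for a bare (unqualified) id."""
    parts = qualified.split(SEPARATOR, 2)
    if len(parts) != 3:
        return None
    executor_id, ordinal_text, remainder = parts
    if not (ordinal_text.isascii() and ordinal_text.isdigit()):
        return None  # non-integer ordinal: treat the whole id as bare
    return executor_id, int(ordinal_text), remainder
```

Because each call peels exactly one hop, a responder can route a reply through arbitrarily deep nesting by repeating `split_hop` on the remainder until it returns `None`. A core id such as `auto::3` contains no `~`, so it is always treated as bare.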
- -### Consequences - -- Good: two workflows can be co-hosted on one worker / app and reuse executor and agent ids without colliding; each workflow's durable primitives are independently inspectable. -- Good: sub-workflows are first-class durable units; inner HITL pauses durably and surfaces behind a single top-level addressing surface. -- Good: the orchestration name remains identical to .NET, so the Durable Task tooling/UI is consistent across languages. -- Good: HTTP callers have a stable URL shape and never need to discover internal child instance ids. -- Bad / accepted: pre-upgrade single-workflow instances under `workflow_orchestrator` will not resume after the hard switch. -- Neutral: sub-workflows add orchestration instances to the task hub (one child orchestration per `WorkflowExecutor` invocation). - -### Out of scope / follow-up - -- **Cross-workflow shared agents.** A single agent that intentionally shares conversation memory across two co-hosted workflows is out of scope. Today, agent state is isolated per run by the entity key (`ctx.instance_id`); intentional sharing would need an explicit stable shared entity key rather than `instance_id`. Flagged as a possible follow-up. - -## More Information - -- Design document: [`docs/design/durabletask-multiworkflow-and-subworkflows.md`](../design/durabletask-multiworkflow-and-subworkflows.md) -- Implementation: Python `agent_framework_durabletask` (standalone worker, client, orchestrator, naming) and `agent_framework_azurefunctions` (`AgentFunctionApp`). -- Samples: `python/samples/04-hosting/durabletask/11_subworkflow` (composition) and `.../12_subworkflow_hitl` (HITL inside a sub-workflow). -- .NET reference: `WorkflowNamingHelper` (orchestration naming) and the durable host's `ExecuteSubWorkflowAsync` (sub-workflow as child orchestration). 
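The naming decisions recorded above (orchestration name `dafx-{workflowName}`, workflow-scoped inner names `dafx-{workflowName}-{executorId}`) can be sketched as follows. The function names are hypothetical, and the case-sensitive prefix match in the reverse helper is an assumption; the case-folded ownership comparison mirrors the route-scoping check described in this series.

```python
from __future__ import annotations

PREFIX = "dafx-"  # prefix stated in the ADR; the constant name is illustrative


def orchestration_name(workflow_name: str) -> str:
    """Durable orchestration name for a hosted workflow."""
    return f"{PREFIX}{workflow_name}"


def workflow_name_from(orchestration: str) -> str | None:
    """Reverse mapping; None when the name does not carry the prefix."""
    if not orchestration.startswith(PREFIX):
        return None
    return orchestration[len(PREFIX):]


def scoped_executor_name(workflow_name: str, executor_id: str) -> str:
    """Inner activity/entity name, scoped by workflow to avoid collisions."""
    return f"{PREFIX}{workflow_name}-{executor_id}"


def is_owned_by(orchestration: str, route_workflow_name: str) -> bool:
    """Route ownership check: compare orchestration names case-insensitively."""
    return orchestration.casefold() == orchestration_name(route_workflow_name).casefold()
```

Under this scheme two co-hosted workflows that both define an executor with id `review` register distinct primitives (`dafx-orders-review` and `dafx-billing-review`), which is the collision the unscoped `dafx-{executorId}` form could not avoid.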
diff --git a/docs/design/durabletask-multiworkflow-and-subworkflows.md b/docs/design/durabletask-multiworkflow-and-subworkflows.md deleted file mode 100644 index 1eb34ceb338..00000000000 --- a/docs/design/durabletask-multiworkflow-and-subworkflows.md +++ /dev/null @@ -1,547 +0,0 @@ -# Durable hosting: multiple workflows and sub-workflows (Python) - -Status: Implemented — decisions promoted to -[ADR-0030](../decisions/0030-durabletask-multiworkflow-and-subworkflows.md) -Scope: `python/packages/durabletask` (standalone Durable Task worker) and -`python/packages/azurefunctions` (Azure Functions host) -Related: PR #6418 (standalone Durable Task workflow hosting), core -`agent_framework._workflows` - -This document sketches the work needed to add two capabilities to the Python -durable workflow hosting layer: - -1. **Multiple workflows per host** — register and address more than one workflow - on a single worker / Function App. -2. **Sub-workflows** — run a `Workflow` nested inside another workflow under - durable execution. - -It maps the current single-workflow architecture, summarizes the existing .NET -implementation and the in-process core engine, then proposes a design with -options, recommendations, and a phased work breakdown. Where the .NET approach -is shaped by C#/DI specifics, the Python-specific recommendation is called out. - ---- - -## 1. Current state (single workflow) - -The durable hosting layer today assumes exactly one workflow per host. - -- **Fixed orchestrator name.** `WORKFLOW_ORCHESTRATOR_NAME = "workflow_orchestrator"` - is a module constant in `_workflows/orchestrator.py`, exported from the package - `__all__`. Every workflow registers and starts under this one name. -- **Singular worker registration.** `DurableAIAgentWorker.configure_workflow(workflow)` - stores `self._workflow = workflow` and registers one orchestrator whose - `__name__` is set to the fixed constant. 
-- **Registration planner.** `plan_workflow_registration(workflow)` walks
-  `workflow.executors.values()` and classifies each executor: `AgentExecutor`
-  becomes a durable **entity**, everything else becomes a durable **activity**.
-  It returns a single `WorkflowRegistrationPlan(agent_executors, activity_executors,
-  orchestrator_name)`.
-- **Global durable names.** Activities and agent entities are named
-  `dafx-{executor.id}` (`AgentSessionId.to_entity_name` uses the same `dafx-`
-  prefix). These names are **global per task hub**, so two workflows sharing an
-  executor id collide.
-- **Singular client.** `DurableWorkflowClient.start_workflow()` (and
-  `run_workflow` / `stream_workflow`) always schedule `WORKFLOW_ORCHESTRATOR_NAME`.
-  There is no per-workflow targeting.
-- **Singular Functions host.** `AgentFunctionApp(workflow=...)` takes one workflow,
-  registers one orchestrator (`workflow_orchestrator`), per-executor activities
-  (`dafx-{executor.id}`), and three flat HTTP routes: `workflow/run`,
-  `workflow/status/{instanceId}`, `workflow/respond/{instanceId}/{requestId}`.
-  The route-scoping check `_is_workflow_orchestration(status)` compares
-  `status.name.casefold() == WORKFLOW_ORCHESTRATOR_NAME.casefold()` so a caller
-  cannot read or inject into unrelated orchestrations in the same hub.
-
-**No sub-workflow support exists.** Searching both packages for
-`subworkflow` / `WorkflowExecutor` / `nested` returns nothing. A core
-`WorkflowExecutor` (see §3) is not an `AgentExecutor`, so the planner currently
-classifies it as a plain non-agent executor and would register it as a single
-activity. Its activity body calls `executor.execute(...)`, which runs the
-**entire inner workflow in-process inside one activity invocation** via
-`WorkflowExecutor.process_workflow` → `self.workflow.run(...)`. That means:
-
-- inner executors do not become durable activities/entities (no durable replay
-  for inner steps, inner agent calls are not durable entity calls);
-- inner human-in-the-loop (HITL) cannot pause — there is no external-event pump
-  inside an activity, and the default `propagate_request=False` emits a
-  `SubWorkflowRequestMessage` to a parent executor that the durable host never
-  wires up;
-- a long inner workflow can exceed activity time limits;
-- inner events are not streamed.
-
-So sub-workflows are effectively unsupported, not merely unoptimized.
-
----
-
-## 2. .NET reference (alignment baseline)
-
-The .NET hosting layer already supports both capabilities. Key facts to align
-with (or deliberately diverge from):
-
-- **Multiple workflows, keyed by name.** `AddWorkflow(name, factory)` registers
-  workflows additively, keyed by `Workflow.Name` (lookup dictionary uses
-  `StringComparer.OrdinalIgnoreCase`; registration asserts the factory's
-  `Workflow.Name` matches the key).
-- **Per-workflow orchestration name.** `WorkflowNamingHelper.ToOrchestrationFunctionName(name)`
-  returns `"dafx-" + name`, with a `ToWorkflowName` reverse. The orchestration
-  name is parameterized, not fixed.
-- **Per-workflow HTTP routes.** `workflows/{workflowName}/run`,
-  `workflows/{workflowName}/status/{runId}`, `workflows/{workflowName}/respond/{runId}`.
-  Ownership is enforced by `IsOrchestrationOwnedByWorkflow(orchestrationName,
-  functionName, suffix)` comparing the instance's orchestration name to
-  `dafx-{routeWorkflowName}`.
-- **Executor → durable mapping (in the durable host).** Non-agent executor →
-  durable **activity** `dafx-{executorName}` (dispatched via
-  `context.CallActivityAsync`, so results are cached in orchestration history and
-  not re-run on replay); agent executor → durable **entity**
-  `AgentSessionId.ToEntityName(executorName)`. So in the .NET *durable* host the
-  executors are durable activities/entities — the same model Python uses — *not*
-  in-process objects. The dispatch switch lives in `DurableExecutorDispatcher.DispatchAsync`.
-  Activity registration is deduplicated across workflows by name via a `HashSet`,
-  and the executor registry is keyed by executor name (first registration wins),
-  so two workflows that define different executors with the same name **collide**
-  (a documented constraint, not a fix).
-- **Sub-workflows run as durable CHILD ORCHESTRATIONS (not in-process).** In the
-  *durable* host, `DurableExecutorDispatcher.ExecuteSubWorkflowAsync` dispatches a
-  sub-workflow node via `context.CallSubOrchestratorAsync("dafx-{innerName}", ...)`.
-  The child orchestration runs the same superstep loop and its inner executors are
-  durable activities/entities cached in the *child's* history. Sub-workflow and
-  request-port bindings are skipped by activity registration precisely because
-  they use this specialized dispatch. (The `WorkflowHostExecutor` /
-  `InProcessRunner` path is the **core in-process engine**, a separate runtime; it
-  is *not* how the durable host runs sub-workflows.)
-- **Client retains workflow identity.** `IWorkflowClient.RunAsync(workflow, input, runId)`;
-  run handles carry `WorkflowName`.
-
-**Corrected mental model (resolving an earlier mistake in this doc):** in the
-.NET *durable* host, non-agent executors are durable activities, agent executors
-are durable entities, and sub-workflows are durable child orchestrations via
-`CallSubOrchestratorAsync`. "Executors run in-process" is true only of the *core
-in-process engine*, never the durable host. This means the Python child-
-orchestration model for sub-workflows (see §5) is **alignment with .NET, not a
-divergence**.
-
----
-
-## 3. Core in-process model (what we mirror durably)
-
-The core engine (`agent_framework._workflows`) already models nested workflows
-in-process, and the durable layer should mirror its semantics:
-
-- **`WorkflowExecutor(Executor)`** wraps a `Workflow` as an executor
-  (`process_workflow` runs `self.workflow.run(input)`), publicly exported from
-  `agent_framework` along with `SubWorkflowRequestMessage` /
-  `SubWorkflowResponseMessage`.
-- **Request bridging.** Inner `request_info` either propagates to the parent's
-  own request surface (`propagate_request=True` → `ctx.request_info(...)`) or is
-  wrapped as a `SubWorkflowRequestMessage` sent to a parent executor
-  (`propagate_request=False`, the default). Responses route back by `request_id`.
-- **Isolation + concurrency.** Each inner run gets an `execution_id`; a
-  `request_id → execution_id` map routes responses to the correct concurrent run.
-- **Checkpointing.** `on_checkpoint_save` / `on_checkpoint_restore` persist the
-  inner execution contexts and rehydrate pending request-info events.
-- **Execution + events.** Pregel-style supersteps; `WorkflowEvent` types include
-  `output`, `intermediate`, `request_info`, `executor_invoked/completed`,
-  `superstep_*`, lifecycle/diagnostic. `State` is shared within a workflow and
-  isolated per workflow instance.
-
-The durable orchestrator (`run_workflow_orchestrator`) already re-implements the
-superstep loop, edge-group routing, fan-in/out, and HITL pause/resume against the
-`WorkflowOrchestrationContext` protocol. Sub-workflow support extends this loop to
-a new executor category; multi-workflow support parameterizes registration and
-naming around it.
-
----
-
-## 4. Part 1 — Multiple workflows per host
-
-### 4.1 Naming helpers (foundation)
-
-Replace the fixed constant with a helper pair, mirroring .NET
-`WorkflowNamingHelper`:
-
-```python
-WORKFLOW_ORCHESTRATOR_PREFIX = "dafx-"
-
-def workflow_orchestrator_name(workflow_name: str) -> str: ...  # "dafx-{name}"
-def workflow_name_from_orchestrator(name: str) -> str | None: ...  # reverse, validates prefix
-def sanitize_workflow_name(name: str) -> str: ...  # enforce durable-safe charset
-```
-
-Notes:
-- This aligns the Python orchestration name scheme with .NET (`dafx-{name}`).
-- `WORKFLOW_ORCHESTRATOR_NAME` stays exported as a **deprecated** alias to keep
-  the public surface stable; see §6 back-compat.
-
-### 4.2 Workflow names must be explicit and stable
-
-`WorkflowBuilder` defaults an unnamed workflow to `f"WorkflowBuilder-{uuid4()}"`.
-A random name regenerates on every process build, which would change the
-orchestration function name across worker restarts and **break resume of
-in-flight instances**. Therefore:
-
-- Multi-workflow hosting **requires an explicit, stable `Workflow.name`** (reject
-  auto-generated `WorkflowBuilder-` names at registration, mirroring .NET's
-  assert-name-matches-key contract).
-- Names are validated/sanitized to the durable name charset.
-- Duplicate names within one host are rejected: two **different** workflow
-  instances that share a name collide on one `dafx-{name}` orchestration and raise;
-  the **same** instance reused by several nodes (fan-out) is deduplicated and
-  registered once.
-- Executor ids are validated for durable hosting too: they must be separator-free
-  (the nested-HITL qualifier, below) and length-bounded, since they are
-  interpolated into durable activity/entity names and nested child-orchestration
-  instance ids.
-
-### 4.3 Durable names (decision: scope workflow-internal names by workflow)
-
-The orchestration name stays **`dafx-{workflowName}`** (matches .NET; this is the
-name the Durable Task tooling/UI keys off). For a workflow's **internal**
-executors and agents, the durable names are **scoped by workflow**:
-
-- non-agent activity: `dafx-{workflowName}-{executorId}`
-- agent entity: `dafx-{workflowName}-{executorId}`
-
-Each workflow registers its own distinctly named activities/entities, each a
-closure capturing that workflow's specific executor/agent instance (the same
-shape as today's single-workflow code, just with a longer name). `(workflow,
-executor)` is globally unique, so two co-hosted workflows that reuse an executor
-id never collide.
-
-**Why scope the names instead of resolving a bare name at runtime.** A
-`dafx-{executorId}` activity/entity is created by a factory that **captures one
-specific instance** (e.g. `__create_agent_entity` → `AgentEntity(agent=agent,
-...)`, registered once via `add_entity`; `add_agent` even raises `ValueError` on
-a duplicate id). With one global name per executor id, two workflows that define
-the same id backed by **different** implementations (different agent
-model/instructions/tools, or different executor code) would have one shadow the
-other — a workflow silently gets behavior it did not expect. Putting the workflow
-name in the durable name removes that foot-gun directly: different names, no
-shared registration, plain closures, no per-call workflow lookup.
-
-This diverges from .NET's *inner* activity/entity names (.NET keeps bare
-`dafx-{executorName}` and resolves from a global registry keyed by name, which
-keeps the collision as a documented constraint). The divergence is deliberate and
-low-cost: the **orchestration** name — the one the DT UI surfaces — is identical
-to .NET (`dafx-{workflowName}`); only the inner activity/entity names differ, and
-no tooling depends on those strings.
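
The scoping rule above can be illustrated with a small, self-contained sketch. It is not the package's registration code: `scoped_durable_name`, `bare_durable_name`, and `build_registry` are hypothetical names, and the registry is a plain dict standing in for the worker's activity/entity registration. It only shows why two co-hosted workflows that both define a `translator` executor collide under bare names and coexist under scoped ones.

```python
from typing import Callable

DURABLE_NAME_PREFIX = "dafx-"  # prefix used throughout this design

Handler = Callable[[str], str]


def bare_durable_name(executor_id: str) -> str:
    """Single-workflow scheme: dafx-{executorId}, global per task hub."""
    return f"{DURABLE_NAME_PREFIX}{executor_id}"


def scoped_durable_name(workflow_name: str, executor_id: str) -> str:
    """Proposed scheme: dafx-{workflowName}-{executorId} (hypothetical helper name)."""
    return f"{DURABLE_NAME_PREFIX}{workflow_name}-{executor_id}"


def build_registry(workflows: dict[str, dict[str, Handler]], *, scoped: bool) -> dict[str, Handler]:
    """Register every executor of every workflow under exactly one durable name."""
    registry: dict[str, Handler] = {}
    for workflow_name, executors in workflows.items():
        for executor_id, handler in executors.items():
            if scoped:
                name = scoped_durable_name(workflow_name, executor_id)
            else:
                name = bare_durable_name(executor_id)
            if name in registry:
                # Stand-in for the duplicate-id ValueError described above.
                raise ValueError(f"durable name already registered: {name}")
            registry[name] = handler
    return registry


# Two workflows reuse the executor id "translator" with different behavior.
workflows = {
    "orders": {"translator": lambda text: f"[fr] {text}"},
    "returns": {"translator": lambda text: f"[de] {text}"},
}

scoped_registry = build_registry(workflows, scoped=True)

try:
    build_registry(workflows, scoped=False)
    bare_collides = False
except ValueError:
    bare_collides = True
```

With scoped names each workflow keeps its own closure, so no per-call lookup is needed to decide which `translator` runs.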
-
-**Agent state is still isolated by the entity key.** Independent of naming, an
-agent entity is addressed by `(name, key)` with `key = ctx.instance_id`
-(`_prepare_agent_task` → `AgentSessionId(name=..., key=instance_id)`), so two runs
-never share conversation state and each run keeps its own session across turns —
-mirroring core. Scoping the *name* fixes *which implementation* runs; the *key*
-already isolated *state*.
-
-**Agent addressing (decision).** Workflow agents stay reachable, just under a
-**workflow-qualified** identity rather than a bare one. Both registration paths
-funnel through the same primitive today — `AgentFunctionApp` calls
-`add_agent(agent, entity_id=...)` for `agents=` *and* for each agent extracted
-from a workflow, and `DurableAIAgentWorker` does the same in `configure_workflow`.
-The only change is the name the planner hands to that primitive for workflow
-agents: scoped `{workflowName}-{executorId}` instead of bare `executorId`. So:
-
-- `agents=` (FunctionApp) / `add_agent(...)` (worker) → **bare** `dafx-{agentName}`,
-  the standalone HTTP/MCP-addressable surface.
-- agents inside a `workflow` → **scoped** `dafx-{workflowName}-{executorId}`,
-  registered through the *same* primitive, still tracked in the registry.
-
-Lookup is qualified, so workflow agents do **not** disappear from the surface:
-
-```python
-get_agent("translator")  # bare standalone agent
-get_agent("translator", workflow_name="orders")  # workflow-scoped agent
-```
-
-We deliberately do **not** add a `workflow_agents=` constructor input. The agents
-already live inside the `Workflow` object (each `AgentExecutor` holds its agent),
-so a separate map would duplicate that and create a source-of-truth conflict. The
-per-workflow agent grouping `{workflow_name: [agent_executors]}` is an *internal*
-structure the planner produces and both hosts consume — not a public kwarg. An
-agent used both standalone and inside a workflow is registered both ways and
-becomes two independent entities (bare + scoped) with separate state, which is the
-intended separation. This keeps "workflow step vs standalone callable" an explicit
-registration choice while keeping both reachable.
-
-*Cross-workflow shared* agent memory (one agent that deliberately remembers
-across two co-hosted workflows) remains out of scope; it would need an explicit
-stable shared entity key rather than `instance_id`.
-
-### 4.4 Standalone worker changes (`durabletask`)
-
-- `DurableAIAgentWorker.configure_workflow` becomes **additive**: store
-  `self._workflows: dict[str, Workflow]` keyed by sanitized name; reject
-  duplicates and auto-generated names.
-- Register one orchestrator per workflow, each a closure capturing its `Workflow`,
-  with `__name__ = workflow_orchestrator_name(name)`.
-- Register that workflow's non-agent activities and agent entities under their
-  **scoped** names `dafx-{workflowName}-{executorId}` (§4.3), each capturing the
-  specific executor/agent instance, via the **same** `add_agent` /
-  activity-registration primitives that standalone `add_agent` uses (only the
-  name differs). Workflow agents stay tracked in the registry under their
-  workflow-qualified identity; an agent that should *also* be standalone-
-  addressable under a bare name is registered separately via `add_agent`.
-- `plan_workflow_registration` already returns `orchestrator_name`; extend it to
-  also group agents/activities per workflow and thread the per-workflow name
-  through it (the `{workflow_name: [...]}` grouping both hosts consume).
-
-### 4.5 Client changes (`DurableWorkflowClient`)
-
-- Add an optional `workflow_name` to `start_workflow` / `run_workflow` /
-  `stream_workflow`. The client resolves the orchestration name via
-  `workflow_orchestrator_name(workflow_name)`.
-- When the worker hosts exactly one workflow, `workflow_name` may be omitted
-  (resolves to the sole registered workflow) for ergonomic back-compat.
-- Status/HITL methods remain keyed by `instance_id`; add an optional
-  `workflow_name` used to validate ownership (the instance's orchestration name
-  must match), mirroring `_is_workflow_orchestration`.
-
-### 4.6 Azure Functions host changes (`azurefunctions`)
-
-- `AgentFunctionApp` accepts `workflows: list[Workflow] | dict[str, Workflow]`
-  (keep singular `workflow=` as a back-compat alias for one entry).
-- Per workflow: register an orchestrator via
-  `@function_name(workflow_orchestrator_name(name))` + `@orchestration_trigger`,
-  register its activities/entities under scoped names
-  `dafx-{workflowName}-{executorId}` (§4.3), and register **per-workflow routes**:
-  `workflow/{workflowName}/run`, `workflow/{workflowName}/status/{instanceId}`,
-  `workflow/{workflowName}/respond/{instanceId}/{requestId}`.
-- **Routes are always per-workflow, even for a single workflow** (decision §8).
-  Keeping the shape constant means downstream callers do not have to change URLs
-  when an app grows from one workflow to many — the single-workflow case is just
-  the one-element case of the general shape. (No legacy flat `workflow/run`
-  aliases; this is still a preview surface.)
-- Replace `_is_workflow_orchestration(status)` with
-  `_is_owned_orchestration(status, workflow_name)` comparing
-  `status.name == workflow_orchestrator_name(workflow_name)` (the existing
-  case-insensitive comment already anticipated per-workflow names).
-- Workflow agents register through the **same** `add_agent` primitive as
-  `agents=`, under the scoped name `dafx-{workflowName}-{executorId}`, so they
-  stay tracked in the registry. `get_agent` gains an optional `workflow_name` to
-  resolve them: `get_agent(name)` for bare standalone agents,
-  `get_agent(name, workflow_name=...)` for workflow-scoped agents. Expose an agent
-  standalone (bare `dafx-{agentName}`) by passing it via `agents=`; an agent used
-  both ways is registered both ways and yields two independent entities (§4.3).
-- No `workflow_agents=` constructor kwarg — the agents already live inside each
-  `Workflow`; the per-workflow grouping is internal (§4.3).
-
----
-
-## 5. Part 2 — Sub-workflows
-
-A `WorkflowExecutor` node in a hosted workflow must run its inner `Workflow`
-durably. Three execution models were considered.
-
-### 5.1 Models considered
-
-- **Model A — inner workflow inside one activity (status quo).** Register the
-  `WorkflowExecutor` as a normal activity; its body runs `inner.workflow.run()`
-  in-process. Simplest, but not durable, cannot pause for inner HITL, and risks
-  activity timeouts. **Rejected** as the primary model (it is today's accidental,
-  broken behavior).
-- **Model B — child orchestration (recommended).** When the orchestrator reaches
-  a `WorkflowExecutor` node, it starts the inner workflow as a **durable child
-  orchestration** (`call_sub_orchestrator(workflow_orchestrator_name(inner_name),
-  input=...)`) and awaits its result like any other task. The inner workflow's
-  executors become its own activities/entities; it is independently durable,
-  checkpointed, observable (own instance id), and can run long without hitting
-  activity limits.
-- **Model C — inlined supersteps.** Recursively drive the inner workflow's
-  superstep loop inside the parent orchestration generator, scheduling inner
-  executor activities directly and qualifying inner request ids
-  (`{subId}.{requestId}`) like .NET ports. Durable and single-instance, but
-  bloats one orchestration history, prevents independent inner observation, and
-  re-implements nesting bookkeeping in the generator. **Rejected** as primary
-  (highest complexity, weakest observability).
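
To make Model B concrete, here is a toy sketch of a parent orchestrator that yields a sub-orchestration task and receives the child's result, the way it would receive any other task result. Everything except the `dafx-{name}` convention is an assumption made for illustration: `ToyContext`, `SubOrchestrationTask`, and `run_orchestration` are hypothetical stand-ins, not the Durable Task SDK, and the driver has no history, replay, or checkpointing.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable

DURABLE_NAME_PREFIX = "dafx-"


def workflow_orchestrator_name(workflow_name: str) -> str:
    """Orchestration name for a workflow: dafx-{name}."""
    return f"{DURABLE_NAME_PREFIX}{workflow_name}"


@dataclass
class SubOrchestrationTask:
    """What a parent yields when it wants a child orchestration to run."""
    name: str
    payload: Any


@dataclass
class ToyContext:
    """Hypothetical stand-in for an orchestration context (not the real SDK)."""
    registry: dict[str, Callable[..., Any]]
    started: list[str] = field(default_factory=list)

    def call_sub_orchestrator(self, name: str, *, input: Any = None) -> SubOrchestrationTask:
        self.started.append(name)
        return SubOrchestrationTask(name, input)


def run_orchestration(ctx: ToyContext, name: str, payload: Any) -> Any:
    """Drive one orchestrator; each yielded task runs as a child orchestration."""
    outcome = ctx.registry[name](ctx, payload)
    if not inspect.isgenerator(outcome):
        return outcome  # orchestrator finished without awaiting anything
    try:
        task = next(outcome)
        while True:
            # Run the child to completion, then resume the parent with its result.
            task = outcome.send(run_orchestration(ctx, task.name, task.payload))
    except StopIteration as done:
        return done.value


def enrich_orchestrator(ctx: ToyContext, payload: dict) -> dict:
    return {**payload, "priority": payload["total"] > 100}


def orders_orchestrator(ctx: ToyContext, payload: dict):
    # The WorkflowExecutor node: start the inner workflow by its derived name.
    enriched = yield ctx.call_sub_orchestrator(workflow_orchestrator_name("enrich"), input=payload)
    return {"order": enriched, "status": "accepted"}


ctx = ToyContext(registry={
    workflow_orchestrator_name("orders"): orders_orchestrator,
    workflow_orchestrator_name("enrich"): enrich_orchestrator,
})
result = run_orchestration(ctx, workflow_orchestrator_name("orders"), {"total": 250})
```

The point of the sketch is the shape: the inner workflow is just another registered orchestrator, looked up by the same naming helper, and the parent does not need to know how the child is implemented.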
-
-### 5.2 Recommendation: Model B (child orchestration)
-
-Model B is the natural durable fit for Python because Python's executors already
-run as activities/entities driven by the orchestrator — so a sub-workflow is just
-**another registered workflow started by a parent instead of by HTTP**. It reuses
-the entire Part 1 multi-workflow machinery (named registration, per-workflow
-orchestrator, scoped activity/entity names, ownership checks).
-
-**This matches .NET.** The .NET *durable* host dispatches sub-workflow nodes via
-`DurableExecutorDispatcher.ExecuteSubWorkflowAsync` →
-`context.CallSubOrchestratorAsync("dafx-{innerName}", ...)`, and the child
-orchestration runs its own superstep loop with its inner executors as durable
-activities/entities in the child's history. Model B is the same approach in
-Python (the `WorkflowHostExecutor` / `InProcessRunner` path is the *core in-
-process engine*, not the durable host). The tradeoff is more orchestration
-instances (extra bookkeeping and more rows in the DT UI) in exchange for true
-inner durability, independent inner observability, inner HITL, and no activity-
-timeout coupling.
-
-### 5.3 Required changes
-
-- **Protocol.** Add `call_sub_orchestrator(name, input, instance_id=None)` to the
-  `WorkflowOrchestrationContext` protocol, implemented by both adapters
-  (`DurableTaskWorkflowContext` → `OrchestrationContext.call_sub_orchestrator`;
-  `AzureFunctionsWorkflowContext` → `DurableOrchestrationContext.call_sub_orchestrator`).
-  Both underlying SDKs support sub-orchestrations.
-- **Planner.** Extend `plan_workflow_registration` to detect
-  `isinstance(executor, WorkflowExecutor)` and return a new
-  `subworkflow_executors` category carrying the inner `Workflow`. The host then
-  (a) **recursively registers** the inner workflow's orchestrator/activities/
-  entities, and (b) does **not** register the `WorkflowExecutor` itself as an
-  activity.
-- **Orchestrator routing.** In `run_workflow_orchestrator`'s task-preparation
-  phase, route a message destined for a `WorkflowExecutor` node to
-  `ctx.call_sub_orchestrator(...)` instead of an activity task. The child's
-  result feeds back into edge routing exactly like an activity result (outputs →
-  messages / final outputs).
-- **Deterministic child instance ids.** Derive
-  `f"{parent_instance_id}::{executor_id}::{counter}"` (the counter is monotonic
-  across supersteps, so a `WorkflowExecutor` that runs on multiple messages — e.g.
-  fan-out — gets a distinct, replay-stable id per child) for discoverability and
-  idempotent replay. (These are orchestration *instance* ids, distinct from the
-  HITL *request*-id qualifier below.)
-- **No artificial recursion cap.** Nesting depth is *not* bounded by a counter. A
-  `WorkflowExecutor` wraps a concrete `Workflow` instance, so the nesting tree is
-  finite and fixed at build time; there is no way to express unbounded runtime
-  recursion. The recursively-derived child instance ids grow with depth, so the
-  durable backend's instance-id length limit (together with the workflow-name and
-  executor-id caps) is the natural ceiling for any pathological construction — a
-  separate magic depth number would only reject legitimate deep compositions. This
-  also matches .NET, whose durable host imposes no depth limit.
-- **Result/output mapping.** Reuse the existing typed-output reconstruction
-  (`deserialize_workflow_output`) on the child result before routing.
-
-### 5.4 Sub-workflow HITL
-
-The inner workflow's `request_info` surfaces in the **child** orchestration's
-custom status. Two addressing options:
-
-- **B1 — direct child addressing.** Expose child instance ids; the responder
-  posts to `workflow/{innerName}/respond/{childInstanceId}/{requestId}`. Simple;
-  caller discovers child ids from the parent status (which lists nested pending
-  requests with their child instance ids).
-- **B2 — propagated single surface (recommended, .NET-aligned philosophy).**
-  Bubble inner pending requests up into the **parent** custom status with
-  **qualified request ids** (`{executorId}~{ordinal}~{requestId}`, nested deeper for
-  deeper levels), mirroring .NET port qualification. A response to the parent is
-  routed by peeling one hop at a time and raising the event on the owning child
-  instance. One addressing surface for arbitrarily deep nesting, at the cost of
-  parent→child response plumbing.
-
-**Decision: B2 (propagated single surface).** Pending inner requests bubble up
-into the **parent** custom status with **qualified request ids**
-(`{executorId}~{ordinal}~{requestId}`), mirroring .NET port qualification. A
-response to the parent is routed by peeling one hop and raising the event on the
-owning child instance. This gives one addressing surface for arbitrarily deep
-nesting (the caller always talks to the top-level run), at the cost of
-parent→child response plumbing. It is consistent with the "always per-workflow,
-stable surface" routing decision: callers never need to discover child instance
-ids. B1 (direct child addressing) is the rejected alternative — simpler plumbing
-but leaks child instance ids into the caller and changes the surface per nesting
-depth.
-
-The `~{ordinal}~` hop carries the child's index in the parent's `subworkflows`
-status list (`{executorId: [childInstanceId, ...]}`), so a node that dispatches
-several children in one superstep keeps each child independently addressable. The
-separator is `~` (not `::`) because core emits `auto::{index}` request ids for
-functional `@workflow` HITL; a `::` separator would mis-parse those leaf ids,
-whereas `~` never appears in a core request id and is rejected in executor ids.
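
The qualification scheme above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the client's implementation: `qualify_request_id`, `peel_request_id`, and `resolve_child_instance` are hypothetical names, and the status dict only mimics the `subworkflows` shape described above (`{executorId: [childInstanceId, ...]}`), with child ids following the `{parent}::{executor}::{counter}` pattern from §5.3.

```python
from __future__ import annotations

from typing import Any

HOP_SEPARATOR = "~"  # chosen because core request ids like "auto::0" never contain it


def qualify_request_id(executor_id: str, ordinal: int, request_id: str) -> str:
    """Prefix one hop: {executorId}~{ordinal}~{requestId}."""
    if HOP_SEPARATOR in executor_id:
        raise ValueError("executor ids must not contain the hop separator")
    return f"{executor_id}{HOP_SEPARATOR}{ordinal}{HOP_SEPARATOR}{request_id}"


def peel_request_id(request_id: str) -> tuple[str, int, str] | None:
    """Strip the outermost hop, or return None for an unqualified (leaf) id."""
    parts = request_id.split(HOP_SEPARATOR, 2)
    if len(parts) != 3 or not (parts[1].isascii() and parts[1].isdigit()):
        return None
    executor_id, ordinal, remainder = parts
    return executor_id, int(ordinal), remainder


def resolve_child_instance(custom_status: dict[str, Any], executor_id: str, ordinal: int) -> str | None:
    """Look up the child instance id at `ordinal` for a sub-workflow node."""
    children = custom_status.get("subworkflows", {}).get(executor_id)
    if not isinstance(children, list) or not 0 <= ordinal < len(children):
        return None
    return children[ordinal]


# A parent whose "review" node dispatched two children in one superstep.
parent_status = {"subworkflows": {"review": ["run-1::review::0", "run-1::review::1"]}}

qualified = qualify_request_id("review", 1, "auto::0")
hop = peel_request_id(qualified)
child_instance = resolve_child_instance(parent_status, hop[0], hop[1])

# Deeper nesting adds one more hop in front; peeling removes exactly one.
nested = qualify_request_id("outer", 0, qualified)
```

Because only the outermost hop is peeled, the remainder can be forwarded unchanged to the child, which repeats the same step until it reaches a leaf id such as `auto::0`.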
-
-**Trust boundary.** The sub-orchestration input envelope (which carries the
-trusted, parent-serialized child payload) uses a reserved key that the child
-orchestrator deserializes via pickle *without* the usual marker-stripping. A
-genuine envelope is only ever built internally after the trust boundary, so hosts
-strip that reserved key from untrusted client input before scheduling a run —
-preventing a forged envelope from smuggling a pickle payload onto the trusted
-deserialization path.
-
----
-
-## 6. Cross-cutting concerns
-
-- **Back-compat / migration (decision: hard switch).** `WORKFLOW_ORCHESTRATOR_NAME`
-  stays exported as a deprecated alias for source compatibility, but the
-  single-workflow default orchestration name moves from `workflow_orchestrator`
-  to `dafx-{name}` with **no runtime alias**. This means **in-flight
-  single-workflow instances created before the upgrade will not resume** under
-  the new name. Accepted because durable workflow runs are typically short-lived
-  and this is a preview surface; operators should drain in-flight workflow
-  instances before upgrading. (Resolves former open decision; §8.)
-- **Determinism.** `call_sub_orchestrator`, `wait_for_external_event`, and timers
-  are replay-safe; child instance ids must be derived deterministically (no
-  `uuid4()` in the orchestrator — use `ctx.new_uuid()` or derived ids).
-- **Security / route scoping.** Per-workflow ownership checks
-  (`_is_owned_orchestration`) extend the existing defense so a caller holding an
-  instance id cannot cross workflow boundaries. Sub-workflow respond endpoints
-  validate child ownership the same way.
-- **Streaming.** `supports_event_streaming` stays host-gated (Azure Functions
-  off due to the 16 KB custom-status cap). Nested event propagation respects the
-  same gate.
-
----
-
-## 7. Phased work breakdown
-
-Each phase is independently shippable.
-
-- **Phase 0 — naming + validation.** Add `workflow_orchestrator_name` /
-  reverse / `sanitize_workflow_name`; deprecate `WORKFLOW_ORCHESTRATOR_NAME`.
-  Unit tests for naming round-trips and validation. (durabletask)
-- **Phase 1 — multiple workflows on the standalone worker.** Additive
-  `configure_workflow`, per-workflow orchestrators, scoped activity/entity names
-  `dafx-{workflowName}-{executorId}`, client `workflow_name` targeting +
-  ownership. Unit + integration tests with two workflows in one hub (including two
-  workflows that reuse an executor/agent id with different implementations).
-- **Phase 2 — multiple workflows on Azure Functions.** `workflows=`,
-  per-workflow orchestrators/activities/routes (always per-workflow),
-  `_is_owned_orchestration`. Unit tests + a two-workflow sample.
-- **Phase 3 — sub-workflows via child orchestrations.** Protocol
-  `call_sub_orchestrator` + both adapters; planner `subworkflow_executors` +
-  recursive registration; orchestrator routing; deterministic child ids. Unit +
-  integration tests + a nested-workflow sample.
-- **Phase 4 — sub-workflow HITL (B2).** Propagate inner pending requests to the
-  parent custom status with qualified request ids; route a parent response to the
-  owning child instance by stripping the qualifier. Tests + HITL sub-workflow
-  sample.
-- **Phase 5 — docs, samples, ADR(s).** Promote the multi-workflow and
-  sub-workflow decisions into ADR(s) under `docs/decisions/`; add README/runbook
-  updates.
-
----
-
-## 8. Decisions
-
-**Resolved:**
-
-1. **Orchestration naming** (§4.1, §4.3): orchestration name **`dafx-{workflowName}`**
-   (matches .NET; the name the Durable Task tooling/UI surfaces).
-2. **Workflow-internal durable names** (§4.3): scope inner activity/entity names
-   by workflow — **`dafx-{workflowName}-{executorId}`** (Approach A). Distinct
-   names per workflow, plain closures, no runtime registry; removes the
-   same-executor-id collision. Diverges from .NET's bare inner names, but only the
-   orchestration name (identical to .NET) is UI-surfaced.
-3. **Multi-workflow route shape on Azure Functions** (§4.6): **always per-workflow
-   routes**, so downstream callers don't change URLs when an app grows from one
-   workflow to many.
-4. **Sub-workflow execution model** (§5): **Model B (child orchestration via
-   `call_sub_orchestrator`)**, which is what the .NET durable host does
-   (`ExecuteSubWorkflowAsync`). Accept more orchestration instances in exchange
-   for inner durability and observability.
-5. **Single-workflow orchestration-name migration** (§6): **hard switch** to
-   `dafx-{name}` with no runtime alias. Pre-upgrade in-flight instances under
-   `workflow_orchestrator` won't resume; acceptable for a preview surface.
-6. **Sub-workflow HITL addressing** (§5.4): **B2** — propagate inner pending
-   requests to the parent custom status with qualified request ids; the caller
-   always responds to the top-level run.
-7. **Agent addressing** (§4.3, §4.6): workflow agents register through the **same**
-   `add_agent` primitive as `agents=`, under the scoped name
-   `dafx-{workflowName}-{executorId}`, and stay reachable via
-   `get_agent(name, workflow_name=...)`. Bare `agents=` registration keeps the
-   standalone `dafx-{agentName}` surface. No `workflow_agents=` kwarg — the
-   per-workflow grouping is an internal planner structure both hosts consume.
-   Agent conversation *state* stays isolated by the entity key (`ctx.instance_id`)
-   regardless of naming.
-
-**Still open:**
-
-- **Cross-workflow shared agents** (§4.3): a single agent that intentionally
-  shares conversation memory across two co-hosted workflows is out of scope; if
-  wanted later it needs an explicit stable shared entity key rather than
-  `instance_id`. Flagged as a possible follow-up, not part of this work.
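
Decisions 1 and 5, together with the §4.2 rule that auto-generated names are rejected, can be sketched as follows. The function names follow the helpers listed in the first patch's commit message (`workflow_orchestrator_name`, `workflow_name_from_orchestrator`, `validate_workflow_name`, `is_auto_generated_workflow_name`, `DURABLE_NAME_PREFIX`), but the bodies are assumptions: in particular the allowed character set in `_NAME_PATTERN` and the choice to return the validated name are illustrative guesses, not the package's actual rules.

```python
from __future__ import annotations

import re

DURABLE_NAME_PREFIX = "dafx-"
AUTO_GENERATED_PREFIX = "WorkflowBuilder-"  # default prefix for unnamed workflows

# Assumed charset for illustration only; the real rule is not specified here.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def workflow_orchestrator_name(workflow_name: str) -> str:
    """Workflow name -> orchestration name (dafx-{name})."""
    return f"{DURABLE_NAME_PREFIX}{workflow_name}"


def workflow_name_from_orchestrator(orchestrator_name: str) -> str | None:
    """Reverse mapping; None when the name does not carry the prefix."""
    if not orchestrator_name.startswith(DURABLE_NAME_PREFIX):
        return None
    return orchestrator_name[len(DURABLE_NAME_PREFIX):]


def is_auto_generated_workflow_name(name: str) -> bool:
    return name.startswith(AUTO_GENERATED_PREFIX)


def validate_workflow_name(name: str) -> str:
    """Validate and reject (never silently rewrite) a workflow name."""
    if not name:
        raise ValueError("workflow name must not be empty")
    if is_auto_generated_workflow_name(name):
        raise ValueError("auto-generated workflow names are not stable across restarts")
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError(f"workflow name contains unsupported characters: {name!r}")
    return name


def rejected(name: str) -> bool:
    """Small helper for the usage below: True when validation raises."""
    try:
        validate_workflow_name(name)
    except ValueError:
        return True
    return False


round_trip = workflow_name_from_orchestrator(workflow_orchestrator_name(validate_workflow_name("orders")))
legacy = workflow_name_from_orchestrator("workflow_orchestrator")
```

Note that the legacy fixed name `workflow_orchestrator` does not map back to any workflow name, which is consistent with the hard-switch migration in decision 5.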
diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py index d45df96064e..b1b90727b4a 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/naming.py @@ -20,8 +20,7 @@ The orchestration name is the identifier the Durable Task tooling/UI surfaces, so it matches .NET exactly. The inner activity/entity names are scoped by workflow in Python (unlike .NET's bare ``dafx-{executorId}``) so two co-hosted workflows that -reuse an executor id cannot collide. See -``docs/design/durabletask-multiworkflow-and-subworkflows.md`` for the rationale. +reuse an executor id cannot collide. """ from __future__ import annotations From d7424bc406c22d67ba64a18319bd7726864ca0c8 Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Wed, 24 Jun 2026 13:53:42 -0400 Subject: [PATCH 11/12] refactor(durabletask): simplify workflow client status parsing and drop deprecated orchestrator-name symbols Extract a shared _parse_custom_status helper in DurableWorkflowClient to remove duplicated custom-status JSON parsing across three call sites. Drop the now-unused single-workflow compatibility shims WORKFLOW_ORCHESTRATOR_NAME and WorkflowRegistrationPlan.orchestrator_name, replaced by per-workflow workflow_orchestrator_name(name). 
--- .../agent_framework_durabletask/__init__.py | 3 +- .../_workflows/client.py | 46 ++++++++++--------- .../_workflows/orchestrator.py | 13 ------ .../_workflows/registration.py | 12 +---- .../tests/test_workflow_registration.py | 2 - .../durabletask/09_workflow_hitl/worker.py | 2 +- 6 files changed, 29 insertions(+), 49 deletions(-) diff --git a/python/packages/durabletask/agent_framework_durabletask/__init__.py b/python/packages/durabletask/agent_framework_durabletask/__init__.py index efc6dda37ef..bcf0abdf110 100644 --- a/python/packages/durabletask/agent_framework_durabletask/__init__.py +++ b/python/packages/durabletask/agent_framework_durabletask/__init__.py @@ -63,7 +63,7 @@ workflow_name_from_orchestrator, workflow_orchestrator_name, ) -from ._workflows.orchestrator import WORKFLOW_ORCHESTRATOR_NAME, run_workflow_orchestrator +from ._workflows.orchestrator import run_workflow_orchestrator from ._workflows.registration import WorkflowRegistrationPlan, collect_hosted_workflows, plan_workflow_registration from ._workflows.runner_context import CapturingRunnerContext from ._workflows.serialization import deserialize_workflow_output @@ -85,7 +85,6 @@ "THREAD_ID_HEADER", "WAIT_FOR_RESPONSE_FIELD", "WAIT_FOR_RESPONSE_HEADER", - "WORKFLOW_ORCHESTRATOR_NAME", "AgentCallbackContext", "AgentEntity", "AgentEntityStateProviderMixin", diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py index 8d0db527298..4a045463b68 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/client.py @@ -315,13 +315,10 @@ async def stream_workflow( raise ValueError(f"Instance '{instance_id}' does not belong to the targeted workflow.") ownership_checked = True - if state is not None and state.serialized_custom_status: - try: - status = 
json.loads(state.serialized_custom_status) - except (json.JSONDecodeError, TypeError): - status = None - if isinstance(status, dict): - events = cast("dict[str, Any]", status).get("events") + if state is not None: + status = self._parse_custom_status(state.serialized_custom_status) + if status is not None: + events = status.get("events") if isinstance(events, list): typed_events = cast("list[dict[str, Any]]", events) while cursor < len(typed_events): @@ -370,6 +367,22 @@ def get_pending_hitl_requests(self, instance_id: str, *, workflow_name: str | No return self._collect_pending_hitl_requests(state.serialized_custom_status) + @staticmethod + def _parse_custom_status(serialized_custom_status: str | None) -> dict[str, Any] | None: + """Parse a serialized custom status into a dict, or ``None`` if unusable. + + Returns ``None`` for an empty/absent status or any value that is not a JSON + object (the only shape the orchestrator ever writes), so callers can treat + "no usable status" uniformly. + """ + if not serialized_custom_status: + return None + try: + parsed = json.loads(serialized_custom_status) + except (json.JSONDecodeError, TypeError): + return None + return cast("dict[str, Any]", parsed) if isinstance(parsed, dict) else None + def _collect_pending_hitl_requests(self, serialized_custom_status: str) -> list[dict[str, Any]]: """Collect an orchestration's pending requests plus any nested sub-workflow ones. @@ -381,13 +394,9 @@ def _collect_pending_hitl_requests(self, serialized_custom_status: str) -> list[ trusted, having come from the parent's status), so no per-child ownership check is applied. 
""" - try: - custom_status = json.loads(serialized_custom_status) - except (json.JSONDecodeError, TypeError): - return [] - if not isinstance(custom_status, dict): + status_dict = self._parse_custom_status(serialized_custom_status) + if status_dict is None: return [] - status_dict = cast(dict[str, Any], custom_status) requests: list[dict[str, Any]] = [] @@ -502,15 +511,10 @@ def _lookup_subworkflow_instance(self, instance_id: str, executor_id: str, ordin selects the child at ``ordinal`` (its dispatch order this superstep). """ state = self._client.get_orchestration_state(instance_id) - if state is None or not state.serialized_custom_status: - return None - try: - custom_status = json.loads(state.serialized_custom_status) - except (json.JSONDecodeError, TypeError): - return None - if not isinstance(custom_status, dict): + custom_status = self._parse_custom_status(state.serialized_custom_status if state else None) + if custom_status is None: return None - subworkflows = cast(dict[str, Any], custom_status).get("subworkflows") + subworkflows = custom_status.get("subworkflows") if not isinstance(subworkflows, dict): return None children_raw = cast(dict[str, Any], subworkflows).get(executor_id) diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py index e37aa8888e3..2493912750c 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/orchestrator.py @@ -82,19 +82,6 @@ # child instance ids grow with depth, so the durable backend's instance-id length # limit is the natural ceiling for any pathological construction. -# Name of the auto-generated orchestrator registered by -# ``DurableAIAgentWorker.configure_workflow`` (and the Azure Functions host). 
-# Standalone clients start a configured workflow by scheduling an orchestration -# with this name, e.g. -# ``client.schedule_new_orchestration(WORKFLOW_ORCHESTRATOR_NAME, input=...)``. -# -# DEPRECATED (multi-workflow migration): this fixed single-workflow name is being -# replaced by per-workflow orchestration names ``dafx-{workflowName}`` derived via -# :func:`agent_framework_durabletask._workflows.naming.workflow_orchestrator_name`. -# It is retained for source compatibility while the single-workflow hosting path -# still uses it; new code should prefer ``workflow_orchestrator_name(name)``. -WORKFLOW_ORCHESTRATOR_NAME = "workflow_orchestrator" - # ============================================================================ # Task Types and Data Structures diff --git a/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py b/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py index e3140d1577f..63bfcafaa12 100644 --- a/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py +++ b/python/packages/durabletask/agent_framework_durabletask/_workflows/registration.py @@ -34,8 +34,6 @@ from agent_framework import AgentExecutor, Executor, Workflow, WorkflowExecutor -from .orchestrator import WORKFLOW_ORCHESTRATOR_NAME - @dataclass class WorkflowRegistrationPlan: @@ -54,16 +52,11 @@ class WorkflowRegistrationPlan: workflows are driven as durable child orchestrations. The node itself is *not* registered as an activity; its inner workflow is registered separately (see :func:`collect_hosted_workflows`). - orchestrator_name: Deprecated fixed orchestrator name. Hosts derive the - actual per-workflow name via - :func:`~agent_framework_durabletask._workflows.naming.workflow_orchestrator_name`; - this field is retained for source compatibility only. 
""" agent_executors: list[AgentExecutor] activity_executors: list[Executor] subworkflow_executors: list[WorkflowExecutor] - orchestrator_name: str def plan_workflow_registration(workflow: Workflow) -> WorkflowRegistrationPlan: @@ -74,8 +67,8 @@ def plan_workflow_registration(workflow: Workflow) -> WorkflowRegistrationPlan: Returns: A :class:`WorkflowRegistrationPlan` describing the agent executors - (entities), sub-workflow executors (child orchestrations), the remaining - non-agent executors (activities), and the orchestrator name. + (entities), sub-workflow executors (child orchestrations), and the + remaining non-agent executors (activities). """ agent_executors: list[AgentExecutor] = [] activity_executors: list[Executor] = [] @@ -93,7 +86,6 @@ def plan_workflow_registration(workflow: Workflow) -> WorkflowRegistrationPlan: agent_executors=agent_executors, activity_executors=activity_executors, subworkflow_executors=subworkflow_executors, - orchestrator_name=WORKFLOW_ORCHESTRATOR_NAME, ) diff --git a/python/packages/durabletask/tests/test_workflow_registration.py b/python/packages/durabletask/tests/test_workflow_registration.py index 7ecee8710e3..5f7f03fd4bc 100644 --- a/python/packages/durabletask/tests/test_workflow_registration.py +++ b/python/packages/durabletask/tests/test_workflow_registration.py @@ -18,7 +18,6 @@ collect_hosted_workflows, plan_workflow_registration, ) -from agent_framework_durabletask._workflows.orchestrator import WORKFLOW_ORCHESTRATOR_NAME def _agent_executor(executor_id: str, agent_name: str) -> Mock: @@ -63,7 +62,6 @@ def test_agent_executor_classified_as_entity(self) -> None: assert plan.agent_executors == [agent_exec] assert plan.activity_executors == [] - assert plan.orchestrator_name == WORKFLOW_ORCHESTRATOR_NAME def test_non_agent_executor_classified_as_activity(self) -> None: """A plain Executor is classified as an activity.""" diff --git a/python/samples/04-hosting/durabletask/09_workflow_hitl/worker.py 
b/python/samples/04-hosting/durabletask/09_workflow_hitl/worker.py index e8d5c71d82a..47198024b6b 100644 --- a/python/samples/04-hosting/durabletask/09_workflow_hitl/worker.py +++ b/python/samples/04-hosting/durabletask/09_workflow_hitl/worker.py @@ -11,7 +11,7 @@ - a durable entity for each agent executor, - a durable activity for each non-agent executor, and -- the workflow orchestrator (named ``WORKFLOW_ORCHESTRATOR_NAME``). +- the workflow orchestrator (named ``dafx-{workflow.name}``). When the workflow calls ``ctx.request_info``, the orchestrator pauses and records the open request in its custom status. An external client discovers the request From 68220a05474d5abc6fe6529016f8edb36aecaec7 Mon Sep 17 00:00:00 2001 From: Ahmed Muhsin Date: Wed, 24 Jun 2026 14:19:49 -0400 Subject: [PATCH 12/12] fix(core): drop WORKFLOW_ORCHESTRATOR_NAME from agent_framework.azure re-exports The constant was removed from agent-framework-durabletask, but the core azure lazy-loading namespace still re-exported it, breaking pyright in packages/core. Remove it from both the runtime _IMPORTS map and the .pyi stub. 
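For context, a minimal sketch of why a stale entry in a lazy re-export map can go unnoticed until something touches it. It assumes the namespace resolves names on first access through a lookup over `_IMPORTS` (in the style of a module-level `__getattr__`), which this message does not show. `lazy_getattr`, `stale_exports`, and `STALE_CONSTANT` are hypothetical names, and the stdlib `collections` module stands in for the target package.

```python
import importlib

# Hypothetical stand-in for the lazy-loading map:
# exported name -> (module to import, package to install).
_IMPORTS: dict[str, tuple[str, str]] = {
    "OrderedDict": ("collections", "stdlib"),
    # Still mapped, but the target module does not define it (the stale case).
    "STALE_CONSTANT": ("collections", "stdlib"),
}


def lazy_getattr(name: str) -> object:
    """Resolve an exported name on first access, as a module ``__getattr__`` might."""
    if name not in _IMPORTS:
        raise AttributeError(f"no lazy export named {name!r}")
    module_name, package_name = _IMPORTS[name]
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise ModuleNotFoundError(f"install {package_name!r} to use {name!r}") from exc
    # Raises AttributeError when the target module has dropped the name.
    return getattr(module, name)


def stale_exports() -> list[str]:
    """Return mapped names that the target module no longer defines."""
    stale: list[str] = []
    for name in _IMPORTS:
        try:
            lazy_getattr(name)
        except AttributeError:
            stale.append(name)
    return stale
```

Under these assumptions the runtime map only fails when the stale name is actually requested, whereas a stub that imports the name explicitly fails static analysis right away, which is consistent with pyright being the first thing to break.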
--- python/packages/core/agent_framework/azure/__init__.py | 1 - python/packages/core/agent_framework/azure/__init__.pyi | 2 -- 2 files changed, 3 deletions(-) diff --git a/python/packages/core/agent_framework/azure/__init__.py b/python/packages/core/agent_framework/azure/__init__.py index a6a1de34064..ba6a0e996d3 100644 --- a/python/packages/core/agent_framework/azure/__init__.py +++ b/python/packages/core/agent_framework/azure/__init__.py @@ -20,7 +20,6 @@ "DurableAIAgentOrchestrationContext": ("agent_framework_durabletask", "agent-framework-durabletask"), "DurableAIAgentWorker": ("agent_framework_durabletask", "agent-framework-durabletask"), "DurableWorkflowClient": ("agent_framework_durabletask", "agent-framework-durabletask"), - "WORKFLOW_ORCHESTRATOR_NAME": ("agent_framework_durabletask", "agent-framework-durabletask"), } diff --git a/python/packages/core/agent_framework/azure/__init__.pyi b/python/packages/core/agent_framework/azure/__init__.pyi index 97a44180f8f..7a914cee51d 100644 --- a/python/packages/core/agent_framework/azure/__init__.pyi +++ b/python/packages/core/agent_framework/azure/__init__.pyi @@ -10,7 +10,6 @@ from agent_framework_azure_ai_search import ( from agent_framework_azure_cosmos import CosmosHistoryProvider from agent_framework_azurefunctions import AgentFunctionApp from agent_framework_durabletask import ( - WORKFLOW_ORCHESTRATOR_NAME, AgentCallbackContext, AgentResponseCallbackProtocol, DurableAIAgent, @@ -21,7 +20,6 @@ from agent_framework_durabletask import ( ) __all__ = [ - "WORKFLOW_ORCHESTRATOR_NAME", "AgentCallbackContext", "AgentFunctionApp", "AgentResponseCallbackProtocol",