AzureCosmosDB · aayush3011 · May 29, 2026 · May 26, 2026 · May 27, 2026 · May 27, 2026
diff --git a/.gitignore b/.gitignore
@@ -35,3 +35,10 @@ venv/
 
 # macOS
 .DS_Store
+
+# JetBrains IDEs
+.idea/
+
+# Local scratch notes
+Agent Memory Toolkit.txt
+Cosmos AI Memory Service.txt
diff --git a/Docs/architecture/orchestrators.md b/Docs/architecture/orchestrators.md
@@ -0,0 +1,23 @@
+# Durable orchestrator activity chains
+
+W15 splits LLM extraction from persistence so Durable retries after a Cosmos write failure do not re-run the LLM activity.
+
+```text
+ExtractMemoriesOrchestrator
+  em_Extract  (load recent turns + LLM + parse; no embeddings/writes)
+  em_Persist  (embeddings + deterministic create_item; 409 = already persisted)
+  em_ReconcileMemories (optional; single activity for GA)
+
+ThreadSummaryOrchestrator
+  ts_Extract
+  ts_PersistSummary
+
+UserSummaryOrchestrator
+  us_Extract
+  us_PersistUserSummary
+
+SynthesizeProceduralOrchestrator
+  sp_SynthesizeProcedural (single activity for GA)
+```
+
+Fact and episodic IDs are deterministic from user, thread, and normalized content. Thread and user summaries keep their deterministic summary IDs.
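Review note: the retry-safety argument in `orchestrators.md` (deterministic IDs plus `create_item`, with 409 treated as "already persisted") can be sketched as below. This is a minimal illustration, not the toolkit's code. It assumes SHA-256 over user, thread, and case/whitespace-normalized content, and it uses an in-memory stand-in for the Cosmos container. `ConflictError`, `InMemoryContainer`, `normalize`, `deterministic_memory_id`, and `persist_once` are hypothetical names.

```python
import hashlib


class ConflictError(Exception):
    """Stand-in for the HTTP 409 a real create_item raises on a duplicate id."""


class InMemoryContainer:
    """Hypothetical in-memory substitute for a Cosmos container."""

    def __init__(self) -> None:
        self.items = {}

    def create_item(self, doc: dict) -> None:
        # Unlike an upsert, create fails when the id already exists.
        if doc["id"] in self.items:
            raise ConflictError(doc["id"])
        self.items[doc["id"]] = doc


def normalize(content: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(content.lower().split())


def deterministic_memory_id(user_id: str, thread_id: str, content: str) -> str:
    # Same inputs always hash to the same id, so a retry targets the same document.
    payload = "\x1f".join((user_id, thread_id, normalize(content)))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def persist_once(container, user_id: str, thread_id: str, content: str) -> bool:
    """Return True if written now, False if an earlier attempt already wrote it."""
    doc = {
        "id": deterministic_memory_id(user_id, thread_id, content),
        "user_id": user_id,
        "thread_id": thread_id,
        "content": content,
    }
    try:
        container.create_item(doc)
        return True
    except ConflictError:
        # 409 = already persisted; the retry is a no-op rather than an error.
        return False
```

Because the persist step is idempotent under this scheme, a Durable retry of the persist activity alone does not need to repeat the LLM call that produced the content.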
diff --git a/Docs/concepts.md b/Docs/concepts.md
@@ -119,7 +119,7 @@ Prompts for summarization and fact extraction live in `azure_functions/prompts/`
 The `reconcile_memories(user_id, n=50)` pipeline step reads up to N most-recent active facts for a user and asks the LLM to identify two orthogonal outcomes in one pass:
 
 - **Duplicates** — two or more facts that restate the same claim in different words. Resolution: collapse into one merged fact; the originals are soft-deleted with `supersede_reason="duplicate"` and `superseded_by` set to the merged fact's id.
-- **Contradictions** — two facts that assert opposing claims about the same subject. Resolution: keep the winner (more recent first, higher confidence as tiebreaker), soft-delete the loser with `supersede_reason="contradiction"` and `superseded_by` set to the winner.
+- **Contradictions** — two facts that assert opposing claims about the same subject. Resolution: keep the winner (more recent first, higher confidence as tiebreaker), soft-delete the loser with `supersede_reason="contradict"` and `superseded_by` set to the winner.
 
 ### Why one pass
 

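Review note: the contradiction rule in `concepts.md` (more recent first, higher confidence as tiebreaker) can be sketched as follows. This is an illustration under assumptions: the field names `created_at` and `confidence` and the helper names `pick_winner` and `resolve_contradiction` are hypothetical, while `supersede_reason` and `superseded_by` come from the text above.

```python
from datetime import datetime


def pick_winner(a: dict, b: dict) -> tuple:
    """Return (winner, loser): the newer fact wins, confidence breaks ties."""

    def rank(fact: dict) -> tuple:
        # Tuples compare element by element: timestamp first, then confidence.
        return (datetime.fromisoformat(fact["created_at"]), fact["confidence"])

    return (a, b) if rank(a) >= rank(b) else (b, a)


def resolve_contradiction(a: dict, b: dict) -> tuple:
    """Return the winner unchanged and a soft-deleted copy of the loser."""
    winner, loser = pick_winner(a, b)
    superseded = dict(
        loser,
        supersede_reason="contradict",
        superseded_by=winner["id"],
    )
    return winner, superseded
```

The loser is copied rather than mutated so the caller decides when to write the soft-delete back to the store.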
diff --git a/Docs/operations.md b/Docs/operations.md
@@ -0,0 +1,27 @@
+# Operations
+
+Runtime knobs for an Agent-Memory-Toolkit deployment. Most ops levers live in `.env` / Function-app App Settings — change them, restart the consumer, and you're done. Deployment-time knobs (Bicep params bound to `azd env set ...`) live in [`infra/README.md`](../infra/README.md).
+
+## Memory lifecycle (TTL)
+
+| Type | Default TTL | Source |
+|---|---:|---|
+| turn | 30 d | container default (memories_turns) |
+| episodic | 90 d | per-doc ttl (memories container) |
+| thread_summary | never | container default (memories, -1) |
+| user_summary | never | container default |
+| fact | never | container default; supersession handles aging |
+| procedural | never | container default; supersession handles aging |
+
+Override per write:
+
+    client.add_cosmos(user_id, role, text, memory_type="turn", ttl=60)   # expires in 60 seconds
+
+Override per container at provision time:
+
+    azd env set MEMORIES_TURNS_DEFAULT_TTL 86400   # 1 day
+
+## Counter-based trigger configuration
+
+Function-app threshold knobs (`THREAD_SUMMARY_EVERY_N`, `FACT_EXTRACTION_EVERY_N`, `DEDUP_EVERY_N`, `USER_SUMMARY_EVERY_N`, `MAX_BATCH_SIZE`, `MEMORY_PROCESSOR_OWNER`) are documented in [`infra/README.md` → Counter-based trigger configuration](../infra/README.md#counter-based-trigger-configuration-function-app-only). Change them with `azd env set ...` then `azd up`.
+
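Review note: the TTL table in `operations.md` combines container defaults with per-document overrides. The sketch below shows how the two interact under standard Cosmos DB TTL semantics (values in seconds, `-1` means never expire, a per-document `ttl` is honored only when TTL is enabled on the container). The function name `effective_ttl` is hypothetical and is not part of the toolkit.

```python
from typing import Optional

DAY = 86_400  # seconds


def effective_ttl(container_default: Optional[int], item_ttl: Optional[int] = None) -> Optional[int]:
    """Return seconds until expiry, or None if the document never expires."""
    if container_default is None:
        # TTL is disabled on the container, so any per-document ttl is ignored.
        return None
    # A per-document ttl, when present, overrides the container default.
    ttl = item_ttl if item_ttl is not None else container_default
    return None if ttl == -1 else ttl


# Rows from the table above, expressed with this helper:
turn_ttl = effective_ttl(30 * DAY)            # container default on memories_turns
episodic_ttl = effective_ttl(-1, 90 * DAY)    # per-doc ttl in the memories container
fact_ttl = effective_ttl(-1)                  # container default of -1: never expires
```

This is why episodic documents can expire after 90 days even though the `memories` container default is `-1`.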
diff --git a/Docs/public_api.md b/Docs/public_api.md
@@ -0,0 +1,123 @@
+# Public API
+
+## Architecture
+
+`CosmosMemoryClient` and `AsyncCosmosMemoryClient` are thin orchestrators. They keep local-buffer state and Cosmos connection lifecycle, then delegate persistence to `MemoryStore` / `AsyncMemoryStore` and higher-level behavior to:
+
+- `ChatClient` / `EmbeddingsClient` (sync) and `AsyncEmbeddingsClient` (async) — Azure OpenAI wrappers.
+- `RetrievalService` / `AsyncRetrievalService` for filtering, vector search, and episodic context.
+- `PipelineService` for extraction, summaries, procedural synthesis, and reconciliation.
+- `InProcessProcessor` / `AsyncInProcessProcessor` / `DurableFunctionProcessor` for immediate or change-feed-driven processing.
+- `auto_trigger.maybe_trigger_steps` (sync) and `aio.auto_trigger.maybe_trigger_steps` (async) for threshold-driven step firing after each `push_to_cosmos`.
+
+## CosmosMemoryClient (sync)
+
+### Connection
+
+- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure local state, model clients, optional Cosmos auto-connect, and optional processing backend. When `cosmos_turns_container` is set, turn-type documents land in a dedicated container so the main `memories` container only fires the Durable change-feed trigger for processed memory writes.
+- `close() -> None` — close Cosmos/model clients and owned credentials.
+- `connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None) -> None` — connect to an existing memory container.
+- `create_memory_store(database=None, container=None, turns_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect the memory, optional turns, counter, and lease containers.
+
+### Memory CRUD
+
+- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer.
+- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories.
+- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory.
+- `delete_local(memory_id) -> None` — remove a local buffered memory.
+- `add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id.
+- `push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos.
+- `get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — retrieve memories with filters.
+- `update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory.
+- `delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory.
+- `get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, created_after=None, created_before=None) -> list[dict]` — retrieve a thread oldest-first.
+- `get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document.
+
+### Retrieval
+
+- `search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
+- `get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
+- `get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
+- `get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
+- `search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories.
+- `build_procedural_context(user_id) -> str` — format procedural context for prompts.
+- `build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context.
+
+### Processing
+
+- `extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread.
+- `synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt.
+- `generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary.
+- `generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary.
+- `reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts.
+- `process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately.
+- `process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary.
+
+### Tagging
+
+- `add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory.
+- `remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory.
+- `list_tags(user_id, *, thread_id=None, prefix=None, include_sys=False) -> list[str]` — list sorted, deduped tags for a user; omits `sys:*` by default.
+
+## AsyncCosmosMemoryClient
+
+Local-buffer methods remain synchronous in-memory operations; Cosmos, retrieval, and processing methods are `async` and must be awaited.
+
+### Connection
+
+- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure async local state, model clients, and optional processing backend. When `cosmos_turns_container` is set, turn-type documents land in a dedicated container so the main `memories` container only fires the Durable change-feed trigger for processed memory writes.
+- `async close() -> None` — close async/sync resources and owned credentials.
+- `async connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None) -> None` — connect to an existing memory container.
+- `async create_memory_store(database=None, container=None, turns_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect memory, optional turns, counter, and lease containers.
+
+### Memory CRUD
+
+- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer.
+- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories.
+- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory.
+- `delete_local(memory_id) -> None` — remove a local buffered memory.
+- `async add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id.
+- `async push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos.
+- `async get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — retrieve memories with filters.
+- `async update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory.
+- `async delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory.
+- `async get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, created_after=None, created_before=None) -> list[dict]` — retrieve a thread oldest-first.
+- `async get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document.
+
+### Retrieval
+
+- `async search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
+- `async get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
+- `async get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
+- `async get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
+- `async search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories.
+- `async build_procedural_context(user_id) -> str` — format procedural context for prompts.
+- `async build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context.
+
+### Processing
+
+- `async extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread.
+- `async synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt.
+- `async generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary.
+- `async generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary.
+- `async reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts.
+- `async process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately.
+- `async process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary.
+
+### Tagging
+
+- `async add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory.
+- `async remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory.
+- `async list_tags(user_id, *, thread_id=None, prefix=None, include_sys=False) -> list[str]` — list sorted, deduped tags for a user; omits `sys:*` by default.
+
+## Extension Points
+
+Sync extension protocols live in `agent_memory_toolkit.services`; async variants live in `agent_memory_toolkit.aio.services`.
+
+- `MemoryStoreProtocol` (`agent_memory_toolkit.services`): persistence primitives (`query`, `read_item`, `add_cosmos`, `mark_superseded`) consumed by the pipeline.
+
+Concrete service classes are exported from their respective packages:
+
+- Sync: `RetrievalService`, `PipelineService` from `agent_memory_toolkit.services` (sub-modules `retrieval`, `pipeline`).
+- Async: `AsyncRetrievalService` and `AsyncPipelineService` from `agent_memory_toolkit.aio.services` (sub-modules `retrieval`, `pipeline`). The async pipeline is a fully-native asyncio implementation — not an `asyncio.to_thread` shim over the sync pipeline.
+- Threshold-driven auto-trigger: `maybe_trigger_steps` from `agent_memory_toolkit.auto_trigger` (sync) and `agent_memory_toolkit.aio.auto_trigger` (async).
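Review note: the `list_tags` contract in `public_api.md` (sorted, deduped, `sys:*` omitted by default, optional `prefix`) can be illustrated over plain in-memory documents. This is a sketch of the documented behavior only. `list_tags_sketch` is a hypothetical helper, and the assumption that each memory carries a `tags` list is mine; the real method queries Cosmos by `user_id`.

```python
from typing import Iterable, Optional


def list_tags_sketch(
    memories: Iterable[dict],
    *,
    prefix: Optional[str] = None,
    include_sys: bool = False,
) -> list:
    """Collect tags across memories, dedupe them, filter, and sort."""
    # A set removes duplicates across documents.
    tags = {tag for memory in memories for tag in memory.get("tags") or []}
    if not include_sys:
        # System tags are hidden unless the caller opts in.
        tags = {tag for tag in tags if not tag.startswith("sys:")}
    if prefix is not None:
        tags = {tag for tag in tags if tag.startswith(prefix)}
    return sorted(tags)


docs = [
    {"id": "m1", "tags": ["food", "pref:diet", "sys:extracted"]},
    {"id": "m2", "tags": ["pref:diet", "pref:lang"]},
    {"id": "m3", "tags": None},
]
all_tags = list_tags_sketch(docs)
pref_tags = list_tags_sketch(docs, prefix="pref:")
with_sys = list_tags_sketch(docs, include_sys=True)
```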
diff --git a/README.md b/README.md
@@ -68,9 +68,7 @@ cp .env.template .env
 # edit COSMOS_DB_ENDPOINT, AI_FOUNDRY_ENDPOINT, AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME, AI_FOUNDRY_CHAT_DEPLOYMENT_NAME
 ```
 
-You can also point `azd up` at existing resources via `azd env set USE_EXISTING_COSMOS true` / `USE_EXISTING_AI_FOUNDRY true` (full BYOR flag list in `infra/README.md`).
-
-> For the Durable Function app counter-trigger settings, Bicep module reference, and RBAC scopes — see **[`infra/README.md`](infra/README.md)**.
+> For the Durable Function app counter-trigger settings, Bicep module reference, RBAC scopes, and the SDK-only escape hatch (`DEPLOY_FUNCTION_APP=false`) — see **[`infra/README.md`](infra/README.md)**.
 
 ### 3. Use the SDK
 
@@ -170,14 +168,14 @@ high_conf_facts = memory.get_memories(user_id="u1", memory_types=["fact"], min_c
 
 ### Memory Reconciliation
 
-`reconcile(user_id, n=50)` (on the public client; underlying pipeline method is `ProcessingPipeline.reconcile_memories`) collapses paraphrased duplicates and resolves semantic contradictions in a single LLM pass over the N most-recent active facts. Both outcomes soft-delete the loser with a `supersede_reason` of `"duplicate"` or `"contradiction"`. See [Docs/concepts.md](Docs/concepts.md#memory-reconciliation) for details.
+`reconcile(user_id, n=50)` (on the public client; underlying pipeline method is `ProcessingPipeline.reconcile_memories`) collapses paraphrased duplicates and resolves semantic contradictions in a single LLM pass over the N most-recent active facts. Both outcomes soft-delete the loser with a `supersede_reason` of `"duplicate"` or `"contradict"`. See [Docs/concepts.md](Docs/concepts.md#memory-reconciliation) for details.
 
 > **Cost note.** Each reconciliation makes one LLM call covering up to `n` facts (default 50, hard cap 500). With auto-trigger, this fires every `FACT_EXTRACTION_EVERY_N × DEDUP_EVERY_N` turns per user, with `n` taken from `DEDUP_POOL_SIZE`. The previous cosine-cluster pre-filter was removed deliberately — it could not catch semantic contradictions like "vegetarian" vs "ribeye steak" — so the LLM is now invoked whenever there are ≥ 2 active facts. To bound LLM cost more tightly: raise `DEDUP_EVERY_N` (lower frequency — reconcile fires every Nth extraction, so a *higher* N means *less often*), lower `DEDUP_POOL_SIZE` (smaller per-call pool), or override `n` per call when invoking `reconcile()` directly.
 
 | New `MemoryRecord` field | Meaning |
 |---|---|
 | `content_hash` | SHA-256 of normalized content; enables write-time exact-dedup short-circuit |
-| `supersede_reason` | `"duplicate"` or `"contradiction"` (None for live records) |
+| `supersede_reason` | `"duplicate"` or `"contradict"` (None for live records) |
 | `superseded_at` | ISO timestamp when the supersede happened (None for live records) |
 | `superseded_by` | Id of the record that replaced this one (existing field) |
 
@@ -282,7 +280,7 @@ Async equivalents (`AsyncInProcessProcessor`, `AsyncDurableFunctionProcessor`) l
 - **[Docs/design_patterns.md](Docs/design_patterns.md)** — Integration patterns for chat apps and multi-agent systems
 - **[Docs/local_testing.md](Docs/local_testing.md)** — Prerequisites, environment setup, running locally, debugging
 - **[Docs/azure_testing.md](Docs/azure_testing.md)** — Azure deployment, RBAC, cloud validation
-- **[infra/README.md](infra/README.md)** — `azd` deployment, Bicep modules, BYOR settings, counter-trigger tuning
+- **[infra/README.md](infra/README.md)** — `azd` deployment, Bicep modules, RBAC, counter-trigger tuning, SDK-only mode
 - **[Docs/troubleshooting.md](Docs/troubleshooting.md)** — Common issues and resolutions for setup, auth, Cosmos DB, embeddings, Durable Functions, vector search, change feed, etc.
 
 ---

diff --git a/Samples/Advanced/advanced_search_patterns.py b/Samples/Advanced/advanced_search_patterns.py
@@ -140,7 +140,7 @@ def filtered_by_memory_type(mem: CosmosMemoryClient, user_id: str) -> None:
     results = mem.search_cosmos(
         search_terms="food preferences",
         user_id=user_id,
-        memory_type="fact",
+        memory_types=["fact"],
         top_k=3,
     )
     print_results(results)