Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -35,3 +35,10 @@ venv/

# macOS
.DS_Store

# JetBrains IDEs
.idea/

# Local scratch notes
Agent Memory Toolkit.txt
Cosmos AI Memory Service.txt
23 changes: 23 additions & 0 deletions Docs/architecture/orchestrators.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# Durable orchestrator activity chains

W15 splits LLM extraction from persistence so Durable retries after a Cosmos write failure do not re-run the LLM activity.

```text
ExtractMemoriesOrchestrator
em_Extract (load recent turns + LLM + parse; no embeddings/writes)
em_Persist (embeddings + deterministic create_item; 409 = already persisted)
em_ReconcileMemories (optional; single activity for GA)

ThreadSummaryOrchestrator
ts_Extract
ts_PersistSummary

UserSummaryOrchestrator
us_Extract
us_PersistUserSummary

SynthesizeProceduralOrchestrator
sp_SynthesizeProcedural (single activity for GA)
```

Fact and episodic IDs are deterministic from user, thread, and normalized content. Thread and user summaries keep their deterministic summary IDs.
2 changes: 1 addition & 1 deletion Docs/concepts.md
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,7 @@ Prompts for summarization and fact extraction live in `azure_functions/prompts/`
The `reconcile_memories(user_id, n=50)` pipeline step reads up to N most-recent active facts for a user and asks the LLM to identify two orthogonal outcomes in one pass:

- **Duplicates** — two or more facts that restate the same claim in different words. Resolution: collapse into one merged fact; the originals are soft-deleted with `supersede_reason="duplicate"` and `superseded_by` set to the merged fact's id.
- **Contradictions** — two facts that assert opposing claims about the same subject. Resolution: keep the winner (more recent first, higher confidence as tiebreaker), soft-delete the loser with `supersede_reason="contradiction"` and `superseded_by` set to the winner.
- **Contradictions** — two facts that assert opposing claims about the same subject. Resolution: keep the winner (more recent first, higher confidence as tiebreaker), soft-delete the loser with `supersede_reason="contradict"` and `superseded_by` set to the winner.

### Why one pass

Expand Down
27 changes: 27 additions & 0 deletions Docs/operations.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Operations

Runtime knobs for an Agent-Memory-Toolkit deployment. Most ops levers live in `.env` / Function-app App Settings — change them, restart the consumer, and you're done. Deployment-time knobs (Bicep params bound to `azd env set ...`) live in [`infra/README.md`](../infra/README.md).

## Memory lifecycle (TTL)

| Type | Default TTL | Source |
|---|---:|---|
| turn | 30 d | container default (memories_turns) |
| episodic | 90 d | per-doc ttl (memories container) |
| thread_summary | never | container default (memories, -1) |
| user_summary | never | container default |
| fact | never | container default; supersession handles aging |
| procedural | never | container default; supersession handles aging |

Override per write:

client.add_memory(text, type="turn", ttl=60) # expires in 60 seconds

Override per container at provision time:

azd env set MEMORIES_TURNS_DEFAULT_TTL 86400 # 1 day

## Counter-based trigger configuration

Function-app threshold knobs (`THREAD_SUMMARY_EVERY_N`, `FACT_EXTRACTION_EVERY_N`, `DEDUP_EVERY_N`, `USER_SUMMARY_EVERY_N`, `MAX_BATCH_SIZE`, `MEMORY_PROCESSOR_OWNER`) are documented in [`infra/README.md` → Counter-based trigger configuration](../infra/README.md#counter-based-trigger-configuration-function-app-only). Change them with `azd env set ...` then `azd up`.

123 changes: 123 additions & 0 deletions Docs/public_api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
# Public API

## Architecture

`CosmosMemoryClient` and `AsyncCosmosMemoryClient` are thin orchestrators. They keep local-buffer state and Cosmos connection lifecycle, then delegate persistence to `MemoryStore` / `AsyncMemoryStore` and higher-level behavior to:

- `ChatClient` / `EmbeddingsClient` (sync) and `AsyncEmbeddingsClient` (async) — Azure OpenAI wrappers.
- `RetrievalService` / `AsyncRetrievalService` for filtering, vector search, and episodic context.
- `PipelineService` for extraction, summaries, procedural synthesis, and reconciliation.
- `InProcessProcessor` / `AsyncInProcessProcessor` / `DurableFunctionProcessor` for immediate or change-feed-driven processing.
- `auto_trigger.maybe_trigger_steps` (sync) and `aio.auto_trigger.maybe_trigger_steps` (async) for threshold-driven step firing after each `push_to_cosmos`.

## CosmosMemoryClient (sync)

### Connection

- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure local state, model clients, optional Cosmos auto-connect, and optional processing backend. When `cosmos_turns_container` is set, turn-type documents land in a dedicated container so the main `memories` container only fires the Durable change-feed trigger for processed memory writes.
- `close() -> None` — close Cosmos/model clients and owned credentials.
- `connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None) -> None` — connect to an existing memory container.
- `create_memory_store(database=None, container=None, turns_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect the memory, optional turns, counter, and lease containers.

### Memory CRUD

- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer.
- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories.
- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory.
- `delete_local(memory_id) -> None` — remove a local buffered memory.
- `add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id.
- `push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos.
- `get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — retrieve memories with filters.
- `update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory.
- `delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory.
- `get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, created_after=None, created_before=None) -> list[dict]` — retrieve a thread oldest-first.
- `get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document.

### Retrieval

- `search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
- `get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
- `get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
- `get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
- `search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories.
- `build_procedural_context(user_id) -> str` — format procedural context for prompts.
- `build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context.

### Processing

- `extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread.
- `synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt.
- `generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary.
- `generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary.
- `reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts.
- `process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately.
- `process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary.

### Tagging

- `add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory.
- `remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory.
- `list_tags(user_id, *, thread_id=None, prefix=None, include_sys=False) -> list[str]` — list sorted, deduped tags for a user; omits `sys:*` by default.

## AsyncCosmosMemoryClient

Local-buffer methods remain synchronous in-memory operations; Cosmos, retrieval, and processing methods are `async` and must be awaited.

### Connection

- `__init__(cosmos_endpoint=None, cosmos_credential=None, cosmos_key=None, cosmos_database=None, cosmos_container=None, cosmos_turns_container=None, cosmos_counter_container=None, cosmos_lease_container=None, cosmos_throughput_mode=None, cosmos_autoscale_max_ru=None, ai_foundry_endpoint=None, ai_foundry_credential=None, ai_foundry_api_key=None, embedding_deployment_name='text-embedding-3-large', embedding_dimensions=None, chat_deployment_name='gpt-4o-mini', use_default_credential=True, processor=None) -> None` — configure async local state, model clients, and optional processing backend. When `cosmos_turns_container` is set, turn-type documents land in a dedicated container so the main `memories` container only fires the Durable change-feed trigger for processed memory writes.
- `async close() -> None` — close async/sync resources and owned credentials.
- `async connect_cosmos(endpoint=None, credential=None, key=None, database=None, container=None, turns_container=None) -> None` — connect to an existing memory container.
- `async create_memory_store(database=None, container=None, turns_container=None, counter_container=None, lease_container=None, endpoint=None, credential=None, key=None, embedding_dimensions=None, embedding_data_type=None, distance_function=None, full_text_language=None, throughput_mode=None, autoscale_max_ru=None) -> None` — create/connect memory, optional turns, counter, and lease containers.

### Memory CRUD

- `add_local(user_id, role, content, memory_type='turn', agent_id=None, metadata=None, thread_id=None, tags=None, ttl=None, salience=None) -> None` — append a memory to the local buffer.
- `get_local(memory_id=None, user_id=None, role=None, memory_types=None) -> list[dict]` — filter local buffered memories.
- `update_local(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a local buffered memory.
- `delete_local(memory_id) -> None` — remove a local buffered memory.
- `async add_cosmos(user_id, role, content, memory_type='turn', metadata=None, thread_id=None, tags=None, ttl=None, salience=None, embedding=None, embed=None) -> str` — upsert one memory to Cosmos and return its id.
- `async push_to_cosmos(batch_size=25) -> None` — flush local buffered memories to Cosmos.
- `async get_memories(memory_id=None, user_id=None, thread_id=None, role=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — retrieve memories with filters.
- `async update_cosmos(memory_id, content=None, role=None, memory_type=None, metadata=None) -> None` — update a Cosmos memory.
- `async delete_cosmos(memory_id, thread_id, user_id) -> None` — delete a Cosmos memory.
- `async get_thread(thread_id, user_id=None, memory_types=None, recent_k=None, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, created_after=None, created_before=None) -> list[dict]` — retrieve a thread oldest-first.
- `async get_user_summary(user_id) -> Optional[dict]` — retrieve the active user-summary document.

### Retrieval

- `async search_cosmos(search_terms, memory_id=None, user_id=None, role=None, memory_types=None, thread_id=None, hybrid_search=False, top_k=5, tags_all=None, tags_any=None, exclude_tags=None, include_superseded=False, min_salience=None, min_confidence=None, created_after=None, created_before=None) -> list[dict]` — vector or hybrid search memories.
- `async get_procedural_prompt(user_id) -> Optional[str]` — read the active procedural prompt.
- `async get_procedural_history(user_id, limit=10) -> list[dict]` — read procedural prompt history.
- `async get_procedural_memories(user_id, priority=None, category=None, min_salience=None, include_superseded=False) -> list[dict]` — retrieve procedural memory documents.
- `async search_episodic_memories(user_id, search_terms, top_k=5, min_salience=None, include_superseded=False) -> list[dict]` — search episodic memories.
- `async build_procedural_context(user_id) -> str` — format procedural context for prompts.
- `async build_episodic_context(user_id, query, top_k=3) -> str` — format relevant episodic context.

### Processing

- `async extract_memories(user_id, thread_id, recent_k=None) -> dict[str, int]` — extract facts/episodic memories from a thread.
- `async synthesize_procedural(user_id, *, force=False) -> dict` — synthesize the procedural prompt.
- `async generate_thread_summary(user_id, thread_id, recent_k=None, **kwargs) -> dict` — generate and persist a thread summary.
- `async generate_user_summary(user_id, thread_ids=None, recent_k=None, **kwargs) -> dict` — generate and persist a user summary.
- `async reconcile(user_id, n=None) -> dict[str, int]` — reconcile duplicate or contradictory facts.
- `async process_now(*, user_id, thread_id) -> ProcessThreadResult` — run the configured processor immediately.
- `async process_now_and_wait(*, user_id, thread_id, timeout=30.0) -> bool` — process and wait for a summary.

### Tagging

- `async add_tags(memory_id, user_id, thread_id, tags) -> None` — add tags to a memory.
- `async remove_tags(memory_id, user_id, thread_id, tags) -> None` — remove tags from a memory.
- `async list_tags(user_id, *, thread_id=None, prefix=None, include_sys=False) -> list[str]` — list sorted, deduped tags for a user; omits `sys:*` by default.

## Extension Points

Sync extension protocols live in `agent_memory_toolkit.services`; async variants live in `agent_memory_toolkit.aio.services`.

- `MemoryStoreProtocol` (`agent_memory_toolkit.services`): persistence primitives (`query`, `read_item`, `add_cosmos`, `mark_superseded`) consumed by the pipeline.

Concrete service classes are exported from their respective packages:

- Sync: `RetrievalService`, `PipelineService` from `agent_memory_toolkit.services` (sub-modules `retrieval`, `pipeline`).
- Async: `AsyncRetrievalService` and `AsyncPipelineService` from `agent_memory_toolkit.aio.services` (sub-modules `retrieval`, `pipeline`). The async pipeline is a fully-native asyncio implementation — not an `asyncio.to_thread` shim over the sync pipeline.
- Threshold-driven auto-trigger: `maybe_trigger_steps` from `agent_memory_toolkit.auto_trigger` (sync) and `agent_memory_toolkit.aio.auto_trigger` (async).
10 changes: 4 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,9 +68,7 @@ cp .env.template .env
# edit COSMOS_DB_ENDPOINT, AI_FOUNDRY_ENDPOINT, AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME, AI_FOUNDRY_CHAT_DEPLOYMENT_NAME
```

You can also point `azd up` at existing resources via `azd env set USE_EXISTING_COSMOS true` / `USE_EXISTING_AI_FOUNDRY true` (full BYOR flag list in `infra/README.md`).

> For the Durable Function app counter-trigger settings, Bicep module reference, and RBAC scopes — see **[`infra/README.md`](infra/README.md)**.
> For the Durable Function app counter-trigger settings, Bicep module reference, RBAC scopes, and the SDK-only escape hatch (`DEPLOY_FUNCTION_APP=false`) — see **[`infra/README.md`](infra/README.md)**.

### 3. Use the SDK

Expand Down Expand Up @@ -170,14 +168,14 @@ high_conf_facts = memory.get_memories(user_id="u1", memory_types=["fact"], min_c

### Memory Reconciliation

`reconcile(user_id, n=50)` (on the public client; underlying pipeline method is `ProcessingPipeline.reconcile_memories`) collapses paraphrased duplicates and resolves semantic contradictions in a single LLM pass over the N most-recent active facts. Both outcomes soft-delete the loser with a `supersede_reason` of `"duplicate"` or `"contradiction"`. See [Docs/concepts.md](Docs/concepts.md#memory-reconciliation) for details.
`reconcile(user_id, n=50)` (on the public client; underlying pipeline method is `ProcessingPipeline.reconcile_memories`) collapses paraphrased duplicates and resolves semantic contradictions in a single LLM pass over the N most-recent active facts. Both outcomes soft-delete the loser with a `supersede_reason` of `"duplicate"` or `"contradict"`. See [Docs/concepts.md](Docs/concepts.md#memory-reconciliation) for details.

> **Cost note.** Each reconciliation makes one LLM call covering up to `n` facts (default 50, hard cap 500). With auto-trigger, this fires every `FACT_EXTRACTION_EVERY_N × DEDUP_EVERY_N` turns per user, with `n` taken from `DEDUP_POOL_SIZE`. The previous cosine-cluster pre-filter was removed deliberately — it could not catch semantic contradictions like "vegetarian" vs "ribeye steak" — so the LLM is now invoked whenever there are ≥ 2 active facts. To bound LLM cost more tightly: raise `DEDUP_EVERY_N` (lower frequency — reconcile fires every Nth extraction, so a *higher* N means *less often*), lower `DEDUP_POOL_SIZE` (smaller per-call pool), or override `n` per call when invoking `reconcile()` directly.

| New `MemoryRecord` field | Meaning |
|---|---|
| `content_hash` | SHA-256 of normalized content; enables write-time exact-dedup short-circuit |
| `supersede_reason` | `"duplicate"` or `"contradiction"` (None for live records) |
| `supersede_reason` | `"duplicate"` or `"contradict"` (None for live records) |
| `superseded_at` | ISO timestamp when the supersede happened (None for live records) |
| `superseded_by` | Id of the record that replaced this one (existing field) |

Expand Down Expand Up @@ -282,7 +280,7 @@ Async equivalents (`AsyncInProcessProcessor`, `AsyncDurableFunctionProcessor`) l
- **[Docs/design_patterns.md](Docs/design_patterns.md)** — Integration patterns for chat apps and multi-agent systems
- **[Docs/local_testing.md](Docs/local_testing.md)** — Prerequisites, environment setup, running locally, debugging
- **[Docs/azure_testing.md](Docs/azure_testing.md)** — Azure deployment, RBAC, cloud validation
- **[infra/README.md](infra/README.md)** — `azd` deployment, Bicep modules, BYOR settings, counter-trigger tuning
- **[infra/README.md](infra/README.md)** — `azd` deployment, Bicep modules, RBAC, counter-trigger tuning, SDK-only mode
- **[Docs/troubleshooting.md](Docs/troubleshooting.md)** — Common issues and resolutions for setup, auth, Cosmos DB, embeddings, Durable Functions, vector search, change feed, etc.

---
Expand Down
2 changes: 1 addition & 1 deletion Samples/Advanced/advanced_search_patterns.py
Original file line number Diff line number Diff line change
Expand Up @@ -140,7 +140,7 @@ def filtered_by_memory_type(mem: CosmosMemoryClient, user_id: str) -> None:
results = mem.search_cosmos(
search_terms="food preferences",
user_id=user_id,
memory_type="fact",
memory_types=["fact"],
top_k=3,
)
print_results(results)
Expand Down
Loading
Loading