Commit 75de21a

docs(design): update architecture summary to v1.1 (knowledge layer, hybrid RAG, compliance)
1 parent e55fa9e commit 75de21a

1 file changed: +95 -11 lines changed

doc/design/integration_architecture_summary.md

Lines changed: 95 additions & 11 deletions
@@ -1,11 +1,28 @@
11
# DeepAgent + DocumentAgent Integration - Architecture Summary
22

3+
**Version:** 1.1 (Aligned with design addendum)
4+
**Enhancement Summary:** Adds context-aware tagging + confirmation loop, Postgres + pgvector knowledge persistence, hybrid (vector + lexical) retrieval, compliance & cross-standard reasoning flows, and extended deployment phases (5–8).
5+
For full detail see `deepagent_document_tools_integration.md` Section 14.
6+
37
## Quick Reference
48

59
### Core Concept
610

711
**DeepAgent uses DocumentAgent family as LangChain tools**, enabling conversational AI to perform document processing tasks automatically.
812

13+
New (v1.1) Knowledge Layer Additions:
14+
15+
```text
16+
┌──────────────────────────────────────────────────────────────┐
17+
│ KNOWLEDGE LAYER (NEW v1.1) │
18+
│ • Tagging & Confirmation (auto + user override) │
19+
│ • High-Accuracy Requirements Pipeline (>99% precision) │
20+
│ • Postgres + pgvector persistence │
21+
│ • Hybrid Retrieval (vector + lexical + metadata) │
22+
│ • Compliance & Standards Graph Reasoning │
23+
└──────────────────────────────────────────────────────────────┘
24+
```
25+
926
```text
1027
┌─────────────────────────────────────────────────────────────────┐
1128
│ USER QUERY │
@@ -138,15 +155,17 @@
138155
│ • Auto document type detection │
139156
│ • Adaptive prompt selection │
140157
│ • Domain-specific processing │
141-
│ - Requirements docs
158+
│ - Requirements docs (hi-accuracy)
142159
│ - Technical specs │
143160
│ - Legal contracts │
144161
│ - Business documents │
162+
│ • Persistence + embedding hooks │
145163
├─────────────────────────────────────┤
146164
│ WRAPS: TagAwareDocumentAgent │
147-
│ • DocumentTagger
165+
│ • DocumentTagger (heuristic + LLM)
148166
│ • PromptSelector │
149167
│ • Dynamic strategy selection │
168+
│ • ExternalKnowledgeStore client │
150169
├─────────────────────────────────────┤
151170
│ INPUT: │
152171
│ • file_path: str │
@@ -200,7 +219,7 @@ User Query Analysis
200219
201220
202221
SmartDocumentProcessingTool
203-
(auto-detects and adapts)
222+
(auto-detects, adapts, persists, embeds)
204223
```
205224

206225
### Fallback Chain
@@ -263,7 +282,8 @@ DocumentAgent processes:
263282
3. Chunk markdown (8000 chars/chunk)
264283
4. LLM structures each chunk
265284
5. Merge results
266-
6. Apply quality enhancements
285+
6. Apply quality enhancements (multi-pass if hi-accuracy enabled)
286+
7. (v1.1) Persist structured requirements + embeddings (if configured)
267287
268288
269289
Returns: {
@@ -360,9 +380,8 @@ DeepAgent: → SmartDocumentProcessingTool
360380
361381
TURN 2:
362382
USER: "How many database requirements?"
363-
DeepAgent: → Recalls previous extraction results from session
364-
→ Filters requirements by "database" keyword
365-
→ Returns count + list
383+
DeepAgent: → If session cache warm, use memory; else query hybrid retrieval restricted to `requirements_spec` + keyword filter
384+
→ Returns count + list (with persistent IDs)
366385
367386
TURN 3:
368387
USER: "Export those to JSON"
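
The revised Turn 2 behavior in this hunk is a cache-then-retrieve fallback. The sketch below is a hedged illustration only: `answer_count_query`, the dict-based `session_cache`, and the `retrieve` callable are hypothetical names standing in for the session memory and the hybrid retrieval API, whose real signatures are not shown in this diff.

```python
from typing import Callable, Dict, List

# Assumed shape of a stored requirement: {"id": ..., "text": ...}
Requirement = Dict[str, str]


def answer_count_query(
    keyword: str,
    session_cache: Dict[str, List[Requirement]],
    retrieve: Callable[[str, str], List[Requirement]],
    doc_type: str = "requirements_spec",
) -> Dict[str, object]:
    """Answer 'how many <keyword> requirements?' from memory if possible."""
    cached = session_cache.get(doc_type)
    if cached:
        # Warm cache: filter the previous extraction results in memory.
        needle = keyword.lower()
        matches = [r for r in cached if needle in r["text"].lower()]
        source = "session"
    else:
        # Cold cache: delegate to retrieval, restricted to the document type.
        # The retriever is assumed to apply the keyword filter itself.
        matches = retrieve(doc_type, keyword)
        source = "retrieval"
    return {
        "count": len(matches),
        "ids": [r["id"] for r in matches],  # persistent IDs in the v1.1 design
        "source": source,
    }
```

Returning the source alongside the count makes it easy to log how often the session cache actually serves follow-up questions.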
@@ -403,6 +422,10 @@ DeepAgent: → Formats previously filtered results as JSON
403422
│ • Per-requirement confidence │
404423
│ • Quality flags │
405424
├────────────────────────────────────────────────┤
425+
│ 6. Multi-Pass Refinement (v1.1) │
426+
│ • Second pass on ambiguous items │
427+
│ • Atomic split & duplication resolution │
428+
├────────────────────────────────────────────────┤
406429
│ TOTAL ACCURACY: 99-100% (exceeds ≥98% target) │
407430
└────────────────────────────────────────────────┘
408431
```
@@ -466,6 +489,15 @@ User → [Status updates...] → Result
466489

467490
## Security Measures
468491

492+
### New (v1.1) Persistence & Data Handling Considerations
493+
494+
| Aspect | Control |
495+
|--------|---------|
496+
| PII in embeddings | Optional redaction pre-embedding (regex + classifier) |
497+
| Audit of overrides | Tag override table with user + timestamp |
498+
| Data minimization | Store only atomic requirement text + derived metadata |
499+
| Encryption | Recommend TLS for external Postgres API + at-rest encryption |
500+
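
The "PII in embeddings" row above names regex plus classifier redaction before embedding. As a rough sketch of the regex half only, the example below masks email addresses and US-style phone numbers. The patterns and the `redact_pii` name are illustrative assumptions, not part of the design, and they would miss many real PII forms; the classifier stage is not shown.

```python
import re

# Illustrative patterns only; real deployments need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

Running the redaction before the embedding call keeps raw identifiers out of the stored vectors, which cannot be selectively scrubbed afterwards.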
469501
### Path Validation
470502

471503
```text
@@ -538,6 +570,23 @@ document_tools:
538570
auto_detect: true
539571
confidence_threshold: 0.7
540572

573+
knowledge_layer:
574+
enabled: true
575+
persistence:
576+
external_store: true
577+
batch_size: 100
578+
embeddings:
579+
model: "text-embedding-3-large"
580+
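dimension: 1536  # text-embedding-3-large returns 3072 dimensions by default; 1536 requires requesting shortened embeddings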
581+
store_vectors: true
582+
hybrid_retrieval:
583+
enabled: true
584+
vector_weight: 0.6
585+
lexical_weight: 0.4
586+
compliance:
587+
enable_gap_analysis: true
588+
enable_suggestions: true
589+
541590
output:
542591
default_format: "summary"
543592
include_confidence_scores: true
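
The `vector_weight` and `lexical_weight` settings added in this hunk suggest a weighted fusion of the two score sets. One plausible reading, sketched below, min-max normalizes each set per query and then combines them linearly. The `min_max` and `fuse_scores` names and the normalization step are assumptions; the diff does not say how the scores are actually combined.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps every id to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    vector: Dict[str, float],
    lexical: Dict[str, float],
    vector_weight: float = 0.6,   # mirrors hybrid_retrieval.vector_weight
    lexical_weight: float = 0.4,  # mirrors hybrid_retrieval.lexical_weight
) -> List[Tuple[str, float]]:
    """Return (id, fused_score) pairs, best first."""
    vec, lex = min_max(vector), min_max(lexical)
    fused = {
        # An id missing from one result set contributes 0 for that signal.
        doc_id: vector_weight * vec.get(doc_id, 0.0)
        + lexical_weight * lex.get(doc_id, 0.0)
        for doc_id in set(vec) | set(lex)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Normalizing first matters because cosine similarities and lexical scores such as BM25 live on different scales; without it the weights would not mean what the config implies.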
@@ -610,6 +659,34 @@ Week 7-8: Phase 4 - Production
610659
│ ✓ Monitoring + alerts │
611660
│ ✓ Production deployment │
612661
└────────────────────────────────────┘
662+
663+
Week 9-10: Phase 5 - Knowledge Layer
664+
┌────────────────────────────────────┐
665+
│ ✓ Tagging confirmation loop │
666+
│ ✓ Requirements persistence │
667+
│ ✓ Embedding generation pipeline │
668+
└────────────────────────────────────┘
669+
670+
Week 11-12: Phase 6 - Hybrid Retrieval
671+
┌────────────────────────────────────┐
672+
│ ✓ Hybrid (vector + lexical) API │
673+
│ ✓ Scoring fusion + re-ranking │
674+
│ ✓ Initial retrieval benchmarks │
675+
└────────────────────────────────────┘
676+
677+
Week 13-14: Phase 7 - Compliance Reasoning
678+
┌────────────────────────────────────┐
679+
│ ✓ Standards graph construction │
680+
│ ✓ Gap analysis engine │
681+
│ ✓ Cross-standard Q&A │
682+
└────────────────────────────────────┘
683+
684+
Week 15: Phase 8 - Optimization & QA
685+
┌────────────────────────────────────┐
686+
│ ✓ Performance tuning (latency p95) │
687+
│ ✓ Continuous eval harness │
688+
│ ✓ Risk & drift monitoring │
689+
└────────────────────────────────────┘
613690
```
614691

615692
---
@@ -634,7 +711,9 @@ Week 7-8: Phase 4 - Production
634711

635712
**Provider Agnostic**: Works with any LLM provider
636713

637-
**Quality Preserved**: All DocumentAgent features maintained
714+
**Quality Preserved**: All DocumentAgent features maintained + enhanced multi-pass refinement
715+
**Persistent Knowledge**: Structured + vectorized artifacts for future reasoning
716+
**Hybrid Intelligence**: Combines semantic, lexical, and metadata signals
638717

639718
---
640719

@@ -644,10 +723,15 @@ Week 7-8: Phase 4 - Production
644723
2. **Graceful Degradation**: Basic features always available
645724
3. **Quality First**: 99-100% accuracy maintained
646725
4. **Security by Default**: Path validation, resource limits, sanitization
647-
5. **Performance Optimized**: Caching, async, streaming
648-
6. **User-Centric**: Natural language interface, helpful errors
726+
5. **Performance Optimized**: Caching, async, streaming, fused retrieval
727+
6. **User-Centric**: Natural language interface, helpful errors, confirmation loops
728+
7. **Observability & Evaluation**: Metrics for tagging, extraction, retrieval, compliance
649729

650730
---
651731

652732
**For complete implementation details, see:**
653-
`doc/design/deepagent_document_tools_integration.md`
733+
`doc/design/deepagent_document_tools_integration.md` (Sections 14.x for v1.1 enhancements)
734+
735+
---
736+
737+
*This architecture summary is synchronized with the v1.1 design addendum.*
