11# DeepAgent + DocumentAgent Integration - Architecture Summary
22
3+ **Version:** 1.1 (aligned with the design addendum)
4+ **Enhancement Summary:** Adds context-aware tagging with a confirmation loop, Postgres + pgvector knowledge persistence, hybrid (vector + lexical) retrieval, compliance and cross-standard reasoning flows, and extended deployment phases (5–8).
5+ For full detail, see `deepagent_document_tools_integration.md`, Section 14.
6+
37## Quick Reference
48
59### Core Concept
610
711**DeepAgent uses the DocumentAgent family as LangChain tools**, enabling conversational AI to perform document-processing tasks automatically.
812
13+ New (v1.1) Knowledge Layer Additions:
14+
15+ ``` text
16+ ┌──────────────────────────────────────────────────────────────┐
17+ │ KNOWLEDGE LAYER (NEW v1.1) │
18+ │ • Tagging & Confirmation (auto + user override) │
19+ │ • High-Accuracy Requirements Pipeline (>99% precision) │
20+ │ • Postgres + pgvector persistence │
21+ │ • Hybrid Retrieval (vector + lexical + metadata) │
22+ │ • Compliance & Standards Graph Reasoning │
23+ └──────────────────────────────────────────────────────────────┘
24+ ```
25+
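The "Tagging & Confirmation (auto + user override)" item can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `TagDecision` record, the `confirm_tag` helper, and the `ask_user` callback are hypothetical names, and the 0.7 cutoff simply reuses the `confidence_threshold` value from the configuration section. The actual confirmation loop is specified in the design addendum, not here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

# Assumption: reuse the 0.7 `confidence_threshold` from the configuration section.
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class TagDecision:
    """Hypothetical audit record for one tagging decision."""
    tag: str
    auto_confidence: float            # confidence reported by the automatic tagger
    source: str                       # "auto" or "user"
    overridden_by: Optional[str] = None
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def confirm_tag(
    auto_tag: str,
    confidence: float,
    ask_user: Callable[[str], str],
    user_id: str,
) -> TagDecision:
    """Accept confident auto-tags; otherwise let the user confirm or override."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return TagDecision(auto_tag, confidence, "auto")
    # `ask_user` returns either the proposed tag (confirmation) or a replacement.
    chosen = ask_user(auto_tag)
    if chosen == auto_tag:
        return TagDecision(auto_tag, confidence, "user")
    return TagDecision(chosen, confidence, "user", overridden_by=user_id)
```

A low-confidence `technical_spec` guess that the user corrects to `requirements_spec` would yield a record with `source="user"` and `overridden_by` set, which is the kind of row a tag override audit table could store.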
926``` text
1027┌─────────────────────────────────────────────────────────────────┐
1128│ USER QUERY │
138155│ • Auto document type detection │
139156│ • Adaptive prompt selection │
140157│ • Domain-specific processing │
141- │ - Requirements docs │
158+ │ - Requirements docs (hi-accuracy) │
142159│ - Technical specs │
143160│ - Legal contracts │
144161│ - Business documents │
162+ │ • Persistence + embedding hooks │
145163├─────────────────────────────────────┤
146164│ WRAPS: TagAwareDocumentAgent │
147- │ • DocumentTagger │
165+ │ • DocumentTagger (heuristic + LLM) │
148166│ • PromptSelector │
149167│ • Dynamic strategy selection │
168+ │ • ExternalKnowledgeStore client │
150169├─────────────────────────────────────┤
151170│ INPUT: │
152171│ • file_path: str │
@@ -200,7 +219,7 @@ User Query Analysis
200219 │
201220 ▼
202221 SmartDocumentProcessingTool
203- (auto-detects and adapts)
222+ (auto-detects, adapts, persists, embeds)
204223```
205224
206225### Fallback Chain
@@ -263,7 +282,8 @@ DocumentAgent processes:
263282 3. Chunk markdown (8000 chars/chunk)
264283 4. LLM structures each chunk
265284 5. Merge results
266- 6. Apply quality enhancements
285+ 6. Apply quality enhancements (multi-pass if hi-accuracy enabled)
286+ 7. (v1.1) Persist structured requirements + embeddings (if configured)
267287 │
268288 ▼
269289Returns: {
@@ -360,9 +380,8 @@ DeepAgent: → SmartDocumentProcessingTool
360380
361381TURN 2:
362382USER: "How many database requirements?"
363- DeepAgent: → Recalls previous extraction results from session
364- → Filters requirements by "database" keyword
365- → Returns count + list
383+ DeepAgent: → If the session cache is warm, answers from memory; otherwise queries hybrid retrieval restricted to `requirements_spec`, plus a keyword filter
384+ → Returns count + list (with persistent IDs)
366385
367386TURN 3:
368387USER: "Export those to JSON"
@@ -403,6 +422,10 @@ DeepAgent: → Formats previously filtered results as JSON
403422│ • Per-requirement confidence │
404423│ • Quality flags │
405424├────────────────────────────────────────────────┤
425+ │ 6. Multi-Pass Refinement (v1.1) │
426+ │ • Second pass on ambiguous items │
427+ │ • Atomic split & duplication resolution │
428+ ├────────────────────────────────────────────────┤
406429│ TOTAL ACCURACY: 99-100% (exceeds ≥98% target) │
407430└────────────────────────────────────────────────┘
408431```
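As a rough illustration of the "Atomic split & duplication resolution" step in the multi-pass stage, the sketch below splits compound "shall ... and shall ..." statements into atomic ones and then drops duplicates by normalized text. It is a simplified stand-in built on assumptions: the function names are hypothetical, the splitting rule only handles one sentence pattern, and the real pipeline presumably relies on the LLM second pass rather than regular expressions.

```python
import re
from typing import List


def split_atomic(requirement: str) -> List[str]:
    """Split 'X shall A and shall B' into one statement per obligation."""
    text = requirement.strip().rstrip(".")
    match = re.match(r"^(?P<subject>.+?)\s+shall\s+(?P<rest>.+)$", text)
    if not match:
        # Not a 'shall' statement: leave it untouched.
        return [requirement.strip()]
    subject = match.group("subject")
    clauses = re.split(r"\s+and\s+shall\s+", match.group("rest"))
    return [f"{subject} shall {clause.strip()}." for clause in clauses]


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation so near-identical texts compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def resolve_duplicates(requirements: List[str]) -> List[str]:
    """Keep the first occurrence of each normalized requirement text."""
    seen = set()
    unique = []
    for req in requirements:
        key = normalize(req)
        if key not in seen:
            seen.add(key)
            unique.append(req)
    return unique


def refine(requirements: List[str]) -> List[str]:
    """Second pass: split compound statements, then remove duplicates."""
    atomic = [part for req in requirements for part in split_atomic(req)]
    return resolve_duplicates(atomic)
```

Splitting before deduplicating matters: a duplicate obligation hidden inside a compound sentence is only detectable once the sentence has been broken apart.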
@@ -466,6 +489,15 @@ User → [Status updates...] → Result
466489
467490## Security Measures
468491
492+ ### New (v1.1) Persistence & Data Handling Considerations
493+
494+ | Aspect | Control |
495+ | --------| ---------|
496+ | PII in embeddings | Optional redaction pre-embedding (regex + classifier) |
497+ | Audit of overrides | Tag override table with user + timestamp |
498+ | Data minimization | Store only atomic requirement text + derived metadata |
499+ | Encryption | TLS recommended for the external Postgres API, plus at-rest encryption |
500+
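The "optional redaction pre-embedding" control could look roughly like the sketch below, which covers only the regex half; the classifier stage is omitted. The patterns, the `redact_pii` name, and the placeholder labels are illustrative assumptions, and the patterns are deliberately simple: the phone pattern, for example, would also match long digit-and-dash identifiers, so real rules would need tuning against the document corpus.

```python
import re

# Assumption: illustrative patterns only; real deployments need broader, tested rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace matched PII spans with a bracketed label before embedding."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Redacting before the embedding call means the stored vector never encodes the raw value, which fits the data minimization row in the table above.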
469501### Path Validation
470502
471503``` text
@@ -538,6 +570,23 @@ document_tools:
538570 auto_detect: true
539571 confidence_threshold: 0.7
540572
573+ knowledge_layer:
574+   enabled: true
575+   persistence:
576+     external_store: true
577+     batch_size: 100
578+   embeddings:
579+     model: "text-embedding-3-large"
580+     dimension: 1536  # text-embedding-3-large returns 3072 dimensions by default; 1536 requires requesting reduced dimensions
581+     store_vectors: true
582+   hybrid_retrieval:
583+     enabled: true
584+     vector_weight: 0.6
585+     lexical_weight: 0.4
586+   compliance:
587+     enable_gap_analysis: true
588+     enable_suggestions: true
589+
541590output:
542591 default_format: "summary"
543592 include_confidence_scores: true
@@ -610,6 +659,34 @@ Week 7-8: Phase 4 - Production
610659│ ✓ Monitoring + alerts │
611660│ ✓ Production deployment │
612661└────────────────────────────────────┘
662+
663+ Week 9-10: Phase 5 - Knowledge Layer
664+ ┌────────────────────────────────────┐
665+ │ ✓ Tagging confirmation loop │
666+ │ ✓ Requirements persistence │
667+ │ ✓ Embedding generation pipeline │
668+ └────────────────────────────────────┘
669+
670+ Week 11-12: Phase 6 - Hybrid Retrieval
671+ ┌────────────────────────────────────┐
672+ │ ✓ Hybrid (vector + lexical) API │
673+ │ ✓ Scoring fusion + re-ranking │
674+ │ ✓ Initial retrieval benchmarks │
675+ └────────────────────────────────────┘
676+
677+ Week 13-14: Phase 7 - Compliance Reasoning
678+ ┌────────────────────────────────────┐
679+ │ ✓ Standards graph construction │
680+ │ ✓ Gap analysis engine │
681+ │ ✓ Cross-standard Q&A │
682+ └────────────────────────────────────┘
683+
684+ Week 15: Phase 8 - Optimization & QA
685+ ┌────────────────────────────────────┐
686+ │ ✓ Performance tuning (latency p95) │
687+ │ ✓ Continuous eval harness │
688+ │ ✓ Risk & drift monitoring │
689+ └────────────────────────────────────┘
613690```
614691
615692---
@@ -634,7 +711,9 @@ Week 7-8: Phase 4 - Production
634711
635712✅ **Provider Agnostic**: Works with any LLM provider
636713
637- ✅ **Quality Preserved**: All DocumentAgent features maintained
714+ ✅ **Quality Preserved**: All DocumentAgent features maintained, plus multi-pass refinement
715+ ✅ **Persistent Knowledge**: Structured and vectorized artifacts for future reasoning
716+ ✅ **Hybrid Intelligence**: Combines semantic, lexical, and metadata signals
638717
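One plausible reading of "combines semantic, lexical, and metadata signals" is a weighted sum of normalized scores, using the 0.6 / 0.4 `vector_weight` and `lexical_weight` values from the configuration. The sketch below is an assumption about how such fusion might work, not the documented algorithm: the function names are hypothetical, min-max normalization is one choice among several, and metadata is assumed to be applied as a filter before fusion rather than as a third score.

```python
from typing import Dict, List, Tuple

# Weights taken from the `hybrid_retrieval` configuration values.
VECTOR_WEIGHT = 0.6
LEXICAL_WEIGHT = 0.4


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so vector and lexical scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    vector_scores: Dict[str, float],
    lexical_scores: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Weighted-sum fusion; an ID missing from one list contributes 0 for it."""
    v = min_max(vector_scores)
    lex = min_max(lexical_scores)
    fused = {
        doc_id: VECTOR_WEIGHT * v.get(doc_id, 0.0) + LEXICAL_WEIGHT * lex.get(doc_id, 0.0)
        for doc_id in set(v) | set(lex)
    }
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Normalizing first is the important design choice: cosine similarities and lexical scores such as BM25 live on different scales, so summing raw values would let one signal dominate regardless of the configured weights.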
639718---
640719
@@ -644,10 +723,15 @@ Week 7-8: Phase 4 - Production
6447232. **Graceful Degradation**: Basic features always available
6457243. **Quality First**: 99-100% accuracy maintained
6467254. **Security by Default**: Path validation, resource limits, sanitization
647- 5. **Performance Optimized**: Caching, async, streaming
648- 6. **User-Centric**: Natural language interface, helpful errors
726+ 5. **Performance Optimized**: Caching, async, streaming, fused retrieval
727+ 6. **User-Centric**: Natural language interface, helpful errors, confirmation loops
728+ 7. **Observability & Evaluation**: Metrics for tagging, extraction, retrieval, compliance
649729
650730---
651731
652732**For complete implementation details, see:**
653- `doc/design/deepagent_document_tools_integration.md`
733+ `doc/design/deepagent_document_tools_integration.md` (Sections 14.x for v1.1 enhancements)
734+
735+ ---
736+
737+ *This architecture summary is synchronized with the v1.1 design addendum.*