
Commit 035c17b

docs: update main documentation with new structure
Phase 4 (Enhancement) - Documentation Index Complete

Updated main documentation files to reflect new organized structure:

1. README.md
   - Added Quick Start section with 4-step setup
   - Completely rewrote Documentation section
   - Added 4 subsections: User Guides, Developer Guides, Features, Additional Resources
   - Linked to all 10 main documentation files
   - Clear contributor guidelines

2. doc/README.md (Documentation Index)
   - Complete documentation index (250+ lines)
   - Organized by role (Users, Developers, Evaluators)
   - Quick navigation by task
   - Documentation standards and maintenance guidelines
   - Cross-reference map for all 60+ docs
   - Sections:
     * User Guides (3 files)
     * Developer Guides (3 files)
     * Feature Docs (4 files)
     * Architecture (26 templates)
     * Business (4 docs)
     * Specifications (3+ specs)
     * Process docs (10 files)
     * Historical archive (24 files)

Navigation improvements:
- By Role: User, Developer, Evaluator paths
- By Task: Setup, Testing, Architecture, Quality
- Complete cross-referencing
- Documentation writing standards

Total documented: 60+ files organized and indexed
1 parent e81fe4a commit 035c17b


2 files changed (+315 / -5 lines)


README.md

Lines changed: 69 additions & 3 deletions
```diff
@@ -6,9 +6,10 @@
 - [Unstructured Data RAG Platform Overview](#unstructured-data-rag-platform-overview)
 - [unstructuredDataHandler Roadmap](#unstructureddatahandler-roadmap)
 - [Overview](#overview)
+- [Quick Start](#quick-start)
+- [Documentation](#documentation)
 - [Resources](#resources)
 - [FAQ](#faq)
-- [Documentation](#documentation)
 - [Contributing](#contributing)
 - [Communicating with the Team](#communicating-with-the-team)
 - [Developer Guidance](#developer-guidance)
```
```diff
@@ -49,6 +50,34 @@ The project specification, plan, and task breakdown are defined in YAML files:
 
 ---
 
+## Quick Start
+
+Get started with unstructuredDataHandler in minutes:
+
+```bash
+# 1. Clone and set up
+git clone <repository-url>
+cd unstructuredDataHandler
+python3 -m venv .venv
+source .venv/bin/activate # or `.venv\Scripts\activate` on Windows
+
+# 2. Install dependencies
+pip install -r requirements-dev.txt
+
+# 3. Set PYTHONPATH
+export PYTHONPATH=.
+
+# 4. Extract requirements from a document
+python examples/basic_completion.py
+```
+
+**For detailed setup and usage**, see:
+- **[Quick Start Guide](doc/user-guide/quick-start.md)** - 5-minute getting started
+- **[Configuration Guide](doc/user-guide/configuration.md)** - LLM providers and settings
+- **[Testing Guide](doc/user-guide/testing.md)** - Running tests and validation
+
+---
+
 ## Overview
 
 The Unstructured Data RAG Platform provides:
```
```diff
@@ -194,9 +223,46 @@ Extracted text, metadata, and relationships are normalized into JSON, stored in
 
 ## Documentation
 
-All project documentation is located at [softwaremodule-docs](./doc/). Architecture, schema, and developer guides are maintained here.
+Complete documentation is available in the `doc/` directory, organized into four main sections:
+
+### 📘 User Guides (`doc/user-guide/`)
+
+Start here if you're new to the project:
+
+- **[Quick Start](doc/user-guide/quick-start.md)** - Get running in 5 minutes with programmatic usage and the Streamlit UI
+- **[Configuration](doc/user-guide/configuration.md)** - LLM provider setup (Ollama, Cerebras, OpenAI, Anthropic) and optimization
+- **[Testing](doc/user-guide/testing.md)** - Running tests, test types, coverage, and CI/CD integration
+
+### 👨‍💻 Developer Guides (`doc/developer-guide/`)
+
+For contributors and developers extending the codebase:
+
+- **[Architecture](doc/developer-guide/architecture.md)** - System design, components, data flows, design patterns
+- **[Development Setup](doc/developer-guide/development-setup.md)** - Environment setup, IDE configuration, branch strategy, workflows
+- **[API Reference](doc/developer-guide/api-reference.md)** - Complete API documentation with code examples
+
+### ⚡ Feature Docs (`doc/features/`)
+
+Detailed documentation for specific features:
+
+- **[Requirements Extraction](doc/features/requirements-extraction.md)** - AI-powered extraction from documents (PDF/DOCX/PPTX/HTML)
+- **[Document Tagging](doc/features/document-tagging.md)** - Automatic categorization and metadata extraction
+- **[Quality Enhancements](doc/features/quality-enhancements.md)** - 99-100% accuracy mode with confidence scoring
+- **[LLM Integration](doc/features/llm-integration.md)** - Multi-provider LLM support and optimization
+
+### 📚 Additional Resources
+
+- **[Architecture Details](doc/architecture/)** - System architecture templates and domain context
+- **[Business Documentation](doc/business/)** - Stakeholder analysis and differentiation strategy
+- **[Specifications](doc/specs/)** - Feature specs and templates
+- **[Historical Docs](doc/.archive/)** - Implementation reports and phase summaries
+
+### Contributing to Documentation
 
-If you would like to contribute to the documentation, please submit a pull request on the [unstructuredDataHandler Documentation][docs-repo] repository.
+If you would like to contribute to the documentation, please:
+1. Follow the templates in `doc/specs/`
+2. Ensure code examples are tested and work
+3. Submit a pull request with a clear description
 
 ---
 
```
doc/README.md

Lines changed: 246 additions & 2 deletions
```diff
@@ -1,4 +1,248 @@
-# Documentation
+# Documentation Index
 
-This directory contains the relevant documents important to the software.
+**Complete documentation for the unstructuredDataHandler project**
 
+This directory contains all project documentation, organized for easy navigation. Choose a section based on your role and needs:
+
+---
+
+## 📘 For Users
+
+**New to the project? Start here:**
+
+### User Guides (`user-guide/`)
+
+- **[Quick Start](user-guide/quick-start.md)**
+  - 5-minute setup and first extraction
+  - Programmatic usage with Python
+  - Streamlit UI walkthrough
+  - Provider comparison (Ollama vs Cerebras vs OpenAI)
+  - Troubleshooting guide
+
+- **[Configuration](user-guide/configuration.md)**
+  - 4-tier configuration system
+  - LLM provider setup (Ollama, Cerebras, OpenAI, Anthropic)
+  - Environment variables and YAML config
+  - Image storage (local/MinIO)
+  - Debug and optimization settings
+
+- **[Testing](user-guide/testing.md)**
+  - Running the test suite (233 tests, 87.2% pass rate)
+  - Test types: unit, integration, smoke, E2E
+  - Manual test scripts
+  - Performance benchmarks
+  - CI/CD integration
+
+---
+
+## 👨‍💻 For Developers
+
+**Contributing to the codebase? Read these:**
+
+### Developer Guides (`developer-guide/`)
+
+- **[Architecture](developer-guide/architecture.md)**
+  - 7-layer system architecture
+  - Component details (Agent, Parser, LLM, Skill, Memory, Retrieval)
+  - Data flow examples and diagrams
+  - Design patterns (Factory, Strategy, Pipeline, Observer, Singleton)
+  - Quality attributes
+  - Extension guide
+
+- **[Development Setup](developer-guide/development-setup.md)**
+  - Prerequisites and environment setup
+  - Python environments (venv, conda, pyenv)
+  - Dependency management
+  - LLM provider configuration
+  - IDE setup (VS Code, PyCharm)
+  - Pre-commit hooks
+  - Code quality tools
+  - Branch strategy (dev/<alias>/<feature>)
+  - Git2Git workflow
+
+- **[API Reference](developer-guide/api-reference.md)**
+  - DocumentAgent API (5 methods with examples)
+  - DocumentParser API (3 methods)
+  - LLMRouter API (2 methods)
+  - Configuration utilities
+  - Quality metrics structures
+  - Type hints and error handling
+
+---
+
+## ⚡ Feature Documentation
+
+**Deep dive into specific features:**
+
+### Features (`features/`)
+
+- **[Requirements Extraction](features/requirements-extraction.md)**
+  - AI-powered extraction from documents
+  - Multi-format support (PDF, DOCX, PPTX, HTML, images)
+  - Quality enhancement mode (99-100% accuracy)
+  - Batch processing
+  - Provider optimization
+
+- **[Document Tagging](features/document-tagging.md)**
+  - Automatic categorization
+  - Tag types: requirement types, domains, priorities, sections
+  - Tag-based filtering and analytics
+  - Multi-label classification
+  - Custom taxonomies
+
+- **[Quality Enhancements](features/quality-enhancements.md)**
+  - Confidence scoring (0.0-1.0)
+  - 5 confidence levels (very high to very low)
+  - Quality flags (7 types: vague, missing ID, duplicate, incomplete, etc.)
+  - Auto-approval workflow
+  - Quality metrics and reporting
+
+- **[LLM Integration](features/llm-integration.md)**
+  - Multi-provider support (4 providers)
+  - Provider details: Ollama, Cerebras, OpenAI, Anthropic
+  - Performance comparison
+  - Cost optimization strategies
+  - Advanced topics: custom providers, caching, token tracking
+
+---
+
+## 📚 Additional Resources
+
+### Architecture (`architecture/`)
+
+System architecture documentation and templates:
+- [System Overview](architecture/system_overview.md) - High-level system view
+- [Component](architecture/component.md) - Component documentation template
+- [Interface](architecture/interface.md) - Interface specifications
+- [Pattern](architecture/pattern.md) - Design patterns used
+- [Quality](architecture/quality.md) - Quality attributes
+- And 9 more architectural templates
+
+### Business Documentation (`business/`)
+
+- [Stakeholder Analysis](business/stakeholder_analysis.md)
+- [Stakeholder Goals](business/stakeholder_goals.md)
+- [Differentiation Strategy](business/differentiation_strategy.md)
+- [Risks and Challenges](business/risks_and_challenges.md)
+
+### Specifications (`specs/`)
+
+- [Spec Template](specs/spec-template.md) - Feature specification template
+- [Settings Spec Template](specs/settings-spec-template.md)
+- [Example Spec](specs/#000%20-%20Example/) - Example specification
+
+### Process Documentation
+
+- [Building](building.md) - Build instructions
+- [Submitting Code](submitting_code.md) - Branch strategy and Git2Git workflow
+- [Code Style](STYLE.md) - Code style guidelines
+- [Code Organization](ORGANIZATION.md) - Repository structure rules
+- [Bot Usage](bot.md) - Working with AI coding assistants
+
+### Development Guides
+
+- [DeepAgent](deepagent.md) - DeepAgent implementation guide
+- [Debugging](Debugging.md) - Debugging techniques
+- [Feature Flags](feature_flags.md) - Feature flag system
+- [Fuzzing](fuzzing.md) - Fuzzing and testing strategies
+- [Exceptions](EXCEPTIONS.md) - Exception handling
+- [Cooked Read Data](COOKED_READ_DATA.md) - Data processing
+
+### Historical Documentation (`.archive/`)
+
+Implementation reports and phase summaries preserved for reference:
+
+- **phase1/** - Phase 1 implementation (3 docs)
+- **phase2/** - Phase 2 implementation (10 docs)
+- **implementation-reports/** - Task completion reports (5 docs)
+- **working-docs/** - Working documents and summaries (6 docs)
+
+---
+
+## 🔍 Quick Navigation
+
+### By Role
+
+**I'm a user wanting to extract requirements:**
+1. Start: [Quick Start Guide](user-guide/quick-start.md)
+2. Configure: [Configuration Guide](user-guide/configuration.md)
+3. Deep dive: [Requirements Extraction](features/requirements-extraction.md)
+
+**I'm a developer wanting to contribute:**
+1. Setup: [Development Setup](developer-guide/development-setup.md)
+2. Understand: [Architecture](developer-guide/architecture.md)
+3. Reference: [API Reference](developer-guide/api-reference.md)
+
+**I'm evaluating the project:**
+1. Overview: [README](../README.md)
+2. Architecture: [System Overview](architecture/system_overview.md)
+3. Business: [Stakeholder Analysis](business/stakeholder_analysis.md)
+
+### By Task
+
+**Setting up LLM providers:**
+- [Configuration Guide](user-guide/configuration.md) → LLM Provider Setup
+- [LLM Integration](features/llm-integration.md) → Provider Details
+
+**Running tests:**
+- [Testing Guide](user-guide/testing.md) → Running Tests
+- [Development Setup](developer-guide/development-setup.md) → Testing Setup
+
+**Understanding the architecture:**
+- [Architecture Guide](developer-guide/architecture.md) → System Architecture
+- [Source README](../src/README.md) → Code Structure
+
+**Improving extraction quality:**
+- [Quality Enhancements](features/quality-enhancements.md) → Confidence Scoring
+- [Requirements Extraction](features/requirements-extraction.md) → Quality Mode
+
+---
+
+## 📝 Documentation Standards
+
+### Writing New Documentation
+
+1. **Use templates** from `specs/` for consistency
+2. **Include code examples** that are tested and work
+3. **Add cross-references** to related documentation
+4. **Keep it updated** when making code changes
+5. **Follow markdown linting** rules (see `.markdownlint.json`)
+
+### Documentation Structure
+
+Each major doc should include:
+- **Overview** - What is this about?
+- **Quick Start** - How do I use it?
+- **Configuration** - How do I configure it?
+- **Examples** - Show me working code
+- **Troubleshooting** - What if it doesn't work?
+- **Related Docs** - Where can I learn more?
+
+### Maintenance
+
+- User guides: Update when user-facing features change
+- Developer guides: Update when APIs or architecture change
+- Feature docs: Update when feature capabilities change
+- Architecture docs: Update when design decisions change
+
+---
+
+## 🤝 Contributing
+
+See [CONTRIBUTING.md](../CONTRIBUTING.md) for:
+- How to submit documentation changes
+- Pull request template
+- Review process
+- Style guidelines
+
+---
+
+**Need help?** See [SUPPORT.md](../SUPPORT.md) for support channels.
+
+**Have a question?** Check the [FAQ section](../README.md#faq) in the main README.
+
+---
+
+*Last Updated: 2025-10-07*
+*Total Documentation Files: 60+*
+*Documentation Coverage: Complete*
```
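The new index depends on relative links for its cross-references. The documentation standards also ask contributors to keep docs updated as the code changes. A broken-link check can be scripted. The sketch below is not part of this commit, and `check_links` is a hypothetical helper name. It assumes inline `[text](target)` links only. It skips reference-style links, external URLs, `mailto:` links, and same-page `#anchor` links. It also drops any `#section` fragment before testing whether the target path exists.

```bash
# Hypothetical helper, not part of this commit: print each relative Markdown
# link in a file whose target does not exist on disk.
# Assumptions: only inline [text](target) links are checked; external URLs,
# mailto: links, and same-page "#anchor" links are skipped.
set -eu

check_links() {
  local file dir targets target path missing
  file=$1
  dir=$(dirname "$file")   # links resolve relative to the file's directory
  missing=0

  # grep -o prints every "](target)" match on its own line;
  # sed strips the leading "](" and the trailing ")".
  targets=$(grep -o ']([^)]*)' "$file" | sed 's/^](//; s/)$//')

  while IFS= read -r target; do
    case $target in
      '' | http://* | https://* | mailto:* | '#'*) continue ;;
    esac
    path=${target%%#*}      # drop a "#section" fragment, keep the path
    if [ ! -e "$dir/$path" ]; then
      echo "MISSING: $file -> $target"
      missing=1
    fi
  done <<EOF
$targets
EOF

  return "$missing"
}
```

Run it from the repository root as `check_links doc/README.md`. A nonzero return status means at least one `MISSING:` line was printed. The sketch does not decode URL-encoded names. A link such as `specs/#000%20-%20Example/` is cut at the `#`, so only `specs/` is checked.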
