SoftwareDevLabs
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 26 additions & 0 deletions
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 37 additions & 23 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 6 additions & 6 deletions
diff --git a/Makefile b/Makefile
Lines changed: 36 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 72 additions & 18 deletions
@@ -0,0 +1,26 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.4.2
+    hooks:
+      - id: ruff
+        args: ["--fix"]
+      - id: ruff-format
+
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.9.0
+    hooks:
+      - id: mypy
+        name: mypy (src)
+        args:
+          - "--ignore-missing-imports"
+          - "--exclude=src/llm/router.py"
+        files: ^src/.*\.py$
+
+  - repo: local
+    hooks:
+      - id: run-tests-with-coverage
+        name: Run pytest with coverage (pre-push)
+        entry: ./scripts/run-tests.sh --cov=src --cov-report=term-missing
+        language: system
+        pass_filenames: false
+        stages: [push]
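
Review note: the `files: ^src/.*\.py$` key on the mypy hook decides which staged paths pre-commit hands to mypy. The snippet below is a minimal sketch of that selection, assuming pre-commit applies the pattern with `re.search` semantics. The helper name `select_for_mypy` and the sample paths are hypothetical and exist only for illustration.

```python
import re

# Pattern copied from the `files:` key of the mypy hook above.
MYPY_FILES = re.compile(r"^src/.*\.py$")


def select_for_mypy(paths):
    """Return the paths that the `files` pattern would pass to the hook."""
    return [p for p in paths if MYPY_FILES.search(p)]


# Hypothetical staged files, for illustration only.
staged = [
    "src/llm/router.py",
    "src/utils/cache.py",
    "test/unit/test_cache.py",
    "src/README.md",
]
selected = select_for_mypy(staged)
print(selected)
```

In this sketch `src/llm/router.py` still matches the `files` pattern. The mypy documentation describes `--exclude` as affecting recursive file discovery only, so it may be worth verifying that the exclusion takes effect when pre-commit passes file names explicitly. The hook-level `exclude:` key in pre-commit is one possible alternative.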
@@ -1,22 +1,33 @@
 # Instructions for AI Agents
 
-This document provides instructions and guidelines for AI agents working with the SDLC_core repository.
+This document provides instructions and guidelines for AI agents working with the unstructuredDataHandler repository.
 
 ## Repository Overview
 
-SDLC_core is a Python-based Software Development Life Cycle core project that provides AI/ML capabilities for software development workflows. The repository contains modules for LLM clients, intelligent agents, memory management, prompt engineering, document retrieval, skill execution, and various utilities.
+unstructuredDataHandler is a Python-based Software Development Life Cycle core project that provides AI/ML capabilities for software development workflows. The repository contains modules for LLM clients, intelligent agents, memory management, prompt engineering, document retrieval, skill execution, and various utilities.
 
 - **Primary Language**: Python 3.10-3.12
 - **Secondary Languages**: TypeScript (for Azure pipelines), Shell scripts
 - **Project Type**: AI/ML library and tooling for SDLC workflows
 
 ## Environment Setup
 
-### 1. Install Dependencies
+### 1. Preferred: Isolated venv via script
 
-**IMPORTANT**: The project's dependencies are split into multiple files. For development and testing, you should install the dependencies from `requirements-dev.txt`.
+Use the reproducible test script. It creates `.venv_ci` and pins pytest for reliable runs.
 
 ```bash
+./scripts/run-tests.sh
+```
+
+### 2. Alternative: Local dev venv
+
+Create and activate your own virtual environment, then install dev dependencies.
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -U pip
 pip install -r requirements-dev.txt
 ```
 
@@ -38,16 +49,18 @@ PYTHONPATH=. python -m pytest
 
 ### Testing
 
-The test infrastructure is set up. Use the following commands to run tests:
+Preferred (isolated venv):
 
 ```bash
-# Run all tests
-PYTHONPATH=. python -m pytest test/ -v
+./scripts/run-tests.sh                      # Full test run
+./scripts/run-tests.sh test/unit -k deepagent  # Narrow selection
+```
 
-# Run tests with coverage
-PYTHONPATH=. python -m pytest test/ --cov=src/ --cov-report=xml
+Alternative (local venv):
 
-# Run specific test suites
+```bash
+PYTHONPATH=. python -m pytest test/ -v
+PYTHONPATH=. python -m pytest test/ --cov=src/ --cov-report=xml
 PYTHONPATH=. python -m pytest test/unit/ -v
 PYTHONPATH=. python -m pytest test/integration/ -v
 PYTHONPATH=. python -m pytest test/e2e/ -v
@@ -83,26 +96,27 @@ The core logic is in the `src/` directory, which is organized into the following
 - `src/utils/`: Logging, caching, rate limiting, tokens
 
 Other important directories:
+
 - `config/`: YAML configurations for models, prompts, logging
 - `data/`: Prompts, embeddings, dynamic content
 - `examples/`: Minimal scripts demonstrating key features
 - `test/`: Unit, integration, smoke, and e2e tests
 
 ## Key Development Rules
 
-### ALWAYS:
+### ALWAYS
 
-1.  **Install dependencies** before making changes.
-2.  **Set the `PYTHONPATH`** for all commands.
-3.  **Run tests** (`PYTHONPATH=. python -m pytest test/ -v`) to validate the current state before making changes.
-4.  **Configure the agent** by editing `config/model_config.yaml` before running it.
-5.  **Ensure new Python modules** have proper `__init__.py` files.
-6.  **Follow the branch naming convention**: `dev/<alias>/<feature>`.
-7.  **Fill out the PR template** when submitting a pull request. The template is located at `.github/PULL_REQUEST_TEMPLATE.md`.
+1. **Install dependencies** before making changes.
+2. **Set the `PYTHONPATH`** for all commands.
+3. **Run tests** (`PYTHONPATH=. python -m pytest test/ -v`) to validate the current state before making changes.
+4. **Configure the agent** by editing `config/model_config.yaml` before running it.
+5. **Ensure new Python modules** have proper `__init__.py` files.
+6. **Follow the branch naming convention**: `dev/<alias>/<feature>`.
+7. **Fill out the PR template** when submitting a pull request. The template is located at `.github/PULL_REQUEST_TEMPLATE.md`.
 
-### NEVER:
+### NEVER
 
--   Run tests without setting `PYTHONPATH`.
--   Assume `requirements.txt` contains dependencies.
--   Create modules named "router" (conflicts with existing router.py files).
--   Modify Azure pipeline scripts (`build/azure-pipelines/`) without TypeScript knowledge.
+- Run tests without setting `PYTHONPATH`.
+- Assume `requirements.txt` contains dependencies.
+- Create modules named "router" (conflicts with existing router.py files).
+- Modify Azure pipeline scripts (`build/azure-pipelines/`) without TypeScript knowledge.
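
Review note: the branch naming rule `dev/<alias>/<feature>` can be checked mechanically. The following is a minimal sketch, not part of this change. The allowed character set for `<alias>` and `<feature>` is an assumption, since AGENTS.md does not specify one, and `is_valid_branch` is a hypothetical helper name.

```python
import re

# Assumed shape: dev/<alias>/<feature>, where both segments are non-empty and
# made of letters, digits, dots, underscores, or hyphens (illustrative choice).
BRANCH_RE = re.compile(r"dev/[A-Za-z0-9._-]+/[A-Za-z0-9._-]+")


def is_valid_branch(name: str) -> bool:
    """Return True when `name` follows dev/<alias>/<feature>."""
    return BRANCH_RE.fullmatch(name) is not None


print(is_valid_branch("dev/alice/coverage-targets"))
print(is_valid_branch("feature/coverage-targets"))
```

A check like this could run in a local hook or a CI step, provided the team agrees on the character set first.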
@@ -63,14 +63,14 @@ If you don't have any additional info/context to add but would like to indicate
 
 If you're able & willing to help fix issues and/or implement features, we'd love your contribution!
 
-The list of ["good first issue"](https://github.com/SoftwareDevLabs/SDLC_core/issues?q=is%3Aopen+is%3Aissue+label%3A%22Help+Wanted%22++label%3A%22good+first+issue%22+)s is another set of issues that might be easier for first-time contributors. Once you're feeling more comfortable in the codebase, feel free to just use the ["Help Wanted"](https://github.com/SoftwareDevLabs/SDLC_core/issues?q=is%3Aopen+is%3Aissue+label%3A%22Help+Wanted%22+) label, or just find any issue you're interested in and hop in!
+The list of ["good first issue"](https://github.com/SoftwareDevLabs/unstructuredDataHandler/issues?q=is%3Aopen+is%3Aissue+label%3A%22Help+Wanted%22++label%3A%22good+first+issue%22+)s is another set of issues that might be easier for first-time contributors. Once you're feeling more comfortable in the codebase, feel free to just use the ["Help Wanted"](https://github.com/SoftwareDevLabs/unstructuredDataHandler/issues?q=is%3Aopen+is%3Aissue+label%3A%22Help+Wanted%22+) label, or just find any issue you're interested in and hop in!
 
 Generally, we categorize issues in the following way:
-* ["Bugs"](https://github.com/SoftwareDevLabs/SDLC_core/issues?q=is%3Aopen+is%3Aissue+label%3A%22Issue-Bug%22+) are parts of the SDLC_core that are not quite working the right way. There's code to already support some scenario, but it's not quite working right. Fixing these is generally a matter of debugging the broken functionality and fixing the wrong code.
-* ["Tasks"](https://github.com/SoftwareDevLabs/SDLC_core/issues?q=is%3Aopen+is%3Aissue+label%3A%22Issue-Task%22+) are usually new pieces of functionality that aren't yet implemented for the SDLC_core. These are usually smaller features, which we believe
+* ["Bugs"](https://github.com/SoftwareDevLabs/unstructuredDataHandler/issues?q=is%3Aopen+is%3Aissue+label%3A%22Issue-Bug%22+) are parts of the unstructuredDataHandler that are not quite working the right way. There's code to already support some scenario, but it's not quite working right. Fixing these is generally a matter of debugging the broken functionality and fixing the wrong code.
+* ["Tasks"](https://github.com/SoftwareDevLabs/unstructuredDataHandler/issues?q=is%3Aopen+is%3Aissue+label%3A%22Issue-Task%22+) are usually new pieces of functionality that aren't yet implemented for the unstructuredDataHandler. These are usually smaller features, which we believe
   - could be a single, atomic PR
   - Don't require much design consideration, or we've already written the spec for the larger feature they belong to.
-* ["Features"](https://github.com/SoftwareDevLabs/SDLC_core/issues?q=is%3Aopen+is%3Aissue+label%3A%22Issue-Feature%22+) are larger pieces of new functionality. These are usually things we believe would require larger discussion of how they should be implemented, or they'll require some complicated new settings. They might just be features that are composed of many individual tasks. Often times, with features, we like to have a spec written before development work is started, to make sure we're all on the same page (see below).
+* ["Features"](https://github.com/SoftwareDevLabs/unstructuredDataHandler/issues?q=is%3Aopen+is%3Aissue+label%3A%22Issue-Feature%22+) are larger pieces of new functionality. These are usually things we believe would require larger discussion of how they should be implemented, or they'll require some complicated new settings. They might just be features that are composed of many individual tasks. Often times, with features, we like to have a spec written before development work is started, to make sure we're all on the same page (see below).
 
 Bugs and tasks are obviously the easiest to get started with, but don't feel afraid of features either! We've had some community members contribute some amazing "feature"-level work to our repos (albeit, with lots of discussion 😄).
 
@@ -101,7 +101,7 @@ Team members will be happy to help review specs and guide them to completion.
 
 ### Help Wanted
 
-Once the team has approved an issue/spec, development can proceed. If no developers are immediately available, the spec can be parked ready for a developer to get started. Parked specs' issues will be labeled "Help Wanted". To find a list of development opportunities waiting for developer involvement, visit the Issues and filter on [the Help-Wanted label](https://github.com/SoftwareDevLabs/SDLC_core/labels/Help%20Wanted).
+Once the team has approved an issue/spec, development can proceed. If no developers are immediately available, the spec can be parked ready for a developer to get started. Parked specs' issues will be labeled "Help Wanted". To find a list of development opportunities waiting for developer involvement, visit the Issues and filter on [the Help-Wanted label](https://github.com/SoftwareDevLabs/unstructuredDataHandler/labels/Help%20Wanted).
 
 ---
 
@@ -130,7 +130,7 @@ Here are a few things you can do that will increase the likelihood of your pull
 
 ### Testing
 
-Testing is a key component in the development workflow. This SDLC_core should use well defined testing methodology to ensure that SDLC_core and its key components are tested.
+Testing is a key component of the development workflow. unstructuredDataHandler should use a well-defined testing methodology to ensure that the project and its key components are tested.
 
 <!---TAEF (the Test Authoring and Execution Framework) as the main framework for testing.
 
 
@@ -1,9 +1,26 @@
-.PHONY: test lint clean
+.PHONY: test lint lint-fix typecheck format ci clean coverage coverage-html
 
 # Run tests using the reproducible script (creates .venv_ci)
 test:
 	./scripts/run-tests.sh
 
+# Coverage reports (terminal summary)
+coverage:
+	./scripts/run-tests.sh --cov=src --cov-report=term-missing
+
+# Coverage reports (HTML)
+coverage-html:
+	./scripts/run-tests.sh --cov=src --cov-report=html
+
+# CI bundle: tests with coverage + ruff + mypy + pylint
+ci:
+	./scripts/run-tests.sh --cov=src --cov-report=term-missing
+	python3 -m pip install --upgrade pip
+	pip install ruff mypy pylint
+	python3 -m ruff check src/
+	python3 -m mypy src/ --ignore-missing-imports --exclude="src/llm/router.py"
+	python3 -m pylint src/ --exit-zero
+
 # Run lint and static checks
 lint:
 	# Fast lint with ruff and type check with mypy
@@ -12,5 +29,23 @@ lint:
 	python3 -m ruff check src/
 	python3 -m mypy src/ --ignore-missing-imports
 
+# Auto-fix lint issues with ruff
+lint-fix:
+	python3 -m pip install --upgrade pip
+	pip install ruff
+	ruff check src/ --fix
+
+# Type checking with mypy (with router exclusion)
+typecheck:
+	python3 -m pip install --upgrade pip
+	pip install mypy
+	python3 -m mypy src/ --ignore-missing-imports --exclude="src/llm/router.py"
+
+# Auto-format with ruff
+format:
+	python3 -m pip install --upgrade pip
+	pip install ruff
+	ruff format src/
+
 clean:
 	rm -rf .venv_ci
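
Review note: the `ci` target relies on make's default behavior of stopping a recipe at the first command that exits non-zero, while `pylint --exit-zero` reports findings without failing the target. The sketch below mirrors that control flow in Python. It uses stand-in commands so it runs anywhere, and `run_steps` is a hypothetical helper rather than anything in the repository.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order; stop at the first non-zero exit code.

    Returns the exit code of the failing step, or 0 if all steps pass.
    """
    for cmd in steps:
        print("+", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0


# Stand-in commands so the sketch runs anywhere; a real list would hold the
# `ci` recipe lines (run-tests.sh, ruff, mypy, pylint).
demo_steps = [
    [sys.executable, "-c", "print('tests ok')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "print('never reached')"],
]
exit_code = run_steps(demo_steps)
print("exit code:", exit_code)
```

The third stand-in command never runs because the second one fails, which is the same ordering guarantee the `ci` recipe gets from make.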
@@ -1,13 +1,13 @@
 
 
-# Welcome to the SDLC_core Repo
+# Welcome to the unstructuredDataHandler Repo
 
 <details>
   <summary><strong>Table of Contents</strong></summary>
 
 - [Installing and running Windows Terminal](#installing-and-running-windows-terminal)
-- [Module Roadmap](#SDLC_core-roadmap)
-- [SDLC_core Overview](#sdlc_core-overview)
+- [Module Roadmap](#unstructureddatahandler-roadmap)
+- [unstructuredDataHandler Overview](#unstructureddatahandler-overview)
 - [Resources](#resources)
 - [FAQ](#faq)
 - [Documentation](#documentation)
@@ -20,25 +20,25 @@
 
 <br />
 
-This repository contains the source code for the SDLC_core project, a Python-based framework for building AI-powered software development life cycle tools.
+This repository contains the source code for the unstructuredDataHandler project, a Python-based framework for building AI-powered software development life cycle tools.
 
 Related repositories include:
 
-* [SDLC_core Documentation](https://github.com/SoftwareDevLabs) (Placeholder)
+* [unstructuredDataHandler Documentation][docs-repo] (Placeholder)
 
-## SDLC_core Roadmap
+## unstructuredDataHandler Roadmap
 
-The plan for the SDLC_core [is described here](./doc/roadmap-20xx.md) and
+The plan for the unstructuredDataHandler [is described here](./doc/roadmap-20xx.md) and
 will be updated as the project proceeds.
 
 ## Installing and running Windows Terminal
 
 > [!NOTE]
 > This section is a placeholder and may not be relevant to this project.
 
-## SDLC_core Overview
+## unstructuredDataHandler Overview
 
-SDLC_core is a Python-based Software Development Life Cycle core project that provides AI/ML capabilities for software development workflows. The repository contains modules for LLM clients, intelligent agents, memory management, prompt engineering, document retrieval, skill execution, and various utilities. It combines a Python core with TypeScript for Azure DevOps pipeline configurations.
+unstructuredDataHandler is a Python-based Software Development Life Cycle core project that provides AI/ML capabilities for software development workflows. The repository contains modules for LLM clients, intelligent agents, memory management, prompt engineering, document retrieval, skill execution, and various utilities. It combines a Python core with TypeScript for Azure DevOps pipeline configurations.
 
 ## Resources
 
@@ -59,8 +59,8 @@ SDLC_core is a Python-based Software Development Life Cycle core project that pr
 ## Documentation
 
 All project documentation is located at [softwaremodule-docs](./doc/). If you would like
-to contribute to the documentation, please submit a pull request on the [SDLC_core
-Documentation](https://github.com/SoftwareDevLabs) repository.
+to contribute to the documentation, please submit a pull request on the [unstructuredDataHandler
+Documentation][docs-repo] repository.
 
 ---
 
@@ -82,7 +82,7 @@ Documentation](https://github.com/SoftwareDevLabs) repository.
 
 ### Agents
 
-The `agents` module provides the core components for creating AI agents. It includes a flexible `SDLCFlexibleAgent` that can be configured to use different LLM providers (like OpenAI, Gemini, and Ollama) and a set of tools. The module is designed to be extensible, allowing for the creation of custom agents with specialized skills. Key components include a planner and an executor (currently placeholders for future development) and a `MockAgent` for testing and CI.
+The `agents` module provides the core components for creating AI agents. It includes a flexible `FlexibleAgent` (formerly `SDLCFlexibleAgent`) that can be configured to use different LLM providers (like OpenAI, Gemini, and Ollama) and a set of tools. The module is designed to be extensible, allowing for the creation of custom agents with specialized skills. Key components include a planner and an executor (currently placeholders for future development) and a `MockAgent` for testing and CI.
 
 ### Parsers
 
@@ -159,8 +159,48 @@ You can also use the Makefile targets:
 ```bash
 make test
 make lint
+make typecheck
+make format
+make coverage       # terminal coverage summary
+make coverage-html  # generate HTML report in ./htmlcov/
+make ci             # tests with coverage + ruff + mypy + pylint
 ```
 
+### How to test locally (two options)
+
+- Preferred (isolated venv): Use `./scripts/run-tests.sh`. It creates `.venv_ci`, pins pytest, and runs with `PYTHONPATH` set correctly.
+- Alternative (your own venv):
+  1. `python3 -m venv .venv`
+  2. `source .venv/bin/activate`
+  3. `pip install -U pip`
+  4. `pip install -r requirements-dev.txt`
+  5. `PYTHONPATH=. python -m pytest test/ -v`
+
+### Optional: run with coverage
+
+- Isolated venv script (add flags after the script path):
+  - `./scripts/run-tests.sh --cov=src --cov-report=term-missing`
+- Local venv (after installing `requirements-dev.txt`):
+  - `PYTHONPATH=. python -m pytest test/ --cov=src --cov-report=term-missing`
+
+Makefile shortcuts:
+
+- `make coverage` — terminal summary
+- `make coverage-html` — generates an HTML report in `./htmlcov/`
+
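
Review note: the local-venv commands above all share the same shape, namely `PYTHONPATH=.` plus `python -m pytest` with optional coverage flags. A minimal sketch of building that invocation programmatically follows. `build_pytest_invocation` is a hypothetical helper, and the coverage flags assume the `pytest-cov` plugin is installed.

```python
import os
import sys


def build_pytest_invocation(target="test/", coverage=False, env=None):
    """Return (argv, env) for running pytest from the repository root."""
    argv = [sys.executable, "-m", "pytest", target]
    if coverage:
        # Flags copied from the coverage commands above (assumes pytest-cov).
        argv += ["--cov=src", "--cov-report=term-missing"]
    else:
        argv.append("-v")
    # Start from the given environment (or the current one) and set PYTHONPATH.
    run_env = dict(os.environ if env is None else env)
    run_env["PYTHONPATH"] = "."
    return argv, run_env


argv, run_env = build_pytest_invocation(coverage=True, env={})
print(argv[1:], run_env)
```

Passing the result to `subprocess.run(argv, env=run_env, check=True)` from the repository root would execute the run. The sketch stops short of that so it has no side effects.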
+### Quick lint and type checks
+
+- Makefile shortcuts:
+  - `make lint`
+  - `make lint-fix` (ruff check with autofix)
+  - `make typecheck` (mypy with router exclusion)
+  - `make format` (ruff formatter)
+- Manual (useful in CI or local venv):
+  - `python -m pylint src/ --exit-zero`
+  - `python -m mypy src/ --ignore-missing-imports --exclude="src/llm/router.py"`
+
+Note: The mypy exclusion for `src/llm/router.py` avoids a duplicate module conflict with `src/fallback/router.py` during type analysis.
+
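
Review note: the duplicate-module conflict described in the note comes from two files sharing the name `router.py`. The sketch below is a rough heuristic for spotting such collisions before they reach mypy. It only compares file names and does not reproduce mypy's module resolution. `find_duplicate_module_names` is a hypothetical helper, and the temporary tree merely mirrors the two paths named in the note.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_module_names(root):
    """Map each module file name to its paths when it occurs more than once."""
    by_name = defaultdict(list)
    for path in sorted(Path(root).rglob("*.py")):
        if path.name != "__init__.py":
            by_name[path.name].append(path.relative_to(root).as_posix())
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


# Build a throwaway tree that mirrors the two router files named in the note.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("src/llm/router.py", "src/fallback/router.py", "src/utils/cache.py"):
        file_path = Path(tmp) / rel
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text("")
    duplicates = find_duplicate_module_names(tmp)

print(duplicates)
```

Run against the real `src/` directory, a check like this would list every name collision, which may help decide whether further exclusions or renames are needed.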
 ---
 
 ## Contributing
@@ -170,6 +210,20 @@ We are excited to work with the community to build and enhance this project.
 ***BEFORE you start work on a feature/fix***, please read & follow our [Contributor's Guide](./CONTRIBUTING.md) to
 help avoid any wasted or duplicate effort.
 
+### Developer setup: pre-commit hooks (optional but recommended)
+
+To keep code quality consistent, we provide pre-commit hooks for ruff (lint and format) and mypy, plus a pre-push hook that runs the tests with coverage.
+
+1. Install dev deps (once): `pip install -r requirements-dev.txt`
+2. Install hooks (once): `pre-commit install --install-hooks`
+3. Optional: enable pre-push test runner: `pre-commit install --hook-type pre-push`
+
+Hooks configured in `.pre-commit-config.yaml`:
+
+- ruff (with autofix) and ruff-format
+- mypy with the router exclusion
+- pre-push: `./scripts/run-tests.sh --cov=src --cov-report=term-missing`
+
 ## Communicating with the Team
 
 The easiest way to communicate with the team is via GitHub issues.
@@ -189,18 +243,18 @@ Please review these brief docs below about our coding practices.
 This is a work in progress as we learn what we'll need to provide people in
 order to be effective contributors to our project.
 
- - [Coding Style](./doc/STYLE.md)
- - [Code Organization](./doc/ORGANIZATION.md)
- - [Exceptions in our legacy codebase](./doc/EXCEPTIONS.md)
+- [Coding Style](./doc/STYLE.md)
+- [Code Organization](./doc/ORGANIZATION.md)
+- [Exceptions in our legacy codebase](./doc/EXCEPTIONS.md)
 
 ---
 
 ## Code of Conduct
 
 This project has adopted the [Code of Conduct][conduct-code]. For more information see the [Code of Conduct][conduct-code] or contact [info@softwaredevlabs.com][conduct-email] with any additional questions or comments.
 
-[conduct-code](./CODE_OF_CONDUCT.md)
-
-[conduct-email]: mailto:info@softwaredevlabs.com
 [conduct-code]: ./CODE_OF_CONDUCT.md
 [conduct-email]: mailto:info@softwaredevlabs.com
+[docs-repo]: https://github.com/SoftwareDevLabs
+<!-- TODO: update [docs-repo] once the dedicated docs repository is created,
+     e.g. https://github.com/SoftwareDevLabs/unstructuredDataHandler-docs -->