feat:Dark Web Extraction feature #378

vaishcodescape · 2025-10-13T06:55:09Z

Issue #377 - Deep Web Content Extraction Module

Changes Proposed

Added comprehensive deep web content extraction framework with modular architecture
Implemented 10 specialized extractors for different types of dark web intelligence gathering
Integrated deep extraction functionality into main TorBot CLI
Updated project dependencies to support advanced NLP and pattern matching capabilities
Enhanced OSINT capabilities for Tor hidden services analysis

Explanation of Changes

This PR introduces a complete Deep Web Content Extraction Module that significantly enhances TorBot's OSINT capabilities for analyzing dark web content. The implementation consists of:

Core Architecture

base.py: Abstract base class (BaseExtractor) defining the extraction interface and ExtractionResult data structure
orchestrator.py: Main DeepExtractor class that coordinates all specialized extractors and manages the extraction pipeline
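A minimal sketch of how these pieces might fit together. The class names BaseExtractor, ExtractionResult, and DeepExtractor come from the description above; every field, method name, and the toy KeywordExtractor are assumptions made for illustration and may not match the actual code in this PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ExtractionResult:
    # Field names are assumed for illustration, not taken from the PR.
    extractor: str
    items: list = field(default_factory=list)


class BaseExtractor(ABC):
    """Common interface every specialized extractor would implement."""

    name = "base"

    @abstractmethod
    def extract(self, text: str) -> ExtractionResult:
        """Return whatever this extractor finds in the page text."""


class KeywordExtractor(BaseExtractor):
    """Hypothetical toy extractor, used only to exercise the interface."""

    name = "keyword"

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]

    def extract(self, text: str) -> ExtractionResult:
        lowered = text.lower()
        hits = [{"keyword": k} for k in self.keywords if k in lowered]
        return ExtractionResult(extractor=self.name, items=hits)


class DeepExtractor:
    """Runs each registered extractor independently over the same text."""

    def __init__(self, extractors):
        self.extractors = list(extractors)

    def run(self, text: str) -> dict:
        # One result per extractor, keyed by the extractor's name.
        return {e.name: e.extract(text) for e in self.extractors}
```

Under this assumed design, adding an extractor means subclassing BaseExtractor and passing an instance to DeepExtractor; the orchestrator itself does not change.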

Specialized Extractors (10 modules)

breach_detector.py: Detects and analyzes data breach indicators and leaked-credential patterns
communication_extractor.py: Extracts communication channels (email, Jabber, IRC, etc.)
credentials_extractor.py: Identifies credentials, API keys, tokens, and authentication data
crypto_extractor.py: Extracts cryptocurrency addresses and payment information
hidden_services_extractor.py: Discovers and catalogs .onion links and hidden services
linguistic_analyzer.py: Performs NLP-based content analysis and language pattern detection
marketplace_extractor.py: Identifies marketplace-specific data (products, vendors, pricing)
pii_extractor.py: Extracts Personally Identifiable Information
threat_indicators_extractor.py: Detects IOCs (Indicators of Compromise) and threat intelligence
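As an illustration of the pattern-matching approach these extractors rely on, here is a rough sketch of .onion discovery. It assumes only v3 addresses (56 base32 characters followed by .onion) are of interest; the function name find_onion_hosts is hypothetical and is not taken from hidden_services_extractor.py.

```python
import re

# v3 onion hostnames: 56 characters from the base32 alphabet (a-z, 2-7).
ONION_V3 = re.compile(r"\b[a-z2-7]{56}\.onion\b")


def find_onion_hosts(text):
    """Return unique v3 .onion hostnames in order of first appearance."""
    seen = []
    # Hostnames are case-insensitive, so normalise before matching.
    for match in ONION_V3.finditer(text.lower()):
        host = match.group(0)
        if host not in seen:
            seen.append(host)
    return seen
```

A real extractor would likely also validate the address checksum and record where each link was found; this sketch only shows the matching step.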

Integration

Modified main.py to add deep extraction CLI commands
Updated requirements.txt with necessary dependencies for NLP, pattern matching, and data extraction
Updated pyproject.toml to reflect new module dependencies
Cleaned up deprecated code in info.py
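For reviewers, a sketch of what the CLI wiring could look like. It assumes an argparse-based entry point; the option names -u/--url and --deep-extract are hypothetical placeholders and are not necessarily the flags this PR adds to main.py.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical switch for deep extraction."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("-u", "--url", required=True, help="page to crawl")
    # Hypothetical flag: opt in to running the deep extraction pipeline.
    parser.add_argument(
        "--deep-extract",
        action="store_true",
        help="run the deep content extractors on each fetched page",
    )
    return parser
```

Keeping the feature behind an opt-in switch like this would leave existing invocations unchanged.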

Statistics

3,113 insertions across 16 files
Fully modular and extensible architecture
Each extractor operates independently with consistent interface

Screenshots of new feature/change


vaishcodescape · 2025-10-13T07:03:58Z

New feature added successfully. @KingAkeem, can you review and merge it?
Thanks :)

KingAkeem · 2025-10-14T22:29:08Z

@vaishcodescape This merge request is too large. You need to break it down into smaller chunks that separate the areas of concern (e.g. dependency updates vs. functional updates). Also, provide ways to test each merge request. Thank you.

vaishcodescape added 6 commits October 9, 2025 22:24

feature added which extracts dark web links

2902e54

DarkWeb Extractor feature added

93ef0e5

Merge branch 'dev' into aditya/features

754ecf7

bugs ficed

4ed57c1

bugs fixed

dc8b1ee

Merge branch 'aditya/features' of https://github.com/vaishcodescape/TorBot into aditya/features

18f5844

vaishcodescape closed this Oct 15, 2025
