6 interconnected tools · 565 tests, zero network calls · 20,000+ leads generated · 3 CI operating systems · 8.7 / 10 avg project rating
A complete, production-grade B2B lead generation system in Python — six interconnected tools that together cover the full pipeline from discovery → enrichment → CRM-ready output. Any business directory on the internet. Any city. Any sector.
This toolkit has processed 20,000+ verified business leads across the UK property management sector.
DISCOVERY                      find companies from any source
---------------------------------------------------------------
+- Google Maps Scraper         Maps listings + email enrichment
+- LeadHunter Pro              4 search engines, HOT/WARM/COLD
+- Trustpilot Scraper          reputation-filtered businesses
+- JSON Directory Harvester    any JSON API, config-only
+- HTML Directory Scrapers     any HTML or WordPress directory

ENRICHMENT                     add verified contact details
---------------------------------------------------------------
+- Email & Phone Enricher      HTTP + Playwright, CF bypass

OUTPUT                         consistent schema across all 6 tools
---------------------------------------------------------------
+- 3-sheet Excel (Data / Flagged / Summary)
   deduplicated * validated * CRM-importable
A client who needs 10,000 verified business contacts — starting from zero — gets back a single deduplicated, formatted Excel file drawn from multiple sources and ready for CRM import.
Google Maps Scraper: Playwright-driven Maps scraper with concurrent email enrichment, Cloudflare XOR decode, and atomic checkpoint/resume. Runs survive interruption and pick up exactly where they left off.
Email & Phone Enricher: Two-pass crawler that tries fast HTTP first and falls back to Playwright for JS-rendered pages. E.164 phone normalisation. Works standalone or as the enrichment layer for any discovery tool.
LeadHunter Pro: 4-engine parallel search (Bing, DuckDuckGo, Mojeek, Yahoo) built on an abstract base class architecture. Configurable HOT/WARM/COLD/NOISE keyword scoring. Domain deduplication before enrichment.
Trustpilot Scraper: Selenium + Chrome scraper for Trustpilot listings: name, contact, website, rating, review count. Targets only established, active companies. Anti-scraping evasion built in.
JSON Directory Harvester: Config-only retargeting, with no code changes needed to point at a new JSON API directory. Two pagination modes, dot-path navigation, geographic bounding-box filter, two-pass deduplication. Best-documented codebase in the toolkit.
HTML Directory Scrapers: Dual-engine toolkit. Engine 1 handles any paginated HTML via CSS selectors; Engine 2 handles WordPress AJAX, with automatic nonce extraction, mid-run nonce refresh, and manual gzip/zlib decompression. Most technically complex codebase in the portfolio.
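
The HOT/WARM/COLD/NOISE keyword scoring described for LeadHunter Pro can be illustrated with a minimal sketch. The keyword weights, noise terms, and thresholds below are hypothetical placeholders, not the toolkit's actual configuration; the only point is how weighted keyword hits could map onto the four tiers.

```python
from typing import Mapping, Sequence

# Hypothetical weights and noise terms; a real run would load these from config.
EXAMPLE_WEIGHTS: Mapping[str, int] = {
    "property management": 3,
    "block management": 3,
    "lettings": 2,
    "estate agent": 2,
}
EXAMPLE_NOISE: Sequence[str] = ("jobs", "wikipedia", "recruitment")


def classify_lead(
    text: str,
    weights: Mapping[str, int] = EXAMPLE_WEIGHTS,
    noise: Sequence[str] = EXAMPLE_NOISE,
    hot_at: int = 5,
    warm_at: int = 2,
) -> str:
    """Return HOT, WARM, COLD, or NOISE for a search-result snippet."""
    lowered = text.lower()
    # Any noise term disqualifies the result before scoring.
    if any(term in lowered for term in noise):
        return "NOISE"
    # Sum the weight of every keyword that appears in the text.
    score = sum(w for keyword, w in weights.items() if keyword in lowered)
    if score >= hot_at:
        return "HOT"
    if score >= warm_at:
        return "WARM"
    return "COLD"
```

Because the weights and thresholds are plain arguments, retuning for another sector would only mean passing a different mapping.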
Specialist techniques: Cloudflare XOR email decoding · WordPress `admin-ajax.php` nonce lifecycle · manual gzip/zlib decompression · SMTP RCPT email verification · anti-bot fingerprint evasion
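
As an illustration of the Cloudflare XOR email decoding listed above, here is a minimal sketch. It assumes the widely observed obfuscation format, in which the protected address is a hex string whose first byte is the XOR key for every following byte. The function names are hypothetical, and `encode_cfemail` exists only to produce demonstration input.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-style obfuscated email hex string."""
    data = bytes.fromhex(encoded)
    key = data[0]  # first byte is the XOR key
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def encode_cfemail(email: str, key: int = 0x5A) -> str:
    """Inverse of decode_cfemail, used here only to build test input."""
    payload = bytes(b ^ key for b in email.encode("utf-8"))
    return bytes([key]).hex() + payload.hex()
```

In a scraper, the hex string would typically be read from the protected element's attribute and passed straight to `decode_cfemail`.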
Techniques: two-pass deduplication (ID-based + normalised name+postcode) · E.164 phone normalisation · geographic bounding-box filtering · HOT/WARM/COLD/NOISE keyword scoring · configurable record validation
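
The two-pass deduplication named above (ID-based, then normalised name + postcode) could be sketched as follows. The field names `id`, `name`, and `postcode` are assumptions for illustration; the toolkit's real schema may differ.

```python
import re
from typing import Any, Dict, Iterable, List


def _norm(value: Any) -> str:
    """Lowercase and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", str(value or "").lower())


def dedupe(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Pass 1: drop records whose source ID has already been seen.
    seen_ids = set()
    pass_one = []
    for rec in records:
        rec_id = rec.get("id")
        if rec_id is not None:
            if rec_id in seen_ids:
                continue
            seen_ids.add(rec_id)
        pass_one.append(rec)

    # Pass 2: drop records that share a normalised name + postcode.
    seen_keys = set()
    unique = []
    for rec in pass_one:
        key = (_norm(rec.get("name")), _norm(rec.get("postcode")))
        if all(key):  # only dedupe when both parts are present
            if key in seen_keys:
                continue
            seen_keys.add(key)
        unique.append(rec)
    return unique
```

The second pass catches the same business arriving from different sources under different IDs, which the first pass alone would miss.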
Patterns: exponential-backoff retry · circuit-breaker (N failures → auto-pause) · atomic file ops (`.tmp` → `os.replace()`) · crash-safe checkpoint/resume · cross-platform keyboard listener · `command.txt` headless controls · `tqdm`-safe log routing
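
The atomic `.tmp` → `os.replace()` pattern behind crash-safe checkpoint/resume can be sketched like this. The JSON checkpoint format and the function names are assumptions for illustration, not the toolkit's actual code.

```python
import json
import os
from typing import Any, Dict, Optional


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    """Write state to a temp file, then atomically swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())  # push bytes to disk before the swap
    # os.replace is atomic when both paths are on the same filesystem,
    # so readers see either the old checkpoint or the new one, never half.
    os.replace(tmp_path, path)


def load_checkpoint(path: str, default: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the saved state, or the default if no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {} if default is None else default
```

A run that is interrupted mid-write leaves only a stale `.tmp` file behind; the last complete checkpoint stays intact and the next run resumes from it.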
CI matrix: Ubuntu · Windows · macOS across Python 3.9 – 3.12
Each of these directly extends what the current toolkit can do — async throughput, database output, cloud-deployable APIs, visual dashboards.
B2B sales market, lead generation, and CRM. Property management, sales, lettings, and professional services directories.
Postcode validation, geographic filtering, and UK, US, and EU sector categorisation are built into the architecture — not retrofitted.
Open to remote freelance projects — scraping pipelines, lead generation, data extraction, automation.