Afaq Javed

Python Automation · Web Scraping · B2B Lead Generation

6 interconnected tools
565 tests, zero network calls
20,000+ leads generated
3 CI operating systems
8.7 / 10 average project rating

What I build

A complete, production-grade B2B lead generation system in Python — six interconnected tools that together cover the full pipeline from discovery → enrichment → CRM-ready output. Any business directory on the internet. Any city. Any sector.

This toolkit has processed 20,000+ verified business leads across the UK property management sector.


The pipeline

  DISCOVERY    find companies from any source
  ---------------------------------------------------------------
  +-  Google Maps Scraper       Maps listings + email enrichment
  +-  LeadHunter Pro            4 search engines, HOT/WARM/COLD
  +-  Trustpilot Scraper        reputation-filtered businesses
  +-  JSON Directory Harvester  any JSON API, config-only
  +-  HTML Directory Scrapers   any HTML or WordPress directory

  ENRICHMENT   add verified contact details
  ---------------------------------------------------------------
  +-  Email & Phone Enricher    HTTP + Playwright, CF bypass

  OUTPUT       consistent schema across all 6 tools
  ---------------------------------------------------------------
  +-  3-sheet Excel  ( Data  /  Flagged  /  Summary )
      deduplicated  *  validated  *  CRM-importable

A client who needs 10,000 verified business contacts — starting from zero — gets back a single deduplicated, formatted Excel file drawn from multiple sources and ready for CRM import.
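As a rough illustration of the Data / Flagged / Summary split, here is a minimal sketch, assuming each lead is a plain dict and that a record needs a name, a website, and at least one contact detail. The field names and the rules in `validate()` are hypothetical; the toolkit's actual schema and validation rules are configurable and not shown here.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical schema: the real column names used by the toolkit are not documented here.
REQUIRED_FIELDS = ("name", "website")
CONTACT_FIELDS = ("email", "phone")


def validate(record: Dict) -> List[str]:
    """Return the problems found in one record; an empty list means it is valid."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    if not any(record.get(field) for field in CONTACT_FIELDS):
        problems.append("no contact details")
    return problems


def split_for_workbook(records: List[Dict]) -> Dict[str, list]:
    """Partition records into rows for the Data, Flagged and Summary sheets."""
    data, flagged = [], []
    reasons = Counter()
    for record in records:
        problems = validate(record)
        if problems:
            # Keep flagged rows (with the reason) instead of silently dropping them.
            flagged.append({**record, "flag_reason": "; ".join(problems)})
            reasons.update(problems)
        else:
            data.append(record)
    summary = [("total", len(records)), ("valid", len(data)), ("flagged", len(flagged))]
    summary += sorted(reasons.items())
    return {"Data": data, "Flagged": flagged, "Summary": summary}
```

Each list can then be written to its own worksheet, for example with openpyxl's `Workbook.create_sheet()` and `Worksheet.append()`.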


Projects

Google Maps Scraper

Playwright-driven Maps scraper with concurrent email enrichment, Cloudflare XOR decode, and atomic checkpoint/resume. Runs survive interruption and pick up exactly where they left off.

Playwright ThreadPoolExecutor openpyxl Cloudflare bypass
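A minimal sketch of checkpoint/resume using the `.tmp` → `os.replace()` pattern, assuming progress is stored as a JSON list of finished work items. The names `run`, `process`, and the `{"done": [...]}` layout are hypothetical and not taken from the scraper itself.

```python
import json
import os
from typing import Callable, Iterable


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to '<path>.tmp', then swap it over the real file in one step."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes reach disk before the swap
    # os.replace overwrites the destination; on POSIX the rename is atomic, so a
    # crash leaves either the old checkpoint or the new one, never a half-written file.
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"done": []}


def run(items: Iterable[str], path: str, process: Callable[[str], None]) -> dict:
    """Process each item once, skipping items finished by an earlier, interrupted run."""
    state = load_checkpoint(path)
    done = set(state["done"])
    for item in items:
        if item in done:
            continue
        process(item)
        state["done"].append(item)
        done.add(item)
        save_checkpoint(path, state)  # checkpoint after every completed item
    return state
```

Checkpointing after every item costs one small write per item, but it means an interrupted run loses at most the item that was in progress.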

Email & Phone Enricher

Two-pass crawler: fast HTTP first, Playwright fallback for JS-rendered pages. E.164 phone normalisation. Works standalone or as the enrichment layer for any discovery tool.

httpx Playwright E.164 normalisation Cloudflare bypass
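E.164 normalisation, sketched for UK numbers only. This is a simplified, hypothetical helper (`to_e164_uk`) that assumes the input is British and only checks the digit count; it is not the enricher's actual code.

```python
import re
from typing import Optional


def to_e164_uk(raw: str) -> Optional[str]:
    """Normalise a UK phone number to E.164 (+44...), or return None if implausible."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, brackets and '+'
    if digits.startswith("0044"):
        digits = digits[4:]          # international prefix written as 00
    elif digits.startswith("44"):
        digits = digits[2:]          # country code written as +44
    if digits.startswith("0"):
        digits = digits[1:]          # national trunk prefix, incl. the "(0)" in "+44 (0)20"
    # UK national significant numbers are 9 or 10 digits long.
    if not 9 <= len(digits) <= 10:
        return None
    return "+44" + digits
```

For mixed-country input, the `phonenumbers` package (`phonenumbers.parse()` plus `format_number(..., PhoneNumberFormat.E164)`) is a more robust choice than hand-written rules.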

LeadHunter Pro

4-engine parallel search (Bing, DuckDuckGo, Mojeek, Yahoo) built on an abstract-base-class architecture. Configurable HOT/WARM/COLD/NOISE keyword scoring. Domain deduplication before enrichment.

Abstract base classes 4-engine orchestration YAML scoring config
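One way the configurable scoring and domain deduplication might look, as a sketch. The tier keywords and the NOISE-first precedence are assumptions for illustration; the real rules come from the project's YAML config, represented here by a plain dict.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical tier config; per the description, the real keywords live in a YAML file.
SCORING: Dict[str, List[str]] = {
    "NOISE": ["jobs", "careers", "wikipedia"],
    "HOT": ["property management", "letting agent"],
    "WARM": ["estate agent", "block management"],
}

# Order matters: NOISE is checked first so obvious junk is never promoted.
TIER_ORDER = ("NOISE", "HOT", "WARM")


def score(text: str, config: Dict[str, List[str]] = SCORING) -> str:
    """Return the first tier whose keywords appear in the text, else COLD."""
    lowered = text.lower()
    for tier in TIER_ORDER:
        if any(keyword in lowered for keyword in config.get(tier, [])):
            return tier
    return "COLD"


def dedupe_by_domain(urls: List[str]) -> List[str]:
    """Keep only the first URL seen for each host, ignoring a leading 'www.'."""
    seen, unique = set(), []
    for url in urls:
        # URLs without a scheme have no hostname here and are skipped.
        host = (urlparse(url).hostname or "").removeprefix("www.")
        if host and host not in seen:
            seen.add(host)
            unique.append(url)
    return unique
```

Deduplicating by domain before enrichment means each company website is crawled once, even if several engines return it.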

Trustpilot Scraper

Selenium + Chrome scraper for Trustpilot listings — name, contact, website, rating, review count. Targets only established, active companies. Anti-scraping evasion built in.

Selenium Chrome reputation filtering anti-bot evasion

JSON Directory Harvester

Config-only retargeting — no code changes to point at a new JSON API directory. Two pagination modes, dot-path navigation, geographic bounding-box filter, two-pass deduplication. Best-documented codebase in the toolkit.

Generic pipeline dot-path JSON geo filtering pure functions
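A sketch of config-driven extraction: dot-path lookups and a bounding-box filter written as pure functions. The config keys (`results_path`, `lat_path`, `lon_path`, `bbox`, `fields`) are hypothetical, and pagination is left out.

```python
from typing import Any, Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (south, west, north, east) in decimal degrees


def get_path(obj: Any, path: str, default: Any = None) -> Any:
    """Follow a dot-path such as 'data.results.0.name' through nested dicts and lists."""
    current = obj
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return default
    return current


def in_bbox(lat: float, lon: float, bbox: BBox) -> bool:
    """Simple rectangle test; does not handle boxes that cross the antimeridian."""
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east


def harvest(payload: Any, config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract flat records from one API page using only the paths in `config`."""
    records = []
    for row in get_path(payload, config["results_path"], []):
        lat = get_path(row, config["lat_path"])
        lon = get_path(row, config["lon_path"])
        if lat is None or lon is None or not in_bbox(float(lat), float(lon), config["bbox"]):
            continue  # no coordinates, or outside the target area
        records.append({name: get_path(row, path) for name, path in config["fields"].items()})
    return records
```

Under this design, pointing the harvester at a different directory means editing only the config dict (or the file it is loaded from), not the functions.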

HTML Directory Scrapers

Dual-engine toolkit: Engine 1 handles any paginated HTML via CSS selectors; Engine 2 handles WordPress AJAX with automatic nonce extraction, mid-run nonce refresh, and manual gzip/zlib decompression. The most technically complex codebase in the portfolio.

CSS selectors WordPress AJAX nonce lifecycle ThreadPoolExecutor
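The manual gzip/zlib step can be sketched with the standard library alone. Assumption: the response body arrives as raw bytes together with its `Content-Encoding` header; the raw-deflate fallback covers servers that send deflate data without the zlib wrapper. Nonce extraction and refresh are not shown.

```python
import gzip
import zlib


def decompress_body(body: bytes, content_encoding: str = "") -> bytes:
    """Decode a gzip- or deflate-encoded HTTP body; return other bodies unchanged."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip" or body[:2] == b"\x1f\x8b":  # header, or the gzip magic number
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate (RFC 1950)
        except zlib.error:
            # Some servers send raw deflate (RFC 1951) with no zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body
```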


Stack

Languages & runtime

Scraping & automation

Specialist techniques: Cloudflare XOR email decoding · WordPress admin-ajax.php nonce lifecycle · manual gzip/zlib decompression · SMTP RCPT email verification · anti-bot fingerprint evasion
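Cloudflare's email obfuscation replaces an address with a hex string in a `data-cfemail` attribute: the first byte is a key, and every following byte is one character XOR-ed with that key. A minimal decoder, assuming that attribute format (the regex and helper names are illustrative):

```python
import re
from typing import List

CFEMAIL_ATTR = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')


def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-protected address: byte 0 is the key, the rest are XOR-ed with it."""
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key) for i in range(2, len(encoded), 2)
    )


def extract_cf_emails(html: str) -> List[str]:
    """Find every data-cfemail attribute in a page and decode it."""
    return [decode_cfemail(match) for match in CFEMAIL_ATTR.findall(html)]
```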

Data engineering

Techniques: two-pass deduplication (ID-based + normalised name+postcode) · E.164 phone normalisation · geographic bounding-box filtering · HOT/WARM/COLD/NOISE keyword scoring · configurable record validation
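A sketch of the two-pass deduplication, assuming records carry an optional source `id` plus `name` and `postcode` fields. The suffix list in `normalise_name()` is a hypothetical example of what "normalised" might mean.

```python
import re
from typing import Dict, List

# Hypothetical words ignored when comparing company names.
IGNORED_NAME_TOKENS = {"ltd", "limited", "plc", "the"}


def normalise_name(name: str) -> str:
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in IGNORED_NAME_TOKENS)


def normalise_postcode(postcode: str) -> str:
    return re.sub(r"\s+", "", postcode).upper()


def dedupe(records: List[Dict]) -> List[Dict]:
    # Pass 1: drop exact repeats of the source ID.
    seen_ids, after_ids = set(), []
    for rec in records:
        rid = rec.get("id")
        if rid is not None:
            if rid in seen_ids:
                continue
            seen_ids.add(rid)
        after_ids.append(rec)
    # Pass 2: drop records whose normalised name + postcode was already seen.
    seen_keys, unique = set(), []
    for rec in after_ids:
        key = (normalise_name(rec.get("name", "")), normalise_postcode(rec.get("postcode", "")))
        if key[0] and key[1]:  # only compare when both parts are present
            if key in seen_keys:
                continue
            seen_keys.add(key)
        unique.append(rec)
    return unique
```

The second pass catches the same business listed under different IDs, for example when it appears on two directories.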

Concurrency & resilience

Patterns: exponential-backoff retry · circuit-breaker (N failures → auto-pause) · atomic file ops (.tmp → os.replace()) · crash-safe checkpoint/resume · cross-platform keyboard listener · command.txt headless controls · tqdm-safe log routing
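Exponential-backoff retry combined with a consecutive-failure circuit breaker, as a minimal sketch. Here the breaker raises a hypothetical `CircuitOpen` exception so the caller can pause; how the real tools pause, and their thresholds and delays, are assumptions. `sleep` is injectable so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpen(Exception):
    """Raised once too many consecutive failures have tripped the breaker."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 5) -> None:
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def record(self, ok: bool) -> None:
        """Reset on success; count failures and trip after max_failures in a row."""
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        if self.consecutive_failures >= self.max_failures:
            raise CircuitOpen(f"{self.consecutive_failures} consecutive failures")


def with_retry(fn: Callable[[], T], breaker: CircuitBreaker, attempts: int = 4,
               base_delay: float = 1.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); after failure n (0-based) wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            result = fn()
        except Exception:
            breaker.record(ok=False)  # may raise CircuitOpen: the caller should pause
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
        else:
            breaker.record(ok=True)
            return result
    raise ValueError("attempts must be at least 1")
```

Sharing one breaker across many requests is what lets repeated failures on different URLs (for example, a block by the target site) pause the whole run instead of retrying forever.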

Infrastructure & DevOps

CI matrix: Ubuntu · Windows · macOS across Python 3.9 – 3.12


What I'm learning next

Each of these directly extends what the current toolkit can do — async throughput, database output, cloud-deployable APIs, visual dashboards.


Specialisation

B2B sales, lead generation, and CRM. Property management, sales, lettings, and professional services directories.

Postcode validation, geographic filtering, and UK, US, and EU sector categorisation are built into the architecture — not retrofitted.
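UK postcode validation, sketched with a simplified pattern that checks the outward/inward shape (for example `SW1A 1AA`, `M1 1AE`) but not whether the postcode actually exists. The helper name is hypothetical, and US/EU formats are not covered.

```python
import re
from typing import Optional

# Simplified: area (1-2 letters) + district (digit, optional letter/digit),
# then the inward code (digit + 2 letters). Shape check only.
_UK_POSTCODE = re.compile(r"([A-Z]{1,2}[0-9][A-Z0-9]?)([0-9][A-Z]{2})")


def normalise_uk_postcode(raw: str) -> Optional[str]:
    """Return the postcode as 'OUTWARD INWARD' (e.g. 'SW1A 1AA'), or None if the shape is wrong."""
    compact = re.sub(r"\s+", "", raw).upper()
    match = _UK_POSTCODE.fullmatch(compact)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"
```

Normalising to one canonical spacing also makes postcodes usable as part of a deduplication key.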


Contact

Open to remote freelance projects — scraping pipelines, lead generation, data extraction, automation.

📧 faaqjaved@gmail.com  ·  LinkedIn  ·  GitHub

Popular repositories

  1. Leadhunter_Pro

    Multi-engine web scraper and contact enricher — finds business leads via Bing, Yahoo, DuckDuckGo & Mojeek, then extracts emails and phone numbers and scores them HOT/WARM/COLD into a colour-coded E…

    Python · 3 stars

  2. Email-Phone-Number-Enrichment-Tool

    Automatically scrape contact email addresses and phone numbers from a list of company websites, with zero column-name requirements, automatic file detection, a live progress bar, two-pass archit…

    Python · 2 stars

  3. Google-Maps-Business-Scraper

    A robust, resumable scraper that extracts business contact data from Google Maps for any search query and location — outputting clean, styled XLSX or CSV files.

    Python · 2 stars · 1 fork

  4. Trustpilot-Business-Scraper

    Production-grade Python scraper that extracts business contact data (email, phone, website, postcode) from any Trustpilot search query and saves to Excel. Parallel HTTP fetching, checkpoint/resume,…

    Python · 2 stars

  5. json-directory-harvester

    Configurable, resumable Python scraper for harvesting business records from any JSON-based directory API — with geographic filtering, two-pass deduplication, data validation, and formatted three-sh…

    Python · 2 stars

  6. FAAQJAVED