WebGitDumper

A Python tool for downloading exposed .git directories from web servers, with an optional 24/7 watch mode that hunts for newly issued certificates and dumps exposed git repositories the moment they appear.

For authorized security testing, CTF challenges, and educational purposes only.

Features

Two modes:
- dump — manually download a single exposed .git directory
- watch — monitor a Certificate Transparency stream and auto-dump every exposed .git it finds
Multi-threaded downloads - Configurable thread count for parallel downloads
Proxy support - HTTP, HTTPS, and SOCKS5 proxy support
Progress display - Real-time progress with download statistics
Resume capability - Skip already downloaded files
Retry logic - Configurable retries with exponential backoff
Custom User-Agent - Randomized or custom UA strings
Timeout handling - Configurable connection/read timeouts
SSL verification toggle - Option to disable for self-signed certificates
Smart discovery - Automatically discovers and downloads git objects by parsing refs, index, pack files, and object contents
Secret scanning - Optional post-download scan with trufflehog across the full git history

Installation

# Clone the repository
git clone https://github.com/yourusername/WebGitDumper.git
cd WebGitDumper

# Install dependencies
pip install -r requirements.txt

# Make executable (optional)
chmod +x webgitdumper.py

Usage

WebGitDumper has two subcommands: dump (one target) and watch (auto-discover via CT logs).

Breaking change: Previous versions used webgitdumper.py URL OUTPUT_DIR. The equivalent now is webgitdumper.py dump URL OUTPUT_DIR.

`dump` — single target

# Download a .git directory
python webgitdumper.py dump http://example.com/.git/ ./output

# Or just provide the base URL
python webgitdumper.py dump http://example.com/ ./output

Options

Usage: webgitdumper.py dump [OPTIONS] URL OUTPUT_DIR

  URL: Target URL (e.g., http://example.com/.git/ or http://example.com/)
  OUTPUT_DIR: Directory to save the downloaded repository

Options:
  -t, --threads INTEGER    Number of download threads (default: 10)
  -p, --proxy TEXT         Proxy URL (http://host:port or socks5://host:port)
  --timeout INTEGER        Request timeout in seconds (default: 30)
  -r, --retries INTEGER    Number of retries per file (default: 3)
  -u, --user-agent TEXT    Custom user agent string
  --no-verify              Disable SSL verification
  -v, --verbose            Verbose output
  -q, --quiet              Minimal output
  --scan-secrets           Run trufflehog against the dumped repo after download
  --help                   Show this message and exit

Examples

# Use a proxy (e.g., Burp Suite or thermoptic)
python webgitdumper.py dump http://target.com/.git/ ./repo --proxy http://127.0.0.1:8080

# Use SOCKS5 proxy
python webgitdumper.py dump http://target.com/.git/ ./repo --proxy socks5://127.0.0.1:1080

# More threads for faster downloads
python webgitdumper.py dump http://target.com/.git/ ./repo --threads 20

# Custom user agent
python webgitdumper.py dump http://target.com/.git/ ./repo --user-agent "CustomBot/1.0"

# Disable SSL verification for self-signed certs
python webgitdumper.py dump https://target.com/.git/ ./repo --no-verify

# Verbose output for debugging
python webgitdumper.py dump http://target.com/.git/ ./repo --verbose

# Quiet mode (errors only)
python webgitdumper.py dump http://target.com/.git/ ./repo --quiet

# Scan dumped repo for secrets across the full git history
python webgitdumper.py dump http://target.com/.git/ ./repo --scan-secrets

`watch` — 24/7 credential harvester

watch subscribes to a Certificate Transparency stream, probes every newly issued domain for /.git/HEAD, and on each hit runs the full dumper + trufflehog pipeline. Findings are persisted as JSONL and the raw repo is deleted after scanning, so the output stays small even when running for days.

Required infrastructure

You need a running certstream-server-go instance. The original public Calidog feed (wss://certstream.calidog.io) has been broken for years and Calidog themselves describe it as "demo only". Self-host with one command:

docker run -d --name certstream -p 8080:8080 0rickyy0/certstream-server-go:latest

Pipeline

certstream WS  →  [check_queue]   →  N GET-probes  →  [dump_queue]   →  M dumpers + trufflehog
   producer       (bounded ~10k)     /.git/HEAD       (bounded 500)     → secrets.jsonl, raw deleted

Producer: WebSocket-Thread with auto-reconnect and 30s ping keepalive. Drops domains if the queue is full (we can't keep up anyway).
Probe workers: GET https://{domain}/.git/HEAD with a short timeout, match ref: refs/heads/ in the body. False-positive-resistant against catch-all SPAs.
Dump workers: Run the existing GitDumper against hits, then trufflehog with --no-verification. Findings are appended to secrets.jsonl, raw repo deleted.
Dedup: In-memory set, configurable TTL (default 24h), so the same domain isn't re-checked endlessly when it appears in many certs.

Output files

hits.jsonl — every domain where /.git/HEAD matched, regardless of whether secrets were found
secrets.jsonl — every trufflehog finding (one JSON object per line)

Examples

# Default: watch local certstream-server-go on port 8080
python webgitdumper.py watch ./loot

# Custom certstream URL and tuned worker counts
python webgitdumper.py watch ./loot \
  --certstream-url ws://localhost:8765/full-stream \
  --check-workers 80 \
  --dump-workers 5

# Shorter dedup window for testing
python webgitdumper.py watch ./loot --dedup-ttl 3600

Options

Usage: webgitdumper.py watch [OPTIONS] OUTPUT_DIR

Options:
  --certstream-url TEXT    WebSocket URL of a certstream-server instance
                           (default: ws://localhost:8080/full-stream)
  --check-workers INTEGER  Parallel GET probes for /.git/HEAD (default: 55)
  --dump-workers INTEGER   Parallel full dumps + trufflehog scans (default: 3)
  --check-timeout INTEGER  Timeout for the /.git/HEAD probe in seconds (default: 5)
  --dedup-ttl INTEGER      Skip already-seen domains for N seconds (default: 86400)
  -v, --verbose            Verbose output
  --help                   Show this message and exit

Secret Scanning

With --scan-secrets, WebGitDumper invokes trufflehog against the dumped .git directory after the download finishes. The scan covers the entire commit history, not just the current tree — old commits often hold the most interesting leaks.

Verification is explicitly disabled (--no-verification) so trufflehog never sends discovered credentials to third-party APIs. This avoids leaking secrets to upstream providers, triggering alerts at the target, or leaving traces in external logs. Findings must be reviewed manually.

Requires the trufflehog binary in PATH (brew install trufflehog). If not present, the scan is skipped with a warning instead of failing.

After Downloading (`dump` mode)

Once the download is complete, you can restore the repository:

cd ./output
git checkout .

Or view the commit history:

cd ./output
git log --oneline

How It Works

Initial Discovery - Starts by downloading known git files (HEAD, config, index, refs, etc.)
SHA1 Extraction - Parses downloaded files for SHA1 hashes (40 hex characters)
Object Discovery - Queues discovered objects (objects/XX/XXXXX...)
Pack File Parsing - Downloads and parses pack files and their indexes
Recursive Discovery - Decompresses objects to find additional references
Resume Support - Skips files that already exist locally

Technical Details

Parsed File Types

HEAD, refs/ - Branch and tag references
index - Git index file (staged files)
packed-refs - Packed references
objects/info/packs - Pack file listing
*.idx - Pack index files
objects/XX/* - Loose objects (decompressed for more refs)

Object Decompression

The tool decompresses git objects using zlib to:

Extract SHA1 references from commit and tree objects
Parse tree objects for file blob references
Discover the complete object graph

Legal Disclaimer

This tool is intended for:

Authorized penetration testing
CTF (Capture The Flag) competitions
Security research
Educational purposes

Do not use this tool against systems you do not have permission to test.

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
__pycache__		__pycache__
README.md		README.md
requirements.txt		requirements.txt
webgitdumper.py		webgitdumper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WebGitDumper

Features

Installation

Usage

`dump` — single target

Options

Examples

`watch` — 24/7 credential harvester

Required infrastructure

Pipeline

Output files

Examples

Options

Secret Scanning

After Downloading (`dump` mode)

How It Works

Technical Details

Parsed File Types

Object Decompression

Legal Disclaimer

License

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

WebGitDumper

Features

Installation

Usage

dump — single target

Options

Examples

watch — 24/7 credential harvester

Required infrastructure

Pipeline

Output files

Examples

Options

Secret Scanning

After Downloading (dump mode)

How It Works

Technical Details

Parsed File Types

Object Decompression

Legal Disclaimer

License

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`dump` — single target

`watch` — 24/7 credential harvester

After Downloading (`dump` mode)

Packages