A Python tool for downloading exposed .git directories from web servers, with an
optional 24/7 watch mode that hunts for newly issued certificates and dumps
exposed git repositories the moment they appear.
For authorized security testing, CTF challenges, and educational purposes only.
- Two modes:
  - dump - manually download a single exposed .git directory
  - watch - monitor a Certificate Transparency stream and auto-dump every exposed .git it finds
- Multi-threaded downloads - Configurable thread count for parallel downloads
- Proxy support - HTTP, HTTPS, and SOCKS5 proxy support
- Progress display - Real-time progress with download statistics
- Resume capability - Skip already downloaded files
- Retry logic - Configurable retries with exponential backoff
- Custom User-Agent - Randomized or custom UA strings
- Timeout handling - Configurable connection/read timeouts
- SSL verification toggle - Option to disable for self-signed certificates
- Smart discovery - Automatically discovers and downloads git objects by parsing refs, index, pack files, and object contents
- Secret scanning - Optional post-download scan with trufflehog across the full git history
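The "Retry logic" feature above can be illustrated with a minimal sketch. The helper name `retry_with_backoff`, its parameters, and the doubling schedule are assumptions made for illustration; the tool's actual implementation may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on failure wait base_delay * 2**attempt, then try again.

    Hypothetical helper: `retries` mirrors the --retries option, the rest
    is an assumption.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.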
# Clone the repository
git clone https://github.com/yourusername/WebGitDumper.git
cd WebGitDumper
# Install dependencies
pip install -r requirements.txt
# Make executable (optional)
chmod +x webgitdumper.py
WebGitDumper has two subcommands: dump (one target) and watch (auto-discover via CT logs).
Breaking change: Previous versions used webgitdumper.py URL OUTPUT_DIR. The equivalent now is webgitdumper.py dump URL OUTPUT_DIR.
# Download a .git directory
python webgitdumper.py dump http://example.com/.git/ ./output
# Or just provide the base URL
python webgitdumper.py dump http://example.com/ ./output
Usage: webgitdumper.py dump [OPTIONS] URL OUTPUT_DIR
URL: Target URL (e.g., http://example.com/.git/ or http://example.com/)
OUTPUT_DIR: Directory to save the downloaded repository
Options:
-t, --threads INTEGER Number of download threads (default: 10)
-p, --proxy TEXT Proxy URL (http://host:port or socks5://host:port)
--timeout INTEGER Request timeout in seconds (default: 30)
-r, --retries INTEGER Number of retries per file (default: 3)
-u, --user-agent TEXT Custom user agent string
--no-verify Disable SSL verification
-v, --verbose Verbose output
-q, --quiet Minimal output
--scan-secrets Run trufflehog against the dumped repo after download
--help Show this message and exit
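Since the URL argument accepts either the .git/ URL or the base URL, both forms presumably get normalized to the same target. A minimal sketch of one way to do that follows; `normalize_git_url` is a hypothetical name and not necessarily how the tool does it.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_git_url(url: str) -> str:
    """Return the URL of the .git/ directory for either accepted input form.

    Hypothetical helper: appends /.git when the path does not already end
    with it, and always returns a trailing slash.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith("/.git"):
        path += "/.git"
    # Query string and fragment are dropped; they have no meaning here.
    return urlunsplit((parts.scheme, parts.netloc, path + "/", "", ""))
```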
# Use a proxy (e.g., Burp Suite or thermoptic)
python webgitdumper.py dump http://target.com/.git/ ./repo --proxy http://127.0.0.1:8080
# Use SOCKS5 proxy
python webgitdumper.py dump http://target.com/.git/ ./repo --proxy socks5://127.0.0.1:1080
# More threads for faster downloads
python webgitdumper.py dump http://target.com/.git/ ./repo --threads 20
# Custom user agent
python webgitdumper.py dump http://target.com/.git/ ./repo --user-agent "CustomBot/1.0"
# Disable SSL verification for self-signed certs
python webgitdumper.py dump https://target.com/.git/ ./repo --no-verify
# Verbose output for debugging
python webgitdumper.py dump http://target.com/.git/ ./repo --verbose
# Quiet mode (errors only)
python webgitdumper.py dump http://target.com/.git/ ./repo --quiet
# Scan dumped repo for secrets across the full git history
python webgitdumper.py dump http://target.com/.git/ ./repo --scan-secrets
watch subscribes to a Certificate Transparency stream, probes every newly issued
domain for /.git/HEAD, and on each hit runs the full dumper + trufflehog pipeline.
Findings are persisted as JSONL and the raw repo is deleted after scanning, so the
output stays small even when running for days.
You need a running certstream-server-go
instance. The original public Calidog feed (wss://certstream.calidog.io) has been broken
for years and Calidog themselves describe it as "demo only". Self-host with one command:
docker run -d --name certstream -p 8080:8080 0rickyy0/certstream-server-go:latest
certstream WS → [check_queue]  → N GET-probes → [dump_queue]  → M dumpers + trufflehog
producer        (bounded ~10k)   /.git/HEAD     (bounded 500)   → secrets.jsonl, raw deleted
- Producer: WebSocket thread with auto-reconnect and 30s ping keepalive. Drops domains if the queue is full (we can't keep up anyway).
- Probe workers: GET https://{domain}/.git/HEAD with a short timeout, match ref: refs/heads/ in the body. False-positive-resistant against catch-all SPAs.
- Dump workers: Run the existing GitDumper against hits, then trufflehog with --no-verification. Findings are appended to secrets.jsonl, raw repo deleted.
- Dedup: In-memory set, configurable TTL (default 24h), so the same domain isn't re-checked endlessly when it appears in many certs.
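The probe check described above can be sketched as a small predicate. `looks_like_git_head` is a hypothetical name, and requiring the marker at the start of the body is a slightly stricter assumption than "in the body"; the tool's actual check may differ.

```python
def looks_like_git_head(body: str) -> bool:
    """Return True if a response body resembles a symbolic-ref HEAD file.

    A real .git/HEAD on a branch reads like "ref: refs/heads/main".
    A catch-all SPA answers every path with HTML, which fails this check.
    Assumption: the marker must open the body (after leading whitespace).
    """
    return body.lstrip().startswith("ref: refs/heads/")
```

A detached HEAD holds a bare commit hash instead of a ref line, so this predicate would not match it.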
- hits.jsonl - every domain where /.git/HEAD matched, regardless of whether secrets were found
- secrets.jsonl - every trufflehog finding (one JSON object per line)
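Because both output files hold one JSON object per line, they can be consumed incrementally. The sketch below shows one way to read them; `read_jsonl` is a hypothetical helper, and no particular field names are assumed because the record schema is not documented here.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

For example, `sum(1 for _ in read_jsonl(Path("./loot/secrets.jsonl")))` would count findings, assuming the files sit directly in the output directory.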
# Default: watch local certstream-server-go on port 8080
python webgitdumper.py watch ./loot
# Custom certstream URL and tuned worker counts
python webgitdumper.py watch ./loot \
--certstream-url ws://localhost:8765/full-stream \
--check-workers 80 \
--dump-workers 5
# Shorter dedup window for testing
python webgitdumper.py watch ./loot --dedup-ttl 3600
Usage: webgitdumper.py watch [OPTIONS] OUTPUT_DIR
Options:
--certstream-url TEXT WebSocket URL of a certstream-server instance
(default: ws://localhost:8080/full-stream)
--check-workers INTEGER Parallel GET probes for /.git/HEAD (default: 55)
--dump-workers INTEGER Parallel full dumps + trufflehog scans (default: 3)
--check-timeout INTEGER Timeout for the /.git/HEAD probe in seconds (default: 5)
--dedup-ttl INTEGER Skip already-seen domains for N seconds (default: 86400)
-v, --verbose Verbose output
--help Show this message and exit
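The effect of --dedup-ttl can be illustrated with a small in-memory structure. This is a sketch under assumptions: the class name `TTLDedup` and its method are hypothetical, and the tool's own data structure may differ.

```python
import time
from typing import Callable, Dict


class TTLDedup:
    """Remember when each domain was last seen and skip it within the TTL."""

    def __init__(
        self,
        ttl_seconds: float = 86400,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen: Dict[str, float] = {}

    def should_check(self, domain: str) -> bool:
        """Return True, and record the domain, if it is new or its TTL expired."""
        now = self.clock()
        last = self._seen.get(domain)
        if last is not None and now - last < self.ttl:
            return False
        self._seen[domain] = now
        return True
```

This sketch never evicts old entries; a long-running process would also need periodic cleanup to bound memory.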
With --scan-secrets, WebGitDumper invokes trufflehog
against the dumped .git directory after the download finishes. The scan covers the
entire commit history, not just the current tree — old commits often hold the most
interesting leaks.
Verification is explicitly disabled (--no-verification) so trufflehog never sends
discovered credentials to third-party APIs. This avoids leaking secrets to upstream
providers, triggering alerts at the target, or leaving traces in external logs.
Findings must be reviewed manually.
Requires the trufflehog binary in PATH (brew install trufflehog). If not present,
the scan is skipped with a warning instead of failing.
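The skip-with-warning behaviour described above could look roughly like the following sketch. The function names are hypothetical, and the exact command line is an assumption: only --no-verification is confirmed by this README, while the `git file://...` form and `--json` are taken from trufflehog's general CLI usage.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def build_trufflehog_cmd(repo_dir: str) -> List[str]:
    """Assemble an assumed trufflehog command line for a local repository."""
    return ["trufflehog", "git", f"file://{repo_dir}", "--no-verification", "--json"]


def scan_secrets(
    repo_dir: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Run trufflehog if available; return its JSON output lines, or None if skipped."""
    if which("trufflehog") is None:
        print("warning: trufflehog not found in PATH, skipping secret scan")
        return None
    result = subprocess.run(
        build_trufflehog_cmd(repo_dir),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit should not abort the dump
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```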
Once the download is complete, you can restore the repository:
cd ./output
git checkout .
Or view the commit history:
cd ./output
git log --oneline
- Initial Discovery - Starts by downloading known git files (HEAD, config, index, refs, etc.)
- SHA1 Extraction - Parses downloaded files for SHA1 hashes (40 hex characters)
- Object Discovery - Queues discovered objects (objects/XX/XXXXX...)
- Pack File Parsing - Downloads and parses pack files and their indexes
- Recursive Discovery - Decompresses objects to find additional references
- Resume Support - Skips files that already exist locally
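The SHA1 extraction and object discovery steps above can be sketched as follows. The helper names are hypothetical; the 40-hex-character pattern and the objects/XX/ layout follow git's standard loose-object storage.

```python
import re
from typing import Set

# 40 lowercase hex characters bounded by non-word characters.
SHA1_RE = re.compile(rb"\b[0-9a-f]{40}\b")


def extract_sha1s(data: bytes) -> Set[str]:
    """Collect every 40-character hex string found in a downloaded file."""
    return {match.decode("ascii") for match in SHA1_RE.findall(data)}


def loose_object_path(sha1: str) -> str:
    """Map an object hash to its loose-object path below .git/."""
    return f"objects/{sha1[:2]}/{sha1[2:]}"
```

Applied to a packed-refs file, for instance, `extract_sha1s` yields the hashes whose `loose_object_path` values can then be queued for download.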
- HEAD, refs/ - Branch and tag references
- index - Git index file (staged files)
- packed-refs - Packed references
- objects/info/packs - Pack file listing
- *.idx - Pack index files
- objects/XX/* - Loose objects (decompressed for more refs)
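As an illustration of how the objects/info/packs listing leads to pack and index files, here is a minimal sketch. `pack_paths_from_info` is a hypothetical helper; the `P pack-<hash>.pack` line format is git's standard layout for that file.

```python
from typing import List


def pack_paths_from_info(info_packs: str) -> List[str]:
    """Turn the contents of objects/info/packs into .pack and .idx paths."""
    paths: List[str] = []
    for line in info_packs.splitlines():
        line = line.strip()
        # Each pack is listed as "P pack-<hash>.pack".
        if line.startswith("P ") and line.endswith(".pack"):
            name = line[2:].strip()
            stem = name[: -len(".pack")]
            paths.append(f"objects/pack/{name}")
            paths.append(f"objects/pack/{stem}.idx")
    return paths
```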
The tool decompresses git objects using zlib to:
- Extract SHA1 references from commit and tree objects
- Parse tree objects for file blob references
- Discover the complete object graph
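The decompression step above can be sketched as follows, assuming git's standard loose-object encoding (a zlib stream holding `<type> <size>\0<body>`) and tree-entry layout (`<mode> <name>\0` followed by 20 raw hash bytes). The function names are hypothetical.

```python
import zlib
from typing import List, Tuple


def parse_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Decompress a loose object and split it into (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, _size = header.split(b" ", 1)
    return obj_type.decode("ascii"), body


def tree_entry_sha1s(body: bytes) -> List[str]:
    """Return the hex hashes referenced by the entries of a tree object."""
    shas: List[str] = []
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)  # end of "<mode> <name>"
        shas.append(body[nul + 1 : nul + 21].hex())  # 20 raw SHA1 bytes
        pos = nul + 21
    return shas
```

Commit objects are plain text after decompression, so their `tree` and `parent` hashes can be picked up with an ordinary 40-hex-character search instead.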
This tool is intended for:
- Authorized penetration testing
- CTF (Capture The Flag) competitions
- Security research
- Educational purposes
Do not use this tool against systems you do not have permission to test.
MIT License
Contributions are welcome! Please feel free to submit issues and pull requests.