High-performance native Windows WinHTTP web crawler powered by Java 21+ Virtual Threads.
FastSpider is the high-concurrency network crawling engine of the FastJava stack. It combines Microsoft Windows HTTP Services (WinHTTP API) and Windows Schannel, accessed through a C++/JNI layer, with Java virtual-thread executors to deliver highly scalable, secure (TLS 1.2/1.3), non-blocking web crawling without allocating HTTP client objects on the JVM heap.
```java
// Quick Start: Asynchronous Fetch
FastSpider spider = FastSpider.open();
spider.fetchAsync("https://example.com")
      .thenAccept(response -> {
          if (response.isSuccess()) {
              System.out.println("Fetched " + response.rawBody().length
                      + " bytes in " + response.fetchTimeMs() + " ms");
          }
      })
      .toCompletableFuture()
      .join(); // wait for the result; virtual threads are daemon threads and won't keep the JVM alive
```
- Key Features
- Performance
- API Quick Reference
- Installation
- Technical Examples & Hero Demos
- Platform Support
- Modular Ecosystem
- License
- 🌐 WinHTTP Enterprise Core: Native Microsoft HTTP client that handles DNS resolution, connection pooling, and TLS 1.2/1.3 handshakes (via Schannel) automatically; TLS 1.3 is used where the OS Schannel supports it (e.g. Windows 11).
- 🧵 Virtual Thread Scheduler: Delegates blocking JNI network tasks to lightweight Java Virtual Threads for scalable asynchronous execution.
- ⚡ Built-in AVX2 Extractor: Reuses FastJava's AVX2 SIMD routines to strip formatting and locate links directly in the downloaded bytes.
- 📦 Zero-Heap Networking: Keeps connection handles and request buffers in native memory rather than on the JVM heap, cutting GC work at extreme request densities.
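The scheduling idea behind the Virtual Thread Scheduler can be sketched with plain JDK APIs (Java 21+). This is a minimal illustration under stated assumptions, not FastSpider's internals: `blockingFetch` is a hypothetical stand-in that sleeps instead of calling into WinHTTP, and the local `fetchAsync`/`fetchBatch` methods only mimic the shape of the library calls. One caveat from JEP 444: while a virtual thread executes a native (JNI) method it stays pinned to its carrier thread, unlike the `Thread.sleep` used here.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadDelegation {

    // One new virtual thread per submitted task (Java 21+).
    static final ExecutorService VIRTUAL = Executors.newVirtualThreadPerTaskExecutor();

    // Hypothetical stand-in for a blocking native fetch; sleeps to simulate latency.
    static byte[] blockingFetch(String url) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ("body of " + url).getBytes(StandardCharsets.UTF_8);
    }

    // fetchAsync-style wrapper: run the blocking call on a virtual thread.
    static CompletableFuture<byte[]> fetchAsync(String url) {
        return CompletableFuture.supplyAsync(() -> blockingFetch(url), VIRTUAL);
    }

    // fetchBatch-style wrapper: start every fetch, then block until all complete.
    static List<byte[]> fetchBatch(List<String> urls) {
        List<CompletableFuture<byte[]>> futures =
                urls.stream().map(VirtualThreadDelegation::fetchAsync).toList();
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        // Results come back in the same order as the input URLs.
        return futures.stream().map(CompletableFuture::join).toList();
    }

    public static void main(String[] args) {
        List<byte[]> bodies = fetchBatch(List.of("https://a.example", "https://b.example"));
        System.out.println(bodies.size() + " bodies fetched");
    }
}
```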
Measured on Intel/AMD x64 hardware running Windows 11.
| Operation | Requests | Java HttpClient (Async) | FastSpider Native (v0.1.0) | Improvement |
|---|---|---|---|---|
| Concurrent Fetch | 100 Req | ~220 ms | ~120 ms | 1.8x |
| Max Memory Overhead | 100 Req | ~84 MB | ~4 MB | 21x |
Note
FastSpider reduces GC pause frequency and the number of native thread handles compared with conventional JVM HTTP clients.
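To reproduce a comparison like the one above on your own machine, a minimal timing harness is enough. The sketch below is an assumption-laden stand-in, not the project's Benchmark.java: `fakeRequest` only sleeps, so you would replace it with a real `java.net.http.HttpClient` call and a FastSpider call to compare them. The `ratio` helper shows how the last column is derived (220 / 120 ≈ 1.8, 84 / 4 = 21).

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentTiming {

    // Runs `task` n times concurrently (one virtual thread each) and returns wall-clock ms.
    static long timeConcurrent(int n, Runnable task) {
        long start = System.nanoTime();
        // ExecutorService is AutoCloseable since Java 19; close() waits for all submitted tasks.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                exec.submit(task);
            }
        }
        return Duration.ofNanos(System.nanoTime() - start).toMillis();
    }

    // Baseline divided by candidate, as in the table's last column.
    static double ratio(double baseline, double candidate) {
        return baseline / candidate;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one HTTP request: 100 ms of simulated latency.
        Runnable fakeRequest = () -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long ms = timeConcurrent(100, fakeRequest);
        System.out.println("100 concurrent requests took " + ms + " ms");
        System.out.printf("220 ms vs 120 ms -> %.1fx%n", ratio(220, 120));
    }
}
```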
| Method | Description | Target Path |
|---|---|---|
| fetchAsync(...) | Schedules a non-blocking asynchronous fetch on the Virtual Thread Executor. | Reference → |
| fetchBatch(...) | Crawls multiple pages concurrently and blocks until all complete. | Reference → |
| extractCleanText(...) | Strips HTML tags natively to produce readable plain text (e.g. for LLM input). | Reference → |
| extractHrefs(...) | Natively extracts all hyperlink targets from raw HTML page bytes. | Reference → |
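`extractHrefs` and `extractCleanText` run natively with AVX2. To make their contract concrete, here is a plain scalar Java sketch of what such functions conceptually compute. `naiveHrefs` and `naiveCleanText` are hypothetical names; they handle only double-quoted `href` attributes and ignore scripts, comments, and HTML entities. This is not the library's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ScalarExtract {

    // Naive scalar scan for href="..." (double-quoted only) in raw bytes.
    static List<String> naiveHrefs(byte[] html) {
        byte[] needle = "href=\"".getBytes(StandardCharsets.US_ASCII);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i <= html.length - needle.length) {
            if (matchesAt(html, i, needle)) {
                int start = i + needle.length;
                int end = start;
                while (end < html.length && html[end] != '"') end++;
                if (end == html.length) break; // unterminated attribute value
                out.add(new String(html, start, end - start, StandardCharsets.UTF_8));
                i = end + 1;
            } else {
                i++;
            }
        }
        return out;
    }

    // Naive tag stripper: drops everything between '<' and '>' and collapses whitespace.
    static String naiveCleanText(byte[] html) {
        String s = new String(html, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        boolean inTag = false;
        for (int k = 0; k < s.length(); k++) {
            char c = s.charAt(k);
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
                sb.append(' '); // keep words from adjacent elements apart
            } else if (!inTag) {
                sb.append(c);
            }
        }
        return sb.toString().replaceAll("\\s+", " ").strip();
    }

    private static boolean matchesAt(byte[] hay, int pos, byte[] needle) {
        for (int j = 0; j < needle.length; j++) {
            if (hay[pos + j] != needle[j]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] page = "<p>Hello <a href=\"/a\">A</a> and <a href=\"https://x.org\">X</a></p>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(naiveHrefs(page));     // [/a, https://x.org]
        System.out.println(naiveCleanText(page)); // Hello A and X
    }
}
```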
Tip
Use FastSpider.open() to obtain a thread-safe, reusable native crawler instance.
Add the JitPack repository and the dependencies to your pom.xml:
```xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependencies>
    <!-- FastSpider Library -->
    <dependency>
        <groupId>com.github.andrestubbe</groupId>
        <artifactId>fastspider</artifactId>
        <version>v0.1.0</version>
    </dependency>
    <!-- FastCore (Required Native Loader) -->
    <dependency>
        <groupId>com.github.andrestubbe</groupId>
        <artifactId>fastcore</artifactId>
        <version>v0.1.0</version>
    </dependency>
</dependencies>
```

For Gradle, add the following to your build.gradle:

```groovy
repositories {
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation 'com.github.andrestubbe:fastspider:v0.1.0'
    implementation 'com.github.andrestubbe:fastcore:v0.1.0'
}
```

Alternatively, download the latest JARs and add them to your classpath:
- 📦 fastspider-v0.1.0.jar (The Core Library)
- ⚙️ fastcore-v0.1.0.jar (The Mandatory Native Loader)
Important
Both JARs must be on your classpath for the native JNI calls to work.
Explore the complete source configurations and benchmarks:
- ⚡ Interactive Demo: Demo.java (sets up an offline mock server, performs parallel fetches of delayed endpoints, and extracts content).
- ⚡ Joint Pipeline Demo: PipelineDemo.java (combines FastSpider and FastScrape: fetches pages asynchronously via WinHTTP and parses the HTML with AVX2 in a zero-copy pipeline).
- 📈 Performance Benchmark: Benchmark.java (races concurrent fetches against the standard Java HttpClient).
- 🧪 Test Suite: FastSpiderTest.java (fully automated JUnit 5 crawler test suite).
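Demo.java is described as serving delayed endpoints from an offline mock server and fetching them in parallel. The same kind of test setup can be sketched with JDK classes only (`com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient`, Java 21+ for the virtual-thread executor). The `/delay/{ms}` route is a hypothetical convention, and you would swap the JDK client for FastSpider to exercise the native path.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;

public class MockServerDemo {

    // Local server on an ephemeral port; GET /delay/{ms} sleeps for {ms}, then replies with HTML.
    static HttpServer startMockServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/delay/", exchange -> {
            long ms = Long.parseLong(exchange.getRequestURI().getPath().substring("/delay/".length()));
            try {
                Thread.sleep(ms); // simulated slow endpoint
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = ("<html><body>waited " + ms + " ms</body></html>").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        // One virtual thread per exchange, so delayed handlers run concurrently.
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.start();
        return server;
    }

    // Sends all requests at once and returns the bodies in request order.
    static List<String> fetchAll(HttpClient client, int port, List<String> paths) {
        List<CompletableFuture<String>> futures = paths.stream()
                .map(p -> HttpRequest.newBuilder(URI.create("http://127.0.0.1:" + port + p)).build())
                .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.ofString())
                        .thenApply(HttpResponse::body))
                .toList();
        return futures.stream().map(CompletableFuture::join).toList();
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startMockServer();
        try {
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            long start = System.nanoTime();
            List<String> bodies = fetchAll(client, server.getAddress().getPort(),
                    List.of("/delay/200", "/delay/200", "/delay/200"));
            long ms = (System.nanoTime() - start) / 1_000_000;
            bodies.forEach(System.out::println);
            System.out.println("3 parallel fetches took " + ms + " ms");
        } finally {
            server.stop(0);
        }
    }
}
```

Because the three requests overlap, the total should be close to a single 200 ms delay rather than the 600 ms sum.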
Run the hero demo locally from the command line:
```shell
mvn exec:java "-Dexec.mainClass=fastspider.Demo"
```

Run the combined crawler & parser pipeline demo:

```bat
cd examples/PipelineDemo
run-pipeline.bat
```

| Platform | Status |
|---|---|
| Windows 10/11 (x64) | ✅ Fully Supported (WinHTTP + AVX2 Native) |
| Linux | 🚧 Planned |
| macOS | 🚧 Planned |
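Since only Windows x64 is supported today, code that must also run elsewhere can check the platform before choosing the native backend. A minimal sketch using the standard `os.name` and `os.arch` system properties; `isNativeSupported` is a hypothetical helper, and it does not verify AVX2 support, for which Java has no standard API.

```java
import java.util.Locale;

public class PlatformGuard {

    // Mirrors the support table above: only Windows on x64 is currently supported.
    static boolean isNativeSupported(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean windows = os.startsWith("windows");
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        return windows && x64;
    }

    public static void main(String[] args) {
        boolean ok = isNativeSupported(System.getProperty("os.name"), System.getProperty("os.arch"));
        // Hypothetical usage: pick the native crawler only where it is supported,
        // otherwise fall back to the JDK's java.net.http.HttpClient.
        System.out.println(ok ? "Use FastSpider native backend" : "Fall back to java.net.http.HttpClient");
    }
}
```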
Combine FastSpider with other accelerators for maximum efficiency:
- FastScrape — Native SIMD HTML parser.
- FastCore — Native loading substrate.
- FastBytes — Hardware-aligned byte arrays.
- FastJSON — SIMD-powered JSON parser.
Part of the FastJava Ecosystem — Making the JVM faster.
Made with ⚡ by Andre Stubbe