FastSpider — High-performance native WinHTTP web crawler for Java

High-performance native Windows WinHTTP web crawler powered by Java Virtual Threads (Java 21+).

FastSpider is the high-concurrency crawling engine of the FastJava stack. At the C++/JNI layer it uses Microsoft Windows HTTP Services (the WinHTTP API) and Windows Schannel for TLS 1.2/1.3; on the Java side it dispatches requests to Virtual Thread executors. The result is highly concurrent, TLS-secured crawling that never blocks the calling thread and allocates no HTTP client objects on the JVM heap.

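// Quick Start: Asynchronous Fetch
FastSpider spider = FastSpider.open();

spider.fetchAsync("https://example.com")
      .thenAccept(response -> {
          if (response.isSuccess()) {
              System.out.println("Fetched " + response.rawBody().length + " bytes in " + response.fetchTimeMs() + " ms");
          }
      })
      // Wait for completion so a short-lived main() does not exit before the
      // callback runs (assumes fetchAsync returns a CompletableFuture).
      .join();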

Key Features

🌐 WinHTTP Enterprise Core: Native Microsoft HTTP client that handles DNS, connection pooling, and TLS 1.2/1.3 handshakes (via Schannel) automatically.
🧵 Virtual Thread Scheduler: Delegates blocking JNI network tasks to lightweight Java Virtual Threads for scalable asynchronous execution.
⚡ Built-in AVX2 Extractor: Reuses FastJava's AVX2 SIMD routines to strip formatting and find links directly in the downloaded bytes.
📦 Zero-Heap Networking: Keeps connection handles and request buffers out of the JVM heap, avoiding the allocation and GC overhead they would otherwise add at high request rates.
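The Virtual Thread Scheduler pattern can be sketched with plain JDK APIs. This is a minimal sketch assuming Java 21+, not FastSpider's implementation: `blockingFetch` is a hypothetical stand-in for a native call (it only sleeps), and `fetchAsync`/`fetchBatch` here merely mirror the shape of the methods listed in the API table. One caveat: `Thread.sleep` unmounts a virtual thread from its carrier, whereas a thread blocked inside a JNI method stays pinned to its carrier platform thread, so real native calls may scale differently than this simulation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadFetch {

    // Hypothetical stand-in for a blocking native fetch: sleeps to simulate
    // network latency and returns a fake body. Not part of FastSpider's API.
    static byte[] blockingFetch(String url) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("fetch interrupted", e);
        }
        return ("<html>" + url + "</html>").getBytes(StandardCharsets.UTF_8);
    }

    // Async variant: returns at once; the blocking call runs on a virtual thread.
    static CompletableFuture<byte[]> fetchAsync(ExecutorService executor, String url) {
        return CompletableFuture.supplyAsync(() -> blockingFetch(url), executor);
    }

    // Batch variant: schedules every URL, then blocks until all have completed.
    // Results are returned in the same order as the input URLs.
    static List<byte[]> fetchBatch(ExecutorService executor, List<String> urls) {
        List<CompletableFuture<byte[]>> futures = new ArrayList<>();
        for (String url : urls) {
            futures.add(fetchAsync(executor, url));
        }
        return futures.stream().map(CompletableFuture::join).toList();
    }

    public static void main(String[] args) {
        // One new virtual thread per task (Java 21+); close() waits for running tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < 1_000; i++) {
                urls.add("https://example.invalid/page/" + i);
            }
            long start = System.nanoTime();
            List<byte[]> bodies = fetchBatch(executor, urls);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(bodies.size() + " simulated fetches in " + ms + " ms");
        }
    }
}
```

Because each task gets its own virtual thread, 1,000 simulated 50 ms fetches complete in roughly the time of a few sleeps rather than 50 seconds.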

📊 Performance (v0.1.0)

Measured on x64 (Intel/AMD) hardware running Windows 11.

Operation	Requests	Java HttpClient (Async)	FastSpider Native (v0.1.0)	Improvement
Concurrent Fetch (wall time)	100	~220 ms	~120 ms	1.8x faster
Max Memory Overhead	100	~84 MB	~4 MB	21x lower
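For context on the "Java HttpClient (Async)" column, a baseline of that kind can be sketched with the JDK alone. This is not the project's Benchmark.java: the local server, the 100-request count, the response body, and the HTTP/1.1 setting are assumptions, and the timings it prints depend entirely on your machine.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;

public class HttpClientBaseline {

    // Local HTTP server on an ephemeral port that answers every request with a
    // small fixed body. A large backlog avoids refused connections under bursts.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 256);
        byte[] body = "<html><body>ok</body></html>".getBytes(StandardCharsets.UTF_8);
        server.createContext("/", exchange -> {
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.start();
        return server;
    }

    // Fires `count` async requests with the JDK HttpClient, waits for all of
    // them, and returns how many came back with status 200.
    static long runBaseline(URI uri, int count) {
        try (HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build()) {
            List<CompletableFuture<HttpResponse<byte[]>>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
                futures.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofByteArray()));
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return futures.stream().filter(f -> f.join().statusCode() == 200).count();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            long start = System.nanoTime();
            long ok = runBaseline(uri, 100);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(ok + "/100 OK in " + ms + " ms");
        } finally {
            server.stop(0);
        }
    }
}
```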

Note

FastSpider reduces GC pause frequency and native thread handle count compared to traditional JVM HTTP client engines.
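One way to check a GC-frequency claim on your own workload is to read the JVM's garbage-collector MXBeans before and after a run. A minimal sketch using the standard `java.lang.management` API; the allocation loop is a placeholder workload (an assumption standing in for per-request heap buffers), not a FastSpider crawl.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcDelta {

    // Sums collection counts over all collectors; a collector reports -1
    // when the value is undefined, so non-positive values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Same as above, for accumulated collection time in milliseconds.
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    // Runs the workload, prints the GC cycles and GC time it caused,
    // and returns the cycle count.
    static long gcCyclesDuring(Runnable workload) {
        long countBefore = totalCollections();
        long timeBefore = totalCollectionTimeMs();
        workload.run();
        long cycles = totalCollections() - countBefore;
        long timeMs = totalCollectionTimeMs() - timeBefore;
        System.out.println("GC cycles: " + cycles + ", GC time: " + timeMs + " ms");
        return cycles;
    }

    public static void main(String[] args) {
        // Placeholder workload: short-lived 64 KB buffers. Swap in a real
        // crawl with each client to compare them.
        gcCyclesDuring(() -> {
            long checksum = 0;
            for (int i = 0; i < 20_000; i++) {
                byte[] buffer = new byte[64 * 1024];
                buffer[i % buffer.length] = 1;
                checksum += buffer[i % buffer.length];
            }
            System.out.println("checksum " + checksum);
        });
    }
}
```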

API Quick Reference

Method	Description
`fetchAsync(...)`	Schedules a non-blocking fetch on the Virtual Thread executor.
`fetchBatch(...)`	Fetches multiple pages concurrently and blocks until all complete.
`extractCleanText(...)`	Strips HTML tags natively to produce readable text (e.g., for LLM input).
`extractHrefs(...)`	Extracts all hyperlink targets from HTML page bytes natively.
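To illustrate what `extractHrefs(...)` and `extractCleanText(...)` compute, here is a plain scalar Java sketch working on raw page bytes. It is an assumption-level illustration, not FastSpider's AVX2 code: it handles only double-quoted `href` attributes, and it neither decodes entities nor skips `<script>`/`<style>` content.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ScalarExtract {

    private static final byte[] HREF = "href=\"".getBytes(StandardCharsets.US_ASCII);

    // Lowercases an ASCII letter; every other byte (including UTF-8) is unchanged.
    static int lowerAscii(byte b) {
        return (b >= 'A' && b <= 'Z') ? b + 32 : b;
    }

    // True if the lowercase ASCII `pattern` occurs at `pos`, ignoring case.
    static boolean matchesAt(byte[] data, int pos, byte[] pattern) {
        if (pos + pattern.length > data.length) return false;
        for (int i = 0; i < pattern.length; i++) {
            if (lowerAscii(data[pos + i]) != pattern[i]) return false;
        }
        return true;
    }

    // Collects the values of double-quoted href attributes, e.g. href="/a".
    static List<String> extractHrefs(byte[] html) {
        List<String> links = new ArrayList<>();
        int i = 0;
        while (i < html.length) {
            if (matchesAt(html, i, HREF)) {
                int start = i + HREF.length;
                int end = start;
                while (end < html.length && html[end] != '"') end++;
                if (end == html.length) break; // unterminated attribute value
                links.add(new String(html, start, end - start, StandardCharsets.UTF_8));
                i = end + 1;
            } else {
                i++;
            }
        }
        return links;
    }

    // Drops everything between '<' and '>' and collapses whitespace runs into
    // single spaces. Adjacent elements with no whitespace between them merge.
    static String stripTags(byte[] html) {
        byte[] out = new byte[html.length];
        int n = 0;
        boolean inTag = false;
        boolean lastWasSpace = true; // true at start, so leading whitespace is dropped
        for (byte b : html) {
            if (inTag) {
                if (b == '>') inTag = false;
                continue;
            }
            if (b == '<') {
                inTag = true;
                continue;
            }
            boolean space = b == ' ' || b == '\n' || b == '\r' || b == '\t';
            if (space) {
                if (!lastWasSpace) out[n++] = ' ';
                lastWasSpace = true;
            } else {
                out[n++] = b;
                lastWasSpace = false;
            }
        }
        if (n > 0 && out[n - 1] == ' ') n--; // drop the single trailing space
        return new String(out, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] page = ("<html><body><h1>Title</h1>\n"
                + "<p>See <a href=\"/docs\">docs</a> and <A HREF=\"https://example.com/\">home</A>.</p>"
                + "</body></html>").getBytes(StandardCharsets.UTF_8);
        System.out.println(extractHrefs(page)); // prints [/docs, https://example.com/]
        System.out.println(stripTags(page));    // prints Title See docs and home.
    }
}
```

A SIMD implementation would typically locate candidate bytes such as `<` and `"` many bytes at a time and then apply the same logic around them; this sketch keeps only the scalar logic.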

Tip

Use FastSpider.open() to obtain a thread-safe, reusable native crawler instance.

Installation

Option 1: Maven (Recommended)

Add the JitPack repository and the dependencies to your pom.xml:

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependencies>
    <!-- FastSpider Library -->
    <dependency>
        <groupId>com.github.andrestubbe</groupId>
        <artifactId>fastspider</artifactId>
        <version>v0.1.0</version>
    </dependency>

    <!-- FastCore (Required Native Loader) -->
    <dependency>
        <groupId>com.github.andrestubbe</groupId>
        <artifactId>fastcore</artifactId>
        <version>v0.1.0</version>
    </dependency>
</dependencies>

Option 2: Gradle (via JitPack)

repositories {
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation 'com.github.andrestubbe:fastspider:v0.1.0'
    implementation 'com.github.andrestubbe:fastcore:v0.1.0'
}

Option 3: Direct Download (No Build Tool)

Download the latest JARs and add them to your classpath:

📦 fastspider-v0.1.0.jar (The Core Library)
⚙️ fastcore-v0.1.0.jar (The Mandatory Native Loader)

Important

Both JARs must be on your classpath: FastCore is the native loader that FastSpider's JNI calls depend on.
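A minimal sketch for failing fast when a required JAR is missing. It checks that the given classes can be found without initializing them, so no static initializer (and no native library load) runs. The class names are supplied by you as arguments; this sketch does not assume FastSpider's or FastCore's actual class names. A missing native DLL surfaces later as `UnsatisfiedLinkError`, which this check does not cover.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    // Returns the class names that cannot be found by this class's loader.
    // initialize=false avoids running static initializers.
    static List<String> missingClasses(String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass one fully qualified class name from each required JAR.
        List<String> missing = missingClasses(args);
        if (missing.isEmpty()) {
            System.out.println("All classes found");
        } else {
            System.err.println("Missing from classpath: " + missing);
            System.exit(1);
        }
    }
}
```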

Technical Examples & Hero Demos

Explore the full example sources and benchmarks:

⚡ Interactive Demo: Demo.java (starts a local, offline mock server, fetches delayed endpoints in parallel, and extracts content).
⚡ Joint Pipeline Demo: PipelineDemo.java (runs FastSpider and FastScrape together: fetches asynchronously via WinHTTP and parses HTML via AVX2 in a zero-copy pipeline).
📈 Performance Benchmark: Benchmark.java (races concurrent fetches against the standard Java HttpClient).
🧪 Test Suite: FastSpiderTest.java (fully automated JUnit 5 crawler test suite).
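The offline mock-server setup described for Demo.java can be approximated with JDK classes alone. A sketch under assumptions: the `/page/<n>` paths, the 300 ms delay, and the use of `java.net.http.HttpClient` in place of FastSpider's native client are all illustrative, not taken from the project. Because each request blocks its own virtual thread (and each server exchange runs on its own virtual thread), the five delayed responses overlap, so the total time stays near one delay rather than five.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DelayedMockServerDemo {

    static final int DELAY_MS = 300;

    // Local mock server: /page/<n> sleeps DELAY_MS, then returns a tiny page.
    // Each exchange runs on its own virtual thread, so the delays overlap.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 64);
        server.createContext("/page/", exchange -> {
            try {
                Thread.sleep(DELAY_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            String path = exchange.getRequestURI().getPath();
            byte[] body = ("<title>" + path + "</title>").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.start();
        return server;
    }

    // Fetches every URI on its own virtual thread using the blocking send()
    // call and returns the bodies in request order.
    static List<String> fetchAll(List<URI> uris) throws InterruptedException, ExecutionException {
        try (HttpClient client = HttpClient.newBuilder()
                     .version(HttpClient.Version.HTTP_1_1).build();
             ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (URI uri : uris) {
                futures.add(executor.submit(() -> client.send(
                        HttpRequest.newBuilder(uri).build(),
                        HttpResponse.BodyHandlers.ofString()).body()));
            }
            List<String> bodies = new ArrayList<>();
            for (Future<String> f : futures) {
                bodies.add(f.get());
            }
            return bodies;
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            List<URI> uris = new ArrayList<>();
            for (int i = 0; i < 5; i++) {
                uris.add(URI.create("http://127.0.0.1:" + port + "/page/" + i));
            }
            long start = System.nanoTime();
            List<String> bodies = fetchAll(uris);
            long ms = (System.nanoTime() - start) / 1_000_000;
            // Fetching one after another would need at least 5 * DELAY_MS = 1500 ms.
            System.out.println(bodies + " in " + ms + " ms");
        } finally {
            server.stop(0);
        }
    }
}
```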

Run the hero demo locally from the command line:

mvn exec:java "-Dexec.mainClass=fastspider.Demo"

Run the combined crawler & parser pipeline demo:

cd examples/PipelineDemo
run-pipeline.bat

Platform Support

Platform	Status
Windows 10/11 (x64)	✅ Fully Supported (WinHTTP + AVX2 Native)
Linux	🚧 Planned
macOS	🚧 Planned

Modular Ecosystem

Combine FastSpider with other accelerators for maximum efficiency:

FastScrape — Native SIMD HTML parser.
FastCore — Native loading substrate.
FastBytes — Hardware-aligned byte arrays.
FastJSON — SIMD-powered JSON parser.

Part of the FastJava Ecosystem — Making the JVM faster.

Made with ⚡ by Andre Stubbe
