Most log archives are compressed. Most queries touch one hour. Most tools make you decompress everything anyway.
PFC-JSONL stores a block index alongside every compressed file. Query a time window with DuckDB and only the relevant blocks are decompressed — the rest stays on disk, untouched.
Compresses typical JSONL logs to ~9% of their original size (25% smaller than gzip, 37% smaller than zstd). Time-range queries run 30×–700× faster than full-file decompression.
| Tool | Ratio on JSONL Logs | Block Access | 10 TB archive, 1 h query |
|---|---|---|---|
| PFC-JSONL | ~9% | ✅ Block-level | ~26 MB download |
| gzip | ~12% | ❌ Full file | ~1.2 TB |
| zstd | ~14.2% | ❌ Full file | ~1.43 TB |
At its default settings, PFC-JSONL output is 25% smaller than gzip and 37% smaller than zstd at their typical settings. Ratios were measured on 200 MB of JSONL log data (8 services, mixed log levels, ~961K lines) with PFC-JSONL v3.4.
Linux x86_64:

```bash
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
  -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
pfc_jsonl --help
```

macOS (Apple Silicon M1/M2/M3/M4):

```bash
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
  -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
pfc_jsonl --help
```

macOS Intel (x64): Binary coming soon. Contact: info@impossibleforge.com

Windows: No native binary. Use WSL2 or a Linux machine.
Query .pfc files directly from DuckDB SQL — no intermediate decompression step:
```sql
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;

-- Read all lines
SELECT line->>'$.level' AS level, line->>'$.message' AS msg
FROM read_pfc_jsonl('/path/to/events.pfc')
LIMIT 10;

-- Block-level timestamp filter: only decompress relevant blocks
SELECT count(*)
FROM read_pfc_jsonl(
    '/path/to/events.pfc',
    ts_from = epoch(TIMESTAMPTZ '2026-01-01 00:00:00+00'),
    ts_to   = epoch(TIMESTAMPTZ '2026-01-02 00:00:00+00')
);
```

The DuckDB extension calls `pfc_jsonl` as a subprocess. Install the binary first (see above).
See pfc-duckdb on GitHub for manual install instructions.
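If you are scripting in Python, the same SQL runs through DuckDB's Python API. A minimal sketch, assuming the community extension installs cleanly on your platform:

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL pfc FROM community;")
con.sql("LOAD pfc;")
con.sql("LOAD json;")

# Same block-level time filter as the SQL example above.
count = con.sql("""
    SELECT count(*)
    FROM read_pfc_jsonl(
        '/path/to/events.pfc',
        ts_from = epoch(TIMESTAMPTZ '2026-01-01 00:00:00+00'),
        ts_to   = epoch(TIMESTAMPTZ '2026-01-02 00:00:00+00')
    )
""").fetchone()[0]
print(count)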
Plug PFC-JSONL into your existing logging or metrics pipeline. All ingest tools buffer data locally, compress when the buffer is full, and optionally upload to S3.
| Tool | Protocol / Format | Port | Flow |
|---|---|---|---|
| pfc-fluentbit | Fluent Bit output plugin | — | Fluent Bit → .pfc |
| pfc-vector | HTTP sink (JSON / NDJSON) | 8766 | Vector.dev → .pfc |
| pfc-telegraf | HTTP (InfluxDB line protocol + JSON) | 8767 | Telegraf → .pfc |
| pfc-otel-collector | OTLP/HTTP (logs, traces, metrics) | 4318 | OpenTelemetry → .pfc |
| pfc-kafka-consumer | Kafka / Redpanda consumer | — | Kafka topic → .pfc |
| pfc-gateway ↕ | HTTP REST (`POST /ingest`) | 8765 | Any source → .pfc (+ query) |
pfc-gateway is bidirectional: it accepts ingest via `POST /ingest` and serves queries via `POST /query`. No DuckDB required.
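A minimal ingest sketch from Python. The endpoint, port, and `x-api-key` header follow the query examples below; the NDJSON request body is an assumption, so check pfc-gateway on GitHub for the exact payload shape:

```python
import requests

# ASSUMPTION: POST /ingest accepts newline-delimited JSON in the request body.
line = '{"timestamp": "2026-01-01T10:00:00Z", "level": "INFO", "msg": "hello"}\n'
resp = requests.post(
    "http://localhost:8765/ingest",
    headers={"x-api-key": "secret"},
    data=line,
)
print(resp.status_code)
```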
The fastest way to query .pfc archives locally — see the DuckDB Extension section above.
Query .pfc archives over HTTP without DuckDB — works with any language, curl, Grafana, or PowerBI:
```bash
# Start the gateway (points at your archive directory)
PFC_ARCHIVE_DIR=/var/lib/pfc PFC_API_KEY=secret \
  python3 pfc_gateway.py --port 8765

# Query a time range
curl -X POST http://localhost:8765/query \
  -H "x-api-key: secret" \
  -H "Content-Type: application/json" \
  -d '{
        "file": "/var/lib/pfc/logs_20260101.pfc",
        "from_ts": "2026-01-01T10:00:00Z",
        "to_ts": "2026-01-01T11:00:00Z"
      }'

# Query multiple files at once
curl -X POST http://localhost:8765/query/batch \
  -H "x-api-key: secret" \
  -H "Content-Type: application/json" \
  -d '{"files": ["/var/lib/pfc/logs_20260101.pfc", "/var/lib/pfc/logs_20260102.pfc"]}'
```

Also supports Grafana SimpleJSON: point the Grafana data source at `http://localhost:8765/grafana`.
See pfc-gateway on GitHub for full documentation.
Already have logs stored as gzip, zstd, bzip2, or lz4 — on disk, on S3, on Azure, or on GCS?
pfc-migrate converts them in one command, directly in your storage (no egress charges):
```bash
pip install pfc-migrate[all]

# Local
pfc-migrate convert --dir /var/log/archive/ --output-dir /var/log/pfc/ -v

# S3
pfc-migrate s3 --bucket my-logs --prefix 2025/ --out-bucket my-logs-pfc --out-prefix pfc/

# Azure Blob
pfc-migrate azure --container my-logs --prefix 2025/ --out-container my-logs-pfc --connection-string "..."

# GCS
pfc-migrate gcs --bucket my-logs --prefix 2025/ --out-bucket my-logs-pfc
```

Use the `pfc` Python package (PyPI: `pfc-jsonl`) to compress, decompress, and query .pfc files from Python:
```bash
pip install pfc-jsonl
```

```python
import pfc

pfc.compress("logs/app.jsonl", "logs/app.pfc")
pfc.query("logs/app.pfc",
          from_ts="2026-01-15T08:00:00",
          to_ts="2026-01-15T09:00:00",
          output_path="logs/morning.jsonl")
```

| Command | Description |
|---|---|
| `pfc_jsonl compress <input> <output>` | Compress JSONL → .pfc + .pfc.bidx |
| `pfc_jsonl decompress <input> <output>` | Full decompression |
| `pfc_jsonl query <input> --from X --to Y --out <output>` | Decompress blocks matching time range |
| `pfc_jsonl seek-block N <input> [output]` | Extract single block by index |
| `pfc_jsonl seek-blocks <input> --blocks N [N...]` | Extract multiple blocks (DuckDB primitive) |
| `pfc_jsonl info <input>` | Show block table + timestamp ranges |
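The CLI also composes well with scripts. A sketch driving `pfc_jsonl query` from Python; the ISO 8601 values for `--from`/`--to` are an assumption borrowed from the Python SDK example above:

```python
import subprocess

# Extract one hour from an archive using the documented CLI flags.
# ASSUMPTION: --from/--to take ISO 8601 timestamps, as in the Python SDK example.
subprocess.run(
    ["pfc_jsonl", "query", "logs/app.pfc",
     "--from", "2026-01-15T08:00:00",
     "--to", "2026-01-15T09:00:00",
     "--out", "logs/morning.jsonl"],
    check=True,
)
```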
One JSON object per line with a timestamp field:
{"timestamp": "2025-01-15T06:32:11Z", "level": "ERROR", "service": "api", "msg": "timeout"}
{"timestamp": "2025-01-15T06:32:12Z", "level": "INFO", "service": "db", "msg": "query_ok"}Supported timestamp fields: timestamp, ts, time, @timestamp (ISO 8601 or Unix epoch seconds).
PFC divides JSONL logs into independent blocks (configurable, default 32 MiB).
Each block is compressed with: BWT → MTF → RLE → rANS O1.
Block timestamp ranges are stored in .pfc.bidx (32 bytes/block, binary, C++-readable).
To query a time range, only the relevant blocks are decompressed — the rest is never read.
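To make the lookup concrete, here is a sketch of the block-selection step. The exact 32-byte record layout of `.pfc.bidx` is not specified here, so the `struct` format below (block offset, compressed length, min/max timestamp as four little-endian 64-bit integers) is purely an illustrative assumption; the point is the interval-overlap test that decides which blocks get decompressed.

```python
import struct

# ASSUMED record layout: offset, comp_len, min_ts, max_ts = 32 bytes per block.
RECORD = struct.Struct("<qqqq")

def blocks_in_range(bidx_path: str, ts_from: int, ts_to: int) -> list[int]:
    """Return indices of blocks whose [min_ts, max_ts] overlaps [ts_from, ts_to]."""
    with open(bidx_path, "rb") as f:
        data = f.read()
    hits = []
    for i in range(len(data) // RECORD.size):
        _off, _len, min_ts, max_ts = RECORD.unpack_from(data, i * RECORD.size)
        if min_ts <= ts_to and max_ts >= ts_from:  # interval overlap
            hits.append(i)
    return hits

# Matching blocks could then be extracted with `pfc_jsonl seek-blocks`.
```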
Ingest
- pfc-fluentbit — Fluent Bit output plugin → PFC
- pfc-vector — Vector.dev HTTP sink → PFC (Rust)
- pfc-telegraf — Telegraf HTTP output plugin → PFC
- pfc-otel-collector — OpenTelemetry OTLP/HTTP → PFC
- pfc-kafka-consumer — Kafka / Redpanda consumer → PFC
- pfc-grafana — Grafana data source plugin for PFC archives
Query & Gateway
- pfc-gateway — HTTP REST API: ingest + query, no DuckDB required
- pfc-duckdb — DuckDB community extension for SQL queries on PFC files
Archive & Migration
- pfc-migrate — convert gzip/zstd/lz4/bz2 archives → PFC (local, S3, Azure, GCS)
- pfc-archiver-cratedb — autonomous archive daemon for CrateDB
- pfc-archiver-questdb — autonomous archive daemon for QuestDB
SDK
- pfc-py — Python client library (PyPI: `pfc-jsonl`)
PFC-JSONL is free for personal and open-source use.
Commercial use (production pipelines, paid services, or business operations) requires a license. Contact: info@impossibleforge.com
Built by ImpossibleForge