PFC-JSONL — High-Ratio JSONL Compressor with Block-Level Random Access

Most log archives are compressed. Most queries touch one hour. Most tools make you decompress everything anyway.

PFC-JSONL stores a block index alongside every compressed file. Query a time window with DuckDB and only the relevant blocks are decompressed — the rest stays on disk, untouched.

Compresses typical JSONL logs to ~9% of their original size (25% smaller than gzip, 37% smaller than zstd). Time-range queries run 30×–700× faster than full-file decompression.



Why PFC-JSONL?

| Tool | Ratio on JSONL Logs | Block Access | 10 TB archive, 1h query |
| --- | --- | --- | --- |
| PFC-JSONL | ~9% | ✅ Block-level | ~26 MB download |
| gzip | ~12% | ❌ Full file | ~1.43 TB |
| zstd | ~14.2% | ❌ Full file | ~1.43 TB |

At its default settings, PFC-JSONL output is 25% smaller than gzip and 37% smaller than zstd. Ratios were measured on 200 MB of JSONL log data (8 services, mixed log levels, ~961K lines) with PFC-JSONL v3.4.


Install

Linux x86_64 & macOS ARM64 — Direct Binary

Linux x86_64:

curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

pfc_jsonl --help

macOS (Apple Silicon M1/M2/M3/M4):

curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

pfc_jsonl --help

macOS Intel (x64): Binary coming soon. Contact: info@impossibleforge.com

Windows: No native binary. Use WSL2 or a Linux machine.


DuckDB Extension

Query .pfc files directly from DuckDB SQL — no intermediate decompression step:

INSTALL pfc FROM community;
LOAD pfc;
LOAD json;

-- Read all lines
SELECT line->>'$.level' AS level, line->>'$.message' AS msg
FROM read_pfc_jsonl('/path/to/events.pfc')
LIMIT 10;

-- Block-level timestamp filter: only decompress relevant blocks
SELECT count(*)
FROM read_pfc_jsonl(
    '/path/to/events.pfc',
    ts_from = epoch(TIMESTAMPTZ '2026-01-01 00:00:00+00'),
    ts_to   = epoch(TIMESTAMPTZ '2026-01-02 00:00:00+00')
);

The DuckDB extension calls pfc_jsonl as a subprocess. Install the binary first (see above). See pfc-duckdb on GitHub for manual install instructions.
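If you work from Python, the same SQL runs through DuckDB's Python API. A minimal sketch, assuming the pfc community extension installs from the Python client the same way it does in the DuckDB shell and that the pfc_jsonl binary is already on PATH; the file path is a placeholder:

import duckdb

con = duckdb.connect()
con.execute("INSTALL pfc FROM community")  # one-time download of the community extension
con.execute("LOAD pfc")
con.execute("LOAD json")

# Count events in a one-day window; only the overlapping blocks are decompressed.
count = con.execute("""
    SELECT count(*)
    FROM read_pfc_jsonl(
        '/path/to/events.pfc',
        ts_from = epoch(TIMESTAMPTZ '2026-01-01 00:00:00+00'),
        ts_to   = epoch(TIMESTAMPTZ '2026-01-02 00:00:00+00')
    )
""").fetchone()[0]
print(count)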


Ingest — Send Data to PFC

Plug PFC-JSONL into your existing logging or metrics pipeline. All ingest tools buffer data locally, compress when the buffer is full, and optionally upload to S3.

| Tool | Protocol / Format | Port | Flow |
| --- | --- | --- | --- |
| pfc-fluentbit | Fluent Bit output plugin | — | Fluent Bit → .pfc |
| pfc-vector | HTTP sink (JSON / NDJSON) | 8766 | Vector.dev → .pfc |
| pfc-telegraf | HTTP (InfluxDB line protocol + JSON) | 8767 | Telegraf → .pfc |
| pfc-otel-collector | OTLP/HTTP (logs, traces, metrics) | 4318 | OpenTelemetry → .pfc |
| pfc-kafka-consumer | Kafka / Redpanda consumer | — | Kafka topic → .pfc |
| pfc-gateway | HTTP REST (POST /ingest) | 8765 | Any source → .pfc (+ query) |

pfc-gateway is bidirectional — it accepts ingest via POST /ingest and serves queries via POST /query. No DuckDB required.
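For orientation, here is a sketch of an ingest call against pfc-gateway from Python. The endpoint (POST /ingest) and the x-api-key header come from the examples in this README; the newline-delimited JSON request body and the Content-Type value are assumptions, so check the pfc-gateway repository for the exact payload format:

import json
import requests  # pip install requests

GATEWAY = "http://localhost:8765"
API_KEY = "secret"  # matches PFC_API_KEY used when starting the gateway

events = [
    {"timestamp": "2026-01-01T10:00:01Z", "level": "INFO",  "service": "api", "msg": "started"},
    {"timestamp": "2026-01-01T10:00:02Z", "level": "ERROR", "service": "db",  "msg": "timeout"},
]

# Assumption: the gateway accepts newline-delimited JSON on POST /ingest.
body = "\n".join(json.dumps(e) for e in events)
resp = requests.post(
    f"{GATEWAY}/ingest",
    headers={"x-api-key": API_KEY, "Content-Type": "application/x-ndjson"},
    data=body.encode("utf-8"),
    timeout=10,
)
resp.raise_for_status()
print(resp.status_code)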


Query — Read PFC Archives

DuckDB Extension

The fastest way to query .pfc archives locally — see the DuckDB Extension section above.

pfc-gateway — HTTP REST API

Query .pfc archives over HTTP without DuckDB — works with any language, curl, Grafana, or PowerBI:

# Start the gateway (points at your archive directory)
PFC_ARCHIVE_DIR=/var/lib/pfc PFC_API_KEY=secret \
  python3 pfc_gateway.py --port 8765

# Query a time range
curl -X POST http://localhost:8765/query \
  -H "x-api-key: secret" \
  -H "Content-Type: application/json" \
  -d '{
    "file": "/var/lib/pfc/logs_20260101.pfc",
    "from_ts": "2026-01-01T10:00:00Z",
    "to_ts":   "2026-01-01T11:00:00Z"
  }'

# Query multiple files at once
curl -X POST http://localhost:8765/query/batch \
  -H "x-api-key: secret" \
  -H "Content-Type: application/json" \
  -d '{"files": ["/var/lib/pfc/logs_20260101.pfc", "/var/lib/pfc/logs_20260102.pfc"]}'

Also supports Grafana SimpleJSON — point the Grafana data source at http://localhost:8765/grafana. See pfc-gateway on GitHub for full documentation.


Migrate Existing Archives

Already have logs stored as gzip, zstd, bzip2, or lz4 — on disk, on S3, on Azure, or on GCS?

pfc-migrate converts them in one command, directly in your storage (no egress charges):

pip install pfc-migrate[all]

# Local
pfc-migrate convert --dir /var/log/archive/ --output-dir /var/log/pfc/ -v

# S3
pfc-migrate s3 --bucket my-logs --prefix 2025/ --out-bucket my-logs-pfc --out-prefix pfc/

# Azure Blob
pfc-migrate azure --container my-logs --prefix 2025/ --out-container my-logs-pfc --connection-string "..."

# GCS
pfc-migrate gcs --bucket my-logs --prefix 2025/ --out-bucket my-logs-pfc

Python Package

Use the pfc Python package (PyPI: pfc-jsonl) to compress, decompress, and query .pfc files from Python:

pip install pfc-jsonl
import pfc

pfc.compress("logs/app.jsonl", "logs/app.pfc")
pfc.query("logs/app.pfc",
          from_ts="2026-01-15T08:00:00",
          to_ts="2026-01-15T09:00:00",
          output_path="logs/morning.jsonl")

Commands

| Command | Description |
| --- | --- |
| pfc_jsonl compress <input> <output> | Compress JSONL → .pfc + .pfc.bidx |
| pfc_jsonl decompress <input> <output> | Full decompression |
| pfc_jsonl query <input> --from X --to Y --out <output> | Decompress blocks matching time range |
| pfc_jsonl seek-block N <input> [output] | Extract single block by index |
| pfc_jsonl seek-blocks <input> --blocks N [N...] | Extract multiple blocks (DuckDB primitive) |
| pfc_jsonl info <input> | Show block table + timestamp ranges |
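
These commands compose into a simple compress → inspect → query workflow. A minimal sketch driving the documented CLI from Python; file paths are placeholders, and the ISO 8601 values passed to --from/--to are an assumption based on the Python package's from_ts/to_ts arguments:

import subprocess

# Compress: writes app.pfc plus the app.pfc.bidx block index next to it.
subprocess.run(["pfc_jsonl", "compress", "logs/app.jsonl", "logs/app.pfc"], check=True)

# Inspect the block table and per-block timestamp ranges.
subprocess.run(["pfc_jsonl", "info", "logs/app.pfc"], check=True)

# Extract one hour; assumption: --from/--to accept ISO 8601 timestamps.
subprocess.run(
    ["pfc_jsonl", "query", "logs/app.pfc",
     "--from", "2026-01-15T08:00:00", "--to", "2026-01-15T09:00:00",
     "--out", "logs/morning.jsonl"],
    check=True,
)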

Input Format

One JSON object per line with a timestamp field:

{"timestamp": "2025-01-15T06:32:11Z", "level": "ERROR", "service": "api", "msg": "timeout"}
{"timestamp": "2025-01-15T06:32:12Z", "level": "INFO",  "service": "db",  "msg": "query_ok"}

Supported timestamp fields: timestamp, ts, time, @timestamp (ISO 8601 or Unix epoch seconds).
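If you produce these lines yourself, any JSON writer works as long as each record carries one of the supported timestamp fields. A minimal sketch writing compatible JSONL from Python; field names other than the timestamp are arbitrary:

import json
from datetime import datetime, timezone

def write_event(fh, level, service, msg):
    # "timestamp" is one of the supported fields; ISO 8601 in UTC.
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": level,
        "service": service,
        "msg": msg,
    }
    fh.write(json.dumps(record) + "\n")

with open("logs/app.jsonl", "a", encoding="utf-8") as fh:
    write_event(fh, "INFO", "api", "request_ok")
    write_event(fh, "ERROR", "db", "timeout")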


How It Works

PFC divides JSONL logs into independent blocks (configurable, default 32 MiB). Each block is compressed with: BWT → MTF → RLE → rANS O1. Block timestamp ranges are stored in .pfc.bidx (32 bytes/block, binary, C++-readable).

To query a time range, only the relevant blocks are decompressed — the rest is never read.
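To make the block skipping concrete, here is an illustrative sketch of the lookup, not the actual .pfc.bidx reader: assume each index entry carries the block's byte offset, compressed length, and min/max timestamps (the real 32-byte binary layout is not documented here). A query then touches only blocks whose timestamp range overlaps the requested window:

from dataclasses import dataclass

@dataclass
class BlockEntry:
    offset: int    # byte offset of the compressed block in the .pfc file
    length: int    # compressed block size in bytes
    ts_min: float  # earliest timestamp in the block (epoch seconds)
    ts_max: float  # latest timestamp in the block (epoch seconds)

def blocks_for_range(index: list[BlockEntry], ts_from: float, ts_to: float) -> list[int]:
    """Return indices of blocks whose timestamp range overlaps [ts_from, ts_to]."""
    return [i for i, b in enumerate(index)
            if b.ts_max >= ts_from and b.ts_min <= ts_to]

# Example: a 3-block index; a one-hour query touches only the middle block.
index = [
    BlockEntry(0,          9_000_000, 1735689600, 1735693200),
    BlockEntry(9_000_000,  8_500_000, 1735693200, 1735696800),
    BlockEntry(17_500_000, 9_200_000, 1735696800, 1735700400),
]
print(blocks_for_range(index, 1735693300, 1735696700))  # -> [1]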


Related Repos

Ingest

  • pfc-fluentbit: Fluent Bit output plugin
  • pfc-vector: Vector.dev HTTP sink (JSON / NDJSON)
  • pfc-telegraf: Telegraf HTTP listener (InfluxDB line protocol + JSON)
  • pfc-otel-collector: OTLP/HTTP receiver (logs, traces, metrics)
  • pfc-kafka-consumer: Kafka / Redpanda consumer

Query & Gateway

  • pfc-gateway — HTTP REST API: ingest + query, no DuckDB required
  • pfc-duckdb — DuckDB community extension for SQL queries on PFC files

Archive & Migration

  • pfc-migrate: convert existing gzip, zstd, bzip2, and lz4 archives on disk, S3, Azure Blob, or GCS

SDK

  • pfc-py — Python client library (PyPI: pfc-jsonl)

License

PFC-JSONL is free for personal and open-source use.

Commercial use (production pipelines, paid services, or business operations) requires a license. Contact: info@impossibleforge.com


Built by ImpossibleForge
