r3l1c7/Logos-Compression


Logos: Compression by Meaning

ALPHA VERSION: VERIFY YOUR DATA. The alpha version does not remove your original file. Packing creates file.logos, and restoring writes file.logos.restored.

Logos is an experimental "Structure-Aware" compression engine. You can run it from the Python or C++ source, or use the compiled binary.

Standard compressors (Gzip, Zstd) are Context-Free: they treat your data as a stream of bytes and know nothing about its format. Logos is Context-Aware. It finds the ordering principle (the Logos) behind the data, separating the Structure from the Content.

The Problem

SQL dumps, CSVs, and logs are 90% repetitive syntax (INSERT INTO, VALUES, TIMESTAMP). Standard compressors have to match these strings again on every row. Logos removes them from the data.
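To make the idea concrete, here is a minimal sketch of separating a repeated statement shape from its values. It is not taken from the Logos source: the names `INSERT_RE`, `split_structure`, and `rebuild` are hypothetical, and it assumes every line is a single-row `INSERT INTO table VALUES (...);` with no commas inside quoted strings. It also assumes the statement shape is kept once as a template so the file can be rebuilt.

```python
import re

# Hypothetical pattern: one single-row INSERT statement per line.
INSERT_RE = re.compile(r"^INSERT INTO (\w+) VALUES \((.*)\);$")


def split_structure(lines):
    """Return (template, rows): the shared statement shape and per-row values."""
    template = None
    rows = []
    for line in lines:
        match = INSERT_RE.match(line.strip())
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        table, values = match.groups()
        line_template = f"INSERT INTO {table} VALUES ({{}});"
        if template is None:
            template = line_template
        elif template != line_template:
            raise ValueError("this sketch handles one table only")
        # Naive split: assumes no commas inside quoted values.
        rows.append([value.strip() for value in values.split(",")])
    return template, rows


def rebuild(template, rows):
    """Recreate the statements from the template and the row values."""
    return [template.format(", ".join(row)) for row in rows]


dump = [
    "INSERT INTO users VALUES (10004, 'active');",
    "INSERT INTO users VALUES (10005, 'banned');",
]
template, rows = split_structure(dump)
# Transpose rows into columns so each column can be compressed by type.
columns = [list(column) for column in zip(*rows)]
```

In this sketch the template is stored a single time, and only the values remain to be compressed. Note that `rebuild` normalizes the whitespace between values, so a byte-exact round trip would need extra bookkeeping.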

The "Split-Stream" Architecture

For structured files (like SQL), Logos performs a "Semantic Transposition":

  1. Structure Stream: Discards the repetitive syntax entirely.
  2. Data Streams: Splits values by column and compresses them by type.
    • Integers (IDs, Dates): Compressed using Delta Encoding (stores +1 instead of 10005).
    • Enums (Status, Types): Compressed using Dictionary Encoding.
    • Text: Compressed using LZMA.
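The three column encodings above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Logos implementation: the function names are hypothetical, and the text column is joined with newlines, which assumes no value contains a newline.

```python
import lzma


def delta_encode(values):
    """Keep the first integer, then store each difference from the previous one."""
    if not values:
        return []
    deltas = [values[0]]
    for previous, current in zip(values, values[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the original integers with a running sum."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def dict_encode(values):
    """Replace each distinct value with a small integer code."""
    table = []   # code -> value
    index = {}   # value -> code
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(table)
            table.append(value)
        codes.append(index[value])
    return table, codes


def dict_decode(table, codes):
    """Look each code up in the table to recover the original values."""
    return [table[code] for code in codes]


def text_encode(values):
    """Compress a text column with LZMA (assumes no embedded newlines)."""
    return lzma.compress("\n".join(values).encode("utf-8"))


def text_decode(blob):
    """Decompress and split back into the original text values."""
    return lzma.decompress(blob).decode("utf-8").split("\n")


ids = [10004, 10005, 10006]
statuses = ["active", "banned", "active"]
names = ["Alice Example", "Bob Example", "Carol Example"]

encoded_ids = delta_encode(ids)
status_table, status_codes = dict_encode(statuses)
name_blob = text_encode(names)
```

Sequential IDs turn into a run of small, identical deltas, and a low-cardinality column turns into a short table plus small codes. Both are far easier for a general-purpose compressor to shrink than the original text.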

Benchmarks (Proof of Concept)

| File Type | Standard Gzip | Logos (Semantic) | Reduction |
| --- | --- | --- | --- |
| SQL Dump (100k rows) | 3.2 MB | 1.4 MB | ~56% |
| CSV Data | 1.5 MB | 0.7 MB | ~53% |

Usage

Packing (Compress):

python logos.py pack large_dump.sql
# Output: large_dump.sql.logos
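Because the alpha release asks you to verify your data, one way to check a round trip is to compare checksums of the original and the restored file. The helper below is a sketch and not part of Logos. The names `sha256_of` and `files_match` are hypothetical, and the restored file name in the usage comment assumes the file.logos.restored pattern from the alpha note.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(original_path, restored_path):
    """Return True when both files have identical contents."""
    return sha256_of(original_path) == sha256_of(restored_path)


# Example (file names are assumptions based on the alpha note):
# files_match("large_dump.sql", "large_dump.sql.logos.restored")
```

Only delete or archive the original once the check returns True.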
