
Commit f856559

dfa1 and claude committed
feat(writer): pco E6 — IntMult mode (mode=1) with triple-GCD detection
Adds IntMult encoding mode to PcoEncodingEncoder. For integer dtypes, samples the chunk, computes triple-GCDs over disjoint triples, applies statistical filter (z-score > 3 vs. uniform-mod-GCD null), and verifies > 0.5 bits/element savings before switching from Classic.

When chosen:
  mult[i] = latent[i] / base
  adj[i] = latent[i] % base
Both streams encoded independently with bin DP + tANS; chunk meta carries mode=1, base, primary bins, secondary bins.

Wire format:
  page header: primary states (4), secondary states (4), align
  per batch: primary ANS, primary offsets, secondary ANS, secondary offsets

Refactored single-stream Classic page encoding into a StreamData helper shared between Classic and IntMult paths.

FloatMult/FloatQuant deferred: marginal gain over existing Classic+ALP cascade, significant algorithmic complexity (approx pair-GCD, false position root finder, trailing-zero detector).

Java→Rust integration test confirms wire format: values × 1000 in I64 column round-trips through Rust JNI decoder.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
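The mult/adj split described in the commit message can be illustrated with a minimal sketch. The class and method names below (`IntMultSplitSketch`, `split`, `join`) are hypothetical and are not taken from the commit. The sketch uses `Math.floorDiv` and `Math.floorMod` so that the remainder stays in `[0, base)` for negative inputs too; the commit does not say whether the encoder works on signed values or unsigned latents, so that choice is an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration of the IntMult decomposition; not the commit's code.
final class IntMultSplitSketch {

    // Splits each latent into a quotient (mult) and a remainder (adj) for a positive base.
    static long[][] split(long[] latents, long base) {
        long[] mult = new long[latents.length];
        long[] adj = new long[latents.length];
        for (int i = 0; i < latents.length; i++) {
            mult[i] = Math.floorDiv(latents[i], base); // primary stream
            adj[i] = Math.floorMod(latents[i], base);  // secondary stream, always in [0, base)
        }
        return new long[][] {mult, adj};
    }

    // Inverse of split: latent[i] = mult[i] * base + adj[i].
    static long[] join(long[] mult, long[] adj, long base) {
        long[] out = new long[mult.length];
        for (int i = 0; i < mult.length; i++) {
            out[i] = mult[i] * base + adj[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] latents = {0L, 1000L, 2500L, 7001L};
        long[][] parts = split(latents, 1000L);
        System.out.println("mult = " + Arrays.toString(parts[0]));
        System.out.println("adj  = " + Arrays.toString(parts[1]));
        System.out.println("round trip ok = "
                + Arrays.equals(join(parts[0], parts[1], 1000L), latents));
    }
}
```

When every value is an exact multiple of the base, the adj stream is all zeros, which is the case the integration test in this commit exercises.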
1 parent 14c98be commit f856559

5 files changed

Lines changed: 503 additions & 58 deletions


docs/adr/0007-pco-encode.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # ADR 0007: Pure-Java `vortex.pco` encoder
 
-- **Status:** Accepted (E0 gate cleared — E1 is next)
+- **Status:** Implemented (E1-E5, E7-E9 done; E6 partial: IntMult only; FloatMult/FloatQuant deferred as marginal vs. existing Classic+ALP cascade)
 - **Date:** 2026-06-13
 - **Deciders:** project maintainer
 - **Supersedes:**

integration/src/test/java/io/github/dfa1/vortex/integration/JavaWritesRustReadsIntegrationTest.java

Lines changed: 17 additions & 0 deletions
@@ -1454,6 +1454,23 @@ void javaWriter_rustReader_pco_i64_multiChunk(@TempDir Path tmp) throws IOExcept
         assertThat(decoded).containsExactly(data);
     }
 
+    @Test
+    void javaWriter_rustReader_pco_i64_intMult(@TempDir Path tmp) throws IOException {
+        // Given — values × 1000: triple-GCD detection picks base=1000 → mode=1
+        Path file = tmp.resolve("java_pco_i64_intmult.vtx");
+        long[] data = LongStream.range(0, 2000).map(i -> i * 1000L).toArray();
+        try (var ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
+             var sut = VortexWriter.create(ch, TS_SCHEMA, WriteOptions.defaults(),
+                     List.of(new PcoEncodingEncoder()))) {
+            // When
+            sut.writeChunk(Map.of("ts", data));
+        }
+
+        // Then
+        long[] decoded = readLongColumn(file, "ts");
+        assertThat(decoded).containsExactly(data);
+    }
+
     static Stream<long[]> pcoSequentialI64ArrayProvider() {
         // Sequential arrays exercise the Consecutive delta path (stride-1 and stride-N)
         return Stream.of(
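To see why the test data in `javaWriter_rustReader_pco_i64_intMult` should yield base=1000, here is a simplified sketch of triple-GCD candidate selection. It is not the encoder's implementation: the names (`TripleGcdSketch`, `tripleGcd`, `mostFrequentBase`) are hypothetical, it scans every disjoint triple instead of sampling, it takes the GCD of differences within each triple (an assumption about how the triple-GCD is defined), and it omits the z-score filter and the bits/element savings check described in the commit message. It also assumes the differences do not overflow `long`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.LongStream;

// Hypothetical, simplified triple-GCD base detection; not the commit's code.
final class TripleGcdSketch {

    // Euclid's algorithm on absolute values.
    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // GCD of the differences inside one triple (assumed definition).
    static long tripleGcd(long a, long b, long c) {
        return gcd(a - c, b - c);
    }

    // Tallies the GCD of each disjoint triple and returns the most frequent one above 1,
    // or 1 if no triple produced a usable candidate.
    static long mostFrequentBase(long[] values) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 < values.length; i += 3) {
            long g = tripleGcd(values[i], values[i + 1], values[i + 2]);
            if (g > 1) {
                counts.merge(g, 1, Integer::sum);
            }
        }
        long best = 1;
        int bestCount = 0;
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same data as the integration test: 0, 1000, 2000, ...
        long[] data = LongStream.range(0, 2000).map(i -> i * 1000L).toArray();
        System.out.println("candidate base = " + mostFrequentBase(data));
    }
}
```

For this input every disjoint triple has differences of 1000 and 2000, so each triple votes for 1000 and no competing candidate appears.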
