Add CompactionProgressListener to OnDiskGraphIndexCompactor #7

Merged
eolivelli merged 1 commit into main from issue-530-compaction-progress-listener on May 11, 2026

@eolivelli (Owner)

Summary

  • Introduce CompactionProgressListener, a dedicated @FunctionalInterface in the io.github.jbellis.jvector.graph.disk package whose single method, onProgress(long completedBatches, long totalBatches), is called every ten batches (and at the final batch of each level) during streaming compaction.
  • Add a compact(Path, CompactionProgressListener) overload to OnDiskGraphIndexCompactor; the existing compact(Path) delegates to it with null, so the change is fully backward-compatible.
  • Thread the listener through compactLevels() → runBatchesWithBackpressure(), alongside the existing % 10 SLF4J log line.
  • Also fire at completed == total so callers always see a 100% notification at level completion (previously the final batch was only logged if total % 10 == 0).
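The notification cadence described above (every ten batches, plus the final batch of each level) can be sketched as follows. This is a minimal, self-contained illustration, not the jvector code. The interface is a local copy of the signature quoted in the summary. `BatchProgressSketch`, `runLevel` and `NOTIFY_EVERY` are hypothetical names standing in for the real `runBatchesWithBackpressure()` loop, which also performs I/O and backpressure.

```java
import java.util.ArrayList;
import java.util.List;

// Local copy of the listener signature described in the PR; the real
// interface lives in io.github.jbellis.jvector.graph.disk.
@FunctionalInterface
interface CompactionProgressListener {
    void onProgress(long completedBatches, long totalBatches);
}

// Hypothetical sketch of one level's batch loop; not the jvector implementation.
final class BatchProgressSketch {
    // Cadence taken from the PR description: report every 10 batches.
    static final int NOTIFY_EVERY = 10;

    /** Runs totalBatches placeholder batches and reports progress to the listener. */
    static void runLevel(long totalBatches, CompactionProgressListener listener) {
        for (long completed = 1; completed <= totalBatches; completed++) {
            // ... the real loop would write batch `completed` to disk here ...
            boolean periodic = completed % NOTIFY_EVERY == 0;
            boolean last = completed == totalBatches;
            // A single || ensures a final batch that is also a multiple of 10 fires once.
            if (listener != null && (periodic || last)) {
                listener.onProgress(completed, totalBatches);
            }
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        runLevel(25, (done, total) -> seen.add(done + "/" + total));
        System.out.println(seen); // prints [10/25, 20/25, 25/25]
    }
}
```

The `last` check closes the gap described in the final bullet. With `% 10` alone, a level of 25 batches would stop reporting at 20/25.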

Motivation

HerdDB issue eolivelli/herddb#530: the index-optimizer's GET /status endpoint returned batches_written: 0 / pct_complete: 0.00 throughout multi-minute streaming compaction runs because there was no callback to propagate the per-batch counters back to the HTTP status object. Parsing the SLF4J log messages from the caller side would have been fragile; a typed interface is the right abstraction.

Test plan

🤖 Implemented by the pr-worker agent.

Introduce a dedicated CompactionProgressListener functional interface
and a compact(Path, CompactionProgressListener) overload so callers
can track streaming compaction I/O progress without parsing log messages.

The listener is called every ten batches (and at level completion) with
(completedBatches, totalBatches) for each graph level processed. The
existing compact(Path) overload delegates with a null listener, preserving
full backward compatibility.

Motivated by HerdDB issue eolivelli/herddb#530: the index-optimizer's GET /status
endpoint showed batches_written=0 / pct_complete=0 throughout multi-minute
streaming compaction runs because there was no hook to propagate the
per-batch progress counters back to the HTTP status object.
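A minimal sketch of the backward-compatible overload pattern, assuming a simplified compactor. `CompactorSketch` is a hypothetical stand-in for `OnDiskGraphIndexCompactor`, and replacing a `null` listener with a shared no-op is one possible design, not necessarily what the merged commit does (the PR only states that `compact(Path)` passes `null`).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Local copy of the listener signature described in the PR.
@FunctionalInterface
interface CompactionProgressListener {
    void onProgress(long completedBatches, long totalBatches);
}

// Hypothetical stand-in for OnDiskGraphIndexCompactor: it shows only the
// overload pattern, not the real compaction logic or exception signature.
class CompactorSketch {
    // Assumed design: a shared no-op so the batch loop needs no null checks.
    private static final CompactionProgressListener NO_OP = (done, total) -> { };

    /** Pre-existing entry point: signature unchanged, delegates with null. */
    public void compact(Path output) {
        compact(output, null);
    }

    /** New overload: callers that want progress pass a listener. */
    public void compact(Path output, CompactionProgressListener listener) {
        CompactionProgressListener l = (listener != null) ? listener : NO_OP;
        long totalBatches = 3; // placeholder; a real compactor derives this from the graph
        for (long done = 1; done <= totalBatches; done++) {
            // ... a real compactor would write batch `done` to `output` here ...
        }
        l.onProgress(totalBatches, totalBatches); // only the completion signal, for brevity
    }

    public static void main(String[] args) {
        CompactorSketch c = new CompactorSketch();
        c.compact(Paths.get("graph.compacted")); // old callers: no progress, no NPE
        c.compact(Paths.get("graph.compacted"),
                (done, total) -> System.out.println(done + "/" + total));
    }
}
```

Substituting the no-op once at the entry point keeps the per-batch loop free of null checks. The alternative is a `listener != null` guard at each call site.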
eolivelli merged commit 9b01846 into main on May 11, 2026
4 of 10 checks passed
eolivelli added a commit to eolivelli/herddb that referenced this pull request May 12, 2026
…progress callback (#531)

Fixes #530.

## Root cause

`GET /status` on the index-optimizer always returned `batches_written: 0`,
`batches_total: 0`, and `pct_complete: 0.00` throughout a streaming
compaction because `RemoteSegmentGraphMerger.mergeStreaming()` never
forwarded the `batchListener` field to `OnDiskGraphIndexCompactor`. The
legacy in-memory path (`mergeLegacy`) already wired the callback to
`buildGraph()`; the streaming path had no equivalent plumbing.

## Changes

- **`RemoteSegmentGraphMerger.java`**
  - On entering the `"compacting"` phase, fires an initial
    `(0, keptCount)` notification so `/status` shows a non-zero
    denominator immediately rather than only after the first batch
    completes.
  - Creates a `CompactionProgressListener` that delegates to the
    `batchListener` and passes it to the new
    `OnDiskGraphIndexCompactor.compact(Path, CompactionProgressListener)`
    overload (added in `eolivelli/jvector#7`, now on `eolivelli/jvector`
    main).
  - Introduces a `fireBatchProgress(LongBinaryOperator, long, long)` static
    helper annotated with `@SuppressFBWarnings` to bridge the JDK's
    `LongBinaryOperator`, whose `applyAsLong` returns a `long`, to the typed
    `void onProgress()` contract without SpotBugs false positives.
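The HerdDB-side bridging can be sketched in plain Java. Everything here is hypothetical except the JDK's `java.util.function.LongBinaryOperator`. `MergerBridgeSketch`, `adapt`, and the null guard in `fireBatchProgress` are illustrative assumptions. The `@SuppressFBWarnings` annotation is omitted because it requires the SpotBugs annotations dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

// Local copy of the jvector listener signature described in PR #7.
@FunctionalInterface
interface CompactionProgressListener {
    void onProgress(long completedBatches, long totalBatches);
}

// Hypothetical sketch of the merger-side plumbing; not the HerdDB source.
final class MergerBridgeSketch {
    /**
     * Mirrors the described helper: applyAsLong returns a long that progress
     * reporting has no use for, so the result is deliberately discarded.
     * The null guard is an assumption added for this sketch.
     */
    static void fireBatchProgress(LongBinaryOperator batchListener, long completed, long total) {
        if (batchListener != null) {
            batchListener.applyAsLong(completed, total); // return value intentionally ignored
        }
    }

    /** Adapts the merger's LongBinaryOperator field to the jvector listener type. */
    static CompactionProgressListener adapt(LongBinaryOperator batchListener) {
        return (completed, total) -> fireBatchProgress(batchListener, completed, total);
    }

    public static void main(String[] args) {
        List<String> status = new ArrayList<>();
        LongBinaryOperator batchListener = (done, total) -> {
            status.add(done + "/" + total);
            return 0L; // value unused by the caller
        };
        long keptCount = 600;
        // Initial notification on entering the "compacting" phase,
        // so the status denominator is non-zero from the start.
        fireBatchProgress(batchListener, 0, keptCount);
        CompactionProgressListener listener = adapt(batchListener);
        listener.onProgress(10, 42); // as the compactor would report a batch milestone
        System.out.println(status); // prints [0/600, 10/42]
    }
}
```

Routing every call through one helper confines the discarded `applyAsLong` result to a single place. This is presumably why the commit puts the SpotBugs suppression on that helper rather than on each lambda.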

## Tests

- **`StreamingCompactionBatchProgressTest`** (new plain test; no cluster
  infrastructure required): builds 3 real on-disk segments via
  `PersistentVectorStore`, enables
  `VectorIndexCompactor.streamingCompactionEnabled`, runs a full streaming
  merge, and asserts that:
  1. An initial `(0, keptCount=600)` notification fires as soon as the
     `"compacting"` phase begins, so the denominator is non-zero from the
     first instant.
  2. At least one `(completed ≥ 10, total > 0)` notification arrives from
     `OnDiskGraphIndexCompactor`, proving that the jvector callback actually
     fires.
  3. A final `(completed == total)` notification marks level completion
     (the 100% signal).
  4. `completed` values are monotonically non-decreasing within a level.
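Assertions 3 and 4 can be checked with a small recording listener. This is a hypothetical helper (`RecordingListener`), not the code of `StreamingCompactionBatchProgressTest`. It assumes levels are reported one after another and that each level ends with a `completed == total` event, as the jvector summary states.

```java
import java.util.ArrayList;
import java.util.List;

// Local copy of the jvector listener signature described in PR #7.
@FunctionalInterface
interface CompactionProgressListener {
    void onProgress(long completedBatches, long totalBatches);
}

/** Hypothetical test helper that records every (completed, total) notification. */
final class RecordingListener implements CompactionProgressListener {
    final List<long[]> events = new ArrayList<>();

    // Synchronized in case the compactor reports from a background thread (unverified).
    @Override
    public synchronized void onProgress(long completedBatches, long totalBatches) {
        events.add(new long[] {completedBatches, totalBatches});
    }

    /** True if completed never decreases within a level; a completed == total event ends a level. */
    synchronized boolean monotonicWithinLevel() {
        long last = -1;
        for (long[] e : events) {
            if (e[0] < last) {
                return false;
            }
            last = (e[0] == e[1]) ? -1 : e[0]; // reset after the 100% event
        }
        return true;
    }

    /** True if at least one event reports completed == total with a non-zero total. */
    synchronized boolean sawCompletion() {
        for (long[] e : events) {
            if (e[1] > 0 && e[0] == e[1]) {
                return true;
            }
        }
        return false;
    }
}

final class RecordingListenerDemo {
    public static void main(String[] args) {
        RecordingListener r = new RecordingListener();
        // Simulated stream: initial HerdDB event, then two levels from the compactor.
        long[][] simulated = {{0, 600}, {10, 25}, {20, 25}, {25, 25}, {10, 12}, {12, 12}};
        for (long[] e : simulated) {
            r.onProgress(e[0], e[1]);
        }
        System.out.println(r.monotonicWithinLevel() + " " + r.sawCompletion()); // prints true true
    }
}
```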

🤖 Implemented by the `pr-worker` agent.