Conversation

@BubbleCal
Contributor

less contention to improve indexing perf (+3%)

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@github-actions
Contributor

Code Review

Summary: This PR optimizes inverted index building by processing posting lists on a CPU pool with a channel-based producer-consumer pattern instead of buffered async streams.

P0/P1 Issues

P1: Missing error propagation from producer task

In builder.rs, when the channel's rx.recv() returns None, the loop exits silently without checking whether the producer hit an error before completing. The producer.await? at the end will catch panics, but if the producer sent an error batch and then exited cleanly, correctness depends on the consumer actually processing that error batch.

The current code does handle this correctly (error batches are still received and propagated via batch?), but consider adding a comment to clarify this intentional behavior, as the control flow is subtle.
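The control flow in question can be sketched with std threads and channels (a minimal stand-in for the PR's actual async runtime and CPU pool; the names `build` and the batch type are hypothetical):

```rust
use std::sync::mpsc;
use std::thread;

// Sketch of the subtle flow the review describes: the producer sends
// `Result` batches, so a producer-side error travels through the channel
// and is surfaced by the consumer's `batch?` -- even though the producer
// thread itself exits cleanly after sending it.
fn build() -> Result<u32, String> {
    let (tx, rx) = mpsc::channel::<Result<u32, String>>();

    let producer = thread::spawn(move || {
        for i in 0..3u32 {
            // Hypothetical failure on the last batch.
            let batch = if i == 2 {
                Err(format!("batch {i} failed"))
            } else {
                Ok(i)
            };
            if tx.send(batch).is_err() {
                break; // consumer hung up
            }
        }
        // tx is dropped here; the consumer's recv() then returns Err
        // and its loop ends.
    });

    let mut total: u32 = 0;
    while let Ok(batch) = rx.recv() {
        // `?` propagates the error batch; note this early return skips
        // the join below, mirroring how `producer.await?` at the end
        // would never run in that case.
        total += batch?;
    }

    producer.join().map_err(|_| "producer panicked".to_string())?;
    Ok(total)
}

fn main() {
    assert_eq!(build(), Err("batch 2 failed".to_string()));
}
```

This is why the review calls the behavior correct but subtle: the error reaches the caller through the channel, not through the final join on the producer.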

P1: Potential issue with length variable shadowing in to_batch_with_docs

In index.rs:1976, length holds the posting list length and is passed to build_batch, but build_batch never uses the caller's value; it shadows it with self.len() on line 1919. This works correctly, yet the local length on line 1976 is effectively unused. Consider removing it or using self.len() directly in the idf() call.
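The shadowing in question looks roughly like this (a contrived sketch, not the actual index.rs code; `PostingList` and `build_batch` here are simplified stand-ins):

```rust
// Sketch of the shadowing pattern the review flags: `build_batch`
// immediately rebinds `length` from its own state, so the value the
// caller computed is never read inside the method.
struct PostingList {
    doc_ids: Vec<u32>,
}

impl PostingList {
    fn len(&self) -> usize {
        self.doc_ids.len()
    }

    #[allow(unused_variables)]
    fn build_batch(&self, length: usize) -> usize {
        // Shadows the parameter: from here on, `length` is self.len(),
        // regardless of what the caller passed in.
        let length = self.len();
        length
    }
}

fn main() {
    let pl = PostingList { doc_ids: vec![1, 2, 3] };
    // The caller's value (99) is ignored; the result is self.len() == 3.
    assert_eq!(pl.build_batch(99), 3);
}
```

Rust allows this rebinding without warning, which is why the harmless-but-confusing parameter is worth cleaning up.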

Minor Observations (not blocking)

  • The commented-out channel_capacity line should be removed if not needed
  • Good test coverage added that verifies equivalence between the two batch building methods

Overall the approach looks sound for reducing contention. The channel-based pattern should indeed help by keeping CPU-bound work off the async runtime.
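The overall pattern can be sketched with std threads and a bounded channel standing in for the crate's actual CPU pool and async runtime (`process` and its parameters are hypothetical):

```rust
use std::sync::mpsc;
use std::thread;

// Rough sketch of the pattern the PR adopts: CPU-bound encoding runs on
// dedicated worker threads, and a bounded channel feeds finished batches
// back so the consumer (the async task in the real code) only does cheap
// receives instead of blocking the runtime on heavy work.
fn process(workers: u64, items_per_worker: u64) -> u64 {
    // Bounded capacity provides backpressure on the producers.
    let (tx, rx) = mpsc::sync_channel::<u64>(4);

    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let tx = tx.clone();
            thread::spawn(move || {
                // Stand-in for CPU-bound posting-list encoding.
                let encoded: u64 =
                    (0..items_per_worker).map(|i| w * items_per_worker + i).sum();
                tx.send(encoded).unwrap();
            })
        })
        .collect();
    drop(tx); // close the channel once every worker holds its own sender

    // The consumer just drains the channel until all senders are gone.
    let total: u64 = rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    total
}

fn main() {
    // 4 workers over items 0..3999 sum to 3999 * 4000 / 2.
    assert_eq!(process(4, 1000), 7_998_000);
}
```

Dropping the original sender after spawning is the detail that lets the consumer's loop terminate naturally once every worker finishes.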

@BubbleCal BubbleCal changed the title to perf: use cpu pool to process all posting lists on Jan 22, 2026
@codecov
codecov bot commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 91.87500% with 13 lines in your changes missing coverage. Please review.

Files with missing lines                           Patch %   Lines
rust/lance-index/src/scalar/inverted/encoding.rs   87.50%    0 Missing and 9 partials ⚠️
rust/lance-index/src/scalar/inverted/builder.rs    84.61%    2 Missing ⚠️
rust/lance-index/src/scalar/inverted/index.rs      97.33%    2 Missing ⚠️


let docs_for_batches = docs.clone();
let schema_for_batches = schema.clone();

// let channel_capacity = get_num_compute_intensive_cpus().max(1);
Collaborator

Do we need to remove this?

Contributor Author

removed

where
F: FnMut(u32, u32) -> f32,
{
debug_assert!(length > 0);
Collaborator

Better to provide more information about this assert.

Contributor Author

added
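The reviewer's suggestion amounts to attaching a message to the assertion. A small illustration (the `idf` helper and its formula here are hypothetical, not the actual text added in the PR):

```rust
// Sketch of a debug_assert! with a message, so a failure explains itself
// instead of just reporting `length > 0` was false.
fn idf(length: usize, num_docs: usize) -> f32 {
    debug_assert!(
        length > 0,
        "posting list must be non-empty to compute idf (num_docs = {num_docs})"
    );
    ((num_docs as f32 + 1.0) / (length as f32 + 0.5)).ln()
}

fn main() {
    let v = idf(10, 100);
    assert!(v > 0.0);
}
```

Like all `debug_assert!` checks, this compiles away in release builds, so the message costs nothing in production.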

@BubbleCal BubbleCal requested a review from Xuanwo January 23, 2026 09:31
@BubbleCal BubbleCal merged commit 2e6adce into main Jan 23, 2026
33 checks passed
@BubbleCal BubbleCal deleted the yang/optimize-write-postings branch January 23, 2026 09:51