@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Jan 22, 2026

This PR fixes #5714.

Cardinality calculation is slow, but it is only needed when building the dictionary. This PR makes the calculation lazy, so it runs only when the statistic is actually requested.
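As a rough illustration of the lazy approach (not the code in this PR), the sketch below caches a statistic behind an RwLock and computes it only on the first lookup. Block, Stat, and the computations counter are hypothetical names chosen for the example; the real types in lance-encoding differ.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical stat key, standing in for the crate's own enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Stat {
    Cardinality,
}

/// Hypothetical data block that caches statistics behind a lock.
pub struct Block {
    values: Vec<u64>,
    stats: RwLock<HashMap<Stat, u64>>,
    /// Counts how often the expensive path runs (illustration only).
    computations: AtomicUsize,
}

impl Block {
    pub fn new(values: Vec<u64>) -> Self {
        Self {
            values,
            stats: RwLock::new(HashMap::new()),
            computations: AtomicUsize::new(0),
        }
    }

    /// Number of times the cardinality was actually computed.
    pub fn computations(&self) -> usize {
        self.computations.load(Ordering::SeqCst)
    }

    /// The expensive part: count the distinct values.
    fn compute_cardinality(&self) -> u64 {
        self.computations.fetch_add(1, Ordering::SeqCst);
        let mut distinct = self.values.clone();
        distinct.sort_unstable();
        distinct.dedup();
        distinct.len() as u64
    }

    /// Returns the cached stat, computing and storing it on first use.
    pub fn get_stat(&self, stat: Stat) -> Option<u64> {
        // Fast path: the read guard is dropped at the end of this statement.
        let cached = self.stats.read().unwrap().get(&stat).copied();
        if let Some(value) = cached {
            return Some(value);
        }
        // Slow path: compute outside the lock, then store under the write lock.
        let computed = match stat {
            Stat::Cardinality => self.compute_cardinality(),
        };
        let mut stats = self.stats.write().unwrap();
        Some(*stats.entry(stat).or_insert(computed))
    }
}
```

Constructing a Block does no cardinality work at all; the first get_stat(Stat::Cardinality) call pays the cost once, and later calls hit the cache.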

BITMAP 5M low-float: 229.725 (base) → 264.306 (bad) → 215.792 (good)
BTREE 5M low-float: 216.628 (base) → 340.683 (bad) → 222.974 (good)

Parts of this PR were drafted with assistance from Codex (with gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.

@github-actions
Contributor

Code Review

Summary: The PR correctly implements lazy cardinality calculation to fix the performance regression in BITMAP/BTREE index building. The approach is sound.

P1: Minor inefficiency in get_stat implementation

In FixedWidthDataBlock::get_stat(), the computed cardinality is cloned twice when the entry is vacant, once to insert it and once to return it:

let computed = self.cardinality();
let mut block_info = self.block_info.0.write().unwrap();
Some(
    block_info
        .entry(stat)
        .or_insert_with(|| computed.clone())  // clone here
        .clone(),  // and clone again here
)

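The closure only borrows computed, so it has to clone it to produce an owned value for the map. Since computed is not used afterwards, it can be moved into the entry instead, which makes the .clone() inside or_insert_with unnecessary. Consider: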

let computed = self.cardinality();
let mut block_info = self.block_info.0.write().unwrap();
Some(block_info.entry(stat).or_insert(computed).clone())

This avoids one clone when the stat wasn't already present. The entry API's or_insert will only insert computed if the key is vacant, so it's equivalent but slightly more efficient.
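To make the clone counts concrete, here is a small standalone sketch that instruments Clone with a counter. Counted and count_clones are hypothetical helpers written for this comparison, and a plain HashMap stands in for the locked block_info map.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the cached stat value; records every clone.
pub struct Counted {
    pub value: u64,
    clones: Arc<AtomicUsize>,
}

impl Clone for Counted {
    fn clone(&self) -> Self {
        self.clones.fetch_add(1, Ordering::SeqCst);
        Self {
            value: self.value,
            clones: Arc::clone(&self.clones),
        }
    }
}

/// Counts how many times the value is cloned for one lookup.
/// `move_in` selects `or_insert(computed)` over `or_insert_with(|| computed.clone())`.
/// `prefilled` simulates another caller having stored the stat already.
pub fn count_clones(move_in: bool, prefilled: bool) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let make = |value| Counted {
        value,
        clones: Arc::clone(&counter),
    };

    let mut map: HashMap<&str, Counted> = HashMap::new();
    if prefilled {
        map.insert("cardinality", make(3));
    }

    let computed = make(3);
    let returned = if move_in {
        // Moves `computed` into the map if vacant, drops it if occupied.
        map.entry("cardinality").or_insert(computed).clone()
    } else {
        // Clones `computed` into the map if vacant, then clones again to return.
        map.entry("cardinality")
            .or_insert_with(|| computed.clone())
            .clone()
    };
    assert_eq!(returned.value, 3);

    counter.load(Ordering::SeqCst)
}
```

With a vacant entry, the closure variant clones twice and the move variant clones once. With an occupied entry, both clone exactly once, so the two forms differ only on the first insertion.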

Note on race condition (non-issue)

The pattern of releasing the read lock before acquiring the write lock could theoretically allow another thread to compute the same stat concurrently. However, this is benign:

  1. The entry().or_insert_with() pattern ensures only the first computed value is stored; later writers find the entry occupied and reuse it
  2. Duplicate computations would produce identical results
  3. The performance win from avoiding unnecessary computation on the happy path outweighs rare double-computation
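The three points above can be checked with a small sketch, assuming a read-then-write locking pattern like the one described. The function concurrent_lookups, the "cardinality" key, and the sample data are hypothetical and exist only for this example.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;
use std::thread;

/// Runs `threads` concurrent lookups of one lazily computed stat.
/// Returns (value seen by each thread, entries stored, times computed).
pub fn concurrent_lookups(threads: usize) -> (Vec<u64>, usize, usize) {
    let data: Vec<u64> = vec![1, 2, 2, 3, 3, 3];
    let cache: RwLock<HashMap<&'static str, u64>> = RwLock::new(HashMap::new());
    let computed_count = AtomicUsize::new(0);

    let seen: Vec<u64> = thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                s.spawn(|| {
                    // Read lock is released at the end of this statement.
                    let cached = cache.read().unwrap().get("cardinality").copied();
                    if let Some(value) = cached {
                        return value;
                    }
                    // Several threads may reach this point at the same time.
                    computed_count.fetch_add(1, Ordering::SeqCst);
                    let mut distinct = data.clone();
                    distinct.sort_unstable();
                    distinct.dedup();
                    let computed = distinct.len() as u64;

                    // Only the first writer stores; the rest read the stored value.
                    let mut guard = cache.write().unwrap();
                    let stored = *guard.entry("cardinality").or_insert(computed);
                    stored
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    let stored_entries = cache.read().unwrap().len();
    (seen, stored_entries, computed_count.load(Ordering::SeqCst))
}
```

However the threads interleave, every caller sees the same value and the map ends up with a single entry. The computation count can exceed one, which is the rare duplicate work the note accepts.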

Tests: The two new tests adequately verify the lazy behavior.

Verdict: LGTM with the minor optimization suggestion above.

@codecov

codecov bot commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-encoding/src/statistics.rs | 88.88% | 4 Missing ⚠️ |


Development

Successfully merging this pull request may close these issues.

Build time regression for BITMAP and BTREE index
