perf: calculate cardinality lazily #5783

Xuanwo · 2026-01-22T12:04:36Z

This PR fixes #5714.

Cardinality calculation is slow, but it is only needed when building the dictionary. This PR computes it lazily, on first request, instead of eagerly up front.
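
The lazy approach can be sketched roughly as follows. This is a minimal illustration, not Lance's `FixedWidthDataBlock` code: `Block`, `Stat`, the `RwLock<HashMap<..>>` cache, and the `computations` counter are hypothetical names chosen for the sketch. Construction computes nothing, and cardinality is computed only on the first `get_stat` call and served from the cache afterwards.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical statistic key; only cardinality is modeled in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Stat {
    Cardinality,
}

/// Hypothetical fixed-width block: raw values plus a lazily filled stats cache.
pub struct Block {
    values: Vec<u64>,
    stats: RwLock<HashMap<Stat, u64>>,
    /// Number of times the expensive computation actually ran (demonstration only).
    pub computations: AtomicUsize,
}

impl Block {
    /// Construction stays cheap: no statistics are computed here.
    pub fn new(values: Vec<u64>) -> Self {
        Self {
            values,
            stats: RwLock::new(HashMap::new()),
            computations: AtomicUsize::new(0),
        }
    }

    /// Expensive part: count distinct values with a hash set.
    fn cardinality(&self) -> u64 {
        self.computations.fetch_add(1, Ordering::Relaxed);
        self.values.iter().collect::<HashSet<_>>().len() as u64
    }

    /// Returns a statistic, computing and caching it only on first request.
    pub fn get_stat(&self, stat: Stat) -> u64 {
        // Fast path: the statistic is already cached (read lock only).
        if let Some(v) = self.stats.read().unwrap().get(&stat).copied() {
            return v;
        }
        // Slow path: compute without holding a lock, then insert if still vacant.
        let computed = match stat {
            Stat::Cardinality => self.cardinality(),
        };
        let mut stats = self.stats.write().unwrap();
        *stats.entry(stat).or_insert(computed)
    }
}
```

For example, `Block::new(vec![1, 2, 2, 3, 3, 3])` does no counting until `get_stat(Stat::Cardinality)` is called; the first call returns 3 and later calls reuse the cached value without recomputing.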

| Benchmark | base | bad | good |
| --- | --- | --- | --- |
| BITMAP 5M low-float | 229.725 | 264.306 | 215.792 |
| BTREE 5M low-float | 216.628 | 340.683 | 222.974 |
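
Assuming these figures are timings in a common unit where lower is better (the unit is not stated), the relative change of each run against base works out roughly as:

```math
\begin{aligned}
\Delta &= \frac{t - t_{\text{base}}}{t_{\text{base}}} \\
\Delta_{\text{BITMAP, bad}} &= \frac{264.306 - 229.725}{229.725} \approx +15.1\% \\
\Delta_{\text{BITMAP, good}} &= \frac{215.792 - 229.725}{229.725} \approx -6.1\% \\
\Delta_{\text{BTREE, bad}} &= \frac{340.683 - 216.628}{216.628} \approx +57.3\% \\
\Delta_{\text{BTREE, good}} &= \frac{222.974 - 216.628}{216.628} \approx +2.9\%
\end{aligned}
```

Under that assumption, the "good" runs bring BITMAP slightly below base and BTREE to within about 3% of base.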

Parts of this PR were drafted with assistance from Codex (with gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.

github-actions · 2026-01-22T12:05:40Z

Code Review

Summary: The PR correctly implements lazy cardinality calculation to fix the performance regression in BITMAP/BTREE index building. The approach is sound.

P1: Minor inefficiency in `get_stat` implementation

In `FixedWidthDataBlock::get_stat()`, the computed cardinality is cloned twice when the entry is vacant, which is one clone more than necessary:

```rust
let computed = self.cardinality();
let mut block_info = self.block_info.0.write().unwrap();
Some(
    block_info
        .entry(stat)
        .or_insert_with(|| computed.clone()) // clone here
        .clone(), // and clone again here
)
```

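Since `computed` is an owned value that is not used afterwards, it can be moved into the map directly, so the `.clone()` inside `or_insert_with` is unnecessary when the entry doesn't exist. The trailing `.clone()` is still needed, because the returned value must outlive the write guard. Consider: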

```rust
let computed = self.cardinality();
let mut block_info = self.block_info.0.write().unwrap();
Some(block_info.entry(stat).or_insert(computed).clone())
```

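This avoids one clone when the stat wasn't already present. The entry API's `or_insert` only inserts `computed` if the key is vacant, so the behavior is equivalent but slightly more efficient. Unlike `or_insert_with`, `or_insert` takes an already-evaluated value, which costs nothing extra here because `computed` is produced before the lock is taken anyway.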
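
To make the clone count concrete, here is a self-contained sketch that is not the Lance code: `StatValue` is a hypothetical stand-in for the cached statistic whose `Clone` impl increments a counter, and the map is keyed by a plain string instead of the real stat key. On a vacant map, the original pattern clones twice and the suggested one clones once, and both return the same value.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Global counter incremented on every clone, so the two variants can be compared.
static CLONES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the cached statistic value.
#[derive(Debug, PartialEq)]
struct StatValue(u64);

impl Clone for StatValue {
    fn clone(&self) -> Self {
        CLONES.fetch_add(1, Ordering::SeqCst);
        StatValue(self.0)
    }
}

type Stats = HashMap<&'static str, StatValue>;

/// Original pattern: clone inside the closure, then clone again to return a copy.
fn double_clone(stats: &mut Stats, computed: StatValue) -> StatValue {
    stats.entry("cardinality").or_insert_with(|| computed.clone()).clone()
}

/// Suggested pattern: move `computed` into the map, clone once to return a copy.
fn single_clone(stats: &mut Stats, computed: StatValue) -> StatValue {
    stats.entry("cardinality").or_insert(computed).clone()
}

/// Runs `get` against an empty map; returns (result, number of clones performed).
fn count_clones(get: fn(&mut Stats, StatValue) -> StatValue) -> (StatValue, usize) {
    let mut stats = Stats::new();
    let before = CLONES.load(Ordering::SeqCst);
    let result = get(&mut stats, StatValue(42));
    (result, CLONES.load(Ordering::SeqCst) - before)
}
```

How much the saved clone matters depends on how expensive the real statistic type is to clone, which the review does not state.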

Note on race condition (non-issue)

The pattern of releasing the read lock before acquiring the write lock could theoretically allow another thread to compute the same stat concurrently. However, this is benign:

- The `entry().or_insert_with()` pattern ensures only one computed value is stored.
- Duplicate computations would produce identical results.
- The performance win from avoiding unnecessary computation on the happy path outweighs rare double-computation.
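
The benign race can be sketched as follows, assuming a simplified cache (`StatsCache`, `get_or_compute`, `distinct_count`, and `race` are hypothetical names, and the sketch uses the `or_insert` form suggested above). Several threads may miss under the read lock and compute the same value, but the write-locked entry insert keeps only the first one, so every caller sees the same result and the map ends up with a single entry.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical shared cache of statistics keyed by name.
struct StatsCache {
    map: RwLock<HashMap<&'static str, u64>>,
    /// How often the computation ran; may exceed 1 when threads race.
    computations: AtomicUsize,
}

impl StatsCache {
    /// Check under the read lock, compute with no lock held,
    /// then insert under the write lock only if the key is still vacant.
    fn get_or_compute(&self, key: &'static str, compute: impl FnOnce() -> u64) -> u64 {
        if let Some(v) = self.map.read().unwrap().get(&key).copied() {
            return v; // happy path: already cached
        }
        // The read lock is released here, so several threads may compute concurrently.
        self.computations.fetch_add(1, Ordering::SeqCst);
        let computed = compute();
        let mut map = self.map.write().unwrap();
        // The first writer stores its value; later writers get the stored one back.
        *map.entry(key).or_insert(computed)
    }
}

/// Deterministic stand-in for the cardinality computation.
fn distinct_count(values: &[u64]) -> u64 {
    values.iter().collect::<HashSet<_>>().len() as u64
}

/// Spawns `n` threads that request the same stat at once.
/// Returns (value seen by each thread, number of stored entries, number of computations).
fn race(n: usize) -> (Vec<u64>, usize, usize) {
    let cache = Arc::new(StatsCache {
        map: RwLock::new(HashMap::new()),
        computations: AtomicUsize::new(0),
    });
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                cache.get_or_compute("cardinality", || distinct_count(&[1, 2, 2, 3]))
            })
        })
        .collect();
    let results: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let entries = cache.map.read().unwrap().len();
    let computations = cache.computations.load(Ordering::SeqCst);
    (results, entries, computations)
}
```

How many threads actually compute depends on scheduling, so only the range from one up to `n` computations is guaranteed; the stored value and the values returned to callers are always the same.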

Tests: The two new tests adequately verify the lazy behavior.

Verdict: LGTM with the minor optimization suggestion above.

codecov · 2026-01-22T12:44:24Z

Codecov Report

❌ Patch coverage is 88.88889% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-encoding/src/statistics.rs | 88.88% | 4 Missing ⚠️ |


Commit 9253569: perf: calculate cardinality lazily

github-actions bot added the performance label Jan 22, 2026
