
Conversation

@westonpace (Member) commented Jan 21, 2026

The current scheduler introduces too much synchronization overhead when we are in a high-IOPS situation. This scheduler reduces the number of asynchronous context switches. On my desktop it doesn't actually have much impact on performance. However, on a system with more cores and higher RAM bandwidth the new scheduler more than doubles the IOPS. Combined with the io_uring reader (coming in a future PR) the combined improvement is actually 4x.

This scheduler makes a tradeoff that is not ideal for the current readers, in particular the cloud readers, but which is important for the io_uring reader:

  • There is no dedicated background I/O loop thread, and tasks are not launched with tokio spawn. This is fine for the local filesystem since each task is its own spawn blocking call. However, for cloud stores this is not great. Because tasks are not spawned, they need to be polled occasionally. The end result is that I/O tasks which are not at the front of the line might get paused if queues fill up (for example, if the HTTP request queues fill up). With io_uring this isn't a problem because polling one task actually progresses all I/O requests (not just the one associated with the task).

TODO: Figure out concurrency backpressure again

@westonpace (Member, Author) commented

Drafting until I merge #5755

@westonpace marked this pull request as draft January 21, 2026 13:24
@github-actions (Contributor) commented

Code Review Summary

This PR introduces a lightweight scheduler implementation to reduce synchronization overhead in high-IOPS scenarios. The design is sound and the claimed 2-4x performance improvements are significant.

P0/P1 Issues

1. Concurrency throttle is ineffective (P0 - Bug)

In lite.rs:1662-1673, SimpleConcurrencyThrottle::try_acquire always returns true when concurrency_available > 0 but never decrements the counter:

```rust
fn try_acquire(&mut self) -> bool {
    if self.concurrency_available > 0 {
        // ...commented out...
        true  // Returns true but doesn't decrement
    } else {
        false
    }
}
```

Combined with release() being a no-op, this means the concurrency throttle provides no actual limiting. This appears intentional based on the TODO comment about deadlocks, but the current implementation allows unbounded concurrent I/O, which could cause resource exhaustion under load.
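For reference, a minimal sketch of what a functioning throttle could look like (only the `concurrency_available` field name comes from the PR; the rest is illustrative):

```rust
struct SimpleConcurrencyThrottle {
    /// Number of permits still available.
    concurrency_available: usize,
}

impl SimpleConcurrencyThrottle {
    /// Take one permit if any are available.
    fn try_acquire(&mut self) -> bool {
        if self.concurrency_available > 0 {
            self.concurrency_available -= 1;
            true
        } else {
            false
        }
    }

    /// Return a permit so a queued task can proceed.
    fn release(&mut self) {
        self.concurrency_available += 1;
    }
}
```

If the decrement was removed to work around a deadlock, that constraint deserves a comment next to the throttle so the no-op behavior is clearly intentional.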

2. PrioritiesInFlight insertion is O(n) (P1 - Performance)

In lite.rs:1718-1724, every push performs a binary search followed by an insert, making it O(n). With up to 256 concurrent tasks as mentioned in comments, this could add measurable overhead in the hot path. Consider using a BTreeMap<u128, usize> to track priority counts instead.
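A sketch of the suggested BTreeMap approach (type and method names are illustrative, not the PR's actual interface):

```rust
use std::collections::BTreeMap;

/// Tracks the priorities of in-flight tasks. push/remove are
/// O(log n), versus the O(n) Vec insert in the current code.
#[derive(Default)]
struct PrioritiesInFlight {
    /// priority -> number of in-flight tasks at that priority
    counts: BTreeMap<u128, usize>,
}

impl PrioritiesInFlight {
    fn push(&mut self, priority: u128) {
        *self.counts.entry(priority).or_insert(0) += 1;
    }

    fn remove(&mut self, priority: u128) {
        if let Some(count) = self.counts.get_mut(&priority) {
            *count -= 1;
            if *count == 0 {
                self.counts.remove(&priority);
            }
        }
    }

    /// Smallest priority value currently in flight.
    fn min_in_flight(&self) -> Option<u128> {
        self.counts.keys().next().copied()
    }
}
```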

3. Task cancellation leaks backpressure reservations (P1 - Bug)

In lite.rs:1527-1542, IoTask::cancel() creates a dummy BackpressureReservation with num_bytes: 0 regardless of whether the task had a real reservation. The comment in close() at line 2027-2028 acknowledges this is currently safe because the queue is local, but this is fragile if the design changes.
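One way to avoid the dummy value, sketched here with illustrative types since the actual definitions aren't shown, is to store the reservation as an `Option` and release whatever the task really held:

```rust
struct BackpressureReservation {
    num_bytes: u64,
}

struct IoTask {
    /// None if the task never acquired a real reservation.
    reservation: Option<BackpressureReservation>,
}

impl IoTask {
    fn cancel(&mut self) {
        // Dropping the real reservation (assuming its Drop impl
        // returns num_bytes to the throttle) releases exactly what
        // this task reserved; no zero-byte placeholder is needed.
        drop(self.reservation.take());
    }
}
```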

Minor Observations

  • The Reader trait change from async fn to a BoxFuture return is a reasonable approach to make the futures 'static for the lite scheduler (see the sketch after this list)
  • The serialized_scheduling option controlled via env var is a good debugging/testing escape hatch
  • Tests should be added specifically for the lite scheduler path (the existing tests only run with use_lite_scheduler: false)
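For context, the first bullet refers to a pattern along these lines (signatures are illustrative, not the PR's actual trait):

```rust
use futures::future::BoxFuture;
use std::ops::Range;

// An `async fn read(&self, ...)` returns an anonymous future that
// borrows `self` and therefore cannot be `'static`. Returning an
// owned, boxed future instead lets the lite scheduler store and poll
// it independently of the reader; implementations typically clone an
// inner Arc into the future.
trait Reader: Send + Sync {
    fn read(&self, range: Range<u64>) -> BoxFuture<'static, std::io::Result<Vec<u8>>>;
}
```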

Overall, this is a well-structured performance improvement. Addressing the concurrency throttle behavior (either fix it or document why unbounded is acceptable) would be the main blocker.
