
Conversation

@westonpace (Member) commented Jan 21, 2026

The current scheduler introduces too much synchronization overhead when we are in a high-IOPS situation. This scheduler reduces the number of asynchronous context switches. On my desktop it doesn't actually have much impact on performance. However, on a system with more cores and higher RAM bandwidth the new scheduler more than doubles the IOPS. Combined with the io_uring reader (coming in a future PR) the combined improvement is actually 4x.

This scheduler makes a tradeoff that is not ideal for the current readers, in particular the cloud readers, but which is important for the io_uring reader:

  • There is no dedicated background I/O loop thread, and tasks are not launched with tokio spawn. This is fine for the local filesystem since each task is its own spawn blocking call. However, for cloud stores this is not great. Because tasks are not spawned, they need to be polled occasionally. The end result is that I/O tasks which are not at the front of the line might get paused if queues fill up (for example, if the HTTP request queues fill up). With io_uring this isn't a problem because polling one task actually progresses all I/O requests (not just the one associated with the task).

TODO: Figure out concurrency backpressure again

@westonpace (Member, Author) commented

Drafting until I merge #5755

@westonpace marked this pull request as draft January 21, 2026 13:24
@github-actions (Contributor) commented

Code Review Summary

This PR introduces a lightweight scheduler implementation to reduce synchronization overhead in high-IOPS scenarios. The design is sound and the claimed 2-4x performance improvements are significant.

P0/P1 Issues

1. Concurrency throttle is ineffective (P0 - Bug)

In lite.rs:1662-1673, SimpleConcurrencyThrottle::try_acquire always returns true when concurrency_available > 0 but never decrements the counter:

```rust
fn try_acquire(&mut self) -> bool {
    if self.concurrency_available > 0 {
        // ...commented out...
        true  // Returns true but doesn't decrement
    } else {
        false
    }
}
```

Combined with release() being a no-op, this means the concurrency throttle provides no actual limiting. This appears intentional based on the TODO comment about deadlocks, but the current implementation allows unbounded concurrent I/O, which could cause resource exhaustion under load.
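For reference, a minimal sketch of what a functioning throttle could look like (only the `concurrency_available` field name comes from the PR; the rest is illustrative):

```rust
struct SimpleConcurrencyThrottle {
    /// Number of permits still available.
    concurrency_available: usize,
}

impl SimpleConcurrencyThrottle {
    /// Take one permit if any are available.
    fn try_acquire(&mut self) -> bool {
        if self.concurrency_available > 0 {
            self.concurrency_available -= 1;
            true
        } else {
            false
        }
    }

    /// Return a permit so a queued task can proceed.
    fn release(&mut self) {
        self.concurrency_available += 1;
    }
}
```

If the decrement was removed to work around a deadlock, that constraint deserves a comment next to the throttle so the no-op behavior is clearly intentional.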

2. PrioritiesInFlight insertion is O(n) (P1 - Performance)

In lite.rs:1718-1724, every push performs a binary search followed by an insert, making it O(n). With up to 256 concurrent tasks as mentioned in comments, this could add measurable overhead in the hot path. Consider using a BTreeMap<u128, usize> to track priority counts instead.
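A sketch of the suggested BTreeMap approach (type and method names are illustrative, not the PR's actual interface):

```rust
use std::collections::BTreeMap;

/// Tracks the priorities of in-flight tasks. push/remove are
/// O(log n), versus the O(n) Vec insert in the current code.
#[derive(Default)]
struct PrioritiesInFlight {
    /// priority -> number of in-flight tasks at that priority
    counts: BTreeMap<u128, usize>,
}

impl PrioritiesInFlight {
    fn push(&mut self, priority: u128) {
        *self.counts.entry(priority).or_insert(0) += 1;
    }

    fn remove(&mut self, priority: u128) {
        if let Some(count) = self.counts.get_mut(&priority) {
            *count -= 1;
            if *count == 0 {
                self.counts.remove(&priority);
            }
        }
    }

    /// Smallest priority value currently in flight.
    fn min_in_flight(&self) -> Option<u128> {
        self.counts.keys().next().copied()
    }
}
```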

3. Task cancellation leaks backpressure reservations (P1 - Bug)

In lite.rs:1527-1542, IoTask::cancel() creates a dummy BackpressureReservation with num_bytes: 0 regardless of whether the task had a real reservation. The comment in close() at line 2027-2028 acknowledges this is currently safe because the queue is local, but this is fragile if the design changes.
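One way to avoid the dummy value, sketched here with illustrative types since the actual definitions aren't shown, is to store the reservation as an `Option` and release whatever the task really held:

```rust
struct BackpressureReservation {
    num_bytes: u64,
}

struct IoTask {
    /// None if the task never acquired a real reservation.
    reservation: Option<BackpressureReservation>,
}

impl IoTask {
    fn cancel(&mut self) {
        // Dropping the real reservation (assuming its Drop impl
        // returns num_bytes to the throttle) releases exactly what
        // this task reserved; no zero-byte placeholder is needed.
        drop(self.reservation.take());
    }
}
```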

Minor Observations

  • The Reader trait change from async fn to a BoxFuture return is a reasonable approach to make the futures 'static for the lite scheduler (see the sketch after this list)
  • The serialized_scheduling option controlled via env var is a good debugging/testing escape hatch
  • Tests should be added specifically for the lite scheduler path (the existing tests only run with use_lite_scheduler: false)
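For context, the first bullet refers to a pattern along these lines (signatures are illustrative, not the PR's actual trait):

```rust
use futures::future::BoxFuture;
use std::ops::Range;

// An `async fn read(&self, ...)` returns an anonymous future that
// borrows `self` and therefore cannot be `'static`. Returning an
// owned, boxed future instead lets the lite scheduler store and poll
// it independently of the reader; implementations typically clone an
// inner Arc into the future.
trait Reader: Send + Sync {
    fn read(&self, range: Range<u64>) -> BoxFuture<'static, std::io::Result<Vec<u8>>>;
}
```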

Overall, this is a well-structured performance improvement. Addressing the concurrency throttle behavior (either fix it or document why unbounded is acceptable) would be the main blocker.
