perf: add multiplexing performance tests for AsyncMultiRangeDownloader #16501

Open
zhixiangli wants to merge 1 commit into googleapis:main from zhixiangli:zhixiangli/perf-multiplexing-downloader

@zhixiangli zhixiangli commented Apr 1, 2026

Overview

This PR introduces new microbenchmarks to measure and expose the performance bottleneck caused by lock contention in the AsyncMultiRangeDownloader. It provides a concrete way to compare the previous serialized implementation against the new multiplexed architecture.

Before vs. After: The Performance Gap

Before (Serialized via Lock)

In the previous implementation, download_ranges used a shared lock to prevent concurrent access to the bidirectional gRPC stream. Even with multiple coroutines, only one could own the stream at a time: its entire download cycle (send requests, then receive all data) had to complete before another task could start.

Execution Flow:

sequenceDiagram
    participant C1 as Coroutine 1
    participant C2 as Coroutine 2
    participant S as gRPC Stream

    C1->>C1: Acquire Lock
    C1->>S: Send Requests
    S-->>C1: Receive Data (Streaming...)
    S-->>C1: End of Range
    C1->>C1: Release Lock
    
    Note over C2: Waiting for Lock...
    
    C2->>C2: Acquire Lock
    C2->>S: Send Requests
    S-->>C2: Receive Data (Streaming...)
    S-->>C2: End of Range
    C2->>C2: Release Lock
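
The serialized flow above can be reproduced with a toy asyncio example. This is only a sketch: `download_serialized` and `stream_log` are hypothetical names, and `asyncio.sleep(0)` stands in for a network read; none of it is the downloader's actual code.

```python
import asyncio

async def download_serialized(lock, stream_log, name, num_chunks):
    # Hold the lock for the whole send -> receive-all cycle,
    # so no other coroutine can touch the "stream" in between.
    async with lock:
        stream_log.append(f"{name}:send")
        for i in range(num_chunks):
            await asyncio.sleep(0)  # yield control, as a real network read would
            stream_log.append(f"{name}:recv{i}")

async def run_serialized():
    lock = asyncio.Lock()
    stream_log = []
    await asyncio.gather(
        download_serialized(lock, stream_log, "C1", 2),
        download_serialized(lock, stream_log, "C2", 2),
    )
    return stream_log

log = asyncio.run(run_serialized())
print(log)
```

Although C1 yields control on every read, C2 cannot make progress until C1 releases the lock, so the log contains all C1 entries followed by all C2 entries.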

After (Multiplexed Concurrent)

With the introduction of the _StreamMultiplexer, multiple coroutines can now share the same stream concurrently. Requests are interleaved, and a background receiver loop routes incoming data to the correct task using read_id.

Execution Flow:

sequenceDiagram
    participant C1 as Coroutine 1
    participant C2 as Coroutine 2
    participant M as Multiplexer
    participant S as gRPC Stream

    C1->>M: Send Requests
    M->>S: Forward Req 1
    C2->>M: Send Requests
    M->>S: Forward Req 2
    
    Note over C1,C2: Tasks wait on their own queues
    
    S-->>M: Data for C1
    M-->>C1: Route to Q1
    S-->>M: Data for C2
    M-->>C2: Route to Q2
    S-->>M: Data for C1
    M-->>C1: Route to Q1
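
The routing idea can be illustrated with a minimal sketch. `ToyMultiplexer` is a hypothetical class, not the PR's `_StreamMultiplexer`; an `asyncio.Queue` stands in for the gRPC stream, and messages are assumed to be `(read_id, payload)` tuples with `None` marking the end of the stream.

```python
import asyncio

class ToyMultiplexer:
    """Routes (read_id, payload) messages from one shared stream to per-task queues."""

    def __init__(self, stream):
        self._stream = stream   # asyncio.Queue standing in for the gRPC stream
        self._queues = {}       # read_id -> asyncio.Queue

    def register(self, read_id):
        queue = asyncio.Queue()
        self._queues[read_id] = queue
        return queue

    async def receive_loop(self):
        # Single background reader: the only coroutine that reads the stream.
        while True:
            message = await self._stream.get()
            if message is None:  # sentinel: stream closed, wake every waiter
                for queue in self._queues.values():
                    queue.put_nowait(None)
                return
            read_id, payload = message
            self._queues[read_id].put_nowait(payload)

async def consume(queue):
    # Each task waits only on its own queue.
    chunks = []
    while (chunk := await queue.get()) is not None:
        chunks.append(chunk)
    return chunks

async def demo():
    stream = asyncio.Queue()
    mux = ToyMultiplexer(stream)
    q1, q2 = mux.register(1), mux.register(2)
    # Interleaved responses, as in the diagram: C1, C2, C1.
    for message in [(1, b"a"), (2, b"x"), (1, b"b"), None]:
        stream.put_nowait(message)
    receiver = asyncio.create_task(mux.receive_loop())
    result = await asyncio.gather(consume(q1), consume(q2))
    await receiver
    return result

chunks_c1, chunks_c2 = asyncio.run(demo())
```

Because only the receiver loop reads the stream, no lock is needed around the receive path in this sketch; each consumer sees just the payloads tagged with its own read_id.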

How the Benchmark Works

This PR adds a read_rand_multi_coro workload that:

  1. Spawns multiple asynchronous tasks (coroutines).
  2. Shares a single AsyncMultiRangeDownloader instance across all tasks.
  3. Simulates the old serialized behavior by explicitly passing a shared_lock to download_ranges.
  4. Measures total throughput (MiB/s) and resource utilization.
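
The steps above might look roughly like the following harness. It is a sketch under stated assumptions: `FakeDownloader`, its `lock` parameter, and `run_workload` are hypothetical and do not reflect the real `AsyncMultiRangeDownloader` signature, the latency is simulated with `asyncio.sleep`, and only throughput is measured (not resource utilization).

```python
import asyncio
import time

MIB = 1024 * 1024

class FakeDownloader:
    """Stand-in for a shared downloader; not the real AsyncMultiRangeDownloader API."""

    async def download_ranges(self, ranges, lock=None):
        if lock is None:
            return await self._download(ranges)
        async with lock:  # emulate the old serialized behavior
            return await self._download(ranges)

    async def _download(self, ranges):
        total = 0
        for _offset, length in ranges:
            await asyncio.sleep(0.001)  # pretend network latency per range
            total += length
        return total

async def run_workload(num_coros, ranges_per_coro, range_size, serialized):
    downloader = FakeDownloader()  # one instance shared by all coroutines
    lock = asyncio.Lock() if serialized else None
    ranges = [(i * range_size, range_size) for i in range(ranges_per_coro)]
    start = time.perf_counter()
    sizes = await asyncio.gather(
        *(downloader.download_ranges(ranges, lock) for _ in range(num_coros))
    )
    elapsed = time.perf_counter() - start
    total_bytes = sum(sizes)
    return total_bytes, total_bytes / MIB / elapsed  # bytes, MiB/s

bytes_locked, mibps_locked = asyncio.run(run_workload(16, 4, MIB, serialized=True))
bytes_muxed, mibps_muxed = asyncio.run(run_workload(16, 4, MIB, serialized=False))
print(f"serialized: {mibps_locked:.0f} MiB/s, concurrent: {mibps_muxed:.0f} MiB/s")
```

Both runs transfer the same number of simulated bytes; any difference in the reported MiB/s comes from whether the coroutines overlap their waits or queue up behind the lock.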

Key Changes

  • test_reads.py: Refactored to support launching concurrent coroutines within a single worker process.
  • config.yaml: Added read_rand_multi_coro with 1 and 16 coroutines to stress the downloader.
  • config.py: Updated naming convention to include coroutine count (e.g., 16c) in reports for easier differentiation.

@zhixiangli zhixiangli requested review from a team as code owners April 1, 2026 08:19
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the GCS read microbenchmarks and configuration to test multiplexing by executing multiple download_ranges calls concurrently. The review identifies a critical TypeError where an unsupported lock argument was passed to download_ranges, and recommends removing the asyncio.Lock entirely as it would prevent true multiplexing. Additionally, the feedback suggests a more robust round-robin chunking strategy to ensure the desired number of concurrent tasks and requests the reorganization of standard library imports to follow PEP 8 guidelines.
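
For illustration, here is one way round-robin chunking can differ from contiguous slicing. The PR's original chunking code is not shown here, so `chunk_by_ceiling` is only an assumed example of a strategy that can produce fewer chunks than requested; both function names are hypothetical.

```python
def chunk_round_robin(items, num_chunks):
    """Deal items into num_chunks lists, one at a time, like dealing cards."""
    return [items[i::num_chunks] for i in range(num_chunks)]

def chunk_by_ceiling(items, num_chunks):
    """Contiguous slices of ceil(len/num_chunks) items; may yield fewer chunks."""
    size = -(-len(items) // num_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

ranges = list(range(9))
round_robin = chunk_round_robin(ranges, 4)  # always 4 chunks
contiguous = chunk_by_ceiling(ranges, 4)    # 3 chunks of 3 for this input
print(len(round_robin), len(contiguous))
```

With 9 ranges and 4 requested tasks, the contiguous version produces only 3 chunks, while round-robin always returns exactly 4 (some may be empty when there are fewer items than chunks).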

@zhixiangli zhixiangli marked this pull request as draft April 1, 2026 09:19
@zhixiangli zhixiangli force-pushed the zhixiangli/perf-multiplexing-downloader branch 7 times, most recently from 62e59ba to e00a201 Compare April 1, 2026 12:02
@zhixiangli zhixiangli marked this pull request as ready for review April 1, 2026 12:14
@zhixiangli zhixiangli force-pushed the zhixiangli/perf-multiplexing-downloader branch 2 times, most recently from 643ba06 to 90999b4 Compare April 2, 2026 07:02
@zhixiangli zhixiangli force-pushed the zhixiangli/perf-multiplexing-downloader branch from 90999b4 to d35e824 Compare April 2, 2026 07:19