perf: add multiplexing performance tests for AsyncMultiRangeDownloader #16501
Open
zhixiangli wants to merge 1 commit into googleapis:main
Code Review
This pull request updates the GCS read microbenchmarks and configuration to test multiplexing by executing multiple download_ranges calls concurrently. The review identifies a critical TypeError where an unsupported lock argument was passed to download_ranges, and recommends removing the asyncio.Lock entirely as it would prevent true multiplexing. Additionally, the feedback suggests a more robust round-robin chunking strategy to ensure the desired number of concurrent tasks and requests the reorganization of standard library imports to follow PEP 8 guidelines.
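The round-robin chunking mentioned in the review can be sketched as follows. This is an illustration only, not the code under review: `round_robin_chunks` is a hypothetical helper, and it assumes the ranges to download are held in a plain list. Dealing items out by stride yields exactly `num_tasks` groups whenever there are at least that many items, and never produces an empty group.

```python
def round_robin_chunks(items, num_tasks):
    """Deal items into groups by stride (hypothetical helper).

    Item i goes to group i % num_tasks. Empty groups are never produced,
    so the result has min(num_tasks, len(items)) groups.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    group_count = min(num_tasks, len(items))
    return [items[i::num_tasks] for i in range(group_count)]


if __name__ == "__main__":
    # Ten placeholder ranges dealt across four concurrent tasks.
    for group in round_robin_chunks(list(range(10)), 4):
        print(group)
```

Each resulting group would then be handed to one concurrent download_ranges call.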
packages/google-cloud-storage/tests/perf/microbenchmarks/time_based/reads/test_reads.py
Overview
This PR introduces new microbenchmarks to measure and expose the performance bottleneck caused by lock contention in the AsyncMultiRangeDownloader. It provides a concrete way to compare the previous serialized implementation against the new multiplexed architecture.
Before vs. After: The Performance Gap
Before (Serialized via Lock)
In the previous implementation, download_ranges used a shared lock to prevent concurrent access to the bidi-gRPC stream. This meant that even with multiple coroutines, only one could "own" the stream at a time. The entire download cycle (Send -> Receive All) had to complete before another task could start.
Execution Flow:
sequenceDiagram
    participant C1 as Coroutine 1
    participant C2 as Coroutine 2
    participant S as gRPC Stream
    C1->>C1: Acquire Lock
    C1->>S: Send Requests
    S-->>C1: Receive Data (Streaming...)
    S-->>C1: End of Range
    C1->>C1: Release Lock
    Note over C2: Waiting for Lock...
    C2->>C2: Acquire Lock
    C2->>S: Send Requests
    S-->>C2: Receive Data (Streaming...)
    S-->>C2: End of Range
    C2->>C2: Release Lock
After (Multiplexed Concurrent)
With the introduction of the _StreamMultiplexer, multiple coroutines can now share the same stream concurrently. Requests are interleaved, and a background receiver loop routes incoming data to the correct task using read_id.
Execution Flow:
sequenceDiagram
    participant C1 as Coroutine 1
    participant C2 as Coroutine 2
    participant M as Multiplexer
    participant S as gRPC Stream
    C1->>M: Send Requests
    M->>S: Forward Req 1
    C2->>M: Send Requests
    M->>S: Forward Req 2
    Note over C1,C2: Tasks wait on their own queues
    S-->>M: Data for C1
    M-->>C1: Route to Q1
    S-->>M: Data for C2
    M-->>C2: Route to Q2
    S-->>M: Data for C1
    M-->>C1: Route to Q1
How the Benchmark Works
This PR adds a read_rand_multi_coro workload that:
- … AsyncMultiRangeDownloader instance across all tasks.
- … shared_lock to download_ranges.
Key Changes
- test_reads.py: Refactored to support launching concurrent coroutines within a single worker process.
- config.yaml: Added read_rand_multi_coro with 1, 16 coroutines to stress the downloader.
- config.py: Updated naming convention to include coroutine count (e.g., 16c) in reports for easier differentiation.
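The Before and After flows described above can be imitated without gRPC. The following is a toy sketch under stated assumptions, not the code in test_reads.py and not the real _StreamMultiplexer: `FakeStream`, `ToyMultiplexer`, `serialized_download`, and `run_workload` are hypothetical names, each "download" is a single response, and a fixed timer delay stands in for server latency. It only shows why holding a lock across send and receive makes total time grow with the coroutine count, while routing responses by read_id lets the waits overlap.

```python
import asyncio
import time


class FakeStream:
    """Hypothetical stand-in for the bidi stream; echoes read_id after a delay."""

    def __init__(self, latency=0.02):
        self._latency = latency
        self._responses = asyncio.Queue()

    async def send(self, read_id):
        # The simulated server answers each request independently after `latency`.
        loop = asyncio.get_running_loop()
        loop.call_later(self._latency, self._responses.put_nowait, read_id)

    async def recv(self):
        return await self._responses.get()


async def serialized_download(stream, lock, read_id):
    # "Before": the lock is held for the whole send -> receive cycle.
    async with lock:
        await stream.send(read_id)
        return await stream.recv()


class ToyMultiplexer:
    """Toy router: one receiver loop resolves a per-read_id future."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = {}

    async def receiver_loop(self):
        while True:
            read_id = await self._stream.recv()
            self._pending.pop(read_id).set_result(read_id)

    async def download(self, read_id):
        # "After": register interest first, then send without any lock.
        future = asyncio.get_running_loop().create_future()
        self._pending[read_id] = future
        await self._stream.send(read_id)
        return await future


async def run_workload(num_coros, multiplexed):
    """Run num_coros concurrent downloads; return (results, elapsed seconds)."""
    stream = FakeStream()
    start = time.perf_counter()
    if multiplexed:
        mux = ToyMultiplexer(stream)
        receiver = asyncio.create_task(mux.receiver_loop())
        results = await asyncio.gather(*(mux.download(i) for i in range(num_coros)))
        receiver.cancel()
        try:
            await receiver
        except asyncio.CancelledError:
            pass
    else:
        lock = asyncio.Lock()
        results = await asyncio.gather(
            *(serialized_download(stream, lock, i) for i in range(num_coros))
        )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for mode in (False, True):
        _, elapsed = asyncio.run(run_workload(16, multiplexed=mode))
        print(f"multiplexed={mode}: {elapsed:.3f}s")
```

In this toy model the serialized run waits roughly one latency period per coroutine, whereas the multiplexed run waits roughly one latency period in total. Real numbers depend on the network and the server, which is what the microbenchmarks are there to measure.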