[Perf] Improve Http2HeaderCleanerHandler #48455
Open
xinlian12 wants to merge 42 commits into Azure:main from
Conversation
Replaced individual createItem calls with executeBulkOperations for document pre-population in AsyncBenchmark, AsyncCtlWorkload, AsyncEncryptionBenchmark, and ReadMyWriteWorkflow. Also migrated ReadMyWriteWorkflow from internal Document/AsyncDocumentClient APIs to the public PojoizedJson/CosmosAsyncContainer v4 APIs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace pre-materialized List<CosmosItemOperation> with Flux.range().map() to lazily emit operations on demand. This avoids holding all N operations in memory simultaneously - the bulk executor consumes them as they are generated, allowing GC to reclaim processed operation wrappers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
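The same lazy-emission idea can be sketched with java.util.stream in place of Reactor — `ItemOperation` and the counts below are illustrative stand-ins, not the SDK's actual `CosmosItemOperation` or bulk executor:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class LazyOpsSketch {
    // Hypothetical stand-in for CosmosItemOperation; only here to count allocations.
    record ItemOperation(int id) {}

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        // Lazy pipeline: nothing is allocated until a terminal operation pulls,
        // mirroring Flux.range(0, n).map(i -> buildOperation(i)).
        IntStream.range(0, 1_000)
            .mapToObj(i -> { created.incrementAndGet(); return new ItemOperation(i); })
            .limit(10)               // the consumer takes only what it needs
            .forEach(op -> {});
        // Only the 10 consumed wrappers were ever created, not all 1,000.
        System.out.println(created.get()); // prints 10
    }
}
```

With the pre-materialized `List<CosmosItemOperation>`, all N wrappers would exist before the first one is consumed; with the lazy form, GC can reclaim each wrapper as soon as the executor is done with it.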
If a bulk operation fails, fall back to individual createItem calls with retry logic (max 5 retries for transient errors: 410, 408, 429, 500, 503) and 409 conflict suppression. The retry helper is centralized in BenchmarkHelper.retryFailedBulkOperations(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. HttpHeaders.set()/getHeader(): Add a toLowerCaseIfNeeded() fast path that skips the String.toLowerCase() allocation when the header name is already all-lowercase (common for x-ms-* and standard Cosmos headers).
2. RxGatewayStoreModel.getUri(): Build the URI via StringBuilder instead of the 7-arg URI constructor, which re-validates and re-encodes all components. Since the components are already well-formed, the single-arg URI(String) constructor is sufficient and avoids URI$Parser overhead.
3. RxDocumentServiceRequest: Cache the getCollectionName() result to avoid repeated O(n) slash-scanning across 14+ call sites per request lifecycle. The cache is invalidated when resourceAddress changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
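A minimal stdlib sketch of point 2, the getUri() change — the class name and component values here are illustrative, not the SDK's actual code:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriBuildSketch {
    // Multi-arg constructor: re-validates and re-encodes every component (URI$Parser work).
    static URI viaComponents(String host, int port, String path) throws URISyntaxException {
        return new URI("https", null, host, port, path, null, null);
    }

    // The optimization: components are already well-formed, so plain string
    // concatenation suffices; parse lazily only if a URI object is ever needed.
    static String viaStringBuilder(String host, int port, String path) {
        return new StringBuilder("https://")
            .append(host).append(':').append(port).append(path)
            .toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String fast = viaStringBuilder("contoso.documents.azure.com", 443, "/dbs/db1/colls/c1");
        // For well-formed components both paths produce the same URI text.
        System.out.println(fast.equals(
            viaComponents("contoso.documents.azure.com", 443, "/dbs/db1/colls/c1").toString()));
    }
}
```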
The char-by-char scan added method call + branch overhead that offset the toLowerCase savings. Profiling showed ConcurrentHashMap.get(), HashMap.putVal(), and the scan loop itself caused ~10% throughput regression. Reverting to original toLowerCase(Locale.ROOT) which the JIT handles as an intrinsic. The URI construction and collection name caching optimizations are retained as they don't have this issue. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The JFR profiling showed URI$Parser.parse() consuming ~757 CPU samples per 60s recording, all from RxGatewayStoreModel.getUri(). The root cause was a String→URI→String round-trip: we built a URI string, parsed it into java.net.URI (expensive), then Reactor Netty called .toASCIIString() to convert it back to a String.

Changes:
- RxGatewayStoreModel.getUri() now returns String directly (no URI parse)
- HttpRequest: add a uriString field with lazy URI parsing via uri()
- HttpRequest: new String-based constructor to skip the URI parse entirely
- ReactorNettyClient: use request.uriString() instead of uri().toASCIIString()
- RxGatewayStoreModel: use uriString() for diagnostics/error paths
- URI is only parsed lazily on error paths that require a URI object

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
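The lazy-parse shape described above can be sketched as follows — the `Request` class is a hypothetical mirror of the HttpRequest change, not the SDK's actual type:

```java
import java.net.URI;

public class LazyUriSketch {
    // Hypothetical mirror of the HttpRequest change: keep the String, parse on demand.
    static final class Request {
        private final String uriString;
        private URI uri; // parsed at most once, only on paths that need a URI object

        Request(String uriString) { this.uriString = uriString; }

        String uriString() { return uriString; }   // hot path: no URI$Parser work
        URI uri() {                                // error/diagnostic path only
            if (uri == null) uri = URI.create(uriString);
            return uri;
        }
    }

    public static void main(String[] args) {
        Request r = new Request("https://contoso.documents.azure.com/dbs/db1/docs");
        System.out.println(r.uriString());         // no parse happens here
        System.out.println(r.uri().getHost());     // parsed lazily on first use
    }
}
```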
Add http2Enabled and http2MaxConcurrentStreams config options to TenantWorkloadConfig. When http2Enabled=true, configures Http2ConnectionConfig on GatewayConnectionConfig for AsyncBenchmark, AsyncCtlWorkload, and AsyncEncryptionBenchmark.

Usage in workload JSON config:

```json
"http2Enabled": true,
"http2MaxConcurrentStreams": 30
```

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aults Add missing cases in applyField switch statement so these fields are properly inherited from tenantDefaults, not only from individual tenant entries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ensures every @JsonProperty field in TenantWorkloadConfig has a corresponding case in the applyField() switch statement. This prevents future fields from silently failing to inherit from tenantDefaults, which was the root cause of the http2Enabled bug. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously, every Gateway response copied ALL Netty response headers through a 3-step chain:
1. Netty headers → HttpHeaders (toLowerCase + new HttpHeader per entry)
2. HttpHeaders.toLowerCaseMap() → new HashMap<String,String>
3. StoreResponse constructor → String[] arrays

Now the flow is:
1. Netty headers → Map<String,String> directly (single toLowerCase pass)
2. StoreResponse constructor → String[] arrays

Changes:
- HttpResponse: add headerMap() returning Map<String,String> directly
- ReactorNettyHttpResponse: override headerMap() to build a lowercase map from Netty headers without an intermediate HttpHeaders object
- HttpTransportSerializer: unwrapToStoreResponse takes Map<String,String> instead of HttpHeaders
- RxGatewayStoreModel: use httpResponse.headerMap() instead of headers()
- ThinClientStoreModel: pass response.getHeaders().asMap() directly instead of wrapping in new HttpHeaders()

This eliminates per-response: ~20 HttpHeader object allocations, ~20 extra toLowerCase calls, and one intermediate HashMap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StoreResponse now stores the response headers Map<String,String> directly instead of converting to parallel String[] arrays. This eliminates a redundant copy, since RxDocumentServiceResponse and StoreClient were immediately converting back to a Map.

Before: Map → String[] + String[] → Map (3 allocations, 2 iterations)
After: Map shared directly (0 extra allocations, 0 extra iterations)

Also upgrades StoreResponse.getHeaderValue() from an O(n) linear scan to an O(1) HashMap.get() with a case-insensitive fallback. Null header values from Netty are skipped (matching the old HttpHeaders.set behavior, which removed null entries).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
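The O(1) lookup with case-insensitive fallback described above can be sketched like this — the method and map contents are illustrative, not the actual StoreResponse code:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HeaderLookupSketch {
    // Headers are stored lowercase; most callers already pass lowercase names,
    // so the common case is a single HashMap.get with no extra allocation.
    static String getHeaderValue(Map<String, String> headers, String name) {
        String v = headers.get(name);
        if (v != null) return v;
        // Case-insensitive fallback for mixed-case callers.
        return headers.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("x-ms-activity-id", "abc-123");
        System.out.println(getHeaderValue(headers, "x-ms-activity-id")); // fast path
        System.out.println(getHeaderValue(headers, "X-MS-Activity-Id")); // fallback path
    }
}
```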
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The new toArray(new String[0]) calls in getResponseHeaderNames() and getResponseHeaderValues() created garbage arrays on every call. These methods have zero production callers — only test validators used them.

Changes:
- Mark getResponseHeaderNames/Values as @deprecated
- Update StoreResponseValidator to use the getResponseHeaders() map directly instead of converting to arrays and doing indexOf lookups

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert the headerMap() direct-from-Netty path because the per-header toLowerCase() calls caused a throughput regression vs v4. The JIT optimizes the existing HttpHeaders.set() + toLowerCaseMap() path better.

Kept improvements:
- StoreResponse stores Map<String,String> directly (no String[] arrays)
- RxDocumentServiceResponse shares the Map reference (no extra copy)
- StoreClient uses getResponseHeaders() directly (no Map reconstruction)
- StoreResponse.getHeaderValue() uses HashMap.get() instead of an O(n) scan
- unwrapToStoreResponse calls toLowerCaseMap() once and reuses the Map for both validateOrThrow and StoreResponse construction

Net effect vs v4: eliminates the Map→String[]→Map round-trip while preserving the JIT-optimized HttpHeaders copy path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Netty's HttpObjectDecoder starts with a 256-byte buffer for header parsing and resizes via ensureCapacityInternal() as headers grow. Cosmos responses have ~2-4KB of headers, triggering multiple resizes. Pre-sizing to 16KB (16384 bytes) avoids the resize overhead at the cost of ~16KB per connection (negligible vs connection pool size). JFR v6 showed AbstractStringBuilder.ensureCapacityInternal at 248 samples (1.6% CPU). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert all header copy chain changes (R3/v5/v6/v7) back to the v4 state which had the best throughput. Only addition on top of v4 is initialBufferSize(16384) to pre-size Netty's header parsing buffer and reduce ensureCapacityInternal() resize overhead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Benchmark showed initialBufferSize change also produced regression. Reverting to pure v4 state (URI elimination + collection name cache) which had the best throughput at 2,421 ops/s. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After bulk document pre-population, the CPU spike can pollute workload metrics. Add CpuMonitor utility that captures baseline CPU before ingestion and waits for it to settle (baseline + 10%, max 5 minutes) before starting the workload. Cool-down is internal default behavior — not user-configurable. Benchmark duration is unaffected since each benchmark measures its own start time inside run(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
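A minimal sketch of the cool-down loop described above, assuming the baseline + 10% threshold and a max-wait deadline from the text; the `DoubleSupplier` injection is test scaffolding, and the real CpuMonitor reads the JVM's OperatingSystemMXBean instead:

```java
import java.util.function.DoubleSupplier;

public class CpuSettleSketch {
    // Hypothetical mirror of the CpuMonitor cool-down: wait until process CPU
    // drops back to baseline + 10%, giving up after a deadline.
    static boolean waitForSettle(DoubleSupplier cpuLoad, double baseline,
                                 long deadlineMillis, long pollMillis) throws InterruptedException {
        double threshold = baseline + 0.10;
        long deadline = System.currentTimeMillis() + deadlineMillis;
        while (System.currentTimeMillis() < deadline) {
            double load = cpuLoad.getAsDouble();
            // NaN / negative readings (MXBean warm-up) are treated as "not settled yet".
            if (!Double.isNaN(load) && load >= 0 && load <= threshold) return true;
            Thread.sleep(pollMillis);
        }
        return false; // max wait reached; start the workload anyway
    }

    public static void main(String[] args) throws InterruptedException {
        double[] samples = {Double.NaN, 0.92, 0.55, 0.31};
        int[] i = {0};
        DoubleSupplier fake = () -> samples[Math.min(i[0]++, samples.length - 1)];
        // Baseline 0.25 -> threshold 0.35; the fourth sample (0.31) settles.
        System.out.println(waitForSettle(fake, 0.25, 1_000, 1));
    }
}
```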
… types These operation types were functionally identical to their Throughput counterparts after metrics capture was unified. Remove them to reduce confusion and dead code paths. Affected: Operation enum, AsyncBenchmark, SyncBenchmark, AsyncEncryptionBenchmark, BenchmarkOrchestrator, Main, tests, README, and workload-config-sample.json. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix thread-unsafe ArrayList mutations: build docs/operations eagerly in loops instead of reactive map(), use Collections.synchronizedList() for failedResponses across AsyncBenchmark, AsyncCtlWorkload, ReadMyWriteWorkflow, and AsyncEncryptionBenchmark
- Fix encryption retry bypass: refactor retryFailedBulkOperations to accept a BiFunction<PojoizedJson, PartitionKey, Mono<Void>> so the encryption benchmark retries through the encryption container
- Re-throw errors after retries are exhausted instead of silently swallowing (per reviewer direction)
- Remove unused partitionKeyName parameter from retryFailedBulkOperations
- Add NaN/negative handling for getProcessCpuLoad() in CpuMonitor
- Cache OperatingSystemMXBean as static final in CpuMonitor
- Log a warning when HTTP/2 is enabled but connection mode is DIRECT
- Add retry logic with transient error handling to SyncBenchmark pre-population (408, 410, 429, 500, 503) and 409 conflict handling
- Rename writeLatency->writeThroughputWithDataProvider and readLatency->readThroughputWithDataProvider in WorkflowTest
- Reduce per-item error logging to debug level in AsyncCtlWorkload, emit an aggregated warn summary
- Fix brittle test path: use the basedir property, UTF-8 charset, and restrict the regex to the applyField method body

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove HTTP/2 Direct mode warnings: Direct mode also uses gateway connections for metadata, so HTTP/2 settings can still be relevant
- Optimize memory: build docs eagerly into a list but create CosmosItemOperations lazily via Flux.fromIterable().map(), avoiding storing both lists simultaneously
- Rewrite TenantWorkloadConfigApplyFieldTest to use pure reflection: invoke private applyField() via reflection for each @JsonProperty and verify the field was set, eliminating brittle source-file parsing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Revert to lazy doc creation via Flux.range().map() — this is thread-safe because Reactive Streams guarantees serial onNext signals at the map stage (rule 1.3). The thread-safety fix for failedResponses (Collections.synchronizedList) is retained, since doOnNext on executeBulkOperations can fire from executor threads.
- Store only id + partitionKey in docsToRead instead of full documents. All docsToRead consumers only access getId() and getProperty(pk). This reduces memory from O(N * docSize) to O(N * idSize).
- Add BenchmarkHelper.idsToLightweightDocs() utility for constructing minimal PojoizedJson objects from collected ids.
- ReadMyWriteWorkflow retains full docs in its cache since queries need QUERY_FIELD_NAME, but still uses lazy creation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reverts the id-only optimization — docsToRead retains full PojoizedJson objects as before. Lazy creation via Flux.range().map() is kept since map() is serial per Reactive Streams spec. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Switch from Retry.max(5) to Retry.backoff(5, 100ms) with max 5s backoff and 0.5 jitter, aligned with the BulkWriter reference pattern
- Add status code 449 (RetryWith) to the retryable set, matching BulkWriter
- Reduce retry concurrency from 100 to 20 per reviewer request
- Add 449 to the SyncBenchmark transient retry set

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
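The backoff schedule above has the shape of Reactor's `Retry.backoff(5, Duration.ofMillis(100)).maxBackoff(Duration.ofSeconds(5)).jitter(0.5)`; the stdlib sketch below only computes the delays (the `delayFor` helper is illustrative, not Reactor's internal algorithm):

```java
import java.time.Duration;
import java.util.Random;

public class BackoffSketch {
    // Exponential delay from a 100 ms base, capped at 5 s, with +/-50% jitter.
    static Duration delayFor(int attempt, Random rnd) {
        long base = 100L << (attempt - 1);                    // 100, 200, 400, 800, 1600 ms
        long capped = Math.min(base, 5_000L);                 // maxBackoff cap
        double jitterFactor = 1.0 + (rnd.nextDouble() - 0.5); // 0.5x .. 1.5x
        return Duration.ofMillis((long) (capped * jitterFactor));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int attempt = 1; attempt <= 5; attempt++) {
            // Each delay lands within [0.5x, 1.5x] of the capped exponential step.
            System.out.println("attempt " + attempt + ": " + delayFor(attempt, rnd).toMillis() + " ms");
        }
    }
}
```

Jitter spreads out retries so that many failed operations do not all hammer the service again at the same instant.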
Changed from instance field initialized in constructor to static final field, following the standard Logger pattern. Uses package-visible access since subclasses in the same package reference it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- AsyncBenchmark: static final (package-visible for subclasses)
- SyncBenchmark: static final (package-visible for consistency)
- AsyncCtlWorkload: private static final (no subclasses)
- Removed constructor logger assignments in all three classes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pass workload concurrency to retryFailedBulkOperations and cap at 20, so retry parallelism adapts to the configured workload concurrency. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dler

Replace O(n) forEach iteration over all HTTP/2 response headers with a direct O(1) hash lookup via Http2Headers.get(). This handler runs on the IO event-loop thread and was consuming ~9.1% of total CPU by scanning all 15-25 headers on every response just to find x-ms-serviceversion.

Changes:
- Use headers.get(SERVER_VERSION_KEY) instead of headers.forEach()
- Cache the header key as a static AsciiString constant
- Remove unused StringUtils import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
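The lookup-then-clean shape can be sketched with a plain Map standing in for Netty's Http2Headers — the trim behavior is an assumption based on the handler's purpose (a later commit references its "extra whitespace" log), and the key constant mirrors the cached AsciiString described above:

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderCleanerSketch {
    // Stand-in for the cached static AsciiString constant in the real handler.
    static final String SERVER_VERSION_KEY = "x-ms-serviceversion";

    // One hash lookup instead of iterating every response header; only
    // re-set the value when cleaning actually changed it.
    static void cleanServiceVersion(Map<String, String> headers) {
        String value = headers.get(SERVER_VERSION_KEY);
        if (value != null) {
            String trimmed = value.trim();
            if (!trimmed.equals(value)) headers.put(SERVER_VERSION_KEY, trimmed);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put(SERVER_VERSION_KEY, " version=2.14.0 ");
        headers.put("x-ms-activity-id", "abc");
        cleanServiceVersion(headers);
        System.out.println("[" + headers.get(SERVER_VERSION_KEY) + "]"); // prints [version=2.14.0]
    }
}
```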
…pplyFieldTest The test was using 'test-sentinel-value' which fails Integer.parseInt() inside applyField(). Since applyField() catches exceptions internally and does not rethrow, the test could not detect the NumberFormatException and incorrectly reported Integer fields as missing from the switch. Changed sentinel to '42' which is valid for String, Integer, and Boolean (via Boolean.parseBoolean) field types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…p2-header-cleaner-handler
Contributor
Pull request overview
This PR targets Cosmos Java HTTP/2 performance by optimizing response header cleanup on the hot path, and it also updates the Cosmos benchmark harness to better support/measure HTTP/2 throughput scenarios.
Changes:
- Optimize HTTP/2 response header cleanup by replacing per-header iteration with a direct lookup for x-ms-serviceversion.
- Refactor benchmark pre-population to use bulk operations + a retry helper, and add a CPU cool-down step between ingestion and the measured workload.
- Remove benchmark latency operation modes, updating configs/tests/docs accordingly, and add HTTP/2 benchmark config knobs.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2ResponseHeaderCleanerHandler.java | Replace O(n) header scan with direct lookup + trim for x-ms-serviceversion. |
| sdk/cosmos/azure-cosmos-benchmark/workload-config-sample.json | Update sample operation to ReadThroughput. |
| sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/WorkflowTest.java | Rename/update tests to use throughput operations. |
| sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/TenantWorkloadConfigApplyFieldTest.java | New reflection-based test to ensure applyField() covers all @JsonProperty fields. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionBenchmark.java | Enable HTTP/2 gateway config + switch pre-population to bulk + retry helper. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ctl/AsyncCtlWorkload.java | Enable HTTP/2 gateway config + switch pre-population to bulk + retry helper. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/TenantWorkloadConfig.java | Add http2Enabled/http2MaxConcurrentStreams config + applyField support. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncBenchmark.java | Adjust pre-population behavior and add retry/backoff logic for createItem. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ReadMyWriteWorkflow.java | Migrate workflow to v4 container APIs and bulk pre-population. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/Operation.java | Remove latency operations from benchmark CLI enum. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/Main.java | Update validation messaging and operation handling after latency op removal. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CpuMonitor.java | New CPU monitor utility used to cool down between ingestion and workload. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java | Capture baseline CPU + cool-down before workload execution. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkHelper.java | Add shared retry helper for failed bulk operation responses. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncBenchmark.java | Enable HTTP/2 gateway config + switch pre-population to bulk + retry helper. |
| sdk/cosmos/azure-cosmos-benchmark/README.md | Update docs to refer to throughput workloads; remove latency operations. |
Resolved review threads:
- sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncBenchmark.java (outdated)
- ...os/src/main/java/com/azure/cosmos/implementation/http/Http2ResponseHeaderCleanerHandler.java (outdated)
- ...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
- ...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
- ...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ctl/AsyncCtlWorkload.java (outdated)
…thub.com/xinlian12/azure-sdk-for-java into perf/p1-fix-http2-header-cleaner-handler
…, and exception cause

- Fix log grammar in Http2ResponseHeaderCleanerHandler: 'There is extra whitespace'
- Track lastException in SyncBenchmark retry loop for proper cause chaining
- Renumber lifecycle steps 3/4 to 5/6 in BenchmarkOrchestrator
- Fix misleading log message in AsyncCtlWorkload pre-population error

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Author
/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
The Configuration class was refactored to remove CLI parameters in favor of JSON-based workload config. Update readMyWritesCLI and writeThroughputCLI tests to create a temp JSON config file and pass -workloadConfig to Main.main(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eness The ConsistencyLevel enum uses SCREAMING_SNAKE_CASE (BOUNDED_STALENESS) but config values use PascalCase display names (BoundedStaleness). Simple toUpperCase() + valueOf() fails for multi-word values. Match by both display name and enum name case-insensitively. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
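The dual-form matching described above can be sketched as follows — the enum subset and `parse` helper are illustrative, not the SDK's actual ConsistencyLevel or config code:

```java
public class ConsistencyMatchSketch {
    // Hypothetical mirror of the fix: enum names are SCREAMING_SNAKE_CASE but
    // config files use PascalCase display names, so match against both forms.
    enum ConsistencyLevel { STRONG, BOUNDED_STALENESS, SESSION, EVENTUAL, CONSISTENT_PREFIX }

    static ConsistencyLevel parse(String value) {
        for (ConsistencyLevel level : ConsistencyLevel.values()) {
            // BOUNDED_STALENESS -> BOUNDEDSTALENESS, compared ignoring case/underscores.
            String compact = level.name().replace("_", "");
            if (level.name().equalsIgnoreCase(value)
                || compact.equalsIgnoreCase(value.replace("_", ""))) {
                return level;
            }
        }
        throw new IllegalArgumentException("Unknown consistency level: " + value);
    }

    public static void main(String[] args) {
        System.out.println(parse("BoundedStaleness")); // matches via display-name form
        System.out.println(parse("session"));          // matches via enum-name form
    }
}
```

A plain toUpperCase() + valueOf() works for single-word values like "Session" but fails on "BoundedStaleness", which is exactly the bug described.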
Member
Author
/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
Changes (Http2ResponseHeaderCleanerHandler.java):
Current code:
change into ->
H2 Benchmark Sweep Results
Summary
Benchmark sweep comparing main (baseline) vs p1fix (this PR) across WriteThroughput and ReadThroughput at concurrency levels c3, c5, c10, c15, c20. Each run: 30 multi-tenant Cosmos DB accounts, Gateway/H2 mode, 10-min duration, 5-min cooldown.
p1fix wins across the board on writes (+0.2–3.6%) and on most reads (+1.6–8.8%). It delivers lower mean latency (1–9%) with identical resource footprint (memory, GC, threads). At CPU-saturated concurrency (c20 reads, 99.6% CPU), results converge — both branches are equally bottlenecked.
Steady-State Throughput
Steady-state: skip first 1 minute (warmup), drop last minute (partial).
Write Throughput (ops/s)
Read Throughput (ops/s)
Steady-State Mean Latency
Steady-State CPU
Memory (Heap)
GC Metrics
Thread Count
Per-Minute Timelines
Line 1 = main (baseline), Line 2 = p1fix (this PR). Each point is a 1-minute reporting interval.
Write c3 Throughput — main: 510 | p1fix: 529 ops/s
```mermaid
xychart-beta
    title "Write c3 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 13 --> 591
    line [59.3, 500.2, 503.9, 506.2, 510.3, 484.1, 516.0, 525.0, 521.0, 526.0, 443.2]
    line [478.6, 526.9, 525.6, 531.1, 532.2, 501.6, 537.0, 534.7, 533.2, 535.5, 15.9]
```

Write c5 Throughput — main: 880 | p1fix: 893 ops/s

```mermaid
xychart-beta
    title "Write c5 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 166 --> 994
    line [614.7, 882.0, 874.0, 885.9, 896.2, 809.8, 893.6, 894.5, 890.0, 896.3, 207.1]
    line [382.0, 884.2, 893.8, 892.7, 901.6, 886.9, 903.2, 888.2, 889.5, 898.8, 451.2]
```

Write c10 Throughput — main: 1579 | p1fix: 1583 ops/s

```mermaid
xychart-beta
    title "Write c10 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 27 --> 1766
    line [161.5, 1585.0, 1605.2, 1580.4, 1558.7, 1520.7, 1584.3, 1591.4, 1592.8, 1594.4, 1264.7]
    line [33.7, 1538.7, 1593.3, 1588.7, 1596.9, 1574.1, 1573.0, 1579.9, 1603.8, 1595.2, 1465.6]
```

Write c15 Throughput — main: 1837 | p1fix: 1875 ops/s

```mermaid
xychart-beta
    title "Write c15 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 183 --> 2110
    line [1470.1, 1894.3, 1847.1, 1815.8, 1802.9, 1775.6, 1832.3, 1842.8, 1865.5, 1855.6, 228.8]
    line [1263.6, 1918.1, 1880.7, 1881.8, 1872.4, 1809.6, 1850.1, 1881.8, 1889.9, 1890.1, 466.8]
```

Write c20 Throughput — main: 1796 | p1fix: 1834 ops/s

```mermaid
xychart-beta
    title "Write c20 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 327 --> 2104
    line [747.5, 1788.2, 1775.4, 1794.3, 1793.5, 1754.4, 1794.5, 1805.9, 1828.2, 1830.7, 846.6]
    line [409.1, 1912.9, 1900.1, 1841.4, 1807.3, 1782.5, 1752.0, 1797.9, 1840.4, 1867.3, 1255.1]
```

Read c3 Throughput — main: 1091 | p1fix: 1120 ops/s

```mermaid
xychart-beta
    title "Read c3 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 39 --> 1250
    line [49.2, 1094.1, 1096.8, 1101.6, 1104.3, 1063.4, 1086.7, 1088.4, 1093.0, 1090.5, 998.3]
    line [695.4, 1116.8, 1112.5, 1125.1, 1132.2, 1067.7, 1129.8, 1133.2, 1129.3, 1136.0, 350.3]
```

Read c5 Throughput — main: 1421 | p1fix: 1449 ops/s

```mermaid
xychart-beta
    title "Read c5 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 182 --> 1611
    line [227.2, 1438.2, 1425.1, 1436.7, 1433.9, 1407.2, 1404.2, 1416.5, 1417.1, 1410.8, 1095.7]
    line [1067.3, 1446.9, 1442.2, 1462.1, 1451.1, 1446.5, 1461.9, 1464.5, 1443.9, 1420.7, 274.4]
```

Read c10 Throughput — main: 1663 | p1fix: 1809 ops/s

```mermaid
xychart-beta
    title "Read c10 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 40 --> 2014
    line [605.0, 1679.0, 1665.4, 1670.5, 1677.1, 1637.1, 1670.3, 1665.4, 1655.6, 1646.0, 910.2]
    line [49.9, 1802.5, 1831.0, 1822.6, 1817.8, 1797.1, 1797.3, 1808.9, 1808.8, 1792.3, 1639.6]
```

Read c15 Throughput — main: 1744 | p1fix: 1773 ops/s

```mermaid
xychart-beta
    title "Read c15 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 249 --> 1976
    line [1292.9, 1770.6, 1772.7, 1767.2, 1771.4, 1706.5, 1729.2, 1720.1, 1726.3, 1734.8, 311.6]
    line [847.1, 1796.5, 1785.5, 1784.8, 1776.1, 1752.5, 1766.7, 1769.1, 1763.0, 1759.3, 767.8]
```

Read c20 Throughput — main: 1796 | p1fix: 1776 ops/s

```mermaid
xychart-beta
    title "Read c20 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 161 --> 2004
    line [200.8, 1821.8, 1809.5, 1807.1, 1809.4, 1778.0, 1776.2, 1787.2, 1783.3, 1790.8, 1442.7]
    line [1202.5, 1790.7, 1762.2, 1778.2, 1758.9, 1764.4, 1778.6, 1775.2, 1789.4, 1783.8, 411.6]
```

Write c3 CPU
```mermaid
xychart-beta
    title "Write c3 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 13 --> 41
    line [37.5, 31.6, 31.7, 31.4, 31.8, 33.1, 33.3, 33.7, 34.0, 32.9, 32.2]
    line [34.5, 32.3, 33.9, 34.2, 34.8, 34.9, 33.4, 34.8, 34.5, 34.4, 15.8]
```

Write c5 CPU

```mermaid
xychart-beta
    title "Write c5 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 41 --> 71
    line [58.9, 60.1, 61.4, 61.9, 63.4, 60.6, 64.0, 63.2, 62.8, 62.9, 60.8]
    line [51.1, 62.0, 61.7, 62.6, 63.5, 64.8, 64.8, 63.6, 63.3, 64.4, 55.0]
```

Write c10 CPU

```mermaid
xychart-beta
    title "Write c10 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 38 --> 107
    line [47.2, 97.0, 97.1, 97.4, 97.5, 96.8, 97.3, 97.4, 97.4, 97.3, 97.1]
    line [48.8, 96.9, 96.6, 97.0, 97.0, 96.7, 97.5, 97.2, 97.3, 97.4, 97.4]
```

Write c15 CPU

```mermaid
xychart-beta
    title "Write c15 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 66 --> 109
    line [82.6, 99.1, 99.1, 99.2, 99.1, 99.2, 99.2, 99.2, 99.2, 99.2, 98.8]
    line [88.6, 99.1, 99.1, 99.0, 99.1, 97.6, 97.1, 97.4, 98.0, 98.2, 98.1]
```

Write c20 CPU

```mermaid
xychart-beta
    title "Write c20 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 52 --> 109
    line [72.1, 99.3, 99.3, 99.3, 99.3, 99.2, 99.3, 99.3, 99.3, 99.3, 99.1]
    line [65.1, 99.1, 98.4, 97.9, 97.9, 97.3, 97.3, 97.6, 98.2, 98.7, 98.9]
```

Read c3 CPU

```mermaid
xychart-beta
    title "Read c3 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 16 --> 97
    line [20.4, 87.2, 87.6, 87.8, 87.9, 84.1, 86.9, 86.3, 86.1, 86.3, 86.5]
    line [54.7, 85.2, 86.6, 86.1, 86.2, 81.3, 86.5, 86.4, 86.5, 86.8, 85.8]
```

Read c5 CPU

```mermaid
xychart-beta
    title "Read c5 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11", "min 12"]
    y-axis "CPU %" 11 --> 108
    line [36.2, 97.8, 98.1, 98.1, 98.0, 97.7, 96.8, 96.9, 97.0, 97.3, 94.1, 0.0]
    line [13.3, 75.2, 97.9, 98.0, 98.1, 97.9, 97.7, 97.9, 98.1, 96.8, 95.2, 93.0]
```

Read c10 CPU

```mermaid
xychart-beta
    title "Read c10 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 6 --> 109
    line [58.0, 97.8, 98.9, 99.3, 99.4, 99.3, 99.4, 99.4, 99.0, 97.9, 97.9]
    line [7.6, 99.3, 99.4, 99.4, 99.4, 99.3, 99.4, 99.4, 99.4, 99.4, 99.4]
```

Read c15 CPU

```mermaid
xychart-beta
    title "Read c15 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 57 --> 109
    line [78.0, 99.5, 99.5, 99.5, 99.4, 99.5, 99.5, 99.5, 99.5, 99.5, 99.5]
    line [70.6, 99.4, 99.5, 99.5, 99.5, 99.4, 99.5, 99.5, 99.5, 99.5, 99.5]
```

Read c20 CPU

```mermaid
xychart-beta
    title "Read c20 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11", "min 12"]
    y-axis "CPU %" 8 --> 110
    line [38.4, 99.5, 99.5, 99.6, 99.6, 99.5, 99.6, 99.6, 99.6, 99.6, 99.6, 0.0]
    line [9.8, 76.4, 99.5, 99.5, 99.6, 99.5, 99.5, 99.6, 99.6, 99.6, 99.6, 99.6]
```

Write c3 Latency
```mermaid
xychart-beta
    title "Write c3 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 4 --> 23
    line [7.2, 6.0, 5.9, 5.9, 5.8, 6.2, 5.8, 5.7, 5.7, 5.7, 6.2]
    line [5.8, 5.7, 5.7, 5.6, 5.6, 5.9, 5.5, 5.6, 5.6, 5.6, 21.2]
```

Write c5 Latency

```mermaid
xychart-beta
    title "Write c5 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 4 --> 7
    line [6.0, 5.6, 5.7, 5.6, 5.5, 6.1, 5.6, 5.5, 5.6, 5.5, 6.8]
    line [6.3, 5.6, 5.6, 5.6, 5.5, 5.6, 5.5, 5.6, 5.6, 5.5, 5.9]
```

Write c10 Latency

```mermaid
xychart-beta
    title "Write c10 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 5 --> 25
    line [11.8, 6.2, 6.1, 6.2, 6.3, 6.4, 6.2, 6.1, 6.1, 6.1, 6.2]
    line [23.2, 6.3, 6.1, 6.1, 6.1, 6.2, 6.2, 6.2, 6.1, 6.1, 6.1]
```

Write c15 Latency

```mermaid
xychart-beta
    title "Write c15 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 6 --> 9
    line [8.6, 7.7, 7.9, 8.0, 8.0, 8.2, 7.9, 7.9, 7.8, 7.8, 7.6]
    line [8.6, 7.6, 7.7, 7.7, 7.7, 8.0, 7.9, 7.7, 7.7, 7.7, 7.5]
```

Write c20 Latency

```mermaid
xychart-beta
    title "Write c20 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 8 --> 16
    line [13.6, 10.8, 10.9, 10.8, 10.8, 11.1, 10.8, 10.7, 10.6, 10.6, 10.7]
    line [14.9, 10.1, 10.2, 10.5, 10.7, 10.9, 11.1, 10.8, 10.5, 10.3, 10.3]
```

Read c3 Latency

```mermaid
xychart-beta
    title "Read c3 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 2 --> 5
    line [4.7, 2.7, 2.7, 2.7, 2.7, 2.8, 2.7, 2.7, 2.7, 2.7, 2.7]
    line [2.8, 2.6, 2.6, 2.6, 2.6, 2.7, 2.6, 2.6, 2.6, 2.6, 2.8]
```

Read c5 Latency

```mermaid
xychart-beta
    title "Read c5 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 3 --> 5
    line [4.6, 3.3, 3.3, 3.3, 3.3, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4]
    line [3.6, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.4, 3.5]
```

Read c10 Latency

```mermaid
xychart-beta
    title "Read c10 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 4 --> 18
    line [6.9, 5.7, 5.7, 5.7, 5.6, 5.8, 5.7, 5.7, 5.7, 5.8, 5.8]
    line [16.0, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.3, 5.2]
```

Read c15 Latency

```mermaid
xychart-beta
    title "Read c15 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 6 --> 10
    line [8.9, 8.0, 8.0, 8.1, 8.0, 8.3, 8.2, 8.3, 8.2, 8.2, 8.1]
    line [9.1, 7.9, 8.0, 8.0, 8.0, 8.1, 8.1, 8.0, 8.1, 8.1, 8.3]
```

Read c20 Latency

```mermaid
xychart-beta
    title "Read c20 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 8 --> 18
    line [16.4, 10.4, 10.5, 10.5, 10.5, 10.7, 10.7, 10.6, 10.6, 10.6, 10.5]
    line [11.8, 10.6, 10.8, 10.7, 10.8, 10.7, 10.7, 10.7, 10.6, 10.6, 10.9]
```

Write c3 Heap
xychart-beta title "Write c3 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 43 --> 71 line [53.9, 54.6, 57.9, 57.7, 58.0, 58.3, 61.4, 61.9, 62.2, 62.5] line [56.7, 56.3, 59.5, 59.8, 60.2, 60.5, 63.6, 64.1, 64.4, 64.7]Write c5 Heap
xychart-beta title "Write c5 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 45 --> 78 line [58.1, 59.9, 62.0, 63.4, 64.3, 68.2, 69.0, 69.6, 70.3, 70.9] line [56.0, 59.5, 62.1, 63.4, 64.3, 66.7, 68.4, 69.2, 69.7, 70.3]Write c10 Heap
xychart-beta title "Write c10 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 44 --> 92 line [62.0, 64.5, 68.1, 70.0, 72.1, 76.3, 77.7, 78.9, 79.9, 80.8] line [55.5, 63.2, 70.0, 72.3, 74.1, 75.5, 79.9, 81.6, 82.7, 83.6]Write c15 Heap
xychart-beta title "Write c15 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 49 --> 98 line [60.9, 69.1, 72.8, 75.6, 77.8, 82.2, 83.7, 85.1, 86.5, 87.7] line [67.3, 72.4, 75.8, 77.6, 79.7, 83.6, 85.0, 86.3, 87.6, 88.8]Write c20 Heap
xychart-beta title "Write c20 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 52 --> 104 line [70.0, 75.7, 78.6, 81.3, 83.3, 87.5, 89.5, 91.2, 92.9, 0.0] line [65.3, 72.7, 77.2, 79.7, 82.4, 87.1, 89.5, 91.2, 93.2, 94.9]Read c3 Heap
xychart-beta title "Read c3 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"] y-axis "MiB" 4 --> 223 line [187.8, 192.7, 193.5, 194.0, 194.6, 198.3, 199.1, 199.6, 200.0, 200.5, 0.0] line [5.0, 190.5, 195.2, 196.0, 196.6, 197.1, 200.7, 201.4, 201.9, 202.4, 202.9]Read c5 Heap
xychart-beta title "Read c5 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 150 --> 233 line [192.8, 201.4, 203.7, 204.7, 205.4, 209.3, 210.2, 210.8, 211.3, 211.9] line [188.0, 195.8, 198.8, 200.4, 201.3, 204.1, 205.7, 206.5, 207.1, 207.7]Read c10 Heap
xychart-beta title "Read c10 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 149 --> 247 line [197.9, 208.3, 211.9, 213.8, 215.3, 220.0, 221.1, 222.0, 222.8, 224.3] line [186.1, 201.5, 205.2, 209.5, 212.0, 216.3, 217.8, 218.6, 219.4, 220.5]Read c15 Heap
xychart-beta title "Read c15 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 157 --> 251 line [197.5, 210.8, 215.5, 217.8, 219.8, 223.8, 225.0, 226.2, 227.6, 228.6] line [196.1, 207.3, 210.9, 213.2, 214.9, 218.8, 220.0, 221.0, 222.2, 223.6]Read c20 Heap
xychart-beta title "Read c20 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 153 --> 264 line [197.7, 215.6, 221.6, 227.1, 229.3, 233.7, 236.2, 238.0, 239.3, 240.4] line [191.6, 208.5, 214.9, 219.7, 223.2, 227.8, 229.6, 231.6, 233.2, 235.3]Methodology
JVM options used for both benchmark runs:

```
-Xmx8g -Xms8g -XX:+UseG1GC -XX:MaxDirectMemorySize=2g
```