[Perf] Improve Http2HeaderCleanerHandler #48455
Open
xinlian12 wants to merge 42 commits into Azure:main from
Conversation
Replaced individual createItem calls with executeBulkOperations for document pre-population in AsyncBenchmark, AsyncCtlWorkload, AsyncEncryptionBenchmark, and ReadMyWriteWorkflow. Also migrated ReadMyWriteWorkflow from internal Document/AsyncDocumentClient APIs to the public PojoizedJson/CosmosAsyncContainer v4 APIs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace pre-materialized List<CosmosItemOperation> with Flux.range().map() to lazily emit operations on demand. This avoids holding all N operations in memory simultaneously - the bulk executor consumes them as they are generated, allowing GC to reclaim processed operation wrappers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
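The same lazy-emission idea can be sketched with java.util.stream in place of Reactor — `ItemOperation` and the counts below are illustrative stand-ins, not the SDK's actual `CosmosItemOperation` or bulk executor:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class LazyOpsSketch {
    // Hypothetical stand-in for CosmosItemOperation; only here to count allocations.
    record ItemOperation(int id) {}

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        // Lazy pipeline: nothing is allocated until a terminal operation pulls,
        // mirroring Flux.range(0, n).map(i -> buildOperation(i)).
        IntStream.range(0, 1_000)
            .mapToObj(i -> { created.incrementAndGet(); return new ItemOperation(i); })
            .limit(10)               // the consumer takes only what it needs
            .forEach(op -> {});
        // Only the 10 consumed wrappers were ever created, not all 1,000.
        System.out.println(created.get()); // prints 10
    }
}
```

With the pre-materialized `List<CosmosItemOperation>`, all N wrappers would exist before the first one is consumed; with the lazy form, GC can reclaim each wrapper as soon as the executor is done with it.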
If a bulk operation fails, fall back to individual createItem calls with retry logic (max 5 retries for transient errors: 410, 408, 429, 500, 503) and 409 conflict suppression. The retry helper is centralized in BenchmarkHelper.retryFailedBulkOperations(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. HttpHeaders.set()/getHeader(): Add a toLowerCaseIfNeeded() fast path that skips the String.toLowerCase() allocation when the header name is already all-lowercase (common for x-ms-* and standard Cosmos headers).
2. RxGatewayStoreModel.getUri(): Build the URI via StringBuilder instead of the 7-arg URI constructor, which re-validates and re-encodes all components. Since the components are already well-formed, the single-arg URI(String) constructor is sufficient and avoids URI$Parser overhead.
3. RxDocumentServiceRequest: Cache the getCollectionName() result to avoid repeated O(n) slash-scanning across 14+ call sites per request lifecycle. The cache is invalidated when resourceAddress changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
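A minimal stdlib sketch of point 2, the getUri() change — the class name and component values here are illustrative, not the SDK's actual code:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriBuildSketch {
    // Multi-arg constructor: re-validates and re-encodes every component (URI$Parser work).
    static URI viaComponents(String host, int port, String path) throws URISyntaxException {
        return new URI("https", null, host, port, path, null, null);
    }

    // The optimization: components are already well-formed, so plain string
    // concatenation suffices; parse lazily only if a URI object is ever needed.
    static String viaStringBuilder(String host, int port, String path) {
        return new StringBuilder("https://")
            .append(host).append(':').append(port).append(path)
            .toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String fast = viaStringBuilder("contoso.documents.azure.com", 443, "/dbs/db1/colls/c1");
        // For well-formed components both paths produce the same URI text.
        System.out.println(fast.equals(
            viaComponents("contoso.documents.azure.com", 443, "/dbs/db1/colls/c1").toString()));
    }
}
```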
The char-by-char scan added method call + branch overhead that offset the toLowerCase savings. Profiling showed ConcurrentHashMap.get(), HashMap.putVal(), and the scan loop itself caused ~10% throughput regression. Reverting to original toLowerCase(Locale.ROOT) which the JIT handles as an intrinsic. The URI construction and collection name caching optimizations are retained as they don't have this issue. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The JFR profiling showed URI$Parser.parse() consuming ~757 CPU samples per 60s recording, all from RxGatewayStoreModel.getUri(). The root cause was a String→URI→String round-trip: we built a URI string, parsed it into java.net.URI (expensive), then Reactor Netty called .toASCIIString() to convert it back to a String.

Changes:
- RxGatewayStoreModel.getUri() now returns String directly (no URI parse)
- HttpRequest: add a uriString field with lazy URI parsing via uri()
- HttpRequest: new String-based constructor to skip the URI parse entirely
- ReactorNettyClient: use request.uriString() instead of uri().toASCIIString()
- RxGatewayStoreModel: use uriString() for diagnostics/error paths
- URI is only parsed lazily on error paths that require a URI object

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
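The lazy-parse shape described above can be sketched as follows — the `Request` class is a hypothetical mirror of the HttpRequest change, not the SDK's actual type:

```java
import java.net.URI;

public class LazyUriSketch {
    // Hypothetical mirror of the HttpRequest change: keep the String, parse on demand.
    static final class Request {
        private final String uriString;
        private URI uri; // parsed at most once, only on paths that need a URI object

        Request(String uriString) { this.uriString = uriString; }

        String uriString() { return uriString; }   // hot path: no URI$Parser work
        URI uri() {                                // error/diagnostic path only
            if (uri == null) uri = URI.create(uriString);
            return uri;
        }
    }

    public static void main(String[] args) {
        Request r = new Request("https://contoso.documents.azure.com/dbs/db1/docs");
        System.out.println(r.uriString());         // no parse happens here
        System.out.println(r.uri().getHost());     // parsed lazily on first use
    }
}
```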
Add http2Enabled and http2MaxConcurrentStreams config options to TenantWorkloadConfig. When http2Enabled=true, configures Http2ConnectionConfig on GatewayConnectionConfig for AsyncBenchmark, AsyncCtlWorkload, and AsyncEncryptionBenchmark.

Usage in workload JSON config:

```json
"http2Enabled": true,
"http2MaxConcurrentStreams": 30
```

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aults Add missing cases in applyField switch statement so these fields are properly inherited from tenantDefaults, not only from individual tenant entries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ensures every @JsonProperty field in TenantWorkloadConfig has a corresponding case in the applyField() switch statement. This prevents future fields from silently failing to inherit from tenantDefaults, which was the root cause of the http2Enabled bug. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously, every Gateway response copied ALL Netty response headers through a 3-step chain:
1. Netty headers → HttpHeaders (toLowerCase + new HttpHeader per entry)
2. HttpHeaders.toLowerCaseMap() → new HashMap<String,String>
3. StoreResponse constructor → String[] arrays

Now the flow is:
1. Netty headers → Map<String,String> directly (single toLowerCase pass)
2. StoreResponse constructor → String[] arrays

Changes:
- HttpResponse: add headerMap() returning Map<String,String> directly
- ReactorNettyHttpResponse: override headerMap() to build a lowercase map from Netty headers without an intermediate HttpHeaders object
- HttpTransportSerializer: unwrapToStoreResponse takes Map<String,String> instead of HttpHeaders
- RxGatewayStoreModel: use httpResponse.headerMap() instead of headers()
- ThinClientStoreModel: pass response.getHeaders().asMap() directly instead of wrapping in new HttpHeaders()

This eliminates per-response: ~20 HttpHeader object allocations, ~20 extra toLowerCase calls, and one intermediate HashMap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StoreResponse now stores the response headers Map<String,String> directly instead of converting to parallel String[] arrays. This eliminates a redundant copy, since RxDocumentServiceResponse and StoreClient were immediately converting back to a Map.

Before: Map → String[] + String[] → Map (3 allocations, 2 iterations)
After: Map shared directly (0 extra allocations, 0 extra iterations)

Also upgrades StoreResponse.getHeaderValue() from an O(n) linear scan to an O(1) HashMap.get() with a case-insensitive fallback. Null header values from Netty are skipped (matching the old HttpHeaders.set behavior, which removed null entries).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
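The O(1) lookup with case-insensitive fallback described above can be sketched like this — the method and map contents are illustrative, not the actual StoreResponse code:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HeaderLookupSketch {
    // Headers are stored lowercase; most callers already pass lowercase names,
    // so the common case is a single HashMap.get with no extra allocation.
    static String getHeaderValue(Map<String, String> headers, String name) {
        String v = headers.get(name);
        if (v != null) return v;
        // Case-insensitive fallback for mixed-case callers.
        return headers.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("x-ms-activity-id", "abc-123");
        System.out.println(getHeaderValue(headers, "x-ms-activity-id")); // fast path
        System.out.println(getHeaderValue(headers, "X-MS-Activity-Id")); // fallback path
    }
}
```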
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The new toArray(new String[0]) calls in getResponseHeaderNames() and getResponseHeaderValues() created garbage arrays on every call. These methods have zero production callers — only test validators used them.

Changes:
- Mark getResponseHeaderNames/Values as @deprecated
- Update StoreResponseValidator to use the getResponseHeaders() map directly instead of converting to arrays and doing indexOf lookups

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert the headerMap() direct-from-Netty path because the per-header toLowerCase() calls caused a throughput regression vs v4. The JIT optimizes the existing HttpHeaders.set() + toLowerCaseMap() path better.

Kept improvements:
- StoreResponse stores Map<String,String> directly (no String[] arrays)
- RxDocumentServiceResponse shares the Map reference (no extra copy)
- StoreClient uses getResponseHeaders() directly (no Map reconstruction)
- StoreResponse.getHeaderValue() uses HashMap.get() instead of an O(n) scan
- unwrapToStoreResponse calls toLowerCaseMap() once and reuses the Map for both validateOrThrow and StoreResponse construction

Net effect vs v4: eliminates the Map→String[]→Map round-trip while preserving the JIT-optimized HttpHeaders copy path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Netty's HttpObjectDecoder starts with a 256-byte buffer for header parsing and resizes via ensureCapacityInternal() as headers grow. Cosmos responses have ~2-4KB of headers, triggering multiple resizes. Pre-sizing to 16KB (16384 bytes) avoids the resize overhead at the cost of ~16KB per connection (negligible vs connection pool size). JFR v6 showed AbstractStringBuilder.ensureCapacityInternal at 248 samples (1.6% CPU). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert all header copy chain changes (R3/v5/v6/v7) back to the v4 state which had the best throughput. Only addition on top of v4 is initialBufferSize(16384) to pre-size Netty's header parsing buffer and reduce ensureCapacityInternal() resize overhead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Benchmark showed initialBufferSize change also produced regression. Reverting to pure v4 state (URI elimination + collection name cache) which had the best throughput at 2,421 ops/s. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After bulk document pre-population, the CPU spike can pollute workload metrics. Add CpuMonitor utility that captures baseline CPU before ingestion and waits for it to settle (baseline + 10%, max 5 minutes) before starting the workload. Cool-down is internal default behavior — not user-configurable. Benchmark duration is unaffected since each benchmark measures its own start time inside run(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
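A minimal sketch of the cool-down loop described above, assuming the baseline + 10% threshold and a max-wait deadline from the text; the `DoubleSupplier` injection is test scaffolding, and the real CpuMonitor reads the JVM's OperatingSystemMXBean instead:

```java
import java.util.function.DoubleSupplier;

public class CpuSettleSketch {
    // Hypothetical mirror of the CpuMonitor cool-down: wait until process CPU
    // drops back to baseline + 10%, giving up after a deadline.
    static boolean waitForSettle(DoubleSupplier cpuLoad, double baseline,
                                 long deadlineMillis, long pollMillis) throws InterruptedException {
        double threshold = baseline + 0.10;
        long deadline = System.currentTimeMillis() + deadlineMillis;
        while (System.currentTimeMillis() < deadline) {
            double load = cpuLoad.getAsDouble();
            // NaN / negative readings (MXBean warm-up) are treated as "not settled yet".
            if (!Double.isNaN(load) && load >= 0 && load <= threshold) return true;
            Thread.sleep(pollMillis);
        }
        return false; // max wait reached; start the workload anyway
    }

    public static void main(String[] args) throws InterruptedException {
        double[] samples = {Double.NaN, 0.92, 0.55, 0.31};
        int[] i = {0};
        DoubleSupplier fake = () -> samples[Math.min(i[0]++, samples.length - 1)];
        // Baseline 0.25 -> threshold 0.35; the fourth sample (0.31) settles.
        System.out.println(waitForSettle(fake, 0.25, 1_000, 1));
    }
}
```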
… types These operation types were functionally identical to their Throughput counterparts after metrics capture was unified. Remove them to reduce confusion and dead code paths. Affected: Operation enum, AsyncBenchmark, SyncBenchmark, AsyncEncryptionBenchmark, BenchmarkOrchestrator, Main, tests, README, and workload-config-sample.json. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix thread-unsafe ArrayList mutations: build docs/operations eagerly in loops instead of reactive map(), use Collections.synchronizedList() for failedResponses across AsyncBenchmark, AsyncCtlWorkload, ReadMyWriteWorkflow, and AsyncEncryptionBenchmark
- Fix encryption retry bypass: refactor retryFailedBulkOperations to accept a BiFunction<PojoizedJson, PartitionKey, Mono<Void>> so the encryption benchmark retries through the encryption container
- Re-throw errors after retries are exhausted instead of silently swallowing (per reviewer direction)
- Remove unused partitionKeyName parameter from retryFailedBulkOperations
- Add NaN/negative handling for getProcessCpuLoad() in CpuMonitor
- Cache OperatingSystemMXBean as static final in CpuMonitor
- Log a warning when HTTP/2 is enabled but connection mode is DIRECT
- Add retry logic with transient error handling to SyncBenchmark pre-population (408, 410, 429, 500, 503) and 409 conflict handling
- Rename writeLatency->writeThroughputWithDataProvider and readLatency->readThroughputWithDataProvider in WorkflowTest
- Reduce per-item error logging to debug level in AsyncCtlWorkload, emit an aggregated warn summary
- Fix brittle test path: use the basedir property, UTF-8 charset, and restrict the regex to the applyField method body

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove HTTP/2 Direct mode warnings: Direct mode also uses gateway connections for metadata, so HTTP/2 settings can still be relevant
- Optimize memory: build docs eagerly into a list but create CosmosItemOperations lazily via Flux.fromIterable().map(), avoiding storing both lists simultaneously
- Rewrite TenantWorkloadConfigApplyFieldTest to use pure reflection: invoke private applyField() via reflection for each @JsonProperty and verify the field was set, eliminating brittle source-file parsing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Revert to lazy doc creation via Flux.range().map() — this is thread-safe because Reactive Streams guarantees serial onNext signals at the map stage (rule 1.3). The thread-safety fix for failedResponses (Collections.synchronizedList) is retained, since doOnNext on executeBulkOperations can fire from executor threads.
- Store only id + partitionKey in docsToRead instead of full documents. All docsToRead consumers only access getId() and getProperty(pk). This reduces memory from O(N * docSize) to O(N * idSize).
- Add BenchmarkHelper.idsToLightweightDocs() utility for constructing minimal PojoizedJson objects from collected ids.
- ReadMyWriteWorkflow retains full docs in its cache since queries need QUERY_FIELD_NAME, but still uses lazy creation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reverts the id-only optimization — docsToRead retains full PojoizedJson objects as before. Lazy creation via Flux.range().map() is kept since map() is serial per Reactive Streams spec. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Switch from Retry.max(5) to Retry.backoff(5, 100ms) with max 5s backoff and 0.5 jitter, aligned with the BulkWriter reference pattern
- Add status code 449 (RetryWith) to the retryable set, matching BulkWriter
- Reduce retry concurrency from 100 to 20 per reviewer request
- Add 449 to the SyncBenchmark transient retry set

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
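The backoff schedule above has the shape of Reactor's `Retry.backoff(5, Duration.ofMillis(100)).maxBackoff(Duration.ofSeconds(5)).jitter(0.5)`; the stdlib sketch below only computes the delays (the `delayFor` helper is illustrative, not Reactor's internal algorithm):

```java
import java.time.Duration;
import java.util.Random;

public class BackoffSketch {
    // Exponential delay from a 100 ms base, capped at 5 s, with +/-50% jitter.
    static Duration delayFor(int attempt, Random rnd) {
        long base = 100L << (attempt - 1);                    // 100, 200, 400, 800, 1600 ms
        long capped = Math.min(base, 5_000L);                 // maxBackoff cap
        double jitterFactor = 1.0 + (rnd.nextDouble() - 0.5); // 0.5x .. 1.5x
        return Duration.ofMillis((long) (capped * jitterFactor));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int attempt = 1; attempt <= 5; attempt++) {
            // Each delay lands within [0.5x, 1.5x] of the capped exponential step.
            System.out.println("attempt " + attempt + ": " + delayFor(attempt, rnd).toMillis() + " ms");
        }
    }
}
```

Jitter spreads out retries so that many failed operations do not all hammer the service again at the same instant.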
Changed from instance field initialized in constructor to static final field, following the standard Logger pattern. Uses package-visible access since subclasses in the same package reference it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- AsyncBenchmark: static final (package-visible for subclasses)
- SyncBenchmark: static final (package-visible for consistency)
- AsyncCtlWorkload: private static final (no subclasses)
- Removed constructor logger assignments in all three classes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pass workload concurrency to retryFailedBulkOperations and cap at 20, so retry parallelism adapts to the configured workload concurrency. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dler

Replace O(n) forEach iteration over all HTTP/2 response headers with a direct O(1) hash lookup via Http2Headers.get(). This handler runs on the IO event-loop thread and was consuming ~9.1% of total CPU by scanning all 15-25 headers on every response just to find x-ms-serviceversion.

Changes:
- Use headers.get(SERVER_VERSION_KEY) instead of headers.forEach()
- Cache the header key as a static AsciiString constant
- Remove unused StringUtils import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
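The lookup-then-clean shape can be sketched with a plain Map standing in for Netty's Http2Headers — the trim behavior is an assumption based on the handler's purpose (a later commit references its "extra whitespace" log), and the key constant mirrors the cached AsciiString described above:

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderCleanerSketch {
    // Stand-in for the cached static AsciiString constant in the real handler.
    static final String SERVER_VERSION_KEY = "x-ms-serviceversion";

    // One hash lookup instead of iterating every response header; only
    // re-set the value when cleaning actually changed it.
    static void cleanServiceVersion(Map<String, String> headers) {
        String value = headers.get(SERVER_VERSION_KEY);
        if (value != null) {
            String trimmed = value.trim();
            if (!trimmed.equals(value)) headers.put(SERVER_VERSION_KEY, trimmed);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put(SERVER_VERSION_KEY, " version=2.14.0 ");
        headers.put("x-ms-activity-id", "abc");
        cleanServiceVersion(headers);
        System.out.println("[" + headers.get(SERVER_VERSION_KEY) + "]"); // prints [version=2.14.0]
    }
}
```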
…pplyFieldTest The test was using 'test-sentinel-value' which fails Integer.parseInt() inside applyField(). Since applyField() catches exceptions internally and does not rethrow, the test could not detect the NumberFormatException and incorrectly reported Integer fields as missing from the switch. Changed sentinel to '42' which is valid for String, Integer, and Boolean (via Boolean.parseBoolean) field types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…p2-header-cleaner-handler
Contributor
Pull request overview
This PR targets Cosmos Java HTTP/2 performance by optimizing response header cleanup on the hot path, and it also updates the Cosmos benchmark harness to better support/measure HTTP/2 throughput scenarios.
Changes:
- Optimize HTTP/2 response header cleanup by replacing per-header iteration with a direct lookup for x-ms-serviceversion.
- Refactor benchmark pre-population to use bulk operations + a retry helper, and add a CPU cool-down step between ingestion and the measured workload.
- Remove benchmark latency operation modes, updating configs/tests/docs accordingly, and add HTTP/2 benchmark config knobs.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2ResponseHeaderCleanerHandler.java | Replace O(n) header scan with direct lookup + trim for x-ms-serviceversion. |
| sdk/cosmos/azure-cosmos-benchmark/workload-config-sample.json | Update sample operation to ReadThroughput. |
| sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/WorkflowTest.java | Rename/update tests to use throughput operations. |
| sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/TenantWorkloadConfigApplyFieldTest.java | New reflection-based test to ensure applyField() covers all @JsonProperty fields. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionBenchmark.java | Enable HTTP/2 gateway config + switch pre-population to bulk + retry helper. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ctl/AsyncCtlWorkload.java | Enable HTTP/2 gateway config + switch pre-population to bulk + retry helper. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/TenantWorkloadConfig.java | Add http2Enabled/http2MaxConcurrentStreams config + applyField support. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncBenchmark.java | Adjust pre-population behavior and add retry/backoff logic for createItem. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ReadMyWriteWorkflow.java | Migrate workflow to v4 container APIs and bulk pre-population. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/Operation.java | Remove latency operations from benchmark CLI enum. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/Main.java | Update validation messaging and operation handling after latency op removal. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CpuMonitor.java | New CPU monitor utility used to cool down between ingestion and workload. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java | Capture baseline CPU + cool-down before workload execution. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkHelper.java | Add shared retry helper for failed bulk operation responses. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncBenchmark.java | Enable HTTP/2 gateway config + switch pre-population to bulk + retry helper. |
| sdk/cosmos/azure-cosmos-benchmark/README.md | Update docs to refer to throughput workloads; remove latency operations. |
Resolved review threads:
- sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncBenchmark.java (outdated)
- ...os/src/main/java/com/azure/cosmos/implementation/http/Http2ResponseHeaderCleanerHandler.java (outdated)
- ...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
- ...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
- ...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ctl/AsyncCtlWorkload.java (outdated)
…thub.com/xinlian12/azure-sdk-for-java into perf/p1-fix-http2-header-cleaner-handler
…, and exception cause

- Fix log grammar in Http2ResponseHeaderCleanerHandler: 'There is extra whitespace'
- Track lastException in SyncBenchmark retry loop for proper cause chaining
- Renumber lifecycle steps 3/4 to 5/6 in BenchmarkOrchestrator
- Fix misleading log message in AsyncCtlWorkload pre-population error

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Author
/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
The Configuration class was refactored to remove CLI parameters in favor of JSON-based workload config. Update readMyWritesCLI and writeThroughputCLI tests to create a temp JSON config file and pass -workloadConfig to Main.main(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eness The ConsistencyLevel enum uses SCREAMING_SNAKE_CASE (BOUNDED_STALENESS) but config values use PascalCase display names (BoundedStaleness). Simple toUpperCase() + valueOf() fails for multi-word values. Match by both display name and enum name case-insensitively. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
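The dual-form matching described above can be sketched as follows — the enum subset and `parse` helper are illustrative, not the SDK's actual ConsistencyLevel or config code:

```java
public class ConsistencyMatchSketch {
    // Hypothetical mirror of the fix: enum names are SCREAMING_SNAKE_CASE but
    // config files use PascalCase display names, so match against both forms.
    enum ConsistencyLevel { STRONG, BOUNDED_STALENESS, SESSION, EVENTUAL, CONSISTENT_PREFIX }

    static ConsistencyLevel parse(String value) {
        for (ConsistencyLevel level : ConsistencyLevel.values()) {
            // BOUNDED_STALENESS -> BOUNDEDSTALENESS, compared ignoring case/underscores.
            String compact = level.name().replace("_", "");
            if (level.name().equalsIgnoreCase(value)
                || compact.equalsIgnoreCase(value.replace("_", ""))) {
                return level;
            }
        }
        throw new IllegalArgumentException("Unknown consistency level: " + value);
    }

    public static void main(String[] args) {
        System.out.println(parse("BoundedStaleness")); // matches via display-name form
        System.out.println(parse("session"));          // matches via enum-name form
    }
}
```

A plain toUpperCase() + valueOf() works for single-word values like "Session" but fails on "BoundedStaleness", which is exactly the bug described.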
Member
Author
/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
Changes (Http2ResponseHeaderCleanerHandler.java):
Current code:
change into ->
H2 Benchmark Sweep Results
Summary
Benchmark sweep comparing main (baseline) vs p1fix (this PR) across WriteThroughput and ReadThroughput at concurrency levels c3, c5, c10, c15, c20. Each run: 30 multi-tenant Cosmos DB accounts, Gateway/H2 mode, 10-min duration, 5-min cooldown.
p1fix wins across the board on writes (+0.2–3.6%) and on most reads (+1.6–8.8%). It delivers lower mean latency (1–9%) with identical resource footprint (memory, GC, threads). At CPU-saturated concurrency (c20 reads, 99.6% CPU), results converge — both branches are equally bottlenecked.
Steady-State Throughput
Steady-state: skip first 1 minute (warmup), drop last minute (partial).
Write Throughput (ops/s)
Read Throughput (ops/s)
Steady-State Mean Latency
Steady-State CPU
Memory (Heap)
GC Metrics
Thread Count
Per-Minute Timelines
Line 1 = main (baseline), Line 2 = p1fix (this PR). Each point is a 1-minute reporting interval.
Write c3 Throughput — main: 510 | p1fix: 529 ops/s
```mermaid
xychart-beta
    title "Write c3 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 13 --> 591
    line [59.3, 500.2, 503.9, 506.2, 510.3, 484.1, 516.0, 525.0, 521.0, 526.0, 443.2]
    line [478.6, 526.9, 525.6, 531.1, 532.2, 501.6, 537.0, 534.7, 533.2, 535.5, 15.9]
```

Write c5 Throughput — main: 880 | p1fix: 893 ops/s

```mermaid
xychart-beta
    title "Write c5 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 166 --> 994
    line [614.7, 882.0, 874.0, 885.9, 896.2, 809.8, 893.6, 894.5, 890.0, 896.3, 207.1]
    line [382.0, 884.2, 893.8, 892.7, 901.6, 886.9, 903.2, 888.2, 889.5, 898.8, 451.2]
```

Write c10 Throughput — main: 1579 | p1fix: 1583 ops/s

```mermaid
xychart-beta
    title "Write c10 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 27 --> 1766
    line [161.5, 1585.0, 1605.2, 1580.4, 1558.7, 1520.7, 1584.3, 1591.4, 1592.8, 1594.4, 1264.7]
    line [33.7, 1538.7, 1593.3, 1588.7, 1596.9, 1574.1, 1573.0, 1579.9, 1603.8, 1595.2, 1465.6]
```

Write c15 Throughput — main: 1837 | p1fix: 1875 ops/s

```mermaid
xychart-beta
    title "Write c15 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 183 --> 2110
    line [1470.1, 1894.3, 1847.1, 1815.8, 1802.9, 1775.6, 1832.3, 1842.8, 1865.5, 1855.6, 228.8]
    line [1263.6, 1918.1, 1880.7, 1881.8, 1872.4, 1809.6, 1850.1, 1881.8, 1889.9, 1890.1, 466.8]
```

Write c20 Throughput — main: 1796 | p1fix: 1834 ops/s

```mermaid
xychart-beta
    title "Write c20 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 327 --> 2104
    line [747.5, 1788.2, 1775.4, 1794.3, 1793.5, 1754.4, 1794.5, 1805.9, 1828.2, 1830.7, 846.6]
    line [409.1, 1912.9, 1900.1, 1841.4, 1807.3, 1782.5, 1752.0, 1797.9, 1840.4, 1867.3, 1255.1]
```

Read c3 Throughput — main: 1091 | p1fix: 1120 ops/s

```mermaid
xychart-beta
    title "Read c3 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 39 --> 1250
    line [49.2, 1094.1, 1096.8, 1101.6, 1104.3, 1063.4, 1086.7, 1088.4, 1093.0, 1090.5, 998.3]
    line [695.4, 1116.8, 1112.5, 1125.1, 1132.2, 1067.7, 1129.8, 1133.2, 1129.3, 1136.0, 350.3]
```

Read c5 Throughput — main: 1421 | p1fix: 1449 ops/s

```mermaid
xychart-beta
    title "Read c5 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 182 --> 1611
    line [227.2, 1438.2, 1425.1, 1436.7, 1433.9, 1407.2, 1404.2, 1416.5, 1417.1, 1410.8, 1095.7]
    line [1067.3, 1446.9, 1442.2, 1462.1, 1451.1, 1446.5, 1461.9, 1464.5, 1443.9, 1420.7, 274.4]
```

Read c10 Throughput — main: 1663 | p1fix: 1809 ops/s

```mermaid
xychart-beta
    title "Read c10 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 40 --> 2014
    line [605.0, 1679.0, 1665.4, 1670.5, 1677.1, 1637.1, 1670.3, 1665.4, 1655.6, 1646.0, 910.2]
    line [49.9, 1802.5, 1831.0, 1822.6, 1817.8, 1797.1, 1797.3, 1808.9, 1808.8, 1792.3, 1639.6]
```

Read c15 Throughput — main: 1744 | p1fix: 1773 ops/s

```mermaid
xychart-beta
    title "Read c15 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 249 --> 1976
    line [1292.9, 1770.6, 1772.7, 1767.2, 1771.4, 1706.5, 1729.2, 1720.1, 1726.3, 1734.8, 311.6]
    line [847.1, 1796.5, 1785.5, 1784.8, 1776.1, 1752.5, 1766.7, 1769.1, 1763.0, 1759.3, 767.8]
```

Read c20 Throughput — main: 1796 | p1fix: 1776 ops/s

```mermaid
xychart-beta
    title "Read c20 Throughput (ops/s)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ops/s" 161 --> 2004
    line [200.8, 1821.8, 1809.5, 1807.1, 1809.4, 1778.0, 1776.2, 1787.2, 1783.3, 1790.8, 1442.7]
    line [1202.5, 1790.7, 1762.2, 1778.2, 1758.9, 1764.4, 1778.6, 1775.2, 1789.4, 1783.8, 411.6]
```

Write c3 CPU
```mermaid
xychart-beta
    title "Write c3 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 13 --> 41
    line [37.5, 31.6, 31.7, 31.4, 31.8, 33.1, 33.3, 33.7, 34.0, 32.9, 32.2]
    line [34.5, 32.3, 33.9, 34.2, 34.8, 34.9, 33.4, 34.8, 34.5, 34.4, 15.8]
```

Write c5 CPU

```mermaid
xychart-beta
    title "Write c5 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 41 --> 71
    line [58.9, 60.1, 61.4, 61.9, 63.4, 60.6, 64.0, 63.2, 62.8, 62.9, 60.8]
    line [51.1, 62.0, 61.7, 62.6, 63.5, 64.8, 64.8, 63.6, 63.3, 64.4, 55.0]
```

Write c10 CPU

```mermaid
xychart-beta
    title "Write c10 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 38 --> 107
    line [47.2, 97.0, 97.1, 97.4, 97.5, 96.8, 97.3, 97.4, 97.4, 97.3, 97.1]
    line [48.8, 96.9, 96.6, 97.0, 97.0, 96.7, 97.5, 97.2, 97.3, 97.4, 97.4]
```

Write c15 CPU

```mermaid
xychart-beta
    title "Write c15 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 66 --> 109
    line [82.6, 99.1, 99.1, 99.2, 99.1, 99.2, 99.2, 99.2, 99.2, 99.2, 98.8]
    line [88.6, 99.1, 99.1, 99.0, 99.1, 97.6, 97.1, 97.4, 98.0, 98.2, 98.1]
```

Write c20 CPU

```mermaid
xychart-beta
    title "Write c20 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 52 --> 109
    line [72.1, 99.3, 99.3, 99.3, 99.3, 99.2, 99.3, 99.3, 99.3, 99.3, 99.1]
    line [65.1, 99.1, 98.4, 97.9, 97.9, 97.3, 97.3, 97.6, 98.2, 98.7, 98.9]
```

Read c3 CPU

```mermaid
xychart-beta
    title "Read c3 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 16 --> 97
    line [20.4, 87.2, 87.6, 87.8, 87.9, 84.1, 86.9, 86.3, 86.1, 86.3, 86.5]
    line [54.7, 85.2, 86.6, 86.1, 86.2, 81.3, 86.5, 86.4, 86.5, 86.8, 85.8]
```

Read c5 CPU

```mermaid
xychart-beta
    title "Read c5 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11", "min 12"]
    y-axis "CPU %" 11 --> 108
    line [36.2, 97.8, 98.1, 98.1, 98.0, 97.7, 96.8, 96.9, 97.0, 97.3, 94.1, 0.0]
    line [13.3, 75.2, 97.9, 98.0, 98.1, 97.9, 97.7, 97.9, 98.1, 96.8, 95.2, 93.0]
```

Read c10 CPU

```mermaid
xychart-beta
    title "Read c10 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 6 --> 109
    line [58.0, 97.8, 98.9, 99.3, 99.4, 99.3, 99.4, 99.4, 99.0, 97.9, 97.9]
    line [7.6, 99.3, 99.4, 99.4, 99.4, 99.3, 99.4, 99.4, 99.4, 99.4, 99.4]
```

Read c15 CPU

```mermaid
xychart-beta
    title "Read c15 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "CPU %" 57 --> 109
    line [78.0, 99.5, 99.5, 99.5, 99.4, 99.5, 99.5, 99.5, 99.5, 99.5, 99.5]
    line [70.6, 99.4, 99.5, 99.5, 99.5, 99.4, 99.5, 99.5, 99.5, 99.5, 99.5]
```

Read c20 CPU

```mermaid
xychart-beta
    title "Read c20 CPU (%)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11", "min 12"]
    y-axis "CPU %" 8 --> 110
    line [38.4, 99.5, 99.5, 99.6, 99.6, 99.5, 99.6, 99.6, 99.6, 99.6, 99.6, 0.0]
    line [9.8, 76.4, 99.5, 99.5, 99.6, 99.5, 99.5, 99.6, 99.6, 99.6, 99.6, 99.6]
```

Write c3 Latency
```mermaid
xychart-beta
    title "Write c3 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 4 --> 23
    line [7.2, 6.0, 5.9, 5.9, 5.8, 6.2, 5.8, 5.7, 5.7, 5.7, 6.2]
    line [5.8, 5.7, 5.7, 5.6, 5.6, 5.9, 5.5, 5.6, 5.6, 5.6, 21.2]
```

Write c5 Latency

```mermaid
xychart-beta
    title "Write c5 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 4 --> 7
    line [6.0, 5.6, 5.7, 5.6, 5.5, 6.1, 5.6, 5.5, 5.6, 5.5, 6.8]
    line [6.3, 5.6, 5.6, 5.6, 5.5, 5.6, 5.5, 5.6, 5.6, 5.5, 5.9]
```

Write c10 Latency

```mermaid
xychart-beta
    title "Write c10 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 5 --> 25
    line [11.8, 6.2, 6.1, 6.2, 6.3, 6.4, 6.2, 6.1, 6.1, 6.1, 6.2]
    line [23.2, 6.3, 6.1, 6.1, 6.1, 6.2, 6.2, 6.2, 6.1, 6.1, 6.1]
```

Write c15 Latency

```mermaid
xychart-beta
    title "Write c15 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 6 --> 9
    line [8.6, 7.7, 7.9, 8.0, 8.0, 8.2, 7.9, 7.9, 7.8, 7.8, 7.6]
    line [8.6, 7.6, 7.7, 7.7, 7.7, 8.0, 7.9, 7.7, 7.7, 7.7, 7.5]
```

Write c20 Latency

```mermaid
xychart-beta
    title "Write c20 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 8 --> 16
    line [13.6, 10.8, 10.9, 10.8, 10.8, 11.1, 10.8, 10.7, 10.6, 10.6, 10.7]
    line [14.9, 10.1, 10.2, 10.5, 10.7, 10.9, 11.1, 10.8, 10.5, 10.3, 10.3]
```

Read c3 Latency

```mermaid
xychart-beta
    title "Read c3 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 2 --> 5
    line [4.7, 2.7, 2.7, 2.7, 2.7, 2.8, 2.7, 2.7, 2.7, 2.7, 2.7]
    line [2.8, 2.6, 2.6, 2.6, 2.6, 2.7, 2.6, 2.6, 2.6, 2.6, 2.8]
```

Read c5 Latency

```mermaid
xychart-beta
    title "Read c5 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 3 --> 5
    line [4.6, 3.3, 3.3, 3.3, 3.3, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4]
    line [3.6, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.4, 3.5]
```

Read c10 Latency

```mermaid
xychart-beta
    title "Read c10 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 4 --> 18
    line [6.9, 5.7, 5.7, 5.7, 5.6, 5.8, 5.7, 5.7, 5.7, 5.8, 5.8]
    line [16.0, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.3, 5.2]
```

Read c15 Latency

```mermaid
xychart-beta
    title "Read c15 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 6 --> 10
    line [8.9, 8.0, 8.0, 8.1, 8.0, 8.3, 8.2, 8.3, 8.2, 8.2, 8.1]
    line [9.1, 7.9, 8.0, 8.0, 8.0, 8.1, 8.1, 8.0, 8.1, 8.1, 8.3]
```

Read c20 Latency

```mermaid
xychart-beta
    title "Read c20 Mean Latency (ms)"
    x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"]
    y-axis "ms" 8 --> 18
    line [16.4, 10.4, 10.5, 10.5, 10.5, 10.7, 10.7, 10.6, 10.6, 10.6, 10.5]
    line [11.8, 10.6, 10.8, 10.7, 10.8, 10.7, 10.7, 10.7, 10.6, 10.6, 10.9]
```

Write c3 Heap
xychart-beta title "Write c3 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 43 --> 71 line [53.9, 54.6, 57.9, 57.7, 58.0, 58.3, 61.4, 61.9, 62.2, 62.5] line [56.7, 56.3, 59.5, 59.8, 60.2, 60.5, 63.6, 64.1, 64.4, 64.7]Write c5 Heap
xychart-beta title "Write c5 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 45 --> 78 line [58.1, 59.9, 62.0, 63.4, 64.3, 68.2, 69.0, 69.6, 70.3, 70.9] line [56.0, 59.5, 62.1, 63.4, 64.3, 66.7, 68.4, 69.2, 69.7, 70.3]Write c10 Heap
xychart-beta title "Write c10 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 44 --> 92 line [62.0, 64.5, 68.1, 70.0, 72.1, 76.3, 77.7, 78.9, 79.9, 80.8] line [55.5, 63.2, 70.0, 72.3, 74.1, 75.5, 79.9, 81.6, 82.7, 83.6]Write c15 Heap
xychart-beta title "Write c15 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 49 --> 98 line [60.9, 69.1, 72.8, 75.6, 77.8, 82.2, 83.7, 85.1, 86.5, 87.7] line [67.3, 72.4, 75.8, 77.6, 79.7, 83.6, 85.0, 86.3, 87.6, 88.8]Write c20 Heap
xychart-beta title "Write c20 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 52 --> 104 line [70.0, 75.7, 78.6, 81.3, 83.3, 87.5, 89.5, 91.2, 92.9, 0.0] line [65.3, 72.7, 77.2, 79.7, 82.4, 87.1, 89.5, 91.2, 93.2, 94.9]Read c3 Heap
xychart-beta title "Read c3 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10", "min 11"] y-axis "MiB" 4 --> 223 line [187.8, 192.7, 193.5, 194.0, 194.6, 198.3, 199.1, 199.6, 200.0, 200.5, 0.0] line [5.0, 190.5, 195.2, 196.0, 196.6, 197.1, 200.7, 201.4, 201.9, 202.4, 202.9]Read c5 Heap
xychart-beta title "Read c5 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 150 --> 233 line [192.8, 201.4, 203.7, 204.7, 205.4, 209.3, 210.2, 210.8, 211.3, 211.9] line [188.0, 195.8, 198.8, 200.4, 201.3, 204.1, 205.7, 206.5, 207.1, 207.7]Read c10 Heap
xychart-beta title "Read c10 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 149 --> 247 line [197.9, 208.3, 211.9, 213.8, 215.3, 220.0, 221.1, 222.0, 222.8, 224.3] line [186.1, 201.5, 205.2, 209.5, 212.0, 216.3, 217.8, 218.6, 219.4, 220.5]Read c15 Heap
xychart-beta title "Read c15 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 157 --> 251 line [197.5, 210.8, 215.5, 217.8, 219.8, 223.8, 225.0, 226.2, 227.6, 228.6] line [196.1, 207.3, 210.9, 213.2, 214.9, 218.8, 220.0, 221.0, 222.2, 223.6]Read c20 Heap
xychart-beta title "Read c20 Heap (MiB)" x-axis ["min 1", "min 2", "min 3", "min 4", "min 5", "min 6", "min 7", "min 8", "min 9", "min 10"] y-axis "MiB" 153 --> 264 line [197.7, 215.6, 221.6, 227.1, 229.3, 233.7, 236.2, 238.0, 239.3, 240.4] line [191.6, 208.5, 214.9, 219.7, 223.2, 227.8, 229.6, 231.6, 233.2, 235.3]Methodology
JVM options used for both benchmark runs:

```
-Xmx8g -Xms8g -XX:+UseG1GC -XX:MaxDirectMemorySize=2g
```