Add Fortio-Envoy optimization guide#29
Conversation
Signed-off-by: Vaibhav Shankar <vaibhav.shankar@intel.com>
| High LLC-load-misses in `perf stat` | Cross-NUMA memory access | Pin to a single NUMA node |
| GC overhead in perf traces (`gcDrain`, `trygetfull`) | High allocation rate with default GC | Build Fortio with GreenTea GC (Go 1.25.1); raise `GOGC` |
| Envoy CPU bottlenecked in TLS | mTLS handshake overhead | Enable TLS session resumption; make the TLS handshake asynchronous |
| NIC softirq on same cores as Envoy | IRQ affinity not set, or NIC not on the same NUMA node | Separate NIC IRQ cores from Envoy worker cores, or ensure cores, memory, and NIC are on the same NUMA node |
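The NUMA and IRQ mitigations in the last two rows can be sketched as follows. This is an illustrative sequence, not the guide's exact commands: the interface name `eth0`, the core ranges, the Envoy image tag, and the port are all assumptions to adapt per SKU (check `lscpu`, `numactl -H`, and `cat /sys/class/net/eth0/device/numa_node` first).

```shell
# Sketch: pin Envoy's CPUs and memory to NUMA node 0, where the NIC lives.
docker run -d --name envoy \
  --cpuset-cpus=4-11 --cpuset-mems=0 \
  envoyproxy/envoy:v1.31-latest

# Steer the NIC's IRQs to cores 0-3 on the same node, away from Envoy's workers.
for irq in $(grep eth0 /proc/interrupts | cut -d: -f1); do
  echo 0f > /proc/irq/$irq/smp_affinity   # CPU mask covering cores 0-3
done

# Check whether cross-NUMA traffic actually dropped.
perf stat -e LLC-load-misses -p "$(pgrep -x envoy)" -- sleep 10
```

Writing `smp_affinity` requires root and a NIC whose driver honors IRQ affinity; on systems running `irqbalance`, disable it first or the masks will be rewritten.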
did not understand this comment
---
### 5. Use as few CPU cores as possible to run Envoy + Fortio
Fortio and Envoy are intentionally run on a reduced number of CPU cores to avoid excessive spin-lock contention and busy-waiting. By limiting core availability, the benchmark prevents threads from continuously spinning on shared locks and instead exposes meaningful contention, scheduling, and proxy-path behavior under realistic CPU pressure.
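A minimal sketch of the core restriction described above (core ranges, image names, and the target URL are illustrative assumptions, not values from the guide):

```shell
# Pin each container to a small, disjoint core set so threads contend
# realistically instead of spinning across many idle cores.
docker run -d --name envoy \
  --cpuset-cpus=0-3 --cpus=4 \
  envoyproxy/envoy:v1.31-latest

docker run -d --name fortio \
  --cpuset-cpus=4-5 --cpus=2 \
  fortio/fortio load -qps 0 -t 60s http://envoy:8080/
```

Keeping the two core sets disjoint matters as much as keeping them small; if Fortio and Envoy share cores, the load generator's scheduling noise contaminates the proxy-path measurement.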
Is the suggestion to increase the Docker CPU quota but restrict to fewer CPU cores? This is a bit confusing. Can you clarify/elaborate?
Should the CPU quota match the number of cores it is being pinned to?
i.e. `--cpus` = `--cpuset-cpus`
#### Worker and socket tuning
- **`SO_REUSEPORT`**: Already used by Envoy by default. Verify it is not disabled by any sysctl - it distributes `accept()` load evenly across worker threads without a shared accept mutex.
- **`--concurrency`**: Tune thread concurrency to match the number of physical cores assigned to the container. Over-provisioning causes false sharing; under-provisioning wastes hardware.
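One way to keep the bullet's rule concrete: launch Envoy with `--concurrency` equal to the cores the container actually gets. The core range, config path, and image tag below are illustrative assumptions.

```shell
# Sketch: 4 pinned cores -> --concurrency 4, so there is exactly one worker
# thread per physical core and no over- or under-subscription.
docker run -d --name envoy \
  --cpuset-cpus=0-3 --cpus=4 \
  -v "$PWD/envoy.yaml:/etc/envoy/envoy.yaml" \
  envoyproxy/envoy:v1.31-latest \
  envoy -c /etc/envoy/envoy.yaml --concurrency 4
```

`--concurrency` is read once at startup, so changing the container's core allocation later requires restarting Envoy with a matching value.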
Same as bullet 2? --concurrency equal to --cpus
### 6. Other Envoy tuning that can be explored
#### Worker and socket tuning
- **`SO_REUSEPORT`**: Already used by Envoy by default. Verify it is not disabled by any sysctl - it distributes `accept()` load evenly across worker threads without a shared accept mutex.
Will be useful to provide which sysctl setting to verify
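On the reviewer's point: I am not aware of a single kernel sysctl that globally disables `SO_REUSEPORT`; what can be verified instead is whether Envoy actually opened one listening socket per worker, and whether the listener's `enable_reuse_port` setting is in effect. The listener port `10000` and admin port `9901` below are assumptions.

```shell
# If SO_REUSEPORT is active, each worker thread binds its own socket, so the
# same listener port appears multiple times - expect one line per worker.
ss -tlnp | grep ':10000'

# Envoy's admin endpoint exposes the effective listener config;
# look for enable_reuse_port in the output.
curl -s http://127.0.0.1:9901/config_dump | grep -i reuse
```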
### 2. Increase CPU Quota for Both Containers
With NUMA pinning in place, increasing the CPU quota for both the Fortio and Envoy containers further reduces scheduling delays and spin-lock overhead. Keep `--concurrency` equal to `--cpus` to avoid over-subscription. The numbers are just examples; tune them for the SKU and core count in use.
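A sketch of raising the quotas on already-running containers while keeping quota, pinned core count, and worker count in sync. The core counts and ranges are illustrative, not a recommendation for any particular SKU.

```shell
# Raise quotas in place; keep --cpus equal to the size of --cpuset-cpus
# so the quota matches the cores each container is pinned to.
docker update --cpus=16 --cpuset-cpus=0-15  envoy
docker update --cpus=8  --cpuset-cpus=16-23 fortio

# Envoy's --concurrency is fixed at startup, so restart it if its core
# count changed; Fortio's connection count (-c) can be scaled similarly.
docker restart envoy
```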
On a 256-thread system, you have allocated only 32+16 cores? What would be a good general recommendation if someone had to tune based on the SKU/cores they have?
Add Fortio-Envoy optimization guide and related documentation.