Add Fortio-Envoy optimization guide#29

Open
vaibhavk2 wants to merge 4 commits into intel:main from vaibhavk2:envoy

@vaibhavk2

Add Fortio-Envoy optimization guide and related documentation.

vaibhavk2 added 3 commits May 1, 2026 09:12
Signed-off-by: Vaibhav Shankar <vaibhav.shankar@intel.com>
Signed-off-by: Vaibhav Shankar <vaibhav.shankar@intel.com>
Signed-off-by: Vaibhav Shankar <vaibhav.shankar@intel.com>
Signed-off-by: Vaibhav Shankar <vaibhav.shankar@intel.com>
@rsiyer-intel rsiyer-intel requested a review from adgubrud May 5, 2026 18:14
Comment thread software/envoy/README.md
| High LLC-load-misses in `perf stat` | Cross-NUMA memory access | Pin to single NUMA node |
| GC overhead in perf traces (`gcDrain`, `trygetfull`) | High allocation rate with default GC | Build Fortio with GreenTea GC (Go 1.25.1); raise `GOGC` |
| Envoy CPU bottlenecked in TLS | mTLS handshake overhead | Enable TLS session resumption. Make TLS communication/handshake async |
| NIC softirq on same cores as Envoy | IRQ affinity not set or not part of of same NUMA | Separate NIC IRQ cores from Envoy worker cores or ensure cores + mem + NIC are on same NUMA |
Collaborator

duplicate of

Author

did not understand this comment

Collaborator

Suggested change
| NIC softirq on same cores as Envoy | IRQ affinity not set or not part of of same NUMA | Separate NIC IRQ cores from Envoy worker cores or ensure cores + mem + NIC are on same NUMA |
| NIC softirq on same cores as Envoy | IRQ affinity not set or not part of same NUMA | Separate NIC IRQ cores from Envoy worker cores or ensure cores + mem + NIC are on same NUMA |

Comment thread software/envoy/README.md

---
### 5. Use as few CPU cores as possible to run Envoy + Fortio
Fortio and Envoy are intentionally run on a reduced number of CPU cores to avoid excessive spin-lock contention and busy-waiting. Limiting core availability keeps threads from continuously spinning on shared locks, so the benchmark instead exposes meaningful contention, scheduling, and proxy-path behavior under realistic CPU pressure.
Collaborator

Is the suggestion to increase the Docker CPU quota but restrict the containers to fewer CPU cores? This is a bit confusing. Can you clarify/elaborate?
Should the CPU quota match the number of cores the container is pinned to,
i.e. `--cpus` equal to the number of cores listed in `--cpuset-cpus`?
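
One possible reading of this question, sketched in Python: derive the quota from the pinned core list so the two cannot drift apart. This assumes the quota should equal the number of pinned cores, which the PR author has not confirmed here. The helper names `parse_cpuset` and `docker_cpu_flags` are hypothetical; `--cpuset-cpus` and `--cpus` are the Docker flags named in the comment.

```python
from __future__ import annotations


def parse_cpuset(spec: str) -> list[int]:
    """Expand a cpuset string such as "0-3,8,10-11" into a sorted list of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if hi < lo:
                raise ValueError(f"invalid range in cpuset: {part!r}")
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def docker_cpu_flags(cpuset: str) -> list[str]:
    """Build Docker flags where the CPU quota equals the pinned core count.

    Assumption (not confirmed by the PR): quota == number of pinned cores.
    """
    count = len(parse_cpuset(cpuset))
    if count == 0:
        raise ValueError("cpuset must name at least one CPU")
    return [f"--cpuset-cpus={cpuset}", f"--cpus={count}"]


if __name__ == "__main__":
    # Example core range only; pick cores that sit on one NUMA node of your SKU.
    print(" ".join(docker_cpu_flags("0-15")))
```

Under the same assumption, the count returned by `parse_cpuset` could also feed Envoy's `--concurrency`, in line with the guide's advice to keep `--concurrency` equal to `--cpus`.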

Comment thread software/envoy/README.md

#### Worker and socket tuning
- **`SO_REUSEPORT`**: Already used by Envoy by default. Verify it is not disabled by any sysctls - it distributes `accept()` load evenly across worker threads without a shared accept mutex.
- **`--concurrency`**: Tune thread concurrency to match the number of physical cores assigned to the container. Over-provisioning causes false sharing; under-provisioning wastes hardware.
Collaborator

Same as bullet 2? --concurrency equal to --cpus

Comment thread software/envoy/README.md
### 6. Other Envoy Tuning -- which can be explored

#### Worker and socket tuning
- **`SO_REUSEPORT`**: Already used by Envoy by default. Verify it is not disabled by any sysctls - it distributes `accept()` load evenly across worker threads without a shared accept mutex.
Collaborator

Will be useful to provide which sysctl setting to verify
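
The README line does not name a specific sysctl, and this sketch does not guess one. As an alternative, a host-level probe can confirm empirically that `SO_REUSEPORT` is usable: it binds two listening sockets to the same loopback port with the option set. The function name `reuseport_works` is hypothetical, and the check says nothing about how Envoy itself is configured.

```python
import socket


def reuseport_works(host: str = "127.0.0.1") -> bool:
    """Return True if two listeners can share one port via SO_REUSEPORT."""
    if not hasattr(socket, "SO_REUSEPORT"):
        # The platform's socket module does not expose the option at all.
        return False
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        first.bind((host, 0))  # port 0 lets the kernel pick a free port
        port = first.getsockname()[1]
        first.listen(1)
        second.bind((host, port))  # succeeds only if port sharing is allowed
        second.listen(1)
        return True
    except OSError:
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("SO_REUSEPORT usable:", reuseport_works())
```

Running this inside the same container image and network namespace as Envoy gives the most relevant answer, since that is the environment the listener sockets are created in.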

Comment thread software/envoy/README.md

### 2. Increase CPU Quota for Both Containers

With NUMA pinning in place, increasing the CPU quota for both Fortio and Envoy containers further reduces scheduling delays and spin lock overhead. Keep `--concurrency` equal to `--cpus` to avoid over-subscription. The numbers are just examples. Tune this based on the SKU/cores used.
Collaborator

On a 256-thread system, you have allocated only 32+16 cores? What would be a good general recommendation for someone who has to tune based on the SKU/cores they have?

2 participants