virtio/net/unixgram: handle ENOBUFS on macOS by retrying #556

Open
nirs wants to merge 1 commit into containers:main from nirs:retry-on-enobufs

@nirs nirs commented Feb 19, 2026

When running iperf3 with gvproxy, krunkit breaks quickly with:

[2026-02-19T02:53:41Z ERROR devices::virtio::net::worker] Failed to process rx:
Backend(Internal(ENOBUFS)) (triggered by backend socket readable)

macOS returns ENOBUFS when the kernel socket buffer is full, rather than blocking or returning EAGAIN. The kernel does not provide any readiness notification for this condition, so the only recovery is to retry the send.

Map ENOBUFS to WriteError::NothingWritten so the caller retries the frame instead of treating it as a fatal error.
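
A minimal sketch of this mapping, using only the standard library so it stands alone. The `WriteError` enum here is a hypothetical stand-in for the crate's type, `map_send_result` is an illustrative name, and the real change matches on `nix::Error::ENOBUFS` rather than a raw errno value:

```rust
use std::io;

/// Hypothetical stand-in for the crate's WriteError; the real enum may differ.
#[derive(Debug)]
pub enum WriteError {
    /// Nothing was sent; the caller may retry the same frame later.
    NothingWritten,
    /// Any other failure, treated as fatal by the caller.
    Internal(io::Error),
}

/// errno value of ENOBUFS on macOS (55 in <sys/errno.h>).
/// Hardcoded here only to avoid an external crate in this sketch.
pub const ENOBUFS_MACOS: i32 = 55;

/// Classify the result of a datagram send (illustrative helper, not the PR's code).
pub fn map_send_result(result: io::Result<usize>) -> Result<(), WriteError> {
    match result {
        // The frame was handed to the kernel.
        Ok(_) => Ok(()),
        // Socket buffer full: report "nothing written" so the caller retries.
        Err(e) if e.raw_os_error() == Some(ENOBUFS_MACOS) => Err(WriteError::NothingWritten),
        // Everything else stays a fatal error.
        Err(e) => Err(WriteError::Internal(e)),
    }
}
```

With this shape, a full socket buffer surfaces as a retryable condition, while any other errno still reaches the caller as an error.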

Test results

Tested with vmnet-helper and gvproxy using:

iperf3 -c vm-address --bidir -Z -P 8 -t 600 > stress.json
stress-sum

With vmnet-helper we get consistent performance in both directions (20.14-23.67 Gbits/s). With gvproxy we get higher bandwidth from VM to host (28 Gbits/s) but very low bandwidth from host to VM (0.69 Gbits/s).

Note

Without this change we cannot test performance; the stress test breaks within a few seconds.

Using gvproxy from containers/gvisor-tap-vsock#613.

Fixes #555

Assisted-by: Cursor/Claude Opus 4.6
Signed-off-by: Nir Soffer <nirsof@gmail.com>
@gemini-code-assist

Summary of Changes

Hello @nirs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical network stability and performance issue on macOS when using gvproxy with iperf3. It introduces specific error handling for ENOBUFS (no buffer space available) errors, which macOS returns when the kernel socket buffer is full. By treating this condition as a signal to retry the write operation, the PR prevents crashes and enables successful, stable network stress testing, significantly improving the robustness of the virtio network backend on macOS.

Highlights

  • macOS ENOBUFS Handling: Modified the write_frame function in virtio/net/unixgram to explicitly handle nix::Error::ENOBUFS on macOS. This error, which occurs when the kernel socket buffer is full, is now mapped to WriteError::NothingWritten, allowing the caller to retry the send operation instead of treating it as a fatal error.
  • Improved Network Stability: This change prevents krunkit from failing during iperf3 stress tests with gvproxy on macOS, which previously occurred due to unhandled ENOBUFS errors. The system can now gracefully handle transient buffer-full conditions.

Changelog
  • src/devices/src/virtio/net/unixgram.rs
    • Modified the write_frame function to use a match statement for the send system call result.
    • Added a specific case to catch nix::Error::ENOBUFS and return WriteError::NothingWritten.
    • Included comments explaining the macOS-specific behavior of ENOBUFS and referencing a similar handling in vmnet-helper.
Activity
  • The author identified a critical issue where iperf3 stress tests would fail within seconds on macOS due to Backend(Internal(ENOBUFS)) errors when using gvproxy.
  • Detailed test results were provided, demonstrating that the change enables successful iperf3 bidirectional stress testing for 600 seconds, with consistent performance using vmnet-helper and improved stability for gvproxy.
  • The author noted that without this change, performance testing was impossible as the stress test would immediately break.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly handles the ENOBUFS error on macOS for unixgram sockets by mapping it to WriteError::NothingWritten, which allows the caller to retry sending the frame. This prevents a fatal error when the kernel socket buffer is full. I've added one suggestion to also handle EAGAIN and EWOULDBLOCK in the same way to improve portability and make the error handling more robust across different Unix-like systems.

Comment on lines +119 to +128
// macOS returns ENOBUFS when the kernel socket buffer is full,
// rather than blocking or returning EAGAIN on non-blocking sockets.
// The only way to recover is to retry the send; the kernel does not
// provide any readiness notification for this condition.
// vmnet-helper handles this with a 50us sleep between retries,
// and typically one retry is enough:
// https://github.com/nirs/vmnet-helper/blob/455d172/helper.c#L42
Err(nix::Error::ENOBUFS) => {
return Err(WriteError::NothingWritten);
}

medium

While handling ENOBUFS is correct for macOS, for better portability it's a good practice to also handle EAGAIN and EWOULDBLOCK. These errors can be returned on other Unix-like systems (e.g., Linux) for non-blocking sockets when the send buffer is full. Handling them will make the implementation more robust.

You can use an or-pattern to match all three errors. I've also updated the comment to reflect this change.

Suggested change
- // macOS returns ENOBUFS when the kernel socket buffer is full,
- // rather than blocking or returning EAGAIN on non-blocking sockets.
- // The only way to recover is to retry the send; the kernel does not
- // provide any readiness notification for this condition.
- // vmnet-helper handles this with a 50us sleep between retries,
- // and typically one retry is enough:
- // https://github.com/nirs/vmnet-helper/blob/455d172/helper.c#L42
- Err(nix::Error::ENOBUFS) => {
- return Err(WriteError::NothingWritten);
- }
+ // On macOS, ENOBUFS is returned when the kernel socket buffer is full.
+ // On other systems, EAGAIN or EWOULDBLOCK can be returned.
+ // In all cases, we should retry the send. The kernel does not
+ // provide any readiness notification for this condition.
+ //
+ // Ref: https://github.com/nirs/vmnet-helper/blob/455d172/helper.c#L42
+ //
+ #[allow(unreachable_patterns)]
+ Err(nix::Error::ENOBUFS) | Err(nix::Error::EAGAIN) | Err(nix::Error::EWOULDBLOCK) => {
+ return Err(WriteError::NothingWritten);
+ }

@nirs nirs Feb 19, 2026

@gemini-code-assist This code is macOS-only, so EAGAIN and EWOULDBLOCK are not relevant.
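
For comparison, the code comment under review mentions that vmnet-helper sleeps 50us between retries. A rough sketch of that retry-with-sleep pattern follows. It is not what this PR does (the PR returns WriteError::NothingWritten and leaves the retry to the caller), and `send_with_retry`, its retry cap, and the simplified `WriteError` are hypothetical names chosen for illustration:

```rust
use std::thread;
use std::time::Duration;

/// Simplified, hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    /// Nothing was sent (for example, ENOBUFS); safe to retry.
    NothingWritten,
    /// Any other failure.
    Fatal,
}

/// Call `send` until it succeeds, sleeping `delay` after each NothingWritten.
/// Gives up after `max_retries` retries and returns the last error.
/// On success, returns how many retries were needed.
pub fn send_with_retry<F>(mut send: F, max_retries: u32, delay: Duration) -> Result<u32, WriteError>
where
    F: FnMut() -> Result<(), WriteError>,
{
    let mut retries = 0;
    loop {
        match send() {
            Ok(()) => return Ok(retries),
            // Transient condition: wait briefly, then try the same frame again.
            Err(WriteError::NothingWritten) if retries < max_retries => {
                retries += 1;
                thread::sleep(delay);
            }
            // Fatal error, or the retry budget is exhausted.
            Err(e) => return Err(e),
        }
    }
}
```

A caller could pass `Duration::from_micros(50)` to mirror the delay cited in the comment. Capping the retries is a design choice of this sketch, so that a persistently full buffer cannot stall the worker forever.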

Development

Successfully merging this pull request may close these issues.

Network breaks under high load randomly with: ERROR devices::virtio::net::worker] Failed to process rx: Backend(Internal(ENOBUFS))
