dhcpcd receive loop / unresponsiveness under high multi-interface load on FreeBSD

Hello

We are seeing a scalability / stability issue with dhcpcd under high multi-interface load on a FreeBSD-based appliance.

The issue was initially observed on IPv6, but further testing shows that IPv4 is affected as well.

When many interfaces / VLANs are managed by dhcpcd at the same time, the daemon can enter repeated receive-side error loops and eventually stop responding properly to management signals. This makes it difficult or impossible to reload or stop the daemon cleanly.

**Environment**

- dhcpcd version initially tested: 10.2.2
- Also tested with: 10.3.1
- Platform: FreeBSD-based appliance
- Many active interfaces / VLANs
- IPv4 and IPv6 enabled across multiple interfaces
- dhcpcd contrib version, unmodified

**Problem**

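When dhcpcd manages many interfaces, it becomes unstable beyond a certain interface count: around 14 managed interfaces with a minimal hook, and around 19 with hooks disabled, in our test setup.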

The issue was first observed on the IPv6 path, with repeated receive errors and unreliable behavior when many VLANs were active.

However, further investigation shows that the issue is not limited to IPv6. We can also reproduce a similar problem on the IPv4 path.

On IPv4, with the unmodified contrib version, dhcpcd repeatedly logs:

`dhcp_readudp: Resource temporarily unavailable`

This appears in bursts while many interfaces are being rebound or renewed at the same time.

From code inspection, this appears to come from the UDP receive path when `recvmsg()` returns `-1` with `errno=35` on FreeBSD, which corresponds to `EAGAIN` / `EWOULDBLOCK`.

We understand that `EAGAIN` can be normal for a non-blocking socket. The problematic part is that, under high multi-interface load, dhcpcd appears to repeatedly hit this receive-side path and can become unresponsive afterwards.
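To make the expected behavior concrete, here is a minimal Python sketch of how a receive path usually treats `EAGAIN` / `EWOULDBLOCK` on a non-blocking datagram socket: as the normal "queue is empty" signal, not as an error. This is an illustration only, not dhcpcd's code. The function name `drain_datagrams` is hypothetical.

```python
import socket


def drain_datagrams(sock, bufsize=65535):
    """Read every queued datagram from a non-blocking socket.

    Hypothetical helper for illustration; it is not part of dhcpcd.
    """
    packets = []
    while True:
        try:
            packets.append(sock.recv(bufsize))
        except BlockingIOError:
            # Python raises BlockingIOError for EAGAIN / EWOULDBLOCK
            # (errno 35 on FreeBSD). On a non-blocking socket this only
            # means the receive queue is empty, so stop reading quietly.
            return packets
```

In this model, an `EAGAIN` that is logged on every wakeup would suggest that the socket is being reported as readable while its queue is already empty, which is the pattern we seem to be observing.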

**Observed behavior**

Under high interface count / VLAN load:

- IPv6 becomes unstable beyond a certain number of managed interfaces.
- IPv4 can also enter a repeated `dhcp_readudp: Resource temporarily unavailable` loop.
- These receive errors appear when many interfaces are active and DHCP events happen close together.
- After the issue is triggered, dhcpcd may stop responding properly to signals.
- This is especially visible when requesting a daemon reload.
- Once this happens, we lose control over the daemon and cannot reliably reload or stop it cleanly.
- IPv4 / IPv6 behavior becomes unreliable across interfaces.

**Hook-related investigation**

We initially suspected our shell hooks, because they add overhead after dhcpcd events.

We progressively reduced the hook logic until it was almost empty. In the reduced case, the hook only sent a lightweight IPC notification to another daemon, and the actual heavy work was executed asynchronously outside dhcpcd.

We also tested with hooks completely disabled using `-c /dev/null`.

Result:

- with a minimal hook path, the issue appears earlier, around 14 managed interfaces in our test setup;
- with hooks fully disabled using `-c /dev/null`, the issue still appears, but later, around 19 managed interfaces in our test setup.

So hooks clearly make the issue easier to trigger, but they do not appear to be the root cause by themselves.

**IPv4 observation**

The same kind of scalability problem is also visible on IPv4.

Example logs from the unmodified contrib version:

```
vlan76: leased 192.168.98.1 for 43200 seconds
vlan76: renew in 21600 seconds, rebind in 32400 seconds
vlan76: writing lease: /var/db/dhcpcd/vlan76.lease
vlan76: IP address 192.168.98.1/24 already exists
vlan76: executing: /usr/libexec/dhcpcd/dhcpcd-run-hooks RENEW
vlan12: rebinding lease of 192.168.34.1
vlan12: sending REQUEST (xid 0xdb1de80a), next in 3.9 seconds
vlan0: bound, ignoring 203.0.113.9 from 203.0.113.14
vlan72: bound, ignoring 192.168.94.1 from 192.168.94.254
vlan18: bound, ignoring 192.168.40.1 from 192.168.40.254
dhcp_readudp: Resource temporarily unavailable
vlan12: acknowledged 192.168.34.1 from 192.168.34.254
vlan12: leased 192.168.34.1 for 43200 seconds
vlan12: renew in 21600 seconds, rebind in 32400 seconds
vlan12: writing lease: /var/db/dhcpcd/vlan12.lease
vlan12: IP address 192.168.34.1/24 already exists
vlan12: executing: /usr/libexec/dhcpcd/dhcpcd-run-hooks RENEW
vlan12: bound, ignoring 192.168.34.1 from 192.168.34.254
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
dhcp_readudp: Resource temporarily unavailable
```

After this starts, the same message may repeat many times.
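To quantify how often and how long these repetitions occur, a small script along the following lines could be run over a captured log. It is a rough Python sketch. The function name `error_bursts` is hypothetical, and the only assumption about the log format is that the message text matches the lines shown above.

```python
from itertools import groupby

NEEDLE = "dhcp_readudp: Resource temporarily unavailable"


def error_bursts(lines, needle=NEEDLE):
    """Return the length of each consecutive run of lines containing `needle`."""
    flags = (needle in line for line in lines)
    # groupby() collapses adjacent equal flags; keep only the matching runs.
    return [sum(1 for _ in run) for hit, run in groupby(flags) if hit]
```

Applied to the excerpt above, this reports one isolated occurrence followed by a run of 16 consecutive messages.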

**Version 10.3.1**

We also tested dhcpcd 10.3.1 and the problem is still present, so it does not appear to have been fixed between 10.2.2 and 10.3.1.

**Current interpretation**

Our current understanding is:

- The issue is not specific to shell hooks.
- Hooks add overhead and make the issue appear earlier.
- The issue still reproduces with hooks disabled using `-c /dev/null`.
- The issue is not IPv6-only; IPv4 can also reproduce a receive-side loop.
- The common factor seems to be high multi-interface load and many DHCP events occurring close together.
- On FreeBSD, the receive path can repeatedly hit `EAGAIN` / `EWOULDBLOCK`.
- Once the daemon reaches this state, it may stop responding properly to signals.
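One hypothesis consistent with these points, which we have not confirmed in the dhcpcd source, is that a busy receive path starves the other event sources, including signal delivery. The Python sketch below illustrates the general idea of a per-source read budget in an event loop, so that a burst on one socket cannot monopolize an iteration. The names `service_once` and `budget` are hypothetical, and this is not how dhcpcd's event loop is written.

```python
import selectors
import socket


def service_once(sel, budget=8, timeout=0.0):
    """Service each ready source once, reading at most `budget` datagrams.

    Returns a dict mapping the label registered with the selector to the
    number of datagrams read from that source in this iteration.
    """
    handled = {}
    for key, _events in sel.select(timeout):
        count = 0
        while count < budget:
            try:
                key.fileobj.recv(65535)
            except BlockingIOError:
                # EAGAIN / EWOULDBLOCK: the queue is empty, move on.
                break
            count += 1
        handled[key.data] = count
    return handled
```

With a budget in place, a source that still has data is simply reported as readable again on the next iteration, while a lightly loaded source such as a signal pipe is serviced in the same pass.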

**Question**

Has anything similar already been reported regarding dhcpcd on FreeBSD with many managed interfaces?

In particular:

- high number of VLANs / interfaces;
- repeated IPv4 `dhcp_readudp: Resource temporarily unavailable` messages;
- repeated IPv6 receive-side errors;
- many DHCP / RA events occurring close together;
- daemon reload causing dhcpcd to become unresponsive;
- issue still present with hooks disabled;
- issue still present in dhcpcd 10.3.1.

We would appreciate any guidance on whether this looks like a known receive-loop / event-loop issue, or if there are recommended patches or diagnostics for this scenario.


Note: we originally described this as an IPv6 issue, but after further investigation we can also reproduce the same class of problem in IPv4. The issue description has been updated accordingly.
