Skip to content

Fix tc class EEXIST error during TAP device creation#178

Draft
sjmiller609 wants to merge 1 commit intomainfrom
hypeship/fix-tc-class-eexist
Draft

Fix tc class EEXIST error during TAP device creation#178
sjmiller609 wants to merge 1 commit intomainfrom
hypeship/fix-tc-class-eexist

Conversation

@sjmiller609
Copy link
Copy Markdown
Collaborator

Summary

Fixes the failed to create instance error caused by:

allocate network: create TAP device: apply upload rate limit: tc class add vm: exit status 2 (output: RTNETLINK answers: File exists)

Root cause: addVMClass uses tc class add which fails with EEXIST when a class with the same ID already exists on the bridge. This happens because:

  1. Orphaned tc classes from failed cleanup. removeVMClass (called during TAP teardown) is entirely best-effort — filter, qdisc, and class deletions all silently swallow errors. If the filter deletion fails (fragile string parsing of tc filter show output), tc class del also fails silently because the class still has references. The TAP device gets deleted, but the tc class persists as an orphan. On the next allocation that maps to the same class ID, tc class add hits the orphan and fails.

  2. 16-bit hash collision risk. deriveClassID hashes TAP names via FNV-1a truncated to 16 bits (65,536 possible IDs). By birthday paradox, collision probability reaches ~50% at 256 concurrent TAPs per host. Two different live instances can map to the same class ID.

  3. Restore/re-creation paths. createTAPDevice checks for an existing TAP (and deletes it if found), but does not check for an existing tc class independently. When the TAP is gone but the class persists, the idempotency check is bypassed. CleanupOrphanedClasses handles this but only runs at Initialize() time, not before each allocation.

Fix: Change tc class add to tc class replace in addVMClass. This is the idempotent equivalent — creates the class if missing, updates it in place if it already exists. No behavioral change for the happy path; eliminates the EEXIST failure on all three root cause paths.

One-line change in lib/network/bridge_linux.go.

tc class add fails with RTNETLINK EEXIST when a class with the same ID
already exists. This happens when removeVMClass cleanup silently fails
(all tc teardown is best-effort) leaving orphaned classes, or on
restore/re-creation paths where the TAP is gone but the class persists.

tc class replace is the idempotent equivalent — it creates the class if
missing or updates it in place if it already exists, eliminating the
'File exists' error without changing any other behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant