CMGS added a commit that referenced this pull request on Apr 7, 2026
P1 GC: Implement RegisterGC for FC backend. It protects blob IDs referenced by FC VMs from garbage collection, mirroring CH's GC module.
P1 Clone paths: Save cocoon.json metadata (StorageConfigs + BootConfig) in snapshot tar. Create temporary symlinks from source drive paths to clone paths before snapshot/load so FC finds drives at expected locations. Symlinks are cleaned up after load + reconfigure.
P2 Rebuild: Replace fragile rebuildFromSnapshot (searched live VM records) with self-contained metadata from cocoon.json. Clones no longer depend on the source VM or any sibling VM existing in the DB.
P2 Console relay: Add 3s timeout on second goroutine wait after client disconnect to prevent blocking the accept loop when PTY read is stuck.
Add --fc flag to select Firecracker as hypervisor backend. Validates mutual exclusion with --windows and rejects cloudimg (UEFI boot) since Firecracker only supports direct kernel boot. InitHypervisor dispatches based on config; FC returns stub error until the backend is implemented.
Create Firecracker backend package with Config (path helpers), main Firecracker struct (constructor, Inspect, List, Watchable), and helper utilities (toVM, path functions). Wire up InitHypervisor to create FC backend when --fc is set. Lifecycle methods are stubs pending implementation.
Add FC REST API client (pre-boot config model), Create (COW disk + device-path cmdline), and Start (launch process → REST API config sequence → InstanceStart). FC references disks by /dev/vdX path since it lacks virtio serial support. Update overlay.sh init script to resolve both device paths and serial names.
Add Stop (SendCtrlAltDel → SIGTERM → SIGKILL) and Delete (stop-if-running → cleanup dirs → remove DB record) for the Firecracker backend. Follows the same patterns as CH.
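The escalation order described above can be sketched as a small ladder of stages. This is only an illustration: `stopStage`, `escalateStop`, and `simulateStop` are hypothetical names, the waits are arbitrary, and the real backend sends the requests to an actual FC process rather than to a counter.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// stopStage is one rung of the escalation ladder (hypothetical type).
type stopStage struct {
	name string
	send func() error  // delivers the stop request
	wait time.Duration // how long to poll before escalating
}

// escalateStop tries each stage in order and returns the name of the
// stage after which exited() first reported true.
func escalateStop(stages []stopStage, exited func() bool, poll time.Duration) (string, error) {
	for _, st := range stages {
		if err := st.send(); err != nil {
			continue // request could not be delivered; try the harsher stage
		}
		deadline := time.Now().Add(st.wait)
		for {
			if exited() {
				return st.name, nil
			}
			if time.Now().After(deadline) {
				break // leaves the polling loop, not the stage loop
			}
			time.Sleep(poll)
		}
	}
	return "", errors.New("process still running after all stop stages")
}

// simulateStop models a guest that ignores the first `ignored` requests.
func simulateStop(ignored int) (string, error) {
	sent := 0
	stage := func(name string) stopStage {
		return stopStage{name: name, send: func() error { sent++; return nil }, wait: 5 * time.Millisecond}
	}
	stages := []stopStage{stage("SendCtrlAltDel"), stage("SIGTERM"), stage("SIGKILL")}
	return escalateStop(stages, func() bool { return sent > ignored }, time.Millisecond)
}

func main() {
	name, err := simulateStop(1)
	fmt.Println(name, err)
}
```

Keeping each stage as data makes it straightforward to skip the graceful stage or change a wait without touching the loop.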
FC binds serial to process stdin/stdout. Create PTY pair at launch: slave → FC stdin/stdout, master → background relay process. The relay (self-exec with env var detection) listens on console.sock and bridges connections to the PTY master. Auto-exits when FC dies. Console() connects to console.sock, consistent with CH backend.
Add full snapshot lifecycle for FC backend:
- Snapshot: pause → PUT /snapshot/create (vmstate+mem) → reflink COW → resume
- Clone: extract → launch new FC → PUT /snapshot/load → reconfigure drives/NICs → resume
- Restore: kill running → extract → new FC → snapshot/load → reconfigure → resume
- Direct: hardlink mem, reflink COW, copy vmstate for local snapshots

FC snapshot/load does not preserve drive/NIC config, so drives and networks are re-attached after load. Implements hypervisor.Direct interface for reflink-optimized local snapshot operations.
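A minimal sketch of the Snapshot sequence (pause, create, copy disk, resume) over a Unix-socket HTTP client follows. It is not the PR's code: the helper names are hypothetical, the request fields are written from my understanding of Firecracker's API and should be checked against the version in use, and the demo talks to a fake in-process server instead of a real FC process.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"os"
	"path/filepath"
	"sync"
)

// newUnixClient returns an HTTP client that always dials the given socket.
func newUnixClient(sock string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", sock)
		},
	}}
}

// call sends one JSON request and treats any status >= 300 as an error.
func call(c *http.Client, method, path string, body any) error {
	buf, err := json.Marshal(body)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(method, "http://localhost"+path, bytes.NewReader(buf))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("%s %s: %s", method, path, resp.Status)
	}
	return nil
}

// snapshotVM pauses the VM, asks for a full snapshot, runs copyDisk while
// the guest is paused, and always attempts to resume afterwards.
func snapshotVM(c *http.Client, vmstate, mem string, copyDisk func() error) (err error) {
	if err = call(c, http.MethodPatch, "/vm", map[string]string{"state": "Paused"}); err != nil {
		return err
	}
	defer func() {
		rerr := call(c, http.MethodPatch, "/vm", map[string]string{"state": "Resumed"})
		if err == nil {
			err = rerr
		}
	}()
	create := map[string]string{"snapshot_type": "Full", "snapshot_path": vmstate, "mem_file_path": mem}
	if err = call(c, http.MethodPut, "/snapshot/create", create); err != nil {
		return err
	}
	return copyDisk()
}

// demoSnapshot runs the sequence against a fake API server and returns
// the observed order of operations.
func demoSnapshot() ([]string, error) {
	dir, err := os.MkdirTemp("", "fc")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "api.sock")
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return nil, err
	}
	defer ln.Close()
	var mu sync.Mutex
	var seen []string
	record := func(s string) { mu.Lock(); seen = append(seen, s); mu.Unlock() }
	go http.Serve(ln, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		record(r.Method + " " + r.URL.Path)
		w.WriteHeader(http.StatusNoContent)
	}))
	err = snapshotVM(newUnixClient(sock), "vmstate", "mem", func() error { record("reflink COW"); return nil })
	mu.Lock()
	defer mu.Unlock()
	return seen, err
}

func main() {
	fmt.Println(demoSnapshot())
}
```

The deferred resume is a design choice of this sketch: a failed snapshot should not leave the guest paused.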
Add FC_VERSION variable (v1.12.0), firecracker binary detection in check_binary, and auto-install from GitHub releases in --upgrade mode.
Add --fc flag to global flags, Firecracker section with feature comparison matrix, limitations, OCI image compatibility notes. Update requirements, doctor, VM lifecycle, and shutdown behavior sections to reflect dual-backend support.
- Pre-create FC log file (FC requires O_WRONLY|O_APPEND, no O_CREATE)
- Use underscores in drive/iface IDs (FC rejects hyphens)
- Add vmlinux extraction from vmlinuz (FC needs uncompressed ELF kernel)
- Support zstd and gzip compressed kernels via CLI decompressor
- Fix FC download URL in doctor/check.sh (tarball format)

- Guard boot pointer nil dereference in prepareOCI
- Fix relayBidirectional goroutine leak: buffer 2, close conn, wait
- Optimize ensureVmlinux: check ELF magic (4 bytes) and cache before reading full vmlinuz into memory
- Extract magic strings to constants (driveIDFmt, ifaceIDFmt, cowFileName, FC action types, VM state strings)
- Deep-copy SnapshotIDs map in toVM to prevent shared DB mutation
- Return real error from decompressZstd when output is empty
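The 4-byte ELF magic check mentioned for ensureVmlinux can be illustrated as below. `isELF` and `probe` are hypothetical helpers, not the PR's functions; the point is only that four bytes are enough to tell an uncompressed ELF kernel from a compressed vmlinuz without loading the whole file.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// elfMagic is the standard ELF identification prefix: 0x7f 'E' 'L' 'F'.
var elfMagic = []byte{0x7f, 'E', 'L', 'F'}

// isELF reads only the first four bytes of path and compares them with
// the ELF magic. Files shorter than four bytes are reported as non-ELF.
func isELF(path string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()
	var head [4]byte
	if _, err := io.ReadFull(f, head[:]); err != nil {
		if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
			return false, nil
		}
		return false, err
	}
	return bytes.Equal(head[:], elfMagic), nil
}

// probe writes content to a temporary file and runs isELF on it.
func probe(content []byte) (bool, error) {
	dir, err := os.MkdirTemp("", "kernel")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	p := filepath.Join(dir, "kernel")
	if err := os.WriteFile(p, content, 0o644); err != nil {
		return false, err
	}
	return isELF(p)
}

func main() {
	ok, err := probe([]byte("\x7fELF\x02\x01\x01"))
	fmt.Println(ok, err)
}
```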
Extract ~650 lines of duplicated code from CH and FC backends into shared hypervisor/ layer:
- Backend struct with BackendConfig interface: provides Inspect, List, ToVM, ResolveRef(s), LoadRecord, WithRunningVM, UpdateStates, MarkError, ReserveVM, RollbackCreate, ForEachVM, AbortLaunch
- shared.go: EnterNetns, WaitForSocket, ExtractBlobIDs, BuildIPParams, PrefixToNetmask, CopyFile, RemoveVMDirs, CleanupRuntimeFiles, BlobHexFromPath, SocketPath, ConsoleSockPath
- config.HypervisorType enum + switch-case in InitHypervisor
- FC version updated to v1.15.0
P1: GC now registers ALL hypervisor backends (CH + FC) via InitAllHypervisors, protecting blobs from both backends on mixed-backend hosts regardless of --fc flag.
P2: doctor/check.sh treats firecracker as optional: it warns instead of failing when not installed, since it's only needed for --fc.
P3: vm debug rejects --fc with a clear error since it only generates Cloud Hypervisor launch commands.
…el path
P1: createDriveRedirects now unconditionally redirects the source COW path to the clone's copy. When the source VM is still running, its cow.raw is renamed to a temporary backup, a symlink is placed, and after snapshot/load the backup is restored. This prevents FC from reopening the live source VM's disk state.
P2: saveSnapshotMeta stores the portable vmlinuz path instead of the host-local vmlinux cache. cloneAfterExtract runs ensureVmlinux on the clone host to (re)create vmlinux from vmlinuz, making FC snapshots fully portable across hosts.
P2 redirect: createDriveRedirects now returns error. On symlink failure after backup rename, the backup is immediately restored and all prior redirects are cleaned up, preventing source VM disk corruption from a half-installed redirect.
P2 portable paths: snapshot metadata (cocoon.json) now stores paths relative to root_dir using filepath.Rel. loadSnapshotMeta resolves them against the local host's root_dir. Snapshots exported from one host can be imported on another with a different Cocoon directory layout, as long as the same OCI image has been pulled.
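The relative-path round trip can be sketched with `filepath.Rel` and `filepath.Join`. The helper names and example directories are hypothetical, and rejecting paths outside root_dir is an assumption of this sketch, not something the commit message states.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// toPortable converts an absolute path under rootDir to a root-relative
// path suitable for storing in snapshot metadata.
func toPortable(rootDir, abs string) (string, error) {
	rel, err := filepath.Rel(rootDir, abs)
	if err != nil {
		return "", err
	}
	// Assumption: a path that escapes rootDir cannot be made portable.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%q is outside root_dir %q", abs, rootDir)
	}
	return rel, nil
}

// fromPortable resolves a stored relative path against the local rootDir.
func fromPortable(rootDir, rel string) string {
	return filepath.Join(rootDir, rel)
}

func main() {
	// Example layouts are made up for illustration.
	rel, err := toPortable("/var/lib/cocoon", "/var/lib/cocoon/oci/blobs/abc.erofs")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rel)
	fmt.Println(fromPortable("/data/cocoon", rel))
}
```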
P1: SnapshotConfig now carries a Hypervisor field ("cloud-hypervisor"
or "firecracker") set during Snapshot(). Clone validates that the
snapshot's backend matches the active backend before proceeding,
with a clear error suggesting the correct flag.
P2: COW redirect during clone is now serialized via a per-source-COW
flock (.clone.lock). Concurrent snapshot/restore/clone operations on
the source VM block until the redirect is cleaned up, preventing
them from following the temporary symlink to the wrong disk.
saveSnapshotMeta now stores ALL drive entries (RO layers + RW COW), not just RO entries. Without the source COW path, createDriveRedirects had no old→new mapping to redirect, so snapshot/load would reopen the live source cow.raw (if source VM exists) or fail (if deleted).
…creation
P1: acquireCOWLock (via lockCOWPath) now creates the parent directory before locking, fixing ENOENT when source VM has been deleted.
P2: snapshotMeta stores SourceRootDir. vmstatePaths() reconstructs the original absolute paths baked into FC's vmstate binary. createDriveRedirects uses vmstate paths as symlink targets, so cross-host clones redirect at the correct (source host) paths.
P2: COW flock is now taken in Snapshot and Restore too (via shared lockCOWPath helper), not just Clone. Concurrent snapshot/restore operations on the source VM are serialized with clone redirects.
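A closure-style flock helper that creates the parent directory first might look like the sketch below. `withPathLocked` is a hypothetical name, placing `.clone.lock` next to the COW file is an assumption, and the code is Unix-only because it uses `syscall.Flock`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// withPathLocked takes an exclusive flock tied to cowPath, runs fn, and
// releases the lock when fn returns.
func withPathLocked(cowPath string, fn func() error) error {
	dir := filepath.Dir(cowPath)
	// Create the directory first so locking works even if the source VM
	// directory has been deleted.
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return fmt.Errorf("create lock dir: %w", err)
	}
	// Assumed location: the lock file sits beside the COW file.
	f, err := os.OpenFile(filepath.Join(dir, ".clone.lock"), os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return fmt.Errorf("open lock file: %w", err)
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return fmt.Errorf("flock: %w", err)
	}
	// Deferred calls run last-in first-out, so the unlock happens before
	// the file is closed.
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
	return fn()
}

// demoLock locks a COW path whose parent directory does not exist yet.
func demoLock() (bool, error) {
	dir, err := os.MkdirTemp("", "cow")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	ran := false
	err = withPathLocked(filepath.Join(dir, "deleted-vm", "cow.raw"), func() error {
		ran = true
		return nil
	})
	return ran, err
}

func main() {
	fmt.Println(demoLock())
}
```

The closure form means a caller cannot forget to release the lock, which is the usual argument for this shape over separate lock and unlock calls.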
…esign
P1: InitAllHypervisors now returns error instead of silently skipping failed backends. GC aborts if any hypervisor can't be loaded, preventing blob deletion when pinning data is incomplete.
P2: ensureVmlinux writes to a temp file and renames atomically, preventing concurrent readers from observing a truncated kernel cache.
P2: Added zstd to doctor/check.sh binary checks. It is required by FC's kernel decompression but was previously an undeclared dependency.
P2: Redesigned console relay to use a single persistent PTY reader goroutine with broadcaster pattern. Each session subscribes/unsubscribes via setSink(). No per-session read goroutines on the PTY master, eliminating stale goroutine data theft after disconnect.
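The temp-file-then-rename technique can be sketched as follows. `writeFileAtomic` and `demoAtomic` are hypothetical helpers; the sketch relies on `os.Rename` replacing the destination atomically, which holds on POSIX filesystems when the temp file is in the same directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file beside dst and renames it
// into place, so readers see either the old file or the complete new one.
func writeFileAtomic(dst string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(dst), filepath.Base(dst)+".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(perm); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, dst)
}

// demoAtomic writes a file atomically and reads it back.
func demoAtomic() (string, error) {
	dir, err := os.MkdirTemp("", "vmlinux")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	dst := filepath.Join(dir, "vmlinux")
	if err := writeFileAtomic(dst, []byte("kernel-bytes"), 0o644); err != nil {
		return "", err
	}
	got, err := os.ReadFile(dst)
	return string(got), err
}

func main() {
	fmt.Println(demoAtomic())
}
```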
P1: vmstatePaths() now reconstructs from raw relative paths saved before local resolution, so cross-host clones correctly redirect at source-host paths even when root_dir differs.
P2: zstd treated as optional in doctor/check.sh (like firecracker), warns instead of failing on CH-only hosts.
P3: FC Stop now honors --force (skip SendCtrlAltDel, immediate kill) and --timeout (wait for guest response before escalating). Added gracefulStop with SendCtrlAltDel → poll → forceTerminate pattern.
…ath >26
P2: snapshotRecordToConfig now copies the Hypervisor field so export/import preserves the backend tag. Clone validation works correctly after a round-trip.
P2: devPath handles >26 drives with Linux-style multi-letter naming (vda..vdz, vdaa..vdaz, ...) for OCI images with deep layer stacks.
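The multi-letter naming is a bijective base-26 encoding of the drive index. A minimal sketch, assuming a zero-based index and a helper named `devPath` only for illustration (the PR's actual signature is not shown here):

```go
package main

import "fmt"

// devPath maps a zero-based drive index to a virtio block device path:
// 0 -> /dev/vda, 25 -> /dev/vdz, 26 -> /dev/vdaa, 27 -> /dev/vdab, ...
// Negative indexes are not handled and yield a bare "/dev/vd".
func devPath(index int) string {
	suffix := ""
	// Work in 1-based numbers: each digit ranges over a..z with no zero,
	// which is why n-1 is used for both the remainder and the division.
	for n := index + 1; n > 0; n = (n - 1) / 26 {
		suffix = string(rune('a'+(n-1)%26)) + suffix
	}
	return "/dev/vd" + suffix
}

func main() {
	for _, i := range []int{0, 25, 26, 51, 52, 701, 702} {
		fmt.Println(i, devPath(i))
	}
}
```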
…install
P1: FC clone/restore now clamp CPU/memory to snapshot's original values since FC cannot PATCH machine-config after snapshot/load. Snapshot metadata stores CPU/Memory for clone to use. Prevents metadata from advertising overrides FC didn't actually apply.
P2: doctor --upgrade now installs zstd via apt-get/yum when missing, so fresh FC setups don't silently break on zstd-compressed kernels.
P2: Set VM.ID in synthetic VMRecord for clone launchProcess so FC gets a valid --id flag instead of empty string.
P2: Drive redirects now only apply for same-host clones (where SourceRootDir matches local rootDir). Cross-host clones skip redirects entirely: they require the same rootDir layout, and creating symlinks under a foreign path tree would be incorrect.
…keep PTY
P1: Always create drive redirects from vmstate paths → local paths, including cross-host clones. COW flock only on same-host (where source VM may be running). Cross-host redirects are safe since no live VM owns those paths on the target host.
P2: FC clone/restore now reject --cpu/--memory overrides with a clear error instead of silently clamping, since FC cannot PATCH machine-config after snapshot/load.
P2: Keep PTY master open (intentional fd leak) when console relay fails, preventing the slave-side hangup that would crash FC's serial console output during boot.
… ops
Move FC CPU/memory override rejection to before any destructive operations. Clone validates against snapshot metadata before launch. Restore validates against current VM record before killing the running VM (via validateRestoreOverrides helper). Prevents downtime from unsupported override requests.
SetUnlinkOnClose(false) before closing the Go listener so the socket file persists on disk for the relay child process. Without this, net.UnixListener.Close() removes the socket file, making console.sock disappear before the relay starts accepting.
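The effect of `SetUnlinkOnClose` can be observed in isolation with the standard library. This sketch only demonstrates the listener behavior; how the relay child takes over the socket is outside its scope, and `socketSurvivesClose` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// socketSurvivesClose listens on a fresh Unix socket, optionally disables
// unlink-on-close, closes the listener, and reports whether the socket
// file is still on disk.
func socketSurvivesClose(keep bool) (bool, error) {
	dir, err := os.MkdirTemp("", "relay")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "console.sock")
	ln, err := net.ListenUnix("unix", &net.UnixAddr{Name: path, Net: "unix"})
	if err != nil {
		return false, err
	}
	if keep {
		ln.SetUnlinkOnClose(false)
	}
	if err := ln.Close(); err != nil {
		return false, err
	}
	_, statErr := os.Stat(path)
	if statErr == nil {
		return true, nil
	}
	if os.IsNotExist(statErr) {
		return false, nil
	}
	return false, statErr
}

func main() {
	kept, _ := socketSurvivesClose(true)
	removed, _ := socketSurvivesClose(false)
	fmt.Println(kept, removed)
}
```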
Network:
- Add SingleQueueNet flag to VMConfig for FC single-queue TAPs
- CNI creates TAPs with IFF_NO_PI when SingleQueueNet is set (FC requires it)
- Set SingleQueueNet in both createVM and prepareClone paths

Console:
- Fix SetUnlinkOnClose(false) so console.sock persists for relay

Snapshot/Clone:
- Use FC network_overrides (v1.14+) during snapshot/load to provide clone's TAP devices, avoiding TAP flag mismatch
- Skip drive reconfiguration after snapshot/load (FC opens drives via fd during load, fds survive symlink cleanup)
- Remove unused reconfigureDrives function

Restore:
- Skip drive reconfiguration (same VM, paths unchanged)
- Pass nil network_overrides (same TAP)

COW lock:
- Rewrite lockCOWPath to withCOWPathLocked closure form
- Update all callers (snapshot, clone, restore, direct)

All e2e tests pass: FC create/start/network/console/snapshot/clone/restore/stop/delete + CH smoke test (no regression).
…tempty
- prepareClone: move ctx before cmd per Go convention
- create_linux.go: re-read link after LinkSetHardwareAddr to get the actual MAC (link.Attrs() is stale after override)
- types/vm.go: add omitempty to FirstBooted for consistent JSON
- debug.go: normalize nolint comment alignment
Remove SingleQueueNet from VMConfig — FC queue decision stays at the cmd layer via tapQueues parameter to initNetwork. The network layer uses vmCfg.CPU for TAP queues, which initNetwork temporarily overrides to 1 for FC. Also add IFF_NO_PI to all TAPs unconditionally — both CH and FC open TAPs with IFF_NO_PI, so the flag must always be set at creation time for TUNSETIFF to succeed.
P2: FC clone now rejects --nics > snapshot NIC count since FC can't hot-add NICs after snapshot/load (only network_overrides for existing).
P3: Debug command runs EnsureVmlinux to resolve vmlinuz → vmlinux before printing the FC boot-source curl, so the output is runnable. Export EnsureVmlinux for use by cmd/vm/debug.go.
Add Hypervisor field to types.VM so each VM carries its backend identity. Move --fc from root PersistentFlags to create/run/debug subcommands only. Commands like list/inspect/console/stop/rm now auto-detect the backend by querying all registered backends — no --fc needed for existing VMs. Clone infers the backend from the snapshot's Hypervisor field. Snapshot save and list --vm auto-detect from the VM ref. Status merges watchers from all backends via fan-in channel.
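A fan-in merge of per-backend watch channels can be sketched as below. This is a generic illustration with a hypothetical `mergeChannels` helper (Go 1.18+ for generics); it checks `ctx.Done()` on both receive and send so forwarding goroutines exit when the consumer goes away.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// mergeChannels forwards values from every input channel to one output
// channel. The output closes once all inputs are closed or ctx is done.
func mergeChannels[T any](ctx context.Context, ins ...<-chan T) <-chan T {
	out := make(chan T)
	var wg sync.WaitGroup
	for _, in := range ins {
		wg.Add(1)
		go func(in <-chan T) {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case v, ok := <-in:
					if !ok {
						return
					}
					select {
					case out <- v:
					case <-ctx.Done():
						return // consumer gone: do not block on send
					}
				}
			}
		}(in)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// demoMerge merges two pre-filled channels and counts delivered events.
func demoMerge() int {
	ch := make(chan string, 2)
	fc := make(chan string, 2)
	ch <- "ch: vm1 running"
	ch <- "ch: vm2 stopped"
	fc <- "fc: vm3 running"
	fc <- "fc: vm4 stopped"
	close(ch)
	close(fc)
	count := 0
	for range mergeChannels(context.Background(), ch, fc) {
		count++
	}
	return count
}

func main() {
	fmt.Println(demoMerge())
}
```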
Validate --cpu/--memory/--nics overrides at cmd layer before creating network and VM dirs, avoiding late failure and unnecessary rollback. Add MAC change instructions to FC clone post-clone hints since FC vmstate bakes in the source VM's guest MAC.
fe2c700 to 29139fe
FC has no ACPI PM on x86 — the only shutdown/reboot signal path is the i8042 keyboard controller reset. Without reboot=k, guest reboot hangs (FC doesn't recognize the signal) and SendCtrlAltDel-based vm stop times out after 30s before falling back to SIGTERM.
E2E Regression Test Results
Full lifecycle test across both backends, both network types, single and multi CPU.
Test Matrix
Key Findings
Known Issues
GC orchestrator holds the module's flock for the entire cycle. Collect called LoadRecord which called DB.With → locker.Lock on the same flock, causing self-deadlock since flock is not re-entrant. Replace LoadRecord (lock-acquiring) with DB.ReadRaw (lock-free) in both FC and CH GC Collect. This is safe because the GC orchestrator already holds the lock, preventing concurrent DB mutations.
… issues
IP=dhcp caused three problems:
1. --nics 0 VMs hung forever (dhcpcd retries every 120s with no interface)
2. DHCP network VMs had leases persisted as static configs by systemd-network-generator, breaking DHCP semantics on reboot
3. Source VMs and cloned VMs had inconsistent network behavior

IP=off tells initramfs to skip networking entirely. Kernel ip= parameters (when present for static IP networks) override this setting and still trigger ipconfig. DHCP networks rely on systemd-networkd via the existing 20-wired.network (DHCP=yes) fallback, or cocoon-network's MAC-based DHCP config generation.

Fixes #17
configure_networking probes for devices and waits for udev even when IP=off, adding ~180s delay on VMs with no NICs. Only call it when a kernel ip= parameter is present on the cmdline.
…fs fixes
- Move --fc from Global Flags to VM Flags (only create/run/debug)
- Update FC examples to show auto-detect for list/console/stop/clone
- Fix debug command description
- Add initramfs IP=off note to DHCP networking section

- Add /dev/vdX direct path branch to Android overlay.sh resolve_disk() so FC VMs can find disks (FC has no virtio serial support)
- Skip configure_networking unless kernel ip= param is present
- Extract GC Collect to shared Backend.GCCollect() (was duplicated)
- Fix goroutine leak in mergeWatchChannels (missing ctx.Done check)
Remove ndc dependency: ndc network interface add causes netd to take over eth0 and clear existing routes from the main table. Instead:
- Static IP: kernel ip= routes already in main table, copy to policy tables
- DHCP: udhcpc obtains lease and configures main table, then same copy logic

Both paths use ip route replace into legacy_system/legacy_network/local_network policy tables. Add /proc/1/cmdline fallback for SELinux-restricted /proc/cmdline. Add guard file to prevent repeated execution on netd restart.
Summary
Add Firecracker as an alternative hypervisor backend, selected via the --fc persistent root flag. Both CH and FC implement the same hypervisor.Hypervisor interface; shared store operations are extracted into a common hypervisor.Backend struct.

What's included
- IFF_NO_PI flag; SingleQueueNet VMConfig flag controls TAP creation
- network_overrides (FC v1.14+); withCOWPathLocked closure for serialization
- RegisterGC; both backends registered during cocoon gc
- cocoon --fc vm debug outputs FC launch command + REST API curl sequence

Shared code extraction (hypervisor/ layer)
Extracted ~150 lines of duplicated code from CH and FC into shared packages:
- hypervisor.Backend struct: Inspect, List, ToVM, ResolveRef(s), LoadRecord, WithRunningVM, UpdateStates, MarkError, ReserveVM, RollbackCreate, ForEachVM, AbortLaunch, BatchMarkStarted, CleanStalePlaceholders
- hypervisor/shared.go: EnterNetns, WaitForSocket, ExtractBlobIDs, BuildIPParams, PrefixToNetmask, CopyFile, RemoveVMDirs, CleanupRuntimeFiles, VerifyBaseFiles, BlobHexFromPath

Feature comparison

Key design decisions
- console.sock via broadcaster pattern (single persistent reader, no per-session goroutine leaks)
- snapshot/load, serialized via withCOWPathLocked
- network_overrides: FC v1.14+ feature replaces TAP devices during snapshot/load, avoiding TAP flag mismatch on clone
- root_dir/run_dir layout
- SingleQueueNet: Generic VMConfig flag for single-queue TAP creation with IFF_NO_PI; set by cmd layer for FC, consumed by CNI layer

Known limitations (documented in KNOWN_ISSUES.md)
- drive_overrides) will eliminate this
- /dev/vdX device paths

Testing
E2E tested on 35.240.182.52 (FC v1.15.0):

Review history

Test plan
- --fc --windows mutual exclusion
- --fc with cloudimg rejection

close #15 #17