microvm MVP cleanup: align rootfs behavior to gvisor, ... by BenTheElder · Pull Request #313 · agent-substrate/substrate

Benjamin Elder (BenTheElder) · 2026-06-26T01:42:57Z

Follow-up to #287 / #123

align snapshot behavior: use a tmpfs (for writes) on top of a read-only virtio-fs mount for the container image rootfs instead of ext4 images
enable multiple container support
clean up stale comments

It's a good idea to open an issue first for discussion.

Tests pass
Appropriate changes to documentation are included in the PR

Benjamin Elder (BenTheElder) · 2026-06-26T05:29:12Z

Davanum Srinivas (@dims) this should significantly unblock large images. Still needs follow-ups to optimize image pull / unpack. I tested with the 3GB openshell image.

Davanum Srinivas (dims) · 2026-06-26T11:46:33Z

Benjamin Elder (@BenTheElder) does this need me to build/use virtiofsd from source?

Benjamin Elder (BenTheElder) · 2026-06-26T14:43:35Z

> Benjamin Elder (@BenTheElder) does this need me to build/use virtiofsd from source?

Unfortunately yes but the scripts will handle it.

Davanum Srinivas (dims) · 2026-06-26T18:12:27Z

+// it with a tmpfs upper. On failure it dumps the guest overlay state.
+func startOverlayContainer(ctx context.Context, ac *kata.AgentClient, vsockPath string, c actorContainer) error {
+	carrierCtx, carrierCancel := context.WithTimeout(ctx, 30*time.Second)
+	err := ac.CreateCarrier(carrierCtx, c.name, c.spec)


Will there be a problem if the user container itself has -ovl as a suffix? Do we need to make sure there is no collision between carrier and workload ids?

Yes, we can switch to _ovl since k8s DNS-label restrictions shouldn't apply here but do apply to the user specified containers. Testing / fixing.

Addressed this, I think that fix works well. It seems unlikely we'll want to allow invalid k8s container names ... later we might consider something like the dynamic containers KEP.
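
The reasoning in this thread can be sketched in a few lines of Go. This is a minimal illustration, not the PR's code: `carrierID` and `isValidContainerName` are hypothetical helpers, and the only assumption taken from the discussion is that user-specified container names must be DNS labels while carrier IDs need not be. Because a DNS label cannot contain `_`, a carrier ID built with the `_ovl` suffix can never equal a valid user container name, whereas `-ovl` could.

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsLabel matches an RFC 1123 DNS label, the rule Kubernetes applies to
// container names: lowercase alphanumerics and '-', starting and ending
// with an alphanumeric character.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isValidContainerName reports whether name is a DNS label of at most
// 63 characters.
func isValidContainerName(name string) bool {
	return len(name) <= 63 && dnsLabel.MatchString(name)
}

// carrierID is a hypothetical helper: it derives the carrier's ID from the
// workload container's name using a suffix that no DNS label can contain.
func carrierID(name string) string {
	return name + "_ovl"
}

func main() {
	// "app-ovl" is a legal user container name, so a "-ovl" suffix on "app"
	// would collide with it; the "_ovl" suffix cannot.
	for _, name := range []string{"app", "app-ovl"} {
		id := carrierID(name)
		fmt.Printf("workload=%q carrier=%q carrierIsValidName=%v\n",
			name, id, isValidContainerName(id))
	}
}
```

Every carrier ID printed here fails the container-name check, which is the property that rules out a collision.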

Benjamin Elder (BenTheElder) · 2026-06-26T19:28:24Z

I need to rework the branch atop the CI changes now that we have those, so we get coverage on this PR :-)

Will do that before we merge (i.e., even if this LGTM to reviewers, let's wait until that is done; pointing an agent at it ...)

The overlay rootfs serves each container's image read-only over virtio-fs, which needs a vhost-user fs device (a virtiofsd socket) and more than one PCI segment (the fs device sits on segment 1, kata's convention). Add FsConfig + PlatformConfig and the Fs/Platform fields to VmConfig; both are omitempty, so a config without them is serialized exactly as before.
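
A minimal sketch of why `omitempty` keeps existing configs unchanged. The type names `FsConfig`, `PlatformConfig`, and `VmConfig` come from the commit message; the individual fields, JSON keys, the `Kernel` stand-in field, and the socket path are assumptions made for illustration and may not match the real structs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FsConfig describes a vhost-user fs device. Fields are assumed.
type FsConfig struct {
	Tag        string `json:"tag"`
	Socket     string `json:"socket"`
	PCISegment uint16 `json:"pci_segment,omitempty"`
}

// PlatformConfig carries the PCI segment count. Field is assumed.
type PlatformConfig struct {
	NumPCISegments uint16 `json:"num_pci_segments,omitempty"`
}

// VmConfig is trimmed to one stand-in pre-existing field (Kernel) plus the
// two new optional fields.
type VmConfig struct {
	Kernel   string          `json:"kernel"`
	Fs       []FsConfig      `json:"fs,omitempty"`
	Platform *PlatformConfig `json:"platform,omitempty"`
}

// mustJSON marshals v and panics on error; fine for a sketch.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A config that never sets Fs or Platform: the nil slice and nil pointer
	// are omitted, so the output has no "fs" or "platform" keys.
	legacy := VmConfig{Kernel: "vmlinux"}

	// A config for the overlay rootfs: fs device on segment 1, two segments.
	overlay := VmConfig{
		Kernel:   "vmlinux",
		Fs:       []FsConfig{{Tag: "kataShared", Socket: "/run/vm/virtiofsd.sock", PCISegment: 1}},
		Platform: &PlatformConfig{NumPCISegments: 2},
	}

	fmt.Println(mustJSON(legacy))
	fmt.Println(mustJSON(overlay))
}
```

Using a slice and a pointer for the new fields matters here: `omitempty` drops nil slices and nil pointers, but it would not drop a zero-valued struct embedded by value.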

…s upper)

Helpers to assemble a container's rootfs as an overlay: its OCI image served read-only over virtio-fs (the lower) plus a guest tmpfs (the writable upper).

- StartVirtiofsd: run virtiofsd in find-paths migration mode (so the fs device survives CH snapshot/restore), serving the per-sandbox shared dir.
- ReconstructSharedDirFromImage: bind-mount a container's image into <cid>/rootfs under the shared dir (no host-side copy; virtiofsd serves it to the guest on demand), ensure the standard OCI mountpoints exist, and remount it read-only so the lower is immutable and byte-identical on every node (find-paths re-opens its inodes by path on restore).
- CreateSandboxForActor: create the sandbox with the kataShared virtio-fs mount.
- CreateCarrier: a created-but-unstarted container that binds the base to a stable per-container path the overlay uses as its lowerdir.
- StartOverlayWorkload: create + start the container with an overlayfs rootfs whose upper/work live on a guest tmpfs.
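
As a rough illustration of the last step, the overlayfs mount data for one container could be assembled as below. `overlayOptions` and the guest paths are hypothetical and are not taken from the PR; the only facts relied on are the standard overlayfs options (`lowerdir`, `upperdir`, `workdir`) and the overlayfs requirement that upper and work directories live on the same filesystem, here a single tmpfs.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// overlayOptions is a hypothetical helper that builds the overlayfs mount
// data string for one container: a read-only lower (the image served over
// virtio-fs) plus upper and work directories under one tmpfs mount.
func overlayOptions(lower, tmpfsRoot string) string {
	return strings.Join([]string{
		"lowerdir=" + lower,
		// upperdir and workdir must be on the same filesystem, so both are
		// placed under the same tmpfs root.
		"upperdir=" + path.Join(tmpfsRoot, "upper"),
		"workdir=" + path.Join(tmpfsRoot, "work"),
	}, ",")
}

func main() {
	// Illustrative guest paths only.
	fmt.Println(overlayOptions("/run/lower/app", "/run/ovl/app"))
}
```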

Run all of an actor's containers in the one micro-VM (the pod sandbox), each with its own overlay rootfs (virtio-fs RO lower + guest-tmpfs upper) rather than a per-container disk. Because the writable upper is a guest tmpfs, rootfs writes are part of the CH memory snapshot and persist across suspend/resume alongside process memory.

- RunWorkload: bind each container's image into the shared dir and start one virtiofsd; create the sandbox, then a carrier + overlay workload per container.
- CheckpointWorkload: pause + snapshot memory; the tmpfs upper rides along, so there is no per-container disk to ship.
- RestoreWorkload: reconstruct each read-only lower from the local OCI bundle, start virtiofsd, repoint the snapshot config's per-VMDir paths (vsock, serial, fs socket), and OnDemand-restore + resume.

This replaces the per-container disk path: remove the disk builder (BuildExt4Image), the blk workload (StartBlkWorkload), and the now-obsolete blk integration test.
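
The "repoint the per-VMDir paths" step in RestoreWorkload can be pictured as a prefix rewrite. The sketch below is an assumption about the shape of that step, not the PR's implementation: `snapshotPaths`, `repoint`, `repointAll`, and all directory and file names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// snapshotPaths is a hypothetical stand-in for the host paths recorded in a
// snapshot config that live under the per-VM directory.
type snapshotPaths struct {
	Vsock    string
	Serial   string
	FsSocket string
}

// repoint rewrites p to live under newDir if it lives under oldDir, and
// returns p unchanged otherwise.
func repoint(p, oldDir, newDir string) string {
	prefix := strings.TrimSuffix(oldDir, "/") + "/"
	if !strings.HasPrefix(p, prefix) {
		return p
	}
	return strings.TrimSuffix(newDir, "/") + "/" + strings.TrimPrefix(p, prefix)
}

// repointAll applies repoint to every per-VMDir path.
func repointAll(s snapshotPaths, oldDir, newDir string) snapshotPaths {
	return snapshotPaths{
		Vsock:    repoint(s.Vsock, oldDir, newDir),
		Serial:   repoint(s.Serial, oldDir, newDir),
		FsSocket: repoint(s.FsSocket, oldDir, newDir),
	}
}

func main() {
	saved := snapshotPaths{
		Vsock:    "/run/vm/old/vsock.sock",
		Serial:   "/run/vm/old/serial.log",
		FsSocket: "/run/vm/old/virtiofsd.sock",
	}
	fmt.Printf("%+v\n", repointAll(saved, "/run/vm/old", "/run/vm/new"))
}
```

Matching on the directory plus a trailing slash avoids rewriting a sibling directory that merely shares a name prefix (for example `/run/vm/old2`).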

…overlay rootfs

The overlay rootfs serves the image over virtio-fs, so the asset set gains virtiofsd and moves to kata 3.32. virtiofsd is built from a pinned source commit because the vhost-0.16 snapshot/restore fix isn't in a release tag yet (tracking: gitlab.com/virtio-fs/virtiofsd work_items/236). assemble.sh builds it and the stagers upload it to rustfs (kind) and GCS (GKE); the counter-microvm SandboxConfig lists the virtiofsd asset for arm64 + amd64, and the sandboxconfig-assets VAP (with its envtest) now requires virtiofsd for every microvm architecture.

The overlay formats nothing on the host, so it runs on the committed debian:stable-slim worker base: drop the custom worker base (hack/ateom-base) and its use in run-microvm-demo.sh.

Update the asset README and architecture doc for the overlay.

Terminology and accuracy fixup in files the overlay change doesn't otherwise touch: the runtime no longer resets the rootfs to golden (the overlay's tmpfs upper persists in the memory snapshot), and "owned-boot" was local jargon for ateom booting cloud-hypervisor itself. Comments only.

Benjamin Elder (BenTheElder) · 2026-06-26T22:57:03Z

OK, should be synced with the e2e changes now

Benjamin Elder (BenTheElder) · 2026-06-26T23:02:52Z

+	cmd := exec.Command(bin,
+		"--socket-path="+o.SocketPath,
+		"--shared-dir="+o.SharedDir,
+		"--cache=auto",


Deferring: currently we only mount this ro, so we could use cache=always, but it's not clear yet how we will handle projected volumes cc Taahir Ahmed (@ahmedtd)

I think we might want to do a special writer for those and leverage the ability to exec into the sandboxes, in which case we could leave the virtiofsd mount fully read-only with no host <> guest churn and aggressively cache in the guest to improve performance.

But until we decide, I'm just going to leave this. Using it is already a massive startup improvement versus writing multi-gigabyte images to an ext4 disk ... it may be worse for read-heavy workloads, but we can iterate.

Benjamin Elder (BenTheElder) changed the title ~~microvm MVP cleanup: align rootfs behavior to gvisor, improve large container image performance, ...~~ microvm MVP cleanup: align rootfs behavior to gvisor, ... Jun 26, 2026

Benjamin Elder (BenTheElder) marked this pull request as ready for review June 26, 2026 05:28

Benjamin Elder (BenTheElder) force-pushed the microvm-overlay-tmpfs branch from 1a621d5 to 6fc1b32 Compare June 26, 2026 05:39

Davanum Srinivas (dims) reviewed Jun 26, 2026

Comment thread cmd/ateom-microvm/internal/kata/overlay_linux.go

Benjamin Elder (BenTheElder) force-pushed the microvm-overlay-tmpfs branch from 6fc1b32 to c6b05ed Compare June 26, 2026 17:18

Davanum Srinivas (dims) reviewed Jun 26, 2026

Benjamin Elder (BenTheElder) force-pushed the microvm-overlay-tmpfs branch from c6b05ed to 6d6a4f7 Compare June 26, 2026 19:26

Benjamin Elder (BenTheElder) added 5 commits June 26, 2026 13:09

Benjamin Elder (BenTheElder) force-pushed the microvm-overlay-tmpfs branch from 6d6a4f7 to 06ef4ed Compare June 26, 2026 22:55

Benjamin Elder (BenTheElder) wants to merge 5 commits into agent-substrate:main from BenTheElder:microvm-overlay-tmpfs
