microvm MVP by BenTheElder · Pull Request #287 · agent-substrate/substrate

Benjamin Elder (BenTheElder) · 2026-06-23T06:01:20Z

Fixes #123

cloud-hypervisor + kata (guest only).

Tests pass
Appropriate changes to documentation are included in the PR

TODO:

CI coverage ... not sure if we can pull this off in GHA.

There are some other obvious TODOs that should be resolvable but are not resolved yet:

We require a custom base image (for mkfs), and we have nowhere to host it currently, so it has to be built on demand.
Only supports one container. (Should be a quick fix ... EDIT: it is but it could stay a follow-up given the PR size ...)
We have to hardcode the kvm mount etc in the workerpool controller. Punting generalizing that as pluggability is a long-term goal vs the short term POC-2 goals. We can think about how much config we want to expose there.
cmd/setup-gcp isn't provisioning with nested virt yet; I added a node pool manually.
We probably need docs for setting up local dev. I'm using limactl on an M4 mac, with nested virt enabled (non-default config). The kind script will handle this IF the container env has KVM available.
No /run/ate projected mount. Easy enough to fix ™️

I've started on these but I'm not blocking on them.

agent-substrate#287's micro-VM (kata + cloud-hypervisor) MVP builds the actor's virtio-blk rootfs synchronously with mkfs.ext4 -d, bound by the resume RPC deadline. For a small image that is sub-second; a real multi-GB image (OpenShell's ~3.2 GB helpdesk supervisor) takes minutes, so the context is cancelled and mkfs is SIGKILLed mid-write -- the actor never boots.

- ateom-microvm RunWorkload (golden boot): build the rootfs on a deadline-detached context to a temp file + atomic rename; skip if it already exists. Idempotent + crash-safe so the controller retries converge.
- ateom-microvm RunWorkload (restore reconstruct): detach the rebuild from the RPC deadline too (reset-each-restore, so no idempotency).
- ateapi ResumeActor: hold the actor lock long enough to cover the build (30s -> 5m; suspend/delete stay short).
- atenet router implicit-resume: 15s -> 5m, so a request that triggers a resume waits for the multi-GB rebuild + VM restore instead of cancelling it.

Verified on CPU kind: the OpenShell helpdesk golden snapshot reaches Ready (supervisor boots inside the kata guest + is checkpointed) and a resumed user actor's agent serves /status, where before every path looped on 'mkfs.ext4 ... signal: killed'.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>

Davanum Srinivas (dims) · 2026-06-23T18:25:29Z

Benjamin Elder (@BenTheElder) I was able to test this with my openshell integration with some changes, please see:
c3aa089

It's mostly timeout bumps but there's a suggestion on making building Ext4 images a bit more robust.

Verified on bigbox (CPU kind cluster, /dev/kvm + nested virt):
- #287's own counter-microvm demo passes (boot → suspend → resume, in-RAM count survives the VM snapshot).
- OpenShell helpdesk golden snapshot reaches Ready — the supervisor boots inside a kata/cloud-hypervisor VM and is checkpointed.
- A resumed user actor's agent serves traffic: GET /status → {"history_turns":0,"uptime_seconds":79.7,"model":"gpt-oss:20b-cloud"}.

What was broken + the fix

#287's MVP builds each actor's virtio-blk rootfs synchronously with mkfs.ext4 -d, bound by the resume RPC deadline. Sub-second for a tiny image; for OpenShell's 3.2 GB helpdesk image it takes minutes → the context cancels → mkfs … signal: killed, at four spots. The branch addresses all:
1. ateom-microvm golden-boot build → deadline-detached + idempotent (temp→atomic-rename).
2. ateom-microvm restore-path rebuild → deadline-detached.
3. ateapi ResumeActor lock TTL 30s→5m.
4. atenet router implicit-resume 15s→5m.

Benjamin Elder (BenTheElder) · 2026-06-23T19:02:51Z

It's mostly timeout bumps but there's a suggestion on making building Ext4 images a bit more robust.

ACK on ext4 more robust, I'm also working on moving the ext4 provisioning into Go (without exec) so we can drop the custom base image and make them more deterministic, but it's a complex change. I think it works fine now but I haven't battle tested it yet.

Our mock workloads have pretty small disks, will have to reconsider this aspect. Those timeouts seem abysmal!

I think I know how we want to fix this ... let me test some things.

Benjamin Elder (BenTheElder) · 2026-06-23T22:41:30Z

[just trivial rebases for now]

Michelle Au (msau42) · 2026-06-23T23:31:18Z

+	if err != nil {
 		return nil, fmt.Errorf("while calling ateom.CheckpointWorkload: %w", err)
 	}
+	sandboxRec.SnapshotFiles = resp.GetSnapshotFiles()


Are there more files stored in the checkpoint directory besides the snapshot files? Wondering if we could require that the entire directory only contain the necessary files.

It should only have the snapshot files, but the specific list is currently the restore contract / recorded to the manifest. If we want to pivot to e.g. tarring them up, that should be fine.

While working on multi-container support for uVM in #287 I noticed how difficult it is to debug multi-container workloads. This makes it a lot easier. TODO: We could support filtering on this in kubectl ate (cc @HavenXia)

go-toml/v2 parses the kata configuration.toml; ttrpc (and its log dependency) backs the kata-agent client the micro-VM runtime drives. Includes their licenses.

atelet passes the runtime-fetched sandbox asset paths to the ateom worker and records the exact set of files in each snapshot, so snapshots are self-describing and the worker image bakes in no sandbox toolchain.

A copy of the kata-containers kata-agent protocol buffers (agent/oci/types/csi), used to drive the guest agent over ttrpc. Copied verbatim from kata-containers 3.31.0; see PROVENANCE.md for the upstream source, version, and license.

WorkerPool renders the micro-VM worker pod shape, and the SandboxConfig validating admission policy enforces the micro-VM asset set (cloud-hypervisor, guest kernel, guest image, kata config).

…ansfer Ship/restore exactly the files each runtime records in the snapshot manifest. The guest memory image is mostly free (zero) RAM, so compress and download only the populated extents (versioned sparse-extent format, backward compatible with plain zstd). On restore, download the snapshot concurrently with unpacking the OCI image; for streaming object stores, pipe compression straight into the upload.

Move the ActorLogger (and SyncedWriter) out of cmd/ateom-gvisor into a shared internal package so both ateom runtimes forward actor container logs to the pod log with the same ate.dev/* labels. Add serverboot.InitLoggerWithWriter so a runtime can route its own slog through the same synchronized writer as the actor log forwarder (no interleaved lines).

A small REST client to own the guest boot (vm.create / vm.add-net with tap FDs / vm.boot) and to snapshot and restore it. Restore uses cloud-hypervisor's OnDemand (userfaultfd) memory restore to avoid densifying the guest memory, and a sparse diff-merge overlays the post-restore delta back onto its source so each snapshot stays complete and re-restorable.

A ttrpc kata-agent client (sandbox create, container create/start, guest interface/route/ARP setup, stdout/stderr read), an mkfs.ext4 builder that turns the OCI bundle rootfs into a virtio-blk disk image, OCI-to-agent spec conversion, and a go-toml reader for the guest sizing + kernel params.

A second ateom runtime that runs the actor in a kata 3.31 guest under cloud-hypervisor. ateom owns the CH boot itself (no kata shim) and gives the actor a writable boot-time virtio-blk /dev/vdb rootfs built from the OCI bundle, so rootfs writes land off guest RAM and the snapshot is memory-only (no balloon). It drives the guest agent for sandbox setup + networking, snapshots/restores the VM with reset-to-golden disk semantics (rootfs writes discarded across suspend/resume, in-RAM state preserved) including cross-node restore, and forwards the actor's stdout/stderr to the pod log with ate.dev/* labels.

Assemble + stage the micro-VM runtime assets, an ateom-base image (debian-slim + e2fsprogs for mkfs.ext4), and run-microvm-demo.sh to build + deploy the counter-microvm demo end to end (overriding the worker base via KO_CONFIG_PATH so no committed file is edited). Document the micro-VM sandbox class.

Davanum Srinivas (dims) · 2026-06-24T01:21:28Z

 	github.com/hashicorp/go-reap v0.0.0-20260220095743-4e27870b4f51
 	github.com/klauspost/compress v1.18.5
 	github.com/opencontainers/runtime-spec v1.3.0
+	github.com/pelletier/go-toml/v2 v2.4.0


looks like we need this for reading kata's configuration.toml

yes. and ttrpc is for talking to kata-agent (should be in the commit message)

Davanum Srinivas (dims) · 2026-06-24T01:24:48Z

@@ -0,0 +1,52 @@
+# third_party/kata — vendored kata-containers sources
+
+Source copied from [kata-containers](https://github.com/kata-containers/kata-containers)


Anything we can ask kata folks to do here to make it easier for us consume?

I suppose we could ask for it to be split to another module?

I'm not sure if this is even supported, the kata-agent is something of an implementation detail currently, we could pretty easily substitute our own.

At the moment the most annoying part would be having another binary to plumb, especially without hosted builds.

It seemed premature to ask for that now. We can consider it for a follow-up, but it's functionally pretty similar to vendoring. If we need to upgrade this often, that's a signal that we should move to a more stable integration point.

I was originally using the kata shim, but ... the gap to do snapshot/restore is pretty big.

Davanum Srinivas (dims) · 2026-06-24T01:25:22Z

+`CreateSandbox`, `CreateContainer`, `StartContainer`, `UpdateInterface`, `UpdateRoutes`,
+`AddARPNeighbors`, `ReadStdout`, `ReadStderr`.
+
+### Regenerating


if we touch this again, we'll need to add a make file or script!

~agents :-)

Let's take a TODO -- agents sort of do it but it's hard to repeatedly verify exactly what was done to go from the Kata code to this codebase. It's easy to leave things out of a readme file because that is not executed deterministically.

Bowei Du (bowei) · 2026-06-24T06:44:55Z

+
+// WaitReady blocks until the api-socket answers vmm.ping or the deadline passes.
+func (c *Client) WaitReady(ctx context.Context, deadline time.Duration) error {
+	end := time.Now().Add(deadline)


why not use context deadline?

Bowei Du (bowei) · 2026-06-24T06:52:13Z

+	if err := m.Close(); err != nil {
+		return err
+	}
+	// Put the merged image at delta's name. We UNLINK CH's old delta FIRST, then


these comments are somewhat distracting detail-wise and could be summarized by something like "partially written back data will be discarded so there is no need to worry about sync'ing pages fully to disk".

Same with comment above.

Bowei Du (bowei) · 2026-06-24T06:54:12Z

+}
+
+// MergeDeltaIntoBase produces the same COMPLETE merged snapshot as
+// MergeSparseOverlay(base, delta, delta) — base with delta's populated pages


I found this comment confusing -- especially since it describes what MergeSparseOverlay does. Can we have the comment just describe what MergeDeltaIntoBase does directly?

Also, some weirdness: MergeSparseOverlay(base, delta, ?delta?)

Bowei Du (bowei) · 2026-06-24T06:54:26Z

+// checkpoint-state/), so the renames are same-filesystem (metadata-only). On the
+// off chance they straddle a mount boundary (EXDEV), it falls back to the copying
+// MergeSparseOverlay (base is untouched until the first rename succeeds).
+func MergeDeltaIntoBase(ctx context.Context, base, delta string) error {


suggest: base -> baseFile, delta -> deltaFile

Bowei Du (bowei) · 2026-06-24T06:56:37Z

+	return os.Rename(merged, delta)
+}
+
+// overlayDataRegions copies every populated (non-hole) region of src onto dst at


suggest just calling this copySparseRegions. There is no overlay being produced, you are overwriting the dst.

Bowei Du (bowei) · 2026-06-24T06:58:23Z

+			}
+			return copied, fmt.Errorf("SEEK_DATA: %w", err)
+		}
+		de, err := unix.Seek(sfd, ds, unix.SEEK_HOLE)


minor: there is an iterator-based impl over sparse that probably would be easier to understand.

Bowei Du (bowei) · 2026-06-24T07:00:04Z

+	// Raw HTTP/1.1 over the unix socket: net/http cannot attach SCM_RIGHTS, and
+	// CH's micro_http collects fds from the recvmsg ancillary data of the
+	// request that carries them.
+	req := fmt.Sprintf("PUT /api/v1/vm.restore HTTP/1.1\r\nHost: localhost\r\nAccept: */*\r\nContent-Type: application/json\r\nContent-Length: %d\r\n\r\n%s", len(body), body)


this is kind of gross -- can we not just use the standard http transport here?

I don't think so, because we need to do the SCM_RIGHTS stuff ...

You might be able to hack it with a custom round tripper, but probably save that for later. For right now, can we make it a bit more structured:

func formatHTTPRequest(method, path string, headers map[string]string, body string) string

and

func parseHTTPResponse(...) ...

so we can unit test this?

Bowei Du (bowei) · 2026-06-24T07:00:28Z

+	} else {
+		_ = conn.SetDeadline(time.Now().Add(30 * time.Second))
+	}
+	req := fmt.Sprintf("PUT /api/v1/vm.add-net HTTP/1.1\r\nHost: localhost\r\nAccept: */*\r\nContent-Type: application/json\r\nContent-Length: %d\r\n\r\n%s", len(body), body)


same here... is there a good reason why we have to do this completely raw?

Bowei Du (bowei) · 2026-06-24T16:44:04Z

+	if _, err := br.ReadString('\n'); err != nil { // the "OK <n>" line
+		return "debug-console CONNECT reply: " + err.Error()
+	}
+	// The kata debug console is an INTERACTIVE shell on a PTY (console.rs spawns


or you could look for the sentinel appearing exactly twice?

Sure. Didn't seem super important. This approach works fine.

It relies on bash string pasting semantics with the single quote, not the most obvious thing in the world.

Bowei Du (bowei) · 2026-06-24T16:59:55Z

+// DialAgent connects to the kata-agent through the hybrid-vsock socket at
+// vsockPath (VsockSocketPath(id)): plain-text "CONNECT <port>" handshake with
+// the VMM, then ttrpc over the stream.
+func DialAgent(ctx context.Context, vsockPath string) (*AgentClient, error) {


Some of this stuff is in agent/protocols/client/client.go in the Kata repo? Any reason we replicate it vs import from Kata?

We could third_party it as well, the dependency tree is huge and we just want a small subset for the client. This file is basically all crud on top of the protos + TTRPC.

Would be good to document that very obviously...

Bowei Du (bowei) · 2026-06-24T17:09:56Z

+// sandbox id (= actor id) then collides on the next attempt: "listen unix
+// .../virtiofsd.sock: bind: address already in use", "Could not bind mount
+// .../shared/sandboxes/<id>/mounts", "directory not empty". Calling this
+// before each run gives kata a clean slate. Safe when nothing exists.


[minor] delete "Safe when nothing exists" somewhat weird text

Bowei Du (bowei) · 2026-06-24T17:13:02Z

+		}
+		argv0 := strings.SplitN(string(cmdline), "\x00", 2)[0]
+		if strings.Contains(argv0, "cloud-hypervisor") || strings.Contains(argv0, "virtiofsd") || strings.Contains(argv0, "containerd-shim-kata") {
+			_ = unix.Kill(pid, unix.SIGKILL)


we should log on error instead of eating it

Bowei Du (bowei) · 2026-06-24T17:13:25Z

+		// Deepest paths first so child mounts unmount before their parents.
+		sort.Slice(mounts, func(i, j int) bool { return len(mounts[i]) > len(mounts[j]) })
+		for _, mp := range mounts {
+			_ = unix.Unmount(mp, unix.MNT_DETACH)


we should log on error, if we eat it and there is something wrong, then we would not know what happened?

Bowei Du (bowei) · 2026-06-24T17:13:31Z

+		}
+	}
+	for _, d := range dirs {
+		_ = os.RemoveAll(d)


Bowei Du (bowei) · 2026-06-24T17:21:10Z

+// configuration.toml. memDefault/vcpuDefault are substituted when the key is
+// absent or non-positive (kata also accepts default_vcpus = -1 meaning "all host
+// CPUs", which the owned boot does not support).
+func ParseConfig(base []byte, memDefault, vcpuDefault int) (KataConfig, error) {


*KataConfig? avoid copying by default

2 ints and a string? probably not worth it?

Bowei Du (bowei) · 2026-06-24T17:48:50Z

+
+var (
+	podUID = flag.String("pod-uid", "", "The UID of the current pod")
+


[minor] any reason to space this out like this?

no, some weird artifact, I AI split the commits (aside from generating much of the code). cleaning up along with most of these comments.

Bowei Du (bowei) · 2026-06-24T17:50:32Z

+
+	// Share one synchronized writer between the runtime logger and the actor-log
+	// forwarder (created below) so the two log streams to the pod's stdout don't
+	// interleave-corrupt each other's lines (mirrors ateom-gvisor).


[minor] probably don't want to leave a bunch of "mirrors ateom-gvisor" in the code.

We can pull this out to high level docs. I want the behaviors to align so switching sandboxClass is ~easy.

Bowei Du (bowei) · 2026-06-24T18:10:05Z

+}
+
+func lastLines(s string, n int) string {
+	lines := []string{}


var lines []string

Any particular reason to prefer that format?

mostly style -- it also doesn't alloc memory by default

Bowei Du (bowei) · 2026-06-24T18:11:36Z

+	t.Logf("reset-to-golden OK: discarded the rootfs write (disk sentinel gone) while RAM continuity held: %q", strings.TrimSpace(got))
+}
+
+func lastLines(s string, n int) string {


any reason why not

lines := strings.Split(s, "\n")
if len(lines) < n {
	return strings.Join(lines, "\n")
}
etc

Bowei Du (bowei) · 2026-06-24T18:13:51Z

+	if err != nil {
+		return err
+	}
+	hostMAC, err := net.ParseMAC(hostVethMAC)


can we just do this in init()

Bowei Du (bowei) · 2026-06-24T18:44:39Z

+		return nil, fmt.Errorf("while writing %s: %w", baseIDFile, err)
+	}
+
+	// NB: the snapshot is MEMORY-ONLY (config/state/memory-ranges + base-id). The


do we need this comment text repeated in multiple places?

just AI nonsense. will clean up.

Bowei Du (bowei) · 2026-06-24T18:44:58Z

+	}
+	dSnapshot := time.Since(tSnapshot)
+
+	// Diff-snapshot completion for an OnDemand-restored actor: CH's snapshot here is


same here -- are you reminding yourself of this?

Bowei Du (bowei) · 2026-06-24T19:07:36Z

+
+	// Tear down the per-activation actor network (mirrors gVisor).
+	if err := s.cleanupActorNetwork(ctx); err != nil {
+		slog.WarnContext(ctx, "Failed to clean up actor network after checkpoint", slog.Any("err", err))


I'm wondering if we keep going or just panic if the environment seems to have failed to be cleaned up

I think we want to best effort tear down everything. Maybe after attempting to do all cleanup it panics.

Bowei Du (bowei) · 2026-06-24T19:10:03Z

+	// guest ext4 cache:
+	//   - same-node: a verbatim golden template (copyDiskFile) — guaranteed identical.
+	//   - cross-node: rebuild from the OCI image atelet unpacked to the bundle at
+	//     restore (mkfs.ext4 -d is LAYOUT-deterministic for identical inputs, so the


is this assumption always true for mkfs.ext4? probably ok in practice, although it would be kind of hard to diagnose if there were subtle shifts

Yeah, this is an explicit TODO. I have a stab at moving to go, but really we will just switch to virtiofsd + tmpfs overlay upper in the short term (perhaps before this merges even ...)

Bowei Du (bowei) · 2026-06-24T19:11:15Z

+			_ = chCmd.Process.Kill()
+		}
+	}()
+	// OnDemand (userfaultfd) memory restore: ~75ms vs ~1.8s eager, and it keeps the


is this comment needed here?

We need these details somewhere and we're setting the mode here, if there's temptation to switch to "Copy" mode it will be problematic for repeated roundtrips.

Bowei Du (bowei) · 2026-06-24T19:16:34Z

I took a pass through everything...

Given the current MVP state where we still have to hash out a bunch of things, I didn't find anything show-stopping in the review. I'm sure we need to morph quite a few things with the merging of the gvisor and CHV shared code. One thing that would be good to fix is to break up some of the really long funcs and files into more self-contained (understandable) pieces that are unit tested.

- cleanup_linux: log on error (unmount/RemoveAll/kill) via ctx slog instead of swallowing; drop the stale kata-shim leftovers (no shim/containerd) while keeping the virtiofsd path: remove the /run/vc/sbs dir and the containerd-shim-kata kill-arm, fix the doc comment.
- run.go: include checkpointDir in the clear/create error messages; rename the resolvedRuntime.ch field to chBinary; make firstNonEmpty variadic + handle all-empty; drop stale "shim-owned"/"eager/shim paths" wording.
- main.go: compact the flag var block; trim "mirrors ateom-gvisor" comments.
- service_integration_test: simplify lastLines with strings.Split.

- ateom-base Dockerfile: full apt cleanup (apt-get clean + rm tmp).
- stage-to-rustfs.sh: fail fast when the aws CLI is missing.

- Parse the fixed veth CIDRs/MACs/gateway once into package vars (mustParse* at init) instead of re-parsing per activation (bowei).
- cleanupActorNetwork: gather the removeActorNftablesRules error and keep going (errors.Join + warn) instead of returning early.
- enableIPv4Forwarding: open the sysctl O_WRONLY rather than os.WriteFile (do not create it if missing).
- actorVethMTU: log a warning when the veth link can't be read before defaulting.

Drop the no-longer-used kata-shim from the example asset list; keep virtiofsd.

Benjamin Elder (BenTheElder) · 2026-06-25T01:15:19Z

I'm working on addressing the comments tonight.

I'm sure we need to morph quite a few things with the merging of the gvisor and CHV shared code

Yeah. We could do more code dedupe here.

I'm slightly concerned about lots of low level gvisor changes (e.g. networking stuff) landing into microvm in the short term, since we will have no CI coverage at least for now. But I suppose either way it has to be dealt with ASAP.

"Production Grade uVM" is a Beta target, FWIW.

I already have a bunch of local changes waiting to rebase and stack on this.

Leaning towards folding in some of the smaller and more critical changes given the timing.

Davanum Srinivas (dims) · 2026-06-25T01:16:43Z

Benjamin Elder (@BenTheElder) yes let's land this and iterate!

- Rename params base/delta/out -> baseFile/deltaFile/outFile.
- Rename overlayDataRegions -> copySparseRegions (it overwrites dst, not an overlay).
- Rewrite MergeDeltaIntoBase doc to describe it directly; condense the no-fsync and rename-to-free-name comments.

- workerpool_apply: rename applyMicroVMPodShape -> maybeApplyMicroVMPodShape and pass wp.Spec.SandboxClass instead of the whole WorkerPool.
- sandboxconfig validation test: add an arm64 micro-VM asset-set case.
- specconv: TODO to forward Seccomp/Sysctl + Apparmor/SELinux for OCI parity.
- run.go: clarify the dialAgentRetry per-attempt cap vs retry-gap comment.
- roadmap: drop the microVM line (shipped).

- readSparseZstd: validate totalSize >= 0 and that each extent falls within size (the header is read from the downloaded snapshot); this guards Truncate/Seek/CopyN.
- copyZstdSparse: Truncate(0) up front so skipped (hole) regions can't expose stale bytes; it is a sparse write-out, not an in-place overlay.
- Rename sendToGCSWithZstd -> sendZstd, sendToGCSStreaming -> sendStreamingZstd (the package is already ategcs; it also handles S3/rustfs).

The memory-only snapshot holds because the rootfs is a host-backed virtio-blk disk rather than a guest tmpfs overlay-upper (block writes still transit the guest page cache transiently). Reword the 'off guest RAM'/'NOT guest RAM' phrasing accordingly and trim the repeated memory-only NB.

run.go was ~1060 lines. Split the file (bowei) into cohesive units in the same package, no logic changes:
- checkpoint.go: CheckpointWorkload + listFiles + teardownActor
- restore.go: RestoreWorkload + rewriteSnapshotSocketPaths + repointActorRootfsDisk
- spec.go: ensureKataCompatibleSpec + defaultKataMounts + defaultKataResources

run.go keeps RunWorkload + the shared boot/agent/net helpers (~500 lines).

RunWorkload was ~400 lines. Extract the dense, self-contained blocks into helpers (no logic change); RunWorkload stays the orchestrator (the retErr-tied cleanup defers must live there):
- guestConfig: parse kata config -> mem/vcpus/kernel-params
- buildVMConfig: assemble the CH VmConfig (cmdline + disks + vsock)
- startActorContainer: post-boot agent setup (sandbox, guest net, start container)

RunWorkload is now ~163 lines.

Restructure the ategcs object paths so the compress/decompress logic is in small, unit-testable funcs that only touch io.Reader/io.Writer (bowei):
- writeContent: sparse-extent (file) vs plain zstd; returns a writeContentResult ({logicalBytes, populatedBytes, sparse}) instead of multi-returns + side vars.
- decodeContent: the symmetric download half (auto-detect sparse vs plain).
- sendZstd is now a thin dispatcher; the temp-file path is sendBufferedZstd, symmetric with sendStreamingZstd (both call writeContent).
- Rename the confusing logical/dataBytes -> logicalBytes/populatedBytes (log keys too).

Add a direct writeContent<->decodeContent round-trip test.

Make the sparse-extent format streamable (bowei): instead of writing numExtents + the extent table up front, emit (off,len,data) frames terminated by an end-offset sentinel, with the metadata compressed alongside the data in the single zstd stream. The writer discovers + emits extents incrementally (no up-front scan to count them) and drops the in-memory extent table; the reader replays frames until the sentinel. Bump sparseVersion 1->2 (readers reject older snapshots); keep the per-extent bounds validation. Round-trip tests cover it.

Both implementers (gcsClient, test streamingMemStore) returned true and s3Client doesn't implement it at all, so the SupportsStreamingPut() bool was redundant with the type assertion (dims). Make streamingPutter a marker: presence of the (unexported) method is the signal; the call site checks only the assertion.
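
The marker-interface idea can be sketched as follows. The streamingPutter name comes from the commit message; objectStore, the two store types, the marker method name, and supportsStreaming are hypothetical stand-ins rather than the ategcs types.

```go
package main

import (
	"fmt"
	"io"
)

// objectStore is a stand-in for the common upload interface.
type objectStore interface {
	Put(name string, r io.Reader) error
}

// streamingPutter is a marker: implementing the unexported method is the
// whole signal, so there is no bool to return or keep in sync.
type streamingPutter interface {
	streamingPut()
}

// bufferedStore does not opt in to streaming uploads.
type bufferedStore struct{}

func (bufferedStore) Put(string, io.Reader) error { return nil }

// streamingStore opts in by implementing the marker method.
type streamingStore struct{}

func (streamingStore) Put(string, io.Reader) error { return nil }
func (streamingStore) streamingPut()               {}

// supportsStreaming is the call-site check: only the type assertion.
func supportsStreaming(s objectStore) bool {
	_, ok := s.(streamingPutter)
	return ok
}

func main() {
	fmt.Println(supportsStreaming(bufferedStore{}), supportsStreaming(streamingStore{}))
}
```

Keeping the marker method unexported means only types in the same package can opt in, which fits a case where every implementer lives next to the call site.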

Add sparsezstd_test.go (bowei): table-driven writeSparseZstd<->readSparseZstd round-trips across hole/data layouts (empty, all-hole, all-data, leading/trailing holes, single + multi extent), plus malformed-input coverage of the reader's validation (bad version, negative size, extent past end, negative offset) and a truncated stream (missing end sentinel).

The existing TestCopyZstdSparse used a fresh dst, so it never exercised the defensive Truncate(0). Add TestCopyZstdSparseClearsStaleData: a dst pre-filled with stale non-zero bytes (larger than the new content) must come back byte-exact with holes zeroed and shrunk to the logical size.

Benjamin Elder (BenTheElder) · 2026-06-25T04:02:25Z

I'm pushing commits on top to make them more reviewable ... in theory. But I think it's probably going to be worth folding back into a logical stream again before merge ...

This is a lot for a single commit, so I don't think we should squash merge it, but I also don't necessarily think we want a ton of fixup commits at the end either.

The 'NB: the snapshot is memory-only...' note duplicated the CheckpointWorkload doc comment (bowei). Removed it. (main.go's var-block spacing artifact was already compacted in an earlier fixup.)

Benjamin Elder (BenTheElder) · 2026-06-25T06:30:29Z

main...BenTheElder:substrate:microvm-blk-rootfs-review-address-snapshot has the commits as reviewed + the review addressing commits ... I'll leave that as is for reference and then clean up the history here.

Bowei Du (bowei) · 2026-06-25T18:24:39Z

mega pr for micro vm

Benjamin Elder (BenTheElder) · 2026-06-26T00:43:11Z

... I meant to squash back in all the "Address review" comments before merge, got too tied up with oncall ... oh well.

cleaning up some follow-up in smaller PRs ...

Benjamin Elder (BenTheElder) requested a review from Taahir Ahmed (ahmedtd) June 23, 2026 06:01

Benjamin Elder (BenTheElder) marked this pull request as ready for review June 23, 2026 06:37

Benjamin Elder (BenTheElder) requested a review from Bowei Du (bowei) June 23, 2026 17:20

Bowei Du (bowei) reviewed Jun 23, 2026

Comment thread hack/update/licenses.sh

This was referenced Jun 23, 2026

ateom-gvisor: tag forwarded actor logs with ate.dev/container_name #290

Merged

hack/update/licenses.sh: mirror in-tree third_party licenses #291

Merged

Benjamin Elder (BenTheElder) force-pushed the microvm-blk-rootfs branch 2 times, most recently from 3b96e1e to 9937fa0 Compare June 23, 2026 22:37

Benjamin Elder (BenTheElder) force-pushed the microvm-blk-rootfs branch from 9937fa0 to b668f12 Compare June 23, 2026 22:54

Michelle Au (msau42) reviewed Jun 23, 2026

Benjamin Elder (BenTheElder) added 10 commits June 23, 2026 17:01

vendor: add go-toml/v2 and ttrpc

1a2bbbe

proto(ateompb): add runtime_asset_paths and snapshot_files

964d2ee

api,atecontroller: add the micro-VM sandbox class

c4cf583

Benjamin Elder (BenTheElder) force-pushed the microvm-blk-rootfs branch from b668f12 to 2b4efb5 Compare June 24, 2026 00:25

Davanum Srinivas (dims) reviewed Jun 24, 2026

Comment thread cmd/atecontroller/internal/controllers/workerpool_apply.go Outdated

Bowei Du (bowei) reviewed Jun 24, 2026

Benjamin Elder (BenTheElder) added 4 commits June 24, 2026 17:56

address review: hack tooling cleanups

2539cc2

address review: trim runtime_asset_paths example (drop kata-shim)

67975a6

Benjamin Elder (BenTheElder) added 11 commits June 24, 2026 18:40

address review: drop redundant memory-only NB comment

9470764

Bowei Du (bowei) approved these changes Jun 25, 2026

Bowei Du (bowei) merged commit e44249f into agent-substrate:main Jun 25, 2026
9 checks passed

Benjamin Elder (BenTheElder) mentioned this pull request Jun 26, 2026

microvm MVP cleanup: align rootfs behavior to gvisor, ... #313

Open

2 tasks

Davanum Srinivas (dims) mentioned this pull request Jun 26, 2026

ateom-microvm: give the micro-VM guest DNS so name-based egress works dims/substrate#1

Closed
