rel/0.41.0 → main #153
Open
mobileoverlord wants to merge 19 commits into
Users and groups are now declared in a top-level `permissions:` map and referenced by name from `rootfs.<name>.permissions` / `initramfs.<name>.permissions` (or inlined), instead of buried inside one extension. This puts identity provisioning at the image layer where a single coherent passwd/shadow/group makes sense, lets the same block be reused across rootfs and initramfs, and leaves room to grow into directory perms or sudoers without further grammar churn. When no `permissions:` is set on an image, no script section is emitted — the base packages' generic /etc/passwd/shadow/group are left untouched. Extensions that still declare `users:` / `groups:` continue to work but emit a deprecation warning; that path will be removed in a future release. The script generator was extracted from ext/build.rs into a shared `utils::permissions::render_users_groups_script` helper so the legacy extension path and the new rootfs/initramfs path share one implementation.
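The by-name-or-inline lookup described above can be sketched roughly as follows. This is a minimal illustration, not the CLI's code: `Permissions`, `PermissionsRef`, and `resolve_permissions` are hypothetical names, and the block is reduced to bare user and group names.

```rust
use std::collections::HashMap;

/// Hypothetical model of one `permissions:` block (users and groups only).
#[derive(Debug, Clone, PartialEq)]
struct Permissions {
    users: Vec<String>,
    groups: Vec<String>,
}

/// How an image (rootfs/initramfs) may point at a block: by name or inlined.
#[derive(Debug, Clone)]
enum PermissionsRef {
    Named(String),
    Inline(Permissions),
}

/// Resolve an image's `permissions` field against the top-level map.
/// `Ok(None)` stands for "emit no script section", which leaves the base
/// packages' passwd/shadow/group untouched.
fn resolve_permissions(
    image_ref: Option<&PermissionsRef>,
    top_level: &HashMap<String, Permissions>,
) -> Result<Option<Permissions>, String> {
    match image_ref {
        None => Ok(None),
        Some(PermissionsRef::Inline(p)) => Ok(Some(p.clone())),
        Some(PermissionsRef::Named(name)) => top_level
            .get(name)
            .cloned()
            .map(Some)
            .ok_or_else(|| format!("unknown permissions block '{name}'")),
    }
}
```

Because rootfs and initramfs would both go through the same lookup, one named block can be shared between them, which is the reuse the commit message describes.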
Previously, ext install/build/image all shared a single
`compute_ext_input_hash` and runtime install/build shared a single
`compute_runtime_input_hash`. Editing a field that only affects the
build (e.g. ext `image:` kabtool args, `var_files:`, runtime `var:`,
runtime `post_build:`) invalidated the install stamp too, which cascaded
into the install step being re-run via the dependency chain.
Per-step hash functions now cover exactly what each step uses:
  ext install       -> packages, types, source
  ext build         -> install inputs + image, overlay, post_build (path + content)
  ext image         -> build inputs + var_files, subvolumes, filesystem
  runtime install   -> packages, target
  runtime build     -> install inputs + narrowed kernel, var, var_files,
                       post_build (path + content), rootfs/initramfs filesystem,
                       ext docker_images
  sdk install       -> sdk.packages/image/repo_url/repo_release (no longer
                       includes rootfs/initramfs.packages; those have their
                       own install stamps)
  rootfs install    -> rootfs.packages, rootfs.overlay, narrowed kernel,
                       post_install (path + content)
  initramfs install -> same shape as rootfs
The `kernel:` block is now hashed via a narrow {package, version, compile,
install} mapping at every call site, so cosmetic edits (metadata, new
fields) don't invalidate stamps that don't actually consume them. The
`post_build` / `post_install` hooks now hash script *contents* in addition
to the path, so editing the script body invalidates the stamp without
`--no-stamps`.
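A minimal sketch of the layering for the three ext steps, assuming a simplified config. `ExtConfig` and the three hash functions are hypothetical stand-ins, and `DefaultHasher` is used only to keep the example dependency-free; the text does not say which digest the real stamps use, and a persisted stamp would want one that is stable across toolchains.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of an extension's config.
#[derive(Default, Clone)]
struct ExtConfig {
    packages: Vec<String>,
    types: Vec<String>,
    source: Option<String>,
    image_args: Option<String>,           // consumed by build only
    post_build: Option<(String, String)>, // (path, script contents), build only
    var_files: Vec<String>,               // consumed by image only
}

/// Install hash: only the fields the install step consumes.
fn ext_install_hash(c: &ExtConfig) -> u64 {
    let mut h = DefaultHasher::new();
    c.packages.hash(&mut h);
    c.types.hash(&mut h);
    c.source.hash(&mut h);
    h.finish()
}

/// Build hash: install inputs plus build-only fields. The hook is hashed
/// by path AND contents, so editing the script body changes the hash.
fn ext_build_hash(c: &ExtConfig) -> u64 {
    let mut h = DefaultHasher::new();
    ext_install_hash(c).hash(&mut h);
    c.image_args.hash(&mut h);
    c.post_build.hash(&mut h);
    h.finish()
}

/// Image hash: build inputs plus image-only fields.
fn ext_image_hash(c: &ExtConfig) -> u64 {
    let mut h = DefaultHasher::new();
    ext_build_hash(c).hash(&mut h);
    c.var_files.hash(&mut h);
    h.finish()
}
```

Each later step folds in the earlier step's hash, so a change to an install input still invalidates build and image, while a build-only edit leaves the install hash alone.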
`validate_stamps_batch` now accepts a slice of (component, command,
hash) triples so each requirement is compared against the matching
step's hash instead of one shared hash applied to all stamps for a
component.
STAMP_VERSION bumped 1 -> 2; older stamps invalidate on first run after
upgrade, then the new narrower hashes apply going forward.
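The triple-based comparison and the version bump could look something like this. It is a sketch under assumptions: the `Stamp` struct, the in-memory map, and the function name `stale_requirements` are hypothetical, and only the idea of one expected hash per (component, command) pair comes from the text.

```rust
use std::collections::HashMap;

/// Stamps written with an older version are treated as stale.
const STAMP_VERSION: u32 = 2;

/// Hypothetical on-disk stamp, reduced to the two fields compared here.
#[derive(Debug, Clone, PartialEq)]
struct Stamp {
    version: u32,
    input_hash: String,
}

/// Returns the (component, command) pairs whose stamp is missing, was
/// written by another stamp version, or carries a different hash than
/// the one expected for that specific step.
fn stale_requirements(
    stamps: &HashMap<(String, String), Stamp>,
    requirements: &[(&str, &str, &str)], // (component, command, expected hash)
) -> Vec<(String, String)> {
    let mut stale = Vec::new();
    for &(component, command, hash) in requirements {
        let key = (component.to_string(), command.to_string());
        let fresh = matches!(
            stamps.get(&key),
            Some(s) if s.version == STAMP_VERSION && s.input_hash == hash
        );
        if !fresh {
            stale.push(key);
        }
    }
    stale
}
```

With a shared hash per component, the `install` and `build` rows for the same component could not be told apart; carrying the hash in the triple is what lets them diverge.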
Adds 14 negative-invalidation tests locking the new shape in place ("X
must NOT invalidate Y" for each step+field pair we untangled).
…resolution
When a runtime has no explicit `rootfs:` / `initramfs:` ref (the common case for projects that define images at the top level), the resolver returned None and the permissions section came out empty. As a result, the root user's shadow entry was never rewritten and root login was silently broken on the resulting image.
Fix: in runtime/build.rs, fall back to `config.rootfs_default()` / `config.initramfs_default()` when the runtime-level ref is unset, the same fallback the image build itself uses for filesystem/post_install.
Adds a regression test in `utils::config::tests` that mirrors the test project shape (top-level rootfs/initramfs with `permissions: dev`, runtime declares no rootfs/initramfs of its own) and asserts the fallback path picks up the permissions block.
Verified end-to-end: after rebuild, the rootfs erofs image's /etc/shadow now carries `root::19000:...` (empty password) instead of the inherited `root:*:...` from the sysroot.
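The fallback amounts to an `Option` chain. The sketch below uses hypothetical, heavily reduced `Config`, `Runtime`, and `ImageConfig` types; only the "runtime-level ref first, project default second" order is taken from the text.

```rust
/// Hypothetical image definition, reduced to the field under discussion.
#[derive(Debug, Clone, PartialEq)]
struct ImageConfig {
    permissions: Option<String>,
}

/// Hypothetical project config holding the top-level rootfs definition.
struct Config {
    rootfs_default: Option<ImageConfig>,
}

/// Hypothetical runtime entry; `rootfs` is usually unset.
struct Runtime {
    rootfs: Option<ImageConfig>,
}

/// Prefer the runtime-level ref and fall back to the project default,
/// so a top-level `permissions:` is not silently dropped.
fn effective_rootfs<'a>(runtime: &'a Runtime, config: &'a Config) -> Option<&'a ImageConfig> {
    runtime.rootfs.as_ref().or(config.rootfs_default.as_ref())
}
```

The initramfs path would follow the same shape with its own default.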
Investigate why `avocado runtime deploy` fails on macOS and design the fix. Root cause: the deploy script runs inside the SDK container, which runs inside the slirp-NAT'd avocado-vm, so the TUF repo HTTP server (:8585) it starts is unreachable by the target device, and the script's host-IP autodetect returns container/VM addresses. Plan: a per-deploy QMP hostfwd (bound 0.0.0.0, opened only during deploy) + publishing the container repo port to the VM + setting AVOCADO_DEPLOY_REPO_HOST to the macOS LAN IP (get_local_ip_for_remote), surfaced as a reusable `avocado vm port-forward` primitive. No desktop change — the CLI owns the qemu lifecycle. Plan only; no behavior change yet.
On macOS the deploy container runs inside the slirp-NAT'd avocado-vm, so the TUF repo HTTP server it starts (:8585) was unreachable by the target device and the in-container host-IP autodetect returned VM-internal addresses. Bridge the device->repo path:
- qmp: add human_monitor_command + hostfwd_add/hostfwd_remove (runtime slirp port forwarding via the QEMU monitor), with unit tests.
- deploy: on macOS/Windows (is_docker_desktop), set AVOCADO_DEPLOY_REPO_HOST to this host's LAN IP and publish the repo port; on the avocado-vm (is_vm_routing_active) also open a `hostfwd 0.0.0.0:PORT->guest:PORT` for the deploy and tear it down afterward. Skip `-p` when the SDK container uses host networking (docker discards it and the hostfwd already reaches the VM-bound port). Linux (native docker) is untouched.
Validated end-to-end to a LAN Raspberry Pi 4: device fetched the repo metadata over the forward (HTTP 200). See docs/features/macos-deploy-port-forwarding.md.
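For orientation, the monitor traffic behind such a forward can be sketched as plain string builders. The helper names are hypothetical; the sketch assumes a single user-mode netdev (so no netdev id is passed) and leaves the guest address empty so QEMU targets its default guest. `hostfwd_add`, `hostfwd_remove`, and QMP's `human-monitor-command` are QEMU's own command names.

```rust
/// HMP command that opens a slirp TCP forward on all host interfaces.
fn hostfwd_add_cmd(port: u16) -> String {
    format!("hostfwd_add tcp:0.0.0.0:{port}-:{port}")
}

/// HMP command that removes the forward again after the deploy.
fn hostfwd_remove_cmd(port: u16) -> String {
    format!("hostfwd_remove tcp:0.0.0.0:{port}")
}

/// Wrap an HMP command line in a QMP `human-monitor-command` request.
fn qmp_hmp_request(command_line: &str) -> String {
    // Escape backslashes and double quotes so the JSON string stays valid.
    let escaped = command_line.replace('\\', "\\\\").replace('"', "\\\"");
    format!(
        r#"{{"execute":"human-monitor-command","arguments":{{"command-line":"{escaped}"}}}}"#
    )
}
```

Binding to 0.0.0.0 is what makes the port reachable from a LAN device, which is presumably why the forward is opened only for the duration of the deploy.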
Lets the SDK trust a self-signed / private-CA package endpoint (e.g. an internal Pulp behind package-ca). Centralized so it covers EVERY dnf invocation (sdk bootstrap, sdk packages, ext, runtime, rootfs, initramfs, and the per-module 'dnf' subcommands; host AND target repo confs):
- config: distro.repo.ca + distro.repo.tls_verify; resolvers get_repo_ca()/get_repo_insecure() (env AVOCADO_REPO_CA / AVOCADO_REPO_INSECURE win over config). promote_repo_tls_env() pushes config values to the process env at load so the container env-builders pick them up uniformly.
- container: inject_repo_tls_env() adds AVOCADO_REPO_CA_B64 (base64 of the CA file) + AVOCADO_REPO_INSECURE to the container env at the env-builder chokepoints. REPO_TLS_SETUP_SNIPPET appends the CA to the SDK trust bundle (which SSL_CERT_FILE/CURL_CA_BUNDLE and every explicit sslcacert point at) and, for insecure, adds --setopt=sslverify=0 to DNF_SDK_HOST (base of every dnf call). Emitted by both entrypoint generators.
- sdk bootstrap: snippet appended to the bootstrap command so the FIRST dnf (target pkg from sdk/all) is covered too.
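The "env wins over config" rule can be sketched as two pure resolvers. The function names and signatures are hypothetical, and the set of truthy spellings accepted for the env value is an assumption; the text only states the precedence.

```rust
/// CA path: a non-empty env value wins, then the config value.
fn repo_ca(env: Option<&str>, config: Option<&str>) -> Option<String> {
    env.filter(|v| !v.is_empty()).or(config).map(str::to_owned)
}

/// Insecure mode: a non-empty env value decides on its own; otherwise
/// `tls_verify: false` in config turns verification off.
fn repo_insecure(env: Option<&str>, tls_verify: Option<bool>) -> bool {
    match env.map(str::trim) {
        // Assumption: "1", "true", and "yes" count as enabled.
        Some(v) if !v.is_empty() => matches!(v, "1" | "true" | "yes"),
        _ => tls_verify == Some(false),
    }
}
```

Taking the env and config values as arguments keeps the precedence testable without touching the process environment.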
Pin each target to an immutable point-in-time snapshot of its feed channel
so a clean + rebuild reproduces exactly, even after the live channel head
advances or evicts the NEVRAs the lock file references.
Mechanism: every dnf baseurl is ${repo_url}/$releasever/... with releasever
= {release}/{channel}; pinning injects one segment -> {release}/{channel}/
snapshots/<id>, exposed via AVOCADO_RELEASEVER (which get_releasever() already
honors first), so all sysroots freeze together with no per-call-site plumbing.
- Lock file v7: per-target `repo-snapshot` (RepoSnapshot). Additive — v6 reads
as v7 with no pin (= track head), fully backward-compatible. merge adopts a
disk pin when the writer has none; unlock (clear_all) drops it.
- utils/snapshot.rs: resolve-and-apply runs once per command — reuse a matching
pin, auto-pin to the channel's latest snapshot on first fetch, pre-flight a
pinned snapshot and emit an actionable "run avocado update" error if it was
GC'd, warn + track head on a stale release/channel, degrade to head if the
feed serves no snapshots (snapshots-latest.json 404s). Honors repo CA / TLS.
- Wired into install (umbrella) + fetch + sdk/rootfs/runtime/ext/initramfs
install; fetch stays the reproducible metadata cache.
- avocado update: Cargo-style move-forward — advance the snapshot pin to newest
and clear package/kernel pins so the next install re-resolves + re-locks.
- Tests: v6->v7 migration, round-trip, clear-on-unlock, merge-adopts-disk-pin,
plus pure releasever/pin-status/url transforms.
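The releasever transform and the pin decision can be sketched as pure functions. `RepoSnapshot` appears in the text, but its fields here, the `PinDecision` enum, and both function names are hypothetical; the pre-flight check for a garbage-collected snapshot is left out.

```rust
/// Build the releasever segment; a pin injects `snapshots/<id>`.
fn releasever(release: &str, channel: &str, snapshot: Option<&str>) -> String {
    match snapshot {
        Some(id) => format!("{release}/{channel}/snapshots/{id}"),
        None => format!("{release}/{channel}"),
    }
}

/// Hypothetical shape of the per-target lock entry.
#[derive(Debug, Clone, PartialEq)]
struct RepoSnapshot {
    release: String,
    channel: String,
    id: String,
}

#[derive(Debug, PartialEq)]
enum PinDecision {
    Reuse(String),   // lock file pin matches the current release/channel
    AutoPin(String), // first fetch: pin to the channel's latest snapshot
    TrackHead,       // stale pin, or the feed serves no snapshots
}

fn decide_pin(
    locked: Option<&RepoSnapshot>,
    release: &str,
    channel: &str,
    latest: Option<&str>,
) -> PinDecision {
    match locked {
        Some(p) if p.release == release && p.channel == channel => PinDecision::Reuse(p.id.clone()),
        Some(_) => PinDecision::TrackHead,
        None => match latest {
            Some(id) => PinDecision::AutoPin(id.to_string()),
            None => PinDecision::TrackHead,
        },
    }
}
```

Since every baseurl is built from the same releasever value, swapping that one string is enough to move all sysroots onto the snapshot together.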
…repo URL
The snapshot resolver early-returned when distro.repo.url was unset, so projects relying on the baked default feed (no explicit repo.url) never recorded a repo-snapshot pin even though their dnf fetch hit that default. Fix by deriving the same default the container uses.
Single source of truth: add Config::DEFAULT_REPO_URL + Config::effective_repo_url() in config.rs. The snapshot resolver uses effective_repo_url(); the container env-builder always sets AVOCADO_SDK_REPO_URL from the same const, so the shell's duplicated literal default is removed (it just consumes the env now).
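A minimal sketch of the single-source-of-truth default. The `Config` struct is reduced to one hypothetical field, and the URL is a placeholder because the text does not state the real baked default.

```rust
/// Hypothetical config, reduced to the one field that matters here.
struct Config {
    repo_url: Option<String>,
}

impl Config {
    /// Placeholder value; the actual default feed URL is not given in the text.
    const DEFAULT_REPO_URL: &'static str = "https://repo.example.invalid";

    /// The explicit `distro.repo.url` if set, otherwise the baked default.
    fn effective_repo_url(&self) -> &str {
        self.repo_url.as_deref().unwrap_or(Self::DEFAULT_REPO_URL)
    }
}
```

Any caller that needs the feed URL, including the snapshot resolver and the container env-builder, would go through the same accessor and therefore agree on the default.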
Packaged extensions now nest their content under /<ext_name>/ and self-describe the layout via `Provides: avocado-ext-layout(nested)`. ext_fetch repoqueries that provide (repo metadata, no download) and installs nested packages into the SHARED $AVOCADO_PREFIX/includes installroot, so one rpmdb tracks every installed extension with no cross-extension file collisions. Legacy packages lacking the provide keep the per-extension installroot. Either way the final content lands at includes/<ext_name>/, so consumers are unchanged.
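The installroot choice can be sketched as a path computation. The function names are hypothetical, and the sketch assumes the legacy per-extension installroot is `includes/<ext_name>`, which the text implies but does not spell out.

```rust
use std::path::{Path, PathBuf};

/// The provide a nested package uses to describe its own layout.
const NESTED_PROVIDE: &str = "avocado-ext-layout(nested)";

/// Pick the dnf installroot from the package's provides (as a repoquery
/// would report them).
fn installroot(prefix: &Path, ext_name: &str, provides: &[&str]) -> PathBuf {
    let includes = prefix.join("includes");
    if provides.contains(&NESTED_PROVIDE) {
        includes // shared root; the package itself nests under /<ext_name>/
    } else {
        includes.join(ext_name) // legacy: per-extension root (assumed path)
    }
}

/// Where the extension's content ends up, regardless of layout.
fn content_dir(prefix: &Path, ext_name: &str) -> PathBuf {
    prefix.join("includes").join(ext_name)
}
```

Both layouts resolve to the same `content_dir`, which is why consumers do not need to know which kind of package was installed.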
Build-once publish of a packaged extension RPM to the feed, plus status and list of published versions. Adds the commands::connect::ext module and wires the ConnectExtCommands subcommands and dispatch in main.rs.
Add the avocado.yaml manifest plus compile/install/clean helper scripts that build avocado-cli into the avocado-ext-cli extension, and gitignore the transient /.cargo/ cross-compile config that avocado-cli-compile.sh writes during the build.
`{{ env.VAR }}` interpolation of an unset variable emitted its warning with a
raw eprintln!, which lands inside the TaskRenderer's live cursor region without
being counted in rendered_lines. The next redraw's MoveUp/Clear then cleared one
line too few and stranded a task line, showing as stacked "sdk bootstrap" spinner
lines during installs that fetch remote extensions (whose configs use
`{{ env.AVOCADO_EXT_VERSION }}`). Route it through print_warning, which is
suppressed while a TUI/JSON renderer is active and still prints in plain/CI runs.
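The suppression behaviour can be sketched with a process-wide flag. This is an illustration only: the flag, `set_renderer_active`, and `warning_line` are hypothetical, and the real `print_warning` may track renderer state differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set while a TUI/JSON renderer owns the terminal (hypothetical flag).
static RENDERER_ACTIVE: AtomicBool = AtomicBool::new(false);

fn set_renderer_active(active: bool) {
    RENDERER_ACTIVE.store(active, Ordering::SeqCst);
}

/// Returns the line to print, or `None` when a live renderer is active
/// and an extra line would corrupt its cursor bookkeeping.
fn warning_line(msg: &str) -> Option<String> {
    if RENDERER_ACTIVE.load(Ordering::SeqCst) {
        None
    } else {
        Some(format!("warning: {msg}"))
    }
}

/// Plain/CI runs still see the warning on stderr.
fn print_warning(msg: &str) {
    if let Some(line) = warning_line(msg) {
        eprintln!("{line}");
    }
}
```

The point is that every write to the terminal goes through something that knows about the renderer, so no line appears in the live region without being accounted for.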
runtime/deploy.rs referenced crate::utils::vm::qmp::QmpClient unconditionally, but the qmp module is `#[cfg(unix)]` (unix-socket transport). That broke the Windows `cargo check` (E0433: cannot find `qmp` in `vm`). Gate the port-forward setup and teardown behind cfg(unix) with a non-unix no-op; avocado-vm routing only occurs on unix hosts, so there is no behavior change on unix.
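The gating pattern looks roughly like this. The `qmp` module and `QmpClient` below are tiny stand-ins for the real ones, and `setup_port_forward` is a hypothetical name; the `#[cfg(unix)]` / `#[cfg(not(unix))]` pairing is the part being illustrated.

```rust
/// Stand-in for the unix-only QMP module (unix-socket transport).
#[cfg(unix)]
mod qmp {
    pub struct QmpClient;

    impl QmpClient {
        pub fn hostfwd_add(&self, port: u16) -> String {
            format!("hostfwd_add tcp:0.0.0.0:{port}-:{port}")
        }
    }
}

/// Unix: the only place that names `qmp`, so it compiles only where the
/// module exists.
#[cfg(unix)]
fn setup_port_forward(port: u16) -> Option<String> {
    Some(qmp::QmpClient.hostfwd_add(port))
}

/// Non-unix: same signature, no-op body, no reference to `qmp`.
#[cfg(not(unix))]
fn setup_port_forward(_port: u16) -> Option<String> {
    None
}
```

Callers use `setup_port_forward` unconditionally, which keeps the `cfg` attributes out of the deploy logic itself.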
QEMU's `-machine virt` doesn't emit `cpu-idle-states` device-tree bindings, so CONFIG_ARM_PSCI_CPUIDLE never binds and idle CPUs fall back to bare WFI. Under HVF that pattern bounces the vCPU thread through vmexit/vmenter instead of blocking on the WFI handler's pthread_cond_timedwait, costing ~80% host CPU per vCPU at guest idle.
On arm64 launches we now dump QEMU's auto-generated DTB once (via `-machine virt,dumpdtb=`), splice in `/idle-states/cpu-sleep-0` plus per-CPU `cpu-idle-states` properties, cache the patched copy under `~/.avocado/vm/dtb/` keyed by (smp, memory, qemu_version), and pass it back with `-dtb`. Subsequent launches hit the cache.
Measured on smp=8 idle: 670% -> 275-344% host CPU. State1 stays cosmetic on HVF (PSCI CPU_SUSPEND isn't deeper than WFI) but the framework binding alone fixes the vmexit-loop pattern.
Pure-Rust FDT v17 parse/serialize in fdt.rs; no external dtc dependency. Failures degrade gracefully to the previous auto-generated DTB path. `AVOCADO_VM_DTB` env var preserved as a debug override.
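As a rough illustration of the parsing side, the sketch below reads the fixed header of a flattened devicetree blob. It is not the code in fdt.rs: the struct and function names are hypothetical, and only a few header fields are kept. The magic value, the big-endian encoding, and the field offsets follow the Devicetree Specification.

```rust
/// Magic number at offset 0 of every flattened devicetree blob.
const FDT_MAGIC: u32 = 0xd00d_feed;

/// Hypothetical subset of the header fields.
#[derive(Debug, PartialEq)]
struct FdtHeader {
    totalsize: u32,
    off_dt_struct: u32,
    off_dt_strings: u32,
    version: u32,
    last_comp_version: u32,
}

/// Read a big-endian u32 at `off`, or `None` if the buffer is too short.
fn be32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

fn parse_header(buf: &[u8]) -> Result<FdtHeader, String> {
    if be32(buf, 0) != Some(FDT_MAGIC) {
        return Err("bad FDT magic".to_string());
    }
    let field = |off| be32(buf, off).ok_or_else(|| "truncated FDT header".to_string());
    let header = FdtHeader {
        totalsize: field(4)?,
        off_dt_struct: field(8)?,
        off_dt_strings: field(12)?,
        version: field(20)?,
        last_comp_version: field(24)?,
    };
    if header.version != 17 {
        return Err(format!("unsupported FDT version {}", header.version));
    }
    Ok(header)
}

/// Structure-block tokens and property values are padded to 4 bytes.
fn align4(pos: usize) -> usize {
    (pos + 3) & !3
}
```

Splicing in new nodes means rewriting the structure and strings blocks and then updating `totalsize` and the offsets, so a parser that validates the header first gives the graceful-fallback path something to key off.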
Adds a long-lived `avocado vm supervise` process spawned alongside
QEMU. Owns the user-facing SSH port and docker socket; QEMU's hostfwd
moves to a loopback-only internal port. The supervisor:
- Proxies inbound TCP to QEMU's internal hostfwd. On accept, sends
QMP `cont` if the VM is paused; SSH handshake then continues
against the freshly-resumed guest.
- Owns `~/.avocado/vm/docker.sock`. On accept, wakes the VM and
lazily spawns the ssh -L tunnel to /run/docker.sock in the guest
(cached for the awake-window, torn down on pause so QEMU can
sleep cleanly).
- Tracks active connections + idle timer. With no inbound activity
for `idle.hibernate_after_secs` (default 10s for testing), sends
QMP `stop`. Host CPU on QEMU drops to ~0% while RAM stays
resident. Any subsequent SSH or docker connection wakes it
transparently.
Cache key for the DTB also switches from `qemu --version` to the QEMU
binary mtime — saves ~300-500ms of subprocess overhead on every VM
start. Mtime naturally invalidates on `brew upgrade qemu`.
Known limitations (deferred):
- Docker forwarder lifecycle is now supervisor-owned when
hibernation is enabled (idle_after_secs > 0); legacy long-lived
forwarder still used when disabled to avoid regressing existing
non-hibernating setups.
- CPU hotplug for awake-but-idle floor: QMP `device_add` returns
"machine does not support hot-plugging CPUs" on QEMU 11 + HVF +
ARM virt. Defer; Linux CPU offline is a fallback path if needed
later.
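The pause/wake policy can be modelled as a small state machine, separate from any sockets. This is a sketch under assumptions: `IdleTracker` and `QmpAction` are hypothetical names, time is passed in explicitly so the logic is testable, and a zero timeout is taken to mean hibernation is disabled.

```rust
use std::time::{Duration, Instant};

/// What the supervisor should send over QMP, if anything.
#[derive(Debug, PartialEq)]
enum QmpAction {
    Nothing,
    Stop, // pause the VM
    Cont, // resume the VM
}

struct IdleTracker {
    active_connections: usize,
    last_activity: Instant,
    hibernate_after: Duration,
    paused: bool,
}

impl IdleTracker {
    fn new(now: Instant, hibernate_after: Duration) -> Self {
        Self { active_connections: 0, last_activity: now, hibernate_after, paused: false }
    }

    /// A new SSH or docker connection: wake the VM first if it is paused.
    fn on_accept(&mut self, now: Instant) -> QmpAction {
        self.active_connections += 1;
        self.last_activity = now;
        if self.paused {
            self.paused = false;
            QmpAction::Cont
        } else {
            QmpAction::Nothing
        }
    }

    fn on_close(&mut self, now: Instant) {
        self.active_connections = self.active_connections.saturating_sub(1);
        self.last_activity = now;
    }

    /// Periodic check: pause only when enabled, idle long enough, and no
    /// connection is still open.
    fn on_tick(&mut self, now: Instant) -> QmpAction {
        let idle = now.saturating_duration_since(self.last_activity);
        let enabled = !self.hibernate_after.is_zero();
        if enabled && !self.paused && self.active_connections == 0 && idle >= self.hibernate_after {
            self.paused = true;
            QmpAction::Stop
        } else {
            QmpAction::Nothing
        }
    }
}
```

Keeping open connections in the condition means a long-running but quiet SSH session would not be paused underneath the user in this model; whether the real supervisor counts byte-level activity as well is not stated in the text.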
The 10s default was useful while iterating on the supervisor — short enough to verify pause/wake every few minutes of testing. For real use, 10s pauses mid-SSH-session whenever the user pauses to think, which adds noticeable wake latency on every command. 60s is comfortable for normal interactive work while still freeing host CPU within a minute of stepping away. Users who want either extreme can override via `avocado vm config set idle.hibernate_after_secs N` or the `AVOCADO_VM_IDLE_HIBERNATE_SECS` env var.
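Resolution of the timeout can be sketched as a three-level fallback. The names `resolve_idle_after_secs` and `DEFAULT_IDLE_AFTER_SECS` appear in this PR, but the signature below is hypothetical, and the order (env var over config over default) is an assumption the text does not confirm.

```rust
/// New default: 60 seconds of inactivity before the VM is paused.
const DEFAULT_IDLE_AFTER_SECS: u64 = 60;

/// `env` is the raw value of `AVOCADO_VM_IDLE_HIBERNATE_SECS`, if set;
/// `config` is `idle.hibernate_after_secs` from the VM config, if set.
/// An unparseable env value is ignored rather than treated as an error.
fn resolve_idle_after_secs(env: Option<&str>, config: Option<u64>) -> u64 {
    env.and_then(|v| v.trim().parse::<u64>().ok())
        .or(config)
        .unwrap_or(DEFAULT_IDLE_AFTER_SECS)
}
```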
`cargo fmt --check` and `cargo clippy --all-targets --all-features -- -D warnings` were both failing on the just-merged supervisor and DTB changes. Auto-applies rustfmt and rewrites three `pos % 4 != 0` clippy::manual_is_multiple_of sites in fdt.rs to `!pos.is_multiple_of(4)`. No behavior changes.
The hibernation supervisor uses tokio's UnixListener/UnixStream for
the docker socket path and tokio::signal::unix for graceful shutdown,
neither of which exist on Windows. Without gating, `cargo check
--target x86_64-pc-windows-gnu` fails with E0432 (unresolved
UnixListener/UnixStream imports).
Gated unix-only:
- `pub mod supervisor` in utils/vm/mod.rs
- `pub mod supervise` in commands/vm/mod.rs
- `VmCommands::Supervise` variant + dispatch in main.rs
- `spawn_supervisor` / `stop_supervisor` / `resolve_idle_after_secs`
/ `DEFAULT_IDLE_AFTER_SECS` in lifecycle.rs
- The internal-port pick + ssh_port file write in `start`
On Windows the hibernation feature is unavailable: QEMU binds the
user-facing port directly (today's pre-supervisor behavior), the
legacy long-lived docker forwarder runs, and the VM never auto-pauses.
Integrates the `rel/0.41.0` line into `main`, rebased onto the latest `main` (picks up 0.40.1/0.40.2 and the connect org fallback; the duplicate node24/indent-fix/release commits were dropped as already-applied).
Features
- `connect ext` publish/status/list: super-admin build-once publish of a packaged extension RPM to the feed, plus status/list of published versions.
- Packaged extensions declare `Provides: avocado-ext-layout(nested)` and nest content under `/<ext_name>/`; `ext_fetch` installs them into the shared `includes` installroot (one rpmdb, no cross-extension collisions). Legacy packages keep the per-extension installroot.
- `avocado deploy` on macOS via VM port forwarding (+ design notes).
- `permissions:` block for rootfs/initramfs.
Fixes / hardening
- A raw `eprintln!` during `{{ env.VAR }}` interpolation landed inside the TaskRenderer's cursor region without being counted in `rendered_lines`, stranding task lines (stacked "sdk bootstrap" spinner lines) during installs that fetch remote extensions.
Build
- `avocado-ext-cli` extension (manifest + compile/install/clean scripts).
Notes
- `Cargo.toml` is still at `0.40.2`; no `release: 0.41.0` version bump is included in this PR.
- `cargo fmt --check`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo test` all pass on the rebased tip.