rel/0.41.0 → main #153

Open
mobileoverlord wants to merge 19 commits into main from rel/0.41.0

@mobileoverlord
Contributor

Integrates the rel/0.41.0 line into main, rebased onto the latest main (picks up 0.40.1/0.40.2 and the connect org fallback; the duplicate node24/indent-fix/release commits were dropped as already-applied).

Features

  • feat(connect): connect ext publish/status/list — super-admin build-once publish of a packaged extension RPM to the feed, plus status/list of published versions.
  • feat(ext): nested extension layout — packaged extensions self-describe Provides: avocado-ext-layout(nested) and nest content under /<ext_name>/; ext_fetch installs them into the shared includes installroot (one rpmdb, no cross-extension collisions). Legacy packages keep the per-extension installroot.
  • feat(snapshots): reproducible channel snapshot pinning — lock-file channel snapshots, auto-pinned against the default feed; repo URL single-sourced.
  • deploy: avocado deploy on macOS via VM port forwarding (+ design notes).
  • config: top-level permissions: block for rootfs/initramfs.

Fixes / hardening

  • fix(tui): route the unset-env-var warning through the output module — a raw eprintln! during {{ env.VAR }} interpolation landed inside the TaskRenderer's cursor region without being counted in rendered_lines, stranding task lines (stacked "sdk bootstrap" spinner lines) during installs that fetch remote extensions.
  • repo TLS: custom CA + insecure mode across all dnf phases.
  • runtime build: fall back to default rootfs/initramfs for permissions resolution.
  • stamps: split input hashes per build step to fix over-invalidation.

Build

  • build(ext): package avocado-cli itself as the avocado-ext-cli extension (manifest + compile/install/clean scripts).

Notes

  • Cargo.toml is still at 0.40.2; no `release: 0.41.0` version-bump commit is included in this PR.
  • Local verification: cargo fmt --check, cargo clippy --all-targets --all-features -- -D warnings, and cargo test all pass on the rebased tip.

mobileoverlord and others added 12 commits June 2, 2026 10:58
Users and groups are now declared in a top-level `permissions:` map and
referenced by name from `rootfs.<name>.permissions` / `initramfs.<name>.permissions`
(or inlined), instead of buried inside one extension. This puts identity
provisioning at the image layer where a single coherent passwd/shadow/group
makes sense, lets the same block be reused across rootfs and initramfs,
and leaves room to grow into directory perms or sudoers without further
grammar churn.

When no `permissions:` is set on an image, no script section is emitted —
the base packages' generic /etc/passwd/shadow/group are left untouched.

Extensions that still declare `users:` / `groups:` continue to work but
emit a deprecation warning; that path will be removed in a future release.

The script generator was extracted from ext/build.rs into a shared
`utils::permissions::render_users_groups_script` helper so the legacy
extension path and the new rootfs/initramfs path share one implementation.

Previously, ext install/build/image all shared a single
`compute_ext_input_hash` and runtime install/build shared a single
`compute_runtime_input_hash`. Editing a field that only affects the
build (e.g. ext `image:` kabtool args, `var_files:`, runtime `var:`,
runtime `post_build:`) invalidated the install stamp too, which cascaded
into the install step being re-run via the dependency chain.

Per-step hash functions now cover exactly what each step uses:

  ext install -> packages, types, source
  ext build   -> install inputs + image, overlay, post_build (path + content)
  ext image   -> build inputs + var_files, subvolumes, filesystem
  runtime install -> packages, target
  runtime build   -> install inputs + narrowed kernel, var, var_files,
                     post_build (path + content), rootfs/initramfs filesystem,
                     ext docker_images
  sdk install     -> sdk.packages/image/repo_url/repo_release (no longer
                     includes rootfs/initramfs.packages — those have their
                     own install stamps)
  rootfs install  -> rootfs.packages, rootfs.overlay, narrowed kernel,
                     post_install (path + content)
  initramfs install -> same shape as rootfs

The `kernel:` block is now hashed via a narrow {package, version, compile,
install} mapping at every call site, so cosmetic edits (metadata, new
fields) don't invalidate stamps that don't actually consume them. The
`post_build` / `post_install` hooks now hash script *contents* in addition
to the path, so editing the script body invalidates the stamp without
`--no-stamps`.

`validate_stamps_batch` now accepts a slice of (component, command,
hash) triples so each requirement is compared against the matching
step's hash instead of one shared hash applied to all stamps for a
component.

STAMP_VERSION bumped 1 -> 2; older stamps invalidate on first run after
upgrade, then the new narrower hashes apply going forward.

Adds 14 negative-invalidation tests locking the new shape in place ("X
must NOT invalidate Y" for each step+field pair we untangled).
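
The per-step split described above can be sketched in a few lines. This is a minimal illustration, not the project's code: `ExtConfig`, its three fields, and the `install_hash` / `build_hash` / `image_hash` / `stale_steps` names are hypothetical stand-ins, and `DefaultHasher` is used only to keep the sketch dependency-free. Each step hashes the previous step's hash plus the fields it alone consumes, and each stored stamp is compared against the hash of its own step.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, heavily simplified extension config.
#[derive(Clone)]
pub struct ExtConfig {
    pub packages: Vec<String>,  // consumed by the install step
    pub image_args: String,     // consumed by the build step only
    pub var_files: Vec<String>, // consumed by the image step only
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Install covers only what install uses.
pub fn install_hash(c: &ExtConfig) -> u64 {
    hash_of(&c.packages)
}

/// Build = install inputs + build-only fields.
pub fn build_hash(c: &ExtConfig) -> u64 {
    hash_of(&(install_hash(c), &c.image_args))
}

/// Image = build inputs + image-only fields.
pub fn image_hash(c: &ExtConfig) -> u64 {
    hash_of(&(build_hash(c), &c.var_files))
}

/// Returns the steps whose stored stamp is missing or does not match the
/// current hash of that same step (one hash per step, not one per component).
pub fn stale_steps<'a>(stored: &[(&str, u64)], current: &[(&'a str, u64)]) -> Vec<&'a str> {
    current
        .iter()
        .filter(|(step, hash)| !stored.iter().any(|(s, h)| s == step && h == hash))
        .map(|(step, _)| *step)
        .collect()
}
```

With this shape, changing only `image_args` leaves `install_hash` untouched, so `stale_steps` reports the build step but not the install step.
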
…resolution

When a runtime has no explicit `rootfs:` / `initramfs:` ref (the common
case for projects that define images at the top level), the resolver
returned None and the permissions section came out empty, so the root
user's shadow entry never got rewritten and root login was silently
broken on the resulting image.

Fix: in runtime/build.rs, fall back to `config.rootfs_default()` /
`config.initramfs_default()` when the runtime-level ref is unset, same
fallback the image build itself uses for filesystem/post_install.

Adds a regression test in `utils::config::tests` that mirrors the test
project shape (top-level rootfs/initramfs with `permissions: dev`,
runtime declares no rootfs/initramfs of its own) and asserts the
fallback path picks up the permissions block.

Verified end-to-end: after rebuild, the rootfs erofs image's
/etc/shadow now carries `root::19000:...` (empty password) instead of
the inherited `root:*:...` from the sysroot.
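
The fallback can be pictured roughly as follows. This is a sketch under assumptions: `Config`, `Image`, `Runtime`, `default_rootfs`, and `resolve_permissions` are hypothetical simplifications, not the real types behind `config.rootfs_default()`. The point is only that an unset runtime-level ref falls through to the default image instead of yielding `None`.

```rust
use std::collections::HashMap;

/// Hypothetical image definition carrying an optional permissions ref.
pub struct Image {
    pub permissions: Option<String>,
}

/// Hypothetical project config: named images plus the name of the default one.
pub struct Config {
    pub rootfs: HashMap<String, Image>,
    pub default_rootfs: Option<String>,
}

/// Hypothetical runtime that may or may not name its own rootfs.
pub struct Runtime {
    pub rootfs: Option<String>,
}

/// Resolve the permissions ref for a runtime: prefer the runtime-level
/// rootfs, otherwise fall back to the project default.
pub fn resolve_permissions<'a>(config: &'a Config, runtime: &Runtime) -> Option<&'a str> {
    let name = runtime
        .rootfs
        .as_deref()
        .or(config.default_rootfs.as_deref())?;
    config.rootfs.get(name)?.permissions.as_deref()
}
```

A runtime with `rootfs: None` now resolves through the default image's `permissions` entry, which is the case the regression test mirrors.
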
Investigate why `avocado runtime deploy` fails on macOS and design the
fix. Root cause: the deploy script runs inside the SDK container, which
runs inside the slirp-NAT'd avocado-vm, so the TUF repo HTTP server
(:8585) it starts is unreachable by the target device, and the script's
host-IP autodetect returns container/VM addresses.

Plan: a per-deploy QMP hostfwd (bound 0.0.0.0, opened only during
deploy) + publishing the container repo port to the VM + setting
AVOCADO_DEPLOY_REPO_HOST to the macOS LAN IP (get_local_ip_for_remote),
surfaced as a reusable `avocado vm port-forward` primitive. No desktop
change — the CLI owns the qemu lifecycle.

Plan only; no behavior change yet.

On macOS the deploy container runs inside the slirp-NAT'd avocado-vm, so
the TUF repo HTTP server it starts (:8585) was unreachable by the target
device and the in-container host-IP autodetect returned VM-internal
addresses. Bridge the device->repo path:

- qmp: add human_monitor_command + hostfwd_add/hostfwd_remove (runtime
  slirp port forwarding via the QEMU monitor), with unit tests.
- deploy: on macOS/Windows (is_docker_desktop), set AVOCADO_DEPLOY_REPO_HOST
  to this host's LAN IP and publish the repo port; on the avocado-vm
  (is_vm_routing_active) also open a `hostfwd 0.0.0.0:PORT->guest:PORT`
  for the deploy and tear it down afterward. Skip `-p` when the SDK
  container uses host networking (docker discards it and the hostfwd
  already reaches the VM-bound port). Linux (native docker) is untouched.

Validated end-to-end to a LAN Raspberry Pi 4: device fetched the repo
metadata over the forward (HTTP 200). See
docs/features/macos-deploy-port-forwarding.md.
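
For readers unfamiliar with the monitor side, here is a rough sketch of the messages involved. QEMU's QMP exposes `human-monitor-command`, and the human monitor accepts `hostfwd_add` / `hostfwd_remove`; the Rust function names below are hypothetical and are not the PR's `qmp` module API. The sketch assumes the VM uses the default user-mode netdev (a specific netdev id may have to precede the rule otherwise) and that the host and guest ports are the same.

```rust
/// HMP command line forwarding `bind_addr:port` on the host to the same
/// TCP port in the guest (guest address left empty = default guest).
pub fn hostfwd_add_command(bind_addr: &str, port: u16) -> String {
    format!("hostfwd_add tcp:{bind_addr}:{port}-:{port}")
}

/// HMP command line removing the forward created above.
pub fn hostfwd_remove_command(bind_addr: &str, port: u16) -> String {
    format!("hostfwd_remove tcp:{bind_addr}:{port}")
}

/// Wrap an HMP command line in a QMP `human-monitor-command` request.
/// Only backslashes and double quotes are escaped, which is enough for
/// the simple command lines produced above.
pub fn human_monitor_command(command_line: &str) -> String {
    let escaped = command_line.replace('\\', "\\\\").replace('"', "\\\"");
    format!(
        r#"{{"execute":"human-monitor-command","arguments":{{"command-line":"{escaped}"}}}}"#
    )
}
```

A deploy would send the `hostfwd_add` request before starting the repo server and the matching `hostfwd_remove` request afterward, so the port is open only for the duration of the deploy.
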
Lets the SDK trust a self-signed / private-CA package endpoint (e.g. an internal Pulp behind
package-ca). Centralized so it covers EVERY dnf invocation - sdk bootstrap, sdk packages, ext,
runtime, rootfs, initramfs, and the per-module 'dnf' subcommands, host AND target repo confs:

- config: distro.repo.ca + distro.repo.tls_verify; resolvers get_repo_ca()/get_repo_insecure()
  (env AVOCADO_REPO_CA / AVOCADO_REPO_INSECURE win over config). promote_repo_tls_env() pushes
  config values to the process env at load so the container env-builders pick them up uniformly.
- container: inject_repo_tls_env() adds AVOCADO_REPO_CA_B64 (base64 of the CA file) +
  AVOCADO_REPO_INSECURE to the container env at the env-builder chokepoints. REPO_TLS_SETUP_SNIPPET
  appends the CA to the SDK trust bundle (which SSL_CERT_FILE/CURL_CA_BUNDLE and every explicit
  sslcacert point at) and, for insecure, adds --setopt=sslverify=0 to DNF_SDK_HOST (base of every
  dnf call). Emitted by both entrypoint generators.
- sdk bootstrap: snippet appended to the bootstrap command so the FIRST dnf (target pkg from
  sdk/all) is covered too.
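
The "env wins over config" precedence can be sketched as pure functions. The names and the accepted truthy values (`1`, `true`) below are assumptions made for illustration; the real `get_repo_ca()` / `get_repo_insecure()` resolvers may parse their inputs differently. Only the `--setopt=sslverify=0` flag is taken from the description above.

```rust
/// Resolve the CA path: a non-empty env value (AVOCADO_REPO_CA) wins over
/// the config value (distro.repo.ca).
pub fn resolve_repo_ca(env_value: Option<&str>, config_value: Option<&str>) -> Option<String> {
    env_value
        .filter(|v| !v.is_empty())
        .or(config_value)
        .map(str::to_owned)
}

/// Resolve insecure mode: if the env var (AVOCADO_REPO_INSECURE) is set it
/// decides; otherwise `tls_verify: false` in config turns verification off.
/// Treating "1" and "true" as truthy is an assumption of this sketch.
pub fn resolve_repo_insecure(env_value: Option<&str>, config_tls_verify: Option<bool>) -> bool {
    match env_value {
        Some(v) => matches!(v, "1" | "true"),
        None => config_tls_verify == Some(false),
    }
}

/// Extra dnf arguments implied by insecure mode.
pub fn dnf_extra_args(insecure: bool) -> Vec<&'static str> {
    if insecure {
        vec!["--setopt=sslverify=0"]
    } else {
        Vec::new()
    }
}
```

Keeping the resolvers free of direct `std::env` access makes the precedence easy to unit test; the caller passes in whatever it read from the environment.
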
Pin each target to an immutable point-in-time snapshot of its feed channel
so a clean + rebuild reproduces exactly, even after the live channel head
advances or evicts the NEVRAs the lock file references.

Mechanism: every dnf baseurl is ${repo_url}/$releasever/... with releasever
= {release}/{channel}; pinning injects one segment -> {release}/{channel}/
snapshots/<id>, exposed via AVOCADO_RELEASEVER (which get_releasever() already
honors first), so all sysroots freeze together with no per-call-site plumbing.
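
As a sketch of that transform (function names are hypothetical, and the snapshot id format is not specified here, so the id is treated as an opaque string):

```rust
/// Build the releasever segment: `{release}/{channel}` when tracking head,
/// `{release}/{channel}/snapshots/<id>` when pinned.
pub fn releasever(release: &str, channel: &str, snapshot_id: Option<&str>) -> String {
    match snapshot_id {
        Some(id) => format!("{release}/{channel}/snapshots/{id}"),
        None => format!("{release}/{channel}"),
    }
}

/// Expand a dnf-style baseurl of the form `${repo_url}/$releasever/<suffix>`.
pub fn baseurl(repo_url: &str, releasever: &str, suffix: &str) -> String {
    format!("{}/{}/{}", repo_url.trim_end_matches('/'), releasever, suffix)
}
```

Because every baseurl goes through the same releasever value, exporting the pinned string once (via `AVOCADO_RELEASEVER`) moves all repos to the snapshot together.
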

- Lock file v7: per-target `repo-snapshot` (RepoSnapshot). Additive — v6 reads
  as v7 with no pin (= track head), fully backward-compatible. merge adopts a
  disk pin when the writer has none; unlock (clear_all) drops it.
- utils/snapshot.rs: resolve-and-apply runs once per command — reuse a matching
  pin, auto-pin to the channel's latest snapshot on first fetch, pre-flight a
  pinned snapshot and emit an actionable "run avocado update" error if it was
  GC'd, warn + track head on a stale release/channel, degrade to head if the
  feed serves no snapshots (snapshots-latest.json 404s). Honors repo CA / TLS.
- Wired into install (umbrella) + fetch + sdk/rootfs/runtime/ext/initramfs
  install; fetch stays the reproducible metadata cache.
- avocado update: Cargo-style move-forward — advance the snapshot pin to newest
  and clear package/kernel pins so the next install re-resolves + re-locks.
- Tests: v6->v7 migration, round-trip, clear-on-unlock, merge-adopts-disk-pin,
  plus pure releasever/pin-status/url transforms.
…repo URL

The snapshot resolver early-returned when distro.repo.url was unset, so projects
relying on the baked default feed (no explicit repo.url) never recorded a
repo-snapshot pin even though their dnf fetch hit that default. Fix by deriving
the same default the container uses.

Single source of truth: add Config::DEFAULT_REPO_URL + Config::effective_repo_url()
in config.rs. The snapshot resolver uses effective_repo_url(); the container
env-builder always sets AVOCADO_SDK_REPO_URL from the same const, so the shell's
duplicated literal default is removed (it just consumes the env now).

Packaged extensions now nest their content under /<ext_name>/ and self-describe
the layout via `Provides: avocado-ext-layout(nested)`. ext_fetch repoqueries that
provide (repo metadata, no download) and installs nested packages into the SHARED
$AVOCADO_PREFIX/includes installroot, so one rpmdb tracks every installed
extension with no cross-extension file collisions. Legacy packages lacking the
provide keep the per-extension installroot. Either way the final content lands at
includes/<ext_name>/, so consumers are unchanged.
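
A small sketch of the installroot decision, under assumptions: the function names are hypothetical, the provides list is passed in as plain strings rather than obtained from a repoquery, and the legacy per-extension installroot is assumed to be `includes/<ext_name>` (inferred from the statement that the final content lands there either way).

```rust
use std::path::{Path, PathBuf};

/// Provide string that nested-layout packages advertise.
pub const NESTED_LAYOUT_PROVIDE: &str = "avocado-ext-layout(nested)";

/// Pick the installroot for an extension package based on its provides.
pub fn installroot(prefix: &Path, ext_name: &str, provides: &[&str]) -> PathBuf {
    let includes = prefix.join("includes");
    if provides.contains(&NESTED_LAYOUT_PROVIDE) {
        // Nested packages carry /<ext_name>/ themselves, so they share one root.
        includes
    } else {
        // Legacy packages get their own root (assumed to be includes/<ext_name>).
        includes.join(ext_name)
    }
}

/// Where consumers find the extension's content, regardless of layout.
pub fn content_dir(prefix: &Path, ext_name: &str) -> PathBuf {
    prefix.join("includes").join(ext_name)
}
```

Both branches leave the content under the same `content_dir`, which is why consumers do not need to know which layout a package used.
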
Build-once publish of a packaged extension RPM to the feed, plus status and
list of published versions. Adds the commands::connect::ext module and wires
the ConnectExtCommands subcommands and dispatch in main.rs.

Add the avocado.yaml manifest plus compile/install/clean helper scripts that
build avocado-cli into the avocado-ext-cli extension, and gitignore the
transient /.cargo/ cross-compile config that avocado-cli-compile.sh writes
during the build.

`{{ env.VAR }}` interpolation of an unset variable emitted its warning with a
raw eprintln!, which lands inside the TaskRenderer's live cursor region without
being counted in rendered_lines. The next redraw's MoveUp/Clear then cleared one
line too few and stranded a task line, showing as stacked "sdk bootstrap" spinner
lines during installs that fetch remote extensions (whose configs use
`{{ env.AVOCADO_EXT_VERSION }}`). Route it through print_warning, which is
suppressed while a TUI/JSON renderer is active and still prints in plain/CI runs.
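
The accounting problem can be modelled without a terminal. The following is a toy model, not the real TaskRenderer: `CursorRegion`, `raw_line`, and this `print_warning` signature are hypothetical. It only tracks how many lines the renderer believes it owns versus how many were actually written below its anchor.

```rust
/// Toy model of a live cursor region.
#[derive(Default)]
pub struct CursorRegion {
    rendered_lines: usize, // lines the renderer counted
    actual_lines: usize,   // lines really written since the last redraw
    stranded: usize,       // stale lines left behind by redraws
}

impl CursorRegion {
    /// Redraw: moving up and clearing covers only `rendered_lines`, so
    /// anything written without being counted is left on screen.
    pub fn redraw(&mut self, task_lines: usize) {
        self.stranded += self.actual_lines - self.rendered_lines;
        self.rendered_lines = task_lines;
        self.actual_lines = task_lines;
    }

    /// An uncounted write into the region, like a raw `eprintln!`.
    pub fn raw_line(&mut self) {
        self.actual_lines += 1;
    }

    pub fn stranded(&self) -> usize {
        self.stranded
    }
}

/// Warning routed through the output layer: dropped while a renderer is
/// active, returned for printing in plain/CI runs.
pub fn print_warning(renderer_active: bool, message: &str) -> Option<String> {
    if renderer_active {
        None
    } else {
        Some(format!("warning: {message}"))
    }
}
```

In this model one `raw_line` between two redraws leaves exactly one stranded line, while a warning suppressed by `print_warning` leaves none.
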
Comment thread src/utils/permissions.rs Dismissed
runtime/deploy.rs referenced crate::utils::vm::qmp::QmpClient unconditionally,
but the qmp module is `#[cfg(unix)]` (unix-socket transport). That broke the
Windows `cargo check` (E0433: cannot find `qmp` in `vm`). Gate the port-forward
setup and teardown behind cfg(unix) with a non-unix no-op; avocado-vm routing
only occurs on unix hosts, so there is no behavior change on unix.
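
The gating pattern looks roughly like this. It is a sketch only: `open_port_forward` and its return type are hypothetical, and the unix branch just returns the command it would send instead of talking to a QMP socket.

```rust
/// Unix: the real code would reach QMP over a unix socket here. This sketch
/// returns the monitor command it would issue.
#[cfg(unix)]
pub fn open_port_forward(port: u16) -> Result<Option<String>, String> {
    Ok(Some(format!("hostfwd_add tcp:0.0.0.0:{port}-:{port}")))
}

/// Non-unix: same signature, no-op body, so callers compile unchanged and
/// never reference the unix-only module.
#[cfg(not(unix))]
pub fn open_port_forward(_port: u16) -> Result<Option<String>, String> {
    Ok(None)
}
```

Giving both variants the same signature keeps the `cfg` attributes at the definition site rather than scattering them across every call site.
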
QEMU's `-machine virt` doesn't emit `cpu-idle-states` device-tree
bindings, so CONFIG_ARM_PSCI_CPUIDLE never binds and idle CPUs fall
back to bare WFI. Under HVF that pattern bounces the vCPU thread
through vmexit/vmenter instead of blocking on the WFI handler's
pthread_cond_timedwait, costing ~80% host CPU per vCPU at guest idle.

On arm64 launches we now dump QEMU's auto-generated DTB once (via
`-machine virt,dumpdtb=`), splice in `/idle-states/cpu-sleep-0` plus
per-CPU `cpu-idle-states` properties, cache the patched copy under
`~/.avocado/vm/dtb/` keyed by (smp, memory, qemu_version), and pass
it back with `-dtb`. Subsequent launches hit the cache.

Measured on smp=8 idle: 670% -> 275-344% host CPU. State1 stays
cosmetic on HVF (PSCI CPU_SUSPEND isn't deeper than WFI) but the
framework binding alone fixes the vmexit-loop pattern.

Pure-Rust FDT v17 parse/serialize in fdt.rs; no external dtc
dependency. Failures degrade gracefully to the previous auto-generated
DTB path. `AVOCADO_VM_DTB` env var preserved as a debug override.
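
One way to picture the cache keying: the file-name scheme and function name below are invented for illustration; only the key components (smp, memory, qemu_version) and the cache directory come from the description above.

```rust
use std::path::{Path, PathBuf};

/// Build a cache file path for a patched DTB keyed by (smp, memory, qemu_version).
/// The `virt-smp..-mem..-qemu...dtb` naming is a hypothetical scheme.
pub fn dtb_cache_path(cache_dir: &Path, smp: u32, memory_mb: u64, qemu_version: &str) -> PathBuf {
    // Keep the version component filesystem-safe.
    let safe_version: String = qemu_version
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '.' { c } else { '_' })
        .collect();
    cache_dir.join(format!("virt-smp{smp}-mem{memory_mb}-qemu{safe_version}.dtb"))
}
```

Any change to one of the three key components yields a different path, so a stale DTB is never reused for a different machine shape or QEMU build.
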
Adds a long-lived `avocado vm supervise` process spawned alongside
QEMU. Owns the user-facing SSH port and docker socket; QEMU's hostfwd
moves to a loopback-only internal port. The supervisor:

  - Proxies inbound TCP to QEMU's internal hostfwd. On accept, sends
    QMP `cont` if the VM is paused; SSH handshake then continues
    against the freshly-resumed guest.
  - Owns `~/.avocado/vm/docker.sock`. On accept, wakes the VM and
    lazily spawns the ssh -L tunnel to /run/docker.sock in the guest
    (cached for the awake-window, torn down on pause so QEMU can
    sleep cleanly).
  - Tracks active connections + idle timer. With no inbound activity
    for `idle.hibernate_after_secs` (default 10s for testing), sends
    QMP `stop`. Host CPU on QEMU drops to ~0% while RAM stays
    resident. Any subsequent SSH or docker connection wakes it
    transparently.

Cache key for the DTB also switches from `qemu --version` to the QEMU
binary mtime — saves ~300-500ms of subprocess overhead on every VM
start. Mtime naturally invalidates on `brew upgrade qemu`.

Known limitations (deferred):
  - Docker forwarder lifecycle is now supervisor-owned when
    hibernation is enabled (idle_after_secs > 0); legacy long-lived
    forwarder still used when disabled to avoid regressing existing
    non-hibernating setups.
  - CPU hotplug for awake-but-idle floor: QMP `device_add` returns
    "machine does not support hot-plugging CPUs" on QEMU 11 + HVF +
    ARM virt. Defer; Linux CPU offline is a fallback path if needed
    later.
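
The pause/wake decision can be sketched as a small state machine, separate from any sockets or QMP plumbing. `IdleTracker` and `Action` are hypothetical names, and the sketch assumes the VM is paused only when there are no active connections and the idle period has elapsed since the last accept or close.

```rust
use std::time::{Duration, Instant};

/// What the supervisor should send over QMP, if anything.
#[derive(Debug, PartialEq)]
pub enum Action {
    None,
    Cont, // resume a paused VM
    Stop, // pause an idle VM
}

pub struct IdleTracker {
    active: usize,
    last_activity: Instant,
    idle_after: Duration,
    paused: bool,
}

impl IdleTracker {
    pub fn new(now: Instant, idle_after: Duration) -> Self {
        Self { active: 0, last_activity: now, idle_after, paused: false }
    }

    /// A new inbound connection: wake the VM first if it is paused.
    pub fn on_accept(&mut self, now: Instant) -> Action {
        self.active += 1;
        self.last_activity = now;
        if self.paused {
            self.paused = false;
            Action::Cont
        } else {
            Action::None
        }
    }

    /// A connection closed: the idle clock restarts from here.
    pub fn on_close(&mut self, now: Instant) {
        self.active = self.active.saturating_sub(1);
        self.last_activity = now;
    }

    /// Periodic check: pause once nothing is connected and the timeout passed.
    pub fn on_tick(&mut self, now: Instant) -> Action {
        let idle = now.saturating_duration_since(self.last_activity);
        if !self.paused && self.active == 0 && idle >= self.idle_after {
            self.paused = true;
            Action::Stop
        } else {
            Action::None
        }
    }
}
```

Passing `now` in explicitly keeps the tracker deterministic, so the pause and wake transitions can be tested without sleeping.
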
The 10s default was useful while iterating on the supervisor — short
enough to verify pause/wake every few minutes of testing. For real
use, 10s pauses mid-SSH-session whenever the user pauses to think,
which adds noticeable wake latency on every command. 60s is
comfortable for normal interactive work while still freeing host CPU
within a minute of stepping away. Users who want either extreme can
override via `avocado vm config set idle.hibernate_after_secs N` or
the `AVOCADO_VM_IDLE_HIBERNATE_SECS` env var.

`cargo fmt --check` and `cargo clippy --all-targets --all-features
-- -D warnings` were both failing on the just-merged supervisor and
DTB changes. Auto-applies rustfmt and rewrites three `pos % 4 != 0`
clippy::manual_is_multiple_of sites in fdt.rs to `!pos.is_multiple_of(4)`.

No behavior changes.

The hibernation supervisor uses tokio's UnixListener/UnixStream for
the docker socket path and tokio::signal::unix for graceful shutdown,
neither of which exist on Windows. Without gating, `cargo check
--target x86_64-pc-windows-gnu` fails with E0432 (unresolved
UnixListener/UnixStream imports).

Gated unix-only:
  - `pub mod supervisor` in utils/vm/mod.rs
  - `pub mod supervise` in commands/vm/mod.rs
  - `VmCommands::Supervise` variant + dispatch in main.rs
  - `spawn_supervisor` / `stop_supervisor` / `resolve_idle_after_secs`
    / `DEFAULT_IDLE_AFTER_SECS` in lifecycle.rs
  - The internal-port pick + ssh_port file write in `start`

On Windows the hibernation feature is unavailable: QEMU binds the
user-facing port directly (today's pre-supervisor behavior), the
legacy long-lived docker forwarder runs, and the VM never auto-pauses.