Skip to content

Improve Apple container compose compatibility#119

Merged
Cyb3rDudu merged 7 commits into
Mcrich23:mainfrom
thromel:fix/apple-container-compose-compat
Jul 1, 2026
Merged

Improve Apple container compose compatibility#119
Cyb3rDudu merged 7 commits into
Mcrich23:mainfrom
thromel:fix/apple-container-compose-compat

Conversation

@thromel

@thromel thromel commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

This improves compatibility with Apple container for several Compose workflows:

  • decode depends_on map form and preserve dependency conditions
  • wait for service_started, service_healthy, and service_completed_successfully
  • parse service networks object form, including aliases
  • emit Apple network alias properties when the linked Apple container command parser supports them, while preserving a warning/skip fallback for Apple Container 1.0.0
  • warn and skip Compose hostname: because Apple container run 1.0.0 does not currently expose a hostname flag; proposed upstream support remains under discussion in Add container run hostname flag apple/container#1811
  • execute supported healthchecks (CMD and CMD-SHELL) with interval/start-period/retry handling
  • create and mount named volumes through native container volume support instead of rewriting them into host paths
  • surface failed detached container run commands as errors, which makes one-shot failures fail compose up

Related issues: #52, #115, #116, #117, and tracker #118.

Apple-side service discovery work is tracked upstream in apple/container#1809. Current upstream PR chain:

Testing

Static/build checks:

git diff --check
swift build
swift test --filter EntrypointCommandTests
swift test --filter Container_Compose_StaticTests --skip WaitForeverCpuTests

The static test command passed 141 tests. WaitForeverCpuTests was skipped because the existing CPU-threshold test is noisy on this machine and unrelated to these changes.

Manual Apple container smoke tests using the locally built Container-Compose binary:

  • one-shot success dependency completes and unblocks dependent startup
  • one-shot failure exits nonzero and fails up -d
  • native named volume persists data and can be read from a separate container run
  • service network object form attaches to the requested network and emits an alias fallback warning against Apple Container 1.0.0
  • passing healthcheck unblocks dependent service startup
  • failing healthcheck prevents dependent service startup
  • service with hostname: emits a warning, does not pass unsupported --hostname, and completes successfully (HOSTNAME_OK logged)

All dynamic test containers, networks, and volumes were cleaned up after the runs.

@thromel thromel force-pushed the fix/apple-container-compose-compat branch from b0b30a4 to 1ee22f2 Compare June 25, 2026 04:36
@thromel thromel marked this pull request as ready for review June 25, 2026 05:58
@thromel thromel changed the title [codex] Improve Apple container compose compatibility Improve Apple container compose compatibility Jun 25, 2026
@thromel thromel force-pushed the fix/apple-container-compose-compat branch from c46f218 to d862ae7 Compare June 26, 2026 19:45
@Cyb3rDudu Cyb3rDudu self-requested a review June 27, 2026 03:23
@Cyb3rDudu Cyb3rDudu self-assigned this Jun 27, 2026

@Cyb3rDudu Cyb3rDudu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR - this meaningfully closes the gap with Apple container, and the parsing/arg-building test coverage is solid. I'd like to merge this, but I think two items should be addressed first since they're regressions on existing workflows rather than missing features. The rest can be follow-ups.

Blockers (before merge)

  1. up now hard-fails on transitive dependencies
    Sources/Container-Compose/Commands/ComposeUp.swift:160

The service filter only keeps a service if it's requested or a direct dependent (via dependedBy, which topoSortConfiguredServices populates one level deep). For a chain c → b → a, compose up c keeps c and b but drops a. The new waitForDependencyConditions then throws dependencyNotStarted("b", "a"). Before this PR the missing grandparent was tolerated, so this is a regression on a core workflow.

Fix: expand the filter to keep the full transitive dependency closure of the requested services (walk depends_on recursively), or have service_started start a missing dependency on demand instead of throwing.

  1. up -d silently reports success when a long-running service crashes
    Sources/Container-Compose/Commands/ComposeUp.swift:336

container run -d returns exit 0 on backgrounding, so the guard at :726 passes. If the container then crashes within the poll window, it reaches .stopped and waitUntilServiceStarted records .completed — the container's real exit code is never inspected. This undermines the PR's own goal of surfacing failed detached runs. A db that exits non-zero ~1s after start will print success and launch dependents against a dead service.

Fix: for a .stopped long-running service, inspect the container's actual exit status rather than assuming success.

Follow-ups (file as issues, don't block)

  • retries is total attempts, not consecutive failures (ComposeUp.swift:797) — a healthy-but-slow service (e.g. a DB needing ~45s) is declared unhealthy. Compose treats retries as consecutive failures.
  • healthcheck.timeout is parsed but never applied (ComposeUp.swift:806) — a hanging probe blocks streamCommand indefinitely instead of being bounded.
  • service_completed_successfully race in attached mode (ComposeUp.swift:730) — the detached Task + immediate poll can sample a one-shot as .running before it exits, breaking its dependents. Timing-dependent; worth a known-limitation note.

Minor (cleanup)

  • supportsNetworkAliases() is re-evaluated on every networkRunArg call (:295) — cache it in a static let.
  • Healthcheck.parseDuration recompiles the regex on every call (Healthcheck.swift:58) — hoist to static let.
  • The detach/attached branches duplicate handleOutput, the start banner, and the streamCommand call (:716) — extract the shared part and branch only on error handling.

Happy to help with any of these. The core work here is solid - just want to land it without the two regressions.

@Cyb3rDudu

Copy link
Copy Markdown
Collaborator

Separate from the items in the review above - wanted to flag one thing on the volume handling specifically.

The native-volume switch is the right call, and I like that it respects name / external / driver / labels and replaces the old createVolumeHardLink host-dir dance. One gap though: the volume ends up named just data (or the explicit name), with no project prefix. Docker Compose names these <project>_<source> precisely so two stacks on the same machine don't collide on a shared name - as it stands, two projects both declaring volumes: { data: ... } would write into the same data volume.

Coincidentally #122 just opened and hits this same branch from the other direction: it uses <project>_<source> namespacing (which solves the collision) but drops the name / external / driver handling and leaves createVolumeHardLink running as dead code. I think your approach here is the more complete one — would you be open to borrowing #122's namespacing as the default name? Roughly:

let actualVolumeName = volumeConfig.name
    ?? volumeConfig.external?.name        // external: keep as-is, skip create
    ?? "\(projectName)_\(source)"         // fall back to project-namespaced

That keeps everything you've built and adds the isolation. Happy to help wire it up.

@thromel thromel force-pushed the fix/apple-container-compose-compat branch from d862ae7 to bdc0a64 Compare June 30, 2026 03:01
@thromel

thromel commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed review. I pushed bdc0a64 with the merge blockers fixed: up <service> now keeps the full dependency closure, stopped detached services check the actual init-process exit code, and default native volume names are project-scoped while preserving name / external. I also cached alias-support probing and hoisted the healthcheck duration regex. Static affected tests and swift build passed locally; I left the retry/timeout and attached one-shot race items as follow-ups per your note.

@thromel thromel requested a review from Cyb3rDudu June 30, 2026 03:07
Resolve ComposeUp.swift conflicts after Mcrich23#97 (DNS + compose labels +
ActivityClock idle wait + stopOldStuff->stopExistingContainers rename)
landed on main:

- Wait function: keep Mcrich23#119's waitUntilServiceStarted name, ServiceStartState
  return, and .stopped exit-code validation; adopt Mcrich23#97's ActivityClock
  idle/maxWait signature and containerName(for:) lookup.
- Run path: keep Mcrich23#119's detach (exit-code guard) vs attached (fire-and-forget
  Task) structure, threading an ActivityClock through both branches so the
  idle wait sees pull progress; keep Mcrich23#119's startState switch (healthcheck +
  dependency gating).
- Stop: keep Mcrich23#97's stopExistingContainers (container-name strings); drop
  Mcrich23#119's stopOldStuff (no remaining callers after the merge took Mcrich23#97's call
  sites).
- Instance state: keep both Mcrich23#119's serviceStartStates/serviceHealth and Mcrich23#97's
  serviceContainerNames.
- Helpers: keep both Mcrich23#119's run-arg/service-selection helpers and Mcrich23#97's
  containerName(for:).

Builds clean; 104 static tests pass (dependencies, healthcheck, networks,
parsing, volumes, entrypoint, helpers).

Follow-up: Mcrich23#97's runTask teardown-on-wait-failure is intentionally omitted —
it assumes a single wrapping runTask, which Mcrich23#119's detach/attached split
doesn't have. Worth re-adding once the run path is unified.
@Cyb3rDudu

Copy link
Copy Markdown
Collaborator

@thromel I pulled main into this branch to clear the conflicts that landed after #97 merged (DNS, compose labels, the ActivityClock idle-wait, and the stopOldStuffstopExistingContainers rename). The PR is MERGEABLE again.

The conflict was all in ComposeUp.swift - I merged both sides rather than picking one:

  • the readiness wait keeps your waitUntilServiceStarted / ServiceStartState / .stopped exit-code check, and adopts main's ActivityClock idle + maxWait signature and containerName(for:);
  • the run path keeps your detach (exit-code guard) vs attached split, with an ActivityClock threaded through both branches so the idle wait sees pull progress;
  • teardown uses main's stopExistingContainers

Builds clean; 104 static tests pass.

Two notes:

  1. I left out main's runTask teardown-on-wait-failure - it assumes a single wrapping runTask, which your detach/attached split doesn't have. Worth re-adding once the run path is unified. No regression vs. your branch; it just doesn't gain the cleanup.
  2. Separately, my earlier note on waitForInitExitCode still stands - worth double-checking the processIdentifier (currently set to the container name, not a PID) and that .exitCode is always present, so a missing value can't default to 0 (= success).

Happy to merge once you've confirmed the XPC behavior in (2) — either a quick
smoke test (exit-0 vs non-zero) or a small fix. (1) is fine as a follow-up;
no need to block the merge on it.

Everything else looks good. One tiny optional tweak: the teardown bullet currently reads "teardown uses main's stopExistingContainers" with no why — you could append "(your stopOldStuff had no callers left after the merge)" so it's clear it wasn't an arbitrary choice, but that's cosmetic.

@thromel

thromel commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Confirmed the XPC concern was valid: waiting only after the container reached .stopped was too late because the runtime client may already be gone. I pushed 5dfb26e, which registers the containerWait task immediately after detached container run succeeds, then consumes that task if startup polling observes .stopped.

Smoke-tested locally with busybox:latest:

  • exit 0 one-shot: up -d succeeds.
  • exit 17 one-shot: up -d fails with Service 'fail' failed to start ... status 17.

Also reran swift build and swift test --filter 'ServiceDependencyTests|ComposeVolumeTests|ActivityClockTests'. The runTask teardown item still looks fine as a follow-up per your note.

@Cyb3rDudu Cyb3rDudu merged commit cdfc7d5 into Mcrich23:main Jul 1, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Request]: Add network-scoped aliases for container network attachments

2 participants