Skip to content

Split llama-stack container build into deps and app layers for faster rebuilds#1895

Open
Jazzcort wants to merge 1 commit into
lightspeed-core:mainfrom
Jazzcort:speed-up-dev-build
Open

Split llama-stack container build into deps and app layers for faster rebuilds#1895
Jazzcort wants to merge 1 commit into
lightspeed-core:mainfrom
Jazzcort:speed-up-dev-build

Conversation

@Jazzcort

@Jazzcort Jazzcort commented Jun 10, 2026

Copy link
Copy Markdown

Description

Separate the monolithic container image into a deps base image (test.deps.containerfile) that installs system packages and Python dependencies, and a thin app image (test.containerfile) that only copies source code on top. The Makefile tracks a hash of pyproject.toml and uv.lock to skip rebuilding the deps image when dependencies haven't changed, making source-only iterations significantly faster.

Remove existing images by name before building replacements to prevent dangling : images from accumulating and wasting disk space.

make run takes around 1 min to build the developer build which was around 2.5 mins before (mac M3 pro)

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: claude-opus-4.6
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  1. If the dependencies are not changed, make run should just rebuild the app image and should take less time than before.
  2. If the dependencies are changed, make run should rebuild both deps and app images.
  3. There shouldn't be any dangling none:none images after executing make run for multiple times.

Summary by CodeRabbit

  • Chores

    • Added dependency-change detection to avoid unnecessary rebuilds.
    • Introduced a dedicated deps image and ensures stale containers/images are removed before rebuild/start.
    • Clean routine now removes the deps image and its stored hash.
    • .gitignore updated to exclude the auto-generated deps hash file.
  • Refactor

    • Converted container build to a multi-stage layout for smaller runtime images and improved layer caching.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Refactors the container build into a deps-driven multi-stage flow: Makefile builds/ensures a dedicated deps image using a dependency-hash, the Containerfile supports a deps-builder stage and runtime stage FROM ${DEPS_IMAGE}, and lifecycle/cleanup plus .gitignore are updated for the generated hash.

Changes

Container layering and build orchestration

Layer / File(s) Summary
Makefile: deps image orchestration and build targets
Makefile
Adds LLAMA_STACK_DEPS_IMAGE and hash-tracking variables (DEPS_HASH_FILE, CURRENT_DEPS_HASH, STORED_DEPS_HASH); adds build-llama-stack-deps-image and ensure-llama-stack-deps-image; updates build-llama-stack-image to depend on ensure-llama-stack-deps-image, removes existing app image before rebuilding, and passes --build-arg DEPS_IMAGE=$(LLAMA_STACK_DEPS_IMAGE) to the app build.
Container image refactored for multi-stage build with deps base
deploy/llama-stack/test.containerfile
Adds ARG DEPS_IMAGE and a deps-builder stage that installs build tools and uv, copies only dependency metadata (including src/version.py and providers/* metadata), runs uv sync and conditionally exports provider requirements; second stage uses FROM ${DEPS_IMAGE}, copies src/, providers/, entrypoint and config, and fixes ownership/permissions.
Makefile container lifecycle and build artifacts cleanup
Makefile, .gitignore
Updates start-llama-stack-container prerequisites to remove existing container before building; expands clean-llama-stack to remove the deps image and delete .llama-stack-deps.hash; adds .llama-stack-deps.hash to .gitignore.

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • tisnik
  • radofuchs
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: splitting the llama-stack container build into separate deps and app layers to improve rebuild performance.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Makefile`:
- Around line 42-45: Makefile currently removes $(LLAMA_STACK_DEPS_IMAGE) (and
similarly the app image at the other removal) before building which sacrifices a
fallback on build failure; change the flow so the rmi of
$(LLAMA_STACK_DEPS_IMAGE) (and $(LLAMA_STACK_APP_IMAGE) at the other site)
happens only after a successful build—e.g., build the new image into a
temporary/tagged name, verify build success, then remove the old image with
$(CONTAINER_RUNTIME) rmi—alternatively add a dedicated cleanup target (invoked
manually or in CI) to accept dangling images during development; update the
removal commands referenced by the Makefile targets that currently use
$(CONTAINER_RUNTIME) image inspect ... rmi to follow this atomic replacement
pattern.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bfaef2ec-afc0-46f9-898b-7844a7dc6150

📥 Commits

Reviewing files that changed from the base of the PR and between be591ad and ca96e96.

📒 Files selected for processing (4)
  • .gitignore
  • Makefile
  • deploy/llama-stack/test.containerfile
  • deploy/llama-stack/test.deps.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.

Applied to files:

  • .gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.

Applied to files:

  • .gitignore
🪛 checkmake (0.3.2)
Makefile

[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).

(maxbodylength)


[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).

(maxbodylength)


[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).

(maxbodylength)


[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (48).

(maxbodylength)


[warning] 163-163: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).

(maxbodylength)

🔇 Additional comments (17)
.gitignore (1)

198-199: LGTM!

Makefile (5)

26-26: LGTM!


50-63: LGTM!


97-97: LGTM!


163-172: LGTM!


21-24: Harden deps hash portability in Makefile
Makefile lines 21-24 use shasum -a 256; shasum isn’t guaranteed to exist on all target systems (many only provide sha256sum). Add a fallback (or switch to sha256sum) and avoid silent empty/incorrect hashes so dependency change detection can’t degrade unexpectedly.

deploy/llama-stack/test.deps.containerfile (8)

1-10: LGTM!


16-19: LGTM!


21-22: LGTM!


24-29: LGTM!


31-32: LGTM!


34-39: LGTM!


41-41: LGTM!


12-14: Security: avoid unpinned curl | sh for UV installer in test containerfile

In deploy/llama-stack/test.deps.containerfile (lines 12-14), RUN curl -LsSf https://astral.sh/uv/install.sh | sh downloads and executes a remote installer without checksum/signature verification and without pinning the installer/version. Consider pinning the UV installer to a specific release and verifying the installer/archive content before execution to reduce supply-chain risk.

deploy/llama-stack/test.containerfile (3)

1-4: LGTM!


6-17: LGTM!


19-22: LGTM!

Comment thread Makefile
Comment on lines +42 to +45
@if $(CONTAINER_RUNTIME) image inspect $(LLAMA_STACK_DEPS_IMAGE) >/dev/null 2>&1; then \
echo "Removing existing deps image to avoid dangling images..."; \
$(CONTAINER_RUNTIME) rmi $(LLAMA_STACK_DEPS_IMAGE); \
fi

@coderabbitai coderabbitai Bot Jun 10, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Both image removals share the same atomicity trade-off. Removing images before building (both deps at lines 42-45 and app at lines 67-70) means any build failure leaves no fallback image. This trades build resilience for avoiding dangling <none>:<none> images. During development with frequent build iterations, consider whether atomic replacement (remove only after successful build) or accepting dangling images with periodic cleanup would better serve reliability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 42 - 45, Makefile currently removes
$(LLAMA_STACK_DEPS_IMAGE) (and similarly the app image at the other removal)
before building which sacrifices a fallback on build failure; change the flow so
the rmi of $(LLAMA_STACK_DEPS_IMAGE) (and $(LLAMA_STACK_APP_IMAGE) at the other
site) happens only after a successful build—e.g., build the new image into a
temporary/tagged name, verify build success, then remove the old image with
$(CONTAINER_RUNTIME) rmi—alternatively add a dedicated cleanup target (invoked
manually or in CI) to accept dangling images during development; update the
removal commands referenced by the Makefile targets that currently use
$(CONTAINER_RUNTIME) image inspect ... rmi to follow this atomic replacement
pattern.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a fallback image if we're building a developer build for testing? 🤔 If the build fails, we fix the problem and build it again. Cleaning those dangling images is just too tedious and they are huge.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point, @Jazzcort. For a developer build workflow, the "fix and rebuild" loop is the natural recovery path — there's no meaningful benefit to keeping a potentially stale fallback image around. Removing it upfront to avoid accumulating large dangling images is the right trade-off here. I'll withdraw this concern.

(┳◇┳)🐇


✏️ Learnings added
Learnt from: Jazzcort
Repo: lightspeed-core/lightspeed-stack PR: 1895
File: Makefile:42-45
Timestamp: 2026-06-10T18:11:00.333Z
Learning: In the `lightspeed-stack` repository (`Makefile`), the `build-llama-stack-deps-image` and `build-llama-stack-image` targets intentionally remove existing images *before* building (rather than atomic replacement after a successful build). This is deliberate for the developer/test workflow: the expected recovery from a build failure is to fix the issue and rebuild, so a fallback image provides no value; removing first avoids accumulating large dangling `<none>:<none>` images.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@Jazzcort

Jazzcort commented Jun 10, 2026

Copy link
Copy Markdown
Author

This patch breaks the CI process in E2E server mode test. Should be a easy fix. I'll fix it later. 😁

@Jazzcort Jazzcort force-pushed the speed-up-dev-build branch from ca96e96 to 01ed340 Compare June 12, 2026 19:00

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e_tests.yaml:
- Line 141: The CI step running the Make target build-llama-stack-deps-image can
pick podman when present, leaving the image in podman’s store and breaking
subsequent docker compose up; modify the workflow to force the Docker runtime
when invoking the Make target (e.g., set CONTAINER_RUNTIME=docker or equivalent
environment variable for the make invocation that calls
build-llama-stack-deps-image) so the lightspeed-llama-stack-deps:local image is
created in Docker’s local store and docker compose up succeeds.

In `@deploy/llama-stack/test.deps.containerfile`:
- Line 14: Replace the unpinned installer command `RUN curl -LsSf
https://astral.sh/uv/install.sh | sh` with a pinned installation: either copy
the binary from the official image (e.g., use a multi-stage step that pulls
`ghcr.io/astral-sh/uv:<VERSION>` and `COPY --from=ghcr.io/astral-sh/uv:<VERSION>
/usr/local/bin/uv /usr/local/bin/uv`) or call the versioned installer URL (`RUN
curl -LsSf https://astral.sh/uv/<VERSION>/install.sh | sh`); pick and substitute
a concrete <VERSION> (or digest) to ensure deterministic builds and remove the
unpinned installer line.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0cb091a8-5cbb-476b-b0d4-544733acd13e

📥 Commits

Reviewing files that changed from the base of the PR and between ca96e96 and 01ed340.

📒 Files selected for processing (5)
  • .github/workflows/e2e_tests.yaml
  • .gitignore
  • Makefile
  • deploy/llama-stack/test.containerfile
  • deploy/llama-stack/test.deps.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-05-06T08:35:54.687Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1690
File: .github/workflows/e2e_tests_providers.yaml:279-285
Timestamp: 2026-05-06T08:35:54.687Z
Learning: In .github/workflows/e2e_tests_providers.yaml and related e2e workflow files, the show_logs step should not use docker compose logs with --tail or --since (i.e., keep logs unbounded). The quick connectivity test runs once immediately after container startup, so the log output is small and a log tail limit is unnecessary. If you adjust this, add a rationale comment in the workflow explaining why unbounded logs are acceptable and ensure CI behavior remains deterministic.

Applied to files:

  • .github/workflows/e2e_tests.yaml
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.

Applied to files:

  • .gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.

Applied to files:

  • .gitignore
🪛 checkmake (0.3.2)
Makefile

[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).

(maxbodylength)


[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).

(maxbodylength)


[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).

(maxbodylength)


[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (48).

(maxbodylength)


[warning] 163-163: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).

(maxbodylength)

🔇 Additional comments (3)
Makefile (1)

16-17: LGTM!

Also applies to: 21-27, 36-74, 97-97, 163-173

.gitignore (1)

198-199: LGTM!

deploy/llama-stack/test.containerfile (1)

1-4: LGTM!

Also applies to: 8-9, 15-17

Comment thread .github/workflows/e2e_tests.yaml Outdated

# Install uv
ENV PATH="/root/.local/bin:${PATH}"
RUN curl -LsSf https://astral.sh/uv/install.sh | sh

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What is the official way (from Astral uv docs) to install a pinned uv version in a non-interactive container build, instead of always installing the latest release from install.sh?

💡 Result:

To install a specific version of uv in a non-interactive container build, the Astral uv documentation recommends two primary methods: copying the binary from an official pre-built image or using the installer script with a versioned URL [1][2]. Method 1: Copying from the Distroless Image (Recommended) The most common and efficient way to install a specific version of uv in a Dockerfile is to copy the uv and uvx binaries directly from the official Astral distroless image [1][3]. This method avoids the overhead of running an installation script and allows you to pin to a specific release tag or SHA256 hash for maximum reproducibility [1][2][4]. Example using a specific version tag: FROM python:3.12-slim-trixie COPY --from=ghcr.io/astral-sh/uv:0.11.21 /uv /uvx /bin/ Example using a SHA256 hash (best practice for reproducibility): COPY --from=ghcr.io/astral-sh/uv@sha256:2381d6aa60c326b71fd40023f921a0a3b8f91b14d5db6b90402e65a635053709 /uv /uvx /bin/ Method 2: Using the Installer Script If you prefer or require the use of the installation script, you can request a specific version by embedding the version number directly into the URL [5][6]. Example in a Dockerfile: FROM python:3.12-slim-trixie # Install curl and ca-certificates for the installer RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates # Add and execute the specific version installer ADD https://astral.sh/uv/0.11.21/install.sh /uv-installer.sh RUN sh /uv-installer.sh && rm /uv-installer.sh # Add to PATH ENV PATH="/root/.local/bin/:$PATH" When using the installer script, ensure that the environment is handled appropriately for non-interactive builds (e.g., configuring PATH) as shown above [7][3]. Using the pre-built image (Method 1) is generally considered the standard best practice for Docker integration [1][3].

Citations:


Pin uv to a specific version in deploy/llama-stack/test.deps.containerfile instead of installing the latest at build time.

The build currently runs RUN curl -LsSf https://astral.sh/uv/install.sh | sh, which pulls whatever “latest” is, causing drift. Astral’s Docker integration docs recommend pinning either by copying uv/uvx from the official ghcr.io/astral-sh/uv:<version> image (tag or digest) or by using the installer script from a versioned URL (e.g., https://astral.sh/uv/<version>/install.sh).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/llama-stack/test.deps.containerfile` at line 14, Replace the unpinned
installer command `RUN curl -LsSf https://astral.sh/uv/install.sh | sh` with a
pinned installation: either copy the binary from the official image (e.g., use a
multi-stage step that pulls `ghcr.io/astral-sh/uv:<VERSION>` and `COPY
--from=ghcr.io/astral-sh/uv:<VERSION> /usr/local/bin/uv /usr/local/bin/uv`) or
call the versioned installer URL (`RUN curl -LsSf
https://astral.sh/uv/<VERSION>/install.sh | sh`); pick and substitute a concrete
<VERSION> (or digest) to ensure deterministic builds and remove the unpinned
installer line.

@Jazzcort Jazzcort force-pushed the speed-up-dev-build branch 4 times, most recently from e535019 to 14d80b6 Compare June 12, 2026 20:15

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Makefile`:
- Around line 65-70: The Makefile target build-llama-stack-image (and the
analogous target referenced at the other location) can fail to rmi the image if
existing containers still reference it; modify the target(s) to first detect and
forcibly remove any containers based on $(LLAMA_STACK_IMAGE) before calling
$(CONTAINER_RUNTIME) rmi. Concretely, run a container lookup using
$(CONTAINER_RUNTIME) ps -a with a filter for the ancestor/image
$(LLAMA_STACK_IMAGE) (or inspect containers for Image==$(LLAMA_STACK_IMAGE)),
stop/remove any returned container IDs (force-remove if needed), then proceed to
remove the image to avoid image-in-use conflicts.
- Around line 23-24: CURRENT_DEPS_HASH can be empty if shasum isn't installed;
change its shell pipeline to use a portable fallback (prefer sha256sum, fallback
to shasum -a 256) so the hash always computes; update the CURRENT_DEPS_HASH
assignment to use a conditional command (e.g. use command -v sha256sum
>/dev/null && sha256sum || shasum -a 256) while keeping the same input (cat
pyproject.toml uv.lock), and leave STORED_DEPS_HASH and the
ensure-llama-stack-deps-image rebuild gate logic unchanged so the rebuild
decision uses the reliably computed CURRENT_DEPS_HASH.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 93f14351-725f-4b17-bcea-d450b703e13b

📥 Commits

Reviewing files that changed from the base of the PR and between 01ed340 and 14d80b6.

📒 Files selected for processing (3)
  • .gitignore
  • Makefile
  • deploy/llama-stack/test.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.

Applied to files:

  • .gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.

Applied to files:

  • .gitignore
🪛 checkmake (0.3.2)
Makefile

[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).

(maxbodylength)


[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).

(maxbodylength)


[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).

(maxbodylength)


[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (48).

(maxbodylength)


[warning] 163-163: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).

(maxbodylength)

🔇 Additional comments (2)
deploy/llama-stack/test.containerfile (1)

1-8: LGTM!

Also applies to: 21-27, 46-55, 60-62

.gitignore (1)

198-199: LGTM!

Comment thread Makefile Outdated
Comment on lines +23 to +24
CURRENT_DEPS_HASH := $(shell cat pyproject.toml uv.lock 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
STORED_DEPS_HASH := $(shell cat $(DEPS_HASH_FILE) 2>/dev/null)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Hash tool availability in this environment:"
command -v sha256sum || true
command -v shasum || true

echo
echo "Hashing callsites in Makefile:"
rg -n 'CURRENT_DEPS_HASH|STORED_DEPS_HASH|shasum|sha256sum' Makefile

Repository: lightspeed-core/lightspeed-stack

Length of output: 444


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Makefile lines 1-120 (hash + gating context) ==="
sed -n '1,120p' Makefile | nl -ba

echo
echo "=== Search for SHA/deps-hash related definitions ==="
rg -n 'DEPS_HASH_FILE|CURRENT_DEPS_HASH|STORED_DEPS_HASH|shasum|sha256sum|sha256' Makefile

echo
echo "=== Hash tool availability in this sandbox (evidence for portability issue) ==="
command -v sha256sum || true
command -v shasum || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 190


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Makefile lines 1-120 (hash + gating context) ==="
sed -n '1,120p' Makefile | cat -n

echo
echo "=== Search for SHA/deps-hash related definitions ==="
rg -n 'DEPS_HASH_FILE|CURRENT_DEPS_HASH|STORED_DEPS_HASH|shasum|sha256sum|DEPS_HASH|HASH' Makefile

echo
echo "=== Hash tool availability in this environment (for CI portability guidance) ==="
command -v sha256sum || true
command -v shasum || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 7236


Make deps hash calculation portable (sha256sum/shasum fallback)

CURRENT_DEPS_HASH hardcodes shasum -a 256 (Line 23); the rebuild gate in ensure-llama-stack-deps-image (Lines 58-60) relies entirely on this value. If shasum isn’t available in CI, CURRENT_DEPS_HASH can end up empty, causing the deps-image rebuild to be incorrectly skipped.

🔧 Proposed fix
+HASH_CMD := $(shell command -v sha256sum 2>/dev/null || command -v shasum 2>/dev/null)
+
+ifeq ($(HASH_CMD),)
+$(error No SHA-256 tool found. Install sha256sum or shasum)
+endif
+
 # Dependency change detection
 DEPS_HASH_FILE := .llama-stack-deps.hash
-CURRENT_DEPS_HASH := $(shell cat pyproject.toml uv.lock 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
+CURRENT_DEPS_HASH := $(shell cat pyproject.toml uv.lock 2>/dev/null | (command -v sha256sum >/dev/null 2>&1 && sha256sum || shasum -a 256) | awk '{print $$1}')
 STORED_DEPS_HASH := $(shell cat $(DEPS_HASH_FILE) 2>/dev/null)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 23 - 24, CURRENT_DEPS_HASH can be empty if shasum
isn't installed; change its shell pipeline to use a portable fallback (prefer
sha256sum, fallback to shasum -a 256) so the hash always computes; update the
CURRENT_DEPS_HASH assignment to use a conditional command (e.g. use command -v
sha256sum >/dev/null && sha256sum || shasum -a 256) while keeping the same input
(cat pyproject.toml uv.lock), and leave STORED_DEPS_HASH and the
ensure-llama-stack-deps-image rebuild gate logic unchanged so the rebuild
decision uses the reliably computed CURRENT_DEPS_HASH.

Comment thread Makefile
@Jazzcort Jazzcort force-pushed the speed-up-dev-build branch from 14d80b6 to 53332b0 Compare June 12, 2026 21:41

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
Makefile (2)

21-24: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Hash portability issue remains unresolved.

This concern was raised in a previous review. Line 23 hardcodes shasum -a 256, which may not be available in all CI environments. If shasum is missing, CURRENT_DEPS_HASH becomes empty, and the rebuild gate in ensure-llama-stack-deps-image (lines 58-60) can incorrectly skip the deps-image rebuild.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 21 - 24, The CURRENT_DEPS_HASH assignment is
non-portable because it hardcodes `shasum -a 256`; update the Makefile so
`CURRENT_DEPS_HASH` computes a SHA-256 hash in a portable way (e.g., try `shasum
-a 256`, fall back to `sha256sum`, then to `openssl dgst -sha256` or a small
`python -c` hashing fallback) and ensure the pipeline always yields a non-empty
hash (or a deterministic sentinel) so the `ensure-llama-stack-deps-image`
rebuild gate can reliably compare `CURRENT_DEPS_HASH` with `STORED_DEPS_HASH`;
change the assignment that sets `CURRENT_DEPS_HASH` and leave
`DEPS_HASH_FILE`/`STORED_DEPS_HASH` and target `ensure-llama-stack-deps-image`
intact.

65-73: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Container-removal prerequisite issue remains unresolved.

This concern was raised in a previous review. build-llama-stack-image removes the app image (lines 67-70) but does not first remove any existing container that may reference it. Direct invocation of make build-llama-stack-image (without going through start-llama-stack-container) can fail with an image-in-use conflict if a container is still running or stopped.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 65 - 73, The build-llama-stack-image target currently
deletes $(LLAMA_STACK_IMAGE) without first ensuring no containers reference that
image; update build-llama-stack-image to first query for containers referencing
$(LLAMA_STACK_IMAGE) via $(CONTAINER_RUNTIME) (both running and stopped), stop
and remove any found containers, then proceed to rmi and rebuild; reference the
build-llama-stack-image target, the $(LLAMA_STACK_IMAGE) variable, and
$(CONTAINER_RUNTIME) in your changes and ensure this cleanup runs before the
existing rmi step (so invoking make build-llama-stack-image directly won’t fail
if containers exist).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@Makefile`:
- Around line 21-24: The CURRENT_DEPS_HASH assignment is non-portable because it
hardcodes `shasum -a 256`; update the Makefile so `CURRENT_DEPS_HASH` computes a
SHA-256 hash in a portable way (e.g., try `shasum -a 256`, fall back to
`sha256sum`, then to `openssl dgst -sha256` or a small `python -c` hashing
fallback) and ensure the pipeline always yields a non-empty hash (or a
deterministic sentinel) so the `ensure-llama-stack-deps-image` rebuild gate can
reliably compare `CURRENT_DEPS_HASH` with `STORED_DEPS_HASH`; change the
assignment that sets `CURRENT_DEPS_HASH` and leave
`DEPS_HASH_FILE`/`STORED_DEPS_HASH` and target `ensure-llama-stack-deps-image`
intact.
- Around line 65-73: The build-llama-stack-image target currently deletes
$(LLAMA_STACK_IMAGE) without first ensuring no containers reference that image;
update build-llama-stack-image to first query for containers referencing
$(LLAMA_STACK_IMAGE) via $(CONTAINER_RUNTIME) (both running and stopped), stop
and remove any found containers, then proceed to rmi and rebuild; reference the
build-llama-stack-image target, the $(LLAMA_STACK_IMAGE) variable, and
$(CONTAINER_RUNTIME) in your changes and ensure this cleanup runs before the
existing rmi step (so invoking make build-llama-stack-image directly won’t fail
if containers exist).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d735502a-afe8-4488-818b-9598ac3df46d

📥 Commits

Reviewing files that changed from the base of the PR and between 14d80b6 and 53332b0.

📒 Files selected for processing (3)
  • .gitignore
  • Makefile
  • deploy/llama-stack/test.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.

Applied to files:

  • .gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.

Applied to files:

  • .gitignore
🪛 checkmake (0.3.2)
Makefile

[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).

(maxbodylength)


[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).

(maxbodylength)


[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).

(maxbodylength)


[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (47).

(maxbodylength)


[warning] 162-162: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).

(maxbodylength)

🔇 Additional comments (11)
Makefile (6)

16-16: LGTM!


36-48: LGTM!


50-63: LGTM!


97-97: LGTM!


162-171: LGTM!


26-26: LGTM!

deploy/llama-stack/test.containerfile (4)

1-5: LGTM!


7-52: LGTM!


54-75: LGTM!


24-24: 📐 Maintainability & Code Quality

Keep src/version.py copied in the deps-builder stage (required for dynamic version metadata).

pyproject.toml declares dynamic = ["version"] and configures version to come from src/version.py via [tool.pdm.version] path = "src/version.py". Since the deps image runs uv sync --locked --no-install-project ..., it still needs the project metadata evaluation to succeed, which explains the earlier COPY src/version.py ./src/version.py before uv sync (deps-builder stage). Deferring that copy to stage 2 would risk breaking uv sync if version metadata is computed during the lock/deps resolution step.

.gitignore (1)

198-199: LGTM!

… rebuilds

Separate the monolithic container image into a deps base image
(test.deps.containerfile) that installs system packages and Python
dependencies, and a thin app image (test.containerfile) that only copies
source code on top. The Makefile tracks a hash of pyproject.toml and
uv.lock to skip rebuilding the deps image when dependencies haven't
changed, making source-only iterations significantly faster.

Remove existing images by name before building replacements to prevent
dangling <none>:<none> images from accumulating and wasting disk space.
@Jazzcort Jazzcort force-pushed the speed-up-dev-build branch from 53332b0 to 22c0fd5 Compare June 12, 2026 23:15

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Makefile`:
- Around line 42-45: The deps rebuild can fail because
build-llama-stack-deps-image removes $(LLAMA_STACK_DEPS_IMAGE) while a child
$(LLAMA_STACK_IMAGE) still exists; update the Makefile so the child app image is
removed before the parent: either move the existing removal of
$(LLAMA_STACK_IMAGE) (the app image cleanup currently in build-llama-stack-image
/ ensure-llama-stack-deps-image flow) to run before build-llama-stack-deps-image
attempts to rmi $(LLAMA_STACK_DEPS_IMAGE), or add a step inside
build-llama-stack-deps-image to remove $(LLAMA_STACK_IMAGE) first, and ensure
you check CONTAINER_RUNTIME is non-empty before invoking it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5ad44c84-776d-4154-a0ca-340dc9863084

📥 Commits

Reviewing files that changed from the base of the PR and between 53332b0 and 22c0fd5.

📒 Files selected for processing (3)
  • .gitignore
  • Makefile
  • deploy/llama-stack/test.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.

Applied to files:

  • .gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.

Applied to files:

  • .gitignore
🪛 checkmake (0.3.2)
Makefile

[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).

(maxbodylength)


[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).

(maxbodylength)


[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).

(maxbodylength)


[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (47).

(maxbodylength)


[warning] 162-162: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).

(maxbodylength)

🔇 Additional comments (2)
.gitignore (1)

198-199: LGTM!

deploy/llama-stack/test.containerfile (1)

27-33: 🩺 Stability & Availability

deploy/llama-stack/test.containerfile: COPY makes providers/ dependency files effectively mandatory; the later if [ -f ... ] can’t prevent build failure.

  • In this repo checkout, providers/pyproject.toml and providers/uv.lock are missing, so any build context that lacks them will fail at COPY providers/pyproject.tom[l] providers/uv.loc[k] ./providers/ (before the if [ -f providers/pyproject.toml ]; then ... guard).
  • CI does fetch submodules recursively in several workflows (e.g., e2e/build/release), but this doesn’t guarantee the specific image-build job for this Containerfile always materializes those files.
  • Align the Containerfile’s behavior with the Makefile’s “tolerate missing providers” hashing intent, or ensure the container build always has the required providers/pyproject.toml + providers/uv.lock present in the build context. [verify_review_comment:deploy/llama-stack/test.containerfile:27-33,63]

Comment thread Makefile
Comment on lines +42 to +45
@if $(CONTAINER_RUNTIME) image inspect $(LLAMA_STACK_DEPS_IMAGE) >/dev/null 2>&1; then \
echo "Removing existing deps image to avoid dangling images..."; \
$(CONTAINER_RUNTIME) rmi $(LLAMA_STACK_DEPS_IMAGE); \
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Do Docker and Podman allow removing a parent/base image tag via rmi when a tagged child image built FROM it still exists, or do they fail with a dependency conflict?

💡 Result:

Docker: No—removing (via docker rmi) a base/parent image while a tagged child image that is built FROM it still exists causes a dependency conflict. Docker’s docker image rm removes images from the local host (and untags as needed), but if the image you’re removing has a dependent child image, the removal is canceled as a conflict (even with -f in the parent/child case). [1][2] Podman: Also no—podman rmi will fail when the target image has child images (or is used by a container). Podman’s rmi exit status documentation explicitly states that “one of the specified images has child images or is being used by a container” causes failure. [3][4] Practical implication (both): you must remove the dependent child images first, then remove the parent/base image; otherwise rmi reports a conflict/dependency error. [1][3]

Citations:


Fix deps rebuild ordering to prevent docker/podman rmi dependency conflicts
Docker/Podman won’t remove a base/parent image if a child image still exists. The Makefile removes $(LLAMA_STACK_DEPS_IMAGE) in build-llama-stack-deps-image (lines 42-45) while build-llama-stack-image calls ensure-llama-stack-deps-image before removing the existing app image (lines 65-70), so a stale $(LLAMA_STACK_IMAGE) built FROM the deps image can make the deps rebuild fail/abort.

Suggested fix: remove $(LLAMA_STACK_IMAGE) (child) before removing $(LLAMA_STACK_DEPS_IMAGE) (parent)—either by moving the app cleanup earlier than ensure-llama-stack-deps-image, or by adding the app-image removal inside the deps build target while ensuring $(CONTAINER_RUNTIME) is checked for non-empty before it’s used.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 42 - 45, The deps rebuild can fail because
build-llama-stack-deps-image removes $(LLAMA_STACK_DEPS_IMAGE) while a child
$(LLAMA_STACK_IMAGE) still exists; update the Makefile so the child app image is
removed before the parent: either move the existing removal of
$(LLAMA_STACK_IMAGE) (the app image cleanup currently in build-llama-stack-image
/ ensure-llama-stack-deps-image flow) to run before build-llama-stack-deps-image
attempts to rmi $(LLAMA_STACK_DEPS_IMAGE), or add a step inside
build-llama-stack-deps-image to remove $(LLAMA_STACK_IMAGE) first, and ensure
you check CONTAINER_RUNTIME is non-empty before invoking it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant