Split llama-stack container build into deps and app layers for faster rebuilds#1895
Split llama-stack container build into deps and app layers for faster rebuilds#1895Jazzcort wants to merge 1 commit into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughRefactors the container build into a deps-driven multi-stage flow: Makefile builds/ensures a dedicated deps image using a dependency-hash, the Containerfile supports a deps-builder stage and runtime stage FROM ${DEPS_IMAGE}, and lifecycle/cleanup plus .gitignore are updated for the generated hash. ChangesContainer layering and build orchestration
🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Makefile`:
- Around line 42-45: Makefile currently removes $(LLAMA_STACK_DEPS_IMAGE) (and
similarly the app image at the other removal) before building which sacrifices a
fallback on build failure; change the flow so the rmi of
$(LLAMA_STACK_DEPS_IMAGE) (and $(LLAMA_STACK_APP_IMAGE) at the other site)
happens only after a successful build—e.g., build the new image into a
temporary/tagged name, verify build success, then remove the old image with
$(CONTAINER_RUNTIME) rmi—alternatively add a dedicated cleanup target (invoked
manually or in CI) to accept dangling images during development; update the
removal commands referenced by the Makefile targets that currently use
$(CONTAINER_RUNTIME) image inspect ... rmi to follow this atomic replacement
pattern.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bfaef2ec-afc0-46f9-898b-7844a7dc6150
📒 Files selected for processing (4)
.gitignoreMakefiledeploy/llama-stack/test.containerfiledeploy/llama-stack/test.deps.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.
Applied to files:
.gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.
Applied to files:
.gitignore
🪛 checkmake (0.3.2)
Makefile
[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).
(maxbodylength)
[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).
(maxbodylength)
[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).
(maxbodylength)
[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (48).
(maxbodylength)
[warning] 163-163: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).
(maxbodylength)
🔇 Additional comments (17)
.gitignore (1)
198-199: LGTM!Makefile (5)
26-26: LGTM!
50-63: LGTM!
97-97: LGTM!
163-172: LGTM!
21-24: Harden deps hash portability in Makefile
Makefile lines 21-24 useshasum -a 256;shasumisn’t guaranteed to exist on all target systems (many only providesha256sum). Add a fallback (or switch tosha256sum) and avoid silent empty/incorrect hashes so dependency change detection can’t degrade unexpectedly.deploy/llama-stack/test.deps.containerfile (8)
1-10: LGTM!
16-19: LGTM!
21-22: LGTM!
24-29: LGTM!
31-32: LGTM!
34-39: LGTM!
41-41: LGTM!
12-14: Security: avoid unpinnedcurl | shfor UV installer in test containerfileIn
deploy/llama-stack/test.deps.containerfile(lines 12-14),RUN curl -LsSf https://astral.sh/uv/install.sh | shdownloads and executes a remote installer without checksum/signature verification and without pinning the installer/version. Consider pinning the UV installer to a specific release and verifying the installer/archive content before execution to reduce supply-chain risk.deploy/llama-stack/test.containerfile (3)
1-4: LGTM!
6-17: LGTM!
19-22: LGTM!
| @if $(CONTAINER_RUNTIME) image inspect $(LLAMA_STACK_DEPS_IMAGE) >/dev/null 2>&1; then \ | ||
| echo "Removing existing deps image to avoid dangling images..."; \ | ||
| $(CONTAINER_RUNTIME) rmi $(LLAMA_STACK_DEPS_IMAGE); \ | ||
| fi |
There was a problem hiding this comment.
Both image removals share the same atomicity trade-off. Removing images before building (both deps at lines 42-45 and app at lines 67-70) means any build failure leaves no fallback image. This trades build resilience for avoiding dangling <none>:<none> images. During development with frequent build iterations, consider whether atomic replacement (remove only after successful build) or accepting dangling images with periodic cleanup would better serve reliability.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Makefile` around lines 42 - 45, Makefile currently removes
$(LLAMA_STACK_DEPS_IMAGE) (and similarly the app image at the other removal)
before building which sacrifices a fallback on build failure; change the flow so
the rmi of $(LLAMA_STACK_DEPS_IMAGE) (and $(LLAMA_STACK_APP_IMAGE) at the other
site) happens only after a successful build—e.g., build the new image into a
temporary/tagged name, verify build success, then remove the old image with
$(CONTAINER_RUNTIME) rmi—alternatively add a dedicated cleanup target (invoked
manually or in CI) to accept dangling images during development; update the
removal commands referenced by the Makefile targets that currently use
$(CONTAINER_RUNTIME) image inspect ... rmi to follow this atomic replacement
pattern.
There was a problem hiding this comment.
Why do we need a fallback image if we're building a developer build for testing? 🤔 If the build fails, we fix the problem and build it again. Cleaning those dangling images is just too tedious and they are huge.
There was a problem hiding this comment.
That's a fair point, @Jazzcort. For a developer build workflow, the "fix and rebuild" loop is the natural recovery path — there's no meaningful benefit to keeping a potentially stale fallback image around. Removing it upfront to avoid accumulating large dangling images is the right trade-off here. I'll withdraw this concern.
(┳◇┳)🐇
✏️ Learnings added
Learnt from: Jazzcort
Repo: lightspeed-core/lightspeed-stack PR: 1895
File: Makefile:42-45
Timestamp: 2026-06-10T18:11:00.333Z
Learning: In the `lightspeed-stack` repository (`Makefile`), the `build-llama-stack-deps-image` and `build-llama-stack-image` targets intentionally remove existing images *before* building (rather than atomic replacement after a successful build). This is deliberate for the developer/test workflow: the expected recovery from a build failure is to fix the issue and rebuild, so a fallback image provides no value; removing first avoids accumulating large dangling `<none>:<none>` images.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
|
This patch breaks the CI process in E2E server mode test. Should be a easy fix. I'll fix it later. 😁 |
ca96e96 to
01ed340
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/e2e_tests.yaml:
- Line 141: The CI step running the Make target build-llama-stack-deps-image can
pick podman when present, leaving the image in podman’s store and breaking
subsequent docker compose up; modify the workflow to force the Docker runtime
when invoking the Make target (e.g., set CONTAINER_RUNTIME=docker or equivalent
environment variable for the make invocation that calls
build-llama-stack-deps-image) so the lightspeed-llama-stack-deps:local image is
created in Docker’s local store and docker compose up succeeds.
In `@deploy/llama-stack/test.deps.containerfile`:
- Line 14: Replace the unpinned installer command `RUN curl -LsSf
https://astral.sh/uv/install.sh | sh` with a pinned installation: either copy
the binary from the official image (e.g., use a multi-stage step that pulls
`ghcr.io/astral-sh/uv:<VERSION>` and `COPY --from=ghcr.io/astral-sh/uv:<VERSION>
/usr/local/bin/uv /usr/local/bin/uv`) or call the versioned installer URL (`RUN
curl -LsSf https://astral.sh/uv/<VERSION>/install.sh | sh`); pick and substitute
a concrete <VERSION> (or digest) to ensure deterministic builds and remove the
unpinned installer line.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0cb091a8-5cbb-476b-b0d4-544733acd13e
📒 Files selected for processing (5)
.github/workflows/e2e_tests.yaml.gitignoreMakefiledeploy/llama-stack/test.containerfiledeploy/llama-stack/test.deps.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-05-06T08:35:54.687Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1690
File: .github/workflows/e2e_tests_providers.yaml:279-285
Timestamp: 2026-05-06T08:35:54.687Z
Learning: In .github/workflows/e2e_tests_providers.yaml and related e2e workflow files, the show_logs step should not use docker compose logs with --tail or --since (i.e., keep logs unbounded). The quick connectivity test runs once immediately after container startup, so the log output is small and a log tail limit is unnecessary. If you adjust this, add a rationale comment in the workflow explaining why unbounded logs are acceptable and ensure CI behavior remains deterministic.
Applied to files:
.github/workflows/e2e_tests.yaml
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.
Applied to files:
.gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.
Applied to files:
.gitignore
🪛 checkmake (0.3.2)
Makefile
[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).
(maxbodylength)
[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).
(maxbodylength)
[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).
(maxbodylength)
[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (48).
(maxbodylength)
[warning] 163-163: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).
(maxbodylength)
🔇 Additional comments (3)
Makefile (1)
16-17: LGTM!Also applies to: 21-27, 36-74, 97-97, 163-173
.gitignore (1)
198-199: LGTM!deploy/llama-stack/test.containerfile (1)
1-4: LGTM!Also applies to: 8-9, 15-17
|
|
||
| # Install uv | ||
| ENV PATH="/root/.local/bin:${PATH}" | ||
| RUN curl -LsSf https://astral.sh/uv/install.sh | sh |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the official way (from Astral uv docs) to install a pinned uv version in a non-interactive container build, instead of always installing the latest release from install.sh?
💡 Result:
To install a specific version of uv in a non-interactive container build, the Astral uv documentation recommends two primary methods: copying the binary from an official pre-built image or using the installer script with a versioned URL [1][2]. Method 1: Copying from the Distroless Image (Recommended) The most common and efficient way to install a specific version of uv in a Dockerfile is to copy the uv and uvx binaries directly from the official Astral distroless image [1][3]. This method avoids the overhead of running an installation script and allows you to pin to a specific release tag or SHA256 hash for maximum reproducibility [1][2][4]. Example using a specific version tag: FROM python:3.12-slim-trixie COPY --from=ghcr.io/astral-sh/uv:0.11.21 /uv /uvx /bin/ Example using a SHA256 hash (best practice for reproducibility): COPY --from=ghcr.io/astral-sh/uv@sha256:2381d6aa60c326b71fd40023f921a0a3b8f91b14d5db6b90402e65a635053709 /uv /uvx /bin/ Method 2: Using the Installer Script If you prefer or require the use of the installation script, you can request a specific version by embedding the version number directly into the URL [5][6]. Example in a Dockerfile: FROM python:3.12-slim-trixie # Install curl and ca-certificates for the installer RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates # Add and execute the specific version installer ADD https://astral.sh/uv/0.11.21/install.sh /uv-installer.sh RUN sh /uv-installer.sh && rm /uv-installer.sh # Add to PATH ENV PATH="/root/.local/bin/:$PATH" When using the installer script, ensure that the environment is handled appropriately for non-interactive builds (e.g., configuring PATH) as shown above [7][3]. Using the pre-built image (Method 1) is generally considered the standard best practice for Docker integration [1][3].
Citations:
- 1: https://docs.astral.sh/uv/guides/integration/docker/
- 2: https://github.com/astral-sh/uv/blob/main/docs/guides/integration/docker.md
- 3: https://astral-sh-uv.mintlify.app/integrations/docker
- 4: https://pydevtools.com/handbook/how-to/how-to-use-uv-in-a-dockerfile/
- 5: https://docs.astral.sh/uv/getting-started/installation/index.md
- 6: How can you select a specific version of
uvusinginstall.sh? astral-sh/uv#14089 - 7: https://docs.astral.sh/uv/reference/installer/index.md
Pin uv to a specific version in deploy/llama-stack/test.deps.containerfile instead of installing the latest at build time.
The build currently runs RUN curl -LsSf https://astral.sh/uv/install.sh | sh, which pulls whatever “latest” is, causing drift. Astral’s Docker integration docs recommend pinning either by copying uv/uvx from the official ghcr.io/astral-sh/uv:<version> image (tag or digest) or by using the installer script from a versioned URL (e.g., https://astral.sh/uv/<version>/install.sh).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/llama-stack/test.deps.containerfile` at line 14, Replace the unpinned
installer command `RUN curl -LsSf https://astral.sh/uv/install.sh | sh` with a
pinned installation: either copy the binary from the official image (e.g., use a
multi-stage step that pulls `ghcr.io/astral-sh/uv:<VERSION>` and `COPY
--from=ghcr.io/astral-sh/uv:<VERSION> /usr/local/bin/uv /usr/local/bin/uv`) or
call the versioned installer URL (`RUN curl -LsSf
https://astral.sh/uv/<VERSION>/install.sh | sh`); pick and substitute a concrete
<VERSION> (or digest) to ensure deterministic builds and remove the unpinned
installer line.
e535019 to
14d80b6
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Makefile`:
- Around line 65-70: The Makefile target build-llama-stack-image (and the
analogous target referenced at the other location) can fail to rmi the image if
existing containers still reference it; modify the target(s) to first detect and
forcibly remove any containers based on $(LLAMA_STACK_IMAGE) before calling
$(CONTAINER_RUNTIME) rmi. Concretely, run a container lookup using
$(CONTAINER_RUNTIME) ps -a with a filter for the ancestor/image
$(LLAMA_STACK_IMAGE) (or inspect containers for Image==$(LLAMA_STACK_IMAGE)),
stop/remove any returned container IDs (force-remove if needed), then proceed to
remove the image to avoid image-in-use conflicts.
- Around line 23-24: CURRENT_DEPS_HASH can be empty if shasum isn't installed;
change its shell pipeline to use a portable fallback (prefer sha256sum, fallback
to shasum -a 256) so the hash always computes; update the CURRENT_DEPS_HASH
assignment to use a conditional command (e.g. use command -v sha256sum
>/dev/null && sha256sum || shasum -a 256) while keeping the same input (cat
pyproject.toml uv.lock), and leave STORED_DEPS_HASH and the
ensure-llama-stack-deps-image rebuild gate logic unchanged so the rebuild
decision uses the reliably computed CURRENT_DEPS_HASH.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 93f14351-725f-4b17-bcea-d450b703e13b
📒 Files selected for processing (3)
.gitignoreMakefiledeploy/llama-stack/test.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.
Applied to files:
.gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.
Applied to files:
.gitignore
🪛 checkmake (0.3.2)
Makefile
[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).
(maxbodylength)
[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).
(maxbodylength)
[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).
(maxbodylength)
[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (48).
(maxbodylength)
[warning] 163-163: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).
(maxbodylength)
🔇 Additional comments (2)
deploy/llama-stack/test.containerfile (1)
1-8: LGTM!Also applies to: 21-27, 46-55, 60-62
.gitignore (1)
198-199: LGTM!
| CURRENT_DEPS_HASH := $(shell cat pyproject.toml uv.lock 2>/dev/null | shasum -a 256 | cut -d' ' -f1) | ||
| STORED_DEPS_HASH := $(shell cat $(DEPS_HASH_FILE) 2>/dev/null) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Hash tool availability in this environment:"
command -v sha256sum || true
command -v shasum || true
echo
echo "Hashing callsites in Makefile:"
rg -n 'CURRENT_DEPS_HASH|STORED_DEPS_HASH|shasum|sha256sum' MakefileRepository: lightspeed-core/lightspeed-stack
Length of output: 444
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Makefile lines 1-120 (hash + gating context) ==="
sed -n '1,120p' Makefile | nl -ba
echo
echo "=== Search for SHA/deps-hash related definitions ==="
rg -n 'DEPS_HASH_FILE|CURRENT_DEPS_HASH|STORED_DEPS_HASH|shasum|sha256sum|sha256' Makefile
echo
echo "=== Hash tool availability in this sandbox (evidence for portability issue) ==="
command -v sha256sum || true
command -v shasum || trueRepository: lightspeed-core/lightspeed-stack
Length of output: 190
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Makefile lines 1-120 (hash + gating context) ==="
sed -n '1,120p' Makefile | cat -n
echo
echo "=== Search for SHA/deps-hash related definitions ==="
rg -n 'DEPS_HASH_FILE|CURRENT_DEPS_HASH|STORED_DEPS_HASH|shasum|sha256sum|DEPS_HASH|HASH' Makefile
echo
echo "=== Hash tool availability in this environment (for CI portability guidance) ==="
command -v sha256sum || true
command -v shasum || trueRepository: lightspeed-core/lightspeed-stack
Length of output: 7236
Make deps hash calculation portable (sha256sum/shasum fallback)
CURRENT_DEPS_HASH hardcodes shasum -a 256 (Line 23); the rebuild gate in ensure-llama-stack-deps-image (Lines 58-60) relies entirely on this value. If shasum isn’t available in CI, CURRENT_DEPS_HASH can end up empty, causing the deps-image rebuild to be incorrectly skipped.
🔧 Proposed fix
+HASH_CMD := $(shell command -v sha256sum 2>/dev/null || command -v shasum 2>/dev/null)
+
+ifeq ($(HASH_CMD),)
+$(error No SHA-256 tool found. Install sha256sum or shasum)
+endif
+
# Dependency change detection
DEPS_HASH_FILE := .llama-stack-deps.hash
-CURRENT_DEPS_HASH := $(shell cat pyproject.toml uv.lock 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
+CURRENT_DEPS_HASH := $(shell cat pyproject.toml uv.lock 2>/dev/null | (command -v sha256sum >/dev/null 2>&1 && sha256sum || shasum -a 256) | awk '{print $$1}')
STORED_DEPS_HASH := $(shell cat $(DEPS_HASH_FILE) 2>/dev/null)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Makefile` around lines 23 - 24, CURRENT_DEPS_HASH can be empty if shasum
isn't installed; change its shell pipeline to use a portable fallback (prefer
sha256sum, fallback to shasum -a 256) so the hash always computes; update the
CURRENT_DEPS_HASH assignment to use a conditional command (e.g. use command -v
sha256sum >/dev/null && sha256sum || shasum -a 256) while keeping the same input
(cat pyproject.toml uv.lock), and leave STORED_DEPS_HASH and the
ensure-llama-stack-deps-image rebuild gate logic unchanged so the rebuild
decision uses the reliably computed CURRENT_DEPS_HASH.
14d80b6 to
53332b0
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
Makefile (2)
21-24: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winHash portability issue remains unresolved.
This concern was raised in a previous review. Line 23 hardcodes
shasum -a 256, which may not be available in all CI environments. Ifshasumis missing,CURRENT_DEPS_HASHbecomes empty, and the rebuild gate inensure-llama-stack-deps-image(lines 58-60) can incorrectly skip the deps-image rebuild.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 21 - 24, The CURRENT_DEPS_HASH assignment is non-portable because it hardcodes `shasum -a 256`; update the Makefile so `CURRENT_DEPS_HASH` computes a SHA-256 hash in a portable way (e.g., try `shasum -a 256`, fall back to `sha256sum`, then to `openssl dgst -sha256` or a small `python -c` hashing fallback) and ensure the pipeline always yields a non-empty hash (or a deterministic sentinel) so the `ensure-llama-stack-deps-image` rebuild gate can reliably compare `CURRENT_DEPS_HASH` with `STORED_DEPS_HASH`; change the assignment that sets `CURRENT_DEPS_HASH` and leave `DEPS_HASH_FILE`/`STORED_DEPS_HASH` and target `ensure-llama-stack-deps-image` intact.
65-73: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winContainer-removal prerequisite issue remains unresolved.
This concern was raised in a previous review.
build-llama-stack-imageremoves the app image (lines 67-70) but does not first remove any existing container that may reference it. Direct invocation ofmake build-llama-stack-image(without going throughstart-llama-stack-container) can fail with an image-in-use conflict if a container is still running or stopped.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 65 - 73, The build-llama-stack-image target currently deletes $(LLAMA_STACK_IMAGE) without first ensuring no containers reference that image; update build-llama-stack-image to first query for containers referencing $(LLAMA_STACK_IMAGE) via $(CONTAINER_RUNTIME) (both running and stopped), stop and remove any found containers, then proceed to rmi and rebuild; reference the build-llama-stack-image target, the $(LLAMA_STACK_IMAGE) variable, and $(CONTAINER_RUNTIME) in your changes and ensure this cleanup runs before the existing rmi step (so invoking make build-llama-stack-image directly won’t fail if containers exist).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@Makefile`:
- Around line 21-24: The CURRENT_DEPS_HASH assignment is non-portable because it
hardcodes `shasum -a 256`; update the Makefile so `CURRENT_DEPS_HASH` computes a
SHA-256 hash in a portable way (e.g., try `shasum -a 256`, fall back to
`sha256sum`, then to `openssl dgst -sha256` or a small `python -c` hashing
fallback) and ensure the pipeline always yields a non-empty hash (or a
deterministic sentinel) so the `ensure-llama-stack-deps-image` rebuild gate can
reliably compare `CURRENT_DEPS_HASH` with `STORED_DEPS_HASH`; change the
assignment that sets `CURRENT_DEPS_HASH` and leave
`DEPS_HASH_FILE`/`STORED_DEPS_HASH` and target `ensure-llama-stack-deps-image`
intact.
- Around line 65-73: The build-llama-stack-image target currently deletes
$(LLAMA_STACK_IMAGE) without first ensuring no containers reference that image;
update build-llama-stack-image to first query for containers referencing
$(LLAMA_STACK_IMAGE) via $(CONTAINER_RUNTIME) (both running and stopped), stop
and remove any found containers, then proceed to rmi and rebuild; reference the
build-llama-stack-image target, the $(LLAMA_STACK_IMAGE) variable, and
$(CONTAINER_RUNTIME) in your changes and ensure this cleanup runs before the
existing rmi step (so invoking make build-llama-stack-image directly won’t fail
if containers exist).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d735502a-afe8-4488-818b-9598ac3df46d
📒 Files selected for processing (3)
.gitignoreMakefiledeploy/llama-stack/test.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.
Applied to files:
.gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.
Applied to files:
.gitignore
🪛 checkmake (0.3.2)
Makefile
[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).
(maxbodylength)
[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).
(maxbodylength)
[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).
(maxbodylength)
[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (47).
(maxbodylength)
[warning] 162-162: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).
(maxbodylength)
🔇 Additional comments (11)
Makefile (6)
16-16: LGTM!
36-48: LGTM!
50-63: LGTM!
97-97: LGTM!
162-171: LGTM!
26-26: LGTM!deploy/llama-stack/test.containerfile (4)
1-5: LGTM!
7-52: LGTM!
54-75: LGTM!
24-24: 📐 Maintainability & Code QualityKeep
src/version.pycopied in the deps-builder stage (required for dynamic version metadata).
pyproject.tomldeclaresdynamic = ["version"]and configuresversionto come fromsrc/version.pyvia[tool.pdm.version] path = "src/version.py". Since the deps image runsuv sync --locked --no-install-project ..., it still needs the project metadata evaluation to succeed, which explains the earlierCOPY src/version.py ./src/version.pybeforeuv sync(deps-builder stage). Deferring that copy to stage 2 would risk breakinguv syncif version metadata is computed during the lock/deps resolution step..gitignore (1)
198-199: LGTM!
… rebuilds Separate the monolithic container image into a deps base image (test.deps.containerfile) that installs system packages and Python dependencies, and a thin app image (test.containerfile) that only copies source code on top. The Makefile tracks a hash of pyproject.toml and uv.lock to skip rebuilding the deps image when dependencies haven't changed, making source-only iterations significantly faster. Remove existing images by name before building replacements to prevent dangling <none>:<none> images from accumulating and wasting disk space.
53332b0 to
22c0fd5
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Makefile`:
- Around line 42-45: The deps rebuild can fail because
build-llama-stack-deps-image removes $(LLAMA_STACK_DEPS_IMAGE) while a child
$(LLAMA_STACK_IMAGE) still exists; update the Makefile so the child app image is
removed before the parent: either move the existing removal of
$(LLAMA_STACK_IMAGE) (the app image cleanup currently in build-llama-stack-image
/ ensure-llama-stack-deps-image flow) to run before build-llama-stack-deps-image
attempts to rmi $(LLAMA_STACK_DEPS_IMAGE), or add a step inside
build-llama-stack-deps-image to remove $(LLAMA_STACK_IMAGE) first, and ensure
you check CONTAINER_RUNTIME is non-empty before invoking it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5ad44c84-776d-4154-a0ca-340dc9863084
📒 Files selected for processing (3)
.gitignoreMakefiledeploy/llama-stack/test.containerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-15T18:53:54.785Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:742-769
Timestamp: 2026-04-15T18:53:54.785Z
Learning: In `src/models/requests.py`, keep `ResponsesRequest.validate_body_size` using `json.dumps(values)` with Python defaults (including ASCII escaping and the default separators). This default formatting is intentional to make the 65,536-character limit conservatively strict (accounting for small encoding/format overhead). Do not recommend changing it to `ensure_ascii=False` or using compact separators for this validator, since an exact wire-format byte match with the client payload is not achievable via `json.dumps` anyway.
Applied to files:
.gitignore
📚 Learning: 2026-04-20T15:09:45.119Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:45.119Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, treat the line `_get_rh_identity_context = get_rh_identity_context` as an intentional temporary backward-compatibility shim (introduced for PR `#1548`, Splunk HEC telemetry work). Do not flag it as dead/unnecessary/unused code during review until the planned part 3 is merged and the responses endpoint is fully wired up such that no tests or external consumers reference the underscore-prefixed name.
Applied to files:
.gitignore
🪛 checkmake (0.3.2)
Makefile
[warning] 36-36: Target body for "build-llama-stack-deps-image" exceeds allowed length of 5 lines (12).
(maxbodylength)
[warning] 50-50: Target body for "ensure-llama-stack-deps-image" exceeds allowed length of 5 lines (13).
(maxbodylength)
[warning] 65-65: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (8).
(maxbodylength)
[warning] 97-97: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (47).
(maxbodylength)
[warning] 162-162: Target body for "clean-llama-stack" exceeds allowed length of 5 lines (9).
(maxbodylength)
🔇 Additional comments (2)
.gitignore (1)
198-199: LGTM!deploy/llama-stack/test.containerfile (1)
27-33: 🩺 Stability & Availabilitydeploy/llama-stack/test.containerfile: COPY makes providers/ dependency files effectively mandatory; the later
if [ -f ... ]can’t prevent build failure.
- In this repo checkout,
providers/pyproject.tomlandproviders/uv.lockare missing, so any build context that lacks them will fail atCOPY providers/pyproject.tom[l] providers/uv.loc[k] ./providers/(before theif [ -f providers/pyproject.toml ]; then ...guard).- CI does fetch submodules recursively in several workflows (e.g., e2e/build/release), but this doesn’t guarantee the specific image-build job for this Containerfile always materializes those files.
- Align the Containerfile’s behavior with the Makefile’s “tolerate missing providers” hashing intent, or ensure the container build always has the required
providers/pyproject.toml+providers/uv.lockpresent in the build context. [verify_review_comment:deploy/llama-stack/test.containerfile:27-33,63]
| @if $(CONTAINER_RUNTIME) image inspect $(LLAMA_STACK_DEPS_IMAGE) >/dev/null 2>&1; then \ | ||
| echo "Removing existing deps image to avoid dangling images..."; \ | ||
| $(CONTAINER_RUNTIME) rmi $(LLAMA_STACK_DEPS_IMAGE); \ | ||
| fi |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🌐 Web query:
Do Docker and Podman allow removing a parent/base image tag via rmi when a tagged child image built FROM it still exists, or do they fail with a dependency conflict?
💡 Result:
Docker: No—removing (via docker rmi) a base/parent image while a tagged child image that is built FROM it still exists causes a dependency conflict. Docker’s docker image rm removes images from the local host (and untags as needed), but if the image you’re removing has a dependent child image, the removal is canceled as a conflict (even with -f in the parent/child case). [1][2] Podman: Also no—podman rmi will fail when the target image has child images (or is used by a container). Podman’s rmi exit status documentation explicitly states that “one of the specified images has child images or is being used by a container” causes failure. [3][4] Practical implication (both): you must remove the dependent child images first, then remove the parent/base image; otherwise rmi reports a conflict/dependency error. [1][3]
Citations:
- 1: https://docs.docker.com/reference/cli/docker/image/rm/
- 2: https://stackoverflow.com/questions/38118791/can-t-delete-docker-image-with-dependent-child-images
- 3: https://docs.podman.io/en/stable/markdown/podman-rmi.1.html
- 4: https://docs.podman.io/en/v5.7.1/markdown/podman-rmi.1.html
Fix deps rebuild ordering to prevent docker/podman rmi dependency conflicts
Docker/Podman won’t remove a base/parent image if a child image still exists. The Makefile removes $(LLAMA_STACK_DEPS_IMAGE) in build-llama-stack-deps-image (lines 42-45) while build-llama-stack-image calls ensure-llama-stack-deps-image before removing the existing app image (lines 65-70), so a stale $(LLAMA_STACK_IMAGE) built FROM the deps image can make the deps rebuild fail/abort.
Suggested fix: remove $(LLAMA_STACK_IMAGE) (child) before removing $(LLAMA_STACK_DEPS_IMAGE) (parent)—either by moving the app cleanup earlier than ensure-llama-stack-deps-image, or by adding the app-image removal inside the deps build target while ensuring $(CONTAINER_RUNTIME) is checked for non-empty before it’s used.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Makefile` around lines 42 - 45, The deps rebuild can fail because
build-llama-stack-deps-image removes $(LLAMA_STACK_DEPS_IMAGE) while a child
$(LLAMA_STACK_IMAGE) still exists; update the Makefile so the child app image is
removed before the parent: either move the existing removal of
$(LLAMA_STACK_IMAGE) (the app image cleanup currently in build-llama-stack-image
/ ensure-llama-stack-deps-image flow) to run before build-llama-stack-deps-image
attempts to rmi $(LLAMA_STACK_DEPS_IMAGE), or add a step inside
build-llama-stack-deps-image to remove $(LLAMA_STACK_IMAGE) first, and ensure
you check CONTAINER_RUNTIME is non-empty before invoking it.
Description
Separate the monolithic container image into a deps base image (test.deps.containerfile) that installs system packages and Python dependencies, and a thin app image (test.containerfile) that only copies source code on top. The Makefile tracks a hash of pyproject.toml and uv.lock to skip rebuilding the deps image when dependencies haven't changed, making source-only iterations significantly faster.
Remove existing images by name before building replacements to prevent dangling : images from accumulating and wasting disk space.
make runtakes around 1 min to build the developer build which was around 2.5 mins before (mac M3 pro)Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
make runshould just rebuild the app image and should take less time than before.make runshould rebuild both deps and app images.none:noneimages after executingmake runfor multiple times.Summary by CodeRabbit
Chores
Refactor