Prefer workers with cached actor images during scheduling#298

Open
han2ni3bal-pixel wants to merge 2 commits into
agent-substrate:mainfrom
han2ni3bal-pixel:feat-actor-image-cache-affinity

@han2ni3bal-pixel

Contributor

Fixes #276

Changes

  • atelet periodically reports image digests held in its in-memory cache to ate-api
  • ate-api stores cache information in Valkey, keyed by node
  • During ResumeActor worker selection, prefer nodes where all required images are already cached; fall back to the existing random strategy when there is no cache hit
  • Local snapshot node restrictions remain a hard constraint; image cache affinity is applied only after that

Testing

  • Unit tests
  • Manual verification on Multipass + k3s: resumed the counter actor; cache miss and cache hit both behave as expected; Redis contains imageDigests

@google-cla

google-cla Bot commented Jun 24, 2026


Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Resolve conflict in workflow_resume_test.go by keeping both
poolWithClass from upstream and image-cache affinity tests.
@han2ni3bal-pixel han2ni3bal-pixel force-pushed the feat-actor-image-cache-affinity branch from e98a6cb to f2340fb Compare June 25, 2026 01:28
@juli4n

Collaborator

There have been discussions around soft constraints for scheduling based on node properties (like this one). This applies not only to OCI images but also to other artifacts such as volume snapshots. I would recommend that we discuss this in #276 and agree on a path forward before turning it into an implementation.

FYI Michelle Au (@msau42)
