Skip to content

feat(queue): KubernetesJob broker — run queue jobs as Kubernetes Jobs#35

Open
abnegate wants to merge 6 commits into
mainfrom
feat/queue-kubernetes-job
Open

feat(queue): KubernetesJob broker — run queue jobs as Kubernetes Jobs#35
abnegate wants to merge 6 commits into
mainfrom
feat/queue-kubernetes-job

Conversation

@abnegate

Copy link
Copy Markdown
Member

What

Adds a KubernetesJob Publisher to packages/queue that triggers each enqueued message as a native Kubernetes Job instead of pushing it onto a broker. The cluster scheduler runs a pod to completion per message, so there's no long-running worker process.

  • Message envelope is serialized into the UTOPIA_QUEUE_MESSAGE env var; KubernetesJob::message() rebuilds the Message inside the pod.
  • Honours the Publisher::enqueue($priority) arg (maps to a pod PriorityClass when configured) and Queue::jobTtl (→ ttlSecondsAfterFinished).
  • Configurable image / namespace / command / args / env / backoffLimit / imagePullPolicy, plus a configureJob() hook for advanced manifest tweaks.
  • getQueueSize() and retry() operate over the queue's Jobs via label selector.
  • DNS-1123-safe Job names; queue identity carried in queue.utopia-php.com/* labels + annotations.

Uses appwrite-labs/php-k8s for the Kubernetes API; no library changes were needed (its K8sJob/K8sPod/Container surface covers everything).

Tests

  • Unit (unit suite, bare host): KubernetesJobManifestTest asserts the generated manifest — names, labels/annotations, env-carried message, ttl/jobTtl, priority, pull policy, the config hook, and the message() round-trip.
  • E2E (e2e suite): KubernetesJobTest runs against a real kind cluster provisioned by tests/e2e.sh + the package Tiltfile (builds & loads the worker image, applies the namespace). It proves: jobs run to completion, a failing job is marked failed, and retry() deletes the failed Job and re-enqueues a fresh one.

Verified locally against kind (kindest/node:v1.35.0): unit OK (50 tests), bin/monorepo check queue clean (pint + phpstan level 5), e2e OK (3 tests, 12 assertions).

⚠️ Before CI can go green

appwrite-labs/php-k8s is tagged 0.1.0 but not yet on Packagist. Submit it once at https://packagist.org/packages/submit (repo https://github.com/appwrite-labs/php-k8s); after that, bin/monorepo test queue / CI composer update will resolve appwrite-labs/php-k8s:^0.1 and regenerate the lock. Until then composer resolution for this package will fail.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 25, 2026 09:14

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@greptile-apps

greptile-apps Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Introduces a KubernetesJob publisher that serializes each enqueued message into a UTOPIA_QUEUE_MESSAGE environment variable and creates a native Kubernetes Job per message, with configurable image, namespace, backoff, TTL, pull policy, and a configureJob escape-hatch for advanced manifest tweaks. The implementation is backed by unit tests covering the generated manifest and an e2e suite that runs against a kind cluster provisioned by the bundled e2e.sh.

  • isFailed() now correctly checks the Failed condition in status.conditions (terminal state only), which was the right fix for the in-flight-retry misclassification.
  • The e2e harness pins kind/kubectl/tilt at exact versions with SHA-256 verification before chmod +x, addressing prior supply-chain concerns.
  • appwrite-labs/php-k8s is placed in require-dev with a suggest entry, so consumers of utopia-php/queue who don't use the Kubernetes broker are not forced to install it.

Confidence Score: 4/5

Safe to merge once appwrite-labs/php-k8s is registered on Packagist; the broker logic and test harness are sound but a few rough edges around documentation and the configureJob API surface warrant a second look before broader adoption.

The three main concerns from earlier rounds — the isFailed() misclassification, the missing checksum verification, and the hard-require K8s dependency — have all been addressed in this revision. The remaining open items (README static method reference, configureJob overwrite semantics) are low blast-radius but do affect developer experience for anyone following the docs or building on top of this adapter.

packages/queue/README.md and packages/queue/src/Queue/Broker/KubernetesJob.php deserve a close read before the adapter is declared stable for external consumers.

Important Files Changed

Filename Overview
packages/queue/src/Queue/Broker/KubernetesJob.php Core broker implementation — isFailed() now correctly uses conditions instead of pod counts; getQueueSize logic is sound; slug/sanitizeLabel helpers look correct for DNS-1123 and K8s label constraints.
packages/queue/src/Queue/Broker/KubernetesJob/Envelope.php Clean serialization contract — reads UTOPIA_QUEUE_MESSAGE env var, validates JSON, and throws descriptive RuntimeExceptions on missing or malformed input.
packages/queue/tests/e2e.sh Now pins kind/kubectl/tilt to exact versions with SHA-256 verification before chmod+x; checksum-mismatch causes non-zero exit propagating to setup_kind failure and test skip.
packages/queue/composer.json appwrite-labs/php-k8s correctly placed in require-dev (not require) with a suggest entry; VCS repository registered to resolve the pre-Packagist package.
packages/queue/tests/Queue/E2E/Adapter/KubernetesJobManifestTest.php Comprehensive manifest unit tests covering structure, labels/annotations, DNS-safe name generation, env merging, reserved-env protection, TTL override, priority class, configureJob hook, and Envelope round-trip.
packages/queue/tests/Queue/E2E/Adapter/KubernetesJobTest.php E2E tests cover job completion and terminal failure detection; connect() assumes kubeconfig uses inline certificate-authority-data / client-certificate-data — kind clusters always do, but other contexts may not, which would silently write empty TLS files and fail with a cryptic TLS error.
packages/queue/tests/Queue/servers/KubernetesJob/worker.php Correct worker entry-point — uses Envelope::read() to decode the message and exits 0/1 based on success or exception.
packages/queue/Tiltfile Tilt configuration builds and loads the worker Docker image into the kind cluster and applies the namespace manifest; deps list ensures correct rebuild on source changes.
packages/queue/tests/Queue/servers/KubernetesJob/k8s.yaml Defines the utopia-queue-e2e Namespace used by the e2e suite; straightforward Kubernetes manifest.

Reviews (7): Last reviewed commit: "refactor(queue): move KubernetesJobEnvel..." | Re-trigger Greptile

Comment thread packages/queue/src/Queue/Broker/KubernetesJob.php
Comment thread packages/queue/src/Queue/Broker/KubernetesJob.php Outdated
Comment thread packages/queue/tests/e2e.sh Outdated
Adds a Publisher that triggers each enqueued message as a native
Kubernetes Job (via appwrite-labs/php-k8s) instead of pushing it onto a
broker: the cluster scheduler runs a pod to completion per message, so
no long-running worker is required. The message envelope travels in the
UTOPIA_QUEUE_MESSAGE env var and KubernetesJob::message() rebuilds it
inside the pod.

Honours the new Publisher::enqueue $priority arg (PriorityClass) and
Queue::jobTtl, plus image pull policy and a configureJob() hook.
getQueueSize() counts the queue's Jobs by label selector; per-job
retries are left to the Job's native backoffLimit, so retry() is a no-op.

Unit tests cover manifest construction (bare host); a kind + Tilt e2e
suite (tests/e2e.sh) proves real execution: jobs run to completion and
failures are marked failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@abnegate abnegate force-pushed the feat/queue-kubernetes-job branch from e15e944 to d3f0f0e Compare June 25, 2026 10:07
…wnloads

- isFailed() now keys off the Job's terminal "Failed" status condition instead
  of getFailedPodsCount(), which is non-zero mid-retry when backoffLimit > 0
  (would inflate getQueueSize(failedJobs: true)).
- tests/e2e.sh pins kubectl to a fixed version (no stable.txt) and verifies
  SHA-256 of the kind/kubectl/tilt downloads before granting execute.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@abnegate

Copy link
Copy Markdown
Member Author

Thanks @greptile — addressed in c1f2783:

  1. isFailed() non-terminal under backoffLimit > 0 (P1) — fixed. It now keys off the Job's terminal Failed status condition (via getConditions()) instead of getFailedPodsCount(), so a Job mid-retry is no longer mis-counted by getQueueSize(failedJobs: true). (php-k8s has no hasFailed() helper, so the condition is checked directly, symmetric to hasCompleted().) Verified against a real kind cluster — the failure e2e still passes with backoffLimit: 0.

  2. retry() silently deleting on readMessage() === null (P1) — no longer applicable. That re-enqueue loop (and readMessage()) was removed earlier: retry() is now a no-op, deferring to the Job's native backoffLimit, so there's no delete/discard path.

  3. Unverified binary downloads in e2e.sh (P2) — fixed. kubectl is pinned to a fixed version (no more stable.txt), and kind/kubectl/tilt are downloaded at pinned versions and SHA-256-verified before being made executable.

@greptile-apps

greptile-apps Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

All three fixes look solid. Here's a quick assessment:

1. isFailed() — terminal condition check ✅

The new implementation correctly iterates getConditions() looking for type=Failed, status=True, which is only set once the Job has permanently failed (after exhausting backoffLimit). This is exactly symmetric to hasCompleted() checking for type=Complete. The logic in getQueueSize is now correct:

if ($failedJobs === $this->isFailed($job) && ($failedJobs || !$job->hasCompleted())) {

One minor note: getConditions() could theoretically return a list of objects rather than arrays depending on the php-k8s version/deserialization path. The is_array($condition) guard handles that gracefully, so you're covered.

2. retry() as no-op ✅

Clean and the comment explains the intent well. No concerns here.

3. e2e.sh SHA-256 verification ✅

The pinned versions + sha256sum -c pattern is solid. A couple of observations:

  • The verify() function silences all output (> /dev/null 2>&1) — on a mismatch you get "checksum mismatch for $1" echoed but the calling code uses || return 1, so the overall setup_kind will bail correctly. Works fine.
  • The existing kind/kubectl/tilt binaries found in $PATH (not in $TOOLS) are not verified — they're trusted as-is via command -v. That's an acceptable tradeoff for a dev environment, though worth noting.

Overall the changes are correct and address all the original concerns. LGTM.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

…obEnvelope

- appwrite-labs/php-k8s is added as a committed VCS repository (it is public,
  so no token is needed) instead of relying on Packagist — fixes the failing
  "could not be found" composer resolution in CI.
- The pod-side message decode moves off the Publisher into a small
  KubernetesJobEnvelope (ENV const + read()), so the producer (broker) and the
  in-pod consumer share only the wire contract. Updated the worker entrypoint
  and unit tests accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread packages/queue/src/Queue/Broker/KubernetesJob.php
abnegate and others added 2 commits June 26, 2026 02:59
The KubernetesJob broker is optional, like the Swoole/Workerman adapters, so
appwrite-labs/php-k8s moves from require to require-dev (for this package's own
tests) plus suggest (for consumers). Composer repositories aren't transitive,
so keeping it in require broke resolution for dependents like platform that
don't declare the VCS repo; as require-dev it is never pulled into dependents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the two preg_replace calls with a small shared slug() helper that walks
the string, keeps allowed characters, and collapses runs of the rest into a
single '-'. Same DNS-1123 output, no regex.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Relocate the pod-side decoder under the broker's namespace
(Utopia\Queue\Broker\KubernetesJob\Envelope). No behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants