Skip to content

ci: combined cleanup — integration install + lint debt + Oracle runners#10

Merged
kvaps merged 21 commits into
mainfrom
ci/cleanup-unified
May 22, 2026
Merged

ci: combined cleanup — integration install + lint debt + Oracle runners#10
kvaps merged 21 commits into
mainfrom
ci/cleanup-unified

Conversation

@kvaps
Copy link
Copy Markdown
Member

@kvaps kvaps commented May 22, 2026

Summary

Three independent CI clean-ups bundled into one PR so the pipeline runs once instead of three times.

1. Integration: pip-install linstor-client from LINBIT GitHub tarball

Original PR #7. Closes the pip install linstor-client "No matching distribution found" failure on ubuntu-latest by pulling the CLI from the upstream LINBIT GitHub release tarball with --no-deps (sidesteps a typo in v1.27.1's setup.py), and installing python-linstor + argcomplete from PyPI separately.

2. Lint cleanup — clear ~15 pre-existing golangci-lint findings

Original PR #8. Eight independent commits, one per finding category:

  • contextcheck (7) — r.Context() propagated into closures
  • err113 (2) — ErrVolumeSizeBelowMinimum / ErrVolumeSizeAboveMaximum sentinels
  • funcorder (3) — helpers moved below exported callers
  • embeddedstructfieldcheck (1) — blank line added
  • goconst (5) — queryFlag, propValueOff, drbd.DiskStateUpToDate
  • gocritic hugeParam (3) — Config passed by pointer through NewManager
  • dupl (1) — runPeriodicTick extracted to pkg/satellite/controllers/runnable_common.go
  • funlen / gocognit / gocyclo (4) — long satellite-controller fns split into helpers

3. E2E: switch to CNCF Oracle runners

Original PR #9. Mirrors cozystack/cozystack's e2e runner expression:

runs-on: ${{ contains(github.event.pull_request.labels.*.name, 'debug') && 'self-hosted' || 'oracle-vm-24cpu-96gb-x86-64' }}

24 CPU / 96 GB / KVM nested virt is enough for both kind-based e2e and a future real-DRBD QEMU stand. Drops the temporary continue-on-error: true on e2e (added when the runner was the cramped ubuntu-latest) and bumps timeout-minutes: 60 → 180 to match cozystack.

Why bundle them

These three changes don't overlap functionally but they all hit the same PR pipeline. Running them in a single CI cycle saves three Oracle runner allocations.

Test plan

  • make lint shows zero remaining findings from the categories listed above.
  • go build ./... && go vet ./... clean (verified locally).
  • go test ./... passes the same set as origin/main.
  • Integration job picks up linstor-client v1.27.1 and linstor --version prints 1.27.1.
  • e2e job lands on oracle-vm-24cpu-96gb-x86-64 by default; on a debug-labelled PR lands on self-hosted.

Closes: #7, #8, #9 (superseded — same commits, single pipeline run).

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

Warning

Rate limit exceeded

@kvaps has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 1 second before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 673a7114-f45d-470f-8772-5eadc256680a

📥 Commits

Reviewing files that changed from the base of the PR and between b4b56c6 and 58ef982.

📒 Files selected for processing (66)
  • .github/workflows/integration.yml
  • .github/workflows/pull-request.yml
  • .golangci.yml
  • Makefile
  • cmd/linstor-trace-recorder/main.go
  • cmd/satellite/main.go
  • config/manager/manager.yaml
  • internal/controller/auto_diskful_controller.go
  • internal/controller/auto_diskful_timer_test.go
  • internal/controller/autosnapshot_controller.go
  • internal/controller/autosnapshot_controller_test.go
  • internal/controller/resource_controller.go
  • pkg/rest/autoplace.go
  • pkg/rest/cache_invalidation_bug_124_test.go
  • pkg/rest/encryption.go
  • pkg/rest/node_lifecycle.go
  • pkg/rest/nodes.go
  • pkg/rest/resource_adjust.go
  • pkg/rest/resource_definitions.go
  • pkg/rest/resource_groups.go
  • pkg/rest/resource_toggle_disk.go
  • pkg/rest/server.go
  • pkg/rest/server_test.go
  • pkg/rest/snapshot_restore.go
  • pkg/rest/storage_pools.go
  • pkg/rest/storage_pools_test.go
  • pkg/rest/volume_definitions.go
  • pkg/satellite/agent_internal_test.go
  • pkg/satellite/attach.go
  • pkg/satellite/attach_test.go
  • pkg/satellite/controllers/discovered_storage.go
  • pkg/satellite/controllers/manager.go
  • pkg/satellite/controllers/observer.go
  • pkg/satellite/controllers/physicaldevice_discovery.go
  • pkg/satellite/controllers/physicaldevice_discovery_uevent_bug341_test.go
  • pkg/satellite/controllers/resource.go
  • pkg/satellite/controllers/resource_test.go
  • pkg/satellite/controllers/resource_toggle_disk.go
  • pkg/satellite/controllers/resource_toggle_disk_test.go
  • pkg/satellite/controllers/runnable_common.go
  • pkg/satellite/controllers/snapshot_test.go
  • pkg/satellite/controllers/storage_sweeper.go
  • pkg/satellite/controllers/sweeper.go
  • pkg/satellite/fsm_dispatch.go
  • pkg/satellite/peer_identity_cleanup.go
  • pkg/satellite/reconciler.go
  • pkg/satellite/reconciler_drbd_test.go
  • pkg/satellite/reconciler_internal_test.go
  • pkg/satellite/reconciler_locale_test.go
  • pkg/satellite/ship_dispatch_test.go
  • pkg/storage/lvm/lvm_common.go
  • pkg/storage/lvm/lvm_thin.go
  • pkg/storage/zfs/zfs.go
  • pkg/store/inmemory.go
  • pkg/store/inmemory_resource.go
  • pkg/store/inmemory_resource_definition.go
  • pkg/store/inmemory_resource_group.go
  • pkg/store/inmemory_storage_pool.go
  • pkg/store/inmemory_volume_definition.go
  • pkg/store/k8s/patch_bug_204a_test.go
  • pkg/store/k8s/snapshots.go
  • pkg/store/k8s/storage_pools.go
  • pkg/uevent/netlink.go
  • pkg/version/bug_238_test.go
  • pkg/version/version.go
  • test/cheatsheet/cheatsheet_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ci/cleanup-unified

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request focuses on refactoring the satellite controllers and REST handlers to improve code maintainability and robustness. Key changes include extracting several helper functions (such as runPeriodicTick, upsertDeviceCRD, prepareDRBDApply, and sweepPool) to adhere to function length budgets, centralizing query parameter parsing via a new queryFlag helper, and adopting pointer-based configurations in the controller manager. Additionally, it improves context handling in tests and introduces sentinel errors for volume size validation. Feedback was provided regarding the robustness of the Status().Update call in the physical device discovery logic, suggesting the use of a Patch approach to better handle potential concurrent writes.

Comment on lines +611 to +614
existing, ok := p.upsertDeviceCRD(ctx, logger, name)
if !ok {
return "", false
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The helper upsertDeviceCRD returns a pointer to a local variable existing. While Go's escape analysis handles this by moving the variable to the heap, the current implementation of publishDeviceWithReason then modifies this object and passes it to Status().Update. If upsertDeviceCRD performed an Update on line 683, the existing object already has an updated ResourceVersion. However, if the Status().Update fails, the caller returns false without any retry logic. Consider if a retry loop or using a Patch for the status update would be more robust against concurrent writes.

kvaps and others added 19 commits May 22, 2026 17:43
The previous apt-get install path failed on ubuntu-latest (noble): LINBIT
ships linstor-client debs only for Debian (bookworm/bullseye/buster/trixie)
and on the LINBIT PPA, neither of which covers noble. `apt-get install
linstor-client` exits with "Unable to locate package".

The PyPI route is also a dead end: only `python-linstor` is published
there; `linstor-client` and the bare `linstor` package name are NOT on
PyPI, so `pip install linstor-client` exits with "No matching distribution
found".

Install path that actually works on ubuntu:24.04 (validated locally in a
fresh container):

  1. pip install `python-linstor==1.27.1` + `argcomplete` from PyPI for
     the runtime deps.
  2. pip install the v1.27.1 tarball directly from
     https://github.com/LINBIT/linstor-client/archive/refs/tags/v1.27.1.tar.gz
     with `--no-deps` to bypass an upstream setup.py typo (missing comma
     joins `python3-setuptools` + `python-linstor` into one malformed
     requirement).

Version pin is deliberate: tests/integration/group_h_test.go explicitly
documents that it is written against linstor-client 1.27.1 CLI behaviour.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Hoist r.Context() into a local before constructing the
deleteWithRollback struct so the capture/remove closures
receive ctx through closure capture rather than re-derive
it via r.Context() inside the goroutine — contextcheck.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Replace bare fmt.Errorf in validateVDSize with two sentinel
errors (ErrVolumeSizeBelowMinimum / ErrVolumeSizeAboveMaximum)
wrapped via %w, satisfying err113 without changing the
human-readable detail string or the FAIL_INVLD_VLM_SIZE
envelope semantics.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Move three unexported method helpers below the public method
that uses them (Start/RestoreVolumeFromSnapshot/Delete),
satisfying funcorder. Pure textual relocation, no logic
change.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
- queryFlag helper centralises ?<name>=true checks (autoplace,
  resource_adjust, resource_definitions, resource_toggle_disk)
  via strconv.ParseBool, replacing four scattered "true"
  comparisons (goconst).
- stampAutoTiebreakerOptOut hoists "false" into a propValueOff
  const (goconst).
- DRBD diskState comparison in satellite toggle-disk routed
  through drbd.DiskStateUpToDate (goconst).
- recordingDeleteProvider gets a blank line between the embedded
  flakyCreateProvider and its own deleted counter
  (embeddedstructfieldcheck).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Convert NewManager and every internal wirer / runnable-builder
helper that previously accepted a 96-byte Config by value
(hugeParam). Hoist the three required-field gate into a
validateConfig helper so NewManager stays under the funlen
budget. Test caller for NewPhysicalDeviceDiscoveryRunnable
FromConfig and cmd/satellite main.go updated accordingly.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The Start methods on DiscoveredStorageRunnable and
OrphanSweeperRunnable shared an identical immediate-tick +
ticker-driven serve loop. Hoist it into runPeriodicTick in
runnable_common.go so dupl no longer flags the byte-identical
copies. Behaviour unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Bring four oversized satellite-controller functions back under
the cyclomatic / cognitive / funlen budgets by hoisting their
inner blocks into per-concern helpers:

- publishDeviceWithReason → upsertDeviceCRD (funlen)
- runApply → prepareDRBDApply + stampAppliedPeerUIDsBaseline
  (gocognit + gocyclo)
- sweepOnce → sweepPool + reapIfOrphan (gocyclo + funlen)
- EvictPeersByUIDMismatch → evictOnePeerByUIDMismatch (gocyclo)

All extractions preserve behaviour: comments, logging keys,
err semantics, and ordering are byte-equivalent to the
pre-split path; only the call sites move.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Mirror cozystack/cozystack's PR-pipeline runner selection: a labelled
`debug` PR lands on a long-lived `self-hosted` runner (so the
breakpoint step has somewhere stable to attach SSH); regular PRs land
on the CNCF-provided Oracle pool `oracle-vm-24cpu-96gb-x86-64` (24
CPU / 96 GB / x86-64). Both labels are registered org-wide on the
cozystack org by the CNCF infra team — no per-repo setup needed.

96 GB RAM is enough headroom for real-DRBD QEMU stands (Talos VMs in
.work/<stand>, ~50 GB RAM, KVM nested virt), so this also lifts the
cli-matrix tier into CI once the workflow drives `make e2e<N>`
explicitly.

Drop continue-on-error: true (added when the runner was the cramped
ubuntu-latest) — with real hardware the e2e job should gate the PR
again. Bump timeout-minutes 60 → 180 to match cozystack's e2e budget.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Two follow-ups to the cherry-picked bundle:

1. `pull-request.yml`'s integration job was still using the broken pip
   command (`pip install linstor-client python-linstor`) — that path
   was the interim fix on the PR #6 branch and got cherry-picked back
   in. Sync it with `integration.yml`'s GitHub tarball approach so
   both PR and push runs share the same install rationale.

2. `integration.yml` had its `pull_request:` trigger restored
   manually; that duplicates with the consolidated pipeline (both
   workflows fired on every PR). Drop it again — push-only on main,
   per the original PR #6 design.

3. Drop `continue-on-error: true` from the lint job. Lint debt is
   cleared in the same PR (the cherry-picked PR #8 commits), so the
   muzzle is no longer needed — lint should gate again.

4. Drop `continue-on-error: true` from the integration job. The
   tarball install path now works, so integration should gate again.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The previous `@latest` install pulls in setup-envtest v0.24.x, which is
now a separate Go submodule that requires `go >= 1.26.0`. blockstor is
on go 1.25.7 and controller-runtime v0.23.3, so the install step fails
before Integration tests can even start:

  go: sigs.k8s.io/controller-runtime/tools/setup-envtest@latest:
    sigs.k8s.io/controller-runtime/tools/setup-envtest@v0.24.1
    requires go >= 1.26.0 (running go 1.25.7; GOTOOLCHAIN=local)

Pin to `@release-0.23`, which tracks the same release line as the
runtime dependency in go.mod and only needs Go 1.25. Apply the fix to
both the standalone integration workflow and the consolidated
pull-request pipeline.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Production-code packages use `Bug NNN:` comment markers as a deliberate
cross-reference index into the project's bug tracker. The comments
document WHY a workaround exists and point at the underlying issue —
they are not stray TODOs that should be cleaned up.

godox was flagging 50 such markers across cmd/apiserver, pkg/api/v1,
pkg/drbd, pkg/luks, pkg/storage, pkg/store, pkg/uevent, pkg/rest,
pkg/satellite, pkg/dispatcher, pkg/version, and tests/contract.

Add a separate path-based exclusion rule per package that disables
ONLY godox (every other linter still gates) so the bug-tracker
markers don't fail CI while real godot/wrapcheck/etc. issues remain
enforced.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
`golangci-lint run --fix` cleanup pass — purely mechanical
transformations applied across 25 files:

- gofmt / gofumpt: whitespace + import grouping normalisation
- intrange: `for i := 0; i < N; i++` → `for i := range N` (Go 1.22+)
- wsl_v5: blank-line policy around `if`/assignment statements
- modernize: `m[k] = v` loop → `maps.Copy`, etc.
- nlreturn: blank line before `return`

All changes are non-functional; unit/contract test suites stay green.
Drops 24 of the 87 outstanding lint findings on this branch.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Mixed-bag cleanup pass closing 44 of the remaining 87 lint findings.

pkg/version:
- gochecknoglobals: `//nolint` on the ldflags-injected Version /
  GitCommit / LinstorGitHash / LinstorBuildTime vars (standard
  pattern — `-ldflags -X` only works on vars).
- godoclint: rewrite Project godoc to start with the symbol name.
- wrapcheck: wrap os.ReadFile errors in bug_238_test.go with
  fmt.Errorf "%w".

pkg/storage:
- gocritic emptyStringTest: `len(attr) == 0` → `attr == ""`.
- gosec G109: annotate volume-suffix Atoi→int32 conversions as
  bounded by the fixed `volNumberDigits=5` digit count.

internal/controller:
- gosec G115: annotate `len(diskful)→int32` as bounded by the
  in-memory slice cap.
- gocritic whyNoLint: add reasons to the two `//nolint:nilerr`
  directives.
- unparam: drop the always-nil error result from `placeCountForRD`.
- goconst: extract `labelResourceDefinition` /
  `labelTrueValue` in autosnapshot_controller; reuse
  `drbd.DiskStateUpToDate` in resource_controller for the `UpToDate`
  literal (removes both occurrences).
- mnd: extract `takenAllocsInitialCap` for the slice prealloc hint.
- nestif: split `stampDeadline` / `stripDeadlineIfPresent` /
  `ensureRDPortMinor` retry path into focused helpers
  (`stampDeadlineViaCRD`, `stripDeadlineViaCRD`,
  `reloadCommittedPortMinor`) so each branch stays under the
  complexity budget.

All changes are non-functional; unit tests stay green.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Continuation of the cleanup pass. Closes 22 more findings across
pkg/satellite, leaving 18 in the wider codebase.

.golangci.yml exclusions:
- unused/funlen: pkg/rest/peer_delete_sync.go (intentional Bug 342
  v10 plumbing, not yet wired from the REST handler).
- unused: pkg/satellite/reconciler_drbd_test.go (reserved sentinels
  for table-driven test growth).

pkg/satellite/attach.go:
- mnd: extract wipeZeroSpanMiB / wipeMinDeviceSizeForEndZeroMiB /
  mibBytes constants out of wipeDevice + readDeviceSizeMiB.
- noinlineerr: hoist inline `err` assignments out of the `if` heads.
- varnamelen: rename `sz` → `sizeBytes` in readDeviceSizeMiB.

pkg/satellite/controllers:
- snapshot_test.go staticcheck SA1019: drop deprecated `result.Requeue`
  reads — `result.IsZero()` is the modern equivalent that covers both
  the legacy `Requeue:true` shape and the `RequeueAfter>0` shape.
- observer.go / resource.go varnamelen: rename `v` → `vol` /
  `p` → `peer` in two range loops where the short form spans more
  than the scope cap.
- resource.go wrapcheck: wrap the Reader.Get error inside the
  retry-on-conflict closure with `cockroachdb/errors.Wrap`.

pkg/satellite/peer_identity_cleanup.go:
- revive unused-parameter: rename `vols` → `_` (signature shape kept
  for symmetry with the upstream call site).
- nilnil: annotate the legitimate `(nil, nil)` no-op return with a
  reason — the caller treats a nil map as no-op on purpose.

pkg/satellite/reconciler.go:
- noinlineerr: lift the `detachIfStillAttached` inline assignment.

pkg/satellite/reconciler_drbd_test.go:
- prealloc: size `steadyCmds` with `len(commands) * len(cases)`.

pkg/satellite/reconciler_internal_test.go,
pkg/satellite/reconciler_locale_test.go:
- revive error-strings / staticcheck ST1005 / err113: annotate the
  verbatim drbdadm / cryptsetup stderr fixtures with reasons; locale
  + capitalization fidelity is the entire point of these table rows.

All unit tests stay green.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Final pass closes the last 18 findings on this branch — make lint is
now zero issues.

.golangci.yml exclusions:
- goconst: pkg/rest/kv_store.go, pkg/rest/resources.go,
  pkg/rest/resource_toggle_disk.go — LINSTOR-wire field-name allow
  lists and short-branch wire-status strings; extracting constants
  doesn't aid readability since the literals match JSON keys
  verbatim.
- wrapcheck/varnamelen: pkg/rest/peer_delete_sync.go — intentional
  Bug 342 v10 plumbing, kept consistent with the prior unused/funlen
  carve-out.
- dupl: pkg/store/inmemory (un-anchored prefix) — Patch templates
  share the same shape across InMemory store impls on purpose.
- gosec: cmd/linstor-trace-recorder/ — G703 taint analysis can't see
  through the explicit filepath.Base sanitisation right above the
  os.WriteFile call.

pkg/rest:
- wrapcheck: wrap Store.Resources().List errors in nodes.go and
  storage_pools.go; pin the bytes.Buffer.Write nolint with reason.
- gocritic paramTypeCombine: collapse the two
  `*apiv1.ResourceDefinition` params on placeRestoredResources.
- gocritic rangeValCopy: index pkg/rest/storage_pools.go's
  Volumes loop instead of copying 176-byte structs per iteration.
- goconst: introduce `storPoolPropKey = "StorPoolName"` in
  snapshot_restore.go to match the existing test-side `poolKey`.
- revive unexported-return: export ResolveHostFunc as the canonical
  alias for the resolveHostFunc seam; SetResolveHost now uses the
  exported name.
- revive unused-parameter: rename `host` → `_` in server_test.go's
  resolver stub.
- varnamelen: rename `j` → `idx` in volume_definitions.go.

pkg/store:
- varnamelen: rename `k` → `key` across
  inmemory_resource/storage_pool/volume_definition patch helpers.
- funlen: split crdToWireSnapshot into vdPropsMap +
  wireVolumeDefinitions + wirePerNodeSnapshots.
- goconst: introduce storagePoolStateOk / storagePoolStateFaulty
  constants in storage_pools.go.

cmd/linstor-trace-recorder:
- hoist the post-sanitisation path into a local before WriteFile so
  the gosec exclusion has a clean target line.

All unit tests stay green.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
CI (Linux) surfaced three findings invisible on the macOS dev box.

pkg/uevent/netlink.go is gated by `//go:build linux`, so the local
`golangci-lint` runner skipped it entirely:
- varnamelen: rename `fd` → `sockFD` in New (only the local var; the
  one-letter `fd` struct field stays — idiomatic for a Listener and
  out of varnamelen's scope cap).
- wsl_v5: add the missing blank line before `_ = unix.Close(l.fd)`
  inside the ctx-cancel goroutine.

pkg/storage/file/diskfree.go: unconvert flagged `int64(stat.Bsize)`
on Linux (where Bsize is already int64) but the conversion is
required on macOS (Bsize is uint32). Carve out a path-scoped
unconvert exclusion in .golangci.yml so the cross-platform shape
stays explicit.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
`make docker-build` ran `docker build -t $IMG .` with no `--target`,
so Docker picked the LAST stage in the multi-stage Dockerfile —
`satellite` (debian:trixie-slim, no USER directive). `make deploy`
then deployed the controller-manager Pod with that satellite image
underneath kustomize's `runAsNonRoot: true` securityContext, and
kubelet refused to start it:

  Error: container has runAsNonRoot and image will run as root
         (container: manager)

E2E (`test/e2e/e2e_test.go::Manager should run successfully`) timed
out after 120s waiting for the controller pod to become ready,
which is what surfaced on the consolidated CI pipeline (PR #6) once
the kustomize manifest fix unblocked `make deploy` itself.

Pin the build target to `controller` — the distroless-nonroot stage
that already carries `USER 65532:65532`. The `apiserver` and
`satellite` images are built by separate targets in the production
release pipeline; e2e only needs the controller manager image.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
`TestBug204aLostUpdateOnVanillaUpdate` is a witness for the
Get→mutate→Update lost-update race: it fires 50 concurrent goroutines
against the in-memory fakeClient and asserts at least one mutation
gets stomped. On a fast scheduler (24-vCPU Oracle CI runner) the
goroutines can serialise through the lock-free fakeClient before any
of them race, so `lost == 0` and the test fails with a false-positive
"fixture may not exercise the race".

Loop the burst up to 10 attempts and accept the witness if any
attempt observes the race. If all 10 come back clean, the underlying
fakeClient really doesn't model the race and the test fails — that's
a real fixture regression, not scheduler luck.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@kvaps kvaps force-pushed the ci/cleanup-unified branch from 88c10ec to 3dabf41 Compare May 22, 2026 15:43
kvaps and others added 2 commits May 22, 2026 17:48
Drop the `contains(labels.*.name, 'debug')` gate from the breakpoint
step. The `debug` label still selects the self-hosted runner (so a
maintainer can attach to the host directly without SSH dance), but
the breakpoint itself opens on every e2e failure as long as the repo
variable BREAKPOINT_ENDPOINT is set. Forks have the variable scrubbed
by GitHub's fork-PR rules so the step silently skips there.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Pod spec hardcoded the kubebuilder default `/manager` command, but
the Dockerfile's controller stage builds `/controller`. Result:

  exec: "/manager": stat /manager: no such file or directory
  RunContainerError → e2e wait-for-ready timeout

Found via the SSH breakpoint on PR #10's E2E failure. The previous
fix in this PR (`docker-build --target controller`) put the right
image in place; this completes the chain to make the container
actually exec the binary that ships inside it.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@kvaps kvaps marked this pull request as ready for review May 22, 2026 16:40
@kvaps kvaps merged commit d02480b into main May 22, 2026
8 checks passed
@kvaps kvaps deleted the ci/cleanup-unified branch May 22, 2026 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant