ci: combined cleanup — integration install + lint debt + Oracle runners#10
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (66)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request focuses on refactoring the satellite controllers and REST handlers to improve code maintainability and robustness. Key changes include extracting several helper functions (such as runPeriodicTick, upsertDeviceCRD, prepareDRBDApply, and sweepPool) to adhere to function length budgets, centralizing query parameter parsing via a new queryFlag helper, and adopting pointer-based configurations in the controller manager. Additionally, it improves context handling in tests and introduces sentinel errors for volume size validation. Feedback was provided regarding the robustness of the Status().Update call in the physical device discovery logic, suggesting the use of a Patch approach to better handle potential concurrent writes.
| existing, ok := p.upsertDeviceCRD(ctx, logger, name) | ||
| if !ok { | ||
| return "", false | ||
| } |
There was a problem hiding this comment.
The helper upsertDeviceCRD returns a pointer to a local variable existing. While Go's escape analysis handles this by moving the variable to the heap, the current implementation of publishDeviceWithReason then modifies this object and passes it to Status().Update. If upsertDeviceCRD performed an Update on line 683, the existing object already has an updated ResourceVersion. However, if the Status().Update fails, the caller returns false without any retry logic. Consider if a retry loop or using a Patch for the status update would be more robust against concurrent writes.
The previous apt-get install path failed on ubuntu-latest (noble): LINBIT
ships linstor-client debs only for Debian (bookworm/bullseye/buster/trixie)
and on the LINBIT PPA, neither of which covers noble. `apt-get install
linstor-client` exits with "Unable to locate package".
The PyPI route is also a dead end: only `python-linstor` is published
there; `linstor-client` and the bare `linstor` package name are NOT on
PyPI, so `pip install linstor-client` exits with "No matching distribution
found".
Install path that actually works on ubuntu:24.04 (validated locally in a
fresh container):
1. pip install `python-linstor==1.27.1` + `argcomplete` from PyPI for
the runtime deps.
2. pip install the v1.27.1 tarball directly from
https://github.com/LINBIT/linstor-client/archive/refs/tags/v1.27.1.tar.gz
with `--no-deps` to bypass an upstream setup.py typo (missing comma
joins `python3-setuptools` + `python-linstor` into one malformed
requirement).
Version pin is deliberate: tests/integration/group_h_test.go explicitly
documents that it is written against linstor-client 1.27.1 CLI behaviour.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Hoist r.Context() into a local before constructing the deleteWithRollback struct so the capture/remove closures receive ctx through closure capture rather than re-derive it via r.Context() inside the goroutine — contextcheck. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Replace bare fmt.Errorf in validateVDSize with two sentinel errors (ErrVolumeSizeBelowMinimum / ErrVolumeSizeAboveMaximum) wrapped via %w, satisfying err113 without changing the human-readable detail string or the FAIL_INVLD_VLM_SIZE envelope semantics. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Move three unexported method helpers below the public method that uses them (Start/RestoreVolumeFromSnapshot/Delete), satisfying funcorder. Pure textual relocation, no logic change. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
- queryFlag helper centralises ?<name>=true checks (autoplace, resource_adjust, resource_definitions, resource_toggle_disk) via strconv.ParseBool, replacing four scattered "true" comparisons (goconst). - stampAutoTiebreakerOptOut hoists "false" into a propValueOff const (goconst). - DRBD diskState comparison in satellite toggle-disk routed through drbd.DiskStateUpToDate (goconst). - recordingDeleteProvider gets a blank line between the embedded flakyCreateProvider and its own deleted counter (embeddedstructfieldcheck). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Convert NewManager and every internal wirer / runnable-builder helper that previously accepted a 96-byte Config by value (hugeParam). Hoist the three required-field gate into a validateConfig helper so NewManager stays under the funlen budget. Test caller for NewPhysicalDeviceDiscoveryRunnable FromConfig and cmd/satellite main.go updated accordingly. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The Start methods on DiscoveredStorageRunnable and OrphanSweeperRunnable shared an identical immediate-tick + ticker-driven serve loop. Hoist it into runPeriodicTick in runnable_common.go so dupl no longer flags the byte-identical copies. Behaviour unchanged. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Bring four oversized satellite-controller functions back under the cyclomatic / cognitive / funlen budgets by hoisting their inner blocks into per-concern helpers: - publishDeviceWithReason → upsertDeviceCRD (funlen) - runApply → prepareDRBDApply + stampAppliedPeerUIDsBaseline (gocognit + gocyclo) - sweepOnce → sweepPool + reapIfOrphan (gocyclo + funlen) - EvictPeersByUIDMismatch → evictOnePeerByUIDMismatch (gocyclo) All extractions preserve behaviour: comments, logging keys, err semantics, and ordering are byte-equivalent to the pre-split path; only the call sites move. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Mirror cozystack/cozystack's PR-pipeline runner selection: a labelled `debug` PR lands on a long-lived `self-hosted` runner (so the breakpoint step has somewhere stable to attach SSH); regular PRs land on the CNCF-provided Oracle pool `oracle-vm-24cpu-96gb-x86-64` (24 CPU / 96 GB / x86-64). Both labels are registered org-wide on the cozystack org by the CNCF infra team — no per-repo setup needed. 96 GB RAM is enough headroom for real-DRBD QEMU stands (Talos VMs in .work/<stand>, ~50 GB RAM, KVM nested virt), so this also lifts the cli-matrix tier into CI once the workflow drives `make e2e<N>` explicitly. Drop continue-on-error: true (added when the runner was the cramped ubuntu-latest) — with real hardware the e2e job should gate the PR again. Bump timeout-minutes 60 → 180 to match cozystack's e2e budget. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Two follow-ups to the cherry-picked bundle: 1. `pull-request.yml`'s integration job was still using the broken pip command (`pip install linstor-client python-linstor`) — that path was the interim fix on the PR #6 branch and got cherry-picked back in. Sync it with `integration.yml`'s GitHub tarball approach so both PR and push runs share the same install rationale. 2. `integration.yml` had its `pull_request:` trigger restored manually; that duplicates with the consolidated pipeline (both workflows fired on every PR). Drop it again — push-only on main, per the original PR #6 design. 3. Drop `continue-on-error: true` from the lint job. Lint debt is cleared in the same PR (the cherry-picked PR #8 commits), so the muzzle is no longer needed — lint should gate again. 4. Drop `continue-on-error: true` from the integration job. The tarball install path now works, so integration should gate again. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The previous `@latest` install pulls in setup-envtest v0.24.x, which is
now a separate Go submodule that requires `go >= 1.26.0`. blockstor is
on go 1.25.7 and controller-runtime v0.23.3, so the install step fails
before Integration tests can even start:
go: sigs.k8s.io/controller-runtime/tools/setup-envtest@latest:
sigs.k8s.io/controller-runtime/tools/setup-envtest@v0.24.1
requires go >= 1.26.0 (running go 1.25.7; GOTOOLCHAIN=local)
Pin to `@release-0.23`, which tracks the same release line as the
runtime dependency in go.mod and only needs Go 1.25. Apply the fix to
both the standalone integration workflow and the consolidated
pull-request pipeline.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Production-code packages use `Bug NNN:` comment markers as a deliberate cross-reference index into the project's bug tracker. The comments document WHY a workaround exists and point at the underlying issue — they are not stray TODOs that should be cleaned up. godox was flagging 50 such markers across cmd/apiserver, pkg/api/v1, pkg/drbd, pkg/luks, pkg/storage, pkg/store, pkg/uevent, pkg/rest, pkg/satellite, pkg/dispatcher, pkg/version, and tests/contract. Add a separate path-based exclusion rule per package that disables ONLY godox (every other linter still gates) so the bug-tracker markers don't fail CI while real godot/wrapcheck/etc. issues remain enforced. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
`golangci-lint run --fix` cleanup pass — purely mechanical transformations applied across 25 files: - gofmt / gofumpt: whitespace + import grouping normalisation - intrange: `for i := 0; i < N; i++` → `for i := range N` (Go 1.22+) - wsl_v5: blank-line policy around `if`/assignment statements - modernize: `m[k] = v` loop → `maps.Copy`, etc. - nlreturn: blank line before `return` All changes are non-functional; unit/contract test suites stay green. Drops 24 of the 87 outstanding lint findings on this branch. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Mixed-bag cleanup pass closing 44 of the remaining 87 lint findings. pkg/version: - gochecknoglobals: `//nolint` on the ldflags-injected Version / GitCommit / LinstorGitHash / LinstorBuildTime vars (standard pattern — `-ldflags -X` only works on vars). - godoclint: rewrite Project godoc to start with the symbol name. - wrapcheck: wrap os.ReadFile errors in bug_238_test.go with fmt.Errorf "%w". pkg/storage: - gocritic emptyStringTest: `len(attr) == 0` → `attr == ""`. - gosec G109: annotate volume-suffix Atoi→int32 conversions as bounded by the fixed `volNumberDigits=5` digit count. internal/controller: - gosec G115: annotate `len(diskful)→int32` as bounded by the in-memory slice cap. - gocritic whyNoLint: add reasons to the two `//nolint:nilerr` directives. - unparam: drop the always-nil error result from `placeCountForRD`. - goconst: extract `labelResourceDefinition` / `labelTrueValue` in autosnapshot_controller; reuse `drbd.DiskStateUpToDate` in resource_controller for the `UpToDate` literal (removes both occurrences). - mnd: extract `takenAllocsInitialCap` for the slice prealloc hint. - nestif: split `stampDeadline` / `stripDeadlineIfPresent` / `ensureRDPortMinor` retry path into focused helpers (`stampDeadlineViaCRD`, `stripDeadlineViaCRD`, `reloadCommittedPortMinor`) so each branch stays under the complexity budget. All changes are non-functional; unit tests stay green. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Continuation of the cleanup pass. Closes 22 more findings across pkg/satellite, leaving 18 in the wider codebase. .golangci.yml exclusions: - unused/funlen: pkg/rest/peer_delete_sync.go (intentional Bug 342 v10 plumbing, not yet wired from the REST handler). - unused: pkg/satellite/reconciler_drbd_test.go (reserved sentinels for table-driven test growth). pkg/satellite/attach.go: - mnd: extract wipeZeroSpanMiB / wipeMinDeviceSizeForEndZeroMiB / mibBytes constants out of wipeDevice + readDeviceSizeMiB. - noinlineerr: hoist inline `err` assignments out of the `if` heads. - varnamelen: rename `sz` → `sizeBytes` in readDeviceSizeMiB. pkg/satellite/controllers: - snapshot_test.go staticcheck SA1019: drop deprecated `result.Requeue` reads — `result.IsZero()` is the modern equivalent that covers both the legacy `Requeue:true` shape and the `RequeueAfter>0` shape. - observer.go / resource.go varnamelen: rename `v` → `vol` / `p` → `peer` in two range loops where the short form spans more than the scope cap. - resource.go wrapcheck: wrap the Reader.Get error inside the retry-on-conflict closure with `cockroachdb/errors.Wrap`. pkg/satellite/peer_identity_cleanup.go: - revive unused-parameter: rename `vols` → `_` (signature shape kept for symmetry with the upstream call site). - nilnil: annotate the legitimate `(nil, nil)` no-op return with a reason — the caller treats a nil map as no-op on purpose. pkg/satellite/reconciler.go: - noinlineerr: lift the `detachIfStillAttached` inline assignment. pkg/satellite/reconciler_drbd_test.go: - prealloc: size `steadyCmds` with `len(commands) * len(cases)`. pkg/satellite/reconciler_internal_test.go, pkg/satellite/reconciler_locale_test.go: - revive error-strings / staticcheck ST1005 / err113: annotate the verbatim drbdadm / cryptsetup stderr fixtures with reasons; locale + capitalization fidelity is the entire point of these table rows. All unit tests stay green. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Final pass closes the last 18 findings on this branch — make lint is now zero issues. .golangci.yml exclusions: - goconst: pkg/rest/kv_store.go, pkg/rest/resources.go, pkg/rest/resource_toggle_disk.go — LINSTOR-wire field-name allow lists and short-branch wire-status strings; extracting constants doesn't aid readability since the literals match JSON keys verbatim. - wrapcheck/varnamelen: pkg/rest/peer_delete_sync.go — intentional Bug 342 v10 plumbing, kept consistent with the prior unused/funlen carve-out. - dupl: pkg/store/inmemory (un-anchored prefix) — Patch templates share the same shape across InMemory store impls on purpose. - gosec: cmd/linstor-trace-recorder/ — G703 taint analysis can't see through the explicit filepath.Base sanitisation right above the os.WriteFile call. pkg/rest: - wrapcheck: wrap Store.Resources().List errors in nodes.go and storage_pools.go; pin the bytes.Buffer.Write nolint with reason. - gocritic paramTypeCombine: collapse the two `*apiv1.ResourceDefinition` params on placeRestoredResources. - gocritic rangeValCopy: index pkg/rest/storage_pools.go's Volumes loop instead of copying 176-byte structs per iteration. - goconst: introduce `storPoolPropKey = "StorPoolName"` in snapshot_restore.go to match the existing test-side `poolKey`. - revive unexported-return: export ResolveHostFunc as the canonical alias for the resolveHostFunc seam; SetResolveHost now uses the exported name. - revive unused-parameter: rename `host` → `_` in server_test.go's resolver stub. - varnamelen: rename `j` → `idx` in volume_definitions.go. pkg/store: - varnamelen: rename `k` → `key` across inmemory_resource/storage_pool/volume_definition patch helpers. - funlen: split crdToWireSnapshot into vdPropsMap + wireVolumeDefinitions + wirePerNodeSnapshots. - goconst: introduce storagePoolStateOk / storagePoolStateFaulty constants in storage_pools.go. cmd/linstor-trace-recorder: - hoist the post-sanitisation path into a local before WriteFile so the gosec exclusion has a clean target line. All unit tests stay green. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
CI (Linux) surfaced three findings invisible on the macOS dev box. pkg/uevent/netlink.go is gated by `//go:build linux`, so the local `golangci-lint` runner skipped it entirely: - varnamelen: rename `fd` → `sockFD` in New (only the local var; the one-letter `fd` struct field stays — idiomatic for a Listener and out of varnamelen's scope cap). - wsl_v5: add the missing blank line before `_ = unix.Close(l.fd)` inside the ctx-cancel goroutine. pkg/storage/file/diskfree.go: unconvert flagged `int64(stat.Bsize)` on Linux (where Bsize is already int64) but the conversion is required on macOS (Bsize is uint32). Carve out a path-scoped unconvert exclusion in .golangci.yml so the cross-platform shape stays explicit. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
`make docker-build` ran `docker build -t $IMG .` with no `--target`,
so Docker picked the LAST stage in the multi-stage Dockerfile —
`satellite` (debian:trixie-slim, no USER directive). `make deploy`
then deployed the controller-manager Pod with that satellite image
underneath kustomize's `runAsNonRoot: true` securityContext, and
kubelet refused to start it:
Error: container has runAsNonRoot and image will run as root
(container: manager)
E2E (`test/e2e/e2e_test.go::Manager should run successfully`) timed
out after 120s waiting for the controller pod to become ready,
which is what surfaced on the consolidated CI pipeline (PR #6) once
the kustomize manifest fix unblocked `make deploy` itself.
Pin the build target to `controller` — the distroless-nonroot stage
that already carries `USER 65532:65532`. The `apiserver` and
`satellite` images are built by separate targets in the production
release pipeline; e2e only needs the controller manager image.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
`TestBug204aLostUpdateOnVanillaUpdate` is a witness for the Get→mutate→Update lost-update race: it fires 50 concurrent goroutines against the in-memory fakeClient and asserts at least one mutation gets stomped. On a fast scheduler (24-vCPU Oracle CI runner) the goroutines can serialise through the lock-free fakeClient before any of them race, so `lost == 0` and the test fails with a false-positive "fixture may not exercise the race". Loop the burst up to 10 attempts and accept the witness if any attempt observes the race. If all 10 come back clean, the underlying fakeClient really doesn't model the race and the test fails — that's a real fixture regression, not scheduler luck. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Drop the `contains(labels.*.name, 'debug')` gate from the breakpoint step. The `debug` label still selects the self-hosted runner (so a maintainer can attach to the host directly without SSH dance), but the breakpoint itself opens on every e2e failure as long as the repo variable BREAKPOINT_ENDPOINT is set. Forks have the variable scrubbed by GitHub's fork-PR rules so the step silently skips there. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Pod spec hardcoded the kubebuilder default `/manager` command, but the Dockerfile's controller stage builds `/controller`. Result: exec: "/manager": stat /manager: no such file or directory RunContainerError → e2e wait-for-ready timeout Found via the SSH breakpoint on PR #10's E2E failure. The previous fix in this PR (`docker-build --target controller`) put the right image in place; this completes the chain to make the container actually exec the binary that ships inside it. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Summary
Three independent CI clean-ups bundled into one PR so the pipeline runs once instead of three times.
1. Integration: pip-install linstor-client from LINBIT GitHub tarball
Original PR #7. Closes the
pip install linstor-client"No matching distribution found" failure on ubuntu-latest by pulling the CLI from the upstream LINBIT GitHub release tarball with--no-deps(sidesteps a typo in v1.27.1'ssetup.py), and installingpython-linstor+argcompletefrom PyPI separately.2. Lint cleanup — clear ~15 pre-existing golangci-lint findings
Original PR #8. Eight independent commits, one per finding category:
r.Context()propagated into closuresErrVolumeSizeBelowMinimum/ErrVolumeSizeAboveMaximumsentinelsqueryFlag,propValueOff,drbd.DiskStateUpToDateConfigpassed by pointer throughNewManagerrunPeriodicTickextracted topkg/satellite/controllers/runnable_common.go3. E2E: switch to CNCF Oracle runners
Original PR #9. Mirrors
cozystack/cozystack's e2e runner expression:24 CPU / 96 GB / KVM nested virt is enough for both kind-based e2e and a future real-DRBD QEMU stand. Drops the temporary
continue-on-error: trueon e2e (added when the runner was the cramped ubuntu-latest) and bumpstimeout-minutes: 60 → 180to match cozystack.Why bundle them
These three changes don't overlap functionally but they all hit the same PR pipeline. Running them in a single CI cycle saves three Oracle runner allocations.
Test plan
make lintshows zero remaining findings from the categories listed above.go build ./... && go vet ./...clean (verified locally).go test ./...passes the same set asorigin/main.linstor-client v1.27.1andlinstor --versionprints1.27.1.oracle-vm-24cpu-96gb-x86-64by default; on adebug-labelled PR lands onself-hosted.Closes: #7, #8, #9 (superseded — same commits, single pipeline run).