Skip to content

test: expand unit-test coverage across pkg/#51

Open
luthermonson wants to merge 24 commits into
mainfrom
tests/expand-unit-coverage
Open

test: expand unit-test coverage across pkg/#51
luthermonson wants to merge 24 commits into
mainfrom
tests/expand-unit-coverage

Conversation

@luthermonson
Copy link
Copy Markdown
Contributor

Summary

Adds 21 prioritized unit tests across 12 packages from a coverage audit, plus three small fixes uncovered while landing them.

Test additions (21 items)

  • Group A — config / workflow / metrics / localtunnel / vm: Provider detection + validate (5 forges, App vs PAT, webhook secret), duration parsers ("7d"/"168h"/invalid), Container.UnmarshalYAML (string/mapping/null), Prometheus label cardinality guardrail, dropped-connection abort + retry schedule, macOS SSH key material.
  • Group B — dind / cni: tar path-traversal rejection (cross-platform + Linux build-context), gzip magic detection in extractTarGz, handleImageBuild early error paths (nil client, missing Dockerfile, invalid tar), streamToTempFile streaming (large/partial/error/empty).
  • Group C — github / scheduler: AppAuth.Token() concurrent refresh (32 goroutines, asserts single upstream call), webhook registration partial-failure rollback, backoffDuration repo-state + 60s cap, concurrent claim dedup, destroy-still-runs-after-cancel.
  • Group D — forgerunner / runtime / networking: ParseCommand edge cases (= in values, malformed prefixes), heredoc parsing (empty delimiter, missing EOF, CRLF, nested), SecretMasker ordering / unicode / 150 secrets, CleanOldLogs symlinks/permissions/mid-walk, pullMu serialization (10 goroutines, panic-releases-lock), pickSubnet retry + 10.199.0.0/16 fallback.

Production code changes

Five small, behavior-preserving extractions to make logic testable:

  • pkg/networking/networking.gopickSubnetFromAddrs([]net.Addr) + subnetInUseAmong
  • pkg/runtime/runtime.gowithPullLock(fn) + pullImageLocked
  • pkg/dind/exec.gostreamToTempFile
  • pkg/github/appauth.go — optional tokenURL/httpClient fields (zero-valued = production behavior)

Fixes uncovered while testing

  • pkg/scheduler/scheduler.go — three _ = s.cfg.Runtime.Destroy(...) calls now log on error, matching the existing LinuxDispatcher.Destroy pattern. Found while writing scheduler cancel-cleanup tests.
  • mage/download/download.go — bump LinuxVirtVersion 6.12.81-r06.12.83-r0. Alpine rotated the package out of v3.21 main, breaking mage ci build stage. Verified both x86_64 and aarch64 URLs.
  • pkg/vm/embed_test.go — drop incidental "placeholder" substring check on the error message; the actual contract is on the returned filename, which is still asserted.

Test plan

  • mage ci passes locally (lint, test, build) on the integrated branch
  • All in-scope packages green; no flakes at -count=10 on concurrency tests
  • CI run on Linux exercises -race (Windows toolchain has no gcc)
  • Reviewer skim of the 5 production extractions for behavior preservation

Three cleanup sites silently discarded the error from
Runtime.Destroy via _ =, violating the project's no-blank-discard
rule. Bring them in line with the existing LinuxDispatcher.Destroy
pattern in the same code paths: log a warning with job_id and the
error.

Sites: handleQueued goroutine cleanup, handleCompleted env branch,
destroyAll env branch.
TestFindEmbedded_SkipsPlaceholder asserted that the error message
returned when no real rootfs is embedded did not contain the word
"placeholder". The error message at linuxvm_windows.go:812 legitimately
includes that word as user-facing guidance ("only the placeholder
rootfs is embedded"), so the test fails on any build where the rootfs
is a placeholder — which is the exact case the test was meant to cover.

The remaining check on the matched filename already covers the actual
contract: findEmbedded must not return a file containing "placeholder".
pkg/config: add tests for GitHub App + PAT happy paths, env-token fallback
when only owner is set, hex-format webhook secret, ngrok-tunnel secret
generation, multi-provider validation paths, and additional duration parser
edge cases ("7d", "168h", single "d", malformed days, mixed units).
Also covers Providers() ordering across all 5 providers.

pkg/workflow: cover Container.UnmarshalYAML for both string shorthand and
mapping forms, including null/empty inputs, mappings without an image key,
mappings with extra unknown fields, sequence inputs (which must error),
nested-in-Job parsing (both forms), and a regression guard against the
type-alias trick that prevents recursive UnmarshalYAML calls.
Per-job label keys must remain bounded (provider, repo, status, etc.) to
avoid Prometheus series explosion. New tests assert each *Vec metric's
declared variable labels match the expected set and reject any "banned"
high-cardinality labels (job_id, run_id, sha, runner, jit_id, event_id).

Also adds a scrape-the-exposition test that triggers each metric, hits
/metrics over HTTP, and confirms no banned label name appears in the
serialized output. Future label additions that include per-run identifiers
will fail this guardrail.
Adds a fakeTunnelServer harness that registers via httptest and exposes
a real TCP listener on a random port. Tests cover the drop-and-abort
path that proxy() takes when the backhaul connection closes (no
auto-reconnect at the per-conn level — abort propagates to Accept()),
plus context-cancel behavior for clean shutdown.

Also documents the i*i*3-second backoff schedule used by proxy()'s
initial dial-retry loop and verifies conn.Done() handles repeated
close attempts via sync.Once (preventing a closed-channel panic during
reconnect/cleanup).
pkg/vm: add cross-platform tests for the SSH key injection layer that
macosvm_darwin.go consumes via MacOSVMConfig. Verifies the generated
ed25519 private key satisfies the runtime type assertion used in
setupRunnerViaSSH/monitorRunner, that the public key parses as an
authorized_keys entry, that priv↔pub agree on the wire, and that
strings.TrimSpace handles the trailing newline cleanly. Also documents
the partial-key-material cases (signer-only, pubkey-only, neither).

pkg/vm: relax TestFindEmbedded_SkipsPlaceholder — the error message
returned when no real rootfs is embedded legitimately mentions
"placeholder" as informational text. Only the matched filename
matters for the assertion.
Adds Linux-only unit tests for extractTarGz exercising:
- gzip magic detection and rejection of non-gzip streams
- directory and regular-file entries
- path traversal rejection (../, absolute, mid-path escape)
- symlink handling within destination
- corrupt and truncated archives
- unknown tar entry types silently ignored
Adds focused unit tests for the dind tar extractors:
- extractTar (cross-platform): symlink and regular path traversal,
  rejection of ../, absolute paths, mid-path escapes; symlink in/out of
  dest; truncated and empty archives. Symlink-specific cases skip on
  Windows where unprivileged symlink creation is unreliable.
- extractBuildContext (Linux-only): plain tar and gzip-tar code paths,
  gzip magic detection, traversal rejection in both branches, corrupt
  gzip handling, empty/short bodies, directory entries.
Drives the early error paths of the /build handler without invoking
real buildah:
- nil containerd client returns 500 immediately
- versioned route (/v1.45/build) reaches the handler
- invalid (non-tar) request body surfaces an error message in the
  streamed JSON response
- valid tar without a Dockerfile reports the missing-Dockerfile error
- custom Dockerfile name from the query string is honored

Uses a sentinel zero-value *client.Client so the early nil-check
passes and the request progresses far enough to exercise context
extraction and Dockerfile lookup before any client method is called.
Pulls the temp-file streaming logic out of copyToViaExec into a small
streamToTempFile helper so its behavior is unit-testable without a
real containerd. Behavior is preserved: create temp file, io.Copy
the request body in, close, and return the path. On error the helper
cleans up the partial temp file and returns a wrapped error.

Tests cover the cases called out in the worktree spec:
- small payload round-trips correctly
- large (2 MiB) payload streams without truncation
- chunked / partial-write reader (mimics slow HTTP body)
- mid-stream read error surfaces a wrapped error and removes the
  partial temp file
- empty body produces a zero-byte file
- CreateTemp pattern is honored (suffix and prefix preserved)

The full async-wait ordering inside copyToViaExec (proc.Wait before
proc.Start) requires a real containerd task and is left to the
existing test/e2e/dind privileged suite.
Adds unit tests for two correctness-critical github paths:

- AppAuth.Token() concurrent refresh: 32 goroutines hit Token() inside
  the 5-minute expiry window; the test asserts only one upstream HTTP
  refresh occurs and that all callers see a valid token. A second test
  asserts that once a long-lived token is cached, subsequent calls take
  the RLock-only fast path. To enable testability, AppAuth gains
  internal-only tokenURL and httpClient fields (zero-valued -> existing
  production behavior).

- RegisterWebhooks partial-failure rollback: a multi-repo registration
  whose 3rd POST fails must DELETE the hooks already created for the
  earlier repos. A second test asserts that even if one delete fails
  during rollback, the remaining hooks are still cleaned up.

Both tests use httptest.Server / counting RoundTripper / fake handler
mux — no network calls leave the test process.
Adds three groups of unit tests around scheduler invariants the existing
suite did not exercise:

- backoffDuration repo-level state: a simulated provider event sequence
  (failure/success/failure) that asserts resetBackoff returns the next
  failure to the 2s base; a 60s cap stress test running 50 iterations;
  and an interleaved multi-repo test guarding against accidental shared
  state in the failureCounts map.

- Concurrent claim + dedup (item #13): 32 goroutines call handleQueued
  for the same job id at the same time; only one ClaimJob and one
  dispatch CreateJob fire. A companion test verifies that two providers
  both claiming the SAME numeric job id each get exactly one claim
  (composite-key routing).

- Timeout / cancellation during cleanup (item #14): handleCompleted
  invoked with a cancelled parent ctx still calls dispatch.Destroy via
  context.Background; destroyAll destroys every dispatched container
  even when the parent ctx is cancelled; handleLinuxJob whose Wait
  errors triggers a Destroy on the cleanup path; destroyAll Stops every
  macosVM exactly once.

Tests use an in-process gRPC dispatch fake (TCP loopback) plus a
counting provider mock so no external systems are touched.
Extends commands_test.go with edge cases for:
- ParseCommand: '=' inside values, naive comma splitting (no quote
  support), empty messages, malformed prefixes (missing closing '::',
  empty name, etc.), params without '=', and double-colon in value
- ParseFileCommands heredoc: empty delimiter, missing EOF, CRLF line
  endings, nested EOF tokens (substring matches must not terminate),
  immediate EOF, multiple heredoc blocks
- SecretMasker: overlapping secrets (documents that registration order
  determines masking), empty list, secrets under 3 chars dropped at
  registration and AddSecret, multi-byte UTF-8 secrets, deterministic
  ordering with 150 secrets, multiple occurrences of the same secret

These tests document the parser's current behavior so future changes
(e.g. adding quote support or normalizing CRLF) intentionally update
the assertions instead of silently regressing.
Extracts pickSubnetFromAddrs(log, []net.Addr) from pickSubnet so the
selection logic can be tested without touching the host's interfaces.
The new helper takes a snapshot of addresses and runs the same strategy:
prefer DefaultSubnet, retry up to 10 random 10.x.0.0/16 candidates,
fall back to 10.199.0.0/16. pickSubnet now wraps it with a tiny
hostInterfaceAddrs() collector. subnetInUse is also rebased onto a
testable subnetInUseAmong helper for the same reason.

New tests:
- DefaultFreeReturnsDefault: no overlap → DefaultSubnet, no retries
- NoAddrs: nil/empty slice falls through to DefaultSubnet
- DefaultBusyPicksAlternative: conflict on default → some other 10.x
- AllTenRangesBusyFallsBack: every 10.x /16 occupied → 10.199.0.0/16
- SkipsBusyCandidatesUntilFree: handful of blocked /16s, succeeds
- subnetInUseAmong: positive, negative, and invalid-CIDR paths

No behavior change in production code — pickSubnet's external contract
is identical.
CleanOldLogs (item #15):
- StaleSymlinkRemoved: a .log symlink older than maxAge is unlinked
  while its target file remains untouched
- BrokenSymlinkRemoved: a dangling .log symlink is removed (Lstat path)
- PermissionDeniedFile: chmodding the dir read-only causes os.Remove
  to fail; CleanOldLogs logs at debug level and continues
- FileRemovedBeforeStat: simulates the mid-walk race by deleting
  files before the loop runs, asserting the function tolerates
  missing entries
- LogSuffixOnDirIgnored: files whose name contains ".log" but doesn't
  end with it are skipped (mirrors existing dir-with-.log-name test)
- BoundaryAge: a file barely younger than maxAge is preserved
- DiscardsOutputWriter: confirms the helper logs the expected key
  ("removed old job log") at debug level

Symlink and permission tests skip on Windows because they need
either developer-mode privileges (for symlinks) or Unix file-mode
semantics (for chmod-based permission denial).

PullImage / pullMu (item #16):
- Extracts a small withPullLock(fn) helper from PullImage so the
  mutex contract can be tested without a real containerd client.
  PullImage's external behaviour is unchanged; the body of the
  function moved verbatim into pullImageLocked.
- SerializesConcurrentCallers: spawns 10 goroutines, asserts the
  in-flight critical-section count never exceeds 1
- FailedCallDoesNotPoison: an error from fn doesn't keep the lock
  held — the next call still acquires successfully
- PanicReleasesLock: a panic inside fn unlocks via the deferred
  Unlock; a follow-up call returns within 2 seconds (instead of
  deadlocking)
- ConcurrentSuccessAndFailure: 20 interleaved calls, half failing,
  produce the expected per-call result with no shared state damage
…k mtime

TestCleanOldLogs_StaleSymlinkRemoved was failing in CI because
os.Chtimes follows symbolic links on Unix, so the call was modifying
the *target* file's mtime — leaving the symlink itself looking fresh,
so CleanOldLogs (which uses Lstat-style ModTime) didn't remove it.

Add a small lchtimes helper backed by golang.org/x/sys/unix.Lutimes on
Unix and a stub on Windows (the symlink test skips Windows because
symlink creation needs admin/developer-mode privileges there).
The function was in build.go which imports buildah/podman C-dependent
packages that fail to compile in CI's golangci-lint pass. Moved to
buildcontext_linux.go with only stdlib imports so the test file can
resolve it. Also fixed three blank error discards (gz.Close, f.Close,
os.Remove) and two errorlint violations (bare == comparisons on
sentinel errors).
@luthermonson luthermonson force-pushed the tests/expand-unit-coverage branch from 94308b8 to 5967d57 Compare April 28, 2026 02:17
The buildah handler was replaced by BuildKit on main. Tests that
injected a fake containerd client to drive tar-extraction and
missing-Dockerfile error paths no longer apply — those code paths
now live inside the BuildKit solver. Updated to test the 501
no-BuildKit response and versioned route, which is the behavior
the handler actually exposes without a running solver.
containerd's content store does not persist zero-length blobs, so
stageSyntheticImage's empty []byte{} layer was never actually written.
When the push handler walked the manifest to upload blobs it failed with
"content digest sha256:e3b0c44...: not found".

Use a non-empty byte slice so the content store actually ingests the blob.
concurrent_test.go and dispatch_test.go both declared fakeDispatchServer
and startFakeDispatchServer with incompatible signatures, breaking the
build. Merge them into a single fake that supports both atomic counters
(for concurrent claim tests) and per-RPC request slices (for dispatch
client round-trip tests). startFakeDispatchServer now returns the bound
address alongside the connected DispatchClient so both call sites are
satisfied.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant