Skip to content

Cluster roots runner#69179

Open
dwoz wants to merge 5 commits into
saltstack:3008.xfrom
dwoz:cluster-max-voters-2
Open

Cluster roots runner#69179
dwoz wants to merge 5 commits into
saltstack:3008.xfrom
dwoz:cluster-max-voters-2

Conversation

@dwoz
Copy link
Copy Markdown
Contributor

@dwoz dwoz commented May 16, 2026

What does this PR do?

Add cluster roots runner to manually sync file/pillar roots.

Daniel A. Wozniak added 2 commits May 16, 2026 01:02
Operator-driven content fan-out for file_roots and pillar_roots.
Run on whichever master holds the canonical content; the cluster
fans out from there to every peer over the encrypted cluster pub
bus.  Same transport, chunk format, and apply path as the join-time
bulk state-sync.

How it works

1. Runner fires 'cluster/runner/sync_roots' to the local event
   bus with the requested channel list.
2. MasterPubServerChannel.publish_payload intercepts that tag --
   instead of broadcasting as a regular cluster event, it spawns
   _run_root_sync_to_peers(channels).
3. For each peer pusher:
   * Allocate a fresh session id.
   * Send a 'cluster/peer/sync-roots-begin' event so the receiver
     pre-registers a state-sync session -- same contract as the
     join-reply flow, but the receiver's on_complete is a no-op
     (this is an ad-hoc push, not a Raft-learner bootstrap).
   * Stream the requested channels in the standard state-sync chunk
     format.
4. Receivers apply chunks via the existing _apply_state_sync_chunk
   path -- no new install code; the channels are file_roots and
   pillar_roots and install_root_chunk handles both.

Live smoke verified on a 5-master cluster: added new content to m1's
srv/salt (including a nested subdirectory) and srv/pillar after the
cluster was up, ran cluster.sync_roots, all 4 peers had the new
files within ~8 seconds.  Nested directories preserved.

What's NOT in this slice

* No completion feedback -- runner returns immediately with
  status='fan-out initiated'.  Operators tail each peer's log for
  the 'state-sync ... installed N items' lines to confirm delivery.
* No re-sync trigger -- operator-driven only.  No filesystem watcher,
  no periodic poll.

Tests

* test_sync_roots_rejects_invalid_roots -- input validation
* test_sync_roots_no_cluster_id_is_skip -- non-cluster master returns
  a structured skip rather than firing a meaningless event
* test_sync_roots_fires_local_event -- happy path: event fires with
  the resolved channel list
* test_sync_roots_file_only_filters_channels -- channel filter
  honoured
End-to-end integration coverage on the 3-master isolated-FS cluster.
Complements the unit tests in
tests/pytests/unit/runners/test_cluster_runner.py (event firing,
input validation) with full daemon-driven runs that exercise the
runner -> master event bus -> publish_payload intercept ->
_run_root_sync_to_peers -> peer state-sync chunk install path.

Two scenarios:

* test_isolated_sync_roots_runner_propagates_content
  After the cluster is steady-state, write new content to master_1's
  file_roots and pillar_roots, run salt-run cluster.sync_roots, poll
  master_2 and master_3 until both files appear (or 30s elapses),
  and assert the marker round-trips through encrypted state-sync.
  Distinct from test_isolated_late_joiner_receives_file_and_pillar_roots
  which covers JOIN-time bulk sync.

* test_isolated_sync_roots_runner_file_only
  Pins the channels= filter: roots=file syncs only file_roots; the
  pillar tree on peers remains untouched.  Operator escape hatch
  against accidentally fanning out secret pillar data when only an
  SLS update is intended.

Pre-existing trivial change: the commented-out warning log line at
the top of publish_payload (one of those //log it all// debug
crutches) is removed since the function already gets per-event log
output via the cluster-event broadcast path and the targeted
publish_payload branches.

Diagnosis note for future debuggers

While bringing these tests up I hit a confusing failure where the
runner fired the event, master_1's daemon broadcast it as
cluster/event/127.0.0.1/cluster/runner/sync_roots, but the
intercept never ran.  Root cause: salt-factories had cached
/tmp/stsuite/scripts/cli_salt_master.py pointing at a *different*
worktree (saw a stale CODE_DIR=.../masterbug entry), so the test
daemons loaded an older copy of salt that didn't have the
publish_payload intercept.  Clearing /tmp/stsuite/scripts fixed it.
Worth knowing if you see 'feature works in unit test but
integration test silently misses it' — check the factory cache
first.
@dwoz dwoz requested a review from a team as a code owner May 16, 2026 08:12
@dwoz dwoz added the test:full Run the full test suite label May 16, 2026
The five operator-facing runners (cluster.ring_create, ring_destroy,
route_set, route_clear, ring_set) propose entries through the cluster
Raft log, and that only works on the leader.  The runner intercept in
MasterPubServerChannel.publish_payload dispatched locally only — when
the operator invoked the runner on a non-leader master the propose
raised RuntimeError, the exception was swallowed by log.exception, and
no fan-out reached the actual leader.

The intercept now also schedules a cluster_aes-encrypted
cluster/peer/multi-ring-request event to every peer.  Each peer
re-dispatches through _handle_multi_ring_runner_event; only the master
that is the current Raft leader appends an entry, followers log "not
leader" and skip.  No double-commit because Raft has one leader at any
moment.  A new unit-test file pins the wire shape (one publish per
pusher, cluster_aes round-trip, broken-pusher tolerance, no-peers
no-op, missing-secret silent skip).

Also raise per-test timeouts on slow tests that were tripping the
global 90-second pytest-timeout under coverage tracing on loaded GHA
runners.  The workflow's automatic --lf rerun absorbs the resulting
slow-but-passes case so a transient runner-load spike no longer fails
the job:

  * test_check_minions_grain_target_10000 — 240 s (test budget is
    180 s; the global 90 s default fired before the budget assertion
    could evaluate).
  * test_r_state_apply_logical_resource_no_state_module — 180 s
    (three sequential state.apply runs against dummy resources).
  * test_auth_not_starved_with_routing — 180 s (7-daemon
    saturation test).
  * test_orchestration_with_pillar_dot_items — passes _timeout=120
    to salt_run_cli.run() so the 30 s factory default doesn't kill
    the recursive saltutil.runner('pillar.show_pillar') invocation.

Tornado timing flake: test_when_is_timed_out_is_set_before_other_events_are_completed
used gather_timeout=0.1 s and event_timeout=0.15 s.  The 50 ms ordering
window was regularly inverted by GHA scheduler jitter; bumped to 1 s
and 2 s respectively, preserving the test's logical invariant.

Windows assertion drift: test_win_user_present_new_password now
checks changes.get("passwd") == "XXX-REDACTED-XXX" instead of
strict-equal so the new password_changed / lstchg keys surfaced by
the recent shadow.info refactor don't fail the test.
test_jobs_migration_round_trip: the assertion order raced the
peer-side ``_handle_collect_request``, which streams the requested
channels back to the requester sequentially (``jobs/loads`` first,
then ``jobs/minions``, ``jobs/endtimes``, ``jobs/nocache``).  The old
flow waited up to 30 s for ``jobs/loads`` to land then immediately
asserted on ``jobs/minions`` — fine on a fast box where the whole
fan-out lands in well under a second, but on a loaded GHA runner
``jobs/minions`` chunks were still in flight when the assertion fired.
Wait on the operator-facing sentinel (``cluster-collect-status.json``
``complete=True``) — the actual end-of-collect signal — and then check
every bank.

test_ring_lifecycle_on_shared_filesystem: three serial ``_wait_until``
polls (60 s baseline + 45 s registry/route + 30 s destroy), each
running ``cluster.members`` as a fresh ``salt-run`` subprocess.  On a
2-vCPU GHA runner the cumulative wall-clock has been observed at
95–130 s, past the global 90 s pytest-timeout default.  Add
``pytest.mark.timeout(360, func_only=True)`` so the test's own
predicate timeouts remain the failure signal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:coverage test:full Run the full test suite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant