e2e: remove activator container and consolidate to onchain allocation #3629

Draft
elitegreg wants to merge 3 commits into main from gm/activator-removal-phase-2

@elitegreg elitegreg commented Apr 30, 2026

Summary

  • Bundles activator-removal phase 2.1, 2.2, and 2.3 into a single PR. Each step in isolation breaks the next, so they ship together.
  • 2.1: Drops the legacy-allocation BackwardCompatibility e2e shard (legacy allocation is going away).
  • 2.2: Floors the surviving (onchain) BackwardCompatibility shard at CLI ≥ 0.12.0 and bumps onchain MIN_COMPATIBLE_VERSION in the serviceability program to match. The 0.14.1 floor for link accept is already covered by the existing before("0.19.0") RFC-18 guard.
  • 2.3: Removes the activator container from the e2e devnet entirely; all remaining shards run with onchain allocation. DZ_E2E_ONCHAIN_ALLOCATION and DZ_E2E_DISABLE_ACTIVATOR env knobs are deleted. Shard 3 (the standalone onchain-no-activator shard) is folded into the round-robin shards.
  • Bug fixes uncovered by removing the activator:
    • smartcontract/serviceability: CreateDeviceInterface was unconditionally overwriting a caller-supplied ip_net for any Loopback under onchain allocation, allocating from the device-tunnel-block (172.16/16) and clobbering the user-tunnel-endpoint IP the operator passed in. It now only allocates when no ip_net was supplied — matching the existing interface/activate.rs behavior. Without this, TestE2E_SentinelMulticastPublisherCreatesPublishers failed because the sentinel picked the auto-allocated private IP as tunnel_endpoint and the contract rejected it with InvalidTunnelEndpoint (the per-user is_global(tunnel_endpoint) check).
    • sentinel: build_create_multicast_publisher_instructions hard-coded dz_prefix_count = 0, so create_subscribe_user produced a Pending multicast user with no resources allocated. With no activator to finish activation, the user never reached Activated, the next poll cycle re-attempted creation, and the contract rejected it with AccountAlreadyInitialized. The sentinel now fetches device.dz_prefixes.len() and supplies the ResourceExtension accounts so create+allocate+activate happens atomically.
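
The CreateDeviceInterface fix above can be sketched roughly as follows. This is a minimal illustration, not the program's code: `TunnelBlock`, `resolve_loopback_ip_net`, and the sample addresses are hypothetical. Only the rule itself (allocate from the device-tunnel-block only when no ip_net was supplied) comes from this PR.

```rust
use std::net::Ipv4Addr;

/// Hypothetical stand-in for the device-tunnel-block allocator (172.16.0.0/16).
struct TunnelBlock {
    next_host: u32,
}

impl TunnelBlock {
    /// Hands out the next /32 from 172.16.0.0/16, or None once the block is exhausted.
    fn allocate(&mut self) -> Option<(Ipv4Addr, u8)> {
        if self.next_host > u16::MAX as u32 {
            return None;
        }
        let [hi, lo] = (self.next_host as u16).to_be_bytes();
        self.next_host += 1;
        Some((Ipv4Addr::new(172, 16, hi, lo), 32))
    }
}

/// Returns the ip_net to store on a Loopback interface.
/// A caller-supplied value wins; the block is only touched when none was given.
fn resolve_loopback_ip_net(
    supplied: Option<(Ipv4Addr, u8)>,
    block: &mut TunnelBlock,
) -> Option<(Ipv4Addr, u8)> {
    supplied.or_else(|| block.allocate())
}

fn main() {
    let mut block = TunnelBlock { next_host: 1 };

    // Documentation-range placeholder standing in for an operator-supplied address.
    let supplied = (Ipv4Addr::new(203, 0, 113, 10), 32);
    let kept = resolve_loopback_ip_net(Some(supplied), &mut block);
    println!("supplied ip_net kept: {:?}, block cursor: {}", kept, block.next_host);

    // With no ip_net supplied, the address comes from the private 172.16/16 block.
    let auto = resolve_loopback_ip_net(None, &mut block);
    println!("auto-allocated: {:?}, block cursor: {}", auto, block.next_host);
}
```

The buggy behavior corresponds to calling `block.allocate()` unconditionally, which would both discard the supplied address and advance the block cursor.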

After this PR, the e2e devnet has no path to start an activator — there is no environment variable that can re-enable it. Per the issue's note, breakage in onchain-allocation behavior is no longer masked by the activator picking up the slack; expect the first few CI runs to surface latent issues.

PR size: Exceeds the ~500-line guideline. The three phases were filed as separate issues for that reason, but were bundled per request because each step, landed in isolation, breaks the next.

Closes #3609
Closes #3610
Closes #3611

Testing Verification

  • `go vet -tags=e2e ./e2e/...` is clean.
  • `go build ./...` is clean.
  • `cargo test -p doublezero-serviceability` and `cargo clippy -p doublezero-serviceability --all-targets -- -Dclippy::all -Dwarnings` are clean after the `MIN_COMPATIBLE_VERSION` bump.
  • New unit test `test_create_loopback_with_onchain_allocation_honors_supplied_ip_net` asserts the caller-supplied `ip_net` survives a Loopback create under onchain allocation and that the DeviceTunnelBlock is left untouched.
  • `TestE2E_SentinelMulticastPublisherCreatesPublishers` previously failed at the multicast-publisher creation step (`InvalidTunnelEndpoint` → after the contract fix, `AccountAlreadyInitialized` on the second poll). It now passes end-to-end: the sentinel detects the three IBRL validators, creates Activated multicast publisher users on the configured group, and `wait-for-multicast-publishers` completes in ~5s.
  • `grep -ri DZ_E2E_ONCHAIN_ALLOCATION` and `grep -ri DZ_E2E_DISABLE_ACTIVATOR` return nothing.
  • `grep -ri activator e2e/` returns only the onchain `activator-authority` / `activator_authority_pk` references (program-level authority field, out of scope for phase 2).
  • E2E shards in CI: matrix should render as 5 entries — shard 1 (`TestE2E_BackwardCompatibility`) and shards 2–5 round-robin remainder. No `onchain_allocation`/`disable_activator` keys present.
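
The expected matrix shape (one pinned shard plus a round-robin remainder) can be illustrated with a small sketch. How CI actually enumerates and orders tests is not shown in this PR, so `assign_shards` and the `TestE2E_A` through `TestE2E_E` names are hypothetical placeholders.

```rust
/// Assigns tests to shards: `pinned` gets shard 1 to itself, and the remaining
/// tests are dealt round-robin across shards 2..=total_shards in the order given.
fn assign_shards<'a>(tests: &[&'a str], pinned: &str, total_shards: usize) -> Vec<Vec<&'a str>> {
    assert!(total_shards >= 2, "need at least one round-robin shard");
    let mut shards: Vec<Vec<&'a str>> = vec![Vec::new(); total_shards];
    let mut next = 0;
    for &test in tests {
        if test == pinned {
            shards[0].push(test);
        } else {
            // Index 0 is reserved for the pinned test; cycle through the rest.
            shards[1 + next % (total_shards - 1)].push(test);
            next += 1;
        }
    }
    shards
}

fn main() {
    let tests = [
        "TestE2E_BackwardCompatibility",
        "TestE2E_A", // hypothetical test names from here on
        "TestE2E_B",
        "TestE2E_C",
        "TestE2E_D",
        "TestE2E_E",
    ];
    let shards = assign_shards(&tests, "TestE2E_BackwardCompatibility", 5);
    for (i, shard) in shards.iter().enumerate() {
        println!("shard {}: {:?}", i + 1, shard);
    }
}
```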

Drops the legacy-allocation BackwardCompatibility shard, floors the
surviving (onchain) BackwardCompatibility shard at CLI 0.12.0, and
removes the activator container from the e2e devnet entirely. All
remaining shards now run with onchain allocation; the
DZ_E2E_ONCHAIN_ALLOCATION and DZ_E2E_DISABLE_ACTIVATOR env knobs are
gone.
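
As a rough illustration of what a 0.12.0 floor means, the sketch below compares versions numerically, component by component. The representation is an assumption: `MIN_COMPATIBLE`, `parse_version`, and `is_compatible` are hypothetical names, and the serviceability program may store and compare its minimum version differently.

```rust
/// Assumed floor for illustration, as (major, minor, patch).
const MIN_COMPATIBLE: (u32, u32, u32) = (0, 12, 0);

/// Parses "major.minor.patch"; returns None for anything else.
fn parse_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.split('.').map(|p| p.parse::<u32>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(version)
}

/// True when the CLI version parses and is at or above the floor.
/// Tuples compare lexicographically, so this is a numeric comparison per component.
fn is_compatible(cli_version: &str) -> bool {
    parse_version(cli_version).map_or(false, |v| v >= MIN_COMPATIBLE)
}

fn main() {
    for v in ["0.9.0", "0.11.9", "0.12.0", "0.14.1"] {
        println!("{v}: compatible = {}", is_compatible(v));
    }
}
```

Comparing parsed components rather than raw strings matters here: as strings, "0.9.0" sorts after "0.12.0", which would wrongly admit an older CLI.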

Closes #3609
Closes #3610
Closes #3611
…on under onchain allocation

Two related bugs prevented the sentinel from creating multicast publishers
once the activator was removed and onchain allocation became the only path:

- smartcontract/serviceability: CreateDeviceInterface unconditionally
  overwrote a caller-supplied ip_net for any Loopback when use_onchain_allocation
  was true, allocating from the device-tunnel-block (172.16/16) instead.
  Now it only allocates when no ip_net was supplied, matching the existing
  behavior in interface/activate.rs. User-tunnel-endpoint loopbacks can
  again land on a globally routable IP rather than a private one that
  the user-create validation later rejects with InvalidTunnelEndpoint.

- sentinel: build_create_multicast_publisher_instructions hard-coded
  dz_prefix_count = 0, so create_subscribe_user produced a Pending user
  with no resources allocated. Without an activator the user never
  reached Activated, the next poll cycle re-attempted creation, and
  the contract rejected it with AccountAlreadyInitialized. The sentinel
  now fetches device.dz_prefixes.len() and supplies the resource
  extension accounts so create+allocate+activate happens atomically.
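
The sentinel failure mode described above can be reproduced with a toy model. Everything here is a simplified assumption for illustration: `Ledger`, `poll_once`, the key string, and the rule that a zero prefix count leaves the user Pending are stand-ins for the real contract and sentinel, which this sketch does not call.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum UserStatus {
    Pending,
    Activated,
}

#[derive(Debug, PartialEq)]
enum CreateError {
    AccountAlreadyInitialized,
}

/// Toy ledger keyed by a hypothetical user key.
#[derive(Default)]
struct Ledger {
    users: HashMap<String, UserStatus>,
}

impl Ledger {
    /// Toy create_subscribe_user: a zero prefix count allocates nothing and leaves
    /// the user Pending; a non-zero count allocates and activates in the same call.
    fn create_subscribe_user(
        &mut self,
        key: &str,
        dz_prefix_count: usize,
    ) -> Result<UserStatus, CreateError> {
        if self.users.contains_key(key) {
            return Err(CreateError::AccountAlreadyInitialized);
        }
        let status = if dz_prefix_count == 0 {
            UserStatus::Pending
        } else {
            UserStatus::Activated
        };
        self.users.insert(key.to_string(), status);
        Ok(status)
    }
}

/// One sentinel poll: skip if an Activated user exists, otherwise try to create one.
fn poll_once(ledger: &mut Ledger, key: &str, dz_prefix_count: usize) -> Result<(), CreateError> {
    if ledger.users.get(key) == Some(&UserStatus::Activated) {
        return Ok(());
    }
    ledger.create_subscribe_user(key, dz_prefix_count).map(|_| ())
}

fn main() {
    // Old behavior: hard-coded 0, no activator to finish the job.
    let mut old = Ledger::default();
    println!("old, poll 1: {:?}", poll_once(&mut old, "validator-1", 0));
    println!("old, poll 2: {:?}", poll_once(&mut old, "validator-1", 0));

    // New behavior: count taken from the device's prefix list.
    let dz_prefixes = vec!["prefix-a"]; // placeholder entry
    let mut new = Ledger::default();
    println!("new, poll 1: {:?}", poll_once(&mut new, "validator-1", dz_prefixes.len()));
    println!("new, poll 2: {:?}", poll_once(&mut new, "validator-1", dz_prefixes.len()));
}
```

In the old path the second poll fails because the Pending account already exists and nothing ever promotes it. In the new path the first poll leaves an Activated user, so later polls are no-ops.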