Conductor: Resource Arbitration Across Memory Runtimes

When multiple memory runtimes (Weight Stagehand, Activation Stagehand, SquareQ) run simultaneously during training, they compete for the same VRAM headroom, pinned CPU budget, and PCIe bandwidth. Conductor is a thin coordination layer that arbitrates these shared resources without rewriting any runtime internals.

Status: v0, integrated in Serenity. Operates above Stagehand — does not modify Stagehand's core logic. No standalone library integration yet.

The Problem

Consider LTX-2 two-stage video training on a 24GB GPU. During stage 2 (transformer training), three runtimes are active simultaneously:

| Runtime | Uses VRAM | Uses Pinned CPU | PCIe Direction |
|---|---|---|---|
| Weight Stagehand | Prefetch window (1-3 blocks) | 8 GB pinned pool | H2D (load blocks) |
| Activation Stagehand | Spill/restore activations | Slab pool (~500 MB) | D2H (spill) + H2D (restore) |
| SquareQ | INT8 dequant staging | Slab backing (~256 MB) | H2D (dequant) |

Without coordination:

- All three issue PCIe transfers concurrently, saturating bandwidth and increasing stall time
- Weight prefetch during the optimizer step wastes VRAM (the optimizer needs that headroom for state)
- Activation spills during backward compete with weight loads for the same PCIe lanes

How Conductor Works

Conductor does three things:

1. Phase Broadcasting

The training step is divided into phases: STEP_BEGIN → FORWARD → BACKWARD → OPTIMIZER → STEP_END. Conductor broadcasts each phase transition to all registered runtime adapters, so every runtime knows which phase is active and can adjust its transfers and prefetching accordingly.
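The broadcast pattern can be sketched as follows. This is a minimal illustration, not Conductor's actual code: `Phase`, `PhaseBroadcaster`, `RecordingAdapter`, and the `on_phase` callback name are all hypothetical stand-ins, since the document does not show the PhaseCoordinator interface.

```python
from enum import Enum, auto
from typing import Optional


class Phase(Enum):
    # Phase names follow the sequence described above.
    STEP_BEGIN = auto()
    FORWARD = auto()
    BACKWARD = auto()
    OPTIMIZER = auto()
    STEP_END = auto()


class PhaseBroadcaster:
    """Hypothetical stand-in for a phase coordinator."""

    def __init__(self) -> None:
        self._listeners = []
        self.current: Optional[Phase] = None

    def register(self, listener) -> None:
        # Any object with an on_phase(phase) method can listen (assumed name).
        self._listeners.append(listener)

    def transition(self, phase: Phase) -> None:
        # Record the new phase, then notify every registered adapter in order.
        self.current = phase
        for listener in self._listeners:
            listener.on_phase(phase)


class RecordingAdapter:
    """Toy adapter that only remembers which phases it was told about."""

    def __init__(self) -> None:
        self.seen = []

    def on_phase(self, phase: Phase) -> None:
        self.seen.append(phase)


recorder = RecordingAdapter()
coordinator = PhaseBroadcaster()
coordinator.register(recorder)
for phase in Phase:  # Enum iteration follows definition order
    coordinator.transition(phase)
```

After the loop, `recorder.seen` holds the five phases in order. A real adapter would react to each notification instead of recording it.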

2. Budget Arbitration

A BudgetManager tracks VRAM headroom and pinned CPU reservations with typed modes:

| Mode | Behavior |
|---|---|
| HARD | Full grant or deny; no partial grants |
| SOFT | Partial grants OK (get what's available) |
| BURST | Can exceed soft cap up to hard cap |
| FLOOR | Guaranteed minimum; full grant or deny |
| CEILING | Upper bound hint |

Reservations can be phase-scoped (auto-released when a phase ends) and carry priority levels (CRITICAL > REQUIRED > SPECULATIVE > BACKGROUND).
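To make the difference between HARD and SOFT grants concrete, here is a toy single-pool budget. It is a sketch under stated assumptions: `ToyBudget`, `Grant`, `request`, and `end_phase` are hypothetical names, only two of the five modes are modelled, and priority levels are left out. The real BudgetManager API is not shown in this document.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HARD = "hard"  # full grant or deny
    SOFT = "soft"  # partial grants allowed


@dataclass
class Grant:
    granted_mb: float
    phase_scoped: bool


class ToyBudget:
    """Hypothetical single-pool budget tracker (illustration only)."""

    def __init__(self, capacity_mb: float) -> None:
        self.capacity_mb = capacity_mb
        self._grants = []

    @property
    def available_mb(self) -> float:
        return self.capacity_mb - sum(g.granted_mb for g in self._grants)

    def request(self, size_mb: float, mode: Mode, phase_scoped: bool = False) -> float:
        free = self.available_mb
        if mode is Mode.HARD:
            # All or nothing: grant only if the full request fits.
            amount = size_mb if size_mb <= free else 0.0
        else:
            # SOFT: hand out whatever is left, up to the requested size.
            amount = min(size_mb, free)
        if amount > 0:
            self._grants.append(Grant(amount, phase_scoped))
        return amount

    def end_phase(self) -> None:
        # Phase-scoped reservations are released automatically.
        self._grants = [g for g in self._grants if not g.phase_scoped]


budget = ToyBudget(capacity_mb=1000.0)
hard_ok = budget.request(600.0, Mode.HARD)                          # fits in full
hard_denied = budget.request(600.0, Mode.HARD)                      # only 400 MB free
soft_partial = budget.request(600.0, Mode.SOFT, phase_scoped=True)  # gets the remainder
budget.end_phase()                                                  # drops the SOFT grant
```

In this toy model the second HARD request is denied outright, while the SOFT request receives the remaining 400 MB and gives it back when the phase ends.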

3. Transfer Slot Management

A TransferManager maintains separate H2D and D2H slot pools. Runtimes acquire tokens before issuing PCIe transfers. If all slots are taken, the request gets a typed denial reason (e.g., H2D_SLOTS_EXHAUSTED, PHASE_RULE_SUPPRESSED_SPECULATIVE).
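The acquire-before-transfer pattern might look like the sketch below. `ToySlotPools`, `SlotResult`, and the method names are hypothetical. `H2D_SLOTS_EXHAUSTED` comes from the text above, while the D2H equivalent that this sketch would produce is assumed by symmetry. Phase-rule denials are not modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Direction(Enum):
    H2D = "h2d"
    D2H = "d2h"


@dataclass(frozen=True)
class SlotResult:
    token: Optional[int]          # None when the request is denied
    denial_reason: Optional[str]  # None when the request is granted


class ToySlotPools:
    """Hypothetical slot manager with one independent pool per direction."""

    def __init__(self, h2d_slots: int, d2h_slots: int) -> None:
        self._limit = {Direction.H2D: h2d_slots, Direction.D2H: d2h_slots}
        self._held = {Direction.H2D: set(), Direction.D2H: set()}
        self._next_token = 0

    def acquire(self, direction: Direction) -> SlotResult:
        if len(self._held[direction]) >= self._limit[direction]:
            # Typed denial instead of blocking; the caller decides what to do.
            return SlotResult(None, f"{direction.name}_SLOTS_EXHAUSTED")
        token = self._next_token
        self._next_token += 1
        self._held[direction].add(token)
        return SlotResult(token, None)

    def release(self, direction: Direction, token: int) -> None:
        self._held[direction].discard(token)


pools = ToySlotPools(h2d_slots=2, d2h_slots=2)
first = pools.acquire(Direction.H2D)
second = pools.acquire(Direction.H2D)
third = pools.acquire(Direction.H2D)   # denied: both H2D slots are held
spill = pools.acquire(Direction.D2H)   # separate pool, still granted
pools.release(Direction.H2D, first.token)
retry = pools.acquire(Direction.H2D)   # granted again after the release
```

Keeping the two directions in separate pools means a burst of weight loads cannot starve activation spills of slots, and vice versa.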

Policy Rules (v0)

The HeuristicPolicyBridgeV0 computes RuntimeLimitHints at every phase boundary and pushes them to all adapters. Three rules:

Rule 1 — Backward pressure: When VRAM pressure exceeds 80% during BACKWARD, suppress speculative prefetch and reduce the prefetch window to 1. This frees VRAM for activation accumulation.

Rule 2 — Optimizer protection: During OPTIMIZER phase, suppress all speculative work and limit H2D to 1 slot. The optimizer needs headroom for state updates — prefetching blocks into VRAM during this phase is wasted work.

Rule 3 — Contention reduction: When all transfer slots are consistently full (>3 consecutive checks), progressively reduce the prefetch window. This backs off PCIe pressure.

Monotonicity guarantee: Within a single step, hints only tighten — prefetch_window is non-increasing and suppress_speculative never reverts from True to False. Resets at STEP_BEGIN.
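One way to obtain the monotonicity guarantee is to merge every newly proposed hint with the previous one using `min` and `or`. The sketch below illustrates Rules 1 and 2 only. `Hints`, `propose`, and `tighten` are hypothetical names rather than the HeuristicPolicyBridgeV0 API, and applying the same tightening to `max_inflight_h2d` is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hints:
    prefetch_window: int
    max_inflight_h2d: int
    suppress_speculative: bool


def propose(phase: str, vram_pressure: float, baseline: Hints) -> Hints:
    """What the rules would ask for if the current phase were seen in isolation."""
    if phase == "BACKWARD" and vram_pressure > 0.80:  # Rule 1
        return Hints(prefetch_window=1,
                     max_inflight_h2d=baseline.max_inflight_h2d,
                     suppress_speculative=True)
    if phase == "OPTIMIZER":  # Rule 2
        return Hints(prefetch_window=baseline.prefetch_window,
                     max_inflight_h2d=1,
                     suppress_speculative=True)
    return baseline


def tighten(previous: Hints, proposed: Hints) -> Hints:
    """Merge so that hints never loosen within a step."""
    return Hints(
        prefetch_window=min(previous.prefetch_window, proposed.prefetch_window),
        max_inflight_h2d=min(previous.max_inflight_h2d, proposed.max_inflight_h2d),
        suppress_speculative=previous.suppress_speculative or proposed.suppress_speculative,
    )


baseline = Hints(prefetch_window=3, max_inflight_h2d=2, suppress_speculative=False)
current = baseline  # reset at STEP_BEGIN
history = []
checks = [("FORWARD", 0.60), ("BACKWARD", 0.85), ("BACKWARD", 0.70), ("OPTIMIZER", 0.70)]
for phase, pressure in checks:
    current = tighten(current, propose(phase, pressure, baseline))
    history.append(current)
```

In the third check the pressure has dropped back to 70%, so `propose` returns the baseline, but `tighten` keeps the window at 1 and speculation suppressed until the next step begins.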

What Gets Tuned

Conductor pushes hints to runtime adapters, which translate them into internal knob writes:

| Hint | Weight Stagehand | Activation Stagehand | SquareQ |
|---|---|---|---|
| `max_inflight_h2d` | `engine._max_inflight` | `transfer._max_h2d` | N/A |
| `max_inflight_d2h` | N/A | `transfer._max_d2h` | N/A |
| `prefetch_window_cap` | `scheduler.policy.prefetch_window` | N/A | N/A |
| `suppress_speculative` | sets `prefetch_window = 1` | sets `_max_d2h = 0` in OPTIMIZER | N/A |

All writes target plain instance variables and happen on the training thread, the same thread that calls before_block() / _wait_for_slot(), so no additional synchronization is needed.

Adapters save original values on attach() and restore them on detach().
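The save-and-restore behaviour can be illustrated with a toy adapter that tunes a single knob. `ToyScheduler`, `ToyAdapter`, and `apply_hints` are hypothetical names. The real adapters write to the attributes listed in the table above, and their method signatures are not shown in this document.

```python
class ToyScheduler:
    """Hypothetical stand-in for a runtime object with one tunable knob."""

    def __init__(self) -> None:
        self.prefetch_window = 3


class ToyAdapter:
    """Translates hints into knob writes and undoes them on detach."""

    def __init__(self, scheduler: ToyScheduler) -> None:
        self._scheduler = scheduler
        self._saved = None

    def attach(self) -> None:
        # Remember the runtime's own setting before touching it.
        self._saved = self._scheduler.prefetch_window

    def apply_hints(self, prefetch_window_cap: int, suppress_speculative: bool) -> None:
        if self._saved is None:
            raise RuntimeError("adapter is not attached")
        # Never raise the window above the runtime's original value.
        window = min(self._saved, prefetch_window_cap)
        if suppress_speculative:
            window = 1
        self._scheduler.prefetch_window = window

    def detach(self) -> None:
        if self._saved is not None:
            self._scheduler.prefetch_window = self._saved
            self._saved = None


scheduler = ToyScheduler()
knob_adapter = ToyAdapter(scheduler)
knob_adapter.attach()
knob_adapter.apply_hints(prefetch_window_cap=2, suppress_speculative=False)
capped = scheduler.prefetch_window       # window capped by the hint
knob_adapter.apply_hints(prefetch_window_cap=2, suppress_speculative=True)
suppressed = scheduler.prefetch_window   # speculation off: window forced to 1
knob_adapter.detach()
restored = scheduler.prefetch_window     # original value is back
```

Because the adapter only ever writes values it can later undo, detaching leaves the runtime exactly as it was configured before Conductor was attached.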

Zero-Cost Disable

from serenity.memory.conductor import ConductorConfig, ConductorRuntime

# Disabled — no sub-objects allocated, all methods return immediately
rt = ConductorRuntime(ConductorConfig(enabled=False))
rt.begin_step(0)    # no-op
rt.enter_forward()  # no-op
rt.shutdown()       # no-op

When enabled=False, the constructor returns after storing _enabled=False. No BudgetManager, no TransferManager, no PhaseCoordinator — nothing is allocated.
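The gate described above follows a common early-return pattern. The sketch below shows the general shape only. `GatedRuntime` is a hypothetical class and is not the ConductorRuntime source.

```python
class GatedRuntime:
    """Hypothetical runtime that builds its sub-objects only when enabled."""

    def __init__(self, enabled: bool) -> None:
        self._enabled = enabled
        if not enabled:
            return  # nothing else is constructed
        self._events = []  # stands in for managers, coordinator, telemetry

    def begin_step(self, step: int) -> None:
        if not self._enabled:
            return  # a single attribute check, then out
        self._events.append(f"begin_step:{step}")


disabled_rt = GatedRuntime(enabled=False)
disabled_rt.begin_step(0)  # no-op
enabled_rt = GatedRuntime(enabled=True)
enabled_rt.begin_step(7)   # recorded
```

With this shape, the disabled path costs one boolean check per call, and the disabled instance never holds the heavier sub-objects.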

Integration Point

Conductor is wired into StagehandStrategy.forward_context() in Serenity and is configured through the training config JSON:

{
  "memory": {
    "stagehand": {
      "conductor_config": {
        "enabled": true,
        "vram_soft_cap_mb": 22000,
        "vram_hard_cap_mb": 23500,
        "h2d_slots": 2,
        "d2h_slots": 2
      }
    }
  }
}

The strategy creates a ConductorRuntime, registers adapters for whichever runtimes are active, and calls lifecycle methods at the right points:

begin_step(step)
  enter_forward()
    [forward hooks fire: Weight Stagehand streams blocks]
  enter_backward()
    [loss.backward(): backward hooks fire]
  enter_optimizer()
    [optimizer.step()]
  end_step()
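For a hand-written training loop, the lifecycle calls above could be placed as in the sketch below, assuming each `enter_*` call precedes the work of its phase. `train_step` and `RecordingRuntime` are hypothetical. The stub only mirrors the lifecycle method names listed above so that the call order can be checked without Serenity installed.

```python
class RecordingRuntime:
    """Hypothetical stub exposing the lifecycle method names listed above."""

    def __init__(self) -> None:
        self.calls = []

    def begin_step(self, step: int) -> None:
        self.calls.append(f"begin_step({step})")

    def enter_forward(self) -> None:
        self.calls.append("enter_forward")

    def enter_backward(self) -> None:
        self.calls.append("enter_backward")

    def enter_optimizer(self) -> None:
        self.calls.append("enter_optimizer")

    def end_step(self) -> None:
        self.calls.append("end_step")


def train_step(rt, step, forward, backward, optimizer_step):
    """Run one step, announcing each phase before its work starts."""
    rt.begin_step(step)
    try:
        rt.enter_forward()
        loss = forward()
        rt.enter_backward()
        backward(loss)
        rt.enter_optimizer()
        optimizer_step()
    finally:
        rt.end_step()  # close the step even if a phase raises
    return loss


stub = RecordingRuntime()
work_log = []
result = train_step(
    stub,
    step=0,
    forward=lambda: 0.5,                                    # placeholder loss value
    backward=lambda loss: work_log.append(("backward", loss)),
    optimizer_step=lambda: work_log.append(("optimizer",)),
)
```

Putting `end_step()` in a `finally` block is a design choice of this sketch, so that a failed step does not leave the coordinator stuck in the middle of a phase.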

Telemetry

Conductor writes per-step metrics to conductor_telemetry.jsonl at configurable intervals:

{
  "step_id": 50,
  "vram_allocated_mb": 21456.3,
  "vram_headroom_mb": 2043.7,
  "pinned_granted_mb": 8704.0,
  "h2d_inflight": 1,
  "d2h_inflight": 0,
  "grant_count": 12,
  "deny_count": 0,
  "partial_count": 2,
  "phase_durations": {"forward": 3.21, "backward": 4.56, "optimizer": 1.89},
  "runtime_snapshots": {
    "stagehand": {"max_inflight": 2, "prefetch_window": 2},
    "activation_stagehand": {"max_d2h": 1, "max_h2d": 1}
  }
}

Set debug_event_trace: true for fine-grained event logging (phase transitions, reservation requests, slot acquisitions).
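Since the telemetry file is plain JSONL, it can be inspected with a few lines of Python. The sketch below is a hypothetical helper, not part of Conductor. It relies only on field names shown in the record above, and the two sample lines are made-up values for illustration.

```python
import io
import json


def summarize(lines):
    """Aggregate a few fields across per-step telemetry records."""
    steps = 0
    total_denies = 0
    total_partials = 0
    min_headroom = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        steps += 1
        total_denies += record["deny_count"]
        total_partials += record["partial_count"]
        headroom = record["vram_headroom_mb"]
        min_headroom = headroom if min_headroom is None else min(min_headroom, headroom)
    return {
        "steps": steps,
        "min_headroom_mb": min_headroom,
        "total_denies": total_denies,
        "total_partials": total_partials,
    }


# Illustrative sample; in practice, pass open("conductor_telemetry.jsonl") instead.
sample = io.StringIO(
    '{"step_id": 50, "vram_headroom_mb": 2043.7, "deny_count": 0, "partial_count": 2}\n'
    '{"step_id": 100, "vram_headroom_mb": 1500.0, "deny_count": 3, "partial_count": 1}\n'
)
summary = summarize(sample)
```

A falling minimum headroom or a rising deny count across a run would suggest that the configured caps or slot counts are too tight for the workload.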

Architecture

ConductorRuntime
├── PhaseCoordinator ──→ broadcasts to all adapters
├── BudgetManager ────→ VRAM + pinned CPU reservations
├── TransferManager ──→ H2D / D2H slot pools
├── PolicyBridge ─────→ computes RuntimeLimitHints per phase
├── TelemetryHub ─────→ JSONL metrics + debug events
└── Adapters
    ├── StagehandConductorAdapter ──→ tunes weight prefetch
    ├── ActStagehandConductorAdapter → tunes activation spill/restore
    └── SquareQConductorAdapter ────→ reports pinned usage (no knobs v0)

What Conductor Does NOT Do

- Does not rewrite any runtime internals (Stagehand, Activation Stagehand, SquareQ)
- Does not make scheduling decisions: runtimes still decide when and what to prefetch
- Does not allocate VRAM or pinned memory; it only tracks and constrains
- Does not add overhead when disabled (zero-cost gate)
- Does not require all three runtimes; it works with any subset
