Decouple backend-specific cloning from InteractiveScene by ooctipus · Pull Request #5770 · isaac-sim/IsaacLab

ooctipus · 2026-05-25T11:09:41Z

Summary

This PR removes backend-specific cloning logic from InteractiveScene and
replaces the previous implicit replicate_session_defaults /
replicate_session machinery with three orthogonal, explicit primitives that
compose identically whether you drive cloning through InteractiveScene or
by hand in a DirectRLEnv or standalone script.

The result is that InteractiveScene no longer knows anything about USD vs.
PhysX vs. Newton — it just enters a ReplicateSession and lets each asset's
constructor register the backend(s) it needs. This is foundational for two
follow-on capabilities the project has wanted for a while: flexible backend
cloning and skip-cloning workflows.

What changed

New core primitives in `isaaclab.cloner`

REPLICATION_QUEUE — module-level list that asset constructors append
(cfg, BackendCtxCls) pairs to via tiny per-backend helpers
(queue_usd_replication, queue_physx_replication,
queue_newton_replication). Backends are no longer special-cased inside
InteractiveScene; each one self-registers.
ClonePlan — self-contained dataclass describing the world layout
(sources, destinations, clone_mask, env_ids, positions,
cfg_rows). Stage-agnostic by design; the USD stage is now passed
explicitly to consumers so the same plan can be replayed, inspected, or
serialized.
replicate(plan, *, stage) — free function that drains
REPLICATION_QUEUE against a plan, groups queued cfgs by backend context
class, runs each context in ascending replicate_priority order (physics
before USD), publishes the plan to SimulationContext, and clears the
queue. The queue is snapshotted and cleared up front so a backend failure
cannot leak stale entries into the next call.
ReplicateSession is now a thin context manager that calls
make_clone_plan in __enter__ and replicate in __exit__. The
state-bag version with plan / stage / cfg_rows /
replicate_on_exit fields is gone.
ClonePlan.from_env_0 — classmethod that builds the single-source
homogeneous plan most direct envs need by auto-populating cfg_rows
from REPLICATION_QUEUE filtered by env-root prefix.
CloneCfg.clone_regex (default "/World/envs/env_.*") — single
source of truth for the env-namespace convention. InteractiveScene
reads it directly when expanding {ENV_REGEX_NS} cfg macros.

Two equivalent invocation paths

# InteractiveScene path (what the scene runs under the hood)
with cloner.ReplicateSession(cfgs, num_clones=N, env_spacing=2.0,
                             device=device, stage=stage):
    for cfg in cfgs:
        cfg.class_type(cfg)

# Direct env / script path
plan = cloner.ClonePlan.from_env_0(src, dest, num_envs, device, positions)
cloner.replicate(plan, stage=scene.stage)

Both end in the same cloner.replicate(plan, stage=...) call. The only
difference is how the plan was built and how asset construction was
interleaved.

What got removed from `InteractiveScene`

clone_environments(...) deprecated shim. The scene now replicates
inside __init__ via ReplicateSession.
env_ns / env_regex_ns properties (used only internally).
_build_clone_plan_from_cfg and _default_env_origins internals.
Cfg-driven plan construction now lives in make_clone_plan; per-env
positions are read from the published ClonePlan.
InteractiveScene.env_origins now reads from the plan published to
SimulationContext, making the plan the single source of truth for
env placement.

Why this matters (the actual point of the PR)

This refactor is foundational for two capabilities the current scene
coupling blocks:

Flexible backend cloning. Backends now plug in by shipping a
<Backend>ReplicateContext class + a one-line queue helper. Swapping
PhysX ↔ Newton no longer requires InteractiveScene to change; cfgs
and user code stay untouched, and a third-party backend can register
itself without modifying core.
Skip-cloning workflows. Because plan construction, asset
registration, and drain are three independent primitives, callers
that want to author env-0 prims by hand and skip the cloner — or
drive replication out-of-band from a visualizer, replay tool, or
test fixture — can do so without fighting InteractiveScene.

Migration notes

with cloner.ReplicateSession(): (no-arg) →
cloner.replicate(cloner.ClonePlan.from_env_0(...), stage=...).
InteractiveScene.clone_environments(...) → removed; the scene
replicates inside __init__.
make_clone_plan(sources, destinations, ...) →
make_clone_plan(cfgs, num_clones, env_spacing, device, ...).
Pass stage=... explicitly to replicate() and ReplicateSession().
Read CloneCfg.clone_regex if you previously used
InteractiveScene.env_ns / env_regex_ns.

About 17 direct envs were migrated to the new pattern in this PR.

Test plan

./isaaclab.sh -p -m pytest source/isaaclab/test/sim/test_cloner.py
./isaaclab.sh -p -m pytest source/isaaclab/test/scene/test_interactive_scene.py
./isaaclab.sh -p -m pytest source/isaaclab_physx/test/sim/test_cloner.py
Migrated direct envs (cartpole, anymal_c, franka_cabinet,
factory, humanoid_amp, inhand_manipulation, locomotion,
quadcopter, shadow_hand_*, automate/*, cart_double_pendulum,
cartpole_warp, inhand_manipulation_warp, locomotion_warp)
spawn and step on PhysX
Same envs spawn and step on Newton
./isaaclab.sh -f passes
Docs build (cd docs && make html)

isaaclab-review-bot

🤖 Isaac Lab Review Bot — Multi-Perspective Analysis

PR: Decouple backend-specific cloning from InteractiveScene
SHA: e75f1bc8e6d3cdc07d5bfcbdff33b4c2f2afc14d

Architecture Assessment (Isaac Lab Expert)

This is an excellent structural improvement. The PR replaces InteractiveScene's hard-coded backend dispatching (if physx... elif newton... elif ovphysx...) with a self-registering queue pattern where each backend ships a <Backend>ReplicateContext + a queue_<backend>_replication() helper. The three orthogonal primitives (REPLICATION_QUEUE, ClonePlan, replicate()) compose well regardless of whether you're in InteractiveScene, a DirectRLEnv, or a standalone script.

Design wins:

Clean separation of concerns — InteractiveScene now only cares about what to clone, not how
Backend extensibility without core modification (Open/Closed principle)
Both invocation paths (ReplicateSession context manager vs. manual ClonePlan.from_env_0 + replicate) converge on the same drain function — no hidden divergence
The replicate_priority ordering (physics=0, USD=100) ensures correct execution order

Findings

1. 🟡 Performance Regression Risk: `disabled_fabric_change_notifies` Removed

The old InteractiveScene.__init__ wrapped both the initial usd_replicate and clone_environments calls inside cloner.disabled_fabric_change_notifies(self.stage, restore=False). The new code calls cloner.usd_replicate(...) and the ReplicateSession without this optimization.

The old code's own comments documented that this suspension provides a measurable speedup when cloned prims carry PhysX rigid-body schemas and total Sdf.CopySpec firings reach ~32K. For large scenes (128+ envs, many rigid bodies), this could regress scene-init time.

Suggestion: Consider wrapping the UsdReplicateContext.replicate() method (or the entire ReplicateSession.__exit__ drain) in disabled_fabric_change_notifies so the optimization is preserved in the new architecture. Since the context manager was already moved to _fabric_notices.py, it could be imported from there.

2. 🟡 Module-Level `REPLICATION_QUEUE` — Implicit Global State

REPLICATION_QUEUE is a module-level mutable list that asset constructors append to and replicate() drains. While the PR handles the "exception during session" case by clearing the queue in ReplicateSession.__exit__, there are still scenarios where the queue could accumulate stale entries:

If replicate() is called without a ReplicateSession and a backend raises mid-drain, the queue is already cleared up front (good), but the partial work from earlier backends cannot be rolled back.
If user code calls queue_<backend>_replication(cfg) outside of any session/replicate cycle (e.g., in a test or standalone script that forgets to call replicate()), entries accumulate silently until the next replicate() call — possibly in a completely unrelated context.

Suggestion: Consider adding a warnings.warn() or debug log in replicate() when queued contains entries whose cfg is not in plan.cfg_rows (currently silently skipped). This would help developers catch mismatched queue/plan scenarios.

3. 🟡 `ClonePlan.from_env_0` Couples to Live Queue State

ClonePlan.from_env_0 reads REPLICATION_QUEUE at call time to auto-populate cfg_rows. This means the return value depends on when it's called relative to asset construction. If called too early (before all assets register) or too late (after a previous replicate() drained the queue), the plan will be incomplete.

The docstring mentions filtering by prefix, but doesn't document this timing constraint. For the InteractiveScene path this is fine (construction happens inside ReplicateSession), but direct-env users calling from_env_0 need to know that queue population must precede this call.

Suggestion: Add a note to the from_env_0 docstring: "Must be called after all asset constructors have registered their replication entries into REPLICATION_QUEUE."

4. 🟢 `_collect_asset_cfgs` vs `_add_entities_from_cfg` — Duplicated `ENV_REGEX_NS` Resolution

Both _collect_asset_cfgs() and _add_entities_from_cfg() resolve {ENV_REGEX_NS} macros in prim_path. _collect_asset_cfgs does it for all children with a prim_path, while _add_entities_from_cfg also does asset_cfg.prim_path.format(ENV_REGEX_NS=env_regex_ns). Since _collect_asset_cfgs already mutates the cfg objects (it writes back to child.prim_path), the second resolution in _add_entities_from_cfg either double-formats (safe if no {ENV_REGEX_NS} remains) or is redundant.

This isn't a bug (double-format of an already-resolved string is a no-op), but it's confusing for maintainers. Consider removing the redundant format in _add_entities_from_cfg or adding a comment explaining the ordering.

5. 🟡 `SimulationContext.instance()` Crash in `replicate()`

In replicate(), the final line calls SimulationContext.instance().set_clone_plan(plan). If SimulationContext hasn't been initialized yet (e.g., in a unit test that manually creates a stage), .instance() returns None and this line raises AttributeError: 'NoneType' object has no attribute 'set_clone_plan'.

The direct env / script path example in the PR description shows cloner.replicate(plan, stage=scene.stage) which implies a SimulationContext exists, but the function has no guard.

Suggestion: Add a defensive check: sim = SimulationContext.instance(); if sim is not None: sim.set_clone_plan(plan) — or document that replicate() requires an active SimulationContext.

6. 🟢 Strong Test Coverage for New Primitives

The test additions are solid:

test_usd_replicate_context_queue_and_replicate — validates the new UsdReplicateContext
test_queue_usd_replication_only_appends — confirms queue-only semantics
test_make_clone_plan_homogeneous_returns_env_root_plan — validates the fast-path
test_make_clone_plan_heterogeneous_mutates_spawn_paths — validates multi-variant behavior
_drain_replication_queue fixture prevents test cross-contamination

Gap: No integration test verifies the full ReplicateSession context-manager flow (enter → construct assets → exit → verify plan published). The unit tests cover individual primitives but not the composed lifecycle that InteractiveScene relies on. Consider adding a test that exercises ReplicateSession end-to-end with a mock backend.

Summary

Category	Rating
Architecture	✅ Excellent — clean decoupling, extensible backends
Error Handling	⚠️ Minor gaps (SimulationContext guard, silent queue skipping)
Performance	⚠️ Fabric notice suspension removed — verify no regression
Test Coverage	✅ Good unit tests; integration test for composed lifecycle would strengthen
Migration Safety	✅ Good — legacy free-function wrappers preserved for backward compat

Verdict: Strong architectural improvement. The findings above are advisory — the design is sound and the implementation is clean. The main risk to validate is the Fabric notice suspension removal under real workloads.

Update (f774d4d): Reviewed incremental changes. All new commits are cosmetic — docstring trimming, line reformatting, and removing string-quoted type annotations (e.g. "ClonePlan" → ClonePlan). No logic changes, no new issues introduced. Previous findings still apply as-is.

Update (788d71a): Two previous concerns addressed:

✅ Finding #1 (Fabric notice suspension): replicate() now wraps copy work in disabled_fabric_change_notifies(self.stage) — performance optimization restored.
✅ Finding #3 (ClonePlan.from_env_0 timing): Docstring now documents the ordering constraint (must be called after asset constructors have run).

No new issues introduced. The refactor extracting _apply_queue() is clean.

Update (034506d): Cosmetic changes only — line reformatting in usd.py (multi-line statements condensed to single lines) and added changelog skip marker. No logic changes, no new issues.

Update (de00120): Functional improvements with good test coverage:

replicate_session.py: Refactored to group cfgs by backend and deduplicate shared rows before dispatching — prevents redundant instantiation when multiple cfgs share the same row (common in homogeneous plans)
sensor_base.py: Now uses sim.get_clone_plan().env_ids for env count when available, handling backends like Newton that clone solver-side without authoring per-env USD specs
Added test_replicate_dedupes_shared_rows_across_cfgs regression test validating the dedup logic

No new issues. These are solid bug fixes / edge case handling.

Update (fb61d98): CI config change only — added threedworld.org to the link-checker exclusion list in .github/workflows/check-links.yml. Unrelated to the core PR logic; no code changes.

- ruff-format collapsed two multi-line expressions in cloner/usd.py back to single lines. - isaaclab_ovphysx gained changes in this PR (cloner module rewrite) but had no changelog fragment; add a .skip mirroring the sibling backends.

Host returns HTTP 403 to lychee's user agent on every run, blocking the "Check for Broken Links" job on unrelated docs (`ecosystem.rst:212`). Same pattern used for other bot-blocking hosts already in the exclude list (stackoverflow.com, helm.ngc.nvidia.com, etc.).

commit

e75f1bc

ooctipus requested review from ClemensSchwarke, Mayankm96, Toni-SM, hhansen-bdai, jtigue-bdai, kellyguo11, masoudmoghani, matthewtrepte, ossamaAhmed, pascal-roth, peterd-NV and xyao-nv as code owners May 25, 2026 11:09

ooctipus changed the base branch from main to develop May 25, 2026 11:10

github-actions Bot added documentation Improvements or additions to documentation enhancement New feature or request isaac-sim Related to Isaac Sim team isaac-mimic Related to Isaac Mimic team infrastructure labels May 25, 2026

isaaclab-review-bot Bot reviewed May 25, 2026

View reviewed changes

lint

f774d4d

github-actions Bot added the isaac-lab Related to Isaac Lab team label May 25, 2026

ooctipus added 4 commits May 25, 2026 04:41

address some comments

788d71a

fix bugs

de00120

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decouple backend-specific cloning from InteractiveScene#5770

Decouple backend-specific cloning from InteractiveScene#5770
ooctipus wants to merge 6 commits into
isaac-sim:developfrom
ooctipus:feature/clone_plan_new_representation

ooctipus commented May 25, 2026

Uh oh!

isaaclab-review-bot Bot left a comment •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ooctipus commented May 25, 2026

Summary

What changed

New core primitives in isaaclab.cloner

Two equivalent invocation paths

What got removed from InteractiveScene

Why this matters (the actual point of the PR)

Migration notes

Test plan

Uh oh!

isaaclab-review-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

🤖 Isaac Lab Review Bot — Multi-Perspective Analysis

Architecture Assessment (Isaac Lab Expert)

Findings

1. 🟡 Performance Regression Risk: disabled_fabric_change_notifies Removed

2. 🟡 Module-Level REPLICATION_QUEUE — Implicit Global State

3. 🟡 ClonePlan.from_env_0 Couples to Live Queue State

4. 🟢 _collect_asset_cfgs vs _add_entities_from_cfg — Duplicated ENV_REGEX_NS Resolution

5. 🟡 SimulationContext.instance() Crash in replicate()

6. 🟢 Strong Test Coverage for New Primitives

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

New core primitives in `isaaclab.cloner`

What got removed from `InteractiveScene`

isaaclab-review-bot Bot left a comment •

edited

Loading

1. 🟡 Performance Regression Risk: `disabled_fabric_change_notifies` Removed

2. 🟡 Module-Level `REPLICATION_QUEUE` — Implicit Global State

3. 🟡 `ClonePlan.from_env_0` Couples to Live Queue State

4. 🟢 `_collect_asset_cfgs` vs `_add_entities_from_cfg` — Duplicated `ENV_REGEX_NS` Resolution

5. 🟡 `SimulationContext.instance()` Crash in `replicate()`