
[IK] Stop DiffIK action term from accumulating body-offset on aliased Jacobian buffer #5775

Open
hujc7 wants to merge 1 commit into isaac-sim:develop from hujc7:jichuanh/diffik-jacobian-aliasing-fix

Conversation

@hujc7
Collaborator

@hujc7 hujc7 commented May 26, 2026

1. Summary

  • DifferentialInverseKinematicsAction._compute_frame_jacobian aliased self.jacobian_b and applied the body-offset correction in place. If the underlying data source returns a view onto the engine's mutable Jacobian buffer, repeated calls within a single step drifted the translational Jacobian linearly per call.
  • Fix: allocate an owned self._jacobian_b buffer in __init__, copy into it before mutating. Mirrors the pattern already used by OperationalSpaceControllerAction._compute_ee_jacobian.
  • Adds a stub-based regression test that fails against the previous behavior (2/3 assertions) and passes after the fix.
  • One file changed in product code (+16 −5); no behavior change for callers because the returned tensor has identical values.
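The failure mode and the fix described above can be sketched in isolation. This is a minimal illustration, not the Isaac Lab implementation: `engine_buffer`, `offset_term`, `compute_aliased`, and `compute_owned` are hypothetical names, and the real body-offset correction is replaced by a constant term so the per-call drift is easy to see.

```python
import torch

num_envs, num_joints = 2, 3
# Hypothetical stand-in for the engine's mutable Jacobian buffer.
engine_buffer = torch.ones(num_envs, 6, num_joints)
# Hypothetical constant standing in for the body-offset correction term.
offset_term = torch.full((num_envs, 3, num_joints), 0.5)


def compute_aliased(source: torch.Tensor) -> torch.Tensor:
    """Old pattern: alias the source and correct it in place."""
    jacobian = source  # alias, no copy
    jacobian[:, 0:3, :] += offset_term
    return jacobian


def compute_owned(source: torch.Tensor, owned: torch.Tensor) -> torch.Tensor:
    """Fixed pattern: copy into an owned buffer, then correct the copy."""
    owned[:] = source
    owned[:, 0:3, :] += offset_term
    return owned


# Aliased version: every call adds the correction to the source again.
aliased_source = engine_buffer.clone()
first = compute_aliased(aliased_source).clone()
second = compute_aliased(aliased_source).clone()
drift = (second - first)[:, 0:3, :]

# Owned version: repeated calls agree and the source stays untouched.
owned_buffer = torch.zeros_like(engine_buffer)
owned_first = compute_owned(engine_buffer, owned_buffer).clone()
owned_second = compute_owned(engine_buffer, owned_buffer).clone()
```

In this sketch `drift` equals `offset_term` after one extra call, which is the linear per-call accumulation, while `owned_first` and `owned_second` are identical.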

2. Why the previous code happened to work today

  • self.jacobian_b calls self.jacobian_w, which advanced-indexes the source buffer as [:, int_body_idx, :, list_joint_ids]. PyTorch mixed advanced indexing returns a fresh copy per call, so the in-place += mutated a copy in practice.
  • The safety depended entirely on _jacobi_body_idx being a single int and _jacobi_joint_ids being a list. A refactor to slices would re-expose the bug.
  • The new owned-buffer pattern makes the copy explicit at the action term boundary and is insensitive to upstream indexing semantics.
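The indexing dependence noted above can be checked with a small standalone snippet. The buffer shape and the names `source`, `body_idx`, and `joint_ids` are assumptions for illustration only. Indexing with a list (advanced indexing) yields a copy, while replacing the list with a slice yields a view that writes through to the source.

```python
import torch

# Hypothetical source buffer shaped (num_envs, num_bodies, 6, num_dofs).
source = torch.zeros(4, 5, 6, 9)
body_idx = 2
joint_ids = [0, 1, 2]

# Int index plus list index: advanced indexing, the result is a fresh copy.
indexed_copy = source[:, body_idx, :, joint_ids]
indexed_copy += 1.0
source_untouched_after_copy_write = bool((source == 0).all())

# Int index plus slice: basic indexing, the result is a view onto the source.
indexed_view = source[:, body_idx, :, 0:3]
indexed_view += 1.0
source_changed_after_view_write = bool((source[:, body_idx, :, 0:3] == 1).all())
```

The second half is the situation a refactor to slices would create: the same in-place write now lands in the source buffer.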

3. Test plan

  • test_compute_frame_jacobian_is_idempotent_within_step — three consecutive calls under non-None body offset return identical tensors. Fails against develop, passes against the fix.
  • test_compute_frame_jacobian_applies_offset_once — output matches an out-of-place reference computation.
  • test_compute_frame_jacobian_returns_owned_buffer — return tensor is the owned buffer, not the data-layer source.
  • Pre-commit (./isaaclab.sh -f) clean on changed files.
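The three checks in the test plan can be sketched against a small stand-in class. Everything here is hypothetical: `OwnedJacobianTerm`, `skew`, and `reference` are illustrative names, and the correction form (translational rows minus skew(offset) times rotational rows, followed by rotating the rotational rows) is an assumption about the body-offset math rather than a quote of the production code.

```python
import torch


def skew(v: torch.Tensor) -> torch.Tensor:
    """Batched skew-symmetric matrices (N, 3, 3) for vectors (N, 3)."""
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    zero = torch.zeros_like(x)
    return torch.stack(
        [
            torch.stack([zero, -z, y], dim=-1),
            torch.stack([z, zero, -x], dim=-1),
            torch.stack([-y, x, zero], dim=-1),
        ],
        dim=-2,
    )


class OwnedJacobianTerm:
    """Hypothetical action term that owns its Jacobian buffer."""

    def __init__(self, source, offset_pos, offset_rot_mat):
        self.source = source
        self.offset_pos = offset_pos
        self.offset_rot_mat = offset_rot_mat
        self._jacobian_b = torch.zeros_like(source)

    def compute(self) -> torch.Tensor:
        # Copy first, so the in-place correction never touches the source.
        self._jacobian_b[:] = self.source
        self._jacobian_b[:, 0:3, :] += torch.bmm(-skew(self.offset_pos), self._jacobian_b[:, 3:, :])
        self._jacobian_b[:, 3:, :] = torch.bmm(self.offset_rot_mat, self._jacobian_b[:, 3:, :])
        return self._jacobian_b


def reference(source, offset_pos, offset_rot_mat) -> torch.Tensor:
    """Out-of-place version of the same (assumed) correction."""
    linear = source[:, 0:3, :] - torch.bmm(skew(offset_pos), source[:, 3:, :])
    angular = torch.bmm(offset_rot_mat, source[:, 3:, :])
    return torch.cat([linear, angular], dim=1)


torch.manual_seed(0)
num_envs, num_joints = 4, 7
source = torch.randn(num_envs, 6, num_joints)
source_snapshot = source.clone()
offset_pos = torch.tensor([0.0, 0.0, 0.05]).repeat(num_envs, 1)
# 90 degree rotation about the y axis, repeated per environment.
rot = torch.tensor([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]).repeat(num_envs, 1, 1)

term = OwnedJacobianTerm(source, offset_pos, rot)
j1, j2, j3 = (term.compute().clone() for _ in range(3))
expected = reference(source, offset_pos, rot)
```

Idempotency corresponds to `j1`, `j2`, and `j3` being equal, single application to `j1` matching `expected`, and ownership to `term.compute()` returning `term._jacobian_b` itself.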

4. Out of scope

  • The jacobian_b property itself does the same alias-then-mutate trick when rotating into base frame (jacobian[:, :3, :] = torch.bmm(...)). Currently safe by the same advanced-indexing copy. Left alone in this PR to keep scope minimal; flagging for a possible follow-up.
  • OperationalSpaceControllerAction.jacobian_b has the same property-level pattern but its consumer copies into an owned buffer before mutating, so it's safe-by-consumer.

Refs NVBug 6043099

@github-actions github-actions Bot added bug Something isn't working isaac-lab Related to Isaac Lab team labels May 26, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment


Code Review Summary

This PR correctly fixes a subtle Jacobian aliasing bug (NVBug 6043099) where _compute_frame_jacobian was mutating a potentially shared buffer in place. The fix allocates an owned _jacobian_b buffer and copies into it before applying the body-offset correction — a robust pattern already used in OperationalSpaceControllerAction.

✅ What Works Well

  1. Correct fix approach: The copy-before-mutate pattern ensures idempotency regardless of whether jacobian_b returns a view or copy
  2. Good test coverage: Three regression tests covering idempotency, correctness, and buffer ownership
  3. Clear documentation: The PR description thoroughly explains why the old code "happened to work" and the fix rationale
  4. Follows existing patterns: Mirrors the safe pattern already in OperationalSpaceControllerAction._compute_ee_jacobian

📋 Minor Observations

The test coverage is thorough for the non-None body_offset case. Consider adding a brief test for the body_offset=None path to ensure the owned buffer is still returned correctly (though this path has simpler logic and lower risk).

📝 Design Note

The PR body correctly notes that jacobian_b property itself has the same alias-then-mutate pattern when rotating into base frame. While safe today due to advanced indexing semantics, this is worth tracking for a follow-up to make the safety explicit.


Overall, this is a well-crafted fix with appropriate tests. The change is minimal and focused on the bug at hand.


# owned buffer for the body-frame Jacobian. Avoids aliasing the underlying
# data source (which can be a view onto the engine's mutable buffer) so the
# in-place body-offset correction in :meth:`_compute_frame_jacobian` is safe

Good practice: Pre-allocating the buffer in __init__ is the right approach. This ensures consistent memory layout and avoids repeated allocations during simulation steps.

Minor suggestion: Consider adding a brief inline comment noting the buffer shape (num_envs, 6, num_joints) corresponds to (batch, spatial_dim, dof) for future readers unfamiliar with Jacobian conventions.


torch.testing.assert_close(j1, j2)
torch.testing.assert_close(j1, j3)
# The backing buffer must be untouched: the fix may not corrupt the source.

Excellent test design: Verifying that the backing buffer is untouched after the operation (torch.testing.assert_close(backing, backing_snapshot)) is crucial — it ensures the fix doesn't corrupt the source data, which would cause issues for other consumers of the same Jacobian buffer.

… Jacobian buffer

DifferentialInverseKinematicsAction._compute_frame_jacobian read
self.jacobian_b into a local alias and applied the body-offset
correction in place. When the underlying data source returns a view
onto the engine's mutable Jacobian buffer, repeated calls within a
single step accumulated the correction, drifting the translational
Jacobian linearly per call.

Allocate an owned self._jacobian_b buffer in __init__ and copy into
it before mutating, mirroring the pattern already used by
OperationalSpaceControllerAction. The fix is robust to whether
jacobian_b returns a view or a copy.

Add a stub-based regression test that fails against the previous
behavior and passes after the fix.

Refs NVBug 6043099
@hujc7 hujc7 force-pushed the jichuanh/diffik-jacobian-aliasing-fix branch from 432e22d to f0154ac Compare May 26, 2026 04:50
@hujc7 hujc7 marked this pull request as ready for review May 26, 2026 04:51
@hujc7 hujc7 requested a review from ooctipus as a code owner May 26, 2026 04:51
@greptile-apps
Contributor

greptile-apps Bot commented May 26, 2026

Greptile Summary

This PR fixes DifferentialInverseKinematicsAction._compute_frame_jacobian so it no longer aliases the jacobian_b return value and mutates it in-place; instead it pre-allocates an owned _jacobian_b buffer in __init__ and copies into it before applying the body-offset correction on every call.

  • Production fix (task_space_actions.py): three lines changed — buffer allocation in __init__, copy-then-mutate in _compute_frame_jacobian, return the owned buffer. Mirrors the existing pattern in OperationalSpaceControllerAction._compute_ee_jacobian.
  • Regression tests (test_diffik_jacobian_aliasing.py): three stub-based tests confirm idempotency across repeated calls, single application of the offset correction, and buffer-identity of the returned tensor.
  • The PR explicitly flags that jacobian_b itself still applies the same alias+mutate pattern internally (safe today due to PyTorch advanced-indexing copy semantics, but fragile under refactoring) and defers that fix to a follow-up.

Confidence Score: 4/5

Safe to merge; the one-line copy before mutation is a straightforward defensive hardening with no behavior change under current engine semantics.

The production change is minimal and correct — it replaces a fragile alias with an explicit owned-buffer copy, matching an established pattern elsewhere in the same file. The jacobian_b property retains the same structural pattern (acknowledged and deferred), meaning the overall approach still has one latent alias dependency that a future indexing refactor could expose. The test suite covers the core regression well but leaves the rotational-correction branch of the idempotency test exercised only with an identity quaternion.

The jacobian_b property in task_space_actions.py (lines 144–150) is worth a second look for the follow-up fix noted in the PR description.

Important Files Changed

| Filename | Overview |
|---|---|
| source/isaaclab/isaaclab/envs/mdp/actions/task_space_actions.py | Allocates an owned _jacobian_b buffer in __init__ and copies into it before mutating in _compute_frame_jacobian, eliminating the aliased-buffer accumulation bug; no logic changes to callers. |
| source/isaaclab/test/envs/test_diffik_jacobian_aliasing.py | Stub-based regression suite covering idempotency, single-application of offset, and owned-buffer identity; all three tests target the specific aliasing failure mode, though the idempotency test uses an identity rotation that leaves the rotational-correction branch un-exercised for accumulation. |
| source/isaaclab/changelog.d/jichuanh-diffik-jacobian-aliasing-fix.rst | Changelog fragment correctly describing the fixed drift behaviour; no issues. |

Sequence Diagram

sequenceDiagram
    participant C as apply_actions()
    participant M as _compute_frame_jacobian()
    participant P as jacobian_b (property)
    participant W as jacobian_w (property)
    participant E as Engine Jacobian Buffer

    C->>M: call
    M->>P: self.jacobian_b
    P->>W: self.jacobian_w
    W->>E: advanced index [:, int_idx, :, list_ids]
    E-->>W: fresh copy (no alias)
    W-->>P: tensor copy
    P->>P: rotate in-place (mutates copy, not engine buffer)
    P-->>M: rotated copy
    M->>M: "self._jacobian_b[:] = rotated copy"
    M->>M: apply body-offset correction on self._jacobian_b
    M-->>C: return self._jacobian_b (owned buffer ref)

Comments Outside Diff (1)

  1. source/isaaclab/isaaclab/envs/mdp/actions/task_space_actions.py, lines 144-150

    P2 jacobian_b still aliases and mutates the jacobian_w copy in-place

    The property calls self.jacobian_w, stores the result in jacobian, then applies jacobian[:, :3, :] = torch.bmm(...) and jacobian[:, 3:, :] = torch.bmm(...) before returning it. This is the same structural pattern that caused the bug in _compute_frame_jacobian. It is currently safe only because jacobian_w happens to return a fresh tensor via mixed advanced indexing ([:, int_body_idx, :, list_joint_ids]); a refactor of that indexing to a pure slice would silently corrupt the engine's live Jacobian buffer. The PR description flags this for a follow-up, but adding a comment here (or an assertion) would make the implicit assumption explicit for future maintainers.
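One way to make that implicit assumption explicit is a storage-sharing check. The sketch below is only an illustration: `shares_storage` is a hypothetical helper, the buffer shape is made up, and it assumes a PyTorch version that provides `Tensor.untyped_storage()` (2.0 or newer).

```python
import torch


def shares_storage(a: torch.Tensor, b: torch.Tensor) -> bool:
    """Return True when both tensors are backed by the same storage."""
    return a.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()


# Hypothetical source buffer shaped (num_envs, num_bodies, 6, num_dofs).
source = torch.zeros(4, 5, 6, 9)

copy_result = source[:, 2, :, [0, 1, 2]]  # advanced indexing: new storage
view_result = source[:, 2, :, 0:3]  # basic indexing: same storage

copy_is_independent = not shares_storage(copy_result, source)
view_is_aliased = shares_storage(view_result, source)

# A guard of this form would fail loudly if the indexing ever became a view.
assert copy_is_independent, "Jacobian aliases the source buffer; clone before mutating."
```

A defensive `.clone()` before the in-place rotation would be the alternative that does not depend on the check at all.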

Reviews (1): Last reviewed commit: "[IK] Stop DiffIK action term from accumu..."

Comment on lines +56 to +77
num_envs, num_joints = 4, 7
backing = torch.randn(num_envs, 6, num_joints)
backing_snapshot = backing.clone()

stub = _make_stub(
    num_envs,
    num_joints,
    body_offset_pos=[0.0, 0.0, 0.05],
    body_offset_rot=[1.0, 0.0, 0.0, 0.0],
    backing_buffer=backing,
)

compute = DifferentialInverseKinematicsAction._compute_frame_jacobian

j1 = compute(stub).clone()
j2 = compute(stub).clone()
j3 = compute(stub).clone()

torch.testing.assert_close(j1, j2)
torch.testing.assert_close(j1, j3)
# The backing buffer must be untouched: the fix may not corrupt the source.
torch.testing.assert_close(backing, backing_snapshot)
Contributor


P2 Idempotency test uses identity rotation, leaving rotational-accumulation path unchecked

body_offset_rot=[1.0, 0.0, 0.0, 0.0] makes matrix_from_quat(offset_rot) equal to the identity matrix, so the rotational correction (self._jacobian_b[:, 3:, :] = bmm(I, ...)) is a no-op and would be idempotent even under the old alias-and-mutate code. The test fully captures the translational += accumulation bug, but a second parameterization with a non-trivial rotation would give confidence that the rotational path is also covered by the owned-buffer pattern (it is, but the test doesn't prove it).
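The point about the identity quaternion can be illustrated without the Isaac Lab helpers. In this sketch `rotation_from_quat_wxyz` and `rotate_in_place_twice` are hypothetical functions written for the example, quaternions are assumed to be in (w, x, y, z) order, and the tensor is a single unbatched block of rotational rows.

```python
import math

import torch


def rotation_from_quat_wxyz(q) -> torch.Tensor:
    """Rotation matrix (3, 3) for a unit quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    return torch.tensor(
        [
            [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
        ]
    )


def rotate_in_place_twice(rot: torch.Tensor, angular_rows: torch.Tensor):
    """Apply the rotation in place twice, as an alias-and-mutate path would."""
    aliased = angular_rows.clone()
    aliased[:] = rot @ aliased
    once = aliased.clone()
    aliased[:] = rot @ aliased
    return once, aliased


angular_rows = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Identity quaternion: the repeated in-place rotation changes nothing.
identity_rot = rotation_from_quat_wxyz((1.0, 0.0, 0.0, 0.0))
id_once, id_twice = rotate_in_place_twice(identity_rot, angular_rows)

# 90 degree rotation about y: the second application rotates again.
s = math.sqrt(0.5)
quarter_turn = rotation_from_quat_wxyz((s, 0.0, s, 0.0))
rot_once, rot_twice = rotate_in_place_twice(quarter_turn, angular_rows)
```

With the identity quaternion `id_once` and `id_twice` match, so an accumulation bug in the rotational rows would go unnoticed; with the quarter turn they differ, which is what a second parameterization would catch.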

@isaaclab-review-bot

🤖 Review Update (f0154ac)

Reviewed incremental changes from 432e22d to f0154ac.

Summary

Changelog entry added — proper documentation of the fix
Core fix is clean — copies into owned buffer before mutation, preventing aliasing issues
Test suite is comprehensive — good coverage of idempotency, correctness, and buffer ownership

Previous Suggestions

  • The P2 suggestion about using a non-trivial rotation in the idempotency test: test_compute_frame_jacobian_applies_offset_once already exercises a non-trivial rotation [0.7071, 0.0, 0.7071, 0.0], so rotational coverage exists across the test suite.

Verdict

No new issues found. Ready to merge. 🚀

@hujc7 hujc7 requested a review from kellyguo11 May 26, 2026 05:21