[IK] Stop DiffIK action term from accumulating body-offset on aliased Jacobian buffer#5775
[IK] Stop DiffIK action term from accumulating body-offset on aliased Jacobian buffer#5775hujc7 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review Summary
This PR correctly fixes a subtle Jacobian aliasing bug (NVBug 6043099) where _compute_frame_jacobian was mutating a potentially shared buffer in place. The fix allocates an owned _jacobian_b buffer and copies into it before applying the body-offset correction — a robust pattern already used in OperationalSpaceControllerAction.
✅ What Works Well
- Correct fix approach: The copy-before-mutate pattern ensures idempotency regardless of whether
jacobian_breturns a view or copy - Good test coverage: Three regression tests covering idempotency, correctness, and buffer ownership
- Clear documentation: The PR description thoroughly explains why the old code "happened to work" and the fix rationale
- Follows existing patterns: Mirrors the safe pattern already in
OperationalSpaceControllerAction._compute_ee_jacobian
📋 Minor Observations
The test coverage is thorough for the non-None body_offset case. Consider adding a brief test for the body_offset=None path to ensure the owned buffer is still returned correctly (though this path has simpler logic and lower risk).
📝 Design Note
The PR body correctly notes that jacobian_b property itself has the same alias-then-mutate pattern when rotating into base frame. While safe today due to advanced indexing semantics, this is worth tracking for a follow-up to make the safety explicit.
Overall, this is a well-crafted fix with appropriate tests. The change is minimal and focused on the bug at hand.
|
|
||
| # owned buffer for the body-frame Jacobian. Avoids aliasing the underlying | ||
| # data source (which can be a view onto the engine's mutable buffer) so the | ||
| # in-place body-offset correction in :meth:`_compute_frame_jacobian` is safe |
There was a problem hiding this comment.
Good practice: Pre-allocating the buffer in __init__ is the right approach. This ensures consistent memory layout and avoids repeated allocations during simulation steps.
Minor suggestion: Consider adding a brief inline comment noting the buffer shape (num_envs, 6, num_joints) corresponds to (batch, spatial_dim, dof) for future readers unfamiliar with Jacobian conventions.
|
|
||
| torch.testing.assert_close(j1, j2) | ||
| torch.testing.assert_close(j1, j3) | ||
| # The backing buffer must be untouched: the fix may not corrupt the source. |
There was a problem hiding this comment.
Excellent test design: Verifying that the backing buffer is untouched after the operation (torch.testing.assert_close(backing, backing_snapshot)) is crucial — it ensures the fix doesn't corrupt the source data, which would cause issues for other consumers of the same Jacobian buffer.
… Jacobian buffer DifferentialInverseKinematicsAction._compute_frame_jacobian read self.jacobian_b into a local alias and applied the body-offset correction in place. When the underlying data source returns a view onto the engine's mutable Jacobian buffer, repeated calls within a single step accumulated the correction, drifting the translational Jacobian linearly per call. Allocate an owned self._jacobian_b buffer in __init__ and copy into it before mutating, mirroring the pattern already used by OperationalSpaceControllerAction. The fix is robust to whether jacobian_b returns a view or a copy. Add a stub-based regression test that fails against the previous behavior and passes after the fix. Refs NVBug 6043099
432e22d to
f0154ac
Compare
Greptile SummaryThis PR fixes
Confidence Score: 4/5Safe to merge; the one-line copy before mutation is a straightforward defensive hardening with no behavior change under current engine semantics. The production change is minimal and correct — it replaces a fragile alias with an explicit owned-buffer copy, matching an established pattern elsewhere in the same file. The The Important Files Changed
Sequence DiagramsequenceDiagram
participant C as apply_actions()
participant M as _compute_frame_jacobian()
participant P as jacobian_b (property)
participant W as jacobian_w (property)
participant E as Engine Jacobian Buffer
C->>M: call
M->>P: self.jacobian_b
P->>W: self.jacobian_w
W->>E: advanced index [:, int_idx, :, list_ids]
E-->>W: fresh copy (no alias)
W-->>P: tensor copy
P->>P: rotate in-place (mutates copy, not engine buffer)
P-->>M: rotated copy
M->>M: "self._jacobian_b[:] = rotated copy"
M->>M: apply body-offset correction on self._jacobian_b
M-->>C: return self._jacobian_b (owned buffer ref)
|
| num_envs, num_joints = 4, 7 | ||
| backing = torch.randn(num_envs, 6, num_joints) | ||
| backing_snapshot = backing.clone() | ||
|
|
||
| stub = _make_stub( | ||
| num_envs, | ||
| num_joints, | ||
| body_offset_pos=[0.0, 0.0, 0.05], | ||
| body_offset_rot=[1.0, 0.0, 0.0, 0.0], | ||
| backing_buffer=backing, | ||
| ) | ||
|
|
||
| compute = DifferentialInverseKinematicsAction._compute_frame_jacobian | ||
|
|
||
| j1 = compute(stub).clone() | ||
| j2 = compute(stub).clone() | ||
| j3 = compute(stub).clone() | ||
|
|
||
| torch.testing.assert_close(j1, j2) | ||
| torch.testing.assert_close(j1, j3) | ||
| # The backing buffer must be untouched: the fix may not corrupt the source. | ||
| torch.testing.assert_close(backing, backing_snapshot) |
There was a problem hiding this comment.
Idempotency test uses identity rotation, leaving rotational-accumulation path unchecked
body_offset_rot=[1.0, 0.0, 0.0, 0.0] makes matrix_from_quat(offset_rot) equal to the identity matrix, so the rotational correction (self._jacobian_b[:, 3:, :] = bmm(I, ...)) is a no-op and would be idempotent even under the old alias-and-mutate code. The test fully captures the translational += accumulation bug, but a second parameterization with a non-trivial rotation would give confidence that the rotational path is also covered by the owned-buffer pattern (it is, but the test doesn't prove it).
🤖 Review Update (f0154ac)Reviewed incremental changes from Summary✅ Changelog entry added — proper documentation of the fix Previous Suggestions
VerdictNo new issues found. Ready to merge. 🚀 |
1. Summary
DifferentialInverseKinematicsAction._compute_frame_jacobianaliasedself.jacobian_band applied the body-offset correction in place. If the underlying data source returns a view onto the engine's mutable Jacobian buffer, repeated calls within a single step drifted the translational Jacobian linearly per call.self._jacobian_bbuffer in__init__, copy into it before mutating. Mirrors the pattern already used byOperationalSpaceControllerAction._compute_ee_jacobian.2. Why the previous code happened to work today
self.jacobian_bcallsself.jacobian_w, which advanced-indexes the source buffer as[:, int_body_idx, :, list_joint_ids]. PyTorch mixed advanced indexing returns a fresh copy per call, so the in-place+=mutated a copy in practice._jacobi_body_idxbeing a single int and_jacobi_joint_idsbeing a list. A refactor to slices would re-expose the bug.3. Test plan
test_compute_frame_jacobian_is_idempotent_within_step— three consecutive calls under non-Nonebody offset return identical tensors. Fails againstdevelop, passes against the fix.test_compute_frame_jacobian_applies_offset_once— output matches an out-of-place reference computation.test_compute_frame_jacobian_returns_owned_buffer— return tensor is the owned buffer, not the data-layer source../isaaclab.sh -f) clean on changed files.4. Out of scope
jacobian_bproperty itself does the same alias-then-mutate trick when rotating into base frame (jacobian[:, :3, :] = torch.bmm(...)). Currently safe by the same advanced-indexing copy. Left alone in this PR to keep scope minimal; flagging for a possible follow-up.OperationalSpaceControllerAction.jacobian_bhas the same property-level pattern but its consumer copies into an owned buffer before mutating, so it's safe-by-consumer.Refs NVBug 6043099