### 🐛 Describe the bug

**Description**

A small model exported through ExecuTorch lowers successfully, but fails at runtime when the input tensor uses the PyTorch `channels_last` memory format. The same model runs correctly with a contiguous input. The `channels_last` input is valid in PyTorch, and eager execution succeeds.
**Reproducer**

```python
import importlib.metadata

import torch
from torch.export import export

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower
from executorch.extension.pybindings.portable_lib import _load_for_executorch_from_buffer


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        torch.manual_seed(1014)
        self.conv = torch.nn.Conv2d(5, 11, kernel_size=3, padding=1)

    def forward(self, x):
        y = torch.sigmoid(self.conv(x))
        return y.mean(dim=(2, 3))


print("torch:", torch.__version__)
print("executorch:", importlib.metadata.version("executorch"))

model = Model().eval()
x = torch.randn(2, 5, 13, 7).to(memory_format=torch.channels_last)
print("shape:", tuple(x.shape))
print("stride:", tuple(x.stride()))
print("is_contiguous:", x.is_contiguous())
print("is_channels_last:", x.is_contiguous(memory_format=torch.channels_last))

ref = model(x)
ep = export(model, (x,), strict=True)
et = to_edge_transform_and_lower(
    ep,
    partitioner=[XnnpackPartitioner()],
).to_executorch()
mod = _load_for_executorch_from_buffer(et.buffer)
out = mod.run_method("forward", (x,))[0]
print("max diff:", (out - ref).abs().max().item())
```
**Actual Behavior**

Lowering succeeds, but runtime execution fails:

```
[tensor_util_portable.cpp:130] Check failed (all_contiguous || all_channels_last): 2 input tensors have different dim orders
[op_mean.cpp:37] Check failed (tensors_have_same_dim_order(in, out)):
[method.cpp:1387] KernelCall failed at instruction 0:1 in operator aten::mean.out: 0x12
RuntimeError: Failed to execute method forward, error: 0x12
```
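For context on the first log line: a "dim order" is the permutation of a tensor's dimensions sorted by decreasing stride, i.e. its physical layout order. A minimal illustrative sketch (the `dim_order` helper below is my own, not an ExecuTorch API):

```python
import torch

def dim_order(t: torch.Tensor):
    # Physical layout order of the logical dims: sort dims by decreasing
    # stride (ignores ties from size-1 dims, which don't occur here).
    return tuple(sorted(range(t.dim()), key=lambda d: -t.stride(d)))

x = torch.randn(2, 5, 13, 7)
print(dim_order(x))                                        # (0, 1, 2, 3): contiguous (NCHW) order
print(dim_order(x.to(memory_format=torch.channels_last)))  # (0, 2, 3, 1): channels_last (NHWC) order
```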
**Expected Behavior**

The program should either:

- run successfully and match PyTorch eager output, or
- fail during lowering / validation with a clear unsupported-layout diagnostic.

It seems undesirable for the official XNNPACK lowering path to successfully produce a program that later fails at runtime due to a dim order mismatch.
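To illustrate the second option, a hedged sketch of the kind of caller-side pre-lowering guard meant here (the helper name and message are hypothetical, not an existing ExecuTorch API):

```python
import torch

def assert_supported_input_layout(x: torch.Tensor) -> None:
    # Hypothetical guard: reject channels_last example inputs up front
    # instead of letting the lowered program fail inside the runtime.
    if x.dim() == 4 and not x.is_contiguous() and x.is_contiguous(
        memory_format=torch.channels_last
    ):
        raise ValueError(
            "channels_last inputs currently trip a dim-order mismatch at "
            "runtime; pass x.contiguous() or keep the input contiguous"
        )
```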
**Control Case**

Changing only the input layout to contiguous makes the same model pass:

```python
x = torch.randn(2, 5, 13, 7)
```

Observed result:

```
runtime: ok
max_diff: ~2.4e-07
```
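Relatedly, a caller-side workaround (not a fix) that I'd expect to behave like the control case is to copy the `channels_last` input to contiguous layout at call time; a minimal sketch reusing `model`, `mod`, and `x` from the reproducer:

```python
# Workaround sketch (assumes the names from the reproducer above):
# pay one layout copy so the runtime only ever sees contiguous inputs.
x_contig = x.contiguous()
out = mod.run_method("forward", (x_contig,))[0]
print("max diff vs eager:", (out - model(x)).abs().max().item())
```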
**Notes**

The failing input is a valid PyTorch `channels_last` tensor:

```
shape: (2, 5, 13, 7)
stride: (455, 1, 35, 5)
is_contiguous: False
is_channels_last: True
```

The model is ordinary inference code: `Conv2d -> sigmoid -> mean(dim=(2, 3))`.

The failure looks related to dim order propagation across XNNPACK delegated ops and the remaining portable `aten::mean.out` op, but I may be misreading the lowering boundary.

The same basic failure also reproduces through the portable lowering path (`executorch.exir.to_edge(...).to_executorch()`), so this may be broader than an XNNPACK-only issue.
Additional targeted scouting found similar channels_last-only runtime failures for `sum`, `softmax`, `slice_copy`, `permute_copy`, and `add` consumers after `Conv2d`; contiguous controls passed in these cases (see the sweep sketch below).
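To make that scouting concrete, here is a hedged sketch of the sweep harness; the consumer lambdas are my reconstruction of the ops named above (not verbatim from the original runs), and each variant goes through the same export/lower/execute pipeline as the main reproducer:

```python
import torch

# Sweep sketch (assumed reconstruction): wrap Conv2d with each consumer op
# and run it through the same export/lower/execute pipeline as above,
# once with a channels_last input and once with a contiguous control.
consumers = {
    "sum": lambda y: y.sum(dim=(2, 3)),
    "softmax": lambda y: torch.softmax(y, dim=1),
    "slice_copy": lambda y: y[:, :, 1:-1, :],
    "permute_copy": lambda y: y.permute(0, 2, 3, 1),
    "add": lambda y: y + 1.0,
}

class ConvThen(torch.nn.Module):
    def __init__(self, op):
        super().__init__()
        self.conv = torch.nn.Conv2d(5, 11, kernel_size=3, padding=1)
        self.op = op

    def forward(self, x):
        return self.op(self.conv(x))

# for name, op in consumers.items():
#     export/lower ConvThen(op) as in the reproducer, then execute with
#     channels_last x and with a contiguous control; only the
#     channels_last runs failed.
```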
**Local Triage**

- Stable repro: yes, reproduced on torch 2.11.0 + executorch 1.2.0.
- Nightly repro: not verified locally. In `/data1/tzh/pt_nightly_env` (torch 2.13.0.dev20260425 + executorch 1.2.0), even contiguous control cases fail/abort in the runtime with unrelated-looking metadata/type errors, so that environment is not a clean validation target.
- Contiguous control: passes.
- Invalid-argument check: the input is a valid PyTorch `channels_last` tensor, and eager execution succeeds.
- Existing reports: searched the exact error strings below and did not find a matching public issue:
  - `tensors_have_same_dim_order`
  - `op_mean.cpp`
  - `Attempted to resize a static tensor`
  - `XNNExecutor.cpp`
  - `all_contiguous || all_channels_last`
**Versions**

- torch: 2.11.0+cu130
- executorch: 1.2.0
- Python: 3.11
- Platform: Linux x86_64
- Backend: XNNPACK and portable runtime
- API path: `torch.export.export` -> `executorch.exir.to_edge_transform_and_lower(..., partitioner=[XnnpackPartitioner()]).to_executorch()`
- Python runtime: `_load_for_executorch_from_buffer`
cc @GregoryComer @digantdesai @cbilgin
2.11.0+cu1301.2.03.11torch.export.exportexecutorch.exir.to_edge_transform_and_lower(..., partitioner=[XnnpackPartitioner()]).to_executorch()_load_for_executorch_from_buffercc @GregoryComer @digantdesai @cbilgin