
Runtime fails on valid channels_last input for Conv2d + Sigmoid + spatial Mean #19139

@wuyii8941


🐛 Describe the bug

Description

A small model exported through ExecuTorch lowers successfully, but fails at runtime when the input tensor uses PyTorch's channels_last memory format.

The same model runs correctly with a contiguous input. The channels_last input is a valid PyTorch tensor, and eager execution succeeds.

Reproducer

import importlib.metadata

import torch
from torch.export import export

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower
from executorch.extension.pybindings.portable_lib import _load_for_executorch_from_buffer


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        torch.manual_seed(1014)
        self.conv = torch.nn.Conv2d(5, 11, kernel_size=3, padding=1)

    def forward(self, x):
        y = torch.sigmoid(self.conv(x))
        return y.mean(dim=(2, 3))


print("torch:", torch.__version__)
print("executorch:", importlib.metadata.version("executorch"))

model = Model().eval()

x = torch.randn(2, 5, 13, 7).to(memory_format=torch.channels_last)
print("shape:", tuple(x.shape))
print("stride:", tuple(x.stride()))
print("is_contiguous:", x.is_contiguous())
print("is_channels_last:", x.is_contiguous(memory_format=torch.channels_last))

ref = model(x)

ep = export(model, (x,), strict=True)
et = to_edge_transform_and_lower(
    ep,
    partitioner=[XnnpackPartitioner()],
).to_executorch()

mod = _load_for_executorch_from_buffer(et.buffer)
out = mod.run_method("forward", (x,))[0]

print("max diff:", (out - ref).abs().max().item())

Actual Behavior

Lowering succeeds, but runtime execution fails:

[tensor_util_portable.cpp:130] Check failed (all_contiguous || all_channels_last): 2 input tensors have different dim orders
[op_mean.cpp:37] Check failed (tensors_have_same_dim_order(in, out)):
[method.cpp:1387] KernelCall failed at instruction 0:1 in operator aten::mean.out: 0x12
RuntimeError: Failed to execute method forward, error: 0x12

Expected Behavior

The program should either:

  1. run successfully and match PyTorch eager output, or
  2. fail during lowering / validation with a clear unsupported-layout diagnostic.

It seems undesirable for the official XNNPACK lowering path to produce a
program that later fails at runtime due to a dim-order mismatch.

Control Case

Changing only the input layout to contiguous makes the same model pass:

x = torch.randn(2, 5, 13, 7)

Observed result:

runtime: ok
max_diff: ~2.4e-07
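As a possible workaround (not a fix), converting the channels_last input back to the default contiguous layout before invoking the ExecuTorch module should sidestep the mismatch, since only the strides change, not the values. A minimal sketch; `x_fixed` is a name I made up:

```python
import torch

# Hypothetical workaround: copy the channels_last input back into the
# default contiguous (NCHW dim order) layout before handing it to the
# ExecuTorch module. The values are identical; only the strides differ.
x = torch.randn(2, 5, 13, 7).to(memory_format=torch.channels_last)
x_fixed = x.contiguous()  # copies into standard NCHW strides

print("same values:", torch.equal(x, x_fixed))
print("fixed is_contiguous:", x_fixed.is_contiguous())
```

This of course costs an extra copy and defeats the point of using channels_last in the first place.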

Notes

The failing input is a valid PyTorch channels_last tensor:

shape: (2, 5, 13, 7)
stride: (455, 1, 35, 5)
is_contiguous: False
is_channels_last: True
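For reference, those stride values are exactly what channels_last implies: for an NCHW-shaped tensor of shape (N, C, H, W), channels_last stores elements in N, H, W, C order, so the strides are (H*W*C, 1, W*C, C). A quick arithmetic check:

```python
# Derive channels_last strides for an NCHW-shaped tensor.
# Memory order is N, H, W, C, so:
#   stride(N) = H*W*C, stride(C) = 1, stride(H) = W*C, stride(W) = C
def channels_last_strides(n, c, h, w):
    return (h * w * c, 1, w * c, c)

print(channels_last_strides(2, 5, 13, 7))  # matches (455, 1, 35, 5)
```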

The model is ordinary inference code: Conv2d -> sigmoid -> mean(dim=(2, 3)).

The failure looks related to dim-order propagation between the
XNNPACK-delegated subgraph and the remaining portable aten::mean.out op, but I
may be misreading the lowering boundary.

The same basic failure also reproduces through the portable lowering path
(executorch.exir.to_edge(...).to_executorch()), so this may be broader than an
XNNPACK-only issue.

Additional targeted scouting found similar channels_last-only runtime failures
for sum, softmax, slice_copy, permute_copy, and add consumers after
Conv2d. Contiguous controls passed in these cases.
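The eager-mode half of that scouting can be sketched as a torch-only loop (no executorch required); the consumer lambdas below are my paraphrase of the variants I tested, and they all run fine in eager mode, confirming the inputs themselves are valid:

```python
import torch

conv = torch.nn.Conv2d(5, 11, kernel_size=3, padding=1).eval()
x = torch.randn(2, 5, 13, 7).to(memory_format=torch.channels_last)

# Consumers after Conv2d that showed channels_last-only runtime failures
# in the lowered program. Each succeeds in eager execution.
consumers = {
    "mean": lambda y: y.mean(dim=(2, 3)),
    "sum": lambda y: y.sum(dim=(2, 3)),
    "softmax": lambda y: torch.softmax(y, dim=1),
    "slice_copy": lambda y: y[:, :, 1:-1, :],
    "permute_copy": lambda y: y.permute(0, 2, 3, 1),
    "add": lambda y: y + 1.0,
}

with torch.no_grad():
    results = {name: fn(torch.sigmoid(conv(x))) for name, fn in consumers.items()}

for name, out in results.items():
    print(name, tuple(out.shape))
```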

Local Triage

  • Stable repro: yes, reproduced on torch 2.11.0 + executorch 1.2.0.
  • Nightly repro: not verified locally. In /data1/tzh/pt_nightly_env
    (torch 2.13.0.dev20260425 + executorch 1.2.0), even the contiguous control
    cases fail or abort at runtime with unrelated-looking metadata/type
    errors, so that environment is not a clean validation target.
  • Contiguous control: passes.
  • Invalid-argument check: input is a valid PyTorch channels_last tensor and
    eager execution succeeds.
  • Existing reports: searched exact error strings and did not find a matching
    public issue:
    • tensors_have_same_dim_order
    • op_mean.cpp
    • Attempted to resize a static tensor
    • XNNExecutor.cpp
    • all_contiguous || all_channels_last

Versions

  • torch: 2.11.0+cu130
  • executorch: 1.2.0
  • Python: 3.11
  • Platform: Linux x86_64
  • Backend: XNNPACK and portable runtime
  • API path:
    • torch.export.export
    • executorch.exir.to_edge_transform_and_lower(..., partitioner=[XnnpackPartitioner()])
    • .to_executorch()
    • Python runtime _load_for_executorch_from_buffer

cc @GregoryComer @digantdesai @cbilgin
