
[Relax][ONNX] Add ONNX Backend Tests for systematic frontend coverage#19515

Open
Aharrypotter wants to merge 2 commits into apache:main from Aharrypotter:relax-onnx-backend-tests

Conversation

Contributor

Aharrypotter commented May 6, 2026

Summary

Introduce a test runner that reuses the official ONNX Backend Test Suite to systematically verify the Relax ONNX importer. This complements the existing hand-written tests in test_frontend_onnx.py by providing spec-aligned coverage of standard ONNX operator semantics.

Towards #19505

Motivation

The existing test_frontend_onnx.py has 187 hand-written tests that validate TVM-specific importer behavior (parameter handling, name sanitization, dynamic shapes, Relax IR structure). However, it relies on ONNX Runtime as the reference and cannot systematically cover all edge cases defined in the ONNX specification.

The ONNX Backend Test Suite provides 1653+ node-level tests with protobuf reference inputs/outputs. It is the industry standard for validating ONNX importers/exporters (used by ONNX Runtime, TensorFlow, PyTorch). Reusing it gives Relax a living, upstream-aligned correctness baseline.

What this PR adds

  • tests/python/relax/test_frontend_onnx_backend.py — a backend adapter (TVMRelaxBackend) that implements the onnx.backend.base.Backend interface, wiring from_onnx() → DecomposeOpsForInference() → LegalizeOps() → tvm.compile() → VirtualMachine.

Coverage

72 operators with 388 test cases, all passing. Only operators where every ONNX node test passes are included — no xfail markers.

Operators not yet covered include: cast (exotic dtypes), reduce ops (edge cases), reshape/resize/attention (complex behavior), quantization, and several others with known importer gaps. These can be added incrementally as the importer improves.

Test results

388 passed, 3216 skipped (CUDA variants + operators not yet in allowlist), 0 failed, 0 xfailed

CI impact

  • New test file is not added to any existing CI test shard by default
  • Full suite (388 tests) is lightweight on CPU-only runners

Design decisions

  • Coexistence with existing tests: test_frontend_onnx.py remains unchanged. Backend tests cover standard ONNX semantics; hand-written tests continue to cover TVM-specific behavior (dynamic shapes, Relax IR structure, importer options).
  • Public API only: uses backend_test.include() with ^-anchored regex patterns. No access to private ONNX APIs.
  • No xfail: only include operators that fully pass. Uncovered operators are documented in code comments and this PR description. Follow-up PRs can expand coverage as importer gaps are fixed.
  • Prefix conflict handling: include() patterns use ^test_{op}(?:_.*)?(?:_cpu|_cuda)$, which can cause false matches when a short op name is a prefix of a longer one (e.g. log vs log_softmax). Affected ops (log, max, relu) are excluded until a more precise matching strategy is adopted.
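The prefix collision above can be reproduced directly with Python's re module. This is a minimal sketch: the pattern shape is the one described in this PR, but the tightened variant and the specific test names are illustrative, not the strategy the PR adopts.

```python
import re

# Include pattern for the op "log", in the ^test_{op}(?:_.*)?(?:_cpu|_cuda)$
# shape described above.
loose = re.compile(r"^test_log(?:_.*)?(?:_cpu|_cuda)$")

assert loose.match("test_log_cpu")          # intended match
assert loose.match("test_log_softmax_cpu")  # false positive: "log" is a
                                            # prefix of "log_softmax" and
                                            # (?:_.*)? absorbs the rest

# One possible tightening (illustrative only): forbid a known conflicting
# suffix with a negative lookahead.
tight = re.compile(r"^test_log(?!_softmax)(?:_.*)?(?:_cpu|_cuda)$")

assert tight.match("test_log_cpu")
assert not tight.match("test_log_softmax_cpu")
```

The lookahead approach requires enumerating each conflicting longer op name, which is why excluding the affected ops entirely is the simpler interim choice.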

…frontend coverage (apache#19505)

Add a test harness that wraps the official ONNX Backend Test Suite
(Node Tests) around the Relax ONNX importer.  This gives systematic,
spec-aligned coverage of 116 operators with 533 passing tests,
replacing hand-written edge-case models with standardized protobuf
test data.

The runner follows the standard `onnx.backend.base.Backend` interface,
using `from_onnx()` → `DecomposeOpsForInference()` → `LegalizeOps()`
→ `tvm.compile()` → `VirtualMachine` to execute each test case.

Known failures are tracked via `xfail` by category (trig precision,
quantization edge cases, dynamic split, etc.).
Contributor

gemini-code-assist (bot) left a comment


Code Review

This pull request adds a systematic verification suite for the Relax ONNX importer using the official ONNX Backend Test Suite, including a new pytest marker and a backend adapter. Review feedback identifies a bug where tvm.runtime.Tensor is used instead of tvm.runtime.NDArray and a logic error in input mapping where initializers should be filtered out from the graph inputs to align with the ONNX test runner's behavior.

I am having trouble creating individual review comments, so my feedback is included inline below.

tests/python/relax/test_frontend_onnx_backend.py (92)

high

tvm.runtime.Tensor is not a valid class in the TVM Python API. It should be tvm.runtime.NDArray (or tvm.nd.NDArray). Using the incorrect class name will result in an AttributeError when the test runner attempts to verify outputs.

        if isinstance(output, (tvm.runtime.NDArray, np.ndarray)):

tests/python/relax/test_frontend_onnx_backend.py (123)

high

In ONNX, model.graph.input includes both model inputs and initializers (which serve as default values). The ONNX backend test runner typically only provides positional values in the inputs list for elements that are not initializers. Mapping positional inputs to graph.input directly in run() will lead to an incorrect mapping if initializers are interspersed in the input list. Filtering graph_input_names here to exclude initializers ensures that the positional mapping in TVMRelaxBackendRep.run aligns with the test runner's behavior.

        initializer_names = {t.name for t in model.graph.initializer}
        graph_input_names = [inp.name for inp in model.graph.input if inp.name not in initializer_names]

@Aharrypotter
Contributor Author

cc @mshr-h @tlopex

@Aharrypotter
Contributor Author


Thanks for the review. Addressing both points:

1. tvm.runtime.Tensor vs tvm.runtime.NDArray

tvm.runtime.Tensor is a valid class in the current TVM Python API — it is defined in tvm.runtime._tensor.Tensor and inherits from tvm_ffi.core.Tensor. It is not an AttributeError risk.

>>> import tvm
>>> hasattr(tvm.runtime, "Tensor")
True
>>> tvm.runtime.Tensor.__mro__
(<class 'tvm.runtime._tensor.Tensor'>, <class 'tvm_ffi.core.Tensor'>, ...)

2. Excluding initializers from graph_input_names

The ONNX Backend Test data does not use initializers — all inputs (including weights) are provided as .pb files. We verified this across multiple models (test_batchnorm_epsilon, test_conv, test_matmul_2d, test_add, test_gemm): model.graph.initializer is always empty.

The current positional mapping logic is correct for this test data format. That said, adding the initializer filter is a good defensive measure for robustness — if the test data format ever changes or a third-party test suite uses initializers, the filter would prevent silent misalignment.
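The misalignment the filter guards against can be sketched without onnx installed, using SimpleNamespace stand-ins for the protobuf objects (all names here are hypothetical):

```python
from types import SimpleNamespace

# Stand-ins for ModelProto fields: graph.input lists both runtime inputs
# and an initializer-backed weight "W", as ONNX allows.
model = SimpleNamespace(graph=SimpleNamespace(
    input=[SimpleNamespace(name="X"),
           SimpleNamespace(name="W"),
           SimpleNamespace(name="B")],
    initializer=[SimpleNamespace(name="W")],
))
positional_inputs = ["x_data", "b_data"]  # what a test runner would pass

# Naive mapping: positions drift once an initializer is interspersed.
naive = dict(zip([i.name for i in model.graph.input], positional_inputs))
assert naive == {"X": "x_data", "W": "b_data"}  # "b_data" lands on "W"!

# Defensive filter suggested in the review:
initializer_names = {t.name for t in model.graph.initializer}
graph_input_names = [i.name for i in model.graph.input
                     if i.name not in initializer_names]
assert dict(zip(graph_input_names, positional_inputs)) == {
    "X": "x_data", "B": "b_data"}
```

As noted above, the official backend test data keeps graph.initializer empty, so the filter is a no-op there; it only matters for model files that do intersperse initializers.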

@mshr-h
Contributor

mshr-h commented May 7, 2026

Let's try to run the new tests in CI and see if they increase CI pressure. @Aharrypotter

@Aharrypotter
Contributor Author

@tvm-bot run slow tests

This PR introduces a test runner that reuses the official ONNX Backend Test suite to systematically cover relax.frontend.onnx.

- Node-level test filtering via BackendTest._test_items
- ONNX backend pytest marker
- SKIP_SLOW_TESTS support
- Documented xfails for known importer gaps
Aharrypotter force-pushed the relax-onnx-backend-tests branch from 9b3785e to a7b6c01 on May 7, 2026 at 14:56
@mshr-h
Contributor

mshr-h commented May 8, 2026

Just curious: can we completely move to the new backend tests, or do we still need to maintain the old ones?
We need to investigate whether we can test ops such as Sequence, Attention, and Quantization, as they seem to be complicated.

@tlopex
Member

tlopex commented May 8, 2026

I think it's fine to move to new backend tests @mshr-h

@Aharrypotter
Contributor Author

Just curious: can we completely move to the new backend tests, or do we still need to maintain the old ones? We need to investigate whether we can test ops such as Sequence, Attention, and Quantization, as they seem to be complicated.

I checked Sequence, Attention, and Quantization locally.

Quantization has a few passing cases, but enabling it cleanly would require very specific per-test include patterns. The broader QuantizeLinear / DequantizeLinear prefixes also pull in unsupported variants like blocked quantization, float8/float4, and int2/int4/uint2/uint4. So I think it is better to leave it for a follow-up PR.
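Those "very specific per-test include patterns" would amount to an exact-name allowlist rather than an op prefix. A minimal sketch of the idea (the test names below are illustrative, not a vetted list of passing cases):

```python
import re

# Hypothetical exact-name allowlist: anchoring full test names means broad
# QuantizeLinear variants (blocked, float8, int4, ...) are never pulled in.
allowed = [
    r"^test_quantizelinear_cpu$",
    r"^test_dequantizelinear_cpu$",
]
patterns = [re.compile(p) for p in allowed]

def included(name):
    """Return True if a test name matches an allowlisted pattern exactly."""
    return any(p.match(name) for p in patterns)

assert included("test_quantizelinear_cpu")
# Variants that a bare "quantizelinear" prefix would sweep in are rejected:
assert not included("test_quantizelinear_blocked_asymmetric_cpu")
assert not included("test_quantizelinear_float8_cpu")
```

The maintenance cost of enumerating individual test names is the main reason to defer this to a follow-up PR.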

Attention also needs separate work: the ONNX backend tests use the standard Q/K/V Attention form, while the current Relax converter seems to support the older Microsoft-style packed-QKV path with num_heads.

Sequence has similar issues, mostly around runtime sequence inputs, dynamic positions, SequenceMap/Loop, ReverseSequence, and SplitToSequence.

Given that, I would keep this PR focused on the initial stable subset and track these categories as follow-up items.

@Aharrypotter
Contributor Author

I think we can move in that direction, but probably not completely in one step.

The ONNX Backend Tests are very useful for standard operator semantic coverage, and they can replace some duplicated hand-written semantic tests over time. However, test_frontend_onnx.py also contains TVM-specific importer tests that the ONNX Backend Tests do not directly cover, such as keep_params_in_input, initializer/runtime parameter handling, input/name sanitization, symbolic/dynamic shape handling, Relax IR structure checks, validation/error paths, and importer-to-legalization/compile integration cases.

I agree that having two ONNX frontend test files could be confusing unless the boundary is clear.

My intended split is that test_frontend_onnx_backend.py covers standard ONNX operator semantics with the official backend tests, while test_frontend_onnx.py keeps TVM-specific importer behavior and integration/regression tests.

For this PR, I would keep the scope to landing the backend-test runner and the initial stable subset. Then, in a follow-up, we can audit test_frontend_onnx.py, migrate/delete duplicated semantic tests, and document where future ONNX frontend tests should go.

