Skip to content

[Relax][ONNX] Support 3D AffineGrid#19863

Open
guan404ming wants to merge 2 commits into
mainfrom
fix/onnx-affinegrid-3d
Open

[Relax][ONNX] Support 3D AffineGrid#19863
guan404ming wants to merge 2 commits into
mainfrom
fix/onnx-affinegrid-3d

Conversation

@guan404ming

@guan404ming guan404ming commented Jun 22, 2026

Copy link
Copy Markdown
Member

Related Issue

closes #19689

Why

The Relax AffineGrid op only handled 2D (4D theta/grid); 5D 3D inputs from ONNX failed.

How

  • Generalize struct-info inference to 2D/3D via spatial = size_sinfo->ndim.
  • Branch TOPI affine_grid compute on 2D vs 3D.
  • Add the 3D permute path in the frontend and a test_affine_grid_3d case.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request extends the affine_grid operator to support 3D sampling grids in addition to the existing 2D support, updating the ONNX frontend, TOPI implementation, type inference, and adding a test case. The review feedback suggests refactoring the ONNX frontend to avoid duplicating the grid emission and permutation logic, and replacing the branched 2D/3D computation in TOPI with a generic N-D implementation to improve maintainability.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread python/tvm/relax/frontend/onnx/onnx_frontend.py Outdated
Comment thread python/tvm/topi/image/grid_sample.py Outdated
@guan404ming guan404ming force-pushed the fix/onnx-affinegrid-3d branch from 8f704ef to 4957ad7 Compare June 22, 2026 07:08
@guan404ming guan404ming requested review from cbalint13 and tlopex June 22, 2026 09:07

@cbalint13 cbalint13 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @guan404ming , thanks !

@cbalint13

Copy link
Copy Markdown
Contributor

@guan404ming , could you update to solve the conflicts ?

@guan404ming guan404ming force-pushed the fix/onnx-affinegrid-3d branch from 4957ad7 to 35eddd9 Compare June 22, 2026 09:58
@guan404ming

Copy link
Copy Markdown
Member Author

Thanks @cbalint13, just resolved!

)

model = helper.make_model(graph, producer_name="affine_grid_3d_test")
check_correctness(model, opset=20)

@tqchen tqchen Jun 22, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we have been bringing in some of the tests that involves check_correctness against onxx runtime.
I know that we have been doing so in past.

Unfortunately keep doing so would slow down our ci and and a better way would be not doing integration tests in frontends and instead rely on cheaper structural verify, would be great if we can hold off merging changes that involves integration tests in frontends. For some of the key relax op), they should be constructed as relax ast and run verification per OP as UT (perhaps against pytorch.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to me, thanks for letting me know. Just switched test_affine_grid_3d to a structural verify, no more onnxruntime execution, just imports and asserts on the IR (affine_grid + permute_dims and output shape).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tqchen ,

Apologise to approve with no oversee on this issue.

I addressed a comment for a possible solution to replace onnxruntime with onnx internal ReferenceEvaluator, see the pros/cons there.

@cbalint13

cbalint13 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@guan404ming , @tlopex
Cc @Hzfengsy , @Msr-H, @tqchen

Perhaps not here is the best discussion for this (out of context), but I rise it here:

Why testcases did not use onnx own provided ReferenceEvaluator ?

To my understanding (and some experience) ReferenceEvaluator is not only for functional correctness (indeed onnxruntime also does it) but provides specification compliance too. It comes with onnx package itself, provides coverage for operator histories too. Also any particular operator can be also sweeped over its all existing op versions using history schema, (see a random testcase example) . ONNX really excels at reflecting its own schema with reference validation too.

I am not saying we should drop onnxruntime now, but if we want excelent onnx coverage with full history schema and compliance we may consider this.

Later EDIT:

  • Concerning speed , ~514 tests (include ops all own versions) runs in ~20sec on the example provided.

Signed-off-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com>
@guan404ming guan404ming requested a review from tqchen June 22, 2026 10:54
@tqchen

tqchen commented Jun 22, 2026

Copy link
Copy Markdown
Member

On reference evaluator, my guess is reference evaluator would even be slower because it generates reference results and also is integration runs, so it would be worse than onxxruntime in terms of ci time.

I think the main issue is to consider the balance of ci pressure and our core features. Ideally the invariance are:

  • For importers, we cross validate the ingestion correctness via stuctural check
  • We have some levels of correctness checks in op UT, which might help

An altnertative could be we split out the frontend modules into separate repos, but still there would be an ci pressure concern there

@cbalint13

cbalint13 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

On reference evaluator, my guess is reference evaluator would even be slower because it generates reference results and also is integration runs, so it would be worse than onxxruntime in terms of ci time.

I think the main issue is to consider the balance of ci pressure and our core features. Ideally the invariance are:

  • For importers, we cross validate the ingestion correctness via stuctural check
  • We have some levels of correctness checks in op UT, which might help

An altnertative could be we split out the frontend modules into separate repos, but still there would be an ci pressure concern there

@tqchen

Despite its pure python its pretty lightweight, not slower, I was also surprised.

I give you a ruff number: ~514 tests (include ops all own versions) runs in ~20sec on the example , in this 20 sec the lowering to mlir and instantiating the mlir runtime to compute the operator is also included.

I am up for a task to see if we can benefit ReferenceEvaluator over onnxruntime, can migrate all testcases within a Draft PR to see numbers, I dont mind if experiment proves the otherwise and we drop the idea.

Let me know what you think.

@tqchen

tqchen commented Jun 22, 2026

Copy link
Copy Markdown
Member

For onnx tests, i think it is still better to move to the principle of structural verification

  • For importers, we cross validate the ingestion correctness via stuctural check
  • We have some levels of correctness checks in op UT, which might help

So for onnx tests we can rely on structural verification. We can perhaps have some form of side integration checks that are not run by every ci run, eg. nightly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug][Relax][ONNX] AffineGrid 3D output (size=[N,C,D,H,W]) is rejected with ValueError

3 participants