[5694695][AutoCast] Preserve outer scope variable types in subgraphs #717
base: main
Conversation
…during type clearing

When clearing type information for shape inference, preserve value_info for outer scope variables in subgraphs. Previously, all value_info entries were cleared indiscriminately, causing shape inference failures when subgraph nodes referenced outer scope variables.

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #717      +/-   ##
==========================================
- Coverage   74.69%   74.68%   -0.01%
==========================================
  Files         192      192
  Lines       18946    18956      +10
==========================================
+ Hits        14152    14158       +6
- Misses       4794     4798       +4
I am actually running into an issue that is slightly related; hopefully I can get some clarification before looking further into creating a reproducer.

I found that for a specific model with dynamic inputs, after converting to FP16 the FP16 graph cannot be run on CPU or CUDA. This is not the biggest deal, since we plan to deploy on TensorRT anyway. Looking into it further, the root cause is value_info for some operators being mostly "unk" rather than the real values added after shape inference, so I end up with shape mismatch errors at inference time. Removing value_info from the converted model and re-populating it via shape inference resolves the issue.

This leads me to wonder: what is the purpose of filling value_info with undefined types and generic dims before running shape inference?
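For reference, a minimal sketch of that workaround using the stock onnx Python API (file paths are placeholders):

```python
import onnx
from onnx import shape_inference

# Hypothetical path; substitute your converted FP16 model.
model = onnx.load("converted_fp16.onnx")

# Drop the stale value_info entries (undefined types / "unk" dims).
del model.graph.value_info[:]

# Re-run shape/type inference so value_info reflects the actual graph.
model = shape_inference.infer_shapes(model)
onnx.save(model, "converted_fp16_fixed.onnx")
```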
Hey @jricker2!

Can you explain the statement about deploying on TensorRT? Do you mean deploying with weak typing (a.k.a. the --fp16 / kFP16 flag)? AutoCast is meant to prepare the model for deployment in TensorRT with strong typing. Please keep in mind that weak typing has been deprecated and is planned to be removed in the next major release.

The "unk" value_info you're describing is a bug. Would you like to share the model so we can look into it?

As for why we clear the types first: we want to ensure consistent type inference after adding cast nodes. The simplest way to do this is with ONNX's infer_shapes function, which also performs type inference. If we left the existing types in place, we'd get shape/type mismatches during shape inference.
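To illustrate the idea (a sketch only, not AutoCast's actual code): after cast insertion, the recorded elem_type on intermediate tensors can be stale, so it is reset before inference recomputes it:

```python
import onnx
from onnx import TensorProto

model = onnx.load("model_with_casts.onnx")  # hypothetical path

for vi in model.graph.value_info:
    # After cast insertion, the recorded dtype (e.g. FLOAT) may no longer
    # match the tensor's real dtype (e.g. FLOAT16); reset it so inference
    # recomputes it instead of reporting a mismatch.
    vi.type.tensor_type.elem_type = TensorProto.UNDEFINED

model = onnx.shape_inference.infer_shapes(model, strict_mode=True)
```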
Apologies for the confusion here; I worded this poorly. We use the TensorRT EP in ONNX Runtime (which works with the converted model). We will not use the CUDA/CPU EPs at any point, so the failure I'm mentioning is not a showstopper by any means. I am working on pushing folks towards strongly typed models via AutoCast.

Unfortunately I cannot share the full model; I will hopefully have an IP-free reproducer soon and will file an issue with that.

That makes sense. I was wondering about the difference between value_info being pre-populated with undefined types/generic dims (how it currently is) versus being empty before running infer_shapes to populate it; this made the difference in getting the model to work with the CPU/CUDA EPs. I don't know enough about infer_shapes to know why that is. As a note, shape inference passes on the converted model; it is actual inference on the CPU/CUDA EPs that fails. The local shape inference is interesting; if I have some time I will try it out. Thanks for the response.
What does this PR do?
Type of change: Bug fix
Overview:
When clearing type information for shape inference, preserve value_info for outer scope variables in subgraphs. Previously, all value_info entries were cleared indiscriminately, causing shape inference failures when subgraph nodes referenced outer scope variables.
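A rough sketch of the behavior described above, assuming a recursive walk over subgraph attributes (the helper name and structure are hypothetical, not the PR's actual code):

```python
import onnx

def clear_value_info(graph: onnx.GraphProto, outer_scope_names=frozenset()):
    """Clear value_info, but keep entries naming outer scope variables."""
    kept = [vi for vi in graph.value_info if vi.name in outer_scope_names]
    del graph.value_info[:]
    graph.value_info.extend(kept)

    # Names visible to subgraphs: everything produced or declared here,
    # plus whatever was already visible from enclosing scopes.
    visible = set(outer_scope_names)
    visible.update(i.name for i in graph.input)
    visible.update(init.name for init in graph.initializer)
    visible.update(o for node in graph.node for o in node.output)

    # Recurse into subgraph attributes (If/Loop/Scan bodies).
    for node in graph.node:
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                clear_value_info(attr.g, visible)
            elif attr.type == onnx.AttributeProto.GRAPHS:
                for g in attr.graphs:
                    clear_value_info(g, visible)
```

At the top level nothing is preserved (outer_scope_names is empty), while in subgraphs any value_info entry referring to an enclosing-scope variable survives the clearing pass.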
Testing
pytest tests/unit/onnx/autocast/test_precisionconverter.py::test_if_subgraph_outer_scope_type_preservation
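For context, a hypothetical sketch of the scenario such a test might construct (not the actual test code): an If node whose branch reads a tensor produced in the outer graph, which is exactly where value_info must survive the clearing step.

```python
import onnx
from onnx import TensorProto, helper

# Outer graph I/O.
x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [2])
cond = helper.make_tensor_value_info("cond", TensorProto.BOOL, [])
y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [2])

# Branch body with no inputs of its own: it reads "outer" from the
# enclosing scope, so the outer type information must stay visible.
branch = helper.make_graph(
    nodes=[helper.make_node("Identity", ["outer"], ["b_out"])],
    name="branch",
    inputs=[],
    outputs=[helper.make_tensor_value_info("b_out", TensorProto.FLOAT, [2])],
)

graph = helper.make_graph(
    nodes=[
        helper.make_node("Relu", ["x"], ["outer"]),
        helper.make_node("If", ["cond"], ["y"],
                         then_branch=branch, else_branch=branch),
    ],
    name="main",
    inputs=[x, cond],
    outputs=[y],
)
model = helper.make_model(graph)

onnx.checker.check_model(model)
# Inferring the branch requires the outer-scope type of "outer"; the fix
# keeps that value_info when types are cleared before re-inference.
inferred = onnx.shape_inference.infer_shapes(model, strict_mode=True)
```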
Before your PR is "Ready for review"