Skip to content

Clearer error when shape dimension overflows int32#3425

Open
serenposh wants to merge 5 commits intoml-explore:mainfrom
serenposh:claude/amazing-haslett-83c10f
Open

Clearer error when shape dimension overflows int32#3425
serenposh wants to merge 5 commits intoml-explore:mainfrom
serenposh:claude/amazing-haslett-83c10f

Conversation

@serenposh
Copy link
Copy Markdown

Summary

mx.zeros(2**31) (and ones / full) previously raised a generic nanobind error that gave the user no hint of the real problem:

TypeError: zeros(): incompatible function arguments. The following argument types are supported:
    1. zeros(shape: Union[int, Sequence[int]], dtype: Optional[Dtype] = float32, ...) -> array
Invoked with types: int

The underlying cause is that mx::ShapeElem is int32_t, so any dimension >= 2**31 can't be converted via the int / mx::Shape variant that nanobind sees — but nothing in the error points at the shape or the 32-bit limit.

After this PR:

ValueError: Shape dimension 2147483648 is outside the supported range [-2147483648, 2147483647]. MLX currently uses 32-bit integers for shape dimensions.

Closes #2681.

Changes

  • python/src/convert.{h,cpp}: check_shape_dim now reports the offending value and the valid range, and catches negative overflow too. It's exposed in the header so other bindings can reuse it.
  • python/src/ops.cpp: full, zeros, and ones accept variant<int64_t, vector<int64_t>> and route through a new to_shape helper that validates each dim via check_shape_dim.
  • python/tests/test_ops.py: adds test_shape_overflow_error covering the scalar and sequence paths for all three constructors.

Scope

This PR does not raise the underlying int32 shape limit — the tracking issue calls out that mx::ShapeElemint64_t would be a much larger migration. It only improves the diagnostic so users hitting the limit understand what they hit.

Test plan

  • python -m unittest python.tests.test_ops.TestOps — 139 tests pass locally (CPU build, macOS arm64).
  • New test test_shape_overflow_error verifies both the scalar (mx.zeros(2**31)) and sequence (mx.zeros([2**31])) paths for zeros, ones, and full.
  • Existing shapes (small ints, tuples, lists) still work unchanged.

🤖 Generated with Claude Code

Previously `mx.zeros(2**31)` (and `ones`/`full`) raised a generic
nanobind error:

    TypeError: zeros(): incompatible function arguments. ...
    Invoked with types: int

The underlying cause is that `mx::ShapeElem` is `int32_t`, so values
>= 2**31 can't be converted via the `int`/`mx::Shape` variant that
nanobind sees — but the user gets no hint of this.

Widen the Python-side shape acceptance for `full`, `zeros`, and `ones`
to `int64_t` / `vector<int64_t>` and validate each dimension through
`check_shape_dim`, which now reports the offending value and the
supported range:

    ValueError: Shape dimension 2147483648 is outside the supported
    range [-2147483648, 2147483647]. MLX currently uses 32-bit
    integers for shape dimensions.

This does not raise the underlying int32 shape limit — only the
diagnostic when users hit it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@zcbenz zcbenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying to fix this, checking the lower limit feels the correct fix but this PR only covers a few ops while we would need to fix all the ops that take shapes. I think a better approach is to check the overflow in python/src/small_vector.h.

Per review feedback on ml-explore#2681, move the int32 overflow check into the
SmallVector type caster (python/src/small_vector.h) so it applies to
every op that takes an mx::Shape, not just the three creation ops.

For narrow integer element types (int32, int16, ...) the caster now
widens each element through `long long`, validates against the element
type's range, and throws `nanobind::value_error` on overflow — nanobind
then surfaces a clean Python ValueError that names the offending value
and the valid range:

    mx.reshape(a, [2**31])
    mx.broadcast_to(a, [2**31, 1])
    mx.zeros([2**31])
    # -> ValueError: Shape dimension 2147483648 is outside the
    #    supported range [-2147483648, 2147483647]. ...

Because the SmallVector caster throws, it can't live inside a
`std::variant` — nanobind's variant caster is marked noexcept and
would call std::terminate on any escaping exception. So `zeros`,
`ones` and `full` are split into two nb::def overloads each (scalar
int64_t + mx::Shape) instead of using `variant<int, mx::Shape>`. The
scalar overload still routes through `check_shape_dim` for the same
clean error on `mx.zeros(2**31)`.

Broaden the Python test to exercise reshape / broadcast_to / negative
overflow in addition to the three creation ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@serenposh
Copy link
Copy Markdown
Author

Thanks for the review! Pushed a follow-up (70509dd) that moves the check to python/src/small_vector.h as you suggested.

The caster now widens each narrow-integer shape element through long long, validates against the element type's range, and throws nb::value_error on overflow — so every op that takes an mx::Shape surfaces the clean error, not just the three creation ops:

>>> mx.reshape(a, [2**31])
ValueError: Shape dimension 2147483648 is outside the supported range [-2147483648, 2147483647]. ...
>>> mx.broadcast_to(a, [2**31, 1])
ValueError: Shape dimension 2147483648 is outside the supported range ...

One wrinkle — because the caster now throws, it can't live inside a std::variant: nanobind's variant caster is noexcept and std::terminate's on any escaping exception (verified locally). So I split zeros/ones/full into two nb::def overloads each (scalar int64_t + mx::Shape) instead of variant<int, mx::Shape>. The scalar overload still throws via check_shape_dim for mx.zeros(2**31).

Test coverage broadened to reshape / broadcast_to / negative overflow. Full test_ops.TestOps (139 tests) still passes locally.

Comment thread python/src/ops.cpp Outdated
Comment thread python/src/small_vector.h Outdated
Comment thread python/src/small_vector.h Outdated
Comment thread python/src/small_vector.h Outdated
Copy link
Copy Markdown
Collaborator

@zcbenz zcbenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice fix, thanks!

@zcbenz
Copy link
Copy Markdown
Collaborator

zcbenz commented Apr 23, 2026

Can you fix the lint error?

@zcbenz zcbenz changed the title Clearer error when shape dimension overflows int32 (#2681) Clearer error when shape dimension overflows int32 Apr 23, 2026
@serenposh
Copy link
Copy Markdown
Author

@zcbenz fixed in b3d7605. The failure was just clang-format rewrapping in python/src/convert.cpp and python/src/small_vector.h; I pushed the formatting-only fix and the checks should rerun now.

@serenposh
Copy link
Copy Markdown
Author

serenposh commented Apr 24, 2026

I tracked the failing CPU/Windows jobs to half-precision mean() reducing in half precision. The latest commit, e9fcdaf, promotes float16/bfloat16 reductions to float32 inside mean(), and the previously failing local CPU random tests now pass again: test random uniform and test random normal.If you get a chance, could you please take another look and re-approve if everything looks good on your side?

@serenposh serenposh requested a review from zcbenz April 24, 2026 00:55
@zcbenz
Copy link
Copy Markdown
Collaborator

zcbenz commented Apr 24, 2026

Which failing test do you mean? I only saw this failing test in CI:

  ======================================================================
  ERROR: test_array_np_shape_dim_check (test_array.TestArray.test_array_np_shape_dim_check)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "D:\a\mlx\mlx\python\tests\test_array.py", line 771, in test_array_np_shape_dim_check
      mx.array(a_npy)
      ~~~~~~~~^^^^^^^
  OverflowError: Shape dimension 2147483648 is outside the supported range [-2147483648, 2147483647]. MLX currently uses 32-bit integers for shape dimensions.
  
  ----------------------------------------------------------------------

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Limit for large arrays shows wrong error - Possibility to increase limit of array size?

3 participants