7 changes: 7 additions & 0 deletions spec/2025.12/index.rst
@@ -30,6 +30,13 @@ Contents
verification_test_suite
benchmark_suite

.. toctree::
:caption: Guides and Tutorials
:maxdepth: 1

migration_guide
tutorial_basic

.. toctree::
:caption: Other
:maxdepth: 1
205 changes: 205 additions & 0 deletions spec/2025.12/migration_guide.md
@@ -0,0 +1,205 @@
(migration-guide)=

# Migration Guide

This page is meant to help you migrate your codebase to an Array API compliant
implementation. The guide is divided into two parts and, depending on your
exact use case, you should look thoroughly into at least one of them.

The first part is dedicated to {ref}`array-producers`. If your library
mimics e.g. NumPy's or Dask's functionality, you will find there additional
instructions and guidance on how to ensure downstream users can easily pick
your solution as an array provider for their system or algorithm.

The second part delves into the details of Array API compatibility for
{ref}`array-consumers`. This pertains to any software that performs
multidimensional array manipulation in Python, such as scikit-learn, SciPy,
or statsmodels. If your software relies on a particular array producing library,
such as NumPy or JAX, here you can learn how to make it library agnostic
and switch between backends with far less friction.

## Ecosystem

Apart from the documented standard, the Array API ecosystem also provides
a set of tools and packages to help you with the migration process:


(array-api-compat)=

### Array API Compat

GitHub: [array-api-compat](https://github.com/data-apis/array-api-compat)

Although NumPy, Dask, CuPy, and PyTorch support the Array API Standard, there
are still some corner cases where their behavior diverges from the standard.
`array-api-compat` provides a compatibility layer to cover these cases as well.
This is also accompanied by a few utility functions for easier introspection
into array objects.


(array-api-strict)=

### Array API Strict

GitHub: [array-api-strict](https://github.com/data-apis/array-api-strict)

`array-api-strict` is a library that provides a strict and minimal
implementation of the Array API Standard. It is designed to be used as
a reference implementation for testing and development purposes. By comparing
your API calls with `array-api-strict` counterparts, you can ensure that your
library is fully compliant with the standard and can serve as a reliable
reference for other developers in the ecosystem.


(array-api-tests)=

### Array API Tests

GitHub: [array-api-tests](https://github.com/data-apis/array-api-tests)

`array-api-tests` is a collection of tests that can be used to verify the
compliance of your library with the Array API Standard. It includes tests
for array producers, covering a wide range of functionalities and use cases.
By running these tests, you can ensure that your library adheres to the
standard and can be used with compatible array consumer libraries.


(array-api-extra)=

### Array API Extra

GitHub: [array-api-extra](https://github.com/data-apis/array-api-extra)

`array-api-extra` is a collection of additional utilities and tools that are
missing from the Array API Standard but can be useful for compliant array
consumers. It includes additional array manipulation and statistical functions.
It is already used by SciPy and scikit-learn.

The sections below mention when and how to use them.


(array-producers)=

## Array Producers

For array producers, the central task during the development/migration process
is aligning the user-facing API with the Array API Standard.

The complete API of the standard is documented on the
[API specification](https://data-apis.org/array-api/latest/API_specification/index.html)
page.

There, each function, constant, and object is described with details
on parameters, return values, and special cases.

### Testing against Array API

There are two main ways to test your API for compliance: using the
`array-api-tests` suite, or testing your API manually against the
`array-api-strict` reference implementation.

#### Array API Test suite (Recommended)

{ref}`array-api-tests` is a test suite which verifies that your API
adheres to the standard. For each function or method it confirms that
it is importable, verifies its signature, generates multiple test
cases with the `hypothesis` package, and runs assertions on the outputs.
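Conceptually, for a single function the checks resemble the following sketch
(this is only illustrative, not the suite's actual code; NumPy stands in for
the namespace under test):

```python
import inspect

import numpy as xp  # stand-in for the namespace under test

# 1. The function is importable from the namespace.
assert hasattr(xp, "sum")

# 2. Its signature exposes the standard's parameters.
assert "axis" in inspect.signature(xp.sum).parameters

# 3. Assertions on the outputs for generated inputs (hypothesis would
#    generate many such cases; a single hand-written one is shown here).
a = xp.asarray([[1.0, 2.0], [3.0, 4.0]])
assert xp.sum(a, axis=0).shape == (2,)
```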

The setup details are documented in the GitHub repository, so here we
cover only the minimal workflow:

1. Install your package, for example in editable mode.
2. Clone `array-api-tests`, and set the `ARRAY_API_TESTS_MODULE` environment
   variable to your package's import name.
3. Inside the `array-api-tests` directory run the `pytest` command. The test
   suite provides multiple useful options, a few worth mentioning:
   - `--max-examples=2` - the maximum number of test cases hypothesis
     generates. This allows you to balance the execution time of the test
     suite against the thoroughness of the testing.
   - With the `--xfails-file` option you can list which tests are expected to
     fail - it's impossible to get the whole API perfectly implemented on the
     first try, so tracking what still fails gives you more control over the
     state of your API.
   - `-o xfail_strict=<bool>` is often used together with the previous one.
     If a test expected to fail actually passes (`XPASS`), you can decide
     whether to ignore that fact or raise it as an error.
   - `--skips-file` for skipping tests. At times some failing tests might
     stall the test suite - in that case the most convenient option is to
     skip them for the time being.
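The xfails file itself is plain text with one pytest test ID per line. A
hypothetical example (the exact IDs depend on what fails for your library):

```
# not_implemented_yet
array_api_tests/test_signatures.py::test_func_signature[std]
array_api_tests/test_has_names.py::test_has_names[linalg-matrix_rank]
```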

We strongly advise you to embed this setup in your CI as well. This allows you
to monitor the coverage continuously and make sure new changes don't break the
existing API. For reference, here's a [NumPy Array API Tests CI setup](https://github.com/numpy/numpy/blob/581d10f43b539a189a2d37856e5130464de9e5f6/.github/workflows/linux.yml#L296).


#### Array API Strict

A simpler, and more manual, way of testing Array API coverage is to
run your API calls alongside the {ref}`array-api-strict` Python implementation.

This way you can ensure the outputs coming from your API match the minimal
reference implementation, but bear in mind that you need to write the test
cases yourself, so you also need to take edge cases into account.
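A minimal sketch of such a check (NumPy stands in for "your" library here,
and the comparison is skipped gracefully when `array-api-strict` is not
installed; `sum_matches_reference` is a made-up helper name):

```python
import numpy as np  # stand-in for "your" namespace

try:
    import array_api_strict as xp_ref
except ImportError:  # array-api-strict not installed
    xp_ref = None

def sum_matches_reference(data):
    """Compare our `sum` against the strict reference implementation."""
    ours = float(np.sum(np.asarray(data)))
    if xp_ref is None:
        return True  # nothing to compare against
    ref = float(xp_ref.sum(xp_ref.asarray(data)))
    return abs(ours - ref) < 1e-12

print(sum_matches_reference([1.0, 2.0, 3.0]))
```

In practice you would wrap such comparisons in parametrized test cases, one
per function you expose.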


(array-consumers)=

## Array Consumers

For array consumers the main premise is that your **array manipulation
operations should not be locked into a particular array producing library**.
For instance, if you use NumPy for arrays, your code could contain:

```python
import numpy as np

# ...
b = np.full(shape, val, dtype=dtype) @ a
c = np.mean(a, axis=0)
return np.dot(c, b)
```

The first step should be as simple as assigning the `np` namespace to a
dedicated namespace variable - the convention in the ecosystem is to name it
`xp`. It is also vital to make sure that each method and function call is
something that the Array API supports (we will get to that soon); note that
`np.dot` is replaced with `xp.tensordot` below, since `dot` is not part of
the standard:

```python
import numpy as np

xp = np

# ...
b = xp.full(shape, val, dtype=dtype) @ a
c = xp.mean(a, axis=0)
return xp.tensordot(c, b, axes=1)
```

Replacing one backend with another should then amount to providing a different
namespace, such as `xp = torch`, e.g. selected via an environment variable.
This can be useful if you're writing a script or your own application. The
alternatives are:

- If you are building a library where the backend is determined by the input
  arrays passed by the end user, the recommended way is to ask your input
  arrays for the namespace to use: `xp = arr.__array_namespace__()`
- Each function you implement can take the namespace `xp` as a parameter in
  its signature. Coercing the inputs to arrays of the provided backend can
  then be achieved with `arg1 = xp.asarray(arg1)` for each input array.
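The environment-variable and parameter approaches can be sketched as follows
(a minimal illustration; `weighted_mean` and the `ARRAY_BACKEND` variable are
made-up names, and NumPy is assumed as the default backend):

```python
import importlib
import os

# Select the backend module by name, e.g. ARRAY_BACKEND=torch.
xp = importlib.import_module(os.environ.get("ARRAY_BACKEND", "numpy"))

def weighted_mean(values, weights, xp=xp):
    """Library-agnostic weighted mean; `xp` may be passed explicitly."""
    values = xp.asarray(values)
    weights = xp.asarray(weights)
    return xp.sum(values * weights) / xp.sum(weights)

print(float(weighted_mean([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])))  # 2.25
```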

If you're relying on NumPy, CuPy, PyTorch, Dask, or JAX, then
{ref}`array-api-compat` can come in handy for the transition. The compat layer
allows you to keep relying on your chosen array producing library while
making sure you're already using a standard compatible API. Additionally, it
offers a set of useful utility functions, such as:

- [array_namespace()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.array_namespace)
for fetching the namespace based on input arrays.
- [is_array_api_obj()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.is_array_api_obj)
  to introspect whether a given object is Array API compatible.
- [device()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.device)
  to get the device an array resides on.
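Under the hood, namespace discovery relies on the standard's
`__array_namespace__` protocol. A toy illustration of the protocol itself
(`ToyArray` and `ToyNamespace` are made up for demonstration; real libraries
expose their actual namespace module here):

```python
class ToyNamespace:
    """A stand-in for an array library's namespace module."""
    @staticmethod
    def asarray(obj):
        return obj if isinstance(obj, ToyArray) else ToyArray(list(obj))

class ToyArray:
    """A made-up array type implementing the namespace protocol."""
    def __init__(self, data):
        self.data = data

    def __array_namespace__(self, *, api_version=None):
        return ToyNamespace

def process(arr):
    # Library-agnostic code asks the array itself for its namespace.
    xp = arr.__array_namespace__()
    return xp.asarray(arr)

print(type(process(ToyArray([1, 2, 3]))).__name__)  # ToyArray
```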

For now, migrating from a specific library (e.g. NumPy) to a standard
compatible setup requires manual intervention for each failing API call, but
in the future we plan to provide automation tools for it.
112 changes: 112 additions & 0 deletions spec/2025.12/tutorial_basic.md
@@ -0,0 +1,112 @@
(tutorial-basic)=

# Array API Tutorial

In this tutorial we're going to walk through the migration of a simple graph
algorithm from the array consumer point of view.

The example presented here comes from the [`graphblas-algorithms`](https://github.com/python-graphblas/graphblas-algorithms)
library. There we can find [the HITS algorithm](https://github.com/python-graphblas/graphblas-algorithms/blob/35dbc90e808c6bf51b63d51d8a63f59238c02975/graphblas_algorithms/algorithms/link_analysis/hits_alg.py#L9),
used in link analysis to estimate the prominence of nodes in sparse networks.

The inlined and slightly simplified (without the "authority" feature)
implementation looks like this:

```python
def hits(G, max_iter=100, tol=1.0e-8, normalized=True):
N = len(G)
h = Vector(float, N, name="h")
a = Vector(float, N, name="a")
h << 1.0 / N
# Power iteration: make up to max_iter iterations
A = G._A
hprev = Vector(float, N, name="h_prev")
for _i in range(max_iter):
hprev, h = h, hprev
a << hprev @ A
h << A @ a
h *= 1.0 / h.reduce(monoid.max).get(0)
if is_converged(hprev, h, tol):
break
else:
raise ConvergenceFailure(max_iter)
if normalized:
h *= 1.0 / h.reduce().get(0)
a *= 1.0 / a.reduce().get(0)
return h, a

def is_converged(xprev, x, tol):
xprev << binary.minus(xprev | x)
xprev << unary.abs(xprev)
return xprev.reduce().get(0) < xprev.size * tol
```

We can see that this API is specific to the GraphBLAS array object.
There is a `Vector` constructor, an overloaded `<<` for assigning new values,
and `reduce`/`get` for reductions. We need to replace them, and, by convention,
we will use the `xp` namespace for calling the respective functions.

First, we want to make sure we construct the arrays in an agnostic way:

```python
h = xp.full(N, 1.0 / N)
A = xp.asarray(G.A)
```

Then, instead of the `reduce` calls, we use the appropriate reduction
functions from the Array API:

```python
h = h / xp.max(h)
# ...
h = h / xp.sum(xp.abs(h))
a = a / xp.sum(xp.abs(a))
# ...
err = xp.sum(xp.abs(...))
```

We replace the custom binary operation with its Array API counterpart:

```python
...(x - xprev)
```

And last but not least, let's ensure that the result of the convergence
condition is a scalar coming from our API:

```python
err < xp.asarray(N * tol)
```

The rewrite is now complete and we can assemble all the constituent parts
into a full implementation:

```python
def hits(G, max_iter=100, tol=1.0e-8, normalized=True):
N = len(G)
h = xp.full(N, 1.0 / N)
A = xp.asarray(G.A)
# Power iteration: make up to max_iter iterations
for _i in range(max_iter):
hprev = h
a = hprev @ A
h = A @ a
h = h / xp.max(h)
if is_converged(hprev, h, N, tol):
break
else:
raise Exception("Didn't converge")
if normalized:
h = h / xp.sum(xp.abs(h))
a = a / xp.sum(xp.abs(a))
return h, a

def is_converged(xprev, x, N, tol):
err = xp.sum(xp.abs(x - xprev))
return err < xp.asarray(N * tol)
```
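As a quick sanity check, the assembled code can be run with NumPy as the
backend (the functions are repeated here so the snippet is self-contained,
adapted to take the adjacency matrix directly instead of a graph object; the
3-node path graph is a made-up example):

```python
import numpy as xp  # swap for e.g. jax.numpy to change backends

def is_converged(xprev, x, N, tol):
    err = xp.sum(xp.abs(x - xprev))
    return err < xp.asarray(N * tol)

def hits(A, max_iter=100, tol=1.0e-8, normalized=True):
    N = A.shape[0]
    h = xp.full(N, 1.0 / N)
    # Power iteration: make up to max_iter iterations
    for _i in range(max_iter):
        hprev = h
        a = hprev @ A
        h = A @ a
        h = h / xp.max(h)
        if is_converged(hprev, h, N, tol):
            break
    else:
        raise Exception("Didn't converge")
    if normalized:
        h = h / xp.sum(xp.abs(h))
        a = a / xp.sum(xp.abs(a))
    return h, a

# Adjacency matrix of a 3-node path graph: 0 - 1 - 2.
A = xp.asarray([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
h, a = hits(A)
print(a)  # the middle node gets the highest score
```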

At this point the actual execution depends only on the `xp` namespace,
and replacing that one variable allows us to switch from e.g. NumPy arrays
to JAX execution on a GPU. This gives us more flexibility and lets us, for
example, use lazy evaluation or JIT compile the loop body with JAX.