Update testing documentation #314 (Merged)
Commits (all by PGijsbers):

- 6ab8d2b: Better separate different subpages of contributing guidelines
- 311dc4a: Update developer documentation
- 272a074: Fix typos, elaborate on some points, give run example
- f45a7c8: Clarify the 'slow' marker is only for tests that do not use PHP
- c7fa0e1: Fix typos
- 36c4920: Add language tags
- 502ec03: Minor edits for clarity
# Testing
This page covers running and writing tests for the REST API.
It assumes you have already followed the instructions on the ["Setup"](setup.md) page.

tl;dr:

- Setting up the `py_api` fixture to test directly against a REST API endpoint is very slow; use it only for migration/integration tests.
- Getting a database fixture and making a database call is slow; consider mocking where appropriate.
## Overhead from Fixtures

Sometimes you want to interact with the REST API through the `py_api` fixture,
or need access to a database through the `user_test` or `expdb_test` fixtures.
Be warned that these come with considerable overhead, which adds up when running thousands of tests.

!!! note "Follow the documentation"

    Some tests and files in the current test suite still need to be updated to reflect these conventions.
    This documentation specifies the desired way of doing things, even where the source code currently does something else.
## Running Tests

Once the services from the ["Setup"](setup.md) page are running, you can run the tests.
Start the services with one of these two commands:

```bash
docker compose --profile apis up -d  # if you run tests that need the PHP REST API
docker compose up python-api -d      # if you do not run tests that require the PHP API
```

and then invoke pytest:

```bash
docker compose exec python-api python -m pytest tests
```

As of writing, the full test suite takes about 2 minutes to run sequentially.
We generally recommend running subsets of the tests during development.
In particular, running only the tests that do not use fuzzing or the PHP API takes under 3 seconds:

```bash
docker compose exec python-api python -m pytest tests -m "not php_api and not slow"
```
## Writing Tests

We use the ubiquitous [Pytest](https://docs.pytest.org) framework for writing tests.
### File Structure

When writing tests, we have the following additional conventions for the file structure:

- Name files with a `_test` suffix (not a `test_` prefix). Our tests already live in a `tests` directory, and in common tree-list side panels you can often see only the start of file names, so the suffix is more informative.
- Use one dedicated test file per endpoint.
### General Test Guidelines

Some guidelines and things to keep in mind when writing tests:

- Keep tests small, so that each fails for one particular reason only.
- Mark tests that update the database in any way with the `mut` marker (`@pytest.mark.mut`).
- If a test is excessively slow (>0.1 sec) and does not connect to PHP, use the `slow` marker. Tests that include PHP always require round trips through other services, which makes them slow by default; they can be filtered out with the automatically generated `php_api` marker.
- Four common fixtures you might need when writing tests are:
    - `py_api`: an async client for the Python-based REST API
    - `php_api`: an async client for the PHP-based REST API
    - `expdb_test`: an `AsyncConnection` to the "expdb" OpenML database
    - `user_test`: an `AsyncConnection` to the "openml" OpenML database
- The fixtures above have considerable per-test overhead. Use them only when you need them.
- When writing assertions, put the expected value (a constant, or a PHP response) on the right (`assert response == expected`).
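For illustration, applying the markers from the guidelines above might look like this (the test name and body are hypothetical, not taken from the actual suite):

```python
import pytest


@pytest.mark.mut  # this test modifies the database
@pytest.mark.slow  # excessively slow (>0.1 sec) without using the PHP API
def test_flag_dataset_updates_row() -> None:
    ...  # hypothetical body: update a row, then assert on the new state
```

Tests marked this way are excluded by the `-m "not php_api and not slow"` filter shown earlier.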
### Writing Tests for an Endpoint

Because the `py_api` and database fixtures add considerable per-test overhead,
follow these guidelines when writing a test suite for an endpoint.

Include tests against `py_api` for input validation specific to that endpoint.
Validation in reused components (e.g., pagination) should be tested centrally.

```python
async def test_get_dataset_identifier_validation(py_api: httpx.AsyncClient) -> None:
    response = await py_api.get("/datasets/not-an-integer")
    assert response.status_code == HTTPStatus.UNPROCESSABLE_ENTITY
```
Include one test against `py_api` that confirms a successful request has the expected result:

```python
async def test_get_dataset_success(py_api: httpx.AsyncClient) -> None:
    response = await py_api.get("/datasets/1")
    assert response.status_code == HTTPStatus.OK
    assert response.json() == {...}  # some expected data
```
For all other tests, do not use `py_api`; call the implementing function directly.
For example, do not call `client.get("/datasets/1")`, but instead call `get_dataset`:

```python
async def test_get_dataset_private_success(expdb_test: AsyncConnection, user_test: AsyncConnection) -> None:
    private_dataset = 42
    owner_of_that_dataset = OWNER_USER
    dataset = await get_dataset(dataset_id=private_dataset, user=owner_of_that_dataset, user_db=user_test, expdb_db=expdb_test)
    assert dataset.id == private_dataset


async def test_get_dataset_private_access_denied(expdb_test: AsyncConnection, user_test: AsyncConnection) -> None:
    private_dataset = 42
    user_without_access = SOME_USER  # test user defined in a common file
    with pytest.raises(DatasetNoAccessError) as e:
        await get_dataset(dataset_id=private_dataset, user=user_without_access, user_db=user_test, expdb_db=expdb_test)
    assert e.value.status_code == HTTPStatus.FORBIDDEN
```

To quantify the overhead of the different fixtures, we used a benchmark test that touches each of them:

```python
@pytest.mark.parametrize("execution_number", range(5000))
def test_private_dataset_owner_access(
    execution_number,
    expdb_test: Connection,
    user_test: Connection,
    py_api: TestClient,
) -> None:
    fetch_user(ApiKey.REGULAR_USER, user_test)  # accesses only the user db
    get_estimation_procedures(expdb_test)  # accesses only the experiment db
    py_api.get("/does/not/exist")  # only queries the api
```
When individually adding/removing components, we measure (for 5000 repeats, n=1):

| expdb | user | api | exp call | user call | api get | time (s) |
|-------|------|-----|----------|-----------|---------|---------:|
| ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1.78 |
| ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | 3.45 |
| ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 3.22 |
| ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | 298.48 |
| ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | 4.44 |
| ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | 285.69 |
| ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | 4.91 |
| ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | 5.81 |
| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 307.91 |
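As a sanity check on these numbers, the per-test cost of each fixture can be derived from the table (a back-of-the-envelope script, not part of the test suite):

```python
# per-test overhead of a fixture = (time with fixture - baseline) / number of tests
REPEATS = 5_000
BASELINE = 1.78  # seconds for 5000 empty tests, no fixtures

expdb_overhead_ms = (3.45 - BASELINE) / REPEATS * 1000
py_api_overhead_ms = (298.48 - BASELINE) / REPEATS * 1000

print(f"expdb fixture: {expdb_overhead_ms:.2f} ms per test")   # well under a millisecond
print(f"py_api fixture: {py_api_overhead_ms:.2f} ms per test")  # tens of milliseconds
```

The gap of roughly two orders of magnitude matches the observations below.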
Adding a fixture that just returns some value adds only minimal overhead (1.91s total),
so the burden comes from establishing the database connection itself.

We make the following observations:

- Adding a database fixture adds about as much overhead as instantiating an entirely new test.
- Adding multiple database fixtures incurs the overhead multiple times; the second fixture is not free.
- The `py_api` fixture adds two orders of magnitude more overhead.
We want our tests to be fast, so we avoid these fixtures where we reasonably can.
We restrict usage of the `py_api` fixture to integration/migration tests, since it is very slow;
these only run on CI before merges.
For database fixtures, we will write alternatives that can be used to, e.g., get a `User` without accessing the database.
The validity of these users will be tested against the database in only a single test.
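Such a database-free fixture could be sketched as follows (the names, values, and the simplified `User` stand-in are all hypothetical, not the project's real model):

```python
import pytest
from typing import NamedTuple


class User(NamedTuple):  # simplified stand-in for the real User model
    id: int
    api_key: str


def make_regular_user() -> User:
    # A static, known-valid user. Its validity against the database
    # is verified in a single dedicated test elsewhere.
    return User(id=2, api_key="00000000000000000000000000000000")


@pytest.fixture
def regular_user() -> User:
    return make_regular_user()
```

Tests that only need a `User` object can then depend on `regular_user` without paying for a database connection.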
### Mocking

Mocking can help us reduce the reliance on database connections in tests.
A mocked function avoids accessing the database and returns a predefined value instead.

It has a few upsides:

- It is faster than using a database fixture (see the comparison below).
- The test does not depend on the database: you can run it without one.

But it also has downsides:

- Behavior changes in the database, such as schema changes, are not automatically reflected in the tests.
- The database layer (e.g., the queries) is not actually tested.

In short, the mocked behavior may not match the real behavior when executed against a database.
For this reason, for each mocked entity we should add a test that verifies that the database layer,
when invoked against the real database, returns the output that the mock assumes.
This adds development overhead, but it should pay off in more granular test feedback and faster tests.

On the speed of mocks, consider these two tests:
```diff
 @pytest.mark.parametrize('execution_number', range(5000))
 def test_private_dataset_owner_access(
     execution_number,
     admin,
+    mocker,
-    expdb_test: Connection,
 ) -> None:
+    mock = mocker.patch('database.datasets.get')
+    class Dataset(NamedTuple):
+        uploader: int
+        visibility: Visibility
+    mock.return_value = Dataset(uploader=1, visibility=Visibility.PRIVATE)

     _get_dataset_raise_otherwise(
         dataset_id=1,
         user=admin,
-        expdb=expdb_test,
+        expdb=None,
     )
```

*Note:* we will likely mock the database layer at some point, but its design is still taking shape.
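The same pattern can be shown self-contained with the standard library's `unittest.mock` instead of the `mocker` fixture from `pytest-mock` (all names below are hypothetical stand-ins, not the real module layout):

```python
from typing import NamedTuple
from unittest.mock import patch


class Dataset(NamedTuple):
    uploader: int
    visibility: str


def get_dataset_record(dataset_id: int, expdb) -> Dataset:
    # stand-in for the real database layer; never reached when mocked
    raise RuntimeError("would query the database")


def dataset_visibility(dataset_id: int, expdb) -> str:
    # stand-in for code under test that relies on the database layer
    return get_dataset_record(dataset_id, expdb).visibility


def test_visibility_with_mocked_db() -> None:
    with patch(f"{__name__}.get_dataset_record") as mock_get:
        mock_get.return_value = Dataset(uploader=1, visibility="private")
        # no database is needed: expdb can simply be None
        assert dataset_visibility(42, expdb=None) == "private"


test_visibility_with_mocked_db()
```

Patching the module attribute means `dataset_visibility` transparently receives the predefined value, so the test exercises only the logic above the database layer.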
### Migration Tests

For the initial development of this API, we want a clear mapping from PHP API output to the new output.
We also want to be aware of any quirks the PHP API might have.
For both these reasons, we write so-called "migration" tests: they call both APIs with a variety of inputs and compare the results.
In some cases there are quite significant differences between the PHP-based and the Python-based API.
That is okay, but then we want to "document" the behavior of both in the test.
Please refer to a few implemented migration tests for a better understanding; here is a high-level sketch:
```python
async def test_get_dataset(py_api: httpx.AsyncClient, php_api: httpx.AsyncClient) -> None:
    py_response, php_response = await asyncio.gather(
        py_api.get("/datasets/1"),
        php_api.get("/data/1"),
    )

    if py_response.status_code == HTTPStatus.OK and php_response.status_code == HTTPStatus.OK:
        _assert_success_response_equal(py_response.json(), php_response.json())
    else:
        _assert_error_response_equal(py_response, php_response)


def _assert_success_response_equal(py_json, php_json) -> None:
    # PHP API returns numbers as strings
    py_json = nested_num_to_str(py_json)

    # There might be more differences which need addressing
    # ...
    # and then finally we compare the results to ensure the remaining data is identical
    assert py_json == php_json


def _assert_error_response_equal(py_response, php_response) -> None:
    # There might be some translation of error codes
    if py_response.status_code == HTTPStatus.UNPROCESSABLE_ENTITY:
        assert php_response.status_code == HTTPStatus.PRECONDITION_FAILED
    elif ...:
        ...
    else:
        assert py_response.status_code == php_response.status_code

    # Python follows RFC 9457 while PHP has a custom system:
    assert py_response.json()["code"] == php_response.json()["error"]["code"]
```
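The helper `nested_num_to_str` is used but not defined in the sketch above; one possible implementation (an assumption on our part, the real helper may differ) is:

```python
def nested_num_to_str(obj):
    """Recursively convert numbers to strings, mimicking the PHP API's output format."""
    if isinstance(obj, bool):  # bool is a subclass of int, so check it first
        return obj
    if isinstance(obj, (int, float)):
        return str(obj)
    if isinstance(obj, list):
        return [nested_num_to_str(item) for item in obj]
    if isinstance(obj, dict):
        return {key: nested_num_to_str(value) for key, value in obj.items()}
    return obj
```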
Coming back to the mocking comparison above: that test contains only a single database call, which fetches a record on an indexed field and requires no joins. Even though this call is very light, the database-backed test is ~50% slower than the mocked version (5.04s vs 3.50s).