Parallelize tests #221
Conversation
Mainly because the test takes a long time to run. I will have to look into what an appropriate number is, but I suspect 500 samples will still find most of the failures that 5000 would.
Reviewer's Guide

The CI test workflow is consolidated into a single matrix-based job that runs pytest in parallel across API markers and mutation markers, Hypothesis test data generation is reduced for a slow test, and pytest-xdist is added as a development dependency to support parallel execution.

Flow diagram for matrix-based parallel pytest execution:

```mermaid
flowchart TD
    A[Start CI workflow] --> B[Checkout code]
    B --> C[Set up Python environment]
    C --> D[Install dev dependencies including pytest-xdist]
    D --> E[Define matrix with markers]
    E --> F{Matrix entry}
    F -->|api| G[Run pytest -m api with xdist]
    F -->|mutation| H[Run pytest -m mutation with xdist]
    G --> I[Reduce Hypothesis settings for slow test]
    H --> I
    I --> J[Collect and upload test results]
    J --> K[Finish CI workflow]
```
Walkthrough

Consolidates CI test jobs into a single matrix-driven GitHub Actions job, adds pytest-xdist to dev deps for parallel test runs, and reduces a Hypothesis max_examples setting from 5000 to 500 in one slow test.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10–15 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Hey - I've left some high level feedback:
- The test matrix currently encodes marker expressions directly as strings (e.g. `"not php_api"`, `"mut"`) and concatenates them in the `pytest -m` argument; consider using boolean matrix axes (like `php_api: [true, false]`, `mutations: [true, false]`) and constructing the marker expression in the step to avoid fragile string handling and make the intent clearer.
- Because `docker compose` is always started with both the `python` and `php` profiles, even when running only non-`php_api` tests, you might be incurring unnecessary startup cost; consider varying the profiles based on the matrix entry so that only the needed services are started for each test subset.
- The inline comment `# This number needs to be better motivated` on `max_examples=500` suggests a TODO; it would be helpful either to document the rationale for this lower value now (e.g. time vs. coverage tradeoff) or to replace it with a more specific action item so the intent is clear to future maintainers.
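To make the first suggestion concrete, here is a minimal sketch of boolean matrix axes with the marker expression assembled in a separate step. This illustrates the suggestion only and is not the repository's actual workflow: the job layout, step names, and checkout step are assumptions; only the `php_api`/`mut` markers and the `-n auto` flag come from the discussion above.

```yaml
jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        php_api: [true, false]
        mutations: [true, false]
    steps:
      - uses: actions/checkout@v4
      - name: Build pytest marker expression
        id: markers
        run: |
          # Translate the boolean axes into a pytest -m expression,
          # e.g. php_api=false, mutations=true -> "not php_api and mut".
          if [ "${{ matrix.php_api }}" = "true" ]; then expr="php_api"; else expr="not php_api"; fi
          if [ "${{ matrix.mutations }}" = "true" ]; then expr="$expr and mut"; else expr="$expr and not mut"; fi
          echo "expr=$expr" >> "$GITHUB_OUTPUT"
      - name: Run tests in parallel
        run: pytest -n auto -m "${{ steps.markers.outputs.expr }}"
```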
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The test matrix currently encodes marker expressions directly as strings (e.g. `"not php_api"`, `"mut"`) and concatenates them in the `pytest -m` argument; consider using boolean matrix axes (like `php_api: [true,false]`, `mutations: [true,false]`) and constructing the marker expression in the step to avoid fragile string handling and make the intent clearer.
- Because `docker compose` is always started with both the `python` and `php` profiles, even when running only non-`php_api` tests, you might be incurring unnecessary startup cost; consider varying the profiles based on the matrix entry so that only the needed services are started for each test subset.
- The inline comment `# This number needs to be better motivated` on `max_examples=500` suggests a TODO; it would be helpful either to document the rationale for this lower value now (e.g. time vs. coverage tradeoff) or to replace it with a more specific action item so the intent is clear to future maintainers.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@           Coverage Diff            @@
##             main     #221   +/-   ##
========================================
  Coverage        ?   54.32%
========================================
  Files           ?       32
  Lines           ?     1132
  Branches        ?      100
========================================
  Hits            ?      615
  Misses          ?      516
  Partials        ?        1
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/routers/openml/datasets_list_datasets_test.py (1)
226-231: Consider documenting the rationale for the 10x reduction in test examples.

The reduction from 5000 to 500 examples is significant (a 90% cut in the number of generated inputs). While this aligns with the PR's goal of faster test execution through parallelization, the inline comment suggests this value needs better justification.
Consider:
- Documenting why 500 examples provides sufficient coverage for this property-based test
- Whether the parallelization gains justify the reduced test thoroughness
- If 500 is too aggressive, a middle ground like 1000-2000 might balance speed and coverage
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/tests.yml (2 hunks)
- pyproject.toml (1 hunks)
- tests/routers/openml/datasets_list_datasets_test.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Sourcery review
- GitHub Check: tests (php_api, not mut)
- GitHub Check: tests (not php_api, not mut)
- GitHub Check: tests (not php_api, mut)
- GitHub Check: tests (php_api, mut)
🔇 Additional comments (3)
pyproject.toml (1)
34-34: LGTM! The addition of pytest-xdist correctly enables parallel test execution with the `-n auto` flag used in the workflow.

.github/workflows/tests.yml (2)
15-19: LGTM! Matrix strategy correctly partitions test execution.

The matrix strategy creates 4 parallel jobs that partition tests by markers, enabling faster test execution through parallelization.
29-29: Verify that all tests are appropriately marked for matrix filtering.

The parallel execution strategy looks correct, using `-n auto` for parallelization and marker filtering to partition tests across matrix jobs. However, ensure that:

- All relevant tests are marked with `php_api`, or the absence of the marker is intentional
- All mutation tests are marked with `mut`
- Tests without markers will run only in the "not php_api and not mut" job
Run the following script to verify marker coverage:
```bash
#!/bin/bash
# Description: Verify test marker coverage for matrix filtering

echo "=== Tests marked with 'php_api' ==="
rg -n --type=py -C2 '@pytest\.mark\.php_api' tests/

echo -e "\n=== Tests marked with 'mut' ==="
rg -n --type=py -C2 '@pytest\.mark\.mut' tests/

echo -e "\n=== All test functions (for comparison) ==="
ast-grep --pattern 'def test_$_($$$):'
```
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/tests.yml (1)
40-45: Coverage file not accessible to Codecov action.

The `coverage xml` command generates the file inside the Docker container, but the Codecov action runs on the host and won't find it. You need to copy the coverage file from the container to the host filesystem.

🔎 Proposed fix
```diff
       - name: Produce coverage report
-        run: docker exec openml-python-rest-api coverage xml
+        run: |
+          docker exec openml-python-rest-api coverage xml
+          docker cp openml-python-rest-api:/app/coverage.xml coverage.xml
       - name: Upload results to Codecov
         uses: codecov/codecov-action@v4
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
+          files: ./coverage.xml
```

Note: Adjust `/app/coverage.xml` to the actual path inside the container where the coverage file is generated.
🧹 Nitpick comments (1)
.github/workflows/tests.yml (1)
15-19: Consider adding `fail-fast: false` to the matrix strategy.

With the default `fail-fast: true`, if one matrix combination fails, the other running jobs are cancelled. For test jobs, it's often useful to see all failures across the matrix.

🔎 Suggested change
```diff
     strategy:
+      fail-fast: false
       matrix:
         php_api: [true, false]
         mutations: [true, false]
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .github/workflows/tests.yml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Sourcery review
- GitHub Check: tests (true, false)
- GitHub Check: tests (true, true)
🔇 Additional comments (2)
.github/workflows/tests.yml (2)
28-34: LGTM! The conditional profile inclusion and error handling for the docker compose command are well-implemented.
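As a rough sketch of the idea being approved here (the actual workflow content is not shown in this conversation, so the step name, error handling, and compose invocation below are assumptions), conditional profile selection based on the matrix value could look like:

```yaml
      - name: Start services
        run: |
          # The python profile is always needed; only add the php profile
          # for matrix entries that exercise the php_api tests.
          profiles="--profile python"
          if [ "${{ matrix.php_api }}" = "true" ]; then
            profiles="$profiles --profile php"
          fi
          docker compose $profiles up -d || { docker compose logs; exit 1; }
```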
36-39: LGTM! The marker construction logic correctly maps matrix values to pytest markers, and `-n auto` enables parallel test execution via pytest-xdist.
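For context, a hedged sketch of how such a test step might combine the constructed marker expression with `-n auto` inside the running container. The `openml-python-rest-api` container name is taken from the coverage step discussed earlier; the `steps.markers.outputs.expr` reference assumes a marker-construction step like the one sketched after the Sourcery feedback above, so treat the names as hypothetical:

```yaml
      - name: Run tests in parallel
        run: >
          docker exec openml-python-rest-api
          python -m pytest -n auto
          -m "${{ steps.markers.outputs.expr }}"
```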
Summary by Sourcery
Parallelize test execution in CI and adjust test configuration for improved performance.