build and test against CUDA 13.1.0 #747
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
The label checker was stuck with an error, so I removed and re-added a label to rerun it.
/ok to test

/ok to test 5d8893c

/ok to test 4356fe0

/ok to test 4fd715f

/ok to test c2c614f

/ok to test c4e79e4
@jameslamb This PR passes all the jobs.
Wow amazing, thank you for working on this @rgsl888prabhu !!! I'd just assumed we wouldn't be able to do this until more of RAPIDS was building, glad you got it working. I just took it out of draft. If you approve this, let's merge it 😁
📝 Walkthrough

Updates CUDA version references from 13.0 to 13.1 across GitHub Actions workflows, conda environment files, and build configurations. Switches shared-workflow references from `@main` to `@cuda-13.1.0`.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/test.yaml (1)

31-68: The `cuda-13.1.0` tag does not exist in rapidsai/shared-workflows.

All five workflow references (conda-cpp-tests, conda-python-tests, wheel-tests-cuopt, wheel-tests-cuopt-server, and conda-notebook-tests) are pinned to `@cuda-13.1.0`, which is not a published tag in the repository. Verify the correct tag name and update all references to use the intended tag version.

.github/workflows/build.yaml (1)

47-203: The `cuda-13.1.0` tag does not exist in rapidsai/shared-workflows and will cause CI/CD failures.

All 14 workflow references are pinned to `@cuda-13.1.0`, but this tag does not exist in the repository. The shared-workflows repository uses semantic versioning tags like `v26.02.00a`, `v25.12.00a`, etc. Update the workflow references to use an existing tag, or ensure the `cuda-13.1.0` tag is created in rapidsai/shared-workflows before merging.
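For context, these references use GitHub's reusable-workflow pinning syntax, where everything after `@` must name an existing branch or tag. A hypothetical excerpt (job inputs omitted for brevity) showing the pattern the comments are about:

```yaml
# Hypothetical excerpt from .github/workflows/test.yaml — inputs omitted.
jobs:
  conda-cpp-tests:
    # The ref after '@' must exist in rapidsai/shared-workflows; the review
    # notes 'cuda-13.1.0' was not a published tag at review time.
    uses: rapidsai/shared-workflows/.github/workflows/conda-cpp-tests.yaml@cuda-13.1.0
    secrets: inherit
```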
🤖 Fix all issues with AI agents
In @dependencies.yaml:
- Line 10: Update the README examples that still reference the old Docker tag "cuda13.0" to the correct tag "cuda13.1-py3.13"; specifically, replace occurrences of "cuda13.0" in the examples (currently around the README lines that show the Docker image tags) with "cuda13.1-py3.13" so the documentation matches the conda environment variants (cuda: ["12.9", "13.1"]).
🧹 Nitpick comments (1)
cpp/src/utilities/driver_helpers.cuh (1)
21-25: Hardcoded CUDA 13.0 version for backward compatibility is appropriate, but consider making it configurable per guidelines.

The code requests version `13000` (CUDA 13.0) symbols despite the PR updating to CUDA 13.1.0. This is a deliberate backward-compatibility strategy: a binary built with the CUDA 13.1 toolkit can run on systems with older CUDA 13.0 drivers. This approach is sound because:

- The code uses only CUDA 13.0 APIs (`cuDevSmResourceSplitByCount`, `cuGreenCtxCreate`, etc.), not CUDA 13.1-specific features like `cuDevSmResourceSplit` or workqueue resources.
- Requesting `13000` ensures the binary doesn't break on CUDA 13.0 drivers.

However, this hardcoded version conflicts with the coding guideline to "abstract multi-backend support for different CUDA versions." Consider making the version configurable (e.g., as a build-time macro or a compile constant matching `CUDART_VERSION`) rather than hardcoded, allowing future flexibility if CUDA 13.1 features are needed.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)

- .github/workflows/build.yaml
- .github/workflows/pr.yaml
- .github/workflows/test.yaml
- .github/workflows/trigger-breaking-change-alert.yaml
- conda/environments/all_cuda-131_arch-aarch64.yaml
- conda/environments/all_cuda-131_arch-x86_64.yaml
- cpp/src/utilities/driver_helpers.cuh
- dependencies.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cu,cuh}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Files:
cpp/src/utilities/driver_helpers.cuh
**/*.{cu,cuh,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...
Files:
cpp/src/utilities/driver_helpers.cuh
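The first guideline above assumes a `CUDA_CHECK`-style wrapper. For readers unfamiliar with the idiom, a generic minimal sketch (not the project's actual macro, which may throw or propagate an error code instead of aborting):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Generic error-checking idiom: evaluate a CUDA runtime call once and
// report file/line context on failure.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t status_ = (call);                                     \
    if (status_ != cudaSuccess) {                                     \
      std::fprintf(stderr, "CUDA error '%s' at %s:%d\n",              \
                   cudaGetErrorString(status_), __FILE__, __LINE__);  \
      std::abort();                                                   \
    }                                                                 \
  } while (0)

// Usage: CUDA_CHECK(cudaMemcpyAsync(dst, src, n, cudaMemcpyDefault, stream));
```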
🧠 Learnings (12)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment
Applied to files:
.github/workflows/test.yaml
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Applied to files:
cpp/src/utilities/driver_helpers.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes
Applied to files:
cpp/src/utilities/driver_helpers.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Applied to files:
cpp/src/utilities/driver_helpers.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Applied to files:
cpp/src/utilities/driver_helpers.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cpp,hpp,h} : Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Applied to files:
cpp/src/utilities/driver_helpers.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Applied to files:
cpp/src/utilities/driver_helpers.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Applied to files:
cpp/src/utilities/driver_helpers.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Applied to files:
cpp/src/utilities/driver_helpers.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Applied to files:
cpp/src/utilities/driver_helpers.cuh
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Applied to files:
cpp/src/utilities/driver_helpers.cuh
🔇 Additional comments (9)
conda/environments/all_cuda-131_arch-aarch64.yaml (2)
20-20: LGTM: CUDA version update is correct.

The update from `cuda-version=13.0` to `cuda-version=13.1` aligns with the PR objective to build and test against CUDA 13.1.0.
78-78: LGTM: Environment name correctly updated.

The environment name change from `all_cuda-130_arch-aarch64` to `all_cuda-131_arch-aarch64` is consistent with the CUDA version update.

.github/workflows/test.yaml (1)
1-1: LGTM: Copyright year updated appropriately.

The copyright year bump to 2026 is appropriate for changes made in 2026.
.github/workflows/trigger-breaking-change-alert.yaml (1)
1-18: LGTM: Consistent workflow and copyright updates.

The copyright year update and the workflow reference change from `@main` to `@cuda-13.1.0` are consistent with the other workflow files in this PR.

conda/environments/all_cuda-131_arch-x86_64.yaml (1)
20-78: LGTM: Consistent CUDA version update across architectures.

The CUDA version and environment name updates are consistent with the aarch64 variant, ensuring uniform CUDA 13.1 support across both x86_64 and aarch64 architectures.
.github/workflows/pr.yaml (2)
1-1: LGTM: Copyright year updated.

Copyright year appropriately bumped to 2026.
37-296: LGTM: Comprehensive workflow reference updates.

All 15 external workflow references have been consistently updated from `@main` to `@cuda-13.1.0`. The local workflow reference at line 305 (self_hosted_service_test.yaml) is correctly unchanged. This thorough update ensures all CI jobs use the CUDA 13.1.0-compatible workflows.

dependencies.yaml (1)
675-678: LGTM!

The new CUDA 13.1 matrix entry follows the established pattern and is correctly structured.
cpp/src/utilities/driver_helpers.cuh (1)
18-18: Good defensive initialization.

Initializing `func` to `nullptr` prevents undefined behavior and is a best-practice improvement.
```cpp
if (driver_status != cudaDriverEntryPointSuccess) {
  fprintf(stderr, "Failed to fetch symbol for %s\n", name);
  return static_cast<void*>(nullptr);
}
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Find all call sites of get_driver_entry_point and show surrounding context
rg -nP --type=cpp --type=cuda -C5 '\bget_driver_entry_point\s*\(' -g '!driver_helpers.cuh'
```

Repository: NVIDIA/cuopt
Length of output: 84

🏁 Script executed:

```bash
cat -n cpp/src/utilities/driver_helpers.cuh | head -40
```

Repository: NVIDIA/cuopt
Length of output: 1312

🏁 Script executed:

```bash
rg -nP --type=cpp --type=cuda 'nullptr' cpp/src/utilities/ -A 2 -B 2
```

Repository: NVIDIA/cuopt
Length of output: 84

🏁 Script executed:

```bash
rg -nP --type=cpp -C5 'get_driver_entry_point\s*\(' -g '!driver_helpers.cuh'
```

Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:

```bash
rg -l 'driver_helpers.cuh' --type=cpp
```

Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:

```bash
fd -e cuh | xargs rg -l 'driver_helpers.cuh'
```

Repository: NVIDIA/cuopt
Length of output: 96

🏁 Script executed:

```bash
rg -nP 'get_driver_entry_point' cpp/src/dual_simplex/sparse_cholesky.cuh -C5
```

Repository: NVIDIA/cuopt
Length of output: 4328

🏁 Script executed:

```bash
fd -e cuh -type f
```

Repository: NVIDIA/cuopt
Length of output: 226

🏁 Script executed:

```bash
rg -nP 'CU_CHECK|CUDA_CHECK|auto.*get_driver_entry_point' cpp/src/dual_simplex/sparse_cholesky.cuh -B 2 -A 2
```

Repository: NVIDIA/cuopt
Length of output: 2660

🏁 Script executed:

```bash
rg -nP 'void.*function_name|auto.*function_name|CUdevResource.*function|void setup_green_context|solve\(' cpp/src/dual_simplex/sparse_cholesky.cuh -B 5 -A 15
```

Repository: NVIDIA/cuopt
Length of output: 3025

🏁 Script executed:

```bash
sed -n '150,175p' cpp/src/dual_simplex/sparse_cholesky.cuh
```

Repository: NVIDIA/cuopt
Length of output: 1266

🏁 Script executed:

```bash
sed -n '155,170p' cpp/src/dual_simplex/sparse_cholesky.cuh
```

Repository: NVIDIA/cuopt
Length of output: 879
Add nullptr checks before dereferencing function pointers returned by get_driver_entry_point.

The function returns `nullptr` on symbol lookup failure, but call sites in sparse_cholesky.cuh (lines 159, 163, 182, 204, 227, 235, 367, 370) immediately cast and call the returned pointer without validation. If `get_driver_entry_point` returns `nullptr`, the code will dereference a null function pointer, causing undefined behavior. Callers must check the return value and propagate errors appropriately:

```cpp
auto func = cuopt::detail::get_driver_entry_point("symbol_name");
if (func == nullptr) {
  // Handle error and propagate to caller
  return error_code;
}
CU_CHECK(reinterpret_cast<decltype(...)>(func)(...));
```

This is required by the coding guideline: error propagation from CUDA to user-facing APIs must be complete, and the solve() method must communicate failures rather than crashing silently.
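One way to make these checks hard to forget is to centralize them in a typed wrapper. A hypothetical sketch (the wrapper name is illustrative, not existing project code; it assumes `get_driver_entry_point` takes a symbol name and returns `void*`, as the snippet above shows):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: fetches a driver symbol and fails loudly instead of
// letting callers dereference a null function pointer.
template <typename FnPtr>
FnPtr get_driver_entry_point_checked(const char* name)
{
  void* func = cuopt::detail::get_driver_entry_point(name);
  if (func == nullptr) {
    throw std::runtime_error(std::string{"missing CUDA driver symbol: "} + name);
  }
  return reinterpret_cast<FnPtr>(func);
}

// Usage sketch:
// auto split = get_driver_entry_point_checked<decltype(&cuDevSmResourceSplitByCount)>(
//   "cuDevSmResourceSplitByCount");
```

Since the guidelines above discourage exceptions in performance-critical GPU paths, returning a status code with an out-parameter would fit this codebase better; the throw here is only for brevity.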
```diff
 output: conda
 matrix:
-  cuda: ["12.9", "13.0"]
+  cuda: ["12.9", "13.1"]
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Search for remaining references to CUDA 13.0 in the repository
echo "=== Searching for cuda 13.0 or cuda-13.0 references ==="
rg -n -i 'cuda[_-]?13\.0' --glob '!dependencies.yaml'
echo ""
echo "=== Checking all YAML files for cuda 13.0 ==="
fd -e yaml -e yml | xargs rg -l '13\.0' 2>/dev/null | head -20
echo ""
echo "=== Full repository file listing to understand structure ==="
git ls-files | head -30
```

Repository: NVIDIA/cuopt
Length of output: 1870

🏁 Script executed:

```bash
#!/bin/bash
# Check for any CUDA 13.0 environment files or references
echo "=== Looking for cuda-130 or 13.0 environment files ==="
fd -e yaml -e yml | xargs ls -la 2>/dev/null | grep -i "130\|13\.0"
echo ""
echo "=== List all conda environment files ==="
fd -e yaml . conda/environments/ 2>/dev/null | sort
echo ""
echo "=== Check recipe.yaml for CUDA versions ==="
rg -n "13\." conda/recipes/cuopt/recipe.yaml 2>/dev/null
echo ""
echo "=== Check README for docker tag references in context ==="
rg -n -A2 -B2 "cuda.*13" README.md
```

Repository: NVIDIA/cuopt
Length of output: 1422
Update README.md Docker tags from CUDA 13.0 to 13.1.

The conda environment files have been correctly updated (only 12.9 and 13.1 variants exist), but the README.md documentation at lines 131 and 134 still references the outdated `cuda13.0` Docker tags. Update these examples to use `cuda13.1-py3.13` tags instead. The version constraints in conda/recipes/cuopt/recipe.yaml (lines 69-70 and 90-91) are conda package version ranges and do not need changes.
🤖 Prompt for AI Agents
In @dependencies.yaml at line 10: update the README examples that still reference the old Docker tag "cuda13.0" to the correct tag "cuda13.1-py3.13"; specifically, replace occurrences of "cuda13.0" in the examples (around the README lines that show the Docker image tags) with "cuda13.1-py3.13" so the documentation matches the conda environment variants (cuda: ["12.9", "13.1"]).
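For context on the `dependencies.yaml` matrix change reviewed here: in RAPIDS-style repositories, `dependencies.yaml` is consumed by rapids-dependency-file-generator, and each value in `matrix.cuda` produces a corresponding `all_cuda-*` conda environment file. A simplified, hypothetical sketch of the relevant shape (not the project's full file):

```yaml
# Simplified sketch of a dependencies.yaml entry.
files:
  all:
    output: conda
    matrix:
      cuda: ["12.9", "13.1"]   # each entry generates an all_cuda-<ver> env file
      arch: [x86_64, aarch64]  # combined with cuda to name the file, e.g.
                               # all_cuda-131_arch-x86_64.yaml
    includes:
      - build
      - run
```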
hlinsen left a comment
Thanks @rgsl888prabhu for the change and upgrade! Lgtm for C++
/merge
Contributes to rapidsai/build-planning#236

Tests that CI here will work with the changes from rapidsai/shared-workflows#483, switches CUDA 13 builds to CUDA 13.1.0, and adds some CUDA 13.1.0 test jobs.

Summary by CodeRabbit

New Features
- Added support for CUDA Toolkit 13.1, providing compatibility with the latest CUDA runtime and libraries.

Chores
- Updated CI/CD workflows for building, testing, and deployment to target CUDA 13.1.0
- Updated conda environment configurations to CUDA 13.1 for ARM and x86_64 architectures
- Updated copyright year to 2026

Authors:
- James Lamb (https://github.com/jameslamb)
- https://github.com/jakirkham
- Ramakrishnap (https://github.com/rgsl888prabhu)

Approvers:
- Hugo Linsenmaier (https://github.com/hlinsen)
- Ramakrishnap (https://github.com/rgsl888prabhu)

URL: NVIDIA#747