ci(pr): classify PR test failures as new vs. known and post sticky comment #1194
rgsl888prabhu wants to merge 21 commits into
Conversation
Cross-reference each PR test failure against the target branch's nightly
failure history and post a single sticky comment that splits failures
into NEW (introduced by this PR) and KNOWN (recurring on nightly, known
flaky on nightly, or flaked in this run via pytest-rerunfailures).
- Add `--mode pr` to nightly_report.py: read history without writing
back, annotate each failure with `pr_classification`.
- Branch nightly_report_helper.sh on RAPIDS_BUILD_TYPE so PR runs read
the target branch's history and write per-matrix summaries to a
PR-scoped S3 prefix (`ci_test_reports/pr/run-{run_id}/`).
- Extract shared aggregator helpers into aggregate_common.py
(download/load/aggregate/escape) so the PR aggregator can reuse them
without behavioural drift on nightly.
- Add aggregate_pr.py to build the Markdown comment body and ci/pr_summary.sh
to post or update the sticky comment via the GitHub API.
- Wire a new `pr-test-summary` job into pr.yaml: gated on `if: always()`,
permissions `pull-requests: write`, runs after every PR test job.
Not added to pr-builder needs — the comment is informational and must
not block merging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
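A rough sketch of that NEW/KNOWN split, where all field names (`is_flaky`, `status`, `flaked_in_run`) are hypothetical stand-ins for whatever the history records actually store:

```python
# Hypothetical sketch of the classification described above: a PR failure
# is KNOWN if it flaked in this run, or if the target branch's nightly
# history has a flaky or active record for the same test id; otherwise NEW.
def classify_failure(test_id, failure, history):
    # A failure that passed on retry in this run (pytest-rerunfailures)
    # is treated as known-flaky for this run.
    if failure.get("flaked_in_run"):
        return "known_flaked_in_pr_run"
    rec = history.get(test_id)
    if rec is None:
        return "new"
    # Flakiness takes precedence over plain recurrence.
    if rec.get("is_flaky"):
        return "known_flaky_nightly"
    if rec.get("status") == "active":
        return "known_recurring"
    return "new"
```

The buckets then drive which section of the sticky comment the failure lands in.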
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Mirror the nightly-summary pattern: keep job-level wiring (needs/if) in pr.yaml and move the implementation into its own reusable workflow. Use explicit secret pass-through (matching how test.yaml calls nightly-summary.yaml) instead of `secrets: inherit`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shell script had three blocks of inline Python plus a bash-specific
${VAR@Q} quoting trick. Move all GitHub API interactions into
ci/utils/pr_comment_helper.py with two CLI subcommands:
- base-ref: resolve the PR's target branch
- post: post or update the sticky comment by hidden marker
Stdlib only (urllib + json) so it runs in slim CI containers. The
hidden marker now lives in pr_comment_helper.COMMENT_MARKER as the
single source of truth, imported by aggregate_pr.py.
ci/pr_summary.sh shrinks from ~120 lines of mixed shell+Python to ~70
lines of straight orchestration. pr_test_summary.yaml's base-ref
lookup goes from a curl+inline-Python pipe to a one-liner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
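A minimal stdlib-only sketch of the upsert-by-marker pattern: list the PR's comments, PATCH the one containing the hidden marker if found, otherwise POST a new one. The endpoint paths are the standard GitHub REST API; the helper names are illustrative, not the module's exact API.

```python
# Sketch of posting-or-updating a sticky PR comment by hidden marker,
# using only urllib + json so it runs in slim CI containers.
import json
from urllib import request

COMMENT_MARKER = "<!-- pr-test-classification -->"
API = "https://api.github.com"

def _gh_request(url, token, method="GET", payload=None):
    data = json.dumps(payload).encode() if payload is not None else None
    req = request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with request.urlopen(req) as resp:
        return json.load(resp)

def find_marker_comment(comments, marker=COMMENT_MARKER):
    """Return the id of the first comment whose body contains the marker."""
    for c in comments:
        if marker in c.get("body", ""):
            return c["id"]
    return None

def post_or_update_comment(repo, pr_number, token, body):
    comments = _gh_request(f"{API}/repos/{repo}/issues/{pr_number}/comments", token)
    existing = find_marker_comment(comments)
    if existing is not None:  # update in place: the comment stays "sticky"
        return _gh_request(f"{API}/repos/{repo}/issues/comments/{existing}",
                           token, method="PATCH", payload={"body": body})
    return _gh_request(f"{API}/repos/{repo}/issues/{pr_number}/comments",
                       token, method="POST", payload={"body": body})
```

Because the marker lives in the posted body, repeated runs find and edit the same comment instead of stacking new ones.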
/ok to test c4876d2
rapids-check-pr-job-dependencies (run by the shared checks workflow) fails any PR job not listed in pr-builder.needs. pr-test-summary is informational and intentionally kept out of pr-builder's needs, so add it to ignored_pr_jobs. Without this the dependency check fails, the subsequent changed-files step is skipped, and the next step throws "Error reading JToken from JsonReader" trying to fromJSON the empty output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
python:3.14-slim ships with dash as /bin/sh and GitHub Actions picks it for `run:` steps in this container. Dash doesn't understand `[[ ... ]]` or `set -o pipefail`, so the inline script failed with `/__w/_temp/X.sh: 4: [[: not found`. Add `defaults.run.shell: bash` at the workflow level, matching the existing convention in self_hosted_service_test.yaml and build_test_publish_images.yaml. Bash is already available in the container — only the shell selection needed forcing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ok to test 5b02581
DO NOT MERGE. Exists only to confirm that pr-test-summary picks up a new failure end to end and posts it under "NEW failures" in the sticky PR comment. Remove this file before merging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ok to test 49da483
PR Test Classification
1 CRASH(es) — compared against nightly history. Per-matrix status:
Caution: CRASHES detected — a test process was terminated by a signal mid-run. (1 crash — click to expand details)
/ok to test 0f8275a
/ok to test b622786
A reusable workflow can only request permissions the caller already has. The repo's default GITHUB_TOKEN permissions don't include pull-requests:write, so the caller in pr.yaml must grant it explicitly for the comment poster. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ok to test 062e063
The assert-failure test already verifies the basic NEW-failure flow. Add a second test that sends SIGSEGV to pytest mid-run so the crash-marker path (write_pytest_crash_marker, added in #1191) writes a PROCESS_CRASH entry that the classifier surfaces under NEW failures with a SIGSEGV message. Named test_zz_* so it runs after the assert test in definition order and the assertion failure still lands in the partial JUnit XML before pytest is killed. DO NOT MERGE. Remove both smoke tests before merging PR #1194. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ok to test c4dfdc8
Crashes (PROCESS_CRASH entries written by PR #1191's write_pytest_crash_marker, or any failure whose message matches "crashed with SIG*") now render in a dedicated GitHub `[!CAUTION]` alert block at the top of the comment. The full crash diagnostic goes in a fenced code block — no more half-sentence truncation when the message gets cut at 140 chars. Bumps the table-cell short_msg cap from 140 to 300 chars so ordinary failure messages also stop ending mid-sentence. Crashes are split out of the NEW and KNOWN buckets before those sections render, so the CAUTION block is the single source of truth for crash entries (no duplication). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
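The crash split can be sketched as follows; the `PROCESS_CRASH` status and the `crashed with SIG*` message shape come from the description above, while the remaining field names are assumed:

```python
# Sketch of pulling crash entries out of the NEW/KNOWN buckets before
# those sections render, so the CAUTION block is the single source of
# truth for crashes (no duplication).
import re

_CRASH_MSG = re.compile(r"crashed with SIG\w+")

def split_out_crashes(failures):
    crashes, rest = [], []
    for f in failures:
        is_crash = (f.get("status") == "PROCESS_CRASH"
                    or _CRASH_MSG.search(f.get("message", "")))
        (crashes if is_crash else rest).append(f)
    return crashes, rest
```

Only `rest` then feeds the NEW and KNOWN tables; `crashes` feeds the alert block.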
/ok to test 6984bc1
Multi-line docstring closing quotes must be on their own line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ok to test d0e0957
Verify the PR comment shows the CAUTION block on exactly one matrix entry (cuda*-py3.11-x86_64) while other Python versions stay green. Uses pytest.mark.skipif on sys.version_info so non-3.11 runs skip the crash without affecting the assertion-failure smoke test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep the CAUTION block as the headline; move per-crash heading and the full diagnostic into a <details> tab labelled "N crash(es) — click to expand details". Same content, click-to-reveal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ok to test 868f6b8
Replace the `### NEW failures (likely introduced by this PR)` plain heading with a `> [!CAUTION]` alert block that GitHub renders with a red border and caution icon. Visual hierarchy is now: - CAUTION (crashes, collapsible details) - CAUTION (NEW failures + table) - plain h3 (KNOWN issues + sub-tables) so red callouts cover "stuff this PR may have caused" and plain sections cover pre-existing issues. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
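A sketch of the resulting rendering order; function and parameter names are hypothetical. GitHub renders a `> [!CAUTION]` blockquote as a red alert callout, and every line inside the alert must also carry the `> ` quote prefix:

```python
# Illustrative assembly of the comment's visual hierarchy:
# CAUTION (crashes) -> CAUTION (NEW failures) -> plain h3 (KNOWN issues).
def render_comment(crash_details, new_table, known_tables):
    sections = []
    if crash_details:
        quoted = "\n".join("> " + line for line in crash_details.splitlines())
        sections.append("> [!CAUTION]\n> CRASHES detected\n" + quoted)
    if new_table:
        sections.append("> [!CAUTION]\n> NEW failures (likely introduced by this PR)\n\n"
                        + new_table)
    if known_tables:
        sections.append("### KNOWN issues\n\n" + known_tables)
    return "\n\n".join(sections)
```

Red callouts thus cover "stuff this PR may have caused" while pre-existing issues stay in a plain section.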
/ok to test 0be6ac9
Drop the temporary python/cuopt/cuopt/tests/test_pr_comment_smoke.py (verified the PR comment classifier and crash callout end-to-end). Also remove the dead _details_block helper and MAX_ERROR_SNIPPET constant from aggregate_pr.py — leftover from an early draft that was superseded by the inline CRASHES code block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ok to test fc35b36
📝 Walkthrough: Adds end-to-end PR test-summary aggregation: shared S3 summary helpers, a PR aggregator that builds a Markdown comment, a GitHub comment helper to upsert a sticky comment, nightly PR-mode classification, an orchestration script, and workflow wiring to run the informational job.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 failed (1 warning)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ci/utils/aggregate_common.py`:
- Around line 31-39: The public helper download_summaries and the other three
public helper functions in this module need explicit type annotations and
expanded pydocstyle docstrings: add parameter and return type hints (e.g.,
s3_prefix: str, local_dir: str, s3_fallback_prefix: Optional[str] = "", ->
List[Dict[str, Any]]) and import typing names (List, Dict, Any, Optional) as
needed; update each function's docstring to include "Parameters", "Returns", and
"Raises" sections describing inputs, the returned list of loaded summary dicts,
and any exceptions that may be raised (e.g., IO/boto3 errors,
json.JSONDecodeError) so callers and linters have complete API docs and types
for download_summaries and the three other public helpers in this file.
In `@ci/utils/aggregate_pr.py`:
- Around line 156-158: Add explicit type annotations and a full pydocstyle API
docstring for the public functions build_comment_body and main: annotate their
parameter and return types (e.g., agg: dict or specific type, target_branch:
str, github_run_url: str, sha: str = "", run_date: str = "" for
build_comment_body; appropriate signature for main), and add docstrings
containing params, returns, and raises sections describing expected inputs,
outputs, and possible exceptions; also apply the same annotations/docstring
requirements to the other public functions in the file referenced around the
380-423 region so all new public Python APIs in this module follow repository
standards.
In `@ci/utils/nightly_report.py`:
- Around line 376-452: The public function classify_pr_against_history lacks
type annotations and a complete docstring; add explicit type hints (e.g. import
Dict, List, Any, Tuple and declare def classify_pr_against_history(classified:
Dict[str, List[Dict[str, Any]]], history: Dict[str, Any]) ->
Tuple[List[Dict[str, Any]], List[Dict[str, Any]], List[Dict[str, Any]]]) and
update its docstring to include Parameters (typed description for classified and
history), Returns (describe tuple contents and element types), and Raises (note
no exceptions are raised or document any expected errors), keeping the existing
behavioral description and referencing the function name
classify_pr_against_history so callers and linters see the public API contract.
- Around line 408-424: The PR classification currently checks rec.get("status")
== "active" before rec.get("is_flaky"), causing records that are both active and
flaky to be labeled "known_recurring" and never "known_flaky_nightly"; change
the conditional order in the block that appends to recurring_failures so that
the rec and rec.get("is_flaky") branch is evaluated before the rec.get("status")
== "active" branch (i.e., check is_flaky first and append with
"pr_classification": "known_flaky_nightly", otherwise fall back to the active
branch that appends "known_recurring"), keeping the existing keys ("first_seen",
"failure_count") and variable names unchanged.
In `@ci/utils/pr_comment_helper.py`:
- Around line 67-179: Add proper type annotations and complete pydocstyle
docstrings (params, returns, raises) for the new public functions:
resolve_base_ref(repo: str, pr_number: int, token: str) -> str,
find_existing_comment_id(repo: str, pr_number: int, token: str, marker: str) ->
Optional[int], post_or_update_comment(repo: str, pr_number: int, token: str,
body: str, marker: str) -> str, and main() -> int; update their signatures to
import and use typing primitives (e.g., Optional, Dict, Any) as needed, and
expand each docstring to document parameters, return value, and possible
exceptions (e.g., RuntimeError from _gh_request), matching project docstring
style. Ensure the implementations remain unchanged except for annotations and
docstrings so callers like _cmd_base_ref/_cmd_post still work.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 85b27649-29e3-4f50-902f-0ca09e4a4357
📒 Files selected for processing (9)
.github/workflows/pr.yaml, .github/workflows/pr_test_summary.yaml, ci/pr_summary.sh, ci/utils/aggregate_common.py, ci/utils/aggregate_nightly.py, ci/utils/aggregate_pr.py, ci/utils/nightly_report.py, ci/utils/nightly_report_helper.sh, ci/utils/pr_comment_helper.py
jameslamb left a comment
Ok I had to stop... this PR is way too large to review line-by-line. From my perspective, this amount of new code and complexity is not really worth it in exchange for generating summary comments on PRs... but ultimately that's your decision to make.
Please do a pass over all this new code for the general themes I brought up... remove unnecessary flexibility and competing sources of configuration.
After that, I'll be happy to review again.
# The reusable workflow's job needs `contents: read` (for
# actions/checkout) and `pull-requests: write` (to post the sticky
# comment). Reusable workflows inherit the caller's GITHUB_TOKEN
# permissions, so the caller must grant at least the same set —
# the workflow-level default is `permissions: {}`.
Minor personal preference, I don't think this comment is adding much information. Recommend removing it.
# The python:3.14-slim container's default /bin/sh is dash, which doesn't
# understand `[[ ... ]]` or `set -o pipefail`. Force bash for all run
# steps in this workflow (matches self_hosted_service_test.yaml convention).
defaults:
  run:
    shell: bash
I don't understand this at all. GitHub Actions default shell if you don't explicitly set shell: is already bash -e {0}.
I'm sorry to have to ask this but... is this something you added for a real problem you faced, or something an LLM generated?
Unless this was really intentional, please remove it to simplify this.
persist-credentials: false
- name: Install dependencies
  run: |
    apt-get update && apt-get install -y --no-install-recommends curl
Suggested change:
    apt-get update
    apt-get install -y --no-install-recommends curl
This is already multiple lines. Let's simplify this and make it a bit easier to read by splitting commands onto their own lines.
if [ -z "${CUOPT_S3_URI:-}" ]; then
  echo "WARNING: CUOPT_S3_URI is not set; nothing to aggregate." >&2
  exit 0
fi
if [ -z "${GITHUB_RUN_ID:-}" ]; then
  echo "WARNING: GITHUB_RUN_ID is not set; cannot locate per-matrix summaries." >&2
  exit 0
fi

# aws CLI uses the standard AWS_* env vars; map the cuOpt-prefixed secrets onto them.
export AWS_ACCESS_KEY_ID="${CUOPT_AWS_ACCESS_KEY_ID:-${AWS_ACCESS_KEY_ID:-}}"
export AWS_SECRET_ACCESS_KEY="${CUOPT_AWS_SECRET_ACCESS_KEY:-${AWS_SECRET_ACCESS_KEY:-}}"
unset AWS_SESSION_TOKEN
All of this flexibility adds complexity and increases the risk of "the job has been silently failing for months" types of bugs. In my opinion, it'd be much better to make this strict everywhere and have it error out with a big loud error if anything other than the exact required environment variables, arguments, etc. are set.
Will you consider that?
GITHUB_RUN_URL="${GITHUB_SERVER_URL:-https://github.com}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}"

Suggested change:
GITHUB_RUN_URL="https://github.com/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}"
This is using GitHub-specific conventions so I really wouldn't expect any other value of GITHUB_SERVER_URL to work here. If you don't have a specific intended use case for overriding GITHUB_SERVER_URL, just hard-code github.com to make this simpler.
--target-branch "${GITHUB_BASE_REF:-main}" \
--sha "${GITHUB_SHA:-}" \

Suggested change:
--target-branch "${GITHUB_BASE_REF}" \
--sha "${GITHUB_SHA}" \
From this point forward, assume I'm asking for a review of everything in this PR following this same trend... remove unnecessary flexibility like fallback values. This code is so complex, you could make it simpler and reduce the risk of mistakes by making all of the interfaces strict and limiting how many different places configuration comes from.
if [ -z "${GITHUB_TOKEN:-}" ]; then
This validation should happen earlier, before you've done all the work of running aggregate_pr.py.
That'll reduce the time it takes to be notified of an issue, which saves some capacity in CI and helps you make faster progress on the project.
…r bug

Bug fix in classify_pr_against_history (nightly_report.py): check `is_flaky` before `status == "active"` for hard failures. A test record marked both active and flaky was previously bucketed only as `known_recurring`, never surfacing `known_flaky_nightly`. The flaky-in-PR-run loop already had this precedence; align the hard-fail loop to match.

Documentation/type-hint review fixes:
- aggregate_common.py: type hints + Google-style Args/Returns/Raises docstrings on download_summaries, load_local_summaries, aggregate_summaries, html_escape.
- aggregate_pr.py: same on build_comment_body and main().
- nightly_report.py: same on classify_pr_against_history.
- pr_comment_helper.py: same on resolve_base_ref, find_existing_comment_id, post_or_update_comment, main().

All files now use `from __future__ import annotations` so the modern `list[dict[str, Any]]` / `int | None` syntax works without needing typing.List/Dict/Optional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
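The precedence change can be illustrated side by side, with hypothetical field names mirroring the commit message above:

```python
# Before: "active" checked first, so an active-and-flaky record could
# never classify as flaky-on-nightly.
def classify_known_old(rec):
    if rec.get("status") == "active":
        return "known_recurring"
    if rec.get("is_flaky"):
        return "known_flaky_nightly"
    return None

# After: is_flaky takes precedence, matching the flaky-in-PR-run loop.
def classify_known_fixed(rec):
    if rec.get("is_flaky"):
        return "known_flaky_nightly"
    if rec.get("status") == "active":
        return "known_recurring"
    return None
```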
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Inline comments:
In `@ci/utils/pr_comment_helper.py`:
- Around line 188-190: The CLI currently accepts free-form --repo and
non-positive --pr; update the _add_common_args helper to validate inputs up
front: enforce that --repo matches the owner/name pattern (e.g., two non-empty
segments separated by a single slash using a simple regex) and enforce that --pr
is an integer > 0 (use an argparse type or validator that raises
argparse.ArgumentTypeError on invalid input). Modify the argument definitions
for "--repo" and "--pr" in _add_common_args to call these validators so the
parser fails fast with clear errors rather than constructing malformed GitHub
API URLs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9ad776a2-52d3-4c3e-a3fd-2ef64b2dfa12
📒 Files selected for processing (4)
ci/utils/aggregate_common.py, ci/utils/aggregate_pr.py, ci/utils/nightly_report.py, ci/utils/pr_comment_helper.py
🚧 Files skipped from review as they are similar to previous changes (3)
- ci/utils/aggregate_common.py
- ci/utils/aggregate_pr.py
- ci/utils/nightly_report.py
def _add_common_args(sp):
    sp.add_argument("--repo", required=True, help="owner/name")
    sp.add_argument("--pr", required=True, type=int, help="PR number")
Validate CLI boundary inputs before constructing GitHub API URLs.
--repo is currently free-form and --pr allows non-positive values. Add strict owner/name validation and enforce PR > 0 to fail fast with clear errors instead of malformed API calls.
Suggested patch
@@
 import argparse
 import json
 import os
+import re
 import sys
 from urllib import error, request
@@
 COMMENT_MARKER = "<!-- pr-test-classification -->"
+REPO_SLUG_RE = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")
@@
+def _positive_int(value: str) -> int:
+    pr_number = int(value)
+    if pr_number <= 0:
+        raise argparse.ArgumentTypeError("PR number must be > 0")
+    return pr_number
+
+
 def _add_common_args(sp):
     sp.add_argument("--repo", required=True, help="owner/name")
-    sp.add_argument("--pr", required=True, type=int, help="PR number")
+    sp.add_argument("--pr", required=True, type=_positive_int, help="PR number (> 0)")
@@
     args = p.parse_args()
+    if not REPO_SLUG_RE.fullmatch(args.repo):
+        print("ERROR: --repo must be in 'owner/name' format.", file=sys.stderr)
+        return 2
     if not args.token:
         print("ERROR: --token or $GITHUB_TOKEN must be set.", file=sys.stderr)
         return 2

As per coding guidelines (`**/*.{cpp,cuh,cu,hpp,py}`): flag missing input validation at library and server boundaries.
Also applies to: 233-235
🤖 Prompt for AI Agents
In `@ci/utils/pr_comment_helper.py` around lines 188 - 190, The CLI currently
accepts free-form --repo and non-positive --pr; update the _add_common_args
helper to validate inputs up front: enforce that --repo matches the owner/name
pattern (e.g., two non-empty segments separated by a single slash using a simple
regex) and enforce that --pr is an integer > 0 (use an argparse type or
validator that raises argparse.ArgumentTypeError on invalid input). Modify the
argument definitions for "--repo" and "--pr" in _add_common_args to call these
validators so the parser fails fast with clear errors rather than constructing
malformed GitHub API URLs.
Address jameslamb's review on PR #1194:
- pr.yaml: drop the now-self-explanatory comment block above the pr-test-summary `permissions:` block.
- pr_test_summary.yaml:
  * remove `defaults.run.shell: bash` — GHA's default already is bash -e {0}, so the override was redundant.
  * split `apt-get update && apt-get install` onto two lines.
  * drop the unused GITHUB_SERVER_URL env (now that pr_summary.sh hard-codes https://github.com).
- pr_summary.sh: rewrite to be strict.
  * all required env vars validated up front via `: "${VAR:?}"` assertions — including GITHUB_TOKEN, which now fails the job loudly before aggregate_pr.py runs instead of after.
  * remove every `:-default` fallback (CUOPT_AWS_*, GITHUB_BASE_REF, GITHUB_SHA, GITHUB_SERVER_URL).
  * hard-code https://github.com instead of templating GITHUB_SERVER_URL — this code is GitHub-specific and there is no real use case for overriding the server URL.
  * remove the silent "WARNING; exit 0" fallbacks for missing CUOPT_S3_URI / GITHUB_RUN_ID — the strict assertions cover them and the job should fail loudly when misconfigured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continue the simplification:

aggregate_pr.py:
- `--target-branch`, `--sha`, `--github-run-url` are now required. No more env-var fallbacks in argparse defaults — the shell script passes every value explicitly, and the env-var fallback was a second hidden source of configuration.
- `--s3-pr-summaries-prefix` / `--local-summaries-dir` go into an argparse mutually-exclusive group with `required=True`, removing the manual `return 2` validation block.
- Keep static defaults that are not env-fallbacks: `--output-dir`, `--run-date` (defaults to today UTC).

pr_comment_helper.py:
- Drop the `--token` CLI flag entirely; read `GITHUB_TOKEN` from env directly with a strict early-fail when missing. Removes the dual-source pattern (CLI default + env fallback).
- Drop the `--marker` CLI flag. `COMMENT_MARKER` is the only valid value — make it the single source of truth. Public functions still accept a `marker` kwarg (default `COMMENT_MARKER`) for unit testing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove section-divider banners and what-not-why comments now that the function docstrings and well-named identifiers carry the same information. Keep all WHY comments (precedence reasoning in the classifier, constant-value rationale, the CUOPT_AWS_*-to-AWS_* mapping note). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fallback exists because GHA leaves `github.base_ref` empty for the `push` events the PR workflow triggers on, and the shared rapidsai test workflows don't propagate a target branch into the test container. Inline comment + a follow-up note in the PR description to centralize PR classification in pr-test-summary (which already resolves the base ref via the API) so the fallback can be removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jameslamb left a comment
Thanks for the changes and for re-writing the description, it's much clearer to me now what the goal is.
Now that I can see what it is, I don't support this.
I'm not convinced that this can work well (for example... we run different test matrices in nightlies vs. PRs, won't that confuse the "known issue" logic?), or that it's worth the costs it imposes which include:
- non-trivial new complexity and maintenance burden
- expanded GitHub Actions workflow permissions
- new cloud costs (API calls + storage in S3)
- new risk of hitting GitHub API limits (which could block all other operations in the repo, like artifact uploading / downloading)
- a huge amount of new CI code (more code = larger changesets = easier for malicious code to sneak past human review)
HOWEVER... I'm leaving a non-blocking review. I am not an active maintainer of cuOpt. If you weigh all of those costs and still think the benefits outweigh them, and get another cuOpt maintainer in cuopt-ci-codeowners like @tmckayus to approve, then I won't stand in the way.
flaked_in_pr_run = []

for entry in recurring:
    cls = entry.get("pr_classification", "")
I'm really struggling to follow this code. Using dataclasses to represent the JSON data would be clearer and stricter.
Hmm, I agree that nightly vs PR test suites run on different matrices, but most of the time the tests fail similarly in both, so the corner case seems acceptable. My major concern would be the new failure points being added, which might trigger other issues. Currently we have a nightly dashboard which showcases all the failures, and users can manually cross-reference when needed. @mlubin @chris-maes @rg20 what are your thoughts?
I would be concerned about the ability to distinguish between "this common failure is because this test is flaky" and "this seemingly common failure is because of a bug that's just been introduced". Perfectly deterministic tests are hard, and I can imagine optimization problems make them even harder -- but if there are tests that become routine to ignore, then those tests no longer provide a useful signal, or they potentially mask a useful signal. I would suggest that if a test flakes out more than 50% of the time, that is a broken test -- it should be fixed, or made more flexible (by, e.g. loosening comparison tolerances), or removed entirely.
I'd like to put aside the workflow/procedural discussion of if or when we should decide to merge PRs despite test failures; that's not being changed by this PR and is probably better to cover on a call. The team is excited about the reporting aspect because test failures are common, and having to dig through the CI logs on each occasion is a waste of developer time. The tie-in with historical data on whether this is a new, pre-existing, or flaky failure is also useful and saves developer time. However, maybe this is too much for a single PR and we could split this into more digestible chunks. @jameslamb how many of your concerns would still apply to a bot that narrowly reads the PR's CI artifacts and produces a comment summarizing which tests failed/crashed?
Summary
This is to handle flaky CI where failures might not be caused by the PR; such failures will be categorized as known failures so developers can admin-merge PRs in urgent situations.
On every PR run, classify each test failure against the target branch's nightly history and post a single sticky comment that splits failures into NEW (likely introduced by this PR) and KNOWN (recurring or flaky on nightly), with crashes surfaced in a `[!CAUTION]` callout with the full diagnostic.