Rename legacy Jenkins CI label from linux.16xlarge.nvidia.gpu to 4-gpu #3839

Open
anishesg wants to merge 1 commit into pytorch:main from anishesg:fix/ph-issue-3780

anishesg commented Apr 24, 2026

Fixes #3780

The Jenkins CI label linux.16xlarge.nvidia.gpu in .jenkins/metadata.json is a legacy name that doesn't describe the actual hardware used — it simply means "needs multi-GPU". This caused confusion when debugging CI for new tutorials because the label implied specific hardware that wasn't involved.

This change renames all four occurrences of linux.16xlarge.nvidia.gpu to 4-gpu in .jenkins/metadata.json, and updates the hard-coded equality check in .jenkins/get_files_to_run.py (line 44) to match the new label name. The routing logic and shard assignment behavior are unchanged; only the label name is updated to accurately reflect its semantic meaning.
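As an illustration of the rename described above, here is a minimal sketch. It assumes the metadata file maps tutorial paths to objects with an optional "needs" key; the file names in the sample and the helper `rename_needs_label` are hypothetical and are not part of this PR.

```python
import json

OLD_LABEL = "linux.16xlarge.nvidia.gpu"
NEW_LABEL = "4-gpu"


def rename_needs_label(metadata: dict, old: str, new: str) -> int:
    """Rewrite every entry whose "needs" value equals `old`.

    Returns the number of entries changed. Entries with a different
    label, or with no "needs" key at all, are left untouched.
    """
    changed = 0
    for entry in metadata.values():
        if isinstance(entry, dict) and entry.get("needs") == old:
            entry["needs"] = new
            changed += 1
    return changed


# Hypothetical sample; the real .jenkins/metadata.json may differ in shape.
sample = json.loads("""
{
  "example/multi_gpu_tutorial.py": {"needs": "linux.16xlarge.nvidia.gpu"},
  "example/a10g_tutorial.py": {"needs": "linux.g5.4xlarge.nvidia.gpu"},
  "example/cpu_tutorial.py": {"duration": 120}
}
""")
changed = rename_needs_label(sample, OLD_LABEL, NEW_LABEL)
```

Matching on the exact string keeps the A10G label out of the rename, which is the behavior the description calls for.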

  • The issue being fixed is referenced in the description
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR fixes are added to this pull request
  • No unnecessary issues are included in this pull request


pytorch-bot Bot commented Apr 24, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3839

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 88efc1e with merge base 5e98ccc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the cla signed label Apr 24, 2026
anishesg marked this pull request as ready for review April 24, 2026 04:57
@sekyondaMeta (Contributor)

@claude review this PR and offer any suggestions


claude Bot commented Apr 24, 2026

Claude finished @sekyondaMeta's task in 58s


PR Review: Rename legacy Jenkins CI label to 4-gpu

  • Gather context and read changed files
  • Review the diff against origin/main
  • Check for any other references to the old label
  • Post review feedback

Summary

This is a clean, focused rename of the legacy linux.16xlarge.nvidia.gpu label to 4-gpu in two files. The change is correct and complete.

Correctness

All references updated consistently. The old label linux.16xlarge.nvidia.gpu appeared in exactly 5 places across 2 files, and all 5 are updated:

  • .jenkins/metadata.json: 4 tutorial entries (lines 24, 27, 30, 78) — updated
  • .jenkins/get_files_to_run.py: 1 equality check (line 44) — updated

I confirmed there are zero remaining references to linux.16xlarge.nvidia.gpu anywhere in the repository.
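A check of this kind could be reproduced with a short script. The sketch below is one way to do it and is not necessarily how the review performed the search; the helper `find_references` is hypothetical, and it assumes that files under `.git` should be skipped and that undecodable bytes can be ignored.

```python
from pathlib import Path


def find_references(root: Path, needle: str) -> list[tuple[str, int]]:
    """Return (relative path, line number) for every line containing `needle`."""
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories and anything inside the .git directory.
        if not path.is_file() or ".git" in rel.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; nothing to report
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((rel.as_posix(), lineno))
    return hits
```

Calling `find_references(Path("."), "linux.16xlarge.nvidia.gpu")` from the repository root would return an empty list once the rename is complete.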

Observations

  1. Routing logic is preserved. The calculate_shards function in get_files_to_run.py:43-44 filters files by the needs label and routes multi-GPU tutorials to shard 0. Since both the metadata values and the filter string are updated in lockstep, the behavior is identical.

  2. The linux.g5.4xlarge.nvidia.gpu label remains unchanged. This is used for A10G machine routing (line 47) and was not part of the scope. That's correct — the issue only concerns the multi-GPU label.

  3. CI is passing. All 4 workflow runs (Build tutorials, Lintrunner, link check, redirect check) completed successfully.
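The lockstep point in observation 1 can be illustrated with a simplified sketch. This is not the code in get_files_to_run.py: the function `route_files`, its parameters, and the round-robin placement of the remaining files are assumptions made for illustration. Only the idea that files whose "needs" value equals the multi-GPU label go to shard 0 comes from the review.

```python
from typing import Dict, List


def route_files(
    all_files: List[str],
    metadata: Dict[str, dict],
    num_shards: int,
    multi_gpu_label: str = "4-gpu",
) -> List[List[str]]:
    """Place multi-GPU tutorials on shard 0 and spread the rest round-robin."""
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    remaining: List[str] = []
    for filename in all_files:
        needs = metadata.get(filename, {}).get("needs")
        if needs == multi_gpu_label:
            # Exact string match: the metadata value and this label must agree.
            shards[0].append(filename)
        else:
            remaining.append(filename)
    # Assumption: simple round-robin; the real script may balance differently.
    for i, filename in enumerate(remaining):
        shards[i % num_shards].append(filename)
    return shards


# Metadata and check agree: the tutorial lands on shard 0.
new_meta = {"multi.py": {"needs": "4-gpu"}}
ok = route_files(["x.py", "multi.py"], new_meta, num_shards=2)

# Metadata still carries the old label: the tutorial falls into the general pool.
stale_meta = {"multi.py": {"needs": "linux.16xlarge.nvidia.gpu"}}
stale = route_files(["x.py", "multi.py"], stale_meta, num_shards=2)
```

Under these assumptions, a mismatch between the metadata value and the compared string does not raise an error; the file is simply routed like any other tutorial, which is why both sides need to change together.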

Suggestion

One minor observation — the new label 4-gpu is a semantic improvement, but it introduces an implicit assumption about the exact GPU count. If the multi-GPU shard's hardware changes in the future (e.g., to 8 GPUs), the label would need updating again. A more future-proof alternative might be multi-gpu, which matches the variable name needs_multigpu already used at get_files_to_run.py:43 and the comment at line 50 ("the only job that has multigpu"). This would make the label self-documenting without embedding a specific count.

That said, if 4-gpu is the naming convention adopted by the broader PyTorch CI infrastructure, then consistency with that convention is more important than the naming concern above.

Verdict

The change is correct and safe to merge as-is. The only suggestion is cosmetic (consider multi-gpu over 4-gpu for alignment with existing variable names and comments), but this is non-blocking.


@georgehong

Claude's suggestion of multi-gpu, or something along those lines, makes sense to me, since the original motivation appears to be that the previous name was too hardware-specific. Also cc @malfet, since I see a question on the associated issue: #3780

Development

Successfully merging this pull request may close these issues.

[BUG] Confusing machine names in Jenkins CI