
@youngeunkwon0405 (Contributor) commented Dec 31, 2025

What does this PR do?

Enables FSDP manual registration for fast NCCL symmetric registration.

Related Megatron-LM PRs

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

@copy-pr-bot (bot) commented Dec 31, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

erhoo82 previously approved these changes Dec 31, 2025
@youngeunkwon0405 (Contributor, Author)

/ok to test 881fa7e

erhoo82 previously approved these changes Dec 31, 2025
@youngeunkwon0405 (Contributor, Author)

/ok to test 3ee00c0

@youngeunkwon0405 (Contributor, Author)

/ok to test f7114bb

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

draft

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

test

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

env

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

test

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

print

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

print

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

a

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

add nccl_ub

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

remove prints

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

a

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

fix

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

minor fixes

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

fix type

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

fix

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

type error

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

fix

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

revert temp commet

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

add sample recipe

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

enhance readability

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

fix lint error

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

white space

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
@youngeunkwon0405 force-pushed the youngeunk/fsdp-manual-reg branch from f7114bb to 9beb822 on December 31, 2025 18:44

@youngeunkwon0405 (Contributor, Author)

/ok to test 9beb822

@youngeunkwon0405 merged commit 7b9485a into main on Jan 6, 2026
49 checks passed
@youngeunkwon0405 deleted the youngeunk/fsdp-manual-reg branch on January 6, 2026 14:38

Labels: None yet
Projects: None yet
5 participants