feat: Add native support for PyTorch Profiler in ms-swift by qq1243196045 · Pull Request #9449 · modelscope/ms-swift

qq1243196045 · 2026-05-29T13:01:47Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

This PR introduces built-in support for PyTorch Profiler. Users can now generate detailed execution traces (compatible with Chrome Tracing) to visualize CPU/GPU activity, memory usage, and kernel-level performance.

The implementation is designed to be extensible, so support for other tools such as nsys can be added later.

To support fine-grained performance analysis in Reinforcement Learning (e.g., RLHF), this PR introduces a flexible annotation mechanism. It allows developers to independently profile specific computational roles by simply wrapping their functions with @DistProfiler.annotate.
For example, you can distinguish between the Rollout and Actor phases as follows:

class RolloutTrainer:
    @DistProfiler.annotate(color="red", role="rollout")
    def train(self):
        # rollout logic
        pass

class ActorTrainer:
    @DistProfiler.annotate(color="red", role="actor")
    def train(self):
        # actor logic
        pass

This design ensures that as the framework evolves to support complex multi-role training, performance bottlenecks can be easily isolated and visualized for each specific stage.

Experiment results

Key Arguments

--enable_profiler true 
--profiler_save_path ./profiler_output 
--profiler_ranks 0 1 
--profiler_contents "cpu" "cuda" "stack" 
--profiler_steps 1
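
As a rough illustration of how arguments like these might translate into a PyTorch Profiler call, the sketch below maps the `--profiler_contents` and `--profiler_ranks` values onto keyword arguments of `torch.profiler.profile`. The helper names `build_profile_plan` and `run_profiled` are hypothetical and do not come from this PR; the actual mapping in ms-swift may differ, and `--profiler_steps` is not modelled here.

```python
def build_profile_plan(contents, ranks, rank):
    """Translate CLI-style profiler options into a plain description.

    Hypothetical helper: the real ms-swift mapping may differ.
    """
    contents = set(contents)
    return {
        # Only the listed ranks collect a trace.
        "enabled": rank in ranks,
        # Names of torch.profiler.ProfilerActivity members to record.
        "activities": [name.upper() for name in ("cpu", "cuda") if name in contents],
        # "stack" asks the profiler to record Python call stacks.
        "with_stack": "stack" in contents,
    }


def run_profiled(fn, plan, trace_path):
    """Run fn() under torch.profiler and export a Chrome-compatible trace."""
    if not plan["enabled"]:
        return fn()

    # Imported lazily so the planning helper works without PyTorch installed.
    from torch.profiler import ProfilerActivity, profile

    activities = [getattr(ProfilerActivity, name) for name in plan["activities"]]
    with profile(activities=activities, with_stack=plan["with_stack"]) as prof:
        result = fn()
    # The exported JSON can be opened in a Chrome Tracing compatible viewer.
    prof.export_chrome_trace(trace_path)
    return result


plan = build_profile_plan(["cpu", "cuda", "stack"], ranks=[0, 1], rank=0)
```

With the arguments shown above, rank 0 would be planned to record CPU and CUDA activity with stack traces, while a rank outside `--profiler_ranks` would run unprofiled.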

gemini-code-assist

Code Review

This pull request introduces a performance profiling framework (DistProfiler) integrated with PyTorch's profiler, adding configuration arguments, trainer mixins, and callback hooks for standard and Megatron trainers. The feedback identifies several critical issues, including potential AttributeErrors in BaseArguments and Profiler, inconsistent default values for profiler_tool, and Python compatibility issues with the | operator in isinstance. Additionally, improvements are suggested for handling None values in config union/intersection operations, replacing print statements with standard logging, and using existing utility functions like get_dist_setting to retrieve distributed ranks.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

qq1243196045 · 2026-05-29T13:43:13Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a distributed performance profiling feature integrated with PyTorch's profiler, adding configuration arguments, callbacks for standard and Megatron trainers, and utility classes to manage profiling sessions. Key feedback highlights several critical issues: a missing logger import in profile.py that will cause a NameError, an invalid attribute check on BaseArguments that blocks initialization, a potential resource leak in discrete profiling mode if exceptions are raised, potential AttributeError issues due to class-level state tracking in Profiler, and an incorrect ordering of type assertions in TorchProfilerToolConfig.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

qq1243196045 · 2026-06-01T02:57:14Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a distributed performance profiling framework integrated with PyTorch's profiler, adding profiler arguments, trainer callbacks, and a DistProfiler utility with an annotate decorator. The code review identified several critical issues: an over-restrictive assertion in profile_args.py that crashes the application when standard callbacks are absent; signature and fallback limitations in the annotate decorator that prevent profiling standalone functions; a potential AttributeError when stopping an uninitialized profiler; and a missing try...finally block in discrete profiling mode that could leak resources if an exception occurs.
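
The resource-leak concern in discrete mode can be illustrated with a minimal sketch, assuming a profiler object that exposes `start` and `stop` methods. `FakeProfiler` and `profiled_region` are hypothetical names used only for this example and are not part of the PR.

```python
from contextlib import contextmanager


class FakeProfiler:
    """Stand-in for a real profiler; it only records start/stop calls."""

    def __init__(self):
        self.running = False
        self.events = []

    def start(self):
        self.running = True
        self.events.append("start")

    def stop(self):
        self.running = False
        self.events.append("stop")


@contextmanager
def profiled_region(profiler):
    """Start the profiler and guarantee stop() runs even if the body raises."""
    profiler.start()
    try:
        yield profiler
    finally:
        profiler.stop()


profiler = FakeProfiler()
try:
    with profiled_region(profiler):
        raise RuntimeError("failure inside the profiled function")
except RuntimeError:
    pass
```

Because `stop()` sits in the `finally` clause, the profiler is no longer running after the exception propagates.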

gemini-code-assist · 2026-06-01T02:59:10Z

+        if hasattr(self, 'callbacks'):
+            if self.enable_profiler and 'profiler' not in self.callbacks:
+                self.callbacks.append('profiler')
+            if 'profiler' in self.callbacks and not self.enable_profiler:
+                self.enable_profiler = True
+            if self.enable_profiler:
+                assert 'profiler' in self.callbacks, \
+                    'Profiler callback must be included in callbacks when profiler is enabled.'
+            if 'profiler' in self.callbacks:
+                assert self.enable_profiler, \
+                    'Profiler callback is included in callbacks but profiler is not enabled.'
+        else:
+            assert not self.enable_profiler, \
+                'Profiler cannot be enabled without callbacks attribute or with profiler callback missing in callbacks.'


The assertion assert not self.enable_profiler in the else block will raise an AssertionError and crash the application whenever enable_profiler is set to True on any arguments class that does not have a callbacks attribute (such as BaseArguments, DeployArguments, EvalArguments, or ExportArguments). This prevents using the profiler for inference, evaluation, or custom training loops (like RLHF rollout/actor phases) where standard trainer callbacks are not used.

We should remove this assertion to allow enabling the profiler without requiring a callbacks attribute.

Suggested change

         if hasattr(self, 'callbacks'):
             if self.enable_profiler and 'profiler' not in self.callbacks:
                 self.callbacks.append('profiler')
             if 'profiler' in self.callbacks and not self.enable_profiler:
                 self.enable_profiler = True
             if self.enable_profiler:
                 assert 'profiler' in self.callbacks, \
                     'Profiler callback must be included in callbacks when profiler is enabled.'
             if 'profiler' in self.callbacks:
                 assert self.enable_profiler, \
                     'Profiler callback is included in callbacks but profiler is not enabled.'
-        else:
-            assert not self.enable_profiler, \
-                'Profiler cannot be enabled without callbacks attribute or with profiler callback missing in callbacks.'

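BaseArguments should not be used directly. It is normally inherited by other argument classes such as SftArguments, which do have a callbacks attribute. The profiler feature depends on the profiler callback, so I have to make sure that the profiler callback and enable_profiler are both turned on. If there is no profiler callback, that does indeed mean the profiler should not be enabled.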
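
The behaviour discussed in this thread can be reproduced with a small standalone sketch. `sync_profiler_flags` is a hypothetical free function that mirrors the logic of the diff, and `SimpleNamespace` stands in for the real argument classes.

```python
from types import SimpleNamespace


def sync_profiler_flags(args):
    """Keep enable_profiler and the 'profiler' callback in agreement."""
    if hasattr(args, "callbacks"):
        # Either switch implies the other.
        if args.enable_profiler and "profiler" not in args.callbacks:
            args.callbacks.append("profiler")
        if "profiler" in args.callbacks and not args.enable_profiler:
            args.enable_profiler = True
    else:
        # The branch under review: no callbacks attribute means no profiler.
        assert not args.enable_profiler, "Profiler cannot be enabled without callbacks."
    return args


# Enabling the flag registers the callback.
by_flag = sync_profiler_flags(SimpleNamespace(enable_profiler=True, callbacks=[]))

# Listing the callback turns the flag on.
by_callback = sync_profiler_flags(SimpleNamespace(enable_profiler=False, callbacks=["profiler"]))

# An arguments object without callbacks is rejected when the flag is set.
try:
    sync_profiler_flags(SimpleNamespace(enable_profiler=True))
    rejected = False
except AssertionError:
    rejected = True
```

After the two synchronisation steps both invariants already hold, so the disagreement is only about the `else` branch: it enforces the author's rule that profiling requires the callback, at the cost of failing for argument classes that have no `callbacks` attribute.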

gemini-code-assist · 2026-06-01T02:59:11Z

+        def decorator(func):
+
+            @functools.wraps(func)
+            def wrapper(self_instance, *args, **kwargs_inner):
+                profiler = getattr(self_instance, 'profiler', None)
+
+                if (not profiler or not profiler.check_enable() or not profiler.check_this_step()
+                        or not profiler.check_this_rank()):
+                    return func(self_instance, *args, **kwargs_inner)
+
+                impl = profiler._impl
+                if hasattr(impl, 'annotate'):
+                    try:
+                        actual_decorator = impl.annotate(
+                            message=message, color=color, domain=domain, category=category, **kwargs_outer)
+
+                        return actual_decorator(func)(self_instance, *args, **kwargs_inner)
+                    except Exception:
+                        return func(self_instance, *args, **kwargs_inner)
+                return func(self_instance, *args, **kwargs_inner)
+
+            return wrapper
+
+        return decorator


The current implementation of the annotate decorator has two major issues:

It forces self_instance as a required first positional argument in the wrapper, which will raise a TypeError when decorating standalone functions or methods with no arguments.

It silently skips profiling if the class instance does not have a profiler attribute, even if a global profiler is active and running.

We can fix both issues by:

Using *args, **kwargs_inner to support any function signature (standalone or method).

Falling back to the active global profiler (_current_profiler) if the first argument does not have a profiler attribute.

def decorator(func):

    @functools.wraps(func)
    def wrapper(*args, **kwargs_inner):
        profiler = None
        if args:
            profiler = getattr(args[0], 'profiler', None)
        if profiler is None:
            profiler = getattr(cls, '_current_profiler', None)

        if (not profiler or not profiler.check_enable() or not profiler.check_this_step()
                or not profiler.check_this_rank()):
            return func(*args, **kwargs_inner)

        impl = profiler._impl
        if hasattr(impl, 'annotate'):
            try:
                actual_decorator = impl.annotate(
                    message=message, color=color, domain=domain, category=category, **kwargs_outer)
                return actual_decorator(func)(*args, **kwargs_inner)
            except Exception:
                return func(*args, **kwargs_inner)
        return func(*args, **kwargs_inner)

    return wrapper

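This annotate decorator requires self_instance to be passed in; standalone functions are not supported by the profiler. It also requires self_instance to have a profiler attribute, which is guaranteed when the profiler callback is used.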
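
The signature difference at the centre of this thread can be shown with two stripped-down decorators. Both are hypothetical simplifications that drop all profiling logic and keep only the wrapper signature.

```python
import functools


def annotate_method_only(func):
    """Wrapper shape from the PR: the first positional argument is required."""

    @functools.wraps(func)
    def wrapper(self_instance, *args, **kwargs):
        return func(self_instance, *args, **kwargs)

    return wrapper


def annotate_any(func):
    """Wrapper shape from the review suggestion: any call signature is accepted."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


class Trainer:

    @annotate_method_only
    def train(self):
        return "trained"


@annotate_method_only
def standalone_strict():
    return "ok"


@annotate_any
def standalone_relaxed():
    return "ok"


# Methods work with either shape because self fills the first slot.
method_result = Trainer().train()

# A zero-argument function fails with the method-only shape.
try:
    standalone_strict()
    strict_failed = False
except TypeError:
    strict_failed = True

relaxed_result = standalone_relaxed()
```

The method-only shape is consistent with the author's stated intent that only instance methods are profiled; the trade-off is a `TypeError` if the decorator is ever applied to a function called without arguments.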

gemini-code-assist · 2026-06-01T02:59:11Z

+class DistProfiler:
+


Add a class-level variable _current_profiler to keep track of the active global profiler instance. This will be used as a fallback in the annotate decorator when the decorated function is a standalone function or a method of a class that does not have a profiler attribute.

class DistProfiler:
    _current_profiler = None

gemini-code-assist · 2026-06-01T02:59:11Z

+    def start(self, **kwargs):
+        if self.check_enable() and self.check_this_rank():
+            self._this_step = True
+            return getattr(self._impl, 'start', lambda **_: None)(**kwargs)
+
+    def stop(self):
+        if self.check_enable() and self.check_this_rank():
+            self._this_step = False
+            return getattr(self._impl, 'stop', lambda: None)()


Update the start and stop methods to register and unregister the active profiler instance in the class-level _current_profiler variable.

def start(self, **kwargs):
    if self.check_enable() and self.check_this_rank():
        self._this_step = True
        DistProfiler._current_profiler = self
        return getattr(self._impl, 'start', lambda **_: None)(**kwargs)

def stop(self):
    if self.check_enable() and self.check_this_rank():
        self._this_step = False
        if DistProfiler._current_profiler is self:
            DistProfiler._current_profiler = None
        return getattr(self._impl, 'stop', lambda: None)()
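
A minimal sketch of the registration pattern suggested here, assuming a simplified class with a single `enabled` flag in place of `check_enable()` and `check_this_rank()`. `ProfilerSketch` is a hypothetical name and is not the PR's `DistProfiler`.

```python
class ProfilerSketch:
    # Class-level slot holding the most recently started profiler, if any.
    _current_profiler = None

    def __init__(self, enabled=True):
        self.enabled = enabled

    def start(self):
        if self.enabled:
            ProfilerSketch._current_profiler = self

    def stop(self):
        # Only clear the slot if this instance still owns it, so stopping an
        # older profiler does not unregister a newer one.
        if self.enabled and ProfilerSketch._current_profiler is self:
            ProfilerSketch._current_profiler = None


first = ProfilerSketch()
second = ProfilerSketch()

first.start()
second.start()          # second now owns the slot
first.stop()            # no effect on the slot: first is not the owner
owner_after_first_stop = ProfilerSketch._current_profiler
second.stop()
owner_after_second_stop = ProfilerSketch._current_profiler
```

The `is self` check is what keeps the slot correct when two profilers overlap. Note that a single class-level slot is shared by every instance in the process.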

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Jintao-Huang · 2026-06-01T06:59:51Z

please run:

pip install pre-commit
pre-commit run --all-files

qq1243196045 · 2026-06-01T08:43:28Z

please run:

pip install pre-commit
pre-commit run --all-files

feat: Add native support for PyTorch Profiler in ms-swift

0b4fd05

gemini-code-assist Bot reviewed May 29, 2026


qq1243196045 and others added 11 commits May 29, 2026 21:08

Update swift/utils/profiler/config.py

8c5aab6

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update swift/utils/profiler/config.py

740d942

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update swift/utils/profiler/torch_profile.py

99f464c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update swift/utils/profiler/torch_profile.py

23e9399

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update swift/trainers/arguments.py

717d5f2

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update swift/utils/profiler/torch_profile.py

6bb140b

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update swift/utils/profiler/profile.py

269be95

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update swift/utils/profiler/torch_profile.py

32a69d1

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

add swift logger

bf99454

updtate Profile

0498984

update Profile

407ea6b

gemini-code-assist Bot reviewed May 29, 2026


Comment thread swift/utils/profiler/profile.py

Comment thread swift/arguments/base_args/profile_args.py

Comment thread swift/utils/profiler/torch_profile.py

Comment thread swift/utils/profiler/torch_profile.py

Comment thread swift/utils/profiler/config.py Outdated

qq1243196045 and others added 4 commits May 29, 2026 21:59

Update swift/utils/profiler/config.py

452d041

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

update

1f9069d

update

da28c9f

Merge branch 'Profile' of github.com:qq1243196045/ms-swift into Profile

b352fb5

gemini-code-assist Bot reviewed Jun 1, 2026


qq1243196045 and others added 2 commits June 1, 2026 11:05

Update swift/utils/profiler/torch_profile.py

f79da7c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update swift/utils/profiler/torch_profile.py

6cffadb

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>


2 participants