Skip to content

Fix eigenvalue monitor logging#7987

Open
heurry wants to merge 1 commit intodeepspeedai:masterfrom
heurry:fix-eigenvalue-monitor-release-version
Open

Fix eigenvalue monitor logging#7987
heurry wants to merge 1 commit intodeepspeedai:masterfrom
heurry:fix-eigenvalue-monitor-release-version

Conversation

@heurry
Copy link
Copy Markdown
Contributor

@heurry heurry commented Apr 28, 2026

Summary

Fixes #7983.

This PR fixes eigenvalue monitor logging by generating monitor events from the local block_eigenvalue values instead of the nonexistent self.ev_values, and by converting the dict_values view into an indexable list before event generation.

The release workflow/version check change from the earlier revision was removed so this PR now addresses only the eigenvalue monitor issue.

Changes

  • Add _get_eigenvalue_monitor_events() for focused eigenvalue monitor event generation.
  • Replace the broken self.ev_values monitor logging path with the helper.
  • Add a focused regression test for eigenvalue monitor events.

Tests

  • git diff --check master
  • python -m py_compile deepspeed/runtime/engine.py tests/unit/runtime/test_engine.py
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python -m pytest tests/unit/runtime/test_engine.py -v
    • passed: 1 test

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b858ce5ea4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 39 to 41
- name: Bump version
run: |
python release/bump_patch_version.py --current_version ${{ env.RELEASE_VERSION }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Revalidate master version before bumping patch

In the updated release workflow, version validation runs on the release tag before master is checked out, but the bump step still writes version.txt from ${{ env.RELEASE_VERSION }}. If master has advanced (for example, it already contains a higher version) by the time Checkout master for version bump runs, python release/bump_patch_version.py --current_version ${{ env.RELEASE_VERSION }} can overwrite master with a lower next patch version and open a downgrade PR. Add a validation step after checking out master (or derive the bump from master's current version.txt) before creating the PR.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in be529dc3.

After checking out master, the workflow now reads MASTER_VERSION from master's current version.txt and uses that value for the post-release patch bump. This avoids creating a downgrade PR if master has advanced beyond the released tag version.

@heurry heurry force-pushed the fix-eigenvalue-monitor-release-version branch from b858ce5 to be529dc Compare April 28, 2026 11:07
@delock
Copy link
Copy Markdown
Collaborator

delock commented Apr 29, 2026

Hi @heurry , I would suggest to seperate this PR into two, each of them fix a different issue.

@delock
Copy link
Copy Markdown
Collaborator

delock commented Apr 29, 2026

Also see that Eigenvalue based MoQ is temporary disabled. Hi @sfc-gh-truwase do you know whether this functionality is still needed?
https://github.com/deepspeedai/DeepSpeed/blob/master/deepspeed/runtime/config.py#L513

Signed-off-by: heurry <restart12212022@163.com>
@heurry heurry force-pushed the fix-eigenvalue-monitor-release-version branch from be529dc to 198e52f Compare April 29, 2026 14:40
@heurry heurry changed the title Fix eigenvalue monitor logging and release version check Fix eigenvalue monitor logging Apr 29, 2026
@heurry
Copy link
Copy Markdown
Contributor Author

heurry commented Apr 29, 2026

Thanks for the review.

I updated this PR to address only the eigenvalue monitor logging issue (#7983). The release workflow/version check changes have been removed from this PR, and the diff is now limited to deepspeed/runtime/engine.py plus the focused regression test in tests/unit/runtime/test_engine.py.

I also updated the PR title and description to match the narrowed scope.

@heurry heurry closed this Apr 29, 2026
@heurry heurry reopened this Apr 29, 2026
@sfc-gh-truwase
Copy link
Copy Markdown
Collaborator

Also see that Eigenvalue based MoQ is temporary disabled. Hi @sfc-gh-truwase do you know whether this functionality is still needed? https://github.com/deepspeedai/DeepSpeed/blob/master/deepspeed/runtime/config.py#L513

@delock thanks for the good catch.

@heurry thanks for PR, but unfortunately it applies to a feature that is disabled because the overall HybridEngine has not been maintained. Are you interested in using HybridEngine?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect variable name: self.ev_values

3 participants