
[Feature]: add ViT activation_offload for InternS1#1619

Open
NengXu001 wants to merge 1 commit into InternLM:main from NengXu001:main

Conversation

@NengXu001
Contributor

Reduce InternS1 training memory by roughly 10 GB via ViT activation offloading:

  1. Added support for offloading activations in the modeling_vision module.
  2. Added fields so that VisionConfig and the MoE modules can dynamically see each other's layer depths.
  3. Updated the MoE activation offloading arguments to include the necessary vision parameters.
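The offloading idea behind item 1 can be sketched with PyTorch's built-in `torch.autograd.graph.save_on_cpu` context manager: activations saved for backward inside the context live on the CPU and are copied back to the device only when backward needs them. This is a minimal illustration of the technique, not the PR's actual implementation; `vit_block` is an illustrative stand-in, as the real InternS1 modeling_vision layers are more involved.

```python
import torch
from torch.autograd.graph import save_on_cpu

# Illustrative stand-in for a ViT transformer block (hypothetical, not InternS1 code).
vit_block = torch.nn.Sequential(
    torch.nn.Linear(64, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 64),
)

x = torch.randn(4, 16, 64, requires_grad=True)

# Tensors saved for backward inside this context are moved to CPU and
# copied back on demand during backward, trading memory for transfer time.
with save_on_cpu(pin_memory=False):
    y = vit_block(x)

y.sum().backward()
```

On a GPU run, the memory savings come at the cost of host/device copies, which is the trade-off the PR's ~10 GB figure reflects.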

Collaborator

@HAOCHENYE HAOCHENYE left a comment


We should enhance the implementation of activation offload to make it more general-purpose, so that it can serve multiple models without needing to be aware of layer counts.

use_mask_token: bool = False
use_mean_pooling: bool = True
attn_impl: Literal["flash_attention", "flex_attention", "eager_attention"] = "flash_attention"
text_hidden_layers: int = 0
Collaborator


We should not modify other modules to work around functional deficiencies in ActivationOffload itself. I understand that, from VisionConfig's perspective, it should not need to be aware of text_hidden_layers.
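The general-purpose direction the reviewer is pointing at is achievable with PyTorch's `torch.autograd.graph.saved_tensors_hooks`: a wrapper can offload any module's saved activations without that module knowing anything about other components' layer counts. A minimal sketch under that assumption (`run_with_offload` and the helper names are hypothetical, not xtuner/InternS1 API):

```python
import torch

def pack_to_cpu(t: torch.Tensor):
    # Move the activation saved for backward to CPU, remembering its device.
    return t.device, t.detach().to("cpu")

def unpack_from_cpu(packed):
    # Restore the activation to its original device when backward needs it.
    device, t = packed
    return t.to(device)

def run_with_offload(module: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Works for any module: no awareness of vision vs. text layer depths.
    with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
        return module(x)

block = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
x = torch.randn(2, 8, requires_grad=True)
out = run_with_offload(block, x)
out.sum().backward()
```

With this shape, VisionConfig would not need a `text_hidden_layers` field at all; the offload wrapper is applied per-module from the outside.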

@HAOCHENYE HAOCHENYE added the npu label Mar 25, 2026