Conversation

@HAOCHENYE
Collaborator

No description provided.

@HAOCHENYE force-pushed the yehc/training_with_hf branch 4 times, most recently from 84bbe79 to df0fa00 on January 28, 2026 09:29
self.llm_float8_handler.precompute_float8_dynamic_scale_for_fsdp(self.model.language_model)
if self.vision_float8_handler is not None and self.vision_float8_handler.enabled:

if self.vision_float8_handler:

Same as the check above: use `is not None`.
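
A minimal sketch of the suggested change (the call and module attribute below are assumed from the surrounding diff, not confirmed by the PR):

if self.vision_float8_handler is not None and self.vision_float8_handler.enabled:
    # Mirror the llm_float8_handler branch above: an explicit
    # `is not None` check combined with the `enabled` flag.
    self.vision_float8_handler.precompute_float8_dynamic_scale_for_fsdp(
        self.model.vision_model)  # hypothetical attribute name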

@HAOCHENYE force-pushed the yehc/training_with_hf branch 4 times, most recently from 8128974 to b28118e on January 31, 2026 08:22
The previous `clean_param_name` only matched "._checkpoint_wrapped_module", which starts with a leading dot. However, for layers wrapped with the checkpoint wrapper, a layer name that starts with "_checkpoint_wrapped_module" (no leading dot) could not be cleaned, because the match expects that dot prefix.

ghstack-source-id: 220732d
Pull-Request: InternLM#1452
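
A minimal sketch of the described fix, assuming a string-based `clean_param_name` (illustrative only; the actual helper in the PR may differ):

_CKPT_WRAPPED = "_checkpoint_wrapped_module"

def clean_param_name(name: str) -> str:
    # Strip the activation-checkpoint wrapper segment whether it appears
    # mid-name ("layers.0._checkpoint_wrapped_module.attn.weight") or at
    # the very start ("_checkpoint_wrapped_module.attn.weight"), which the
    # old "._checkpoint_wrapped_module"-only match missed.
    name = name.replace("." + _CKPT_WRAPPED, "")
    if name.startswith(_CKPT_WRAPPED + "."):
        name = name[len(_CKPT_WRAPPED) + 1:]
    return name
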
…educe code duplication

ghstack-source-id: bdc47bb
Pull-Request: InternLM#1453
ghstack-source-id: 93b84b9
Pull-Request: InternLM#1457
`torch.autograd.grad` will raise an error if any tensor in `inputs` does not require gradients, e.g., the frozen `lm_head`. This commit fixes it with a simple control flow.


ghstack-source-id: 2b083eb
Pull-Request: InternLM#1458
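
A minimal sketch of that kind of control flow (a hypothetical helper; only the frozen `lm_head` scenario comes from the commit message): filter out tensors that do not require grad before calling `torch.autograd.grad`, then re-align the results.

import torch

def safe_grad(loss: torch.Tensor, inputs: list) -> list:
    # torch.autograd.grad raises if any input does not require grad
    # (e.g. parameters of a frozen lm_head), so only pass tracked tensors.
    tracked = [t for t in inputs if t.requires_grad]
    if not tracked:
        return [None] * len(inputs)
    grads = iter(torch.autograd.grad(loss, tracked))
    # Restore the original ordering, with None for the frozen tensors.
    return [next(grads) if t.requires_grad else None for t in inputs]
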
…f16 in DecoderLayer

When FSDPModule is applied to a DecoderLayer, it automatically converts position_embedding to bf16, which causes the forward-pass output to diverge from the HuggingFace model beyond the acceptable threshold. This leads to test failures for Qwen VL. The impact on text-only components is currently uncertain.


ghstack-source-id: 536e8bd
Pull-Request: InternLM#1462
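
A minimal sketch of one way to guard against this dtype drift (an assumed approach, not necessarily the PR's actual fix): keep the position-embedding arithmetic in fp32 and cast back at the end.

import torch

def add_position_embedding_fp32(hidden: torch.Tensor, pos_emb: torch.Tensor) -> torch.Tensor:
    # Under FSDP mixed precision the layer's parameters (including the
    # position embedding) are cast to bf16; doing the add in fp32 keeps
    # the output closer to the HuggingFace fp32 reference.
    return (hidden.float() + pos_emb.float()).to(hidden.dtype)
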
@HAOCHENYE force-pushed the yehc/training_with_hf branch from b28118e to e401b71 on February 2, 2026 05:24