Conversation

@HAOCHENYE
Collaborator

No description provided.

@HAOCHENYE force-pushed the yehc/training_with_hf branch 4 times, most recently from 84bbe79 to df0fa00 on January 28, 2026 09:29
self.llm_float8_handler.precompute_float8_dynamic_scale_for_fsdp(self.model.language_model)
if self.vision_float8_handler is not None and self.vision_float8_handler.enabled:

if self.vision_float8_handler:

Same as the check above: use `is not None`.
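
A minimal sketch of the suggested change (the call and module attribute below are assumed from the surrounding diff, not confirmed by the PR):

if self.vision_float8_handler is not None and self.vision_float8_handler.enabled:
    # Mirror the llm_float8_handler branch above: an explicit
    # `is not None` check combined with the `enabled` flag.
    self.vision_float8_handler.precompute_float8_dynamic_scale_for_fsdp(
        self.model.vision_model)  # hypothetical attribute name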

@HAOCHENYE force-pushed the yehc/training_with_hf branch 4 times, most recently from 8128974 to b28118e on January 31, 2026 08:22
The previous `clean_param_name` only matched "._checkpoint_wrapped_module", which starts with a leading dot. However, for layers wrapped with the checkpoint wrapper, a layer name that starts with "_checkpoint_wrapped_module" (no leading dot) could not be cleaned, because the match expects that dot prefix.

ghstack-source-id: 220732d
Pull-Request: InternLM#1452
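
A minimal sketch of the described fix, assuming a string-based `clean_param_name` (illustrative only; the actual helper in the PR may differ):

_CKPT_WRAPPED = "_checkpoint_wrapped_module"

def clean_param_name(name: str) -> str:
    # Strip the activation-checkpoint wrapper segment whether it appears
    # mid-name ("layers.0._checkpoint_wrapped_module.attn.weight") or at
    # the very start ("_checkpoint_wrapped_module.attn.weight"), which the
    # old "._checkpoint_wrapped_module"-only match missed.
    name = name.replace("." + _CKPT_WRAPPED, "")
    if name.startswith(_CKPT_WRAPPED + "."):
        name = name[len(_CKPT_WRAPPED) + 1:]
    return name
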
…educe code duplication

ghstack-source-id: bdc47bb
Pull-Request: InternLM#1453
ghstack-source-id: 93b84b9
Pull-Request: InternLM#1457
`torch.autograd.grad` will raise an error if any tensor in `inputs` does not require gradients, e.g., the frozen `lm_head`. This commit fixes it with a simple control flow.


ghstack-source-id: 2b083eb
Pull-Request: InternLM#1458
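
A minimal sketch of that kind of control flow (a hypothetical helper; only the frozen `lm_head` scenario comes from the commit message): filter out tensors that do not require grad before calling `torch.autograd.grad`, then re-align the results.

import torch

def safe_grad(loss: torch.Tensor, inputs: list) -> list:
    # torch.autograd.grad raises if any input does not require grad
    # (e.g. parameters of a frozen lm_head), so only pass tracked tensors.
    tracked = [t for t in inputs if t.requires_grad]
    if not tracked:
        return [None] * len(inputs)
    grads = iter(torch.autograd.grad(loss, tracked))
    # Restore the original ordering, with None for the frozen tensors.
    return [next(grads) if t.requires_grad else None for t in inputs]
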
…f16 in DecoderLayer

When FSDPModule is applied to a DecoderLayer, it automatically converts position_embedding to bf16, which causes the forward-pass output to diverge from the HuggingFace model beyond the acceptable threshold. This leads to test failures for Qwen VL. The impact on text-only components is currently uncertain.


ghstack-source-id: 536e8bd
Pull-Request: InternLM#1462
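
A minimal sketch of one way to guard against this dtype drift (an assumed approach, not necessarily the PR's actual fix): keep the position-embedding arithmetic in fp32 and cast back at the end.

import torch

def add_position_embedding_fp32(hidden: torch.Tensor, pos_emb: torch.Tensor) -> torch.Tensor:
    # Under FSDP mixed precision the layer's parameters (including the
    # position embedding) are cast to bf16; doing the add in fp32 keeps
    # the output closer to the HuggingFace fp32 reference.
    return (hidden.float() + pos_emb.float()).to(hidden.dtype)
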
@HAOCHENYE force-pushed the yehc/training_with_hf branch from b28118e to e401b71 on February 2, 2026 05:24