Conversation
collie/models/qwen2/model.py
Outdated
)

self.num_heads_tp = query_states.shape[2]
self.tp_size = self.num_heads // self.num_heads_tp
tp_size can be obtained from self.config.tp_size.
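A minimal sketch of the suggested change, assuming self.num_heads and self.config.tp_size are both set on the attention module:

# Sketch: derive the per-partition head count from the config instead of the tensor shape
self.num_heads_tp = self.num_heads // self.config.tp_size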
collie/models/qwen2/model.py
Outdated
attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)

if attn_weights.size() != (bsz, self.num_heads_tp, q_len, kv_seq_len):
This assert should also be based on self.config.tp_size and self.num_heads.
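For example, the shape check could be written against the config directly (a sketch, assuming config.tp_size is set; the error message wording is illustrative):

# Sketch: compute the expected per-partition head count from the config
num_heads_tp = self.num_heads // self.config.tp_size
if attn_weights.size() != (bsz, num_heads_tp, q_len, kv_seq_len):
    raise ValueError(
        f"Attention weights should be of size {(bsz, num_heads_tp, q_len, kv_seq_len)}, "
        f"but is {attn_weights.size()}"
    )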
collie/models/qwen2/model.py
Outdated
rearrange(value_states, "b n (h d) -> b n h d", d=self.head_dim),
)

self.num_heads_tp = query_states.shape[2]
Same as in Qwen2Attention: derive it from config.tp_size.
collie/models/qwen2/model.py
Outdated
| "unexpected results may be encountered." | ||
| ) | ||
| # self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx) | ||
| self.self_attn = Qwen2FlashAttention2(config, layer_idx) |
Write it like this; otherwise use_flash in the config cannot control which attention implementation is used here:
if config.attn_implementation == "flash_attention_2" or config.use_flash:
    self.attention = InternLM2FlashAttention2(config=config)
else:
    self.attention = InternLM2Attention(config=config)
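Applied to this file, the same pattern would presumably look like the following (a sketch, assuming a non-flash Qwen2Attention class is available alongside Qwen2FlashAttention2):

# Sketch: let the config decide between the flash and the eager attention class
if config._attn_implementation == "flash_attention_2" or config.use_flash:
    self.self_attn = Qwen2FlashAttention2(config, layer_idx)
else:
    self.self_attn = Qwen2Attention(config, layer_idx)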
collie/models/qwen2/model.py
Outdated
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,

self.self_attn = Qwen2FlashAttention2(config, layer_idx)
# self.self_attn = Qwen2SdpaAttention(config, layer_idx)

if config._attn_implementation == "flash_attention_2" or config.use_flash:
Line 842 assigns _attn_implementation to "flash_attention_2", so isn't this or condition always True?
Running test_generation.py, the generation results differ between pp_size=2 and tp_size=2. It is probably a KV cache issue.
)
from collie.models.utils import inputs_to_kv_cache_for_layer, kv_cache_to_inputs_for_layer, kv_cache_to_inputs_for_model, inputs_to_kv_cache_for_model

if is_flash_attn_2_available():
If the installed flash-attn is version 2.0 or earlier, this will be False, and an error is raised when config.use_flash=True; the error message could be improved.
The error I saw is:
File "/fs-computility/llm/shared/lvkai/workspace/collie/tests/models/qwen2/../../../collie/models/qwen2/model.py", line 488, in forward
    _flash_supports_window_size
NameError: name '_flash_supports_window_size' is not defined
It could instead tell the user that flash-attn version 2.1 or later is required.
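One way to surface a clearer error, sketched here under the assumption that flash_attn exposes __version__ and that the check runs where config.use_flash is consulted:

# Hypothetical guard; the packaging dependency and the exact message are assumptions.
from packaging import version

try:
    import flash_attn
    flash_ok = version.parse(flash_attn.__version__) >= version.parse("2.1.0")
except ImportError:
    flash_ok = False

if config.use_flash and not flash_ok:
    raise ImportError(
        "config.use_flash=True requires flash-attn >= 2.1; "
        "please install or upgrade flash-attn."
    )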