Conversation
collie/models/qwen2/model.py
Outdated
)

self.num_heads_tp = query_states.shape[2]
self.tp_size = self.num_heads // self.num_heads_tp
tp_size can be obtained from self.config.tp_size.
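A minimal sketch of the suggested change, assuming self.num_heads and self.config.tp_size are both set on the attention module:

# Sketch: derive the per-partition head count from the config instead of the tensor shape
self.num_heads_tp = self.num_heads // self.config.tp_size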
collie/models/qwen2/model.py
Outdated
attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)

if attn_weights.size() != (bsz, self.num_heads_tp, q_len, kv_seq_len):
This assert should also be based on self.config.tp_size and self.num_heads.
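For example, the shape check could be written against the config directly (a sketch, assuming config.tp_size is set; the error message wording is illustrative):

# Sketch: compute the expected per-partition head count from the config
num_heads_tp = self.num_heads // self.config.tp_size
if attn_weights.size() != (bsz, num_heads_tp, q_len, kv_seq_len):
    raise ValueError(
        f"Attention weights should be of size {(bsz, num_heads_tp, q_len, kv_seq_len)}, "
        f"but is {attn_weights.size()}"
    )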
collie/models/qwen2/model.py
Outdated
rearrange(value_states, "b n (h d) -> b n h d", d=self.head_dim),
)

self.num_heads_tp = query_states.shape[2]
Same as in Qwen2Attention: derive it from config.tp_size.
collie/models/qwen2/model.py
Outdated
| "unexpected results may be encountered." | ||
| ) | ||
| # self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx) | ||
| self.self_attn = Qwen2FlashAttention2(config, layer_idx) |
Write it like this; otherwise use_flash in the config cannot control which attention implementation is used here:
if config.attn_implementation == "flash_attention_2" or config.use_flash:
    self.attention = InternLM2FlashAttention2(config=config)
else:
    self.attention = InternLM2Attention(config=config)
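Applied to this file, the same pattern would presumably look like the following (a sketch, assuming a non-flash Qwen2Attention class is available alongside Qwen2FlashAttention2):

# Sketch: let the config decide between the flash and the eager attention class
if config._attn_implementation == "flash_attention_2" or config.use_flash:
    self.self_attn = Qwen2FlashAttention2(config, layer_idx)
else:
    self.self_attn = Qwen2Attention(config, layer_idx)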
collie/models/qwen2/model.py
Outdated
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,

self.self_attn = Qwen2FlashAttention2(config, layer_idx)
# self.self_attn = Qwen2SdpaAttention(config, layer_idx)

if config._attn_implementation == "flash_attention_2" or config.use_flash:
Line 842 assigns _attn_implementation to "flash_attention_2", so isn't this or condition always True?
Running test_generation.py, the generation results differ between pp_size=2 and tp_size=2. It is probably a KV cache issue.
)
from collie.models.utils import inputs_to_kv_cache_for_layer, kv_cache_to_inputs_for_layer, kv_cache_to_inputs_for_model, inputs_to_kv_cache_for_model

if is_flash_attn_2_available():
If the installed flash-attn is version 2.0 or earlier, this will be False, and an error is raised when config.use_flash=True; the error message could be improved.
The error I saw is:
File "/fs-computility/llm/shared/lvkai/workspace/collie/tests/models/qwen2/../../../collie/models/qwen2/model.py", line 488, in forward
    _flash_supports_window_size
NameError: name '_flash_supports_window_size' is not defined
It could instead tell the user that flash-attn version 2.1 or later is required.
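One way to surface a clearer error, sketched here under the assumption that flash_attn exposes __version__ and that the check runs where config.use_flash is consulted:

# Hypothetical guard; the packaging dependency and the exact message are assumptions.
from packaging import version

try:
    import flash_attn
    flash_ok = version.parse(flash_attn.__version__) >= version.parse("2.1.0")
except ImportError:
    flash_ok = False

if config.use_flash and not flash_ok:
    raise ImportError(
        "config.use_flash=True requires flash-attn >= 2.1; "
        "please install or upgrade flash-attn."
    )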