
Commit be62d37

Ruff format
Signed-off-by: Kunjan Patel <kunjanp@google.com>
1 parent 0d97a52 commit be62d37

File tree

6 files changed: +13 -0 lines changed

src/maxdiffusion/configs/base21.yml

Lines changed: 3 additions & 0 deletions

@@ -49,6 +49,9 @@ jit_initializers: True
 from_pt: False
 split_head_dim: True
 attention: 'dot_product' # Supported attention: dot_product, flash
+mask_padding_tokens: True # Whether to mask padding tokens in attention computation.
+attention_sharding_uniform: True # same sequence sharding rules applied for q in both (self and cross attention)
+
 flash_block_sizes: {}
 # GroupNorm groups
 norm_num_groups: 32
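The first new key, mask_padding_tokens, controls whether padding positions in the key/value sequence are excluded from the attention computation. The commit only adds the config entries, so the following is a minimal JAX sketch of what that masking typically looks like for dot-product attention; the function name, tensor layout, and kv_valid_mask argument are illustrative assumptions, not maxdiffusion's actual API.

```python
# Hypothetical sketch (not part of this commit): masking padding tokens in
# dot-product attention. Shapes and argument names are assumptions.
import jax
import jax.numpy as jnp


def dot_product_attention(q, k, v, kv_valid_mask=None, mask_padding_tokens=True):
  """q: [batch, heads, q_len, dim]; k, v: [batch, heads, kv_len, dim];
  kv_valid_mask: [batch, kv_len] bool, False at padding positions."""
  scores = jnp.einsum("bhqd,bhkd->bhqk", q, k) / jnp.sqrt(q.shape[-1])
  if mask_padding_tokens and kv_valid_mask is not None:
    # Padded key/value positions get a very negative score so softmax
    # assigns them effectively zero attention weight.
    neg_inf = jnp.finfo(scores.dtype).min
    scores = jnp.where(kv_valid_mask[:, None, None, :], scores, neg_inf)
  weights = jax.nn.softmax(scores, axis=-1)
  return jnp.einsum("bhqk,bhkd->bhqd", weights, v)


# Example: a 77-token text context where the last 17 tokens are padding.
q = jnp.ones((1, 8, 16, 64))
k = v = jnp.ones((1, 8, 77, 64))
valid = (jnp.arange(77) < 60)[None, :]
out = dot_product_attention(q, k, v, kv_valid_mask=valid)
```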

src/maxdiffusion/configs/base_flux_dev.yml

Lines changed: 2 additions & 0 deletions

@@ -63,6 +63,8 @@ jit_initializers: True
 from_pt: True
 split_head_dim: True
 attention: 'flash' # Supported attention: dot_product, flash, cudnn_flash_te
+mask_padding_tokens: True # Whether to mask padding tokens in attention computation.
+attention_sharding_uniform: True # same sequence sharding rules applied for q in both (self and cross attention)
 
 flash_block_sizes: {}
 # Use the following flash_block_sizes on v6e (Trillium) due to larger vmem.
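The second key, attention_sharding_uniform, is described as applying the same sequence-sharding rules to q in both self- and cross-attention. Below is a small sketch of how such a flag could be consumed when choosing a partition spec for q; the mesh axis names ("data", "fsdp") and the helper itself are assumptions for illustration, since this commit only adds the config entry with its default value.

```python
# Hypothetical sketch: selecting q's partition spec from the flag.
# Mesh axis names and the (batch, heads, seq, head_dim) layout are assumptions.
from jax.sharding import PartitionSpec as P


def q_partition_spec(attention_sharding_uniform: bool, is_cross_attention: bool):
  seq_sharded = P("data", None, "fsdp", None)    # shard q along its sequence axis
  seq_replicated = P("data", None, None, None)   # leave q's sequence axis unsharded
  if attention_sharding_uniform:
    # Uniform rule: q gets the same sequence sharding in self- and cross-attention.
    return seq_sharded
  # Otherwise, e.g., keep cross-attention q unsharded along the sequence axis.
  return seq_replicated if is_cross_attention else seq_sharded
```

With the default added here (True), both attention types would use the same spec; setting it to False would let cross-attention q fall back to the replicated layout in this sketch.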

src/maxdiffusion/configs/base_flux_dev_multi_res.yml

Lines changed: 2 additions & 0 deletions

@@ -63,6 +63,8 @@ jit_initializers: True
 from_pt: True
 split_head_dim: True
 attention: 'flash' # Supported attention: dot_product, flash, cudnn_flash_te
+mask_padding_tokens: True # Whether to mask padding tokens in attention computation.
+attention_sharding_uniform: True # same sequence sharding rules applied for q in both (self and cross attention)
 
 #flash_block_sizes: {}
 # Use the following flash_block_sizes on v6e (Trillium) due to larger vmem.

src/maxdiffusion/configs/base_flux_schnell.yml

Lines changed: 2 additions & 0 deletions

@@ -62,6 +62,8 @@ jit_initializers: True
 from_pt: True
 split_head_dim: True
 attention: 'flash' # Supported attention: dot_product, flash, cudnn_flash_te
+mask_padding_tokens: True # Whether to mask padding tokens in attention computation.
+attention_sharding_uniform: True # same sequence sharding rules applied for q in both (self and cross attention)
 flash_block_sizes: {
   "block_q" : 256,
   "block_kv_compute" : 256,

src/maxdiffusion/configs/base_wan_27b.yml

Lines changed: 2 additions & 0 deletions

@@ -61,6 +61,8 @@ from_pt: True
 split_head_dim: True
 attention: 'flash' # Supported attention: dot_product, flash, cudnn_flash_te, ring
 flash_min_seq_length: 4096
+mask_padding_tokens: True # Whether to mask padding tokens in attention computation.
+attention_sharding_uniform: True # same sequence sharding rules applied for q in both (self and cross attention)
 dropout: 0.1
 
 flash_block_sizes: {

src/maxdiffusion/configs/base_xl.yml

Lines changed: 2 additions & 0 deletions

@@ -50,6 +50,8 @@ jit_initializers: True
 from_pt: False
 split_head_dim: True
 attention: 'dot_product' # Supported attention: dot_product, flash
+mask_padding_tokens: True # Whether to mask padding tokens in attention computation.
+attention_sharding_uniform: True # same sequence sharding rules applied for q in both (self and cross attention)
 flash_block_sizes: {}
 # GroupNorm groups
 norm_num_groups: 32
