Request to publish the Base and Small models ~ #77

@juntaosun

I examined the model parameters in vit.py. The released model was trained with the dinov2l16_384 preset, which is ViT-Large with about 300M parameters.

There are also lighter ViT-Base (86M) and ViT-Small variants. If you could release them separately, their inference speed should be even faster.

vit.py:

ViTPreset = Literal["dinov2l16_384", "dinov2b16_384", "dinov2s16_384"]

VIT_CONFIG_DICT: dict[ViTPreset, ViTConfig] = {
    
    # ViT-Large (300M) - dinov2l16_384
    "dinov2l16_384": ViTConfig(
        in_chans=3,
        embed_dim=1024,
        depth=24,
        num_heads=16,
        init_values=1e-5,
        global_pool="",
    ),

    # ViT-Base (86M) - dinov2b16_384
    "dinov2b16_384": ViTConfig(
        in_chans=3,
        embed_dim=768,  
        depth=12,       
        num_heads=12,  
        init_values=1e-5,
        global_pool="",
    ),
    
    # ViT-Small - dinov2s16_384
    "dinov2s16_384": ViTConfig(
        in_chans=3,
        embed_dim=384,
        depth=12,    
        num_heads=6,    
        init_values=1e-5,
        global_pool="",
    ),
}
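
As a rough sanity check on the sizes quoted in the comments above, the transformer blocks of a ViT hold about 12 * depth * embed_dim^2 weights. This is a minimal sketch, assuming a standard block with a 4x MLP expansion and ignoring embeddings, biases, norms and LayerScale. The helper `approx_vit_params` and the `PRESETS` table are hypothetical names for illustration and are not part of vit.py.

```python
def approx_vit_params(embed_dim: int, depth: int) -> int:
    """Rough count of transformer-block weights in a ViT.

    Per block: ~4*d^2 for attention (QKV plus output projection) and
    ~8*d^2 for an MLP with a 4x hidden expansion (an assumption here).
    Patch/position embeddings, biases, norms and LayerScale are ignored.
    """
    return 12 * depth * embed_dim**2


# (embed_dim, depth) copied from the three presets in the issue.
PRESETS = {
    "dinov2l16_384": (1024, 24),  # ViT-Large
    "dinov2b16_384": (768, 12),   # ViT-Base
    "dinov2s16_384": (384, 12),   # ViT-Small
}

for name, (dim, depth) in PRESETS.items():
    millions = approx_vit_params(dim, depth) / 1e6
    print(f"{name}: ~{millions:.0f}M block parameters")
```

The estimate lands near 302M for Large and 85M for Base, close to the 300M and 86M figures quoted above, and near 21M for Small.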

monodepth.py:

# Map the decoder configuration with the number of output channels
# for each tensor from the decoder output.
MONODEPTH_ENCODER_DIMS_MAP: dict[ViTPreset, list[int]] = {
    # For publication
    "dinov2l16_384": [256, 512, 1024, 1024],
    
    # ADD
    "dinov2b16_384": [192, 384, 768, 768], # ViT-Base
    "dinov2s16_384": [96, 192, 384, 384], # ViT-Small
}

MONODEPTH_HOOK_IDS_MAP: dict[ViTPreset, list[int]] = {
    # For publication
    "dinov2l16_384": [5, 11, 17, 23],
    
    # ADD
    "dinov2b16_384": [2, 5, 8, 11], # ViT-Base
    "dinov2s16_384": [2, 5, 8, 11], # ViT-Small
}
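
The proposed Base and Small entries follow the same pattern as the published Large entry: hook IDs are the last layer of each quarter of the network, and the encoder dims are embed_dim / 4, embed_dim / 2, embed_dim, embed_dim. The sketch below only reproduces that observed pattern. Whether the repository derives its values this way is an assumption, and `evenly_spaced_hook_ids` and `encoder_dims` are hypothetical helpers, not functions from monodepth.py.

```python
def evenly_spaced_hook_ids(depth: int, num_hooks: int = 4) -> list[int]:
    """Return the zero-based index of the last block in each of
    `num_hooks` equal slices of a `depth`-block transformer."""
    if depth % num_hooks != 0:
        raise ValueError("depth must be divisible by num_hooks")
    step = depth // num_hooks
    return [step * (i + 1) - 1 for i in range(num_hooks)]


def encoder_dims(embed_dim: int) -> list[int]:
    """Channel widths matching the pattern seen in the map above:
    a quarter, a half, then the full embedding width twice."""
    return [embed_dim // 4, embed_dim // 2, embed_dim, embed_dim]


# Check the pattern against the values listed in the issue.
assert evenly_spaced_hook_ids(24) == [5, 11, 17, 23]   # ViT-Large
assert evenly_spaced_hook_ids(12) == [2, 5, 8, 11]     # ViT-Base / ViT-Small
assert encoder_dims(1024) == [256, 512, 1024, 1024]    # ViT-Large
assert encoder_dims(768) == [192, 384, 768, 768]       # ViT-Base
assert encoder_dims(384) == [96, 192, 384, 384]        # ViT-Small
```

If the decoder does expect these ratios, a new preset would only need its embed_dim and depth to fill in both maps.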

Hopefully, you can train and release ViT-Base and ViT-Small.
