Questions about finetuning using features extracted from a pre-trained model (thanks for answering) #36

@ChangYuance

Description

I use video clips as input for a video face classification task (palsy or not).
Here is my model:
```python
import torch.nn as nn
from marlin_pytorch import Marlin


class MAE_only_x(nn.Module):

    def __init__(self, args):
        super().__init__()
        self.mae = Marlin.from_online("marlin_vit_base_ytf")
        self.mae.eval()
        # self.norm = nn.BatchNorm1d(768 * 2)
        self.decoder = nn.Sequential(
            nn.Linear(768, 512),
            # nn.BatchNorm1d(512),
            nn.ReLU(),
            # nn.Dropout(0.3),
            nn.Linear(512, 2)
        )

    def forward(self, x, phase='train'):
        """
        Input x has shape (B, L, d_input).
        """
        # x: (batch, 16, 3, 224, 224) -> (batch, 3, 16, 224, 224)
        x = x.permute(0, 2, 1, 3, 4).contiguous()
        x = self.mae.extract_features(x, keep_seq=False)  # (B, 768)
        pred_logit = self.decoder(x)  # (B, 768) -> (B, 2)
        return pred_logit
```

x is my input: a batch of videos, each with 16 RGB frames of size 224x224.
The problem is that on this 2-label classification task, the loss of the network never goes down.
Is there anything I can do to finetune better using MARLIN?
Any advice would be appreciated.
Frustratingly, my model loss never goes down, as shown below:

(Screenshot: training loss curve.)
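One common cause of a flat loss in this setup is the frozen backbone silently leaving eval mode: calling `self.mae.eval()` in `__init__` is undone by a later `model.train()`, and gradients still flow through the encoder unless they are explicitly disabled. Below is a minimal sketch of the frozen-backbone pattern. The `nn.Linear` backbone is a hypothetical stand-in for the MARLIN encoder (so the sketch runs without downloading weights); only the decoder's parameters receive gradients.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the frozen MARLIN encoder: a linear layer
# mapping a 32-d dummy input to a 768-d feature, just to keep this runnable.
backbone = nn.Linear(32, 768)
decoder = nn.Sequential(nn.Linear(768, 512), nn.ReLU(), nn.Linear(512, 2))

# Freeze the backbone: disable gradients, and keep it in eval mode so a
# later .train() call cannot re-enable BatchNorm/Dropout behaviour inside it.
for p in backbone.parameters():
    p.requires_grad_(False)
backbone.eval()

# Optimise only the trainable (decoder) parameters.
optimizer = torch.optim.Adam(
    [p for p in decoder.parameters() if p.requires_grad], lr=1e-4
)

x = torch.randn(4, 32)                # dummy batch of 4 clips
target = torch.randint(0, 2, (4,))    # dummy binary labels

with torch.no_grad():                 # features are constant; skip the graph
    feats = backbone(x)
logits = decoder(feats)               # (4, 2)
loss = nn.functional.cross_entropy(logits, target)
loss.backward()
optimizer.step()

# Backbone weights are untouched; only decoder weights received gradients.
assert all(p.grad is None for p in backbone.parameters())
assert all(p.grad is not None for p in decoder.parameters())
```

If instead you intend to finetune the encoder end to end, the opposite applies: leave its parameters trainable and give them a smaller learning rate than the fresh decoder head.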
