I am using video clips as input for a video face classification task (palsy or not).
Here is my model:
class MAE_only_x(nn.Module):
    def __init__(self, args):
        super().__init__()
        self.mae = Marlin.from_online("marlin_vit_base_ytf")
        self.mae.eval()
        # self.norm = nn.BatchNorm1d(768 * 2)
        self.decoder = nn.Sequential(
            nn.Linear(768, 512),
            # nn.BatchNorm1d(512),
            nn.ReLU(),
            # nn.Dropout(0.3),
            nn.Linear(512, 2)
        )

    def forward(self, x, phase='train'):
        """
        Input x has shape (B, T, C, H, W) = (B, 16, 3, 224, 224).
        """
        x = x.permute(0, 2, 1, 3, 4).contiguous()         # -> (B, C, T, H, W)
        x = self.mae.extract_features(x, keep_seq=False)  # (B, 768)
        pred_logit = self.decoder(x)                      # (B, 768) -> (B, 2)
        return pred_logit
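One thing I noticed while debugging: calling self.mae.eval() in __init__ does not survive the model.train() call that a training loop makes at the start of each epoch, and eval() by itself does not stop gradients. Below is a minimal sketch of this generic PyTorch behavior, using a hypothetical stand-in backbone (DummyBackbone) instead of the real MARLIN weights, plus one way to genuinely freeze the backbone:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the MARLIN ViT backbone (the real one comes from
# Marlin.from_online); the eval()/train() interaction shown is generic PyTorch.
class DummyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(768, 768)
        self.drop = nn.Dropout(0.5)

    def forward(self, x):
        return self.drop(self.proj(x))

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.mae = DummyBackbone()
        self.mae.eval()                 # same pattern as in the model above
        self.decoder = nn.Linear(768, 2)

    def forward(self, x):
        return self.decoder(self.mae(x))

model = Classifier()
assert not model.mae.training   # eval() in __init__ did take effect...

model.train()                   # ...but the training loop calls this every epoch,
assert model.mae.training       # which flips the backbone back to train mode
                                # (Dropout/BatchNorm active, stats updating)

# To genuinely freeze the backbone: stop its gradients AND re-apply eval mode
# after every model.train() call.
for p in model.mae.parameters():
    p.requires_grad_(False)
model.train()
model.mae.eval()

trainable = [p for p in model.parameters() if p.requires_grad]
assert len(trainable) == 2      # only the decoder's weight and bias remain
```

If the intent is linear probing, it may also help to pass only the trainable parameters to the optimizer, e.g. torch.optim.Adam(model.decoder.parameters(), lr=1e-4).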
x is my input: a batch of videos, each with 16 frames of 3x224x224 RGB.
But for this 2-label classification task, the loss of the network never goes down.
Is there anything I can do to fine-tune better using MARLIN?
Any advice would be appreciated.
Desperately, my model's loss never goes down, like the following: