Thanks for sharing the code and your research.
Could you also share the fine-tuning code if available? I'm working on pre-training the network and then fine-tuning it on the same dataset using supervised learning.
For the fine-tuning step, should I remove the decoder and add linear layers on top of the encoder?