
fix: guard model.half() with dtype check in all rerankers#1574

Open
nedeadinside wants to merge 3 commits into FlagOpen:master from nedeadinside:master

Conversation

@nedeadinside

Calling .half() on an already-FP16 model raises an error, causing rerankers with use_fp16=True to crash on every request after the first. Add a dtype guard so the conversion only runs when needed:

    if self.use_fp16 and next(self.model.parameters()).dtype != torch.float16:
        self.model.half()

Fixes BaseReranker, BaseLLMReranker, LightweightLLMReranker, LayerWiseLLMReranker, and MatroyshkaReranker.
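The guard above can be sketched as a standalone, torch-free illustration. Everything here is hypothetical scaffolding (the `ensure_fp16` helper, the `FakeModel` stub, and the string dtypes stand in for `torch` objects; none of it is FlagEmbedding code) — it only demonstrates the control flow: the conversion runs on the first call and is skipped on every later one.

```python
# Minimal sketch of the dtype-guard pattern, assuming a model object with
# .parameters() and .half() like torch.nn.Module. All names below are
# hypothetical stand-ins, not FlagEmbedding code.

def ensure_fp16(model, use_fp16, fp16_dtype="float16"):
    """Convert to FP16 only when the parameters are not already FP16."""
    if use_fp16 and next(model.parameters()).dtype != fp16_dtype:
        model.half()
    return model


class FakeParam:
    def __init__(self, dtype):
        self.dtype = dtype


class FakeModel:
    """Stand-in that records how often .half() is invoked."""
    def __init__(self):
        self._dtype = "float32"
        self.half_calls = 0

    def parameters(self):
        yield FakeParam(self._dtype)

    def half(self):
        self.half_calls += 1
        self._dtype = "float16"


model = FakeModel()
ensure_fp16(model, use_fp16=True)   # first request: converts to FP16
ensure_fp16(model, use_fp16=True)   # second request: guard skips .half()
print(model.half_calls)             # → 1
```

In the actual patch the comparison is against torch.float16 and the guarded object is the loaded Hugging Face model, but the control flow is the same: .half() runs at most once per process.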

nedeadinside and others added 2 commits April 16, 2026 17:03

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix: guard model.half() with dtype check in all rerankers
Copilot AI review requested due to automatic review settings April 16, 2026 10:19

Copilot AI left a comment


Pull request overview

This PR prevents reranker inference from repeatedly calling model.half() when the loaded model is already FP16, which can crash subsequent requests when use_fp16=True. It adds a dtype guard in the compute_score_single_gpu path across the core rerankers (and the Matroyshka research reranker).

Changes:

  • Guard self.model.half() behind a dtype check (!= torch.float16) to avoid redundant FP16 conversion.
  • Apply the guard consistently across encoder-only, decoder-only, and Matroyshka reranker implementations.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • research/Matroyshka_reranker/inference/rank_model.py: Add FP16 dtype guard before calling model.half() in Matroyshka reranker inference.
  • FlagEmbedding/inference/reranker/encoder_only/base.py: Add FP16 dtype guard before calling model.half() in the encoder-only reranker.
  • FlagEmbedding/inference/reranker/decoder_only/lightweight.py: Add FP16 dtype guard before calling model.half() in the lightweight LLM reranker.
  • FlagEmbedding/inference/reranker/decoder_only/layerwise.py: Add FP16 dtype guard before calling model.half() in the layer-wise LLM reranker.
  • FlagEmbedding/inference/reranker/decoder_only/base.py: Add FP16 dtype guard before calling model.half() in the base decoder-only LLM reranker.


Comment thread FlagEmbedding/inference/reranker/encoder_only/base.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
