
fix: guard model.half() with dtype check in all rerankers#1574

Open
nedeadinside wants to merge 3 commits into FlagOpen:master from nedeadinside:master

Conversation

@nedeadinside

Calling .half() on an already-FP16 model raises an error, causing rerankers with use_fp16=True to crash on every request after the first. Add a dtype guard so the conversion only runs when needed:

    if self.use_fp16 and next(self.model.parameters()).dtype != torch.float16:
        self.model.half()

Fixes BaseReranker, BaseLLMReranker, LightweightLLMReranker, LayerWiseLLMReranker, and MatroyshkaReranker.
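The guard above can be sketched as a standalone, torch-free illustration. Everything here is hypothetical scaffolding (the `ensure_fp16` helper, the `FakeModel` stub, and the string dtypes stand in for `torch` objects; none of it is FlagEmbedding code) — it only demonstrates the control flow: the conversion runs on the first call and is skipped on every later one.

```python
# Minimal sketch of the dtype-guard pattern, assuming a model object with
# .parameters() and .half() like torch.nn.Module. All names below are
# hypothetical stand-ins, not FlagEmbedding code.

def ensure_fp16(model, use_fp16, fp16_dtype="float16"):
    """Convert to FP16 only when the parameters are not already FP16."""
    if use_fp16 and next(model.parameters()).dtype != fp16_dtype:
        model.half()
    return model


class FakeParam:
    def __init__(self, dtype):
        self.dtype = dtype


class FakeModel:
    """Stand-in that records how often .half() is invoked."""
    def __init__(self):
        self._dtype = "float32"
        self.half_calls = 0

    def parameters(self):
        yield FakeParam(self._dtype)

    def half(self):
        self.half_calls += 1
        self._dtype = "float16"


model = FakeModel()
ensure_fp16(model, use_fp16=True)   # first request: converts to FP16
ensure_fp16(model, use_fp16=True)   # second request: guard skips .half()
print(model.half_calls)             # → 1
```

In the actual patch the comparison is against torch.float16 and the guarded object is the loaded Hugging Face model, but the control flow is the same: .half() runs at most once per process.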

nedeadinside and others added 2 commits April 16, 2026 17:03

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix: guard model.half() with dtype check in all rerankers
Copilot AI review requested due to automatic review settings April 16, 2026 10:19

Copilot AI left a comment


Pull request overview

This PR prevents reranker inference from repeatedly calling model.half() when the loaded model is already FP16, which can crash subsequent requests when use_fp16=True. It adds a dtype guard in the compute_score_single_gpu path across the core rerankers (and the Matroyshka research reranker).

Changes:

  • Guard self.model.half() behind a dtype check (!= torch.float16) to avoid redundant FP16 conversion.
  • Apply the guard consistently across encoder-only, decoder-only, and Matroyshka reranker implementations.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • research/Matroyshka_reranker/inference/rank_model.py: Add FP16 dtype guard before calling model.half() in Matroyshka reranker inference.
  • FlagEmbedding/inference/reranker/encoder_only/base.py: Add FP16 dtype guard before calling model.half() in the encoder-only reranker.
  • FlagEmbedding/inference/reranker/decoder_only/lightweight.py: Add FP16 dtype guard before calling model.half() in the lightweight LLM reranker.
  • FlagEmbedding/inference/reranker/decoder_only/layerwise.py: Add FP16 dtype guard before calling model.half() in the layer-wise LLM reranker.
  • FlagEmbedding/inference/reranker/decoder_only/base.py: Add FP16 dtype guard before calling model.half() in the base decoder-only LLM reranker.


Comment thread FlagEmbedding/inference/reranker/encoder_only/base.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
