
fix(mm): support diffusers FLUX LoRAs on NF4/8-bit quantized base models#9118

Open
Pfannkuchensack wants to merge 3 commits into invoke-ai:main from Pfannkuchensack:fix/flux-nf4-merged-lora-attribute-error

Conversation

@Pfannkuchensack
Collaborator

Summary

CustomInvokeLinearNF4 and CustomInvokeLinear8bitLt were missing the _cast_weight_bias_for_input / _cast_tensor_for_input methods that the sidecar-patches branch in autocast_linear_forward_sidecar_patches calls. This caused an AttributeError whenever a patch other than LoRALayer/FluxControlLoRALayer (e.g. the MergedLayerPatch produced by the diffusers FLUX LoRA converter when fusing Q/K/V/MLP into linear1) was applied to a quantized FLUX module.
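For illustration only, a minimal sketch of the kind of casting helper the sidecar-patch path expects (this is not the PR's actual diff; the class below is a hypothetical stand-in for the two quantized wrappers, and only the device/dtype-cast idea is shown):

```python
import torch


class QuantizedLinearSketch(torch.nn.Module):
    """Hypothetical stand-in for CustomInvokeLinearNF4 / CustomInvokeLinear8bitLt."""

    def _cast_tensor_for_input(self, t: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Move/cast an auxiliary tensor (e.g. a patch's delta or bias) so it is
        # compatible with activations computed from the input `x`.
        return t.to(device=x.device, dtype=x.dtype)

    def _cast_weight_bias_for_input(self, weight, bias, x):
        # Same cast applied to the layer's weight and optional bias.
        weight = self._cast_tensor_for_input(weight, x)
        bias = self._cast_tensor_for_input(bias, x) if bias is not None else None
        return weight, bias
```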

The weight is exposed as a meta-device tensor with the correct logical shape (for Params4bit the shape is read from quant_state, since .shape on the parameter reports the packed-byte layout). Shape-only patches (LoRA, LoHA, MergedLayerPatch) work; SetParameterLayer / DoRA on quantized modules remain unsupported.
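A rough sketch of the shape-recovery logic described above (hypothetical helper name; it assumes the logical shape of a bitsandbytes Params4bit lives on param.quant_state.shape, and the dtype of the meta view is arbitrary since only the shape matters):

```python
import torch


def logical_weight_view(param: torch.nn.Parameter) -> torch.Tensor:
    # For bitsandbytes Params4bit, `param.shape` describes the packed byte
    # buffer, so take the logical (out_features, in_features) shape from the
    # quant_state when one is present.
    quant_state = getattr(param, "quant_state", None)
    shape = quant_state.shape if quant_state is not None else param.shape
    # Shape-only view on the meta device; no data is allocated.
    return torch.empty(shape, device="meta", dtype=torch.float16)
```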

Related Issues / Discussions

https://discord.com/channels/1020123559063990373/1500616847106506752

QA Instructions

Download the LoRA from here and try to run it with a FLUX dev model.

Merge Plan

Standard merge.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

github-actions bot added the python (PRs that change python files) and backend (PRs that change backend files) labels on May 3, 2026.
