Please support loading Qwen 2.5 VL from GGUF

#40049

Feature request · opened 6 days ago by ihendley · no assignee

Feature request

The new Qwen Image model uses Qwen 2.5 VL 7B as its text encoder. Given memory constraints, some users may want to load both a quantized transformer and a quantized text encoder for a diffusers QwenImagePipeline, for example:

import torch
from diffusers import QwenImagePipeline, QwenImageTransformer2DModel, GGUFQuantizationConfig
from transformers import AutoModelForCausalLM

transformer = QwenImageTransformer2DModel.from_single_file(
    "https://huggingface.co/QuantStack/Qwen-Image-GGUF/blob/main/Qwen_Image-Q4_K_M.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
    config="Qwen/Qwen-Image",
    subfolder="transformer",
)
text_encoder = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-VL-7B-Instruct-GGUF",
    gguf_file="Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf",
    torch_dtype=torch.bfloat16,
)
pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image",
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
)

However, this currently fails with the error:

ValueError: GGUF model with architecture qwen2vl is not supported yet.
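If it helps with triage: as far as I can tell, the error is raised in transformers' GGUF loading path (modeling_gguf_pytorch_utils.py), which dispatches on the general.architecture metadata key of the GGUF file and fails when that architecture has no registered mapping. Here is a minimal sketch to confirm what that key contains, using the standalone gguf package that transformers itself relies on for GGUF loading; the field-decoding pattern is my assumption about the GGUFReader API:

from gguf import GGUFReader

# Inspect the metadata key the transformers GGUF loader dispatches on.
reader = GGUFReader("Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf")
field = reader.fields["general.architecture"]
# String fields keep their bytes in `parts`, indexed via `data`.
print(field.parts[field.data[0]].tobytes().decode("utf-8"))  # expected: "qwen2vl"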

Motivation

As described above, Qwen 2.5 VL 7B is the text encoder for the new state-of-the-art Qwen Image model. When building a QwenImagePipeline, diffusers will either download and load the full unquantized Qwen 2.5 VL 7B (~15 GB) or accept a transformers model as the text_encoder argument, so being able to pass a GGUF-quantized model here would save a significant amount of memory.
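In the meantime, a bitsandbytes 4-bit load can at least cut the text encoder's GPU memory, though it still downloads the full-precision weights first, so GGUF support would remain valuable. A sketch, assuming the pipeline's text encoder class is Qwen2_5_VLForConditionalGeneration (which is what the Qwen/Qwen-Image repo appears to ship in its text_encoder subfolder):

import torch
from diffusers import QwenImagePipeline
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

# Quantize the text encoder to 4-bit on load; this reduces GPU memory
# but still downloads the full bf16 weights first.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen-Image",
    subfolder="text_encoder",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image",
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
)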

Your contribution

With some help getting started and support along the way, I could attempt a PR. However, it might be quicker if someone with more experience takes the lead.
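For anyone picking this up: my (unverified) understanding is that the per-architecture GGUF mappings live in transformers' integrations/ggml.py, so supporting qwen2vl would presumably mean extending those mappings. A hypothetical sketch of the kind of change involved; the dict name and key set below are assumptions modeled on how I understand text-only architectures like qwen2 are registered, and a real PR would also need tensor-name and tokenizer mappings checked against an actual qwen2vl GGUF dump:

from transformers.integrations.ggml import GGUF_CONFIG_MAPPING

# Hypothetical: map GGUF config keys (as written by llama.cpp converters)
# to transformers config attributes for the qwen2vl architecture.
GGUF_CONFIG_MAPPING["qwen2vl"] = {
    "context_length": "max_position_embeddings",
    "block_count": "num_hidden_layers",
    "feed_forward_length": "intermediate_size",
    "embedding_length": "hidden_size",
    "attention.head_count": "num_attention_heads",
    "attention.head_count_kv": "num_key_value_heads",
    "attention.layer_norm_rms_epsilon": "rms_norm_eps",
    "vocab_size": "vocab_size",
}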