Please support loading Qwen 2.5 VL from GGUF
Feature request
The new Qwen Image uses Qwen 2.5 VL 7B as a text encoder. Given memory constraints, some users may want to load a quantized image model and text encoder for a diffusers QwenImagePipeline, for example:
from diffusers import QwenImagePipeline, QwenImageTransformer2DModel, GGUFQuantizationConfig
import torch
from transformers import AutoModelForCausalLM

# Load the GGUF-quantized image transformer
transformer = QwenImageTransformer2DModel.from_single_file(
    "https://huggingface.co/QuantStack/Qwen-Image-GGUF/blob/main/Qwen_Image-Q4_K_M.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
    config="Qwen/Qwen-Image",
    subfolder="transformer",
)

# Load the GGUF-quantized text encoder (this is the step that currently fails)
text_encoder = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-VL-7B-Instruct-GGUF",
    gguf_file="Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf",
    torch_dtype=torch.bfloat16,
)

pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image",
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
)
However, this currently fails with the error:
ValueError: GGUF model with architecture qwen2vl is not supported yet.
Motivation
As described above, Qwen 2.5 VL 7B is the text encoder for the new state-of-the-art Qwen Image model. When building a QwenImagePipeline, diffusers will either download and load the full unquantized Qwen 2.5 VL 7B (~15 GB) or accept a preloaded transformers model via the text_encoder argument, so being able to load a GGUF-quantized model here would save a significant amount of memory.
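Until GGUF loading is supported, one possible interim workaround (not GGUF) is to quantize the text encoder with bitsandbytes instead. The sketch below assumes bitsandbytes is installed and that the pipeline's text encoder class is Qwen2_5_VLForConditionalGeneration; the resulting model could then be passed as text_encoder= just like in the example above.

# Sketch of a non-GGUF workaround: 4-bit bitsandbytes quantization of the text encoder.
# Assumes `bitsandbytes` is installed and that QwenImagePipeline expects a
# Qwen2_5_VLForConditionalGeneration text encoder.
import torch
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
)
# Pass this model as text_encoder= to QwenImagePipeline.from_pretrained, as above.

This saves memory today, but GGUF support would let users reuse the widely available single-file quantizations directly.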
Your contribution
With some help getting started and support along the way, I could attempt a PR. However, it might be quicker if someone with more experience takes the lead.