[Bug] GLM-4.5V on sglang v0.5.0rc1 fails with model type `glm4v_moe` error

#9178

Issue Details

Opened 15 days ago by Jimmy-L99 · No assignee

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Reproduction

The Docker Compose file is as follows:

```yaml
services:
  sglang-glm4.5v-106b:
    image: lmsysorg/sglang:v0.5.0rc1-cu126
    container_name: sglang-glm4.5v-106b-a12b
    volumes:
      - /home/vg_llm/model:/model
      - /etc/localtime:/etc/localtime:ro
      - /usr/share/zoneinfo/Asia/Shanghai:/usr/share/zoneinfo/Asia/Shanghai:ro
    restart: always
    network_mode: host  # required by RDMA
    privileged: true    # required by RDMA
    environment:
      - CUDA_VISIBLE_DEVICES=4,5,6,7
    entrypoint: python3 -m sglang.launch_server
    command: |
      --model-path /model/GLM-4.5V-FP8
      --host 192.168.10.87
      --port 8012
      --context-length 40960
      --tp-size 4
      --mem-fraction-static 0.80
      --chunked-prefill-size 4096
      --schedule-conservativeness 0.3
      --tool-call-parser glm45
      --reasoning-parser glm45
      --enable-metrics
```

The error is as follows:

```
W0814 14:27:31.050000 1 torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0814 14:27:31.050000 1 torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
/sgl-workspace/sglang/python/sglang/srt/managers/session_controller.py:57: SyntaxWarning: invalid escape sequence '\-'
  prefix = " " * len(origin_prefix) + " \- " + child.req.rid
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1271, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 966, in __getitem__
    raise KeyError(key)
KeyError: 'glm4v_moe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 11, in <module>
    server_args = prepare_server_args(sys.argv[1:])
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 2208, in prepare_server_args
    server_args = ServerArgs.from_cli_args(raw_args)
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 1964, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
  File "<string>", line 193, in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 392, in __post_init__
    self.model_specific_adjustments()
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 2104, in model_specific_adjustments
    hf_config = self.get_hf_config()
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 1974, in get_hf_config
    hf_config = get_config(
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 2745, in wrapper
    result = func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/hf_transformers_utils.py", line 129, in get_config
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1273, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `glm4v_moe` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date. You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
```
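For context, here is a minimal sketch of why the `KeyError` in the log is surfaced as a `ValueError`: `AutoConfig.from_pretrained` looks up the checkpoint's `model_type` in a registry of known architectures, and an unknown type is re-raised with the user-facing message. The mapping and function below are illustrative stand-ins, not the real `transformers` internals; the fix path suggested by the message itself is upgrading Transformers to a version that registers `glm4v_moe` (or installing from source).

```python
# Illustrative stand-in for the transformers CONFIG_MAPPING registry.
# The real registry maps many more model types; `glm4v_moe` is missing
# from the version shipped in the v0.5.0rc1 image, hence the failure.
CONFIG_MAPPING = {"llama": "LlamaConfig", "glm4v": "Glm4vConfig"}

def resolve_config_class(model_type: str) -> str:
    """Mimic the lookup in AutoConfig.from_pretrained (simplified)."""
    try:
        return CONFIG_MAPPING[model_type]
    except KeyError:
        # Unknown model type: re-raise as the user-facing ValueError,
        # just as seen in the traceback above.
        raise ValueError(
            f"The checkpoint you are trying to load has model type "
            f"`{model_type}` but Transformers does not recognize this "
            f"architecture."
        ) from None

print(resolve_config_class("glm4v"))   # a registered type resolves
try:
    resolve_config_class("glm4v_moe")  # the unregistered type from the log
except ValueError as e:
    print("ValueError:", e)
```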

Environment

  • GPUs: RTX 5880 * 8
  • Deployment: Docker
  • sglang: v0.5.0rc1