[Bug] GLM-4.5V on sglang v0.5.0rc1 fails with model type `glm4v_moe` error

#9178

Issue Details

Opened 15 days ago by Jimmy-L99 · No assignee

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Reproduction

The Docker Compose file is as follows:

```yaml
services:
  sglang-glm4.5v-106b:
    image: lmsysorg/sglang:v0.5.0rc1-cu126
    container_name: sglang-glm4.5v-106b-a12b
    volumes:
      - /home/vg_llm/model:/model
      - /etc/localtime:/etc/localtime:ro
      - /usr/share/zoneinfo/Asia/Shanghai:/usr/share/zoneinfo/Asia/Shanghai:ro
    restart: always
    network_mode: host  # required by RDMA
    privileged: true    # required by RDMA
    environment:
      - CUDA_VISIBLE_DEVICES=4,5,6,7
    entrypoint: python3 -m sglang.launch_server
    command: |
      --model-path /model/GLM-4.5V-FP8
      --host 192.168.10.87
      --port 8012
      --context-length 40960
      --tp-size 4
      --mem-fraction-static 0.80
      --chunked-prefill-size 4096
      --schedule-conservativeness 0.3
      --tool-call-parser glm45
      --reasoning-parser glm45
      --enable-metrics
```

The error is as follows:

```
W0814 14:27:31.050000 1 torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0814 14:27:31.050000 1 torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
/sgl-workspace/sglang/python/sglang/srt/managers/session_controller.py:57: SyntaxWarning: invalid escape sequence '\-'
  prefix = " " * len(origin_prefix) + " \- " + child.req.rid
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1271, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 966, in __getitem__
    raise KeyError(key)
KeyError: 'glm4v_moe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 11, in <module>
    server_args = prepare_server_args(sys.argv[1:])
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 2208, in prepare_server_args
    server_args = ServerArgs.from_cli_args(raw_args)
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 1964, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
  File "<string>", line 193, in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 392, in __post_init__
    self.model_specific_adjustments()
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 2104, in model_specific_adjustments
    hf_config = self.get_hf_config()
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 1974, in get_hf_config
    hf_config = get_config(
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 2745, in wrapper
    result = func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/hf_transformers_utils.py", line 129, in get_config
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1273, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `glm4v_moe` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date. You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
```
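For context, here is a minimal sketch of why the `KeyError` in the log is surfaced as a `ValueError`: `AutoConfig.from_pretrained` looks up the checkpoint's `model_type` in a registry of known architectures, and an unknown type is re-raised with the user-facing message. The mapping and function below are illustrative stand-ins, not the real `transformers` internals; the fix path suggested by the message itself is upgrading Transformers to a version that registers `glm4v_moe` (or installing from source).

```python
# Illustrative stand-in for the transformers CONFIG_MAPPING registry.
# The real registry maps many more model types; `glm4v_moe` is missing
# from the version shipped in the v0.5.0rc1 image, hence the failure.
CONFIG_MAPPING = {"llama": "LlamaConfig", "glm4v": "Glm4vConfig"}

def resolve_config_class(model_type: str) -> str:
    """Mimic the lookup in AutoConfig.from_pretrained (simplified)."""
    try:
        return CONFIG_MAPPING[model_type]
    except KeyError:
        # Unknown model type: re-raise as the user-facing ValueError,
        # just as seen in the traceback above.
        raise ValueError(
            f"The checkpoint you are trying to load has model type "
            f"`{model_type}` but Transformers does not recognize this "
            f"architecture."
        ) from None

print(resolve_config_class("glm4v"))   # a registered type resolves
try:
    resolve_config_class("glm4v_moe")  # the unregistered type from the log
except ValueError as e:
    print("ValueError:", e)
```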

Environment

  • GPUs: RTX 5880 * 8
  • Deployment: Docker
  • sglang: v0.5.0rc1