[Solved][Bug] DeepSeek-V3.1 thinking/no_thinking (#9898) – sgl-project/sglang

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

I use two 8-card H20s to deploy model services through sglang. The sglang version is 0.5.1.poste2, and the startup command is ： python3 -m sglang.launch_server --model_path /mnt/llm_model/DeepSeek-V3.1 --tp 16 --dist-init-addr 30.238.17.116:20000 --nodes 2 --node_rank 0 --trust-remote-code --host 0.0.0.0 --port 40000 --mem-fraction-static 0.9 --max-running-requests 16 --attention-backend flashinfer --tool-call-parser deepseekv3 --chat-template /sglang/examples/chat_template/tool_chat-template_deepseekv3.jinja --reasoning-parser deepseek-v3 --enable-torch-compile --chunked-prefill-size 16384

But I found that the service cannot effectively switch between thinking and no_thinking modes. Regardless of whether the thinking parameter in my chat template kwargs is True or False, the interface will randomly return the result of thinking or no_thinking

Reproduction

python3 -m sglang.launch_server --model_path /mnt/llm_model/DeepSeek-V3.1 --tp 16 --dist-init-addr 30.238.17.116:20000 --nodes 2 --node_rank 0 --trust-remote-code --host 0.0.0.0 --port 40000 --mem-fraction-static 0.9 --max-running-requests 16 --attention-backend flashinfer --tool-call-parser deepseekv3 --chat-template /sglang/examples/chat_template/tool_chat-template_deepseekv3.jinja --reasoning-parser deepseek-v3 --enable-torch-compile --chunked-prefill-size 16384

Environment

sglang-0.5.1.post2

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

Reproduction

Environment

sglang-0.5.1.post2

#9898[Solved][Bug] DeepSeek-V3.1 thinking/no_thinking

Issue Details

Checklist

Describe the bug

Reproduction

Environment

Issue Details

Checklist

Describe the bug

Reproduction

Environment