#9898[Solved][Bug] DeepSeek-V3.1 thinking/no_thinking
Issue Details
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
I use two 8-card H20s to deploy model services through sglang. The sglang version is 0.5.1.poste2, and the startup command is : python3 -m sglang.launch_server --model_path /mnt/llm_model/DeepSeek-V3.1 --tp 16 --dist-init-addr 30.238.17.116:20000 --nodes 2 --node_rank 0 --trust-remote-code --host 0.0.0.0 --port 40000 --mem-fraction-static 0.9 --max-running-requests 16 --attention-backend flashinfer --tool-call-parser deepseekv3 --chat-template /sglang/examples/chat_template/tool_chat-template_deepseekv3.jinja --reasoning-parser deepseek-v3 --enable-torch-compile --chunked-prefill-size 16384
But I found that the service cannot effectively switch between thinking and no_thinking modes. Regardless of whether the thinking parameter in my chat template kwargs is True or False, the interface will randomly return the result of thinking or no_thinking
Reproduction
python3 -m sglang.launch_server --model_path /mnt/llm_model/DeepSeek-V3.1 --tp 16 --dist-init-addr 30.238.17.116:20000 --nodes 2 --node_rank 0 --trust-remote-code --host 0.0.0.0 --port 40000 --mem-fraction-static 0.9 --max-running-requests 16 --attention-backend flashinfer --tool-call-parser deepseekv3 --chat-template /sglang/examples/chat_template/tool_chat-template_deepseekv3.jinja --reasoning-parser deepseek-v3 --enable-torch-compile --chunked-prefill-size 16384
Environment
sglang-0.5.1.post2