[Feature] Support CuteDSL BF16 Gemm kernels
Author
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 2. Please use English; otherwise, the issue will be closed.
Motivation
Currently, all BF16 GEMM kernels in SGLang are invoked through torch.nn.functional.linear, which dispatches to cuBLAS under the hood. On B200, CuteDSL may provide BF16 GEMM kernels that are faster than cuBLAS, e.g. the SM100 dense GEMM kernel from quack:
https://github.com/Dao-AILab/quack/blob/main/quack/dense_gemm_sm100.py
Related resources
No response