[Feature] Support CuteDSL BF16 Gemm kernels
Author
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 2. Please use English; otherwise, the issue will be closed.
Motivation
Currently, all BF16 GEMM kernels in SGLang are invoked through torch.nn.functional.linear, which dispatches to cuBLAS under the hood. On B200, CuteDSL may provide BF16 GEMM kernels that are faster than cuBLAS, e.g. the SM100 dense GEMM kernel from quack:
https://github.com/Dao-AILab/quack/blob/main/quack/dense_gemm_sm100.py
Related resources
No response