vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

53,584 stars · 9,033 forks

Languages: Python 84.9%, Cuda 8.8%, C++ 4.8%, Shell 0.7%, C 0.5%, CMake 0.3%
