vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

58.2k stars · Beginner Friendly

External Contribution: High
External Merge Rate: Medium
Avg. Time to Merge: Medium
Good First Issues: 52

Languages

Python: 86.3%
Cuda: 8.1%
C++: 4.2%
Shell: 0.7%
C: 0.4%
CMake: 0.3%
