feat: ROCm w/ RAG and SentenceTransformers

#8365

Schwenn2002 opened this issue 7 months ago · no assignee
Labels: good first issue, help wanted
Unfortunately, ROCm is only supported for Ollama, but at least that works fine.

For RAG, SentenceTransformers is unfortunately only implemented with CUDA; otherwise you are left with the CPU. SentenceTransformers with CUDA is also faster than Ollama with ROCm, and the reranking likewise runs on the CPU.

In any case, I see that roughly 100 documents (100 kB to 10 MB each) take a very long time to process in the RAG stage before the LLM (regardless of whether it is 3B, 8B, or 22B) even runs, on a Radeon Pro W7900 with 48 GB of VRAM.

PyTorch does support ROCm: https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html
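As a minimal sketch of why this should be feasible: the ROCm builds of PyTorch expose AMD GPUs through the regular `torch.cuda` API (with `torch.version.hip` set), so in principle SentenceTransformers' existing `device="cuda"` path would pick up a Radeon card unmodified once the ROCm wheel is installed. The model name below is just an illustrative example, not something this project mandates.

```python
def pick_device(cuda_available: bool) -> str:
    """Device string for SentenceTransformer.

    On a ROCm wheel, torch.cuda.is_available() reports True for AMD GPUs
    as well, so "cuda" selects the Radeon card; otherwise fall back to CPU.
    """
    return "cuda" if cuda_available else "cpu"


if __name__ == "__main__":
    # Assumes the ROCm PyTorch wheel is installed, e.g.:
    #   pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
    import torch
    from sentence_transformers import SentenceTransformer

    device = pick_device(torch.cuda.is_available())
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2",
                                device=device)
    embeddings = model.encode(["hello world"])
    print(device, embeddings.shape)
```

If this works as expected, the same device-selection logic would also let the reranker (a CrossEncoder) run on the GPU instead of the CPU.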