feat: ROCm w/ RAG and SentenceTransformers

#8365

Schwenn2002 opened this issue 7 months ago · no assignee
Labels: good first issue, help wanted
Unfortunately, ROCm is only supported for Ollama, but at least that works fine.

For RAG, SentenceTransformers is unfortunately only implemented with CUDA; otherwise you are left with the CPU. SentenceTransformers with CUDA is also faster than Ollama with ROCm, and the reranking likewise runs on the CPU.

In any case, I see that roughly 100 documents (100 kB to 10 MB each) take a very long time to process in the RAG stage before the LLM (regardless of whether it is 3B, 8B, or 22B) even runs, on a Radeon Pro W7900 with 48 GB of VRAM.

PyTorch does support ROCm: https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html
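As a minimal sketch of why this should be feasible: the ROCm builds of PyTorch expose AMD GPUs through the regular `torch.cuda` API (with `torch.version.hip` set), so in principle SentenceTransformers' existing `device="cuda"` path would pick up a Radeon card unmodified once the ROCm wheel is installed. The model name below is just an illustrative example, not something this project mandates.

```python
def pick_device(cuda_available: bool) -> str:
    """Device string for SentenceTransformer.

    On a ROCm wheel, torch.cuda.is_available() reports True for AMD GPUs
    as well, so "cuda" selects the Radeon card; otherwise fall back to CPU.
    """
    return "cuda" if cuda_available else "cpu"


if __name__ == "__main__":
    # Assumes the ROCm PyTorch wheel is installed, e.g.:
    #   pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
    import torch
    from sentence_transformers import SentenceTransformer

    device = pick_device(torch.cuda.is_available())
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2",
                                device=device)
    embeddings = model.encode(["hello world"])
    print(device, embeddings.shape)
```

If this works as expected, the same device-selection logic would also let the reranker (a CrossEncoder) run on the GPU instead of the CPU.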