feat: ROCm w/ RAG and SentenceTransformers
Unfortunately, ROCm is currently only supported via Ollama, though that works fine.
For RAG, SentenceTransformers is unfortunately only implemented with CUDA; without CUDA it falls back to the CPU. SentenceTransformers on CUDA is also faster than Ollama on ROCm, and reranking likewise runs on the CPU.
In any case, I observe that around 100 documents (100 KB–10 MB each) take a very long time to process in the RAG stage before the LLM is even executed (regardless of whether it is 3B, 8B, or 22B), on a Radeon Pro W7900 with 48 GB VRAM.
PyTorch can do ROCm: https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html