Issue #16650: Streaming output from local Ollama in Open WebUI is extremely slow (40–50‑token bursts) when WebSocket is disabled
Issue Details
Check Existing Issues
- I have searched the existing issues and discussions.
- I am using the latest version of Open WebUI.
Installation Method
Docker
Open WebUI Version
0.6.22
Ollama Version (if applicable)
No response
Operating System
Ubuntu
Browser (if applicable)
Chrome
Confirmation
- I have read and followed all instructions in README.md.
- I am using the latest version of both Open WebUI and Ollama.
- I have included the browser console logs.
- I have included the Docker container logs.
- I have provided every relevant configuration, setting, and environment variable used in my setup.
- I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
- I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
- Start with the initial platform/version/OS and dependencies used,
- Specify exact install/launch/configure commands,
- List URLs visited, user input (incl. example values/emails/passwords if needed),
- Describe all options and toggles enabled or changed,
- Include any files or environmental changes,
- Identify the expected and actual result at each stage,
- Ensure any reasonably skilled user can follow and hit the same issue.
Expected Behavior
When the streaming option is enabled, the UI should display the model’s output token‑by‑token (or at least in very small chunks, e.g. 1–2 tokens) as the LLM generates it, regardless of whether the model is accessed via the OpenAI API key or via a local Ollama instance.
Actual Behavior
- Using a remote LLM (OpenAI API key) through Open WebUI, tokens appear smoothly, 1–2 at a time.
- Using a local Ollama instance, the UI lags and only shows roughly 40–50 tokens at a time before the next burst.
- The local Ollama CLI (ollama run … --stream) behaves correctly (1–2 tokens per second).
- The problem occurs when the WebSocket setting in Open WebUI is disabled; the non‑WebSocket streaming path appears to be affected.
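To narrow down where the bursting is introduced, it can help to stream from Ollama's native /api/generate endpoint directly, bypassing Open WebUI entirely. The sketch below (not part of the original report) timestamps each streamed chunk; the host, port, and model name are placeholders to be adjusted. If chunks arrive here 1–2 tokens at a time, the bursting is being introduced on Open WebUI's delivery path rather than by Ollama itself.

```python
# Sketch: stream straight from Ollama's /api/generate (newline-delimited JSON)
# to check whether the backend already emits small chunks at a steady pace.
import json
import time

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port; adjust if needed
payload = {
    "model": "llama3",  # placeholder model name
    "prompt": "Explain quantum entanglement",
    "stream": True,
}

start = time.monotonic()
with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    # Ollama streams newline-delimited JSON objects, one small chunk per line.
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        elapsed = time.monotonic() - start
        print(f"{elapsed:7.3f}s  {chunk.get('response', '')!r}")
        if chunk.get("done"):
            break
```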
Steps to Reproduce
1. Install Open WebUI and Ollama locally (Docker or native).
2. Configure Open WebUI to use the local Ollama model: in config.json (or via the UI), set model: "ollama" and point it to the local Ollama URL.
3. Disable WebSocket in the Open WebUI settings (or set websocket: false in the config).
4. Start Open WebUI.
5. Send a prompt (e.g., “Explain quantum entanglement”) through the web UI.
6. Observe that the output only updates after a large batch of ~40–50 tokens has been generated; the UI feels sluggish.
7. Re‑enable WebSocket (or set websocket: true) and repeat step 5 – the output now streams smoothly, 1–2 tokens at a time.
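For a more quantitative reproduction than watching the UI, a short script can stream the same prompt through Open WebUI's OpenAI‑compatible chat completions endpoint and log the gap between chunks and their sizes. This is a sketch only: the base URL, endpoint path, API key, and model name below are assumptions to be adjusted to the actual setup. Running it once with WebSocket disabled and once with it re‑enabled should show the ~40–50‑token bursts versus the smooth 1–2‑token cadence.

```python
# Sketch: log inter-chunk gaps and chunk sizes from Open WebUI's streaming API.
# URL, API key, and model name are placeholders, not values from the report.
import json
import time

import requests

WEBUI_URL = "http://localhost:3000/api/chat/completions"  # assumed OpenAI-compatible endpoint
API_KEY = "sk-placeholder"  # an Open WebUI API key for your account
payload = {
    "model": "llama3",  # the local Ollama model as it is named in Open WebUI
    "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
    "stream": True,
}

last = time.monotonic()
with requests.post(
    WEBUI_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    # The streamed body is server-sent events: lines of the form "data: {...}".
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        event = json.loads(data)
        choices = event.get("choices") or []
        delta = (choices[0].get("delta", {}).get("content") or "") if choices else ""
        now = time.monotonic()
        # Long gaps followed by large deltas indicate bursty/buffered delivery.
        print(f"+{now - last:6.3f}s  {len(delta):3d} chars  {delta!r}")
        last = now
```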
Logs & Screenshots
https://github.com/user-attachments/assets/1df21ac4-bd39-4ee3-b928-1faf47963d53
Additional Information
What we haven’t tested yet
- Whether the lag disappears when WebSocket is enabled again (i.e. is it strictly a WebSocket issue?).
- Whether the same problem appears if we use the HTTP‑fallback route but keep WebSocket enabled, or if we switch to a different reverse proxy (NGINX, Caddy, etc.).