#16650: Streaming output from local Ollama in OpenWebUI is extremely slow (40–50‑token bursts) when WebSocket is disabled

yuliang615
opened about 2 months ago

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.6.22

Ollama Version (if applicable)

No response

Operating System

Ubuntu

Browser (if applicable)

Chrome

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When the streaming option is enabled, the UI should display the model’s output token‑by‑token (or at least in very small chunks, e.g. 1–2 tokens) as the LLM generates it, regardless of whether the model is accessed via the OpenAI API key or via a local Ollama instance.
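For comparison, this is roughly the granularity Ollama itself delivers over its native streaming API. The sketch below is only illustrative: it assumes Ollama is listening on the default http://localhost:11434 and uses a placeholder model name; each NDJSON line it prints typically carries just a token or two of the reply.

```python
# Minimal sketch: stream a chat response from a local Ollama instance and
# print each chunk as it arrives. Assumes Ollama on the default port 11434;
# the model name is a placeholder for whatever is pulled locally.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",  # placeholder model name
        "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
        "stream": True,  # Ollama streams NDJSON, usually one small chunk per line
    },
    stream=True,
)
resp.raise_for_status()

for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    # Each line carries a small piece of the reply (often only a token or two).
    print(chunk.get("message", {}).get("content", ""), end="", flush=True)
    if chunk.get("done"):
        break
print()
```

Seeing small, frequent chunks here is the granularity the web UI is expected to reflect.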

Actual Behavior

  • Remote LLM (OpenAI API key) through OpenWebUI: tokens appear smoothly, 1–2 at a time.
  • Local Ollama instance: the UI lags and only updates in bursts of roughly 40–50 tokens.
  • The local Ollama CLI (ollama run … --stream) behaves correctly (1–2 tokens per second).
  • The problem appears when the WebSocket setting in OpenWebUI is disabled; the streaming path used in that mode seems to be affected (a timing sketch is included below).
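To put numbers on that burstiness, a small helper along these lines can log how large each network chunk is and how far apart chunks arrive. This is only a sketch: the example points it at Ollama's native endpoint as a known‑good baseline (default port 11434, placeholder model name), and the same function can then be aimed at whichever endpoint OpenWebUI is serving.

```python
# Sketch: measure the size of each raw network chunk and the gap between
# arrivals, to quantify the "40-50 tokens per burst" observation.
import time
import requests

def time_stream(url, payload, headers=None):
    """POST a streaming request and print per-chunk arrival gaps and sizes."""
    with requests.post(url, json=payload, headers=headers, stream=True) as resp:
        resp.raise_for_status()
        last = time.monotonic()
        for chunk in resp.iter_content(chunk_size=None):  # raw chunks as they arrive
            now = time.monotonic()
            print(f"+{now - last:6.3f}s  {len(chunk):5d} bytes")
            last = now

# Baseline against Ollama directly (default port, placeholder model name).
time_stream(
    "http://localhost:11434/api/chat",
    {
        "model": "llama3",  # placeholder model name
        "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
        "stream": True,
    },
)
```

Small, frequent chunks here versus large, infrequent ones through OpenWebUI would show where the batching is introduced.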

Steps to Reproduce

1. Install OpenWebUI and Ollama locally (Docker or native).
2. Configure OpenWebUI to use the local Ollama model: in config.json (or via the UI) set model: "ollama" and point it to the local Ollama URL.
3. Disable WebSocket in the OpenWebUI settings (or set websocket: false in the config).
4. Start OpenWebUI.
5. Send a prompt (e.g., “Explain quantum entanglement”) through the web UI.
6. Observe that the output only updates after a large batch of ~40–50 tokens has been generated; the UI feels sluggish (a script‑level check that bypasses the browser is sketched below).
7. Re‑enable WebSocket (or set websocket: true) and repeat step 5; the output now streams smoothly, 1–2 tokens at a time.
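A useful variation on step 6 is to take the browser out of the loop and stream the same prompt through OpenWebUI's OpenAI‑compatible chat completions endpoint from a script. The sketch assumes the instance is reachable at http://localhost:3000, that an API key has been generated in the UI, and that the /api/chat/completions route of recent OpenWebUI releases is available; the URL, key, and model name are placeholders to adjust for the actual setup. If small deltas arrive here while the page still updates in 40–50‑token bursts, the batching is happening on the UI delivery path (the non‑WebSocket fallback) rather than between OpenWebUI and Ollama.

```python
# Sketch: stream the same prompt through OpenWebUI's OpenAI-compatible API
# and print each delta as it arrives, bypassing the browser entirely.
# Base URL, API key, and model name below are placeholders.
import json
import requests

resp = requests.post(
    "http://localhost:3000/api/chat/completions",
    headers={"Authorization": "Bearer <your-api-key>"},  # placeholder key
    json={
        "model": "llama3",  # the model name as it appears in OpenWebUI
        "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
        "stream": True,
    },
    stream=True,
)
resp.raise_for_status()

# OpenAI-style server-sent events: lines prefixed with "data: ", ending in [DONE].
for raw in resp.iter_lines():
    if not raw or not raw.startswith(b"data: "):
        continue
    payload = raw[len(b"data: "):]
    if payload.strip() == b"[DONE]":
        break
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if choices:
        print(choices[0].get("delta", {}).get("content") or "", end="", flush=True)
print()
```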

Logs & Screenshots

https://github.com/user-attachments/assets/1df21ac4-bd39-4ee3-b928-1faf47963d53

Additional Information

What we haven’t tested yet

  • Whether the lag disappears when WebSocket is enabled again (i.e. is it strictly a WebSocket issue?).
  • Whether the same problem appears if we use the HTTP‑fallback route but keep WebSocket enabled, or if we switch to a different reverse‑proxy (NGINX, Caddy, etc.).