
Streaming

Pass "stream": true and the gateway returns a Server-Sent Events (SSE) stream. OpenAI SDK streaming mode works out of the box.

curl https://llmtr.com/v1/chat/completions \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'

Each chunk is a data: {json} line:

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
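Client-side, the reply is reassembled by concatenating each delta's content field and stopping at the sentinel. Using the exact lines above:

```python
import json

# The four data: lines from the example above, as a client would read them.
raw_events = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]

text = ""
for event in raw_events:
    payload = event[len("data: "):]
    if payload == "[DONE]":
        break                                  # stop at the sentinel
    delta = json.loads(payload)["choices"][0]["delta"]
    text += delta.get("content", "")           # final chunk has an empty delta

print(text)  # -> Hello
```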

By default the final chunk does not include usage. On supported models, set "stream_options": {"include_usage": true} in the request body to receive token counts in the last chunk before data: [DONE].
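With include_usage enabled, the extra chunk looks like the sketch below; this assumes the OpenAI-style chunk shape (empty choices plus a usage object), and the token counts are illustrative, not real output.

```python
import json

# Hypothetical final chunk when stream_options.include_usage is true:
# choices is empty and a usage object carries the token counts.
final_chunk = json.loads(
    '{"id":"chatcmpl-xxx","object":"chat.completion.chunk",'
    '"choices":[],'
    '"usage":{"prompt_tokens":9,"completion_tokens":2,"total_tokens":11}}'
)

usage = final_chunk["usage"]
print(usage["total_tokens"])  # -> 11
```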

  • Tokens generated before a connection drops are still billed.
  • Your proxy must disable SSE buffering (nginx: proxy_buffering off).
  • It is a long-lived HTTP connection, not a WebSocket.
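The proxy note above can be expressed as a minimal nginx fragment; the location path and timeout are placeholders to adapt, and only proxy_buffering off comes directly from the notes.

```nginx
location /v1/ {
    proxy_pass         https://llmtr.com;   # upstream is a placeholder
    proxy_buffering    off;                 # do not buffer SSE responses
    proxy_http_version 1.1;                 # keep the long-lived connection open
    proxy_set_header   Connection "";
    proxy_read_timeout 300s;                # streams can outlive default timeouts
}
```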