Single Endpoint Usage
Every chat-style model on LLMTR is reachable through /v1/chat/completions. Regardless of the underlying provider, requests use the same format and responses come back in the OpenAI Chat Completions shape, so existing OpenAI SDK code works against the entire catalog once its base URL and API key point at LLMTR.
Example request
curl https://llmtr.com/v1/chat/completions \
  -H "Authorization: Bearer llmtr-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://llmtr.com/v1",
    api_key="llmtr-YOUR_KEY",
)

resp = client.chat.completions.create(
    model="xai/grok-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Reasoning effort
For reasoning-capable models, set the effort level via a slug suffix or the reasoning.effort body field.

curl https://llmtr.com/v1/chat/completions \
  -H "Authorization: Bearer llmtr-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4:high",
    "messages": [{ "role": "user", "content": "Draft a plan." }]
  }'

Suffix values: :low, :medium, :high, :max.
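The curl example above uses the slug suffix. The sketch below puts both forms side by side as plain request bodies. It assumes that the reasoning.effort field is a nested object of the form {"reasoning": {"effort": ...}} and that it accepts the same level names as the suffixes, minus the colon; neither detail is confirmed here. build_body is a hypothetical helper, not part of any SDK.

```python
import json

# Effort levels listed above as slug suffixes. Assumption: the body field
# accepts the same names without the leading colon.
EFFORTS = ("low", "medium", "high", "max")


def build_body(model, messages, effort=None, use_suffix=True):
    """Build a /v1/chat/completions request body (hypothetical helper)."""
    body = {"model": model, "messages": messages}
    if effort is None:
        return body
    if effort not in EFFORTS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if use_suffix:
        # Slug-suffix form, e.g. "xai/grok-4:high".
        body["model"] = f"{model}:{effort}"
    else:
        # Body-field form. Assumption: "reasoning.effort" is a nested object.
        body["reasoning"] = {"effort": effort}
    return body


messages = [{"role": "user", "content": "Draft a plan."}]
suffix_body = build_body("xai/grok-4", messages, effort="high")
field_body = build_body("xai/grok-4", messages, effort="high", use_suffix=False)
print(json.dumps(suffix_body))
print(json.dumps(field_body))
```

With the OpenAI Python SDK, the body-field form would most likely go through the extra_body argument of client.chat.completions.create, for example extra_body={"reasoning": {"effort": "high"}}, because reasoning is not a named parameter of that method.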
Streaming
When stream: true is set, the response is delivered as Server-Sent Events, chunk by chunk. The format matches OpenAI Chat Completions: data: {...}\n\n lines followed by data: [DONE].
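To make the wire format concrete, here is a minimal sketch of reading such a stream without the SDK. It operates on an iterable of already-decoded text lines. iter_deltas and the sample lines are hypothetical, and the chunk layout (choices[0].delta.content) is assumed to follow the OpenAI streaming shape described above.

```python
import json


def iter_deltas(lines):
    """Yield content fragments from decoded SSE lines (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        # Blank lines separate events; lines without a "data:" prefix are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample lines for illustration, not captured from a real response.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
    "",
]
print("".join(iter_deltas(sample)))
```

The OpenAI SDK performs this parsing itself when stream=True is passed, so hand-rolled parsing like this is only needed with a raw HTTP client.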
stream = client.chat.completions.create(
    model="xai/grok-4",
    messages=[{"role": "user", "content": "Write a long answer"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

When to use /v1/responses
Use /v1/responses directly when you need advanced features:
- Access to reasoning_summary output
- Provider-specific instructions field
- Grok 4.20 multi-agent execution depth control
Details: /docs/en/gateway/responses.
Billing
Billing is based on the actual input and output token counts reported by the model. The per-token price is the same whether the model is called via /v1/chat/completions or /v1/responses.
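As a worked example of token-based billing, the sketch below estimates the charge for one request from its token counts. The prices are made-up placeholders, not LLMTR rates, and estimate_cost is a hypothetical helper. The token counts would typically come from the usage object (prompt_tokens and completion_tokens) that OpenAI-shaped responses carry.

```python
from decimal import Decimal

# Placeholder prices in USD per one million tokens (assumption, not real rates).
INPUT_PRICE_PER_M = Decimal("3.00")
OUTPUT_PRICE_PER_M = Decimal("15.00")


def estimate_cost(prompt_tokens, completion_tokens):
    """Return the estimated charge in USD for one request (hypothetical helper)."""
    million = Decimal(1_000_000)
    input_cost = Decimal(prompt_tokens) * INPUT_PRICE_PER_M / million
    output_cost = Decimal(completion_tokens) * OUTPUT_PRICE_PER_M / million
    return input_cost + output_cost


# 1,200 input tokens and 800 output tokens at the placeholder prices.
cost = estimate_cost(1200, 800)
print(f"{cost:.4f} USD")
```

Decimal is used instead of float so that small per-token amounts add up without binary rounding noise.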