
Single Endpoint Usage

Every chat-style model on LLMTR is reachable through /v1/chat/completions. Whatever the underlying provider, requests use the same format and responses come back in the OpenAI Chat Completions shape. Existing OpenAI SDK code therefore works against the entire catalog once it points at the LLMTR base URL and key.

curl https://llmtr.com/v1/chat/completions \
  -H "Authorization: Bearer llmtr-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

With the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://llmtr.com/v1",
    api_key="llmtr-YOUR_KEY",
)
resp = client.chat.completions.create(
    model="xai/grok-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
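Without the SDK, the call is a plain JSON POST. The following minimal sketch uses only the Python standard library and builds the request without sending it. `build_chat_request` and `extract_content` are hypothetical helper names, and the parsing assumes the response JSON follows the OpenAI Chat Completions shape described above.

```python
import json
import urllib.request

BASE_URL = "https://llmtr.com/v1"


def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_content(response_json: dict) -> str:
    """Pull the assistant text out of a Chat Completions-shaped response."""
    return response_json["choices"][0]["message"]["content"]


req = build_chat_request(
    "llmtr-YOUR_KEY",
    "xai/grok-4",
    [{"role": "user", "content": "Hello"}],
)
# Sending it requires a real key and network access:
# with urllib.request.urlopen(req) as resp:
#     print(extract_content(json.load(resp)))
```

Only the model slug changes when you switch providers. The headers, body shape, and response parsing stay the same.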

For reasoning-capable models, set the effort level either by appending a suffix to the model slug (for example xai/grok-4:high) or with the reasoning.effort field in the request body.

curl https://llmtr.com/v1/chat/completions \
  -H "Authorization: Bearer llmtr-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4:high",
    "messages": [{ "role": "user", "content": "Draft a plan." }]
  }'

Suffix values: :low, :medium, :high, :max.
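The two ways of choosing an effort level can be sketched as small payload builders. This sketch makes assumptions: `with_effort_suffix` and `with_effort_body` are hypothetical helpers, and it assumes that `reasoning.effort` means a nested `{"reasoning": {"effort": ...}}` object accepting the same four values as the suffix.

```python
import copy

# Effort levels listed for the slug suffix; assumed to apply to the body field too.
EFFORT_LEVELS = ("low", "medium", "high", "max")


def _check(effort: str) -> None:
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort {effort!r}; expected one of {EFFORT_LEVELS}")


def with_effort_suffix(model: str, effort: str) -> str:
    """Option 1: encode the effort in the model slug, e.g. xai/grok-4:high."""
    _check(effort)
    return f"{model}:{effort}"


def with_effort_body(payload: dict, effort: str) -> dict:
    """Option 2 (assumed shape): add {"reasoning": {"effort": ...}} to the body."""
    _check(effort)
    out = copy.deepcopy(payload)  # leave the caller's dict untouched
    out.setdefault("reasoning", {})["effort"] = effort
    return out


base = {"model": "xai/grok-4", "messages": [{"role": "user", "content": "Draft a plan."}]}
suffixed = {**base, "model": with_effort_suffix(base["model"], "high")}
nested = with_effort_body(base, "high")
```

With the OpenAI Python SDK, a body field outside the standard schema can be passed through `extra_body`, for example `extra_body={"reasoning": {"effort": "high"}}` on `client.chat.completions.create(...)`. Whether LLMTR reads the field in exactly that nested form is an assumption here.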

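When a request sets stream: true, the response is delivered chunk by chunk as Server-Sent Events. The format matches OpenAI Chat Completions streaming. Each event is a data: {...} line followed by a blank line (\n\n), and incremental text arrives in choices[0].delta.content. The stream ends with data: [DONE].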

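# `client` is the OpenAI client configured in the first example.
stream = client.chat.completions.create(
    model="xai/grok-4",
    messages=[{"role": "user", "content": "Write a long answer"}],
    stream=True,
)
for chunk in stream:
    # Some chunks may carry no choices (e.g. a final usage chunk, if requested).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:  # role-only and finish chunks have no content
        print(delta, end="", flush=True)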
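Without the SDK, the same stream can be consumed by reading the raw SSE lines. The following minimal sketch assumes the standard OpenAI chunk shape (`choices[0].delta.content`) and a single choice per request. `iter_sse_deltas` is a hypothetical helper, and the sample lines were written for illustration rather than captured from LLMTR.

```python
import json
from typing import Iterable, Iterator, Union


def iter_sse_deltas(lines: Iterable[Union[str, bytes]]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines, stopping at data: [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, ": keep-alive" comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a usage-only chunk
        content = choices[0].get("delta", {}).get("content")
        if content:  # skip role-only and empty deltas
            yield content


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_sse_deltas(sample))
```

Bytes lines are accepted so that the helper can be fed directly from an HTTP response object such as the one returned by `urllib.request.urlopen`, which yields one line per iteration.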

Use /v1/responses directly when you need advanced features:

  • Access to reasoning_summary output
  • Provider-specific instructions field
  • Grok 4.20 multi-agent execution depth control

Details: /docs/en/gateway/responses.
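As a rough orientation before reading that page, the sketch below assembles a /v1/responses body. It assumes LLMTR accepts the OpenAI Responses request shape (`model`, `input`, top-level `instructions`, and `reasoning.summary` with values such as "auto", "concise", or "detailed"). The mapping of `reasoning_summary` onto `reasoning.summary` is an assumption, `build_responses_payload` is a hypothetical helper, and the Grok 4.20 multi-agent depth parameter is left out because its name is not given here.

```python
from typing import Optional


def build_responses_payload(
    model: str,
    user_input: str,
    instructions: Optional[str] = None,
    reasoning_summary: Optional[str] = None,
) -> dict:
    """Assemble a /v1/responses body, assuming the OpenAI Responses request shape."""
    payload: dict = {"model": model, "input": user_input}
    if instructions is not None:
        # Top-level system-style guidance (the "instructions" field).
        payload["instructions"] = instructions
    if reasoning_summary is not None:
        # Assumed: ask for a reasoning summary via reasoning.summary.
        payload["reasoning"] = {"summary": reasoning_summary}
    return payload


payload = build_responses_payload(
    "xai/grok-4",
    "Draft a plan.",
    instructions="Be concise.",
    reasoning_summary="auto",
)
```

With the OpenAI Python SDK, the same fields map onto keyword arguments of `client.responses.create(...)`, so `client.responses.create(**payload)` would send this body if the assumed shape holds.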

Billing is based on actual input/output token counts reported by the model. The per-token price is the same whether the model is called via /v1/chat/completions or /v1/responses.
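Because billing follows reported token counts, you can estimate the cost of a request from its `usage` object. In the following minimal sketch, the per-million-token prices are made-up placeholders rather than LLMTR rates, and `request_cost` is a hypothetical helper. It accepts either the Chat Completions field names (`prompt_tokens`, `completion_tokens`) or the Responses ones (`input_tokens`, `output_tokens`). The same counts give the same figure on either endpoint, which matches the endpoint-independent pricing described above.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(usage: dict, input_price_per_m: Decimal, output_price_per_m: Decimal) -> Decimal:
    """Estimate cost from a usage object and per-million-token prices."""
    # Chat Completions reports prompt/completion tokens; Responses reports input/output tokens.
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens", 0))
    return (
        Decimal(input_tokens) * input_price_per_m
        + Decimal(output_tokens) * output_price_per_m
    ) / TOKENS_PER_MILLION


# Hypothetical prices: 3.00 per 1M input tokens, 15.00 per 1M output tokens.
IN_PRICE, OUT_PRICE = Decimal("3.00"), Decimal("15.00")
chat_cost = request_cost({"prompt_tokens": 1200, "completion_tokens": 350}, IN_PRICE, OUT_PRICE)
responses_cost = request_cost({"input_tokens": 1200, "output_tokens": 350}, IN_PRICE, OUT_PRICE)
# Both: 1200 * 3.00 / 1e6 + 350 * 15.00 / 1e6 = 0.0036 + 0.00525 = 0.00885
```

`Decimal` is used instead of `float` so that small per-token prices add up without rounding drift.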