Single Endpoint Usage

Every chat-style model on LLMTR is reachable through the single /v1/chat/completions endpoint. Regardless of the underlying provider, requests use the same format and responses come back in the OpenAI Chat Completions shape. Existing OpenAI SDK code works against the entire catalog once its base URL and API key point at LLMTR.

Example request

curl https://llmtr.com/v1/chat/completions \
  -H "Authorization: Bearer llmtr-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

The same request using the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://llmtr.com/v1",
    api_key="llmtr-YOUR_KEY",
)

resp = client.chat.completions.create(
    model="xai/grok-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Reasoning effort

For reasoning-capable models, set the effort level either by appending a suffix to the model slug or by setting the reasoning.effort field in the request body. The request below uses the suffix form.

curl https://llmtr.com/v1/chat/completions \
  -H "Authorization: Bearer llmtr-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4:high",
    "messages": [{ "role": "user", "content": "Draft a plan." }]
  }'

Suffix values: :low, :medium, :high, :max.
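
The two ways of setting effort can be sketched as request payloads. This is a minimal sketch, not the gateway's own code: the helper names with_effort_suffix and with_effort_field are hypothetical, the nested {"reasoning": {"effort": ...}} shape is inferred from the reasoning.effort field name, and it is an assumption that the body field accepts the same values as the suffixes.

```python
# Hypothetical helpers; only the suffix values and the field name come from the docs.
EFFORT_LEVELS = ("low", "medium", "high", "max")


def _check(effort: str) -> None:
    """Reject effort values that are not in the documented suffix list."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort {effort!r}; expected one of {EFFORT_LEVELS}")


def with_effort_suffix(model: str, prompt: str, effort: str) -> dict:
    """Build a request body that selects effort through the model slug suffix."""
    _check(effort)
    return {
        "model": f"{model}:{effort}",
        "messages": [{"role": "user", "content": prompt}],
    }


def with_effort_field(model: str, prompt: str, effort: str) -> dict:
    """Build a request body that selects effort through reasoning.effort.

    Assumption: the field is nested as {"reasoning": {"effort": ...}}.
    """
    _check(effort)
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": effort},
    }
```

With the OpenAI Python SDK, a non-standard body field such as reasoning can be passed through the extra_body argument of client.chat.completions.create, for example extra_body={"reasoning": {"effort": "high"}}.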

Streaming

When stream: true is set, the response is delivered incrementally as Server-Sent Events. The format matches OpenAI Chat Completions: each chunk arrives as a data: {...} line terminated by a blank line (\n\n), and the stream ends with data: [DONE].

stream = client.chat.completions.create(
    model="xai/grok-4",
    messages=[{"role": "user", "content": "Write a long answer"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
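
For clients that do not use the SDK, the wire format described in this section can be consumed by hand. The sketch below parses already-decoded SSE lines; the function name iter_deltas is hypothetical, and it assumes each data: payload is a single-line JSON chunk with the OpenAI choices[0].delta.content shape.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from decoded SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample stream, not captured from the gateway.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints Hello
```

Taking an iterable of lines keeps the parser independent of the HTTP library, so any client that can iterate over response lines can feed it.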

When to use /v1/responses

Use /v1/responses directly when you need advanced features:

Access to reasoning_summary output
Provider-specific instructions field
Grok 4.20 multi-agent execution depth control

Details: /docs/en/gateway/responses.

Billing

Billing is based on actual input/output token counts reported by the model. The per-token price is the same whether the model is called via /v1/chat/completions or /v1/responses.
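
As a worked example of this rule, a request's cost can be estimated from the usage block of an OpenAI-shaped response. This is a sketch under assumptions: the function estimate_cost and the prices are hypothetical placeholders, not LLMTR's actual rates, and it assumes prices are quoted per million tokens with separate input and output rates.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price: Decimal, output_price: Decimal) -> Decimal:
    """Cost = input tokens * input rate + output tokens * output rate.

    Prices are per one million tokens (an assumption for this sketch).
    """
    return (prompt_tokens * input_price + completion_tokens * output_price) / MILLION


# Placeholder prices, not real LLMTR rates.
cost = estimate_cost(1200, 300, Decimal("3.00"), Decimal("15.00"))
print(cost)
```

With the SDK, the token counts would come from resp.usage.prompt_tokens and resp.usage.completion_tokens, assuming the gateway populates the usage field of the response.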