Responses API

The LLMTR gateway supports OpenAI’s /v1/responses endpoint. The GPT-5 Codex family (gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5.2-codex, gpt-5.3-codex) is only available through this endpoint; sending these models to /v1/chat/completions returns 400 endpoint_mismatch.

Use /v1/responses in these cases:

  • You need reasoning effort control (low / medium / high / xhigh).
  • You use models that benefit from cached input.
  • The model supports only the Responses endpoint.

For classic chat completion flows keep using /v1/chat/completions.
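The routing rule above can be sketched as a small client-side helper. This is only an illustration: `RESPONSES_ONLY` and `pick_endpoint` are hypothetical names, the set contains just the Codex models listed on this page, and falling back to /v1/chat/completions for everything else is an assumption rather than gateway behavior.

```python
# Models this page lists as available only through /v1/responses.
RESPONSES_ONLY = {
    "gpt-5-codex",
    "gpt-5.1-codex",
    "gpt-5.1-codex-max",
    "gpt-5.1-codex-mini",
    "gpt-5.2-codex",
    "gpt-5.3-codex",
}


def pick_endpoint(model: str) -> str:
    """Return the path to call for a model ID such as 'openai/gpt-5.1-codex:high'."""
    # Drop the provider prefix ("openai/") and any suffix (":high").
    name = model.split("/", 1)[-1].split(":", 1)[0]
    if name in RESPONSES_ONLY:
        return "/v1/responses"
    # Assumption: anything not in the list is treated as a classic chat model.
    return "/v1/chat/completions"


print(pick_endpoint("openai/gpt-5.1-codex:high"))
```

Choosing the path before sending avoids a round trip that would end in 400 endpoint_mismatch.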

POST /v1/responses
Authorization: Bearer llmtr-your_key
Content-Type: application/json
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Canonical model ID (e.g. openai/gpt-5.1-codex). Optional suffix: openai/gpt-5.3-codex:max |
| messages or input | array or string | yes | OpenAI message format or raw input |
| instructions | string | no | System prompt equivalent |
| reasoning | object | no | Keys: "effort" (low, medium, high, or xhigh) and "summary" (auto, concise, or detailed) |
| max_output_tokens | integer | no | Output token cap |
| temperature | number | no | Within the model’s supported range |
| tools | array | no | Function calling (where supported) |
| tool_choice | string or object | no | auto, none, or specific tool |
| response_format | object | no | Structured output |
| stream | boolean | no | SSE streaming (coming soon) |
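As a rough sketch of how the fields above fit together, the helper below assembles a request body and checks the reasoning effort against the documented values before anything is sent. `build_payload` and `ALLOWED_EFFORTS` are hypothetical names, and the client-side validation is an assumption added for illustration; the gateway performs its own checks.

```python
import json

# Effort values documented for the "reasoning" field.
ALLOWED_EFFORTS = {"low", "medium", "high", "xhigh"}


def build_payload(model, prompt, effort=None, instructions=None, max_output_tokens=None):
    """Assemble a /v1/responses request body from the documented fields."""
    payload = {"model": model, "input": prompt}
    if effort is not None:
        if effort not in ALLOWED_EFFORTS:
            raise ValueError(f"unsupported reasoning effort: {effort!r}")
        payload["reasoning"] = {"effort": effort}
    # Optional fields are only included when the caller sets them.
    if instructions is not None:
        payload["instructions"] = instructions
    if max_output_tokens is not None:
        payload["max_output_tokens"] = max_output_tokens
    return payload


body = build_payload(
    "openai/gpt-5.1-codex",
    "Refactor: make this function pure",
    effort="high",
)
print(json.dumps(body, indent=2))
```

Leaving optional keys out entirely, rather than sending null values, keeps the body close to the minimal requests shown on this page.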

Example request with the reasoning level as a model suffix:

curl "$LLMTR_BASE_URL/v1/responses" \
  -H "Authorization: Bearer llmtr-your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex:high",
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'

The same request with the body field:

curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer llmtr-your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex",
    "reasoning": { "effort": "high" },
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'
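
The two request forms above can be related with a short client-side conversion. This is a sketch under assumptions: `to_body_form` is a hypothetical helper, and it only rewrites suffixes that match the documented effort levels, leaving any other suffix (such as :max) untouched because this page does not define how those map to body fields.

```python
EFFORT_SUFFIXES = {"low", "medium", "high", "xhigh"}


def to_body_form(payload: dict) -> dict:
    """Move an effort suffix on the model ID into the reasoning body field."""
    base, sep, suffix = payload["model"].partition(":")
    if not sep or suffix not in EFFORT_SUFFIXES:
        # No suffix, or one this sketch does not understand: return a copy as-is.
        return dict(payload)
    out = dict(payload)
    out["model"] = base
    # Keep any existing reasoning keys (e.g. "summary") and set the effort.
    out["reasoning"] = {**payload.get("reasoning", {}), "effort": suffix}
    return out


suffix_form = {
    "model": "openai/gpt-5.1-codex:high",
    "messages": [{"role": "user", "content": "Refactor: make this function pure"}],
}
print(to_body_form(suffix_form))
```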

Example response:

{
  "id": "resp_xxx",
  "object": "response",
  "status": "completed",
  "model": "gpt-5.1-codex",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Here is the refactored version..." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 142,
    "input_tokens_details": { "cached_tokens": 24 },
    "output_tokens": 318,
    "output_tokens_details": { "reasoning_tokens": 256 }
  }
}

status may be completed, incomplete, or failed.
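
A minimal sketch of reading such a response, assuming the shape shown in the example above. `extract_text` is a hypothetical helper; raising on failed and returning a completeness flag for incomplete are design choices made for illustration, not documented gateway behavior.

```python
def extract_text(response: dict):
    """Return (text, is_complete) from a Responses-style payload."""
    status = response.get("status")
    if status == "failed":
        raise RuntimeError("response status is 'failed'")
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip output items that are not assistant messages
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block["text"])
    # An "incomplete" response may still carry partial text.
    return "".join(parts), status == "completed"


sample = {
    "status": "completed",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Here is the refactored version..."}
            ],
        }
    ],
}
text, complete = extract_text(sample)
print(text, complete)
```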

  • input_tokens_details.cached_tokens shows how much of the prompt came from cache.
  • output_tokens_details.reasoning_tokens shows reasoning-related token usage.
  • Total cost follows the model’s own pricing; the platform margin is applied only when topping up credit.
  • Confirm the current model price in the dashboard or catalog before production use.
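
The usage fields above can be summarized with a few lines of arithmetic. This sketch assumes that cached_tokens is counted within input_tokens and reasoning_tokens within output_tokens, as in OpenAI's Responses usage object; `summarize_usage` is a hypothetical name and no prices are computed because they are not stated here.

```python
def summarize_usage(usage: dict) -> dict:
    """Derive cache and reasoning shares from a Responses usage object."""
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        # Fraction of the prompt served from cache.
        "cached_share": cached / input_tokens if input_tokens else 0.0,
        # Fraction of output tokens spent on reasoning.
        "reasoning_share": reasoning / output_tokens if output_tokens else 0.0,
        "uncached_input_tokens": input_tokens - cached,
        "visible_output_tokens": output_tokens - reasoning,
    }


usage = {
    "input_tokens": 142,
    "input_tokens_details": {"cached_tokens": 24},
    "output_tokens": 318,
    "output_tokens_details": {"reasoning_tokens": 256},
}
summary = summarize_usage(usage)
print(summary["uncached_input_tokens"], summary["visible_output_tokens"])
```

For the example response on this page, that leaves 118 uncached input tokens and 62 visible output tokens.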

The recommended endpoint for xai/grok-4.3, xai/grok-4.20-multi-agent, xai/grok-4.20-0309-reasoning, and xai/grok-4.20-0309-non-reasoning is /v1/responses:

curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer llmtr-your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4.3",
    "input": "Simplify a TypeScript service function."
  }'

For compatibility, /v1/chat/completions requests are also accepted; the gateway internally routes them through Responses and returns the OpenAI Chat Completions shape. LLMTR sends store:false for these models. The following are rejected in this release: store:true, previous_response_id, instructions, and xAI server-side tools. If you need a system prompt, send it as the first message, with the system or developer role, through input or messages.

For Grok 4.3, input estimates above 200K tokens are rejected with pricing_unverified until the higher-context xAI price is verified separately. If an xAI text response does not include usage.cost_in_usd_ticks, the gateway refuses settlement and returns provider_usage_missing even if the provider request succeeded. On Grok 4.20 Multi-Agent, reasoning.effort controls agent count, so high and xhigh may consume more tokens.
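
The Grok restrictions above lend themselves to a client-side preflight check. The sketch below is illustrative only: `preflight_xai` and `move_instructions` are hypothetical helpers, the token estimate is supplied by the caller because the gateway's estimator is not described here, and server-side tool detection is left out.

```python
XAI_TEXT_MODELS = {
    "xai/grok-4.3",
    "xai/grok-4.20-multi-agent",
    "xai/grok-4.20-0309-reasoning",
    "xai/grok-4.20-0309-non-reasoning",
}


def preflight_xai(payload: dict, estimated_input_tokens: int = 0) -> list:
    """List the documented reasons a Grok request would be rejected."""
    model = payload.get("model", "").split(":", 1)[0]
    if model not in XAI_TEXT_MODELS:
        return []
    problems = []
    if payload.get("store") is True:
        problems.append("store:true is rejected")
    for field in ("previous_response_id", "instructions"):
        if field in payload:
            problems.append(f"{field} is rejected")
    if model == "xai/grok-4.3" and estimated_input_tokens > 200_000:
        problems.append("input estimate above 200K tokens (pricing_unverified)")
    return problems


def move_instructions(payload: dict) -> dict:
    """Turn an 'instructions' field into a leading system message."""
    if "instructions" not in payload:
        return dict(payload)
    out = {k: v for k, v in payload.items()
           if k not in ("instructions", "input", "messages")}
    system = {"role": "system", "content": payload["instructions"]}
    if "messages" in payload:
        out["messages"] = [system] + list(payload["messages"])
    else:
        raw = payload.get("input", [])
        # Assumption: a plain string input can be sent as one user message.
        items = [{"role": "user", "content": raw}] if isinstance(raw, str) else list(raw)
        out["input"] = [system] + items
    return out


request = {
    "model": "xai/grok-4.3",
    "input": "Simplify a TypeScript service function.",
    "instructions": "Be terse.",
}
print(preflight_xai(request))
print(preflight_xai(move_instructions(request)))
```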

For the full Grok family, including image, video, TTS, and STT examples, see xAI Grok models.

Reasoning levels are detailed on a separate page: Reasoning effort.

| HTTP | error.type | Meaning |
| --- | --- | --- |
| 400 | invalid_request_error | Invalid parameter / unknown suffix |
| 400 | endpoint_mismatch | Model requires /v1/responses but /v1/chat/completions was called (or vice versa) |
| 400 | unsupported_capability | Model does not support this reasoning level or modality |
| 401 | auth_error | Invalid API key |
| 402 | insufficient_balance | Insufficient balance |
| 429 | rate_limit_exceeded | RPM/TPM limit exceeded |
| 502 | provider_error | Upstream provider error |

For details see the Errors section.
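
One possible way to act on the error types in the table is sketched below. Several things here are assumptions rather than documented behavior: that the error body nests the type under an "error" object, that only rate_limit_exceeded and provider_error are worth retrying, and the exponential backoff values. `classify_error` and `backoff_seconds` are hypothetical names.

```python
# Assumption: only throttling and upstream failures are transient.
RETRYABLE_TYPES = {"rate_limit_exceeded", "provider_error"}


def classify_error(http_status: int, body: dict) -> dict:
    """Summarize an error response and decide whether a retry makes sense."""
    # Assumed body shape: {"error": {"type": "..."}}.
    error_type = body.get("error", {}).get("type", "unknown")
    return {
        "status": http_status,
        "type": error_type,
        "retry": error_type in RETRYABLE_TYPES,
    }


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2**attempt, limited to cap seconds."""
    return min(cap, base * (2 ** attempt))


result = classify_error(429, {"error": {"type": "rate_limit_exceeded"}})
print(result, backoff_seconds(2))
```

Errors such as endpoint_mismatch or insufficient_balance are not retried in this sketch because repeating the same request would presumably fail the same way.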