Responses API
The LLMTR gateway supports OpenAI’s /v1/responses endpoint. The GPT-5 Codex family (gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5.2-codex, gpt-5.3-codex) is only available through this endpoint; sending these models to /v1/chat/completions returns 400 endpoint_mismatch.
When to use it
- When you need reasoning effort control (low / medium / high / xhigh)
- When you use models that benefit from cached input
- For models that only support the Responses endpoint
For classic chat completion flows keep using /v1/chat/completions.
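As a rough illustration of the routing rule above, the sketch below picks an endpoint path from a model ID. The helper name `pick_endpoint` and the way it strips the provider prefix and suffix are assumptions made for this example, not part of the gateway. The model names are copied from this page.

```python
# Codex model names this page lists as available only through /v1/responses.
RESPONSES_ONLY = {
    "gpt-5-codex", "gpt-5.1-codex", "gpt-5.1-codex-max",
    "gpt-5.1-codex-mini", "gpt-5.2-codex", "gpt-5.3-codex",
}


def pick_endpoint(model: str) -> str:
    """Return an endpoint path for an ID such as 'openai/gpt-5.1-codex:high'."""
    # Drop the provider prefix ('openai/') and any optional ':suffix'.
    name = model.split("/", 1)[-1].split(":", 1)[0]
    if name in RESPONSES_ONLY:
        return "/v1/responses"
    # Everything else falls back to the classic chat flow in this sketch.
    return "/v1/chat/completions"


print(pick_endpoint("openai/gpt-5.1-codex:high"))
```

A client-side check like this only avoids a round trip that would end in `endpoint_mismatch`; the gateway remains the source of truth for which models accept which endpoint.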
Request
```http
POST /v1/responses
Authorization: Bearer llmtr-your_key
Content-Type: application/json
```

Body parameters
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Canonical model ID (e.g. openai/gpt-5.1-codex). Optional suffix: openai/gpt-5.3-codex:max |
| messages or input | array or string | yes | OpenAI message format or raw input |
| instructions | string | no | System prompt equivalent |
| reasoning | object | no | `{ "effort": "low" \| "medium" \| "high" \| "xhigh", "summary": "auto" \| "concise" \| "detailed" }` |
| max_output_tokens | integer | no | Output token cap |
| temperature | number | no | Within the model’s supported range |
| tools | array | no | Function calling (where supported) |
| tool_choice | string or object | no | auto, none, or specific tool |
| response_format | object | no | Structured output |
| stream | boolean | no | SSE streaming (coming soon) |
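To make the parameter table concrete, here is a minimal sketch that assembles a request body from a few of the fields above. The function name `build_body` and the client-side effort check are assumptions for illustration. The gateway performs its own validation, and this sketch does not send anything.

```python
import json

# Effort levels listed for the `reasoning` field on this page.
ALLOWED_EFFORTS = {"low", "medium", "high", "xhigh"}


def build_body(model, user_input, effort=None, instructions=None,
               max_output_tokens=None):
    """Assemble a /v1/responses JSON body; optional fields are left out when unset."""
    if effort is not None and effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    body = {"model": model, "input": user_input}
    if effort is not None:
        body["reasoning"] = {"effort": effort}
    if instructions is not None:
        body["instructions"] = instructions
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens
    return body


body = build_body("openai/gpt-5.1-codex",
                  "Refactor: make this function pure", effort="high")
print(json.dumps(body, indent=2))
```

Leaving unset optional fields out of the body, rather than sending them as `null`, keeps the request close to the hand-written examples on this page.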
Basic example
```bash
curl "$LLMTR_BASE_URL/v1/responses" \
  -H "Authorization: Bearer llmtr-your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex:high",
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'
```

The same request with the effort set through the `reasoning` body field instead of the model suffix:
```bash
curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer llmtr-your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex",
    "reasoning": { "effort": "high" },
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'
```

Response
```json
{
  "id": "resp_xxx",
  "object": "response",
  "status": "completed",
  "model": "gpt-5.1-codex",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Here is the refactored version..." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 142,
    "input_tokens_details": { "cached_tokens": 24 },
    "output_tokens": 318,
    "output_tokens_details": { "reasoning_tokens": 256 }
  }
}
```

`status` may be `completed`, `incomplete`, or `failed`.
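One way to read the response shape shown above is to walk the `output` array and collect every `output_text` part. The helper names `output_text` and `require_completed` are hypothetical, and the sample payload is trimmed from the example on this page.

```python
def output_text(response: dict) -> str:
    """Concatenate the text of every output_text part in a response payload."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip any non-message output items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


def require_completed(response: dict) -> dict:
    """Raise if status is anything other than 'completed' (e.g. incomplete, failed)."""
    status = response.get("status")
    if status != "completed":
        raise RuntimeError(f"response not completed: status={status!r}")
    return response


# Sample payload trimmed from the example response on this page.
sample = {
    "id": "resp_xxx",
    "status": "completed",
    "output": [{
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text",
                     "text": "Here is the refactored version..."}],
    }],
}
print(output_text(require_completed(sample)))
```

Checking `status` before reading the text makes an `incomplete` result, for example one cut off by `max_output_tokens`, visible to the caller instead of passing as a short answer.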
Usage and cost notes
- input_tokens_details.cached_tokens shows how much of the prompt came from cache.
- output_tokens_details.reasoning_tokens shows reasoning-related token usage.
- Total cost follows the model’s own pricing; the platform margin is applied only when topping up credit.
- Confirm the current model price in the dashboard or catalog before production use.
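The usage fields described above can be summarized with a small helper like the one below. `summarize_usage` is a hypothetical name, and the `cached_share` ratio is a convenience added for this sketch. No prices are computed because this page does not list them.

```python
def summarize_usage(usage: dict) -> dict:
    """Pull the token counters out of a `usage` object, tolerating missing fields."""
    input_tokens = usage.get("input_tokens", 0)
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "input_tokens": input_tokens,
        "cached_tokens": cached,
        # Fraction of the prompt served from cache; 0.0 when there is no input.
        "cached_share": cached / input_tokens if input_tokens else 0.0,
        "output_tokens": output_tokens,
        "reasoning_tokens": reasoning,
    }


# Counters taken from the example response on this page.
example_usage = {
    "input_tokens": 142,
    "input_tokens_details": {"cached_tokens": 24},
    "output_tokens": 318,
    "output_tokens_details": {"reasoning_tokens": 256},
}
print(summarize_usage(example_usage))
```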
xAI Grok
The recommended endpoint for xai/grok-4.3, xai/grok-4.20-multi-agent, xai/grok-4.20-0309-reasoning, and xai/grok-4.20-0309-non-reasoning is /v1/responses:
```bash
curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer llmtr-your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4.3",
    "input": "Simplify a TypeScript service function."
  }'
```

For compatibility, /v1/chat/completions requests are also accepted; the gateway internally routes them through Responses and returns the OpenAI Chat Completions shape. LLMTR sends store:false for these models. store:true, previous_response_id, instructions, and xAI server-side tools are rejected in this release. If you need a system prompt, send a first system or developer message through input or messages.
For Grok 4.3, input estimates above 200K tokens are rejected with pricing_unverified until the higher-context xAI price is verified separately. If an xAI text response does not include usage.cost_in_usd_ticks, the gateway refuses settlement and returns provider_usage_missing even if the provider request succeeded. On Grok 4.20 Multi-Agent, reasoning.effort controls agent count, so high and xhigh may consume more tokens.
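The Grok restrictions described above can be sketched as a small pre-flight check. Both helper names are hypothetical, and the check covers only the fields named on this page. Server-side tools are not inspected because their identifiers are not listed here.

```python
# Top-level fields this page says are rejected for xAI models in this release.
REJECTED_FIELDS = ("previous_response_id", "instructions")


def prepare_grok_body(model, user_text, system_prompt=None):
    """Build a body that carries the system prompt as the first input message."""
    messages = []
    if system_prompt:
        # `instructions` is rejected, so the prompt travels as a message instead.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "input": messages}


def check_grok_body(body: dict) -> list:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = [f"{name} is not accepted"
                for name in REJECTED_FIELDS if name in body]
    if body.get("store") is True:
        problems.append("store:true is not accepted")
    return problems


body = prepare_grok_body("xai/grok-4.3",
                         "Simplify a TypeScript service function.",
                         system_prompt="Answer briefly.")
print(check_grok_body(body))
```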
For the full Grok family, including image, video, TTS, and STT examples, see xAI Grok models.
Reasoning effort
Reasoning levels are detailed on a separate page: Reasoning effort.
Error codes
| HTTP | error.type | Meaning |
|---|---|---|
| 400 | invalid_request_error | Invalid parameter / unknown suffix |
| 400 | endpoint_mismatch | Model requires /v1/responses but /v1/chat/completions was called (or vice versa) |
| 400 | unsupported_capability | Model does not support this reasoning level or modality |
| 401 | auth_error | Invalid API key |
| 402 | insufficient_balance | Insufficient balance |
| 429 | rate_limit_exceeded | RPM/TPM limit exceeded |
| 502 | provider_error | Upstream provider error |
For details see the Errors section.
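A client might branch on the table above roughly as follows. This sketch assumes the error body has the shape `{"error": {"type": ...}}`, inferred from the `error.type` column header. Treating 429 and 502 as retryable is a client-side choice made for this example, not documented gateway behaviour.

```python
# Error types this sketch treats as transient; the choice is an assumption.
RETRYABLE_TYPES = {"rate_limit_exceeded", "provider_error"}


def error_type(payload: dict):
    """Read error.type from an error body, or None when it is absent."""
    return (payload.get("error") or {}).get("type")


def should_retry(status: int, payload: dict) -> bool:
    """Retry only on 429/502 responses that carry a matching error type."""
    return status in (429, 502) and error_type(payload) in RETRYABLE_TYPES


print(should_retry(429, {"error": {"type": "rate_limit_exceeded"}}))
print(should_retry(400, {"error": {"type": "endpoint_mismatch"}}))
```

Errors such as `endpoint_mismatch` or `insufficient_balance` will not succeed on a retry with the same request, so this sketch surfaces them to the caller instead.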