Responses API
The LLMTR gateway supports OpenAI’s /v1/responses endpoint. The GPT-5 Codex family (gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5.2-codex, gpt-5.3-codex) is only available through this endpoint; sending these models to /v1/chat/completions returns 400 endpoint_mismatch.
When to use it
- When you need reasoning effort control (low / medium / high / xhigh)
- When you use models that benefit from cached input
- For models that only support the Responses endpoint
For classic chat completion flows keep using /v1/chat/completions.
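Since Codex-family models are rejected by `/v1/chat/completions` with `endpoint_mismatch`, a client can route by model name before sending. A minimal sketch (the helper name and the hard-coded model set are illustrative, mirroring the list above):

```python
# Hypothetical helper: pick the LLMTR endpoint a model must be sent to.
# The set mirrors the Codex family listed above; keep it in sync with the catalog.
CODEX_MODELS = {
    "gpt-5-codex", "gpt-5.1-codex", "gpt-5.1-codex-max",
    "gpt-5.1-codex-mini", "gpt-5.2-codex", "gpt-5.3-codex",
}

def endpoint_for(model: str) -> str:
    """Return the endpoint path for a canonical model ID."""
    # Strip the provider prefix ("openai/") and any effort suffix (":high").
    name = model.split("/")[-1].split(":")[0]
    return "/v1/responses" if name in CODEX_MODELS else "/v1/chat/completions"
```
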
Request
```
POST /v1/responses
Authorization: Bearer sk_your_key
Content-Type: application/json
```

Body parameters
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Canonical model ID (e.g. openai/gpt-5.1-codex). Optional suffix: openai/gpt-5.3-codex:max |
| messages or input | array \| string | yes | OpenAI message format or raw input |
| instructions | string | no | System prompt equivalent |
| reasoning | object | no | { "effort": "low" \| "medium" \| "high" \| "xhigh", "summary": "auto" \| "concise" \| "detailed" } |
| max_output_tokens | integer | no | Output token cap |
| temperature | number | no | Within the model’s supported range |
| tools | array | no | Function calling (where supported) |
| tool_choice | string \| object | no | auto, none, or specific tool |
| response_format | object | no | Structured output |
| stream | boolean | no | SSE streaming (coming soon) |
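A request body can be assembled from the fields in the table above. A minimal sketch in Python (the helper name is illustrative; optional fields are simply omitted when unset, matching the table's "no" column):

```python
def build_responses_payload(model, messages, *, instructions=None,
                            effort=None, max_output_tokens=None):
    """Assemble a /v1/responses request body from the documented fields.

    Optional fields are left out of the body entirely when not provided,
    so the gateway applies its own defaults.
    """
    body = {"model": model, "messages": messages}
    if instructions is not None:
        body["instructions"] = instructions
    if effort is not None:
        body["reasoning"] = {"effort": effort}
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens
    return body
```
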
Basic example
```shell
curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex:high",
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'
```

The same request with the effort passed as a body field instead of a suffix:
```shell
curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex",
    "reasoning": { "effort": "high" },
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'
```

Response
```json
{
  "id": "resp_xxx",
  "object": "response",
  "status": "completed",
  "model": "gpt-5.1-codex",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Here is the refactored version..." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 142,
    "input_tokens_details": { "cached_tokens": 24 },
    "output_tokens": 318,
    "output_tokens_details": { "reasoning_tokens": 256 }
  }
}
```

status may be completed, incomplete, or failed.
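The assistant text lives inside `output[].content[]` rather than at the top level. A small sketch for pulling it out of a response of the shape shown above (function name assumed; it simply concatenates all output_text parts):

```python
def extract_output_text(response: dict) -> str:
    """Concatenate every output_text part from a /v1/responses body."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for block in item.get("content", []):
                if block.get("type") == "output_text":
                    parts.append(block["text"])
    return "".join(parts)
```

Checking `status == "completed"` before extraction is a reasonable guard, since incomplete or failed responses may carry partial or no output.
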
Usage and cost notes
- input_tokens_details.cached_tokens shows how much of the prompt came from cache.
- output_tokens_details.reasoning_tokens shows reasoning-related token usage.
- Total cost follows the model’s own pricing; the platform margin is applied only when topping up credit.
- Confirm the current model price in the dashboard or catalog before production use.
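The detail fields nest inside the totals, so deriving the uncached and visible portions takes a small subtraction. A sketch under that assumption (helper name illustrative), using the numbers from the sample response above:

```python
def summarize_usage(usage: dict) -> dict:
    """Split usage into cached vs uncached input and reasoning vs visible output."""
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "cached_input": cached,
        "uncached_input": usage["input_tokens"] - cached,
        "reasoning_output": reasoning,
        "visible_output": usage["output_tokens"] - reasoning,
    }

# From the sample response: 142 input (24 cached), 318 output (256 reasoning)
# → 118 uncached input tokens and 62 visible output tokens.
```
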
Reasoning effort
Reasoning levels are detailed on a separate page: Reasoning effort.
Error codes
| HTTP | error.type | Meaning |
|---|---|---|
| 400 | invalid_request_error | Invalid parameter / unknown suffix |
| 400 | endpoint_mismatch | Model requires /v1/responses but /v1/chat/completions was called (or vice versa) |
| 400 | unsupported_capability | Model does not support this reasoning level or modality |
| 401 | auth_error | Invalid API key |
| 402 | insufficient_balance | Insufficient balance |
| 429 | rate_limit_exceeded | RPM/TPM limit exceeded |
| 502 | provider_error | Upstream provider error |
For details see the Errors section.
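The table above suggests a natural split for client-side handling: transient errors worth retrying versus request or account problems that a retry cannot fix. A minimal sketch of that mapping (the function and the retry policy are assumptions, not gateway behavior):

```python
# Transient per the table: rate limits and upstream provider failures.
RETRYABLE = {"rate_limit_exceeded", "provider_error"}

def classify_error(status: int, error_type: str) -> str:
    """Map an (HTTP status, error.type) pair from the table to a client action."""
    if error_type in RETRYABLE:
        return "retry"          # 429 / 502: back off and try again
    if status == 400:
        return "fix_request"    # invalid params, endpoint_mismatch, capability
    if status in (401, 402):
        return "check_account"  # bad key or insufficient balance
    return "escalate"
```
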