Responses API

The LLMTR gateway supports OpenAI’s /v1/responses endpoint. The GPT-5 Codex family (gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5.2-codex, gpt-5.3-codex) is only available through this endpoint; sending these models to /v1/chat/completions returns 400 endpoint_mismatch.

Use /v1/responses when:

  • You need reasoning effort control (low / medium / high / xhigh)
  • You use models that benefit from cached input
  • The model only supports the Responses endpoint

For classic chat completion flows keep using /v1/chat/completions.
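Since sending a Codex model to the wrong endpoint returns 400 endpoint_mismatch, it can help to route on the client side before sending. A minimal sketch, assuming the Codex model list on this page; the helper name `endpoint_for` is illustrative, not part of the gateway API:

```python
# Client-side routing sketch: pick the endpoint before sending, so the
# gateway never returns 400 endpoint_mismatch.
# RESPONSES_ONLY is the Codex family listed on this page; the helper
# name `endpoint_for` is illustrative, not a gateway API.

RESPONSES_ONLY = {
    "openai/gpt-5-codex",
    "openai/gpt-5.1-codex",
    "openai/gpt-5.1-codex-max",
    "openai/gpt-5.1-codex-mini",
    "openai/gpt-5.2-codex",
    "openai/gpt-5.3-codex",
}

def endpoint_for(model: str) -> str:
    """Return the endpoint path for a canonical model ID.

    A suffix such as ":high" or ":max" does not change routing,
    so it is stripped before the lookup.
    """
    base = model.split(":", 1)[0]
    return "/v1/responses" if base in RESPONSES_ONLY else "/v1/chat/completions"
```

For example, `endpoint_for("openai/gpt-5.1-codex:high")` yields `/v1/responses`, while a non-Codex model falls through to `/v1/chat/completions`.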

POST /v1/responses
Authorization: Bearer sk_your_key
Content-Type: application/json
Field | Type | Required | Description
----- | ---- | -------- | -----------
model | string | yes | Canonical model ID (e.g. openai/gpt-5.1-codex). Optional suffix: openai/gpt-5.3-codex:max
messages or input | array or string | yes | OpenAI message format or raw input
instructions | string | no | System prompt equivalent
reasoning | object | no | { "effort": "low" \| "medium" \| "high" \| "xhigh", "summary": "auto" \| "concise" \| "detailed" }
max_output_tokens | integer | no | Output token cap
temperature | number | no | Within the model's supported range
tools | array | no | Function calling (where supported)
tool_choice | string or object | no | auto, none, or a specific tool
response_format | object | no | Structured output
stream | boolean | no | SSE streaming (coming soon)
curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex:high",
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'

The same request, expressed with the reasoning body field instead of the :high model suffix:

curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex",
    "reasoning": { "effort": "high" },
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'
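Because the two forms are equivalent, a client can normalize the suffix into the explicit reasoning field before sending. A sketch under that assumption; `build_payload` is an illustrative helper, not part of any SDK, and only the documented effort names are treated as suffixes (so a variant suffix like :max is left on the model ID):

```python
# Sketch: normalize the ":effort" model suffix into the explicit
# "reasoning" body field, so both requests above produce the same
# payload. `build_payload` is illustrative, not an SDK function.

EFFORT_LEVELS = {"low", "medium", "high", "xhigh"}

def build_payload(model: str, prompt: str) -> dict:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if ":" in model:
        base, suffix = model.rsplit(":", 1)
        # Only rewrite documented effort suffixes; other suffixes
        # (e.g. ":max" as a model variant) stay on the model ID.
        if suffix in EFFORT_LEVELS:
            payload["model"] = base
            payload["reasoning"] = {"effort": suffix}
    return payload
```

With this helper, `build_payload("openai/gpt-5.1-codex:high", ...)` produces the second curl body above: model "openai/gpt-5.1-codex" plus `"reasoning": {"effort": "high"}`.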
A successful response looks like:

{
  "id": "resp_xxx",
  "object": "response",
  "status": "completed",
  "model": "gpt-5.1-codex",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Here is the refactored version..." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 142,
    "input_tokens_details": { "cached_tokens": 24 },
    "output_tokens": 318,
    "output_tokens_details": { "reasoning_tokens": 256 }
  }
}

status may be completed, incomplete, or failed.

  • input_tokens_details.cached_tokens shows how much of the prompt came from cache.
  • output_tokens_details.reasoning_tokens shows reasoning-related token usage.
  • Total cost follows the model’s own pricing; the platform margin is applied only when topping up credit.
  • Confirm the current model price in the dashboard or catalog before production use.
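The usage fields above can be turned into a quick per-request summary. A minimal sketch, assuming reasoning tokens are counted inside output_tokens (as in the sample response); the helper name is illustrative:

```python
# Sketch: pull cache and reasoning figures out of the usage block.
# Field names follow the sample response in this doc; the assumption
# that output_tokens includes reasoning_tokens is illustrative.

def summarize_usage(resp: dict) -> dict:
    usage = resp["usage"]
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    input_tokens = usage["input_tokens"]
    return {
        # Fraction of the prompt served from cache.
        "cache_hit_ratio": cached / input_tokens if input_tokens else 0.0,
        "reasoning_tokens": reasoning,
        # Tokens that actually reached the user, if reasoning is included
        # in output_tokens.
        "visible_output_tokens": usage["output_tokens"] - reasoning,
    }

sample = {"usage": {
    "input_tokens": 142,
    "input_tokens_details": {"cached_tokens": 24},
    "output_tokens": 318,
    "output_tokens_details": {"reasoning_tokens": 256},
}}
```

Applied to the sample response, this reports 256 reasoning tokens, 62 visible output tokens, and a cache-hit ratio of about 0.17.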

Reasoning levels are detailed on a separate page: Reasoning effort.

HTTP | error.type | Meaning
---- | ---------- | -------
400 | invalid_request_error | Invalid parameter / unknown suffix
400 | endpoint_mismatch | Model requires /v1/responses but /v1/chat/completions was called (or vice versa)
400 | unsupported_capability | Model does not support this reasoning level or modality
401 | auth_error | Invalid API key
402 | insufficient_balance | Insufficient balance
429 | rate_limit_exceeded | RPM/TPM limit exceeded
502 | provider_error | Upstream provider error
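Of the statuses above, only 429 and 502 are worth retrying; the 4xx errors indicate a request or account problem that a retry will not fix. A client-side sketch of that split; `send` stands in for whatever HTTP call you use, and the error body shape (an "error" object with "type" and "message") is assumed from the table above:

```python
# Sketch: retry transient statuses (429, 502) with exponential backoff,
# surface everything else immediately. `send` is a placeholder for your
# HTTP call; the error-body shape is an assumption, not gateway-specified.

import time

RETRYABLE = {429, 502}

def call_with_retry(send, max_attempts: int = 3, base_delay: float = 1.0):
    """`send` is a zero-arg callable returning (status_code, body_dict)."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status in RETRYABLE and attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        err = body.get("error", {})
        raise RuntimeError(
            f"{status} {err.get('type', 'unknown')}: {err.get('message', '')}"
        )
    raise RuntimeError("retries exhausted")
```

Note that 400 endpoint_mismatch is deliberately non-retryable here: the fix is to change the endpoint, not to resend the same request.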

For details see the Errors section.