Responses API

The LLMTR gateway supports OpenAI’s /v1/responses endpoint. The GPT-5 Codex family (gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5.2-codex, gpt-5.3-codex) is only available through this endpoint; sending these models to /v1/chat/completions returns 400 endpoint_mismatch.

When to use it

When you need reasoning effort control (low / medium / high / xhigh)
When you use models that benefit from cached input
When the model only supports the Responses endpoint

For classic chat completion flows, keep using /v1/chat/completions.

Request

POST /v1/responses
Authorization: Bearer llmtr-your_key
Content-Type: application/json
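
As a minimal sketch, the request line and headers above can be assembled in Python with the standard library. The function name `make_responses_request` and the default base URL are illustrative assumptions (the base URL is taken from the curl examples on this page), and the helper only builds the request object; it does not send it.

```python
import json
import urllib.request


def make_responses_request(api_key, body, base_url="https://llmtr.com"):
    """Build (but do not send) a POST /v1/responses request.

    `base_url` is an assumption based on the curl examples; adjust it
    for your deployment.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` when you are ready to make the call.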

Body parameters

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | yes | Canonical model ID (e.g. `openai/gpt-5.1-codex`). Optional suffix: `openai/gpt-5.3-codex:max` |
| `messages` or `input` | array \| string | yes | OpenAI message format or raw input |
| `instructions` | string | no | System prompt equivalent |
| `reasoning` | object | no | `{ "effort": "low" \| "medium" \| "high" \| "xhigh", "summary": "auto" \| "concise" \| "detailed" }` |
| `max_output_tokens` | integer | no | Output token cap |
| `temperature` | number | no | Within the model’s supported range |
| `tools` | array | no | Function calling (where supported) |
| `tool_choice` | string \| object | no | `auto`, `none`, or specific tool |
| `response_format` | object | no | Structured output |
| `stream` | boolean | no | SSE streaming (coming soon) |
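
The parameter table can be turned into a small client-side builder that catches obvious mistakes before a request is sent. This is a sketch, not the gateway's own validation: `build_responses_body` is a hypothetical helper, and sending strings as `input` and lists as `messages` is an assumption made here for illustration.

```python
ALLOWED_EFFORTS = {"low", "medium", "high", "xhigh"}
ALLOWED_SUMMARIES = {"auto", "concise", "detailed"}


def build_responses_body(model, user_input, *, instructions=None,
                         effort=None, summary=None, max_output_tokens=None):
    """Assemble a /v1/responses JSON body from the documented fields."""
    if not model:
        raise ValueError("model is required")
    if not user_input:
        raise ValueError("messages or input is required")

    body = {"model": model}
    # Assumption: raw strings go in `input`, message lists in `messages`.
    if isinstance(user_input, str):
        body["input"] = user_input
    else:
        body["messages"] = list(user_input)

    if instructions is not None:
        body["instructions"] = instructions

    reasoning = {}
    if effort is not None:
        if effort not in ALLOWED_EFFORTS:
            raise ValueError(f"unknown reasoning effort: {effort!r}")
        reasoning["effort"] = effort
    if summary is not None:
        if summary not in ALLOWED_SUMMARIES:
            raise ValueError(f"unknown reasoning summary: {summary!r}")
        reasoning["summary"] = summary
    if reasoning:
        body["reasoning"] = reasoning

    if max_output_tokens is not None:
        body["max_output_tokens"] = int(max_output_tokens)
    return body
```

Optional fields are omitted rather than sent as null, so the gateway's defaults apply to anything you do not set.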

Basic example

curl "$LLMTR_BASE_URL/v1/responses" \
  -H "Authorization: Bearer llmtr-your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex:high",
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'

The same request with the effort set through the reasoning body field instead of the model suffix:

curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer llmtr-your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex",
    "reasoning": { "effort": "high" },
    "messages": [
      {"role": "user", "content": "Refactor: make this function pure"}
    ]
  }'
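
The two curl examples express the same effort in two ways. The sketch below converts the suffix form into the body-field form on the client side, purely to illustrate the equivalence. `suffix_to_reasoning_field` is a hypothetical helper; how the gateway itself parses suffixes is not documented here, and letting an explicit `reasoning.effort` win over the suffix is an assumption.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh")


def suffix_to_reasoning_field(body):
    """Return a copy of `body` with a `model:effort` suffix moved into
    `reasoning.effort`. Suffixes that are not one of the four effort
    levels (for example `:max`) are left untouched.
    """
    base, sep, suffix = body["model"].rpartition(":")
    if not sep or suffix not in EFFORT_LEVELS:
        return dict(body)

    reasoning = dict(body.get("reasoning", {}))
    # Assumption: an explicit reasoning.effort takes precedence.
    reasoning.setdefault("effort", suffix)
    return {**body, "model": base, "reasoning": reasoning}
```

Applied to the first example's body, this yields the second example's body: the model becomes `openai/gpt-5.1-codex` and `reasoning` becomes `{"effort": "high"}`.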

Response

{
  "id": "resp_xxx",
  "object": "response",
  "status": "completed",
  "model": "gpt-5.1-codex",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Here is the refactored version..." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 142,
    "input_tokens_details": { "cached_tokens": 24 },
    "output_tokens": 318,
    "output_tokens_details": { "reasoning_tokens": 256 }
  }
}

status may be completed, incomplete, or failed.
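
A minimal sketch of reading this response shape in Python: it collects every `output_text` block from `message` items and reports whether the response completed. `extract_output_text` is a hypothetical helper name, and raising on `failed` is a design choice made here, not gateway behavior.

```python
def extract_output_text(response):
    """Return (text, is_complete) for a parsed /v1/responses payload.

    Raises RuntimeError when status is "failed". An "incomplete"
    response still returns whatever text was produced.
    """
    status = response.get("status")
    if status == "failed":
        raise RuntimeError("response status is failed")

    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message output items
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts), status == "completed"
```

Checking the second return value lets callers tell a truncated answer (`incomplete`) apart from a finished one.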

Usage and cost notes

input_tokens_details.cached_tokens shows how much of the prompt came from cache.
output_tokens_details.reasoning_tokens shows reasoning-related token usage.
Total cost follows the model’s own pricing; the platform margin is applied only when topping up credit.
Confirm the current model price in the dashboard or catalog before production use.
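
To make the two detail fields concrete, the sketch below derives a few figures from a `usage` object. `summarize_usage` is a hypothetical helper. It assumes `cached_tokens` is a subset of `input_tokens` and `reasoning_tokens` is a subset of `output_tokens`, which matches the sample response but should be confirmed for your model. It deliberately does not compute cost, since prices are not stated on this page.

```python
def summarize_usage(usage):
    """Break a `usage` object into cached/uncached and reasoning/visible counts."""
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)

    return {
        "cached_tokens": cached,
        "uncached_input_tokens": input_tokens - cached,
        # Share of the prompt served from cache (0.0 when there is no input).
        "cached_ratio": cached / input_tokens if input_tokens else 0.0,
        "reasoning_tokens": reasoning,
        "visible_output_tokens": output_tokens - reasoning,
    }
```

For the sample response, this gives 118 uncached input tokens and 62 visible output tokens, with roughly 17% of the prompt served from cache.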

xAI Grok

The recommended endpoint for xai/grok-4.3, xai/grok-4.20-multi-agent, xai/grok-4.20-0309-reasoning, and xai/grok-4.20-0309-non-reasoning is /v1/responses:

curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer llmtr-your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4.3",
    "input": "Simplify a TypeScript service function."
  }'

For compatibility, /v1/chat/completions requests are also accepted; the gateway routes them through Responses internally and returns the OpenAI Chat Completions shape. LLMTR sends store: false for these models. In this release, store: true, previous_response_id, instructions, and xAI server-side tools are rejected. If you need a system prompt, send it as the first message, with role system or developer, through input or messages.

For Grok 4.3, input estimates above 200K tokens are rejected with pricing_unverified until the higher-context xAI price is verified separately. If an xAI text response does not include usage.cost_in_usd_ticks, the gateway refuses settlement and returns provider_usage_missing even if the provider request succeeded. On Grok 4.20 Multi-Agent, reasoning.effort controls agent count, so high and xhigh may consume more tokens.
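
Because `instructions` is rejected for these models, a client that normally sets it needs to move the system prompt into the first message. The sketch below shows one way to do that before sending. `adapt_body_for_xai` is a hypothetical helper, and failing fast on `store: true` and `previous_response_id` is a client-side choice that mirrors the restrictions described above; the gateway performs its own checks.

```python
def adapt_body_for_xai(body):
    """Return a copy of `body` that avoids fields rejected for xAI models."""
    if body.get("store") is True:
        raise ValueError("store: true is rejected for xAI models")
    if "previous_response_id" in body:
        raise ValueError("previous_response_id is rejected for xAI models")

    instructions = body.get("instructions")
    if not instructions:
        return {k: v for k, v in body.items() if k != "instructions"}

    # Keep whichever field the caller used: `messages` or `input`.
    key = "messages" if "messages" in body else "input"
    content = body.get(key)
    if isinstance(content, str):
        turns = [{"role": "user", "content": content}]
    else:
        turns = list(content or [])

    # The system prompt becomes the first message instead of `instructions`.
    turns.insert(0, {"role": "system", "content": instructions})

    adapted = {k: v for k, v in body.items()
               if k not in ("instructions", "input", "messages")}
    adapted[key] = turns
    return adapted
```

A body without `instructions` passes through unchanged, so the helper is safe to apply to every xAI request.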

For the full Grok family, including image, video, TTS, and STT examples, see xAI Grok models.

Reasoning effort

Reasoning levels are detailed on a separate page: Reasoning effort.

Error codes

| HTTP | `error.type` | Meaning |
| --- | --- | --- |
| 400 | `invalid_request_error` | Invalid parameter / unknown suffix |
| 400 | `endpoint_mismatch` | Model requires `/v1/responses` but `/v1/chat/completions` was called (or vice versa) |
| 400 | `unsupported_capability` | Model does not support this reasoning level or modality |
| 401 | `auth_error` | Invalid API key |
| 402 | `insufficient_balance` | Insufficient balance |
| 429 | `rate_limit_exceeded` | RPM/TPM limit exceeded |
| 502 | `provider_error` | Upstream provider error |

For details see the Errors section.
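
One way a client might act on these codes is to retry only the transient ones. The sketch below assumes the error body has the shape `{"error": {"type": ...}}`, inferred from the `error.type` column. Treating `rate_limit_exceeded` and `provider_error` as retryable, and the backoff parameters, are illustrative choices rather than documented gateway guidance.

```python
# Assumption: only these two types are worth retrying automatically.
RETRYABLE_TYPES = {"rate_limit_exceeded", "provider_error"}


def classify_error(payload):
    """Return (error_type, should_retry) for a parsed error response body."""
    error = payload.get("error") or {}
    error_type = error.get("type", "unknown")
    return error_type, error_type in RETRYABLE_TYPES


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff in seconds for a zero-based attempt number."""
    return min(cap, base * (2 ** attempt))
```

Errors such as `endpoint_mismatch` or `insufficient_balance` will not succeed on retry, so they are surfaced to the caller immediately.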