# Reasoning Effort

For reasoning-capable models (e.g. the GPT-5 Codex family) the gateway lets you set the reasoning effort level in two ways:

  1. Model slug suffix: `openai/gpt-5.1-codex:high`
  2. Body field: `{ "reasoning": { "effort": "high" } }`

If both are provided, the body field wins.

| Level     | Suffix alias       | Description                                 |
| --------- | ------------------ | ------------------------------------------- |
| `minimal` | `:min`             | Reasoning almost disabled, fastest response |
| `low`     | `:low`             | Low effort, fast                            |
| `medium`  | `:medium`, `:med`  | Default, balanced                           |
| `high`    | `:high`            | High effort, deeper analysis                |
| `xhigh`   | `:max`, `:xhigh`   | Highest (on supported models only)          |

Supported levels vary by model. Sending an unsupported level returns `400 unsupported_capability`.

Unknown suffixes (`:turbo`, `:fast`, etc.) return `400 invalid_request_error`.
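Both error cases can be exercised with nothing but the standard library. A hedged sketch, assuming the gateway returns a JSON error body on `400` (the exact error payload shape is not documented on this page):

```python
import json
import urllib.error
import urllib.request

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a /v1/responses request for the given model slug."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://llmtr.com/v1/responses",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": "Bearer sk_your_key",
            "Content-Type": "application/json",
        },
    )

def send(req: urllib.request.Request) -> dict:
    """Return the response body; on HTTP errors, return the error body.

    A 400 here means either unsupported_capability (level not supported by
    this model) or invalid_request_error (unknown suffix such as :turbo).
    """
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as e:
        return json.load(e)
```
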

```sh
curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.3-codex:max",
    "messages": [
      {"role": "user", "content": "Make this algorithm O(n)"}
    ]
  }'
```

`:max` is an alias for `xhigh`.

```sh
curl https://llmtr.com/v1/responses \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1-codex",
    "reasoning": { "effort": "low", "summary": "concise" },
    "messages": [
      {"role": "user", "content": "Quick one-line answer"}
    ]
  }'
```

`summary` is optional. Use it when you want a shorter or more detailed reasoning summary.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llmtr.com/v1",
    api_key="sk_your_key",
)

response = client.responses.create(
    model="openai/gpt-5.1-codex",
    input="Refactor: make this function pure",
    reasoning={"effort": "high"},
)
print(response.output_text)
```
```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llmtr.com/v1",
  apiKey: process.env.LLMTR_API_KEY,
});

const response = await client.responses.create({
  model: "openai/gpt-5.3-codex:max",
  input: "Explain the O(n) optimization",
});
console.log(response.output_text);
```

Higher reasoning levels usually produce more output tokens and longer execution times. That can increase both cost and latency.

For more detail see the Responses API page.