xAI Grok Models

xAI models use canonical model IDs of the form xai/<model>. Text models use the /v1/responses endpoint; image, video, and voice models use their dedicated gateway endpoints.

LLMTR sends store:false for xAI text models and rejects store:true, previous_response_id, and xAI server-side tools. When xAI returns usage.cost_in_usd_ticks for text/image/video, LLMTR settles from that exact provider-reported cost.
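Because the gateway rejects these fields outright, a client can check a request body before sending it. The following is a minimal sketch, not LLMTR's own validation logic: the helper name `find_unsupported_fields` is hypothetical, and it checks only the two fields named above. Server-side tools are left out because this page does not list their identifiers.

```python
from typing import Any


def find_unsupported_fields(body: dict[str, Any]) -> list[str]:
    """Return the reasons an xAI text request would be rejected (hypothetical helper)."""
    problems: list[str] = []
    # The gateway sends store:false itself, so only an explicit true is a problem.
    if body.get("store") is True:
        problems.append("store:true is rejected")
    # previous_response_id is rejected for xAI text models.
    if body.get("previous_response_id") is not None:
        problems.append("previous_response_id is rejected")
    return problems


request = {"model": "xai/grok-4.3", "input": "Simplify a TypeScript service function."}
print(find_unsupported_fields(request))  # prints []
```

An empty list means the body contains neither of the rejected fields; it does not guarantee the request will be accepted for other reasons.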

Public text models:

| Model | Endpoint | Notes |
| --- | --- | --- |
| xai/grok-4.3 | /v1/responses | Inputs estimated above 200K tokens are rejected until xAI high-context pricing is verified. |
| xai/grok-4.20-multi-agent | /v1/responses | reasoning.effort controls agent count; high/xhigh may cost more. |
| xai/grok-4.20-0309-reasoning | /v1/responses | Long-context reasoning model. |
| xai/grok-4.20-0309-non-reasoning | /v1/responses | Text work that does not need reasoning. |

curl "$LLMTR_BASE_URL/v1/responses" \
-H "Authorization: Bearer llmtr-your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "xai/grok-4.3",
"input": "Simplify a TypeScript service function."
}'

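For the multi-agent model, set the effort level either with a suffix on the model ID (for example :high) or with the reasoning.effort body field. This example uses the suffix: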

curl "$LLMTR_BASE_URL/v1/responses" \
-H "Authorization: Bearer llmtr-your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "xai/grok-4.20-multi-agent:high",
"input": "Evaluate this architecture decision across cost, risk, and maintenance."
}'
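
The suffix form can be read as shorthand for the body field. As a rough sketch, assuming the suffix maps onto reasoning.effort in the request body: the table names that field but this page does not show its exact JSON shape, so the nested `reasoning` object below is an assumption, as is the helper name `expand_effort_suffix`.

```python
def expand_effort_suffix(body: dict) -> dict:
    """Move a ':effort' model suffix into a reasoning.effort body field.

    Hypothetical client-side helper; the gateway's real parsing may differ.
    """
    model, sep, effort = body["model"].partition(":")
    if not sep:
        return dict(body)  # no suffix, nothing to expand
    expanded = dict(body, model=model)
    # Assumed body shape: {"reasoning": {"effort": "<level>"}}
    expanded["reasoning"] = {**body.get("reasoning", {}), "effort": effort}
    return expanded


suffix_form = {
    "model": "xai/grok-4.20-multi-agent:high",
    "input": "Evaluate this architecture decision across cost, risk, and maintenance.",
}
print(expand_effort_suffix(suffix_form)["reasoning"])  # prints {'effort': 'high'}
```

Either form should be used on its own; how the gateway resolves a request that carries both a suffix and a body field is not stated here.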

xai/grok-imagine-image is cataloged at $0.02 per image, and xai/grok-imagine-image-quality at $0.04 per image. If xAI returns usage.cost_in_usd_ticks, that exact provider cost is used; otherwise LLMTR falls back to deterministic per-image pricing. For image edits, xAI bills both input images and generated output images, so fallback settlement uses input image count + output image count.
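The fallback arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the gateway's settlement code: the function name is hypothetical, and it assumes input and output images are both charged at the model's catalog per-image price.

```python
from decimal import Decimal

# Catalog prices per image, as listed above.
PRICE_PER_IMAGE_USD = {
    "xai/grok-imagine-image": Decimal("0.02"),
    "xai/grok-imagine-image-quality": Decimal("0.04"),
}


def fallback_image_cost_usd(model: str, output_images: int, input_images: int = 0) -> Decimal:
    """Fallback cost when the provider returns no cost_in_usd_ticks.

    Assumes input and output images are billed at the same catalog price.
    """
    billable_images = input_images + output_images
    return PRICE_PER_IMAGE_USD[model] * billable_images


# Generation: one output image, no inputs.
print(fallback_image_cost_usd("xai/grok-imagine-image", output_images=1))  # prints 0.02
# Edit: two input images plus one generated image.
print(fallback_image_cost_usd("xai/grok-imagine-image", output_images=1, input_images=2))  # prints 0.06
```

Decimal is used instead of float so that small per-image amounts add up without rounding drift.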

curl "$LLMTR_BASE_URL/v1/images/generations" \
-H "Authorization: Bearer llmtr-your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "xai/grok-imagine-image",
"prompt": "A clean product photo of a copper desk lamp",
"aspect_ratio": "16:9",
"resolution": "1k",
"response_format": "url",
"n": 1
}'

Image edit:

curl "$LLMTR_BASE_URL/v1/images/edits" \
-H "Authorization: Bearer llmtr-your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "xai/grok-imagine-image",
"prompt": "Render this image as a pencil sketch.",
"image": { "url": "https://example.com/source.jpg" },
"aspect_ratio": "1:1",
"response_format": "url"
}'

For multi-image edits, send up to 5 input images in the images field.

xai/grok-imagine-video is priced at $0.05 per second. The gateway starts the async xAI video request and polls until the result is ready.
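For budgeting, the per-second price gives a quick upper-level estimate. A minimal sketch, assuming the charge is based on the requested duration_seconds (this page does not say whether requested or delivered seconds are billed, and a provider-reported cost takes precedence when returned); the function name is hypothetical:

```python
from decimal import Decimal

VIDEO_PRICE_PER_SECOND_USD = Decimal("0.05")  # catalog price quoted above


def estimate_video_cost_usd(duration_seconds: int) -> Decimal:
    """Estimate cost from the requested duration (assumed billing basis)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return VIDEO_PRICE_PER_SECOND_USD * duration_seconds


print(estimate_video_cost_usd(5))  # prints 0.25
```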

curl "$LLMTR_BASE_URL/v1/video/generations" \
-H "Authorization: Bearer llmtr-your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "xai/grok-imagine-video",
"prompt": "A calm time-lapse of clouds over a mountain ridge",
"duration_seconds": 5,
"aspect_ratio": "16:9",
"resolution": "480p",
"max_wait_seconds": 120
}'

The TTS model is xai/grok-voice-tts; the REST STT model is xai/grok-voice-stt. Streaming STT and realtime voice are not public in this phase.

curl "$LLMTR_BASE_URL/v1/audio/speech" \
-H "Authorization: Bearer llmtr-your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "xai/grok-voice-tts",
"input": "Hello, this is a Grok Voice TTS example.",
"voice": "eve",
"language": "en"
}' \
--output voice.mp3

STT accepts base64 audio in the JSON body. Because billing is per hour of audio, sending duration_seconds for short files reduces the preflight hold.
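Producing the base64 field by hand is error-prone, so a small helper can build the body. This is a sketch only: `build_stt_body` is a hypothetical name, and the field names are taken from the request example on this page.

```python
import base64
import json
from typing import Optional


def build_stt_body(audio: bytes, audio_format: str, duration_seconds: Optional[int] = None) -> str:
    """Build a JSON body for /v1/audio/transcriptions (hypothetical helper)."""
    body = {
        "model": "xai/grok-voice-stt",
        # The endpoint expects the audio bytes as base64 text.
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "audio_format": audio_format,
    }
    # Optional: a known duration lets the gateway size the preflight hold.
    if duration_seconds is not None:
        body["duration_seconds"] = duration_seconds
    return json.dumps(body)


# In practice `audio` would come from a file, e.g. Path("clip.mp3").read_bytes().
print(build_stt_body(b"abc", "mp3", duration_seconds=12))
```

The resulting string can be passed to curl with -d or to any HTTP client as the request body.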

curl "$LLMTR_BASE_URL/v1/audio/transcriptions" \
-H "Authorization: Bearer llmtr-your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "xai/grok-voice-stt",
"audio_base64": "<base64 mp3>",
"audio_format": "mp3",
"language": "en",
"format": true,
"duration_seconds": 12
}'

Billing and availability notes:

  • For text/image/video, LLMTR uses xAI cost_in_usd_ticks when it is returned.
  • Image and video fall back to official deterministic prices only when provider ticks are absent.
  • TTS is calculated from input characters; STT is calculated from provider response duration.
  • xAI server-side tools, files/collections, streaming STT, and realtime voice are not public.