Multimodal Overview

For supported models, send messages[].content as an OpenAI-compatible content-part array instead of a plain string. Currently accepted parts:

text, image_url, input_audio, input_file

Check a single model’s modalities via the catalog:

curl https://llmtr.com/api/models \
-H "Authorization: Bearer llmtr-your_key" \
| jq '.data[] | select(.canonicalId=="google/gemini-2.5-flash") | .modalities'
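The same lookup can be done client-side once the catalog JSON has been parsed. A minimal Python sketch, assuming the response has the shape implied by the jq filter above (a top-level data array whose entries carry canonicalId and modalities). The helper name find_modalities and the sample catalog values are hypothetical, and the real modalities field may be structured differently.

```python
from typing import Any, Optional


def find_modalities(catalog: dict[str, Any], canonical_id: str) -> Optional[Any]:
    """Return the `modalities` field of the model with this canonicalId, or None."""
    for model in catalog.get("data", []):
        if model.get("canonicalId") == canonical_id:
            return model.get("modalities")
    return None


# Hypothetical catalog excerpt; the values are invented for illustration only.
sample_catalog = {
    "data": [
        {"canonicalId": "google/gemini-2.5-flash", "modalities": ["text", "image"]},
        {"canonicalId": "example/text-only", "modalities": ["text"]},
    ]
}

print(find_modalities(sample_catalog, "google/gemini-2.5-flash"))
```

Returning None for an unknown ID keeps the "model not in catalog" case distinct from a model that lists no modalities.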

To list every model that supports a specific operation (image generation, embeddings, TTS, …), use the operation query parameter:

# All image-generating models
curl "https://llmtr.com/api/models?operation=IMAGES_GENERATIONS" \
-H "Authorization: Bearer llmtr-your_key"
# Embedding models
curl "https://llmtr.com/api/models?operation=EMBEDDINGS" \
-H "Authorization: Bearer llmtr-your_key"
# Text-to-speech models
curl "https://llmtr.com/api/models?operation=AUDIO_SPEECH" \
-H "Authorization: Bearer llmtr-your_key"

When a request targets an endpoint that the model does not support, the 400 unsupported_operation response includes error.details.supported_endpoints and error.details.suggested_endpoint, which point you at the correct route. See Errors.
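A client can use those two fields to pick a route to retry against. The following is a rough Python sketch, not the service's documented client behavior: the helper name suggested_endpoint is hypothetical, the endpoint paths in the sample are placeholders, and the position of the unsupported_operation code inside the body is an assumption. Only the two keys under error.details come from the text above.

```python
from typing import Any, Optional


def suggested_endpoint(error_body: dict[str, Any]) -> Optional[str]:
    """Prefer error.details.suggested_endpoint; fall back to the first supported endpoint."""
    details = (error_body.get("error") or {}).get("details") or {}
    suggestion = details.get("suggested_endpoint")
    if suggestion:
        return suggestion
    supported = details.get("supported_endpoints") or []
    return supported[0] if supported else None


# Hypothetical 400 body; paths are placeholders, not real routes.
sample_error = {
    "error": {
        "code": "unsupported_operation",  # assumed location of the error code
        "details": {
            "supported_endpoints": ["/v1/images/generations"],
            "suggested_endpoint": "/v1/images/generations",
        },
    }
}

print(suggested_endpoint(sample_error))
```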

A request body that combines a text part with an image part looks like this:

{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://..." }
        }
      ]
    }
  ]
}
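The request body above can also be assembled programmatically, which helps avoid malformed content-part arrays. A small Python sketch; the helper names text_part, image_url_part, and user_message are hypothetical, and the image URL is a placeholder.

```python
import json
from typing import Any


def text_part(text: str) -> dict[str, Any]:
    """Build a `text` content part."""
    return {"type": "text", "text": text}


def image_url_part(url: str) -> dict[str, Any]:
    """Build an `image_url` content part pointing at a remote image."""
    return {"type": "image_url", "image_url": {"url": url}}


def user_message(*parts: dict[str, Any]) -> dict[str, Any]:
    """Wrap content parts in a single user message."""
    return {"role": "user", "content": list(parts)}


body = {
    "messages": [
        user_message(
            text_part("What's in this image?"),
            image_url_part("https://example.com/cat.png"),  # placeholder URL
        )
    ]
}

print(json.dumps(body, indent=2))
```

A real request would normally carry further fields alongside messages (such as the model to use), which are left out here because the example above does not show them.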
  • Media is sent through the JSON body. Remote URLs are safer than inline base64, which inflates the request body by roughly a third.
  • Keep inline base64 audio clips short (< 1 MB recommended).
  • For large files, PDFs, video, or reusable media, use the Files API.
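The size recommendation for inline audio can be enforced before a request is sent. A hedged Python sketch: the shape of the input_audio part (a data field holding base64 plus a format field) follows the OpenAI content-part convention and is an assumption here, as is reading "< 1 MB" as 1,000,000 characters of encoded data. The helper name input_audio_part is hypothetical.

```python
import base64
from typing import Any

MAX_INLINE_BYTES = 1_000_000  # assumed reading of the "< 1 MB" recommendation


def input_audio_part(audio: bytes, fmt: str = "wav") -> dict[str, Any]:
    """Encode raw audio as an inline base64 part, refusing oversized clips."""
    encoded = base64.b64encode(audio).decode("ascii")
    if len(encoded) >= MAX_INLINE_BYTES:
        raise ValueError("clip too large for inline base64; consider the Files API")
    # Part layout assumed from the OpenAI-compatible format.
    return {"type": "input_audio", "input_audio": {"data": encoded, "format": fmt}}


part = input_audio_part(b"\x00\x01" * 8)  # 16 dummy bytes, not real audio
print(part["type"], len(part["input_audio"]["data"]))
```

Checking the encoded length rather than the raw length is the more conservative choice, since the encoded string is what actually travels in the JSON body.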