# Multimodal Overview
For supported models, send `messages[].content` as an OpenAI-compatible content-part array instead of a plain string. Currently accepted parts:

- `text`
- `image_url`
- `input_audio`
- `input_file`

## When should you use it?

Check a model's modalities via the catalog:
```sh
curl https://llmtr.com/api/models \
  -H "Authorization: Bearer sk_your_key" \
  | jq '.data[] | select(.canonicalId=="google/gemini-2.5-flash") | .modalities'
```

## Content-part shape
```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        { "type": "image_url", "image_url": { "url": "https://..." } }
      ]
    }
  ]
}
```

## Limits and notes

- Media is sent in the JSON request body. Remote URLs are safer than inline base64.
- Keep inline base64 audio clips short (< 1 MB recommended).
- For large files, PDFs, video, or reusable media, use the Files API.
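To tie the notes above together, here is a sketch in Python that assembles a mixed text + image + audio message in the content-part shape shown earlier and enforces the inline-audio size guideline. The `input_audio` part shape (`data` + `format`) follows the OpenAI-compatible convention the page references; the URLs, the stand-in audio bytes, and the exact 1 MB cutoff are assumptions for illustration, not confirmed API details.

```python
import base64
import json

MAX_INLINE_AUDIO_B64 = 1_000_000  # ~1 MB guideline for inline base64 audio (assumption)

def audio_part(raw_bytes: bytes, fmt: str = "wav") -> dict:
    """Encode a short clip as an inline input_audio content part."""
    b64 = base64.b64encode(raw_bytes).decode("ascii")
    if len(b64) > MAX_INLINE_AUDIO_B64:
        # Oversized media belongs in the Files API, not the JSON body.
        raise ValueError("clip too large to inline; use the Files API instead")
    return {"type": "input_audio", "input_audio": {"data": b64, "format": fmt}}

clip = b"\x00" * 16_000  # stand-in for real audio bytes (assumption)
payload = {
    "model": "google/gemini-2.5-flash",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image and transcribe the clip."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                audio_part(clip),
            ],
        }
    ],
}

# This JSON string is the request body you would send with your Bearer key.
body = json.dumps(payload)
```

Because everything travels in the JSON body, the encoded size of `payload` is what counts against request limits; the size check lives in `audio_part` so an oversized clip fails before the request is built.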