MiniMax / minimax/minimax-m2.5-highspeed

MiniMax M2.5 Highspeed - access through LLMTR

MiniMax M2.5 Highspeed is the faster inference variant in the M2.5 family. It is suited to editor assistance, short refactor loops, quick technical answers, and latency-sensitive chat experiences, while retaining the full 204,800-token context window.

Technical specifications

Canonical ID:    minimax/minimax-m2.5-highspeed
Provider:        MiniMax
Context window:  204,800 tokens
Operations:      CHAT_COMPLETIONS
Modalities:      text

Pricing

A 6% platform margin applies to credit top-ups; model usage prices are not separately marked up.

Operation         Metric       Unit           Price
CHAT_COMPLETIONS  CACHE_READ   PER_1M_TOKENS  $0.030000
CHAT_COMPLETIONS  CACHE_WRITE  PER_1M_TOKENS  $0.375000
CHAT_COMPLETIONS  INPUT_TEXT   PER_1M_TOKENS  $0.600000
CHAT_COMPLETIONS  OUTPUT_TEXT  PER_1M_TOKENS  $2.400000
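The per-metric prices above can be combined into a simple cost estimate. A minimal sketch, with hypothetical token counts (the dictionary keys mirror the table's metric names):

```python
# Sketch: estimating the USD cost of one request from the pricing table above.
# All prices are USD per 1M tokens.
PRICE_PER_1M = {
    "input_text": 0.60,
    "output_text": 2.40,
    "cache_read": 0.03,
    "cache_write": 0.375,
}

def cost_usd(tokens_by_metric: dict[str, int]) -> float:
    """Sum the per-metric charges for a single request."""
    return sum(
        tokens_by_metric.get(metric, 0) / 1_000_000 * price
        for metric, price in PRICE_PER_1M.items()
    )

# Example: a 100k-token prompt that produces a 2k-token answer.
print(round(cost_usd({"input_text": 100_000, "output_text": 2_000}), 4))  # 0.0648
```

Note that cache reads are 20x cheaper than fresh input tokens, so prompt caching dominates savings for long, repeated system prompts.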

Example usage

To reuse existing OpenAI SDK flows, change only the base URL and the model identifier.

curl https://llmtr.com/v1/chat/completions \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"minimax/minimax-m2.5-highspeed","messages":[{"role":"user","content":"Hello"}]}'

Related models