MiniMax M3: Open-Weight Model Hits 1M-Token Context and Outperforms GPT-5.5 on Coding
Summary: MiniMax shipped M3 on June 1, built on a new architecture called MiniMax Sparse Attention (MSA). The company claims M3 is the first open-weight model to combine a 1M-token context, native multimodal input, and frontier-level coding performance.
Key Facts
- MSA architecture: 9× faster prefill and 15× faster decoding at 1M-token context vs M2; 1/20th the per-token compute
- Benchmark: SWE-Bench Pro score of 59.0% — above GPT-5.5 and Gemini 3.1 Pro; priced at an estimated 5–10% of GPT-5.5 API cost
- Multimodal: natively handles image and video input and can operate a desktop computer
- API available immediately; model weights and technical report to be released within 10 days of launch
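To make the ratios above concrete, here is a minimal Python sketch that applies the quoted speedups and the estimated price fraction to baseline figures. The ratios come from the key facts above and are vendor claims; the baseline latencies and the baseline bill in the usage section are hypothetical placeholders, not measurements, and the function names are illustrative only.

```python
from typing import Tuple

# Ratios quoted in the key facts (vendor claims at 1M-token context, vs M2).
PREFILL_SPEEDUP = 9.0
DECODE_SPEEDUP = 15.0
# Estimated M3 price as a fraction of GPT-5.5 API cost (5% to 10%).
PRICE_FRACTION_RANGE = (0.05, 0.10)


def scaled_latency(baseline_seconds: float, speedup: float) -> float:
    """Return the latency implied by applying a speedup factor to a baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


def estimated_cost_range(baseline_cost: float) -> Tuple[float, float]:
    """Return the (low, high) cost implied by the estimated price fraction."""
    low, high = PRICE_FRACTION_RANGE
    return (baseline_cost * low, baseline_cost * high)


if __name__ == "__main__":
    # Hypothetical baselines, chosen only to illustrate the arithmetic.
    baseline_prefill_s = 90.0
    baseline_decode_s = 30.0
    baseline_monthly_bill = 1000.0

    print("prefill (s):", scaled_latency(baseline_prefill_s, PREFILL_SPEEDUP))
    print("decode (s):", scaled_latency(baseline_decode_s, DECODE_SPEEDUP))
    print("cost range:", estimated_cost_range(baseline_monthly_bill))
```

Substituting measured M2 latencies or a real GPT-5.5 bill for the placeholders gives a rough estimate only, since the quoted ratios apply to the 1M-token setting and the pricing figure is itself an estimate.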
Why It Matters
MiniMax claims M3 is the first model to combine open weights, a 1M-token context, and native multimodality. Frontier-level coding performance at a fraction of closed-API costs puts real pressure on OpenAI's and Anthropic's enterprise pricing, especially for agentic workflows and long-document processing, where context length is the bottleneck.
Read More
- MiniMax Releases MiniMax M3 with MSA Architecture — MarkTechPost
- MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro — VentureBeat