MiniMax-M2.7: a 229B model that engineers itself
- Bastien
- 13 Apr, 2026
What is MiniMax-M2.7?
MiniMax-M2.7 is a 229B-parameter dense model from MiniMax, a Beijing-based AI lab. Unlike most frontier models that iterate through human-supervised training cycles, M2.7’s defining claim is self-evolution: the model participated in its own post-training loop, autonomously analyzing failure trajectories, modifying code, and running evaluations across 100+ optimization rounds — achieving a 30% performance uplift without human intervention.
The result is a model that matches GPT-5.3-Codex on SWE-Pro and surpasses GPT-5.3 on professional work benchmarks, while remaining fully open-weight under a Modified-MIT license.
Architecture
M2.7 uses a dense Transformer architecture with 229B total parameters. The model supports BF16, FP32, and FP8 (E4M3) precision formats, and ships with deployment guides for SGLang, vLLM, Transformers, ModelScope, and NVIDIA NIM.
The architecture focuses on sustained agent interaction rather than single-turn generation. MiniMax designed it to handle multi-round tool calling, autonomous memory updates, and long-horizon task execution — the kind of workload where most models degrade after a few dozen steps.
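The multi-round workload described above can be sketched as a minimal agent harness. This is an illustrative loop only; `call_model` and the `read_file` tool are stand-ins for an inference backend and a tool registry, not MiniMax APIs:

```python
def call_model(messages):
    """Stand-in for an inference call. Returns either a tool request or a
    final answer; here it terminates after one canned tool round."""
    if any(m["role"] == "tool" for m in messages):
        return {"final": "done"}
    return {"tool": "read_file", "args": {"path": "README.md"}}

# Hypothetical tool registry the harness exposes to the model.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
}

def agent_loop(task, max_rounds=50):
    """Multi-round tool calling: execute each requested tool and feed the
    result back to the model until it emits a final answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_rounds):
        reply = call_model(messages)
        if "final" in reply:
            return reply["final"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return None  # round budget exhausted without a final answer

print(agent_loop("summarize the repo"))
```

The "long-horizon" claim is about how many of these rounds the model can sustain before its tool use degrades, which is what Terminal Bench 2 and MM Claw End-to-End stress.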
Benchmark results
Software engineering
| Benchmark | MiniMax-M2.7 | Reference |
|---|---|---|
| SWE-Pro | 56.2% | matches GPT-5.3-Codex |
| SWE Multilingual | 76.5% | — |
| Multi SWE Bench | 52.7% | — |
| VIBE-Pro | 55.6% | near Opus 4.6 |
| Terminal Bench 2 | 57.0% | — |
| NL2Repo | 39.8% | — |
SWE-Pro tests multi-file, multi-step issue resolution in real codebases. M2.7 matches the Codex-optimized GPT-5.3 variant on this benchmark. VIBE-Pro — which measures creative coding and UI generation — lands within a point of Opus 4.6.
ML engineering
| Benchmark | MiniMax-M2.7 | Detail |
|---|---|---|
| MLE Bench Lite | 66.6% medal rate | 9 gold, 5 silver, 1 bronze (best run) |
MLE Bench Lite spans 22 Kaggle-style ML competitions. M2.7’s 66.6% medal rate trails only Opus 4.6 and GPT-5.4.
Professional work and tool use
| Benchmark | MiniMax-M2.7 | Reference |
|---|---|---|
| GDPval-AA ELO | 1495 | highest among open-weight, surpasses GPT-5.3 |
| Toolathon | 46.3% | global top tier |
| MM Claw Skills Compliance | 97% | across 40+ complex skills |
| MM Claw End-to-End | 62.7% | close to Sonnet 4.6 |
The GDPval-AA ELO of 1495 is particularly notable — it is the highest score among all open-weight models and surpasses GPT-5.3 on professional document processing tasks. MM Claw tests complex skill adherence across extended interactions: 97% compliance across 40+ skills (each exceeding 2,000 tokens) demonstrates sustained instruction following.
What makes it different: self-evolution
M2.7 is MiniMax’s first model that deeply participates in its own evolution. During post-training, the model ran autonomous optimization loops: analyzing its own failure trajectories, modifying scaffolding code, running evaluations, and iterating — over 100 rounds without human intervention.
This produced a 30% performance improvement on internal benchmarks. MiniMax reports that a research agent harness built on M2.7 now handles 30–50% of their RL team’s workflows autonomously.
The self-evolution approach also extends to deployment: M2.7 supports autonomous memory updates and dynamic tool search, meaning it can adapt its behavior within a session based on what it learns.
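MiniMax has not published the internals of this optimization loop, so the mechanics can only be approximated. A toy hill-climbing sketch captures the reported shape of it (analyze, patch the scaffold, re-evaluate, keep improvements); the scoring function and patch policy here are invented for illustration:

```python
def evaluate(scaffold):
    """Stand-in for running the eval suite against a scaffold configuration.
    Toy model: score improves with more retries and a larger context budget."""
    return min(1.0, 0.4 + 0.05 * scaffold["retries"] + 0.01 * scaffold["context_steps"])

def propose_patch(scaffold):
    """Stand-in for the model editing its own scaffolding code after
    analyzing failure trajectories. Here it just bumps one knob."""
    patched = dict(scaffold)
    patched["retries"] += 1
    return patched

def self_evolve(rounds=100):
    """Run `rounds` autonomous optimization rounds, keeping only patches
    that improve the evaluation score (greedy hill climbing)."""
    scaffold = {"retries": 0, "context_steps": 10}
    best = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_patch(scaffold)
        score = evaluate(candidate)
        if score > best:
            scaffold, best = candidate, score
    return best
```

The real loop presumably proposes code-level patches rather than tuning a fixed knob, and its acceptance criterion may be richer than greedy improvement, but the iterate-evaluate-keep structure is what "100+ optimization rounds" implies.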
Agent teams and complex skills
Beyond single-agent performance, M2.7 natively supports multi-agent collaboration — what MiniMax calls “agent teams.” This includes:
- Stable role identity: each agent in a team maintains its assigned role across extended interactions
- Autonomous decision-making: agents can independently decide when to delegate, escalate, or act
- Adversarial reasoning: agents can challenge each other’s conclusions, reducing hallucination in collaborative settings
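Two of these traits, stable role identity and adversarial reasoning, can be illustrated with a toy two-agent exchange. The roles, the justification check, and the revision step below are all invented for illustration; MiniMax has not published its agent-team harness:

```python
class Agent:
    """Toy agent that keeps a fixed role across the interaction
    (the "stable role identity" trait)."""
    def __init__(self, role):
        self.role = role

    def act(self, claim):
        if self.role == "reviewer":
            # Adversarial reasoning: challenge claims with no justification.
            return "challenge" if "because" not in claim else "accept"
        return claim  # worker agents pass their claim forward

def run_team(task):
    worker, reviewer = Agent("worker"), Agent("reviewer")
    claim = worker.act(f"{task}: use quicksort")  # unjustified first draft
    if reviewer.act(claim) == "challenge":
        # Worker revises with a supporting reason after the challenge.
        claim = worker.act(f"{task}: use quicksort because average O(n log n)")
    return reviewer.act(claim)

print(run_team("sort"))
```

The point of the adversarial step is that a second agent rejecting unsupported conclusions gives the team a cheap hallucination filter, which is the mechanism MiniMax credits for reduced hallucination in collaborative settings.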
MiniMax also built dozens of complex skills for RL experiments, each exceeding 2,000 tokens of structured behavior. The model maintains 97% adherence to these skill definitions during execution — a metric they call “skill compliance.”
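MiniMax does not publish the exact definition of "skill compliance," but the natural reading is the fraction of skill-governed actions that matched the skill definition. A toy version of that metric, with a hypothetical trajectory format:

```python
def skill_compliance(trajectories):
    """Fraction of actions that stayed within the skill's allowed set.
    Each trajectory is a list of (action, allowed_actions) pairs; the
    format is an assumption, not MiniMax's published metric."""
    checked = violations = 0
    for trajectory in trajectories:
        for action, allowed in trajectory:
            checked += 1
            if action not in allowed:
                violations += 1
    return 1 - violations / checked

runs = [
    [("search", {"search", "read"}), ("read", {"search", "read"})],
    [("write", {"read"})],  # one out-of-skill action
]
print(round(skill_compliance(runs), 2))  # 0.67
```

Under this reading, 97% compliance means roughly 3 in 100 actions deviated from a skill definition that itself runs past 2,000 tokens, which is the sustained-instruction-following claim.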
In production, M2.7 has demonstrated system-level reasoning capabilities: log analysis, trace analysis, root cause verification, and production incident recovery in under 3 minutes across multiple real-world scenarios.
Deployment
M2.7 is available through multiple channels:
- MiniMax Agent: agent.minimax.io
- MiniMax API: platform.minimax.io
- NVIDIA NIM: build.nvidia.com
For local deployment, MiniMax recommends the following frameworks (in order of preference):
- SGLang — primary recommendation
- vLLM
- Transformers
Recommended inference parameters: temperature=1.0, top_p=0.95, top_k=40.
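With an OpenAI-compatible server such as vLLM or SGLang, the recommended parameters map directly onto the request body. The model id follows the model card; whether a given endpoint accepts `top_k` in the body (vLLM does; some OpenAI-compatible gateways drop it) is an assumption to verify for your deployment:

```python
import json

# Chat request using MiniMax's recommended sampling parameters.
payload = {
    "model": "MiniMaxAI/MiniMax-M2.7",
    "messages": [
        {"role": "user", "content": "Write a binary search in Python."},
    ],
    "temperature": 1.0,  # recommended
    "top_p": 0.95,       # recommended
    "top_k": 40,         # recommended; supported by vLLM/SGLang samplers
}
print(json.dumps(payload, indent=2))
```

Note that `temperature=1.0` is unusually high for coding models; MiniMax presumably tuned the model's output distribution for it, so lowering it may not improve determinism-sensitive tasks the way it does elsewhere.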
Thirty-nine quantized variants are available for local deployment via llama.cpp, LM Studio, Jan, and Ollama.
Limitations
MiniMax does not publicly disclose the context window length or detailed architecture specifications (layer count, head count, vocabulary size) for M2.7. The model is text-only — it supports office document processing (Word, Excel, PPT) but has no native vision or audio modalities.
The self-evolution capability, while impressive on internal benchmarks, has not been independently verified by third parties. Multi-agent team features require specific harness configurations that may not be straightforward to replicate in all deployment scenarios.
Conclusion
MiniMax-M2.7 introduces a genuinely novel training paradigm: a model that engineers its own improvement. Matching GPT-5.3-Codex on software engineering, leading open-weight models on professional work (ELO 1495), and sustaining 97% skill compliance across complex agent tasks makes M2.7 a serious contender for teams building autonomous coding and research agents.
The Modified-MIT license and broad deployment support (SGLang, vLLM, NIM, plus 39 quantization formats) lower the barrier to self-hosting. For teams that need an open-weight model capable of sustained multi-agent work, M2.7 is now the benchmark to beat.
Model: MiniMaxAI/MiniMax-M2.7 · Blog: minimax.io/news/minimax-m27-en
Tags:
- AI
- MiniMax
- LLM
- Agentic
- Open Source
- Coding