MiniMax-M2.7: a 229B model that engineers itself

What is MiniMax-M2.7?

MiniMax-M2.7 is a 229B-parameter dense model from MiniMax, a Beijing-based AI lab. Unlike most frontier models that iterate through human-supervised training cycles, M2.7’s defining claim is self-evolution: the model participated in its own post-training loop, autonomously analyzing failure trajectories, modifying code, and running evaluations across 100+ optimization rounds — achieving a 30% performance uplift without human intervention.

The result is a model that matches GPT-5.3-Codex on SWE-Pro and surpasses GPT-5.3 on professional work benchmarks, while remaining fully open-weight under a Modified-MIT license.


Architecture

M2.7 uses a dense Transformer architecture with 229B total parameters. The model supports BF16, FP32, and FP8 (E4M3) precision formats, and ships with deployment guides for SGLang, vLLM, Transformers, ModelScope, and NVIDIA NIM.

The architecture focuses on sustained agent interaction rather than single-turn generation. MiniMax designed it to handle multi-round tool calling, autonomous memory updates, and long-horizon task execution — the kind of workload where most models degrade after a few dozen steps.
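The multi-round tool-calling workload described above can be sketched as a simple control loop: the model either requests a tool or produces a final answer, and tool results are fed back into the conversation. This is a minimal illustration, not MiniMax's harness; `call_model` is a stand-in for a real M2.7 inference endpoint, stubbed here so the control flow runs end to end.

```python
# Minimal sketch of a multi-round tool-calling loop, the workload M2.7 is
# tuned for. `call_model` is a hypothetical stand-in for a real inference
# endpoint; here it is stubbed so the control flow can run end to end.

def call_model(messages):
    # Stub: a real deployment would send `messages` to an M2.7 endpoint.
    # Pretend the model asks for a tool once, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "word_count",
                              "args": {"text": "hello agent world"}}}
    return {"content": "The text has 3 words."}

TOOLS = {"word_count": lambda text: len(text.split())}

def run_agent(user_prompt, max_rounds=10):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_rounds):
        reply = call_model(messages)
        tool_call = reply.get("tool_call")
        if tool_call is None:
            return reply["content"]  # final answer ends the loop
        result = TOOLS[tool_call["name"]](**tool_call["args"])
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not converge")

print(run_agent("How many words are in 'hello agent world'?"))
```

The interesting failure mode is the one M2.7 targets: with a real model, each extra round compounds error, which is why the `max_rounds` guard matters in practice.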


Benchmark results

Software engineering

| Benchmark        | MiniMax-M2.7 | Reference             |
|------------------|--------------|-----------------------|
| SWE-Pro          | 56.2%        | matches GPT-5.3-Codex |
| SWE Multilingual | 76.5%        |                       |
| Multi SWE Bench  | 52.7%        |                       |
| VIBE-Pro         | 55.6%        | near Opus 4.6         |
| Terminal Bench 2 | 57.0%        |                       |
| NL2Repo          | 39.8%        |                       |

SWE-Pro tests multi-file, multi-step issue resolution in real codebases. M2.7 matches the Codex-optimized GPT-5.3 variant on this benchmark. VIBE-Pro — which measures creative coding and UI generation — lands within a point of Opus 4.6.

ML engineering

| Benchmark      | MiniMax-M2.7     | Detail                                |
|----------------|------------------|---------------------------------------|
| MLE Bench Lite | 66.6% medal rate | 9 gold, 5 silver, 1 bronze (best run) |

MLE Bench Lite spans 22 Kaggle-style ML competitions. M2.7’s 66.6% medal rate places it behind only Opus 4.6 and GPT-5.4.

Professional work and tool use

| Benchmark                 | MiniMax-M2.7 | Reference                                    |
|---------------------------|--------------|----------------------------------------------|
| GDPval-AA ELO             | 1495         | highest among open-weight, surpasses GPT-5.3 |
| Toolathon                 | 46.3%        | global top tier                              |
| MM Claw Skills Compliance | 97%          | across 40+ complex skills                    |
| MM Claw End-to-End        | 62.7%        | close to Sonnet 4.6                          |

The GDPval-AA ELO of 1495 is particularly notable — it is the highest score among all open-weight models and surpasses GPT-5.3 on professional document processing tasks. MM Claw tests complex skill adherence across extended interactions: 97% compliance across 40+ skills (each exceeding 2,000 tokens) demonstrates sustained instruction following.


What makes it different: self-evolution

M2.7 is MiniMax’s first model that deeply participates in its own evolution. During post-training, the model ran autonomous optimization loops: analyzing its own failure trajectories, modifying scaffolding code, running evaluations, and iterating — over 100 rounds without human intervention.
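The cycle described above (analyze failures, patch the scaffolding, re-evaluate, keep what improves) can be sketched as a greedy acceptance loop. Every function here is a hypothetical stub; MiniMax has not published the actual harness, so this only illustrates the loop's shape.

```python
# Sketch of the self-evolution cycle: propose a scaffolding patch,
# score it, keep it only if it beats the current best. All functions
# are hypothetical stubs standing in for an unpublished harness.
import random

def evaluate(scaffold_version):
    # Stub scoring in [0, 1]; a real run would execute benchmark tasks.
    return random.Random(scaffold_version).random()

def propose_patch(scaffold_version, round_idx):
    # Stub: the model would analyze failure trajectories and rewrite
    # scaffolding code. Here each round just yields a fresh version id.
    return scaffold_version * 1000 + round_idx

def self_evolve(rounds=100):
    best, best_score = 0, evaluate(0)
    for i in range(1, rounds + 1):
        candidate = propose_patch(best, i)
        score = evaluate(candidate)
        if score > best_score:  # greedy acceptance: keep improvements only
            best, best_score = candidate, score
    return best_score

print(f"baseline {evaluate(0):.3f} -> evolved {self_evolve():.3f}")
```

Greedy acceptance guarantees the evolved score never falls below the baseline, which is the monotonicity property a self-improvement loop needs before anyone trusts it to run 100+ rounds unattended.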

This produced a 30% performance improvement on internal benchmarks. MiniMax reports that a research agent harness built on M2.7 now handles 30–50% of their RL team’s workflows autonomously.

The self-evolution approach also extends to deployment: M2.7 supports autonomous memory updates and dynamic tool search, meaning it can adapt its behavior within a session based on what it learns.


Agent teams and complex skills

Beyond single-agent performance, M2.7 natively supports multi-agent collaboration — what MiniMax calls “agent teams.” This includes:

  • Stable role identity: each agent in a team maintains its assigned role across extended interactions
  • Autonomous decision-making: agents can independently decide when to delegate, escalate, or act
  • Adversarial reasoning: agents can challenge each other’s conclusions, reducing hallucination in collaborative settings
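The adversarial-reasoning pattern in particular can be sketched with two fixed-role agents: a worker proposes an answer and a critic independently checks it. Both agents here are rule-based toys of my own construction; in a real deployment each role would be an M2.7 instance with its own system prompt.

```python
# Toy sketch of the "agent team" pattern: a worker proposes, a critic
# with a fixed role challenges. Both are hypothetical rule-based stubs.

def worker(task):
    # Deliberately over-eager: claims every number is prime.
    return {"task": task, "claim": f"{task} is prime"}

def critic(proposal):
    # Independently re-derives the fact instead of trusting the claim.
    n = proposal["task"]
    is_prime = n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))
    claimed_prime = proposal["claim"].endswith("is prime")
    return claimed_prime == is_prime

def team_answer(task):
    proposal = worker(task)
    return "accepted" if critic(proposal) else "rejected"

print(team_answer(7), team_answer(8))  # 7 is prime, 8 is not
```

The point of the pattern is that the critic never sees the worker's reasoning, only its claim, so a confabulated answer has nothing to hide behind.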

MiniMax also built dozens of complex skills for RL experiments, each exceeding 2,000 tokens of structured behavior. The model maintains 97% adherence to these skill definitions during execution — a metric they call “skill compliance.”
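One plausible reading of the skill-compliance figure is a simple adherence ratio: the fraction of skill-defined checks a trajectory satisfies. This metric definition is an assumption on my part (MiniMax has not published the formula), and the skill below is invented for illustration.

```python
# Hypothetical reading of "skill compliance" as an adherence ratio:
# the fraction of skill-defined checks a trajectory satisfies.

def skill_compliance(trajectory, checks):
    """Fraction of checks (predicates over the trajectory) that pass."""
    passed = sum(1 for check in checks if check(trajectory))
    return passed / len(checks)

# Invented trajectory for a made-up "summarize then cite" skill.
trajectory = ["read_doc", "draft_summary", "add_citations"]
checks = [
    lambda t: "read_doc" in t,           # must read before writing
    lambda t: "draft_summary" in t,
    lambda t: t[-1] == "add_citations",  # citations must come last
    lambda t: "hallucinate" not in t,
]
print(f"{skill_compliance(trajectory, checks):.0%}")
```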

In production, M2.7 has demonstrated system-level reasoning capabilities: log analysis, trace analysis, root cause verification, and production incident recovery in under 3 minutes across multiple real-world scenarios.


Deployment

M2.7 can be deployed through several supported stacks.

For local deployment, MiniMax recommends the following frameworks (in order of preference):

  1. SGLang — primary recommendation
  2. vLLM
  3. Transformers

Recommended inference parameters: temperature=1.0, top_p=0.95, top_k=40.
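Packaged as a request, the recommended parameters look like the payload below. The endpoint model id is taken from the model card; note that `top_k` is not part of the base OpenAI schema but is accepted as an extension field by the vLLM and SGLang OpenAI-compatible servers.

```python
# Recommended sampling parameters as an OpenAI-compatible chat payload.
# `top_k` is a vLLM/SGLang extension field, not base OpenAI schema.
import json

payload = {
    "model": "MiniMaxAI/MiniMax-M2.7",
    "messages": [{"role": "user", "content": "Summarize this stack trace."}],
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,
    "stream": False,
}
print(json.dumps(payload, indent=2))
```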

39 quantized variants are available for local deployment via llama.cpp, LM Studio, Jan, and Ollama.


Limitations

MiniMax does not publicly disclose the context window length or detailed architecture specifications (layer count, head count, vocabulary size) for M2.7. The model is text-only — it supports office document processing (Word, Excel, PPT) but has no native vision or audio modalities.

The self-evolution capability, while impressive on internal benchmarks, has not been independently verified by third parties. Multi-agent team features require specific harness configurations that may not be straightforward to replicate in all deployment scenarios.


Conclusion

MiniMax-M2.7 introduces a genuinely novel training paradigm: a model that engineers its own improvement. Matching GPT-5.3-Codex on software engineering, leading open-weight models on professional work (ELO 1495), and sustaining 97% skill compliance across complex agent tasks makes M2.7 a serious contender for teams building autonomous coding and research agents.

The Modified-MIT license and broad deployment support (SGLang, vLLM, NIM, plus 39 quantization formats) lower the barrier to self-hosting. For teams that need an open-weight model capable of sustained multi-agent work, M2.7 is now the benchmark to beat.

Model: MiniMaxAI/MiniMax-M2.7 · Blog: minimax.io/news/minimax-m27-en

