GLM-5.1: 754B parameters — Z.ai's agentic engineering flagship
- Bastien
- 08 Apr, 2026
From GLM-5 to GLM-5.1: the agentic leap
Less than two weeks after releasing GLM-5, Z.ai (formerly ZhipuAI) ships GLM-5.1 — a 754B-parameter Mixture-of-Experts model that does not just iterate on its predecessor; it redefines what an open-weight model can do on long-horizon agent tasks. The tagline says it all: from Vibe Coding to Agentic Engineering.
Where GLM-5 proved the architecture could compete on benchmarks, GLM-5.1 proves it can sustain performance across hundreds of reasoning rounds and thousands of tool calls — the kind of workload that breaks most models long before they reach a solution.
Architecture: MoE with Dynamic Sparse Attention
GLM-5.1 uses the GLM_MOE_DSA architecture — Mixture of Experts with Dynamic Sparse Attention. The 754B total parameter count is slightly larger than GLM-5’s 744B, but the real difference is in training and post-training.
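The core MoE idea is that a lightweight gate routes each token to a small subset of experts, so only a fraction of the 754B parameters is active per token. The sketch below is purely illustrative — a toy top-k router, not Z.ai's published routing algorithm; the expert count and `k` are placeholders.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(gate_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(gate_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    return [(i, probs[i] / total) for i in topk]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(8)]  # gate scores for 8 toy experts
routing = route_token(logits, k=2)
print(routing)  # two (expert_index, weight) pairs; the weights sum to 1
```

The token's output is then the weighted sum of the selected experts' outputs, which is what keeps per-token compute roughly constant as total parameters grow.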
The Dynamic Sparse Attention mechanism reduces compute cost on long sequences by selectively attending to the most relevant tokens. Combined with the MoE routing, this keeps per-token inference manageable even as context windows stretch past 200K tokens.
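One simple way to picture sparse attention is top-k selection: each query scores all keys but only softmaxes over the k highest-scoring ones, so cost per query no longer grows with the softmax over the full context. This is a minimal sketch of that idea, not the actual Dynamic Sparse Attention mechanism, whose selection rule Z.ai has not detailed here.

```python
import math

def sparse_attention(q, keys, values, k=2):
    """Attend only to the k keys with the highest scaled dot-product score."""
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(len(q))
              for key in keys]
    topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in topk)
    w = [math.exp(scores[i] - m) for i in topk]  # softmax over selected keys only
    z = sum(w)
    dim = len(values[0])
    out = [0.0] * dim
    for wi, i in zip(w, topk):
        for d in range(dim):
            out[d] += (wi / z) * values[i][d]
    return out

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [2.0, 0.0]]
values = [[float(i)] for i in range(5)]
print(sparse_attention([1.0, 0.0], keys, values, k=2))
```

In a real implementation the selection itself must be cheap (e.g. block-level or learned routing); a naive full scoring pass, as above, would defeat the purpose.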
The model ships in BF16/FP32, with quantized variants available via llama.cpp, LM Studio, Jan, and Ollama (9 quantization formats).
Benchmark results
Agentic engineering and coding
| Benchmark | GLM-5.1 | GLM-5 |
|---|---|---|
| SWE-Bench Pro | 58.4% | — |
| SWE-bench Verified | — | 77.8% |
| Terminal-Bench 2.0 | 63.5% | 56.2% |
| Terminal-Bench (Claude Code) | 69.0% | — |
| NL2Repo | 42.7% | — |
| Tool-Decathlon | 40.7% | — |
| τ³-Bench | 70.6% | — |
SWE-Bench Pro is a harder variant of SWE-bench that tests multi-file, multi-step issue resolution, and GLM-5.1 leads on it at 58.4%. Terminal-Bench 2.0 jumps 7.3 points over GLM-5, and the new NL2Repo benchmark tests full repository generation from natural language specifications.
Mathematics and reasoning
| Benchmark | GLM-5.1 | GLM-5 |
|---|---|---|
| AIME 2026 | 95.3% | 92.7% |
| HMMT Nov 2025 | 94.0% | 96.9% |
| GPQA-Diamond | 86.2% | 86.0% |
| IMOAnswerBench | 83.8% | — |
| HLE (with tools) | 52.3% | 50.4% |
AIME and GPQA-Diamond both improve. HMMT drops slightly (94.0% vs 96.9%), suggesting that post-training prioritized agentic tasks over pure competition math.
Cybersecurity and browsing
| Benchmark | GLM-5.1 | GLM-5 |
|---|---|---|
| CyberGym | 68.7% | 43.2% |
| BrowseComp | 68.0% | — |
The CyberGym jump from 43.2% to 68.7% is striking — a +25.5 point improvement that reflects better tool use and iterative reasoning in adversarial environments.
What makes it different: long-horizon persistence
Most large models plateau after a few dozen reasoning steps. GLM-5.1 was specifically post-trained to sustain performance over extended agent sessions — hundreds of rounds, thousands of tool calls, and repeated strategy revisions.
This is the key differentiator. On benchmarks like Terminal-Bench and SWE-Bench Pro, success depends not on getting the first attempt right, but on iterating: running experiments, reading error messages, revising approaches, and trying again. GLM-5.1 does this without losing coherence or context.
Z.ai describes this as the shift from “vibe coding” (generating plausible code in one shot) to “agentic engineering” (systematically solving complex problems through sustained interaction).
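The iterate-on-feedback loop described above can be sketched in a few lines. This is a toy agent loop with a fake tool standing in for a test runner — none of it reflects GLM-5.1's actual scaffolding; it only illustrates the run/read/revise/retry cycle that long-horizon benchmarks reward.

```python
def run_tests(patch):
    """Toy stand-in for a tool call: tests pass once the patch has three fixes."""
    if patch < 3:
        return False, f"AssertionError: case {patch} still failing"
    return True, "all tests passed"

def agent_loop(max_rounds=10):
    """Iterate: run the tool, read the error, revise, try again."""
    patch = 0  # the agent's current candidate solution
    for round_no in range(1, max_rounds + 1):
        ok, feedback = run_tests(patch)
        if ok:
            return round_no, feedback
        patch += 1  # revise the approach based on the error message
    return None, "gave up"

rounds, msg = agent_loop()
print(rounds, msg)
```

The hard part for a model is not this control flow — it is keeping the accumulated context (error messages, prior attempts, partial conclusions) coherent across hundreds of such rounds.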
Deployment
GLM-5.1 is supported by multiple inference frameworks:
- SGLang (v0.5.10+)
- vLLM (v0.19.0+)
- xLLM (v0.8.0+)
- Transformers (v0.5.3+)
- KTransformers (v0.5.3+)
Nine quantized variants are available for local deployment via llama.cpp, LM Studio, Jan, and Ollama.
An API is available through the Z.ai platform, and a chat interface is coming to chat.z.ai.
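Both vLLM and SGLang expose an OpenAI-compatible HTTP API when serving a model locally, so a request against a self-hosted GLM-5.1 would plausibly look like the payload below. The endpoint URL, sampling parameters, and prompt are placeholders; only the model id comes from the release.

```python
import json

# Hypothetical chat-completion request for a locally served, OpenAI-compatible
# endpoint. All values except the model id are illustrative placeholders.
payload = {
    "model": "zai-org/GLM-5.1",
    "messages": [
        {"role": "user", "content": "Fix the failing test and explain the patch."}
    ],
    "temperature": 0.6,
    "max_tokens": 1024,
}
body = json.dumps(payload)
# To send: POST this body to <server>/v1/chat/completions, e.g. with
# urllib.request or an OpenAI client pointed at the local base_url.
print(body[:40])
```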
License and access
GLM-5.1 is released under the MIT license — one of the most permissive licenses in open-source AI. This is a significant upgrade from GLM-5, which had unclear licensing. MIT means full commercial use, modification, and redistribution with minimal restrictions.
Languages supported: English and Chinese.
Conclusion
GLM-5.1 is not a minor version bump. The shift from benchmark performance to sustained agentic capability represents a genuine architectural and training philosophy change. Leading SWE-Bench Pro at 58.4%, a 25-point CyberGym jump, and an MIT license together make this the strongest open-weight model for developers building agent systems.
For teams evaluating self-hosted alternatives to proprietary APIs for coding agents, terminal automation, or long-horizon task solving, GLM-5.1 just moved to the top of the list.
Model: zai-org/GLM-5.1 · Paper: arxiv.org/abs/2602.15763
Tags:
- AI
- Z.ai
- LLM
- MoE
- Agentic
- Open Source
- Coding