What is GLM-5?
GLM-5 is a large language model released by Z.ai (ζΊθ°±AI). It has 744 billion total parameters with only 40 billion active at inference β the same Mixture of Experts efficiency pattern that made DeepSeek-V3 practical to deploy at scale.
It is the direct successor to GLM-4.5 (355B/32B active) and GLM-4.7, with substantially more pre-training data (28.5T tokens vs 23T), a novel sparse attention mechanism, and a post-training infrastructure built specifically for long-horizon agentic tasks. The paper title gives the ambition away: GLM-5: from Vibe Coding to Agentic Engineering.
The benchmark numbers are frontier-level for an open-weight model: 92.7% on AIME 2026, 77.8% on SWE-bench Verified, and 96.9% on HMMT Nov 2025 β the best open-source score on that competition math benchmark.
Architecture and training
DeepSeek Sparse Attention (DSA) is integrated to reduce deployment cost while preserving long-context capacity. At 744B total parameters, hardware requirements are significant β but the 40B active parameter count keeps per-token compute at a manageable level.
The "slime" RL infrastructure is ZhipuAI's solution to training models on complex, multi-step tasks. Standard RLHF struggles with long-horizon tasks because reward signals are sparse. The asynchronous design decouples generation from optimization, allowing larger batch sizes and more stable training on multi-step agent tasks.
Benchmark results
Mathematics
| Benchmark | GLM-5 | GLM-4.7 |
|---|---|---|
| AIME 2026 I | 92.7% | β |
| HMMT Nov 2025 | 96.9% | β |
| HLE (no tools) | 30.5 | 24.8 |
| HLE (with tools) | 50.4 | β |
HMMT (Harvard-MIT Mathematics Tournament) is a highly competitive undergraduate-level math tournament. 96.9% is the best open-source result on this benchmark.
Coding and software engineering
| Benchmark | Score |
|---|---|
| SWE-bench Verified | 77.8% |
| Terminal-Bench 2.0 | 56.2β61.1% |
SWE-bench Verified measures the ability to resolve real GitHub issues on open-source codebases. At 77.8%, GLM-5 sits at the frontier for open models. Terminal-Bench scores are competitive with Claude Opus 4.5 on command-line engineering tasks.
Reasoning and knowledge
| Benchmark | Score |
|---|---|
| GPQA-Diamond | 86.0% |
| HLE (Humanity's Last Exam) | 30.5 |
| HLE with tools | 50.4 |
GPQA-Diamond is a PhD-level expert reasoning benchmark; 86.0% puts GLM-5 among the top models available. HLE is the hardest general knowledge evaluation currently in use.
Cybersecurity
| Benchmark | Score |
|---|---|
| CyberGym | 43.2% |
Context window and agentic use
GLM-5 supports up to 202,752 tokens in reasoning + tool use configurations β long enough to hold entire codebases, long reports, or multi-turn agent trajectories in context.
The model natively supports:
- Tool calling via the GLM-4.7 parser with auto-tool-choice
- Extended reasoning via the GLM-4.5 reasoning parser
- Web browsing, terminal execution, and function calling
Deployment
An FP8 quantized version (zai-org/GLM-5-FP8) is available, reducing memory requirements significantly. Both vLLM and SGLang are supported, with speculative decoding enabled for higher throughput.
With vLLM:
docker pull vllm/vllm-openai:nightly
vllm serve zai-org/GLM-5-FP8 \
--tensor-parallel-size 8 \
--gpu-memory-utilization 0.85 \
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 1 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
--served-model-name glm-5-fp8With SGLang:
python3 -m sglang.launch_server \
--model-path zai-org/GLM-5-FP8 \
--tp-size 8 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--speculative-algorithm EAGLE \
--speculative-num-steps 3Ascend NPU deployments are supported via KTransformers and xLLM.
Limitations
- Languages: English and Chinese only β no multilingual coverage beyond these two
- Scale requirements β even with FP8 quantization, 40B active parameters requires multi-GPU setups (8Γ tensor parallel in the examples)
- No public API yet β self-hosted only for now
- Context at 202K requires specific configuration β default evaluation context is 128K
- License details not specified in the model card; verify before commercial deployment
Conclusion
GLM-5 enters the frontier open-source tier that only DeepSeek-V3 and a handful of others occupy. The 744B/40B MoE design keeps inference practical while delivering benchmark numbers β 96.9% HMMT, 77.8% SWE-bench, 86.0% GPQA-Diamond β that match or exceed many closed models.
For teams needing a self-hosted model for serious math, coding, or agentic workloads without depending on an API, GLM-5 is now the strongest option available.
Model: zai-org/GLM-5 Β· FP8 version
