>_Reeboot
GLM-5: 744B parameters, 40B active β€” Z.ai's open-source frontier model
AI

GLM-5: 744B parameters, 40B active β€” Z.ai's open-source frontier model

Z.ai's GLM-5 is a 744B MoE model with 40B active parameters, trained on 28.5T tokens. It scores 92.7% on AIME 2026, 77.8% on SWE-bench Verified, and is the best open-source model on HMMT Nov 2025.

What is GLM-5?

GLM-5 is a large language model released by Z.ai (ζ™Ίθ°±AI). It has 744 billion total parameters with only 40 billion active at inference β€” the same Mixture of Experts efficiency pattern that made DeepSeek-V3 practical to deploy at scale.

It is the direct successor to GLM-4.5 (355B/32B active) and GLM-4.7, with substantially more pre-training data (28.5T tokens vs 23T), a novel sparse attention mechanism, and a post-training infrastructure built specifically for long-horizon agentic tasks. The paper title gives the ambition away: GLM-5: from Vibe Coding to Agentic Engineering.

The benchmark numbers are frontier-level for an open-weight model: 92.7% on AIME 2026, 77.8% on SWE-bench Verified, and 96.9% on HMMT Nov 2025 β€” the best open-source score on that competition math benchmark.


Architecture and training

DeepSeek Sparse Attention (DSA) is integrated to reduce deployment cost while preserving long-context capacity. At 744B total parameters, hardware requirements are significant β€” but the 40B active parameter count keeps per-token compute at a manageable level.

The "slime" RL infrastructure is ZhipuAI's solution to training models on complex, multi-step tasks. Standard RLHF struggles with long-horizon tasks because reward signals are sparse. The asynchronous design decouples generation from optimization, allowing larger batch sizes and more stable training on multi-step agent tasks.


Benchmark results

Mathematics

Benchmark GLM-5 GLM-4.7
AIME 2026 I 92.7% β€”
HMMT Nov 2025 96.9% β€”
HLE (no tools) 30.5 24.8
HLE (with tools) 50.4 β€”

HMMT (Harvard-MIT Mathematics Tournament) is a highly competitive undergraduate-level math tournament. 96.9% is the best open-source result on this benchmark.

Coding and software engineering

Benchmark Score
SWE-bench Verified 77.8%
Terminal-Bench 2.0 56.2–61.1%

SWE-bench Verified measures the ability to resolve real GitHub issues on open-source codebases. At 77.8%, GLM-5 sits at the frontier for open models. Terminal-Bench scores are competitive with Claude Opus 4.5 on command-line engineering tasks.

Reasoning and knowledge

Benchmark Score
GPQA-Diamond 86.0%
HLE (Humanity's Last Exam) 30.5
HLE with tools 50.4

GPQA-Diamond is a PhD-level expert reasoning benchmark; 86.0% puts GLM-5 among the top models available. HLE is the hardest general knowledge evaluation currently in use.

Cybersecurity

Benchmark Score
CyberGym 43.2%

Context window and agentic use

GLM-5 supports up to 202,752 tokens in reasoning + tool use configurations β€” long enough to hold entire codebases, long reports, or multi-turn agent trajectories in context.

The model natively supports:

  • Tool calling via the GLM-4.7 parser with auto-tool-choice
  • Extended reasoning via the GLM-4.5 reasoning parser
  • Web browsing, terminal execution, and function calling

Deployment

An FP8 quantized version (zai-org/GLM-5-FP8) is available, reducing memory requirements significantly. Both vLLM and SGLang are supported, with speculative decoding enabled for higher throughput.

With vLLM:

docker pull vllm/vllm-openai:nightly

vllm serve zai-org/GLM-5-FP8 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.85 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --served-model-name glm-5-fp8

With SGLang:

python3 -m sglang.launch_server \
  --model-path zai-org/GLM-5-FP8 \
  --tp-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3

Ascend NPU deployments are supported via KTransformers and xLLM.


Limitations

  • Languages: English and Chinese only β€” no multilingual coverage beyond these two
  • Scale requirements β€” even with FP8 quantization, 40B active parameters requires multi-GPU setups (8Γ— tensor parallel in the examples)
  • No public API yet β€” self-hosted only for now
  • Context at 202K requires specific configuration β€” default evaluation context is 128K
  • License details not specified in the model card; verify before commercial deployment

Conclusion

GLM-5 enters the frontier open-source tier that only DeepSeek-V3 and a handful of others occupy. The 744B/40B MoE design keeps inference practical while delivering benchmark numbers β€” 96.9% HMMT, 77.8% SWE-bench, 86.0% GPQA-Diamond β€” that match or exceed many closed models.

For teams needing a self-hosted model for serious math, coding, or agentic workloads without depending on an API, GLM-5 is now the strongest option available.

Model: zai-org/GLM-5 Β· FP8 version