Kimi K2.6: 1T parameters, Moonshot's agentic coding and vision model
- Bastien
- 01 May, 2026
From K2 to K2.6: Moonshot’s multimodal agent model
Moonshot AI’s Kimi K2.6 is a major step forward in combining three challenging capabilities into a single open-weight model: massive-scale agentic orchestration, long-context coding prowess, and native multimodal vision — all under a modified MIT license.
At 1 trillion total parameters with 32 billion active, K2.6 uses Multi-Head Latent Attention (MLA) for efficient long-context processing and integrates the MoonViT multimodal encoder for direct image and video understanding. The model isn’t just strong on benchmarks — it ships with an Agent Swarm framework capable of spawning up to 300 sub-agents across 4000 coordinated steps, and a coding agent CLI that transforms natural language prompts directly into production-ready UI.
Architecture: MoE with MLA and Vision Fusion
K2.6 pairs a sparse MoE backbone with a multimodal encoder, creating a unified model that processes interleaved text and vision inputs.
Multi-Head Latent Attention (MLA).
MLA compresses the KV cache into a low-dimensional latent space, then expands it back into per-head keys and values via learned up-projections during attention computation. This reduces per-layer KV memory by roughly half compared to standard attention, making 256K-token contexts practical without excessive GPU requirements.
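To make the mechanism concrete, here is a minimal PyTorch sketch of latent KV compression: only the small latent is cached, and keys and values are re-expanded from it at attention time. All dimensions are illustrative, not K2.6's published configuration, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Toy MLA-style attention: cache a compressed latent, not full K/V."""

    def __init__(self, d_model=512, n_heads=8, d_head=64, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)       # compress: only this output is cached
        self.k_up = nn.Linear(d_latent, n_heads * d_head, bias=False) # expand latent -> per-head keys
        self.v_up = nn.Linear(d_latent, n_heads * d_head, bias=False) # expand latent -> per-head values
        self.q_proj = nn.Linear(d_model, n_heads * d_head, bias=False)
        self.out = nn.Linear(n_heads * d_head, d_model, bias=False)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        latent = self.kv_down(x)                                # (b, t, d_latent)
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)   # append new latents to the cache
        s = latent.shape[1]
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v)          # causal masking omitted for brevity
        y = attn.transpose(1, 2).reshape(b, t, -1)
        return self.out(y), latent                              # latent is the new (small) KV cache
```

In this sketch the cache stores d_latent = 128 floats per token instead of the 2 × n_heads × d_head = 1024 that full keys and values would require; that cache-size reduction is the whole point of the latent compression.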
MoE (384 experts, 8 routed per token + 1 shared).
The 61-layer architecture includes 1 dense layer (vision fusion at the input side) and 60 MoE layers. Each token activates 8 out of 384 experts, plus 1 shared expert that all tokens pass through. This yields 32B active parameters from 1T total — a 1:31 density ratio that keeps inference efficient.
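A toy routing sketch makes the 8-of-384 pattern concrete: a linear router scores all experts, the top 8 are combined with renormalized weights, and a shared expert processes every token. Sizes here are deliberately tiny; this illustrates the routing scheme, not K2.6's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=384, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The shared expert processes every token regardless of routing.
        self.shared = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):  # x: (n_tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)      # (n_tokens, n_experts)
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)  # renormalize over the chosen 8
        shared_out = self.shared(x)
        rows = []
        for t in range(x.shape[0]):                      # naive per-token dispatch, for clarity
            acc = shared_out[t]
            for w, i in zip(top_w[t], top_i[t]):
                acc = acc + w * self.experts[i](x[t])
            rows.append(acc)
        return torch.stack(rows)

moe = MoELayer()
y = moe(torch.randn(4, 64))  # 4 tokens; only 8 of 384 routed experts fire per token
```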
MoonViT multimodal encoder (400M).
The vision encoder processes raw images and video frames into token sequences that merge directly into the language model’s token stream. MoonViT uses a ViT-style transformer with 400 million parameters, providing strong visual grounding without requiring a separate vision model. The interleaved text-vision processing means questions like “explain this UI screenshot” can be answered with the same model that writes the HTML/CSS for the UI.
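Moonshot has not published the exact fusion mechanism, but the general interleaving pattern can be sketched as follows: patch embeddings from the vision encoder are spliced into the language model's embedding sequence in place of an image placeholder token. The `splice_image_tokens` helper below is hypothetical, purely for illustration.

```python
import torch

def splice_image_tokens(text_emb: torch.Tensor,
                        image_emb: torch.Tensor,
                        image_pos: int) -> torch.Tensor:
    """Insert vision-encoder outputs into the text embedding stream.

    text_emb:  (t, d) token embeddings from the LM embedding table
    image_emb: (v, d) patch embeddings from the vision encoder (MoonViT here)
    image_pos: index of the image placeholder token to replace
    """
    return torch.cat([text_emb[:image_pos], image_emb, text_emb[image_pos + 1:]], dim=0)
```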
Activation and vocabulary.
SwiGLU activation replaces standard GELU (shown in ablation studies to improve MoE routing stability). The 160K vocabulary is significantly larger than the typical 32K–100K range, reducing tokenization overhead for non-English text and code.
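For reference, a SwiGLU feed-forward block is compact: a SiLU-gated branch multiplied elementwise with a linear branch, then projected back down.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)  # "Swish"-activated gate branch
        self.up = nn.Linear(d_model, d_ff, bias=False)    # linear value branch
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```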
Benchmark results
Agentic engineering and coding
| Benchmark | Kimi K2.6 |
|---|---|
| SWE-Bench Pro | 58.6% |
| SWE-Bench Verified | 80.2% |
| Terminal-Bench 2.0 | 66.7% |
| LLM-Full | 34.7% |
| BrowseComp | 83.2% |
| Toolathlon | 50.0 |
| MCPMark | 55.9 |
K2.6 leads open-weight models on SWE-Bench Pro (58.6%). The BrowseComp score of 83.2 is particularly notable: it measures the ability to browse the web, synthesize information from multiple sources, and produce a correct answer, which is the core capability of autonomous research agents.
Mathematics and reasoning
| Benchmark | Kimi K2.6 |
|---|---|
| AIME 2026 | 96.4% |
| HMMT | 92.7% |
| GPQA-Diamond | 90.5% |
| HLE with tools | 54.0 |
| DeepSearchQA | 92.5 |
Math performance is among the strongest of any open model — AIME 96.4% puts K2.6 on par with or ahead of models significantly larger in parameter count. DeepSearchQA (92.5) measures deep research ability, where the model must query knowledge sources and synthesize comprehensive answers.
Coding
| Benchmark | Kimi K2.6 |
|---|---|
| LiveCodeBench v6 | 89.6% |
LiveCodeBench v6 tests recently released programming problems drawn from live competitions, which limits training-data contamination. A score of 89.6% demonstrates K2.6’s ability to solve novel coding challenges under time pressure, a skill sharpened by its dedicated coding agent training loop.
Vision and multimodal
| Benchmark | Kimi K2.6 |
|---|---|
| MMMU-Pro | 79.4% |
| V* | 96.9% |
MMMU-Pro evaluates multi-disciplinary multimodal understanding, while V* measures performance on complex visual reasoning tasks. Both scores are strong for an open model with integrated vision.
Agent capabilities
K2.6 is built for autonomous operation. Three capabilities stand out:
Agent Swarm (300 sub-agents, 4000 steps).
K2.6 can launch up to 300 sub-agents that operate in parallel, each handling a different subtask. The orchestrator coordinates across 4000+ total steps — thinking, calling tools, reviewing results, and adjusting strategy. This is not a simple tool-calling loop; it’s a hierarchical agent architecture where each sub-agent can spawn its own tool calls.
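Moonshot has not published an Agent Swarm API, but the control flow can be sketched as a fan-out/fan-in loop. The `call_model` helper below is a hypothetical stub standing in for a real K2.6 call, so the skeleton runs as-is.

```python
import asyncio

MAX_SUBAGENTS = 300     # swarm-wide limits described above
MAX_TOTAL_STEPS = 4000

async def call_model(prompt: str) -> str:
    # Placeholder: a real implementation would call the K2.6 API.
    await asyncio.sleep(0)
    return f"<model output for: {prompt[:40]}...>"

async def sub_agent(task: str, step_budget: int) -> str:
    """One worker agent: reason, act, observe, repeat within its budget."""
    observation = ""
    for _ in range(step_budget):
        step = await call_model(f"Task: {task}\nSo far: {observation}")
        observation += step  # a real agent would fold tool results in here
    return observation

async def orchestrator(goal: str, n_subtasks: int = 10) -> str:
    # Decompose the goal, fan sub-agents out in parallel, then synthesize.
    n = min(n_subtasks, MAX_SUBAGENTS)
    budget = MAX_TOTAL_STEPS // n
    subtasks = [f"{goal} (part {i + 1}/{n})" for i in range(n)]
    results = await asyncio.gather(*(sub_agent(t, budget) for t in subtasks))
    return await call_model(f"Synthesize results for '{goal}': {results}")

# asyncio.run(orchestrator("survey recent MoE inference papers"))
```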
Coding-Driven Design.
A unique capability: provide a natural language prompt describing a UI, and K2.6 generates production-ready HTML/CSS/JS code. The prompt → UI pipeline leverages deep understanding of both design intent and frontend engineering conventions.
Proactive and Open Orchestration.
K2.6 supports 24/7 background agent execution: agents run autonomously in the background, checking schedules, processing data, and reporting results. An “Open” mode additionally lets these agents be observed and steered in real time.
Kimi K2.6 at a glance
| Dimension | Kimi K2.6 |
|---|---|
| Total params | 1T |
| Active params | 32B |
| Architecture | MoE + MLA + MoonViT |
| Layers | 61 (1 dense + 60 MoE) |
| Experts | 384 total, 8 routed + 1 shared per token |
| Context | 256K |
| Vision | MoonViT 400M |
| Vocabulary | 160K |
| Quantization | INT4 native |
| SWE-Bench Pro | 58.6% |
| AIME 2026 | 96.4% |
| License | Modified MIT |
Deployment
K2.6 is supported by multiple inference frameworks:
- vLLM (latest)
- SGLang (latest)
- KTransformers (a heterogeneous CPU/GPU inference framework well suited to large MoE models)
An API is available at platform.moonshot.ai, compatible with both OpenAI and Anthropic API formats.
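A minimal client call through the OpenAI-compatible endpoint might look like the following; the base URL and model identifier are assumptions, so check platform.moonshot.ai for the exact values.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="kimi-k2.6",  # assumed model identifier
    messages=[
        {"role": "user",
         "content": "Generate a responsive pricing-page UI as a single HTML file."},
    ],
)
print(response.choices[0].message.content)
```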
The model natively supports INT4 quantization, which can be leveraged for memory-efficient deployment on consumer hardware with minimal accuracy loss.
Additional features include:
- Interleaved Thinking and Multi-Step Tool Call — the model reasons, acts, observes, and repeats in a single generation (see the loop sketch after this list)
- Preserve Thinking mode — explicitly save and reuse reasoning chains across multiple rounds
- Kimi Code CLI — a coding agent framework that wraps K2.6 as a CLI tool with persistent workspace memory
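As an illustration of the multi-step tool-call loop, here is a sketch built on the OpenAI-compatible tools API. The endpoint, model name, and `get_weather` tool are assumptions for the example; the loop structure (reason, request a tool, observe its result, repeat) is the point.

```python
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_MOONSHOT_API_KEY",
                base_url="https://api.moonshot.ai/v1")  # assumed endpoint

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Do I need an umbrella in Paris today?"}]
while True:
    reply = client.chat.completions.create(
        model="kimi-k2.6", messages=messages, tools=tools)  # assumed model id
    msg = reply.choices[0].message
    if not msg.tool_calls:       # no tool request: the model produced a final answer
        print(msg.content)
        break
    messages.append(msg)         # keep the assistant turn (reasoning + tool request)
    for call in msg.tool_calls:  # execute each requested tool, feed results back
        args = json.loads(call.function.arguments)
        result = json.dumps({"city": args["city"], "forecast": "rain"})  # stubbed tool
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```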
License and access
Kimi K2.6 is released under a Modified MIT License — most permissions of the standard MIT apply, with minor additional terms. Check the official repository for full details.
Available at: Moonshot AI / Kimi-K2.6
Citation
```bibtex
@article{moonshot2026kimi-k26,
  title   = {Kimi K2.6: Scaling Agent Orchestration with Multimodal Integration},
  author  = {Moonshot AI},
  journal = {arXiv preprint},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.02276}
}
```
Tags:
- AI
- Moonshot
- LLM
- MoE
- Agentic
- Open Source
- Multimodal
- Vision
- Long Context
- Coding