Kimi K2.6: 1T parameters, Moonshot's agentic coding and vision model

From K2 to K2.6: Moonshot’s multimodal agent model

Moonshot AI’s Kimi K2.6 is a major step forward in combining three challenging capabilities into a single open-weight model: massive-scale agentic orchestration, long-context coding prowess, and native multimodal vision — all under a modified MIT license.

At 1 trillion total parameters with 32 billion active, K2.6 uses Multi-Head Latent Attention (MLA) for efficient long-context processing and integrates the MoonViT multimodal encoder for direct image and video understanding. The model isn’t just strong on benchmarks — it ships with an Agent Swarm framework capable of spawning up to 300 sub-agents across 4000 coordinated steps, and a coding agent CLI that transforms natural language prompts directly into production-ready UI.


Architecture: MoE with MLA and Vision Fusion

K2.6 pairs a sparse MoE backbone with a multimodal encoder, creating a unified model that processes interleaved text and vision inputs.

Multi-Head Latent Attention (MLA).
MLA compresses the KV cache into a low-dimensional latent space, then expands it via learned linear up-projections during attention computation. This reduces per-layer KV memory by roughly half compared to standard attention, making 256K-token contexts practical without excessive GPU requirements.
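The compress-then-expand idea can be sketched in a few lines of NumPy. All dimensions below are illustrative assumptions, not K2.6's real configuration:

```python
# Sketch of MLA-style KV compression: cache a low-dimensional latent per token,
# then reconstruct full keys/values on the fly during attention.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_heads, d_head = 2048, 8, 128   # hypothetical sizes
d_latent = 1024                           # compressed KV dimension

# Learned projections (random stand-ins here).
W_down = rng.standard_normal((d_model, d_latent)) * 0.02           # compress
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand to K
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand to V

seq_len = 16
h = rng.standard_normal((seq_len, d_model))

# Cache only the low-dimensional latent per token...
latent = h @ W_down                                     # (seq_len, d_latent)

# ...and rebuild the full K/V tensors when attention runs.
K = (latent @ W_up_k).reshape(seq_len, n_heads, d_head)
V = (latent @ W_up_v).reshape(seq_len, n_heads, d_head)

# Per-token cache cost: d_latent floats vs 2 * n_heads * d_head for standard K+V.
standard = 2 * n_heads * d_head    # 2048 floats per token
mla = d_latent                     # 1024 floats per token
print(f"KV cache per token: {standard} -> {mla} floats ({standard // mla}x smaller)")
```

With these toy sizes the latent cache is half the standard K+V cache; the actual savings depend on the model's real latent and head dimensions.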

MoE (384 experts, 8 routed per token + 1 shared).
The 61-layer architecture includes 1 dense layer (vision fusion at the input side) and 60 MoE layers. Each token activates 8 out of 384 experts, plus 1 shared expert that all tokens pass through. This yields 32B active parameters from 1T total — a 1:31 density ratio that keeps inference efficient.
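The "8 routed + 1 shared" scheme can be mirrored by a toy top-k router. Expert networks below are stand-in functions and all sizes are illustrative:

```python
# Toy top-k MoE layer: route each token to 8 of 384 experts, plus one
# shared expert that every token passes through.
import numpy as np

rng = np.random.default_rng(0)

n_experts, top_k, d = 384, 8, 64
W_router = rng.standard_normal((d, n_experts)) * 0.02

def routed_expert(i, x):
    # Stand-in for an expert FFN: a fixed per-expert transformation.
    return x * (1.0 + i / n_experts)

def shared_expert(x):
    return x * 0.5

def moe_layer(x):
    logits = x @ W_router                 # router scores, one per expert
    top = np.argsort(logits)[-top_k:]     # indices of the 8 selected experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                  # softmax over the selected experts only
    routed = sum(g * routed_expert(i, x) for g, i in zip(gates, top))
    return routed + shared_expert(x)      # shared expert sees every token

x = rng.standard_normal(d)
y = moe_layer(x)
# Only 8 of the 384 routed expert FFNs executed for this token -- the
# mechanism behind the 32B-active / 1T-total split.
print(y.shape)
```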

MoonViT multimodal encoder (400M).
The vision encoder processes raw images and video frames into token sequences that merge directly into the language model’s token stream. MoonViT uses a ViT-style transformer with 400 million parameters, providing strong visual grounding without requiring a separate vision model. The interleaved text-vision processing means questions like “explain this UI screenshot” can be answered with the same model that writes the HTML/CSS for the UI.
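The interleaving itself is conceptually simple: vision tokens are spliced into the text embedding sequence at the point where the image appears. Every component in this sketch is an illustrative stand-in, not MoonViT itself:

```python
# Minimal sketch of an interleaved text/vision stream feeding one model.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

embedding_table = rng.standard_normal((1000, d_model)) * 0.02

def embed_text(token_ids):
    return embedding_table[token_ids]

def vit_stub(image):
    # Stand-in for the 400M MoonViT: flatten the image into 768-value chunks
    # (a crude proxy for 16x16x3 patch extraction) and project each to d_model.
    patches = image.reshape(-1, 16 * 16 * 3)
    W_proj = rng.standard_normal((16 * 16 * 3, d_model)) * 0.02
    return patches @ W_proj                        # (n_patches, d_model)

text_before = embed_text(np.array([5, 17, 42]))    # e.g. "explain this UI:"
vision_tokens = vit_stub(rng.random((32, 32, 3)))  # tiny image -> 4 patch tokens
text_after = embed_text(np.array([7]))

# The language model consumes one interleaved embedding sequence.
sequence = np.concatenate([text_before, vision_tokens, text_after])
print(sequence.shape)
```

Because the vision tokens live in the same embedding space as text tokens, the same backbone that answers "explain this UI screenshot" can then write the HTML/CSS for it.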

Activation and vocabulary.
SwiGLU activation replaces standard GELU (shown in ablation studies to improve MoE routing stability). The 160K vocabulary is significantly larger than the typical 32K–100K range, reducing tokenization overhead for non-English text and code.


Benchmark results

Agentic engineering and coding

Benchmark             Kimi K2.6
SWE-Bench Pro         58.6%
SWE-Bench Verified    80.2%
Terminal-Bench 2.0    66.7%
LLM-Full              34.7%
BrowseComp            83.2%
Toolathlon            50.0
MCPMark               55.9

K2.6 leads among open-weight models on SWE-Bench Pro (58.6%), outpacing most alternatives. The BrowseComp score of 83.2 is particularly notable — it measures the ability to browse the web, synthesize information from multiple sources, and produce a correct answer, which is the core capability of autonomous research agents.

Mathematics and reasoning

Benchmark             Kimi K2.6
AIME 2026             96.4%
HMMT                  92.7%
GPQA-Diamond          90.5%
HLE with tools        54.0
DeepSearchQA          92.5

Math performance is among the strongest in any open model — AIME 96.4% puts K2.6 on par with or ahead of models significantly larger in parameter count. DeepSearchQA (92.5) measures deep research ability, where the model must query knowledge sources and synthesize comprehensive answers.

Coding

Benchmark             Kimi K2.6
LiveCodeBench v6      89.6%

LiveCodeBench v6 tests recently published programming problems drawn from active competitions, which limits training-data contamination. A score of 89.6% demonstrates K2.6’s ability to solve novel coding challenges, a skill sharpened by its dedicated coding agent training loop.

Vision and multimodal

Benchmark             Kimi K2.6
MMMU-Pro              79.4%
V*                    96.9%

MMMU-Pro evaluates multi-disciplinary multimodal understanding, while V* measures performance on complex visual reasoning tasks. Both scores are strong for an open model with integrated vision.


Agent capabilities

K2.6 is built for autonomous operation. Three capabilities stand out:

Agent Swarm (300 sub-agents, 4000 steps).
K2.6 can launch up to 300 sub-agents that operate in parallel, each handling a different subtask. The orchestrator coordinates across 4000+ total steps — thinking, calling tools, reviewing results, and adjusting strategy. This is not a simple tool-calling loop; it’s a hierarchical agent architecture where each sub-agent can spawn its own tool calls.
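The fan-out-and-merge pattern can be sketched as a toy orchestration loop. Sub-agents here are plain functions rather than real model calls, and the step accounting is purely illustrative:

```python
# Toy hierarchical orchestrator in the spirit of Agent Swarm: split a goal
# into subtasks, fan them out to sub-agents in parallel, merge the reports.
from concurrent.futures import ThreadPoolExecutor

MAX_SUBAGENTS = 300   # ceiling quoted for K2.6's Agent Swarm
MAX_STEPS = 4000

def sub_agent(task):
    # A real sub-agent would think, call tools, and observe in a loop;
    # this stub just "solves" its slice and reports back.
    return {"task": task, "result": f"done: {task}", "steps_used": 3}

def orchestrator(goal, subtasks):
    assert len(subtasks) <= MAX_SUBAGENTS
    with ThreadPoolExecutor(max_workers=16) as pool:
        reports = list(pool.map(sub_agent, subtasks))
    total_steps = sum(r["steps_used"] for r in reports) + 1  # +1 planning step
    assert total_steps <= MAX_STEPS
    return {"goal": goal, "reports": reports, "total_steps": total_steps}

out = orchestrator("refactor repo", [f"module_{i}" for i in range(5)])
print(out["total_steps"])
```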

Coding-Driven Design.
A unique capability: provide a natural language prompt describing a UI, and K2.6 generates production-ready HTML/CSS/JS code. The prompt → UI pipeline leverages deep understanding of both design intent and frontend engineering conventions.

Proactive and Open Orchestration.
K2.6 supports 24/7 background agent execution — agents that run autonomously in the background, checking schedules, processing data, and reporting results. Plus an “Open” mode where agents can be observed and steered in real-time.


Kimi K2.6 at a glance

Dimension          Kimi K2.6
Total params       1T
Active params      32B
Architecture       MoE + MLA + MoonViT
Layers             61 (1 dense + 60 MoE)
Experts            384 (8 routed + 1 shared)
Context            256K
Vision             MoonViT 400M
Vocabulary         160K
Quantization       INT4 native
SWE-Bench Pro      58.6%
AIME 2026          96.4%
License            Modified MIT

Deployment

K2.6 is supported by multiple inference frameworks:

  • vLLM (latest)
  • SGLang (latest)
  • KTransformers (hybrid CPU/GPU inference for local deployment)

An API is available at platform.moonshot.ai, compatible with both OpenAI and Anthropic API formats.
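Because the endpoint speaks the OpenAI chat-completions format, any OpenAI-compatible client works. The sketch below builds a request with the standard library; the endpoint URL and the model id `kimi-k2.6` are assumptions here, so check platform.moonshot.ai for the real identifiers:

```python
# Sketch of calling K2.6 via an OpenAI-compatible chat completions endpoint.
import json
import os
import urllib.request

payload = {
    "model": "kimi-k2.6",  # hypothetical model id
    "messages": [{"role": "user", "content": "Write a haiku about MoE routing."}],
    "temperature": 0.6,
}

def call_kimi(payload, api_key):
    req = urllib.request.Request(
        "https://api.moonshot.ai/v1/chat/completions",  # assumed endpoint path
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Only send the request when a key is configured.
if os.environ.get("MOONSHOT_API_KEY"):
    print(call_kimi(payload, os.environ["MOONSHOT_API_KEY"]))
```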

The model natively supports INT4 quantization, which can be leveraged for memory-efficient deployment on consumer hardware with minimal accuracy loss.
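To see why INT4 is so memory-friendly, here is an illustrative symmetric group-quantization scheme (not Moonshot's exact recipe): weights are split into groups, and each group is scaled so its values fit the 4-bit range [-8, 7]:

```python
# Illustrative symmetric INT4 group quantization with round-trip error check.
import numpy as np

rng = np.random.default_rng(0)

def quantize_int4(w, group_size=32):
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0   # per-group scale
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q * scale

w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale).reshape(-1)

# Each weight now needs 4 bits plus a small per-group scale, a ~4x saving
# over FP16, at the cost of bounded rounding error.
err = np.abs(w - w_hat).max()
print(f"max abs round-trip error: {err:.4f}")
```

Native INT4 support means the checkpoint is trained or calibrated with this precision in mind, which is what keeps the accuracy loss minimal.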

Additional features include:

  • Interleaved Thinking and Multi-Step Tool Call — the model reasons, acts, observes, and repeats in a single generation
  • Preserve Thinking mode — explicitly save and reuse reasoning chains across multiple rounds
  • Kimi Code CLI — a coding agent framework that wraps K2.6 as a CLI tool with persistent workspace memory
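The reason-act-observe cycle behind "Interleaved Thinking and Multi-Step Tool Call" reduces to a small loop. The "model" below is a scripted stub standing in for streamed K2.6 completions that emit tool calls:

```python
# Minimal reason-act-observe loop: each step records a thought, runs a tool,
# and feeds the observation into the next step.
TOOLS = {
    "add": lambda a, b: a + b,
    "square": lambda a: a * a,
}

# Scripted plan standing in for model output: (thought, tool, args) steps.
PLAN = [
    ("First add the inputs.", "add", (3, 4)),
    ("Now square the sum.", "square", None),  # None = use previous observation
]

def agent_loop(plan):
    observation, trace = None, []
    for thought, tool, args in plan:
        if args is None:
            args = (observation,)
        observation = TOOLS[tool](*args)            # act
        trace.append((thought, tool, observation))  # observe, keep the reasoning
    return observation, trace

result, trace = agent_loop(PLAN)
print(result)   # 49
```

"Preserve Thinking" corresponds to carrying `trace` forward into later rounds instead of discarding it after each answer.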

License and access

Kimi K2.6 is released under a Modified MIT License — most permissions of the standard MIT apply, with minor additional terms. Check the official repository for full details.

Available at: Moonshot AI / Kimi-K2.6


Citation

@article{moonshot2026kimi-k26,
  title={Kimi K2.6: Scaling Agent Orchestration with Multimodal Integration},
  author={Moonshot AI},
  journal={arXiv preprint},
  year={2026},
  url={https://arxiv.org/abs/2602.02276}
}
Tags :
  • AI
  • Moonshot
  • LLM
  • MoE
  • Agentic
  • Open Source
  • Multimodal
  • Vision
  • Long Context
  • Coding