Nemotron Cascade 2: NVIDIA's 30B model that won the math and coding Olympics

What is Nemotron Cascade 2?

Nemotron Cascade 2 (30B-A3B) is an open model released by NVIDIA on March 19, 2026. Its headline number is misleading at first glance: 30 billion total parameters, but only 3 billion are activated per inference pass. This is a Mixture of Experts (MoE) architecture at work — the model routes each token through a small subset of its experts, making it dramatically cheaper to run than a dense 30B model.
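The routing idea can be sketched in a few lines of Python. This is an illustrative toy, not NVIDIA's implementation — the sizes, router, and gating below are invented for clarity: a learned router scores all experts per token, but only the top-k actually execute.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS, TOP_K, DIM = 8, 2, 16           # toy sizes, not the real config
router_w = rng.normal(size=(DIM, NUM_EXPERTS))
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]

def moe_layer(x):
    """Route one token vector through only TOP_K of NUM_EXPERTS experts."""
    logits = x @ router_w                    # router score per expert
    top = np.argsort(logits)[-TOP_K:]        # indices of the k best experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                     # softmax over the chosen experts
    # Only TOP_K expert matmuls execute; the other experts stay idle,
    # which is why activated parameters << total parameters.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=DIM)
out = moe_layer(token)
print(out.shape)  # → (16,)
```

The output has the same shape as a dense layer would produce, but only 2 of the 8 expert weight matrices were touched for this token.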

It supports two modes: thinking (extended chain-of-thought for hard problems) and instruct (fast, direct responses). On hard reasoning tasks, the thinking mode delivers results that are difficult to believe from a sub-frontier model.


Architecture and training

The training pipeline combines two techniques:

  • Cascade RL — a reinforcement learning approach that progressively challenges the model with harder problems as it improves
  • Multi-Domain On-Policy Distillation — the model generates its own training data under RL supervision, across mathematics, code, science, and instruction-following
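The "cascade" idea of progressively harder problems can be sketched as a curriculum schedule. This is purely illustrative — the tier names, threshold, and advancement rule below are assumptions; the actual Cascade RL recipe is not public in this level of detail:

```python
import random

random.seed(0)

# Toy stand-in for a cascade curriculum: once the policy's success rate on
# the current difficulty tier clears a threshold, training advances a tier.
TIERS = ["easy", "medium", "hard", "olympiad"]

def next_tier(tier, success_rate, threshold=0.8):
    """Advance one difficulty tier once the model clears the threshold."""
    i = TIERS.index(tier)
    if success_rate >= threshold and i + 1 < len(TIERS):
        return TIERS[i + 1]
    return tier

tier = "easy"
for rate in [0.5, 0.85, 0.9, 0.95]:   # measured success per training round
    tier = next_tier(tier, rate)
print(tier)  # → olympiad
```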

The result is a model that has genuinely internalized structured problem-solving, not just pattern-matched against training examples.


Gold medals

This is the headline achievement. At the 2025 International Mathematical Olympiad and International Olympiad in Informatics, Nemotron Cascade 2 scored at gold medal level — competing against the best human students in the world.

These aren’t just benchmark numbers — IMO and IOI are the hardest math and programming competitions in the world, held annually with thousands of participants. A 30B open model reaching gold medal level is a meaningful milestone.


Full benchmark results

Mathematics

Benchmark           Score
IMO 2025            35 pts (gold)
AIME 2025           92.4 (98.6 with TIR)
AIME 2026           90.9 (95.0 with TIR)
HMMT Feb 2025       94.6
IMO AnswerBench     79.3

Code & competitive programming

Benchmark                   Score
IOI 2025                    439.3 pts (gold)
ICPC World Finals 2025      10/12
LiveCodeBench v6            87.2 (88.4 with TIR)
SWE Verified (OpenHands)    50.2

Knowledge & science

Benchmark       Score
GPQA-Diamond    76.1
MMLU-Pro        79.8
MMLU-Redux      86.3

Instruction following & alignment

Benchmark                 Score
ArenaHard v2 (avg.)       83.5
ArenaHard hard prompts    88.2
IFBench                   82.9

Context length

Benchmark           Score
NIAH @ 1M tokens    99.0
LongBench v2        40.3

The NIAH (Needle In A Haystack) score of 99.0 at 1 million tokens is particularly notable — the model reliably finds information buried in a 1M-token context.


Efficiency: 3B activated out of 30B

The MoE architecture is the key to making this model practical. At inference time, only 3B parameters fire per token. This means:

Metric                 Value
Total parameters       30B
Activated per token    3B (10%)
Context window         262,144 tokens
Tensor type            BF16 / F32
Minimum setup          Single high-end GPU
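A back-of-the-envelope comparison makes the efficiency concrete, assuming per-token matmul FLOPs scale roughly as 2 × activated parameters (a common first-order approximation that ignores attention and routing overhead):

```python
# First-order estimate: per-token matmul FLOPs ≈ 2 × activated parameters.
# Rough approximation only; attention, routing, and KV-cache costs are ignored.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9

dense_flops = 2 * TOTAL_PARAMS      # a dense 30B model touches all weights
moe_flops = 2 * ACTIVE_PARAMS       # the MoE fires only 3B per token

print(f"activated fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")      # → 10%
print(f"compute ratio vs dense 30B: {moe_flops / dense_flops:.1f}x")  # → 0.1x
```

Under this approximation, per-token compute is closer to a 3B dense model than a 30B one, which is why single-GPU serving is realistic.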

You can serve this with vLLM on a single GPU with --tensor-parallel-size 1 — no multi-GPU setup required for standard use.


Dual mode operation

The model is controlled via the chat template rather than separate model weights.

Thinking mode — activates the <think> reasoning trace before answering:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-Cascade-2-30B-A3B")
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True   # → <think>\n...
)

Instruct mode — skips the reasoning trace for fast responses:

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # → <think></think>
)

Recommended sampling: temperature=1.0, top_p=0.95.


Agentic and tool use

The model natively supports Tool-Integrated Reasoning (TIR) — it can call Python code execution mid-reasoning and incorporate the result before producing its final answer. This is what drives the +TIR improvements in the benchmark scores above.

Tool calls use this format:

<tool_call>
<function=stateful_python_code_exec>
<parameter=code>import sympy; sympy.solve(...)</parameter>
</function>
</tool_call>
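A minimal parser for that format might look like this. It is a sketch against the format shown above; the names `parse_tool_call`, `CALL_RE`, and `PARAM_RE` are my own, and the official chat template may parse calls differently:

```python
import re

# Hypothetical regexes for the tool-call format shown above.
CALL_RE = re.compile(
    r"<tool_call>\s*<function=(?P<name>\w+)>(?P<body>.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
PARAM_RE = re.compile(r"<parameter=(?P<key>\w+)>(?P<value>.*?)</parameter>", re.DOTALL)

def parse_tool_call(text):
    """Extract the function name and parameters from a model tool-call block."""
    m = CALL_RE.search(text)
    if not m:
        return None
    params = {p["key"]: p["value"] for p in PARAM_RE.finditer(m["body"])}
    return {"name": m["name"], "parameters": params}

sample = """<tool_call>
<function=stateful_python_code_exec>
<parameter=code>print(2 + 2)</parameter>
</function>
</tool_call>"""

call = parse_tool_call(sample)
print(call["name"])                 # → stateful_python_code_exec
print(call["parameters"]["code"])   # → print(2 + 2)
```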

For agentic coding, the model integrates with OpenHands (50.2 on SWE Verified). OpenCode is not currently supported.


Use cases

Best for:

  • Competitive mathematics and formal proofs
  • Hard coding problems (competitive programming level)
  • Long-context document analysis (up to 262K tokens)
  • Agentic coding workflows via OpenHands
  • Scientific reasoning (GPQA-Diamond: 76.1)

Not recommended for:

  • Real-time fact retrieval (no web access)
  • Deployments requiring OpenCode integration
  • Memory-constrained environments without GPU

Limitations

  • No OpenCode support — only OpenHands for agentic coding tasks
  • Context compression in multi-turn thinking — only the summary (not the full <think> trace) is retained in conversation history
  • Tool response format is non-standard — tool results go under the user role wrapped in <tool_response> tags, not a separate tool role
  • License is NVIDIA Open Model License, not Apache 2.0 — check terms for commercial use
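The non-standard tool-response convention noted above can be sketched as follows. The helper `append_tool_response` is hypothetical — check the model's chat template for the exact wrapping — but it shows the key point: results return under the user role, not a separate tool role:

```python
# Hypothetical helper: tool output goes back in under the *user* role,
# wrapped in <tool_response> tags, instead of a dedicated "tool" role.
def append_tool_response(messages, result):
    messages.append({
        "role": "user",
        "content": f"<tool_response>\n{result}\n</tool_response>",
    })
    return messages

messages = [
    {"role": "user", "content": "What is 2**10?"},
    {"role": "assistant", "content": "<tool_call>...</tool_call>"},
]
append_tool_response(messages, "1024")
print(messages[-1]["role"])  # → user
```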

Conclusion

Nemotron Cascade 2 redraws what’s possible with an efficient open model. A 3B-activated MoE winning gold at IMO and IOI is a genuine inflection point — not a benchmark cherry-pick, but performance on the hardest public competitions that exist for mathematics and programming.

For researchers, engineers, and anyone building reasoning-heavy applications locally, this is the most capable open model in its weight class as of early 2026.

Model: nvidia/Nemotron-Cascade-2-30B-A3B — NVIDIA Open Model License

Tags :
  • AI
  • NVIDIA
  • Reasoning
  • MoE
  • Open Source
  • Mathematics
