Chroma Context-1: the 20B agentic search model that edits its own context

What is Chroma Context-1?

Chroma Context-1 is a 20B Mixture of Experts model built specifically for agentic search — retrieval tasks that require multiple hops, query decomposition, and self-correction. It is released by Chroma (the company behind the open-source vector database) under the Apache 2.0 license.

The model’s defining feature is self-editing context: it can selectively prune irrelevant documents from its own context window during multi-step retrieval, maintaining quality across long search horizons without the context bloat that typically degrades RAG pipelines.


Architecture

  • Base model: GPT-OSS-20B (Mixture of Experts)
  • Training: SFT + RL via CISPO (a reinforcement learning objective), with staged curriculum learning across web, legal, and finance domains
  • Precision: BF16 (MXFP4 quantized checkpoint coming soon)

The MoE architecture means only a fraction of parameters are active per forward pass, which contributes to the model’s speed advantage over dense alternatives.
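As a back-of-the-envelope illustration — assuming GPT-OSS-20B's published figures of roughly 21B total and 3.6B active parameters per token (verify against the model card) — the active fraction works out to under a fifth of the weights:

```python
# Back-of-the-envelope MoE arithmetic. Parameter counts are assumptions
# based on GPT-OSS-20B's published figures; verify against the model card.
TOTAL_PARAMS = 20.9e9   # total parameters across all experts
ACTIVE_PARAMS = 3.6e9   # parameters active on a single forward pass

print(f"Active per forward pass: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
# -> roughly 17% of the weights do the work on any given token
```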


Key capabilities

Query decomposition

Context-1 breaks down complex multi-constraint questions into targeted subqueries, then executes them systematically rather than attempting to answer from a single retrieval pass.
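As a rough sketch of what that decomposition step could look like inside an agent loop (the prompt and the `llm_complete` helper are illustrative placeholders, not Chroma's actual interface):

```python
# Hypothetical decomposition step: split a multi-constraint question into
# targeted subqueries before retrieving anything. `llm_complete` is a
# placeholder for any chat-completion call, not Chroma's harness API.
import json

def decompose(question: str, llm_complete) -> list[str]:
    prompt = (
        "Break this question into independent search subqueries. "
        "Reply with a JSON list of strings only.\n"
        f"Question: {question}"
    )
    return json.loads(llm_complete(prompt))

# Each subquery then becomes its own retrieval pass:
#   for sub in decompose("Which EU regulations passed in 2024 affect both "
#                        "fintech lending and crypto custody?", llm_complete):
#       docs = search(sub)   # `search` is likewise a placeholder
```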

Parallel tool calling

The model averages 2.56 tool calls per turn, reducing the total number of turns required and minimizing end-to-end latency. This is a training outcome, not a prompt engineering trick — the model learned to batch its information-gathering steps.
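In harness terms, batching means dispatching every tool call from a turn concurrently, so turn latency tracks the slowest call rather than the sum. A minimal sketch, assuming a hypothetical `ToolCall` shape (the real harness interface is unreleased):

```python
# Sketch of batched tool execution: all tool calls emitted in one turn run
# concurrently via asyncio.gather. The ToolCall shape and run_tool stub are
# assumptions, not Chroma's actual API.
import asyncio
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

async def run_tool(call: ToolCall) -> str:
    await asyncio.sleep(0.1)  # stand-in for a real search/fetch backend
    return f"results for {call.name}({call.args})"

async def execute_turn(calls: list[ToolCall]) -> list[str]:
    # One await for the whole batch: latency ~= slowest call, not the sum.
    return await asyncio.gather(*(run_tool(c) for c in calls))

results = asyncio.run(execute_turn([
    ToolCall("web_search", {"q": "EU AI Act enforcement timeline"}),
    ToolCall("web_search", {"q": "EU AI Act fine structure"}),
]))
```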

Self-editing context

The most novel capability: Context-1 selectively removes irrelevant documents from its context window mid-search. This is measured at 0.94 context pruning accuracy, which means it almost always correctly identifies and discards documents that would pollute subsequent reasoning steps.

This solves a core RAG problem — context window pollution as more documents accumulate across hops — without requiring external re-ranking infrastructure.
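A minimal sketch of the loop this behavior implies, with a placeholder `judge_relevant` standing in for the model's learned pruning decision:

```python
# Sketch of a multi-hop loop with self-editing context. `judge_relevant`
# stands in for the model's learned pruning decision; in conventional RAG
# pipelines this role falls to an external re-ranker.

def search_loop(question, decompose, search, judge_relevant, answer):
    context: list[str] = []
    for sub in decompose(question):
        context.extend(search(sub))
        # Self-editing step: after each hop, drop accumulated documents
        # that would pollute the next hop's reasoning. At the reported
        # 0.94 pruning accuracy, ~6% of keep/drop decisions are wrong.
        context = [doc for doc in context if judge_relevant(doc, question)]
    return answer(question, context)
```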

Cross-domain generalization

Trained on web, legal, and finance tasks, the model generalizes to held-out domains and public benchmarks:

Benchmark           Type
BrowseComp-Plus     Web search
SealQA              Structured QA
FRAMES              Multi-hop reasoning
HLE                 Long-horizon evaluation

Performance

Metric                      Value
Retrieval quality           Comparable to frontier LLMs
Cost vs frontier            Fraction of cost
Speed vs frontier           Up to 10x faster
Context pruning accuracy    0.94
Avg tool calls per turn     2.56

Context-1 is positioned as a cost/speed alternative to using frontier models (GPT-4o, Claude, Gemini) for retrieval tasks — comparable quality at a fraction of the inference cost, with the speed advantage of a purpose-built model.


Important: agent harness required

Context-1 is designed to operate within a specific agent harness that manages:

  • Tool execution
  • Token budget tracking
  • Context pruning and deduplication
  • Multi-turn state

The harness is not yet public. Running the model standalone, without the harness, will not reproduce the benchmark results reported in the technical report. Chroma has announced that a full release is planned soon.

This means Context-1 is currently best understood as a preview of the architecture and training approach — practical deployment requires waiting for the harness release or building a compatible wrapper based on the technical report.
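Based purely on the four responsibilities listed above, a compatible wrapper might be skeletonized as follows (every name and interface here is a guess, not the unreleased harness):

```python
# Skeleton of a compatible wrapper, inferred from the four responsibilities
# above. Every interface is an assumption; the real harness is not yet
# public and may look nothing like this.

class Harness:
    def __init__(self, model, tools: dict, token_budget: int):
        self.model = model                # Context-1 behind a chat interface
        self.tools = tools                # name -> callable (tool execution)
        self.token_budget = token_budget  # token budget tracking
        self.messages: list[dict] = []    # multi-turn state

    def tokens_used(self) -> int:
        # Crude proxy (4 chars ~ 1 token); a real harness would tokenize.
        return sum(len(m["content"]) for m in self.messages) // 4

    def run(self, question: str) -> str:
        self.messages.append({"role": "user", "content": question})
        while self.tokens_used() < self.token_budget:
            turn = self.model.generate(self.messages)  # hypothetical API
            if turn.final_answer is not None:
                return turn.final_answer
            for call in turn.tool_calls:  # execute this turn's batch
                result = self.tools[call.name](**call.args)
                self.messages.append({"role": "tool", "content": result})
            # Context pruning and deduplication: honor the model's edits.
            keep = set(range(len(self.messages))) - set(turn.pruned_indices)
            self.messages = [m for i, m in enumerate(self.messages)
                             if i in keep]
        raise RuntimeError("token budget exhausted without a final answer")
```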


Quantization and variants

Seven quantized variants are available on the Hugging Face Hub for use with:

  • llama.cpp
  • LM Studio
  • Jan
  • Ollama

An MXFP4 quantized checkpoint is planned for release.
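For the llama.cpp family of runtimes, loading follows the usual GGUF pattern. A hedged example with llama-cpp-python, using an illustrative file name (check the actual quant repo for real paths):

```python
# Loading a GGUF quant with llama-cpp-python. The file name below is
# illustrative; substitute the actual path from the quant repo you use.
# Caveat from above: without the agent harness this is raw generation only
# and will not reproduce the reported benchmark numbers.
from llama_cpp import Llama

llm = Llama(
    model_path="./context-1-Q4_K_M.gguf",  # hypothetical file name
    n_ctx=8192,        # adjust to the model's actual context limit
    n_gpu_layers=-1,   # offload everything to GPU if VRAM allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content":
               "List subqueries for: which 2024 EU rules affect fintech?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```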


Limitations

  • Harness dependency — benchmark results require the proprietary agent harness, which is not yet public
  • No standalone usage — the model is not designed for general-purpose text generation; it is optimized for the search agent task
  • MoE complexity — Mixture of Experts models can be harder to deploy on limited hardware; check VRAM requirements against your infrastructure
  • Narrow training distribution — trained specifically on web, legal, and finance; performance on other domains may vary

Conclusion

Context-1 is an interesting architectural bet: instead of relying on a frontier model to power RAG, train a dedicated search agent that knows how to decompose, retrieve, and self-correct. The self-editing context mechanism is the standout design decision — it’s a learned behavior that replaces what most RAG pipelines solve with external re-rankers or strict context window management.

The main caveat is the harness dependency. Until the full release, Context-1 is primarily a research artifact that signals where purpose-built RAG agents are heading.

Model: chromadb/context-1
Technical report: Chroma Context-1: Training a Self-Editing Search Agent

