Chroma Context-1: the 20B agentic search model that edits its own context
- Bastien
- 03 Apr, 2026
What is Chroma Context-1?
Chroma Context-1 is a 20B-parameter Mixture-of-Experts model built specifically for agentic search: retrieval tasks that require multiple hops, query decomposition, and self-correction. It is released by Chroma (the company behind the open-source vector database) under the Apache 2.0 license.
The model’s defining feature is self-editing context: it can selectively prune irrelevant documents from its own context window during multi-step retrieval, maintaining quality across long search horizons without the context bloat that typically degrades RAG pipelines.
Architecture
- Base model: GPT-OSS-20B (Mixture of Experts)
- Training: SFT + RL via CISPO (a reinforcement learning objective), with staged curriculum learning across web, legal, and finance domains
- Precision: BF16 (MXFP4 quantized checkpoint coming soon)
The MoE architecture means only a fraction of parameters are active per forward pass, which contributes to the model’s speed advantage over dense alternatives.
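The sparse-activation idea can be illustrated with a toy top-k gating router. This is a simplified sketch, not Context-1's actual routing code; the expert count, `k`, and logits here are made up for illustration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_logits, k=2):
    """Pick the top-k experts for one token; only those experts run,
    so only a fraction of the expert parameters are active per pass."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the chosen experts' weights so they sum to 1.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# 8 hypothetical experts, 2 active per token.
chosen = route([0.1, 2.3, -0.5, 1.7, 0.0, -1.2, 0.4, 0.9], k=2)
print(chosen)  # two (expert_index, weight) pairs
```

With 2 of 8 experts active, only a quarter of the expert parameters participate in each forward pass, which is where the speed advantage over a dense model of the same total size comes from.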
Key capabilities
Query decomposition
Context-1 breaks down complex multi-constraint questions into targeted subqueries, then executes them systematically rather than attempting to answer from a single retrieval pass.
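The pattern looks roughly like the following sketch. In Context-1 the decomposition is learned by the model itself; here the subqueries, the stub retriever, and the corpus are all hand-written stand-ins:

```python
def search(query, corpus):
    """Stub retriever: return ids of docs whose text mentions the query."""
    return {doc_id for doc_id, text in corpus.items()
            if query.lower() in text.lower()}

def answer_multi_constraint(subqueries, corpus):
    """Run one targeted retrieval pass per subquery, then intersect the
    hits: a document must satisfy every constraint to survive."""
    hits = [search(q, corpus) for q in subqueries]
    return set.intersection(*hits) if hits else set()

corpus = {
    "doc1": "Acme, a Berlin fintech, announced its Series B in 2024.",
    "doc2": "Acme opened a London office in 2023.",
    "doc3": "Globex raised a Series B in 2021.",
}
print(answer_multi_constraint(["fintech", "Series B", "2024"], corpus))  # {'doc1'}
```

A single retrieval pass for the full question would have to hope one document matches every constraint at once; the decomposed version can combine evidence across passes.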
Parallel tool calling
The model averages 2.56 tool calls per turn, reducing the total number of turns required and minimizing end-to-end latency. This is a training outcome, not a prompt engineering trick — the model learned to batch its information-gathering steps.
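Why batching matters for latency can be shown with a thread pool. This is a generic illustration of concurrent tool execution, not Chroma's harness; the `web_search` stub and its 0.1 s delay are invented:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def web_search(query):
    """Stub tool: pretend each call costs 0.1 s of network latency."""
    time.sleep(0.1)
    return f"results for {query!r}"

def run_turn(tool_calls):
    """Execute every tool call emitted in one turn concurrently, so a
    turn with three calls costs roughly one call's latency, not three."""
    with ThreadPoolExecutor(max_workers=len(tool_calls)) as pool:
        return list(pool.map(web_search, tool_calls))

start = time.perf_counter()
results = run_turn(["Acme Series B", "Acme headquarters", "Acme founding year"])
elapsed = time.perf_counter() - start
print(f"{len(results)} results in {elapsed:.2f}s")  # ~0.1 s, not ~0.3 s
```

A model that emits one call per turn pays the full round trip (model inference plus tool latency) once per fact; batching amortizes it.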
Self-editing context
The most novel capability: Context-1 selectively removes irrelevant documents from its context window mid-search. Measured context pruning accuracy is 0.94, meaning the model almost always correctly identifies and discards documents that would pollute subsequent reasoning steps.
This solves a core RAG problem — context window pollution as more documents accumulate across hops — without requiring external re-ranking infrastructure.
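A minimal sketch of the pruning step, using crude lexical overlap as a stand-in for whatever learned relevance signal Context-1 applies internally (the scoring function, threshold, and documents are all assumptions for illustration):

```python
def overlap_score(doc, query):
    """Fraction of query words present in the doc: a crude proxy for
    the learned relevance judgment a trained model would make."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def prune_context(docs, query, threshold=0.5):
    """Drop accumulated documents that no longer look relevant to the
    current subgoal, keeping the context window from bloating."""
    return [d for d in docs if overlap_score(d, query) >= threshold]

docs = [
    "series b funding round closed in 2024",
    "the company cafeteria menu for march",
    "berlin fintech announces series b",
]
kept = prune_context(docs, "series b 2024")
print(kept)  # cafeteria doc is dropped
```

The key difference from an external re-ranker is where the decision lives: here it is a bolt-on scoring pass, whereas in Context-1 the model itself learned when to discard.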
Cross-domain generalization
Trained on web, legal, and finance tasks, the model generalizes to held-out domains and public benchmarks:
| Benchmark | Type |
|---|---|
| BrowseComp-Plus | Web search |
| SealQA | Search-augmented QA |
| FRAMES | Multi-hop reasoning |
| HLE (Humanity's Last Exam) | Hard academic QA |
Performance
| Metric | Value |
|---|---|
| Retrieval quality | Comparable to frontier LLMs |
| Cost vs frontier | Fraction of cost |
| Speed vs frontier | Up to 10x faster |
| Context pruning accuracy | 0.94 |
| Avg tool calls per turn | 2.56 |
Context-1 is positioned as a cost/speed alternative to using frontier models (GPT-4o, Claude, Gemini) for retrieval tasks — comparable quality at a fraction of the inference cost, with the speed advantage of a purpose-built model.
Important: agent harness required
Context-1 is designed to operate within a specific agent harness that manages:
- Tool execution
- Token budget tracking
- Context pruning and deduplication
- Multi-turn state
The harness is not yet public. Running the model standalone, without the harness, will not reproduce the benchmark results reported in the technical report. Chroma has announced that a full release is planned soon.
This means Context-1 is currently best understood as a preview of the architecture and training approach — practical deployment requires waiting for the harness release or building a compatible wrapper based on the technical report.
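For readers considering a compatible wrapper, a skeleton covering the four listed responsibilities might look like this. This is a hypothetical sketch, not Chroma's harness: the class name, the word-count token proxy, and the oldest-first pruning policy are all assumptions:

```python
class MiniHarness:
    """Hypothetical wrapper skeleton: tool execution, token budget
    tracking, context pruning/deduplication, and multi-turn state."""

    def __init__(self, token_budget=8000):
        self.token_budget = token_budget
        self.context = []   # accumulated retrieved documents
        self.history = []   # multi-turn state
        self.seen = set()   # for deduplication

    def tokens_used(self):
        # Crude proxy: whitespace word count instead of a real tokenizer.
        return sum(len(doc.split()) for doc in self.context)

    def add_documents(self, docs):
        for doc in docs:
            if doc not in self.seen:   # deduplication
                self.seen.add(doc)
                self.context.append(doc)
        while self.tokens_used() > self.token_budget:
            self.context.pop(0)        # naive pruning: drop oldest first

    def step(self, model_action, tools):
        """Execute one model-issued tool call and record the turn."""
        name, arg = model_action
        result = tools[name](arg)
        self.add_documents(result)
        self.history.append((model_action, len(result)))
        return result

tools = {"search": lambda q: ["doc about " + q]}
harness = MiniHarness(token_budget=10)
harness.step(("search", "series b"), tools)
harness.step(("search", "series b"), tools)  # duplicate result is dropped
print(harness.context)
```

A real wrapper would need a proper tokenizer, the model's actual tool-call format, and the pruning policy the technical report describes; the point here is only the shape of the loop.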
Quantization and variants
Seven quantized variants are available on the Hub for use with:
- llama.cpp
- LM Studio
- Jan
- Ollama
An MXFP4 quantized checkpoint is planned for release.
Limitations
- Harness dependency — benchmark results require the proprietary agent harness, which is not yet public
- No standalone usage — the model is not designed for general-purpose text generation; it is optimized for the search agent task
- MoE complexity — Mixture of Experts models can be harder to deploy on limited hardware; check VRAM requirements against your infrastructure
- Narrow training distribution — trained specifically on web, legal, and finance; performance on other domains may vary
Conclusion
Context-1 is an interesting architectural bet: instead of relying on a frontier model to power RAG, train a dedicated search agent that knows how to decompose, retrieve, and self-correct. The self-editing context mechanism is the standout design decision — it’s a learned behavior that replaces what most RAG pipelines solve with external re-rankers or strict context window management.
The main caveat is the harness dependency. Until the full release, Context-1 is primarily a research artifact that signals where purpose-built RAG agents are heading.
Model: chromadb/context-1
Technical report: Chroma Context-1: Training a Self-Editing Search Agent
Tags:
- AI
- Chroma
- RAG
- Search
- Agents
- Open Source