Olmo 3

LLM
Reasoning
Fully open language and reasoning models (7B/32B)
Author

Naoto Iwase

Published

February 2, 2026

Olmo 3 is a family of state-of-the-art, fully open language models at the 7B and 32B parameter scales, developed by the Allen Institute for AI (AI2). The release includes the entire model flow: every stage, checkpoint, data point, and dependency used to build the family, across the full model lifecycle.

Paper: arXiv:2512.13961

Contents

Base Model Training

Post-training

Model Variants

Olmo 3 Base: Foundation model (7B, 32B) — the strongest fully-open Base model

Olmo 3 Think: Reasoning model with step-by-step thinking — outperforms Qwen 2.5, Gemma 2/3, and DeepSeek R1

Olmo 3 Instruct: A model that generates concise, direct responses — optimized for function calling

Olmo 3 RL-Zero: Trained with RL directly from Base — fully open RL benchmark

Key Results

Key benchmark results for Olmo 3.1 Think 32B:

Category               Benchmark          Score
---------------------  -----------------  -----
Math                   MATH                96.2
Math                   AIME 2024           80.6
Reasoning              BigBenchHard        88.6
Reasoning              ZebraLogic          80.1
Coding                 HumanEvalPlus       91.5
Coding                 LiveCodeBench v3    83.3
Instruction following  IFEval              93.8
Knowledge              MMLU                86.4

Training Cost

Approximately 56 days using 1,024 H100 GPUs (estimated cost: $2.75M)

  • Pretraining: ~47 days
  • Post-training: ~9 days
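As a back-of-the-envelope sanity check, the figures above imply a per-GPU-hour rate. Note the ~$2/hour figure derived below is an inference from the quoted numbers, not a rate stated in the release:

```python
# Sanity check on the quoted training budget for Olmo 3.
DAYS = 56          # total wall-clock training time (pretraining + post-training)
GPUS = 1024        # H100 GPUs
COST_USD = 2.75e6  # estimated total cost

gpu_hours = DAYS * 24 * GPUS   # total GPU-hours consumed
rate = COST_USD / gpu_hours    # implied price per H100-hour

print(f"{gpu_hours:,} GPU-hours, ~${rate:.2f}/GPU-hour")
# → 1,376,256 GPU-hours, ~$2.00/GPU-hour
```

The implied ~$2 per H100-hour is in line with typical bulk cloud pricing, which suggests the $2.75M estimate is internally consistent with the 56-day, 1,024-GPU figure.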

Open Artifacts

All intermediate checkpoints, training data, code, and evaluation tools are released:

  • Models: All checkpoints for Base, Think, Instruct, and RL-Zero
  • Data: Dolma 3 (pretraining), Dolci (post-training)
  • Code: OLMo-core, Open Instruct, duplodocus, OLMES

Core philosophy: To truly advance open-source AI, it is necessary to make not just the final model but the entire “path” to it transparent and accessible.