ML Notes — English

Small Recursive Reasoning Models

Naoto Iwase — Sun, 24 May 2026 00:00:00 GMT

A separate lineage of reasoning models came into focus in 2025–2026, built around small neural networks of a few million to a few tens of millions of parameters that are recursively unrolled at test time. Five papers anchor this line of work: Hierarchical Reasoning Model (HRM), Tiny Recursive Model (TRM), Probabilistic Tiny Recursive Model (PTRM), Generative Recursive reAsoning Models (GRAM), and Lattice Deduction Transformers (LDT). All reach competitive accuracy on Sudoku and the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) from only a thousand or so training examples, and on certain tasks claim to surpass 671B-parameter frontier Large Language Models (LLMs).

This book examines the recursive reasoning model research program from all sides: technical content, prior art, evaluation, and critique. The five main models (HRM, TRM, PTRM, GRAM, LDT) each have a dedicated chapter; four supporting chapters cover the lineage, the broader landscape of latent reasoning, the comparison with chain-of-thought (CoT) scaling, and the ARC-AGI competitive context; and two final chapters provide an implementation guide and a roundup of open problems.

Reliable Reasoning

Naoto Iwase — Tue, 19 May 2026 00:00:00 GMT

Research on reliably eliciting the reasoning ability of Large Language Models (LLMs) accelerated through 2025–2026. This book organizes that literature along three axes: training-side signals (RLVR, GRPO, Process Reward Models), inference-side signals (self-consistency, confidence, test-time scaling), and structural approaches (tree search, reasoning structure analysis, diffusion LLMs). It covers more than 190 recent works from ICLR 2026, ACL 2026, ICML 2026, NeurIPS 2025, EMNLP 2025, and beyond.

Three questions run through the book:

Q1: Does RLVR genuinely expand the capabilities of the base model, or does it merely re-weight existing capabilities?
Q2: How can we estimate the correctness of a reasoning trace without access to ground truth?
Q3: Where in the inference budget — depth, width, or search — should the limited compute be invested?

Multiple research lines that developed independently around these questions began to intersect rapidly during 2025–2026.

Diffusion Language Models

Naoto Iwase — Fri, 15 May 2026 00:00:00 GMT

Diffusion Language Models (DLLMs) bring the ideas behind the diffusion models that succeeded in image generation into language modeling. Recent years have seen large-scale implementations such as LLaDA and Dream, alongside commercial-grade systems including Mercury and Gemini Diffusion. This book consolidates the key references needed to understand modern DLLMs and integrates the taxonomy presented in the Li et al. 2025 survey. It systematically covers formulation, sampling, the correspondence with continuous diffusion, adaptation from AR models, derivative discrete models, hybrid architectures, inference acceleration, guidance, post-training, multimodal extensions, and downstream applications.

One-Step Generation

Naoto Iwase — Wed, 11 Feb 2026 00:00:00 GMT

Between 2025 and 2026, methods that generate high-quality images with a single network evaluation (1-NFE), overcoming the multi-step inference of diffusion models and Flow Matching, have advanced rapidly. This series curates four papers driving this field, tracing the technical evolution from extensions of Flow Matching to entirely new paradigms.

Molmo2

Naoto Iwase — Tue, 03 Feb 2026 00:00:00 GMT

Molmo2 (Multimodal Open Language Model 2) is a fully open Vision-Language Model (VLM) family developed by the Allen Institute for AI (AI2) and the University of Washington. Its key distinguishing feature is video grounding capability, which enables the model to precisely indicate “when and where” specific events or objects occur within a video.

Using nine new datasets (constructed entirely without relying on proprietary models), Molmo2 achieves state-of-the-art performance among open-source models. In video pointing and tracking, it also surpasses proprietary models such as Gemini 3 Pro.

Paper: arXiv:2601.10611

Code: github.com/allenai/molmo2

Demo: playground.allenai.org

Olmo 3

Naoto Iwase — Mon, 02 Feb 2026 00:00:00 GMT

Olmo 3 is a family of state-of-the-art, fully open language models at the 7B and 32B parameter scales developed by the Allen Institute for AI (AI2). This release includes the entire Model Flow: the full lifecycle of the model family, covering every stage, checkpoint, data point, and dependency used to build it.

Paper: arXiv:2512.13961