Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000X larger — on specific problems

The trend of AI researchers developing new, small open source generative models that outperform far larger, proprietary peers continued this week with yet another staggering advancement.

Alexia Jolicoeur-Martineau, Senior AI Researcher at Samsung's Advanced Institute of Technology (SAIT) in Montreal, Canada, has introduced the Tiny Recursion Model (TRM) — a neural network so small it contains just 7 million parameters (internal model settings), yet it competes with or surpasses cutting-edge language models 10,000 times larger in terms of their parameter count, including OpenAI's o3-mini and Google's Gemini 2.5 Pro, on some of the toughest reasoning benchmarks in AI research.

The goal is to show that highly performant new AI models can be created affordably, without massive investments in the graphics processing units (GPUs) and power needed to train the larger, multi-trillion-parameter flagship models powering many LLM chatbots today. The results are described in a research paper published on the open-access website arxiv.org, titled "Less is More: Recursive Reasoning with Tiny Networks."

"The idea that one must rely on massive foundational models trained for millions of dollars by some big corporation in order to solve hard tasks is a trap," wrote Jolicoeur-Martineau on the social network X. "Currently, there is too much focus on exploiting LLMs rather than devising and expanding new lines of direction."

Jolicoeur-Martineau also added: "With recursive reasoning, it turns out that 'less is more'. A tiny model pretrained from scratch, recursing on itself and updating its answers over time, can achieve a lot without breaking the bank."

TRM's code is available now on GitHub under an enterprise-friendly MIT license, meaning anyone from researchers to companies can take it, modify it, and deploy it for their own purposes, including commercial applications.

One Big Caveat

However, readers should be aware that TRM was designed specifically to perform well on structured, visual, grid-based problems like Sudoku, mazes, and puzzles from the ARC-AGI (Abstraction and Reasoning Corpus) benchmark, the latter of which offers tasks that are easy for humans but difficult for AI models, such as sorting colors on a grid based on a prior, but not identical, solution.

From Hierarchy to Simplicity

The TRM architecture represents a radical simplification.

It builds upon a technique called the Hierarchical Reasoning Model (HRM), introduced earlier this year, which showed that small networks could tackle logical puzzles like Sudoku and mazes.

HRM relied on two cooperating networks—one operating at high frequency, the other at low—supported by biologically inspired arguments and mathematical justifications involving fixed-point theorems. Jolicoeur-Martineau found this unnecessarily complicated.

TRM strips these elements away. Instead of two networks, it uses a single two-layer model that recursively refines its own predictions.

The model begins with an embedded question x, an initial answer y, and a latent reasoning state z. Through a series of reasoning steps, it updates the latent representation z and refines the answer y until it converges on a stable output. Each iteration corrects potential errors from the previous step, yielding a self-improving reasoning process without extra hierarchy or mathematical overhead.
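
In rough code terms, that loop can be sketched as follows. This is a minimal, PyTorch-style illustration of the idea rather than the official TRM implementation; the TinyRecursiveReasoner name, the layer sizes, and the six latent updates per pass are placeholder choices made for readability.

```python
import torch
import torch.nn as nn

class TinyRecursiveReasoner(nn.Module):
    """Minimal sketch of TRM-style recursive refinement (not the official code)."""

    def __init__(self, dim: int = 128, vocab_size: int = 10):
        super().__init__()
        self.dim = dim
        self.embed = nn.Embedding(vocab_size, dim)   # shared embedding for question x and answer y
        self.core = nn.Sequential(                   # one tiny two-layer network, reused recursively
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.head = nn.Linear(dim, vocab_size)       # projects the latent state back to answer tokens

    def forward(self, x_tokens, y_tokens, z, n_latent_steps: int = 6):
        x = self.embed(x_tokens)                     # embedded question
        y = self.embed(y_tokens)                     # embedded current answer
        for _ in range(n_latent_steps):              # repeatedly update the latent reasoning state z
            z = z + self.core(torch.cat([x, y, z], dim=-1))
        logits = self.head(z)                        # refine the answer y from the updated z
        return logits.argmax(dim=-1), z, logits
```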

How Recursion Replaces Scale

The core idea behind TRM is that recursion can substitute for depth and size.

By iteratively reasoning over its own output, the network effectively simulates a much deeper architecture without the associated memory or computational cost. This recursive cycle, run over as many as sixteen supervision steps, allows the model to make progressively better predictions — similar in spirit to how large language models use multi-step “chain-of-thought” reasoning, but achieved here with a compact, feed-forward design.

The simplicity pays off in both efficiency and generalization. The model uses fewer layers, no fixed-point approximations, and no dual-network hierarchy. A lightweight halting mechanism decides when to stop refining, preventing wasted computation while maintaining accuracy.
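
Put together, the outer loop might look like the sketch below, which reuses the hypothetical TinyRecursiveReasoner from the earlier snippet and adds an equally hypothetical halt_head (a single linear layer) to stand in for the lightweight halting mechanism; the 16-step cap mirrors the supervision steps described above, while the 0.5 threshold is purely illustrative.

```python
import torch
import torch.nn as nn

# Assumes the TinyRecursiveReasoner sketch from the earlier snippet is in scope.

@torch.no_grad()
def solve(model, halt_head, x_tokens, max_steps: int = 16):
    """Run up to max_steps improvement passes, stopping early once the halting score is confident."""
    y_tokens = torch.zeros_like(x_tokens)                  # start from a blank answer
    z = torch.zeros(*x_tokens.shape, model.dim)            # initial latent reasoning state
    for _ in range(max_steps):
        y_tokens, z, _ = model(x_tokens, y_tokens, z)      # one recursive refinement pass
        if torch.sigmoid(halt_head(z).mean()) > 0.5:       # lightweight halting check
            break
    return y_tokens

model = TinyRecursiveReasoner()
halt_head = nn.Linear(model.dim, 1)                        # learned halting signal (illustrative)
puzzle = torch.randint(0, 10, (1, 81))                     # e.g. a flattened 9x9 Sudoku grid
answer = solve(model, halt_head, puzzle)
```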

Performance That Punches Above Its Weight

Despite its small footprint, TRM delivers benchmark results that rival or exceed those of models thousands of times larger. In testing, the model achieved:

  • 87.4% accuracy on Sudoku-Extreme (up from 55% for HRM)

  • 85% accuracy on Maze-Hard puzzles

  • 45% accuracy on ARC-AGI-1

  • 8% accuracy on ARC-AGI-2

These results surpass or closely match performance from several high-end large language models, including DeepSeek R1, Gemini 2.5 Pro, and o3-mini, despite TRM using less than 0.01% of their parameters.

Such results suggest that recursive reasoning, not scale, may be the key to handling abstract and combinatorial reasoning problems — domains where even top-tier generative models often stumble.

Design Philosophy: Less Is More

TRM’s success stems from deliberate minimalism. Jolicoeur-Martineau found that reducing complexity led to better generalization.

When the researcher increased layer count or model size, performance declined due to overfitting on small datasets.

By contrast, the two-layer structure, combined with recursive depth and deep supervision, achieved optimal results.

The model also performed better when self-attention was replaced with a simpler multilayer perceptron on tasks with small, fixed contexts like Sudoku.

For larger grids, such as ARC puzzles, self-attention remained valuable. These findings underline that model architecture should match data structure and scale rather than default to maximal capacity.
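
As a rough sketch of that trade-off, a fixed 81-cell Sudoku grid can be mixed with a small multilayer perceptron applied across the cells, while variable-size ARC grids are a more natural fit for self-attention. The snippet below is a generic illustration of the two options rather than TRM's actual layers; the MLPTokenMixer name and its hidden size are invented for the example.

```python
import torch
import torch.nn as nn

class MLPTokenMixer(nn.Module):
    """Hypothetical MLP-Mixer-style layer: mixes information across a fixed-length
    sequence (e.g. the 81 cells of a Sudoku grid) without self-attention."""

    def __init__(self, seq_len: int, hidden: int = 256):
        super().__init__()
        self.mix = nn.Sequential(nn.Linear(seq_len, hidden), nn.GELU(), nn.Linear(hidden, seq_len))

    def forward(self, x):                                       # x: (batch, seq_len, dim)
        return x + self.mix(x.transpose(1, 2)).transpose(1, 2)  # mix across cells, with a residual

# For larger or variable grids such as ARC puzzles, standard self-attention is the more natural choice:
attention_mixer = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
```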

Training Small, Thinking Big

TRM is now officially available as open source under an MIT license on GitHub.

The repository includes full training and evaluation scripts, dataset builders for Sudoku, Maze, and ARC-AGI, and reference configurations for reproducing the published results.

It also documents compute requirements ranging from a single NVIDIA L40S GPU for Sudoku training to multi-GPU H100 setups for ARC-AGI experiments.

The open release confirms that TRM is designed specifically for structured, grid-based reasoning tasks rather than general-purpose language modeling.

Each benchmark — Sudoku-Extreme, Maze-Hard, and ARC-AGI — uses small, well-defined input–output grids, aligning with the model’s recursive supervision process.

Training involves substantial data augmentation (such as color permutations and geometric transformations), underscoring that TRM’s efficiency lies in its parameter size rather than total compute demand.
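
For grid puzzles, that kind of augmentation can be as simple as random rotations and flips plus a consistent relabeling of colors. The helper below is a generic illustration in that spirit, not the repository's actual dataset-builder code; the function name and the 0–9 color range are assumptions.

```python
import numpy as np

def augment_grid(grid: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Illustrative grid augmentation: random rotation/flip plus a consistent color permutation."""
    grid = np.rot90(grid, k=int(rng.integers(4)))   # random 90-degree rotation
    if rng.random() < 0.5:
        grid = np.fliplr(grid)                      # random horizontal flip
    perm = rng.permutation(10)                      # remap color labels 0..9 consistently
    return perm[grid]

rng = np.random.default_rng(0)
puzzle = rng.integers(0, 10, size=(9, 9))           # a toy 9x9 grid with values 0..9
augmented = augment_grid(puzzle, rng)
```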

The model’s simplicity and transparency make it more accessible to researchers outside of large corporate labs. Its codebase builds directly on the earlier Hierarchical Reasoning Model framework but removes HRM’s biological analogies, multiple network hierarchies, and fixed-point dependencies.

In doing so, TRM offers a reproducible baseline for exploring recursive reasoning in small models — a counterpoint to the dominant “scale is all you need” philosophy.

Community Reaction

The release of TRM and its open-source codebase prompted an immediate debate among AI researchers and practitioners on X. While many praised the achievement, others questioned how broadly its methods could generalize.

Supporters hailed TRM as proof that small models can outperform giants, calling it “10,000× smaller yet smarter” and a potential step toward architectures that think rather than merely scale.

Critics countered that TRM’s domain is narrow — focused on bounded, grid-based puzzles — and that its compute savings come mainly from size, not total runtime.

Researcher Yunmin Cha noted that TRM's training depends on heavy augmentation and recursive passes: "more compute, same model."

Cancer geneticist and data scientist Chey Loveday stressed that TRM is a solver, not a chat model or text generator: it excels at structured reasoning but not open-ended language.

Machine learning researcher Sebastian Raschka positioned TRM as an important simplification of HRM rather than a new form of general intelligence.

He described its process as “a two-step loop that updates an internal reasoning state, then refines the answer.”

Several researchers, including Augustin Nabele, agreed that the model’s strength lies in its clear reasoning structure but noted that future work would need to show transfer to less-constrained problem types.

The consensus emerging online is that TRM may be narrow, but its message is broad: careful recursion, not constant expansion, could drive the next wave of reasoning research.

Looking Ahead

While TRM currently applies to supervised reasoning tasks, its recursive framework opens several future directions. Jolicoeur-Martineau has suggested exploring generative or multi-answer variants, where the model could produce multiple possible solutions rather than a single deterministic one.

Another open question involves scaling laws for recursion — determining how far the “less is more” principle can extend as model complexity or data size grows.

Ultimately, the study offers both a practical tool and a conceptual reminder: progress in AI need not depend on ever-larger models. Sometimes, teaching a small network to think carefully — and recursively — can be more powerful than making a large one think once.


