Recursive Reasoning Without the Cruft (Allegedly :) )
Less is More: Recursive Reasoning with Tiny Networks, Mike’s daily paper review: 11.10.25
This paper introduces the Tiny Recursive Model (TRM), a sharp and compelling rebuttal to the recently proposed Hierarchical Reasoning Model (HRM). Where HRM built a complex, biologically inspired system, TRM systematically (more or less) dismantles its core components, replacing them with simpler and more effective mechanisms. The novelty here isn’t a single new technique, but a philosophical shift for a narrow problem domain: demonstrating that powerful iterative reasoning on puzzles can emerge from radical architectural minimalism.
First, TRM abandons the shaky theoretical scaffolding of its predecessor. HRM leans heavily on the Implicit Function Theorem to justify its gradient calculations, approximating a long recursive chain by only backpropagating through the final step. TRM correctly identifies this as a potential weak point, as the conditions for the theorem are unlikely to be met in practice. Its solution is brutally direct: define a full recursive block and backpropagate through the entire computational graph. This removes the need for fixed-point assumptions and grounds the learning process in a more stable, end-to-end optimization.
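The difference between the two gradient schemes can be sketched with a scalar toy recursion; the tanh update, constants, and step count below are illustrative stand-ins, not the paper's actual network. Full backprop accumulates the chain rule through every recursion step, while the one-step (implicit-function-style) approximation treats the penultimate state as a constant and differentiates only the final application:

```python
import math

def f(z, w):
    # One recursion step: a simple nonlinear update (a stand-in for the network).
    return math.tanh(w * z + 0.5)

def rollout(z0, w, n):
    z = z0
    for _ in range(n):
        z = f(z, w)
    return z

def full_grad(z0, w, n):
    # TRM-style: backpropagate through every step of the recursion.
    zs = [z0]
    for _ in range(n):
        zs.append(f(zs[-1], w))
    grad, dz = 0.0, 1.0  # dz carries dL/dz_k backwards, with L = z_n
    for k in range(n, 0, -1):
        d_tanh = 1.0 - math.tanh(w * zs[k - 1] + 0.5) ** 2
        grad += dz * d_tanh * zs[k - 1]  # direct dependence of step k on w
        dz *= d_tanh * w                 # carry the gradient into step k-1
    return grad

def one_step_grad(z0, w, n):
    # HRM-style approximation: treat z_{n-1} as a constant and
    # differentiate only the last step.
    z_prev = rollout(z0, w, n - 1)
    return (1.0 - math.tanh(w * z_prev + 0.5) ** 2) * z_prev
```

Checking `full_grad` against a finite difference shows it is the exact gradient, while the one-step version leaves a systematic residual; in this toy it is small, but nothing guarantees it stays small when the fixed-point conditions fail.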
Second, the paper offers a reinterpretation of HRM’s latent space. HRM posits two distinct latent variables, zL and zH, corresponding to “low-level” and “high-level” reasoning. TRM’s novelty is to re-conceptualize this into something more intuitive: a current best-guess answer (y) and a latent reasoning scratchpad (z). The model’s single network first iterates on the reasoning latent z, then uses the refined z to update its answer y. This demystifies the process, transforming it from an abstract hierarchy into a concrete, iterative refinement loop.
Building on this simplification, TRM collapses HRM’s two separate networks into a single, unified one. The insight is that the network’s task, either refining reasoning or updating the answer, can be implicitly defined by its inputs. The presence of the original question x signals a reasoning step, while its absence signals an answer-update step. This input-driven conditioning allows a single set of weights to perform both functions, halving the parameter count.
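The control flow of one recursive block, with a single network doing double duty, can be sketched as follows. The arithmetic inside `net` and the step count are purely illustrative assumptions standing in for the actual two-layer network; only the structure (iterate on z with x present, then update y with x absent) mirrors the paper:

```python
def net(x, y, z):
    # Stand-in for TRM's single tiny network (hypothetical toy arithmetic,
    # not the paper's architecture). The presence or absence of the
    # question x selects which function the same weights compute.
    if x is not None:
        return z + 0.5 * (x - y)  # reasoning step: refine the scratchpad z
    return y + z                  # answer step: fold the scratchpad into y

def trm_block(x, y, z, n_reason=3):
    # One recursive block: several reasoning updates on z, then one answer update on y.
    for _ in range(n_reason):
        z = net(x, y, z)
    y = net(None, y, z)
    return y, z
```

The same `trm_block` can then be applied repeatedly, with the full computational graph kept for backprop, which is exactly where the single-network design pays off: one set of weights, two roles, selected entirely by the inputs.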
Perhaps the most revealing, and ultimately limiting, novelty is the paper’s “less is more” discovery. Instead of scaling up, TRM finds that shrinking the network from four layers to just two, while increasing internal recursions, yields superior generalization. While framed as a strength, this is a significant red flag. In an era dominated by scaling laws, this “optimal tininess” casts serious doubt on the approach’s scalability and relevance to real-world problems. It strongly suggests that, for the heavily augmented puzzle data tested, the model is merely a brittle memorization machine, whose main challenge is avoiding overfitting, not developing true reasoning capacity. This severely limits its applicability to complex, open-ended tasks where larger models are essential.
In essence, TRM’s contribution is an elegant but narrow exercise in simplification for a niche set of problems. Its success on these highly structured puzzles, propped up by extreme data augmentation, should be viewed with significant skepticism. The model’s inverse scaling behavior confirms it is a highly specialized, and likely fragile, tool, not a general-purpose reasoner. While recursion is a powerful concept, the path to complex, human-like reasoning requires architectures that can effectively scale with both data and complexity. TRM, while a clever piece of engineering, is ultimately a detour, not a destination on that path.
https://arxiv.org/abs/2510.04871