Samsung’s Compact AI Model Surpasses Large Language Models Like Gemini 2.5 Pro in Solving ARC-AGI Puzzles

While Samsung’s camera technology may currently lack significant breakthroughs, its advancements in artificial intelligence (AI) are noteworthy. The company’s latest AI initiative features a model that has impressively outperformed other large language models (LLMs), some of which are approximately 10,000 times its size.

Introducing Samsung’s Innovative Tiny Recursive Model

[Figure: TRM, one tiny network — diagram highlighting its 7M parameters and features such as self-correction and a minimal parameter count.]
  1. This model, known as the Tiny Recursive Model (TRM), is remarkably compact, comprising only 7 million parameters compared to the billions found in larger LLMs.
  2. TRM employs its output to guide its subsequent steps, effectively creating a self-improving feedback mechanism.
  3. By utilizing iterative reasoning on each output, it can emulate a deeper neural architecture without incurring the typical memory or computational overhead.
  4. Through each recursive cycle, the model enhances the accuracy of its predictions or results.
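The recursive loop described above can be sketched in a few lines of Python. The `refine` function and its `(question, answer, latent)` interface are illustrative assumptions, not Samsung’s actual code:

```python
def refine(network, question, answer, latent, num_cycles):
    """Feed the model's own output back in as input, cycle after cycle.

    `network` is any callable mapping (question, answer, latent) to an
    updated (answer, latent) pair -- a hypothetical interface.
    """
    for _ in range(num_cycles):
        # Conditioning on the previous answer lets the model spot and
        # correct its own mistakes (self-correction).
        answer, latent = network(question, answer, latent)
    return answer


# Toy stand-in "network": nudges the answer halfway toward a target,
# mimicking gradual self-correction on each recursive cycle.
def toy_network(question, answer, latent):
    return answer + 0.5 * (question - answer), latent


print(refine(toy_network, question=10.0, answer=0.0, latent=None, num_cycles=5))
```

Because the same weights are reused on every cycle, the depth of reasoning grows with the number of cycles while the parameter count stays fixed.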

Samsung’s strategy resembles the meticulous process of revising a written draft: the model iteratively identifies and corrects its own errors, a notable improvement over traditional LLMs, which often derail on logic challenges after a single misstep. Though chain-of-thought reasoning aids these models, its effectiveness remains fragile under pressure.

Key Takeaway: Embrace Simplicity

Initially, Samsung attempted to enhance the model’s complexity by increasing its layers; however, this approach led to overfitting and hindered generalization. Interestingly, a shift towards fewer layers combined with an increase in recursive iterations resulted in enhanced performance for the TRM.
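One way to see why this trade-off works: with weight sharing, recursive iterations add effective depth without adding parameters, whereas extra layers add both. The figures below are hypothetical, chosen only to illustrate the arithmetic, not TRM’s actual configuration:

```python
def effective_depth(layers, recursion_steps):
    # Weight-shared recursion: depth scales with steps, parameters do not.
    return layers * recursion_steps


def param_count(layers, params_per_layer):
    # Parameters grow only with distinct layers, not with recursion.
    return layers * params_per_layer


# A shallow recursive net and a deep feed-forward net reaching the same
# effective depth (hypothetical figures):
assert effective_depth(2, 16) == effective_depth(32, 1) == 32
# ...yet the recursive model carries 1/16 of the parameters:
assert param_count(2, 500_000) * 16 == param_count(32, 500_000)
```

A deeper stack must fit all those extra weights to the training data, which is what drove the overfitting Samsung observed; recursion buys depth without that cost.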

Performance Results

  1. Achieved an accuracy rate of 87.4% on Sudoku-Extreme, compared to only 55% for conventional Hierarchical Reasoning Models.
  2. Secured an 85% accuracy on Maze-Hard puzzles.
  3. Reached a 45% accuracy on ARC-AGI-1 challenges.
  4. Obtained an 8% accuracy on ARC-AGI-2 tasks.

Remarkably, Samsung’s TRM not only competes with but in many cases exceeds the performance of larger LLMs such as DeepSeek R1, Google’s Gemini 2.5 Pro, and OpenAI’s o3-mini, all while utilizing a fraction of their parameter count.

