Mathy AI Substack
October 2025
Not all “long thinking” is good thinking.
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs, Nurit and Mike’s Daily Paper: 14.10.25
Oct 14 • Mike Erlihson, Mathy AI and Nurit Cohen Inger
The Geometry of Focus: Deconstructing Attention’s Limits
Limitations of Normalization in Attention Mechanism, Mike’s daily paper review: 13.10.25
Oct 13 • Mike Erlihson, Mathy AI
Your model won the benchmarks: is it that good?
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence, Shmulik and Mike’s Daily Paper Review: 12.10.25
Oct 12 • Mike Erlihson, Mathy AI and Shmulik Cohen
Recursive Reasoning Without the Cruft (Allegedly :) )
Less is More: Recursive Reasoning with Tiny Networks, Mike’s daily paper review: 11.10.25
Oct 10 • Mike Erlihson, Mathy AI
CompLLM: Slaying the Quadratic Dragon with Linear Segmentation
CompLLM: Compression for Long Context Q&A, Mike’s daily paper review: 10.10.25
Oct 9 • Mike Erlihson, Mathy AI
Gradient-Free LLM Tuning with Evolution Strategies
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning, Mike’s daily paper review: 08.10.25
Oct 8 • Mike Erlihson, Mathy AI
Is the Exploration-Exploitation “Trade-Off” Just a Measurement Problem?
Beyond the Exploration-Exploitation Trade-Off: A Hidden State Approach for LLM Reasoning in RLVR, Mike’s daily paper review: 06.10.25
Oct 6 • Mike Erlihson, Mathy AI
Rethinking the Forward Pass: A Compositional View of LLMs
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs, Mike’s daily paper review: 04.10.25
Oct 4 • Mike Erlihson, Mathy AI
The Sculptor in the Machine: Why Your Optimizer Isn’t Just a Race Car
Optimizers Qualitatively Alter Solutions, And We Should Leverage This, Mike’s Daily Paper: 02.10.25
Oct 2 • Mike Erlihson, Mathy AI
From Static X-Rays to Live MRI: A New Method to Watch AI Learn
Evolution of Concepts in Language Model Pre-Training, Mike’s Daily Paper: 30.09.25
Oct 1 • Mike Erlihson, Mathy AI
September 2025
The Gatekeepers of Attention: A Deceptively Simple Fix for a Foundational LLM Problem
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free, Mike’s Daily Paper: 26.09.25
Sep 27 • Mike Erlihson, Mathy AI
Beyond the Sequence: Time Series, Foundation Models, and the Abstraction of Dynamics, From 2024
Foundation Models for Time Series Analysis: A Tutorial and Survey, Mike’s Daily Paper: 23.09.25
Sep 23 • Mike Erlihson, Mathy AI