Mathy AI Substack
October 2025
Not all “long thinking” is good thinking.
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs, Nurit and Mike’s Daily Paper: 14.10.25
Oct 14 • Mike Erlihson, Mathy AI and Nurit Cohen Inger
The Geometry of Focus: Deconstructing Attention’s Limits
Limitations of Normalization in Attention Mechanism, Mike’s daily paper review: 13.10.25
Oct 13 • Mike Erlihson, Mathy AI
Your model won the benchmarks: is it that good?
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence, Shmulik and Mike’s Daily Paper Review: 12.10.25
Oct 12 • Mike Erlihson, Mathy AI and Shmulik Cohen
Recursive Reasoning Without the Cruft (Allegedly :) )
Less is More: Recursive Reasoning with Tiny Networks, Mike’s daily paper review: 11.10.25
Oct 10 • Mike Erlihson, Mathy AI
CompLLM: Slaying the Quadratic Dragon with Linear Segmentation
CompLLM: Compression for Long Context Q&A, Mike’s daily paper review: 10.10.25
Oct 9 • Mike Erlihson, Mathy AI
Gradient-Free LLM Tuning with Evolution Strategies
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning, Mike’s daily paper review: 08.10.25
Oct 8 • Mike Erlihson, Mathy AI
Is the Exploration-Exploitation “Trade-Off” Just a Measurement Problem?
Beyond the Exploration-Exploitation Trade-Off: A Hidden State Approach for LLM Reasoning in RLVR, Mike’s daily paper review: 06.10.25
Oct 6 • Mike Erlihson, Mathy AI
Rethinking the Forward Pass: A Compositional View of LLMs
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs, Mike’s daily paper review: 04.10.25
Oct 4 • Mike Erlihson, Mathy AI
The Sculptor in the Machine: Why Your Optimizer Isn’t Just a Race Car
Optimizers Qualitatively Alter Solutions, And We Should Leverage This, Mike’s Daily Paper: 02.10.25
Oct 2 • Mike Erlihson, Mathy AI
From Static X-Rays to Live MRI: A New Method to Watch AI Learn
Evolution of Concepts in Language Model Pre-Training, Mike’s Daily Paper: 30.09.25
Oct 1 • Mike Erlihson, Mathy AI
September 2025
The Gatekeepers of Attention: A Deceptively Simple Fix for a Foundational LLM Problem
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free, Mike’s Daily Paper: 26.09.25
Sep 27 • Mike Erlihson, Mathy AI
Beyond the Sequence: Time Series, Foundation Models, and the Abstraction of Dynamics, From 2024
Foundation Models for Time Series Analysis: A Tutorial and Survey, Mike’s Daily Paper: 23.09.25
Sep 23 • Mike Erlihson, Mathy AI