Compression Is All You Need
Inside a new Freedman paper: a Googol hidden in 100 tokens, and why mathematics is a three-thousand-year AlphaZero run.
In March this year, Michael Freedman, who won the Fields Medal back in 1986, published a paper with a few collaborators. The title is brash: Compression Is All You Need: Modeling Mathematics. I am borrowing it for this essay, because once you see what they measured, no other title does the job.
They did something that sounds dull at first.
They took MathLib, the Lean 4 library with roughly half a million theorems, definitions, and lemmas, turned the whole thing into a dependency graph, and measured two numbers for every element.
One they call wrapped length: how many tokens you write in the Lean source to state this thing.
The other is unwrapped length: the number of raw symbols you end up with if you recursively expand every reference down to the base axioms.
Then they went looking for the deepest element in MathLib. They found a theorem in algebraic geometry called AlgebraicGeometry.Scheme.exists_hom_hom_comp_eq_comp_of_locallyOfFiniteType. Wrapped, it takes about 100 tokens. Fully unwrapped, it contains around $10^{104}$ raw symbols.
That puts it past a number with a name: a Googol, $10^{100}$. The search company misspelled it and kept the typo.
Something a person can read with a cup of coffee hides a logical chain 22 orders of magnitude larger than the number of particles in the observable universe.
They actually counted the symbols.
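No one writes out $10^{104}$ symbols to count them. The dependency graph has no cycles, so an element's unwrapped length is its own base symbols plus the unwrapped lengths of everything it references. Caching each result turns the count into one pass over the graph. Below is a minimal Python sketch of that idea on a hypothetical toy graph. The graph format, the one-token-per-reference rule, and the names `build_chain` and `measure` are assumptions for illustration. They are not the paper's code or MathLib's data.

```python
import math
from functools import lru_cache

# Hypothetical graph format: name -> (base symbols written directly, references used).
# Assumption: each reference costs one token in the wrapped (source) form.
Graph = dict[str, tuple[int, list[str]]]

def build_chain(layers: int) -> Graph:
    """Toy tower: each layer uses 3 base symbols plus two references to the layer below."""
    graph: Graph = {"d0": (5, [])}  # d0 is written directly in base symbols
    for k in range(1, layers + 1):
        graph[f"d{k}"] = (3, [f"d{k-1}", f"d{k-1}"])
    return graph

def measure(graph: Graph) -> dict[str, tuple[int, int, int]]:
    """Return (wrapped, unwrapped, depth) for every element without expanding anything."""

    @lru_cache(maxsize=None)
    def unwrapped(name: str) -> int:
        base, refs = graph[name]
        # Full expansion = own base symbols + full expansion of each reference.
        return base + sum(unwrapped(r) for r in refs)

    @lru_cache(maxsize=None)
    def depth(name: str) -> int:
        _, refs = graph[name]
        return 0 if not refs else 1 + max(depth(r) for r in refs)

    return {
        name: (base + len(refs), unwrapped(name), depth(name))
        for name, (base, refs) in graph.items()
    }

stats = measure(build_chain(300))
wrapped, full, d = stats["d300"]
print(wrapped, d, f"unwrapped ≈ 10^{math.log10(full):.1f}")
```

In this toy tower every element has wrapped length 5. The unwrapped length of layer $k$ is $2^{k+3} - 3$, so it doubles with each layer. That is the same pattern the depth chart below shows in the real data.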
Each layer of definition doubles the underlying complexity
Freedman’s team plotted three scatter charts. The third one is the striking one.
The horizontal axis is depth: how many layers of definition separate this theorem from the base axioms. The vertical axis is $\log_2$ of the unwrapped length.
The points fall on a straight line. Slope close to 1.
Every time a human adds one more layer of definition to the mathematical tower, the raw logical content underneath doubles.
MathLib’s deepest chain reaches around 300 layers, and $2^{300}$ is about $10^{90}$: on a log scale, the same neighborhood as that Googol.
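Here is a quick check of that arithmetic, assuming a slope of exactly 1. Three hundred doublings give about $10^{90}$. Reaching $10^{104}$ takes about 45 more doublings, which would have to come from the line's offset or from a slope slightly above 1. The figures quoted here do not settle which.

```python
import math

layers = 300
# Each layer doubles the unwrapped length, so 300 layers multiply it by 2**300.
digits_from_doubling = layers * math.log10(2)   # 2**300 ≈ 10**digits_from_doubling
# How many doublings a length of 10**104 corresponds to.
doublings_needed = 104 * math.log2(10)          # 10**104 ≈ 2**doublings_needed
gap = doublings_needed - layers

print(f"2^{layers} ≈ 10^{digits_from_doubling:.1f}")
print(f"10^104 ≈ 2^{doublings_needed:.1f}; {gap:.1f} doublings beyond the slope-1 line")
```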
The second chart shows something equally telling. Whether an element sits at depth 0, depth 100, or depth 300, the number of tokens you actually write in its definition hovers between 50 and 120. It does not grow.
Every time a human writes a 50-token definition, the complexity underneath doubles. It behaves like Moore’s Law, except this line has been drawn for two and a half thousand years.
You have seen this curve before
Anyone who watched the early AlphaZero papers should recognize the shape.
DeepMind’s system started self-playing in 2017. They found that Elo rating grew linearly with the log of compute. Chess, Go, shogi: three parallel lines, the same slope.
On the surface, the two projects have nothing in common. One is a Lean library recording mathematics that thousands of mathematicians built across millennia. The other is a neural network that trained for 72 hours on TPUs.
Freedman’s chart says otherwise. What mathematicians have been doing and what AlphaZero did are the same activity.
Both are taking something that exists but is out of reach, and compressing it, step by step, until a human can hold it.
Go has about $10^{170}$ legal positions. Chess has about $10^{44}$, and Shannon estimated its game tree at around $10^{120}$. The strategic depth was always sitting inside the rules. Humans needed fifteen centuries to scratch the surface. AlphaZero used compute to pull the whole thing out in a weekend.
The Googol-scale theorem in MathLib was always sitting inside the axioms at the foundation of Lean. Humans needed three thousand years, using definitions, lemmas, and propositions, to compress it down to 100 tokens.
In both cases, a computationally bounded agent is squeezing an exponentially large structure into a polynomial-sized expression.
Why God has no use for gravity
Last year, researchers at Carnegie Mellon and NYU published a paper called From Entropy to Epiplexity. I wrote about it in Compression Is Intelligence. The core move is to reject Shannon entropy as the right measure of information for learning.
Their sharpest example is a thought experiment.
A terabyte of television static, and a copy of Euclid’s Elements. Which contains more information?
By Shannon entropy, the static wins in a landslide. Random noise is maximally incompressible. Each frame is independent. Maximum entropy means maximum information.
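An off-the-shelf compressor makes the Shannon side of this concrete. The sketch below rests on loose assumptions. `zlib` is a crude bounded compressor, not a measure of epiplexity. Random bytes stand in for television static, and a rule-generated block of text stands in for Euclid. All it shows is that noise does not shrink while structured text does.

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower = more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)

# Stand-in for TV static: independent, uniformly random bytes.
static = os.urandom(1_000_000)
# Stand-in for structured text: a rule-generated sequence, not Euclid itself.
structured = "".join(
    f"Proposition {i}: the angles of triangle T{i} sum to two right angles.\n"
    for i in range(20_000)
).encode()

print(f"static:     {ratio(static):.3f}")      # close to (slightly above) 1.0
print(f"structured: {ratio(structured):.3f}")  # a small fraction of the original size
```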
Feed the static to a neural network and it learns nothing. Feed it Euclid and it picks up two thousand years of geometry.
They proposed a new quantity called epiplexity. Epiplexity is the portion of structure that a computationally bounded learner can actually extract.
The static has enormous information and zero epiplexity. Euclid has modest information and pure epiplexity.
Then they wrote a line I keep coming back to. If a god with unlimited compute existed, that god would have no use for gravity, or grammar, or strategy, or laws of any kind. That god would compute everything from first principles, particle by particle.
Concepts, laws, theories, definitions, all of them exist because your compute is finite.
What Freedman does is take that philosophical line and prove it inside mathematics. He uses two algebraic structures as models. One, called the free abelian monoid $A_n$, models worlds where hierarchical compression works. The other, the free monoid $F_n$, models brute-force worlds. He shows that inside $A_n$, a logarithmic number of macros gives you exponential compression. Inside $F_n$, even polynomially many macros buy you only linear compression.
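A toy version of the $A_n$ side fits in a few lines, under stated assumptions. The macros here have the form $x_i^{2^j}$, each defined as the square of the one before. That is the standard doubling trick and not necessarily the construction in the paper. Letters commute, so any word can be sorted into $x_1^{a_1} \cdots x_n^{a_n}$, and each exponent is a sum of its binary digits. A word of length $N$ therefore needs at most $n \lceil \log_2 (N+1) \rceil$ macro uses, drawn from about $n \log_2 N$ macros. The function name `doubling_expression` is hypothetical.

```python
import random
from collections import Counter

def doubling_expression(word: str) -> list[tuple[str, int]]:
    """Express a word in the free abelian monoid as a product of doubling macros.

    Each returned pair (x, 2**j) stands for one use of the hypothetical macro
    x^(2^j), defined as the square of x^(2^(j-1)).
    """
    expression = []
    # Commutativity: only the count of each letter matters, not the order.
    for letter, exponent in sorted(Counter(word).items()):
        j = 0
        while exponent:
            if exponent & 1:  # binary digit j of the exponent is set
                expression.append((letter, 2 ** j))
            exponent >>= 1
            j += 1
    return expression

random.seed(0)
N = 1_000_000
word = "".join(random.choices("abc", k=N))  # a typical word of length N over n = 3 letters
expr = doubling_expression(word)
print(f"unwrapped length {len(word)}, wrapped length {len(expr)} macro uses")
```

The sorting step is exactly what the free monoid $F_n$ forbids. There, order matters, so the trick no longer applies to a typical word, which has no such repeated structure to exploit.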
Then he fits the MathLib data to both models. Half a million mathematical elements. The numbers sit comfortably in the $A_n$ logarithmic regime and contradict every $F_n$ prediction.
Human mathematics is a strip of polynomial land floating on an exponential sea.
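Most of that sea is incompressible, and complexity theory already tells you why. For random unsatisfiable Boolean formulas, resolution proofs are provably exponentially long. If NP ≠ coNP (a close cousin of P vs NP), some unsatisfiable formulas have no short proof in any proof system. No definition system can shortcut them.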
The mathematics you can learn is exactly the island where local definitional substitution works.
AlphaZero is a mirror held up to mathematicians
Put Freedman’s depth-vs-complexity chart next to AlphaZero’s ELO-vs-compute chart and you are looking at the same thing on two different time scales.
Mathematicians spent three thousand years compressing $10^{104}$ raw symbols down to 100 tokens. Each generation added a few layers of definition. Each layer doubled the complexity underneath.
AlphaZero spent 72 hours compressing the Go strategy space into a neural network of 50 million parameters. Each round of self-play nudged the score up.
The motion is identical. Given a fixed rule system, use compute to discover the compressible structure hidden inside the rules, then write that structure back into some form of memory.
For humans, the memory is MathLib, Bourbaki, every textbook.
For AlphaZero, the memory is a weight file.
So here is a way to reframe the whole thing. MathLib is the weight file our species produced after three thousand years of self-play against the game of mathematics.
Fermat’s Last Theorem, the Langlands program, resolution of singularities: these objects were always there in the set-theoretic axioms. It took dozens of generations of mathematicians, a few hours a day each, running for a few billion iterations, to fish them out.
Once you see this, it becomes harder to dismiss what OpenAI’s o3 does when it spends large amounts of compute rolling out math problems. That compute is replaying, on a faster time scale, the process that produced MathLib in the first place.
A coordinate system for training data
This view forces a different answer to a question that has been looping in AI discourse all year: is there a scaling wall, are we running out of human text to train on?
The question is pointed at the wrong thing.
Freedman and the epiplexity paper together say something simple. The real resource for training is not token count. It is structural density.
A data source with high epiplexity and $A_n$-like hierarchy is worth orders of magnitude more than the same volume of social media noise. Ten million tokens of dense structure can outweigh ten billion tokens of random chatter. Push this to the limit and you recover the Euclid-vs-static thought experiment.
That is why formal mathematics and code keep mattering more as models get larger. Lean, Rocq, Python, TypeScript: these are $A_n$-shaped continents that humans already built. The structural density is high, and the rule system can generate new instances without bound.
The bottleneck is unmined compressible structure, not raw token count.
Freedman makes this actionable in the last section of his paper. He defines two ratios, $T_0$ and $I_0$, built from wrapped and unwrapped length. Then he runs a PageRank-style walk on the dependency graph that rewards load-bearing theorems, the ones that many highly compressed statements depend on.
In plain language: an AI agent exploring mathematics should not ask “is this problem important,” which no one can define. It should ask “does this region compress well,” which you can measure.
That turns out to match what mathematicians mean when they say a result is “deep,” or “natural,” or “generalizable.” Those words are notoriously vague. Translated into Freedman’s numbers, they say: short wrapped length, long unwrapped length, high centrality in the dependency graph.
That is a definition a machine can consume.
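The paper's $T_0$ and $I_0$ are not reproduced here, but the shape of the idea fits in a short sketch. It rests on three assumptions: an invented toy graph, invented lengths, and a stand-in compression score. That score, $\log_2$ of unwrapped over wrapped length, serves as the teleport weight of an ordinary PageRank power iteration. This is one way "short wrapped, long unwrapped, central" could become a number, not Freedman's formula. The function name `compression_rank` is hypothetical.

```python
import math

def compression_rank(graph, wrapped, unwrapped, damping=0.85, iters=100):
    """PageRank-style power iteration on a dependency graph.

    graph[v] lists what v depends on; score flows from each element to its
    dependencies. The teleport step is weighted by a made-up compression score,
    log2(unwrapped / wrapped), standing in for the paper's T_0 and I_0.
    """
    nodes = list(graph)
    comp = {v: max(0.0, math.log2(unwrapped[v] / wrapped[v])) + 1e-9 for v in nodes}
    total = sum(comp.values())
    teleport = {v: comp[v] / total for v in nodes}
    score = dict(teleport)
    for _ in range(iters):
        new = {v: (1 - damping) * teleport[v] for v in nodes}
        for v in nodes:
            deps = graph[v]
            if deps:
                for dep in deps:  # split v's weight evenly among its dependencies
                    new[dep] += damping * score[v] / len(deps)
            else:
                # Dangling node (e.g. an axiom): spread its weight by the teleport
                # distribution so the total score stays 1.
                for u in nodes:
                    new[u] += damping * score[v] * teleport[u]
        score = new
    return score

# Invented toy data: three theorems that lean on one key lemma.
graph = {
    "axiom": [],
    "lemma_a": ["axiom"],
    "lemma_b": ["axiom"],
    "key_lemma": ["lemma_a", "lemma_b"],
    "thm1": ["key_lemma"],
    "thm2": ["key_lemma", "lemma_a"],
    "thm3": ["key_lemma"],
}
wrapped = {v: 10 for v in graph}
unwrapped = {"axiom": 10, "lemma_a": 40, "lemma_b": 40, "key_lemma": 160,
             "thm1": 320, "thm2": 320, "thm3": 320}

scores = compression_rank(graph, wrapped, unwrapped)
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {s:.3f}")
```

The axiom tops this toy ranking, as the bottom of any dependency graph tends to. The interesting signal sits one step up. `key_lemma`, which three well-compressed theorems lean on, outranks both those theorems and the lemmas beneath it.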
Closing with the mathematician himself
Freedman is not an AI researcher. He is a topologist. He won the Fields Medal for proving the four-dimensional Poincaré conjecture. At the end of this paper, he writes a paragraph that deserves to be quoted directly.
“That we were led to introduce a definition while studying the role of definitions in mathematics is perhaps fitting. Definition formation is so natural to mathematical practice that even analyzing math requires new definitions.”
That is compression in one sentence. Every cognitive iteration the species makes is the act of naming something, and every naming doubles the raw content you can now carry.
Definitions only exist because our compute is finite. A system with unlimited compute would do without them and just derive everything from the axioms, particle by particle.
Theorems only exist for the same reason. AlphaZero can afford to hold its strategic knowledge as a weight file because it has a TPU pod. We cannot, so we wrote MathLib.
From the outside, the two projects look similar.
From the inside, they are the same activity projected onto two different substrates.
Intelligence is not the accumulation of more information. Intelligence is the discovery of a skeleton inside information, and using that skeleton to lift a world many times its own size.
Humans have been at this for three thousand years. Machines have just picked up the shovel.
Related Reading
Compression Is Intelligence An earlier essay on the epiplexity paper and why structure, not entropy, is the right measure of information for bounded learners.
Based on the paper Compression Is All You Need: Modeling Mathematics (Aksenov, Bodnia, Freedman, Mulligan, 2026).

