Compression Is Intelligence
Why a concept called epiplexity may explain where intelligence comes from—and what that means for AI.
Intelligence, at its core, is a compression problem.
Humans cannot track every falling apple, so we invent gravity.
We cannot memorize every chess position, so we develop strategy.
We cannot remember every sentence we’ve ever read, so we acquire grammar.
In each case, intelligence emerges from the same constraint:
we cannot brute-force the world.
When computation is limited, discovering structure becomes essential.
A recent paper from researchers at Carnegie Mellon and NYU introduces a concept that captures this idea precisely. They call it epiplexity — the portion of information that a computationally bounded learner can actually extract.
The idea helps explain several puzzles about modern AI, from AlphaZero’s superhuman chess ability to the surprising effectiveness of reasoning-based models.
More importantly, it reframes a deeper question:
Where does intelligence actually come from?
The Static vs. Euclid Problem
Consider a simple thought experiment.
You have two things in front of you.
One is a terabyte of television static — pure noise, every pixel random.
The other is a copy of Euclid’s Elements, the geometry text that shaped two thousand years of mathematics.
Which one contains more information?
According to classical information theory — specifically Shannon entropy — the answer is the static.
Random noise is maximally unpredictable. You can’t compress it. Every frame is independent. High entropy means high information.
Euclid’s Elements, by contrast, is highly structured. Every theorem follows logically from earlier ones. The entire book unfolds from a small set of axioms.
That means it’s highly compressible — and therefore, according to entropy, contains less information.
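A general-purpose compressor gives a rough feel for this. The sketch below uses Python's standard zlib module as a stand-in for "how compressible is this?". The structured sample is only a stand-in for Euclid: it repeats Euclid's first three definitions (Heath's translation) rather than using the actual book. Repetition is a much cruder kind of structure than Euclid's chain of deductions, and zlib only gives a loose upper bound on information content, but the effect points the same way.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)

# "Static": bytes from the operating system's random source.
static = os.urandom(100_000)

# Stand-in for a structured text (illustrative only, not the real Elements).
euclid_like = (
    "A point is that which has no part. A line is breadthless length. "
    "The extremities of a line are points. "
) * 1000

print(f"static:      {compression_ratio(static):.3f}")
print(f"euclid-like: {compression_ratio(euclid_like.encode()):.3f}")
```

The static should come out at a ratio of about 1, often slightly above it, and the repetitive text far below it.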
And yet this answer feels obviously wrong.
Feed a model a terabyte of television static and it learns nothing beyond the fact that the pixels are random.
Give it Euclid and it gains the logical foundations of geometry.
Something about the standard definition of information is missing.
The Hidden Assumption in Information Theory
The problem, according to the paper, lies in a hidden assumption.
Classical information theory implicitly assumes an observer with unlimited computational power.
An infinite calculator.
For such an observer, entropy works perfectly. Given unlimited time and compute, any structure hidden in data can eventually be extracted.
But real learners — humans and machines alike — do not have unlimited compute.
Neural networks train under finite budgets. Humans learn within limited lifetimes. No real system can analyze data forever in the hope that patterns will eventually emerge.
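A pseudorandom generator makes this hidden assumption concrete. Its output is fully determined by a short seed and a short program. An observer with unlimited compute could in principle recover that short description, but a bounded compressor such as zlib sees only noise. The sketch below builds a generator from SHA-256 in counter mode using only the standard library. This is an illustrative choice; the function name `prng_bytes` and the seed are hypothetical.

```python
import hashlib
import zlib

def prng_bytes(seed: bytes, n: int) -> bytes:
    """Deterministic pseudorandom bytes: SHA-256(seed || counter) blocks, concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

seed = b"euclid"
data = prng_bytes(seed, 100_000)

# What a bounded compressor sees: essentially incompressible bytes.
print("zlib size:", len(zlib.compress(data, 9)), "of", len(data))

# What an unbounded observer could exploit: the whole stream is regenerated
# from a 6-byte seed plus the few lines of prng_bytes above.
print("regenerated from seed:", prng_bytes(seed, len(data)) == data)
```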
Imagine instead a truly omniscient being — a God with infinite computing power.
Such a being would have no need for concepts like gravity, strategy, or grammar. It could simply compute everything directly from first principles.
No shortcuts required.
But once computation becomes limited, shortcuts become essential.
You can’t memorize every chess game ever played, so you develop strategy.
You can’t track every falling apple, so you invent laws of motion.
You can’t simulate the entire universe, so you build models.
In other words:
Intelligence appears when brute force becomes impossible.
Two Kinds of Information
Once we consider computational limits, a new distinction emerges.
The paper proposes splitting information into two parts.
The first is time-bounded entropy — information that remains effectively random for any observer with limited computational power.
Television static.
The exact position of every leaf in a forest photograph.
The chaotic motion of a turbulent droplet.
This information exists, but it cannot be meaningfully compressed by a bounded learner.
For practical purposes, it behaves like noise.
The second part is what the authors call epiplexity.
Epiplexity is the structure that a computationally limited observer can actually extract.
The logical relationships in Euclid.
Strategic patterns in chess.
The grammar of language.
The causal structure inside a program.
This is the part of information that ends up inside a model’s weights.
In principle, total information contains both.
But for any real learner — human or machine — only epiplexity matters.
Noise contains information.
Structure creates intelligence.
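One way to make the split concrete is a two-part code in the minimum-description-length style. You spend some bits describing a model (structure), then spend the rest describing what the model fails to predict (residual noise). The toy below is a hypothetical illustration, not the paper's estimator. The "learner" may only consider models of the form "repeat a pattern of period p, up to some flipped bits". The names `two_part_code` and `max_period`, the motif, and the 5% flip rate are all assumptions chosen for the example.

```python
import math
import random

def residual_bits(n: int, mismatches: int) -> float:
    """Bits to list which of n positions the model got wrong: the count, then the positions."""
    return math.log2(n + 1) + math.log2(math.comb(n, mismatches))

def two_part_code(bits: list[int], max_period: int = 16) -> dict:
    """Split a bit string's description length into structure bits and noise bits.

    Toy model class (an assumption for illustration): "the string repeats a
    pattern of period p, except at some flipped positions". The baseline model
    has no structure and pays one bit per symbol.
    """
    n = len(bits)
    best = {"period": None, "structure_bits": 0.0, "noise_bits": float(n)}
    for p in range(1, max_period + 1):
        # For each position mod p, the majority bit minimizes mismatches.
        pattern = [1 if 2 * sum(bits[r::p]) > len(bits[r::p]) else 0 for r in range(p)]
        mismatches = sum(b != pattern[i % p] for i, b in enumerate(bits))
        structure = p + math.log2(max_period)  # the pattern itself, plus which period
        noise = residual_bits(n, mismatches)
        if structure + noise < best["structure_bits"] + best["noise_bits"]:
            best = {"period": p, "structure_bits": structure, "noise_bits": noise}
    return best

rng = random.Random(0)
n = 4096
motif = [0, 1, 1, 0, 1, 0, 0]
structured = [motif[i % 7] ^ int(rng.random() < 0.05) for i in range(n)]
coin_flips = [rng.getrandbits(1) for _ in range(n)]

print("structured:", two_part_code(structured))
print("coin flips:", two_part_code(coin_flips))
```

On the noisy periodic sequence, the search settles on period 7 with 11 bits of structure. The rest of the cost goes to the flipped bits. On the coin flips, essentially everything lands in the noise term.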
AlphaZero and the Discovery of Structure
This perspective helps explain several surprising phenomena in modern AI.
Take DeepMind’s AlphaZero.
The system began with almost nothing — just the rules of chess.
No database of human games.
No opening books.
No expert guidance.
Instead, it played millions of games against itself. Within a day of training, it was strong enough to defeat Stockfish, one of the strongest chess engines of the time.
From the perspective of classical information theory, this seems impossible. A system shouldn’t be able to extract more information than it was given.
But epiplexity resolves the paradox.
The strategic depth of chess was always implicit in the rules. It existed inside the enormous combinatorial space of possible games.
Self-play turned computation into a process for extracting that structure.
It transformed the latent structure of the game into representations the neural network could actually use.
The information was always there.
Compute made it accessible.
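The same point shows up in miniature with a much simpler game. The sketch below is not AlphaZero's method, which combines self-play, Monte Carlo tree search, and a neural network. It is plain exhaustive search over a hypothetical take-away game: players alternately remove 1, 2, or 3 stones, and whoever takes the last stone wins. Only the rules go in, yet computation turns up a strategy that compresses the whole table of results into one line.

```python
MOVES = (1, 2, 3)  # the rules: remove 1, 2, or 3 stones; taking the last stone wins

def solve(limit: int) -> list[bool]:
    """win[n] is True if the player to move with n stones can force a win."""
    win = [False] * (limit + 1)  # with 0 stones left, the player to move has already lost
    for n in range(1, limit + 1):
        # A position is winning if some legal move leaves the opponent in a losing one.
        win[n] = any(not win[n - m] for m in MOVES if m <= n)
    return win

table = solve(10_000)  # brute force: one entry per position

# The structure hidden in the rules: you lose exactly when n is a multiple of 4.
print([n for n in range(21) if not table[n]])  # → [0, 4, 8, 12, 16, 20]
print(all(table[n] == (n % 4 != 0) for n in range(len(table))))  # → True
```

The 10,001-entry table and the rule `n % 4 != 0` carry the same information. The rule is the compressed form that search made visible.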
Why Text Order Matters
Another puzzle becomes clearer as well.
In classical information theory, the order of data should not matter. Seeing A then B contains the same total information as seeing B then A.
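This order-independence is the chain rule of entropy. However observations are split into "earlier" and "later", the total comes out the same:

```latex
\begin{aligned}
H(A, B) &= H(A) + H(B \mid A) = H(B) + H(A \mid B) \\
H(X_1, \dots, X_n) &= \sum_{t=1}^{n} H(X_t \mid X_{<t}) = \sum_{t=1}^{n} H(X_t \mid X_{>t})
\end{aligned}
```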
But language models trained on natural text consistently reach lower prediction loss than models trained on the same text in reverse order.
Why?
Because epiplexity depends on computational accessibility.
Natural language has directional structure. Each word constrains the next. These dependencies are relatively easy for a bounded learner to discover.
Reverse the text, and those dependencies become much harder to extract.
The total information remains the same.
But the learnable structure does not.
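A toy analogy shows how the same information can be easy to use in one order and hard in the other; it is not a language-model experiment. Write each record as two prime factors followed by their product. Read forward, predicting the product costs one multiplication. Read in reverse, predicting the factors from the product requires a search. A learner with a fixed budget of steps may simply run out; here the budget is a hypothetical `budget` of trial divisions. The function names are likewise illustrative.

```python
def predict_forward(p: int, q: int) -> int:
    """Forward order (p, q, then N): the next item is one multiplication away."""
    return p * q

def predict_reverse(n: int, budget: int):
    """Reverse order (N, then p, q): recover the factors by trial division.

    Returns (p, q) if a divisor is found within `budget` candidates, else None.
    """
    d = 2
    for _ in range(budget):
        if d * d > n:
            return None  # no divisor up to sqrt(n): n is prime
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # budget exhausted before a factor turned up

p, q = 999_983, 1_000_003  # the primes just below and just above one million
n = predict_forward(p, q)

print(predict_reverse(n, budget=1_000))      # → None (too little compute)
print(predict_reverse(n, budget=1_000_000))  # → (999983, 1000003)
```

Both orders hold exactly the same three numbers. Only the cost of recovering the next one differs.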
The AI Data Debate
This framework also sheds light on one of the most widely discussed questions in AI today.
Many researchers worry that models may eventually run out of high-quality human text to train on — a potential limit often described as the AI scaling wall.
Books, research papers, code repositories, and thoughtful writing on the web are finite. Large models have already consumed a significant portion of them.
If learning were simply about compressing human text, that constraint would indeed impose a hard ceiling.
But epiplexity suggests a different picture.
The real resource for intelligence is not human writing.
It is learnable structure.
Human text contains a great deal of structure, which is why it has been so useful for training models. But it is far from the only source.
Mathematics does not run out.
Physics does not run out.
Formal systems — programming languages, logic, games, simulations — do not run out.
In many of these domains, new structured data can be generated indefinitely simply by applying the underlying rules.
Chess positions can be generated forever.
Mathematical proofs can be explored endlessly.
Programs can be executed and simulations run.
Each instance contains structure waiting to be extracted.
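As a minimal sketch of "applying the underlying rules", the generator below produces an endless stream of arithmetic problems whose answers are correct by construction. The function names, digit range, and depth limit are hypothetical choices for illustration. Real synthetic-data pipelines for proofs, code, or games are far more involved.

```python
import itertools
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_expr(rng: random.Random, depth: int) -> tuple[str, int]:
    """Build a random fully parenthesized expression together with its exact value."""
    if depth == 0 or rng.random() < 0.3:
        digit = rng.randint(0, 9)
        return str(digit), digit
    symbol = rng.choice(list(OPS))
    left_text, left_value = random_expr(rng, depth - 1)
    right_text, right_value = random_expr(rng, depth - 1)
    return f"({left_text} {symbol} {right_text})", OPS[symbol](left_value, right_value)

def synthetic_examples(seed: int = 0, max_depth: int = 3):
    """Yield (problem, answer) pairs forever; the rules supply every label."""
    rng = random.Random(seed)
    while True:
        yield random_expr(rng, max_depth)

for problem, answer in itertools.islice(synthetic_examples(), 3):
    print(problem, "=", answer)
```

Each call to `next()` yields a fresh, verified example, and nothing here depends on a finite corpus.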
Compression Is Intelligence
Viewed this way, intelligence is not about accumulating information.
It is about discovering structure under computational constraint.
Whenever brute force becomes impossible — whether for humans or machines — the only viable strategy is compression.
We compress physical observations into scientific laws.
We compress language into grammar.
We compress games into strategy.
And modern AI systems are beginning to do the same.
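The first of these can be sketched in a few lines. The sketch assumes hypothetical measurements of a dropped object, simulated here with g = 9.81 m/s² plus small measurement noise. A one-parameter least-squares fit of d = ½gt² compresses a hundred observations into a single number. What is left over is the noise that no law will compress.

```python
import random

def fit_g(times: list[float], drops: list[float]) -> float:
    """Least-squares estimate of g for the model d = 0.5 * g * t**2."""
    # Minimizing sum((d - c*t^2)^2) over c gives c = sum(t^2 * d) / sum(t^4); then g = 2c.
    c = sum(t * t * d for t, d in zip(times, drops)) / sum(t ** 4 for t in times)
    return 2 * c

rng = random.Random(42)
true_g = 9.81  # m/s^2, used only to simulate the observations
times = [0.03 * (i + 1) for i in range(100)]  # 0.03 s to 3.0 s
drops = [0.5 * true_g * t * t + rng.gauss(0, 0.05) for t in times]  # metres

g_hat = fit_g(times, drops)
residuals = [d - 0.5 * g_hat * t * t for t, d in zip(times, drops)]
print(f"100 observations -> g ≈ {g_hat:.3f} m/s^2")
print(f"largest leftover error: {max(abs(r) for r in residuals):.3f} m")
```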
The internet may eventually run out of human text.
But intelligence was never about collecting text in the first place.
It was always about finding structure.
And that process — the discovery of compressible structure inside complexity — is what we call intelligence.
Related Reading
The AI Scaling Wall May Not Exist
An earlier essay exploring why AI progress may not be limited by the amount of human-generated text.
Based on the paper From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence (Finzi, Qiu, Jiang, Izmailov, Kolter, Wilson).