When A New Kind of Science was published twenty years ago I thought what it had to say was important. But what’s become increasingly clear—particularly in the last few years—is that it’s actually even much more important than I ever imagined. My original goal in A New Kind of Science was to take a step beyond the mathematical paradigm that had defined the state of the art in science for three centuries—and to introduce a new paradigm based on computation and on the exploration of the computational universe of possible programs. And already in A New Kind of Science one can see that there’s immense richness to what can be done with this new paradigm.
There’s a new abstract basic science—that I now call ruliology—that’s concerned with studying the detailed properties of systems with simple rules. There’s a vast new source of “raw material” to “mine” from the computational universe, both for making models of things and for developing technology. And there are new, computational ways to think about fundamental features of how systems in nature and elsewhere work.
But what’s now becoming clear is that there’s actually something still bigger, still more overarching that the paradigm of A New Kind of Science lays the foundations for. In a sense, A New Kind of Science defines how one can use computation to think about things. But what we’re now realizing is that actually computation is not just a way to think about things: it is at a very fundamental level what everything actually is.
One can see this as a kind of ultimate limit of A New Kind of Science. What we call the ruliad is the entangled limit of all possible computations. And what we, for example, experience as physical reality is in effect just our particular sampling of the ruliad. And it’s the ideas of A New Kind of Science—and particularly things like the Principle of Computational Equivalence—that lay the foundations for understanding how this works.
When I wrote A New Kind of Science I discussed the possibility that there might be a way to find a fundamental model of physics based on simple programs. And from that seed has now come the Wolfram Physics Project, which, with its broad connections to existing mathematical physics, now seems to show that, yes, it’s really true that our physical universe is “computational all the way down”.
But there’s more. It’s not just that at the lowest level there’s some specific rule operating on a vast network of atoms of space. It’s that underneath everything is all possible computation, encapsulated in the single unique construct that is the ruliad. And what determines our experience—and the science we use to summarize it—is what characteristics we as observers have in sampling the ruliad.
There is a tower of ideas that relate to fundamental questions about the nature of existence, and the foundations not only of physics, but also of mathematics, computer science and a host of other fields. And these ideas build crucially on the paradigm of A New Kind of Science. But they need something else as well: what I now call the multicomputational paradigm. There were hints of it in A New Kind of Science when I discussed multiway systems. But it has only been within the past couple of years that this whole new paradigm has begun to come into focus. In A New Kind of Science I explored some of the remarkable things that individual computations out in the computational universe can do. What the multicomputational paradigm now does is to consider the aggregate of multiple computations—and in the end the entangled limit of all possible computations, the ruliad.
The Principle of Computational Equivalence is in many ways the intellectual culmination of A New Kind of Science—and it has many deep consequences. And one of them is the idea—and uniqueness—of the ruliad. The Principle of Computational Equivalence provides a very general statement about what all possible computational systems do. What the ruliad then does is to pull together the behaviors and relationships of all these systems into a single object that is, in effect, an ultimate representation of everything computational, and indeed in a certain sense simply of everything.
The publication of A New Kind of Science 20 years ago was for me already the culmination of an intellectual journey that had begun more than 25 years earlier. I had started in theoretical physics as a teenager in the 1970s. And stimulated by my needs in physics, I had then built my first computational language. A couple of years later I returned to basic science, now interested in some very fundamental questions. And from my blend of experience in physics and computing I was led to start trying to formulate things in terms of computation, and computational experiments. And I soon discovered the remarkable fact that in the computational universe, even very simple programs can generate immensely complex behavior.
For several years I studied the basic science of the particular class of simple programs known as cellular automata—and the things I saw led me to identify some important general phenomena, most notably computational irreducibility. Then in 1986—having “answered most of the obvious questions I could see”—I left basic science again, and for five years concentrated on creating Mathematica and what’s now the Wolfram Language. But in 1991 I took the tools I’d built, and again immersed myself in basic science. The decade that followed brought a long string of exciting and unexpected discoveries about the computational universe and its implications—leading finally in 2002 to the publication of A New Kind of Science.
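To make this concrete, here is a minimal Python sketch (my own illustration, not code from the book) of an elementary cellular automaton: each cell is updated from its three-cell neighborhood according to the bits of the rule number, and for rule 30 a single black cell already grows into a pattern with no evident regularity.

```python
def ca_step(cells, rule=30):
    """Apply an elementary cellular automaton rule to one row of cells."""
    n = len(cells)
    # Each new cell depends on its left, center, and right neighbors;
    # the 8 possible neighborhoods index into the bits of the rule number.
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def ca_history(width=63, steps=31, rule=30):
    """Return the whole 'spacetime' history, starting from a single black cell."""
    row = [0] * width
    row[width // 2] = 1
    history = [row]
    for _ in range(steps):
        row = ca_step(row, rule)
        history.append(row)
    return history

if __name__ == "__main__":
    for row in ca_history():
        print("".join("█" if c else " " for c in row))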
In many ways, A New Kind of Science is a very complete book—that in its 1280 pages does well at “answering all the obvious questions”, save, notably, for some about the “application area” of fundamental physics. For a couple of years after the book was published, I continued to explore some of these remaining questions. But pretty soon I was swept up in the building of Wolfram|Alpha and then the Wolfram Language, and in all the complicated and often deep questions involved in creating, for the first time, a full-scale computational language. And so for nearly 17 years I did almost no basic science.
The ideas of A New Kind of Science nevertheless continued to exert a deep influence—and I came to see my decades of work on computational language as ultimately being about creating a bridge between the vast capabilities of the computational universe revealed by A New Kind of Science, and the specific kinds of ways we humans are able to think about things. This point of view led me to all kinds of important conclusions about the role of computation and its implications for the future. But through all this I kept on thinking that one day I should look at physics again. And finally in 2019, stimulated by a small technical breakthrough, as well as enthusiasm from physicists of a new generation, I decided it was time to try diving into physics again.
My practical tools had developed a lot since I’d worked on A New Kind of Science. And—as I have found so often—the passage of years had given me greater clarity and perspective about what I’d discovered in A New Kind of Science. And it turned out we were rather quickly able to make spectacular progress. A New Kind of Science had introduced definite ideas about how fundamental physics might work. Now we could see that these ideas were very much on the right track, but on their own they did not go far enough. Something else was needed.
In A New Kind of Science I’d introduced what I called multiway systems, but I’d treated them as a kind of sideshow. Now—particularly tipped off by quantum mechanics—we realized that multiway systems were not a sideshow but were actually in a sense the main event. They had come out of the computational paradigm of A New Kind of Science, but they were really harbingers of a new paradigm: the multicomputational paradigm.
In A New Kind of Science, I’d already talked about space—and everything else in the universe—ultimately being made up of a network of discrete elements that I’d now call “atoms of space”. And I’d talked about time being associated with the inexorable progressive application of computationally irreducible rules. But now we were thinking not just of a single thread of computation, but instead of a whole multiway system of branching and merging threads—representing in effect a multicomputational history for the universe.
In A New Kind of Science I’d devoted a whole chapter to “Processes of Perception and Analysis”, recognizing the importance of the observer in computational systems. But with multicomputation there was yet more focus on this, and on how a physical observer knits things together to form a coherent thread of experience. Indeed, it became clear that it’s certain features of the observer that ultimately determine the laws of physics we perceive. And in particular it seems that as soon as we—somehow reflecting core features of our conscious experience—believe that we exist persistently through time, but are computationally bounded, then it follows that we will attribute to the universe the central known laws of spacetime and quantum mechanics.
At the level of atoms of space and individual threads of history everything is full of computational irreducibility. But the key point is that observers like us don’t experience this; instead we sample certain computationally reducible features—that we can describe in terms of meaningful “laws of physics”.
I never expected it would be so easy, but by early 2020—only a few months into our Wolfram Physics Project—we seemed to have successfully identified how the “machine code” of our universe must work. A New Kind of Science had established that computation was a powerful way of thinking about things. But now it was becoming clear that actually our whole universe is in a sense “computational all the way down”.
But where did this leave the traditional mathematical view? To my surprise, far from being at odds, it seemed as if our computational-all-the-way-down model of physics perfectly plugged into a great many of the more abstract existing mathematical approaches. Mediated by multicomputation, the concepts of A New Kind of Science—which began as an effort to go beyond mathematics—seemed now to be finding a kind of ultimate convergence with mathematics.
But despite our success in working out the structure of the “machine code” for our universe, a major mystery remained. Let’s say we could find a particular rule that could generate everything in our universe. Then we’d have to ask “Why this rule, and not another?” And if “our rule” was simple, how come we’d “lucked out” like that? Ever since I was working on A New Kind of Science I’d wondered about this.
And just as we were getting ready to announce the Physics Project in May 2020, the answer began to emerge. It came out of the multicomputational paradigm. And in a sense it was an ultimate version of it. Instead of imagining that the universe follows some particular rule—albeit applying it multicomputationally in all possible ways—what if the universe follows all possible rules?
And then we realized: this is something much more general than physics. And in a sense it’s the ultimate computational construct. It’s what one gets if one takes all the programs in the computational universe that I studied in A New Kind of Science and runs them together—as a single, giant, multicomputational system. It’s a single, unique object that I call the ruliad, formed as the entangled limit of all possible computations.
There’s no choice about the ruliad. Everything about it is abstractly necessary—emerging as it does just from the formal concept of computation. A New Kind of Science developed the abstraction of thinking about things in terms of computation. The ruliad takes this to its ultimate limit—capturing the whole entangled structure of all possible computations—and defining an object that in some sense describes everything.
Once we believe—as the Principle of Computational Equivalence implies—that things like our universe are computational, it then inevitably follows that they are described by the ruliad. But the observer has a crucial role here. Because while as a matter of theoretical science we can discuss the whole ruliad, our experience of it inevitably has to be based on sampling it according to our actual capabilities of perception.
In the end, it’s deeply analogous to something that—as I mention in A New Kind of Science—first got me interested in fundamental questions in science 50 years ago: the Second Law of thermodynamics. The molecules in a gas move around and interact according to certain rules. But as A New Kind of Science argues, one can think about this as a computational process, which can show computational irreducibility. If one didn’t worry about the “mechanics” of the observer, one might imagine that one could readily “see through” this computational irreducibility, to the detailed behavior of the molecules underneath. But the point is that a realistic, computationally bounded observer—like us—will be forced by computational irreducibility to perceive only certain “coarse-grained” aspects of what’s going on, and so will consider the gas to be behaving in a standard large-scale thermodynamic way.
And so it is, at a grander level, with the ruliad. Observers like us can only perceive certain aspects of what’s going on in the ruliad, and a key result of our Physics Project is that with only quite loose constraints on what we’re like as observers, it’s inevitable that we will perceive our universe to operate according to particular precise known laws of physics. And indeed the attributes that we associate with “consciousness” seem closely tied to what’s needed to get the features of spacetime and quantum mechanics that we know from physics. In A New Kind of Science one of the conclusions is that the Principle of Computational Equivalence implies a fundamental equivalence between systems (like us) that we consider “intelligent” or “conscious”, and systems that we consider “merely computational”.
But what’s now become clear in the multicomputational paradigm is that there’s more to this story. It’s not (as people have often assumed) that there’s something more powerful about “conscious observers” like us. Actually, it’s rather the opposite: that in order to have consistent “conscious experience” we have to have certain limitations (in particular, computational boundedness, and a belief of persistence in time), and these limitations are what make us “see the ruliad” in the way that corresponds to our usual view of the physical world.
The concept of the ruliad is a powerful one, with implications that significantly transcend the traditional boundaries of science. For example, last year I realized that thinking in terms of the ruliad potentially provides a meaningful answer to the ultimate question of why our universe exists. The answer, I posit, is that the ruliad—as a “purely formal” object—“necessarily exists”. And what we perceive as “our universe” is then just the “slice” that corresponds to what we can “see” from the particular place in “rulial space” at which we happen to be. There has to be “something there”—and the remarkable fact is that for an observer with our general characteristics, that something has to have features that are like our usual laws of physics.
In A New Kind of Science I discussed how the Principle of Computational Equivalence implies that almost any system can be thought of as being “like a mind” (as in, “the weather has a mind of its own”). But the issue—that for example is of central importance in talking about extraterrestrial intelligence—is how similar to us that mind is. And now with the ruliad we have a more definite way to discuss this. Different minds (even different human ones) can be thought of as being at different places in the ruliad, and thus in effect attributing different rules to the universe. The Principle of Computational Equivalence implies that there must ultimately be a way to translate (or, in effect, move) from one place to another. But the question is how far it is.
Our senses and measuring devices—together with our general paradigms for thinking about things—define the basic area over which our understanding extends, and for which we can readily produce a high-level narrative description of what’s going on. And in the past we might have assumed that this was all we’d ever need to reach with whatever science we built. But what A New Kind of Science—and now the ruliad—show us is that there’s much more out there. There’s a whole computational universe of possible programs—many of which behave in ways that are far from our current domain of high-level understanding.
Traditional science we can view as operating by gradually expanding our domain of understanding. But in a sense the key methodological idea that launched A New Kind of Science is to do computational experiments, which in effect just “jump without prior understanding” out into the wilds of the computational universe. And that’s in the end why all that ruliology in A New Kind of Science at first looks so alien: we’ve effectively jumped quite far from our familiar place in rulial space, so there’s no reason to expect we’ll recognize anything. And in effect, as the title of the book says, we need to be doing a new kind of science.
In A New Kind of Science, an important part of the story has to do with the phenomenon of computational irreducibility, and the way in which it prevents any computationally bounded observer (like us) from being able to “reduce” the behavior of systems, and thereby perceive them as anything other than complex. But now that we’re thinking not just about computation, but about multicomputation, other attributes of other observers start to be important too. And with the ruliad ultimately representing everything, the question of what will be perceived in any particular case devolves into one about the characteristics of observers.
In A New Kind of Science I give examples of how the same kinds of simple programs (such as cellular automata) can provide good “metamodels” for a variety of kinds of systems in nature and elsewhere, that show up in very different areas of science. But one feature of different areas of science is that they’re often concerned with different kinds of questions. And with the focus on the characteristics of the observer this is something we get to capture—and we get to discuss, for example, what the chemical observer, or the economic observer, might be like, and how that affects their perception of what’s ultimately in the ruliad.
In Chapter 12 of A New Kind of Science there’s a long section on “Implications for Mathematics and Its Foundations”, which begins with the observation that just as many models in science seem to be able to start from simple rules, mathematics is traditionally specifically set up to start from simple axioms. I then analyzed how multiway systems could be thought of as defining possible derivations (or proofs) of new mathematical theorems from axioms or other theorems—and I discussed how the difficulty of doing mathematics can be thought of as a reflection of computational irreducibility.
But informed by our Physics Project I realized that there’s much more to say about the foundations of mathematics—and this has led to our recently launched Metamathematics Project. At the core of this project is the idea that mathematics, like physics, is ultimately just a sampling of the ruliad. And just as the ruliad defines the lowest-level machine code of physics, so does it also for mathematics.
The traditional axiomatic level of mathematics (with its built-in notions of variables and operators and so on) is already higher level than the “raw ruliad”. And a crucial observation is that just like physical observers operate at a level far above things like the atoms of space, so “mathematical observers” mostly operate at a level far above the raw ruliad, or even the “assembly code” of axioms. In an analogy with gases, the ruliad—or even axiom systems—are talking about the “molecular dynamics” level; but “mathematical observers” operate more at the “fluid dynamics” level.
And the result of this is what I call the physicalization of metamathematics: the realization that our “perception” of mathematics is like our perception of physics. And that, for example, the very possibility of consistently doing higher-level mathematics where we don’t always have to drop down to the level of axioms or the raw ruliad has the same origin as the fact that “observers like us” typically view space as something continuous, rather than something made up of lots of atoms of space.
In A New Kind of Science I considered it a mystery why phenomena like undecidability are not more common in typical pure mathematics. But now our Metamathematics Project provides an answer that’s based on the character of mathematical observers.
My stated goal at the beginning of A New Kind of Science was to go beyond the mathematical paradigm, and that’s exactly what was achieved. But now there’s almost a full circle—because we see that building on A New Kind of Science and the computational paradigm we reach the multicomputational paradigm and the ruliad, and then we realize that mathematics, like physics, is part of the ruliad. Or, put another way, mathematics, like physics—and like everything else—is “made of computation”, and all computation is in the ruliad.
And that means that insofar as we consider there to be physical reality, so also we must consider there to be “mathematical reality”. Physical reality arises from the sampling of the ruliad by physical observers; so similarly mathematical reality must arise from the sampling of the ruliad by mathematical observers. Or, in other words, if we believe that the physical world exists, so we must—essentially like Plato—also believe that the mathematics exists, and that there is an underlying reality to mathematics.
All of these ideas rest on what was achieved in A New Kind of Science but now go significantly beyond it. In an “Epilog” that I eventually cut from the final version of A New Kind of Science I speculated that “major new directions” might be built in 15–30 years. And when I wrote that, I wasn’t really expecting that I would be the one to be central in doing that. And indeed I suspect that had I simply continued the direct path in basic science defined by my work on A New Kind of Science, it wouldn’t have been me.
It’s not something I’ve explicitly planned, but at this point I can look back on my life so far and see it as a repeated alternation between technology and basic science. Each builds on the other, giving me both ideas and tools—and creating in the end a taller and taller intellectual tower. But what’s crucial is that every alternation is in many ways a fresh start, where I’m able to use what I’ve done before, but have a chance to reexamine everything from a new perspective. And so it has been in the past few years with A New Kind of Science: having returned to basic science after 17 years away, it’s been possible to make remarkably rapid and dramatic progress that’s taken things to a new and wholly unexpected level.
In the course of intellectual history, there’ve been very few fundamentally different paradigms introduced for theoretical science. The first is what one might call the “structural paradigm”, in which one’s basically just concerned with what things are made of. And beginning in antiquity—and continuing for two millennia—this was pretty much the only paradigm on offer. But in the 1600s there was, as I described it in the opening sentence of A New Kind of Science, a “dramatic new idea”—that one could describe not just how things are, but also what they can do, in terms of mathematical equations.
And for three centuries this “mathematical paradigm” defined the state of the art for theoretical science. But as I went on to explain in the opening paragraph of A New Kind of Science, my goal was to develop a new “computational paradigm” that would describe things not in terms of mathematical equations but instead in terms of computational rules or programs. There’d been precursors to this in my own work in the 1980s, but despite the practical use of computers in applying the mathematical paradigm, there wasn’t much of a concept of describing things, say in nature, in a fundamentally computational way.
One feature of a mathematical equation is that it aims to encapsulate “in one fell swoop” the whole behavior of a system. Solve the equation and you’ll know everything about what the system will do. But in the computational paradigm it’s a different story. The underlying computational rules for a system in principle determine what it will do. But to actually find out what it does, you have to run those rules—which is often a computationally irreducible process.
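The distinction can be made concrete with a small Python sketch (my own illustration, using standard textbook examples): rule 250 is computationally reducible—there is a direct formula giving any cell at any step, without running the evolution—whereas for rule 30 no such shortcut is known, and the only way to find out what it does is to actually run the rules.

```python
def ca_step(cells, rule):
    """One step of an elementary cellular automaton (zero boundary cells)."""
    padded = [0] + cells + [0]
    return [
        (rule >> (4 * padded[i] + 2 * padded[i + 1] + padded[i + 2])) & 1
        for i in range(len(cells))
    ]

def evolve(rule, steps, width):
    """Run the rule from a single black cell, returning all rows."""
    row = [0] * width
    row[width // 2] = 1
    history = [row]
    for _ in range(steps):
        row = ca_step(row, rule)
        history.append(row)
    return history

def rule250_formula(x, t):
    """Computational reducibility: rule 250 grows a simple checkerboard, so
    cell x (offset from center) at step t is black exactly when |x| <= t
    and x + t is even -- no need to run the evolution at all."""
    return 1 if abs(x) <= t and (x + t) % 2 == 0 else 0

if __name__ == "__main__":
    width, steps = 41, 19
    hist = evolve(250, steps, width)
    center = width // 2
    ok = all(
        hist[t][center + x] == rule250_formula(x, t)
        for t in range(steps + 1)
        for x in range(-center, center + 1)
    )
    print("rule 250: formula matches simulation:", ok)
```

For rule 250 the formula “jumps ahead” to any step in constant time; for rule 30, by contrast, finding the state at step t appears to require actually performing on the order of t² cell updates.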
Put another way: in the structural paradigm, one doesn’t talk about time at all. In the mathematical paradigm, time is there, but it’s basically just a parameter, that if you can solve the equations you can set to whatever value you want. In the computational paradigm, however, time is something more fundamental: it’s associated with the actual irreducible progression of computation in a system.
It’s an important distinction that cuts to the core of theoretical science. Under the heavy influence of the mathematical paradigm, it’s often been assumed that science is fundamentally about being able to make predictions, or in a sense about having a model that can “outrun” the system you’re studying, and say what it’s going to do with much less computational effort than the system itself.
But computational irreducibility implies that there’s a fundamental limit to this. There are systems whose behavior is in effect “too complex” for us to ever be able to “find a formula for it”. And this is not something we could, for example, resolve just by increasing our mathematical sophistication: it is a fundamental limit that arises from the whole structure of the computational paradigm. In effect, from deep inside science we’re learning that there are fundamental limitations on what science can achieve.
But as I mentioned in A New Kind of Science, computational irreducibility has an upside as well. If everything were computationally reducible, the passage of time wouldn’t in any fundamental sense add up to anything; we’d always be able to “jump ahead” and see what the outcome of anything would be without going through the steps, and we’d never have something we could reasonably experience as free will.
In practical computing it’s pretty common to want to go straight from “question” to “answer”, and not be interested in “what happened inside”. But in A New Kind of Science there is in a sense an immediate emphasis on “what happens inside”. I don’t just show the initial input and final output for a cellular automaton. I show its whole “spacetime” history. And now that we have a computational theory of fundamental physics we can see that all the richness of our physical experience is contained in the “process inside”. We don’t just want to know the endpoint of the universe; we want to live the ongoing computational process that corresponds to our experience of the passage of time.
But, OK, so in A New Kind of Science we reached what we might identify as the third major paradigm for theoretical science. But the exciting—and surprising—thing is that inspired by our Physics Project we can now see a fourth paradigm: the multicomputational paradigm. And while the computational paradigm involves considering the progression of particular computations, the multicomputational paradigm involves considering the entangled progression of many computations. The computational paradigm involves a single thread of time. The multicomputational paradigm involves multiple threads of time that branch and merge.
What in a sense forced us into the multicomputational paradigm was thinking about quantum mechanics in our Physics Project, and realizing that multicomputation was inevitable in our models. But the idea of multicomputation is vastly more general, and in fact immediately applies to any system where at any given step multiple things can happen. In A New Kind of Science I studied many kinds of computational systems—like cellular automata and Turing machines—where one definite thing happens at each step. I looked a little at multiway systems—primarily ones based on string rewriting. But now in general in the multicomputational paradigm one is interested in studying multiway systems of all kinds. They can be based on simple iterations, say involving numbers, in which multiple functions can be applied at each step. They can be based on systems like games where there are multiple moves at each step. And they can be based on a whole range of systems in nature, technology and elsewhere where there are multiple “asynchronous” choices of events that can occur.
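As a minimal illustration (my own Python sketch, not code from the book or the Physics Project): a multiway system applies every possible rewrite at every possible position, so a single state branches into many at each step—and states reached along different paths merge into one.

```python
def rewrites(state, rules):
    """All states reachable in one step: apply every rule at every position."""
    results = set()
    for lhs, rhs in rules:
        start = state.find(lhs)
        while start != -1:
            results.add(state[:start] + rhs + state[start + len(lhs):])
            start = state.find(lhs, start + 1)
    return results

def multiway(initial, rules, steps):
    """Successive 'generations' of the multiway system, merged and deduplicated."""
    generations = [{initial}]
    for _ in range(steps):
        generations.append(
            {s for state in generations[-1] for s in rewrites(state, rules)}
        )
    return generations

if __name__ == "__main__":
    # A simple string-rewriting multiway system: {A -> AB, B -> A}, from "A"
    for gen in multiway("A", [("A", "AB"), ("B", "A")], 4):
        print(sorted(gen))
```

Note the merging: at the third step, a state like “AAB” is reached along several different paths but appears only once—exactly the branching-and-merging structure of threads of time that the multicomputational paradigm studies.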
Given the basic description of multicomputational systems, one might at first assume that whatever difficulties there are in deducing the behavior of computational systems, they would only be greater for multicomputational systems. But the crucial point is that whereas with a purely computational system (like a cellular automaton) it’s perfectly reasonable to imagine “experiencing” its whole evolution—say just by seeing a picture of it—the same is not true of a multicomputational system. Because observers like us, who fundamentally experience time as a single thread, have no choice but to somehow “sample” or “coarse-grain” a multicomputational system if we are to reduce its behavior to something we can “experience”.
And there’s then a remarkable formal fact: if one has a system that shows fundamental computational irreducibility, then computationally bounded “single-thread-of-time” observers inevitably perceive certain effective behavior in the system that follows something like the typical laws of physics. Once again we can make an analogy with gases made from large numbers of molecules. Large-scale (computationally bounded) observers will essentially inevitably perceive gases to follow, say, the standard gas laws, quite independent of the detailed properties of individual molecules.
In other words, the interplay between an “observer like us” and a multicomputational system will effectively select out a slice of computational reducibility from the underlying computational irreducibility. And although I didn’t see this coming, it’s in the end fairly obvious that something like this has to happen. The Principle of Computational Equivalence makes it basically inevitable that the underlying processes in the universe will be computationally irreducible. But somehow the particular features of the universe that we perceive and care about have to be ones that have enough computational reducibility that we can, for example, make consistent decisions about what to do, and we’re not just continually confronted by irreducible unpredictability.
So how general can we expect this picture of multicomputation to be, with its connection to the kinds of things we’ve seen in physics? It seems to be extremely general, and to provide a true fourth paradigm for theoretical science.
There are many kinds of systems for which the multicomputational paradigm seems to be immediately relevant. Beyond physics and metamathematics, there seems to be near-term promise in chemistry, molecular biology, evolutionary biology, neuroscience, immunology, linguistics, economics, machine learning, distributed computing and more. In each case there are underlying low-level elements (such as molecules) that interact through some kind of events (say collisions or reactions). And then there’s a big question of what the relevant observer is like.
In chemistry, for example, the observer could just measure the overall concentration of some kind of molecule, coarse-graining together all the individual instances of those molecules. Or the observer could be sensitive, for example, to detailed causal relationships between collisions among molecules. In traditional chemistry, things like this generally aren’t “observed”. But in biology (for example in connection with membranes), or in molecular computing, they may be crucial.
When I began the project that became A New Kind of Science the central question I wanted to answer was why we see so much complexity in so many kinds of systems. And with the computational paradigm and the ubiquity of computational irreducibility we had an answer, which also in a sense told us why it was difficult to make certain kinds of progress in a whole range of areas.
But now we’ve got a new paradigm, the multicomputational paradigm. And the big surprise is that through the intermediation of the observer we can tap into computational reducibility, and potentially find “physics-like” laws for all sorts of fields. This may not work for the questions that have traditionally been asked in these fields. But the point is that with the “right kind of observer” there’s computational reducibility to be found. And that computational reducibility may be something we can tap into for understanding, or for using some kind of system for technology.
It can all be seen as starting with the ruliad, and involving almost philosophical questions of what one can call “observer theory”. But in the end it gives us very practical ideas and methods that I think have the potential to lead to unexpectedly dramatic progress in a remarkable range of fields.
I knew that A New Kind of Science would have practical applications, particularly in modeling, in technology and in producing creative material. And indeed it has. But for our Physics Project, applications seemed much further away, perhaps centuries in the future. But a great surprise has been that through the multicomputational paradigm it seems as if there are going to be some quite immediate and very practical applications of the Physics Project.
In a sense the reason for this is that through the intermediation of multicomputation we see that many kinds of systems share the same underlying “metastructure”. And this means that as soon as there are things to say about one kind of system these can be applied to other systems. And in particular the great successes of physics can be applied to a whole range of systems that share the same multicomputational metastructure.
An immediate example is in practical computing, and particularly in the Wolfram Language. It’s something of a personal irony that the Wolfram Language is based on transformation rules for symbolic expressions, a structure very similar to what’s involved in the Physics Project. But there’s a crucial difference: in the usual case of the Wolfram Language, everything works in a purely computational way, with a particular transformation being done at each step. But now there’s the potential to generalize that to the multicomputational case, and in effect to trace the multiway system of every possible transformation.
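The distinction can be sketched concretely. The following minimal Python sketch is purely illustrative (the function names `one_step` and `multiway_step` are my own, not Wolfram Language internals): ordinary evaluation picks one applicable transformation per step, while multicomputational evaluation keeps every state reachable by applying every rule at every possible position.

```python
def one_step(state, rules):
    """Ordinary computation: apply the first applicable rule at its first match."""
    for lhs, rhs in rules:
        i = state.find(lhs)
        if i != -1:
            return state[:i] + rhs + state[i + len(lhs):]
    return state

def multiway_step(states, rules):
    """Multicomputation: apply every rule at every possible position,
    keeping all resulting states as branches of the multiway system."""
    successors = set()
    for s in states:
        for lhs, rhs in rules:
            start = 0
            while (i := s.find(lhs, start)) != -1:
                successors.add(s[:i] + rhs + s[i + len(lhs):])
                start = i + 1
    return successors or states

rules = [("A", "AB"), ("B", "A")]
print(one_step("AB", rules))                  # one deterministic successor
print(sorted(multiway_step({"AB"}, rules)))   # all possible successors at once
```

Iterating `multiway_step` traces out the full multiway system; the set of states typically grows rapidly, which is exactly why picking out humanly understandable structure from it is nontrivial.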
It’s not easy to pick out of that structure things that we can readily understand. But there are important lessons from physics for this. And as we build out the multicomputational capabilities of the Wolfram Language I fully expect that the “notational clarity” it will bring will help us to formulate much more in terms of the multicomputational paradigm.
I built the Wolfram Language as a tool that would help me explore the computational paradigm, and from that paradigm there emerged principles like the Principle of Computational Equivalence, which in turn led me to see the possibility of something like WolframAlpha. But now from the latest basic science built on the foundations of A New Kind of Science, together with the practical tooling of the Wolfram Language, it’s becoming possible again to see how to make conceptual advances that can drive technology that will again in turn let us make—likely dramatic—progress in basic science.
A New Kind of Science is full of intellectual seeds. And in the past few years—having now returned to basic science—I’ve been harvesting a few of those seeds. The Physics Project and the Metamathematics Project are two major results. But there’s been quite a bit more. And in fact it’s rather remarkable how many things that were barely more than footnotes in A New Kind of Science have turned into major projects, with important results.
Back in 2018—a year before beginning the Physics Project—I returned, for example, to what’s become known as the Wolfram Axiom: the axiom that I found in A New Kind of Science that is the very simplest possible axiom for Boolean algebra. But my focus now was not so much on the axiom itself as on the automated process of proving its correctness, and the effort to see the relation between “pure computation” and what one might consider a human-absorbable “narrative proof”.
Computational irreducibility appeared many times, notably in my efforts to understand AI ethics and the implications of computational contracts. I’ve no doubt that in the years to come, the concept of computational irreducibility will become increasingly important in everyday thinking—a bit like how concepts such as energy and momentum from the mathematical paradigm have become important. And in 2019, for example, computational irreducibility made an appearance in government affairs, as a result of my testifying about its implications for legislation about AI selection of content on the internet.
In A New Kind of Science I explored many specific systems about which one can ask all sorts of questions. And one might think that after 20 years “all the obvious questions” would have been answered. But they have not. And in a sense the fact that they have not is a direct reflection of the ubiquity of computational irreducibility. But it’s a fundamental feature that whenever there’s computational irreducibility, there must also be pockets of computational reducibility: in other words, the very existence of computational irreducibility implies an infinite frontier of potential progress.
Back in 2007, we’d had great success with our Turing Machine Prize, and the Turing machine that I’d suspected was the very simplest possible universal Turing machine was indeed proved universal—providing another piece of evidence for the Principle of Computational Equivalence. And in a sense there’s a general question that’s raised by A New Kind of Science about where the threshold of universality—or computational equivalence—really is in different kinds of systems.
But there are simpler-to-define questions as well. And ever since I first studied rule 30 in 1984 I’d wondered about many questions related to it. And in October 2019 I decided to launch the Rule 30 Prizes, defining three specific easy-to-state questions about rule 30. So far I don’t know of progress on them. And for all I know they’ll be open problems for centuries. From the point of view of the ruliad we can think of them as distant explorations in rulial space, and the question of when they can be answered is like the question of when we’ll have the technology to get to some distant place in physical space.
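For readers who want to experiment, rule 30 itself is easy to implement. Here is a minimal Python sketch (my own illustrative code, using cyclic boundary conditions on a finite row) that evolves rule 30 from a single black cell and records the center column, whose apparent randomness is the subject of the prize questions:

```python
def rule30_step(cells):
    """One step of rule 30: new cell = left XOR (center OR right)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

# Evolve from a single black cell and record the center column.
cells = [0] * 31
cells[15] = 1
center = []
for _ in range(16):
    center.append(cells[15])
    cells = rule30_step(cells)
print(center)
```

With 16 steps on a width-31 row the light cone never wraps around, so these center-column values match what an infinite row would give.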
After we launched the Physics Project in April 2020, it was rapidly clear that its ideas could also be applied to metamathematics. And it even seemed as if it might be easier to make relevant “real-world” observations in metamathematics than in physics. And the seed for this was in a note in A New Kind of Science entitled “Empirical Metamathematics”. That note contained one picture of the theorem-dependency graph of Euclid’s Elements, which in the summer of 2020 expanded into a 70-page study. And in my recent “Physicalization of Metamathematics” there’s a continuation of that—beginning to map out empirical metamathematical space, as explored in the practice of mathematics, with the idea that multicomputational phenomena that in physics might require technologically infeasible particle accelerators or telescopes could actually be within reach.
In addition to being the year we launched our Physics Project, 2020 was also the 100th anniversary of combinators—the first concrete formalization of universal computation. In A New Kind of Science I devoted a few pages and some notes to combinators, but I decided to do a deep dive and use what I’d learned both from A New Kind of Science and from the Physics Project to take a new look at them. Among other things the result was another application of multicomputation, as well as the realization that even though the S, K combinators from 1920 seemed very minimal, it was possible that S alone might also be universal, though with something different from the usual input → output “workflow” of computation.
In A New Kind of Science a single footnote mentions multiway Turing machines. And early last year I turned this seed into a long and detailed study that provides further foundational examples of multicomputation, and explores the question of just what it means to “do a computation” multicomputationally—something which I believe is highly relevant not only for practical distributed computing but also for things like molecular computing.
2021 marked the centenary of Post tag systems, and again I turned a few pages in A New Kind of Science into a long and detailed study. And what’s important about both this and my study of combinators is that they provide foundational examples (much like cellular automata in A New Kind of Science), which even in the past year or so I’ve used multiple times in different projects.
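Part of what makes tag systems good foundational examples is how little machinery they need. Here is a minimal Python sketch (the function name and halting convention are my own illustrative choices) of a 2-tag system, using the commonly cited example rules a → bc, b → a, c → aaa:

```python
def tag_system_run(tape, rules, deletion=2, steps=20):
    """Run a Post tag system: at each step read the first symbol, delete
    `deletion` symbols from the front, and append that symbol's block."""
    history = [tape]
    for _ in range(steps):
        if len(tape) < deletion:
            break  # too few symbols left: the system halts
        block = rules[tape[0]]
        tape = tape[deletion:] + block
        history.append(tape)
    return history

# The commonly cited 2-tag example: a -> bc, b -> a, c -> aaa
rules = {"a": "bc", "b": "a", "c": "aaa"}
for t in tag_system_run("aaa", rules, steps=6):
    print(t)
```

Even with rules this simple, it is in general undecidable whether such a system ever halts, which is the kind of behavior that makes them useful probes of the threshold of computational sophistication.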
In mid-2021, yet another few-page discussion in A New Kind of Science turned into a detailed study of “The Problem of Distributed Consensus”. And once again, this turned out to have a multicomputational angle, at first in understanding the multiway character of possible outcomes, but later with the realization that the formation of consensus is deeply related to the process of measurement and the coarse-graining involved in it—and the fundamental way that observers extract “coherent experiences” from systems.
In A New Kind of Science, there’s a short note about multiway systems based on numbers. And once again, in fall 2021 I expanded on this to produce an extensive study of such systems, as a certain kind of very minimal example of multicomputation, that at least in some cases connects with traditional mathematical ideas.
From the vantage point of multicomputation and our Physics Project it’s interesting to look back at A New Kind of Science, and see some of what it describes with more clarity. In the fall of 2021, for example, I reviewed what had become of the original goal of “understanding complexity”, and what methodological ideas had emerged from that effort. I identified two primary ones, which I called “ruliology” and “metamodeling”. Ruliology, as I’ve mentioned above, is my new name for the pure, basic science of studying the behavior of systems with simple rules: in effect, it’s the science of exploring the computational universe.
Metamodeling is the key to making connections to systems in nature and elsewhere that one wants to study. Its goal is to find the “minimal models for models”. Often there are existing models for systems. But the question is what the ultimate essence of those models is. Can everything be reduced to a cellular automaton? Or a multiway system? What is the minimal “computational essence” of a system? And as we begin to apply the multicomputational paradigm to different fields, a key step will be metamodeling.
Ruliology and metamodeling are in a sense already core concepts in A New Kind of Science, though not under those names. Observer theory is much less explicitly covered. And many concepts—like branchial space, token-event graphs, the multiway causal graph and the ruliad—have only emerged now, with the Physics Project and the arrival of the multicomputational paradigm.
Multicomputation, the Physics Project and the Metamathematics Project are sowing their own seeds. But there are still many more seeds to harvest even from A New Kind of Science. And just as the multicomputational paradigm was not something that I, for one, could foresee from A New Kind of Science, no doubt there will in time be other major new directions that will emerge. But, needless to say, one should expect that it will be computationally irreducible to determine what will happen: a metacontribution of the science to the consideration of its own future.
The creation of A New Kind of Science took me a decade of intense work, none of which saw the light of day until the moment the book was published on May 14, 2002. When I returned to basic science 17 years later, the world had changed, and it was possible for me to adopt a quite different approach, in a sense making the process of doing science as open and incremental as possible.
It’s helped that there’s the web, the cloud and livestreaming. But in a sense the most crucial element has been the Wolfram Language, and its character as a full-scale computational language. Yes, I use English to tell the story of what we’re doing. But fundamentally I’m doing science in the Wolfram Language, using it both as a practical tool, and as a medium for organizing my thoughts, and sharing and communicating what I’m doing.
Starting in 2003, we’ve had an annual Wolfram Summer School at which a long string of talented students have explored ideas based on A New Kind of Science, always through the medium of the Wolfram Language. In the last couple of years we’ve added a Physics track, connected to the Physics Project, and this year we’re adding a Metamathematics track, connected to the Metamathematics Project.
During the 17 years that I wasn’t focused on basic science, I was doing technology development. And I think it’s fair to say that at Wolfram Research over the past 35 years we’ve created a remarkably effective “machine” for doing innovative research and development. Mostly it’s been producing technology and products. But one of the very interesting features of the Physics Project and the projects that have followed it is that we’ve been applying the same managed approach to innovation to them that we have been using so successfully for so many years at our company. And I consider the results to be quite spectacular: in a matter of weeks or months I think we’ve managed to deliver what might otherwise have taken years, if it could have been done at all.
And particularly with the arrival of the multicomputational paradigm there’s quite a challenge. There are a huge number of exceptionally promising directions to follow that have the potential to deliver revolutionary results. And with our concepts of managed research, open science and broad connection to talent it should be possible to make great progress even fairly quickly. But to do so requires significant scaling up of our efforts so far, which is why we’re now launching the Wolfram Institute to serve as a focal point for these efforts.
When I think about A New Kind of Science, I can’t help but be struck by all the things that had to align to make it possible. My early experiences in science and technology, the personal environment I’d created—and the tools I built. I wondered at the time whether the five years I took “away from basic science” to launch Mathematica and what’s now the Wolfram Language might have slowed down what became A New Kind of Science. Looking back I can say that the answer was definitively no. Because without the Wolfram Language the creation of A New Kind of Science would have needed “not just a decade”, but likely more than a lifetime.
And a similar pattern has repeated now, though even more so. The Physics Project and everything that has developed from it has been made possible by a tower of specific circumstances that stretch back nearly half a century—including my 17-year hiatus from basic science. Had all these circumstances not aligned, it is hard to say when something like the Physics Project would have happened, but my guess is that it would have been at least a significant part of a century away.
It is a lesson of the history of science that the absorption of major new paradigms is a slow process. And normally the timescales are long compared to the 20 years since A New Kind of Science was published. But in a sense we’ve managed to jump far ahead of schedule with the Physics Project and with the development of the multicomputational paradigm. Five years ago, when I summarized the first 15 years of A New Kind of Science I had no idea that any of this would happen.
But now that it has—and with all the methodology we’ve developed for getting science done—it feels as if we have a certain obligation to see just what can be achieved. And to see just what can be built in the years to come on the foundations laid down by A New Kind of Science.
In the end it’s about five and a half pounds of paper, 1280 pages, 973 illustrations and 583,313 words. And its creation took more than a decade of my life. Almost every day of my thirties, and a little beyond, I tenaciously worked on it. Figuring out more and more science. Developing new kinds of computational diagrams. Crafting an exposition that I wrote and rewrote to make as clear as possible. And painstakingly laying out page after page of what on May 14, 2002, would be published as A New Kind of Science.
I’ve written before (even in the book itself) about the intellectual journey involved in the creation of A New Kind of Science. But here I want to share some of the more practical “behind the scenes” journey of the making of what I and others usually now call simply “the NKS book”. Some of what I’ll talk about happened twenty years ago, some more like thirty years ago. And it’s been interesting to go back into my archives (and, yes, those backup tapes from 30 years ago were hard to read!) and relive some of what finally led to the delivery of the ideas and results of A New Kind of Science as truckloads of elegantly printed books with striking covers.
It was late 1989—soon after my 30th birthday—when I decided to embark on what would become A New Kind of Science. And at first my objective was quite modest: I just wanted to write a book to summarize the science I’d developed earlier in the 1980s. We’d released Version 1.0 of Mathematica (and what’s now the Wolfram Language) in June 1988, and to accompany that release I’d written what had rapidly become a very successful book. And while I’d basically built Mathematica to give me the opportunity to do more science, my thought in late 1989 was that before seriously embarking on that, I should spend perhaps a year and write a book about what I already knew, and perhaps tie up a few loose ends in the process.
My journey in science began in the early 1970s—and by the time I was 14 I’d already written three book-length “treatises” about physics (though these wouldn’t see the light of day for several more decades). I worked purely on physics for a number of years, but in 1979 this led me into my first big adventure in technology—thereby starting my (very productive) long-term personal pattern of alternating between science and technology (roughly five times so far). In the early 1980s—back in a “science phase”—I was fortunate enough to make what remains my all-time favorite science discovery: that in cellular automaton programs even with extremely simple rules it’s possible to generate immense complexity. And from this discovery I was led to a series of results that began to suggest what I started calling a general “science of complexity”.
By the mid-1980s I was quite well positioned in the academic world, and my first thought was to try to build up the study of the “science of complexity” as an academic field. I started a journal and a research center, and collected my papers in a book entitled Theory and Applications of Cellular Automata (later reissued as Cellular Automata and Complexity). But things developed slowly, and eventually I decided to go to “plan B”—and just try to create the tools and environment that I would need to personally push forward the science as efficiently as possible.
The result was that in late 1986 I started the development of Mathematica (and what’s now the Wolfram Language) and founded Wolfram Research. For several years I was completely consumed with the challenges of language design, software development and CEOing our rapidly growing company. But in August 1989 we had released Mathematica 1.2 (tying up the most obvious loose ends of Version 1.0)—and with the intensity of my other commitments at least temporarily reduced, I began to think about science again.
The Mathematica Book had been comparatively straightforward and fast for me to write—even as a “side project” to architecting and developing the system. And I imagined that it would be a somewhat similar experience writing a book explaining what I’d figured out about complexity.
My first working title was Complexity: An Introduction to the Science of Complex Phenomena. My first draft of a table of contents, from November 1989, begins with “A Gallery of Complex Systems” (or “The Phenomenon of Complexity”), and continues through nine other chapters, capturing some of what I then thought would be important (and in most cases had already studied):
I wrote a few pages of introductory text—beginning by stating the objective as:
My archives record that in late December I was taking a more computation-first approach, and considering the title Algorithms in Nature: An Introduction to Complexity. But soon I was submerged in the intense effort to develop Mathematica 2.0, and this is what consumed me for most of 1990—though my archives from the time reveal one solitary short note, apparently from the middle of the year:
But through all this I kept thinking about the book I intended to write, and wondering what it should really be like. In the late 1980s there’d been quite a run of unexpectedly successful “popular science” books—like A Brief History of Time—that mixed what were at least often claimed to be new results or new insights about science with a kind of intended-to-entertain “everyman narrative”. A sequence of publishers had encouraged me to “write a popular science book”. But should the book I was planning to write really be one of those?
I talked to quite a few authors and editors. But nobody could quite tell a coherent story. Perhaps the most promising insight came from an editor of several successful such books, who opined that he thought the main market for “popular science” books was people who in the past would have read philosophy books, but now those were too narrow and technical. Other people, though, told me they thought it was really more of an “internal market”, with the books basically being bought by other scientists. And in the media and elsewhere there continued to be an undercurrent of sentiment that while the books might be being bought, they mostly weren’t actually getting read.
“Isn’t there actual data on what’s going on?” I asked my publishing industry contacts. “No”, they said, “that’s just not how our industry works”. “Well”, I said, “why don’t we collect some data?” My then-publisher seemed enthusiastic about it. So I wrote a rather extensive survey to do on “random shoppers” in bookstores. It began with some basic—if “1990-style”—demographic questions, then got to things like
and rather charmingly ended with
(and, yes, in reality it took almost the longest time I could imagine for electronic books to become common). But after many months of “we’ll get results soon” it turned out almost no surveys were ever done. As I would learn repeatedly, most publishers seemed to have a very hard time doing anything they hadn’t already done before. Still, my then-publisher had done well with The Mathematica Book. So perhaps they might be able to just “follow a formula” and do well with my book if it was written in “popular science” form.
But I quickly realized that the pressure to add sensationalism “to sell books” really grated on me. And it didn’t take long to decide that, no, I wasn’t going to write a “formula” popular science book. I was going to write my own kind of book—that was more direct and straightforward. No stories. Just science. With lots of pictures. And if nothing else, the book would at least be helpful to me, as a way of clarifying my own thinking.
In January 1991 we announced Mathematica 2.0—and in March and June I did a 35-city tour of the US and Europe talking about it. Then, finally, at the beginning of July we delivered final floppy disks to the duplicator (as one did in those days)—and Mathematica 2.0 was on its way. So what next? I had a long roadmap of things we should do. But I decided it was time to let the team I’d built just get on with following the roadmap for a while, without me adding yet more things to it. (As it turns out, we finally finished essentially everything that was on my 1991 todo list just a few years ago.)
And so it was that in July 1991 I became a remote CEO (yes, a few decades ahead of the times), moved a couple thousand miles away from our company headquarters to a place in the hills near San Francisco, and set about getting ready to write. Based on the plan I had for the book—and my experience with The Mathematica Book—I figured it might take about a year, or maybe 18 months, to finish the project.
In the end—with a few trips in the middle, notably to see a total solar eclipse—it took me a couple of months to get my remote-CEO setup figured out (with a swank computer-connected fax machine, email getting autodelivered every 15 minutes, etc.). But even while that was going on, I was tooling up to get an efficient modern system for visualizing and studying cellular automata. Back when I had been writing my papers in the 1980s, I’d had a C program (primarily for Sun workstations) that had gradually grown, and was eventually controlled by a rather elaborate—but sensible-for-its-time—hierarchical textual menu system
which, yes, could generate at least single-graphic-per-screen graphics, as in this picture of my 1983 office setup:
But now the world had changed, and I had Mathematica. And I wanted a nice collection of Wolfram Language functions that could be used as streamlined “primitives” for studying cellular automata. Given all my work on cellular automata it might seem strange that I hadn’t built cellular automaton functionality into the Wolfram Language right from the start. But in addition to being a bit bashful about my personal pet kind of system, I hadn’t been able to see how to “package” all the various different kinds of cellular automata I’d studied into one convenient superfunction—and indeed it took me a decade more of understanding, both of language design and of cellular automata, to work out how to nicely do that. And so back in 1991 I just created a collection of add-on functions (or what might today be a paclet) containing the particular functions I needed. And indeed those functions served me well over the course of the development of A New Kind of Science.
A “staged” screen capture from the time shows my basic working environment:
Some printouts from early 1991 give a sense of my everyday experience:
And although it’s now more than 30 years later, I’m happy to say that we’ve successfully maintained the compatibility of the Wolfram Language, and those same functions still just run! The .ma format of my Version 2.0 notebooks from 1991 has to be converted to .nb, but then they just open in Version 13 (with a bit of automatic style modernization) and I’m immediately “transported back in time” to 1991, with, yes, a very small notebook appropriate for a 1991 rather than a 2022 screen size:
(Of course the cellular automata all look the same, but, yes, this notebook looks shockingly similar to ones from our recent cellular automaton NFT-minting event.)
We’d invented notebooks in 1987 to be able to do just the kinds of things I wanted to do for my science project—and I’d been itching to use them. But before 1991 I’d mostly been doing core code development (often in C), or using the elaborate but still textual system we had for authoring The Mathematica Book. And so—even though I’d demoed them many times—I hadn’t had a chance to personally make daily use of notebooks.
But in 1991, I went all in on notebooks—and have never looked back. When I first started studying cellular automata back in 1981, I’d had to display their output as text. But soon I was able to start using the bitmapped displays of workstation computers, and by 1984 I was routinely printing cellular automaton images in fairly high resolution on a laser printer. But with Mathematica and our notebook technology things got dramatically more convenient—and what had previously often involved laborious work with paper, scissors and tape now became a matter of simple Wolfram Language code in a notebook.
For almost a decade starting in 1982, my primary computer had been a progressively more sophisticated Sun workstation. But in 1991 I switched to NeXT—mainly to be able to use our notebook interface, which was by then well developed for NeXT but wasn’t yet ready on X Windows and Sun. (It was also available on Macintosh computers, but at the time those weren’t powerful enough.)
And here I am in 1991, captured “hiding out” as a remote CEO, with a NeXT in the background, just getting started on the book:
Here’s a picture showing a bit more of the setup, taken in early 1993, during a short period when I was a remote-remote-CEO, with my computer set up in a hotel room:
Throughout the 1980s, I’d used cellular automata—and basically cellular automata alone—as my window into the computational universe. But in August 1991—with my new computational capabilities and new away-from-the-company-to-do-science setup—I decided it’d be worth trying to look at some other systems.
And I have to say that now, three decades later, I didn’t remember just how suddenly everything happened. But my filesystem records that in successive days at the beginning of September 1991 there I was investigating more and more kinds of systems (.ma’s were “Mathematica notebook” files; .mb’s were the “binary forks” of these files):
Mobile automata. Turing machines. Tag systems. Soon these would be joined by register machines, and more. The first examples of these systems tended to have quite simple behavior. But I quickly started searching to see whether these systems—like cellular automata—would be capable of complex behavior, as my 1991 notebooks record:
Often I would run programs overnight, or sometimes for many days. Later I would recruit many computers from around our company, and have them send me mail about their results:
But already in September 1991 I was starting to see that, yes, just like cellular automata, all these different kinds of systems, even when their underlying rules were simple, could exhibit highly complex behavior. I think I’d sort of implicitly assumed this would be true. But somehow actually seeing it began to elevate my view of just how general a “science of complexity” one might be able to make.
There were a few distractions in the fall of 1991. Like in October a large fire came within about half a mile of burning down our house:
But by the spring of 1992 it was beginning to become clear that there was a very general principle around all this complexity I was seeing. I had invented the concept of computational irreducibility back in 1984. And I suppose in retrospect I should have seen the bigger picture sooner. But as it was, on a pleasant afternoon (and, no, I haven’t figured out the exact date), I was taking a short break from being in front of my computer, and had wandered outside. And that’s when the Principle of Computational Equivalence came to me. Somehow after all those years with cellular automata, and all those months with computer experiments on other systems, I was primed for it. But in the end it all arrived in one moment: the concept, the name, the implications for computational irreducibility. And in the three decades since, it’s been the single most important guiding principle for my intuition.
I’ve always found it difficult to produce “disembodied content”: right from the beginning I typically need to have a pretty clear idea how what I’m producing will look in the end. So back in 1991 I really couldn’t produce more than a page or two of content for my book without knowing what the book was going to look like.
“Formula” popular science books tended—for what I later realized were largely economic reasons—to consist mainly of pages of pure text, with at most line drawings, and to concentrate whatever things like photographs they might have into a special collection of “plates” in the middle of the book. For The Mathematica Book we’d developed a definite—very functional—layout, with text, tables and two-column “computer dialogs”:
For the NKS book I knew I needed something much more visual. And at first I imagined it might be a bit like a high-end textbook, complete with all sorts of structured elements (“Historical Note”, “Methodology”, etc.).
I asked a talented young designer who had worked on The Mathematica Book (and who, 31 years later, is now a very senior executive at our company) to see what he could come up with. And here, from November 1991, is the very first “look” for the NKS book—with content pretty much just flowed in from the few pages I’d written out in plain text:
I knew the book would have images of the kind I’d long produced of cellular automata, and that had appeared in my papers and book from the 1980s:
But what about “diagrams”? At first we toyed with drawing “textbook-style” diagrams—and produced some samples:
But these seemed to have way too much “conceptual baggage”, and when one looks closely at them, it’s easy to get confused. I wanted something more minimal—where the spotlight was as much as possible on the systems I was studying, not on “diagrammatic scaffolding”. And so I tried to develop a “direct diagramming” methodology, where each diagram could directly “explain itself”—and where every diagram would be readable “purely visually”, without words.
In a typical case I might show the behavior of a system (here a mobile automaton), next to an explicit “visual template” of how its rules operate. The idea then was that even a reader who didn’t understand the bigger story, or any of the technical details, could still “match up templates” and understand what was going on in a particular picture:
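The mechanics of a mobile automaton (a line of cells with a single active cell that updates its color and then moves) can be sketched as follows. This is a Python toy, not the book's Wolfram Language code, and the rule is a hypothetical one, included just to show the structure:

```python
def mobile_automaton(rule, width=11, steps=5):
    """A simple mobile automaton: only the active cell updates each step.
    `rule` maps the (left, active, right) neighborhood to
    (new active-cell color, displacement of the active position)."""
    cells = [0] * width
    pos = width // 2
    history = [(cells[:], pos)]
    for _ in range(steps):
        left, center, right = cells[pos - 1], cells[pos], cells[pos + 1]
        new_color, move = rule[(left, center, right)]
        cells[pos] = new_color
        pos += move
        history.append((cells[:], pos))
    return history

# A hypothetical rule, just to show the mechanics (not one from the book):
rule = {(0, 0, 0): (1, 1), (0, 1, 0): (0, 1), (1, 0, 0): (1, -1),
        (0, 0, 1): (0, -1), (1, 1, 0): (1, 1), (0, 1, 1): (0, -1),
        (1, 0, 1): (1, 1), (1, 1, 1): (0, 1)}

for cells, pos in mobile_automaton(rule, steps=5):
    print("".join("#" if c else "." for c in cells))
```

The "visual template" idea in the book amounts to drawing each (left, active, right) → (color, move) entry of such a rule explicitly next to the evolution it produces.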
At the beginning of the project, the diagrams were comparatively simple. But as the project progressed I invented more and more mechanisms for them, until later in the project I was producing very complex “visually readable” diagrams like this:
A crucial point was that all these diagrams were being produced algorithmically—with Wolfram Language code. And in fact I was developing the diagrams as an integral part of actually doing the research for the book. It was a lesson I’d learned years earlier: don’t wait until research is “finished” to figure out how to present it; work out the presentation as early as possible, so you can use it to help you actually do the research.
Another aspect of our first “textbook-like” style for the book was the idea of having additional elements, alongside the “main narrative” of the book. In early layouts we thought about having “Technical Notes”, “Historical Notes”, “Implementation Notes”, etc. But it didn’t take too long to decide that no, that was just going to be too complicated. So we made the decision to have one kind of note, and to collect all notes at the back of the book.
And that meant that in the main part of the book we had just two basic elements: text and images (with captions). But, OK, in designing any book a very basic question is: what size and shape will its pages be? The Mathematica Book was squarish—like a typical textbook—so that it accommodated its text-on-the-left code-on-the-right “dialogs”. We knew that the new book should be wide too, to accommodate the kinds of graphics I expected. But that posed a problem.
In The Mathematica Book ordinary text ran the full width of the page. And that worked OK, because in that book the text was typically broken up by dialogs, tables, etc. In the new book, however, I expected much longer blocks of pure text—which wouldn’t be readable if they ran the full width of the page. But if the text was narrower, then how would the graphics not look like they were awkwardly sticking out? Well, the pages would have to be carefully laid out to appropriately anchor the graphics visually, say to the tops or bottoms of pages. And that was going to make the process of layout much trickier.
Different pages were definitely going to look different. But there had to be a certain overall consistency. Every graphic was going to have a caption—and actually a caption that was sufficiently selfcontained so that people could basically “read the book just by looking at the pictures”. Within the graphics themselves there had to be standards. How should arrays of cells be rendered? To what extent should things have boxes around them, or arrows between them? How big should pictures that emphasized particular features be?
Some of these standards got implemented basically just by me remembering to follow them. But others were essentially the result of the whole stack of Wolfram Language functions that we built to produce the algorithmic diagrams for the book. At the time, there was some fiddliness to these functions, and to making their output look good—though in later years what we learned from this was used to tune up the general look of builtin graphics in the Wolfram Language.
One of the striking features of the NKS book is the crispness of its pictures. And I think it’s fair to say that this wasn’t easy to achieve—and in the end required a pretty deep dive into the technology of imaging and printing (as I’ll describe more in a later section).
Back in the 1980s I’d had plenty of pictures of things like cellular automata in my papers. And I’d produced them by outputting what amounted to pages of bitmaps on laser printers, then having publishers photographically reproduce the pictures for printing.
Up to a point the results were OK:
But for example in 1985 when I wanted a 2000-step picture of rule 30 things got difficult. The computation (which, yes, involves 8 million cells) was done on a prototype Connection Machine parallel computer. And at first the output was generated on a large-format printer that was usually used to print integrated circuit layouts. The result was quite large, and I subsequently laminated pictures like this (and in rolled-up form they served as engaging hiding places for my children when they were very young):
But when photographically reproduced and printed in a journal the picture definitely wasn’t great:
And the NKS book provided another challenge as well. While the core of a picture might just be an array of cells like in a cellular automaton, a full algorithmic diagram could contain all sorts of other elements.
In the end, the NKS book was a beneficiary of an important design decision that we made back in 1987, early in the development of Mathematica. At the time, most graphics were thought about in terms of bitmaps. On whatever device one was using, there was an array of pixels of a certain resolution. And the focus was on rendering the graphics at that resolution. Not everything worked that way, though. And “drawing” (as opposed to “painting”) programs typically created graphics in “vector” form, in which at first primitives like lines and polygons were specified without reference to resolution, and were then converted to bitmaps only when they were displayed.
The shapes of characters in fonts were something that was often specified—at least at an underlying level—in vector form. There’d been various approaches to doing this, but by 1987 PostScript was an emerging standard—at least for printing—buoyed by its use in the Apple LaserWriter. The main focus of PostScript was on fonts and text, but the PostScript language also included standard graphics primitives like lines and polygons.
Back when I had built SMP in 1979–1981 we’d basically had to build a separate driver for every different display or printing device we wanted to output graphics on. But in 1987 there was an alternative: just use PostScript for everything. Printer manufacturers were working hard to support PostScript on their printers, but PostScript mostly hadn’t come to screens yet. There was an important exception though: the NeXT computer was set up to have PostScript as its native screenrendering system. And partly through that, we decided to use PostScript as our underlying way to represent all graphics in Mathematica.
At a high level, graphics were described with the same symbolic primitives as we use in the Wolfram Language today: Line, Polygon, etc. But these were converted internally to PostScript—and even stored in notebooks that way. On the NeXT this was pretty much the end of the story, but on other systems we had to write our own interpreters for at least the subset of PostScript we were using.
Why was this important to the NKS book? Well, it meant that all graphics could be specified in a fundamentally resolution-independent way. In developing the graphics I could look at them in a notebook on a screen, or I could print them on a standard laser printer. But for the final book the exact same graphics could be printed at much higher resolution—and look much crisper.
At the time, the standard resolution of a computer screen was 72 dpi (dots per inch) and the resolution of a typical laser printer was 300 dpi. But the typical basic resolution of a book-printing pipeline was more like 2400 dpi. I’ll talk later about the adventure of actually printing the NKS book. But the key point was that because Mathematica’s graphics were fundamentally based on PostScript, they weren’t tied to any particular resolution, so they could in principle make use of whatever resolution was available.
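To illustrate the resolution-independence point: here is a toy sketch, in Python and much simplified relative to Mathematica's actual PostScript backend, that emits a minimal Encapsulated PostScript description of an array of cells. The same description can then be rasterized at 72 dpi on screen or 2400 dpi on an imagesetter:

```python
def cells_to_eps(rows, cell=10):
    """Emit a minimal Encapsulated PostScript drawing for an array of
    0/1 cells. The description is resolution-independent: it specifies
    geometry in points, not pixels, so any device can rasterize it."""
    w, h = len(rows[0]) * cell, len(rows) * cell
    lines = ["%!PS-Adobe-3.0 EPSF-3.0",
             f"%%BoundingBox: 0 0 {w} {h}"]
    for y, row in enumerate(rows):
        for x, c in enumerate(row):
            if c:  # one filled square per black cell
                px, py = x * cell, (len(rows) - 1 - y) * cell
                lines.append(f"{px} {py} moveto {cell} 0 rlineto "
                             f"0 {cell} rlineto {-cell} 0 rlineto "
                             f"closepath fill")
    lines.append("showpage")
    return "\n".join(lines)

eps = cells_to_eps([[0, 1, 0], [1, 1, 1]])
```

The filenames, scaling conventions and operators a real pipeline needs are far more involved; this just shows why a vector description decouples the picture from any particular output resolution.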
Needless to say, there were plenty of complicated issues. One had to do with indicating the cells in something like a cellular automaton. Here’s a picture of the first few steps of rule 30, shown as a kind of “macro bitmap”, with pure black and white cells:
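For reference, the evolution of rule 30 from a single black cell follows from the standard definition: the new cell is left XOR (center OR right). A short Python sketch (not the code used for the book):

```python
def rule30(steps):
    """Evolve the rule 30 cellular automaton from a single black cell."""
    width = 2 * steps + 1
    row = [0] * steps + [1] + [0] * steps
    rows = [row]
    for _ in range(steps):
        # rule 30: new = left XOR (center OR right)
        row = [row[(i - 1) % width] ^ (row[i] | row[(i + 1) % width])
               for i in range(width)]
        rows.append(row)
    return rows

for r in rule30(5):
    print("".join("#" if c else " " for c in r))
```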
But often I wanted to indicate the extent of each cell:
And in late 1991 and early 1992 we worried a lot about how to draw the “mesh” between cells. A first thought was just to use a thin black line. But that obviously wouldn’t work, because it wouldn’t separate black cells. And we soon settled on a GrayLevel[.15] line, which was visible against both black and white.
But how is such a line printed? If we’re just using black ink, there’s ultimately either black or white at a particular place on the page. But there’s a standard way to achieve the appearance of gray, by changing the local density of black and white. And the typical method used to implement this is (as we’ll discuss later) halftoning, in which one renders the “gray” by using black dots of different sizes.
But by the time one’s using very thin gray lines, things are getting very tricky. For example, it matters how much the ink on either side of the line spreads—because if it’s too much it can effectively fill in where the line was supposed to be. We wanted to define standards that we could use throughout the NKS book. And we couldn’t tell what would happen in the final printed book except by actually trying it, on a real printing press. So already in early 1992 we started doing print tests, trying out different thicknesses of lines and so on. And that allowed us to start setting graphics standards that we could implement in the Wolfram Language code used to make the algorithmic diagrams, that would then flow through to all renderings of those diagrams.
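The basic halftoning idea, rendering a gray level with black dots whose size varies, can be sketched schematically. This toy Python version (not the actual prepress machinery) fills a halftone cell with a centered dot whose area matches the requested gray:

```python
def halftone(gray, size=8):
    """Approximate a gray level (0 = white, 1 = black) on a pure
    black-and-white device: fill a size x size halftone cell with a
    centered black dot whose area matches the requested gray."""
    cx = cy = (size - 1) / 2
    # sort cell positions by distance from the center, then blacken
    # the closest ones until the dot covers the right fraction
    order = sorted(((x, y) for x in range(size) for y in range(size)),
                   key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    dots = set(order[:round(gray * size * size)])
    return [["#" if (x, y) in dots else "." for x in range(size)]
            for y in range(size)]

light = halftone(0.15)  # a dot covering ~15% of the cell
dark = halftone(0.60)   # a darker gray means a bigger dot
```

A real press then has to contend with ink spread, screen angles and so on, which is what the print tests were for.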
Back in 1991 we debated quite a bit whether the NKS book should use color. We knew it would be significantly more expensive to print the book in color. But would color allow seriously better communication of information? Two-color cellular automata like rule 30 can be rendered in pure black and white. But over the years I’d certainly made many striking color pictures of cellular automata with more colors.
Somehow, though, those pictures hadn’t seemed quite as crisp as the black and white ones. And there was another issue too, having to do with a problem I’d noticed in the mid-1980s in human visual perception of arrays of colored cells. Somewhat nerdily, I ended up including a note about this in the final NKS book:
But the final conclusion was that, yes, the NKS book would be pure black and white. Nowadays—particularly with screen rendering being in many ways more important than print—it’s much easier to do things in color. And, for example, in our Physics Project it’s been very convenient to distinguish types of graphs, or nodes in graphs, by color. But for the NKS book I think it was absolutely the right decision to use black and white. Color might have added some nice accents to certain kinds of diagrams. But the clarity—and visual force—of the images in the book was much better served by the perceptual crispness of pure black and white.
The way most books with complex formats get produced is that first the author creates “disembodied” pieces of content, then a designer or production artist comes in and arranges them on pages. But for the NKS book I wanted something where the process of creation and layout was much more integrated, and where—just as I was directly writing Wolfram Language code to produce images—I could also directly lay out final book pages.
By 1990 “desktop publishing” was commonplace, and there were plenty of systems that basically allowed one to put anything anywhere on a page. But to make a whole book we knew we needed a more consistent and templated approach—one that could also interact programmatically with the Wolfram Language. A few well-developed “full-scale book production systems” existed, but they were complex, “industrially oriented” pieces of software that didn’t seem realistic for me to use interactively while writing the book.
In mid-1990, though, we saw a demo of something new, running on the NeXT computer: a system called FrameMaker, which featured book-production capabilities, as well as a somewhat streamlined interchange format. Oh, and especially on the NeXT, it handled PostScript graphics well, inserting them “by reference” into documents. By late 1990 we were building book layout templates in FrameMaker, and we soon settled on using that for the basic production of the book. (Later, to achieve all the effects we wanted, we ended up having to process everything through Wolfram Language, but that’s another story.)
We iterated for a while on the book design, but by the end of 1991 we’d nailed it down, and I started authoring the book. I made images using Mathematica, importing them in “Encapsulated PostScript” into FrameMaker. And words I typed directly into FrameMaker—in the environment reconstructed here using a virtual machine that we saved from the time of authoring the book:
I composed every page—not only its content, but also its visual appearance. If I had a cellular automaton to render, and it was going to occupy a certain region on a page, I would pick the number of cells and steps to be appropriate for that region. I was constantly adjusting pictures to make them look good on a given page, or on pairs of facing pages, or along with other nearby pictures, and so on.
One of the tricky issues was how to refer to pictures from within the text. In technical books, it’s common to number “figures”, so that the text might say “See Figure 16”. But I wanted to avoid that piece of “scaffolding”, and instead always just be able to say things like “the picture below”, or “the picture on the facing page”. It was often quite a puzzle to see how to do this. If a picture was too big, or the text was too small, the picture would get too far ahead, and so on. And I was constantly adjusting things to make everything work.
I also decided that for elegance I wanted to avoid ever having to hyphenate words in the text. And quite often I found myself either rewording things, or slightly changing letter spacing, to make things fit, and to avoid things like “orphaned” words at the beginnings of lines.
It was a strange and painstaking process getting each page to look right, and adjusting content and layout together. Sometimes things got a little pathological. I always wanted to fill out pages, and not to leave space at the bottom (oh, and facing pages had to be exactly the same height). And I also tried to start new sections on a new page. But there I was, writing Chapter 5, and trying to end the section on “Substitution Systems and Fractals”—and I had an empty bottom third of a page. What was I to do? I decided to invent a whole new kind of system, that appears on page 192, just to fill out the layout for page 191:
Looking through my archives, I find traces of other examples. Here are notes on a printout of Chapter 6. And, yes, on page 228 I did insert images of additional rules:
By the end of 1991 I was all set up to author and lay out the book. I started writing—and things went quickly. The first printout I have from that time is from May 1992, and it already has nearly 90 pages of content, with many recognizable pictures from the final NKS book:
At that point the book was titled Computation and the Complexity of Nature, and the chapter titles were a bit different, and rather complexity-themed:
A large fraction of the main-text material about cellular automata was already there, as well as material about substitution systems and mobile automata. And there were extensive notes at the end, though at that point they were still single-column, and looked pretty much just like a slightly compressed version of the main text. And, by the way, Turing machines were just then appearing in the book, but still relegated to the notes, on the grounds that they “weren’t as minimal as mobile automata”.
And hanging out, so far just as a stub, was the Principle of Computational Equivalence:
By August 1992 the book had changed its title to A New Science of Complexity (subtitle: Rethinking the Mechanisms of Nature). There was a new first chapter “Some Fundamental Phenomena” that began with photographs of various “systems from nature”:
Chapter 3 had now become “The Behavior of Simple Systems”. Turing machines were there. There was at least a stub for register machines and arithmetic systems. But even though I’d investigated tag systems in September 1991 they weren’t yet in the book. Systems based on numbers were starting to be there.
And then, making their first appearance (with the page tagged as having been modified May 25, 1992), were the multiway systems that are now so central to the multicomputational paradigm (or, as I had originally and perhaps more correctly called them in this case, “Multiway Substitution Systems”):
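A multiway substitution system is easy to state computationally: at each step, apply every rule at every possible position in every string, and keep all the distinct results. A minimal Python sketch, with an illustrative rule set that is not one from the book:

```python
def multiway_step(strings, rules):
    """One step of a multiway (string) substitution system: apply every
    rule at every possible position in every string, collecting all
    distinct resulting strings."""
    out = set()
    for s in strings:
        for lhs, rhs in rules:
            i = s.find(lhs)
            while i != -1:
                out.add(s[:i] + rhs + s[i + len(lhs):])
                i = s.find(lhs, i + 1)
    return sorted(out)

# a small illustrative rule set: A -> AB, B -> A
rules = [("A", "AB"), ("B", "A")]
states = ["A"]
for _ in range(3):
    states = multiway_step(states, rules)
    print(states)
```

The branching set of states this produces is exactly the structure that later became central to the multicomputational paradigm.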
By September 1992, register machines were in, complete with the simplest register machine with complex behavior (that had taken a lot of computer time to find). My simple PDE with complex behavior was also there. By early 1993 I had changed its name again, to A Science of Complexity, and had begun to have a quite recognizable chapter structure (though not yet with realistic page numbers):
It imagined a rather different configuration of notes than eventually emerged:
Making its first appearance was a chapter on physics, though still definitely as a stub:
This version of the book opened with “chapter summaries”, noting about the chapter on fundamental physics that “[Its] high point is probably my (still speculative) attempt to reformulate the foundation of physics in computational terms, including new models for space, time and quantum mechanics”:
By February 1994 I was getting bound mockups of the book made, with the final page size, though the wrong title and cover, and at that point only 458 pages (rather than the eventual 1280):
The two-column format for the notes at the back was established, and even though the content of the notes for the still-complexity-themed first chapter was rather different from the way it ended up, some later notes already looked pretty much the same as they would in the final book:
By September 1994 the draft of the book was up to 658 pages. The chapter structure was almost exactly as it finally ended up, albeit also with an epilog, and a bibliography (more about these later):
The September 1994 draft contained a section entitled “The Story of My Work on Complexity” (later renamed to the final “The Personal Story of the Science in this Book”) which then included an image of what a Wolfram Notebook on NeXT looked like at the time:
The caption talked about how in the course of the project I’d generated 3 gigabytes of notebooks—a number which would increase considerably before the book was finished. Charmingly, the caption also said: “The card at the back of this book gives information about obtaining some of the programs used”. Our first corporate website went live on October 7, 1994.
By late 1994 the form of the book was basically all set. I’d successfully captured pretty much everything I’d known when I started on the book back in 1991, and I’d had three years of good discoveries. But what was still to come was seven years of intense research and writing that would take me much further than I had ever imagined back in 1991—and would end up roughly doubling the length of the book.
In 1991 I knew the book I was going to write would have lots of cellular automaton pictures. And I imagined that the main other type of pictures it would contain would be photographs of actual, natural systems. But where was I going to get those photographs from? There was no web with image search back then. We looked at stock photo catalogs, but somehow the kinds of images they had (often oriented towards advertising) were pretty far from what we wanted.
Over the years, I had collected—albeit a bit haphazardly—quite a few relevant images. But we needed many more. I wanted pictures illustrating both complexity, and simplicity. But the good news was that, as I explained early in the book, both are ubiquitous. So it should be easy to find examples of them—that one could go out and take nice, consistent photographs of.
And starting in late 1991, that’s just what we did. My archives contain all sorts of negatives and contact prints (yes, this was before digital photography, and, yes, that’s a bolt—intended as an example of simplicity in an artifact):
Sometimes the specimens I’d want could easily be found in my backyard
or in the sky
or on my desk (and even after waiting 400 million years, the trilobite fossil didn’t make it in)
Over the course of a couple of years, I’d end up visiting all sorts of zoos, museums, labs, aquariums and botanical gardens—as well as taking trips to hardware stores and grocery stores—in search of interesting forms to photograph for the book.
Sometimes it would be a bit challenging to capture things in the field (yes, that’s a big leaf I’m holding on the right):
At the zoo, a giraffe took a maddeningly long time to turn around and show me the other side of its patterning (I was very curious how similar they were):
There were efforts to get pictures of “simple forms” (yes, that’s an egg)
with, I now notice, a cameo from me—captured in mid-experiment:
Sometimes the subjects of photographs—with simple or complex forms—were acquired at local grocery stores (did I eat that cookie?):
I cast about far and wide for forms to photograph—including, I now realize, all of rock, paper and scissors, each illustrating something different:
Sometimes we tried to do actual, physical experiments, here with billiard balls (though in this case looking just like a simulation):
and here with splashes:
I was very interested in trying to illustrate reproducible apparently random behavior. I got a several-feet-tall piece of glassware at a surplus store and repeatedly tried dropping dye into water:
I tried looking at smoke rising:
These were all do-it-yourself experiments. But that wasn’t always enough. Here’s a visit to a fluid dynamics lab (yes, with me visible checking out the hydraulic jump):
I’d simulated flow past an obstacle, but here it was “visualized” in real life:
Then there was the section on fracture. Again, I wanted to understand reproducibility. I got a pure silicon wafer from a physicist friend, then broke it:
Under a powerful microscope, all sorts of interesting structure was visible on the fracture surface—that was useful for model building, even if not obviously reproducible:
And, talking of fractures, in March 1994 I managed to slip on some ice and break my ankle. Had I had pictures of fractures in the book, I was thinking of including an x-ray of my broken bones:
There are all sorts of stories about photographs that were taken for the book. In illustrating phyllotaxis (ultimately for Chapter 8), I wanted cabbage and broccoli. They were duly obtained from a grocery store, photographed, then eaten by the photographer (who reported that the immortalized cabbage was particularly tasty):
Another thing I studied in the book was shapes of leaves. Back in 1992 I’d picked up some neighborhood leaves where I was living in California at the time, then done a field trip to a nearby botanical garden. A couple of years later—believing the completion of the book was imminent—I was urgently trying to fill out more entries in a big array of leaf pictures. But I was in the Chicago area, and it was the middle of the winter, with no local leaves to be found. What was I to do? I contacted an employee of ours in Australia. Conveniently it turned out he lived just down the street from the Melbourne botanical gardens. And there he found all sorts of interesting leaves—making my final page a curious mixture of Californian and Australian flora:
As it turned out, by the next spring I hadn’t yet finished the book, and in fact I was still trying to fill in some of what I wanted to say about leaves. I had a model for leaf growth, but I wanted to validate it by seeing how leaves actually grow. That turned out not to be so easy—though I did dissect many leaf buds in the process. (And it was very convenient that this was a plant-related question, because I’m horribly squeamish when it comes to dissecting animals, even for food.)
Some of what I wanted to photograph was out in the world. But some was also collectible. Ever since I was a kid I had been gradually acquiring interesting shells, fossils, rocks and so on, sometimes “out in the field”, but more often at shops. Working on the NKS book I dramatically accelerated that process. Shells were a particular focus, and I soon got to the point where I had specimens of most of the general kinds with “interesting forms”. But there were still plenty of adventures—like finding my very best sample of “cellular-automaton-like” patterning, on a false melon volute shell tucked away at the back of a store in Florida:
In 1998 I was working on the section of the book about biological growth, and wanted to understand the space of shell shapes. I was living in the Chicago area at that time, and spent a lovely afternoon with the curator of molluscs at the Field Museum of Natural History—gradually trying to fill in (with a story for every mollusc!) what became the array on page 416 of the book:
And actually it turned out that my own shell collection (with one exception, later remedied) already contained all the necessary species—and in a drawer in my office I still have the particular shells that were immortalized on that page:
I started to do the same kind of shape analysis for leaves—but never finished it, and it remains an open project even now:
My original conception had been to start the book with “things we see in nature and elsewhere” and then work towards models and ideas of computation. But when I switched to “computation first” I briefly considered going to more “abstracted photographs”, for example by stippling:
But in the end I decided that—just like my images of computational systems—any photographs should be as “direct as possible”. And they wouldn’t be at the beginning of the book, but instead would be concentrated in a specific later chapter (Chapter 8: “Implications for Everyday Systems”). Pictures of things like bolts and scissors became irrelevant, but by then I’d accumulated quite a library of images to choose from:
Many of these images did get used, but there were some nice collections, that never made it into the book because I decided to cut the sections that would discuss them. There were the “things that look similar” arrays:
And there were things like pollen grains or mineral-related forms (and, yes, I personally crystallized that bismuth, which did at least make it into the notes):
There were all sorts of unexpected challenges. I wanted an array of pictures of animals, to illustrate their range of pigmentation patterns. But so many of the pictures we could find (including ones I’d taken myself) we couldn’t use—because I considered the facial expressions of the animals just too distracting.
And then there were stories like the “wild goose chase”. I was sure I’d seen a picture of migrating birds (perhaps geese) in a nested, Sierpiński-like pattern. But try as we might, we couldn’t find any trace of this.
But finally I began to assemble pictures into the arrays we were going to use. In the end, only a tiny fraction of the “nature” pictures we had made it into the book (and, for example, neither the egg nor the phyllotactically scaled pangolin here did)—some because they didn’t seem clear in what they were illustrating, and some because they just didn’t fit in with the final narrative:
Beyond the natural world, the more I explored simple programs and what they can do, the more I wondered why so many of the remarkable things I was discovering hadn’t been discovered before. And as part of that, I was curious what kinds of patterns people had in fact constructed from rules, for art or otherwise. On a few occasions during the time I was working on the book, I managed to visit relevant museums, searching for unexpected patterns made by rules:
But mostly all I could do was scour books on art history (and architecture) looking for relevant pictures (and, yes, it was books at the time—and in fact the web didn’t immediately help even when it became available). Sometimes I would find a clear picture, and we would just ask for permission to reproduce it. But often I was interested in something that was for example off on the side in all the pictures we could find. So that meant we had to get our own pictures, and occasionally that was something of an adventure. Like when we got an employee of ours who happened to be vacationing in Italy to go to part of an obscure church in rural Italy—and get a photograph of a mosaic there from 1226 AD (and, yes, those are our photographer’s feet):
When I started working on the book in 1991 I saw it as an extension of what I’d done in the 1980s to establish a “science of complexity”. So at first I simply called the book The Science of Complexity, adding the explanatory subtitle A Unified Approach to Complex Behavior in Natural and Artificial Systems. But after a while I began to feel that this sounded a bit stodgy—and like a textbook—so to spruce it up a bit I changed it to A New Science of Complexity, with subtitle Rethinking the Mechanisms of Nature:
Pretty soon, though, I dropped the “New” as superfluous, and the title became A Science of Complexity. I always knew computation was a key part of the story, but as I began to understand more about just what was out there in the computational universe, I started thinking I should capture “computation” in the name of the book, leading to a new idea: Computation and the Complexity of Nature. And for this title I even had a first cover draft made—complete with an eye, added on the theory that human visual perception would draw people to the eye, and thus make them notice the book:
But back in 1992 (and I think it would be different today) people really didn’t understand the term “computation”, and it just made the book sound very technical to them. So back I went to A Science of Complexity. I wasn’t very happy with it, though, and I kept on thinking about alternatives. In August 1992 I prepared a little survey:
The results of this survey were—like those of many surveys—inconclusive, and didn’t change my mind about the title. Still, in October 1992 I dashed off an email considering The Inevitable Complexity of Nature and Computation. But 15 minutes later, as I put it, I’d “lost interest” in that, and it was back to A Science of Complexity.
By 1993, believing that the completion of the book was somehow imminent, we’d started trying to mock up the complete look of the book, including things like the back cover, and cover flaps:
The flap copy began: “This book is about a new kind of science that…”. In the first chapter there was then a section called “The Need for a New Kind of Science”:
As 1993 turned into 1994 I was still working with great intensity on the book, leaving almost no time to be out and about, talking about what I was doing. Occasionally, though, I would run into people and they would ask me what I was working on, and I would say it was a book, titled A Science of Complexity. And when I said that—at least among nontechnical people—the reaction was essentially always the same: “Oh, that sounds very complicated”. And that would be the end of the conversation.
By September 1994 this had happened just too many times, and I realized I needed a new title. So I thought to myself “How would I describe the book?”. And there it was, right in the flap copy: “a new kind of science”. I made a quick note on the back of my then business card:
And soon that was the title: A New Kind of Science. I started trying it out. The reaction was again almost always the same. But now it was: “So, what’s new about it?” And that would start a conversation.
I liked the title a lot. It definitely said what by then I thought the book was about. But there was one thing I didn’t like. It seemed a bit like a “meta title”. OK, so you have a new kind of science. But what is that new kind of science called? What is its name? And why isn’t the book called that?
I spent countless hours thinking about this. I thought about word roots. I considered comp (for “computation”), prog (for “program”), auto (for “automata”, etc.). I went through Latin and Greek dictionaries, and considered roots like arch and log (both way too confusing). I wrote programs to generate “synthetic words” that might evoke the right meaning. I considered names like “algonomics”, “gramistry”, “regulistics” (but not “ruliology”!), and “programistics”—for which I tried to see how its usage might work:
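Word generation along these lines can be sketched quite simply. Here's a minimal version in Python (the originals would presumably have been written in Mathematica); the root and suffix lists are illustrative reconstructions from the candidates named above, not the actual lists used at the time:

```python
import itertools

# Roots and endings drawn from candidates mentioned in the text ("algonomics",
# "gramistry", "regulistics", "programistics"); the actual lists used in the
# 1990s are not recorded here.
roots = ["algo", "gram", "regul", "program"]
suffixes = ["nomics", "istry", "istics"]

def synthetic_words(roots, suffixes):
    """Join every root to every suffix to produce candidate names."""
    return [r + s for r, s in itertools.product(roots, suffixes)]

words = synthetic_words(roots, suffixes)
print(words)
```

Even a crude combinatorial generator like this throws up plausible-sounding candidates fast; the hard part, as the text describes, is that none of them quite click.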
But nothing quite clicked. And in a sense my working title already told me why: I was talking about “a new kind of science”, which involved a new way of thinking, for which there were really no words, because it hadn’t been done before.
I’d had a certain amount of experience inventing words, for concepts in both science and technology. Sometimes it had gone well, sometimes not so well. And I knew the same was true in general in history. For every “physics” or “economics” or even “cybernetics” there were countless names that had never made it.
And eventually I decided that even if I could come up with a name, it wasn’t worth the risk. Maybe a name would eventually emerge, and it would be perfectly OK if the “launch book” was called A New Kind of Science (as yet unnamed). That would certainly be much better than giving the new kind of science a definite name, only to have a different name stick.
During the writing of A New Kind of Science, I didn’t really need to “refer in the third person” to what the book was about. But pretty much as soon as the book was published, there needed to be a name for the intellectual endeavor that the book was about. During the development of the book, some of the people working on its project management had started calling the book by the initials of its title: ANKOS. And that was the seed for the name of its content, which almost immediately became “NKS”.
Over the years, I’ve returned quite a few times to the question of naming. And very recently I’ve started using the term “ruliology” for one of the key pursuits of NKS: exploring the details of what systems based on simple computational rules do. I like the name, and I think it captures well the ethos of the specific scientific activity around studying the consequences of simple rules. But it’s not the whole story of “NKS”. A New Kind of Science is, as its name suggests, about a new kind of science—and a new way of thinking about the kind of thing we imagine science can be about.
When the book was first published, some people definitely seemed to feel that the strength and simplicity of the title “A New Kind of Science” must claim too much. But twenty years later, I think it’s clear that the title said it right. And it’s charming now when people talk about what’s in A New Kind of Science, and how it’s different from other things, and want to find a way to say what it is—and end up finding themselves saying it’s “a new kind of science”. And, yes, that’s why I called the book that!
We started thinking about the cover of the book very early in the project—with the “eye” design being the first candidate. But this seemed a bit too surreal, and the next candidate designs were more staid. The title still wasn’t settled, but in the fall of 1992 a few covers were tried:
I thought these covers looked a bit drab, so we brightened them up, and by 1993—and after a few “color explorations”
we had a “working cover” for the book (complete with its working title), carrying over typography from the previous designs, but now featuring an image of rule 30 together with the “mascot of the project”: a textile cone shell with a rule-30-like pigmentation pattern:
When I changed the title in 1994, the change was swiftly executed on the cover—with my draft copy from the time being a charming palimpsest with A New Kind of Science pasted over A Science of Complexity:
I was never particularly happy with this cover, though. I thought it was a bit “static”, particularly with all those boxed-in elements. And compared to other “popular books” in bookstores at the time, it was a very “quiet” cover. My book designer tried to “amp it up”
sometimes still with a hint of mollusc
“Not that loud!”, I said. So he quietened it down, but now with the type getting a bit more dynamic:
Then a bit of a breakthrough: just type and cellular automaton (now rule 110):
It was nice and simple. But now it seemed perhaps too quiet. We punched up the type, just leaving the cellular automaton as a kind of decoration:
And there were a variety of ways to handle the type (maybe even with an emphasized subtitle—complete with a designer’s misspelling):
But the important point was that we’d basically backed into an idea: why not just use the natural angles of the structures in rule 110 to delimit the cellular automaton on the cover? As so often happens, the computational universe had “spontaneously” thrown up a good idea that we hadn’t thought of.
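The structures in question come straight out of simulating rule 110. As a minimal sketch (in Python here, though the book's own images were of course produced with Wolfram Language), an elementary cellular automaton can be evolved like this:

```python
def ca_step(cells, rule):
    """One step of an elementary cellular automaton with wraparound edges.
    `rule` is the Wolfram rule number (0-255); each cell's new value is the
    bit of `rule` indexed by its 3-cell neighborhood read as a binary number."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def evolve(initial, rule, steps):
    """Return the list of rows produced by iterating ca_step."""
    rows = [initial]
    for _ in range(steps):
        rows.append(ca_step(rows[-1], rule))
    return rows

# Rule 110 from a single black cell at the right edge: the pattern grows
# leftward, full of the angled structures discussed above
width = 31
initial = [0] * (width - 1) + [1]
for row in evolve(initial, 110, 15):
    print("".join("█" if c else " " for c in row))
```

Running this makes the characteristic diagonal edges of rule 110 immediately visible, which is the geometric raw material the cover design ended up exploiting.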
I didn’t think the cover was quite “there”, but it was making progress. Right around this time, though, we were in discussions with a big New York publisher about them publishing the book, and they were trying to sell us on the value they could add. They were particularly keen to show us their prowess at cover design. We patiently explained that we had quite a large and good art department, which happened to have even recently won some national awards for design.
But the publisher was sure they could do better. I remember saying: “Go ahead and try”—and then adding, “But please don’t show us something from someone who has no idea what kind of a book this is.”
Several weeks later, with some fanfare, they produced their proposal:
Yup, mollusc shells can be found on beaches. But this wasn’t a “beach-reading novel” kind of book. And it would be an understatement to say we weren’t impressed.
So, OK, it was on us: as I’d expected, we’d have to come up with a cover design. My notes aren’t dated, but sometime around then I started thinking harder about the design myself. I was playing around with rule 30, imagining a “physicalized” version of it (with 3D, letters casting shadows, etc.):
I find in my archives some undated sketches of further “physicalized” cover concepts (or, at least I assume they were cover concepts, and, yes, sadly I’ve never learned to draw, and I can’t even imagine who that dude was supposed to be):
But then we had an idea: maybe the strangely shaped triangle could be like a shaft of light illuminating a cellular automaton image. We talked about the metaphor of the science “providing illumination”. I was very taken with the notion that the basic ideas of the science could have been discovered even in ancient times. And that made us think about cellular automaton markings in a cave, suddenly being illuminated by an archaeologist’s flashlight. But how would we make a picture of something like that?
We tried some “stone effects”:
We investigated finding a stone mason who could carve a cellular automaton pattern into something like a gravestone. (3D printing wasn’t a thing yet.) We even tried some photographic experiments. But with the cellular automaton pattern itself having all sorts of fine detail, one barely even noticed a stone texture. And so we went back to pure computer graphics, but now with a “shaft of light” motif:
It wasn’t quite right, but it was getting closer. Meanwhile, the New York publisher wanted to have another try. Their new, “spiffier” proposal (offering type alternatives for “extra credit”) was:
(The shell, now shrunk, was being kept because their sales team was enamored of the idea of a tie-in whereby they would give physical shells to bookseller sales prospects.)
OK, so how were we going to tune up the cover? The cellular automaton triangle wasn’t yet really looking much like a shaft of light. It was something to do with the edges, we thought:
It was definitely very subtle. We tried different angles and colors:
We tried, and rejected, sans serif, and even partial sans serif:
And by July 1995 the transition was basically complete, and for the first time our draft printouts started looking (at least on the outside) very much like modern NKS books:
Specifying just what color should be printed was pretty subtle, and over the months that followed we continued to tweak, particularly the “shaft of light”
until eventually A New Kind of Science got its final cover:
All along we’d also been thinking about what would show up on the spine of the book—and occasionally testing it in an “identity parade” on a bookshelf. And as soon as we had the “shaft of light” idea, we immediately thought of it wrapping around onto the spine:
Part of what makes the cover work is the specific cellular automaton pattern it uses—which, in characteristic form, I explained in the notes (and, yes, the necessary initial conditions were found by a search, and are now in the Wolfram Data Repository):
How should the NKS book begin? When I write something I always like to start writing at the beginning, and I always like to say “up front” what the main point is. But over the decade that I worked on the NKS book, the “main point” expanded—and I ended up coming back and rewriting the beginning of the book quite a few times.
In the early years, it was pretty much all about complexity—though even in 1991 the term “a new kind of science” already makes an appearance in the text:
In 1993, I considered a more “show, don’t tell” approach that would be based on photographs of simple and complex forms:
But soon the pictures were gone, and I began to concentrate more on how what I was doing fitted into the historical arc of the development of science—though still under a banner of complexity:
After my 1996 hiatus (spent finishing Mathematica 3.0) the text of the opening section hadn’t changed, but the title was now “The Need for a New Kind of Science”:
And I was soon moving further away from complexity, treating it more as “just an important example”:
Then, in 1999, “complexity” drops out of the opening paragraphs entirely, and it becomes all about methodology and the arc of history:
And in fact from there on out the first couple of paragraphs don’t change—though the section title softens, taking out the explicit mention of “revolution”:
It’s interesting to notice that even though it wasn’t until perhaps 1998 that the opening of the book reflected the “move away from complexity”, other things I was writing already had. Here, for example, is a candidate “cover blurb” that I wrote on January 11, 1992 (yes, a decade early):
And as I pull this out of my archives, I notice at the bottom of it:
Hmm. That would have been interesting. But another 400 pages?
By the end of 1991 the basic concept of what would become A New Kind of Science was fairly clear. At the time, I still thought—as I had in the 1980s—that the best “hook” was the objective of “explaining complexity”. But I perfectly well understood that from an intellectual and methodological point of view the most important part of the story was that I was starting to truly take seriously the notion of computation—and starting to think broadly in a fundamentally computational way.
But what could be figured out like this? What about systems based on constraints? What about systems that adapt or learn? What about biological evolution? What about fundamental physics? What about the foundations of mathematics? At the outset, I really didn’t know whether my approach would have anything to say about these things. But I thought I should at least try to check each of them out. And what happened was that every time I turned over a (metaphorical) rock it seemed like I discovered a whole new world underneath.
It was intellectually exciting—and almost addictive. I would get into some new area and think “OK, let me see what I can figure out here, then move on”. But then I would get deeper and deeper into it, and weeks would turn into months, and months would turn into years. At the beginning I would sometimes tell people what I was up to. And they would say “That sounds interesting. But what about X, Y, Z?” And I would think “I might as well try and answer those questions too”. But I soon realized that I shouldn’t be letting myself get distracted: I already had more than enough very central questions to answer.
And so I decided to pretty much “go hermit” until the book was done. An email I sent on October 1, 1992, summarizes how I was thinking at the time:
But that email was right before I discovered yet more kinds of computational systems to explore, and before I’d understood applications to biology, and physics, and mathematics, and so on.
In the early years of the project I’d had various “I could do that as well” ideas. In 1991 I thought about dashing off an Introduction to Computing book (maybe I should do that now!). In 1992 I had a plan for creating an email directory for the world (a very proto LinkedIn). In 1993 I thought about TIX: “The Information Exchange” (a proto web for computable documents).
But thinking even a little about these things basically just showed me how much what I really wanted to do was move forward on the science and the book. I was still energetically remote-CEOing my company. But every day, by mid-evening, I would get down to science, and work on it through much of the night. And pretty much that’s how I spent the better part of a decade.
My personal analytics data of outgoing emails show that during the time I was working on the book I became increasingly nocturnal (I shifted and “stabilized” after the book was finished):
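A chart like this can be derived from message timestamps alone. Here's a minimal sketch in Python, with a handful of hypothetical timestamps standing in for the real email archive:

```python
from collections import Counter
from datetime import datetime

# Hypothetical outgoing-email timestamps -- stand-ins for the real archive
timestamps = [
    "1993-04-12 02:31", "1993-04-12 03:05", "1993-04-13 01:47",
    "1993-04-13 23:58", "1993-04-14 04:12", "1993-04-14 14:30",
]

def hourly_counts(stamps):
    """Count messages per hour of day, revealing nocturnal working patterns."""
    hours = [datetime.strptime(s, "%Y-%m-%d %H:%M").hour for s in stamps]
    return Counter(hours)

# A crude text histogram of activity by hour
counts = hourly_counts(timestamps)
for hour in range(24):
    print(f"{hour:02d}:00  {'#' * counts.get(hour, 0)}")
```

Binning by hour of day is the simplest possible version of this kind of personal analytics; the actual plots span years and show the day/night shift over time.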
I had started the NKS book right after the big push to release Mathematica 2.0. And thinking the book would take a year or maybe 18 months I figured it would be long finished before there was a new version of Mathematica, and another big push was needed. But it was not to be. And while I held off as long as I could, by 1996 there was no choice: I had to jump into finishing Mathematica 3.0.
From the beginning until now I’ve always been the ultimate architect of what’s now the Wolfram Language. And back in the 1990s my way of defining the specification for the language was to write its documentation, as a book. So getting Mathematica 3.0 out required me writing a new edition of The Mathematica Book. And since we were adding a lot in Version 3, the book was long—eventually clocking in at 1403 pages. And it took me a good part of 1996 to write it.
But in September 1996, Mathematica 3.0 was released, and I was able to go back to my intense focus on science and the NKS book. In many ways it was exhilarating. With Wolfram Language as a tool, I was powering through so much research. But it was difficult stuff. And getting everything right—and as clear as possible—was painstaking, if ultimately deeply satisfying, work. On a good day I might manage to write one page of the book. Other times I might spend many days working out what would end up as just a single paragraph in the notes at the back of the book.
I kept on thinking “OK, in just a few months it’ll be finished”. But I just kept on discovering more and more. And finding out again and again that sections in the table of contents that I thought would just be “quick notes” actually led to major research projects with all sorts of important and unexpected results.
A 1995 picture captured my typical working setup:
A year or so later, I had the desk I’m still sitting at today (though not in the same location), and a (rarely used) webcam had appeared:
A few years after that, the computer monitor was thinner, two young helpers had arrived, and I was looking distinctly unkempt and hermit-like:
In 2000 a photographer for Forbes captured my “caged scientist” look
along with a rather nice artistically lit “still life” of my working environment (complete with a “from-the-future”, thicker-than-real-life mockup of the NKS book):
But gradually, inexorably, the book got closer and closer to being finished. The floor of my office had been covered with piles of paper, each marked with whatever issue or unfinished section they related to. But by 2001 the piles were disappearing—and by the fall of that year they were all but gone: a visible sign that the book was nearing completion.
A New Kind of Science is—as its title suggests—a book about new things. But an important part of explaining new things is to provide context for them. And for me a key part of the context for things is always the story of what led to them. And that was something I wanted to capture in the NKS book.
Typically there were two parts: a personal narrative of how I was led to something—and a historical narrative of what in the past might connect to it. The academic writing style that I’d adopted in the 1980s really didn’t capture either of these. So for the NKS book I needed a new style. And there were again two parts to this. First, I needed to “put myself into the text”, describing in the first person how I’d reached conclusions, and what their importance to me was. And second, I needed to “tell the story” of whatever historical developments were relevant.
Early on, I made the decision not to mix these kinds of narratives. I would talk about my own relation to the material. And I would talk about other people and their historical relation to the material. But I didn’t talk about my interactions with other people. And, yes, there are lots of wonderful stories to tell—which perhaps one day I’ll have a chance to systematically write down. But for the NKS book I decided that these stories—while potentially fun to read—just weren’t relevant to the absorption and contextualization of what I had to say. So, with a bit of regret, I left them out.
In typical academic papers one references other work by inserting pure, uncommented citations to it. And deep within some welldeveloped field, this is potentially an adequate thing to do. Because in such a field, the structure is in a sense already laid out, so a pure citation is enough to explain the connection. But for the NKS book it was quite different. Because most of the time the historical antecedents were necessarily done in quite different conceptual frameworks—and typically the only reasonable way to see the connection to them was to tell the story of what was done and why, recontextualized in an “NKS way”.
And what this meant was that in writing the NKS book, I ended up doing a huge amount of “scholarship”, tracking down history, and trying to piece together the stories of what happened and why. Sometimes I personally knew—or had known—the people involved. Sometimes I was dealing with things that had happened centuries ago. Often there were mysteries involved. How did this person come to be thinking about this? Why didn’t they figure this-or-that out? What really was their conceptual framework?
I’ve always been a person who tries to “do my homework” in any field I’m studying. I want to know both what’s known, and what’s not known. I want to get a sense of the patterns of thinking in the field, and the “value systems” of the field. Many times in working on the NKS book I got the sense that this-or-that field should be relevant. But what was important for the NKS book was often something that was a footnote—or was even implicitly ignored—by the field. And it also didn’t help that the names for things in particular fields were often informed by their specific uses there, and didn’t connect with what was natural for the NKS book.
I started the NKS book shortly after the web was invented, and well before there was substantial content on it. So at least at first a lot of my research had to be done the same way I’d done it in the 1980s: from printed books and papers, and by using online and printed abstracting systems. Here’s part of a “search” from 1991 for papers with the keyword “automata”:
By the end of writing the NKS book I’d accumulated nearly 5000 books, a few of them pictured here in their then-habitat circa 1999 (complete with me at my I’ve-been-on-this-project-too-long, lifetime-maximum weight):
I had an online catalog of all my books, which I put online soon after the NKS book was published. I also had file cabinets filled with more than 7000 papers. Perhaps it might have been nice when the NKS book was published to be able to say in a kind of traditional academic style “here are the ‘citations’” (and, finally, 20 years later we’re about to be able to actually do that). But at the time it wasn’t the simple citations I wanted, or thought would be useful; it was the narrative I could piece together from them.
And sometimes the papers weren’t enough, and I had to make requests from document archives, or actually interview people. It was hard work, with a steady stream of surprises. For example, in Stan Ulam’s archives we found a (somewhat scurrilous) behind-the-scenes interaction about me. And after many hours of discussion John Conway admitted to me that his usual story about the origin of the Game of Life wasn’t correct—though I at least found the true story much more interesting (even if some mystery still remains). There were times when the things I wanted to know were still entangled in government or other secrecy. And there were times when people had just outright forgotten, often because the things I now cared about just hadn’t seemed important before—and now could only be recovered by painstakingly “triangulating” from other recollections and documents.
There were so many corners to the scholarship involved in creating the NKS book. One memorable example was what we called the “People Dates” project. I wanted the index to include not only the name of every person I mentioned in the book, but also their dates, and the primary country or countries in which they worked, as in “Wolfram, Stephen (England/USA, 1959– ).”
For some people that information was straightforward enough to find. But for other people there were challenges. There were 484 people altogether in the index, with a roughly exponentially increasing number born after about 1800:
For people who were still alive, we just sent them email, usually getting helpful (if sometimes witty) responses. In other cases we had to search government records, ask institutions, or find relatives or other personal contacts. There were lots of weird issues about transliterations, historical country designations, and definitions of “worked in”. But in the end we basically got everything (though for example Moses Schönfinkel’s date of death remained a mystery, as it does even now, after all my recent research).
Most of the historical research I did for the NKS book wound up in notes at the back of the book. But of all the 1350 notes spread over 348 small-print pages, only 102 were in the end historical. The other notes covered a remarkable range of subject matter. They provided background information, technical details and additional results. And in many ways the notes represent the highest density of information in the NKS book—and I, for example, constantly find myself referring to them, and to their pithy (and, I think, rather clear) summaries of all sorts of things.
When I was working on the book there were often things I thought I’d better figure out, just in case they were relevant to the core narrative of the book. Sometimes they’d be difficult things, and they’d take me—and my computers—days or even weeks. But quite often what came out just didn’t fit into the core narrative of the book, or its main text. And so the results were relegated to notes. Maybe there’ll just be one sentence in the notes making some statement. But behind that statement was a lot of work.
Many times I would have liked to have had “notes to the notes”. But I restrained myself from adding yet more to the project—even though today I’ve sometimes found myself writing hundreds of pages to expand on what in the NKS book is just a note, or even part of a note.
The 1990s spanned the time from the very beginning of the web to the point where the web had a few million pages of content. And by the later years of the project I was making use of the web whenever I could. But often the background facts I needed for the notes were so obscure that there was nothing coherent about them on the web—and in fact even today it’s common for the notes to the NKS book to be the best summaries to be found anywhere.
I figured, though, that the existence of the web could at least “get me off the hook” on some work I might otherwise have had to do. For example, I didn’t think there was any point in giving explicit citations to documents. I made sure to include relevant names of people and topics. Then it seemed as if it’d be much better just to search for those on the web, and find all relevant documents, than for me to do all sorts of additional scholarship trying to pick out particular citations that then someone might have to go to a library to look up.
I’m not sure when I could say that the finishing of the NKS book finally seemed in sight. We’d been making bound book mockups since early 1994. Looking through them now it’s interesting to see how different parts gradually came together. In July 1995, for example, there was already a section in Chapter 9 on “The Nature of Space”, but it was followed by a section on the “Nature of Time” that was just a few rough notes. There’s a hiatus in mockups in 1996 (when I was working on Mathematica 3.0), but when the mockups pick up again in January 1997—now bound in three volumes—there’s a section on “The Nature of Time” containing an early (and probably not very good) idea based on multiway systems that I’d long since forgotten (later the “Nature of Time” section would be broken up into several sections):
Already in 1997 there’s a very rough skeleton of Chapter 12—with a fairly accurate collection of section headings, but just 18 pages of rather rough notes as content. Meanwhile, there’s a post-Chapter-12 “Epilog” that sprouts up, to be dropped only late in the project (see below). Chapter 12 begins to “bulk up” in late 1999, and in 2000 really “takes off”, for example adding the long section on “Implications for the Foundations of Mathematics”. At that point our rate of making book mockups began to pick up. We’d been indicating different mockups with dates and colored labeling (“the banana version”, etc.). But, finally, dated February 14, 2001, there’s a version labeled (in imitation of software release nomenclature) “Alpha 1”.
And by then I was starting to make serious use of the machinery for doing large projects that we’d developed for so many years at Wolfram Research. The “NKS Project” started having project managers, build systems and internal websites (yes, with garish web colors of the time):
We’d had the source for the book in a source control system for several years, but as far as I was concerned the ultimate source for the book was my filesystem, and a specific set of directories that, yes, are still there in my filesystem all these years later:
Everything was laid out by chapter and section. Text contained the FrameMaker files. Notebooks contained the source notebooks for all the diagrams (with long-to-compute results prestored in Results):
The workflow was that every diagram was created in Wolfram Language, then saved as an EPS file. (EPS or “Encapsulated PostScript” was a forerunner of PDF.) And gradually, over the course of years, more and more EPS files were generated, here reconstructed in the order of their generation, starting around 1994.
In creating all these EPS files, there was lots of detailed tweaking done, for example in the exact (programmatically specified) sizes for the images given in the files. We’d built up a whole diagram-generating system, with all sorts of detailed standards for sizings and spacings and so on. And several times—particularly as a result of discovering quirks in the printing process—we decided we had to change the standards we were using. This could have been a project-derailing disaster. But because we had everything programmatically set up in notebooks it was actually quite straightforward to just go through and automatically regenerate the thousand or so images in the book.
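A regenerate-everything pass over a directory layout like the one described (per-section Notebooks directories feeding sibling Graphics directories) can be sketched as a batch script. This is a Python sketch under those assumptions; the `render` callable is a placeholder, since the real step was a Wolfram Language notebook-to-EPS export:

```python
import pathlib, tempfile

def regenerate_all(root, render):
    """Walk each section's Notebooks directory and re-render every figure
    source into the sibling Graphics directory. `render` stands in for the
    real Wolfram Language notebook-to-EPS export step."""
    root = pathlib.Path(root)
    regenerated = []
    for nb in sorted(root.glob("*/Notebooks/*.nb")):
        out = nb.parent.parent / "Graphics" / (nb.stem + ".eps")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(render(nb))  # overwrite with freshly generated EPS
        regenerated.append(out)
    return regenerated

# Tiny demonstration on a scratch directory with one hypothetical notebook
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "Chapter3" / "Notebooks").mkdir(parents=True)
(tmp / "Chapter3" / "Notebooks" / "fig1.nb").write_text("notebook source")
outs = regenerate_all(tmp, lambda nb: "%!PS (placeholder EPS)")
print([p.name for p in outs])
```

The point of the design is exactly what the text describes: because every figure is generated programmatically from its notebook source, a standards change means re-running one loop rather than re-drawing a thousand images.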
Each EPS file that was generated was put in a Graphics directory, then imported (“by reference”) by FrameMaker into the appropriate page of the book. And the result was something that looked almost like the final NKS book. But there were two “little” wrinkles that ended up leading to quite a bit of technical complexity.
The first had to do with the fragments of Wolfram Language code in the notes. At the time it was typical to show code in a simple monospaced font like Courier. But I thought this looked ugly—and threw away much of the effort I’d put into making the code as elegant and readable as possible. So I decided we needed a different code font, and in particular a proportionally spaced sans serif one. But there was a technical problem with this. Many of the characters we needed for the code were available in any reasonable font. But some characters were special to the Wolfram Language—or at least were characters that we’d, for example, been responsible for getting included in the Unicode standard, and that weren’t yet widely supported in fonts.
And the result was that in addition to all the other complexities of producing the book we had to design our own font, just for the book:
But that wasn’t all. In Mathematica 3.0 we had invented an elaborate typesetting system which carefully formatted Wolfram Language code, breaking it into multiple lines if necessary. But how were we to weave that nicely formatted code into the layouts of pages in FrameMaker? In the end we had to use Wolfram Language to do this. The way this worked is that first we exported the whole book from FrameMaker in “Maker Interchange Format” (MIF). Then we parsed the resulting MIF file in Wolfram Language, in effect turning the whole book into a big symbolic expression. At that point we could use whatever Wolfram Language functionality we wanted, doing various pattern-matching-based transformations and typesetting each of the pieces of code. (We also handled various aspects of the index at this stage.) Then we took the symbolic expression, converted it to MIF, and imported it back into FrameMaker.
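The parse-transform-serialize shape of that round trip can be illustrated schematically. This Python sketch uses an invented `<Code>...</Code>` tag purely for illustration; real MIF markup is considerably more involved, and the real pipeline worked on full symbolic expressions rather than regex matches:

```python
import re

def transform_code_spans(doc, typeset):
    """Find each <Code>...</Code> span in a tagged document string, run it
    through a typesetting function, and splice the result back in. This is
    only schematic -- real MIF markup is considerably more involved."""
    return re.sub(
        r"<Code>(.*?)</Code>",
        lambda m: "<Code>" + typeset(m.group(1)) + "</Code>",
        doc,
    )

doc = "Intro text <Code>f[x_]:=x^2</Code> more text"
print(transform_code_spans(doc, lambda s: s.replace(":=", " := ")))
```

The essential idea is the same: round-trip the whole document through a representation your own tools can manipulate, transform just the code fragments, and write everything back untouched otherwise.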
In the end the production of the book was handled by an automated build script—just like the ones we used to build Mathematica (the full build log is 11 pages long):
✕

But, OK, so by early 2001 we were well on the way to setting all these technical systems up. But there was more to do in “producing the book”—as indicated for example by the various column headings on the internal project management website. “Graphics regenerated” was about regenerating all the EPS files with the final standards for the book. “Microtweaking” was about making sure the placement of all the graphics was just right. Then there were various kinds of what in our company we call “document quality assurance”, or DQA—checking every detail of the document, from grammar and spelling to overall consistency and formatting. (And, yes, developing a style guide that worked with my sometimes-nonstandard—but I believe highly sensible!—writing conventions.)
In addition to checking the form of the book, there was also the question of checking the content. Much of that—including extensive fact checking, etc.—had gone on throughout the development of the book. But near the end one more piece of checking had to do with the code that was included in the book itself. Our company has had a long history of sophisticated software quality assurance (“SQA”), and I applied that to the book—for example having extensive tests written for all the code in the book.
Much like for software, once we reached the first “Alpha version” of the book we also started sending it out to external “alpha testers”—and got a modest but helpful collection of responses. We had several pages of instructions for our “testers” (that we called “readers” since, after all, this was a book):
✕

After the “Alpha 1” version of the book in February 2001, there followed six more “Alpha” versions. In “Alpha 1” there were still XXXX’s scattered around the text, alignment and other issues in graphics—and some of the more “philosophical” sections in the book were just in note form, crossed out with big X’s in the printout. But in the course of 2001 all these issues got ironed out. And on January 15, 2002, I finished and dated the preface.
Then on February 4, 2002, we produced the “Beta 1” version of the book—and began to make final preparations for its printing and publication. It had been a long road, illustrated by the sequence of intermediate versions we’d generated, but we were nearing the end:
✕

I like indices, and the index to the NKS book—with its 14,967 entries—is my all-time favorite. In these times of ubiquitous full-text search one might think that a book index would just be a quaint relic of the past (and indeed some younger people don’t even seem to know that most books have indices!). But it definitely isn’t with the NKS book. And indeed when I want to find something in the book, the place I always turn first is the index (now online).
I started creating the index to the NKS book in the spring of 1999, and finished it right before the final version of the book was produced in February 2002. I had already had the experience of creating indices to five editions of The Mathematica Book, and had seen the importance of those indices in people’s actual use of Mathematica. I had developed various theories about how to make a good index—which sometimes differed from conventional wisdom—but seemed to work rather well.
A good index, I believe, should list whatever terms one might actually think of looking up, regardless of whether it’s those literal terms—or just synonyms for them—that appear in the text. If there’s a phrase (like “finite automata”), explicitly list it in all the ways people might think of it (“finite automata”, “automata, finite”), rather than having some “theory” (that the users of the index are very unlikely to know) about how to list the phrase. And perhaps most important, generously include subterms, “subdividing” until each individual entry references at most a few pages. Because when you’re looking for something, you want to be able to zero in on a particular page, not be confronted with lots of “potentially relevant” pages. And well-chosen subterms immediately give a kind of pointillistic map of the coverage of some area.
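The “list every form people might look up” idea amounts to rotating the words of a phrase. Here’s a hypothetical helper in Python, a sketch of the idea rather than the book’s actual (Wolfram Language) tooling:

```python
# Hypothetical sketch: list a multi-word index phrase under each form a
# reader might look it up ("finite automata" and "automata, finite").
def index_forms(phrase):
    words = phrase.split()
    forms = [phrase]
    for i in range(1, len(words)):
        # rotate: trailing words first, a comma, then the leading words
        forms.append(" ".join(words[i:]) + ", " + " ".join(words[:i]))
    return forms

forms = index_forms("finite automata")  # both ways to look it up
```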
I’ve always enjoyed creating indices. For me it’s an interesting exercise in quickly organizing knowledge and identifying what’s important, as well as engaging in rapid “what are different ways to say that?” association. (And, yes, a similar skill is needed in linguistic curation for the natural language understanding system of Wolfram|Alpha.) For the NKS book (and other indices) my basic strategy was to go through the book page by page, adding tags for index entries. But what about consistency? Did I just index “Fig leaves” in one place, and somewhere else index “Leaves, fig” instead? We built Wolfram Language code to identify such issues. But eventually I just generated the alphabetical index, and read through it. And then had Wolfram Language code that could realign tags to correct the source of whatever fixes I made—which most often related to subterms.
At first I broke the index into an ordinary “Index” and an “Index of Names”. But what counted as a “name”? Only a person’s name? Or also a place name? Or also “rule 30”? Within a couple of months I had combined everything into an “Index of words, names, concepts and systems”—which soon became headed just “Index” (with a pointer to a note about what was in it).
The final index is remarkably eclectic—reflecting of course the content of the book. After “Field theory (physics)” comes “Fields (agricultural)”, followed by “Fifths (musical chords)” and so on:
✕

In the end the index—even printed as it was in 4 columns—ran to 80 pages (or more than 6% of the book). It was obviously a very useful index, and it could even be entertaining to read, not only for its eclectic jumps from one term to the next, but also for the unexpected terms that appeared. What’s “Flash photography” or “Flint arrowheads” doing there, or “Frogs” for that matter? What do these terms have to do with a new kind of science?
But for all its value, I was a bit concerned that the index might be so long that it finally made the book “too long”. Even without the index the book ran to 1197 pages. But why tell people, I thought, that the whole book is really 1280 pages, including the index? If the pages of the index were numbered, then one could immediately see the number of that last page. But why number the pages of an index? Nobody needs to refer to those pages by numbers; if anything, they’ll just use the alphabetized terms. So I decided just quietly to omit the page numbers of the index, so we could report the length of the book as 1197 pages.
OK, so A New Kind of Science was going to be a book. But how was it going to be published? At the time I started writing A New Kind of Science in 1991 the second edition of The Mathematica Book had just been released, and its publisher (Addison-Wesley) seemed to be doing a good job with it. So it was natural to start talking about my new book with the same publisher. I was quite aware that Addison-Wesley was primarily a publisher of textbook-like books, and in fact the particular division of Addison-Wesley that had published The Mathematica Book was more oriented towards monographs and special projects. But the success of The Mathematica Book generated what seemed like good corporate interest in trying to publish my new book.
But how would the details work? There were immediate questions even about printing the book. I knew the book would rely heavily on graphics which would need to be printed well. But to print them how they needed to be printed was expensive. So how would that work financially? (And at that point I didn’t yet even know that the book would also be more than a thousand pages long.)
The basic business model of publishing tends to be: invest up front in making a book, then (hopefully) make money by selling the book. And for most authors, the book can’t happen without that up-front investment. But that wasn’t my situation. I didn’t need an advance to support myself while writing the book. I didn’t need someone to pay for the production of the book. And if necessary I could even make the investment myself to print the books. But what I thought I needed from a publisher was access to distribution channels. I needed someone to actually sell books to bookstores. I needed there to be a sales team that had relationships with bookstore chains, and that would do things like actually visit bookstores and get books into them.
And in fact quite a lot of the early discussion about the publishing of the book centered around how salespeople would present it. How would the book be positioned relative to the well-known “popular science” books of the time? (That positioning would be key to the size of initial purchases bookstores might make.) What special ways might the salespeople make the book memorable? Could we get enough textile cone shells that the salespeople could drop one off at every bookstore they visited? (The answer, it was determined, was yes: in the Philippines such shells were quite plentiful.)
But how exactly would the numbers work? Bookstores took a huge cut (often above 50%). And if the book was expensive to print, that didn’t leave much of a margin. At least at the time, the publishing industry was very much based on formulas. If you spend $x to print a book, you need to spend $y on marketing, and you pay the author $y (yes, same y) as an advance on royalties. For the author, the advance serves as a kind of guarantee of the publisher’s effort—since unless the book sells, the publisher just loses that money.
Well, I most definitely wanted a guarantee that the publisher would put effort in. But I didn’t need or want an advance; I just wanted the publisher to put as much as possible into distribution. Around and around it went, trying to see how that might work. Exasperated, I found an expert on book deals. They didn’t seem to be able to figure it out either. And I began to think: perhaps I should go to a different publisher, maybe one more familiar with widely distributed books.
It’s typical for authors not to interact directly with such publishers, but instead to go through an agent. In principle that allows authors not to have to exercise business savvy, and publishers not to be exposed to the foibles of authors. But I just wanted to make what—at least by tech industry standards—was a very simple deal. One agent I’d known for a while insisted that the key was to maximize the advance: “If the book earns out its advance [i.e. brings in more royalties from actual sales than were paid out up front], I haven’t done my job.” But that wasn’t my way of doing business. I wanted both sides in any deal to do well.
Then there was the question of which publisher would be the right one. “Sell to the highest bidder”, was the typical advice. But what I cared about was successful book distribution, not how much a publisher might (perhaps foolishly) spend to get the book. Particularly at the time, it was a very clubby but strangely dysfunctional industry, full of belief in a kind of magic touch, but also full of stories of confusion and failure. Still, I thought that access to distribution channels was important enough to be worth navigating this.
And by 1993 quite a bit of time had been spent on discussions about publishing the book. A particular, prominent New York publisher had been identified, and the process of negotiating a contract with them was underway. From a tech industry point of view it all seemed quite Victorian. It started from a printed (as in, on a printing press) 70-page contract that seemed to date from 20 years earlier. Though after not very long, essentially every single clause had been crossed out, and replaced by something different.
An effort to “show what value they could bring” led to the incident about cover designs mentioned above. And then there was the story about printing, and printing costs. The terms of our potential deal made it quite important to know just how much it would cost to print the book. So to get a sense of that we got quotes from some of our usual printing vendors (and, yes, in those days before the web, a software company like ours did lots of printing). The publisher insisted that our quotes were too high—and that they could print the book much more cheaply. My team was skeptical. But at the center of this discussion was an important technical issue about how the book would actually be printed.
Most widely distributed (“trade”) books are printed on so-called web presses—which are giant industrial machines that take paper from a roll and move it through at perhaps 30 mph. (The term “web” here refers to the “web of paper” on its path through the machine, not the subsequently invented World Wide Web.) A web press is a good way to print a just-read-the-words kind of book. But it doesn’t give one much control for pictures; if everything’s running through at high speed one can’t, for example, carefully inject more ink to deal with a big area of black on a specific page.
And so if one wanted to print a more “art-quality” book one had to use a different approach: a sheet-fed press in which each collection of pages is “manually” set up to be printed separately on a large sheet of paper. Sheet-fed presses give one much more control—but they’re more expensive to operate. The printing quotes we’d got were for sheet-fed presses, because that was the only way we could see printing the book at the quality level we wanted. (I was sufficiently curious about the whole process that I went to watch a print run for something we were printing. In interacting with our potential publisher, I was rather disappointed to discover that none of the editorial team appeared to have ever actually seen anything being printed.)
But in any case the publisher was claiming that they knew better than us, and that they could get the quality we needed on a web press, at a much lower price. They offered to run a test to prove it. We were again skeptical: to do the setup for a web press is an expensive process, and it makes no sense to do it for anything other than a real print run of thousands of books. But the publisher insisted they could do it. And our only admonition was “Don’t show us a result claiming it was made on a web press when it wasn’t!”.
A few weeks went by. Back came the test. “You can’t be serious”, we said. “That’s a sheet from a sheet-fed press; we can see the characteristic registration marks!” I never quite figured out if they thought they could pull the wool over our eyes, or if this was just pure cluelessness. But for me it was basically the last straw. They came back and said “Why don’t we just refactor the contract and give you a really big advance?” “Nope”, I said, “you’re profoundly missing the point! We’re done.” And that’s how—in 1995—we came to make the decision to publish A New Kind of Science “ourselves”.
But when I say “ourselves” there was quite a bit more to that story. Back at the beginning of 1995 we were thinking about the upcoming third edition of The Mathematica Book, and realizing that we needed to rejigger its publishing arrangements. And while the machinations with publishers about the NKS book had been a huge waste of time, they had helped me understand more about the publishing industry—and made me decide it was time for us to create our own publishing “imprint”, Wolfram Media.
Its website from 1996 (I never liked that logo!) highlights our first title—the co-published third edition of The Mathematica Book:
✕

This was soon joined by other titles, like our heavily illustrated Graphica books. But it wasn’t until 1999 that I began to think more seriously about the final publishing of the NKS book. In the fall of 1999 we duly listed the book with the large bookstore chains and book distributors, as well as with the already-very-successful Amazon. And in late 2000 we started touting the book on our now-more-attractive website as “A major release coming soon…”:
✕

Particularly in those days, the typical view was that most of the sales of a book would happen in the first few weeks after it was published. But—as we’ll discuss later—printing a book (and especially one like the NKS book) takes many weeks. So that creates a tricky situation, in which a publisher has to make a high-stakes decision about how many books to print at the beginning. Print too few books and, at least for a time, you won’t be able to fill orders, and you’ll lose out on the initial sales peak. Print too many books and you’ll be left with an inventory of unsold books—though the more books you print in a single run, the more widely you’ll spread the initial setup cost, and the lower the cost of each individual book will be.
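The print-run tradeoff just described reduces to simple amortization arithmetic. All the dollar figures below are invented purely for illustration:

```python
# Amortizing a fixed setup cost over a print run: cost per book falls
# as the run grows. Setup and per-copy figures are invented.
def unit_cost(n_copies, setup=50_000.0, per_copy=6.0):
    return setup / n_copies + per_copy

small_run = unit_cost(10_000)  # 5.00 setup share + 6.00 = 11.00 per book
large_run = unit_cost(50_000)  # 1.00 setup share + 6.00 =  7.00 per book
```

The setup share shrinks hyperbolically with the run size, which is why the initial print-run decision is so consequential: unsold inventory is the price of a lower unit cost.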
Bookstores were also an important part of the picture. Books were at the time still predominantly bought through people physically browsing at bookstores. So the more copies of a book a bookstore had, the more likely it was that someone would see it there, and buy it. And all this added up to a big focus of publishing being on the size of the initial orders that bookstores made.
How was that determined? Mostly it was up to the buyers at bookstores and bookstore chains: they had to understand enough about a book to make an accurate prediction of how many they’d be able to sell. There was a complicated dance through which publishers signaled their expectations, saying for example “X-copy initial print run”, “X-city promotional tour”, “$X promotional budget”. But in the end it was a very person-to-person sales process, often done by traveling-around-the-country salespeople who’d developed relationships with book buyers over the course of many years.
How were we going to handle this? It certainly helped that by late 2000 there were starting to be lengthy news articles anticipating the book. And it also helped that one could see that the book was gaining momentum on Amazon. But would the sales manager we had, who was used to selling software, be able to sell books? At least in this case the answer was yes, and by the end of 2001 there were starting to be substantial orders from bookstores.
By the time I finished writing the book at the beginning of 2002 we were in full “book-publishing” mode. There were still lots of issues to resolve. How would we handle distribution outside the US? (We’d actually had a UK co-publisher lined up but we eventually gave up on them.) How would we reach the full range of independent bookstores? And so on. Looking at my archives I find mail from April 2002 in which I was contacting Jeff Bezos about a practical issue with Amazon; Jeff responded that he “couldn’t wait to read [the book]”, noting that “For a serious book like yours, we often account for a substantial fraction of sales.” He was right—and in fact the NKS book would reach the #1 bestseller slot on Amazon.
By the beginning of 2002 we’d had a design for the front cover of the NKS book for six years. But what about the back cover? It’s traditional to put quotes (“blurbs”) on the backs of books that people will browse in bookstores. So, in February 2002 we sent a few draft copies of the book to people we thought might give us interesting quotes. Probably the most charming response was Arthur C. Clarke’s report of the delivery of the book to his house in Sri Lanka:
✕

A few days later, he emailed again “Well, I have <looked> at (almost) every page and am still in a state of shock. Even with computers, I don’t see how you could have done it”, offering the quote “Stephen’s magnum opus may be the book of the decade, if not the century”, then adding “Even those who skip the 1200 pages of (extremely lucid) text will find the computergenerated illustrations fascinating. My friend HAL is very sorry he hadn’t thought of them first…”
Other quotes came in too. At his request, I’d sent Steve Jobs a copy of the book—and I asked if he’d like to provide a quote. He responded that he thought I really shouldn’t have quotes on the back of the book. “Isaac Newton didn’t have quotes; nor should you.” And, yes, Steve had a point. I was trying to write a book that would have long-term value; it didn’t really make sense to have moment-of-publication quotes printed on it.
So—feeling bad for having solicited quotes in the first place—we dropped them from the back cover, instead just putting images from the book that we thought would intrigue people:
✕

Still, my team did use Arthur C. Clarke’s quote on the publishing-industry-obligatory ad we ran in Publishers Weekly on April 15 as part of a final sprint to increase up-front orders from bookstores:
✕

At least the way the book trade was in those days, there was a whole arcane dance to be done in publishing a book—with carefully orchestrated timing of book reviews, marketing initiatives at bookstores, and so on. My archives contain a whole variety of pieces related to that (many of which I don’t think I saw at the time). One of the more curious (whose purpose I don’t now know) involves a perhaps-not-naturally-colored lizard that could be viewed as having escaped from page 426 of the book:
✕

From the very beginning I was committed to doing the best we could in actually printing the book. My discoveries about rule 30 and its complexity had originally crystallized back in 1984 when I’d first been able to produce a high-resolution image of its behavior on a laser printer. Book printing allowed still vastly higher resolution, and I wanted to make use of that to make the NKS book serve if nothing else as a “printed testament” to the idea that complexity can be generated from simple computational rules.
Here’s what a printout of rule 30 made on a laser printer looks like under a microscope (this printout is from 1999, but it basically looks the same from a typical black-and-white laser printer today):
✕

And here’s what the highestresolution picture of rule 30 from the printed NKS book looks like (and, yes, coincidentally that picture occurs on page 30 of the book):
✕

You can see the grain of the paper, but you can also see crisp boundaries around each cell. To give a sense of scale, here’s a word from the text of the book, shown at the same magnification:
✕

To achieve the kind of crispness we see in the rule 30 picture (while, for example, keeping the book of manageable size and weight) was quite an adventure in printing technology. But the difficulties with pure black and white (as in this picture of rule 30) paled in comparison to those involved with gray scales.
The fundamental technology of printing is quite binary: there’s either ink at a particular place on a page, or there isn’t. But there’s a standard method for achieving the appearance of gray, which is to use halftoning, based essentially on an array of dots of different sizes. Here’s an example of that from the photograph of a tiger on page 426 of the NKS book:
✕

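Halftoning of this kind can be sketched as ordered dithering against a clustered-dot threshold matrix: a gray level becomes a pattern of inked and blank positions whose inked fraction approximates the gray. Real screens use much larger matrices set at an angle (as discussed further below); the 2×2 matrix here is a minimal illustration, not the book’s actual screening.

```python
# Ordered-dither halftone with a tiny clustered-dot threshold matrix:
# a uniform gray level in [0, 1] becomes a binary bitmap whose inked
# fraction approximates the gray. Real screens use larger matrices at
# a 45-degree angle; this 2x2 matrix is just for illustration.
THRESH = [[0.2, 0.6],
          [0.8, 0.4]]

def halftone(gray, size=4):
    # size x size bitmap for a uniform gray patch (1 = ink)
    return [[1 if gray > THRESH[y % 2][x % 2] else 0
             for x in range(size)] for y in range(size)]

patch = halftone(0.5)                    # 50% gray
coverage = sum(map(sum, patch)) / 16.0   # fraction of pixels inked
```

Darker grays exceed more of the thresholds, so more positions in each repeating cell get inked, which is exactly the “dots of different sizes” effect visible in the tiger photograph.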
But one feature of photographs is that they mostly involve smooth gradations of gray. In the NKS book, however, there are lots of cases where there are tiny cells with different gray levels right next to each other.
Here’s one example (from page 157—which we’ll encounter again later):
✕

Here’s another example with slightly smaller cells (page 640):
✕

Here’s a nice example based on a 3D graphic (page 180):
✕

And here’s one where the gray cells are so small that the halftoning gets mixed up with the actual boundaries of cells (page 67):
✕

But in general to achieve well-delineated patches of gray there have to be a decent number of halftone dots inside each patch. And this is one place where we were pushing the boundaries of printing technology for the NKS book. Here’s an image from a 1995 print test (and, yes, we were testing printing as early as 1992):
✕

This is a more straightforward case, because we’re dealing with exactly 50% gray. But look at the difference for the same picture in the final NKS book:
✕

We slightly changed our standard for how big the mobile-automaton-active-cell dots should be. But the main thing to notice is that the halftone checkerboard in each gray cell is roughly twice as fine in the final version. In printing terminology, the 1995 test used a standard “100-line screen”; the final NKS book used a “175-line screen” (i.e. basically 175 dots per inch).
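The effect of the screen ruling on small cells is simple arithmetic: a 175-line screen places about 175 halftone dots per inch, so a patch w inches across holds roughly (w × screen)² dots. The 0.04-inch cell size below is just an illustrative figure, not a measurement from the book:

```python
# A "175-line screen" lays down about 175 halftone dots per inch, so a
# gray patch w inches across holds roughly (w * screen)**2 dots. The
# 0.04-inch cell size is an illustrative figure only.
def dots_in_patch(width_in, screen_lpi):
    return (width_in * screen_lpi) ** 2

d100 = dots_in_patch(0.04, 100)  # ~16 dots per cell in the 1995 test
d175 = dots_in_patch(0.04, 175)  # ~49 dots per cell in the final book
```

Going from a 100-line to a 175-line screen roughly triples the number of dots available inside each tiny gray cell, which is why the gray patches in the final book look so much smoother.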
The importance of this is even more obvious when we start looking not just at gray cells, but also at gray lines. Here’s the 100-line-screen print test:
✕

And here’s the same picture in the final book:
✕

Here’s the picture that first introduces rule 30:
✕

And a big issue was: how thin can the gray lines be, while not filling in, and while still looking gray? That was a difficult question, and was only answered by lots of print testing. One of the main points was: even if you effectively specify dots of a certain size, what will be the actual sizes of dots formed when the ink is absorbed into the paper? And similarly: will the ink from black cells spread into the area of the gray line you’re trying to print between them? In printing it’s typical to talk about “dot gain”. If you think you’re setting up dots to give a certain gray level, what will be the actual gray level you’ll get when those dots are made of ink on paper?
We were constantly testing things like this, with different printing technology, different paper and so on:
✕

We used a “densitometer” (yes, this was before modern digital cameras) to measure the actual gray level, and deduce the dot gain function. And we tested things like how thin lines could be before they wouldn’t print.
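The textbook way to turn densitometer readings into an effective dot area, and hence a dot gain, is the Murray–Davies formula. The book’s own measurement procedure isn’t spelled out here, so this is shown as the standard method rather than the exact one we used, and the readings are hypothetical:

```python
# Murray-Davies: the standard formula relating densitometer readings to
# effective dot area, a = (1 - 10**-Dt) / (1 - 10**-Ds), with Dt the
# optical density of the tint patch and Ds the density of solid ink.
# Dot gain is effective area minus nominal area. The readings below
# are hypothetical, not measurements from the NKS print tests.
def effective_dot_area(d_tint, d_solid):
    return (1 - 10 ** -d_tint) / (1 - 10 ** -d_solid)

def dot_gain(nominal_area, d_tint, d_solid):
    return effective_dot_area(d_tint, d_solid) - nominal_area

gain = dot_gain(0.50, d_tint=0.46, d_solid=1.60)  # gain on a 50% tint
```

In other words: from two density readings (tint and solid) one can deduce how much of the paper is effectively covered by ink, and compare that with the coverage one specified.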
In halftoning, one effectively applies a global “screen” (as in, something with an array of holes in it, just like in pre-digital printing) to determine the positions of dots. We considered effectively setting up our own dot placement algorithm that would for example better align with cells in something like a cellular automaton. But tests didn’t show particularly good behavior, and we soon reverted to considering the “traditional approach”, though with various kinds of tweaking.
Should the halftone dots be round, or elliptical? What should the angle of the array of dots be (it definitely needed to avoid horizontal and vertical directions)? As this manifest indicates, we did many tests:
✕

The final conclusion was: round dots, 175-line screen, 45° angle. But it took quite a while to get there.
But, OK, so we had a pipeline that started with Wolfram Language code, and eventually generated PostScript. Most of the complexity we’ve just been discussing came in converting that PostScript to the image that would actually be printed. And in imaging technology jargon, that’s achieved by a RIP, or raster image processor, that takes the PostScript and generates a bitmap (normally represented as a TIFF) at an appropriate resolution for whatever will finally render it.
In the 1990s the standard thing to do was first to render the bitmap as a negative onto film. And my archives have tests of this that we did in 1992, here again shown under a microscope:
✕

Everything looks perfectly clean. And indeed printing this purely photographically still gives a perfectly clean result:
✕

But it gets much more complicated when one actually prints this with ink on a printing press:
✕

The basic way the printing is done is to (“lithographically”) etch a printing plate which will then be inked and pressed onto paper to print each copy. Given that one already has film, one can make the plate essentially photographically—more or less the same way microprocessor layouts and many other things are made. But by the beginning of the 2000s, there was a new technology: direct-to-plate printing, in which an (ultraviolet) laser directly etches the plate (a kind of much-higher-resolution “plate analog” of what a laser printer does). And in order to get the very crispest results, direct-to-plate printing was what we used for the NKS book.
What’s the actual setup for printing? In the sheet-fed approach that we were using, one combines multiple pages (in our case 8) as a “signature” to be printed from a single plate onto a single piece of paper. Here’s a (yes, rather-unremarkable-looking) actual plate that was used for the first printing of the NKS book:
✕

And here’s an example of a signature printed from it, with pages that will subsequently be cut and folded:
✕

Under a microscope, the plate looks pretty much like what will finally be printed onto the paper:
✕

But now the next big issue is: what kind of paper should one use? If the paper is glossy, ink won’t spread on it, and it’s easier to get things crisp. But adding a glossy coating to paper makes the paper heavier and thicker, and we quickly determined that it wasn’t going to be practical to print the NKS book on glossy paper. Back in the 1980s it had become quite popular to print books on paper that looked good at first, but after a few years would turn yellow and disintegrate. And to avoid that, we knew we needed acid-free paper.
Any particular kind of paper will come in different “weights”, or thicknesses. And the thicker the paper is, the more opaque it will be, and the less see-through the pages of the book will be—but also the thicker the book will be with a given number of pages. At the beginning we didn’t know how long the NKS book would be, and we were looking at comparatively thick papers; by the end we were trying to use paper that was as thin as possible.
Back in 1993 we’d identified Finch Opaque as a possible type of paper. In 1995 our paper rep suggested as an alternative Finch VHF (“Very High Finish”)—which was very smooth, and quite bright white. But normally this paper came only in quite thick weights. Still, it was possible for the paper mill to produce thinner versions as well. We studied the possibilities, and eventually decided that a 50-lb version (i.e. with the paper weighing 50 lbs per 500 uncut sheets) would be the best compromise between bulk and opacity. So 50-lb Finch VHF paper is what the NKS book is printed on.
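What a weight like “50 lb” means can be made concrete: 500 uncut basis sheets weigh 50 lb, and for text/book papers the basis sheet is conventionally 25 × 38 inches (an assumption about the convention, not something stated for Finch VHF specifically), which converts to roughly 74 grams per square meter:

```python
# "50-lb" paper: 500 uncut basis sheets weigh 50 lb; for text/book
# papers the basis sheet is conventionally 25 x 38 inches. Converting
# gives the metric grammage (grams per square meter).
LB_TO_G = 453.59237
IN2_TO_M2 = 0.0254 ** 2

def basis_weight_to_gsm(pounds, sheet_in=(25, 38), sheets=500):
    sheet_area_m2 = sheet_in[0] * sheet_in[1] * IN2_TO_M2
    return pounds * LB_TO_G / (sheets * sheet_area_m2)

gsm = basis_weight_to_gsm(50)  # roughly 74 gsm
```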
Paper, of course, is made from trees. And as I’ll explain below, during the publishing of the NKS book, I became quite aware of the physical location of the trees from which the paper for the NKS book was made: they were in upstate New York (in the Adirondacks). At the time, though, I didn’t know more details about the trees. But a few years ago I learned that they were eastern hemlock trees. And it turns out that these coniferous trees are unusual in having long fibers—which is what allows the paper to be as smooth as it is. Talking about hemlock makes one think of Socrates. But no, hemlock the poison comes from the “poison hemlock” plant (Conium maculatum), which is unrelated to hemlock trees (which didn’t grow in Europe and seem to have gotten their hemlock name only fairly recently, and for rather tenuous reasons). So, no, the NKS book is not poisonous!
Once the signatures are printed, they have to be folded and cut—in the end forming little booklet-like objects. And then comes the final step: binding these pieces together into the finished book. By the mid-1990s The Mathematica Book had given us quite a bit of experience with the binding of “big books”—and it wasn’t good. Many copies of multiple versions of The Mathematica Book (yes, not printed by us) had basically self-destructed in the hands of customers.
How were we going to be sure this wouldn’t happen for the NKS book? First, many books—including some versions of The Mathematica Book—were basically “bound” by just gluing the signatures into the “case” of the book (with little fake threads added at the ends, for effect). But to robustly bind a big book one really has to actually sew the signatures to the case, and a standard way to do this is what’s called Smyth sewing. And that’s what we determined to use for the NKS book.
Still, we wanted to test things. So we sent books to a book-testing lab, where the books were “tumbled” inside a steel container, 1200 times per hour, “impacting the tail, binding edge, head and face” of each book 4800 times per hour. After 1 hour, the lab reported “spine tight and intact”. After 2 hours “text block detached from cover”. But that’s basically only after doing the equivalent of dropping the book thousands of times!
As we approached the final printing of the NKS book, there were other decisions to be made. The endpapers were going to have a rule 30 pattern printed on them. But what color should they be? We considered several, picking the goldenrod in the end (and somehow that color now seems to have become the standard for the endpapers of all books I write):
In the late stages of writing the NKS book one of the big concerns was just how long the book would eventually be. We’d figured out the paper, the binding, and so on. And there was one hard constraint: the binding machines that we were going to use could only bind a book up to a certain thickness. With our specs the limit was 80 signatures—or 1280 pages. The main text clocked in at 1197 pages; with front matter, etc. that was 1213 pages. But then there was the index. And I was writing a very extensive index that threatened to overrun our absolute maximum page count. We formatted the index in 4 columns, set as small and tight as we thought we could. And in the end it came in just under the wire: the book was 1280 pages, with not a single page to spare. (Somewhat simplifying the story, I’ve sometimes said that after a decade of work on the NKS book, I had to stop because otherwise I was going to have a book that was too long to bind!)
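To put the arithmetic in one place, here’s a quick sketch of the page budget (the 16 pages per signature is implied by the stated 80-signature, 1280-page limit):

```python
# Page budget implied by the binding constraint described above.
# 16 pages per signature follows from 80 signatures -> 1280 pages.
PAGES_PER_SIGNATURE = 16
MAX_SIGNATURES = 80

max_pages = PAGES_PER_SIGNATURE * MAX_SIGNATURES
text_and_front_matter = 1213          # main text (1197 pages) plus front matter
index_budget = max_pages - text_and_front_matter

print(max_pages, index_budget)        # 1280 pages total, 67 left for the index
```

So the 4-column index had 67 pages to fit in—and it used every one of them.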
High-quality printing of the kind needed for the NKS book was then—and is now—often done in the Far East. But anticipating that we might need to reprint the book fairly quickly, we didn’t consider that an option; it would just take too long to transport books by boat across the Pacific. And conveniently enough, we determined that there was a cost-effective North American alternative: print the book in Canada. And so it was that we chose a printer in Winnipeg, Canada, to print the NKS book.
On February 7, 2002, the files for the book (which were now PDF, not pure PostScript) were transferred (via FTP) to the printer’s computers—a process which took a mere 90 minutes. (Well, it had to be done twice, because of an initial glitch.) But then the next step was to produce “proofs” for the book. In traditional printing, where printing plates were made from film, one could produce the film first, then make a photographic print of this, check it, and only then make the plates. But we were going to be making plates directly. So for us, “proofing” was a more digital process, that involved using a separate device from the one that would actually make the plates. Supposedly, though, “the bits were the bits”, and the results would be the same.
Within a couple of days, the printer had the first proofs made, and a few issues were seen—such as white labels inside black cells simply disappearing. The cause was subtle, though it didn’t take long to find. Some 3D graphics in the book had generated color PostScript—and in all our tests so far these had just automatically been converted to grayscale. But now the presence of color primitives had made the RIP that was converting from PostScript change its settings—causing other problems. But soon that was worked around, and generating proofs continued.
By February 14 we had the first batch of proofs in our hands, and my team and I went to work going through them. Everything looked just fine until—ugh—page 157:
That was supposed to be a symmetrical (continuous) cellular automaton! So how could it be different on the two sides? Looking now under a microscope, here are the corresponding places on the two sides:
And we can see that somehow on the left an extra column of cells has mysteriously appeared. But where did it come from? We checked the original PostScript. Nope, it wasn’t there. We asked the printer to rerun the proof, and, second time around, it was gone. Very mysterious. But we figured we could go ahead—and in any case we had a tight schedule to meet.
So on February 17 the book designer who’d worked on the project ever since the beginning went to Winnipeg, and on February 18 the book began to be printed.
I wasn’t there (and actually now I wish I’d gone) but a bunch of pictures were taken. After a decade of work all those abstract bits I’d produced were being turned into an actual, physical book. And that took actual industrial work, with actual industrial machines:
Here’s the actual press that’s about to print a signature of the NKS book (the four “stations” here are set up to print four different colors, but we were only using one of them):
And here’s that signature “coming off the press”:
It really was coming out “hot off the press”—with a machine drying off the ink:
Those controls let one change ink flows and pressures to make all the pages come out correctly balanced:
Thanks, guys, for checking so carefully:
Pretty soon there were starting to be lots of copies of signatures being printed:
And—after being involved for more than a decade—the book designer was finally able to sign off on the printed version of the opening signature of the book:
The whole process of printing all the signatures of the book was scheduled to take about four weeks. We had been receiving and checking the signatures as they were ready—and on March 12 we received the final batch, and began to check them, on the alert for any possible repeat of something like the page-157 problem.
Within a few hours a member of our team got to page 332 (on “signature 21”) which included this image:
I’m frankly amazed he noticed, but if you look carefully near the right-hand edge you might be able to tell that there’s a strange kind of “seam”. Zoom in at the top and you’ll see:
And, yes, this is definitely wrong: with the aggregation rule used to make this picture it simply isn’t possible to have floating pieces. In this case, the correct version is:
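A check of the kind that flags such “floating pieces” is easy to mechanize. Here’s a minimal Python sketch, using a generic nearest-neighbor aggregation (an illustration of the general idea, not necessarily the book’s exact rule): since cells are only ever added adjacent to the existing cluster, a flood fill must reach every cell—so a detached piece is immediately detectable.

```python
import random

def grow_aggregate(n, seed=0):
    # start from a single cell and repeatedly attach a random
    # empty cell that neighbors the existing cluster
    rng = random.Random(seed)
    cluster = {(0, 0)}
    while len(cluster) < n:
        frontier = {(x + dx, y + dy) for (x, y) in cluster
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))} - cluster
        cluster.add(rng.choice(sorted(frontier)))
    return cluster

def is_connected(cells):
    # flood fill from any cell; a floating piece makes this fail
    cells = set(cells)
    seen, stack = set(), [next(iter(cells))]
    while stack:
        x, y = stack.pop()
        if (x, y) in seen:
            continue
        seen.add((x, y))
        stack.extend((x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if (x + dx, y + dy) in cells)
    return seen == cells
```

Any aggregate grown this way passes `is_connected`; add a stray far-away cell (as the printing glitch effectively did) and the check fails.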
An hour or so later two more glitches were found, on pages 251 and 253. Both cases again involved something like a column of cells being repeated. On page 253, zooming into the image
reveals strange and “impossible” imperfections in the supposedly periodic background of rule 110:
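Glitches like this are mechanically detectable: in a valid cellular automaton evolution every row must follow from its predecessor, and a stray flipped or duplicated cell breaks that local consistency. Here’s a small Python sketch of such a checker for rule 110 (a generic illustration, not the actual tooling we used):

```python
RULE = 110  # elementary cellular automaton rule number

def step(row):
    # one step of rule 110; cells beyond the ends are treated as 0
    padded = [0] + row + [0]
    return [(RULE >> (padded[i - 1] * 4 + padded[i] * 2 + padded[i + 1])) & 1
            for i in range(1, len(padded) - 1)]

def rows_consistent(history):
    # every row must be the rule-110 successor of the previous one;
    # a corrupted cell anywhere breaks this check
    return all(step(a) == b for a, b in zip(history, history[1:]))
```

Running a valid evolution passes the check; flip a single cell anywhere in the middle of it and `rows_consistent` returns `False`—which is exactly why the “impossible imperfections” in the printed background stood out.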
On page 194 there was another glitch: an arrow on a graph that had basically become too thin to see. But this problem at least we could understand—and it was our fault. Instead of setting the thickness of the arrow in some absolute way, we’d just set it to be “1 pixel”—which in the final printing was too thin to see.
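For scale, one pixel at the 2400-dpi imaging resolution works out to about a hundredth of a millimeter—which is why a 1-pixel stroke effectively vanishes in print:

```python
dpi = 2400                 # imaging resolution of the plates
pixel_mm = 25.4 / dpi      # one pixel, converted to millimeters
print(round(pixel_mm, 4))  # ~0.0106 mm, far below what the eye resolves
```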
But what about the other glitches? What were they? And might there be more of them?
The signatures from the book were ready to start being bound. Should we hold off and reprint the signatures where we’d found glitches? Could we do this without blowing our (already very tight) schedule? Could we even get enough extra paper in time? My team was adamant that we should try to fix the glitches, saying that otherwise they would “nag at us forever”. But I wanted first to see if we could characterize the bug better.
We knew it was associated with the rendering of the PostScript image operator. Even though PostScript is basically a vector graphics description language, the image operator allows one to include bitmaps. Normally these bitmaps are used to represent things like photographs, and have tiny (“few-pixel”) cells. But in the cellular-automaton-like images we were having trouble with, the cells were much larger; in the case of page 157, for example, each one was roughly 75 of the final 2400-dpi pixels across. This was absolutely something the image operator was set up to handle. But somehow something was going wrong.
And what was particularly surprising is that it seemed as if the problem was happening after the PostScript was converted to a TIFF. Could it perhaps be in the driver for both the proofing and the final plate production system? Time was short, and we needed to make a decision about what to do.
I fired off an email to the CEO of the company that made the direct-to-plate system, saying: “We of course do not know the details of your software and hardware systems. However, we have done a little investigation. It appears that the data … in the case of this image is a bilevel TIFF with LZW compression. We speculate that the LZW dictionary contains something close to the actual squares seen in the image, and that somehow pointers to dictionary entries are being corrupted or are not being used correctly in the decompression of the TIFF. The TIFF experts at my company say they have never seen anything like this in developing software based on standard imaging libraries, making us suspect that it may be some kind of buffering or motion optimization bug associated with your actual hardware driver.”
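To illustrate the speculation in that email: in LZW the decoder rebuilds the encoder’s dictionary as it goes, so a single corrupted code makes it splice in an earlier dictionary string—producing exactly the kind of repeated run of cells we were seeing. A toy sketch (textbook LZW over a two-symbol alphabet, purely illustrative and nothing to do with the actual RIP code):

```python
def lzw_encode(s, alphabet="AB"):
    # standard LZW: emit a code for the longest known prefix, then
    # add that prefix plus the next symbol to the dictionary
    table = {c: i for i, c in enumerate(alphabet)}
    w, out = "", []
    for c in s:
        if w + c in table:
            w += c
        else:
            out.append(table[w])
            table[w + c] = len(table)
            w = c
    out.append(table[w])
    return out

def lzw_decode(codes, alphabet="AB"):
    # the decoder reconstructs the same dictionary on the fly
    table = {i: c for i, c in enumerate(alphabet)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # a code equal to the next free slot is the standard "KwKwK" case
        entry = table[code] if code in table else prev + prev[0]
        table[len(table)] = prev + entry[0]
        out.append(entry)
        prev = entry
    return "".join(out)

print(lzw_decode(lzw_encode("ABABABAB")))  # ABABABAB -- round trip is exact
# Corrupt one code (4 -> 3): the decoder splices in an earlier
# dictionary string, duplicating material -- much like a repeated column
print(lzw_decode([0, 1, 2, 3, 1]))         # ABABBAB
```

The point is that LZW corruption doesn’t produce random noise: it reproduces structurally plausible fragments of the image—so glitches of this kind can be very hard to spot.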
The CEO of what was by then quite a large company had personally designed the original hardware, and when we talked by phone he speculated that what we were seeing might be some kind of obscure mechanical issue with the hardware. But his chief of software soon sent mail explaining that “of the several hundred thousand books that go through [their system] each year, there are a couple that have imaging problems like this.” But, he added, “Usually they are books about halftone screening algorithms, which cause an almost-recursive problem…”. He said the specific issue we were having looked like a “difficult to reproduce problem we have known about for some time but is transient enough that reimaging the same file can ‘correct’ the problem.” He added that: “Our hypothesis is that it is related to a memory access error in the RIP that manifests only at low-memory conditions, or after many allocation/deallocation cycles of RAM blocks. The particular code path is not one we have source-code access to, and is rumored to be many years old, so not many people on earth are prepared to make substantive changes to it.”
OK, so what next? The RIP had been developed by Adobe, creators of PostScript. So I emailed John Warnock, cofounder of Adobe, who I’d met at quite a few software-industry get-togethers before my NKS-book “hermit period”. I commented that “One thing that’s peculiar (at least without knowing how the RIP works) is that the glitch involves overwriting of a column … even though scanning the underlying PostScript would involve going from one row to the next.” Warnock responded helpfully, copying his team, though saying (in an echo of what we’d already heard) “I don’t know who does PostScript stuff anymore”.
Well, that seemed like pretty much the end of the road. So we decided to assume that the glitches we’d found were the only ones, and—for perfection’s sake—we’d reprint those signatures, which by that point the printer had helpfully said they could do without blowing the schedule.
Two weeks later, Adobe delivered a new version of the RIP, in which they believed the bug had been fixed, noting that there had been significant code cleanup, and they were now using a newer version of the C++ compiler. Meanwhile, I’d realized another issue: a variety of magazines had requested files from us to be able to print highresolution images from the book. Would they end up using the same software pipeline, and potentially have the same problem? A general release of any fix was still quite far away.
Meanwhile, with the two “glitch” signatures reprinted, the book was off to be bound. The cover had also been printed, now making use of all four stations of the presses. Under a microscope the characteristic “rosettes” of 4-color printing are visible:
Actually, the book in a sense has two covers: a detachable dust jacket (including a dated picture of me!) and a “permanent” hard cover—which I think looks very nice:
But as I was just now looking back through my archives I found an email from February 2002, expressing concerns about the fading of ink on the cover. The printer assured us that we had “nothing to worry about unless the books were exposed to direct sunlight for an extended amount of time.” But then they added “The reds and yellows will fade faster than the other pigments, but this is not something that would be noticeable in the first 20–40 years.” Well, it’s now been 20 years, and it so happens that I have a copy of the NKS book that’s been exposed to sunlight for much of that time—and look what’s happened to its spine, right on cue:
I received a first, hand-bound, finished NKS book on April 22. And very soon books were on their way to bookstores and distribution centers. And people were ordering the book—in large numbers. And that meant that the books we’d printed so far weren’t going to be enough. And on May 12—two days before the May 14 official publication date of the book—another print run was started.
Fortunately it was possible to reuse the plates from the first print run (well, apart from the one which said “First printing”), so we didn’t have to worry about new glitches showing up.
But once the book was published, demand continued to be strong, and on June 4 we needed to do another print run. And this time new plates had to be made. Were there going to be new glitches? We decided we should check the plates before we started printing—so we sent the person who’d caught the glitches before on a trip to Canada. Turns out the bug hadn’t yet been fixed, and there it was again on pages 583 and 979.
Some time later I heard that the bug was finally found and fixed, and had been lurking in the implementation of the PostScript image operator for well over a decade. Yes, software is hard. And computational irreducibility is rampant. But in the years since the NKS book was published, no other weird glitches like this have ever shown up. Or at least nobody has ever told us about any.
But as I was writing this, I wondered: what became of that other glitch that was in the first printing—the one with the thin arrows that was our fault? I opened an NKS book from my desk. No problem. But then I pulled off my shelf the leather-bound copy of the first printing that my team made for me, and turned to page 194. And there it was—the “1-pixel arrow” (compared here under a microscope to the second printing):
And yet one more thing: looking in my archives, I find a cover sheet for a print test from March 1, 1999—which notes that there is “glitch with the graphic on page 246” … “which has been traced to a problem with the Adobe 4.1 PostScript driver” for the RIP—made by a completely different company:
Was it the same “page-157” bug? I looked for the print test. And there’s “page 246” (which ended up in the final version as page 212):
Under a microscope, most of the arrays of cells look just fine:
But there it is: something weird again!
Is it the same “page-157” bug? Or is it another bug, perhaps even still there, 23 years later?
When the NKS book was officially published on May 14, 2002, it was the #1 bestselling book on Amazon, and it was steadily climbing the New York Times and other bestseller lists. We’d just initiated a second printing, which would be finished in a few weeks. But based on apparent demand that printing wasn’t going to be sufficient. And in fact a single bookstore chain had just offered to buy the whole second printing. We initiated a third printing on June 4, and then a fourth on June 18. But if we were going to keep the momentum of sales, we knew we had to keep feeding books into the channel.
But that’s where things got difficult again. It just didn’t seem possible to get enough books, quickly enough. But after everything we’d done to this point, I wasn’t going to be stopped here. And I went into full “handson CEO” mode, trying to see how to juggle logistics to make things work.
The paper mill was in Glens Falls, NY. Once the paper had been made, it had to be trucked 2752 km to the printer in Winnipeg, Canada. Then the finished “book blocks” had to go 2225 km to the bindery in Toronto (or maybe there was an alternative bindery in Portland, OR, 2400 km away). And finally the bound books had to come to our warehouse in Illinois, or go directly to book distribution centers.
My archives contain a diagram I made trying to see how to connect these things together, particularly in view of the impending Canada Day holiday on July 1:
I have pages and pages of notes, with details of ink drying times (1 day), sheets of paper per skid (20,000), people needed per shift, and so on. But in the end we made it; with a lot of people’s help, we got the books finished on time—and put on trucks, some of which were going to the distribution center for a major bookstore chain.
The trucks arrived. But then we heard nothing. Bookstores were reporting being out of stock. What was going on? At last it was figured out: multiple truckloads of books had somehow been misplaced at the distribution center. (How do you lose something that big?) And, yes, some sales momentum was lost. And so we didn’t peak as high on bestseller lists as we might. Though hopefully in the end everyone who wanted an NKS book got one, no doubt oblivious to the logistical challenges involved in getting it to them.
For more than a decade I basically poured everything I was doing into the NKS book. Well, at least that’s the way I remember it. But going through my archives now, I realize I did quite a bit that never made it into the final NKS book. Particularly from the early years of the project, there are endless photographs—and investigations—of examples of complexity in nature, which never made it into Chapter 8. There are also lots of additional results about specific systems from the computational universe—as well as lots of details about history—that could have been notes to the notes, except I didn’t have those.
Something I didn’t remember is that in 1999—as the book was nearing completion—I considered adding a pictorial “Quick Summary” at the front of the book, here in draft form:
I’m not sure if this would have been a good idea, but in the end it effectively got replaced by the textual “An Outline of Basic Ideas” that appears at the very beginning of the book. Still, right when the book was being published, I did produce an “outside the book” pictorial 1-pager about Chapter 2 that saw quite a bit of use, especially for media briefings:
But as I was looking through my archives, my biggest “rediscovery” is the “Epilog” to the book. There are versions of it from quite early in the development of the book, but the last time it appears is in the December 15, 2000, draft—right before “Alpha 1”. Then it’s gone. Well, that is, until I just found it again:
So what’s in this “lost epilog”, with its intriguing title “The Future of the Science in This Book”? Different versions of it contain somewhat different fragmentary pieces of text. The version from late 1999, for example, begins:
Later it continues (the bracketed text gives alternative phrasings I was considering):
Some of what was in the “lost epilog” found its way into the Preface for the final book; some into a “General Note” entitled “Developing the new kind of science”. But quite a lot never made it. It’s often quite rough-hewn text—and almost just “notes to myself”. But in a section entitled “What Should Be Done Now”, there are, for example, suggestions like:
And there’s a list of “principles” that aren’t a bad summary of at least my general approach to research:
Later on there are some rough notes about what I thought might happen in the future:
It’s a charming time-capsule-like item. But it’s interesting to see how what I jotted down more than 20 years ago has actually panned out. And in fact I think much of it is surprisingly close to the mark. Plenty of small extensions did indeed get made in the first few years, with larger ones—both in studying abstract systems and in building practical models—coming later. (One notable extension was the 2,3 Turing machine universality proof at year 5, stimulated by our 2,3 Turing Machine Prize.)
How about “major new directions”? We’re remarkably “on cue” there. At year 18 was our Physics Project, and from that has emerged the whole multicomputational paradigm, which I consider to be the next major direction building on the ideas of the NKS book. I have to say that when I wrote down these expectations 20+ years ago, I didn’t imagine that I would personally be involved in the “major new direction” I mentioned—but, unexpected as it has been, I feel very fortunate that that’s the way it’s worked out.
What about technology? Already at year 7 WolframAlpha was in many ways a major “philosophical spinoff” of the NKS book. And although one doesn’t know its detailed origins, the proof-of-work concept of bitcoin (which also first appeared at year 7) has fundamental connections to the idea of computational irreducibility. Meanwhile, the general methodology of searching the computational universe for useful programs is something that has continued to grow. And although the details are more complicated, the whole notion of deep learning in neural nets can also be thought of as related.
It’s very hard to assess just what’s happened in “becoming a part of everyday thought”—though it’s been wonderful over the years to run into so many people who’ve told me how much the NKS book affected their way of thinking about things. But my impression is that—despite quite a few specific applications—the truly widespread absorption of ideas like computational irreducibility and their implications is a bit “behind schedule”, though definitely now building well. (One piece of absorption that did happen in the 4–10 year window was into areas like art and architecture.)
What about education? 1D cellular automata have certainly become widely used as “do-a-little-extra” examples for both programming and math. But more serious integration of ideas from the NKS book as foundational elements of computational thinking—or as a kind of “pre-computer science”—is basically still a “work in progress”.
Beyond the main text of the “lost epilog”, I found something else: “Notes for the Epilog”:
And after short (and unfinished) notes on “The sociology of the new science” and “The role of amateurs”, there’s the most significant “find”: a list of altogether 283 “Open questions” for each of the chapters of the book, most still unanswered.
In preparation for our first Wolfram Summer School (then called the NKS Summer School) in June 2003, I worked on a more detailed version of something similar—but left it incomplete after getting up to the middle of Chapter 4, and didn’t include much if anything from the “Notes for the Epilog”, even though I’d been accumulating those for much of the time I worked on the book:
During the decade I worked on the NKS book I generated a vast amount of material. Most of it I kept in my still-very-much-extant computer filesystem, and while I can’t say that I’ve reexamined everything there, my impression is that—perhaps apart from some “notes to the notes” material—a large fraction of what should have made it into the NKS book did. But in the course of working on the book there was definitely quite a bit of more ephemeral material. Some was preserved in my computer filesystem; some was printed out and discarded; and some was simply handwritten. But all these years I’ve kept archive boxes of that material.
Some of those boxes have now been sealed for nearly 30 years. But I thought it’d be interesting to see what they contain. So I pulled out a box labeled 6/93–10/93. It’s slightly the worse for wear after all these years, but what’s inside is well preserved. I turn over a few pages of notes, printouts and ancient company memos (some sent as faxes). And then: what’s this?
It’s a note about multiway systems: things that are now central to the multicomputational paradigm I’ve just been pursuing. There’s a brief comment about numerical multiway systems in the NKS book—but just last year, I wrote a whole 85-page “treatise” about them.
I turn over a few more pages. It feels a bit like a time warp. I just wrote about multiway Turing machines last year, and my very recent work on metamathematics is full of multiway string rewrites and their correspondence to mathematical proofs!
A few more pages and I get to:
It’s not something that made it into the NKS book in that form—but last year I wrote a piece entitled “How Inevitable Is the Concept of Numbers?” which explores (in an admittedly modernized way) some of the exact same issues.
A few more pages later I get to “timeless” graphics like these:
But soon there’s a charming reminder of the times:
I’ve only gone through perhaps an inch of paper so far. And I’m getting to pages like these:
Yes, I’m still today investigating consequences of “computational irreducibility and the PCE (Principle of Computational Equivalence)”. And just last year I used it as a central example in writing about numerical multiway systems!
I’ve gone through perhaps 10% of one box—and there are more than 40 boxes in all. And I can’t help but wonder what gems there may be in all these “outtakes” from the NKS book. But I’m also thankful that back when I was working on the NKS book I didn’t try to pursue them all—or the decade I spent on the book might have stretched into more than a lifetime.
On May 14, 2002, the NKS book was finally published. In some ways the actual day of publication was quite anticlimactic. In modern times there’d be that moment of “making things live” (as there was, for example, for WolframAlpha in 2009). But back then there’d been a big rush to get books to bookstores, but on the actual “day of publication” there wasn’t much for me to do.
It had been a long journey getting to this point, though, and for example the acknowledgements at the front of the book listed 376 people who’d helped in one way or another over the decade devoted to writing the book, or in the years beforehand. But in terms of the physical production of the book one clue about what had been involved could be found on the very last page—its “Colophon”:
And, yes, as I’ve explained here, there was quite a story behind the simple paragraph: “The book was printed on 50-pound Finch VHF paper on a sheet-fed press. It was imaged directly to plates at 2400 dpi, with halftones rendered using a 175-line screen with round dots angled at 45°. The binding was Smythe sewn.” And whatever other awards the book would win, it was rather lovely to win one for its creative use of paper:
So much about the NKS book was unusual. It was a book about new discoveries on the frontiers of science written for anyone to read. It was a book full of algorithmic pictures like none seen before. It was a book about science produced to a level of quality probably never equaled except by books about art. And it was a book that was published in a direct, entrepreneurial way without the intermediation of a standard large publishing company.
Publishers Weekly ran an interesting—and charmingly titled—piece purely about the “publishing dynamics” of the book:
Just before the book was finally published, I’d signed some copies for friends, employees and people who’d contributed in one way or another to the book:
Shortly after the book was published, we decided to make a “commemorative poster”, reproducing (small, but faithfully) every one of the pages that had taken so much effort to create:
Then there were the “computational-irreducibility-inspired” bookmarks that I, for one, still use all the time:
We carefully stored a virtual machine image of the environment used to produce the book (and, yes, that’s how quite a few of the images here were made):
And over the years that followed we’d end up using the raw material for the book many times. Within a year there was “NKS Explorer”—a Wolfram Notebook system, distributed on CD-ROM, that served as a kind of virtual lab that let one (as it put it) “Experience the discoveries of A New Kind of Science on your own computer”:
About five years later, more or less the same content would show up in the web-accessible Wolfram Demonstrations Project (and 10 years later, in its cloud version):
When the book came out, there was already a “wolframscience.com” website:
But in 2004 we were able to put a full version of the NKS book on the web:
In 2010 we made a version for the iPad:
And in recent years there have followed all sorts of modernizations, especially on the web—with a bunch of new functionality just recently released:
I went to great effort to write the NKS book to last, and I think it’s fair to say—20 years out—that it very much has. The computational universe, of course, will be the same forever. And those pictures of the behavior of simple computational systems that occur throughout the book share the kind of fundamental timelessness that pictures of geometric constructions from antiquity do.
Of course, I knew that some things in the book would “date”, most notably my references to technology—as I warned in one of the “General Notes” at the back of the book (though actually, 20 years later, notwithstanding “electronic address books” from page 643, and MP3 on page 1080 being described as a “recent” format, surprisingly little has yet changed):
What about mistakes? For 20 years we’ve meticulously tracked them. And I think it’s fair to say that all the careful checking we did originally really paid off, because in all the text and pictures in the book remarkably few errors have been found. For example, here’s the list of everything found in Chapter 4, indicating a few errors that were fixed in early printings—and a couple that remain, and that we are now fixing online:
People ask me if there’ll be a second edition of the NKS book. I say no. Yes, there are gradually starting to be more things one can say—and in the past couple of years the Wolfram Physics Project and the whole multicomputational paradigm have added significantly more. But there’s nothing wrong with what’s in the NKS book. It remains as valid and coherent as it was 20 years ago. And any “second-edition surgery” would run the risk of degrading its crispness and integrity—and detract from its unique perspective of presenting science at the time of its discovery.
But, OK, so all those NKS books that were printed on all those tons of paper from hemlock trees 20 years ago: what happened to them? Looking on the web today, one can find a few out there in the wild, sitting on bookshelves alongside a remarkable variety of other books:
I myself have many NKS books on my shelves (though admittedly a few more as convenient 2.5-inch “filler bookends”). And—at least when I’m in a “science phase”—I find myself using the online NKS book (if not a physical book) all the time, to see an example of some remarkable phenomenon in the computational universe, or to remind myself of some elaborate explanation or result that I put so much effort into finding all those years ago.
I consider the NKS book one of the great achievements of my life—as well as one of the great “stepping-stone” points in my life, that was made possible by what I’d done before, and that in turn has made possible what I’ve done since. Twenty years later it’s interesting to think back—as I’ve done here—on just what it took to produce the NKS book, and how all those individual steps that I worked so hard on for a decade came together to make the whole that is the NKS book.
To me it’s a satisfying and inspiring story of what can be achieved with clear vision, sustained effort and a willingness to go where discoveries lead. And as I reflect on achievements of the past it makes me all the more enthusiastic about what’s now possible—and why it’s worth putting great effort today into what we can now build for the future.
]]>Something remarkable has happened these past two years. For 45 years I’ve devoted myself to building a taller and taller tower of science and technology—which along the way has delivered many outputs of which I’m quite proud. But starting in 2020 with the unexpected breakthroughs of our Wolfram Physics Project we’ve jumped to a whole new level. And suddenly—yes, building on our multidecade tower—it seems as if we’ve found a new paradigm that’s incredibly powerful, and that’s going to let us tackle an almost absurd range of longstanding questions in all sorts of areas of science.
Developing a fundamental theory of physics is certainly an ambitious place to start, and I’m happy to say that things seem to be going quite excellently there, not least in providing new foundations for many existing results and initiatives in physics. But the amazing (and to me very unexpected) thing is that we can take our new paradigm and also apply it to a huge range of other areas. Just a couple of weeks ago I published a 250-page treatise about its application to the “physicalization of metamathematics”—and to providing a very new view of the foundations of mathematics (with implications both for the question of what mathematics really is, and for the practical long-term future of mathematics).
In a sense, everything we’re doing ultimately builds on the great intellectual tide of the past century: the rise of the concept of computation. (And, yes, that’s something in which I’ve been deeply involved both scientifically and technologically throughout my career.) But what’s happening now is something else—that one can see as the birth of what I call the multicomputational paradigm. It’s all about doing what our Physics Project has suggested, and going beyond working with specific computations—to look at the systemic behavior of whole interacting collections of computations. In the whole multi-millennium history of science, there’ve only been a very few fundamentally different paradigms for making models of things—and I think multicomputation is basically the fourth one ever.
And what makes its arrival particularly dramatic is that it comes already supercharged by its deep relation to physics, and by its ability to spread the successes of physics to other areas of science, and beyond. When I started investigating the concept of computational irreducibility in the 1980s it became clear that there are fundamental barriers to many kinds of scientific progress. But what I didn’t see coming is that there would be a new path opened up by a new paradigm: the paradigm of multicomputation. And suddenly there are now all sorts of fundamental questions that are no longer blocked—and instead are ripe for rapid progress.
Over the past year we’ve started exploring a host of potential application areas. We’ve got a concept of “formalized subchemistry” with applications to a new way of thinking about molecular computing, and with potentially dramatic implications for molecular biology. We’ve got new ideas about how to think about immunology and probably also neuroscience. We’ve got a potential new approach to finding a formalization for biological evolution and its relation to biocomplexity. We’ve got a new concept for “geometrizing” the space of programs—with implications for foundational questions in computational complexity theory. We’ve got a promising way to construct new kinds of theories for economics, with implications, for example, for distributed generalizations of blockchain. We’ve got a potential new way to think about linguistics, and the structure of meaning space. Oh, and we’ve got a new “physicalized” way to conceptualize and organize distributed computing.
In the nearly half a century that I’ve been doing science, I’ve had the good fortune to be involved in quite a few significant bursts of progress. But I’ve never seen one quite as concentrated and immediate as what we’re now seeing. It’s an exciting thing. But it’s also overwhelming. There’s just so much low-hanging fruit to be picked—so many things with such important potential consequences, for science, technology and the world.
We’ve been working very hard to move all this forward with the resources we have available. But even though I think we’ve achieved a remarkable level of productivity, it’s become clear that there’s just too much to do. We’re in the midst of a major “science opportunity overload”. And to be good stewards of the ideas and their potential we’ve got to scale things up. I’ve had lots of experience over decades in making big projects happen. And now it’s time to take that experience and define a new structure to move forward the amazing science opportunity we find ourselves with.
And I think that leaves us no choice: we’ve got to launch the Wolfram Institute, and now!
In the course of my life I’ve spent a great deal of effort trying to maximize productivity and innovation around the things I do. I’ve done lots of (arguably nerdy) optimization of my own personal setup. I’ve thought long and hard about the best strategies both for choosing what to do, and for getting things done. But in many ways the most important piece has been the whole structure we’ve built up over the past 35 years at Wolfram Research.
I sometimes refer to our company as a machine for turning ideas into real things. And I think it’s been an extremely impressive machine—year after year for more than three decades systematically delivering all sorts of far-reaching innovation, and using our progressive tower of technology to efficiently implement it on larger and larger scales.
As a company, we’ve mainly been concerned with delivering technology and products. But the systems, culture and methodologies we’ve developed are, at their core, about maximizing innovation and productivity. So what happens if we apply them to science?
A New Kind of Science was in a sense a first result, and I consider it an impressive one. Indeed, even nearly 20 years after it was published I’m still amazed at the sheer volume—and depth—of scientific results that it was possible to obtain in the span of just a single decade.
But with the Wolfram Physics Project it’s a yet more impressive story. We started late in 2019, and in less than six months—with an extremely small team—we were able to make dramatic progress, and to publish nearly 700 pages of material about it. It wouldn’t have been even vaguely possible without the whole tower of computational tooling provided by the Wolfram Language. Nor would it have been possible without the structure and strategy for doing projects that we’ve honed over the past three decades.
In the past two years, we’ve been energetically moving forward—and we’ve now published altogether over 2500 pages of new material, as well as nearly 200 publicly deployed new functions. In terms of scientific ideas the pace of innovation has been quite breathtaking. But we’ve also been innovating in terms of how to do the science.
An important objective has been to open the science up to give the widest possible access and potential for engagement. I’ve worked hard to define a style of expository writing that makes what we’ve done accessible to a wide audience as well as to experts. And in what we’ve published, essentially every graphic has “click-to-copy” Wolfram Language code, that anyone can immediately run and build on. We’ve also uploaded our working notebooks—so far nearly 2000 of them—so that everyone can see not only our “finished product” but also the research (wrong turns and all) that led to it.
A few years ago I started livestreaming to the world many of our software design reviews. And building on this concept, we’ve now routinely been livestreaming our scientific working sessions—giving people for the first time real-time visibility into how science is done, as well as the possibility to interact with it. And for those interested in an even deeper dive, we’ve also been recording and uploading “video work logs”—bringing us up to a total of nearly 1000 hours of video so far.
Even when we thought we were “just solving physics” we knew we had to involve other people in the project. For 20 years we’ve been doing a very successful annual Summer School about our approach to science and technology, and starting in 2020 we added a track about the Physics Project, as well as a physics Winter School. We’ve had a terrific stream of “students” for our Physics Project. And partly building on this we’ve been setting up a network of people involved in the Physics Project—now with 55 members from 20 countries.
Increasingly, there’s work based on our Physics Project that’s happening in academic institutions, quite independent of us. And no doubt this will bear all sorts of fruit.
But as we look at the next phases of our Physics Project, and even more so the huge collection of opportunities provided by our new multicomputational paradigm, it’s clear there’s so much to do that—particularly if we want it to happen in years rather than decades—we need a more focused approach.
And the good news is that through 35 years of experience at Wolfram Research, as well as the experience of A New Kind of Science and the Wolfram Physics Project, we have an excellent blueprint for what to do. But now we have to implement it at scale. And that’s what the Wolfram Institute is about.
The basic plan is simple: to create a basic-science analog of the immensely productive “machine” that I’ve built at Wolfram Research over the past 35 years—and to use this “machine” to accelerate the delivery of new science by many decades if not more. We’ve already got a definite seed: the Wolfram Physics Project. But now we have to scale this up to the full Wolfram Institute—and give it the structure it needs to grow and take full advantage of the amazing opportunities we now have.
It’s often assumed that the way to achieve maximum innovation is to put together innovative people, and then let them “just innovate” in whatever directions they choose. But it has been my consistent experience that the greatest innovation is instead achieved when there is a definite “flow” and definite, ambitious goals. The Wolfram Institute is going to be about doing large-scale basic research this way.
Modeled on Wolfram Research and the Wolfram Physics Project, the idea of the Wolfram Institute is to aggressively pursue basic research that’s explicitly managed and energetically led towards its goals.
Our initial goals are already tremendously ambitious. We want to use our new paradigm to basically rewrite the foundations of several important fields of science. As with the Physics Project there’ll no doubt be tremendous synergy with existing approaches and the communities around them. But with the paradigm we now have, the tools and methods we’ve developed, and the organizational framework we’re defining for the Wolfram Institute, I think we have the opportunity to jump ahead, and in effect to deliver foundational science that would otherwise emerge at best only in the distant future.
It’s going to be an exciting thing to be a part of, and—as with projects I’ve done in the past—there are going to be many outstanding people who want to be involved. Our many decades of activity in science and technology have provided an extremely broad network of contacts, and we’ve developed a particularly concentrated pipeline of worldwide talent through our annual Summer School (as well as our High School Summer Camp).
What will the Wolfram Institute be like? There’ll be a leadership core, which, yes, I’m signing up to head. But the main meat of the institute will be a collection of researchers and fellows, working on particular, managed projects—together with students at multiple levels from high school to graduate school. We’ll be doing open science, so there’ll be lots of livestreaming and lots of open tools produced. There’ll be lots of working materials and academic papers published, and whenever we manage to make a big step forward our plan is to present it to the world using the immediate and accessible approach to exposition that I’ve developed.
In what we’ve done for the past 35 years at Wolfram Research there’s in a sense a clear model for how the organization fundamentally operates. We invent things, then we deliver them in products—and from these products we derive revenue, which is then used to allow us to invent more things and deliver more products. It’s been an extremely productive setup. And because we’re a private company without outside investment we’re able to chart our own course, pursuing the ambitious and longterm projects that we believe in. Often we choose to make things we do freely available to the world, but in the end we rely on the fact that we’re producing commercially valuable products from which we derive revenue that funds our activities.
There’s no question that the science we’ll be doing at the Wolfram Institute will lead to things of great value to the world. But it won’t be near-term commercial value. And our fundamental model is to concentrate on producing the best and broadest long-term basic research—rather than to aim for things that can be deployed directly in specific products that provide immediate value to specific customers.
In a sense, our “customer” is the world at large—and the future. But in the present we need a way to support our researchers and fellows. So far we’ve basically been incubating the Wolfram Institute within our existing organization, with me effectively footing the bill. But as we launch the full Wolfram Institute we need a larger scale of support, and we’re counting on having a network of people and organizations to provide that.
Some of the support will be for specific researchers, fellows or students, perhaps drawn from particular geographies, backgrounds or communities. Some of it will be for specific projects. But it’s also important to have a stable core of support that will allow the institute to pursue long-term basic research that will likely deliver many of its most valuable results through developments that are in effect computationally irreducible to predict in advance.
It’s difficult to know how society at large should value the general activity of basic research, and it’s easy to criticize the inefficiencies of a large-scale “just let researchers do what they want” approach. But with the Wolfram Institute we have a very different model. We’re starting the institute right now for a specific reason: we’ve got a new paradigm that’s just opened up an amazing collection of possibilities. And we plan to pursue those possibilities in an efficient and tightly managed way, optimized for innovation and new ideas.
When I look at our Physics Project and what we’ve achieved so far with it, I’m frankly amazed at how quickly and comparatively frugally we’ve managed to do it. (Yes, it helps that at least so far we need only people and computers, not telescopes and particle accelerators.) And as we scale this up to all the various projects we plan at the Wolfram Institute, it’s almost absurd how much of long-term significance I expect we will be able to deliver for how comparatively little.
The Wolfram Physics Project has been done as an entirely geo-distributed project—building on the 30+ years of experience with coherent geo-distributed work that we’ve had at Wolfram Research. And even though its projects will be tightly managed, the Wolfram Institute will also primarily be geo-distributed, although we plan regular physical events and we’ll probably have some physical locations available.
It’s been great in working on the Wolfram Physics Project to have what we’ve been doing be so open, and to be able to share it with so many people. And as we launch the Wolfram Institute I’m looking forward to having all sorts of people involved, both within the institute, and as supporters of it.
It’s been exciting these past many months seeing the whole multicomputational paradigm emerge, and seeing more and more possibilities reveal themselves. It’s a remarkable—if overwhelming—collection of opportunities, and I believe a historic moment for the progress of science. And as we launch the Wolfram Institute I hope that with the help of enough supporters we’ll be able to deliver many dramatic results that will have great long-term value to the world and to the arc of intellectual history.
It seems like the kind of question that might have been hotly debated by ancient philosophers, but would have been settled long ago: how is it that things can move? And indeed with the view of physical space that’s been almost universally adopted for the past two thousand years it’s basically a non-question. As crystallized by the likes of Euclid it’s been assumed that space is ultimately just a kind of “geometrical background” into which any physical thing can be put—and then moved around.
But in our Physics Project we’ve developed a fundamentally different view of space—in which space is not just a background, but has its own elaborate composition and structure. And in fact, we posit that space is in a sense everything that exists, and that all “things” are ultimately just features of the structure of space. We imagine that at the lowest level, space consists of large numbers of abstract “atoms of space” connected in a hypergraph that’s continually getting updated according to definite rules and that’s a huge version of something like this:

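The project’s actual rules are of course far richer than anything one can show in a few lines, but the basic mechanics of hypergraph rewriting can be sketched. Here’s an illustrative toy in Python (rather than Wolfram Language), using a made-up rule—not one from the project—that replaces every edge {x, y} with {x, z}, {z, y}, creating a fresh “atom of space” z each time:

```python
# Toy hypergraph rewriting (hypothetical rule, for illustration only).
# Each update replaces the edge (x, y) with (x, z), (z, y), where z is a
# newly created node -- a fresh "atom of space".
def step(edges, next_node):
    new_edges = []
    for x, y in edges:
        z = next_node          # create a new atom of space
        next_node += 1
        new_edges += [(x, z), (z, y)]
    return new_edges, next_node

edges, nxt = [(0, 1)], 2       # start from a single edge
for _ in range(4):
    edges, nxt = step(edges, nxt)

print(len(edges))  # each step doubles the edge count: 2**4 = 16
```

Even this trivial rule shows the key point: “space” here is nothing but the evolving pattern of connections, with no background geometry supplied from outside.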
But with this setup, what even is motion? It’s no longer something baked into our basic ideas about space. Instead—much like the ancient philosophers imagined—it’s something we can try to derive from a lower level of description. It’s not something we can take for granted—and indeed it’s going to turn out that its character depends in fundamental ways on issues like our nature as observers.
To have a concept of motion, one has to have not only a concept of space—and time—but also a concept of “things”. One has to have something definite that one can imagine moves through space with time. And in effect the concept of “pure motion” is that there can be a “thing” that “just moves” without “changing its character”. But if the thing is “made of atoms of space” that are continually getting updated, what does this mean? Somehow the identity of the “thing” has to be associated with some collective characteristic that doesn’t depend on the particular atoms of space from which it’s made.
There’s an immediate analogy here. Consider something like a vortex in a fluid. The vortex can move around as a “thing” even though “underneath” it’s made of an ever-changing collection of lots of discrete molecules. If we looked in microscopic detail, we’d see effects from those discrete molecules. But at the scale at which we humans typically operate, we just consider there to be a definite “thing” we describe as a vortex—that at this level of description exhibits “pure motion”.
Our fundamental model of space is not so different from this. At the lowest level there’s continual activity associated with the application of rules that create new atoms of space and new connections between them. And just as continual collisions between molecules in a fluid “knit together” the structure of the fluid, so also the continual rewriting of the hypergraph that connects atoms of space knits together the structure of space. But then on top of this there can be “localized collective features” that have a certain persistence. And these are the “things” (or “objects”) that we can consider to “show pure motion”.
Physics suggests two kinds of things like this. The first are particles, like electrons or photons or quarks. And the second are black holes. As of now, we have no specific evidence that particles like electrons are “made of anything”; they just seem to act like geometrical points. But in our Physics Project we posit that they are ultimately “made of space” and actually contain large numbers of atoms of space that collectively form some kind of persistent structure a bit like a vortex in a fluid.
Black holes operate on a very different scale—though I suspect they’re actually very similar in character to particles. And in fact for black holes we already have a sense from traditional general relativity that they can just be “made of space”—though without our discrete underlying model there are some inevitable mathematical hacks involved.
So what is it that leads to persistent structures? Often one can identify it as something “topological”. There’s an underlying “medium” in which all sorts of essentially continuous changes can be made. But then there are structures that can’t be created or destroyed by such continuous changes—in effect because they are “topologically distinct”. Vortices are one such example—because around the core of the vortex, independent of what “continuous deformations” one makes, there’s always a constant circulation of fluid, that can’t be gotten rid of except by some kind of discontinuous change. (In reality, of course, vortices are eventually damped out by viscosity generated as a result of microscopic motion, but the point is that this takes a while, and until it’s happened, the vortex can reasonably be considered to persistently be a “thing”.)
In our Physics Project, we’ve already been able to figure out quite a bit about how black holes work. We know less about the specifics of how particles work. But the basic idea is that somehow there are features that are local and persistent that we can identify as particles—and perhaps these features have topological origins that make it inevitable that, for example, all electrons “intrinsically seem the same”, and that there are only a discrete set of possible types of particles (at least at our energy scales).
So in the end what we imagine is that there are certain “carriers of pure motion”: certain collective features of space that are persistent enough that we can consider them to “just move”, without changing. At the outset it’s not obvious that any such features should exist at all, and that pure motion should ever be possible. Unlike in the traditional “pure geometrical” view of space, in our Physics Project it’s something one has to explicitly derive from the underlying structure of the model—though it seems quite likely that it’s ultimately an inevitable and ubiquitous consequence of rather general “topological” features of hypergraph rewriting.
We keep on talking about “features that persist”. But what does this really mean? As soon as something moves it’ll be made of different atoms of space. So what does it mean for it to “persist”? In the end it’s all about what observers perceive. Do we view it as being the “same thing” but in a different place? Or do we say it’s different because some detail of it is different?
And actually this kind of issue already comes up even before we’re talking about motion and the persistence of “objects”: it’s crucial just in the emergence of the basic notion of space itself. At the level of individual atoms of space there isn’t anything we can really call “space”, just like at the level of individual molecules there isn’t anything we can reasonably call a fluid. And instead, the notion of space—or of fluids—emerges when we look at things in the kind of way that observers like us do. We’re not tracking what’s happening at the level of individual atoms of space—or individual molecules; we’re looking at things in a more coarse-grained way, that it turns out we can summarize in terms of what amount to continuum concepts.
Once again, it’s not obvious things will work like this. Down at the level of atoms of space—or, for that matter, molecules—there are definite computational rules being followed. And from the Principle of Computational Equivalence it’s almost inevitable that there’ll be computational irreducibility, implying that there’s no way to find the outcome except in effect by doing an irreducible amount of computational work. If we as observers were computationally unbounded then, yes, we could always “decode” what’s going on, and “see down” to the behavior of individual atoms of space or individual molecules. But if we’re computationally bounded we can’t do this. And, as I’ve argued elsewhere, that’s both why we believe in the Second Law of thermodynamics, and why we perceive there to be something like ordinary “geometrical space”.
In other words, our inability to track the details means that in a first approximation we can summarize what’s going on just by saying we’ve got something that seems like our ordinary notion of space. And going one step beyond that is what has us talking about “persistent objects in space”. But now we’re back to discussing what it means for an object to “be persistent”. Ultimately it’s that we as observers somehow perceive it to “be the same”, even though perhaps in a “different place”.
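The contrast between computationally reducible and irreducible behavior is easy to exhibit in miniature. In this illustrative Python sketch (not the post’s own code), rule 254 is reducible: after t steps from a single black cell its row is simply 2t + 1 consecutive 1s, so a computationally bounded observer can “jump ahead” with a formula. For rule 30 no comparable shortcut is known, and one effectively has to do the computation step by step:

```python
# Reducibility vs. irreducibility in an elementary cellular automaton
# (illustrative sketch; the actual systems discussed are hypergraphs).
def evolve_row(rule, t, width):
    """Evolve a single centered 1 for t steps, padding boundaries with 0s."""
    row = [0] * width
    row[width // 2] = 1
    for _ in range(t):
        padded = [0] + row + [0]
        row = [(rule >> (padded[i] * 4 + padded[i + 1] * 2 + padded[i + 2])) & 1
               for i in range(width)]
    return row

t, width = 8, 40
simulated = evolve_row(254, t, width)          # explicit simulation
formula = [1 if abs(x - width // 2) <= t else 0  # closed-form "shortcut"
           for x in range(width)]
print(simulated == formula)  # True: for rule 254 the shortcut works
```

Running `evolve_row(30, t, width)` instead produces a complex row for which no such closed form is available—which is the kind of gap that computational boundedness makes unbridgeable for observers like us.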
A key finding of our Physics Project is that certain basic laws of physics—in particular general relativity and quantum mechanics—inevitably seem to emerge as soon as we assume that observers have two basic characteristics: first, that they are computationally bounded, and second, that they are persistent in time.
In our Physics Project the passage of time corresponds to the inexorable (and irreducible) computational process of updating the “spatial hypergraph” that represents the lowestlevel structure of the universe. And when we talk formally we can imagine looking at this “from the outside”. But in reality we as observers must be embedded within the system, being continually updated and changed just like the rest of the system.
But here there’s a crucial point. Even though the particular configuration of atoms in our brains is continually changing, we think it’s “still us”. Or, in other words, we have the perception that we persist through time. Now it could be that this wouldn’t be a consistent thing to imagine, and that if we imagined it, we’d never be able to form a coherent view of the world. But in fact what our Physics Project implies is that with this assumption we can (subject to various conditions) form a coherent view of the world, and it’s one where the core known laws of physics are in evidence.
OK, so we ourselves are persistent essentially because we assume that we are (and in most situations nothing goes wrong if we do this). But the persistence of something like a particle, or a black hole, is a different story. From our point of view, we’re not “inside” things like these; instead we’re “looking at them from the outside”.
But what do we notice in them? Well, that depends on our “powers of observation”. The basic idea of particles, for example, is that they should be objects that can somehow be separated from each other and from everything else. In our Physics Project, though, any particle must ultimately be “embedded as a part of space”. So when we say that it’s a “separable object” what we’re imagining is just that there’s some attribute of it that we can identify and observe independent of its “environment”.
But just what this is can depend on our characteristics as observers, and the fact that we operate on certain scales of length and time. If we were able to go down to the level of individual atoms of space we probably wouldn’t be able to “see” that there’s anything like a particle there at all. That’s something that emerges for observers with our kinds of characteristics.
Quite what the full spectrum of “conceivable persistent features” might be isn’t clear (though we’ll see some exotic possibilities below). But as soon as one can identify a persistent feature, one can ask about motion. Is it possible for that feature to “move” from being embedded at one “place” to another?
There’s yet another subtlety here, though. Our ordinary experience of motion involves things going from one place to another by progressively “visiting every place in between”. But ultimately, as soon as we’re dealing with discrete atoms of space, this can’t be how things work. And instead what we need to discuss is whether something somehow “maintains its form” at intermediate stages as it “moves”.
For example, we probably wouldn’t consider it motion in the ordinary sense if what we had was a kind of Star Trek–like “transporter” in which objects get completely disassembled, then get “transmitted to a different place” and reassembled. But somehow it does seem more like “ordinary motion” if there’s a collection of pixel values that move across a computer screen—even if at intermediate moments they are distorted by all sorts of aliasing effects.
Even in ordinary general relativity there are issues with the idea of motion—at least for extended objects. If we’re in a region of space that’s reasonably flat it’s fine. But if we’re near a spacetime singularity then inevitably objects won’t be able to “maintain their integrity”—and instead they’ll effectively be “shredded”—and so can’t be interpreted as “just moving”. When we’re dealing not with geometric continuum spacetime but instead with our spatial hypergraph, there’ll always be something analogous to “shredding” on a small enough scale, and the question is whether at the level we perceive things we’ll be able to tell that there’s something persistent that isn’t shredded.
So, in the end, how is it that things can move? Ultimately it’s something that has to be formally derived from the underlying model, based on the characteristics of the observer. At least conceptually the first step is to identify what kinds of things the observer considers “the same”, and what details make them “seem different”. Then one needs to determine whether there are structures that would be considered the same by the observer, but which progressively change “where they’re embedded”. And if so, we’ve identified “motion”.
For us humans with our current state of technological development, particles and objects made of them are the most obvious things to consider. So in a sense the question reduces to whether there are “lumps of space” that persist in maintaining (perhaps topological) features recognized by our powers of perception. And to determine this is a formal question that’s important to explore as our Physics Project progresses.
We’ve talked about “persistent structures” as “carriers of pure motion”. But how do such structures actually work? Ultimately it can be a very complicated story. But here we’ll consider a simplified case that begins to illustrate some of the issues. We’ll be talking not about the actual model of space in our Physics Project, but instead about the cellular automaton systems I’ve studied for many years in which space is effectively predefined to consist of a rigid array of cells, each with a discrete value updated according to a local rule.
Here’s an example in which there quickly emerge obvious “localized persistent structures” that we can think of as being roughly like particles:

Some “stay still” relative to the fixed cellular automaton background; others “move”. With this specific cellular automaton, it’s easy to identify certain possible “particles”, some “staying still” and some “showing motion”:

But consider instead a cellular automaton with very different behavior:

Does this support the concept of motion? Certainly not as obviously as the previous case. And in fact there doesn’t seem to be anything identifiable that systematically propagates across the system. Or in other words, at least with our typical “powers of perception” we don’t “see motion” here.
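Experiments like these are straightforward to reproduce. Here’s a minimal elementary-cellular-automaton evolver, as an illustrative Python sketch (the post’s own experiments use Wolfram Language’s CellularAutomaton), run on rule 30, whose output is complex but contains no obviously propagating localized structures:

```python
# Minimal elementary-cellular-automaton evolver (illustrative sketch).
def evolve(rule, initial, steps):
    """Evolve a list of 0/1 cells `steps` times under an elementary CA rule,
    padding with 0s so the pattern never reaches the boundary."""
    rows = [[0] * steps + list(initial) + [0] * steps]
    for _ in range(steps):
        prev = rows[-1]
        padded = [0] + prev + [0]
        rows.append([
            (rule >> (padded[i] * 4 + padded[i + 1] * 2 + padded[i + 2])) & 1
            for i in range(len(prev))
        ])
    return rows

# Rule 30 from a single black cell: complex behavior, with nothing that
# our usual powers of perception would identify as a moving "particle".
rows = evolve(30, [1], 10)
center = len(rows[0]) // 2
print([row[center] for row in rows[:4]])  # first center-column values: [1, 1, 0, 1]
```

Substituting other rule numbers (such as the class-4 rules with visible “particles”) into the same evolver is a quick way to explore the spectrum of behaviors discussed here.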
There’s a whole spectrum of more complicated cases, however. Consider for example:

Here one can easily identify “particle-like” structures, but they never seem to “keep moving forever”; instead they always fairly quickly interact and “annihilate”. But to expect otherwise is to imagine an idealization in which there is at some level “only one object” in the whole system. As soon as there are multiple objects it’s basically inevitable that they’ll eventually interact. Or, put another way, motion in any real situation will never be about “persistently moving” forever; it’s just about persisting for at least long enough to be identified as something separate and definite. (This is very similar to the situation in quantum field theory where actual particles eventually interact, even though their formal definition assumes no interaction.)
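In this rigid cellular automaton setting, the earlier criterion for motion—the same structure, differently embedded—can be made completely concrete: a configuration “moves” if after some number of steps it recurs exactly, but shifted in space. Here’s an illustrative Python sketch (not the post’s own code), using rule 170, which simply copies each cell’s right neighbor, so that any pattern is trivially a “particle” moving left one cell per step:

```python
# "Motion" as shifted recurrence in a cyclic 1D cellular automaton
# (illustrative sketch; rule 170 makes every pattern shift left each step).
def ca_step(cells, rule):
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def find_shift(initial, rule, steps):
    """Return a spatial shift s such that evolving `steps` steps reproduces
    the initial configuration rotated by s, or None if none exists."""
    state = initial
    for _ in range(steps):
        state = ca_step(state, rule)
    n = len(initial)
    for s in range(n):
        if state == initial[s:] + initial[:s]:
            return s
    return None

pattern = [0] * 8 + [1, 1, 0, 1] + [0] * 8
print(find_shift(pattern, 170, 3))  # the pattern has moved left by 3 cells
```

For a rule like 30 the same test returns None for generic patterns: nothing recognizable recurs in shifted form, which is one formal sense in which such systems don’t “support motion”.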
Here’s another case, where on a large scale there’s no “obvious motion” to be seen

but where locally one can identify rather simple “particle-like” structures

that on their own can be thought of as “exhibiting motion”, even though there are other structures that for example just expand, apparently without bound:

Sometimes there can be lots of "particle-like" activity, but with other things consistently mixed in:

Here’s a slightly more exotic example, where continual “streams of particles” are produced:

In all the examples we’ve seen so far the “particles” exist on a “blank” or otherwise simple background. But it’s also perfectly possible for them to be on a background with more elaborate structure:

But what about a seemingly random background? Here's at least a partial example where there are both structures that "respond to the background" and ones that have "intrinsic particle-like form":

What does all this mean for the concept of motion? The most important point is that we've seen that "objects" that can be thought of as "showing pure motion" can emerge even in underlying systems that don't seem to have any particular "built-in concept of motion". But what we've also seen is that along with "objects that show pure motion" there can be all sorts of other effects and phenomena. And in our actual Physics Project such effects are necessarily, in a sense, much more extreme.
The cellular automaton systems we've been discussing so far have a built-in underlying notion of space, which exists even if the system basically "doesn't do anything". But in our Physics Project the structure of space itself is created through activity. So—as we discussed in the previous section—"objects" or particles have to somehow exist "on top" of this.
It’s fairly clear roughly how such particles must work, being based for example on essentially topological features of the system. But we don’t yet know the details, and there’s probably quite a depth of mathematical formalism that needs to be built to clarify them. It’s still possible, though, to explore at least some toy examples.
Consider the hypergraph rewriting rule:

It maintains a very simple (effectively 1D and cyclic) form of space (with rewrites shown in red):

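The actual rule isn't reproduced here, but the mechanics of hypergraph rewriting can be sketched with a simple stand-in. As an assumption, the example below uses the edge-subdivision rule {{x, y}} → {{x, z}, {z, y}} (with z a fresh node), applied to the first matching hyperedge of a cyclic "1D space":

```python
# Toy hypergraph rewriting sketch. The rewrite rule is an assumed
# illustrative stand-in:  {{x, y}} -> {{x, z}, {z, y}},  z a fresh node.

def rewrite_once(edges, next_node):
    """Apply the subdivision rule to the first binary hyperedge found;
    return the new edge list and the next unused node id."""
    for k, e in enumerate(edges):
        if len(e) == 2:
            x, y = e
            z = next_node
            return edges[:k] + [[x, z], [z, y]] + edges[k + 1:], next_node + 1
    return edges, next_node  # no match: state unchanged

# A cyclic 1D "space": nodes 0,1,2 joined in a ring
state = [[0, 1], [1, 2], [2, 0]]
state, fresh = rewrite_once(state, next_node=3)
```

Repeated application keeps the ring structure while growing it, which is the sense in which a rule like this "maintains a simple form of space".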
If the initial conditions contain a feature that can be interpreted as something like a “particle” then the rules are such that this can “move around”, but can’t be destroyed:

It’s a little clearer what’s going on if instead of looking at an explicit sequence of hypergraphs we generate causal graphs (see the next section) that show the “spacetime” network of causal relationships between updating events. Here’s the causal graph for the “space only, no particles” case (where we can think of time as effectively running from left to right):

Here’s the causal graph when there’s a “particle” included:

And here’s the result when there are “two particles”—where things begin to get more complicated:

We’ve discussed what it takes for an observer to identify something as “moving” in a system. But so far there’s an important piece we’ve left out. Because in effect we’ve assumed that the observer is “outside the system” and “looking in”. But if we imagine that we’re dealing with a complete model of the physical universe the observer necessarily has to “be inside”. And ultimately the observer has got to be “made of the same stuff” as whatever thing it is to which we’re attributing motion.
How does an observer observe? Ultimately whatever is “happening in the outside world” must affect the observer, and the observer must change as a result. Our Physics Project has a fundamental way to think about change, in terms of elementary “updating events”. In addition to imagining that space is made up of discrete “atoms of space”, we imagine that change is made up of discrete “atoms of change” or “events”.
In the hypergraph that represents space and everything in it, each event updates (or “rewrites”) the hypergraph, by “consuming” some collection of hyperedges, and generating a new collection. But actually events are a more general concept that don’t for example depend on having an underlying hypergraph. We can just think of them as consuming collections of “tokens”, whatever they may be, and generating new ones.
But events satisfy a very important constraint, which in some sense is responsible for the very existence of what we think of as time. And the constraint is that for any event to happen, all the tokens it’s going to consume have to exist. But those tokens have to have “come from somewhere”. And at least if we ignore what happens “at the very beginning” every token that’s going to be consumed has to have been generated by some other event. In other words, there’s a certain necessary ordering among events. And we can capture this by constructing a causal graph that captures the causal relationships that must exist between events.
As a simple example, here’s a system that consists of a string of As and Bs, and in which each “updating event” (indicated as a yellow box) corresponds to an application of the rule BA→AB:

Here’s the causal graph for this superimposed:

Imagine that some collection of characters on the left-hand side represents “an observer”. The only way this observer can be affected by what happens on the right-hand side is as a result of its events being affected by events on the right-hand side. But what event is affected by what other event is exactly what the causal graph defines. And so in the end we can say that what the observer can “perceive” is just the causal graph of causal relationships between events.
“From the outside” we might see some particular “absolute” arrangement of events in the cellular-automaton-like picture above. But the point is that “from the inside” the observer can’t perceive this “absolute arrangement”. All they can perceive is the causal graph. Or, put another way, the observer doesn’t have any “absolute knowledge” of the system; all they “know about” is “effects on them”.
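The construction of the causal graph for the BA→AB system can be sketched concretely. As an assumption the sketch uses a simple sequential (leftmost-first) updating scheme; each token remembers which event created it, and whenever an event consumes a token, a causal edge is drawn from the token's creator event:

```python
# Sketch: causal graph of BA -> AB string rewriting.
# Scheduling (leftmost-first) is an assumed simplification; with causal
# invariance the resulting graph structure doesn't depend on this choice.

def causal_graph(s):
    # Each token is (character, creator_event_id); None = initial condition.
    tokens = [(c, None) for c in s]
    edges = set()
    event = 0
    while True:
        # find the leftmost adjacent "BA" pair
        i = next((i for i in range(len(tokens) - 1)
                  if tokens[i][0] == "B" and tokens[i + 1][0] == "A"), None)
        if i is None:
            break
        # consuming a token makes this event causally depend on its creator
        for _, creator in (tokens[i], tokens[i + 1]):
            if creator is not None:
                edges.add((creator, event))
        # produce the rewritten pair, both created by this event
        tokens[i], tokens[i + 1] = ("A", event), ("B", event)
        event += 1
    final = "".join(c for c, _ in tokens)
    return final, event, edges
```

For the initial string `"BBAA"` this yields four events whose causal edges form a diamond: event 0 feeds events 1 and 2, which both feed event 3.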
So what does this imply about motion? In something like a cellular automaton there’s a fixed concept of space that we typically “look at from the outside”—and we can readily “see what’s moving” relative to that fixed, absolute “background space”. But in something like our Physics Project we imagine that any observer must be inside the system, able to “tell what’s going on” only from the causal graph.
In standard physics we might posit that to find out “where something is” we’d have to probe it, say with light signals. Here we’ve broken everything down to the level of elementary events and we’re in some sense “representing everything that can happen” in terms of the causal graph of relationships between events.
And in fact as soon as we assume that our “perceived reality” has to be based on the causal graph, we’ve inevitably abandoned any absolute notion of space. All we as observers can know is “relative information”, defined for us by the causal graph.
Looking at our BA→AB system above we can see that “viewed from the outside” there’s a lot of arbitrariness in “when we do” each update. But it turns out that none of this matters to the causal graph we construct—because this particular underlying system has the property of causal invariance, which makes the causal graph have the same structure independent of these choices. And in general whenever there’s causal invariance (which there inevitably will be at least at the ultimate level of the ruliad) this has the important implication that there’s relativistic invariance in the system.
We won’t go into this in detail here. Because while it certainly affects the specifics of how motion works there are more fundamental issues to discuss about the underlying concept of motion itself.
We’ve already discussed the idea that observers like us posit our own persistence through time. But now we can be a bit more precise—and say that what we really posit is that we “follow the causal graph”. It could be that our perception samples all sorts of events—that we might think of as being “all over spacetime”. But in fact we assume that we don’t “jump around the causal graph”, and that instead our experiences are based on “coherent paths” through the causal graph.
We never in any absolute sense “know where we are”. But we construct our notion of place by positing that we exist at a definite—and in a sense “coherent”—place, relative to which we perceive other things. If our perception of “where we are” could “jump around” the causal graph, we’d never be able to define a coherent concept of pure motion.
To make this a little bit “more practical” let’s discuss (as I did some time ago) the question of faster-than-light travel in our Physics Project. By the very definition of the causal graph the effect of one event on another is represented by the presence of a “causal path” between the events within the graph. We can assume that “traversing” each “causal edge” (i.e. going from one event to the next) takes a certain elementary time. But to work out “how fast the effect propagated” we need to know how “far away in space” the event that was affected is.
But recall that all the observer ultimately has available is the causal graph. So any questions about “distances in space” have to be deduced from the causal graph. And the nature of the observer—and the assumptions they make about themselves—inevitably affect the deductions they make.
Imagine a causal graph that is mostly a grid, but suppose there is a single edge that “jumps across the grid”, connecting events that would otherwise be distant in the graph. If we as observers were sensitive to that single edge it’d make us think that the two events it joins are “very close together”. But if we look only at the “bulk structure” of the causal graph, we’d ignore that edge in our definition of the “layout of space”, and consider it only as some kind of “microscopic anomaly”.
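The effect of such an edge on deduced distances can be illustrated with a toy computation. Using an assumed 1D chain of events as a stand-in for the "grid", a single "space tunnel" edge collapses the graph distance between its endpoints:

```python
# Toy illustration: one "space tunnel" edge drastically changes the
# graph distance an observer would deduce from the causal graph alone.

from collections import deque

def graph_distance(edges, a, b):
    """Shortest path length between a and b in an undirected graph (BFS)."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == b:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # unreachable

# A "grid-like" chain of events 0..9; the tunnel joins its two ends
chain = [(i, i + 1) for i in range(9)]
```

Whether an observer's notion of "space" includes the tunnel then determines whether the endpoints count as distance 9 apart or distance 1 apart.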
So should we in fact include that single edge when we define our concept of motion? If we posit that we “exist at a definite place” then the presence of such an edge in what “constitutes us” means the “place we’re at” must extend to wherever in the causal graph the edge reaches. But if there are enough “stray edges” (or in general what I call “space tunnels”) we as observers would inevitably get very “delocalized”.
To be able to “observe motion” we’d better be observers who can coherently form a notion of space in which there can be consistent “local places”. And if there’s some elaborate pattern of space tunnels this could potentially be broken. Although ultimately it won’t be unless the space tunnels are somehow coherent enough to “get observers like us through them”.
Earlier we saw that the concept of motion depends on the idea that we as observers can identify “things” as “persistent” relative to the “background structure of space”. And now we can see that in fact motion depends on a certain persistence in time and “coherence” in place not only for the “thing” we posit is moving, but also for us as observers observing it.
In our Physics Project we imagine that both time and space are fundamentally discrete. But the concept of persistence—or “coherence”—implies that at least at the level of our perception there must be a certain effectively continuous character to them. There’s a certain resonance with things like Zeno’s paradoxes. Yes, our models may define only what happens at a sequence of discrete steps. But the perception that we persistently exist will make us effectively fill in all the “intervening moments”—to form what we experience as a “continuous thread of existence”.
The idea that pure motion is possible is thus intimately connected to the idea of the continuum. Pure motion in a sense posits that there is some kind of “thread of existence” for “things” that leads from one place and time to another. But ultimately all that’s relevant is that observers like us perceive there to be such a thread. And the whole point is that the possibility of such perception can be deduced as a matter of formal derivation from the structure of the underlying model and general characteristics of us as observers.
But in describing our perception what we’ll tend to do is to talk in terms of the continuum. Because that’s the level of description at which we can abstractly discuss pure motion, without having to get into the mechanics of how it happens. And in effect the “derivation of pure motion” is thus directly connected to the “derivation of the continuum”: pure motion is in a sense an operational consequence not necessarily of an actual continuum world, but of a continuum perception of the world by an embedded observer like us.
Our everyday experience of motion has to do with ordinary, physical space. But the multicomputational paradigm inspired by our Physics Project inevitably leads to other kinds of space—that are different in character and interpretation from ordinary, physical space, but have deep analogies to it. So in the context of these other kinds of space, what analogs of the concept of “pure motion” might there be?
Let’s talk first about branchial space, which in our Physics Project is interpreted as the space of quantum states. To approach this from a simple example, let’s consider the multiway graph generated by applying the rule {A→AB,B→A} in all possible ways to each “state”:

We can think of each path through this graph as defining a possible history for the system, leading to a complicated pattern of possible “threads of history”, sometimes branching and sometimes merging. But now consider taking a “branchial slice” across this system—and then characterizing the “multicomputational behavior” of the system by constructing what we call the branchial graph by joining states that share an ancestor on the step before:

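Both constructions are easy to sketch in code: enumerate all states reachable by applying the rules at every possible position, then join states that share an immediate ancestor:

```python
# Sketch: multiway evolution of {A -> AB, B -> A} and the induced
# branchial graph (states sharing an ancestor on the step before).

from itertools import combinations

RULES = [("A", "AB"), ("B", "A")]

def successors(state):
    """All states reachable in one step, applying each rule at each position."""
    out = set()
    for lhs, rhs in RULES:
        start = 0
        while (i := state.find(lhs, start)) != -1:
            out.add(state[:i] + rhs + state[i + len(lhs):])
            start = i + 1
    return out

def branchial_edges(states):
    """Branchial edges among the successors of the given slice of states."""
    edges = set()
    for s in states:
        for a, b in combinations(sorted(successors(s)), 2):
            edges.add((a, b))
    return edges
```

Starting from "A" the first step gives only "AB"; "AB" then branches to "ABB" and "AA", which become branchially connected because they share the ancestor "AB".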
For physics, we interpret the nodes of these graphs as quantum states, so that the branchial graph effectively gives us a “map of quantum entanglements” between states. And just like for the hypergraph that we imagine defines the relations between the atoms of physical space, we think about the limit of a very large branchial graph—that gives us what we can call branchial space. As we’ve discussed elsewhere, branchial space is in many ways much wilder than ordinary, physical space, and is for example probably exponential-dimensional.
In basic quantum mechanics, distances in branchial space are probably related to differences in phase between quantum amplitudes. In more complicated cases they probably correspond to more complicated transformations between quantum states. So how might we think about “motion” in branchial space?
Although we’ve discussed it at length elsewhere, we didn’t above talk about what we might call “bulk motion” in physical space, as effectively produced by the curvature of space associated with gravity. But in branchial space there seems to be a directly analogous phenomenon—in which the presence of energy (which corresponds to the density of activity in the system) leads to an effective curvature in branchial space which deflects all paths, in a way that seems to produce the change of quantum phase specified by the path integral.
But can we identify specific things moving and preserving their identity in branchial space, as we can identify things like particles moving in physical space? It’s a tricky story, incompletely figured out, and deeply connected to issues of quantum measurement. But just like in physical space, an important issue is to define what “observers like us” are like. And a crucial first step is to realize that—as entities embedded in the universe—we must inevitably have multiple histories. So to ask how we perceive what happens in the universe is in effect to ask how a “branching mind” perceives a branching universe.
And the crucial point—directly analogous to what we’ve discussed in the case of physical space—is that whatever one might be able to “see from outside”, we “internally” assume that we as observers have a certain persistence and coherence. In particular, even though “from the outside” the multiway graph might show many branching threads of history, our perception is that we have a single thread of experience.
In ordinary quantum mechanics, it’s quite tricky to see how this “conflation of threads of history” interacts even with “bulk motion” in branchial space. Typically, as in traditional quantum measurement, one just considers “snapshots” at particular times. Yes, one can imagine that things like wave packets spread out in branchial space, but—a bit like discussing “motion” for gravitational fields or even gravitational waves in spacetime—there isn’t the same kind of systematic concept of pure motion that we’ve encountered with things like particles in physical space.
When we get to quantum field theory—or the full quantum gravity associated with our models—it will probably be a different story. Perhaps we can view certain configurations of quantum fields as being like structures in branchial space, that an observer will consider to be localized and persistent. Indeed, it’s easy to imagine that in the branchial graph—or even more so the multiway causal graph—there may be things like “topologically stable” structures that we can reasonably think of as “things that move”. But just what the character and interpretation of such things might be, we don’t yet know.
There’s physical space, and there’s branchial space. But in a sense the ultimate kind of space is rulial space. The story begins with the ruliad, which represents the entangled limit of all possible computations. The ruliad is what we imagine underlies not only physics but also mathematics. When we “experience physics” we’re sampling a certain slice of the ruliad that’s accessible to physical observers like us. And when we “experience mathematics” we’re sampling a slice of the ruliad that’s accessible to “mathematical observers” like us.
So what do different “places” in rulial space correspond to? Fundamentally they’re different choices for the rules we sample from the ruliad. Ultimately everything is part of the unique object that is the ruliad. But at different places in the ruliad we’ll have different specific experiences as observers.
Inevitably, though, there’s a translation that can be made. It’s basically like the situation with different computational systems that—according to the Principle of Computational Equivalence—are generically universal: there’s always an “interpreter” that can be created in one system that can translate to the other.
In a sense the idea of different places in rulial space is quite familiar from our everyday experience. Because it’s directly analogous to the idea that different minds “parse” and “experience” the world differently. Whether one’s talking about a human brain or an artificial neural net, the details of its past experience will cause it to represent things in the world in different ways, and to process them differently.
At the very lowest level, the components of the systems will—like any other universal computer—be able to emulate the detailed operations of other systems. But at this level there are no “things that are moving from one place to another in rulial space”; everything is just being “atomized”.
So are there in fact robust structures that can “move across rulial space”? The answer, I think, is yes. But it’s a strange story. I suspect that the analog in rulial space of particles in physical space is basically concepts—say of the kind that might be represented by words in a human (or computational) language.
Imagine thinking about a cat. There’s a particular representation of a cat in one’s brain—and in detail it’ll be different from the representation in anyone else’s brain. But now imagine using the word “cat”, or in some way communicating the concept of “cat”. The “cat” concept is something robust, that we’re used to seeing “transmitted” from one brain to another—even though different brains represent it differently.
Things didn’t have to work this way. It could be that there’d be no robust way to transmit anything about the thinking going on in one brain to another brain. But that’s where the idea of concepts comes in. They’re an abstracted way to “transport” some feature of thinking in one brain to another.
And in a sense they’re a reflection of the possibility of pure motion in rulial space: they’re a way to have some kind of persistent “thing” that can be traced across rulial space.
But just like our examples of motion, the way this works depends on the characteristics of the observers observing it—and insofar as we are the observers, it therefore depends on us. We know from experience that we form concepts, and that they have a certain robustness. But why is this? In a sense, concepts are a way of coarse-graining things so that we—as computationally bounded entities—can deal with them. And the fact that we take concepts to maintain some kind of fixed meaning is part of our perception that we maintain a single persistent thread of experience.
It is strange to think that something as explicit and concrete as an electron in physical space could in some sense be similar to an abstract concept like “cat”. But this is the kind of thing that happens when one has something as fundamental and general as the ruliad underlying everything.
We know that our general characteristics as observers inevitably lead to certain general laws of physics. And so similarly we can expect that our general characteristics as observers will lead to certain general laws about the overall representation of things. Perhaps we’ll be able to identify analogs of energy and gravity and quantum mechanics. But a first step is to identify the analog of motion, and the kinds of things which can exhibit pure motion.
In physical space, particles like electrons are our basic “carriers of motion”. In rulial space “concepts” seem to be our best description of the “carriers of motion” (though there are presumably higher-level constructs too, like analogies and syntactic structures). And, yes, it might seem very odd to say that something as apparently human-centered as “concepts” can be associated with something as fundamental as motion. But as we’ve emphasized several times here, “pure motion” is something that relies on the observer, and on the observer having what amounts to a “sensory apparatus” that considers a “thing” to maintain a persistent character. So when it comes to the representation of “arbitrary content” it’s not surprising that we as observers have to talk about the fundamental way we think about things, and about constructs like concepts.
But are things like concepts the only kind of persistent structures that can exist in rulial space? They’re ones that we as observers can readily parse out of the ruliad—based for example on the particular ways of thinking that we’ve embraced so far in our intellectual development. But we can certainly imagine that there’s the possibility for “robust communication” independent, for example, of human minds.
There’s a great tendency, though, to try to relate things back to human constructs. For example, we might consider a machine-learning system that’s successfully discovered a distinction that can repeatedly be used for some purpose. And, yes, we can imagine “transporting” that to a different system. But we’ll tend to think of this again in terms of some “feature” or “concept”, even though, for example, we might not happen (at least yet) to have some word for it in a human language, or a computational language intended for use by humans.
We can similarly talk about communication with or between other animals, or, more ambitiously, we can discuss communications with or between “alien intelligences”. We might assume that we would be able to say nothing about such cases. But ultimately we imagine that everything is represented somewhere in the ruliad. And in effect by doing things like exploring arbitrarily chosen programs we can investigate possible “raw material” for “alien intelligence”.
And it’s then at some level a matter of science—or, more specifically, ruliology—to try to identify “transportable elements” between different programs, or, in effect, between different places in rulial space. At a simple level we might say we’re looking for “common principles”—which puts us back to something like “concepts”. But in general we can imagine a more elaborate computational structure for our “transportable elements” in rulial space.
In physical space we know that we can make “material objects” out of particles like electrons and quarks, and then “move these around” in physical space. Within the domain of “human-thinking rulial space” we can do something analogous with descriptions “made from known concepts”. And in both cases we can imagine that there are more general constructs that are “possible”, even though we human observers as we are now might not be able to “parse them out of the ruliad”.
The constraints of computational boundedness and perception of persistence are probably pretty fundamental to any form of experience that can be connected to us. But as we develop what amount to new sensory capabilities or new ways of thinking we can expect that our “range” as observers will at least somewhat increase.
And in a sense our very exploration of the concept of motion here can be thought of as a way to make possible a little bit more motion in rulial space. The concept of motion is a very general one. And one that we now see is deeply tied into ideas about observers and multicomputation. The question of how things can move is the same one that was asked in antiquity. But the tower of ideas that we can now bring to bear in answering it is very different, and it’s sobering to see just how far, earlier in intellectual history, we really were from being able to address it meaningfully.
One of the many surprising (and to me, unexpected) implications of our Physics Project is its suggestion of a very deep correspondence between the foundations of physics and mathematics. We might have imagined that physics would have certain laws, and mathematics would have certain theories, and that while they might be historically related, there wouldn’t be any fundamental formal correspondence between them.
But what our Physics Project suggests is that underneath everything we physically experience there is a single very general abstract structure—that we call the ruliad—and that our physical laws arise in an inexorable way from the particular samples we take of this structure. We can think of the ruliad as the entangled limit of all possible computations—or in effect a representation of all possible formal processes. And this then leads us to the idea that perhaps the ruliad might underlie not only physics but also mathematics—and that everything in mathematics, like everything in physics, might just be the result of sampling the ruliad.
Of course, mathematics as it’s normally practiced doesn’t look the same as physics. But the idea is that they can both be seen as views of the same underlying structure. What makes them different is that physical and mathematical observers sample this structure in somewhat different ways. But since in the end both kinds of observers are associated with human experience they inevitably have certain core characteristics in common. And the result is that there should be “fundamental laws of mathematics” that in some sense mirror the perceived laws of physics that we derive from our physical observation of the ruliad.
So what might those fundamental laws of mathematics be like? And how might they inform our conception of the foundations of mathematics, and our view of what mathematics really is?
The most obvious manifestation of the mathematics that we humans have developed over the course of many centuries is the few million mathematical theorems that have been published in the literature of mathematics. But what can be said in generality about this thing we call mathematics? Is there some notion of what mathematics is like “in bulk”? And what might we be able to say, for example, about the structure of mathematics in the limit of infinite future development?
When we do physics, the traditional approach has been to start from our basic sensory experience of the physical world, and of concepts like space, time and motion—and then to try to formalize our descriptions of these things, and build on these formalizations. And in its early development—for example by Euclid—mathematics took the same basic approach. But beginning a little more than a century ago there emerged the idea that one could build mathematics purely from formal axioms, without necessarily any reference to what is accessible to sensory experience.
And in a way our Physics Project begins from a similar place. Because at the outset it just considers purely abstract structures and abstract rules—typically described in terms of hypergraph rewriting—and then tries to deduce their consequences. Many of these consequences are incredibly complicated, and full of computational irreducibility. But the remarkable discovery is that when sampled by observers with certain general characteristics that make them like us, the behavior that emerges must generically have regularities that we can recognize, and in fact must follow exactly known core laws of physics.
And already this begins to suggest a new perspective to apply to the foundations of mathematics. But there’s another piece, and that’s the idea of the ruliad. We might have supposed that our universe is based on some particular chosen underlying rule, like an axiom system we might choose in mathematics. But the concept of the ruliad is in effect to represent the entangled result of “running all possible rules”. And the key point is then that it turns out that an “observer like us” sampling the ruliad must perceive behavior that corresponds to known laws of physics. In other words, without “making any choice” it’s inevitable—given what we’re like as observers—that our “experience of the ruliad” will show fundamental laws of physics.
But now we can make a bridge to mathematics. Because in embodying all possible computational processes the ruliad also necessarily embodies the consequences of all possible axiom systems. As humans doing physics we’re effectively taking a certain sampling of the ruliad. And we realize that as humans doing mathematics we’re also doing essentially the same kind of thing.
But will we see “general laws of mathematics” in the same kind of way that we see “general laws of physics”? It depends on what we’re like as “mathematical observers”. In physics, there turn out to be general laws—and concepts like space and motion—that we humans can assimilate. And in the abstract it might not be that anything similar would be true in mathematics. But it seems as if the thing mathematicians typically call mathematics is something for which it is—and where (usually in the end leveraging our experience of physics) it’s possible to successfully carve out a sampling of the ruliad that’s again one we humans can assimilate.
When we think about physics we have the idea that there’s an actual physical reality that exists—and that we experience physics within this. But in the formal axiomatic view of mathematics, things are different. There’s no obvious “underlying reality” there; instead there’s just a certain choice we make of axiom system. But now, with the concept of the ruliad, the story is different. Because now we have the idea that “deep underneath” both physics and mathematics there’s the same thing: the ruliad. And that means that insofar as physics is “grounded in reality”, so also must mathematics be.
When most working mathematicians do mathematics it seems to be typical for them to reason as if the constructs they’re dealing with (whether they be numbers or sets or whatever) are “real things”. But usually there’s a concept that in principle one could “drill down” and formalize everything in terms of some axiom system. And indeed if one wants to get a global view of mathematics and its structure as it is today, it seems as if the best approach is to work from the formalization that’s been done with axiom systems.
In starting from the ruliad and the ideas of our Physics Project we’re in effect positing a certain “theory of mathematics”. And to validate this theory we need to study the “phenomena of mathematics”. And, yes, we could do this in effect by directly “reading the whole literature of mathematics”. But it’s more efficient to start from what’s in a sense the “current prevailing underlying theory of mathematics” and to begin by building on the methods of formalized mathematics and axiom systems.
Over the past century a certain amount of metamathematics has been done by looking at the general properties of these methods. But most often when the methods are systematically used today, it’s to set up some particular mathematical derivation, normally with the aid of a computer. But here what we want to do is think about what happens if the methods are used “in bulk”. Underneath there may be all sorts of specific detailed formal derivations being done. But somehow what emerges from this is something higher level, something “more human”—and ultimately something that corresponds to our experience of pure mathematics.
How might this work? We can get an idea from an analogy in physics. Imagine we have a gas. Underneath, it consists of zillions of molecules bouncing around in detailed and complicated patterns. But most of our “human” experience of the gas is at a much more coarse-grained level—where we perceive not the detailed motions of individual molecules, but instead continuum fluid mechanics.
And so it is, I think, with mathematics. All those detailed formal derivations—for example of the kind automated theorem proving might do—are like molecular dynamics. But most of our “human experience of mathematics”—where we talk about concepts like integers or morphisms—is like fluid dynamics. The molecular dynamics is what builds up the fluid, but for most questions of “human interest” it’s possible to “reason at the fluid dynamics level”, without dropping down to molecular dynamics.
It’s certainly not obvious that this would be possible. It could be that one might start off describing things at a “fluid dynamics” level—say in the case of an actual fluid talking about the motion of vortices—but that everything would quickly get “shredded”, and that there’d soon be nothing like a vortex to be seen, only elaborate patterns of detailed microscopic molecular motions. And similarly in mathematics one might imagine that one would be able to prove theorems in terms of things like real numbers but actually find that everything gets “shredded” to the point where one has to start talking about elaborate issues of mathematical logic and different possible axiomatic foundations.
But in physics we effectively have the Second Law of thermodynamics—which we now understand in terms of computational irreducibility—that tells us that there’s a robust sense in which the microscopic details are systematically “washed out” so that things like fluid dynamics “work”. Just sometimes—like in studying Brownian motion, or hypersonic flow—the molecular dynamics level still “shines through”. But for most “human purposes” we can describe fluids just using ordinary fluid dynamics.
So what’s the analog of this in mathematics? Presumably it’s that there’s some kind of “general law of mathematics” that explains why one can so often do mathematics “purely in the large”. Just like in fluid mechanics there can be “corner-case” questions that probe down to the “molecular scale”—and indeed that’s where we can expect to see things like undecidability, as a rough analog of situations where we end up tracing the potentially infinite paths of single molecules rather than just looking at “overall fluid effects”. But somehow in most cases there’s some much stronger phenomenon at work—that effectively aggregates low-level details to allow the kind of “bulk description” that ends up being the essence of what we normally in practice call mathematics.
But is such a phenomenon something formally inevitable, or does it somehow depend on us humans “being in the loop”? In the case of the Second Law it’s crucial that we only get to track coarse-grained features of a gas—as we humans with our current technology typically do. Because if instead we watched and decoded what every individual molecule does, we wouldn’t end up identifying anything like the usual bulk “Second-Law” behavior. In other words, the emergence of the Second Law is in effect a direct consequence of the fact that it’s us humans—with our limitations on measurement and computation—who are observing the gas.
So is something similar happening with mathematics? At the underlying “molecular level” there’s a lot going on. But the way we humans think about things, we’re effectively taking just particular kinds of samples. And those samples turn out to give us “general laws of mathematics” that give us our usual experience of “human-level mathematics”.
To ultimately ground this we have to go down to the fully abstract level of the ruliad, but we’ll already see many core effects by looking at mathematics essentially just at a traditional “axiomatic level”, albeit “in bulk”.
The full story—and the full correspondence between physics and mathematics—requires in a sense “going below” the level at which we have recognizable formal axiomatic mathematical structures; it requires going to a level at which we’re just talking about making everything out of completely abstract elements, which in physics we might interpret as “atoms of space” and in mathematics as some kind of “symbolic raw material” below variables and operators and everything else familiar in traditional axiomatic mathematics.
The deep correspondence we’re describing between physics and mathematics might make one wonder to what extent the methods we use in physics can be applied to mathematics, and vice versa. In axiomatic mathematics the emphasis tends to be on looking at particular theorems and seeing how they can be knitted together with proofs. And one could certainly imagine an analogous “axiomatic physics” in which one does particular experiments, then sees how they can “deductively” be knitted together. But our impression that there’s an “actual reality” to physics makes us seek broader laws. And the correspondence between physics and mathematics implied by the ruliad now suggests that we should be doing this in mathematics as well.
What will we find? Some of it in essence just confirms impressions that working pure mathematicians already have. But it provides a definite framework for understanding these impressions and for seeing what their limits may be. It also lets us address questions like why undecidability is so comparatively rare in practical pure mathematics, and why it is so common to discover remarkable correspondences between apparently quite different areas of mathematics. And beyond that, it suggests a host of new questions and approaches both to mathematics and metamathematics—that help frame the foundations of the remarkable intellectual edifice that we call mathematics.
If we “drill down” to what we’ve called above the “molecular level” of mathematics, what will we find there? There are many technical details (some of which we’ll discuss later) about the historical conventions of mathematics and its presentation. But in broad outline we can think of there as being a kind of “gas” of “mathematical statements”—like 1 + 1 = 2 or x + y = y + x—represented in some specified symbolic language. (And, yes, Wolfram Language provides a well-developed example of what that language can be like.)
But how does the “gas of statements” behave? The essential point is that new statements are derived from existing ones by “interactions” that implement laws of inference (like that q can be derived from the statement p and the statement “p implies q”). And if we trace the paths by which one statement can be derived from others, these correspond to proofs. And the whole graph of all these derivations is then a representation of the possible historical development of mathematics—with slices through this graph corresponding to the sets of statements reached at a given stage.
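The way new statements “condense out” of existing ones can be sketched very simply. Here is a minimal illustration of my own (in Python rather than Wolfram Language, so names and structure are assumptions, not the document’s actual code) of a “gas of statements” growing under one law of inference, modus ponens, with each derivation recorded as a proof-graph edge:

```python
# Grow a "gas of statements" under modus ponens.
# Statements are atoms (strings) or implications ("implies", p, q).

def modus_ponens_closure(statements, max_steps=10):
    """Repeatedly derive q from p together with ("implies", p, q),
    recording each derivation as a proof-graph edge."""
    statements = set(statements)
    proof_edges = []
    for _ in range(max_steps):
        new = set()
        for s in statements:
            if isinstance(s, tuple) and s[0] == "implies":
                _, p, q = s
                if p in statements and q not in statements:
                    new.add(q)
                    proof_edges.append(((p, s), q))
        if not new:            # nothing further derivable: closure reached
            break
        statements |= new
    return statements, proof_edges

# From p, "p implies q" and "q implies r" we reach r after two rounds.
stmts, edges = modus_ponens_closure(
    {"p", ("implies", "p", "q"), ("implies", "q", "r")})
```

The `proof_edges` list is exactly the raw material of the derivation graph described below: each edge records which statements “interacted” to produce a new one.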
By talking about things like a “gas of statements” we’re making this sound a bit like physics. But while in physics a gas consists of actual, physical molecules, in mathematics our statements are just abstract things. But this is where the discoveries of our Physics Project start to be important. Because in our project we’re “drilling down” beneath for example the usual notions of space and time to an “ultimate machine code” for the physical universe. And we can think of that ultimate machine code as operating on things that are in effect just abstract constructs—very much like in mathematics.
In particular, we imagine that space and everything in it is made up of a giant network (hypergraph) of “atoms of space”—with each “atom of space” just being an abstract element that has certain relations with other elements. The evolution of the universe in time then corresponds to the application of computational rules that (much like laws of inference) take abstract relations and yield new relations—thereby progressively updating the network that represents space and everything in it.
But while the individual rules may be very simple, the whole detailed pattern of behavior to which they lead is normally very complicated—and typically shows computational irreducibility, so that there’s no way to systematically find its outcome except in effect by explicitly tracing each step. But despite all this underlying complexity it turns out—much like in the case of an ordinary gas—that at a coarse-grained level there are much simpler (“bulk”) laws of behavior that one can identify. And the remarkable thing is that these turn out to be exactly general relativity and quantum mechanics (which, yes, end up being the same theory when looked at in terms of an appropriate generalization of the notion of space).
But down at the lowest level, is there some specific computational rule that’s “running the universe”? I don’t think so. Instead, I think that in effect all possible rules are always being applied. And the result is the ruliad: the entangled structure associated with performing all possible computations.
But what then gives us our experience of the universe and of physics? Inevitably we are observers embedded within the ruliad, sampling only certain features of it. But what features we sample are determined by the characteristics of us as observers. And what seem to be critical to have “observers like us” are basically two characteristics. First, that we are computationally bounded. And second, that we somehow persistently maintain our coherence—in the sense that we can consistently identify what constitutes “us” even though the detailed atoms of space involved are continually changing.
But we can think of different “observers like us” as taking different specific samples, corresponding to different reference frames in rulial space, or just different positions in rulial space. These different observers may describe the universe as evolving according to different specific underlying rules. But the crucial point is that the general structure of the ruliad implies that so long as the observers are “like us”, it’s inevitable that their perception of the universe will be that it follows things like general relativity and quantum mechanics.
It’s very much like what happens with a gas of molecules: to an “observer like us” there are the same gas laws and the same laws of fluid dynamics essentially independent of the detailed structure of the individual molecules.
So what does all this mean for mathematics? The crucial and at first surprising point is that the ideas we’re describing in physics can in effect immediately be carried over to mathematics. And the key is that the ruliad represents not only all physics, but also all mathematics—and it shows that these are not just related, but in some sense fundamentally the same.
In the traditional formulation of axiomatic mathematics, one talks about deriving results from particular axiom systems—say Peano Arithmetic, or ZFC set theory, or the axioms of Euclidean geometry. But the ruliad in effect represents the entangled consequences not just of specific axiom systems but of all possible axiom systems (as well as all possible laws of inference).
But from this structure that in a sense corresponds to all possible mathematics, how do we pick out any particular mathematics that we’re interested in? The answer is that just as we are limited observers of the physical universe, so we are also limited observers of the “mathematical universe”.
But what are we like as “mathematical observers”? As I’ll argue in more detail later, we inherit our core characteristics from those we exhibit as “physical observers”. And that means that when we “do mathematics” we’re effectively sampling the ruliad in much the same way as when we “do physics”.
We can operate in different rulial reference frames, or at different locations in rulial space, and these will correspond to picking out different underlying “rules of mathematics”, or essentially using different axiom systems. But now we can make use of the correspondence with physics to say that we can also expect there to be certain “overall laws of mathematics” that are the result of general features of the ruliad as perceived by observers like us.
And indeed we can expect that in some formal sense these overall laws will have exactly the same structure as those in physics—so that in effect in mathematics we’ll have something like the notion of space that we have in physics, as well as formal analogs of things like general relativity and quantum mechanics.
What does this mean? It implies that just as it’s possible to have coherent “higher-level descriptions” in physics that don’t just operate down at the level of atoms of space, so also this should be possible in mathematics. And this in a sense is why we can expect to consistently do what I described above as “human-level mathematics”, without usually having to drop down to the “molecular level” of specific axiomatic structures (or below).
Say we’re talking about the Pythagorean theorem. Given some particular detailed axiom system for mathematics we can imagine using it to build up a precise—if potentially very long and pedantic—representation of the theorem. But let’s say we change some detail of our axioms, say associated with the way they talk about sets, or real numbers. We’ll almost certainly still be able to build up something we consider to be “the Pythagorean theorem”—even though the details of the representation will be different.
In other words, this thing that we as humans would call “the Pythagorean theorem” is not just a single point in the ruliad, but a whole cloud of points. And now the question is: what happens if we try to derive other results from the Pythagorean theorem? It might be that each particular representation of the theorem—corresponding to each point in the cloud—would lead to quite different results. But it could also be that essentially the whole cloud would coherently lead to the same results.
And the claim from the correspondence with physics is that there should be “general laws of mathematics” that apply to “observers like us” and that ensure that there’ll be coherence between all the different specific representations associated with the cloud that we identify as “the Pythagorean theorem”.
In physics it could have been that we’d always have to separately say what happens to every atom of space. But we know that there’s a coherent higher-level description of space—in which for example we can just imagine that objects can move while somehow maintaining their identity. And we can now expect that it’s the same kind of thing in mathematics: that just as there’s a coherent notion of space in physics where things can for example move without being “shredded”, so also this will happen in mathematics. And this is why it’s possible to do “higher-level mathematics” without always dropping down to the lowest level of axiomatic derivations.
It’s worth pointing out that even in physical space a concept like “pure motion” in which objects can move while maintaining their identity doesn’t always work. For example, close to a spacetime singularity, one can expect to eventually be forced to see through to the discrete structure of space—and for any “object” to inevitably be “shredded”. But most of the time it’s possible for observers like us to maintain the idea that there are coherent large-scale features whose behavior we can study using “bulk” laws of physics.
And we can expect the same kind of thing to happen with mathematics. Later on, we’ll discuss more specific correspondences between phenomena in physics and mathematics—and we’ll see the effects of things like general relativity and quantum mechanics in mathematics, or, more precisely, in metamathematics.
But for now, the key point is that we can think of mathematics as somehow being made of exactly the same stuff as physics: they’re both just features of the ruliad, as sampled by observers like us. And in what follows we’ll see the great power that arises from using this to combine the achievements and intuitions of physics and mathematics—and how this lets us think about new “general laws of mathematics”, and view the ultimate foundations of mathematics in a different light.
Consider all the mathematical statements that have appeared in mathematical books and papers. We can view these in some sense as the “observed phenomena” of (human) mathematics. And if we’re going to make a “general theory of mathematics” a first step is to do something like we’d typically do in natural science, and try to “drill down” to find a uniform underlying model—or at least representation—for all of them.
At the outset, it might not be clear what sort of representation could possibly capture all those different mathematical statements. But what’s emerged over the past century or so—with particular clarity in Mathematica and the Wolfram Language—is that there is in fact a rather simple and general representation that works remarkably well: a representation in which everything is a symbolic expression.
One can view a symbolic expression such as f[g[x][y, h[z]], w] as a hierarchical or tree structure, in which at every level some particular “head” (like f) is “applied to” one or more arguments. Often in practice one deals with expressions in which the heads have “known meanings”—as in Times[Plus[2, 3], 4] in Wolfram Language. And with this kind of setup symbolic expressions are reminiscent of human natural language, with the heads basically corresponding to “known words” in the language.
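The hierarchical structure of such an expression can be made concrete in a short sketch (my own illustration in Python, not the document’s Wolfram Language code; the nested-tuple encoding is an assumption made for the example), with the head as the first component at each level:

```python
# A symbolic expression as a head applied to arguments, mirroring the
# tree structure of f[g[x][y, h[z]], w].
# Atoms are strings; compound expressions are (head, arg1, arg2, ...).

def leaves(expr):
    """Collect the atomic symbols at the leaves of an expression tree."""
    if isinstance(expr, str):
        return [expr]
    out = []
    for part in expr:           # part 0 is the head, the rest are arguments
        out.extend(leaves(part))
    return out

def depth(expr):
    """Depth of the expression tree (atoms have depth 0)."""
    if isinstance(expr, str):
        return 0
    return 1 + max(depth(part) for part in expr)

# f[g[x][y, h[z]], w]: note that the head g[x] is itself a compound expression.
expr = ("f", (("g", "x"), "y", ("h", "z")), "w")
```

Because heads can themselves be arbitrary expressions (as with g[x] here), the representation is uniform all the way down.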
And presumably it’s this familiarity from human natural language that’s caused “human natural mathematics” to develop in a way that can so readily be represented by symbolic expressions.
But in typical mathematics there’s an important wrinkle. One often wants to make statements not just about particular things but about whole classes of things. And it’s common to then just declare that some of the “symbols” (like, say, x) that appear in an expression are “variables”, while others (like, say, Plus) are not. But in our effort to capture the essence of mathematics as uniformly as possible it seems much better to burn the idea of an object representing a whole class of things right into the structure of the symbolic expression.
And indeed this is a core idea in the Wolfram Language, where something like x or f is just a “symbol that stands for itself”, while x_ is a pattern (named x) that can stand for anything. (More precisely, _ on its own is what stands for “anything”, and x_—which can also be written x:_—just says that whatever _ stands for in a particular instance will be called x.)
Then with this notation an example of a “mathematical statement” might be:

x_ ∘ y_ == (y_ ∘ x_) ∘ y_
In more explicit form we could write this as Equal[f[x_, y_], f[f[y_, x_], y_]]—where Equal (==) has the “known meaning” of representing equality. But what can we do with this statement? At a “mathematical level” the statement asserts that x ∘ y and (y ∘ x) ∘ y should be considered equivalent. But thinking in terms of symbolic expressions there’s now a more explicit, lower-level, “structural” interpretation: that any expression whose structure matches x_ ∘ y_ can equivalently be replaced by (y_ ∘ x_) ∘ y_ (or, in Wolfram Language notation, just (y ∘ x) ∘ y) and vice versa. We can indicate this interpretation using the notation
x_ ∘ y_ ⟷ (y_ ∘ x_) ∘ y_
which can be viewed as a shorthand for the pair of Wolfram Language rules:
{x_ ∘ y_ -> (y ∘ x) ∘ y, (y_ ∘ x_) ∘ y_ -> x ∘ y}
OK, so let’s say we have the expression . Now we can just apply the rules defined by our statement. Here’s what happens if we do this just once in all possible ways:
And here we see, for example, that can be transformed to . Continuing this we build up a whole multiway graph. After just one more step we get:
Continuing for a few more steps we then get
or in a different rendering:
But what does this graph mean? Essentially it gives us a map of equivalences between expressions—with any pair of expressions that are connected being equivalent. So, for example, it turns out that the expressions and are equivalent, and we can “prove this” by exhibiting a path between them in the graph:
The steps on the path can then be viewed as steps in the proof, where here at each step we’ve indicated where the transformation in the expression took place:
In mathematical terms, we can then say that starting from the “axiom” we were able to prove a certain equivalence theorem between two expressions. We gave a particular proof. But there are others, for example the “less efficient” 35step one
corresponding to the path:
For our later purposes it’s worth talking in a little bit more detail here about how the steps in these proofs actually proceed. Consider the expression:
We can think of this as a tree:
Our axiom can then be represented as:
In terms of trees, our first proof becomes
where we’re indicating at each step which piece of tree gets “substituted for” using the axiom.
What we’ve done so far is to generate a multiway graph for a certain number of steps, and then to see if we can find a “proof path” in it for some particular statement. But what if we are given a statement, and asked whether it can be proved within the specified axiom system? In effect this asks whether if we make a sufficiently large multiway graph we can find a path of any length that corresponds to the statement.
If our system was computationally reducible we could expect always to be able to find a finite answer to this question. But in general—with the Principle of Computational Equivalence and the ubiquitous presence of computational irreducibility—it’ll be common that there is no fundamentally better way to determine whether a path exists than effectively to try explicitly generating it. If we knew, for example, that the intermediate expressions generated always remained of bounded length, then this would still be a bounded problem. But in general the expressions can grow to any size—with the result that there is no general upper bound on the length of path necessary to prove even a statement about equivalence between small expressions.
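The multiway construction and the search for proof paths can be sketched compactly. The following is my own illustration in Python (not the document’s Wolfram Language code): it implements the axiom x∘y ⟷ (y∘x)∘y from the text on expressions encoded as nested pairs, and uses breadth-first search so that the first path found is a shortest proof:

```python
from collections import deque

# Multiway rewriting with the axiom x∘y <-> (y∘x)∘y.
# An expression is an atom (a string) or a pair (l, r) standing for l∘r.

def rewrites(e):
    """All expressions reachable from e by one application of the axiom,
    at any subexpression, in either direction."""
    if isinstance(e, str):
        return set()
    l, r = e
    results = {((r, l), r)}                    # forward: x∘y -> (y∘x)∘y
    if isinstance(l, tuple) and l[0] == r:
        results.add((l[1], r))                 # reverse: (y∘x)∘y -> x∘y
    results |= {(sub, r) for sub in rewrites(l)}
    results |= {(l, sub) for sub in rewrites(r)}
    return results

def proof_path(start, goal, limit=10000):
    """Breadth-first search for a shortest proof path start -> goal."""
    parent, frontier = {start: None}, deque([start])
    while frontier and len(parent) < limit:
        e = frontier.popleft()
        if e == goal:
            path = []
            while e is not None:
                path.append(e)
                e = parent[e]
            return path[::-1]
        for nxt in rewrites(e):
            if nxt not in parent:
                parent[nxt] = e
                frontier.append(nxt)
    return None                                # no proof found within limit

# a∘b rewrites to (b∘a)∘b, whose left subexpression then rewrites,
# giving the two-step equivalence a∘b <-> ((a∘b)∘a)∘b.
path = proof_path(("a", "b"), ((("a", "b"), "a"), "b"))
```

The `limit` cutoff reflects exactly the point made above: since expressions can grow without bound, there is in general no upper bound on how far the search must go, so in practice one can only explore a finite portion of the multiway graph.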
For example, for the axiom we are using here, we can look at statements of the form . Then this shows how many expressions expr of what sizes have shortest proofs of with progressively greater lengths:
And for example if we look at the statement
its shortest proof is
where, as is often the case, there are intermediate expressions that are longer than the final result.
The multiway graphs in the previous section are in a sense fundamentally metamathematical. Their “raw material” is mathematical statements. But what they represent are the results of operations—like substitution—that are defined at a kind of meta level, that “talks about mathematics” but isn’t itself immediately “representable as mathematics”. But to help understand this relationship it’s useful to look at simple cases where it’s possible to make at least some kind of correspondence with familiar mathematical concepts.
Consider for example the axiom
that we can think of as representing commutativity of the binary operator ∘. Now consider using substitution to “apply this axiom”, say starting from the expression . The result is the (finite) multiway graph:
Conflating the pairs of edges going in opposite directions, the resulting graphs starting from any expression involving n ∘’s (and distinct variables) are:
And these are just the Boolean hypercubes, each with 2^n nodes.
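This hypercube structure is easy to check computationally. In this sketch (my own illustration in Python; the encoding is an assumption for the example), commutativity can flip each of the n ∘-nodes of an expression tree independently, so the multiway graph is an n-dimensional Boolean hypercube in which every node has degree n:

```python
def flips(e):
    """All one-step applications of commutativity x∘y -> y∘x in e.
    Expressions are atoms (strings) or pairs (l, r) standing for l∘r."""
    if isinstance(e, str):
        return set()
    l, r = e
    results = {(r, l)}                      # flip at the root
    results |= {(sub, r) for sub in flips(l)}
    results |= {(l, sub) for sub in flips(r)}
    return results

def multiway(start):
    """Explore the full multiway graph reachable from start."""
    seen, frontier, edges = {start}, [start], set()
    while frontier:
        e = frontier.pop()
        for nxt in flips(e):
            edges.add(frozenset({e, nxt}))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen, edges

# (x∘y)∘z has n = 2 ∘-nodes, so we get the 2-cube: a 4-cycle.
nodes, edges = multiway((("x", "y"), "z"))
```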
If instead of commutativity we consider the associativity axiom
then we get a simple “ring” multiway graph:
With both associativity and commutativity we get:
What is the mathematical significance of this object? We can think of our axioms as being the general axioms for a commutative semigroup. And if we build a multiway graph—say starting with —we’ll find out what expressions are equivalent to in any commutative semigroup—or, in other words, we’ll get a collection of theorems that are “true for any commutative semigroup”:
But what if we want to deal with a “specific semigroup” rather than a generic one? We can think of our symbols a and b as generators of the semigroup, and then we can add relations, as in:
And the result of this will be that we get more equivalences between expressions:
The multiway graph here is still finite, however, giving a finite number of equivalences. But let’s say instead that we add the relations:
Then if we start from a we get a multiway graph that begins like
but just keeps growing forever (here shown after 6 steps):
And what this then means is that there are an infinite number of equivalences between expressions. We can think of our basic symbols a and b as being generators of our semigroup. Then our expressions correspond to “words” in the semigroup formed from these generators. The fact that the multiway graph is infinite then tells us that there are an infinite number of equivalences between words.
But when we think about the semigroup mathematically we’re typically not so interested in specific words as in the overall “distinct elements” in the semigroup, or in other words, in those “clusters of words” that don’t have equivalences between them. And to find these we can imagine starting with all possible expressions, then building up multiway graphs from them. Many of the graphs grown from different expressions will join up. But what we want to know in the end is how many disconnected graph components are ultimately formed. And each of these will correspond to an element of the semigroup.
As a simple example, let’s start from all words of length 2:
The multiway graphs formed from each of these after 1 step are:
But these graphs in effect “overlap”, leaving three disconnected components:
After 2 steps the corresponding result has two components:
And if we start with longer (or shorter) words, and run for more steps, we’ll keep finding the same result: that there are just two disconnected “droplets” that “condense out” of the “gas” of all possible initial words:
And what this means is that our semigroup ultimately has just two distinct elements—each of which can be represented by any of the different (“equivalent”) words in each “droplet”. (In this particular case the droplets just contain respectively all words with an odd or even number of b’s.)
In the mathematical analysis of semigroups (as well as groups), it’s common to ask what happens if one forms products of elements. In our setting what this means is in effect that one wants to “combine droplets using ∘”. The simplest words in our two droplets are respectively a and b. And we can use these as “representatives of the droplets”. Then we can see how multiplication by a and by b transforms words from each droplet:
With only finite words the multiplications will sometimes not “have an immediate target” (so they are not indicated here). But in the limit of an infinite number of multiway steps, every multiplication will “have a target” and we’ll be able to summarize the effect of multiplication in our semigroup by the graph:
More familiar as mathematical objects than semigroups are groups. And while their axioms are slightly more complicated, the basic setup we’ve discussed for semigroups also applies to groups. And indeed the graph we’ve just generated for our semigroup is very much like a standard Cayley graph that we might generate for a group—in which the nodes are elements of the group and the edges define how one gets from one element to another by multiplying by a generator. (One technical detail is that in Cayley graphs identity-element self-loops are normally dropped.)
Consider the group Z2 × Z2 (the “Klein four-group”). In our notation the axioms for this group can be written:
Given these axioms we do the same construction as for the semigroup above. And what we find is that now four “droplets” emerge, corresponding to the four elements of the group, and the pattern of connections between them in the limit yields exactly the Cayley graph for Z2 × Z2.
We can view what’s happening here as a first example of something we’ll return to at length later: the idea of “parsing out” recognizable mathematical concepts (here things like elements of groups) from lower-level “purely metamathematical” structures.
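As a cross-check on the limiting structure, the Cayley graph of the Klein four-group can also be built directly. This is a sketch of my own (in Python, not the document’s code), taking the elements of Z2 × Z2 and joining each element to its products with the two standard generators:

```python
from itertools import product

# Cayley graph of the Klein four-group Z2 x Z2 with generators (1,0), (0,1).
elements = list(product((0, 1), repeat=2))
generators = [(1, 0), (0, 1)]

def mult(g, h):
    """Group operation: componentwise addition mod 2."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

edges = {frozenset({g, mult(g, s)}) for g in elements for s in generators}
# Both generators have order 2, so each edge is traversed in both
# directions and there are no self-loops: 4 elements joined in a 4-cycle.
```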
In multiway graphs like those we’ve shown in previous sections we routinely generate very large numbers of “mathematical” expressions. But how are these expressions related to each other? And in some appropriate limit can we think of them all being embedded in some kind of “metamathematical space”?
It turns out that this is the direct analog of what in our Physics Project we call branchial space, and what in that case defines a map of the entanglements between branches of quantum history. In the mathematical case, let’s say we have a multiway graph generated using the axiom:
After a few steps starting from we have:
Now—just as in our Physics Project—let’s form a branchial graph by looking at the final expressions here and connecting them if they are “entangled” in the sense that they share an ancestor on the previous step:
There’s some trickiness here associated with loops in the multiway graph (which are the analog of closed timelike curves in physics) and what it means to define different “steps in evolution”. But just iterating once more the construction of the multiway graph, we get a branchial graph:
After a couple more iterations the structure of the branchial graph is (with each node sized according to the size of expression it represents):
Continuing another iteration, the structure becomes:
And in essence this structure can indeed be thought of as defining a kind of “metamathematical space” in which the different expressions are embedded. But what is the “geography” of this space? This shows how expressions (drawn as trees) are laid out on a particular branchial graph
and we see that there is at least a general clustering of similar trees on the graph—indicating that “similar expressions” tend to be “nearby” in the metamathematical space defined by this axiom system.
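The core of the branchial-graph construction can be sketched in a few lines (my own illustration in Python; the toy successor relation is hypothetical, purely for the example): given the one-step successors of each state in the multiway system, connect every pair of successor states that share a parent:

```python
from itertools import combinations

def branchial_edges(states, successors):
    """Pairs of distinct successor states that share a common ancestor
    on the previous step (i.e. "entangled" states)."""
    edges = set()
    for parent in states:
        for a, b in combinations(sorted(successors(parent)), 2):
            edges.add((a, b))
    return edges

# Toy successor relation (hypothetical, purely for illustration):
succ = {"s": ["p", "q"], "t": ["q", "r"]}
edges = branchial_edges(["s", "t"], lambda e: succ[e])
# "p" and "q" are entangled via "s"; "q" and "r" via "t".
```

Note how "q", having two distinct ancestors, ends up linking the two ancestral branches together, which is how larger branchial graphs become connected.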
An important feature of branchial graphs is that effects are—essentially by construction—always local in the branchial graph. For example, if one changes an expression at a particular step in the evolution of a multiway system, it can only affect a region of the branchial graph that essentially expands by one edge per step.
One can think of the affected region—in analogy with a light cone in spacetime—as being the “entailment cone” of a particular expression. The edge of the entailment cone in effect expands at a certain “maximum metamathematical speed” in metamathematical (i.e. branchial) space—which one can think of as being measured in units of “expression change per multiway step”.
By analogy with physics one can start talking in general about motion in metamathematical space. A particular proof path in the multiway graph will progressively “move around” in the branchial graph that defines metamathematical space. (Yes, there are many subtle issues here, not least the fact that one has to imagine a certain kind of limit being taken so that the structure of the branchial graph is “stable enough” to “just be moving around” in something like a “fixed background space”.)
By the way, the shortest proof path in the multiway graph is the analog of a geodesic in spacetime. And later we’ll talk about how the “density of activity” in the branchial graph is the analog of energy in physics, and how it can be seen as “deflecting” the path of geodesics, just as gravity does in spacetime.
It’s worth mentioning just one further subtlety. Branchial graphs are in effect associated with “transverse slices” of the multiway graph—but there are many consistent ways to make these slices. In physics terms one can think of the foliations that define different choices of sequences of slices as being like “reference frames” in which one is specifying a sequence of “simultaneity surfaces” (here “branchtime hypersurfaces”). The particular branchial graphs we’ve shown here are ones associated with what in physics might be called the cosmological rest frame in which every node is the result of the same number of updates since the beginning.
A rule like
✕

defines transformations for any expressions and . So, for example, if we use the rule from left to right on the expression the “pattern variable” will be taken to be a while will be taken to be b ∘ a, and the result of applying the rule will be .
But consider instead the case where our rule is:
✕

Applying this rule (from left to right) to we’ll now get . And applying the rule to we’ll get . But what should we make of those ’s? And in particular, are they “the same”, or not?
A pattern variable like z_ can stand for any expression. But do two different z_’s have to stand for the same expression? In a rule like … we’re assuming that, yes, the two z_’s always stand for the same expression. But if the z_’s appear in different rules it’s a different story. Because in that case we’re dealing with two separate and unconnected z_’s—that can stand for completely different expressions.
To begin seeing how this works, let’s start with a very simple example. Consider the (for now, one-way) rule
✕

where is the literal symbol , and x_ is a pattern variable. Applying this to we might think we could just write the result as:
✕

Then if we apply the rule again both branches will give the same expression , so there’ll be a merge in the multiway graph:
✕

But is this really correct? Well, no. Because really those should be two different x_’s, that could stand for two different expressions. So how can we indicate this? One approach is just to give every “generated” x_ a new name:
✕

But this result isn’t really correct either. Because if we look at the second step we see the two expressions and . But what’s really the difference between these? The names are arbitrary; the only constraint is that within any given expression they have to be different. But between expressions there’s no such constraint. And in fact and both represent exactly the same class of expressions: any expression of the form .
So in fact it’s not correct that there are two separate branches of the multiway system producing two separate expressions. Because those two branches produce equivalent expressions, which means they can be merged. And turning both equivalent expressions into the same canonical form we get:
✕

It’s important to notice that this isn’t the same result as what we got when we assumed that every x_ was the same. Because then our final result was the expression which can match but not —whereas now the final result is which can match both and .
This may seem like a subtle issue. But it’s critically important in practice. Not least because generated variables are in effect what make up all “truly new stuff” that can be produced. With a rule like one’s essentially just taking whatever one started with, and successively rearranging the pieces of it. But with a rule like there’s something “truly new” generated every time z_ appears.
By the way, the basic issue of “generated variables” isn’t something specific to the particular symbolic expression setup we’ve been using here. For example, there’s a direct analog of it in the hypergraph rewriting systems that appear in our Physics Project. But in that case there’s a particularly clear interpretation: the analog of “generated variables” are new “atoms of space” produced by the application of rules. And far from being some kind of footnote, these “generated atoms of space” are what make up everything we have in our universe today.
The issue of generated variables—and especially their naming—is the bane of all sorts of formalism for mathematical logic and programming languages. As we’ll see later, it’s perfectly possible to “go to a lower level” and set things up with no names at all, for example using combinators. But without names, things tend to seem quite alien to us humans—and certainly if we want to understand the correspondence with standard presentations of mathematics it’s pretty necessary to have names. So at least for now we’ll keep names, and handle the issue of generated variables by uniquifying their names, and canonicalizing every time we have a complete expression.
Let’s look at another example to see the importance of how we handle generated variables. Consider the rule:
✕

If we start with a ∘ a and do no uniquification, we’ll get:
✕

With uniquification, but not canonicalization, we’ll get a pure tree:
✕

But with canonicalization this is reduced to:
✕

A confusing feature of this particular example is that this same result would have been obtained just by canonicalizing the original “assume all x_’s are the same” case.
But things don’t always work this way. Consider the rather trivial rule
✕

starting from . If we don’t do uniquification, and don’t do canonicalization, we get:
✕

If we do uniquification (but not canonicalization), we get a pure tree:
✕

But if we now canonicalize this, we get:
✕

And this is now not the same as what we would get by canonicalizing, without uniquifying:
✕

In what we’ve done so far, we’ve always talked about applying rules (like ) to expressions (like or ). But if everything is a symbolic expression there shouldn’t really need to be a distinction between “rules” and “ordinary expressions”. They’re all just expressions. And so we should be able to apply rules to rules just as well as to ordinary expressions.
And indeed the concept of “applying rules to rules” is something that has a familiar analog in standard mathematics. The “two-way rules” we’ve been using effectively define equivalences—which are very common kinds of statements in mathematics, though in mathematics they’re usually written with rather than with . And indeed, many axioms and many theorems are specified as equivalences—and in equational logic one takes everything to be defined using equivalences. And when one’s dealing with theorems (or axioms) specified as equivalences, the basic way one derives new theorems is by applying one theorem to another—or in effect by applying rules to rules.
As a specific example, let’s say we have the “axiom”:
✕

We can now apply this to the rule
✕

to get (where since is equivalent to we’re sorting each two-way rule that arises)
✕

or after a few more steps:
✕

In this example all that’s happening is that the substitutions specified by the axiom are getting separately applied to the left and righthand sides of each rule that is generated. But if we really take seriously the idea that everything is a symbolic expression, things can get a bit more complicated.
Consider for example the rule:
✕

If we apply this to
✕

then if x_ “matches any expression” it can match the whole expression giving the result:
✕

Standard mathematics doesn’t have an obvious meaning for something like this—although as soon as one “goes metamathematical” it’s fine. But in an effort to maintain contact with standard mathematics we’ll for now have the “meta rule” that x_ can’t match an expression whose top-level operator is . (As we’ll discuss later, including such matches would allow us to do exotic things like encode set theory within arithmetic, which is again something usually considered to be “syntactically prevented” in mathematical logic.)
Another—still more obscure—meta rule we have is that x_ can’t “match inside a variable”. In Wolfram Language, for example, a_ has the full form Pattern[a,Blank[]], and one could imagine that x_ could match “internal pieces” of this. But for now, we’re going to treat all variables as atomic—even though later on, when we “descend below the level of variables”, the story will be different.
When we apply a rule like to we’re taking a rule with pattern variables, and doing substitutions with it on a “literal expression” without pattern variables. But it’s also perfectly possible to apply pattern rules to pattern rules—and indeed that’s what we’ll mostly do below. But in this case there’s another subtle issue that can arise. Because if our rule generates variables, we can end up with two different kinds of variables with “arbitrary names”: generated variables, and pattern variables from the rule we’re operating on. And when we canonicalize the names of these variables, we can end up with identical expressions that we need to merge.
Here’s what happens if we apply the rule to the literal rule :
✕

If we apply it to the pattern rule but don’t do canonicalization, we’ll just get the same basic result:
✕

But if we canonicalize we get instead:
✕

The effect is more dramatic if we go to two steps. When operating on the literal rule we get:
✕

Operating on the pattern rule, but without canonicalization, we get
✕

while if we include canonicalization many rules merge and we get:
✕

We can think of “ordinary expressions” like as being like “data”, and rules as being like “code”. But when everything is a symbolic expression, it’s perfectly possible—as we saw above—to “treat code like data”, and in particular to generate rules as output. But this now raises a new possibility. When we “get a rule as output”, why not start “using it like code” and applying it to things?
In mathematics we might apply some theorem to prove a lemma, and then we might subsequently use that lemma to prove another theorem—eventually building up a whole “accumulative structure” of lemmas (or theorems) being used to prove other lemmas. In any given proof we can in principle always just keep using the axioms over and over again—but it’ll be much more efficient to progressively build a library of more and more lemmas, and use these. And in general we’ll build up a richer structure by “accumulating lemmas” than always just going back to the axioms.
In the multiway graphs we’ve drawn so far, each edge represents the application of a rule, but that rule is always a fixed axiom. To represent accumulative evolution we need a slightly more elaborate structure—and it’ll be convenient to use token-event graphs rather than pure multiway graphs.
Every time we apply a rule we can think of this as an event. And with the setup we’re describing, that event can be thought of as taking two tokens as input: one the “code rule” and the other the “data rule”. The output from the event is then some collection of rules, which can then serve as input (either “code” or “data”) to other events.
Let’s start with the very simple example of the rule
✕

where for now there are no patterns being used. Starting from this rule, we get the token-event graph (where now we’re indicating the initial “axiom” statement using a slightly different color):
✕

One subtlety here is that the initial rule is applied to itself—so there are two edges going into the event from the node representing the rule. Another subtlety is that there are two different ways the rule can be applied, with the result that there are two output rules generated.
Here’s another example, based on the two rules:
✕

✕

Continuing for another step we get:
✕

Typically we will want to consider as “defining an equivalence”, so that means the same as , and can be conflated with it—yielding in this case:
✕

Now let’s consider the rule:
✕

After one step we get:
✕

After 2 steps we get:
✕

The token-event graphs after 3 and 4 steps in this case are (where now we’ve deduplicated events):
✕

Let’s now consider a rule with the same structure, but with pattern variables instead of literal symbols:
✕

Here’s what happens after one step (note that there’s canonicalization going on, so a_’s in different rules aren’t “the same”)
✕

and we see that there are different theorems from the ones we got without patterns. After 2 steps with the pattern rule we get
✕

where now the complete set of “theorems that have been derived” is (dropping the _’s for readability)
✕

or as trees:
✕

After another step one gets
✕

where now there are 2860 “theorems”, roughly exponentially distributed across sizes according to
✕

and with a typical “size-19” theorem being:
✕

In effect we can think of our original rule (or “axiom”) as having initiated some kind of “mathematical Big Bang” from which an increasing number of theorems are generated. Early on we described having a “gas” of mathematical theorems that—a little like molecules—can interact and create new theorems. So now we can view our accumulative evolution process as a concrete example of this.
Let’s consider the rule from previous sections:
✕

After one step of accumulative evolution according to this rule we get:
✕

After 2 and 3 steps the results are:
✕

What is the significance of all this complexity? At a basic level, it’s just an example of the ubiquitous phenomenon in the computational universe (captured in the Principle of Computational Equivalence) that even systems with very simple rules can generate behavior as complex as anything. But the question is whether—on top of all this complexity—there are simple “coarse-grained” features that we can identify as “higher-level mathematics”; features that we can think of as capturing the “bulk” behavior of the accumulative evolution of axiomatic mathematics.
As we’ve just seen, the accumulative evolution of even very simple transformation rules for expressions can quickly lead to considerable complexity. And in an effort to understand the essence of what’s going on, it’s useful to look at the slightly simpler case of rules not for “tree-structured expressions” but for strings of characters.
Consider the seemingly trivial case of the rule:
✕

After one step this gives
✕

while after 2 steps we get
✕

though treating as the same as this just becomes:
✕

Here’s what happens with the rule:
✕

✕

After 2 steps we get
✕

and after 3 steps
✕

where now there are a total of 25 “theorems”, including (unsurprisingly) things like:
✕

It’s worth noting that despite the “lexical similarity” of the string rule we’re now using to the expression rule from the previous section, these rules actually work in very different ways. The string rule can apply to characters anywhere within a string, but what it inserts is always of fixed size. The expression rule deals with trees, and only applies to “whole subtrees”, but what it inserts can be a tree of any size. (One can align these setups by thinking of strings as expressions in which characters are “bound together” by an associative operator, as in A·B·A·A. But if one explicitly gives associativity axioms these will lead to additional pieces in the token-event graph.)
A rule like also has the feature of involving patterns. In principle we could include patterns in strings too—both for single characters (as with _) and for sequences of characters (as with __)—but we won’t do this here. (We can also consider one-way rules, using → instead of .)
To get a general sense of the kinds of things that happen in accumulative (string) systems, we can consider enumerating all possible distinct two-way string transformation rules. With only a single character A, there are only two distinct cases
✕

because systematically generates all possible rules
✕

and at t steps gives a total number of rules equal to:
✕

With characters A and B the distinct tokenevent graphs generated starting from rules with a total of at most 5 characters are:
✕

Note that when the strings in the initial rule are the same length, only a rather trivial finite tokenevent graph is ever generated, as in the case of :
✕

But when the strings are of different lengths, there is always unbounded growth.
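This dichotomy is easy to check in a toy Python model of accumulative two-way string rules (our own simplified event step, not the article’s full construction):

```python
# Sketch: accumulative evolution of two-way string rules. A rule whose two
# sides have equal length (A <-> B) generates only a finite set of rules,
# while unequal lengths (A <-> AA) grow without bound. Illustrative only.
def rewrites(s, l, r):
    """All single-occurrence rewrites of l -> r inside the string s."""
    return {s[:i] + r + s[i + len(l):]
            for i in range(len(s)) if s.startswith(l, i)}

def step(rules):
    new = set(rules)
    for l, r in rules:            # "code" rule, used in both directions
        for L, R in rules:        # "data" rule
            for a, b in ((l, r), (r, l)):
                new |= {(L2, R) for L2 in rewrites(L, a, b)}
                new |= {(L, R2) for R2 in rewrites(R, a, b)}
    return new

finite = {("A", "B")}
growing = {("A", "AA")}
for _ in range(3):
    finite, growing = step(finite), step(growing)
```

Equal-length sides can only shuffle single characters within a fixed-size set of strings, whereas A ↔ AA keeps manufacturing ever-longer strings.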
We’ve looked at accumulative versions of expression and string rewriting systems. So what about accumulative versions of hypergraph rewriting systems of the kind that appear in our Physics Project?
Consider the very simple hypergraph rule
✕

or pictorially:
✕

(Note that the nodes that are named 1 here are really like pattern variables, that could be named for example x_.)
We can now do accumulative evolution with this rule, at each step combining results that involve equivalent (i.e. isomorphic) hypergraphs:
✕

After two steps this gives:
✕

And after 3 steps:
✕

How does all this compare to “ordinary” evolution by hypergraph rewriting? Here’s a multiway graph based on applying the same underlying rule repeatedly, starting from an initial condition formed from the rule:
✕

What we see is that the accumulative evolution in effect “shortcuts” the ordinary multiway evolution, essentially by “caching” the result of every piece of every transformation between states (which in this case are rules), and delivering a given state in fewer steps.
In our typical investigation of hypergraph rewriting for our Physics Project we consider one-way transformation rules. Inevitably, though, the ruliad contains rules that go both ways. And here, in an effort to understand the correspondence with our metamodel of mathematics, we can consider two-way hypergraph rewriting rules. An example is the two-way version of the rule above:
✕

✕

Now the token-event graph becomes
✕

or after 2 steps (where now the transformations from “later states” to “earlier states” have started to fill in):
✕

Just like in ordinary hypergraph evolution, the only way to get hypergraphs with additional hyperedges is to start with a rule that involves the addition of new hyperedges—and the same is true for the addition of new elements. Consider the rule:
✕

✕

After 1 step this gives
✕

while after 2 steps it gives:
✕

The general appearance of this token-event graph is not much different from what we saw with string rewrite or expression rewrite systems. So what this suggests is that it doesn’t matter much whether we’re starting from our metamodel of axiomatic mathematics or from any other reasonably rich rewriting system: we’ll always get the same kind of “large-scale” token-event graph structure. And this is an example of what we’ll use to argue for general laws of metamathematics.
In an earlier section, we discussed how paths in a multiway graph can represent proofs of “equivalence” between expressions (or the “entailment” of one expression by another). For example, with the rule (or “axiom”)
✕

this shows a path that “proves” that “BA entails AAB”:
✕

But once we know this, we can imagine adding this result (as what we can think of as a “lemma”) to our original rule:
✕

And now (the “theorem”) “BA entails AAB” takes just one step to prove—and all sorts of other proofs are also shortened:
✕

It’s perfectly possible to imagine evolving a multiway system with a kind of “caching-based” speedup mechanism where every new entailment discovered is added to the list of underlying rules. And, by the way, it’s also possible to use two-way rules throughout the multiway system:
✕

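The lemma-caching idea can be illustrated with a toy Python breadth-first search over string rewrites; the rule A ↔ AB and the “lemma” used here are hypothetical stand-ins, not the article’s actual axiom:

```python
# Sketch: adding a derived entailment as a new rule shortens later proofs.
def successors(s, rules):
    """All one-step rewrites of s under a set of two-way rules."""
    out = set()
    for l, r in rules:
        for a, b in ((l, r), (r, l)):
            for i in range(len(s)):
                if s.startswith(a, i):
                    out.add(s[:i] + b + s[i + len(a):])
    return out

def dist(src, dst, rules, limit=20):
    """Breadth-first proof length from src to dst (None if not found)."""
    frontier, seen = {src}, {src}
    for d in range(limit + 1):
        if dst in frontier:
            return d
        frontier = {t for s in frontier for t in successors(s, rules)} - seen
        seen |= frontier
    return None

base = [("A", "AB")]
with_lemma = base + [("BA", "BABB")]   # a derived entailment, cached as a rule
```

With the base axiom alone, proving "BA" entails "BABB" takes two steps; with the lemma cached, it takes one.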
But accumulative systems provide a much more principled way to progressively “add what’s discovered”. So what do proofs look like in such systems?
Consider the rule:
✕

Running it for 2 steps we get the token-event graph:
✕

Now let’s say we want to prove that the original “axiom” implies (or “entails”) the “theorem” . Here’s the subgraph that demonstrates the result:
✕

And here it is as a separate “proof graph”
✕

where each event takes two inputs—the “rule to be applied” and the “rule to apply to”—and the output is the derived (i.e. entailed or implied) new rule or rules.
If we run the accumulative system for another step, we get:
✕

Now there are additional “theorems” that have been generated. An example is:
✕

And now we can find a proof of this theorem:
✕

This proof exists as a subgraph of the token-event graph:
✕

The proof just given has the fewest events—or “proof steps”—that can be used. But altogether there are 50 possible proofs, other examples being:
✕

These correspond to the subgraphs:
✕

How much has the accumulative character of these token-event graphs contributed to the structure of these proofs? It’s perfectly possible to find proofs that never use “intermediate lemmas” but always “go back to the original axiom” at every step. In this case examples are
✕

which all in effect require at least one more “sequential event” than our shortest proof using intermediate lemmas.
A slightly more dramatic example occurs for the theorem
✕

where now without intermediate lemmas the shortest proof is
✕

but with intermediate lemmas it becomes:
✕

What we’ve done so far here is to generate a complete token-event graph for a certain number of steps, and then to see if we can find a proof in it for some particular statement. The proof is a subgraph of the “relevant part” of the full token-event graph. Often—in analogy to the simpler case of finding proofs of equivalences between expressions in a multiway graph—we’ll call this subgraph a “proof path”.
But in addition to just “finding a proof” in a fully constructed tokenevent graph, we can ask whether, given a statement, we can directly construct a proof for it. As discussed in the context of proofs in ordinary multiway graphs, computational irreducibility implies that in general there’s no “shortcut” way to find a proof. In addition, for any statement, there may be no upper bound on the length of proof that will be required (or on the size or number of intermediate “lemmas” that will have to be used). And this, again, is the shadow of undecidability in our systems: that there can be statements whose provability may be arbitrarily difficult to determine.
In making our “metamodel” of mathematics we’ve been discussing the rewriting of expressions according to rules. But there’s a subtle issue that we’ve so far avoided, that has to do with the fact that the expressions we’re rewriting are often themselves patterns that stand for whole classes of expressions. And this turns out to allow for additional kinds of transformations that we’ll call cosubstitution and bisubstitution.
Let’s talk first about cosubstitution. Imagine we have the expression f[a]. The rule would do a substitution for a to give f[b]. But if we have the expression f[c] the rule will do nothing.
Now imagine that we have the expression f[x_]. This stands for a whole class of expressions, including f[a], f[c], etc. For most of this class of expressions, the rule will do nothing. But in the specific case of f[a], it applies, and gives the result f[b].
If our rule is f[x_] → s then this will apply as an ordinary substitution to f[a], giving the result s. But if the rule is f[b] → s this will not apply as an ordinary substitution to f[a]. However, it can apply as a cosubstitution to f[x_] by picking out the specific case where x_ stands for b, then using the rule to give s.
In general, the point is that ordinary substitution specializes patterns that appear in rules—while what one can think of as the “dual operation” of cosubstitution specializes patterns that appear in the expressions to which the rules are being applied. If one thinks of the rule that’s being applied as like an operator, and the expression to which the rule is being applied as an operand, then in effect substitution is about making the operator fit the operand, and cosubstitution is about making the operand fit the operator.
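As an illustrative sketch (in Python, with nested tuples for expressions and trailing-underscore strings for pattern variables, not the actual Wolfram Language matcher), one can see this duality as two directions of the same one-sided matching operation:

```python
# Sketch of the substitution / cosubstitution duality. Expressions are nested
# tuples; strings ending in "_" are pattern variables. match(p, t) asks
# whether the pattern p can be specialized to the target t. Toy code, not the
# article's full bisubstitution machinery.
def match(pattern, target, binds=None):
    binds = dict(binds or {})
    if isinstance(pattern, str) and pattern.endswith("_"):
        if pattern in binds and binds[pattern] != target:
            return None
        binds[pattern] = target
        return binds
    if isinstance(pattern, tuple) and isinstance(target, tuple) \
            and len(pattern) == len(target):
        for p, t in zip(pattern, target):
            binds = match(p, t, binds)
            if binds is None:
                return None
        return binds
    return binds if pattern == target else None

rule_lhs, rule_rhs = ("f", "b"), "s"   # the literal rule f[b] -> s
expr = ("f", "x_")                     # the expression f[x_]

# Substitution specializes the rule's pattern to fit the expression: here the
# rule has no variables, so matching f[b] against f[x_] literally fails.
sub = match(rule_lhs, expr)
# Cosubstitution specializes the expression's pattern to fit the rule:
# f[x_] has the special case f[a... here f[b]], so the rule applies.
cosub = match(expr, rule_lhs)
result = rule_rhs if cosub is not None else None
```

Substitution makes the operator fit the operand; cosubstitution makes the operand fit the operator, exactly as described above.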
It’s important to realize that as soon as one’s operating on expressions involving patterns, cosubstitution is not something “optional”: it’s something that one has to include if one is really going to interpret patterns—wherever they occur—as standing for classes of expressions.
When one’s operating on a literal expression (without patterns) only substitution is ever possible, as in
✕

corresponding to this fragment of a token-event graph:
✕

Let’s say we have the rule f[a] → s (where f[a] is a literal expression). Operating on f[b] this rule will do nothing. But what if we apply the rule to f[x_]? Ordinary substitution still does nothing. But cosubstitution can do something. In fact, there are two different cosubstitutions that can be done in this case:
✕

What’s going on here? In the first case, f[x_] has the “special case” f[a], to which the rule applies (“by cosubstitution”)—giving the result s. In the second case, however, it’s on its own which has the special case f[a], that gets transformed by the rule to s, giving the final cosubstitution result f[s].
There’s an additional wrinkle when the same pattern (such as ) appears multiple times:
✕

In all cases, x_ is matched to a. But which of the x_’s is actually replaced is different in each case.
Here’s a slightly more complicated example:
✕

In ordinary substitution, replacements for patterns are in effect always made “locally”, with each specific pattern separately being replaced by some expression. But in cosubstitution, a “special case” found for a pattern will get used throughout when the replacement is done.
Let’s see how this all works in an accumulative axiomatic system. Consider the very simple rule:
✕

One step of substitution gives the token-event graph (where we’ve canonicalized the names of pattern variables to a_ and b_):
✕

But one step of cosubstitution gives instead:
✕

Here are the individual transformations that were made (with the rule at least nominally being applied only in one direction):
✕

The token-event graph above is then obtained by canonicalizing variables, and combining identical expressions (though for clarity we don’t merge rules of the form and ).
If we go another step with this particular rule using only substitution, there are additional events (i.e. transformations) but no new theorems produced:
✕

Cosubstitution, however, produces another 27 theorems
✕

or altogether
✕

or as trees:
✕

We’ve now seen examples of both substitution and cosubstitution in action. But in our metamodel for mathematics we’re ultimately dealing not with each of these individually, but rather with the “symmetric” concept of bisubstitution, in which both substitution and cosubstitution can be mixed together, and applied even to parts of the same expression.
In the particular case of , bisubstitution adds nothing beyond cosubstitution. But often it does. Consider the rule:
✕

Here’s the result of applying this to three different expressions using substitution, cosubstitution and bisubstitution (where we consider only matches for “whole ∘ expressions”, not subparts):
✕

Cosubstitution very often yields substantially more transformations than substitution—bisubstitution then yielding modestly more than cosubstitution. For example, for the axiom system
✕

the number of theorems derived after 1 and 2 steps is given by:
✕

In some cases there are theorems that can be produced by full bisubstitution, but not—even after any number of steps—by substitution or cosubstitution alone. However, it is also common to find that theorems can in principle be produced by substitution alone, but that this just takes more steps (and sometimes vastly more) than when full bisubstitution is used. (It’s worth noting, however, that the notion of “how many steps” it takes to “reach” a given theorem depends on the foliation one chooses to use in the token-event graph.)
The various forms of substitution that we’ve discussed here represent different ways in which one theorem can entail others. But our overall metamodel of mathematics—based as it is purely on the structure of symbolic expressions and patterns—implies that bisubstitution covers all entailments that are possible.
In the history of metamathematics and mathematical logic, a whole variety of “laws of inference” or “methods of entailment” have been considered. But with the modern view of symbolic expressions and patterns (as used, for example, in the Wolfram Language), bisubstitution emerges as the fundamental form of entailment, with other forms of entailment corresponding to the use of particular types of expressions or the addition of further elements to the pure substitutions we’ve used here.
It should be noted, however, that when it comes to the ruliad different kinds of entailments correspond merely to different foliations—with the form of entailment that we’re using representing just a particularly straightforward case.
The concept of bisubstitution has arisen in the theory of term rewriting, as well as in automated theorem proving (where it is often viewed as a particular “strategy”, and called “paramodulation”). In term rewriting, bisubstitution is closely related to the concept of unification—which essentially asks what assignment of values to pattern variables is needed in order to make different subterms of an expression be identical.
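A minimal version of unification can be sketched as follows (Python, no occurs-check and only single-level variable dereferencing, which is a simplification of what real term-rewriting systems do):

```python
# Sketch of unification, the operation underlying bisubstitution: find
# bindings for variables (strings ending in "_") that make two terms
# identical. Terms are nested tuples. Illustrative simplification only.
def unify(a, b, binds=None):
    binds = dict(binds or {})
    a, b = binds.get(a, a), binds.get(b, b)   # dereference bound variables
    if a == b:
        return binds
    if isinstance(a, str) and a.endswith("_"):
        binds[a] = b
        return binds
    if isinstance(b, str) and b.endswith("_"):
        binds[b] = a
        return binds
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            binds = unify(x, y, binds)
            if binds is None:
                return None
        return binds
    return None

# f[x_, b] and f[a, y_] unify with x_ -> a, y_ -> b:
mgu = unify(("f", "x_", "b"), ("f", "a", "y_"))
```

The resulting assignment is what makes the two subterms identical, which is precisely the question unification asks.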
Now that we’ve finished describing the many technical issues involved in constructing our metamodel of mathematics, we can start looking at its consequences. We discussed above how multiway graphs formed from expressions can be used to define a branchial graph that represents a kind of “metamathematical space”. We can now use a similar approach to set up a metamathematical space for our full metamodel of the “progressive accumulation” of mathematical statements.
Let’s start by ignoring cosubstitution and bisubstitution and considering only the process of substitution—and beginning with the axiom:
✕

Doing accumulative evolution from this axiom we get the token-event graph
✕

or after 2 steps:
✕

From this we can derive an “effective multiway graph” by directly connecting all input and output tokens involved in each event:
✕

And then we can produce a branchial graph, which in effect yields an approximation to the “metamathematical space” generated by our axiom:
✕

Showing the statements produced in the form of trees we get (with the top node representing ⟷):
✕

If we do the same thing with full bisubstitution, then even after one step we get a slightly larger token-event graph:
✕

After two steps, we get
✕

which contains 46 statements, compared to 42 if only substitution is used. The corresponding branchial graph is:
✕

The adjacency matrices for the substitution and bisubstitution cases are then
✕

which have 80% and 85% respectively of the number of edges in complete graphs of these sizes.
Branchial graphs are usually quite dense, but they nevertheless do show definite structure. Here are some results after 2 steps:
✕

We’ve discussed at some length what happens if we start from axioms and then build up an “entailment cone” of all statements that can be derived from them. But in the actual practice of mathematics people often want to just look at particular target statements, and see if they can be derived (i.e. proved) from the axioms.
But what can we say “in bulk” about this process? The best source of potential examples we have right now comes from the practice of automated theorem proving—as for example implemented in the Wolfram Language function FindEquationalProof. As a simple example of how this works, consider the axiom
✕

and the theorem:
✕

Automated theorem proving (based on FindEquationalProof) finds the following proof of this theorem:
✕

Needless to say, this isn’t the only possible proof. And in this very simple case, we can construct the full entailment cone—and determine that there aren’t any shorter proofs, though there are two more of the same length:
✕

All three of these proofs can be seen as paths in the entailment cone:
✕

How “complicated” are these proofs? In addition to their lengths, we can for example ask how big the successive intermediate expressions they involve become, where here we are including not only the proofs already shown, but also some longer ones:
✕

In the setup we’re using here, we can find a proof of by starting with lhs, building up an entailment cone, and seeing whether there’s any path in it that reaches rhs. In general there’s no upper bound on how far one will have to go to find such a path—or how big the intermediate expressions may need to get.
One can imagine all kinds of optimizations, for example where one looks at multistep consequences of the original axioms, and treats these as “lemmas” that we can “add as axioms” to provide new rules that jump multiple steps on a path at a time. Needless to say, there are lots of tradeoffs in doing this. (Is it worth the memory to store the lemmas? Might we “jump” past our target? etc.)
But actual automated theorem provers typically work in a way that is much closer to our accumulative rewriting systems—in which the “raw material” on which one operates is statements rather than expressions.
Once again, we can in principle always construct a whole entailment cone, and then look to see whether a particular statement occurs there. But then to give a proof of that statement it’s sufficient to find the subgraph of the entailment cone that leads to that statement. For example, starting with the axiom
✕

we get the entailment cone (shown here as a token-event graph, and dropping _’s):
✕

After 2 steps the statement
✕

shows up in this entailment cone
✕

where we’re indicating the subgraph that leads from the original axiom to this statement. Extracting this subgraph we get
✕

which we can view as a proof of the statement within this axiom system.
But now let’s use traditional automated theorem proving (in the form of FindEquationalProof) to get a proof of this same statement. Here’s what we get:
✕

This is again a token-event graph, but its structure is slightly different from the one we “fished out of” the entailment cone. Instead of starting from the axiom and “progressively deriving” our statement, we start from both the statement and the axiom, and then show that together they lead “merely via substitution” to a statement of the form , which we can take as an “obviously derivable tautology”.
Sometimes the minimal “direct proof” found from the entailment cone can be considerably simpler than the one found by automated theorem proving. For example, for the statement
✕

the minimal direct proof is
✕

while the one found by FindEquationalProof is:
✕

But the great advantage of automated theorem proving is that it can “directedly” search for proofs instead of just “fishing them out of” the entailment cone that contains all possible exhaustively generated proofs. To use automated theorem proving you have to “know where you want to go”—and in particular identify the theorem you want to prove.
Consider the axiom
✕

and the statement:
✕

This statement doesn’t show up in the first few steps of the entailment cone for the axiom, even though millions of other theorems do. But automated theorem proving finds a proof of it—and rearranging the “prove-a-tautology” proof so that we just have to feed in a tautology somewhere in the proof, we get:
✕

The model-theoretic methods we’ll discuss a little later allow one effectively to “guess” theorems that might be derivable from a given axiom system. So, for example, for the axiom system
✕

here’s a “guess” at a theorem
✕

and here’s a representation of its proof found by automated theorem proving—where now the length of an intermediate “lemma” is indicated by the size of the corresponding node
✕

and in this case the longest intermediate lemma is of size 67 and is:
✕

In principle it’s possible to rearrange token-event graphs generated by automated theorem proving to have the same structure as the ones we get directly from the entailment cone—with axioms at the beginning and the theorem being proved at the end. But typical strategies for automated theorem proving don’t naturally produce such graphs. In principle automated theorem proving could work by directly searching for a “path” that leads to the theorem one’s trying to prove. But usually it’s much easier instead to have as the “target” a simple tautology.
At least conceptually automated theorem proving must still try to “navigate” through the full tokenevent graph that makes up the entailment cone. And the main issue in doing this is that there are many places where one does not know “which branch to take”. But here there’s a crucial—if at first surprising—fact: at least so long as one is using full bisubstitution it ultimately doesn’t matter which branch one takes; there’ll always be a way to “merge back” to any other branch.
This is a consequence of the fact that the accumulative systems we’re using automatically have the property of confluence, which says that every branch is accompanied by a subsequent merge. There’s an almost trivial way in which this is true, by virtue of the fact that for every edge the system also includes the reverse of that edge. But there’s a more substantial reason as well: given any two statements on two different branches, there’s always a way to combine them using a bisubstitution to get a single statement.
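The notion of branches merging again can be made concrete in a small sketch. Here is some illustrative Python (for an abstract string-rewriting setup, not the statement-based systems in the text) that checks local confluence by testing whether every pair of one-step successors can be rejoined:

```python
def rewrite(state, rules):
    """All results of applying one rule at one position of a string."""
    out = set()
    for lhs, rhs in rules:
        for i in range(len(state) - len(lhs) + 1):
            if state[i:i + len(lhs)] == lhs:
                out.add(state[:i] + rhs + state[i + len(lhs):])
    return out

def joinable(x, y, step, max_depth=6):
    """Do x and y have a common descendant within max_depth steps?"""
    def descendants(z):
        seen, frontier = {z}, {z}
        for _ in range(max_depth):
            frontier = {w for v in frontier for w in step(v)} - seen
            seen |= frontier
        return seen
    return bool(descendants(x) & descendants(y))

def locally_confluent(states, step, max_depth=6):
    """Check that every branching pair of successors merges again."""
    for s in states:
        succ = sorted(step(s))
        for i in range(len(succ)):
            for j in range(i + 1, len(succ)):
                if not joinable(succ[i], succ[j], step, max_depth):
                    return False
    return True
```

The rule BA → AB (which just sorts a string) is confluent; a pair of rules like A → B, A → C, with no way to rewrite B or C further, is not.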
In our Physics Project, the concept of causal invariance—which effectively generalizes confluence—is an important one, that leads among other things to ideas like relativistic invariance. Later on we’ll discuss the idea that “regardless of what order you prove theorems in, you’ll always get the same math”, and its relationship to causal invariance and to the notion of relativity in metamathematics. But for now the importance of confluence is that it has the potential to simplify automated theorem proving—because in effect it says one can never ultimately “make a wrong turn” in getting to a particular theorem, or, alternatively, that if one keeps going long enough every path one might take will eventually be able to reach every theorem.
And indeed this is exactly how things work in the full entailment cone. But the challenge in automated theorem proving is to generate only a tiny part of the entailment cone, yet still “get to” the theorem we want. And in doing this we have to carefully choose which “branches” we should try to merge using bisubstitution events. In automated theorem proving these bisubstitution events are typically called “critical pair lemmas”, and there are a variety of strategies for defining an order in which critical pair lemmas should be tried.
It’s worth pointing out that there’s absolutely no guarantee that such procedures will find the shortest proof of any given theorem (or in fact that they’ll find a proof at all with a given amount of computational effort). One can imagine “higher-order proofs” in which one attempts to transform not just statements of the form , but full proofs (say represented as token-event graphs). And one can imagine using such transformations to try to simplify proofs.
A general feature of the proofs we’ve been showing is that they are accumulative, in the sense that they continually introduce lemmas which are then reused. But in principle any proof can be “unrolled” into one that just repeatedly uses the original axioms (and in fact, purely by substitution)—and never introduces other lemmas. The necessary “cut elimination” can effectively be done by always recreating each lemma from the axioms whenever it’s needed—a process which can become exponentially complex.
As an example, from the axiom above we can generate the proof
✕

where for example the first lemma at the top is reused in four events. But now by cut elimination we can “unroll” this whole proof into a “straight-line” sequence of substitutions on expressions done just using the original axiom
✕

and we see that our final theorem is the statement that the first expression in the sequence is equivalent under the axiom to the last one.
As is fairly evident in this example, a feature of automated theorem proving is that its result tends to be very “non-human”. Yes, it can provide incontrovertible evidence that a theorem is valid. But that evidence is typically far away from being any kind of “narrative” suitable for human consumption. In the analogy to molecular dynamics, an automated proof gives detailed “turn-by-turn instructions” that show how a molecule can reach a certain place in a gas. Typical “human-style” mathematics, on the other hand, operates on a higher level, analogous to talking about overall motion in a fluid. And a core part of what’s achieved by our physicalization of metamathematics is understanding why it’s possible for mathematical observers like us to perceive mathematics as operating at this higher level.
The axiom systems we’ve been talking about so far were chosen largely for their axiomatic simplicity. But what happens if we consider axiom systems that are used in practice in present-day mathematics?
The simplest common example is the axiom system (actually, a single axiom) of semigroup theory, stated in our notation as:
✕

Using only substitution, all we ever get after any number of steps is the token-event graph (i.e. “entailment cone”):
✕

But with bisubstitution, even after one step we already get the entailment cone
✕

which contains such theorems as:
✕

After 2 steps, the entailment cone becomes
✕

which contains 1617 theorems such as
✕

with sizes distributed as follows:
✕

Looking at these theorems we can see that—in fact by construction—they are all just statements of the associativity of ∘. Or, put another way, they state that under this axiom all expression trees that have the same sequence of leaves are equivalent.
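This can be verified mechanically. Here is a small Python sketch (an analogue of the symbolic setup in the text) that enumerates all bracketings of a sequence of leaves and checks that rewriting with associativity brings every one of them to the same right-nested normal form:

```python
def bracketings(leaves):
    """All binary trees (nested pair tuples) with this leaf sequence."""
    if len(leaves) == 1:
        return [leaves[0]]
    trees = []
    for i in range(1, len(leaves)):
        for left in bracketings(leaves[:i]):
            for right in bracketings(leaves[i:]):
                trees.append((left, right))
    return trees

def right_comb(tree):
    """Normal form under associativity: rewrite (x∘y)∘z -> x∘(y∘z)."""
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    if isinstance(left, tuple):  # apply associativity at the root
        return right_comb((left[0], (left[1], right)))
    return (left, right_comb(right))
```

For 5 leaves there are 14 bracketings (the Catalan number C4), and all of them normalize to the same right-nested tree, confirming that associativity makes all trees with the same leaf sequence equivalent.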
What about group theory? The standard axioms can be written
✕

where ∘ is interpreted as the binary group multiplication operation, overbar as the unary inverse operation, and 1 as the constant identity element (or, equivalently, zero-argument function).
One step of substitution already gives:
✕

It’s notable that in this picture one can already see “different kinds of theorems” ending up in different “metamathematical locations”. One also sees some “obvious” tautological “theorems”, like and .
If we use full bisubstitution, we get 56 rather than 27 theorems, and many of the theorems are more complicated:
✕

After 2 steps of pure substitution, the entailment cone in this case becomes
✕

which includes 792 theorems with sizes distributed according to:
✕

But among all these theorems, do straightforward “textbook theorems” appear, like these?
✕

The answer is no. It’s inevitable that in the end all such theorems must appear in the entailment cone. But it turns out that it takes quite a few steps. And indeed with automated theorem proving we can find “paths” that can be taken to prove these theorems—involving significantly more than two steps:
✕

So how about logic, or, more specifically, Boolean algebra? A typical textbook axiom system for this (represented in terms of And ∧, Or ∨ and Not ¬) is:
✕

After one step of substitution from these axioms we get
✕

or in our more usual rendering:
✕

So what happens here with “named textbook theorems” (excluding commutativity and distributivity, which already appear in the particular axioms we’re using)?
✕

Once again none of these appear in the first step of the entailment cone. But at step 2 with full bisubstitution the idempotence laws show up
✕

where here we’re only operating on theorems with leaf count below 14 (of which there are a total of 27,953).
And if we go to step 3—and use leaf count below 9—we see the law of excluded middle and the law of noncontradiction show up:
✕

How are these reached? Here’s the smallest fragment of token-event graph (“shortest path”) within this entailment cone from the axioms to the law of excluded middle:
✕

There are actually many possible “paths” (476 in all with our leaf count restriction); the next smallest ones with distinct structures are:
✕

Here’s the “path” for this theorem found by automated theorem proving:
✕

Most of the other “named theorems” involve longer proofs—and so won’t show up until much later in the entailment cone:
✕

The axiom system we’ve used for Boolean algebra here is by no means the only possible one. For example, it’s stated in terms of And, Or and Not—but one doesn’t need all those operators; any Boolean expression (and thus any theorem in Boolean algebra) can also be stated just in terms of the single operator Nand.
And in terms of that operator the very simplest axiom system for Boolean algebra contains (as I found in 2000) just one axiom (where here ∘ is now interpreted as Nand):
✕

Here’s one step of the substitution entailment cone for this axiom:
✕

After 2 steps this gives an entailment cone with 5486 theorems
✕

with size distribution:
✕

When one’s working with Nand, it’s less clear what one should consider to be “notable theorems”. But an obvious one is the commutativity of Nand:
✕

Here’s a proof of this obtained by automated theorem proving (tipped on its side for readability):
✕

Eventually it’s inevitable that this theorem must show up in the entailment cone for our axiom system. But based on this proof we would expect it only after something like 102 steps. And with the entailment cone growing exponentially this means that by the time shows up, perhaps other theorems would have done so—though most vastly more complicated.
We’ve looked at axioms for group theory and for Boolean algebra. But what about other axiom systems from present-day mathematics? In a sense it’s remarkable how few of these there are—and indeed I was able to list essentially all of them in just two pages in A New Kind of Science:
The longest axiom system listed here is a precise version of Euclid’s original axioms
✕

where we are listing everything (even logic) in explicit (Wolfram Language) functional form. Given these axioms we should now be able to prove all theorems in Euclidean geometry. As an example (that’s already complicated enough) let’s take Euclid’s very first “proposition” (Book 1, Proposition 1) which states that it’s possible “with a ruler and compass” (i.e. with lines and circles) to construct an equilateral triangle based on any line segment—as in:
✕
RandomInstance[Entity["GeometricScene","EuclidBook1Proposition1"]["Scene"]]["Graphics"] 
We can write this theorem by saying that given the axioms together with the “setup”
✕

it’s possible to derive:
✕

We can now use automated theorem proving to generate a proof
✕

and in this case the proof takes 272 steps. But the fact that it’s possible to generate this proof shows that (up to various issues about the “setup conditions”) the theorem it proves must eventually “occur naturally” in the entailment cone of the original axioms—though along with an absolutely immense number of other theorems that Euclid didn’t “call out” and write down in his books.
Looking at the collection of axiom systems from A New Kind of Science (and a few related ones) for many of them we can just directly start generating entailment cones—here shown after one step, using substitution only:
✕

But if we’re going to make entailment cones for all axiom systems there are a few other technical wrinkles we have to deal with. The axiom systems shown above are all “straightforwardly equational” in the sense that they in effect state what amount to “algebraic relations” (in the sense of universal algebra) universally valid for all choices of variables. But some axiom systems traditionally used in mathematics also make other kinds of statements. In the traditional formalism and notation of mathematical logic these can look quite complicated and abstruse. But with a metamodel of mathematics like ours it’s possible to untangle things to the point where these different kinds of statements can also be handled in a streamlined way.
In standard mathematical notation one might write
✕

which we can read as “for all a and b, equals ”—and which we can interpret in our “metamodel” of mathematics as the (two-way) rule:
✕

What this says is just that any time we see an expression that matches the pattern we can replace it by (or in Wolfram Language notation just ), and vice versa, so that in effect can be said to entail .
But what if we have axioms that involve not just universal statements (“for all …”) but also existential statements (“there exists…”)? In a sense we’re already dealing with these. Whenever we write —or in explicit functional form, say o[a_, b_]—we’re effectively asserting that there exists some operator o that we can do operations with. It’s important to note that once we introduce o (or ∘) we imagine that it represents the same thing wherever it appears (in contrast to a pattern variable like a_ that can represent different things in different instances).
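The distinction can be made concrete with a minimal pattern matcher. In this Python sketch (an illustrative analogue, not the Wolfram Language matcher), names ending in “_” play the role of pattern variables like a_, while ordinary symbols like o must match literally:

```python
def match(pattern, expr, bindings=None):
    """Structural matching: names ending in '_' are pattern variables
    (they may bind any subexpression, but consistently); all other
    symbols must match exactly."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.endswith("_"):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if isinstance(pattern, str) or isinstance(expr, str):
        return bindings if pattern == expr else None
    if len(pattern) != len(expr):
        return None
    for p, e in zip(pattern, expr):
        bindings = match(p, e, bindings)
        if bindings is None:
            return None
    return bindings
```

So the pattern ('o', 'a_', 'a_') matches any expression of the form o[x, x], with a_ bound consistently to the same subexpression, while the operator symbol o itself must appear literally wherever it occurs.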
Now consider an “explicit existential statement” like
✕

which we can read as “there exists something a for which equals a”. To represent the “something” we just introduce a “constant”, or equivalently an expression with head, say, α, and zero arguments: α[ ]. Now we can write our existential statement as
✕

or:
✕

We can operate on this using rules like , with α[] always “passing through” unchanged—but with its mere presence asserting that “it exists”.
A very similar setup works even if we have both universal and existential quantifiers. For example, we can represent
✕

as just
✕

where now there isn’t just a single object, say β[], that we assert exists; instead there are “lots of different β’s”, “parametrized” in this case by a.
We can apply our standard accumulative bisubstitution process to this statement—and after one step we get:
✕

Note that this is a very different result from the one for the “purely universal” statement:
✕

✕

In general, we can “compile” any statement in terms of quantifiers into our metamodel, essentially using the standard technique of Skolemization from mathematical logic. Thus for example
✕

can be “compiled into”
✕

while
✕

can be compiled into:
✕

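The essence of this “compilation” can be sketched in a few lines of Python (an illustrative analogue, with quantifier prefixes given explicitly, and hypothetical Skolem-function names like f_y standing in for the introduced constants and functions):

```python
def skolemize(prefix, matrix):
    """Replace each existentially quantified variable by a Skolem term:
    a function of the universal variables that precede it (a constant,
    i.e. zero-argument function, if none precede it).
    prefix: list like [('forall', 'x'), ('exists', 'y')]."""
    universals, subst = [], {}
    for kind, var in prefix:
        if kind == "forall":
            universals.append(var)
        else:  # 'exists'
            subst[var] = ("f_" + var, tuple(universals))

    def walk(t):
        if isinstance(t, str):
            return subst.get(t, t)
        return tuple(walk(s) for s in t)

    return walk(matrix)
```

For ∀x ∃y p(x, y) the variable y becomes the term f_y(x), while for a bare ∃a the variable a becomes a zero-argument constant, just like the α[ ] above.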
If we look at the actual axiom systems used in current mathematics there’s one more issue to deal with—which doesn’t affect the axioms for logic or group theory, but does show up, for example, in the Peano axioms for arithmetic. And the issue is that in addition to quantifying over “variables”, we also need to quantify over “functions”. Or formulated differently, we need to set up not just individual axioms, but a whole “axiom schema” that can generate an infinite sequence of “ordinary axioms”, one for each possible “function”.
In our metamodel of mathematics, we can think of this in terms of “parametrized functions”, or in Wolfram Language, just as having functions whose heads are themselves patterns, as in f[n_][a_].
Using this setup we can then “compile” the standard induction axiom of Peano arithmetic
✕

into the (Wolfram Language) metamodel form
✕

where the “implications” in the original axiom have been converted into one-way rules, so that what the axiom can now be seen to do is to define a transformation for something that is not an “ordinary mathematical-style expression” but rather an expression that is itself a rule.
But the important point is that our whole setup of doing substitutions in symbolic expressions—like Wolfram Language—makes no fundamental distinction between dealing with “ordinary expressions” and with “rules” (in Wolfram Language, for example, is just Rule[a,b]). And as a result we can expect to be able to construct token-event graphs, build entailment cones, etc. just as well for axiom systems like Peano arithmetic as for ones like Boolean algebra and group theory.
The actual number of nodes that appear even in what might seem like simple cases can be huge, but the whole setup makes it clear that exploring an axiom system like this is just another example—that can be uniformly represented with our metamodel of mathematics—of a form of sampling of the ruliad.
We’ve so far considered something like
✕

just as an abstract statement about arbitrary symbolic variables x and y, and some abstract operator ∘. But can we make a “model” of what x, y, and ∘ could “explicitly be”?
Let’s imagine for example that x and y can take 2 possible values, say 0 or 1. (We’ll use numbers for notational convenience, though in principle the values could be anything we want.) Now we have to ask what ∘ can be in order to have our original statement always hold. It turns out in this case that there are several possibilities, that can be specified by giving possible “multiplication tables” for ∘:
✕

(For convenience we’ll often refer to such multiplication tables by numbers FromDigits[Flatten[m],k], here 0, 1, 5, 7, 10, 15.) Using let’s say the second multiplication table we can then “evaluate” both sides of the original statement for all possible choices of x and y, and verify that the statement always holds:
✕

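Here is a Python sketch of this kind of model search. Since the particular axiom above isn’t reproduced here, the example uses a hypothetical stand-in, associativity ((x∘y)∘z = x∘(y∘z)), whose size-2 models differ from the ones shown in the text; the search itself works the same way for any axiom, enumerating all k×k “multiplication tables” and keeping those under which the axiom holds for every assignment of values:

```python
from itertools import product

def models(axiom_holds, k):
    """All k-by-k 'multiplication tables' for ∘ under which the given
    axiom holds for every assignment of values to its variables."""
    found = []
    for flat in product(range(k), repeat=k * k):
        table = [list(flat[i * k:(i + 1) * k]) for i in range(k)]
        op = lambda x, y: table[x][y]
        if axiom_holds(op, k):
            found.append(table)
    return found

def table_number(table, k):
    """FromDigits[Flatten[table], k]-style numbering of a table."""
    n = 0
    for row in table:
        for d in row:
            n = n * k + d
    return n

# Hypothetical example axiom (NOT the one in the text): associativity,
# (x∘y)∘z = x∘(y∘z) for all x, y, z.
def associative(op, k):
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(range(k), repeat=3))
```

For associativity there turn out to be 8 of the 16 possible size-2 tables (numbers 0, 1, 3, 5, 6, 7, 9, 15 in the FromDigits numbering): the two constants, the two projections, And, Or, Xor and Xnor.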
If we allow, say, 3 possible values for x and y, there turn out to be 221 possible forms for ∘. The first few are:
✕

As another example, let’s consider the simplest axiom for Boolean algebra (that I discovered in 2000):
✕

Here are the “size-2” models for this
✕

and these, as expected, are the truth tables for Nand and Nor respectively. (In this particular case, there are no size-3 models, 12 size-4 models, and in general models of size 2^n—and no finite models of any other size.)
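This can be checked by brute force. Here is a Python sketch (assuming the axiom in question is the Nand axiom ((a∘b)∘c)∘(a∘((a∘c)∘a)) = c from A New Kind of Science) that tests every possible size-k multiplication table against the axiom:

```python
from itertools import product

def axiom_holds(op, k):
    """((a∘b)∘c) ∘ (a∘((a∘c)∘a)) == c for all a, b, c in the model."""
    return all(op(op(op(a, b), c), op(a, op(op(a, c), a))) == c
               for a, b, c in product(range(k), repeat=3))

def models(k):
    """All size-k multiplication tables satisfying the axiom."""
    found = []
    for flat in product(range(k), repeat=k * k):
        table = [flat[i * k:(i + 1) * k] for i in range(k)]
        if axiom_holds(lambda x, y: table[x][y], k):
            found.append(table)
    return found
```

Of the 16 size-2 tables exactly two survive, the truth tables of Nor and Nand, and the same search confirms that there are no size-3 models.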
Looking at this example suggests a way to talk about models for axiom systems. We can think of an axiom system as defining a collection of abstract constraints. But what can we say about objects that might satisfy those constraints? A model is in effect telling us about these objects. Or, put another way, it’s telling us what “things” the axiom system “describes”. And in the case of my axiom for Boolean algebra, those “things” would be Boolean variables, operated on using Nand or Nor.
As another example, consider the axioms for group theory
✕

✕

Is there a mathematical interpretation of these? Well, yes. They essentially correspond to (representations of) particular finite groups. The original axioms define constraints to be satisfied by any group. These models now correspond to particular groups with specific finite numbers of elements (and in fact specific representations of these groups). And just like in the Boolean algebra case this interpretation now allows us to start saying what the models are “about”. The first three, for example, correspond to cyclic groups which can be thought of as being “about” addition of integers mod k.
For axiom systems that haven’t traditionally been studied in mathematics, there typically won’t be any such preexisting identification of what they’re “about”. But we can still think of models as being a way that a mathematical observer can characterize—or summarize—an axiom system. And in a sense we can see the collection of possible finite models for an axiom system as being a kind of “model signature” for the axiom system.
But let’s now consider what models tell us about “theorems” associated with a given axiom system. Take for example the axiom:
✕

Here are the size-2 models for this axiom system:
✕

Let’s now pick the last of these models. Then we can take any symbolic expression involving ∘, and say what its values would be for every possible choice of the values of the variables that appear in it:
✕

The last row here gives an “expression code” that summarizes the values of each expression in this particular model. And if two expressions have different codes in the model then this tells us that these expressions cannot be equivalent according to the underlying axiom system.
But if the codes are the same, then it’s at least possible that the expressions are equivalent in the underlying axiom system. So as an example, let’s take the equivalences associated with pairs of expressions that have code 3 (according to the model we’re using):
✕

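Here is a Python sketch of how such “expression codes” can be computed (for expressions represented as nested tuples, with a fixed ordering of the variables; an illustrative analogue of the setup in the text):

```python
from itertools import product

def evaluate(expr, op, env):
    """Evaluate a nested-tuple expression like ('o', 'a', ('o', 'b', 'a'))
    in a model where 'o' is the table-defined operator."""
    if isinstance(expr, str):
        return env[expr]
    _, left, right = expr
    return op(evaluate(left, op, env), evaluate(right, op, env))

def variables(expr):
    if isinstance(expr, str):
        return {expr}
    return variables(expr[1]) | variables(expr[2])

def code(expr, table, k=2):
    """'Expression code': the expression's values over all variable
    assignments, read off as a base-k number. Equal codes are necessary,
    but not sufficient, for equivalence under the axioms."""
    op = lambda x, y: table[x][y]
    vs = sorted(variables(expr))
    n = 0
    for vals in product(range(k), repeat=len(vs)):
        n = n * k + evaluate(expr, op, dict(zip(vs, vals)))
    return n
```

Under a commutative table (such as the Nand table [[1, 1], [1, 0]]) the expressions a∘b and b∘a get the same code; under a non-commutative one (such as the left-projection table [[0, 0], [1, 1]]) their codes differ, proving that no axiom system with that model can make them equivalent.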
So now let’s compare with an actual entailment cone for our underlying axiom system (where to keep the graph of modest size we have dropped expressions involving more than 3 variables):
✕

So far this doesn’t establish equivalence between any of our code-3 expressions. But if we generate a larger entailment cone (here using a different initial expression) we get
✕

where the path shown corresponds to the statement
✕

demonstrating that this is an equivalence that holds in general for the axiom system.
But let’s take another statement implied by the model, such as:
✕

Yes, it’s valid in the model. But it’s not something that’s generally valid for the underlying axiom system, or could ever be derived from it. And we can see this for example by picking another model for the axiom system, say the second-to-last one in our list above
✕

and finding out that the values for the two expressions here are different in that model:
✕

The definitive way to establish that a particular statement follows from a particular axiom system is to find an explicit proof for it, either directly by picking it out as a path in the entailment cone or by using automated theorem proving methods. But models in a sense give one a way to “get an approximate result”.
As an example of how this works, consider a collection of possible expressions, with pairs of them joined whenever they can be proved equal in the axiom system we’re discussing:
✕

Now let’s indicate what codes two models of the axiom system assign to the expressions:
✕

The expressions within each connected graph component are equal according to the underlying axiom system, and in both models they are always assigned the same codes. But sometimes the models “overshoot”, assigning the same codes to expressions not in the same connected component—and therefore not equal according to the underlying axiom system.
The models we’ve shown so far are ones that are valid for the underlying axiom system. If we use a model that isn’t valid we’ll find that even expressions in the same connected component of the graph (and therefore equal according to the underlying axiom system) will be assigned different codes (note the graphs have been rearranged to allow expressions with the same code to be drawn in the same “patch”):
✕

We can think of our graph of equivalences between expressions as corresponding to a slice through an entailment graph—and essentially being “laid out in metamathematical space”, like a branchial graph, or what we’ll later call an “entailment fabric”. And what we see is that when we have a valid model different codes yield different patches that in effect cover metamathematical space in a way that respects the equivalences implied by the underlying axiom system.
But now let’s see what happens if we make an entailment cone, tagging each node with the code corresponding to the expression it represents, first for a valid model, and then for nonvalid ones:
✕

With the valid model, the whole entailment cone is tagged with the same code (and here also same color). But for the nonvalid models, different “patches” in the entailment cone are tagged with different codes.
Let’s say we’re trying to see if two expressions are equal according to the underlying axiom system. The definitive way to tell this is to find a “proof path” from one expression to the other. But as an “approximation” we can just “evaluate” these two expressions according to a model, and see if the resulting codes are the same. Even if it’s a valid model, though, this can only definitively tell us that two expressions aren’t equal; it can’t confirm that they are. In principle we can refine things by checking in multiple models—particularly ones with more elements. But without essentially prechecking all possible equalities we can’t in general be sure that this will give us the complete story.
Of course, generating explicit proofs from the underlying axiom system can also be hard—because in general the proof can be arbitrarily long. And in a sense there is a tradeoff. Given a particular equivalence to check we can either search for a path in the entailment graph, often effectively having to try many possibilities. Or we can “do the work up front” by finding a model or collection of models that we know will correctly tell us whether the equivalence is correct.
Later we’ll see how these choices relate to how mathematical observers can “parse” the structure of metamathematical space. In effect observers can either explicitly try to trace out “proof paths” formed from sequences of abstract symbolic expressions—or they can “globally predetermine” what expressions “mean” by identifying some overall model. In general there may be many possible choices of models—and what we’ll see is that these different choices are essentially analogous to different choices of reference frames in physics.
One feature of our discussion of models so far is that we’ve always been talking about making models for axioms, and then applying these models to expressions. But in the accumulative systems we’ve discussed above (and that seem like closer metamodels of actual mathematics), we’re only ever talking about “statements”—with “axioms” just being statements we happen to start with. So how do models work in such a context?
Here’s the beginning of the token-event graph starting with
✕

produced using one step of entailment by substitution:
✕

For each of the statements given here, there are certain size-2 models (indicated here by their multiplication tables) that are valid—or in some cases all models are valid:
✕

We can summarize this by indicating in a 4×4 grid which of the 16 possible size-2 models are consistent with each statement generated so far in the entailment cone:
✕

Continuing one more step we get:
✕

It’s often the case that statements generated on successive steps in the entailment cone in essence just “accumulate more models”. But—as we can see from the right-hand edge of this graph—that’s not always so: sometimes a model that is valid for one statement is no longer valid for a statement it entails. (And the same is true if we use full bisubstitution rather than just substitution.)
Everything we’ve discussed about models so far here has to do with expressions. But there can also be models for other kinds of structures. For strings it’s possible to use something like the same setup, though it doesn’t work quite so well. One can think of transforming the string
✕

into
✕

and then trying to find appropriate “multiplication tables” for ∘, but here operating on the specific elements A and B, not on a collection of elements defined by the model.
Defining models for a hypergraph rewriting system is more challenging, though interesting. One can think of the expressions we’ve used as corresponding to trees—which can be “evaluated” as soon as definite “operators” associated with the model are filled in at each node. If we try to do the same thing with graphs (or hypergraphs) we’ll immediately be thrust into issues of the order in which we scan the graph.
At a more general level, we can think of a “model” as being a way that an observer tries to summarize things. And we can imagine many ways to do this, with differing degrees of fidelity, but always with the feature that if the summaries of two things are different, then those two things can’t be transformed into each other by whatever underlying process is being used.
Put another way, a model defines some kind of invariant for the underlying transformations in a system. The raw material for computing this invariant may be operators at nodes, or may be things like overall graph properties (like cycle counts).
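As a minimal illustration (in Python, for a toy string rewriting system rather than the expression systems in the text), the letter counts of a string are just such an invariant for the rule BA → AB: if two strings have different counts, one can never be transformed into the other, while equal counts leave the question open:

```python
def rewrite(state, rules):
    """All results of applying one rule at one position of a string."""
    out = set()
    for lhs, rhs in rules:
        for i in range(len(state) - len(lhs) + 1):
            if state[i:i + len(lhs)] == lhs:
                out.add(state[:i] + rhs + state[i + len(lhs):])
    return out

def reachable(a, b, rules, max_depth=8):
    """Can a be transformed into b within max_depth rewriting steps?"""
    seen, frontier = {a}, {a}
    for _ in range(max_depth):
        frontier = {t for s in frontier for t in rewrite(s, rules)} - seen
        seen |= frontier
    return b in seen

def invariant(s):
    """A model-like summary preserved by BA -> AB: the letter counts."""
    return (s.count("A"), s.count("B"))
```

Different invariants rule out reachability outright ("AAB" can never reach "ABB"); equal invariants are only a “maybe”, though here "BAAB" does in fact reach "AABB".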
We’ve talked about what happens with specific, sample axiom systems, as well as with various axiom systems that have arisen in present-day mathematics. But what about “axiom systems in the wild”—say just obtained by random sampling, or by systematic enumeration? In effect, each possible axiom system can be thought of as “defining a possible field of mathematics”—just in most cases not one that’s actually been studied in the history of human mathematics. But the ruliad certainly contains all such axiom systems. And in the style of A New Kind of Science we can do ruliology to explore them.
As an example, let’s look at axiom systems with just one axiom, one binary operator and one or two variables. Here are the smallest few:
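Such an enumeration can be sketched as follows, assuming expressions are built from a single binary operator ∘ and the variables a and b; this sketch ignores canonicalization (up to variable renaming and lhs/rhs symmetry), so it overcounts relative to a deduplicated list:

```python
# Expressions with a single binary operator are represented as nested pairs;
# leaves are the variables "a" and "b".
def exprs(n, names=("a", "b")):
    # All expressions with exactly n variable occurrences (leaves).
    if n == 1:
        return list(names)
    out = []
    for k in range(1, n):
        for left in exprs(k, names):
            for right in exprs(n - k, names):
                out.append((left, right))
    return out

def axioms(total):
    # All one-axiom systems lhs == rhs with `total` leaves altogether.
    return [(lhs, rhs)
            for k in range(1, total)
            for lhs in exprs(k)
            for rhs in exprs(total - k)]
```

Sorting the output of `axioms(n)` for increasing `n` gives a raw version of “smallest axiom systems first”.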
For each of these axiom systems, we can then ask what theorems they imply. And for example we can enumerate theorems—just as we have enumerated axiom systems—then use automated theorem proving to determine which theorems are implied by which axiom systems. This shows the result, with possible axiom systems going down the page, possible theorems going across, and a particular square being filled in (darker for longer proofs) if a given theorem can be proved from a given axiom system:
The diagonal on the left is axioms “proving themselves”. The lines across are for axiom systems like that basically say that any two expressions are equal—so that any theorem that is stated can be proved from the axiom system.
But what if we look at the whole entailment cone for each of these axiom systems? Here are a few examples of the first two steps:
With our method of accumulative evolution the axiom doesn’t on its own generate a growing entailment cone (though if combined with any axiom containing ∘ it does, and so does on its own). But in all the other cases shown the entailment cone grows rapidly (typically at least exponentially)—in effect quickly establishing many theorems. Most of those theorems, however, are “not small”—and for example after 2 steps here are the distributions of their sizes:
So let’s say we generate only one step in the entailment cone. This is the pattern of “small theorems” we establish:
And here is the corresponding result after two steps:
Superimposing this on our original array of theorems we get:
In other words, there are many small theorems that we can establish “if we look for them”, but which won’t “naturally be generated” quickly in the entailment cone (though eventually it’s inevitable that they will be generated). (Later we’ll see how this relates to the concept of “entailment fabrics” and the “knitting together of pieces of mathematics”.)
In the previous section we discussed the concept of models for axiom systems. So what models do typical “axiom systems from the wild” have? The number of possible models of a given size varies greatly for different axiom systems:
But for each model we can ask what theorems it implies are valid. And for example combining all models of size 2 yields the following “predictions” for what theorems are valid (with the actual theorems indicated by dots):
Using instead models of size 3 gives “more accurate predictions”:
As expected, looking at a fixed number of steps in the entailment cone “underestimates” the number of valid theorems, while looking at finite models overestimates it.
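The model-based “prediction” step can be sketched as follows, using a hypothetical axiom (commutativity, which is not claimed to be one of the enumerated systems): find all size-2 models of the axiom, then call a statement predicted valid if it holds in every such model.

```python
from itertools import product

def op(table, a, b):
    # table[2*a + b] gives the value of a∘b on the two-element set {0, 1}.
    return table[2 * a + b]

def holds(table, eq):
    # eq(f, v) -> bool, checked over all valuations v of the variables (a, b).
    return all(eq(lambda x, y: op(table, x, y), v)
               for v in product(range(2), repeat=2))

# Hypothetical axiom, for illustration only: commutativity, a∘b == b∘a
axiom = lambda f, v: f(v[0], v[1]) == f(v[1], v[0])
models = [t for t in product(range(2), repeat=4) if holds(t, axiom)]

def predicted(eq):
    # A statement is "predicted valid" if it holds in every size-2 model.
    return all(holds(t, eq) for t in models)
```

Since a finite model can satisfy statements that don’t follow from the axiom, this kind of prediction overestimates the valid theorems, just as a truncated entailment cone underestimates them.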
So how does our analysis for “axiom systems from the wild” compare with what we’d get if we considered axiom systems that have been explicitly studied in traditional human mathematics? Here are some examples of “known” axiom systems that involve just a single binary operator
and here’s the distribution of theorems they give:
As must be the case, all the axiom systems for Boolean algebra yield the same theorems. But axiom systems for “different mathematical theories” yield different collections of theorems.
What happens if we look at entailments from these axiom systems? Eventually all theorems must show up somewhere in the entailment cone of a given axiom system. But here are the results after one step of entailment:
Some theorems have already been generated, but many have not:
Just as we did above, we can try to “predict” theorems by constructing models. Here’s what happens if we ask what theorems hold for all valid models of size 2:
For several of the axiom systems, the models “perfectly predict” at least the theorems we show here. And for Boolean algebra, for example, this isn’t surprising: the models just correspond to identifying ∘ as Nand or Nor, and saying this gives a complete description of Boolean algebra. But in the case of groups, “size-2 models” just capture particular groups that happen to be of size 2, and for these particular groups there are special, extra theorems that aren’t true for groups in general.
If we look at models specifically of size 3 there aren’t any examples for Boolean algebra so we don’t predict any theorems. But for group theory, for example, we start to get a slightly more accurate picture of what theorems hold in general:
Based on what we’ve seen here, is there something “obviously special” about the axiom systems that have traditionally been used in human mathematics? There are cases like Boolean algebra where the axioms in effect constrain things so much that we can reasonably say that they’re “talking about definite things” (like Nand and Nor). But there are plenty of other cases, like group theory, where the axioms provide much weaker constraints, and for example allow an infinite number of possible specific groups. But both situations occur among axiom systems “from the wild”. And in the end what we’re doing here doesn’t seem to reveal anything “obviously special” (say in the statistics of models or theorems) about “human” axiom systems.
And what this means is that conclusions we draw from looking at the “general case of all axiom systems”—as captured in general by the ruliad—can be expected to hold in particular for the specific axiom systems and mathematical theories that human mathematics has studied.
In the typical practice of pure mathematics the main objective is to establish theorems. Yes, one wants to know that a theorem has a proof (and perhaps the proof will be helpful in understanding the theorem), but the main focus is on theorems and not on proofs. In our effort to “go underneath” mathematics, however, we want to study not only what theorems there are, but also the process by which the theorems are reached. We can view it as an important simplifying assumption of typical mathematical observers that all that matters is theorems—and that different proofs aren’t relevant. But to explore the underlying structure of metamathematics, we need to unpack this—and in effect look directly at the structure of proof space.
Let’s consider a simple system based on strings. Say we have the rewrite rule and we want to establish the theorem . To do this we have to find some path from A to ABA in the multiway system (or, effectively, in the entailment cone for this axiom system):
But this isn’t the only possible path, and thus the only possible proof. In this particular case, there are 20 distinct paths, each corresponding to at least a slightly different proof:
But one feature here is that all these different proofs can in a sense be “smoothly deformed” into each other, in this case by progressively changing just one step at a time. So this means that in effect there is no nontrivial topology to proof space in this case—and no “distinctly inequivalent” collections of proofs:
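Counting such proof paths can be sketched directly. Here the rules (A → AB, B → A) are hypothetical, chosen for illustration rather than taken from the examples above:

```python
def successors(s, rules):
    # All strings obtainable by one rewrite anywhere in s (multiway edges,
    # with duplicate events between the same pair of states merged).
    out = set()
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def count_paths(s, target, rules):
    # Number of distinct state paths from s to target in the multiway graph.
    if s == target:
        return 1
    if len(s) > len(target):
        return 0  # with these rules strings never shrink, so prune here
    return sum(count_paths(t, target, rules) for t in successors(s, rules))

# Hypothetical rules, for illustration only: A -> AB, B -> A
rules = [("A", "AB"), ("B", "A")]
paths = count_paths("A", "ABA", rules)
```

With these particular rules there happen to be just 2 distinct paths from A to ABA, and each path through the multiway graph corresponds to a proof.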
But consider instead the rule . With this “axiom system” there are 15 possible proofs for the theorem :
Pulling out just the proofs we get:
And we see that in a sense there’s a “hole” in proof space here—so that there are two distinctly different kinds of proofs that can be done.
One place it’s common to see a similar phenomenon is in games and puzzles. Consider for example the Towers of Hanoi puzzle. We can set up a multiway system for the possible moves that can be made. Starting from all disks on the left peg, we get after 1 step:
After 2 steps we have:
And after 8 steps (in this case) we have the whole “game graph”:
The corresponding result for 4 disks is:
And in each case we see the phenomenon of nontrivial topology. What fundamentally causes this? In a sense it reflects the possibility for distinctly different strategies that lead to the same result. Here, for example, different sides of the “main loop” correspond to the “foundational choice” of whether to move the biggest disk first to the left or to the right. And the same basic thing happens with 4 disks on 4 pegs, though the overall structure is more complicated there:
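The game graph described above can be sketched by representing a state as the peg each disk is on, and taking legal moves to be moves of a peg’s top (smallest) disk onto a peg whose top disk is larger, or onto an empty peg:

```python
from collections import deque

def legal_moves(state, pegs=3):
    # state[d] = peg holding disk d; disk 0 is the smallest.
    tops = {}
    for d, p in enumerate(state):   # first disk seen on a peg is its top
        if p not in tops:
            tops[p] = d
    for p, d in tops.items():
        for q in range(pegs):
            if q != p and (q not in tops or tops[q] > d):
                s = list(state)
                s[d] = q
                yield tuple(s)

def game_graph(disks=3, pegs=3):
    # Breadth-first construction of the full game graph from the start state.
    start = (0,) * disks
    states, edges = {start}, set()
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in legal_moves(s, pegs):
            edges.add(frozenset((s, t)))   # moves are reversible: undirected
            if t not in states:
                states.add(t)
                queue.append(t)
    return states, edges

states, edges = game_graph()
```

For 3 disks and 3 pegs this yields the familiar Sierpiński-like graph, with 27 states and 39 edges.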
If two paths diverge in a multiway system it could be that it will never be possible for them to merge again. But whenever the system has the property of confluence, it’s guaranteed that eventually the paths will merge. And, as it turns out, our accumulative evolution setup guarantees that (at least ignoring generation of new variables) confluence will always be achieved. But the issue is how quickly. If branches always merge after just one step, then in a sense there’ll always be topologically trivial proof space. But if the merging can take a while (and in a continuum limit, arbitrarily long) then there’ll in effect be nontrivial topology.
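A bounded check of this kind of merging can be sketched as follows, using the string-sorting rule AB → BA (assumed here as an example of a confluent rule):

```python
def successors(s, rules):
    # All strings obtainable by one rewrite anywhere in s.
    out = set()
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def reachable(s, rules, steps):
    # Everything reachable from s in at most `steps` rewrites.
    seen, frontier = {s}, {s}
    for _ in range(steps):
        frontier = {t for f in frontier for t in successors(f, rules)} - seen
        seen |= frontier
    return seen

def branches_merge(start, rules, steps):
    # Do all one-step branches from `start` rejoin within `steps` more steps?
    branches = [reachable(b, rules, steps) for b in successors(start, rules)]
    return all(x & y for x in branches for y in branches)

# The sorting rule AB -> BA is confluent; branches from ABAB rejoin quickly.
rules = [("AB", "BA")]
```

The number of steps needed before `branches_merge` succeeds is a rough measure of how “quickly” confluence is achieved.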
And one consequence of the nontrivial topology we’re discussing here is that it leads to disconnection in branchial space. Here are the branchial graphs for the first 3 steps in our original 3-disk, 3-peg case:
For the first two steps, the branchial graphs stay connected; but on the third step there’s disconnection. For the 4-disk, 4-peg case the sequence of branchial graphs begins:
At the beginning (and also the end) there’s a single component, that we might think of as a coherent region of metamathematical space. But in the middle it breaks into multiple disconnected components—in effect reflecting the emergence of multiple distinct regions of metamathematical space with something like event horizons temporarily existing between them.
How should we interpret this? First and foremost, it’s something that reveals that there’s structure “below” the “fluid dynamics” level of mathematics; it’s something that depends on the discrete “axiomatic infrastructure” of metamathematics. And from the point of view of our Physics Project, we can think of it as a kind of metamathematical analog of a “quantum effect”.
In our Physics Project we imagine different paths in the multiway system to correspond to different possible quantum histories. The observer is in effect spread over multiple paths, which they coarse-grain or conflate together. An “observable quantum effect” occurs when there are paths that can be followed by the system, but that are somehow “too far apart” to be immediately coarse-grained together by the observer.
Put another way, there is “noticeable quantum interference” when the different paths corresponding to different histories that are “simultaneously happening” are “far enough apart” to be distinguished by the observer. “Destructive interference” is presumably associated with paths that are so far apart that to conflate them would effectively require conflating essentially every possible path. (And our later discussion of the relationship between falsity and the “principle of explosion” then suggests a connection between destructive interference in physics and falsity in mathematics.)
In essence what determines the extent of “quantum effects” is then our “size” as observers in branchial space relative to the size of features in branchial space such as the “topological holes” we’ve been discussing. In the metamathematical case, the “size” of us as observers is in effect related to our ability (or choice) to distinguish slight differences in axiomatic formulations of things. And what we’re saying here is that when there is nontrivial topology in proof space, there is an intrinsic dynamics in metamathematical entailment that leads to the development of distinctions at some scale—though whether these become “visible” to us as mathematical observers depends on how “strong a metamathematical microscope” we choose to use relative to the scale of the “topological holes”.
A fundamental feature of our metamodel of mathematics is the idea that a given set of mathematical statements can entail others. But in this picture what does “mathematical progress” look like?
In analogy with physics one might imagine it would be like the evolution of the universe through time. One would start from some limited set of axioms and then—in a kind of “mathematical Big Bang”—these would lead to a progressively larger entailment cone containing more and more statements of mathematics. And in analogy with physics, one could imagine that the process of following chains of successive entailments in the entailment cone would correspond to the passage of time.
But realistically this isn’t how most of the actual history of human mathematics has proceeded. Because people—and even their computers—basically never try to extend mathematics by axiomatically deriving all possible valid mathematical statements. Instead, they come up with particular mathematical statements that for one reason or another they think are valid and interesting, then try to prove these.
Sometimes the proof may be difficult, and may involve a long chain of entailments. Occasionally—especially if automated theorem proving is used—the entailments may approximate a geodesic path all the way from the axioms. But the practical experience of human mathematics tends to be much more about identifying “nearby statements” and then trying to “fit them together” to deduce the statement one’s interested in.
And in general human mathematics seems to progress not so much through the progressive “time evolution” of an entailment graph as through the assembly of what one might call an “entailment fabric” in which different statements are being knitted together by entailments.
In physics, the analog of the entailment graph is basically the causal graph which builds up over time to define the content of a light cone (or, more accurately, an entanglement cone). The analog of the entailment fabric is basically the (more or less) instantaneous state of space (or, more accurately, branchial space).
In our Physics Project we typically take our lowest-level structure to be a hypergraph—and informally we often say that this hypergraph “represents the structure of space”. But really we should be deducing the “structure of space” by taking a particular time slice from the “dynamic evolution” represented by the causal graph—and for example we should think of two “atoms of space” as “being connected” in the “instantaneous state of space” if there’s a causal connection between them defined within the slice of the causal graph that occurs within the time slice we’re considering. In other words, the “structure of space” is knitted together by the causal connections represented by the causal graph. (In traditional physics, we might say that space can be “mapped out” by looking at overlaps between lots of little light cones.)
Let’s look at how this works out in our metamathematical setting, using string rewrites to simplify things. If we start from the axiom this is the beginning of the entailment cone it generates:
But instead of starting with one axiom and building up a progressively larger entailment cone, let’s start with multiple statements, and from each one generate a small entailment cone, say applying each rule at most twice. Here are entailment cones started from several different statements:
But the crucial point is that these entailment cones overlap—so we can knit them together into an “entailment fabric”:
Or with more pieces and another step of entailment:
And in a sense this is a “timeless” way to imagine building up mathematics—and metamathematical space. Yes, this structure can in principle be viewed as part of the branchial graph obtained from a slice of an entailment graph (and technically this will be a useful way to think about it). But a different view—closer to the practice of human mathematics—is that it’s a “fabric” formed by fitting together many different mathematical statements. It’s not something where one’s tracking the overall passage of time, and seeing causal connections between things—as one might in “running a program”. Rather, it’s something where one’s fitting pieces together in order to satisfy constraints—as one might in creating a tiling.
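The knitting-together just described can be sketched concretely, using a hypothetical rule and starting statements (not those in the figures): generate a small bounded cone from each statement and look at the overlap:

```python
def successors(s, rules):
    # All strings obtainable by one rewrite anywhere in s.
    out = set()
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def cone(s, rules, steps):
    # A bounded entailment cone: everything reachable in at most `steps` steps.
    seen, frontier = {s}, {s}
    for _ in range(steps):
        frontier = {t for f in frontier for t in successors(f, rules)} - seen
        seen |= frontier
    return seen

# Hypothetical rule and starting statements, for illustration only:
rules = [("AB", "BA")]
pieces = [cone("AAB", rules, 2), cone("ABA", rules, 1)]
overlap = pieces[0] & pieces[1]   # shared statements let the cones knit
fabric = pieces[0] | pieces[1]    # the resulting "entailment fabric"
```

It is the nonempty overlap between the small cones that lets the pieces be assembled into a single fabric rather than remaining isolated fragments.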
Underneath everything is the ruliad. And entailment cones and entailment fabrics can be thought of just as different samplings or slicings of the ruliad. The ruliad is ultimately the entangled limit of all possible computations. But one can think of it as being built up by starting from all possible rules and initial conditions, then running them for an infinite number of steps. An entailment cone is essentially a “slice” of this structure where one’s looking at the “time evolution” from a particular rule and initial condition. An entailment fabric is an “orthogonal” slice, looking “at a particular time” across different rules and initial conditions. (And, by the way, rules and initial conditions are essentially equivalent, particularly in an accumulative system.)
One can think of these different slices of the ruliad as being what different kinds of observers will perceive within the ruliad. Entailment cones are essentially what observers who persist through time but are localized in rulial space will perceive. Entailment fabrics are what observers who ignore time but explore more of rulial space will perceive.
Elsewhere I’ve argued that a crucial part of what makes us perceive the laws of physics we do is that we are observers who consider ourselves to be persistent through time. But now we’re seeing that in the way human mathematics is typically done, the “mathematical observer” will be of a different character. And whereas for a physical observer what’s crucial is causality through time, for a mathematical observer (at least one who’s doing mathematics the way it’s usually done) what seems to be crucial is some kind of consistency or coherence across metamathematical space.
In physics it’s far from obvious that a persistent observer would be possible. It could be that with all those detailed computationally irreducible processes happening down at the level of atoms of space there might be nothing in the universe that one could consider consistent through time. But the point is that there are certain “coarse-grained” attributes of the behavior that are consistent through time. And it is by concentrating on these that we end up describing things in terms of the laws of physics we know.
There’s something very analogous going on in mathematics. The detailed branchial structure of metamathematical space is complicated, and presumably full of computational irreducibility. But once again there are “coarse-grained” attributes that have a certain consistency and coherence across it. And it is on these that we concentrate as human “mathematical observers”. And it is in terms of these that we end up being able to do “human-level mathematics”—in effect operating at a “fluid dynamics” level rather than a “molecular dynamics” one.
The possibility of “doing physics in the ruliad” depends crucially on the fact that as physical observers we assume that we have certain persistence and coherence through time. The possibility of “doing mathematics (the way it’s usually done) in the ruliad” depends crucially on the fact that as “mathematical observers” we assume that the mathematical statements we consider will have a certain coherence and consistency—or, in effect, that it’s possible for us to maintain and grow a coherent body of mathematical knowledge, even as we try to include all sorts of new mathematical statements.
Logic was originally conceived as a way to characterize human arguments—in which the concept of “truth” has always seemed quite central. And when logic was applied to the foundations of mathematics, “truth” was also usually assumed to be quite central. But the way we’ve modeled mathematics here has been much more about what statements can be derived (or entailed) than about any kind of abstract notion of what statements can be “tagged as true”. In other words, we’ve been more concerned with “structurally deriving” that “” than with saying that “1 + 1 = 2 is true”.
But what is the relation between this kind of “constructive derivation” and the logical notion of truth? We might just say that “if we can construct a statement then we should consider it true”. And if we’re starting from axioms, then in a sense we’ll never have an “absolute notion of truth”—because whatever we derive is only “as true as the axioms we started from”.
One issue that can come up is that our axioms might be inconsistent—in the sense that from them we can derive two obviously inconsistent statements. But to get further in discussing things like this we really need not only to have a notion of truth, but also a notion of falsity.
In traditional logic it has tended to be assumed that truth and falsity are very much “the same kind of thing”—like 1 and 0. But one feature of our view of mathematics here is that actually truth and falsity seem to have a rather different character. And perhaps this is not surprising—because in a sense if there’s one true statement about something there are typically an infinite number of false statements about it. So, for example, the single statement is true, but the infinite collection of statements for any other are all false.
There is another aspect to this, discussed since at least the Middle Ages, often under the name of the “principle of explosion”: that as soon as one assumes any statement that is false, one can logically derive absolutely any statement at all. In other words, introducing a single “false axiom” will start an explosion that will eventually “blow up everything”.
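The principle of explosion is itself a one-line formal theorem; here is a minimal sketch in Lean:

```lean
-- Principle of explosion (ex falso): from A and its negation, any B follows.
theorem explosion (A B : Prop) (h : A) (hn : ¬A) : B :=
  absurd h hn
```

Informally: from A conclude A ∨ B, and then from ¬A conclude B by disjunctive syllogism, for any B whatsoever.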
So within our model of mathematics we might say that things are “true” if they can be derived, and are “false” if they lead to an “explosion”. But let’s say we’re given some statement. How can we tell if it’s true or false? One thing we can do to find out if it’s true is to construct an entailment cone from our axioms and see if the statement appears anywhere in it. Of course, given computational irreducibility there’s in general no upper bound on how far we’ll need to go to determine this. But now to find out if a statement is false we can imagine introducing the statement as an additional axiom, and then seeing if the entailment cone that’s now produced contains an explosion—though once again there’ll in general be no upper bound on how far we’ll have to go to guarantee that we have a “genuine explosion” on our hands.
So is there any alternative procedure? Potentially the answer is yes: we can just try to see if our statement is somehow equivalent to “true” or “false”. But in our model of mathematics where we’re just talking about transformations on symbolic expressions, there’s no immediate built-in notion of “true” and “false”. To talk about these we have to add something. And for example what we can do is to say that “true” is equivalent to what seems like an “obvious tautology” such as , or in our computational notation, , while “false” is equivalent to something “obviously explosive”, like (or in our particular setup something more like ).
But even though something like “Can we find a way to reach from a given statement?” seems like a much more practical question for an actual theorem-proving system than “Can we fish our statement out of a whole entailment cone?”, it runs into many of the same issues—in particular that there’s no upper limit on the length of path that might be needed.
Soon we’ll return to the question of how all this relates to our interpretation of mathematics as a slice of the ruliad—and to the concept of the entailment fabric perceived by a mathematical observer. But to further set the context for what we’re doing let’s explore how what we’ve discussed so far relates to things like Gödel’s theorem, and to phenomena like incompleteness.
From the setup of basic logic we might assume that we could consider any statement to be either true or false. Or, more precisely, we might think that given a particular axiom system, we should be able to determine whether any statement that can be syntactically constructed with the primitives of that axiom system is true or false. We could explore this by asking whether every statement is either derivable or leads to an explosion—or can be proved equivalent to an “obvious tautology” or to an “obvious explosion”.
But as a simple “approximation” to this, let’s consider a string rewriting system in which we define a “local negation operation”. In particular, let’s assume that given a statement like the “negation” of this statement just exchanges A and B, in this case yielding .
Now let’s ask what statements are generated from a given axiom system. Say we start with . After one step of possible substitutions we get
while after 2 steps we get:
And in our setup we’re effectively asserting that these are “true” statements. But now let’s “negate” the statements, by exchanging A and B. And if we do this, we’ll see that there’s never a statement where both it and its negation occur. In other words, there’s no obvious inconsistency being generated within this axiom system.
But if we consider instead the axiom then this gives:
And since this includes both and its “negation” , by our criteria we must consider this axiom system to be inconsistent.
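With statements represented as two-sided string relations, this consistency criterion can be sketched as follows; the statement sets used here are toy hypothetical examples, not the cones just shown:

```python
def negate(stmt):
    # The "local negation" just exchanges A and B throughout the statement.
    swap = str.maketrans("AB", "BA")
    return tuple(side.translate(swap) for side in stmt)

def normalize(stmt):
    # Treat X <-> Y and Y <-> X as the same two-sided statement.
    return tuple(sorted(stmt))

def inconsistent(statements):
    # Inconsistent if any statement occurs together with its negation.
    s = {normalize(x) for x in statements}
    return any(normalize(negate(x)) in s for x in s)

# Hypothetical generated statement sets, for illustration only:
cone1 = [("A", "AB"), ("AB", "ABB")]  # no statement occurs with its negation
cone2 = [("A", "AB"), ("B", "BA")]    # ("B","BA") negates ("A","AB")
```

The same representation also supports the incompleteness side of the question: one can count what fraction of all possible statements of a given length appear among the generated ones.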
In addition to inconsistency, we can also ask about incompleteness. For all possible statements, does the axiom system eventually generate either the statement or its negation? Or, in other words, can we always decide from the axiom system whether any given statement is true or false?
With our simple assumption about negation, questions of inconsistency and incompleteness become at least in principle very simple to explore. Starting from a given axiom system, we generate its entailment cone, then we ask within this cone what fraction of possible statements, say of a given length, occur.
If the answer is more than 50% we know there’s inconsistency, while if the answer is less than 50% that’s evidence of incompleteness. So what happens with different possible axiom systems?
Here are some results from A New Kind of Science, in each case showing both what amounts to the raw entailment cone (or, in this case, multiway system evolution from “true”), and the number of statements of a given length reached after progressively more steps:
At some level this is all rather straightforward. But from the pictures above we can already get a sense that there’s a problem. For most axiom systems the fraction of statements reached of a given length changes as we increase the number of steps in the entailment cone. Sometimes it’s straightforward to see what fraction will be achieved even after an infinite number of steps. But often it’s not.
And in general we’ll run into computational irreducibility—so that in effect the only way to determine whether some particular statement is generated is just to go to ever more steps in the entailment cone and see what happens. In other words, there’s no guaranteed-finite way to decide what the ultimate fraction will be—and thus whether or not any given axiom system is inconsistent, or incomplete, or neither.
For some axiom systems it may be possible to tell. But for others it’s not, in effect because we don’t in general know how far we’ll have to go to determine whether a given statement is true or not.
A certain amount of additional technical detail is required to reach the standard versions of Gödel’s incompleteness theorems. (Note that these theorems were originally stated specifically for the Peano axioms for arithmetic, but the Principle of Computational Equivalence suggests that they’re in some sense much more general, and even ubiquitous.) But the important point here is that given an axiom system there may be statements that either can or cannot be reached—but there’s no upper bound on the length of path that might be needed to reach them even if one can.
OK, so let’s come back to talking about the notion of truth in the context of the ruliad. We’ve discussed axiom systems that might show inconsistency, or incompleteness—and the difficulty of determining if they do. But the ruliad in a sense contains all possible axiom systems—and generates all possible statements.
So how then can we ever expect to identify which statements are “true” and which are not? When we talked about particular axiom systems, we said that any statement that is generated can be considered true (at least with respect to that axiom system). But in the ruliad every statement is generated. So what criterion can we use to determine which we should consider “true”?
The key idea is that any computationally bounded observer (like us) can perceive only a tiny slice of the ruliad. And it’s a perfectly meaningful question to ask whether a particular statement occurs within that perceived slice.
One way of picking a “slice” is just to start from a given axiom system, and develop its entailment cone. And with such a slice, the criterion for the truth of a statement is exactly what we discussed above: does the statement occur in the entailment cone?
But how do typical “mathematical observers” actually sample the ruliad? As we discussed in the previous section, it seems to be much more by forming an entailment fabric than by developing a whole entailment cone. And in a sense progress in mathematics can be seen as a process of adding pieces to an entailment fabric: pulling in one mathematical statement after another, and checking that they fit into the fabric.
So what happens if one tries to add a statement that “isn’t true”? The basic answer is that it produces an “explosion” in which the entailment fabric can grow to encompass essentially any statement. From the point of view of underlying rules—or the ruliad—there’s really nothing wrong with this. But the issue is that it’s incompatible with an “observer like us”—or with any realistic idealization of a mathematician.
Our view of a mathematical observer is essentially an entity that accumulates mathematical statements into an entailment fabric. But we assume that the observer is computationally bounded, so in a sense they can only work with a limited collection of statements. So if there’s an explosion in an entailment fabric that means the fabric will expand beyond what a mathematical observer can coherently handle. Or, put another way, the only kind of entailment fabrics that a mathematical observer can reasonably consider are ones that “contain no explosions”. And in such fabrics, it’s reasonable to take the generation or entailment of a statement as a signal that the statement can be considered true.
The ruliad is in a sense a unique and absolute thing. And we might have imagined that it would lead us to a unique and absolute definition of truth in mathematics. But what we’ve seen is that that’s not the case. And instead our notion of truth is something based on how we sample the ruliad as mathematical observers. But now we must explore what this means about what mathematics as we perceive it can be like.
The ruliad in a sense contains all structurally possible mathematics—including all mathematical statements, all axiom systems and everything that follows from them. But mathematics as we humans conceive of it is never the whole ruliad; instead it is always just some tiny part that we as mathematical observers sample.
We might imagine, however, that this would mean that there is in a sense a complete arbitrariness to our mathematics—because in a sense we could just pick any part of the ruliad we want. Yes, we might want to start from a specific axiom system. But we might imagine that that axiom system could be chosen arbitrarily, with no further constraint. And that the mathematics we study can therefore be thought of as an essentially arbitrary choice, determined by its detailed history, and perhaps by cognitive or other features of humans.
But there is a crucial additional issue. When we “sample our mathematics” from the ruliad we do it as mathematical observers and ultimately as humans. And it turns out that even very general features of us as mathematical observers put strong constraints on what we can sample, and how.
When we discussed physics, we said that the central features of observers are their computational boundedness and their assumption of their own persistence through time. In mathematics, observers are again computationally bounded. But now it is not persistence through time that they assume, but rather a certain coherence of accumulated knowledge.
We can think of a mathematical observer as progressively expanding the entailment fabric that they consider to “represent mathematics”. And the question is what they can add to that entailment fabric while still “remaining coherent” as observers. In the previous section, for example, we argued that if the observer adds a statement that can be considered “logically false” then this will lead to an “explosion” in the entailment fabric.
Such a statement is certainly present in the ruliad. But if the observer were to add it, then they wouldn’t be able to maintain their coherence—because, whimsically put, their mind would necessarily explode.
In thinking about axiomatic mathematics it’s been standard to say that any axiom system that’s “reasonable to use” should at least be consistent (even though, yes, for a given axiom system it’s in general ultimately undecidable whether this is the case). And certainly consistency is one criterion that we now see is necessary for a “mathematical observer like us”. But one can expect that it’s not the only criterion.
In other words, although it’s perfectly possible to write down any axiom system, and even start generating its entailment cone, only some axiom systems may be compatible with “mathematical observers like us”.
And so, for example, something like the Continuum Hypothesis—which is known to be independent of the “established axioms” of set theory—may well have the feature that, say, it has to be assumed to be true in order to get a metamathematical structure compatible with mathematical observers like us.
In the case of physics, we know that the general characteristics of observers lead to certain key perceived features and laws of physics. In statistical mechanics, we’re dealing with “coarse-grained observers” who don’t trace and decode the paths of individual molecules, and therefore perceive the Second Law of thermodynamics, fluid dynamics, etc. And in our Physics Project we’re also dealing with coarse-grained observers who don’t track all the details of the atoms of space, but instead perceive space as something coherent and effectively continuous.
And it seems as if in metamathematics there’s something very similar going on. As we began to discuss in the very first section above, mathematical observers tend to “coarse grain” metamathematical space. In operational terms, one way they do this is by talking about something like the Pythagorean theorem without always going down to the detailed level of axioms, and for example saying just how real numbers should be defined. And something related is that they tend to concentrate more on mathematical statements and theorems than on their proofs. Later we’ll see how in the context of the ruliad there’s an even deeper level to which one can go. But the point here is that in actually doing mathematics one tends to operate at the “human scale” of talking about mathematical concepts rather than the “molecular-scale details” of axioms.
But why does this work? Why is one not continually “dragged down” to the detailed axiomatic level—or below? How come it’s possible to reason at what we described above as the “fluid dynamics” level, without always having to go down to the detailed “molecular dynamics” level?
The basic claim is that this works for mathematical observers for essentially the same reason as the perception of space works for physical observers. With the “coarse-graining” characteristics of the observer, it’s inevitable that the slice of the ruliad they sample will have the kind of coherence that allows them to operate at a higher level. In other words, mathematics can be done “at a human level” for the same basic reason that we have a “human-level experience” of space in physics.
The fact that it works this way depends both on necessary features of the ruliad—and in general of multicomputation—as well as on characteristics of us as observers.
Needless to say, there are “corner cases” where what we’ve described starts to break down. In physics, for example, the “human-level experience” of space breaks down near spacetime singularities. And in mathematics, there are cases where for example undecidability forces one to take a lower-level, more axiomatic and ultimately more metamathematical view.
But the point is that there are large regions of physical space—and metamathematical space—where these kinds of issues don’t come up, and where our assumptions about physical—and mathematical—observers can be maintained. And this is what ultimately allows us to have the “human-scale” views of physics and mathematics that we do.
In the traditional view of the foundations of mathematics one imagines that axioms—say stated in terms of symbolic expressions—are in some sense the lowest level of mathematics. But thinking in terms of the ruliad suggests that in fact there is a still-lower “ur level”—a kind of analog of machine code in which everything, including axioms, is broken down into ultimate “raw computation”.
Take an axiom like , or, in more precise computational language:
Compared to everything we’re used to seeing in mathematics this looks simple. But actually it’s already got a lot in it. For example, it assumes the notion of a binary operator, which it’s in effect naming “∘”. And for example it also assumes the notion of variables, and has two distinct pattern variables that are in effect “tagged” with the names x and y.
So how can we define what this axiom ultimately “means”? Somehow we have to go from its essentially textual symbolic representation to a piece of actual computation. And, yes, the particular representation we’ve used here can immediately be interpreted as computation in the Wolfram Language. But the ultimate computational concept we’re dealing with is more general than that. And in particular it can exist in any universal computational system.
Different universal computational systems (say particular languages or CPUs or Turing machines) may have different ways to represent computations. But ultimately any computation can be represented in any of them—with the differences in representation being like different “coordinatizations of computation”.
And however we represent computations there is one thing we can say for sure: all possible computations are somewhere in the ruliad. Different representations of computations correspond in effect to different coordinatizations of the ruliad. But all computations are ultimately there.
For our Physics Project it’s been convenient to use a “parametrization of computation” that can be thought of as being based on rewriting of hypergraphs. The elements in these hypergraphs are ultimately purely abstract, but we tend to talk about them as “atoms of space” to indicate the beginnings of our interpretation.
It’s perfectly possible to use hypergraph rewriting as the “substrate” for representing axiom systems stated in terms of symbolic expressions. But it’s a bit more convenient (though ultimately equivalent) to instead use systems based on expression rewriting—or in effect tree rewriting.
At the outset, one might imagine that different axiom systems would somehow have to be represented by “different rules” in the ruliad. But as one might expect from the phenomenon of universal computation, it’s actually perfectly possible to think of different axiom systems as just being specified by different “data” operated on by a single set of rules. There are many rules and structures that we could use. But one set that has the benefit of a century of history is S, K combinators.
The basic concept is to represent everything in terms of “combinator expressions” containing just the two objects S and K. (It’s also possible to have just one fundamental object, and indeed S alone may be enough.)
It’s worth saying at the outset that when we go this “far down” things get pretty nonhuman and obscure. Setting things up in terms of axioms may already seem pedantic and low level. But going to a substrate below axioms—that we can think of as getting us to raw “atoms of existence”—will lead us to a whole other level of obscurity and complexity. But if we’re going to understand how mathematics can emerge from the ruliad this is where we have to go. And combinators provide us with a more-or-less-concrete example.
Here’s an example of a small combinator expression
which corresponds to the “expression tree”:
We can write the combinator expression without explicit “function application” [ ... ] by using a (left) application operator •
and it’s always unambiguous to omit this operator, yielding the compact representation:
By mapping S, K and the application operator to codewords it’s possible to represent this as a simple binary sequence:
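The specific codewords aren’t shown here, but one simple prefix-free scheme (an illustrative assumption in the style of binary combinatory logic—“1” for application, “01” for S, “00” for K—not necessarily the codewords used in the original) can be sketched in Python:

```python
# Expression trees: atoms are the strings 'S' and 'K';
# the application f[a] is represented as the 2-tuple (f, a).
# Assumed codewords: application -> "1", S -> "01", K -> "00".
def encode(e):
    """Prefix-free binary encoding of a combinator expression tree."""
    if e == 'S':
        return '01'
    if e == 'K':
        return '00'
    f, a = e
    return '1' + encode(f) + encode(a)

# S[K] encodes as "1" + "01" + "00":
assert encode(('S', 'K')) == '10100'
```

Because the scheme is prefix-free, the original expression tree can be recovered unambiguously from the bit sequence.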
But what does our combinator expression mean? The basic combinators are defined to have the rules:
{S[x_][y_][z_] → x[z][y[z]], K[x_][y_] → x}
These rules on their own don’t do anything to our combinator expression. But if we form the expression
which we can write as
then repeated application of the rules gives:
We can think of this as “feeding” c, x and y into our combinator expression, then using the “plumbing” defined by the combinator expression to assemble a particular expression in terms of c, x and y.
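This “plumbing” process can be sketched in Python (rather than Wolfram Language). The reducer below is an illustrative implementation, not the original code: expressions are nested pairs, and the rules S[x][y][z] → x[z][y[z]] and K[x][y] → x are applied in leftmost-outermost order until a fixed point is reached:

```python
def ap(*es):
    """Left-associated application: ap(f, a, b) means f[a][b]."""
    e = es[0]
    for x in es[1:]:
        e = (e, x)
    return e

def step(e):
    """Apply one leftmost-outermost S/K reduction; return (expr, changed)."""
    if not isinstance(e, tuple):
        return e, False
    f, a = e
    # S[x][y][z] -> x[z][y[z]]
    if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 'S':
        x, y, z = f[0][1], f[1], a
        return ((x, z), (y, z)), True
    # K[x][y] -> x
    if isinstance(f, tuple) and f[0] == 'K':
        return f[1], True
    f2, changed = step(f)
    if changed:
        return (f2, a), True
    a2, changed = step(a)
    return (f, a2), changed

def reduce_expr(e, limit=10000):
    """Repeatedly apply the rules until a fixed point (if one is reached)."""
    for _ in range(limit):
        e, changed = step(e)
        if not changed:
            break
    return e

# S[K][K] acts as the identity combinator:
assert reduce_expr(ap('S', 'K', 'K', 'x')) == 'x'
```

Note that not every combinator expression reaches a fixed point, which is why the reducer carries a step limit.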
But what does this expression now mean? Well, that depends on what we think c, x and y mean. We might notice that c always appears in the configuration c[_][_]. And this means we can interpret it as a binary operator, which we could write in infix form as ∘ so that our expression becomes:
And, yes, this is all incredibly low level. But we need to go even further. Right now we’re feeding in names like c, x and y. But in the end we want to represent absolutely everything purely in terms of S and K. So we need to get rid of the “human-readable names” and just replace them with “lumps” of S, K combinators that—like the names—get “carried around” when the combinator rules are applied.
We can think about our ultimate expressions in terms of S and K as being like machine code. “One level up” we have assembly language, with the same basic operations, but explicit names. And the idea is that things like axioms—and the laws of inference that apply to them—can be “compiled down” to this assembly language.
But ultimately we can always go further, to the very lowest-level “machine code”, in which only S and K ever appear. Within the ruliad as “coordinatized” by S, K combinators, there’s an infinite collection of possible combinator expressions. But how do we find ones that “represent something recognizably mathematical”?
As an example let’s consider a possible way in which S, K can represent integers, and arithmetic on integers. The basic idea is that an integer n can be input as the combinator expression
which for n = 5 gives:
But if we now apply this to [S][K] what we get reduces to
which contains 4 S’s.
But with this representation of integers it’s possible to find combinator expressions that represent arithmetic operations. For example, here’s a representation of an addition operator:
At the “assembly language” level we might call this plus, and apply it to integers i and j using:
But at the “pure machine code” level this can be represented simply by
which when applied to [S][K] reduces to the “output representation” of 3:
As a slightly more elaborate example
represents the operation of raising to a power. Then becomes:
When this is applied to [S][K], repeated application of the combinator rules gives
eventually yielding the output representation of 8:
We could go on and construct any other arithmetic or computational operation we want, all just in terms of the “universal combinators” S and K.
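As a concrete, hedged illustration of this kind of construction, here is a Python sketch using standard Church numerals built purely out of S and K (via lambda-style application). The exact integer encoding used above isn’t shown explicitly, so these are textbook definitions rather than necessarily the same ones:

```python
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

I = S(K)(K)             # identity combinator, built from S and K
zero = K(I)             # Church numeral 0: zero(f)(x) == x
succ = S(S(K(S))(K))    # successor: succ(n)(f)(x) == f(n(f)(x))

def church(n):
    """Build the Church numeral for n purely out of S and K."""
    e = zero
    for _ in range(n):
        e = succ(e)
    return e

def decode(e):
    """Read a Church numeral back out as a Python integer."""
    return e(lambda m: m + 1)(0)

plus = lambda m: lambda n: m(succ)(n)   # apply succ to n, m times
power = lambda m: lambda n: n(m)        # n-fold composition of m gives m**n

assert decode(plus(church(1))(church(2))) == 3
assert decode(power(church(2))(church(3))) == 8
```

The point, as in the text, is that arithmetic operations become particular combinator expressions whose “meaning” is fixed only by the encoding and decoding conventions we choose.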
But how should we think about this in terms of our conception of mathematics? Basically what we’re seeing is that in the “raw machine code” of S, K combinators it’s possible to “find” a representation for something we consider to be a piece of mathematics.
Earlier we talked about starting from structures like axiom systems and then “compiling them down” to raw machine code. But what about just “finding mathematics” in a sense “naturally occurring” in “raw machine code”? We can think of the ruliad as containing “all possible machine code”. And somewhere in that machine code must be all the conceivable “structures of mathematics”. But the question is: in the wildness of the raw ruliad, what structures can we as mathematical observers successfully pick out?
The situation is quite directly analogous to what happens at multiple levels in physics. Consider for example a fluid full of molecules bouncing around. As we’ve discussed several times, observers like us usually aren’t sensitive to the detailed dynamics of the molecules. But we can still successfully pick out large-scale structures—like overall fluid motions, vortices, etc. And—much like in mathematics—we can talk about physics just at this higher level.
In our Physics Project all this becomes much more extreme. For example, we imagine that space and everything in it is just a giant network of atoms of space. And now within this network we imagine that there are “repeated patterns”—that correspond to things like electrons and quarks and black holes.
In a sense it is the big achievement of natural science to have managed to find these regularities so that we can describe things in terms of them, without always having to go down to the level of atoms of space. But the fact that these are the kinds of regularities we have found is also a statement about us as physical observers.
And the point is that even at the level of the raw ruliad our characteristics as physical observers will inevitably lead us to such regularities. The fact that we are computationally bounded and assume ourselves to have a certain persistence will lead us to consider things that are localized and persistent—that in physics we identify for example as particles.
And it’s very much the same thing in mathematics. As mathematical observers we’re interested in picking out from the raw ruliad “repeated patterns” that are somehow robust. But now instead of identifying them as particles, we’ll identify them as mathematical constructs and definitions. In other words, just as a repeated pattern in the ruliad might in physics be interpreted as an electron, in mathematics a repeated pattern in the ruliad might be interpreted as an integer.
We might think of physics as something “emergent” from the structure of the ruliad, and now we’re thinking of mathematics the same way. And of course not only is the “underlying stuff” of the ruliad the same in both cases, but also in both cases it’s “observers like us” that are sampling and perceiving things.
There are lots of analogies to the process we’re describing of “fishing constructs out of the raw ruliad”. As one example, consider the evolution of a (“class 4”) cellular automaton in which localized structures emerge:
Underneath, just as throughout the ruliad, there’s lots of detailed computation going on, with rules repeatedly getting applied to each cell. But out of all this underlying computation we can identify a certain set of persistent structures—which we can use to make a “higher-level description” that may capture the aspects of the behavior that we care about.
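A minimal Python sketch of this kind of evolution, using rule 110 (a standard class 4 example; the specific rule pictured in the original isn’t identified here, so rule 110 is an assumption):

```python
def ca_step(cells, rule=110):
    """One step of an elementary cellular automaton on a cyclic row."""
    n = len(cells)
    return [(rule >> ((cells[(i - 1) % n] << 2)
                      | (cells[i] << 1)
                      | cells[(i + 1) % n])) & 1
            for i in range(n)]

# Evolve from a single black cell; class 4 rules like 110 produce
# persistent localized structures against a periodic background.
row = [0] * 20 + [1] + [0] * 20
history = [row]
for _ in range(20):
    row = ca_step(row)
    history.append(row)

assert history[1][18:22] == [0, 1, 1, 0]
```

Scanning `history` for cell patterns that recur from step to step is a simple version of the “fishing out persistent structures” described in the text.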
Given an “ocean” of S, K combinator expressions, how might we set about “finding mathematics” in them? One straightforward approach is just to identify certain “mathematical properties” we want, and then go searching for S, K combinator expressions that satisfy these.
For example, if we want to “search for (propositional) logic” we first need to pick combinator expressions to symbolically represent “true” and “false”. There are many pairs of expressions that will work. As one example, let’s pick:
Now we can just search for combinator expressions which, when applied to all possible pairs of “true” and “false”, give truth tables corresponding to particular logical functions. And if we do this, here are examples of the smallest combinator expressions we find:
Here’s how we can then reproduce the truth table for And:
If we just started picking combinator expressions at random, then most of them wouldn’t be “interpretable” in terms of this representation of logic. But if we ran across for example
we could recognize in it the combinators for And, Or, etc. that we identified above, and in effect “disassemble” it to give:
It’s worth noting, though, that even with the choices we made above for “true” and “false”, there’s not just a single possible combinator, say for And. Here are a few possibilities:
And there’s also nothing unique about the choices for “true” and “false”. With the alternative choices
here are the smallest combinator expressions for a few logical functions:
So what can we say in general about the “interpretability” of an arbitrary combinator expression? Obviously any combinator expression does what it does at the level of raw combinators. But the question is whether it can be given a “higher-level”—and potentially “mathematical”—interpretation.
And in a sense this is directly an issue of what a mathematical observer “perceives” in it. Does it contain some kind of robust structure—say a kind of analog for mathematics of a particle in physics?
Axiom systems can be viewed as a particular way to “summarize” certain “raw machine code” in the ruliad. But from the point of view of a “raw coordinatization of the ruliad” like combinators there doesn’t seem to be anything immediately special about them. At least for us humans, however, they do seem to be an obvious “waypoint”. Because by distinguishing operators and variables, establishing arities for operators and introducing names for things, they reflect the kind of structure that’s familiar from human language.
But now that we think of the ruliad as what’s “underneath” both mathematics and physics there’s a different path that’s suggested. With the axiomatic approach we’re effectively trying to leverage human language as a way of summarizing what’s going on. But an alternative is to leverage our direct experience of the physical world, and our perception and intuition about things like space. And as we’ll discuss later, this is likely in many ways a better “metamodel” of the way pure mathematics is actually practiced by us humans.
In some sense, this goes straight from the “raw machine code” of the ruliad to “humanlevel mathematics”, sidestepping the axiomatic level. But given how much “reductionist” work has already been done in mathematics to represent its results in axiomatic form, there is definitely still great value in seeing how the whole axiomatic setup can be “fished out” of the “raw ruliad”.
And there’s certainly no lack of complicated technical issues in doing this. As one example, how should one deal with “generated variables”? If one “coordinatizes” the ruliad in terms of something like hypergraph rewriting this is fairly straightforward: it just involves creating new elements or hypergraph nodes (which in physics would be interpreted as atoms of space). But for something like S, K combinators it’s a bit more subtle. In the examples we’ve given above, we have combinators that, when “run”, eventually reach a fixed point. But to deal with generated variables we probably also need combinators that never reach fixed points, making it considerably more complicated to identify correspondences with definite symbolic expressions.
Another issue involves rules of entailment, or, in effect, the metalogic of an axiom system. In the full axiomatic setup we want to do things like create token-event graphs, where each event corresponds to an entailment. But what rule of entailment should be used? The underlying rules for S, K combinators, for example, define a particular choice—though they can be used to emulate others. But the ruliad in a sense contains all choices. And, once again, it’s up to the observer to “fish out” of the raw ruliad a particular “slice”—which captures not only the axiom system but also the rules of entailment used.
It may be worth mentioning a slightly different existing “reductionist” approach to mathematics: the idea of describing things in terms of types. A type is in effect an equivalence class that characterizes, say, all integers, or all functions from tuples of reals to truth values. But in our terms we can interpret a type as a kind of “template” for our underlying “machine code”: we can say that some piece of machine code represents something of a particular type if the machine code matches a particular pattern of some kind. And the issue is then whether that pattern is somehow robust “like a particle” in the raw ruliad.
An important part of what made our Physics Project possible is the idea of going “underneath” space and time and other traditional concepts of physics. And in a sense what we’re doing here is something very similar, though for mathematics. We want to go “underneath” concepts like functions and variables, and even the very idea of symbolic expressions. In our Physics Project a convenient “parametrization” of what’s “underneath” is a hypergraph made up of elements that we often refer to as “atoms of space”. In mathematics we’ve discussed using combinators as our “parametrization” of what’s “underneath”.
But what are these “made of”? We can think of them as corresponding to raw elements of metamathematics, or raw elements of computation. But in the end, they’re “made of” whatever the ruliad is “made of”. And perhaps the best description of the elements of the ruliad is that they are “atoms of existence”—the smallest units of anything, from which everything, in mathematics and physics and elsewhere, must be made.
The atoms of existence aren’t bits or points or anything like that. They’re something fundamentally lower level that’s come into focus only with our Physics Project, and particularly with the identification of the ruliad. And for our purposes here I’ll call such atoms of existence “emes” (pronounced “eemes”, like phonemes etc.).
Everything in the ruliad is made of emes. The atoms of space in our Physics Project are emes. The nodes in our combinator trees are emes. An eme is a deeply abstract thing. And in a sense all it has is an identity. Every eme is distinct. We could give it a name if we wanted to, but it doesn’t intrinsically have one. And in the end the structure of everything is built up simply from relations between emes.
The concept of the ruliad suggests there is a deep connection between the foundations of mathematics and physics. And now that we have discussed how some of the familiar formalism of mathematics can “fit into” the ruliad, we are ready to use the “bridge” provided by the ruliad to start exploring how to apply some of the successes and intuitions of physics to mathematics.
A foundational part of our everyday experience of physics is our perception that we live in continuous space. But our Physics Project implies that at sufficiently small scales space is actually made of discrete elements—and it is only because of the coarse-grained way in which we experience it that we perceive it as continuous.
In mathematics—unlike physics—we’ve long thought of the foundations as being based on things like symbolic expressions that have a fundamentally discrete structure. Normally, though, the elements of those expressions are, for example, given human-recognizable names (like 2 or Plus). But what we saw in the previous section is that these recognizable forms can be thought of as existing in an “anonymous” lower-level substrate made of what we can call atoms of existence or emes.
But the crucial point is that this substrate is directly based on the ruliad. And its structure is identical between the foundations of mathematics and physics. In mathematics the emes aggregate up to give us our universe of mathematical statements. In physics they aggregate up to give us our physical universe.
But now the commonality of underlying “substrate” makes us realize that we should be able to take our experience of physics, and apply it to mathematics. So what is the analog in mathematics of our perception of the continuity of space in physics? We’ve discussed the idea that we can think of mathematical statements as being laid out in a metamathematical space—or, more specifically, in what we’ve called an entailment fabric. We initially talked about “coordinatizing” this using axioms, but in the previous section we saw how to go “below axioms” to the level of “pure emes”.
When we do mathematics, though, we’re sampling this on a much higher level. And just as we, as physical observers, coarse grain the emes (that we usually call “atoms of space”) that make up physical space, so too, as “mathematical observers”, we coarse grain the emes that make up metamathematical space.
Foundational approaches to mathematics—particularly over the past century or so—have almost always been based on axioms and on their fundamentally discrete symbolic structure. But by going to a lower level and seeing the correspondence with physics we are led to consider what we might think of as a higher-level “experience” of mathematics—operating not at the “molecular dynamics” level of specific axioms and entailments, but rather at what one might call the “fluid dynamics” level of larger-scale concepts.
At the outset one might not have any reason to think that this higherlevel approach could consistently be applied. But this is the first big place where ideas from physics can be used. If both physics and mathematics are based on the ruliad, and if our general characteristics as observers apply in both physics and mathematics, then we can expect that similar features will emerge. And in particular, we can expect that our everyday perception of physical space as continuous will carry over to mathematics, or, more accurately, to metamathematical space.
The picture is that we as mathematical observers have a certain “size” in metamathematical space. We identify concepts—like integers or the Pythagorean theorem—as “regions” in the space of possible configurations of emes (and ultimately of slices of the ruliad). At an axiomatic level we might think of ways to capture what a typical mathematician might consider “the same concept” with slightly different formalism (say, different large cardinal axioms or different models of real numbers). But when we get down to the level of emes there’ll be vastly more freedom in how we capture a given concept—so that we’re in effect using a whole region of “emic space” to do so.
But now the question is what happens if we try to make use of the concept defined by this “region”? Will the “points in the region” behave coherently, or will everything be “shredded”, with different specific representations in terms of emes leading to different conclusions?
The expectation is that in most cases it will work much like physical space, and that what we as observers perceive will be quite independent of the detailed underlying behavior at the level of emes. Which is why we can expect to do “higher-level mathematics”, without always having to descend to the level of emes, or even axioms.
And this we can consider as the first great “physicalized law of mathematics”: that coherent higher-level mathematics is possible for us for the same reason that physical space seems coherent to observers like us.
We’ve discussed several times before the analogy to the Second Law of thermodynamics—and the way it makes possible a higher-level description of things like fluids for “observers like us”. There are certainly cases where the higher-level description breaks down. Some of them may involve specific probes of molecular structure (like Brownian motion). Others may be slightly more “unwitting” (like hypersonic flow).
In our Physics Project we’re very interested in where similar breakdowns might occur—because they’d allow us to “see below” the traditional continuum description of space. Potential targets involve various extreme or singular configurations of spacetime, where in effect the “coherent observer” gets “shredded”, because different atoms of space “within the observer” do different things.
In mathematics, this kind of “shredding” of the observer will tend to be manifest in the need to “drop below” higher-level mathematical concepts, and go down to a very detailed axiomatic, metamathematical or even eme level—where computational irreducibility and phenomena like undecidability are rampant.
It’s worth emphasizing that from the point of view of pure axiomatic mathematics it’s not at all obvious that higher-level mathematics should be possible. It could be that there’d be no choice but to work through every axiomatic detail to have any chance of making conclusions in mathematics.
But the point is that we now know there could be exactly the same issue in physics. Because our Physics Project implies that at the lowest level our universe is effectively made of emes that have all sorts of complicated—and computationally irreducible—behavior. Yet we know that we don’t have to trace through all the details of this to make conclusions about what will happen in the universe—at least at the level we normally perceive it.
In other words, the fact that we can successfully have a “high-level view” of what happens in physics is something that fundamentally has the same origin as the fact that we can successfully have a high-level view of what happens in mathematics. Both are just features of how observers like us sample the ruliad that underlies both physics and mathematics.
We’ve discussed how the basic concept of space as we experience it in physics leads us to our first great physicalized law of mathematics—and how this provides for the very possibility of higher-level mathematics. But this is just the beginning of what we can learn from thinking about the correspondences between physical and metamathematical space implied by their common origin in the structure of the ruliad.
A key idea is to think of a limit of mathematics in which one is dealing with so many mathematical statements that one can treat them “in bulk”—as forming something we could consider a continuous metamathematical space. But what might this space be like?
Our experience of physical space is that at our scale and with our means of perception it seems to us for the most part quite simple and uniform. And this is deeply connected to the concept that pure motion is possible in physical space—or, in other words, that it’s possible for things to move around in physical space without fundamentally changing their character.
Looked at from the point of view of the atoms of space it’s not at all obvious that this should be possible. After all, whenever we move we’ll almost inevitably be made up of different atoms of space. But it’s fundamental to our character as observers that the features we end up perceiving are ones that have a certain persistence—so that we can imagine that we, and objects around us, can just “move unchanged”, at least with respect to those aspects of the objects that we perceive. And this is why, for example, we can discuss laws of mechanics without having to “drop down” to the level of the atoms of space.
So what’s the analog of all this in metamathematical space? At the present stage of our physical universe, we seem to be able to experience physical space as having features like being basically three-dimensional. Metamathematical space probably doesn’t have such familiar mathematical characterizations. But it seems very likely (and we’ll see some evidence of this from empirical metamathematics below) that at the very least we’ll perceive metamathematical space as having a certain uniformity or homogeneity.
In our Physics Project we imagine that we can think of physical space as beginning “at the Big Bang” with what amounts to some small collection of atoms of space, but then growing to the vast number of atoms in our current universe through the repeated application of particular rules. But with a small set of rules being applied a vast number of times, it seems almost inevitable that some kind of uniformity must result.
But then the same kind of thing can be expected in metamathematics. In axiomatic mathematics one imagines the mathematical analog of the Big Bang: everything starts from a small collection of axioms, and then expands to a huge number of mathematical statements through repeated application of laws of inference. And from this picture (which gets a bit more elaborate when one considers emes and the full ruliad) one can expect that at least after it’s “developed for a while” metamathematical space, like physical space, will have a certain uniformity.
The idea that physical space is somehow uniform is something we take very much for granted, not least because that’s our lifelong experience. But the analog of this idea for metamathematical space is something we don’t have immediate everyday intuition about—and that in fact may at first seem surprising or even bizarre. But actually what it implies is something that increasingly rings true from modern experience in pure mathematics. Because by saying that metamathematical space is in a sense uniform, we’re saying that different parts of it somehow seem similar—or in other words that there’s parallelism between what we see in different areas of mathematics, even if they’re not “nearby” in terms of entailments.
But this is exactly what, for example, the success of category theory implies. Because it shows us that even in completely different areas of mathematics it makes sense to set up the same basic structures of objects, morphisms and so on. As such, though, category theory defines only the barest outlines of mathematical structure. But what our concept of perceived uniformity in metamathematical space suggests is that there should in fact be closer correspondences between different areas of mathematics.
We can view this as another fundamental “physicalized law of mathematics”: that different areas of mathematics should ultimately have structures that are in some deep sense “perceived the same” by mathematical observers. For several centuries we’ve known there’s a certain correspondence between, for example, geometry and algebra. But it’s been a major achievement of recent mathematics to identify more and more such correspondences or “dualities”.
Often the existence of these has seemed remarkable, and surprising. But what our view of metamathematics here suggests is that this is actually a general physicalized law of mathematics—and that in the end essentially all different areas of mathematics must share a deep structure, at least in some appropriate “bulk metamathematical limit” when enough statements are considered.
But it’s one thing to say that two places in metamathematical space are “similar”; it’s another to say that “motion between them” is possible. Once again we can make an analogy with physical space. We’re used to the idea that we can move around in space, maintaining our identity and structure. But this in a sense requires that we can maintain some kind of continuity of existence on our path between two positions.
In principle it could have been that we would have to be “atomized” at one end, then “reconstituted” at the other end. But our actual experience is that we perceive ourselves to continually exist all the way along the path. In a sense this is just an assumption about how things work that physical observers like us make; but what’s nontrivial is that the underlying structure of the ruliad implies that this will always be consistent.
And so we expect it will be in metamathematics: given the way a mathematical observer operates—much like a physical observer—it’ll be possible to “move” from one area of mathematics to another “at a high level”, without being “atomized” along the way. Or, in other words, a mathematical observer will be able to make correspondences between different areas of mathematics without having to go down to the level of emes to do so.
It’s worth realizing that as soon as there’s a way of representing mathematics in computational terms the concept of universal computation (and, more tightly, the Principle of Computational Equivalence) implies that at some level there must always be a way to translate between any two mathematical theories, or any two areas of mathematics. But the question is whether it’s possible to do this in “high-level mathematical terms” or only at the level of the underlying “computational substrate”. And what we’re saying is that there’s a general physicalized law of mathematics that implies that higher-level translation should be possible.
Thinking about mathematics at a traditional axiomatic level can sometimes obscure this, however. For example, in axiomatic terms we usually think of Peano arithmetic as not being as powerful as ZFC set theory (for example, it lacks transfinite induction)—and so nothing like “dual” to it. But Peano arithmetic can perfectly well support universal computation, so inevitably a “formal emulator” for ZFC set theory can be built in it. But the issue is that to do this essentially requires going down to the “atomic” level and operating not in terms of mathematical constructs but instead directly in terms of “metamathematical” symbolic structure (and, for example, explicitly emulating things like equality predicates).
But the issue, it seems, is that if we think at the traditional axiomatic level, we’re not dealing with a “mathematical observer like us”. In the analogy we’ve used above, we’re operating at the “molecular dynamics” level, not at the human-scale “fluid dynamics” level. And so we see all sorts of details and issues that ultimately won’t be relevant in typical approaches to actually doing pure mathematics.
It’s somewhat ironic that our physicalized approach shows this by going below the axiomatic level—to the level of emes and the raw ruliad. But in a sense it’s only at this level that there’s the uniformity and coherence to conveniently construct a general picture that can encompass observers like us.
Much as with ordinary matter we can say that “everything is made of atoms”, we’re now saying that everything is “made of computation” (and its structure and behavior is ultimately described by the ruliad). But the crucial idea that emerged from our Physics Project—and that is at the core of what I’m calling the multicomputational paradigm—is that when we ask what observers perceive there is a whole additional level of inexorable structure. And this is what makes it possible to do both human-scale physics and higher-level mathematics—and for there to be what amounts to “pure motion”, whether in physical or metamathematical space.
There’s another way to think about this, that we alluded to earlier. A key feature of an observer is to have a coherent identity. In physics, that involves having a consistent thread of experience in time. In mathematics, it involves bringing together a consistent view of “what’s true” in the space of mathematical statements.
In both cases the observer will in effect involve many separate underlying elements (ultimately, emes). But in order to maintain the observer’s view of having a coherent identity, the observer must somehow conflate all these elements, effectively treating them as “the same”. In physics, this means “coarse-graining” across physical or branchial (or, in fact, rulial) space. In mathematics, this means “coarse-graining” across metamathematical space—or in effect treating different mathematical statements as “the same”.
In practice, there are several ways this happens. First of all, one tends to be more concerned about mathematical results than their proofs, so two statements that have the same form can be considered the same even if the proofs (or other processes) that generated them are different (and indeed this is something we have routinely done in constructing entailment cones here). But there’s more. One can also imagine that any statements that entail each other can be considered “the same”.
In a simple case, this means that if x entails y and y entails x then one can always treat x and y as the same. But there’s a much more general version of this embodied in the univalence axiom of homotopy type theory—that in our terms can be interpreted as saying that mathematical observers consider equivalent things the same.
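As a toy illustration of this conflation (a sketch, with statement names s1…s4 and entailment edges invented purely for illustration), one can check mutual entailment by graph reachability in Python; statements that entail each other are the ones an observer treats as a single thing:

```python
# Hypothetical statements and entailment edges, invented for illustration
entails = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s4"], "s4": []}

def reaches(x, y):
    """Is there an entailment path from statement x to statement y?"""
    seen, stack = set(), [x]
    while stack:
        cur = stack.pop()
        if cur == y:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(entails[cur])
    return False

def mutually_entail(a, b):
    """Mutual entailment: the observer conflates a and b as 'the same'."""
    return reaches(a, b) and reaches(b, a)

print(mutually_entail("s1", "s2"))   # s1 and s2 entail each other
print(mutually_entail("s2", "s3"))   # only one-way entailment
```

The equivalence classes this induces are just the strongly connected components of the entailment graph.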
There’s another way that mathematical observers conflate different statements—that’s in many ways more important, but less formal. As we mentioned above, when mathematicians talk, say, about the Pythagorean theorem, they typically think they have a definite concept in mind. But at the axiomatic level—and even more so at the level of emes—there are a huge number of different “metamathematical configurations” that are all “considered the same” by the typical working mathematician, or by our “mathematical observer”. (At the level of axioms, there might be different axiom systems for real numbers; at the level of emes there might be different ways of representing concepts like addition or equality.)
In a sense we can think of mathematical observers as having a certain “extent” in metamathematical space. And much like human-scale physical observers see only the aggregate effects of huge numbers of atoms of space, so also mathematical observers see only the “aggregate effects” of huge numbers of emes of metamathematical space.
But now the key question is whether a “whole mathematical observer” can “move in metamathematical space” as a single “rigid” entity, or whether it will inevitably be distorted—or shredded—by the structure of metamathematical space. In the next section we’ll discuss the analog of gravity—and curvature—in metamathematical space. But our physicalized approach tends to suggest that in “most” of metamathematical space, a typical mathematical observer will be able to “move around freely”, implying that there will indeed be paths or “bridges” between different areas of mathematics, that involve only higherlevel mathematical constructs, and don’t require dropping down to the level of emes and the raw ruliad.
If metamathematical space is like physical space, does that mean that it has analogs of gravity, and relativity? The answer seems to be “yes”—and these provide our next examples of physicalized laws of mathematics.
In the end, we’re going to be able to talk about at least gravity in a largely “static” way, referring mostly to the “instantaneous state of metamathematics”, captured as an entailment fabric. But in leveraging ideas from physics, it’s important to start off formulating things in terms of the analog of time for metamathematics—which is entailment.
As we’ve discussed above, the entailment cone is the direct analog of the light cone in physics. Starting with some mathematical statement (or, more accurately, some event that transforms it) the forward entailment cone contains all statements (or, more accurately, events) that follow from it. Any possible “instantaneous state of metamathematics” then corresponds to a “transverse slice” through this entailment cone—with the slice in effect being laid out in metamathematical space.
An individual entailment of one statement by another corresponds to a path in the entailment cone, and this path (or, more accurately for accumulative evolution, subgraph) can be thought of as a proof of one statement given another. And in these terms the shortest proof can be thought of as a geodesic in the entailment cone. (In practical mathematics, it’s very unlikely one will find—or care about—the strictly shortest proof. But even having a “fairly short proof” will be enough to give the general conclusions we’ll discuss here.)
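The “shortest proof as geodesic” picture can be made concrete in miniature. Here is a sketch in Python (the original works with symbolic expressions; the two string-rewrite rules below are hypothetical stand-ins for laws of inference), using breadth-first search to find a geodesic path in a toy entailment cone:

```python
from collections import deque

# Two hypothetical string-rewrite rules standing in for laws of inference
RULES = [("A", "AB"), ("B", "A")]

def successors(s):
    """All statements reachable from s by one rewrite event."""
    out = set()
    for lhs, rhs in RULES:
        for i in range(len(s)):
            if s.startswith(lhs, i):
                out.add(s[:i] + rhs + s[i + len(lhs):])
    return out

def shortest_proof(start, goal, max_len=8):
    """Breadth-first search: a shortest path in the entailment cone,
    i.e. a 'geodesic' proof of goal starting from start."""
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        s, path = queue.popleft()
        if s == goal:
            return path
        for t in sorted(successors(s)):
            if t not in seen and len(t) <= max_len:
                seen.add(t)
                queue.append((t, path + [t]))
    return None

print(shortest_proof("A", "AAA"))   # → ['A', 'AB', 'AA', 'AAB', 'AAA']
```

Here no proof of "AAA" from "A" can take fewer than four rewrite events, so the path found is a genuine geodesic in this toy cone.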
Given a path in the entailment cone, we can imagine projecting it onto a transverse slice, i.e. onto an entailment fabric. Being able to consistently do this depends on having a certain uniformity in the entailment cone, and in the sequence of “metamathematical hypersurfaces” that are defined by whatever “metamathematical reference frame” we’re using. But assuming, for example, that underlying computational irreducibility successfully generates a kind of “statistical uniformity” that cannot be “decoded” by the observer, we can expect to have meaningful paths—and geodesics—on entailment fabrics.
But what these geodesics are like then depends on the emergent geometry of entailment fabrics. In physics, the limiting geometry of the analog of this for physical space is presumably a fairly simple 3D manifold. For branchial space, it’s more complicated, probably for example being “exponential-dimensional”. And for metamathematics, the limiting geometry is also undoubtedly more complicated—and almost certainly exponential-dimensional.
We’ve argued that we expect metamathematical space to have a certain perceived uniformity. But what will affect this, and therefore potentially modify the local geometry of the space? The basic answer is exactly the same as in our Physics Project. If there’s “more activity” somewhere in an entailment fabric, this will in effect lead to “more local connections”, and thus effective “positive local curvature” in the emergent geometry of the network. Needless to say, exactly what “more activity” means is somewhat subtle, especially given that the fabric in which one is looking for this is itself defining the ambient geometry, measures of “area”, etc.
In our Physics Project we make things more precise by associating “activity” with energy density, and saying that energy effectively corresponds to the flux of causal edges through spacelike hypersurfaces. So this suggests that we think about an analog of energy in metamathematics: essentially defining it to be the density of update events in the entailment fabric. Or, put another way, energy in metamathematics depends on the “density of proofs” going through a region of metamathematical space, i.e. involving particular “nearby” mathematical statements.
There are lots of caveats, subtleties and details. But the notion that “activity AKA energy” leads to increasing curvature in an emergent geometry is a general feature of the whole multicomputational paradigm that the ruliad captures. And in fact we expect a quantitative relationship between energy density (or, strictly, energy-momentum) and induced curvature of the “transversal space”—that corresponds exactly to Einstein’s equations in general relativity. It’ll be more difficult to see this in the metamathematical case because metamathematical space is geometrically more complicated—and less familiar—than physical space.
But even at a qualitative level, it seems very helpful to think in terms of physics and spacetime analogies. The basic phenomenon is that geodesics are deflected by the presence of “energy”, in effect being “attracted to it”. And this is why we can think of regions of higher energy (or energy-momentum/mass)—in physics and in metamathematics—as “generating gravity”, and deflecting geodesics towards them. (Needless to say, in metamathematics, as in physics, the vast majority of overall activity is just devoted to knitting together the structure of space, and when gravity is produced, it’s from slightly increased activity in a particular region.)
(In our Physics Project, a key result is that the same kind of dependence of “spatial” structure on energy happens not only in physical space, but also in branchial space—where there’s a direct analog of general relativity that basically yields the path integral of quantum mechanics.)
What does this mean in metamathematics? Qualitatively, the implication is that “proofs will tend to go through where there’s a higher density of proofs”. Or, in an analogy, if you want to drive from one place to another, it’ll be more efficient if you can do at least part of your journey on a freeway.
One question to ask about metamathematical space is whether one can always get from any place to any other. In other words, starting from one area of mathematics, can one somehow derive all others? A key issue here is whether the area one starts from is computation universal. Propositional logic is not, for example. So if one starts from it, one is essentially trapped, and cannot reach other areas.
But results in mathematical logic have established that most traditional areas of axiomatic mathematics are in fact computation universal (and the Principle of Computational Equivalence suggests that this will be ubiquitous). And given computation universality there will at least be some “proof path”. (In a sense this is a reflection of the fact that the ruliad is unique, so everything is connected in “the same ruliad”.)
But a big question is whether the “proof path” is “big enough” to be appropriate for a “mathematical observer like us”. Can we expect to get from one part of metamathematical space to another without the observer being “shredded”? Will we be able to start from any of a whole collection of places in metamathematical space that are considered “indistinguishably nearby” to a mathematical observer and have all of them “move together” to reach our destination? Or will different specific starting points follow quite different paths—preventing us from having a high-level (“fluid dynamics”) description of what’s going on, and instead forcing us to drop down to the “molecular dynamics” level?
In practical pure mathematics, this tends to be an issue of whether there is an “elegant proof using high-level concepts”, or whether one has to drop down to a very detailed level that’s more like low-level computer code, or the output of an automated theorem proving system. And indeed there’s a very visceral sense of “shredding” in cases where one’s confronted with a proof that consists of page after page of “machine-like details”.
But there’s another point here as well. If one looks at an individual proof path, it can be computationally irreducible to find out where the path goes, and the question of whether it ever reaches a particular destination can be undecidable. But in most of the current practice of pure mathematics, one’s interested in “higher-level conclusions”, that are “visible” to a mathematical observer who doesn’t resolve individual proof paths.
Later we’ll discuss the dichotomy between explorations of computational systems that routinely run into undecidability—and the typical experience of pure mathematics, where undecidability is rarely encountered in practice. But the basic point is that what a typical mathematical observer sees is at the “fluid dynamics level”, where the potentially circuitous path of some individual molecule is not relevant.
Of course, by asking specific questions—about metamathematics, or, say, about very specific equations—it’s still perfectly possible to force tracing of individual “low-level” proof paths. But this isn’t what’s typical in current pure mathematical practice. And in a sense we can see this as an extension of our first physicalized law of mathematics: not only is higher-level mathematics possible, but it’s ubiquitously so, with the result that, at least in terms of the questions a mathematical observer would readily formulate, phenomena like undecidability are not generically seen.
But even though undecidability may not be directly visible to a mathematical observer, its underlying presence is still crucial in coherently “knitting together” metamathematical space. Because without undecidability, we won’t have computation universality and computational irreducibility. But—just like in our Physics Project—computational irreducibility is crucial in producing the low-level apparent randomness that is needed to support any kind of “continuum limit” that allows us to think of large collections of what are ultimately discrete emes as building up some kind of coherent geometrical space.
And when undecidability is not present, one will typically not end up with anything like this kind of coherent space. An extreme example occurs in rewrite systems that eventually terminate—in the sense that they reach a “fixed-point” (or “normal form”) state where no more transformations can be applied.
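A minimal sketch of such a terminating system, with rules invented for illustration: because every rule strictly shortens the string, each rewrite path must reach a normal form where no transformation applies.

```python
# Rules invented for illustration; each strictly shortens the string,
# so rewriting always terminates in a fixed point ("normal form")
RULES = [("AB", "A"), ("BA", "B")]

def step(s):
    """Apply the first applicable rule, or return None at a fixed point."""
    for lhs, rhs in RULES:
        i = s.find(lhs)
        if i != -1:
            return s[:i] + rhs + s[i + len(lhs):]
    return None

def normal_form(s):
    """Rewrite until no rule applies."""
    while (t := step(s)) is not None:
        s = t
    return s

print(normal_form("ABBA"))   # → AA
```

Reaching a normal form here is decidable for trivial reasons; it is precisely this kind of guaranteed termination that cuts off the open-ended paths needed to knit together a coherent space.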
In our Physics Project, this kind of termination can be interpreted as a spacelike singularity at which “time stops” (as at the center of a non-rotating black hole). But in general decidability is associated with “limits on how far paths can go”—just like the limits on causal paths associated with event horizons in physics.
There are many details to work out, but the qualitative picture can be developed further. In physics, the singularity theorems imply that in essence the eventual formation of spacetime singularities is inevitable. And there should be a direct analog in our context that implies the eventual formation of “metamathematical singularities”. In qualitative terms, we can expect that the presence of proof density (which is the analog of energy) will “pull in” more proofs until eventually there are so many proofs that one has decidability and a “proof event horizon” is formed.
In a sense this implies that the long-term future of mathematics is strangely similar to the long-term future of our physical universe. In our physical universe, we expect that while the expansion of space may continue, many parts of the universe will form black holes and essentially be “closed off”. (At least ignoring expansion in branchial space, and quantum effects in general.)
The analog of this in mathematics is that while there can be continued overall expansion in metamathematical space, more and more parts of it will “burn out” because they’ve become decidable. In other words, as more work and more proofs get done in a particular area, that area will eventually be “finished”—and there will be no more “open-ended” questions associated with it.
In physics there’s sometimes discussion of white holes, which are imagined to effectively be time-reversed black holes, spewing out all possible material that could be captured in a black hole. In metamathematics, a white hole is like a statement that is false and therefore “leads to an explosion”. The presence of such an object in metamathematical space will in effect cause observers to be shredded—making it inconsistent with the coherent construction of higher-level mathematics.
We’ve talked at some length about the “gravitational” structure of metamathematical space. But what about seemingly simpler things like special relativity? In physics, there’s a notion of basic, flat spacetime, for which it’s easy to construct families of reference frames, and in which parallel trajectories stay parallel. In metamathematics, the analog is presumably metamathematical space in which “parallel proof geodesics” remain “parallel”—so that in effect one can continue “making progress in mathematics” by just “keeping on doing what you’ve been doing”.
And somehow relativistic invariance is associated with the idea that there are many ways to do math, but in the end they’re all able to reach the same conclusions. Ultimately this is something one expects as a consequence of fundamental features of the ruliad—and the inevitability of causal invariance in it resulting from the Principle of Computational Equivalence. It’s also something that might seem quite familiar from practical mathematics and, say, from the ability to do derivations using different methods—like from either geometry or algebra—and yet still end up with the same conclusions.
So if there’s an analog of relativistic invariance, what about analogs of phenomena like time dilation? In our Physics Project time dilation has a rather direct interpretation. To “progress in time” takes a certain amount of computational work. But motion in effect also takes a certain amount of computational work—in essence to continually recreate versions of something in different places. But from the ruliad on up there is ultimately only a certain amount of computational work that can be done—and if computational work is being “used up” on motion, there is less available to devote to progress in time, and so time will effectively run more slowly, leading to the experience of time dilation.
So what is the metamathematical analog of this? Presumably it’s that when you do derivations in math you can either stay in one area and directly make progress in that area, or you can “base yourself in some other area” and make progress only by continually translating back and forth. But ultimately that translation process will take computational work, and so will slow down your progress—leading to an analog of time dilation.
In physics, the speed of light defines the maximum amount of motion in space that can occur in a certain amount of time. In metamathematics, the analog is that there’s a maximum “translation distance” in metamathematical space that can be “bridged” with a certain amount of derivation. In physics we’re used to measuring spatial distance in meters—and time in seconds. In metamathematics we don’t yet have familiar units in which to measure, say, distance between mathematical concepts—or, for that matter, “amount of derivation” being done. But with the empirical metamathematics we’ll discuss in the next section we actually have the beginnings of a way to define such things, and to use what’s been achieved in the history of human mathematics to at least imagine “empirically measuring” what we might call “maximum metamathematical speed”.
It should be emphasized that we are only at the very beginning of exploring things like the analogs of relativity in metamathematics. One important piece of formal structure that we haven’t really discussed here is causal dependence, and causal graphs. We’ve talked at length about statements entailing other statements. But we haven’t talked about questions like which part of which statement is needed for some event to occur that will entail some other statement. And—while there’s no fundamental difficulty in doing it—we haven’t concerned ourselves with constructing causal graphs to represent causal relationships and causal dependencies between events.
When it comes to physical observers, there is a very direct interpretation of causal graphs that relates to what a physical observer can experience. But for mathematical observers—where the notion of time is less central—it’s less clear just what the interpretation of causal graphs should be. But one certainly expects that they will enter in the construction of any general “observer theory” that characterizes “observers like us” across both physics and mathematics.
We’ve discussed the overall structure of metamathematical space, and the general kind of sampling that we humans do of it (as “mathematical observers”) when we do mathematics. But what can we learn from the specifics of human mathematics, and the actual mathematical statements that humans have published over the centuries?
We might imagine that these statements are just ones that—as “accidents of history”—humans have “happened to find interesting”. But there’s definitely more to it—and potentially what’s there is a rich source of “empirical data” relevant to our physicalized laws of mathematics, and to what amounts to their “experimental validation”.
The situation with “human settlements” in metamathematical space is in a sense rather similar to the situation with human settlements in physical space. If we look at where humans have chosen to live and build cities, we’ll find a bunch of locations in 3D space. The details of where these are depend on history and many factors. But there’s a clear overarching theme, that’s in a sense a direct reflection of underlying physics: all the locations lie on the more-or-less spherical surface of the Earth.
It’s not so straightforward to see what’s going on in the metamathematical case, not least because any notion of coordinatization seems to be much more complicated for metamathematical space than for physical space. But we can still begin by doing “empirical metamathematics” and asking questions about for example what amounts to where in metamathematical space we humans have so far established ourselves. And as a first example, let’s consider Boolean algebra.
Even to talk about something called “Boolean algebra” we have to be operating at a level far above the raw ruliad—where we’ve already implicitly aggregated vast numbers of emes to form notions of, for example, variables and logical operations.
But once we’re at this level we can “survey” metamathematical space just by enumerating possible symbolic statements that can be created using the operations we’ve set up for Boolean algebra (here And ∧, Or ∨ and Not ¬):
But so far these are just raw, structural statements. To connect with actual Boolean algebra we must pick out which of these can be derived from the axioms of Boolean algebra, or, put another way, which of them are in the entailment cone of these axioms:
Of all possible statements, it’s only an exponentially small fraction that turn out to be derivable:
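The flavor of this computation can be reproduced in miniature. The following sketch is in Python rather than the Wolfram Language used for the figures; it enumerates expressions in p and q built from And, Or and Not up to depth 1, then counts which equations between them are tautologically valid—which, by the completeness of the Boolean algebra axioms, coincides with being derivable:

```python
from itertools import product

VARS = ("p", "q")

def exprs(depth):
    """All expressions over VARS using And, Or, Not, up to the given depth."""
    if depth == 0:
        return [("var", v) for v in VARS]
    smaller = exprs(depth - 1)
    out = list(smaller)
    out += [("not", a) for a in smaller]
    out += [(op, a, b) for op in ("and", "or") for a in smaller for b in smaller]
    return out

def evaluate(e, env):
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag == "not":
        return not evaluate(e[1], env)
    a, b = evaluate(e[1], env), evaluate(e[2], env)
    return (a and b) if tag == "and" else (a or b)

def valid_equation(a, b):
    """Is the statement 'a = b' true under every truth assignment?"""
    return all(evaluate(a, dict(zip(VARS, vals))) == evaluate(b, dict(zip(VARS, vals)))
               for vals in product([False, True], repeat=len(VARS)))

es = exprs(1)
pairs = [(a, b) for a in es for b in es]
valid = sum(valid_equation(a, b) for a, b in pairs)
print(f"{valid} of {len(pairs)} candidate equations are derivable")   # 28 of 144
```

At larger depths the derivable fraction shrinks rapidly, giving a small-scale picture of the exponentially small fraction of statements that are theorems.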
But in the case of Boolean algebra, we can readily collect such statements:
We’ve typically explored entailment cones by looking at slices consisting of collections of theorems generated after a specified number of proof steps. But here we’re making a very different sampling of the entailment cone—looking in effect instead at theorems in order of their structural complexity as symbolic expressions.
In doing this kind of systematic enumeration we’re in a sense operating at a “finer level of granularity” than typical human mathematics. Yes, these are all “true theorems”. But mostly they’re not theorems that a human mathematician would ever write down, or specifically “consider interesting”. And for example only a small fraction of them have historically been given names—and are called out in typical logic textbooks:
The reduction from all “structurally possible” theorems to just “ones we consider interesting” can be thought of as a form of coarse graining. And it could well be that this coarse graining would depend on all sorts of accidents of human mathematical history. But at least in the case of Boolean algebra there seems to be a surprisingly simple and “mechanical” procedure that can reproduce it.
Go through all theorems in order of increasing structural complexity, in each case seeing whether a given theorem can be proved from ones earlier in the list:
It turns out that the theorems identified by humans as “interesting” coincide almost exactly with “root theorems” that cannot be proved from earlier theorems in the list. Or, put another way, the “coarse graining” that human mathematicians do seems (at least in this case) to essentially consist of picking out only those theorems that represent “minimal statements” of new information—and eliding away those that involve “extra ornamentation”.
But how are these “notable theorems” laid out in metamathematical space? Earlier we saw how the simplest of them can be reached after just a few steps in the entailment cone of a typical textbook axiom system for Boolean algebra. The full entailment cone rapidly gets unmanageably large but we can get a first approximation to it by generating individual proofs (using automated theorem proving) of our notable theorems, and then seeing how these “knit together” through shared intermediate lemmas in a token-event graph:
Looking at this picture we see at least a hint that clumps of notable theorems are spread out across the entailment cone, only modestly building on each other—and in effect “staking out separated territories” in the entailment cone. But of the 11 notable theorems shown here, 7 depend on all 6 axioms, while 4 depend only on various different sets of 3 axioms—suggesting at least a certain amount of fundamental interdependence or coherence.
From the token-event graph we can derive a branchial graph that represents a very rough approximation to how the theorems are “laid out in metamathematical space”:
✕

We can get a potentially slightly better approximation by including proofs not just of notable theorems, but of all theorems up to a certain structural complexity. The result shows separation of notable theorems both in the multiway graph
✕

and in the branchial graph:
✕

In doing this empirical metamathematics we’re including only specific proofs rather than enumerating the whole entailment cone. We’re also using only a specific axiom system. And even beyond this, we’re using specific operators to write our statements in Boolean algebra.
In a sense each of these choices represents a particular “metamathematical coordinatization”—or particular reference frame or slice that we’re sampling in the ruliad.
For example, in what we’ve done above we’ve built up statements from And, Or and Not. But we can just as well use any other functionally complete set of operators, such as the following (here each shown representing a few specific Boolean expressions):
✕
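Functional completeness itself is something that can be checked mechanically: represent each Boolean function of p and q by its 4-entry truth table, and close the set {p, q} under the candidate operators. A minimal sketch (a standard check, not anything specific from the text above):

```python
def closure(ops):
    # Truth tables over (p, q), indexed by (p, q) = (0,0), (0,1), (1,0), (1,1)
    p, q = (0, 0, 1, 1), (0, 1, 0, 1)
    reached = {p, q}
    while True:
        new = {tuple(op(a, b) for a, b in zip(x, y))
               for op in ops for x in reached for y in reached} - reached
        if not new:
            return reached
        reached |= new

nand = lambda a, b: 1 - (a & b)

# Nand alone generates every binary Boolean function of p and q...
print(len(closure([nand])))
# ...whereas And and Or together reach only a few (they can't express Not)
print(len(closure([lambda a, b: a & b, lambda a, b: a | b])))
```

The first closure has all 16 possible truth tables; the second stalls at the 4 elements p, q, p∧q, p∨q, which is why {And, Or} is not functionally complete.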

For each set of operators, there are different axiom systems that can be used. And for each axiom system there will be different proofs. Here are a few examples of axiom systems with a few different sets of operators—in each case giving a proof of the law of double negation (which has to be stated differently for different operators):
✕

Boolean algebra (or, equivalently, propositional logic) is a somewhat desiccated and thin example of mathematics. So what do we find if we do empirical metamathematics on other areas?
Let’s talk first about geometry—for which Euclid’s Elements provided the very first large-scale historical example of an axiomatic mathematical system. The Elements started from 10 axioms (5 “postulates” and 5 “common notions”), then gave 465 theorems.
Each theorem was proved from previous ones, and ultimately from the axioms. Thus, for example, the “proof graph” (or “theorem dependency graph”) for Book 1, Proposition 5 (which says that angles at the base of an isosceles triangle are equal) is:
✕

One can think of this as a coarse-grained version of the proof graphs we’ve used before (which are themselves in turn “slices” of the entailment graph)—in which each node shows how a collection of “input” theorems (or axioms) entails a new theorem.
Here’s a slightly more complicated example (Book 1, Proposition 48) that ultimately depends on all 10 of the original axioms:
✕

And here’s the full graph for all the theorems in Euclid’s Elements:
✕

Of the 465 theorems here, 255 (i.e. 55%) depend on all 10 axioms. (For the much smaller number of notable theorems of Boolean algebra above we found that 64% depended on all 6 of our stated axioms.) And the general connectedness of this graph in effect reflects the idea that Euclid’s theorems represent a coherent body of connected mathematical knowledge.
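Statistics like “what fraction of theorems depend on all the axioms” come from a transitive closure over the dependency graph. Here’s a minimal Python sketch, using a made-up miniature dependency graph (the names are purely illustrative):

```python
# Hypothetical miniature "Elements": each result lists what it directly cites
deps = {
    'axiom1': [], 'axiom2': [], 'axiom3': [],
    'prop1': ['axiom1', 'axiom2'],
    'prop2': ['axiom2', 'axiom3'],
    'prop3': ['prop1', 'prop2'],   # transitively rests on all three axioms
}

def axioms_used(thm, cache=None):
    # Transitively collect the axioms a theorem ultimately depends on
    if cache is None:
        cache = {}
    if thm not in cache:
        if not deps[thm]:          # nothing cited: it's an axiom
            cache[thm] = {thm}
        else:
            cache[thm] = set().union(*(axioms_used(d, cache)
                                       for d in deps[thm]))
    return cache[thm]

axioms = [t for t in deps if not deps[t]]
full = [t for t in deps if deps[t] and axioms_used(t) == set(axioms)]
print(full)   # the theorems that depend on every axiom
```

Running the same closure over a real corpus (Euclid’s 465 propositions, say) is just a matter of substituting the actual dependency lists.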
The branchial graph gives us an idea of how the theorems are “laid out in metamathematical space”:
✕

One thing we notice is that theorems about different areas—shown here in different colors—tend to be separated in metamathematical space. And in a sense the seeds of this separation are already evident if we look “textually” at how theorems in different books of Euclid’s Elements refer to each other:
✕

Looking at the overall dependence of one theorem on others in effect shows us a very coarse form of entailment. But can we go to a finer level—as we did above for Boolean algebra? As a first step, we have to have an explicit symbolic representation for our theorems. And beyond that, we have to have a formal axiom system that describes possible transformations between these.
At the level of “whole theorem dependency” we can represent the entailment of Euclid’s Book 1, Proposition 1 from axioms as:
✕

But if we now use the full, formal axiom system for geometry that we discussed in a previous section we can use automated theorem proving to get a full proof of Book 1, Proposition 1:
✕

In a sense this is “going inside” the theorem dependency graph to look explicitly at how the dependencies in it work. And in doing this we see that what Euclid might have stated in words in a sentence or two is represented formally in terms of hundreds of detailed intermediate lemmas. (It’s also notable that whereas in Euclid’s version, the theorem depends only on 3 out of 10 axioms, in the formal version the theorem depends on 18 out of 20 axioms.)
How about for other theorems? Here is the theorem dependency graph from Euclid’s Elements for the Pythagorean theorem (which Euclid gives as Book 1, Proposition 47):
✕

The theorem depends on all 10 axioms, and its stated proof goes through 28 intermediate theorems (i.e. about 6% of all theorems in the Elements). In principle we can “unroll” the proof dependency graph to see directly how the theorem can be “built up” just from copies of the original axioms. Doing a first step of unrolling we get:
✕

And by “flattening everything out”, so that we don’t use any intermediate lemmas but just go back to the axioms to “reprove” everything, we can derive the theorem from a “proof tree” with the following number of copies of each axiom (and a certain “depth” to reach that axiom):
✕
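The unrolled axiom counts and depths can be computed directly from a dependency graph: in the fully unrolled proof tree the number of copies of each axiom a theorem needs is the sum of the counts for its premises, and the depth is the longest citation chain back to that axiom. A toy sketch with hypothetical theorem names:

```python
from collections import Counter

# Hypothetical proof DAG: each theorem lists the premises its proof cites
deps = {
    'A1': [], 'A2': [],
    'L1': ['A1', 'A2'],
    'L2': ['L1', 'A2'],
    'T':  ['L1', 'L2'],
}

def axiom_copies(thm):
    # Copies of each axiom in the fully unrolled ("flattened") proof tree
    if not deps[thm]:
        return Counter({thm: 1})
    total = Counter()
    for d in deps[thm]:
        total += axiom_copies(d)
    return total

def axiom_depth(thm, depth=0):
    # Maximum depth at which each axiom is reached in the unrolled tree
    if not deps[thm]:
        return {thm: depth}
    out = {}
    for d in deps[thm]:
        for ax, dd in axiom_depth(d, depth + 1).items():
            out[ax] = max(out.get(ax, 0), dd)
    return out

print(dict(axiom_copies('T')), axiom_depth('T'))
```

The naive recursion deliberately re-expands shared lemmas, which is exactly what “unrolling” means; for a large corpus one would memoize over the DAG, since the counts themselves grow combinatorially.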

So how about a more detailed and formal proof? We could certainly in principle construct this using the axiom system we discussed above.
But an important general point is that the thing we in practice call “the Pythagorean theorem” can actually be set up in all sorts of different axiom systems. And as an example let’s consider setting it up in the main actual axiom system that working mathematicians typically imagine they’re (usually implicitly) using, namely ZFC set theory.
Conveniently, the Metamath formalized math system has accumulated about 40,000 theorems across mathematics, all with hand-constructed proofs based ultimately on ZFC set theory. And within this system we can find the theorem dependency graph for the Pythagorean theorem:
✕

Altogether it involves 6970 intermediate theorems, or about 18% of all theorems in Metamath—including ones from many different areas of mathematics. But how does it ultimately depend on the axioms? First, we need to talk about what the axioms actually are. In addition to “pure ZFC set theory”, we need axioms for (predicate) logic, as well as ones that define real and complex numbers. And the way things are set up in Metamath’s “set.mm” there are (essentially) 49 basic axioms (9 for pure set theory, 15 for logic and 25 related to numbers). And much as in Euclid’s Elements we found that the Pythagorean theorem depended on all the axioms, so now here we find that the Pythagorean theorem depends on 48 of the 49 axioms—with the only missing axiom being the Axiom of Choice.
Just as in the case of Euclid’s Elements, we can imagine “unrolling” things to see how many copies of each axiom are used. Here are the results—together with the “depth” to reach each axiom:
✕

And, yes, the numbers of copies of most of the axioms required to establish the Pythagorean theorem are extremely large.
There are several additional wrinkles that we should discuss. First, we’ve so far only considered overall theorem dependency—or in effect “coarse-grained entailment”. But the Metamath system ultimately gives complete proofs in terms of explicit substitutions (or, effectively, bisubstitutions) on symbolic expressions. So, for example, while the first-level “whole-theorem-dependency” graph for the Pythagorean theorem is
✕

the full first-level entailment structure based on the detailed proof is (where the black vertices indicate “internal structural elements” in the proof—such as variables, class specifications and “inputs”):
✕

Another important wrinkle has to do with the concept of definitions. The Pythagorean theorem, for example, refers to squaring numbers. But what is squaring? What are numbers? Ultimately all these things have to be defined in terms of the “raw data structures” we’re using.
In the case of Boolean algebra, for example, we could set things up just using Nand (say denoted ∘), but then we could define And and Or in terms of Nand (say as (p∘q)∘(p∘q) and (p∘p)∘(q∘q) respectively). We could still write expressions using And and Or—but with our definitions we’d immediately be able to convert these to pure Nands. Axioms—say about Nand—give us transformations we can use repeatedly to make derivations. But definitions are transformations we use “just once” (like macro expansion in programming) to reduce things to the point where they involve only constructs that appear in the axioms.
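This one-shot expansion is easy to make concrete. The sketch below (my own illustration, using the standard identities p∧q = (p∘q)∘(p∘q) and p∨q = (p∘p)∘(q∘q) for Nand written as ∘) rewrites an expression into pure Nands in a single recursive pass, then checks that the truth table is preserved:

```python
from itertools import product

def expand(e):
    # One-shot "macro expansion": rewrite And/Or in terms of Nand
    if isinstance(e, str):
        return e
    args = [expand(a) for a in e[1:]]
    if e[0] == 'and':             # p∧q → (p∘q)∘(p∘q)
        n = ('nand', args[0], args[1])
        return ('nand', n, n)
    if e[0] == 'or':              # p∨q → (p∘p)∘(q∘q)
        return ('nand', ('nand', args[0], args[0]),
                        ('nand', args[1], args[1]))
    return (e[0], *args)

def ev(e, env):
    if isinstance(e, str):
        return env[e]
    vals = [ev(a, env) for a in e[1:]]
    return {'and': vals[0] and vals[1],
            'or': vals[0] or vals[1],
            'nand': not (vals[0] and vals[1])}[e[0]]

expr = ('or', ('and', 'p', 'q'), 'q')
pure = expand(expr)

# the expansion preserves the truth table
equivalent = all(ev(expr, {'p': p, 'q': q}) == ev(pure, {'p': p, 'q': q})
                 for p, q in product([False, True], repeat=2))
print(pure, equivalent)
```

Unlike an axiom, `expand` is applied exactly once; after it runs, only Nand appears, and all further derivation would proceed with Nand axioms alone.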
In Metamath’s “set.mm” there are about 1700 definitions that effectively build up from “pure set theory” (as well as logic, structural elements and various axioms about numbers) to give the mathematical constructs one needs. So, for example, here is the definition dependency graph for addition (“+” or Plus):
✕

At the bottom are the basic constructs of logic and set theory—in terms of which things like order relations, complex numbers and finally addition are defined. The definition dependency graph for GCD, for example, is somewhat larger, though it has considerable overlap at lower levels:
✕

Different constructs have definition dependency graphs of different sizes—in effect reflecting their “definitional distance” from set theory and the underlying axioms being used:
✕

In our physicalized approach to metamathematics, though, something like set theory is not our ultimate foundation. Instead, we imagine that everything is eventually built up from the raw ruliad, and that all the constructs we’re considering are formed from what amount to configurations of emes in the ruliad. We discussed above how constructs like numbers and logic can be obtained from a combinator representation of the ruliad.
We can view the definition dependency graph above as being an empirical example of how somewhat higherlevel definitions can be built up. From a computer science perspective, we can think of it as being like a type hierarchy. From a physics perspective, it’s as if we’re starting from atoms, then building up to molecules and beyond.
It’s worth pointing out, however, that even the top of the definition hierarchy in something like Metamath is still operating very much at an axiomatic kind of level. In the analogy we’ve been using, it’s still for the most part “formulating math at the molecular dynamics level”, not at the more human “fluid dynamics” level.
We’ve been talking about “the Pythagorean theorem”. But even on the basis of set theory there are many different possible formulations one can give. In Metamath, for example, there is the pythag version (which is what we’ve been using), and there is also a (somewhat more general) pythi version. So how are these related? Here’s their combined theorem dependency graph (or at least the first two levels in it)—with red indicating theorems used only in deriving pythag, blue indicating ones used only in deriving pythi, and purple indicating ones used in both:
✕

And what we see is there’s a certain amount of “lowerlevel overlap” between the derivations of these variants of the Pythagorean theorem, but also some discrepancy—indicating a certain separation between these variants in metamathematical space.
So what about other theorems? Here’s a table of some famous theorems from all over mathematics, sorted by the total number of theorems on which proofs of them formulated in Metamath depend—giving also the number of axioms and definitions used in each case:
✕

The Pythagorean theorem (here the pythi formulation) occurs solidly in the second half. Some of the theorems with the fewest dependencies are in a sense very structural theorems. But it’s interesting to see that theorems from all sorts of different areas soon start appearing, and then are very much mixed together in the remainder of the list. One might have thought that theorems involving “more sophisticated concepts” (like Ramsey’s theorem) would appear later than “more elementary” ones (like the sum of angles of a triangle). But this doesn’t seem to be true.
There’s a distribution of what amount to “proof sizes” (or, more strictly, theorem dependency sizes)—from the Schröder–Bernstein theorem, which relies on less than 4% of all theorems, to Dirichlet’s theorem, which relies on 25%:
✕

If we look not at “famous” theorems, but at all theorems covered by Metamath, the distribution becomes broader, with many short-to-prove “glue” or essentially “definitional” lemmas appearing:
✕

But using the list of famous theorems as an indication of the “math that mathematicians care about” we can conclude that there is a kind of “metamathematical floor” of results that one needs to reach before “things that we care about” start appearing. It’s a bit like the situation in our Physics Project—where the vast majority of microscopic events that happen in the universe seem to be devoted merely to knitting together the structure of space, and only “on top of that” can events which can be identified with things like particles and motion appear.
And if we look at the “prerequisites” for different famous theorems, we indeed find that there is a large overlap (indicated by lighter colors)—supporting the impression that in a sense one first has to “knit together metamathematical space”, and only then can one start generating “interesting theorems”:
✕

Another way to see “underlying overlap” is to look at what axioms different theorems ultimately depend on (the colors indicate the “depth” at which the axioms are reached):
✕

The theorems here are again sorted in order of “dependency size”. The “very-set-theoretic” ones at the top don’t depend on any of the various number-related axioms. And quite a few “integer-related theorems” don’t depend on complex number axioms. But otherwise, we see that (at least according to the proofs in set.mm) most of the “famous theorems” depend on almost all the axioms. The only axiom that’s rarely used is the Axiom of Choice—on which only things like “analysis-related theorems” such as the Fundamental Theorem of Calculus depend.
If we look at the “depth of proof” at which axioms are reached, there’s a definite distribution:
✕

And this may be about as robust a “statistical characteristic” as any of the sampling of metamathematical space corresponding to mathematics that is “important to humans”. If we were, for example, to consider all possible theorems in the entailment cone we’d get a very different picture. But potentially what we see here may be a characteristic signature of what’s important to a “mathematical observer like us”.
Going beyond “famous theorems” we can ask, for example, about all the 42,000 or so identified theorems in the Metamath set.mm collection. Here’s a rough rendering of their theorem dependency graph, with different colors indicating theorems in different fields of math (and with explicit edges removed):
✕

There’s some evidence of a certain overall uniformity, but we can see definite “patches of metamathematical space” dominated by different areas of mathematics. And here’s what happens if we zoom in on the central region, and show where famous theorems lie:
✕

A bit like we saw for the named theorems of Boolean algebra, clumps of famous theorems appear to somehow “stake out their own separate metamathematical territory”. But notably the famous theorems seem to show some tendency to congregate near “borders” between different areas of mathematics.
To get more of a sense of the relation between these different areas, we can make what amounts to a highly coarsened branchial graph, effectively laying out whole areas of mathematics in metamathematical space, and indicating their cross-connections:
✕

We can see “highways” between certain areas. But there’s also a definite “background entanglement” between areas, reflecting at least a certain background uniformity in metamathematical space, as sampled with the theorems identified in Metamath.
It’s not the case that all these areas of math “look the same”—and for example there are differences in their distributions of theorem dependency sizes:
✕

In areas like algebra and number theory, most proofs are fairly long, as revealed by the fact that they have many dependencies. But in set theory there are plenty of short proofs, and in logic all the proofs of theorems that have been included in Metamath are short.
What if we look at the overall dependency graph for all theorems in Metamath? Here’s the adjacency matrix we get:
✕

The results are triangular because theorems in the Metamath database are arranged so that later ones only depend on earlier ones. And while there’s considerable patchiness visible, there still seems to be a certain overall background level of uniformity.
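The triangular structure is what one gets whenever results are indexed so that each one cites only earlier ones: every dependency edge then points “backwards”, and the adjacency matrix is strictly lower triangular. A toy illustration:

```python
# Toy dependency list, ordered so each theorem cites only earlier ones
names = ['t0', 't1', 't2', 't3']
deps = {'t0': [], 't1': ['t0'], 't2': ['t0', 't1'], 't3': ['t1']}

index = {name: i for i, name in enumerate(names)}
n = len(names)
adj = [[0] * n for _ in range(n)]
for thm, premises in deps.items():
    for p in premises:
        adj[index[thm]][index[p]] = 1   # row = theorem, column = what it cites

# With this ordering, everything on or above the diagonal is zero
is_lower_triangular = all(adj[i][j] == 0
                          for i in range(n) for j in range(i, n))
print(is_lower_triangular)
```

Conversely, finding such an ordering for an arbitrary dependency graph is just a topological sort, which exists exactly when the graph has no cycles.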
In doing this empirical metamathematics we’re sampling metamathematical space just through particular “human mathematical settlements” in it. But even from the distribution of these “settlements” we potentially begin to see evidence of a certain background uniformity in metamathematical space.
Perhaps in time, as more connections between different areas of mathematics are found, human mathematics will gradually become more “uniformly settled” in metamathematical space—and closer to what we might expect from entailment cones and ultimately from the raw ruliad. But it’s interesting to see that even with fairly basic empirical metamathematics—operating on a current corpus of human mathematical knowledge—it may already be possible to see signs of some features of physicalized metamathematics.
One day, no doubt, we’ll be able to do experiments in physics that take our “parsing” of the physical universe in terms of things like space and time and quantum mechanics—and reveal “slices” of the raw ruliad underneath. But perhaps something similar will also be possible in empirical metamathematics: to construct what amounts to a metamathematical microscope (or telescope) through which we can see aspects of the ruliad.
It’s an old and oft-asked question: is mathematics ultimately something that is invented, or something that is discovered? Or, put another way: is mathematics something arbitrarily set up by us humans, or something inevitable and fundamental and in a sense “preexisting”, that we merely get to explore? In the past it’s seemed as if these were two fundamentally incompatible possibilities. But the framework we’ve built here in a sense blends them both into a rather unexpected synthesis.
The starting point is the idea that mathematics—like physics—is rooted in the ruliad, which is a representation of formal necessity. Actual mathematics as we “experience” it is—like physics—based on the particular sampling we make of the ruliad. But then the crucial point is that very basic characteristics of us as “observers” are sufficient to constrain that experience to be our general mathematics—or our physics.
At some level we can say that “mathematics is always there”—because every aspect of it is ultimately encoded in the ruliad. But in another sense we can say that the mathematics we have is all “up to us”—because it’s based on how we sample the ruliad. But the point is that that sampling is not somehow “arbitrary”: if we’re talking about mathematics for us humans then it’s us ultimately doing the sampling, and the sampling is inevitably constrained by general features of our nature.
A major discovery from our Physics Project is that it doesn’t take much in the way of constraints on the observer to deeply constrain the laws of physics they will perceive. And similarly we posit here that for “observers like us” there will inevitably be general (“physicalized”) laws of mathematics, that make mathematics inevitably have the general kinds of characteristics we perceive it to have (such as the possibility of doing mathematics at a high level, without always having to drop down to an “atomic” level).
Particularly over the past century there’s been the idea that mathematics can be specified in terms of axiom systems, and that these axiom systems can somehow be “invented at will”. But our framework does two things. First, it says that “far below” axiom systems is the raw ruliad, which in a sense represents all possible axiom systems. And second, it says that whatever axiom systems we perceive to be “operating” will be ones that we as observers can pick out from the underlying structure of the ruliad.
At a formal level we can “invent” an arbitrary axiom system (and it’ll be somewhere in the ruliad), but only certain axiom systems will be ones that describe what we as “mathematical observers” can perceive. In a physics setting we might construct some formal physical theory that talks about detailed patterns in the atoms of space (or molecules in a gas), but the kind of “coarsegrained” observations that we can make won’t capture these. Put another way, observers like us can perceive certain kinds of things, and can describe things in terms of these perceptions. But with the wrong kind of theory—or “axioms”—these descriptions won’t be sufficient—and only an observer who’s “shredded” down to a more “atomic” level will be able to track what’s going on.
There’s lots of different possible math—and physics—in the ruliad. But observers like us can only “access” a certain type. Some putative alien not like us might access a different type—and might end up with both a different math and a different physics. Deep underneath they—like us—would be talking about the ruliad. But they’d be taking different samples of it, and describing different aspects of it.
For much of the history of mathematics there was a close alignment between the mathematics that was done and what we perceive in the world. For example, Euclidean geometry—with its whole axiomatic structure—was originally conceived just as an idealization of geometrical things that we observe about the world. But by the late 1800s the idea had emerged that one could create “disembodied” axiomatic systems with no particular grounding in our experience in the world.
And, yes, there are many possible disembodied axiom systems that one can set up. And in doing ruliology and generally exploring the computational universe it’s interesting to investigate what they do. But the point is that this is something quite different from mathematics as mathematics is normally conceived. Because in a sense mathematics—like physics—is a “more human” activity that’s based on what “observers like us” make of the raw formal structure that is ultimately embodied in the ruliad.
When it comes to physics there are, it seems, two crucial features of “observers like us”. First, that we’re computationally bounded. And second, that we have the perception that we’re persistent—and have a definite and continuous thread of experience. At the level of atoms of space, we’re in a sense constantly being “remade”. But we nevertheless perceive it as always being the “same us”.
This single seemingly simple assumption has farreaching consequences. For example, it leads us to experience a single thread of time. And from the notion that we maintain a continuity of experience from every successive moment to the next we are inexorably led to the idea of a perceived continuum—not only in time, but also for motion and in space. And when combined with intrinsic features of the ruliad and of multicomputation in general, what comes out in the end is a surprisingly precise description of how we’ll perceive our universe to operate—that seems to correspond exactly with known core laws of physics.
What does that kind of thinking tell us about mathematics? The basic point is that—since in the end both relate to humans—there is necessarily a close correspondence between physical and mathematical observers. Both are computationally bounded. And the assumption of persistence in time for physical observers becomes for mathematical observers the concept of maintaining coherence as more statements are accumulated. And when combined with intrinsic features of the ruliad and multicomputation this then turns out to imply the kind of physicalized laws of mathematics that we’ve discussed.
In a formal axiomatic view of mathematics one just imagines that one invents axioms and sees their consequences. But what we’re describing here is a view of mathematics that is ultimately just about the ways that we as mathematical observers sample and experience the ruliad. And if we use axiom systems it has to be as a kind of “intermediate language” that helps us make a slightly higherlevel description of some corner of the raw ruliad. But actual “humanlevel” mathematics—like humanlevel physics—operates at a higher level.
Our everyday experience of the physical world gives us the impression that we have a kind of “direct access” to many foundational features of physics, like the existence of space and the phenomenon of motion. But our Physics Project implies that these are not concepts that are in any sense “already there”; they are just things that emerge from the raw ruliad when you “parse” it in the kinds of ways observers like us do.
In mathematics it’s less obvious (at least to all but perhaps experienced pure mathematicians) that there’s “direct access” to anything. But in our view of mathematics here, it’s ultimately just like physics—and ultimately also rooted in the ruliad, but sampled not by physical observers but by mathematical ones.
So from this point of view there’s just as much that’s “real” underneath mathematics as there is underneath physics. The mathematics is sampled slightly differently (though very similarly)—but we should not in any sense consider it “fundamentally more abstract”.
When we think of ourselves as entities within the ruliad, we can build up what we might consider a “fully abstract” description of how we get our “experience” of physics. And we can basically do the same thing for mathematics. So if we take the commonsense point of view that physics fundamentally exists “for real”, we’re forced into the same point of view for mathematics. In other words, if we say that the physical universe exists, so must we also say that in some fundamental sense, mathematics also exists.
It’s not something we as humans “just make”, but it is something that is made through our particular way of observing the ruliad, that is ultimately defined by our particular characteristics as observers, with our particular core assumptions about the world, our particular kinds of sensory experience, and so on.
So what can we say in the end about whether mathematics is “invented” or “discovered”? It is neither. Its underpinnings are the ruliad, whose structure is a matter of formal necessity. But its perceived form for us is determined by our intrinsic characteristics as observers. We neither get to “arbitrarily invent” what’s underneath, nor do we get to “arbitrarily discover” what’s already there. The mathematics we see is the result of a combination of formal necessity in the underlying ruliad, and the particular forms of perception that we—as entities like us—have. Putative aliens could have quite different mathematics, but not because the underlying ruliad is any different for them, but because their forms of perception might be different. And it’s the same with physics: even though they “live in the same physical universe” their perception of the laws of physics could be quite different.
When they were first developed in antiquity the axioms of Euclidean geometry were presumably intended basically as a kind of “tightening” of our everyday impressions of geometry—that would aid in being able to deduce what was true in geometry. But by the mid-1800s—between non-Euclidean geometry, group theory, Boolean algebra and quaternions—it had become clear that there was a range of abstract axiom systems one could in principle consider. And by the time of Hilbert’s program around 1900 the pure process of deduction was in effect being viewed as an end in itself—and indeed the core of mathematics—with axiom systems being seen as “starter material” pretty much just “determined by convention”.
In practice even today very few different axiom systems are ever commonly used—and indeed in A New Kind of Science I was able to list essentially all of them comfortably on a couple of pages. But why these axiom systems and not others? Despite the idea that axiom systems could ultimately be arbitrary, the concept was still that in studying some particular area of mathematics one should basically have an axiom system that would provide a “tight specification” of whatever mathematical object or structure one was trying to talk about. And so, for example, the Peano axioms are what became used for talking about arithmeticstyle operations on integers.
In 1931, however, Gödel’s theorem showed that actually these axioms weren’t strong enough to constrain one to be talking only about integers: there were also other possible models of the axiom system, involving all sorts of exotic “nonstandard arithmetic”. (And moreover, there was no finite way to “patch” this issue.) In other words, even though the Peano axioms had been invented—like Euclid’s axioms for geometry—as a way to describe a definite “intuitive” mathematical thing (in this case, integers) their formal axiomatic structure “had a life of its own” that extended (in some sense, infinitely) beyond its original intended purpose.
Both geometry and arithmetic in a sense had foundations in everyday experience. But for set theory dealing with infinite sets there was never an obvious intuitive base rooted in everyday experience. Some extrapolations from finite sets were clear. But in covering infinite sets various axioms (like the Axiom of Choice) were gradually added to capture what seemed like “reasonable” mathematical assertions.
But one example whose status for a long time wasn’t clear was the Continuum Hypothesis—which asserts that the “next distinct possible cardinality” after the cardinality ℵ₀ of the integers is 2^ℵ₀: the cardinality of real numbers (i.e. of “the continuum”). Was this something that followed from previously accepted axioms of set theory? And if it was added, would it even be consistent with them? In the early 1960s it was established that actually the Continuum Hypothesis is independent of the other axioms.
With the axiomatic view of the foundations of mathematics that’s been popular for the past century or so it seems as if one could, for example, just choose at will whether to include the Continuum Hypothesis (or its negation) as an axiom in set theory. But with the approach to the foundations of mathematics that we’ve developed here, this is no longer so clear.
Recall that in our approach, everything is ultimately rooted in the ruliad—with whatever mathematics observers like us “experience” just being the result of the particular sampling we do of the ruliad. And in this picture, axiom systems are a particular representation of fairly lowlevel features of the sampling we do of the raw ruliad.
If we could do any kind of sampling we want of the ruliad, then we’d presumably be able to get all possible axiom systems—as intermediatelevel “waypoints” representing different kinds of slices of the ruliad. But in fact by our nature we are observers capable of only certain kinds of sampling of the ruliad.
We could imagine “alien observers” not like us who could for example make whatever choice they want about the Continuum Hypothesis. But given our general characteristics as observers, we may be forced into a particular choice. Operationally, as we’ve discussed above, the wrong choice could, for example, be incompatible with an observer who “maintains coherence” in metamathematical space.
Let’s say we have a particular axiom stated in standard symbolic form. “Underneath” this axiom there will typically be at the level of the raw ruliad a huge cloud of possible configurations of emes that can represent the axiom. But an “observer like us” can only deal with a coarse-grained version in which all these different configurations are somehow considered equivalent. And if the entailments from “nearby configurations” remain nearby, then everything will work out, and the observer can maintain a coherent view of what’s going on, for example just in terms of symbolic statements about axioms.
But if instead different entailments of raw configurations of emes lead to very different places, the observer will in effect be “shredded”—and instead of having definite, coherent, “single-minded” things to say about what happens, they’ll have to separate everything into all the different cases for different configurations of emes—and won’t be able to come up with definite mathematical conclusions.
So what specifically can we say about the Continuum Hypothesis? It’s not clear. But conceivably we can start by thinking of ℵ₀ as characterizing the “base cardinality” of the ruliad, while ℵ₁ characterizes the base cardinality of a first-level hyperruliad that could for example be based on Turing machines with oracles for their halting problems. And it could be that for us to conclude that the Continuum Hypothesis is false, we’d have to somehow be straddling the ruliad and the hyperruliad, which would be inconsistent with us maintaining a coherent view of mathematics. In other words, the Continuum Hypothesis might somehow be equivalent to what we’ve argued before is in a sense the most fundamental “contingent fact”—that just as we live in a particular location in physical space—so also we live in the ruliad and not the hyperruliad.
We might have thought that whatever we might see—or construct—in mathematics would in effect be “entirely abstract” and independent of anything about physics, or our experience in the physical world. But particularly insofar as we’re thinking about mathematics as done by humans we’re dealing with “mathematical observers” that are “made of the same stuff” as physical observers. And this means that whatever general constraints or features exist for physical observers we can expect these to carry over to mathematical observers—so it’s no coincidence that both physical and mathematical observers have the same core characteristics, of computational boundedness and “assumption of coherence”.
And what this means is that there’ll be a fundamental correlation between things familiar from our experience in the physical world and what shows up in our mathematics. We might have thought that the fact that Euclid’s original axioms were based on our human perceptions of physical space would be a sign that in some “overall picture” of mathematics they should be considered arbitrary and not in any way central. But the point is that in fact our notions of space are central to our characteristics as observers. And so it’s inevitable that “physical-experience-informed” axioms like those for Euclidean geometry will be what appear in mathematics for “observers like us”.
How does the “size of mathematics” compare to the size of our physical universe? In the past this might have seemed like an absurd question, that tries to compare something abstract and arbitrary with something real and physical. But with the idea that both mathematics and physics as we experience them emerge from our sampling of the ruliad, it begins to seem less absurd.
At the lowest level the ruliad can be thought of as being made up of atoms of existence that we call emes. As physical observers we interpret these emes as atoms of space, or in effect the ultimate raw material of the physical universe. And as mathematical observers we interpret them as the ultimate elements from which the constructs of mathematics are built.
As the entangled limit of all possible computations, the whole ruliad is infinite. But we as physical or mathematical observers sample only limited parts of it. And that means we can meaningfully ask questions like how the number of emes in these parts compares—or, in effect, how big is physics as we experience it compared to mathematics.
In some ways an eme is like a bit. But the concept of emes is that they’re “actual atoms of existence”—from which “actual stuff” like the physical universe and its history are made—rather than just “static informational representations” of it. As soon as we imagine that everything is ultimately computational we are immediately led to start thinking of representing it in terms of bits. But the ruliad is not just a representation. It’s in some way something lower level. It’s the “actual stuff” that everything is made of. And what defines our particular experience of physics or of mathematics is the particular samples we as observers take of what’s in the ruliad.
So the question is now how many emes there are in those samples. Or, more specifically, how many emes “matter to us” in building up our experience.
Let’s return to an analogy we’ve used several times before: a gas made of molecules. In the volume of a room there might be individual molecules, each on average colliding every seconds. So that means that our “experience of the room” over the course of a minute or so might sample collisions. Or, in terms closer to our Physics Project, we might say that there are perhaps “collision events” in the causal graph that defines what we experience.
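The kind of estimate being made here can be sketched numerically. The figures below are generic textbook-scale values (molecule counts for a room, collision rates at ordinary conditions), chosen purely for illustration rather than taken from the text:

```python
import math

# Back-of-envelope count of "collision events" experienced in a room over
# a minute. All numbers are illustrative order-of-magnitude assumptions.
n_molecules = 1e27      # rough count of gas molecules in a room-sized volume
collision_rate = 1e10   # collisions per molecule per second at ordinary conditions
duration = 60           # seconds: "a minute or so" of experience

collision_events = n_molecules * collision_rate * duration
print(f"on the order of 10^{round(math.log10(collision_events))} collision events")
```

Whatever the exact inputs, the product of three modest-looking factors already lands at an astronomically large event count.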
But these “collision events” aren’t something fundamental; they have what amounts to “internal structure” with many associated parameters about location, time, molecular configuration, etc.
Our Physics Project, however, suggests that—far below for example our usual notions of space and time—we can in fact have a truly fundamental definition of what’s happening in the universe, ultimately in terms of emes. We don’t yet know the “physical scale” for this—and in the end we presumably need experiments to determine that. But rather rickety estimates based on a variety of assumptions suggest that the elementary length might be around meters, with the elementary time being around seconds.
And with these estimates we might conclude that our “experience of a room for a minute” would involve sampling perhaps update events, that create about this number of atoms of space.
But it’s immediately clear that this is in a sense a gross underestimate of the total number of emes that we’re sampling. And the reason is that we’re not accounting for quantum mechanics, and for the multiway nature of the evolution of the universe. We’ve so far only considered one “thread of time” at one “position in branchial space”. But in fact there are many threads of time, constantly branching and merging. So how many of these do we experience?
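The branching and merging of “threads of time” can be illustrated with a minimal multiway system. The string-rewriting rules below are arbitrary toy choices (not rules from the Physics Project); the point is just that a single state fans out into many branches, which partially merge whenever they produce identical strings:

```python
# A minimal multiway system: every rule is applied at every possible match,
# so one "thread of time" branches into many, and identical results merge.
# These rewrite rules are arbitrary toy choices for illustration.

rules = [("A", "AB"), ("B", "A")]

def successors(s):
    out = set()
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

states, counts = {"A"}, []
for _ in range(5):
    states = set().union(*(successors(s) for s in states))
    counts.append(len(states))
print(counts)  # number of distinct branches after each step
```

Because the branches are tracked as a set, merging happens automatically—the count of distinct states grows much more slowly than the raw number of rule applications.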
In effect that depends on our size in branchial space. In physical space “human scale” is of order a meter—or perhaps elementary lengths. But how big is it in branchial space?
The fact that we’re so large compared to the elementary length is the reason that we consistently experience space as something continuous. And the analog in branchial space is that if we’re big compared to the “elementary branchial distance between branches” then we won’t experience the different individual histories of these branches, but only an aggregate “objective reality” in which we conflate together what happens on all the branches. Or, put another way, being large in branchial space is what makes us experience classical physics rather than quantum mechanics.
Our estimates for branchial space are even more rickety than for physical space. But conceivably there are on the order of “instantaneous parallel threads of time” in the universe, and encompassed by our instantaneous experience—implying that in our minute-long experience we might sample a total of on the order of close to emes.
But even this is a vast underestimate. Yes, it tries to account for our extent in physical space and in branchial space. But then there’s also rulial space—which in effect is what “fills out” the whole ruliad. So how big are we in that space? In essence that’s like asking how many different possible sequences of rules there are that are consistent with our experience.
The total conceivable number of sequences associated with emes is roughly the number of possible hypergraphs with nodes—or around . But the actual number consistent with our experience is smaller, in particular as reflected by the fact that we attribute specific laws to our universe. But when we say “specific laws” we have to recognize that there is a finiteness to our efforts at inductive inference which inevitably makes these laws at least somewhat uncertain to us. And in a sense that uncertainty is what represents our “extent in rulial space”.
But if we want to count the emes that we “absorb” as physical observers, it’s still going to be a huge number. Perhaps the base may be lower—say —but there’s still a vast exponent, suggesting that if we include our extent in rulial space, we as physical observers may experience numbers of emes like .
But let’s say we go beyond our “everyday human-scale experience”. For example, let’s ask about “experiencing” our whole universe. In physical space, the volume of our current universe is about times larger than “human scale” (while human scale is perhaps times larger than the “scale of the atoms of space”). In branchial space, conceivably our current universe is times larger than “human scale”. But these differences absolutely pale in comparison to the sizes associated with rulial space.
We might try to go beyond “ordinary human experience” and for example measure things using tools from science and technology. And, yes, we could then think about “experiencing” lengths down to meters, or something close to “single threads” of quantum histories. But in the end, it’s still the rulial size that dominates, and that’s where we can expect most of the vast number of emes that form our experience of the physical universe to come from.
OK, so what about mathematics? When we think about what we might call human-scale mathematics, and talk about things like the Pythagorean theorem, how many emes are there “underneath”? “Compiling” our theorem down to typical traditional mathematical axioms, we’ve seen that we’ll routinely end up with expressions containing, say, symbolic elements. But what happens if we go “below that”, compiling these symbolic elements—which might include things like variables and operators—into “pure computational elements” that we can think of as emes? We’ve seen a few examples, say with combinators, that suggest that for the traditional axiomatic structures of mathematics, we might need another factor of maybe roughly .
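The idea of compiling symbolic constructs into lower-level computational elements can be made concrete with combinators, as mentioned above. A minimal sketch, using the S and K primitives with terms represented as nested pairs (the encoding details here are my own illustrative choices, not the specific setup used in the text):

```python
# Minimal SK-combinator reducer: terms are nested 2-tuples (function, argument)
# built from the primitive symbols 'S' and 'K'. One reduction step rewrites
# the leftmost-outermost redex; None signals a normal form.

def step(t):
    if isinstance(t, tuple):
        f, a = t
        # ((K, x), y) -> x
        if isinstance(f, tuple) and f[0] == 'K':
            return f[1]
        # (((S, x), y), z) -> ((x, z), (y, z))
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 'S':
            x, y, z = f[0][1], f[1], a
            return ((x, z), (y, z))
        for part, other, left in ((f, a, True), (a, f, False)):
            r = step(part)
            if r is not None:
                return (r, other) if left else (other, r)
    return None

def normalize(t, limit=100):
    # Repeatedly reduce until normal form (with a safety bound on steps).
    while limit:
        r = step(t)
        if r is None:
            return t
        t, limit = r, limit - 1
    return t

# The identity combinator I = S K K: applying it to 'a' reduces back to 'a'.
I = (('S', 'K'), 'K')
print(normalize((I, 'a')))
```

Even the identity function costs three primitive symbols here, which gives a feel for why compiling ordinary symbolic mathematics down to eme-like elements multiplies the element count substantially.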
These are incredibly rough estimates, but perhaps there’s a hint that there’s “further to go” to get from human scale for a physical observer down to atoms of space that correspond to emes, than there is to get from human scale for a mathematical observer down to emes.
Just like in physics, however, this kind of “static drill-down” isn’t the whole story for mathematics. When we talk about something like the Pythagorean theorem, we’re really referring to a whole cloud of “human-equivalent” points in metamathematical space. The total number of “possible points” is basically the size of the entailment cone that contains something like the Pythagorean theorem. The “height” of the entailment cone is related to typical lengths of proofs—which for current human mathematics might be perhaps hundreds of steps.
And this would lead to overall sizes of entailment cones of very roughly theorems. But within this, how big is the cloud of variants corresponding to particular “human-recognized” theorems? Empirical metamathematics could provide additional data on this question. But if we very roughly imagine that half of every proof is “flexible”, we’d end up with things like variants. So if we asked how many emes correspond to the “experience” of the Pythagorean theorem, it might be, say, .
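The arithmetic behind these cone-size estimates can be laid out explicitly. The branching factor, cone height, and proof length below are placeholder values chosen for illustration, not figures from the text:

```python
import math

# Entailment-cone combinatorics with placeholder parameters.
b = 10      # assumed branching factor: new entailments per statement per step
d = 100     # assumed "height" of the cone: proofs of ~hundreds of steps
L = 100     # assumed proof length, half of which is taken to be "flexible"

cone_size = b ** d          # rough count of statements in the entailment cone
variants = 2 ** (L // 2)    # rough count of interchangeable proof variants

print(f"cone holds ~10^{round(math.log10(cone_size))} statements")
print(f"~10^{round(math.log10(variants))} variants of a given proof")
```

The qualitative point survives any reasonable choice of parameters: the cone size is exponential in proof depth, and the variant cloud is exponential in the flexible fraction of proof length.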
To give an analogy of “everyday physical experience” we might consider a mathematician thinking about mathematical concepts, and maybe in effect pondering a few tens of theorems per minute—implying according to our extremely rough and speculative estimates that while typical “specific human-scale physics experience” might involve emes, specific human-scale mathematics experience might involve emes (a number comparable, for example, to the number of physical atoms in our universe).
What if instead of considering “everyday mathematical experience” we consider all humanly explored mathematics? On the scales we’re describing, the factors are not large. In the history of human mathematics, only a few million theorems have been published. If we think about all the computations that have been done in the service of mathematics, it’s a somewhat larger factor. I suspect Mathematica is the dominant contributor here—and we can estimate that the total number of Wolfram Language operations corresponding to “human-level mathematics” done so far is perhaps .
But just like for physics, all these numbers pale in comparison with those introduced by rulial sizes. We’ve talked essentially about a particular path from emes through specific axioms to theorems. But the ruliad in effect contains all possible axiom systems. And if we start thinking about enumerating these—and effectively “populating all of rulial space”—we’ll end up with exponentially more emes.
But as with the perceived laws of physics, in mathematics as done by humans it’s actually just a narrow slice of rulial space that we’re sampling. It’s like a generalization of the idea that something like arithmetic as we imagine it can be derived from a whole cloud of possible axiom systems. It’s not just one axiom system; but it’s also not all possible axiom systems.
One can imagine doing some combination of ruliology and empirical metamathematics to get an estimate of “how broad” human-equivalent axiom systems (and their construction from emes) might be. But the answer seems likely to be much smaller than the kinds of sizes we have been estimating for physics.
It’s important to emphasize that what we’ve discussed here is extremely rough—and speculative. And indeed I view its main value as being to provide an example of how to imagine thinking through things in the context of the ruliad and the framework around it. But on the basis of what we’ve discussed, we might make the very tentative conclusion that “humanexperienced physics” is bigger than “humanexperienced mathematics”. Both involve vast numbers of emes. But physics seems to involve a lot more. In a sense—even with all its abstraction—the suspicion is that there’s “less ultimately in mathematics” as far as we’re concerned than there is in physics. Though by any ordinary human standards, mathematics still involves absolutely vast numbers of emes.
The human activity that we now call “mathematics” can presumably trace its origins into prehistory. What might have started as “a single goat”, “a pair of goats”, etc. became a story of abstract numbers that could be indicated purely by things like tally marks. In Babylonian times the practicalities of a city-based society led to all sorts of calculations involving arithmetic and geometry—and basically everything we now call “mathematics” can ultimately be thought of as a generalization of these ideas.
The tradition of philosophy that emerged in Greek times saw mathematics as a kind of reasoning. But while much of arithmetic (apart from issues of infinity and infinitesimals) could be thought of in explicit calculational ways, precise geometry immediately required an idealization—specifically the concept of a point having no extent, or equivalently, the continuity of space. And in an effort to reason on top of this idealization, there emerged the idea of defining axioms and making abstract deductions from them.
But what kind of a thing actually was mathematics? Plato talked about things we sense in the external world, and things we conceptualize in our internal thoughts. But he considered mathematics to be at its core an example of a third kind of thing: something from an abstract world of ideal forms. And with our current thinking, there is an immediate resonance between this concept of ideal forms and the concept of the ruliad.
But for most of the past two millennia of the actual development of mathematics, questions about what it ultimately was lay in the background. An important step was taken in the late 1600s when Newton and others “mathematicized” mechanics, at first presenting what they did in the form of axioms similar to Euclid’s. Through the 1700s mathematics as a practical field was viewed as some kind of precise idealization of features of the world—though with an increasingly elaborate tower of formal derivations constructed in it. Philosophy, meanwhile, typically viewed mathematics—like logic—mostly as an example of a system in which there was a formal process of derivation with a “necessary” structure not requiring reference to the real world.
But in the first half of the 1800s there arose several examples of systems where axioms—while inspired by features of the world—ultimately seemed to be “just invented” (e.g. group theory, curved space, quaternions, Boolean algebra, …). A push towards increasing rigor (especially for calculus and the nature of real numbers) led to more focus on axiomatization and formalization—which was still further emphasized by the appearance of a few nonconstructive “purely formal” proofs.
But if mathematics was to be formalized, what should its underlying primitives be? One obvious choice seemed to be logic, which had originally been developed by Aristotle as a kind of catalog of human arguments, but two thousand years later felt basic and inevitable. And so it was that Frege, followed by Whitehead and Russell, tried to start “constructing mathematics” from “pure logic” (along with set theory). Logic was in a sense a rather low-level “machine code”, and it took hundreds of pages of unreadable (if impressive-looking) “code” for Whitehead and Russell, in their 1910 Principia Mathematica, to get to 1 + 1 = 2.
Meanwhile, starting around 1900, Hilbert took a slightly different path, essentially representing everything with what we would now call symbolic expressions, and setting up axioms as relations between these. But what axioms should be used? Hilbert seemed to feel that the core of mathematics lay not in any “external meaning” but in the pure formal structure built up from whatever axioms were used. And he imagined that somehow all the truths of mathematics could be “mechanically derived” from axioms, a bit, as he said in a certain resonance with our current views, like the “great calculating machine, Nature” does it for physics.
Not all mathematicians, however, bought into this “formalist” view of what mathematics is. And in 1931 Gödel managed to prove from inside the formal axiom system traditionally used for arithmetic that this system had a fundamental incompleteness that prevented it from ever having anything to say about certain mathematical statements. But Gödel seems to have maintained a more Platonic belief about mathematics: that even though the axiomatic method falls short, the truths of mathematics are in some sense still “all there”, and it’s potentially possible for the human mind to have “direct access” to them. And while this is not quite the same as our picture of the mathematical observer accessing the ruliad, there’s again some definite resonance here.
But, OK, so how has mathematics actually conducted itself over the past century? Typically there’s at least lip service paid to the idea that there are “axioms underneath”—usually assumed to be those from set theory. There’s been significant emphasis placed on the idea of formal deduction and proof—but not so much in terms of formally building up from axioms as in terms of giving narrative expositions that help humans understand why some theorem might follow from other things they know.
There’s been a field of “mathematical logic” concerned with using mathematics-like methods to explore mathematics-like aspects of formal axiomatic systems. But (at least until very recently) there’s been rather little interaction between this and the “mainstream” study of mathematics. And for example phenomena like undecidability that are central to mathematical logic have seemed rather remote from typical pure mathematics—even though many actual long-unsolved problems in mathematics do seem likely to run into it.
But even if formal axiomatization may have been something of a sideshow for mathematics, its ideas have brought us what is without much doubt the single most important intellectual breakthrough of the twentieth century: the abstract concept of computation. And what’s now become clear is that computation is in some fundamental sense much more general than mathematics.
At a philosophical level one can view the ruliad as containing all computation. But mathematics (at least as it’s done by humans) is defined by what a “mathematical observer like us” samples and perceives in the ruliad.
The most common “core workflow” for mathematicians doing pure mathematics is first to imagine what might be true (usually through a process of intuition that feels a bit like making “direct access to the truths of mathematics”)—and then to “work backwards” to try to construct a proof. As a practical matter, though, the vast majority of “mathematics done in the world” doesn’t follow this workflow, and instead just “runs forward”—doing computation. And there’s no reason for at least the innards of that computation to have any “humanized character” to it; it can just involve the raw processes of computation.
But the traditional pure mathematics workflow in effect depends on using “human-level” steps. Or if, as we described earlier, we think of low-level axiomatic operations as being like molecular dynamics, then it involves operating at a “fluid dynamics” level.
A century ago efforts to “globally understand mathematics” centered on trying to find common axiomatic foundations for everything. But as different areas of mathematics were explored (and particularly ones like algebraic topology that cut across existing disciplines) it began to seem as if there might also be “top-down” commonalities in mathematics, in effect directly at the “fluid dynamics” level. And within the last few decades, it’s become increasingly common to use ideas from category theory as a general framework for thinking about mathematics at a high level.
But there’s also been an effort to progressively build up—as an abstract matter—formal “higher category theory”. A notable feature of this has been the appearance of connections to both geometry and mathematical logic—and for us a connection to the ruliad and its features.
The success of category theory has led in the past decade or so to interest in other high-level structural approaches to mathematics. A notable example is homotopy type theory. The basic concept is to characterize mathematical objects not by using axioms to describe properties they should have, but instead by using “types” to say “what the objects are” (for example, “mapping from reals to integers”). Such type theory has the feature that it tends to look much more “immediately computational” than traditional mathematical structures and notation—as well as making explicit proofs and other metamathematical concepts. And in fact questions about types and their equivalences wind up being very much like the questions we’ve discussed for the multiway systems we’re using as metamodels for mathematics.
Homotopy type theory can itself be set up as a formal axiomatic system—but with axioms that include what amount to metamathematical statements. A key example is the univalence axiom which essentially states that things that are equivalent can be treated as the same. And now from our point of view here we can see this as being essentially a statement of metamathematical coarse graining—and a piece of defining what should be considered “mathematics” on the basis of properties assumed for a mathematical observer.
When Plato introduced ideal forms and their distinction from the external and internal world the understanding of even the fundamental concept of computation—let alone multicomputation and the ruliad—was still more than two millennia in the future. But now our picture is that everything can in a sense be viewed as part of the world of ideal forms that is the ruliad—and that not only mathematics but also physical reality are in effect just manifestations of these ideal forms.
But a crucial aspect is how we sample the “ideal forms” of the ruliad. And this is where the “contingent facts” about us as human “observers” enter. The formal axiomatic view of mathematics can be viewed as providing one kind of low-level description of the ruliad. But the point is that this description isn’t aligned with what observers like us perceive—or with what we will successfully be able to view as human-level mathematics.
A century ago there was a movement to take mathematics (as well, as it happens, as other fields) beyond its origins in what amount to human perceptions of the world. But what we now see is that while there is an underlying “world of ideal forms” embodied in the ruliad that has nothing to do with us humans, mathematics as we humans do it must be associated with the particular sampling we make of that underlying structure.
And it’s not as if we get to pick that sampling “at will”; the sampling we do is the result of fundamental features of us as humans. And an important point is that those fundamental features determine our characteristics both as mathematical observers and as physical observers. And this fact leads to a deep connection between our experience of physics and our definition of mathematics.
Mathematics historically began as a formal idealization of our human perception of the physical world. Along the way, though, it began to think of itself as a more purely abstract pursuit, separated from both human perception and the physical world. But now, with the general idea of computation, and more specifically with the concept of the ruliad, we can in a sense see what the limit of such abstraction would be. And interesting though it is, what we’re now discovering is that it’s not the thing we call mathematics. And instead, what we call mathematics is something that is subtly but deeply determined by general features of human perception—in fact, essentially the same features that also determine our perception of the physical world.
The intellectual foundations and justification are different now. But in a sense our view of mathematics has come full circle. And we can now see that mathematics is in fact deeply connected to the physical world and our particular perception of it. And we as humans can do what we call mathematics for basically the same reason that we as humans manage to parse the physical world to the point where we can do science about it.
Having talked a bit about historical context let’s now talk about what the things we’ve discussed here mean for the future of mathematics—both in theory and in practice.
At a theoretical level we’ve characterized the story of mathematics as being the story of a particular way of exploring the ruliad. And from this we might think that in some sense the ultimate limit of mathematics would be to just deal with the ruliad as a whole. But observers like us—at least doing mathematics the way we normally do it—simply can’t do that. And in fact, with the limitations we have as mathematical observers we can inevitably sample only tiny slices of the ruliad.
But as we’ve discussed, it is exactly this that leads us to experience the kinds of “general laws of mathematics” that we’ve talked about. And it is from these laws that we get a picture of the “large-scale structure of mathematics”—that turns out to be in many ways similar to the picture of the large-scale structure of our physical universe that we get from physics.
As we’ve discussed, what corresponds to the coherent structure of physical space is the possibility of doing mathematics in terms of high-level concepts—without always having to drop down to the “atomic” level. Effective uniformity of metamathematical space then leads to the idea of “pure metamathematical motion”, and in effect the possibility of translating at a high level between different areas of mathematics. And what this suggests is that in some sense “all high-level areas of mathematics” should ultimately be connected by “high-level dualities”—some of which have already been seen, but many of which remain to be discovered.
Thinking about metamathematics in physicalized terms also suggests another phenomenon: essentially an analog of gravity for metamathematics. As we discussed earlier, in direct analogy to the way that “larger densities of activity” in the spatial hypergraph for physics lead to a deflection in geodesic paths in physical space, so also larger “entailment density” in metamathematical space will lead to deflection in geodesic paths in metamathematical space. And when the entailment density gets sufficiently high, it presumably becomes inevitable that these paths will all converge, leading to what one might think of as a “metamathematical singularity”.
In the spacetime case, a typical analog would be a place where all geodesics have finite length, or in effect “time stops”. In our view of metamathematics, it corresponds to a situation where “all proofs are finite”—or, in other words, where everything is decidable, and there is no more “fundamental difficulty” left.
Absent other effects we might imagine that in the physical universe the effects of gravity would eventually lead everything to collapse into black holes. And the analog in metamathematics would be that everything in mathematics would “collapse” into decidable theories. But among the effects not accounted for is continued expansion—or in effect the creation of new physical or metamathematical space, formed in a sense by underlying raw computational processes.
What will observers like us make of this, though? In statistical mechanics an observer who does coarse graining might perceive the “heat death of the universe”. But at a molecular level there is all sorts of detailed motion that reflects a continued irreducible process of computation. And inevitably there will be an infinite collection of possible “slices of reducibility” to be found in this—just not necessarily ones that align with any of our current capabilities as observers.
What does this mean for mathematics? Conceivably it might suggest that there’s only so much that can fundamentally be discovered in “high-level mathematics” without in effect “expanding our scope as observers”—or in essence changing our definition of what it is we humans mean by doing mathematics.
But underneath all this is still raw computation—and the ruliad. And this we know goes on forever, in effect continually generating “irreducible surprises”. But how should we study “raw computation”?
In essence we want to do unfettered exploration of the computational universe, of the kind I did in A New Kind of Science, and that we now call the science of ruliology. It’s something we can view as more abstract and more fundamental than mathematics—and indeed, as we’ve argued, it’s for example what’s underneath not only mathematics but also physics.
Ruliology is a rich intellectual activity, important for example as a source of models for many processes in nature and elsewhere. But it’s one where computational irreducibility and undecidability are seen at almost every turn—and it’s not one where we can readily expect “general laws” accessible to observers like us, of the kind we’ve seen in physics, and now see in mathematics.
We’ve argued that with its foundation in the ruliad, mathematics is ultimately based on structures at a lower level than axiom systems. But given their familiarity from the history of mathematics, it’s convenient to use axiom systems—as we have done here—as a kind of “intermediate-scale metamodel” for mathematics.
But what is the “workflow” for using axiom systems? One possibility, in effect inspired by ruliology, is just to systematically construct the entailment cone for an axiom system, progressively generating all possible theorems that the axiom system implies. But while doing this is of great theoretical interest, it typically isn’t something that will in practice reach much in the way of (currently) familiar mathematical results.
But let’s say one’s thinking about a particular result. A proof of this would correspond to a path within the entailment cone. And the idea of automated theorem proving is to systematically find such a path—which, with a variety of tricks, can usually be done vastly more efficiently than just by enumerating everything in the entailment cone. In practice, though, despite half a century of history, automated theorem proving has seen very little use in mainstream mathematics. Of course it doesn’t help that in typical mathematical work a proof is seen as part of the high-level exposition of ideas—but automated proofs tend to operate at the level of “axiomatic machine code” without any connection to human-level narrative.
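One can sketch the core idea of finding a proof path, though with none of the tricks real automated theorem provers rely on, as a plain breadth-first search over the same kind of toy rewrite system; the path returned is a shortest sequence of intermediate “theorems” connecting axiom to result:

```python
from collections import deque

def find_proof(axiom, theorem, rules, max_steps=10):
    # naive breadth-first search for a rewrite path: a stand-in for the far
    # cleverer strategies actual automated theorem provers use
    queue = deque([(axiom, [axiom])])
    seen = {axiom}
    while queue:
        s, path = queue.popleft()
        if s == theorem:
            return path               # the "proof": each step is one rewrite
        if len(path) > max_steps:
            continue
        for lhs, rhs in rules:
            start = 0
            while (i := s.find(lhs, start)) != -1:
                t = s[:i] + rhs + s[i + len(lhs):]
                if t not in seen:
                    seen.add(t)
                    queue.append((t, path + [t]))
                start = i + 1
    return None                       # no proof found within max_steps

rules = [("A", "AB"), ("B", "A")]
proof = find_proof("A", "AAB", rules)
```

The resulting `proof` is a 4-string path from `"A"` to `"AAB"`. Note how such a path is pure “axiomatic machine code”: a correct derivation, but with nothing resembling a human-level narrative attached.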
But what if one doesn’t already know the result one’s trying to prove? Part of the intuition that comes from A New Kind of Science is that there can be “interesting results” that are still simple enough that they can conceivably be found by some kind of explicit search—and then verified by automated theorem proving. But so far as I know, only one significant unexpected result has ever been found in this way with automated theorem proving: my 2000 result on the simplest axiom system for Boolean algebra.
And the fact is that when it comes to using computers for mathematics, the overwhelming fraction of the time they’re used not to construct proofs, but instead to do “forward computations” and “get results” (yes, often with Mathematica). Of course, within those forward computations, there are many operations—like Reduce, SatisfiableQ, PrimeQ, etc.—that essentially work by internally finding proofs, but their output is “just results” not “why-it’s-true explanations”. (FindEquationalProof—as its name suggests—is a case where an actual proof is generated.)
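The distinction between “just results” and “why-it’s-true explanations” can be illustrated with a toy primality check (standing in, very loosely, for something like PrimeQ): the same underlying search can be packaged either to return only an answer, or to also return a witness that explains the answer:

```python
def is_prime(n):
    # a "forward computation": just a result, with no explanation attached
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def composite_witness(n):
    # the same trial-division search, but packaged to return a "why":
    # a nontrivial factor, if one exists
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return d
    return None

result = is_prime(91)             # just an answer: False
witness = composite_witness(91)   # an answer plus a reason: 91 = 7 * 13
```

Both functions do essentially the same work internally; the difference is purely in how the computation is “packaged” for the consumer of the result, which is the point at issue here.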
Whether one’s thinking in terms of axioms and proofs, or just in terms of “getting results”, one’s ultimately always dealing with computation. But the key question is how that computation is “packaged”. Is one dealing with arbitrary, raw, low-level constructs, or with something higher level and more “humanized”?
As we’ve discussed, at the lowest level, everything can be represented in terms of the ruliad. But when we do both mathematics and physics what we’re perceiving is not the raw ruliad, but rather just certain high-level features of it. But how should these be represented? Ultimately we need a language that we humans understand, that captures the particular features of the underlying raw computation that we’re interested in.
From our computational point of view, mathematical notation can be thought of as a rough attempt at this. But the most complete and systematic effort in this direction is the one I’ve worked towards for the past several decades: what’s now the full-scale computational language that is the Wolfram Language (and Mathematica).
Ultimately the Wolfram Language can represent any computation. But the point is to make it easy to represent the computations that people care about: to capture the high-level constructs (whether they’re polynomials, geometrical objects or chemicals) that are part of modern human thinking.
The process of language design (on which, yes, I’ve spent immense amounts of time) is a curious