When we launched the Wolfram Physics Project a year ago today, I was fairly certain that—to my great surprise—we’d finally found a path to a truly fundamental theory of physics, and it was beautiful. A year later it’s looking even better. We’ve been steadily understanding more and more about the structure and implications of our models—and they continue to fit beautifully with what we already know about physics, particularly connecting with some of the most elegant existing approaches, strengthening and extending them, and involving the communities that have developed them.
And if fundamental physics wasn’t enough, it’s also become clear that our models and formalism can be applied even beyond physics—suggesting major new approaches to several other fields, as well as allowing ideas and intuition from those fields to be brought to bear on understanding physics.
Needless to say, there is much hard work still to be done. But a year into the process I’m completely certain that we’re “climbing the right mountain”. And the view from where we are so far is already quite spectacular.
We’re still mostly at the stage of exploring the very rich structure of our models and their connections to existing theoretical frameworks. But we’re on a path to being able to make direct experimental predictions, even if it’ll be challenging to find ones accessible to present-day experiments. And quite independent of this, what we’ve done right now is already practical and useful—providing new streamlined methods for computing several important existing kinds of physics results.
The way I see what we’ve achieved so far is that it seems as if we’ve successfully found a structure for the “machine code” of the universe—the lowest-level processes from which all the richness of physics and everything else emerges. It certainly wasn’t obvious that any such “machine code” would exist. But I think we can now be confident that it does, and that in a sense our universe is fundamentally computational all the way down. But even though the foundations are different, the remarkable thing is that what emerges aligns with important mathematical structures we already know, enhancing and generalizing them.
From four decades of exploring the computational universe of possible programs, my most fundamental takeaway has been that even simple programs can produce immensely complex behavior, and that this behavior is usually computationally irreducible, in the sense that it can’t be predicted by anything much less than just running the explicit computation that produced it. And at the level of the machine code our models very much suggest that our universe will be full of such computational irreducibility.
But an important part of the way I now understand our Physics Project is that it’s about what a computationally bounded observer (like us) can see in all this computational irreducibility. And the key point is that within the computational irreducibility there are inevitably slices of computational reducibility. And, remarkably, the three such slices we know correspond exactly to the great theories of existing physics: general relativity, quantum mechanics and statistical mechanics.
And in a sense, over the past year, I’ve increasingly come to view the whole fundamental story of science as being about the interplay between computational irreducibility and computational reducibility. The computational nature of things inevitably leads to computational irreducibility. But there are slices of computational reducibility that inevitably exist on top of this irreducibility that are what make it possible for us—as computationally bounded entities—to identify meaningful scientific laws and to do science.
There’s a part of this that leads quite directly to specific formal development, and for example specific mathematics. But there’s also a part that leads to a fundamentally new way of thinking about things, that for example provides new perspectives on issues like the nature of consciousness, that have in the past seemed largely in the domain of philosophy rather than science.
Spatial hypergraphs. Causal graphs. Multiway graphs. Branchial graphs. A year ago we had the basic structure of our models and we could see how both general relativity and quantum mechanics could arise from them. And it could have been that as we went further—and filled in more details—we’d start seeing issues and inconsistencies. But nothing of the sort has happened. Instead, it seems as if at every turn more and more seems to fit beautifully together—and more and more of the phenomena we know in physics seem to inevitably emerge as simple and elegant consequences of our models.
It all starts—very abstractly—with collections of elements and relations. And as I’ve got more comfortable with our models, I’ve started referring to those elements by what might almost have been an ancient Greek term: atoms of space. The core concept is then that space as we know it is made up from a very large number of these atoms of space, connected by a network of relations that can be represented by a hypergraph. And in our models there’s in a sense nothing in the universe except space: all the matter and everything else that “exists in space” is just encoded in the details of the hypergraph that corresponds to space.
Time in our models is—at least initially—something fundamentally different from space: it corresponds to the computational process of successively applying rules that transform the structure of the hypergraph. And in a sense the application of these rules represents the fundamental operation of the universe. And a key point is that this will inevitably show the phenomenon of computational irreducibility—making the progress of time an inexorable and irreducible computational process.
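To make this concrete, here is a minimal Python sketch of one such updating step. The rule, the binary hyperedges and the function names here are all hypothetical, purely for illustration—a real model would use higher-arity hyperedges and richer rules:

```python
from itertools import count

# A hypergraph as a list of hyperedges, each a tuple of atoms of space.
# Hypothetical toy rule, purely for illustration: any edge (x, y) is
# rewritten to the two edges (x, y), (y, z), where z is a fresh atom.
def apply_rule_everywhere(hypergraph, fresh):
    """Return one successor hypergraph per place the rule can apply."""
    results = []
    for i, (x, y) in enumerate(hypergraph):
        z = next(fresh)                      # a newly created atom of space
        results.append(hypergraph[:i] + [(x, y), (y, z)] + hypergraph[i + 1:])
    return results

fresh = count(100)                           # supply of fresh atom names
state = [(0, 1), (1, 2)]
succ = apply_rule_everywhere(state, fresh)
for s in succ:
    print(s)
# → [(0, 1), (1, 100), (1, 2)]
# → [(0, 1), (1, 2), (2, 101)]
```

The point to notice is that the rule can apply in several places, and each choice yields a different successor hypergraph—which is exactly what will lead to the multiway structure discussed below.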
A striking feature of our models is that at the lowest level there’s nothing constant in our universe. At every moment even space is continually being remade by the action of the underlying rules—and indeed it is precisely this action that knits together the whole structure of spacetime. And though it still surprises me that it can be said so directly, it’s possible to identify energy as essentially just the amount of activity in space, with mass in effect being the “inertia” or persistence of this activity.
At the lowest level everything is just atoms of space “doing their thing”. But the crucial result is that—with certain assumptions—there’s large-scale collective behavior that corresponds exactly to general relativity and the observed continuum structure of spacetime. Over the course of the year, the derivation of this result has become progressively more streamlined. And it’s clear it’s all about what a computationally bounded observer will be able to conclude about underlying computationally irreducible processes.
But there’s then an amazing unification here. Because at a formal level the setup is basically the same as for molecular dynamics in something like a gas. Again there’s computational irreducibility in the underlying behavior. And there’s a computationally bounded observer, usually thought of in terms of “coarse graining”. And for that observer—in direct analogy to an observer in spacetime—one then derives the Second Law of Thermodynamics, and the equations of continuum fluid behavior.
But there’s an important feature of both these derivations: they’re somehow generic, in the sense that they don’t depend on underlying details like the precise nature of the molecules in the gas, or the atoms of space. And what this means is that both thermodynamics and relativity are general emergent laws. Regardless of what the precise underlying rules are, they’ll basically always be what one gets in a large-scale limit.
It’s quite remarkable that relativity in a sense formally comes from the same place as thermodynamics. But it’s the genericity of general relativity that’s particularly crucial in thinking about our models. Because it implies that we can make largescale conclusions about physics without having to know what specific rule is being applied at the level of the underlying hypergraph.
Much like hypersonic flow in a gas, however, there will nevertheless be extreme situations in which one will be able to “see beneath” the generic continuum behavior—and tell that there are discrete atoms of space with particular behavior. Or, in other words, one will be able to see corrections to Einstein’s equations—corrections that depend on the fact that space is actually a hypergraph with definite rules, rather than a continuous manifold.
One important feature of our spatial hypergraph is that—unlike our ordinary experience of space—it doesn’t intrinsically have any particular dimension. Dimension is an emergent large-scale feature of the hypergraph—and it can be an integer, or not, and it can, for example, vary with position and time. So one of the unexpected implications of our models is that there can be dimension fluctuations in our universe. And in fact it seems likely that our universe started essentially infinite-dimensional, only gradually “cooling” to become basically three-dimensional. And though we haven’t yet worked it out, we expect there’ll be a “dimension-changing cosmology” that may well have definite predictions for the observed large-scale structure of our universe.
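One standard way to estimate this emergent dimension is to watch how the volume of a “ball” of graph neighbors grows with radius r: in d dimensions it grows roughly like r^d. Here is a minimal Python sketch of that estimate, using an ordinary 2D grid graph as a hypothetical stand-in for a hypergraph whose large-scale limit is two-dimensional:

```python
import math
from collections import deque

def ball_volume(adj, start, r):
    """Number of nodes within graph distance r of start (breadth-first search)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == r:
            continue
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return len(seen)

# A 2D grid graph: a hypothetical stand-in for a hypergraph whose
# large-scale limit is two-dimensional space.
n = 41
adj = {(i, j): [(i + di, j + dj)
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < n and 0 <= j + dj < n]
       for i in range(n) for j in range(n)}

center = (n // 2, n // 2)
v1, v2 = ball_volume(adj, center, 8), ball_volume(adj, center, 16)
d_est = math.log(v2 / v1) / math.log(2)   # V(r) ~ r^d  =>  d ~ log2(V(2r)/V(r))
print(round(d_est, 2))                    # close to 2 for a 2D grid
```

On a hypergraph evolving under actual rules the same measurement can give non-integer values, or values that drift with radius—which is exactly what “dimension fluctuations” look like operationally.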
The underlying discreteness—and variable dimension—of space in our models has many other implications. Traditional general relativity suggests certain exotic phenomena in spacetime, like event horizons and black holes—but ultimately it’s limited by its reliance on describing spacetime in terms of a continuous manifold. In our models, there are all sorts of possible new exotic phenomena—like change in spacetime topology, space tunnels and dynamic disconnection of the hypergraph.
What happens if one sets up a black hole that spins too rapidly? In our models, a piece of spacetime simply disconnects. And it’s been interesting to see how much more direct our models allow one to be in analyzing the structure of spacetime, even in cases where traditional general relativity gives one a hint of what happens.
Calculus has been a starting point for almost all traditional mathematical physics. But our models in a sense require a fundamental generalization of calculus. We have to go beyond the notion of an integer number of “variables” corresponding to particular dimensions, to construct a kind of “hypercalculus” that can for example generalize differential geometry to fractional-dimensional space.
It’s a challenging direction in mathematics, but the concreteness of our models helps greatly in defining and exploring what to do—and in seeing what it means to go “below whole variables” and build everything up from fragmentary discrete connections. And one of the things that’s happened over the past year is that we’ve been steadily recapitulating the history of calculus-like mathematics, progressively defining generalizations of notions like tangent spaces, tensors, parallel transport, fiber bundles, homotopy classes, Lie group actions and so on, that apply to limits of our hypergraphs and to the kind of space to which they correspond.
One of the ironies of practical investigations of traditional general relativity is that even though the theory is set up in terms of continuous manifolds and continuous partial differential equations, actual computations normally involve doing “numerical relativity” that uses discrete approximations suitable for digital computers. But our models are “born digital” so nothing like this has to be done. Of course, the actual number of atoms of space in our real universe is immensely larger than anything we can simulate.
But we’ve recently found that even much more modest hypergraphs are already sufficient to reproduce the same kind of results that are normally found with numerical relativity. And so for example we can directly see in our models things like the ringdown of merging black holes. And what’s more, as a matter of practical computation, our models seem potentially more efficient at generating results than numerical relativity. So that means that even if one isn’t interested in models of fundamental physics and in the “underlying machine code” of the universe, our project is already useful—in delivering a new and promising method for doing practical computations in general relativity.
And, by the way, the method isn’t limited to general relativity: it looks as if it can be applied to other kinds of systems based on PDEs—like stress analysis and biological growth. Normally one thinks of taking some region of space, and approximating it by a discrete mesh, that one might adapt and subdivide. But with our method, the hypergraphs—with their variable dimensions—provide a richer way to approximate space, in which subdivision is done “automatically” through the actual dynamics of the hypergraph evolution.
I already consider it very impressive and significant that our models can start from simple abstract rules and end up with the structure of space and time as we know them in some sense inevitably emerging. But what I consider yet more impressive and significant is that these very same models also inevitably yield quantum mechanics.
It’s often been said (for example by my late friend Richard Feynman) that “nobody really understands quantum mechanics”. But I’m excited to be able to say that—particularly after this past year—I think that we are finally beginning to actually truly understand quantum mechanics. Some aspects of it are at first somewhat mind-bending, but given our new understanding we’re in a position to develop more and more accessible ways of thinking about it. And with our new understanding comes a formalism that can actually be applied in many other places—and from these applications we can expect that in time what now seem like bizarre features of quantum mechanics will eventually seem much more familiar.
In ordinary classical physics, the typical setup is to imagine that definite things happen, and that in a sense every system follows a definite thread of behavior through time. But the key idea of quantum mechanics is to imagine that many threads of possible behavior are followed, with a definite outcome being found only through a measurement made by an observer.
And in our models this picture is not just conceivable, but inevitable. The rules that operate on our underlying spatial hypergraph specify that a particular configuration of elements and relations will be transformed into some other one. But typically there will be many different places in the spatial hypergraph where any such transformation can be applied. And each possible sequence of such updating events defines a particular possible “thread of history” for the system.
A key idea of our models is to consider all those possible threads of history—and to represent these in a single object that we call a multiway graph. In the most straightforward way of setting this up, each node in the multiway graph is a complete state of the universe, joined to whatever states are reached from it by all possible updating events that can occur in it.
A particular possible history for the universe then corresponds to a particular path through the multiway graph. And the crucial point is that there is branching—and merging—in the multiway graph leading in general to a complicated interweaving of possible threads of history.
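A minimal sketch of this construction, using a hypothetical toy string-substitution system (the rule A → AB) rather than a full hypergraph:

```python
def successors(state, lhs, rhs):
    """All states reachable by applying lhs -> rhs at any one position."""
    out, i = [], state.find(lhs)
    while i != -1:
        out.append(state[:i] + rhs + state[i + len(lhs):])
        i = state.find(lhs, i + 1)
    return out

def multiway_edges(initial, lhs, rhs, steps):
    """Edges (state, successor) of the multiway graph, to a given depth."""
    edges, frontier = set(), {initial}
    for _ in range(steps):
        nxt = set()
        for s in frontier:
            for t in successors(s, lhs, rhs):
                edges.add((s, t))
                nxt.add(t)
        frontier = nxt
    return edges

# Hypothetical toy rule A -> AB, starting from "AA", run for 2 steps
edges = multiway_edges("AA", "A", "AB", 2)
for e in sorted(edges):
    print(e)
```

Note the merge: the state "ABAB" is reached both through "ABA" and through "AAB"—two distinct threads of history converging on the same state, which is exactly the branching-and-merging structure described above.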
But now imagine slicing across the multiway graph—in a sense sampling many threads of history at some particular stage in their evolution. If we were to look at these threads of history separately there might not seem to be any relation between them. But the way they’re embedded in the multiway graph inevitably defines relations between them. And for example we can imagine just saying that any two states in a particular slice of the multiway graph are related if they have a common ancestor, and are each just a result of a different event occurring in that ancestor state. And by connecting such states we form what we call a branchial graph—a graph that captures the relations between multiway branches.
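That common-ancestor relation can be computed directly. This Python fragment (again using the hypothetical toy rule A → AB) connects two states whenever single events in a shared ancestor yield each of them:

```python
from itertools import combinations

def successors(state, lhs, rhs):
    """All states reachable by applying lhs -> rhs at any one position."""
    out, i = [], state.find(lhs)
    while i != -1:
        out.append(state[:i] + rhs + state[i + len(lhs):])
        i = state.find(lhs, i + 1)
    return out

def branchial_edges(ancestors, lhs, rhs):
    """Connect two states whenever single events in a shared ancestor yield both."""
    pairs = set()
    for ancestor in ancestors:
        for a, b in combinations(sorted(set(successors(ancestor, lhs, rhs))), 2):
            pairs.add((a, b))
    return pairs

# One step from "AA" under the hypothetical rule A -> AB
pairs = branchial_edges({"AA"}, "A", "AB")
print(pairs)   # → {('AAB', 'ABA')}
```

So after one step the branchial graph has a single edge: the two branches "AAB" and "ABA" are related because both descend, by one event each, from the common ancestor "AA".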
But just like we imagine our spatial hypergraphs limit to something like ordinary continuous physical space, so also we can imagine that our branchial graphs limit to something we can call branchial space. And in our models branchial space corresponds to a space of quantum states, with the branchial graph in effect providing a map of the entanglements between those states.
In ordinary physical space we know that we can define coordinates that label different positions. And one of the things we’re understanding with progressively more clarity is also how to set up coordinatizations of branchial space—so that instead of just talking individually about “points in branchial space” we can talk more systematically about what happens “as a function of position” in branchial space.
But what is the interpretation of “position” in branchial space? It turns out that it is essentially the phase of a quantum amplitude. In the traditional formalism of quantum mechanics, every different state has a certain complex number associated with it that is its quantum amplitude. In our models, that complex number should be thought of in two parts. Its magnitude is associated with a combinatorial counting of possible paths in the multiway graph. But its phase is “position in branchial space”.
Once one has a notion of position, one is led to talk about motion. And in classical mechanics and general relativity a key concept is that things in physical space move by following shortest paths (“geodesics”) between different positions. When space is flat these paths are ordinary straight lines, but when there is curvature in space—corresponding in general relativity to the presence of gravity—the paths are deflected. But what the Einstein equations then say is that curvature in space is associated with the presence of energy-momentum. And in our models, this is exactly what happens: energy-momentum is associated with the presence of update events in the spatial hypergraph, and these lead to curvature and a deflection of geodesics.
So what about motion in branchial space? Here we are interested in how “bundles of nearby histories” progress through time in the multiway graph. And it turns out that once again we are dealing with geodesics that are deflected by the presence of update events that we can interpret as energy-momentum.
But now this deflection is not in physical space but in branchial space. The fundamental underlying mathematical structure is the same in both cases. But the interpretation in terms of traditional physics is different. And in what to me is a singularly beautiful result of our models it turns out that what gives the Einstein equations in physical space gives the Feynman path integral in branchial space. Or in other words, quantum mechanics is the same as general relativity, except in branchial space rather than physical space.
But, OK, so how do we assign positions in branchial space? It’s a mathematically complicated thing to do. Nearly a year ago we found a kind of trick way to do it for a standard simple quantum setup: the double-slit experiment. But over the course of the year, we’ve developed a much more systematic approach based on category theory and categorical quantum mechanics.
In its usual applications in mathematics, category theory talks about things like the patterns of mappings (morphisms) between definite named kinds of objects. But in our models what we want is just the “bulk structure” of category theory, and the general idea of patterns of connections between arbitrary unnamed objects. It’s very much like what we do in setting up our spatial hypergraph. There are symbolic expressions—like in the Wolfram Language—that define structures associated with named kinds of things, and on which transformations can be applied. But we can also consider “bulk symbolic expressions” that don’t in effect “name every element of space”, and where we just consider their overall structure.
It’s an abstract and elaborate mathematical story. But the key point is that in the end our multiway formalism can be shown to correspond to the formalism that has been developed for categorical quantum mechanics—which in turn is known to be equivalent to the standard formalism of quantum mechanics.
So what this means is that we can take a description of a quantum system—say a quantum circuit—and in effect “compile” it into an equivalent multiway system. One thing is that we can think of this as a “proof by compilation”: we know our models reproduce standard quantum mechanics, because standard quantum mechanics can in effect just be systematically compiled into our models.
But in practice there’s something more: by really getting at the essence of quantum mechanics, our models can provide more efficient ways to do actual computations in quantum mechanics. And for example we’ve got recent results on using automated theorem proving methods within our models to more efficiently optimize practical quantum circuits. Much as in the case of general relativity, it seems that by “going underneath” the standard formalism of physics, we’re able to come up with more efficient ways to do computations, even for standard physics.
And what’s more, the formalism we have potentially applies to things other than physics. I’ll talk more about this later. But here let me mention a simple example that I’ve tried to use to build intuition about quantum mechanics. If you have something like tic-tac-toe, you can think of all possible games that can be played as paths through a multiway graph in which the nodes are possible configurations of the tic-tac-toe board. Much like in the case of quantum mechanics, one can define a branchial graph—and then one can start thinking about the analogs of all kinds of “quantum” effects, and how there are just a few final “classical” outcomes for the game.
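One can build this multiway graph directly. The Python sketch below does a breadth-first walk over the first few moves of tic-tac-toe, merging boards reached by different move orders into single nodes (win checks are omitted, since no game can end before move five):

```python
def moves(board):
    """All boards reachable by the next player's single move."""
    player = "X" if board.count("X") == board.count("O") else "O"
    return {board[:i] + player + board[i + 1:]
            for i, c in enumerate(board) if c == "."}

# Breadth-first walk over the tic-tac-toe multiway graph; using a set
# merges boards reached by different move orders into a single node.
frontier = {"." * 9}
state_counts = []                  # distinct boards after each move
for _ in range(3):
    frontier = {b2 for b in frontier for b2 in moves(b)}
    state_counts.append(len(frontier))

print(state_counts)   # → [9, 72, 252]
```

After three moves there are 9 × 8 × 7 = 504 possible move sequences (“threads of history”), but only 252 distinct boards—the merging in the multiway graph at work.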
Most practical computations in quantum mechanics are done at the level of quantum amplitudes—which in our setup corresponds essentially to working out the evolution of densities in branchial space. But in a sense this just tells us that there are lots of different threads of history that a particular system could follow. So how is it then that we come to perceive definite things as happening in the world?
The traditional formalism of quantum mechanics essentially by fiat introduces the so-called Born rule which in effect says how densities in branchial space can be converted to probabilities of different specific outcomes. But in our models we can “go inside” this “process of measurement”.
The key idea—which has become clearer over the course of this year—is at first a bit mind-bending. Remember that our models are supposed to be models for everything in the universe, including us as observers of the universe. In thinking about space and time we might at first imagine that we could just independently trace the individual time evolution of, for example, different atoms of space. But if we’re inside the system no such “absolute tracing” is possible; instead all we can ever perceive is the graph of causal relationships of different events that occur. In a sense we’re only “plugged into” the universe through the causal effects that the universe has on us.
OK, so what about the quantum case? We want to tell what’s going on in the multiway graph of all possible histories. But we’re part of that graph, with many possible histories ourselves. So in a sense what we have to think about is how a “branching brain” perceives a “branching universe”. People have often imagined that somehow having a “conscious observer” is crucial to “making measurements” in quantum mechanics. And I think we can now understand how that works. It seems as if the essence of being a “conscious observer” is precisely having a “single thread of experience”—or in other words conflating the different histories in different branches.
Of course, it is not at all obvious that doing this will be consistent. But in our models there is the notion of causal invariance. In the end this doesn’t have to be an intrinsic feature of specific low-level rules one attributes to the universe; as I’ll talk about a bit later, it seems to be an inevitable emergent feature of the structure of what we call rulial space. But what’s important about causal invariance is that it implies that different possible threads of history must in effect in the end always have the same causal structure—and the same observable causal graph that describes what happens in the universe.
It’s causal invariance that makes different reference frames in physical space (corresponding, for example, to different states of motion) work the same, and that leads to relativistic invariance. And it’s also causal invariance (or at least eventual causal invariance) that makes the conflation of quantum histories be consistent—and makes there be a meaningful notion of objective reality in quantum mechanics, shared by different observers.
There’s more to do in working out the detailed mechanics of how threads of history can be conflated. It can be thought of as closely related to the addition of “completion lemmas” in automated theorem proving. Some aspects of it can be thought of as a “convention”—analogous to a choice of reference frame. But the structure of the model implies certain important “physical constraints”.
We’ve often been asked: “What does all this mean for quantum computing?” The basic idea of quantum computing—captured in a minimal form by something like a multiway Turing machine—is to do different computations in parallel along different possible threads of history. But the key issue (that I’ve actually wondered about since the early 1980s) is then how to corral those threads of history together to figure out a definite answer for the computation. And our models give us ways to look “inside” that process, and see what’s involved, and how much time it should take. We’re still not sure about the answer, but the preliminary indication is that at least at a formal level, quantum computers aren’t going to come out ahead. (In practice, of course, investigating physical processes other than traditional semiconductor electronics will surely lead to faster, perhaps even dramatically faster, computers, even if they’re not “officially quantum”.)
One of the surprises to me this year has been just how far we can get in exploring quantum mechanics without ever having to talk about actual particles like electrons or photons. Actual quantum experiments usually involve particles that are somehow localized to particular positions in space. But it seems as if the essentials of quantum mechanics can actually be captured without depending on particles, or space.
What are particles in our models? Like everything else in the universe, they can be thought of as features of space. The general picture is that in the spatial hypergraph there are continual updates going on, but most of them are basically just concerned with “maintaining the structure of space”. But within that structure, we imagine that there can be localized pieces that have a certain stability that allows them to “move largely unchanged through space” (even as “space itself” is continually getting remade). And these correspond to particles.
Analogous to things like vortices in fluids, or black holes in spacetime, we can view particles in our models as some kind of “topological obstructions” that prevent features of the hypergraph from “readily unraveling”. We’ve made some progress this year in understanding what these topological obstructions might be like, and how their structure might be related to things like the quantization of particle spin, and in general the existence of discrete quantum numbers.
It’s an interesting thing to have both “external space” and “internal quantum numbers” encoded together in the structure of the spatial hypergraph. But we’ve been making progress at seeing how to tease apart different features of things like homotopy and geometry in the limit of large hypergraphs, and how to understand the relations between things like foliations and fibrations in the multiway graph describing hypergraph evolution.
We haven’t “found the electron” yet, but we’re definitely getting closer. And one of the things we’ve started to identify is how a fiber bundle structure can emerge in the evolution of the hypergraph—and how local gauge invariance can arise. In a discrete hypergraph it’s not immediately obvious even how something like limiting rotational symmetry would work. We have a pretty good idea how hypergraphs can limit on a large scale to continuous “spatial” manifolds. And it’s now becoming clearer how things like the correspondences between collections of geodesics from a single point can limit to things like continuous symmetry groups.
What’s very nice about all of this is how generic it’s turning out to be. It doesn’t depend on the specifics of the underlying rules. Yes, it’s difficult to untangle, and to set up the appropriate mathematics. But once one’s done that, the results are very robust.
But how far will that go? What will be generic, and what not? Spatial isotropy—and the corresponding spherical symmetry—will no doubt be generic. But what about local gauge symmetry? The SU(3)×SU(2)×U(1) that appears in the Standard Model of particle physics seems on the face of it quite arbitrary. But it would be very satisfying if we were to find that our models inevitably imply a gauge group that is, say, a subgroup of E8.
We haven’t finished the job yet, but we’ve started understanding features of particle physics like CPT invariance (P and T are space and time inversion, and we suspect that the charge conjugation operation C is “branchial inversion”). Another promising possibility relates to the distinction between fermions and bosons. We’re not sure yet, but it seems as if Fermi–Dirac statistics may be associated with multiway graphs where we see only nonmerging branches, while Bose–Einstein statistics may be associated with ones where we see all branches merging. Spinors may then turn out to be as straightforward as being associated with directed rather than undirected spatial hypergraphs.
It’s not yet clear how much we’re going to have to understand particles in order to see things like the spinstatistics connection, or whether—like in basic quantum mechanics—we’re going to be able to largely “factor out” the “spatial details” of actual particles. And as we begin to think about quantum field theory, it’s again looking as if there’ll be a lot that can be said in the “bulk” case, without having to get specific about particles. And just as we’ve been able to do for spacetime and general relativity, we’re hoping it’ll be possible to do computations in quantum field theory directly from our models, providing, for example, an alternative to things like lattice gauge theory (presumably with a more realistic treatment of time).
When we mix spatial hypergraphs with multiway graphs we inevitably end up with pretty complex structures—and ones that at least in the first instance tend to be full of redundancy. In the most obvious “global” multiway graph, each multiway graph node is in effect a complete state of the universe, and one’s always (at least conceptually) “copying” every part of this state (i.e. every spatial hypergraph node) at every update, even though only a tiny part of the state will actually be affected by the update.
So one thing we’ve been working on this year is defining more local versions of multiway systems. One version of this is based on what I call “multispace”, in which one effectively “starts from space”, then lets parts of it “bow out” where there are differences between different multiway branches. But a more scalable approach is to make a multiway graph not from whole states, but instead from a mixture of update events and individual “tokens” that knit together to form states.
There’s a definite tradeoff, though. One can set up a “token-event graph” that pretty much completely avoids redundancy. But the cost is that it can be very difficult to reassemble complete states. The full problem of reassembly no doubt runs into the computational irreducibility of the underlying evolution. But presumably there’s some limited form of reassembly that captures actual physical measurements, and that can be done by computationally bounded observers.
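The redundancy of the “global” multiway graph can be illustrated with a toy string-rewriting system (a minimal Python sketch; the rule AB → BA is just an illustrative stand-in, not one of the hypergraph models themselves). Every node stores a complete state, even though each update event touches only two characters of it:

```python
def successors(state, lhs="AB", rhs="BA"):
    """All states reachable by one rewrite lhs -> rhs, applied at every position."""
    return {state[:i] + rhs + state[i + len(lhs):]
            for i in range(len(state) - len(lhs) + 1)
            if state[i:i + len(lhs)] == lhs}

def multiway_graph(initial, steps):
    """'Global' multiway graph: every node is a whole state of the system,
    and every edge is an update event applied somewhere in that state."""
    nodes, edges = {initial}, set()
    frontier = {initial}
    for _ in range(steps):
        new_frontier = set()
        for s in frontier:
            for t in successors(s):
                edges.add((s, t))      # one edge per update event
                new_frontier.add(t)
        nodes |= new_frontier
        frontier = new_frontier
    return nodes, edges

nodes, edges = multiway_graph("ABAB", 4)
print(len(nodes), len(edges))   # 5 whole-state nodes, 5 events
```

A token-event construction would instead record only the tokens each event consumes and produces; reassembling the whole-state nodes above from such a record is exactly the hard part described in the text.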
In assessing a scientific theory the core question to ask is whether you get out more than you put in. It’s a bad sign if you carefully set up some very detailed model, and it still can’t tell you much. It’s a good sign if you just set up a simple model, and it can tell you lots of things. Well, by this measure, our models are the most spectacular I have ever seen. A year ago, it was already clear that the models had a rich set of implications. But over the course of this year, it feels as if more and more implications have been gushing out.
And the amazing thing is that they all seem to align with what we know from physics. There’s been no tweaking involved. Yes, it’s often challenging to work out what the models imply. But when we do, it always seems to agree with physics. And that’s what makes me now so confident that our models really do represent a correct fundamental theory of physics.
It’s been very interesting to see the methodology of “proof by compilation”. Do our models correctly reproduce general relativity? We can “compile” questions in general relativity into our models—then effectively run at the level of our “machine code”, and generate results. And what we’ve found is that, yes, compiling into our models works, giving the same results as we would get in the traditional theory, though, as it happens, potentially more efficiently.
We’ve found the same thing for quantum mechanics. And maybe we’ll find the same thing also for quantum field theory (where the traditional computations are much harder).
We’ve also been looking at specific effects and phenomena in existing physics—and we’re having excellent success not only in reproducing them in our models (and finding ways to calculate them) but also in (often for the first time) fundamentally understanding them. But what about new effects and phenomena that aren’t seen or expected in existing physics? Especially surprising ones?
It’s already very significant when a theory can efficiently explain things that are already known. But it’s a wonderful “magic trick” if a theory can say “This is what you’ll see”, and then that’s what’s seen in some actual experiment. Needless to say, it can be very difficult to figure out detailed predictions from a theory (and historically it’s often taken decades or even centuries). And when you’re dealing with something that’s never been seen before, it’s often difficult to know if you’ve included everything you need to get the right answer, both in working out theoretical predictions, and in making experimental measurements.
But one of the interesting things about our models is how structurally different they are from existing physics. And even before we manage to make detailed quantitative predictions, the very structure of our models implies the possibility of a variety of unexpected and often bizarre phenomena.
One class of such phenomena relates to the fact that in our models the dimension of space is dynamic, and does not just have a fixed integer value. Our expectation is that in the very early universe, the dimension of space was effectively infinite, gradually “cooling” to approximately 3. And in this setup, there should have been “dimension fluctuations”, which could perhaps have left a recognizable imprint on the cosmic microwave background, or other large-scale features of the universe.
It’s also possible that there could be dimension fluctuations still in our universe today, either as relics from the early universe, or as the result of gravitational processes. And if photons propagate through such dimension fluctuations, we can expect strange optical effects, though the details are still to be worked out. (One can also imagine things like pulsar timing anomalies, or effects on gravitational waves—or just straight local deviations from the inverse square law. Conceivably quantum field theoretic phenomena like anomalous magnetic moments of leptons could be sensitive dimension probes—though on small scales it’s difficult to distinguish dimension change from curvature. Or maybe there would be anomalies or magnetic monopoles made possible by non-integer dimensionality.)
A core concept of our models is that space (and time) are fundamentally discrete. So how might we see signs of this discreteness? There’s really only one fundamental unknown free parameter in our models (at least at a generic level), and there are many seemingly very different experiments that could determine it. But without having the value of this parameter, we don’t ultimately know the scale of discreteness in our models.
We have a (somewhat unreliable) estimate, however, that the elementary length might be around 10^{-90} meters (and the elementary time around 10^{-100} seconds). But these are nearly 70 orders of magnitude smaller than anything directly probed by present-day experiments.
So can we imagine any way to detect discreteness on such scales? Conceivably there could be effects left over from a time when the whole universe was very small. In the current universe there could be a signature of momentum discreteness in “maximum boosts” for sufficiently light particles. Or maybe there could be “shot noise” in the propagation of particles. But the best hope for detecting discreteness of spacetime seems to be in connection with large gravitational fields.
Eventually our models must imply corrections to Einstein’s equations. But at least in the most obvious estimates these would only become significant when the scale of curvature is comparable to the elementary length. Of course, it’s conceivable that in some situations there could be, say, a logarithmic signature of discreteness, allowing a more effective “gravitational microscope” to be constructed.
In current studies of general relativity, the potentially most accessible “extreme situation” is a spinning black hole close to critical angular momentum. And in our models, we already have direct simulations of this. And what we see is that as we approach criticality there starts to be a region of space that’s knitted into the rest of space by fewer and fewer updating events. And conceivably when this happens there would be “shot noise”, say visible in gravitational waves.
There are other effects too. In a kind of spacetime analog of vacuum polarization, the discreteness of spacetime should lead to a “black hole wind” of outgoing momentum from an event horizon—though the effect is probably only significant for elementary-length-scale black holes. (Such effects might lead to energy loss from black holes through a different “mode of spacetime deformation” than ordinary gravitational radiation.) Another effect of having a discrete structure to space is that information transmission rates are only “statistically” limited to the speed of light, and so fluctuations are conceivable, though again most likely only on elementary-length-type scales.
In general the discreteness of spacetime leads to all sorts of exotic structures and singularities in spacetime not present in ordinary general relativity. Notable potential features include dynamic topology change, “space tunnels”, “dimension anomalies” and spatial disconnection.
We imagine that in our models particles are some kind of topological obstructions in the spatial hypergraph. And perhaps we will find even quite generic results for the “spectrum” of such obstructions. But it’s also quite possible that there will be “topologically stable” structures that aren’t just like point particles, but are something more exotic. By the way, in computing things like the cosmological constant—or features of dark energy—we need to compare the “total visible particle content” with the total activity in the spatial hypergraph, and there may be generic results to be had about this.
One feature of our models is that they imply that things like electrons are not intrinsically of zero size—but in fact are potentially quite large compared to the elementary length. Their actual size is far out of range of any anticipated experiments, but the fact that they involve so many elements in the underlying spatial hypergraph suggests that there might be particles—that I’ve called oligons—that involve many fewer, and that might have measurable cosmological or astrophysical effects, or even be directly detectable as some kind of very-low-mass dark matter.
In thinking about particles, our models also make one think about some potential highly exotic possibilities. For example, perhaps not every photon in the universe with given energy-momentum and polarization is actually identical. Maybe they have the same “overall topological structure”, but a different detailed configuration of (say) the multiway causal graph. And maybe such differences would have detectable effects on sufficiently large coherent collections of photons. (It may be more plausible, however, that particles act a bit like tiny black holes, with their “internal state” not evident outside.)
When it comes to quantum mechanics, our models again have some generic predictions—the most obvious of which is the existence of a maximum entanglement speed ζ, that is the analog of the speed of light, but in branchial space. In our models, the scale of ζ is directly connected to the scale of the elementary length, so measuring one would determine the other—and with our (rather unreliable) estimate for the elementary length ζ might be around 10^{5} solar masses per second.
There are a host of “relativity-analog” effects associated with ζ, an example being the quantum Zeno effect, which is effectively time dilation associated with rapidly repeated measurement. And conceivably there is some kind of atomic-scale (or gravitational-wave-detector-deformation-scale) “measurement from the environment” that could be sensitive to this—perhaps associated with what might be considered “noise” for a quantum computer. (By the way, ζ potentially also defines limitations on the effectiveness of quantum computing, but it’s not clear how one would disentangle “engineering issues”.)
Then there are potential interactions between quantum mechanics and the structure of spacetime—perhaps for example effects of features of spacetime on quantum coherence. But probably the most dramatic effects will be associated with things like black holes, where for example the maximum entanglement speed should represent an additional limitation on black hole formation—that with our estimate for ζ might actually be observable in the near term.
Historically, general relativity was fortunate enough to imply effects that did not depend on any unknown scales (like the cosmological constant). The most obvious candidates for similar effects in our models involve things like the quantum behavior of photons orbiting a black hole. But there’s lots of detailed physics to do to actually work any such things out.
In the end, a fundamental model for physics in our setup involves some definite underlying rule. And some of our conclusions and predictions about physics will surely depend on the details of that rule. But one of the continuing surprises in our models is how many implied features of physics are actually generic to a large class of rules. Still, there are things like the masses of elementary particles that at least feel like they must be specific to particular rules. Although—who knows—maybe overall symmetries are determined by the basic structure of the model, maybe the number of generations of fermions is connected to the effective dimensionality of space, etc. These are some of the kinds of things it looks conceivable that we’ll begin to know in the next few years.
When I first started developing what people have been calling “Wolfram models”, my primary motivation was to understand fundamental physics. But it was quickly clear that the models were interesting in their own right, independent of their potential connection to physics, and that they might have applications even outside of physics. And I suppose one of the big surprises this year has been just how true that is.
I feel like our models have introduced a whole new paradigm, that allows us to think about all kinds of fields in fundamentally new ways, and potentially solve longstanding foundational problems in them.
The general exploration of the computational universe—that I began more than forty years ago—has brought us phenomena like computational irreducibility and has led to all sorts of important insights. But I feel that with our new models we’ve entered a new phase of understanding the computational universe, in particular seeing the subtle but robust interplay between computational reducibility and computational irreducibility that’s associated with the introduction of computationally bounded observers or measurements.
I hadn’t really known how to fit the successes of physics into the framework of what I’d seen in the computational universe. But now it’s becoming clear. And the result is not only that we understand more about the foundations of physics, but also that we can import the successes of physics into our thinking about the computational universe, and all its various applications.
At a very pragmatic level, cellular automata (my longtime favorite examples in the computational universe) provide minimal models for systems in which arbitrary local rules operate on a fixed array in space and time. Our new models now provide minimal models for systems that have no such definite structure in space and time. Cellular automata are minimal models of “array parallel” computational processes; our new models are minimal models of distributed, asynchronous computational processes.
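For comparison, the “array parallel” minimal model really is just a few lines: in an elementary cellular automaton every cell updates synchronously, by the same local rule, on a fixed array (a minimal sketch; rule 30, with periodic boundaries, is just an illustrative choice):

```python
def ca_step(cells, rule=30):
    """One synchronous step of an elementary cellular automaton:
    each cell's new value is looked up from its 3-cell neighborhood,
    using the bits of the rule number (periodic boundary conditions)."""
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

row = [0, 0, 1, 0, 0]       # a single "on" cell
for _ in range(2):
    row = ca_step(row)
print(row)                  # [1, 1, 0, 0, 1]
```

The fixed array and lockstep update are exactly what the new models give up: there, which update happens where, and in what order, is itself part of the dynamics.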
In something like a cellular automaton—with its very organized structure for space and time—it’s straightforward to see “what leads to what”. But in our new models it can be much more complicated—and to represent the causal relationships between different events we need to construct causal graphs. And for me one consequence of studying our models has been that whenever I’m studying anything I now routinely start asking about causal graphs—and in all sorts of cases this has turned out to be very illuminating.
But beyond causal graphs, one feature of our new models is their essentially inevitable multiway character. There isn’t just one “thread of history” for the evolution of the system, there’s a whole multiway graph of them. In the past, there’ve been plenty of probabilistic or nondeterministic models for all sorts of systems. But in a sense I’ve always found them unsatisfactory, because they end up talking about making an arbitrary choice “from outside the system”. A multiway graph doesn’t do that. Instead, it tells the story purely from within the system. But it’s the whole story: “in one gulp” it’s capturing the whole dynamic collection of all possibilities.
And now that the formalism of our models has gotten me used to multiway graphs, I see them everywhere. And all sorts of systems that I thought somehow weren’t well enough defined to be able to study in a systematic way I now realize are amenable to “multiway analysis”.
One might think that a multiway graph that captures all possibilities would inevitably be too complicated to be useful. But this is another key observation from our Physics Project: particularly with the phenomenon of causal invariance, there are generic statements that can be made, without dealing with all the details. And one of the important directions we’ve pursued over the course of this year is to get a better understanding—sometimes using methods from category theory—of the general theory of multiway systems.
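One way to make the causal invariance idea concrete is a brute-force reconvergence check for a toy rewriting system: every pair of branches spawned from a single state should reach some common state within a bounded number of steps. A minimal sketch (the sorting rule BA → AB is just an illustrative example that happens to reconverge; this bounded search is not a general proof method):

```python
def succ(state, lhs="BA", rhs="AB"):
    """All states reachable by one application of the rule lhs -> rhs."""
    return {state[:i] + rhs + state[i + len(lhs):]
            for i in range(len(state) - len(lhs) + 1)
            if state[i:i + len(lhs)] == lhs}

def reachable(state, depth):
    """All states reachable from `state` in at most `depth` steps."""
    seen, frontier = {state}, {state}
    for _ in range(depth):
        frontier = {t for s in frontier for t in succ(s)} - seen
        seen |= frontier
    return seen

def branches_reconverge(state, depth=6):
    """Check that every pair of one-step branches from `state`
    shares at least one common descendant within `depth` steps."""
    branches = list(succ(state))
    return all(reachable(a, depth) & reachable(b, depth)
               for i, a in enumerate(branches)
               for b in branches[i + 1:])

print(branches_reconverge("BABA"))   # True: the two branches knit back together
```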
But, OK, so what can we apply the formalism of our models to? Lots of things. Some that we’ve at least started to think seriously about are: distributed computing, mathematics and metamathematics, chemistry, biology and economics. And in each case it’s not just a question of having some kind of “add-on” model; it seems like our formalism allows one to start talking about deep, foundational questions in each of these fields.
In distributed computing, I feel like we’re just getting started. For decades I’ve wondered how to think about organizing distributed computing so that we humans can understand it. And now within our formalism, I’ve both understood why that’s hard, and begun to get ideas about how we might do it. A crucial part is getting intuition from physics: thinking about “programming in a reference frame”, causal invariance as a source of eventual consistency, quantum effects as ambiguities of outcome, and so on. But it’s also been important over the past year to study specific systems—like multiway Turing machines and combinators—and be able to see how things work out in these simpler cases.
As an “exercise”, we’ve been looking at using ideas from our formalism to develop a distributed analog of blockchain—in which “intentional events” introduced from outside the system are “knitted together” by large numbers of “autonomous events”, in much the same way as consistent “classical” space arises in our models of physics. (The analog of “forcing consensus” or coming to a definite conclusion is essentially like the process of quantum measurement.)
It’s interesting to try to apply “causal” and “multiway” thinking to practical computation, for example in the Wolfram Language. What is the causal graph of a computation? It’s a kind of dependency trace. And after years of looking for a way to get a good manipulable symbolic representation of program execution this may finally show us how to do it. What about the multiway graph? We’re used to thinking about computations that get done on “data structures”, like lists. But how should we think of a “multiway computation” that can produce a whole bundle of outputs? (In something like logic programming, one starts with a multiway concept, but then typically picks out a single path; what seems really interesting is to see how to systematically “compute at the multiway level”.)
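The “dependency trace” idea can be sketched directly: record each event’s reads and writes, and draw a causal edge from the most recent writer of each value to any event that reads it. This is a hypothetical illustration of the idea, not Wolfram Language internals:

```python
def causal_graph(events):
    """Causal graph of a computation: event j depends on event i
    if j reads a variable that i was the most recent writer of.
    `events` is a list of (reads, writes) tuples of variable names."""
    last_writer = {}   # variable -> index of its most recent writing event
    edges = set()
    for j, (reads, writes) in enumerate(events):
        for v in reads:
            if v in last_writer:
                edges.add((last_writer[v], j))
        for v in writes:
            last_writer[v] = j
    return edges

# A tiny straight-line computation, one (reads, writes) pair per event:
#   e0: x = 1      e1: y = 2      e2: z = x + y      e3: w = z * x
events = [((), ("x",)), ((), ("y",)), (("x", "y"), ("z",)), (("z", "x"), ("w",))]
print(causal_graph(events))   # {(0, 2), (1, 2), (2, 3), (0, 3)}
```

Note that e0 and e1 are causally unrelated, so they could run in either order, or in parallel: exactly the kind of fact the causal graph makes manifest.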
OK, so what about mathematics? There’s an immediate correspondence between multiway graphs and the networks obtained by applying axioms or laws of inference to generate all possible theorems in a given mathematical theory. But now our study of physics makes a suggestion: what would happen if—like in physics—we take a limit of this process? What is “bulk” or “continuum” metamathematics like?
In the history of human mathematics, there’ve been a few million theorems published—defining in a sense the “human geography” of metamathematical space. But what about the “intrinsic geometry”? Is there a theory of this, perhaps analogous to our theory of physics? A “physicalized metamathematics”? And what does it tell us about the “infinitetime limit” of mathematics, or the general nature of mathematics?
If we try to fully formalize mathematics, we typically end up with a very “nonhuman” “machine code”. In physics there might be a hundred orders of magnitude between the atoms of space and our typical experience. In present-day formalized mathematics, there might be 4 or 5 orders of magnitude from the “machine code” to typical statements of theorems that humans would deal with.
At the level of the machine code, there’s all sorts of computational irreducibility and undecidability, just like in physics. But somehow at the “human level” there’s enough computational reducibility that one can meaningfully “do mathematics”. I used to think that this was some kind of historical accident. But I now suspect that—just like with physics—it’s a fundamental feature of the involvement of computationally bounded human “observers”. And given the correspondence of formalisms, one’s led to ask things like what the analog of relativity—or quantum mechanics—is in “bulk metamathematics”, and, for example, how it might relate to things like “computationally bounded category theory”.
And, yes, this is interesting in terms of understanding the nature of mathematics. But mathematics also has its own deep stack of results and intuition, and in studying mathematics using the same formalism as physics, we also get to use this in our efforts to understand physics.
How could all this be relevant to chemistry? Well, a network of all possible chemical reactions is once again a multiway graph. In chemical synthesis one’s usually interested in just picking out one particular “pathway”. But what if we think “multiway style” about all the possibilities? Branchial space is a map of chemical species. And we now have to understand what kind of laws a “computationally bounded chemical sensor” might “perceive” in it.
Imagine we were trying to “do a computation with molecules”. The “events” in the computation could be thought of as chemical reactions. But now instead of just imagining “getting a single molecular result”, consider using the whole multiway system “as the computation”. It’s basically the same story as distributed computing. And while we don’t yet have a good way to “program” like this, our Physics Project now gives us a definite direction. (Yes, it’s ironic that this kind of molecular-scale computation might work using the same formalism as quantum mechanics—even though the actual processes involved don’t have to be “quantum” in the underlying physics sense.)
When we look at biological systems, it’s always been a bit of a mystery how one should think about the complex collections of chemical processes they involve. In the case of genetics we have the organizing idea of digital information and DNA. But in the general case of systems biology we don’t seem to have overarching principles. And I certainly wonder whether what’s missing is “multiway thinking” and whether using ideas from our Physics Project we might be able to get a more global understanding—like a “general relativity” of systems biology.
It’s worth pointing out that the detailed techniques of hypergraph evolution are probably applicable to biological morphogenesis. Yes, one can do a certain amount with things like continuum reaction-diffusion equations. But in the end biological tissue—like, we now believe, physical space—is made of discrete elements. And particularly when it comes to topology-changing phenomena (like gastrulation) that’s probably pretty important.
Biology hasn’t generally been a field that’s big on formal theories—with the one exception of the theory of natural selection. But beyond specific results about the dynamics of a few whole species, it’s been difficult to get global results about natural selection. Might the formalism of our models help? Perhaps we’d be able to start thinking about individual organisms a bit like we think about atoms of space, then potentially derive large-scale “relativity-style” results, conceivably about general features of “species space” that really haven’t been addressed before.
In the long list of potential areas where our models and formalism could be applied, there’s also economics. A bit like in the natural selection case, the potential idea is in effect to model every individual event or “transaction” in an economy. The causal graph then gives some kind of generalized supply chain. But what is the effect of all those transactions? The important point is that there’s almost inevitably lots of computational irreducibility. Or, in other words, much as in the Second Law of Thermodynamics, the transactions rapidly cease to be “unwindable” by a computationally bounded agent, but have robust overall “equilibrium” properties that in the economic case might represent “meaningful value”. The robustness of the notion of monetary value might then correspond to the robustness with which thermodynamic systems can be characterized as having certain amounts of heat.
But with this view of economics, the question still remains: are there “physicslike” laws to be found? Are there economic analogs of reference frames? (In an economy with geographically local transactions one might even expect to see effects analogous to relativistic time dilation.)
To me, the most remarkable thing is that the formalism we’ve developed for thinking about fundamental physics seems to give us such a rich new framework for discussing so many other kinds of areas—and for pooling the results and intuitions of these areas.
And, yes, we can keep going. We can imagine thinking about machine learning—for example considering the multiway graph of all possible learning processes. We can imagine thinking about linguistics—starting from every elementary “event” of, say, a word being said by one person to another. We can even think about questions in traditional physics—like one of my old favorites, the hard-sphere gas—analyzing them not with correlation functions and partition functions but with causal graphs and multiway graphs.
A year ago, as we approached the launch of the Wolfram Physics Project, we felt increasingly confident that we’d found the correct general formalism for the “machine code” of the universe, we’d built intuition by looking at billions of possible specific rules, and we’d discovered that in our models many features of physics are actually quite generic, and independent of specific rules.
But we still assumed that in the end there must be some specific rule for our particular universe. We thought about how we might find it. And then we thought about what would happen if we found it, and how we might imagine answering the question “Why this rule, and not another?”
But then we realized: actually, the universe does not have to be based on just one particular rule; in some sense it can be running all possible rules, and it is merely through our perception that we attribute a specific rule to what we see about the universe.
We already had the concept of a multiway graph, generated by applying all possible update events, and tracing out the different histories to which they lead. In an ordinary multiway graph, the different possible update events occur at different places in the spatial hypergraph. But we imagined generalizing this to a rulial multiway graph, generated by applying not just updates occurring in all possible places, but also updates occurring with all possible rules.
At first one might assume that if one used all possible rules, nothing definite could come out. But the fact that different rules can potentially lead to identical states causes a definite rulial multiway graph to be knitted together—including all possible histories, based on all possible sequences of rules.
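This knitting-together can be seen even in a deliberately tiny string-rewriting sketch (illustrative only; the two rules are chosen so that different rules produce an identical state, merging the branches into one node):

```python
def apply_rule(state, lhs, rhs):
    """All results of one application of the rule lhs -> rhs anywhere in state."""
    return {state[:i] + rhs + state[i + len(lhs):]
            for i in range(len(state) - len(lhs) + 1)
            if state[i:i + len(lhs)] == lhs}

def rulial_step(states, rules):
    """One slice of a rulial multiway graph: apply *every* rule everywhere,
    labeling each edge by the rule that produced it; identical resulting
    states are automatically merged into a single node."""
    return {(s, rule, t)
            for s in states
            for rule in rules
            for t in apply_rule(s, *rule)}

rules = [("A", "AA"), ("AB", "AAB")]      # two different rules
edges = rulial_step({"AB"}, rules)
# Two distinct rule-labeled events, but both lead to the single state "AAB":
print({t for (_, _, t) in edges})         # {'AAB'}
```

Both branches land on the same node, so the graph is knitted together rather than splitting into disconnected per-rule histories.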
What could an observer embedded in such a rulial multiway graph perceive? Just as for causal graphs or ordinary multiway graphs, one can imagine defining a reference frame—here a “rulial frame”—that makes the observer perceive the universe as evolving through a series of slices in rulial space, or in effect operating according to certain rules. In other words, the universe follows all possible rules, but an observer in a particular rulial frame describes its operation according to particular rules.
And the critical point is then that this is consistent because the evolution in the rulial multiway graph inevitably shows causal invariance. At first this all might seem quite surprising. But the thing to realize is that the Principle of Computational Equivalence implies that collections of rules will generically show computation universality. And this means that whatever rulial frame one picks—and whatever rules one uses to describe the evolution of the universe—it’ll always be possible to use those rules to emulate any other possible rules.
There is a certain ultimate abstraction and unification in all this. In a sense it says that the only thing one ultimately needs to know about our universe is that it is “computational”—and from there the whole formal structure of our models takes over. It also tells us that there is ultimately only one universe—though different rulial frames may describe it differently.
How should we think about the limiting rulial multiway graph? It turns out that something like it has also appeared in the upper reaches of pure mathematics in connection with higher category theory. We can think of our basic multiway graphs as related to (weak versions of) ordinary categories. It’s a little different from how categorical quantum mechanics works in our models. But when we add in equivalences between branches in the multiway system we get a 2-category. And if we keep adding higher-and-higher-order equivalences, we get higher and higher categories. But in the infinite limit it turns out the structure we get is exactly the rulial multiway graph—so that now we can identify this as an infinity category, or more specifically an infinity groupoid.
Grothendieck’s homotopy hypothesis suggests that there is in a sense inevitable geometry in the infinity groupoid, and it’s ultimately this structure that seems to “trickle down” from the rulial multiway graph to everything else we look at, and imply, for example, that there can be meaningful notions of physical and branchial space.
We can think of the limiting multiway graph as a representation of physics and the universe. But the exact same structure can also be thought of as a kind of metamathematical limit of all possible mathematics—in a sense fundamentally tying together the foundations of physics and mathematics.
There are many details and implications to this, that we’re just beginning to work out. The ultimate formation of the rulial multiway graph depends on identifying when states or objects can be treated as the same, and merged. In the case of physics, this can be seen as a feature of the observer, and the reference frames they define. In the case of mathematics, it can be seen as a feature of the underlying axiomatic framework used, with the univalence axiom of homotopy type theory being one possible choice.
The whole concept of rulial space raises the question of why we perceive the kind of laws of physics we do, rather than other ones. And the important recent realization is that the answer seems deeply connected to what we define as consciousness.
I must say that I’ve always been suspicious about attempts to make a scientific framework for consciousness. But what’s recently become clear is that in our approach to physics there’s both a potential way to do it, and in a sense it’s fundamentally needed to explain what we see.
Long ago I realized that as soon as you go beyond humans, the only viable general definition of intelligence is the ability to do sophisticated computation—which the Principle of Computational Equivalence says is quite ubiquitous. One might have thought that consciousness is an “add-on” to intelligence, but actually it seems instead to be a “step down”. Because it seems that the key element of what we consider consciousness is the notion of having a definite “thread of experience” through time—or, in other words, a sequential way to experience the universe.
In our models the universe is doing all sorts of complicated things, and showing all sorts of computational irreducibility. But if we’re going to sample it in the way consciousness does, we’ll inevitably pick out only certain computationally reducible slices. And that’s precisely what the laws of physics we know—embodied in general relativity and quantum mechanics—correspond to. In some sense, therefore, we see physics as we do because we are observing the universe through the sequential thread of experience that we associate with consciousness.
Let me not go deeper into this here, but suffice it to say that from our science we seem to have reached an interesting philosophical conclusion about the way that we effectively “create” our description of the universe as a result of our own sensory and cognitive capabilities. And, yes, that means that “aliens” with different capabilities (or even just different extents in physical or branchial space) could have descriptions of the universe that are utterly incoherent with our own.
But, OK, so what can we say about rulial space? With a particular description of the universe we’re effectively stuck in a particular location or frame in rulial space. But we can imagine “moving” by changing our point of view about how the universe works. We can always make a translation, but that inevitably takes time.
And in the end, just like with light cones in physical space, or entanglement cones in branchial space, there’s a limit to how fast a particular translation distance can be covered, defined by a “translation cone”. And there’s a “maximum translation speed” ρ, analogous to the speed of light c in space or the maximum entanglement speed ζ in branchial space. And in a sense ρ defines the ultimate “processor speed” for the universe.
In defining the speed of light we have to introduce units for length in space. In defining ρ we have to introduce units for the length of descriptions of programs or rules—so, for example, ρ could be measured, say, in units of “Wolfram Language tokens per second”. We don’t know the value of ρ, but an unreliable estimate might be 10^450 WLT/second. And just like in general relativity and quantum mechanics one can expect that there will be all sorts of effects scaled by ρ that occur in rulial space. (One example might be a “quantum-like uncertainty” that provides limits on inductive inference by not letting one distinguish “theories of the universe” until they’ve “diverged far enough” in rulial space.)
The concept of rulial space is a very general one. It applies to physics. It applies to mathematics. And it also applies to pure computation. In a sense rulial space provides a map of the computational universe. It can be “coordinatized” by representing computations in terms of Turing machines, cellular automata, Wolfram models, or whatever. But in general we can ask about its limiting geometrical and topological structure. And here we see a remarkable convergence with fundamental questions in theoretical computer science.
For example, particular geodesic paths in rulial space correspond to maximally efficient deterministic computations that follow a single rule. Geodesic balls correspond to maximally efficient nondeterministic computations that can follow a sequence of rules. So then something like the P vs. NP question becomes what amounts to a geometrical or topological question about rulial space.
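The flavor of this correspondence can be seen in a toy string multiway system—sketched here in Python rather than Wolfram Language, with purely illustrative rules and function names. A deterministic computation follows one chosen rule along a single path; the “geodesic ball” of a nondeterministic computation is everything reachable when any rule may fire at any position:

```python
def successors(state, rules):
    """All states reachable in one step by applying any rule at any match position."""
    result = set()
    for lhs, rhs in rules:
        start = 0
        while (i := state.find(lhs, start)) != -1:
            result.add(state[:i] + rhs + state[i + len(lhs):])
            start = i + 1
    return result

def multiway_ball(initial, rules, steps):
    """The 'geodesic ball': every state reachable in at most `steps` steps."""
    ball, frontier = {initial}, {initial}
    for _ in range(steps):
        frontier = {s for state in frontier for s in successors(state, rules)} - ball
        ball |= frontier
    return ball

rules = [("A", "AB"), ("B", "A")]
print(sorted(multiway_ball("A", rules, 2)))  # ['A', 'AA', 'AB', 'ABB']
```

Restricting to a single rule traces out one path—a deterministic computation—while the ball grown under all rules at once is the nondeterministic analog, and how fast such balls grow is the kind of geometrical quantity the P vs. NP framing refers to.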
In our Physics Project we set out to find a fundamental theory for physics. But what’s become clear is that in thinking about physics we’re uncovering a formal structure that applies to much more than just physics. We already had the concept of computation in all its generality—with implications like the Principle of Computational Equivalence and computational irreducibility. But what we’ve now uncovered is unification at a different level, not about all computation, but about computation as perceived by computationally bounded observers, and about the kinds of things about which we can expect to make theories as powerful as the ones we know in physics.
For each field what’s key is to identify the right question. What is the analog of space, or time, or quantum measurement, or whatever? But once we know that, we can start to use the machinery our formalism provides. And the result is a remarkable new level of unification and power to apply to science and beyond.
How should one set about finding a fundamental theory of physics? There was no roadmap for the science that needed to be done. And there was no roadmap for how the science should be done. And part of the unfolding story of the Wolfram Physics Project is about its process, and about new ways of doing science.
Part of what has made the Wolfram Physics Project possible is ideas. But part of it is also tools, and in particular the tall tower of technology that is the Wolfram Language. In a sense the whole four decades of history behind the Wolfram Language has led us to this point. The general conception of computational language built to represent everything, including, it now seems, the whole universe. And the extremely broad yet tightly integrated capabilities of the language that make it possible to so fluidly and efficiently pursue each different piece of research that is needed.
For me, the Wolfram Physics Project is an exciting journey that, yes, is going much better than I ever imagined. From the start we were keen to share this journey as widely as possible. We certainly hoped to enlist help. But we also wanted to open things up so that as many people as possible could experience and participate in this unique adventure at the frontiers of science.
And a year later I think I can say that our approach to open science has been a great and accelerating success. An increasing number of talented researchers have become involved in the project, and have been able to make progress with great synergy and effectiveness. And by opening up what we’re doing, we’ve also been able to engage with—and hopefully inspire—a very wide range of people even outside of professional science.
One core part of what’s moving the project forward is our tools and the way we’re using them. The idea of computational language—as the Wolfram Language uniquely embodies—is to have a way to represent things in computational terms, and be able to communicate them like that. And that’s what’s happening all the time in the Wolfram Physics Project. There’s an idea or direction. And it gets expressed in Wolfram Language. And that means it can explicitly and repeatably be understood, run and explored—by anyone.
We’re posting our Wolfram Language working notebooks all the time—altogether 895 of them over the past year. And we’re packaging functions we write into the Wolfram Function Repository—130 of them over the past year—all with source code, all documented, and all instantly and openly usable in any Wolfram Language system. It’s become a rhythm for our research. First explore in working notebooks, adding explanations where appropriate to make them readable as computational essays. Then organize important functions and submit them to the Function Repository, then use these functions to take the next steps in the research.
This whole setup means that when people write about their results, there’s immediately runnable computational language code. And in fact, at least in what I’ve personally written, I’ve had the rule that for any picture or result I show (so far 2385 of them) it must be possible to just click it, and immediately get code that will reproduce it. It might sound like a small thing, but this kind of fluid immediacy to being able to reproduce and build on what’s been done has turned out to be tremendously important and powerful.
There are so many details—that in a sense come as second nature given our long experience with production software development. Being careful and consistent about the design of functions. Knowing when it makes sense to optimize at the cost of having less flexible code. Developing robust standardized visualizations. There are lots of what seem like small things that have turned out to be important. Like having consistent color schemes for all our various kinds of graphs, so when one sees what someone has done, one immediately knows “that’s a causal graph”, “that’s a branchial graph” and so on, without even having to read any explanation.
But in addition to opening up the functions and ongoing notebooks we produce, we’ve also done something more radical: we’ve opened up our process of work, routinely livestreaming our working meetings. (There’ve been 168 hours of them this year; we’ve now also posted 331 hours from the 6 months before the launch of the project.) I’ve personally even gone one step further: I’ve posted “video work logs” of my personal ongoing work (so far, 343 hours of them)—right down to, for example, the writing of this very sentence.
We started doing all this partly as an experiment, and partly following the success we’ve had over the past few years in livestreaming our internal meetings designing the Wolfram Language. But it’s turned out that capturing our Physics Project being done has all sorts of benefits that we never anticipated. You see something in a piece I’ve written. You wonder “Where did that come from?”. Well, now you can drill all the way down, to see just what went into making it, missteps and all.
It’s been great to share our experience of figuring things out. And it’s been great to get all those questions, feedback and suggestions in our livestreams. I don’t think there’s any other place where you can see science being done in real time like this. Of course it helps that it’s so uniquely easy to do serious research livecoding in the Wolfram Language. But, yes, it takes some boldness (or perhaps foolhardiness) to expose one’s ongoing steps—forward or backward—in real time to the world. But I hope it helps people see more about what’s involved in figuring things out, both in general and specifically for our project.
When we launched the project, we put online nearly a thousand pages of material, intended to help people get up to speed with what we’d done so far. And within a couple of months after the launch, we had a 4-week track of our Wolfram Summer School devoted to the Wolfram Physics Project. We had 30 students there (as well as another 4 from our High School Summer Camp)—all of whom did projects based on the Wolfram Physics Project.
And after the Summer School, responding to tremendous demand, we organized two week-long study sessions (with 30 more students), followed in January by a 2-week Winter School (with another 17 students). It’s been great to see so many people coming up to speed on the project. And so far there’ve been a total of 79 publications, “bulletins” and posts that have come out of this—containing far more than, for example, I could possibly have summarized here.
There’s an expanding community of people involved with the Wolfram Physics Project. And to help organize this, we created our Research Affiliate and Junior Research Affiliate programs, now altogether with 49 people from around the world involved.
Something else that’s very important is happening too: steadily increasing engagement from a wide range of areas of physics, mathematics and computer science. In fact, with every passing month it seems like there’s some new research community that’s engaging with the project. Causal set theory. Categorical quantum mechanics. Term rewriting. Numerical relativity. Topos theory. Higher category theory. Graph rewriting. And a host of other communities too.
We can view the achievement of our project as being in a sense to provide a “machine code” for physics. And one of the wonderful things about it is how well it seems to connect with a tremendous range of work that’s been done in mathematical physics—even when it wasn’t yet clear how that work on its own might relate to physical reality. Our project, it seems, provides a kind of Rosetta stone for mathematical physics—a common foundation that can connect, inform and be informed by all sorts of different approaches.
Over the past year there’s been a repeated, rather remarkable experience. For some reason or another we’ll get exposed to some approach or idea. Constructor theory. Causal dynamical triangulation. Ontological bases. Synthetic differential geometry. ER=EPR. And we’ll use our models as a framework for thinking about it. And we’ll realize: “Gosh, now we can understand that!” And we’ll see how it fits in with our models, how we can learn more about our models from it—and how we can use our models and our formalism to bring in new ideas to advance the thing itself.
In some ways our project represents a radical shift from the past century or so of physics. And more often than not, when such intellectual shifts are made in the history of science, they’ve been accompanied by all kinds of difficulties in connecting with existing communities. But I’m very happy to report that over the past year our project has been doing quite excellently in connecting with existing communities—no doubt helped by its “Rosetta stone” character. And as we progress, we’re looking forward to an increasing network of collaborations, both within the community that’s already formed and with other communities.
And over the coming year, as we start to more seriously explore the implications of our models and formalism even beyond physics, I’m anticipating still more connections and collaborations.
It’s hard to believe it’s only been a little over 18 months since we started working in earnest on the Wolfram Physics Project. So much has happened, and we’ve gotten so much further than I ever thought possible. And it feels like a whole new world has opened up. So many new ideas, so many new ways of looking at things.
I’ve been fortunate enough to have already had a long and satisfying career, and it’s a surprising and remarkable thing at this stage to have what seems like a fresh, new start. Of course, in some respects I’ve spent much of my life preparing for what is now the Wolfram Physics Project. But the actuality of it has been so much more exciting and invigorating than anything I imagined. There’ve been so many questions—about all sorts of different things—that I’ve been accumulating and mulling over for decades. And suddenly it seems as if a door I never knew existed has opened, and now it’s possible to go forward on a dizzying array of fronts.
I’ve spent most of my life building a whole tower of things—alternating between science and technology. And in this tower it’s remarkable the extent to which each level has built on what’s come before: tools from technology have made it possible to explore science, and ideas from science have made it possible to create technology. But a year ago I thought the Wolfram Physics Project might finally be the end of the line: a piece of basic science that was really just science, and nothing but science, with no foreseeable implications for technology.
But it turns out I was completely wrong. And in fact of all the pieces of basic science I’ve ever done, the Wolfram Physics Project may be the one which has the greatest short-term implications for technology. We’re not talking about building starships using physics. We’re talking about taking the formalism we’ve developed for physics—and applying it, now informed by physics, in all sorts of very practical settings in distributed computing, modeling, chemistry, economics and beyond.
In the end, one may look back at many of these applications and say “that didn’t really need the Physics Project; we could have just got there directly”. But in my experience, that’s not how intellectual progress works. It’s only by building a tower of tools and ideas that one can see far enough to understand what’s possible. And without that, decades or centuries may go by, with the path forward hiding in what will later seem like plain sight.
A year ago I imagined that in working on the Wolfram Physics Project I’d mostly be doing things that were “obviously physics”. But in actuality the project has led me to pursue all sorts of “distractions”. I’ve studied things like multiway Turing machines, which, yes, are fairly obviously related to questions about quantum mechanics. But I’ve also studied combinators and tag systems (OK, these were prompted by the arrival of their centenaries). And I spent a while looking at the empirical mathematics of Euclid and beyond.
And, yes, the way I approached all these things was strongly informed by our Physics Project. But what’s surprising is that I feel like doing each of these projects advanced the Physics Project too. The “Euclid” project has started to build a bridge that lets us import the intuition and formalism of metamathematics—informed by the concrete example of Euclid’s Elements. The combinator project deepened my understanding of causal invariance and of the possible structures of things like space. And even the historical scholarship I did on combinators taught me a lot about issues in the foundations of mathematics that have languished for a century but I now realize are important.
In all, the pieces I’ve written over the past year add up to about 750 pages of material (and, yes, that number makes me feel fairly productive). But there’s so much more to do and to write. A few times in my life I’ve had the great pleasure of discovering a new paradigm and being able to start exploring what’s possible within it. And in many ways the Wolfram Physics Project has—yes, after three decades of gestation—been the most sudden of these experiences. It’s been an exciting year. And I’m looking forward to what comes next, and to seeing the new paradigm that’s been created develop both in physics and beyond.
One of the great pleasures of this year has been the energy and enthusiasm of people working on the Wolfram Physics Project. But I’d particularly like to mention Jonathan Gorard, who has achieved an exceptional level of productivity and creativity, and has been a driving force behind many of the advances described here.
For most big ideas in recorded intellectual history one can answer the question: “What became of the person who originated it?” But late last year I tried to answer that for Moses Schönfinkel, who sowed a seed for what’s probably the single biggest idea of the past century: abstract computation and its universality.
I managed to find out quite a lot about Moses Schönfinkel. But I couldn’t figure out what became of him. Still, I kept on digging. And it turns out I was able to find out more. So here’s an update….
To recap a bit: Moses Schönfinkel was born in 1888 in Ekaterinoslav (now Dnipro) in what’s now Ukraine. He went to college in Odessa, and then in 1914 went to Göttingen to work with David Hilbert. He didn’t publish anything, but on December 7, 1920—at the age of 32—he gave a lecture entitled “Elemente der Logik” (“Elements of Logic”) that introduced what are now called combinators, the first complete formalism for what we’d now call abstract computation. Then on March 18, 1924, with a paper based on his lecture just submitted for publication, he left for Moscow. And basically vanished.
It’s said that he had mental health issues, and that he died in poverty in Moscow in 1940 or 1942. But we have no concrete evidence for either of these claims.
When I was researching this last year, I found out that Moses Schönfinkel had a younger brother Nathan Scheinfinkel (yes, he used a different transliteration of the Russian Шейнфинкель) who became a physiology professor at Bern in Switzerland, and later in Turkey. Late in the process, I also found out that Moses Schönfinkel had a younger sister Debora, who we could tell graduated from high school in 1907.
Moses Schönfinkel came from a Jewish merchant family, and his mother came from a quite prominent family. I suspected that there might be other siblings (Moses’s mother came from a family of 8). And the first “new find” was that, yes, there were indeed two additional younger brothers. Here are the recordings of their births now to be found in the State Archives of the Dnipropetrovsk (i.e. Ekaterinoslav) Region:
So the complete complement of Шейнфинкель/Schönfinkel/Scheinfinkel children was (including birth dates both in their original Julian calendar form, and in their modern Gregorian form, and graduation dates in modern form):
And having failed to find out more about Moses Schönfinkel directly, plan B was to investigate his siblings.
I had already found out a fair amount about Nathan. He was married, and lived at least well into the 1960s, eventually returning to Switzerland. And most likely he had no children.
Debora we could find no trace of after her high-school graduation (we looked for marriage records, but they’re not readily available for what we assume is the relevant time period).
By the way, rather surprisingly, we found nice (alphabetically ordered), printed class lists from the high-school graduations (apparently these were distributed to higher-education institutions across the Russian Empire so anyone could verify “graduation status”, and were deposited in the archives of the education district, where they’ve now remained for more than a century):
(We can’t find any particular trace of the 36 other students in the same group as Moses.)
OK, so what about the “newly found siblings”, Israel and Gregory? Well, here we had a bit more luck.
For Israel we found these somewhat strange traces:
They are World War I hospital admission records from January and December 1916. Apparently Israel was a private in the 2nd Finnish Regiment (which—despite its name—by then didn’t have any Finns in it, and in 1916 was part of the Russian 7th Army pushing west in southern Ukraine in the effort to retake Galicia). And the documents we have show that twice he ended up in a hospital in Pavlohrad (only about 40 miles from Ekaterinoslav, though in the opposite direction from where the 7th Army was) with some kind of (presumably not life-threatening) hernia-like problem.
But unfortunately, that’s it. No more trace of Israel.
OK, what about the “baby brother”, Gregory, 11 years younger than Moses? Well, he shows up in World War II records. We found four documents:
Document #4 contains something interesting: an address for Gregory in 1944—in Moscow. Remember that Moses went to Moscow in 1924. And one of my speculations was that this was the result of some family connection there. Well, at least 20 years later (and probably also much earlier, as we’ll see), his brother Gregory was in Moscow. So perhaps that’s why Moses went there in 1924.
OK, but what story do these World War II documents tell about Gregory? Document #1 tells us that on July 27, 1943, Gregory arrived at the military unit designated 15 зсп 44 зсбр (15 ZSP 44 ZSBR) at transit point (i.e. basically “military address”) 215 азсп 61А (215 AZSP 61A). It also tells us that he had the rank of private in the Red Army.
Sometime soon thereafter he was transferred to unit 206 ZSP. But unfortunately he didn’t last long in the field. Around October 1, 1943, he was wounded (later, we learn he has “one wound”), and—as document #2 tells us—he was one of 5 people picked up by hospital train #762 (at transit point 206 зсп ЗапФ). On November 26, 1943, document #3 records that he was discharged from the hospital train (specifically, the document explains that he’s not getting paid for the time he was on the hospital train). And, finally, document #4 records that on February 18, 1944—presumably after a period of assessment of his condition—he’s discharged from the military altogether, returning to an address in Moscow.
OK, so first some military points. When Gregory arrived in the army in July 1943 he was assigned (as a reserve or “replacement”) to the 15th Reserve Rifle Regiment (15 зсп) of the 44th Reserve Rifle Brigade (44 зсбр) in the 61st Army (61A)—presumably as part of reinforcements brought in after some heavy Soviet losses. Later he was transferred to the 206th Rifle Division in the 47th Army, which is where he was when he was wounded around October 1, 1943.
What was the general military situation then? In the summer of 1943 the major story was that the Soviets were trying to push the Germans back west, with the front pretty much along the Dnieper River in Ukraine—which, curiously enough, flows right through the middle of Ekaterinoslav. On October 4, 1943, here’s how the New York Times presented things:
But military history being what it is, there’s much more detailed information available. Here’s a modern map showing troop movements involving the 47th Army in late September 1943:
The Soviets managed to get more than 100,000 men across the Dnieper River, but there was intense fighting, and at the end of September the 206th Rifle Division (as part of the 47th Army) was probably involved in the later stages of the fight for the Bukrin Bridgehead. And this is probably where Gregory Schönfinkel was wounded.
After being wounded, he seems to have been taken to some kind of service area for the 206th Rifle Division (206 зсп ЗапФ), from which he was picked up by a hospital train (and, yes, it was actually a moving hospital, with lots of cars with red crosses painted on top).
But more significant in our quest for the story of Gregory Schönfinkel is other information in the military documents we have. They record that he is Jewish (as opposed to “Russian”, which is how basically all the other soldiers in these lists are described). Then they say that he has “higher education”. One says he is an “engineer”. Another is more specific, and says he’s an “engineer economist” (Инж. Эконом.). They also say that he is not a member of the Communist Party.
They say he is a widower, and that his wife’s name was Evdokiya Ivanovna (Евдокия Иван.). They also list his “mother”, giving her name as Мария Григ. (“Maria Grig.”, perhaps short for “Grigorievna”). And then they list an address: Москва С. Набер. д. 26 кв. 1ч6, which is presumably 26 Sofiyskaya Embankment, Apartment 16, Moscow.
Where is that address? Well, it turns out it’s in the very center of Moscow (“inside the Garden Ring”), with the front looking over the Moscow River directly at the Kremlin:
Here’s a current picture of the building
as well as one from perhaps 100 years earlier:
The building was built by a family of merchants named the Bakhrushins in 1900–1903 to provide free apartments for widows and orphans (apparently there were about 450 one-room 150-to-300-square-foot apartments). In the Russian Revolution, the building was taken over by the government, and set up to house the Ministry of Oil and Gas. But some “communal apartments” were left, and it’s presumably in one of those that Gregory Schönfinkel lived. (Today the building is the headquarters of the Russian state oil company Rosneft.)
OK, but let’s unpack this a bit further. “Communal apartments” basically means dormitory-style housing. A swank building, then, but apparently not-so-swank accommodation. Though, actually, in Soviet times dormitory-style housing was pretty typical in Moscow—so by the standards of the time this really was a swank setup.
But then there are a couple of mysteries. First, how come a highly educated engineering economist with a swank address was just a private in the army? (When the hospital train picked up Gregory, along with four other privates, one of the others was listed as a carpenter; the others were all listed as “с/хоз” or “сельское хозяйство”, basically meaning “farm laborer”, or what before Soviet times would have been called “peasant”).
Maybe the Russian army was so desperate for recruits after all their losses that—despite being 44 years old—Gregory was drafted. Maybe he volunteered (though then we have to explain why he didn’t do that earlier). But regardless of how he wound up in the army, maybe his status as a private had to do with the fact that he wasn’t a member of the Communist Party. At that time, a large fraction of the citydwelling “elite” were members of the Communist Party (and it wouldn’t have been a major problem that he was Jewish, though coming from a merchant family might have been a negative). But if he wasn’t in the “elite”, how come the swank address?
A first observation is that his wife’s first name Evdokiya was a popular Russian Orthodox name, at least before 1917 (and is apparently popular again now). So presumably Gregory had—not uncommonly in the Soviet era—married someone who wasn’t Jewish. But now let’s look at the “mother’s” name: “Мария Григ.” (“Maria Grig.”).
We know Gregory’s (and Moses’s) mother’s name was Maria/“Masha” Gertsovna Schönfinkel (née Lurie)—or Мария (“Маша”) Герцовна Шейнфинкель. And according to other information, she died in 1936. So—unless someone miswrote Gregory’s “mother’s” name—the patronymics (second names) don’t match. So what’s going on?
My guess is that the “mother” is actually a mother-in-law, and that it was her apartment. Perhaps it was her husband (rather than her) who had worked at the Ministry of Oil and Gas, and that’s how she ended up with the apartment. Maybe Gregory worked there too.
OK, so what was an “engineer economist” (Инженер Экономист)? In the planningoriented Soviet system, it was something quite important: basically a person who planned and organized production and labor in some particular industry.
How did one become an “engineer economist”? At least a bit later, it was a 5year “master’s level” course of study, including courses in engineering, mathematics, bookkeeping, finance, economics of a particular sector, and “political economy” (à la Marx). And it was a very Soviet kind of thing. So the fact that that was what Gregory did presumably means that he was educated in the Soviet Union.
He must have finished high school right when the Tsar was being overthrown. Probably too late to be involved in World War I. But perhaps he got swept up in the Russian Civil War. Or maybe he was in college then, getting an early Soviet education. But, in any case, as an engineer economist it’s pretty surprising that in World War II he didn’t get assigned to something technical in the army, and was just a simple private in the infantry.
From the data we have, it’s not clear what was going on. But maybe it had something to do with Moses.
It’s claimed that Moses died in 1940 or 1942 and was “living in a communal apartment”. Well, maybe that communal apartment was actually Gregory’s (or at least his motherinlaw’s) apartment. And here’s a perhaps fanciful theory: Gregory joined the army out of some kind of despondency. His wife died. His older brother died. And in February 1942 any of his family members still in Ekaterinoslav probably died in the massacre of the Jewish population there (at least if they hadn’t evacuated as a result of earlier bombing). Gregory hadn’t joined the army earlier in the war, notably during the Battle of Moscow. And by 1943 he was 44 years old. So perhaps in some despondency—or anger—he volunteered for the army.
We don’t know. And at this point the trail seems to go cold. It doesn’t appear that Gregory had any children, and we haven’t been able to find out anything more about him.
But I consider it progress that we’ve managed to identify that Moses’s younger brother lived in Moscow, potentially providing a plausible reason that Moses might have gone to Moscow.
Actually, there may have been other “family reasons”. There seems to have been quite a lot of back-and-forth in the Jewish population between Moscow and Ekaterinoslav. And Moses’s mother came from the Lurie family, which was prominent not only in Ekaterinoslav, but also in Moscow. And it turns out that the Lurie family has done a fair amount of genealogy research. So we were able, for example, to reach a first cousin once removed of Moses’s (i.e. someone whose parent shared a grandparent with Moses, or about 1/16 of the genetics). But so far nobody has known anything about what happened to Moses, and nobody has said “Oh, and by the way, we have a suitcase full of strange papers” or anything.
I haven’t given up. And I’m hoping that we’ll still be able to find out more. But this is where we’ve got so far.
In addition to pursuing the question of the fate of Moses Schönfinkel, I’ve made one other potential connection. Partly in compiling a bibliography of combinators, I discovered a whole collection of literature about “combinatory categorial grammars” and “combinatory linguistics”.
What are these? These days, the most common way to parse an English sentence like “I am trying to track down a piece of history” is a hierarchical tree structure—analogous to the way a context-free computer language would be parsed:
But there is an alternative—and, as it turns out, significantly older—approach: to use a so-called dependency grammar in which verbs act like functions, “depending” on a collection of arguments:
In something like Wolfram Language, the arguments in a function would appear in some definite order and structure, say as f[x, y, z]. But in a natural language like English, everything is just given in sequence, and a function somehow has to have a way to figure out what to grab. And the idea is that this process might work like how combinators written out in sequence “grab” certain elements to act on.
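This sequential “grabbing” is essentially currying—sometimes called “Schönfinkeling”, since Moses Schönfinkel introduced it—in which a multi-argument function becomes a chain of one-argument functions, each consuming the next item in the sequence. A minimal Python sketch (the `clause` function and the sentence fragments are just illustrative):

```python
def curry3(f):
    """Turn f(x, y, z) into a chain of one-argument functions f(x)(y)(z)."""
    return lambda x: lambda y: lambda z: f(x, y, z)

def clause(verb, subject, obj):
    """A verb acting as a function of its dependents, as in a dependency grammar."""
    return (verb, subject, obj)

# Applied one word at a time, like words arriving in sequence in a sentence:
grab = curry3(clause)("track down")
print(grab("I")("a piece of history"))  # ('track down', 'I', 'a piece of history')
```

The point of the curried form is that nothing has to arrive all at once: each application just grabs whatever comes next, which is exactly the behavior combinators formalize.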
This idea seems to have a fairly tortuous history, mixed up with attempts and confusions about connecting the syntax (i.e. grammatical structure) of human languages to their semantics (i.e. meaning). The core issue has been that it’s perfectly possible to have a syntactically correct sentence (“The flying chair ate a happy semicolon”) that just doesn’t seem to have any “real-world” meaning. How should one think about this?
I think the concept of computational language that I’ve spent so many years developing actually makes it fairly clear. If one can express something in computational language, there’s a way to compute from it. Maybe the resulting computation will align with what happens in the real world; maybe it won’t. But there’s some “meaningful place to go” with what one has. And the point is that a computational language has a well-defined “inner computational representation” for things. The particular syntax (e.g. sequence of characters) that one might use for input or output in the computational language is just something superficial.
But without the idea of computational language people have struggled to formalize semantics, tending to try to hang what they’re doing on the detailed structure and syntax of human languages. But then what should one do about syntactically correct structures that don’t “mean anything”? An example of what I consider to be a rather bizarre solution—embodied in the so-called Montague grammars of the 1970s—is essentially to turn pieces of certain sentences into functions with nothing “concrete” in them, just “slots” where things could go (“x_ ate y_”)—and where one can “hold off meaninglessness” by studying things without explicitly filling in the slots.
In the original formulation, the “functions” were thought about in terms of lambdas. But combinatory categorial grammars view them instead in terms of combinators, in which in the course of a sentence words in a sense “apply to each other”. And even without the notion of slots one can do “combinatory linguistics” and imagine finding the structure of sentences by taking words to “apply themselves” “across the sentence” like combinators.
If well designed (as I hope the Wolfram Language is!), a computational language has a certain clean, formal structure. But human natural language is full of messiness, which has to be untangled by natural language understanding—as we’ve done for so many years for Wolfram|Alpha, always ultimately translating to our computational language, the Wolfram Language.
But without the notion of an underlying computational language, people tend to feel the need to search endlessly for formal structure in human natural language. And, yes, some exists. But—as we see all the time in actually doing practical natural language understanding for Wolfram|Alpha—there’s a giant tail that seems to utterly explode any all-encompassing formal theory.
Are there at least fragments that have formal structure? There are things like logic (“and”, “or”, etc.) that get used in human language, and which are fairly straightforwardly formalizable. But maybe there are more “functional” structures too, perhaps having to do with the operation of verbs. And in combinatory linguistics, there’ve been attempts to find these—even for example directly using things like Schönfinkel’s S combinator. (Given S f g x → f[x][g[x]] one can start imagining—with a slight stretch—that “eat peel orange” operates like the S combinator in meaning “eat[orange][peel[orange]]”.)
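As a concrete (if whimsical) illustration, here is the S combinator written out in Python, with stand-in word functions of my own invention for the “eat peel orange” reading:

```python
# Schönfinkel's S combinator, S f g x -> f[x][g[x]],
# written in curried Python style.
S = lambda f: lambda g: lambda x: f(x)(g(x))

# Illustrative stand-ins for the words (not a real semantic theory!):
peel = lambda fruit: f"peeled {fruit}"
eat = lambda whole: lambda part: f"eat the {part} of the {whole}"

# "eat peel orange" read S-combinator style:
#   S eat peel orange  ->  eat[orange][peel[orange]]
assert S(eat)(peel)("orange") == "eat the peeled orange of the orange"
```

The characteristic trick is that the single argument “orange” gets duplicated and distributed to both word-functions, which is exactly the “grabbing” behavior the combinatory-linguistics reading relies on.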
Much of the work on this has been done in the last few decades. But it turns out that its history stretches back much further, and might conceivably actually intersect with Moses Schönfinkel himself.
The key potential link is Kazimierz Ajdukiewicz (1890–1963). Ajdukiewicz was a Polish logician/philosopher who long tried to develop a “mathematicized theory” of how meaning emerges, among other things, from natural language, and who basically laid the early groundwork for what’s now combinatory linguistics.
Kazimierz Ajdukiewicz was born two years after Moses Schönfinkel, and studied philosophy, mathematics and physics at the University of Lviv (now in Ukraine), finishing his PhD in 1912 with a thesis on Kant’s philosophy of space. But what’s most interesting for our purposes is that in 1913 Ajdukiewicz went to Göttingen to study with David Hilbert and Edmund Husserl.
In 1914 Ajdukiewicz published one paper on “Hilbert’s New Axiom System for Arithmetic”, and another on contradiction in the light of Bertrand Russell’s work. And then in 1915 Ajdukiewicz was drafted into the Austrian army, where he remained until 1920, after which he went to work at the University of Warsaw.
But in 1914 there’s an interesting potential intersection. Because June of that year is when Moses Schönfinkel arrived in Göttingen to work with Hilbert. At the time, Hilbert was mostly lecturing about physics (though he also did some lectures about “principles of mathematics”). And it seems inconceivable that—given their similar interests in the structural foundations of mathematics—they wouldn’t have interacted.
Of course, we don’t know how close to combinators Schönfinkel was in 1914; after all, his lecture introducing them was six years later. But it’s interesting to at least imagine some interaction with Ajdukiewicz. Ajdukiewicz’s own work was at first most concerned with things like the relationship of mathematical formalism and meaning. (Do mathematical constructs “actually exist”, given that their axioms can be changed, etc.?) But by the beginning of the 1930s he was solidly concerned with natural language, and was soon writing papers with titles like “Syntactic Connexion” that gave formal symbolic descriptions of language (complete with “functors”, etc.) quite reminiscent of Schönfinkel’s work.
So far as I can tell Ajdukiewicz never explicitly mentioned Schönfinkel in his publications. But it seems like too much of a coincidence for the idea of something like combinators to have arisen completely independently in two people who presumably knew each other—and never to have independently arisen anywhere else.
Thanks to Vitaliy Kaurov for finding additional documents (and to the State Archives of the Dnipropetrovsk Region and Elena Zavoiskaia for providing various documents), Oleg and Anna Marichev for interpreting documents, and Jason Cawley for information about military history. Thanks also to Oleg Kiselyov for some additional suggestions on the original version of this piece.
For years I’ve batted it away. I’ll be talking about my discoveries in the computational universe, and computational irreducibility, and my Principle of Computational Equivalence, and people will ask “So what does this mean about consciousness?” And I’ll say “that’s a slippery topic”. And I’ll start talking about the sequence: life, intelligence, consciousness.
I’ll ask “What is the abstract definition of life?” We know about the case of life on Earth, with all its RNA and proteins and other implementation details. But how do we generalize? What is life generally? And I’ll argue that it’s really just computational sophistication, which the Principle of Computational Equivalence says happens all over the place. Then I’ll talk about intelligence. And I’ll argue it’s the same kind of thing. We know the case of human intelligence. But if we generalize, it’s just computational sophistication—and it’s ubiquitous. And so it’s perfectly reasonable to say that “the weather has a mind of its own”; it just happens to be a mind whose details and “purposes” aren’t aligned with our existing human experience.
I’ve always implicitly assumed that consciousness is just a continuation of the same story: something that, if thought about in enough generality, is just a feature of computational sophistication, and therefore quite ubiquitous. But from our Physics Project—and particularly from thinking about its implications for the foundations of quantum mechanics—I’ve begun to realize that at its core consciousness is actually something rather different. Yes, its implementation involves computational sophistication. But its essence is not so much about what can happen as about having ways to integrate what’s happening to make it somehow coherent and to allow what we might see as “definite thoughts” to be formed about it.
And rather than consciousness being somehow beyond “generalized intelligence” or general computational sophistication, I now instead see it as a kind of “step down”—as something associated with simplified descriptions of the universe based on using only bounded amounts of computation. At the outset, it’s not obvious that a notion of consciousness defined in this way could consistently exist in our universe. And indeed the possibility of it seems to be related to deep features of the formal system that underlies physics.
In the end, there’s a lot going on in the universe that’s in a sense “beyond consciousness”. But the core notion of consciousness is crucial to our whole way of seeing and describing the universe—and at a very fundamental level it’s what makes the universe seem to us to have the kinds of laws and behavior it does.
Consciousness is a topic that’s been discussed and debated for centuries. But the surprise to me is that with what we’ve learned from exploring the computational universe and especially from our recent Physics Project it seems there may be new perspectives to be had, which most significantly seem to have the potential to connect questions about consciousness to concrete, formal scientific ideas.
Inevitably the discussion of consciousness—and especially its connection to our new foundations of physics—is quite conceptually complex, and all I’ll try to do here is sketch some preliminary ideas. No doubt quite a bit of what I say can be connected to existing philosophical and other thinking, but so far I’ve only had a chance to explore the ideas themselves, and haven’t yet tried to study their historical context.
The universe in our models is full of sophisticated computation, all the way down. At the lowest level it’s just a giant collection of “atoms of space”, whose relationships are continually being updated according to a computational rule. And inevitably much of that process is computationally irreducible, in the sense that there’s no general way to “figure out what’s going to happen” except, in effect, by just running each step.
But given that, how come the universe doesn’t just seem to us arbitrarily complex and unpredictable? How come there’s order and regularity that we can perceive in it? There’s still plenty of computational irreducibility. But somehow there are also pockets of reducibility that we manage to leverage to form a simpler description of the world that we can successfully and coherently make use of. And a fundamental discovery of our Physics Project is that the two great pillars of twentieth-century physics—general relativity and quantum mechanics—correspond precisely to two such pockets of reducibility.
There’s an immediate analog—that actually ends up being an example of the same fundamental computational phenomenon. Consider a gas, like air. Ultimately the gas consists of lots of molecules bouncing around in a complicated way that’s full of computational irreducibility. But it’s a central fact of statistical mechanics that if we look at the gas on a large scale, we can get a useful description of what it does just in terms of properties like temperature and pressure. And in effect this reflects a pocket of computational reducibility that allows us to operate without engaging with all the computational irreducibility underneath.
How should we think about this? An idea that will generalize is that as “observers” of the gas, we’re conflating lots of different microscopic configurations of molecules, and just paying attention to overall aggregate properties. In the language of statistical mechanics, it’s effectively a story of “coarse graining”. But within our computational approach, there’s now a clear, computational way to characterize this. At the level of individual molecules there’s an irreducible computation happening. And to “understand what’s going on” the observer is doing a computation. But the crucial point is that if there’s a certain boundedness to that computation then this has immediate consequences for the effective behavior the observer will perceive. And in the case of something like a gas, it turns out to directly imply the Second Law of Thermodynamics.
In the past there’s been a certain amount of mystery around the origin and validity of the Second Law. But now we can see it as a consequence of the interplay between underlying computational irreducibility and the computational boundedness of observers. If the observer kept track of all the computationally irreducible motions of individual molecules, they wouldn’t see Second Law behavior. The Second Law depends on a pocket of computational reducibility that in effect emerges only when there’s a constraint on the observer that amounts to the requirement that the observer has a “coherent view” of what’s going on.
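The point can be made concrete with a toy simulation (my own illustration, not one of the models discussed here): the microscopic dynamics below is a mass of irreducible-looking detail, but a coarse-grained observable relaxes to a simple equilibrium value.

```python
import random

# Toy coarse graining: N particles random-walk among CELLS cells with
# reflecting walls, starting all in the leftmost cell. The microstate
# (every individual position) stays complicated, but the coarse observable
# "fraction of particles in the left half" relaxes toward 1/2.
random.seed(0)
CELLS, N, STEPS = 20, 200, 200_000
positions = [0] * N

for _ in range(STEPS):
    i = random.randrange(N)  # pick a particle and move it one cell
    positions[i] = min(CELLS - 1, max(0, positions[i] + random.choice((-1, 1))))

left_fraction = sum(p < CELLS // 2 for p in positions) / N
# A bounded observer tracking only this aggregate sees simple, Second
# Law-style relaxation; tracking the full microstate would not.
assert 0.35 < left_fraction < 0.65
```

An observer keeping track of all 200 individual walks would see no such simplicity; the reducibility lives only in the aggregate description.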
So what about physical space? The traditional view had been that space was something that could to a large extent just be described as a coherent mathematical object. But in our models of physics, space is actually made of an immense number of discrete elements whose pattern of interconnections evolves in a complex and computationally irreducible way. But it’s much like with the gas molecules. If an observer is going to form a coherent view of what’s going on, and if they have bounded computational capabilities, then this puts definite constraints on what behavior they will perceive. And it turns out that those constraints yield exactly relativity.
In other words, for the “atoms of space”, relativity is the result of the interplay between underlying computational irreducibility and the requirement that the observer has a coherent view of what’s going on.
It may be helpful to fill in a little more of the technical details. Our underlying theory basically says that each elementary element of space follows computational rules that will yield computationally irreducible behavior. But if that was all there was to it, the universe would seem like a completely incoherent place, with every part of it doing irreducibly unpredictable things.
But imagine there’s an observer who perceives coherence in the universe. And who, for example, views there as being a definite coherent notion of “space”. What can we say about such an observer? The first thing is that since our model is supposed to describe everything in the universe, it must in particular include our observer. The observer must be an embedded part of the system—made up of the same atoms of space, and following the same rules, as everything else.
And there’s an immediate consequence to this. From “inside” the system there are only certain things about the system that the observer can perceive. Let’s say, for example, that in the whole universe there’s only one point at which anything is updated at any given time, but that that “update point” zips around the universe (in “Turing machine style”), sometimes updating a piece of the observer, and sometimes updating something they were observing. If one traces through scenarios like this, one realizes that from “inside the system” the only thing the observer can ever perceive is causal relationships between events.
They can’t tell “specifically when” any given event happens; all they can tell is what event has to happen before what other one, or in other words, what the causal relationships between events are. And this is the beginning of what makes relativity inevitable in our models.
But there are two other pieces. If the observer is going to have a coherent description of “space” they can’t in effect be tracking each atom separately; they’ll have to fit them into some overall framework, say by assigning each of them particular “coordinates”, or, in the language of relativity, defining a “reference frame” that conflates many different points in space. But if the observer is computationally bounded, then this puts constraints on the structure of the reference frame: it can’t for example be so wild that it separately traces the computationally irreducible behavior of individual atoms of space.
But let’s say an observer has successfully picked some reference frame. What’s to say that as the universe evolves it’s still possible to consistently maintain that reference frame? Well, this relies on a fundamental property that we believe either directly or effectively defines the operation of our universe: what we call “causal invariance”. The underlying rules just describe possible ways that the connections between atoms of space can be updated. But causal invariance implies that whatever actual sequence of updatings is used, there must always be the same graph of causal relationships.
And it’s this that gives observers the ability to pick different reference frames, and still have the same consistent and coherent perception of the behavior of the universe. And in the end, we have a definite result: that if there’s underlying computational irreducibility—plus causal invariance—then any observer who forms their perception of the universe in a computationally bounded way must inevitably perceive the universe to follow the laws of general relativity.
But—much like with the Second Law—this conclusion relies on having an observer who forms a coherent perception of the universe. If the observer could separately track every atom of space they wouldn’t “see general relativity”; that only emerges for an observer who forms a coherent perception of the universe.
OK, so what about quantum mechanics? How does that relate to observers? The story is actually surprisingly similar to both the Second Law and general relativity: quantum mechanics is again something that emerges as a result of trying to form a coherent perception of the universe.
In ordinary classical physics one considers everything that happens in the universe to happen in a definite way, in effect defining a single thread of history. But the essence of quantum mechanics is that actually there are many threads of history that are followed. And an important feature of our models is that this is inevitable.
The underlying rules define how local patterns of connections between atoms of space should be updated. But in the hypergraph of connections that represents the universe there will in general be many different places where the rules can be applied. And if we trace all the possibilities we get a multiway graph that includes many possible threads of history, sometimes branching and sometimes merging.
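The branching-and-merging structure is easy to see in the simpler setting of string rewriting—a toy stand-in for hypergraph updating (this particular rule and code are just my illustration):

```python
# Multiway evolution for the string rule A -> AB: apply the rule at every
# possible position, and collect the distinct states reached at each step.

def successors(state, lhs="A", rhs="AB"):
    """All states reachable by one application of lhs -> rhs."""
    return {state[:i] + rhs + state[i + len(lhs):]
            for i in range(len(state)) if state.startswith(lhs, i)}

def multiway_layers(start, steps):
    """The set of states in the multiway graph at each step."""
    layers = [{start}]
    for _ in range(steps):
        layers.append(set().union(*(successors(s) for s in layers[-1])))
    return layers

layers = multiway_layers("AA", 2)
# Step 1 branches into two states. Step 2 has four update paths but only
# three distinct states: 'ABAB' is reached twice, so two branches merge.
assert layers[1] == {"ABA", "AAB"}
assert layers[2] == {"ABBA", "ABAB", "AABB"}
```

Even this tiny system shows both phenomena at once: branching where the rule applies in several places, and merging where different update orders reconverge on the same state.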
So how will an observer perceive all this? The crucial point is that the observer is themselves part of this multiway system. So in other words, if the universe is branching, so is the observer. And in essence the question becomes how a “branching brain” will perceive a branching universe.
It’s fairly easy to imagine how an observer who is “spatially large” compared to individual molecules in a gas—or atoms of space—could conflate their view of these elements so as to perceive only some aggregate property. Well, it seems like very much the same kind of thing is going on with observers in quantum mechanics. It’s just that instead of being extended in physical space, they’re extended in what we call branchial space.
Consider a multiway graph representing possible histories for a system. Now imagine slicing through this graph at a particular level that in effect corresponds to a particular time. In that slice there will be a certain set of nodes of the multiway graph, representing possible states of the system. And the structure of the multiway graph then defines relationships between these states (say through common ancestry). And in a large-scale limit we can say that the states are laid out in branchial space.
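In the toy string-rewriting setting, one can sketch such a slice concretely: take the states at a given step, and link two of them when they share an immediate common ancestor (a simplified stand-in for the common-ancestry relation; the rule and code are my own illustration):

```python
from itertools import combinations

# One step of a toy multiway system with rule A -> AB, starting from the
# step-1 slice {'ABA', 'AAB'} (itself reached from 'AA'). States in the
# next slice are "branchially" linked when they share an immediate ancestor.

def successors(state):
    return {state[:i] + "AB" + state[i + 1:]
            for i, c in enumerate(state) if c == "A"}

parents = {}
for state in {"ABA", "AAB"}:
    for child in successors(state):
        parents.setdefault(child, set()).add(state)

slice_states = set(parents)  # the slice at the next step
branchial_links = {frozenset(pair)
                   for pair in combinations(slice_states, 2)
                   if parents[pair[0]] & parents[pair[1]]}

# 'ABAB' descends from both 'ABA' and 'AAB', so it is linked to each of
# the other two states, laying the slice out in a line in "branchial space".
assert branchial_links == {frozenset({"ABBA", "ABAB"}),
                           frozenset({"ABAB", "AABB"})}
```

The resulting link structure is what, in a large-scale limit, gives branchial space its geometry.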
In the language of quantum mechanics, the geometry of branchial space in effect defines a map of entanglements between quantum states, and coordinates in branchial space are like phases of quantum amplitudes. In the evolution of a quantum system, one might start from a certain bundle of quantum states, then follow their threads of history, looking at where in branchial space they go.
But what would a quantum observer perceive about this? Even if they didn’t start that way, over time a quantum observer will inevitably become spread out in branchial space. And so they’ll always end up sampling a whole region in branchial space, or a whole bundle of “threads of history” in the multiway graph.
What will they make of them? If they considered each of them separately no coherent picture would emerge, not least since the underlying evolution of individual threads of history can be expected to be computationally irreducible. But what if the observer just defines their way of viewing things to be one that systematically organizes different threads of history, say by conflating “computationally nearby” ones? It’s similar to setting up a reference frame in relativity, except that now the coherent representation that this “quantum frame” defines is of branchial space rather than physical space.
But what will this coherent representation be like? Well, it seems to be exactly quantum mechanics as it was developed over the past century. In other words, just like general relativity emerges as an aggregate description of physical space formed by a computationally bounded observer, so quantum mechanics emerges as an aggregate description of branchial space.
Does the observer “create” the quantum mechanics? In some sense, yes. Just as in the spacetime case, the multiway graph has all sorts of computationally irreducible things going on. But if there’s an observer with a coherent description of what’s going on, then their description must follow the laws of quantum mechanics. Of course, there are lots of other things going on too—but they don’t fit into this coherent description.
OK, but let’s say that we have an observer who’s set up a quantum frame that conflates different threads of history to get a coherent description of what’s going on. How will their description correlate with what another observer—with a different quantum frame—would perceive? In the traditional formalism of quantum mechanics it’s always been difficult to explain why different observers—making different measurements—still fundamentally perceive the universe to be working the same.
In our model, there’s a clear answer: just like in the spacetime case, if the underlying rules show causal invariance, then regardless of the frame one uses, the basic perceived behavior will always be the same. Or, in other words, causal invariance guarantees the consistency of the behavior deduced by different observers.
There are many technical details to this. The traditional formalism of quantum mechanics has two separate parts: first, the time evolution of quantum amplitudes, and second, the process of measurement. In our models, there’s a very beautiful correspondence between the phenomenon of motion in space and the evolution of quantum amplitudes. In essence, both are associated with the deflection of (geodesic) paths by the presence of energy-momentum. But in the case of motion this deflection (that we identify as the effect of gravity) happens in physical space, while in the quantum case the deflection (that we identify as the phase change specified by the path integral) happens in branchial space. (In other words, the Feynman path integral is basically just the direct analog in branchial space of the Einstein equations in physical space.)
OK, so what about quantum measurement? Doing a quantum measurement involves somehow taking many threads of history (corresponding to a superposition of many quantum states) and effectively reducing them to a single thread that coherently represents the “outcome”. A quantum frame defines a way to do this—in effect specifying the pattern of threads of history that should be conflated. In and of itself, a quantum frame—like a relativistic reference frame—isn’t a physical thing; it just defines a way of describing what’s going on.
But as a way of probing possible coherent representations that an observer can form, one can consider what happens if one formally conflates things according to a particular quantum frame. In an analogy where the multiway graph defines inferences between propositions in a formal system, conflating things is like “performing certain completions”. And each completion is then like an elementary step in the act of measurement. And by looking at the effect of all necessary completions one gets the “Completion Interpretation of Quantum Mechanics” suggested by Jonathan Gorard.
Assuming that the underlying rule for the universe ultimately shows causal invariance, doing these completions is never fundamentally necessary, because different threads of history will always eventually give the same results for what can be perceived within the system. But if we want to get a “possible snapshot” of what the system is doing, we can pick a quantum frame and formally do the completions it defines.
Doing this doesn’t actually “change the system” in a way that we would “see from outside”. It’s only that we’re in effect “doing a formal projection” to see how things would be perceived by an observer who’s picked a particular quantum frame. And if the observer is going to have a coherent perception of what’s going on, they in effect have to have picked some specific quantum frame. But then from the “point of view of the observer” the completions associated with that frame in some sense “seem real” because they’re the way the observer is accessing what’s going on.
Or, in other words, the way a computationally bounded “branching brain” can have a coherent perception of a “branching universe” is by looking at things in terms of quantum frames and completions, and effectively picking off a computationally reducible slice of the whole computationally irreducible evolution of the universe—where it then turns out that the slice must necessarily follow the laws of quantum mechanics.
So, once again, for a computationally bounded observer to get a coherent perception of the universe—with all its underlying computational irreducibility—there’s a strong constraint on what that perception can be. And what we’ve discovered is that it turns out to basically have to follow the two great core theories of twentieth-century physics: general relativity and quantum mechanics.
It’s not immediately obvious that there has to be any way to get a coherent perception of the universe. But what we now know is that if there is, it essentially forces specific major results about physics. And, of course, if there wasn’t any way to get a coherent perception of the universe there wouldn’t really be systematic overall laws, or, for that matter, anything like physics, or science as we know it.
What’s special about the way we humans experience the world? At some level, the very fact that we even have a notion of “experiencing” it at all is special. The world is doing what it does, with all sorts of computational irreducibility. But somehow even with the computationally bounded resources of our brains (or minds) we’re able to form some kind of coherent model of what’s going on, so that, in a sense, we’re able to meaningfully “form coherent thoughts” about the universe. And just as we can form coherent thoughts about the universe, so also we can form coherent thoughts about that small part of the universe that corresponds to our brains—or to the computations that represent the operation of our minds.
But what does it mean to say that we “form coherent thoughts”? There’s a general notion of computation, which the Principle of Computational Equivalence tells us is quite ubiquitous. But it seems that what it means to “form coherent thoughts” is that computations are being “concentrated down” to the point where a coherent stream of “definite thoughts” can be identified in them.
At the outset it’s certainly not obvious that our brains—with their billions of neurons operating in parallel—should achieve anything like this. But in fact it seems that our brains have a quite specific neural architecture—presumably produced by biological evolution—that in effect attempts to “integrate and sequentialize” everything. In our cortex we bring together the sensory data we collect, then process it with a definite thread of attention. And indeed in medical settings, observed deficits in this kind of integration are what are normally used to identify reduced levels of consciousness. There may still be neurons firing, but without integration and sequentialization there doesn’t really seem to be what we normally consider consciousness.
These are biological details. But they seem to point to a fundamental feature of consciousness. Consciousness is not about the general computation that brains—or, for that matter, many other things—can do. It’s about the particular feature of our brains that causes us to have a coherent thread of experience.
But what we have now realized is that the notion of having a coherent thread of experience has deep consequences that far transcend the details of brains or biology. Because in particular what we’ve seen is that it defines the laws of physics, or at least what we consider the laws of physics to be.
Consciousness—like intelligence—is something of which we only have a clear sense in the single case of humans. But just as we’ve seen that the notion of intelligence can be generalized to the notion of arbitrary sophisticated computation, so now it seems that the notion of consciousness can be generalized to the notion of forming a coherent thread of representation for computations.
Operationally, there’s potentially a rather straightforward way to think about this, though it depends on our recent understanding of the concept of time. In the past, time in fundamental physics was usually viewed as being another dimension, much like space. But in our models of fundamental physics, time is something quite different from space. Space corresponds to the hypergraph of connections between the elements that we can consider as “atoms of space”. But time is instead associated with the inexorable and irreducible computational process of repeatedly updating these connections in all possible ways.
There are definite causal relationships between these updating events (ultimately defined by the multiway causal graph), but one can think of many of the events as happening “in parallel” in different parts of space or on different threads of history. But this kind of parallelism is in a sense antithetical to the concept of a coherent thread of experience.
And as we’ve discussed above, the formalism of physics—whether reference frames in relativity or quantum mechanics—is specifically set up to conflate things to the point where there is a single thread of evolution in time.
So one way to think about this is that we’re setting things up so we only have to do sequential computation, like a Turing machine. We don’t have multiple elements getting updated in parallel like in a cellular automaton, and we don’t have multiple threads of history like in a multiway (or nondeterministic) Turing machine.
The operation of the universe may be fundamentally parallel, but our “parsing” and “experience” of it is somehow sequential. As we’ve discussed above, it’s not obvious that such a “sequentialization” would be consistent. But if it’s done with frames and so on, the interplay between causal invariance and underlying computational irreducibility ensures that it will be—and that the behavior of the universe that we’ll perceive will follow the core features of twentieth-century physics, namely general relativity and quantum mechanics.
But do we really “sequentialize” everything? Experience with artificial neural networks seems to give us a fairly good sense of the basic operation of brains. And, yes, something like initial processing of visual scenes is definitely handled in parallel. But the closer we get to things we might realistically describe as “thoughts” the more sequential things seem to get. And a notable feature is that what seems to be our richest way to communicate thoughts, namely language, is decidedly sequential.
When people talk about consciousness, something often mentioned is “selfawareness” or the ability to “think about one’s own processes of thinking”. Without the conceptual framework of computation, this might seem quite mysterious. But the idea of universal computation instead makes it seem almost inevitable. The whole point of a universal computer is that it can be made to emulate any computational system—even itself. And that is why, for example, we can write the evaluator for Wolfram Language in Wolfram Language itself.
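The same point can be made in miniature in any universal language. As a toy illustration (here in Python rather than Wolfram Language), one can write an interpreter and then have it run inside the very kind of interpreter it implements:

```python
# A universal system can emulate any computational system -- including itself.
# The string below is Python source code that defines a tiny "interpreter"
# (just a wrapper around exec) and uses it to run a program. Python is then
# used to run that interpreter: an evaluator evaluated by an evaluator.
interpreter_source = '''
def run(program):
    exec(program)

run("print(6 * 7)")
'''

exec(interpreter_source)  # an interpreter, running inside an interpreter; prints 42
```

As the text notes, each extra layer of emulation costs execution time, but nothing prevents the layering in principle.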
The Principle of Computational Equivalence implies that universal computation is ubiquitous, and that brains and minds, as well as the universe at large, all have it. Yes, the emulated version of something will usually take more time to execute than the original. But the point is that the emulation is possible.
But consider a mind in effect thinking about itself. When a mind thinks about the world at large, its process of perception involves essentially making a model of what’s out there (and, as we’ve discussed, typically a sequentialized one). So when the mind thinks about itself, it will again make a model. Our experiences may start by making models of the “outside world”. But then we’ll recursively make models of the models we make, perhaps barely distinguishing between “raw material” that comes from “inside” and “outside”.
The connection between sequentialization and consciousness gives one a way to understand why there can be different consciousnesses, say associated with different people, that have different “experiences”. Essentially it’s just that one can pick different frames and so on that lead to different “sequentialized” accounts of what’s going on.
Why should they end up eventually being consistent, and eventually agreeing on an objective reality? Essentially for the same reason that relativity works, namely that causal invariance implies that whatever frame one picks, the causal graph that’s eventually traced out is always the same.
If it wasn’t for all the interactions continually going on in the universe, there’d be no reason for the experience of different consciousnesses to get aligned. But the interactions—with their underlying computational irreducibility and overall causal invariance—lead to the consistency that’s needed, and, as we’ve discussed, something else too: particular effective laws of physics, that turn out to be just the relativity and quantum mechanics we know.
The view of consciousness that we’ve discussed is in a sense focused on the primacy of time: it’s about reducing the “parallelism” associated with space—and branchial space—to allow the formation of a coherent thread of experience that in effect occurs sequentially in time.
And it’s undoubtedly no coincidence that we humans are in effect well placed in the universe to be able to do this. In large part this has to do with the physical sizes of things—and with the fact that human scales are intermediate between those at which the effects of either relativity or quantum mechanics become extreme.
Why can we “ignore space” to the point where we can just discuss things happening “wherever” at a sequence of moments in time? Basically it’s because the speed of light is large compared to human scales. In our everyday lives the important parts of our visual environment tend to be at most tens of meters away—so it takes light only tens of nanoseconds to reach us. Yet our brains process information on timescales measured in milliseconds. And this means that as far as our experience is concerned, we can just “combine together” things at different places in space, and consider a sequence of instantaneous states in time.
If we were the size of planets, though, this would no longer work. Because—assuming our brains still ran at the same speed—we’d inevitably end up with a fragmented visual experience that we wouldn’t be able to think about as a single thread about which we can say “this happened, then that happened”.
Even at standard human scale, we’d have somewhat the same experience if we used for example smell as our source of information about the world (as, say, dogs to a large extent do). Because in effect the “speed of smell” is quite slow compared to brain processing. And this would make it much less useful to identify our usual notion of “space” as a coherent concept. So instead we might invent some “other physics”, perhaps labeling things in terms of the paths of air currents that deliver smells to us, then inventing some elaborate gauge-field-like construct to talk about the relations between different paths.
In thinking about our “place in the universe” there’s also another important effect: our brains are small and slow enough that they’re not limited by the speed of light, which is why it’s possible for them to “form coherent thoughts” in the first place. If our brains were the size of planets, it would necessarily take far longer than milliseconds to “come to equilibrium”, so if we insisted on operating on those timescales there’d be no way—at least “from the outside”—to ensure a consistent thread of experience.
From “inside”, though, a planet-size brain might simply assume that it has a consistent thread of experience. And in doing this it would in a sense try to force a different physics on the universe. Would it work? Based on what we currently know, not without at least significantly changing the notions of space and time that we use.
By the way, the situation would be even more extreme if different parts of a brain were separated by permanent event horizons. And it seems as if the only way to maintain a consistent thread of experience in this case would be in effect to “freeze experience” before the event horizons formed.
What if we and our brains were much smaller than they actually are? As it is, our brains may contain perhaps 10^{300} atoms of space. But what if they contained, say, only a few hundred? Probably it would be hard to avoid computational irreducibility—and we’d never even be able to imagine that there were overall laws, or generally predictable features of the universe, and we’d never be able to build up the kind of coherent experience needed for our view of consciousness.
What about our extent in branchial space? In effect, our perception that “definite things happen even despite quantum mechanics” implies a conflation of the different threads of history that exist in the region of branchial space that we occupy. But how much effect does this have on the rest of the universe? It’s much like the story with the speed of light, except now what’s relevant is a new quantity that appears in our models: the maximum entanglement speed. And somehow this is large enough that over “everyday scales” in branchial space it’s adequate for us just to pick a quantum frame and treat it as something that can be considered to have a definite state at any given instant in time—so that we can indeed consistently maintain a “single thread of experience”.
OK, so now we have a sense of why with our particular human scale and characteristics our view of consciousness might be possible. But where else might consciousness be possible?
It’s a tricky and challenging thing to ask. To achieve our view of consciousness we need to be able to build up something that “viewed from the inside” represents a coherent thread of experience. But the issue is that we’re in effect “on the outside”. We know about our human thread of experience. And we know about the physics that effectively follows from it. And we can ask how we might experience that if, for example, our sensory systems were different. But to truly “get inside” we have to be able to imagine something very alien. Not only different sensory data and different “patterns of thinking”, but also different implied physics.
An obvious place to start in thinking about “other consciousnesses” is with animals and other organisms. But immediately we have the issue of communication. And it’s a fundamental one. Perhaps one day there’ll be ways for various animals to fluidly express themselves through something like humanrelatable videogames. But as of now we have surprisingly little idea how animals “think about things”, and, for example, what their experience of the world is.
We can guess that there will be many differences from ours. At the simplest level, there are organisms that use different sensory modalities to probe the world, whether those be smell, sound, electrical, thermal, pressure, or others. There are “hive mind” organisms, where whatever integrated experience of the world there may be is built up through slow communication between different individuals. There are organisms like plants, which are (quite literally) rooted to one place in space. There are also things like viruses where anything akin to an “integrated thread of experience” can presumably only emerge at the level of something like the progress of an epidemic.
Meanwhile, even in us, there are things like the immune system, which in effect have some kind of “thread of experience” though with rather different input and output than our brains. Even if it seems bizarre to attribute something like consciousness to the immune system, it is interesting to try to imagine what its “implied physics” would be.
One can go even further afield, and think about things like the complete tree of life on Earth, or, for that matter, the geological history of the Earth, or the weather. But how can these have anything like consciousness? The Principle of Computational Equivalence implies that all of them have just the same fundamental computational sophistication as our brains. But, as we have discussed, consciousness seems to require something else as well: a kind of coherent integration and sequentialization.
Take the weather as an example. Yes, there is lots of computational sophistication in the patterns of fluid flow in the atmosphere. But—like fundamental processes in physics—it seems to be happening all over the place, with nothing, it seems, to define anything like a coherent thread of experience.
Coming a little closer to home, we can consider software and AI systems. One might expect that to “achieve consciousness” one would have to go further than ever before and inject some special “humanlike spark”. But I suspect that the true story is rather different. If one wants the systems to make the richest use of what the computational universe has to offer, then they should behave a bit like fundamental physics (or nature in general), with all sorts of components and all sorts of computationally irreducible behavior.
But to have something like our view of consciousness requires taking a step down, and effectively forcing simpler behavior in which things are integrated to produce a “sequentialized” experience. And in the end, it may not be that different from picking out of the computational universe of possibilities just what can be expressed in a definite computational language of the kind the Wolfram Language provides.
Again we can ask about the “implied physics” of such a setup. But since the Wolfram Language is modeled on picking out the computational essence of human thinking it’s basically inevitable that its implied physics will be largely the same as the ordinary physics that is derived from ordinary human thinking.
One feature of having a fundamental model for physics is that it “reduces physics to mathematics”, in the sense that it provides a purely formal system that describes the universe. So this raises the question of whether one can think about consciousness in a formal system, like mathematics.
For example, imagine a formal analog of the universe constructed by applying axioms of mathematics. One would build up an elaborate network of theorems, that in effect populate “metamathematical space”. This setup leads to some fascinating analogies between physics and metamathematics. The notion of time effectively remains as always, but here represents the progressive proving of new mathematical theorems.
The analog of our spatial hypergraph is a structure that represents all theorems proved up to a given time. (And there’s also an analog of the multiway graph that yields quantum mechanics, but in which different paths now in effect represent different possible proofs of a theorem.) So what about things like reference frames?
Well, just as in physics, a reference frame is something associated with an observer. But here the observer is observing not physical space, but metamathematical space. And in a sense any given observer is “discovering mathematics in a particular order”. It could be that all the different “points in metamathematical space” (i.e. theorems) are behaving in completely incoherent—and computationally irreducible—ways. But just as in physics, it seems that there’s a certain computational reducibility: causal invariance implies that different reference frames will in a sense ultimately always “see the same mathematics”.
There’s an analog of the speed of light: the speed at which a new theorem can affect theorems that are progressively further away in metamathematical space. And relativistic invariance then becomes the statement that “there’s only one mathematics”—but it can just be explored in different ways.
How does this relate to “mathematical consciousness”? The whole idea of setting up reference frames in effect relies on the notion that one can “sequentialize metamathematical space”. And this in turn relies on a notion of “mathematical perception”. The situation is a bit like in physics. But now one has a formalized mathematician whose mind stretches over a certain region of metamathematical space.
In current formalized approaches to mathematics, a typical “humanscale mathematical theorem” might correspond to perhaps 10^{5} lowestlevel mathematical propositions. Meanwhile, the “mathematician” might “integrate into their experience” some small fraction of the metamathematical universe (which, for human mathematics, is currently perhaps 3 × 10^{6} theorems). And it’s this setup—which amounts to defining a “sequentialized mathematical consciousness”—that means it makes sense to do analysis using reference frames, etc.
Just as in physics it’s ultimately the characteristics of our consciousness that lead to the physics we attribute to the universe, so something similar seems to happen in mathematics.
Clearly we’ve now reached a quite high level of abstraction, so perhaps it’s worth mentioning one more wrinkle that involves an even higher level of abstraction.
We’ve talked about applying a rule to update the abstract structure that represents the universe. And we’ve discussed the fact that the rule can be applied at different places, and on different threads of history. But there’s another freedom: we don’t have to consider a specific rule; we can consider all possible rules.
The result is a rulial multiway graph of possible states of the universe. On different paths, different specific rules are followed. And if you slice across the graph you can get a map of states laid out in rulial space, with different positions corresponding to the outcomes of applying different rules to the universe.
An important fact is then that at the level of the rulial multiway graph there is always causal invariance. So this means that different “rulial reference frames” must always ultimately give equivalent results. Or, in other words, even if one attributes the evolution of the universe to different rules, there is always fundamental equivalence in the results.
In a sense, this can be viewed as a reflection of the Principle of Computational Equivalence and the fundamental idea that the universe is computational. In essence it is saying that since whatever rules one uses to “construct the universe” are almost inevitably computation universal, one can always use them to emulate any other rules.
How does this relate to consciousness? Well, one feature of different rulial reference frames is that they can lead to utterly different, and mutually incoherent, basic descriptions of the universe.
One of them could be our hypergraph-rewriting-based setup, with a representation of space that corresponds well with what emerged in twentieth-century physics. But another could be a Turing machine, in which one views the updating of the universe as being done by a single head zipping around to different places.
We’ve talked about some possible systems in which consciousness could occur. But one we haven’t yet mentioned—but which has often been considered—is “extraterrestrial intelligences”. Before our Physics Project one might reasonably have assumed that even if there was little else in common with such “alien intelligences”, at least they would be “experiencing the same physics”.
But it’s now clear that this absolutely does not need to be the case. An alien intelligence could perfectly well be experiencing the universe in a different rulial reference frame, utterly incoherent with the one we use.
Is there anything “sequentializable” in a different rulial reference frame? Presumably it’s possible to find at least something sequentializable in any rulial reference frame. But the question of whether the alien intelligence can be thought of as sampling it is a quite different one.
Does there need to be a “sequentializable consciousness” to imply “meaningful laws of physics”? Presumably meaningful laws have to somehow be associated with computational reducibility; certainly that would be true if they were going to be useful to a “computationally bounded” alien intelligence.
But it’s undoubtedly the case that “sequentializability” is not the only way to access computational reducibility. In a mathematical analogy, using sequentializability is a bit like using ordinary mathematical induction. But there are other axiomatic setups (like transfinite induction) that define other ways to do things like prove theorems.
Yes, humanlike consciousness might involve sequentializability. But if the general idea of consciousness is to have a way of “experiencing the universe” that accesses computational reducibility then there are no doubt other ways. It’s a kind of “secondorder alienness”: in addition to using a different rulial reference frame, it’s using a different scheme for accessing reducibility. And the implied physics of such a setup is likely to be very different from anything we currently think of as physics.
Could we ever expect to identify what some of these “alien possibilities” are? The Principle of Computational Equivalence at least implies that we can in principle expect to be able to set up any possible computational rule. But if we start doing experiments we can’t have an expectation that scientific induction will work, and it is potentially arbitrarily difficult to identify computational reducibility. Yes, we might recognize some form of prediction or regularity that we are familiar with. But to recognize an arbitrary form of computational reducibility in effect relies on some analog of a definition of consciousness, which is what we were looking for in the first place.
Consciousness is a difficult topic that has vexed philosophers and others for centuries. But with what we know now from our Physics Project it at least seems possible to cast it in a new light much more closely connected to the traditions of formal science. And although I haven’t done it here, I fully anticipate that it’ll be possible to take the ideas I’ve discussed and use them to create formal models that can answer questions about consciousness and capture its connections, particularly to physics.
It’s not clear how much realistic physics there will need to be in models to make them useful. Perhaps one will already be able to get worthwhile information about how branching brains perceive a branching universe by looking at some simple case of a multiway Turing machine. Perhaps some combinator system will already reveal something about how different versions of physics could be set up.
In a sense what’s important is that it seems we may have a realistic way to formalize issues about consciousness, and to turn questions about consciousness into what amount to concrete questions about mathematics, computation, logic or whatever that can be formally and rigorously explored.
But ultimately the way to tether the discussion—and to have it not for example devolve into debates about the meaning of words—is to connect it to actionable issues and applications.
As a first example, let’s discuss distributed computing. How should we think about computations that—like those in our model of physics—take place in parallel across many different elements? Well—except in very simple or structured cases—it’s hard, at least for us humans. And from what we’ve discussed about consciousness, perhaps we can now understand why.
The basic issue is that consciousness seems to be all about forming a definite “sequentialized” thread of experience of the world, which is directly at odds with the idea of parallelism.
So how can we proceed if we need to do distributed computing? Following what we believe about consciousness, I suspect a good approach will be to essentially mirror what we do in parsing the physical universe—and for example to pick reference frames in which to view and integrate the computation.
Distributed computing is difficult enough for us humans to “wrap our brains around”. Multiway or nondeterministic computing tends to be even harder. And once again I suspect this is because of the “limitations imposed by consciousness”. And that the way to handle it will be to use ideas that come from physics, and from the interaction of consciousness with quantum mechanics.
A few years ago at an AI ethics conference I raised the question of what would make us think AIs should have rights and responsibilities. “When they have consciousness!” said an enthusiastic philosopher. Of course, that raises the question of what it would mean for AIs to have consciousness. But the point is that attributing consciousness to something has potential consequences, say for ethics.
And it’s interesting to see how the connection might work. Consider a system that’s doing all sorts of sophisticated and irreducible computation. Already we might reasonably say that the system is showing a generalization of intelligence. But to achieve what we’re viewing as consciousness the system also has to integrate this computation into some kind of single thread of experience.
And somehow it seems much more appropriate to attribute “responsibility” to that single thread that we can somehow “point to” than to a whole incoherent distributed computation. In addition, it seems much “more wrong” to imagine “killing” a single thread, probably because it feels much more unique and special. In a generic computational system there are many ways to “move forward”. But if there’s a single thread of experience it’s more like there’s only one.
And perhaps it’s like the death of a human consciousness. Inevitably the history around that consciousness has affected all sorts of things in the physical universe that will survive its disappearance. But it’s the thread of consciousness that ties it all together that seems significant to us, particularly as we try to make a “summary” of the universe to create our own coherent thread of experience.
And, by the way, when we talk about “explaining AI” what it tends to come down to is being able not just to say “that’s the computation that ran”, but being able to “tell a story” about what happened, which typically begins with making it “sequential enough” that we can relate to it like “another consciousness”.
I’ve often noted that the Principle of Computational Equivalence has important implications for understanding our “place in the universe”. We might have thought that with our life and intelligence there must be something fundamentally special about us. But what we’ve realized is that the essence of these is just computational sophistication—and the Principle of Computational Equivalence implies that that’s actually quite ubiquitous and generic. So in a sense this promotes the importance of our human details—because that’s ultimately all that’s special about us.
So what about consciousness? In full generality it too has a certain genericity. Because it can potentially “plug into” any pocket of reducibility, of which there are inevitably infinitely many, even though we humans would not yet recognize most of them. But for our particular version of consciousness the idea of sequentialization seems to be central.
And, yes, we might have hoped that our consciousness would be something that even at an abstract level would put us “above” other parts of the physical universe. So the idea that this vaunted feature of ours is ultimately associated with what amounts to a restriction on computation might seem disappointing. But I view this as just part of the story that what’s special about us are not big, abstract things, but specific things that reflect all that specific irreducible computation that has gone into creating our biology, our civilization and our lives.
In a sense the story of science is a story of struggle between computational irreducibility and computational reducibility. The richness of what we see is a reflection of computational irreducibility, but if we are to understand it we must find computational reducibility in it. And from what we have discussed here we now see how consciousness—which seems so core to our existence—might fundamentally relate to the computational reducibility we need for science, and might ultimately drive our actual scientific laws.
How does this all relate to what philosophers (and others) have said before? It will take significant work to figure that out, and I haven’t done it. But it’ll surely be valuable. Of course it’ll be fun to know if Leibniz or Kant or Plato already figured out—or guessed—this or that, even centuries or millennia before we discovered some feature of computation or physics. But what’s more important is that if there’s overlap with some existing body of work then this provides the potential to make a connection with other aspects of that work, and to show, for example, how what I discuss might relate to other areas of philosophy or to other philosophical questions.
My mother, Sybil Wolfram, was a longtime philosophy professor at Oxford University, and I was introduced to philosophical discourse at a very young age. I always said, though, that if there was one thing I’d never do when I was grown up, it’s philosophy; it just seemed too crazy to still be arguing about the same issues after two thousand years. But after more than half a century of “detour” in science, here I am, arguably, doing philosophy after all….
Some of the early development of the ideas here was captured in the livestream: A Discussion about Physics Built by Alien Intelligences (June 25, 2020). Thanks particularly to Jeff Arle, Jonathan Gorard and Alexander Wolfram for discussions.
In the early years of the twentieth century it looked as if—if only the right approach could be found—all of mathematics might somehow systematically be solved. In 1910 Whitehead and Russell had published their monumental Principia Mathematica showing (rather awkwardly) how all sorts of mathematics could be represented in terms of logic. But Emil Post wanted to go further. In what seems now like a rather modern idea (with certain similarities to the core structure of the Wolfram Language, and very much like the string multiway systems in our Physics Project), he wanted to represent the logic expressions of Principia Mathematica as strings of characters, and then have possible operations correspond to transformations on these strings.
In the summer of 1920 it was all going rather well, and Emil Post as a freshly minted math PhD from Columbia arrived in Princeton to take up a prestigious fellowship. But there was one final problem. Having converted everything to string transformations, Post needed to have a theory of what such transformations could do.
He progressively simplified things, until he reached what he called the problem of “tag”. Take a string of 0s and 1s. Drop its first ν elements. Look at the first dropped element. If it’s a 0, add a certain block of elements at the end of the string, and if it’s a 1, add another block. Post solved several cases of this problem.
But then he came across the one he described as 0→00, 1→1101 with ν=3. Here’s an example of its behavior:
Style[Text[Column[
   Row /@ NestList[
     Replace[#, {{0, _, _, s___} -> {s, 0, 0},
                 {1, _, _, s___} -> {s, 1, 1, 0, 1}}] &,
     IntegerDigits[5, 2, 3], 10],
   Spacings -> .2]], FontFamily -> "Roboto"]
After a few steps it just ends up in a simple loop, alternating forever between two strings. Here’s another example, starting now from a different string:
Style[Text[Column[
   Row /@ NestList[
     Replace[#, {{0, _, _, s___} -> {s, 0, 0},
                 {1, _, _, s___} -> {s, 1, 1, 0, 1}}] &,
     IntegerDigits[18, 2, 5], 30]]], FontFamily -> "Roboto"]
Again this ends up in a loop, now involving 6 possible strings.
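For readers who want to experiment outside the Wolfram Language, the system is easy to reproduce in a few lines of Python. This is a minimal reimplementation for illustration (the function names are my own, not part of any package):

```python
# Post's "00/1101" tag system with deletion number v = 3: at each step,
# drop the first 3 symbols; if the first dropped symbol was 0, append 0,0;
# if it was 1, append 1,1,0,1. The system halts if a string gets shorter than 3.
def step(s):
    return s[3:] + ([0, 0] if s[0] == 0 else [1, 1, 0, 1])

def transient_and_cycle(s):
    """Run until a state repeats (or the system halts);
    return (transient length, cycle length)."""
    seen, t = {}, 0
    while len(s) >= 3:
        key = tuple(s)
        if key in seen:
            return seen[key], t - seen[key]
        seen[key] = t
        s, t = step(s), t + 1
    return t, 0  # halted: the string shrank below 3 symbols

print(transient_and_cycle([1, 0, 1]))        # -> (4, 2): the 2-cycle above
print(transient_and_cycle([1, 0, 0, 1, 0]))  # the 6-cycle above
```

The first call starts from IntegerDigits[5, 2, 3] and finds the alternation between two strings shown above; the second starts from 10010 and finds the loop of 6 strings.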
But what happens in general? To Post, solving this problem was a seemingly simple stepping stone to his program of solving all of mathematics. And he began on it in the early summer of 1921, no doubt expecting that such a simple-to-state problem would have a correspondingly simple solution.
But rather than finding a simple solution, he instead discovered that he could make little real progress. And after months of work he finally decided that the problem was in fact, as he later said, “hopeless”—and as a result, he concluded, so was his whole approach to “solving mathematics”.
What had happened? Well, Post had seen a glimpse of a completely unanticipated but fundamental feature of what we now call computation. A decade later what was going on became a little clearer when Kurt Gödel discovered Gödel’s theorem and undecidability. (As Post later put it: “I would have discovered Gödel’s theorem in 1921—if I had been Gödel.”) Then as the years went by, and Turing machines and other kinds of computational systems were introduced, tag systems began to seem more about computation than about mathematics, and in 1961 Marvin Minsky proved that in fact a suitably constructed tag system could be made to do any computation that any Turing machine could do.
But what about Post’s particular, very simple tag system? It still seemed very surprising that something so simple could behave in such complicated ways. But sixty years after Post’s work, when I started to systematically explore the computational universe of simple programs, it began to seem a lot less surprising. For—as my Principle of Computational Equivalence implies—throughout the computational universe, above some very low threshold, even in systems with very simple rules, I was seeing the phenomenon of computational irreducibility, and great complexity of behavior.
But now a century has passed since Emil Post battled with his tag system. So armed with all our discoveries—and all our modern tools and technology—what can we now say about it? Can we finally crack Post’s problem of tag? Or—simple as it is—will it use the force of computational irreducibility to resist all our efforts?
This is the story of my recent efforts to wage my own battle against Post’s tag system.
The Wolfram Language can be seen in part as a descendant of Post’s idea of representing everything in terms of transformation rules (though for symbolic expressions rather than strings). So it’s no surprise that Post’s problem of tag is very simple to set up in the Wolfram Language:
NestList[Replace[{
    {0, _, _, s___} -> {s, 0, 0},
    {1, _, _, s___} -> {s, 1, 1, 0, 1}
  }], {1, 0, 0, 1, 0}, 10] // Column
Given the initial string, the complete behavior is always determined. But what can happen? In the examples above, what we saw is that after some “transient” the system falls into a cycle which repeats forever.
Here’s a plot for all possible initial strings up to length 7. In each case there’s a transient and a cycle, with lengths shown in the plot (with cycle length stacked on top of transient length):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{list = Catenate[Table[Tuples[{0, 1}, n], {n, 7}]]}, ListStepPlot[Transpose[((Length /@ FindTransientRepeat[TSDirectEvolveList[#, 1000], 4]) & /@ list)], Center, PlotRange -> {0, 28}, PlotStyle -> {Hue[0.1, 1, 1], Hue[0.02, 0.92, 0.82]}, PlotLayout -> "Stacked", Joined -> True, Filling -> Automatic, Frame -> True, AspectRatio -> 1/5, FrameTicks -> {{Automatic, None}, {Extract[MapThread[List[#1, Rotate[Style[StringJoin[ToString /@ #2], FontFamily -> "Roboto", Small], 90 Degree]] &, {Range[0, 253], list}], Position[list, Alternatives @@ Select[list, IntegerExponent[FromDigits[#, 2], 2] > Length[#]/2 && Length[#] > 1 &]]], None}}]]
(Note that if the system reaches 00—or any other string with fewer than 3 elements—one can either say that it has a cycle of length 1, or that it stops completely, effectively with a cycle of length 0.) For initial strings up to length 7, the nontrivial cycles observed are of lengths 2 and 6.
Starting from 10010 as above, we can show the behavior directly—or we can try to compensate for the removal of elements from the front at each step by rotating at each step:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; MapIndexed[With[{func = #1, ind = #2}, ArrayPlot[MapIndexed[func, PadRight[TSDirectEvolveList[{1, 0, 0, 1, 0}, 40], If[First[ind] == 1, {Automatic, 17}, Automatic], .25]], Mesh -> True, MeshStyle -> GrayLevel[0.75, 0.75], Frame -> False, ImageSize -> {Automatic, 240}]] &, {# &, RotateLeft[#, 3 (First[#2] - 1)] &}]
We can also show only successive “generations” in which the rule has effectively “gone through the whole string”:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[TSGenerationEvolveList[{1, 0, 0, 1, 0}, 30], {Automatic, 17}, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, .75], Frame -> False, ImageSize -> {100, Automatic}]
Let’s continue to longer initial sequences. Here are the lengths of transients and cycles for initial sequences up to length 12:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{list = Catenate[Table[Tuples[{0, 1}, n], {n, 12}]]}, ListStepPlot[Transpose[((Length /@ FindTransientRepeat[TSDirectEvolveList[#, 1000], 4]) & /@ list)], Center, PlotRange -> All, PlotStyle -> {Hue[0.1, 1, 1], Hue[0.02, 0.92, 0.82]}, PlotLayout -> "Stacked", Joined -> True, Filling -> Automatic, Frame -> True, AspectRatio -> 1/6, FrameTicks -> {{Automatic, None}, {Extract[MapThread[List[#1, Rotate[Style[StringJoin[ToString /@ #2], FontFamily -> "Roboto", Small], 90 Degree]] &, {Range[0, 8189], list}], Position[list, Alternatives @@ Select[list, IntegerExponent[FromDigits[#, 2], 2] > Length[#]/1.3 && Length[#] > 7 &]]], None}}]]
All the cycles are quite short—in fact they’re all of lengths 0, 2, 4, 6 or 10. And for initial strings up to length 11, the transients (which we can think of as “halting times”) are at most of length 28. But at length 12 the string 100100100000 suddenly gives a transient of length 419, before finally evolving to the string 00.
Here’s a plot of the sequence of lengths of intermediate strings produced in this case (the maximum length is 56):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ TSDirectEvolveList[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0}, 501], Filling -> Axis, Frame -> True, AspectRatio -> 1/3, PlotStyle -> Hue[0.07, 1, 1]]
And, by the way, this gives an indication of why Post called this the “problem of tag” (at the suggestion of his colleague Bennington Gill). Elements keep on getting removed from the “head” of the string, and added to its “tail”. But will the head catch up with the tail? When it does, it’s like someone winning a game of tag, by being able to “reach the last person”.
Here’s a picture of the detailed behavior in the case above:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; (Row[ArrayPlot[#, ImageSize -> {100, Automatic}] & /@ Partition[MapIndexed[#, PadRight[TSDirectEvolveList[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0}, 501], Automatic, .25]], UpTo[210]]]) & /@ {# &, RotateLeft[#, 3 (First[#2] - 1)] &}
And here’s the “generational” plot, now flipped around to go from left to right:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[Reverse@Transpose@PadRight[TSGenerationEvolveList[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0}, 50], {Automatic, 58}, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, .75], Frame -> False]
By the way, we can represent the complete history of the tag system just by concatenating the original string with all the blocks of elements that are added to it, never removing blocks of elements at the beginning. In this case this is the length-1260 string we get:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Style[StringJoin[ToString /@ TSDirectEvolveSequence[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0}, 440]], FontFamily -> "Roboto", 8]
Plotting the “walk” obtained by going up at each 1 and down at each 0 we get (and not surprisingly, this is basically the same curve as the sequence of total string lengths above):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListLinePlot[Accumulate[2 TSDirectEvolveSequence[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0}, 440] - 1], Frame -> True, AspectRatio -> 1/3, PlotStyle -> Hue[0.07, 1, 1]]
How “random” is the sequence of 0s and 1s? There are a total of 615 1s and 645 0s in the whole sequence—so roughly equal. For length-2 blocks, there are only about 80% as many 01s and 10s as 00s and 11s. For length-3 blocks, the disparities are larger, with only 30% as many 001 blocks occurring as 000 blocks.
And then at length 4, there is something new: none of the blocks
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Text[Row /@ Complement[Tuples[{1, 0}, 4], Union[Partition[ TSDirectEvolveSequence[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0}, 450], 4, 1]]]] 
ever appear at all, and 0010 appears only twice, both at the beginning of the sequence. Looking at the rule, it’s easy to see why, for example, 1111 can never occur—because no sequence of the 00s and 1101s inserted by the rule can ever produce it. (We’ll discuss block occurrences more below.)
OK, so we’ve found some fairly complicated behavior even with initial strings of length 12. But what about longer strings? What can happen with them? Before exploring this, it’s useful to look in a little more detail at the structure of the underlying problem.
To find out what can happen in our tag system, we’ve enumerated all possible initial strings up to certain lengths. But it turns out that there’s a lot of redundancy in this—as our plots of “halting times” above might suggest. And the reason is that the way the tag system operates, only every third element in the initial string actually ever matters. As far as the rule is concerned we can just fill in _ for the other elements:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Style[Text[Column[Row /@ NestList[Join[Drop[#, 3], {{0, 0}, {1, 1, 0, 1}}[[1 + First[#]]]] &, {0, _, _, 1, _, _, 1, _, _, 1, _, _, 1, _, _}, 10]]], FontFamily -> "Roboto"]
The _’s will steadily be “eaten up”, and whether they were originally filled in with 0s or 1s will never matter. So given this, we don’t lose any information by using a compressed representation of the strings, in which we specify only every third element:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Style[Text[Grid[Transpose@{Row /@ (MapAt[Style[#1, Bold] &, #, {1 ;; -1 ;; 3}] & /@ NestList[Join[Drop[#, 3], {{0, 0}, {1, 1, 0, 1}}[[1 + First[#]]]] &, {0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0}, 10]), Row[{Style[Row[Take[#, 1 ;; -1 ;; 3]], Bold], Style[Row[{Style[":", Gray], Mod[Length[#], 3]}], Small]}] & /@ NestList[Join[Drop[#, 3], {{0, 0}, {1, 1, 0, 1}}[[1 + First[#]]]] &, {0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0}, 10]}, Dividers -> Center, FrameStyle -> LightGray, Alignment -> Left]], FontFamily -> "Roboto"]
But actually this isn’t quite enough. We also need to say the “phase” of the end of the string: the number of trailing elements after the last block of 3 elements (i.e. the length of the original string mod 3).
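In Python terms (continuing the earlier hypothetical sketch; `compress` is my own name) the compressed representation is just every third element plus the length mod 3, and one can check directly that elements off the every-third grid never influence the evolution:

```python
def tag_step(s):
    # Post's 00/1101 tag system, as before
    return s[3:] + ((0, 0) if s[0] == 0 else (1, 1, 0, 1))

def compress(s):
    """Compressed form: (phase, every third element), phase = len(s) mod 3."""
    return (len(s) % 3, s[::3])

# the string 10010 compresses to 11:2
assert compress((1, 0, 0, 1, 0)) == (2, (1, 1))

# two strings agreeing only at positions 0, 3, 6, ... evolve identically
# once their original elements have been consumed
a = (1, 0, 0, 1, 0, 0)
b = (1, 1, 1, 1, 1, 0)
for _ in range(2):          # 2 steps consume the length-6 originals
    a, b = tag_step(a), tag_step(b)
print(a == b)               # prints True
```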
So now we can start enumerating nonredundant possible initial strings, specifying them in the compressed representation:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Grid[Transpose@Partition[Text[Style[#, FontFamily -> "Roboto"]] & /@ PhasedStringForm /@ EnumerateInits[3], 6], Spacings -> {1.5, .2}]
Given a string in compressed form, we can explicitly compute its evolution. The effective rules are a little more complicated than for the underlying uncompressed string, but for example the following will apply one step of evolution to any compressed string (represented in the form {phase, elements}):
Replace[{{0, {0, s___}} -> {2, {s, 0}}, {0, {1, s___}} -> {1, {s, 1, 1}}, {1, {0, s___}} -> {0, {s}}, {1, {1, s___}} -> {2, {s, 0}}, {2, {0, s___}} -> {1, {s, 0}}, {2, {1, s___}} -> {0, {s, 1}}}]
Can we reconstruct an uncompressed string from a compressed one? Well, no, not uniquely. Because the “intermediate” elements that will be ignored by the rule aren’t specified in the compressed form. Given, say, the compressed string 10:2 we know the uncompressed string must be of the form 1__0_ but the _’s aren’t determined. However, if we actually run the rule, we get
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Style[Text[Column[Row /@ TSDirectEvolveList[FromPhaseForm[{2, {1, 0}}, _], 3]]], FontFamily -> "Roboto"]
so that the blanks in effect quickly resolve. (By the way, in the uncompressed string each compressed element is followed by two undetermined elements, except possibly at the end: with phase 0 the last element still has two trailing blanks, with phase 1 it has none, and with phase 2 it has just one. So the uncompressed string length mod 3 is equal to the phase.)
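Transcribing the six compressed-step rules above into the same hypothetical Python sketch, we can cross-check them against the direct (uncompressed) evolution:

```python
# the six rules above: (phase, head) -> (new phase, elements to append)
RULES = {(0, 0): (2, (0,)),  (0, 1): (1, (1, 1)),
         (1, 0): (0, ()),    (1, 1): (2, (0,)),
         (2, 0): (1, (0,)),  (2, 1): (0, (1,))}

def compressed_step(state):
    """One tag-system step on a compressed state (phase, elements)."""
    phase, s = state
    new_phase, appended = RULES[(phase, s[0])]
    return (new_phase, s[1:] + appended)

def tag_step(s):            # direct uncompressed step, as before
    return s[3:] + ((0, 0) if s[0] == 0 else (1, 1, 0, 1))

def compress(s):
    return (len(s) % 3, s[::3])

# compressing then stepping agrees with stepping then compressing
u = (1, 0, 0, 1, 0)         # i.e. 11:2 in compressed form
c = compress(u)
for _ in range(10):
    u, c = tag_step(u), compressed_step(c)
    assert compress(u) == c
print("compressed rules agree with direct evolution")
```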
So taking all compressed strings up to length 4, here is the sequence of transient and cycle lengths obtained:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Transpose[((Length /@ FindTransientRepeat[TSDirectEvolveList[#, 1000], 4]) & /@ Catenate[Table[DistinctInits[i], {i, 4}]])], Center, PlotRange -> {0, Automatic}, PlotLayout -> "Stacked", PlotStyle -> {Hue[0.1, 1, 1], Hue[0.02, 0.92, 0.82]}, Joined -> True, Filling -> Automatic, Frame -> True, AspectRatio -> 1/5]
The first case that is cut off in the plot has halting time 419; it corresponds to the compressed string 1110:0.
We can think of compressed strings as corresponding to possible nonredundant “states” of the tag system. And then we can represent the global evolution of the system by constructing a state transition graph that connects each state to its successor in the evolution. Here is the result starting from distinct length-3 strings (here shown in uncompressed form; the size of each node reflects the length of the string):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{g = VertexDelete[NestGraph[TSStep, {{0, 0, 0}, {1, 0, 0}}, 400], 0]}, HighlightGraph[Graph[g, VertexSize -> (# -> .1 Sqrt[Length[#]] & /@ VertexList[g]), VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7], VertexLabels -> (# -> Placed[Row[#], Above] & /@ VertexList[g])], {Style[Subgraph[g, FindCycle[g, {1, Infinity}, All]], Thick, Hue[0.02, 0.92, 0.82]], Pick[VertexList[g], VertexOutDegree[g], 0]}]]
There is a length-2 cycle, indicated in red, and also a “terminating state”, indicated in yellow. Here’s the state transition graph starting with all length-1 compressed strings (i.e. nonredundant uncompressed strings with lengths between 3 and 5)—with nodes now labeled just with the (uncompressed) length of the string that they represent:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{g = VertexDelete[NestGraph[TSStep, DistinctInits[1], 400], 0]}, HighlightGraph[Graph[g, VertexSize -> (# -> .1 Sqrt[Length[#]] & /@ VertexList[g]), VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7], VertexLabels -> (# -> Placed[Length[#], Above] & /@ VertexList[g])], {Style[Subgraph[g, FindCycle[g, {1, Infinity}, All]], Thick, Hue[0.02, 0.92, 0.82]], Pick[VertexList[g], VertexOutDegree[g], 0]}]]
We see the same length-2 cycle and terminating state as we saw before. But now there is also a length-6 cycle. The original “feeder” for this length-6 cycle is the string 10010 (compressed: 11:2).
Here are the corresponding results for compressed initial strings up to successively greater lengths n, with the lengths of cycles labeled:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; GraphicsRow[Table[Labeled[Framed[Show[With[{g = VertexDelete[NestGraph[TSStep, Catenate[Table[DistinctInits[i], {i, n}]], 700], 0]}, With[{c = FindCycle[g, {1, Infinity}, All]}, HighlightGraph[Graph[g, VertexLabels -> Join[(#[[1, 1]] -> Placed[Style[Length[#], 11, Darker[Hue[0.02, 0.92, 0.82], .2]], {Before, Below}] & /@ c), # -> Style[1, 11, Darker[Yellow, .4]] & /@ Pick[VertexList[g], VertexOutDegree[g], 0]], VertexSize -> (# -> .3 Sqrt[Length[#]] & /@ VertexList[g]), VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7]], {Style[Subgraph[g, c], Thick, Hue[0.02, 0.92, 0.82]], Pick[VertexList[g], VertexOutDegree[g], 0]}]]], ImageSize -> {UpTo[250], UpTo[250]}], FrameStyle -> LightGray], Style[Text[Row[{Style["n", Italic], " \[LessEqual] ", ToString[n]}]], 10]], {n, 2, 3}]]
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; GraphicsColumn[Table[Labeled[Framed[Show[With[{g = VertexDelete[NestGraph[TSStep, Catenate[Table[DistinctInits[i], {i, n}]], 700], 0]}, With[{c = FindCycle[g, {1, Infinity}, All]}, HighlightGraph[Graph[g, VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7], VertexLabels -> Join[(#[[1, 1]] -> Placed[Style[Length[#], 11, Darker[Hue[0.02, 0.92, 0.82], .2]], {After, Above}] & /@ c), # -> Style[1, 11, Darker[Yellow, .4]] & /@ Pick[VertexList[g], VertexOutDegree[g], 0]], VertexSize -> (# -> .6 Sqrt[Length[#]] & /@ VertexList[g]), GraphStyle -> "Default"], {Style[Subgraph[g, c], Thick, Red], Pick[VertexList[g], VertexOutDegree[g], 0]}]]], ImageSize -> {UpTo[500], UpTo[200]}], FrameStyle -> LightGray], Style[Text[Row[{Style["n", Italic], " \[LessEqual] ", ToString[n]}]], 10]], {n, 4, 5}], ImageSize -> {550, Automatic}]
A notable feature of these graphs is that at compressed length 4, a long “highway” appears that goes on for about 400 steps. The highway basically represents the long transient first seen for the initial string 1110:0. There is one “on-ramp” for this string, but then there is also a tree of other states that enter the same highway.
Why is there a “highway” in the first place? Basically because the length-419 transient involves strings that are long compared to any we are starting from—so nothing can feed into it after the beginning, and it basically just has to “work itself through” until it reaches whatever cycle it ends up in.
When we allow initial strings with compressed length up to 6 a new highway appears, dwarfing the previous one (by the way, most of the wiggliness we see is an artifact of the graph layout):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{n = 6}, Labeled[Framed[With[{g = VertexDelete[NestGraph[TSStep, Catenate[Table[DistinctInits[i], {i, n}]], 20000], 0]}, With[{c = FindCycle[g, {1, Infinity}, All]}, HighlightGraph[Graph[g, VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7], VertexLabels -> Join[(#[[1, 1]] -> Placed[Style[Length[#], 11, Darker[Hue[0.02, 0.92, 0.82], .2]], {Before, Above}] & /@ c), # -> Style[1, 11, Darker[Yellow, .4]] & /@ Pick[VertexList[g], VertexOutDegree[g], 0]], VertexSize -> (# -> .6 Sqrt[Length[#]] & /@ VertexList[g]), GraphStyle -> "Default"], {Style[Subgraph[g, c], Thick, Red], Pick[VertexList[g], VertexOutDegree[g], 0]}]]], FrameStyle -> LightGray], Style[Text[Row[{Style["n", Italic], " \[LessEqual] ", ToString[n]}]], 10]]]
The first initial state to reach this highway is 111010:0 (uncompressed: 100100100000100000)—which after 2141 steps evolves to a cycle of length 28. Here are the lengths of the intermediate strings along this highway (note the cycle at the end):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ TSDirectEvolveList[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0}, 2300], Filling -> Axis, Frame -> True, AspectRatio -> 1/3, PlotStyle -> Hue[0.07, 1, 1]]
And here are the “generational states” reached (note that looking only at generations makes the final 28-cycle show up as a 1-cycle):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[Reverse@Transpose@PadRight[TSGenerationEvolveList[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0}, 80], {Automatic, 180}, .25], Frame -> False]
Or looking at “compressed strings” (i.e. including only every third element of each string):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[Reverse@Transpose@PadRight[Last /@ TSGenerationPhaseEvolveList[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0}, 80], {Automatic, 70}, .25], Frame -> False]
If we consider all initial strings up to compressed length 6, we get the following transient+cycle lengths:
And what we see is that there are particular lengths of transients—corresponding to the highways in the state transition graph above—to which certain strings evolve. If we plot the distribution of halting (i.e. transient) times for all these strings, then, as expected, it peaks around the lengths of the main highways:
So given a particular “onramp to a highway”—or, for that matter, a state on a cycle—what states will evolve to it? In general there’ll be a tree of states in the state transition graph that are the “predecessors” of a given state—in effect forming its “basin of attraction”.
For any particular string the rule gives a unique successor. But we can also imagine “running the rule backwards”. And if we do this, it turns out that any given compressed string can have 0, 1 or 2 immediate predecessors. For example, 000:0 has the unique predecessor 0000:1. But 001:0 has both 0001:1 and 100:2 as predecessors. And for example 001:1 has no predecessors. (For uncompressed strings, there are always either 0 or 4 immediate predecessors.)
Any state that has no predecessors can occur only as the initial string; it can never be generated in the evolution. (There are similar results for substrings, as we’ll discuss later.)
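Inverting the six compressed-step rules gives a quick way to enumerate predecessors by brute force. Here is a hypothetical Python sketch (function names mine), checking the three examples just mentioned:

```python
# the six compressed-step rules: (phase, head) -> (new phase, appended elements)
RULES = {(0, 0): (2, (0,)),  (0, 1): (1, (1, 1)),
         (1, 0): (0, ()),    (1, 1): (2, (0,)),
         (2, 0): (1, (0,)),  (2, 1): (0, (1,))}

def predecessors(state):
    """All compressed states (phase, elements) that evolve to `state` in one step."""
    phase, s = state
    preds = []
    for (p_in, head), (p_out, app) in RULES.items():
        n = len(app)
        # a rule can be run backwards if it produces the right phase
        # and the state ends with the elements the rule appends
        if p_out == phase and (n == 0 or s[-n:] == app):
            preds.append((p_in, (head,) + (s[:-n] if n else s)))
    return preds

print(predecessors((0, (0, 0, 0))))   # 000:0 -> the unique predecessor 0000:1
print(predecessors((0, (0, 0, 1))))   # 001:0 -> two predecessors, 0001:1 and 100:2
print(predecessors((1, (0, 0, 1))))   # 001:1 -> no predecessors
```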
And if we start from a state that does have at least one predecessor, we can in general construct a whole tree of “successively further back” predecessors. Here, for example, is the 10step tree for 000:2:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{g = Graph[# -> TSPhaseStep[#] & /@ Union[Flatten[NestList[Flatten[PhaseStepBackwards[#] & /@ #, 1] &, {{0, {0, 0, 0}}}, 10], 1]]]}, Graph[g, VertexLabels -> (# -> PhasedStringForm[#] & /@ VertexList[g]), AspectRatio -> 1, VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7]]]
Here it is after 30 steps, in two different renderings:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{g = Graph[# -> TSPhaseStep[#] & /@ Union[Flatten[NestList[Flatten[PhaseStepBackwards[#] & /@ #, 1] &, {{0, {0, 0, 0}}}, 30], 1]]]}, Graph[g, VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7], GraphLayout -> "LayeredDigraphEmbedding", AspectRatio -> 1/2]]
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{g = Graph[# -> TSPhaseStep[#] & /@ Union[Flatten[NestList[Flatten[PhaseStepBackwards[#] & /@ #, 1] &, {{0, {0, 0, 0}}}, 30], 1]]]}, Graph[g, VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7]]]
If we continue this particular tree we’ll basically get a state transition graph for all states that eventually terminate. Not surprisingly, there’s considerable complexity in this tree—though the number of states after t steps does grow roughly exponentially:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ NestList[Flatten[PhaseStepBackwards[#] & /@ #, 1] &, {{0, {0, 0, 0}}}, 100], Center, Frame -> True, Filling -> Axis, ScalingFunctions -> "Log", AspectRatio -> 1/3, PlotStyle -> Hue[0.07, 1, 1]]
By the way, there are plenty of states that have finite predecessor trees. For example 1100:0 yields a tree which grows only for 21 steps, then stops:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Rotate[With[{g = Graph[# -> TSPhaseStep[#] & /@ Union[Flatten[NestList[Flatten[PhaseStepBackwards[#] & /@ #, 1] &, {{0, {1, 1, 0, 0}}}, 21], 1]]]}, Graph[g, GraphLayout -> "LayeredDigraphEmbedding", VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7]]], 90 Degree]
At least in all the cases we’ve seen so far, our tag system always evolves to a cycle (or terminates in a trivial state). But what cycles are possible? In effect any cycle state S must be a solution to a “tag eigenvalue equation” of the form T^p[S] = S for some p, where T is the “tag evolution operator”.
Starting with compressed strings of length 1, only one cycle can ever be reached:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; First[With[{n = 1}, With[{g = Graph[DirectedEdge @@@ Partition[#, 2, 1, 1]]}, Graph[g, VertexLabels -> (# -> Placed[Row[#], Above] & /@ VertexList[g]), VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7]]] & /@ (Last[FindTransientRepeat[TSDirectEvolveList[FromPhaseForm[#], 1000], 3]] & /@ (First /@ With[{v = PostTagSystem[AllInits[n]]}, Map[v["State", #] &, Map[First] /@ FindCycle[v["StateGraph"], {1, Infinity}, All], {2}]]))]]
Starting with compressed strings of length 2 a 6-cycle appears (here shown labeled respectively with uncompressed and with compressed strings):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; {Last[With[{n = 2}, With[{g = Graph[DirectedEdge @@@ Partition[#, 2, 1, 1]]}, Graph[g, VertexLabels -> (# -> Placed[Row[#], Above] & /@ VertexList[g]), VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7]]] & /@ (Last[FindTransientRepeat[TSDirectEvolveList[FromPhaseForm[#], 1000], 3]] & /@ (First /@ With[{v = PostTagSystem[AllInits[n]]}, Map[v["State", #] &, Map[First] /@ FindCycle[v["StateGraph"], {1, Infinity}, All], {2}]]))]], Last[With[{n = 2}, With[{g = Graph[DirectedEdge @@@ Partition[#, 2, 1, 1]]}, Graph[g, VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7], VertexLabels -> (# -> Placed[PhasedStringForm[ToPhaseForm[#]], Above] & /@ VertexList[g])]] & /@ (Last[FindTransientRepeat[TSDirectEvolveList[FromPhaseForm[#], 1000], 3]] & /@ (First /@ With[{v = PostTagSystem[AllInits[n]]}, Map[v["State", #] &, Map[First] /@ FindCycle[v["StateGraph"], {1, Infinity}, All], {2}]]))]]}
No new cycles appear until one has initial strings of compressed length 4, but then one gets (where now the states are labeled with their uncompressed lengths):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{n = 4}, Framed[GraphicsRow[Sort[With[{g = Graph[DirectedEdge @@@ Partition[#, 2, 1, 1]]}, Graph[g, VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7], VertexLabels -> (# -> Placed[Length[#], Above] & /@ VertexList[g])]] & /@ (Last[FindTransientRepeat[TSDirectEvolveList[FromPhaseForm[#], 1000], 3]] & /@ (First /@ With[{v = PostTagSystem[AllInits[n]]}, Map[v["State", #] &, Map[First] /@ FindCycle[v["StateGraph"], {1, Infinity}, All], {2}]]))]], FrameStyle -> LightGray]]
The actual cycles are as follows:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{n = 4}, ArrayPlot[PadRight[#, Automatic, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, 0.75], ImageSize -> {Automatic, Length[#] 11}] & /@ Sort[(ResourceFunction["CanonicalListRotation"][Last[FindTransientRepeat[TSDirectEvolveList[FromPhaseForm[#], 1000], 3]]] & /@ (First /@ With[{v = PostTagSystem[AllInits[n]]}, Map[v["State", #] &, Map[First] /@ FindCycle[v["StateGraph"], {1, Infinity}, All], {2}]]))]]
while the ones from length-5 initial strings are:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{n = 5}, ArrayPlot[PadRight[#, Automatic, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, 0.75], ImageSize -> {Automatic, Length[#] 7}] & /@ Sort[(ResourceFunction["CanonicalListRotation"][Last[FindTransientRepeat[TSDirectEvolveList[FromPhaseForm[#], 1000], 3]]] & /@ (First /@ With[{v = PostTagSystem[AllInits[n]]}, Map[v["State", #] &, Map[First] /@ FindCycle[v["StateGraph"], {1, Infinity}, All], {2}]]))]]
What larger cycles can occur? It is fairly easy to see that a compressed string consisting of any sequence of the blocks 01 and 1100 will yield a state on a cycle. To find the corresponding uncompressed strings on cycles, we can just apply the rule 0→00, 1→1101, and conclude that any sequence of the length-6 and length-12 blocks 001101 and 110111010000 will give a state on a cycle.
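This is easy to verify numerically; in the hypothetical Python sketch from earlier, each of these uncompressed blocks gives a string that returns exactly to itself:

```python
def tag_step(s):
    # Post's 00/1101 tag system
    return s[3:] + ((0, 0) if s[0] == 0 else (1, 1, 0, 1))

def period_if_on_cycle(s, max_steps=1000):
    """Number of steps after which the state s first recurs,
    or None if it does not come back within max_steps."""
    t = s
    for i in range(1, max_steps + 1):
        t = tag_step(t)
        if t == s:
            return i
    return None

print(period_if_on_cycle((0, 0, 1, 1, 0, 1)))                     # block 001101
print(period_if_on_cycle((1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0)))  # block 110111010000
```

The single blocks turn out to have periods 2 and 4, and any concatenation of them also recurs.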
If we plot the periods of cycles against the lengths of their “seed” strings, we get:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListPlot[Style[Catenate[Table[{Length[Flatten[#]], Length[FindRepeat[TSDirectEvolveList[Flatten[#], 1000]]]} & /@ Tuples[{{0, 0, 1, 1, 0, 1}, {1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0}}, n], {n, 10}]], Hue[0.02, 0.92, 0.82]], Frame -> True, PlotStyle -> PointSize[.02]]
If we generate cycles from sequences of, say, b of our 01, 1100 blocks, how many of the cycles we get will be distinct? Here are the periods of the distinct cycles for successive b:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Text[Grid[Table[{b, Length /@ Union[ResourceFunction["CanonicalListRotation"][FindRepeat[TSDirectEvolveList[Flatten[#], 1000]]] & /@ Tuples[{{0, 0, 1, 1, 0, 1}, {1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0}}, b]]}, {b, 6}], Frame -> All, FrameStyle -> Gray]]
The total number of cycles turns out to be:
DivisorSum[n, k |-> EulerPhi[k] 2^(n/k)]/n
Table[DivisorSum[n, k |-> EulerPhi[k] 2^(n/k)]/n, {n, 15}]
We can also ask an inverse question: of all 2^n (uncompressed) strings of length n, how many of them lie on cycles of the kind we have identified? The answer is the same as the number of distinct “cyclic necklaces” with n beads, each 0 or 1, with no pair of 0s adjacent:
DivisorSum[n, k |-> EulerPhi[n/k] LucasL[k]]/n
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Table[DivisorSum[n, k |-> EulerPhi[n/k] LucasL[k]]/n, {n, 20}]
Asymptotically this is about φ^n/n (where φ is the golden ratio)—implying that of all strings of length n, only a fraction ≈ (φ/2)^n of them will be on cycles, so that for large n the overwhelming majority of strings will not be on cycles, at least of this kind.
But are there other kinds of cycles? It turns out there are, though they do not seem to be common or plentiful. One family—always of period 6—is seeded by compressed strings of the form 00111(000111)^m, i.e. 00111 followed by m repeats of 000111 (with uncompressed length 16 + 18m):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[#, Automatic, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, 0.75], ImageSize -> {Automatic, Length[#] 8}] & /@ Table[FindRepeat[TSDirectEvolveList[Flatten[Flatten[{{0, 0, 1, 1, 1}, Table[{0, 0, 0, 1, 1, 1}, m]}] /. {1 -> {1, 1, 0, 1}, 0 -> {0, 0}}], 100]], {m, 3}]
But there are other cases too. The first example appears with initial compressed strings of length 9. The length-13 compressed string 0011111110100 (with uncompressed length 39) yields the period-40 cycle (with uncompressed string lengths between 37 and 44):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[#, Automatic, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, 0.75], ImageSize -> {Automatic, Length[#] 4}] &[FindRepeat[TSDirectEvolveList[{0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0}, 400]]]
The next example occurs with an initial compressed string of length 15, and a compressed “seed” of length 24—and has period 282:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[#, Automatic, .25], Frame -> False] &[FindRepeat[TSDirectEvolveList[Flatten[{0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0} /. {1 -> {1, 1, 0, 1}, 0 -> {0, 0}}], 1000]]]
And I’ve found one more example, arising from an initial compressed string of length 18, which has period 66:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[#, Automatic, .25], Frame -> False] &[FindRepeat[TSDirectEvolveList[Flatten[{0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1} /. {1 -> {1, 1, 0, 1}, 0 -> {0, 0}}], 1000]]]
If we look at these cycles in “generational” terms, they are of lengths 3, 11 and 14, respectively (note that the latter two pictures above start with “incomplete generations”):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[#, Automatic, .25], Frame -> False, ImageSize -> {Automatic, 140}] &[TSGenerationEvolveList[#, 60]] & /@ ((First[Last[#]] &@ FindTransientRepeat[TSGenerationEvolveList[#, 100], 3]) & /@ {{0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0}, Flatten[{0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0} /. {1 -> {1, 1, 0, 1}, 0 -> {0, 0}}], Flatten[{0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1} /. {1 -> {1, 1, 0, 1}, 0 -> {0, 0}}]})
I don’t know how far Emil Post got in exploring his tag system by hand a century ago. And I rather suspect that we’ve already gone a lot further here than he ever did. But what we’ve seen has just deepened the mystery of what tag systems can do. So far, every initial string we’ve tried has evolved to a cycle (or just terminated). But will this always happen? And how long can it take?
So far, the longest transient we’ve seen is 2141 steps—from the length-6 compressed string 111010:0. Length-7 and length-8 strings at most just “follow the same highway” in the state transition graph, and don’t give longer transients. But at length 9 something different happens: 111111010:0 takes 24,552 steps to evolve to a 6-cycle (with string length 12), with the lengths of intermediate (compressed) strings being:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Quotient[Length /@ TSDirectEvolveList[PuffOut[{1, 1, 1, 1, 1, 1, 0, 1, 0}], 25300], 3], Center, Frame -> True, Filling -> Axis, AspectRatio -> 1/3, PlotStyle -> Hue[0.07, 1, 1], MaxPlotPoints -> 4000]
Plotting (from left to right) the actual elements in compressed strings in each “generation” this shows in more detail what’s “going on inside”:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[Reverse@Transpose[PadRight[Last /@ TSGenerationPhaseEvolveList[PuffOut[{1, 1, 1, 1, 1, 1, 0, 1, 0}], 400], {Automatic, 230}, .25]], Frame -> False]
In systematically exploring what can happen in tag systems, it’s convenient to specify initial compressed strings by converting their sequences of 1s and 0s to decimal numbers—but because our strings can have leading 0s we have to include the length, say as a prefix. So with this setup our length-9 “halting-time winner” 111111010:0 becomes 9:506:0.
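As a cross-check of this naming convention, here is a small Python sketch (the function names are mine for illustration, not from the project’s code):

```python
def encode(bits, phase=0):
    """Encode a (compressed) bit list as 'length:decimal:phase'.

    The explicit length prefix is needed because leading 0s would
    otherwise be lost in the decimal value.
    """
    value = int("".join(map(str, bits)), 2) if bits else 0
    return f"{len(bits)}:{value}:{phase}"

def decode(spec):
    """Recover (bits, phase) from a 'length:decimal:phase' string."""
    n, value, phase = map(int, spec.split(":"))
    bits = [int(b) for b in format(value, f"0{n}b")]
    return bits, phase

# The length-9 "halting-time winner" from the text:
assert encode([1, 1, 1, 1, 1, 1, 0, 1, 0]) == "9:506:0"
```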
The next “winner” is 12:3962:0, which takes 253,456 steps to evolve to a 6-cycle:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[{First[#], Length[#[[2, 2]]]} & /@ With[{re = PostTagSystem[{0, {1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0}}]}, Table[{i, re["State", i]}, {i, 1, 253456 + 100, 100}]], Center, Frame -> True, Filling -> Axis, AspectRatio -> 1/3, PlotStyle -> Hue[0.07, 1, 1]]
In generational form the explicit evolution in this case is:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[ Reverse@Transpose[ PadRight[ Last /@ TSGenerationPhaseEvolveList[ PuffOut[{1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0}], 950], Automatic, .25]], Frame -> False]
The first case to take over a million steps is 15:30166:0—which terminates after 20,858,103 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Show[LengthsPlotDecimal[{0, 30166}, 15, 20858103, 4000, 10^6], FrameTicks -> {{Automatic, None}, {Thread[{Range[0, 20][[1 ;; -1 ;; 5]], Append[Range[0, 15][[1 ;; -1 ;; 5]], "20 million"]}], None}}]
The first case to take over a billion steps is 20:718458:0—which leads to a 6-cycle after 2,586,944,112 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Show[LengthsPlotDecimal[{0, 718458}, 20, 2586944112, 1000000], FrameTicks -> {{Automatic, None}, {Thread[{Range[0, 2500][[1 ;; -1 ;; 500]], Append[Range[0, 2000][[1 ;; -1 ;; 500]], "2500 million"]}], None}}]
Here’s a table of all the “longest-so-far” winners through compressed initial length-28 strings (i.e. covering all ≈ 2 × 10^25 ordinary initial strings up to length 84):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Text[Grid[ Prepend[{DecimalStringForm[{First[#], #[[2, 1]]}], #[[2, 2, 1]], If[# == 0, Style[#, Gray], #] &[#[[2, 2, 2]]]} & /@ {{4, {0, 14} -> {419, 0}}, {6, {0, 58} -> {2141, 28}}, {9, {0, 506} -> {24552, 6}}, {12, {0, 3962} -> {253456, 6}}, {13, {0, 5854} -> {341992, 6}}, {15, {0, 16346} -> {20858069, 0}}, {15, {0, 30074} -> {357007576, 6}}, {20, {0, 703870} -> {2586944104, 6}}, {22, {0, 3929706} -> {2910925472, 6}}, {24, {0, 12410874} -> {50048859310, 0}}, {25, {0, 33217774} -> {202880696061, 6, {0, {0, 1, 1, 1, 0, 0}}}}, {27, {0, 125823210} -> {259447574536, 6, {0, {0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1}}}}, {28, {2, 264107671} -> {643158954877, 10, {0, {0, 1, 1, 1, 0, 0, 1, 1, 0, 0}}}}}, Style[#, Italic] & /@ {"initial state", "steps", "cycle length"}], Frame -> All, Alignment -> {{Left, Right, Right}}, FrameStyle -> GrayLevel[.7], Background -> {None, {GrayLevel[.9]}}]]
And here are their “size traces”:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; GraphicsGrid[ Partition[ ParallelMap[ Show[If[#[[1]] < 9, LengthsPlotDecimalSmall[#[[2, 1]], #[[1]], #[[2, 2, 1]]], LengthsPlotDecimal[#[[2, 1]], #[[1]], #[[2, 2, 1]], 8 Quotient[#[[2, 2, 1]], 8000]]], FrameTicks -> None] &, {{4, {0, 14} -> {419, 0}}, {6, {0, 58} -> {2141, 28}}, {9, {0, 506} -> {24552, 6}}, {12, {0, 3962} -> {253456, 6}}, {13, {0, 5854} -> {341992, 6}}, {15, {0, 16346} -> {20858069, 0}}, {15, {0, 30074} -> {357007576, 6}}, {20, {0, 703870} -> {2586944104, 6}}, {22, {0, 3929706} -> {2910925472, 6}}, {24, {0, 12410874} -> {50048859310, 0}}, {25, {0, 33217774} -> {202880696061, 6}}, {27, {0, 125823210} -> {259447574536, 6}}, {28, {2, 264107671} -> {643158954877, 10}}}], UpTo[3]]]
One notable thing here—that we’ll come back to—is that after the first few cases, it’s very difficult to tell the overall scale of these pictures. On the first row, the longest x axis is about 25,000 steps; on the last row it is about 600 billion.
But probably the most remarkable thing is that we now know that for all (uncompressed) initial strings up to length 75, the system always eventually evolves to a cycle (or terminates).
Could the sequences of lengths in our tag system be like random walks? Obviously they can’t strictly be random walks because given an initial string, each entire “walk” is completely determined, and nothing probabilistic or random is introduced.
But what if we look at a large collection of initial conditions? Could the ensemble of observed walks somehow statistically be like random walks? From the basic construction of the tag system we know that at each step the (uncompressed) string either increases or decreases in length by one element depending on whether its first element is 1 or 0.
But if we just picked “increase” or “decrease” at random, here are two typical examples of the ordinary random walks we’d get:
(SeedRandom[#]; ListStepPlot[Accumulate[RandomChoice[{-1, 1}, 2000]], Frame -> True, Filling -> Axis, AspectRatio -> 1/3, ImageSize -> 300, PlotStyle -> Hue[0.07, 1, 1]]) & /@ {3442, 3447}
One very obvious difference from our tag system case is that these walks can go below 0, whereas in the tag system case, once one’s reached something at least close to 0 (corresponding to a cycle), the walk stops. (In a market analogy, the time series ends if there’s a “bankruptcy” in which the price hits 0.)
An important fact about random walks (at least in one dimension) is that with probability 1 they always eventually reach any particular value, like 0. So if our tag system behaved enough like a random walk, we might have an argument that it must “terminate with probability 1” (whatever that might mean given its discrete set of possible initial conditions).
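As a small numerical illustration of this recurrence property (a Python sketch of my own, with a fixed seed for reproducibility), one can simulate an ensemble of ±1 walks started at x = 5 and check that nearly all of them hit 0 well within 10,000 steps:

```python
import random

def first_passage(x, max_steps, rng):
    """Steps for a ±1 random walk started at x to first hit 0 (None if it doesn't)."""
    for t in range(1, max_steps + 1):
        x += rng.choice((-1, 1))
        if x == 0:
            return t
    return None

rng = random.Random(0)  # fixed seed for reproducibility
times = [first_passage(5, 10_000, rng) for _ in range(2000)]
hit_fraction = sum(t is not None for t in times) / len(times)
# Theory predicts a survival probability of roughly x*sqrt(2/(pi*t)) ≈ 0.04
# at t = 10,000, so the vast majority of walks should have stopped by then.
assert hit_fraction > 0.9
```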
But how similar can the sequence generated by a tag system actually be to an ordinary random walk? An important fact is that—beyond its initial condition—any tag system sequence must always consist purely of concatenations of the blocks 00 and 1101, or in other words, the sequence must be defined by a path through the finite automaton:
And from this we can see that—while all 2-grams and 3-grams can occur—the 4-grams 1111, 1100, 0101 and 0010 can never occur. In addition, if we assume that 0s and 1s occur with equal probability at the beginning of the string, then the blocks 00 and 1101 occur with equal probability, but the 3-grams 000, 011 occur with double the probabilities of the others.
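This is easy to verify exhaustively. Here is a small Python sketch (my own check, not from the original notebook) that concatenates all short sequences of 00/1101 blocks and tabulates which 4-grams ever occur:

```python
from itertools import product

BLOCKS = ("00", "1101")

# Concatenate every sequence of 6 blocks and collect all 4-grams that occur
seen = set()
for combo in product(BLOCKS, repeat=6):
    s = "".join(combo)
    seen.update(s[i:i + 4] for i in range(len(s) - 3))

all_4grams = {"".join(g) for g in product("01", repeat=4)}
forbidden = all_4grams - seen
# Exactly the four 4-grams named in the text can never occur
assert forbidden == {"1111", "1100", "0101", "0010"}
# ...leaving 12 possible 4-grams, matching the m-gram counts quoted below
assert len(seen) == 12
```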
In general the numbers of possible m-grams for successive m are 2, 4, 8, 12, 15, 20, 25, 33, 41, … or for all m ≥ 3:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Sum[Fibonacci[Ceiling[i/2 + 2]], {i, 1, m}] + 5 == If[EvenQ[m], 2 Fibonacci[m/2 + 4], Fibonacci[(m + 11)/2]] - 1
Asymptotically this count grows like φ^(m/2) (with φ the golden ratio)—implying a limiting set entropy of (1/2) log2 φ ≈ 0.35 per element. The relative frequencies of the m-grams that appear (other than 0000…) take only a limited set of simple rational values. The following lists for each m the number of m-grams that appear at given multiplicities (as obtained from Flatten[DeBruijnSequence[{{0,0},{1,1,0,1}},m]]):
(This implies a “p log p” measure entropy below 0.1 per element.)
So what happens in actual tag system sequences? Once clear of the initial conditions, they seem to follow these probabilistic (“mean-field theory”) estimates quite accurately, though with various fluctuations. In general, the results are quite different from those for a pure ordinary random walk with every element independent, but in agreement with the estimates for a “00, 1101 random walk”.
Another difference from an ordinary random walk is that our walks end whenever they reach a cycle—and we saw above that there are an infinite number of cycles, of progressively greater sizes. But the density of such “trap” states is small: among all size-n strings, only a small fraction of them lie on cycles.
The standard theory of random walks says, however, that in the limit of infinitely large strings and long walks, if there is indeed a random process underneath, these things will not matter: we’ll have something that is in the same universality class as the ordinary ±1 random walk, with the same largescale statistical properties.
But what about our tag systems that survive billions of steps before hitting 0? Could genuine random walks plausibly survive that long? The standard theory of first passage times (or “stopping times”) tells us that the probability for a random walk starting at 0 to first reach x (or, equivalently, for a walk starting at x to reach 0) at time t is:
P(t) = (x exp(-x^2/(2 t)))/Sqrt[2 \[Pi] t^3]
This shows the probability of starting from x and first reaching 0 as a function of the number of steps:
Off[General::munfl]; Plot[ Evaluate[Table[ If[x < 4, Callout, #1 &][(E^(-x^2/(2 t)) x)/( Sqrt[2 \[Pi]] Sqrt[t^3]), x], {x, 5}]], {t, 0, 1000}, ScalingFunctions -> {"Log", "Log"}, AspectRatio -> 1/3, Frame -> True, Axes -> False]
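One can check numerically where the maximum of this first-passage density falls (a Python sketch of my own, directly evaluating the formula above on a grid):

```python
import math

def first_passage_density(x, t):
    """P(t) = x*exp(-x^2/(2t)) / sqrt(2*pi*t^3), the first-passage density above."""
    return x * math.exp(-x**2 / (2 * t)) / math.sqrt(2 * math.pi * t**3)

x = 5
ts = [0.01 * k for k in range(1, 5001)]   # grid over t
t_mode = max(ts, key=lambda t: first_passage_density(x, t))
# Differentiating the density shows the maximum falls at t = x^2/3
assert abs(t_mode - x**2 / 3) < 0.05
```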
The most likely stopping time is t = x^2/3, but there is a long tail, and the probability of surviving for a time longer than t is:
erf(x/Sqrt[2 t]) \[TildeTilde] Sqrt[2/(\[Pi] t)] x 
How does this potentially apply to our systems? Assume we start from a string of (compressed) length n. This implies that the probability to survive for t steps (before “reaching x = 0”) is about n Sqrt[2/(π t)]. But there are 3 × 2^n possible strings of length n. So we can roughly estimate that one of them might survive for on the order of n^2 4^n steps, or in any case a number of steps that increases roughly exponentially with n.
And our results for “longest-so-far winners” above do in fact show roughly exponential increase with n (the dotted line is ≈ 4^(0.75 n)):
Show[ListPlot[{{4, 419}, {6, 2141}, {9, 24552}, {12, 253456}, {13, 341992}, {15, 20858069}, {15, 357007576}, {20, 2586944104}, {22, 2910925472}, {24, 50048859310}, {25, 202880696061}}, ScalingFunctions -> "Log", Frame -> True], Plot[4^(.75 n), {n, 1, 25}, ScalingFunctions -> "Log", PlotStyle -> Directive[LightGray, Dotted]]]
We can do a more detailed comparison with random walks by looking at the complete distribution of halting (AKA stopping) times for tag systems. Here are the results for all n = 15 and n = 25 initial strings:
Plotting these on a log scale, we get:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; GraphicsRow[ ListStepPlot[Transpose[{Most[#1], #2} & @@ #2], Frame > True, Filling > Axis, ScalingFunctions > "Log", PlotRange > {{1, #1}, Automatic}, PlotStyle > Hue[0.07, 1, 1], FrameTicks > {{None, None}, {Thread[{Range[2, 10, 2], {"\!\(\*SuperscriptBox[\(10\), \(2\)]\)", "\!\(\*SuperscriptBox[\(10\), \(4\)]\)", "\!\(\*SuperscriptBox[\(10\), \(6\)]\)", "\!\(\*SuperscriptBox[\(10\), \(8\)]\)", "\!\(\*SuperscriptBox[\(10\), \(10\)]\)"}}], None}}] & @@@ {{5, hist15}, {9, {{ 9/10, 23/25, 47/50, 24/25, 49/50, 1, 51/50, 26/25, 53/50, 27/25, 11/10, 28/25, 57/50, 29/25, 59/50, 6/5, 61/50, 31/25, 63/50, 32/25, 13/10, 33/25, 67/50, 34/25, 69/50, 7/5, 71/50, 36/25, 73/50, 37/25, 3/2, 38/25, 77/50, 39/25, 79/50, 8/5, 81/50, 41/25, 83/50, 42/25, 17/10, 43/25, 87/50, 44/25, 89/50, 9/5, 91/50, 46/25, 93/50, 47/25, 19/10, 48/25, 97/50, 49/25, 99/50, 2, 101/50, 51/25, 103/50, 52/25, 21/10, 53/25, 107/50, 54/25, 109/50, 11/5, 111/50, 56/25, 113/50, 57/25, 23/10, 58/25, 117/50, 59/25, 119/50, 12/5, 121/50, 61/25, 123/50, 62/25, 5/2, 63/25, 127/50, 64/25, 129/50, 13/5, 131/50, 66/25, 133/50, 67/25, 27/10, 68/25, 137/50, 69/25, 139/50, 14/5, 141/50, 71/25, 143/50, 72/25, 29/10, 73/25, 147/50, 74/25, 149/50, 3, 151/50, 76/25, 153/50, 77/25, 31/10, 78/25, 157/50, 79/25, 159/50, 16/5, 161/50, 81/25, 163/50, 82/25, 33/10, 83/25, 167/50, 84/25, 169/50, 17/5, 171/50, 86/25, 173/50, 87/25, 7/2, 88/25, 177/50, 89/25, 179/50, 18/5, 181/50, 91/25, 183/50, 92/25, 37/10, 93/25, 187/50, 94/25, 189/50, 19/5, 191/50, 96/25, 193/50, 97/25, 39/10, 98/25, 197/50, 99/25, 199/50, 4, 201/50, 101/25, 203/50, 102/25, 41/10, 103/25, 207/50, 104/25, 209/50, 21/5, 211/50, 106/25, 213/50, 107/25, 43/10, 108/25, 217/50, 109/25, 219/50, 22/5, 221/50, 111/25, 223/50, 112/25, 9/2, 113/25, 227/50, 114/25, 229/50, 23/5, 231/50, 116/25, 233/50, 117/25, 47/10, 118/25, 237/50, 119/25, 239/50, 24/5, 241/50, 121/25, 243/50, 122/25, 49/10, 
123/25, 247/50, 124/25, 249/50, 5, 251/50, 126/25, 253/50, 127/25, 51/10, 128/25, 257/50, 129/25, 259/50, 26/5, 261/50, 131/25, 263/50, 132/25, 53/10, 133/25, 267/50, 134/25, 269/50, 27/5, 271/50, 136/25, 273/50, 137/25, 11/2, 138/25, 277/50, 139/25, 279/50, 28/5, 281/50, 141/25, 283/50, 142/25, 57/10, 143/25, 287/50, 144/25, 289/50, 29/5, 291/50, 146/25, 293/50, 147/25, 59/10, 148/25, 297/50, 149/25, 299/50, 6, 301/50, 151/25, 303/50, 152/25, 61/10, 153/25, 307/50, 154/25, 309/50, 31/5, 311/50, 156/25, 313/50, 157/25, 63/10, 158/25, 317/50, 159/25, 319/50, 32/5, 321/50, 161/25, 323/50, 162/25, 13/2, 163/25, 327/50, 164/25, 329/50, 33/5, 331/50, 166/25, 333/50, 167/25, 67/10, 168/25, 337/50, 169/25, 339/50, 34/5, 341/50, 171/25, 343/50, 172/25, 69/10, 173/25, 347/50, 174/25, 349/50, 7, 351/50, 176/25, 353/50, 177/25, 71/10, 178/25, 357/50, 179/25, 359/50, 36/5, 361/50, 181/25, 363/50, 182/25, 73/10, 183/25, 367/50, 184/25, 369/50, 37/5, 371/50, 186/25, 373/50, 187/25, 15/2, 188/25, 377/50, 189/25, 379/50, 38/5, 381/50, 191/25, 383/50, 192/25, 77/10, 193/25, 387/50, 194/25, 389/50, 39/5, 391/50, 196/25, 393/50, 197/25, 79/10, 198/25, 397/50, 199/25, 399/50, 8, 401/50, 201/25, 403/50, 202/25, 81/10, 203/25, 407/50, 204/25, 409/50, 41/5, 411/50, 206/25, 413/50, 207/25, 83/10, 208/25, 417/50, 209/25, 419/50, 42/5, 421/50, 211/25, 423/50, 212/25, 17/2, 213/25, 427/50, 214/25, 429/50, 43/5, 431/50, 216/25, 433/50, 217/25, 87/10, 218/25, 437/50, 219/25, 439/50, 44/5, 441/50, 221/25, 443/50, 222/25, 89/10, 223/25, 447/50, 224/25, 449/50, 9, 451/50, 226/25, 453/50, 227/25, 91/10, 228/25, 457/50, 229/25, 459/50, 46/5, 461/50, 231/25, 463/50, 232/25, 93/10, 233/25, 467/50, 234/25, 469/50, 47/5, 471/50, 236/25, 473/50, 237/25, 19/2, 238/25, 477/50, 239/25, 479/50, 48/5, 481/50, 241/25, 483/50, 242/25, 97/10, 243/25, 487/50, 244/25, 489/50, 49/5, 491/50, 246/25, 493/50, 247/25, 99/10, 248/25, 497/50, 249/25, 499/50, 10, 501/50, 251/25, 503/50, 252/25, 101/10, 253/25, 507/50, 
254/25, 509/50, 51/5, 511/50, 256/25, 513/50, 257/25, 103/10, 258/25, 517/50, 259/25, 519/50, 52/5, 521/50, 261/25, 523/50, 262/25, 21/2, 263/25, 527/50, 264/25, 529/50, 53/5, 531/50, 266/25, 533/50, 267/25, 107/10, 268/25, 537/50, 269/25, 539/50, 54/5, 541/50, 271/25, 543/50, 272/25, 109/10, 273/25, 547/50, 274/25, 549/50, 11, 551/50, 276/25, 553/50, 277/25, 111/10, 278/25, 557/50, 279/25, 559/50, 56/5, 561/50, 281/25, 563/50, 282/25, 113/10, 283/25}, CompressedData[" 1:eJy1lA9M1VUUxz+/3+893ns8eLwHjz8PQ0IQSERE/mhRSZIbkoBCjAmYoEli yr/+MCij1Yp0hlRzmYvVzCZOoiFDUsM2/2xRNDbX2qxgZRTFmkZJw0zWeTwI yIJq82xn53vPPfecc8+554ZsLMsqVQCTCksN/G8KD5jdZlPu3+s/bbpRV7Nv +vq7wUmcskf5E9fnq3iILN2v8YHcYYM71C3TcT4WzHfpGXpE5JNu3J1g4K2t CtVXDHS+ZuRCiEpvi4ltI+5cjDRT0KFhHzUzt8qT/Lct7Kjwou6YFa3Cxqmi AD5aaGVJhJWj7RayVD9GVlloeMyH5DPehP/swaNtPszts+OGnUPXfLi804Yj w0ZmsC/2Z228UOLLyVgbxa/7M/iSP4nzAjjSEkBJYhBDvg5sNYFkmQI4/oQf jjw7gf2+LG21US74jbAQIrIdRHTYKWidQ/AhB8832Sh8z8b2TE+CVptxHDST 0+SBbchMaqMHC1oE/24i189A9eNGmqwGFg8YCb9sZt8KMzsq3Xg63sCZAQc1 70Sxck4woSkOBmK9eE4z0RdgJKhfx6VXNKqbVCri3DjRYySpS8fhKCN8pmev WePooMbnpSpfr9BIeEoj6rRGfafGlVA9frUqKQMKZbUKqWtViqIVfjQoeI/K 8SAdbiUW3vzWnTWvqjTnKVgrTVikn/pl4i9cYXWoQrf07NY6hej3pbdtCpYN 0H1WZbgTrq9SuTRfY1GmRu1v0HhSeuylYItT+GodbI2BvoMKOYIr71RJ74Fr aQodD8OvF+D0ehjNgOYcUOMhOx+WLwEfPzgufg4sgC+3Qbmc/zBRoTZP7OdD tEyMPhUiz8F2k4LPYYWuEDhxj9gHwSJ5g01V4icJPjZDifi86gZpVvhlWEdB FOR6wu5I2Fyl0L5Q8rLLDIbBQ1U6NkncbxYrxEgMPy/IkLyPXdXxiZwZkae/ fjOs84Y7TJAsrUgrBIfE2r8Rzop9sSbrPdAgMRPErqQXvpcZGZJ1lgVWip9z kmOgc67k/oE6KBS/2emwU2aoWfSSFveLbJW5bhdcL7hyDfSL75dvYWzmtoht ieydF5wjuXzh/Adk3SicKnF2yf5u4SQ99Eq8taIXNQekJvLExmie6EKdMYXf lZxvFxkn+lPCRcIXpSaFkl+5YIHcJxwzPvvXbXBEzncJ/kFkmOTlL7YvCt4r OgnLxE+hjeOJdYOwhGKX8E+yuUVkj861myU88RUmC0ubaBd+RjjR3aV3Wjrz DFNccYqd9V3u8ukkueJYjZx2Vib1U76xaeSsg/uEYyZjTOTuJP9xee/4nq/w bX/xUzYu5QlhnKJ/8B/iOik+ehIbp6dwAw1rrjw8p+geUFz3ktKTPsPhNFw9 jJ3B/1TSjUv9FF2bxyTuVpmVZrrLfyVtdpObTv+2djeDnLX8A7Tp63c= "]}}}] 
showing at least a rough approximation to the behavior expected for a random walk.
In making distributions like these, we’re putting together all the initial strings of length n, and asking about the statistical properties of this ensemble. But we can also imagine seeing whether initial strings with particular properties consistently behave differently from others. This shows the distribution of halting times as a function of the number of 1s in the initial string (here for n = 20); no strong correlations are seen, even though at least at the beginning the presence of 1s leads to growth:
How should we think about what we’re seeing? To me it in many ways just seems a typical manifestation of the ubiquitous phenomenon of computational irreducibility. Plenty of systems show what seems like random walk behavior. Even in rule 30, for example, the dividing line between regularity and randomness appears to follow a (biased) random walk:
If we changed the initial conditions, we’d get a different random walk. But in all cases, we can think of the evolution of rule 30 as intrinsically generating apparent randomness, “seeded” by its initial conditions.
Even more directly analogous to our tag system are cellular automata whose boundaries show apparent randomness. An example is the k = 2, r = 3/2 rule 7076:
ArrayPlot[CellularAutomaton[{7076, 2, 3/2}, {{1}, 0}, #], ImageSize -> 300] & /@ {100, 400}
Will this pattern go on growing forever, or will it eventually become very narrow, and either enter a cycle or terminate entirely? This is analogous to asking whether our tag system will halt.
There are other cellular automata that show even more obvious examples of these kinds of questions. Consider the k = 3, r = 1 totalistic code 1329 cellular automaton. Here is its behavior for a sequence of simple initial conditions. In some cases the pattern dies out (“it halts”); in some cases it evolves to a (rather elaborate) period-78 cycle. And in one case here it evolves to a period-7 cycle:
GraphicsRow[ Table[ArrayPlot[ CellularAutomaton[{1329, {3, 1}, 1}, {IntegerDigits[i, 3], 0}, {220, {13, 13}}], ColorRules -> {0 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.6, .9, .7]}], {i, 1, 64, 3}]]
But is this basically all that can happen? No. Here are the various persistent structures that occur with the first 10,000 initial conditions—and we see that in addition to getting ordinary “cycles”, we also get “shift cycles”:
Row[ArrayPlot[CellularAutomaton[{1329, {3, 1}, 1}, {#Cells, 0}, 200], ImageSize -> {Automatic, 250}, ColorRules -> {0 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.6, .9, .7]}] & /@ Normal[Take[ResourceData["728d1c0788924673bab3d889cc6c4623"], 7]], Spacer[7]]
But if we go a little further, there’s another surprise: initial condition 54,889 leads to a structure that just keeps growing forever—while initial condition 97,439 also does this, but in a much more trivial way:
GraphicsRow[ ArrayPlot[ CellularAutomaton[{1329, {3, 1}, 1}, {IntegerDigits[#, 3], 0}, 1000], ColorRules -> {0 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.6, .9, .7]}, ImageSize -> {Automatic, 400}] & /@ {54889, 97439}]
In our tag system, the analog of these might be particular strings that produce patterns that “obviously grow forever”.
One might think that there could be a fundamental difference between a cellular automaton and a tag system. In a cellular automaton the rules operate in parallel, in effect connecting a whole grid of neighboring cells, while in a tag system the rules only specifically operate on the very beginning and end of each string.
But to see a closer analogy we can consider every update in the tag system as an “event”, then draw a causal graph that shows the relationships between these events. Here is a simple case:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{evol = TSDirectEvolveList[IntegerDigits[18464, 2, 15], 25]}, Show[ArrayPlot[PadRight[evol, Automatic, .1], Mesh > True, Frame > False, MeshStyle > GrayLevel[0.9, 0.9], ColorRules > {0 > White, 1 > GrayLevel[.5]}], Graphics[{Hue[0, 1, 0.56], Opacity[0.2], Rectangle[{0, 0}, {3, Length[evol]}]}], MapIndexed[Graphics[{FaceForm[Opacity[0]], EdgeForm[ Hue[0.11, 1, 0.97]], Rectangle[{0, First[Length[evol]  #2 + 1]}, {1, First[Length[evol]  #2]}]}] &, evol], Rest[MapIndexed[ Graphics[{FaceForm[Opacity[0]], EdgeForm[Directive[Thick, Hue[0.11, 1, 0.97]]], Rectangle[{If[Quiet[First[First[evol[[#2  1]]]] == 0], Length[#1]  2, Length[#1]  4], First[Length[evol]  #2 + 1]}, {Length[#1], First[Length[evol]  #2]}]}] &, evol]], MapIndexed[ Graphics[{Hue[0, 1, 0.56], Thick, Arrowheads[Small], Arrow[BezierCurve[{{0, First[Length[evol] + 0.5  #2]}, {1, First[Length[evol]  #2]}, {0, First[Length[evol]  0.5  #2]}}]]}] &, Most[evol]], Module[{quo, rem, src}, {quo, rem} = Transpose[QuotientRemainder[(Length[#]  3), 3] & /@ evol]; MapIndexed[ If[First[#1] === 1, Switch[First[rem[[#2]]], 0, Graphics[{Hue[0, 1, 0.56], Thick, Arrowheads[Small], If[First[Length[evol]  (#2 + 1) + .5  quo[[#2]]] > 0, Arrow[{{Length[#1]  3, First[Length[evol]  (#2 + 1) + 0.5]}, {1, First[Length[evol]  (#2 + 1) + 0.5  quo[[#2]]]}}], Nothing], If[First[Length[evol]  (#2 + 1)  0.5  quo[[#2]]] > 0, Arrow[{{Length[#1]  3, First[Length[evol]  (#2 + 1) + 0.5]}, {1, First[Length[evol]  (#2 + 1)  0.5  quo[[#2]]]}}], Nothing]}], 1  2, Graphics[{Hue[0, 1, 0.56], Thick, Arrowheads[Small], If[First[Length[evol]  (#2 + 1)  0.5  quo[[#2]]] > 0, Arrow[{{Length[#1]  3, First[Length[evol]  (#2 + 1) + 0.5]}, {1, First[Length[evol]  (#2 + 1)  0.5  quo[[#2]]]}}], Nothing]}]], Switch[First[rem[[#2]]], 0, If[First[Length[evol]  (#2 + 1)  0.5  quo[[#2]]] > 0, Graphics[{Hue[0, 1, 0.56], Thick, Arrowheads[Small], Arrow[{{Length[#1]  3, First[Length[evol] 
 (#2 + 1) + 0.5]}, {1, First[Length[evol]  (#2 + 1) + 0.5  quo[[#2]]]}}]}], Nothing], 1, Nothing, 2, Graphics[{Hue[0, 1, 0.56], Thick, Arrowheads[Small], If[First[Length[evol]  (#2 + 1)  0.5  quo[[#2]]] > 0, Arrow[{{Length[#1]  3, First[Length[evol]  (#2 + 1) + 0.5]}, {1, First[Length[evol]  (#2 + 1)  0.5  quo[[#2]]]}}], Nothing]}]]] &, evol]], Drop[MapIndexed[ Graphics[{Hue[0, 1, 0.56], Thick, Arrowheads[Small], Arrow[{{1, First[Length[evol] + 0.5  #2]}, {If[ Quiet[First[evol[[#2  1]]] == 0], Length[#1]  1, Length[#1]  3], First[Length[evol]  0.5  #2]}}]}] &, Most[evol]], 1] ]] 
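A minimal Python sketch of this bookkeeping (illustrative only; the blog’s own code uses functions like TagSystemCausalGraph from the package loaded above) tracks which event produced each cell of the string, and records an edge whenever a later event consumes that cell:

```python
def causal_edges(bits, steps):
    """Run the 00/1101 tag system, returning causal edges between update events.

    producers[i] is the id of the event that created cell i
    (event 0 stands for the initial condition). Each event consumes
    the first 3 cells, so edges run producer -> consuming event.
    """
    s = list(bits)
    producers = [0] * len(s)
    edges = set()
    for event in range(1, steps + 1):
        if len(s) < 3:          # system has terminated
            break
        block = [1, 1, 0, 1] if s[0] == 1 else [0, 0]
        for p in producers[:3]:
            edges.add((p, event))   # consumed cells link their producers here
        s = s[3:] + block
        producers = producers[3:] + [event] * len(block)
    return edges

edges = causal_edges([1, 1, 0, 1, 0, 1, 1, 0, 1], 20)
# causality: every edge points forward in time
assert all(src < dst for src, dst in edges)
```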
Extracting the pure causal graph we get:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Graph[TagSystemCausalGraph @ With[{ system = PostTagSystem[{0, {1, 1, 0, 1, 0}}]}, system["State", #] & /@ Range[system["StateCount"]] ], AspectRatio -> 1.4]
For the string 4:14:0 which takes 419 steps to terminate, the causal graph is:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ggg = TagSystemCausalGraph @ With[{ system = PostTagSystem[{0, {1, 1, 1, 0}}]}, system["State", #] & /@ Range[system["StateCount"]]] 
Or laid out differently, and marking expansion (1→1101) and contraction (0→00) events with red and blue:
Here is the causal graph for the 2141-step evolution of 6:58:0
and what is notable is that despite the “spatial localization” of the underlying operation of the tag system, the causal graph in effect connects events in something closer to a uniform mesh.
When Emil Post was first studying tag systems a hundred years ago he saw them as the last hurdle in finding a systematic way to “solve all of mathematics”, and in particular to solve all problems in number theory. Of course, they turned out to be a very big hurdle. But having now seen how complex tag systems can be, it’s interesting to go back and connect again with number theory.
It’s straightforward to convert a tag system into something more obviously number theoretical. For example, if one represents each string of length n by a pair of integers {n,i} in which the binary digits of i give the elements of the string, then each step in the evolution can be obtained from:
TagStep[{n_, i_}] := With[{j = 2^(n - 1) FractionalPart[(8 i)/2^n]}, If[i < 2^(n - 1), {n - 1, j}, {n + 1, 4 j + 13}]]
Starting from the 4:14:0 initial condition (here represented in uncompressed form by {12, 2336}) the first few steps are then:
NestList[TagStep, {12, 2336}, 10]
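The arithmetic step can be cross-checked against a direct string implementation; here is a Python sketch (assuming the same 00/1101 rule discussed throughout):

```python
def tag_step_str(s):
    """One tag system step on an explicit bit list: read first bit, delete 3, append block."""
    return s[3:] + ([1, 1, 0, 1] if s[0] == 1 else [0, 0])

def tag_step_num(state):
    """The same step on a {length, value} pair, mirroring TagStep above."""
    n, i = state
    j = 4 * (i % 2 ** (n - 3))       # drop the top 3 binary digits, shift left 2
    return (n - 1, j) if i < 2 ** (n - 1) else (n + 1, 4 * j + 13)

# Start from {12, 2336}, the uncompressed form of 4:14:0
s = [int(b) for b in format(2336, "012b")]
state = (12, 2336)
for _ in range(30):
    s, state = tag_step_str(s), tag_step_num(state)
    n, i = state
    # the two representations stay in lockstep
    assert len(s) == n and int("".join(map(str, s)), 2) == i
```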
For compressed strings, the corresponding form is:
TagStep[{n_, i_, p_}] := With[{j = 2^n FractionalPart[i/2^(n - 1)]}, If[i < 2^(n - 1), {{n, j, 2}, {n - 1, j/2, 0}, {n, j, 1}}, {{n + 1, 2 j + 3, 1}, {n, j, 2}, {n, j + 1, 0}}][[p + 1]]]
There are different number theoretical formulations one can imagine, but a core feature is that at each step the tag system is making a choice between two arithmetic forms, based on some essentially arithmetic property of the number obtained so far. (Note that the type of condition we have given here can be further “compiled” into “pure arithmetic” by extracting it as a solution to a Diophantine equation.)
A widely studied system similar to this is the Collatz or 3n + 1 problem, which generates successive integers by applying the function:
n |-> If[EvenQ[n], n/2, 3 n + 1]
Starting, say, from 27, the sequence of numbers obtained is 27, 82, 41, 124, 62, 31, …
ListStepPlot[ NestList[n |-> If[EvenQ[n], n/2, 3 n + 1], 27, 120], Center, Frame -> True, AspectRatio -> 1/3, Filling -> Axis, PlotRange -> All, PlotStyle -> Hue[0.07, 1, 1]]
where after 110 steps the system reaches the cycle 4, 2, 1, 4, 2, 1, …. As a closer analog to the plots for tag systems that we made above, we can instead plot the lengths of the successive integers, represented in base 2:
ListStepPlot[ IntegerLength[#, 2] & /@ NestList[n |-> If[EvenQ[n], n/2, 3 n + 1], 27, 130], Center, Frame -> True, AspectRatio -> 1/3, Filling -> Axis, PlotRange -> All, PlotStyle -> Hue[0.07, 1, 1]]
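For readers following along without Wolfram Language, here is the same computation as a Python sketch:

```python
def collatz(n):
    """One step of the 3n + 1 iteration."""
    return n // 2 if n % 2 == 0 else 3 * n + 1

seq = [27]
while seq[-1] != 1:
    seq.append(collatz(seq[-1]))

assert seq[:6] == [27, 82, 41, 124, 62, 31]   # the opening values quoted above
# the trajectory ends by entering the 4, 2, 1 cycle
assert seq[-3:] == [4, 2, 1]
```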
The state transition graph starting from integers up to 10 is
With[{g = NestGraph[n |-> If[EvenQ[n], n/2, 3 n + 1], Range[10], 50]}, HighlightGraph[ Graph[g, VertexLabels -> Automatic, VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7]], {Style[ Subgraph[g, FindCycle[g, {1, Infinity}, All]], Thick, Hue[0.02, 0.92, 0.82]], Pick[VertexList[g], VertexOutDegree[g], 0]}]]
and up to 1000 it is:
With[{g = NestGraph[n |-> If[EvenQ[n], n/2, 3 n + 1], Range[1000], 10000, VertexStyle -> Hue[0.58, 0.65, 1], EdgeStyle -> Hue[0.58, 1, 0.7]]}, HighlightGraph[ g, {Style[Subgraph[g, FindCycle[g, {1, Infinity}, All]], Thickness[.01], Hue[0.02, 0.92, 0.82]], Pick[VertexList[g], VertexOutDegree[g], 0]}]]
Unlike for Post’s tag system, there is only one connected component (and one final cycle), and the “highways” are much shorter. For example, among the first billion initial conditions, the longest transient is just 986 steps. It occurs for the initial integer 670617279—which yields the following sequence of integer lengths:
ListStepPlot[ IntegerLength[#, 2] & /@ NestList[n |-> If[EvenQ[n], n/2, 3 n + 1], 670617279, 1100], Center, Frame -> True, AspectRatio -> 1/3, Filling -> Axis, PlotRange -> All, PlotStyle -> Hue[0.07, 1, 1]]
Despite a fair amount of investigation since the 1930s, it’s still not known whether the 3n + 1 problem always terminates on its standard cycle—though this is known to be the case for all integers up to about 2^68.
For Post’s tag system the most obvious probabilistic estimate suggests that the sequence of string lengths should follow an unbiased random walk. For the 3n + 1 problem, a similar analysis suggests a random walk with an average bias of about –0.14 binary digits per step, as suggested by this collection of walks from initial conditions 10^8 + k:
ListStepPlot[ Table[IntegerLength[#, 2] & /@ NestList[n |-> If[EvenQ[n], n/2, 3 n + 1], 10^8 + i, 200], {i, 0, 40}], Center, Frame -> True, AspectRatio -> 1/3, PlotRange -> All]
The rule (discussed in A New Kind of Science)
n |-> If[EvenQ[n], n/2, 5 n + 1]
instead implies a bias of +0.11 digits per step, and indeed most initial conditions lead to growth:
Function[{i}, ListStepPlot[ IntegerLength[#, 2] & /@ NestList[n |-> If[EvenQ[n], n/2, 5 n + 1], i, 200], Center, Frame -> True, AspectRatio -> 1/3, Filling -> Axis, PlotRange -> All, Epilog -> Inset[i, Scaled[{.1, .8}]], PlotStyle -> Hue[0.07, 1, 1]]] /@ {7, 37}
But there are still some that—even though they grow for a while—have “fluctuations” that cause them to “crash” and end up in cycles:
Function[{i}, ListStepPlot[ IntegerLength[#, 2] & /@ NestList[n |-> If[EvenQ[n], n/2, 5 n + 1], i, 100], Center, Frame -> True, AspectRatio -> .45, Filling -> Axis, PlotRange -> All, Epilog -> Inset[i, Scaled[{.9, .8}]], PlotStyle -> Hue[0.07, 1, 1]]] /@ {181, 613, 9818}
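The biases quoted here follow from a simple frequency argument: an odd step (multiplying by a) is always followed by an even step, so in the mean-field approximation one third of all steps are odd, each adding log2(a) binary digits, while even steps each remove one. A Python sketch of that estimate:

```python
import math

def mean_bias(a):
    """Mean-field drift in binary digits per step for n -> n/2 (even) / a*n+1 (odd).

    An odd step is always followed by an even one, so odd steps have
    stationary frequency 1/3; they add log2(a) bits, even steps remove 1.
    """
    return (math.log2(a) - 2) / 3

assert round(mean_bias(5), 2) == 0.11    # the +0.11 quoted for 5n + 1
assert round(mean_bias(3), 2) == -0.14   # slight negative drift for 3n + 1
```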
What is the “most unbiased” an + b system? If we consider mod 3 instead of mod 2, we have systems like:
n |-> {n, a1 n + b1, a2 n + b2}[[Mod[n, 3] + 1]]/3
We need a_i n + b_i to be divisible by 3 when n = i mod 3. In our approximation, the bias will be log2(a1 a2/27). This is closest to zero (with value +0.05) when the a_i are 4 and 7. An example of a possible iteration is then:
n |-> {n, 4 n + 2, 7 n + 1}[[Mod[n, 3] + 1]]/3
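As a quick Python sanity check (mine, not from the original notebook), the chosen coefficients do make every branch divisible by 3, and a length-5 cycle indeed exists:

```python
def step(n):
    """One step of the n -> {n, 4n+2, 7n+1}[[n mod 3 + 1]] / 3 iteration."""
    branch = (n, 4 * n + 2, 7 * n + 1)[n % 3]
    assert branch % 3 == 0           # divisibility holds by construction
    return branch // 3

# 2 -> 5 -> 12 -> 4 -> 6 -> 2 is one of the length-5 cycles
cycle = [2]
for _ in range(5):
    cycle.append(step(cycle[-1]))
assert cycle == [2, 5, 12, 4, 6, 2]
```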
Starting from a sequence of initial conditions this clearly shows less bias than the 3n + 1 case:
ListStepPlot[Table[IntegerLength[#, 2] & /@ NestList[n |-> {n, 4 n + 2, 7 n + 1}[[Mod[n, 3] + 1]]/3, 10^8 + i, 100], {i, 0, 40}], Center, Frame -> True, AspectRatio -> 1/3, PlotRange -> All]
Here are the halting times for initial conditions up to 1000:
ListStepPlot[ Transpose[ ParallelTable[Length /@ FindTransientRepeat[NestList[n |-> {n, 4 n + 2, 7 n + 1}[[Mod[n, 3] + 1]]/3, i, 5000], 3], {i, 1000}]], Center, PlotRange -> {0, 4000}, PlotLayout -> "Stacked", Joined -> True, Filling -> Automatic, Frame -> True, AspectRatio -> 1/4, PlotStyle -> Hue[0.1, 1, 1]]
Most initial conditions quickly evolve to cycles of length 5 or 20. But initial condition 101 takes 2604 steps to reach the 20-cycle:
Function[{i}, ListStepPlot[IntegerLength[#, 2] & /@ NestList[n |-> {n, 4 n + 2, 7 n + 1}[[Mod[n, 3] + 1]]/3, i, 3000], Center, Frame -> True, AspectRatio -> 1/3, Filling -> Axis, PlotRange -> All, Epilog -> Inset[i, Scaled[{.06, .9}]], PlotStyle -> Hue[0.07, 1, 1]]] /@ {101, 469}
And initial condition 469 does not appear to reach a cycle at all—and instead appears to systematically grow at about 0.018 bits per step:
ListStepPlot[ MapIndexed[{1 + (First[#2] - 1)*1000, #} &, (IntegerLength[#, 2] & /@ NestList[Nest[n |-> {n, 4 n + 2, 7 n + 1}[[Mod[n, 3] + 1]]/3, #, 1000] &, 469, 1000])], Center, Frame -> True, AspectRatio -> 1/3, Filling -> Axis, PlotRange -> All, PlotStyle -> Hue[0.07, 1, 1]]
In other words, unlike the 3n + 1 problem—or our tag system—this iteration usually leads to a cycle, but just sometimes appears to “escape” and continue to increase, presumably forever.
(In general, for modulus m, the minimum bias will typically be of order m^–m, and the “smoothest” iterations will be ones whose multipliers involve similar-sized factors of numbers close to m^m. For m = 4, for example, {n, 3n – 3, 5n – 2, 17n + 1} is the best.)
One might wonder how similar our tag system—or the 3n + 1 problem—is to classic unsolved problems in number theory, like the Riemann Hypothesis. In essence the Riemann Hypothesis is an assertion about the statistical randomness of primes, normally stated in terms of complex zeroes of the Riemann zeta function, or equivalently, that all the maxima of RiemannSiegelZ[t] (for any value of t) lie above the axis:
Plot[RiemannSiegelZ[t], {t, 0, 400}, Frame -> True, AspectRatio -> 1/6, PlotPoints -> 500, PlotStyle -> Hue[0.07, 1, 1]]
But it’s known (thanks to extensive work by Yuri Matiyasevich) that an equivalent—much more obviously integer-related—statement is that
(2 n + 3)!!/15 - (2 n - 2)!! PrimePi[n]^2 ((BitLength[Fold[LCM, Range[n]]] - 1) Sum[(-1)^(k + 1)/k, {k, 1, n - 1}] - n)
is positive for all positive n. And this then turns out to be equivalent to the surprisingly simple statement that the iteration
NestWhile[Function[x, {2 x[[2]] x[[1]] - 4 (-1)^x[[2]] x[[5]], x[[2]] + 1, (x[[2]] + 1) x[[3]]/GCD[x[[2]] + 1, x[[3]]], If[GCD[x[[2]] + 1, x[[3]]] == 1, x[[4]] + 1, x[[4]]], x[[6]], (2 x[[2]] + 2) x[[6]], (2 x[[2]] + 5) x[[7]]}], {1, 1, 1, 0, 0, 1, 1}, Function[x, x[[7]] > x[[4]]^2 (x[[1]] (BitLength[x[[3]]] - 1) - x[[6]])]]
will never terminate.
For successive n the quantity above is given by:
Table[(2 n + 3)!!/15 - (2 n - 2)!! PrimePi[n]^2 ((BitLength[Fold[LCM, Range[n]]] - 1) Sum[(-1)^(k + 1)/k, {k, 1, n - 1}] - n), {n, 10}]
At least at the beginning the numbers are definitely positive, as the Riemann Hypothesis would suggest. But if we ask about the long-term behavior we can see something of the complexity involved by looking at the differences of successive ratios:
GraphicsRow[ListStepPlot[Differences[Ratios[Table[(2 n + 3)!!/15 - (2 n - 2)!! PrimePi[n]^2 ((BitLength[Fold[LCM, Range[n]]] - 1) Sum[(-1)^(k + 1)/k, {k, 1, n - 1}] - n), {n, #}]]], Frame -> True, PlotStyle -> Hue[0.07, 1, 1], AspectRatio -> 1/3] & /@ {100, 1000}]
The Riemann Hypothesis effectively says that there aren’t too many negative differences here.
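As a cross-check, the quantity above can be transcribed into Python using exact rational arithmetic (the helper names here are my own); per the statement in the text, it should come out positive for every n tested:

```python
from fractions import Fraction
from math import gcd

def double_factorial(n):
    # n!! = n (n-2) (n-4) ...; by convention 0!! = 1
    r = 1
    while n > 1:
        r, n = r * n, n - 2
    return r

def prime_pi(n):
    # pi(n): number of primes <= n (trial division; fine for small n)
    def is_prime(m):
        return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))
    return sum(1 for m in range(2, n + 1) if is_prime(m))

def lcm_up_to(n):
    r = 1
    for k in range(1, n + 1):
        r = r * k // gcd(r, k)
    return r

def q(n):
    # The Riemann-Hypothesis-related quantity from the text, exactly
    alt_harmonic = sum(Fraction((-1) ** (k + 1), k) for k in range(1, n))
    inner = (lcm_up_to(n).bit_length() - 1) * alt_harmonic - n
    return (Fraction(double_factorial(2 * n + 3), 15)
            - double_factorial(2 * n - 2) * prime_pi(n) ** 2 * inner)
```

For n = 1 the prime-counting factor vanishes and q(1) = 15!!-style leading term/15 = 1; the values then grow rapidly while staying positive.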
So far we’ve been talking specifically about Emil Post’s particular 0→00, 1→1101 tag system. But as Post himself observed, one can define plenty of other tag systems—including ones that involve not just 0 and 1 but any number of possible elements (Post called the number of possible elements μ, but I’ll call it k), and that delete not just 3 but any number of elements at each step (Post called this ν, but I’ll call it r).
It’s easy to see that rules which delete only one element at each step (r = 1) cannot involve real “communication” (or causal connections) between different parts of the string, and must be equivalent to neighborindependent substitution systems—so that they either have trivial behavior, or grow without bound to produce at most highly regular nested sequences. (0→01, 1→10 will generate the Thue–Morse string, while 0→01, 1→0 will generate the Fibonacci string.)
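These two substitutions are easy to check directly; here is a quick Python sketch (the names are mine):

```python
def substitute(rules, s, steps):
    # Apply a neighbor-independent substitution system `steps` times:
    # every element is rewritten in place, with no communication between positions
    for _ in range(steps):
        s = [x for c in s for x in rules[c]]
    return s

# Thue-Morse: 0 -> 01, 1 -> 10
thue_morse = substitute({0: [0, 1], 1: [1, 0]}, [0], 4)

# Fibonacci: 0 -> 01, 1 -> 0 (string lengths follow the Fibonacci numbers)
fib = substitute({0: [0, 1], 1: [0]}, [0], 7)
```

Both grow without bound, but only ever produce highly regular nested sequences.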
Things immediately get more complicated when two elements are deleted at each step (r = 2). Post correctly observed that with just 0 and 1 (k = 2) there are no rules that show the kind of sometimes-expanding, sometimes-contracting behavior of his 0→00, 1→1101 rule. But back in 2007—as part of a live experiment at our annual Summer School—I looked at the r = 2 rule 0→1, 1→110. Here’s what it does starting with 10:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[TSGDirectEvolveList[{2, {{1}, {1, 1, 0}}}, {1, 0}, 25], Automatic, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, 0.75]]
And here’s how the sequence of string lengths behaves:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[TagLengthFunction[{2, {{1}, {1, 1, 0}}}][{1, 0}, 60], Center, AspectRatio -> 1/3, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
If we assume that 0 and 1 appear randomly with certain probabilities, then a simple calculation shows that 1 should occur about √2 + 1 times as often as 0, and the string should grow by an average of √2 – 1 elements at each step. So “detrending” by this, we get:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[MapIndexed[# - (Sqrt[2] - 1) First[#2] &, TagLengthFunction[{2, {{1}, {1, 1, 0}}}][{1, 0}, 300]], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
Continuing for more steps we see a close approximation to a random walk:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[MapIndexed[# - (Sqrt[2] - 1) First[#2] &, TagLengthFunction[{2, {{1}, {1, 1, 0}}}][{1, 0}, 10000]], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
So just like with Post’s 0→00, 1→1101 rule—and, of course, with rule 30 and all sorts of other systems in the computational universe—we have here a completely deterministic system that generates what seems like randomness. And indeed among tag systems of the type we’re discussing here this appears to be the very simplest rule that shows this kind of behavior.
But does this rule show the same kind of growth from all initial conditions? It can show different random sequences, for example here for initial conditions 5:17 and 7:80:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[MapIndexed[# - (Sqrt[2] - 1) First[#2] &, TagLengthFunction[{2, {{1}, {1, 1, 0}}}][#, 300]], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]] & /@ {IntegerDigits[17, 2, 5], IntegerDigits[80, 2, 7]}
And sometimes it just immediately enters a cycle. But it has some “surprises” too. Like with initial condition 9:511 (i.e. 111111111) it grows not linearly, but like √t (shown here without any detrending):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[TagLengthFunction[{2, {{1}, {1, 1, 0}}}][{1, 1, 1, 1, 1, 1, 1, 1, 1}, 150], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
But what about a tag system that doesn’t seem to “typically grow forever”? When I was working on A New Kind of Science I studied generalized tag systems that don’t just look at their first element, but instead use the whole block of elements they’re deleting to determine what elements to add at the end (and so work in a somewhat more “cellular-automaton-style” way).
One particular rule that I showed in A New Kind of Science (as case (c) on page 94) is:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Text[Map[Row, {{0, 0} -> {0}, {1, 0} -> {1, 0, 1}, {0, 1} -> {0, 0, 0}, {1, 1} -> {0, 1, 1}}, {2}]]
Starting with 11 this rule gives
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[GCSSEvolveList[{2, {{0, 0} -> {0}, {1, 0} -> {1, 0, 1}, {0, 1} -> {0, 0, 0}, {1, 1} -> {0, 1, 1}}}, {1, 1}, 25], Automatic, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, 0.75]]
and grows for a while—but then terminates after 289 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ GCSSEvolveList[{2, {{0, 0} -> {0}, {1, 0} -> {1, 0, 1}, {0, 1} -> {0, 0, 0}, {1, 1} -> {0, 1, 1}}}, {1, 1}, 300], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
The corresponding generational evolution is:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[Reverse@Transpose[PadRight[GCSSGenerationEvolveList[{2, {{0, 0} -> {0}, {1, 0} -> {1, 0, 1}, {0, 1} -> {0, 0, 0}, {1, 1} -> {0, 1, 1}}}, {1, 1}, 35], {Automatic, 38}, .25]], Mesh -> True, MeshStyle -> GrayLevel[.75, .75], Frame -> False]
(Note that the kind of “phase decomposition” that we did for Post’s tag system doesn’t make sense for a block tag system like this.)
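A block tag system like this takes only a few lines of Python to simulate (this is my own minimal re-implementation, not the notebook's GCSSEvolveList):

```python
from collections import deque

# Block tag system: delete the first two elements, and use that whole
# deleted block (not just the first element) to choose what to append.
RULES = {(0, 0): [0], (1, 0): [1, 0, 1], (0, 1): [0, 0, 0], (1, 1): [0, 1, 1]}

def run_block_tag(init, max_steps):
    s = deque(init)
    for t in range(max_steps):
        if len(s) < 2:
            return t, list(s)    # terminated: fewer than 2 elements remain
        block = (s.popleft(), s.popleft())
        s.extend(RULES[block])
    return None, list(s)         # still going after max_steps

steps, final = run_block_tag([1, 1], 1000)
```

Starting from 11, the simulation confirms the termination after roughly 289 steps described above.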
Here are the lengths of the transients+cycles for possible initial conditions up to size 7:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{list = Catenate[Table[Tuples[{0, 1}, n], {n, 7}]]}, ListStepPlot[Transpose[(Length /@ FindTransientRepeat[GCSSEvolveList[{2, {{0, 0} -> {0}, {1, 0} -> {1, 0, 1}, {0, 1} -> {0, 0, 0}, {1, 1} -> {0, 1, 1}}}, #, 1000], 4]) & /@ list], Center, PlotStyle -> {Hue[0.1, 1, 1], Hue[0.02, 0.92, 0.82]}, PlotRange -> {0, 800}, PlotLayout -> "Stacked", Joined -> True, Filling -> Automatic, Frame -> True, AspectRatio -> 1/5, FrameTicks -> {{Automatic, None}, {Extract[MapThread[List[#1, Rotate[Style[StringJoin[ToString /@ #2], FontFamily -> "Roboto", Small], 90 Degree]] &, {Range[0, 253], list}], Position[list, Alternatives @@ Select[list, IntegerExponent[FromDigits[#, 2], 2] > Length[#]/2 && Length[#] > 1 &]]], None}}]]
This looks more irregular—and “livelier”—than the corresponding plot for Post’s tag system, but not fundamentally different. At size 5 the initial string 11010 (denoted 5:12) yields
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ GCSSEvolveList[{2, {{0, 0} -> {0}, {1, 0} -> {1, 0, 1}, {0, 1} -> {0, 0, 0}, {1, 1} -> {0, 1, 1}}}, {1, 1, 0, 1, 0}, 800], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
which runs for 706 steps before settling into a length-8 cycle. Going further one sees a sequence of progressively longer transients:
Text[Grid[Prepend[{Row[{#[[1, 1]], ":", #[[1, 2]]}], #[[2, 1]], #[[2, 2]]} & /@ {{2, 3} -> {288, 1}, {5, 12} -> {700, 8}, {6, 62} -> {4184, 1}, {8, 175} -> {20183, 8}, {9, 345} -> {26766, 1}, {9, 484} -> {51680, 8}, {10, 716} -> {100285, 1}, {10, 879} -> {13697828, 8}, {13, 7620} -> {7575189088, 1}, {17, 85721} -> {14361319032, 8}}, Style[#, Italic] & /@ {"initial state", "steps", "cycle length"}], Frame -> All, Alignment -> {{Left, Right, Right}}, FrameStyle -> GrayLevel[.7], Background -> {None, {GrayLevel[.9]}}]]
Like with Post’s tag system, this system always eventually reaches a cycle (or terminates)—at least for all initial strings up to size 17. What will happen for longer initial strings is not clear, and the greater “liveliness” of this system relative to Post’s suggests that if exotic behavior occurs, it will potentially do so for smaller initial strings than in Post’s system.
Another way to generalize Post’s 0→00, 1→1101 tag system is to consider not just elements 0, 1, but, say, 0, 1, 2 (i.e. k = 3). And in this case there is already complex behavior even with rules that consider just the first element, and delete two elements at each step (r = 2).
As an example, consider the rule:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; #1 -> Row[#2] & @@@ Thread[Range[0, 2] -> TakeList[IntegerDigits[76, 3, 6], {1, 2, 3}]]
Starting, say, with 101 this gives
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[TSGDirectEvolveList[{2, TakeList[IntegerDigits[76, 3, 6], {1, 2, 3}]}, IntegerDigits[10, 3, 3], 20], Automatic, .25], Mesh -> True, MeshStyle -> GrayLevel[.85, .75], ColorRules -> {0 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], -1 -> GrayLevel[.85]}]
which terminates after 74 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ TSGDirectEvolveList[{2, TakeList[IntegerDigits[76, 3, 6], {1, 2, 3}]}, IntegerDigits[10, 3, 3], 250], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
Here are the lengths of transients+cycles for this rule up to length-6 initial (ternary) strings:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; With[{r = 76, list = Catenate[Table[IntegerDigits[i, 3, n], {n, 1, 6}, {i, 0, 3^n - 1}]]}, ListStepPlot[Transpose[Last /@ Monitor[Flatten[Table[ParallelTable[{n, i} -> Length /@ FindTransientRepeat[Length /@ TSGDirectEvolveList[{2, TakeList[IntegerDigits[r, 3, 6], {1, 2, 3}]}, IntegerDigits[i, 3, n], 1000], 10], {i, 0, 3^n - 1}], {n, 6}]], n]], Center, PlotRange -> {0, 125}, PlotStyle -> {Hue[0.1, 1, 1], Hue[0.02, 0.92, 0.82]}, PlotLayout -> "Stacked", Joined -> True, Filling -> Automatic, Frame -> True, AspectRatio -> 1/5, FrameTicks -> {{Automatic, None}, {Extract[MapThread[List[#1, Rotate[Style[StringJoin[ToString /@ #2], FontFamily -> "Roboto", Small], 90 Degree]] &, {Range[0, 1091], list}], Position[list, Alternatives @@ Select[list, IntegerExponent[FromDigits[#, 3], 3] > Length[#]/2 && Length[#] =!= 3 && Length[#] > 1 &]]], None}}]]
The initial string 202020 (denoted 6:546, where now this indicates ternary rather than binary) terminates after 6627 steps
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ TSGDirectEvolveList[{2, TakeList[IntegerDigits[76, 3, 6], {1, 2, 3}]}, IntegerDigits[546, 3, 6], 10000], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
with (phase-reduced) generational evolution:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[Reverse@Transpose[PadRight[Take[#, 1 ;; -1 ;; 2] & /@ TSGGenerationEvolveList[{2, TakeList[IntegerDigits[76, 3, 6], {1, 2, 3}]}, IntegerDigits[546, 3, 6], 180], {Automatic, 95}, .25]], Frame -> False, ColorRules -> {0 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], -1 -> GrayLevel[.85]}]
And once again, the overall features of the behavior are very similar to Post’s system, with the longest halting times seen up to strings of length 14 being:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Text[Grid[Prepend[{DecimalStringForm[{#[[1, 1]], #[[1, 2]]}], #[[2, 1]], If[# == 0, Style[#, Gray], #] &@#[[2, 2]]} & /@ {{3, {0, 10}} -> {74, 0}, {5, {0, 91}} -> {122, 0}, {6, {0, 546}} -> {6627, 0}, {9, {0, 499}} -> {9353, 0}, {9, {0, 610}} -> {12789, 0}, {9, {0, 713}} -> {20175, 0}, {9, {0, 1214}} -> {175192, 0}, {9, {0, 18787}} -> {336653, 0}, {10, {0, 17861}} -> {519447, 0}, {10, {0, 29524}} -> {21612756, 6}, {10, {0, 52294}} -> {85446023, 0}, {11, {0, 93756}} -> {377756468, 6}, {12, {0, 412474}} -> {30528772851, 0}}, Style[#, Italic] & /@ {"initial state", "steps", "cycle length"}], Frame -> All, Alignment -> {{Left, Right, Right}}, FrameStyle -> GrayLevel[.7], Background -> {None, {GrayLevel[.9]}}]]
But what about other possible rules? As an example, we can look at all 90 possible k = 3, r = 2 rules of the form 0→_, 1→__, 2→___ in which the right-hand sides are “balanced” in the sense that in total they contain two 0s, two 1s and two 2s. This shows the evolution (for 100 steps) for each of these rules that has the longest transient for any initial string with fewer than 7 elements:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; GraphicsGrid[Partition[ParallelMap[ListStepPlot[Length /@ TSGDirectEvolveList[{2, TakeList[IntegerDigits[#[[1]], 3, 6], {1, 2, 3}]}, IntegerDigits[#[[2, 2]], 3, #[[2, 1]]], 100], Center, PlotRange -> {{0, 100}, Automatic}, AspectRatio -> 1/3, Filling -> Axis, Frame -> True, FrameTicks -> False, PlotStyle -> Hue[0.07, 1, 1]] &, {44 -> {5, 182}, 50 -> {6, 492}, 52 -> {3, 20}, 68 -> {2, 6}, 70 -> {5, 19}, 76 -> {6, 546}, 98 -> {3, 20}, 104 -> {3, 2}, 106 -> {5, 182}, 116 -> {5, 182}, 128 -> {6, 492}, 132 -> {5, 182}, 140 -> {6, 540}, 142 -> {5, 181}, 146 -> {4, 60}, 150 -> {5, 163}, 154 -> {3, 10}, 156 -> {5, 100}, 176 -> {6, 270}, 178 -> {6, 540}, 184 -> {6, 270}, 194 -> {5, 173}, 196 -> {6, 57}, 200 -> {5, 182}, 204 -> {6, 543}, 208 -> {5, 173}, 210 -> {6, 486}, 220 -> {5, 91}, 226 -> {5, 100}, 228 -> {5, 91}, 260 -> {5, 182}, 266 -> {6, 492}, 268 -> {5, 182}, 278 -> {5, 182}, 290 -> {6, 492}, 294 -> {5, 164}, 302 -> {6, 519}, 304 -> {6, 30}, 308 -> {6, 492}, 312 -> {6, 489}, 316 -> {6, 546}, 318 -> {6, 546}, 332 -> {6, 540}, 344 -> {6, 492}, 348 -> {5, 182}, 380 -> {6, 519}, 384 -> {6, 270}, 396 -> {6, 276}, 410 -> {5, 101}, 412 -> {6, 543}, 416 -> {6, 543}, 420 -> {6, 57}, 424 -> {6, 489}, 426 -> {5, 164}, 434 -> {6, 273}, 438 -> {6, 513}, 450 -> {6, 543}, 460 -> {6, 516}, 462 -> {5, 99}, 468 -> {6, 30}, 500 -> {6, 546}, 502 -> {5, 181}, 508 -> {6, 6}, 518 -> {5, 99}, 520 -> {6, 516}, 524 -> {6, 543}, 528 -> {5, 99}, 532 -> {3, 9}, 534 -> {6, 546}, 544 -> {5, 181}, 550 -> {6, 519}, 552 -> {5, 181}, 572 -> {6, 540}, 574 -> {5, 181}, 578 -> {3, 10}, 582 -> {5, 172}, 586 -> {6, 546}, 588 -> {6, 513}, 596 -> {5, 180}, 600 -> {5, 18}, 612 -> {6, 546}, 622 -> {6, 519}, 624 -> {6, 513}, 630 -> {6, 519}, 652 -> {6, 270}, 658 -> {5, 19}, 660 -> {6, 540}, 676 -> {6, 57}, 678 -> {6, 297}, 684 -> {6, 30}}], 6]]
Many lead quickly to cycles or termination. Others after 100 steps seem to be growing irregularly, but all the specific evolutions shown here eventually halt. There are peculiar cases, like 0→0, 1→02, 2→112 which precisely repeats the initial string 20 after 18,255 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ TSGDirectEvolveList[{2, TakeList[IntegerDigits[68, 3, 6], {1, 2, 3}]}, IntegerDigits[6, 3, 2], 40000], Center, AspectRatio -> 1/5, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
And then there are cases like 0→0, 1→01, 2→212, say starting from 200020, which either halt quickly, or generate strings of ever-increasing length and can easily be seen never to halt:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ TSGDirectEvolveList[{2, TakeList[IntegerDigits[50, 3, 6], {1, 2, 3}]}, IntegerDigits[492, 3, 6], 100], Center, AspectRatio -> 1/3, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
(By the way, the situation with “non-balanced” k = 3 rules is not fundamentally different from balanced ones; 0→0, 1→22, 2→102, for example, shows very “Post-like” behavior.)
The tag systems we’ve been discussing are pretty simple. But an even simpler version considered in A New Kind of Science is what I called cyclic tag systems. In a cyclic tag system one removes the first element of the string at each step. On successive steps, one cycles through a collection of possible blocks to add, appending the current block if the deleted element was a 1 (and otherwise adding nothing).
If the possible blocks to add are 111 and 0, then the behavior starting from the string 1 is as follows
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[CTEvolveList[{{1, 1, 1}, {0}}, {1}, 25], {Automatic, 18}, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, 0.75]]
with the lengths “detrended by t/2” behaving once again like an approximate random walk:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[MapIndexed[# - First[#2]/2 &, Length /@ CTEvolveList[{{1, 1, 1}, {0}}, {1}, 20000]], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
With cycles of just 2 blocks, one typically sees either quick cycling or termination, or what seems like obvious infinite growth. But if one allows a cycle of 3 blocks, more complicated halting behavior becomes possible.
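In Python, a cyclic tag system takes only a few lines (a sketch of my own; not the notebook's CTEvolveList):

```python
from collections import deque
from itertools import cycle

def cyclic_tag(blocks, init, max_steps):
    # Delete the first element at each step; if it was 1, append the
    # current block; either way advance to the next block in the cycle.
    s, blk = deque(init), cycle(blocks)
    for t in range(max_steps):
        b = next(blk)
        if not s:
            return t, []         # halted: the string became empty
        if s.popleft() == 1:
            s.extend(b)
    return None, list(s)         # still running after max_steps

# The 01, 0, 011 system from 0111 halts (after 169 steps per the text below)
halt_t, _ = cyclic_tag([[0, 1], [0], [0, 1, 1]], [0, 1, 1, 1], 1000)
# The 111, 0 system from 1 keeps growing at an average of 1/2 element per step
grow_t, grow_s = cyclic_tag([[1, 1, 1], [0]], [1], 5000)
```

The halting example with 3 blocks and the growing example with 2 blocks illustrate the contrast described above.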
Consider for example 01, 0, 011. Starting from 0111 one gets
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[CTEvolveList[{{0, 1}, {0}, {0, 1, 1}}, {0, 1, 1, 1}, 20], {Automatic, 8}, .25], Mesh -> True, MeshStyle -> GrayLevel[0.75, 0.75]]
with the system halting after 169 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ CTEvolveList[{{0, 1}, {0}, {0, 1, 1}}, {0, 1, 1, 1}, 200], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]]
Here are the transient+cycle times for initial strings up to size 8 (the system usually just terminates, but for example 001111 goes into a cycle of length 18):
With[{r = {{0, 1}, {0}, {0, 1, 1}}, list = Catenate[Table[IntegerDigits[i, 2, n], {n, 1, 8}, {i, 0, 2^n - 1}]]}, ListStepPlot[Transpose[Last /@ Monitor[Flatten[Table[ParallelTable[{n, i} -> Length /@ FindTransientRepeat[CTLengthList[r, IntegerDigits[i, 2, n], 800], 3], {i, 0, 2^n - 1}], {n, 8}]], n]], Center, PlotRange -> {0, 500}, PlotStyle -> {Hue[0.1, 1, 1], Hue[0.02, 0.92, 0.82]}, PlotLayout -> "Stacked", Joined -> True, Filling -> Automatic, Frame -> True, AspectRatio -> 1/5, FrameTicks -> {{Automatic, None}, {Extract[MapThread[List[#1, Rotate[Style[StringJoin[ToString /@ #2], FontFamily -> "Roboto", Small], 90 Degree]] &, {Range[0, 509], list}], Position[list, Alternatives @@ Select[list, IntegerExponent[FromDigits[#, 2], 2] > Length[#]/1.5 && Length[#] > 2 &]]], None}}]]
The behavior of the longest-to-halt-so-far “winners” is again similar to what we have seen before—except perhaps for the rather huge jump in halting time at size 13, which isn’t surpassed until size 16:
Text[Grid[Prepend[MapIndexed[{Style[Row[{#[[1, 1]], ":", #[[1, 2]]}], If[First[#2] > 6, Gray, Black]], Style[#[[2]], If[First[#2] > 6, Gray, Black]]} &, {{1, 1} -> 59, {4, 7} -> 169, {5, 21} -> 1259, {7, 126} -> 6470, {10, 687} -> 134318, {13, 7655} -> 10805957330 (* , {13, 7901} -> 180044, {13, 7903} -> 2431313, {14, 12270} -> 7490186, {16, 14999} -> 3367712, {16, 15055} -> 12280697, {16, 43961} -> 27536759 *)}], Style[#, Italic] & /@ {"initial state", "steps"}], Frame -> All, Alignment -> {{Left, Right, Right}}, FrameStyle -> GrayLevel[.7], Background -> {None, {GrayLevel[.9]}}]]
When Post originally invented tag systems in 1920 he intended them as a string-based idealization of the operations in mathematical proofs. But a decade and a half later, once Turing machines were known, it started to become clear that tag systems were better framed as computational systems. And by the 1940s it was known that at least in principle string-rewriting systems of the kind Post used were capable of doing exactly the same kinds of computations as Turing machines—or, as we would say now, that they were computation universal.
At first what was proved was that a fairly general string-rewriting system was computation universal. But by the early 1960s it was known that a tag system that looks only at its first element is also universal. And in fact it’s not too difficult to write a “compiler” that takes any Turing machine rule and converts it to a tag system rule—and page 670 of A New Kind of Science is devoted to showing a pictorial example of how this works.
For example we can take the simplest universal Turing machine (which has 2 states and 3 colors) and compile it into a 2-element-deletion tag system with 32 possible elements (the ones above 9 represented by letters).
But what about a tag system like Post’s 0→00, 1→1101 one—with much simpler rules? Could it also be universal?
Our practical experience with computers might make us think that to get universality we would necessarily have to have a system with complicated rules. But the surprising conclusion suggested by the Principle of Computational Equivalence is that this is not correct—and that instead essentially any system whose behavior is not obviously simple will actually be capable of universal computation.
For any particular system it’s usually extremely difficult to prove this. But we now have several examples that seem to validate the Principle of Computational Equivalence—in particular the rule 110 cellular automaton and the 2,3 Turing machine. And this leads us to the conjecture that even tag systems with very simple rules (at least ones whose overall behavior is not obviously simple) should also be computation universal.
How can we get evidence for this? We might imagine that we could see a particular tag system “scanning over” a wide range of computations as we change its initial conditions. Of course, computation universality just says that it must be possible to construct an initial condition that performs any given computation. And it could be that to perform any decently sophisticated computation would require an immensely complex initial condition, that would never be “found naturally” by scanning over possible initial conditions.
But the Principle of Computational Equivalence actually goes further than just saying that all sorts of systems can in principle do sophisticated computations; it says that such computations should be quite ubiquitous among possible initial conditions. There may be some special initial conditions that lead to simple behavior. But other initial conditions should produce behavior that corresponds to a computation that is in a sense as sophisticated as any other computation.
And a consequence of this is that the behavior we see will typically be computationally irreducible: that in general there will be no way to compute its outcome much more efficiently than just by following each of its steps. Or, in other words, when we observe the system, we will have no way to computationally reduce it—and so its behavior will seem to us complex.
So when we find behavior in tag systems that seems to us complex—and that we do not appear able to analyze or predict—the expectation is that it must correspond to a sophisticated computation, and be a sign that the tag system follows the Principle of Computational Equivalence and is computation universal.
But what actual computations do particular tag systems do? Clearly they do the computations that are defined by their rules. But the question is whether we can somehow also interpret the overall computations they do in terms of familiar concepts, say in mathematics or computer science.
Consider for example the 2-element-deletion tag system with the rule 1→111. Starting it off with 11, the string of 1s grows by one element at each step—so the tag system in effect just “counts up in unary”. (The 1-element-deletion rule 1→11 does the same thing.)
Now consider the tag system with rules:
First[#] -> Row[Last[#]] & /@ {1 -> {2, 2}, 2 -> {1, 1, 1, 1}}
Starting it with 11 we get
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Column[Row /@ TSEvolveList[{2, {1 -> {2, 2}, 2 -> {1, 1, 1, 1}}}, {1, 1}, 8]]
or more pictorially (red is 1, blue is 2):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[TSEvolveList[{2, {1 -> {2, 2}, 2 -> {1, 1, 1, 1}}}, {1, 1}, 34], Automatic], Mesh -> True, MeshStyle -> GrayLevel[.75, .75], ColorRules -> {3 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], 0 -> GrayLevel[.85]}]
But now look at steps where strings of only 1s appear. The number of 1s in these strings forms the sequence
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Total /@ Cases[TSEvolveList[{2, {1 -> {2, 2}, 2 -> {1, 1, 1, 1}}}, {1, 1}, 1000], {1 ..}]
of successive powers of 2. (The 1-element-deletion rule 1→2, 2→11 gives the same sequence.)
The rule
First[#] -> Row[Last[#]] & /@ {1 -> {2, 2}, 2 -> {1, 1, 1}}
starting from 11 yields instead
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[TSEvolveList[{2, {1 -> {2, 2}, 2 -> {1, 1, 1}}}, {1, 1}, 80], Automatic], MeshStyle -> GrayLevel[.75, .75], Frame -> False, ColorRules -> {3 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], 0 -> GrayLevel[.85]}]
and now the lengths of the sequences of 1s form the sequence:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Total /@ Cases[TSEvolveList[{2, {1 -> {2, 2}, 2 -> {1, 1, 1}}}, {1, 1}, 10000], {1 ..}]
This sequence is not as familiar as powers of 2, but it still has a fairly traditional “mathematical interpretation”: it is the result of iterating
n -> Ceiling[(3 n)/2]
or
n -> If[EvenQ[n], (3 n)/2, (3 n + 1)/2]
(and this same iteration applies for any initial string of 1s of any length).
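Both of these length sequences are easy to confirm with a few lines of Python (a sketch of my own, independent of the notebook functions):

```python
from collections import deque

def all_ones_runs(rules, init, steps):
    # Evolve a 2-element-deletion tag system, recording the lengths of the
    # states that consist entirely of 1s.
    s, runs = deque(init), []
    for _ in range(steps):
        if all(x == 1 for x in s):
            if not runs or runs[-1] != len(s):
                runs.append(len(s))
        if len(s) < 2:
            break
        first = s.popleft(); s.popleft()
        s.extend(rules[first])
    return runs

# 1 -> 22, 2 -> 1111: all-1s strings of successive powers-of-2 lengths
powers = all_ones_runs({1: [2, 2], 2: [1, 1, 1, 1]}, [1, 1], 2000)
# 1 -> 22, 2 -> 111: all-1s strings following n -> Ceiling[3n/2]
halves = all_ones_runs({1: [2, 2], 2: [1, 1, 1]}, [1, 1], 2000)
```

The first sequence starts 2, 4, 8, 16, ..., and in the second each term is the ceiling of 3/2 times the previous one.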
But consider now the rule:
First[#] -> Row[Last[#]] & /@ {1 -> {1, 2}, 2 -> {1, 1, 1}}
Here is what it does starting with sequences of 1s of different lengths:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Row[Table[ArrayPlot[PadRight[TSEvolveList[{2, {1 -> {1, 2}, 2 -> {1, 1, 1}}}, Table[1, k], 100]], ImageSize -> {Automatic, 150}, ColorRules -> {3 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], 0 -> GrayLevel[.85]}], {k, 2, 20}], Spacer[2]]
In effect it is taking the initial number of 1s n and computing the function:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[ParallelTable[Last[Total /@ Cases[TSEvolveList[{2, {1 -> {1, 2}, 2 -> {1, 1, 1}}}, Table[1, k], 100000], {1 ..}]], {k, 1, 100}], Center, Filling -> Axis, Frame -> True, PlotRange -> All, AspectRatio -> 1/3, PlotStyle -> Hue[0.07, 1, 1]]
But what “is” this function? In effect it depends on the binary digits of n, and turns out to be given (for n > 1) by:
With[{e = IntegerExponent[n + 1, 2]}, (3^e (n + 1))/2^e - 1]
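We can verify this closed form against a direct simulation for small n (a Python sketch; the function names are mine):

```python
from collections import deque

def last_ones_run(n, steps=5000):
    # Run the 2-element-deletion tag system 1 -> 12, 2 -> 111 from n 1s
    # and return the length of the last all-1s state seen.
    rules = {1: [1, 2], 2: [1, 1, 1]}
    s, last = deque([1] * n), None
    for _ in range(steps):
        if s and all(x == 1 for x in s):
            last = len(s)
        if len(s) < 2:
            break
        first = s.popleft(); s.popleft()
        s.extend(rules[first])
    return last

def closed_form(n):
    # (3^e (n+1)) / 2^e - 1, with e the exponent of 2 in n+1
    e, m = 0, n + 1
    while m % 2 == 0:
        m //= 2
        e += 1
    return 3 ** e * (n + 1) // 2 ** e - 1
```

For n = 2, ..., 6 the closed form gives 2, 8, 4, 8, 6, matching the simulation.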
What other “identifiable functions” can simple tag systems produce? Consider the rules:
First[#] -> Row[Last[#]] & /@ {1 -> {2, 3}, 2 -> {1}, 3 -> {1, 1, 1}}
Starting with a string of five 1s this gives (3 is white)
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[TSEvolveList[{2, {1 -> {2, 3}, 2 -> {1}, 3 -> {1, 1, 1}}}, Table[1, 5], 100], {22, 10}], Mesh -> True, ColorRules -> {3 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], 0 -> GrayLevel[.85]}, MeshStyle -> GrayLevel[0.85, 0.75]]
in effect running for 21 steps and then terminating. If one looks at the strings of only 1s produced here, their sequence of lengths is 5, 8, 4, 2, 1, and in general the sequence is determined by the iteration
n -> If[EvenQ[n], n/2, 3 n + 1]
except that if n reaches 1 the tag system terminates, while the iteration keeps going.
So if we ask what this tag system is “doing”, we can say it’s computing 3n + 1 problem iterations, and we can explicitly “see it doing the computation”. Here it’s starting with n = 7
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ArrayPlot[PadRight[TSEvolveList[{2, {1 -> {2, 3}, 2 -> {1}, 3 -> {1, 1, 1}}}, Table[1, 7], 200]], Frame -> False, ColorRules -> {3 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], 0 -> GrayLevel[.85]}]
and here it’s starting with successive values of n:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Row[Table[ArrayPlot[PadRight[TSEvolveList[{2, {1 -> {2, 3}, 2 -> {1}, 3 -> {1, 1, 1}}}, Table[1, k], 150], {150, Automatic}], ImageSize -> {Automatic, 160}, Frame -> False, ColorRules -> {3 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], 0 -> GrayLevel[.85]}], {k, 2, 21}], Spacer[2]]
Does the tag system always eventually halt? This is exactly the 3n + 1 problem—which has been unsolved for the better part of a century.
It might seem remarkable that even such a simple tag system rule can in effect give us such a difficult mathematical problem. But the Principle of Computational Equivalence makes this seem much less surprising—and in fact it tells us that we should expect tag systems to quickly “ascend out of” the range of computations to which we can readily assign traditional mathematical interpretations.
Changing the rule to
First[#] -> Row[Last[#]] & /@ {1 -> {2, 3}, 2 -> {1, 1, 1}, 3 -> {1}} 
yields instead this behavior
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Row[Table[ArrayPlot[PadRight[TSEvolveList[{2, {1 -> {2, 3}, 2 -> {1, 1, 1}, 3 -> {1}}}, Table[1, k], 150], {150, Automatic}], ImageSize -> {Automatic, 160}, Frame -> False, ColorRules -> {3 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], 0 -> GrayLevel[.85]}], {k, 2, 21}], Spacer[2]] 
which again is “interpretable” as corresponding to the iteration:
n -> If[EvenQ[n], 3 n/2, (n - 1)/2] 
But what if we consider all possible rules, say with the very simple form 1→__, 2→___? Here is what each of the 32 of these does starting from 1111:
For some of these we’ve been able to identify “traditional mathematical interpretations”, but for many we have not. And if we go even further and look at the very simplest nontrivial rules—of the form 1→_, 2→___—here is what happens starting from a string of 10 1s:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Row[ArrayPlot[PadRight[TSNEvolveList[{2, #}, Table[1, 10], 40], {40, Automatic}], ImageSize -> {Automatic, 120}, Frame -> False, ColorRules -> {3 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], 0 -> GrayLevel[.85]}] & /@ (TakeList[#, {1, 3}] & /@ Tuples[{1, 2}, 4]), Spacer[1]] 
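To explore rules like these without the notebook environment, here is a minimal Python sketch of a 2-symbol tag system with deletion number 2 (the function and encoding are mine, not from the post); it reports whether the evolution terminates, i.e. the string gets shorter than the deletion number, within a step limit:

```python
def run_tag(rules, init, max_steps=1000):
    """Run a deletion-number-2 tag system. rules maps a symbol to the
    tuple of symbols appended when that symbol is at the front.
    Returns (halted?, steps, history of strings)."""
    s = tuple(init)
    history = [s]
    for step in range(max_steps):
        if len(s) < 2:          # too short to apply the rule: terminate
            return True, step, history
        s = s[2:] + rules[s[0]]  # delete first 2, append block for first symbol
        history.append(s)
    return False, max_steps, history

# For example, the rule 1 -> {2}, 2 -> {1, 1, 1} starting from ten 1s:
halted, steps, hist = run_tag({1: (2,), 2: (1, 1, 1)}, [1] * 10)
print(halted, steps)
```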
One of these rules we already discussed above
First[#] -> Row[Last[#]] & /@ {1 -> {2}, 2 -> {2, 2, 1}} 
and we found that it seems to lead to infinite irregular growth (here shown “detrended” by subtracting (√2 – 1) t):
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[MapIndexed[# - (Sqrt[2] - 1) First[#2] &, TagLengthFunction[{2, {{1}, {1, 1, 0}}}][Table[0, 10], 10000]], Center, AspectRatio -> 1/4, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]] 
But even in the case of
First[#] -> Row[Last[#]] & /@ {1 -> {2}, 2 -> {1, 1, 1}} 
which appears always to halt
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Row[Table[ArrayPlot[PadRight[TSNEvolveList[{2, {{2}, {1, 1, 1}}}, Table[1, k], 40], {40, Automatic}], ImageSize -> {Automatic, 120}, Frame -> False, ColorRules -> {3 -> White, 1 -> Hue[.03, .9, 1], 2 -> Hue[.7, .8, .5], 0 -> GrayLevel[.85]}], {k, 17}], Spacer[1]] 
the differences between halting times with successive sizes of initial strings form a surprisingly complex sequence
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Differences[First /@ Table[Length /@ FindTransientRepeat[TSNEvolveList[{2, {{2}, {1, 1, 1}}}, Table[1, k], 600], 3], {k, 150}]], Center, PlotRange -> {0, 21}, AspectRatio -> 1/5, Filling -> Axis, Frame -> True, PlotStyle -> Hue[0.07, 1, 1]] 
that does not seem to have any simple traditional mathematical interpretation. (By the way, in a case like this it’s perfectly possible that there will be some kind of “mathematical interpretation”—though it might be like the page of weird definitions that I found for halting times of Turing machine 600720 in A New Kind of Science.)
When Emil Post was studying his tag system back in 1921, one of his big questions was: “Does it always halt?” Frustratingly enough, I must report that even a century later I still haven’t been able to answer this question.
Running Post’s tag system on my computer I’m able to work out what it does billions of times faster than Post could. And I’ve been able to look at billions of possible initial strings. And I’ve found that it can take a very long time—like half a trillion steps—for the system to halt:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; Show[LengthsPlotDecimal[{2, 264107671}, 28, 643158954877, 100000000], FrameTicks -> {{Automatic, None}, {Thread[{Range[0, 643000][[1 ;; -1 ;; 100000]], Append[Range[0, 500][[1 ;; -1 ;; 100]], "600 billion"]}], None}}] 
But so far—even with all the computation I’ve done—I haven’t found a single example where it doesn’t eventually halt.
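Post's 0→00, 1→1101 rule (deletion number 3) is simple to simulate directly. Here is a small Python sketch with cycle detection, since, as in the text, "halting" includes entering a cycle (names and step limit are mine):

```python
def run_post_tag(init, max_steps=100000):
    """Simulate Post's tag system: look at the first symbol, append
    '00' for 0 or '1101' for 1, then delete the first 3 symbols.
    Returns 'terminates', 'cycles', or 'unknown' (step limit reached)."""
    rules = {"0": "00", "1": "1101"}
    s = init
    seen = set()
    for _ in range(max_steps):
        if len(s) < 3:       # too short to delete 3 symbols: terminate
            return "terminates"
        if s in seen:        # revisited a string: we are in a cycle
            return "cycles"
        seen.add(s)
        s = s[3:] + rules[s[0]]
    return "unknown"

print(run_post_tag("100"))  # enters the 2-cycle 10100 <-> 001101
```

(Storing every string in a set is fine for small cases; for the billion-step searches described here one would instead use something like Brent's cycle-finding algorithm.)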
If we were doing ordinary natural science, billions of examples that all ultimately work the same would normally be far more than enough to convince us of something. But from studying the computational universe we know that this kind of “scientific inference” won’t always be correct. Gödel’s theorem from 1931 introduced the idea of undecidability (and it was sharpened by Turing machines, etc.). And that’s what can bite us in the computational universe.
Because one of the consequences of undecidability as we now understand it is that there can be questions where there may be no bound on how much computation will be needed to answer them. So this means that even if we have failed to see something in billions of examples that doesn’t mean it’s impossible; it may just be that we haven’t done enough computation to see it.
In practice it’s tended to be assumed, though, that undecidability is something rare and exotic, that one will only run into if one asks some kind of awkward—or “meta”—question. But my explorations in the computational universe—and in particular my Principle of Computational Equivalence—imply that this is not correct, and that instead undecidability is quite ubiquitous, and occurs essentially whenever a system can behave in ways that are not obviously simple.
And this means that—despite the simplicity of its construction—it’s actually to be expected that something like the 00, 1101 tag system could show undecidability, and so that questions about it could require arbitrary amounts of computational effort to answer. But there’s something of a catch. Because the way one normally proves the presence of undecidability is by proving computation universality. But at least in the usual way of thinking about computation universality, a universal system cannot always halt—since otherwise it wouldn’t be able to emulate systems that themselves don’t halt.
So with this connection between halting and computation universality, we have the conclusion that if the 00, 1101 tag system always halts it cannot be computation universal. So from our failure to find a nonhalting example the most obvious conclusion might be that our tag system does in fact always halt, and is not universal.
And this could then be taken as evidence against the Principle of Computational Equivalence, or at least its application to this case. But I believe strongly enough in the Principle of Computational Equivalence that I would tend to draw the opposite conclusion: that actually the 00, 1101 tag system is universal, and won’t always halt, and it’s just that we haven’t gone far enough in investigating it to see a nonhalting example yet.
But how far should we have to go? Undecidability says we can’t be sure. But we can still potentially use experience from studying other systems to get some sense. And this in fact tends to suggest that we might have to go a long way to get our first nonhalting example.
We saw above an example of a cellular automaton in which unbounded growth (a rough analog of nonhalting) does occur, but we have to look through nearly 100,000 initial conditions before we find it. A New Kind of Science contains many other examples. And in number theory, it is quite routine to have Diophantine equations whose smallest solutions are very large.
How should we think about these kinds of things? In essence, we are taking computation universal systems and trying to “program them” (by setting up appropriate initial conditions) to have a particular form of behavior, say nonhalting. But there is nothing to say these programs have to be short. Yes, nonhalting might seem to us like a simple objective. And, yes, the universal system should in the end be able to achieve it. But given the particular components of the universal system, it may be complicated to get.
Let me offer two analogies. The first has to do with mathematical proofs. Having found the very simplest possible axiom system for Boolean algebra, ((p · q) · r) · (p · ((p · r) · p)) = r, we know that in principle we can prove any theorem in Boolean algebra. But even something like p · q = q · p—that might seem simple to us—can take hundreds of elaborate steps to prove given our particular axiom system.
As a more whimsical example, consider the process of self-reproduction. It seems simple enough to describe this objective, yet to achieve it, say with the components of molecular biology, may be complex. And maybe on the early Earth it was only because there were so many molecules, and so much time, that self-reproduction could ever be “discovered”.
One might think that, yes, it could be difficult to find something (like a nonhalting initial condition, or a configuration with particular behavior in a cellular automaton) by pure search, but that it would still be possible to systematically “engineer” one. And indeed there may be ways to “engineer” initial conditions for the 00, 1101 tag system. But in general it is another consequence of the Principle of Computational Equivalence (and computational irreducibility) that there is no guarantee that there will be any “simple engineering path” to reach any particular capability.
By the way, one impression from looking at tag systems and many other kinds of systems is that as one increases the sizes of initial conditions, one crosses a sequence of thresholds for different behaviors. Only at size 14, for example, might some long “highway” in our tag system’s state transition graph appear. And then nothing longer might appear until size 17. Or some particular period of the final cycle might only appear at size-15 initial conditions. It’s as if there’s a “minimum program length” needed to achieve a particular objective, in a particular system. And perhaps similarly there’s a minimum initial string length necessary to achieve nonhalting in our tag system—that we just don’t happen to have reached yet. (I’ve done random searches in longer initial conditions, though, so we at least know it’s not common there.)
OK, but let’s try a different tack. Let’s ask what would be involved in proving that the tag system doesn’t always halt. We’re trying to prove essentially the following statement: “There exists an initial condition i such that for all steps t the tag system has not halted”. In the language of mathematical logic this is a ∃∀ statement, that is at the level Σ₂ in the arithmetic hierarchy.
One way to prove it is just explicitly to find a string whose evolution doesn’t halt. But how would one show that the evolution doesn’t halt? It might be obvious: there might for example just be something like a fixed block that is getting added in a simple cycle of some kind, as in:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ TSGDirectEvolveList[{2, TakeList[IntegerDigits[#[[1]], 3, 6], {1, 2, 3}]}, IntegerDigits[#[[2, 2]], 3, #[[2, 1]]], 100], Center, PlotRange -> {{0, 100}, Automatic}, AspectRatio -> 1/3, Filling -> Axis, Frame -> True, FrameTicks -> False, PlotStyle -> Hue[0.07, 1, 1]] &[52 -> {3, 20}] 
But it also might not be obvious. It could be like some of our examples above where there seems to be systematic growth, but where there are small fluctuations:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[TagLengthFunction[{2, {{1}, {1, 1, 0}}}][{1, 0}, 200], Center, AspectRatio -> 1/3, Filling -> Axis, Frame -> True, FrameTicks -> False, PlotStyle -> Hue[0.07, 1, 1]] 
Will these fluctuations suddenly become big and lead the system to halt? Or will they always stay somehow small enough that that cannot happen? There are plenty of questions like this that arise in number theory. And sometimes (as, for example, with the Skewes number associated with the distribution of primes) there can be surprises, with very long-term trends getting reversed only in exceptionally large cases.
By the way, even identifying “halting” can be difficult, especially if (as we do for our tag system) we define “halting” to include going into a cycle. For example, we saw above a tag system that does cycle, but takes more than 18,000 steps to do so:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"]; ListStepPlot[Length /@ TSGDirectEvolveList[{2, TakeList[IntegerDigits[68, 3, 6], {1, 2, 3}]}, IntegerDigits[6, 3, 2], 40000], Center, AspectRatio -> 1/5, Filling -> Axis, Frame -> True, FrameTicks -> False, PlotStyle -> Hue[0.07, 1, 1]] 
Conversely, just because something takes a long time to halt doesn’t mean that it will be difficult to show this. For example, it is quite common to see Turing machines that take a huge number of steps to halt, but behave in basically systematic and predictable ways (this one takes 47,176,870 steps):
But to “explain why something halts” we might want to have something like a mathematical proof: a sequence of steps consistent with a certain set of axioms that derives the fact that the system halts. In effect the proof is a higherlevel (“symbolic”) way of representing aspects of what the system is doing. Instead of looking at all the individual values at each step in the evolution of the system we’re just calling things x and y (or whatever) and deriving relationships between them at some kind of symbolic level.
And given a particular axiom system it may or may not be possible to construct this kind of symbolic proof of any given fact. It could be that the axiom system just doesn’t have the “derivational power” to represent faithfully enough what the system we are studying is doing.
So what does this mean for tag systems? It means, for example, that it could perfectly well be that a given tag system evolution doesn’t halt—but that we couldn’t prove that using, say, the axiom system of Peano Arithmetic.
And in fact as soon as we have a system that is computation universal it turns out that any finite axiom system must eventually fail to be able to give a finite proof of some fact about the system. We can think of the axioms as defining certain relations about the system. But computational irreducibility implies that eventually the system will be able to do things which cannot be “reduced” by any finite set of relations.
Peano Arithmetic contains as an axiom the statement that mathematical induction works, in the sense that if a statement s[0] is true, and s[n] implies s[n + 1], then any statement s[n] must be true. But it’s possible to come up with statements that entail for example nested collections of recursions that effectively grow too quickly for this axiom alone to be able to describe symbolically “in one go” what they can do.
If one uses a stronger axiom system, however, then one will be able to do this. And, for example, Zermelo–Fraenkel set theory—which allows not only ordinary induction but also transfinite induction—may succeed in being able to give a proof even when Peano Arithmetic fails.
But in the end any finitely specified axiom system will fail to be able to prove everything about a computationally irreducible system. Intuitively this is because making proofs is a form of computational reduction, and it is inevitable that this can only go so far. But more formally, one can imagine using a computational system to encode the possible steps that can be made with a given axiom system. Then one would construct a program in the computational system that would systematically enumerate all theorems in the axiom system. (It may be easier to think of first creating a multiway system in which each possible application of the axiom rules is made, and then “unrolling” the multiway system to be “run sequentially”.)
And for example we could set things up so that the computational system halts if it ever finds an inconsistency in the theorems derived from the axiom system. But then we know that we won’t be able to prove that the computational system does not halt from within the axiom system because (by Gödel’s second incompleteness theorem) no nontrivial axiom system can prove its own consistency.
So if we chose to work, say, purely within Peano Arithmetic, then it might be that Post’s original question is simply unanswerable. We might have no way to prove or disprove that his tag system always halts. To know that might require a finer level of analysis—or, in effect, a higher degree of reduction—than Peano Arithmetic can provide. (Picking a particular model of Peano Arithmetic would resolve the question, but to home in on a particular model can in effect require infinite computational effort.)
If we have a tag system that we know is universal then it’s inevitable that certain things about it will not be provable within Peano Arithmetic, or any other finitely specified axiom system. But for any given property of the system it may be very difficult to determine whether that property is provable within Peano Arithmetic.
The problem is similar to proving computation universality: in effect one has to see how to encode some specified structure within a particular formal system—and that can be arbitrarily difficult to do. So just as it may be very hard to prove that the 00, 1101 tag system is computation universal, it may also be very difficult to prove that some particular property of it is not “accessible” through Peano Arithmetic.
Could it be undecidable whether the 00, 1101 tag system always halts? And if we could prove this, would this actually have proved that it in fact doesn’t halt? Recall that above we mentioned that at least the obvious statement of the problem is at the level Σ₂ in the arithmetic hierarchy. And it turns out that statements at this level don’t have “default truth values”, so proving undecidability wouldn’t immediately give us a conclusion. But there’s nothing to say that some clever reformulation might not reduce the problem to Π₁ or Σ₁, at which point proving undecidability would lead to a definite conclusion.
(Something like this in fact happened with the Riemann Hypothesis. At first it did not seem to be a statement at such a low level of the hierarchy, but it was eventually reformulated as a Π₁ statement—and reduced to the specific statement several sections above that a particular computation should not terminate. But now if the termination of this is proved undecidable, it must in fact not terminate, and the Riemann Hypothesis must be true.)
Can one prove undecidability without proving computation universality? There are in principle systems that show “intermediate degrees”: they exhibit undecidability but cannot directly be used to do universal computation (and Post was in fact the person who suggested that this might be possible). But actual examples of systems with intermediate degree still seem to involve having computation universality “inside”, but then limiting the input/output capabilities to prevent the universality from being accessed, beyond making certain properties undecidable.
The most satisfying (and ultimately satisfactory) way to prove universality for the 00, 1101 tag system would simply be to construct a compiler that takes a specification of some other system that is known to support universality (say a particular known-to-be-universal tag system, or the set of all possible tag systems) and then turns this into an initial string for the 00, 1101 tag system. The tag system would then “run” the string, and generate something that could readily be “decoded” as the result of the original computation.
But there are ways one might imagine establishing what amounts to universality, that could be enough to prove halting properties, even though they might not be as “practical” as actual ways to do computations. (Yes, one could conceivably imagine a molecularscale computer that works just like a tag system.)
In the current proofs of universality for the simplest cellular automata and Turing machines, for example, one assumes that their initial configurations contain “background” periodic patterns, with the specific input for a particular computation being a finite-size perturbation to this background. For a cellular automaton or Turing machine it seems fairly unremarkable to imagine such a background: even though it extends infinitely across the cells of the system it somehow does not seem to be adding more than a small amount of “new information” to the system.
But for a tag system it’s more complicated to imagine an infinite periodic “background”, because at every step the string the system is dealing with is finite. One could consider modifying the rules of the tag system so that, for example, there is some fixed background that acts as a “mask” every time the block of elements is added at the end of the string. (For example, the mask could flip the value of every element, relative to a fixed “coordinate system”.)
But with the original tag system rules the only way to have an infinite background seems to be to have an infinite string. But how could this work? The rules of the tag system add elements at the end of the string, and if the string is infinitely long, it will take an infinite number of steps before the values of these elements ever matter to the actual behavior of the system.
There is one slightly exotic possibility, however, which is to think about transfinite versions of the tag system. Imagine that the string in the tag system has a length given by a transfinite number, say the ordinal ω. Then it is perfectly meaningful in the context of transfinite arithmetic to imagine additional elements being added at positions ω + 1 etc. And if the tag system then runs for ω steps, its behavior can start to depend on these added elements.
And even though the strings themselves would be infinite, there can still be a finite (“symbolic”) way to describe the system. For example, there could be a function f[i] which defines the value of the element at position i. Then we can formally write down the rules for the tag system in terms of this function. And even though it would take an infinite time to explicitly generate the strings that are specified, it can still be possible to “reason” about what happens, just by doing symbolic operations on the function f.
Needless to say, the various issues I’ve discussed above about provability in particular axiom systems may come into play. But there may still be cases where definite results about computation universality could be established “symbolically” about transfinite tag systems. And conceivably such results could then be “projected down” to imply undecidability or other results about tag systems with finite initial strings.
Clearly the question of proving (or disproving) halting for the 00, 1101 tag system is a complicated one. We might be lucky, and be able to find with our computers (or conceivably engineer) an initial string that we can see doesn’t halt. Or we might be able to construct a symbolic representation in which we can carry out a proof.
But ultimately we are in a sense at the mercy of the Principle of Computational Equivalence. There is presumably computational irreducibility in the 00, 1101 tag system that we can’t systematically outrun.
Yes, the trace of the tag system seems to be a good approximation to a random walk. And, yes, as a random walk it will halt with probability 1. But in reality it’s not a “truly random” random walk; it’s a walk determined by a specific computational process. We can turn our questions about halting to questions about the randomness of the walk (and to do so may provide interesting connections with the foundations of probability theory). But in the end we’re back to the same issues, and we’re still confronted by computational irreducibility.
Tag systems are simple enough that it’s conceivable they might have arisen in something like games even millennia ago. But for us tag systems—and particularly the specific 00, 1101 tag system we’ve mostly been studying—were the invention of Emil Post, in 1921.
Emil Post lived most of his life in New York City, though he was born (into a Jewish family) in 1897 in Augustow, Poland (then part of the Russian Empire). (And, yes, it’s truly remarkable how many of the notable contributors to mathematical logic in the early part of the 20th century were born to Jewish families in a fairly small region of what’s now eastern Poland and western Ukraine.)
As a child, Post seems to have at first wanted to be an astronomer, but having lost his left arm in a freak car-related street accident at age 12 he was told this was impractical—and turned instead to mathematics. Post went to a public high school for gifted students and then attended City College of New York, graduating with a bachelor’s degree in math in 1917. Perhaps presaging a lifelong interest in generalization, he wrote his first paper while in college (though it wasn’t published until 15+ years later), on the subject of fractional differentiation.
He enrolled in the math PhD program at Columbia, where he got involved in a seminar studying Whitehead and Russell’s recently published Principia Mathematica, run by Cassius Keyser, who was one of the early American mathematicians interested in the foundations of math (and who wrote many books on history and philosophy around mathematics; a typical example being his 1922 Mathematical Philosophy, a Study of Fate and Freedom). Early in graduate school, Post wrote a paper about functional equations for the gamma function (related to fractional differentiation), but soon he turned to logic, and his thesis—written in 1920—included early versions of what became his signature ideas.
Post’s main objective in his thesis was to simplify, streamline and further formalize Principia Mathematica. He started by looking at propositional calculus, and tried to “drill down” to find out more of what logic was really about. He invented truth tables (as several other people also independently did) and used them to prove completeness and consistency results. He investigated how different logic functions could be built up from one another through composition, classifying different elements of what’s now called the Post lattice. (He commented on Nand and an early simple axiom system for it—and might well have gone further with it if he’d known the minimal axiom system for Nand that I finally discovered in 2000. In another small-intellectual-world story, I realize now his lattice is also similar to my “cellular automaton emulation network”.) Going in the direction of “what’s logic really about” Post also considered multivalued logic, and algebraic structures around it.
Post published the core of his thesis in 1921 as “Introduction to a General Theory of Elementary Propositions”, but—in an unfortunate and recurring theme—didn’t publish the whole thing for another 20 years. But even in 1920 Post had what he called “generalization by postulation” and this quickly turned into the idea that all operations in Principia Mathematica (or mathematics in general) could ultimately be represented as transformations (“production rules”) on strings of characters.
When he finally ended up publishing this in 1943 he called the resulting formal structures “canonical systems”. And already by 1920 he’d discovered that not all possible production rules were needed; it was sufficient to have only ones in “normal form” g$→$h, where $ is a “pattern variable”. (The idea of $ representing a pattern became common in early computer string-manipulation systems, and in fact I used it for expression patterns in my SMP system in 1979—probably without at the time knowing it came from Post.)
Post was close to the concept of universal computation, and the notion that anything (in his case, any string transformation) could be built up from a fixed set of primitives. And in 1920—in the effort to “reduce his primitives”—he came up with tag systems. At the time—11 years before Gödel’s theorem—Post and others still thought that it might somehow be possible to “solve mathematics” in some finite way. Post felt he had good evidence that Principia Mathematica could be reduced to string rewriting, so now he just had to solve that.
One basic question was how to tell when two strings should be considered equivalent under the string rewriting rules. And in formulating a simple case of this Post came up with tag systems. In particular, he wanted to determine whether the “iterative process [of tag] was terminating, periodic, or divergent”. And Post made “the problem of ‘tag’… the major project of [his] tenure of a Procter fellowship in mathematics at Princeton during the academic year 1920–21.”
Post later reported that a “major success of the project was the complete solution of the problem for all bases in which μ and ν were both 2”, though he stated that “even this special case… involved considerable labor”. But then, as he later wrote, “while considerable effort was expanded [sic] on the case μ = 2, ν > 2… little progress resulted… [with] such a simple basis as 0→00, 1→1101, ν = 3, proving intractable”. Post adds in a footnote: “Numerous initial sequences… tried [always] led… to termination or periodicity, usually the latter.” Then he added, reflecting our random walk observations: “It might be noted that an easily derived probability ‘prognostication’ suggested… that periodicity was to be expected.” (I’m curious how he could tell it should be periodicity rather than termination.)
But by the end of the summer of 1921, Post had concluded that “the solution of the general problem of ‘tag’ appeared hopeless, and with it [his] entire program of the solution of finiteness problems”. In other words, the seemingly simple problem of tag had derailed Post’s whole program of “solving mathematics”.
In 1920 Princeton had a top American mathematics department, and Post went there on a prestigious fellowship (recently endowed by the Procter of Procter & Gamble). But—like the problem of tag—things did not work out so well there for Post, and in 1921 he had the first of what would become a sequence of “runaway mind” manic episodes, in what appears to have been a cycle of what was then called manic depression.
It’s strange to think that the problem of tag might have “driven Post crazy”, and probably the timing of the onset of manic depression had more to do with his age—though Post later seems to have believed that the excitement of research could trigger manic episodes (which often involved talking intensely about streams of poorly connected ideas, like the “psychic ether” from which new ideas come, discovering a new star named “Post”, etc.) But in any case, in late 1921 Post—who had by then returned to Columbia—was institutionalized.
By 1924 he had recovered enough to take up an instructorship at Cornell, but then relapsed. Over the years that followed he supported himself by teaching high school in New York, but continued to have mental health issues. He married in 1929, had a daughter in 1932, and in 1935 finally became a professor at City College, where he remained for the rest of his life.
Post published nothing from the early 1920s until 1936. But in 1936—with Gödel’s theorem known, and Alonzo Church’s “An Unsolvable Problem of Elementary Number Theory” recently published—Post published a 3-page paper entitled “Finite Combinatory Processes—Formulation 1”. Post comes incredibly close to defining Turing machines (he talks about “workers” interacting with a potentially infinite sequence of “marked” and “unmarked boxes”). And he says that he “expects [his] formulation to be logically equivalent to recursiveness in the sense of the Gödel–Church development”, adding “Its purpose, however, is not only to present a system of a certain logical potency but also, in its restricted field, of psychological fidelity”. Post doesn’t get too specific, but he does make the comment (rather resonating with my own work, and particularly our Physics Project) that the hypothesis of global success of these formalisms would be “not so much… a definition or an axiom but… a natural law”.
In 1936 Post also published his longest-ever paper: 142 pages on what he called “polyadic groups”. It’s basically about abstract algebra, but in typical Post style, it’s a generalization, involving looking not at binary “multiplication” operations but, for example, ternary ones. It’s not been a popular topic, though, curiously, I also independently got interested in it in the 1990s, eventually discovering Post’s work on it.
By 1941 Post was publishing more, including several now-classic papers in mathematical logic, covering things like degrees of unsolvability, the unsolvability of the word problem for semigroups, and what’s now called the Post Correspondence Problem. He managed his time in a very precise way, following a grueling teaching schedule (with intense and precise lectures planned to the minute) and—apparently to maintain his psychological wellbeing—restricting his research activities to three specific hours each day (interspersed with walks). But by then he was a respected professor, and logic had become a more popular field, giving him more of an audience.
In 1943, largely summarizing his earlier work, Post published “Formal Reductions of the General Combinatorial Decision Problem”, and in it, the “problem of tag” makes its first published appearance:
Post notes that “the little progress made in [its] solution” makes it a “candidate for unsolvability”. (Notice the correction in Post’s handwriting “intensely” → “intensively” in the copy of his paper reproduced in his collected works.)
Through all this, however, Post continued to struggle with mental illness. But around the time he reached the age of 50 in 1947, he began to improve, and even loosened up on his rigid schedule. But in 1954 depression was back, and after receiving electroshock therapy (which he thought had helped him in the past), he died of a heart attack at the age of 57.
His former undergraduate student, Martin Davis, eventually published Post’s “Absolutely Undecidable Problems”, subtitled “Account of an Anticipation”, which describes the arc of Post’s work—including more detail on the story of tag systems. And in hindsight we can see how close Post came to discovering Gödel’s theorem and inventing the idea of universal computation. If instead of turning away from the complexity he found in tag systems he had embraced and explored it, I suspect he would have discovered not only foundational ideas of the 1930s, but also some of what I found half a century later in my by-then computer-assisted explorations of the computational universe.
When Post died, he left many unpublished notes. A considerable volume of them concern a major project he launched in 1938 that he planned to call “Creative Logic”. He seemed to feel that “extreme abstraction” as a way of exploring mathematics would give way to something in which it’s recognized that “processes of deduction are themselves essentially physical and hence subject to formulations in a physical science”. And, yes, there’s a strange resonance here with my own current efforts—informed by our Physics Project—to “physicalize” metamathematics. And perhaps I’ll discover that here too Post anticipated what was to come.
So what happened to tag systems? By the mid-1950s Post’s idea of string rewriting (“production systems”) was making its way into many things, notably both the development of generative grammars in linguistics, and formal specifications of early computer languages. But tag systems—which Post had mentioned only once in his published works, and then as a kind of aside—were still basically unknown.
Post had come to his string rewriting systems—much as Turing had come to his Turing machines—as a way to idealize the processes of mathematics. But by the 1950s there was increasing interest in using such abstract systems as a way to represent “general computations”, as well as brains. And one person drawn in this direction was Marvin Minsky. After a math PhD in 1954 at Princeton on what amounted to analog artificial neural networks, he started exploring more discrete systems, initially finite automata, essentially searching for the simplest elements that would support universal computation (and, he hoped, thinking-like behavior).
Near the end of the 1950s he looked at Turing machines—and in trying to find the simplest form of them that would be universal started looking at their correspondence with Post’s string rewriting systems. Marvin Minsky knew Martin Davis from their time together as graduate students at Princeton, and by 1958 Davis was fully launched in mathematical logic, with a recently published book entitled Computability and Unsolvability.
As Davis tells it now, Minsky phoned him about some unsolvability results he had about Post’s systems, asking if they were of interest. Davis told him about tag systems, and that Post had thought they might be universal. Minsky found that indeed they were, publishing the result in 1960 in “Recursive Unsolvability of Post’s Problem of ‘Tag’ and Other Topics in Theory of Turing Machines”.
Minsky had recently joined the faculty at MIT, but also had a position at MIT’s Lincoln Laboratory, where in working on computing for the Air Force there was a collaboration with IBM. And it was probably through this that Minsky met John Cocke, a lifelong computer designer (and general inventor) at IBM (who in later years was instrumental in the development of RISC architecture). The result was that in 1963 Minsky and Cocke published a paper entitled “Universality of Tag Systems with P=2” that dramatically simplified Minsky’s construction and showed (essentially by compiling to a Turing machine) that universality could be achieved with tag systems that delete only 2 elements at each step. (One might think of it as an ultimate RISC architecture.)
For several years, Minsky had been trying to find out what the simplest universal Turing machine might be, and in 1962 he used the results Cocke and he had about tag systems to construct a 7-state, 4-color universal machine. That machine remained the record holder for the simplest known universal Turing machine for more than 40 years, though finally now we know the very simplest possible universal machine: a 2,3 machine that I discovered and conjectured would be universal—and that was proved so by Alex Smith in 2007 (thereby winning a prize I offered).
But back in 1967, the visibility of tag systems got a big boost. Minsky wrote an influential book entitled Computation: Finite and Infinite Machines—and the last part of the book was devoted to “Symbol-Manipulation Systems and Computability”, with Post’s string rewriting systems a centerpiece.
But my favorite part of Minsky’s book was always the very last chapter: “Very Simple Bases for Computability”. And there on page 267 is Post’s tag system:
Minsky reports that “Post found this (00, 1101) problem ‘intractable’, and so did I, even with the help of a computer”. But then he adds, in a style very characteristic of the Marvin Minsky I knew for nearly 40 years: “Of course, unless one has a theory, one cannot expect much help from a computer (unless it has a theory)…” He goes on to say that “if the reader tries to study the behavior of 100100100100100100 without [the aid of a computer] he will be sorry”.
Well, I guess computers have gotten a lot faster since the early 1960s; for me now it’s trivial to determine that this case evolves to a 10-cycle after 47 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/PostTagSystem/Programs01.wl"];
ListStepPlot[Length /@ TSDirectEvolveList[Flatten[Table[{1, 0, 0}, 6]], 90],
 Filling -> Axis, Frame -> True, AspectRatio -> 1/3, PlotStyle -> Hue[0.07, 1, 1]]
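The same computation can be sketched in Python for readers who want to experiment outside the Wolfram Language (the function names here are just for illustration):

```python
def tag_step(s):
    """One direct step of Post's tag system: read the first symbol,
    delete three symbols, then append 00 (for 0) or 1101 (for 1)."""
    return s[3:] + ("00" if s[0] == "0" else "1101")

def find_cycle(s, max_steps=100000):
    """Iterate until the state repeats or the system halts; return
    (transient length, cycle length), with cycle length 0 on halting."""
    seen = {}
    step = 0
    while len(s) >= 3 and s not in seen and step < max_steps:
        seen[s] = step
        s = tag_step(s)
        step += 1
    if len(s) < 3:
        return step, 0          # halted: too few symbols to continue
    if s in seen:
        return seen[s], step - seen[s]
    raise RuntimeError("no cycle found within max_steps")

transient, period = find_cycle("100" * 6)
```

Running this on the 100100100100100100 case reproduces the transient of 47 steps and the 10-cycle described above.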
(By the way, I recently asked Martin Davis if Post had ever run a tag system on a computer. He responded: “Goodness! When Post died von Neumann still thought that a dozen computers should suffice for America’s needs. I guess I could have programmed [the tag system] for the [Institute for Advanced Study] computer, but it never occurred to me to do so.” Notably, in 1954 Davis did start programming logic theorem-proving algorithms on that computer.)
After their appearance in Minsky’s book, tag systems became “known”, but they hardly became famous, and only a very few papers appeared about them. In 1972, at least their name got some visibility, when Alan Cobham, a longtime IBMer then working on coding theory, published a paper entitled “Uniform Tag Sequences”. Yes, this was about tag systems, but now with just one element being deleted at each step, which meant there couldn’t really be any interaction between elements. The mathematics was much more tractable (this was one of several inventions of neighbor-independent substitution systems generating purely nested behavior), but it didn’t really say anything about Post’s “problem of tag”.
When I started working on A New Kind of Science in 1991 I wanted to explore the computational universe of simple programs as widely as I could—to find out just how general (or not) the surprising phenomena I’d seen in cellular automata in the 1980s actually were. And almost from the beginning in the table of contents for my chapter on “The World of Simple Programs”, nestled between substitution systems and register machines, were tag systems (I had actually first mentioned tag systems in a paper in 1985):
In the main text, I only spent two pages on them:
And I did what I have done so many times for so many kinds of systems: I searched and found remarkably simple rules that generate complex behavior. And then on these pages I showed my favorite examples. (I generalized Post’s specific tag systems by allowing dependence on more than just the first element.)
Did I look at Post’s specific 00, 1101 system? A New Kind of Science includes the note:
And, yes, it mentions Post’s 00, 1101 tag system, then comments that “at least for all the initial conditions up to length 28, the rule eventually just leads to behavior that repeats”. An innocuous-looking statement, in very small print, tucked at the back of my very big book. But like so many such statements in the book, there was quite a lot behind it. (By the way, “length 28” then is what I would consider [compressed] length 9 now.)
A quick search of my filesystem reveals (.ma is an earlier format for notebooks that, yes, we can still read over a third of a century later):
I open one of the notebook files (and, yes, windows—and screens—were tiny in those days):
And there it is! Post’s 00, 1101 tag system, along with many others I was studying. And it seems I couldn’t let go of this; in 1994 I was running a standalone program to try to find infinitely growing cases. Here’s the output:
So that’s where I got my statement about “up to size 28” (now size 9) from. I don’t know how long this took to run; “pyrethrum” was at the time the fastest computer at our company—with a newfangled 64-bit CPU (a DEC Alpha) running at the now-snail-sounding clock speed of 150 MHz.
My archives from the early 1990s record a fair amount of additional “traffic” about tag systems. Interactions with Marvin Minsky. Interactions with my then-research-assistant about what I ended up calling “cyclic tag systems” (I originally called them “cyclic substitution systems”).
For nearly 15 years there’s not much. That is, until June 25, 2007. It’s been my tradition since we started our Wolfram Summer School back in 2003 that on the first day I do a “live experiment”, and try to discover something. Well, that day I decided to look at tag systems. Here’s how I began:
Right there, it’s Post’s 00, 1101 system. And I think I took it further than I’d ever done before. Pretty soon I was finding “long survivors” (I even got one that lasted more than 200,000 steps):
I was drawing state transition graphs:
But I obviously decided that I couldn’t get further with the 00, 1101 system that day. So I turned to “variants” and quickly found the 2-element-deletion 1, 110 rule that I’ve described above.
I happened to write a piece about this particular live experiment (“Science: Live and in Public”), and right then I made a mental note: let me look at Post’s tag system again before its centenary, in 2021. So here we are….
Emil Post didn’t manage to crack his 00, 1101 tag system back in 1921 with hand calculations. We might imagine that a century later—with the equivalent of tens of billions of times more computational power—we’d be able to do it. But so far I haven’t managed it.
For Post, the failure to crack his system derailed his whole intellectual worldview. For me now, the failure to crack Post’s system in a sense just bolsters my worldview—providing yet more indication of the strength and ubiquity of computational irreducibility and the Principle of Computational Equivalence.
After spending several weeks throwing hundreds of modern computers and all sorts of computational methods at Post’s 00, 1101 tag system, what do we know? Here’s a summary:
What’s missing here? Post wanted to know whether the system would halt, and so do we. But now the Principle of Computational Equivalence makes a definite prediction. It predicts that the system should be capable of universal computation. And this basically has the implication that the system can’t always halt: there has to be some initial string that will make it grow forever.
In natural science it’s standard for theories to make predictions that can be investigated by doing experiments in the physical world. But the kind of predictions that the Principle of Computational Equivalence makes are more general; they’re not just about particular systems in the natural world, but about all possible abstract systems, and in a sense all conceivable universes. But it’s still possible to do experiments about them, though the experiments are now not physical ones, but abstract ones, carried out in the computational universe of possible programs.
And with Post’s tag system we have an example of one particular such experiment: can we find nonhalting behavior that will validate the prediction that the system can support universal computation? To do so would be another piece of evidence for the breadth of applicability of the Principle of Computational Equivalence.
But what’s going to be involved in doing it? Computational irreducibility tells us that we can’t know.
Traditional mathematical science has tended to make the assumption that once you know an abstract theory for something, then you can work out anything you want about it. But computational irreducibility shows that isn’t true. And in fact it shows how there are fundamental limitations to science that intrinsically arise from within science itself. And our difficulty in analyzing Post’s tag system is in a sense just an “in your face” example of how strong these limitations can be.
But the Principle of Computational Equivalence says that somewhere we’ll see nonhalting behavior. It doesn’t tell us exactly what that behavior will be like, or how difficult it’ll be for us to interpret what we see. But it says that the “simple conclusion” of “always halting” shouldn’t continue forever.
I’ve so far done nearly a quintillion iterations of Post’s tag system in all. But that hasn’t been enough. I’ve been able to optimize the computations a bit. But fundamentally I’ve been left with what seems to be raw computational irreducibility. And to make progress I seem to need more time and more computers.
Will a million of today’s computers be enough? Will it take a billion? I don’t know. Maybe it requires a new level of computational speed. Maybe to resolve the question requires more steps of computation than the physical universe has ever done. I don’t know for sure. But I’m optimistic that it’s within the current computational capabilities of the world to find that little string of bits for the tag system that will allow us to see more about the general Principle of Computational Equivalence and what it predicts.
In the future there will be ever more that we will want and need to explore in the computational universe. And in a sense the problem of tag is a dry run for the kinds of things that we will see more and more often. But with the distinction of a century of history it’s a good place to rally our efforts and learn more about what’s involved.
So far it’s only been my computers that have been working on this. But we’ll be setting things up so that anyone can join the project. I don’t know if it’ll get solved in a month, a year or a century. But with the Principle of Computational Equivalence as my guide I’m confident there’s something interesting to discover. And a century after Emil Post defined the problem I, for one, want to see it resolved.
The main tag-system-related functions used are in the Wolfram Function Repository, as TagSystemEvolve, TagSystemEvolveList, TagSystemConvert, CyclicTagSystemEvolveList.
A list of t steps in the evolution of the tag system from an (uncompressed) initial list init can be achieved with
TagSystemEvolveList[init_List, t_Integer] :=
 With[{ru = Dispatch[{{0, _, _, s___} -> {s, 0, 0}, {1, _, _, s___} -> {s, 1, 1, 0, 1}}]},
  NestList[Replace[ru], init, t]]
or
TagSystemEvolveList[init_List, t_Integer] :=
 NestWhileList[Join[Drop[#, 3], {{0, 0}, {1, 1, 0, 1}}[[1 + First[#]]]] &,
  init, Length[#] >= 3 &, 1, t]
giving for example:
TagSystemEvolveList[{1, 0, 0, 1, 0}, 4]
The list of lengths can be obtained from
TagSystemLengthList[init_List, t_Integer] :=
 Reap[NestWhile[
    (Sow[Length[#]]; #) &[Join[Drop[#, 3], {{0, 0}, {1, 1, 0, 1}}[[1 + First[#]]]]] &,
    init, Length[#] >= 3 &, 1, t]][[2, 1]]
giving for example:
TagSystemLengthList[{1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0}, 25]
The output from t steps of evolution can be obtained from:
TagSystemEvolve[init_List, t_Integer] :=
 NestWhile[Join[Drop[#, 3], {{0, 0}, {1, 1, 0, 1}}[[1 + First[#]]]] &,
  init, Length[#] >= 3 &, 1, t]
A version of this using a low-level queue data structure is:
TagSystemEvolve[init_List, t_Integer] :=
 Module[{q = CreateDataStructure["Queue"]},
  Scan[q["Push", #] &, init];
  Do[If[q["Length"] >= 3,
    Scan[q["Push", #] &, If[q["Pop"] == 0, {0, 0}, {1, 1, 0, 1}]];
    Do[q["Pop"], 2]], t];
  Normal[q]]
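The queue-based approach can also be sketched in Python with collections.deque (a transcription for illustration, not the Wolfram Language original):

```python
from collections import deque

def tag_evolve_queue(init, t):
    """Run up to t steps of the 00/1101 tag system on a FIFO queue,
    so each step is O(1) amortized rather than copying the whole list."""
    q = deque(init)
    for _ in range(t):
        if len(q) < 3:        # fewer than 3 symbols left: the system halts
            break
        first = q.popleft()   # read (and remove) the head symbol
        q.popleft()
        q.popleft()           # remove two more symbols
        q.extend([0, 0] if first == 0 else [1, 1, 0, 1])  # append the block
    return list(q)
```

Since symbols are only ever removed from the front and appended at the back, a double-ended queue avoids the repeated list copying of the naive version.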
The compressed {p, values} form of a tag system state can be obtained with
TagSystemCompress[list_] := {Mod[Length[list], 3], Take[list, 1 ;; -1 ;; 3]}
while an uncompressed form can be recovered with:
TagSystemUncompress[{p_, list_}, pad_ : 0] :=
 Join[Riffle[list, Splice[{pad, pad}]], Table[pad, <|0 -> 2, 1 -> 0, 2 -> 1|>[p]]]
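In Python, the compression and its (padded) inverse can be sketched as follows (hypothetical helper names; only every third element is ever read as the head symbol, so the skipped positions can be filled with an arbitrary pad):

```python
def compress(state):
    """Compressed {p, values} form of a tag system state:
    phase p = length mod 3, plus every third element."""
    return len(state) % 3, state[::3]

def uncompress(p, values, pad=0):
    """Rebuild an uncompressed state, filling never-read positions with pad."""
    out = []
    for v in values:
        out += [v, pad, pad]        # each kept value followed by two pads
    out = out[:len(out) - 2]        # drop the two pads after the last value
    out += [pad] * {0: 2, 1: 0, 2: 1}[p]   # trailing pads restore length mod 3
    return out
```

Uncompressing and recompressing is an exact round trip, since the pad positions are discarded again by compression.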
Each step in evolution in compressed form is obtained from
TagSystemCompressedStep[{p_, {s_, r___}}] :=
 Apply[{#1, Join[{r}, #2]} &,
  <|{0, 0} -> {2, {0}}, {1, 0} -> {0, {}}, {2, 0} -> {1, {0}},
    {0, 1} -> {1, {1, 1}}, {1, 1} -> {2, {0}}, {2, 1} -> {0, {1}}|>[{p, s}]]
or:
TagSystemCompressedStep[list : {_Integer, _List}] :=
 Replace[list, {{0, {0, s___}} -> {2, {s, 0}}, {1, {0, s___}} -> {0, {s}},
   {2, {0, s___}} -> {1, {s, 0}}, {0, {1, s___}} -> {1, {s, 1, 1}},
   {1, {1, s___}} -> {2, {s, 0}}, {2, {1, s___}} -> {0, {s, 1}}}]
The largest-scale computations done here made use of further-optimized code (available in the Wolfram Function Repository), in which the state of the tag system is stored in a bit-packed array, with 8 updates being done at a time by having a table of results for all 256 cases and using the first byte of the bit-packed array to index into this. This approach routinely achieves a quarter billion updates per second on current hardware. (Larger update tables no longer fit in L1 cache and so typically do not help.)
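The table idea can be illustrated with a simplified Python sketch (the production code is Wolfram Language operating on bit-packed arrays; the names here are hypothetical). Each compressed step consumes one leading symbol, so 8 steps consume the leading byte, and as long as at least 8 symbols remain, none of the freshly appended symbols is consumed within the batch:

```python
from itertools import product

# Single compressed step: (phase, head symbol) -> (new phase, symbols appended)
RULE = {(0, 0): (2, [0]), (1, 0): (0, []), (2, 0): (1, [0]),
        (0, 1): (1, [1, 1]), (1, 1): (2, [0]), (2, 1): (0, [1])}

def step(p, vals):
    """One step on the compressed {p, values} state."""
    p2, out = RULE[(p, vals[0])]
    return p2, vals[1:] + out

# Precompute, for each phase and each possible leading byte (8 symbols),
# the net phase and everything appended over 8 consecutive steps.
TABLE = {}
for p in range(3):
    for byte in product([0, 1], repeat=8):
        q, out = p, []
        for bit in byte:
            q, o = RULE[(q, bit)]
            out += o
        TABLE[(p, byte)] = (q, out)

def step8(p, vals):
    """Eight steps at once via one table lookup (requires len(vals) >= 8)."""
    q, out = TABLE[(p, tuple(vals[:8]))]
    return q, vals[8:] + out
```

One table lookup thus replaces eight per-symbol dispatches, which is the essence of the byte-at-a-time speedup described above.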
As I’ve mentioned, there isn’t a particularly large literature on the specific behavior of tag systems. In 1963 Shigeru Watanabe described the basic families of cycles for Post’s 00, 1101 tag system (though he did not discover the “sporadic cases”). After A New Kind of Science in 2002, I’m aware of one extensive series of papers (partly using computer experiment methods) written by Liesbeth De Mol following her 2007 PhD thesis. Carlos Martin (a student at the Wolfram Summer School) also wrote about probabilistic methods for predicting tag system evolution.
Thanks to Max Piskunov and Mano Namuduri for help with tag system implementations, Ed Pegg for tag system analysis (and for joining me in some tag system “hunting expeditions”), Matthew Szudzik and Jonathan Gorard for clarifying metamathematical issues, and Catherine Wolfram for help on the theory of random walks. Thanks also to Martin Davis and Margaret Minsky for clarifying some historical issues (and Dana Scott for having also done so long ago).
We’re in the process of setting up a distributed computing project to try to answer Emil Post’s 100yearold tag system question. Let us know if you’d like to get involved….
Wolfram Physics Bulletin
Informal updates and commentary on progress in the Wolfram Physics Project
Over the years I’ve studied the simplest ordinary Turing machines quite a bit, but I’ve barely looked at multiway Turing machines (also known as nondeterministic Turing machines or NDTMs). Recently, though, I realized that multiway Turing machines can be thought of as “maximally minimal” models both of concurrent computing and of the way we think about quantum mechanics in our Physics Project. So now this piece is my attempt to “do the obvious explorations” of multiway Turing machines. And as I’ve found so often in the computational universe, even cases with some of the very simplest possible rules yield some significant surprises....
An ordinary Turing machine has a rule such as
RulePlot[TuringMachine[2506]]
that specifies a unique successor for each configuration of the system (here shown going down the page starting from an initial condition consisting of a blank tape):
RulePlot[TuringMachine[2506], {{1, 6}, Table[0, 10]}, 20, Mesh -> True, Frame -> False]
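As a minimal illustration of that determinism, here is a generic Turing machine step loop in Python, with a made-up 2-state, 2-color rule (not the actual rule of machine 2506):

```python
def tm_run(rule, state, tape, pos, steps):
    """Run a deterministic Turing machine: rule maps (state, color) to a
    unique (new state, new color, head offset), so every configuration has
    exactly one successor. The tape is a dict from position to color,
    with unwritten cells treated as color 0."""
    for _ in range(steps):
        color = tape.get(pos, 0)
        state, new_color, offset = rule[(state, color)]
        tape[pos] = new_color
        pos += offset
    return state, tape, pos

# A made-up 2-state, 2-color rule table (purely illustrative):
rule = {(1, 0): (2, 1, +1), (1, 1): (1, 0, -1),
        (2, 0): (1, 1, -1), (2, 1): (2, 0, +1)}
```

Because the rule is a function of the configuration, the evolution down the page is a single deterministic thread—exactly what a multiway (nondeterministic) machine relaxes.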
Any serious calculation in particle physics takes a lot of algebra. Maybe it doesn’t need to. But with the methods based on Feynman diagrams that we know so far, it does. And in fact it was these kinds of calculations that first led me to use computers for symbolic computation. That was in 1976, which by now is a long time ago. But actually the idea of doing Feynman diagram calculations by computer is even older.
So far as I know it all started from a single conversation on the terrace outside the cafeteria of the CERN particle physics lab near Geneva in 1962. Three physicists were involved. And out of that conversation there emerged three early systems for doing algebraic computation. One was written in Fortran. One was written in LISP. And one was written in assembly language.
I’ve told this story quite a few times, often adding “And which of those physicists do you suppose later won a Nobel Prize?” “Of course,” I explain, “it was the one who wrote their system in assembly language!”
That physicist was Martinus (Tini) Veltman, who died a couple of weeks ago, and who I knew for several decades. His system was called SCHOONSCHIP, and he wrote the first version of it in IBM 7000 series assembly language. A few years later he rewrote it in CDC 6000 series assembly language.
The emphasis was always on speed. And for many years SCHOONSCHIP was the main workhorse system for doing very largescale Feynman diagram calculations—which could take months of computer time.
Back in the early 1960s when SCHOONSCHIP was first written, Feynman diagrams—and the quantum field theory from which they came—were out of fashion. Feynman diagrams had been invented in the 1940s for doing calculations in quantum electrodynamics (the quantum theory of electrons and photons)—and that had gone well. But attention had turned to the strong interactions which hold nuclei together, and the weak interactions responsible for nuclear beta decay, and in neither case did Feynman diagrams seem terribly useful.
There was, however, a theory of the weak interactions that involved asyetunobserved “intermediate vector bosons” (that were precursors of what we now call W particles). And in 1961—as part of his PhD thesis—Tini Veltman took on the problem of computing how photons would interact with intermediate vector bosons. And for this he needed elaborate Feynman diagram calculations.
I’m not sure if Tini already knew how to program, or whether he learned it for the purpose of creating SCHOONSCHIP—though I do know that he’d been an electronics buff since childhood.
I think I was first exposed to SCHOONSCHIP in 1976, and I used it for a few calculations. In my archives now, I can find only a single example of running it: a sample calculation someone did for me, probably in 1978, in connection with something I was writing (though never published):
By modern standards it looks a bit obscure. But it’s a fairly typical “oldstyle line printer output”. There’s a version of the input at the top. Then some diagnostics in the middle. And then the result appears at the bottom. And the system reports that this took .12 seconds to generate.
This particular result is for a very simple Feynman diagram involving the interaction of a photon and an electron—and involves just 9 terms. But SCHOONSCHIP could handle results involving millions of terms too (which allowed computations in QED to be done to 8digit precision).
Within days after finishing my PhD in theoretical physics at Caltech in November 1979, I flew to Geneva, Switzerland, to visit CERN for a couple of weeks. And it was during that visit that I started designing SMP (“Symbolic Manipulation Program”)—the system that would be the forerunner of Mathematica and the Wolfram Language.
And when I mentioned what I was doing to people at CERN they said “You should talk to Tini Veltman”.
And so it was that in December 1979 I flew to Amsterdam, and went to see Tini Veltman. The first thing that struck me was how incongruous the name “Tini” (pronounced “teeny”) seemed. (At the time, I didn’t even know why he was called Tini; I’d only seen his name as “M. Veltman”, and didn’t know “Tini” was short for “Martinus”.) But Tini was a large man, with a large beard—not “teeny” at all. He reminded me of pictures of European academics of old—and, for some reason, particularly of Ludwig Boltzmann.
He was 48 years old; I was 20. He was definitely a bit curious about the “newfangled computer ideas” I was espousing. But generally he took on the mantle of an elder statesman who was letting me in on the secrets of how to build a computer system like SCHOONSCHIP.
I pronounced SCHOONSCHIP “scoonship”. He got a twinkle in his eye, and corrected it to a very guttural “scohwnscchhip” (IPA: [sxon][sxɪp]), explaining that, yes, he’d given it a Dutch name that was hard for non-Dutch people to say. (The Dutch word “schoonschip” means, roughly, “shipshape”—like SCHOONSCHIP was supposed to make one’s algebraic expressions.) Everything in SCHOONSCHIP was built for efficiency. The commands were short. Tini was particularly pleased with YEP for generating intermediate results, which, he said, was mnemonic in Dutch (yes, SCHOONSCHIP may be the only computer system with keywords derived from Dutch).
If you look at the sample SCHOONSCHIP output above, you might notice something a little strange in it. Every number in the (exact) algebraic expression result that’s generated has a decimal point after it. And there’s even a +0. at the end. What’s going on there? Well, that was one of the big secrets Tini was very keen to tell me.
“Floating-point computation is so much faster than integer”, he said. “You should do everything you can in floating point. Only convert it back to exact numbers at the end.” And, yes, it was true that the scientific computers of the time—like the CDC machines he used—had very much been optimized for floating-point arithmetic. He quoted instruction times. He explained that if you do all your arithmetic in “fast floating point”, and then get a 64-bit floating point number out at the end, you can always reverse engineer what rational number it was supposed to be—and that it’s much faster to do this than to keep track of rational numbers exactly through the computation.
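The reverse-engineering idea can be illustrated with Python’s standard library (this is just the principle, not SCHOONSCHIP’s actual mechanism): find the simplest rational within floating-point error of the result.

```python
from fractions import Fraction

# Accumulate in fast floating point...
x = 1/3 + 1/7 - 1/21     # exactly 3/7 in exact arithmetic

# ...then recover the intended rational at the end: limit_denominator
# finds the closest fraction with a bounded denominator, which for a
# result within ~1e-16 of a small rational is that rational itself.
recovered = Fraction(x).limit_denominator(10**6)
```

This works because distinct fractions with modest denominators are far apart compared to accumulated floating-point error, so the nearest simple fraction is unambiguous.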
It was a neat hack. And I bought it. And in fact when we implemented SMP its default “exact” arithmetic worked exactly this way. I’m not sure if it really made computations more efficient. In the end, it got quite tangled up with the rather abstract and general design of SMP, and became something of a millstone. But actually we use somewhat similar ideas in modern Wolfram Language (albeit now with formally verified interval arithmetic) for doing exact computations with things like algebraic numbers.
I’m not sure I ever talked much to Tini about the content of physics; somehow we always ended up discussing computers (or the physics community) instead. But I certainly made use of Tini’s efforts to streamline not just the computer implementation but also the underlying theory of Feynman diagram calculation. I expect—as I have seen so often—that his efforts to streamline the underlying theory were driven by thinking about things in computational terms. But I made particular use of the “Diagrammar” he produced in 1972:
One of the principal methods that was introduced here was what’s called dimensional regularization: the concept of formally computing results in ddimensional space (with d a continuous variable), then taking the limit d → 4 at the end. It’s an elegant approach, and when I was doing particle physics in the late 1970s, I became quite an enthusiast of it. (In fact, I even came up with an extension of it—based on looking at the angular structure of Gegenbauer functions as ddimensional spherical functions—that was further developed by Tony Terrano, who worked with me on Feynman diagram computation, and later on SMP.)
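For orientation, the prototypical dimensional-regularization result (a standard textbook formula, quoted here in Euclidean form) is:

```latex
% One-loop scalar integral, continued to d dimensions:
\int \frac{d^d\ell}{(2\pi)^d}\,\frac{1}{(\ell^2+\Delta)^n}
  \;=\; \frac{1}{(4\pi)^{d/2}}\,
        \frac{\Gamma\!\left(n-\tfrac{d}{2}\right)}{\Gamma(n)}\,
        \Delta^{\,d/2-n}
% The would-be divergences show up as poles of the Gamma function as
% d -> 4: e.g. for n = 2, Gamma(2 - d/2) ~ 2/(4 - d).
```

The ultraviolet infinities of the original 4-dimensional integral thus become explicit poles in d, which renormalization can then systematically absorb.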
Back in the 1970s, “continuing to d dimensions” was just thought of as a formal trick. But, curiously enough, in our Physics Project, where dimension is an emergent property, one’s interested in “genuinely ddimensional” space. And, quite possibly, there are experimentally observable signatures of d ≠ 3 dimensions of space. And in thinking about that, quite independent of Tini, I was just about to pull out my copy of “Diagrammar” again.
Tini’s original motivation for writing SCHOONSCHIP had been a specific calculation involving the interaction of photons and putative “intermediate vector bosons”. But by the late 1960s, there was an actual theoretical candidate for what the “intermediate vector boson” might be: a “gauge boson” basically associated with the “gauge symmetry group” SU(2)—and given mass through “spontaneous symmetry breaking” and the “Higgs mechanism”.
But what would happen if one did Feynman diagram calculations in a “gauge theory” like this? Would they have the renormalizability property that Richard Feynman had identified in QED, and that allowed one to not worry about infinities that were nominally generated in calculations? Tini Veltman wanted to figure this out, and soon suggested the problem to his student Gerard ’t Hooft (his coauthor on “Diagrammar”).
Tini had defined the problem and formalized what was needed, but it was ’t Hooft who figured out the math and in the end presented a rather elaborate proof of renormalizability of gauge theories in his 1972 PhD thesis. It was a major and much-heralded result—providing what was seen as key theoretical validation for the first part of what became the Standard Model of particle physics. And it launched ’t Hooft’s career.
Tini Veltman always gave me the impression of someone who wanted to interact—and collaborate—with people. Gerard ’t Hooft has always struck me as being more in the “lone wolf” model of doing physics. I’ve interacted with Gerard from time to time for years (in fact I first met him several years before I met Tini). And it’s been very impressive to see him invent a long sequence of some of the most creative ideas in physics over the past half century. And though it’s not my focus here, I should mention that Gerard got interested in cellular automata in the late 1980s, and a few years ago even wrote a book called The Cellular Automaton Interpretation of Quantum Mechanics. I’d never quite understood what he was talking about, and I suspected that—despite his use of Mathematica—he’d never explored the computational universe enough to develop a true intuition for what goes on there. But actually, quite recently, it looks as if there’s a limiting case of our Physics Project that may just correspond to what Gerard has been talking about—which would be very cool…
But I digress. Starting in 1966 Tini was a professor at Utrecht. And in 1974 Gerard became a professor there too. And even by the time I met Tini in 1979 there were already rumors of a falling-out. Gerard was reported as saying that Tini didn’t understand stuff. Tini was reported as saying that Gerard was “a monster”. And then there was the matter of the Nobel Prize.
As the Standard Model gained momentum, and was increasingly validated by experiments, the proof of renormalizability of gauge theories started to seem more and more like it would earn a Nobel Prize. But who would actually get the prize? Gerard was clear. But what about Tini? There were rumors of letters arguing one way and the other, and stories of scurrilous campaigning.
Prizes are always a complicated matter. They’re usually created to incentivize something, though realistically they’re often as much as anything for the benefit of the giver. But if they’re successful, they tend to come to represent objectives in themselves. Years ago I remember the wife of a well-known physicist advising me to “do something you can win a prize for”. It didn’t make sense to me, and then I realized why. “I want to do things”, I said, “for which nobody’s thought to invent a prize yet”.
Well, the good news is that in 1999, the Nobel Committee decided to award the Nobel Prize to both Gerard and Tini. “Thank goodness” was the general sentiment.
When I visited Tini in Utrecht in 1979 I got the impression that he and his family were very deeply rooted in the Netherlands and would always be there. I knew that Tini had spent time at CERN, and I think I vaguely knew that he’d been quite involved with the neutrino experiments there. But I didn’t know that SCHOONSCHIP wasn’t originally written when Tini was at Utrecht or at CERN: despite the version in the CERN Program Library saying it was “Written in 1967 by M. Veltman at CERN”, the first version was actually written in 1963, right in the heart of what would become Silicon Valley, during the time Tini worked at the then-very-new Stanford Linear Accelerator Center.
He’d gone there along with John Bell (of Bell’s inequalities fame), whose “day job” was working on theoretical aspects of neutrino experiments. (Thinking about the foundations of quantum mechanics was not well respected by other physicists at the time.) Curiously, another person at Stanford at the time was Tony Hearn, who was one of the physicists in the discussion on the terrace at CERN. But unlike Tini, he fell into the computer science and John McCarthy orbit at Stanford, and wrote his REDUCE program in LISP.
By the way, in piecing together the story of Tini’s life and times, I just discovered another “small world” detail. It turns out back in 1961 an early version of the intermediate boson calculations that Tini was interested in had been done by two famous physicists, T. D. Lee and C. N. Yang—with the aid of a computer. And they’d been helped by a certain Peter Markstein at IBM—who, along with his wife Vicky Markstein, would be instrumental nearly 30 years later in getting Mathematica to run on IBM RISC systems. But in any case, back in 1961, Lee and Yang apparently wouldn’t give Tini access to the programs Peter Markstein had created—which was why Tini decided to make his own, and to write SCHOONSCHIP to do it.
But back to the main story. I suspect it was a result of the rift with Gerard ’t Hooft that in 1980, at the age of 50, Tini transplanted himself and his family from Utrecht to the University of Michigan in Ann Arbor. He spent quite a bit of his time at Fermilab near Chicago, in and around neutrino experiments.
But it was in Michigan that I had my next major interaction with Tini. I had started building SMP right after I saw Tini in 1979—and after a somewhat tortuous effort to choose between CERN and Caltech—I had accepted a faculty position at Caltech. In early 1981 Version 1.0 of SMP was released. And in the effort to figure out how to develop it further—with the initial encouragement of Caltech—I ended up starting my first company. But soon (through a chain of events I’ve described elsewhere) Caltech had a change of heart, and in June 1982 I decided I was going to quit Caltech.
I wrote to Tini—and, somewhat to my surprise, he quickly began to aggressively try to recruit me to Michigan. He wrote me asking what it would take to get me there. Third on his list was the type of position, “research or faculty”. Second was “salary”. But first was “computers”—adding parenthetically “I understand you want a VAX; this needs some detailing”. The University of Michigan did indeed offer me a nice professorship, but—choosing among several possibilities—I ended up going to the Institute for Advanced Study in Princeton, with the result that I never had the chance to interact with Tini at close quarters.
A few years later, I was working on Mathematica, and what would become the Wolfram Language. And, no, I didn’t use a floating-point representation for algebraic coefficients again. Mathematica 1.0 was released in 1988, and shortly after that Tini told me he was writing a new version of SCHOONSCHIP, in a different language. “What language?”, I asked. “68000 assembler”, he said. “You can’t be serious!” I said. But he was, and soon thereafter a new version of SCHOONSCHIP appeared, written in 68000 assembler.
I think Tini somehow never really fully trusted anything higher level than assembler—proudly telling me things he could do by writing right down “at the metal”. I talked about portability. I talked about compiler optimizers. But he wasn’t convinced. And at the time, perhaps he was still correct. But just last week, for example, I got the latest results from benchmarking the symbolic compiler that we have under development for the Wolfram Language: the compiled versions of some pieces of top-level code run 30x faster than custom-written C code. Yes, the machine is probably now smarter even than Tini at being able to create fast code.
At Michigan, alongside his more directly experimentally related work (which, as I now notice, even included a paper related to a particle physics result of mine from 1978), Tini continued his longtime interest in Feynman diagrams. In 1989, he wrote a paper called “Gammatrica”, about the Dirac gamma matrix computations that are the core of many Feynman diagram calculations. And then in 1994 a textbook called Diagrammatica—kind of like “Diagrammar” but with a Mathematica-rhyming ending.
Tini didn’t publish all that many papers but spent quite a bit of time helping set directions for the US particle physics community. Looking at his list of publications, though, one that stands out is a 1991 paper written in collaboration with his daughter Hélène, who had just got her physics PhD at Berkeley (she subsequently went into quantitative finance): “On the Possibility of Resonances in Longitudinally Polarized Vector Boson Scattering”. It’s a nice paper, charmingly “resonant” with things Tini was thinking about in 1961, even comparing the interactions of W particles with interactions between pions of the kind that were all the rage in 1961.
Tini retired from Michigan in 1996, returned to the Netherlands and set about building a house. The long-awaited Nobel Prize arrived in 1999.
In 2003 Tini published a book, Facts and Mysteries in Elementary Particle Physics, presenting particle physics and its history for a general audience. Interspersed through the book are one-page summaries of various physicists—often with charming little “gossip” tidbits that Tini knew from personal experience, or picked up from his time in the physics community.
One such page describing some experimental physicists ends:
“The CERN terrace, where you can see the Mont Blanc on the horizon, is very popular among high-energy physicists. You can meet there just about everybody in the business. Many initiatives were started there, and many ideas were born in that environment. So far you can still smoke a cigar there.”
The page has a picture, taken in June 1962—that I rather imagine must mirror what that “symbolic computation origin discussion” looked like (yes, physicists wore ties back then):
Just after Tini won the Nobel Prize, he ran into Rolf Mertig, who was continuing Tini’s tradition of Feynman diagram computation by creating the FeynCalc system for the Wolfram Language. Tini apparently explained that had he not gone into physics, he would have gone into “business”.
I’m not sure if it was before Mathematica 1.0 or after, but I remember Tini telling me that he thought that maybe he should get into the software business. I think Tini felt in some ways frustrated with physics. I remember when I first met him back in 1979 he spent several hours telling me about issues at CERN. One of the important predictions of what became the Standard Model was so-called neutral currents (associated with the Z boson). In the end, neutral currents were discovered in 1973. But Tini explained that many years earlier he had started telling people at CERN that they should be able to see neutral currents in their experiments. But for years they didn’t listen to him, and when they finally did, it turned out that—expensive as their earlier experiments had been—they’d thrown out the bubble chamber film that had been produced, and on which neutral currents should have been visible perhaps 15 years earlier.
When Tini won his Nobel Prize, I sent him a congratulations card. He sent a slightly stiff letter in response:
When Tini met me in 1979, I’m not sure he expected me to just take off and ultimately build something like Mathematica. But his input—and encouragement—back in 1979 was important in giving me the confidence to start down that road. So, thanks Tini for all you did for me, and for the advice—even though your PS advice I think I still haven’t taken….
When we released Version 12.1 in March of this year, I was pleased to be able to say that with its 182 new functions it was the biggest .1 release we’d ever had. But just nine months later, we’ve got an even bigger .1 release! Version 12.2, launching today, has 228 completely new functions!
We always have a portfolio of development projects going on, with any given project taking anywhere from a few months to more than a decade to complete. And of course it’s a tribute to our whole Wolfram Language technology stack that we’re able to develop so much, so quickly. But Version 12.2 is perhaps all the more impressive for the fact that we didn’t concentrate on its final development until midJune of this year. Because between March and June we were concentrating on 12.1.1, which was a “polishing release”. No new features, but more than a thousand outstanding bugs fixed:
How did we design all those new functions and new features that are now in 12.2? It’s a lot of work! And it’s what I personally spend a lot of my time on (along with other “small items” like physics, etc.). But for the past couple of years we’ve done our language design in a very open way—livestreaming our internal design discussions, and getting all sorts of great feedback in real time. So far we’ve recorded about 550 hours—of which Version 12.2 occupied at least 150 hours.
By the way, in addition to all of the fully integrated new functionality in 12.2, there’s also been significant activity in the Wolfram Function Repository—and even since 12.1 was released 534 new, curated functions for all sorts of specialized purposes have been added there.
There are so many different things in so many areas in Version 12.2 that it’s hard to know where to start. But let’s talk about a completely new area: biosequence computation. Yes, we’ve had gene and protein data in the Wolfram Language for more than a decade. But what’s new in 12.2 is the beginning of the ability to do flexible, general computation with bio sequences. And to do it in a way that fits in with all the chemical computation capabilities we’ve been adding to the Wolfram Language over the past few years.
Here’s how we represent a DNA sequence (and, yes, this works with very long sequences too):
BioSequence["DNA", "CTTTTCGAGATCTCGGCGTCA"] 
This translates the sequence to a peptide (like a “symbolic ribosome”):
BioSequenceTranslate[%] 
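The mechanics of that translation are simple to sketch. Here's a toy version in Python (not the Wolfram implementation, and with a codon table cut down to just the codons that appear in this particular sequence):

```python
# Toy "ribosome": translate a DNA coding sequence into a peptide,
# reading three bases (one codon) at a time.
# Codon table deliberately restricted to the codons in the example.
CODON_TABLE = {
    "CTT": "L",  # leucine
    "TTC": "F",  # phenylalanine
    "GAG": "E",  # glutamate
    "ATC": "I",  # isoleucine
    "TCG": "S",  # serine
    "GCG": "A",  # alanine
    "TCA": "S",  # serine
}

def translate(dna):
    """Translate a DNA sequence, codon by codon, into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

print(translate("CTTTTCGAGATCTCGGCGTCA"))  # -> LFEISAS
```

The real function of course uses the full (and organism-appropriate) genetic code; this just shows the codon-by-codon idea.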
Now we can find out what the corresponding molecule is:
Molecule[%] 
And visualize it in 3D (or compute lots of properties):
MoleculePlot3D[%] 
I have to say that I agonized a bit about the “nonuniversality” of putting the specifics of “our” biology into our core language… but it definitely swayed my thinking that, of course, all our users are (for now) definitively eukaryotes. Needless to say, though, we’re set up to deal with other branches of life too:
Entity["GeneticTranslationTable", "AscidianMitochondrial"]["StartCodons"] 
You might think that handling genome sequences is “just string manipulation”—and indeed our string functions are now set up to work with bio sequences:
StringReverse[BioSequence["DNA", "CTTTTCGAGATCTCGGCGTCA"]] 
But there’s also a lot of biology-specific additional functionality. Like this finds a complementary base-pair sequence:
BioSequenceComplement[BioSequence["DNA", "CTTTTCGAGATCTCGGCGTCA"]] 
Actual, experimental sequences often have base pairs that are somehow uncertain—and there are standard conventions for representing this (e.g. “S” means C or G; “N” means any base). And now our string patterns also understand things like this for bio sequences:
StringMatchQ[BioSequence["DNA", "CTTT"], "STTT"] 
And there are new functions like BioSequenceInstances for resolving degenerate characters:
BioSequenceInstances[BioSequence["DNA", "STTT"]] 
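The idea behind resolving degenerate characters is just a product over the possibilities at each position. Here's a minimal Python sketch of the concept (again, not the built-in implementation, and with only a few of the standard IUPAC codes included):

```python
from itertools import product

# A few IUPAC degenerate DNA codes, mapped to the concrete bases they stand for
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "S": "CG",    # strong: C or G
         "W": "AT",    # weak: A or T
         "N": "ACGT"}  # any base

def instances(seq):
    """All concrete sequences consistent with a degenerate DNA sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in seq))]

print(instances("STTT"))  # -> ['CTTT', 'GTTT']
```

A sequence with k degenerate positions expands into the product of the options at each position, so the count can grow quickly (an "N" alone contributes a factor of 4).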
BioSequence is also completely integrated with our built-in genome and protein data. Here’s a gene that we can ask for in natural language “Wolfram|Alpha style”:
BioSequence[CloudGet["https://wolfr.am/ROWvGTNr"]] 
Now we ask to do sequence alignment between these two genes (in this case, both human—which is, needless to say, the default):
What’s in 12.2 is really just the beginning of what we’re planning for biosequence computation. But already you can do very flexible things with large datasets. And, for example, it’s now straightforward for me to read my genome in from FASTA files and start exploring it…
BioSequence["DNA", First[Import["Genome/Consensus/c1.fa.consensus.fa"]]] 
Locations of birds’ nests, gold deposits, houses for sale, defects in a material, galaxies…. These are all examples of spatial point datasets. And in Version 12.2 we now have a broad collection of functions for handling such datasets.
Here’s the “spatial point data” for the locations of US state capitals:
SpatialPointData[ GeoPosition[EntityClass["City", "UnitedStatesCapitals"]]] 
Since it’s geo data, it’s plotted on a map:
PointValuePlot[%] 
Let’s restrict our domain to the contiguous US:
capitals = SpatialPointData[ GeoPosition[EntityClass["City", "UnitedStatesCapitals"]], Entity["Country", "UnitedStates"]]; 
PointValuePlot[%] 
Now we can start computing spatial statistics. Like here’s the mean density of state capitals:
MeanPointDensity[capitals] 
Assume you’re in a state capital. Here’s the probability of finding the nearest other state capital within a certain distance:
NearestNeighborG[capitals] 
Plot[%[Quantity[r, "Miles"]], {r, 0, 400}] 
This tests whether the state capitals are randomly distributed; needless to say, they’re not:
SpatialRandomnessTest[capitals] 
In addition to computing statistics from spatial data, Version 12.2 can also generate spatial data according to a wide range of models. Here’s a model that picks “center points” at random, then has other points clustered around them:
PointValuePlot[ RandomPointConfiguration[MaternPointProcess[.0001, 1, .1, 2], CloudGet["https://wolfr.am/ROWwlIqR"]]] 
You can also go the other way around, and fit a spatial model to data:
EstimatedPointProcess[capitals, MaternPointProcess[\[Mu], \[Lambda], r, 2], {\[Mu], \[Lambda], r}] 
In some ways we’ve been working towards it for 30 years. We first introduced NDSolve back in Version 2.0, and we’ve been steadily enhancing it ever since. But our long-term goal has always been convenient handling of real-world PDEs of the kind that appear throughout high-end engineering. And in Version 12.2 we’ve finally got all the pieces of underlying algorithmic technology to be able to create a truly streamlined PDE-solving experience.
OK, so how do you specify a PDE? In the past, it was always done explicitly in terms of particular derivatives, boundary conditions, etc. But most PDEs used in engineering, for example, consist of higher-level components that “package together” derivatives, boundary conditions, etc. to represent features of physics, materials, etc.
The lowest level of our new PDE framework consists of symbolic “terms”, corresponding to common mathematical constructs that appear in real-world PDEs. For example, here’s a 2D “Laplacian term”:
LaplacianPDETerm[{u[x, y], {x, y}}] 
And now this is all it takes to find the first 5 eigenvalues of the Laplacian in a regular polygon:
NDEigenvalues[LaplacianPDETerm[{u[x, y], {x, y}}], u[x, y], {x, y} \[Element] RegularPolygon[5], 5] 
And the important thing is that you can put this kind of operation into a whole pipeline. Like here we’re getting the region from an image, solving for the 10th eigenmode, and then 3D plotting the result:
NDEigensystem[{LaplacianPDETerm[{u[x, y], {x, y}}]}, u[x, y], {x, y} \[Element] ImageMesh[CloudGet["https://wolfr.am/ROWwBtE7"]], 10][[2, 1]] 
Plot3D[%, {x, y} \[Element] ImageMesh[CloudGet["https://wolfr.am/ROWwGqjg"]]] 
In addition to LaplacianPDETerm, there are things like DiffusionPDETerm and ConvectionPDETerm that represent other terms that arise in real-world PDEs. Here’s a term for isotropic diffusion with unit diffusion coefficient:
DiffusionPDETerm[{\[Phi][x, y, z], {x, y, z}}] 
Beyond individual terms, there are also “components” that combine multiple terms, usually with various parameters. Here’s a Helmholtz PDE component:
HelmholtzPDEComponent[{u[x, y], {x, y}}, <|"HelmholtzEigenvalue" -> k|>]
By the way, it’s worth pointing out that our “terms” and “components” are set up to represent the symbolic structure of PDEs in a form suitable for structural manipulation and for things like numerical analysis. And to ensure that they maintain their structure, they’re normally kept in an inactivated form. But you can always “activate” them if you want to do things like algebraic operations:
Activate[%] 
In real-world PDEs, one’s often dealing with actual, physical processes taking place in actual physical materials. And in Version 12.2 we’ve got immediate ways to deal not only with things like diffusion, but also with acoustics, heat transfer and mass transport—and to feed in properties of actual materials. Typically the structure is that there’s a PDE “component” that represents the bulk behavior of the material, together with a variety of PDE “values” or “conditions” that represent boundary conditions.
Here’s a typical PDE component, using material properties from the Wolfram Knowledgebase:
HeatTransferPDEComponent[{\[CapitalTheta][t, x, y], t, {x, y}}, <|"Material" -> CloudGet["https://wolfr.am/ROWwUQai"]|>]
There’s quite a bit of diversity and complexity to the possible boundary conditions. For example, for heat transfer, there’s HeatFluxValue, HeatInsulationValue and five other symbolic boundary condition specification constructs. In each case, the basic idea is to say where (geometrically) the condition applies, then what it applies to, and what parameters relate to it.
So, for example, here’s a condition that specifies that there’s a fixed “surface temperature” θ0 everywhere outside the (circular) region defined by x^2 + y^2 = 1:
HeatTemperatureCondition[x^2 + y^2 > 1, {\[CapitalTheta][t, x, y], t, {x, y}}, <|"SurfaceTemperature" -> Subscript[\[Theta], 0]|>]
What’s basically happening here is that our high-level “physics” description is being “compiled” into explicit “mathematical” PDE structures—like Dirichlet boundary conditions.
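For reference, the standard "mathematical" forms being compiled to here: a Dirichlet condition fixes the value of the dependent variable on part of the boundary, while a Neumann-type condition (like a heat flux) fixes its normal derivative there:

```latex
\Theta(t,x,y) = \theta_0 \ \ \text{on } \partial\Omega_1
\qquad\qquad
-k\,\frac{\partial \Theta}{\partial n} = q \ \ \text{on } \partial\Omega_2
```

The symbolic boundary condition constructs let you say this at the level of the physics (temperatures, fluxes, insulation) rather than at the level of derivatives.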
OK, so how does all this fit together in a real-life situation? Let me show an example. But first, let me tell a story. Back in 2009 I was having tea with our lead PDE developer. I picked up a teaspoon and asked “When will we be able to model the stresses in this?” Our lead developer explained that there was quite a bit to build to get to that point. Well, I’m excited to say that after 11 years of work, in Version 12.2 we’re there. And to prove it, our lead developer just gave me… a (computational) spoon!
spoon = CloudGet["https://wolfr.am/ROWx6wKF"]; 
The core of the computation is a 3D diffusion PDE term, with a “diffusion coefficient” given by a rank-4 tensor parametrized by Young’s modulus (here Y) and Poisson ratio (ν):
pdeterm = DiffusionPDETerm[{{u[x, y, z], v[x, y, z], w[x, y, z]}, {x, y, z}}, Y/(1 + \[Nu]) {{{{(1 - \[Nu])/(1 - 2 \[Nu]), 0, 0}, {0, 1/2, 0}, {0, 0, 1/2}}, {{0, \[Nu]/(1 - 2 \[Nu]), 0}, {1/2, 0, 0}, {0, 0, 0}}, {{0, 0, \[Nu]/(1 - 2 \[Nu])}, {0, 0, 0}, {1/2, 0, 0}}}, {{{0, 1/2, 0}, {\[Nu]/(1 - 2 \[Nu]), 0, 0}, {0, 0, 0}}, {{1/2, 0, 0}, {0, (1 - \[Nu])/(1 - 2 \[Nu]), 0}, {0, 0, 1/2}}, {{0, 0, 0}, {0, 0, \[Nu]/(1 - 2 \[Nu])}, {0, 1/2, 0}}}, {{{0, 0, 1/2}, {0, 0, 0}, {\[Nu]/(1 - 2 \[Nu]), 0, 0}}, {{0, 0, 0}, {0, 0, 1/2}, {0, \[Nu]/(1 - 2 \[Nu]), 0}}, {{1/2, 0, 0}, {0, 1/2, 0}, {0, 0, (1 - \[Nu])/(1 - 2 \[Nu])}}}}, <|Y -> 10^9, \[Nu] -> 33/100|>];
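That rank-4 tensor is just the standard isotropic elasticity (stiffness) tensor of linear elasticity, written out in components. In the usual notation, with the Lamé parameters expressed through Young's modulus Y and Poisson ratio ν:

```latex
C_{ijkl} = \lambda\,\delta_{ij}\delta_{kl} + \mu\,(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}),
\qquad
\lambda = \frac{Y\nu}{(1+\nu)(1-2\nu)}, \quad \mu = \frac{Y}{2(1+\nu)}
```

Dividing each component by the overall prefactor Y/(1 + ν) gives exactly the entries above: the diagonal entries (λ + 2μ)(1 + ν)/Y = (1 − ν)/(1 − 2ν), the off-diagonal λ(1 + ν)/Y = ν/(1 − 2ν), and the shear entries μ(1 + ν)/Y = 1/2.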
There are boundary conditions to specify how the spoon is being held, and pushed. Then solving the PDE (which takes just a few seconds) gives the displacement field for the spoon
dfield = deformations = NDSolveValue[{pdeterm == {0, NeumannValue[1000, x <= 100], 0}, DirichletCondition[{u[x, y, z] == 0., v[x, y, z] == 0., w[x, y, z] == 0.}, x >= 100]}, {u, v, w}, {x, y, z} \[Element] spoon]; 
which we can then use to find how the spoon would deform:
Show[MeshRegion[Table[Apply[if, m], {m, MeshCoordinates[spoon]}, {if, deformations}] + MeshCoordinates[spoon], MeshCells[spoon, {2, All}]], Graphics3D[Style[spoon, LightGray]]]
PDE modeling is a complicated area, and I consider it to be a major achievement that we’ve now managed to “package” it as cleanly as this. But in Version 12.2, in addition to the actual technology of PDE modeling, something else that’s important is a large collection of computational essays about PDE modeling—altogether about 400 pages of detailed explanation and application examples, currently in acoustics, heat transfer and mass transport, but with many other domains to come.
The Wolfram Language is all about expressing yourself in precise computational language. But in notebooks you can also express yourself with ordinary text in natural language. But what if you want to display math in there as well? For 25 years we’ve had the infrastructure to do the math display—through our box language. But the only convenient way to enter the math is through Wolfram Language math constructs—that in some sense have to have computational meaning.
But what about “math” that’s “for human eyes only”? That has a certain visual layout that you want to specify, but that doesn’t necessarily have any particular underlying computational meaning that’s been defined? Well, for many decades there’s been a good way to specify such math, thanks to my friend Don Knuth: just use TeX. And in Version 12.2 we’re now supporting direct entry of TeX math into Wolfram Notebooks, both on the desktop and in the cloud. Underneath, the TeX is being turned into our box representation, so it structurally interoperates with everything else. But you can just enter it—and edit it—as TeX.
The interface is very much like the ctrl+= interface for Wolfram|Alpha-style natural language input. But for TeX (in a nod to standard TeX delimiters), it’s ctrl+$.
Type ctrl+$ and you get a TeX input box. When you’ve finished the TeX, just hit and it’ll be rendered:
Like with ctrl+=, if you click the rendered form, it’ll go back to text and you can edit again, just as TeX.
Entering TeX in text cells is the most common thing to want. But Version 12.2 also supports entering TeX in input cells:
What happens if you shift+enter evaluate? Your input will be treated as TraditionalForm, and at least an attempt will be made to interpret it. Though, of course, if you wrote “computationally meaningless math” that won’t work.
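For example, here's the kind of thing you might type into the TeX input box, purely for display (this happens to be the Cauchy integral formula):

```latex
\oint_\gamma \frac{f(z)}{z - z_0}\,dz = 2\pi i\, f(z_0)
```

Something like this renders nicely whether or not it would make sense as an evaluatable expression.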
Type Canvas[] and you’ll get a blank canvas to draw whatever you want:
Canvas[] 
We’ve worked hard to make the drawing tools as ergonomic as possible.
Applying Normal gives you graphics that you can then use or manipulate:
GraphicsGrid[Partition[Table[Rasterize[Rotate[Normal[%], \[Theta]], ImageSize -> 50], {\[Theta], 0, 2 Pi, .4}], UpTo[8]], ImageSize -> 500]
When you create a canvas, it can have any graphic as initial content—and it can have any background you want:
Canvas[Graphics[Style[Disk[], Opacity[.4, Red], EdgeForm[{Thick, Red}]]], Background -> GeoGraphics[Entity["MannedSpaceMission", "Apollo16"][EntityProperty["MannedSpaceMission", "LandingPosition"]]]]
On the subject of drawing anything, Version 12.2 has another new function: MoleculeDraw, for drawing (or editing) molecules. Start with the symbolic representation of a molecule:
Molecule[Entity["Chemical", "Caffeine"]] 
Now use MoleculeDraw to bring up the interactive molecule drawing environment, make an edit, and return the result:
It’s another molecule now:
Math has been a core use case for the Wolfram Language (and Mathematica) since the beginning. And it’s been very satisfying over the past third of a century to see how much math we’ve been able to make computational. But the more we do, the more we realize is possible, and the further we can go. It’s become in a sense routine for us. There’ll be some area of math that people have been doing by hand or piecemeal forever. And we’ll figure out: yes, we can make an algorithm for that! We can use the giant tower of capabilities we’ve built over all these years to systematize and automate yet more mathematics; to make yet more math computationally accessible to anyone. And so it has been with Version 12.2. A whole collection of pieces of “math progress”.
Let’s start with something rather cut and dried: special functions. In a sense, every special function is an encapsulation of a certain nugget of mathematics: a way of defining computations and properties for a particular type of mathematical problem or system. Starting from Mathematica 1.0 we’ve achieved excellent coverage of special functions, steadily expanding to more and more complicated functions. And in Version 12.2 we’ve got another class of functions: the Lamé functions.
Lamé functions are part of the complicated world of handling ellipsoidal coordinates; they appear as solutions to the Laplace equation in an ellipsoid. And now we can evaluate them, expand them, transform them, and do all the other kinds of things that are involved in integrating a function into our language:
Plot[Abs[LameS[3/2 + I, 3, z, 0.1 + 0.1 I]], {z, -8 EllipticK[1/3], 8 EllipticK[1/3]}]
Series[LameC[\[Nu], j, z, m], {z, 0, 3}] 
Also in Version 12.2 we’ve done a lot on elliptic functions—dramatically speeding up their numerical evaluation and inventing algorithms for evaluating them efficiently at arbitrary precision. We’ve also introduced some new elliptic functions, like JacobiEpsilon—which provides a generalization of EllipticE that avoids branch cuts and maintains the analytic structure of elliptic integrals:
ComplexPlot3D[JacobiEpsilon[z, 1/2], {z, 6}] 
We’ve been able to do many symbolic Laplace and inverse Laplace transforms for a couple of decades. But in Version 12.2 we’ve solved the subtle problem of using contour integration to do inverse Laplace transforms. It’s a story of knowing enough about the structure of functions in the complex plane to avoid branch cuts and other nasty singularities. A typical result effectively sums over an infinite number of poles:
InverseLaplaceTransform[Coth[s \[Pi] /2 ]/(1 + s^2), s, t] 
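The contour-integration approach here is the classical Bromwich (Mellin) inversion formula, where the subtlety is in choosing and deforming the contour around the singularities of F(s):

```latex
f(t) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} e^{st}\, F(s)\, ds
```

Here γ is chosen to the right of all singularities of F(s); closing the contour and summing residues at the poles is what produces results like the infinite sum above.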
And between contour integration and other methods we’ve also added numerical inverse Laplace transforms. It all looks easy in the end, but there’s a lot of complicated algorithmic work needed to achieve this:
InverseLaplaceTransform[1/(s + Sqrt[s] + 1), s, 1.5] 
Another new algorithm made possible by finer “function understanding” has to do with asymptotic expansion of integrals. Here’s a complex function that becomes increasingly wiggly as λ increases:
Table[ReImPlot[(t^10 + 3) Exp[I \[Lambda] (t^5 + t + 1)], {t, -2, 2}], {\[Lambda], 10, 30, 10}]
And here’s the asymptotic expansion for λ→∞:
AsymptoticIntegrate[(t^10 + 3) Exp[I \[Lambda] (t^5 + t + 1)], {t, -2, 2}, {\[Lambda], Infinity, 2}]
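In this example the phase f(t) = t^5 + t + 1 has f′(t) = 5t^4 + 1, which never vanishes on the real line, so there are no real stationary-phase points and the asymptotic series comes from the endpoints, via repeated integration by parts. Each step brings down another factor of 1/λ:

```latex
\int_a^b g(t)\,e^{i\lambda f(t)}\,dt
= \left[\frac{g(t)}{i\lambda f'(t)}\,e^{i\lambda f(t)}\right]_a^b
- \frac{1}{i\lambda}\int_a^b \frac{d}{dt}\!\left(\frac{g(t)}{f'(t)}\right) e^{i\lambda f(t)}\,dt
```

Iterating this identity generates the expansion in inverse powers of λ that AsymptoticIntegrate returns; when f′ does vanish somewhere, stationary-phase contributions enter as well.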
It’s a very common calculus exercise to determine, for example, whether a particular function is injective. And it’s pretty straightforward to do this in easy cases. But a big step forward in Version 12.2 is that we can now systematically figure out these kinds of global properties of functions—not just in easy cases, but also in very hard cases. Often there are whole networks of theorems that depend on functions having suchandsuch a property. Well, now we can automatically determine whether a particular function has that property, and so whether the theorems hold for it. And that means that we can create systematic algorithms that automatically use the theorems when they apply.
Here’s an example. Is Tan[x] injective? Not globally:
FunctionInjective[Tan[x], x] 
But over an interval, yes:
FunctionInjective[{Tan[x], 0 < x < Pi/2}, x] 
What about the singularities of Tan[x]? This gives a description of the set:
FunctionSingularities[Tan[x], x] 
You can get explicit values with Reduce:
Reduce[%, x] 
So far, fairly straightforward. But things quickly get more complicated:
FunctionSingularities[ArcTan[x^y], {x, y}, Complexes] 
And there are more sophisticated properties you can ask about as well:
FunctionMeromorphic[Log[z], z] 
FunctionMeromorphic[{Log[z], z > 0}, z] 
We’ve internally used various kinds of function-testing properties for a long time. But with Version 12.2 function properties are much more complete and fully exposed for anyone to use. Want to know if you can interchange the order of two limits? Check FunctionSingularities. Want to know if you can do a multivariate change of variables in an integral? Check FunctionInjective.
And, yes, even in Plot3D we’re routinely using FunctionSingularities to figure out what’s going on:
Plot3D[Re[ArcTan[x^y]], {x, -5, 5}, {y, -5, 5}]
In Version 12.1 we began the process of introducing video as a built-in feature of the Wolfram Language. Version 12.2 continues that process. In 12.1 we could only handle video in desktop notebooks; now it’s extended to cloud notebooks—so when you generate a video in Wolfram Language it’s immediately deployable to the cloud.
A major new video feature in 12.2 is VideoGenerator. Provide a function that makes images (and/or audio), and VideoGenerator will generate a video from them (here a 4-second video):
VideoGenerator[Graphics3D[AugmentedPolyhedron[Icosahedron[], # - 2], ImageSize -> {200, 200}] &, 4]
To add a sound track, we can just use VideoCombine:
VideoCombine[{%, CloudGet["https://wolfr.am/ROWzckqS"]}]
So how would we edit this video? In Version 12.2 we have programmatic versions of standard video-editing functions. VideoSplit, for example, splits the video at particular times:
VideoSplit[%, {.3, .5, 2}] 
But the real power of the Wolfram Language comes in systematically applying arbitrary functions to videos. VideoMap lets you apply a function to a video to get another video. For example, we could progressively blur the video we just made:
VideoMap[Blur[#Image, 20 #Time] &, %%] 
There are also two new functions for analyzing videos—VideoMapList and VideoMapTimeSeries—which respectively generate a list and a time series by applying a function to the frames in a video, and to its audio track.
Another new function—highly relevant for video processing and video editing—is VideoIntervals, which determines the time intervals over which any given criterion applies in a video:
VideoIntervals[%, Length[DominantColors[#Image]] < 3 &] 
Now, for example, we can delete those intervals in the video:
VideoDelete[%, %%] 
A common operation in the practical handling of videos is transcoding. And in Version 12.2 the function VideoTranscode lets you convert a video among any of the over 300 containers and codecs that we support. By the way, 12.2 also has new functions ImageWaveformPlot and ImageVectorscopePlot that are commonly used in video color correction:
ImageVectorscopePlot[CloudGet["https://wolfr.am/ROWzsGFw"]] 
One of the main technical issues in handling video is dealing with the large amount of data in a typical video. In Version 12.2 there’s now finer control over where that data is stored. The option GeneratedAssetLocation (with default $GeneratedAssetLocation) lets you pick between different files, directories, local object stores, etc.
But there’s also a new function in Version 12.2 for handling “lightweight video”, in the form of AnimatedImage. AnimatedImage simply takes a list of images and produces an animation that immediately plays in your notebook—and has everything directly stored in your notebook:
AnimatedImage[ Table[Rasterize[Rotate[Style["W", 40], \[Theta]]], {\[Theta], 0, 2 Pi, .1}]] 
It comes up quite frequently for me—especially given our Physics Project. I’ve got a big computation I’d like to do, but I don’t want to (or can’t) do it on my computer. And instead what I’d like to do is run it as a batch job in the cloud.
This has been possible in principle for as long as cloud computation providers have been around. But it’s been very involved and difficult. Well, now, in Version 12.2 it’s finally easy. Given any piece of Wolfram Language code, you can just use RemoteBatchSubmit to send it to be run as a batch job in the cloud.
There’s a little bit of setup required on the batch computation provider side. First, you have to have an account with an appropriate provider—and initially we’re supporting AWS Batch and Charity Engine. Then you have to configure things with that provider (and we’ve got workflows that describe how to do that). But as soon as that’s done, you’ll get a remote batch submission environment that’s basically all you need to start submitting batch jobs:
env = RemoteBatchSubmissionEnvironment["AWSBatch", <|"JobQueue" -> "arn:aws:batch:us-east-1:123456789012:job-queue/MyQueue", "JobDefinition" -> "arn:aws:batch:us-east-1:123456789012:job-definition/MyDefinition:1", "IOBucket" -> "my-job-bucket"|>]
OK, so what would be involved, say, in submitting a neural net training? Here’s how I would run it locally on my machine (and, yes, this is a very simple example):
NetTrain[NetModel["LeNet"], "MNIST"] 
And here’s the minimal way I would send it to run on AWS Batch:
job = RemoteBatchSubmit[env, NetTrain[NetModel["LeNet"], "MNIST"]] 
I get back an object that represents my remote batch job—that I can query to find out what’s happened with my job. At first it’ll just tell me that my job is “runnable”:
job["JobStatus"] 
Later on, it’ll say that it’s “starting”, then “running”, then (if all goes well) “succeeded”. And once the job is finished, you can get back the result like this:
job["EvaluationResult"] 
There’s lots of detail you can retrieve about what actually happened. Like here’s the beginning of the raw job log:
job["JobLog"] 
But the real point of running your computations remotely in a cloud is that they can potentially be bigger and crunchier than the ones you can run on your own machines. Here’s how we could run the same computation as above, but now requesting the use of a GPU:
RemoteBatchSubmit[env, NetTrain[NetModel["LeNet"], "MNIST", TargetDevice -> "GPU"], RemoteProviderSettings -> <|"GPUCount" -> 1|>]
RemoteBatchSubmit can also handle parallel computations. If you request a multicore machine, you can immediately run ParallelMap etc. across its cores. But you can go even further with RemoteBatchMapSubmit—which automatically distributes your computation across a whole collection of separate machines in the cloud.
Here’s an example:
job = RemoteBatchMapSubmit[env, ImageIdentify, WebImageSearch["happy", 100]] 
While it’s running, we can get a dynamic display of the status of each part of the job:
job["DynamicStatusVisualization"] 
About 5 minutes later, the job is finished:
job["JobStatus"] 
And here are our results:
ReverseSort[Counts[job["EvaluationResults"]]] 
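The ReverseSort[Counts[...]] tally at the end is the same pattern as Python's collections.Counter; here's a sketch with hypothetical stand-in labels (not actual ImageIdentify output):

```python
from collections import Counter

# Hypothetical stand-ins for labels that ImageIdentify might return
labels = ["dog", "dog", "balloon", "dog", "person", "balloon"]

tally = Counter(labels)        # plays the role of Counts[...]
ranked = tally.most_common()   # plays the role of ReverseSort[...]
print(ranked)  # → [('dog', 3), ('balloon', 2), ('person', 1)]
```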
RemoteBatchSubmit and RemoteBatchMapSubmit give you high-level access to cloud compute services for general batch computation. But in Version 12.2 there is also a direct lower-level interface available, for example for AWS.
Connect to AWS:
aws = ServiceConnect["AWS"] 
Once you’ve authenticated, you can see all the services that are available:
aws["Services"] 
This gives a handle to the Amazon Translate service:
aws["GetService", "Name" -> "Translate"]
Now you can use this to call the service:
%["TranslateText", "Text" -> "今日は良い一日だった", "SourceLanguageCode" -> "auto", "TargetLanguageCode" -> "en"]
Of course, you can always do language translation directly through the Wolfram Language too:
TextTranslation["今日は良い一日だった"] 
It’s straightforward to plot data that involves one, two or three dimensions. For a few dimensions above that, you can use colors or other styling. But by the time you’re dealing with ten dimensions, that breaks down. And if you’ve got a lot of data in 10D, for example, then you’re probably going to have to use something like DimensionReduce to try to tease out “interesting features”.
But if you’re just dealing with a few “data points”, there are other ways to visualize things like 10-dimensional data. And in Version 12.2 we’re introducing several functions for doing this.
As a first example, let’s look at ParallelAxisPlot. The idea here is that every “dimension” is plotted on a “separate axis”. For a single point it’s not that exciting:
ParallelAxisPlot[{{10, 17, 19, 8, 7, 5, 17, 4, 8, 2}}, PlotRange -> {0, 20}]
Here’s what happens if we plot three random “10D data points”:
ParallelAxisPlot[RandomInteger[20, {3, 10}], PlotRange -> {0, 20}]
But one of the important features of ParallelAxisPlot is that by default it automatically determines the scale on each axis, so there’s no need for the axes to be representing similar kinds of things. So, for example, here are 7 completely different quantities plotted for all the chemical elements:
ParallelAxisPlot[ EntityValue[ "Element", {EntityProperty["Element", "AtomicMass"], EntityProperty["Element", "AtomicRadius"], EntityProperty["Element", "BoilingPoint"], EntityProperty["Element", "ElectricalConductivity"], EntityProperty["Element", "MeltingPoint"], EntityProperty["Element", "NeutronCrossSection"], EntityProperty["Element", "ThermalConductivity"]}]] 
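The automatic per-axis scaling just described is conceptually a per-column rescaling of the data. Here's a toy Python sketch of that normalization step (a hypothetical helper, not the Wolfram implementation):

```python
def normalize_columns(points):
    """Rescale each dimension (column) of a list of points to [0, 1],
    so axes measuring very different quantities become comparable."""
    cols = list(zip(*points))
    spans = [(min(c), max(c)) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi != lo else 0.5
              for v, (lo, hi) in zip(p, spans))
        for p in points
    ]

points = [(10, 1700), (20, 300), (15, 1000)]
print(normalize_columns(points))
# → [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)] — each column now spans 0 to 1
```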
Different kinds of high-dimensional data do best on different kinds of plots. Another new type of plot in Version 12.2 is RadialAxisPlot. (This type of plot also goes by names like radar plot, spider plot and star plot.)
RadialAxisPlot plots each dimension in a different direction:
RadialAxisPlot[ EntityValue[ "Element", {EntityProperty["Element", "AtomicMass"], EntityProperty["Element", "AtomicRadius"], EntityProperty["Element", "BoilingPoint"], EntityProperty["Element", "ElectricalConductivity"], EntityProperty["Element", "MeltingPoint"], EntityProperty["Element", "NeutronCrossSection"], EntityProperty["Element", "ThermalConductivity"]}]] 
It’s typically most informative when there aren’t too many data points:
RadialAxisPlot[EntityValue[{Entity["City", {"Chicago", "Illinois", "UnitedStates"}], Entity["City", {"Dallas", "Texas", "UnitedStates"}], Entity["City", {"NewYork", "NewYork", "UnitedStates"}], Entity["City", {"LosAngeles", "California", "UnitedStates"}]}, {EntityProperty["City", "MedianHomeSalePrice"], EntityProperty["City", "TotalSalesTaxRate"], EntityProperty["City", "MedianHouseholdIncome"], EntityProperty["City", "Population"], EntityProperty["City", "Area"]}, "EntityAssociation"], PlotLegends -> Automatic]
Back in 1984 I used a Cray supercomputer to make 3D pictures of 2D cellular automata evolving in time (yes, captured on 35 mm slides).
I’ve been waiting for 36 years to have a really streamlined way to reproduce these. And now finally in Version 12.2 we have it: ArrayPlot3D. Already in 2012 we introduced Image3D to represent and display 3D images composed of 3D voxels with specified colors and opacities. But its emphasis is on “radiology-style” work, in which there’s a certain assumption of continuity between voxels. And if you’ve really got a discrete array of discrete data (as in cellular automata), that won’t lead to crisp results.
And here it is, for a slightly more elaborate case of a 3D cellular automaton:
Table[ArrayPlot3D[CellularAutomaton[{14, {2, 1}, {1, 1, 1}}, {{{{1}}}, 0}, {{{t}}}]], {t, 20, 40, 10}]
Another new ArrayPlot-family function in 12.2 is ComplexArrayPlot, for visualizing arrays of complex values, for example values generated by Newton’s method.
One of our objectives in Wolfram Language is to have visualizations that just “automatically look good”—because they’ve got algorithms and heuristics that effectively implement good computational aesthetics. In Version 12.2 we’ve tuned up the computational aesthetics for a variety of types of visualization. For example, in 12.1 this is what a SliceVectorPlot3D looked like by default:
SliceVectorPlot3D[{y + x, z, y}, {x, -2, 2}, {y, -2, 2}, {z, -2, 2}]
Now it looks like this:
Since Version 10, we’ve also been making increasing use of our PlotTheme option, to “bank switch” detailed options to make visualizations that are suitable for different purposes, and meet different aesthetic goals. So for example in Version 12.2 we’ve added plot themes to GeoRegionValuePlot. Here’s an example of the default (which has been updated, by the way):
GeoRegionValuePlot[CloudGet["https://wolfr.am/ROWDoxAw"] -> "GDP"]
And here it is with the "Marketing" plot theme:
GeoRegionValuePlot[CloudGet["https://wolfr.am/ROWDoxAw"] -> "GDP", PlotTheme -> "Marketing"]
Another thing in Version 12.2 is the addition of new primitives and new “raw material” for creating aesthetic visual effects. In Version 12.1 we introduced things like HatchFilling for cross-hatching. In Version 12.2 we now also have LinearGradientFilling:
Graphics[Style[Disk[], LinearGradientFilling[{RGBColor[1., 0.71, 0.75], RGBColor[0.64, Rational[182, 255], Rational[244, 255]]}]]] 
And we can now add this kind of effect to the filling in a plot:
Plot[2 Sin[x] + x, {x, 0, 15}, FillingStyle -> LinearGradientFilling[{RGBColor[0.64, Rational[182, 255], Rational[244, 255]], RGBColor[1., 0.71, 0.75]}, Top], Filling -> Bottom]
To be even more stylish, one can plot random points using the new ConicGradientFilling:
Graphics[Table[ Style[Disk[RandomReal[20, 2]], ConicGradientFilling[RandomColor[3]]], 100]] 
A core goal of the Wolfram Language is to define a coherent computational language that can readily be understood by both computers and humans. We (and I in particular!) put a lot of effort into the design of the language, and into things like picking the right names for functions. But in making the language as easy to read as possible, it’s also important to streamline its “nonverbal” or syntactic aspects. For function names, we’re basically leveraging people’s understanding of words in natural language. For syntactic structure, we want to leverage people’s “ambient understanding”, for example, from areas like math.
More than a decade ago we introduced ↦ (\[Function]) as a way to specify pure functions, so instead of writing
Function[x, x^2]
(or #^2 &) you could write:
x ↦ x^2
But to enter ↦ you had to type \[Function] or at least Esc fn Esc, which tended to feel “a bit difficult”.
Well, in Version 12.2, we’re “mainstreaming” ↦ by making it possible to type it just as |->:
x |-> x^2
You can also do things like
{x, y} |-> x + y
as well as things like:
SameTest -> ({x, y} |-> Mod[x - y, 2] == 0)
In Version 12.2, there’s also another new piece of “short syntax”: //=
Imagine you’ve got a result, say called res. Now you want to apply a function to res, and then “update res”. The new function ApplyTo (written //=) makes it easy to do that:
res = 10 
res //= f 
res 
We’re always on the lookout for repeated “lumps of computation” that we can “package” into functions with “easy-to-understand names”. And in Version 12.2 we have a couple of new such functions: FoldWhile and FoldWhileList. FoldList normally just takes a list and “folds” each successive element into the result it’s building up—until it gets to the end of the list:
FoldList[f, {1, 2, 3, 4}] 
But what if you want to “stop early”? FoldWhileList lets you do that. So here we’re successively dividing by 1, 2, 3, …, stopping when the result isn’t an integer anymore:
FoldWhileList[Divide, 5!, Range[10], IntegerQ] 
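For comparison, here's a small Python analogue of FoldWhileList (a rough sketch, not the Wolfram implementation), reproducing the division example with exact fractions:

```python
from fractions import Fraction

def fold_while_list(f, init, values, test):
    """Rough analogue of FoldWhileList: fold values in one at a time,
    keeping results only while `test` stays true."""
    results = [init]
    acc = init
    for v in values:
        nxt = f(acc, v)
        if not test(nxt):
            break
        acc = nxt
        results.append(acc)
    return results

# Divide 5! = 120 successively by 1, 2, 3, ..., stopping as soon as the
# running result is no longer an integer.
steps = fold_while_list(lambda a, b: a / b, Fraction(120), range(1, 11),
                        lambda x: x.denominator == 1)
print([int(s) for s in steps])  # → [120, 120, 60, 20, 5, 1]
```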
Let’s say you’ve got an array, like:
{{a, b, c, d}, {x, y, z, w}} // MatrixForm 
Map lets you map a function over the “rows” of this array:
Map[f, {{a, b, c, d}, {x, y, z, w}}] 
But what if you want to operate on the “columns” of the array, effectively “reducing out” the first dimension of the array? In Version 12.2 the function ArrayReduce lets you do this:
ArrayReduce[f, {{a, b, c, d}, {x, y, z, w}}, 1] 
Here’s what happens if instead we tell ArrayReduce to “reduce out” the second dimension of the array:
ArrayReduce[f, {{a, b, c, d}, {x, y, z, w}}, 2] 
What’s really going on here? The array has dimensions 2×4:
Dimensions[{{a, b, c, d}, {x, y, z, w}}] 
ArrayReduce[f, ..., 1] “reduces out” the first dimension, leaving an array with dimensions {4}. ArrayReduce[f, ..., 2] reduces out the second dimension, leaving an array with dimensions {2}.
Let’s look at a slightly bigger case—a 2×3×4 array:
array = ArrayReshape[Range[24], {2, 3, 4}] 
This now eliminates the “first dimension”, leaving a 3×4 array:
ArrayReduce[f, array, 1] 
Dimensions[%] 
This, on the other hand, eliminates the “second dimension”, leaving a 2×4 array:
ArrayReduce[f, array, 2] 
Dimensions[%] 
Why is this useful? One example is when you have arrays of data where different dimensions correspond to different attributes, and then you want to “ignore” a particular attribute, and aggregate the data with respect to it. Let’s say that the attribute you want to ignore is at level n in your array. Then all you do to “ignore” it is to use ArrayReduce[f, ..., n], where f is the function that aggregates values (often something like Total or Mean).
You can achieve the same results as ArrayReduce by appropriate sequences of Transpose, Apply, etc. But it’s quite messy, and ArrayReduce provides an elegant “packaging” of these kinds of array operations.
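As a rough illustration of what "reducing out" a dimension means, here's a toy Python version restricted to 2D arrays (the real ArrayReduce handles arbitrary dimensions and whole collections of dimensions):

```python
def array_reduce_2d(f, array, dim):
    """Toy 2D analogue of ArrayReduce: "reduce out" the given dimension.
    dim=1 applies f to each column; dim=2 applies f to each row."""
    if dim == 1:
        return [f(list(col)) for col in zip(*array)]
    if dim == 2:
        return [f(list(row)) for row in array]
    raise ValueError("this sketch only handles dim 1 or 2")

array = [[1, 2, 3, 4],
         [5, 6, 7, 8]]

print(array_reduce_2d(sum, array, 1))  # → [6, 8, 10, 12]  (shape {4})
print(array_reduce_2d(sum, array, 2))  # → [10, 26]        (shape {2})
```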
ArrayReduce is quite general; it lets you not only “reduce out” single dimensions, but whole collections of dimensions:
ArrayReduce[f, array, {2, 3}] 
ArrayReduce[f, array, {{2}, {3}}] 
At the simplest level, ArrayReduce is a convenient way to apply functions “column-wise” on arrays. But in full generality it’s a way to apply functions to subarrays with arbitrary indices. And if you’re thinking in terms of tensors, ArrayReduce is a generalization of contraction, in which more than two indices can be involved, and elements can be “flattened” before the operation (which doesn’t have to be summation) is applied.
It’s an old adage in debugging code: “put in a print statement”. But it’s more elegant in the Wolfram Language, thanks particularly to Echo. It’s a simple idea: Echo[expr] “echoes” (i.e. prints) the value of expr, but then returns that value. So the result is that you can put Echo anywhere into your code (often as Echo@…) without affecting what your code does.
In Version 12.2 there are some new functions that follow the “Echo” pattern. A first example is EchoLabel, which just adds a label to what’s echoed:
EchoLabel["a"]@5! + EchoLabel["b"]@10! 
Aficionados might wonder why EchoLabel is needed. After all, Echo itself allows a second argument that can specify a label. The answer—and yes, it’s a mildly subtle piece of language design—is that if one’s going to just insert Echo as a function to apply (say with @), then it can only have one argument, so no label. EchoLabel is set up to have the operator form EchoLabel[label] so that EchoLabel[label][expr] is equivalent to Echo[expr,label].
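The Echo pattern is easy to emulate in other languages; here's a minimal Python sketch, including an operator form analogous to EchoLabel (hypothetical helper names, not a Wolfram API):

```python
def echo(expr, label=None):
    """Print a value (optionally labeled), then return it unchanged, so it
    can be dropped into an expression without affecting the result."""
    print(f"{label} {expr!r}" if label is not None else repr(expr))
    return expr

def echo_label(label):
    """Operator form, analogous to EchoLabel: echo_label(l)(x) == echo(x, l)."""
    return lambda expr: echo(expr, label)

# 5! + 10!, echoing each labeled term along the way
result = echo_label("a")(120) + echo_label("b")(3628800)
print(result)  # → 3628920
```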
Another new “echo function” in 12.2 is EchoTiming, which displays the timing (in seconds) of whatever it evaluates:
Table[Length[EchoTiming[Permutations[Range[n]]]], {n, 8, 10}] 
It’s often helpful to use both Echo and EchoTiming:
Length[EchoTiming[Permutations[Range[Echo@10]]]] 
And, by the way, if you always want to print evaluation time (just like Mathematica 1.0 did by default 32 years ago) you can always globally set $Pre=EchoTiming.
Another new “echo function” in 12.2 is EchoEvaluation which echoes the “before” and “after” for an evaluation:
EchoEvaluation[2 + 2] 
You might wonder what happens with nested EchoEvaluation’s. Here’s an example:
EchoEvaluation[ Accumulate[EchoEvaluation[Reverse[EchoEvaluation[Range[10]]]]]] 
By the way, it’s quite common to want to use both EchoTiming and EchoEvaluation:
Table[EchoTiming@EchoEvaluation@FactorInteger[2^(50 n) - 1], {n, 2}]
Finally, if you want to leave echo functions in your code, but want your code to “run quiet”, you can use the new QuietEcho to “quiet” all the echoes (like Quiet “quiets” messages):
QuietEcho@Table[EchoTiming@EchoEvaluation@FactorInteger[2^(50 n) - 1], {n, 2}]
Did something go wrong inside your program? And if so, what should the program do? It can be possible to write very elegant code if one ignores such things. But as soon as one starts to put in checks, and has logic for unwinding things if something goes wrong, it’s common for the code to get vastly more complicated, and vastly less readable.
What can one do about this? Well, in Version 12.2 we’ve developed a high-level symbolic mechanism for handling things going wrong in code. Basically the idea is that you insert Confirm (or related functions)—a bit like you might insert Echo—to “confirm” that something in your program is doing what it should. If the confirmation works, then your program just keeps going. But if it fails, then the program stops, and exits to the nearest enclosing Enclose. In a sense, Enclose “encloses” regions of your program, not letting anything that goes wrong inside immediately propagate out.
Let’s see how this works in a simple case. Here the Confirm successfully “confirms” y, just returning it, and the Enclose doesn’t really do anything:
Enclose[f[x, Confirm[y], z]] 
But now let’s put $Failed in place of y. $Failed is something that Confirm by default considers to be a problem. So when it sees $Failed, it stops, exiting to the Enclose—which in turn yields a Failure object:
Enclose[f[x, Confirm[$Failed], z]] 
If we put in some echoes, we’ll see that x is successfully reached, but z is not; as soon as the Confirm fails, it stops everything:
Enclose[f[Echo[x], Confirm[$Failed], Echo[z]]] 
A very common thing is to want to use Confirm/Enclose when you define a function:
addtwo[x_] := Enclose[Confirm[x] + 2] 
Use argument 5 and everything just works:
addtwo[5] 
But if we instead use Missing[]—which Confirm by default considers to be a problem—we get back a Failure object:
addtwo[Missing[]] 
We could achieve the same thing with If, Return, etc. But even in this very simple case, it wouldn’t look as nice.
Confirm has a certain default set of things that it considers “wrong” ($Failed, Failure[...], Missing[...] are examples). But there are related functions that allow you to specify particular tests. For example, ConfirmBy applies a function to test if an expression should be confirmed.
Here, ConfirmBy confirms that 2 is a number:
Enclose[f[1, ConfirmBy[2, NumberQ], 3]] 
But x is not considered so by NumberQ:
Enclose[f[1, ConfirmBy[x, NumberQ], 3]] 
OK, so let’s put these pieces together. Let’s define a function that’s supposed to operate on strings:
world[x_] := Enclose[ConfirmBy[x, StringQ] <> " world!"] 
If we give it a string, all is well:
world["hello"] 
But if we give it a number instead, the ConfirmBy fails:
world[4] 
But here’s where really nice things start to happen. Let’s say we want to map world over a list, always confirming that it gets a good result. Here everything is OK:
Enclose[Confirm[world[#]] & /@ {"a", "b", "c"}] 
But now something has gone wrong:
Enclose[Confirm[world[#]] & /@ {"a", "b", 3}] 
The ConfirmBy inside the definition of world failed, causing its enclosing Enclose to produce a Failure object. Then this Failure object caused the Confirm inside the Map to fail, and the enclosing Enclose gave a Failure object for the whole thing. Once again, we could have achieved the same thing with If, Throw, Catch, etc. But Confirm/Enclose do it more robustly, and more elegantly.
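Mechanically, Confirm/Enclose behave like a tagged exception plus a handler. Here's a rough Python analogue (using None as a stand-in for $Failed), just to illustrate the control flow, not the Wolfram implementation:

```python
class ConfirmationFailed(Exception):
    """Raised by confirm(); caught by the nearest enclose()."""
    def __init__(self, value):
        super().__init__(f"confirmation failed for {value!r}")
        self.value = value

def confirm(value, test=lambda v: v is not None):
    """Return value unchanged if test passes; otherwise abandon the
    enclosing computation (None plays the role of $Failed here)."""
    if not test(value):
        raise ConfirmationFailed(value)
    return value

def enclose(thunk):
    """Run thunk(); if a confirm inside fails, return a failure marker
    instead of letting the error propagate further out."""
    try:
        return thunk()
    except ConfirmationFailed as failure:
        return ("Failure", failure.value)

print(enclose(lambda: ("f", "x", confirm("y"), "z")))   # → ('f', 'x', 'y', 'z')
print(enclose(lambda: ("f", "x", confirm(None), "z")))  # → ('Failure', None)
```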
These are all very small examples. But where Confirm/Enclose really show their value is in large programs, and in providing a clear, highlevel framework for handling errors and exceptions, and defining their scope.
In addition to Confirm and ConfirmBy, there’s also ConfirmMatch, which confirms that an expression matches a specified pattern. Then there’s ConfirmQuiet, which confirms that the evaluation of an expression doesn’t generate any messages (or, at least, none that you told it to test for). There’s also ConfirmAssert, which simply takes an “assertion” (like p>0) and confirms that it’s true.
When a confirmation fails, the program always exits to the nearest enclosing Enclose, delivering to the Enclose a Failure object with information about the failure that occurred. When you set up the Enclose, you can tell it how to handle failure objects it receives—either just returning them (perhaps to enclosing Confirm’s and Enclose’s), or applying functions to their contents.
Confirm and Enclose provide an elegant mechanism for handling errors, that are easy and clean to insert into programs. But—needless to say—there are definitely some tricky issues around them. Let me mention just one. The question is: which Confirm’s does a given Enclose really enclose? If you’ve written a piece of code that explicitly contains Enclose and Confirm, it’s pretty obvious. But what if there’s a Confirm that’s somehow generated—perhaps dynamically—deep inside some stack of functions? It’s similar to the situation with named variables. Module just looks for the variables directly (“lexically”) inside its body. Block looks for variables (“dynamically”) wherever they may occur. Well, Enclose by default works like Module, “lexically” looking for Confirm’s to enclose. But if you include tags in Confirm and Enclose, you can set them up to “find each other” even if they’re not explicitly “visible” in the same piece of code.
Confirm/Enclose provide a good high-level way to handle the “flow” of things going wrong inside a program or a function. But what if there’s something wrong right at the get-go? In our built-in Wolfram Language functions, there’s a standard set of checks we apply. Are there the correct number of arguments? If there are options, are they allowed options, and are they in the correct place? In Version 12.2 we’ve added two functions that can perform these standard checks for functions you write.
This says that f should have two arguments, which here it doesn’t:
CheckArguments[f[x, y, z], 2] 
Here’s a way to make CheckArguments part of the basic definition of a function:
f[args___] := Null /; CheckArguments[f[args], 2] 
Give it the wrong number of arguments, and it’ll generate a message, and then return unevaluated, just like lots of built-in Wolfram Language functions do:
f[7] 
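This CheckArguments pattern corresponds roughly to an arity-checking decorator in Python; here's a sketch with hypothetical names (not a Wolfram API):

```python
import functools

def check_arguments(expected):
    """Arity-checking decorator: complain and refuse to evaluate when
    called with the wrong number of positional arguments."""
    def wrap(f):
        @functools.wraps(f)
        def wrapper(*args):
            if len(args) != expected:
                print(f"{f.__name__}: called with {len(args)} argument(s); "
                      f"{expected} expected")
                return None  # stand-in for "returning unevaluated"
            return f(*args)
        return wrapper
    return wrap

@check_arguments(2)
def g(x, y):
    return x + y

print(g(3, 4))  # → 7
print(g(7))     # prints the complaint, then → None
```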
ArgumentsOptions is another new function in Version 12.2—that separates “positional arguments” from options in a function. Set up options for a function:
Options[f] = {opt -> Automatic};
This expects one positional argument, which it finds:
ArgumentsOptions[f[x, opt -> 7], 1]
If it doesn’t find exactly one positional argument, it generates a message:
ArgumentsOptions[f[x, y], 1] 
You run a piece of code and it does what it does—and typically you don’t want it to leave anything behind. Often you can use scoping constructs like Module, Block, BlockRandom, etc. to achieve this. But sometimes there’ll be something you set up that needs to be explicitly “cleaned up” when your code finishes.
For example, you might create a file in your piece of code, and want the file removed when that particular piece of code finishes. In Version 12.2 there’s a convenient new function for managing things like this: WithCleanup.
WithCleanup[expr, cleanup] evaluates expr, then cleanup—but returns the result from expr. Here’s a trivial example (which could really be achieved better with Block). You’re assigning a value to x, getting its square—then clearing x before returning the square:
WithCleanup[x = 7; x^2, Clear[x]] 
It’s already convenient just to have a construct that does cleanup while still returning the main expression you were evaluating. But an important detail of WithCleanup is that it also handles the situation where you abort the main evaluation you were doing. Normally, issuing an abort would cause everything to stop. But WithCleanup is set up to make sure that the cleanup happens even if there’s an abort. So if the cleanup involves, for example, deleting a file, the file gets deleted, even if the main operation is aborted.
WithCleanup also allows an initialization to be given. So here the initialization is done, as is the cleanup, but the main evaluation is aborted:
WithCleanup[Echo[1], Abort[]; Echo[2], Echo[3]] 
By the way, WithCleanup can also be used with Confirm/Enclose to ensure that even if a confirmation fails, certain cleanup will be done.
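The abort-safe guarantee of WithCleanup is what try/finally gives you in other languages. Here's a Python sketch with a scratch file (hypothetical helper, not the Wolfram implementation):

```python
import os
import tempfile

def with_cleanup(body, cleanup, init=None):
    """Run init, then body, and make sure cleanup always runs,
    even if body raises (the analogue of an aborted evaluation)."""
    if init is not None:
        init()
    try:
        return body()
    finally:
        cleanup()

# Create a scratch file, use it, and guarantee it's removed afterwards.
fd, path = tempfile.mkstemp()
os.close(fd)

def write_and_measure():
    with open(path, "w") as fh:
        fh.write("intermediate data")
    return os.path.getsize(path)

size = with_cleanup(write_and_measure, lambda: os.remove(path))
print(size, os.path.exists(path))  # → 17 False
```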
It’s December 16, 2020, today—at least according to the standard Gregorian calendar that’s usually used in the US. But there are many other calendar systems in use for various purposes around the world, and even more that have been used at one time or another historically.
In earlier versions of Wolfram Language we supported a few common calendar systems. But in Version 12.2 we’ve added very broad support for calendar systems—altogether 41 of them. One can think of calendar systems as being a bit like projections in geodesy or coordinate systems in geometry. You have a certain time: now you have to know how it is represented in whatever system you’re using. And much like GeoProjectionData, there’s now CalendarData which can give you a list of available calendar systems:
CalendarData["DateCalendar"] 
So here’s the representation of “now” converted to different calendars:
CalendarConvert[Now, #] & /@ CalendarData["DateCalendar"] 
There are many subtleties here. Some calendars are purely “arithmetic”; others rely on astronomical computations. And then there’s the matter of “leap variants”. With the Gregorian calendar, we’re used to just adding a February 29. But the Chinese calendar, for example, can add whole “leap months” within a year (so that, for example, there can be two “fourth months”). In the Wolfram Language, we now have a symbolic representation for such things, using LeapVariant:
DateObject[{72, 25, LeapVariant[4], 20}, CalendarType -> "Chinese"]
One reason to deal with different calendar systems is that they’re used to determine holidays and festivals in different cultures. (Another reason, particularly relevant to someone like me who studies history quite a bit, is in the conversion of historical dates: Newton’s birthday was originally recorded as December 25, 1642, but converting it to a Gregorian date it’s January 4, 1643.)
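That Newton's-birthday conversion can be reproduced with standard Julian-day-number formulas (Fliegel–Van Flandern style); here's a stdlib-only Python sketch, quite separate from the Wolfram calendar machinery:

```python
def julian_calendar_to_jdn(year, month, day):
    """Julian day number of a date in the (old-style) Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def jdn_to_gregorian(jdn):
    """Convert a Julian day number to a Gregorian calendar date."""
    l = jdn + 68569
    n = (4 * l) // 146097
    l = l - (146097 * n + 3) // 4
    i = (4000 * (l + 1)) // 1461001
    l = l - (1461 * i) // 4 + 31
    j = (80 * l) // 2447
    day = l - (2447 * j) // 80
    l = j // 11
    month = j + 2 - 12 * l
    year = 100 * (n - 49) + i + l
    return (year, month, day)

# Newton's birthday: December 25, 1642 (Julian) is January 4, 1643 (Gregorian).
print(jdn_to_gregorian(julian_calendar_to_jdn(1642, 12, 25)))  # → (1643, 1, 4)
```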
Given a calendar, something one often wants to do is to select dates that satisfy a particular criterion. And in Version 12.2 we’ve introduced the function DateSelect to do this. So, for example, we can select dates within a particular interval that satisfy the criterion that they are Wednesdays:
DateSelect[DateInterval[{{{2020, 4, 1}, {2020, 4, 30}}}, "Day", "Gregorian", 5.], #DayName == Wednesday &] 
As a more complicated example, we can convert the current algorithm for selecting dates of US presidential elections to computable form, and then use it to determine dates for the next 50 years:
DateSelect[DateInterval[{{2020}, {2070}}, "Day"], Divisible[#Year, 4] && #Month == 11 && #DayName == Tuesday && Or[#DayNameInstanceInMonth == 1 && #Day =!= 1, #DayNameInstanceInMonth == 2 && #Day == 8] &] 
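The same selection criterion ("the first Tuesday after the first Monday in November, every fourth year") is easy to state procedurally too; here's a Python sketch using datetime:

```python
import datetime

def us_election_day(year):
    """First Tuesday after the first Monday in November
    (meaningful here for years divisible by 4)."""
    d = datetime.date(year, 11, 1)
    while d.weekday() != 0:                 # 0 = Monday
        d += datetime.timedelta(days=1)
    return d + datetime.timedelta(days=1)   # the following Tuesday

elections = [us_election_day(y) for y in range(2020, 2071) if y % 4 == 0]
print(elections[0], elections[1])  # → 2020-11-03 2024-11-05
```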
By now, the Wolfram Language has strong capabilities in geo computation and geo visualization. But we’re continuing to expand our geo functionality. In Version 12.2 an important addition is spatial statistics (mentioned above)—which is fully integrated with geo. But there are also a couple of new geo primitives. One is GeoBoundary, which computes boundaries of things:
GeoBoundary[CloudGet["https://wolfr.am/ROWGPJ4I"]] 
GeoLength[%] 
There’s also GeoPolygon, which is a full geo generalization of ordinary polygons. One of the tricky issues GeoPolygon has to handle is what counts as the “interior” of a polygon on the Earth. Here it’s picking the larger area (i.e. the one that wraps around the globe):
GeoGraphics[ GeoPolygon[{{50, 70}, {30, 90}, {70, 50}}, "LargerArea"]] 
GeoPolygon can also—like Polygon—handle holes, or in fact arbitrary levels of nesting:
GeoGraphics[ GeoPolygon[ Entity["AdministrativeDivision", {"Illinois", "UnitedStates"}] > Entity["AdministrativeDivision", {"ChampaignCounty", "Illinois", "UnitedStates"}]]] 
But the biggest “coming attraction” of geo is completely new rendering of geo graphics and maps. It’s still preliminary (and unfinished) in Version 12.2, but there’s at least experimental support for vectorbased map rendering. The most obvious payoff from this is maps that look much crisper and sharper at all scales. But another payoff is our ability to introduce new styling for maps, and in Version 12.2 we’re including eight new map styles.
Here’s our “old-style” map:
GeoGraphics[Entity["Building", "EiffelTower::5h9w8"], GeoRange -> Quantity[400, "Meters"]]
Here’s the new, vector version of this “classic” style:
GeoGraphics[Entity["Building", "EiffelTower::5h9w8"], GeoBackground -> "VectorClassic", GeoRange -> Quantity[400, "Meters"]]
Here’s a new (vector) style, intended for the web:
GeoGraphics[Entity["Building", "EiffelTower::5h9w8"], GeoBackground -> "VectorWeb", GeoRange -> Quantity[400, "Meters"]]
And here’s a “dark” style, suitable for having information overlaid on it:
GeoGraphics[Entity["Building", "EiffelTower::5h9w8"], GeoBackground -> "VectorDark", GeoRange -> Quantity[400, "Meters"]]
Want to analyze a document that’s in PDF? We’ve been able to extract basic content from PDF files for well over a decade. But PDF is a highly complex (and evolving) format, and many documents “in the wild” have complicated structures. In Version 12.2, however, we’ve dramatically expanded our PDF import capabilities, so that it becomes realistic to, for example, take a random paper from arXiv, and import it:
Import["https://arxiv.org/pdf/2011.12174.pdf"] 
By default, what you’ll get is a high-resolution image for each page (in this particular case, all 100 pages).
If you want the text, you can import that with "Plaintext":
Import["https://arxiv.org/pdf/2011.12174.pdf", "Plaintext"] 
Now you can immediately make a word cloud of the words in the paper:
WordCloud[%] 
This picks out all the images from the paper, and makes a collage of them:
ImageCollage[Import["https://arxiv.org/pdf/2011.12174.pdf", "Images"]] 
You can get the URLs from each page:
Import["https://arxiv.org/pdf/2011.12174.pdf", "URLs"] 
Now pick off the last two, and get images of those webpages:
WebImage /@ Take[Flatten[Values[%]], 2] 
Depending on how they’re produced, PDFs can have all sorts of structure. "ContentsGraph" gives a graph representing the overall structure detected for a document:
Import["https://arxiv.org/pdf/2011.12174.pdf", "ContentsGraph"] 
And, yes, it really is a graph:
Graph[EdgeList[%]] 
For PDFs that are fillable forms, there’s more structure to import. Here I grabbed a random unfilled government form from the web. Import gives an association whose keys are the names of the fields—and if the form had been filled in, it would have given their values too, so you could immediately do analysis on them:
Import["https://www.fws.gov/forms/320041.pdf", "FormFieldRules"] 
Starting in Version 12.0, we’ve been adding state-of-the-art capabilities for solving large-scale optimization problems. In Version 12.2 we’ve continued to round out these capabilities.
One new thing is the superfunction ConvexOptimization, which automatically handles the full spectrum of linear, linear-fractional, quadratic, semidefinite and conic optimization—giving both optimal solutions and their dual properties. In 12.1 we added support for integer variables (i.e. combinatorial optimization); in 12.2 we’re also adding support for complex variables.
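As a simple sketch of ConvexOptimization in action (the objective and constraint here are invented purely for illustration), this minimizes a linear function over the unit disk, returning the minimizing values of the variables:

ConvexOptimization[2 x + y, {x^2 + y^2 <= 1}, {x, y}]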
But the biggest new things for optimization in 12.2 are the introduction of robust optimization and of parametric optimization. Robust optimization lets you find an optimum that’s valid across a whole range of values of some of the variables. Parametric optimization lets you get a parametric function that gives the optimum for any possible value of particular parameters. So for example this finds the optimum for x, y for any (positive) value of α:
ParametricConvexOptimization[(x - 1)^2 + Abs[y], {(x + \[Alpha])^2 <= 1, x + y >= \[Alpha]}, {x, y}, {\[Alpha]}]
Now evaluate the parametric function for a particular α:
%[.76] 
As with everything in the Wolfram Language, we’ve put a lot of effort into making sure that convex optimization integrates seamlessly into the rest of the system—so you can set up models symbolically, and flow their results into other functions. We’ve also included some very powerful convex optimization solvers. But particularly if you’re doing mixed (i.e. real+integer) optimization, or you’re dealing with really huge (e.g. 10 million variables) problems, we’re also giving access to other, external solvers. So, for example, you can set up your problem using Wolfram Language as your “algebraic modeling language”, then (assuming you have the appropriate external licenses) just by setting Method to, say, “Gurobi” or “Mosek” you can immediately run your problem with an external solver. (And, by the way, we now have an open framework for adding more solvers.)
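Assuming you have a Gurobi license set up, a call might look like this (the problem itself is just a placeholder for illustration):

ConvexOptimization[x + y, {x^2 + y^2 <= 4}, {x, y}, Method -> "Gurobi"]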
One can say that the whole idea of symbolic expressions (and their transformations) on which we rely so much in the Wolfram Language originated with combinators—which just celebrated their centenary on December 7, 2020. The version of symbolic expressions that we have in Wolfram Language is in many ways vastly more advanced and usable than raw combinators. But in Version 12.2—partly by way of celebrating combinators—we wanted to add a framework for raw combinators.
So now for example we have CombinatorS, CombinatorK, etc., rendered appropriately:
CombinatorS[CombinatorK] 
But how should we represent the application of one combinator to another? Today we write something like:
f@g@h@x 
But in the early days of mathematical logic there was a different convention—that involved left-associative application, in which one expected “combinator style” to generate “functions” not “values” from applying functions to things. So in Version 12.2 we’re introducing a new “application operator” Application, displayed with its own special character (and entered as \[Application] or Esc ap Esc):
Application[f, Application[g, Application[h, x]]] 
Application[Application[Application[f, g], h], x] 
And, by the way, I fully expect Application—as a new, basic “constructor”—to have a variety of uses (not to mention “applications”) in setting up general structures in the Wolfram Language.
The rules for combinators are trivial to specify using pattern transformations in the Wolfram Language:
{CombinatorS\[Application]x_\[Application]y_\[Application]z_ :> x\[Application]z\[Application](y\[Application]z), CombinatorK\[Application]x_\[Application]y_ :> x} 
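As a small example of using these rules, ReplaceRepeated can apply them to reduce the classic combinator expression S K K a (here written with the \[Application] operator, which is left associative) down to just a:

CombinatorS\[Application]CombinatorK\[Application]CombinatorK\[Application]a //. {CombinatorS\[Application]x_\[Application]y_\[Application]z_ :> x\[Application]z\[Application](y\[Application]z), CombinatorK\[Application]x_\[Application]y_ :> x}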
But one can also think about combinators more “algebraically” as defining relations between expressions—and there’s now a theory in AxiomaticTheory for that.
And in 12.2 a few more other theories have been added to AxiomaticTheory, as well as several new properties.
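For instance (assuming the theory name as given in the 12.2 documentation), one can retrieve the combinatory-logic axioms with:

AxiomaticTheory["CombinatoryLogicAxioms"]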
One of the major advances in Version 12.0 was the introduction of a symbolic representation for Euclidean geometry: you specify a symbolic GeometricScene, giving a variety of objects and constraints, and the Wolfram Language can “solve” it, and draw a diagram of a random instance that satisfies the constraints. In Version 12.2 we’ve made this interactive, so you can move the points in the diagram around, and everything will (if possible) interactively be rearranged so as to maintain the constraints.
Here’s a random instance of a simple geometric scene:
RandomInstance[ GeometricScene[{a, b, c, d}, {CircleThrough[{a, b, c}, d], Triangle[{a, b, c}], d == Midpoint[{a, c}]}]] 
If you move one of the points, the other points will interactively be rearranged so as to maintain the constraints defined in the symbolic representation of the geometric scene:
RandomInstance[ GeometricScene[{a, b, c, d}, {CircleThrough[{a, b, c}, d], Triangle[{a, b, c}], d == Midpoint[{a, c}]}]] 
What’s really going on inside here? Basically, the geometry is getting converted to algebra. And if you want, you can get the algebraic formulation:
%["AlgebraicFormulation"] 
And, needless to say, you can manipulate this using the many powerful algebraic computation capabilities of the Wolfram Language.
In addition to interactivity, another major new feature in 12.2 is the ability to handle not just complete geometric scenes, but also geometric constructions that involve building up a scene in multiple steps. Here’s an example—that happens to be taken directly from Euclid:
RandomInstance[GeometricScene[{{\[FormalCapitalA], \[FormalCapitalB], \[FormalCapitalC], \[FormalCapitalD], \[FormalCapitalE], \[FormalCapitalF]}, {}}, {GeometricStep[{Line[{\[FormalCapitalA], \[FormalCapitalB]}], Line[{\[FormalCapitalA], \[FormalCapitalC]}]}, "Define an arbitrary angle BAC."], GeometricStep[{\[FormalCapitalD] \[Element] Line[{\[FormalCapitalA], \[FormalCapitalB]}], \[FormalCapitalE] \[Element] Line[{\[FormalCapitalA], \[FormalCapitalC]}], EuclideanDistance[\[FormalCapitalA], \[FormalCapitalD]] == EuclideanDistance[\[FormalCapitalA], \[FormalCapitalE]]}, "Put D and E on AB and AC equidistant from A."], GeometricStep[{Line[{\[FormalCapitalD], \[FormalCapitalE]}], GeometricAssertion[{\[FormalCapitalA], \[FormalCapitalF]}, {"OppositeSides", Line[{\[FormalCapitalD], \[FormalCapitalE]}]}], GeometricAssertion[Triangle[{\[FormalCapitalE], \[FormalCapitalF], \[FormalCapitalD]}], "Equilateral"], Line[{\[FormalCapitalA], \[FormalCapitalF]}]}, "Construct an equilateral triangle on DE."]}]]
The first image you get is basically the result of the construction. And—like all other geometric scenes—it’s now interactive. But if you mouse over it, you’ll get controls that allow you to move to earlier steps:
Move a point at an earlier step, and you’ll see what consequences that has for later steps in the construction.
Euclid’s geometry is the very first axiomatic system for mathematics that we know about. So—2000+ years later—it’s exciting that we can finally make it computable. (And, yes, it will eventually connect up with AxiomaticTheory, FindEquationalProof, etc.)
But in recognition of the significance of Euclid’s original formulation of geometry, we’ve added computable versions of his propositions (as well as a bunch of other “famous geometric theorems”). The example above turns out to be proposition 9 in Euclid’s book 1. And now, for example, we can get his original statement of it in Greek:
Entity["GeometricScene", "EuclidBook1Proposition9"]["GreekStatement"] 
And here it is in modern Wolfram Language—in a form that can be understood by both computers and humans:
Entity["GeometricScene", "EuclidBook1Proposition9"]["Scene"] 
An important part of the story of Wolfram Language as a fullscale computational language is its access to our vast knowledgebase of data about the world. The knowledgebase is continually being updated and expanded, and indeed in the time since Version 12.1 essentially all domains have had data (and often a substantial amount) updated, or entities added or modified.
But as examples of what’s been done, let me mention a few additions. One area that’s received a lot of attention is food. By now we have data about more than half a million foods (by comparison, a typical large grocery store stocks perhaps 30,000 types of items). Pick a random food:
RandomEntity["Food"] 
Now generate a nutrition label:
%["NutritionLabel"] 
As another example, a new type of entity that’s been added is physical effects. Here are some random ones:
RandomEntity["PhysicalEffect", 10] 
And as an example of something that can be done with all the data in this domain, here’s a histogram of the dates when these effects were discovered:
DateHistogram[EntityValue["PhysicalEffect", "DiscoveryDate"], "Year", PlotRange -> {{DateObject[{1700}, "Year", "Gregorian", 5.], DateObject[{2000}, "Year", "Gregorian", 5.]}, Automatic}]
As another sample of what we’ve been up to, there’s also now what one might (tongue-in-cheek) call a “heavy-lifting” domain—weight-training exercises:
Entity["WeightTrainingExercise", "BenchPress"]["Dataset"] 
An important feature of the Wolfram Knowledgebase is that it contains symbolic objects, which can represent not only “plain data”—like numbers or strings—but full computational content. And as an example of this, Version 12.2 allows one to access the Wolfram Demonstrations Project—with all its active Wolfram Language code and notebooks—directly in the knowledgebase. Here are some random Demonstrations:
RandomEntity["WolframDemonstration", 5] 
The values of properties can be dynamic interactive objects:
Entity["WolframDemonstration", "MooreSpiegelAttractor"]["Manipulate"] 
And because everything is computable, one can for example immediately make an image collage of all Demonstrations on a particular topic:
ImageCollage[ EntityValue[ EntityClass["WolframDemonstration", "ChemicalEngineering"], "Thumbnail"]] 
It’s been nearly 7 years since we first introduced Classify and Predict, and began the process of fully integrating neural networks into the Wolfram Language. There’ve been two major directions: the first is to develop “superfunctions”, like Classify and Predict, that—as automatically as possible—perform machine-learning-based operations. The second direction is to provide a powerful symbolic framework to take advantage of the latest advances with neural nets (notably through the Wolfram Neural Net Repository) and to allow flexible continued development and experimentation.
Version 12.2 has progress in both these areas. An example of a new superfunction is FaceRecognize. Give it a small number of tagged examples of faces, and it will try to identify them in images, videos, etc. Let’s get some training data from web searches (and, yes, it’s somewhat noisy):
faces = Image[#, ImageSize -> 30] & /@ AssociationMap[Flatten[FindFaces[#, "Image"] & /@ WebImageSearch["star trek " <> #]] &, {"Jean-Luc Picard", "William Riker", "Phillipa Louvois", "Data"}]
Now create a face recognizer with this training data:
recognizer = FaceRecognize[faces] 
Now we can use this to find out who’s on screen in each frame of a video:
VideoMapList[recognizer[FindFaces[#Image, "Image"]] &, Video[URLDownload["https://ia802900.us.archive.org/7/items/2000promoforstartrekthenextgeneration/2000%20promo%20for%20Star%20Trek%20%20The%20Next%20Generation.ia.mp4"]]] /. m_Missing \[RuleDelayed] "Other" 
Now plot the results:
rec = %;
ListPlot[Catenate[MapIndexed[{First[#2], #1} &, ArrayComponents[rec], {2}]], ColorFunction -> ColorData["Rainbow"], Ticks -> {None, Thread[{Range[Max[ArrayComponents[rec]]], DeleteDuplicates[Flatten[rec]]}]}]
In the Wolfram Neural Net Repository there’s a regular stream of new networks being added. Since Version 12.1 about 20 new kinds of networks have been added—including many new transformer nets, as well as EfficientNet and for example feature extractors like BioBERT and SciBERT specifically trained on text from scientific papers.
In each case, the networks are immediately accessible—and usable—through NetModel. Something that’s updated in Version 12.2 is the visual display of networks:
NetModel["ELMo Contextual Word Representations Trained on 1B Word Benchmark"]
There are lots of new icons, but there’s also now a clear convention that circles represent fixed elements of a net, while squares represent trainable ones. In addition, when there’s a thick border in an icon, it means there’s an additional network inside, that you can see by clicking.
Whether it’s a network that comes from NetModel or one you construct yourself (or a combination of the two), it’s often convenient to extract the “summary graphic” for the network, for example so you can put it in documentation or a publication. Information provides several levels of summary graphics:
Information[ NetModel["CapsNet Trained on MNIST Data"], "SummaryGraphic"] 
There are several important additions to our core neural net framework that broaden the range of neural net functionality we can access. The first is that in Version 12.2 we have native encoders for graphs and for time series. So, here, for example, we’re making a feature space plot of 20 random named graphs:
FeatureSpacePlot[GraphData /@ RandomSample[GraphData[], 20]] 
Another enhancement to the framework has to do with diagnostics for models. We introduced PredictorMeasurements and ClassifierMeasurements many years ago to provide a symbolic representation for the performance of models. In Version 12.2—in response to many requests—we’ve made it possible to feed final predictions, rather than a model, to create a PredictorMeasurements object, and we’ve streamlined the appearance and operation of PredictorMeasurements objects:
PredictorMeasurements[{3.2, 3.5, 4.6, 5}, {3, 4, 5, 6}] 
An important new feature of ClassifierMeasurements is the ability to compute a calibration curve that compares the actual probabilities observed from sampling a test set with the predictions from the classifier. But what’s even more important is that Classify automatically calibrates its probabilities, in effect trying to “sculpt” the calibration curve:
Row[{First@ClassifierMeasurements[Classify[training, Method -> "RandomForest", "Calibration" -> False], test, "CalibrationCurve"], " \[LongRightArrow] ", First@ClassifierMeasurements[Classify[training, Method -> "RandomForest", "Calibration" -> True], test, "CalibrationCurve"]}]
Version 12.2 also has the beginning of a major update to the way neural networks can be constructed. The fundamental setup has always been to put together a certain collection of layers that expose what amount to array indices that are connected by explicit edges in a graph. Version 12.2 now introduces FunctionLayer, which allows you to give something much closer to ordinary Wolfram Language code. As an example, here’s a particular function layer:
FunctionLayer[2*(#v . #m . {0.25, 0.75}) . NetArray[<|"Array" -> {0.1, 0.9}|>] &]
And here’s the representation of this function layer as an explicit NetGraph:
NetGraph[%] 
v and m are named “input ports”. The NetArray—indicated by the square icons in the net graph—is a learnable array, here containing just two elements.
There are cases where it’s easier to use the “block-based” (or “graphical”) programming approach of just connecting together layers (and we’ve worked hard to ensure that the connections can be made as automatically as possible). But there are also cases where it’s easier to use the “functional” programming approach of FunctionLayer. For now, FunctionLayer supports only a subset of the constructs available in the Wolfram Language—though this already includes many standard array and functional programming operations, and more will be added in the future.
An important feature of FunctionLayer is that the neural net it produces will be as efficient as any other neural net, and can run on GPUs etc. But what can you do about Wolfram Language constructs that are not yet natively supported by FunctionLayer? In Version 12.2 we’re adding another new experimental function—CompiledLayer—that extends the range of Wolfram Language code that can be handled efficiently.
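As a sketch of how CompiledLayer can be used (the type annotation here follows the compiler’s Typed mechanism; treat the details as illustrative rather than definitive), here’s a layer that applies Sin elementwise to a vector via compiled code:

CompiledLayer[Function[Typed[x, TypeSpecifier["PackedArray"]["MachineReal", 1]], Map[Sin, x]]]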
It’s perhaps worth explaining a bit about what’s happening inside. Our main neural net framework is essentially a symbolic layer that organizes things for optimized lowlevel implementation, currently using MXNet. FunctionLayer is effectively translating certain Wolfram Language constructs directly to MXNet. CompiledLayer is translating Wolfram Language to LLVM and then to machine code, and inserting this into the execution process within MXNet. CompiledLayer makes use of the new Wolfram Language compiler, and its extensive type inference and type declaration mechanisms.
OK, so let’s say one’s built a magnificent neural net in our Wolfram Language framework. Everything is set up so that the network can immediately be used in a whole range of Wolfram Language superfunctions (Classify, FeatureSpacePlot, AnomalyDetection, FindClusters, …). But what if one wants to use the network “standalone” in an external environment? In Version 12.2 we’re introducing the capability to export essentially any network in the recently developed ONNX standard representation.
And once one has a network in ONNX form, one can use the whole ecosystem of external tools to deploy it in a wide variety of environments. A notable example—that’s now a fairly streamlined process—is to take a full Wolfram Language–created neural net and run it in CoreML on an iPhone, so that it can for example directly be included in a mobile app.
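The export itself is just Export with an .onnx file extension; for example, with one of the repository networks mentioned above:

Export["capsnet.onnx", NetModel["CapsNet Trained on MNIST Data"]]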
What’s the best way to collect structured material? If you just want to get a few items, an ordinary form created with FormFunction (and for example deployed in the cloud) can work well. But what if you’re trying to collect longer, richer material?
For example, let’s say you’re creating a quiz where you want students to enter a whole sequence of complex responses. Or let’s say you’re creating a template for people to fill in documentation for something. What you need in these cases is a new concept that we’re introducing in Version 12.2: form notebooks.
A form notebook is basically a notebook that is set up to be used as a complex “form”, where the inputs in the form can be all the kinds of things that you’re used to having in a notebook.
The basic workflow for form notebooks is the following. First you author a form notebook, defining the various “form elements” (or areas) that you want the user of the form notebook to fill in. As part of the authoring process, you define what you want to have happen to the material the user of the form notebook enters when they use the form notebook (e.g. put the material in a Wolfram Data Drop databin, send the material to a cloud API, send the material as a symbolic expression by email, etc.).
After you’ve authored the form notebook, you then generate an active version that can be sent to whoever will be using the form notebook. Once someone has filled in their material in their copy of the deployed form notebook, they press a button, typically “Submit”, and their material is then sent as a structured symbolic expression to whatever destination the author of the form notebook specified.
It’s perhaps worth mentioning how form notebooks relate to something that sounds similar: template notebooks. In a sense, a template notebook is doing the reverse of a form notebook. A form notebook is about having a user enter material that will then be processed. A template notebook, on the other hand, is about having the computer generate material which will then be used to populate a notebook whose structure is defined by the template notebook.
OK, so how do you get started with form notebooks? Just go to File > New > Programmatic Notebook > Form Notebook Authoring:
This is just a notebook, where you can enter whatever content you want—say an explanation of what you want people to do when they “fill out” the form notebook. But then there are special cells or sequences of cells in the form notebook that we call “form elements” and “editable notebook areas”. These are what the user of the form notebook “fills out” to enter their “responses”, and the material they provide is what gets sent when they press the “Submit” button (or whatever final action has been defined).
In the authoring notebook, the toolbar gives you a menu of possible form elements that you can insert:
Let’s pick Input Field as an example:
What does all this mean? Basically a form element is represented by a very flexible symbolic Wolfram Language expression, and this is giving you a way to specify the expression you want. You can give a label and a hint to put in the input field. But it’s with the Interpreter that you start to see the power of Wolfram Language. Because the Interpreter is what takes whatever the user of the form notebook enters in this input field, and interprets it as a computable object. The default is just to treat it as a string. But it could for example be a “Country” or a “MathExpression”. And with these choices, the material will automatically be interpreted as a country, math expression, etc., with the user typically being prompted if their input can’t be interpreted as specified.
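A quick way to get a feel for what Interpreter does is to apply it directly to a string. For example, this interprets free-form text as a country entity:

Interpreter["Country"]["france"]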
There are lots of options about the details of how even an input field can work. Some of them are provided in the Add Action menu:
But so what actually “is” this form element? Press the CODE tab on the left to see:
What would a user of the form notebook see here? Press the PREVIEW tab to find out:
Beyond input fields, there are lots of other possible form elements. There are things like checkboxes, radio buttons and sliders. And in general it’s possible to use any of the rich symbolic user interface constructs that exist in the Wolfram Language.
Once you’ve finished authoring, you press Generate to generate a form notebook that is ready to be provided to users to be filled in. The Settings define things like how the “submit” action should be specified, and what should be done when the form notebook is submitted:
So what is the “result” of a submitted form notebook? Basically it’s an association that says what was filled into each area of the form notebook. (The areas are identified by keys in the association that were specified when the areas were first defined in the authoring notebook.)
Let’s see how this works in a simple case. Here’s the authoring notebook for a form notebook:
Here’s the generated form notebook, ready to be filled in (assuming you have 12.2):
Here’s a sample of how the form notebook might be filled in:
And this is what “comes back” when Submit is pressed:
For testing, you can just have this association placed interactively in a notebook. But in practice it’s more common to send the association to a databin, store it in a cloud object, or generally put it in a more “centralized” location.
Notice that at the end of this example we have an editable notebook area—where you can enter free-form notebook content (with cells, headings, code, output, etc.) that will all be captured when the form notebook is submitted.
Form notebooks are a very powerful idea, and you’ll see them used all over the place. As a first example, the various submission notebooks for the Wolfram Function Repository, Wolfram Demonstrations Project, etc. are becoming form notebooks. We’re also expecting a lot of use of form notebooks in educational settings. And as part of that, we’re building a system that leverages Wolfram Language for assessing responses in form notebooks (and elsewhere).
You can see the beginnings of this in Version 12.2 with the experimental function AssessmentFunction—which can be hooked into form notebooks somewhat like Interpreter. But even without the full capabilities planned for AssessmentFunction there’s still an incredible amount that can be done—in educational settings and otherwise—using form notebooks.
It’s worth understanding, by the way, that form notebooks are ultimately very simple to use in any particular case. Yes, they have a lot of depth that allows them to do a very wide range of things. And they’re basically only possible because of the whole symbolic structure of the Wolfram Language, and the fact that Wolfram Notebooks are ultimately represented as symbolic expressions. But when it comes to using them for a particular purpose they’re very streamlined and straightforward, and it’s completely realistic to create a useful form notebook in just a few minutes.
We invented notebooks—with all their basic features of hierarchical cells, etc.—back in 1987. But for a third of a century, we’ve been progressively polishing and streamlining how they work. And in Version 12.2 there are all sorts of useful and convenient new notebook features.
It’s a very simple feature, but it’s very useful. You see something in a notebook, and all you really want to be able to do with it is copy it (or perhaps copy something related to it). Well, then just use ClickToCopy:
ClickToCopy[10!] 
If you want to clicktocopy something unevaluated, use Defer:
ClickToCopy[Plot[Sin[x], {x, 0, 10}], Defer[Plot[Sin[x], {x, 0, 10}]]] 
Ctrl+Shift+H has inserted a hyperlink in a Wolfram Notebook since 1996. But in Version 12.2 there are two important new things with hyperlinks. First, automatic hyperlinking that handles a wide range of different situations. And second, a modernized and streamlined mechanism for hyperlink creation and editing.
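Hyperlinks can also always be created programmatically, with the label and the target URL given separately:

Hyperlink["Wolfram Research", "https://www.wolfram.com"]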
In Version 12.2 we’re exposing something that we’ve had internally for a while: the ability to attach a floating fully functional cell to any given cell (or box, or whole notebook). Accessing this feature needs symbolic notebook programming, but it lets you do very powerful things—particularly in introducing contextual and “just-in-time” interfaces. Here’s an example that puts a dynamic counter that counts in primes at the bottom right of the cell bracket:
obj = AttachCell[EvaluationCell[], Panel[Dynamic[i]], {"CellBracket", Bottom}, 0, {Right, Bottom}];
Do[PrimeQ[i], {i, 10^7}];
NotebookDelete[obj]
Sometimes it’s useful for what you see not to be what you have. For example, you might want to display something in a notebook as J_{0}(x) but have it really be BesselJ[0, x]. For many years, we’ve had Interpretation as a way to set this up for specific expressions. But we’ve also had a more general mechanism—TemplateBox—that lets you take expressions, and separately specify how they should be displayed, and interpreted.
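So, for example, this displays as a subscripted J0 applied to x, but evaluates (and copies) as the underlying BesselJ expression:

Interpretation[Subscript[J, 0][x], BesselJ[0, x]]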
In Version 12.2 we’ve further generalized—and streamlined—TemplateBox, allowing it to incorporate arbitrary user interface elements, as well as allowing it to specify things like copy behavior. Our new TeX input mechanism, for example, is basically just an application of the new TemplateBox.
In this case, "TeXAssistantTemplate" refers to a piece of functionality defined in the notebook stylesheet—whose parameters are specified by the association given in the TemplateBox:
RawBoxes[TemplateBox[<|"boxes" -> FormBox[FractionBox["1", "2"], TraditionalForm], "errors" -> {}, "input" -> "\\frac{1}{2}", "state" -> "Boxes"|>, "TeXAssistantTemplate"]]
An important feature of Wolfram Notebooks is that they’re set up to operate both on the desktop and in the cloud. And even between versions of Wolfram Language there’s lots of continued enhancement in the way notebooks work in the cloud. But in Version 12.2 there’s been some particular streamlining of the interface for notebooks between desktop and cloud.
A particularly nice mechanism already available for a couple of years in any desktop notebook is the File > Publish to Cloud menu item, which allows you to take the notebook and immediately make it available as a published cloud notebook that can be accessed by anyone with a web browser. In Version 12.2 we’ve streamlined the process of notebook publishing.
When I’m giving a presentation I’ll usually be creating a desktop notebook as I go (or perhaps using one that already exists). And at the end of the presentation, it’s become my practice to publish it to the cloud, so anyone in the audience can interact with it. But how can I give everyone the URL for the notebook? In a virtual setting, you can just use chat. But in an actual physical presentation, that’s not an option. And in Version 12.2 we’ve provided a convenient alternative: the result of Publish to Cloud includes a QR code that people can capture with their phones, then immediately go to the URL and interact with the notebook on their phones.
There’s one other notable new item visible in the result of Publish to Cloud: “Direct JavaScript Embedding”. This is a link to the Wolfram Notebook Embedder which allows cloud notebooks to be directly embedded through JavaScript onto webpages.
It’s always easy to use an iframe to embed one webpage on another. But iframes have many limitations, such as requiring their sizes to be defined in advance. The Wolfram Notebook Embedder allows fullfunction fluid embedding of cloud notebooks—as well as scriptable control of the notebooks from other elements of a webpage. And since the Wolfram Notebook Embedder is set up to use the oEmbed embedding standard, it can immediately be used in basically all standard web content management systems.
We’ve talked about sending notebooks from the desktop to the cloud. But another thing that’s new in Version 12.2 is faster and easier browsing of your cloud file system from the desktop—as accessed from File > Open from Cloud and File > Save to Cloud.
One of the things we want to do with Wolfram Language is to make it as easy as possible to connect with pretty much any external system. And in modern times an important part of that is being able to conveniently handle cryptographic protocols. And ever since we started introducing cryptography directly into the Wolfram Language five years ago, I’ve been surprised at just how much the symbolic character of the Wolfram Language has allowed us to clarify and streamline things to do with cryptography.
A particularly dramatic example of this has been how we’ve been able to integrate blockchains into Wolfram Language (and Version 12.2 adds bloxberg with several more on the way). And in successive versions we’re handling different applications of cryptography. In Version 12.2 a major emphasis is symbolic capabilities for key management. Version 12.1 already introduced SystemCredential for dealing with local “keychain” key management (supporting, for example, “remember me” in authentication dialogs). In 12.2 we’re also dealing with PEM files.
If we import a PEM file containing a private key we get a nice, symbolic representation of the private key:
private = First[Import["ExampleData/private-secp256k1.pem"]]
Now we can derive a public key:
public = PublicKey[%] 
If we generate a digital signature for a message using the private key
GenerateDigitalSignature["Hello there", private] 
then this verifies the signature using the public key we’ve derived:
VerifyDigitalSignature[{"Hello there", %}, public] 
An important part of modern security infrastructure is the concept of a security certificate—a digital construct that allows a third party to attest to the authenticity of a particular public key. In Version 12.2 we now have a symbolic representation for security certificates—providing what’s needed for programs to establish secure communication channels with outside entities in the same kind of way that https does:
Import["ExampleData/client.pem"] 
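Structurally, a PEM file like the one imported above is nothing more than binary DER data, base64-encoded and wrapped in BEGIN/END marker lines. As a language-neutral illustration (not the Wolfram Language implementation), here’s a minimal Python sketch of that round trip using only the standard library; the payload bytes are made up for demonstration:

```python
import base64
import ssl

# A PEM block is base64-encoded DER binary data between BEGIN/END markers.
# The payload below is made-up bytes, just to show the round trip.
der_bytes = b"\x30\x82\x01\x0a" + bytes(range(16))
b64 = base64.encodebytes(der_bytes).decode("ascii")
pem = "-----BEGIN CERTIFICATE-----\n" + b64 + "-----END CERTIFICATE-----\n"

# Python's ssl module strips the wrapper and recovers the raw DER bytes...
decoded = ssl.PEM_cert_to_DER_cert(pem)
assert decoded == der_bytes

# ...which is all that's happening structurally when a PEM file is parsed;
# interpreting the DER contents (keys, certificates, etc.) comes afterwards.
print(len(decoded), "DER bytes recovered")
```

The symbolic representations shown above are built by then decoding the DER structure itself, which this sketch doesn’t attempt.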
In Version 12.0 we introduced powerful functionality for querying relational databases symbolically within the Wolfram Language. Here’s how we connect to a database:
db = DatabaseReference[FindFile["ExampleData/ecommerce-database.sqlite"]]
Here’s how we register the database so that its tables can be treated just like entity types from the built-in Wolfram Knowledgebase:
EntityRegister[EntityStore[RelationalDatabase[db]]] 
Now we can for example ask for a list of entities of a given type:
EntityList["offices"] 
What’s new in 12.2 is that we can conveniently go “under” this layer, to directly execute SQL queries against the underlying database, getting the complete database table as a Dataset expression:
ExternalEvaluate[db, "SELECT * FROM offices"] 
These queries can not only read from the database, but also write to it. And to make things even more convenient, we can effectively treat SQL just like any other “external language” in a notebook.
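The pattern of running raw SQL against a database and getting structured rows back will be familiar from other environments; here’s a minimal sketch in Python using the standard library’s sqlite3 module (the table and rows here are invented to mimic the example database, not taken from it):

```python
import sqlite3

# An in-memory stand-in for a database like the ecommerce example above;
# the table name and columns are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE offices (code TEXT, city TEXT)")
db.executemany("INSERT INTO offices VALUES (?, ?)",
               [("SF", "San Francisco"), ("NYC", "New York")])

# Reading: a raw SQL query returns the whole table as rows,
# much as ExternalEvaluate returns a Dataset.
rows = db.execute("SELECT * FROM offices").fetchall()
print(rows)  # [('SF', 'San Francisco'), ('NYC', 'New York')]

# Writing goes through exactly the same interface.
db.execute("UPDATE offices SET city = 'NYC (Manhattan)' WHERE code = 'NYC'")
```

What ExternalEvaluate adds on top of this is that the result comes back as a symbolic Dataset expression, immediately usable by the rest of the language.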
First we have to register our database, to say what we want our SQL to be run against:
RegisterExternalEvaluator["SQL", db] 
And now we can just type SQL as input—and get back Wolfram Language output, directly in the notebook:
You’ve developed a control system or a signal processing algorithm in the Wolfram Language. Now how do you deploy it to a piece of standalone electronics? In Version 12.0 we introduced the Microcontroller Kit for compiling from symbolic Wolfram Language structures directly to microcontroller code.
We’ve had lots of feedback on this, asking us to expand the range of microcontrollers that we support. So in Version 12.2 I’m happy to say that we’re adding support for 36 new microcontrollers, particularly 32-bit ones:
Here’s an example in which we deploy a symbolically defined digital filter to a particular kind of microcontroller, showing the simplified C source code generated for that particular microcontroller:
Needs["MicrocontrollerKit`"] 
ToDiscreteTimeModel[ButterworthFilterModel[{3, 2}], 0.6] // Chop 
MicrocontrollerEmbedCode[%, <|"Target" -> "AdafruitGrandCentralM4", "Inputs" -> 0 -> "Serial", "Outputs" -> 1 -> "Serial"|>, "/dev/cu.usbmodem14101"]["SourceCode"]
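What the generated source code essentially implements is the difference equation of the discretized filter, updated once per sample in the microcontroller’s main loop. Here’s a language-neutral sketch of that recurrence in Python (the coefficients are illustrative placeholders, not the ones ToDiscreteTimeModel would produce for the Butterworth example above):

```python
# Direct-form recurrence y[n] = sum(b[k] x[n-k]) - sum(a[k] y[n-k]),
# which is what a discretized filter reduces to.
# These coefficients are placeholders for illustration only.
b = [0.2, 0.4, 0.2]    # feedforward (numerator) coefficients
a = [1.0, -0.3, 0.1]   # feedback (denominator) coefficients, a[0] = 1

def filter_step(x_hist, y_hist, x_new):
    """Advance the filter one sample; histories are most-recent-first."""
    x_hist = [x_new] + x_hist[:len(b) - 1]
    y_new = sum(bk * xk for bk, xk in zip(b, x_hist))
    y_new -= sum(ak * yk for ak, yk in zip(a[1:], y_hist))
    return x_hist, [y_new] + y_hist[:len(a) - 2], y_new

# Run an impulse through the filter, the way the deployed code would
# process successive samples arriving over serial input.
x_hist, y_hist, out = [0.0] * (len(b) - 1), [0.0] * (len(a) - 1), []
for x in [1.0, 0.0, 0.0, 0.0]:
    x_hist, y_hist, y = filter_step(x_hist, y_hist, x)
    out.append(y)
print(out[0])  # first impulse-response sample equals b[0] = 0.2
```

The C code MicrocontrollerEmbedCode emits performs this same per-sample update, with the histories kept in fixed arrays rather than Python lists.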
Our long-term goal is to make the Wolfram Language and the computational intelligence it provides as ubiquitous as possible. And part of doing this is to set up the Wolfram Engine, which implements the language, so that it can be deployed in as broad a range of computational infrastructure settings as possible.
Wolfram Desktop—as well as classic Mathematica—primarily provides a notebook interface to the Wolfram Engine, running on a local desktop system. It’s also possible to run the Wolfram Engine directly—as a command-line program (e.g. through WolframScript)—on a local computer system. And, of course, one can run the Wolfram Engine in the cloud, either through the full Wolfram Cloud (public or private), or through more lightweight cloud and server offerings (both existing and forthcoming).
But with Version 12.2 there’s a new deployment of the Wolfram Engine: WSTPServer. If you use Wolfram Engine in the cloud, you’re typically communicating with it through http or related protocols. But for more than thirty years, the Wolfram Language has had its own dedicated protocol for transferring symbolic expressions and everything around them. Originally we called it MathLink, but in more recent years, as it’s progressively been extended, we’ve called it WSTP: the Wolfram Symbolic Transfer Protocol. What WSTPServer does, as its name suggests, is to give you a lightweight server that delivers Wolfram Engines and lets you communicate with them directly in native WSTP.
Why is this important? Basically because it gives you a way to manage pools of persistent Wolfram Language sessions that can operate as services for other applications. For example, normally each time you call WolframScript you get a new, fresh Wolfram Engine. But by using wolframscript wstpserver with a particular “WSTP profile name” you can keep getting the same Wolfram Engine every time you call WolframScript. You can do this directly on your local machine—or on remote machines.
And an important use of WSTPServer is to expose pools of Wolfram Engines that can be accessed through the new RemoteEvaluate function in Version 12.2. It’s also possible to use WSTPServer to expose Wolfram Engines for use by ParallelMap, etc. And finally, since WSTP has (for nearly 30 years!) been the way the notebook front end communicates with the Wolfram Engine kernel, it’s now possible to use WSTPServer to set up a centralized kernel pool to which you can connect the notebook front end, allowing you, for example, to keep running a particular session (or even a particular computation) in the kernel even as you switch to a different notebook front end, on a different computer.
Along the lines of “use Wolfram Language everywhere”, another new function in Version 12.2 is RemoteEvaluate. We’ve got CloudEvaluate, which does a computation in the Wolfram Cloud, or an Enterprise Private Cloud. We’ve got ParallelEvaluate, which does computations on a predefined collection of parallel subkernels. And in Version 12.2 we’ve got RemoteBatchSubmit, which submits batch computations to cloud computation providers.
RemoteEvaluate is a general, lightweight “evaluate now” function that lets you do a computation on any specified remote machine that has an accessible Wolfram Engine. You can connect to the remote machine using ssh or wstp (or http with a Wolfram Cloud endpoint).
RemoteEvaluate["ssh://byblis67.wolfram.com", Labeled[Framed[$MachineName], Now]] 
Sometimes you’ll want to use RemoteEvaluate to do things like system administration across a range of machines. Sometimes you might want to collect or send data to remote devices. For example, you might have a network of Raspberry Pi computers which all have the Wolfram Engine—and then you can use RemoteEvaluate to do something like retrieve data from these machines. By the way, you can also use ParallelEvaluate from within RemoteEvaluate, so you can have a remote machine act as the master for a collection of parallel subkernels.
Sometimes you’ll want RemoteEvaluate to start a fresh instance of Wolfram Engine whenever you do an evaluation. But with WSTPServer you can also have it use a persistent Wolfram Language session. RemoteEvaluate and WSTPServer are the beginning of a general symbolic framework for representing running Wolfram Engine processes. Version 12.2 already has RemoteKernelObject and $DefaultRemoteKernel which provide symbolic ways to represent remote Wolfram Language instances.
I’ve at least touched on many of the bigger new features of Version 12.2. But there’s a lot more: additional functions, enhancements, fixes, and general rounding out and polishing.
Like in computational geometry, ConvexHullRegion now deals with regions, not just points. And there are functions like CollinearPoints and CoplanarPoints that test for collinearity and coplanarity, or give conditions for achieving them.
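The geometric criterion behind such a test is simple: three points in the plane are collinear exactly when the cross product of the two difference vectors vanishes. A toy Python sketch of that criterion (unlike CollinearPoints, this handles only numeric 2D coordinates):

```python
def collinear(p, q, r, tol=1e-9):
    """Test whether three 2D points lie on one line: the z-component of
    the cross product (q - p) x (r - p) must vanish (up to tolerance)."""
    (px, py), (qx, qy), (rx, ry) = p, q, r
    cross = (qx - px) * (ry - py) - (qy - py) * (rx - px)
    return abs(cross) <= tol

print(collinear((0, 0), (1, 1), (2, 2)))  # True: all on the line y = x
print(collinear((0, 0), (1, 1), (2, 3)))  # False
```

The Wolfram Language functions go further, working symbolically and in any number of dimensions, and returning the conditions under which collinearity or coplanarity would hold.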
There are more import and export formats. Like there’s now support for the archive formats: “7z”, “ISO”, “RAR”, “ZSTD”. There’s also FileFormatQ and ByteArrayFormatQ for testing whether things correspond to particular formats.
In terms of core language, there are things like updates to the complicated-to-define ValueQ. There’s also RandomGeneratorState that gives a symbolic representation of random generator states.
In the desktop package (i.e. .wl file) editor, there’s a new (somewhat experimental) Format Cell button, that reformats code—with a control on how “airy” it should be (i.e. how dense it should be in newlines).
In Wolfram|Alpha-Mode Notebooks (as used by default in Wolfram|Alpha Notebook Edition) there are other new features, like function documentation targeted for particular function usage.
There’s also more in TableView, as well as a large suite of new paclet authoring tools that are included on an experimental basis.
To me it’s rather amazing how much we’ve been able to bring together in Version 12.2, and, as always, I’m excited that it’s now out and available to everyone to use….
On Tuesday, December 7, 1920, the Göttingen Mathematics Society held its regular weekly meeting—at which a 32-year-old local mathematician named Moses Schönfinkel, with no known previous mathematical publications, gave a talk entitled “Elemente der Logik” (“Elements of Logic”).
A hundred years later what was presented in that talk still seems in many ways alien and futuristic—and for most people almost irreducibly abstract. But we now realize that that talk gave the first complete formalism for what is probably the single most important idea of this past century: the idea of universal computation.
Sixteen years later would come Turing machines (and lambda calculus). But in 1920 Moses Schönfinkel presented what he called “building blocks of logic”—or what we now call “combinators”—and then proceeded to show that by appropriately combining them one could effectively define any function, or, in modern terms, that they could be used to do universal computation.
Looking back a century it’s remarkable enough that Moses Schönfinkel conceptualized a formal system that could effectively capture the abstract notion of computation. And it’s more remarkable still that he formulated what amounts to the idea of universal computation, and showed that his system achieved it.
But for me the most amazing thing is that not only did he invent the first complete formalism for universal computation, but his formalism is probably in some sense minimal. I’ve personally spent years trying to work out just how simple the structure of systems that support universal computation can be—and for example with Turing machines it took from 1936 until 2007 for us to find the minimal case.
But back in his 1920 talk Moses Schönfinkel—presenting a formalism for universal computation for the very first time—gave something that is probably already in his context minimal.
Moses Schönfinkel described the result of his 1920 talk in an 11-page paper published in 1924 entitled “Über die Bausteine der mathematischen Logik” (“On the Building Blocks of Mathematical Logic”). The paper is a model of clarity. It starts by saying that in the “axiomatic method” for mathematics it makes sense to try to keep the number of “fundamental notions” as small as possible. It reports that in 1913 Henry Sheffer managed to show that basic logic requires only one connective, that we now call Nand. But then it begins to go further. And already within a couple of paragraphs it’s saying that “We are led to [an] idea, which at first glance certainly appears extremely bold”. But by the end of the introduction it’s reporting, with surprise, the big news: “It seems to me remarkable in the extreme that the goal we have just set can be realized… [and] as it happens, it can be done by a reduction to three fundamental signs”.
Those “three fundamental signs”, of which he only really needs two, are what we now call the S and K combinators (he called them S and C). In concept they’re remarkably simple, but their actual operation is in many ways brain-twistingly complex. But there they were—already a century ago—just as they are today: minimal elements for universal computation, somehow conjured up from the mind of Moses Schönfinkel.
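To get a concrete feel for just how minimal the system is, S and K can be written down as curried higher-order functions: K x y = x, and S x y z = (x z)(y z). Here’s a small Python sketch (operating on ordinary Python values rather than pure symbolic expressions, so it only hints at the full combinatory calculus):

```python
# K discards its second argument:  K x y = x
K = lambda x: lambda y: x
# S distributes an argument:  S x y z = (x z)(y z)
S = lambda x: lambda y: lambda z: x(z)(y(z))

# The classic construction: S K K behaves as the identity combinator,
# so even basic "plumbing" of arguments needs nothing beyond S and K.
I = S(K)(K)
print(I(42))  # 42

# S applied to ordinary curried functions: S (+) (double) 3 = 3 + 2*3 = 9
add = lambda x: lambda y: x + y
print(S(add)(lambda z: 2 * z)(3))  # 9
```

Everything a general function can do can be built by composing just these two pieces—which is the universality result Schönfinkel presented.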
So who was this person, who managed so long ago to see so far?
The complete known published output of Moses Schönfinkel consists of just two papers: his 1924 “On the Building Blocks of Mathematical Logic”, and another, 31-page paper from 1927, coauthored with Paul Bernays, entitled “Zum Entscheidungsproblem der mathematischen Logik” (“On the Decision Problem of Mathematical Logic”).
And somehow Schönfinkel has always been in the shadows—appearing at best only as a kind of footnote to a footnote. Turing machines have taken the limelight as models of computation—with combinators, hard to understand as they are, being mentioned at most only in obscure footnotes. And even within the study of combinators—often called “combinatory logic”—even as S and K have remained ubiquitous, Schönfinkel’s invention of them typically garners at most a footnote.
About Schönfinkel as a person, three things are commonly said. First, that he was somehow connected with the mathematician David Hilbert in Göttingen. Second, that he spent time in a psychiatric institution. And third, that he died in poverty in Moscow, probably around 1940 or 1942.
But of course there has to be more to the story. And in recognition of the centenary of Schönfinkel’s announcement of combinators, I decided to try to see what I could find out.
I don’t think I’ve got all the answers. But it’s been an interesting, if at times unsettling, trek through the Europe—and mathematics—of a century or so ago. And at the end of it I feel I’ve come to know and understand at least a little more about the triumph and tragedy of Moses Schönfinkel.
It’s a strange and sad resonance with Moses Schönfinkel’s life… but there’s a 1953 song by Tom Lehrer about plagiarism in mathematics—where the protagonist explains his chain of intellectual theft: “I have a friend in Minsk/Who has a friend in Pinsk/Whose friend in Omsk”… “/Whose friend somehow/Is solving now/The problem in Dnepropetrovsk”. Well, Dnepropetrovsk is where Moses Schönfinkel was born.
Except, confusingly, at the time it was called (after Catherine the Great or maybe her namesake saint) Ekaterinoslav (Екатеринослáв)—and it’s now called Dnipro. It’s one of the larger cities in Ukraine, roughly in the center of the country, about 250 miles down the river Dnieper from Kiev. And at the time when Schönfinkel was born, Ukraine was part of the Russian Empire.
So what traces are there of Moses Schönfinkel in Ekaterinoslav (AKA Dnipro) today? 132 years later it wasn’t so easy to find (especially during a pandemic)… but here’s a record of his birth: a certificate from the Ekaterinoslav Public Rabbi stating that entry 272 of the Birth Register for Jews from 1888 records that on September 7, 1888, a son Moses was born to the Ekaterinoslav citizen Ilya Schönfinkel and his wife Masha:
This seems straightforward enough. But immediately there’s a subtlety. When exactly was Moses Schönfinkel born? What is that date? At the time the Russian Empire—which had the Russian Orthodox Church, which eschewed Pope Gregory’s 1582 revision of the calendar—was still using the Julian calendar introduced by Julius Caesar. (The calendar was switched in 1918 after the Russian Revolution, although the Orthodox Church plans to go on celebrating Christmas on January 7 until 2100.) So to know a correct modern (i.e. Gregorian calendar) date of birth we have to do a conversion. And from this we’d conclude that Moses Schönfinkel was born on September 19, 1888.
But it turns out that’s not the end of the story. There are several other documents associated with Schönfinkel’s college years that also list his date of birth as September 7, 1888. But the state archives of the Dnepropetrovsk region contain the actual, original register from the synagogue in Ekaterinoslav. And here’s entry 272—and it records the birth of Moses Schönfinkel, but on September 17, not September 7:
So the official certificate is wrong! Someone left a digit out. And there’s a check: the Birth Register also gives the date in the Jewish calendar: 24 Tishrei, which for 1888 corresponds to the Julian date September 17. So converting to modern Gregorian form, the correct date of birth for Moses Schönfinkel is September 29, 1888.
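The conversion itself is simple arithmetic: for dates in the 1800s the Julian calendar ran 12 days behind the Gregorian, with the offset for a given Julian century being roughly century − century//4 − 2 days. A quick Python check of both dates discussed above:

```python
from datetime import date, timedelta

def julian_to_gregorian(year, month, day):
    """Convert a Julian-calendar date to the Gregorian calendar.
    The century-based offset formula used here is a standard
    approximation valid for dates between 1500 and 2100."""
    century = year // 100
    offset = century - century // 4 - 2   # 12 days for the 1800s
    return date(year, month, day) + timedelta(days=offset)

# The corrected register date, Julian September 17, 1888:
print(julian_to_gregorian(1888, 9, 17))  # 1888-09-29
# The date on the erroneous certificate, Julian September 7, 1888:
print(julian_to_gregorian(1888, 9, 7))   # 1888-09-19
```

Both results match the dates quoted in the text: the certificate’s September 7 converts to September 19, and the register’s September 17 to September 29.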
OK, now what about his name? In Russian it’s given as Моисей Шейнфинкель (or, including the patronymic, with the most common transliteration from Hebrew, Моисей Эльевич Шейнфинкель). But how should his last name be transliterated? Well, there are several possibilities. We’re using Schönfinkel—but other possibilities are Sheinfinkel and Sheynfinkel—and these show up almost randomly in different documents.
What else can we learn from Moses Schönfinkel’s “birth certificate”? Well, it describes his father Эльева (Ilya) as an Ekaterinoslav мещанина. But what is that word? It’s often translated “bourgeoisie”, but seems to have basically meant “middleclass city dweller”. And in other documents from the time, Ilya Schönfinkel is described as a “merchant of the 2nd guild” (i.e. not the “top 5%” 1st guild, nor the lower 3rd guild).
Apparently, however, his fortunes improved. The 1905 “Index of Active Enterprises Incorporated in the [Russian] Empire” lists him as a “merchant of the 1st guild” and records that in 1894 he cofounded the company of “Lurie & Sheinfinkel” (with a paidin capital of 10,000 rubles, or about $150k today) that was engaged in the grocery trade:
Lurie & Sheinfinkel seems to have had multiple wine and grocery stores. Between 1901 and 1904 its “store #2” was next to a homeopathic pharmacy in a building that probably looked at the time much like it does today:
And for store #1 there are actually contemporary photographs (note the инкель for the end of “Schönfinkel” visible on the bottom left; this particular building was destroyed in World War II):
There seems to have been a close connection between the Schönfinkels and the Luries—who were a prominent Ekaterinoslav family involved in a variety of enterprises. Moses Schönfinkel’s mother Maria (Masha) was originally a Lurie (actually, she was one of the 8 siblings of Ilya Schönfinkel’s business partner Aron Lurie). Ilya Schönfinkel is listed from 1894 to 1897 as “treasurer of the Lurie Synagogue”. And in 1906 Moses Schönfinkel listed his mailing address in Ekaterinoslav as Lurie House, Ostrozhnaya Square. (By 1906 that square sported an upscale park—though a century earlier it had housed a prison that was referenced in a poem by Pushkin. Now it’s the site of an opera house.)
Accounts of Schönfinkel sometimes describe him as coming from a “village in Ukraine”. In actuality, at the turn of the twentieth century Ekaterinoslav was a bustling metropolis, that for example had just become the third city in the whole Russian Empire to have electric trams. Schönfinkel’s family also seems to have been quite well to do. Some pictures of Ekaterinoslav from the time give a sense of the environment (this building was actually the site of a Lurie candy factory):
As the name “Moses” might suggest, Moses Schönfinkel was Jewish, and at the time he was born there was a large Jewish population in the southern part of Ukraine. Many Jews had come to Ekaterinoslav from Moscow, and in fact 40% of the whole population of the town was identified as Jewish.
Moses Schönfinkel went to the main high school in town (the “Ekaterinoslav classical gymnasium”)—and graduated in 1906, shortly before turning 18. Here’s his diploma:
The diploma shows that he got 5/5 in all subjects—the subjects being theology, Russian, logic, Latin, Greek, mathematics, geodesy (“mathematical geography”), physics, history, geography, French, German and drawing. So, yes, he did well in high school. And in fact the diploma goes on to say: “In view of his excellent behavior and diligence and excellent success in the sciences, especially in mathematics, the Pedagogical Council decided to award him the Gold Medal…”
Having graduated from high school, Moses Schönfinkel wanted to go (“for purely family reasons”, he said) to the University of Kiev. But being told that Ekaterinoslav was in the wrong district for that, he instead asked to enroll at Novorossiysk University in Odessa. He wrote a letter—in rather neat handwriting—to unscramble a bureaucratic issue, giving various excuses along the way:
But in the fall of 1906, there he was: a student in the Faculty of Physics and Mathematics Faculty of Novorossiysk University, in the rather upscale and cosmopolitan town of Odessa, on the Black Sea.
The Imperial Novorossiya University, as it was then officially called, had been created out of an earlier institution by Tsar Alexander II in 1865. It was a distinguished university, with for example Dmitri Mendeleev (of periodic table fame) having taught there. In Soviet times it would be renamed after the discoverer of macrophages, Élie Metchnikoff (who worked there). Nowadays it is usually known as Odessa University. And conveniently, it has maintained its archives well—so that, still there, 114 years later, is Moses Schönfinkel’s student file:
It’s amazing how “modern” a lot of what’s in it seems. First, there are documents Moses Schönfinkel sent so he could register (confirming them by telegram on September 1, 1906). There’s his high-school diploma and birth certificate—and there’s a document from the Ekaterinoslav City Council certifying his “citizen rank” (see above). The cover sheet also records a couple of other documents, one of which is presumably some kind of deferment of military service.
And then in the file there are two “photo cards” giving us pictures of the young Moses Schönfinkel, wearing the uniform of the Imperial Russian Army:
(These pictures actually seem to come from 1908; the style of uniform was a standard one issued after 1907; the [presumably] white collar tabs indicate the 3rd regiment of whatever division he was assigned to.)
Nowadays it would all be online, but in his physical file there is a “lecture book” listing courses (yes, every document is numbered, to correspond to a line in a central ledger):
Here are the courses Moses Schönfinkel took in his first semester in college (fall 1906):
Introduction to Analysis (6 hrs), Introduction to Determinant Theory (2 hrs), Analytical Geometry 1 (2 hrs), Chemistry (5 hrs), Physics 1 (3 hrs), Elementary Number Theory (2 hrs): a total of 20 hours. Here’s the bill for these courses: pretty good value at 1 ruble per course-hour, or a total of 20 rubles, which is about $300 today:
Subsequent semesters list many very familiar courses: Differential Calculus, Integrals (parts 1 and 2), and Higher Algebra, as well as “Calculus of Probabilities” (presumably probability theory) and “Determinant Theory” (essentially differently branded “linear algebra”). There are some “distribution” courses, like Astronomy (and Spherical Astronomy) and Physical Geography (or is that Geodesy?). And by 1908, there are also courses like Functions of a Complex Variable, Integro-Differential Equations (yeah, differential equations definitely pulled ahead of integral equations over the past century), Calculus of Variations and Infinite Series. And—perhaps presaging Schönfinkel’s next life move—another course that makes an appearance in 1908 is German (and it was Schönfinkel’s only non-science course during his whole university career).
In Schönfinkel’s “lecture book” many of the courses also have names of professors listed. For example, there’s “Kagan”, who’s listed as teaching Foundations of Geometry (as well as Higher Algebra, Determinant Theory and Integro-Differential Equations). That’s Benjamin Kagan, who was then a young lecturer, but would later become a leader in differential geometry in Moscow—and also someone who studied the axiomatic foundations of geometry (as well as writing about the somewhat tragic life of Lobachevsky).
Another professor—listed as teaching Schönfinkel Introduction to Analysis and Theory of Algebraic Equation Solving—is “Shatunovsky”. And (at least according to Shatunovsky’s later student Sofya Yanovskaya, of whom we’ll hear more later), Samuil Shatunovsky was basically Schönfinkel’s undergraduate advisor.
Shatunovsky had been the 9th child of a poor Jewish family (actually) from a village in Ukraine. He was never able to enroll at a university, but for some years did manage to go to lectures by people around Pafnuty Chebyshev in Saint Petersburg. For quite a few years he then made a living as an itinerant math tutor (notably in Ekaterinoslav) but papers he wrote were eventually noticed by people at the university in Odessa, and, finally, in 1905, at the age of 46, he ended up as a lecturer at the university—where the following year he taught Schönfinkel.
Shatunovsky (who stayed in Odessa until his death in 1929) was apparently an energetic but precise lecturer. He seems to have been quite axiomatically oriented, creating axiomatic systems for geometry, algebraic fields, and notably, for order relations. (He was also quite a constructivist, opposed to the indiscriminate use of the Law of Excluded Middle.) The lectures from his Introduction to Analysis course (which Schönfinkel took in 1906) were published in 1923 (by the local publishing company Mathesis in which he and Kagan were involved).
Another of Schönfinkel’s professors (from whom he took Differential Calculus and “Calculus of Probabilities”) was a certain Ivan (or Jan) Śleszyński, who had worked with Karl Weierstrass on things like continued fractions, but by 1906 was in his early 50s and increasingly transitioning to working on logic. In 1911 he moved to Poland, where he sowed some of the seeds for the Polish school of mathematical logic, in 1923 writing a book called On the Significance of Logic for Mathematics (notably with no mention of Schönfinkel), and in 1925 one on proof theory.
It’s not clear how much mathematical logic Moses Schönfinkel picked up in college, but in any case, in 1910, he was ready to graduate. Here’s his final student ID (what are those pieces of string for?):
There’s a certificate confirming that on April 6, 1910, Moses Schönfinkel had no books that needed returning to the library. And he sent a letter asking to graduate (with slightly-less-neat handwriting than in 1906):
The letter closes with his signature (Моисей Шейнфинкель):
After Moses Schönfinkel graduated college in 1910 he probably went into four years of military service (perhaps as an engineer) in the Russian Imperial Army. World War I began on July 28, 1914—and Russia mobilized on July 30. But in one of his few pieces of good luck Moses Schönfinkel was not called up, having arrived in Göttingen, Germany on June 1, 1914 (just four weeks before the event that would trigger World War I), to study mathematics.
Göttingen was at the time a top place for mathematics. In fact, it was sufficiently much of a “math town” that around that time postcards of local mathematicians were for sale there. And the biggest star was David Hilbert—which is who Schönfinkel went to Göttingen hoping to work with.
Hilbert had grown up in Prussia and started his career in Königsberg. His big break came in 1888 at age 26 when he got a major result in representation theory (then called “invariant theory”)—using then-shocking nonconstructive techniques. And it was soon after this that Felix Klein recruited Hilbert to Göttingen—where he remained for the rest of his life.
In 1900 Hilbert gave his famous address to the International Congress of Mathematicians where he first listed his (ultimately 23) problems that he thought should be important in the future of mathematics. Almost all the problems are what anyone would call “mathematical”. But problem 6 has always stuck out for me: “Mathematical Treatment of the Axioms of Physics”: Hilbert somehow wanted to axiomatize physics as Euclid had axiomatized geometry. And he didn’t just talk about this; he spent nearly 20 years working on it. He brought in physicists to teach him, and he worked on things like gravitation theory (“Einstein–Hilbert action”) and kinetic theory—and wanted for example to derive the existence of the electron from something like Maxwell’s equations. (He was particularly interested in the way atomistic processes limit to continua—a problem that I now believe is deeply connected to computational irreducibility, in effect implying another appearance of undecidability, like in Hilbert’s 1st, 2nd and 10th problems.)
Hilbert seemed to feel that physics was a crucial source of raw material for mathematics. But yet he developed a whole program of research based on doing mathematics in a completely formalistic way—where one just writes down axioms and somehow “mechanically” generates all true theorems from them. (He seems to have drawn some distinction between “merely mathematical” questions, and questions about physics, apparently noting—in a certain resonance with my life’s work—that in the latter case “the physicist has the great calculating machine, Nature”.)
In 1899 Hilbert had written down more precise and formal axioms for Euclid’s geometry, and he wanted to go on and figure out how to formulate other areas of mathematics in this kind of axiomatic way. But for more than a decade he seems to have spent most of his time on physics—finally returning to questions about the foundations of mathematics around 1917, giving lectures about “logical calculus” in the winter session of 1920.
By 1920, World War I had come and gone, with comparatively little effect on mathematical life in Göttingen (the nearest battle was in Belgium 200 miles to the west). Hilbert was 58 years old, and had apparently lost quite a bit of his earlier energy (not least as a result of having contracted pernicious anemia [autoimmune vitamin B12 deficiency], whose cure was found only a few years later). But Hilbert was still a celebrity around Göttingen, and generating mathematical excitement. (Among “celebrity gossip” mentioned in a letter home by young Russian topologist Pavel Urysohn is that Hilbert was a huge fan of the gramophone, and that even at his advanced age, in the summer, he would sit in a tree to study.)
I have been able to find out almost nothing about Schönfinkel’s interaction with Hilbert. However, from April to August 1920 Hilbert gave weekly lectures entitled “Problems of Mathematical Logic” which summarized the standard formalism of the field—and the official notes for those lectures were put together by Moses Schönfinkel and Paul Bernays (the “N” initial for Schönfinkel is a typo):
A few months after these lectures came, at least from our perspective today, the highlight of Schönfinkel’s time in Göttingen: the talk he gave on December 7, 1920. The venue was the weekly meeting of the Göttingen Mathematics Society, held at 6pm on Tuesdays. The society wasn’t officially part of the university, but it met in the same university “Auditorium Building” that at the time housed the math institute:
The talks at the Göttingen Mathematics Society were listed in the Annual Report of the German Mathematicians Association:
There’s quite a lineup. November 9, Ludwig Neder (student of Edmund Landau): “Trigonometric Series”. November 16, Erich Bessel-Hagen (student of Carathéodory): “Discontinuous Solutions of Variational Problems”. November 23, Carl Runge (of Runge–Kutta fame, then a Göttingen professor): “American Work on Star Clusters in the Milky Way”. November 30, Gottfried Rückle (assistant of van der Waals): “Explanations of Natural Laws Using a Statistical Mechanics Basis”. And then: December 7: Moses Schönfinkel, “Elements of Logic”.
The next week, December 14, Paul Bernays, who worked with Hilbert and interacted with Schönfinkel, spoke about “Probability, the Arrow of Time and Causality” (yes, there was still a lot of interest around Hilbert in the foundations of physics). January 10+11, Joseph Petzoldt (philosopher of science): “The Epistemological Basis of Special and General Relativity”. January 25, Emmy Noether (of Noether’s theorem fame): “Elementary Divisors and General Ideal Theory”. February 1+8, Richard Courant (of PDE etc. fame) & Paul Bernays: “About the New Arithmetic Theories of Weyl and Brouwer”. February 22, David Hilbert: “On a New Basis for the Meaning of a Number” (yes, that’s foundations of math).
What in detail happened at Schönfinkel’s talk, or as a result of it? We don’t know. But he seems to have been close enough to Hilbert that just over a year later he was in a picture taken for David Hilbert’s 60th birthday on January 23, 1922:
There are all sorts of well-known mathematicians in the picture (Richard Courant, Hermann Minkowski, Edmund Landau, …) as well as some physicists (Peter Debye, Theodore von Kármán, Ludwig Prandtl, …). And there near the top left is Moses Schönfinkel, sporting a somewhat surprised expression.
For his 60th birthday Hilbert was given a photo album—with 44 pages of pictures of altogether about 200 mathematicians (and physicists). And there on page 22 is Moses Schönfinkel:
Who are the other people on the page with him? Adolf Kratzer (1893–1983) was a student of Arnold Sommerfeld, and a “physics assistant” to Hilbert. Hermann Vermeil (1889–1959) was an assistant to Hermann Weyl, who worked on differential geometry for general relativity. Heinrich Behmann (1891–1970) was a student of Hilbert and worked on mathematical logic, and we’ll encounter him again later. Finally, Carl Ludwig Siegel (1896–1981) had been a student of Landau and would become a well-known number theorist.
There’s a lot that’s still mysterious about Moses Schönfinkel’s time in Göttingen. But we have one (undated) letter written by Nathan Schönfinkel, Moses’s younger brother, presumably in 1921 or 1922 (yes, he romanizes his name “Scheinfinkel” rather than “Schönfinkel”):
Dear Professor!
I received a letter from Rabbi Dr. Behrens in which he wrote that my brother was in need, that he was completely malnourished. It was very difficult for me to read these lines, even more so because I cannot help my brother. I haven’t received any messages or money myself for two years. Thanks to the good people where I live, I am protected from severe hardship. I am able to continue my studies. I hope to finish my PhD in 6 months. A few weeks ago I received a letter from my cousin stating that our parents and relatives are healthy. My cousin is in Kishinev (Bessarabia), now in Romania. He received the letter from our parents who live in Ekaterinoslav. Our parents want to help us but cannot do so because the postal connections are nonexistent. I hope these difficulties will not last long. My brother is helpless and impractical in this material world. He is a victim of his great love for science. Even as a 12 year old boy he loved mathematics, and all window frames and doors were painted with mathematical formulas by him. As a high school student, he devoted all his free time to mathematics. When he was studying at the university in Odessa, he was not satisfied with the knowledge there, and his striving and ideal was Göttingen and the king of mathematics, Prof. Hilbert. When he was accepted in Göttingen, he once wrote to me the following: “My dear brother, it seems to me as if I am dreaming but this is reality: I am in Göttingen, I saw Prof. Hilbert, I spoke to Prof. Hilbert.” The war came and with it suffering. My brother, who is helpless, has suffered more than anyone else. But he did not write to me so as not to worry me. He has a good heart. I ask you, dear Professor, for a few months until the connections with our city are established, to help him by finding a suitable (not harmful to his health) job for him. I will be very grateful to you, dear Professor, if you will answer me.
Sincerely.
N. Scheinfinkel
We’ll talk more about Nathan Schönfinkel later. But suffice it to say here that when he wrote the letter he was a physiology graduate student at the University of Bern—and he would get his PhD in 1922, and later became a professor. But the letter he wrote is probably our best single surviving source of information about the situation and personality of Moses Schönfinkel. Obviously he was a serious math enthusiast from a young age. And the letter implies that he’d wanted to work with Hilbert for some time (presumably hence the German classes in college).
It also implies that he was financially supported in Göttingen by his parents—until this was disrupted by World War I. (And we learn that his parents were OK in the Russian Revolution.) (By the way, the rabbi mentioned is probably a certain Siegfried Behrens, who left Göttingen in 1922.)
There’s no record of any reply to Nathan Schönfinkel’s letter from Hilbert. But at least by the time of Hilbert’s 60th birthday in 1922 Moses Schönfinkel was (as we saw above) enough in the inner circle to be invited to the birthday party.
What else is there in the university archives in Göttingen about Moses Schönfinkel? There’s just one document, but it’s very telling:
It’s dated 18 March 1924. And it’s a carbon copy of a reference for Schönfinkel. It’s rather cold and formal, and reads:
“The Russian privatdozent [private lecturer] in mathematics, Mr. Scheinfinkel, is hereby certified to have worked in mathematics for ten years with Prof. Hilbert in Göttingen.”
It’s signed (with a stylized “S”) by the “University Secretary”, a certain Ludwig Gossmann, who we’ll be talking about later. And it’s being sent to Ms. Raissa Neuburger, at Bühlplatz 5, Bern. That address is where the Physiology Institute at the University of Bern is now, and also was in 1924. And Raissa Neuburger either was then, or soon would become, Nathan Schönfinkel’s wife.
But there’s one more thing, handwritten in black ink at the bottom of the document. Dated March 20, it’s another note from the University Secretary. It’s annotated “a.a.”, i.e. ad acta—for the records. And in German it reads:
Gott sei Dank, dass Sch weg ist
which translates in English as:
Thank goodness Sch is gone
Hmm. So for some reason at least the university secretary was happy to see Schönfinkel go. (Or perhaps it was a German 1920s version of an HR notation: “not eligible for rehire”.) But let’s analyze this document in a little more detail. It says Schönfinkel worked with Hilbert for 10 years. That agrees with him having arrived in Göttingen in 1914 (which is a date we know for other reasons, as we’ll see below).
But now there’s a mystery. The reference describes Schönfinkel as a “privatdozent”. That’s a definite position at a German university, with definite rules, that in 1924 one would expect to have been rigidly enforced. The basic career track was (and largely still is): first, spend 2–5 years getting a PhD. Then perhaps get recruited for a professorship, or if not, continue doing research, and write a habilitation, after which the university may issue what amounts to an official government “license to teach”, making someone a privatdozent, able to give lectures. Being a privatdozent wasn’t as such a paid gig. But it could be combined with a job like being an assistant to a professor—or something outside the university, like tutoring, teaching high school or working at a company.
So if Schönfinkel was a privatdozent in 1924, where is the record of his PhD, or his habilitation? To get a PhD required “formally publishing” a thesis, and printing (as in, on a printing press) at least 20 or so copies of the thesis. A habilitation was typically a substantial, published research paper. But there’s absolutely no record of any of these things for Schönfinkel. And that’s very surprising. Because there are detailed records for other people (like Paul Bernays) who were around at the time, and were indeed privatdozents.
And what’s more, the Annual Report of the German Mathematicians Association—which listed Schönfinkel’s 1920 talk—seems to have listed mathematical goings-on in meticulous detail. Who gave what talk. Who wrote what paper. And most definitely who got a PhD, did a habilitation or became a privatdozent. (And becoming a privatdozent also required an action of the university senate, which was carefully recorded.) But going through all the annual reports of the German Mathematicians Association we find only four mentions of Schönfinkel. There’s his 1920 talk, and also a 1921 talk with Paul Bernays that we’ll discuss later. There’s the publication of his papers in 1924 and 1927. And there’s a single other entry, which says that on November 4, 1924, Richard Courant gave a report to the Göttingen Mathematical Society about a conference in Innsbruck, where Heinrich Behmann reported on “published work by M. Schönfinkel”. (It describes the work as follows: “It is a continuation of Sheffer’s [1913] idea of replacing the elementary operations of symbolic logic with a single one. By means of a certain function calculus, all logical statements (including the mathematical ones) are represented by three basic signs alone.”)
So, it seems, the university secretary wasn’t telling it straight. Schönfinkel might have worked with Hilbert for 10 years. But he wasn’t a privatdozent. And actually it doesn’t seem as if he had any “official status” at all.
So how do we even know that Schönfinkel was in Göttingen from 1914 to 1924? Well, he was Russian, and so in Germany he was an “alien”, and as such he was required to register his address with the local police (no doubt even more so from 1914 to 1918 when Germany was, after all, at war with Russia). And the remarkable thing is that even after all these years, Schönfinkel’s registration card is still right there in the municipal archives of the city of Göttingen:
So that means we have all Schönfinkel’s addresses during his time in Göttingen. Of course, there are confusions. There’s yet another birthdate for Schönfinkel: September 4, 1889. Wrong year. Perhaps a wrongly done correction from the Julian calendar. Perhaps “adjusted” for some reason of military service obligations. But, in any case, the document says that Moses Schönfinkel from Ekaterinoslav arrived in Göttingen on June 1, 1914, and started living at 6 Lindenstraße (now Felix-Klein-Strasse).
He moved pretty often (11 times in 10 years), not at particularly systematic times of year. It’s not clear exactly what the setup was in all these places, but at least at the end (and in another document) it lists addresses and “with Frau….”, presumably indicating that he was renting a room in someone’s house.
Where were all those addresses? Well, here’s a map of Göttingen circa 1920, with all of them plotted (along with a red “M” for the location of the math institute):
The last item on the registration card says that on March 18, 1924 he departed Göttingen, and went to Moscow. And the note on the copy of the reference saying “thank goodness [he’s] gone” is dated March 20, so that all ties together.
But let’s come back to the reference. Who was this “University Secretary” who seems to have made up the claim that Schönfinkel was a privatdozent? It was fairly easy to find out that his name was Ludwig Gossmann. But the big surprise was to find out that the university archives in Göttingen have nearly 500 pages about him—primarily in connection with a “criminal investigation”.
Here’s the story. Ludwig Gossmann was born in 1878 (so he was 10 years older than Schönfinkel). He grew up in Göttingen, where his father was a janitor at the university. He finished high school but didn’t go to college and started working for the local government. Then in 1906 (at age 28) he was hired by the university as its “secretary”.
The position of “university secretary” was a high-level one. It reported directly to the vice-rector of the university, and was responsible for “general administrative matters” for the university, including, notably, the supervision of international students (of whom there were many, Schönfinkel being one). Ludwig Gossmann held the position of university secretary for 27 years—even while the university had a different rector (normally a distinguished academic) every year.
But Mr. Gossmann also had a sideline: he was involved in real estate. In the 1910s he started building houses (borrowing money from, among others, various university professors). And by the 1920s he had significant real estate holdings—and a business where he rented to international visitors and students at the university.
Years went by. But then, on January 24, 1933, the newspaper headline announced: “Sensational arrest: senior university official Gossmann arrested on suspicion of treason—communist revolution material [Zersetzungsschrift] confiscated from his apartment”. It was said that perhaps it was a setup, and that he’d been targeted because he was gay (though, a year earlier, at age 54, he did marry a woman named Elfriede).
This was a bad time to be accused of being a communist (Hitler would become chancellor less than a week later, on January 30, 1933, in part propelled by fears of communism). Gossmann was taken to Hanover “for questioning”, but was then allowed back to Göttingen “under house arrest”. He’d had health problems for several years, and died of a heart attack on February 24, 1933.
But none of this really helps us understand why Gossmann would go out on a limb to falsify the reference for Schönfinkel. We can’t specifically find an address match, but perhaps Schönfinkel had at least at some point been a tenant of Gossmann’s. Perhaps he still owed rent. Perhaps he was just difficult in dealing with the university administration. It’s not clear. It’s also not clear why the reference Gossmann wrote was sent to Schönfinkel’s brother in Bern, even though Schönfinkel himself was going to Moscow. Or why it wasn’t just handed to Schönfinkel before he left Göttingen.
Whatever was going on with Schönfinkel in Göttingen in 1924, we know one thing for sure: it was then that he published his remarkable paper about what are now called combinators. Let’s talk in a bit more detail about the paper—though I’m discussing the technicalities elsewhere.
First, there’s some timing. At the end of the paper, it says it was received by the journal on March 15, 1924, i.e. just three days before the date of Ludwig Gossmann’s reference for Schönfinkel. And then at the top of the paper, there’s something else: under Schönfinkel’s name it says “in Moskau”, i.e. at least as far as the journal was concerned, Schönfinkel was in Moscow, Russia, at the time the article was published:
There’s also a footnote on the first page of the paper:
“The following thoughts were presented by the author to the Mathematical Society in Göttingen on December 7, 1920. Their formal and stylistic processing for this publication was done by H. Behmann in Göttingen.”
The paper itself is written in a nice, clear and mathematically mature way. Its big result (as I’ve discussed elsewhere) is the introduction of what would later be called combinators: two abstract constructs from which arbitrary functions and computations can be built up. Schönfinkel names one of them S, after the German word “Verschmelzung” for “fusion”. The other has become known as K, although Schönfinkel calls it C, even though the German word for “constancy” (which is what would naturally describe it) is “Konstantheit”, which starts with a K.
The paper ends with three paragraphs, footnoted with “The considerations that follow are the editor’s” (i.e. Behmann’s). They’re not as clear as the rest of the paper, and contain a confused mistake.
The main part of the paper is “just math” (or computation, or whatever). But here’s the page where S and K (called C here) are first used:
And now there’s something more people-oriented: a footnote to the combinator equation I = SCC saying “This reduction was communicated to me by Mr. Boskowitz; some time before that, Mr. Bernays had called the somewhat less simple one (SC)(CC) to my attention.” In other words, even if nothing else, Schönfinkel had talked to Boskowitz and Bernays about what he was doing.
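To make those two reductions concrete: in modern notation (writing K for Schönfinkel’s C), the rules are S x y z → x z (y z) and K x y → x. Here is a minimal sketch in Python (my illustration, not anything from the paper), modeling the combinators as curried functions and checking that both SKK and Bernays’ “somewhat less simple” (SK)(KK) behave as the identity:

```python
# Schönfinkel's two combinators, as curried Python functions.
# S x y z -> x z (y z)   ("fusion")
# K x y   -> x           (Schönfinkel's C, "constancy")
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

# Boskowitz's reduction, I = SKK:
#   S K K v -> K v (K v) -> v
I = S(K)(K)

# Bernays' earlier version, (SK)(KK):
#   S K (K K) v -> K v ((K K) v) -> K v K -> v
I2 = S(K)(K(K))

print(I(42), I2("hello"))  # both act as the identity
```

The sketch only mirrors the reduction rules; Schönfinkel’s actual point was that S and K alone, applied symbolically, suffice to build up arbitrary functions, with no variables needed at all.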
OK, so we’ve got three people—in addition to David Hilbert—somehow connected to Moses Schönfinkel.
Let’s start with Heinrich Behmann—the person footnoted as “processing” Schönfinkel’s paper for publication:
He was born in Bremen, Germany in 1891, making him a couple of years younger than Schönfinkel. He arrived in Göttingen as a student in 1911, and by 1914 was giving a talk about Whitehead and Russell’s Principia Mathematica (which had been published in 1910). When World War I started he volunteered for military service, and in 1915 he was wounded in action in Poland (receiving an Iron Cross)—but in 1916 he was back in Göttingen studying under Hilbert, and in 1918 he wrote his PhD thesis on “The Antinomy of the Transfinite Number and Its Resolution by the Theory of Russell and Whitehead” (i.e. using the idea of types to deal with paradoxes associated with infinity).
Behmann continued in the standard academic track (i.e. what Schönfinkel apparently didn’t do)—and in 1921 he got his habilitation with the thesis “Contributions to the Algebra of Logic, in Particular to the Entscheidungsproblem [Decision Problem]”. There’d been other decision problems discussed before, but Behmann said what he meant was a “procedure [giving] complete instructions for determining whether a [logical or mathematical] assertion is true or false by a deterministic calculation after finitely many steps”. And, yes, Alan Turing’s 1936 paper “On Computable Numbers, with an Application to the Entscheidungsproblem” was what finally established that the halting problem, and therefore the Entscheidungsproblem, was undecidable. Curiously, in principle, there should have been enough in Schönfinkel’s paper that this could have been figured out back in 1921 if Behmann or others had been thinking about it in the right way (which might have been difficult before Gödel’s work).
So what happened to Behmann? He continued to work on mathematical logic and the philosophy of mathematics. After his habilitation in 1921 he became a privatdozent at Göttingen (with a job as an assistant in the applied math institute), and then in 1925 got a professorship in Halle in applied math—though, having been an active member of the Nazi Party since 1937, he lost this professorship in 1945 and became a librarian. He died in 1970.
(By the way, even though in 1920 “PM” [Principia Mathematica] was hot—and Behmann was promoting it—Schönfinkel had what in my opinion was the good taste to not explicitly mention it in his paper, referring only to Hilbert’s much-less-muddy ideas about the formalization of mathematics.)
OK, so what about Boskovitz, credited in the footnote with having discovered the classic combinator result I = SKK? That was Alfred Boskovitz, in 1920 a 23-year-old Jewish student at Göttingen, who came from Budapest, Hungary, and worked with Paul Bernays on set theory. Boskovitz is notable for having contributed far more corrections (nearly 200) to Principia Mathematica than anyone else, and being acknowledged (along with Behmann) in a footnote in the (1925–27) second edition. (This edition also gives a reference to Schönwinkel’s [sic] paper at the end of a list of 14 “other contributions to mathematical logic” since the first edition.) In the mid-1920s Boskovitz returned to Budapest. In 1936 he wrote to Behmann that anti-Jewish sentiment there made him concerned for his safety. There’s one more known communication from him in 1942, then no further trace.
The third person mentioned in Schönfinkel’s paper is Paul Bernays, who ended up living a long and productive life, mostly in Switzerland. But we’ll come to him later.
So where was Schönfinkel’s paper published? It was in a journal called Mathematische Annalen (Annals of Mathematics)—probably the top math journal of the time. Here’s its rather swank masthead, with quite a collection of famous names (including physicists like Einstein, Born and Sommerfeld):
The “instructions to contributors” on the inside cover of each issue had a statement from the “Editorial Office” about not changing things at the proof stage because “according to a calculation they [cost] 6% of the price of a volume”. The instructions then go on to tell people to submit papers to the editors—at their various home addresses (it seems David Hilbert lived just down the street from Felix Klein…):
Here’s the complete table of contents for the volume in which Schönfinkel’s paper appears:
There are a variety of famous names here. But particularly notable for our purposes are Aleksandr Khintchine (of Khinchin constant fame) and the topologists Pavel Alexandroff and Pavel Urysohn, who were all from Moscow State University, and who are all indicated, like Schönfinkel, as being “in Moscow”.
There’s a little bit of timing information here. Schönfinkel’s paper was indicated as having been received by the journal on March 15, 1924. The “thank goodness [he’s] gone [from Göttingen]” comment is dated March 20. Meanwhile, the actual issue of the journal with Schönfinkel’s article (number 3 of 4) was published September 15, with table of contents:
But note the ominous † next to Urysohn’s name. Turns out his fatal swimming accident was August 17, so—notwithstanding their admonitions—the journal must have added the † quite quickly at the proof stage.
Beyond his 1924 paper on combinators, there’s only one other known piece of published output from Moses Schönfinkel: a paper coauthored with Paul Bernays “On the Decision Problem of Mathematical Logic”:
It’s actually much more widely cited than Schönfinkel’s 1924 combinator paper, but it’s vastly less visionary and ultimately much less significant; it’s really about a technical point in mathematical logic.
About halfway through the paper it has a note:
“The following thoughts were inspired by Hilbert’s lectures on mathematical logic and date back several years. The decision procedure for a single function F(x, y) was derived by M. Schönfinkel, who first tackled the problem; P. Bernays extended the method to several logical functions, and also wrote the current paper.”
The paper was submitted on March 24, 1927. But in the records of the German Mathematicians Association we find a listing of another talk at the Göttingen Mathematical Society: December 6, 1921, P. Bernays and M. Schönfinkel, “Das Entscheidungsproblem im Logikkalkul”. So the paper had a long gestation period, and (as the note in the paper suggests) it basically seems to have fallen to Bernays to get it written, quite likely with little or no communication with Schönfinkel.
So what else do we know about it? Well, remarkably enough, the Bernays archive contains two notebooks (the paper kind!) by Moses Schönfinkel that are basically an early draft of the paper (with the title already being the same as it finally was, but with Schönfinkel alone listed as the author):
These notebooks are basically our best window into the front lines of Moses Schönfinkel’s work. They aren’t dated as such, but at the end of the second notebook there’s a byline of sorts, that lists his street address in Göttingen—and we know he lived at that address from September 1922 until March 1924:
OK, so what’s in the notebooks? The first page might indicate that the notebooks were originally intended for a different purpose. It’s just a timetable of lectures:
“Hilbert lectures: Monday: Mathematical foundations of quantum theory; Thursday: Hilbert–Bernays: Foundations of arithmetic; Saturday: Hilbert: Knowledge and mathematical thinking”. (There’s also a slightly unreadable note that seems to say “Hoppe. 6–8… electricity”, perhaps referring to Edmund Hoppe, who taught physics in Göttingen, and wrote a history of electricity.)
But then we’re into 15 pages (plus 6 in the other notebook) of content, written in essentially perfect German, but with lots of parentheticals of different possible word choices:
The final paper as coauthored with Bernays begins:
“The central problem of mathematical logic, which is also closely connected to its axiomatic foundations, is the decision problem [Entscheidungsproblem]. And it deals with the following. We have logical formulas which contain logic functions, predicates, …”
Schönfinkel’s version begins considerably more philosophically (here with a little editing for clarity):
“Generality has always been the main goal—the ideal of the mathematician. Generality in the solution, in the method, in the concept and formulation of the theorem, in the problem and question. This tendency is even more pronounced and clearer with modern mathematicians than with earlier ones, and reaches its high point in the work of Hilbert and Ms. Noether. Such an ideal finds its most extreme expression when one faces the problem of “solving all problems”—at least all mathematical problems, because everything else after is easy, as soon as this “Gordian Knot” is cut (because the world is written in “mathematical letters” according to Hilbert).
In just the previous century mathematicians would have been extremely skeptical and even averse to such fantasies… But today’s mathematician has already been trained and tested in the formal achievements of modern mathematics and Hilbert’s axiomatics, and nowadays one has the courage and the boldness to dare to touch this question as well. We owe to mathematical logic the fact that we are able to have such a question at all.
From Leibniz’s bold conjectures, the great logician-mathematicians went step by step in pursuit of this goal, in the systematic structure of mathematical logic: Boole (discoverer of the logical calculus), (Bolzano?), Ernst Schröder, Frege, Peano, Ms. Ladd-Franklin, the two Peirces, Sheffer, Whitehead, Couturat, Huntington, Padoa, Shatunovsky, Sleshinsky, Kagan, Poretsky, Löwenheim, Skolem, … and their numerous students, collaborators and contemporaries … until in 1910–1914 “the system” by Bertrand Russell and Whitehead appeared—the famous “Principia Mathematica”—a mighty titanic work, a large system. Finally came our knowledge of logic from Hilbert’s lectures on (the algebra of) logic (calculus) and, following on from this, the groundbreaking work of Hilbert’s students: Bernays and Behmann.
The investigations of all these scholars and researchers have led (in no uncertain terms) to the fact that it has become clear that actual mathematics represents a branch of logic. … This emerges most clearly from the treatment and conception of mathematical logic that Hilbert has given. And now, thanks to Hilbert’s approach, we can (satisfactorily) formulate the great decision problem of mathematical logic.”
We learn quite a bit about Schönfinkel from this. Perhaps the most obvious thing is that he was a serious fan of Hilbert and his approach to mathematics (with a definite shoutout to “Ms. Noether”). It’s also interesting that he refers to Bernays and Behmann as “students” of Hilbert. That’s pretty much correct for Behmann. But Bernays (as we’ll see soon) was more an assistant or colleague of Hilbert’s than a student.
It gives interesting context to see Schönfinkel rattle off a sequence of contributors to what he saw as the modern view of mathematical logic. He begins—quite rightly I think—mentioning “Leibniz’s bold conjectures”. He’s not sure whether Bernard Bolzano fits (and neither am I). Then he lists Schröder, Frege and Peano—all pretty standard choices, involved in building up the formal structure of mathematical logic.
Next he mentions Christine Ladd-Franklin. At least these days, she’s not particularly well known, but she had been a mathematical logic student of Charles Peirce, and in 1881 she’d written a paper about the “Algebra of Logic” which included a truth table, a solid 40 years before Post or Wittgenstein. (In 1891 she had also worked in Göttingen on color vision with the experimental psychologist Georg Müller—who was still there in 1921.) It’s notable that Schönfinkel mentions Ladd-Franklin ahead of the father-and-son Peirces. Next we see Sheffer, who Schönfinkel quotes in connection with Nand in his combinator paper. (No doubt unbeknownst to Schönfinkel, Henry Sheffer—who spent most of his life in the US—was also born in Ukraine [“near Odessa”, his documents said], and was also Jewish, and was just 6 years older than Schönfinkel.) I’m guessing Schönfinkel mentions Whitehead next in connection with universal algebra, rather than his later collaboration with Russell.
Next comes Louis Couturat, who frankly wouldn’t have made my list for mathematical logic, but was another “algebra of logic” person, as well as a Leibniz fan, and developer of the Ido language offshoot from Esperanto. Huntington was involved in the axiomatization of Boolean algebra; Padoa was connected to Peano’s program. Shatunovsky, Sleshinsky and Kagan were all professors of Schönfinkel’s in Odessa (as mentioned above), concerned in various ways with foundations of mathematics. Platon Poretsky I must say I had never heard of before; he seems to have done fairly technical work on propositional logic. And finally Schönfinkel lists Löwenheim and Skolem, both of whom are well known in mathematical logic today.
I consider it rather wonderful that Schönfinkel refers to Whitehead and Russell’s Principia Mathematica as a “titanic work” (Titanenwerk). The showy and “overconfident” Titanic had come to grief on its iceberg in 1912, somehow reminiscent of Principia Mathematica, eventually coming to grief on Gödel’s theorem.
At first it might just seem charming—particularly in view of his brother’s comment that “[Moses] is helpless and impractical in this material world”—to see Schönfinkel talk about how after one’s solved all mathematical problems, then solving all problems will be easy, explaining that, after all, Hilbert has said that “the world is written in ‘mathematical letters’”. He says that in the previous century mathematicians wouldn’t have seriously considered “solving everything”, but now, because of progress in mathematical logic, “one has the courage and the boldness to dare to touch this question”.
It’s very easy to see this as naive and unworldly—the writing of someone who knew only about mathematics. But though he didn’t have the right way to express it, Schönfinkel was actually onto something, and something very big. He talks at the beginning of his piece about generality, and about how recent advances in mathematical logic embolden one to pursue it. And in a sense he was very right about this. Because mathematical logic—through work like his—is what led us to the modern conception of computation, which really is successful in “talking about everything”. Of course, after Schönfinkel’s time we learned about Gödel’s theorem and computational irreducibility, which tell us that even though we may be able to talk about everything, we can never expect to “solve every problem” about everything.
But back to Schönfinkel’s life and times. The remainder of Schönfinkel’s notebooks give the technical details of his solution to a particular case of the decision problem. Bernays obviously worked through these, adding more examples as well as some generalization. And Bernays cut out Schönfinkel’s philosophical introduction, no doubt on the (probably correct) assumption that it would seem too airy-fairy for the paper’s intended technical audience.
So who was Paul Bernays? Here’s a picture of him from 1928:
Bernays was almost exactly the same age as Schönfinkel (he was born on October 17, 1888—in London, where there was no calendar issue to worry about). He came from an international business family, was a Swiss citizen and grew up in Paris and Berlin. He studied math, physics and philosophy with a distinguished roster of professors in Berlin and Göttingen, getting his PhD in 1912 with a thesis on analytic number theory.
After his PhD he went to the University of Zurich, where he wrote a habilitation (on complex analysis), and became a privatdozent (yes, with the usual documentation, which can still be found), and an assistant to Ernst Zermelo (of ZFC set theory fame). But in 1917 Hilbert visited Zurich and soon recruited Bernays to return to Göttingen. In Göttingen, for apparently bureaucratic reasons, Bernays wrote a second habilitation, this time on the axiomatic structure of Principia Mathematica (again, all the documentation can still be found). Bernays was also hired to work as a “foundations of math assistant” to Hilbert. And it was presumably in that capacity that he—along with Moses Schönfinkel—wrote the notes for Hilbert’s 1920 course on mathematical logic.
Unlike Schönfinkel, Bernays followed a fairly standard—and successful—academic track. He became a professor in Göttingen in 1922, staying there until he was dismissed (because of partially Jewish ancestry) in 1933—after which he moved back to Zurich, where he stayed and worked very productively, mostly in mathematical logic (von Neumann–Bernays–Gödel set theory, etc.), until he died in 1977.
Back when he was in Göttingen one of the things Bernays did with Hilbert was to produce the two-volume classic Grundlagen der Mathematik (Foundations of Mathematics). So did the Grundlagen mention Schönfinkel? It has one mention of the Bernays–Schönfinkel paper, but no direct mention of combinators. However, there is one curious footnote:
This starts “A system of axioms that is sufficient to derive all true implicational formulas was first set up by M. Schönfinkel…”, then goes on to discuss work by Alfred Tarski. So do we have evidence of something else Schönfinkel worked on? Probably.
In ordinary logic, one starts from an axiom system that gives relations, say about And, Or and Not. But, as Sheffer established in 1910, it’s also possible to give an axiom system purely in terms of Nand (and, yes, I’m proud to say that I found the very simplest such axiom system in 2000). Well, it’s also possible to use other bases for logic. And this footnote is about using Implies as the basis. Actually, it’s implicational calculus, which isn’t as strong as ordinary logic, in the sense that it only lets you prove some of the theorems. But there’s a question again: what are the possible axioms for implicational calculus?
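As a quick illustration of Sheffer’s point, here is a minimal Python sketch (my own, not anything from the papers discussed here) showing how Nand by itself can reproduce Not, And and Or:

```python
# Nand as the sole primitive Boolean operation
def nand(a, b):
    return not (a and b)

# Not, And, Or built purely from Nand
def not_(a):
    return nand(a, a)              # Nand(a, a) = Not a

def and_(a, b):
    return nand(nand(a, b), nand(a, b))   # double-negate Nand

def or_(a, b):
    return nand(nand(a, a), nand(b, b))   # De Morgan via Nand

# Verify against Python's built-in operators over all truth assignments
for a in (False, True):
    for b in (False, True):
        assert not_(a) == (not a)
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)
```

The same kind of exhaustive truth-table check is, in miniature, what any completeness claim for a basis like Nand ultimately rests on.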
Well, it seems that Schönfinkel found a possible set of such axioms, though we’re not told what they were; only that Tarski later found a simpler set. (And, yes, I looked for simpler axiom systems for implicational calculus in 2000, but didn’t find any.) So again we see Schönfinkel in effect trying to explore the lowest-level foundations of mathematical logic, though we don’t know any details.
So what other interactions did Bernays have with Schönfinkel? There seems to be no other information in Bernays’s archives. But I have been able to get a tiny bit more information. In a strange chain of connections, someone who’s worked on Mathematica and Wolfram Language since 1987 is Roman Maeder. And Roman’s thesis advisor (at ETH Zurich) was Erwin Engeler—who was a student of Paul Bernays. Engeler (who is now in his 90s) worked for many years on combinators, so of course I had to ask him what Bernays might have told him about Schönfinkel. He recalled only two conversations. He had the impression that Bernays found Schönfinkel a difficult person, and he believed that the last time Bernays saw Schönfinkel was in Berlin, with Schönfinkel somehow in difficult circumstances. Any such meeting in Berlin would have had to be before 1933. But try as we might to track it down, we haven’t succeeded.
In the space of three days in March 1924 Moses Schönfinkel—by then 35 years old—got his paper on combinators submitted to Mathematische Annalen, got a reference for himself sent out, and left for Moscow. But why did he go to Moscow? We simply don’t know.
A few things are clear, though. First, it wasn’t difficult to get to Moscow from Göttingen at that time; there was pretty much a direct train there. Second, Schönfinkel presumably had a valid Russian passport (and, one assumes, didn’t have any difficulties from not having served in the Russian military during World War I).
One also knows that there was a fair amount of intellectual exchange and travel between Göttingen and Moscow. The very same volume of Mathematische Annalen in which Schönfinkel’s paper was published lists three authors (out of 19) in addition to Schönfinkel as being in Moscow: Pavel Alexandroff, Pavel Urysohn and Aleksandr Khinchin. Interestingly, all of these people were at Moscow State University.
And we know there was more exchange with that university. Nikolai Luzin, for example, got his PhD in Göttingen in 1915, and went on to be a leader in mathematics at Moscow State University (until he was effectively dismissed by Stalin in 1936). And we know that for example in 1930, Andrei Kolmogorov, having just graduated from Moscow State University, came to visit Hilbert.
Did Schönfinkel go to Moscow State University? We don’t know (though we haven’t yet been able to access any archives that may be there).
Did Schönfinkel go to Moscow because he was interested in communism? Again, we don’t know. It’s not uncommon to find mathematicians ideologically sympathetic to at least the theory of communism. But communism doesn’t seem to have particularly been a thing in the mathematics or general university community in Göttingen. And indeed when Ludwig Gossmann was arrested in 1933, investigations of who he might have recruited into communism didn’t find anything of substance.
Still, as I’ll discuss later, there is a tenuous reason to think that Schönfinkel might have had some connection to Leon Trotsky’s circle, so perhaps that had something to do with him going to Moscow—though it would have been a bad time to be involved with Trotsky, since by 1925 he was already out of favor with Stalin.
A final theory is that Schönfinkel might have had relatives in Moscow; at least it looks as if some of his Lurie cousins ended up there.
But realistically we don’t know. And beyond the bylines on the journals, we don’t really have any documentary evidence that Schönfinkel was in Moscow. However, there is one more data point, from November 1927 (8 months after the submission of Schönfinkel’s paper with Bernays). Pavel Alexandroff was visiting Princeton University, and when Haskell Curry (who we’ll meet later) asked him about Schönfinkel he was apparently told that “Schönfinkel has… gone insane and is now in a sanatorium & will probably not be able to work any more.”
Ugh! What happened? Once again, we don’t know. Schönfinkel doesn’t seem to have ever been “in a sanatorium” while he was in Göttingen; after all, we have all his addresses, and none of them were sanatoria. Maybe there’s a hint of something in Schönfinkel’s brother’s letter to Hilbert. But are we really sure that Schönfinkel actually suffered from mental illness? There’s a bunch of hearsay that says he did. But then it’s a common claim that logicians who do highly abstract work are prone to mental illness (and, well, yes, there are a disappointingly large number of historical examples).
Mental illness wasn’t handled very well in the 1920s. Hilbert’s only child, his son Franz (who was about five years younger than Schönfinkel), suffered from mental illness, and after a delusional episode that ended up with him in a clinic, David Hilbert simply said “From now on I have to consider myself as someone who does not have a son”. In Moscow in the 1920s—despite some political rhetoric—conditions in psychiatric institutions were probably quite poor, and there was for example quite a bit of use of primitive shock therapy (though not yet electroshock). It’s notable, by the way, that Curry reports that Alexandroff described Schönfinkel as being “in a sanatorium”. But while at that time the word “sanatorium” was being used in the US as a better term for “insane asylum”, in Russia it still had more the meaning of a place for a rest cure. So this still doesn’t tell us if Schönfinkel was in fact “institutionalized”—or just “resting”. (By the way, if there was mental illness involved, another connection for Schönfinkel that doesn’t seem to have been made is that Paul Bernays’s first cousin once removed was Martha Bernays, wife of Sigmund Freud.)
Whether or not he was mentally ill, what would it have been like for Schönfinkel in what was then the Soviet Union in the 1920s? One thing is that in the Soviet system, everyone was supposed to have a job. So Schönfinkel was presumably employed doing something—though we have no idea what. Schönfinkel had presumably been at least somewhat involved with the synagogue in Göttingen (which is how the rabbi there knew to tell his brother he was in bad shape). There was a large and growing Jewish population in Moscow in the 1920s, complete with things like Yiddish newspapers. But by the mid 1930s it was no longer so comfortable to be Jewish in Moscow, and Jewish cultural organizations were being shut down.
By the way, in the unlikely event that Schönfinkel was involved with Trotsky, there could have been trouble even by 1925, and certainly by 1929. And it’s notable that it was a common tactic for Stalin (and others) to claim that their various opponents were “insane”.
So what else do we know about Schönfinkel in Moscow? It’s said that he died there in 1940 or 1942, aged 52–54. Conditions in Moscow wouldn’t have been good then; the so-called Battle of Moscow occurred in the winter of 1941. And there are various stories told about Schönfinkel’s situation at that time.
The closest to a primary source seems to be a summary of mathematical logic in the Soviet Union, written by Sofya Yanovskaya in 1948. Yanovskaya was born in 1896 (so 8 years after Schönfinkel), and grew up in Odessa. She attended the same university there as Schönfinkel, studying mathematics, though she arrived five years after Schönfinkel graduated. She had many of the same professors as Schönfinkel, and, probably like Schönfinkel, was particularly influenced by Shatunovsky. When the Russian Revolution happened, Yanovskaya went “all in”, becoming a serious party operative, but eventually began to teach, first at the Institute of Red Professors, and then from 1925 at Moscow State University—where she became a major figure in mathematical logic, and was eventually awarded the Order of Lenin.
One might perhaps have thought that mathematical logic would be pretty much immune to political issues. But the founders of communism had talked about mathematics, and there was a complex debate about the relationship between Marxist–Leninist ideology and formal ideas in mathematics, notably the Law of Excluded Middle. Sofya Yanovskaya was deeply involved, initially in trying to “bring mathematics to heel”, but later in defending it as a discipline, as well as in editing Karl Marx’s mathematical writings.
It’s not clear to what extent her historical writings were censored or influenced by party considerations, but they certainly contain lots of good information, and in 1948 she wrote a paragraph about Schönfinkel:
“The work of M. I. Sheinfinkel played a substantial role in the further development of mathematical logic. This brilliant student of S. O. Shatunovsky, unfortunately, left us early. (After getting mentally ill [заболев душевно], M. I. Sheinfinkel passed away in Moscow in 1942.) He did the work mentioned here in 1920, but only published it in 1924, edited by Behmann.”
Unless she was hiding things, this quote doesn’t make it sound as if Yanovskaya knew much about Schönfinkel. (By the way, her own son was apparently severely mentally ill.) A student of Jean van Heijenoort (who we’ll encounter later) named Irving Anellis did apparently in the 1990s ask a student of Yanovskaya’s whether Yanovskaya had known Schönfinkel. Apparently he responded that unfortunately nobody had thought to ask her that question before she died in 1966.
What else do we know? Nothing substantial. The most extensively embellished story I’ve seen about Schönfinkel appears in an anonymous comment on the talk page for the Wikipedia entry about Schönfinkel:
“William Hatcher, while spending time in St Petersburg during the 1990s, was told by Soviet mathematicians that Schönfinkel died in wretched poverty, having no job and but one room in a collective apartment. After his death, the rough ordinary people who shared his apartment burned his manuscripts for fuel (WWII was raging). The few Soviet mathematicians around 1940 who had any discussions with Schönfinkel later said that those mss reinvented a great deal of 20th century mathematical logic. Schönfinkel had no way of accessing the work of Turing, Church, and Tarski, but had derived their results for himself. Stalin did not order Schönfinkel shot or deported to Siberia, but blame for Schönfinkel’s death and inability to publish in his final years can be placed on Stalin’s doorstep. 202.36.179.65 06:50, 25 February 2006 (UTC)”
William Hatcher was a mathematician and philosopher who wrote extensively about the Baháʼí Faith and did indeed spend time at the Steklov Institute of Mathematics in Saint Petersburg in the 1990s—and mentioned Schönfinkel’s technical work in his writings. People I’ve asked at the Steklov Institute do remember Hatcher, but don’t know anything about what it’s claimed he was told about Schönfinkel. (Hatcher died in 2005, and I haven’t been successful at getting any material from his archives.)
So are there any other leads? I did notice that the IP address that originated the Wikipedia comment is registered to the University of Canterbury in New Zealand. So I asked people there and in the New Zealand foundations of math scene. But despite a few “maybe so-and-so wrote that” ideas, nobody shed any light.
OK, so what about at least a death certificate for Schönfinkel? Well, there’s some evidence that the registry office in Moscow has one. But they tell us that in Russia only direct relatives can access death certificates….
So far as we know, Moses Schönfinkel never married, and didn’t have children. But he did have a brother, Nathan, who we encountered earlier in connection with the letter he wrote about Moses to David Hilbert. And in fact we know quite a bit about Nathan Scheinfinkel (as he normally styled himself). Here’s a biographical summary from 1932:
The basic story is that he was about five years younger than Moses, and went to study medicine at the University of Bern in Switzerland in April 1914 (i.e. just before World War I began). He got his MD in 1920, then got his PhD on “Gas Exchange and Metamorphosis of Amphibian Larvae after Feeding on the Thyroid Gland or Substances Containing Iodine” in 1922. He did subsequent research on the electrochemistry of the nervous system, and in 1929 became a privatdozent—with official “license to teach” documentation:
(In a piece of bizarre small-worldness, my grandfather, Max Wolfram, also got a PhD in the physiology [veterinary medicine] department at the University of Bern [studying the function of the thymus gland], though that was in 1909, and presumably he had left before Nathan Scheinfinkel arrived.)
But in any case, Nathan Scheinfinkel stayed at Bern, eventually becoming a professor, and publishing extensively, including in English. He became a Swiss citizen in 1932, with the official notice stating:
“Scheinfinkel, Nathan. Son of Ilia Gerschow and Mascha [born] Lurie, born in Yekaterinoslav, Russia, September 13, 1893 (old style). Doctor of medicine, residing in Bern, Neufeldstrasse 5a, husband of Raissa [born] Neuburger.”
In 1947, however, he moved to become a founding professor in a new medical school in Ankara, Turkey. (Note that Turkey, like Switzerland, had been neutral in World War II.) In 1958 he moved again, this time to found the Institute of Physiology at Ege University in Izmir, Turkey, and then at age 67, in 1961, he retired and returned to Switzerland.
Did Nathan Scheinfinkel have children (whose descendants, at least, might know something about “Uncle Moses”)? It doesn’t seem so. We tracked down Nuran Harirî, now an emeritus professor, but in the 1950s a young physiology resident at Ege University responsible for translating Nathan Scheinfinkel’s lectures into Turkish. She said that Nathan Scheinfinkel was at that point living in campus housing with his wife, but she never heard mention of any children, or indeed of any other family members.
What about any other siblings? Amazingly, looking through handwritten birth records from Ekaterinoslav, we found one! Debora Schönfinkel, born December 22, 1889 (i.e. January 3, 1890, in the modern calendar):
So Moses Schönfinkel had a younger sister, as well as a younger brother. And we even know that his sister graduated from high school in June 1907. But we don’t know anything else about her, or about other siblings. We know that Schönfinkel’s mother died in 1936, at the age of 74.
Might there have been other Schönfinkel relatives in Ekaterinoslav? Perhaps, but it’s unlikely they survived World War II—because in one of those shocking and tragic pieces of history, over a four-day period in February 1942 almost the whole Jewish population of 30,000 was killed.
Could there be other Schönfinkels elsewhere? The name is not common, but it does show up (with various spellings and transliterations), both before and after Moses Schönfinkel. There’s a Scheinfinkel Russian revolutionary buried in the Kremlin Wall; there was a Lovers of Zion delegate Scheinfinkel from Ekaterinoslav. There was a Benjamin Scheinfinkel in New York City in the 1940s; a Shlomo Scheinfinkel in Haifa in the 1930s. There was even a certain curiously named Bas Saul Haskell Scheinfinkel born in 1875. But despite quite a bit of effort, I’ve been unable to locate any living relative of Moses Schönfinkel. At least so far.
What happened with combinators after Schönfinkel published his 1924 paper? Initially, so far as one can tell, nothing. That is, until Haskell Curry found Schönfinkel’s paper in the library at Princeton University in November 1927—and launched into a lifetime of work on combinators.
Who was Haskell Curry? And why did he know to care about Schönfinkel’s paper?
Haskell Brooks Curry was born on September 12, 1900, in a small town near Boston, MA. His parents were both elocution educators, who by the time Haskell Curry was born were running the School of Expression (which had evolved from his mother’s Boston-based School of Elocution and Expression). (Many years later, the School of Expression would evolve into Curry College in Milton, Massachusetts—which happens to be where for several years we held our Wolfram Summer School, often noting the “coincidence” of names when combinators came up.)
Haskell Curry went to college at Harvard, graduating in mathematics in 1920. After a couple of years doing electrical engineering, he went back to Harvard, initially working with Percy Bridgman, who was primarily an experimental physicist, but was writing a philosophy of science book entitled The Logic of Modern Physics. And perhaps through this Curry got introduced to Whitehead and Russell’s Principia Mathematica.
But in any case, there’s a note in his archive about Principia Mathematica dated May 20, 1922:
Curry seems—perhaps like an electrical engineer or a “pre-programmer”—to have been very interested in the actual process of mathematical logic, starting his notes with: “No logical process is possible without the phenomenon of substitution.” He continued, trying to break down the process of substitution.
But then his notes end, more philosophically, and perhaps with “expression” influence: “Phylogenetic origin of logic: 1. Sensation; 2. Association: Red hot poker–law of permanence”.
At Harvard Curry started working with George Birkhoff towards a PhD on differential equations. But by 1927–8 he had decided to switch to logic, and was spending a year as an instructor at Princeton. And it was there—in November 1927—that he found Schönfinkel’s paper. Preserved in his archives are the notes he made:
At the top there’s a date stamp of November 28, 1927. Then Curry writes: “This paper anticipates much of what I have done”—then launches into a formal summary of Schönfinkel’s paper (charmingly using f@x to indicate function application—just as we do in Wolfram Language, except his is left associative…).
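To make the notational point concrete, here is a small Python sketch (my own illustration, not Curry’s) of the “Schönfinkeling”/currying idea itself: a two-argument function rewritten as a chain of one-argument applications, read left-associatively the way Curry reads f@x@y:

```python
# Ordinary two-argument function
def plus(x, y):
    return x + y

# Curried form: plus_c takes x and returns a function awaiting y.
# plus_c(3)(4) is read left-associatively, like Curry's (plus_c @ 3) @ 4.
def plus_c(x):
    return lambda y: x + y

assert plus(3, 4) == 7
assert plus_c(3)(4) == 7

# Partial application falls out for free: "add three" is just plus_c(3)
add3 = plus_c(3)
assert add3(10) == 13
```

This is exactly the reduction that lets Schönfinkel treat all functions as functions of one argument.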
He ends his “report” with “In criticism I might say that no formal development have been undertaken in the above. Equality is taken intuitively and such things as universality, and proofs of identity are shown on the principle that if for every z, x@z : y@z then x=y ….”
But then there’s another piece:
“On discovery of this paper I saw Prof. Veblen. Schönfinkel’s paper said ‘in Moskau’. Accordingly we sought out Paul Alexandroff. The latter says Schönfinkel has since gone insane and is now in a sanatorium & will probably not be able to work any more. The paper was written with help of Paul Bernays and Behman [sic]; who would presumably be the only people in the world who would write on that subject.”
What was the backstory to this? Oswald Veblen was a math professor at Princeton who had worked on the axiomatization of geometry and was by then working on topology. Pavel Alexandroff (who we encountered earlier) was visiting from Moscow State University for the year, working on topology with Hopf, Lefschetz, Veblen and Alexander. I’m not quite sure why Curry thought Bernays and Behmann “would be the only people in the world who would write on that subject”; I don’t see how he could have known.
Curry continues: “It was suggested I write to Bernays, who is außerord. prof. [associate professor] at Göttingen.” But then he adds—in depressingly familiar academic form: “Prof. Veblen thought it unwise until I had something definite ready to publish.”
“A footnote to Schönfinkel’s paper said the ideas were presented before Math Gesellschaft in Göttingen on Dec. 7, 1920 and that its formal and elegant [sic] write up was due to H. Behman”. “Elegant” is a peculiar translation of “stilistische” that probably gives Behmann too much credit; a more obvious translation might be “stylistic”.
Curry continues: “Alexandroff’s statements, as I interpret them, are to the effect that Bernays, Behman, Ackermann, von Neumann, Schönfinkel & some others form a small school of math logicians working on this & similar topics in Göttingen.”
And so it was that Curry resolved to study in Göttingen, and do his PhD in logic there. But before he left for Göttingen, Curry wrote a paper (published in 1929):
Already there’s something interesting in the table of contents: the use of the word “combinatory”, which, yes, in Curry’s hands is going to turn into “combinator”.
The paper starts off reading a bit like a student essay, and one’s not encouraged by a footnote a few pages in:
“In the writing the foregoing account I have naturally made use of any ideas I may have gleaned from reading the literature. The writings of Hilbert are fundamental in this connection. I hope that I have added clearness to certain points where the existing treatments are obscure.” [“Clearness” not “clarity”?]
Then, towards the end of the “Preliminary Discussion” is this:
And the footnote says: “See the paper of Schönfinkel cited below”. It’s (so far as I know) the first-ever citation to Schönfinkel’s paper!
On the next page Curry starts to give details. He begins by talking about substitution, then says (in an echo of modern symbolic language design) that this relates to the idea of “transformation of functions”:
At first he’s off talking about all the various combinatorial arrangements of variables, etc. But then he introduces Schönfinkel—and starts trying to explain in a formal way what Schönfinkel did. And even though he says he’s talking about what one assumes is structural substitution, he seems very concerned about what equality means, and how Schönfinkel didn’t quite define that. (And, of course, in the end, with universal computation, undecidability, etc. we know that the definition of equality wasn’t really accessible in the 1920s.)
By the next page, here we are, S and K (Curry renamed Schönfinkel’s C):
At first he’s imagining that the combinators have to be applied to something (i.e. f[x] not just f). But by the next page he comes around to what Schönfinkel was doing in looking at “pure combinators”:
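For readers who want to see the machinery run, here is a minimal Python sketch (my own rendering, not Curry’s or Schönfinkel’s notation) of S and K as curried functions, including the classic check that S K K behaves as the identity:

```python
# Schönfinkel's combinators as curried one-argument functions.
# S x y z = x z (y z)   -- "fusion": applies x and y to z, then combines
S = lambda x: lambda y: lambda z: x(z)(y(z))

# K x y = x             -- "constancy": discards its second argument
K = lambda x: lambda y: x

# The identity combinator I = S K K:
#   S K K z  ->  K z (K z)  ->  z
I = S(K)(K)

assert I(42) == 42
assert I("anything") == "anything"
assert K("kept")("dropped") == "kept"
```

The point the paper is making is visible here: once application is the only operation, everything (even “do nothing”) is built from these two pieces.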
The rest of the paper is basically concerned with setting up combinators that can successively represent permutations—and it certainly would have been much easier if Curry had had a computer (and one could imagine minimal “combinator sorters” like minimal sorting networks):
After writing this paper, Curry went to Göttingen—where he worked with Bernays. I must say that I’m curious what Bernays said to Curry about Schönfinkel (was it more than to Erwin Engeler?), and whether other people around Göttingen even remembered Schönfinkel, who by then had been gone for more than four years. In 1928, travel in Europe was open enough that Curry should have had no trouble going, for example, to Moscow, but there’s no evidence he made any effort to reach out to Schönfinkel. But in any case, in Göttingen he worked on combinators, and over the course of a year produced his first official paper on “combinatory logic”:
Strangely, the paper was published in an American journal—as the only paper not in English in that volume. The paper is more straightforward, and in many ways more “Schönfinkel like”. But it was just the first of many papers that Curry wrote about combinators over the course of nearly 50 years.
Curry was particularly concerned with the “mathematicization” of combinators, finding and fixing problems with axioms invented for them, connecting to other formalisms (notably Church’s lambda calculus), and generally trying to prove theorems about what combinators do. But more than that, Curry spread the word about combinators far and wide. And before long most people viewed him as “Mr. Combinator”, with Schönfinkel at most a footnote.
In their 1958 book on Combinatory Logic, Haskell Curry and Robert Feys included a historical footnote that gives the impression that Curry “almost” had Schönfinkel’s ideas before he saw Schönfinkel’s paper in 1927:
I have to say that I don’t think that’s a correct impression. What Schönfinkel did was much more singular than that. It’s plausible to think that others (and particularly Curry) could have had the idea that there could be a way to go “below the operations of mathematical logic” and find more fundamental building blocks based on understanding things like the process of substitution. But the actuality of how Schönfinkel did it is something quite different—and something quite unique.
And when one sees Schönfinkel’s S combinator: what mind could have come up with such a thing? Even Curry says he didn’t really understand the significance of the S combinator until the 1940s.
I suppose if one’s just thinking of combinatory logic as a formal system with a certain general structure then it might not seem to matter that things as simple as S and K can be the ultimate building blocks. But the whole point of what Schönfinkel was trying to do (as the title of his paper says) was to find the “building blocks of logic”. And the fact that he was able to do it—especially in terms of things as simple as S and K—was a great and unique achievement. And not something that (despite all the good he did for combinators) Curry did.
In the decade or so after Schönfinkel’s paper appeared, Curry occasionally referenced it, as did Church and a few other closely connected people. But soon Schönfinkel’s paper—and Schönfinkel himself—disappeared completely from view, and standard databases list no citations.
But in 1967 Schönfinkel’s paper was seen again—now even translated into English. The venue was a book called From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931. And there, sandwiched between von Neumann on transfinite numbers and Hilbert on “the infinite”, is Schönfinkel’s paper, in English, with a couple of pages of introduction by Willard Van Orman Quine. (And indeed it was from this book that I myself first became aware of Schönfinkel and his work.)
But how did Schönfinkel’s paper get into the book? And do we learn anything about Schönfinkel from its appearance there? Maybe. The person who put the book together was a certain Jean van Heijenoort, who himself had a colorful history. Born in 1912, he grew up mostly in France, and went to college to study mathematics—but soon became obsessed with communism, and in 1932 left to spend what ended up being nearly ten years working as a kind of combination PR person and bodyguard for Leon Trotsky, initially in Turkey but eventually in Mexico. Having married an American, van Heijenoort moved to New York City, eventually enrolling in a math PhD program, and becoming a professor doing mathematical logic (though with some colorful papers along the way, with titles like “The Algebra of Revolution”).
Why is this relevant? Well, the question is: how did van Heijenoort know about Schönfinkel? Perhaps it was just through careful scholarship. But just maybe it was through Trotsky. There’s no real evidence, although it is known that during his time in Mexico, Trotsky did request a copy of Principia Mathematica (or was it his “PR person”?). But at least if there was a Trotsky connection it could help explain Schönfinkel’s strange move to Moscow. But in the end we just don’t know.
When one reads about the history of science, there’s a great tendency to get the impression that big ideas come suddenly to people. But my historical research—and my personal experience—suggest that that’s essentially never what happens. Instead, there’s usually a period of many years in which some methodology or conceptual framework gradually develops, and only then can the great idea emerge.
So with Schönfinkel it’s extremely frustrating that we just can’t see that long period of development. The records we have just tell us that Schönfinkel announced combinators on December 7, 1920. But how long had he been working towards them? We just don’t know.
On the face of it, his paper seems simple—the kind of thing that could have been dashed off in a few weeks. But I think it’s much more likely that it was the result of a decade of development—of which, through foibles of history, we now have no trace.
Yes, what Schönfinkel finally came up with is simple to explain. But to get to it, he had to cut through a whole thicket of technicality—and see the essence of what lay beneath. My life as a computational language designer has often involved doing very much this same kind of thing. And at the end of it, what you come up with may seem in retrospect “obvious”. But to get there often requires a lot of hard intellectual work.
And in a sense what Schönfinkel did was the most impressive possible version of this. There were no computers. There was no ambient knowledge of computation as a concept. Yet Schönfinkel managed to come up with a system that captures the core of those ideas. And while he didn’t quite have the language to describe it, I think he did have a sense of what he was doing—and the significance it could have.
What was the personal environment in which Schönfinkel did all this? We just don’t know. We know he was in Göttingen. We don’t think he was involved in any particularly official way with the university. Most likely he was just someone who was “around”. Clearly he had some interaction with people like Hilbert and Bernays. But we don’t know how much. And we don’t really know if they ever thought they understood what Schönfinkel was doing.
Even when Curry picked up the idea of combinators—and did so much with it—I don’t think he really saw the essence of what Schönfinkel was trying to do. Combinators and Schönfinkel are a strange episode in intellectual history. A seed sown far ahead of its time by a person who left surprisingly few traces, and about whom we know personally so little.
But much as combinators represent a way of getting at the essence of computation, perhaps in combinators we have the essence of Moses Schönfinkel: years of a life compressed to two “signs” (as he would call them) S and K. And maybe if the operation we now call currying needs a symbol we should be using the “sha” character Ш from the beginning of Schönfinkel’s name to remind us of a person about whom we know so little, but who planted a seed that gave us so much.
Many people and organizations have helped in doing research and providing material for this piece. Thanks particularly to Hatem Elshatlawy (fieldwork in Göttingen, etc.), Erwin Engeler (first-person history), Unal Goktas (Turkish material), Vitaliy Kaurov (locating Ukraine + Russia material), Anna & Oleg Marichev (interpreting old Russian handwriting), Nik Murzin (fieldwork in Moscow), Eila Stiegler (German translations), Michael Trott (interpreting German). Thanks also for input from Henk Barendregt, Semih Baskan, Metin Baştuğ, Cem Boszahin, Jason Cawley, Jack Copeland, Nuran Hariri, Ersin Koylu, Alexander Kuzichev, Yuri Matiyasevich, Roman Maeder, Volker Peckhaus, Jonathan Seldin, Vladimir Shalack, Matthew Szudzik, Christian Thiel, Richard Zach. Particular thanks to the following archives and staff: Berlin State Library [Gabriele Kaiser], Bern University Archive [Niklaus Bütikofer], ETHZ (Bernays) Archive [Flavia Lanini, Johannes Wahl], Göttingen City Archive [Lena Uffelmann], Göttingen University [Katarzyna Chmielewska, Bärbel Mund, Petra Vintrová, Dietlind Willer].
“In principle you could use combinators,” some footnote might say. But the implication tends to be “But you probably don’t want to.” And, yes, combinators are deeply abstract—and in many ways hard to understand. But tracing their history over the hundred years since they were invented, I’ve come to realize just how critical they’ve actually been to the development of our modern conception of computation—and indeed my own contributions to it.
The idea of representing things in a formal, symbolic way has a long history. In antiquity there was Aristotle’s logic and Euclid’s geometry. By the 1400s there was algebra, and in the 1840s Boolean algebra. Each of these was a formal system that allowed one to make deductions purely within the system. But each, in a sense, ultimately viewed itself as being set up to model something specific. Logic was for modeling the structure of arguments, Euclid’s geometry the properties of space, algebra the properties of numbers; Boolean algebra aspired to model the “laws of thought”.
But was there perhaps some more general and fundamental infrastructure: some kind of abstract system that could ultimately model or represent anything? Today we understand that’s what computation is. And it’s becoming clear that the modern conception of computation is one of the single most powerful ideas in all of intellectual history—whose implications are only just beginning to unfold.
But how did we finally get to it? Combinators had an important role to play, woven into a complex tapestry of ideas stretching across more than a century.
The main part of the story begins in the 1800s. Through the course of the 1700s and 1800s mathematics had developed a more and more elaborate formal structure that seemed to be reaching ever further. But what really was mathematics? Was it a formal way of describing the world, or was it something else—perhaps something that could exist without any reference to the world?
Developments like non-Euclidean geometry, group theory and transfinite numbers made it seem as if meaningful mathematics could indeed be done just by positing abstract axioms from scratch and then following a process of deduction. But could all of mathematics actually just be a story of deduction, perhaps even ultimately derivable from something seemingly lower level—like logic?
But if so, what would things like numbers and arithmetic be? Somehow they would have to be “constructed out of pure logic”. Today we would recognize these efforts as “writing programs” for numbers and arithmetic in a “machine code” based on certain “instructions of logic”. But back then, everything about this and the ideas around it had to be invented.
Before one could really dig into the idea of “building mathematics from logic” one had to have ways to “write mathematics” and “write logic”. At first, everything was just words and ordinary language. But by the end of the 1600s mathematical notation like +, =, > had been established. For a while new concepts—like Boolean algebra—tended to just piggyback on existing notation. By the end of the 1800s, however, there was a clear need to extend and generalize how one wrote mathematics.
In addition to algebraic variables like x, there was the notion of symbolic functions f, as in f(x). In logic, there had long been the idea of letters (p, q, …) standing for propositions (“it is raining now”). But now there needed to be notation for quantifiers (“for all x suchandsuch”, or “there exists x such that…”). In addition, in analogy to symbolic functions in mathematics, there were symbolic logical predicates: not just explicit statements like x > y but also ones like p(x, y) for symbolic p.
The first full effort to set up the necessary notation and come up with an actual scheme for constructing arithmetic from logic was Gottlob Frege’s 1879 Begriffsschrift (“concept script”):
And, yes, it was not so easy to read, or to typeset—and at first it didn’t make much of an impression. But the notation got more streamlined with Giuseppe Peano’s Formulario project in the 1890s—which wasn’t so concerned with starting from logic as starting from some specified set of axioms (the “Peano axioms”):
And then in 1910 Alfred Whitehead and Bertrand Russell began publishing their 2000-page Principia Mathematica—which pretty much by its sheer weight and ambition (and notwithstanding what I would today consider grotesque errors of language design)—popularized the possibility of building up “the complexity of mathematics” from “the simplicity of logic”:
It was one thing to try to represent the content of mathematics, but there was also the question of representing the infrastructure and processes of mathematics. Let’s say one picks some axioms. How can one know if they’re consistent? What’s involved in proving everything one can prove from them?
In the 1890s David Hilbert began to develop ideas about this, particularly in the context of tightening up the formalism of Euclid’s geometry and its axioms. And after Principia Mathematica, Hilbert turned more seriously to the use of logic-based ideas to develop “metamathematics”—notably leading to the formulation of things like the “decision problem” (Entscheidungsproblem) of asking whether, given an axiom system, there’s a definite procedure to prove or disprove any statement with respect to it.
But while connections between logic and mathematics were of great interest to people concerned with the philosophy of mathematics, a more obviously mathematical development was universal algebra—in which axioms for different areas of mathematics were specified just by giving appropriate algebraic-like relations. (As it happens, universal algebra was launched under that name by the 1898 book A Treatise on Universal Algebra by Alfred Whitehead, later of Principia Mathematica fame.)
But there was one area where ideas about algebra and logic intersected: the tightening up of Boolean algebra, and in particular the finding of simpler foundations for it. Logic had pretty much always been formulated in terms of And, Or and Not. But in 1912 Henry Sheffer—attempting to simplify Principia Mathematica—showed that just Nand (or Nor) were sufficient. (It turned out that Charles Peirce had already noted the same thing in the 1880s.)
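Sheffer’s sufficiency result is easy to check by truth table. Here is a small sketch in Python (rather than the Wolfram Language used elsewhere in this piece); the particular reductions of Not, And and Or to Nand are the standard ones:

```python
# Truth-table check that Nand alone suffices to express Not, And and Or,
# the fact Sheffer showed in 1912 (and Peirce had noted in the 1880s).
def nand(p, q):
    return not (p and q)

def not_(p):
    # Not p  ==  p Nand p
    return nand(p, p)

def and_(p, q):
    # And    ==  Not (p Nand q)
    return nand(nand(p, q), nand(p, q))

def or_(p, q):
    # Or     ==  (Not p) Nand (Not q)
    return nand(nand(p, p), nand(q, q))

bools = [False, True]
assert all(not_(p) == (not p) for p in bools)
assert all(and_(p, q) == (p and q) for p in bools for q in bools)
assert all(or_(p, q) == (p or q) for p in bools for q in bools)
```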
So that established that the notation of logic could be made basically as simple as one could imagine. But what about its actual structure, and axioms? Sheffer talked about needing five “algebra-style” axioms. But by going to axioms based on logical inferences Jean Nicod managed in 1917 to get it down to just one axiom. (And, as it happens, I finally finished the job in 2000 by finding the very simplest “algebra-style” axioms for logic—the single axiom: ((p·q)·r)·(p·((p·r)·p)) = r.)
The big question had in a sense been “What is mathematics ultimately made of?”. Well, now it was known that ordinary propositional logic could be built up from very simple elements. So what about the other things used in mathematics—like functions and predicates? Was there a simple way of building these up too?
People like Frege, Whitehead and Russell had all been concerned with constructing specific things—like sets or numbers—that would have immediate mathematical meaning. But Hilbert’s work in the late 1910s began to highlight the idea of looking instead at metamathematics and the “mechanism of mathematics”—and in effect at how the pure symbolic infrastructure of mathematics fits together (through proofs, etc.), independent of any immediate “external” mathematical meaning.
Much as Aristotle and subsequent logicians had used (propositional) logic to define a “symbolic structure” for arguments, independent of their subject matter, so too did Hilbert’s program imagine a general “symbolic structure” for mathematics, independent of particular mathematical subject matter.
And this is what finally set the stage for the invention of combinators.
We don’t know how long it took Moses Schönfinkel to come up with combinators. From what we know of his personal history, it could have been as long as a decade. But it could also have been as short as a few weeks.
There’s no advanced math or advanced logic involved in defining combinators. But to drill through the layers of technical detail of mathematical logic to realize that it’s even conceivable that everything can be defined in terms of them is a supreme achievement of a kind of abstract reductionism.
There is much we don’t know about Schönfinkel as a person. But the 11-page paper he wrote on the basis of his December 7, 1920, talk in which he introduced combinators is extremely clear.
The paper is entitled “On the Building Blocks of Mathematical Logic” (in the original German, “Über die Bausteine der mathematischen Logik”.) In other words, its goal is to talk about “atoms” from which mathematical logic can be built. Schönfinkel explains that it’s “in the spirit of” Hilbert’s axiomatic method to build everything from as few notions as possible; then he says that what he wants to do is to “seek out those notions from which we shall best be able to construct all other notions of the branch of science in question”.
His first step is to explain that Hilbert, Whitehead, Russell and Frege all set up mathematical logic in terms of standard And, Or, Not, etc. connectives—but that Sheffer had recently been able to show that just a single connective (indicated by a stroke “|”—and what we would now call Nand) was sufficient:
But in addition to the “content” of these relations, I think Schönfinkel was trying to communicate by example something else: that all these logical connectives can ultimately be thought of just as examples of “abstract symbolic structures” with a certain “function of arguments” (i.e. f[x,y]) form.
The next couple of paragraphs talk about how the quantifiers “for all” (∀) and “there exists” (∃) can also be simplified in terms of the Sheffer stroke (i.e. Nand). But then comes the rallying cry: “The successes that we have encountered thus far… encourage us to attempt further progress.” And then he’s ready for the big idea—which he explains “at first glance certainly appears extremely bold”. He proposes to “eliminate by suitable reduction the remaining fundamental concepts of proposition, function and variable”.
He explains that this only makes sense for “arbitrary, logically general propositions”, or, as we’d say now, for purely symbolic constructs without specific meanings yet assigned. In other words, his goal is to create a general framework for operating on arbitrary symbolic expressions independent of their interpretation.
He explains that this is valuable both from a “methodological point of view” in achieving “the greatest possible conceptual uniformity”, but also from a certain philosophical or perhaps aesthetic point of view.
And in a sense what he was explaining—back in 1920—was something that’s been a core part of the computational language design that I’ve done for the past 40 years: that everything can be represented as a symbolic expression, and that there’s tremendous value to this kind of uniformity.
But as a “language designer” Schönfinkel was an ultimate minimalist. He wanted to get rid of as many notions as possible—and in particular he didn’t want variables, which he explained were “nothing but tokens that characterize certain argument places and operators as belonging together”; “mere auxiliary notions”.
Today we have all sorts of mathematical notation that’s at least somewhat “variable free” (think coordinate-free notation, category theory, etc.). But in 1920 mathematics as it was written was full of variables. And it needed a serious idea to see how to get rid of them. And that’s where Schönfinkel starts to go “even more symbolic”.
He explains that he’s going to make a kind of “functional calculus” (Funktionalkalkül). He says that normally functions just define a certain correspondence between the domain of their arguments, and the domain of their values. But he says he’s going to generalize that—and allow (“disembodied”) functions to appear as arguments and values of functions. In other words, he’s inventing what we’d now call higherorder functions, where functions can operate “symbolically” on other functions.
In the context of traditional calculus-and-algebra-style mathematics it’s a bizarre idea. But really it’s an idea about computation and computational structures—that’s more abstract and ultimately much more general than the mathematical objectives that inspired it.
But back to Schönfinkel’s paper. His next step is to explain that once functions can have other functions as arguments, functions only ever need to take a single argument. In modern (Wolfram Language) notation he says that you never need f[x,y]; you can always do everything with f[x][y].
In something of a sleight of hand, he sets up his notation so that fxyz (which might look like a function of three arguments f[x,y,z]) actually means (((fx)y)z) (i.e. f[x][y][z]). (In other words—somewhat confusingly with respect to modern standard functional notation—he takes function application to be left associative.)
Again, it’s a bizarre idea—though actually Frege had had a similar idea many years earlier (and now the idea is usually called currying, after Haskell Curry, who we’ll be talking about later). But with his “functional calculus” set up, and all functions needing to take only one argument, Schönfinkel is ready for his big result.
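The currying idea is easy to demonstrate concretely. In this Python sketch (the helper name `curry2` is mine, not Schönfinkel’s), any two-argument function can be traded for a chain of one-argument functions, and chained calls `f(x)(y)(z)` parse left-associatively just as Schönfinkel’s fxyz does:

```python
# "Currying": rewrite a multi-argument function as a chain of
# one-argument functions, so f(x, y) becomes f(x)(y).
def curry2(f):
    return lambda x: lambda y: f(x, y)

def subtract(x, y):
    return x - y

curried = curry2(subtract)
assert curried(10)(3) == subtract(10, 3) == 7

# Left associativity: f(x)(y)(z) means ((f(x))(y))(z), matching
# Schönfinkel's convention that fxyz means (((fx)y)z).
triple = lambda x: lambda y: lambda z: (x, y, z)
assert triple(1)(2)(3) == (1, 2, 3)
```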
He’s effectively going to argue that by combining a small set of particular functions he can construct any possible symbolic function—or at least anything needed for predicate logic. He calls them a “sequence of particular functions of a very general nature”. Initially there are five of them: the identity function (Identitätsfunktion) I, the constancy function (Konstanzfunktion) C (which we now call K), the interchange function (Vertauschungsfunktion) T, the composition function (Zusammensetzungsfunktion) Z, and the fusion function (Verschmelzungsfunktion) S.
And then he’s off and running defining what we now call combinators. The definitions look simple and direct. But to get to them Schönfinkel effectively had to cut away all sorts of conceptual baggage that had come with the historical development of logic and mathematics.
Even talking about the identity combinator isn’t completely straightforward. Schönfinkel carefully explains that in I x = x, equality is direct symbolic or structural equality, or as he puts it “the equal sign is not to be taken to represent logical equivalence as it is ordinarily defined in the propositional calculus of logic but signifies that the expressions on the left and on the right mean the same thing, that is, that the function value Ix is always the same as the argument value x, whatever we may substitute for x.” He then adds parenthetically, “Thus, for instance, I I would be equal to I”. And, yes, to someone used to the mathematical idea that a function takes values like numbers, and gives back numbers, this is a bit mindblowing.
Next he explains the constancy combinator, that he called C (even though the German word for it starts with K), and that we now call K. He says “let us assume that the argument value is again arbitrary without restriction, while, regardless of what this value is, the function value will always be the fixed value a”. And when he says “arbitrary” he really means it: it’s not just a number or something; it’s what we would now think of as any symbolic expression.
First he writes (C a)y = a, i.e. the value of the “constancy function C a operating on any y is a”, then he says to “let a be variable too”, and defines (C x)y = x or Cxy = x. Helpfully, almost as if he were writing computer documentation, he adds: “In practical applications C serves to permit the introduction of a quantity x as a ‘blind’ variable.”
Then he’s on to T. In modern notation the definition is T[f][x][y] = f[y][x] (i.e. T is essentially ReverseApplied). (He wrote the definition as (Tϕ)xy = ϕyx, explaining that the parentheses can be omitted.) He justifies the idea of T by saying that “The function T makes it possible to alter the order of the terms of an expression, and in this way it compensates to a certain extent for the lack of a commutative law.”
Next comes the composition combinator Z. He explains that “In [mathematical] analysis, as is well known, we speak loosely of a ‘function of a function’...”, by which he meant that it was pretty common then (and now) to write something like f(g(x)). But then he “went symbolic”—and defined a composition function that could symbolically act on any two functions f and g: Z[f][g][x] = f[g[x]]. He explains that Z allows one to “shift parentheses” in an expression: whatever the objects in an expression might be, applying Z regroups an application like (f g) x into f(g x). But in case this might have seemed too abstract and symbolic, he then attempted to explain in a more “algebraic” way that the effect of Z is “somewhat like that of the associative law” (though, he added, the actual associative law is not satisfied).
Finally comes the pièce de résistance: the S combinator (that Schönfinkel calls the “fusion function”):
He doesn’t take too long to define it. He basically says: consider (fx)(gx) (i.e. f[x][g[x]]). This is really just “a function of x”. But what function? It’s not a composition of f and g; he calls it a “fusion”, and he defines the S combinator to create it: S[f][g][x] = f[x][g[x]].
It’s pretty clear Schönfinkel knew this kind of “symbolic gymnastics” would be hard for people to understand. He continues: “It will be advisable to make this function more intelligible by means of a practical example.” He says to take fxy (i.e. f[x][y]) to be log_{x}y (i.e. Log[x,y]), and gz (i.e. g[z]) to be 1 + z. Then Sfgx = (fx)(gx) = log_{x}(1 + x) (i.e. S[f][g][x]=f[x][g[x]]=Log[x,1+x]). And, OK, it’s not obvious why one would want to do that, and I’m not rushing to make S a builtin function in the Wolfram Language.
But Schönfinkel explains that for him “the practical use of the function S will be to enable us to reduce the number of occurrences of a variable—and to some extent also of a particular function—from several to a single one”.
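All five of Schönfinkel’s combinators can be written down directly as higher-order functions. Here is a Python rendering (rather than Wolfram Language; I use f(x)(y) for his left-associative fxy), including his own logarithm example for S:

```python
import math

# Schönfinkel's five combinators as curried closures.
I = lambda x: x                               # identity:    I x = x
K = lambda x: lambda y: x                     # constancy (his C): K x y = x
T = lambda f: lambda x: lambda y: f(y)(x)     # interchange: T f x y = f y x
Z = lambda f: lambda g: lambda x: f(g(x))     # composition: Z f g x = f(g x)
S = lambda f: lambda g: lambda x: f(x)(g(x))  # fusion:      S f g x = (f x)(g x)

assert I(42) == 42
assert K(7)("anything") == 7

minus = lambda x: lambda y: x - y
assert T(minus)(3)(10) == minus(10)(3) == 7   # arguments interchanged

assert Z(lambda v: v + 1)(lambda v: 2 * v)(5) == 11   # (2*5) + 1

# Schönfinkel's example: f x y = log base x of y, g z = 1 + z,
# so S f g x = log base x of (1 + x).
f = lambda x: lambda y: math.log(y, x)
g = lambda z: 1 + z
assert S(f)(g)(2) == math.log(3, 2)
```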
Setting up everything in terms of five basic objects I, C (now K), T, Z and S might already seem impressive and minimalist enough. But Schönfinkel realized that he could go even further:
First, he says that actually I = SCC (or, in modern notation, s[k][k]). In other words, s[k][k][x] for symbolic x is just equal to x (since s[k][k][x] becomes k[x][k[x]] by using the definition of S, and this becomes x by using the definition of C). He notes that this particular reduction was communicated to him by a certain Alfred Boskowitz (who we know to have been a student at the time); he says that Paul Bernays (who was more of a colleague) had “some time before” noted that I = (SC)(CC) (i.e. s[k][k[k]]). Today, of course, we can use a computer to just enumerate all possible combinator expressions of a particular size, and find what the smallest reduction is. But in Schönfinkel’s day, it would have been more like solving a puzzle by hand.
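The reduction is easy to check extensionally. In this Python sketch, S(K)(K) applied to anything gives that thing back, since S K K x = K x (K x) = x; Bernays’ variant (SC)(CC) works just as well:

```python
# Extensional check of I = SKK (and Bernays' I = (SC)(CC)).
S = lambda f: lambda g: lambda x: f(x)(g(x))
K = lambda x: lambda y: x

I_from_SKK = S(K)(K)            # s[k][k]
I_bernays = S(K)(K(K))          # s[k][k[k]]

for v in [0, "abc", (1, 2)]:
    assert I_from_SKK(v) == v   # S K K v -> K v (K v) -> v
    assert I_bernays(v) == v    # S K (KK) v -> K v (KK v) -> v
```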
Schönfinkel goes on, and proves that Z can also be reduced: Z = S(CS)C (i.e. s[k[s]][k]). And, yes, a very simple Wolfram Language program can verify in a few milliseconds that that is the simplest form.
OK, what about T? Schönfinkel gives 8 steps of reduction to prove that T = S(ZZS)(CC) (i.e. s[s[k[s]][k][s[k[s]][k]][s]][k[k]]). But is this the simplest possible form for T? Well, no. But (with the very straightforward two-line Wolfram Language program I wrote) it did take my modern computer a number of minutes to determine what the simplest form is.
The answer is that it doesn’t have size 12, like Schönfinkel’s, but rather size 9. Actually, there are 6 cases of size 9 that all work: s[s[k[s]][s[k[k]][s]]][k[k]] (S(S(KS)(S(KK)S))(KK)) and five others. And, yes, it takes a few steps of reduction to prove that they work (the other size-9 cases S(SSK(K(SS(KK))))S, S(S(K(S(KS)K))S)(KK), S(K(S(S(KS)K)(KK)))S, S(K(SS(KK)))(S(KK)S), S(K(S(K(SS(KK)))K))S all have more complicated reductions):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"];
CombinatorEvolutionPlot[
 CombinatorFixedPointList[
  s[s[k[s]][s[k[k]][s]]][k[k]][f][g][x]],
 "StatesDisplay"]
But, OK, what did Schönfinkel want to do with these objects he’d constructed? As the title of his paper suggests, he wanted to use them as building blocks for mathematical logic. He begins: “Let us now apply our results to a special case, that of the calculus of logic in which the basic elements are individuals and the functions are propositional functions.” I consider this sentence significant. Schönfinkel didn’t have a way to express it (the concept of universal computation hadn’t been invented yet), but he seems to have realized that what he’d done was quite general, and went even beyond being able to represent a particular kind of logic.
Still, he went on to give his example. He’d explained at the beginning of the paper that the quantifiers we now call ∀ and ∃ could both be represented in terms of a kind of “quantified Nand” that he wrote as a Sheffer stroke carrying the quantified variable:
But now he wanted to “combinatorify” everything. So he introduced a new combinator U, and defined it to represent his “quantified Nand”: Ufg = fx |ˣ gx, with the stroke carrying the quantified variable x (he called U the “incompatibility function”—an interesting linguistic description of Nand):
“It is a remarkable fact”, he says, “that every formula of logic can now be expressed by means... solely of C, S and U.” So he’s saying that any expression from mathematical logic can be written out as some combinator expression in terms of S, C (now K) and U. He says that when there are quantifiers like “for all x...” it’s always possible to use combinators to get rid of the “bound variables” x, etc. He says that he “will not give the complete demonstration here”, but rather content himself with an example. (Unfortunately—for reasons of the trajectory of his life that are still quite unclear—he never published his “complete demonstration”.)
But, OK, so what had he achieved? He’d basically shown that any expression that might appear in predicate logic (with logical connectives, quantifiers, variables, etc.) could be reduced to an expression purely in terms of the combinators S, C (now K) and U.
Did he need the U? Not really. But he had to have some way to represent the thing with mathematical or logical “meaning” on which his combinators would be acting. Today the obvious thing to do would be to have a representation for true and false. And what’s more, to represent these purely in terms of combinators. For example, if we took K to represent true, and SK (s[k]) to represent false, then And can be represented as SSK (s[s][k]), Or as S(SS)S(SK) (s[s[s]][s][s[k]]) and Nand as S(S(K(S(SS(K(KK))))))S (s[s[k[s[s[s][k[k[k]]]]]]][s]). Schönfinkel got amazingly far in reducing everything to his “building blocks”. But, yes, he missed this final step.
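That “final step” is easy to verify computationally. In the Python sketch below (the `decode` helper is my own device for reading a combinator “boolean” back out as 1 or 0), the combinator expressions just quoted do behave as And, Or and Nand on K-as-true and SK-as-false:

```python
# Booleans and logic purely from S and K, checked extensionally.
S = lambda f: lambda g: lambda x: f(x)(g(x))
K = lambda x: lambda y: x

TRUE, FALSE = K, S(K)                       # K as true, SK as false
AND  = S(S)(K)                              # SSK
OR   = S(S(S))(S)(S(K))                     # S(SS)S(SK)
NAND = S(S(K(S(S(S)(K(K(K)))))))(S)         # S(S(K(S(SS(K(KK))))))S

def decode(b):
    # A combinator "boolean" selects one of two constant functions;
    # applying the selected function to a dummy argument yields 1 or 0.
    return b(lambda _: 1)(lambda _: 0)(None)

for p in (TRUE, FALSE):
    for q in (TRUE, FALSE):
        bp, bq = decode(p), decode(q)
        assert decode(AND(p)(q)) == (bp & bq)
        assert decode(OR(p)(q)) == (bp | bq)
        assert decode(NAND(p)(q)) == 1 - (bp & bq)
```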
But given that he’d managed to reduce everything to S, C and U he figured he should try to go further. So he considered an object J that would be a single building block of S and C: JJ = S and J(JJ) = C.
With S and K one can just point to any piece of an expression and see if it reduces. With J it’s a bit more complicated. In modern Wolfram Language terms one can state the rules as {j[j][x_][y_][z_]→x[z][y[z]], j[j[j]][x_][y_]→x} (where order matters) but to apply these requires pattern matching “clusters of J’s” rather than just looking at single S’s and K’s at a time.
But even though—as Schönfinkel observed—this “final reduction” to J didn’t work out, getting everything down to S and K was already amazing. At the beginning of the paper, Schönfinkel had described his objectives. And then he says “It seems to me remarkable in the extreme that the goal we have just set can be realized also; as it happens, it can be done by a reduction to three fundamental signs.” (The paper does say three fundamental signs, presumably counting U as well as S and K.)
I’m sure Schönfinkel expected that to reproduce all the richness of mathematical logic he’d need quite an elaborate set of building blocks. And certainly people like Frege, Whitehead and Russell had used what were eventually very complicated setups. Schönfinkel managed to cut through all the complexity to show that simple building blocks were all that were needed. But then he found something else: that actually just two building blocks (S and K) were enough.
In modern terms, we’d say that Schönfinkel managed to construct a system capable of universal computation. And that’s amazing in itself. But even more amazing is that he found he could do it with such a simple setup.
I’m sure Schönfinkel was extremely surprised. And here I personally feel a certain commonality with him. Because in my own explorations of the computational universe, what I’ve found over and over again is that it takes only remarkably simple systems to be capable of highly complex behavior—and of universal computation. And even after exploring the computational universe for four decades, I’m still continually surprised at just how simple the systems can be.
For me, this has turned into a general principle—the Principle of Computational Equivalence—and a whole conceptual framework around it. Schönfinkel didn’t have anything like that to think in terms of. But he was in a sense a good enough scientist that he still managed to discover what he discovered—something that, many decades later, we can see fits in as another piece of evidence for the Principle of Computational Equivalence.
Looking at Schönfinkel’s paper a century later, it’s remarkable not only for what it discovers, but also for the clarity and simplicity with which it is presented. A little of the notation is now dated (and of course the original paper is written in German, which is no longer the kind of leading language of scholarship it once was). But for the most part, the paper still seems perfectly modern. Except, of course, that now it could be couched in terms of symbolic expressions and computation, rather than mathematical logic.
Combinators are hard to understand, and it’s not clear how many people understood them when they were first introduced—let alone understood their implications. It’s not a good sign that when Schönfinkel’s paper appeared in 1924 the person who helped prepare it for final publication (Heinrich Behmann) added his own three paragraphs at the end, that were quite confused. And Schönfinkel’s sole other published paper—coauthored with Paul Bernays in 1927—didn’t even mention combinators, even though they could have very profitably been used to discuss the subject at hand (decision problems in mathematical logic).
But in 1927 combinators (if not perhaps Schönfinkel’s recognition for them) had a remarkable piece of good fortune. Schönfinkel’s paper was discovered by a certain Haskell Curry—who would then devote more than 50 years to studying what he named “combinators”, and to spreading the word about them.
At some level I think one can view the main thrust of what Curry and his disciples did with combinators as an effort to “mathematicize” them. Schönfinkel had presented combinators in a rather straightforward “structural” way. But what was the mathematical interpretation of what he did, and of how combinators work in general? What mathematical formalism could capture Schönfinkel’s structural idea of substitution? Just what, for example, was the true notion of equality for combinators?
In the end, combinators are fundamentally computational constructs, full of all the phenomena of “unbridled computation”—like undecidability and computational irreducibility. And it’s inevitable that mathematics as normally conceived can only go so far in “cracking” them.
But back in the 1920s and 1930s the concept and power of computation was not yet understood, and it was assumed that the ideas and tools of mathematics would be the ones to use in analyzing a formal system like combinators. And it wasn’t that mathematical methods got absolutely nowhere with combinators.
Unlike cellular automata, or even Turing machines, there’s a certain immediate structural complexity to combinators, with their elaborate tree structures, equivalences and so on. And so there was progress to be made—and years of work to be done—in untangling this, without having to face the raw features of full-scale computation, like computational irreducibility.
In the end, combinators are full of computational irreducibility. But they also have layers of computational reducibility, some of which are aligned with the kinds of things mathematics and mathematical logic have been set up to handle. And in this there’s a curious resonance with our recent Physics Project.
In our models based on hypergraph rewriting there’s also a kind of bedrock of computational irreducibility. But as with combinators, there’s a certain immediate structural complexity to what our models do. And there are layers of computational reducibility associated with this. But the remarkable thing with our models is that some of those layers—and the formalisms one can build to understand them—have an immediate interpretation: they are basically the core theories of twentieth-century physics, namely general relativity and quantum mechanics.
Combinators work sufficiently differently that they don’t immediately align with that kind of interpretation. But it’s still true that one of the important properties discovered in combinators (namely confluence, related to our idea of causal invariance) turns out to be crucial to our models, their correspondence with physics, and in the end our whole ability to perceive regularity in the universe, even in the face of computational irreducibility.
But let’s get back to the story of combinators as it played out after Schönfinkel’s paper. Schönfinkel had basically set things up in a novel, very direct, structural way. But Curry wanted to connect with more traditional ideas in mathematical logic, and mathematics in general. And after a first paper (published in 1929) which pretty much just recorded his first thoughts, and his efforts to understand what Schönfinkel had done, Curry was by 1930 starting to do things like formulate axioms for combinators, and hoping to prove general theorems about mathematical properties like equality.
Without an understanding of universal computation, and of combinators’ relationship to it, it wasn’t yet clear how complicated it might ultimately be to deal with combinators. And Curry pushed forward, publishing more papers and trying to do things like define set theory using his axioms for combinators. But in 1934 disaster struck. It wasn’t something about computation or undecidability; instead it was that Stephen Kleene and J. Barkley Rosser showed the axioms Curry had come up with to try and “tighten up Schönfinkel” were just plain inconsistent.
To Kleene and Rosser it provided more evidence of the need for Russell’s (originally quite hacky) idea of types—and led them to more complicated axiom systems, and away from combinators. But Curry was undeterred. He revised his axiom system and continued—ultimately for many decades—to see what could be proved about combinators and things like them using mathematical methods.
But already at the beginning of the 1930s there were bigger things afoot around mathematical logic—which would soon intersect with combinators.
How should one represent the fundamental constructs of mathematics? Back in the 1920s nobody thought seriously about using combinators. And instead there were basically three “big brands”: Principia Mathematica, set theory and Hilbert’s program. Relations between them were being explored, details were being filled in, and problems were being identified. But there was a general sense that progress was being made.
Quite where the boundaries might lie wasn’t clear. For example, could one specify a way to “construct any function” from lower-level primitives? The basic idea of recursion was very old (think: Fibonacci). But by the early 1920s there was a fairly well-formalized notion of “primitive recursion” in which functions always found their values from earlier values. But could all “mathematical” functions be constructed this way?
By 1926 it was known that this wouldn’t work: the Ackermann function was a reasonable “mathematical” function, but it wasn’t primitive recursive. It meant that definitions had to be generalized (e.g. to “general recursive functions” that didn’t just look back at earlier values, but could “look forward until...” as well). But there didn’t seem to be any fundamental problem with the idea that mathematics could just “mechanistically” be built out forever from appropriate primitives.
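To see the point concretely, here is a minimal Python sketch of the now-standard two-argument form of the Ackermann function. It is a perfectly “mechanical”, total function, computable by straightforward recursion, yet it grows too fast to be primitive recursive:

```python
# The Ackermann function: total and completely mechanical to compute,
# yet provably not primitive recursive (it grows faster than any
# primitive recursive function). This is the common two-argument form.

def ackermann(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

print(ackermann(2, 3))  # -> 9
print(ackermann(3, 3))  # -> 61
```

Even at these tiny arguments the recursion is already deeply nested; `ackermann(4, 2)` has 19,729 decimal digits.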
But in 1931 came Gödel’s theorem. There’d been a long tradition of identifying paradoxes and inconsistencies, and finding ways to patch them by changing axioms. But Gödel’s theorem was based on Peano’s by-then-standard axioms for arithmetic (branded by Gödel as a fragment of Principia Mathematica). And it showed there was a fundamental problem.
In essence, Gödel took the paradoxical statement “this statement is unprovable” and showed that it could be expressed purely as a statement of arithmetic—roughly a statement about the existence of solutions to appropriate integer equations. And basically what Gödel had to do to achieve this was to create a “compiler” capable of compiling things like “this statement is unprovable” into arithmetic.
In his paper one can basically see him building up different capabilities (e.g. representing arbitrary expressions as numbers through Gödel numbering, checking conditions using general recursion, etc.)—eventually getting to a “high enough level” to represent the statement he wanted:
What did Gödel’s theorem mean? For the foundations of mathematics it meant that the idea of mechanically proving “all true theorems of mathematics” wasn’t going to work. Because it showed that there was at least one statement that by its own admission couldn’t be proved, but was still a “statement about arithmetic”, in the sense that it could be “compiled into arithmetic”.
That was a big deal for the foundations of mathematics. But actually there was something much more significant about Gödel’s theorem, even though it wasn’t recognized at the time. Gödel had used the primitives of number theory and logic to build what amounted to a computational system—in which one could take things like “this statement is unprovable”, and “run them in arithmetic”.
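The “compiling into arithmetic” rests on encoding tricks like Gödel numbering. Here is a toy Python sketch of the core idea, encoding a sequence of symbol codes as a single integer via prime exponents (Gödel’s actual scheme differs in its details; this is purely illustrative):

```python
# Toy Gödel numbering: a sequence of symbol codes becomes the product
# of prime(i) ** code[i]. Unique factorization makes it reversible.
# Symbol codes must be >= 1 so decoding knows where the sequence ends.

def primes():
    """Yield 2, 3, 5, 7, ... (trial division; fine for a toy example)."""
    k, found = 2, []
    while True:
        if all(k % p for p in found):
            found.append(k)
            yield k
        k += 1

def encode(codes):
    n, gen = 1, primes()
    for c in codes:
        n *= next(gen) ** c
    return n

def decode(n):
    codes, gen = [], primes()
    while n > 1:
        p, e = next(gen), 0
        while n % p == 0:
            n //= p
            e += 1
        codes.append(e)
    return codes

seq = [4, 1, 3]              # e.g. codes for the symbols in a formula
g = encode(seq)              # 2**4 * 3**1 * 5**3 = 6000
print(g, decode(g))          # -> 6000 [4, 1, 3]
```

The key property is that any statement *about* the encoded formula becomes a statement about divisibility properties of an ordinary integer.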
What Gödel had, though, wasn’t exactly a streamlined general system (after all, it only really needed to handle one statement). But the immediate question then was: if there’s a problem with this statement in arithmetic, what about Hilbert’s general “decision problem” (Entscheidungsproblem) for any axiom system?
To discuss the “general decision problem”, though, one needed some kind of general notion of how one could decide things. What ultimate primitives should one use? Schönfinkel (with Paul Bernays)—in his sole other published paper—wrote about a restricted case of the decision problem in 1927, but doesn’t seem to have had the idea of using combinators to study it.
By 1934 Gödel was talking about general recursiveness (i.e. definability through general recursion). And Alonzo Church and Stephen Kleene were introducing λ definability. Then in 1936 Alan Turing introduced Turing machines. All these approaches involved setting up certain primitives, then showing that a large class of things could be “compiled” to those primitives, and then showing, in effect by thinking about having the system compile itself, that Hilbert’s Entscheidungsproblem couldn’t be solved.
Perhaps no single result along these lines would have been so significant. But it was soon established that all three kinds of systems were exactly equivalent: the set of computations they could represent was the same, as established by showing that one system could emulate another. And from that discovery eventually emerged the modern notion of universal computation—and all its implications for technology and science.
In the early days, though, there was actually a fourth equivalent kind of system—based on string rewriting—that had been invented by Emil Post in 1920–1. Oh, and then there were combinators.
What was the right “language” to use for setting up mathematical logic? There’d been gradual improvement since the complexities of Principia Mathematica. But around 1930 Alonzo Church wanted a new and cleaner setup. And he needed to have a way (as Frege and Principia Mathematica had done before him) to represent “pure functions”. And that’s how he came to invent λ.
Today in the Wolfram Language we have Function[x,f[x]] or x ↦ f[x] (or various shorthands). Church originally had λx[M]:
But what’s perhaps most notable is that on the very first page he defines λ, he’s referencing Schönfinkel’s combinator paper. (Well, specifically, he’s referencing it because he wants to use the device Schönfinkel invented that we now call currying—f[x][y] in place of f[x,y]—though ironically he doesn’t mention Curry.) In his 1932 paper (apparently based on work in 1928–9) λ is almost a sideshow—the main event being the introduction of 37 formal postulates for mathematical logic:
By the next year J. Barkley Rosser is trying to retool Curry’s “combinatory logic” with combinators of his own—and showing how they correspond to lambda expressions:
Then in 1935 lambda calculus has its big “coming out” in Church’s “An Unsolvable Problem of Elementary Number Theory”, in which he introduces the idea that any “effectively calculable” function should be “λ definable”, then defines integers in terms of λ’s (“Church numerals”)
and then shows that the problem of determining equivalence for λ expressions is undecidable.
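Church numerals are easy to sketch directly with Python lambdas: the numeral n is the function that applies f to x exactly n times, and the successor follows the rule S[n][f][x] → f[n[f][x]] (the function names below are my own; only the encoding is Church’s):

```python
# Church numerals: the number n is represented by the higher-order
# function that applies f to x a total of n times.

zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))  # Church's successor rule

def add(m, n):
    # m + n applications of f: apply f n times, then m more times
    return lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    # Decode a Church numeral by applying "add one" to 0
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
print(to_int(three))              # -> 3
print(to_int(add(three, three)))  # -> 6
```

Arithmetic on these numerals is pure function application, which is exactly what made them usable inside the lambda calculus.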
Very soon thereafter Turing publishes his “On Computable Numbers, with an Application to the Entscheidungsproblem” in which he introduces his much more manifestly mechanistic Turing machine model of computation. In the main part of the paper there are no lambdas—or combinators—to be seen. But by late 1936 Turing had gone to Princeton to be a student with Church—and added a note showing the correspondence between his Turing machines and Church’s lambda calculus.
By the next year, when Turing is writing his rather abstruse “Systems of Logic Based on Ordinals” he’s using lambda calculus all over the place. Early in the document he writes I → λx[x], and soon he’s mixing lambdas and combinators with wild abandon—and in fact he’d already published a one-page paper which introduced the fixed-point combinator Θ (and, yes, the K in the title refers to Schönfinkel’s K combinator):
When Church summarized the state of lambda calculus in 1941 in his “The Calculi of Lambda-Conversion” he again made extensive use of combinators. Schönfinkel’s K is prominent. But Schönfinkel’s S is nowhere to be seen—and in fact Church has his own S combinator S[n][f][x]→f[n[f][x]] which implements successors in Church’s numeral system. And he also has a few other “basic combinators” that he routinely uses.
In the end, combinators and lambda calculus are completely equivalent, and it’s quite easy to convert between them—but there’s a curious tradeoff. In lambda calculus one names variables, which is good for human readability, but can lead to problems at a formal level. In combinators, things are formally much cleaner, but the expressions one gets can be completely incomprehensible to humans.
The point is that in a lambda expression like λx λy x[y] one’s naming the variables (here x and y), but really these names are just placeholders: what they are doesn’t matter; they’re just showing where different arguments go. And in a simple case like this, everything is fine. But what happens if one substitutes for y another lambda expression, say λx f[x]? What is that x? Is it the same x as the one outside, or something different? In practice, there are all sorts of renaming schemes that can be used, but they tend to be quite hacky, and things can quickly get tangled up. And if one wants to make formal proofs about lambda calculus, this can potentially be a big problem, and indeed at the beginning it wasn’t clear it wouldn’t derail the whole idea of lambda calculus.
And that’s part of why the correspondence between lambda calculus and combinators was important. With combinators there are no variables, and so no variable names to get tangled up. So if one can show that something can be converted to combinators—even if one never looks at the potentially very long and ugly combinator expression that’s generated—one knows one’s safe from issues about variable names.
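The conversion itself can be sketched with the classic “bracket abstraction” rules, which eliminate a bound variable one level at a time (this is one standard translation, not Schönfinkel’s own presentation; the representation and names below are mine):

```python
# Bracket abstraction: translate "lambda x. t" into pure S/K combinators.
# Terms: ("var", name), ("app", f, a), ("lam", name, body), or "S"/"K".

def free_in(x, t):
    """Does variable x occur free in term t?"""
    if isinstance(t, str):
        return False
    if t[0] == "var":
        return t[1] == x
    if t[0] == "app":
        return free_in(x, t[1]) or free_in(x, t[2])
    if t[0] == "lam":
        return t[1] != x and free_in(x, t[2])

def abstract(x, t):
    """A combinator term behaving like lambda x. t (no variables left)."""
    if t == ("var", x):
        return ("app", ("app", "S", "K"), "K")   # identity: I = S K K
    if not free_in(x, t):
        return ("app", "K", t)                   # constant: K t
    if t[0] == "app":
        return ("app", ("app", "S", abstract(x, t[1])),
                abstract(x, t[2]))               # distribute with S
    if t[0] == "lam":
        return abstract(x, abstract(t[1], t[2]))  # eliminate inner lambda first

# lambda x. lambda y. x becomes the variable-free term S (K K) (S K K)
print(abstract("x", ("lam", "y", ("var", "x"))))
```

Even this two-variable example shows how quickly the output grows—which is exactly the “completely incomprehensible to humans” tradeoff mentioned above.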
There are still plenty of other complicated issues, though. Prominent among them are questions about when combinator expressions can be considered equal. Let’s say you have a combinator expression, like s[s[s[s][k]]][k]. Well, you can repeatedly apply the rules for combinators to transform and reduce it. And it’ll often end up at a fixed point, where no rules apply anymore. But a basic question is whether it matters in which order the rules are applied. And in 1936 Church and Rosser proved it doesn’t.
Actually, what they specifically proved was the analogous result for lambda calculus. They drew a picture to indicate different possible orders in which lambdas could be reduced out, and showed it didn’t matter which path one takes:
This all might seem like a detail. But it turns out that generalizations of their result apply to all sorts of systems. In doing computations (or automatically proving theorems) it’s all about “it doesn’t matter what path you take; you’ll always get the same result”. And that’s important. But recently there’s been another important application that’s shown up. It turns out that a generalization of the “Church–Rosser property” is what we call causal invariance in our Physics Project.
And it’s causal invariance that leads in our models to relativistic invariance, general covariance, objective reality in quantum mechanics, and other central features of physics.
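Returning to the combinator rules themselves, here is a minimal Python sketch of SK reduction using a leftmost-outermost strategy (the representation and names are my own). By the Church–Rosser property, any reduction order that terminates reaches the same normal form:

```python
# A minimal S/K reducer for the rules S[x][y][z] -> x[z][y[z]] and
# K[x][y] -> x. Expressions are ("app", f, a) tuples or bare symbols.

def app(f, a):
    return ("app", f, a)

def spine(e):
    """Unwind the left spine: return (head symbol, [arg1, arg2, ...])."""
    args = []
    while isinstance(e, tuple):
        args.insert(0, e[2])
        e = e[1]
    return e, args

def step(e):
    """One leftmost-outermost reduction step; returns (term, reduced?)."""
    if not isinstance(e, tuple):
        return e, False
    head, args = spine(e)
    if head == "K" and len(args) >= 2:
        new = args[0]                       # K[x][y] -> x
        for extra in args[2:]:
            new = app(new, extra)
        return new, True
    if head == "S" and len(args) >= 3:
        x, y, z = args[:3]                  # S[x][y][z] -> x[z][y[z]]
        new = app(app(x, z), app(y, z))
        for extra in args[3:]:
            new = app(new, extra)
        return new, True
    f2, done = step(e[1])                   # otherwise look inside
    if done:
        return ("app", f2, e[2]), True
    a2, done = step(e[2])
    return ("app", e[1], a2), done

def normal_form(e, limit=1000):
    for _ in range(limit):
        e, reduced = step(e)
        if not reduced:
            return e
    return e  # gave up: combinator reduction need not terminate

# S[K][K] acts as the identity: S[K][K][a] -> K[a][K[a]] -> a
print(normal_form(app(app(app("S", "K"), "K"), "a")))  # -> a
```

Note the `limit` safeguard: unlike the confluence of final results, termination itself is not guaranteed, which is where computational irreducibility shows up.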
In retrospect, one of the great achievements of the 1930s was the inception of what ended up being the idea of universal computation. But at the time what was done was couched in terms of mathematical logic and it was far from obvious that any of the theoretical structures being built would have any real application beyond thinking about the foundations of mathematics. But even as people like Hilbert were talking in theoretical terms about the mechanization of mathematics, more and more there were actual machines being built for doing mathematical calculations.
We know that even in antiquity (at least one) simple gear-based mechanical calculational devices existed. In the mid-1600s arithmetic calculators started being constructed, and by the late 1800s they were in widespread use. At first they were mechanical, but by the 1930s most were electromechanical, and there started to be systems where units for carrying out different arithmetic operations could be chained together. And by the end of the 1940s fairly elaborate such systems based on electronics were being built.
Already in the 1830s Charles Babbage had imagined an “analytical engine” which could do different operations depending on a “program” specified by punch cards—and Ada Lovelace had realized that such a machine had broad “computational” potential. But by the 1930s a century had passed and nothing like this was connected to the theoretical developments that were going on—and the actual engineering of computational systems was done without any particular overarching theoretical framework.
Still, as electronic devices got more complicated and scientific interest in psychology intensified, something else happened: there started to be the idea (sometimes associated with the name cybernetics) that somehow electronics might reproduce how things like brains work. In the mid-1930s Claude Shannon had shown that Boolean algebra could represent how switching circuits work, and in 1943 Warren McCulloch and Walter Pitts proposed a model of idealized neural networks formulated in something close to mathematical logic terms.
Meanwhile by the mid-1940s John von Neumann—who had worked extensively on mathematical logic—had started suggesting math-like specifications for practical electronic computers, including the way their programs might be stored electronically. At first he made lots of brain-like references to “organs” and “inhibitory connections”, and essentially no mention of ideas from mathematical logic. But by the end of the 1940s von Neumann was talking at least conceptually about connections to Gödel’s theorem and Turing machines, Alan Turing had become involved with actual electronic computers, and there was the beginning of widespread understanding of the notion of general-purpose computers and universal computation.
In the 1950s there was an explosion of interest in what would now be called the theory of computation—and great optimism about its relevance to artificial intelligence. There was all sorts of “interdisciplinary work” on fairly “concrete” models of computation, like finite automata, Turing machines, cellular automata and idealized neural networks. More “abstract” approaches, like recursive functions, lambda calculus—and combinators—remained, however, pretty much restricted to researchers in mathematical logic.
When early programming languages started to appear in the latter part of the 1950s, thinking about practical computers began to become a bit more abstract. It was understood that the grammars of languages could be specified recursively—and actual recursion (of functions being able to call themselves) just snuck into the specification of ALGOL 60. But what about the structures on which programs operated? Most of the concentration was on arrays (sometimes rather elegantly, as in APL) and, occasionally, character strings.
But a notable exception was LISP, described in John McCarthy’s 1960 paper “Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I” (part 2 was not written). There was lots of optimism about AI at the time, and the idea was to create a language to “implement AI”—and do things like “mechanical theorem proving”. A key idea—that McCarthy described as being based on “recursive function formalism”—was to have tree-structured symbolic expressions (“S expressions”). (In the original paper, what’s now Wolfram Language–style f[g[x]] “M expression” notation, complete with square brackets, was used as part of the specification, but the quintessentially LISP-like (f (g x)) notation won out when LISP was actually implemented.)
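The tree structure behind S expressions is easy to sketch. Here is a toy reader (my own minimal code, not McCarthy’s) that turns (f (g x)) notation into nested lists:

```python
# A tiny S-expression reader: parenthesized notation becomes nested
# Python lists, mirroring LISP's tree-structured symbolic data.

def read_sexpr(text):
    # Tokenize by padding parentheses with spaces, then splitting
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(i):
        """Parse one expression starting at token i; return (expr, next_i)."""
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                item, i = parse(i)
                items.append(item)
            return items, i + 1
        return tokens[i], i + 1

    expr, _ = parse(0)
    return expr

print(read_sexpr("(f (g x))"))  # -> ['f', ['g', 'x']]
```

The whole point—then as now—is that programs and data share this one uniform tree representation.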
An issue in LISP was how to take “expressions” (which were viewed as representing things) and turn them into functions (which do things). And the basic plan was to use Church’s idea of λ notation. But when it came time to implement this, there was, of course, trouble with name collisions, which ended up getting handled in quite hacky ways. So did McCarthy know about combinators? The answer is yes, as his 1960 paper shows:
I actually didn’t know until just now that McCarthy had ever even considered combinators, and in the years I knew him I don’t think I ever personally talked to him about them. But it seems that for McCarthy—as for Church—combinators were a kind of “comforting backstop” that ensured that it was OK to use lambdas, and that if things went too badly wrong with variable naming, there was at least in principle always a way to untangle everything.
In the practical development of computers and computer languages, even lambdas—let alone combinators—weren’t really much heard from again (except in a small AI circle) until the 1980s. And even then it didn’t help that in an effort variously to stay close to hardware and to structure programs there tended to be a desire to give everything a “data type”—which was at odds with the “consume any expression” approach of standard combinators and lambdas. But beginning in the 1980s—particularly with the progressive rise of functional programming—lambdas, at least, have steadily gained in visibility and practical application.
What of combinators? Occasionally as a proof of principle there’ll be a hardware system developed that natively implements Schönfinkel’s combinators. Or—particularly in modern times—there’ll be an esoteric language that uses combinators in some kind of purposeful effort at obfuscation. Still, a remarkable cross-section of notable people concerned with the foundations of computing have—at one time or another—taught about combinators or written a paper about them. And in recent years the term “combinator” has become more popular as a way to describe a “purely applicative” function.
But by and large the important ideas that first arose with combinators ended up being absorbed into practical computing by quite circuitous routes, without direct reference to their origins, or to the specific structure of combinators.
For 100 years combinators have mostly been an obscure academic topic, studied particularly in connection with lambda calculus, at the borders between theoretical computer science, mathematical logic and to some extent mathematical formalisms like category theory. Much of the work that’s been done can be traced in one way or another to the influence of Haskell Curry or Alonzo Church—particularly through their students, grand-students, great-grand-students, etc. Particularly in the early years, most of the work was centered in the US, but by the 1960s there was a strong migration to Europe and especially the Netherlands.
But even with all their abstractness and obscurity, on a few rare occasions combinators have broken into something cl