*Version 14.0 of Wolfram Language and Mathematica is available immediately both on the desktop and in the cloud. See also more detailed information on Version 13.1, Version 13.2 and Version 13.3.*

Today we celebrate a new waypoint on our journey of nearly four decades with the release of Version 14.0 of Wolfram Language and Mathematica. Over the two years since we released Version 13.0 we’ve been steadily delivering the fruits of our research and development in .1 releases every six months. Today we’re aggregating these—and more—into Version 14.0.

It’s been more than 35 years now since we released Version 1.0. And all those years we’ve been continuing to build a taller and taller tower of capabilities, progressively expanding the scope of our vision and the breadth of our computational coverage of the world:

Version 1.0 had 554 built-in functions; in Version 14.0 there are 6602. And behind each of those functions is a story. Sometimes it’s a story of creating a superalgorithm that encapsulates decades of algorithmic development. Sometimes it’s a story of painstakingly curating data that’s never been assembled before. Sometimes it’s a story of drilling down to the essence of something to invent new approaches and new functions that can capture it.

And from all these pieces we’ve been steadily building the coherent whole that is today’s Wolfram Language. In the arc of intellectual history it defines a broad, new, computational paradigm for formalizing the world. And at a practical level it provides a superpower for implementing computational thinking—and enabling “computational X” for all fields X.

To us it’s profoundly satisfying to see what has been done over the past three decades with everything we’ve built so far. So many discoveries, so many inventions, so much achieved, so much learned. And seeing this helps drive forward our efforts to tackle still more, and to continue to push every boundary we can with our R&D, and to deliver the results in new versions of our system.

Our R&D portfolio is broad. From projects that get completed within months of their conception, to projects that rely on years (and sometimes even decades) of systematic development. And key to everything we do is leveraging what we have already done—often taking what in earlier years was a pinnacle of technical achievement, and now using it as a routine building block to reach a level that could barely even be imagined before. And beyond practical technology, we’re also continually going further and further in leveraging what’s now the vast conceptual framework that we’ve been building all these years—and progressively encapsulating it in the design of the Wolfram Language.

We’ve worked hard all these years not only to create ideas and technology, but also to craft a practical and sustainable ecosystem in which we can systematically do this now and into the long-term future. And we continue to innovate in these areas, broadening the delivery of what we’ve built in new and different ways, and through new and different channels. And in the past five years we’ve also been able to open up our core design process to the world—regularly livestreaming what we’re doing in a uniquely open way.

And indeed over the past several years the seeds of essentially everything we’re delivering today in Version 14.0 have been openly shared with the world, and represent an achievement not only for our internal teams but also for the many people who have participated in and commented on our livestreams.

Part of what Version 14.0 is about is continuing to expand the domain of our computational language, and our computational formalization of the world. But Version 14.0 is also about streamlining and polishing the functionality we’ve already defined. Throughout the system there are things we’ve made more efficient, more robust and more convenient. And, yes, in complex software, bugs of many kinds are a theoretical and practical inevitability. And in Version 14.0 we’ve fixed nearly 10,000 bugs, the majority found by our increasingly sophisticated internal software testing methods.

Even after all the work we’ve put into the Wolfram Language over the past several decades, there’s still yet another challenge: how to let people know just what the Wolfram Language can do. Back when we released Version 1.0 I was able to write a book of manageable size that could pretty much explain the whole system. But for Version 14.0—with all the functionality it contains—one would need a book with perhaps 200,000 pages.

And at this point nobody (not even me!) immediately knows everything the Wolfram Language does. Of course one of our great achievements has been to maintain across all that functionality a tightly coherent and consistent design that results in there ultimately being only a small set of fundamental principles to learn. But at the vast scale of the Wolfram Language as it exists today, knowing what’s possible—and what can now be formulated in computational terms—is inevitably very challenging. And all too often when I show people what’s possible, I’ll get the response “I had no idea the Wolfram Language could do *that*!”

So in the past few years we’ve put increasing emphasis on building large-scale mechanisms to explain the Wolfram Language to people. It begins at a very fine-grained level, with “just-in-time information” provided, for example, through suggestions made when you type. Then for each function (or other construct in the language) there are pages that explain the function, with extensive examples. And now, increasingly, we’re adding “just-in-time learning material” that leverages the concreteness of the functions to provide self-contained explanations of the broader context of what they do.

By the way, in modern times we need to explain the Wolfram Language not just to humans, but also to AIs—and our very extensive documentation and examples have proved extremely valuable in training LLMs to use the Wolfram Language. And for AIs we’re providing a variety of tools—like immediate computable access to documentation, and computable error handling. And with our Chat Notebook technology there’s also a new “on ramp” for creating Wolfram Language code from linguistic (or visual, etc.) input.

But what about the bigger picture of the Wolfram Language? For both people and AIs it’s important to be able to explain things at a higher level, and we’ve been doing more and more in this direction. For more than 30 years we’ve had “guide pages” that summarize specific functionality in particular areas. Now we’re adding “core area pages” that give a broader picture of large areas of functionality—each one in effect covering what might otherwise be a whole product on its own, if it weren’t just an integrated part of the Wolfram Language:

But we’re going even much further, building whole courses and books that provide modern hands-on Wolfram-Language-enabled introductions to a broad range of areas. We’ve now covered the material of many standard college courses (and quite a lot besides), in a new and very effective “computational” way, that allows immediate, practical engagement with concepts:

All these courses involve not only lectures and notebooks but also auto-graded exercises, as well as official certifications. And we have a regular calendar of everyone-gets-together-at-the-same-time instructor-led peer Study Groups about these courses. And, yes, our Wolfram U operation is now emerging as a significant educational entity, with many thousands of students at any given time.

In addition to whole courses, we have “miniseries” of lectures about specific topics:

And we also have courses—and books—about the Wolfram Language itself, like my *Elementary Introduction to the Wolfram Language*, which came out in a third edition this year (and has an associated course, online version, etc.):

In a somewhat different direction, we’ve expanded our Wolfram Summer School to add a Wolfram Winter School, and we’ve greatly expanded our Wolfram High School Summer Research Program, adding year-round programs, middle-school programs, etc.—including the new “Computational Adventures” weekly activity program.

And then there’s livestreaming. We’ve been doing weekly “R&D livestreams” with our development team (and sometimes also external guests). And I myself have also been doing a lot of livestreaming (232 hours of it in 2023 alone)—some of it design reviews of Wolfram Language functionality, and some of it answering questions, technical and other.

The list of ways we’re getting the word out about the Wolfram Language goes on. There’s Wolfram Community, which is full of interesting contributions and has an ever-increasing readership. There are sites like Wolfram Challenges. There are our Wolfram Technology Conferences. And lots more.

We’ve put immense effort into building the whole Wolfram technology stack over the past four decades. And even as we continue to aggressively build it, we’re putting more and more effort into telling the world about just what’s in it, and helping people (and AIs) to make the most effective use of it. But in a sense, everything we’re doing is just a seed for what the wider community of Wolfram Language users is doing, and can do. Spreading the power of the Wolfram Language to more and more people and areas.

The machine learning superfunctions `Classify` and `Predict` first appeared in Wolfram Language in 2014 (Version 10). By the next year there were starting to be functions like `ImageIdentify` and `LanguageIdentify`, and within a couple of years we’d introduced our whole neural net framework and Neural Net Repository. Included in that were a variety of neural nets for language modeling, which allowed us to build out functions like `SpeechRecognize` and an experimental version of `FindTextualAnswer`. But—like everyone else—we were taken by surprise at the end of 2022 by ChatGPT and its remarkable capabilities.

Very quickly we realized that a major new use case—and market—had arrived for Wolfram|Alpha and Wolfram Language. Now it was not only humans who’d need the tools we’d built, but AIs as well. By March 2023 we’d worked with OpenAI to use our Wolfram Cloud technology to deliver a plugin to ChatGPT that allows it to call Wolfram|Alpha and Wolfram Language. LLMs like ChatGPT provide remarkable new capabilities in reproducing human language, basic human thinking and general commonsense knowledge. But—like unaided humans—they’re not set up to deal with detailed computation or precise knowledge. For that, like humans, they have to use formalism and tools. And the remarkable thing is that the formalism and tools we’ve built in Wolfram Language (and Wolfram|Alpha) are basically a broad, perfect fit for what they need.

We created the Wolfram Language to provide a bridge from what humans think about to what computation can express and implement. And now that’s what the AIs can use as well. The Wolfram Language provides a medium not only for humans to “think computationally” but also for AIs to do so. And we’ve been steadily doing the engineering to let AIs call on Wolfram Language as easily as possible.

But in addition to LLMs using Wolfram Language, there’s also now the possibility of Wolfram Language using LLMs. And already in June 2023 (Version 13.3) we released a major collection of LLM-based capabilities in Wolfram Language. One category is LLM functions, which effectively use LLMs as “internal algorithms” for operations in Wolfram Language:

In typical Wolfram Language fashion, we have a symbolic representation for LLMs: `LLMConfiguration[…]` represents an LLM with its various parameters, promptings, etc. And in the past few months we’ve been steadily adding connections to the full range of popular LLMs, making Wolfram Language a unique hub not only for LLM usage, but also for studying the performance—and science—of LLMs.
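As a sketch of this style of use, an LLM-backed function can be defined and applied like any other function. (The model name below is an illustrative assumption; any connected LLM service can be configured.)

```wolfram
(* an LLM-backed function with a templated prompt; `` marks the argument slot *)
antonym = LLMFunction["Give a one-word antonym of ``."];
antonym["transparent"]

(* the same function, pinned to a specific model via LLMConfiguration *)
antonym = LLMFunction["Give a one-word antonym of ``.",
  LLMEvaluator -> LLMConfiguration[<|"Model" -> "gpt-4"|>]];
```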

You can define your own LLM functions in Wolfram Language. But there’s also the Wolfram Prompt Repository that plays a similar role for LLM functions as the Wolfram Function Repository does for ordinary Wolfram Language functions. There’s a public Prompt Repository that so far has several hundred curated prompts. But it’s also possible for anyone to post their prompts in the Wolfram Cloud and make them publicly (or privately) accessible. The prompts can define personas (“talk like a [stereotypical] pirate”). They can define AI-oriented functions (“write it with emoji”). And they can define modifiers that affect the form of output (“haiku style”).
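Prompts from the repository can also be used programmatically. A minimal sketch (taking the “Emojify” modifier as an example of a repository entry; an LLM connection is assumed):

```wolfram
(* combine a repository prompt with input text in a programmatic LLM call *)
LLMSynthesize[{LLMPrompt["Emojify"], "The meeting is at noon tomorrow."}]
```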

In addition to calling LLMs “programmatically” within Wolfram Language, there’s the new concept (first introduced in Version 13.3) of “Chat Notebooks”. Chat Notebooks represent a new kind of user interface that combines the graphical, computational and document features of traditional Wolfram Notebooks with the new linguistic interface capabilities brought to us by LLMs.

The basic idea of a Chat Notebook—as introduced in Version 13.3, and now extended in Version 14.0—is that you can have “chat cells” (requested by typing `‘`) whose content gets sent not to the Wolfram kernel, but instead to an LLM:

You can use “function prompts”—say from the Wolfram Prompt Repository—directly in a Chat Notebook:

And as of Version 14.0 you can also knit Wolfram Language computations directly into your “conversation” with the LLM:

(You type `\` to insert Wolfram Language, very much like the way you can use `<* … *>` to insert Wolfram Language into external evaluation cells.)

One thing about Chat Notebooks is that—as their name suggests—they really are centered around “chatting”, and around having a sequential interaction with an LLM. In an ordinary notebook, it doesn’t matter where in the notebook each Wolfram Language evaluation is requested; all that’s relevant is the order in which the Wolfram kernel does the evaluations. But in a Chat Notebook the “LLM evaluations” are always part of a “chat” that’s explicitly laid out in the notebook.

A key part of Chat Notebooks is the concept of a chat block: type `~` and you get a separator in the notebook that “starts a new chat”:

Chat Notebooks—with all their typical Wolfram Notebook editing, structuring, automation, etc. capabilities—are very powerful just as “LLM interfaces”. But there’s another dimension as well, enabled by LLMs being able to call Wolfram Language as a tool.

At one level, Chat Notebooks provide an “on ramp” for using Wolfram Language. Wolfram|Alpha—and even more so, Wolfram|Alpha Notebook Edition—let you ask questions in natural language, then have the questions translated into Wolfram Language, and answers computed. But in Chat Notebooks you can go beyond asking specific questions. Instead, through the LLM, you can just “start chatting” about what you want to do, then have Wolfram Language code generated, and executed:

The workflow is typically as follows. First, you have to conceptualize in computational terms what you want. (And, yes, that step requires computational thinking—which is a very important skill that too few people have so far learned.) Then you tell the LLM what you want, and it’ll try to write Wolfram Language code to achieve it. It’ll typically run the code for you (but you can also always do it yourself)—and you can see whether you got what you wanted. But what’s crucial is that Wolfram Language is intended to be read not only by computers but also by humans. And particularly since LLMs actually usually seem to manage to write pretty good Wolfram Language code, you can expect to read what they wrote, and see if it’s what you wanted. If it is, you can take that code, and use it as a “solid building block” for whatever larger system you might be trying to set up. Otherwise, you can either fix it yourself, or try chatting with the LLM to get it to do it.

One of the things we see in the example above is the LLM—within the Chat Notebook—making a “tool call”, here to a Wolfram Language evaluator. In the Wolfram Language there’s now a whole mechanism for defining tools for LLMs—with each tool being represented by an `LLMTool` symbolic object. In Version 14.0 there’s an experimental version of the new Wolfram LLM Tool Repository with some predefined tools:
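A minimal sketch of the idea follows. (The argument structure shown is an assumption from memory; consult the `LLMTool` reference pages for the exact form.)

```wolfram
(* hypothetical sketch: a named tool with a description and an implementation *)
timeTool = LLMTool[{"CurrentTime", "get the current date and time"}, Now &];

(* make the tool available in a programmatic LLM call *)
LLMSynthesize["What time is it right now?",
  LLMEvaluator -> <|"Tools" -> {timeTool}|>]
```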

In a default Chat Notebook, the LLM has access to some default tools, which include not only the Wolfram Language evaluator, but also things like Wolfram documentation search and Wolfram|Alpha query. And it’s common to see the LLM go back and forth trying to write “code that works”, and for example sometimes having to “resort” (much like humans do) to reading the documentation.

Something that’s new in Version 14.0 is experimental access to multimodal LLMs that can take images as well as text as input. And when this capability is enabled, it allows the LLM to “look at pictures from the code it generated”, see if they’re what was asked for, and potentially correct itself:

The deep integration of images into Wolfram Language—and Wolfram Notebooks—yields all sorts of possibilities for multimodal LLMs. Here we’re giving a plot as an image and asking the LLM how to reproduce it:

Another direction for multimodal LLMs is to take data (in the hundreds of formats accepted by Wolfram Language) and use the LLM to guide its visualization and analysis in the Wolfram Language. Here’s an example that starts from a file `data.csv` in the current directory on your computer:

One thing that’s very nice about using Wolfram Language directly is that everything you do (well, unless you use `RandomInteger`, etc.) is completely reproducible; do the same computation twice and you’ll get the same result. That’s not true with LLMs (at least right now). And so when one uses LLMs it feels like something more ephemeral and fleeting than using Wolfram Language. One has to grab any good results one gets—because one might never be able to reproduce them. Yes, it’s very helpful that one can store everything in a Chat Notebook, even if one can’t rerun it and get the same results. But the more “permanent” use of LLM results tends to be “offline”. Use an LLM “up front” to figure something out, then just use the result it gave.

One unexpected application of LLMs for us has been in suggesting names of functions. With the LLM’s “experience” of what people talk about, it’s in a good position to suggest functions that people might find useful. And, yes, when it writes code it has a habit of hallucinating such functions. But in Version 14.0 we’ve actually added one function—`DigitSum`—that was suggested to us by LLMs. And in a similar vein, we can expect LLMs to be useful in making connections to external databases, functions, etc. The LLM “reads the documentation”, and tries to write Wolfram Language “glue” code—which then can be reviewed, checked, etc., and if it’s right, can be used henceforth.
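The function itself is straightforward:

```wolfram
DigitSum[12345]    (* 15: the sum 1+2+3+4+5 *)
DigitSum[255, 2]   (* 8: digit sum in base 2, since 255 is 11111111 *)
```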

Then there’s data curation, which is a field that—through Wolfram|Alpha and many of our other efforts—we’ve become extremely expert at over the past couple of decades. How much can LLMs help with that? They certainly don’t “solve the whole problem”, but integrating them with the tools we already have has allowed us over the past year to speed up some of our data curation pipelines by factors of two or more.

If we look at the whole stack of technology and content that’s in the modern Wolfram Language, the overwhelming majority of it isn’t helped by LLMs, and isn’t likely to be. But there are many—sometimes unexpected—corners where LLMs can dramatically improve heuristics or otherwise solve problems. And in Version 14.0 there are starting to be a wide variety of “LLM inside” functions.

An example is `TextSummarize`, which is a function we’ve considered adding for many versions—but now, thanks to LLMs, can finally implement to a useful level:
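For example, applied to a built-in example text (an LLM connection is required for this to run):

```wolfram
(* summarize a long text; an LLM is used behind the scenes *)
TextSummarize[ExampleData[{"Text", "AliceInWonderland"}]]
```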

The main LLMs that we’re using right now are based on external services. But we’re building capabilities to allow us to run LLMs in local Wolfram Language installations as soon as that’s technically feasible. And one capability that’s actually part of our mainline machine learning effort is `NetExternalObject`—a way of representing symbolically an externally defined neural net that can be run inside Wolfram Language. `NetExternalObject` allows you, for example, to take any network in ONNX form and effectively treat it as a component in a Wolfram Language neural net. Here’s a network for image depth estimation—that we’re here importing from an external repository (though in this case there’s actually a similar network already in the Wolfram Neural Net Repository):

Now we can apply this imported network to an image that’s been encoded with our built-in image encoder—then we’re taking the result and visualizing it:

It’s often very convenient to be able to run networks locally, but it can sometimes take quite high-end hardware to do so. For example, there’s now a function in the Wolfram Function Repository that does image synthesis entirely locally—but to run it, you do need a GPU with at least 8 GB of VRAM:

By the way, based on LLM principles (and ideas like transformers) there’ve been other related advances in machine learning that have been strengthening a whole range of Wolfram Language areas—with one example being image segmentation, where `ImageSegmentationComponents` now provides robust “content-sensitive” segmentation:

When Mathematica 1.0 was released in 1988, it was a “wow” that, yes, now one could routinely do integrals symbolically by computer. And it wasn’t long before we got to the point—first with indefinite integrals, and later with definite integrals—where what’s now the Wolfram Language could do integrals better than any human. So did that mean we were “finished” with calculus? Well, no. First there were differential equations, and partial differential equations. And it took a decade to get symbolic ODEs to a beyond-human level. And with symbolic PDEs it took until just a few years ago. Somewhere along the way we built out discrete calculus, asymptotic expansions and integral transforms. And we also implemented lots of specific features needed for applications like statistics, probability, signal processing and control theory. But even now there are still frontiers.

And in Version 14 there are significant advances around calculus. One category concerns the structure of answers. Yes, one can have a formula that correctly represents the solution to a differential equation. But is it in the best, simplest or most useful form? Well, in Version 14 we’ve worked hard to make sure it is—often dramatically reducing the size of expressions that get generated.

Another advance has to do with expanding the range of “pre-packaged” calculus operations. We’ve been able to do derivatives ever since Version 1.0. But in Version 14 we’ve added implicit differentiation. And, yes, one can give a basic definition for this easily enough using ordinary differentiation and equation solving. But by adding an explicit `ImplicitD` we’re packaging all that up—and handling the tricky corner cases—so that it becomes routine to use implicit differentiation wherever you want:

Another category of pre-packaged calculus operations new in Version 14 are ones for vector-based integration. These were always possible to do in a “do-it-yourself” mode. But in Version 14 they are now streamlined built-in functions—that, by the way, also cover corner cases, etc. And what made them possible is actually a development in another area: our decade-long project to add geometric computation to Wolfram Language—which gave us a natural way to describe geometric constructs such as curves and surfaces:

Related functionality new in Version 14 is `ContourIntegrate`:

Functions like `ContourIntegrate` just “get the answer”. But if one’s learning or exploring calculus it’s often also useful to be able to do things in a more step-by-step way. In Version 14 you can start with an inactive integral

and explicitly do operations like changing variables:
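For example, a trigonometric substitution applied to an inactive integral, sketched with `IntegrateChangeVariables` (the change-of-variables function that accompanies this step-by-step functionality):

```wolfram
(* an inactive integral, then a change of variables x == Sin[θ] *)
i = Inactive[Integrate][Sqrt[1 - x^2], {x, 0, 1}];
IntegrateChangeVariables[i, θ, x == Sin[θ]]
```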

Sometimes actual answers get expressed in inactive form, particularly as infinite sums:

And now in Version 14 the function `TruncateSum` lets you take such a sum and generate a truncated “approximation”:
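A sketch of the idea. (The exact form of the second argument is an assumption; see the `TruncateSum` reference page.)

```wolfram
(* an inactive infinite sum, truncated to its first few terms *)
s = Inactive[Sum][x^k/k!, {k, 0, Infinity}];
TruncateSum[s, 3]
```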

Functions like `D` and `Integrate`—as well as `LineIntegrate` and `SurfaceIntegrate`—are, in a sense, “classic calculus”, taught and used for more than three centuries. But in Version 14 we also support what we can think of as “emerging” calculus operations, like fractional differentiation:
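For example, the half-derivative of x; applying the half-derivative twice recovers the ordinary derivative:

```wolfram
FractionalD[x, {x, 1/2}]                      (* (2 Sqrt[x])/Sqrt[Pi] *)
FractionalD[(2 Sqrt[x])/Sqrt[Pi], {x, 1/2}]   (* 1 *)
```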

What are the primitives from which we can best build our conception of computation? That’s at some level the question I’ve been asking for more than four decades, and what’s determined the functions and structures at the core of the Wolfram Language.

And as the years go by, and we see more and more of what’s possible, we recognize and invent new primitives that will be useful. And, yes, the world—and the ways people interact with computers—change too, opening up new possibilities and bringing new understanding of things. Oh, and this year there are LLMs which can “get the intellectual sense of the world” and suggest new functions that can fit into the framework we’ve created with the Wolfram Language. (And, by the way, there’ve also been lots of great suggestions made by the audiences of our design review livestreams.)

One new construct added in Version 13.1—and that I personally have found very useful—is `Threaded`. When a function is listable—as `Plus` is—the top levels of lists get combined:

But sometimes you want one list to be “threaded into” the other at the lowest level, not the highest. And now there’s a way to specify that, using `Threaded`:
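The contrast in a minimal case:

```wolfram
{{1, 2}, {3, 4}} + {10, 20}             (* {{11, 12}, {23, 24}}: combines at the top level *)
{{1, 2}, {3, 4}} + Threaded[{10, 20}]   (* {{11, 22}, {13, 24}}: threads into each row *)
```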

In a sense, `Threaded` is part of a new wave of symbolic constructs that have “ambient effects” on lists. One very simple example (introduced in 2015) is `Nothing`:

Another, introduced in 2020, is `Splice`:
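Side by side, both constructs in action:

```wolfram
{1, Nothing, 2, Nothing, 3}   (* {1, 2, 3}: Nothing removes itself from the list *)
{1, Splice[{2, 3}], 4}        (* {1, 2, 3, 4}: Splice flattens its contents in *)
```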

An old chestnut of Wolfram Language design concerns the way infinite evaluation loops are handled. And in Version 13.2 we introduced the symbolic construct `TerminatedEvaluation` to provide better definition of how out-of-control evaluations have been terminated:

In a curious connection, in the computational representation of physics in our recent Physics Project, the direct analog of nonterminating evaluations is what makes possible the seemingly unending universe in which we live.

But what is actually going on “inside an evaluation”, terminating or not? I’ve always wanted a good representation of this. And in fact back in Version 2.0 we introduced `Trace` for this purpose:

But just how much detail of what the evaluator does should one show? Back in Version 2.0 we introduced the option `TraceOriginal` that traces every path followed by the evaluator:

But often this is way too much. And in Version 14.0 we’ve introduced the new setting `TraceOriginal → Automatic`, which doesn’t include in its output evaluations that don’t do anything:

This may seem pedantic, but when one has an expression of any substantial size, it’s a crucial piece of pruning. So, for example, here’s a graphical representation of a simple arithmetic evaluation, with `TraceOriginal → True`:

And here’s the corresponding “pruned” version, with `TraceOriginal → Automatic`:

(And, yes, the structures of these graphs are closely related to things like the causal graphs we construct in our Physics Project.)

In the effort to add computational primitives to the Wolfram Language, two new entrants in Version 14.0 are `Comap` and `ComapApply`. The function `Map` takes a function `f` and “maps it” over a list:

`Comap` does the “mathematically co-” version of this, taking a list of functions and “comapping” them onto a single argument:

Why is this useful? As an example, one might want to apply three different statistical functions to a single list. And now it’s easy to do that, using `Comap`:

By the way, as with `Map`, there’s also an operator form for `Comap`:

`Comap` works well when the functions it’s dealing with take just one argument. If one has functions that take multiple arguments, `ComapApply` is what one typically wants:
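A minimal contrast between the two:

```wolfram
Comap[{Min, Max, Mean}, {1, 2, 3, 4}]   (* {1, 4, 5/2} *)
ComapApply[{Plus, Times}, {2, 3, 4}]    (* {9, 24} *)
```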

Talking of “co-like” functions, a new function added in Version 13.2 is `PositionSmallest`. `Min` gives the smallest element in a list; `PositionSmallest` instead says where the smallest elements are:
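Note that ties are reported too:

```wolfram
PositionSmallest[{3, 1, 2, 1}]   (* {2, 4}: the smallest element occurs twice *)
```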

One of the important objectives in the Wolfram Language is to have as much as possible “just work”. When we released Version 1.0 strings could be assumed just to contain ordinary ASCII characters, or perhaps to have an external character encoding defined. And, yes, it could be messy not to know “within the string itself” what characters were supposed to be there. And by the time of Version 3.0 in 1996 we’d become contributors to, and early adopters of, Unicode, which provided a standard encoding for “16-bits’-worth” of characters. And for many years this served us well. But in time—and particularly with the growth of emoji—16 bits wasn’t enough to encode all the characters people wanted to use. So a few years ago we began rolling out support for 32-bit Unicode, and in Version 13.1 we integrated it into notebooks—in effect making strings something much richer than before:

And, yes, you can use Unicode everywhere now:

Back when Version 1.0 was released, a megabyte was a lot of memory. But 35 years later we routinely deal with gigabytes. And one of the things all that memory makes practical is computation with video. We first introduced `Video` experimentally in Version 12.1 in 2020. And over the past three years we’ve been systematically broadening and strengthening our ability to deal with video in Wolfram Language. Probably the single most important advance is that things around video now—as much as possible—“just work”, without “creaking” under the strain of handling such large amounts of data.

We can directly capture video into notebooks, and we can robustly play video anywhere within a notebook. We’ve also added options for where to store the video so that it’s conveniently accessible to you and anyone else you want to give access to it.

There’s lots of complexity in the encoding of video—and we now robustly and transparently support more than 500 codecs. We also do lots of convenient things automatically, like rotating portrait-mode videos—and being able to apply image processing operations like `ImageCrop` across whole videos. In every version, we’ve been further optimizing the speed of some video operation or another.

But a particularly big focus has been on video generators: programmatic ways to produce videos and animations. One basic example is `AnimationVideo`, which produces the same kind of output as `Animate`, but as a `Video` object that can either be displayed directly in a notebook, or exported in MP4 or some other format:
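A minimal sketch; any frame-generating expression can stand in for the `Plot` here:

```wolfram
(* a traveling sine wave, rendered frame by frame into a Video object *)
AnimationVideo[
  Plot[Sin[x + t], {x, 0, 10}, PlotRange -> {-1, 1}],
  {t, 0, 2 Pi}]
```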

`AnimationVideo` is based on computing each frame in a video by evaluating an expression. Another class of video generators take an existing visual construct, and simply “tour” it. `TourVideo` “tours” images, graphics and geo graphics; `Tour3DVideo` (new in Version 14.0) tours 3D geometry:

A very powerful capability in Wolfram Language is being able to apply arbitrary functions to videos. One example of how this can be done is `VideoFrameMap`, which maps a function across frames of a video, and which was made efficient in Version 13.2:
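In sketch form (the file path here is hypothetical):

```wolfram
v = Video["path/to/clip.mp4"];   (* hypothetical file *)
VideoFrameMap[ImageAdjust, v]    (* a new Video with ImageAdjust applied to each frame *)
```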

And although Wolfram Language isn’t intended as an interactive video editing system, we’ve made sure that it’s possible to do streamlined programmatic video editing in the language, and for example in Version 14.0 we’ve added things like transition effects in `VideoJoin` and timed overlays in `OverlayVideo`.

With every new version of Wolfram Language we add new capabilities to extend yet further the domain of the language. But we also put a lot of effort into something less immediately visible: making existing capabilities faster, stronger and sleeker.

And in Version 14 two areas where we can see some examples of all these are dates and quantities. We introduced the notion of symbolic dates (`DateObject`, etc.) nearly a decade ago. And over the years since then we’ve built many things on this structure. And in the process of doing this it’s become clear that there are certain flows and paths that are particularly common and convenient. At the beginning what mattered most was just to make sure that the relevant functionality existed. But over time we’ve been able to see what should be streamlined and optimized, and we’ve steadily been doing that.

In addition, as we’ve worked towards new and different applications, we’ve seen “corners” that need to be filled in. So, for example, astronomy is an area we’ve significantly developed in Version 14, and supporting astronomy has required adding several new “high-precision” time capabilities, such as the `TimeSystem` option, as well as new astronomy-oriented calendar systems. Another example concerns date arithmetic. What should happen if you want to add a month to January 30? Where should you land? Different kinds of business applications and contracts make different assumptions—and so we added a `Method` option to functions like `DatePlus` to handle this. Meanwhile, having realized that date arithmetic is involved in the “inner loop” of certain computations, we optimized it—achieving a more than 100x speedup in Version 14.0.
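The January 30 case can be sketched like this (the specific settings accepted by the `Method` option aren’t shown, since they encode different business conventions):

```wolfram
(* one month after January 30, 2024 -- where you land depends on the convention *)
DatePlus[DateObject[{2024, 1, 30}], Quantity[1, "Months"]]
```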

Wolfram|Alpha has been able to deal with units ever since it was first launched in 2009—now more than 10,000 of them. And in 2012 we introduced `Quantity` to represent quantities with units in the Wolfram Language. And over the past decade we’ve been steadily smoothing out a whole series of complicated gotchas and issues with units. For example, what does the following mean?

At first our priority with `Quantity` was to get it working as broadly as possible, and to integrate it as widely as possible into computations, visualizations, etc. across the system. But as its capabilities have expanded, so have its uses, repeatedly driving the need to optimize its operation for particular common cases. And indeed between Version 13 and Version 14 we’ve dramatically sped up many things related to `Quantity`, often by factors of 1000 or more.
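The kind of `Quantity` arithmetic that has been optimized looks like:

```wolfram
(* mixed-unit arithmetic; units are reconciled automatically *)
Quantity[1, "Meters"] + Quantity[10, "Centimeters"]

(* explicit unit conversion *)
UnitConvert[Quantity[60, "Miles"/"Hours"], "Meters"/"Seconds"]
```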

Talking of speedups, another example—made possible by new algorithms operating on multithreaded CPUs—concerns polynomials. We’ve worked with polynomials in Wolfram Language since Version 1, but in Version 13.2 there was a dramatic speedup of up to 1000x on operations like polynomial factoring.

In addition, a new algorithm in Version 14.0 dramatically speeds up numerical solutions to polynomial and transcendental equations—and, together with the new `MaxRoots` option, allows us, for example, to pick off a few roots from a degree-one-million polynomial
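A sketch of the kind of call involved (the particular polynomial is illustrative):

```wolfram
(* compute just 4 of the million roots, rather than all of them *)
NSolve[x^1000000 - x - 1 == 0, x, MaxRoots -> 4]
```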

or to find roots of a transcendental equation that we could not even attempt before without pre-specifying bounds on their values:

Another “old” piece of functionality with recent enhancement concerns mathematical functions. Ever since Version 1.0 we’ve set up mathematical functions so that they can be computed to arbitrary precision:
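For example:

```wolfram
(* evaluate a mathematical function to 50 digits *)
N[Gamma[1/3], 50]
```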

But in recent versions we’ve wanted to be “more precise about precision”, and to be able to rigorously compute just what range of outputs are possible given the range of values provided as input:

But every function for which we do this effectively requires a new theorem, and we’ve been steadily increasing the number of functions covered—now more than 130—so that this “just works” when you need to use it in a computation.

Trees are useful. We first introduced them as basic objects in the Wolfram Language only in Version 12.3. But now that they’re there, we’re discovering more and more places they can be used. And to support that, we’ve been adding more and more capabilities to them.

One area that’s advanced significantly since Version 13 is the rendering of trees. We tightened up the general graphic design, but, more importantly, we introduced many new options for how rendering should be done.

For example, here’s a random tree where we’ve specified that for all nodes only 3 children should be explicitly displayed: the others are elided away:

Here we’re adding several options to define the rendering of the tree:

By default, the branches in trees are labeled with integers, just like parts in an expression. But in Version 13.1 we added support for named branches defined by associations:

Our original conception of trees was very centered around having elements one would explicitly address, and that could have “payloads” attached. But what became clear is that there were applications where all that mattered was the structure of the tree, not anything about its elements. So we added `UnlabeledTree` to create “pure trees”:

Trees are useful because many kinds of structures are basically trees. And since Version 13 we’ve added capabilities for converting trees to and from various kinds of structures. For example, here’s a simple `Dataset` object:

You can use `ExpressionTree` to convert this to a tree:

And `TreeExpression` to convert it back:
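As a minimal round-trip sketch (using a plain nested list rather than the `Dataset` above):

```wolfram
tree = ExpressionTree[{{1, 2}, {3, 4}}];  (* expression -> tree *)
TreeExpression[tree]                      (* tree -> expression *)
```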

We’ve also added capabilities for converting to and from JSON and XML, as well as for representing file directory structures as trees:

In Version 1.0 we had integers, rational numbers and real numbers. In Version 3.0 we added algebraic numbers (represented implicitly by `Root`)—and a dozen years later we added algebraic number fields and transcendental roots. For Version 14 we’ve now added another (long-awaited) “number-related” construct: finite fields.

Here’s our symbolic representation of the field of integers modulo 7:

And now here’s a specific element of that field

which we can immediately compute with:
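A sketch of the flow (display forms in an actual notebook are richer than shown here):

```wolfram
ff = FiniteField[7];  (* the field of integers modulo 7 *)
a = ff[3];            (* a specific element of the field *)
{a + a, a^5, 1/a}     (* arithmetic stays inside the field *)
```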

But what’s really important about what we’ve done with finite fields is that we’ve fully integrated them into other functions in the system. So, for example, we can factor a polynomial whose coefficients are in a finite field:
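The long-standing `Modulus` mechanism illustrates the underlying idea; in Version 14 the coefficients can instead be genuine finite-field elements:

```wolfram
(* factor over GF(5): x^2 + 1 == (x + 2)(x + 3) mod 5 *)
Factor[x^2 + 1, Modulus -> 5]
```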

We can also do things like find solutions to equations over finite fields. So here, for example, is a point on a Fermat curve over the finite field GF(17^3):

And here is a power of a matrix with elements over the same finite field:

A major new capability added since Version 13 is astro computation. It begins with being able to compute to high precision the positions of things like planets. Even knowing what one means by “position” is complicated, though—with lots of different coordinate systems to deal with. By default `AstroPosition` gives the position in the sky at the current time from your `Here` location:
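For example:

```wolfram
(* the apparent position of Saturn in the sky, now, from the current location *)
AstroPosition[Entity["Planet", "Saturn"]]
```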

But one can instead ask about a different coordinate system, like global galactic coordinates:

And now here’s a plot of the distance between Saturn and Jupiter over a 50-year period:

In direct analogy to `GeoGraphics`, we’ve added `AstroGraphics`, here showing a patch of sky around the current position of Saturn:

And this now shows the sequence of positions for Saturn over the course of a couple of years—yes, including retrograde motion:

There are many styling options for `AstroGraphics`. Here we’re adding a background of the “galactic sky”:

And here we’re including renderings for constellations (and, yes, we had an artist draw them):

Something specifically new in Version 14.0 has to do with extended handling of solar eclipses. We always try to deliver new functionality as fast as we can. But in this case there was a very specific deadline: the total solar eclipse visible from the US on April 8, 2024. We’ve had the ability to do global computations about solar eclipses for some time (in fact since shortly before the 2017 eclipse). But now we can also do detailed local computations right in the Wolfram Language.

So, for example, here’s a somewhat detailed overall map of the April 8, 2024, eclipse:

Now here’s a plot of the magnitude of the eclipse over a few hours, complete with a little “rampart” associated with the period of totality:

And here’s a map of the region of totality every minute just after the moment of maximum eclipse:

We first introduced computable data on biological organisms back when Wolfram|Alpha was released in 2009. But in Version 14—following several years of work—we’ve dramatically broadened and deepened the computable data we have about biological organisms.

So for example here’s how we can figure out what species have cheetahs as predators:
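A sketch of the kind of query involved (both the canonical entity name and the property name used here are assumptions for illustration):

```wolfram
(* species that the cheetah preys upon; "Prey" is a hypothetical property name *)
Entity["Species", "Species:AcinonyxJubatus"]["Prey"]
```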

And here are pictures of these:

Here’s a map of countries where cheetahs have been seen (in the wild):

We now have data—curated from a great many sources—on more than a million species of animals, as well as most of the plants, fungi, bacteria, viruses and archaea that have been described. And for animals, for example, we have nearly 200 properties that are extensively filled in. Some are taxonomic properties:

Some are physical properties:

Some are genetic properties:

Some are ecological properties (yes, the cheetah is not the apex predator):

It’s useful to be able to get properties of individual species, but the real power of our curated computable data shows up when one does larger-scale analyses. Like here’s a plot of the lengths of genomes for organisms with the longest ones across our collection of organisms:

Or here’s a histogram of the genome lengths for organisms in the human gut microbiome:

And here’s a scatterplot of the lifespans of birds against their weights:

Following the idea that cheetahs aren’t apex predators, this is a graph of what’s “above” them in the food chain:

We began the process of introducing chemical computation into the Wolfram Language in Version 12.0, and by Version 13 we had good coverage of atoms, molecules, bonds and functional groups. Now in Version 14 we’ve added coverage of chemical formulas, amounts of chemicals—and chemical reactions.

Here’s a chemical formula, which basically just gives a “count of atoms”:

Now here are specific molecules with that formula:

Let’s pick one of these molecules:

Now in Version 14 we have a way to represent a certain quantity of molecules of a given type—here 1 gram of methylcyclopentane:

`ChemicalConvert` can convert to a different specification of quantity, here moles:

And here to a count of molecules:

But now the bigger story is that in Version 14 we can represent not just individual types of molecules, and quantities of molecules, but also chemical reactions. Here we give a “sloppy” unbalanced representation of a reaction, and `ReactionBalance` gives us the balanced version:
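For example, something like:

```wolfram
(* an unbalanced reaction given as a string; the balanced form is 2H2 + O2 -> 2H2O *)
ReactionBalance["H2 + O2 -> H2O"]
```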

And now we can extract the formulas for the reactants:

We can also give a chemical reaction in terms of molecules:

But with our symbolic representation of molecules and reactions, there’s now a big thing we can do: represent classes of reactions as “pattern reactions”, and work with them using the same kinds of concepts as we use in working with patterns for general expressions. So, for example, here’s a symbolic representation of the hydrohalogenation reaction:

Now we can apply this pattern reaction to particular molecules:

Here’s a more elaborate example, in this case entered using a SMARTS string:

Here we’re applying the reaction just once:

And now we’re doing it repeatedly

in this case generating longer and longer molecules (which in this case happen to be polypeptides):

Every minute of every day, new data is being added to the Wolfram Knowledgebase. Much of it is coming automatically from real-time feeds. But we also have a very large-scale ongoing curation effort with humans in the loop. We’ve built sophisticated (Wolfram Language) automation for our data curation pipeline over the years—and this year we’ve been able to increase efficiency in some areas by using LLM technology. But it’s hard to do curation right, and our long-term experience is that to do so ultimately requires having human experts in the loop, and that’s what we have.

So what’s new since Version 13.0? 291,842 new notable current and historical people; 264,467 music works; 118,538 music albums; 104,024 named stars; and so on. Sometimes the addition of an entity is driven by the new availability of reliable data; often it’s driven by the need to use that entity in some other piece of functionality (e.g. stars to render in `AstroGraphics`). But more than just adding entities there’s the issue of filling in values of properties of existing entities. And here again we’re always making progress, sometimes integrating newly available large-scale secondary data sources, and sometimes doing direct curation ourselves from primary sources.

A recent example where we needed to do direct curation was in data on alcoholic beverages. We have very extensive data on hundreds of thousands of types of foods and drinks. But none of our large-scale sources included data on alcoholic beverages. So that’s an area where we need to go to primary sources (in this case typically the original producers of products) and curate everything for ourselves.

So, for example, we can now ask for something like the distribution of flavors of different varieties of vodka (actually, personally, not being a consumer of such things, I had no idea vodka even had flavors…):

But beyond filling out entities and properties of existing types, we’ve also steadily been adding new entity types. One recent example is geological formations, 13,706 of them:

So now, for example, we can specify where *T. rex* have been found

and we can show those regions on a map:

PDEs are hard. It’s hard to solve them. And it’s hard to even specify what exactly you want to solve. But we’ve been on a multi-decade mission to “consumerize” PDEs and make them easier to work with. Many things go into this. You need to be able to easily specify elaborate geometries. You need to be able to easily define mathematically complicated boundary conditions. You need to have a streamlined way to set up the complicated equations that come out of underlying physics. Then you have to—as automatically as possible—do the sophisticated numerical analysis to efficiently solve the equations. But that’s not all. You also often need to visualize your solution, compute other things from it, or run optimizations of parameters over it.

It’s a deep use of what we’ve built with Wolfram Language—touching many parts of the system. And the result is something unique: a truly streamlined and integrated way to handle PDEs. One’s not dealing with some (usually very expensive) “just for PDEs” package; what we now have is a “consumerized” way to handle PDEs whenever they’re needed—for engineering, science, or whatever. And, yes, being able to connect machine learning, or image computation, or curated data, or data science, or real-time sensor feeds, or parallel computing, or, for that matter, Wolfram Notebooks, to PDEs just makes them so much more valuable.

We’ve had “basic, raw `NDSolve`” since 1991. But what’s taken decades to build is all the structure around that to let one conveniently set up—and efficiently solve—real-world PDEs, and connect them into everything else. It’s taken developing a whole tower of underlying algorithmic capabilities such as our more-flexible-and-integrated-than-ever-before industrial-strength computational geometry and finite element methods. But beyond that it’s taken creating a language for specifying real-world PDEs. And here the symbolic nature of the Wolfram Language—and our whole design framework—has made possible something unique that has allowed us to dramatically simplify and consumerize the use of PDEs.

It’s all about providing symbolic “construction kits” for PDEs and their boundary conditions. We started this about five years ago, progressively covering more and more application areas. In Version 14 we’ve particularly focused on solid mechanics, fluid mechanics, electromagnetics and (one-particle) quantum mechanics.

Here’s an example from solid mechanics. First, we define the variables we’re dealing with (displacement and underlying coordinates):

Next, we specify the parameters we want to use to describe the solid material we’re going to work with:

Now we can actually set up our PDE—using symbolic PDE specifications like `SolidMechanicsPDEComponent`—here for the deformation of a solid object pulled on one side:

And, yes, “underneath”, these simple symbolic specifications turn into a complicated “raw” PDE:

Now we are ready to actually solve our PDE in a particular region, i.e. for an object with a particular shape:

And now we can visualize the result, which shows how our object stretches when it’s pulled on:

The way we’ve set things up, the material for our object is an idealization of something like rubber. But in the Wolfram Language we now have ways to specify all sorts of detailed properties of materials. So, for example, we can add reinforcement as a unit vector in a particular direction (say in practice with fibers) to our material:

Then we can rerun what we did before

but now we get a slightly different result:

Another major PDE domain that’s new in Version 14.0 is fluid flow. Let’s do a 2D example. Our variables are 2D velocity and pressure:

Now we can set up our fluid system in a particular region, with no-slip conditions on all walls except at the top where we assume fluid is flowing from left to right. The only parameter needed is the Reynolds number. And instead of just solving our PDEs for a single Reynolds number, let’s create a parametric solver that can take any specified Reynolds number:

Now here’s the result for Reynolds number 100:

But with the way we’ve set things up, we can as well generate a whole video as a function of Reynolds number (and, yes, the `Parallelize` speeds things up by generating different frames in parallel):

Much of our work in PDEs involves catering to the complexities of real-world engineering situations. But in Version 14.0 we’re also adding features to support “pure physics”, and in particular to support quantum mechanics done with the Schrödinger equation. So here, for example, is the 2D 1-particle Schrödinger equation:
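In conventional notation, and in units where ℏ = m = 1 (an assumption here), the equation being set up is:

```wolfram
(* the time-dependent 2D one-particle Schrödinger equation *)
pde = I D[\[Psi][t, x, y], t] == -1/2 Laplacian[\[Psi][t, x, y], {x, y}]
```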

Here’s the region we’re going to be solving over—showing explicit discretization:

Now we can solve the equation, adding in some boundary conditions:

And now we get to visualize a Gaussian wave packet scattering around a barrier:

Systems engineering is a big field, but it’s one where the structure and capabilities of the Wolfram Language provide unique advantages—that over the past decade have allowed us to build out rather complete industrial-strength support for modeling, analysis and control design for a wide range of types of systems. It’s all an integrated part of the Wolfram Language, accessible through the computational and interface structure of the language. But it’s also integrated with our separate Wolfram System Modeler product, that provides a GUI-based workflow for system modeling and exploration.

Shared with System Modeler are large collections of domain-specific modeling libraries. And, for example, since Version 13, we’ve added libraries in areas such as battery engineering, hydraulic engineering and aircraft engineering—as well as educational libraries for mechanical engineering, thermal engineering, digital electronics, and biology. (We’ve also added libraries for areas such as business and public policy simulation.)

A typical workflow for systems engineering begins with the setting up of a model. The model can be built from scratch, or assembled from components in model libraries—either visually in Wolfram System Modeler, or programmatically in the Wolfram Language. For example, here’s a model of an electric motor that’s turning a load through a flexible shaft:

Once one’s got a model, one can then simulate it. Here’s an example where we’ve set one parameter of our model (the moment of inertia of the load), and we’re computing the values of two others as a function of time:

A new capability in Version 14.0 is being able to see the effect of uncertainty in parameters (or initial values, etc.) on the behavior of a system. So here, as an example, we’re saying the value of the parameter is not definite, but is instead distributed according to a normal distribution—then we’re seeing the distribution of output results:

The motor with flexible shaft that we’re looking at can be thought of as a “multidomain system”, combining electrical and mechanical components. But the Wolfram Language (and Wolfram System Modeler) can also handle “mixed systems”, combining analog and digital (i.e. continuous and discrete) components. Here’s a fairly sophisticated example from the world of control systems: a helicopter model connected in a closed loop to a digital control system:

This whole model system can be represented symbolically just by:

And now we compute the input-output response of the model:

Here’s specifically the output response:

But now we can “drill in” and see specific subsystem responses, here of the zero-order hold device (labeled ZOH above)—complete with its little digital steps:

But what if we want to design the control systems ourselves? Well, in Version 14 we can now apply all our Wolfram Language control systems design functionality to arbitrary system models. Here’s an example of a simple model, in this case in chemical engineering (a continuously stirred tank):

Now we can take this model and design an LQG controller for it—then assemble a whole closed-loop system for it:

Now we can simulate the closed-loop system—and see that the controller succeeds in bringing the final value to 0:

Graphics have always been an important part of the story of the Wolfram Language, and for more than three decades we’ve been progressively enhancing and updating their appearance and functionality—sometimes with help from advances in hardware (e.g. GPU) capabilities.

Since Version 13 we’ve added a variety of “decorative” (or “annotative”) effects in 2D graphics. One example (useful for putting captions on things) is `Haloing`:

Another example is `DropShadowing`:

All of these are specified symbolically, and can be used throughout the system (e.g. in hover effects, etc). And, yes, there are many detailed parameters you can set:

A significant new capability in Version 14.0 is convenient texture mapping. We’ve had low-level polygon-by-polygon textures for a decade and a half. But now in Version 14.0 we’ve made it straightforward to map textures onto whole surfaces. Here’s an example wrapping a texture onto a sphere:

And here’s wrapping the same texture onto a more complicated surface:

A significant subtlety is that there are many ways to map what amount to “texture coordinate patches” onto surfaces. The documentation illustrates new, named cases:

And now here’s what happens with stereographic projection onto a sphere:

Here’s an example of “surface texture” for the planet Venus

and here it’s been mapped onto a sphere, which can be rotated:

Here’s a “flowerified” bunny:

Things like texture mapping help make graphics visually compelling. Since Version 13 we’ve also added a variety of “live visualization” capabilities that automatically “bring visualizations to life”. For example, any plot now by default has a “coordinate mouseover”:

As usual, there’s lots of ways to control such “highlighting” effects:

One might say it’s been two thousand years in the making. But four years ago (Version 12) we began to introduce a computable version of Euclid-style synthetic geometry.

The idea is to specify geometric scenes symbolically by giving a collection of (potentially implicit) constraints:

We can then generate a random instance of geometry consistent with the constraints—and in Version 14 we’ve considerably enhanced our ability to make sure that geometry will be “typical” and non-degenerate:
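The basic flow looks like this (the particular scene is illustrative):

```wolfram
(* a triangle constrained, purely by assertion, to be equilateral *)
scene = GeometricScene[{a, b, c},
   {Triangle[{a, b, c}],
    GeometricAssertion[Triangle[{a, b, c}], "Equilateral"]}];

RandomInstance[scene]  (* a random concrete instance satisfying the constraints *)
```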

But now a new feature of Version 14 is that we can find values of geometric quantities that are determined by the constraints:

Here’s a slightly more complicated case:

And here we’re now solving for the areas of two triangles in the figure:

We’ve always been able to give explicit styles for particular elements of a scene:

Now one of the new features in Version 14 is being able to give general “geometric styling rules”, here just assigning random colors to each element:

Our goal with Wolfram Language is to make it as easy as possible to express oneself computationally. And a big part of achieving that is the coherent design of the language itself. But there’s another part as well, which is being able to actually enter Wolfram Language input one wants—say in a notebook—as easily as possible. And with every new version we make enhancements to this.

One area that’s been in continuous development is interactive syntax highlighting. We first added syntax highlighting nearly two decades ago—and over time we’ve progressively made it more and more sophisticated, responding both as you type, and as code gets executed. Some highlighting has always had obvious meaning. But particularly highlighting that is dynamic and based on cursor position has sometimes been harder to interpret. And in Version 14—leveraging the brighter color palettes that have become the norm in recent years—we’ve tuned our dynamic highlighting so it’s easier to quickly tell “where you are” within the structure of an expression:

On the subject of “knowing what one has”, another enhancement—added in Version 13.2—is differentiated frame coloring for different kinds of visual objects in notebooks. Is that thing one has a graphic? Or an image? Or a graph? Now one can tell from the color of frame when one selects it:

An important aspect of the Wolfram Language is that the names of built-in functions are spelled out enough that it’s easy to tell what they do. But often the names are therefore necessarily quite long, and so it’s important to be able to autocomplete them when one’s typing. In 13.3 we added the notion of “fuzzy autocompletion” that not only “completes to the end” a name one’s typing, but also can fill in intermediate letters, change capitalization, etc. Thus, for example, just typing `lll` brings up an autocompletion menu that begins with `ListLogLogPlot`:

A major user interface update that first appeared in Version 13.1—and has been enhanced in subsequent versions—is a default toolbar for every notebook:

The toolbar provides immediate access to evaluation controls, cell formatting and various kinds of input (like inline cells, hyperlinks, drawing canvas, etc.)—as well as to things like cloud publishing, documentation search and “chat” (i.e. LLM) settings.

Much of the time, it’s useful to have the toolbar displayed in any notebook you’re working with. But on the left-hand side there’s a tiny button that lets you minimize the toolbar:

In 14.0 there’s a Preferences setting that makes the toolbar come up minimized in any new notebook you create—and this in effect gives you the best of both worlds: you have immediate access to the toolbar, but your notebooks don’t have anything “extra” that might distract from their content.

Another thing that’s advanced since Version 13 is the handling of “summary” forms of output in notebooks. A basic example is what happens if you generate a very large result. By default only a summary of the result is actually displayed. But now there’s a bar at the bottom that gives various options for how to handle the actual output:

By default, the output is only stored in your current kernel session. But by pressing the `Iconize` button you get an iconized form that will appear directly in your notebook (or one that can be copied anywhere) and that “has the whole output inside”. There’s also a `Store full expression in notebook` button, which will “invisibly” store the output expression “behind” the summary display.

If the expression is stored in the notebook, then it’ll be persistent across kernel sessions. Otherwise, well, you won’t be able to get to it in a different kernel session; the only thing you’ll have is the summary display:

It’s a similar story for large “computational objects”. Like here’s a `Nearest` function with a million data points:

By default, the data is just something that exists in your current kernel session. But now there’s a menu that lets you save the data in various persistent locations:

There are many ways to run the Wolfram Language. Even in Version 1.0 we had the notion of remote kernels: the notebook front end running on one machine (in those days essentially always a Mac, or a NeXT), and the kernel running on a different machine (in those days sometimes even connected by phone lines). But a decade ago came a major step forward: the Wolfram Cloud.

There are really two distinct ways in which the cloud is used. The first is in delivering a notebook experience similar to our longtime desktop experience, but running purely in a browser. And the second is in delivering APIs and other programmatically accessed capabilities—notably, even at the beginning, a decade ago, through things like `APIFunction`.

The Wolfram Cloud has been the target of intense development now for nearly 15 years. Alongside it have also come Wolfram Application Server and Wolfram Web Engine, which provide more streamlined support specifically for APIs (without things like user management, etc., but with things like clustering).

All of these—but particularly the Wolfram Cloud—have become core technology capabilities for us, supporting many of our other activities. So, for example, the Wolfram Function Repository and Wolfram Paclet Repository are both based on the Wolfram Cloud (and in fact this is true of our whole resource system). And when we came to build the Wolfram plugin for ChatGPT earlier this year, using the Wolfram Cloud allowed us to have the plugin deployed within a matter of days.

Since Version 13 there have been quite a few very different applications of the Wolfram Cloud. One is for the function `ARPublish`, which takes 3D geometry and puts it in the Wolfram Cloud with appropriate metadata to allow phones to get augmented-reality versions from a QR code of a cloud URL:

On the Cloud Notebook side, there’s been a steady increase in usage, notably of embedded Cloud Notebooks, which have for example become common on Wolfram Community, and are used all over the Wolfram Demonstrations Project. Our goal all along has been to make Cloud Notebooks be as easy to use as simple webpages, but to have the depth of capabilities that we’ve developed in notebooks over the past 35 years. We achieved this some years ago for fairly small notebooks, but in the past couple of years we’ve been going progressively further in handling even multi-hundred-megabyte notebooks. It’s a complicated story of caching, refreshing—and dodging the vicissitudes of web browsers. But at this point the vast majority of notebooks can be seamlessly deployed to the cloud, and will display as immediately as simple webpages.

It’s been possible to call external code from Wolfram Language ever since Version 1.0. But in Version 14 there are important advances in the extent and ease with which external code can be integrated. The overall goal is to be able to use all the power and coherence of the Wolfram Language even when some part of a computation is done in external code. And in Version 14 we’ve done a lot to streamline and automate the process by which external code can be integrated into the language.

Once something is integrated into the Wolfram Language it just becomes, for example, a function that can be used just like any other Wolfram Language function. But what’s underneath is necessarily quite different for different kinds of external code. There’s one setup for interpreted languages like Python. There’s another for C-like compiled languages and dynamic libraries. (And then there are others for external processes, APIs, and what amount to “importable code specifications”, say for neural networks.)

Let’s start with Python. We’ve had `ExternalEvaluate` for evaluating Python code since 2018. But when you actually come to use Python there are all these dependencies and libraries to deal with. And, yes, that’s one of the places where the incredible advantages of the Wolfram Language and its coherent design are painfully evident. But in Version 14.0 we now have a way to encapsulate all that Python complexity, so that we can deliver Python functionality within Wolfram Language, hiding all the messiness of Python dependencies, and even the versioning of Python itself.
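The pre-existing low-level mechanism looks like this (the Python expression is arbitrary):

```wolfram
(* evaluate a fragment of Python and return the result as a Wolfram expression *)
ExternalEvaluate["Python", "sum(range(10))"]  (* -> 45 *)
```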

As an example, let’s say we want to make a Wolfram Language function `Emojize` that uses the Python function `emojize` from the `emoji` Python library. Here’s how we can do that:

And now you can just call `Emojize` in the Wolfram Language and—under the hood—it’ll run Python code:

The way this works is that the first time you call `Emojize`, a Python environment with all the right features is created, then is cached for subsequent uses. And what’s important is that the Wolfram Language specification of `Emojize` is completely system independent (or as system independent as it can be, given vicissitudes of Python implementations). So that means that you can, for example, deploy `Emojize` in the Wolfram Function Repository just like you would deploy something written purely in Wolfram Language.

There’s very different engineering involved in calling C-compatible functions in dynamic libraries. But in Version 13.3 we also made this very streamlined using the function `ForeignFunctionLoad`. There’s all sorts of complexity associated with converting to and from native C data types, managing memory for data structures, etc. But we’ve now got very clean ways to do this in Wolfram Language.

As an example, here’s how one sets up a “foreign function” call to the function `RAND_bytes` in the OpenSSL library:

Inside this, we’re using Wolfram Language compiler technology to specify the native C types that will be used in the foreign function. But now we can package this all up into a Wolfram Language function:
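Putting the two steps together, a sketch along the lines described. The function and type names follow the Wolfram compiler FFI introduced around Version 13.3 (`ForeignFunctionLoad`, `RawMemoryAllocate`, `RawMemoryImport`); treat the specifics as an approximation rather than the post’s exact code:

```wolfram
(* Load RAND_bytes from the OpenSSL library as a foreign function. The
   type signature uses compiler types: a raw byte pointer and a C int
   argument, returning a C int status code. *)
randBytes = ForeignFunctionLoad["libcrypto", "RAND_bytes",
   {"RawPointer"::["UnsignedInteger8"], "CInt"} -> "CInt"];

(* Package it as an ordinary Wolfram Language function: allocate a raw
   buffer, let OpenSSL fill it, then read the bytes back out as a list. *)
RandomBytes[n_Integer?Positive] := Module[
  {buf = RawMemoryAllocate["UnsignedInteger8", n]},
  randBytes[buf, n];
  RawMemoryImport[buf, {"List", n}]
 ]
```

A call like `RandomBytes[10]` would then return a list of 10 cryptographically random byte values, with all the pointer handling hidden inside.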

And we can call this function just like any other Wolfram Language function:

Internally, all sorts of complicated things are going on. For example, we’re allocating a raw memory buffer that’s then getting fed to our C function. But when we do that memory allocation we’re creating a symbolic structure that defines it as a “managed object”:
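As a sketch of what that symbolic structure might look like (assuming `CreateManagedObject` is what attaches automatic cleanup to the raw allocation; the exact form is an assumption):

```wolfram
(* A raw buffer wrapped as a managed object: when the resulting
   ManagedObject expression is no longer referenced anywhere, the
   underlying memory is freed automatically. *)
buffer = CreateManagedObject[RawMemoryAllocate["UnsignedInteger8", 100]]
```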

And now when this object is no longer being used, the memory associated with it will be automatically freed.

And, yes, with both Python and C there’s quite a bit of complexity underneath. But the good news is that in Version 14 we’ve basically been able to automate handling it. And the result is that what gets exposed is pure, simple Wolfram Language.

But there’s another big piece to this. Within particular Python or C libraries there are often elaborate definitions of data structures that are specific to that library. And so to use these libraries one has to dive into all the—potentially idiosyncratic—complexities of those definitions. But in the Wolfram Language we have consistent symbolic representations for things, whether they’re images, or dates or types of chemicals. When you first hook up an external library you have to map its data structures to these. But once that’s done, anyone can use what’s been built, and seamlessly integrate with other things they’re doing, perhaps even calling other external code. In effect what’s happening is that one’s leveraging the whole design framework of the Wolfram Language, and applying that even when one’s using underlying implementations that aren’t based on the Wolfram Language.

A single line (or less) of Wolfram Language code can do a lot. But one of the remarkable things about the language is that it’s fundamentally scalable: good both for very short programs and very long programs. And since Version 13 there’ve been several advances in handling very long programs. One of them concerns “code editing”.

Standard Wolfram Notebooks work very well for exploratory, expository and many other forms of work. And it’s certainly possible to write large amounts of code in standard notebooks (and, for example, I personally do it). But when one’s doing “software-engineering-style work” it’s both more convenient and more familiar to use what amounts to a pure code editor, largely separate from code execution and exposition. And this is why we have the “package editor”, accessible from `File` > `New` > `Package/Script`. You’re still operating in the notebook environment, with all its sophisticated capabilities. But things have been “skinned” to provide a much more textual “code experience”—both in terms of editing, and in terms of what actually gets saved in .wl files.

Here’s a typical example of the package editor in action (in this case applied to our GitLink package):

Several things are immediately evident. First, it’s very line oriented. Lines (of code) are numbered, and don’t break except at explicit newlines. There are headings just like in ordinary notebooks, but when the file is saved, they’re stored as comments with a certain stylized structure:

It’s still perfectly possible to run code in the package editor, but the output won’t get saved in the .wl file:

One thing that’s changed since Version 13 is that the toolbar is much enhanced. And for example there’s now “smart search” that is aware of code structure:

You can also ask to go to a line number—and you’ll immediately see whatever lines of code are nearby:

In addition to code editing, another set of features new since Version 13 that’s important to serious developers concerns automated testing. The main advance is the introduction of a fully symbolic testing framework, in which individual tests are represented as symbolic objects

and can be manipulated in symbolic form, then run using functions like `TestEvaluate` and `TestReport`:
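In outline, the symbolic workflow looks something like this (assuming `TestCreate` produces the symbolic `TestObject`, and that `TestEvaluate` applied to it yields a result object; both are assumptions about the framework’s exact shape):

```wolfram
(* A test as a symbolic object: an input and its expected output *)
test = TestCreate[1 + 1, 2];

(* Run a single test to get a symbolic result object... *)
result = TestEvaluate[test];

(* ...or run a whole collection of tests and get a summary report *)
report = TestReport[{test, TestCreate[Total[Range[10]], 55]}]
```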

In Version 14.0 there’s another new testing function—`IntermediateTest`—that lets you insert what amount to checkpoints inside larger tests:
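A sketch of the kind of thing this enables. The computation is illustrative, and the form `IntermediateTest[condition]`, checking that a condition holds at a checkpoint, is an assumption about the function’s signature:

```wolfram
(* A larger test with checkpoints: each IntermediateTest records a
   pass/fail checkpoint inside the overall test evaluation *)
TestCreate[
 Module[{data = Range[100]},
  IntermediateTest[Length[data] == 100];
  IntermediateTest[First[data] == 1];
  Total[data]
  ],
 5050
 ]
```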

Evaluating this test, we see that the intermediate tests were also run:

The Wolfram Function Repository has been a big success. We introduced it in 2019 as a way to make specific, individual contributed functions available in the Wolfram Language. And now there are more than 2900 such functions in the Repository.

The nearly 7000 functions that constitute the Wolfram Language as it is today have been painstakingly developed over the past three and a half decades, always mindful of creating a coherent whole with consistent design principles. And now in a sense the success of the Function Repository is one of the dividends of all that effort. Because it’s the coherence and consistency of the underlying language and its design principles that make it feasible to just add one function at a time, and have it really work. You want to add a function to do some very specific operation that combines images and graphs. Well, there’s a consistent representation of both images and graphs in the Wolfram Language, which you can leverage. And by following the principles of the Wolfram Language—like for the naming of functions—you can create a function that’ll be easy for Wolfram Language users to understand and use.

Using the Wolfram Function Repository is a remarkably seamless process. If you know the function’s name, you can just call it using `ResourceFunction`; the function will be loaded if it’s needed, and then it’ll just run:
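The pattern looks like this (using `RandomHypergraph`, a repository function mentioned later in this piece; the arguments and the version string are illustrative assumptions):

```wolfram
(* Call a repository function by name: it's fetched (and cached) on
   first use, then runs like any built-in function *)
ResourceFunction["RandomHypergraph"][{5, 2}]

(* To "burn in" a particular version in code, use the ResourceVersion
   option (version string hypothetical) *)
ResourceFunction["RandomHypergraph", ResourceVersion -> "2.0.0"][{5, 2}]
```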

If there’s an update available for the function, it’ll give you a message, but run the old version anyway. The message has a button that lets you load in the update; then you can rerun your input and use the new version. (If you’re writing code where you want to “burn in” a particular version of a function, you can just use the `ResourceVersion` option of `ResourceFunction`.)

If you want your code to look more elegant, just evaluate the `ResourceFunction` object

and use the formatted version:

And, by the way, pressing the `+` then gives you more information about the function:

An important feature of functions in the Function Repository is that they all have documentation pages—that are organized pretty much like the pages for built-in functions:

But how does one create a Function Repository entry? Just go to `File` > `New` > `Repository Item` > `Function Repository Item` and you’ll get a Definition Notebook:

We’ve optimized this to be as easy to fill in as possible, minimizing boilerplate and automatically checking for correctness and consistency whenever possible. And the result is that it’s perfectly realistic to create a simple Function Repository item in under an hour—with the main time spent being in the writing of good expository examples.

When you press `Submit to Repository` your function gets sent to the Wolfram Function Repository review team, whose mandate is to ensure that functions in the repository do what they say they do, work in a way that is consistent with general Wolfram Language design principles, have good names, and are adequately documented. Except for very specialized functions, the goal is to finish reviews within a week (and sometimes considerably sooner)—and to publish functions as soon as they are ready.

There’s a digest of new (and updated) functions in the Function Repository that gets sent out every Friday—and makes for interesting reading (you can subscribe here):

The Wolfram Function Repository is a curated public resource that can be accessed from any Wolfram Language system (and, by the way, the source code for every function is available—just press the `Source Notebook` button). But there’s another important use case for the infrastructure of the Function Repository: privately deployed “resource functions”.

It all works through the Wolfram Cloud. You use the exact same Definition Notebook, but now instead of submitting to the public Wolfram Function Repository, you just deploy your function to the Wolfram Cloud. You can make it private so that only you, or some specific group, can access it. Or you can make it public, so anyone who knows its URL can immediately access and use it in their Wolfram Language system.

This turns out to be a tremendously useful mechanism, both for group projects, and for creating published material. In a sense it’s a very lightweight but robust way to distribute code—packaged into functions that can immediately be used. (By the way, to find the functions you’ve published from your Wolfram Cloud account, just go to the DeployedResources folder in the cloud file browser.)

(For organizations that want to manage their own function repository, it’s worth mentioning that the whole Wolfram Function Repository mechanism—including the infrastructure for doing reviews, etc.—is also available in a private form through the Wolfram Enterprise Private Cloud.)

So what’s in the public Wolfram Function Repository? There are a lot of “specialty functions” intended for specific “niche” purposes—but very useful if they’re what you want:

There are functions that add various kinds of visualizations:

Some functions set up user interfaces:

Some functions link to external services:

Some functions provide simple utilities:

There are also functions that are being explored for potential inclusion in the core system:

There are also lots of “leading-edge” functions, added as part of research or exploratory development. And for example in pieces I write (including this one), I make a point of having all pictures and other output be backed by “click-to-copy” code that reproduces them—and this code quite often contains functions either from the public Wolfram Function Repository or from (publicly accessible) private deployments.

Paclets are a technology we’ve used for more than a decade and a half to distribute updated functionality to Wolfram Language systems in the field. In Version 13 we began the process of providing tools for anyone to create paclets. And since Version 13 we’ve introduced the Wolfram Language Paclet Repository as a centralized repository for paclets:

What is a paclet? It’s a collection of Wolfram Language functionality—including function definitions, documentation, external libraries, stylesheets, palettes and more—that can be distributed as a unit, and immediately deployed in any Wolfram Language system.

The Paclet Repository is a centralized place where anyone can publish paclets for public distribution. So how does this relate to the Wolfram Function Repository? They are interestingly complementary—with different optimization and different setups. The Function Repository is more lightweight, the Paclet Repository more flexible. The Function Repository is for making available individual new functions, that independently fit into the whole existing structure of the Wolfram Language. The Paclet Repository is for making available larger-scale pieces of functionality, that can define a whole framework and environment of their own.

The Function Repository is also fully curated, with every function being reviewed by our team before it is posted. The Paclet Repository is an immediate-deployment system, without pre-publication review. In the Function Repository every function is specified just by its name—and our review team is responsible for ensuring that names are well chosen and have no conflicts. In the Paclet Repository, every contributor gets their own namespace, and all their functions and other material live inside that namespace. So, for example, I contributed the function `RandomHypergraph` to the Function Repository, which can be accessed just as `ResourceFunction["RandomHypergraph"]`. But if I had put this function in a paclet in the Paclet Repository, it would have to be accessed as something like `PacletSymbol["StephenWolfram/Hypergraphs", "RandomHypergraph"]`.

`PacletSymbol`, by the way, is a convenient way of “deep accessing” individual functions inside a paclet. `PacletSymbol` temporarily installs (and loads) a paclet so that you can access a particular symbol in it. But more often one wants to permanently install a paclet (using `PacletInstall`), then explicitly load its contents (using `Needs`) whenever one wants to have its symbols available. (All the various ancillary elements, like documentation, stylesheets, etc. in a paclet get set up when it is installed.)
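Concretely, the two access patterns might look like this. The paclet name is the hypothetical example from the text, and the context name passed to `Needs` assumes the usual ``publisher`name` `` convention:

```wolfram
(* One-off "deep access": temporarily installs and loads the paclet,
   returning the named symbol *)
PacletSymbol["StephenWolfram/Hypergraphs", "RandomHypergraph"]

(* Permanent installation, then explicit loading whenever the paclet's
   symbols are needed *)
PacletInstall["StephenWolfram/Hypergraphs"];
Needs["StephenWolfram`Hypergraphs`"]
```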

What does a paclet look like in the Paclet Repository? Every paclet has a home page that typically includes an overall summary, a guide to the functions in the paclet, and some overall examples of the paclet:

Individual functions typically have their own documentation pages:

Just like in the main Wolfram Language documentation, there can be a whole hierarchy of guide pages, and there can be things like tutorials.

Notice that in examples in paclet documentation, one often sees special inline constructs representing symbols in the paclet, presented in forms like `PacletSymbol["WolframChemistry/ProteinVisualization", "AmidePlanePlot"]` that allow these symbols to be accessed in a “standalone” way. If you directly evaluate such a form, by the way, it’ll force (temporary) installation of the paclet, then return the actual, raw symbol that appears in the paclet:

So how does one create a paclet suitable for submission to the Paclet Repository? You can do it purely programmatically, or you can start from `File` > `New` > `Repository Item` > `Paclet Repository Item`, which launches what amounts to a whole paclet creation IDE. The first step is to specify where you want to assemble your paclet. You give some basic information

then a Paclet Resource Definition Notebook is created, from which you can give function definitions, set up documentation pages, specify what you want your paclet’s home page to be like, etc.:

There are lots of sophisticated tools that let you create full-featured paclets with the same kind of breadth and depth of capabilities that you find in the Wolfram Language itself. For example, Documentation Tools lets you construct full-featured documentation pages (function pages, guide pages, tutorials, …):

Once you’ve assembled a paclet, you can check it, build it, deploy it privately—or submit it to the Paclet Repository. And once you submit it, it will automatically get set up on the Paclet Repository servers, and within just a few minutes the pages you’ve created describing your paclet will show up on the Paclet Repository website.

So what’s in the Paclet Repository so far? There’s a lot of good and very serious stuff, contributed both by teams at our company and by members of the broader Wolfram Language community. In fact, many of the 134 paclets now in the Paclet Repository have enough in them that one could write a whole piece like this about them.

One category of things you’ll find in the Paclet Repository are snapshots of our ongoing internal development projects—many of which will eventually become built-in parts of the Wolfram Language. A good example of this is our LLM and Chat Notebook functionality, whose rapid development and deployment over the past year was made possible by the use of the Paclet Repository. Another example, representing ongoing work from our chemistry team (AKA WolframChemistry in the Paclet Repository) is the ChemistryFunctions paclet, which contains functions like:

And, yes, this is interactive:

Or, also from WolframChemistry:

Another “development snapshot” is DiffTools—a paclet for making and viewing diffs between strings, cells, notebooks, etc.:

A major paclet is QuantumFramework—which provides the functionality for our Wolfram Quantum Framework

and delivers broad support for quantum computing (with at least a few connections to multiway systems and our Physics Project):

Talking of our Physics Project, there are over 200 functions supporting it that are in the Wolfram Function Repository. But there are also paclets, like WolframInstitute/Hypergraph:

An example of an externally contributed paclet is Automata—with more than 250 functions for doing computations related to finite automata:

Another contributed paclet is FunctionalParsers, which goes from a symbolic parser specification to an actual parser, here being used in a reverse mode to generate random “sentences”:

Phi4Tools is a more specialized paclet, for working with Feynman diagrams in field theory:

And, as another example, here’s MaXrd, for crystallography and x-ray scattering:

As just one more example, there’s the Organizer paclet—a utility paclet for making and manipulating organizer notebooks. But unlike the other paclets we’ve seen here, it doesn’t expose any Wolfram Language functions; instead, when you install it, it puts a palette in your Palettes list:

As of today, Version 14 is finished, and out in the world. So what’s next? We have lots of projects underway—some already with years of development behind them. Some extend and strengthen what’s already in the Wolfram Language; some take it in new directions.

One major focus is broadening and streamlining the deployment of the language: unifying the way it’s delivered and installed on computers, packaging it so it can be efficiently integrated into other standalone applications, etc.

Another major focus is expanding the handling of very large amounts of data by the Wolfram Language—and seamlessly integrating out-of-core and lazy processing.

Then of course there’s algorithmic development. Some is “classical”, directly building on the towers of functionality we’ve developed over the decades. Some is more “AI based”. We’ve been creating heuristic algorithms and meta-algorithms ever since Version 1.0—increasingly using methods from machine learning. How far will neural net methods go? We don’t know yet. We’re routinely using them in things like algorithm selection. But to what extent can they help in the heart of algorithms?

I’m reminded of something we did back in 1987 in developing Version 1.0. There was a long tradition in numerical analysis of painstakingly deriving series approximations for particular cases of mathematical functions. But we wanted to be able to compute hundreds of different functions to arbitrary precision for any complex values of their arguments. So how did we do it? We generalized from series to rational approximations—and then, in a very “machine-learning-esque” way, we spent months of CPU time systematically optimizing these approximations. Well, we’ve been trying to do the same kind of thing again—though now over more ambitious domains—and now using not rational functions but large neural nets as our basis.

We’ve also been exploring using neural nets to “control” precise algorithms, in effect making heuristic choices which either guide or can be validated by the precise algorithms. So far, none of what we’ve produced has outperformed our existing methods, but it seems plausible that fairly soon it will.

We’re doing a lot with various aspects of metaprogramming. There’s the project of getting LLMs to help in the construction of Wolfram Language code—and in giving comments on it, and in analyzing what went wrong if the code didn’t do what one expected. Then there’s code annotation—where LLMs may help in doing things like predicting the most likely type for something. And there’s code compilation. We’ve been working for many years on a full-scale compiler for the Wolfram Language, and in every version what we have becomes progressively more capable. We’ve been doing some level of automatic compilation in particular cases (particularly ones involving numerical computation) for more than 30 years. And eventually full-scale automatic compilation will be possible for everything. But as of now some of the biggest payoffs from our compiler technology have been for our internal development, where we can now get optimal down-to-the-metal performance simply by compiling (albeit carefully written) Wolfram Language code.

One of the big lessons of the surprising success of LLMs is that there’s potentially more structure in meaningful human language than we thought. I’ve long been interested in creating what I’ve called a “symbolic discourse language” that gives a computational representation of everyday discourse. The LLMs haven’t explicitly done that. But they encourage the idea that it should be possible, and they also provide practical help in doing it. And whether the goal is to be able to represent narrative text, or contracts, or textual specifications, it’s a matter of extending the computational language we’ve built to encompass more kinds of concepts and structures.

There are typically several kinds of drivers for our continued development efforts. Sometimes it’s a question of continuing to build a tower of capabilities in some known direction (like, for example, solving PDEs). Sometimes the tower we’ve built suddenly lets us see new possibilities. Sometimes when we actually use what we’ve built we realize there’s an obvious way to polish or extend it—or to “double down” on something that we can now see is valuable. And then there are cases where things happening in the technology world suddenly open up new possibilities—like LLMs have recently done, and perhaps XR will eventually do. And finally there are cases where new science-related insights suggest new directions.

I had assumed that our Physics Project would at best have practical applications only centuries hence. But in fact it’s become clear that the correspondence it’s defined between physics and computation gives us quite immediate new ways to think about aspects of practical computation. And indeed we’re now actively exploring how to use this to define a new level of parallel and distributed computation in the Wolfram Language, as well as to represent symbolically not only the results of computations but also the ongoing process of computation.

One might think that after nearly four decades of intense development there wouldn’t be anything left to do in developing the Wolfram Language. But in fact at every level we reach, there’s ever more that becomes possible, and ever more that we can see might be possible. And indeed this moment is a particularly fertile one, with an unprecedentedly broad waterfront of possibilities. Version 14 is an important and satisfying waypoint. But there are wonderful things ahead—as we continue our long-term mission to make the computational paradigm achieve its potential, and to build our computational language to help that happen.

We call it perception. We call it measurement. We call it analysis. But in the end it’s about how we take the world as it is, and derive from it the impression of it that we have in our minds.

We might have thought that we could do science “purely objectively” without any reference to observers or their nature. But what we’ve discovered particularly dramatically in our Physics Project is that the nature of us as observers is critical even in determining the most fundamental laws we attribute to the universe.

But what ultimately does an observer—say like us—do? And how can we make a theoretical framework for it? Much as we have a general model for the process of computation—instantiated by something like a Turing machine—we’d like to have a general model for the process of observation: a general “observer theory”.

Central to what we think of as an observer is the notion that the observer will take the raw complexity of the world and extract from it some reduced representation suitable for a finite mind. There might be zillions of photons impinging on our eyes, but all we extract is the arrangement of objects in a visual scene. Or there might be zillions of gas molecules impinging on a piston, yet all we extract is the overall pressure of the gas.

In the end, we can think of it fundamentally as being about equivalencing. There are immense numbers of different individual configurations for the photons or the gas molecules—that are all treated as equivalent by an observer who’s just picking out the particular features needed for some reduced representation.

There’s in a sense a certain duality between computation and observation. In computation one’s generating new states of a system. In observation, one’s equivalencing together different states.

That equivalencing must in the end be implemented “underneath” by computation. But in observer theory what we want to do is just characterize the equivalencing that’s achieved. For us as observers it might in practice be all about how our senses work, what our biological or cultural nature is—or what technological devices or structures we’ve built. But what makes a coherent concept of observer theory possible is that there seem to be general, abstract characterizations that capture the essence of different kinds of observers.

It’s not immediately obvious that anything suitable for a finite mind could ever be extracted from the complexity of the world. And indeed the Principle of Computational Equivalence implies that computational irreducibility (and its multicomputational generalization) will be ubiquitous. But within computational irreducibility there must always be slices of computational reducibility. And it’s these slices of reducibility that an observer must try to pick out—and that ultimately make it possible for a finite mind to develop a “useful narrative” about what happens in the world, that allows it to make decisions, predictions, and so on.

How “special” is what an observer does? At its core it’s just about taking a large set of possible inputs, and returning a much smaller set of possible outputs. And certainly that’s a conceptual idea that’s appeared in many fields under many different names: a contractive mapping, reduction to canonical form, a classifier, an acceptor, a forgetful functor, evolving to an attractor, extracting statistics, model fitting, lossy compression, projection, phase transitions, renormalization group transformations, coarse graining and so on. But here we want to think not about what’s “mathematically describable”, but instead about what in general is actually implemented—say by our senses, our measuring devices, or our ways of analyzing things.

At an ultimate level, everything that happens can be thought of as being captured by the ruliad—the unique object that emerges as the entangled limit of all possible computations. And in a vast generalization of the idea that our brains—like any other material thing—are made of atoms, any observer must be embedded as some kind of structure within the ruliad. But a key concept of observer theory is that it’s possible to make conclusions about an observer’s impression of the world just by knowing about the capabilities—and assumptions—of the observer, without knowing in detail what the observer is “like inside”.

And so it is, for example, that in our Physics Project we seem to be able to derive—essentially from the structure of the ruliad—the core laws of twentieth-century physics (general relativity, quantum mechanics and the Second Law) just on the basis of two features of us as observers: that we’re computationally bounded, and that we believe we’re persistent in time (even though “underneath” we’re made of different atoms of space at every successive moment). And we can expect that if we were to include other features of us as observers (for example, that we believe there are persistent objects in the world, or that we believe we have free will) then we’d be able to derive more aspects of the universe as we experience it—or of natural laws we attribute to it.

But the notion of observers—and observer theory—isn’t limited purely to “physical observers”. It applies whenever we try to “get an impression” of something. And so, for example, we can also operate as “mathematical observers”, sampling the ruliad to build up conclusions about mathematical laws. Some features of us as physical observers—like the computational boundedness associated with the finiteness of our minds—inevitably carry over to us as mathematical observers. But other features do not. But the point of observer theory is to provide a general framework in which we can characterize observers—and then see the consequences of those characterizations for the impressions or conclusions observers will form.

As humans we have senses like sight, hearing, touch, taste, smell and balance. And through our technology we also have access to a few thousand other kinds of measurements. So how basically do all these work?

The vast majority in effect aggregate a large number of small inputs to generate some kind of “average” output—which in the case of measurements is often specified as a (real) number. In a few cases, however, there’s instead a discrete choice between outputs that’s made on the basis of whether the total input exceeds a threshold (think: distributed consensus schemes, weighing balances, etc.).

But in all cases what’s fundamentally happening is that lots of different input configurations are all being equivalenced—or, more operationally, the dynamics of the system essentially make all equivalenced states evolve to the same “attractor state”.

As an example, let’s consider measuring the pressure of a gas. There are various ways to do this. But a very direct one is just to have a piston, and see how much force is exerted by the gas on this piston. So where does this force come from? At the lowest level it’s the result of lots of individual molecules bouncing off the surface of the piston, each transferring a tiny amount of momentum to it. If we looked at the piston at an atomic scale, we’d see it temporarily deform from each molecular impact. But the crucial point is that at a large scale the piston moves together, as a single rigid object—aggregating the effects of all those individual molecular impacts.

But why does it work this way? Essentially it’s because the intermolecular forces inside the piston are much stronger than the forces associated with molecules in the gas. Or, put more abstractly, there’s more coupling and coherence “inside the observer” than between the observer and what it’s observing.

We see the same basic pattern over and over again. There’s some form of transduction that couples the individual elements of what’s being observed to the observer. Then “within the observer” there’s something that in essence aggregates all these small effects. Sometimes that aggregation is “directly numerical”, as in the addition of lots of small momentum transfers. But sometimes it’s instead more explicitly like evolution to one attractor rather than another.

Consider, for example, the case of vision. An array of photons falls on the photoreceptor cells of our retinas, generating electrical signals transmitted through nerve fibers to our brains. Within the brain there’s then effectively a neural net that evolves to different attractors depending on what one’s looking at. Most of the time a small change in the input image won’t affect what attractor one evolves to. But—much like with a weighing balance—there’s an “edge” at which even a small change can lead to a different output.

One can go through lots of different types of sensory systems and measuring devices. But the basic outline seems to always be the same. First, there’s a coupling between what is being sensed or measured and the thing that’s doing the sensing or measuring. Quite often that coupling involves transducing from one physical form to another—say from light to electricity, or from force to position. Sometimes then the crucial step of equivalencing different detailed inputs is achieved by simple “numerical aggregation”, most often by accumulation of objects (atoms, raindrops, etc.) or physical effects (forces, currents, etc.). But sometimes the equivalencing is instead achieved by a more obviously dynamical process.

It could amount to simple amplification, in which the presence of a small element of input (say an individual particle) “tips over” some metastable system so that it goes into a certain final state. Or it could be more like a neural net, where there’s a more complicated mapping defined by hard-to-describe boundaries between basins of attraction leading to different attractors.
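As a toy version of this kind of attractor dynamics (a Python sketch; the double-well system is a standard example chosen here for illustration, not something from the text), a flow with two stable fixed points equivalences a whole continuum of inputs to just two outputs, with an “edge” at zero where arbitrarily small differences tip the outcome:

```python
def observe(x0, steps=200, dt=0.1):
    """Evolve the double-well flow x' = x - x**3 and report which
    attractor (+1 or -1) the initial condition is equivalenced to."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return 1 if x > 0 else -1

# Whole ranges of inputs collapse to the same "internal state":
assert observe(0.2) == observe(0.9) == 1
# But near the basin boundary (x = 0), tiny differences tip the
# outcome one way or the other, like a weighing balance.
assert observe(1e-6) == 1 and observe(-1e-6) == -1
```

The basin boundary here is trivially describable (just x = 0); in a trained neural net the boundaries between basins are the “hard-to-describe borders” the text refers to.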

But, OK, so what’s the endpoint of a process of observation? Ultimately for us humans it’s an impression created in our minds. Of course that gets into lots of slippery philosophical issues. Yes, each of us has an “inner experience” of what’s going on in our mind. But anything else is ultimately an extrapolation. We make the assumption that other human minds also “see what we see”, but we can never “feel it from the inside”.

We can of course make increasingly detailed measurements—say of neural activity—to see how similar what’s going on is between one brain and another. But as soon as there’s the slightest structural—or situational—difference between the brains, we really can’t say exactly how their “impressions” will compare.

But for our purposes in constructing a general “observer theory” we’re basically going to make the assumption (or, in effect, “philosophical approximation”) that whenever a system does enough equivalencing, that’s tantamount to it “acting like an observer”, because it can then act as a “front end” that takes the “incoherent complexity of the world” and “collimates it” to the point where a mind will derive a definite impression from it.

Of course, there’s still a lot of subtlety here. There has to be “just enough equivalencing” and not too much. For example, if all inputs were always equivalenced to the same output, there’d be nothing useful observed. And in the end there’s somehow got to be some kind of match between the compression of input achieved by equivalencing, and the “capacity” of the mind that’s ultimately deriving an impression from it.

A crucial feature of anything that can reasonably be called a mind is that “something’s got to be going on in there”. It can’t be, for example, that the internal state of the system is fixed. There has to be some internal dynamics—some computational process that we can identify as the ongoing operation of the mind.

At an informational level we might say that there has to be more information processing going on inside than there is flow of information from the outside. Or, in other words, if we’re going to be meaningful “observers like us” we can’t just be bombarded by input we don’t process; we have to have some capability to “think about what we’re seeing”.

All of this comes back to the idea that a crucial feature of us as observers is that we are computationally bounded. We do computation; that’s why we can have an “inner sense of things going on”. But the amount of computation we do is tiny compared to the computation going on in the world around us. Our experience represents a heavily filtered version of “what’s happening outside”. And the essence of “being an observer like us” is that we’re effectively doing lots of equivalencing to get to that filtered version.

But can we imagine a future in which we “expand our minds”? Or perhaps encounter some alien intelligence with a fundamentally “less constrained mind”? Well, at some point there’s an issue with this. Because in a sense the idea that we have a coherent existence relies on us having “limited minds”. For without such constraints there wouldn’t be a coherent “self” that we could identify—with coherent inner experience.

Let’s say we’re shown some system—say in nature—“from the outside”. Can we tell if “there’s an observer in there”? Ultimately not, because in a sense we’d have to be “inside that observer” and be able to experience the impression of the world that it’s getting. But in much the same way as we extrapolate to believing that, say, other human minds are experiencing things like we’re experiencing, so also we can potentially extrapolate to identify what we might think of as an observer.

And the core idea seems to be that an “observer” should be a subsystem whose “internal states” are affected by the rest of the system, but where many “external states” lead to the same internal state—and where there is rich dynamics “within the observer” that in effect operates only on its internal states. Ultimately—following the Principle of Computational Equivalence—both the outside and the inside of the “observer subsystem” can be expected to be equivalent in the computations they’re performing. But the point is that the coupling from outside the subsystem to inside effectively “coarse grains” what’s outside, so that the “inner computation” is operating on a much-reduced set of elements.

Why should any such “observer subsystems” exist? Presumably at some level it’s inevitable from the presence of pockets of computational reducibility within arbitrary computationally irreducible systems. But more important for us is that our very existence—and the possibility of our coherent inner experience—depends on us “operating as observers”. And—almost as a “self-fulfilling prophecy”—our behavior tends to perpetuate our ability to successfully do this. For example, we can think of ourselves as choosing to put ourselves in situations and environments where we can “predict what’s going to happen” well enough to “survive as observers”. (At a mundane practical level we might do this by not living in places subject to unpredictable natural forces—or by doing things like building ourselves structures that shelter us from those forces.)

We’ve talked about observers operating by compressing the complexities of the world to “inner impressions” suitable for finite minds. And in typical situations that we describe as perception and measurement, the main way this happens is by fairly direct equivalencing of different states. But in a sense there’s a higher-level story that relies on formalization—and in essence computation—and that’s what we usually call “analysis”.

Let’s say we have some intricate structure—perhaps some nested, fractal pattern. A direct rendering of all the pixels in this pattern ultimately won’t be something well suited for a “finite mind”. But if we gave rules—or a program—for generating the pattern we’d have a much more succinct representation of it.

But now there’s a problem with computational irreducibility. Yes, the rules determine the pattern. But to get from these rules to the actual pattern can require an irreducible amount of computation. And to “reverse engineer the pattern” to find the rules can require even more computation.
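The gap between a succinct rule and the pattern it generates can be made concrete with a small sketch (Python; rule 90 is a standard cellular automaton example, used here as an illustration): a rule expressible in a few bits produces an intricate nested pattern, while recovering the rule from the raw pixels is the much harder “reverse engineering” direction.

```python
def rule90(width=65, steps=32):
    """Run the rule 90 cellular automaton from a single black cell:
    each new cell is the XOR of its two neighbors."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = [row[i - 1] ^ row[(i + 1) % width] for i in range(width)]
        rows.append(row)
    return rows

# The rule is tiny, but the pattern it generates is an intricate
# nested (Sierpinski-like) structure:
for r in rule90(width=17, steps=8):
    print("".join(".#"[c] for c in r))
```

As the text notes, nested patterns like this one have enough computational reducibility for a bounded observer to “get the compression”; for a generic rule no such shortcut need exist.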

Yes, there are particular cases—like repetitive and simple nested patterns—where there’s enough immediate computational reducibility that a computationally bounded system (or observer) can fairly easily “do the analysis” and “get the compression”. But in general it’s hard. And indeed in a sense it’s the whole mission of science to pick away at the problem, and try to find more ways to “reduce the complexities of the world” to “human-level narratives”.

Computational irreducibility limits the extent to which this can be successful. But the inevitable existence of pockets of reducibility even within computational irreducibility guarantees that progress can always in principle be made. As we invent more kinds of measuring devices we can extend our domain as observers. And the same is true when we invent more methods of analysis, or identify more principles in science.

But the overall picture remains the same: what’s crucial to “being an observer” is equivalencing many “states of the world”, either through perceiving or measuring only specific aspects of them, or through identifying “simplified narratives” that capture them. (In effect, perception and measurement tend to do “lossy compression”; analysis is more about “lossless compression” where the equivalencing is effectively not between possible inputs but between possible generative rules.)

Our view of the world is ultimately determined by what we observe of it. We take what’s “out there in the world” and in effect “construct our perceived reality” by our operation as observers. Or, in other words, insofar as we have a narrative about “what’s going on in the world”, that’s something that comes from our operation as observers.

And in fact from our Physics Project we’re led to an extreme version of this—in which what’s “out there in the world” is just the whole ruliad, and in effect everything specific about our perceived reality must come from how we operate as observers and thus how we sample the ruliad.

But long before we get to this ultimate level of abstraction, there are lots of ways in which our nature as observers “builds” our perceived reality. Think about any material substance—like a fluid. Ultimately it’s made up of lots of individual molecules “doing their thing”. But observers like us aren’t seeing those molecules. Instead, we’re aggregating things to the point where we can just describe the system as a fluid, that operates according to the “narrative” defined by the laws of fluid mechanics.

But why do things work this way? Ultimately it’s the result of the repeated story of the interplay between underlying computational irreducibility, and the computational boundedness of us as observers. At the lowest level the motion of the molecules is governed by simple rules of mechanics. But the phenomenon of computational irreducibility implies that to work out the detailed consequences of “running these rules” involves an irreducible amount of computational work—which is something that we as computationally bounded observers can’t do. And the result of this is that we’ll end up describing the detailed behavior of the molecules as just “random”. As I’ve discussed at length elsewhere, this is the fundamental origin of the Second Law of thermodynamics. But for our purposes here the important point is that it’s what makes observers like us “construct the reality” of things like fluids. Our computational boundedness as observers makes us unable to trace all the detailed behavior of molecules, and leaves us “content” to describe fluids in terms of the “narrative” defined by the laws of fluid mechanics.

Our Physics Project implies that it’s the same kind of story with physical space. For in our Physics Project, space is ultimately “made” of a network of relations (or connections) between discrete “atoms of space”—that’s progressively being updated in what ends up being a computationally irreducible way. But we as computationally bounded observers can’t “decode” all the details of what’s happening, and instead we end up with a simple “aggregate” narrative, that turns out to correspond to continuum space operating according to the laws of general relativity.

The way both coherent notions of “matter” (or fluids) and spacetime emerge for us as observers can be thought of as a consequence of the equivalencing we do as observers. In both cases, there’s immense and computationally irreducible complexity “underneath”. But we’re ignoring most of that—by effectively treating different detailed behaviors as equivalent—so that in the end we get to a (comparatively) “simple narrative” more suitable for our finite minds. But we should emphasize that what’s “really going on in the system” is something much more complicated; it’s just that we as observers aren’t paying attention to that, so our perceived reality is much simpler.

OK, but what about quantum mechanics? In a sense that’s an extreme test of our description of how observers work, and the extent to which the operation of observers “constructs their perceived reality”.

In our Physics Project the underlying structure (hypergraph) that represents space and everything in it is progressively being rewritten according to definite rules. But the crucial point is that at any given stage there can be lots of ways this rewriting can happen. And the result is that there’s a whole tree of possible “states of the universe” that can be generated. So given this, why do we ever think that definite things happen in the universe? Why don’t we just think that there’s an infinite tree of branching histories for the universe?

Well, it all has to do with our nature as observers, and the equivalencing we do. At an immediate level, we can imagine looking at all those different possible branching paths for the evolution of the universe. And the key point is that even though they come from different paths of history, two states can just be the same. Sometimes it’ll be obvious that they’re the same; sometimes one might have to determine, say, whether two hypergraphs are isomorphic. But the point is that to any observer (at least one that isn’t managing to look at arbitrary “implementation details”), the states will inevitably be considered equivalent.
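A toy illustration of this merging (a Python sketch using string rewriting in place of hypergraph rewriting, which is of course vastly simpler than the actual setup): applying rules at different positions branches the evolution, but distinct paths of history can land on identical states, which any observer will treat as the same.

```python
def one_step(state, rules):
    """All states reachable by applying one rule at one position."""
    out = set()
    for lhs, rhs in rules:
        for i in range(len(state)):
            if state.startswith(lhs, i):
                out.add(state[:i] + rhs + state[i + len(lhs):])
    return out

# Toy multiway system: rules A -> AB and B -> BA applied to "AB".
rules = [("A", "AB"), ("B", "BA")]
level1 = one_step("AB", rules)
level2_paths = [s for st in level1 for s in one_step(st, rules)]
level2_states = set(level2_paths)
# Six distinct paths of history, but only four distinct states:
# e.g. "ABAB" is reached both via "ABB" and via "ABA".
```

Here string equality plays the role that hypergraph isomorphism plays in the actual Physics Project construction.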

But now there’s a bigger point. Even though “from the outside” there might be a whole branching and merging multiway graph of histories for the universe, observers like us can’t trace that. And in fact all we perceive is a single thread of history. Or, said another way, we believe that we have a single thread of experience—something closely related to our belief that (despite the changing “underlying elements” from which we are made) we are somehow persistent in time (at least during the span of our existence).

But operationally, how do we go from all those underlying branches of history to our perceived single thread of history? We can think of the states on different threads of history as being related by what we call a branchial graph, that joins states that have immediate common ancestors. And in the limit of many threads, we can think of these different states as being laid out in “branchial space”. (In traditional quantum mechanics terms, this layout defines a “map of quantum entanglements”—with each piece of common ancestry representing an entanglement between states.)
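Continuing the same kind of toy setup (a Python sketch; the real construction operates on hypergraph states, not strings), a branchial graph can be built by joining next-step states that share an immediate common ancestor:

```python
from itertools import combinations

def successors(state, rules):
    """All states reachable in one rewriting step."""
    out = set()
    for lhs, rhs in rules:
        for i in range(len(state)):
            if state.startswith(lhs, i):
                out.add(state[:i] + rhs + state[i + len(lhs):])
    return out

def branchial_edges(frontier, rules):
    """Join pairs of next-step states that share an immediate
    common ancestor in the current frontier."""
    edges = set()
    for state in frontier:
        for a, b in combinations(sorted(successors(state, rules)), 2):
            edges.add((a, b))
    return edges

rules = [("A", "AB"), ("B", "BA")]
# Both successors of "AB" share it as an ancestor, so they are
# branchially adjacent:
step1 = branchial_edges({"AB"}, rules)   # {("ABA", "ABB")}
step2 = branchial_edges(successors("AB", rules), rules)
```

States linked by many such edges end up “close” in branchial space, which is what the “map of quantum entanglements” language is gesturing at.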

In physical space—whether we’re looking at molecules in a fluid or atoms of space—we can think of us operating as observers who are physically large enough to span many underlying discrete elements, so that what we end up observing is just some kind of aggregate, averaged result. And it’s very much the same kind of thing in branchial space: we as observers tend to be large enough in branchial space to be spread across an immense number of branches of history, so that what we observe is just aggregate, averaged results across all those branches.

There’s lots of detailed complexity in what happens on different branches, just like there is in what happens to different molecules, or different atoms of space. And the reason is that there’s inevitably computational irreducibility, or, in this case, more accurately, multicomputational irreducibility. But as computationally bounded observers we just perceive aggregate results that “average out” the “underlying apparent randomness” to give a consistent single thread of experience.

And effectively this is what happens in the transition from quantum to classical behavior. Even though there are many possible detailed (“quantum”) threads of history that an object can follow, what we perceive corresponds to a single consistent “aggregate” (“classical”) sequence of behavior.

And this is typically true even at the level of our typical observation of molecules and chemical processes. Yes, there are many possible threads of history for, say, a water molecule. But most of our observations aggregate things to the point where we can talk about a definite shape for the molecule, with definite “chemical bonds”, etc.

But there is a special situation that actually looms large in typical discussions of quantum mechanics. We can think of it as the result of doing measurements that aren’t “aggregating threads of history to get an average”, but are instead doing something more like a weighing balance, always “tipping” one way or the other. In the language of quantum computing, we might say that we’re arranging things to be able to “measure a single qubit”. In terms of the equivalencing of states, we might say that we’re equivalencing lots of underlying states to specific canonical states (like “spin up” and “spin down”).

Why do we get one outcome rather than another? Ultimately we can think of it as all depending on the details of us as observers. To see this, let’s start from the corresponding question in physical space. We might ask why we observe some particular thing happening. Well, in our Physics Project everything about “what happens” is deterministic. But there’s still the “arbitrariness” of where we are in physical space. We’ll always basically see the same laws of physics, but the particulars of what we’ll observe depend on where we are, say on the surface of the Earth versus in interstellar space, etc.

Is there a “theory” for “where we are”? In some sense, yes, because we can go back and see why the molecules that make us up landed up in the particular place where they did. But what we can’t have an “external theory” for is just which molecules end up making up “us”, as we experience ourselves “from inside”. In our view of physics and the universe, it’s in some sense the only “ultimately subjective” thing: where our internal experience is “situated”.

And the point is that basically—even though it’s much less familiar—the same thing is going on at the level of quantum mechanics. Just as we “happen” to be at a certain place in physical space, so we’re at a certain place in branchial space. Looking back we can trace how we got here. But there’s no *a priori* way to determine “where our particular experience will be situated”. And that means we can’t know what the “local branchial environment” will be—and so, for example, what the outcome of “balance-like” measurements will be.

Just as in traditional discussions of quantum mechanics, the mechanics of doing the measurement—which we can think of as effectively equivalencing many underlying branches of history—will have an effect on subsequent behavior, and subsequent measurements.

But let’s say we look just at the level of the underlying multiway graph—or, more specifically, the multiway causal graph that records causal connections between different updating events. Then we can identify a complicated web of interdependence between events that are timelike, spacelike and branchlike separated. And this interdependence seems to correspond precisely to what’s expected from quantum mechanics.

In other words, even though the multiway graph is completely determined, the arbitrariness of “where the observer is” (particularly in branchial space), combined with the inevitable interdependence of different aspects of the multiway (causal) graph, seems sufficient to reproduce the not-quite-purely-probabilistic features of quantum mechanics.

In making observations in physical space, it’s common to make a measurement at one place or time, then make another measurement at another place or time, and, for example, see how they’re related. But in actually doing this, the observer will have to move from one place to the other, and persist from one time to another. And in the abstract it’s not obvious that that’s possible. For example, it could be that an observer won’t be able to move without changing—or, in other words, that “pure motion” won’t be possible for an observer. But in effect this is something we as observers assume about ourselves. And indeed, as I’ve discussed elsewhere, this is a crucial part of why we perceive spacetime to operate according to the laws of physics we know.

But what about in branchial space? We have much less intuition for this than for physical space. But we still effectively believe that pure motion is possible for us as observers in branchial space. It could be—like an observer in physical space, say, near a spacetime singularity—that an observer would get “shredded” when trying to “move” in branchial space. But our belief is that typically nothing like that happens. At some level being at different locations in branchial space presumably corresponds to picking different bases for our quantum states, or effectively to defining our experiments differently. And somehow our belief in the possibility of pure motion in branchial space seems related to our belief in the possibility of making arbitrary sequences of choices in the sets of experiments we do.

We might have thought that the only thing ultimately “out there” for us to observe would be our physical universe. But actually there are important situations where we’re essentially operating not as observers of our familiar physical universe, but instead of what amount to abstract universes. And what we’ll see is that the ideas of observer theory seem to apply there too—except that now what we’re picking out and reducing to “internal impressions” are features not of the physical world but of abstract worlds.

Our Physics Project in a sense brings ideas about the physical and abstract worlds closer—and the concept of the ruliad ultimately leads to a deep unification between them. For what we now imagine is that the physical universe as we perceive it is just the result of the particular kind of sampling of the ruliad made by us as certain kinds of observers. And the point is that we as observers can make other kinds of samplings, leading to what we can describe as abstract universes. And one particularly prominent example of this is mathematics, or rather, metamathematics.

Imagine starting from all possible axioms for mathematics, then constructing the network of all possible theorems that can be derived from them. We can consider this as forming a kind of “metamathematical universe”. And the particular mathematics that some mathematician might study we can then think of as the result of a “mathematical observer” observing that metamathematical universe.

There are both close analogies and differences between this and the experience of a physical observer in the physical universe. Both ultimately correspond to samplings of the ruliad, but somewhat different ones.

In our Physics Project we imagine that physical space and everything in it is ultimately made up of discrete elements that we identify as “atoms of space”. But in the ruliad in general we can think of everything being made up of “pure atoms of existence” that we call emes. In the particular case of physics we interpret these emes as atoms of space. But in metamathematics we can think of emes as corresponding to (“subaxiomatic”) elements of symbolic structures—from which things like axioms or theorems can be constructed.

A central feature of our interaction with the ruliad for physics is that observers like us don’t track the detailed behavior of all the various atoms of space. Instead, we equivalence things to the point where we get descriptions that are reduced enough to “fit in our minds”. And something similar is going on in mathematics.

We don’t track all the individual subaxiomatic emes—or usually in practice even the details of fully formalized axioms and theorems. Instead, mathematics typically operates at a much higher and “more human” level, dealing not with questions like how real numbers can be built from emes—or even axioms—but rather with what can be deduced about the properties of mathematical objects like real numbers. In a physics analogy to the behavior of a gas, typical human mathematics operates not at the “molecular” level of individual emes (or even axioms) but rather at the “fluid dynamics” level of “human-accessible” mathematical concepts.

In effect, therefore, a mathematician is operating as an observer who equivalences many detailed configurations—ultimately of emes—in order to form higher-level mathematical constructs suitable for our computationally bounded minds. And while at the outset one might have imagined that anything in the ruliad could serve as a “possible mathematics”, the point is that observers like us can only sample the ruliad in particular ways—leading to only particular possible forms for “human-accessible” mathematics.

It’s a very similar story to the one we’ve encountered many times in thinking about physics. In studying gases, for example, we could imagine all sorts of theories based on tracking detailed molecular motions. But for observers like us—with our computational boundedness—we inevitably end up with things like the Second Law of thermodynamics, and the laws of fluid mechanics. And in mathematics the main thing we end up with is “higher-level mathematics”—mathematics that we can do directly in terms of typical textbook concepts, rather than constantly having to “drill down” to the level of axioms, or emes.

In physics we’re usually particularly concerned with issues like predicting how things will evolve through time. In mathematics it’s more about accumulating what can be considered true. And indeed we can think of an idealized mathematician as going through the ruliad and collecting in their mind a “bag” of theorems (or axioms) that they “consider to be true”. And given such a collection, they can essentially follow the “entailment paths” defined by computations in the ruliad to find more theorems to “add to their bag”. (And, yes, if they put in a false theorem then—because a false premise in the standard setup of logic implies everything—they’ll end up with an “infinite explosion of theorems” that won’t fit in a finite mind.)
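The “bag of theorems” picture can be sketched as a simple forward-chaining closure (Python; the rules below are invented stand-ins for entailments, not drawn from any actual axiom system):

```python
def entail(axioms, rules):
    """Grow a bag of known theorems by repeatedly following
    entailment rules (premises -> conclusion) to a fixed point."""
    known = set(axioms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical toy entailments: each maps a set of premises
# to a single conclusion.
rules = [
    (frozenset({"p"}), "q"),
    (frozenset({"p", "q"}), "r"),
    (frozenset({"s"}), "t"),
]
bag = entail({"p"}, rules)   # "p" entails "q", which then entails "r"
```

In this finite sketch the closure always terminates; in the ruliad the entailment network is unbounded, which is why a false premise produces the “infinite explosion” the text describes.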

In observing the physical universe, we talk about our different possible senses (like vision, hearing, etc.) or different kinds of measuring devices. In observing the metamathematical universe the analogy is basically different possible kinds of theories or abstractions—say, algebraic vs. geometrical vs. topological vs. categorical, etc. (with new approaches being like new kinds of measuring devices).

Particularly when we think in terms of the ruliad we can expect a certain kind of ultimate unity in the metamathematical universe—but different theories and different abstractions will pick up different aspects of it, just as vision and hearing pick up different aspects of the physical universe. But in a sense observer theory gives us a global way to talk about this, and to characterize what kinds of observations observers like us can make—whether of the physical universe or the metamathematical one.

In physics we’ve then seen in our Physics Project how this allows us to find general laws that describe our perception of the physical world—and that turn out to reproduce the core known laws of physics. In mathematics we’re not as familiar with the concept of general laws, though the very fact that higher-level mathematics is possible is presumably in essence such a law, and perhaps the kinds of regularities seen in areas like category theory are others—as are the inevitable dualities we expect to be able to identify between different fields of mathematics. All these laws ultimately rely on the structure of the ruliad. But the crucial point is that they’re not talking about the “raw ruliad”; instead they’re talking about just certain samplings of the ruliad that can be done by observers like us, and that lead to certain kinds of “internal impressions” in terms of which these laws can be stated.

Mathematics represents a certain kind of abstract setup that’s been studied in a particularly detailed way over the centuries. But it’s not the only kind of “abstract setup” we can imagine. And indeed there’s even a much more familiar one: the use of concepts—and words—in human thinking and language.

We might imagine that at some time in the distant past our forebears could signify, say, rocks only by pointing at individual ones. But then there emerged the general notion of “rock”, captured by a word for “rock”. And once again this is a story of observers and equivalences. When we look at a rock, it presumably produces all sorts of detailed patterns of neuron firings in our brains, different for each particular rock. But somehow—presumably essentially through evolution to an attractor in the neural net in our brains—we equivalence all these patterns to extract our “inner impression” of the “concept of a rock”.

In the typical tradition of quantitative science we tend to be interested in doing measurements that lead to things like numerical results. But in representing the world using language we tend to be interested instead in creating symbolic structures that involve collections of discrete words embedded in a grammatical framework. Such linguistic descriptions don’t capture every detail; in a typical observer kind of way they broadly equivalence many things—and in a sense reduce the complexity of the world to a description in terms of a limited number of discrete words and linguistic forms.

Within any given person’s brain there’ll be “thoughts” defined by patterns of neuron firings. And the crucial role of language is to provide a way to robustly “package up” those thoughts, and for example represent them with discrete words, so they can be communicated to another person—and unpacked in that person’s brain to produce neuron firings that reproduce what amount to those same thoughts.

When we’re dealing with something like a numerical measurement we might imagine that it could have some kind of absolute interpretation. But words are much more obviously an “arbitrary basis” for communication. We could pick a different specific word (say from a different human language) but still “communicate the same thing”. All that’s required is that everyone who’s using the word agrees on its meaning. And presumably that normally happens because of shared “social” history between people who use a given word.

It’s worth pointing out that for this to work there has to be a certain separation of scales. The collective impression of the meaning of a word may change over time, but that change has to be slow compared to the rate at which the word is used in actual communication. In effect, the meaning of a word—as we humans might understand it—emerges from the aggregation of many individual uses.

In the abstract, there might not be any reason to think that there’d be a way to “understand words consistently”. But it’s a story very much like what we’ve encountered in both physics and mathematics. Even though there are lots of complicated individual details “underneath”, we as observers manage to pick out features that are “simple enough for us to understand”. In the case of molecules in a gas that might be the overall pressure of the gas. And in the case of words it’s a stable notion of “meaning”.

Put another way, the possibility of language is another example of observer theory at work. Inside our brains there are all sorts of complicated neuron firings. But somehow these can be “packaged up” into things like words that form “human-level narratives”.

There’s a certain complicated feedback loop between the world as we experience it and the words we use to describe it. We invent words for things that we commonly encounter (“chair”, “table”, …). Yet once we have a word for something we’re more able to form thoughts about it, or communicate about it. And that in turn makes us more likely to put instances of it in our environment. In other words, we tend to build our environment so that the way we have of making narratives about it works well—or, in effect, so our inner description of it can be as simple as possible, and it can be as predictable to us as possible.

We can view our experience of physics and of mathematics as being the result of us acting as physical observers and mathematical observers. Now we’re viewing our experience of the “conceptual universe” as being the result of us acting as “conceptual observers”. But what’s crucial is that in all these cases, we have the same intrinsic features as observers: computational boundedness and a belief in persistence. The computational boundedness is what makes us equivalence things to the point where we can have symbolic descriptions of the world, for example in terms of words. And the belief in persistence is what lets those words have persistent meanings.

And actually these ideas extend beyond just language—to paradigms, and general ways of thinking about things. When we define a word we’re in effect defining an abstraction for a class of things. And paradigms are somehow a generalization of this: ways of taking lots of specifics and coming up with a uniform framework for them. And when we do this, we’re in effect making a classic observer theory move—and equivalencing lots of different things to produce an “internal impression” that’s “simple enough” to fit in our finite minds.

Our tendency as observers is always to believe that we can separate our “inner experience” from what’s going on in the “outside world”. But in the end everything is just part of the ruliad. And at the level of the ruliad we as observers are ultimately “made of the same stuff” as everything else.

But can we imagine that we can point at one part of the ruliad and say “that’s an observer”, and at another part and say “that’s not”? At least to some extent the answer is presumably yes—at least if we restrict ourselves to “observers like us”. But it’s a somewhat subtle—and seemingly circular—story.

For example, one core feature of observers like us is that we have a certain persistence, or at least we believe we have a certain persistence. But, inevitably, at the level of the “raw ruliad”, we’re continually being made from different atoms of existence, i.e. different emes. So in what sense are we persistent? Well, the point is that an observer can equivalence those successive patterns of emes, so that what they observe is persistent. And, yes, this is at least on the face of it circular. And ultimately to identify what parts of the ruliad might be “persistent enough to be observers”, we’ll have to ground this circularity in some kind of further assumption.

What about the computational boundedness of observers like us, which forces us to do lots of equivalencing? At some level that equivalencing must be implemented by lots of different states evolving to the same states. But once again there’s circularity, because even to define what we mean by “the same states” (“Are isomorphic graphs the same?”, etc.) we have to be imagining certain equivalencing.

So how do we break out of the circularity? The key is presumably the presence of additional features that define “observers like us”. And one important class of such features has to do with scale.

We’re neither tiny nor huge. We involve enough emes that consistent averages can emerge. Yet we don’t involve so many emes that we span anything but an absolutely tiny part of the whole ruliad.

And actually a lot of our experience is determined by “our size as observers”. We’re large enough that certain equivalencing is inevitable. Yet we’re small enough that we can reasonably think of there being many choices for “where we are”.

The overall structure of the ruliad is a matter of formal necessity; there’s only one possible way for it to be. But there’s contingency in our character as observers. And for example in a sense there’s a fundamental constant of nature as we perceive it, which is our extent in the ruliad, say measured in emes (and appropriately projected into physical space, branchial space, etc.).

And the fact that this extent is small compared to the whole ruliad means that there are “many possible observers”—who we can think of as existing at different positions in the ruliad. And those different observers will look at the ruliad from different “points of view”, and thus develop different “internal impressions” of “perceived reality”.

But a crucial fact central to our Physics Project is that there are certain aspects of that perceived reality that are inevitable for observers like us—and that correspond to core laws of physics. But when it gets to more specific questions (“What does the night sky look like from where you are?”, etc.) different observers will inevitably have different versions of perceived reality.

So is there a way to translate from one observer to another? Essentially that’s a story of motion. What happens when an observer at one place in the ruliad “moves” to another place? Inevitably, the observer will be “made of different emes” if it’s at a different place. But will it somehow still “be the same”? Well, that’s a subtle question, that depends both on the background structure of the ruliad, and the nature of the observer.

If the ruliad is “too wild” (think: spacetime near a singularity) then the observer will inevitably be “shredded” as it “moves”. But computational irreducibility implies a certain overall regularity to most of the ruliad, making “pure motion” at least conceivable. But to achieve “pure motion” the observer still has to be “made of” something that is somehow robust—essentially some “lump of computational reducibility” that can “predictably survive” the underlying background of computational irreducibility.

In spacetime we can identify such “lumps” with things like black holes, and particles like electrons, photons, etc. (and, yes, in our models there’s probably considerable commonality between black holes and particles). It’s not yet clear quite what the analog is in branchial space, though a very simple example might involve persistence of qubits. And in rulial space, one kind of analog is the very notion of concepts. For in effect concepts (as represented for example by words) are the analog of particles in rulial space: they are the robust structures that can move across rulial space and “maintain their identity”, carrying “the same thoughts” to different minds.

So what does all this mean for what can constitute an observer in the ruliad? Observers in effect leverage computational reducibility to extract simplified features that can “fit in finite minds”. But observers themselves must also embody computational reducibility in order to maintain their own persistence and the persistence of the features they extract. Or in other words, observers must in a sense always correspond to “patches of regularity” in the ruliad.

But can any patch of regularity in the ruliad be thought of as an observer? Probably not usefully so. Because another feature of observers like us is that we are connected in some kind of collective “social” framework. Not only do we individually form internal impressions in our minds, but we also communicate these impressions. And indeed without such communication we wouldn’t, for example, be able to set up things like coherent languages with which to describe things.

A key implication of our Physics Project and the concept of the ruliad is that we perceive the universe to be the way we do because we are the way we are as observers. And the most fundamental aspect of observers like us is that we’re doing lots of equivalencing to reduce the “complexity of the world” to “internal impressions” that “fit into our minds”. But just what kinds of equivalencing are we actually doing? At some level a lot of that is defined by the things we believe—or assume—about ourselves and the way we interact with the world.

A very central assumption we make is that we’re somehow “stable observers” of a changing “outside world”. Of course, at some level we’re actually not “stable” at all: we’re built up from emes whose configuration is changing all the time. But our belief in our own stability—and, in effect, our belief in our “persistence in time”—makes us equivalence those configurations. And having done that equivalencing we perceive the universe to operate in a certain way, that turns out to align with the laws of physics we know.

But actually there’s more than just our assumption of persistence in time. For example, we also have an assumption of persistence in space: we assume that—at least on reasonably short timescales—we’re consistently “observing the universe from the same place”, and not, say, “continually darting around”. The network that represents space is continually changing “around us”. But we equivalence things so that we can assume that—in a first approximation—we are “staying in the same place”.

Of course, we don’t believe that we have to stay in exactly the same place all the time; we believe we’re able to move. And here we make what amounts to another “assumption of stability”: we assume that pure motion is possible for us as observers. In other words, we assume that we can “go to different places” and still be “the same us”, with the same properties as observers.

At the level of the “raw ruliad” it’s not at all obvious that such assumptions can be consistently made. But as we discussed above, the fact that for observers like us they can (at least to a good approximation) is a reflection of certain properties of us as observers—in particular of our physical scale, being large in terms of atoms of space but small in terms of the whole universe.

Related to our assumption about motion is our assumption that “space exists”—or that we can treat space as something coherent. Underneath, there’s all sorts of complicated dynamics of changing patterns of emes. But on the timescales at which we experience things we can equivalence these patterns to allow us to think of space as having a “coherent structure”. And, once again, the fact that we can do this is a consequence of physical scales associated with us as observers. In particular, the speed of light is “fast enough” that it brings information to us from the local region around us in much less time than it takes our brain to process it. And this means that we can equivalence all the different ways in which different pieces of information reach us, and we can consistently just talk about the state of a region of space at a given time.

Part of our assumption that we’re “persistent in time” is that our thread of experience is—at least locally—continuous, with no breaks. Yes, we’re born and we die—and we also sleep. But we assume that at least on scales relevant for our ongoing perception of the world, we experience time as something continuous.

More than that, we assume that we have just a single thread of experience. Or, in other words, that there’s always just “one us” going through time. Of course, even at the level of neurons in our brains all sorts of activity goes on in parallel. But somehow in our normal psychological state we seem to concentrate everything so that our “inner experience” follows just one “thread of history”, on which we can operate in a computationally bounded way, and form definite memories and have definite sequences of thoughts.

We’re not as familiar with branchial space as with physical space. But presumably our “fundamental assumption of stability” extends there as well. And when combined with our basic computational boundedness it then becomes inevitable that (as we discussed above) we’ll conflate different “quantum paths of history” to give us as observers a definite “classical thread of inner experience”.

Beyond “stability”, another very important assumption we implicitly make about ourselves is what amounts to an assumption of “independence”. We imagine that we can somehow separate ourselves off from “everything else”. And one aspect of this is that we assume we’re localized—and that most of the ruliad “doesn’t matter to us”, so that we can equivalence all the different states of the “rest of the ruliad”.

But there’s also another aspect of “independence”: that in effect we can choose to do “whatever we want” independent of the rest of the universe. And this means that we assume we can, for example, essentially “do any possible experiment”, make any possible measurement—or “go anywhere we want” in physical or branchial space, or indeed rulial space. We assume that we effectively have “free will” about these things—determined only by our “inner choices”, and independent of the state of the rest of the universe.

Ultimately, of course, we’re just part of the ruliad, and everything we do is determined by the structure of the ruliad and our history within it. But we can view our “belief of freedom” as a reflection of the fact that we don’t know *a priori* where we’ll be located in the ruliad—and even if we did, computational irreducibility would prevent us from making predictions about what we will do.

Beyond our assumptions about our own “independence from the rest of the universe”, there’s also the question of independence between different parts of what we observe. And quite central to our way of “parsing the world” is our typical assumption that we can “think about different things separately”. In other words, we assume it’s possible to “factor” what we see happening in the universe into independent parts.

In science, this manifests itself in the idea that we can do “controlled experiments” in which we study how something behaves in isolation from everything else. It’s not self-evident that this will be possible (and indeed in areas like ethics it might fundamentally not be), but we as observers tend to implicitly assume it.

And actually, we normally go much further. Because we typically assume that we can describe—and think about—the world “symbolically”. In other words, we assume that we can take all the complexity of the world and represent at least the parts of it that we care about in terms of discrete symbolic concepts, of the kind that appear in human (or computational) language. There’s lots of detail in the world that our limited collection of symbolic concepts doesn’t capture, and effectively “equivalences out”. But the point is that it’s this symbolic description that normally seems to form the backbone of the “inner narrative” we have about the world.

There’s another implicit assumption that’s being made here, however. And that’s that there’s some kind of stability in the symbolic concepts we’re using. Yes, any particular mind might parse the world using a particular set of symbolic concepts. But we make the implicit assumption that there are other minds out there that work like ours. And this makes us imagine that there can be some form of “objective reality” that’s just “always out there”, to be sampled by whatever mind might happen to come along.

Not only, therefore, do we assume our own stability as observers; we also assume a certain stability to what we perceive of “everything that’s out there”. Underneath, there’s all the wildness and complexity of the ruliad. But we assume that we can successfully equivalence things to the point where all we perceive is something quite stable—and something that we can describe as ultimately governed by consistent laws.

It could be that every part of the universe just “does its own thing”, with no overall laws tying everything together. But we make the implicit assumption that, no, the universe—at least as far as we perceive it—is a more organized and consistent place. And indeed it’s that assumption that makes it feasible for us to operate as observers like us at all, and to even imagine that we can usefully reduce the complexity of the world to something that “fits in our finite minds”.

What resources does it take for an observer to make an observation? In most of traditional science, observation is at best added as an afterthought, and no account is taken of the process by which it occurs. And indeed, for example, in the traditional formalism of quantum mechanics, while “measurement” can have an effect on a system, it’s still assumed to be an “indivisible act” without any “internal process”.

But in observer theory, we’re centrally talking about the process of observation. And so it makes sense to try asking questions about the resources involved in this process.

We might start with our own everyday experience. Something happens out in the world. What resources—and, for example, how much time—does it take us to “form an impression of it”? Let’s say that out in the world a cat either comes into view or it doesn’t. There are signals that come to our brain from our eyes, effectively carrying data on each pixel in our visual field. Then, inside our brain, these signals are processed by a succession of layers of neurons, with us in the end concluding either “there’s a cat there”, or “there’s not”.

And from artificial neural nets we can get a pretty good idea of how this likely works. And the key to it—as we discussed above—is that there’s an attractor. Lots of different detailed configurations of pixels all evolve either to the “cat” or “no cat” final state. The different configurations have been equivalenced, so that only a “final conclusion” survives.
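The attractor idea can be made concrete with a minimal sketch. The text describes layered nets; as a simpler stand-in, the hypothetical example below uses a tiny Hopfield-style network, in which many noisy input patterns are equivalenced onto one of a few stored "conclusions":

```python
# A minimal sketch of attractor dynamics, using a tiny Hopfield-style
# network as a stand-in for the layered nets described in the text.
# The patterns and labels are purely illustrative.
import numpy as np

patterns = np.array([[1, 1, 1, 1, -1, -1, -1, -1],    # "cat"
                     [1, -1, 1, -1, 1, -1, 1, -1]])   # "no cat"

# Hebbian weights; zero the diagonal so units don't self-excite.
W = patterns.T @ patterns
np.fill_diagonal(W, 0)

def settle(x, steps=10):
    """Iterate the network until it sits in an attractor."""
    x = x.copy()
    for _ in range(steps):
        x = np.sign(W @ x)
    return x

# A corrupted version of the "cat" pattern (one unit flipped)...
noisy = np.array([1, 1, -1, 1, -1, -1, -1, -1])
# ...settles onto the stored "cat" attractor: the details of the
# corruption have been equivalenced away.
print(settle(noisy))
```

Many distinct noisy inputs flow to the same fixed point, so only the "final conclusion" survives, as in the cat/no-cat discussion above.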

The story is a bit trickier though. Because “cat” or “no cat” really isn’t the final state of our brain; hopefully it’s not the “last thought we have”. Instead, our brain will continue to “think more thoughts”. So “cat”/“no cat” is at best some kind of intermediate waypoint in our process of thinking; an instantaneous conclusion that we’ll continue to “build on”.

And indeed when we consider measuring devices (like a piston measuring the pressure of a gas) we similarly usually imagine that they will “come to an instantaneous conclusion”, but “continue operating” and “producing more data”. But how long should we wait for each intermediate conclusion? How long, for example, will it take for the stresses generated by a particular pattern of molecules hitting a piston to “dissipate out”, and for the piston to be “ready to produce more data”?

There are lots of specific questions of physics here. But if our purpose is to build a formal observer theory, how should we think about such things? There is something of an analogy in the formal theory of computation. An actual computational system—say in the physical world—will just “keep computing”. But in formal computation theory it’s useful to talk about computations that halt, and about functions that can be “evaluated” and give a “definite answer”. So what’s the analog of this in observer theory?

Instead of general computations, we’re interested in computations that effectively “implement equivalences”. Or, put another way, we want computations that “destroy information”—and that have many incoming states but few outgoing ones. As a practical matter, we can either have the outgoing states explicitly represent whole equivalence classes, or they can just be “canonical representatives”—as in a network where at each step each element takes on the “majority” or “consensus” value of its neighbors.
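A majority-rule network of this kind is easy to sketch. In the illustrative example below (a ring of 8 cells is an arbitrary choice), each element repeatedly takes the majority value of itself and its two neighbors; enumerating every possible initial state shows that the dynamics "destroys information", mapping many incoming states onto far fewer outgoing ones:

```python
# A minimal sketch of an "information-destroying" computation:
# a majority-of-three rule on a ring of 8 binary cells.
from itertools import product

def majority_step(cells):
    n = len(cells)
    # Each cell takes the majority value of itself and its two neighbors.
    return tuple(int(cells[(i - 1) % n] + cells[i] + cells[(i + 1) % n] >= 2)
                 for i in range(n))

def attractor(cells, steps=20):
    # Iterate until stable, or up to a step cap (a few patterns cycle
    # rather than freeze, so the cap keeps this a total function).
    for _ in range(steps):
        nxt = majority_step(cells)
        if nxt == cells:
            break
        cells = nxt
    return cells

initial_states = list(product([0, 1], repeat=8))
final_states = {attractor(s) for s in initial_states}
# Many incoming states, few outgoing ones.
print(len(initial_states), len(final_states))
```

The 256 possible initial states collapse onto a much smaller set of outcomes, each outcome acting as a "canonical representative" of its equivalence class.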

But however it works, we can still ask questions about what computational resources were involved. How many steps did it take? How many elements were involved?

And with the idea that observers like us are “computationally bounded”, we expect limitations on these resources. But with this formal setup we can start asking just how far an observer like us can get, say in “coming to a conclusion” about the results of some computationally irreducible process.

An interesting case arises in putative quantum computers. In the model implied by our Physics Project, such a “quantum computer” effectively “performs many computations in parallel” on the separate branches of a multiway system representing the various threads of history of the universe. But if the observer tries to “come to a conclusion” about what actually happened, they have to “knit together” all those threads of history, in effect by implementing equivalences between them.

One could in principle imagine an observer who’d just follow all the quantum branches. But it wouldn’t be an observer like us. Because what seems to be a core feature of observers like us is that we believe we have just a single thread of experience. And to maintain that belief, our “process of observation” must equivalence all the different quantum branches.

How much “effort” will that be? Well, inevitably if a thread of history branched, our equivalencing has to “undo that branching”. And that suggests that the number of “elementary equivalencings” will have to be at least comparable to the number of “elementary branchings”—making it seem that the “effort of observation” will tend to be at least comparable to the reduction of effort associated with parallelism in the “underlying quantum process”.

In general it’s interesting to compare the “effort of observation” with the “effort of computation”. With our concept of “elementary equivalencings” we have a way to measure both in terms of computational operations. And, yes, both could in principle be implemented by something like a Turing machine, though in practice the equivalencings might be most conveniently modeled by something like string rewriting.

And indeed one can often go much further, talking not directly in terms of equivalencings, but rather about processes that show attractors. There are different kinds of attractors. Sometimes—as in class 1 cellular automata—there are just a limited number of static, global fixed points (say, either all cells black or all cells white). But in other cases—such as class 3 cellular automata—the number of “output states” may be smaller than the number of “input states” but there may be no computationally simple characterization of them.
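Class 1 behavior is simple to demonstrate. The sketch below uses elementary cellular automaton rule 254 (a standard class 1 rule; the ring size and initial condition are arbitrary choices), in which essentially every initial condition flows to the same global fixed point:

```python
# A minimal sketch of class 1 behavior: elementary CA rule 254 on a
# ring of 16 cells evolves to the global "all black" fixed point.

def ca_step(cells, rule=254):
    n = len(cells)
    # Look up each cell's new value from the rule number, indexed by
    # the 3-cell neighborhood (left, center, right) read as binary.
    return tuple(
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n))

# Start from a single black cell...
cells = tuple(1 if i == 8 else 0 for i in range(16))
for _ in range(16):
    cells = ca_step(cells)
# ...and the black region spreads until the whole ring is black.
print(cells)  # (1, 1, 1, ..., 1): the global fixed point
```

Under rule 254 a cell is white only if its whole neighborhood was white, so any black cell spreads one position per step in each direction: the fixed point is reached quickly and is the same for every nonblank initial condition.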

“Observers like us”, though, mostly seem to make use of the fixed points. We try to “symbolicize the world”, taking all the complexities “out there”, and reducing them to “discrete conclusions”, that we might for example describe using the discrete words in a language.

There’s an immediate subtlety associated with attractors of any kind, though. Typical physics is reversible, in the sense that any process (say two molecules scattering from each other) can run equally well forwards and backwards. But in an attractor one goes from lots of possible initial states to a smaller number of “attractor” final states. And there are two basic ways this can happen, even when there’s underlying reversibility. First, the system one’s studying can be “open”, in the sense that effects can “radiate” out of the region that one’s studying. And second, the states the system gets into can be “complicated enough” that, say, a computationally bounded observer will inevitably equivalence them. And indeed that’s the main thing that’s happening, for example, when a system “reaches thermodynamic equilibrium”, as described by the Second Law.

And actually, once again, there’s often a certain circularity. One is trying to determine whether an observer has “finished observing” and “come to a conclusion”. But one needs an observer to make that determination. Can we tell if we’ve finished “forming a thought”? Well, we have to “think about it”—in effect by forming another thought.

Put another way: imagine we are trying to determine whether a piston has “come to a conclusion” about pressure in a gas. Particularly if there’s microscopic reversibility, the piston and things around it will “continue wiggling around”, and it’ll “take an observer” to determine whether the “heat is dissipated” to the point where one can “read out the result”.

But how do we break out of what seems like an infinite regress? The point is that whatever mind is ultimately forming the impression that is “the observation” is inevitably the final arbiter. And, yes, this could mean that we’d always have to start discussing all sorts of details about photoreceptors and neurons and so on. But—as we’ve discussed at length—the key point that makes a general observer theory possible is that there are many conclusions that can be drawn for large classes of observers, quite independent of these details.

But, OK, what happens if we think about the raw ruliad? Now all we have are emes and elementary events updating the configuration of them. And in a sense we’re “fishing out of this” pieces that represent observers, and pieces that represent things they’re observing. Can we “assess the cost of observation” here? It really depends on the fundamental scale of what we consider to be observers. And in fact we might even think of our scale as observers (say measured in emes or elementary events) as defining a “fundamental constant of nature”—at least for the universe as we perceive it. But given this scale, we can for example ask for “consensus” to develop across it, or at least for every eme in it to have had time to communicate with every other.

In an attempt to formalize the “cost of observation” we’ll inevitably have to make what seem like arbitrary choices, just as we would in setting up a scheme to determine when an ongoing computational process has “generated an answer”. But if we assume a certain boundedness to our choices, we can expect that we’ll be able to draw definite conclusions, and in effect be able to construct an analog of computational complexity theory for processes of observation.

My goal here has been to explore some of the key concepts and principles needed to create a framework that we can call observer theory. But what I’ve done is just the beginning, and there is much still to be done in fleshing out the theory and investigating its implications.

One important place to start is in making more explicit models of the “mechanics of observation”. At the level of the general theory, it’s all about equivalencing. But how specifically is that equivalencing achieved in particular cases? There are many thousands of kinds of sensors, measuring devices, analysis methods, etc. All of these should be systematically inventoried and classified. And in each case there’s a metamodel to be made, that clarifies just how equivalencing is achieved, and, for example, what separation of physical (or other) scales makes it possible.

Human experience and human minds are the inspiration—and ultimate grounding—for our concept of an observer. And insofar as neural nets trained on what amounts to human experience have emerged as somewhat faithful models for what human minds do, we can expect to use them as a fairly detailed proxy for observers like us. So, for example, we can imagine exploring things like quantum observers by studying multiway generalizations of neural nets. (And this is something that becomes easier if instead of organizing their data into real-number weights we can “atomize” neural nets into purely discrete elements.)

Such investigations of potentially realistic models provide a useful “practical grounding” for observer theory. But to develop a general observer theory we need a more formal notion of an observer. And there is no doubt a whole abstract framework—perhaps using methods from areas like category theory—that can be developed purely on the basis of our concept of observers being about equivalencing.

But to understand the connection of observer theory to things like science as done by us humans, we need to tighten up what it means to be an “observer like us”. What exactly are all the general things we “believe about ourselves”? As we discussed above, many of these we take so much for granted that it’s challenging to identify them as actually just “beliefs” that in principle don’t have to be that way.

But I suspect that the more we can tighten up our definition of “observers like us”, the more we’ll be able to explain why we perceive the world the way we do, and attribute to it the laws and properties we do. Is there some feature of us as observers, for example, that makes us “parse” the physical world as being three-dimensional? We could represent the same data about what’s out there by assigning a one-dimensional (“space-filling”) coordinate to everything. But somehow observers like us don’t do that. And instead, in effect, we “probe the ruliad” by sampling it in what we perceive as 3D slices. (And, yes, the most obvious coarse graining just considers progressively larger geodesic balls, say in the spatial hypergraphs that appear in our Physics Project—but that’s probably at best just an approximation to the sampling observers like us do.)

As part of our Physics Project we’ve discovered that the structure of the three main theories of twentieth-century physics (statistical mechanics, general relativity and quantum mechanics) can be derived from properties of the ruliad just by knowing that observers like us are computationally bounded and believe we’re persistent in time. But how might we reach, say, the Standard Model of particle physics—with all its particular values of parameters, etc.? Some may be inevitable, given the underlying structure of our theory. But others, one suspects, are in effect reflections of aspects of us as observers. They are “derivable”, but only given our particular character—or beliefs—as observers. And, yes, presumably things like the “constant of nature” that characterizes “our size in emes” will appear in the laws we attribute to the universe as we perceive it.

And, by the way, these considerations of “observers like us” extend beyond physical observers. Thus, for example, as we tighten up our characterization of what we’re like as mathematical observers, we can expect that this will constrain the “possible laws of our mathematical universe”. We might have thought that we could “pick whatever axioms we want”, in effect sampling the ruliad to get any mathematics we want. But, presumably, observers like us can’t do this—so that questions like “Is the continuum hypothesis true?” can potentially have definite answers for any observers like us, and for any coherent mathematics that we build.

But in the end, do we really have to consider observers whose characteristics are grounded in human experience? We already reflexively generalize our own personal experiences to those of other humans. But can we go further? We don’t have the internal experience of being a dog, an ant colony, a computer, or an ocean. And typically at best we anthropomorphize such things, trying to reduce the behavior we perceive in them to elements that align with our own human experience.

But are we as humans just stuck with a particular kind of “internal experience”? The growth of technology—and in particular sensors and measuring devices—has certainly expanded the range of inputs that can be delivered to our brains. And the growth of our collective knowledge about the world has expanded our ways of representing and thinking about things. Right now those are basically our only ways of modifying our detailed “internal experience”. But what if we were to connect directly—and internally—into our brains?

Presumably, at least at first, we’d need the “neural user interface” to be familiar—and we’d be forced into, for example, concentrating everything into a single thread of experience. But what if we allowed “multiway experience”? Well, of course our brains are already made up of billions of neurons that each do things. But it seems to be a core feature of human experience that we concentrate those things to give a single thread of experience. And that seems to be an essential feature of being an “observer like us”.

That kind of concentration also happens in a flock of birds, an ant colony—or a human society. In all these cases, each individual organism “does their thing”. But somehow collective “decisions” get made, with many different detailed situations getting equivalenced together to leave only the “final decision”. So from the outside, the system behaves as we would expect of an “observer like us”. Internally, that kind of “observer behavior” is happening “above the experience” of each single individual. But still, at the level of the “hive mind” it’s behavior typical of an observer like us.

That’s not to say, though, that we can readily imagine what it’s like to be a system like this, or even to be one of its parts. And in the effort to explore observer theory an important direction is to try to imagine ourselves having a different kind of experience than we do. And from “within” that experience, to try to see what kinds of laws we would attribute, say, to the physical universe.

In the early twentieth century, particularly in the context of relativity and quantum mechanics, it became clear that being “more realistic” about the observer was crucial in moving forward in science. Things like computational irreducibility—and even more so, our Physics Project—take that another step.

One used to imagine that science should somehow be “fundamentally objective”, and independent of all aspects of the observer. But what’s become clear is that it’s not. And that the nature of us as observers is actually crucial in determining what science we “experience”. But the crucial point is that there are often powerful conclusions that can be drawn even without knowing all the details of an observer. And that’s a central reason for building a general observer theory—in effect to give an objective way of formally and robustly characterizing what one might consider to be the subjective element in science.

There are no doubt many precursors of varying directness that can be found to the things I discuss here; I have not attempted a serious historical survey. In my own work, a notable precursor from 2002 is Chapter 10 of *A New Kind of Science*, entitled “Processes of Perception and Analysis”. I thank many people involved with our Wolfram Physics Project for related discussions, including Xerxes Arsiwalla, Hatem Elshatlawy and particularly Jonathan Gorard.

Multiway systems are all about situations where there can in effect be many possible paths of history. In a typical standard computational system like a cellular automaton, there’s always just one path, defined by evolution from one state to the next. But in a multiway system, there can be many possible next states—and thus many possible paths of history. Multiway systems have a central role in our Physics Project, particularly in connection with quantum mechanics. But what’s now emerging is that multiway systems in fact serve as a quite general foundation for a whole new “multicomputational” paradigm for modeling.

My objective here is twofold. First, I want to use multiway systems as minimal models for growth processes based on aggregation and tiling. And second, I want to use this concrete application as a way to develop further intuition about multiway systems in general. Elsewhere I have explored multiway systems for strings, multiway systems based on numbers, multiway Turing machines, multiway combinators, multiway expression evaluation and multiway systems based on games and puzzles. But in studying multiway systems for aggregation and tiling, we’ll be dealing with something that is immediately more physical and tangible.

When we think of “growth by aggregation” we typically imagine a “random process” in which new pieces get added “at random” to something. But each of these “random possibilities” in effect defines a different path of history. And the concept of a multiway system is to capture all those possibilities together. In a typical random (or “stochastic”) model one’s just tracing a single path of history, and one imagines one doesn’t have enough information to say which path it will be. But in a multiway system one’s looking at all the paths. And in doing so, one’s in a sense making a model for the “whole story” of what can happen.

The choice of a single path can be “nondeterministic”. But the whole multiway system is deterministic. And by studying that “deterministic whole” it’s often possible to make useful, quite general statements.

One can think of a particular moment in the evolution of a multiway system as giving something like an ensemble of states of the kind studied in statistical mechanics. But the general concept of a multiway system, with its discrete branching at discrete steps, depends on a level of fundamental discreteness that’s quite unfamiliar from traditional statistical mechanics—though is perfectly straightforward to define in a computational, or even mathematical, way.

For aggregation it’s easy enough to set up a minimal discrete model—at least if one allows explicit randomness in the model. But a major point of what we’ll do here is to “go above” that randomness, setting up our model in terms of a whole, deterministic multiway system.

What can we learn by looking at this whole multiway system? Well, for example, we can see whether there’ll always be growth—whatever the random choices may be—or whether the growth will sometimes, or even always, stop. And in many practical applications (think, for example, tumors) it can be very important to know whether growth always stops—or through what paths it can continue.

A lot of what we’ll at first do here involves seeing the effect of local constraints on growth. Later on, we’ll also look at effects of geometry, and we’ll study how objects of different shapes can aggregate, or ultimately tile.

The models we’ll introduce are in a sense very minimal—combining the simplest multiway structures with the simplest spatial structures. And with this minimality it’s almost inevitable that the models will show up as idealizations of all sorts of systems—and as foundations for good models of these systems.

At first, multiway systems can seem rather abstract and difficult to grasp—and perhaps that’s inevitable given our human tendency to think sequentially. But by seeing how multiway systems play out in the concrete case of growth processes, we get to build our intuition and develop a more grounded view—that will stand us in good stead in exploring other applications of multiway systems, and in general in coming to terms with the whole multicomputational paradigm.

Let’s start with the ultimate minimal model for random discrete growth (often called the Eden model). On a square grid, start with one black cell, then at each step randomly attach a new black cell somewhere onto the growing “cluster”:

After 10,000 steps we might get:
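As a concrete illustration, here is a minimal Python sketch of this kind of growth. (This is a hypothetical reimplementation for illustration; the pictures here were produced with Wolfram Language.)

```python
import random

def eden_grow(steps, seed=1):
    """Eden model: repeatedly add a random empty cell that is
    4-adjacent to the growing cluster on the square grid."""
    rng = random.Random(seed)
    cluster = {(0, 0)}
    for _ in range(steps):
        # all empty sites adjacent to the cluster (sorted for determinism)
        frontier = sorted(
            {(x + dx, y + dy)
             for x, y in cluster
             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))} - cluster)
        cluster.add(rng.choice(frontier))
    return cluster

cluster = eden_grow(1000)
print(len(cluster))  # 1001: the initial cell plus one cell per step
```

Each run of this traces a single random path of history; the multiway system we construct next captures all such paths together.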

But what are all the possible things that can happen? For that, we can construct a multiway system:

A lot of these clusters differ only by a trivial translation; canonicalizing by translation we get

or after another step:

If we also reduce out rotations and reflections we get

or after another step:

The possible clusters after *t* steps are just the possible polyominoes (or “square lattice animals”) with *t* cells. The number of these for successive *t* is

growing roughly like *k^t* for large *t*.
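These counts can be reproduced by enumerating the multiway states directly, canonicalizing by translation. A Python sketch (an illustrative reimplementation, not the code used here):

```python
def canon(cells):
    """Canonicalize a cluster by translating its bounding box to the origin."""
    xmin = min(x for x, y in cells)
    ymin = min(y for x, y in cells)
    return frozenset((x - xmin, y - ymin) for x, y in cells)

def step(shapes):
    """One multiway step: every way of attaching one new 4-adjacent cell."""
    out = set()
    for shape in shapes:
        for x, y in shape:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (x + dx, y + dy)
                if cell not in shape:
                    out.add(canon(shape | {cell}))
    return out

shapes = {frozenset({(0, 0)})}
counts = []
for _ in range(5):
    counts.append(len(shapes))
    shapes = step(shapes)
print(counts)  # [1, 2, 6, 19, 63]: fixed polyominoes with 1..5 cells
```

Canonicalizing additionally by rotation and reflection would give the free polyomino counts 1, 1, 2, 5, 12, … instead.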

By the way, canonicalization by translation always reduces the number of possible clusters by a factor of *t*. Canonicalization by rotation and reflection can reduce the number by a factor of 8 if the cluster has no symmetry (which for large clusters becomes increasingly likely), and by a smaller factor the more symmetry the cluster has, as in:

With canonicalization, the multiway graph after 7 steps has the form

and it doesn’t look any simpler with alternative rendering:

If we imagine that at each step, cells are added with equal probability at every possible position on the cluster, or equivalently that all outgoing edges from a given cluster in the uncanonicalized multiway graph are followed with equal probability, then we can get a distribution of probabilities for the distinct canonical clusters obtained—here shown after 7 steps:
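Schematically, such probabilities can be computed by propagating weights down the multiway graph and then aggregating by symmetry class. Here is a Python sketch for 4 steps rather than 7, to keep the enumeration small (again an illustrative reimplementation):

```python
from collections import defaultdict

def translate(cells):
    xmin = min(x for x, y in cells)
    ymin = min(y for x, y in cells)
    return frozenset((x - xmin, y - ymin) for x, y in cells)

def full_canon(cells):
    """Canonical form under translation, rotation and reflection."""
    forms, pts = [], cells
    for _ in range(4):
        pts = frozenset((-y, x) for x, y in pts)                      # rotate
        forms.append(translate(pts))
        forms.append(translate(frozenset((-x, y) for x, y in pts)))   # reflect
    return min(forms, key=sorted)

# propagate probabilities: every outgoing edge of the (uncanonicalized)
# multiway graph is followed with equal probability; since the number of
# growth sites depends only on the shape, translation classes suffice
probs = {frozenset({(0, 0)}): 1.0}
for _ in range(4):
    nxt = defaultdict(float)
    for shape, p in probs.items():
        sites = sorted({(x + dx, y + dy) for x, y in shape
                        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))}
                       - shape)
        for site in sites:
            nxt[translate(shape | {site})] += p / len(sites)
    probs = dict(nxt)

# aggregate by full symmetry class to get canonical-cluster probabilities
by_shape = defaultdict(float)
for shape, p in probs.items():
    by_shape[full_canon(shape)] += p
print(len(by_shape), round(sum(by_shape.values()), 6))  # 12 1.0
```

After 4 steps the 12 free pentominoes appear, with probabilities summing to 1.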

One feature of the large random cluster we saw at the beginning is that it has some holes in it. Clusters with holes start developing after 7 steps, with the smallest being:

This cluster can be reached through a subset of the multiway system:

And in fact in the limit of large clusters, the probability for there to be a hole seems to approach 1—even though the total fraction of area covered by holes approaches 0.

One way to characterize the “space of possible clusters” is to create a branchial graph by connecting every pair of clusters that have a common ancestor one step back in the multiway graph:

The connectedness of all these graphs reflects the fact that with the rule we’re using, it’s always possible at any step to go from one cluster to another by a sequence of delete-one-cell/add-one-cell changes.
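For small cases this connectedness can be checked directly, by constructing the branchial graph and counting its components. A Python sketch (illustrative only; canonicalization here includes rotation and reflection):

```python
from itertools import combinations

def translate(cells):
    xmin = min(x for x, y in cells)
    ymin = min(y for x, y in cells)
    return frozenset((x - xmin, y - ymin) for x, y in cells)

def canon(cells):
    """Canonical form under translation, rotation and reflection."""
    forms, pts = [], cells
    for _ in range(4):
        pts = frozenset((-y, x) for x, y in pts)                      # rotate
        forms.append(translate(pts))
        forms.append(translate(frozenset((-x, y) for x, y in pts)))   # reflect
    return min(forms, key=sorted)

def children(shape):
    """All canonicalized one-cell extensions of a shape."""
    return {canon(shape | {(x + dx, y + dy)})
            for x, y in shape
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if (x + dx, y + dy) not in shape}

level = {frozenset({(0, 0)})}
for t in range(4):
    nodes, edges = set(), set()
    for parent in level:
        kids = children(parent)
        nodes |= kids
        # branchial edges: connect every pair of children of the same parent
        edges |= {frozenset(pair) for pair in combinations(kids, 2)}
    rep = {n: n for n in nodes}          # union-find to count components
    def find(n):
        while rep[n] != n:
            rep[n] = rep[rep[n]]
            n = rep[n]
        return n
    for e in edges:
        a, b = tuple(e)
        rep[find(a)] = find(b)
    components = len({find(n) for n in nodes})
    print(t + 2, len(nodes), components)  # cluster size, nodes, components
    level = nodes
```

Through 5-cell clusters (the 12 free pentominoes), the branchial graph indeed has a single connected component.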

The branchial graphs here also show a 4-fold symmetry resulting from the symmetry of the underlying lattice. Canonicalizing the states, we get smaller branchial graphs that no longer show any such symmetry:

With the rule we’ve been discussing so far, a new cell to be attached can be anywhere on a cluster. But what if we limit growth, by requiring that new cells must have certain numbers of existing cells around them? Specifically, let’s consider rules that look at the neighbors around any given position, and allow a new cell there only if there are specified numbers of existing cells in the neighborhood.

Starting with a cross of black cells, here are some examples of random clusters one gets after 20 steps with all possible rules of this type (the initial “4” designates that these are 4-neighbor rules):

Rules that don’t allow new cells to end up with just one existing neighbor can only fill in corners in their initial conditions, and can’t grow any further. But any rule that allows growth with only one existing neighbor produces clusters that keep growing forever. And here are some random examples of what one can get after 10,000 steps:
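Such neighbor-count-constrained growth can be sketched in Python, with a set of allowed neighbor counts playing the role of the rule designation (so that `{1}` corresponds to 4:{1}, and `{1, 2, 3, 4}` to the unconstrained rule). This is an illustrative reimplementation, not the code behind these pictures:

```python
import random

def grow(allowed, steps, init, seed=1):
    """Aggregation where a new cell may be added only if the number of
    occupied 4-neighbors it would have is in `allowed`."""
    rng = random.Random(seed)
    cluster = set(init)
    for _ in range(steps):
        sites = sorted(
            s for s in {(x + dx, y + dy) for x, y in cluster
                        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))}
            - cluster
            if sum((s[0] + dx, s[1] + dy) in cluster
                   for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))
            in allowed)
        if not sites:
            break                # no allowed growth site: a terminal cluster
        cluster.add(rng.choice(sites))
    return cluster

# the 4:{1} rule: grow only where the new cell gets exactly one neighbor
cluster = grow({1}, 500, [(0, 0)])
edges = sum((x + 1, y) in cluster for x, y in cluster) \
      + sum((x, y + 1) in cluster for x, y in cluster)
print(len(cluster), edges)  # 501 500: the adjacency graph is a tree
```

Since under 4:{1} every added cell contributes exactly one adjacency, the cluster’s adjacency graph is always a tree, which is why these clusters can be represented by skeleton graphs.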

The last of these is the unconstrained (Eden model) rule we already discussed above. But let’s look more carefully at the first case—where there’s growth only if a new cell will end up with exactly one neighbor. The canonicalized multiway graph in this case is:

The possible clusters here correspond to polyominoes that are “always one cell wide” (i.e. have no 2×2 blocks), or, equivalently, have perimeter 2*t* + 2 at step *t*. The number of such canonicalized clusters grows like:

This is a decreasing fraction of the total number of polyominoes—implying that most large polyominoes do not take this “spindly” form.

A new feature of a rule with constraints is that not all locations around a cluster may allow growth. Here is a version of the multiway system above, with cells around each cluster annotated with green if new growth is allowed there, and red if it never can be:

In a larger random cluster, we can see that with this rule, most of the interior is “dead” in the sense that the constraint of the rule allows no further growth there:

By the way, the clusters generated by this rule can always be directly represented by their “skeleton graphs”:

Looking at random clusters for all the (grow-with-1-neighbor) rules above, we see different patterns of holes in each case:

There are altogether five types of cells being distinguished here, reflecting different neighbor configurations:

Here’s a sample cluster generated with the 4:{1,3} rule:

Some cells already have too many neighbors, and so can never be added to the cluster. Some have exactly the right number of neighbors to be added immediately. Others don’t currently have the right number of neighbors to grow, but might become addable if suitable neighbors are filled in first. And sometimes, when the neighbors of such a cell do get filled in, they will actually prevent the cell from ever being added; in the particular case shown here that happens with the 2×2 blocks of cells.

The multiway graphs from the rules shown here are all qualitatively similar, but there are detailed differences. In particular, at least for many of the rules, an increasing number of states are “missing” relative to what one gets with the grow-in-all-cases 4:{1,2,3,4} rule—or, in other words, there are an increasing number of polyominoes that can’t be generated given the constraints:

The first polyomino that can’t be reached (which occurs at step 4) is:

At step 6 the polyominoes that can’t be reached for rules 4:{1,3} and 4:{1,3,4} are

while for 4:{1} and 4:{1,4} the additional polyomino

can also not be reached.

At step 8, the polyomino

is reachable with 4:{1} and 4:{1,3} but not with 4:{1,4} and 4:{1,3,4}.

Of some note is that none of the rules that exclude polyominoes can reach:

What happens if one considers diagonal as well as orthogonal neighbors, giving a total of 8 neighbors around a cell? There are 256 possible rules in this case, corresponding to the possible subsets of `Range[8]`. Here are samples of what they do after 200 steps, starting from an initial cluster:

Two cases that at least initially show growth here are (the “8” designates that these are 8-neighbor rules):

In the {2} case, the multiway graph begins with:

One might assume that every branch in this graph would continue forever, and that growth would never “get stuck”. But it turns out that after 9 steps the following cluster is generated:

And with this cluster, no further growth is possible: no positions around the boundary have exactly 2 neighbors. In the multiway graph up to 10 steps, it turns out this is the only “terminal cluster” that can be generated—out of a total of 1115 possible clusters:

So how is that terminal cluster reached? Here’s the fragment of multiway graph that leads to it:

If we don’t prune off all the ways to “go astray”, the fragment appears as part of a larger multiway graph:

And if one follows all paths in the unpruned (and uncanonicalized) multiway graph at random (i.e. at each step, one chooses each branch with equal probability), it turns out that the probability of ever reaching this particular terminal cluster is just:

(And the fact that this number is fairly small implies that the system is far from confluent; there are many paths that, for example, don’t converge to the fixed point corresponding to this terminal cluster.)

If we keep going in the evolution of the multiway system, we’ll reach other terminal clusters; after 12 steps the following have appeared:

For the {3} rule above, the multiway system takes a little longer to “get going”:

Once again there are terminal clusters where the system gets stuck; the first of them appears at step 14:

And also once again the terminal cluster appears as an isolated node in the whole multiway system:

The fragment of multiway graph that leads to it is:

So far we’ve been finding terminal clusters by waiting for them to appear in the evolution of the multiway system. But there’s another approach, similar to what one might use in filling in something like a tiling. The idea is that every cell in a terminal cluster must have neighbors that don’t allow further growth. In other words, the terminal cluster must consist of certain “local tiles” for which the constraints don’t allow growth. But what configurations of local tiles are possible? To determine this, we turn the matching conditions for the tiles into logical expressions whose variables represent whether particular positions in the template do or do not contain cells in the cluster. By solving the satisfiability problem for the combination of these logical expressions, one finds configurations of cells that could conceivably correspond to terminal clusters.
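Before setting up a full satisfiability search, one can at least write down the basic terminality test directly. A Python sketch for 8-neighbor rules (an illustrative check, not the search procedure used here):

```python
def neighbors8(x, y):
    """The 8 cells surrounding (x, y), including diagonals."""
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def is_terminal(cluster, allowed={2}):
    """A cluster is terminal if no empty site around it has an allowed
    number of occupied 8-neighbors (default: the 8:{2} rule)."""
    boundary = {n for c in cluster for n in neighbors8(*c)} - cluster
    return not any(
        sum(n in cluster for n in neighbors8(*site)) in allowed
        for site in boundary)

print(is_terminal({(0, 0)}))          # True: a lone cell cannot grow
print(is_terminal({(0, 0), (1, 0)}))  # False: a domino can still grow
```

A satisfiability-style search then amounts to looking for cell configurations in a bounded region for which this test succeeds.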

Following this procedure for the {2} rule with regions of up to 6×6 cells we find:

But now there’s an additional constraint. Assuming one starts from a connected initial cluster, any subsequent cluster generated must also be connected. Removing the non-connected cases we get:

So given these terminal clusters, what initial conditions can lead to them? To determine this we effectively have to invert the aggregation process—giving in the end a multiway graph that includes all initial conditions that can generate a given terminal cluster. For the smallest terminal cluster we get:

Our 4-cell “T” initial condition appears here—but we see that there are also even smaller 2-cell initial conditions that lead to the same terminal cluster.

For all the terminal clusters we showed before, we can construct the multiway graphs starting with the minimal initial clusters that lead to them:

For terminal clusters like

there’s no nontrivial multiway system to show, since these clusters can only appear as initial conditions; they can never be generated in the evolution.

There are quite a few small clusters that can only appear as initial conditions, and do not have preimages under the aggregation rule. Here are the cases that fit in a 3×3 region:

The case of the {3} rule is fairly similar to the {2} rule. The possible terminal clusters up to 5×5 are:

However, most of these have only a fairly limited set of possible preimages:

For example we have:

And indeed beyond the (size-17) example we already showed above, no other terminal clusters that can be generated from a T initial condition appear here. Sampling further, however, additional terminal clusters appear (beginning at size 25):

The fragments of multiway graphs for the first few of these are:

We’ve seen above that for the rules we’ve been investigating, terminal clusters are quite rare among possible states in the multiway system. But what happens if we just evolve at random? How often will we wind up with a terminal cluster? When we say “evolve at random”, what we mean is that at each step we’re going to look at all possible positions where a new cell could be added to the cluster that exists so far, and then we’re going to pick with equal probability at which of these to actually add the new cell.

For the 8:{3} rule something surprising happens. Even though terminal clusters are rare in its multiway graph, it turns out that regardless of its initial conditions, it always eventually reaches a terminal cluster—though it often takes a while. And here, for example, are a few possible terminal clusters, annotated with the number of steps it took to reach them (which is also equal to the number of cells they contain):

The distribution of the number of steps to termination seems to be very roughly exponential (here based on a sample of 10,000 random cases)—with mean lifetime around 2300 and half-life around 7400:
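A random-evolution run of this kind can be sketched in Python. (Here with a small step cap, far below the lifetimes quoted above, purely for illustration; this is not the code behind these experiments.)

```python
import random

def neighbors8(x, y):
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def run_to_termination(allowed, init, max_steps, seed=1):
    """Randomly add allowed cells until no growth site remains,
    or until a step cap is hit."""
    rng = random.Random(seed)
    cluster = set(init)
    for step in range(max_steps):
        sites = sorted(
            s for s in {n for c in cluster for n in neighbors8(*c)} - cluster
            if sum(n in cluster for n in neighbors8(*s)) in allowed)
        if not sites:
            return cluster, step, True      # terminal cluster reached
        cluster.add(rng.choice(sites))
    return cluster, max_steps, False

# the 8:{3} rule, starting from a "T" tetromino
init = [(0, 0), (1, 0), (2, 0), (1, 1)]
cluster, steps, stuck = run_to_termination({3}, init, 500)
print(steps, stuck)
```

Since one cell is added per step, the number of cells in a terminal cluster always equals the number of steps taken plus the size of the initial condition.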

Here’s an example of a large terminal cluster—that takes 21,912 steps to generate:

And here’s a map showing when growth in different parts of this cluster occurred (with blue being earliest and red being latest):

This picture suggests that different parts of the cluster “actively grow” at different times, and if we look at a “spacetime” plot of where growth occurs as a function of time, we can confirm this:

And indeed what this suggests is that what’s happening is that different parts of the cluster are at first “fertile”, but later inevitably “burn out”—so that in the end there are no possible positions left where growth can occur.

But what shapes can the final terminal clusters form? We can get some idea by looking at a “compactness measure” (of the kind often used to study gerrymandering) that roughly gives the standard deviation of the distances from the center of each cluster to each of the cells in it. Both “very stringy” and “roughly circular” clusters are fairly rare; most clusters lie somewhere in between:

If we look not at the 8:{3} but instead at the 8:{2} rule, things are very different. Once again, it’s possible to reach a terminal cluster, as the multiway graph shows. But now random evolution almost never reaches a terminal cluster, and instead almost always “runs away” to generate an infinite cluster. The clusters generated in this case are typically much more “compact” than in the 8:{3} case

and this is also reflected in the “spacetime” version:

In building up our clusters so far, we’ve always been assuming that cells are added sequentially, one at a time. But if two cells are far enough apart, we can actually add them “simultaneously”, in parallel, and end up building the same cluster. We can think of the addition of each cell as being an “event” that updates the state of the cluster. Then—just like in our Physics Project, and other applications of multicomputation—we can define a causal graph that represents the causal dependencies between these events, and then foliations of this causal graph tell us possible overall sequences of updates, including parallel ones.

As an example, consider this sequence of states in the “always grow” 4:{1,2,3,4} rule—where at each step the cell that’s new is colored red (and we’re including the “nothing” state at the beginning):

Every transition between successive states defines an event:

There’s then causal dependence of one event on another if the cell added in the second event is adjacent to the one added in the first event. So, for example, there are causal dependencies like

and

where in the second case additional “spatially separated” cells have been added that aren’t involved in the causal dependence. Putting all the causal dependencies together, we get the complete causal graph for this evolution:

We can recover our original sequence of states by picking a particular ordering of these events (here indicated by the positions of the cells they add):

This path has the property that it always follows the direction of causal edges—and we can make that more obvious by using a different layout for the causal graph:

But in general we can use any ordering of events consistent with the causal graph. Another ordering (out of a total of 40,320 possibilities in this case) is

which gives the sequence of states

with the same final cluster configuration, but different intermediate states.
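Counting the orderings consistent with a causal graph is a linear-extension computation, which for small cases can be done by dynamic programming over subsets of events. A Python sketch (4-adjacency assumed, as in the rule above; the small example sequence is hypothetical, not the one pictured):

```python
from functools import lru_cache

def causal_edges(cells):
    """Direct causal dependencies for a sequence of added cells: event j
    depends on event i if the cell added at j is 4-adjacent to the cell
    added at i, for i < j."""
    deps = []
    for j, (xj, yj) in enumerate(cells):
        for i, (xi, yi) in enumerate(cells[:j]):
            if abs(xi - xj) + abs(yi - yj) == 1:
                deps.append((i, j))
    return deps

def count_orderings(n, edges):
    """Number of event orderings consistent with the causal graph
    (its linear extensions), by dynamic programming over subsets."""
    preds = [set() for _ in range(n)]
    for i, j in edges:
        preds[j].add(i)

    @lru_cache(maxsize=None)
    def count(done):
        if done == (1 << n) - 1:
            return 1
        return sum(count(done | 1 << v)
                   for v in range(n)
                   if not done >> v & 1
                   and all(done >> p & 1 for p in preds[v]))
    return count(0)

# a small growth sequence: cells in the order they were added
seq = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(count_orderings(len(seq), causal_edges(seq)))  # 2
```

For this 4-event example only two orderings respect the causal edges; with no causal edges at all, every permutation of the events would be allowed.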

But now the point is that the constraints implied by the causal graph do not require all events to be applied sequentially. Some events can be considered “spacelike separated” and so can be applied simultaneously. And in fact, any foliation of the causal graph defines a certain sequence for applying events—either sequentially or in parallel. So, for example, here is one particular foliation of the causal graph (shown with two different renderings for the causal graph):

And here is the corresponding sequence of states obtained:

And since in some slices of this foliation multiple events happen “in parallel”, it’s “faster” to get to the final configuration. (As it happens, this foliation is like a “cosmological rest frame foliation” in our Physics Project, and involves the maximum possible number of events happening on each slice.)

Different foliations (and there are a total of 678,972 possibilities in this case) will give different sequences of states, but always the same final state:

Note that nothing we’ve done here depends on the particular rule we’ve used. So, for example, for the 8:{2} rule with sequence of states

the causal graph is:

It’s worth commenting that everything we’ve done here has been for particular sequences of states, i.e. particular paths in the multiway graph. And in effect what we’re doing is the analog of classical spacetime physics—tracing out causal dependencies in particular evolution histories. But in general we could look at the whole multiway causal graph, with events that are not only timelike or spacelike separated, but also branchlike separated. And if we make foliations of this graph, we’ll end up not only with “classical” spacetime states, but also “quantum” superposition states that would need to be represented by something like multispace (in which at each spatial position, there is a “branchial stack” of possible cell values).

So far we’ve been considering aggregation processes in two dimensions. But what about one dimension? In 1D, a “cluster” just consists of a sequence of cells. The simplest rule allows a cell to be added whenever it’s adjacent to a cell that’s already there. Starting from a single cell, here’s a possible random evolution according to such a rule, shown evolving down the page:

We can also construct the multiway system for this rule:

Canonicalizing the states gives the trivial multiway graph:

But just as in the 2D case, things get less trivial if there are constraints on growth. For example, assume that before placing a new cell we count the number of existing cells that lie either distance 1 or distance 2 away, and allow the new cell only if that count is exactly 1. Then we get behavior like:

The corresponding multiway system is

or after canonicalization:

The number of distinct sequences after *t* steps here is given by

which can be expressed in terms of Fibonacci numbers, and for large *t* grows roughly like a power of the golden ratio.
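These counts can be verified by direct enumeration of the multiway states, canonicalizing by translation. A Python sketch (illustrative only):

```python
def canon(cells):
    """Canonicalize a 1D cluster by translating its leftmost cell to 0."""
    m = min(cells)
    return frozenset(c - m for c in cells)

def sites(cluster):
    """Positions where a new cell would have exactly one existing cell
    at distance 1 or 2."""
    candidates = {c + d for c in cluster for d in (-2, -1, 1, 2)} - cluster
    return [x for x in candidates
            if sum(x + d in cluster for d in (-2, -1, 1, 2)) == 1]

level = {frozenset({0})}
counts = []
for _ in range(5):
    counts.append(len(level))
    level = {canon(cluster | {x}) for cluster in level for x in sites(cluster)}
print(counts)  # [1, 2, 3, 5, 8]: Fibonacci numbers
```

The states after *t* steps correspond to the compositions of *t* + 1 into parts 1 and 2 (the Morse-code-like block structure described below), which is exactly what the Fibonacci numbers count.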

The rule in effect generates all possible Morse-code-like sequences, consisting of runs of either 2-cell (“long”) black blocks or 1-cell (“short”) black blocks, interspersed by “gaps” of single white cells.

The branchial graphs for this system have the form:

Looking at random evolutions for all possible rules of this type we get:

The corresponding canonicalized multiway graphs are:

The rules we’ve looked at so far are purely totalistic: whether a new cell can be added depends only on the total number of cells in its neighborhood. But (much like, for example, in cellular automata) it’s also possible to have rules where whether one can add a new cell depends on the complete configuration of cells in a neighborhood. Mostly, however, such rules seem to behave very much like totalistic ones.

Other generalizations include, for example, rules with multiple “colors” of cells, and rules that depend either on the total number of cells of different colors, or their detailed configurations.

The kind of analysis we’ve done for 2D and 1D aggregation systems can readily be extended to 3D. As a first example, consider a rule in which cells can be added along each of the 6 coordinate directions in a 3D grid whenever they are adjacent to an existing cell. Here are some typical examples of random clusters formed in this case:

Taking successive slices through the first of these (and coloring by “age”) we get:

If we allow a cell to be added only when it is adjacent to just one existing cell (corresponding to the rule 6:{1}) we get clusters that from the outside look almost indistinguishable

but which have an “airier” internal structure:
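The 2D growth sketch carries over directly to 3D. For the 6:{1} rule, every new cell attaches to exactly one existing cell, so, just as in 2D, the cluster’s adjacency graph is always a tree (again an illustrative Python reimplementation):

```python
import random

DIRS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def grow3d(allowed, steps, seed=1):
    """3D aggregation on the cubic lattice: a new cell may be added only
    if its number of occupied face-adjacent neighbors is in `allowed`."""
    rng = random.Random(seed)
    cluster = {(0, 0, 0)}
    for _ in range(steps):
        sites = sorted(
            s for s in {(c[0] + d[0], c[1] + d[1], c[2] + d[2])
                        for c in cluster for d in DIRS} - cluster
            if sum((s[0] + d[0], s[1] + d[1], s[2] + d[2]) in cluster
                   for d in DIRS) in allowed)
        if not sites:
            break
        cluster.add(rng.choice(sites))
    return cluster

# the 6:{1} rule: each new cell gets exactly one face-adjacent neighbor
cluster = grow3d({1}, 400)
edges = sum((x + 1, y, z) in cluster for x, y, z in cluster) \
      + sum((x, y + 1, z) in cluster for x, y, z in cluster) \
      + sum((x, y, z + 1) in cluster for x, y, z in cluster)
print(len(cluster), edges)  # 401 400: a tree, hence the "airy" structure
```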

Much like in 2D, with 6 neighbors, there can’t be unbounded growth unless cells can be added when there is just one cell in the neighborhood. But in analogy to what happens in 2D, things get more complicated when we allow “corner adjacency” and have a 26-cell neighborhood.

If cells can be added whenever there’s at least one adjacent cell, the results are similar to the 6-neighbor case, except that now there can be “corner-adjacent outgrowths”

and the whole structure is “still airier”:

Little qualitatively changes for a rule like 26:{2} where growth can occur only with exactly 2 neighbors (here starting with a 3D dimer):

But the general question of when there is growth, and when not, is quite complicated and subtle. In particular, even with a specific rule, there are often some initial conditions that can lead to unbounded growth, and others that cannot.

Sometimes there is growth for a while, but then it stops. For example, with the rule 26:{9}, one possible path of evolution from a 3×3×3 block is:

The full multiway graph in this case terminates, confirming that no unbounded growth is ever possible:

With other initial conditions, however, this rule can grow for longer (here shown every 10 steps):

And from what one can tell, all rules 26:{*n*} lead to unbounded growth for sufficiently small *n*, and do not for larger *n*.

So far, we’ve been looking at “filling in cells” in grids—in 2D, 1D and 3D. But we can also look at just “placing tiles” without a grid, with each new tile attaching edge to edge to an existing tile.

For square tiles, there isn’t really a difference:

And the multiway system is just the same as for our original “grow anywhere” rule on a 2D grid:

Here’s now what happens for triangular tiles:

The multiway graph now generates all polyiamonds (triangular polyforms):

And since equilateral triangles can tessellate in a regular lattice, we can think of this—like the square case—as “filling in cells in a lattice” rather than just “placing tiles”. Here are some larger examples of random clusters in this case:

Essentially the same happens with regular hexagons:

The multiway graph generates all polyhexes:

Here are some examples of larger clusters—showing somewhat more “tendrils” than the triangular case:

And in an “effectively lattice” case like this we could also go on and impose constraints on neighborhood configurations, much as we did in earlier sections above.

But what happens if we consider shapes that do not tessellate the plane—like regular pentagons? We can still “sequentially place tiles” with the constraint that any new tile can’t overlap an existing one. And with this rule we get for example:

Here are some “randomly grown” larger clusters—showing all sorts of irregularly shaped interstices inside:

(And, yes, generating such pictures correctly is far from trivial. In the “effectively lattice” case, coincidences between polygons are fairly easy to determine exactly. But in something like the pentagon case, doing so requires solving equations in a high-degree algebraic number field.)

The multiway graph, however, does not show any immediately obvious differences from the ones for “effectively lattice” cases:

It’s slightly easier to see what’s going on if we riffle the results on the last step we show:

The branchial graphs in this case have the form:

Here’s a larger cluster formed from pentagons:

And remember that the way this is built is to add one pentagon at each step, testing every “exposed edge” and seeing in which cases a pentagon will “fit”. As in all our other examples, there is no preference given to “external” versus “internal” edges.

Note that whereas “effectively lattice” clusters always eventually fill in all their holes, this isn’t true for something like the pentagon case. And in this case it appears that in the limit, about 28% of the overall area is taken up by holes. And, by the way, there’s a definite “zoo” of at least small possible holes, here plotted with their (logarithmic) probabilities:

So what happens with other regular polygons? Here’s an example with octagons (and in this case the limiting total area taken up by holes is about 35%):

And, by the way, here’s the “zoo of holes” in this case:

With pentagons, it’s pretty clear that difficult-to-resolve geometrical situations will arise. And one might have thought that octagons would avoid these. But there are still plenty of strange “mismatches” like

that aren’t easy to characterize or analyze. By the way, one should note that any time a “closed hole” is formed, the vectors corresponding to the edges that form its boundary must sum to zero—in effect defining an equation.

When the number of sides in the regular polygon gets large, our clusters will approximate circle packings. Here’s an example with 12-gons:

But of course because we’re insisting on adding one polygon at a time, the resulting structure is much “airier” than a true circle packing—of the kind that would be obtained (at least in 2D) by “pushing on the edges” of the cluster.

In the previous section we considered “sequential tilings” constructed from regular polygons. But the methods we used are quite general, and can be applied to sequential tilings formed from any shape—or shapes (or, at least, any shapes for which “attachment edges” can be identified).

As a first example, consider a domino or dimer shape—which we assume can be oriented both vertically and horizontally:

Here’s a somewhat larger cluster formed from dimers:

Here’s the canonicalized multiway graph in this case:

And here are the branchial graphs:

So what about other polyomino shapes? What happens when we try to sequentially tile with these—effectively making “polypolyominoes”?

Here’s an example based on an L-shaped polyomino:

Here’s a larger cluster

and here’s the canonicalized multiway graph after just 1 step

and after 2 steps:

The only other 3-cell polyomino is the straight tromino:

(For dimers, the limiting fraction of area covered by holes seems to be about 17%, while for the L-shaped and straight trominoes it’s about 27%.)

Going to 4 cells, there are 5 possible polyominoes—and here are samples of random clusters that can be built with them (note that in the last case shown, we require only that “subcells” of the 2×2 polyomino align):
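That count of 5 four-cell shapes can be checked by direct enumeration. Here’s a small Python sketch that grows polyominoes one cell at a time and identifies shapes up to the 8 grid symmetries (rotations and reflections), reproducing the known counts 1, 1, 2, 5, 12 for 1 through 5 cells:

```python
def free_polyominoes(n):
    """Count free polyominoes (connected cell sets, identified up to
    rotation and reflection) by growing one cell at a time."""
    def canonical(cells):
        best = None
        for flip in (False, True):
            pts = [(-x, y) if flip else (x, y) for x, y in cells]
            for _ in range(4):
                pts = [(-y, x) for x, y in pts]             # rotate 90 degrees
                mx = min(x for x, y in pts)
                my = min(y for x, y in pts)
                norm = tuple(sorted((x - mx, y - my) for x, y in pts))
                if best is None or norm < best:
                    best = norm
        return best

    shapes = {canonical([(0, 0)])}
    for _ in range(n - 1):
        grown = set()
        for shape in shapes:
            for x, y in shape:
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    c = (x + dx, y + dy)
                    if c not in shape:
                        grown.add(canonical(list(shape) + [c]))
        shapes = grown
    return len(shapes)

assert [free_polyominoes(n) for n in (1, 2, 3, 4, 5)] == [1, 1, 2, 5, 12]
```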

The corresponding multiway graphs are:

Continuing for more steps in a few cases:

Some polyominoes are “more awkward” to fit together than others—so these typically give clusters of “lower density”:

So far, we’ve always considered adding new polyominoes so that they “attach” on any “exposed edge”. And the result is that we can often get long “tendrils” in our clusters of polyominoes. But an alternative strategy is to try to add polyominoes as “compactly” as possible, in effect by adding successive “rings” of polyominoes (with “older” rings here colored bluer):

In general there are many ways to add these rings, and eventually one will often get stuck, unable to add polyominoes without leaving holes—as indicated by the red annotation here:

Of course, that doesn’t mean that if one was prepared to “backtrack and try again”, one couldn’t find a way to extend the cluster without leaving holes. And indeed for the polyomino we’re looking at here it’s perfectly possible to end up with “perfect tilings” in which no holes are left:

In general, we could consider all sorts of different strategies for growing clusters by adding polyominoes “in parallel”—just like in our discussion of causal graphs above. And if we add polyominoes “a ring at a time” we’re effectively making a particular choice of foliation—in which the successive “ring states” turn out to be directly analogous to what we call “generational states” in our Physics Project.

If we allow holes (and don’t impose other constraints), then it’s inevitable that—just with ordinary, sequential aggregation—we can grow an unboundedly large cluster of polyominoes of any shape, just by always attaching one edge of each new polyomino to an “exposed” edge of the existing cluster. But if we don’t allow holes, it’s a different story—and we’re talking about a traditional tiling problem, where there are ultimately cases where tiling is impossible, and only limited-size clusters can be generated.

As it happens, all polyominoes with 6 or fewer cells do allow infinite tilings. But with 7 cells the following do not:

It’s perfectly possible to grow random clusters with these polyominoes—but they tend not to be at all compact, and to have lots of holes and tendrils:

So what happens if we try to grow clusters in rings? Here are all the possible ways to “surround” the first of these polyominoes with a “single ring”:

And it turns out that in every single case, there are edges (indicated here in red) where the cluster can’t be extended—thereby demonstrating that no infinite tiling is possible with this particular polyomino.

By the way, much like we saw with constrained growth on a grid, it’s possible to have “tiling regions” that can extend only a certain limited distance, then always get stuck.

It’s worth mentioning that we’ve considered here the case of single polyominoes. It’s also possible to consider being able to add a whole set of possible polyominoes—“*Tetris* style”.

We’ve looked at polyominoes—and shapes like pentagons—that don’t tile the plane. But what about shapes that can tile the plane, but only nonperiodically? As an example, let’s consider Penrose tiles. The basic shapes of these tiles are

though there are additional matching conditions (implicitly indicated by the arrows on each tile), which can be enforced either by putting notches in the tiles or by decorating the tiles:

Starting with these individual tiles, we can build up a multiway system by attaching tiles wherever the matching rules are satisfied (note that all edges of both tiles are the same length):

So how can we tell that these tiles can form a nonperiodic tiling? One approach is to generate a multiway system in which at successive steps we surround clusters with rings in all possible ways:

Continuing for another step we get:

Notice that here some of the branches have died out. But the question is which branches will continue forever, and thus lead to an infinite tiling. To answer this we have to do a bit of analysis.

The first step is to see what possible “rings” can have formed around the original tile. And we can read all of these off from the multiway graph:

But now it’s convenient to look not at possible rings around a tile, but instead at possible configurations of tiles that can surround a single vertex. There turns out to be the following limited set:

The last two of these configurations have the feature that they can’t be extended: no tile can be added on the center of their “blue sides”. But it turns out that all the other configurations can be extended—though only to make a nested tiling, not a periodic one.

And a first indication of this is that larger copies of tiles (“supertiles”) can be drawn on top of the first three configurations we just identified, in such a way that the vertices of the supertiles coincide with vertices of the original tiles:

And now we can use this to construct rules for a substitution system:

Applying this substitution system builds up a nested tiling that can be continued forever:

But is such a nested tiling the only one that is possible with our original tiles? We can prove that it is by showing that every tile in every possible configuration occurs within a supertile. We can pull out possible configurations from the multiway system—and then in each case it turns out that we can indeed find a supertile in which the original tile occurs:

And what this all means is that the only infinite paths that can occur in the multiway system are ones that correspond to nested tilings; all other paths must eventually die out.
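One quantitative way to see why such a nested tiling cannot be periodic comes from counting tiles under the substitution. Taking as an assumption the standard Penrose substitution counts (each “large” tile yields 2 large and 1 small tile, each “small” tile yields 1 large and 1 small), the ratio of tile counts tends to the golden ratio. Since that is irrational, no periodic tiling could realize these tile frequencies:

```python
# Iterate the (assumed) Penrose substitution counts: large -> 2 large + 1
# small, small -> 1 large + 1 small. The large:small ratio converges to the
# golden ratio, an irrational number, ruling out periodic tile frequencies.
large, small = 1, 0
for _ in range(20):
    large, small = 2 * large + small, large + small

golden = (1 + 5 ** 0.5) / 2
assert abs(large / small - golden) < 1e-6
```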

The Penrose tiling involves two distinct tiles. But in 2022 it was discovered that—if one’s allowed to flip the tile over—just a single (“hat”) tile is sufficient to force a nonperiodic tiling:

The full multiway graph obtained from this tile (and its flip-over) is complicated, but many paths in it lead (at least eventually) to “dead ends” which cannot be further extended. Thus, for example, the following configurations—which appear early in the multiway graph—all have the property that they can’t occur in an infinite tiling:

In the first case here, we can successively add a few rings of tiles:

But after 7 rings, there is a “contradiction” on the boundary, and no further growth is possible (as indicated by the red annotations):

Having eliminated cases that always lead to “dead ends”, we get a simplified multiway graph that effectively includes all joins between hat tiles that can ultimately lead to surviving configurations:

Once again we can define a supertile transformation

where the region outlined in red can potentially overlap another supertile. Now we can construct a multiway graph for the supertile (in its “bitten out” and full variants)

and can see that there is a (one-to-one) map between the multiway graph for the original tiles and the one for these supertiles:

And now from this we can tell that there can be arbitrarily large nested tilings using the hat tile:

Tucked away on page 979 of my 2002 book *A New Kind of Science* is a note (written in 1995) on “Generalized aggregation models”:

And in many ways the current piece is a three-decade-later followup to that note—using a new approach based on multiway systems.

In *A New Kind of Science* I did discuss multiway systems (both abstractly, and in connection with fundamental physics). But what I said about aggregation was mostly in a section called “The Phenomenon of Continuity” which discussed how randomness could on a large scale lead to apparent continuity. That section began by talking about things like random walks, but went on to discuss the same minimal (“Eden model”) example of “random aggregation” that I give here. And then, in an attempt to “spruce up” my discussion of aggregation, I started looking at “aggregation with constraints”. In the main text of the book I gave just two examples:

But then for the footnote I studied a wider range of constraints (enumerating them much as I had cellular automata)—and noticed the surprising phenomenon that with some constraints the aggregation process could end up getting stuck, and not being able to continue.

For years I carried around the idea of investigating that phenomenon further. And it was often on my list as a possible project for a student to explore at the Wolfram Summer School. Occasionally it was picked, and progress was made in various directions. And then a few years ago, with our Physics Project in the offing, the idea arose of investigating it using multiway systems—and there were Summer School projects that made progress on this. Meanwhile, as our Physics Project progressed, our tools for working with multiway systems greatly improved—ultimately making possible what we’ve done here.

By the way, back in the 1990s, one of the many topics I studied for *A New Kind of Science* was tilings. And in an effort to determine what tilings were possible, I investigated what amounts to aggregation under tiling constraints—which is in fact even a generalization of what I consider here:

First and foremost, I’d like to thank Brad Klee for extensive help with this piece, as well as Nik Murzin for additional help. (Thanks also to Catherine Wolfram, Christopher Wolfram and Ed Pegg for specific pointers.) I’d like to thank various Wolfram Summer School students (and their mentors) who’ve worked on aggregation systems and their multiway interpretation in recent years: Kabir Khanna 2019 (mentors: Christopher Wolfram & Jonathan Gorard), Lina M. Ruiz 2021 (mentors: Jesse Galef & Xerxes Arsiwalla), Pietro Pepe 2023 (mentor: Bob Nachbar). (Also related are the Summer School projects on tilings by Bowen Ping 2023 and Johannes Martin 2023.)

Games and Puzzles as Multicomputational Systems

The Physicalization of Metamathematics and Its Implications for the Foundations of Mathematics

Multicomputation with Numbers: The Case of Simple Multiway Systems

Multicomputation: A Fourth Paradigm for Theoretical Science

Combinators: A Centennial View—Updating Schemes and Multiway Systems

*Transcript of a talk at TED AI on October 17, 2023, in San Francisco*

Human language. Mathematics. Logic. These are all ways to formalize the world. And in our century there’s a new and yet more powerful one: computation.

And for nearly 50 years I’ve had the great privilege of building an ever taller tower of science and technology based on that idea of computation. And today I want to tell you some of what that’s led to.

There’s a lot to talk about—so I’m going to go quickly… sometimes with just a sentence summarizing what I’ve written a whole book about.

You know, I last gave a TED talk thirteen years ago—in February 2010—soon after Wolfram|Alpha launched.

And I ended that talk with a question: is computation ultimately what’s underneath everything in our universe?

I gave myself a decade to find out. And actually it could have needed a century. But in April 2020—just after the decade mark—we were thrilled to be able to announce what seems to be the ultimate “machine code” of the universe.

And, yes, it’s computational. So computation isn’t just a possible formalization; it’s the ultimate one for our universe.

It all starts from the idea that space—like matter—is made of discrete elements. And that the structure of space and everything in it is just defined by the network of relations between these elements—that we might call atoms of space. It’s very elegant—but deeply abstract.

But here’s a humanized representation:

A version of the very beginning of the universe. And what we’re seeing here is the emergence of space and everything in it by the successive application of very simple computational rules. And, remember, those dots are not atoms in any existing space. They’re atoms *of* space—that are getting put together to *make* space. And, yes, if we kept going long enough, we could build our whole universe this way.

Eons later here’s a chunk of space with two little black holes, that eventually merge, radiating ripples of gravitational radiation:

And remember—all this is built from pure computation. But like fluid mechanics emerging from molecules, what emerges here is spacetime—and Einstein’s equations for gravity. Though there are deviations that we just might be able to detect. Like that the dimensionality of space won’t always be precisely 3.

And there’s something else. Our computational rules can inevitably be applied in many ways, each defining a different thread of time—a different path of history—that can branch and merge:

But as observers embedded in this universe, we’re branching and merging too. And it turns out that quantum mechanics emerges as the story of how branching minds perceive a branching universe.

The little pink lines here show the structure of what we call branchial space—the space of quantum branches. And one of the stunningly beautiful things—at least for a physicist like me—is that the same phenomenon that in physical space gives us gravity, in branchial space gives us quantum mechanics.

In the history of science so far, I think we can identify four broad paradigms for making models of the world—that can be distinguished by how they deal with time.

In antiquity—and in plenty of areas of science even today—it’s all about “what things are made of”, and time doesn’t really enter. But in the 1600s came the idea of modeling things with mathematical formulas—in which time enters, but basically just as a coordinate value.

Then in the 1980s—and this is something in which I was deeply involved—came the idea of making models by starting with simple computational rules and then just letting them run:

Can one predict what will happen? No, there’s what I call computational irreducibility: in effect the passage of time corresponds to an irreducible computation that we have to run to know how it will turn out.

But now there’s something even more: in our Physics Project things become multicomputational, with many threads of time, that can only be knitted together by an observer.

It’s a new paradigm—that actually seems to unlock things not only in fundamental physics, but also in the foundations of mathematics and computer science, and possibly in areas like biology and economics too.

You know, I talked about building up the universe by repeatedly applying a computational rule. But how is that rule picked? Well, actually, it isn’t. Because all possible rules are used. And we’re building up what I call the ruliad: the deeply abstract but unique object that is the entangled limit of all possible computational processes. Here’s a tiny fragment of it shown in terms of Turing machines:

OK, so the ruliad is everything. And we as observers are necessarily part of it. In the ruliad as a whole, everything computationally possible can happen. But observers like us can just sample specific slices of the ruliad.

And there are two crucial facts about us. First, we’re computationally bounded—our minds are limited. And second, we believe we’re persistent in time—even though we’re made of different atoms of space at every moment.

So then here’s the big result. What observers with those characteristics perceive in the ruliad necessarily follows certain laws. And those laws turn out to be precisely the three key theories of 20th-century physics: general relativity, quantum mechanics, and statistical mechanics and the Second Law.

It’s because we’re observers like us that we perceive the laws of physics we do.

We can think of different minds as being at different places in rulial space. Human minds who think alike are nearby. Animals further away. And further out we get to alien minds where it’s hard to make a translation.

How can we get intuition for all this? We can use generative AI to take what amounts to an incredibly tiny slice of the ruliad—aligned with images we humans have produced.

We can think of this as a place in the ruliad described using the concept of a cat in a party hat:

Zooming out, we see what we might call “cat island”. But pretty soon we’re in interconcept space. Occasionally things will look familiar, but mostly we’ll see things we humans don’t have words for.

In physical space we explore more of the universe by sending out spacecraft. In rulial space we explore more by expanding our concepts and our paradigms.

We can get a sense of what’s out there by sampling possible rules—doing what I call ruliology:

Even with incredibly simple rules there’s incredible richness. But the issue is that most of it doesn’t yet connect with things we humans understand or care about. It’s like when we look at the natural world and only gradually realize we can use features of it for technology. Even after everything our civilization has achieved, we’re just at the very, very beginning of exploring rulial space.

But what about AIs? Just like we can do ruliology, AIs can in principle go out and explore rulial space. But left to their own devices, they’ll mostly be doing things we humans don’t connect with, or care about.

The big achievements of AI in recent times have been about making systems that are closely aligned with us humans. We train LLMs on billions of webpages so they can produce text that’s typical of what we humans write. And, yes, the fact that this works is undoubtedly telling us some deep scientific things about the semantic grammar of language—and generalizations of things like logic—that perhaps we should have known centuries ago.

You know, for much of human history we were kind of like LLMs, figuring things out by matching patterns in our minds. But then came more systematic formalization—and eventually computation. And with that we got a whole other level of power—to create truly new things, and in effect to go wherever we want in the ruliad.

But the challenge is to do that in a way that connects with what we humans—and our AIs—understand.

And in fact I’ve devoted a large part of my life to building that bridge. It’s all been about creating a language for expressing ourselves computationally: a language for computational thinking.

The goal is to formalize what we know about the world—in computational terms. To have computational ways to represent cities and chemicals and movies and formulas—and our knowledge about them.

It’s been a vast undertaking—that’s spanned more than four decades of my life. It’s something unique and different. But I’m happy to report that in what has been Mathematica and is now the Wolfram Language I think we have now firmly succeeded in creating a truly full-scale computational language.

In effect, every one of the functions here can be thought of as formalizing—and encapsulating in computational terms—some facet of the intellectual achievements of our civilization:

It’s the most concentrated form of intellectual expression I know: finding the essence of everything and coherently expressing it in the design of our computational language. For me personally it’s been an amazing journey, year after year building the tower of ideas and technology that’s needed—and nowadays sharing that process with the world on open livestreams.

A few centuries ago the development of mathematical notation, and what amounts to the “language of mathematics”, gave a systematic way to express math—and made possible algebra, and calculus, and ultimately all of modern mathematical science. And computational language now provides a similar path—letting us ultimately create a “computational X” for all imaginable fields X.

We’ve seen the growth of computer science—CS. But computational language opens up something ultimately much bigger and broader: CX. For 70 years we’ve had programming languages—which are about telling computers in their terms what to do. But computational language is about something intellectually much bigger: it’s about taking everything we can think about and operationalizing it in computational terms.

You know, I built the Wolfram Language first and foremost because I wanted to use it myself. And now when I use it, I feel like it’s giving me a superpower:

I just have to imagine something in computational terms and then the language almost magically lets me bring it into reality, see its consequences and then build on them. And, yes, that’s the superpower that’s let me do things like our Physics Project.

And over the past 35 years it’s been my great privilege to share this superpower with many other people—and by doing so to have enabled such an incredible number of advances across so many fields. It’s a wonderful thing to see people—researchers, CEOs, kids—using our language to fluently think in computational terms, crispening up their own thinking and then in effect automatically calling in computational superpowers.

And now it’s not just people who can do that. AIs can use our computational language as a tool too. Yes, to get their facts straight, but even more importantly, to compute new facts. There are already some integrations of our technology into LLMs—and there’s a lot more you’ll be seeing soon. And, you know, when it comes to building new things, a very powerful emerging workflow is basically to start by telling the LLM roughly what you want, then have it try to express that in precise Wolfram Language. Then—and this is a critical feature of our computational language compared to a programming language—you as a human can “read the code”. And if it does what you want, you can use it as a dependable component to build on.

OK, but let’s say we use more and more AI—and more and more computation. What’s the world going to be like? From the Industrial Revolution on, we’ve been used to doing engineering where we can in effect “see how the gears mesh” to “understand” how things work. But computational irreducibility now shows that won’t always be possible. We won’t always be able to make a simple human—or, say, mathematical—narrative to explain or predict what a system will do.

And, yes, this is science in effect eating itself from the inside. From all the successes of mathematical science we’ve come to believe that somehow—if only we could find them—there’d be formulas to predict everything. But now computational irreducibility shows that isn’t true. And that in effect to find out what a system will do, we have to go through the same irreducible computational steps as the system itself.

Yes, it’s a weakness of science. But it’s also why the passage of time is significant—and meaningful. We can’t just jump ahead and get the answer; we have to “live the steps”.

It’s going to be a great societal dilemma of the future. If we let our AIs achieve their full computational potential, they’ll have lots of computational irreducibility, and we won’t be able to predict what they’ll do. But if we put constraints on them to make them predictable, we’ll limit what they can do for us.

So what will it feel like if our world is full of computational irreducibility? Well, it’s really nothing new—because that’s the story with much of nature. And what’s happened there is that we’ve found ways to operate within nature—even though nature can still surprise us.

And so it will be with the AIs. We might give them a constitution, but there will always be consequences we can’t predict. Of course, even figuring out societally what we want from the AIs is hard. Maybe we need a promptocracy where people write prompts instead of just voting. But basically every control-the-outcome scheme seems full of both political philosophy and computational irreducibility gotchas.

You know, if we look at the whole arc of human history, the one thing that’s systematically changed is that more and more gets automated. And LLMs just gave us a dramatic and unexpected example of that. So does that mean that in the end we humans will have nothing to do? Well, if you look at history, what seems to happen is that when one thing gets automated away, it opens up lots of new things to do. And as economies develop, the pie chart of occupations seems to get more and more fragmented.

And now we’re back to the ruliad. Because at a foundational level what’s happening is that automation is opening up more directions to go in the ruliad. And there’s no abstract way to choose between them. It’s just a question of what we humans want—and it requires humans “doing work” to define that.

A society of AIs untethered by human input would effectively go off and explore the whole ruliad. But most of what they’d do would seem to us random and pointless. Much like now most of nature doesn’t seem like it’s “achieving a purpose”.

One used to imagine that to build things that are useful to us, we’d have to do it step by step. But AI and the whole phenomenon of computation tell us that really what we need is more just to define what we want. Then computation, AI, automation can make it happen.

And, yes, I think the key to defining in a clear way what we want is computational language. You know—even after 35 years—for many people the Wolfram Language is still an artifact from the future. If your job is to program, it seems like a cheat: how come you can do in an hour what would usually take a week? But it can also be daunting, because having dashed off that one thing, you now have to conceptualize the next thing. Of course, it’s great for CEOs and CTOs and intellectual leaders who are ready to race onto the next thing. And indeed it’s impressively popular in that set.

In a sense, what’s happening is that Wolfram Language shifts from concentrating on mechanics to concentrating on conceptualization. And the key to that conceptualization is broad computational thinking. So how can one learn to do that? It’s not really a story of CS. It’s really a story of CX. And as a kind of education, it’s more like liberal arts than STEM. It’s part of a trend that when you automate technical execution, what becomes important is not figuring out how to do things—but what to do. And that’s more a story of broad knowledge and general thinking than any kind of narrow specialization.

You know, there’s an unexpected human-centeredness to all of this. We might have thought that with the advance of science and technology, the particulars of us humans would become ever less relevant. But we’ve discovered that that’s not true. And that in fact everything—even our physics—depends on how we humans happen to have sampled the ruliad.

Before our Physics Project we didn’t know if our universe really was computational. But now it’s pretty clear that it is. And from that we’re inexorably led to the ruliad—with all its vastness, so hugely greater than all the physical space in our universe.

So where will we go in the ruliad? Computational language is what lets us chart our path. It lets us humans define our goals and our journeys. And what’s amazing is that all the power and depth of what’s out there in the ruliad is accessible to everyone. One just has to learn to harness those computational superpowers. Which starts here. Our portal to the ruliad:


Enter any expression and it’ll get evaluated:

And internally—say in the Wolfram Language—what’s going on is that the expression is progressively being transformed using all available rules until no more rules apply. Here the process can be represented like this:

We can think of the yellow boxes in this picture as corresponding to “evaluation events” that transform one “state of the expression” (represented by a blue box) to another, eventually reaching the “fixed point” 12.
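As a minimal sketch of this rewriting process (in Python rather than Wolfram Language, and taking the expression to be (1 + (2 + 2)) + (3 + 4), an assumption consistent with the fixed point 12), each “evaluation event” replaces one innermost sum of two numbers by its value, until no rule applies:

```python
def step(expr):
    """Apply one evaluation event (leftmost-innermost), or return None if
    the expression is already at its fixed point."""
    if isinstance(expr, int):
        return None
    _, a, b = expr                       # expressions are ('+', a, b)
    if isinstance(a, int) and isinstance(b, int):
        return a + b                     # an evaluation event fires here
    for i, sub in ((1, a), (2, b)):
        new = step(sub)
        if new is not None:
            return ('+', new, b) if i == 1 else ('+', a, new)
    return None

expr = ('+', ('+', 1, ('+', 2, 2)), ('+', 3, 4))
events = 0
while True:
    nxt = step(expr)
    if nxt is None:                      # no more rules apply: fixed point
        break
    expr, events = nxt, events + 1

assert expr == 12 and events == 4
```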

And so far this may all seem very simple. But actually there are many surprisingly complicated and deep issues and questions. For example, to what extent can the evaluation events be applied in different orders, or in parallel? Does one always get the same answer? What about non-terminating sequences of events? And so on.

I was first exposed to such issues more than 40 years ago—when I was working on the design of the evaluator for the SMP system that was the forerunner of Mathematica and the Wolfram Language. And back then I came up with pragmatic, practical solutions—many of which we still use today. But I was never satisfied with the whole conceptual framework. And I always thought that there should be a much more principled way to think about such things—that would likely lead to all sorts of important generalizations and optimizations.

Well, more than 40 years later I think we can finally now see how to do this. And it’s all based on ideas from our Physics Project—and on a fundamental correspondence between what’s happening at the lowest level in all physical processes and in expression evaluation. Our Physics Project implies that ultimately the universe evolves through a series of discrete events that transform the underlying structure of the universe (say, represented as a hypergraph)—just like evaluation events transform the underlying structure of an expression.

And given this correspondence, we can start applying ideas from physics—like ones about spacetime and quantum mechanics—to questions of expression evaluation. Some of what this will lead us to is deeply abstract. But some of it has immediate practical implications, notably for parallel, distributed, nondeterministic and quantum-style computing. And from seeing how things play out in the rather accessible and concrete area of expression evaluation, we’ll be able to develop more intuition about fundamental physics and about other areas (like metamathematics) where the ideas of our Physics Project can be applied.

The standard evaluator in the Wolfram Language applies evaluation events to an expression in a particular order. But typically multiple orders are possible; for the example above, there are three:

So what determines what orders are possible? There is ultimately just one constraint: the causal dependencies that exist between events. The key point is that a given event cannot happen unless all the inputs to it are available, i.e. have already been computed. So in the example here, the evaluation event cannot occur unless the one has already occurred. And we can summarize this by “drawing a causal edge” from the event to the one. Putting together all these “causal relations”, we can make a causal graph, which in the example here has the simple form (where we include a special “Big Bang” initial event to create the original expression that we’re evaluating):

What we see from this causal graph is that the events on the left must all follow each other, while the event on the right can happen “independently”. And this is where we can start making an analogy with physics. Imagine our events are laid out in spacetime. The events on the left are “timelike separated” from each other, because they are constrained to follow one after another, and so must in effect “happen at different times”. But what about the event on the right? We can think of this as being “spacelike separated” from the others, and happening at a “different place in space” asynchronously from the others.

As a quintessential example of a timelike chain of events, consider making the definition

and then generating the causal graph for the events associated with evaluating `f[f[f[1]]]` (i.e. `Nest[f, 1, 3]`):

A straightforward way to get spacelike events is just to “build in space” by giving an expression like `f[1] + f[1] + f[1]` that has parts that can effectively be thought of as being explicitly “laid out in different places”, like the cells in a cellular automaton:

But one of the major lessons of our Physics Project is that it’s possible for space to “emerge dynamically” from the evolution of a system (in that case, by successive rewriting of hypergraphs). And it turns out very much the same kind of thing can happen in expression evaluation, notably with recursively defined functions.

As a simple example, consider the standard definition of Fibonacci numbers:

With this definition, the causal graph for the evaluation of `f[3]` is then:

For `f[5]`, dropping the “context” of each event, and showing only what changed, the graph is

while for `f[8]` the structure of the graph is:

So what is the significance of there being spacelike-separated parts in this graph? At a practical level, a consequence is that those parts correspond to subevaluations that can be done independently, for example in parallel. All the events (or subevaluations) in any timelike chain must be done in sequence. But spacelike-separated events (or subevaluations) don’t immediately have a particular relative order. The whole graph can be thought of as defining a partial ordering for all events—with the events forming a partially ordered set (poset). Our “timelike chains” then correspond to what are usually called chains in the poset. The antichains of the poset represent possible collections of events that can occur “simultaneously”.
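The chain/antichain distinction can be sketched concretely. Here is a minimal Python illustration, using a hypothetical edge list standing in for the causal graph above: we take the transitive closure of the causal edges, and then a timelike chain is a set of events in which every pair is causally ordered, while an antichain (spacelike-separated events) is a set in which no pair is:

```python
from itertools import combinations

# A toy causal graph (hypothetical edge list standing in for the article's
# example): a "Big Bang" event 0, a timelike chain 1 -> 2 -> 3, and an
# event 4 that is spacelike separated from 2 and 3.
events = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (0, 4)]

# Transitive closure of the causal edges (Floyd-Warshall over sets):
# b in reach[a] means event a causally precedes event b.
reach = {a: set() for a in events}
for a, b in edges:
    reach[a].add(b)
for k in events:
    for a in events:
        if k in reach[a]:
            reach[a] |= reach[k]

def comparable(a, b):
    return b in reach[a] or a in reach[b]

# A chain (timelike) has every pair of events causally ordered;
# an antichain (spacelike) has no pair causally ordered.
def is_chain(s):
    return all(comparable(a, b) for a, b in combinations(s, 2))

def is_antichain(s):
    return all(not comparable(a, b) for a, b in combinations(s, 2))

print(is_chain([1, 2, 3]))    # True: a timelike chain
print(is_antichain([2, 4]))   # True: spacelike-separated events
```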

And now there’s a deep analogy to physics. Because just like in the standard relativistic approach to spacetime, we can define a sequence of “spacelike surfaces” (or hypersurfaces in 3 + 1-dimensional spacetime) that correspond to possible successive “simultaneity surfaces” where events can consistently be done simultaneously. Put another way, any “foliation” of the causal graph defines a sequence of “time steps” in which particular collections of events occur—as in for example:

And just like in relativity theory, different foliations correspond to different choices of reference frames, or what amount to different choices of “space and time coordinates”. But at least in the examples we’ve seen so far, the “final result” from the evaluation is always the same, regardless of the foliation (or reference frame) we use—just as we expect when there is relativistic invariance.

As a slightly more complex—but ultimately very similar—example, consider the nestedly recursive function:

Now the causal graph for `f[12]` has the form

which again has both spacelike and timelike structure.

Let’s go back to our first example above—the evaluation of (1 + (2 + 2)) + (3 + 4). As we saw above, the causal graph in this case is:

The standard Wolfram Language evaluator makes these events occur in the following order:

And by applying events in this order starting with the initial state, we can reconstruct the sequence of states that will be reached at each step by this particular evaluation process (where now we’ve highlighted in each state the part that’s going to be transformed at each step):

Here’s the standard evaluation order for the Fibonacci number `f[3]`:

And here’s the sequence of states generated from this sequence of events:

Any valid evaluation order has to eventually visit (i.e. apply) all the events in the causal graph. Here’s the path that’s traced out by the standard evaluation order on the causal graph for `f[8]`. As we’ll discuss later, this corresponds to a depth-first scan of the (directed) graph:

But let’s return now to our first example. We’ve seen the order of events used in the standard Wolfram Language evaluation process. But there are actually three different orders that are consistent with the causal relations defined by the causal graph (in the language of posets, each of these is a “total ordering”):

And for each of these orders we can reconstruct the sequence of states that would be generated:
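As a sketch of how such total orderings can be enumerated, here is a brute-force computation of the linear extensions of the event poset for (1 + (2 + 2)) + (3 + 4). The event labels are our own shorthand for what each addition computes:

```python
from itertools import permutations

# Events for evaluating (1 + (2 + 2)) + (3 + 4), labeled by the addition
# each one performs. Each pair (x, y) says x must happen before y.
before = [("2+2", "1+4"), ("1+4", "5+7"), ("3+4", "5+7")]
events = ["2+2", "1+4", "3+4", "5+7"]

def linear_extensions(events, before):
    """All total orderings of the events consistent with the constraints."""
    exts = []
    for p in permutations(events):
        pos = {e: i for i, e in enumerate(p)}
        if all(pos[a] < pos[b] for a, b in before):
            exts.append(p)
    return exts

exts = linear_extensions(events, before)
print(len(exts))   # 3 total orderings consistent with the causal graph
```

The count of 3 matches the three possible evaluation orders noted above.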

Up to this point we’ve always assumed that we’re just applying one event at a time. But whenever we have spacelike-separated events, we can treat such events as “simultaneous”—and applied at the same point. And—just like in relativity theory—there are typically multiple possible choices of “simultaneity surfaces”. Each one corresponds to a certain foliation of our causal graph. And in the simple case we’re looking at here, there are only two possible (maximal) foliations:

From such foliations we can reconstruct possible total orderings of individual events just by enumerating possible permutations of events within each slice of the foliation (i.e. within each simultaneity surface). But we only really need a total ordering of events if we’re going to apply one event at a time. Yet the whole point is that we can view spacelike-separated events as being “simultaneous”. Or, in other words, we can view our system as “evolving in time”, with each “time step” corresponding to a successive slice in the foliation.

And with this setup, we can reconstruct states that exist at each time step—interspersed by updates that may involve several “simultaneous” (spacelike-separated) events. In the case of the two foliations above, the resulting sequences of (“reconstructed”) states and updates are respectively:

As a more complicated example, consider recursively evaluating the Fibonacci number `f[3]` as above. Now the possible (maximal) foliations are:

For each of these foliations we can then reconstruct an explicit “time series” of states, interspersed by “updates” involving varying numbers of events:

So where in all these is the standard evaluation order? Well, it’s not explicitly here—because it involves doing a single event at a time, while all the foliations here are “maximal” in the sense that they aggregate as many events as they can into each spacelike slice. But if we don’t impose this maximality constraint, are there foliations that in a sense “cover” the standard evaluation order? Without the maximality constraint, in the example we’re using there turn out to be not 10 but 1249 possible foliations. And there are 4 that “cover” the standard (“depth-first”) evaluation order (indicated by a dashed red line):

(Only the last foliation here, in which every “slice” is just a single event, can strictly reproduce the standard evaluation order, but the others are all still “consistent with it”.)

In the standard evaluation process, only a single event is ever done at a time. But what if instead one tries to do as many events as possible at a time? Well, that’s what our “maximal foliations” above are about. But one particularly notable case is what corresponds to a breadth-first scan of the causal graph. And this turns out to be covered by the very last maximal foliation we showed above.
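The contrast between the two traversal strategies can be sketched on a small tree-shaped causal graph (the adjacency list here is a hypothetical stand-in, not the article's actual graph):

```python
# A small tree-shaped causal graph: each event maps to the events that
# depend on it; node 0 is the initial "Big Bang" event.
children = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}

def depth_first(node):
    # One event at a time, finishing each branch before starting the next,
    # like the standard evaluator's traversal.
    order = [node]
    for c in children[node]:
        order += depth_first(c)
    return order

def breadth_first_layers(root):
    # Whole "simultaneity slices" at a time: each layer is a set of events
    # that can be applied together, like a maximal "flat" foliation.
    layers, frontier = [], [root]
    while frontier:
        layers.append(frontier)
        frontier = [c for n in frontier for c in children[n]]
    return layers

print(depth_first(0))           # [0, 1, 3, 4, 2, 5]
print(breadth_first_layers(0))  # [[0], [1, 2], [3, 4, 5]]
```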

How this works may not be immediately obvious from the picture. With our standard layout for the causal graph, the path corresponding to the breadth-first scan is:

But if we lay out the causal graph differently, the path takes on the much-more-obviously-breadth-first form:

And now using this layout for the various configurations of foliations above we get:

We can think of different layouts for the causal graph as defining different “coordinatizations of spacetime”. If the vertical direction is taken to be time, and the horizontal direction space, then different layouts in effect place events at different positions in time and space. And with the layout here, the last foliation above is “flat”, in the sense that successive slices of the foliation can be thought of as directly corresponding to successive “steps in time”.

In physics terms, different foliations correspond to different “reference frames”. And the “flat” foliation can be thought of as being like the cosmological rest frame, in which the observer is “at rest with respect to the universe”. In terms of states and events, we can also interpret this another way: we can say it’s the foliation in which in some sense the “largest possible number of events are being packed in at each step”. Or, more precisely, if at each step we scan from left to right, we’re doing every successive event that doesn’t overlap with events we’ve already done at this step:

And actually this also corresponds to what happens if, instead of using the built-in standard evaluator, we explicitly tell the Wolfram Language to repeatedly do replacements in expressions. To compare with what we’ve done above, we have to be a little careful in our definitions, using ⊕ and ⊖ as versions of + and – that have to get explicitly evaluated by other rules. But having done this, we get exactly the same sequence of “intermediate expressions” as in the flat (i.e. “breadth-first”) foliation above:

In general, different foliations can be thought of as specifying different “event-selection functions” to be applied to determine what events should occur at the next steps from any given state. At one extreme we can pick single-event-at-a-time event selection functions—and at the other extreme we can pick maximum-events-at-a-time event selection functions. In our Physics Project we have called the states obtained by applying maximal collections of events at a time “generational states”. And in effect these states represent the typical way we parse physical “spacetime”—in which we take in “all of space” at every successive moment of time. At a practical level the reason we do this is that the speed of light is somehow fast compared to the operation of our brains: if we look at our local surroundings (say the few hundred meters around us), light from these will reach us in a microsecond, while it takes our brains milliseconds to register what we’re seeing. And this makes it reasonable for us to think of there being an “instantaneous state of space” that we can perceive “all at once” at each particular “moment in time”.

But what’s the analog of this when it comes to expression evaluation? We’ll discuss this a little more later. But suffice it to say here that it depends on who or what the “observer” of the process of evaluation is supposed to be. If we’ve got different elements of our states laid out explicitly in arrays, say in a GPU, then we might again “perceive all of space at once”. But if, for example, the data associated with states is connected through chains of pointers in memory or the like, and we “observe” this data only when we explicitly follow these pointers, then our perception won’t as obviously involve something we can think of as “bulk space”. But by thinking in terms of foliations (or reference frames) as we have here, we can potentially fit what’s going on into something like space, that seems familiar to us. Or, put another way, we can imagine in effect “programming in a certain reference frame” in which we can aggregate multiple elements of what’s going on into something we can consider as an analog of space—thereby making it familiar enough for us to understand and reason about.

We can view everything we’ve done so far as dissecting and reorganizing the standard evaluation process. But let’s say we’re just given certain underlying rules for transforming expressions—and then we apply them in all possible ways. It’ll give us a “multiway” generalization of evaluation—in which instead of there being just one path of history, there are many. And in our Physics Project, this is exactly how the transition from classical to quantum physics works. And as we proceed here, we’ll see a close correspondence between multiway evaluation and quantum processes.

But let’s start again with our expression (1 + (2 + 2)) + (3 + 4), and consider all possible ways that individual integer addition “events” can be applied to evaluate this expression. In this particular case, the result is pretty simple, and can be represented by a tree that branches in just two places:

But one thing to notice here is that even at the first step there’s an event that we’ve never seen before. It’s something that’s possible if we apply integer addition in all possible places. But when we start from the standard evaluation process, the basic event just never appears with the “expression context” we’re seeing it in here.

Each branch in the tree above in some sense represents a different “path of history”. But there’s a certain redundancy in having all these separate paths—because there are multiple instances of the same expression that appear in different places. And if we treat these as equivalent and merge them we now get:

(The question of “state equivalence” is a subtle one that ultimately depends on the operation of the observer, and on how the observer constructs their perception of what’s going on. But for our purposes here, we’ll treat expressions as equivalent if they are structurally the same, i.e. every instance of or of 5 is “the same” or 5.)

If we now look only at states (i.e. expressions) we’ll get a multiway graph, of the kind that’s appeared in our Physics Project and in many applications of concepts from it:

This graph in a sense gives a succinct summary of possible paths of history, which here correspond to possible evaluation paths. The standard evaluation process corresponds to a particular path in this multiway graph:

What about a more complicated case? For example, what is the multiway graph for our recursive computation of Fibonacci numbers? As we’ll discuss at more length below, in order to make sure every branch of our recursive evaluation terminates, we have to give a slightly more careful definition of our function `f`:

But now here’s the multiway tree for the evaluation of `f[2]`:

And here’s the corresponding multiway graph:

The leftmost branch in the multiway tree corresponds to the standard evaluation process; here’s the corresponding path in the multiway graph:

Here’s the structure of the multiway graph for the evaluation of `f[3]`:

Note that (as we’ll discuss more later) all the possible evaluation paths in this case lead to the same final expression, and in fact in this particular example all the paths are of the same length (12 steps, i.e. 12 evaluation events).
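This kind of uniqueness of outcome (and uniform path length) is easy to verify mechanically for our simpler arithmetic example. Here is a sketch that represents expressions as nested tuples, enumerates all single-event rewrites, and checks every evaluation path of (1 + (2 + 2)) + (3 + 4):

```python
# Expressions as nested tuples ('+', a, b) or plain integers.
expr = ('+', ('+', 1, ('+', 2, 2)), ('+', 3, 4))

def one_step_rewrites(e):
    """All expressions reachable by applying a single addition event."""
    if isinstance(e, int):
        return []
    op, a, b = e
    results = []
    if isinstance(a, int) and isinstance(b, int):
        results.append(a + b)               # reduce this node itself
    results += [(op, a2, b) for a2 in one_step_rewrites(a)]
    results += [(op, a, b2) for b2 in one_step_rewrites(b)]
    return results

def all_paths(e):
    """Every complete evaluation path (sequence of states) from e."""
    succ = one_step_rewrites(e)
    if not succ:
        return [[e]]
    return [[e] + rest for s in succ for rest in all_paths(s)]

paths = all_paths(expr)
finals = {p[-1] for p in paths}
lengths = {len(p) - 1 for p in paths}   # number of evaluation events
print(finals)    # every path ends at 12
print(lengths)   # and every path involves the same number of events
```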

In the multiway graphs we’re drawing here, every edge in effect corresponds to an evaluation event. And we can imagine setting up foliations in the multiway graph that divide these events into slices. But what is the significance of these slices? When we did the same kind of thing above for causal graphs, we could interpret the slices as representing “instantaneous states laid out in space”. And by analogy we can interpret a slice in the multiway graph as representing “instantaneous states laid out across branches of history”. In the context of our Physics Project, we can then think of these slices as being like superpositions in quantum mechanics, or states “laid out in branchial space”. And, as we’ll discuss later, just as we can think of elements laid out in “space” as corresponding in the Wolfram Language to parts in a symbolic expression (like a list, a sum, etc.), so now we’re dealing with a new kind of way of aggregating states across branchial space, that has to be represented with new language constructs.

But let’s return to the very simple case of (1 + (2 + 2)) + (3 + 4). Here’s a more complete representation of the multiway evaluation process in this case, including both all the events involved, and the causal relations between them:

The “single-way” evaluation process we discussed above uses only part of this:

And from this part we can pull out the causal relations between events to reproduce the (“single-way”) causal graph we had before. But what if we pull out all the causal relations in our full graph?

What we then have is the multiway causal graph. And from foliations of this, we can construct possible histories—though now they’re multiway histories, with the states at particular time steps now being what amount to superposition states.

In the particular case we’re showing here, the multiway causal graph has a very simple structure, consisting essentially just of a bunch of isomorphic pieces. And as we’ll see later, this is an inevitable consequence of the nature of the evaluation we’re doing here, and its property of causal invariance (and in this case, confluence).

Although what we’ve discussed has already been somewhat complicated, there’s actually been a crucial simplifying assumption in everything we’ve done. We’ve assumed that different transformations on a given expression can never apply to the same part of the expression. Different transformations can apply to different parts of the same expression (corresponding to spacelike-separated evaluation events). But there’s never been a “conflict” between transformations, where multiple transformations can apply to the same part of the same expression.

So what happens if we relax this assumption? In effect it means that we can generate different “incompatible” branches of history—and we can characterize the events that produce this as “branchlike separated”. And when such branchlike-separated events are applied to a given state, they’ll produce multiple states which we can characterize as “separated in branchial space”, but nevertheless correlated as a result of their “common ancestry”—or, in quantum mechanics terms, “entangled”.

As a very simple first example, consider the rather trivial function `f` defined by

If we evaluate `f[f[0]]` (for any `f`) there are immediately two “conflicting” branches: one associated with evaluation of the “outer `f`”, and one with evaluation of the “inner `f`”:

We can indicate branchlike-separated pairs of events by a dashed line:

Adding in causal edges, and merging equivalent states, we get:

We see that some events are causally related. The first two events are not—but given that they involve overlapping transformations they are “branchially related” (or, in effect, entangled).

Evaluating the expression `f[f[0]+1]` gives a more complicated graph, with two different instances of branchlike-separated events:

Extracting the multiway states graph we get

where now we have indicated “branchially connected” states by pink “branchial edges”. Pulling out only these branchial edges then gives the (rather trivial) branchial graph for this evaluation process:

There are many subtle things going on here, particularly related to the treelike structure of expressions. We’ve talked about separations between events: timelike, spacelike and branchlike. But what about separations between elements of an expression? In something like `{f[0], f[0], f[0]}` it’s reasonable to extend our characterization of separations between events, and say that the `f[0]`’s in the expression can themselves be considered spacelike separated. But what about in something like `f[f[0]]`? We can say that the `f[_]`’s here “overlap”—and “conflict” when they are transformed—making them branchlike separated. But the structure of the expression also inevitably makes them “treelike separated”. We’ll see later how to think about the relation between treelike-separated elements in more fundamental terms, ultimately using hypergraphs. But for now an obvious question is what in general the relation between branchlike-separated elements can be.

And essentially the answer is that branchlike separation has to “come with” some other form of separation: spacelike, treelike, rulelike, etc. Rulelike separation involves having multiple rules for the same object (e.g. a rule as well as )—and we’ll talk about this later. With spacelike separation, we basically get branchlike separation when subexpressions “overlap”. This is fairly subtle for tree-structured expressions, but is much more straightforward for strings, and indeed we have discussed this case extensively in connection with our Physics Project.

Consider the (rather trivial) string rewriting rule:

Applying this rule to AAAAAA we get:

Some of the events here are purely spacelike separated, but whenever the characters they involve overlap, they are also branchlike separated (as indicated by the dashed pink lines). Extracting the multiway states graph we get:

And now we get the following branchial graph:
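The overlap criterion behind branchlike separation is easy to sketch for strings. Assuming a rule whose left-hand side is the two-character string AA (the article's rule is shown only graphically, so this is an assumption), we can find all matches in AAAAAA and classify pairs of candidate events:

```python
# Hypothetical rule with left-hand side "AA" applied to "AAAAAA".
lhs, s = "AA", "AAAAAA"

# All positions where the rule matches, including overlapping matches.
matches = [i for i in range(len(s) - len(lhs) + 1) if s[i:i + len(lhs)] == lhs]

def separation(i, j):
    """Classify two candidate events by whether their characters overlap."""
    disjoint = i + len(lhs) <= j or j + len(lhs) <= i
    return "spacelike" if disjoint else "branchlike"

print(matches)            # [0, 1, 2, 3, 4]
print(separation(0, 2))   # disjoint matches: spacelike
print(separation(0, 1))   # overlapping matches: branchlike
```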

So how can we see analogs in expression evaluation? It turns out that combinators provide a good example (and, yes, it’s quite remarkable that we’re using combinators here to help explain something—given that combinators almost always seem like the most obscure and difficult-to-explain things around). Define the standard S and K combinators:

Now we have for example

where there are many spacelike-separated events, and a single pair of branchlike + treelike-separated ones. With a slightly more complicated initial expression, we get the rather messy result

now with many branchlike-separated states:

Rather than using the full standard S, K combinators, we can consider a simpler combinator definition:

Now we have for example

where the branchial graph is

and the multiway causal graph is:

The expression `f[f[f][f]][f]` gives a more complicated multiway graph

and branchial graph:

Before we started talking about branchlike separation, the only kinds of separation we considered were timelike and spacelike. And in this case we were able to take the causal graphs we got, and set up foliations of them where each slice could be thought of as representing a sequential step in time. In effect, what we were doing was to aggregate things so that we could talk about what happens in “all of space” at a particular time.

But when there’s branchlike separation we can no longer do this. Because now there isn’t a single, consistent “configuration of all of space” that can be thought of as evolving in a single thread through time. Rather, there are “multiple threads of history” that wind their way through the branchings (and mergings) that occur in the multiway graph. One can make foliations in the multiway graph—much like one does in the causal graph. (More strictly, one really needs to make the foliations in the multiway causal graph—but these can be “inherited” by the multiway graph.)

In physics terms, the (single-way) causal graph can be thought of as a discrete version of ordinary spacetime—with a foliation of it specifying a “reference frame” that leads to a particular identification of what one considers space, and what time. But what about the multiway causal graph? In effect, we can imagine that it defines a new, branchial “direction”, in addition to the spatial direction. Projecting in this branchial direction, we can then think of getting a kind of branchial analog of spacetime that we can call branchtime. And when we construct the multiway graph, we can basically imagine that it’s a representation of branchtime.

A particular slice of a foliation of the (single-way) causal graph can be thought of as corresponding to an “instantaneous state of (ordinary) space”. So what does a slice in a foliation of the multiway graph represent? It’s effectively a branchial or multiway combination of states—a collection of states that can somehow all exist “at the same time”. And in physics terms we can interpret it as a quantum superposition of states.

But how does all this work in the context of expressions? The parts of a single expression like *a* `+` *b* `+` *c* `+` *d* or `{`*a*`,` *b*`,` *c*`,` *d*`}` can be thought of as being spacelike separated, or in effect “laid out in space”. But what kind of a thing has parts that are “laid out in branchial space”? It’s a new kind of fundamentally multiway construct. We’re not going to explore it too much here, but in the Wolfram Language we might in future call it `Multi`. And just as `{`*a*`,` *b*`,` *c*`,` *d*`}` (or `List[`*a*`,` *b*`,` *c*`,` *d*`]`) can be thought of as representing *a*, *b*, *c*, *d* “laid out in space”, so now `Multi[`*a*`,` *b*`,` *c*`,` *d*`]` would represent *a*, *b*, *c*, *d* “laid out in branchial space”.

In ordinary evaluation, we just generate a specific sequence of individual expressions. But in multiway evaluation, we can imagine that we generate a sequence of `Multi` objects. In the examples we’ve seen so far, we always eventually get a `Multi` containing just a single expression. But we’ll soon find out that that’s not always how things work, and we can perfectly well end up with a `Multi` containing multiple expressions.

So what might we do with a `Multi`? In a typical “nondeterministic computation” we probably want to ask: “Does the `Multi` contain some particular expression or pattern that we’re looking for?” If we imagine that we’re doing a “probabilistic computation” we might want to ask about the frequencies of different kinds of expressions in the `Multi`. And if we’re doing quantum computation with the normal formalism of quantum mechanics, we might want to tag the elements of the `Multi` with “quantum amplitudes” (that, yes, in our model presumably have magnitudes determined by path counting in the multiway graph, and phases representing the “positions of elements in branchial space”). And in a traditional quantum measurement, the concept would typically be to determine a projection of a `Multi`, or in effect an inner product of `Multi` objects. (And, yes, if one knows only that projection, it’s not going to be enough to let one unambiguously continue the “multiway computation”; the quantum state has in effect been “collapsed”.)

For an expression like (1 + (2 + 2)) + (3 + 4) it doesn’t matter in what order one evaluates things; one always gets the same result—so that the corresponding multiway graph leads to just a single final state:

But it’s not always true that there’s a single final state. For example, with the definitions

standard evaluation in the Wolfram Language gives the result 0 for `f[f[0]]` but the full multiway graph shows that (with a different evaluation order) it’s possible instead to get the result `g[g[0]]`:

And in general when a certain collection of rules (or definitions) always leads to just a single result, one says that the collection of rules is confluent; otherwise it’s not. Pure arithmetic turns out to be confluent. But there are plenty of examples (e.g. in string rewriting) that are not. Ultimately a failure of confluence must come from the presence of branchlike separation—or in effect a conflict between behavior on two different branches. And so in the example above we see that there are branchlike-separated “conflicting” events that never resolve—yielding two different final outcomes:

As an even simpler example, consider the definitions and . In the Wolfram Language these definitions immediately overwrite each other. But assume they could both be applied (say through explicit , rules). Then there’s a multiway graph with two “unresolved” branches—and two outcomes:

For string rewriting systems, it’s easy to enumerate possible rules. The rule

(that effectively sorts the elements in the string) is confluent:

But the rule

is not confluent

and “evaluates” BABABA to four distinct outcomes:
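Confluence of a small string rewriting system can be checked mechanically by exploring its full multiway graph and collecting the fixed points. Here is a sketch; since the article's non-confluent rule is shown only graphically, the second example uses an illustrative rule pair of our own:

```python
def rewrites(s, rules):
    """All strings reachable from s by one application of one rule."""
    out = []
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.append(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def final_states(s, rules):
    """Explore the full multiway graph and collect its fixed points."""
    seen, finals, frontier = set(), set(), [s]
    while frontier:
        t = frontier.pop()
        if t in seen:
            continue
        seen.add(t)
        succ = rewrites(t, rules)
        if succ:
            frontier += succ
        else:
            finals.add(t)
    return finals

# The sorting rule BA -> AB is confluent: a single fixed point.
print(final_states("BABABA", [("BA", "AB")]))          # {'AAABBB'}

# An illustrative non-confluent pair of rules for the same pattern.
print(final_states("BA", [("BA", "A"), ("BA", "B")]))  # {'A', 'B'}
```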

These are all cases where “internal conflicts” lead to multiple different final results. But another way to get different results is through “side effects”. Consider first setting *x* = 0 then evaluating `{x = 1, x + 1}`:

If the order of evaluation is such that `x + 1` is evaluated before `x = 1` it will give `1`, otherwise it will give `2`, leading to the two different outcomes `{1, 1}` and `{1, 2}`. In some ways this is like the example above where we had two distinct rules: and . But there’s a difference. While explicit rules are essentially applied only “instantaneously”, an assignment like *x* = 1 has a “permanent” effect, at least until it is “overwritten” by another assignment. In an evaluation graph like the one above we’re showing particular expressions generated during the evaluation process. But when there are assignments, there’s an additional “hidden state” that in the Wolfram Language one can think of as corresponding to the state of the global symbol table. If we included this, then we’d again see rules that apply “instantaneously”, and we’d be able to explicitly trace causal dependencies between events. But if we elide it, then we effectively hide the causal dependence that’s “carried” by the state of the symbol table, and the evaluation graphs we’ve been drawing are necessarily somewhat incomplete.
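A minimal sketch of this order dependence, modeling the "hidden" symbol-table state explicitly as a dictionary, shows how the two evaluation orders produce the two different outcomes:

```python
# Sketch of evaluating {x = 1, x + 1} starting from x = 0, where the
# result depends on the order in which the two events are applied.

def evaluate(order):
    """Apply the two events in the given order; return the list's value."""
    state = {"x": 0}        # the "hidden" global symbol table
    results = {}
    for event in order:
        if event == "assign":            # the x = 1 event
            state["x"] = 1
            results["assign"] = 1
        else:                            # the x + 1 event reads the state
            results["plus"] = state["x"] + 1
    return [results["assign"], results["plus"]]

print(evaluate(["assign", "plus"]))  # [1, 2]
print(evaluate(["plus", "assign"]))  # [1, 1]
```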

The basic operation of the Wolfram Language evaluator is to keep doing transformations until the result no longer changes (or, in other words, until a fixed point is reached). And that’s convenient for being able to “get a definite answer”. But it’s rather different from what one usually imagines happens in physics. Because in that case we’re typically dealing with things that just “keep progressing through time”, without ever getting to any fixed point. (“Spacetime singularities”, say in black holes, do for example involve reaching fixed points where “time has come to an end”.)

But what happens in the Wolfram Language if we just type , without giving any value to ? The Wolfram Language evaluator will keep evaluating this, trying to reach a fixed point. But it’ll never get there. And in practice it’ll give a message, and (at least in Version 13.3 and above) return a `TerminatedEvaluation` object:

What’s going on inside here? If we look at the evaluation graph, we can see that it involves an infinite chain of evaluation events, that progressively “extrude” `+1`’s:

A slightly simpler case (that doesn’t raise questions about the evaluation of `Plus`) is to consider the definition

which has the effect of generating an infinite chain of progressively more “`f`-nested” expressions:

Let’s say we define two functions:

Now we don’t just get a simple chain of results; instead we get an exponentially growing multiway graph:

In general, whenever we have a recursive definition (say of `f` in terms of `f` or *x* in terms of *x*) there’s the possibility of an infinite process of evaluation, with no “final fixed point”. There are of course specific cases of recursive definitions that always terminate—like the Fibonacci example we gave above. And indeed when we’re dealing with so-called “primitive recursion” this is how things inevitably work: we’re always “systematically counting down” to some defined base case (say `f[1] = 1`).

When we look at string rewriting (or, for that matter, hypergraph rewriting), evolution that doesn’t terminate is quite ubiquitous. And in direct analogy with, for example, the string rewriting rule ABBB, BBA we can set up the definitions

and then the (infinite) multiway graph begins:

One might think that the possibility of evaluation processes that don’t terminate would be a fundamental problem for a system set up like the Wolfram Language. But it turns out that in current normal usage one basically never runs into the issue except by mistake, when there’s a bug in one’s program.

Still, if one explicitly wants to generate an infinite evaluation structure, it’s not hard to do so. Beyond one can define

and then one gets the multiway graph

which has `CatalanNumber[t]` (or asymptotically ~4^{t}) states at layer *t*.
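The Catalan count can be checked directly from the closed form C(t) = Binomial[2t, t]/(t + 1); the ratio of successive terms tends to 4, which is where the ~4^{t} asymptotic growth comes from:

```python
from math import comb

def catalan(t):
    """Catalan number C(t) = binomial(2t, t) / (t + 1)."""
    return comb(2 * t, t) // (t + 1)

print([catalan(t) for t in range(8)])   # [1, 1, 2, 5, 14, 42, 132, 429]

# The ratio C(t+1)/C(t) = 2(2t+1)/(t+2) approaches 4 as t grows.
print(catalan(21) / catalan(20))
```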

Another “common bug” form of non-terminating evaluation arises when one makes a primitive-recursion-style definition without giving a “boundary condition”. Here, for example, is the Fibonacci recursion without `f[0]` and `f[1]` defined:

And in this case the multiway graph is infinite

with ~2^{t} states at layer *t*.

But consider now the “unterminated factorial recursion”

On its own, this just leads to a single infinite chain of evaluation

but if we add the explicit rule that multiplying anything by zero gives zero (i.e. `0 _ → 0`) then we get

in which there’s a “zero sink” in addition to an infinite chain of `f[-n]` evaluations.

Some definitions have the property that they provably always terminate, though it may take a while. An example is the combinator definition we made above:

Here’s the multiway graph starting with `f[f[f][f]][f]`, and terminating in at most 10 steps:

Starting with `f[f[f][f][f][f]][f]` the multiway graph becomes

but again the evaluation always terminates (and gives a unique result). In this case we can see why this happens: at each step `f[x_][y_]` effectively “discards ”, thereby “fundamentally getting smaller”, even as it “puffs up” by making three copies of .

But if instead one uses the definition

things get more complicated. In some cases, the multiway evaluation always terminates

while in others, it never terminates:

But then there are cases where there is sometimes termination, and sometimes not:

In this particular case, what’s happening is that evaluation of the first argument of the “top-level `f`” never terminates, but if the top-level `f` is evaluated before its arguments then there’s immediate termination. Since the standard Wolfram Language evaluator evaluates arguments first (“leftmost-innermost evaluation”), it therefore won’t terminate in this case—even though there are branches in the multiway evaluation (corresponding to “outermost evaluation”) that do terminate.
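The difference between these strategies can be made concrete with a toy term rewriter (in Python, with rules invented purely for illustration: a term that rewrites to itself forever, and an f that discards its argument):

```python
# A toy term rewriter (rules invented for illustration):
#   ("loop", n) -> ("loop", n + 1)   never reaches a normal form
#   ("f", x)    -> "done"            discards its argument x
# Outermost evaluation of f[loop] terminates at once; innermost never does.

def step(term, strategy):
    """One rewrite step under the given strategy; None if term is a normal form."""
    if isinstance(term, tuple) and term[0] == "loop":
        return ("loop", term[1] + 1)
    if isinstance(term, tuple) and term[0] == "f":
        if strategy == "outermost":
            return "done"                      # rewrite at the root first
        inner = step(term[1], strategy)        # innermost: arguments first
        return ("f", inner) if inner is not None else "done"
    return None

def evaluate(term, strategy, max_steps=50):
    """Iterate steps until a normal form, or give up after max_steps."""
    for n in range(max_steps):
        nxt = step(term, strategy)
        if nxt is None:
            return term, n                     # reached a normal form
        term = nxt
    return term, max_steps                     # never terminated

print(evaluate(("f", ("loop", 0)), "outermost"))  # ('done', 1)
print(evaluate(("f", ("loop", 0)), "innermost"))  # gives up at the step limit
```

The innermost strategy keeps evaluating the argument (which never finishes), while the outermost strategy discards it before ever looking at it, mirroring the behavior described above.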

If a computation reaches a fixed point, we can reasonably say that that’s the “result” of the computation. But what if the computation goes on forever? Might there still be some “symbolic” way to represent what happens—that for example allows one to compare results from different infinite computations?

In the case of ordinary numbers, we know that we can define a “symbolic infinity” ∞ (`Infinity` in Wolfram Language) that represents an infinite number and has all the obvious basic arithmetic properties:

But what about infinite processes, or, more specifically, infinite multiway graphs? Is there some useful symbolic way to represent such things? Yes, they’re all “infinite”. But somehow we’d like to distinguish between infinite graphs of different forms, say:

And already for integers, it’s been known for more than a century that there’s a more detailed way to characterize infinities than just referring to them all as ∞: it’s to use the idea of transfinite numbers. And in our case we can imagine successively numbering the nodes in a multiway graph, and seeing what the largest number we reach is. For an infinite graph of the form

(obtained say from *x* = *x* + 1 or *x* = {*x*}) we can label the nodes with successive integers, and we can say that the “largest number reached” is the transfinite ordinal ω.

A graph consisting of two infinite chains is then characterized by 2ω, while an infinite 2D grid is characterized by ω^{2}, and an infinite binary tree is characterized by 2^{ω}.

What about larger numbers? To get to ω^{ω} we can use a rule like

that effectively yields a multiway graph that corresponds to a tree in which successive layers have progressively larger numbers of branches:

One can think of a definition like *x* = *x* + 1 as setting up a “self-referential data structure”, whose specification is finite (in this case essentially a loop), and where the infinite evaluation process arises only when one tries to get an explicit value out of the structure. More elaborate recursive definitions can’t, however, readily be thought of as setting up straightforward self-referential data structures. But they still seem able to be characterized by transfinite numbers.

In general many multiway graphs that differ in detail will be associated with a given transfinite number. But the expectation is that transfinite numbers can potentially provide robust characterizations of infinite evaluation processes, with different constructions of the “same evaluation” able to be identified as being associated with the same canonical transfinite number.

Most likely, definitions purely involving pattern matching won’t be able to generate infinite evaluations beyond ε_{0} = ω^{ω^{ω^{⋰}}}—which is also the limit of what one can reach with proofs based on ordinary induction, Peano Arithmetic, etc. It’s perfectly possible to go further—but one needs to explicitly use functions like `NestWhile` etc. in the definitions that are given.

And there’s another issue as well: given a particular set of definitions, there’s no limit to how difficult it can be to determine the ultimate multiway graph that’ll be produced. In the end this is a consequence of computational irreducibility, and of the undecidability of the halting problem, etc. And what one can expect in the end is that some infinite evaluation processes one will be able to prove can be characterized by particular transfinite numbers, but others one won’t be able to “tie down” in this way—and in general, as computational irreducibility might suggest, won’t ever allow one to give a “finite symbolic summary”.

One of the key lessons of our Physics Project is the importance of the character of the observer in determining what one “takes away” from a given underlying system. And in setting up the evaluation process—say in the Wolfram Language—the typical objective is to align with the way human observers expect to operate. And so, for example, one normally expects that one will give an expression as input, then in the end get an expression as output. The process of transforming input to output is analogous to the doing of a calculation, the answering of a question, the making of a decision, the forming of a response in human dialog, and potentially the forming of a thought in our minds. In all of these cases, we treat there as being a certain “static” output.

It’s very different from the way physics operates, because in physics “time always goes on”: there’s (essentially) always another step of computation to be done. In our usual description of evaluation, we talk about “reaching a fixed point”. But an alternative would be to say that we reach a state that just repeats unchanged forever—but we as observers equivalence all those repeats, and think of it as having reached a single, unchanging state.

Any modern practical computer also fundamentally works much more like physics: there are always computational operations going on—even though those operations may end up, say, continually putting the exact same pixel in the same place on the screen, so that we can “summarize” what’s going on by saying that we’ve reached a fixed point.

There’s much that can be done with computations that reach fixed points, or, equivalently with functions that return definite values. And in particular it’s straightforward to compose such computations or functions, continually taking output and then feeding it in as input. But there’s a whole world of other possibilities that open up once one can deal with infinite computations. As a practical matter, one can treat such computations “lazily”—representing them as purely symbolic objects from which one can derive particular results if one explicitly asks to do so.

One kind of result might be of the type typical in logic programming or automated theorem proving: given a potentially infinite computation, is it ever possible to reach a specified state (and, if so, what is the path to do so)? Another type of result might involve extracting a particular “time slice” (with some choice of foliation), and in general representing the result as a `Multi`. And still another type of result (reminiscent of “probabilistic programming”) might involve not giving an explicit `Multi`, but rather computing certain statistics about it.

And in a sense, each of these different kinds of results can be thought of as what’s extracted by a different kind of observer, who is making different kinds of equivalences.

We have a certain typical experience of the physical world that’s determined by features of us as observers. For example, as we mentioned above, we tend to think of “all of space” progressing “together” through successive moments of time. And the reason we think this is that the regions of space we typically see around us are small enough that the speed of light delivers information on them to us in a time that’s short compared to our “brain processing time”. If we were bigger or faster, then we wouldn’t be able to think of what’s happening in all of space as being “simultaneous” and we’d immediately be thrust into issues of relativity, reference frames, etc.

And in the case of expression evaluation, it’s very much the same kind of thing. If we have an expression laid out in computer memory (or across a network of computers), then there’ll be a certain time to “collect information spatially from across the expression”, and a certain time that can be attributed to each update event. And the essence of array programming (and much of the operation of GPUs) is that one can assume—like in the typical human experience of physical space—that “all of space” is being updated “together”.

But in our analysis above, we haven’t assumed this, and instead we’ve drawn causal graphs that explicitly trace dependencies between events, and show which events can be considered to be spacelike separated, so that they can be treated as “simultaneous”.

We’ve also seen branchlike separation. In the physics case, the assumption is that we as observers sample in an aggregated way across extended regions in branchial space—just as we do across extended regions in physical space. And indeed the expectation is that we encounter what we describe as “quantum effects” precisely because we are of limited extent in branchial space.

In the case of expression evaluation, we’re not used to being extended in branchial space. We typically imagine that we’ll follow some particular evaluation path (say, as defined by the standard Wolfram Language evaluator), and be oblivious to other paths. But, for example, strategies like speculative execution (typically applied at the hardware level) can be thought of as representing extension in branchial space.

And at a theoretical level, one certainly thinks of different kinds of “observations” in branchial space. In particular, there’s nondeterministic computation, in which one tries to identify a particular “thread of history” that reaches a given state, or a state with some property one wants.

One crucial feature of observers like us is that we are computationally bounded—which puts limitations on the kinds of observations we can make. And for example computational irreducibility then limits what we can immediately know (and aggregate) about the evolution of systems through time. And similarly multicomputational irreducibility limits what we can immediately know (and aggregate) about how systems behave across branchial space. And insofar as any computational devices we build in practice must be ones that we as observers can deal with, it’s inevitable that they’ll be subject to these kinds of limitations. (And, yes, in talking about quantum computers there tends to be an implicit assumption that we can in effect overcome multicomputational irreducibility, and “knit together” all the different computational paths of history—but it seems implausible that observers like us can actually do this, or can in general derive definite results without expending computationally irreducible effort.)

One further small comment about observers concerns what in physics are called closed timelike curves—essentially loops in time. Consider the definition:

This gives for example the multiway graph:

One can think of this as connecting the future to the past—something that’s sometimes interpreted as “allowing time travel”. But really this is just a more (time-)distributed version of a fixed point. In a fixed point, a single state is constantly repeated. Here a sequence of states (just two in the example given here) gets visited repeatedly. The observer could treat these states as continually repeating in a cycle, or could coarse grain and conclude that “nothing perceptible is changing”.

In spacetime we think of observers as making particular choices of simultaneity surfaces—or in effect picking particular ways to “parse” the causal graph of events. In branchtime the analog of this is that observers pick how to parse the multiway graph. Or, put another way, observers get to choose a path through the multiway graph, corresponding to a particular evaluation order or evaluation scheme. In general, there is a tradeoff between the choices made by the observer, and the behavior generated by applying the rules of the system.

But if the observer is computationally bounded, they cannot overcome the computational irreducibility—or multicomputational irreducibility—of the behavior of the system. And as a result, if there is complexity in the detailed behavior of the system, the observer will not be able to avoid it at a detailed level by the choices they make. Though a critical idea of our Physics Project is that by appropriate aggregation, the observer will detect certain aggregate features of the system, that have robust characteristics independent of the underlying details. In physics, this represents a bulk theory suitable for the perception of the universe by observers like us. And presumably there is an analog of this in expression evaluation. But insofar as we’re only looking at the evaluation of expressions we’ve engineered for particular computational purposes, we’re not yet used to seeing “generic bulk expression evaluation”.

But this is exactly what we’ll see if we just go out and run “arbitrary programs”, say found by enumerating certain classes of programs (like combinators or multiway Turing machines). And for observers like us these will inevitably “seem very much like physics”.

Although we haven’t talked about this so far, any expression fundamentally has a tree structure. So, for example, (1 + (2 + 2)) + (3 + 4) is represented—say internally in the Wolfram Language—as the tree:

So how does this tree structure interact with the process of evaluation? In practice it means for example that in the standard Wolfram Language evaluator there are two different kinds of recursion going on. The first is the progressive (“timelike”) reevaluation of subexpressions that change during evaluation. And the second is the (“spacelike” or “treelike”) scanning of the tree.
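The tree above, and the interplay of the two kinds of recursion, can be sketched as follows (a Python illustration, with tuples standing in for Wolfram Language expressions):

```python
# The expression (1 + (2 + 2)) + (3 + 4) as an explicit tree, with ("Plus", ...)
# tuples standing in for Wolfram Language's Plus[...] nodes.

expr = ("Plus", ("Plus", 1, ("Plus", 2, 2)), ("Plus", 3, 4))

def evaluate(node):
    """Leftmost-innermost evaluation: recurse into the tree, then apply the head."""
    if not isinstance(node, tuple):
        return node                       # a leaf (here, an integer)
    head, *args = node
    values = [evaluate(a) for a in args]  # the scanning of the tree
    assert head == "Plus"
    return sum(values)                    # the actual evaluation event

print(evaluate(expr))  # 12
```

The list comprehension is the “scanning” recursion over the tree; the final `sum` is where an evaluation event actually occurs.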

In what we’ve discussed above, we’ve focused on evaluation events and their relationships, and in doing so we’ve concentrated on the first kind of recursion—and indeed we’ve often elided some of the effects of the second kind by, for example, immediately showing the result of evaluating `Plus[2, 2]` without showing more details of how this happens.

But here now is a more complete representation of what’s going on in evaluating this simple expression:

The solid gray lines in this “trace graph” indicate the subparts of the expression tree at each step. The dashed gray lines indicate how these subparts are combined to make expressions. And the red lines indicate actual evaluation events where rules (either built in or specified by definitions) are applied to expressions.

It’s possible to read off things like causal dependence between events from the trace graph. But there’s a lot else going on. Much of it is at some level irrelevant—because it involves recursing into parts of the expression tree (like the head `Plus`) where no evaluation events occur. Removing these parts we then get an elided trace graph in which for example the causal dependence is clearer:

Here’s the trace graph for the evaluation of `f[5]` with the standard recursive Fibonacci definition

and here’s its elided form:

At least when we discussed single-way evaluation above, we mostly talked about timelike and spacelike relations between events. But with tree-structured expressions there are also treelike relations.

Consider the rather trivial definition

and look at the multiway graph for the evaluation of `f[f[0]]`:

What is the relation between the event on the left branch, and the top event on the right branch? We can think of them as being treelike separated. The event on the left branch transforms the whole expression tree. But the event on the right branch just transforms a subexpression.

Spacelike-separated events affect disjoint parts in an expression (i.e. ones on distinct branches of the expression tree). But treelike-separated events affect nested parts of an expression (i.e. ones that appear on a single branch in the expression tree). Inevitably, treelike-separated events also have a kind of one-way branchlike separation: if the “higher event” in the tree happens, the “lower one” cannot.

In terms of Wolfram Language part numbers, spacelike-separated events affect parts with disjoint numbers, say `{2, 5}` and `{2, 8}`. But treelike-separated events affect parts with overlapping sequences of part numbers, say `{2}` and `{2, 5}` or `{2, 5}` and `{2, 5, 1}`.
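This part-number criterion is easy to state as code (a Python sketch; the function name is ours):

```python
# Classifying the separation of two events from the part paths they affect
# (function name and encoding chosen here for illustration).

def separation(p, q):
    shorter, longer = sorted((list(p), list(q)), key=len)
    if longer[:len(shorter)] == shorter:
        return "treelike"   # nested parts: one path is a prefix of the other
    return "spacelike"      # disjoint parts of the expression tree

print(separation((2, 5), (2, 8)))     # spacelike
print(separation((2,), (2, 5)))       # treelike
print(separation((2, 5), (2, 5, 1)))  # treelike
```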

In our Physics Project there’s nothing quite like treelike relations built in. The “atoms of space” are related by a hypergraph—without any kind of explicit hierarchical structure. The hypergraph can take on what amounts to a hierarchical structure, but the fundamental transformation rules won’t intrinsically take account of this.

The hierarchical structure of expressions is incredibly important in their practical use—where it presumably leverages the hierarchical structure of human language, and of ways we talk about the world:

We’ll see soon below that we can in principle represent expressions without having hierarchical structure explicitly built in. But in almost all uses of expressions—say in Wolfram Language—we end up needing to have hierarchical structure.

If we were only doing single-way evaluation the hierarchical structure of expressions would be important in determining the order of evaluation to be used, but it wouldn’t immediately enmesh with core features of the evaluation process. But in multiway evaluation “higher” treelike-separated events can in effect cut off the evaluation histories of “lower” ones—and so it’s inevitably central to the evaluation process. For spacelike- and branchlike-separated events, we can always choose different reference frames (or different spacelike or branchlike surfaces) that arrange the events differently. But treelike-separated events—a little like timelike-separated ones—have a certain forced relationship that cannot be affected by an observer’s choices.

To draw causal graphs—and in fact to do a lot of what we’ve done here—we need to know “what depends on what”. And with our normal setup for expressions this can be quite subtle and complicated. We might apply a rule to an expression containing a and get a result that also contains a. But does the a that “comes out” depend on the a that went in, or is it somehow something that’s “independently generated”? Or, more extremely, in a transformation whose input and output both contain a 1, to what extent is it “the same 1” that goes in and comes out? And how do these issues of dependence work when there are the kinds of treelike relations discussed in the previous section?

The Wolfram Language evaluator defines how expressions should be evaluated—but doesn’t immediately specify anything about dependencies. Often we can look “after the fact” and deduce what “was involved” and what was not—and thus what should be considered to depend on what. But it’s not uncommon for it to be hard to know what to say—forcing one to make what seem like arbitrary decisions. So is there any way to avoid this, and to set things up so that dependency becomes somehow “obvious”?

It turns out that there is—though, perhaps not surprisingly, it comes with difficulties of its own. But the basic idea is to go “below expressions”, and to “grind everything down” to hypergraphs whose nodes are ultimate direct “carriers” of identity and dependency. It’s all deeply reminiscent of our Physics Project—and its generalization in the ruliad. Though in those cases the individual elements (or “emes” as we call them) exist far below the level of human perception, while in the hypergraphs we construct for expressions, things like symbols and numbers appear directly as emes.

So how can we “compile” arbitrary expressions to hypergraphs? In the Wolfram Language something like *a* + *b* + *c* is the “full-form” expression

which corresponds to the tree:

And the point is that we can represent this tree by a hypergraph:

`Plus`, *a*, *b* and *c* appear directly as “content nodes” in the hypergraph. But there are also “infrastructure nodes” (here labeled with integers) that specify how the different pieces of content are “related”—here with a 5-fold hyperedge representing `Plus` with three arguments. We can write this hypergraph out in “symbolic form” as:
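One illustrative encoding of such a hypergraph (a Python sketch; the exact symbolic form used in the article may differ in detail) treats each hyperedge as a tuple whose first element is an infrastructure node:

```python
# An illustrative encoding (not necessarily the article's exact one) of the
# hypergraph for Plus[a, b, c]: a single 5-fold hyperedge whose first element
# is an integer infrastructure node standing for the whole expression.

hypergraph = [(1, "Plus", "a", "b", "c")]   # node 1 "is" Plus[a, b, c]

# The nested Plus[a, Plus[b, c]] needs a second infrastructure node:
nested = [(1, "Plus", "a", 2),              # node 1 is Plus[a, <node 2>]
          (2, "Plus", "b", "c")]            # node 2 is Plus[b, c]

def decode(edges, node):
    """Rebuild the tree expression rooted at an infrastructure node."""
    table = {e[0]: e[1:] for e in edges}
    head, *args = table[node]
    return (head, *[decode(edges, a) if a in table else a for a in args])

print(decode(hypergraph, 1))  # ('Plus', 'a', 'b', 'c')
print(decode(nested, 1))      # ('Plus', 'a', ('Plus', 'b', 'c'))
```

The `decode` direction shows that the hierarchical tree is fully recoverable from the flat collection of hyperedges.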

Let’s say instead we have the expression *a* + (*b* + *c*), or `Plus[a, Plus[b, c]]`, which corresponds to the tree:

We can represent this expression by the hypergraph

which can be rendered visually as:

What does evaluation do to such hypergraphs? Essentially it must transform collections of hyperedges into other collections of hyperedges. So, for example, when `x_ + y_` is evaluated, it transforms a set of 3 hyperedges to a single hyperedge according to the rule:

(Here the list on the left-hand side represents three hyperedges in any order—and so is effectively assumed to be orderless.) In this rule, the literal `Plus` acts as a kind of key to determine what should happen, while the specific patterns define how the input and output expressions should be “knitted together”.

So now let’s apply this rule to the expression 10 + (20 + 30). The expression corresponds to the hypergraph

where, yes, there are integers both as content elements, and as labels or IDs for “infrastructure nodes”. The rule operates on collections of hyperedges, always consuming 3 hyperedges, and generating 1. We can think of the hyperedges as “fundamental tokens”. And now we can draw a token-event graph to represent the evaluation process:

Here’s the slightly more complicated case of (10 + (20 + 20)) + (30 + 40):

But here now is the critical point. By looking at whether there are emes in common from one event to another, we can determine whether there is dependency between those events. Emes are in a sense “atoms of existence” that maintain a definite identity, and immediately allow one to trace dependency.

So now we can fill in causal edges, with each edge labeled by the emes it “carries”:

Dropping the hyperedges, and adding in an initial “Big Bang” event, we get the (multiway) causal graph:

We should note that in the token-event graph, each expression has been “shattered” into its constituent hyperedges. Assembling the tokens into recognizable expressions effectively involves setting up a particular foliation of the token-event graph. But if we do this, we get a multiway graph expressed in terms of hypergraphs

or in visual form:

As a slightly more complicated case, consider the recursive computation of the Fibonacci number `f[2]`. Here is the token-event graph in this case:

And here is the corresponding multiway causal graph, labeled with the emes that “carry causality”:

Every kind of expression can be “ground down” in some way to hypergraphs. For strings, for example, it’s convenient to make a separate token out of every character, so that “ABBAAA” can be represented as:

It’s interesting to note that our hypergraph setup can have a certain similarity to machine-level representations of expressions, with every eme in effect corresponding to a pointer to a certain memory location. Thus, for example, in the representation of the string, the infrastructure emes define the pointer structure for a linked list—with the content emes being the “payloads” (and pointing to globally shared locations, like ones for A and B).
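This linked-list reading can be sketched directly (a Python illustration; the exact token layout is an assumption):

```python
# "ABBAAA" ground down to tokens: each character becomes a hyperedge
# (node, character, next-node) -- in effect a linked list whose payloads
# are the characters. The token layout here is chosen for illustration.

def string_to_tokens(s):
    return [(i, ch, i + 1) for i, ch in enumerate(s)]

def tokens_to_string(tokens):
    """Follow the pointer chain from node 0 to reassemble the string."""
    nxt = {a: (ch, b) for a, ch, b in tokens}
    out, node = [], 0
    while node in nxt:
        ch, node = nxt[node]
        out.append(ch)
    return "".join(out)

tokens = string_to_tokens("ABBAAA")
print(tokens[:3])                # [(0, 'A', 1), (1, 'B', 2), (2, 'B', 3)]
print(tokens_to_string(tokens))  # ABBAAA
```

A rewrite like BB → A then amounts to splicing the pointer chain: two tokens are consumed and one new token (with a fresh infrastructure node) is created.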

Transformations obtained by applying rules can then be thought of as corresponding just to rearranging pointers. Sometimes “new emes” have to be created, corresponding to new memory being allocated. We don’t have an explicit way to “free” memory. But sometimes some part of the hypergraph will become disconnected—and one can then imagine disconnected pieces to which the observer is not attached being garbage collected.

So far we’ve discussed what happens in the evaluation of particular expressions according to particular rules (where those rules could just be all the ones that are built into Wolfram Language). But the concept of the ruliad suggests thinking about all possible computations—or, in our terms here, all possible evaluations. Instead of particular expressions, we are led to think about evaluating all possible expressions. And we are also led to think about using all possible rules for these evaluations.

As one simple approach to this, instead of looking, for example, at a single combinator definition such as

used to evaluate a single expression such as

we can start enumerating all possible combinator rules

and apply them to evaluate all possible expressions:

Various new phenomena show up here. For example, there is now immediately the possibility of not just spacelike and branchlike separation, but also what we can call rulelike separation.

In a trivial case, we could have rules like

and then evaluating *x* will lead to two events which we can consider rulelike separated:

In the standard Wolfram Language system, the definitions *x* = *a* and *x* = *b* would overwrite each other. But if we consider rulial multiway evaluation, we’d have branches for each of these definitions.

In what we’ve discussed before, we effectively allow evaluation to take infinite time, as well as infinite space and infinite branchial space. But now we’ve got the new concept of infinite rulial space. We might say from the outset that, for example, we’re going to use all possible rules. Or we might have what amounts to a dynamical process that generates possible rules.

And the key point is that as soon as that process is in effect computation universal, there is a way to translate from one instance of it to another. Different specific choices will lead to a different basis—but in the end they’ll all eventually generate the full ruliad.

And actually, this is where the whole concept of expression evaluation ultimately merges with fundamental physics. Because in both cases, the limit of what we’re doing will be exactly the same: the full ruliad.

The formalism we’ve discussed here—and particularly its correspondence with fundamental physics—is in many ways a new story. But it has precursors that go back more than a century. And indeed as soon as industrial processes—and production lines—began to be formalized, it became important to understand interdependencies between different parts of a process. By the 1920s flowcharts had been invented, and when digital computers were developed in the 1940s they began to be used to represent the “flow” of programs (and in fact Babbage had used something similar even in the 1840s). At first, at least as far as programming was concerned, it was all about the “flow of control”—and the sequence in which things should be done. But by the 1970s the notion of the “flow of data” was also widespread—in some ways reflecting back to actual flow of electrical signals. In some simple cases various forms of “visual programming”—typically based on connecting virtual wires—have been popular. And even in modern times, it’s not uncommon to talk about “computation graphs” as a way to specify how data should be routed in a computation, for example in sequences of operations on tensors (say for neural net applications).

A different tradition—originating in mathematics in the late 1800s—involved the routine use of “abstract functions” like *f*(*x*). Such abstract functions could be used both “symbolically” to represent things, and explicitly to “compute” things. All sorts of (often ornate) formalism was developed in mathematical logic, with combinators arriving in 1920, and lambda calculus in 1935. By the late 1950s there was LISP, and by the 1970s there was a definite tradition of “functional programming” involving the processing of things by successive application of different functions.

The question of what really depended on what became more significant whenever there was the possibility of doing computations in parallel. This was already being discussed in the 1960s, but became more popular in the early 1980s, and in a sense finally “went mainstream” with GPUs in the 2010s. And indeed our discussion of causal graphs and spacelike separation isn’t far away from the kind of thing that’s often discussed in the context of designing parallel algorithms and hardware. But one difference is that in those cases one’s usually imagining having a “static” flow of data and control, whereas here we’re routinely considering causal graphs, etc. that are being created “on the fly” by the actual progress of a computation.

In many situations—with both algorithms and hardware—one has precise control over when different “events” will occur. But in distributed systems it’s also common for events to be asynchronous. And in such cases, it’s possible to have “conflicts”, “race conditions”, etc. that correspond to branchlike separation. There have been various attempts—many originating in the 1970s—to develop formal “process calculi” to describe such systems. And in some ways what we’re doing here can be seen as a physics-inspired way to clarify and extend these kinds of approaches.

The concept of multiway systems also has a long history—notably appearing in the early 1900s in connection with game graphs, formal group theory and various problems in combinatorics. Later, multiway systems would implicitly show up in considerations of automated theorem proving and nondeterministic computation. In practical microprocessors it’s been common for a decade or so to do “speculative execution” where multiple branches in code are preemptively followed, keeping only the one that’s relevant given actual input received.

And when it comes to branchlike separation, a notable practical example arises in version control and collaborative editing systems. If a piece of text has changes at two separated places (“spacelike separation”), then these changes (“diffs”) can be applied in any order. But if these changes involve the same content (e.g. same characters) then there can be a conflict (“merge conflict”) if one tries to apply the changes—in effect reflecting the fact that these changes were made by branchlike-separated “change events” (and to trace them requires creating different “forks” or what we might call different histories).

It’s perhaps worth mentioning that as soon as one has the concept of an “expression” one is led to the concept of “evaluation”—and as we’ve seen many times here, that’s even true for arithmetic expressions, like 1 + (2 + 3). We’ve been particularly concerned with questions about “what depends on what” in the process of evaluation. But in practice there’s often also the question of when evaluation happens. The Wolfram Language, for example, distinguishes between “immediate evaluation” done when a definition is made, and “delayed evaluation” done when it’s used. There’s also lazy evaluation where what’s immediately generated is a symbolic representation of the computation to be done—with steps or pieces being explicitly computed only later, when they are requested.

But what really is “evaluation”? If our “input expression” is 1 + 1, we typically think of this as “defining a computation that can be done”. Then the idea of the “process of evaluation” is that it does that computation, deriving a final “value”, here 2. And one view of the Wolfram Language is that its whole goal is to set up a collection of transformations that do as many computations that we know how to do as possible. Some of those transformations effectively incorporate “factual knowledge” (like knowledge of mathematics, or chemistry, or geography). But some are more abstract, like transformations defining how to do transformations, say on patterns.

These abstract transformations are in a sense the easiest to trace—and often above that’s what we’ve concentrated on. But usually we’ve allowed ourselves to do at least some transformations—like adding numbers—that are built into the “insides” of the Wolfram Language. It’s perhaps worth mentioning that in conveniently representing such a broad range of computational processes the Wolfram Language ends up having some quite elaborate evaluation mechanisms. A common example is the idea of functions that “hold their arguments”, evaluating them only as “specifically requested” by the innards of the function. Another—that in effect creates a “side chain” to causal graphs—are conditions (e.g. associated with `/;`) that need to be evaluated to determine whether patterns are supposed to match.

Evaluation is in a sense the central operation in the Wolfram Language. And what we’ve seen here is that it has a deep correspondence with what we can view as the “central operation” of physics: the passage of time. Thinking in terms of physics helps organize our thinking about the process of evaluation—and it also suggests some important generalizations, like multiway evaluation. And one of the challenges for the future is to see how to take such generalizations and “package” them as part of our computational language in a form that we humans can readily understand and make use of.

It was in late 1979 that I first started to design my SMP (“Symbolic Manipulation Program”) system. I’d studied both practical computer systems and ideas from mathematical logic. And one of my conclusions was that any definition you made should always get used, whenever it could. If you set *x* = *y*, and then set *y* = 2, you should get 2 (not *y*) if you asked for *x*. It’s what most people would expect should happen. But like almost all fundamental design decisions, in addition to its many benefits, it had some unexpected consequences. For example, it meant that if you set *x* = *x* + 1 without having given a value for *x*, you’d in principle get an infinite loop.

Back in 1980 there were computer scientists who asserted that this meant the “infinite evaluation” I’d built into the core of SMP “could never work”. Four decades of experience tells us rather definitively that in practice they were wrong about this (essentially because people just don’t end up “falling into the pothole” when they’re doing actual computations they want to do). But questions like those about *x* = *x* + 1 made me particularly aware of issues around recursive evaluation. And it bothered me that a recursive factorial definition like `f[n_]:=n f[n-1]` (the rather less elegant SMP notation was f[$n]::$n f[$n-1]) might just run infinitely if it didn’t have a base case (f[1] = 1), rather than terminating with the value 0, which it “obviously should have”, given that at some point one’s computing 0×….
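The runaway behavior is easy to reproduce in any ordinary evaluator. A small Python sketch (illustrative; Python raises `RecursionError` where SMP or Wolfram Language would hit their own recursion limits):

```python
import sys
sys.setrecursionlimit(100)   # keep the inevitable runaway short

def f(n):
    # f[n_] := n f[n - 1] with no base case: nothing ever stops the recursion
    return n * f(n - 1)

try:
    f(5)
except RecursionError:
    print("f[5] ran away: without a base case the recursion never terminates")

def fact(n):
    # with the base case f[1] = 1 restored, evaluation bottoms out as usual
    return 1 if n == 1 else n * fact(n - 1)

print(fact(5))   # 120
```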

So in SMP I invented a rather elaborate scheme for recursion control that “solved” this problem. And here’s what happens in SMP (now running on a reconstructed virtual machine):

And, yes, if one includes the usual base case for factorial, one gets the usual answer:

So what is going on here? Section 3.1 of the SMP documentation in principle tells the story. In SMP I used the term “simplification” for what I’d now call “evaluation”, both because I imagined that most transformations one wanted would make things “simpler”, and because there was a nice pun between the name SMP and the function Smp that carried out the core operation of the system (yes, SMP rather foolishly used short names for built-in functions). Also, it’s useful to know that in SMP I called an ordinary expression like f[x, y, …] a “projection”: its “head” f was called its “projector”, and its arguments x, y, … were called “filters”.

As the Version 1.0 documentation from July 1981 tells it, “simplification” proceeds like this:

By the next year, it was a bit more sophisticated, though the default behavior didn’t change:

With the definitions above, the value of f itself was (compare `Association` in Wolfram Language):

But the key to evaluation without the base case actually came in the “properties” of multiplication:

In SMP True was (foolishly) 1. It’s notable here that Flat corresponds to the attribute `Flat` in Wolfram Language, Comm to `Orderless` and Ldist to `Listable`. (Sys indicated that this was a built-in system function, while Tier dealt with weird consequences of the attempted unification of arrays and functions into an association-like construct.) But the critical property here was Smp. By default its value was Inf (for `Infinity`). But for Mult (`Times`) it was 1.

And what this did was to tell the SMP evaluator that inside any multiplication, it should allow a function (like f) to be called recursively at most once before the actual multiplication was done. Telling SMP to trace the evaluation of f[5] we then see:

So what’s going on here? The first time f appears inside a multiplication its definition is used. But when f appears recursively a second time, it’s effectively frozen—and the multiplication is done using its frozen form, with the result that as soon as a 0 appears, one just ends up with 0.

Reset the Smp property of Mult to infinity, and the evaluation runs away, eventually producing a rather indecorous crash:

In effect, the Smp property defines how many recursive evaluations of arguments should be done before a function itself is evaluated. Setting the Smp property to 0 has essentially the same effect as the `HoldAll` attribute in Wolfram Language: it prevents arguments from being evaluated until a function as a whole is evaluated. Setting Smp to value *k* basically tells SMP to do only *k* levels of “depth-first” evaluation before collecting everything together to do a “breadth-first evaluation”.
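The net effect of Smp = 1 on the factorial example can be mimicked with a small loop. This Python sketch illustrates the behavior described above, not SMP’s actual algorithm: one “frozen” recursive call is kept inside the product, the multiplication is done at each round, and a 0 factor collapses everything:

```python
def smp_eval(n, base_case=None, max_steps=1000):
    # State: the product accumulated so far, plus the argument of the one
    # frozen recursive call f[k] still sitting inside the multiplication.
    product, k = 1, n
    for _ in range(max_steps):
        if base_case is not None and k in base_case:
            return product * base_case[k]      # e.g. f[1] = 1
        # expand the frozen call once: f[k] -> k * f[k - 1], then multiply
        product, k = product * k, k - 1
        if product == 0:
            return 0   # a 0 factor makes the product 0, frozen f[k] and all
    raise RuntimeError("runaway evaluation")

print(smp_eval(5))           # 0: no base case, but the 0 factor terminates it
print(smp_eval(5, {1: 1}))   # 120: with f[1] = 1, the ordinary factorial
```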

Let’s look at this for a recursive definition of Fibonacci numbers:

With the Smp property of Plus set to infinity, the sequence of evaluations of f follows a pure “depth-first” pattern

where we can plot the sequence of `f[`*n*`]` evaluated as:

But with the default setting of 1 for the Smp property of Plus the sequence is different

and now the sequence of `f[`*n*`]` evaluated is:

In the pure depth-first case all the exponentially many leaves of the Fibonacci tree are explicitly evaluated. But now the evaluation of f[*n*] is being frozen after each step and terms are being collected and combined. Starting for example from f[10] we get f[9] + f[8]. And evaluating another step we get
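The two evaluation orders can be compared concretely. In this Python sketch (illustrative; a Counter of “frozen” terms models the collecting and combining described above, with base cases f[1] = 1, f[0] = 0), depth-first evaluation visits every node of the Fibonacci tree, while the freeze-and-combine style expands each distinct f[k] just once, with a multiplicity:

```python
from collections import Counter

def depth_first_calls(n):
    # pure depth-first recursion: every node of the tree is evaluated separately
    seq = []
    def f(k):
        seq.append(k)
        return k if k <= 1 else f(k - 1) + f(k - 2)
    f(n)
    return seq

def breadth_first_calls(n):
    # freeze-and-combine: frozen terms f[k] are collected with coefficients,
    # and each distinct argument is expanded only once
    seq, total = [], 0
    pending = Counter({n: 1})
    while pending:
        k = max(pending)          # expand the largest frozen argument
        coeff = pending.pop(k)
        seq.append(k)
        if k <= 1:
            total += coeff * k    # base cases: f[1] = 1, f[0] = 0
        else:
            pending[k - 1] += coeff
            pending[k - 2] += coeff
    return seq, total

print(depth_first_calls(5))    # 15 evaluations, one per node of the tree
print(breadth_first_calls(5))  # ([5, 4, 3, 2, 1, 0], 5): 6 expansions, value 5
```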

I don’t now remember quite why I put it in, but SMP also had another piece of recursion control: the Rec property of a symbol—which basically meant “it’s OK for this symbol to appear recursively; don’t count it when you’re trying to work out whether to freeze an evaluation”.

And it’s worth mentioning that SMP also had a way to handle the original issue:

It wasn’t a terribly general mechanism, but at least it worked in this case:

I always thought that SMP’s “wait and combine terms before recursing” behavior was quite clever, but beyond the factorial and Fibonacci examples here I’m not sure I ever found clear uses for it. Still, with our current physics-inspired way of looking at things, we can see that this behavior basically corresponded to picking a “more spacetime-like” foliation of the evaluation graph.

And it’s a piece of personal irony that right around the time I was trying to figure out recursive evaluation in SMP, I was also working on gauge theories in physics—which in the end involve very much the same kinds of issues. But it took another four decades—and the development of our Physics Project—before I saw the fundamental connection between these things.

The idea of parallel computation was one that I was already thinking about at the very beginning of the 1980s—partly at a theoretical level for things like neural nets and cellular automata, and partly at a practical level for SMP (and indeed by 1982 I had described a Ser property in SMP that was supposed to ensure that the arguments of a particular function would always get evaluated in a definite order “in series”). Then in 1984 I was involved in trying to design a general language for parallel computation on the Connection Machine “massively parallel” computer. The “obvious” approach was just to assume that programs would be set up to operate in steps, even if at each step many different operations might happen in parallel. But I suspected there must be a better approach, perhaps based on graphs and graph rewriting. Back then, though, I didn’t, for example, think of formulating things in terms of causal graphs. And while I knew about phenomena like race conditions, I hadn’t yet internalized the idea of constructing multiway graphs to “represent all possibilities”.

When I started designing Mathematica—and what’s now the Wolfram Language—in 1986, I used the same core idea of transformation rules for symbolic expressions that was the basis for SMP. But I was able to greatly streamline the way expressions and their evaluation worked. And not knowing of compelling use cases, I decided not to set up the kind of elaborate recursion control that was in SMP, and instead just to concentrate on basically two cases: functions with ordinary (essentially leftmost-innermost) evaluation and functions with held-argument (essentially outermost) evaluation. And I have to say that in nearly four decades of usage and practical applications I haven’t really missed having more elaborate recursion controls.

In working on *A New Kind of Science* in the 1990s, issues of evaluation order first came up in connection with “symbolic systems” (essentially, generalized combinators). They then came up more poignantly when I explored the possible computational “infrastructure” for spacetime—and indeed that was where I first started explicitly discussing and constructing causal graphs.

But it was not until 2019 and early 2020, with the development of our Physics Project, that clear concepts of spacelike and branchlike separation for events emerged. The correspondence with expression evaluation got clearer in December 2020 when—in connection with the centenary of their invention—I did an extensive investigation of combinators (leading to my book *Combinators*). And as I started to explore the general concept of multicomputation, and its many potential applications, I soon saw the need for systematic ways to think about multicomputational evaluation in the context of symbolic language and symbolic expressions.

In both SMP and Wolfram Language the main idea is to “get results”. But particularly for debugging it’s always been of interest to see some kind of trace of how the results are obtained. In SMP—as we saw above—there was a Trace property that would cause any evaluation associated with a particular symbol to be printed. But what about an actual computable representation of the “trace”? In 1990 we introduced the function `Trace` in the Wolfram Language—which produces what amounts to a symbolic representation of an evaluation process.

I had high hopes for `Trace`—and for its ability to turn things like control flows into structures amenable to direct manipulation. But somehow what `Trace` produces is almost always too difficult to understand in real cases. And for many years I kept the problem of “making a better `Trace`” on my to-do list, though without much progress.
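The underlying idea is simple to state, even if real traces get unwieldy: evaluate an expression while recording a symbolic account of every reduction along the way. A minimal Python sketch (illustrative, for arithmetic expressions only):

```python
# Evaluate a nested expression while recording each reduction performed,
# a rough analog of what Trace produces for a real evaluation.
def evaluate(expr, trace):
    if isinstance(expr, int):
        return expr
    op, left, right = expr                  # expressions are ("+", a, b) or ("*", a, b)
    l, r = evaluate(left, trace), evaluate(right, trace)
    value = l + r if op == "+" else l * r
    trace.append((op, l, r, value))         # symbolic record of this step
    return value

steps = []
result = evaluate(("+", 1, ("*", 2, 3)), steps)
print(result)   # 7
print(steps)    # [('*', 2, 3, 6), ('+', 1, 6, 7)]
```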

The problem of “exposing a process of computation” is quite like the problem of presenting a proof. And in 2000 I had occasion to use automated theorem proving to produce a long proof of my minimal axiom system for Boolean algebra. We wanted to introduce such methods into Mathematica (or what’s now the Wolfram Language). But we were stuck on the question of how to represent proofs—and in 2007 we ended up integrating just the “answer” part of the methods into the function `FullSimplify`.

By the 2010s we’d had the experience of producing step-by-step explanations in Wolfram|Alpha, as well as exploring proofs in the context of representing pure-mathematical knowledge. And finally in 2018 we introduced `FindEquationalProof`, which provided a symbolic representation of proofs—at least ones based on successive pattern matching and substitution—as well as a graphical representation of the relationships between lemmas.

After the arrival of our Physics Project—as well as my exploration of combinators—I returned to questions about the foundations of mathematics and developed a whole “physicalization of metamathematics” based on tracing what amount to multiway networks of proofs. But the steps in these proofs were still in a sense purely structural, involving only pattern matching and substitution.

I explored other applications of “multicomputation”, generating multiway systems based on numbers, multiway systems representing games, and so on. And I kept on wondering—and sometimes doing livestreamed discussions about—how best to create a language design around multicomputation. And as a first step towards that, we developed the `TraceGraph` function in the Wolfram Function Repository, which finally provided a somewhat readable graphical rendering of the output of `Trace`—and began to show the causal dependencies in at least single-way computation. But what about the multiway case? For the Physics Project we’d already developed `MultiwaySystem` and related functions in the Wolfram Function Repository. So now the question was: how could one streamline this and have it provide essentially a multiway generalization of `TraceGraph`? We began to think about—and implement—concepts like `Multi`, and imagine ways in which general multicomputation could encompass things like logic programming and probabilistic programming, as well as nondeterministic and quantum computation.

But meanwhile, the “*x* = *x* + 1 question” that had launched my whole adventure in recursion control in SMP was still showing up—43 years later—in the Wolfram Language. It had been there since Version 1.0, though it never seemed to matter much, and we’d always handled it just by having a global “recursion limit”—and then “holding” all further subevaluations:

But over the years there’d been increasing evidence that this wasn’t quite adequate, and that for example further processing of the held form (even, for example, formatting it) could in extreme cases end up triggering even infinite cascades of evaluations. So finally—in Version 13.2 at the end of last year—we introduced the beginnings of a new mechanism to cut off “runaway” computations, based on a construct called `TerminatedEvaluation`:
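The shape of the mechanism can be sketched as follows (in Python, as an illustration of the idea rather than of the actual Wolfram Language implementation): once the limit is hit, an inert token is returned, and everything that touches the token just passes it along rather than triggering further evaluation:

```python
class TerminatedEvaluation:
    # an inert symbolic token recording why an evaluation was cut off
    def __init__(self, reason):
        self.reason = reason
    def __repr__(self):
        return f"TerminatedEvaluation[{self.reason!r}]"

def evaluate_x(depth=0, limit=50):
    # models x = x + 1 evaluating forever: x -> x + 1 -> (x + 1) + 1 -> ...
    if depth >= limit:
        return TerminatedEvaluation("RecursionLimit")
    inner = evaluate_x(depth + 1, limit)
    if isinstance(inner, TerminatedEvaluation):
        return inner        # the token propagates inertly; no further evaluation
    return inner + 1

print(evaluate_x())   # TerminatedEvaluation['RecursionLimit']
```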

And from the beginning we wanted to see how to encode within `TerminatedEvaluation` information about just what evaluation had been terminated. But to do this once again seemed to require having a way to represent the “ongoing process of evaluation”—leading us back to `Trace`, and making us think about evaluation graphs, causal graphs, etc.

At the beginning *x* = *x* + 1 might just have seemed like an irrelevant corner case—and for practical purposes it basically is. But already four decades ago it led me to start thinking not just about the results of computations, but also how their internal processes can be systematically organized. For years, I didn’t really connect this to my work on explicit computational processes like those in systems such as cellular automata. Hints of such connections did start to emerge as I began to try to build computational models of fundamental physics. But looking back I realize that in *x* = *x* + 1 there was already in a sense a shadow of what was to come in our Physics Project and in the whole construction of the ruliad.

Because *x* = *x* + 1 is something which—like physics and like the ruliad—necessarily generates an ongoing process of computation. One might have thought that the fact that it doesn’t just “give an answer” was in a sense a sign of uselessness. But what we’ve now realized is that our whole existence and experience is based precisely on “living inside a computational process” (which, fortunately for us, hasn’t just “ended with an answer”). Expression evaluation is in its origins intended as a “human-accessible” form of computation. But what we’re now seeing is that its essence also inevitably encompasses computations that are at the core of fundamental physics. And by seeing the correspondence between what might at first appear to be utterly unrelated intellectual directions, we can expect to inform both of them. Which is what I have started to try to do here.

What I’ve described here builds quite directly on some of my recent work, particularly as covered in my books *Combinators: A Centennial View* and *Metamathematics: Physicalization & Foundations*. But as I mentioned above, I started thinking about related issues at the beginning of the 1980s in connection with the design of SMP, and I’d like to thank members of the SMP development team for discussions at that time, particularly Chris Cole, Jeff Greif and Tim Shaw. Thanks also to Bruce Smith for his 1990 work on `Trace` in Wolfram Language, and for encouraging me to think about symbolic representations of computational processes. In much more recent times, I’d particularly like to thank Jonathan Gorard for his extensive conceptual and practical work on multiway systems and their formalism, both in our Physics Project and beyond. Some of the directions described here have (at least indirectly) been discussed in a number of recent Wolfram Language design review livestreams, with particular participation by Ian Ford, Nik Murzin, and Christopher Wolfram, as well as Dan Lichtblau and Itai Seggev. Thanks also to Wolfram Institute fellows Richard Assar and especially Nik Murzin for their help with this piece.

In many ways the great quest of Doug Lenat’s life was an attempt to follow on directly from the work of Aristotle and Leibniz. For what Doug was fundamentally trying to do over the forty years he spent developing his CYC system was to use the framework of logic—in more or less the same form that Aristotle and Leibniz had it—to capture what happens in the world. It was a noble effort and an impressive example of long-term intellectual tenacity. And while I never managed to actually use CYC myself, I consider it a magnificent experiment—that if nothing else ultimately served to demonstrate the importance of building frameworks beyond logic alone in usefully representing and reasoning about the world.

Doug Lenat started working on artificial intelligence at a time when nobody really knew what might be possible—or even easy—to do. Was AI (whatever that might mean) just a clever algorithm—or a new type of computer—away? Or was it all just an “engineering problem” that simply required pulling together a bigger and better “expert system”? There was all sorts of mystery—and quite a lot of hocus pocus—around AI. Did the demo one was seeing actually prove something, or was it really just a trivial (if perhaps unwitting) cheat?

I first met Doug Lenat at the beginning of the 1980s. I had just developed my SMP (“Symbolic Manipulation Program”) system, that was the forerunner of Mathematica and the modern Wolfram Language. And I had been quite exposed to commercial efforts to “do AI” (and indeed our VCs had even pushed my first company to take on the dubious name “Inference Corporation”, complete with a “=>” logo). And I have to say that when I first met Doug I was quite dismissive. He told me he had a program (that he called “AM” for “Automated Mathematician”, and that had been the subject of his Stanford CS PhD thesis) that could discover—and in fact had discovered—nontrivial mathematical theorems.

“What theorems?” I asked. “What did you put in? What did you get out?” I suppose to many people the concept of searching for theorems would have seemed like something remarkable, and immediately exciting. But not only had I myself just built a system for systematically representing mathematics in computational form, I had also been enumerating large collections of simple programs like cellular automata. I poked at what Doug said he’d done, and came away unconvinced. Right around the same time I happened to be visiting a leading university AI group, who told me they had a system for translating stories from Spanish into English. “Can I try it?” I asked, suspending for a moment my feeling that this sounded like science fiction. “I don’t really know Spanish”, I said, “Can I start with just a few words?” “No”, they said, “the system works only with stories.” “How long does a story have to be?” I asked. “Actually it has to be a particular kind of story”, they said. “What kind?” I asked. There were a few more iterations, but eventually it came out: the “system” translated one particular story from Spanish into English! I’m not sure if my response included an expletive, but I wondered what kind of science, technology, or anything else this was supposed to be. And when Doug told me about his “Automated Mathematician”, this was the kind of thing I was afraid I was going to find.

Years later, I might say, I think there’s something AM could have been trying to do that’s valid, and interesting, if not obviously possible. Given a particular axiom system it’s easy to mechanically generate infinite collections of “true theorems”—that in effect fill metamathematical space. But now the question is: which of these theorems will human mathematicians find “interesting”? It’s not clear how much of the answer has to do with the “social history of mathematics”, and how much is more about “abstract principles”. I’ve been studying this quite a bit in recent years (not least because I think it could be useful in practice)—and have some rather deep conclusions about its relation to the nature of mathematics. But I now do wonder to what extent Doug’s work from all those years ago might (or might not) contain heuristics that would be worth trying to pursue even now.

I ran into Doug quite a few times in the early to mid-1980s, both around a company called Thinking Machines (to which I was a consultant) and at various events that somehow touched on AI. There was a fairly small and somewhat fragmented AI community in those days, with the academic part in the US concentrated around MIT, Stanford and CMU. I had the impression that Doug was never quite at the center of that community, but was somehow nevertheless a “notable member”, who—particularly with his work being connected to math—was seen as “doing upscale things” around AI.

In 1984 I wrote an article for a special issue of *Scientific American* on “computer software” (yes, software was trendy then). My article was entitled “Computer Software in Science and Mathematics”, and the very next article was by Doug, entitled “Computer Software for Intelligent Systems”. The summary at the top of my article read: “Computation offers a new means of describing and investigating scientific and mathematical systems. Simulation by computer may be the only way to predict how certain complicated systems evolve.” And the summary for Doug’s article read: “The key to intelligent problem solving lies in reducing the random search for solutions. To do so intelligent computer programs must tap the same underlying ‘sources of power’ as human beings”. And I suppose in many ways both of us spent most of our next four decades essentially trying to fill out the promise of these summaries.

A key point in Doug’s article—with which I wholeheartedly agree—is that to create something one can usefully identify as “AI”, it’s essential to somehow have lots of knowledge of the world built in. But how should that be done? How should the knowledge be encoded? And how should it be used?

Doug’s article in *Scientific American* illustrated his basic idea:

Encode knowledge about the world in the form of statements of logic. Then find ways to piece together these statements to derive conclusions. It was, in a sense, a very classic approach to formalizing the world—and one that would at least in concept be familiar to Aristotle and Leibniz. Of course it was now using computers—both as a way to store the logical statements, and as a way to find inferences from them.
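A toy version of that classic setup can be written down directly. In this Python sketch (with made-up facts and rules, purely to illustrate the mechanism), knowledge is a set of logical statements and inference is forward chaining: keep applying rules until no new conclusions appear:

```python
# knowledge as logical statements (relation, subject, object)
facts = {("isa", "Mars", "planet"), ("contained_in", "Mars", "SolarSystem")}

# rules as (premise pattern, conclusion pattern), with "?x" a variable
rules = [
    (("isa", "?x", "planet"), ("isa", "?x", "physical_object")),   # every planet is a physical object
    (("isa", "?x", "physical_object"), ("has", "?x", "mass")),     # physical objects have mass
]

changed = True
while changed:                       # forward chaining to a fixed point
    changed = False
    for (p_rel, p_var, p_obj), (c_rel, c_subj, c_obj) in rules:
        for (rel, subj, obj) in list(facts):
            if rel == p_rel and obj == p_obj:          # premise matches this fact
                conclusion = (c_rel, subj if c_subj == "?x" else c_subj, c_obj)
                if conclusion not in facts:
                    facts.add(conclusion)
                    changed = True

print(("has", "Mars", "mass") in facts)   # True: derived in two chained steps
```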

At first, I think Doug felt the main problem was how to “search for correct inferences”. Given a whole collection of logical statements, he was asking how these could be knitted together to answer some particular question. In essence it was just like mathematical theorem proving: how could one knit together axioms to make a proof of a particular theorem? And especially with the computers and algorithms of the time, this seemed like a daunting problem in almost any realistic case.

But then how did humans ever manage to do it? What Doug imagined was that the critical element was heuristics: strategies for guessing how one might “jump ahead” and not have to do the kind of painstaking searches that systematic methods seemed to imply would be needed. Doug developed a system he called EURISKO that implemented a range of heuristics—that Doug expected could be used not only for math, but basically for anything, or at least anything where human-like thinking was effective. And, yes, EURISKO included not only heuristics, but also at least some kinds of heuristics for making new heuristics, etc.

But OK, so Doug imagined that EURISKO could be used to “reason about” anything. So if it had the kind of knowledge humans do, then—Doug believed—it should be able to reason just like humans. In other words, it should be able to deliver some kind of “genuine artificial intelligence” capable of matching human thinking.

There were all sorts of specific domains of knowledge to consider. But Doug particularly wanted to push in what seemed like the most broadly impactful direction—and tackle the problem of commonsense knowledge and commonsense reasoning. And so it was that Doug began what would become a lifelong project to encode as much knowledge as possible in the form of statements of logic.

In 1984 Doug’s project—now named CYC—became a flagship part of MCC (Microelectronics and Computer Technology Corporation) in Austin, TX—an industry-government consortium that had just been created to counter the perceived threat from the Japanese “Fifth Generation Computer Project”, that had shocked the US research establishment by putting immense resources into “solving AI” (and was actually emphasizing many of the same underlying rule-based techniques as Doug). And at MCC Doug had the resources to hire scores of people to embark on what was expected to be a few thousand person-years of effort.

I didn’t hear much about CYC for quite a while, though shortly after Mathematica was released in 1988 Marvin Minsky mused to me about how it seemed like we were doing for math-like knowledge what CYC was hoping to do for commonsense knowledge. I think Marvin wasn’t convinced that Doug had the technical parts of CYC right (and, yes, they weren’t using Marvin’s theories as much as they might). But in those years Marvin seemed to feel that CYC was one of the few AI projects going on that actually made any sense. And indeed in my archives I find a rather charming email from Marvin in 1992, attaching a draft of a science fiction novel (entitled *The Turing Option*) that he was writing with Harry Harrison, which contained mention of CYC:


> When Brian and Ben reached the lab, the computer was running but the tree-robot was folded and motionless. “Robin, activate.”
>
> …
>
> “Robin will have to use different concepts of progress for different kinds of problems. And different kinds of subgoals for reducing those different kinds of differences.”
>
> “Won’t that require enormous amounts of knowledge?”
>
> “It will indeed—and that’s one reason human education takes so long. But Robin should already contain a massive amount of just that kind of information—as part of his CYC-9 knowledge base.”
>
> …
>
> “There now exists a procedural model for the behavior of a human individual, based on the prototype human described in section 6.001 of the CYC-9 knowledge base. Now customizing parameters on the basis of the example person Brian Delaney described in the employment, health, and security records of Megalobe Corporation.”
>
> A brief silence ensued. Then the voice continued.
>
> “The Delaney model is judged as incomplete as compared to those of other persons such as President Abraham Lincoln, who has 3596.6 megabytes of descriptive text, or Commander James Bond, who has 16.9 megabytes.”

Later, one of the novel’s characters observes: “Even if we started with nothing but the old Lenat–Haase representation-languages, we’d still be far ahead of what any animal ever evolved.” (Ken Haase was a student of Marvin’s who critiqued and extended Doug’s work on heuristics.)

I was exposed to CYC again in 1996 in connection with a book called *HAL’s Legacy*—to which both Doug and I contributed—published in honor of the fictional birthday of the AI in the movie *2001*. But mostly AI as a whole was in the doldrums, and almost nobody seemed to be taking it seriously. Sometimes I would hear murmurs about CYC, mostly from government and military contacts. Among academics, Doug would occasionally come up, but rather cruelly he was most notable for his name being used for a unit of “bogosity”—the lenat—of which it was said that “Like the farad it is considered far too large a unit for practical use, so bogosity is usually expressed in microlenats”.

Many years passed. I certainly hadn’t forgotten Doug, or CYC. And a few times people suggested connecting CYC in some way to our technology. But nothing ever happened. Then in the spring of 2009 we were nearing the first release of Wolfram|Alpha, and it seemed like I finally had something that I might meaningfully be able to talk to Doug about.

I sent a rather tentative email:

> Subject: something you might find interesting…
> Date: Thu, 05 Mar 2009 11:15:04 -0500
> From: Stephen Wolfram
> To: Doug Lenat
>
> We’re in the final stages of a rather large project that I think relates to some of your interests.
>
> I just made a small blog post about it:
> http://blog.wolfram.com/2009/03/05/wolframalpha-is-coming/
>
> I’d be pleased to give you a webconference demo if you’re interested.
>
> I hope you’ve been well all these years.
>
> — Stephen

Doug quickly responded:

> Subject: Re: something you might find interesting…
> Date: Thu, 5 Mar 2009 13:23:31 -0600
> From: Doug Lenat
> To: Stephen Wolfram
>
> Hi, Stephen.
>
> You have become a master of understatement! This certainly does relate to the 1000 person-years we’ve spent building Cyc’s ontology, knowledge base, and inference engines, over the last 25 years. I’d very much like to see a webconference demo, so we identify the opportunities for synergy.
>
> Regards
> Doug

It was definitely a “you’re on my turf” kind of response. And I wasn’t sure what to expect from Doug. But a few days later we had a long call with Doug and some of the senior members of what was now the Cycorp team. And Doug did something that deeply impressed me. Rather than for example nitpicking that Wolfram|Alpha was “not AI” he basically just said “We’ve been trying to do something like this for years, and now you’ve succeeded”. It was a great—and even inspirational—show of intellectual integrity. And whatever I might think of CYC and Doug’s other work (and I’d never formed a terribly clear opinion), this for me put Doug firmly in the category of people to respect.

Doug wrote a blog post entitled “I was positively impressed with Wolfram Alpha”, and immediately started inviting us to various AI and industry-pooh-bah events to which he was connected.

Doug seemed genuinely pleased that we had made such progress in something so close to his longtime objectives. I talked to him about the comparison between our approaches. He was just working with “pure human-like reasoning”, I said, like one would have had to do in the Middle Ages. But, I said, “In a sense we cheated”. Because we used all the things that got invented in modern times in science and math and so on. If he wanted to work out how some mechanical system would behave, he would have to reason through it: “If you push this down, that pulls up, then this rolls”, etc. But with what we’re doing, we just have to turn everything into math (or something like it), then systematically solve it using equations and so on.

And there was something else too: we weren’t trying to use just logic to represent the world, we were using the full power and richness of computation. In talking about the Solar System, we didn’t just say that “Mars is a planet contained in the Solar System”; we had an algorithm for computing its detailed motion, and so on.

Doug and CYC had also emphasized the scraps of knowledge that seem to appear in our “common sense”. But we were interested in systematic, computable knowledge. We didn’t just want a few scattered “common facts” about animals. We wanted systematic tables of properties of millions of species. And we had very general computational ways to represent things: not just words or tags for things, but systematic ways to capture computational structures, whether they were entities, graphs, formulas, images, time series, or geometrical forms, or whatever.

I think Doug viewed CYC as some kind of formalized idealization of how he imagined human minds work: providing a framework into which a large collection of (fairly undifferentiated) knowledge about the world could be “poured”. At some level it was a very “pure AI” concept: set up a generic brain-like thing, then “it’ll just do the rest”. But Doug still felt that the thing had to operate according to logic, and that what was fed into it also had to consist of knowledge packaged up in the form of logic.

But while Doug’s starting points were AI and logic, mine were something different—in effect computation writ large. I always viewed logic as something not terribly special: a particular formal system that described certain kinds of things, but didn’t have any great generality. To me the truly general concept was computation. And that’s what I’ve always used as my foundation. And it’s what’s now led to the modern Wolfram Language, with its character as a full-scale computational language.

There is a principled foundation. But it’s not logic. It’s something much more general, and structural: arbitrary symbolic expressions and transformations of them. And I’ve spent much of the past forty years building up coherent computational representations of the whole range of concepts and constructs that we encounter in the world and in our thinking about it. The goal is to have a language—in effect, a notation—that can represent things in a precise, computational way. But then to actually have the built-in capability to compute with that representation. Not to figure out how to string together logical statements, but rather to do whatever computation might need to be done to get an answer.

But beyond their technical visions and architectures, there is a certain parallelism between CYC and the Wolfram Language. Both have been huge projects. Both have been in development for more than forty years. And both have been led by a single person all that time. Yes, the Wolfram Language is certainly the larger of the two. But in the spectrum of technical projects, CYC is still a highly exceptional example of longevity and persistence of vision—and a truly impressive achievement.

After Wolfram|Alpha came on the scene I started interacting more with Doug, not least because I often came to the SXSW conference in Austin, and would usually make a point of reaching out to Doug when I did. Could CYC use Wolfram|Alpha and the Wolfram Language? Could we somehow usefully connect our technology to CYC?

When I talked to Doug he tended to downplay the commonsense aspects of CYC, instead talking about defense, intelligence analysis, healthcare, etc. applications. He’d enthusiastically tell me about particular kinds of knowledge that had been put into CYC. But time and time again I’d have to tell him that actually we already had systematic data and algorithms in those areas. Often I felt a bit bad about it. It was as if he’d been painstakingly planting crops one by one, and we’d come through with a giant industrial machine.

In 2010 we made a big “Timeline of Systematic Data and the Development of Computable Knowledge” poster—and CYC was on it as one of the six entries that began in the 1980s (alongside, for example, the web). Doug and I continued to talk about somehow working together, but nothing ever happened. One problem was the asymmetry: Doug could play with Wolfram|Alpha and Wolfram Language any time. But I’d never once actually been able to try CYC. Several times Doug had promised API keys, but none had ever materialized.

Eventually Doug said to me: “Look, I’m worried you’re going to think it’s bogus”. And particularly knowing Doug’s history with alleged “bogosity” I tried to assure him my goal wasn’t to judge. Or, as I put it in a 2014 email: “Please don’t worry that we’ll think it’s ‘bogus’. I’m interested in finding the good stuff in what you’ve done, not criticizing its flaws.”

But when I was at SXSW the next year Doug had something else he wanted to show me. It was a math education game. And Doug seemed incredibly excited about its videogame setup, complete with 3D spacecraft scenery. My son Christopher was there and politely asked if this was the default Unity scenery. I kept on saying, “Doug, I’ve seen videogames before; show me the AI!” But Doug didn’t seem interested in that anymore, eventually saying that the game wasn’t using CYC—though it did still (somewhat) use “rule-based AI”.

I’d already been talking to Doug, though, about what I saw as being an obvious, powerful application of CYC in the context of Wolfram|Alpha: solving math word problems. Given a problem, say, in the form of equations, we could solve pretty much anything thrown at us. But with a word problem like “If Mary has 7 marbles and 3 fall down a drain, how many does she now have?” we didn’t stand a chance. Because to solve this requires commonsense knowledge of the world, which isn’t what Wolfram|Alpha is about. But it is what CYC is supposed to be about. Sadly, though, despite many reminders, we never got to try this out. (And, yes, we built various simple linguistic templates for this kind of thing into Wolfram|Alpha, and now there are LLMs.)

Independent of anything else, it was impressive that Doug had kept CYC and Cycorp running all those years. But when I saw him in 2015 he was enthusiastically telling me about a deal he was making around CYC that seemed to me too good to be true—and I told him so. A little later there was a strange attempt to sell us the technology of CYC, and I don’t think our teams interacted again after that.

I personally continued to interact with Doug, though. I sent him things I wrote about the formalization of math. He responded by pointing me to things he’d done on AM. On the tenth anniversary of Wolfram|Alpha Doug sent me a nice note, offering that “If you want to team up on, e.g., knocking the Winograd sentence pairs out of the park, let me know.” I have to say I wondered what a “Winograd sentence pair” was. It felt like some kind of challenge from an age of AI long past (apparently it has to do with identifying pronoun reference, which of course has become even more difficult in modern English usage).

And as I write this today, I realize a mistake I made back in 2016. I had for years been thinking about what I’ve come to call “symbolic discourse language”—an extension of computational language that can represent “everyday discourse”. And—stimulated by blockchain and the idea of computational contracts—I finally wrote something about this in 2016, and I now realize that I overlooked sending Doug a link to it. Which is a shame, because maybe it would have finally been the thing that got us to connect our systems.

Doug was a person who believed in formalism, particularly logic. And I have the impression that he always considered approaches like neural nets not really to have a chance of “solving the problem of AI”. But now we have LLMs. So how do they fit in with things like the ideas of CYC?

One of the surprises of LLMs is that they often seem, in effect, to use logic, even though there’s nothing in their setup that explicitly involves logic. But (as I’ve described elsewhere) I’m pretty sure what’s happened is that LLMs have “discovered” logic much as Aristotle did—by looking at lots of examples of statements people make and identifying patterns in them. And in a similar way LLMs have “discovered” lots of commonsense knowledge, and reasoning. They’re just following patterns they’ve seen, but—probably in effect organized into what I’ve called a “semantic grammar” that determines “laws of semantic motion”—that’s enough to often achieve some fairly impressive commonsense-like results.

I suspect that a great many of the statements that were fed into CYC could now be generated fairly successfully with LLMs. And perhaps one day there’ll be good enough “LLM science” to be able to identify mechanisms behind what LLMs can do in the commonsense arena—and maybe they’ll even look a bit like what’s in CYC, and how it uses logic. But in a sense the very success of LLMs in the commonsense arena strongly suggests that you don’t fundamentally need deep “structured logic” for that. Though, yes, the LLM may be immensely less efficient—and perhaps less reliable—than a direct symbolic approach.

It’s a very different story, by the way, with computational language and computation. LLMs are through and through based on language and patterns to be found through it. But computation—as it can be accessed through structured computational language—is something very different. It’s about processes that are in a sense thoroughly non-human, and that involve much deeper following of general formal rules, as well as much more structured kinds of data, etc. An LLM might be able to do basic logic, as humans have. But it doesn’t stand a chance on things where humans have had to systematically use formal tools that do serious computation. Insofar as LLMs represent “statistical AI”, CYC represents a certain level of “symbolic AI”. But computational language and computation go much further—to a place where LLMs can’t and shouldn’t follow, and where they should simply call on such computation as a tool.

Doug always seemed to have a very optimistic view of the promise of AI. In 2013 he wrote to me:

Of course you are coming at this from the opposite end of the Chunnel than we are, but you’re proceeding, frankly, much more rapidly toward us than we are toward you. I probably appreciate the significance of what you’ve accomplished more than almost anyone else: when your and our approaches do meet up, the combination will be the existence of real AI on Earth. I think that’s the main motivation in your life, as it is in mine: to live to see real AI, with the obvious sweeping change in all aspects of life when there is (i) cradle-to-grave 24×7 Aristotle mentoring and advising for every human being and, in effect, (ii) a Land of Faerie intelligence effectively present [e.g., that one can converse with] in every door, floor tile,…every tangible object above a certain microscopic size.) And to live to see and be users ourselves in an era of massively amplified human intelligence …

The last mail I received from Doug was on January 10, 2023—telling me that he thought it was great that I was talking about connecting our tech to ChatGPT. He said, though, that he found it “increasingly worrisome that these models train on CONVINCINGNESS rather than CORRECTNESS”, then gave an example of ChatGPT getting a math word problem wrong.

His email ended:

Yes, let’s chat again at your convenience… it bothers both of us, I believe, that our systems aren’t leveraging each other! That just bothers me more and more as I get old (not just older).

Sadly we never did chat again. We now have a team actively working on symbolic discourse language, and just last week I mentioned CYC to them—and lamented that I’d never been able to try it. And then on Friday I heard that Doug had died. A remarkable pioneer of AI who steadfastly pursued his vision over the whole course of his career, and was taken far too soon.

“OK, so let me tell you…” And so it would begin. A long and colorful story. An elaborate description of a wild idea. In the forty years I knew Ed Fredkin I heard countless wild ideas and colorful stories from him. He always radiated a certain adventurous joy—together with supreme, almost-childlike confidence. Ed was someone who wanted to independently figure things out for himself, and delighted in presenting his often somewhat-outlandish conclusions—whether about technology, science, business or the world—with dramatic showman-like panache.

In all the years I knew Ed, I’m not sure he ever really listened to anything I said (though he did use tools I built). He used to like to tell people I’d learned a lot from him. And indeed we had intellectual interests that should have overlapped. But in actuality our ways of thinking about them mostly didn’t connect much at all. But at a personal and social level it was still always a lot of fun being around Ed and being exposed to his unique intense opportunistic energy—with its repeating themes but ever-changing directions.

And there was one way in which Ed and I were very much aligned: both of our lives were deeply influenced by computers and computing. Ed had started with computers in 1956—as part of one of the very first cohorts of programmers. And perhaps on the basis of that experience, he would still, even at the end of his life, matter-of-factly refer to himself as “the world’s best programmer”. Indeed, so confident was he of his programming prowess that he became convinced that he should in effect be able to write a program for the universe—and make all of physics into a programming problem. It didn’t help that his knowledge of physics was at best spotty (and, for example, I don’t think he ever really learned calculus). But his almost lifelong desire to “program physics” did successfully lead him to the concept of reversible logic, and to what’s now called the “Fredkin gate”. But it also led him to the idea that the universe must be a giant cellular automaton—whose program he could invent.

I first met Ed in 1982—on an island in the Caribbean he had bought with money from taking public a tech company he’d founded. The year before, I had started studying cellular automata, but, unlike Ed, I wasn’t trying to “program” them—to be the universe or anything else. Instead, I was mostly doing what amounted to empirical science, running computer experiments to see what they did, and treating them as part of a computational universe of possible programs “out there to explore”. It wasn’t a methodology I think Ed ever really understood—or cared about. He was a programmer (and inventor), not an empirical scientist. And he was convinced—like a modern analog of an ancient Greek philosopher—that by pure thought he could come up with the whole “clockwork” of the universe.

Central to his picture was the idea that at the bottom of everything was a cellular automaton, with its grid of cells somehow laid out in space. I told Ed countless times that what was known from twentieth-century physics implied this really couldn’t be how things worked at a fundamental level. I tried to interest Ed in my way of using cellular automata. But Ed wasn’t interested. He was going for what he saw as the big prize: using them to “construct the universe”.

Every few years Ed would tell me he’d made progress—and rather dramatically say things like that he’d “found the electron”. I’d politely ask for details. Then start pointing out that it couldn’t work that way. But soon Ed would be telling a story or talking about some completely different idea—about technology, business or something else.

By the mid-1980s I’d discovered a lot about cellular automata. And I always felt a bit embarrassed by Ed’s attempt to use them in what seemed to me like a very naive way for fundamental physics—and I worried (as did happen a few times) that people would dismiss my efforts by identifying them with his.

My own career had begun in the 1970s with traditional fundamental physics. And while I didn’t think cellular automata as such could be directly applied to fundamental physics, I did think that the core computational phenomena I’d discovered through studying cellular automata might be very relevant. And then in the early 1990s I had an idea. In a cellular automaton, space has a fixed grid-like structure. But what if the structure of space is in fact dynamic, and everything in the universe emerges just from the dynamics of that structure? Finally I felt as if there might be a plausible computational foundation for fundamental physics.

I wrote about this in one chapter of my 2002 book *A New Kind of Science*. I don’t know if Ed ever read what I wrote, but in any case it didn’t seem to affect his idea that the universe was a cellular automaton—and to confuse things further, he told quite a few people that was what I was saying too. At first I found this frustrating—and upsetting—but eventually I realized it was just “Ed being Ed”, and there were still plenty of things to like about Ed.

Nearly twenty years passed. I would see Ed with some regularity. And sometimes I would mention physics. But Ed would just keep talking about his idea that the universe is a cellular automaton. And when we finally made the breakthrough that led in 2020 to our Physics Project it made me a little sad that I didn’t even try to explain it to Ed. The universe isn’t a cellular automaton. But it is computational. And I think that knowing this would have brought a certain intellectual closure to Ed’s long journey and aspirations around physics.

Ed might have considered physics his single most important quest. But Ed’s life as a whole was filled with a remarkably rich assortment of activities and interests. Computers. Inventions. Companies. Airplanes. MIT. His island. The Soviet Union. Not to mention people, like Marvin Minsky, John McCarthy and Richard Feynman (as well as Tom Watson, Richard Branson, and many more). And he would tell stories about all these people and things, and more. Sometimes (particularly later in his life) the stories would repeat. But with remarkable regularity Ed would surprise me with yet another—often at first hard-to-believe—story about a situation or topic that I had no idea he’d ever been involved in.

But what was the “whole Ed story”? I knew a lot of fragments, often quite colorful. But they didn’t seem to fit together into the narrative of a life. And now that Ed is sadly no longer with us, I decided I should really try to “understand Ed” and his story. A few times over the years I had made efforts to ask Ed for systematic historical accounts—and in 2014 I even recorded many hours of oral history with him. But there was clearly much more. And in writing this piece I found myself going through lots of documents and archives—and having quite a few conversations—and unearthing yet more stories than I already knew. And in the end there’s a lot to say—and indeed this has turned into the most difficult and complicated biographical piece I’ve ever written. But I hope that everything I’ve assembled will help tell the often so-wild-you-can’t-make-this-stuff-up story of that most singular individual whom I knew all those years.

Ed never said much to me about his early life. And in fact I think it was only in writing this piece that I even learned he’d grown up in Los Angeles (specifically, East Hollywood). His parents were both (Jewish) Russian immigrants (his father was born in St. Petersburg; his mother in Odessa; they met in LA). His father’s university engineering studies had been cut short by the Russian Revolution, and he now had a one-man wholesale electronic parts business. His mother had in her youth been trained as a concert pianist, and died when Ed was 11, leaving a somewhat fragmented family situation. Ed had a half-sister, 14 years older than him, a brother 6 years older, and a sister a year older. As he told it in later oral histories, he got interested in both machines and money very early, repairing appliances for a fee even as a tween, and soon learning about the idea of owning stock in companies.

But Ed Fredkin’s first piece of public visibility seems to have come in 1948, when he was 13 years old—and it reminds me so much of many of Ed’s later “self-imposed” adventures. There was at that time an exhibition of historic US documents traveling around the country on a train named the Freedom Train. And when the train came to Los Angeles, the young Ed Fredkin decided he had to be the first person to see it:

The *Los Angeles Times* published his account of his adventure—a story from a younger Ed, but “quintessentially Ed” all the same:

Ed’s record in high school was at best spotty. But as he tells it, he figured out very early a system for improving the odds in multiple-choice tests, and for example in 9th grade got a top score on a newly instituted (multiple-choice) California-wide IQ test. At the end of high school, Ed applied to Caltech (which was only 13 miles away from where he lived), and largely on the basis of his test scores, was admitted. He ended up spending time working various jobs to support himself, didn’t do much homework, and by his sophomore year—before having to pick a major—dropped out. In 2015 Ed told me a nice story about his time at Caltech:

In 1952–53, I was a student in Linus Pauling’s class where he lectured Freshman Chemistry at Caltech. After class, one day, I asked Pauling “What is a superconductor at the highest known temperature?” Pauling immediately replied “Niobium Nitride, 18 Kelvin”. I was puzzled because I had never heard of Niobium, so I looked it up and, with some difficulty found a reference that defined it as a European name for the metal Columbium.

Later that same day, reading a Pasadena newspaper, I saw an article about Pauling: It announced that Pauling had just returned from Europe (London is what I recall) where Pauling, as Chairman of the International Committee on the naming of the elements, had decided that henceforth the metal Columbium would be renamed Niobium.

I recently looked into that matter and discovered that evidently that renaming was part of a USA–Europe Compromise… In Europe it had been Wolfram and Niobium, in the USA it had been Tungsten and Columbium.

Europe got its way re Niobium and the USA got its way re Tungsten… Perhaps it was a flip of a coin? Someone might know.

As a Wolfram, I thought you might be interested (and, of course, perhaps all this is old hat to you…).

(For what it’s worth, I actually didn’t know this “Wolfram story”, though the details weren’t quite as dramatic as Ed said: the “niobium” decision was actually made in 1949, without Pauling specifically involved, though Pauling did indeed travel to London just before the beginning of the 1952 school year.)

With his interest in machinery, Ed had always been keen on cars, and in his freshman year at Caltech, he also decided to learn to fly a plane. Ed’s older brother, Norman, had joined the Air Force five years earlier. And when he left Caltech—in 1954 at age 19—Ed joined the Air Force too. (If he hadn’t done that, he would have been drafted into the Army.) Ed’s brother Norman (who would spend his whole career in aviation) had been involved in the Korean War, particularly doing aerial reconnaissance—here pictured with his plane (and, no, there don’t seem to be any Air Force pictures of Ed himself):

By the time Ed joined the Air Force, the Korean War was over. Ed was assigned to an airbase in Arizona, and by the summer of 1955 he had qualified as a fighter pilot. Ed was never officially a “test pilot”, but he told me stories about figuring out how to take his plane higher than anyone else—and achieving weightlessness by flying his plane in a perfect free-fall trajectory by maintaining an eraser floating in midair in front of him.

By 1956 Ed had been grounded from flying as a result of asthma, and was now at an airbase in Florida as an “intercept controller”—essentially an air traffic controller responsible for guiding fighters to intercept bombers. It was a time when the Air Force was developing the SAGE (Semi-Automatic Ground Environment) air defense system—a huge project whose concept was to use computers to coordinate data from many radars so as to be able to intercept Soviet bombers that might attack the US (cf. *Dr. Strangelove*, etc.). The center of SAGE development was Lincoln Lab (then part of MIT) in Lexington, MA—with IBM providing computers, Bell (AT&T) providing telecommunications, RAND providing algorithms, etc. And in mid-1956 the Air Force sent a group—including Ed—to test the next phase of SAGE. But as Ed tells it, they were soon informed that actually there would be a one-year delay.

At the time, the SAGE project was busily trying to train people about computers, and some people from the Air Force stayed in the Boston area to participate in this. As Ed tells it, however, he was the only one who didn’t drop out of the training—and over the course of a year it taught him “much of what was then known about computer programming and computer hardware design”. There were at the time only a few hundred people in the world who could call themselves programmers. And Ed was now one of them. (Perhaps he was even “the world’s best”.)

Having learned to program, Ed remained at Lincoln Lab, paid by the Air Force, doing what amounted to computational “odd jobs”. Often this had to do with connecting systems together, or coming up with “clever hacks” to overcome particular system limitations. Occasionally it was a little more algorithmic—like when Sputnik was launched in 1957, and Ed got pulled into a piece of “emergency programming” for orbit calculations.

Ed told many stories about “hacking” the bureaucracy at the Air Force (being given a “Secret” stamp so he could read his own documents; avoiding being sent for a year to the Canadian Arctic by finding a loophole associated with his wife being pregnant, etc.)—and in 1958 he left the Air Force (though he would remain a captain in the reserves for many years), but stayed on at Lincoln Lab. Officially he was there as an “administrative assistant”, because—without a degree—that was all they could offer him. But by then he was becoming known as a “computer person”—with lots of ideas. He wanted to start his own company. And (as he tells it) the very first potential customer he visited was an MIT-spinoff acoustics firm called Bolt Beranek & Newman (BBN). And the person he saw there was their “vice president of engineering psychology”—a certain J. C. R. “Lick” Licklider—who persuaded Ed to join BBN to “teach them about computers”.

It didn’t really come to light until he was at BBN, but while at Lincoln Lab Ed had made what would eventually become his first lasting contribution to computer science. He thought of it as a new way of storing textual information in a computer, and he called it “TRIE memory” (after “reTRIEval”). Nowadays we’d call it the trie (or prefix tree) data structure. Here it is for some common words in English made from the letters of “wolf”:

Licklider persuaded Ed to write a paper about tries—which appeared in 1960, and for a couple of decades was essentially Ed’s only academic-style publication:

The paper has a pretty clear description of tries, even with some nice diagrams:

Even in analyzing the performance of tries, there was only the faintest hint of math in the paper—though Ed realized (probably with input from Licklider) that the efficiency of tries would depend on the Shannon-style redundancy of what they were storing, and he ran Monte Carlo simulations to investigate this:

(He explains: “The test program was written in FORTRAN for the IBM 709. The program is composed of 42 subroutines, of which 19 were coded specially for this program and 23 were taken from the library.”)

Tries didn’t make a splash when Ed first introduced them—not least because computers didn’t really have the memory then to make use of them. I think I first heard about them in the late 1970s in connection with spellchecking, and nowadays they’re widely used in lots of text search, bioinformatics and other applications.
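For readers who haven’t run into tries, the structure is easy to sketch. Here is a minimal illustrative Python version (the class and method names are mine, not Fredkin’s; his original was FORTRAN on an IBM 709), storing words one character per level so that common prefixes are shared:

```python
class Trie:
    """Minimal prefix tree ("TRIE memory", after "reTRIEval")."""

    def __init__(self):
        self.children = {}    # maps a character to a child Trie node
        self.is_word = False  # marks the end of a stored word

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node


# Some English words made from the letters of "wolf", as in the example above:
trie = Trie()
for w in ["wolf", "low", "owl", "flow", "fowl"]:
    trie.insert(w)

print(trie.contains("owl"))     # True
print(trie.contains("ow"))      # False: "ow" is only a prefix here
print(trie.starts_with("flo"))  # True, via "flow"
```

The key property—and the reason tries work so well for spellchecking and text search—is that a lookup takes time proportional to the length of the key, independent of how many words are stored.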

Ed had apparently first started talking about tries when he was still in the Air Force. As he explained it to me in 2014:

The Air Force [people] had no idea [what I was talking about]. But I kept on [saying] “I need to find someone who knows something about this that can critique it for me.” And someone says to me, “There’s a guy at MIT who deals in something similar, he calls it lists”. And that was John McCarthy. So, I call up, I get a secretary and, you know, I make a date, and I go to MIT and in building 56 with the computation center, I go to his office and the secretary says he’s somewhere out in the hall. I see some guy wandering back and forth. I go up and say, “You John McCarthy?” He says, “Yes.” So, I say, “I’ve had this idea—” I can’t remember if I was in uniform or not; I might’ve been. I said, “I had this idea, and I’ve written a program and tested it. And might you take a look?” Then he takes this thing, and he starts to read it.

Then he did something that struck me as very weird. He turned around slowly and started walking away, he’s reading and walk, walk, walk, walk, stop. Turns around, walk, walk, walk, walk, back slowly, you know. Finally, he comes back and he stops and he reads and reads. And he’s obviously angry. And I thought, “This is weird.” I said “Does it make sense or anything?” He says, “Yes, it makes sense.” And I said, “Well, what’s up?” He says, “Well, I’ve had the same idea.” And I said, “Oh.” He says, “But I’ve never written it down.” And I said, “Oh, okay. So, do you think I ought to work on it or do something?” He says, “Yeah”. So, that’s how I met John McCarthy.

Ed remained friends with McCarthy for the rest of McCarthy’s life, and involved him in many of his endeavors. In 1956 McCarthy had been one of the organizers of the conference that coined the term “artificial intelligence”, and in 1958 McCarthy began the development of LISP (which was based on linked lists). I have to say I wish I’d known Ed’s story with McCarthy much earlier; I would have handled my own interactions with McCarthy differently—because, as it was, over the course of various encounters from 1981 to 2003 I never persisted very far beyond the curmudgeon stage.

Back around 1958, the circle of “serious computer people” in the Boston area wasn’t very large—and another was Marvin Minsky (who I knew for many years). Between Ed and Licklider, both McCarthy and Minsky became consultants at BBN, and all of them would have many interactions in the years to come.

But in late 1959 there was another entrant in the Boston computer scene: the PDP-1 computer, designed by a certain Ben Gurley for a new company named Digital Equipment Corporation (DEC) that had essentially spun off from Lincoln Lab and MIT. BBN was the first customer for the PDP-1, and Ed was its anchor user:

John McCarthy had had the “theoretical” idea of timesharing, whereby multiple users could work on a single computer. Ed figured out how to make it practical on the PDP-1, in the process inventing what would now be called asynchronous interrupts (then the “sequence break system”). And so began a process that would make BBN a significant force in computing—and, later, in the creation of the internet.

But in 1961, Ed and a certain Roland Silver, who also worked at BBN, decided to quit BBN—and, strangely enough, to move to Brazil, where they were enamored of the recently elected new president. But when that new president unexpectedly resigned, they abandoned their plan. And when BBN didn’t want them back, Ed decided to start a company, initially doing consulting for DEC. As Ed tells it, he and Roland Silver were such good friends and had so much they talked about that together they couldn’t get anything done, so they decided they’d better split up.

As I was writing this piece, I decided to look up more about Roland Silver—who I found out had been a college roommate of Marvin Minsky’s at Harvard, and had had a long career in math, etc. at MITRE (itself a spinoff of MIT’s Lincoln Lab). But I also remembered that many years ago I’d received letters and a rather new-age newsletter from a certain “Rollo Silver”:

Could it be the same person? Yes! And in my archives I also found an ad:

Some time after my work on cellular automata in the 1980s, Roland Silver—together with my longtime friend Rudy Rucker—started a newsletter about cellular automata, notably not mentioning Ed, but including a colorful bio for Silver:

But back to Ed and his story. It was 1961, and Ed had quit his job at BBN. In 1957, he’d met on a Cape Cod beach a woman from Western Massachusetts named Dorothy Abair (who was at the time working at a beauty salon)—and six weeks later they’d married, and now had a 3-year-old daughter. Ed had already lined up some consulting with DEC, and as Ed tells it, with a little “hacking” of bank loans, etc. he was able to officially start Information International Incorporated (III)—with a tiny office in Maynard, MA (home of DEC). But then, one day he gets a call from the Woods Hole Oceanographic Institution. He drives down to Woods Hole with a certain Henry Stommel—an oceanography professor at Harvard—who tells him about a “vortex ocean model”, and asks Ed if he can program it on a PDP-1 so that it displays ocean currents on a screen. And the result is that III soon has a contract for $10k (about $100k today) to do this.

I might add a small footnote here. Years later I was talking to Ed about the origins of cellular automata, and he tells me that a certain Henry Stommel had told him that there were cellular automaton models of sand dunes from the 1930s. At the time—before the web—I couldn’t easily track down who Henry Stommel was (and I had no idea how Ed knew him), and to this day I don’t know what those sand dune models might have been.

But in any case, Ed’s interaction with Woods Hole led to what became III’s first major business: digital reading of film. As Ed tells it:

At Woods Hole … they had these meters which would measure how fast the ocean current was going and which way—and recorded it on 16 mm film with little tiny lights and a little fiber optic thing. And they had built a machine to read that film. I looked at the machine and said “That’ll never work”. And they said “Who are you? Of course it’ll work”, and so on, so forth. OK, so some months later they call me up and say it didn’t work.

I have to tell you this but this is insanely funny. So I decide I’m going to make a film reader and here’s how I’m going to do it. I knew there was a 16 mm projector you could rent from a company and you could stop it and then say “Advance one frame” by clicking and it would just advance one frame at a time. So I thought: say I take the lightbulb out and put a photomultiplier in and point it at the screen of the computer. Then light will come from the screen, go through the lens and be focused on the film, and some would go through the film to the photomultiplier and I would be able to tell how much light got through. And we could write a program to do the rest.

That was my idea, OK.

So not having any money, we rented that projector and I got Digital (DEC) to let me use their milling machine and I bought the photomultiplier tube, and I got Ben Gurley to design the circuitry and connect it to the computer. But there was one more thing. The photomultiplier tube was like a vacuum tube but it had like 16 pins and a very odd connector that no one had. But I thought “Lincoln Labs has parts for everything in their electronics warehouse”. So I called someone I used to work with there, and said “Look, do me a favor and sneak into the parts area, take that part and just give it to me. I’ve ordered one but I’m not going to get it for a while and when I get it I’ll give it to you and you can put it back so it’s not actually a theft.” And he said “OK, I’ll do it” but he asked me why I wanted it and I told him “Well, I’m doing this stuff for Woods Hole to read some film with a computer”.

OK, so he gave me the part and we get it going right away and we’re reading the film, and that solved the problem. But meanwhile this very funny thing happened. Someone from Lincoln Labs found out about all this and said “Hey, you’re reading some kind of film. Is that what you used that thing for?” And I said “Yeah”. And they said “Well, we tried to read some films so we built a gadget and did the same thing you did: we pointed it at the screen of the computer, but we can’t make the software work”. And I said “OK, well, come down and tell me about it”. So they come down and what happens is this. There’s some army people and they have a radar that’s looking at a missile coming in and records on film from an oscilloscope. And they asked could we read this. And to make a long story short they signed another contract….

The whole setup was eventually captured in a patent entitled simply “High-Speed Film Reading”:

And actually this wasn’t Ed’s first patent. That had been filed in 1960, while Ed was at BBN—and it was for a mechanical punched card sorter, with arrays of metal pins and the like, and no computer in evidence:

III ended up discovering that there were many applications—military and otherwise—for film readers. But their Woods Hole relationship led in another direction as well: computer graphics and data visualization. By 1963 there were perhaps 300,000 oceanographic stations recording their data on punched cards, and the idea was to take this data and produce from it a “computer-compiled oceanographic atlas”. The result was a paper:

And with statements like “Only a high-speed computer has the capacity and speed to follow the quickly shifting demands and questions of a human mind exploring a large field of numbers” the paper presented visualizations like:

These various developments put III in the center of the emerging field of film-meets-computers systems. The company grew, moving its center of operations to Los Angeles, not least to be near the System Development Corporation (SDC), which RAND had spun off as its software arm for the SAGE project.

But Ed was always having new ideas for III, and defining new directions. Ed had brought Minsky and McCarthy into III as board members and consultants, and for example in 1964 III was proposing to SDC a project to make a new version of LISP (and, yes, with no obvious film-meets-computers applications). The proposal gives some insight into the state of III at the time. It says that “From a one-man operation [in 1962], I.I.I. has grown to the point where our gross volume of business for 1964 is in the neighborhood of $1 million [about $10 million today]”. It explains that III has four divisions: Mathematical and Programming Services, Behavioral Science, Operations, and “New York”. It goes on to list various things III is doing: (1) LISP; (2) Inductive Inference on Sequences; (3) Computer Time-Sharing; (4) Programmable Film Readers; (5) The World Oceanographic Data Display System; and (6) Computer Display Systems.

It’s certainly an eclectic collection, reflecting, as such things often do, the character of the company’s founder. From a modern perspective, one item that catches one’s attention is:

One can think of it as an early attempt at AI/machine learning—which 60 years later still hasn’t been solved. (GPT-4 says the next letter should be Q, not O.)

But distractions or not, it was a talented team that assembled at III—with lots of cross-fertilization with MIT. III’s business progressively grew, and perhaps it outgrew Ed—and in 1965 Ed stepped down as CEO. In 1968 he left entirely and (as we’ll discuss below) went to MIT, leaving III in the hands of Al Fenaughty, who, years later (and after nearly 30 years at III), would become the chairman of Yandex.

As someone who’s curious about the ways of company founders, I asked Ed many times about his departure from III. He usually just said: “I had a partner who died”. But it’s only now that I’ve pieced together, partly from my 2014 oral history with Ed, what happened. Ed described it to me as the greatest tragedy of his life.

Shortly after he set up III, Ed persuaded Ben Gurley (designer of the PDP-1) to leave DEC and join him at III. I think Ed had hoped to build computers at III, with Gurley as their designer. But on November 7, 1963, in Concord, MA, just a few miles from where I am as I write this, Ben Gurley was murdered—by a single revolver shot through his dining room window as he was about to sit down for dinner with his wife and 7 children. An engineer from DEC (and Lincoln Labs)—about whom Gurley had recently complained to the police—was arrested, and eventually convicted of the crime (after Ed hired a private detective to help). It later turned out that a few years earlier the same engineer was likely also responsible for shooting (though not killing) another engineer from DEC.

I had always assumed that Ed’s decision to leave III happened just after his “partner had died”. But I now realize that Gurley’s death early in the history of III caused III to go on its path of making things like film readers, rather than the DEC- or IBM-challenging computers I think Ed had hoped for.

Even after Ed left active management of III, he was still its chairman. And in late 1968 something would happen that would change his life forever. Taking tech companies public on the “over-the-counter” market had become a thing, and a broker offered to take III public. And on November 26, 1968, III filed its SEC paperwork:

III’s “principal product to date” is described as a “programmable film reader”, but the paperwork notes that as of October 31, 1968, the company has no film readers on order—though there are orders for its new microfilm reader, which it hasn’t delivered yet. It also says that proceeds from the offering will be used to fund its “proposed optical character recognition project”. But for our purposes what’s perhaps more significant is that the paperwork records that Ed owns 57.7% of the company, with the Edward Fredkin Charitable Foundation owning 0.4%.

On January 8, 1969, III went public, and Ed was suddenly, at least on paper, worth more than $10M (or more than $80M today). Two years later (perhaps as soon as a lockup period expired), Ed cashed out, with the SEC notice indicating that Ed would be “repaying personal indebtedness to a bank incurred by him for reasons unrelated to the company or its business” (presumably a loan he’d taken out before he could achieve liquidity):

So now Ed—at age 37—was wealthy. And in fact the money he made from III would basically last the rest of his life, even through a long sequence of subsequent business failures.

III’s OCR project was never a great success, but III became a key company in digital-to-film systems (relevant to both movies and printing), and in the early 1970s created some of the very first computer-generated special effects that eventually made it into movies like *Star Wars*. III’s stock price hovered around $10 per share for years, and in 1996—after PostScript had pretty much taken the market for prepress printing systems—III was sold to Autologic for $35M in stock, then in 2001 Autologic was sold to Agfa for $42M.

When III went public in 1969 it was the height of the Cold War (which probably didn’t hurt III’s military sales). And many people—including Ed—thought World War III might be imminent. And so it was that in 1970 Ed decided to buy an island in the Caribbean, close enough to the tropics that, he later told me, he assumed (incorrectly, according to current models) radioactive fallout from a nuclear war wouldn’t reach it.

Apparently Ed was sitting in a dentist’s office when he saw an “Island for Sale” ad in a newspaper. The seller was a shipwreck-scavenging treasure hunter named Bert Kilbride—sometimes called “the last pirate of the Caribbean”—who had started to develop the island (and for several years would manage it for Ed). It’s a fairly small island (about 125 acres, or 0.2 square miles)—in the British Virgin Islands. And its name is Mosquito Island (or sometimes, with some historical justification, Moskito Island). And when Ed bought it, it probably cost something under $1M. (Richard Branson bought the nearby but smaller Necker Island in 1978.)

I visited Ed’s island in January 1982—the first time I met Ed. And, yes, there was a certain “lair of a Bond villain” (think: *Dr. No*) vibe to the whole thing. Here are pictures I took from a boat leaving the island (notice the just-visible seaplane parked at the island):

There was a small resort (and restaurant) on the island, named Drake’s Anchorage (built by the previous owner):

And, yes, there were beaches on the island (though I myself have never been much of a beach-goer):

And, in keeping with the Bond vibe, there was a seaplane too:

There was one house on the island, here pictured from the plane (it so happened that when I visited the island, I was learning to fly small planes myself—so I was interested in the plane):

Visiting a nearby island—with its very rundown airport sign—gives some sense of the overall area:

Ed claimed it was difficult to run the resort on his island, not least because, he said, “the British Virgin Islands have the lowest average worker productivity in the world”. But he nevertheless, for example, had a functioning restaurant, and here I am there in 1982, along with Charles Bennett, about whom we’ll hear more later:

When people talked about Ed, his island was often mentioned, and it projected a general image of overall mystique and extreme wealth. In 1983 a movie called *WarGames* came out, featuring a reclusive military-oriented computer expert named “Professor Falken”—who had an island. Many people assumed Falken was based on Fredkin (and it now says so all over the internet). However, in writing this piece, I decided to find out what was actually true—so I asked one of the writers of the movie, Walter Parkes. He responded, and, yes, fact is often even stranger than fiction:

Unfortunately I can confirm that Ed was not the inspiration for Stephen Falken. The character was inspired by Steven [sic] Hawking. (Falken = Falcon = Hawking) The movie was first conceived to be about two characters, a young super-genius born into a family incapable of acknowledging his gifts, and a dying scientist in need of a protégé. In the first several drafts Falken was confined to a wheel-chair and was working on understanding the big bang, for which he had created a computer simulation. Little known fact—while writing the character, we had one person in mind to play the role: John Lennon, who was murdered shortly before we finished the script.

(By the way, in a moment of “fact follows fiction”, *WarGames* featured a computer with lots of flashing lights. I happened to see the movie with Danny Hillis, and as we were walking out of the movie, I said to Danny “Perhaps your computer should have flashing lights too”. And indeed flashing lights became a signature feature of Danny’s Connection Machine computer, as later seen in movies like *Jurassic Park*.)

After he left III in 1968, Ed’s next stop would be MIT, and specifically Project MAC (the “Multiple Access Computer” Project). But actually Ed had already been involved much earlier with Project MAC. In many ways the project was a follow-on to what Ed had been doing at BBN on timesharing.

In 1963 Ed wrote a long survey article on timesharing:

The introduction contains a rather charming window onto the view of computers at the time:

And the ads interspersed through the article give a further sense of the time:

As illustrations of what can be done with an interactive timeshared computer, there’s a picture from Ed’s vortex ocean simulation—as well as an example of an online “book” about LISP:

And, yes, already a kind of “cloud computing” story:

There’s also a description of Project MAC—that had just been funded by the Advanced Research Projects Agency (now DARPA). The article said that the “MAC” stood either for “Multiple Access Computer” or “Machine-Aided Cognition”. It included various sections on what might be possible with timesharing:

The main text of the article ends with a rousing (?) vision of AI taking over from humans (and, yes, even though this is from 60 years ago it’s not so different from what at least some people might say about the “AI future” today):

But there’s a curious piece of backstory to Project MAC—from 1961—that appears as a footnote to Ed’s article:

Ed told me versions of this story many times. McCarthy had failed to get tenure at MIT, and was looking for another job. (Yes, in retrospect this seems remarkable given all the things he’d already done by then. But those things were computer science—and MIT didn’t yet have a CS department; McCarthy was in the EE department.) Ed, Minsky and McCarthy were going to an SDC meeting in Los Angeles, and while he was out there McCarthy was going to interview at Caltech (his undergraduate alma mater). They had a free evening, and Ed suggested they meet “someone interesting”. Ed remembered Linus Pauling from his time at Caltech. But Pauling wasn’t in. So Minsky suggested they call Richard Feynman. And he was in, and invited them over to his house.

Feynman apparently showed them things like his nanotech-inspiring tiny motor, etc., but somehow the discussion shifted to AI. And Minsky mentioned work a student of his was doing on the “AI problem” of symbolic integration. Then McCarthy started to explain ways a computer could do algebra. Then, as Ed told it to me in 2014:

Feynman produces this sheaf of papers to show us. It was all algebra. And he says “There’s a problem. I’ve done this calculation, and it’s close to 50 pages. A graduate student has done it too, and Murray Gell-Mann has done it. And the only thing we know for sure is that our three results are mutually inconsistent. And the only conclusion we can arrive at is that a person can’t do this much algebra with the hope of getting it right.” And so the question was could there be some system that could help do a problem like that? So what happened is Marvin [Minsky] and I basically fleshed out the idea of a mathematical thing. And it was agreed that we would do it. Marvin and I decided to divide this task up, that I would do one part, and he would do another. Now, we had one bad idea in there, OK. It’s partly Feynman’s fault, but it’s also Marvin and my fault. He was convinced you could not do [math] by typing it. It had to have some kind of handwriting recognition. So, it was decided I would do the handwriting recognition…

And although I didn’t know this until I was writing this piece, it turns out the original proposal for Project MAC was actually based on the idea of building a system for mathematics, and “Project MAC” was originally the “Project on Mathematics and Computation”. Pretty soon, though, the emphasis of Project MAC would shift to the “infrastructure” of timeshared computing. But there was still a math effort, which in time became the MACSYMA system for computer algebra (written in LISP by students and grandstudents of Minsky).

And here this intersects with my personal story. Because many years later (starting in 1976) I would use that system—along with other early computer algebra systems—to do all sorts of physics calculations. My archives still contain an example of what it was like in 1980 to log in to “Project MAC” over the ARPANET (my username was “swolf” in those days; note the system message, the presence of 15 MITishly-named “lusers” altogether, and yes, mail):

But, actually, in late 1979 I had already decided to “do my own thing” and build my own system for doing mathematical computation, and eventually much more. And indeed when I first met Ed in 1982 I had recently finished the first version of SMP, and to commercialize it I had started my first company. In 1986 I started to build Mathematica (and what’s now Wolfram Language)—which was released in 1988. Ed started using Mathematica very soon after it was released, and basically continued to do so for the rest of his life.

But picking up the original Project MAC narrative from 1963: the old group from BBN had dispersed but were still writing together about timesharing (and when they said a “debugging system” they meant essentially what we would now call an operating system):

And when Project MAC launched in 1963, its “steering committee” included Minsky, Gurley—and Ed. (John McCarthy had landed at Stanford, where he would remain for the rest of his life. I first met him in 1981, at a time when Stanford was trying to recruit me. There was a lunch with the CS department; people went around the room and introduced themselves. McCarthy unhelpfully—and confusingly—said he was “John Smith”.)

In 1968, Ed left III—and Minsky, together with Licklider (who had by then become director of Project MAC), persuaded the MIT EE department to hire Ed as a visiting professor for the year. Ed had been spending most of his time at III in Los Angeles, but III also had a *pied-à-terre* in the Boston area, and indeed its IPO documents listed its address as 545 Technology Square, Cambridge—the very building in which Project MAC was located.

At MIT, Ed invented and taught a freshman course on “Problem Solving”. He told me many times one of his favorite “problem exercises”. Imagine there’s a person who can cure anyone who’s sick just by touching them. How could one set things up to make the best use of this? I must say I’ve never found such implausible hypotheticals terribly interesting. But Ed was proud of a solution that he’d come up with (I think in discussion with Minsky and McCarthy) that involved systematically shuttling millions of people past the healer.

This probably didn’t come from that particular course, but here are some notes I found in an archive of Ed’s papers at MIT that perhaps suggest some of the flavor of the course (we’ll talk about Ed’s interest in the Soviet Union later):

In 1968 MIT—and Project MAC in particular—was at the very center of emerging ideas about computer science and AI. A picture from that time captures Ed (third from left) with a few of the people involved: Claude Shannon, John McCarthy and Joe Weizenbaum (creator of ELIZA, the original chatbot):

At the end of the 1968 academic year, student reviews of Ed’s course were unexpectedly good, and MIT needed faculty members who could be principal investigators on the government grants that were becoming plentiful for computing—and one of those typical-for-Ed “surprising things” happened: MIT agreed to hire him as a full professor with tenure, despite his lack of academic qualifications. It was a watershed moment for Ed, and I think a piece of validation that he carried with pride for the rest of his life. (For what it’s worth, while Ed was an extreme case, MIT was at that time also hiring at least some other people without the usual PhD qualifications into CS professor positions.)

In 1971 Licklider stepped down from his position as director of Project MAC—and Ed assumed the position. His archives from the time contain lots of administrative material—studies, reports, proposals, budgets, etc.—including many pieces reflecting things like the birth of the ARPANET, the maturing of operating systems and the general enthusiasm about the promise of AI.

One item (conceivably from an earlier time) is Ed’s summary of “Information Processing Terminology” for PDP-1 users, complete with definitions like: “A bit is a binary digit or any thing or state that represents a binary digit. Equivalently, a bit is a set with exactly two members. Note that a bit is not one of the members of such a set”:

Ed does not seem to have been very central to the intellectual activities around Project MAC, and the emerging Lab for Computer Science and AI Lab. But his name shows up from time to time. And, for example, in the classic “HAKMEM” collection of 191 math and CS “hacks” from the AI Lab, there are two—both very number oriented—attributed to Ed:

Rollo Silver gets mentioned too—notably in connection with “random number generators” involving XORs (and, yes, the code is assembly code—for a PDP-10):
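The idea behind that HAKMEM item—generating pseudorandom numbers by XORing shifted copies of a register—lives on in the “xorshift” family of generators. Here is a sketch in Python of Marsaglia’s much later 64-bit xorshift (my choice of illustration; the shift constants 13, 7, 17 are Marsaglia’s, not anything from the HAKMEM PDP-10 code):

```python
def xorshift64(state):
    """One step of a 64-bit xorshift generator: three shift-and-XOR
    operations scramble the state. (Shift constants 13, 7, 17 are
    Marsaglia's; the HAKMEM PDP-10 code used a different scheme.)"""
    state ^= (state << 13) & 0xFFFFFFFFFFFFFFFF  # mask keeps 64 bits
    state ^= state >> 7
    state ^= (state << 17) & 0xFFFFFFFFFFFFFFFF
    return state

# Any nonzero seed cycles through all 2^64 - 1 nonzero states
# before repeating, so the stream never reaches zero.
s = 1
for _ in range(5):
    s = xorshift64(s)
```

The appeal, then as now, is that a few XORs and shifts are essentially free on any hardware, while the bit-mixing is good enough for many non-cryptographic purposes.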

Also in HAKMEM is the “munching squares” algorithm—that I was later shown by Bill Gosper:
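The munching-squares hack is tiny to state: at time step t, light up the points (x, y) whose coordinates satisfy a condition on x XOR y. A sketch of one frame in Python (the “x XOR y less than t” form is one common variant; the original PDP-1 code drove display hardware directly, and details varied):

```python
def munching_frame(t, size=16):
    """One 'munching squares' frame: the set of lit (x, y) points,
    here taken to be those with x XOR y below the time threshold t."""
    return {(x, y) for x in range(size) for y in range(size)
            if (x ^ y) < t}

# Sweeping t upward makes nested square patterns appear to "munch"
# across the screen; once t reaches size (a power of 2), every point is lit.
frame = munching_frame(4, size=8)
```

The nested-squares look comes straight from the bit structure of XOR: each lit point x has its partners at y = x XOR k for small k.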

And talking of Gosper (whom I’ve known since 1979, and who almost every week seems to send me mail with a surprising new piece of math he’s found with Mathematica): in 1970 the Game of Life cellular automaton had come on the scene, and Gosper and others at MIT were intensely studying it, with Gosper triumphantly discovering the glider gun in November 1970. Curiously—in view of all his emphasis on cellular automata—Ed doesn’t seem to have been involved.
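As a reminder of what Gosper and friends were iterating on: the Game of Life updates every cell of a 2D grid simultaneously, with a dead cell coming alive if it has exactly 3 live neighbors, and a live cell surviving with 2 or 3. A minimal Python sketch (the set-of-live-cells representation is just one convenient choice), run on the standard glider, which reproduces its own shape shifted diagonally every 4 steps:

```python
from itertools import product

def life_step(live):
    """One Game of Life step on a set of live (x, y) cells: count live
    neighbors of every affected cell, then keep a cell alive if it has
    exactly 3 live neighbors, or 2 and was already alive."""
    counts = {}
    for (x, y) in live:
        for dx, dy in product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                counts[x + dx, y + dy] = counts.get((x + dx, y + dy), 0) + 1
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

# The glider returns to its own shape, shifted by (1, 1), every 4 steps.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
state = glider
for _ in range(4):
    state = life_step(state)
```

A glider gun, of course, is a finite pattern that emits one such glider every fixed number of steps forever—which is what made Gosper’s 1970 discovery so consequential for showing Life could do unbounded computation.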

But he did do other things. In 1972, for example, as a kind of spinoff from his Problem Solving course, he formed a group called “The Army to End the War” (i.e. the Vietnam War), whose premise was that the government should be stopped from fighting an unwinnable war, and that this could be achieved by an organization coordinating citizens to threaten a run on the banks unless the war was ended. Needless to say, though, this didn’t sit well with the fact that the project Ed ran was itself funded by the Department of Defense.

Between MIT being what it is, and Ed being who he was, there were often strange things that happened. As Ed tells it, one day he was in Marvin Minsky’s office talking about unrecognized geniuses, and a certain Patrick Gunkel walks in, and identifies himself as such. Ed ended up having a long association with Gunkel, who produced such documents as:

(Gunkel’s major goal was to create what he called “ideonomy”, or the “science of ideas”, with divisions like isology, chorology, morology and crinology. I met Gunkel once, in Woods Hole, where he had become something of a local fixture, riding around town with his cat in his bicycle basket.)

But after a few years as director of Project MAC, in 1974 Ed was onto something new: being a visiting scholar at Caltech. After his 1961 encounter, he had gotten to know Richard Feynman—who always enjoyed spending time with “out of the box” people like Ed. And so in 1974 Ed went for a year to Caltech, to be with Feynman.

I think that, at least in the later part of his life, Ed felt his greatest achievements related to cellular automata, and in particular to his idea that the universe is a giant cellular automaton. (My own efforts, and successes, with cellular automata may perhaps have had something to do with that.) I’m not sure when Ed really first hatched this idea, or indeed started to think about cellular automata. Ed had told me many times that when he’d described “the idea” to John McCarthy, McCarthy suggested testing it by looking for “roundoff error” in physics, analogous to the roundoff error that comes from finite precision on computers. Ed scoffed at this, accusing McCarthy of imagining that there was literally “an IBM 709 computer in the sky”. And Ed’s implication was that he had gotten further than that, imagining the universe to be made more abstractly from a cellular automaton.

I didn’t know quite when this exchange with McCarthy was supposed to have taken place (and, by the way, some of the emerging experimental implications of our Physics Project are precisely about finding evidence of discrete space through something quite analogous to “roundoff errors” in the equations for spacetime). But Ed’s implication to me was always that he’d started exploring cellular automata sometime before 1960.

In the mid-1990s, researching history for my book *A New Kind of Science*, (as I’ll discuss below) I had a detailed email exchange and long phone conversation with Ed about this. The result was a statement in my notes about the history of cellular automata:

At the time, Ed made it sound very convincing. But in writing this piece, I’ve come to the conclusion it’s almost certainly not correct. And of course that’s disappointing given all the effort I put into the history notes in my book, and the almost complete lack of other errors that have surfaced even after two decades of scrutiny. But in any case, it’s interesting to trace the actual development of Ed’s ideas.

One useful piece of evidence is a 25-page document from 1969 in his archives, entitled “Thinking about New Things”—that seems to outline Ed’s thinking at the time. Ed explains “I am not a Physicist, in fact I know very little about modern physics”—but says he wants to suggest a new way of thinking about physics:

Soon he starts talking about the possibility that the universe is “merely a simulation on a giant computer”, and relates a version of what he told me about his interaction with John McCarthy:

He talks (in a rather programmer kind of way) about the beginning of the universe:

He goes on—again in a charmingly “programmer” way:

A bit later, Ed is beginning to get to the concept of cellular automata:

And there we have it: Ed gets to (3D) cellular automata, though he calls them “spatial automata”:

And now he claims that spatial automata can exhibit “very complex behavior”—although what he means by that will turn out to be a pale shadow of what I discovered in the early 1980s with things like rule 30:
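The kind of empirical experiment at issue is easy to sketch: run an elementary (1D, 2-state, nearest-neighbor) cellular automaton such as rule 30 from a single black cell and just look. A minimal Python version, using the standard Wolfram rule numbering (bit 4a + 2b + c of the rule number gives the new value of a cell whose left, center, right neighborhood reads a, b, c):

```python
def eca_run(rule, steps, width=None):
    """Evolve an elementary cellular automaton (cyclic boundary) from a
    single 1 cell. Bit (4a + 2b + c) of the Wolfram rule number gives
    the new value of a cell whose (left, self, right) cells are (a, b, c)."""
    width = width or (2 * steps + 1)
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = [(rule >> (4 * row[(i - 1) % width] + 2 * row[i]
                         + row[(i + 1) % width])) & 1
               for i in range(width)]
        rows.append(row)
    return rows

# Rule 30 from a single cell: the pattern is already visibly irregular.
for r in eca_run(30, 8):
    print("".join(".#"[c] for c in r))
```

Run for a few hundred steps, rule 30’s center column passes standard randomness tests, which is the kind of complexity no amount of “pure thought” about symmetric rules was going to predict.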

But at this point Ed already seems to think he’s almost there—that he’s almost reproduced physics:

A little later he’s discussing doing something very much in my style: enumerating possible rules:

And still further on he actually talks about 1D rules. And in some sense it might seem like he’s getting very close to what I did in the early 1980s. But his approach is very different. He’s not doing “science” and “empirically seeing what cellular automata do”. Or even being very interested in cellular automata for their own sake. Instead, he’s trying to engineer cellular automata that can “be the universe”. And so for example he wants to consider only left-right symmetric cellular automata “because the universe is isotropic”. And having also decided he wants cellular automata that are symmetric under interchange of black and white (a property he calls “syntactic symmetry”), he ends up with just 8 rules. He could just have simulated these by running them on a computer. But instead he tries to “prove” by pure thought what the rules will do—and comes up with this table:
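Ed’s count is easy to check by brute force: under one natural reading of his two requirements (the output unchanged when left and right neighbors are swapped, and complemented when all three cells are complemented), exactly 8 of the 256 elementary rules survive. A quick sketch in Python, using the Wolfram rule-numbering scheme:

```python
def rule_table(rule):
    """The map from each (left, self, right) neighborhood to its output
    bit, under the standard Wolfram numbering of elementary rules."""
    return {(a, b, c): (rule >> (4 * a + 2 * b + c)) & 1
            for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def symmetric_rules():
    """Elementary rules that are left-right symmetric and also symmetric
    under interchanging black and white (complementing all cells)."""
    found = []
    for rule in range(256):
        f = rule_table(rule)
        mirror = all(f[a, b, c] == f[c, b, a] for (a, b, c) in f)
        invert = all(f[a, b, c] == 1 - f[1 - a, 1 - b, 1 - c]
                     for (a, b, c) in f)
        if mirror and invert:
            found.append(rule)
    return found

print(len(symmetric_rules()))  # exactly 8 rules pass both tests
```

Whether this matches Ed’s own eight rules in detail I can’t be sure, but the arithmetic of the constraints (3 free bits out of 8) does force the count to 8.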

Had he done simulations he might have made pictures like these (labeled using my rule-numbering scheme):

But as it was he didn’t really come to any particular conclusion, other than what amount to a few simple “theorems” about what “data processing” these cellular automata can do:

I must say I find it very odd that—particularly given all the stories about his activities and achievements he told me—Ed never in the four decades I knew him mentioned anything about having thought about 1D cellular automata. Perhaps he didn’t remember, or perhaps—even after everything I wrote about them—he never really knew that I was studying 1D cellular automata.

But in any case, what comes next in the 1969 document is Ed getting back to “pure thought” arguments about how cellular automata might “make physics”:

It’s a bit muddled (though, to be fair, this was a document Ed never published), but at the end it’s basically saying that if the universe really is just a cellular automaton then one should be able to replace physical experiments (that would, for example, need particle accelerators) with “digital hardware” that just runs the cellular automaton. The next section is entitled “The Design of a Simulator”, and discusses how such hardware could be constructed, concluding that a 1000×1000×1000 3D grid of cells could be built for $50M (or nearly half a billion dollars today).

After that, there’s one final (perhaps unfinished) section that reads a bit like a caricature of “I’ve-got-a-theory-of-physics-too” mechanical models of physics:

But, OK, so what does this all mean? Well, first, I think it makes it rather clear that (despite what he told me) by 1969—let alone 1961—Ed hadn’t actually implemented or run cellular automata in any serious way. It’s also notable that in this 1969 piece Ed isn’t using the term “cellular automaton”. The concept of cellular automata had been invented many times, under many different names. But by 1969 the term “cellular automaton” was pretty firmly established, and in fact 1969 might have represented the very peak up to that point of interest in cellular automata in the world at large. But somehow Ed didn’t know about this—or at least wasn’t choosing to connect with it.

Even at MIT Frederick Hennie in the EE department had actually been studying cellular automata—albeit under the name “iterative arrays”—since the very beginning of the 1960s. In 1968 E. F. Codd from IBM (who laid the foundations for SQL—and who worked with Ed’s friend John Cocke) had published a book entitled *Cellular Automata*. Alvy Ray Smith—in the same department as John McCarthy at Stanford—was writing his PhD thesis on “cellular automata”. In 1969 Marvin Minsky and Seymour Papert published their *Perceptrons* book, and were apparently talking a lot about cellular automata. And for example by the fall of 1969 Papert’s student Terry Beyer had written a thesis about the “recognition and transformation of figures by iterative arrays of finite state automata”—under the auspices of Project MAC, presumably right under Ed’s nose. (And, no, the thesis doesn’t mention Ed, though it mentions Minsky.)

Right around that time, though, something happens. Ed had been convinced—probably by Minsky and McCarthy—that any cellular automaton capable of “being the universe” had better be computation universal. And now there’s a student named Roger Banks who’s working on seeing what kind of (2D) cellular automaton would be needed to get computation universality. Banks had found examples requiring far fewer than the 29 states von Neumann and Burks had used in the 1950s. But—as he related to me many times—Ed challenged Banks to find a 2-state example (“implementable purely with logic gates”), and Banks soon found it, first describing it in June 1970:

Banks had apparently been interacting with the “Life hackers” at MIT, and in November 1970 some of the thunder of his result was stolen when Bill Gosper at MIT discovered the glider gun, which suggested that even the rules of the Game of Life (albeit involving 9 rather than 5 neighbors in 2D) were likely to be sufficient for computation universality.

But for our efforts to trace history, Banks’s June 1970 report has a number of interesting elements. It relates the history of cellular automata, without any mention of Ed. But then—in its one mention of Ed—it says:

The “mod-2 rule” that Ed told me he’d simulated in 1961 has finally made an appearance. In an oral history years later Terry Winograd reported that in 1970 he “went to a lecture of Papert’s in which he described a conjecture about cellular automata [which Winograd] came back with a proof of”.

By January 1971, Banks is finishing his thesis, which is now officially supervised by Ed (even though it’s nominally in the mechanical engineering department):

Most of Banks’s work is presented as what amount to “engineering drawings”, but he mentions that he has done some simulations. I don’t know if these included simulations of the mod-2 rule but it seems likely.

So was 1969 or 1970 the first time the mod-2 rule had been heard from? I’m not sure, but I suspect so. But to confuse things there’s a “display hack” known as “munching squares” (described in HAKMEM) that looks in some ways similar, and that was probably already seen in 1962 on the PDP-1. Here are the frames in a small example of munching squares:

Here’s a video of a bigger example:

I expect Ed saw munching squares, perhaps even in 1962. But it’s not the mod-2 rule—or actually a cellular automaton at all. And even though Ed certainly had the capability to simulate cellular automata back at the beginning of the 1960s (and could even have recorded videos of 2D ones with III’s film technology), the evidence we have so far is that he didn’t. And in fact my suspicion is that it was probably only around the time I met Ed in 1982 that it finally happened.

In May 1981 there’d been a conference at MIT on the Physics of Computation. I’d been invited, but in the end I couldn’t go—because (in a pattern that has repeated many times in my life) it coincided with the initial release of my SMP software system. Still, in December 1981 I got the following invitation:

In January 1982 I was planning to go to England to do a few weeks of intensive SMP development on a computer that a friend’s startup had—and I figured I would go to the Caribbean “on the way”.

It was an interesting group that assembled on January 18, 1982, on Mosquito Island. It was the first time I met my now-longtime friend Greg Chaitin. There were physicists there, like Ken Wilson and David Finkelstein. (Despite the promise of the invitation, Feynman’s health prevented him from coming.) And then there were people who’d worked on reversible computation, like Rolf Landauer and Charles Bennett. There were Tom Toffoli and Norm Margolus, who had their cellular automaton machine with them. And finally there was Ed. At first he seemed a little Gatsby-like, watching and listening, but not saying much. I think it was the next morning that Ed pulled me aside rather conspiratorially and said I should come and see something.

There was just one real house (as opposed to cabin) on the island (with enough marble to clinch the Bond-villain-lair vibe). Ed led me to a narrow room in the house—where there was a rather-out-of-place-for-a-tropical-island modern workstation computer. I’d seen workstation computers before; in fact, the company I’d started was at the time (foolishly) thinking of building one. But the computer Ed had was from a company he was CEOing. It was a PERQ 1, made by Three Rivers Computer Corporation, which had been founded by a group from CMU including McCarthy’s former student Raj Reddy. I learned that Three Rivers was a company in trouble, and that Ed had recently jumped in to save it. I also learned that in addition to any other challenges the engineers there might have had, he’d added the requirement that the PERQ be able to successfully operate on a tropical island with almost 100% humidity.

But in any case, Ed wanted to show me something on the screen. And here’s basically what it was:

Ed pressed a button and now this is what happened:

I’d seen plenty of “display hacks” before. Bill Gosper had shown me ones at Xerox PARC back in 1979, and my archives even contain some of the early color laser printer outputs he gave me:

I don’t remember the details of what Ed said. And what I saw looked like “display hacks flashing on the screen”. But Ed also mentioned the more science-oriented idea of reversibility. And I’m pretty sure he mentioned the term “cellular automaton”. It wasn’t a long conversation. And I remember that at the end I said I’d like to understand better what he was showing me.

And so it was that Ed handed me a PERQ 8” floppy disk. And now, 41 years later, here it is, sitting—still unread—in my archives:

It’s not so easy these days to read something like this—and I’m not even sure it will have “magnetically survived”. But fortunately—along with the floppy—there’s something else Ed gave me that day. Two copies of a 9-page printout, presumably of what’s on the floppy:

And what’s there is basically a Pascal program (and the PERQ was a very Pascal-oriented machine; “PERQ” is said to have stood for “Pascal Engine that Runs Quicker”). But what does the program do? The main program is called “CA1”, suggesting that, yes, it was supposed to do something with cellular automata.

There are a few comments:

And there’s code for making help text:

Apparently you press “b” to “clear the Celluar [*sic*] Automata boundary”, “n” for “Fredkin’s Pattern” and “p” for “EF1”. And at the end there’s a reference to munching squares. The first pattern above is what you get by pressing “n”; the second by pressing “p”.

Both patterns look pretty messy. But if instead you press “a”, you get something with a lot more structure:

I think Ed showed this to me in passing. But he was more interested in the more complicated patterns, and in the fact that you could get them to reverse what they were doing. And in this animated form, I suspect this just looked to me like another munching squares kind of thing.

But, OK, given that we have the program, can we tell what it actually does? The core of it is a bunch of calls to the function rasterop(). Functions like rasterop() were common in computers with bitmapped displays. Their purpose was to apply a certain Boolean operation to the array of black and white pixels in a region of the screen. Here it’s always rasterop(6, …) which means that the function being applied is Boolean function 6, or Xor (or “sum mod 2”).

And what’s happening is that chunks of the screen are getting Xor’ed together: specifically, chunks that are offset by one pixel in each of the four directions. And this is all happening in two phases, swapping between different halves of the framebuffer. Here are the central parts of the sequence of frames that get generated starting from a single cell:

It helps a lot to see the separate frames explicitly. And, yes, it’s a cellular automaton. In fact, it’s exactly the “reversible mod-2 rule”. Here it is for a few more steps, with its simple “self-reproduction” increasingly evident:
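The rule is simple enough to sketch in code. Here’s a minimal Python sketch (my own, not a transcription of Ed’s Pascal): each cell’s new value is the XOR of its four neighbors’ current values, XORed with the cell’s own value two steps back—which is exactly what the two-phase framebuffer swapping achieves:

```python
def step(curr, prev):
    """One step of the reversible mod-2 rule on an n x n grid (wraparound
    edges): XOR of the four neighbors, XORed with the state two steps back."""
    n = len(curr)
    nxt = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            neighbors = (curr[(i - 1) % n][j] ^ curr[(i + 1) % n][j]
                         ^ curr[i][(j - 1) % n] ^ curr[i][(j + 1) % n])
            nxt[i][j] = neighbors ^ prev[i][j]
    return nxt

n = 16
blank = [[0] * n for _ in range(n)]
seed = [row[:] for row in blank]
seed[n // 2][n // 2] = 1       # a single black cell

prev, curr = blank, seed
for _ in range(10):            # run forward 10 steps
    prev, curr = curr, step(curr, prev)

prev, curr = curr, prev        # swap the two time slices...
for _ in range(10):            # ...and the very same rule now runs backward
    prev, curr = curr, step(curr, prev)

print(prev == seed and curr == blank)  # True: the initial state reappears
```

Because XOR is its own inverse, swapping the two time slices and applying the identical rule retraces the evolution exactly—the reversibility Ed was demonstrating.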

Back in 1982 I think I only saw the PERQ that one time. But in one of the resort cabins on the other side of the island—there was this (as captured in a slightly blurry photograph that I took):

It was a “cellular automaton machine” built out of “raw electronics” by Tom Toffoli and Norm Margolus—who were the core of Ed’s “Information Mechanics” group at MIT. It didn’t feel much like science, but more like a video DJ performance. Patterns flashing and dancing on the screen. Constant rewiring to produce new effects. I wanted to slow it all down and “sciencify” it. But Tom and Norm always wanted to show yet another strange thing they’d found.

Looking in my archives today, I find just one other photograph I took of the machine. I think I considered this the most striking pattern I saw the machine produce. And, yes, presumably it’s a 2D cellular automaton—though despite my decades of experience with cellular automata I don’t today immediately recognize it:

What did I make of Ed back in 1982? Remember, those were days long before the web, and before one could readily look up people’s backgrounds. So pretty much all I knew was that Ed was connected to MIT, and that he owned the island. And I had the impression that he was some kind of technology magnate (and, yes, the island and the plane helped). But it was all quite mysterious. Ed didn’t engage much in technical conversations. He would make statements that were more like pronouncements—that sounded interesting, but were too vague and general for me to do much more than make up my own interpretations for them. Sometimes I would try to ask for clarification, but the response was usually not an explanation, but instead a tangentially related—though often rather engaging—story.

All these years later, though, one particular exchange stands out in my memory. It was at the end of the conference. We were standing around in the little restaurant on the island, waiting for a boat to arrive. And Ed said out of the blue: “I’ll make a deal with you. You teach me how to write a paper and I’ll teach you how to build a company.” At the time, this struck me as quite odd. After all, writing papers seemed easy to me, and I assumed Ed was doing it if he wanted to. And I’d already successfully started a company the previous year, and didn’t think I particularly needed help with it. (Though, yes, I made plenty of mistakes with that company.) But that one comment from Ed somehow for years cemented my view of him as a business tycoon who didn’t quite “get” science, though had ideas about it and wanted to dabble in it.

Ed would later describe Richard Feynman as his best friend. As we discussed above, they’d first met in 1961, and in 1974 Ed had spent the year at Caltech visiting Feynman, having, as Ed tells it, made a deal (analogous to the one he later proposed to me) that he would teach Feynman about computers, and Feynman would teach him about physics. I myself first got to know Feynman in 1978, and interacted extensively with him not only about physics, but also about symbolic computing—and cellular automata. And in retrospect I have to say I’m quite surprised that he mentioned Ed to me only a few times in passing, and never in detail.

But I think the point was that Feynman and Ed were—more than anything else—personal friends. Feynman tended to find “traditional academics” quite dull, and much preferred to hang out with more “unusual” people—like Ed. Quite often the people Feynman hung out with had quite kooky ideas about things, and I think he was always a little embarrassed by this, even though he often seemed to find it fun to indulge and explore those ideas.

Feynman always liked solving problems, and applying himself to different kinds of areas. But I have to say that even I was a little surprised when in writing this piece I was going through the archives of Ed’s papers at MIT, and found the following letter from Feynman to Ed:

Clearly he—like me—viewed Ed as an authority on business. But what on earth was this “cutting machine”, and why was Feynman trying to sell it?

For what it’s worth, the next couple of pages tell the story:

Feynman’s next-door neighbor had a company that made swimwear, and this was a machine for cutting the necessary fabric—and Feynman had helped develop it. And much as Feynman had been prepared to help his neighbor with this, he was also prepared to help Ed with some of his ideas about physics. And in the archive of Ed’s papers, there’s a letter from Feynman:

I don’t know whether this is the first place the term “Fredkin gate” was ever used. But what’s here is a quintessential example of Feynman diving into some new subject, doing detailed calculations (by hand) and getting a useful answer—in this case about what would become Ed’s best-known invention: reversible logic, and the Fredkin gate.

Feynman had always been interested in “computing”. And indeed when he was recruited to the Manhattan Project it was to run a team of human computers (equipped with mechanical desk calculators). I think Feynman always hoped that physics would “become computational” at least in some sense—and he would for example lament to me that Feynman diagrams were such a bad way to compute things. Feynman always liked the methodology of traditional continuous mathematics, but (as I just noticed) even in 1964 he was saying that “I believe that the theory that space is continuous is wrong, because we get these infinities and other difficulties…”. And elsewhere in his 1964 lectures that became *The Character of Physical Law* Feynman says:

Did Feynman say these things because of his conversations with Ed? I rather doubt it. But as I was writing this piece I learned that Ed thought differently. As he told it:

I never pressed any issue that would sort of give me credit, okay? It’s just my nature. A very weird thing happened toward the end of my time at Caltech. Richard Feynman and I would get into very fierce arguments. . . . I’m trying to convince him of my ideas, that at the bottom is something finite and so on. He suddenly says to me, “You know, I’m sure I had this same idea sometime quite a while ago, but I don’t remember where or how or whether I ever wrote it down.” I said, “I know what you’re talking about. It’s a set of lectures you gave someplace. In those lectures you said perhaps the world is finite.” He just has this little statement in this book. I saw the book on his shelf. I got it out, and he was so happy to see that there. What I didn’t tell him was he gave that lecture years after I’d been haranguing him on this subject. I knew he thought it was his idea, and I left it that way. That was just my nature.

Notwithstanding what he said, I rather suspect he did push the point. And for example when Feynman gave a talk on “Simulating Physics with Computers” at the 1981 MIT Physics of Computation conference that Ed co-organized, he was careful to write that:

Ed, by the way, arranged for Feynman to get his first personal computer: a Commodore PET. I don’t think Feynman ended up using it terribly much, though in 1984 he took it with him on a trip to Hawaii where he and his son Carl used it to work out probabilities to try to “crack” the randomness of my rule 30 cellular automaton (needless to say, without success).

Back at MIT in 1975 after his year at Caltech, Ed was no longer the director of Project MAC, but was still on the books as a professor, albeit something of an outcast one. Soon, though, he was teaching a class about his ideas—under the title of “Digital Physics”:

Cellular automata weren’t specifically mentioned in the course description—though in the syllabus they were there, with the Game of Life as a key example:

Back in the 1960s, cellular automata had been a popular topic in theoretical computer science. But by the mid-1970s the emphasis of the field had switched to things like computational complexity theory—and, as Ed told me many times, his efforts to interest people at MIT in cellular automata failed, with influential CS professor Albert Meyer (whose advisor Patrick Fischer had worked quite extensively on cellular automata) apparently telling Ed that “one can tell someone is out of it if they don’t think cellular automata are dead”. (It’s an amusing irony that around this time, Meyer’s future wife Irene Greif would point John Moussouris—who we’ll meet later—to Ed and his work on cellular automata.)

Ed’s ideas about physics were not well received by the physicists at MIT. And for example when students from Ed’s class asked the well-known MIT physics professor Philip Morrison what he thought of Ed’s approach, he apparently responded that “Of course Fredkin thinks the universe is a computer—he’s a computer person; if instead he were a cheese merchant he’d think it was a big cheese!”

When Ed was at Caltech in 1974 a big focus there—led by Carver Mead—was VLSI design. And this led to increasing interest in the ultimate limits on computation imposed by physics. Ever since von Neumann in the 1950s it had been assumed that every step in a computation would necessarily require dissipation of energy—and this was something Carver Mead took as a given. But if this was true, how could Ed’s cellular automaton for the universe work? Somehow, Ed reasoned, it—and any computation, for that matter—had to be able to run reversibly, without dissipating any energy. And this is what led Ed to his most notable scientific contribution: the idea of reversible logic.

Ordinary logic operations—like And and Or—take two bits of input and give one bit of output. And this means they can’t be reversible: with only one bit of output there isn’t enough information to uniquely determine the two bits of input. But if—like Ed—you consider a generalized logic operation that for example has both two inputs and two outputs, then this can be invertible, i.e. reversible.

The concept of an invertible mapping had long existed in mathematics, and under the name “automorphisms of the shift” had even been studied back in the 1950s for the case of what amounted to 1D cellular automata (for applications in cryptography). And in 1973 Charles Bennett had shown that one could make a reversible analog of a Turing machine. But what Ed realized is that it’s possible to make something like a typical computer design—and have it be reversible, by building it out of reversible logic elements.

Looking through the archive of Ed’s papers at MIT, I found what seem to be notes on the beginning of this idea:

And I also found this—which I immediately recognized as a sorting network, in which values get sorted through a sequence of binary comparisons:

Sorting networks are inherently conservative: each compare-and-exchange just permutes the two values flowing through it. And this particular sorting network I recognized as the largest guaranteed-optimal sorting network that’s known—discovered by Milton Green at SRI (then “Stanford Research Institute”) in 1969. It’s implausible that Ed independently discovered this exact same network, but it’s interesting that he was drawing it (by hand) on a piece of paper.
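To illustrate the mechanism—using a small 4-input network rather than Green’s 16-input one—here’s a Python sketch of how a sorting network operates:

```python
from itertools import permutations

def apply_network(comparators, values):
    """Run a sorting network: each comparator is a compare-and-exchange that
    puts the smaller value on its first wire and the larger on its second."""
    v = list(values)
    for i, j in comparators:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

# The classic optimal 4-input network, with 5 comparators; Green's 16-input
# network works the same way, just with 60 comparators.
net4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

print(all(apply_network(net4, p) == sorted(p) for p in permutations(range(4))))  # True
```

The network’s sequence of comparators is fixed in advance—no data-dependent branching—which is what makes such networks natural to draw as wiring diagrams, and to build in hardware.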

Ed’s archives also contain a 3-page draft entitled “Conservative Logic”:

Ed explains that he is limiting himself to gates that implement permutations

and then goes on to construct a “symmetric-majority-parity” gate—which he claims is “computation universal”:

It’s not quite a Fredkin gate, but it’s close. And, by the way, it’s worth pointing out that these gates alone aren’t “computation universal” in something like the Turing sense. Rather, the point is that—like with Nand for ordinary logic—any reversible logic operation (i.e. permutation) with any number of inputs can be constructed using just these gates, connected by wires.
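For concreteness, here’s a minimal Python sketch (my own) of the Fredkin gate itself—a controlled swap—verifying that it’s its own inverse (and so a permutation of its 8 possible inputs), that it conserves the number of 1 bits, and that feeding it constants yields ordinary And and Not:

```python
from itertools import product

def fredkin(c, a, b):
    """Fredkin (controlled-swap) gate: if the control bit c is 1, swap a and b."""
    return (c, b, a) if c == 1 else (c, a, b)

triples = list(product([0, 1], repeat=3))

# Reversible: the gate is its own inverse, hence a permutation of the 8 triples
print(all(fredkin(*fredkin(c, a, b)) == (c, a, b) for c, a, b in triples))  # True

# Conservative: the number of 1 bits is unchanged
print(all(sum(fredkin(c, a, b)) == c + a + b for c, a, b in triples))  # True

# With constant inputs it computes ordinary (irreversible) logic:
print(all(fredkin(c, a, 0)[2] == (c & a) for c, a in product([0, 1], repeat=2)))  # And
print(all(fredkin(c, 1, 0)[1] == 1 - c for c in (0, 1)))                          # Not
```

The price of reversibility is visible here: to get And and Not out, one has to supply constant “ancilla” inputs, and extra “garbage” outputs come along for the ride.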

Ed didn’t at first publish anything about his reversible logic idea, though he talked about it in his class, and in 1978 there were already students writing term papers about it. But then in 1978, as Ed told it later:

I found this guy Tommaso Toffoli. He had written a paper that showed how you could build a reversible computer by storing everything that an ordinary computer would have to forget. I had figured out how to have a reversible computer that didn’t store anything because all the fundamental activity was reversible. Okay? So I decided to hire him because he was the only person who tried to do it and he didn’t succeed, really, and I had—and I hired him to help me.

Toffoli had done a first PhD in Italy building electronics for cosmic ray detectors, and in 1978 he’d just finished a second PhD, working on 2D cellular automata with Art Burks (who had coined the name “cellular automaton”). Ed brought Toffoli to MIT under a grant to build a cellular automaton machine—leading to the machine I saw on Ed’s island in 1982. But Ed also worked with Toffoli to write a paper about conservative logic—which finally appeared in 1982, and contained both the Fredkin gate, and the Toffoli gate. (Ed later griped to me that Toffoli “really hadn’t done much” for the paper—and that after all the Toffoli gate was just a special case of the Fredkin gate.)

Back in 1980—on the way to this paper—Ed, with Feynman’s encouragement, had had another idea: to imagine implementing reversible logic not just abstractly, but through an explicit physical process, namely collisions between elastic billiard balls. And as we saw above, Feynman quickly got into analyzing this, for example seeing how a Fredkin gate could be implemented just with billiard balls.

But ultimately Ed wanted to implement reversibility not just for things like circuits, but also—imitating the reversibility that he believed was fundamental to physics—for cellular automata. Now the fact is that reversibility for cellular automata had actually been quite well studied since the 1950s. But I don’t think Ed knew that—and so he invented his own way to “get reversibility” in cellular automata.

It came from something Ed had seen on the PDP-1 back in 1961. As Ed tells it, in playing around with the PDP-1 he had come up with a piece of code that surprised him by drawing something close to a circle in pixels on the screen. Minsky had apparently “gone into the debugger” to see how it worked—and in 1972 HAKMEM attributed the algorithm to Minsky (though in the Pascal program I got from Ed in 1982, it appears as a function called efpattern()). Here’s a version of the algorithm:

And, yes, with different divisors *d* it can give rather different (and sometimes wild) results:

But for our purposes here what’s important is that Ed found out that this algorithm is reversible—and he realized that in some sense the reason is that it’s based on a second-order recurrence. And, once again, the basic ideas here are well known in math (cf. reversibility of the wave equation, which is second order). But Ed had a more computational version: a second-order cellular automaton in which one adds mod 2 the value of a cell two steps back. And I think in 1982 Ed was already talking about this “mod-2 trick”—and perhaps the PERQ program was intended to implement it (though it didn’t).
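The circle algorithm itself can be sketched in a few lines. Here’s a Python version (a sketch with a power-of-two divisor implemented as a bit shift; the original PDP-1 code was of course different), together with its exact inverse:

```python
def forward(x, y, n):
    """One step of the HAKMEM-style circle algorithm: two integer shears.
    Crucially, the second shear uses the NEW x—that's what makes each
    step exactly invertible, despite the truncating shifts."""
    x = x - (y >> n)
    y = y + (x >> n)
    return x, y

def backward(x, y, n):
    """Exact inverse: undo the two shears in reverse order."""
    y = y - (x >> n)
    x = x + (y >> n)
    return x, y

x, y = 1000, 0
for _ in range(200):           # trace out an approximate circle...
    x, y = forward(x, y, 4)
for _ in range(200):           # ...then run it backward
    x, y = backward(x, y, 4)
print((x, y))  # (1000, 0): the starting point is recovered exactly
```

Each step is a composition of two shears, each a bijection on integer points—the same flavor of reversibility-by-construction that Ed exploited in his second-order mod-2 cellular automaton trick.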

Ed’s work on reversible logic and “digital physics” in a sense came to a climax with the 1981 Physics of Computation conference at MIT—that brought in quite a *Who’s Who* of people who’d been interested in related topics (as I mentioned above, I wasn’t there because of a clash with the release of SMP Version 1.0, though I did meet or at least correspond with most of the attendees at one time or another):

Originally Ed wanted to call the conference “Physics and Computation”. But Feynman objected, and the conference was renamed. In the end, though, Feynman gave a talk entitled “Simulating Physics with Computers”—which most notably talked about the relation between quantum mechanics and computation, and is often seen as a key impetus for the development of quantum computing. (As a small footnote to history, I worked with Feynman quite a bit on the possibility of both quantum computing and quantum randomness generation, and I think we were both convinced that the process of measurement was ultimately going to get in the way—something that with our Physics Project we are finally now beginning to be able to analyze in much more detail.)

But despite his interactions with Feynman, Ed was never too much into the usual ideas of quantum mechanics, hoping (as he said in the flyer for his course on digital physics) that perhaps quantum mechanics would somehow fall out of a classical cellular-automaton-based universe. But when quantum computing finally became popular in the 1990s, reversible logic was a necessary feature, and the Fredkin gate (also known as CSWAP or “controlled-swap”) became famous. (The Toffoli gate—or CCNOT—is a bit more famous, though.)

In tracing the development of Ed’s ideas, particularly about “digital physics”, there’s another event worthy of mention. In late 1969 Ed learned about an older German tech entrepreneur named Konrad Zuse who’d published an article in 1967 (and a book in 1969) on *Rechnender Raum (Calculating Space)*—mentioning the term “cellular automata”:

Although Zuse was 24 years older than Ed, there were definitely similarities between them. Zuse had been very early to computers, apparently building one during World War II that suffered an air raid (and may yet still lie buried in Berlin). After the war, Zuse started a series of computer companies—and had ideas about many things. He’d been trained as an engineer, and perhaps it was having worked on solving his share of PDEs using finite differences that led him to the idea—a bit like Ed’s—that space might fundamentally be a discrete grid. But unlike Ed, Zuse for the most part seemed to think that—as with finite differences—the values on the grid should be continuous, or at least integers. Ed arranged for Zuse’s book to be translated into English, and for Zuse to visit MIT. I don’t know how much influence Zuse had on Ed, and when Ed talked to me about Zuse it was mostly just to say that people had treated his ideas—like Ed’s—as rather kooky. (I exchanged letters with Zuse in the 1980s and 1990s; he seemed to find my work on cellular automata interesting.)

It wasn’t just physics that Ed had ideas about. It was lots of other things too. Sometimes the ideas would turn into businesses; more often they’d just stay as ideas. Ed’s archive, for example, contains a document on the “Intermon Idea” that Ed hoped would “provide a permanent solution to the world’s problem of not having a stable medium of exchange”:

And, no, Ed wasn’t Satoshi Nakamoto—though he did tell me several times that (although, to his displeasure, it was never acknowledged) he had suggested to Ron Rivest (the “R” of RSA cryptography) the idea of “using factoring as a trapdoor”. And—not content with solving the financial problems of the world, or, for that matter, fundamental physics—Ed also had his “algorithmic plan” to prevent the possibility of World War III.

And then there was the Muse. Marvin Minsky had long been involved with music, and had assembled out of electronic modules a system that generated sequences of musical notes. But in 1970 Ed and Minsky developed what they called the Muse—whose idea was to be a streamlined system that would use integrated circuits to “automatically compose music”:

In actuality, the Muse produced sequences of notes determined by a linear feedback shift register—in essence a 1D additive cellular automaton—in which the details of the rule were set on its front panel as “themes”. The results were interesting—if rather R2-D2-like—but weren’t what people usually thought of as “music”. Ed and Minsky started a company named Triadex (note the triangular shape of the Muse), and manufactured a few hundred Muses. But the venture was not a commercial success.
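A linear feedback shift register is easy to sketch. Here’s a minimal Python version (the register size and tap positions are illustrative choices of mine, not the Muse’s actual circuit): at each step the XOR of the tapped cells is shifted in at one end—the additive update that makes it essentially a 1D additive cellular automaton:

```python
def lfsr_stream(taps, state, steps):
    """Fibonacci-style linear feedback shift register: each step shifts the
    register by one cell and feeds in the XOR of the tapped cells."""
    bits = []
    for _ in range(steps):
        bits.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]
    return bits

# A maximal-length 4-cell register (feedback polynomial x^4 + x^3 + 1):
# from any nonzero start it cycles through all 15 nonzero states, so the
# output bit sequence repeats with period 15.
out = lfsr_stream(taps=[0, 3], state=[1, 0, 0, 0], steps=30)
print(out[:15] == out[15:])  # True: period 15
```

Mapping stretches of such a bit stream to note choices—with front-panel settings selecting the details of the feedback—is, in essence, how the Muse generated its sequences.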

Particularly through interacting with Minsky, Ed was quite involved in “things that should be possible with AI”. The Muse had been about music. But Ed also for example thought about chess—where he wanted to build an array of circuits that could tree out possible moves. Working with Richard Greenblatt (who had developed an earlier chess machine) my longtime friend John Moussouris ended up designing CHEOPS (a “Chess-Oriented Processing System”) while Ed was away at Caltech. (Soon thereafter, curiously enough, Moussouris would go to Oxford and work with Roger Penrose on discrete spacetime—in the form of spin networks. Then in later years he would found two important Silicon Valley microprocessor companies.)

Keeping on the chess theme, Ed would in 1980 (through his Fredkin Foundation) put up the Fredkin Prize for the first computer to beat a world champion at chess. The first “pre-prize” of $5k was awarded in 1981; the second pre-prize of $10k in 1988—and the grand prize of $100k was awarded in 1997 with some fanfare to the IBM Deep Blue team.

Ed also put up a prize for “math AI”, or, more specifically, automated theorem proving. It was administered through the American Math Society and a few “milestone prizes” were given out. But the grand Leibniz Prize “for the proof of a ‘substantial’ theorem in which the computer played a major role” was never claimed, the assets of the Fredkin Foundation withered, and the prize was withdrawn. (I wonder if some of the things done in the 1980s and 1990s by users of Mathematica should have qualified—but Ed and I never made this connection, and it’s too late now.)

Particularly during his time at MIT, Ed did a fair amount of strategy consulting for tech companies—and Ed would tell me many stories about this, particularly related to IBM and DEC (which were in the 1980s the world’s two largest computer companies).

One story (whose accuracy I’ve never been able to determine) related to DEC’s ultimately disastrous decision not to enter the personal computer business. As Ed tells it, a team at DEC did a focus group about PCs—with Ken Olsen (CEO of DEC) watching. There was a young teacher in the group who was particularly enthusiastic. And Olsen seemed to be getting convinced that, yes, PCs were a good idea. As the focus group was concluding, the teacher listed off all sorts of ways PCs could change the world. But then, fatefully, he added right at the end: “And I don’t just mean here on Earth”. Ed claims this was the moment when Olsen decided to kill the PC project at DEC.

Ed told a story from the early 1970s about a giant IBM project called FS (for “Future Systems”):

IBM has this project. They’re going to completely revolutionize everything. The project is to design everything from the smallest computer to the new largest. They’re all to be multiprocessors. The specs were just fantastic. They promised to guarantee their customers 100% uptime. Their plans were, for instance, when you have a new OS, it’s updated. They guarantee 24-hour operation at all times. They plan to be able to update the OS without stopping this process. Things like that, a lot of goals that are very lofty, and so on.

Someone at IBM whom I knew very well, a very senior guy, came to me one day and said, “Look, these guys are in trouble, and maybe MIT could help them.” I organized something. Just under 30 professors of computer science came down to IBM. We got there on Sunday night and starting Monday morning, we got one lecture an hour, eight on Monday, Tuesday, Wednesday, Thursday, and four on Friday, describing the system. It was just spectacular, everything they were trying to do, but it was full of all kinds of idiocy. They were designing things that they’d never used. This whole thing was to be oriented about people looking at displays.

No one at IBM had done anything like that. They think, “Okay, you should have a computer display,” and they came up with certain problems that hadn’t occurred to the rest of us. If you’re looking at the display, how can you tell the difference between what you had put into the computer and what the computer had put in? This worried them. They came up with a hardware fix. When you typed, it always went on the right half of the screen; when the computer did something, it always went on the left half, or I may have it backwards, but that was the hardware.

…

What happened is I came to realize that they were so over their head in their goal that they were going to annihilate themselves with this thing. It was just going to be the world’s greatest fiasco for it. I started cornering people and saying, “Look, do you realize that you’re never going to make this work?” and so on, so forth. This came to the attention of people at IBM, and it annoyed them. I got a call from someone saying, “Look, you’re driving us nuts. We want to hear you out, so we’re going to conduct a debate.” There’s a guy named Bob [Evans], who was the head of the project. What happened was we’re in the boardroom with IBM, lots of officials there, and he and I have a debate.

I’m debating that they have to kill the project and do something else. He’s debating that they shouldn’t kill the project. I made all my points. He made all his points. Then a guy named Mannie Piore, who was the one who thought of the idea of having a research laboratory, a very senior guy, said to me, he said, “Hey, Ed,” he said, “We’ve heard you out.” He says, “This is our company. We can do this product even if you think we shouldn’t.” I said, “Yes, I admit that’s true.” He said, “You presented your case. We’ve heard you out, and we want to do it.” I said, “Okay.” He said, “Can you do us a favor?” I said, “What’s that?” He said, “Can you stop going around talking to people about why it has to be killed?” I said, “Look, I’ve said my piece. I’ve been heard out.” “Yes. Okay.” “I quit.”

I had only one ally in that room; that was John Cocke. As we were walking out of the room, he came over to me and said, “Don’t worry, Ed.” He said, “It’s going to fall over of its own weight.” I’ll never forget that. Ten days later, it was canceled. A lot of people were very mad at me.

I’m not sure what Ed was like as an operational manager of businesses. But he certainly had no shortage of opinions about how businesses should be run, or at least what their strategies should be. He was always keen on “do-the-big-thing” ideas. I remember him telling me multiple times about a company that did airplane navigation. It had put a certain number of radio navigation beacons into its software. Ed told me he’d asked about others, and the company had said “Well, we only put in the beacons lots of people care about”. Ed said “Just put all of them in”. They didn’t. And eventually they were overtaken by a company that did.

Ed’s great business success—and windfall—was III. But Ed was also involved with a couple dozen other companies—almost all of which failed. There’s a certain charm in the diversity of Ed’s companies. There was Three Rivers Computer Corporation, which made the PERQ computer. There was Triadex, which made the Muse. There was a Boston television station. There was an air taxi service. There was Fredkin Enterprises, importing PCs into the Soviet Union. There was Drake’s Anchorage, the resort on his island. There was Gensym, a maker of AI-oriented process control systems, which was a rare success. And then there was Reliable Water.

Ed’s island—like many tropical islands—had trouble getting fresh water. So Ed decided to invent a solution, coming up with a new, more energy-optimized way to do reverse osmosis—with a dash of AI control. Reliable Water announced its product in May 1987, desalinating water taken from Boston Harbor and serving it to journalists to drink. (Ed told me he was a little surprised how willingly they did so.)

Looking at my archives I see I was sufficiently charmed by the picture of Ed posing with his elaborate “intelligent” glass tubing that I kept the article from *New Scientist*:

As Ed told it to me, Reliable Water was just about to sell a major system to an Arab country when his well-pedigreed CEO somehow cheated him, and the deal fell through.

But what about the television station? How did Ed get involved with that? Apparently in 1969 Jerry Wiesner, then president of MIT, encouraged Ed to support a group of Black investors (led by a certain Bertram Lee) who were challenging the broadcasting license of Boston’s channel 7. Years went by, other suitors showed up, and litigation about the license went all the way to the Supreme Court (which described the previous licensee as having shown an “egregious lack of candor” with the FCC). For a while it seemed like channel 7 might just “go dark”. But in early January 1982 (just a couple of weeks before I first met him) Ed took over as president of New England Television Corporation (NETV)—and in May 1982 NETV took over channel 7, leaving Ed with a foot of acquisition documents in his home library, and a television channel to run:

There’d been hopes of injecting new ideas, and adding innovative educational and other content. But things didn’t go well and it wasn’t long before Ed stepped down from his role.

A major influence on Ed’s business activities came out of something that happened in his personal life. In 1977 Ed had been married for 20 years and had three almost-grown children. But then he met Joyce. On a flight back from the Caribbean he sat next to a certain Joyce Wheatley, who came from a prominent family in the British Virgin Islands and had just graduated with a BS in economics and finance from Bentley College (now Bentley University) in Waltham, MA. As both Ed and Joyce tell it, Ed immediately offered advice, such as that the best way to overcome a fear of flying was to learn to fly (which, much later, Joyce in fact did).

Joyce was starting work at a bank in Boston, but matters with Ed intervened, and in 1980 the two of them were married in the Virgin Islands, with Feynman serving as Ed’s best man (and at the last minute lending Ed a tie for the occasion). In 1981, Ed and Joyce had a son, whom they named Richard after Richard Feynman (though he now goes by “Rick”)—of whom Ed was very proud.

When Ed died, Joyce and he had been married for 43 years—and Joyce had been Ed’s key business partner all that time. They made many investments together. Sometimes it’d start with a friend or vendor. Sometimes Ed (or Joyce) would meet students or others—who’d be invited over to the house some evening, and leave with a check. Sometimes the investments would be fairly hands-off. Sometimes Ed would get deeply involved, even at times playing CEO (as he did with Three Rivers and NETV).

When the web started to take off, Ed and Joyce created a company called Capital Technologies which did angel investing—and ended up investing in many companies with names like Sourcecraft, SqueePlay, EchoMail, Individual Inc. and Radnet. And—like so many startups of this kind—most failed.

Ed also continued to have all sorts of ideas of his own, some of which turned into patents. And—like so much to do with Ed—they were eclectic. In 1995 (with a couple of other people) there was one based on using evanescent waves (essentially photon tunneling) to more accurately find the distance between the read/write head and the disk in a disk drive or CD-ROM drive. Then in 1999 there was the “Automatic Refueling Station”—using machine vision plus a car database to automate pumping gas into cars:

That was followed in 2003 by a patent about securely controlling telephone switching from web clients. In 2006, there was a patent application named simply “Contract System” about an “algorithmic contract system” in which the requirements of buyers and sellers of basically anything would be matched up in a kind of tiling-oriented geometrical way:

In 2011 there was “Traffic Negotiation System”, in which cars would have rather-airplane-like displays installed that would get them in effect to “drive in formation” to avoid traffic jams:

Ed’s last patent was filed in 2015, and was essentially for a scheme to cache large chunks of the web locally on a user’s computer—a kind of local CDN.

But all these patents represented only a small part of Ed’s “idea output”. And for example Ed told me many other tech ideas he had—a few of which I’ll mention later.

And Ed’s business activities weren’t limited to tech. He did his share of real-estate transactions too. And then there was his island. For years Joyce and Ed continued to operate Drake’s Anchorage, and tried to improve the infrastructure of the island—with Ed, as Joyce tells it, more often to be found helping to fix the generator on the island than partaking of its beaches.

Back in 1978 Ed had acquired a “neighbor” when Richard Branson bought Necker Island, which was a couple of miles further out towards the Atlantic than Moskito Island. Ed told me quite a few stories about Branson, and for years said that Branson wanted to buy his island. Ed hadn’t been interested in selling, but eventually agreed to give Branson right of first refusal. Then in 2007 a Czech (or was it a Russian?) showed up and offered to buy the island for cash “to be delivered in a suitcase”. It was all rather sketchy, but Ed and Joyce decided it was finally time to sell, and let Branson exercise his right of first refusal, and buy the island for about $10M.

Ed liked to buy things. Computers. Cars. Planes. Boats. Oh, and extra houses too (Vermont, Martha’s Vineyard, Portola Valley, …)—as well as his island. Ed would typically make decisions quickly. A house he drove by. New tech when it first came out. He was always proud of being an early adopter, and he’d often talk almost conspiratorially about the “secret” features he’d figured out in new tech he’d bought.

But I think Ed’s all-time favorite “toys” were planes—and over the course of his life he owned a long sequence of them. Ed was a serious (and, by all reports, exceptionally good) pilot—with an airline transport pilot license (plus seaplane and glider ratings). And I always suspected that his cut-and-dried approach to many things reflected his experience in making decisions as a pilot.

Ed at different times had a variety of kinds of planes, usually registered with the vanity tail number N1EF. There were twin-propeller planes. There were high-performance single-propeller planes. There was the seaplane that I’d “met” in the Caribbean. At one time there was a jet—and in typical fashion Ed got himself certified to fly the jet singlehandedly, without a copilot. Ed had all sorts of stories about flying. About running into Tom Watson (CEO of IBM), who was also a pilot. About getting a new type of plane where he thought he was getting #5 off the production line, but it was actually #1—and one day its engine basically melted down, but Ed was still able to land it.

Ed also had gliders, and competed in gliding competitions. Several times he told me a story—as a kind of allegory—about another pilot in a gliding competition. Gliders are usually transported with their wings removed, with the wings attached in order to fly. Apparently there was an extra locking pin used, which the other pilot decided to remove to save weight, because it didn’t seem necessary. But when the glider was flying in the competition its wings fell off. (The pilot had a parachute, but landed embarrassed.) The very pilot-oriented moral as far as Ed was concerned: just because you don’t understand why something is there, don’t assume it’s not necessary.

One of the topics about which Ed often told “you-can’t-make-this-stuff-up” stories was the Soviet Union. Ed’s friend John McCarthy had parents who were active communists, had learned Russian, and regularly took trips to the Soviet Union. And as Ed tells it McCarthy came to Ed one day and said (perhaps as a result of having gotten involved with a Russian woman) “I’m moving to the Soviet Union”, and talked about how he was planning to dramatically renounce his US citizenship. McCarthy began to make arrangements. Ed tried to talk him out of it. And then it was 1968 and the Soviets send their tanks into Czechoslovakia—and McCarthy is incensed, and according to Ed, sends a telegram to a very senior person in the Soviet Union saying “If you invade Czechoslovakia then I’m not coming”. Needless to say, the Soviets ignored him. Ed told me he’d said at the time: “If the Russians were really smart and really understood things, and they had to choose between John McCarthy and Czechoslovakia, they should have chosen John McCarthy.” (McCarthy would later “flip” and become a staunch conservative.)

Perhaps through McCarthy, Ed started visiting the Soviet Union. He didn’t like the tourist arrangements (required to be through the government’s Intourist organization)—and decided to try to do something about it, sending a survey to Americans who’d visited the Soviet Union:

A year later, Ed was back in the Soviet Union, attending a somewhat all-star conference (along with McCarthy) on AI—with a rather modern-sounding collection of topics:

Here’s a photograph of a bearded Ed in action there—with a very Soviet simultaneous translation booth behind him:

Ed used to tell a story about Soviet computers that probably came from that visit. The Soviet Union had made a copy of an IBM mainframe computer—labeling it as a “RYAD” computer. There was a big demo—and the computer didn’t work. The generals in charge asked “Well, did you copy everything?” As it turned out, there was active circuitry in the “IBM” logo—and that needed to be copied too. Or at least that’s what Ed told me.

But Ed’s most significant interaction with the Soviet Union came in the early 1980s. The US had in place its CoCom list that embargoed export of things like personal computers to the Soviet Union. Meanwhile, within the Soviet Union, photocopiers were strictly controlled—to prevent non-state-sanctioned flow of information. But as Ed tells it, he hatched a plan and sold it to the Reagan administration, telling them: “You’re on the wrong track. If we can get personal computers into the Soviet Union, it breaks their lock on the flow of information.” But the problem was he had to convince the Soviets they wanted personal computers.

In 1984 Ed was in Moscow—supposedly tagging along to a physics conference with an MIT physicist named Roman Jackiw. He “dropped in” at the Computation Center of the Academy of Sciences (which, secretly, was a supplier to the KGB of things like speech recognition tech). And there he was told to talk to a certain Evgeny Velikhov, a nuclear physicist who’d just been elected vice president of the Academy of Sciences. Velikhov arranged for Ed to give a talk at the Kremlin to pitch the importance of computers, which apparently he successfully did, after convincing the audience that his motivation was to make the world a safer place by balancing the technical capabilities of East and West.

And as if to back up this point, while he was in the Soviet Union, Ed wrote a 5-page piece from “A Concerned Citizen, Planet Earth” addressed “To whom it may concern” in Moscow and Washington—ending with the suggestion that its plan might be discussed at an upcoming meeting between Andrei Gromyko and Ronald Reagan at the UN:

The piece mentions another issue: the fate of prominent, but by then dissident, Soviet physicist Andrei Sakharov, who was in internal exile and reportedly on hunger strike. Ed hatched a kind of PCs-for-Sakharov plan in which the Soviets would get PCs if they freed Sakharov.

Meanwhile, in true arms-dealer-like fashion, he’d established Fredkin Enterprises, S.A. which planned to export PCs to the Soviet Union. He had his student Norm Margolus spend a summer analyzing the CoCom regulations to see what characteristics PCs needed to have to avoid embargo.

In the Reagan Presidential Library there’s now a fairly extensive file entitled “Fredkin Computer Exports to USSR”—which for example contains a memo reporting a call made on August 25, 1984, by then-vice-president George H. W. Bush to Sakharov’s stepdaughter, who was by that time living in Massachusetts (and, yes, Ed was described as a “PhD in computer science” with a “flourishing computer business”):

Soon the White House is communicating with the US embassy in Moscow to get a message to Ed:

And things are quickly starting to sound as if they were from a Cold War spy drama (there’s no evidence Ed was ever officially involved with the US intelligence services, though):

I don’t think Ed ever ended up talking to Sakharov, but on November 6, 1984, Fredkin Enterprises was sent a letter by Velikhov ordering 100 PCs for the Academy of Sciences, and saying they hoped to order 10,000 more. But the US was not as speedy, and in 1985 there was still back and forth about CoCom issues. Ed of course had a plan:

And indeed in the end Ed did succeed in shipping at least some computers to the Soviet Union, adding a hack to support Cyrillic characters. Ed often took his family with him to Moscow, and he told me that his son Rick created quite a stir when at age 6 he was seen there playing a game on a computer. Up to then, computers had always been viewed as expensive tools for adults. But after Rick’s example there were suddenly all sorts of academicians’ kids using computers.

(In the small world that it is, one person Ed got to know in the Academy of Sciences was a certain Arkady Borkovsky—who in 1989 would leave Russia to come work at our company, and who would later co-found Yandex.)

By the way, to fill in a little color of the time, I might relate a story of my own. In 1987 I went to a (rather Soviet) conference in Moscow on “Logic, Methodology and Philosophy of Science.” Like everyone, I was assigned a “guide”. Mine continually tried to pump me for information about the American computer industry. Eventually I just said: “So what do you actually want to know?” He said: “We’ve cloned the Intel 8086 microprocessor, and we want to know if it’s worth cloning the Motorola 68000. Motorola has put a layer of epoxy that makes it hard to reverse engineer.” He assumed that the epoxy was there at the request of the US government, to defeat Soviet efforts—and he didn’t believe me when I said I thought it was much more likely there to defeat Intel.

Ed told me another story about his interactions with Soviet computer efforts after Gorbachev came to power:

Before the days of integrated circuits the way IBM and Digital built computers was they put the whole computer together, and then it would sit for six weeks in “system integration” while they made the pieces work together and slowly got the bugs out.

The Russians built computers differently because that seemed logical to them. They’d send all the components down there and then some guy was supposed to plug them together, and they were supposed to work. But they didn’t. With these big computers, they never made any of them work.

The Academy of Sciences had one. And one time I went to see their big computer, so they unlock the doors to this dusty room where the computer is, where it’s not being used because it doesn’t work, and all this information is being kept secret, not from the United States, but from the leadership. When I discovered all this I documented it … and I wrote a 40-page document that explained it.

I was making trips with Rick often and Mike [his older son] very often. On one trip when I arrived, they tell me, “Oh, you have to come to this meeting.”

I don’t speak Russian. I never knew it. I’m seated at this meeting, and there’s a Russian friend of mine [head of the Soviet Space Research Institute] next to me. We’re just sitting there, and things are going on. I still don’t know what that meeting was, but I had this 40-page document. I gave it to my friend. He starts reading. He says, “Oh, this is so interesting.” It got to be about ten o’clock at night and they said, “Everyone come back in the morning. Nine o’clock.”

My friend said, “Can I borrow this [document]? I’ll bring it back in the morning.” I said, “Sure, go ahead.” He comes back next morning. He says to me, “I have good news, and I have bad news.” I said, “What’s the good news?” He says, “Your document has been translated into Russian.” I said, “You left here with a 40-page typewritten document. I don’t believe you.” He said, “Well, my institute recently took on the task of translating Scientific American into Russian.

“When I left here, I went to my institute, called in the translators, and they all came in. We divided the document up between them, and it’s all been translated into Russian.”

The document was the analysis of the RYAD situation with the recommendation that the only thing they could do was to cancel it all.

I said, “Okay, what’s the bad news?” He says, “The bad news is it’s classified secret.” When you made a copy or did something, you had to have a government person look at it. They classified it. I said to him, “You can’t classify my documents.” He said, “Of course not. We haven’t. It’s just the Russian one that’s secret.”

Then maybe a week later, he said, “Gorbachev’s read your document.” He canceled it. RYAD. Some people I know were looking to kill me.

In Moscow, there’s a building that’s so unusual. It’s on a highway leading into the city. It’s about five stories high. It’s about a kilometer long, okay? It’s a giant building. I was in it a few years ago, and it’s just a beehive of startups, almost all software startups. That was the RYAD Software Center, okay? 100,000 people got put out of work.

When I first met Ed in 1982, he was in principle a professor at MIT. But he was also CEOing a computer company (Three Rivers), and, though I didn’t know it at the time, had just become president of a television channel. Not to mention a host of other assorted business activities. MIT had a policy that professors could do other things “one day a week”. But Ed was doing other things a lot more than that. Ed used to say he was “tricked” out of his tenured professorship. Because in 1986 he was convinced that with all the other things he had going on, he should become an adjunct professor. But apparently he didn’t realize that tenure doesn’t apply to adjunct professors. And, as Ed told it, the people in the department considered him something of a kook, and without tenure forcing them to keep him, were keen to eject him.

Minsky’s neighbor in Brookline, MA, was a certain Larry Sulak—the very energetic chairman of the physics department at Boston University (and someone I have known since the 1970s). Ed knew Sulak and when Ed was ejected from MIT, Sulak seized the opportunity to bring Ed in as a physics professor at Boston University. Sulak asked me to write a letter about Ed (and, yes, particularly after the research for this piece, there are some things I would change today):

Subject: Re: Ed Fredkin

Date: Aug 24, 1988

From: Stephen Wolfram

To: Larry Sulak

Dear Larry:

In this century, people like Ed Fredkin have been very rare. Ed Fredkin is a gentleman scientist. He has made several fortunes in business, yet he chooses to spend much of his time thinking about science.

The main thing he thinks about is what ideas from computing can tell us about physics. This is an area that I believe has fundamental importance for physics. There are many issues about the behaviour of complex physical systems where the best hope for analysis and understanding comes from computational ideas. There are also many traditional problems in quantum physics and other fundamental areas that I suspect are most likely to be solved by thinking about things from a computational point of view.

Ed Fredkin has had some very good ideas about physics and its relation to computation. Probably the single most important was his independent discovery of the possibility of thermodynamically reversible computation. von Neumann got this wrong — by thinking about things from a computational point of view, Fredkin got it right.

Fredkin has been convinced for many years that cellular automata — basically computational models — could describe fundamental physical processes. As you know, I have worked on using cellular automata to model various specific physical processes. Fredkin is trying to do something grander — he wants to show that all of physics can be reproduced by a cellular automaton. If he is right the discovery would be one whose importance could be compared to the discovery of quantization.

Of course, what he is trying to show may not be true, but that is a risk that any new fundamental idea in physics faces.

Ed Fredkin’s style is not typical of scientists. He is more used to addressing boards of directors than lecture audiences. He learned the kind of physics that is in the Feynman lectures by spending time with Dick Feynman rather than reading his books. To some standard scientists, Fredkin at first seems like a nut. To be sure, some of his ideas are pretty nutty. But if you listen and think about it, there is much substance to what Fredkin has to say.

I gather that Fredkin has decided to spend some time around “ordinary physicists”, to try and work out how his ideas fit in with current physical thinking. I believe you are very lucky that Fredkin wants to do this in your department.

Best wishes,

Stephen

And so it was that Ed became a research professor of physics at Boston University (BU). At MIT he’d gotten a DARPA grant that supported Tom Toffoli and Ed’s only “physics PhD student” Norm Margolus in building ever-larger “cellular automaton machines”. And when Ed moved to BU, this effort moved with him, leaving in effect “no trace of Ed” at MIT.

When Ed arrived at BU he found he was assigned to an office with a certain Gerard ’t Hooft—who happens to be one of the more creative and productive theoretical physicists of the past half-century (and would win a Nobel Prize in 1999 for his efforts). Ed became friends with ’t Hooft, inviting him and his family to spend time on his island, and later on the boat that Ed bought in the south of France. Feynman died in 1988, and Ed would tell me that he thought he’d “traded” one great physicist for another. (Feynman had suggested Ed try Sidney Coleman, but Coleman wasn’t into it.)

Like Feynman, I think ’t Hooft felt a little uneasy with Ed’s statements about physics. But in 2016 ’t Hooft ended up publishing a book entitled *The Cellular Automaton Interpretation of Quantum Mechanics*. I thought it was a nice recognition of ’t Hooft’s friendship with Ed. But Ed told me in no uncertain terms that he thought ’t Hooft hadn’t given him the credit he was due—though in reality I don’t think what ’t Hooft did was much related to Ed’s actual work and ideas. (And, by the way, it’s not directly related to my efforts either, though conceivably looking at “generational states” in our Physics Project may give something at least somewhat analogous.)

In 1994 Ed’s direct affiliation with BU ended—though he remained on good terms with the department, and after I moved to the Boston area in 2002 I would often see him at an annual dinner the BU physics department put on for “Boston-area physics people”.

In 1998 Ed would summarize himself like this:

Ed Fredkin has worked with a number of companies in the computer field and has held academic positions at a number of universities. He is a computer programmer, a pilot, advisor to businesses and an amatuer [sic] physicist. His main interests concern digital computer like models of basic processes in physics.

For a while, Ed didn’t have a “university affiliation” (except, through Minsky, as a visitor at the MIT Media Lab), but in 2003—through his friend Raj Reddy—he became a professor (now of computer science) at Carnegie Mellon University, for a while spending time at their West Coast outpost, but mostly just making occasional trips in his plane to Pittsburgh.

For a few years after I first met Ed in 1982, I’d see him fairly regularly. In 1983 I invited him to the first “modern” conference on cellular automata, which I co-organized at Los Alamos. I visited his house in Brookline, MA, a few times. I saw him at the Aspen Center for Physics, and at other places around the world. He was always fun and lively—and told great stories about all sorts of things. He gave the impression that he was mostly spending his time doing big things in business, and that science was an avocation for him. Sometimes he would talk about cellular automata—though I now realize that what he said was either very general and philosophical (leaving me to interpret things in my own way), or very specific to particular rules he’d engineered.

It was always a bit uncomfortable when it came to physics. Because the things Ed was saying always seemed to me pretty naive. Quite often I would challenge them—and frustratedly tell Ed that he should learn twentieth-century physics. But Ed would glide over it—and be off telling some other (engaging) story, or some such.

In 1986 I co-organized (with Tom Toffoli and Charles Bennett) a conference called *Cellular Automata ’86*—at MIT. Ed didn’t come—and I think I had the impression that he’d rather lost interest in cellular automata by that time. I myself went off to start my Center for Complex Systems Research, and then to found Wolfram Research and start the development of Mathematica. Mathematica was released on June 23, 1988—and our records (yes, we’ve kept them!) show that Ed registered his first copy on December 14, 1988. In March 1991 I did a lecture tour about Mathematica 2.0, and saw Ed one last time before diving into work on my book *A New Kind of Science*—which led me for more than a decade to become an almost complete scientific hermit.

I saw Ed (now 62 years old) when I briefly “came up for air” in connection with the release of Mathematica 3.0 in 1996, and we continued occasionally to exchange pleasant emails:

Date: Sun, 29 Jun 1997 15:49:41 -0400

From: Ed Fredkin

To: Stephen Wolfram

…

[Reporting the birth of my second child]

…

For many children its worst when they are teenagers. Some glide through that period of life without hassle. Rick is doing great (at 15) despite his unorthodox education. He relishes calling his parents dopes, but aside from arguments about subjects like how late he should be able to hang out with his buddies, its clear that he doesn’t think we’re dopes.

…

I promise to read your book as soon as I get it!

…

Its nice to hear from you. News here is that I am no longer needed at Radnet as they now have a great CEO. I got a new airplane in December. It’s called a Cessna CitationJet. It can carry 7 people at about 440 mph. So far its been a lot of fun. We’ll have to think of an excuse to go for a ride. We are planning to spend some time at Drake’s Anchorage in July. Its great for kids so if that interests you, let me or Joyce know.

I have taken as a challenge to architect a computer (that weighs a few kilos) that assumes another 100 years of Moore’s Law (10^15 in cost performance). There are a lot of unsuspected problems lurking in the details, but everyone of them seems to have easy solutions. I have given a number of talks (IBM Almaden and Watson labs, Intel, NYU, etc.). Interest in reversible computing has picked up since heat dissipation has gotten to be a really hot topic (no pun intended). The next high end Alpha may dissipate as much as 150 watts. Think of a light bulb!

I use Mathematica for something almost every week… keep it up!

Best regards,

Ed

Although I didn’t see Ed myself for quite a few years, Ed would always write to ask for betas of new versions of Mathematica, and he would sometimes chat with staff from my company at trade shows. I thought it a bit odd in 1999 when I heard that in such an encounter he said that he was the one who had “introduced me to cellular automata”. And, moreover, that he, Feynman and Murray [Gell-Mann] were the people who’d suggested I write SMP—which was particularly bizarre since, among other things, I hadn’t met (or even heard of) Ed until about 3 years later.

Then, out of the blue on September 13, 2000, Ed calls my assistant, and follows up with an email:

Subject: Invitation

Date: Wed, 13 Sep 2000 23:53:09 -0400

From: Ed Fredkin

To: Stephen Wolfram

Hi,

The primary reason I’m contacting you has to do with a program I’m organizing at Carnegie Mellon (CMU). I wrote a proposal to the NSF, called “The Digital Perspective” and got funded. The idea is to invite a number (8 to 10) of guests to come to CMU for a few days, to meet with students and to give a Distinguished Lecture. The NSF would also like to arrange for the guests to come to Washington D.C. and give the same lecture there.

By “Digital Perspective” I mean looking at aspects of the world as Digital Processes. As you know, I am most interested in looking at physics this way. I have just started getting commitments from potential participants. Gerard ‘t Hooft has agreed to come and a number of other good physicists are thinking about it.

…

Please consider this to be a formal invitation. Of course, CMU will pay expenses and an honorarium. If the timing works out, it can probably be arranged for many of the students to have read your book before you come. You might get some good feedback from bright students who have also gained familiarity with the thoughts of others who are thinking about the “Digital Perspective”. The seminar will run throughout the 2000-2001 academic year. If you can make it to CMU, I expect that it will be fun and interesting; both for you, for me and for many others.

…

I responded:

Subject: RE: Invitation

Date: Thu, 14 Sep 2000 06:49:19 -0500

From: Stephen Wolfram

To: Ed Fredkin

Thanks for the invitation, etc.

It sounds like a thing I’d like to do, but I can only consider *anything* after my book is finished.

…

If my book is done in time for your program, then, yes, I’d like to participate (though of course I’d want more details about the actual plans etc. etc.). But if the book isn’t done, then sadly I just can’t. If the cutoff time is June 2001, I am not extremely hopeful that the book will be done … but if it’s fall 2001 the probabilities go up substantially (though, sadly, they are still not 100%).

…

And what are you up to these days? Business? Science? Other?

On another topic:

In my book, I’m trying very hard to write accurate history notes about the things I discuss. And for the notes on the history of cellular automata I’ve been meaning for ages to ask you some questions…

I’m not sure this is a complete list, but here are a few I’ve been curious about for a long time that I’d really like to know the answers to…

I know that history is hard … even if it’s about oneself. I consider that I have a good memory, but it’s often hard for me to keep straight what happened when, and why, etc. But anything you can tell me about these questions … or about other aspects of CA history … I’d be very grateful for.

1. As far as you know, did you invent the 2D XOR CA rule? (I’m assuming the answer is “yes”…)

2. In what year did you first simulate this CA? On what computer? Where?

3. What other CA rules did you study at that time?

4. Do you still have any material from the simulations you did (printouts, tapes, programs, etc.)?

5. When you learn about the “munching squares” display hack? How did it relate to your work on the XOR CA?

6. What did you know about the work done by Unger etc. on cellular image processors? How did this relate to your work?

7. What did you know about von Neumann’s work on cellular automata? How did it relate to your work?

8. What did you know about Ulam and others’ work at Los Alamos on simulating cellular automata? How did it relate to your work?

9. Were you aware of work on cryptographic applications of CA-like systems?

…

Ed responded:

Subject: RE: Invitation

Date: Fri, 15 Sep 2000 01:25:23 -0400

From: Ed Fredkin

To: Stephen Wolfram

Hi,

Here are some answers and some free association type ramblings.

…

> And what are you up to these days? Business? Science? Other?

I’m winding down on business (I’m into one last e-business project) and like you, working on a book. My guess is that mine is nowhere as ambitious as yours… It’s just to document my ideas about Digital Mechanics (Physics). In any case, these ideas have made more progress in the last 2 years than in the previous 40.

I bought a sailboat which is moored in Antibes, France. I spent most of the summer there and got more science done than in the prior several years. It’s absolutely the perfect place and circumstance for me to work on my stuff. Gerard ‘t Hooft (plus wife and daughter) came down and joined us for a while. You know (I hope) about his interest in CA’s? I’m going back there for a few weeks on Tuesday.

Here’s a formal proof that you can, at any time, escape all your normal responsibilities and concentrate exclusively on one really important thing (hint, hint). The proof is that, at any time, YOU CAN DIE. I don’t mean to be morbid, but sometimes it makes good sense to consider that proof and temporarily abandon all but some very important task (or some very exciting or fun thing).

Ed continued with a long response to my “history questionnaire”:

> 1. As far as you know, did you invent the 2D XOR CA rule? (I’m assuming
> the answer is “yes”…)

Yes, as far as I know I did invent it. Here is what I did. I decided to look for the simplest possible rule that met certain criteria. I wanted spatial symmetry and a symmetric rule vis-à-vis the states of the cells. The thought was to find something so simple that its behavior could be understood while not so simple as to be totally dull. The first such rule I tried was the XOR rule. I programmed it first on the PDP-1 (1961, at BBN and III) where I could see it on the display, and later I wrote a program for CTSS using a model 33 teletype as a terminal. My motivation was then, as it is now, to be able to capture more and more properties of physics within a Digital model. I found an easy proof as to why patterns reappeared in any number of dimensions. I also found, at the beginning, a formula for the number of ones as a function of time from a single one as the initial state. My recollection was that it was something like 2D 2^b(t) where D is the number of dimensions, t is the time step, and b(t) is the number of bits that are one in the binary representation of t (the tally function). After I showed all this to Seymour Papert, he generalized the proof re self replication from XOR (sum mod 2) to sum mod any prime. (Some time around 1967)
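
Ed’s claims about the XOR rule (patterns reappearing after 2^k steps, and a count of ones governed by his “tally function”) are easy to check in a few lines today. Here is a minimal modern sketch, emphatically not Ed’s original PDP-1 or CTSS code; the specific convention used (each cell becomes the XOR, i.e. sum mod 2, of its four von Neumann neighbors) is an assumption:

```python
# Sketch of the 2D XOR cellular automaton: a cell's new value is the
# parity (sum mod 2) of its four von Neumann neighbors. Because the rule
# is linear over GF(2), after 2^k steps any pattern reappears as four
# displaced copies; from a single seed, the number of ones at step t is
# 4^b(t), where b(t) counts the 1 bits in the binary form of t.
from collections import Counter

def step(cells):
    """One update: keep exactly the sites with an odd number of live neighbors."""
    counts = Counter()
    for (x, y) in cells:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            counts[(x + dx, y + dy)] += 1
    return {c for c, n in counts.items() if n % 2 == 1}

cells = {(0, 0)}                  # single seed cell
for t in range(1, 9):
    cells = step(cells)
    b = bin(t).count("1")         # the "tally function" b(t)
    assert len(cells) == 4 ** b   # count of ones matches the formula

# After 8 = 2^3 steps: four copies of the seed, displaced by 8 each way.
assert cells == {(8, 0), (-8, 0), (0, 8), (0, -8)}
```

The replication property falls out of the algebra: over GF(2), raising the neighbor-sum operator to a power of two just doubles all the displacements (the Frobenius map), which is why single-seed patterns recur as shifted copies.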

> 2. In what year did you first simulate this CA? On what computer?
> Where?

See above.

> 3. What other CA rules did you study at that time?

I found a simple proof that a von Neumann neighborhood CA could exactly emulate any other (such as the 3×3 neighborhood) and used this as a reason to look at nothing else. I explored so many different rules that I probably would have found the game of Life had I not put blinders on. After I came to MIT (1968), I had 2 things in mind, to find a really simple Universal CA (I call them UCA’s) and to find Reversible, Universal CA’s (RUCA’s).

As you may know, the search for UCA’s went slowly until I had the idea to abandon the Turing Machine model and look at modeling digital logic and wires. Within 15 minutes after this idea occurred to me, I had a 4 state UCA on my blackboard. At that time the best known was in Codd’s thesis; an 8 state UCA. I showed this to a student of mine, Roger Banks, who had been struggling for a few years trying to complete an AI PhD thesis. The next morning both he and I showed up with 3 state UCA’s. He switched his PhD topic and found a 2 state, von Neumann neighborhood UCA, a thing that Codd purported to have proved impossible.

While at BBN, after seeing all my 2-D CA’s expanding with simple kaleidoscope like symmetries, (like the diamond shapes in the XOR rule), Marvin Minsky challenged me to find a rule (any rule) that showed spherical propagation. I took the challenge and shortly came up with such a rule.

With respect to reversibility, the first satisfactory RUCA was done by Norman Margolus. I shortly thereafter found a simple RUCA that didn’t need the use of the Margolus Neighborhood trick.

> 4. Do you still have any material from the simulations you did
> (printouts, tapes, programs, etc.)?

Yes, Probably, quite a bit

> 5. When you learn about the “munching squares” display hack? How did it
> relate to your work on the XOR CA?

I don’t recall it having any effect. It’s very unlikely that I knew of it prior to the XOR CA.

> 6. What did you know about the work done by Unger etc. on cellular image
> processors? How did this relate to your work?

I knew of it second hand, but I don’t think it had any effect. Do you know about Farley and Clark (Wes Clark) and their publication while at MIT’s Lincoln Labs in the late 50’s?

> 7. What did you know about von Neumann’s work on cellular automata? How
> did it relate to your work?

At the time I did the XOR work I had not read anything about the von Neumann CA, but I was told about it and I understood the concept very well. Many years later I read something (by Burkes, I think). I remember knowing that it was a 29 state system and that it knew left from right in order to extend and turn its construction arm.

> 8. What did you know about Ulam and others’ work at Los Alamos on
> simulating cellular automata? How did it relate to your work?

All I knew about Ulam and CA is that, like the Hydrogen Bomb, he had key ideas but probably didn’t get as much credit as he deserved. All my knowledge re Ulam was anecdotal. As to what he did vs. what von Neumann did I didn’t really know anything.

I didn’t know anything about anyone else actually simulating CA’s however I’m pretty sure I assumed that others must have done so. It was so easy and so obvious. While the use of a computer with a display (such as the Lincoln Lab TX-0 and TX-2, the Digital PDP-1 and the IBM 709 and 7090 all had or could have CRT displays, it was easy enough to display simple CA’s with a printer, even a 10 CPS teletype.

> 9. Were you aware of work on cryptographic applications of CA-like
> systems?

I thought I invented that idea! As soon as I found ways to make RUCA’s it occurred to me that they could be used for cryptography. As an aside, when Witt Diffey [Whit Diffie] came up with the idea of public key cryptography, which needed a trapdoor function, I thought of using the product of 2 large primes. I had just written the first program, in LISP, to implement Michael Rabin’s first version of a probabilistic prime test. As soon as I implemented it I started a search at 10^100 and discovered that 10^100 +35,737 and 10^100 +35,739 were prime. A week later I met Rich Schroeppel in LA (he was working for my company, III) and knowing a larger prime pair than anyone else on Earth I told Rich and he was blown away. He was seated at a PDP-10 terminal and all he said was an emphatic “Really!” He then went type, type, type for a few seconds and turned around and said “You’re right!” which blew me away! I asked what he did and he said (while knowing nothing of Rabin’s method) “all I did was look at 3^(n-1) mod n, you know, Fermat’s little theorem, it usually gives 1 for primes.”

I’m rambling, probably about stuff of no interest to you. Anyway, I stopped Ron Rivest in the hallway at Tech Square and asked if he had heard of Diffey’s [Diffie’s] stuff. I don’t remember exactly what he said but I know that when I told him that Rabin’s new method to find large primes meant that the product of 2 primes was a good trapdoor function he was surprised and thought it was a good idea! I never thought any more about it and hadn’t come up with the idea of using the phi function… Years later, long after RSA was a big thing Ron reminded me of the event… Don Knuth told me that he also thought of using the product of 2 primes before RSA, but he couldn’t have known about Rabin’s method when I did (as Rabin told it to me right when he thought it up!)

By the way, I have an interesting algorithm for factoring smaller numbers, such as can be done in less than an hour with Mathematica (normal FactorInteger or ECM). I’ve written a few terribly unoptimized Mathematica functions that implement the method. For what its good for, my Mathematica functions (not compiled or anything) make Mathematica factor in a lot less real time than Mathematica does with FactorInteger or ECM.

The big news re me and my work is what’s happening right now. Whatever one thinks about my stuff (Digital Mechanics), it’s vastly improved. However it’s still very far from a complete theory. Of course, Digital Mechanics is about CA’s.

If you have any interest in reversibility, I’ve done lots in that area, ranging from RUCAs, conservative logic, and my transforms. The transforms are general methods of converting algorithms that calculate the approximate time evolution of a system (approximate because of round off, truncation and the finite delta t) which is approximately reversible (by changing delta t to minus delta t) into an equivalent algorithm that calculates approximately the same thing going forwards, but which is exactly reversible (being calculated on a computer with round off and truncation error). I also have a lot of methods for making RUCA’s with particular properties.

You’ve criticized me in the past for not publishing stuff, but I’m so ambitious as to what I’m trying to do that I haven’t had the motivation to publish all the little things I’ve uncovered along the way.

I’m sure I discovered more and better ways to make all kinds of RUCAs before anyone else with the exception of the rule found by my student, Margolus.

Finally, one last anecdote. You and I were at some meeting long ago (maybe Santa Fe?) and you brought along an early Sun to demonstrate your collection of different kinds of 1-D CA’s. After your talk, I asked you why none of the CA’s you showed were reversible. Your response was “Because all reversible CA’s are trivial.” That really was a very common belief, coincident with most people’s intuition. On the spot, I made up a rule, using your convention for specifying it, of an “interesting” reversible CA. You typed it in and ran it. Being surprised is one of the best kinds of experiences we ever have.

As Emerson once quipped, “My apologies for such a long email, I didn’t have the time to write you a short one.”

I’m having fun; it’s a good thing to do!

Best regards

Ed F

PS If you have any interest in having parts of your book read so that you can get comments prior to publication, I have an idea that might be useful.

A little later he added:

Subject: error

Date: Fri, 15 Sep 2000 10:12:15 -0400

From: Ed Fredkin

To: Stephen Wolfram

Hi,

Looking at my long email I noticed a boo boo.

Where I wrote, quoting Schroeppel talking about 3^(n-1) mod n, “…it usually give a 1 for primes…” very true but a bit of an understatement. Of course, it ALWAYS gives a 1 for primes! What Schroeppel said was that it usually doesn’t give a 1 for non-primes. It’s incorrect for 91 and 121 and lots of other small numbers, but seems to work better for large numbers… but then you probably know much more about such things than I do. Also looking at your questions, I had the feeling that some might have been prompted by my circa 1990 Digital Mechanics paper. If so, I guess I repeated stuff already in the paper and I apologise.

Regards,

Ed F
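
Ed’s correction is easy to verify today. Here is a minimal sketch of the base-3 Fermat check that Schroeppel described (not Rabin’s stronger probabilistic test, which is what Ed had actually implemented):

```python
# Base-3 Fermat check: by Fermat's little theorem, 3^(n-1) mod n is 1 for
# every prime n not dividing 3. The converse fails: some composites, like
# the 91 and 121 Ed mentions, also pass (base-3 Fermat pseudoprimes).
def fermat_base3(n):
    """True if n passes the base-3 Fermat test (necessary, not sufficient)."""
    return pow(3, n - 1, n) == 1

# Every odd prime other than 3 passes...
assert all(fermat_base3(p) for p in [5, 7, 11, 13, 97, 101])
# ...but so do the composites from Ed's correction:
assert fermat_base3(91) and fermat_base3(121)
# Most composites, however, are caught:
assert not fermat_base3(15) and not fermat_base3(49)
```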

I responded, asking for various pieces of clarification (and now that I’m writing this piece I would have asked even more, because I now realize that some key parts of what Ed said don’t add up):

Subject: Re: your mail

Date: Wed, 20 Sep 2000 21:04:54 -0500

From: Stephen Wolfram

To: Ed Fredkin

…

>> 1. As far as you know, did you invent the 2D XOR CA rule?

> Yes, as far as I know I did invent it. Here is what I did. ….

Very interesting.

1a. Did you ever look at 1D CAs? If not, why not?

1b. Did you think about analogies between XOR rules and linear feedback shift registers?

1c. Did you think about analogies between XOR rules and Pascal’s triangle?

By the way, the result about the number of binomial coefficients mod a prime has been independently discovered a remarkable number of times (including by me). The earliest references I know are Edouard Lucas (1877) and James Glaisher (1899).

….

>> 3. What other CA rules did you study at that time?

> … I explored so many different rules that I probably
> would have found the game of Life had I not put blinders on.

By the way, I happened to have a long phone conversation recently with John Conway about the history of the Game of Life. I still haven’t quite got to the bottom of exactly what Conway was doing and why (I think he wants some of the history lost, which is a pity, because it is interesting and reflects much better on him than he seems to believe…) But what is clear is that Conway (and his various helpers) had much more serious motivations from recursive function theory etc. than is ever usually mentioned. It was just not a “find an amusing game” etc. piece of work.

> Marvin Minsky challenged me to find a rule (any rule) that showed spherical
> propagation. I took the challenge and shortly came up with such a rule.

I don’t believe I’ve ever seen your rule of this kind. I showed such a rule to Marvin in 1984 and he said “that’s very interesting; we were looking for these but hadn’t found any”. So I’m confused about this….

…

>> 6. What did you know about the work done by Unger etc. on cellular image
>> processors? How did this relate to your work?

> I knew of it second hand, but I don’t think it had any effect.

Wasn’t BBN quite involved with cellular image processing? And I believe you worked on aerial photography analysis. Did you use cellular automata for image processing?

…

>> 9. Were you aware of work on cryptographic applications of CA-like
>> systems?

> I thought I invented that idea!

There was a lot of work done on 1D CAs by some distinguished mathematicians consulting for the NSA in the late 1950s. I think much of it is still classified. But over the years I’ve talked to many of the people involved (Gustav Hedlund, Andrew Gleason, John Milnor, some NSA folk, etc. etc.), and read their unclassified papers. They figured out some interesting stuff. They thought of it as related to nonlinear feedback shift registers.

> As soon as I found ways to make RUCA’s it
> occurred to me that they could be used for cryptography.

How?

There’s a 1D CA (rule 30) that I studied in 1984 that has been extensively used as a randomness generator (e.g. Random[Integer] in Mathematica uses it), and that has been used a bit as a cryptosystem. I tried to make a good public key system out of CAs in the mid-1980s (mostly in collaboration with John Milnor), but did not come up with anything satisfactory. …

…

> I also have a lot of methods for making RUCA’s with particular properties.

I am definitely somewhat interested in these things. They don’t happen to be central to my grand scheme. But they are obviously worthwhile … AND WORTH (you) WRITING DOWN!!

> I’m sure I discovered more and better ways to make all kinds of RUCAs before
> anyone else with the exception of the rule found by my student, Margolus.

Interesting. You probably know that the general problem of telling whether an arbitrary 2D CA is reversible is undecidable (the question can be mapped to the tiling problem).

So I’m taking it that you have some good methods for generating 2D reversible CAs. That’s obviously interesting.

> Finally, one last anecdote. … I asked you why none of
> the CA’s you showed were reversible. Your response was “Because all
> reversible CA’s are trivial.” …

This anecdote can’t be quite right. I have known since 1982 that there are nontrivial things that can happen in CAs that are made reversible by your mod 2 trick. What is true (and may have been what I was saying) is that none of the 2-color nearest neighbor CAs that are reversible are non-trivial. With more colors or more neighbors, that changes. I’m guessing that what you showed me was a 4-color nearest neighbor CA that is reversible … and that is of course quite easy to get by recoding a 2-color one that has your mod 2 trick.

By the way, I heard third hand a while back that you had “introduced me to CAs”. For what it’s worth, that isn’t correct. My first “CA experience” was actually in 1973 (when I was 13) when I tried to program molecular dynamics on a very small computer, and ended up with something equivalent to the square CA fluid model. My next CA experience was in summer 1981. I was trying to make models of “self organizing” systems (now I hate that term), particularly self-gravitating gases. I ended up simplifying the models until I got 1D CAs. That fall I spent a month at the Institute for Advanced Study, and spent a lot of time studying von Neumann’s work, etc., and analysing all sorts of features of 1D CAs. I came for a day to give a talk at MIT, and was having dinner with some LCS people (Rich Zippel was one of them), and they told me about your work. Later that fall I talked with Feynman a certain amount about what I was doing with CAs, and he again mentioned you. (I think he had been to your Physics of Computation meeting, which was perhaps in June 1981, but I didn’t discuss the CA aspects of the meeting with him.) Then in [January 1982] I came to the meeting you had on your island, and Tom Toffoli showed me his 2D CA machine (at the time he gave me the impression of 95% hackery, 5% science), and you showed me the 2D XOR CA on a PERQ computer.
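
As an aside for readers: the rule 30 setup mentioned in the email above is simple enough to sketch in a few lines. This is purely an illustration of the center-column idea (a cell updates as left XOR (center OR right), and the middle column of the evolution is read off as bits), not how Mathematica’s Random[Integer] was actually implemented:

```python
# Rule 30, evolved from a single black cell; the center column of the
# resulting pattern is the sequence that was used as pseudorandom bits.
def rule30_center_bits(steps):
    """Evolve rule 30 from a single 1 and collect the center-column bits."""
    width = 2 * steps + 1          # wide enough that wraparound never
    row = [0] * width              # reaches the center within `steps`
    row[steps] = 1                 # single black cell in the middle
    bits = []
    for _ in range(steps):
        bits.append(row[steps])
        # rule 30: new cell = left XOR (center OR right), cyclic boundary
        row = [row[i - 1] ^ (row[i] | row[(i + 1) % width])
               for i in range(width)]
    return bits

print(rule30_center_bits(16))      # first 16 center-column bits
```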

Ed didn’t respond to this, but three days later we talked on the phone. I sent some (unvarnished) notes from the call to a research assistant of mine:

Subject: Fredkin conversation

Date: Sat, 23 Sep 2000 03:05:43 -0500

From: Stephen Wolfram

I had a long conversation with Ed this evening. About his work in science, my work in science, etc. A few things mentioned:

– He feels bitter that his paper on reversible logic, coauthored with Tom Toffoli, was actually all his (Ed’s) work

– He is pleased that I will discuss history even when people haven’t published things (of course he has published little)

– He says he has written about 150 pages about his views of physics; he is planning to prepare something, perhaps for publication, in about a year

– He says he missed not being able to bounce ideas off Dick Feynman … even though Feynman often ended up screaming at him (Ed) about how dumb his ideas were

– He said that his main problem was that he has been trying to get people to steal his ideas for years, but nobody was interested

– He said that now “for some reason” he is becoming more concerned about matters of credit

– He is a serious fan of Mathematica, the Mathematica Book, etc.

– He made an effort again (he’s been trying for 20 years) to get me to coauthor a paper with him. He recognizes that he can’t write a credible scientific paper, but he’s “sure he has some ideas I haven’t thought of”. I told him that unfortunately I haven’t written a paper for 15 years.

– I told him that particularly when I’m in the Boston area, I’ll look forward to chatting with him about physics etc.

– He said he’s tried to interact some with Gerhardt ‘t Hooft, but that ‘t Hooft keeps on rushing off in traditional physics directions that Ed (and I, by the way) think are stupid

– He wanted to know if I really believed that all of physics etc. was ultimately discrete; he expressed the opinion that he and I may be the only people in the world who actually believe that right now

– He told a bizarre story about how Don Knuth gave a talk at MIT recently on computers and religion, and how 1/4 of it was stuff that Don had heard about from Ed. Apparently Guy Steele asked a question about how Don’s stuff related to Ed’s, and Don said something meaningless.

I talked to him a little more about the CA history stuff. He mentioned that around 1961 a certain Henry Stommel (sp?) told him that CA-like models had been used in studying sand dunes in the 1930s. I have a feeling this may be another cat gut search, but perhaps we can follow up. (You could email Ed at the appropriate time.)

I asked Ed if he had ever looked at cryptography (as in NSA style stuff) with CAs. He said no. But that in the late 1960’s he had had a student who had studied ways to make counters out of JK flip flops … and that that person’s work had made something that Ed thought could be used for cryptography. This was followed up by a certain Vera Pless subsequently.

I didn’t hear anything more from Ed for a while, though a public records search indicates that, yes, he had successfully “worked the system” to get $100k from the NSF for “The Digital Perspective Project”. And on May 1, 2001, I received a rather formal email from Ed (for some reason Americans born before about 1955 seem to reflexively call me “Steve”):

Subject: Workshop on the Digital Perspective 24-26 July, Washington DC

Date: Tue, 1 May 2001 21:20:45 -0400

From: Ed Fredkin

To: Steve Wolfram

We are sending this email to invite you to an NSF-sponsored workshop on the Digital Perspective in Physics planned for July 24th through the 26th, Tuesday, Wednesday and Thursday. It will be held in the NSF building, Arlington Virginia. Gerard ‘t Hooft has already agreed to present a paper and we hope that you will also be willing to contribute. We intend to combine the papers presented at the workshop into a monograph that will be published later this year. Two earlier workshops on related subjects were held at Moskito Island and this was a central theme at a meeting held at MIT’s Endicott house in 1982. Participants at previous meetings included Charles Bennett, Richard Feynman, Ed Fredkin, Leo Kadanoff, Rolf Landauer, Norman Margolus, Tomasso Toffoli, John Wheeler, Ken Wilson, Stephen Wolfram and others.

…

When I didn’t immediately respond, Ed called my assistant, saying that he was “calling regarding a meeting he spoke with [me] about on the phone”. I responded by email later the same day:

Subject: I gather you called…

Date: Tue, 15 May 2001 15:18:57 -0500

From: Stephen Wolfram

To: Ed Fredkin

Sorry for not getting back to you sooner….

I myself am right now trying to work at absolutely full capacity to finish my book/project. I haven’t done any travelling at all for a long time, and won’t until my book is done.

And I also don’t yet have anything public to say about my work on physics. Hopefully by the end of the year my book will be done and I will have quite a bit to say.

However, it occurs to me that one or two of my assistants might be very good people to come to your workshop.

Who all is coming?

One person you should definitely invite is someone who has been an assistant of mine, and now works part time for me, and part time on his own projects. His name is David Hillman, and he’s been interested in discrete models of spacetime for a long time. (He got his PhD working on some kind of generalization of cellular automata intended as a spacetime model.)

I have two physics assistants, and one math one, who might be relevant for your workshop.

Just let me know in more detail who might be coming, and I’ll try to figure out the correct person/people to suggest.

Of course I’d love to come myself if I were a free man. But not until the book is done.

In haste,

— Stephen

Ed responded pleasantly enough:

Subject: RE: I gather you called…

Date: Tue, 15 May 2001 17:08:39 -0400

From: Ed Fredkin

To: Stephen Wolfram

Hi,

Sorry you can’t make it.

About half of those coming are veterans of some Moskito Island workshop. Newcomers include Gerry Sussman, Tom Knight, Gerard ‘t Hooft, John Negele, John Conway, Raj Reddy, Jack Wisdom, Seth Lloyd, David di Vincenzo, plus a number of students, etc. A couple of those mentioned are still struggling with scheduling issues.

But, in any case, I would be pleased to have David Hillman come to the workshop. Send me his email address and I will send him an invitation.

Best regards and good luck on the book!

Ed

I responded and suggested an additional person from our team for his workshop. Nearly a month passed with no word from Ed, so I pinged him asking what was going on. No response. It was a very busy time for me, and this wasn’t something I wanted to be chasing (I saw myself as doing Ed a favor by suggesting sending people to his workshop) … so I sent a slightly exasperated email:

Subject: your conference, again

Date: Fri, 15 Jun 2001 06:08:39 -0500

From: Stephen Wolfram

To: Ed Fredkin

Look … I’m now in a bit of an embarrassing situation: following your initial response, I told David Hillman and David Reiss about your conference … assuming you’d want to invite them … and they both became quite interested in it. But they never heard from anyone about it. So now of course they’re wondering what’s going on. And so am I. What should I tell them? I’m now embarrassed about having suggested this…

This seems peculiarly un-you-like. I was thinking you must have been away or something. But isn’t the conference coming up very soon?

I hope everything’s OK…

Still no response from Ed. A week later I called him, and we talked for two hours. It wasn’t clear why he hadn’t already reached out to the people I’d suggested, but he quickly said he would. And then Ed launched into telling me about the “astounding” cellular automaton models he said he’d just created that “had charge, energy, momentum, angular momentum, etc.”. He talked about things like the idea of what he called an “infoton” that would be an “information particle” that would “make Feynman diagrams reversible”. I explained why that didn’t make any real sense given how Feynman diagrams actually work. It was the same kind of conversation I had many times with Ed. I kept trying to explain what was known in physics, and he kept on coming back with things that, yes, I think I understood, but that seemed close to typical crackpot fare to me. But Ed seemed convinced he had discovered something great (though exactly what I couldn’t divine). And eventually—having obviously not convinced me of what he was doing “on its merits”—he just came out and said “It must be related to stuff you’re doing, one way or another”.

I explained that I really didn’t think that was very likely, not least because I emphatically wasn’t trying to use cellular automata as models of fundamental physics. And with that, Ed launched into a long speech about giving credit, particularly to him. I explained that I was trying hard to write correct history, and reiterated some of the questions I’d asked him before. He didn’t really tell me more, but instead regaled me with stories (that I’d mostly heard many times before from him) about how he’d been the first to figure out this and that—apparently oblivious to the historical research I tried to tell him about. But eventually we both had to go—and the conversation ended pleasantly enough, with him confirming the email addresses for the two people for his workshop.

As the workshop approached, the people from my team had made arrangements to go to Washington, DC—but still didn’t know where exactly the workshop was. With days to go, one of them simply called Ed to ask. But Ed told them that actually they couldn’t come, because “Raj Reddy says there is no room for you”. Really? No extra chair to be found? Ed was the organizer, wasn’t he? Why was he laying this on someone else? It seemed to me that Ed was playing some kind of game. But at that moment I was too busy trying to finish my book to think about it. (Now that I’m writing this piece, however, I realize that Ed was perhaps following an “algorithm” he’d established years earlier when he was proud to have organized a meeting to push forward his ideas about timesharing—by inviting just people who supported his ideas, and not inviting ones who didn’t. I don’t know if the meeting actually happened, or what went on there. I don’t think the writeup promised in the invitation and in the NSF contract ever materialized.)

In January 2002 *A New Kind of Science* was off to the printer, and review copies were starting to be sent out. In late March a seasoned journalist named Steven Levy (who had written about my work on cellular automata in the mid-1980s) was talking to someone from my team and reported that Ed had told him that “Minsky had told [Ed] to publish his stuff on the web to stake out priority” before my book came out. (And it’s a pity Ed didn’t do that, because it might have made it clear to him and everyone else how different what he was saying was from what I was saying.) But in any case Levy said that Ed seemed to be saying the same things as he’d said 15 years ago—and Levy knew that, regardless of anything else, I’d done incredibly much more since then.

After his conversation with Levy, Ed sent me mail:

Subject: The Book

Date: Fri, 22 Mar 2002 16:38:29 -0800 (PST)

From: Ed Fredkin

To: Stephen Wolfram

Congratulations on finishing!!!

I ordered the book from someplace, so long ago I can’t

remember from who. I’m wondering if, when its

possible, I could get a copy in advance of whenever my

ordered copy is going to appear. I just don’t want to

be the last on the block to see it. Of course I’d be

happy to pay if you can tell me how to do it.

Thanks,

Ed Fredkin

The book was going to be published on May 14; on May 4 I signed a copy for Ed:

The book mentioned Ed a total of 7 times. (The person with the most mentions overall was Alan Turing, at 19; Minsky had 13; Feynman 10.)

Ed never told me he’d received the book. And I’m not sure he ever seriously looked at it. But somehow he was convinced that since he knew it talked a lot about cellular automata, and had a section about physics, it must be about his big idea—that the universe is a cellular automaton. As one witty friend pointed out to me in connection with writing this piece, my book says only one thing about the universe being a cellular automaton: that it isn’t! But in any case, Ed apparently felt that I was stealing credit from him for his big idea—and, as I now realize, started an urgent campaign to right the perceived wrong, basically by telling people that somehow (despite all my efforts to describe the history) I wasn’t giving anyone enough credit and that “he was there first”. The *New York Times* rather diplomatically quoted Ed as saying “For me this is a great event. Wolfram is the first significant person to believe in this stuff. I’ve been very lonely”. It followed up by saying that “Mr. Fredkin, who said he was a longtime friend, said Dr. Wolfram had ‘an egregious lack of humility’”. (In some contexts, I suppose that might be a compliment.)

In writing this piece I asked Steven Levy what Ed had actually said in the interview he did. His first summary in reviewing his notes was “He says he considers you a friend and then goes on endlessly about what an egomaniac you are”. But then he sent me his actual notes, and they’re somewhat revealing. Ed doesn’t claim he introduced me to cellular automata, perhaps because he realizes that Levy knows from the 1980s that that isn’t true. But then Ed tells the story about having shown me reversible cellular automata, a story I’d explained to him wasn’t true. Ed goes on to say that “Everyone who’s in science wants credit, driven probably by wanting to become famous. [Wolfram] has a larger than normal dose”. Ed says that when he had said that cellular automata underlie physics, I’d said that was crazy. (Yup, that’s true.) But then Ed said “Now he denies this”. Huh? Ed went on: “He’s a prisoner of some kind of overactive ego. I believe he might not know. Wolfram deserves loads & loads of credit, but he has this personality flaw”. And so on.

A month later Ed writes to me:

From: Ed Fredkin

To: Steve Wolfram

Sent: Friday, June 14, 2002 2:48 PM

Subject: ANKOS critics

Steve,

Sometime soon I’d like to get together and talk.

I’ve read a lot of your book.

Take a look at the draft of a little paper of mine (attached). I’d

appreciate comments.

Ed F

The following is my response to someone else’s response [Gerry Sussman] to a review of ANKOS.

My comments are only with regard to Wolfram’s ideas on modeling physics.

I don’t happen to like his network model but we are in agreement that

some kind of discrete process might underlie QM.

Not everything Wolfram says is wrong.

…

The ideas that some kinds of discrete space-time processes (such as

CA’s) might underlie physics or other processes in nature is the BABY.

Everything else in ANKOS (or missing from ANKOS) is the BATH WATER.

Ed’s attached paper was basically yet another restatement of cellular automata as models of fundamental physics.

A few weeks later there was a strange (if in some ways charming) incident when a reporter for the *San Francisco Chronicle* decided to investigate what seemed to be a science feud between Ed and me. After a nod to medieval metaphysicians, the article (under the title “Cosmic Computer”) opens with “Nowadays, with a daring that might have dazzled St. Augustine and St. Thomas Aquinas, two titans of the computer world argue that everything in the universe is a kind of computer.” After analogizing me to Britney Spears, the article goes on to say “The excitement has also brought tension to the long-standing friendship between Wolfram and Fredkin, who are now wrestling with one of the bigger bummers of any scientist’s life: a dispute over originality.” The article reports: “Last week, the two men had a long, heartfelt phone conversation with each other, in which they tried to resolve their strong disagreement over priority. The conversation was amicable, but they failed to reach agreement.”

And so things remained until March 2003 when Ed sent the following:

Subject: Re: NKS 2003 Conference & Minicourse

Date: Thu, 20 Mar 2003 17:24:59 -0500

From: Edward Fredkin

To: Stephen Wolfram

Dear Stephen,

I guess I’m on a Wolfram mailing list for potential attendees for your

Boston conference. I hope you don’t mind a little plain speaking. I

consider that I am a friend of yours and therefor I take the risk of

telling the emperor about his new clothes. Of course, few others

would do so as a friend. Please don’t be offended as the plain talking

that follows is my attempt at trying to be constructive.

Your work is acquiring a reputation amongst the scientific community

that is much less than it deserves. I find myself often in the

position of defending you, your work and your accomplishments against

the negative views that many hold, even though they have little

understanding of the significance of what you have done. They are

turned off by your egregious behavior; it distracts much of the

scientific community rendering it barely possible for them to take you

seriously. You have invented and discovered quite a few things, but

so have others. You told me you would try to give credit in ANKOS

where credit was due; I believed you and I believe that you tried your

best but nevertheless you failed miserably. I guess you simply didn’t

know how. Consider this conference: Must this conference be a one man

show or might it actually be better for the ideas in ANKOS and better

for SW and his overall scientific reputation if it were a real

conference where others might address the same questions? Please don’t

kid yourself into thinking that no one else has anything original,

novel, important or interesting to say.

Of course, this so-called “…first ever conference” devoted to the “…

ideas and implications…” of concepts found in ANKOS might be nothing

more than a marketing tool for Mathematica and for sales of the ANKOS

book. If so, you ought to call a spade “spade”.

You’ve done enough things (and hopefully will continue to do so) to

ensure your reputation as a pioneer in various areas. This flood of

self puffery simply detracts, in the minds of many whose opinions you

ought to value, from the positive reputation you deserve.

I’m not one of those whose opinion of your work is in any way affected

by your unfortunate behavior. I see and understand exactly what you’ve

done and I know and understand what your work is based on. I am human,

so I find it interesting when you now and then claim to have discovered

an idea or fact that I personally explained to you when it was

perfectly clear at the time that to you, the idea was absolutely novel.

My model of you is that your overpowering motivation results in your

mind playing tricks on you. I really believe that you actually forget;

that you actually re-remember the past differently than it happened.

But I am the eternal optimist. I believe that even Stephen Wolfram

might someday come around and join the collegial scientific community

where you receive credit and give credit; both nearly effortlessly.

The world actually might voluntarily heap honors on you as opposed to

SW having to orchestrate “conferences” for the glorification of SW and

all the ideas claimed by SW. No one knows better than me how slow and

torturous this process can be for new and novel big concepts, but

patience and modestly [*sic*] still seems like the better path.

Please try to not be offended. I actually mean well. If you ever have

an actual, real conference, invite me to be a speaker; I’ll come. If I

organize another conference you can rest assured that you will be

invited again (as you were for the NSF Workshop) and I hope you will

come to talk about your ideas and maybe — maybe even stay to hear what

others have to say on the subject. It’s not too healthy to the

scientific mind to be the only real speaker at conferences you organize

and hype for yourself.

Among the very few who really are able to appreciate what you’ve done,

I am one of your greatest supporters. But I am not your average person

with more or less normal reactions. When you reach for extra glory

and credit by stealing one of my ideas, my reaction is: “I admire your

good taste”.

Best regards

Ed F

On Thursday, March 20, 2003, at 02:19 PM, Stephen Wolfram wrote:

> In June of this year we’re going to be holding the first-ever

> conference devoted to the ideas and implications of A NEW KIND OF

> SCIENCE. I think it’ll be an exciting and unique event. And if you’re

> interested in any facet of NKS or its implications, you should plan

> to come!

>

> I’ll be giving a series of in-depth lectures to explain the core

> ideas of NKS. There’ll be more specialized sessions exploring

> implications and applications in areas such as computer science,

> biology, social science, physics, mathematics, philosophy, and future

> of technology. And there’ll also be workshops and case studies about

> such issues as modelling, computer experimentation, defining NKS

> problems, NKS-based education–as well as a gallery of NKS-based

> art pieces.

>

> I’d expected that it’d be a few years before it would make sense to

> start having NKS conferences. But things have gone faster than I

> expected, and the enthusiasm and energy we’ve seen in the ten months

> since the book was published has made it clear that it’s time to have

> the first NKS conference.

>

> In planning NKS 2003, we want to cater to as broad a range of

> attendees as possible. There’ll be many professional scientists

> coming, as well as technologists and other researchers from a very

> wide range of fields. There’ll also be a large number of educators

> and students, as well as all sorts of individuals with general

> interests in the ideas and implications of NKS.

>

> We’ll be holding NKS 2003 near Boston over the weekend of June 27-29,

> 2003. There’s more information and registration details at

> http://www.wolframscience.com/nks2003

>

> It’s going to be an extremely stimulating weekend–and a unique

> opportunity to meet a broad cross-section of people interested in new

> ideas.

>

> I hope you’ll be able to be part of this pioneering event!

>

>

> — Stephen Wolfram

I responded:

Subject: Re: NKS 2003 Conference & Minicourse

Date: Sat, 22 Mar 2003 22:19:33 -0500

From: Stephen Wolfram

To: Edward Fredkin

Ed —

I must say that I am reluctant to respond to a note like the one below, but

it seems a pity to let things end this way.

I can tell you’re very angry … but beyond that I really can’t tell too

much.

I’d always thought we had a fine, largely social, relationship. We talked

about many kinds of things. It was fun. Occasionally we talked about

science. In the early 1980s I learned a few things about cellular automata

from you. None were extremely influential to me, but they were fine things

that you should be proud of having figured out—and in fact I took some

trouble to mention them in the notes to NKS.

You also told me some of your thinking about fundamental physics. I was (I

hope) polite, and tried to be helpful. But I always found what you were

saying quite vague and slippery—and when it became definite it usually

seemed very naive. I think it’s a great pity that you’ve never taken the

time to learn the technical details of physics as it’s currently practiced.

There’s a lot known. And if you understood it, I think you’d be able to

tell quite quickly which of your ideas are totally naive, and which might

actually be interesting.

I think it’s also a pity that—so far as I can tell—you’ve never really

taken the time to understand what I’ve done. It’s in the end pretty

nontrivial stuff. It’s not just saying something like “the universe is a

cellular automaton” or “I have a philosophy that the universe is like a

computer”. It’s a big and rich intellectual structure, built on a lot of

solid results and detailed, careful, analysis. That among other things

happens to give a bunch of ideas about how physics might actually

work—that have (so far as I know) almost nothing to do with things you’ve

been talking about.

I do agree with your belief that the universe is ultimately discrete. But

of course many people have for a long time said that they thought the

universe might at some level be discrete. Some of those people (like

Wheeler, Penrose, Finkelstein, etc.) are sophisticated physicists, and what

they’ve said has lots of real content—it’s not just vague essay-type

stuff. Now, I don’t happen to think what they’ve specifically proposed is

correct. But you would be completely wrong to think (as you seem to) that

somehow the idea that the universe might be discrete originated with you.

I really encourage you to read NKS in detail, including the notes at the

back. I think there’s a lot more there than you imagine. And I think if

you really understood it, you would be completely embarrassed to write a

note like the one below.

You’ve never struck me as being someone who is terribly interested in other

peoples’ ideas. And that’s of course fine. But you shouldn’t assume you

know their ideas just on the basis of a few buzzphrases or some such. In

some areas of business, that approach often works. Because, as we both

know, the ideas typically aren’t that deep. But it won’t work with a

character like me doing science. There’s too much nontrivial content. You

have to actually dig in to understand it. And from the things you say you

obviously haven’t.

For twenty years I thought we had a fine personal relationship. I thought

it was a little odd that you seemed to go around telling people that you had

introduced me to cellular automata. We talked about this a few times, and

you admitted this wasn’t a true story. But while I thought it was a little

unreasonable for you to keep on saying something you knew wasn’t true, I

didn’t pay much attention. It never really got in the way of our

relationship.

And then there was the incident of your NSF-funded conference. You invited

me. I said I couldn’t come. And suggested two alternates. You said fine.

But then you never contacted these people. Which was rather embarrassing

for me. And then, when David Reiss contacted you, you told him the

conference “was full”.

Later, when we talked about it, you admitted that that was a lie—and then

blamed the lie on Raj Reddy.

Frankly, I was flabbergasted by all this. That’s not the kind of

interaction someone like me expects to have with a seasoned high-level

operative like yourself. Yes, that’s the kind of thing some sleazy young

businessperson might do. But not a mature businessperson who has run

companies and things.

I still have no idea what you were thinking of. But it thoroughly shook my

confidence in you as someone I could interact straightforwardly with.

And then, of course, there’s the question of what you’ve said to journalists

etc. about NKS. In detail, I don’t have much idea. But something fishy

was surely going on. I haven’t gone and studied all the quotes from you.

But certainly my impression was that you were trying to claim that really

lots of key things in NKS were things you had done or said first.

You know that I tried to research the history carefully. And unless I

missed something quite huge, your contributions to NKS were extremely minor,

and are certainly accurately represented in the history notes. Now of

course if you don’t actually understand what I’ve done in NKS, that may be

hard to see. But I can’t really help you with that.

OK, where do we go from here?

We talked at some length when that reporter from a San Francisco paper was

trying to write a story about you and NKS. I thought we had a decent

conversation. But then, so far as I could tell, you went right ahead and

told the reporter—again—exactly a bunch of things we’d agreed in our

conversation weren’t true.

It was the same pattern as with telling people that you’d introduced me to

cellular automata. And it resonated in a bad way with the lie you told

about your conference.

I would have expected vastly better from you. I must say that I was

personally most disappointed. And I concluded with much regret that I must

have seriously misjudged you all these years.

I would like nothing more than to be able to mend our relationship, and go

back to the kind of pleasant social interactions we have always had.

How can that be achieved? Perhaps it’s impossible. But one step is that

you might actually try to understand what I’ve done in NKS. That would

surely help.

— Stephen

Subject: Delayed reply

Date: Thu, 3 Apr 2003 23:34:17 -0500

From: Edward Fredkin

To: Stephen Wolfram

Stephen,

I have been traveling and more recently have had my time gobbled up by

a most urgent matter.

I appreciate your quick reply to my email and I will get back to you

sometime soon. Rather than trying to respond to everything you brought up,

I will be limited to dealing with a couple of issues at a time.

What I can tell you is that I am not angry, and was not angry or upset.

I have always been a non-emotional observer with regard to whatever it is

that comes my way. That’s just my nature. It has come in handy at times,

such as when someone’s stupid mistake caused the single engine jet fighter

I was flying have an engine fire on take off. This required shutting down the

engine and taking other drastic actions very quickly; no time to get mad.

The gist of my comments to you was not related to the work you

documented in NKS, but rather to the style and methodology you are using

while trying to get people to understand and appreciate what it is you

have done. I certainly agree with the fact that it is extraordinarily

difficult to get the scientific establishment to pay attention, listen,

understand and appreciate what you’ve done. Nevertheless, I think there

might be a better approach to that problem than the one you are following.

So, as soon as I can get a little breathing room, I’ll respond to some

of your comments. I do value our friendship and whatever I do in this regard

will be an attempt at honest and unemotional communication with the goal of

some better mutual understanding.

By the way, I have taken the time to read and understand what you’ve

done in NKS. I’m pretty sure that I am better able than most to appreciate

the effort, persistence and creativity that went into that work.

You have made some comments about me and my own work, and I wonder what

you actually know about it beyond our conversations and the things you

referenced in NKS.

As soon as I can get some time I’ll continue with some further thoughts.

Best regards,

Ed

Ed didn’t send the promised followup. But a couple of months later *New Scientist* sent our media email address a note titled: “cover feature on Fredkin, Wolfram right to reply”, which asked for “comment on the suggestion that you first became familiar with cellular automata at Fredkin’s lab in the 1970s and that examples in *A New Kind of Science* came out of work done in the lab”. I told Ed he should correct that—and he responded to me:

Subject: Re: cover feature on Fredkin, Wolfram right to reply.

Date: Thu, 29 May 2003 13:55:02 -0400

From: Ed Fredkin

To: Stephen Wolfram

Stephen,

I carefully and clearly told the author of the NS article that to my

knowledge it is not true “… that Wolfram first became familiar with

cellular automata at Fredkin’s lab in the ’70’s…” and further that

you already knew about CA’s.

My guess is that magazines see value in controversy and they would like

to attribute statements to each of us that helps them titillate their

readers. I tried in every way I could to correct any wrong impressions

the author had. But what they end up doing is beyond my control.

As to cracking the fundamental theory of physics, I did read and did

understand what you wrote about in NKS, however my interests lie in

models that are regular and based on a simple underlying Cartesian

lattice. The models I have been working on for the past few years are

called “Salt” as they are CA’s similar to an NaCl crystal. You can

read about it at www.digitalphilosophy.org.

My approach to being consistent with QM, SR and GR is related to the

fact that CA models of physics can exactly conserve such quantities as

momentum, energy, charge etc. By means of a variant of Noether’s

Theorem, the physics of such CA’s can exhibited the all the symmetries

we currently attribute to physics, but doing so asymptotically at

scales above the lattice.

Thus, in my concept of a theory of physics, translation symmetry,

rotation symmetry etc. would all be violated as we currently understand

is true for time symmetry, parity symmetry and charge symmetry.

No one suggests that you should agree with all my ideas, however your

comment in your prior email to me is unnecessarily condescending:

> “I think it’s a great pity that you’ve never taken

> the time to learn the technical details of physics as it’s

> currently practiced. There’s a lot known. And if you understood

> it, I think you’d be able to tell quite quickly which of

> your ideas are totally naive, and which might actually be interesting.”

What is certain is that there’s no “great pity” necessary. I actually

do know a lot about the technical details of physics. In any case,

thirty years ago Feynman thought that I needed to learn more about

certain aspects of QM. He was specific in what he felt was everything

more that I needed to know (in order to make progress with my CA

ideas). He offered to work with me, which was accomplished during the

course of the year I spent at Caltech (1974-1975). I studied, learned

more about QM and passed the final exam that Feynman gave me. While

we argued a lot, Feynman never accused me of having naive ideas.

As to NKS 2003, it doesn’t make a lot of sense for me to come to be a

member of the audience. If you would like me to participate in some

meaningful way, let me know.

Best regards,

Ed F

And after that exchange, Ed and I basically went back to being as we had been before—having pleasant interactions, without any particular scientific engagement. And in a sense for many years I kept out of Ed’s scientific way—not seriously working on physics again until 2019.

Since 2002 I’d been living in the Boston area, so Ed and I ran into each other more often. And although Ed’s behavior over *A New Kind of Science* had disappointed and upset me, it gave me a better understanding of Ed as a human being, and a vulnerable one at that.

It was always a little hard to tell just what was going on with Ed. In July 2003, for example, he wrote to me:

Subject: Gunkel

Date: Thu, 24 Jul 2003 19:07:56 -0400

From: Ed Fredkin

To: Stephen Wolfram

Stephen,

First I must apologize for this long letter. Pat Gunkel sent me an

email telling of your visit. It prompted me (who hardly ever writes

anything) to type up my thoughts for whatever they’re worth.

…

You might be surprised at the number of wise and intelligent people who

really appreciate Pat and his works. Yet after more than 30 years of

fitful, diverse yet nearly continuous support, Pat has come to a

situation that, to him, looks like the end of the line.

…

I, unfortunately, am no longer in a position to personally provide the

kind of modest support that Pat needs to continue his church-mouse kind

of existence.

…

There is no doubt that Pat can be a difficult person to help, but I

notice that he has mellowed with age. Of course, Wolfram Research is

not a charitable institution. But I believe that Pat’s ideas on

ideonomy are really important and that those ideas may form the basis

of interesting future applications. The point of all this is that if

what Pat is doing seems interesting to you, some arrangement with

Wolfram Research might make sense.

(True to form, Gunkel followed up with a very forthright note, including a scathing critique he’d written of *A New Kind of Science*—as well as of Ed’s theories. That wouldn’t have deterred me, but I couldn’t see anything Gunkel could actually do for us, so I never pursued this.)

But did Ed’s note imply that Ed was running out of money? I’d always assumed some kind of vast business empire lurking in the background, but now I wasn’t sure.

I saw Ed only a few times in the next couple of years—at events like a Festschrift for Sulak and a bat mitzvah for one of Feynman’s granddaughters. But as usual, he was eager to tell stories, some of which I hadn’t heard before—mostly about things far in the past. He said that in the early 1960s John Cocke had stolen the idea of RISC architecture from his murdered friend Ben Gurley, though it had taken him two decades to get it taken seriously. He said that around the same time he’d been pulled in by the Air Force to help with analysis of blast waves from nuclear tests (and that story came with descriptions of B-52s doing loop-the-loop maneuvers when they dropped atomic bombs). He said that he’d once demoed the Muse music system (which, he emphasized, he, not Minsky, had invented) to an astonished audience in the Soviet Union. He said that he’d advised Richard Branson on his transatlantic balloon trip, telling him his butane burners weren’t correctly mounted—and in fact they fell off. And so on.

In 2005 Ed told me he’d been working with a programmer in California named Dan Miller (who’d developed audio compression software [and been at the NKS 2003 conference that Ed had been so upset about]) on the new 3D cellular automaton he’d invented that he called the “SALT architecture” because its pattern of updates was like the Na and Cl in a salt crystal.

But then in 2008 Ed told me he’d sold his island—presumably relieving whatever financial issues he’d had before—and suddenly Ed started to show up much more. He told me (as he did quite a few times) that he was working on a book (which never materialized). He told me he was teaching a course at Carnegie Mellon on the “Physics of Theoretical Computation”—which was apparently actually a very-much-as-before “engineering-style” effort to explore building features of physics from a cellular automaton, now with his SALT architecture. He invited me to a dinner at his house in honor of ‘t Hooft, photographed here with Ed, me and Sulak:

That fall, Ed came to the Midwest NKS Conference in Indiana, here photographed in a discussion with Greg Chaitin, me and others:

I would interact with Ed quite regularly after that—most often with him telling me about his use of Mathematica and soon Wolfram|Alpha. In 2012 Ed—now aged 78—sent me a nice “I have an idea” email (I made the requested introduction, though I’m not sure if this ever went anywhere):

Subject: Alpha and Problem Solving

Date: Fri, 19 Oct 2012 20:12:01 +0000

From: Edward Fredkin

To: Steve Wolfram

Steve,

The first thing I taught at MIT was a course in general problem solving (in 1968).

I’m now developing a new course on General Problem Solving which I expect to offer first at Harvard’s HILR program. Part of the motivation came from watching Joyce struggle with a Harvard course on Chemistry, where a lot of the homework involved units conversions. I noticed that Alpha promptly solved many of Joyce’s homework problems including some involving chemical reactions. (The course was really for students planning to take the MCAT Exam in order to get into Medical School). One clue that you might give to the Alpha developers, is to work toward getting Alpha to have more of the capabilities necessary to pass different standard tests that involve various kinds of quantitative analysis. (Of course, you might have already done so.)

You might recall that I discussed the issue of units conversion with you long ago (before Mathematica), and you described the idea you then had that turned into Convert in Mathematica.

In any case, Alpha is fantastic, and getting better all the time. My plan is that every one of my students must use Alpha for every problem that involves numbers, along with some that don’t involve numbers. My motto is John McCarthy’s dictum: “Those who refuse to do arithmetic are doomed to talk nonsense!” However, with Alpha, the problem solver doesn’t have to do the arithmetic or the units conversions; Alpha can do it!

It would be helpful if I could get a little bit of cooperation from someone in the Alpha group. Basically, I will want to talk to an Alpha expert from time to time to make sure I’m taking advantage of the best that Alpha can do along with resources already developed for introducing Alpha to new users. My initial students will be drawn from a group of retirees who, while clearly above average in intelligence, may have few recently used skills in mathematics. I also expect that almost all of my initial students will be first time Alpha users. Again, I might profit from discussion with someone who has thought about how to introduce Alpha to beginners.

Let me know what you think or, if you like, we could get together to talk about it.

Best regards and Congratulations!

Ed

In 2014, when I recorded some oral history with Ed—now age 80—he was again brimming with ideas. The one he was most excited about had to do with weather prediction. It started from the observation that most smartphones have pressure sensors in them. Ed’s idea was to use these—and more—to create a sensor net that would continuously collect billions of pressure measurements, to be fed as input to weather forecast codes. Channeling his lifelong interest in reversible computing he imagined that the codes could be made reversible, and that running backwards from an incorrect prediction could tell one where more data had to be collected. Then Ed imagined doing this by having tiny balloons all over the place—with nothing that would cause trouble if a plane ran into it. He had a whole plan for partners he wanted to get (and, yes, he wanted us to be part of it too). And in typical Ed fashion, it was all laced with stories:

You know, I had this personal experience with weather. I was flying a glider along at 16,000 feet, and I encountered sink. You know, sink is wind blowing down. And the speed of the sink was 10,000 feet a minute. I was at 16,000 feet. And two minutes later, I was on the ground landing. Not on purpose. You know my attitude was—if I don’t see a big grading on the ground—[the wind] can’t keep going this way all the way down, so I won’t be killed. Actually, in that same storm, one of the pilots was killed.

…

The weather people just aren’t into the vertical movement of air. They do everything in layers. But this went through a lot of layers all at once in an organized fashion. So the point is that to talk about thousands or even millions of sensors makes no sense. You’re not going to do good weather until you get billions of sensors. That’s my opinion.

We talked about whether sensitive dependence on initial conditions destroys all predictability in fluid dynamics. I have theoretical and computational reasons to think it doesn’t. But Ed had a story:

There’s a mountain in California I happen to know, and I have a picture of a cloud street that starts on that mountain because it has a very peculiar geometry, and then runs for 2,000 miles.

So this particular mountain has an area of its rock that faces towards the east and it’s big. And what happens is when the Sun is shining on that and the wet wind is coming from the Pacific and so on, you get this big cumulus cloud that flows back this way, and then you get another one and it pulses. You get one after another. And these are very stable things and they travel a very long way. So my point is that amidst all the randomness there’s a lot of order that can be found and understood. There are regions that have funny properties. They’re much more temperature stable. There’s like islands of stability. And things like that get ignored by everything people are doing today, you know what I mean?

I would send things I’d written to Ed. I didn’t really think he’d read them. But I thought he might at least enjoy their concepts. And often he would respond with ideas of his own. I sent him an announcement about our Tweet-a-Program project (now reconfigured because of Twitter changes) with the one-line comment (reflecting his “best programmer” self-characterization): “A new frontier of programming prowess?” He responded, in typical Ed fashion, with an idea—that’s actually a little reminiscent of modern AI image generation:

Subject: Re: Tweet-a-Program

Date: Fri, 19 Sep 2014 21:25:47 +0000

From: Edward Fredkin

To: Stephen Wolfram

Hi,

I like it! As usual, it gave me ideas that might be outside of your current concept.

We should talk sometime, so that I can explain something closely related to [Tweet-a-Program] but decidedly different and perhaps even more fun. Strangely, it has to do with Haiku.

What I have figured out is that there could be a new kind of Haiku, where the text is interpreted by Mathematica to generate an image.

…

The trick will be having the image reflect something of the Haiku meaning, even if only abstractly. I don’t know how to do this so that it does the perfect thing every time, but I have thought of something that could be fun, and a person could become skilled at creating Mathematica Haikus that seem to reflect some aspects of the feeling of the words in an image with some increasing probability of doing it well, as a result of practice.

…

Ed

Late in 2014 Ed sent me another piece of mail saying he was starting a project to produce a “new cellular automaton system”—and he wanted to use our technology to do it. He also sent me a paper he’d written about his SALT cellular automaton:

Finally—and without my help—Ed seemed to have mastered the art of academic papers. This one was on the arXiv preprint server. Others—with titles like “An Introduction to Digital Philosophy”—had appeared in academic journals. (Ones with titles like “A New Cosmogony” and “Finite Nature” were more privately circulated.) But what most struck me about this particular paper was that—for the first time—it seemed to have actual images of cellular automaton behavior. Ever since those few minutes with the PERQ computer on Ed’s island in 1982 I hadn’t seen Ed ever show anything like that. And now Ed was again chasing that old question Minsky had asked, of making a circle with a cellular automaton.

At the time, I didn’t have a chance to see what Ed had actually done, and whether he’d finally solved it. But in writing this piece, I decided I’d better try to find out. The actual rule—that Ed and Dan Miller called “BusyBoxes”—is quite complicated, involving knight’s-move neighborhoods, etc. Their claim was that starting with a string of cells in a particular configuration, the average of their positions would trace out what in the limit of a long string would be a circle:

At first it looks like a kind of magic trick (and no, nothing is bouncing off any “walls”; the direction changes are just a consequence of the initial pattern of cells). But if you keep all the locations that get visited, things start to seem less mysterious—because what you realize is that the “basket” that gets “woven” is actually just a cube, viewed from a corner:

Where does the apparent circle come from? The details are a bit complicated—and I’ve put them in an appendix below. But suffice it to say that Ed’s old nemesis—calculus—comes in very handy. And in fact it lets one show that although one gets almost a circle, it’s not quite a circle; even with an infinite string, its radius is still wiggling by about 0.5% as one goes around the “circle”:

And—as we’ll see below—remarkably enough one can get a closed-form result for the amount of wiggliness (here computed as the ratio of maximum to minimum radius):

In earlier years, Ed might have tried to say that generating a circle (which this doesn’t) was tantamount to showing that a cellular automaton could reproduce physics. But by now I think he realized that it was really much more complicated than that. And he wasn’t mentioning physics much to me anymore. But—perhaps not least because many of his longtime interlocutors had by then died—he was interacting with me more than before. And perhaps he was even beginning to think that I might have a bit more to contribute than he’d assumed.

In December 2015 I sent Ed a piece I’d written to celebrate the bicentenary of Ada Lovelace, and he responded:

Date: Fri, 11 Dec 2015 15:58:14 +0000

From: Edward Fredkin

To: Stephen Wolfram

Stephen,

I was truly blown away by your essay re Ada Lovelace! You’ve got a lot more to give the world than I had imagined, and I, more than anyone else, appreciate what you might still be capable of accomplishing.

It’s too bad that some persons at MIT, for far too long, hung onto one dimensional views focussed on what Macsyma might have been. My own impressions have always been different, I recognized your potential long ago and consequently invited you to one of my Mosqito Island conferences some 3.5 decades ago.

In any case, much of what Mathematica makes possible is very important and valuable to me. As you know I was an early user and continue to be a user.

Many of my interests have run along many paths opened up by activities you have instigated at Wolfram. Wolfram Alpha and its connections to Siri, are examples.

Your new book “An Elementary Introduction to the Wolfram Language” (I don’t yet have a hard copy) fits in with a project I had in mind for my grandchild Robert, who at age 6 already seems to be extraordinarily talented mathematically.

To cut to the chase, I want to make a proposal: Although I’m too old to be a regular employee, I’d nevertheless like to have an association with Wolfram, where I might be able to contribute ideas, and solve problems (I’m still quite good at that).

I won’t need much from you other than your opening the door to my involvement at Wolfram. What I have in mind would be an arrangement where I could work for Wolfram, with some kind of arrangement other than full time employment.

I’ve attached something I wrote recently.

Ed

Gosh! That was an unexpected development. Flattering, I suppose. But my main reaction was a kind of sadness. Yes, after all these years, Ed had finally read something I’d written. But somehow his response sounded like he was surrendering. This wasn’t the “I-want-to-do-everything-for-myself” Ed I had known all this time. This was an Ed who somehow felt he needed us to support him. And while our company has been able to absorb a great many “unusual” people—with terrific success—Ed seemed like he was pretty far outside our envelope.

At the time, I didn’t look at the attachment Ed sent with his email. But opening it now adds to my sense of sadness. It was a 13-page document about a system Ed imagined that would help people with “various forms of cognitive disabilities”, including a section on “Dementia and Alzheimer’s”:

It wasn’t until 2017 that Ed explicitly mentioned to me that his short-term memory was failing—though in talking to him it had been increasingly obvious for several years. He said he’d joined a group of people who were writing their memoirs. I told him I’d look forward to seeing his, though I’m not sure he ever made much progress on them.

Ed continued to send me ideas and proposals. There was a very Ed-like “global idea” about creating a system “GM” (presumably for “General Mathematician”) that would effectively “learn all of mathematics” by automatically reading math books, etc. (yes, definite overtones of what’s happening with LLM-meets-Wolfram-Language):

Later there were several pieces of mail about a new idea for factoring integers. In the first of them (from 2016), Ed told me that when the NeXT computer first came out (in 1989) he’d used Mathematica on it to simulate a reversible hardware multiplier. And being reminded of this by a historical piece I’d written, he said it had “started me thinking, again, about that problem and I had a new insight that appears to so greatly reduce the complexity of a reversible multiplier so as to possibly make it better at factoring large integers than current algorithms.” He wrote me about this several more times, suggesting various kinds of collaborations. Finally, in 2018 he told me how the method worked, saying it involved doing reversible arithmetic using balanced ternary. (Strangely enough, years earlier Ed had told me about Soviet computers that also used balanced ternary.)
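For reference, balanced ternary represents every integer with the digits −1, 0 and +1 (it was famously used in the Soviet Setun computer). Here’s a minimal Python sketch of the representation itself; this is purely illustrative and reflects nothing of Ed’s actual factoring method:

```python
def balanced_ternary(n):
    """Digits of n in balanced ternary (least-significant first),
    using digit values -1, 0, +1."""
    digits = []
    while n != 0:
        r = n % 3
        if r == 2:              # remainder 2 becomes digit -1, carry 1
            digits.append(-1)
            n = n // 3 + 1
        else:
            digits.append(r)
            n //= 3
    return digits or [0]

def value(ds):
    """Reconstruct the integer from its balanced-ternary digits."""
    return sum(d * 3**i for i, d in enumerate(ds))

print(balanced_ternary(5))         # [-1, -1, 1]: -1 - 3 + 9 = 5
print(value(balanced_ternary(5)))  # 5
```

A pleasant property (and part of why it attracts fans of reversible arithmetic) is that negation is just flipping every digit’s sign, with no separate sign bit needed.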

I think that was the last technical conversation I had with Ed. A couple of years later I sent him the book about our Physics Project with the inscription:

And I would see him at least once a year at the Boston-area physics get-together organized by Boston University. He would always tell me stories. Often the same stories, and sometimes stories about me. And indeed as I was writing this piece I actually found a video Ed made in November 2020 that has such a story, albeit by this point seriously muddled (and, no, I’ve basically never “run” a cellular automaton by hand in my life!):

I used to organize meetings in the Caribbean and I did this because I had an island in the Caribbean … I invited Wolfram to come down. Wolfram had done pioneering work in cellular automata. … He was a great guy, you know, and I wanted him to get on the bandwagon … He shows up at the meeting and he had done all his work by hand as had everyone else in cellular automata. He didn’t think of using a computer. [!] I had a display processor that I modified to be able to run a cellular automaton with the stuff that it used to put text up on the screen. And so I’m showing him a cellular automata running at 60 frames a second continuously like a movie. This was 10,000 times faster than doing it by hand which is what he’d always done. He never thought of using a computer to do cellular automata and he turns around and walks out and and he left the island and went back to someplace else. So [later] I went to his meeting at Los Alamos and I ran into him again and he was now doing computer work. And I said to him “How come in all your work you don’t have a reversible [rule]”, and he says to me “Oh, reversible ones are all trivial”. And I went up and this is the most telling thing about his intellect: he’s a very smart guy [and when I] showed him how he could change his rule slightly and make it reversible his eyes just about popped out of his head and he knew I was correct.

I may have introduced him to this field but what he has done is he is far better than I at getting other people involved. I’ve never bothered and I don’t have the talent that he has for that. What he did was he came up with similar ideas and initially he didn’t give me the credit I thought I deserved. But it became apparent to me that he did this independently and he’s better at writing things and better at hiring bright people who can do things than I ever was.

And right after that, Ed ends the video with:

As I look back on my career I’ve had a fantastic life and I’m not unhappy about any aspect of it because, you know, I’ve accomplished everything I might have done and in spite of various handicaps—like not being a writer—I still have done a lot and the world, uh, understands me, I think, and appreciates what I’ve done.

When I saw Ed in 2022 he wasn’t able to say much. But, though it was a struggle, he was keen to make one point to me, that seemed to matter a lot to him: “You’ve managed to get people to follow you”, he said “I was never able to do that”. I saw Ed one last time this May. Joyce explained that Ed had “bumped his head”, and, in a very Ed-like way, she was avoiding a repeat by getting him to wear a bike helmet. She wanted someone to snap a picture of me and her with Ed:

Six weeks later, Ed died, at the age of 88.

I went to see Joyce and Rick a few weeks later, among other things to check facts for this piece. I’d heard from Ed that his ancestors had provided wood for the imperial palace in St. Petersburg. But I’d also heard from someone else that Ed had said he was descended from Mongolian royalty. And as I was about to leave, I thought I might as well ask. “Oh yes”, they said. “And Ed’s father even wrote a historical novel about it”. And they showed me two books (both from the mid-1980s):

I’m not sure who Sarah, Queen of Mongolia was, but the book blurb claims that Ed’s father was her great-great-great-grandson—and goes on to speak of the “strong family inheritance of a mind that analyzes not only the injustice of human oppression but offers realistic and beneficial solutions”.

“Can that really be true?” I often asked myself when hearing yet another of Ed’s implausible stories. And of course it didn’t help that stories he told—even to me—about me weren’t true. But the remarkable thing in writing this piece is that I’ve been able to verify that a lot of Ed’s stories—implausible though they may have sounded—were in fact true. Yes, they were often embellished, and parts that didn’t reflect so well on Ed were omitted. But together they defined a remarkable tapestry of a life.

It was in many ways a very independent life. Ed had friends and family members to whom he stayed close throughout his life. But mostly it was “Ed for himself, against the world”. He didn’t want to learn anything from anyone else; he wanted to figure out everything for himself. He wanted to invent his own ideas; he wasn’t too interested in other people’s. In a rather Air-Force-pilot kind of way (“eject or not?”) he liked to be decisive—and he liked to be incisive too, always figuring out a clear, simple thing to say. Sometimes that came across as naive. And sometimes it was in fact naive. But mostly Ed didn’t seem to mind much; he would just go on to another idea.

Ed was a great storyteller, and an engaging speaker. For some reason he developed the theory that he couldn’t write—but there’s ample evidence, going back even to his teenage years, that this wasn’t true. If there was a problem, it was with content, not writing. And the issue with the content was that it tended to just be too Ed-specific—too insular—and not connected enough for other people to be able to understand or appreciate it.

I don’t know what Ed was like as a manager; I rather suspect he may have suffered from trying to be a bit too clever, with too many ideas and too much gamification. In the end, he felt he’d failed as a leader, and perhaps that was inevitable given how independent he always wanted to be. Despite his stints as an academic administrator and as a CEO, Ed was in the end fundamentally a lone warrior (and problem solver), not a general.

And what about all those ideas? Most never developed very far. Some were pretty wild. But many had at least a kernel of visionary insight. The details of the universe as a cellular automaton didn’t make sense. But the idea that the universe is somehow computational is surely correct. And spread over the course of more than six decades, Ed spun out nuggets of ideas that would later appear—usually much more developed—in a remarkable range of areas.

Ed projected a kind of personal serenity—yet he was in many ways deeply competitive. Most of the time, though, he was able to define the arena of his competitiveness so idiosyncratically that there really weren’t other contenders. And I think in the end Ed felt pretty good about all the things he’d managed to do in his life. It was fitting that he owned an actual island. Because somehow an island was a metaphor for Ed’s life: separate, independent and unique.

I’ve had help with information for this piece from many people, including Joyce Fredkin, Rick Fredkin, Simson Garfinkel, Andrea Gerlach, Bill Gosper, Howard Gutowitz, Steven Levy, Norm Margolus, Margaret Minsky, Dave Moon, John Moussouris, Mark Nahabedian, Walter Parkes, David Reiss, Brian Silverman, George Sulak, Larry Sulak and Matthew Szudzik. (Tom Toffoli agreed to talk, but didn’t show up.) I thank the Department of Distinctive Collections at the MIT Library for access to the Fredkin papers archive there. Thanks also to Brad Klee and Nik Murzin for technical help.

Here’s what the SALT cellular automaton does for two sizes of initial “string”:

For an initial string of length *n* (with *n* > 2), the overall period is 54*n* – 43, and the envelope “woven” going through all configurations is:

The “circle” is obtained by averaging the positions of all cells present at a given time step. The “circle” is always planar, but its effective radius varies with direction (i.e. as the system steps through each cycle):

Ed and Dan Miller looked at the standard deviation of the effective radius as a function of *n*, computing it up to *n* = 20, and getting the following results:

It looked as if the standard deviation was just going to go smoothly to zero—so that for an infinite string one would get a perfect circle. But that turns out not to be true, as one can see by extending the computation to slightly larger values of *n*:

And actually there’s a minimum at *n* = 43, with standard deviation 0.0012 (and fractional size discrepancy 0.0048)—and it doesn’t look like even for *n* → ∞ one will get a perfect circle.

But how can one work out the *n* → ∞ case? It’s actually a nice application for calculus.

First, notice that the “basket” consists of a series of layers of a cube viewed from one of its corners, or in other words a sequence of shapes like this:

Here’s how these are formed as one sweeps through the cube:

One can think of the string in the cellular automaton as spanning these “layers”, and successively moving around all of them as the cellular automaton evolves. In the continuum limit, there’s effectively a parameter *t* that defines where on each “layer curve” one is at a particular time. Conveniently enough, the length of all the layer curves is the same (for a unit cube it is 3√2 ≈ 4.24). With successive layers parametrized by a variable *s* (running from 0 to 1) the corners of the layer curves (all normalized to have length 1) are given by:

Now we need to find the actual *x*, *y* positions of string elements (AKA infinitesimal cells) as a function of *s* and *t*. Since the edges of the layer polygons are always straight, in each of a series of “piecewise regions” in *s* and *t* (with breakpoints defined by the corners of the polygons), we get expressions for *x* and *y* that are linear in *s* and *t*:

One subtlety is that the string in essence turns as time progresses, so that it effectively samples a different *t* value for different layers *s*. To correct for this, we have to find for which *t* we get *x* = 0 for a given *s*. It’s convenient to put the center of all our layer curves at {0, 0}, and we can do this now by subtracting . Then the (first) value of *t* at which *x* = 0 is given simply by:

The parametric surface we now get as a function of *t* is (with discrete lines indicating particular values of *s*):

Now we can slice the parametric surface not in discrete *s* values but instead in discrete *t* values—thus getting what’s basically a sequence of effective strings at discrete times:

The centroids of the strings are indicated in green, and these are then points on our potential circle. Using what we did above, the radius of this “circle” as a function of *t* can then be found by integrating over *s*. The result is algebraically complicated, but has a closed form:

Integrating this over *t* we get the “average radius”, normalized to “circumference 1” from the fact that *t* varies from 0 to 1 going “around the circle”:

(This means that the “effective π” for this circle is about 3.437.)
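Since π is just circumference over diameter, the quoted “effective π” together with the circumference-1 normalization pins down the average radius (a consistency note derived only from the two statements above):

```latex
\pi_{\mathrm{eff}} \;=\; \frac{C}{2\,\bar{r}} \;=\; \frac{1}{2\,\bar{r}} \;\approx\; 3.437
\qquad\Longrightarrow\qquad
\bar{r} \;=\; \frac{1}{2 \times 3.437} \;\approx\; 0.1455
```

A true circle of circumference 1 would have radius 1/(2π) ≈ 0.159; the slightly smaller average radius here is consistent with the curve being slightly wiggly rather than perfectly circular.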

Now we can plot the “wiggle” of the radius as a function of “angle” (i.e. *t*):

It looks a bit like a sine curve, but it’s not one. And, for example, it isn’t even symmetrical. Its maxima (which occur at odd multiples of 30°) are

while its minima (at even multiples of 30°) are

and dividing by the average radius these are about 1.00734 and 0.992175.
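From the normalized maxima and minima just quoted, the maximum-to-minimum ratio can be checked by direct arithmetic (the exact closed form is what’s shown below; this is just the numerical version):

```python
# Normalized radii quoted above: maxima at odd multiples of 30 degrees,
# minima at even multiples, each divided by the average radius
r_max = 1.00734
r_min = 0.992175

# "Wiggle amplitude": ratio of maximum to minimum effective radius
ratio = r_max / r_min
print(round(ratio, 4))  # 1.0153, i.e. about a 1.5% total variation
```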

The ratio of maximum to minimum (effectively “wiggle amplitude”) is:

Meanwhile, the standard deviation can be obtained as an integral over *t*, and the final result is

which is about 2.4 times larger than what we get at *n* = 100. We can see the approach to the asymptotic value by computing integrals over *t* for progressively larger numbers of discrete values of *s* (which, we should emphasize, is similar to values of *n*, but not quite the same, particularly for small *n*):

Click on any image in this post to copy the code that produced it and generate the output on your own computer in a Wolfram notebook.

How do alien minds perceive the world? It’s an old and oft-debated question in philosophy. And it now turns out to also be a question that rises to prominence in connection with the concept of the ruliad that’s emerged from our Wolfram Physics Project.

I’ve wondered about alien minds for a long time—and tried all sorts of ways to imagine what it might be like to see things from their point of view. But in the past I’ve never really had a way to build my intuition about it. That is, until now. So, what’s changed? It’s AI. Because in AI we finally have an accessible form of alien mind.

We typically go to a lot of trouble to train our AIs to produce results that are like we humans would do. But what if we take a human-aligned AI, and modify it? Well, then we get something that’s in effect an alien AI—an AI aligned not with us humans, but with an alien mind.

So how can we see what such an alien AI—or alien mind—is “thinking”? A convenient way is to try to capture its “mental imagery”: the image it forms in its “mind’s eye”. Let’s say we use a typical generative AI to go from a description in human language—like “a cat in a party hat”—to a generated image:

It’s exactly the kind of image we’d expect—which isn’t surprising, because it comes from a generative AI that’s trained to “do as we would”. But now let’s imagine taking the neural net that implements this generative AI, and modifying its insides—say by resetting weights that appear in its neural net.

By doing this we’re in effect going from a human-aligned neural net to some kind of “alien” one. But this “alien” neural net will still produce some kind of image—because that’s what a neural net like this does. But what will the image be? Well, in effect, it’s showing us the mental imagery of the “alien mind” associated with the modified neural net.

But what does it actually look like? Well, here’s a sequence obtained by progressively modifying the neural net—in effect making it “progressively more alien”:

At the beginning it’s still a very recognizable picture of “a cat in a party hat”. But it soon becomes more and more alien: the mental image in effect diverges further from the human one—until it no longer “looks like a cat”, and in the end looks, at least to us, rather random.
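One can get a feel for this “progressive alienation” with a toy analog (my own sketch here, not the actual generative network used for the images above): take a small randomly initialized network, treat its output on a fixed input as its “mental image”, and then reset a growing fraction of its weights, measuring how far the output drifts from the original.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed "brain": a two-layer tanh network standing in for the
# image-generating neural net discussed in the text
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(8, 4))
x = rng.normal(size=16)

def output(w1, w2):
    return np.tanh(np.tanh(x @ w1) @ w2)

baseline = output(W1, W2)

# Progressively "alienate" the network by resetting a growing fraction
# of its first-layer weights to fresh random values
drifts = []
for frac in [0.0, 0.25, 0.5, 0.75, 1.0]:
    w1 = W1.copy()
    mask = rng.random(w1.shape) < frac
    w1[mask] = rng.normal(size=mask.sum())
    drifts.append(np.linalg.norm(output(w1, W2) - baseline))

print(drifts)  # drift from the original "mental image" typically grows
```

With no weights reset the drift is exactly zero; as the fraction grows, the output wanders further from the human-trained baseline, the toy counterpart of the cat picture dissolving into randomness.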

There are many details of how this works that we’ll be discussing below. But what’s important is that—by studying the effects of changing the neural net—we now have a systematic “experimental” platform for probing at least one kind of “alien mind”. We can think of what we’re doing as a kind of “artificial neuroscience”, probing not actual human brains, but neural net analogs of them.

And we’ll see many parallels to neuroscience experiments. For example, we’ll often be “knocking out” particular parts of our “neural net brain”, a little like how injuries such as strokes can knock out parts of a human brain. But we know that when a human brain suffers a stroke, this can lead to phenomena like “hemispatial neglect”, in which a stroke victim asked to draw a clock will end up drawing just one side of the clock—a little like the pictures of cats “degrade” when parts of the “neural net brain” are knocked out.

Of course, there are many differences between real brains and artificial neural nets. But most of the core phenomena we’ll observe here seem robust and fundamental enough that we can expect them to span very different kinds of “brains”—human, artificial and alien. And the result is that we can begin to build up intuition about what the worlds of different—and alien—minds can be like.

How does an AI manage to create a picture, say of a cat in a party hat? Well, the AI has to be trained on “what makes a reasonable picture”—and how to determine what a picture is of. Then in some sense what the AI does is to start generating “reasonable” pictures at random, in effect continually checking what the picture it’s generating seems to be “of”, and tweaking it to guide it towards being a picture of what one wants.

So what counts as a “reasonable picture”? If one looks at billions of pictures—say on the web—there are lots of regularities. For example, the pixels aren’t random; nearby ones are usually highly correlated. If there’s a face, it’s usually more or less symmetrical. It’s more common to have blue at the top of a picture, and green at the bottom. And so on. And the important technological point is that it turns out to be possible to use a neural network to capture regularities in images, and to generate random images that exhibit them.
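The “nearby pixels are highly correlated” regularity is easy to check numerically. Here’s a small illustration with synthetic images (not web data): compare the correlation between horizontally adjacent pixels in a smooth image against pure noise.

```python
import numpy as np

rng = np.random.default_rng(1)

def adjacent_correlation(img):
    """Correlation between each pixel and its right-hand neighbor."""
    left, right = img[:, :-1].ravel(), img[:, 1:].ravel()
    return np.corrcoef(left, right)[0, 1]

# A "structured" image: smooth 2D waves plus a little noise
yy, xx = np.mgrid[0:64, 0:64]
smooth = np.sin(xx / 10) + np.cos(yy / 13) + 0.1 * rng.normal(size=(64, 64))

# An "unstructured" image: independent random pixels
noise = rng.normal(size=(64, 64))

print(adjacent_correlation(smooth))  # close to 1
print(adjacent_correlation(noise))   # close to 0
```

Real photographs behave like the first case, which is exactly the kind of statistical regularity a generative neural net learns to reproduce.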

Here are some examples of “random images” generated in this way:

And the idea is that these images—while each is “random” in its specifics—will in general follow the “statistics” of the billions of images from the web on which the neural network has been “trained”. We’ll be talking more about images like these later. But for now suffice it to say that while some may just look like abstract patterns, others seem to contain things like landscapes, human forms, etc. And what’s notable is that none just look like “random arrays of pixels”; they all show some kind of “structure”. And, yes, given that they’ve been trained from pictures on the web, it’s not too surprising that the “structure” sometimes includes things like human forms.

But, OK, let’s say we specifically want a picture of a cat in a party hat. From all of the almost infinitely large number of possible “well-structured” random images we might generate, how do we get one that’s of a cat in a party hat? Well, a first question is: how would we know if we’ve succeeded? As humans, we could just look and see what our image is of. But it turns out we can also train a neural net to do this (and, no, it doesn’t always get it exactly right):

How is the neural net trained? The basic idea is to take billions of images—say from the web—for which corresponding captions have been provided. Then one progressively tweaks the parameters of the neural net to make it reproduce these captions when it’s fed the corresponding images. But the critical point is that the neural net turns out to do more: it also successfully produces “reasonable” captions for images it’s never seen before. What does “reasonable” mean? Operationally, it means captions that are similar to what we humans might assign. And, yes, it’s far from obvious that a computationally constructed neural net will behave at all like us humans, and the fact that it does is presumably telling us fundamental things about how human brains work.
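Schematically, one can think of such a trained net as mapping both images and captions into a common feature space, and scoring a caption by how close its vector lies to the image’s vector. Here’s a toy sketch of that idea (the vectors are made up; in a real system they come from trained image and text encoders):

```python
import numpy as np

# Hypothetical caption embeddings in a 3-dimensional "feature space"
captions = {
    "cat": np.array([0.9, 0.1, 0.2]),
    "dog": np.array([0.7, 0.6, 0.1]),
    "car": np.array([0.0, 0.2, 0.95]),
}

def cosine(u, v):
    """Cosine similarity: 1 means same direction in feature space."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def best_caption(image_vector):
    """Pick the caption whose embedding lies closest to the image's."""
    return max(captions, key=lambda w: cosine(captions[w], image_vector))

# An "image" whose features point near the cat direction
print(best_caption(np.array([0.85, 0.15, 0.25])))  # cat
```

The training process in effect arranges the two encoders so that matching image–caption pairs end up close together in this shared space.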

But for now what’s important is that we can use this captioning capability to progressively guide images we produce towards what we want. Start from “pure randomness”. Then try to “structure the randomness” to make a “reasonable” picture, but at every step see in effect “what the caption would be”. And try to “go in a direction” that “leads towards” a picture with the caption we want. Or, in other words, progressively try to get to a picture that’s of what we want.
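The “start from randomness and steer” loop can be caricatured in a few lines (a toy analog only; real systems use diffusion models guided by a caption-matching score): begin with a random vector, and repeatedly nudge it downhill on a loss measuring how far it is from the target description.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for "the picture we want": a target point in feature space
target = np.array([1.0, -2.0, 0.5, 3.0])

# Start from "pure randomness"
img = rng.normal(size=4)

# Iteratively tweak the image toward the target, the way a generative
# model is steered by continual "what would the caption be?" feedback
for step in range(200):
    grad = 2 * (img - target)   # gradient of squared distance to target
    img -= 0.05 * grad          # small guided adjustment

print(np.round(img, 3))  # essentially the target
```

Different random starting points converge to different-but-equivalent results in the toy version; in the real system they converge to different images that all match the requested caption.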

The way this is set up in practice, one starts from an array of random pixels, then iteratively forms the picture one wants:

Different initial arrays lead to different final pictures—though if everything works correctly, the final pictures will all be of “what one asked for”, in this case a cat in a party hat (and, yes, there are a few “glitches”):

We don’t know how mental images are formed in human brains. But it seems conceivable that the process is not too different. And that in effect as we’re trying to “conjure up a reasonable image”, we’re continually checking if it’s aligned with what we want—so that, for example, if our checking process is impaired we can end up with a different image, as in hemispatial neglect.

That everything can ultimately be represented in terms of digital data is foundational to the whole computational paradigm. But the effectiveness of neural nets relies on the slightly different idea that it’s useful to treat at least many kinds of things as being characterized by arrays of real numbers. In the end one might extract from a neural net that’s giving captions to images the word “cat”. But inside the neural net it’ll operate with arrays of numbers that correspond in some fairly abstract way to the image you’ve given, and the textual caption it’ll finally produce.

And in general neural nets can typically be thought of as associating “feature vectors” with things—whether those things are images, text, or anything else. But whereas words like “cat” and “dog” are discrete, the feature vectors associated with them just contain collections of real numbers. And this means that we can think of a whole space of possibilities, with “cat” and “dog” just corresponding to two specific points.

So what’s out there in that space of possibilities? For the feature vectors we typically deal with in practice the space is many-thousand-dimensional. But we can for example look at the (nominally straight) line from the “dog point” to the “cat point” in this space, and even generate sample images of what comes between:
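The geometry of this “line between concepts” can be sketched in a few lines. This is a toy illustration in Python/NumPy, not the actual system: the function name, the 4-dimensional vectors, and their values are all hypothetical stand-ins for the many-thousand-dimensional feature vectors a real network would produce.

```python
import numpy as np

def interpolate_features(v_dog, v_cat, n=5):
    """Sample n points on the nominally straight line from v_dog to v_cat.

    In a real system each returned vector would be decoded back into an
    image by the generative network (not shown here)."""
    return [(1 - t) * v_dog + t * v_cat for t in np.linspace(0.0, 1.0, n)]

# Toy 4-dimensional stand-ins for the real feature vectors:
v_dog = np.array([1.0, 0.0, 0.5, -1.0])
v_cat = np.array([0.0, 1.0, -0.5, 1.0])
line = interpolate_features(v_dog, v_cat)
print(line[2])   # the midpoint: the elementwise average of the endpoints
```

Going “beyond cat” just means continuing with t > 1, extrapolating along the same line instead of interpolating between the endpoints.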

And, yes, if we want to, we can keep going “beyond cat”—and pretty soon things start becoming quite weird:

We can also do things like look at the line from a plane to a cat—and, yes, there’s strange stuff in there (wings? hat ears?):

What about elsewhere? For example, what happens “around” our standard “cat in a party hat”? With the particular setup we’re using, there’s a 2304-dimensional space of possibilities. But as an example, we can look at what we get on a particular 2D plane through the “standard cat” point:
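Picking such a plane amounts to choosing two orthonormal directions through a base point. Here is a minimal sketch in Python/NumPy, assuming the same 2304-dimensional feature space; the base point is a random stand-in for the actual “standard cat” point.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2304   # dimensionality of the feature space used here

def random_plane(rng, d):
    """Two random orthonormal directions spanning a 2D plane."""
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(d)
    v -= (v @ u) * u          # Gram-Schmidt: remove the component along u
    v /= np.linalg.norm(v)
    return u, v

def plane_point(center, u, v, a, b):
    """The point at in-plane coordinates (a, b), relative to center."""
    return center + a * u + b * v

center = rng.standard_normal(d)   # stand-in for the "standard cat" point
u, v = random_plane(rng, d)
# Each grid point on the plane could be decoded into an image;
# here we just check the geometry:
p = plane_point(center, u, v, 3.0, 4.0)
print(round(float(np.linalg.norm(p - center)), 6))  # 5.0 — distance in the plane
```

Scanning a grid of (a, b) values and decoding each `plane_point` is what produces the arrays of images shown here.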

Our “standard cat” is in the middle. But as we move away from the “standard cat” point, progressively weirder things happen. For a while there are recognizable (if perhaps demonic) cats to be seen. But soon there isn’t much “catness” in evidence—though sometimes hats do remain (in what we might characterize as an “all hat, no cat” situation, reminiscent of the Texan “all hat, no cattle”).

How about if we pick other planes through the standard cat point? All sorts of images appear:

But the fundamental story is always the same: there’s a kind of “cat island”, beyond which there are weird and only vaguely cat-related images—encircled by an “ocean” of what seem like purely abstract patterns with no obvious cat connection. And in general the picture that emerges is that in the immense space of possible “statistically reasonable” images, there are islands dotted around that correspond to “linguistically describable concepts”—like cats in party hats.

The islands normally seem to be roughly “spherical”, in the sense that they extend about the same nominal distance in every direction. But relative to the whole space, each island is absolutely tiny—something like perhaps a fraction 2^-2000 ≈ 10^-600 of the volume of the whole space. And between these islands there lie huge expanses of what we might call “interconcept space”.
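The conversion between those two forms is just a change of logarithm base, which a one-liner can check:

```python
import math

# Express the nominal island volume fraction 2^-2000 as a power of ten:
exponent = -2000 * math.log10(2)
print(round(exponent))   # -602, consistent with the rough 10^-600 figure
```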

What’s out there in interconcept space? It’s full of images that are “statistically reasonable” based on the images we humans have put on the web, etc.—but aren’t of things we humans have come up with words for. It’s as if in developing our civilization—and our human language—we’ve “colonized” only certain small islands in the space of all possible concepts, leaving vast amounts of interconcept space unexplored.

What’s out there is pretty weird—and sometimes a bit disturbing. Here’s what we see zooming in on the same (randomly chosen) plane around “cat island” as above:

What are all these things? In a sense, words fail us. They’re things on the shores of interconcept space, where human experience has not (yet) taken us, and for which human language has not been developed.

What if we venture further out into interconcept space—and for example just sample points in the space at random? It’s just like we already saw above: we’ll get images that are somehow “statistically typical” of what we humans have put on the web, etc., and on which our AI was trained. Here are a few more examples:

And, yes, we can pick out at least two basic classes of images: ones that seem like “pure abstract textures”, and ones that seem “representational”, and remind us of real-world scenes from human experience. There are intermediate cases—like “textures” with structures that seem like they might “represent something”, and “representational-seeming” images where we just can’t place what they might be representing.

But when we do see recognizable “real-world-inspired” images they’re a curious reflection of the concepts—and general imagery—that we humans find “interesting enough to put on the web”. We’re not dealing here with some kind of “arbitrary interconcept space”; we’re dealing with “human-aligned” interconcept space that’s in a sense anchored to human concepts, but extends between and around them. And, yes, viewed in these terms it becomes quite unsurprising that in the interconcept space we’re sampling, there are so many images that remind us of human forms and common human situations.

But just what were the images that the AI saw, from which it formed this model of interconcept space? There were a few billion of them, “foraged” from the web. Like things on the web in general, it’s a motley collection; here’s a random sample:

Some can be thought of as capturing aspects of “life as it is”, but many are more aspirational, coming from staged and often promotionally oriented photography. And, yes, there are lots of Net-a-Porter-style “clothing-without-heads” images. There are also lots of images of “things”—like food, etc. But somehow when we sample randomly in interconcept space it’s the human forms that most distinctively stand out, conceivably because “things” are not particularly consistent in their structure, but human forms always have a certain consistency of “head-body-arms, etc.” structure.

It’s notable, though, that even the most real-world-like images we find by randomly sampling interconcept space seem to typically be “painterly” and “artistic” rather than “photorealistic” and “photographic”. It’s a different story close to “concept points”—like on cat island. There more photographic forms are common, though as we go away from the “actual concept point”, there’s a tendency towards either a rather toy-like appearance, or something more like an illustration.

By the way, even the most “photographic” images the AI generates won’t be anything that comes directly from the training set. Because—as we’ll discuss later—the AI is not set up to directly store images; instead its training process in effect “grinds up” images to extract their “statistical properties”. And while “statistical features” of the original images will show up in what the AI generates, any detailed arrangement of pixels in them is overwhelmingly unlikely to do so.

But, OK, what happens if we start not at a “describable concept” (like “a cat in a party hat”), but just at a random point in interconcept space? Here are the kinds of things we see:

The images often seem to be a bit more diverse than those around “known concept points” (like our “cat point” above). And occasionally there’ll be a “flash” of something “representationally familiar” (perhaps like a human form) that’ll show up. But most of the time we won’t be able to say “what these images are of”. They’re of things that are somehow “statistically” like what we’ve seen, but they’re not things that are familiar enough that we’ve—at least so far—developed a way to describe them, say with words.

There’s something strangely familiar—yet unfamiliar—to many of the images in interconcept space. It’s fairly common to see pictures that seem like they’re of people:

But they’re “not quite right”. And for us as humans, being particularly attuned to faces, it’s the faces that tend to seem the most wrong—even though other parts are “wrong” as well.

And perhaps in commentary on our nature as a social species (or maybe it’s as a social media species), there’s a great tendency to see pairs or larger groups of people:

There’s also a strange preponderance of torso-only pictures—presumably the result of “fashion shots” in the training data (and, yes, with some rather wild “fashion statements”):

People are by far the most common identifiable elements. But one does sometimes see other things too:

Then there are some landscape-type scenes:

Some look fairly photographically literal, but others build up the impression of landscapes from more abstract elements:

Occasionally there are cityscape-like pictures:

And—still more rarely—indoor-like scenes:

Then there are pictures that look like they’re “exteriors” of some kind:

It’s common to see images built up from lines or dots or otherwise “impressionistically formed”:

And then there are lots of images that seem like they’re trying to be “of something”, but it’s not at all clear what that “thing” is, and whether indeed it’s something we humans would recognize, or whether instead it’s something somehow “fundamentally alien”:

It’s also quite common to see what look more like “pure patterns”—that don’t really seem like they’re “trying to be things”, but more come across like “decorative textures”:

But probably the single most common type of image is a somewhat uniform texture, formed by repeating various simple elements, though usually with “dislocations” of various kinds:

Across interconcept space there’s tremendous variety to the images we see. Many have a certain artistic quality to them—and a feeling that they are some kind of “mindful interpretation” of a perhaps mundane thing in the world, or a simple, essentially mathematical pattern. And to some extent the “mind” involved is a collective version of our human one, reflected in a neural net that has “experienced” some of the many images humans have put on the web, etc. But in some ways the mind is also a more alien one, formed from the computational structure of the neural net, with its particular features, and no doubt in some ways computationally irreducible behavior.

And indeed there are some motifs that show up repeatedly that are presumably reflections of features of the underlying structure of the neural net. The “granulated” appearance, with alternation between light and dark, for example, is presumably a consequence of the dynamics of the convolutional parts of the neural net—and analogous to the results of what amounts to iterated blurring and sharpening with a certain effective pixel scale (reminiscent, for example, of video feedback):

We can think of what we’ve done so far as exploring what a mind trained from human-like experiences can “imagine” by generalizing from those experiences. But what might a different kind of mind imagine?

As a very rough approximation, we can think of just taking the trained “mind” we’ve created, and explicitly modifying it, then seeing what it now “imagines”. Or, more specifically, we can take the neural net we have been using, and start making changes to it, and seeing what effect that has on the images it produces.

We’ll discuss later the details of how the network is set up, but suffice it to say here that it involves 391 distinct internal modules, involving altogether nearly a billion numerical weights. When the network is trained, those numerical weights are carefully tuned to achieve the results we want. But what if we just change them? We’ll still (normally) get a network that can generate images. But in some sense it’ll be “thinking differently”—so potentially the images will be different.

So as a very coarse first experiment—reminiscent of many that are done in biology—let’s just “knock out” each successive module in turn, setting all its weights to zero. If we ask the resulting network to generate a picture of “a cat in a party hat”, here’s what we now get:
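The knockout itself is mechanically trivial. As a sketch, assuming (hypothetically) that the network were just a chain of residually connected modules, each represented by one weight matrix, in Python/NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in: 5 "modules", each a weight matrix applied with a tanh
# nonlinearity and a residual (skip) connection. The real network has 391
# modules and nearly a billion weights; the knockout works the same way.
modules = [0.5 * rng.standard_normal((8, 8)) for _ in range(5)]

def run(modules, x):
    for w in modules:
        x = x + np.tanh(w @ x)   # residual connection
    return x

def knock_out(modules, i):
    """A copy of the network with module i's weights all set to zero."""
    edited = [w.copy() for w in modules]
    edited[i][:] = 0.0
    return edited

x = rng.standard_normal(8)
baseline = run(modules, x)
for i in range(len(modules)):
    out = run(knock_out(modules, i), x)
    print(i, round(float(np.linalg.norm(out - baseline)), 3))
```

Because of the skip connections, a zeroed module just passes its input through unchanged, which suggests why single knockouts often leave the output only mildly altered.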

Let’s look at these results in a bit more detail. In quite a few cases, zeroing out a single module doesn’t make much of a difference; for example, it might basically only change the facial expression of the cat:

But it can also more fundamentally change the cat (and its hat):

It can change the configuration or position of the cat (and, yes, some of those paws are not anatomically correct):

Zeroing out other modules can in effect change the “rendering” of the cat:

But in other cases things can get much more mixed up, and difficult for us to parse:

Sometimes there’s clearly a cat there, but its presentation is at best odd:

And sometimes we get images that have definite structure, but don’t seem to have anything to do with cats:

Then there are cases where we basically just get “noise”, albeit with things superimposed:

But—much like in neurophysiology—there are some modules (like the very first and last ones in our original list) where zeroing them out basically makes the system not work at all, and just generate “pure random noise”.

As we’ll discuss below, the whole neural net that we’re using has a fairly complex internal structure—for example, with a few fundamentally different kinds of modules. But here’s a sample of what happens if one zeros out modules at different places in the network—and what we see is that for the most part there’s no obvious correlation between where the module is, and what effect zeroing it out will have:

So far, we’ve just looked at what happens if we zero out a single module at a time. Here are some randomly chosen examples of what happens if one zeros out successively more modules (one might call this a “HAL experiment” in remembrance of the fate of the fictional HAL AI in the movie *2001*):

And basically once the “catness” of the images is lost, things become more and more alien from there on out, ending either in apparent randomness, or sometimes barren “zeroness”.

Rather than zeroing out modules, we can instead randomize the weights in them (perhaps a bit like the effect of a tumor rather than a stroke in a brain)—but the results are usually at least qualitatively similar:

Something else we can do is just to progressively mix randomness uniformly into every weight in the network (perhaps a bit like globally “drugging” a brain). Here are three examples where in each case 0%, 1%, 2%, … of randomness was added—all “fading away” in a very similar way:

And similarly, we can progressively scale down towards zero (in 1% increments: 100%, 99%, 98%, …) all the weights in the network:

Or we can progressively increase the numerical values of the weights—eventually in some sense “blowing the mind” of the network (and going a bit “psychedelic” in the process):
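All three of these global manipulations, mixing in randomness, fading the weights toward zero, and amplifying them, amount to simple elementwise operations. A toy sketch (the 4×4 matrices here are hypothetical stand-ins for the real network’s weights):

```python
import numpy as np

rng = np.random.default_rng(2)
weights = [rng.standard_normal((4, 4)) for _ in range(3)]  # toy "network"

def mix_noise(weights, fraction, rng):
    """Blend `fraction` of fresh Gaussian noise into every weight."""
    return [(1 - fraction) * w + fraction * rng.standard_normal(w.shape)
            for w in weights]

def scale(weights, factor):
    """Uniformly scale every weight: factor < 1 fades the net toward zero,
    factor > 1 progressively 'blows its mind'."""
    return [factor * w for w in weights]

# The experiments in the text step these knobs in 1% increments:
for pct in (0.00, 0.01, 0.02):
    noisy = mix_noise(weights, pct, rng)
faded = scale(weights, 0.99)
blown = scale(weights, 1.50)
```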

We can think of what we’ve done so far as exploring some of the “natural history” of what’s out there in generative AI space—or as providing a small taste of at least one approximation to the kind of mental imagery one might encounter in alien minds. But how does this fit into a more general picture of alien minds and what they might be like?

With the concept of the ruliad we finally have a principled way to talk about alien minds—at least at a theoretical level. And the key point is that any alien mind—or, for that matter, any mind—can be thought of as “observing” or sampling the ruliad from its own particular point of view, or in effect, its own position in rulial space.

The ruliad is defined to be the entangled limit of all possible computations: a unique object with an inevitable structure. And the idea is that anything—whether one interprets it as a phenomenon or an observer—must be part of the ruliad. The key to our Physics Project is then that “observers like us” have certain general characteristics. We are computationally bounded, with “finite minds” and limited sensory input. And we have a certain coherence that comes from our belief in our persistence in time, and our consistent thread of experience. And what we then discover in our Physics Project is the rather remarkable result that from these characteristics and the general properties of the ruliad alone it’s essentially inevitable that we must perceive the universe to exhibit the fundamental physical laws it does, in particular the three big theories of twentieth-century physics: general relativity, quantum mechanics and statistical mechanics.

But what about more detailed aspects of what we perceive? Well, that will depend on more detailed aspects of us as observers, and of how our minds are set up. And in a sense, each different possible mind can be thought of as existing in a certain place in rulial space. Different human minds are mostly close in rulial space, animal minds further away, and more alien minds still further. But how can we characterize what these minds are “thinking about”, or how these minds “perceive things”?

From inside our own minds we can form a sense of what we perceive. But we don’t really have good ways to reliably probe what another mind perceives. But what about what another mind imagines? Well, that’s where what we’ve been doing here comes in. Because with generative AI we’ve got a mechanism for exposing the “mental imagery” of an “AI mind”.

We could consider doing this with words and text, say with an LLM. But for us humans images have a certain fluidity that text does not. Our eyes and brains can perfectly well “see” and absorb images even if we don’t “understand” them. But it’s very difficult for us to absorb text that we don’t “understand”; it usually tends to seem just like a kind of “word soup”.

But, OK, so we generate “mental imagery” from “minds” that have been “made alien” by various modifications. How come we humans can understand anything such minds make? Well, it’s a bit like one person being able to understand the thoughts of another. Their brains—and minds—are built differently. And their “internal view” of things will inevitably be different. But the crucial idea—that’s for example central to language—is that it’s possible to “package up” thoughts into something that can be “transported” to another mind. Whatever some particular internal thought might be, by the time we can express it with words in a language, it’s possible to communicate it to another mind that will “unpack” it into different internal thoughts.

It’s a nontrivial fact of physics that “pure motion” in physical space is possible; in other words, that an “object” can be moved “without change” from one place in physical space to another. And now, in a sense, we’re asking about pure motion in rulial space: can we move something “without change” from one mind at one place in rulial space to another mind at another place? In physical space, things like particles—as well as things like black holes—are the fundamental elements that are imagined to move without change. So what’s now the analog in rulial space? It seems to be concepts—as often, for example, represented by words.

So what does that mean for our exploration of generative AI “alien minds”? We can ask whether, when we move from one potentially alien mind to another, concepts are preserved. We don’t have a perfect proxy for this (though we could make a better one by appropriately training neural net classifiers). But as a first approximation this is like asking whether as we “change the mind”—or move in rulial space—we can still recognize the “concept” the mind produces. Or, in other words, if we start with a “mind” that’s generating a cat in a party hat, will we still recognize the concepts of cat or hat in what a “modified mind” produces?

And what we’ve seen is that sometimes we do, and sometimes we don’t. And for example when we looked at “cat island” we saw a certain boundary beyond which we could no longer recognize “catness” in the image that was produced. And by studying things like cat island (and particularly its analogs when not just the “prompt” but also the underlying neural net is changed) it should be possible to map out how far concepts “extend” across alien minds.

It’s also possible to think about a kind of inverse question: just what is the extent of a mind in rulial space? Or, in other words, what range of points of view, ultimately about the ruliad, can it hold? Will it be “narrow-minded”, able to think only in particular ways, with particular concepts? Or will it be more “broad-minded”, encompassing more ways of thinking, with more concepts?

In a sense the whole arc of the intellectual development of our civilization can be thought of as corresponding to an expansion in rulial space: with us progressively being able to think in new ways, and about new things. And as we expand in rulial space, we are in effect encompassing more of what we previously would have had to consider the domain of an alien mind.

When we look at images produced by generative AI away from the specifics of human experience—say in interconcept space, or with modified rules of generation—we may at first be able to make little of them. Like inkblots or arrangements of stars, we’ll often find ourselves wanting to say that what we see looks like this or that thing we know.

But the real question is whether we can devise some way of describing what we see that allows us to build thoughts on what we see, or “reason” about it. And what’s very typical is that we manage to do this when we come up with a general “symbolic description” of what we see, say captured with words in natural language (or, now, computational language). Before we have those words, or that symbolic description, we’ll tend just not to absorb what we see.

And so, for example, even though nested patterns have always existed in nature, and were even explicitly created by mosaic artisans in the early 1200s, they seem to have never been systematically noticed or discussed at all until the latter part of the 20th century, when finally the framework of “fractals” was developed for talking about them.

And so it may be with many of the forms we’ve seen here. As of today, we have no name for them, no systematic framework for thinking about them, and no reason to view them as important. But particularly if the things we do repeatedly show us such forms, we’ll eventually come up with names for them, and start incorporating them into the domain that our minds cover.

And in a sense what we’ve done here can be thought of as showing us a preview of what’s out there in rulial space, in what’s currently the domain of alien minds. In the general exploration of ruliology, and the investigation of what arbitrary simple programs in the computational universe do, we’re able to jump far across the ruliad. But it’s typical that what we see is not something we can connect to things we’re familiar with. In what we’re doing here, we’re moving only much smaller distances in rulial space. We’re starting from generative AI that’s closely aligned with current human development—having been trained from images that we humans have put on the web, etc. But then we’re making small changes to our “AI mind”, and looking at what it now generates.

What we see is often surprising. But it’s still close enough to where we “currently are” in rulial space that we can—at least to some extent—absorb and reason about what we’re seeing. Still, the images often don’t “make sense” to us. And, yes, quite possibly the AI has invented something that has a rich and “meaningful” inner structure. But it’s just that we don’t (yet) have a way to talk about it—and if we did, it would immediately “make perfect sense” to us.

So if we see something we don’t understand, can we just “train a translator”? At some level the answer must be yes. Because the Principle of Computational Equivalence implies that ultimately there’s a fundamental uniformity to the ruliad. But the problem is that the translator is likely to have to do an irreducible amount of computational work. And so it won’t be implementable by a “mind like ours”. Still, even though we can’t create a “general translator” we can expect that certain features of what we see will still be translatable—in effect by exploiting certain pockets of computational reducibility that must necessarily exist even when the system as a whole is full of computational irreducibility. And operationally what this means in our case is that the AI may in effect have found certain regularities or patterns that we don’t happen to have noticed but that are useful in exploring further from the “current human point” in rulial space.

It’s very challenging to get an intuitive understanding of what rulial space is like. But the approach we’ve taken here is for me a promising first effort in “humanizing” rulial space, and seeing just how we might be able to relate to what is so far the domain of alien minds.

In the main part of this piece, we’ve mostly just talked about what generative AI does, not how it works inside. Here I’ll go a little deeper into what’s inside the particular type of generative AI system that I’ve used in my explorations. It’s a method called stable diffusion, and its operation is in many ways both clever and surprising. As it’s implemented today it’s steeped in fairly complicated engineering details. To what extent these will ultimately be necessary isn’t clear. But in any case here I’ll mostly concentrate on general principles, and on giving a broad outline of how generative AI can be used to produce images.

At the core of generative AI is the ability to produce things of some particular type that “follow the patterns of” known things of that type. So, for example, large language models (LLMs) are intended to produce text that “follows the patterns” of text written by humans, say on the web. And generative AI systems for images are similarly intended to produce images that “follow the patterns” of images put on the web, etc.

But what kinds of patterns exist in typical images, say on the web? Here are some examples of “typical images”—scaled down to 32×32 pixels and taken from a standard set of 60,000 images:

And as a very first thing, we can ask what colors show up in these images. They’re not uniform in RGB space:

But what about the positions of different colors? Adjusting to accentuate color differences, the “average image” turns out to have a curious “HAL’s eye” look (presumably with blue for sky at the top, and brown for earth at the bottom):

But just picking pixels separately—even with the color distribution inferred from actual images—won’t produce images that in any way look “natural” or “realistic”:

And the immediate issue is that the pixels aren’t really independent; most pixels in most images are correlated in color with nearby pixels. And in a first approximation one can capture this for example by fitting the list of colors of all the pixels to a multivariate Gaussian distribution with a covariance matrix that represents their correlation. Sampling from this distribution gives images like these—that indeed look somehow “statistically natural”, even if there isn’t appropriate detailed structure in them:
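Here is what that Gaussian fit looks like in miniature, with random arrays standing in for the real training images (an actual version would use the 32×32 pixel data from the image set above):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in "training set": n tiny 8x8 RGB images flattened to vectors.
n, h, w = 500, 8, 8
images = rng.random((n, h * w * 3))

# Fit a multivariate Gaussian to the flattened pixel vectors ...
mean = images.mean(axis=0)
cov = np.cov(images, rowvar=False)

# ... and sample new "images" from it. These share the pairwise pixel
# correlations of the training set but no finer structure:
samples = rng.multivariate_normal(mean, cov, size=4)
print(samples.shape)  # (4, 192)
```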

So, OK, how can one do better? The basic idea is to use neural nets, which can in effect encode detailed long-range connections between pixels. In some way it’s similar to what’s done in LLMs like ChatGPT—where one has to deal with long-range connections between words in text. But for images it’s structurally a bit more difficult, because in some sense one has to “consistently fit together 2D patches” rather than just progressively extend a 1D sequence.

And the typical way this is done at first seems a bit bizarre. The basic idea is to start with a random array of pixels—corresponding in effect to “pure noise”—and then progressively to “reduce the noise” to end up with a “reasonable image” that follows the patterns of typical images, all the while guided by some prompt that says what one wants the “reasonable image” to be of.

How does one go from randomness to definite “reasonable” things? The key is to use the notion of attractors. In a very simple case, one might have a system—like this “mechanical” example—where from any “randomly chosen” initial condition one always evolves to one of (here) two definite (fixed-point) attractors:
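The phenomenon is easy to reproduce in one dimension. Gradient flow on a double-well potential is one minimal mechanical analog (a sketch, not the system pictured): almost every starting point settles into one of two fixed-point attractors, depending only on which side it starts on.

```python
def settle(x, steps=200, dt=0.1):
    """Gradient flow on the double-well potential V(x) = (x^2 - 1)^2 / 4.
    Almost every starting point ends at one of the two attractors,
    x = -1 or x = +1, determined by its initial sign."""
    for _ in range(steps):
        x = x - dt * (x**3 - x)
    return x

print(round(settle(0.3), 6))   # 1.0
print(round(settle(-2.5), 6))  # -1.0
```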

One has something similar in a neural net that’s for example trained to recognize digits:

Regardless of exactly how each digit is written, or noise that gets added to it, the network will take this input and evolve to an attractor corresponding to a digit.

Sometimes there can be lots of attractors. Like in this (“class 2”) cellular automaton evolving down the page, many different initial conditions can lead to the same attractor, but there are many possible attractors, corresponding to different final patterns of stripes:
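A concrete miniature of this: elementary cellular automaton rule 4 is a simple “class 2” rule, in which isolated black cells survive and longer runs die out, so every random initial condition quickly freezes into some fixed pattern of stripes. A sketch in Python (the rule numbering follows the standard elementary-cellular-automaton convention):

```python
import random

def step(cells, rule):
    """One step of an elementary cellular automaton (cyclic boundaries)."""
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2
                      + cells[(i + 1) % n])) & 1
            for i in range(n)]

def evolve_to_attractor(cells, rule, max_steps=100):
    """Iterate until the pattern stops changing (a fixed-point attractor)."""
    for _ in range(max_steps):
        new = step(cells, rule)
        if new == cells:
            return new
        cells = new
    return cells

random.seed(0)
rule = 4   # "class 2": only the neighborhood 0,1,0 produces a black cell
init = [random.randint(0, 1) for _ in range(20)]
final = evolve_to_attractor(init, rule)
print(final)   # a frozen pattern of isolated stripes
```

Different random `init` values reach different frozen patterns, but many distinct initial conditions converge to the same one, which is exactly the many-attractors picture described above.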

The same can be true for example in 2D cellular automata, where now the attractors can be thought of as being different “images” with structure determined by the cellular automaton rule:

But what if one wants to arrange to have particular images as attractors? Here’s where the somewhat surprising idea of “stable diffusion” can be used. Imagine we start with two possible images, and , and then in a series of steps progressively add noise to them:
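The forward half of this, progressively noising an image, is straightforward. Here is a sketch in Python/NumPy, with a random 8×8 array standing in for one of the two starting images, and a simple fixed noise schedule (real systems use carefully tuned schedules):

```python
import numpy as np

rng = np.random.default_rng(4)

def noising_sequence(image, steps=10, beta=0.15, rng=rng):
    """Progressively add Gaussian noise to an image, as in the forward
    ("diffusion") half of the setup. Each step mixes in a little noise
    while shrinking the signal, so the image approaches pure noise."""
    seq = [image]
    x = image
    for _ in range(steps):
        x = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
        seq.append(x)
    return seq

image = rng.random((8, 8))   # toy stand-in for one of the starting images
seq = noising_sequence(image)
# The correlation with the original image decays toward zero:
for t in (0, 5, 10):
    c = np.corrcoef(seq[0].ravel(), seq[t].ravel())[0, 1]
    print(t, round(float(c), 2))
```

The denoising network described next is trained to run this sequence backwards, one step at a time.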

Here’s the bizarre thing we now want to do: train a neural net to take the image we get at a particular step, and “go backwards”, removing noise from it. The neural net we’ll use for this is somewhat complicated, with “convolutional” pieces that basically operate on blocks of nearby pixels, and “transformers” that get applied with certain weights to more distant pixels. Schematically in Wolfram Language the network looks at a high level like this:

And roughly what it’s doing is to make an informationally compressed version of each image, and then to expand it again (through what is usually called a “U-net” neural net). We start with an untrained version of this network (say just randomly initialized). Then we feed it a couple of million examples of noisy pictures of and , and the denoised outputs we want in each case.

Then if we take the trained neural net and successively apply it, for example, to a “noised ”, the net will “correctly” determine that the “denoised” version is a “pure ”:

But what if we apply this network to pure noise? The network has been set up to always eventually evolve either to the “” attractor or the “” attractor. But which it “chooses” in a particular case will depend on the details of the initial noise—so in effect the network will seem to be picking at random to “fish” either “” or “” out of the noise:

How does this apply to our original goal of generating images “like” those found for example on the web? Well, instead of just training our “denoising” (or “inverse diffusion”) network on a couple of “target” images, let’s imagine we train it on billions of images from the web. And let’s also assume that our network isn’t big enough to store all those images in any kind of explicit way.

In the abstract it’s not clear what the network will do. But the remarkable empirical fact is that it seems to manage to successfully generate (“from noise”) images that “follow the general patterns” of the images it was trained from. There isn’t any clear way to “formally validate” this success. It’s really just a matter of human perception: to us the images (generally) “look right”.

It could be that with a different (alien?) system of perception we’d immediately see “something wrong” with the images. But for purposes of human perception, the neural net seems to give “reasonable-looking” images—perhaps not least because the neural net operates at least approximately the way our brains and our processes of perception seem to operate.

We’ve described how a denoising neural net seems to be able to start from some configuration of random noise and generate a “reasonable-looking” image. And from any particular configuration of noise, a given neural net will always generate the same image. But there’s no way to tell what that image will be of; it’s just something to empirically explore, as we did above.

But what if we want to “guide” the neural net to generate an image that we’d describe as being of a definite thing, like “a cat in a party hat”? We could imagine “continually checking” whether the image we’re generating would be recognized by a neural net as being of what we wanted. And conceptually that’s what we can do. But we also need a way to “redirect” the image generation if it’s “not going in the right direction”. And a convenient way to do this is to mix a “description of what we want” right into the denoising training process. In particular, when we’re training the net to “recover” a particular image, we mix a description of that image right in alongside the image itself.

And here we can make use of a key feature of neural nets: that ultimately they operate on arrays of (real) numbers. So whether they’re dealing with images composed of pixels, or text composed of words, all these things eventually have to be “ground up” into arrays of real numbers. And when a neural net is trained, what it’s ultimately “learning” is just how to appropriately transform these “disembodied” arrays of numbers.

There’s a fairly natural way to generate an array of numbers from an image: just take the triples of red, green and blue intensity values for each pixel. (Yes, we could pick a different detailed representation, but it’s not likely to matter—because the neural net can always effectively “learn a conversion”.) But what about a textual description, like “a cat in a party hat”?
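The pixel case is simple enough to show concretely. Here’s a minimal NumPy sketch (with a made-up 2×2 image) of “grinding up” an image into a plain array of real numbers:

```python
import numpy as np

# A made-up 2x2 RGB image: each pixel is a (red, green, blue) triple in [0, 1]
image = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                  [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]])

# Flatten it into the kind of "disembodied" array of numbers a neural net consumes
flat = image.reshape(-1)   # 2 x 2 pixels x 3 channels = 12 numbers
```

Any other consistent layout of the same numbers would do just as well, since the network can always effectively “learn a conversion”.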

We need to find a way to encode text as an array of numbers. And actually LLMs face the same issue, and we can solve it in basically the same way here as LLMs do. In the end what we want is to derive from any piece of text a “feature vector” consisting of an array of numbers that provide some kind of representation of the “effective meaning” of the text, or at least the “effective meaning” relevant to describing images.

Let’s say we train a neural net to reproduce associations between images and captions, as found for example on the web. If we feed this neural net an image, it’ll try to generate a caption for the image. If we feed the neural net a caption, it’s not realistic for it to generate a whole image. But we can look at the innards of the neural net and see the array of numbers it derived from the caption—and then use this as our feature vector. And the idea is that because captions that “mean the same thing” should be associated in the training set with “the same kind of images”, they should have similar feature vectors.

So now let’s say we want to generate a picture of a cat in a party hat. First we find the feature vector associated with the text “a cat in a party hat”. Then this is what we keep mixing in at each stage of denoising to guide the denoising process, and end up with an image that the image captioning network will identify as “a cat in a party hat”.

The most direct way to do “denoising” is to operate directly on the pixels in an image. But it turns out there’s a considerably more efficient approach, which operates not on pixels but on “features” of the image—or, more specifically, on a feature vector which describes an image.

In a “raw image” presented in terms of pixels, there’s a lot of redundancy—which is why, for example, image formats like JPEG and PNG manage to compress raw images so much without even noticeably modifying them for purposes of typical human perception. But with neural nets it’s possible to do much greater compression, particularly if all we want to do is to preserve the “meaning” of an image, without worrying about its precise details.

And in fact as part of training a neural net to associate images with captions, we can derive a “latent representation” of images, or in effect a feature vector that captures the “important features” of the image. And then we can do everything we’ve discussed so far directly on this latent representation—decoding it only at the end into the actual pixel representation of the image.

So what does it look like to build up the latent representation of an image? With the particular setup we’re using here, it turns out that the feature vector in the latent representation still preserves the basic spatial arrangement of the image. The “latent pixels” are much coarser than the “visible” ones, and happen to be characterized by 4 numbers rather than the 3 for RGB. But we can decode things to see the “denoising” process happening in terms of “latent pixels”:
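To get a sense of the compression involved, here’s the arithmetic with hypothetical but typical latent-diffusion numbers (a 512×512 visible image with 3 RGB channels, latent pixels on a 64×64 grid with 4 channels; the actual sizes in the setup above may differ):

```python
visible = 512 * 512 * 3   # numbers in the raw pixel representation
latent = 64 * 64 * 4      # numbers in the latent representation
ratio = visible / latent  # the latent array is 48x smaller
```

So the “denoising” iterations run on an array dozens of times smaller than the final image.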

And then we can take the latent representation we get, and once again use a trained neural net to fill in a “decoding” of this in terms of actual pixels, getting out our final generated image.

Generative AI systems work by having attractors that are carefully constructed through training so that they correspond to “reasonable outputs”. A large part of what we’ve done above is to study what happens to these attractors when we change the internal parameters of the system (neural net weights, etc.). What we’ve seen has been complicated, and, indeed, often quite “alien looking”. But is there perhaps a simpler setup in which we can see similar core phenomena?

By the time we’re thinking about creating attractors for realistic images, etc. it’s inevitable that things are going to be complicated. But what if we look at systems with much simpler setups? For example, consider a dynamical system whose state is characterized just by a single number—such as an iterated map on the interval, like *x* → *a* *x* (1 – *x*).

Starting from a uniform array of possible *x* values, we can show down the page which values of *x* are achieved at successive iterations:

For *a* = 2.9, the system evolves from any initial value to a single attractor, which consists of a single fixed final value. But if we change the “internal parameter” *a* to 3.1, we now get two distinct final values. And at the “bifurcation point” *a* = 3 there’s a sudden change from one to two distinct final values. And indeed in our generative AI system it’s fairly common to see similar discontinuous changes in behavior even when an internal parameter is continuously changed.

As another example—slightly closer to image generation—consider (as above) a 1D cellular automaton that exhibits class 2 behavior, and evolves from any initial state to some fixed final state that one can think of as an attractor for the system:

Which attractor one reaches depends on the initial condition one starts from. But—in analogy to our generative AI system—we can think of all the attractors as being “reasonable outputs” for the system. But now what happens if we change the parameters of the system, or in this case, the cellular automaton rule? In particular, what will happen to the attractors? It’s like what we did above in changing weights in a neural net—but a lot simpler.

The particular rule we’re using here has 4 possible colors for each cell, and is defined by just 64 discrete values from 0 to 3. So let’s say we randomly change just one of those values at a time. Here are some examples of what we get, always starting from the same initial condition as in the first picture above:

With a couple of exceptions these seem to produce results that are at least “roughly similar” to what we got without changing the rule. In analogy to what we did above, the cat might have changed, but it’s still more or less a cat. But let’s now try “progressive randomization”, where we modify successively more values in the definition of the rule. For a while we again get “roughly similar” results, but then—much like in our cat examples above—things eventually “fall apart” and we get “much more random” results:
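The mechanics of this experiment are simple enough to sketch in Python. The specific class 2 rule from the pictures above isn’t reproduced here—a randomly chosen rule table just illustrates the setup: a 4-color, nearest-neighbor cellular automaton defined by a table of 4³ = 64 values from 0 to 3, with single entries randomly changed.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4                                    # number of colors
table = rng.integers(0, K, size=K**3)    # the 64 values defining the rule

def step(cells, table):
    """One step of a 4-color, nearest-neighbor cellular automaton."""
    left = np.roll(cells, 1)
    right = np.roll(cells, -1)
    return table[left * K * K + cells * K + right]

def evolve(cells, table, steps=50):
    for _ in range(steps):
        cells = step(cells, table)
    return cells

init = rng.integers(0, K, size=100)
original = evolve(init, table)

# "Mutate" the rule: randomly change just one of the 64 table entries
mutated = table.copy()
mutated[rng.integers(64)] = rng.integers(K)
perturbed = evolve(init, mutated)
```

Comparing `original` and `perturbed` runs for many single-entry mutations—and then for progressively more of them—is the analog of the “progressive randomization” of neural net weights above.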

One important difference between “stable diffusion” and cellular automata is that while in cellular automata the evolution can lead to continued change forever, in stable diffusion there’s an annealing process that always makes successive steps “progressively smaller”—essentially forcing a fixed point to be reached.

But notwithstanding this, we can try to get a closer analogy to image generation by looking (again as above) at 2D cellular automata. Here’s an example of the (not-too-exciting-as-images) “final states” reached from three different initial states in a particular rule:

And here’s what happens if one progressively changes the rule:

At first one still gets “reasonable-according-to-the-original-rule” final states. But if one changes the rule further, things get “more alien”, until they look to us quite random.

In changing the rule, one is in effect “moving in rulial space”. And by looking at how this works in cellular automata, one can get a certain amount of intuition. (Changes to the rule in a cellular automaton seem a bit like “changes to the genotype” in biology—with the behavior of the cellular automaton representing the corresponding “phenotype”.) But seeing how “rulial motion” works in a generative AI that’s been trained on “human-style input” gives a more accessible and humanized picture of what’s going on, even if it seems still further out of reach in terms of any kind of traditional explicit formalization.

This project is the first I’ve been able to do with our new Wolfram Institute. I thank our Fourmilab Fellow Nik Murzin and Ruliad Fellow Richard Assar for help. I also thank Jeff Arle, Nicolò Monti, Philip Rosedale and the Wolfram Research Machine Learning Group.

Today we’re launching Version 13.3 of Wolfram Language and Mathematica—both available immediately on desktop and cloud. It’s only been 196 days since we released Version 13.2, but there’s a lot that’s new, not least a whole subsystem around LLMs.

Last Friday (June 23) we celebrated 35 years since Version 1.0 of Mathematica (and what’s now Wolfram Language). And to me it’s incredible how far we’ve come in these 35 years—yet how consistent we’ve been in our mission and goals, and how well we’ve been able to just keep building on the foundations we created all those years ago.

And when it comes to what’s now Wolfram Language, there’s a wonderful timelessness to it. We’ve worked very hard to make its design as clean and coherent as possible—and to make it a timeless way to elegantly represent computation and everything that can be described through it.

Last Friday I fired up Version 1 on an old Mac SE/30 computer (with 2.5 megabytes of memory), and it was a thrill to see functions like `Plot` and `NestList` work just as they would today—albeit a lot slower. And it was wonderful to be able to take (on a floppy disk) the notebook I created with Version 1 and have it immediately come to life on a modern computer.

But even as we’ve maintained compatibility over all these years, the scope of our system has grown out of all recognition—with everything in Version 1 now occupying but a small sliver of the whole range of functionality of the modern Wolfram Language:

So much about Mathematica was ahead of its time in 1988, and perhaps even more about Mathematica and the Wolfram Language is ahead of its time today, 35 years later. From the whole idea of symbolic programming, to the concept of notebooks, the universal applicability of symbolic expressions, the notion of computational knowledge, and concepts like instant APIs and so much more, we’ve been energetically continuing to push the frontier over all these years.

Our long-term objective has been to build a full-scale computational language that can represent everything computationally, in a way that’s effective for both computers and humans. And now—in 2023—there’s a new significance to this. Because with the advent of LLMs our language has become a unique bridge between humans, AIs and computation.

The attributes that make Wolfram Language easy for humans to write, yet rich in expressive power, also make it ideal for LLMs to write. And—unlike traditional programming languages—Wolfram Language is intended not only for humans to write, but also to read and think in. So it becomes the medium through which humans can confirm or correct what LLMs do, to deliver computational language code that can be confidently assembled into a larger system.

The Wolfram Language wasn’t originally designed with the recent success of LLMs in mind. But I think it’s a tribute to the strength of its design that it now fits so well with LLMs—with so much synergy. The Wolfram Language is important to LLMs—in providing a way to access computation and computational knowledge from within the LLM. But LLMs are also important to Wolfram Language—in providing a rich linguistic interface to the language.

We’ve always built—and deployed—Wolfram Language so it can be accessible to as many people as possible. But the advent of LLMs—and our new Chat Notebooks—opens up Wolfram Language to vastly more people. Wolfram|Alpha lets anyone use natural language—without prior knowledge—to get questions answered. Now with LLMs it’s possible to use natural language to start defining potentially elaborate computations.

As soon as you’ve formulated your thoughts in computational terms, you can immediately “explain them to an LLM”, and have it produce precise Wolfram Language code. Often when you look at that code you’ll realize you didn’t explain yourself quite right, and either the LLM or you can tighten up your code. But anyone—without any prior knowledge—can now get started producing serious Wolfram Language code. And that’s very important in seeing Wolfram Language realize its potential to drive “computational X” for the widest possible range of fields X.

But while LLMs are “the biggest single story” in Version 13.3, there’s a lot else in Version 13.3 too—delivering the latest from our long-term research and development pipeline. So, yes, in Version 13.3 there’s new functionality not only in LLMs but also in many “classic” areas—as well as in new areas having nothing to do with LLMs.

Across the 35 years since Version 1 we’ve been able to continue accelerating our research and development process, year by year building on the functionality and automation we’ve created. And we’ve also continually honed our actual process of research and development—for the past 5 years sharing our design meetings on open livestreams.

Version 13.3 is—from its name—an “incremental release”. But—particularly with its new LLM functionality—it continues our tradition of delivering a long list of important advances and updates, even in incremental releases.

LLMs make possible many important new things in the Wolfram Language. And since I’ve been discussing these in a series of recent posts, I’ll give only a fairly short summary here. More details are in the other posts, both ones that have appeared, and ones that will appear soon.

To ensure you have the latest Chat Notebook functionality installed and available, use:

The most immediately visible LLM tech in Version 13.3 is Chat Notebooks. Go to `File` > `New` > `Chat-Enabled Notebook`, then press `'` (quote) to get a new chat cell:

You might not like some details of what got done (do you really want those boldface labels?) but I consider this pretty impressive. And it’s a great example of using an LLM as a “linguistic interface” with common sense, that can generate precise computational language, which can then be run to get a result.

This is all very new technology, so we don’t yet know what patterns of usage will work best. But I think it’s going to go like this. First, you have to think computationally about whatever you’re trying to do. Then you tell it to the LLM, and it’ll produce Wolfram Language code that represents what it thinks you want to do. You might just run that code (or the Chat Notebook will do it for you), and see if it produces what you want. Or you might read the code, and see if it’s what you want. But either way, you’ll be using computational language—Wolfram Language—as the medium to formalize and express what you’re trying to do.

When you’re doing something you’re familiar with, it’ll almost always be faster and better to think directly in Wolfram Language, and just enter the computational language code you want. But if you’re exploring something new, or just getting started on something, the LLM is likely to be a really valuable way to “get you to first code”, and to start the process of crispening up what you want in computational terms.

If the LLM doesn’t do exactly what you want, then you can tell it what it did wrong, and it’ll try to correct it—though sometimes you can end up doing a lot of explaining and having quite a long dialog (and, yes, it’s often vastly easier just to type Wolfram Language code yourself):

Sometimes the LLM will notice for itself that something went wrong, and try changing its code, and rerunning it:

And even if it didn’t write a piece of code itself, it’s pretty good at piping up to explain what’s going on when an error is generated:

And actually it’s got a big advantage here, because “under the hood” it can look at lots of details (like stack trace, error documentation, etc.) that humans usually don’t bother with.

To support all this interaction with LLMs, there’s all kinds of new structure in the Wolfram Language. In Chat Notebooks there are chat cells, and there are chatblocks (indicated by gray bars, and generated with `~`) that delimit the range of chat cells that will be fed to the LLM when you press `shift``enter` on a new chat cell. And, by the way, the whole mechanism of cells, cell groups, etc. that we invented 36 years ago now turns out to be extremely powerful as a foundation for Chat Notebooks.

One can think of the LLM as a kind of “alternate evaluator” in the notebook. And there are various ways to set up and control it. The most immediate is in the menu associated with every chat cell and every chatblock (and also available in the notebook toolbar):

The first items here let you define the “persona” for the LLM. Is it going to act as a Code Assistant that writes code and comments on it? Or is it just going to be a Code Writer, that writes code without being wordy about it? Then there are some “fun” personas—like Wolfie and Birdnardo—that respond “with an attitude”. The `Advanced Settings` let you do things like set the underlying LLM model you want to use—and also what tools (like Wolfram Language code evaluation) you want to connect to it.

Ultimately personas are mostly just special prompts for the LLM (together, sometimes, with tools, etc.). And one of the new things we’ve recently launched to support LLMs is the Wolfram Prompt Repository:

The Prompt Repository contains several kinds of prompts. The first are personas, which are used to “style” and otherwise inform chat interactions. But then there are two other types of prompts: function prompts, and modifier prompts.

Function prompts are for getting the LLM to do something specific, like summarize a piece of text, or suggest a joke (it’s not terribly good at that). Modifier prompts are for determining how the LLM should modify its output, for example translating into a different human language, or keeping it to a certain length.

You can pull in function prompts from the repository into a Chat Notebook by using `!`, and modifier prompts using `#`. There’s also a `^` notation for saying that you want the “input” to the function prompt to be the cell above:

This is how you can access LLM functionality from within a Chat Notebook. But there’s also a whole symbolic programmatic way to access LLMs that we’ve added to the Wolfram Language. Central to this is `LLMFunction`, which acts very much like a Wolfram Language pure function, except that it gets “evaluated” not by the Wolfram Language kernel, but by an LLM:
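In spirit, `LLMFunction` is a template that gets “evaluated” by an LLM rather than the kernel. As a rough sketch of that idea—in Python rather than Wolfram Language, and with a stand-in `query_llm` where any real version would call an actual LLM API—one might write:

```python
def llm_function(template, query_llm):
    """Return a function that fills the template with its arguments and
    hands the resulting prompt to an LLM for 'evaluation'."""
    def f(*args):
        return query_llm(template.format(*args))
    return f

# With a real LLM behind query_llm, this would behave loosely like
# LLMFunction["..."] applied to arguments. Here we just use an identity
# "LLM" to show how the prompt gets assembled:
echo = llm_function("How many legs does a {} have?", lambda prompt: prompt)
```

The point is the division of labor: the template is ordinary code; only the final “evaluation” of the assembled prompt is delegated to the LLM.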

You can access a function prompt from the Prompt Repository using `LLMResourceFunction`:

There’s also a symbolic representation for chats. Here’s an empty chat:

And here now we “say something”, and the LLM responds:

There’s lots of depth to both Chat Notebooks and LLM functions—as I’ve described elsewhere. There’s `LLMExampleFunction` for getting an LLM to follow examples you give. There’s `LLMTool` for giving an LLM a way to call functions in the Wolfram Language as “tools”. And there’s `LLMSynthesize`, which provides raw access to the LLM and its text completion and other capabilities. (And controlling all of this is `$LLMEvaluator`, which defines the default LLM configuration to use, as specified by an `LLMConfiguration` object.)

I consider it rather impressive that we’ve been able to get to the level of support for LLMs that we have in Version 13.3 in less than six months (along with building things like the Wolfram Plugin for ChatGPT, and the Wolfram ChatGPT Plugin Kit). But there’s going to be more to come, with LLM functionality increasingly integrated into Wolfram Language and Notebooks, and, yes, Wolfram Language functionality increasingly integrated as a tool into LLMs.

“Find the integral of the function ___” is a typical core thing one wants to do in calculus. And in Mathematica and the Wolfram Language that’s achieved with `Integrate`. But particularly in applications of calculus, it’s common to want to ask slightly more elaborate questions, like “What’s the integral of ___ over the region ___?”, or “What’s the integral of ___ along the line ___?”

Almost a decade ago (in Version 10) we introduced a way to specify integration over regions—just by giving the region “geometrically” as the domain of the integral:

It had always been possible to write out such an integral in “standard `Integrate`” form

but the region specification is much more convenient—as well as being much more efficient to process.

Finding an integral along a line is also something that can ultimately be done in “standard `Integrate`” form. And if you have an explicit (parametric) formula for the line this is typically fairly straightforward. But if the line is specified in a geometrical way then there’s real work to do to even set up the problem in “standard `Integrate`” form. So in Version 13.3 we’re introducing the function `LineIntegrate` to automate this.

`LineIntegrate` can deal with integrating both scalar and vector functions over lines. Here’s an example where the line is just a straight line:

But `LineIntegrate` also works for lines that aren’t straight, like this parametrically specified one:

To compute the integral also requires finding the tangent vector at every point on the curve—but `LineIntegrate` automatically does that:
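The notebook outputs aren’t reproduced here, but the underlying computation for a scalar field is ∫ f(r(t)) |r′(t)| dt over the parametrization (with F·r′ replacing f |r′| in the vector case). Here’s a numerical cross-check in Python, using an example with a known answer: f(x, y) = x² + y² on the unit circle, where the integrand is 1 everywhere, so the integral is just the circumference 2π.

```python
import numpy as np

def line_integrate(f, r, dr, t0, t1, n=100000):
    """Scalar line integral  ∫ f(r(t)) |r'(t)| dt  via the midpoint rule."""
    t = np.linspace(t0, t1, n, endpoint=False) + (t1 - t0) / (2 * n)
    pts = r(t)                              # points along the curve
    speed = np.linalg.norm(dr(t), axis=0)   # |r'(t)| at each point
    return np.sum(f(*pts) * speed) * (t1 - t0) / n

val = line_integrate(lambda x, y: x**2 + y**2,
                     lambda t: np.array([np.cos(t), np.sin(t)]),
                     lambda t: np.array([-np.sin(t), np.cos(t)]),
                     0, 2 * np.pi)          # close to 2 pi
```

`LineIntegrate` automates exactly the setup that this sketch does by hand: finding the tangent vector and reducing the problem to an ordinary integral over the parameter.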

Line integrals are common in applications of calculus to physics. But perhaps even more common are surface integrals, representing for example total flux through a surface. And in Version 13.3 we’re introducing `SurfaceIntegrate`. Here’s a fairly straightforward integral of flux that goes radially outward through a sphere:
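The radially-outward-flux case has a classic closed-form check: for F(x, y, z) = (x, y, z) on the unit sphere, F·n = 1 everywhere, so the total flux equals the sphere’s surface area, 4π. A Python sketch of the numerical computation that `SurfaceIntegrate` automates (using the standard spherical parametrization):

```python
import numpy as np

def sphere_flux(F, n_theta=400, n_phi=400):
    """Flux of vector field F through the unit sphere, by a midpoint-rule
    sum over the standard (theta, phi) parametrization."""
    dth = 2 * np.pi / n_theta
    dph = np.pi / n_phi
    th = np.arange(n_theta) * dth + dth / 2
    ph = np.arange(n_phi) * dph + dph / 2
    TH, PH = np.meshgrid(th, ph)
    # On the unit sphere the outward unit normal is just the position vector
    n = np.array([np.sin(PH) * np.cos(TH),
                  np.sin(PH) * np.sin(TH),
                  np.cos(PH)])
    Fx, Fy, Fz = F(*n)
    # F . n times the area element  sin(phi) dphi dtheta
    integrand = (Fx * n[0] + Fy * n[1] + Fz * n[2]) * np.sin(PH)
    return np.sum(integrand) * dth * dph

flux = sphere_flux(lambda x, y, z: (x, y, z))   # close to 4 pi
```

As expected, the numerical flux agrees with 4π to high accuracy.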

Here’s a more complicated case: