Stephen Wolfram Blog
Stephen Wolfram's Personal Blog
Feed updated: Mon, 16 Dec 2019 14:47:55 +0000

The New World of Notebook Publishing
Thu, 24 Oct 2019 15:41:54 +0000, Stephen Wolfram

Wolfram Notebooks on the Web

We’ve been working towards it for many years, but now it’s finally here: an incredibly smooth workflow for publishing Wolfram Notebooks to the web—that makes possible a new level of interactive publishing and computation-enabled communication.

You create a Wolfram Notebook—using all the power of the Wolfram Language and the Wolfram Notebook system—on the desktop or in the cloud. Then you just press a button to publish it to the Wolfram Cloud—and immediately anyone anywhere can both read and interact with it on the web.

The new world of notebook publishing

It’s an unprecedentedly easy way to get rich, interactive, computational content onto the web. And—together with the power of the Wolfram Language as a computational language—it promises to usher in a new era of computational communication, and to be a crucial driver for the development of “computational X” fields.

When a Wolfram Notebook is published to the cloud, it’s immediately something people can read and interact with. But it’s more than that. Because if you press the Make Your Own Copy button, you’ll get your own copy of the notebook, which you can not only read and interact with, but also edit and do computation in, right on the web. And what this means is that the notebook becomes not just something you look at, but something you can immediately use and build on.

And, by the way, we’ve set it up so that anyone can make their own copy of a published notebook, and start using it; all they need is a (free) Cloud Basic account. And people with Cloud Basic accounts can even publish their own notebooks in the cloud, though if they want to store them long term they’ll have to upgrade their account. (Through the Wolfram Foundation, we’re also developing a permanent curated Notebook Archive for public-interest notebooks.)

Make Your Own Copy

There are lots of other important workflows too. On a computer, you can immediately download notebooks to your desktop, and run them there natively using the latest version of the Wolfram Player that we’ve made freely available for many years. You can also run notebooks natively on iOS devices using the Wolfram Player app. And the Wolfram Cloud app (on iOS or Android) gives you a streamlined way to make your own copy of a notebook to work with in the cloud.

Notebook workflows

You can publish a Wolfram Notebook to the cloud, and you can use it as a complete, rich webpage. But you can also embed the notebook inside an existing webpage, providing anything from a single (perhaps dynamically updated) graphic to a full interactive interface or embedded document.

And, by the way, the exact same technology that enables Wolfram Notebooks in the cloud also allows you to immediately set up Wolfram Language APIs or form interfaces, for use either directly on the web, or through client libraries in languages like Python and Java.
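As a minimal sketch of what setting up such an API can look like: the parameter name "n" and the choice of Prime are purely illustrative assumptions, and deploying requires being signed in to the Wolfram Cloud.

```wolfram
(* Hypothetical example: a public API that returns the nth prime. *)
api = CloudDeploy[
   APIFunction[{"n" -> "Integer"}, Prime[#n] &],
   Permissions -> "Public"];

(* Generate client code for calling the deployed API from Python. *)
EmbedCode[api, "Python"]
```

The same pattern with FormFunction in place of APIFunction should give a web form rather than an API endpoint.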

The Story of Notebooks

We invented notebooks in 1988 as the main interface for Mathematica Version 1.0, and over the past three decades, many millions of Wolfram Notebooks have been made. Some record ongoing work, some are exercises, and some contain discoveries small and large. Some are expositions, presentations or online books and papers. Some are interactive demonstrations. And with the emergence of the Wolfram Language as a full-scale computational language, more and more now serve as rich computational essays, communicating with unprecedented effectiveness in a mixture of human language and computational language.

Over the years, we’ve progressively polished the notebook experience with a long series of user interface innovations, adapted and optimized for successive generations of desktop systems. But what’s allowed us now to do full-scale notebook publishing on the web is that, after many years of work, we’ve managed to get a polished version of Wolfram Notebooks that runs in the cloud, much as it does on the desktop.

Create a notebook on the desktop or in the cloud, complete with all its code, hierarchical sections, interactive elements, large graphics, etc. When it’s published as a cloud notebook people will be able to visit it just like they would visit any webpage, except that it’ll automatically “come alive” and allow all sorts of interaction.

Some of that interaction will happen locally inside the web browser; some of it will automatically access servers in the cloud. But in the end—reflecting our whole hyperarchitecture approach—Wolfram Notebooks will run seamlessly across desktop, cloud and mobile. Create your content once, and let people not only read it anywhere, but also interact with it, as well as modify and compute with it.

What’s in a Notebook

When you first go to a Wolfram Notebook in the cloud it might look like an ordinary webpage. But the fact that it’s an active, computational document means there are lots of things you can immediately do with it. If you see a graphic, you’ll immediately be able to resize it. If it’s 3D, you’ll be able to rotate it too. Notebooks are typically organized in a hierarchy of cells, and you can immediately open and close groups of cells to navigate the hierarchy.

Notebook cell hierarchy

There can also be dynamic interactive content. In the Wolfram Language, functions like Manipulate automatically set up interactive user interfaces in notebooks, with sliders and so on—and these are automatically active in a published cloud notebook. Other content can be dynamic too: using functions like Dynamic you can for example dynamically pull data in real time from the Wolfram Knowledgebase or the Wolfram Data Drop or—if the user allows it—from their computer’s camera or microphone.
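A small sketch of both kinds of content, with an arbitrarily chosen function and parameter range (nothing here is specific to any particular published notebook):

```wolfram
(* A slider-controlled plot; the function and the range of a are illustrative. *)
Manipulate[
 Plot[Sin[a x], {x, 0, 2 Pi}, PlotRange -> {-1, 1}],
 {a, 1, 5}]

(* Dynamic content: the displayed date string is refreshed about once a second. *)
Dynamic[DateString[], UpdateInterval -> 1]
```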

Dynamic interactive content

When you write a computational essay you typically want people to read your Wolfram Language code, because it’s part of how you’re communicating your content. But in a Wolfram Notebook you can also use Iconize to just show an iconized version of details of your code (like, say, options for how to display graphics).

Normally when you do a computation in a Wolfram Notebook, there’ll be a succession of In[ ] and Out[ ] cells. But you can always double-click the Out[ ] cell to close the In[ ] cell, so people at first just see the output, and not the computational language code that made it.

Input and output cells

One of the great things about the Wolfram Language is how integrated and self-contained it is. And that means that it’s realistic to pick up even fragments of code from anywhere in a notebook, and expect to have them work elsewhere. In a published notebook, just click a piece of code and it’ll get copied so you can paste it into a notebook you’re creating, on the cloud or the desktop.

A great source of “ready-made” interactive content for Wolfram Notebooks is the 12,000+ interactive Demonstrations in the Wolfram Demonstrations Project. Press Copy to Clipboard and you can paste the Demonstration (together with closed cells containing its code) into any notebook.

Copy Demonstration to clipboard

Once you’ve assembled the notebook you want, you can publish it. On the desktop, go to File > Publish to Cloud. In the cloud, just press Publish. You can either specify the name for the published notebook—or you can let the system automatically pick a UUID name. But you can take any notebook—even a large one—and very quickly have a published version in the cloud.

Computational Journals

It didn’t take long after we invented notebooks back in 1988 for me to start thinking about using them to enable a new kind of computational publishing, with things like computational journals and computational books. And, indeed, even very early on, there started to be impressive examples of what could be done.

But with computation tied to the desktop, there was always a limit to what could be done. Even before the web, we invented systems for distributing notebooks as desktop files. Later, when web browsers existed, we built plugins to access desktop computation capabilities from within browsers. And already in the mid-1990s we built mechanisms for generating content through web servers from within webpages. But it was only with modern web technology and with the whole framework of the Wolfram Cloud that the kind of streamlined notebook publishing that we’re releasing today has become possible.

But given what we now have, I think there’s finally an opportunity to transform things like scientific and technical publishing—and to make them truly take advantage of the computational paradigm. Yes, there can be rich interactive diagrams, that anyone can use on the web. And, yes, things can be dynamically updated, for example based on real-time data from the Wolfram Knowledgebase or elsewhere.

But important as these things are, I think they ultimately pale in comparison to what Wolfram Notebooks can do for the usability and reproducibility of knowledge. Because a Wolfram Notebook doesn’t just give you something to read or even interact with; it can also give you everything you need to actually use—or reproduce—what it says.

Either directly within the notebook, or in the Wolfram Data Repository, or elsewhere in the cloud, there can for example be underlying data—say from observations or experiments. Then there can be code in the notebook that computes graphics or other outputs that can be derived from this data. And, yes, that code could be there just to be there—and could be hidden away in some kind of unreadable computational footnote.

But there’s something much more powerful that’s now uniquely possible with the Wolfram Language as it exists today: it’s possible to use the language not just to provide code for a computer to run, but also to express things in computational language in a way that not just computers, but also humans, can readily understand. Technical papers often use mathematical notation to succinctly express mathematical ideas. What we’ve been working toward all these years with the Wolfram Language is to provide a full-scale computational language that can also express computational ideas.

So let’s say you’ve got a technical paper that’s presented as a Wolfram Notebook, with lots of its content in the Wolfram Language. What can you do with it? You can certainly run the computational language code to make sure it produces what the paper says. But more important, you can take pieces of that computational language code and build on it, using it yourself in your own notebook, running it for different cases, modifying it, etc.

Of course, the fact that this can actually work in practice is incredibly nontrivial, and relies on a huge amount of unique development that we’ve done. Because first and foremost, it requires a coherently designed, full-scale symbolic computational language—because that’s the only way it’s realistic to be able to take even small fragments of code and have them work on their own, or in different situations. But there’s more too: it’s also critical that code that works now goes on working in the future, and with the design discipline we’ve had in the Wolfram Language we have an impressive history of compatibility spanning more than 30 years.

Back in the 1970s when I started writing technical papers, they typically began as handwritten documents. Later they were typed on a typewriter. Then when a journal was going to publish them, they would get copyedited and typeset, before being printed. It was a laborious—and inevitably somewhat expensive—process.

By the 1980s, personal computers with word processors and typesetting systems were becoming common—and pretty soon journals could expect “camera-ready” electronic versions of papers. (As it happens, in 1986 I started what may have been the first journal to routinely accept such things.)

And as the technology improved, the quality of what an author could readily make and what a publisher could produce in a fully typeset journal gradually converged. That left branding and selectivity as the journal’s primary role, and for many people called its value into question.

But for computational journals it’s a new story. Because if a paper has computational language code in it, there’s the immediate question of whether the code actually runs, and runs correctly. It’s a little like the old process of copyediting a paper so it could be typeset. There’s real human work—and understanding—that’s needed to make sure the code runs correctly. The good news is that one can use methods from software quality assurance, now enhanced by things like modern machine learning. But there’s still real work to be done—and as a result there’s real value to be added by the process of “official publication” in a computational journal, and there’s a real reason to actually have a computational journal as an organized, potentially commercial, thing.

We’ve been doing review and curation of submissions to the Wolfram Demonstrations Project for a dozen years now. And, yes, it takes work. But the result is that we can be confident that the Demonstrations we publish actually run, and will go on doing so. For the Wolfram Data Repository we also have a review process, there to ensure that data is computable at an appropriate level.

One day there’ll surely be “first-run” computational journals, where new results are routinely reported through computational essays. But even before that, we can expect ancillary computational journals, that provide genuine “computation-backed” and “data-backed” publication. There just hasn’t been the technology to make this work properly in the past. Now, with the Wolfram Language, and the new streamlined web publishing of Wolfram Notebooks, everything that’s needed finally exists.

Changing the Way I Work

It’s always a sign that something is important when it immediately changes the way one works. And that’s certainly something that’s happened for me with notebook publishing.

I might give a talk where I build up a notebook, say doing a live experiment or a demonstration. And then at the end of the talk, I’ll do something new: I’ll publish the notebook to the cloud (either by pressing the button or using CloudPublish). Then I’ll make a QR code of the notebook URL (say using BarcodeImage), and show it big on the screen. People in the audience can then hold up their phones to read the QR code—and then just click the URL, and immediately be able to start using my notebook in the Wolfram Cloud on their phones.
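In code, that end-of-talk step might look roughly like the following. This is a sketch that assumes it is evaluated inside the notebook being published, with a Wolfram Cloud connection available; the variable name published is just for illustration.

```wolfram
(* Publish the notebook this code is evaluated in; the result is a CloudObject. *)
published = CloudPublish[EvaluationNotebook[]];

(* The first element of a CloudObject is its URL string; encode it as a QR code. *)
BarcodeImage[First[published], "QR"]
```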

I can tell that notebook publishing is getting me to write more, because now I have a convenient way to distribute what I write. I’ll often do computational explorations of one thing or another. And in the past, I’d just store the notebooks I made in my filesystem (and, yes, over 30+ years I’ve built up a huge number). But now it’s incredibly fast to add some text to turn the notebooks into computational essays—that I can immediately publish to the cloud, so anyone can access them.

Sometimes I’ll put a link to the published notebook in a post like this; sometimes I’ll do something like tweet it. But the point is that I now have a very streamlined way to give people direct access to computational work I do, in a form that they can immediately interact with, and build on.

From a technical development point of view, the path to where we are today has been a long and complex one, involving many significant achievements in software engineering. But the result is something conceptually clear and simple, though extremely powerful—that I think is going to enable a major new level of computation-informed communication: a new world of notebook publishing.

More about Wolfram Notebooks:

Wolfram Notebooks Overview »
Wolfram Notebooks Interactive Course »

Just Published: Adventures of a Computational Explorer
Wed, 16 Oct 2019 15:32:58 +0000, Stephen Wolfram

Today my latest book is published: Adventures of a Computational Explorer.

From the preface:

“You work so hard… but what do you do for fun?” people will ask me. Well, the fact is that I’ve tried to set up my life so that the things I work on are things I find fun. Most of those things are aligned with big initiatives of mine, and with products and companies and scientific theories that I’ve built over decades. But sometimes I work on things that just come up, and that for one reason or another I find interesting and fun.

This book is a collection of pieces I’ve written over the past dozen years on some of these things, and the adventures I’ve had around them. Most of the pieces I wrote in response to some particular situation or event. Their topics are diverse. But it’s remarkable how connected they end up being. And at some level all of them reflect the paradigm for thinking that has defined much of my life.

It all centers around the idea of computation, and the generality of abstraction to which it leads. Whether I’m thinking about science, or technology, or philosophy, or art, the computational paradigm provides both an overall framework and specific facts that inform my thinking. And in a sense this book reflects the breadth of applicability of this computational paradigm.

But I suppose it also reflects something else that I’ve long cultivated in myself: a willingness and an interest in applying my ways of thinking to pretty much any topic. I sometimes imagine that I will have nothing much to add to some particular topic. But it’s remarkable how often the computational paradigm—and my way of thinking about it—ends up providing a new and different insight, or an unexpected way forward.

I often urge people to “keep their thinking apparatus engaged” even when they’re faced with issues that don’t specifically seem to be in their domains of expertise. And I make a point of doing this myself. It helps that the computational paradigm is so broad. But even at a much more specific level I’m continually amazed by how much the things I’ve learned from science or language design or technology development or business actually do end up connecting to the issues that come up.

If there’s one thing that I hope comes through from the pieces in this book it’s how much fun it can be to figure things out, and to dive deep into understanding particular topics and questions. Sometimes there’s a simple, superficial answer. But for me what’s really exciting is the much more serious intellectual exploration that’s involved in giving a proper, foundational answer. I always find it particularly fun when there’s a very practical problem to solve, but to get to a good solution requires an adventure that takes one through deep, and often philosophical, issues.

Inevitably, this book reflects some of my personal journey. When I was young I thought my life would be all about making discoveries in specific areas of science. But what I’ve come to realize—particularly having embraced the computational paradigm—is that the same intellectual thought processes can be applied not just to what one thinks of as science, but to pretty much anything. And for me there’s tremendous satisfaction in seeing how this works out.

Announcing the Rule 30 Prizes
Tue, 01 Oct 2019 17:52:32 +0000, Stephen Wolfram

The Story of Rule 30

How can something that simple produce something that complex? It’s been nearly 40 years since I first saw rule 30—but it still amazes me. Long ago it became my personal all-time favorite science discovery, and over the years it’s changed my whole worldview and led me to all sorts of science, technology, philosophy and more.

But even after all these years, there are still many basic things we don’t know about rule 30. And I’ve decided that it’s now time to do what I can to stimulate the process of finding more of them out. So as of today, I am offering $30,000 in prizes for the answers to three basic questions about rule 30.

The setup for rule 30 is extremely simple. One’s dealing with a sequence of lines of black and white cells. And given a particular line of black and white cells, the colors of the cells on the line below are determined by looking at each cell and its immediate neighbors and then applying the following simple rule:
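The rule can also be written out explicitly. As a sketch (the name rule30 is only for illustration), the eight cases come straight from the binary digits of the number 30, and they agree with the Boolean form "left XOR (center OR right)":

```wolfram
(* The eight possible neighborhoods {left, center, right}, each paired with the
   new color of the center cell, read off from the binary digits of 30. *)
rule30 = Thread[Tuples[{1, 0}, 3] -> IntegerDigits[30, 2, 8]]

(* Check the equivalent Boolean form: new cell = left XOR (center OR right). *)
AllTrue[rule30, BitXor[#[[1, 1]], BitOr[#[[1, 2]], #[[1, 3]]]] == #[[2]] &]
```

Here 1 stands for a black cell and 0 for a white one; RulePlot[CellularAutomaton[30]] draws the same table as a picture.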



If you start with a single black cell, what will happen? One might assume—as I at first did—that the rule is simple enough that the pattern it produces must somehow be correspondingly simple. But if you actually do the experiment, here’s what you find happens over the first 50 steps:


RulePlot[CellularAutomaton[30], {{1}, 0}, 50, Mesh -> All, 
 ImageSize -> Full]

But surely, one might think, this must eventually resolve into something much simpler. Yet here’s what happens over the first 300 steps:

The first 300 steps of rule 30
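The picture is not reproduced in this text, but it can be regenerated with essentially the same code as for 50 steps. A minimal sketch, with the Mesh option left out since individual cells are too small to outline at this size:

```wolfram
(* Rows are steps 0 through 300, starting from a single black cell. *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 300], ImageSize -> Full]
```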

And, yes, there’s some regularity over on the left. But many aspects of this pattern look for all practical purposes random. It’s amazing that a rule so simple can produce behavior that’s so complex. But I’ve discovered that in the computational universe of possible programs this kind of thing is common, even ubiquitous. And I’ve built a whole new kind of science—with all sorts of principles—based on this.

And gradually there’s been more and more evidence for these principles. But what specifically can rule 30 tell us? What concretely can we say about how it behaves? Even the most obvious questions turn out to be difficult. And after decades without answers, I’ve decided it’s time to define some specific questions about rule 30, and offer substantial prizes for their solutions.

I did something similar in 2007, putting a prize on a core question about a particular Turing machine. And at least in that case the outcome was excellent. In just a few months, the prize was won—establishing forever what the simplest possible universal Turing machine is, as well as providing strong further evidence for my general Principle of Computational Equivalence.

The Rule 30 Prize Problems again get at a core issue: just how complex really is the behavior of rule 30? Each of the problems asks this in a different, concrete way. Like rule 30 itself, they’re all deceptively simple to state. Yet to solve any of them will be a major achievement—that will help illuminate fundamental principles about the computational universe that go far beyond the specifics of rule 30.

I’ve wondered about every one of the problems for more than 35 years. And all that time I’ve been waiting for the right idea, or the right kind of mathematical or computational thinking, to finally be able to crack even one of them. But now I want to open this process up to the world. And I’m keen to see just what can be achieved, and what methods it will take.

The Rule 30 Prize Problems

For the Rule 30 Prize Problems, I’m concentrating on a particularly dramatic feature of rule 30: the apparent randomness of its center column of cells. Start from a single black cell, then just look down the sequence of values of this cell—and it seems random:


ArrayPlot[
 MapIndexed[If[#2[[2]] != 21, # /. {0 -> 0.2, 1 -> .6}, #] &, 
  CellularAutomaton[30, {{1}, 0}, 20], {2}], Mesh -> All]

But in what sense is it really random? And can one prove it? Each of the Prize Problems in effect uses a different criterion for randomness, then asks whether the sequence is random according to that criterion.

Problem 1: Does the center column always remain non-periodic?

Here’s the beginning of the center column of rule 30:


ArrayPlot[List@CellularAutomaton[30, {{1}, 0}, {80, {{0}}}], 
 Mesh -> True, ImageSize -> Full]

It’s easy to see that this doesn’t repeat—it doesn’t become periodic. But this problem is about whether the center column ever becomes periodic, even after an arbitrarily large number of steps. Just by running rule 30, we know the sequence doesn’t become periodic in the first billion steps. But what about ever? To establish that, we need a proof. (Here are the first million and first billion bits in the sequence, by the way, as entries in the Wolfram Data Repository.)
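An empirical check of this kind can be sketched as follows. The helper name periodicAfterQ and the bounds (2000 steps, transients up to 100, periods up to 100) are arbitrary choices for illustration, and a finite search like this is of course evidence, not a proof.

```wolfram
(* First 2001 values of the center column (steps 0 through 2000). *)
center = Flatten[CellularAutomaton[30, {{1}, 0}, {2000, {0}}]];

(* True if seq[[i]] == seq[[i + p]] for every i > t within the available data. *)
periodicAfterQ[seq_, t_, p_] := Drop[seq, t + p] === Drop[Drop[seq, t], -p]

(* All {transient, period} pairs consistent with the data, up to the chosen bounds. *)
Select[Tuples[{Range[0, 100], Range[1, 100]}],
 periodicAfterQ[center, #[[1]], #[[2]]] &]
```

Given the statement above about the first billion steps, one would expect this to come back as an empty list.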

Problem 2: Does each color of cell occur on average equally often in the center column?

Here’s what one gets if one tallies the number of black and of white cells in successively more steps in the center column of rule 30:

The number of black and of white cells in the center column of rule 30

Dataset[{{1, 1, 0, ""}, {10, 7, 3, 2.3333333333333335}, {100, 52, 48, 1.0833333333333333}, 
 {1000, 481, 519, 0.9267822736030829}, {10000, 5032, 4968, 1.0128824476650564}, 
 {100000, 50098, 49902, 1.0039276982886458}, {1000000, 500768, 499232, 
  1.003076725850907}, {10000000, 5002220, 4997780, 1.0008883944471345}, 
 {100000000, 50009976, 49990024, 1.000399119632349}, 
 {1000000000, 500025038, 499974962, 1.0001001570154626}}]

The results are certainly close to equal for black vs. white. But what this problem asks is whether the limit of the ratio after an arbitrarily large number of steps is exactly 1.
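The smaller rows of the table can be recomputed directly. In this sketch the function names centerColumn and tally are illustrative, and "n steps" is taken to mean the first n values of the center column, including the initial black cell:

```wolfram
(* The first n values of the center column, starting from a single black cell. *)
centerColumn[n_] := Flatten[CellularAutomaton[30, {{1}, 0}, {n - 1, {0}}]]

(* {n, number of black cells, number of white cells, black/white ratio} *)
tally[n_] := With[{black = Total[centerColumn[n]]},
  {n, black, n - black, N[black/(n - black)]}]

tally /@ {10, 100, 1000}
```

The results should match the rows for 10, 100 and 1000 in the table above (7 vs. 3, 52 vs. 48, and 481 vs. 519).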

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

To find the nth cell in the center column, one can always just run rule 30 for n steps, computing the values of all the cells in this diamond:


ArrayPlot[With[{n = 100}, 
  MapIndexed[If[Total[Abs[#2 - n/2 - 1]] <= n/2, #, #/4] &, 
   CellularAutomaton[30, CenterArray[{1}, n + 1], n], {2}]]]

But if one does this directly, one’s doing n^2 individual cell updates, so the computational effort required goes up like O(n^2). This problem asks if there’s a shortcut way to compute the value of the nth cell, without all this intermediate computation, or, in particular, in less than O(n) computational effort.
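To make the quadratic growth concrete, one can count the cells in the diamond. This is a sketch with an illustrative function name, diamondCells. It relies on two facts: at step t the pattern has spread at most t cells from the center, and only cells within n - t of the center can still affect the center cell at step n.

```wolfram
(* Number of cells in the diamond: at step t, the relevant cells lie within
   distance Min[t, n - t] of the center column. *)
diamondCells[n_] := Sum[2 Min[t, n - t] + 1, {t, 0, n}]

diamondCells /@ {10, 100, 1000}
(* {61, 5101, 501001} *)
```

For even n the count is n^2/2 + n + 1, so even restricting to the diamond leaves the direct method quadratic in n.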

The Digits of Pi

Rule 30 is a creature of the computational universe: a system found by exploring possible simple programs with the new intellectual framework that the paradigm of computation provides. But the problems I’ve defined about rule 30 have analogs in mathematics that are centuries old.

Consider the digits of π. They’re a little like the center column of rule 30. There’s a definite algorithm for generating them. Yet once generated they seem for all practical purposes random:

N[Pi, 85]

Just to make the analog a little closer, here are the first few digits of π in base 2:

BaseForm[N[Pi, 25], 2]

And here are the first few bits in the center column of rule 30:

Row[CellularAutomaton[30, {{1}, 0}, {90, {{0}}}]]

Just for fun, one can convert these to base 10:

N[FromDigits[{Flatten[CellularAutomaton[30, {{1}, 0}, {500, {0}}]], 0}, 2], 85]

Of course, the known algorithms for generating the digits of π are considerably more complicated than the simple rule for generating the center column of rule 30. But, OK, so what’s known about the digits of π?

Well, we know they don’t repeat. That was proved in the 1760s when it was shown that π is an irrational number—because the only numbers whose digits repeat are rational numbers. (It was also shown in 1882 that π is transcendental, i.e. that it cannot be expressed in terms of roots of polynomials.)

How about the analog of problem 2? Do we know if in the digit sequence of π different digits occur with equal frequency? By now more than 100 trillion binary digits have been computed—and the measured frequencies of digits are very close (in the first 40 trillion binary digits the ratio of 1s to 0s is about 0.9999998064). But in the limit, are the frequencies exactly the same? People have been wondering about this for several centuries. But so far mathematics hasn’t succeeded in delivering any results.
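On a much smaller scale, the same kind of measurement is easy to repeat. A sketch, with 100,000 binary digits chosen arbitrarily and illustrative variable names:

```wolfram
(* Binary digits of Pi; RealDigits returns {digits, exponent}. *)
bits = First[RealDigits[Pi, 2, 100000]];

(* How often each digit occurs, and the ratio of 1s to 0s. *)
counts = Counts[bits]
N[counts[1]/counts[0]]
```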

For rational numbers, digit sequences are periodic, and it’s easy to work out relative frequencies of digits. But for the digit sequences of all other “naturally constructed” numbers, basically there’s nothing known about limiting frequencies of digits. It’s a reasonable guess that actually the digits of π (as well as the center column of rule 30) are “normal”, in the sense that not only every individual digit, but also every block of digits of any given length in the limit occur with equal frequency. And as was noted in the 1930s, it’s perfectly possible to “digit-construct” normal numbers. Champernowne’s number, formed by concatenating the digits of successive integers, is an example (and, yes, this works in any base, and one can also get normal numbers by concatenating values of functions of successive integers):

N[ChampernowneNumber[10], 85]

But the point is that for “naturally constructed” numbers formed by combinations of standard mathematical functions, there’s simply no example known where any regularity of digits has been found. Of course, it ultimately depends what one means by “regularity”—and at some level the problem devolves into a kind of number-digit analog of the search for extraterrestrial intelligence. But there’s absolutely no proof that one couldn’t, for example, find even some strange combination of square roots that would have a digit sequence with some very obvious regularity.

OK, so what about the analog of problem 3 for the digits of π? Unlike rule 30, where the obvious way to compute elements in the sequence is one step at a time, traditional ways of computing digits of π involve getting better approximations to π as a complete number. With the standard (bizarre-looking) series invented by Ramanujan in 1910 and improved by the Chudnovsky brothers in 1989, the first few terms in the series give the following approximations:

Standard series

Style[Table[
   N[(12*Sum[((-1)^k*(6*k)!*(13591409 + 545140134*k))/((3*k)!*(k!)^3*
          640320^(3*k + 3/2)), {k, 0, n}])^-1, 100], {n, 10}] // 
  Column, 9]

So how much computational effort is it to find the nth digit? The number of terms required in the series is O(n). But each term needs to be computed to n-digit precision, which requires at least O(n) individual digit operations—implying that altogether the computational effort required is more than O(n).

Until the 1990s it was assumed that there wasn’t any way to compute the nth digit of π without computing all previous ones. But in 1995 Simon Plouffe discovered that actually it’s possible to compute—albeit slightly probabilistically—the nth digit without computing earlier ones. And while one might have thought that this would allow the nth digit to be obtained with less than O(n) computational effort, the fact that one has to do computations at n-digit precision means that at least O(n) computational effort is still required.
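The idea can be sketched for hexadecimal digits using the Bailey-Borwein-Plouffe formula, π = Σ 16^(-k) (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6)). The function names below are illustrative, and the code works in machine-precision arithmetic, so it is only trustworthy for moderate n; rounding error is one way to read "slightly probabilistically".

```wolfram
(* Fractional part of the sum over k of 16^(n - k)/(8 k + j).
   Terms with k <= n use modular exponentiation, so no large numbers appear;
   terms with k > n shrink by a factor of 16 each, so a short tail suffices. *)
bbpSeries[j_, n_] := Mod[
  Total[Table[N[PowerMod[16, n - k, 8 k + j]/(8 k + j)], {k, 0, n}]] +
   Total[Table[16.^(n - k)/(8 k + j), {k, n + 1, n + 15}]], 1]

(* Hexadecimal digit of Pi at position n + 1 after the point. *)
piHexDigit[n_] := Floor[16 Mod[
    4 bbpSeries[1, n] - 2 bbpSeries[4, n] - bbpSeries[5, n] - bbpSeries[6, n], 1]]

piHexDigit /@ Range[0, 9]
```

The result can be compared with the hexadecimal expansion of π, which begins 3.243F6A8885. Note that each call still does a sum of about n terms, consistent with the O(n) lower bound discussed above.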

Results, Analogies and Intuitions

Problem 1: Does the center column always remain non-periodic?

Of the three Rule 30 Prize Problems, this is the one on which the most progress has already been made. Because while it’s not known if the center column in the rule 30 pattern ever becomes periodic, Erica Jen showed in 1986 that no two columns can both become periodic. And in fact, one can also give arguments that a single column plus scattered cells in another column can’t both be periodic.

The proof about a pair of columns uses a special feature of rule 30. Consider the structure of the rule:



Normally one would just say that given each triple of cells, the rule determines the color of the center cell below. But for rule 30, one can effectively also run the rule sideways: given a cell, the cell to its right, and the cell directly below it, one can also uniquely determine the color of the cell to its left. And what this means is that if one is given two adjacent columns, it’s possible to reconstruct the whole pattern to the left:


 ArrayPlot[#, PlotRange -> 1, Mesh -> All, PlotRange -> 1, 
    Background -> LightGray, 
    ImageSize -> {Automatic, 80}] & /@ (PadLeft[#, {Length[#], 10}, 
      10] & /@ 
    Module[{data = {{0, 1}, {1, 1}, {0, 0}, {0, 1}, {1, 1}, {1, 
         0}, {0, 1}, {1, 10}}}, 
         Table[Module[{p, q = data[[n, 1]], r = data[[n, 2]], 
            s = data[[n + 1, 1]] },
           p = Mod[-q - r - q r + s, 2];
           PrependTo[data[[n]], p]], {n, 1, Length[data] - i}], 
         PrependTo[data[[-#]], 10] & /@ Reverse[Range[i]]], {i, 7}]}, 

But if the columns were periodic, it immediately follows that the reconstructed pattern would also have to be periodic. Yet by construction at least the initial condition is definitely not periodic, and hence the columns cannot both be periodic. The same argument works if the columns are not adjacent, and if one doesn’t know every cell in both columns. But there’s no known way to extend the argument to a single column—such as the center column—and thus it doesn’t resolve the first Rule 30 Prize Problem.
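The sideways step used in this argument is easy to check directly. The sketch below takes three adjacent columns from an actual rule 30 evolution, rebuilds the left one from the other two, and compares. The helper name leftCell is hypothetical; the formula is just the rule Mod[p + q + r + q r, 2] solved for p.

```wolfram
(* Steps 0..30 of rule 30, keeping only the columns at offsets -1, 0 and 1 *)
cols = CellularAutomaton[30, {{1}, 0}, {30, {-1, 1}}];

(* q and r are a cell and its right neighbor; s is the cell directly below q *)
leftCell[q_, r_, s_] := Mod[s + q + r + q r, 2]

(* Rebuild the left column from the center and right columns *)
reconstructed = 
  Table[leftCell[cols[[t, 2]], cols[[t, 3]], cols[[t + 1, 2]]], {t, 1, 30}];

reconstructed == cols[[1 ;; 30, 1]]
```

Since the update rule is linear in p, the comparison gives True for any pair of adjacent columns, not just the ones at the center.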

OK, so what would be involved in resolving it? Well, if it turns out that the center column is eventually periodic, one could just compute it, and show that. We know it’s not periodic for the first billion steps, but one could at least imagine that there could be a trillion-step transient, after which it’s periodic.

Is that plausible? Well, transients do happen—and theoretically (just like in the classic Turing machine halting problem) they can even be arbitrarily long. Here’s a somewhat funky example—found by a search—of a rule with 4 possible colors (totalistic code 150898). Run it for 200 steps, and the center column looks quite random:

Rule 150898

ArrayPlot[
 CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {200, 150 {-1, 1}}],
  ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 
   2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, 
 PixelConstrained -> 2, Frame -> False]

After 500 steps, the whole pattern still looks quite random:

Rule 150898

ArrayPlot[
 CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {500, 300 {-1, 1}}],
  ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 
   2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, Frame -> False, 
 ImagePadding -> 0, PlotRangePadding -> 0, PixelConstrained -> 1]

But if one zooms in around the center column, there’s something surprising: after 251 steps, the center column seems to evolve to a fixed value (or at least it’s fixed for more than a million steps):

Rule 150898

Grid[{ArrayPlot[#, Mesh -> True, 
      ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 
        2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, ImageSize -> 38,
      MeshStyle -> Lighter[GrayLevel[.5, .65], .45]] & /@ 
    Partition[
     CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {1400, {-4, 4}}],
     100]}, Spacings -> .35]
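One rough way to locate the end of such a transient is to look for the last step at which the center cell changes. This is only a sketch: the names col and changes are made up here, and the 1400-step window is an arbitrary choice that says nothing about what happens later.

```wolfram
(* Center column of totalistic code 150898 for steps 0..1400 *)
col = CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {1400, {{0}}}];

(* Indices i at which the value at list position i + 1 differs from the one at position i *)
changes = Flatten[Position[Unitize[Differences[col]], 1]];

(* The last change, and the distinct values seen in the column after it *)
{Last[changes], Union[col[[Last[changes] + 1 ;;]]]}
```

A single value in the second element means the column is constant from that point to the end of the window. Of course this kind of check can never rule out a later change.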

Could some transient like this happen in rule 30? Well, take a look at the rule 30 pattern, now highlighting where the diagonals on the left are periodic:


steps = 500;
					diagonalsofrule30 = 
  Reverse /@ 
    MapIndexed[RotateLeft[#1, (steps + 1) - #2[[1]]] &, 
     CellularAutomaton[30, {{1}, 0}, steps]]];

     diagonaldataofrule30 = 
  Table[With[{split = 
      Split[Partition[Drop[diagonalsofrule30[[k]], 1], 8]], 
     ones = Flatten[
       Position[Reverse[Drop[diagonalsofrule30[[k]], 1]], 
        1]]}, {Length[split[[1]]], split[[1, 1]], 
     If[Length[split] > 1, split[[2, 1]], 
      Length[diagonalsofrule30[[k]]] - Floor[k/2]]}], {k, 1, 
    2 steps + 1}];

transientdiagonalrule30 = %;

    transitionpointofrule30 = 
  If[IntegerQ[#[[3]]], #[[3]], 
     If[#[[1]] > 1, 
      8 #[[1]] + Count[Split[#[[2]] - #[[3]]][[1]], 0] + 1, 0] ] & /@ 

   decreasingtransitionpointofrule30 = 
  Append[Min /@ Partition[transitionpointofrule30, 2, 1], 0];

  transitioneddiagonalsofrule30 = 
      decreasingtransitionpointofrule30[[n]]] + 2, 
     decreasingtransitionpointofrule30[[n]]]], {n, 1, 2 steps + 1}];

     transientdiagonalrule30 = 
 MapIndexed[RotateRight[#1, (steps + 1) - #2[[1]]] &, 
  Transpose[Reverse /@ transitioneddiagonalsofrule30]];
  smallertransientdiagonalrule30 = 
  Take[#, {225, 775}] & /@ Take[transientdiagonalrule30, 275];

  ColorRules -> {0 -> White, 1 -> Gray, 2 -> Hue[0.14, 0.55, 1], 
    3 -> Hue[0.07, 1, 1]}, PixelConstrained -> 1,
  Frame -> None,
  ImagePadding -> 0, ImageMargins -> 0,
  PlotRangePadding -> 0, PlotRangePadding -> Full
  ], FrameMargins -> 0, FrameStyle -> GrayLevel[.75]]

There seems to be a boundary that separates order on the left from disorder on the right. And at least over the first 100,000 or so steps, the boundary seems to move on average about 0.252 steps to the left at each step—with roughly random fluctuations:


data = CloudGet[

 MapIndexed[{First[#2], -# - .252 First[#2]} &, 
  Module[{m = -1, w}, 
   w = If[First[#] > m, m = First[#], m] & /@ data[[1]]; m = 1;
   Table[While[w[[m]] < i, m++]; m - i, {i, 100000}]]], 
 Filling -> Axis, AspectRatio -> 1/4, MaxPlotPoints -> 10000, 
 Frame -> True, PlotRangePadding -> 0, AxesOrigin -> {Automatic, 0}, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

But how do we know that there won’t at some point be a huge fluctuation, that makes the order on the left cross the center column, and perhaps even make the whole pattern periodic? From the data we have so far, it looks unlikely, but I don’t know any way to know for sure.

And it’s certainly the case that there are systems with exceptionally long “transients”. Consider the distribution of primes, and compute LogIntegral[n] - PrimePi[n]:


DiscretePlot[LogIntegral[n] - PrimePi[n], {n, 10000}, 
 Filling -> Axis,
 Frame -> True, PlotRangePadding -> 0, AspectRatio -> 1/4, 
 Joined -> True, PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

Yes, there are fluctuations. But from this picture it certainly looks as if this difference is always going to be positive. And that’s, for example, what Ramanujan thought. But it turns out it isn’t true. At first the bound for where it would fail was astronomically large (Skewes’s number 10^10^10^964). And although still nobody has found an explicit value of n for which the difference is negative, it’s known that before n = 10^317 there must be one (and eventually the difference will be negative at least nearly a millionth of the time).

I strongly suspect that nothing like this happens with the center column of rule 30. But until we have a proof that it can’t, who knows?

One might think, by the way, that while one might be able to prove periodicity by exposing regularity in the center column of rule 30, nothing like that would be possible for non-periodicity. But actually, there are patterns whose center columns one can readily see are non-periodic, even though they’re very regular. The main class of examples are nested patterns. Here’s a very simple example, from rule 161—in which the center column has white cells when n = 2^k:

Rule 161

ArrayPlot[CellularAutomaton[161, {{1}, 0}, #]] & /@ {40, 200}
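The claim about the white cells can be checked over a finite range. In this sketch I am assuming that n counts rows starting from 1, so the initial condition is row n = 1; the name whitePositions is just for illustration.

```wolfram
(* Center column of rule 161 for steps 0..255, i.e. rows n = 1..256 *)
col = CellularAutomaton[161, {{1}, 0}, {255, {{0}}}];

(* Rows at which the center cell is white (value 0) *)
whitePositions = Flatten[Position[col, 0]]

(* Compare with the powers of 2 that fit in this range *)
whitePositions == 2^Range[8]
```

If the nested structure holds as described, the comparison gives True, and since gaps between powers of 2 keep growing the column cannot be periodic.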

Here’s a slightly more elaborate example (from the 2-neighbor 2-color rule 69540422), in which the center column is a Thue–Morse sequence ThueMorse[n]:

Thue-Morse sequence

ArrayPlot[
   CellularAutomaton[{69540422, 2, 2}, {{1}, 
     0}, {#, {-#, #}}]] & /@ {40, 400}

One can think of the Thue–Morse sequence as being generated by successively applying the substitutions:


RulePlot[SubstitutionSystem[{0 -> {0, 1}, 1 -> {1, 0}}], 
 Appearance -> "Arrow"]

And it turns out that the nth term in this sequence is given by Mod[DigitCount[n, 2, 1], 2]—which is never periodic.
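As a quick consistency check, one can generate the sequence in all three ways and compare. The variable names are made up for this sketch, and only the first 2^6 terms are tested.

```wolfram
steps = 6;

(* State of the substitution system after 6 steps, starting from a single 0 *)
fromSubstitution = Last[SubstitutionSystem[{0 -> {0, 1}, 1 -> {1, 0}}, {0}, steps]];

(* Parity of the number of 1s in the base-2 digits of k *)
fromFormula = Table[Mod[DigitCount[k, 2, 1], 2], {k, 0, 2^steps - 1}];

(* Built-in function *)
fromBuiltin = Table[ThueMorse[k], {k, 0, 2^steps - 1}];

fromSubstitution == fromFormula == fromBuiltin
```

All three lists have length 2^6, and the comparison should give True.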

Will it turn out that the center column of rule 30 can be generated by a substitution system? Again, I’d be amazed (although there are seemingly natural examples where very complex substitution systems do appear). But once again, until one has a proof, who knows?

Here’s something else that may be confusing, or may be helpful. The Rule 30 Prize Problems all concern rule 30 running in an infinite array of cells. But what if one considers just n cells, say with periodic boundary conditions (i.e. taking the right neighbor of the rightmost cell to be the leftmost cell, and vice versa)? There are 2^n possible total states of the system—and one can draw a state transition diagram that shows which state evolves to which other. Here’s the diagram for n = 5:


Graph[# -> CellularAutomaton[30][#] & /@ Tuples[{1, 0}, 5], 
 VertexLabels -> ((# -> 
       ArrayPlot[{#}, ImageSize -> 30, Mesh -> True]) & /@ 
    Tuples[{1, 0}, 5])]

And here it is for n = 4 through n = 11:


Table[Framed[Graph[# -> CellularAutomaton[30][#] & /@ 
     Tuples[{1, 0}, n]]], {n, 4, 11}]

The structure is that there are a bunch of states that appear only as transients, together with other states that are on cycles. Inevitably, no cycle can be longer than 2^n (actually, symmetry considerations show that it always has to be somewhat less than this).

OK, so on a size-n array, rule 30 always has to show behavior that becomes periodic with a period that’s less than 2^n. Here are the actual periods starting from a single black cell initial condition, plotted on a log scale:


      "Repetition Periods for Elementary Cellular Automata"][
     Select[#Rule == 30 &]][All, "RepetitionPeriods"]]], 
 Joined -> True, Filling -> Bottom, Mesh -> All, 
 MeshStyle -> PointSize[.008], AspectRatio -> 1/3, Frame -> True, 
 PlotRange -> {{47, 2}, {0, 10^10}}, PlotRangePadding -> .1, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

And at least for these values of n, a decent fit is that the period is about 2^(0.63 n). And, yes, at least in all these cases, the period of the center column is equal to the period of the whole evolution. But what do these finite-size results imply about the infinite-size case? I, at least, don’t immediately see.
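For small n these periods can be found by brute force, simply running the system until a state repeats. Here is a minimal sketch; rule30Period is a hypothetical helper name, and the approach needs memory proportional to the transient plus the cycle, so it is only practical for modest n.

```wolfram
(* Period of the cycle eventually reached by rule 30 on n cells with cyclic
   boundaries, starting from a single black cell *)
rule30Period[n_Integer /; n >= 3] := 
 Module[{state = PadRight[{1}, n], firstSeen = <||>, t = 0},
  While[! KeyExistsQ[firstSeen, state],
   firstSeen[state] = t;                    (* remember when each state first appeared *)
   state = CellularAutomaton[30][state];    (* one step, cyclic boundary conditions *)
   t++];
  t - firstSeen[state]]

periods = Table[{n, rule30Period[n]}, {n, 4, 14}]

(* Every period has to be below the number of possible states *)
AllTrue[periods, #[[2]] < 2^#[[1]] &]
```

The same loop also gives the transient length, as firstSeen[state] at the moment the loop exits.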

Problem 2: Does each color of cell occur on average equally often in the center column?

Here’s a plot of the running excess of 1s over 0s in 10,000 steps of the center column of rule 30:


ListLinePlot[
 Accumulate[2 CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}] - 1],
  AspectRatio -> 1/4, Frame -> True, PlotRangePadding -> 0, 
 AxesOrigin -> {Automatic, 0}, Filling -> Axis, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

Here it is for a million steps:


ListLinePlot[
 Accumulate[
  2 ResourceData[
     "A Million Bits of the Center Column of the Rule 30 Cellular Automaton"] - 1], 
 Filling -> Axis, Frame -> True, PlotRangePadding -> 0, 
 AspectRatio -> 1/4, MaxPlotPoints -> 1000, 
 PlotStyle -> Hue[0.07`, 1, 1], 
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]

And a billion steps:


Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]];
data=Accumulate[2 data-1];
ListLinePlot[Transpose[{Range[10000] 10^5,sdata}],Filling->Axis,Frame->True,PlotRangePadding->0,AspectRatio->1/4,MaxPlotPoints->1000,PlotStyle->Hue[0.07`,1,1],FillingStyle->Directive[Opacity[0.35`],Hue[0.12`,1,1]]]

We can see that there are times when there’s an excess of 1s over 0s, and vice versa, though, yes, as we approach a billion steps 1 seems to be winning over 0, at least for now.

But let’s compute the ratio of the total number of 1s to the total number of 0s. Here’s what we get after 10,000 steps:


ListLinePlot[
  MapIndexed[#/(First[#2] - #) &, 
   Accumulate[CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}]]], 
  AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1}, 
  Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1],
   FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]], 
  PlotRange -> {Automatic, {.88, 1.04}}]

Is this approaching the value 1? It’s hard to tell. Go on a little longer, and this is what we see:


ListLinePlot[
  MapIndexed[#/(First[#2] - #) &, 
   Accumulate[CellularAutomaton[30, {{1}, 0}, {10^5 - 1, {{0}}}]]], 
  AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1}, 
  Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1],
   FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]], 
  PlotRange -> {Automatic, {.985, 1.038}}]

The scale is getting smaller, but it’s still hard to tell what will happen. Plotting the difference from 1 on a log-log plot up to a billion steps suggests it’s fairly systematically getting smaller:


Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]]];




But how do we know this trend will continue? Right now, we don’t. And, actually, things could get quite pathological. Maybe the fluctuations in 1s vs. 0s grow, so even though we’re averaging over longer and longer sequences, the overall ratio will never converge to a definite value.

Again, I doubt this is going to happen in the center column of rule 30. But without a proof, we don’t know for sure.

We’re asking here about the frequencies of black and white cells. But an obvious—and potentially illuminating—generalization is to ask instead about the frequencies for blocks of cells of length k. We can ask if all 2^k such blocks have equal limiting frequency. Or we can ask the more basic question of whether all the blocks even ever occur—or, in other words, whether if one goes far enough, the center column of rule 30 will contain any given sequence of length k (say a bitwise representation of some work of literature).

Again, we can get empirical evidence. For example, at least up to k = 22, all 2^k sequences do occur—and here’s how many steps it takes:


ListLogPlot[{3, 7, 13, 63, 116, 417, 1223, 1584, 2864, 5640, 23653, 
  42749, 78553, 143591, 377556, 720327, 1569318, 3367130, 7309616, 
  14383312, 32139368, 58671803}, Joined -> True, AspectRatio -> 1/4, 
 Frame -> True, Mesh -> True, 
 MeshStyle -> 
  Directive[{Hue[0.07, 0.9500000000000001, 0.99], PointSize[.01]}], 
 PlotTheme -> "Detailed", 
 PlotStyle -> Directive[{Thickness[.004], Hue[0.1, 1, 0.99]}]]
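For small k this kind of count is easy to reproduce by scanning the center column until every block has shown up. The sketch below is my own illustration, with hypothetical names col and stepsForAllBlocks; it reports the position of the last cell of the block that completes the set, which may differ by a small offset from the counting convention used for the plotted values.

```wolfram
(* Center column of rule 30 for steps 0..10^4 *)
col = CellularAutomaton[30, {{1}, 0}, {10^4, {{0}}}];

(* Scan overlapping length-k blocks until all 2^k of them have appeared *)
stepsForAllBlocks[k_Integer] := Module[{seen = <||>, i = 0},
  While[Length[seen] < 2^k && i + k <= Length[col],
   i++;
   seen[col[[i ;; i + k - 1]]] = True];
  If[Length[seen] == 2^k, i + k - 1, Missing["NotReached"]]]

Table[{k, stepsForAllBlocks[k]}, {k, 1, 10}]
```

A Missing["NotReached"] entry would mean the stored column was too short for that k, not that some block never occurs.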

It’s worth noticing that one can succeed perfectly for blocks of one length, but then fail for larger blocks. For example, the Thue–Morse sequence mentioned above has exactly equal frequencies of 0 and 1, but pairs don’t occur with equal frequencies, and triples of identical elements simply never occur.
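These statements about the Thue–Morse sequence can be seen by tallying overlapping blocks in a finite prefix. This is a small sketch; the prefix length 2^12 is an arbitrary choice.

```wolfram
(* First 2^12 terms of the Thue–Morse sequence *)
tm = Table[ThueMorse[k], {k, 0, 2^12 - 1}];

Counts[tm]                            (* single cells: equal numbers of 0 and 1 *)
KeySort[Counts[Partition[tm, 2, 1]]]  (* overlapping pairs: unequal counts *)
KeySort[Counts[Partition[tm, 3, 1]]]  (* overlapping triples: no {0,0,0} or {1,1,1} *)
```

Only six of the eight possible triples appear as keys in the last result.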

In traditional mathematics—and particularly dynamical systems theory—one approach to take is to consider not just evolution from a single-cell initial condition, but evolution from all possible initial conditions. And in this case it’s straightforward to show that, yes, if one evolves with equal probability from all possible initial conditions, then columns of cells generated by rule 30 will indeed contain every block with equal frequency. But if one asks the same thing for different distributions of initial conditions, one gets different results, and it’s not clear what the implication of this kind of analysis is for the specific case of a single-cell initial condition.

If different blocks occurred with different frequencies in the center column of rule 30, then that would immediately show that the center column is “not random”, or in other words that it has statistical regularities that could be used to at least statistically predict it. Of course, at some level the center column is completely “predictable”: you just have to run rule 30 to find it. But the question is whether, given just the values in the center column on their own, there’s a way to predict or compress them, say with much less computational effort than generating an arbitrary number of steps in the whole rule 30 pattern.

One could imagine running various data compression or statistical analysis algorithms, and asking whether they would succeed in finding regularities in the sequence. And particularly when one starts thinking about the overall computational capabilities of rule 30, it’s conceivable that one could prove something about how across a spectrum of possible analysis algorithms, there’s a limit to how much they could “reduce” the computation associated with the evolution of rule 30. But even given this, it’d likely still be a major challenge to say anything about the specific case of relative frequencies of black and white cells.

It’s perhaps worth mentioning one additional mathematical analog. Consider treating the values in a row of the rule 30 pattern as digits in a real number, say with the first digit of the fractional part being on the center column. Now, so far as we know, the evolution of rule 30 has no relation to any standard operations (like multiplication or taking powers) that one does on real numbers. But we can still ask about the sequence of numbers formed by looking at the right-hand side of the rule 30 pattern. Here’s a plot for the first 200 steps:


ListLinePlot[
 FromDigits[{#, 0}, 2] & /@ 
  CellularAutomaton[30, {{1}, 0}, {200, {0, 200}}], Mesh -> All, 
 AspectRatio -> 1/4, Frame -> True, 
 MeshStyle -> 
  Directive[{Hue[0.07, 0.9500000000000001, 0.99], PointSize[.0085]}], 
 PlotTheme -> "Detailed", PlotStyle -> Directive[{
Hue[0.1, 1, 0.99]}], ImageSize -> 575]

And here’s a histogram of the values reached at successively more steps:


Grid[{Table[
   Histogram[
    FromDigits[{#, 0}, 2] & /@ 
     CellularAutomaton[30, {{1}, 0}, {10^n, {0, 20}}], {.01}, 
    Frame -> True, 
    FrameTicks -> {{None, 
       None}, {{{0, "0"}, .2, .4, .6, .8, {1, "1"}}, None}}, 
    PlotLabel -> (StringTemplate["`` steps"][10^n]), 
    ChartStyle -> Directive[Opacity[.5], Hue[0.09, 1, 1]], 
    ImageSize -> 208, 
    PlotRangePadding -> {{0, 0}, {0, Scaled[.06]}}], {n, 4, 6}]}, 
 Spacings -> .2]

And, yes, it’s consistent with the limiting histogram being flat, or in other words, with these numbers being uniformly distributed in the interval 0 to 1.

Well, it turns out that in the early 1900s there were a bunch of mathematical results established about this kind of equidistribution. In particular, it’s known that FractionalPart[h n] for successive n is always equidistributed if h isn’t a rational number. It’s also known that FractionalPart[h^n] is equidistributed for almost all h (Pisot numbers like the golden ratio are exceptions). But specific cases—like FractionalPart[(3/2)^n]—have eluded analysis for at least half a century. (By the way, it’s known that the digits of π in base 16 and thus base 2 can be generated by a recurrence of the form x[n] = FractionalPart[16 x[n - 1] + r[n]] where r[n] is a fixed rational function of n.)
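One can get a feel for these three cases by binning fractional parts into ten equal intervals. This is only an empirical sketch: tenBins is a made-up helper, the sample sizes are arbitrary, and 500-digit precision is used for the golden ratio powers so that their fractional parts are still meaningful.

```wolfram
(* Counts of values falling in [0, 0.1), [0.1, 0.2), ..., [0.9, 1) *)
tenBins[values_] := BinCounts[values, {0, 1, 1/10}]

(* h n with irrational h: counts come out nearly equal *)
tenBins[FractionalPart[N[Sqrt[2]] Range[10^4]]]

(* (3/2)^n, computed exactly: looks roughly flat, though nothing is proved *)
tenBins[Table[FractionalPart[(3/2)^n], {n, 1, 2000}]]

(* Golden ratio powers: values pile up in the first and last bins *)
tenBins[Table[FractionalPart[N[GoldenRatio^n, 500]], {n, 1, 400}]]
```

The golden ratio case fails to be equidistributed because GoldenRatio^n differs from an integer by exactly GoldenRatio^-n in absolute value.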

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

Consider the pattern made by rule 150:

Rule 150

Row[{ArrayPlot[CellularAutomaton[150, {{1}, 0}, 30], Mesh -> All, 
   ImageSize -> 315], 
  ArrayPlot[CellularAutomaton[150, {{1}, 0}, 200], ImageSize -> 300]}]

It’s a very regular, nested pattern. Its center column happens to be trivial (all cells are black). But if we look one column to the left or right, we find:


ArrayPlot[{Table[Mod[IntegerExponent[t, 2], 2], {t, 80}]}, 
 Mesh -> All, ImageSize -> Full]

How do we work out the value of the nth cell? Well, in this particular case, it turns out there’s essentially just a simple formula: the value is given by Mod[IntegerExponent[n, 2], 2]. In other words, just look at the number n in base 2, and ask whether the number of zeros it has at the end is even or odd.

How much computational effort does it take to “evaluate this formula”? Well, even if we have to check every bit in n, there are only about Log[2, n] of those. So we can expect that the computational effort is O(log n).
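The formula can be compared against the actual evolution over a finite range. In this sketch I take n to count rows starting from 1 (so the initial condition is n = 1), and I use the column at offset -1; by symmetry the column at offset +1 is the same.

```wolfram
(* Column one cell to the left of center in rule 150, steps 0..999 *)
column = CellularAutomaton[150, {{1}, 0}, {999, {{-1}}}];

(* The formula, for n = 1..1000 *)
formula = Table[Mod[IntegerExponent[n, 2], 2], {n, 1000}];

column == formula

(* The formula is just as cheap for enormous n: this n ends in exactly 1000 zeros in base 2 *)
Mod[IntegerExponent[3*2^1000, 2], 2]
```

The comparison should give True, and the final line gives 0 without running anything like 3*2^1000 steps of evolution.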

But what about the rule 30 case? We know we can work out the value of the nth cell in the center column just by explicitly applying the rule 30 update rule n^2 times. But the question is whether there’s a way to reduce the computational work that’s needed. In the past, there’s tended to be an implicit assumption throughout the mathematical sciences that if one has the right model for something, then by just being clever enough one will always find a way to make predictions—or in other words, to work out what a system will do, using a lot less computational effort than the actual evolution of the system requires.

And, yes, there are plenty of examples of “exact solutions” (think 2-body problem, 2D Ising model, etc.) where we essentially just get a formula for what a system will do. But there are also other cases (think 3-body problem, 3D Ising model, etc.) where this has never successfully been done.

And as I first discussed in the early 1980s, I suspect that there are actually many systems (including these) that are computationally irreducible, in the sense that there’s no way to significantly reduce the amount of computational work needed to determine their behavior.

So in effect Problem 3 is asking about the computational irreducibility of rule 30—or at least a specific aspect of it. (The choice of O(n) computational effort is somewhat arbitrary; another version of this problem could ask for O(n^α) for any α < 2, or, for that matter, O(log^β(n))—or some criterion based on both time and memory resources.)

If the answer to Problem 3 is negative, then the obvious way to show this would just be to give an explicit program that successfully computes the nth value in the center column with less than O(n) computational effort, as we did for rule 150 above.

We can ask what O(n) computational effort means. What kind of system are we supposed to use to do the computation? And how do we measure “computational effort”? The phenomenon of computational universality implies that—within some basic constraints—it ultimately doesn’t matter.

For definiteness we could say that we always want to do the computation on a Turing machine. And for example we can say that we’ll feed the digits of the number n in as the initial state of the Turing machine tape, then expect the Turing machine to grind for much less than n steps before generating the answer (and, if it’s really to be “formula like”, more like O(log n) steps).

We don’t need to base things on a Turing machine, of course. We could use any kind of system capable of universal computation, including a cellular automaton, and, for that matter, the whole Wolfram Language. It gets a little harder to measure “computational effort” in these systems. Presumably in a cellular automaton we’d want to count the total number of cell updates done. And in the Wolfram Language we might end up just actually measuring CPU time for executing whatever program we’ve set up.

I strongly suspect that rule 30 is computationally irreducible, and that Problem 3 has an affirmative answer. But if it isn’t, my guess is that eventually there’ll turn out to be a program that rather obviously computes the nth value in less than O(n) computational effort, and there won’t be a lot of argument about the details of whether the computational resources are counted correctly.

But proving that no such program exists is a much more difficult proposition. And even though I suspect computational irreducibility is quite ubiquitous, it’s always very hard to prove explicit lower bounds on the difficulty of doing particular computations. And in fact almost all explicit lower bounds currently known are quite weak, and essentially boil down just to arguments about information content—like that you need O(log n) steps to even read all the digits in the value of n.

Undoubtedly the most famous lower-bound problem is the P vs. NP question. I don’t think there’s a direct relation to our rule 30 problem (which is more like a P vs. LOGTIME question), but it’s perhaps worth understanding how things are connected. The basic point is that the forward evolution of a cellular automaton, say for n steps from an initial condition with n cells specified, is at most an O(n^2) computation, and is therefore in P (“polynomial time”). But the question of whether there exists an initial condition that evolves to produce some particular final result is in NP. If you happen (“non-deterministically”) to pick the correct initial condition, then it’s polynomial time to check that it’s correct. But there are potentially 2^n possible initial conditions to check.

Of course there are plenty of cellular automata where you don’t have to check all these 2^n initial conditions, and a polynomial-time computation clearly suffices. But it’s possible to construct a cellular automaton where finding the initial condition is an NP-complete problem, or in other words, where it’s possible to encode any problem in NP in this particular cellular automaton inversion problem. Is the rule 30 inversion problem NP-complete? We don’t know, though it seems conceivable that it could be proved to be (and if one did prove it then rule 30 could finally be a provably NP-complete cryptosystem).
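To see the asymmetry between checking and searching, here is a brute-force sketch of the inversion problem on a small cyclic array. The cyclic boundary conditions, the helper name rule30Preimages, and the particular 10-cell example are all assumptions made for this illustration.

```wolfram
(* All initial conditions on a cyclic array that evolve to target after t steps. *)
(* Checking one candidate is fast; the search runs over all 2^n candidates.      *)
rule30Preimages[target_List, t_Integer] := 
 Select[Tuples[{0, 1}, Length[target]], 
  First[CellularAutomaton[30, #, {{t}}]] === target &]

(* Make a target by running a known 10-cell initial condition forward 5 steps *)
target = First[CellularAutomaton[30, {1, 0, 1, 1, 0, 0, 1, 0, 0, 0}, {{5}}]];

rule30Preimages[target, 5]
```

The result necessarily contains the initial condition used to make the target, possibly along with others. Each extra cell doubles the number of candidates that this naive search has to try.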

But there doesn’t seem to be a direct connection between the inversion problem for rule 30, and the problem of predicting the center column. Still, there’s at least a more direct connection to another global question: whether rule 30 is computation universal, or, in other words, whether there exist initial conditions for rule 30 that allow it to be “programmed” to perform any computation that, for example, any Turing machine can perform.

We know that among the 256 simplest cellular automata, rule 110 is universal (as are three other rules that are simple transformations of it). But looking at a typical example of rule 110 evolution, it’s already clear that there are definite, modular structures one can identify. And indeed the proof proceeds by showing how one can “engineer” a known universal system out of rule 110 by appropriately assembling these structures.

Rule 110

SeedRandom[23542345]; ArrayPlot[
 CellularAutomaton[110, RandomInteger[1, 600], 400], 
 PixelConstrained -> 1]

Rule 30, however, shows no such obvious modularity—so it doesn’t seem plausible that one can establish universality in the “engineering” way it’s been established for all other known-to-be-universal systems. Still, my Principle of Computational Equivalence strongly suggests that rule 30 is indeed universal; we just don’t yet have an obvious direction to take in trying to prove it.

If one can show that a system is universal, however, then this does have implications that are closer to our rule 30 problem. In particular, if a system is universal, then there’ll be questions (like the halting problem) about its infinite-time behavior that will be undecidable, and which no guaranteed-finite-time computation can answer. But as such, universality is a statement about the existence of initial conditions that reproduce a given computation. It doesn’t say anything about the specifics of a particular initial condition—or about how long it will take to compute a particular result.

OK, but what about a different direction: what about getting empirical evidence about our Problem 3? Is there a way to use statistics, or cryptanalysis, or mathematics, or machine learning to even slightly reduce the computational effort needed to compute the nth value in the center column?

Well, we know that the whole 2D pattern of rule 30 is far from random. In fact, of all 2^(m^2) patches, only m × m can possibly occur—and in practice the number weighted by probability is much smaller. And I don’t doubt that facts like this can be used to reduce the effort to compute the center column to less than O(n^2) effort (and that would be a nice partial result). But can it be less than O(n) effort? That’s a much more difficult question.

Clearly if Problem 1 was answered in the negative then it could be. But in a sense asking for less than O(n) computation of the center column is precisely like asking whether there are “predictable regularities” in it. Of course, even if one could find small-scale statistical regularities in the sequence (as answering Problem 2 in the negative would imply), these wouldn’t on their own give one a way to do more than perhaps slightly improve a constant multiplier in the speed of computing the sequence.

Could there be some systematically reduced way to compute the sequence using a neural net—which is essentially a collection of nested real-number functions? I’ve tried to find such a neural net using our current deep-learning technology—and haven’t been able to get anywhere at all.

What about statistical methods? If we could find statistical non-randomness in the sequence, then that would imply an ability to compress the sequence, and thus some redundancy or predictability in the sequence. But I’ve tried all sorts of statistical randomness tests on the center column of rule 30—and never found any significant deviation from randomness. (And for many years—until we found a slightly more efficient rule—we used sequences from finite-size rule 30 systems as our source of random numbers in the Wolfram Language, and no legitimate “it’s not random!” bugs ever showed up.)

Statistical tests of randomness typically work by saying, “Take the supposedly random sequence and process it in some way, then see if the result is obviously non-random”. But what kind of processing should be done? One might see if blocks occur with equal frequency, or if correlations exist, or if some compression algorithm succeeds in doing compression. But typically batteries of tests end up seeming a bit haphazard and arbitrary. In principle one can imagine enumerating all possible tests—by enumerating all possible programs that can be applied to the sequence. But I’ve tried doing this, for example for classes of cellular automaton rules—and have never managed to detect any non-randomness in the rule 30 sequence.

So how about using ideas from mathematics to predict the rule 30 sequence? Well, as such, rule 30 doesn’t seem connected to any well-developed area of math. But of course it’s conceivable that some mapping could be found between rule 30 and ideas, say, in an area like number theory—and that these could either help in finding a shortcut for computing rule 30, or could show that computing it is equivalent to some problem like integer factoring that’s thought to be fundamentally difficult.

I know a few examples of interesting interplays between traditional mathematical structures and cellular automata. For example, consider the digits of successive powers of 3 in base 2 and in base 6:

Digits of successive powers

  ArrayPlot[#, ImageSize -> {Automatic, 275}] & /@ {Table[
     IntegerDigits[3^t, 2, 159], {t, 100}], 
    Table[IntegerDigits[3^t, 6, 62], {t, 100}]}, Spacer[10]]]

It turns out that in the base 6 case, the rule for generating the pattern is exactly a cellular automaton. (For base 2, there are additional long-range carries.) But although both these patterns look complex, it turns out that their mathematical structure lets us speed up making certain predictions about them.

Consider the sth digit from the right-hand edge of line n in each pattern. It’s just the sth digit in 3^n, which is given by the “formula” (where b is the base, here 2 or 6) Mod[Quotient[3^n, b^s], b]. But how easy is it to evaluate this formula? One might think that to compute 3^n one would have to do n multiplications. But this isn’t the case: instead, one can for example build up 3^n using repeated squaring, with about log(n) multiplications. That this is possible is a consequence of the associativity of multiplication. There’s nothing obviously like that for rule 30—but it’s always conceivable that some mapping to a mathematical structure like this could be found.
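Here is a small sketch of that shortcut. Since only the lowest s + 1 digits matter, one can work modulo b^(s + 1) throughout, and PowerMod does the modular powering without ever forming the full power of 3. The names digitDirect and digitFast are made up for this illustration, and s counts digits from 0 at the right-hand edge.

```wolfram
(* Direct evaluation of the formula: forms the full power of 3 *)
digitDirect[n_, s_, b_] := Mod[Quotient[3^n, b^s], b]

(* Shortcut: reduce modulo b^(s + 1) during the powering, then read off the top digit *)
digitFast[n_, s_, b_] := Quotient[PowerMod[3, n, b^(s + 1)], b^s]

(* The two agree for both bases over a range of n and s *)
And @@ Flatten[
  Table[digitDirect[n, s, b] == digitFast[n, s, b], 
   {b, {2, 6}}, {n, 1, 60}, {s, 0, 20}]]

(* The shortcut still works when the full power would be far too large to write out *)
digitFast[10^100, 5, 6]
```

The agreement test gives True, because reducing modulo b^(s + 1) leaves the lowest s + 1 digits unchanged.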

Talking of mathematical structure, it’s worth mentioning that there are more formula-like ways to state the basic rule for rule 30. For example, taking the values of three adjacent cells to be p, q, r the basic rule is just p ⊻ (q ∨ r) or Xor[p, Or[q, r]]. With numerical cell values 0 and 1, the basic rule is just Mod[p + q + r + q r, 2]. Do these forms help? I don’t know. But, for example, it’s remarkable that in a sense all the complexity of rule 30 comes from the presence of that one little nonlinear q r term—for without that term, one would have rule 150, about which one can develop a complete algebraic theory using quite traditional mathematics.
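As a quick sanity check on these forms, the following Python sketch compares them on all eight neighborhoods. The function names are illustrative only; the lookup-table version assumes the standard convention that bit 4p + 2q + r of the rule number gives the new cell value.

```python
from itertools import product


def rule_table(rule, p, q, r):
    """New cell value from the rule number, using the standard bit numbering."""
    return (rule >> (4 * p + 2 * q + r)) & 1


def rule30_logic(p, q, r):
    return p ^ (q | r)                  # Xor[p, Or[q, r]]


def rule30_poly(p, q, r):
    return (p + q + r + q * r) % 2      # Mod[p + q + r + q r, 2]


def rule150_poly(p, q, r):
    return (p + q + r) % 2              # the same formula without the q r term


for p, q, r in product((0, 1), repeat=3):
    assert rule_table(30, p, q, r) == rule30_logic(p, q, r) == rule30_poly(p, q, r)
    assert rule_table(150, p, q, r) == rule150_poly(p, q, r)
print("all 8 neighborhoods agree")
```

Dropping the q r term leaves a rule that is linear mod 2, which is the property that makes rule 150 open to algebraic analysis.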

To work out n steps in the evolution of rule 30, one’s effectively got to repeatedly compose the basic rule. And so far as one can tell, the symbolic expressions that arise just get more and more complicated—and don’t show any sign of simplifying in such a way as to save computational work.

In Problem 3, we’re talking about the computational effort to compute the nth value in the center column of rule 30—and asking if it can be less than O(n). But imagine that we have a definite algorithm for doing the computation. For any given n, we can see what computational resources it uses. Say the result is r[n]. Then what we’re asking is whether r[n] is less than “big O” of n, or whether MaxLimit[r[n]/n, n -> Infinity] < Infinity.

But imagine that we have a particular Turing machine (or some other computational system) that’s implementing our algorithm. It could be that r[n] will at least asymptotically just be a smooth or otherwise regular function of n for which it’s easy to see what the limit is. But if one just starts enumerating Turing machines, one encounters examples where r[n] appears to have peaks of random heights in random places. It might even be that somewhere there’d be a value of n for which the Turing machine doesn’t halt (or whatever) at all, so that r[n] is infinite. And in general, as we’ll discuss in more detail later, it could even be undecidable just how r[n] grows relative to O(n).

Formal Statements of the Problems

So far, I’ve mostly described the Prize Problems in words. But we can also describe them in computational language (or effectively also in math).

In the Wolfram Language, the first t values in the center column of rule 30 are given by:


c[t_] := CellularAutomaton[30, {{1}, 0}, {t, {{0}}}]

And with this definition, the three problems can be stated as predicates about c[t].

Problem 1: Does the center column always remain non-periodic?

NotExists[{p, i}, ForAll[t, t > i, c[t + p] == c[t]]]

or “there does not exist a period p and an initial length i such that for all t with t>i, c[t + p] equals c[t]”.
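The predicate quantifies over all t, so no finite computation can settle it. Still, one can search a finite prefix for candidate values of p and i. The Python sketch below does that, with hypothetical helper names `center_column` and `find_eventual_period`. It uses 0-based indexing, so its `i` is the index where periodic behavior would start, and a result of `None` says only that no small period fits the prefix examined.

```python
def center_column(steps):
    """First `steps` values of the rule 30 center column, from a single black cell."""
    row, column = [1], []
    for _ in range(steps):
        column.append(row[len(row) // 2])
        padded = [0, 0] + row + [0, 0]
        # rule 30: new cell = left XOR (center OR right)
        row = [padded[i] ^ (padded[i + 1] | padded[i + 2])
               for i in range(len(padded) - 2)]
    return column


def find_eventual_period(seq, max_period, max_transient):
    """Return (p, i) if seq[t + p] == seq[t] for every t >= i inside seq, else None."""
    for p in range(1, max_period + 1):
        for i in range(max_transient + 1):
            if all(seq[t] == seq[t + p] for t in range(i, len(seq) - p)):
                return p, i
    return None


print(find_eventual_period(center_column(600), 100, 100))
```

Applied to an artificial sequence such as `[0, 1, 1] + [1, 0] * 50`, the same search reports the period 2 and the index at which it sets in.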

Problem 2: Does each color of cell occur on average equally often in the center column?

DiscreteLimit[Total[c[t]]/t, t -> Infinity] == 1/2

or “the discrete limit of the total of the values in c[t]/t as t → ∞ is 1/2”.
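For finite t, the quantity inside the limit is easy to tabulate. The Python sketch below is an illustration only: `ones_fraction` is a made-up name, and each row of the pattern is packed into one arbitrary-precision integer, which is an implementation choice rather than anything the problem statement requires.

```python
def ones_fraction(t):
    """Total[c[t]]/t: the fraction of 1s among the first t center-column values."""
    row, ones = 1, 0                       # bit i of `row` is the center cell at step i
    for i in range(t):
        ones += (row >> i) & 1
        row ^= (row << 1) | (row << 2)     # one rule 30 step on the whole row
    return ones / t


for t in (10, 100, 1000, 10000):
    print(t, ones_fraction(t))
```

Watching these fractions settle near 1/2 is evidence but not proof, since the limit concerns all t.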

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

Define machine[m] to be a machine parametrized by m (for example TuringMachine[...]), and let machine[m][n] give {v, t}, where v is the output value, and t is the amount of computational effort taken (e.g. number of steps). Then the problem can be formulated as:

NotExists[m, ForAll[n, machine[m][n][[1]] == Last[c[n]]] && MaxLimit[machine[m][n][[2]]/n, n -> Infinity] < Infinity]

or “there does not exist a machine m which for all n gives c[n], and for which the lim sup of the amount of computational effort spent, divided by n, is finite”. (Yes, one should also require that m be finite, so the machine’s rule can’t just store the answer.)

The Formal Character of Solutions

Before we discuss the individual problems, an obvious question to ask is what the interdependence of the problems might be. If the answer to Problem 3 is negative (which I very strongly doubt), then it holds the possibility for simple algorithms or formulas from which the answers to Problems 1 and 2 might become straightforward. If the answer to Problem 3 is affirmative (as I strongly suspect), then it implies that the answer to Problem 1 must also be affirmative. The contrapositive is also true: if the answer to Problem 1 is negative, then it implies that the answer to Problem 3 must also be negative.

If the answer to Problem 1 is negative, so that there is some periodic sequence that appears in the center column, then if one explicitly knows that sequence, one can immediately answer Problem 2. One might think that answering Problem 2 in the negative would imply something about Problem 3. And, yes, unequal probabilities for black and white imply compression by a constant factor in a Shannon-information way. But to compute a value with less than O(n) resources—and therefore to answer Problem 3 in the negative—requires that one be able to identify in a sense infinitely more compression.

So what does it take to establish the answers to the problems?

If Problem 1 is answered in the negative, then one can imagine explicitly exhibiting the pattern generated by rule 30 at some known step—and being able to see the periodic sequence in the center. Of course, Problem 1 could still be answered in the negative, but less constructively. One might be able to show that eventually the sequence has to be periodic, but not know even any bound on where this might happen. If Problem 3 is answered in the negative, a way to do this is to explicitly give an algorithm (or, say, a Turing machine) that does the computation with less than O(n) computational resources.

But let’s say one has such an algorithm. One still has to prove that for all n, the algorithm will correctly reproduce the nth value. This might be easy. Perhaps there would just be a proof by induction or some such. But it might be arbitrarily hard. For example, it could be that for most n, the running time of the algorithm is clearly less than n. But it might not be obvious that the running time will always even be finite. Indeed, the “halting problem” for the algorithm might simply be undecidable. But just showing that a particular algorithm doesn’t halt for a given n doesn’t really tell one anything about the answer to the problem. For that one would have to show that there’s no algorithm that exists that will successfully halt in less than O(n) time.

The mention of undecidability brings up an issue, however: just what axiom system is one supposed to use to answer the problems? For the purposes of the Prize, I’ll just say “the traditional axioms of standard mathematics”, which one can assume are Peano arithmetic and/or the axioms of set theory (with or without the continuum hypothesis).

Could it be that the answers to the problems depend on the choice of axioms—or even that they’re independent of the traditional axioms (in the sense of Gödel’s incompleteness theorem)? Historical experience in mathematics makes this seem extremely unlikely, because, to date, essentially all “natural” problems in mathematics seem to have turned out to be decidable in the (sometimes rather implicit) axiom system that’s used in doing the mathematics.

In the computational universe, though—freed from the bounds of historical math tradition—it’s vastly more common to run into undecidability. And, actually, my guess is that a fair fraction of long-unsolved problems even in traditional mathematics will also turn out to be undecidable. So that definitely raises the possibility that the problems here could be independent of at least some standard axiom systems.

OK, but assume there’s no undecidability around, and one’s not dealing with the few cases in which one can just answer a problem by saying “look at this explicitly constructed thing”. Well, then to answer the problem, we’re going to have to give a proof.

In essence what drives the need for proof is the presence of something infinite. We want to know something for any n, even infinitely large, etc. And the only way to handle this is then to represent things symbolically (“the symbol Infinity means infinity”, etc.), and apply formal rules to everything, defined by the axioms in the underlying axiom system one’s assuming.

In the best case, one might be able to just explicitly exhibit that series of rule applications—in such a way that a computer can immediately verify that they’re correct. Perhaps the series of rule applications could be found by automated theorem proving (as in FindEquationalProof). More likely, it might be constructed using a proof assistant system.

It would certainly be exciting to have a fully formalized proof of the answer to any of the problems. But my guess is that it’ll be vastly easier to construct a standard proof of the kind human mathematicians traditionally do. What is such a proof? Well, it’s basically an argument that will convince other humans that a result is correct.

There isn’t really a precise definition of that. In our step-by-step solutions in Wolfram|Alpha, we’re effectively proving results (say in calculus) in such a way that students can follow them. In an academic math journal, one’s giving proofs that successfully get past the peer review process for the journal.

My own guess would be that if one were to try to formalize essentially any nontrivial proof in the math literature, one would find little corners that require new results, though usually ones that wouldn’t be too hard to get.

How can we handle this in practice for our prizes? In essence, we have to define a computational contract for what constitutes success, and when prize money should be paid out. For a constructive proof, we can get Wolfram Language code that can explicitly be run on any sufficiently large computer to establish the result. For formalized proofs, we can get Wolfram Language code that can run through the proof, validating each step.

But what about for a “human proof”? Ultimately we have no choice but to rely on some kind of human review process. We can ask multiple people to verify the proof. We could have some blockchain-inspired scheme where people “stake” the correctness of the proof, then if one eventually gets consensus (whatever this means) one pays out to people some of the prize money, in proportion to their stake. But whatever is done, it’s going to be an imperfect, “societal” result—like almost all of the pure mathematics that’s so far been done in the world.

What Will It Take?

OK, so for people interested in working on the Problems, what skills are relevant? I don’t really know. It could be discrete and combinatorial mathematics. It could be number theory, if there’s a correspondence with number-based systems found. It could be some branch of algebraic mathematics, if there’s a correspondence with algebraic systems found. It could be dynamical systems theory. It could be something closer to mathematical logic or theoretical computer science, like the theory of term rewriting systems.

Of course, it could be that no existing towers of knowledge—say in branches of mathematics—will be relevant to the problems, and that to solve them will require building “from the ground up”. And indeed that’s effectively what ended up happening in the solution for my 2,3 Turing Machine Prize in 2007.

I’m a great believer in the power of computer experiments—and of course it’s on the basis of computer experiments that I’ve formulated the Rule 30 Prize Problems. But there are definitely more computer experiments that could be done. So far we know a billion elements in the center column sequence. And so far the sequence doesn’t seem to show any deviation from randomness (at least based on tests I’ve tried). But maybe at a trillion elements (which should be well within range of current computer systems) or a quadrillion elements, or more, it eventually will—and it’s definitely worth doing the computations to check.

The direct way to compute n elements in the center column is to run rule 30 for n steps, using at an intermediate stage up to n cells of memory. The actual computation is quite well optimized in the Wolfram Language. Running on my desktop computer, it takes less than 0.4 seconds to compute 100,000 elements:


CellularAutomaton[30, {{1}, 0}, {100000, {{0}}}]; // Timing

Internally, this is using the fact that rule 30 can be expressed as Xor[p, Or[q, r]], and implemented using bitwise operations on whole words of data at a time. Using explicit bitwise operations on long integers takes about twice as long as the built-in CellularAutomaton function:


Module[{a = 1}, 
   Table[BitGet[a, a = BitXor[a, BitOr[2 a, 4 a]]; i - 1], {i, 
     100000}]]; // Timing
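The same long-integer trick carries over to any language with arbitrary-precision integers. Here is a Python rendering as a rough sketch; `center_column_bitwise` is an illustrative name, and no claim is made about how its speed compares with the timings quoted above.

```python
def center_column_bitwise(n):
    """First n center-column values of rule 30, one big integer per row."""
    a = 1                               # the single black cell
    values = []
    for i in range(n):
        values.append((a >> i) & 1)     # the center cell sits at bit i after i steps
        # Xor[p, Or[q, r]] applied to every cell at once;
        # the row grows by one cell on each side
        a = a ^ ((a << 1) | (a << 2))
    return values


print(center_column_bitwise(20))
```

Each pass through the loop updates an entire row with three integer operations, so the per-cell work is done by the machine's word-level arithmetic.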

But these results are from single CPU processors. It’s perfectly possible to imagine parallelizing across many CPUs, or using GPUs. One might imagine that one could speed up the computation by effectively caching the results of many steps in rule 30 evolution, but the fact that across the rows of the rule 30 pattern all blocks appear to occur with at least roughly equal frequency makes it seem as though this would not lead to significant speedup.

Solving some types of math-like problems seems pretty certain to require deep knowledge of high-level existing mathematics. For example, it seems quite unlikely that there can be an “elementary” proof of Fermat’s last theorem, or even of the four-color theorem. But for the Rule 30 Prize Problems it’s not clear to me. Each of them might need sophisticated existing mathematics, or they might not. They might be accessible only to people professionally trained in mathematics, or they might be solvable by clever “programming-style” or “puzzle-style” work, without sophisticated mathematics.

Generalizations and Relations

Sometimes the best way to solve a specific problem is first to solve a related problem—often a more general one—and then come back to the specific problem. And there are certainly many problems related to the Rule 30 Prize Problems that one can consider.

For example, instead of looking at the vertical column of cells at the center of the rule 30 pattern, one could look at a column of cells in a different direction. At 45°, it’s easy to see that any sequence must be periodic. On the left the periods increase very slowly; on the right they increase rapidly. But what about other angles?

Or what about looking at rows of cells in the pattern? Do all possible blocks occur? How many steps is it before any given block appears? The empirical evidence doesn’t show any deviation from blocks occurring at random, but obviously, for example, successive rows are highly correlated.

What about different initial conditions? There are many dynamical systems–style results about the behavior of rule 30 starting with equal probability from all possible infinite initial conditions. In this case, for example, it’s easy to show that all possible blocks occur with equal frequency, both at a given row, and in a given vertical column. Things get more complicated if one asks for initial conditions that correspond, for example, to all possible sequences generated by a given finite state machine, and one could imagine that from a sequence of results about different sets of possible initial conditions, one would eventually be able to say something about the case of the single black cell initial condition.

Another straightforward generalization is just to look not at a single black cell initial condition, but at other “special” initial conditions. An infinite periodic initial condition will always give periodic behavior (that’s the same as one gets in a finite-size region with periodic boundary conditions). But one can, for example, study what happens if one puts a “single defect” in the periodic pattern:

A 'single defect' in the periodic pattern

GraphicsRow[(ArrayPlot[
     CellularAutomaton[30, 
      MapAt[1 - #1 &, Flatten[Table[#1, Round[150/Length[#1]]]], 50], 
      100]] &) /@ {{1, 0}, {1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0}, {1, 
    0, 0, 0, 0, 0, 0}, {1, 1, 1, 0, 0}}]

One can also ask what happens when one has not just a single black cell, but some longer sequence in the initial conditions. How does the center column change with different initial sequences? Are there finite initial sequences that lead to “simpler” center columns?

Or are there infinite initial conditions generated by other computational systems (say substitution systems) that aren’t periodic, but still give somehow simple rule 30 patterns?

Then one can imagine going “beyond” rule 30. What happens if one adds longer-range “exceptions” to the rules? When do extensions to rule 30 show behavior that can be analyzed in one way or another? And can one then see the effect of removing the “exceptions” in the rule?

Of course, one can consider rules quite different from rule 30 as well—and perhaps hope to develop intuition or methods relevant to rule 30 by looking at other rules. Even among the 256 two-color nearest-neighbor rules, there are others that show complex behavior starting from a simple initial condition:


Row[Riffle[
  Labeled[ArrayPlot[CellularAutomaton[#, {{1}, 0}, {150, All}], 
      PixelConstrained -> 1, Frame -> False], 
     Style[Text[StringTemplate["rule ``"][#]], 12], 
     LabelStyle -> Opacity[.5]] & /@ {45, 73}, Spacer[8]]]

And if one looks at larger numbers of colors and larger neighborhoods one can find an infinite number of examples. There’s all sorts of behavior that one sees. And, for example, given any particular sequence, one can search for rules that will generate it as their center column. One can also try to classify the center-column sequences that one sees, perhaps identifying a general class “like rule 30” about which global statements can be made.

But let’s discuss the specific Rule 30 Prize Problems. To investigate the possibility of periodicity in rule 30 (as in Problem 1), one could study lots of different rules, looking for examples with very long periods, or very long transients—and try to use these to develop an intuition for how and when these can occur.

To investigate the equal-frequency phenomenon of Problem 2, one can look at different statistical features, and see both in rule 30 and across different rules when it’s possible to detect regularity.

For Problem 3, one can start looking at different levels of computational effort. Can one find the nth value with computational effort O(n^γ) for any γ<2 (I don't know any method to achieve this)? Can one show that one can’t find the nth value with less than O(log(n)) computational effort? What about with less than O(log(n)) available memory? What about for different rules? Periodic and nested patterns are easy to compute quickly. But what other examples can one find?
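To see where the exponent 2 comes from, one can count the work done by the direct method. The Python sketch below uses an illustrative function name and takes "effort" to mean the number of individual cell updates, which is just one possible measure. It runs rule 30 for n steps and reports the center value together with that count.

```python
def nth_center_value(n):
    """Return (center cell after n steps, number of cell updates performed)."""
    row, updates = [1], 0
    for _ in range(n):
        padded = [0, 0] + row + [0, 0]
        # rule 30: new cell = left XOR (center OR right)
        row = [padded[i] ^ (padded[i + 1] | padded[i + 2])
               for i in range(len(padded) - 2)]
        updates += len(row)
    return row[len(row) // 2], updates


for n in (10, 100, 1000):
    value, updates = nth_center_value(n)
    print(n, value, updates, updates / n**2)
```

Row t has 2t + 1 cells, so the count comes to n^2 + 2n. Skipping cells outside the light cone of the target cell lowers the constant but leaves the exponent at 2.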

As I’ve mentioned, a big achievement would be to show computation universality for rule 30. But even if one can’t do it for rule 30, finding additional examples (beyond, for example, rule 110) will help build intuition about what might be going on in rule 30.

Then there’s NP-completeness. Is there a way of setting up some question about the behavior of rule 30 for some family of initial conditions where it’s possible to prove that the question is NP-complete? If this worked, it would be an exciting result for cryptography. And perhaps, again, one can build up intuition by looking at other rules, even ones that are more “purposefully constructed” than rule 30.

How Hard Are the Problems?

When I set up my 2,3 Turing Machine Prize in 2007 I didn’t know if it’d be solved in a month, a year, a decade, a century, or more. As it turned out, it was actually solved in about four months. So what will happen with the Rule 30 Prize Problems? I don’t know. After nearly 40 years, I’d be surprised if any of them could now be solved in a month (but it’d be really exciting if that happened!). And of course some superficially similar problems (like features of the digits of π) have been out there for well over a century.

It’s not clear whether there’s any sophisticated math (or computer science) that exists today that will be helpful in solving the problems. But I’m confident that whatever is built to solve them will provide structure that will be important for solving other problems about the computational universe. And the longer it takes (think Fermat’s last theorem), the larger the amount of useful structure is likely to be built on the way to a solution.

I don’t know if solutions to the problems will be “obviously correct” (it’ll help if they’re constructive, or presented in computable form), or whether there’ll be a long period of verification to go through. I don’t know if proofs will be comparatively short, or outrageously long. I don’t know if the solutions will depend on details of axiom systems (“assuming the continuum hypothesis”, etc.), or if they’ll be robust for any reasonable choices of axioms. I don’t know if the three problems are somehow “comparably difficult”—or if one or two might be solved, with the others holding out for a very long time.

But what I am sure about is that solving any of the problems will be a significant achievement. I’ve picked the problems to be specific, definite and concrete. But the issues of randomness and computational irreducibility that they address are deep and general. And to know the solutions to these problems will provide important evidence and raw material for thinking about these issues wherever they occur.

Of course, having lived now with rule 30 and its implications for nearly 40 years, I will personally be thrilled to know for certain even a little more about its remarkable behavior.

]]> 3
<![CDATA[The Ease of Wolfram|Alpha, the Power of Mathematica: Introducing Wolfram|Alpha Notebook Edition]]> Thu, 12 Sep 2019 14:43:43 +0000 Stephen Wolfram

Wolfram|Alpha Notebook Edition

The Next Big Step for Wolfram|Alpha

Wolfram|Alpha has been a huge hit with students. Whether in college or high school, Wolfram|Alpha has become a ubiquitous way for students to get answers. But it’s a one-shot process: a student enters the question they want to ask (say in math) and Wolfram|Alpha gives them the (usually richly contextualized) answer. It’s incredibly useful—especially when coupled with its step-by-step solution capabilities.

But what if one doesn’t want just a one-shot answer? What if one wants to build up (or work through) a whole computation? Well, that’s what we created Mathematica and its whole notebook interface to do. And for more than 30 years that’s how countless inventions and discoveries have been made around the world. It’s also how generations of higher-level students have been taught.

But what about students who aren’t ready to use Mathematica yet? What if we could take the power of Mathematica (and what’s now the Wolfram Language), but combine it with the ease of Wolfram|Alpha?

Well, that’s what we’ve done in Wolfram|Alpha Notebook Edition.

It’s built on a huge tower of technology, but what it does is to let any student—without learning any syntax or reading any documentation—immediately build up or work through computations. Just type input the way you would in Wolfram|Alpha. But now you’re not just getting a one-shot answer. Instead, everything is in a Wolfram Notebook, where you can save and use previous results, and build up or work through a whole computation:

Wolfram Notebook

The Power of Notebooks

Being able to use Wolfram|Alpha-style free-form input is what opens Wolfram|Alpha Notebook Edition up to the full range of students. But it’s the use of the notebook environment that makes it so uniquely valuable for education. Because by being able to work through things in a sequence of steps, students get to really engage with the computations they’re doing.

Try one step. See what happens. Change it if you want. Understand the output. See how it fits into the next step. And then—right there in the notebook—see how all your steps fit together to give your final results. And then save your work in the notebook, to continue—or review what you did—another time.

But notebooks aren’t just for storing computations. They can also contain text and structure. So students can use them not just to do their computations, but also to keep notes, and to explain the computations they’re doing, or the results they get:

Student notebook

And in fact, Wolfram Notebooks enable a whole new kind of student work: computational essays. A computational essay has both text and computation—combined to build up a narrative to which both human and computer contribute.

The process of creating a computational essay is a great way for students to engage with material they’re studying. Computational essays can also provide a great showcase of student achievement, as well as a means of assessing student understanding. And they’re not just something to produce for an assignment: they’re active computable documents that students can keep and use at any time in the future.

Study notebook

But students aren’t the only ones to produce notebooks. In Wolfram|Alpha Notebook Edition, notebooks are also a great medium for teachers to provide material to students. Describe a concept in a notebook, then let students explore by doing their own computations right there in the notebook. Or make a notebook defining an assignment or a test—then let the students fill in their work (and grade it right there in the notebook).


It’s very common to use Wolfram|Alpha Notebook Edition to create visualizations of concepts. Often students will just ask for the visualizations themselves. But teachers can also set up templates for visualizations, and let students fill in their own functions or data to explore for themselves.


Wolfram|Alpha Notebook Edition also supports dynamic interactive visualizations—for example using the Wolfram Language Manipulate function. And in Wolfram|Alpha Notebook Edition students (and teachers!) can build all sorts of dynamic visualizations just using natural language:

Dynamic visualizations

But what if you want some more sophisticated interactive demonstration, that might be hard to specify? Well, Wolfram|Alpha Notebook Edition has direct access to the Wolfram Demonstrations Project, which contains over 12,000 Demonstrations. You can ask for Demonstrations using natural language, or you can just browse the Demonstrations Project website, select a Demonstration, copy it into your Wolfram|Alpha Notebook Edition notebook, and then immediately use it there:


With Wolfram|Alpha Notebook Edition it’s very easy to create compelling content. The content can involve pure calculations or visualizations. But—using the capabilities of the Wolfram Knowledgebase—it can also involve a vast range of real-world data, whether about countries, chemicals, words or artworks. And you can access it using natural language, and work with it directly in a notebook:

Using natural language

Wolfram|Alpha Notebook Edition is a great tool for students to use on their own computers. But it’s also a great tool for lectures and class demonstrations (as well as for student presentations). Go to File > New > Presenter Notebook, and you’ll get a notebook that’s set up to create a Wolfram|Alpha Notebook Edition slide show:

Presenter notebook

Click Start Presentation and you can start presenting. But what you’ll have is not just a “PowerPoint-style” slide show. It’s a fully interactive, editable, computable slide show. The Manipulate interfaces work. Everything is immediately editable. And you can do computations right there during the presentation, exploring different cases, pulling in different data, and so on.

Slide show

Making Code from Natural Language

We invented notebooks more than 30 years ago, and they’ve been widely used in Mathematica ever since. But while in Mathematica (and Wolfram Desktop) notebooks you (by default) specify computations in the precise syntax and semantics of the Wolfram Language, in Wolfram|Alpha Notebook Edition notebooks you instead specify them just using free-form Wolfram|Alpha-style input.

And indeed one of the key technical achievements that’s made Wolfram|Alpha Notebook Edition possible is that we’ve now developed increasingly robust natural-language-to-code technology that’s able to go from the free-form natural language input you type to precise Wolfram Language code that can be used to build up computations:

Natural language to code

By default, Wolfram|Alpha Notebook Edition is set up to show you the Wolfram Language code it generates. You don’t need to look at this code (and you can set it to always be hidden). But—satisfyingly for me as a language designer—students seem to find it very easy to read, often actually easier than math. And reading it gives them an extra opportunity to understand what’s going on—and to make sure the computation they’ve specified is actually the one they want.

And there’s a great side effect to the fact that Wolfram|Alpha Notebook Edition generates code: through routinely being exposed to code that represents natural language they’ve entered, students gradually absorb the idea of expressing things in computational language, and the concepts of computational thinking.

If a student wants to change a computation when they’re using Wolfram|Alpha Notebook Edition, they can always edit the free-form input they gave. But they can also directly edit the Wolfram Language that’s been generated, giving them real computational language experience.

Free-form input

What Should I Do Next? The Predictive Interface

A central goal of Wolfram|Alpha Notebook Edition is to be completely “self-service”—so that students at all levels can successfully use it without any outside instruction or assistance. Of course, free-form input is a key part of achieving this. But another part is the Wolfram|Alpha Notebook Edition Predictive Interface—that suggests what to do next based on what students have done.

Enter a computation and you’ll typically see some buttons pop up under the input field:


These buttons will suggest directions to take. Here step-by-step solution generates an enhanced interactive version of Wolfram|Alpha Pro step-by-step functionality—all right in the notebook:

Step-by-step functionality

Click related computations and you’ll see suggestions for different computations you might want to do:

Related computations

It suggests plotting the integrand and the integral:

Plotting the integrand and the integral

It also suggests you might like to see a series expansion:

Series expansion

Now notice that underneath the output there’s a bar of suggestions about possible follow-on computations to do on this output. Click, for example, coefficient list to find the list of coefficients:

Coefficient list

Now there are new suggestions. Click, for example, total to find the total of the coefficients:

Find the total of the coefficients

The Math Experience

Wolfram|Alpha Notebook Edition has got lots of features to enhance the “math experience”. For example, click the button at the top of the notebook and you’ll get a “math keyboard” that you can use to directly enter math notation:

Math keyboard

The Wolfram Language that underlies Wolfram|Alpha Notebook Edition routinely handles the math that’s needed by the world’s top mathematicians. But having all that sophisticated math can sometimes lead to confusion for students. So in Wolfram|Alpha Notebook Edition there are ways to say “keep the math simple”. For example, you can set it to minimize the use of complex numbers:



Wolfram|Alpha Notebook Edition also by default does things like adding constants of integration to indefinite integrals:

Constants of integration

By the way, Wolfram|Alpha Notebook Edition by default automatically formats mathematical output in elegant “traditional textbook” form. But it always includes a little button next to each output, so you can toggle between “traditional form”, and standard Wolfram Language form.

It’s quite common in doing math to have a function, and just say “I want to plot that!” But what range should you use? In Mathematica (or the Wolfram Language), you’d have to specify it. But in Wolfram|Alpha Notebook Edition there’s always an automatic range that’s picked:

Automatic range

But since you can see the Wolfram Language code—including the range—it’s easy to change that, and specify whatever range you want.

Specify range

What if you want to get an interactive control to change the range, or to change a parameter in the function? In Mathematica (or the Wolfram Language) you’d have to write a Manipulate. But in Wolfram|Alpha Notebook Edition, you can build a whole interactive interface just using natural language:

Interactive interface

And because in Wolfram|Alpha Notebook Edition the Manipulate computations are all running directly on your local computer, nothing is being slowed down by network transmission—and so everything moves at full speed. (Also, if you have a long computation, you can just let it keep running on your computer; there’s no timeout like in Wolfram|Alpha on the web.)
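As a point of comparison, here is a sketch of what the hand-written versions might look like in Mathematica or the Wolfram Language. The function Sin[a x]/x, the parameter a and the control xmax are all assumptions made up for this illustration; the code that Wolfram|Alpha Notebook Edition actually generates may look different.

```wolfram
(* A plot with an explicitly specified range for x *)
Plot[Sin[x]/x, {x, -10, 10}]

(* A hand-written Manipulate: a slider for the parameter a,
   and a second slider (starting at 10) for the plotted range *)
Manipulate[
 Plot[Sin[a x]/x, {x, -xmax, xmax}],
 {a, 1, 5},
 {{xmax, 10}, 1, 20}
]
```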

Multistep Computation

One of the important features of Wolfram|Alpha Notebook Edition is that it doesn’t just do one-shot computations; it allows you to do multistep computations that in effect involve a back-and-forth conversation with the computer, in which you routinely refer to previous results:

Multistep computation

Often it’s enough to just talk about the most recent result, and say things like “plot it as a function of x”. But it’s also quite common to want to refer back to results earlier in the notebook. One way to do this is to say things like “the result before last”—or to use the Out[n] labels for each result. But another thing that Wolfram|Alpha Notebook Edition allows you to do is to set values of variables, that you can then use throughout your session:

Set values

It’s also possible to define functions, all with natural language:

Define functions

There are lots of complicated design and implementation issues that arise in dealing with multistep computations. For example, if you have a traditional result for an indefinite integral, with a constant of integration, what do you do with the constant when you want to plot the result? (Wolfram|Alpha Notebook Edition consistently handles arbitrary additive constants in plots by effectively setting them to zero.)

Integrate x

It can also be complicated to know what refers to what in the “conversation”. If you say “plot”, are you trying to plot your latest result, or are you asking for an interface to create a completely new plot? If you use a pronoun, as in “plot it”, then it’s potentially more obvious what you mean, and Wolfram|Alpha Notebook Edition has a better chance of being able to use its natural language understanding capabilities to figure it out.

The World with Wolfram|Alpha Notebook Edition

It’s been very satisfying to see how extensively Wolfram|Alpha has been adopted by students. But mostly that adoption has been outside the classroom. Now, with Wolfram|Alpha Notebook Edition, we’ve got a tool that can immediately be put to use in the classroom, across the whole college and precollege spectrum. And I’m excited to see how it can streamline coursework, deepen understanding, enable new concepts to be taught, and effectively provide a course-based personal AI tutor for every student.

Starting today, Wolfram|Alpha Notebook Edition is available on all standard computer platforms (Mac, Windows, Linux). (A cloud version will also be available on the web soon.) Colleges and universities with full Wolfram Technology System site licenses can automatically start using Wolfram|Alpha Notebook Edition today; at schools with other site licenses, it can immediately be added. It’s available to K–12 schools and junior colleges in classroom packs, or as a site license. And, of course, it’s also available to individual teachers, students, hobbyists and others.

(Oh, and if you have Mathematica or Wolfram Desktop, it’ll also be possible in future versions to create “Wolfram|Alpha mode” notebooks that effectively integrate Wolfram|Alpha Notebook Edition capabilities. And in general there’s perfect compatibility among Wolfram|Alpha Notebook Edition, Mathematica, Wolfram Desktop, Wolfram Cloud, Wolfram Programming Lab, etc.—providing a seamless experience for people progressing across education and through professional careers.)

Like Wolfram|Alpha—and the Wolfram Language—Wolfram|Alpha Notebook Edition will continue to grow in capabilities far into the future. But what’s there today is already a remarkable achievement that I think will be transformative in many educational settings.

More than 31 years ago we introduced Mathematica (and what’s now the Wolfram Language). A decade ago we introduced Wolfram|Alpha. Now, today, with the release of Wolfram|Alpha Notebook Edition we’re giving a first taste—in the context of education—of a whole new approach to computing: a full computing environment that’s driven by natural language. It doesn’t supplant Wolfram Language, or Wolfram|Alpha—but it defines a new direction that in time will bring the power of computation to a whole massive new audience.

]]> 0
<![CDATA[A Book from Alan Turing… and a Mysterious Piece of Paper]]> Tue, 27 Aug 2019 17:12:32 +0000 Stephen Wolfram

A Book from Alan Turing...

How I Got the Book

In May 2017, I got an email from a former high-school teacher of mine named George Rutter: “I have a copy of Dirac’s big book in German (Die Prinzipien der Quantenmechanik) that was owned by Alan Turing, and following your book Idea Makers it seemed obvious that you were the right person to own this.” He explained that he’d got the book from another (by then deceased) former high-school teacher of mine, Norman Routledge, who I knew had been a friend of Alan Turing’s. George ended, “If you would like the book, I could give it to you the next time you are in England.”

A couple of years passed. But in March 2019 I was indeed in England, and arranged to meet George for breakfast at a small hotel in Oxford. We ate and chatted, and waited for the food to be cleared. Then the book moment arrived. George reached into his briefcase and pulled out a rather unassuming, typical mid-1900s academic volume.

P. A. M. Dirac's Die Prinzipien der Quantenmechanik

I opened the front of the book, wondering if it might have a “Property of Alan Turing” sticker or something. It didn’t. But what it did have (in addition to an inscription saying “from Alan Turing’s books”) was a colorful four-page note from Norman Routledge to George Rutter, written in 2002.

I had known Norman Routledge when I was a high-school student at Eton in the early 1970s. He was a math teacher, nicknamed “Nutty Norman”. He was charmingly over the top in many ways, and told endless stories about math and other things. He’d also been responsible for the school getting a computer (programmed with paper tape, and the size of a desk)—that was the very first computer I ever used.

At the time, I didn’t know too much about Norman’s background (remember, this was long before the web). I knew he was “Dr. Routledge”. And he often told stories about people in Cambridge. But he never mentioned Alan Turing to me. Of course, Alan Turing wasn’t famous yet (although, as it happens, I’d already heard of him from someone who’d known him at Bletchley Park during the Second World War).

Alan M. Turing by Sara Turing

Alan Turing still wasn’t famous in 1981 when I started studying simple programs, albeit in the context of cellular automata rather than Turing machines. But looking through the card catalog at the Caltech library one day, I chanced upon a book called Alan M. Turing by Sara Turing, his mother. There was lots of information in the book—among other things, about Turing’s largely unpublished work on biology. But I didn’t learn anything about a connection to Norman Routledge, because the book didn’t mention him (although, as I’ve now found out, Sara Turing did correspond with Norman about the book, and Norman ended up writing a review of it).

A decade later, very curious about Turing and his (then still unpublished) work on biology, I arranged to visit the Turing Archive at King’s College, Cambridge. Soon I’d gone through what they had of Turing’s technical papers, and with some time to spare, I thought I might as well ask to see his personal correspondence too. And flipping through it, I suddenly saw a couple of letters from Alan Turing to Norman Routledge.

By that time, Andrew Hodges’s biography—which did so much to make Turing famous—had appeared, and it confirmed that, yes, Alan Turing and Norman Routledge had indeed been friends, and in fact Turing had been Norman’s PhD advisor. I wanted to ask Norman about Turing, but by then Norman was retired and something of a recluse. Still, when I finished A New Kind of Science in 2002 (after my own decade of reclusiveness) I tracked him down and sent him a copy of the book with an inscription describing him as “My last mathematics teacher”. Some correspondence ensued, and in 2005 I was finally in England again, and arranged to meet Norman for a quintessentially English tea at a fancy hotel in London.

We had a lovely chat about many things, including Alan Turing. Norman started by saying that he’d really known Turing mostly socially—and that that was 50 years ago. But still he had plenty to say about him. “He was a loner.” “He giggled a lot.” “He couldn’t really talk to non-mathematicians.” “He was always afraid of upsetting his mother.” “He would go off in the afternoon and run a marathon.” “He wasn’t very ambitious (though ‘one wasn’t’ at King’s in those days).” Eventually the conversation came back to Norman. He said that even though he’d been retired for 16 years, he still contributed items to the Mathematical Gazette, in order, he said, “to unload things before I pass to a better place”, where, he added, somewhat impishly, “all mathematical truths will surely be revealed”. When our tea was finished, Norman donned his signature leather jacket and headed for his moped, quite oblivious to the bombings that had so disrupted transportation in London on that particular day.

That was the last time I saw Norman, and he died in 2013. But now, six years later, as I sat at breakfast with George Rutter, here was this note from him, written in 2002 in his characteristically lively handwriting:

Norman's letter


I read it quickly at first. It was colorful as always:

I got Alan Turing’s book from his friend & executor Robin Gandy (it was quite usual at King’s for friends to be offered books from a dead man’s library—I selected the collected poems of A. E. Housman from the books of Ivor Ramsay as a suitable memento: he was the Dean & jumped off the chapel [in 1956])…

Later in the note he said:

You ask about where, eventually, the book should go—I would prefer it to go to someone (or some where) wh. wd. appreciate the Turing connection, but really it is up to you.

Stephen Wolfram sent me his impressive book, but I’ve done no more than dip into it…

He ended by congratulating George Rutter for having the courage to move (as it turned out, temporarily) to Australia in his retirement, saying that he’d “toyed with moving to Sri Lanka, for a cheap, lotus-eating existence”, but added “events there mean I was wise not to do so” (presumably referring to the Sri Lankan Civil War).

What’s In the Book?

OK, so here I was with a copy of a book in German written by Paul Dirac, that was at one time owned by Alan Turing. I don’t read German, and I’d had a copy of the same book in English (which was its original language) since the 1970s. Still, as I sat at breakfast, I thought it only proper that I should look through the book page by page. After all, that’s a standard thing one does with antiquarian books.

I have to say that I was struck by the elegance of Dirac’s presentation. The book was published in 1931, yet its clean formalism (and, yes, despite the language barrier, I could read the math) is pretty much as one would write it today. (I don’t want to digress too much about Dirac, but my friend Richard Feynman told me that at least to him, Dirac spoke only monosyllabically. Norman Routledge told me that he had been friends in Cambridge with Dirac’s stepson, who became a graph theorist. Norman quite often visited the Dirac household, and said the “great man” was sometimes in the background, always with lots of mathematical puzzles around. I myself unfortunately never met Dirac, though I’m told that after he finally retired from Cambridge and went to Florida, he lost much of his stiffness and became quite social.)

But back to Turing’s copy of Dirac’s book. On page 9 I started to see underlinings and little marginal notes, all written in light pencil. I kept on flipping pages. After a few chapters, the annotations disappeared. But then, suddenly, tucked into page 127, there was a note:

German note

It was in German, with what looked like fairly typical older German handwriting. And it seemed to have something to do with Lagrangian mechanics. By this point I’d figured out that someone must have had the book before Turing, and this must be a note made by that person.

I kept flipping through the book. No more annotations. And I was thinking I wouldn’t find anything else. But then, on page 231, a bookmark—with a charmingly direct branding message:

Heffers bookmark

Would there be anything more? I continued flipping. Then, near the end of the book, on page 259, in a section on the relativistic theory of electrons, I found this:

Folded note

I opened the piece of paper:

Opened note

I recognized it immediately: it’s lambda calculus, with a dash of combinators. But what on Earth was it doing here? Remember, the book is about quantum mechanics. But this is about mathematical logic, or what’s now considered theory of computation. Quintessential Turing stuff. So, I immediately wondered, did Turing write this page?

Even as we were sitting at breakfast, I was looking on the web for samples of Turing’s handwriting. But I couldn’t find many calculational ones, so couldn’t immediately conclude much. And soon I had to go, carefully packing the book away, ready to pursue the mystery of what this page was, and who wrote it.

About the Book

Before anything else, let’s talk about the book itself. Dirac’s The Principles of Quantum Mechanics was published in English in 1930, and very quickly also appeared in German. (The preface by Dirac is dated May 29, 1930; the one from the translator—Werner Bloch—August 15, 1930.) The book was a landmark in the development of quantum mechanics, systematically setting up a clear formalism for doing calculations, and, among other things, explaining Dirac’s prediction of the positron, which would be discovered in 1932.

Why did Alan Turing get the book in German rather than English? I don’t know for sure. But in those days, German was the leading language of science, and we know Alan Turing knew how to read it. (After all, the title of his famous Turing machine paper “On Computable Numbers, with an Application to the Entscheidungsproblem” had a great big German word in it—and within the body of the paper he referred to the rather obscure Gothic characters he used as “German letters”, contrasting them, for example, with Greek letters.)

Did Alan Turing buy the book, or was he given it? I don’t know. On the inside front cover of Turing’s copy of the book is a pencil notation “20/-”, which was standard notation for “20 shillings”, equal to £1. On the right-hand page, there’s an erased “26.9.30”, presumably meaning 26 September, 1930—perhaps the date when the book was first in inventory. Then to the far right, there’s an erased “20-.”, perhaps again the price. (Could this have been a price in Reichsmarks, suggesting the book was sold in Germany? Even though at that time 1 RM was worth roughly 1 shilling, a German price would likely have been written as, for example, “20 RM”.) Finally, on the inside back cover there’s “c 5/-”—maybe the (highly discounted) price for the book used.

Let’s review the basic timeline. Alan Turing was born June 23, 1912 (coincidentally, exactly 76 years before Mathematica 1.0 was released). He went as an undergraduate to King’s College, Cambridge in the fall of 1931. He got his undergraduate degree after the usual three years, in 1934.

In the 1920s and early 1930s, quantum mechanics was hot, and Alan Turing was interested in it. From his archives, we know that in 1932—as soon as it was published—he got John von Neumann’s Mathematical Foundations of Quantum Mechanics (in its original German). We also know that in 1935, he asked the Cambridge physicist Ralph Fowler for a possible question to study in quantum mechanics. (Fowler suggested computing the dielectric constant of water—which actually turns out to be a very hard problem, basically requiring full-fledged interacting quantum field theory analysis, and still not completely solved.)

When and how did Turing get his copy of Dirac’s book? Given that there seems to be a used price in the book, Turing presumably bought it used. Who was its first owner? The annotations in the book seem to be concerned primarily with logical structure, noting what should be considered an axiom, what logically depends on what, and so on. What about the note tucked into page 127?

Well, perhaps coincidentally, page 127 isn’t just any page: it’s the page where Dirac talks about the quantum principle of least action, and sets the stage for the Feynman path integral—and basically all modern quantum formalism. But what does the note say? It’s expanding on equation 14, which is an equation for the time evolution of a quantum amplitude. The writer has converted Dirac’s A for amplitude into a ρ, possibly reflecting an earlier (fluid-density analogy) German notation. Then the writer attempts an expansion of the action in powers of ℏ (Planck’s constant over 2π, sometimes called Dirac’s constant).

But there doesn’t seem to be a lot to be gleaned from what’s on the page. Hold the page up to the light, though, and there’s a little surprise—a watermark reading “Z f. Physik. Chem. B”:

Z f. Physik. Chem. B watermark

That’s a short form of Zeitschrift für physikalische Chemie, Abteilung B—a German journal of physical chemistry that began publication in 1928. Was the note perhaps written by an editor of the journal? Here’s the masthead of the journal for 1933. Conveniently, the editors are listed with their locations, and one stands out: Born · Cambridge.

Zeitschrift für physikalische Chemie, Abteilung B

That’s Max Born, of the Born interpretation, and many other things in quantum mechanics (and also the grandfather of the singer Olivia Newton-John). So, was this note written by Max Born? Unfortunately it doesn’t seem like it: the handwriting doesn’t match.

OK, so what about the bookmark at page 231? Here are the two sides of it:

Heffers bookmark

The marketing copy is quaint and rather charming. But when is it from? Well, there’s still a Heffers Bookshop in Cambridge, though it’s now part of Blackwell’s. But for more than 70 years (ending in 1970) Heffers was located, as the bookmark indicates, at 3 and 4 Petty Cury.

But there’s an important clue on the bookmark: the phone number is listed as “Tel. 862”. Well, it turns out that in 1939, most of Cambridge (including Heffers) switched to 4-digit numbers, and certainly by 1940 bookmarks were being printed with “modern” phone numbers. (English phone numbers progressively got longer; when I was growing up in England in the 1960s, our phone numbers were “Oxford 56186” and “Kidmore End 2378”. Part of why I remember these numbers is the now-strange-seeming convention of always saying one’s number when answering the phone.)

But, OK, so the bookmark was from before 1939. But how much before? There are quite a few scans of old Heffers ads to be found on the web—and from at least 1912 (along with “We solicit the favour of your enquiries…”) they list “Telephone 862”, helpfully adding “(2 lines)”. And there are even some bookmarks with the same design to be found in copies of books from as long ago as 1904 (though it’s not clear they were original to the books). But for our purposes it seems as if we can reasonably conclude that our book came from Heffers (which was the main bookstore in Cambridge, by the way) sometime between 1930 and 1939.

The Lambda Calculus Page

OK, so we know something about when the book was bought. But what about the “lambda calculus page”? When was it written? Well, of course, lambda calculus had to have been invented. And that was done by Alonzo Church, a mathematician at Princeton, in an initial form in 1932, and in final form in 1935. (There had been precursors, but they hadn’t used the λ notation.)

There’s a complicated interaction between Alan Turing and lambda calculus. It was in 1935 that Turing had gotten interested in “mechanizing” the operations of mathematics, and had invented the idea of a Turing machine, and used it to solve a problem in the foundations of mathematics. Turing had sent a paper about it to a French journal (Comptes rendus), but initially it was lost in the mail; and then it turned out the person he’d sent it to wasn’t around anyway, because they’d gone to China.

But in May 1936, before Turing could send his paper anywhere else, Alonzo Church’s paper arrived from the US. Turing had been “scooped” once before, when in 1934 he created a proof of the central limit theorem, only to find that there was a Norwegian mathematician who’d already given a proof in 1922.

It wasn’t too hard to see that Turing machines and lambda calculus were actually equivalent in the kinds of computations they could represent (and that was the beginning of the Church–Turing thesis). But Turing (and his mentor Max Newman) got convinced that Turing’s approach was different enough to deserve separate publication. And so it was that in November 1936 (with a bug fix the following month), Turing’s famous paper “On Computable Numbers…” was published in the Proceedings of the London Mathematical Society.

To fill in a little more of the timeline: from September 1936 to July 1938 (with a break of three months in the summer of 1937), Turing was at Princeton, having gone there to be, at least nominally, a graduate student of Alonzo Church. While at Princeton, Turing seems to have concentrated pretty completely on mathematical logic—writing several difficult-to-read papers full of Church’s lambdas—and most likely wouldn’t have had a book about quantum mechanics with him.

Turing was back in Cambridge in July 1938, but already by September of that year he was working part-time for the Government Code and Cypher School—and a year later he moved to Bletchley Park to work full time on cryptanalysis. After the war ended in 1945, Turing moved to London to work at the National Physical Laboratory on producing a design for a computer. He spent the 1947–8 academic year back in Cambridge, but then moved to Manchester to work on building a computer there.

In 1951, he began working in earnest on theoretical biology. (To me it’s an interesting irony that he seems to have always implicitly assumed that biological systems have to be modeled by differential equations, rather than by something discrete like Turing machines or cellular automata.) He also seems to have gotten interested in physics again, and by 1954 even wrote to his friend and student Robin Gandy that “I’ve been trying to invent a new Quantum Mechanics” (though he added, “but it won’t really work”). But all this came to an end on June 7, 1954, when Turing suddenly died. (My own guess is that it was not suicide, but that’s a different story.)

OK, but back to the lambda calculus page. Hold it up to the light, and once again there’s a watermark:

Excelsior watermark

So it’s a British-made piece of paper, which seems, for example, to make it unlikely to have been used in Princeton. But can we date the paper? Well, after some help from the British Association of Paper Historians, we know that the official manufacturer of the paper was Spalding & Hodge, Papermakers, Wholesale and Export Stationers of Drury House, Russell Street off Drury Lane, Covent Garden, London. But this doesn’t help as much as one might think—because their Excelsior brand of machine-made paper seems to have been listed in catalogs all the way from the 1890s to 1954.

What Does the Page Say?

What does the page say?

OK, so let’s talk in more detail about what’s on the two sides of the page. Let’s start with the lambdas.

These are a way of defining “pure” or “anonymous” functions, and they’re a core concept in mathematical logic, and nowadays also in functional programming. They’re common in the Wolfram Language, and they’re pretty easy to explain there. One writes f[x] to mean a function f applied to an argument x. And there are lots of named functions that f can be—like Abs or Sin or Blur. But what if one wants f[x] to be 2x+1? There’s no immediate name for that function. But is there still something we can write for f that will make f[x] be this?

The answer is yes: in place of f we write Function[a, 2a+1]. And in the Wolfram Language, Function[a, 2a+1][x] is defined to give 2x+1. The Function[a, 2a+1] is a “pure” or “anonymous” function, that represents the pure operation of doubling and adding 1.

Well, λ in lambda calculus is the exact analog of Function in the Wolfram Language—and so for example λa.(2a+1) is equivalent to Function[a, 2a+1]. (It’s worth noting that Function[b, 2b+1] is equivalent; the “bound variable” a or b is just a placeholder—and in the Wolfram Language it can be avoided by using the alternative notation (2#+1)&.)
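A few lines that can be tried directly give a feel for this. This is just a small sketch of the forms mentioned above; the symbol x is assumed to have no value assigned.

```wolfram
(* A pure function applied to a symbolic argument *)
Function[a, 2 a + 1][x]            (* 1 + 2 x *)

(* The name of the bound variable doesn't matter *)
Function[b, 2 b + 1][x]            (* 1 + 2 x *)

(* The same function in slot notation, with no named variable at all *)
(2 # + 1) &[x]                     (* 1 + 2 x *)

(* Like any function, it can be mapped over a list *)
Function[a, 2 a + 1] /@ {1, 2, 3}  (* {3, 5, 7} *)
```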

In traditional mathematics, functions tend to be thought of as things that map inputs (like, say, integers) to outputs (that are also, say, integers). But what kind of a thing is Function (or λ)? It’s basically a structural operator that takes expressions and turns them into functions. That’s a bit weird from the point of view of traditional mathematics and mathematical notation. But if one’s thinking about manipulating arbitrary symbols, it’s much more natural, even if at first it still seems a little abstract. (And, yes, when people learn the Wolfram Language, I can always tell they’ve passed a certain threshold of abstract understanding when they get the idea of Function.)

OK, but the lambdas are just part of what’s on the page. There’s also another, yet more abstract concept: combinators. See the rather obscure-looking line PI1IIx? What does it mean? Well, it’s a sequence of combinators, or effectively, a kind of abstract composition of symbolic functions.

Ordinary composition of functions is pretty familiar from mathematics. And in Wolfram Language one can write f[g[x]] to mean “apply f to the result of applying g to x”. But does one really need the brackets? In the Wolfram Language f@g@x is an alternative notation. But in this notation, we’re relying on a convention in the Wolfram Language: that the @ operator associates to the right, so that f@g@x is equivalent to f@(g@x).

But what would (f@g)@x mean? It’s equivalent to f[g][x]. And if f and g were ordinary functions in mathematics, this would basically be meaningless. But if f is a higher-order function, then f[g] can itself be a function, which can perfectly well be applied to x.
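Here is a small sketch of the difference between the two groupings. The symbols f, g and x are assumed to be undefined, and twice is a hypothetical higher-order function introduced only for this illustration.

```wolfram
(* @ associates to the right *)
f @ g @ x === f[g[x]]        (* True *)

(* Explicit parentheses give the other grouping *)
(f @ g) @ x === f[g][x]      (* True *)

(* A higher-order function: twice[g] is itself a function *)
twice = Function[h, Function[z, h[h[z]]]];

twice[g][x]                  (* g[g[x]] *)
(twice @ g) @ x              (* g[g[x]] *)
```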

OK, there’s another piece of complexity here. In f[x] the f is a function of one argument. And f[x] is equivalent to Function[a, f[a]][x]. But what about a function of two arguments, say f[x, y]? This can be written Function[{a,b}, f[a, b]][x, y]. But what about Function[{a}, f[a, b]]? What would this be? It’s got a “free variable” b just hanging out. Function[{b}, Function[{a}, f[a, b]]] would “bind” that variable. And then Function[{b}, Function[{a}, f[a, b]]][y][x] gives f[x, y] again. (The process of unwinding functions so that they have single arguments is called “currying”, after a logician named Haskell Curry.)
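The currying step can be checked directly. In this sketch f, x and y are assumed to be undefined symbols, and curried is just a name picked for the example.

```wolfram
(* A function of two arguments *)
Function[{a, b}, f[a, b]][x, y]       (* f[x, y] *)

(* The curried form: take b first, and return a function that takes a *)
curried = Function[{b}, Function[{a}, f[a, b]]];

curried[y]       (* a one-argument function in which b has been fixed to y *)
curried[y][x]    (* f[x, y] *)
```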

If there are free variables, then there’s all sorts of complexity about how functions can be composed. But if we restrict ourselves to Function or λ objects that don’t have free variables, then these can basically be freely composed. And such objects are called combinators.

Combinators have a long history. So far as one knows, they were first invented in 1920 by a student of David Hilbert’s named Moses Schönfinkel. At the time, it had only recently been discovered that one didn’t need And and Or and Not to represent expressions in standard propositional logic: it was sufficient to use the single operator that we’d now call Nand (because, for example, writing Nand as ·, Or[a, b] is just (a·a)·(b·b)). Schönfinkel wanted to find the same kind of minimal representation of predicate logic, or in effect, logic including functions.
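The Nand identity for Or is easy to check by brute force over all truth values. The helper name orViaNand is made up for this sketch.

```wolfram
(* Or written using only Nand: (a·a)·(b·b) *)
orViaNand[a_, b_] := Nand[Nand[a, a], Nand[b, b]]

(* Compare with the built-in Or on all four combinations of inputs *)
Table[
 orViaNand[a, b] === Or[a, b],
 {a, {True, False}}, {b, {True, False}}
]
(* {{True, True}, {True, True}} *)
```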

And what he came up with was the two “combinators” S and K. In Wolfram Language notation, K[x_][y_] → x and S[x_][y_][z_] → x[z][y[z]]. Now, here’s the remarkable thing: it turns out to be possible to use these two combinators to perform any computation. So, for example, S[K[S]][S[K[S[K[S]]]][S[K[K]]]] can be used as a function to add two integers.
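One way to experiment with this is to treat the two definitions as rewrite rules and apply them repeatedly. In the sketch below, lowercase s and k are used to stay clear of built-in symbols, and skRules and adder are names chosen for illustration. The symbols m, n, f and x are assumed to be undefined; m and n stand for two numbers in the "apply f that many times" encoding.

```wolfram
(* The two combinator definitions as rewrite rules *)
skRules = {k[x_][y_] :> x, s[x_][y_][z_] :> x[z][y[z]]};

(* A classic check: s[k][k] behaves as the identity *)
s[k][k][a] //. skRules             (* a *)

(* The adder expression from the text, applied symbolically *)
adder = s[k[s]][s[k[s[k[s]]]][s[k[k]]]];

adder[m][n][f][x] //. skRules      (* m[f][n[f][x]] *)
```

The final form m[f][n[f][x]] says: apply f to x n times, then apply f to the result m more times, which is what adding m and n means in this encoding.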

It is, to put it mildly, quite abstract stuff. But now that one’s understood Turing machines and lambda calculus, it’s possible to see that Schönfinkel’s combinators actually anticipated the concept of universal computation. (And what’s more remarkable still, the definitions of S and K from 1920 are almost minimally simple, reminiscent of the very simplest universal Turing machine that I finally suggested in the 1990s, and that was proved universal in 2007.)

But back to our page, and the line PI1IIx. The symbols here are combinators, and they’re all intended to be composed. But the convention was that function composition should be left-associative, so that fgx should be interpreted not like f@g@x as f@(g@x) or f[g[x]] but rather like (f@g)@x or f[g][x]. So, translating a bit for convenient Wolfram Language use, PI1IIx is p[i][one][i][i][x].

Why would someone be writing something like this? To explain that, we have to talk about the concept of Church numerals (named after Alonzo Church). Let’s say we’re just working with symbols and with lambdas, or combinators. Is there a way we can use these to represent integers?

Well, how about just saying that a number n corresponds to Function[x, Nest[f, x, n]]? Or, in other words, that (in shorter notation) 1 is f[#]&, 2 is f[f[#]]&, 3 is f[f[f[#]]]&, and so on. This might seem irreducibly obscure. But the reason it’s interesting is that it allows us to do everything completely symbolically and abstractly, without ever having to explicitly talk about something like integers.

With this setup, imagine, for example, adding two numbers: 3 can be represented as f[f[f[#]]]&, and 2 is f[f[#]]&. We can add them just by applying one of them to the other:

f[f[f[#]]] & [f[f[#]] &]


OK, but what is the f supposed to be? Well, just let it be anything! In a sense, “go lambda” all the way, and represent numbers by functions that take f as an argument. In other words, make 3 for example be Function[f, f[f[f[#]]]&] or Function[f, Function[x, f[f[f[x]]]]]. (And, yes, exactly when and how you need to name variables is the bane of lambda calculus.)
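To make this concrete, here is a minimal sketch of Church numerals in the "lambda all the way" form. The names church, toInteger and plus are hypothetical helpers introduced for this illustration, and plus is the standard textbook definition of Church-numeral addition rather than anything taken from the page. The symbols g and z are assumed to be undefined.

```wolfram
(* The Church numeral for a non-negative integer n *)
church[n_Integer?NonNegative] := Function[f, Function[x, Nest[f, x, n]]]

(* Applying a numeral to a function and a starting value iterates the function *)
church[3][g][z]                          (* g[g[g[z]]] *)

(* Decode a numeral by iterating "add 1" starting from 0 *)
toInteger[c_] := c[# + 1 &][0]
toInteger[church[3]]                     (* 3 *)

(* Addition: n applications of the function, followed by m more *)
plus[m_, n_] := Function[h, Function[y, m[h][n[h][y]]]]
toInteger[plus[church[3], church[2]]]    (* 5 *)
```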

Here’s a fragment from Turing’s 1937 paper “Computability and λ-Definability” that sets things up exactly as we just discussed:

Fragment from "Computability and λ-Definability"

The notation is a little confusing. Turing’s x is our f, while his x' (the typesetter did him no favor by inserting space) is our x. But it’s exactly the same setup.

OK, so let’s take a look at the line right after the fold on the front of the page. It’s I1IIYI1IIx. In Wolfram Language notation this would be i[one][i][i][y][i][one][i][i][x]. But here, i is the identity function, so i[one] is just one. Meanwhile, one is the Church numeral for 1, or Function[f, f[#]&]. But with this definition one[a] becomes a[#]& and one[a][b] becomes a[b]. (By the way, i[a][b], or Identity[a][b], is also a[b].)

It keeps things cleaner to write the rules for i and one using pattern matching rather than explicit lambdas, but the result is the same. Apply these rules and one gets:

i[one][i][i][y][i][one][i][i][x] //. {i[x_] → x, one[x_][y_] → x[y]}

And that’s exactly the same as the first reduction shown:

Excerpt 1

OK, let’s look higher on the page again now:

Excerpt 2

There’s a rather confusing “E” and “D”, but underneath these say “P” and “Q”, so we can write out the expression, and evaluate it (note that here—after some confusion with the very last character—the writer makes both [ ... ] and ( … ) represent function application):

Function[a, a[p]][q]

OK, so this is the first reduction shown. To see more, let’s substitute in the form of Q:

q[p] /. q → Function[f, f[i][one][i][i][x]]

We get exactly the next reduction shown. OK, so what about putting in the form for P?

Excerpt 3

Here’s the result:

p[i][one][i][i][x] /. {p → Function[r, r[Function[s, s[one][i][i][y]]]]}

And now using the fact that i is the identity, we get:

i[Function[s, s[one][i][i][y]]][one][i][i][x] /. {i[x_] → x}

But oops. This isn’t the next line written. Is there a mistake? It’s not clear. Because, after all, unlike in most of the other cases, there isn’t an arrow indicating that the next line follows from the previous one.

OK, so there’s a mystery there. But let’s skip ahead to the bottom of the page:

Excerpt 4

The 2 here is a Church numeral, defined for example by the pattern two[a_][b_] → a[a[b]]. But notice that this is actually the form of the second line, with a being Function[r, r[p]] and b being q. So then we’d expect the reduction to be:

two[Function[r, r[p]]][q] //. {two[x_][y_] → x[x[y]]}

Somehow, though, the innermost a[b] is being written as x (probably different from the x earlier on the page), making the final result instead:

Function[r, r[p]][x]

OK, so we can decode quite a bit of what’s happening on the page. But at least one mystery that remains is what Y is supposed to be.

There’s actually a standard “Y combinator” in combinatory logic: the so-called fixed-point combinator. Formally, this is defined by saying that Y[f] must be equal to f[Y[f]], or, in other words, that Y[f] doesn’t change when f is applied, so that it’s a fixed point of f. (The Y combinator is related to #0 in the Wolfram Language.)
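A minimal sketch of both ideas follows. The name fix is hypothetical, and its definition is written with an extra argument x so that the unfolding fix[f] → f[fix[f]] happens only when a value is actually requested. That is an assumption made here to keep evaluation finite, not part of the formal definition:

```wolfram
(* fix is a hypothetical name for a fixed-point combinator; *)
(* it unfolds one step only when applied to an argument x *)
fix[f_][x_] := f[fix[f]][x]

(* factorial written without referring to itself by name *)
factorial = fix[Function[self, Function[n, If[n <= 1, 1, n self[n - 1]]]]];
factorial[5]
(* 120 *)

(* the same recursion using #0, which stands for the enclosing pure function *)
If[# <= 1, 1, # #0[# - 1]] &[5]
(* 120 *)
```

In both versions the function never needs a global name for itself. With fix, the self-reference is handed in as the argument self, and with #0 it is built into the pure function.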

In modern times, the Y combinator has been made famous by the Y Combinator startup accelerator, named that way by Paul Graham (who had been a longtime enthusiast of functional programming and the LISP programming language—and had written an early web store using it) because (as he once told me) “nobody understands the Y combinator”. (Needless to say, Y Combinator is all about avoiding having companies go to fixed points…)

The Y combinator (in the sense of fixed-point combinator) was invented several times. Turing actually came up with a version of it in 1937, that he called Θ. But is the “Y” on our page the famous fixed-point combinator? Probably not. So what is our “Y”? We see this reduction:

Excerpt 5

But that’s not enough information to uniquely determine what Y is. It’s clear Y isn’t operating just on a single argument; it seems to be dealing with at least two arguments. But it’s not clear (at least to me) how many arguments it’s taking, and what it’s doing.

OK, so even though we can interpret many parts of the page, we have to say that globally it’s not clear what’s been done. But even though it’s needed a lot of explanation here, what’s on the page is actually fairly elementary in the world of lambda calculus and combinators. Presumably it’s an attempt to construct a simple “program”—using lambda calculus and combinators—to do something. But as is typical in reverse engineering, it’s hard for us to tell what the “something”— the overall “explainable” goal—is supposed to be.

There’s one more feature of the page that’s worth commenting on, and that’s its use of brackets. In traditional mathematics one basically (if confusingly) uses parentheses for everything—both function application (as in f(x)) and grouping of terms (as in (1+x)(1-x), or, more ambiguously, a(1-x)). (In Wolfram Language, we separate different uses, with square brackets for function application—as in f[x]—and parentheses only for grouping.)

And in the early days of lambda calculus, there were lots of issues about brackets. Later, Alan Turing would write a whole (unpublished) paper entitled “The Reform of Mathematical Notation and Phraseology”, but already in 1937 he felt he needed to describe the (rather hacky) current conventions for lambda calculus (which were due to Church, by the way).

He said that f applied to g should be written {f}(g), unless f is just a single symbol, in which case it can be f(g). Then he said that a lambda (as in Function[a, b]) should be written λ a[b], or alternatively λ a . b. By perhaps 1940, however, the whole idea of using { … } and [ ... ] to mean different things had been dropped, basically in favor of standard-mathematical-style parentheses.

Look at what’s near the top of the page:

Excerpt 6

As written, this is a bit hard to understand. In Church’s convention, the square brackets would be for grouping, with the opening bracket replacing the dot. And with this convention, it’s clear that the Q (finally labeled D) enclosed in parentheses at the end is what the whole initial lambda is applied to. But actually, the square bracket doesn’t delimit the body of the lambda; instead, it’s actually representing another function application, and there’s no explicit specification of where the body of the lambda ends. At the very end, one can see that the writer changed a closing square bracket to a parenthesis, thereby effectively enforcing Church’s convention—and making the expression evaluate as the page shows.

So what does this little notational tangle imply? I think it strongly suggests that the page was written in the 1930s, or not too long thereafter—before conventions for brackets became clearer.

Whose Handwriting Is It?

OK, so we’ve talked about what’s on the page. But what about who wrote it?

The most obvious candidate would be Alan Turing, since, after all, the page was inside a book he owned. And in terms of content there doesn’t seem to be anything inconsistent with Alan Turing having written it—perhaps even when he was first understanding lambda calculus after getting Church’s paper in early 1936.

But what about the handwriting? Is that consistent with Alan Turing’s? Here are a few surviving samples that we know were written by Alan Turing:

Samples of Alan Turing's handwriting

The running text definitely looks quite different. But what about the notation? At least to my eye, it didn’t look so obviously different—and one might think that any difference could just be a reflection of the fact that the extant samples are pieces of exposition, while our page shows “thinking in action”.

Conveniently, the Turing Archive contains a page where Turing wrote out a table of symbols to use for notation. And comparing this, the letter forms did look to me fairly similar (this was from Turing’s time of studying plant growth, hence the “leaf area” annotation):

Table of Symbols

But I wanted to check further. So I sent the samples to Sheila Lowe, a professional handwriting examiner (and handwriting-based mystery writer) I happen to know—just presenting our page as “sample A” and known Turing handwriting as “sample B”. Her response was definitive, and negative: “The writing style is entirely different. Personality-wise, the writer of sample B has a quicker, more intuitive thinking style than the one of sample A.” I wasn’t yet completely convinced, but decided it was time to start looking at other alternatives.

So if Turing didn’t write this, who did? Norman Routledge said he got the book from Robin Gandy, who was Turing’s executor. So I sent along a “Sample C”, from Gandy:

Sample C

But Sheila’s initial conclusion was that the three samples were likely written by three different people, noting again that sample B came from “the quickest thinker and the one that is likely most willing to seek unusual solutions to problems”. (I find it a little charming that a modern handwriting expert would give this assessment of Turing’s handwriting, given how vociferously Turing’s school reports from the 1920s complained about his handwriting.)

Well, at this point it seemed as if both Turing and Gandy had been eliminated as writers of the page. So who might have written it? I started thinking about people Turing might have lent the book to. Of course, they’d have to be capable of doing calculations in lambda calculus.

I assumed that the person would have to be in Cambridge, or at least in England, given the watermark on the paper. And I took as a working hypothesis that 1936 or thereabouts was the relevant time. So who did Turing know then? We got a list of all math students and faculty at King’s College at the time. (There were 13 known students who started in 1930 through 1936.)

And from these, the most promising candidate seemed to be David Champernowne. He was the same age as Turing, a longtime friend, and also interested in the foundations of mathematics—in 1933 already publishing a paper on what’s now called Champernowne’s constant: 0.12345678910111213… (obtained by concatenating the digits of 1, 2, 3, 4, …, 8, 9, 10, 11, 12, …, and one of the very few numbers known to be “normal” in the sense that every possible block of digits occurs with equal frequency). In 1937, he even used Dirac gamma matrices, as mentioned in Dirac’s book, to solve a recreational math problem. (As it happens, years later, I became quite an aficionado of gamma matrix computations.)

After starting in mathematics, though, Champernowne came under the influence of John Maynard Keynes (also at King’s), and eventually became a distinguished economist, notably doing extensive work on income inequality. (Still, in 1948 he also worked with Turing to design Turbochamp: a chess-playing program that almost became the first ever to be implemented on a computer.)

But where could I find a sample of Champernowne’s handwriting? Soon I’d located his son Arthur Champernowne on LinkedIn, who, curiously, had a degree in mathematical logic, and had been working for Microsoft. He said his father had talked to him quite a lot about Turing’s work, though hadn’t mentioned combinators. He sent me a sample of his father’s handwriting (a piece about algorithmic music composition):

Champernowne's handwriting

One could immediately tell it wasn’t a match (Champernowne’s f’s have loops, etc.)

So who else might it be? I wondered about Max Newman, in many ways Alan Turing’s mentor. Newman had first got Turing interested in “mechanizing mathematics”, was a longtime friend, and years later would be his boss at Manchester in the computer project there. (Despite his interest in computation, Newman always seems to have seen himself first and foremost as a topologist, though his cause wasn’t helped by a flawed proof he produced of the Poincaré conjecture.)

It wasn’t difficult to find a sample of Newman’s handwriting. And no, definitely not a match.

Tracing the Book

OK, so handwriting identification hadn’t worked. And I decided the next thing to do was to try to trace in a bit more detail what had actually happened to the book I had in my hands.

So, first, what was the more detailed story with Norman Routledge? He had gone to King’s College, Cambridge as an undergraduate in 1946, and had gotten to know Turing then (yes, they were both gay). He graduated in 1949, then started doing a PhD with Turing as his advisor. He got his PhD in 1954, working on mathematical logic and recursion theory. He got a fellowship at King’s College, and by 1957 was Director of Studies in Mathematics there. He could have stayed doing this his whole life, but he had broad interests (music, art, architecture, recreational math, genealogy, etc.) and in 1960 changed course, and became a teacher at Eton—where he entertained (and educated) many generations of students (including me) with his eclectic and sometimes outlandish knowledge.

Could Norman Routledge have written the mysterious page? He knew lambda calculus (though, coincidentally, he mentioned at our tea in 2005 that he always found it “confusing”). But his distinctive handwriting style immediately excludes him as a possible writer.

Could the page be somehow associated with a student of Norman’s, perhaps from when he was still in Cambridge? I don’t think so. Because I don’t think Norman ever taught about lambda calculus or anything like it. In writing this piece, I found that Norman wrote a paper in 1955 about doing logic on “electronic computers” (and creating conjunctive normal forms, as BooleanMinimize now does). And when I knew Norman he was quite keen on writing utilities for actual computers (his initials were “NAR”, and he named his programs “NAR…”, with, for example, “NARLAB” being a program for creating textual labels using hole patterns punched in paper tape). But he never talked about theoretical models of computation.

OK, but let’s read Norman’s note inside the book a bit more carefully. The first thing we notice is that he talks about being “offered books from a dead man’s library”. And from the wording, it sounds as if this happened quite quickly after a person died, suggesting that Norman got the book soon after Turing’s death in 1954, and that Gandy didn’t have it for very long. Norman goes on to say that actually he got four books in total, two on pure math, and two on theoretical physics.

Then he says that he gave “the other [physics] one (by Herman Weyl, I think)” to “Sebag Montefiore, a pleasant, clever boy whom you [George Rutter] may remember”. OK, so who is that? I searched for my rarely used Old Etonian Association List of Members. (I have to report that on opening it, I could not help but notice its rules from 1902, the first under “Rights of Members” charmingly being “To wear the Colours of the Association”. I should add that I would probably never have joined this association or got this book but for the insistence of a friend of mine at Eton named Nicholas Kermack, who from the age of 12 planned how he would one day become Prime Minister, but sadly died at the age of 21.)

But in any case, there were five Sebag-Montefiores listed, with quite a distribution of dates. It wasn’t hard to figure out that the appropriate one was probably Hugh Sebag-Montefiore. Small world that it is, it turned out that his family had owned Bletchley Park before selling it to the British Government in 1938. And in 2000, Sebag-Montefiore had written a book about the breaking of Enigma—which is presumably why in 2002 Norman thought to give him a book that had been owned by Turing.

OK, so what about the other books Norman got from Turing? Not having any other way to work out what happened to them, I ordered a copy of Norman’s will. The last clause in the will was classic Norman:

Excerpt from Norman's will

But what the will ultimately said was that Norman’s books should be left to King’s College. And although the complete collection of his books doesn’t seem to be anywhere to be found, the two Turing-owned pure math books that he mentioned in his note are now duly in the King’s College archive collection.

But, OK, so the next question is: what happened to Turing’s other books? I looked up Turing’s will, which seemed to leave them all to Robin Gandy.

Gandy was a math undergraduate at King’s College, Cambridge, who in his last year of college—in 1940—had become friends with Alan Turing. In the early part of the war, Gandy worked on radio and radar, but in 1944 he was assigned to the same unit as Turing, working on speech encipherment. And after the war, Gandy went back to Cambridge, soon starting a PhD, with Turing as his advisor.

Gandy’s war work apparently got him interested in physics, and his thesis, completed in 1952, was entitled “On Axiomatic Systems in Mathematics and Theories in Physics”. What Gandy seems to have been trying to do is to characterize what physical theories are in mathematical logic terms. He talks about type theory and rules of inference, but never about Turing machines. And from what we know now, I think he rather missed the point. And indeed my own work from the early 1980s argued that physical processes should be thought of as computations—like Turing machines or cellular automata—not as things like theorems to be deduced. (Gandy has a rather charming discussion of the order of types involved in physical theories, saying for example that “I reckon that the order of any computable binary decimal is less than eight”. He says that “one of the reasons why modern quantum field theory is so difficult is that it deals with objects of rather high type—functionals of functions…”, eventually suggesting that “we might well take the greatest type in common use as an index of mathematical progress”.)

Gandy mentions Turing a few times in the thesis, noting in the introduction that he owes a debt to A. M. Turing, who “first called my somewhat unwilling attention to the system of Church” (i.e. lambda calculus)—though in fact the thesis has very few lambdas in evidence.

After his thesis, Gandy turned to purer mathematical logic, and for more than three decades wrote papers at the rate of about one per year, and traveled the international mathematical logic circuit. In 1969 he moved to Oxford, and I have to believe that I must have met him in my youth, though I don’t have any recollection of it.

Gandy apparently quite idolized Turing, and in later years would often talk about him. But then there was the matter of the Turing collected works. Shortly after Turing died, Sara Turing and Max Newman had asked Gandy—as Turing’s executor—to organize the publication of Turing’s unpublished papers. Years went by. Letters in the archives record Sara Turing’s frustration. But somehow Gandy never seemed to get the papers together.

Gandy died in 1995, still without the collected works complete. Nick Furbank—a literary critic and biographer of E. M. Forster who Turing had gotten to know at King’s College—was Turing’s literary executor, and finally he swung into action on the collected works. The most contentious volume seemed to be the one on mathematical logic, and for this he enlisted Robin Gandy’s first serious PhD student, a certain Mike Yates—who found letters to Gandy about the collected works that had been unopened for 24 years. (The collected works finally appeared in 2001—45 years after they were started.)

But what about the books Turing owned? In continuing to try to track them down, my next stop was the Turing family, and specifically Turing’s brother’s youngest child, Dermot Turing (who is actually Sir Dermot Turing, as a result of a baronetcy which passed down the non-Alan branch of the Turing family). Dermot Turing (who recently wrote a biography of Alan Turing) told me about “granny Turing” (aka Sara Turing), whose house apparently shared a garden gate with his family’s, and many other things about Alan Turing. But he said the family never had any of Alan Turing’s books.

So I went back to reading wills, and found out that Gandy’s executor was his student Mike Yates. We found out that Mike Yates had retired from being a professor 30 years ago, but was now living in North Wales. He said that in the decades he was working in mathematical logic and theory of computation, he’d never really touched a computer—but finally did when he retired (and, as it happens, discovered Mathematica soon thereafter). He said how remarkable it was that Turing had become so famous—and that when he’d arrived at Manchester just three years after Turing died, nobody talked about Turing, not even Max Newman when he gave a course about logic. Though later on, Gandy would talk about how swamped he was in dealing with Turing’s collected works—eventually leaving the task to Mike.

What did Mike know about Turing’s books? Well, he’d found one handwritten notebook of Turing’s, that Gandy had not given to King’s College, because (bizarrely) Gandy had used it as camouflage for notes he kept about his dreams. (Turing kept dream notebooks too, that were destroyed when he died.) Mike said that notebook had recently been sold at auction for about $1M. And that otherwise he didn’t think there was any Turing material among Gandy’s things.

It seemed like all our leads had dried up. But Mike asked to see the mysterious piece of paper. And immediately he said, “That’s Robin Gandy’s handwriting!” He said he’d seen so much of it over the years. And he was sure. He said he didn’t know much about lambda calculus, and couldn’t really read the page. But he was sure it had been written by Robin Gandy.

We went back to our handwriting examiner with more samples, and she agreed that, yes, what was there was consistent with Gandy’s writing. So finally we had it: Robin Gandy had written our mysterious piece of paper. It wasn’t written by Alan Turing; it was written by his student Robin Gandy.

Of course, some mysteries remain. Presumably Turing lent Gandy the book. But when? The lambda calculus notation seems like it’s from the 1930s. But based on comments in Gandy’s thesis, Gandy probably wouldn’t have been doing anything with lambda calculus until the late 1940s. Then there’s the question of why Gandy wrote it. It doesn’t seem directly related to his thesis, so maybe it was when he was first trying to understand lambda calculus.

I doubt we’ll ever know. But it’s certainly been interesting trying to track it down. And I have to say that the whole process has done much to heighten my awareness of just how complex the stories may be of all those books from past centuries that I own. And it makes me think I’d better make sure I’ve gone through all their pages, just to find out what curious things might be in there…

Thanks for additional help to Jonathan Gorard (local research in Cambridge), Dana Scott (mathematical logic) and Matthew Szudzik (mathematical logic).

]]> 9
<![CDATA[Fifty Years of Mentoring]]> Wed, 21 Aug 2019 20:06:13 +0000 Stephen Wolfram

I’ve been reflecting recently on things I like to do. Of course I like creating things, figuring things out, and so on. But something else I like—that I don’t believe I’ve ever written about before—is mentoring. I’ve been doing it a shockingly long time: my first memories of it date from before I was 10 years old, 50 years ago. Somehow I always ended up being the one giving lots of advice—first to kids my own age, then also to ones somewhat younger, or older, and later to all sorts of people.

I was in England recently, and ran into someone I’d known as a kid nearly 50 years ago—and hadn’t seen since. He’s had a fascinating and successful career, but was kind enough to say that my interactions and advice to him nearly 50 years ago had really been important to him. Of course it’s nice to hear things like that—but as I reflect on it, I realize that mentoring is something I find fulfilling, whether or not I end up knowing that whatever seeds I’ve sown germinate (though, to be clear, I do find it fascinating to see what happens).

Mentoring is not like teaching. It’s something much more individual and personal. It’s about answering the specific “What should I do about X?” questions, and the general “What should I do given who I am?” questions. I’ve always been interested in people—which has been a great asset in identifying and leading people at my company all these years. It’s also what’s gotten me in recent years to write historical biography, and, sadly, to write a rather large number of obituaries.

But there’s something particularly fulfilling to me about mentoring, and about helping and changing outcomes, one person at a time. These days, there are two main populations I end up mentoring: CEOs, and kids. At some level, they’re totally different. But at some level, they’re surprisingly similar.

I like learning things, and I like solving problems. And in the mentoring I do, I’m always doing both these things. I’m hearing—often in quite a lot of detail—about different kinds of situations. And I’m trying to use my skills at problem solving to work out what to do. The constraint is always what is right for this particular person, and what is possible given the world as it is. But it’s so satisfying when one figures it out.

“Have you ever thought of X?” Sometimes, there’ll be an immediate “Oh, that’s a good idea” response. Sometimes one will be told a host of reasons why it can’t work—and then it’s a matter of picking through which objections are real, where all that’s needed is encouragement, and where there are other problems to be solved.

Sometimes my mentoring ends up being about things that have immediate effects on the world, like major strategy decisions for significant companies. Sometimes my mentoring is about things that are—for now—completely invisible to the world, like whether a kid should study this or that.

I tend to find mentoring the most interesting when it’s dealing with things I’ve never dealt with before. Maybe they’re things that are genuinely new in the world—like new situations in the technology industry. Or maybe they’re things that are just new to me, because I’ve never experienced or encountered that particular corner of human experience, or the world.

One thing that’s in common between CEOs and kids is that at some level they tend to be in “anything is possible” situations: they have a wide range of choices they can make about how to lead their companies, or their lives. And they also tend to want to think about the future—and about where they might go.

To be fair, there are both CEOs and kids where I wouldn’t be a particularly useful mentor. And most often that’s when they’re somehow already on some definite track, and where their next several years are largely determined. (Say just following a particular business plan, or a particular educational program.)

In the case of CEO mentoring, there tend to be quite long periods where not much happens, interspersed with occasional urgent crises: deals to do or not, PR emergencies, personnel meltdowns, etc. (And, yes, those calls can come in at the most awkward times, making me glad that when I’m pushing other things aside, at least I can say to myself that I’m typically an official company advisor too, usually with a little equity in the company.)

With kids, things usually tend to be less urgent, and it’s more a matter of repeated interactions, gradually showing a direction, or working through issues. Sometimes—and this applies to CEOs as well—the issues are exogenous, and relate to situations in the world. Sometimes they’re endogenous, and they’re about how someone is motivated, or thinks about themselves or their place in the world.

I’ve found that the kids I find it most interesting to mentor fall into two categories. The first are the seriously precocious kids who are already starting to launch in high-flying directions. And the second are kids who aren’t connected to the high-flying world, and may be in difficult circumstances, but who somehow have some kind of spark that interactions with me can help nurture.

I’ve done a fair amount of traveling around the world in recent years (often with one or more of my own kids). And I always find it interesting to visit schools. (Research universities tend to seem similar all over the world, but as one gets to high schools and below, there are more and more obvious—and interesting—differences.) Usually I’ll give talks and have discussions with students. And there’s a pattern that’s repeated over and over again. At the end of an event, one or two students will come up to me and start an interesting conversation, and eventually I’ll hand them a business card and say: “If you ever want to chat more, send me mail”.

And, yes, the ones I hear from are a very self-selected set. Typically I’ll do an initial phone call to learn more about them. And if it seems like I can be useful, I’ll say, “Let me put you on my list of people I’ll call when I have time”.

I have a busy life, and I like to be as productive as possible. But there are always scraps of time when I’m not doing what I usually do. Maybe I’ll be driving from here to there at a time when there’s no useful meeting I can do. Maybe I’ll be procrastinating starting something because I’m not quite in the right frame of mind. And at those kinds of times it’s great to do a mentoring phone call. Because even if I’m hearing about all sorts of problems, I always find it energizing.

With CEOs, the problems can be big and sophisticated. With kids one might at first assume they’d be too familiar and low-level to be interesting. But at least for me, that’s not the case. Sometimes it’s that I started my career sufficiently early that I never personally encountered that kind of problem. Sometimes it’s that the problems are ones that newly exist only in recent years.

And particularly for kids in difficult circumstances, it’s often that with my particular trajectory in life I’ve just never been exposed to those kinds of problems. Sometimes I’m quite embarrassed at how clueless I am about some economic or social hardship a kid tells me about. But I’ll ask lots of questions—and often I’m quite proud of the solutions I’ll come up with.

I have to say that in modern times, it’s disappointing how difficult it tends to be for someone like me to reach kids who aren’t already connected to the rather high-flying parts of the world I usually deal with. There’s an example with our (very successful, I might add) Wolfram High School Summer Camp, which we’ve been putting on for the past seven years. We’ve always got great kids at the Summer Camp. But in the first few years, I noticed that almost all of them came from the most elite schools—usually on the East Coast or West Coast of the US, and generally had very sophisticated backgrounds.

I wanted to broaden things out, and so we put effort into advertising the Summer Camp on our Wolfram|Alpha website that (I’m happy to say) a very large number of kids use. The results were good in the sense that we immediately got a much broader geographic distribution, both within the US and outside. But though we advertised that scholarships and financial aid were available, few people applied for those, and in fact the fraction even seems to have recently been going down slightly.

It’s a frustrating situation, and perhaps it’s a reflection of broader societal issues. Of course, the Summer Camp is a somewhat different situation from mentoring, because to be successful at the Summer Camp, kids already have to have (or give themselves) a certain amount of preparation (learn at least the basics of the Wolfram Language, etc.). And in fact, it’s not uncommon for kids I’ve mentored to end up going to the Summer Camp. And from that point on (or, for example, when they go to some good college), they’re often basically “solved problems”, now connected to people and resources that will help take them forward.

When my company was young, I often found myself mentoring employees. But as the company grew, and developed a strong internal culture, that became less and less necessary because in a sense, the whole ambient environment provided mentoring. And, yes, as is typical in companies, my values as founder and CEO are (for better or worse) deeply imprinted on the organization. And part of what that means is that I don’t personally have to communicate them to everyone in the organization.

In a company it clearly makes sense to promote a certain coherent set of goals and values. But what about in the world at large, or, say, in kids one mentors? There’s always a great tendency to promote—often with missionary zeal—the kind of thing one does oneself. “Everyone should want to be a tech entrepreneur!” “Everyone should want to be a professor!” etc. And, yes, there will be people for whom those are terrific directions, and unless someone mentors them in those directions, they’ll never find them. But what about all the others?

I did some surveys of kids a couple of years ago, asking them about their goals. I asked them to say how interested they were in things like having their own reality TV show, making a billion dollars, making a big scientific discovery, having lots of friends, taking a one-way trip to Mars, etc. And, perhaps not surprisingly, there was great diversity in their answers. I asked some adults the same questions, and then asked them how they thought their answers would have been different when they were kids.

And my very anecdotal conclusion was that at least at this coarse level, the things people say they’d like to do or have done change fairly little over the course of their lives—at least after their early teenage years. Of course, an important goal of education should surely be to show people what’s out there in the world, and what it’s possible to do. In practice, though, much of modern formal education is deeply institutionalized in particular tracks that were defined a century ago. But still there are signals to be gleaned.

So you like math in school? The number of people who just do math for a living is pretty small. But what is the essence of what you like about math? Is it the definiteness of it? The problem solving? The abstract aesthetics? The measurable competitiveness? If you’re mentoring a kid you should be able to parse it out—and depending on the answer there’ll be all sorts of different possible directions and opportunities.

And in general, my point of view is that the goal should always be to try to find signals from people, and then to see how to help amplify them, and solve the problem of how to fit them into what’s possible in the world. I like to think that for every person there’s something out there that’s the best fit for what they should be doing. Of course, you may be lucky or unlucky in the time in history in which you live. You want to be an explorer, doing things like searching for the sources of rivers? Sorry, that’s been done. You want to be an asteroid miner or a genetic designer of animals? Sorry, you’re too early.

In a company, I view it as a major role—and responsibility—of management to take the skills and talents of the people one has, and solve the puzzle of fitting them into the projects that the company needs to do. Quite often one ends up suggesting quite new directions to people. (“I had no idea there was a thing like software quality assurance.” “Linguistic curation is something people do?” etc.) And over the years it’s been very satisfying to see how many successful careers I’ve been able to help launch by pointing people to new fields where it turns out their skills and interests are a match.

I don’t claim to be immune to the “encourage people to do what you do” phenomenon. And in a sense that informs the people—CEOs or kids—who I mentor. But I like to think that I’m unprejudiced about subject areas (and the more experience I get in the world, and with different kinds of people, the easier that gets). What does tend to be in common, though, is that I believe in just figuring out what to do, and doing it.

Too few people have had serious experience in going from “nothing to something”: of starting from some idea that just got invented, and then seeing it over the course of time turn into something real—and perhaps even important—in the world. But that’s the kind of thing I’ve spent my life doing, and that I try to do all the time.

And (at least given my worldview) I think it’s something that’s incredibly valuable and educational for people to see, and if possible experience for themselves. When people at the company have been involved in major “nothing-to-something” projects, I think there’s a certain glow of confidence they get that lasts a decade.

I can see that my own children have benefitted from watching so many projects of mine go from nothing to something—and being exposed to the process that’s been involved (and often giving their own input). And when I mentor kids (and often CEOs too) I like to mention projects I’ve got going on, so that over the course of time they too gradually get a sense of at least my version of the “nothing-to-something” process.

For the past several years, I’ve spent a couple of hours most Sundays doing “Computational Adventures” with groups of kids (mostly middle school, with some early high school, and some late elementary school). It’s been fascinating for me, especially as I try to understand more about teaching computational thinking. And of course it’s invigorating for me to be doing something so different from my typical “day job”.

Most of the time what I’ll actually do with the kids is try to figure out or build something with the Wolfram Language. It’s not the same kind of thing as mentoring individual kids, but there’s a little bit of “create something from nothing” when we develop ideas and implement them in the Wolfram Language.

I think to most kids, knowledge is something that just exists, not something that they know people create. And so it’s always fun when the kids bring up a topic, and I’m like “well, it so happens that the world expert on that is a friend of mine”, or, “well, actually, I was the one who discovered this or that!”. Like in mentoring, all this helps communicate the “you can do that too” message. And after a while, it’s something that kids just start to take for granted.

One of the features of having done mentoring for so long is that I’ve been able to see all sorts of long-term outcomes. Sometimes it’s a bit uncanny. I’ll be talking to some kid, and I’ll think to myself: “They’re just like that kid I knew 50 years ago!” And then I’ll start playing out in my mind what I think would naturally happen this time around, decades hence. And it’s the same with CEOs and their issues.

And, yes, it’s useful to have the experience, and to be able to make those predictions. But there’s still the problem solving about the present to do, and the human connection to make. And for me it all adds up to the fascinating and fulfilling experience I’ve had in doing all that mentoring over the past half-century or so.

Often it’s been some random coincidence that’s brought a particular mentoree to me. Sometimes it’s been their initiative in reaching out (or, very occasionally, someone reaching out on their behalf). I’m hoping that in the future (particularly when it comes to kids), it’ll be a still broader cross-section. And that in the years to come I’ll have the pleasure of successfully answering ever more of those “What should I do?” questions—that make me think about something I’ve never thought about before, and help someone follow the path they want.

]]> 2
<![CDATA[Mitchell Feigenbaum (1944‑2019), <span class="wordwrap">4.66920160910299067185320382…</span>]]> Tue, 23 Jul 2019 18:23:53 +0000 Stephen Wolfram feigenbaum_icon(Artwork by Gunilla Feigenbaum) Behind the Feigenbaum Constant It’s called the Feigenbaum constant, and it’s about 4.6692016. And it shows up, quite universally, in certain kinds of mathematical—and physical—systems that can exhibit chaotic behavior. Mitchell Feigenbaum, who died on June 30 at the age of 74, was the person who discovered it—back in 1975, by [...]]]> feigenbaum_icon
Mitchell Feigenbaum
(Artwork by Gunilla Feigenbaum)

Behind the Feigenbaum Constant

It’s called the Feigenbaum constant, and it’s about 4.6692016. And it shows up, quite universally, in certain kinds of mathematical—and physical—systems that can exhibit chaotic behavior.

Mitchell Feigenbaum, who died on June 30 at the age of 74, was the person who discovered it—back in 1975, by doing experimental mathematics on a pocket calculator.

It became a defining discovery in the history of chaos theory. But when it was first discovered, it was a surprising, almost bizarre result, that didn’t really connect with anything that had been studied before. Somehow, though, it’s fitting that it should have been Mitchell Feigenbaum—who I knew for nearly 40 years—who would discover it.

Trained in theoretical physics, and a connoisseur of its mathematical traditions, Mitchell always seemed to see himself as an outsider. He looked a bit like Beethoven—and projected a certain stylish sense of intellectual mystery. He would often make strong assertions, usually with a conspiratorial air, a twinkle in his eye, and a glass of wine or a cigarette in his hand.

He would talk in long, flowing sentences which exuded a certain erudite intelligence. But ideas would jump around. Sometimes detailed and technical. Sometimes leaps of intuition that I, for one, could not follow. He was always calculating, staying up until 5 or 6 am, filling yellow pads with formulas and stressing Mathematica with elaborate algebraic computations that might run for hours.

He published very little, and what he did publish he was often disappointed wasn’t widely understood. When he died, he had been working for years on the optics of perception, and on questions like why the Moon appears larger when it’s close to the horizon. But he never got to the point of publishing anything on any of this.

For more than 30 years, Mitchell’s official position (obtained essentially on the basis of his Feigenbaum constant result) was as a professor at the Rockefeller University in New York City. (To fit with Rockefeller’s biological research mission, he was themed as the Head of the “Laboratory of Mathematical Physics”.) But he dabbled elsewhere, lending his name to a financial computation startup, and becoming deeply involved in inventing new cartographic methods for the Hammond World Atlas.

What Mitchell Discovered

The basic idea is quite simple. Take a value x between 0 and 1. Then iteratively replace x by a x (1 – x). Let’s say one starts from x = 1/3, and takes a = 3.2. Then here’s what one gets for the successive values of x:

Successive values

ListLinePlot[NestList[Compile[x, 3.2 x (1 - x)], N[1/3], 50], 
 Mesh -> All, PlotRange -> {0, 1}, Frame -> True]

After a little transient, the values of x are periodic, with period 2. But what happens with other values of a? Here are a few results for this so-called “logistic map”:

Logistic map

GraphicsGrid[
 Partition[
  Table[Labeled[
    ListLinePlot[NestList[Compile[x, a x (1 - x)], N[1/3], 50], 
     Mesh -> All, PlotRange -> {0, 1}, Frame -> True, 
     FrameTicks -> None], StringTemplate["a = ``"][a]], {a, 2.75, 
    4, .25}], 3], Spacings -> {.1, -.1}]

For small a, the values of x quickly go to a fixed point. For larger a they become periodic, first with period 2, then 4. And finally, for larger a, the values start bouncing around seemingly randomly.
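Where the fixed point comes from, and why it stops attracting at a = 3, can be worked out directly. The following is a minimal sketch (the names f, slope and tail are just illustrative, not from the original notebook): it solves for the fixed points of x ⟶ a x (1 – x), checks when the slope of the map there has magnitude below 1, and then looks at the last two iterates on either side of a = 3.

```wolfram
(* the logistic map, with the parameter a made explicit *)
f[a_, x_] := a x (1 - x)

(* fixed points: x = 0 and x = 1 - 1/a *)
Solve[f[a, x] == x, x]

(* slope of the map at the nontrivial fixed point; this simplifies to 2 - a *)
slope = Simplify[D[f[a, x], x] /. x -> 1 - 1/a]

(* the fixed point attracts nearby values when |slope| < 1, i.e. for 1 < a < 3 *)
Reduce[Abs[slope] < 1, a, Reals]

(* last two iterates after a long run, starting from 1/3 as in the text *)
tail[a0_] := Take[NestList[f[a0, #] &, N[1/3], 2000], -2]
tail[2.75]  (* two (nearly) equal values: a fixed point *)
tail[3.2]   (* two different values: a period-2 cycle *)
```

With a = 3.2 the two values come out near 0.513 and 0.799, which is the period-2 cycle seen in the first plot.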

One can summarize this by plotting the values of x (here, 300, after dropping the first 50 to avoid transients) reached as a function of the value of a:

Period doublings

ListPlot[Flatten[
  Table[{a, #} & /@ 
    Drop[NestList[Compile[x, a x (1 - x)], N[1/3], 300], 50], {a, 0, 
    4, .01}], 1], Frame -> True, FrameLabel -> {"a", "x"}]

As a increases, one sees a cascade of “period doublings”. In this case, they’re at a = 3, a ≈ 3.449, a ≈ 3.544090, a ≈ 3.5644072. What Mitchell noticed is that these successive values a_n approach a limit (here a_∞ ≈ 3.569946) in a geometric sequence, with a_∞ – a_n ~ δ^(–n) and δ ≈ 4.669.
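The geometric convergence can be checked by hand from the numbers just quoted. This is a small sketch (the variable names are mine): it takes the four period-doubling values from the text, forms the gaps between them, and divides each gap by the next one.

```wolfram
(* period-doubling values of a quoted in the text *)
doublings = {3, 3.449, 3.544090, 3.5644072};

(* gaps between successive doublings *)
gaps = Differences[doublings];

(* ratio of each gap to the next; for a geometric sequence these approach delta *)
ratios = Most[gaps]/Rest[gaps]
```

The two ratios come out at roughly 4.72 and 4.68. Since 3.449 is only quoted to four digits, they should be read as rough, but they are already in the neighborhood of δ ≈ 4.669.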

That’s a nice little result. But here’s what makes it much more significant: it isn’t just true about the specific iterated map x ⟶ a x (1 – x); it’s true about any map like that. Here, for example, is the “bifurcation diagram” for x ⟶ a sin(π √x):

Bifurcation diagram

ListPlot[Flatten[
  Table[{a, #} & /@ 
    Drop[NestList[Compile[x, a Sin[Pi Sqrt@x]], N[1/3], 300], 50], {a,
     0, 1, .002}], 1], Frame -> True, FrameLabel -> {"a", "x"}]

The details are different. But what Mitchell noticed is that the positions of the period doublings again form a geometric sequence, with the exact same base: δ ≈ 4.669.

It’s not just that different iterated maps give qualitatively similar results; when one measures the convergence rate this turns out to be exactly and quantitatively the same—always δ ≈ 4.669. And this was Mitchell’s big discovery: a quantitatively universal feature of the approach to chaos in a class of systems.

The Scientific Backstory

The basic idea behind iterated maps has a long history, stretching all the way back to antiquity. Early versions arose in connection with finding successive approximations, say to square roots. For example, using Newton’s method from the late 1600s, √2 can be obtained by iterating x ⟶ 1/x + x/2 (here starting from x = 1):

Starting from x = 1

NestList[Function[x, 1/x + x/2], N[1, 8], 6]
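The update rule 1/x + x/2 used above is just what Newton's method gives when applied to f(x) = x^2 – 2. As a quick sketch (newtonStep and approx are illustrative names), one can derive it symbolically and confirm that six steps from x = 1 already square to 2 at the working precision:

```wolfram
(* one Newton step for f(x) = x^2 - 2, i.e. x - f(x)/f'(x) *)
newtonStep = Simplify[x - (x^2 - 2)/D[x^2 - 2, x]];

(* difference from the rule used in the text; simplifies to 0 *)
Simplify[newtonStep - (1/x + x/2)]

(* six iterations from 1, carried out with 20-digit arithmetic *)
approx = Last[NestList[Function[x, 1/x + x/2], N[1, 20], 6]];
approx^2
```

The number of correct digits roughly doubles at each step, which is why so few iterations are needed.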

The notion of iterating an arbitrary function seems to have first been formalized in an 1870 paper by Ernst Schröder (who was notable for his work in formalizing things from powers to Boolean algebra), although most of the discussion that arose was around solving functional equations, not actually doing iterations. (An exception was the investigation of regions of convergence for Newton’s approximation by Arthur Cayley in 1879.) In 1918 Gaston Julia made a fairly extensive study of iterated rational functions in the complex plane—inventing, if not drawing, Julia sets. But until fractals in the late 1970s (which soon led to the Mandelbrot set), this area of mathematics basically languished.

But quite independent of any pure mathematical developments, iterated maps with forms similar to x ⟶ a x (1 – x) started appearing in the 1930s as possible practical models in fields like population biology and business cycle theory—usually arising as discrete annualized versions of continuous equations like the Verhulst logistic differential equation from the mid-1800s. Oscillatory behavior was often seen—and in 1954 William Ricker (one of the founders of fisheries science) also found more complex behavior when he iterated some empirical fish reproduction curves.

Back in pure mathematics, versions of iterated maps had also shown up from time to time in number theory. In 1799 Carl Friedrich Gauss effectively studied the map x ⟶ FractionalPart[1/x] in connection with continued fractions. And starting in the late 1800s there was interest in studying maps like x ⟶ FractionalPart[a x] and their connections to the properties of the number a.
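The link to continued fractions is easy to see in a small example. In this sketch (gauss and xs are illustrative names), the map x ⟶ FractionalPart[1/x] is iterated starting from the fractional part of π, and the integer part of 1/x at each step is read off; these integers are the continued fraction terms of π after the leading 3.

```wolfram
(* the map x -> FractionalPart[1/x] *)
gauss = FractionalPart[1/#] &;

(* iterate from the fractional part of Pi, with 50-digit arithmetic *)
xs = NestList[gauss, FractionalPart[N[Pi, 50]], 3];

(* integer parts of 1/x along the orbit *)
Floor[1/xs]                      (* {7, 15, 1, 292} *)

(* built-in continued fraction of Pi, dropping the leading 3 *)
Rest[ContinuedFraction[Pi, 5]]   (* {7, 15, 1, 292} *)
```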

Particularly following Henri Poincaré’s work on celestial mechanics around 1900, the idea of sensitive dependence on initial conditions arose, and it was eventually noted that iterated maps could effectively “excavate digits” in their initial conditions. For example, iterating x ⟶ FractionalPart[10 x], starting with the digits of π, gives (effectively just shifting the sequence of digits one place to the left at each step):

Starting with the digits of pi...

N[NestList[Function[x, FractionalPart[10 x]], N[Pi, 100], 5], 10]


ListLinePlot[
 Rest@N[NestList[Function[x, FractionalPart[10 x]], N[Pi, 100], 50], 
   40], Mesh -> All]

(Confusingly enough, with typical “machine precision” computer arithmetic, this doesn’t work correctly, because even though one “runs out of precision”, the IEEE Floating Point standard says to keep on delivering digits, even though they are completely wrong. Arbitrary precision in the Wolfram Language gets it right.)
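Here is a minimal sketch of that comparison (excavate, exact, arbitrary and machine are illustrative names, not from the original notebook). It reads off decimal digits by iterating the map, once starting from a 100-digit approximation to π and once from a machine-precision one, and compares both with the true digits.

```wolfram
(* first n decimal digits of x0, read off by iterating x -> FractionalPart[10 x] *)
excavate[x0_, n_] := 
 Floor[10 NestList[FractionalPart[10 #] &, FractionalPart[x0], n - 1]]

(* the first 40 decimal digits of Pi, for reference *)
exact = Rest[First[RealDigits[Pi, 10, 41]]];

(* the same digits excavated from a 100-digit and a machine-precision start *)
arbitrary = excavate[N[Pi, 100], 40];
machine = excavate[N[Pi], 40];

arbitrary === exact   (* True *)
machine === exact     (* False: a machine number holds only about 16 digits *)
```

Each step of the iteration uses up one digit of the initial condition, so 40 steps need at least 40 digits to start with; the 100-digit start has room to spare, while the machine-precision start does not.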

Maps like x ⟶ a x(1 – x) show similar kinds of “digit excavation” behavior (for example, replacing x by sin[π u]^2, x ⟶ 4 x(1 – x) becomes exactly u ⟶ FractionalPart[2 u])—and this was already known by the 1940s, and, for example, commented on by John von Neumann in connection with his 1949 iterative “middle-square” method for generating pseudorandom numbers by computer.
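The change of variables can be verified in a couple of lines. This sketch (identity, u0, xs and us are illustrative names) first checks symbolically that substituting x = sin(π u)^2 into 4 x (1 – x) gives sin(2 π u)^2, and then follows one orbit in both descriptions, using the doubling map u ⟶ FractionalPart[2 u] on the u side.

```wolfram
(* 4 x (1 - x) with x = Sin[Pi u]^2, minus Sin[2 Pi u]^2; simplifies to 0 *)
identity = FullSimplify[(4 x (1 - x) /. x -> Sin[Pi u]^2) - Sin[2 Pi u]^2]

(* one orbit, followed both ways from u0 = 1/7 *)
u0 = 1/7;
xs = NestList[4 # (1 - #) &, Sin[Pi u0]^2, 3];
us = NestList[FractionalPart[2 #] &, u0, 3];   (* 1/7, 2/7, 4/7, 1/7 *)

(* the two descriptions agree, up to numerical rounding *)
Chop[N[xs - Sin[Pi us]^2]]
```

Doubling u shifts its binary digits one place to the left, which is the base-2 version of the digit shifting shown above for FractionalPart[10 x].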

But what about doing experimental math on iterated maps? There wasn’t too much experimental math at all on early digital computers (after all, most computer time was expensive). But in the aftermath of the Manhattan Project, Los Alamos had built its own computer (named MANIAC), that ended up being used for a whole series of experimental math studies. And in 1964 Paul Stein and Stan Ulam wrote a report entitled “Non-linear Transformation Studies on Electronic Computers” that included photographs of oscilloscope-like MANIAC screens displaying output from some fairly elaborate iterated maps. In 1971, another “just out of curiosity” report from Los Alamos (this time by Nick Metropolis [leader of the MANIAC project, and developer of the Monte Carlo method], Paul Stein and his brother Myron Stein) started to give more specific computer results for the behavior of logistic maps, and noted the basic phenomenon of period doubling (which they called the “U-sequence”), as well as its qualitative robustness under changes in the underlying map.

But quite separately from all of this, there were other developments in physics and mathematics. In 1963 Ed Lorenz (a meteorologist at MIT) introduced and simulated his “naturally occurring” Lorenz differential equations, that showed sensitive dependence on initial conditions. Starting in the 1940s (but following on from Poincaré’s work around 1900) there’d been a steady stream of developments in mathematics in so-called dynamical systems theory—particularly investigating global properties of the solutions to differential equations. Usually there’d be simple fixed points observed; sometimes “limit cycles”. But by the 1970s, particularly after the arrival of early computer simulations (like Lorenz’s), it was clear that for nonlinear equations something else could happen: a so-called “strange attractor”. And in studying so-called “return maps” for strange attractors, iterated maps like the logistic map again appeared.

But it was in 1975 that various threads of development around iterated maps somehow converged. On the mathematical side, dynamical systems theorist Jim Yorke and his student Tien-Yien Li at the University of Maryland published their paper “Period Three Implies Chaos”, showing that in an iterated map with a particular parameter value, if there’s ever an initial condition that leads to a cycle of length 3, there must be other initial conditions that don’t lead to cycles at all—or, as they put it, show chaos. (As it turned out, Aleksandr Sarkovskii—who was part of a Ukrainian school of dynamical systems research—had already in 1962 proved the slightly weaker result that a cycle of period 3 implies cycles of all periods.)
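For the logistic map x ⟶ a x (1 – x), a cycle of length 3 is easy to exhibit numerically. The following is only a sketch: the value a = 3.832 is my choice of a parameter inside the period-3 window (which opens just above a = 1 + √8 ≈ 3.8284), and aPeriod3 and orbit are illustrative names.

```wolfram
(* a parameter value inside the period-3 window of the logistic map (assumed choice) *)
aPeriod3 = 3.832;

(* run long enough for any transient to die away *)
orbit = NestList[aPeriod3 # (1 - #) &, N[1/3], 10000];

(* the tail repeats with period 3 *)
Take[orbit, -6]
```

What a generic starting point settles onto here is the stable 3-cycle itself. The non-periodic orbits that the Li–Yorke result guarantees do exist at this parameter value, but they form a set that a typical numerical start will not land on.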

But meanwhile there had also been growing interest in things like the logistic maps among mathematically oriented population biologists, leading to the rather readable review (published in mid-1976) entitled “Simple Mathematical Models with Very Complicated Dynamics” by physics-trained Australian Robert May, who was then a biology professor at Princeton (and would subsequently become science advisor to the UK government, and is now “Baron May of Oxford”).

But even though things like sketches of bifurcation diagrams existed, the discovery of their quantitatively universal properties had to await Mitchell Feigenbaum and his discovery.

Mitchell’s Journey

Mitchell Feigenbaum grew up in Brooklyn, New York. His father was an analytical chemist, and his mother was a public-school teacher. Mitchell was unenthusiastic about school, though did well on math and science tests, and managed to teach himself calculus and piano. In 1960, at age 16, as something of a prodigy, he enrolled in the City College of New York, officially studying electrical engineering, but also taking physics and math classes. After graduating in 1964, he went to MIT. Initially he was going to do a PhD in electrical engineering, but he quickly switched to physics.

But although he was enamored of classic mathematical physics (as represented, for example, in the books of Landau and Lifshitz), he ended up writing his thesis on a topic set by his advisor about particle physics, and specifically about evaluating a class of Feynman diagrams for the scattering of photons by scalar particles (with lots of integrals, if not special functions). It wasn’t a terribly exciting thesis, but in 1970 he was duly dispatched to Cornell for a postdoc position.

Mitchell struggled with motivation, preferring to hang out in coffee shops doing the New York Times crossword (at which he was apparently very fast) to doing physics. But at Cornell, Mitchell made several friends who were to be important to him. One was Predrag Cvitanović, a star graduate student from what is now Croatia, who was studying quantum electrodynamics, and with whom he shared an interest in German literature. Another was a young poet named Kathleen Doorish (later, Kathy Hammond), who was a friend of Predrag’s. And another was a rising-star physics professor named Pete Carruthers, with whom he shared an interest in classical music.

In the early 1970s quantum field theory was entering a golden age. But despite the topic of his thesis, Mitchell didn’t get involved, and in the end, during his two years at Cornell, he produced no visible output at all. Still, he had managed to impress Hans Bethe enough to be dispatched for another postdoc position, though now at a place lower in the pecking order of physics, Virginia Polytechnic Institute, in rural Virginia.

At Virginia Tech, Mitchell did even less well than at Cornell. He didn’t interact much with people, and he produced only one three-page paper: “The Relationship between the Normalization Coefficient and Dispersion Function for the Multigroup Transport Equation”. As its title might suggest, the paper was quite technical and quite unexciting.

As Mitchell’s two years at Virginia Tech drew to a close it wasn’t clear what was going to happen. But luck intervened. Mitchell’s friend from Cornell, Pete Carruthers, had just been hired to build up the theory division (“T Division”) at Los Alamos, and given carte blanche to hire several bright young physicists. Pete would later tell me with pride (as part of his advice to me about general scientific management) that he had a gut feeling that Mitchell could do something great, and that despite other people’s input—and the evidence—he decided to bet on Mitchell.

Having brought Mitchell to Los Alamos, Pete set about suggesting projects for him. At first, it was following up on some of Pete’s own work, and trying to compute bulk collective (“transport”) properties of quantum field theories as a way to understand high-energy particle collisions—a kind of foreshadowing of investigations of quark-gluon plasma.

But soon Pete suggested that Mitchell try looking at fluid turbulence, and in particular at whether renormalization group methods might help in understanding it.

Whenever a fluid—like water—flows sufficiently rapidly it forms lots of little eddies and behaves in a complex and seemingly random way. But even though this qualitative phenomenon had been discussed for centuries (with, for example, Leonardo da Vinci making nice pictures of it), physics had had remarkably little to say about it—though in the 1940s Andrei Kolmogorov had given a simple argument that the eddies should form a cascade with a k^(–5/3) distribution of energies. At Los Alamos, though, with its focus on nuclear weapons development (inevitably involving violent fluid phenomena), turbulence was a very important thing to understand—even if it wasn’t obvious how to approach it.

But in 1974, there was news that Ken Wilson from Cornell had just “solved the Kondo problem” using a technique called the renormalization group. And Pete Carruthers suggested that Mitchell should try to apply this technique to turbulence.

The renormalization group is about seeing how changes of scale (or other parameters) affect descriptions (and behavior) of systems. And as it happened, it was Mitchell’s thesis advisor at MIT, Francis Low, who, along with Murray Gell-Mann, had introduced it back in 1954 in the context of quantum electrodynamics. The idea had lain dormant for many years, but in the early 1970s it came back to life with dramatic—though quite different—applications in both particle physics (specifically, QCD) and condensed matter physics.

In a piece of iron at room temperature, you can basically get all electron spins associated with each atom lined up, so the iron is magnetized. But if you heat the iron up, there start to be fluctuations, and suddenly—above the so-called Curie temperature (770°C for iron)—there’s effectively so much randomness that the magnetization disappears. And in fact there are lots of situations (think, for example, melting or boiling—or, for that matter, the formation of traffic jams) where this kind of sudden so-called phase transition occurs.

But what is actually going on in a phase transition? I think the clearest way to see this is by looking at an analog in cellular automata. With the particular rule shown below, if there aren’t very many initial black cells, the whole system will soon be white. But if you increase the number of initial black cells (as a kind of analog of increasing the temperature in a magnetic system), then suddenly, in this case at 50% black, there’s a sharp transition, and now the whole system eventually becomes black. (For phase transition experts: yes, this is a phase transition in a 1D system; one only needs 2D if the system is required to be microscopically reversible.)

GraphicsRow[
 Table[ArrayPlot[
   CellularAutomaton[<|
     "RuleNumber" -> 294869764523995749814890097794812493824, 
     "Colors" -> 4|>, 
    3 Boole[Thread[RandomReal[{0, 1}, 2000] < rho]], {500, {-300, 
      300}}], FrameLabel -> {None, 
     Row[{Round[100 rho], "% black"}]}], {rho, {0.4, 0.45, 0.55, 
    0.6}}], -30]

But what does the system do near 50% black? In effect, it can’t decide whether to finally become black or white. And so it ends up showing a whole hierarchy of “fluctuations” from the smallest scales to the largest. And what became clear by the 1960s is that the “critical exponents” characterizing the power laws describing these fluctuations are universal across many different systems.

But how can one compute these critical exponents? In a few toy cases, analytical methods were known. But mostly, something else was needed. And in the late 1960s Ken Wilson realized that one could use the renormalization group, and computers. One might have a model for how individual spins interact. But the renormalization group gives a procedure for “scaling up” to the interactions of larger and larger blocks of spins. And by studying that on a computer, Ken Wilson was able to start computing critical exponents.

At first, the physics world didn’t pay much attention, not least because they weren’t used to computers being so intimately in the loop in theoretical physics. But then there was the Kondo problem (and, yes, so far as I know, it has no relation to modern Kondoing—though it does relate to modern quantum dot cellular automata). In most materials, electrical resistivity decreases as the temperature decreases (going to zero for superconductors even above absolute zero). But back in the 1930s, measurements on gold had shown instead an increase of resistivity at low temperatures. By the 1960s, it was believed that this was due to the scattering of electrons from magnetic impurities—but calculations ran into trouble, generating infinite results.

But then, in 1975, Ken Wilson applied his renormalization group methods—and correctly managed to compute the effect. There was still a certain mystery about the whole thing (and it probably didn’t help that—at least when I knew him in the 1980s and beyond—I often found Ken Wilson’s explanations quite hard to understand). But the idea that the renormalization group could be important was established.

So how might it apply to fluid turbulence? Kolmogorov’s power law seemed suggestive. But could one take the Navier–Stokes equations which govern idealized fluid flow and actually derive something like this? This was the project on which Mitchell Feigenbaum embarked.

The Big Discovery

The Navier–Stokes equations are very hard to work with. In fact, to this day it’s still not clear how even the most obvious feature of turbulence—its apparent randomness—arises from these equations. (It could be that the equations aren’t a full or consistent mathematical description, and one’s actually seeing amplified microscopic molecular motions. It could be that—as in chaos theory and the Lorenz equations—it’s due to amplification of randomness in the initial conditions. But my own belief, based on work I did in the 1980s, is that it’s actually an intrinsic computational phenomenon—analogous to the randomness one sees in my rule 30 cellular automaton.)

So how did Mitchell approach the problem? He tried simplifying it—first by going from equations depending on both space and time to ones depending only on time, and then by effectively making time discrete, and looking at iterated maps. Through Paul Stein, Mitchell knew about the (not widely known) previous work at Los Alamos on iterated maps. But Mitchell didn’t quite know where to go with it, though having just got a swank new HP-65 programmable calculator, he decided to program iterated maps on it.

Then in July 1975, Mitchell went (as I also did a few times in the early 1980s) to the summer physics hang-out-together event in Aspen, CO. There he ran into Steve Smale—a well-known mathematician who’d been studying dynamical systems—and was surprised to find Smale talking about iterated maps. Smale mentioned that someone had asked him if the limit of the period-doubling cascade a_∞ ≈ 3.56995 could be expressed in terms of standard constants like π and √2. Smale related that he’d said he didn’t know. But Mitchell’s interest was piqued, and he set about trying to figure it out.

He didn’t have his HP-65 with him, but he dove into the problem using the standard tools of a well-educated mathematical physicist, and had soon turned it into something about poles of functions in the complex plane—about which he couldn’t really say anything. Back at Los Alamos in August, though, he had his HP-65, and he set about programming it to find the bifurcation points a_n.

The iterative procedure ran pretty fast for small n. But by n = 5 it was taking 30 seconds. And for n = 6 it took minutes. While it was computing, however, Mitchell decided to look at the a_n values he had so far—and noticed something: they seemed to be converging geometrically to a final value.

At first, he just used this fact to estimate a_∞, which he tried—unsuccessfully—to express in terms of standard constants. But soon he began to think that actually the convergence exponent δ was more significant than a_∞—since its value stayed the same under simple changes of variables in the map. For perhaps a month Mitchell tried to express δ in terms of standard constants.
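To see how geometric convergence lets one estimate the limiting value from just a few bifurcation points, here is a small sketch using the four values quoted earlier (this is an illustration of the idea, not a reconstruction of Mitchell's actual calculator program; the variable names are mine). If each gap is smaller than the previous one by a fixed ratio, the remaining distance to the limit is a geometric series that sums to the last gap divided by (ratio – 1).

```wolfram
(* period-doubling values of a quoted earlier in the text *)
doublings = {3, 3.449, 3.544090, 3.5644072};
gaps = Differences[doublings];

(* convergence ratio estimated from the last two gaps *)
ratio = gaps[[-2]]/gaps[[-1]];

(* last value plus the summed geometric tail: gap/ratio + gap/ratio^2 + ... *)
aInfinity = Last[doublings] + Last[gaps]/(ratio - 1)
```

The result is about 3.56993, which agrees with the limit of 3.569946 quoted earlier to about five significant figures.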

But then, in early October 1975, he remembered that Paul Stein had said period doubling seemed to look the same not just for logistic maps but for any iterated map with a single hump. Reunited with his HP-65 after a trip to Caltech, Mitchell immediately tried the map x ⟶ sin(x)—and discovered that, at least to 3-digit precision, the exponent δ was exactly the same.

He was immediately convinced that he’d discovered something great. But Stein told him he needed more digits to really conclude much. Los Alamos had plenty of powerful computers—so the next day Mitchell got someone to show him how to write a program in FORTRAN on one of them to go further—and by the end of the day he had managed to compute that in both cases δ was about 4.6692.

The computer he used was a typical workhorse US scientific computer of the day: a CDC 6000 series machine (of the same type I used when I first moved to the US in 1978). It had been designed by Seymour Cray, and by default it used 60-bit floating-point numbers. But at this precision (about 14 decimal digits), 4.6692 was as far as Mitchell could compute. Fortunately, however, Pete’s wife Lucy Carruthers was a programmer at Los Alamos, and she showed Mitchell how to use double precision—with the result that he was able to compute δ to 11-digit precision, and determine that the values for his two different iterated maps agreed.

Within a few weeks, Mitchell had found that δ seemed to be universal whenever the iterated map had a single quadratic maximum. But he didn’t know why this was, or have any particular framework for thinking about it. But still, finally, at the age of 30, Mitchell had discovered something that he thought was really interesting.

On Mitchell’s birthday, December 19, he saw his friend Predrag, and told him about his result. But at the time, Predrag was working hard on mainstream particle physics, and didn’t pay too much attention.

Mitchell continued working, and within a few months he was convinced that not only was the exponent δ universal—the appropriately scaled, limiting, infinitely wiggly, actual iteration of the map was too. In April 1976 Mitchell wrote a report announcing his results. Then on May 2, 1976, he gave a talk about them at the Institute for Advanced Study in Princeton. Predrag was there, and now he got interested in what Mitchell was doing.

As so often, however, it was hard to understand just what Mitchell was talking about. But by the next day, Predrag had successfully simplified things, and come up with a single, explicit, functional equation for the limiting form of the scaled iterated map: g(g(x)) = −g(α x)/α, with α ≈ 2.50290—implying that for any iterated map of the appropriate type, the limiting form would always look like an even wigglier version of:

FeigenbaumFunction plot

fUD[z_] = 
  1. - 1.5276329970363323 z^2 + 0.1048151947874277 z^4 + 
   0.026705670524930787 z^6 - 0.003527409660464297 z^8 + 
   0.00008160096594827505 z^10 + 0.000025285084886512315 z^12 - 
   2.5563177536625283*^-6 z^14 - 9.65122702290271*^-8 z^16 + 
   2.8193175723520713*^-8 z^18 - 2.771441260107602*^-10 z^20 - 
   3.0292086423142963*^-10 z^22 + 2.6739057855563045*^-11 z^24 + 
   9.838888060875235*^-13 z^26 - 3.5838769501333333*^-13 z^28 + 
   2.063994985307743*^-14 z^30;
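(* Compile holds its arguments, so the polynomial fUD[\[Zeta]] is injected through a pure function *)
fCF = Compile[{z}, 
    Module[{\[Alpha] = -2.5029078750959130867, n, \[Zeta]},
     (* n = number of rescalings by \[Alpha] needed to bring z into [-1, 1] *)
     n = If[Abs[z] <= 1., 0, Ceiling[Log[-\[Alpha], Abs[z]]]];
     \[Zeta] = z/\[Alpha]^n;
     (* apply the unit-interval approximation 2^n times, then rescale back *)
     Do[\[Zeta] = #, {2^n}];
     \[Alpha]^n \[Zeta]]] &[fUD[\[Zeta]]];
Plot[fCF[x], {x, -100, 100}, MaxRecursion -> 5, PlotRange -> All]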
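One can check numerically that the polynomial fUD defined above behaves like a solution of the functional equation. This is a minimal sketch, assuming the equation is taken in the form g(g(x)) = −g(α x)/α with positive α (the constant is the one used in fCF, with the sign flipped), and assuming fUD is only meant to be evaluated on [−1, 1]. The names alpha and residual are just illustrative:

```wolfram
(* Positive scaling constant; fCF above uses its negative *)
alpha = 2.5029078750959130867;

(* Residual of g(g(x)) + g(alpha x)/alpha, with fUD standing in for g.
   x is kept small enough that alpha x stays inside [-1, 1] *)
residual[x_] := fUD[fUD[x]] + fUD[alpha x]/alpha

(* Sample the residual at a few points in [0, 0.39] *)
Table[residual[x], {x, 0., 0.39, 0.13}]
```

If fUD is a good approximation to g, the residuals should all be very close to zero. At x = 0, for example, the check reduces to fUD[1] ≈ −1/α.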

How It Developed

The whole area of iterated maps got a boost on June 10, 1976, with the publication in Nature of Robert May’s survey about them, written independent of Mitchell and (of course) not mentioning his results. But in the months that followed, Mitchell traveled around and gave talks about his results. The reactions were mixed. Physicists wondered how the results related to physics. Mathematicians wondered about their status, given that they came from experimental mathematics, without any formal mathematical proof. And—as always—people found Mitchell’s explanations hard to understand.

In the fall of 1976, Predrag went as a postdoc to Oxford—and on the very first day that I showed up as a 17-year-old particle-physics-paper-writing undergraduate, I ran into him. We talked mostly about his elegant “bird tracks” method for doing group theory (about which he finally published a book 32 years later). But he also tried to explain iterated maps. And I still remember him talking about an idealized model for fish populations in the Adriatic Sea (only years later did I make the connection that Predrag was from what is now Croatia).

At the time I didn’t pay much attention, but somehow the idea of iterated maps lodged in my consciousness, soon mixed together with the notion of fractals that I learned from Benoit Mandelbrot’s book. And when I began to concentrate on issues of complexity a couple of years later, these ideas helped guide me towards systems like cellular automata.

But back in 1976, Mitchell (who I wouldn’t meet for several more years) was off giving lots of talks about his results. He also submitted a paper to the prestigious academic journal Advances in Mathematics. For 6 months he heard nothing. But eventually the paper was rejected. He tried again with another paper, now sending it to the SIAM Journal on Applied Mathematics. Same result.

I have to say I’m not surprised this happened. In my own experience of academic publishing (now long in the past), if one was reporting progress within an established area it wasn’t too hard to get a paper published. But anything genuinely new or original one could pretty much count on getting rejected by the peer review process, either through intellectual shortsightedness or through academic corruption. And for Mitchell there was the additional problem that his explanations weren’t easy to understand.

But finally, in late 1977, Joel Lebowitz, editor of the Journal of Statistical Physics, agreed to publish Mitchell’s paper—essentially on the basis of knowing Mitchell, even though he admitted he didn’t really understand the paper. And so it was that early in 1978 “Quantitative Universality for a Class of Nonlinear Transformations”—reporting Mitchell’s big result—officially appeared. (For purposes of academic priority, Mitchell would sometimes quote a summary of a talk he gave on August 26, 1976, that was published in the Los Alamos Theoretical Division Annual Report 1975–1976. Mitchell was quite affected by the rejection of his papers, and for years kept the rejection letters in his desk drawer.)

Mitchell continued to travel the world talking about his results. There was interest, but also confusion. But in the summer of 1979, something exciting happened: Albert Libchaber in Paris reported results on a physical experiment on the transition to turbulence in convection in liquid helium—where he saw period doubling, with exactly the exponent δ that Mitchell had calculated. Mitchell’s δ apparently wasn’t just universal to a class of mathematical systems—it also showed up in real, physical systems.

Pretty much immediately, Mitchell was famous. Connections to the renormalization group had been made, and his work was becoming fashionable among both physicists and mathematicians. Mitchell himself was still traveling around, but now he was regularly hobnobbing with the top physicists and mathematicians.

I remember him coming to Caltech, perhaps in the fall of 1979. There was a certain rock-star character to the whole thing. Mitchell showed up, gave a stylish but somewhat mysterious talk, and was then whisked away to talk privately with Richard Feynman and Murray Gell-Mann.

Soon Mitchell was being offered all sorts of high-level jobs, and in 1982 he triumphantly returned to Cornell as a full professor of physics. There was an air of Nobel Prize–worthiness, and by June 1984 he was appearing in the New York Times magazine, in full Beethoven mode, in front of a Cornell waterfall:

Mitchell in New York Times Magazine

Still, the mathematicians weren’t satisfied. As with Benoit Mandelbrot’s work, they tended to see Mitchell’s results as mere “numerical conjectures”, not proven and not always even quite worth citing. But top mathematicians (who Mitchell had befriended) were soon working on the problem, and results began to appear—though it took a decade for there to be a full, final proof of the universality of δ.

Where the Science Went

So what happened to Mitchell’s big discovery? It was famous, for sure. And, yes, period-doubling cascades with his universal features were seen in a whole sequence of systems—in fluids, optics and more. But how general was it, really? And could it, for example, be extended to the full problem of fluid turbulence?

Mitchell and others studied systems other than iterated maps, and found some related phenomena. But none were quite as striking as Mitchell’s original discovery.

In a sense, my own efforts on cellular automata and the behavior of simple programs, beginning around 1981, have tried to address some of the same bigger questions as Mitchell’s work might have led to. But the methods and results have been very different. Mitchell always tried to stay close to the kinds of things that traditional mathematical physics can address, while I unabashedly struck out into the computational universe, investigating the phenomena that occur there.

I tried to see how Mitchell’s work might relate to mine—and even in my very first paper on cellular automata in 1981 I noted for example that the average density of black cells on successive steps of a cellular automaton’s evolution can be approximated (in “mean field theory”) by an iterated map.
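To make the mean-field idea concrete, here is a minimal sketch for elementary cellular automata. It assumes each cell is independently black with probability p, ignoring all correlations, and the function name meanFieldMap is just an illustrative choice, not something from the original paper:

```wolfram
(* Mean-field map for an elementary cellular automaton: sum the probabilities
   of the 3-cell neighborhoods that the rule maps to black, treating each
   cell as independently black with probability p *)
meanFieldMap[rule_Integer, p_] := Total[
  MapThread[
   #2 p^Total[#1] (1 - p)^(3 - Total[#1]) &,
   {Tuples[{1, 0}, 3], IntegerDigits[rule, 2, 8]}]]

(* Symbolic form of the map for rule 90; mathematically equal to 2 p (1 - p) *)
Simplify[meanFieldMap[90, p]]

(* Iterate the map for rule 22, starting from density 0.3 *)
NestList[meanFieldMap[22, #] &, 0.3, 10]
```

For rule 90 the result is a logistic-type map, which is what makes the comparison with Mitchell’s setup natural. The approximation says nothing about the actual correlations that build up in the cellular automaton’s evolution.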

I also noted that mathematically the whole evolution of a cellular automaton can be viewed as an iterated map—though on the Cantor set, rather than on ordinary real numbers. In my first paper, I even plotted the analog of Mitchell’s smooth mappings, but now they were wild and discontinuous:

Rules plot

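(* The opening Row[Labeled[ListPlot[ is an assumed reconstruction; the original start was lost *)
Row[Labeled[ListPlot[
     Table[FromDigits[CellularAutomaton[#, IntegerDigits[n, 2, 12]], 
       2], {n, 0, 2^12 - 1}], Sequence[
     AspectRatio -> 1, Frame -> True, FrameTicks -> None]], 
    Text[StringTemplate["rule ``"][#]]] & /@ {22, 42, 90, 110}]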

But try as I might, I could never find any strong connection with Mitchell’s work. I looked for analogs of things like period doubling, and Sarkovskii’s theorem, but didn’t find much. In my computational framework, even thinking about real numbers, with their infinite sequence of digits, was a bit unnatural. Years later, in A New Kind of Science, I had a note entitled “Smooth iterated maps”. I showed their digit sequences, and observed, rather undramatically, that Mitchell’s discovery implied an unusual nested structure at the beginning of the sequences:


FractionalDigits[x_, digs_Integer] := 
 NestList[{Mod[2 First[#], 1], Floor[2 First[#]]} &, {x, 0}, digs][[
  2 ;;, -1]];
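(* The opening Row[Function[a, ArrayPlot[ is an assumed reconstruction; the original start was lost *)
Row[Function[a, ArrayPlot[
    FractionalDigits[#, 40] & /@ 
     NestList[a # (1 - #) &, N[1/8, 80], 80]]] /@ {2.5, 3.3, 3.4, 3.5,
    3.6, 4}]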

The Rest of the Story

Portrait of Mitchell
(Photograph by Predrag Cvitanović)

So what became of Mitchell? After four years at Cornell, he moved to the Rockefeller University in New York, and for the next 30 years settled into a somewhat Bohemian existence, spending most of his time at his apartment on the Upper East Side of Manhattan.

While he was still at Los Alamos, Mitchell had married a woman from Germany named Cornelia, who was the sister of the wife of physicist (and longtime friend of mine) David Campbell, who had started the Center for Nonlinear Studies at Los Alamos, and would later go on to be provost at Boston University. But after not too long, Cornelia left Mitchell, taking up instead with none other than Pete Carruthers. (Pete—who struggled with alcoholism and other issues—later reunited with Lucy, but died in 1997 at the age of 61.)

When he was back at Cornell, Mitchell met a woman named Gunilla, who had run away from her life as a pastor’s daughter in a small town in northern Sweden at the age of 14, had ended up as a model for Salvador Dalí, and then in 1966 had been brought to New York as a fashion model. Gunilla had been a journalist, video maker, playwright and painter. Mitchell and she married in 1986, and remained married for 26 years, during which time Gunilla developed quite a career as a figurative painter.

Mitchell’s last solo academic paper was published in 1987. He did publish a handful of other papers with various collaborators, though none were terribly remarkable. Most were extensions of his earlier work, or attempts to apply traditional methods of mathematical physics to various complex fluid-like phenomena.

Mitchell liked interacting with the upper echelons of academia. He received all sorts of honors and recognition (though never a Nobel Prize). But to the end he viewed himself as something of an outsider—a Renaissance man who happened to have focused on physics, but didn’t really buy into all its institutions or practices.

From the early 1980s on, I used to see Mitchell fairly regularly, in New York or elsewhere. He became a daily user of Mathematica, singing its praises and often telling me about elaborate calculations he had done with it. Like many mathematical physicists, Mitchell was a connoisseur of special functions, and would regularly talk to me about more and more exotic functions he thought we should add.

Mitchell had two major excursions outside of academia. By the mid-1980s, the young poetess—now named Kathy Hammond—that Mitchell had known at Cornell had been an advertising manager for the New York Times and had then married into the family that owned the Hammond World Atlas. And through this connection, Mitchell was pulled into a completely new field for him: cartography.

I talked to him about it many times. He was very proud of figuring out how to use the Riemann mapping theorem to produce custom local projections for maps. He described (though I never fully understood it) a very physics-based algorithm for placing labels on maps. And he was very pleased when finally an entirely new edition of the Hammond World Atlas (that he would refer to as “my atlas”) came out.

Starting in the 1980s, there’d been an increasing trend for physics ideas to be applied to quantitative finance, and for physicists to become Wall Street quants. And with people in finance continually looking for a unique edge, there was always an interest in new methods. I was certainly contacted a lot about this—but with the success of James Gleick’s 1987 book Chaos (for which I did a long interview, though was only mentioned, misspelled, in a list of scientists who’d been helpful), there was a whole new set of people looking to see how “chaos” could help them in finance.

One of those was a certain Michael Goodkin. When he was in college back in the early 1960s, Goodkin had started a company that marketed the legal research services of law students. A few years later, he enlisted several Nobel Prize–winning economists and started what may have been the first hedge fund to do computerized arbitrage trading. Goodkin had always been a high-rolling, globetrotting gambler and backgammon player, and he made and lost a lot of money. And, down on his luck, he was looking for the next big thing—and found chaos theory, and Mitchell Feigenbaum.

For a few years he cultivated various physicists, then in 1995 he found a team to start a company called Numerix to commercialize the use of physics-like methods in computations for increasingly exotic financial instruments. Mitchell Feigenbaum was the marquee name, though the heavy lifting was mostly done by my longtime friend Nigel Goldenfeld, and a younger colleague of his named Sasha Sokol.

At the beginning there was lots of mathematical-physics-like work, and Mitchell was quite involved. (He was an enthusiast of Itô calculus, gave lectures about it, and was proud of having found 1000× speed-ups of stochastic integrations.) But what the company actually did was to write C++ libraries for banks to integrate into their systems. It wasn’t something Mitchell wanted to do long term. And after a number of years, Mitchell’s active involvement in the company declined.

(I’d met Michael Goodkin back in 1998, and 14 years later—having recently written his autobiography The Wrong Answer Faster: The Inside Story of Making the Machine That Trades Trillions—he suddenly contacted me again, pitching my involvement in a rather undefined new venture. Mitchell still spoke highly of Michael, though when the discussion rather bizarrely pivoted to me basically starting and CEOing a new company, I quickly dropped it.)

I had many interactions with Mitchell over the years, though they’re not as well archived as they might be, because they tended to be verbal rather than written, since, as Mitchell told me (in email): “I dislike corresponding by email. I still prefer to hear an actual voice and interact…”

There are fragments in my archive, though. There’s correspondence, for example, about Mitchell’s 2004 60th-birthday event, that I couldn’t attend because it conflicted with a significant birthday for one of my children. In lieu of attending, I commissioned the creation of a “Feigenbaum–Cvitanović Crystal”—a 3D rendering in glass of the limiting function g(z) in the complex plane.

It was a little complex to solve the functional equation, and the laser manufacturing method initially shattered a few blocks of glass, but eventually the object was duly made, and sent—and I was pleased many years later to see it nicely displayed in Mitchell’s apartment:

Feigenbaum–Cvitanović crystal

Sometimes my archives record mentions of Mitchell by others, usually Predrag. In 2007, Predrag reported (with characteristic wit):

“Other news: just saw Mitchell, he is dating Odyssey.

No, no, it’s not a high-level Washington type escort service—he is dating Homer’s Odyssey, by computing the positions of low stars as function of the 26000 year precession—says Hiparcus [sic] had it all figured out, but Catholic church succeeded in destroying every single copy of his tables.”

Living up to the Renaissance man tradition, Mitchell always had a serious interest in history. In 2013, responding to a piece of mine about Leibniz, Mitchell said he’d been a Leibniz enthusiast since he was a teenager, then explained:

“The Newton hagiographer (literally) Voltaire had no idea of the substance of the Monadology, so could only spoof ‘the best of all possible worlds’. Long ago I’ve published this as a verbal means of explaining 2^n universality.

Leibniz’s second published paper at age 19, ‘On the Method of Inverse Tangents’, or something like that, is actually the invention of the method of isoclines to solve ODEs, quite contrary to the extant scholarly commentary. Both Leibniz and Newton start with differential equations, already having received the diff. calculus. This is quite an intriguing story.”

But the mainstay of Mitchell’s intellectual life was always mathematical physics, though done more as a personal matter than as part of institutional academic work. At some point he was asked by his then-young goddaughter (he never had children of his own) why the Moon looks larger when it’s close to the horizon. He wrote back an explanation (a bit in the style of Euler’s Letters to a German Princess), then realized he wasn’t sure of the answer, and got launched into many years of investigation of optics and image formation. (He’d actually been generally interested in the retina since he was at MIT, influenced by Jerry Lettvin of “What the Frog’s Eye Tells the Frog’s Brain” fame.)

He would tell me about it, explaining that the usual theory of image formation was wrong, and he had a better one. He always used the size of the Moon as an example, but I was never quite clear whether the issue was one of optics or perception. He never published anything about what he did, though with luck his manuscripts (rumored to have the makings of a book) will eventually see the light of day—assuming others can understand them.

When I would visit Mitchell (and Gunilla), their apartment had a distinctly Bohemian feel, with books, papers, paintings and various devices strewn around. And then there was The Bird. It was a cockatoo, and it was loud. I’m not sure who got it or why. But it was a handful. Mitchell and Gunilla nearly got ejected from their apartment because of noise complaints from neighbors, and they ended up having to take The Bird to therapy. (As I learned in a slightly bizarre—and never executed—plan to make videogames for “they-are-alien-intelligences-right-here-on-this-planet” pets, cockatoos are social and, as pets, arguably really need a “Twitter for Cockatoos”.)

The Bird
(Photograph by Predrag Cvitanović)

In the end, though, it was Gunilla who left, with the rumor being that she’d been driven away by The Bird.

The last time I saw Mitchell in person was a few years ago. My son Christopher and I visited him at his apartment—and he was in full Mitchell form, with eyes bright, talking rapidly and just a little conspiratorially about the mathematical physics of image formation. “Bird eyes are overrated”, he said, even as his cockatoo squawked in the next room. “Eagles have very small foveas, you know. Their eyes are like telescopes.”

“Fish have the best eyes”, he said, explaining that all eyes evolved underwater—and that the architecture hadn’t really changed since. “Fish keep their entire field of view in focus, not like us”, he said. It was charming, eccentric, and very Mitchell.

For years, we had talked from time to time on the phone, usually late at night. I saw Predrag a few months ago, and mentioned that I was surprised not to have heard from Mitchell. He explained that Mitchell was sick, but was being very private about it. Then, a few weeks ago, just after midnight, Predrag sent me an email with the subject line “Mitchell is dead”, explaining that Mitchell had died at around 8 pm, and attaching a quintessential Mitchell-in-New-York picture:

Mitchell in New York
(Photograph by Predrag Cvitanović)

It’s kind of a ritual I’ve developed when I hear that someone I know has died: I immediately search my archives. And this time I was surprised to find that a few years ago Mitchell had successfully reached voicemail I didn’t know I had. So now we can give Mitchell the last word:

And, of course, the last number too: 4.66920160910299067185320382…

]]> 5
<![CDATA[Testifying at the Senate about A.I.‑Selected Content on the Internet]]> Wed, 26 Jun 2019 03:05:12 +0000 Stephen Wolfram capitol-thumbAn Invitation to Washington Three and a half weeks ago I got an email asking me if I’d testify at a hearing of the US Senate Commerce Committee’s Subcommittee on Communications, Technology, Innovation and the Internet. Given that the title of the hearing was “Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet [...]]]> capitol-thumb

Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms

An Invitation to Washington

Three and a half weeks ago I got an email asking me if I’d testify at a hearing of the US Senate Commerce Committee’s Subcommittee on Communications, Technology, Innovation and the Internet. Given that the title of the hearing was “Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms” I wasn’t sure why I’d be relevant.

But then the email went on: “The hearing is intended to examine, among other things, whether algorithmic transparency or algorithmic explanation are policy options Congress should be considering.” That piqued my interest, because, yes, I have thought about “algorithmic transparency” and “algorithmic explanation”, and their implications for the deployment of artificial intelligence.

Generally I stay far away from anything to do with politics. But figuring out how the world should interact with AI is really important. So I decided that—even though it was logistically a bit difficult—I should do my civic duty and go to Washington and testify.

Watch the Senate hearing:
Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms »

Understanding the Issues

So what was the hearing really about? For me, it was in large measure an early example of reaction to the realization that, yes, AIs are starting to run the world. Billions of people are being fed content that is basically selected for them by AIs, and there are mounting concerns about this, as reported almost every day in the media.

Are the AIs cleverly hacking us humans to get us to behave in a certain way? What kind of biases do the AIs have, relative to what the world is like, or what we think the world should be like? What are the AIs optimizing for, anyway? And when are there actually “humans behind the curtain”, controlling in detail what the AIs are doing?

It doesn’t help that in some sense the AIs are getting much more free rein than they might because the people who use them aren’t really their customers. I have to say that back when the internet was young, I personally never thought it would work this way, but in today’s world many of the most successful businesses on the internet—including Google, Facebook, YouTube and Twitter—make their revenue not from their users, but instead from advertisers who are going through them to reach their users.

All these businesses also have in common that they are fundamentally what one can call “automated content selection businesses”: they work by getting large amounts of content that they didn’t themselves generate, then using what amounts to AI to automatically select what content to deliver or to suggest to any particular user at any given time—based on data that they’ve captured about that user. Part of what’s happening is presumably optimized to give a good experience to their users (whatever that might mean), but part of it is also optimized to get revenue from the actual customers, i.e. advertisers. And there’s also an increasing suspicion that somehow the AI is biased in what it’s doing—maybe because someone explicitly made it be, or because it somehow evolved that way.

“Open Up the AI”?

So why not just “open up the AI” and see what it’s doing inside? Well, that’s what the algorithmic transparency idea mentioned in the invitation to the hearing is about.

And the problem is that, no, that can’t work. If we want to seriously use the power of computation—and AI—then inevitably there won’t be a “human-explainable” story about what’s happening inside.

So, OK, if you can’t check what’s happening inside the AI, what about putting constraints on what the AI does? Well, to do that, you have to say what you want. What rule for balance between opposing kinds of views do you want? How much do you allow people to be unsettled by what they see? And so on.

And there are two problems here: first, what to want, and, second, how to describe it. In the past, the only way we could imagine describing things like this was with traditional legal rules, written in legalese. But if we want AIs to automatically follow these rules, perhaps billions of times a second, that’s not good enough: instead, we need something that AIs can intrinsically understand.

And at least on this point I think we’re making good progress. Because—thanks to our 30+ years of work on Wolfram Language—we’re now beginning to have a computational language that has the scope to formulate “computational contracts” that can specify relevant kinds of constraints in computational terms, in a form that humans can write and understand, and that machines can automatically interpret.

But even though we’re beginning to have the tools, there’s still the huge question of what the “computational laws” for automatic content selection AIs will be.

A lot of the hearing ultimately revolved around Section 230 of the 1996 Communications Decency Act—which specifies what kinds of content companies can choose to block without losing their status as “neutral platforms”. There’s a list of fairly uncontroversially blockable kinds of content. But then the sentence ends with “or otherwise objectionable [content]”. What does this mean? Does it mean content that espouses objectionable points of view? Whose definition of “objectionable”? Etc.

Well, one day things like Section 230 will, of necessity, not be legalese laws, but computational laws. There’ll be some piece of computational language that specifies for example that this-or-that machine learning classifier trained on this-or-that sample of the internet will be used to define this or that.

We’re not there yet, however. We’re only just beginning to be able to set up computational contracts for much simpler things, like business situations. And—somewhat fueled by blockchain—I expect that this will accelerate in the years to come. But it’s going to be a while before the US Senate is routinely debating lines of code in computational laws.

So, OK, what can be done now?

A Possible Path Forward?

A little more than a week ago, what I’d figured out was basically what I’ve already described here. But that meant I was looking at going to the hearing and basically saying only negative things. “Sorry, this won’t work. You can’t do that. The science says it’s impossible. The solution is years in the future.” Etc.

And, as someone who prides himself on turning the seemingly impossible into the possible, this didn’t sit well with me. So I decided I’d better try to figure out if I could actually see a pragmatic, near-term path forward. At first, I tried thinking about purely technological solutions. But soon I basically convinced myself that no such solution was going to work.

So, with some reticence, I decided I’d better start thinking about other kinds of solutions. Fortunately there are quite a few people at my company and in my circle who I could talk to about this—although I soon discovered they often had strongly conflicting views. But after a little while, a glimmer of an idea emerged.

Why does every aspect of automated content selection have to be done by a single business? Why not open up the pipeline, and create a market in which users can make choices for themselves?

One of the constraints I imposed on myself is that my solution couldn’t detract from the impressive engineering and monetization of current automated content selection businesses. But I came up with at least two potential ways to open things up that I think could still perfectly well satisfy this constraint.

One of my ideas involved introducing what I call “final ranking providers”: third parties who take pre-digested feature vectors from the underlying content platform, then use these to do the final ranking of items in whatever way they want. My other idea involved introducing “constraint providers”: third parties who provide constraints in the form of computational contracts that are inserted into the machine learning loop of the automated content selection system.
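As a toy illustration of the final ranking provider idea, here is a minimal sketch in which a provider is nothing more than a weight vector applied to the platform’s feature vectors. Everything in it is hypothetical: the item data, the meaning of the three features, and the names finalRanking, calmProvider and buzzProvider are made up for illustration, and a real provider would presumably be far more sophisticated:

```wolfram
(* Hypothetical pre-digested feature vectors supplied by the platform:
   {relevance, recency, sensationalism}, all made up for illustration *)
items = {
   <|"id" -> "story-A", "features" -> {0.9, 0.2, 0.8}|>,
   <|"id" -> "story-B", "features" -> {0.7, 0.9, 0.1}|>,
   <|"id" -> "story-C", "features" -> {0.4, 0.5, 0.3}|>};

(* A final ranking provider modeled as a weight vector;
   items are sorted by decreasing weighted score *)
finalRanking[weights_List][list_List] := 
 SortBy[list, -(weights . #["features"]) &]

(* Two hypothetical providers that a user could choose between *)
calmProvider = finalRanking[{1., 0.5, -1.}];
buzzProvider = finalRanking[{1., 0.5, 1.}];

(* The same items, ranked by each provider *)
#["id"] & /@ calmProvider[items]
#["id"] & /@ buzzProvider[items]
```

With these made-up numbers, the first provider puts story-B first and story-A last, while the second puts story-A first. The platform supplies the same feature vectors in both cases, and only the final ordering differs.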

The important feature of both these solutions is that users don’t have to trust the single AI of the automated content selection business. They can in effect pick their own brand of AI—provided by a third party they trust—to determine what content they’ll actually be given.

Who would these third-party providers be? They might be existing media organizations, or nonprofits, or startups. Or they might be something completely new. They’d have to have some technical sophistication. But fundamentally what they’d have to do is to define—or represent—brands that users would trust to decide what the final list of items in their news feed, or video recommendations, or search results, or whatever, might be.

Social networks get their usefulness by being monolithic: by having “everyone” connected into them. But the point is that the network can prosper as a monolithic thing, but there doesn’t need to be just one monolithic AI that selects content for all the users on the network. Instead, there can be a whole market of AIs, that users can freely pick between.

And here’s another important thing: right now there’s no consistent market pressure on the final details of how content is selected for users, not least because users aren’t the final customers. (Indeed, pretty much the only pressure right now comes from PR eruptions and incidents.) But if the ecosystem changes, and there are third parties whose sole purpose is to serve users, and to deliver the final content they want, then there’ll start to be real market forces that drive innovation—and potentially add more value.

Could It Work?

AI provides powerful ways to automate the doing of things. But AIs on their own can’t ultimately decide what they want to do. That has to come from outside—from humans defining goals. But at a practical level, where should those goals be set? Should they just come—monolithically—from an automated content selection business? Or should users have more freedom, and more choice?

One might say: “Why not let every user set everything for themselves?” Well, the problem with that is that automated content selection is a complicated matter. And—much as I hope that there’ll soon be very widespread computational language literacy—I don’t think it’s realistic that everyone will be able to set up everything in detail for themselves. So instead, I think the better idea is to have discrete third-party providers, who set things up in a way that appeals to some particular group of users.

Then standard market forces can come into play. No doubt the result would even be a greater level of overall success in delivering content to users who want it (and in monetizing it). But this market approach also solves some other problems associated with the “single point of failure” monolithic AI.

For example, with the monolithic AI, if someone figures out how to spread some kind of bad content, it’ll spread everywhere. With third-party providers, there’s a good chance it’ll only spread through some of them.

Right now there’s lots of unhappiness about people simply being “banned” from particular content platforms. But with the market of third-party providers, banning is not an all-or-nothing proposition anymore: some providers could ban someone, but others might not.

OK, but are there “fatal flaws” with my idea? People could object that it’s technically difficult to do. I don’t know the state of the codebases inside the major automated content selection businesses. But I’m certain that with manageable effort, appropriate APIs etc. could be set up. (And it might even help these businesses by forcing some code cleanup and modernization.)

Another issue might be: how will the third-party providers be incentivized? I can imagine some organizations just being third-party providers as a public service. But in other cases they’d have to be paid a commission by the underlying content platform. The theory, though, is that good work by third-party content providers would expand the whole market, and make them “worth their commission”. Plus, of course, the underlying content platforms could save a lot by not having to deal with all those complaints and issues they’re currently getting.

What if there’s a third-party provider that upranks content some people don’t like? That will undoubtedly happen. But the point is that this is a market—so market dynamics can operate.

Another objection is that my idea makes even worse the tendency with modern technology for people to live inside “content bubbles” where they never broaden their points of view. Well, of course, there can be providers who offer broader content. But people could also choose “content bubble” providers. The good thing, though, is that they’re choosing them, and they know they’re doing that, just like they know they’re choosing to watch one television channel and not another.

Of course it’s important for the operation of society that people have some level of shared values. But what should those shared values be, and who should decide them? In a totalitarian system, it’s basically the government. Right now, with the current monolithic state of automated content selection, one could argue it’s the automated content selection businesses.

If I were running one of those businesses, I’d certainly not want to get set up as the moral arbiter for the world; it seems like a no-win role. With the third-party providers idea, there’s a way out, without damaging the viability of the business. Yes, users get more control, as arguably they should have, given that they are the fuel that makes the business work. But the core business model is still perfectly intact. And there’s a new market that opens up, for third-party providers, potentially delivering all sorts of new economic value.

What Should I Do?

At the beginning of last weekend, what I just described was basically the state of my thinking. But what should I do with it? Was there some issue I hadn’t noticed? Was I falling into some political or business trap? I wasn’t sure. But it seemed as if some idea in this area was needed, and I had an idea, so I really should tell people about it.

So I quickly wrote up the written testimony for the hearing, and sent it in by the deadline on Sunday morning. (The full text of the testimony is included at the end of this piece.)

Stephen Wolfram's written testimony

The Hearing Itself

View of the Senate

This morning was the hearing itself. It was in the same room as the hearing Mark Zuckerberg did last fall. The staffers were saying that they expected a good turnout of senators, and that of the 24 senators on the subcommittee (out of 100 total in the Senate), they expected about 15 to show up at some point or another.

At the beginning, staffers were putting out nameplates for the senators. I was trying to figure out what the arrangement was. And then I realized! It was a horseshoe configuration and Republican senators were on the right side of the horseshoe, Democrats were on the left. There really are right and left wings! (Yes, I obviously don’t watch C-SPAN enough, or I’d already know that.)

When the four of us on the panel were getting situated, one of the senators (Marsha Blackburn [R-TN]) wandered up, and started talking about computational irreducibility. Wow, I thought, this is going to be interesting. That’s a pretty abstruse science concept to be finding its way into the Senate.

Everyone had five minutes to give opening remarks, and everyone had a little countdown timer in front of them. I talked a bit about the science and technology of AI and explainability. I mentioned computational contracts and the concept of an AI Constitution. Then I said I didn’t want to just explain that everything was impossible—and gave a brief summary of my ideas for solutions. Rather uncharacteristically for me, I ended a full minute before my time was up.

The format for statements and questions was five minutes per senator. The issues raised were quite diverse. I quickly realized, though, that it was unfortunate that I really had three different things I was talking about (non-explainability, computational laws, and my ideas for a near-term solution). In retrospect perhaps I should have concentrated on the near-term solution, but it felt odd to be emphasizing something I just thought of last week, rather than something I’ve thought about for many years.

Still, it was fascinating—and a sign of things to come—to see serious issues about what amounts to the philosophy of computation being discussed in the Senate. To be fair, I had done a small hearing at the Senate back in 2003 (my only other such experience) about the ideas in A New Kind of Science. But then it had been very much on the “science track”; now the whole discussion was decidedly mainstream.

I couldn’t help thinking that I was witnessing the concept of computation beginning to come of age. What used to be esoteric issues in the theory of computation were now starting to be things that senators were discussing writing laws about. One of the senators mentioned atomic energy, and compared it to AI. But really, AI is going to be something much more central to the whole future of our species.

It enables us to do so much. Yet it forces us to confront what we want to do, and who we want to be. Today it’s rare and exotic for the Senate to be discussing issues of AI. In time I suspect AI and its many consequences will be a dominant theme in many Senate discussions. This is just the beginning.

I wish we were ready to really start creating an AI Constitution. But we’re not (and it doesn’t help that we don’t have an AI analog of the few thousand years of human political history that were available as a guide when the US Constitution was drafted). Still, issue by issue I suspect we’ll move closer to the point where having a coherent AI Constitution becomes a necessity. No doubt there’ll be different ones in different communities and different countries. But one day a group like the one I saw today—with all the diverse and sometimes colorful characters involved—will end up having to figure out just how we humans interact with AI and the computational world.

The Written Testimony



Automated content selection by internet businesses has become progressively more contentious—leading to calls to make it more transparent or constrained. I explain some of the complex intellectual and scientific problems involved, then offer two possible technical and market suggestions for paths forward. Both are based on giving users a choice about who to trust for the final content they see—in one case introducing what I call “final ranking providers”, and in the other case what I call “constraint providers”.

The Nature of the Problem

There are many kinds of businesses that operate on the internet, but some of the largest and most successful are what one can call automated content selection businesses. Facebook, Twitter, YouTube and Google are all examples. All of them deliver content that others have created, but a key part of their value is associated with their ability to (largely) automatically select what content they should serve to a given user at a given time—whether in news feeds, recommendations, web search results, or advertisements.

What criteria are used to determine content selection? Part of the story is certainly to provide good service to users. But the paying customers for these businesses are not the users, but advertisers, and necessarily a key objective of these businesses must be to maximize advertising income. Increasingly, there are concerns that this objective may have unacceptable consequences in terms of content selection for users. And in addition there are concerns that—through their content selection—the companies involved may be exerting unreasonable influence in other kinds of business (such as news delivery), or in areas such as politics.

Methods for content selection—using machine learning, artificial intelligence, etc.—have become increasingly sophisticated in recent years. A significant part of their effectiveness—and economic success—comes from their ability to use extensive data about users and their previous activities. But there has been increasing dissatisfaction and, in some cases, suspicion about just what is going on inside the content selection process.

This has led to a desire to make content selection more transparent, and perhaps to constrain aspects of how it works. As I will explain, these are not easy things to achieve in a useful way. And in fact, they run into deep intellectual and scientific issues, that are in some ways a foretaste of problems we will encounter ever more broadly as artificial intelligence becomes more central to the things we do. Satisfactory ultimate solutions will be difficult to develop, but I will suggest here two near-term practical approaches that I believe significantly address current concerns.

How Automated Content Selection Works

Whether one’s dealing with videos, posts, webpages, news items or, for that matter, ads, the underlying problem of automated content selection (ACS) is basically always the same. There are many content items available (perhaps even billions of them), and somehow one has to quickly decide which ones are “best” to show to a given user at a given time. There’s no fundamental principle to say what “best” means, but operationally it’s usually in the end defined in terms of what maximizes user clicks, or revenue from clicks.

The major innovation that has made modern ACS systems possible is the idea of automatically extrapolating from large numbers of examples. The techniques have evolved, but the basic idea is to effectively deduce a model of the examples and then to use this model to make predictions, for example about what ranking of items will be best for a given user.

Because it will be relevant for the suggestions I’m going to make later, let me explain here a little more about how most current ACS systems work in practice. The starting point is normally to extract a collection of perhaps hundreds or thousands of features (or “signals”) for each item. If a human were doing it, they might use features like: “How long is the video? Is it entertainment or education? Is it happy or sad?” But these days—with the volume of data that’s involved—it’s a machine doing it, and often it’s also a machine figuring out what features to extract. Typically the machine will optimize for features that make its ultimate task easiest—whether or not (and it’s almost always not) there’s a human-understandable interpretation of what the features represent.

As an example, here are the letters of the alphabet automatically laid out by a machine in a “feature space” in which letters that “look similar” appear nearby:

Feature space plot

How does the machine know what features to extract to determine whether things will “look similar”? A typical approach is to give it millions of images that have been tagged with what they are of (“elephant”, “teacup”, etc.). And then from seeing which images are tagged the same (even though in detail they look different), the machine is able—using the methods of modern machine learning—to identify features that could be used to determine how similar images of anything should be considered to be.

OK, so let’s imagine that instead of letters of the alphabet laid out in a 2D feature space, we’ve got a million videos laid out in a 200-dimensional feature space. If we’ve got the features right, then videos that are somehow similar should be nearby in this feature space.
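
A minimal Python sketch of what "nearby in feature space" means. The video names and the four-dimensional feature vectors are made up for illustration; a real system would use hundreds of machine-learned dimensions with no human-readable meaning.

```python
import math

# Hypothetical feature vectors; real systems learn hundreds of dimensions.
VIDEOS = {
    "cat_compilation": [0.9, 0.1, 0.8, 0.2],
    "kitten_tricks":   [0.8, 0.2, 0.9, 0.1],
    "tax_tutorial":    [0.1, 0.9, 0.1, 0.7],
    "budget_basics":   [0.2, 0.8, 0.2, 0.9],
}

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(name, videos, k=1):
    """Return the k videos closest to `name` in feature space."""
    target = videos[name]
    others = [
        (distance(target, vec), other)
        for other, vec in videos.items()
        if other != name
    ]
    return [other for _, other in sorted(others)[:k]]

print(nearest("cat_compilation", VIDEOS))  # ['kitten_tricks']
```

Euclidean distance is only one possible notion of "nearby"; it is used here because it is the simplest to read.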

But given a particular person, what videos are they likely to want to watch? Well, we can do the same kind of thing with people as with videos: we can take the data we know about each person, and extract some set of features. “Similar people” would then be nearby in “people feature space”, and so on.

But now there’s a “final ranking” problem. Given features of videos, and features of people, which videos should be ranked “best” for which people? Often in practice, there’s an initial coarse ranking. But then, as soon as we have a specific definition of “best”—or enough examples of what we mean by “best”—we can use machine learning to learn a program that will look at the features of videos and people, and will effectively see how to use them to optimize the final ranking.
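
The final ranking step can be sketched roughly as follows. This assumes a deliberately simple scoring rule (a weighted match between user features and video features) as a stand-in for whatever program machine learning would actually produce; the feature meanings, names and weights are all hypothetical.

```python
def score(user, video, weights):
    """Weighted match between user and video features (stand-in for a learned model)."""
    return sum(w * u * v for w, u, v in zip(weights, user, video))

def rank_for_user(user, videos, weights):
    """Order video names from highest to lowest score for this user."""
    return sorted(
        videos,
        key=lambda name: score(user, videos[name], weights),
        reverse=True,
    )

# Hypothetical 3-feature vectors: (comedy, education, music)
videos = {
    "standup_special": [0.9, 0.1, 0.0],
    "physics_lecture": [0.0, 1.0, 0.1],
    "concert_film":    [0.1, 0.0, 0.9],
}
user = [0.2, 0.9, 0.4]      # this user mostly watches educational content
weights = [1.0, 1.0, 1.0]   # in practice these would be learned, not hand-set

ranking = rank_for_user(user, videos, weights)
print(ranking)  # ['physics_lecture', 'concert_film', 'standup_special']
```

Changing the weights changes what "best" means, which is exactly the part of the pipeline where a definition of "best" has to come from somewhere.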

The setup is a bit different in different cases, and there are many details, most of which are proprietary to particular companies. However, modern ACS systems—dealing as they do with immense amounts of data at very high speed—are a triumph of engineering, and an outstanding example of the power of artificial intelligence techniques.

Is It “Just an Algorithm”?

When one hears the term “algorithm” one tends to think of a procedure that will operate in a precise and logical way, always giving a correct answer, not influenced by human input. One also tends to think of something that consists of well-defined steps, that a human could, if needed, readily trace through.

But this is pretty far from how modern ACS systems work. They don’t deal with the same kind of precise questions (“What video should I watch next?” just isn’t something with a precise, well-defined answer). And the actual methods involved make fundamental use of machine learning, which doesn’t have the kind of well-defined structure or explainable step-by-step character that’s associated with what people traditionally think of as an “algorithm”. There’s another thing too: while traditional algorithms tend to be small and self-contained, machine learning inevitably requires large amounts of externally supplied data.

In the past, computer programs were almost exclusively written directly by humans (with some notable exceptions in my own scientific work). But the key idea of machine learning is instead to create programs automatically, by “learning the program” from large numbers of examples. The most common type of program on which to apply machine learning is a so-called neural network. Although originally inspired by the brain, neural networks are purely computational constructs that are typically defined by large arrays of numbers called weights.

Imagine you’re trying to build a program that recognizes pictures of cats versus dogs. You start with lots of specific pictures that have been identified—normally by humans—as being either of cats or dogs. Then you “train” a neural network by showing it these pictures and gradually adjusting its weights to make it give the correct identification for these pictures. But then the crucial point is that the neural network generalizes. Feed it another picture of a cat, and even if it’s never seen that picture before, it’ll still (almost certainly) say it’s a cat.
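
As a rough illustration of "gradually adjusting its weights", here is a sketch that trains a single artificial neuron (a perceptron), which is far simpler than a real neural network. The two features (ear pointedness and snout length) and all the numbers are invented for the example; a real image classifier would work from pixels and learn its own features.

```python
# Toy examples: (ear_pointedness, snout_length) -> label, 1 = cat, 0 = dog.
TRAINING = [
    ((0.9, 0.2), 1), ((0.8, 0.3), 1), ((0.7, 0.1), 1),
    ((0.2, 0.9), 0), ((0.3, 0.8), 0), ((0.1, 0.7), 0),
]

def predict(weights, bias, features):
    """Return 1 (cat) if the weighted sum is positive, else 0 (dog)."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

def train(examples, epochs=20, rate=0.1):
    """Nudge the weights a little every time the current guess is wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)
            weights = [w + rate * error * x for w, x in zip(weights, features)]
            bias += rate * error
    return weights, bias

weights, bias = train(TRAINING)

# Generalization: these two examples were never shown during training.
print(predict(weights, bias, (0.85, 0.25)))  # 1 (cat)
print(predict(weights, bias, (0.15, 0.95)))  # 0 (dog)
```

Nobody sets the final weights by hand; they are whatever the training loop ends up with, which is the point made above about programs being learned rather than written.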

What will it do if you feed it a picture of a cat dressed as a dog? It’s not clear what the answer is supposed to be. But the neural network will still confidently give some result—that’s derived in some way from the training data it was given.

So in a case like this, how would one tell why the neural network did what it did? Well, it’s difficult. All those weights inside the network were learned automatically; no human explicitly set them up. It’s very much like the case of extracting features from images of letters above. One can use these features to tell which letters are similar, but there’s no “human explanation” (like “count the number of loops in the letter”) of what each of the features is.

Would it be possible to make an explainable cat vs. dog program? For 50 years most people thought that a problem like cat vs. dog just wasn’t the kind of thing computers would be able to do. But modern machine learning made it possible—by learning the program rather than having humans explicitly write it. And there are fundamental reasons to expect that there can’t in general be an explainable version—and that if one’s going to do the level of automated content selection that people have become used to, then one cannot expect it to be broadly explainable.

Sometimes one hears it said that automated content selection is just “being done by an algorithm”, with the implication that it’s somehow fair and unbiased, and not subject to human manipulation. As I’ve explained, what’s actually being used are machine learning methods that aren’t like traditional precise algorithms.

And a crucial point about machine learning methods is that by their nature they’re based on learning from examples. And inevitably the results they give depend on what examples were used.

And this is where things get tricky. Imagine we’re training the cat vs. dog program. But let’s say that, for whatever reason, among our examples there are spotted dogs but no spotted cats. What will the program do if it’s shown a spotted cat? It might successfully recognize the shape of the cat, but quite likely it will conclude—based on the spots—that it must be seeing a dog.

So is there any way to guarantee that there are no problems like this, that were introduced either knowingly or unknowingly? Ultimately the answer is no—because one can’t know everything about the world. Is the lack of spotted cats in the training set an error, or are there simply no spotted cats in the world?

One can do one’s best to find correct and complete training data. But one will never be able to prove that one has succeeded.

But let’s say that we want to ensure some property of our results. In almost all cases, that’ll be perfectly possible—either by modifying the training set, or the neural network. For example, if we want to make sure that spotted cats aren’t left out, we can just insist, say, that our training set has an equal number of spotted and unspotted cats. That might not be a correct representation of what’s actually true in the world, but we can still choose to train our neural network on that basis.
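
The spotted-cat problem, and the fix of rebalancing the training set, can be sketched with a toy nearest-neighbor classifier. This is an assumption made for readability (the text has a neural network in mind), and the two features (how cat-like the shape is, and whether the animal is spotted) are hypothetical.

```python
def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify(example, training):
    """Return the label of the closest training example (1-nearest-neighbor)."""
    features, label = min(training, key=lambda item: distance(item[0], example))
    return label

# Features: (cat_shape, spotted). No spotted cats appear in this training set.
BIASED = [
    ((1.0, 0.0), "cat"), ((0.9, 0.0), "cat"),
    ((0.1, 0.0), "dog"), ((0.2, 1.0), "dog"), ((0.0, 1.0), "dog"),
]

# Rebalanced by choice: as many spotted cats as unspotted ones.
BALANCED = BIASED + [((0.9, 1.0), "cat"), ((0.8, 1.0), "cat")]

spotted_cat = (1.0, 1.0)  # perfectly cat-shaped, but spotted

print(classify(spotted_cat, BIASED))    # dog
print(classify(spotted_cat, BALANCED))  # cat
```

The rebalanced set is a decision about what the results should be, not a measurement of how many spotted cats exist in the world.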

As a different example, let’s say we’re selecting pictures of pets. How many cats should be there, versus dogs? Should we base it on the number of cat vs. dog images on the web? Or how often people search for cats vs. dogs? Or how many cats and dogs are registered in America? There’s no ultimate “right answer”. But if we want to, we can give a constraint that says what should happen.

This isn’t really an “algorithm” in the traditional sense either—not least because it’s not about abstract things; it’s about real things in the world, like cats and dogs. But an important development (that I happen to have been personally much involved in for 30+ years) is the construction of a computational language that lets one talk about things in the world in a precise way that can immediately be run on a computer.

In the past, things like legal contracts had to be written in English (or “legalese”). Somewhat inspired by blockchain smart contracts, we are now getting to the point where we can write automatically executable computational contracts not in human language but in computational language. And if we want to define constraints on the training sets or results of automated content selection, this is how we can do it.
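
To give a feel for what an executable constraint on results might look like, here is a very small sketch in Python. It is only a stand-in: the text has a full computational language in mind, and the function names and the 40% to 60% band are hypothetical choices for the cats-versus-dogs example.

```python
def cat_share(results):
    """Fraction of items in a result list that are labeled 'cat'."""
    if not results:
        return 0.0
    return sum(1 for item in results if item == "cat") / len(results)

def satisfies_contract(results, low=0.4, high=0.6):
    """Hypothetical constraint: cats make up between `low` and `high` of the results."""
    return low <= cat_share(results) <= high

print(satisfies_contract(["cat", "dog", "cat", "dog"]))  # True
print(satisfies_contract(["dog"] * 9 + ["cat"]))         # False
```

Because the constraint is code rather than prose, it can be checked automatically against every set of results a system produces.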

Issues from Basic Science

Why is it difficult to find solutions to problems associated with automated content selection? In addition to all the business, societal and political issues, there are also some deep issues of basic science involved. Here’s a list of some of those issues. The precursors of these issues date back nearly a century, though it’s only quite recently (in part through my own work) that they’ve become clarified. And although they’re not enunciated (or named) as I have here, I don’t believe any of them are at this point controversial—though to come to terms with them requires a significant shift in intuition from what exists without modern computational thinking.

Data Deducibility

Even if you don’t explicitly know something (say about someone), it can almost always be statistically deduced if there’s enough other related data available

What is a particular person’s gender identity, ethnicity, political persuasion, etc.? Even if one’s not allowed to explicitly ask these questions, it’s basically inevitable that with enough other data about the person, one will be able to deduce what the best answers must be.

Everyone is different in detail. But the point is that there are enough commonalities and correlations between people that it’s basically inevitable that with enough data, one can figure out almost any attribute of a person.

The basic mathematical methods for doing this were already known from classical statistics. But what’s made this now a reality is the availability of vastly more data about people in digital form—as well as the ability of modern machine learning to readily work not just with numerical data, but also with things like textual and image data.

What is the consequence of ubiquitous data deducibility? It means that it’s not useful to block particular pieces of data—say in an attempt to avoid bias—because it’ll essentially always be possible to deduce what that blocked data was. And it’s not just that this can be done intentionally; inside a machine learning system, it’ll often just happen automatically and invisibly.
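
A toy sketch of data deducibility, using simple vote counting rather than any real machine learning method. The records, the two visible features and the hidden attribute values "A" and "B" are all invented; the point is only that a field that was never asked for can still be guessed from fields that correlate with it.

```python
from collections import Counter, defaultdict

# Hypothetical records: (neighborhood, favorite_team, hidden_attribute)
KNOWN = [
    ("north", "reds", "A"), ("north", "reds", "A"), ("north", "blues", "A"),
    ("south", "blues", "B"), ("south", "blues", "B"), ("south", "reds", "B"),
]

def build_votes(records):
    """Count how often each visible feature value co-occurs with each hidden value."""
    votes = defaultdict(Counter)
    for neighborhood, team, hidden in records:
        votes[("neighborhood", neighborhood)][hidden] += 1
        votes[("team", team)][hidden] += 1
    return votes

def deduce(votes, neighborhood, team):
    """Guess the hidden attribute by summing the votes of the visible features."""
    total = Counter()
    total.update(votes[("neighborhood", neighborhood)])
    total.update(votes[("team", team)])
    return total.most_common(1)[0][0]

VOTES = build_votes(KNOWN)

# A new person whose hidden attribute was "blocked" and never collected:
print(deduce(VOTES, "north", "blues"))  # A
```

With only six records and two features the guess is crude; the argument above is that with enough data such guesses become reliable.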

Computational Irreducibility

Even given every detail of a program, it can be arbitrarily hard to predict what it will or won’t do

One might think that if one had the complete code for a program, one would readily be able to deduce everything about what the program would do. But it’s a fundamental fact that in general one can’t do this. Given a particular input, one can always just run the program and see what it does. But even if the program is simple, its behavior may be very complicated, and computational irreducibility implies that there won’t be a way to “jump ahead” and immediately find out what the program will do, without explicitly running it.
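
A standard illustration of a simple program with complicated behavior is the rule 30 cellular automaton. The passage above does not name a specific program, so the choice of rule 30 here is an assumption made for the sake of a concrete sketch. The whole program is a few lines, yet the practical way to find out what row 1000 looks like is to compute all the rows before it.

```python
def step(cells, rule=30):
    """Apply one step of an elementary cellular automaton (edges wrap around)."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(width=31, steps=15, rule=30):
    """Start from a single black cell and return every row that is generated."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history

history = run()
for row in history:
    print("".join("#" if cell else "." for cell in row))
```

Printing the rows shows a regular left edge and an irregular interior, even though the rule itself is just an 8-entry lookup table.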

One consequence of this is that if one wants to know, for example, whether with any input a program can do such-and-such, then there may be no finite way to determine this—because one might have to check an infinite number of possible inputs. As a practical matter, this is why bugs in programs can be so hard to detect. But as a matter of principle, it means that it can ultimately be impossible to completely verify that a program is “correct”, or has some specific property.

Software engineering has in the past often tried to constrain the programs it deals with so as to minimize such effects. But with methods like machine learning, this is basically impossible to do. And the result is that even if one had a complete automated content selection program, one wouldn’t in general be able to verify that, for example, it could never show some particular bad behavior.


Non-explainability

For a well-optimized computation, there’s not likely to be a human-understandable narrative about how it works inside

Should we expect to understand how our technological systems work inside? When things like donkeys were routinely part of such systems, people didn’t expect to. But once the systems began to be “completely engineered” with cogs and levers and so on, there developed an assumption that at least in principle one could explain what was going on inside. The same was true with at least simpler software systems. But with things like machine learning systems, it absolutely isn’t.

Yes, one can in principle trace what happens to every bit of data in the program. But can one create a human-understandable narrative about it? It’s a bit like imagining we could trace the firing of every neuron in a person’s brain. We might be able to predict what a person would do in a particular case, but it’s a different thing to get a high-level “psychological narrative” about why they did it.

Inside a machine learning system—say the cats vs. dogs program—one can think of it as extracting all sorts of features, and making all sorts of distinctions. And occasionally one of these features or distinctions might be something we have a word for (“pointedness”, say). But most of the time they’ll be things the machine learning system discovered, and they won’t have any connection to concepts we’re familiar with.

And in fact—as a consequence of computational irreducibility—it’s basically inevitable that with things like the finiteness of human language and human knowledge, in any well-optimized computation we’re not going to be able to give a high-level narrative to explain what it’s doing. And the result of this is that it’s impossible to expect any useful form of general “explainability” for automated content selection systems.

Ethical Incompleteness

There’s no finite set of principles that can completely define any reasonable, practical system of ethics

Let’s say one’s trying to teach ethics to a computer, or an artificial intelligence. Is there some simple set of principles—like Asimov’s Laws of Robotics—that will capture a viable complete system of ethics? Looking at the complexity of human systems of laws one might suspect that the answer is no. And in fact this is presumably a fundamental result—essentially another consequence of computational irreducibility.

Imagine that we’re trying to define constraints (or “laws”) for an artificial intelligence, in order to ensure that the AI behaves in some particular “globally ethical” way. We set up a few constraints, and we find that many things the AI does follow our ethics. But computational irreducibility essentially guarantees that eventually there’ll always be something unexpected that’s possible. And the only way to deal with that is to add a “patch”—essentially to introduce another constraint for that new case. And the issue is that this will never end: there’ll be no way to give a finite set of constraints that will achieve our global objectives. (There’s a somewhat technical analogy of this in mathematics, in which Gödel’s theorem shows that no finite set of axiomatic constraints can give one only ordinary integers and nothing else.)

So for our purposes here, the main consequence of this is that we can’t expect to have some finite set of computational principles (or, for that matter, laws) that will constrain automated content selection systems to always behave according to some reasonable, global system of ethics—because they’ll always be generating unexpected new cases that we have to define a new principle to handle.

The Path Forward

I’ve described some of the complexities of handling issues with automated content selection systems. But what in practice can be done?

One obvious idea would be just to somehow “look inside” the systems, auditing their internal operation and examining their construction. But for both fundamental and practical reasons, I don’t think this can usefully be done. As I’ve discussed, to achieve the kind of functionality that users have become accustomed to, modern automated content selection systems make use of methods such as machine learning that are not amenable to human-level explainability or systematic predictability.

What about checking whether a system is, for example, biased in some way? Again, this is a fundamentally difficult thing to determine. Given a particular definition of bias, one could look at the internal training data used for the system—but this won’t usually give more information than just studying how the system behaves.

What about seeing if the system has somehow intentionally been made to do this or that? It’s conceivable that the source code could have explicit “if” statements that would reveal intention. But the bulk of the system will tend to consist of trained neural networks and so on—and as in most other complex systems, it’ll typically be impossible to tell what features might have been inserted “on purpose” and what are just accidental or emergent properties.

So if it’s not going to work to “look inside” the system, what about restricting how the system can be set up? For example, one approach that’s been suggested is to limit the inputs that the system can have, in an extreme case preventing it from getting any personal information about the user and their history. The problem with this is that it negates what’s been achieved over the course of many years in content selection systems—both in terms of user experience and economic success. And for example, knowing nothing about a user, if one has to recommend a video, one’s just going to have to suggest whatever video is generically most popular—which is very unlikely to be what most users want most of the time.

As a variant of the idea of blocking all personal information, one can imagine blocking just some information—or, say, allowing a third party to broker what information is provided. But if one wants to get the advantages of modern content selection methods, one’s going to have to leave a significant amount of information—and then there’s no point in blocking anything, because it’ll almost certainly be reproducible through the phenomenon of data deducibility.

Here’s another approach: what about just defining rules (in the form of computational contracts) that specify constraints on the results content selection systems can produce? One day, we’re going to have to have such computational contracts to define what we want AIs in general to do. And because of ethical incompleteness—like with human laws—we’re going to have to have an expanding collection of such contracts.

But even though (particularly through my own efforts) we’re beginning to have the kind of computational language necessary to specify a broad range of computational contracts, we realistically have to get much more experience with computational contracts in standard business and other situations before it makes sense to try setting them up for something as complex as global constraints on content selection systems.

So, what can we do? I’ve not been able to see a viable, purely technical solution. But I have formulated two possible suggestions based on mixing technical ideas with what amount to market mechanisms.

The basic principle of both suggestions is to give users a choice about who to trust, and to let the final results they see not necessarily be completely determined by the underlying ACS business.

There’s been debate about whether ACS businesses are operating as “platforms” that more or less blindly deliver content, or whether they’re operating as “publishers” who take responsibility for content they deliver. Part of this debate can be seen as being about what responsibility should be taken for an AI. But my suggestions sidestep this issue, and in different ways tease apart the “platform” and “publisher” roles.

It’s worth saying that the whole content platform infrastructure that’s been built by the large ACS businesses is an impressive and very valuable piece of engineering—managing huge amounts of content, efficiently delivering ads against it, and so on. What’s really at issue is whether the fine details of the ACS systems need to be handled by the same businesses, or whether they can be opened up. (This is relevant only for ACS businesses whose network effects have allowed them to serve a large fraction of a population. Small ACS businesses don’t have the same kind of lock-in.)

Suggestion A: Allow Users to Choose among Final Ranking Providers

Suggestion A

As I discussed earlier, the rough (and oversimplified) outline of how a typical ACS system works is that first features are extracted for each content item and each user. Then, based on these features, there’s a final ranking done that determines what will actually be shown to the user, in what order, etc.

What I’m suggesting is that this final ranking doesn’t have to be done by the same entity that sets up the infrastructure and extracts the features. Instead, there could be a single content platform but a variety of “final ranking providers”, who take the features, and then use their own programs to actually deliver a final ranking.

Different final ranking providers might use different methods, and emphasize different kinds of content. But the point is to let users be free to choose among different providers. Some users might prefer (or trust more) some particular provider—that might or might not be associated with some existing brand. Other users might prefer another provider, or choose to see results from multiple providers.

How technically would all this be implemented? The underlying content platform (presumably associated with an existing ACS business) would take on the large-scale information-handling task of deriving extracted features. The content platform would provide sufficient examples of underlying content (and user information) and its extracted features to allow the final ranking provider’s systems to “learn the meaning” of the features.

When the system is running, the content platform would in real time deliver extracted features to the final ranking provider, which would then feed this into whatever system they have developed (which could use whatever automated or human selection methods they choose). This system would generate a ranking of content items, which would then be fed back to the content platform for final display to the user.

To avoid revealing private user information to lots of different providers, the final ranking provider’s system should probably run on the content platform’s infrastructure. The content platform would be responsible for the overall user experience, presumably providing some kind of selector to pick among final ranking providers. The content platform would also be responsible for delivering ads against the selected content.
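
The flow just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ContentItem`, the feature names, and the two example providers are all hypothetical, and a real platform would use far richer features. The point is that the provider sees only platform-extracted features and hands back a ranking.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical feature vector: feature name -> score computed by the platform.
Features = Dict[str, float]
# A final ranking provider maps an item's features to one ranking score.
RankingProvider = Callable[[Features], float]


@dataclass
class ContentItem:
    item_id: str
    features: Features  # extracted by the content platform, not by providers


def popularity_provider(features: Features) -> float:
    # Toy provider: rank purely by a "popularity" feature.
    return features.get("popularity", 0.0)


def depth_provider(features: Features) -> float:
    # Toy provider: prefer long-form content, mildly penalize popularity.
    return features.get("long_form", 0.0) - 0.1 * features.get("popularity", 0.0)


# The selector the platform would expose to users (names are made up).
PROVIDERS: Dict[str, RankingProvider] = {
    "popular": popularity_provider,
    "in_depth": depth_provider,
}


def rank_for_user(items: List[ContentItem], provider_name: str) -> List[str]:
    """Run the user's chosen provider on platform-extracted features.

    The provider runs inside the platform and only ever sees features;
    the platform displays the ordering that comes back.
    """
    score = PROVIDERS[provider_name]
    ranked = sorted(items, key=lambda item: score(item.features), reverse=True)
    return [item.item_id for item in ranked]


items = [
    ContentItem("a", {"popularity": 0.9, "long_form": 0.1}),
    ContentItem("b", {"popularity": 0.2, "long_form": 0.8}),
    ContentItem("c", {"popularity": 0.5, "long_form": 0.5}),
]
print(rank_for_user(items, "popular"))   # ['a', 'c', 'b']
print(rank_for_user(items, "in_depth"))  # ['b', 'c', 'a']
```

The same three items come back in opposite orders depending on which provider the user selected, while the platform's feature extraction stays unchanged.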

Presumably the content platform would give a commission to the final ranking provider. If properly set up, competition among final ranking providers could actually increase total revenue to the whole ACS business, by achieving automated content selection that serves users and advertisers better.

Suggestion B: Allow Users to Choose among Constraint Providers

Suggestion B

One feature of Suggestion A is that it breaks up ACS businesses into a content platform component, and a final ranking component. (One could still imagine, however, that a quasi-independent part of an ACS business could be one of the competing final ranking providers.) An alternative suggestion is to keep ACS businesses intact, but to put constraints on the results that they generate, for example forcing certain kinds of balance, etc.

Much like final ranking providers, there would be constraint providers who define sets of constraints. For example, a constraint provider could require that there be on average an equal number of items delivered to a user that are classified (say, by a particular machine learning system) as politically left-leaning or politically right-leaning.

Constraint providers would effectively define computational contracts about properties they want results delivered to users to have. Different constraint providers would define different computational contracts. Some might want balance; others might want to promote particular types of content, and so on. But the idea is that users could decide what constraint provider they wish to use.

How would constraint providers interact with ACS businesses? It’s more complicated than for final ranking providers in Suggestion A, because effectively the constraints from constraint providers have to be woven deeply into the basic operation of the ACS system.

One possible approach is to use the machine learning character of ACS systems, and to insert the constraints as part of the “learning objectives” (or, technically, “loss functions”) for the system. Of course, there could be constraints that just can’t be successfully learned (for example, they might call for types of content that simply don’t exist). But there will be a wide range of acceptable constraints, and in effect, for each one, a different ACS system would be built.
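
As a rough sketch of what "inserting a constraint into the loss function" could mean, here is a toy Python example. Everything in it is an assumption made for illustration: the base objective, the `leaning` scores (imagined as coming from some classifier, with -1 for left and +1 for right), and the `strength` parameter are not taken from any actual ACS system.

```python
from typing import List


def selection_loss(relevance: List[float], weights: List[float]) -> float:
    # Hypothetical base objective: more exposure for relevant items is better,
    # so the loss is the negative relevance-weighted exposure.
    return -sum(r * w for r, w in zip(relevance, weights))


def balance_penalty(leaning: List[float], weights: List[float]) -> float:
    # leaning[i] is in [-1, 1]. The penalty is the squared
    # exposure-weighted mean leaning, which is zero for a balanced feed.
    # Assumes the exposure weights are non-negative and sum to more than zero.
    total = sum(weights)
    mean = sum(l * w for l, w in zip(leaning, weights)) / total
    return mean ** 2


def constrained_loss(
    relevance: List[float],
    leaning: List[float],
    weights: List[float],
    strength: float = 1.0,
) -> float:
    # The constraint provider's contract enters as an extra loss term.
    return selection_loss(relevance, weights) + strength * balance_penalty(leaning, weights)


relevance = [0.9, 0.8, 0.3, 0.2]
leaning = [1.0, 1.0, -1.0, -1.0]
one_sided = [0.5, 0.5, 0.0, 0.0]    # all exposure goes to one leaning
balanced = [0.25, 0.25, 0.25, 0.25]  # exposure split evenly

# Without the constraint the one-sided feed has the lower loss...
print(constrained_loss(relevance, leaning, one_sided, strength=0.0))
print(constrained_loss(relevance, leaning, balanced, strength=0.0))
# ...with it, the balanced feed does.
print(constrained_loss(relevance, leaning, one_sided, strength=1.0))
print(constrained_loss(relevance, leaning, balanced, strength=1.0))
```

A system trained against the second objective would in effect be a different ACS system from one trained against the first, which is why each acceptable constraint corresponds to its own system.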

All these ACS systems would then be operated by the underlying ACS business, with users selecting which constraint provider—and therefore which overall ACS system—they want to use.

As with Suggestion A, the underlying ACS business would be responsible for delivering advertising, and would pay a commission to the constraint provider.

Although their detailed mechanisms are different, both Suggestions A and B attempt to leverage the exceptional engineering and commercial achievements of the ACS businesses, while defusing current trust issues about content selection, providing greater freedom for users, and inserting new opportunities for market growth.

The suggestions also help with some other issues. One example is the banning of content providers. At present, with ACS businesses feeling responsible for content on their platforms, there is considerable pressure, not least from within the ACS businesses themselves, to ban content providers that they feel are providing inappropriate content. The suggestions diffuse the responsibility for content, potentially allowing the underlying ACS businesses not to ban anything but explicitly illegal content.

It would then be up to the final ranking providers, or the constraint providers, to choose whether or not to deliver or allow content of a particular character, or from a particular content provider. In any given case, some might deliver or allow it, and some might not, removing the difficult all-or-none nature of the banning that’s currently done by ACS businesses.

One feature of my suggestions is that they allow fragmentation of users into groups with different preferences. At present, all users of a particular ACS business have content that is basically selected in the same way. With my suggestions, users of different persuasions could potentially receive completely different content, selected in different ways.

While fragmentation like this appears to be an almost universal tendency in human society, some might argue that having people routinely be exposed to other people’s points of view is important for the cohesiveness of society. And technically some version of this would not be difficult to achieve. For example, one could take the final ranking or constraint providers, and effectively generate a feature space plot of what they do.

Some would be clustered close together, because they lead to similar results. Others would be far apart in feature space—in effect representing very different points of view. Then if someone wanted to, say, see their typical content 80% of the time, but see different points of view 20% of the time, the system could combine different providers from different parts of feature space with a certain probability.
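
A minimal sketch of that kind of mixing, in Python. The provider names, their two-dimensional positions in "feature space", and the rule of favoring more distant providers when exploring are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, Tuple

Point = Tuple[float, float]


def pick_provider(
    home: str,
    positions: Dict[str, Point],
    explore_prob: float,
    rng: random.Random,
) -> str:
    """Pick the provider for one request.

    With probability explore_prob, choose some other provider, weighted by
    its distance from the user's usual provider in feature space (so very
    different points of view are favored). Otherwise use the usual one.
    Assumes the other providers do not sit exactly on top of the home one.
    """
    others = [name for name in positions if name != home]
    if not others or rng.random() >= explore_prob:
        return home
    weights = [math.dist(positions[home], positions[name]) for name in others]
    return rng.choices(others, weights=weights, k=1)[0]


# Hypothetical providers: A and B are near each other, C is far away.
positions = {"A": (0.0, 0.0), "B": (0.1, 0.0), "C": (5.0, 5.0)}
rng = random.Random(0)
draws = [pick_provider("A", positions, 0.2, rng) for _ in range(10000)]
explored = sum(1 for name in draws if name != "A") / len(draws)
print(f"share of requests served by another provider: {explored:.2f}")
```

With `explore_prob` set to 0.2, roughly one request in five is served by a provider other than the user's usual one, and the distant provider C is chosen far more often than the nearby B.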

Of course, in all these matters, the full technical story is much more complex. But I am confident that if they are considered desirable, either of the suggestions I have made can be implemented in practice. (Suggestion A is likely to be somewhat easier to implement than Suggestion B.) The result, I believe, will be richer, more trusted, and even more widely used automated content selection. In effect both my suggestions mix the capabilities of humans and AIs—to help get the best of both of them—and to navigate through the complex practical and fundamental problems with the use of automated content selection.

]]> 4
<![CDATA[My Part in an Origin Story: The Launching of the Santa Fe Institute]]> Tue, 18 Jun 2019 19:36:02 +0000 Stephen Wolfram

The first workshop to define what is now the Santa Fe Institute took place on October 5–6, 1984. I was recently asked to give some reminiscences of the event, for a republication of a collection of papers derived from this and subsequent workshops.

It was a slightly dark room, decorated with Native American artifacts. Around it were tables arranged in a large rectangle, at which sat a couple dozen men (yes, all men), mostly in their sixties. The afternoon was wearing on, with many different people giving their various views about how to organize what amounted to a putative great new interdisciplinary university.

Here’s the original seating chart, together with a current view of the meeting room. (I’m only “Steve” to Americans currently over the age of 60…):

Santa Fe seating chart
Dobkin Boardroom

I think I was less patient in those days. But eventually I could stand it no longer. I don’t remember my exact words, but they boiled down to: “What are you going to do if you only raise a few million dollars, not two billion?” It was a strange moment. After all, I was by far the youngest person there—at 25 years old—and yet it seemed to have fallen to me to play the “let’s get real” role. (To be fair, I had founded my first tech company a couple of years earlier, and wasn’t a complete stranger to the world of grandiose “what-if” discussions, even if I was surprised, though more than a little charmed, to be seeing them in the sixty-something-year-old set.)

A fragment of my notes from the day records my feelings:

What is supposed to be the point of this discussion?

George Cowan (Manhattan Project alum, Los Alamos administrator, and founder of the Los Alamos Bank) was running the meeting, and I sensed a mixture of frustration and relief at my question. I don’t remember precisely what he said, but it boiled down to: “Well, what do you think we should do?” “Well”, I said, “I do have a suggestion”. I summarized it a bit, but then it was agreed that later that day I should give a more formal presentation. And that’s basically how I came to suggest that what would become the Santa Fe Institute should focus on what I called “Complex Systems Theory”.

Of course, there was a whole backstory to this. It basically began in 1972, when I was 12 years old, and saw the cover of a college physics textbook that purported to show an arrangement of simulated colliding molecules progressively becoming more random. I was fascinated by this phenomenon, and quite soon started trying to use a computer to understand it. I didn’t get too far with this. But it was the golden age of particle physics, and I was soon swept up in publishing papers about a variety of topics in particle physics and cosmology.

Still, in all sorts of different ways I kept on coming back to my interest in how randomness—or complexity—gets produced. In 1978 I went to Caltech as a graduate student, with Murray Gell-Mann (inventor of quarks, and the first chairman of the Santa Fe Institute) doing his part to recruit me by successfully tracking down a phone number for me in England. Then in 1979, as a way to help get physics done, I set about building my first large-scale computer language. In 1981, the first version was finished, I was installed as a faculty member at Caltech—and I decided it was time for me to try something more ambitious, and really see what I could figure out about my old interest in randomness and complexity.

By then I had picked away at many examples of complexity. In self-gravitating gases. In dendritic crystal growth. In road traffic flow. In neural networks. But the reductionist physicist in me wanted to drill down and find out what was underneath all these. And meanwhile the computer language designer in me thought, “Let’s just invent something and see what can be done with it”. Well, pretty soon I invented what I later found out were called cellular automata.

I didn’t expect that simple cellular automata would do anything particularly interesting. But I decided to try computer experiments on them anyway. And to my great surprise I discovered that—despite the simplicity of their construction—cellular automata can in fact produce behavior of great complexity. It’s a major shock to traditional scientific intuition—and, as I came to realize in later years, a clue to a whole new kind of science.

But for me the period from 1981 to 1984 was an exciting one, as I began to explore the computational universe of simple programs like cellular automata, and saw just how rich and unexpected it was. David Pines, as the editor of Reviews of Modern Physics, had done me the favor of publishing my first big paper on cellular automata (John Maddox, editor of Nature, had published a short summary a little earlier). Through the Center for Nonlinear Studies, I had started making visits to Los Alamos in 1981, and I initiated and co-organized the first-ever conference devoted to cellular automata, held at Los Alamos in 1983.

In 1983 I had left Caltech (primarily as a result of an unhappy interaction about intellectual property rights) and gone to the Institute for Advanced Study in Princeton, and begun to build a group there concerned with studying the basic science of complex systems. I wasn’t sure until quite a few years later just how general the phenomena I’d seen in cellular automata were. But I was pretty certain that there were at least many examples of complexity across all sorts of fields that they’d finally let one explain in a fundamental, theoretical way.

I’m not sure when I first heard about plans for what was then called the Rio Grande Institute. But I remember not being very hopeful about it; it seemed too correlated with the retirement plans of a group of older physicists. But meanwhile, people like Pete Carruthers (director of T Division at Los Alamos) were encouraging me to think about starting my own institute to pursue the kind of science I thought could be done.

I didn’t know quite what to make of the letter I received in July 1984 from Nick Metropolis (long-time Los Alamos scientist, and inventor of the Metropolis method). It described the nascent Rio Grande Institute as “a teaching and research institution responsive to the challenge of emerging new syntheses in science”. Murray Gell-Mann had told me that it would bring together physics and archaeology, linguistics and cosmology, and more. But at least in the circulated documents, the word “complexity” appeared quite often.

Letter from Los Alamos

The invitation described the workshop as being “to examine a concept for a fresh approach to research and teaching in rapidly developing fields of scientific activity dealing with highly complex, interactive systems”. Murray Gell-Mann, who had become a sort of de facto intellectual leader of the effort, was given to quite flowery descriptions, and declared that the institute would be involved with “simplicity and complexity”.

When I arrived at the workshop it was clear that everyone wanted their favorite field to get a piece of the potential action. Should I even bring up my favorite emerging field? Or should I just make a few comments about computers and let the older guys do their thing?

As I listened to the talks and discussions, I kept wondering how what I was studying might relate to them. Quite often I really didn’t know. At the time I still believed, for example, that adaptive systems might have fundamentally different characteristics. But still, the term “complexity” kept on coming up. And if the Rio Grande Institute needed an area to concentrate on, it seemed that a general study of complexity would be the closest to being central to everything they were talking about.

I’m not sure quite what the people in the room made of my speech about “complex systems theory”. But I think I did succeed in making the point that there really could be a general “science of complexity”—and that things like cellular automata could show one how it might work. People had been talking about the complexity of this, or the complexity of that. But it seemed like I’d at least started the process of getting people to talk about complexity as an abstract thing one could expect to have general theories about.

After that first workshop, I had a few more interactions with what was to be the Santa Fe Institute. I still wasn’t sure what was going to happen with it—but the “science of complexity” idea did seem to be sticking. Meanwhile, however, I was forging ahead with my own plans to start a complex systems institute (I avoided the term “complexity theory” out of deference to the rather different field of computational complexity theory). I was talking to all sorts of universities, and in fact David Pines was encouraging me to consider the University of Illinois.

George Cowan asked me if I’d be interested in running the research program for the Santa Fe Institute, but by that point I was committed to starting my own operation, and it wasn’t long afterwards that I decided to do it at the University of Illinois. My Center for Complex Systems Research—and my journal Complex Systems—began operations in the summer of 1986.

Complex Systems

I’m not sure how things would have been different if I’d ended up working with the Santa Fe Institute. But as it was, I rather quickly tired of the effort to raise money for complex systems research, and I was soon off creating what became Mathematica (and now the Wolfram Language), and starting my company Wolfram Research.

By the early 1990s, probably in no small part through the efforts of the Santa Fe Institute, “complexity” had actually become a popular buzzword, and, partly through a rather circuitous connection to climate science, funding had started pouring in. But having launched Mathematica and my company, I’d personally pretty much vanished from the scene, working quietly on using the tools I’d created to pursue my interests in basic science. I thought it would only take a couple of years, but in the end it took more than a decade.

I discovered a lot—and realized that, yes, the phenomena I’d first seen with cellular automata and talked about at the Santa Fe workshop were indeed a clue to a whole new kind of science, with all sorts of implications for long-standing problems and for the future. I packaged up what I’d figured out—and in 2002 published my magnum opus A New Kind of Science.

A New Kind of Science

It was strange to reemerge after a decade and a half away. The Santa Fe Institute had continued to pursue the science of complexity. As something of a hermit in those years, I hadn’t interacted with it, but there was curiosity about what I was doing (highlighted, if nothing else, by a bizarre incident in 1998 involving “leaks” about my research). When my book came out in 2002 I was pleased to feel that I’d actually done what I talked about doing back at that Santa Fe workshop in 1984, as well as much more.

But by then almost nobody who’d been there in 1984 was still involved with the Santa Fe Institute, and instead there was a “new guard” (now, I believe, again departed), who, far from being pleased with my progress and success in broadening the field, actually responded with rather unseemly hostility.

It’s been an interesting journey from those days in October 1984. Today complex systems research is very definitely “a thing”, and there are hundreds of “complex systems” institutes around the world. (Though I still don’t think the basic science of complexity, as opposed to its applications, has received the attention it should.) But the Santa Fe Institute remains the prototypical example—and it’s not uncommon when I talk about complexity research for people to ask, “Is that like what the Santa Fe Institute does?”

“Well actually”, I sometimes say, “there’s a little footnote to history about that”. And off I go, talking about that Saturday afternoon back in October 1984—when I could be reached (as the notes I distributed said) through that newfangled thing called email at ias!swolf

Stephen Wolfram's notes on complex systems

]]> 0
<![CDATA[A Few Thoughts about Deep Fakes]]> Wed, 12 Jun 2019 23:55:38 +0000 Stephen Wolfram

Someone from the House Permanent Select Committee on Intelligence recently contacted me about a hearing they’re having on the subject of deep fakes. I can’t attend the hearing, but the conversation got me thinking about the subject of deep fakes, and I made a few quick notes….

What You See May Not Be What Happened

The idea of modifying images is as old as photography. At first, it had to be done by hand (sometimes with airbrushing). By the 1990s, it was routinely being done with image manipulation software such as Photoshop. But it’s something of an art to get a convincing result, say for a person inserted into a scene. And if, for example, the lighting or shadows don’t agree, it’s easy to tell that what one has isn’t real.

What about videos? If one does motion capture, and spends enough effort, it’s perfectly possible to get quite convincing results—say for animating aliens, or for putting dead actors into movies. The way this works, at least in a first approximation, is for example to painstakingly pick out the keypoints on one face, and map them onto another.

What’s new in the past couple of years is that this process can basically be automated using machine learning. And, for example, there are now neural nets that are simply trained to do “face swapping”:

Face swap

In essence, what these neural nets do is to fit an internal model to one face, and then apply it to the other. The parameters of the model are in effect learned from looking at lots of real-world scenes, and seeing what’s needed to reproduce them. The current approaches typically use generative adversarial networks (GANs), in which there’s iteration between two networks: one trying to generate a result, and one trying to discriminate that result from a real one.

Today’s examples are far from perfect, and it’s not too hard for a human to tell that something isn’t right. But even just as a result of engineering tweaks and faster computers, there’s been progressive improvement, and there’s no reason to think that within a modest amount of time it won’t be possible to routinely produce human-indistinguishable results.

Can Machine Learning Police Itself?

OK, so maybe a human won’t immediately be able to tell what’s real and what’s not. But why not have a machine do it? Surely there’s some signature of something being “machine generated”. Surely there’s something about a machine-generated image that’s statistically implausible for a real image.

Well, not naturally. Because, in fact, the whole way the machine images are generated is by having models that as faithfully as possible reproduce the “statistics” of real images. Indeed, inside a GAN there’s explicitly a “fake or not” discriminator. And the whole point of the GAN is to iterate until the discriminator can’t tell the difference between what’s being generated, and something real.
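
For reference, the usual textbook way of writing down this iteration is as a two-player minimax game. The symbols here are not from the discussion above and are introduced only for illustration: $G$ is the generator, $D$ the discriminator (returning the probability that its input is real), $p_{\text{data}}$ the distribution of real images, and $p_z$ the noise distribution fed to the generator. Particular face-swapping systems may well use variants of this objective.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

In the idealized limit the generator's output distribution matches $p_{\text{data}}$ and the best the discriminator can do is output $1/2$ everywhere, which is exactly the situation in which a "fake or not" test built from the same ingredients has nothing left to detect.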

Could one find some other feature of an image that the GAN isn’t paying attention to—like whether a face is symmetric enough, or whether writing in the background is readable? Sure. But at this level it’s just an arms race: having identified a feature, one puts it into the model the neural net is using, and then one can’t use that feature to discriminate any more.

There are limitations to this, however. Because there’s a limit to what a typical neural net can learn. Generally, neural nets do well at tasks like image recognition that humans do without thinking. But it’s a different story if one tries to get neural nets to do math, and for example factor numbers.

Imagine that in modifying a video one has to fill in a background that’s showing some elaborate computation—say a mathematical one. Well, then a standard neural net basically doesn’t stand a chance.

Will it be easy to tell that it’s getting it wrong? It could be. If one’s dealing with public-key cryptography, or digital signatures, one can certainly imagine setting things up so that it’s very hard to generate something that is correct, but easy to check whether it is.
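
Factoring, mentioned above as something neural nets do badly, gives a small illustration of this asymmetry. The sketch below is a toy: the numbers are tiny, the function names are made up, and real cryptographic schemes rely on much larger numbers and different algorithms. The check only confirms that the claimed factors multiply to the target; it does not test whether they are prime.

```python
from typing import List


def check_factorization(n: int, factors: List[int]) -> bool:
    """Cheap check: multiply the claimed factors and compare with n."""
    product = 1
    for f in factors:
        if f <= 1:
            return False  # reject trivial or invalid factors
        product *= f
    return product == n


def find_factor_by_trial_division(n: int) -> int:
    """Expensive search: smallest factor of n greater than 1.

    The number of steps grows roughly with the square root of n, so this
    becomes infeasible long before the check above does.
    """
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime


n = 101 * 103  # 10403
print(check_factorization(n, [101, 103]))  # True: one multiplication
print(check_factorization(n, [101, 104]))  # False: a wrong claim is caught at once
print(find_factor_by_trial_division(n))    # 101, found only after ~100 trial divisions
```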

But will this kind of thing show up in real images or videos? My own scientific work has actually shown that irreducibly complex computation can be quite ubiquitous even in systems with very simple rules—and presumably in many systems in nature. Watch a splash in water. It takes a complex computation to figure out the details of what’s going to happen. And while a neural net might be able to get something that basically looks like a splash, it’d be vastly harder for it to get the details of a particular splash right.

But even though in the abstract computational irreducibility may be common, we humans, in our evolution and the environments we set up for ourselves, tend to end up doing our best to avoid it. We have shapes with smooth curves. We build things with simple geometries. We try to make things evolvable or understandable. And it’s this avoidance of computational irreducibility that makes it feasible for neural nets to successfully model things like the visual scenes in which we typically find ourselves.

One can disrupt this, of course. Just put in the picture a display that’s showing some sophisticated computation (even, for example, a cellular automaton). If someone tries to fake some aspect of this with a neural net, it won’t (at least on its own) feasibly be able to get the details right.
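
To make this concrete, here is a short Python sketch of the kind of computation such a display might show: an elementary cellular automaton started from a single black cell. The choice of rule 30, the grid width, and the wrap-around boundary are assumptions made for the example.

```python
from typing import List


def step(cells: List[int], rule: int = 30) -> List[int]:
    """Apply one step of an elementary cellular automaton (cyclic boundary)."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three neighbors form a number 0..7 that selects a bit of the rule.
        index = (left << 2) | (center << 1) | right
        new_cells.append((rule >> index) & 1)
    return new_cells


def run(width: int, steps: int, rule: int = 30) -> List[List[int]]:
    """Evolve from a single black cell in the middle and return every row."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows


for row in run(width=31, steps=15):
    print("".join("#" if cell else "." for cell in row))
```

Every cell in the picture follows from the rule, so checking a given frame is straightforward, whereas a system that only imitates the overall look of the pattern has no shortcut to getting the individual cells right.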

I suspect that in the future of human technology—as we mine deeper in the computational universe—irreducible computation will be much more common in what we build. But as of now, it’s still rare in typical human-related situations. And as a result, we can expect that neural nets will successfully be able to model what’s going on well enough to at least fool other neural nets.

How to Know What’s Real

So if there’s no way to analyze the bits in an image to tell if it’s a real photograph, does that mean we just can’t tell? No. Because we can also think about metadata associated with the image—and about the provenance of the image. When was the image created? By whom? And so on.

So let’s say we create an image. How can we set things up so that we can prove when we did it? Well, in modern times it’s actually very easy. We take the image, and compute a cryptographic hash from it (effectively by applying a mathematical operation that derives a number from the bits in the image). Then we take this hash and put it on a blockchain.

The blockchain acts as a permanent ledger. Once we’ve put data on it, it can never be changed, and we can always go back and see what the data was, and when it was added to the blockchain.

This setup lets us prove that the image was created no later than a certain time. If we want to prove that the image wasn’t created earlier, then when we create the hash for the image, we can throw in a hash from the latest block on our favorite blockchain.
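
Here is a minimal Python sketch of both halves of this idea. The `ToyLedger` class is a hypothetical in-memory stand-in for a real blockchain (it has no network, consensus, or blocks in the usual sense), and the way the block hash is mixed into the image hash is just one simple choice; `hashlib.sha256` is the standard-library hash function.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyLedger:
    """Hypothetical append-only ledger standing in for a blockchain."""

    entries: List[Tuple[str, float]] = field(default_factory=list)

    def latest_block_hash(self) -> str:
        # Hash of the most recent entry, or of a fixed genesis value.
        if not self.entries:
            return hashlib.sha256(b"genesis").hexdigest()
        return hashlib.sha256(repr(self.entries[-1]).encode()).hexdigest()

    def append(self, digest: str, timestamp: float) -> None:
        self.entries.append((digest, timestamp))


def commit_image(
    image_bytes: bytes, ledger: ToyLedger, timestamp: Optional[float] = None
) -> Tuple[str, str]:
    """Record a hash of the image, bound to the latest block hash.

    Mixing in the latest block hash shows the commitment was not made before
    that block existed; the ledger entry shows it was made no later than
    the entry's time.
    """
    anchor = ledger.latest_block_hash()
    digest = hashlib.sha256(anchor.encode() + image_bytes).hexdigest()
    ledger.append(digest, time.time() if timestamp is None else timestamp)
    return anchor, digest


ledger = ToyLedger()
anchor, digest = commit_image(b"raw image bytes", ledger)
print(anchor)
print(digest)
```

Anyone holding the image and the anchor can recompute the digest and look it up on the ledger; changing even one bit of the image gives a different digest that will not be found there.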

OK, but what about knowing who created the image? It takes a bit of cryptographic infrastructure—very similar to what’s done in proving the authenticity of websites. But if one can trust some “certificate authority” then one can associate a digital signature to the image that validates who created it.

But how about knowing where the image was taken? Assuming one has a certain level of access to the device or the software, GPS can be spoofed. But if one records enough about the environment when the image was taken, then spoofing gets harder and harder. What were the nearby Wi-Fi networks? The Bluetooth pings? The temperature? The barometric pressure? The sound level? The accelerometer readings? If one has enough information collected, then it becomes easier to tell if something doesn’t fit.

There are several ways one could do this. Perhaps one could just detect anomalies using machine learning. Or perhaps one could use actual models of how the world works (the path implied by the accelerometer isn’t consistent with the equations of mechanics, etc.). Or one could somehow tie the information to some public computational fact. Was the weather really like that in the place the photo was said to be taken? Why isn’t there a shadow from such-and-such a plane going overhead? Why is what’s playing on the television not what it should be? Etc.

But, OK, even if one just restricts oneself to creation time and creator ID, how can one in practice validate them?

The best scheme seems to be something like how modern browsers handle website security. The browser tries to check the cryptographic signature of the website. If it matches, the browser shows something to say the website is secure; if not, it shows some kind of warning.

So let’s say an image comes with data on its creation time and creator ID. The data could be metadata (say EXIF data), or it could be a watermark imprinted on the detailed bits in the image. Then the image viewer (say in the browser) can check whether the hash on a blockchain agrees with what the data provided by the image implies. If it does, fine. And the image viewer can make the creation time and creator ID available. If not, the image viewer should warn the user that something seems to be wrong.
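
The viewer-side check might look something like the following Python sketch. It is deliberately simplified, and all the names are hypothetical: the ledger is modeled as a dictionary from image hash to a record, and the creator ID is compared directly rather than validated through a digital signature and certificate authority, as a real scheme would require.

```python
import hashlib
from typing import Dict, NamedTuple


class LedgerRecord(NamedTuple):
    creator_id: str
    created_at: str  # e.g. an ISO 8601 timestamp


def check_image(
    image_bytes: bytes,
    claimed_creator: str,
    claimed_time: str,
    ledger: Dict[str, LedgerRecord],
) -> str:
    """Compare what the image's metadata claims with what the ledger says."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    record = ledger.get(digest)
    if record is None:
        return "warning: no ledger entry for this image"
    if record.creator_id != claimed_creator or record.created_at != claimed_time:
        return "warning: metadata does not match ledger"
    return "verified"


image = b"raw image bytes"
ledger = {
    hashlib.sha256(image).hexdigest(): LedgerRecord("alice", "2019-06-12T23:55:38Z")
}
print(check_image(image, "alice", "2019-06-12T23:55:38Z", ledger))         # verified
print(check_image(image, "mallory", "2019-06-12T23:55:38Z", ledger))       # metadata mismatch
print(check_image(image + b"x", "alice", "2019-06-12T23:55:38Z", ledger))  # no ledger entry
```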

Exactly the same kind of thing can be done with videos. It just requires video players computing hashes on the video, and comparing to what’s on a blockchain. And by doing this, one can guarantee, for example, that one’s seeing a whole video that was made at a certain time.

How would this work in practice? Probably people often wouldn’t want to see all the raw video taken at some event. But a news organization, for example, could let people click through to it if they wanted. And one can easily imagine digital signature mechanisms that could be used to guarantee that an edited video, for example, contained no content not in certain source videos, and involved, say, specified contiguous chunks from these source videos.
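
One very simple version of such a check can be sketched in Python, under strong assumptions: each video is treated as a list of frames that can be hashed individually, the edit is a pure cut-and-join with no re-encoding, and signing of the frame hashes is left out entirely. The function names and the `min_run` parameter (the shortest chunk that counts as "contiguous") are made up for the example.

```python
import hashlib
from typing import List


def frame_hashes(frames: List[bytes]) -> List[str]:
    # One hash per frame; a real system would sign or publish these.
    return [hashlib.sha256(frame).hexdigest() for frame in frames]


def is_contiguous_chunk(chunk: List[str], source: List[str]) -> bool:
    """True if chunk appears as an unbroken run somewhere in source."""
    n = len(chunk)
    return any(source[s:s + n] == chunk for s in range(len(source) - n + 1))


def edit_uses_only_source_chunks(
    edited: List[str], sources: List[List[str]], min_run: int = 1
) -> bool:
    """Check that edited splits into runs of at least min_run frames,
    each an unbroken run taken from one of the source videos."""
    # reachable[i] is True if the first i frames can be split that way.
    reachable = [True] + [False] * len(edited)
    for end in range(1, len(edited) + 1):
        for start in range(0, end - min_run + 1):
            if reachable[start] and any(
                is_contiguous_chunk(edited[start:end], src) for src in sources
            ):
                reachable[end] = True
                break
    return reachable[len(edited)]


source = frame_hashes([b"f0", b"f1", b"f2", b"f3", b"f4", b"f5"])
good_edit = source[0:2] + source[4:6]             # two chunks of two frames
spliced = source[0:1] + source[3:4]               # single frames stitched together
forged = source[0:2] + frame_hashes([b"fake"])    # contains a foreign frame

print(edit_uses_only_source_chunks(good_edit, [source], min_run=2))  # True
print(edit_uses_only_source_chunks(spliced, [source], min_run=2))    # False
print(edit_uses_only_source_chunks(forged, [source], min_run=1))     # False
```

Requiring a minimum chunk length matters: with single-frame chunks allowed, someone could rearrange genuine frames into a sequence of events that never happened.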

The Path Forward

So, where does this leave us with deep fakes? Machine learning on its own won’t save us. There’s not going to be a pure “fake or not” detector that can run on any image or video. Yes, there’ll be ways to protect oneself against being “faked” by doing things like wearing a live cellular automaton tie. But the real way to combat deep fakes, I think, is to use blockchain technology, and to store on a public ledger cryptographic hashes of both images and sensor data from the environment where the images were acquired. The presence of a hash on the ledger can establish that an image existed no later than a certain time; “triangulating” from sensor and other data can give confidence that what one is seeing was something that actually happened in the real world.

Of course, there are lots of technical details to work out. But in time I’d expect image and video viewers could routinely check against blockchains (and “data triangulation computations”), a bit like how web browsers now check security certificates. And today’s “pics or it didn’t happen” will turn into “if it’s not on the blockchain it didn’t happen”.

]]> 3