## Yet Another Ruliological Surprise

Integers. Addition. Subtraction. Maybe multiplication. Surely that’s not enough to be able to generate any serious complexity. In the early 1980s I had made the very surprising discovery that very simple programs based on cellular automata could generate great complexity. But how widespread was this phenomenon?

At the beginning of the 1990s I had set about exploring this. Over and over I would consider some type of system and be sure it was too simple to “do anything interesting”. And over and over again I would be wrong. And so it was that on the night of August 13, 1993, I thought I should check what could happen with integer functions defined using just addition and subtraction.

I knew, of course, about defining functions by recursion, like `Fibonacci`:

But could I find something like this that would have complex behavior? I did the analog of what I have done so many times, and just started (symbolically) enumerating possible definitions. And immediately I saw cases with nested functions, like:

(For some reason I wanted to keep the same initial conditions as `Fibonacci`: *f*[1] = *f*[2] = 1.) What would functions like this do? My original notebook records the result in this case:

But a few minutes later I found something very different: a simple nestedly recursive function with what seemed like highly complex behavior:

I remembered seeing a somewhat similarly defined function discussed before. But the behavior I’d seen reported for that function, while intricate, was nested and ultimately highly regular. And, so far as I could tell, much like with rule 30 and all the other systems I’d investigated, nobody had ever seen serious complexity in simple recursive functions.

It was a nice example. But it was one among many. And when I published *A New Kind of Science* in 2002, I devoted just four pages (and 7 notes) to “recursive sequences”—even though the gallery I made of their behavior became a favorite page of mine:

A year after the book was published we held our first Wolfram Summer School, and as an opening event I decided to do a live computer experiment—in which I would try to make a real-time science discovery. The subject I chose was nestedly recursive functions. It took a few hours. But then, yes, we made a discovery! We found that there was a nestedly recursive function simpler than the ones I’d discussed in *A New Kind of Science* that already seemed to have very complex behavior:

Over the couple of decades that followed I returned many times to nestedly recursive functions—particularly in explorations I did with high school and other students, or in suggestions I made for student projects. Then recently I used them several times as “intuition-building examples” in various investigations.

I’d always felt my work with nestedly recursive functions was unfinished. Beginning about five years ago—particularly energized by our Physics Project—I started looking at harvesting seeds I’d sown in *A New Kind of Science* and before. I’ve been on quite a roll, with a few pages or even footnotes repeatedly flowering into rich book-length stories. And finally—particularly after my work last year on “Expression Evaluation and Fundamental Physics”—I decided it was time to try to finish my exploration of nestedly recursive functions.

Our modern Wolfram Language tools—as well as ideas from our Physics Project—provided some new directions to explore. But I still thought I pretty much knew what we’d find. And perhaps after all these years I should have known better. Because somehow in the computational universe—and in the world of ruliology—there are always surprises.

And here, yet again, there was indeed quite a surprise.

## The Basic Idea

Consider the definition (later we’ll call this “P312”)

which we can also write as:

The first few values for *f*[*n*] generated from this definition are:

Continuing further we get:

But how are these values actually computed? To see that we can make an “evaluation graph” in which we show how each value of *f*[*n*] is computed from ones with smaller values of *n*, here starting from *f*[20]:

The gray nodes represent initial conditions: places where *f*[*n*] was sampled for *n* ≤ 0. The two different colors of edges correspond to the two different computations done in evaluating *f*[*n*]: the inner *f*[*n* – 2] and the outer *f*[*n* – *f*[*n* – 2]].
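As a rough illustration of how such values and edges can be generated, here is a minimal Python sketch (Python rather than Wolfram Language, and not the code behind the pictures here). It assumes the initial condition *f*[*n*] = 1 for *n* ≤ 0 used in the text. The names `p312`, `red` and `blue` are hypothetical, and assigning `red` to the inner *f*[*n* – 2] and `blue` to the outer *f* is an assumption inferred from how the two subgraphs are described.

```python
def p312(n_max, init=lambda n: 1):
    """Values f[1..n_max] of f[n] = 3 + f[n - f[n - 2]], plus graph edges.

    `init` supplies f[n] for n <= 0 (assumed to be 1, as in the text).
    """
    f, red, blue = {}, {}, {}

    def get(k):
        # Stored value for k >= 1, initial condition for k <= 0.
        return f[k] if k >= 1 else init(k)

    for n in range(1, n_max + 1):
        inner = n - 2             # argument of the inner f ("red" edge)
        outer = n - get(inner)    # argument of the outer f ("blue" edge)
        red[n], blue[n] = inner, outer
        f[n] = 3 + get(outer)
    return f, red, blue


values, red, blue = p312(20)
print([values[n] for n in range(1, 11)])  # [4, 7, 4, 4, 7, 10, 4, 4, 10, 13]

# Initial conditions (indices <= 0) reached by either kind of edge.
sampled = sorted({k for k in list(red.values()) + list(blue.values()) if k <= 0})
print(sampled)  # [-3, -2, -1, 0]
```

Because *f*[*n* – 2] is at least 1, the outer argument is always smaller than *n*, so filling in values for increasing *n* never needs a value that has not yet been computed.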

Continuing to *f*[30] we get:

But what’s the structure of this graph? If we pull out the “red” graph on its own, we can see that it breaks into two path graphs, that consist of the sequences of the *f*[*n*] for odd and even *n*, respectively:

The “blue” graph, on the other hand, breaks into four components—each always a tree—leading respectively to the four different initial conditions:

And for example we can now plot *f*[*n*], showing which tree each *f*[*n*] ends up being associated with:

We’ll be using this same basic setup throughout, though for different functions. We’ll mostly consider recursive definitions with a single term (i.e. with a single “outermost *f*”, not two, as in Fibonacci recurrences).

The specific families of recursive functions we’ll be focusing on are:

And with this designation, the function we just introduced is P312.
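The assignment of each *f*[*n*] to a "blue graph tree" described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the original code: it assumes *f*[*n*] = 1 for *n* ≤ 0, treats the blue edge as *n* → *n* – *f*[*n* – 2], and uses the hypothetical name `p312_blue_roots`.

```python
def p312_blue_roots(n_max):
    """Map each n in 1..n_max to the root (an index <= 0) of its blue tree."""
    f, root = {}, {}
    for n in range(1, n_max + 1):
        inner = f[n - 2] if n - 2 >= 1 else 1    # f[n] = 1 for n <= 0
        parent = n - inner                        # blue edge n -> n - f[n - 2]
        f[n] = 3 + (f[parent] if parent >= 1 else 1)
        # A non-positive parent is an initial condition, i.e. a tree root.
        root[n] = root[parent] if parent >= 1 else parent
    return root


roots = p312_blue_roots(20)
trees = {}
for n, r in roots.items():
    trees.setdefault(r, []).append(n)   # group the n values by tree
print(trees)
```

Each *n* inherits the root of its blue parent, so one pass over increasing *n* is enough to label every node.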

## A Closer Look at P312 ( f[n_] := 3 + f[n – f[n – 2]] )

Let’s start off by looking in more detail at the function we just introduced. Here’s what it does up to *n* = 500:

It might seem as if it’s going to go on “seemingly randomly” forever. But if we take it further, we get a surprise: it seems to “resolve itself” to something potentially simpler:

What’s going on? Let’s plot this again, but now showing which “blue graph tree” each value is associated with:

And now what we see is that the *f*[–3] and *f*[–2] trees stop contributing to *f*[*n*] when *n* is (respectively) 537 and 296, and these trees are finite (and have sizes 53 and 15):

The overall structures of the “remaining” trees—here shown up to *f*[5000]—eventually start to exhibit some regularity:

We can home in on this regularity by arranging these trees in layers, starting from the root, then plotting the number of nodes in each successive layer:

Looking at these pictures suggests that there should be some kind of more-or-less direct “formula” for *f*[*n*], at least for large *n*. They also suggest that such a formula should have some kind of mod-6 structure. And, yes, there does turn out to be essentially a “formula”. Though the “formula” is quite complicated—and reminiscent of several other “strangely messy” formulas in other ruliological cases—like Turing machine 600720 discussed in *A New Kind of Science* or combinator `s[s[s]][s][s][s][s]`.

Later on, we’ll see the much simpler recursive function P111 (*f*[*n*_] := 1 + *f*[*n* – *f*[*n* – 1]]). The values for this function form a sequence in which successive blocks of length *k* have value *k*:

P312 has the same kind of structure, but much embellished. First, it has 6 separate riffled (“mod”) subsequences. Each subsequence then consists of a sequence of blocks. Given a value *n*, this computes which subsequence this is on, which block for that subsequence it’s in, and where it is within that block:

So, for example, here are results for multiples of 1000:

For *n* = 1000 we’re not yet in the “simple” regime, we can’t describe the sequence in any simple way, and our “indices” calculation is meaningless. For *n* = 2000 it so happens that we are at block 0 for the mod-1 subsequence. And the way things are set up, we just start by giving exactly the form of block 0 for each mod. So for mod 1 the block is:

But now *n* = 2000 has offset 16 within this block, so the final value of *f*[2000] is simply the 16th value from this list, or 100. *f*[2001] is then simply the next element within this block, or 109. And so on, until we reach the end of the block.

But what if we’re not dealing with block 0? For example, according to the table above, *f*[3000] is determined by mod-3 block 1. It turns out there’s a straightforward, if messy, way to compute any block *b* (for mod *m*):

So now we have a way to compute the value, say of *f*[3000], effectively just by “evaluating a formula”:

And what’s notable is that this evaluation doesn’t involve any recursion. In other words, at the cost of “messiness” we’ve—somewhat surprisingly—been able to unravel all the recursion in P312 to arrive at a “direct formula” for the value of *f*[*n*] for any *n*.

So what else can we see about the behavior of *f*[*n*] for P312? One notable feature is its overall growth rate. For large *n*, it turns out that (as can be seen by substituting this form into the recursive definition and taking a limit):
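One heuristic way to carry out that substitution is sketched below. It assumes that *f* varies smoothly, that it grows like *c*√*n* for some constant *c*, and that *f*[*n* – 2] ≈ *f*[*n*] is small compared with *n*, so that the difference across the "lookback" can be replaced by a derivative. The result is a reconstruction under those assumptions, not necessarily the exact form given originally.

```latex
f(n) - f\bigl(n - f(n-2)\bigr) = 3
\quad\Longrightarrow\quad
f'(n)\, f(n) \approx 3 ,
\qquad
f(n) \approx c\sqrt{n}
\;\Longrightarrow\;
\frac{c}{2\sqrt{n}} \cdot c\sqrt{n} = \frac{c^{2}}{2} = 3
\;\Longrightarrow\;
f(n) \approx \sqrt{6\,n}
```

The same argument applied to P111, where the added constant is 1 instead of 3, gives *c*²/2 = 1 and hence growth like √(2*n*).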

One thing this means is that our evaluation graph eventually has a roughly conical form:

This can be compared to the very regular cone generated by P111 (which has asymptotic value √(2*n*)):

If one just looks at the form of the recursive definition for P312 it’s far from obvious “how far back” it will need to probe, or, in other words, what values of *f*[*n*] one will need to specify as initial conditions. As it turns out, though, the only values needed are *f*[–3], *f*[–2], *f*[–1] and *f*[0].

How can one see this? In 3 + *f*[*n* – *f*[*n* – 2]] it’s only the outer *f* that can probe “far back” values. But how far it actually goes back depends on how much larger *f*[*n* – 2] gets compared to *n*. Plotting *f*[*n* – 2] and *n* together we have:

And the point is that only for very few values of *n* does *f*[*n* – 2] reach or exceed *n*, and it’s these values that probe back. Meanwhile, for larger *n*, there can never be additional “lookbacks”, because *f*[*n*] only grows like √*n*.
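To make the lookback argument concrete, the following Python sketch lists the values of *n* at which the outer *f* lands on an index ≤ 0, which happens exactly when *f*[*n* – 2] ≥ *n*. It assumes *f*[*n*] = 1 for *n* ≤ 0, and `p312_lookbacks` is a hypothetical helper name.

```python
def p312_lookbacks(n_max):
    """Return {n: index} for every n whose outer f lands on an index <= 0."""
    f, hits = {}, {}
    for n in range(1, n_max + 1):
        inner = f[n - 2] if n - 2 >= 1 else 1   # f[n] = 1 for n <= 0
        target = n - inner                       # argument of the outer f
        if target <= 0:                          # i.e. f[n - 2] >= n
            hits[n] = target
        f[n] = 3 + (f[target] if target >= 1 else 1)
    return hits


hits = p312_lookbacks(20)
print(hits)  # {1: 0, 3: -1, 4: -3, 7: 0, 8: -2, 12: -1, 16: 0}
```

Running the same function with a larger `n_max` is one way to check how far the lookbacks extend before the √*n* growth makes them impossible.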

So does any P312 recursion always have the same lookback? So far, we’ve considered specifically the initial condition *f*[*n*] = 1 for all *n* ≤ 0. But what if we change the value of *f*[0]? Here are plots of *f*[*n*] for different cases:

And it turns out that with *f*[0] = *z*, the lookback goes to –*z* for *z* ≥ 3, and to *z* – 4 for 1 ≤ *z* ≤ 2.

(If *z* ≤ 0 the function *f*[*n*] is basically not defined, because the recursion is trying to compute *f*[*n*] from *f*[*n*], *f*[*n* + 1], etc., so never “makes progress”.)

The case *f*[0] = 2 (i.e. *z* = 2) is the one that involves the least lookback—and a total of 3 initial values. Here is the evaluation graph in this case:

By comparison, here is the evaluation graph for the case *f*[0] = 5, involving 6 initial values:

If we plot the value of *f*[*n*] as a function of *f*[0] we get the following:

For *n* < 3 *f*[0], *f*[*n*] always has simple behavior, and is essentially periodic in *n* with period 3:

And it turns out that for any specified initial configuration of values, there is always only bounded lookback—with the bound apparently being determined by the largest of the initial values *f*[*n*_{init}].

So what about the behavior of *f*[*n*] for large *n*? Just like in our original *f*[0] = 1 case, we can construct “blue graph trees” rooted at each of the initial conditions. In the case *f*[0] = 1 we found that of the 4 trees only two continue to grow as *n* increases. As we vary *f*[0], the number of “surviving trees” varies quite erratically:

What if instead of just changing *f*[0], and keeping all other *f*[–*k*] = 1, we set *f*[*n*] = *s* for all *n* ≤ 0? The result is somewhat surprising:

For *s* ≥ 2, the behavior turns out to be simple—and similar to the behavior of P111.

So what can P312 be made to do if we change its initial conditions? With *f*[*n*] = 2 for *n* < 0, we see that for small *f*[0] the behavior remains “tame”, but as *f*[0] increases it starts showing its typical complexity:

One question to ask is what set of values *f*[*n*] takes on. Given that the initial values have certain residues mod 3, all subsequent values must have the same residues. But apart from this constraint, it seems that all values for *f*[*n*] are obtained, which is not surprising given that *f*[*n*] grows only like √*n*.

## The “P Family”: f[n_] := a + f[n – b f[n – c]]

P312 is just one example of the “P family” of sequences defined by:

Here is the behavior of some other P*abc* sequences:

And here are their evaluation graphs:

P312 is the first “seriously complex” example.
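A generic member of the P family is easy to tabulate. The Python sketch below assumes the standard initial condition *f*[*n*] = 1 for *n* ≤ 0 and positive integer *a*, *b*, *c*; `p_family` and `p111_closed_form` are hypothetical names. The closed form used for P111 is simply a restatement of "blocks of length *k* have value *k*" and is not claimed to be the formula used elsewhere in this piece.

```python
from math import isqrt


def p_family(a, b, c, n_max):
    """Values f[1..n_max] of f[n] = a + f[n - b*f[n - c]], f[n] = 1 for n <= 0."""
    f = {}

    def get(k):
        return f[k] if k >= 1 else 1

    for n in range(1, n_max + 1):
        # b*get(n - c) >= 1, so the outer argument is always below n.
        f[n] = a + get(n - b * get(n - c))
    return [f[n] for n in range(1, n_max + 1)]


def p111_closed_form(n):
    """Value k such that k(k-1)/2 <= n < k(k+1)/2 (exact integer arithmetic)."""
    return (1 + isqrt(8 * n + 1)) // 2


print(p_family(1, 1, 1, 10))  # [2, 2, 3, 3, 3, 4, 4, 4, 4, 5]
print(p_family(3, 1, 2, 10))  # [4, 7, 4, 4, 7, 10, 4, 4, 10, 13]
```

Comparing `p_family(1, 1, 1, N)` with `p111_closed_form` over a range of `N` gives a quick consistency check on the block structure.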

P111 (as mentioned earlier) has a particularly simple form

which corresponds to the simple formula:

The evaluation graph in this case is just:

Only a single initial condition *f*[0] = 1 is used, and there is only a single “blue graph tree” with a simple form:

Another interesting case is P123:

Picking out only odd values of *n* we get:

This might look just like the behavior of P111. But it’s not. The lengths of the successive “plateaus” are now

with differences:

But this turns out to be exactly a nested sequence generated by joining together the successive steps in the evolution of the substitution system:

P123 immediately “gets into its final behavior”, even for small *n*. But—as we saw rather dramatically with P312—there can be “transient behavior” that doesn’t “resolve” until *n* is large. A smaller case of this phenomenon occurs with P213. Above *n* = 68 it shows a simple “square root” pattern of behavior, basically like P111. But for smaller *n* it’s a bit more complicated:

And in this case the transients aren’t due to “blue graph trees” that stop growing. Instead, there are only two trees (associated with *f*[0] and *f*[–1]), but both of them soon end up growing in very regular ways:

## The “T Family”: f[n_] := a f[n – b f[n – c]]

What happens if our outermost operation is not addition, but multiplication?

Here are some examples of the behavior one gets. In each case we’re plotting on a log scale—and we’re not including T1*xx* cases, which are always trivial:

We see that some sequences have regular and readily predictable behavior, but others do not. And this is reflected in the evaluation graphs for these functions:

The first “complicated case” is T212:

The evaluation graph for *f*[50] in this case has the form:

And something that’s immediately notable is that in addition to “looking back” to the values of *f*[0] and *f*[–1], this also looks back to the value of *f*[–24]. Meanwhile, the evaluation graph for *f*[51] looks back not only to *f*[0] and *f*[–1] but also to *f*[–3] and *f*[–27]:

How far back does it look in general? Here’s a plot showing which lookbacks are made as a function of *n* (with the roots of the “blue graph trees” highlighted):

There’s alternation between behaviors for even and odd *n*. But apart from that, additional lookbacks are just steadily added as *n* increases—and indeed the total number of lookbacks seems to follow a simple pattern:

But—just for once—if one looks in more detail, it’s not so simple. The lengths of the successive “blocks” are:

So, yes, the lookbacks are quite “unpredictable”. But the main point here is that—unlike for the P family—the number of lookbacks isn’t limited. In a sense, to compute T212 for progressively larger *n*, progressively more information about its initial conditions is needed.

When one deals with ordinary, unnested recurrence relations, one’s always dealing with a fixed lookback. And the number of initial conditions then just depends on the lookback. (So, for example, the Fibonacci recurrence has lookback 2, so needs two initial conditions, while the standard factorial recurrence has lookback 1, so needs only one initial condition.)

But for the nested recurrence relation T212 we see that this is no longer true; there can be an unboundedly large lookback.
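The growing lookback can be seen directly by tabulating a T-family function and recording every time the outer *f* reaches an index ≤ 0. This is a minimal Python sketch assuming *f*[*n*] = 1 for *n* ≤ 0; `t_family` is a hypothetical name, and only the outer lookbacks are recorded (the inner *f*[*n* – *c*] also touches initial conditions for the first few *n*).

```python
def t_family(a, b, c, n_max):
    """Return (values, lookbacks) for f[n] = a*f[n - b*f[n - c]].

    Assumes f[n] = 1 for n <= 0. `lookbacks` maps n to the index reached
    by the outer f whenever that index is <= 0.
    """
    f, lookbacks = {}, {}

    def get(k):
        return f[k] if k >= 1 else 1

    for n in range(1, n_max + 1):
        target = n - b * get(n - c)
        if target <= 0:
            lookbacks[n] = target
        f[n] = a * get(target)
    return f, lookbacks


t212, back = t_family(2, 1, 2, 40)
print([t212[n] for n in range(1, 13)])  # [2, 4, 4, 2, 4, 4, 8, 4, 4, 8, 16, 4]
print(back)  # {1: 0, 4: 0, 13: -3, 16: 0, 37: -27, 40: -24}
```

The entries at *n* = 13, 37 and 40 are the ones that show up as *f*[–3], *f*[–27] and *f*[–24] in the evaluation graphs for *f*[50] and *f*[51] mentioned above.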

OK, but let’s look back at the actual T212 sequence. Here it is up to larger values of *n*:

Or, plotting each point as a dot:

Given the recursive definition of *f*[*n*], the values of *f*[*n*] must always be powers of 2. This shows where each successive power of 2 is first reached as a function of *n*:

Meanwhile, this shows the accumulated average of *f*[*n*] as a function of *n*:

This is well fit by 0.38 Log[*n*], implying that, at least with this averaging, *f*[*n*] asymptotically approximates *n*^{0.26}. And, yes, it is somewhat surprising that what seems like a very “exponential” recursive definition should lead to an *f*[*n*] that increases only like a power. But, needless to say, this is the kind of surprise one has to expect in the computational universe.
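The step from the fit to the exponent can be spelled out as follows, assuming that the fitted quantity is the average base-2 exponent of *f*[*n*] and that Log denotes the natural logarithm (both are assumptions about how the fit was done):

```latex
\bigl\langle \log_2 f[n] \bigr\rangle \approx 0.38 \,\ln n
\quad\Longrightarrow\quad
f[n] \sim 2^{\,0.38 \ln n}
      = e^{\,0.38 \,\ln 2 \,\ln n}
      = n^{\,0.38 \ln 2}
      \approx n^{0.26}
```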

It’s worth noticing that *f*[*n*] fluctuates very intensely as a function of *n*. The overall distribution of values is very close to exponentially distributed—for example with the distribution of logarithmic values of *f*[*n*] for *n* between 9 million and 10 million being:

What else can we say about this sequence? Let’s say we reduce mod 2 the powers of 2 for each *f*[*n*]. Then we get a sequence which starts:

This is definitely not “uniformly random”. But if one looks at blocks of sequential values, one can plot at what *n* each of the 2^{b} possible configurations of a length-*b* block first appears:

And eventually it seems as if all length-*b* blocks for any given *b* will appear.

By the way, whereas in the P family, there were always a limited number of “blue graph trees” (associated with the limited number of initial conditions), for T212 the number of such trees increases with *n*, as more initial conditions are used. So, for example, here are the trees for *f*[50] and *f*[51]:

We’ve so far discussed T212 only with the initial condition *f*[*n*] = 1 for *n* ≤ 0. The fact that *f*[*n*] is always a power of 2 relies on every initial value also being a power of 2. But here’s what happens, for example, if *f*[*n*] = 2^{s} for *n* ≤ 0:

In general, one can think of T212 as transforming an ultimately infinite sequence of initial conditions into an infinite sequence of function values, with different forms of initial conditions potentially giving very different sequences of function values:

(Note that not all choices of initial conditions are possible; some lead to “*f*[*n*] = *f*[*n*]” or “*f*[*n*] = *f*[*n* + 1]”.)

## The “Summer School” Sequence T311 (f[n_] := 3 f[n – f[n – 1]])

Having explored T212, let’s now look at T311—the original one-term nestedly recursive function discovered at the 2003 Wolfram Summer School:

Here’s its basic behavior:

And here is its evaluation graph—which immediately reveals a lot more lookback than T212:

Plotting lookbacks as a function of *n* we get:

Much as with T212, the total number of lookbacks varies with *n* in a fairly simple way (~ 0.44 *n*):

Continuing the T311 sequence further, it looks qualitatively very much like T212:

And indeed T311—despite its larger number of lookbacks—seems to basically behave like T212. In a story typical of the Principle of Computational Equivalence, T212 seems to have already “filled out the computational possibilities”, so T311 “doesn’t have anything to add”.

## The “S Family”: f[n_] := n – f[f[n – a] – b]

As another (somewhat historically motivated) example of nestedly recursive functions, consider what we’ll call the “S family”, defined by:

Let’s start with the very minimal case S10 (or “S1”):

Our standard initial condition *f*[*n*] = 1 for *n* ≤ 0 doesn’t work here, because it implies that *f*[1] = 1 – *f*[1]. But if we take *f*[*n*] = 1 for *n* ≤ 1 we get:

Meanwhile, with *f*[*n*] = 1 for *n* ≤ 3 we get:

The first obvious feature of both these results is their overall slope: 1/ϕ ≈ 0.618, where ϕ is the golden ratio. It’s not too hard to see why one gets this slope. Assume that for large *n* we can take *f*[*n*] = σ *n*. Then substitute this form into both sides of the recursive definition for the S family to get σ *n* = *n* – σ (σ (*n* – *a*) – *b*). For large *n* all that survives is the condition for the coefficients of *n*

σ = 1 – σ^{2}

which has solution σ = 1/ϕ.

Plotting *f*[*n*] – *n*/ϕ for the case *f*[*n*] = 1 for *n* ≤ 1 we get:

The evaluation graph in this case has a fairly simple form

as we can see even more clearly with a different graph layout:

It’s notable that only the initial condition *f*[1] = 1 is used—leading to a single “blue graph tree” that turns out to have a very simple “Fibonacci tree” form (which, as we’ll discuss below, has been known since the 1970s):

From this it follows that *f*[*n*] is related to the “Fibonacci-like” substitution system

and in fact the sequence of values of *f*[*n*] can be computed just as:

And indeed it turns out that in this case *f*[*n*] is given exactly by:
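As a hedged check on this, the Python sketch below tabulates S1 and compares it with ⌊(*n* + 1)/ϕ⌋, the closed form usually quoted for Hofstadter's G sequence, with which S1 coincides for *n* ≥ 1 when *f*[*n*] = 1 for *n* ≤ 1. That this is the same expression as the one intended here is an assumption. The names `s1` and `golden_floor` are hypothetical, and the `cutoff` parameter is only exercised for the two cases discussed in the text.

```python
from math import isqrt


def s1(n_max, cutoff=1):
    """Values f[1..n_max] of f[n] = n - f[f[n - 1]], with f[n] = 1 for n <= cutoff."""
    f = {}

    def get(k):
        return f[k] if k > cutoff else 1

    for n in range(cutoff + 1, n_max + 1):
        f[n] = n - get(get(n - 1))
    return [get(n) for n in range(1, n_max + 1)]


def golden_floor(n):
    """floor((n + 1)/phi), computed exactly with integer arithmetic."""
    return (isqrt(5 * (n + 1) ** 2) - (n + 1)) // 2


seq = s1(1000)
print(seq[:10])                                            # [1, 1, 2, 3, 3, 4, 4, 5, 6, 6]
print(seq == [golden_floor(n) for n in range(1, 1001)])    # True
print(seq[-1] / 1000)                                      # 0.618
```

Using `isqrt` rather than floating-point division avoids rounding problems near the points where (*n* + 1)/ϕ is close to an integer.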

What about when *f*[*n*] = 1 not just for *n* ≤ 1 but beyond? For *n* ≤ 2 the results are essentially the same as for *n* ≤ 1. But for *n* ≤ 3 there’s a surprise: the behavior is considerably more complicated—as we can see if we plot *f*[*n*] – *n*/ϕ:

Looking at the evaluation graph in this case we see that the only initial conditions sampled are *f*[1] = 1 and *f*[3] = 1 (with *f*[2] only being reached if one specifically starts with *f*[2]):

And continuing the evaluation graph we see a mixture of irregularity and comparative regularity:

The plot of *f*[*n*] has a strange “hand-drawn” appearance, with overall regularity but detailed apparent randomness. The most obvious large-scale feature is “bursting” behavior (interspersed in an audio rendering with an annoying hum). The bursts all seem to have approximately (though not exactly) the same structure—and get systematically larger. The lengths of successive “regions of calm” between bursts (characterized by runs with Abs[*f*[*n*] – *n*/ϕ] < 3) seem to consistently increase by a factor ϕ:

What happens to S1 with other initial conditions? Here are a few examples:

So how does S*a* depend on *a*? Sometimes there’s at least a certain amount of clear regularity; sometimes it’s more complicated:

As is very common, adding the parameter *b* in the definition doesn’t seem to lead to fundamentally new behavior, though for *b* > 0 the initial condition *f*[*n*] = 1, *n* ≤ 0 can be used:

In all cases, only a limited number of initial conditions are sampled (bounded by the value of *a* + *b* in the original definition). But as we can see, the behavior can either be quite simple, or can be highly complex.

## More Complicated Rules

Highly complex behavior arises even from very simple rules. It’s a phenomenon one sees all over the computational universe. And we’re seeing it here in nestedly recursive functions. But if we make the rules (i.e. definitions) for our functions more complicated, will we see fundamentally different behavior, or just more of the same?

The Principle of Computational Equivalence (as well as many empirical observations of other systems) suggests that it’ll be “more of the same”: that once one’s passed a fairly low threshold the computational sophistication—and complexity—of behavior will no longer change.

And indeed this is what one sees in nestedly recursive functions. But below the threshold different kinds of things can happen with different kinds of rules.

There are several directions in which we can make rules more complicated. One that we won’t discuss here is to use operations (conditional, bitwise, etc.) that go beyond arithmetic. Others tend to involve adding more instances of *f* in our definitions.

An obvious way to do this is to take *f*[*n*_] to be given by a sum of terms, “Fibonacci style”. There are various specific forms one can consider. As a first example—that we can call *ab—*let’s look at:

The value of *a* doesn’t seem to matter much. But changing *b* we see:

12 has unbounded lookback (at least starting with *f*[*n*] = 1 for *n* ≤ 0), but for larger *b*, 1*b* has bounded lookback. In both 13 and 15 there is continuing large-scale structure (here visible in log plots)

though this does not seem to be reflected in the corresponding evaluation graphs:

As another level of Fibonacci-style definition, we can consider *ab*:

But the typical behavior here does not seem much different from what we already saw with one-term definitions involving only two *f*’s:

(Note that *aa* is equivalent to *a*. Cases like 13 lead after a transient to pure exponential growth.)

A somewhat more unusual case is what we can call *abc*:

Subtracting overall linear trends we get:

For 111 using initial conditions *f*[1] = *f*[2] = 1 and plotting *f*[*n*] – *n*/2 we get

which has a nested structure that is closely related to the result of concatenating binary digit sequences of successive integers:

But despite the regularity in the sequence of values, the evaluation graph for this function is not particularly simple:

So how else might we come up with more complicated rules? One possibility is that instead of “adding *f*’s by adding terms” we can add *f*’s by additional nesting. So, for example, we can consider what we can call S_{3}1 (here shown with initial condition *f*[*n*] = 1 for *n* ≤ 3):

We can estimate the overall slope here by solving for *x* in *x* = 1 – *x*^{3} to get ≈ 0.682. Subtracting this off we get:

We can also consider deeper nestings. At depth *d* the slope is the solution to *x* = 1 – *x*^{d}. Somewhat remarkably, in all cases the only initial conditions probed are *f*[1] = 1 and *f*[3] = 1:
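The slope equation *x* = 1 – *x*^{d} quoted above has a single root between 0 and 1, so it can be solved numerically by bisection. The following Python sketch does that; `nested_slope` is a hypothetical name and the tolerance is an arbitrary choice.

```python
def nested_slope(d, tol=1e-12):
    """Solve x = 1 - x**d for x in [0, 1] by bisection (d >= 1 nestings of f)."""
    lo, hi = 0.0, 1.0
    # g(x) = x + x**d - 1 is increasing on [0, 1], with g(0) = -1 and g(1) = 1.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid + mid**d - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for d in (2, 3, 4, 5):
    print(d, round(nested_slope(d), 4))
```

For *d* = 2 this reproduces 1/ϕ ≈ 0.618, and for *d* = 3 it gives the value ≈ 0.682 used above.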

As another example of “higher nesting” we can consider the class of functions (that we call *a*):

Subtracting a constant 1/ϕ slope we get:

The evaluation graph for 1 is complicated, but has some definite structure:

What happens if we nest even more deeply, say defining *a* functions:

With depth-*d* nesting, we can estimate the overall slope of *f*[*n*] by solving for *x* in

or

so that for the *d *= 3 case here the overall slope is the real root of or about 0.544. Subtracting out this overall slope we get:

And, yes, the sine-curve-like form of 5 is very odd. Continuing 10x longer, though, things are “squaring off”:

What happens if we continue nesting deeper? stays fairly tame:

However, already allows for more complicated behavior:

And for different values of *a* there are different regularities:

There are all sorts of other extensions and generalizations one might consider. Some involve alternate functional forms; others involve introducing additional functions, or allowing multiple arguments to our function *f*.

## An Aside: The Continuous Case

In talking about recursive functions *f*[*n*] we’ve been assuming—as one normally does—that *n* is always an integer. But can we generalize what we’re doing to functions *f*[*x*] where *x* is a continuous real number?

Consider for example a continuous analog of the Fibonacci recurrence:

This produces a staircase-like function whose steps correspond to the usual Fibonacci numbers:

Adjusting the initial condition produces a slightly different result:

We can think of these as being solutions to a kind of “Fibonacci delay equation”—where we’ve given initial conditions not at discrete points, but instead on an interval.

So what happens with nestedly recursive functions? We can define an analog of S1 as:

Plotting this along with the discrete result we get:

In more detail, we get

where now the plateaus occur at the (“Wythoff numbers”) .

Changing the initial condition to be *x* ≤ 3 we get:

Removing the overall slope by subtracting *x*/ϕ gives:

One feature of the continuous case is that one can continuously change initial conditions—though the behavior one gets typically breaks into “domains” with discontinuous boundaries, as in this case where we’re plotting the value of *f*[*x*] as a function of *x* and the “cutoff” in the initial conditions *f*[*x*], *x* ≤ :

So what about other rules? A rule like P312 (*f*[*n*_] := 3 + *f*[*n* – *f*[*n* – 2]]) given “constant” initial conditions effectively just copies and translates the initial interval, and gives a simple order-0 interpolation of the discrete case. With initial condition *f*[*x*] = *x* some segments get “tipped”:

All the cases we’ve considered here don’t “look back” to negative values, in either the discrete or continuous case. But what about a rule like T212 (*f*[*n*_] := 2 *f*[*n* – 1 *f*[*n* – 2]]) that progressively “looks back further”? With the initial condition *f*[*x*] = 1 for *x* ≤ 0, one gets the same result as in the discrete case:

But if one uses the initial condition *f*[*x*] = Abs[*x* – 1] for *x* ≤ 0 (the Abs[*x* – 1] is needed to avoid ending up with *f*[*x*] depending on *f*[*y*] for *y* > *x*) one instead has

yielding the rather different result:

Continuing for larger *x* (on a log scale) we get:

Successively zooming in on one of the first “regions of noise” we see that it ultimately consists just of a large number of straight segments:

What’s going on here? If we count the number of initial conditions that are used for different values of *x* we see that this has discontinuous changes, leading to disjoint segments in *f*[*x*]:

Plotting over a larger range of *x* values the number of initial conditions used is:

And plotting the actual values of those initial conditions we get:

If we go to later, “more intense” regions of noise, we see more fragmentation, and presumably in the limit *x* → ∞ we get the analog of an essential singularity in *f*[*x*]:

For the S family, with its overall *n*/ϕ trend, even constant initial conditions—say for S1—already lead to tipping, here shown compared to the discrete case:

## How Do You Actually Compute Recursive Functions?

Let’s say we have a recursive definition—like the standard Fibonacci one:

How do we actually use this to compute the value of, say, *f*[7]? Well, we can start from *f*[7], then use the definition to write this as *f*[6] + *f*[5], then write *f*[6] as *f*[5] + *f*[4], and so on. And we can represent this using an evaluation graph, in the form:

But this computation is in a sense very wasteful; for example, it’s independently computing *f*[3] five separate times (and of course getting the same answer each time). But what if we just stored each *f*[*n*] as soon as we compute it, and then just retrieved that stored (“cached”) value whenever we needed it again?

In the Wolfram Language, it’s a very simple change to our original definition:

And now our evaluation graph becomes much simpler:

And indeed it’s this kind of minimal evaluation graph that we’ve been using in everything we’ve discussed so far.
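The effect of caching on the amount of work can be illustrated with a small Python sketch (an analogy, not the Wolfram Language definition referred to above). It assumes the initial conditions *f*[1] = *f*[2] = 1 and counts how many times the function body actually runs; `fib_plain`, `fib_cached` and `calls` are hypothetical names.

```python
from functools import lru_cache

calls = {"plain": 0, "cached": 0}


def fib_plain(n):
    calls["plain"] += 1          # every call does real work
    return 1 if n <= 2 else fib_plain(n - 1) + fib_plain(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    calls["cached"] += 1         # runs only on a cache miss
    return 1 if n <= 2 else fib_cached(n - 1) + fib_cached(n - 2)


print(fib_plain(7), calls["plain"])     # 13 25
print(fib_cached(7), calls["cached"])   # 13 7
```

The 25 evaluations correspond to the nodes of the unrolled tree, while the 7 cached evaluations correspond to the distinct values *f*[1] through *f*[7] in the minimal graph.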

What’s the relationship between the “tree” evaluation graph, and this minimal one? The tree graph is basically an “unrolled” version of the minimal graph, in which all the possible paths that can be taken from the root node to the initial condition nodes have been treed out.

In general, the number of edges that come out of a single node in an evaluation graph will be equal to the number of instances of the function *f* that appear on the right-hand side of the recursive definition we’re using (i.e. 2 in the case of the standard Fibonacci definition). So this means that if the maximum length of path from the root to the initial conditions is *s*, the maximum number of nodes that can appear in the “unrolled” graph is 2^{s}. And whenever there is a fixed set of initial conditions (i.e. if there’s always the same lookback), the maximum path length is essentially *n*, implying in the end that the maximum possible number of nodes in the unrolled graph will be 2^{n}.

(In the actual case of the Fibonacci recurrence, the number of nodes in the unrolled graph grows like ϕ^{n}, or about 1.6^{n}.)

But if we actually evaluate *f*[7]—say in the Wolfram Language—what is the sequence of *f*[*n*]’s that we’ll end up computing? Or, in effect, how will the evaluation graph be traversed? Here are the results for the unrolled and minimal evaluation graphs—i.e. without and with caching:

Particularly in the first case this isn’t the only conceivable result we could have gotten. It’s the way it is here because of the particular “leftmost innermost” evaluation order that the Wolfram Language uses by default. In effect, we’re traversing the graph in a depth-first way. In principle we could use other traversal orders, leading to *f*[*n*]’s being evaluated in different orders. But unless we allow other operations (like *f*[3] + *f*[3] → 2 *f*[3]) to be interspersed with *f* evaluations, we’ll still always end up with the same number of *f* evaluations for a given evaluation graph.
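A depth-first traversal of this kind can be mimicked in Python. The sketch below records each *f*[*n*] at the moment its evaluation begins, evaluating the left operand *f*[*n* – 1] before *f*[*n* – 2]. It is only an analogy: the exact order reported by a Wolfram Language trace may be presented differently, and `trace_fib` is a hypothetical name.

```python
def trace_fib(n, cache=None, order=None):
    """Evaluate Fibonacci depth-first, recording each f[n] as it is entered.

    Pass cache={} to reuse stored values; leave it as None for no caching.
    """
    order = [] if order is None else order
    if cache is not None and n in cache:
        return cache[n], order                      # stored value: no new evaluation
    order.append(n)
    if n <= 2:
        value = 1
    else:
        left, _ = trace_fib(n - 1, cache, order)    # left operand first
        right, _ = trace_fib(n - 2, cache, order)
        value = left + right
    if cache is not None:
        cache[n] = value
    return value, order


print(trace_fib(5)[1])            # [5, 4, 3, 2, 1, 2, 3, 2, 1]
print(trace_fib(5, cache={})[1])  # [5, 4, 3, 2, 1]
```

Swapping the two recursive calls changes the order of the recorded sequence but not its length, in line with the remark that the number of *f* evaluations is fixed by the evaluation graph.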

But which is the “correct” evaluation graph? The unrolled one? Or the minimal one? Well, it depends on the computational primitives we’re prepared to use. With a pure stack machine, the unrolled graph is the only one possible. But if we allow (random-access) memory, then the minimal graph becomes possible.
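As a rough illustration of the difference between the two traversals, here is a minimal sketch that records the order in which calls are entered, without and with caching. The helper names `unrolledTrace` and `minimalTrace` are made up for this example, and the initial conditions *f*[1] = *f*[2] = 1 are assumed.

```wolfram
(* Hypothetical helper names; assumes f[1] = f[2] = 1. *)

(* No caching: every call is entered afresh, as in the unrolled graph. *)
unrolledTrace[n_Integer?Positive] := Module[{g},
  g[k_] := (Sow[k]; If[k <= 2, 1, g[k - 1] + g[k - 2]]);
  Reap[g[n]][[2, 1]]]

(* Caching: g[k] = ... stores each value the first time it is computed, *)
(* so each argument is entered at most once, as in the minimal graph.   *)
minimalTrace[n_Integer?Positive] := Module[{g},
  g[k_] := g[k] = (Sow[k]; If[k <= 2, 1, g[k - 1] + g[k - 2]]);
  Reap[g[n]][[2, 1]]]

unrolledTrace[5]   (* {5, 4, 3, 2, 1, 2, 3, 2, 1} *)
minimalTrace[5]    (* {5, 4, 3, 2, 1} *)

(* Number of f evaluations for f[7] in each case *)
Length /@ {unrolledTrace[7], minimalTrace[7]}   (* {25, 7} *)
```

The sketch records calls on entry, so the lists show the depth-first order in which arguments are reached; recording on exit instead would list the same arguments in a different order, but with the same counts.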

OK, so what happens with nestedly recursive functions? Here, for example, are unrolled and minimal graphs for T212:

Here are the sequences of *f*[*n*]’s that are computed:

And here’s a comparison of the number of nodes (i.e. *f* evaluations) from unrolled and minimal evaluation graphs (roughly 1.2^{n} and 0.5 *n*, respectively):
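A comparison like this can be reproduced with a simple call counter. The sketch below assumes that T212 denotes *f*[*n*] = 2 *f*[*n* – *f*[*n* – 2]] with *f*[*n*] = 1 for *n* ≤ 0, and it counts every call, including those that land on *n* ≤ 0; both the helper names and that counting convention are assumptions made for illustration, so the exact numbers may differ from a count that omits initial-condition nodes.

```wolfram
(* Assumed definition of T212: f[n] = 2 f[n - f[n - 2]], f[n] = 1 for n <= 0. *)
(* countUnrolled and countMinimal are hypothetical helper names.             *)

(* Count calls with no caching (unrolled graph). *)
countUnrolled[n_Integer] := Module[{c = 0, g},
  g[k_] := (c++; If[k <= 0, 1, 2 g[k - g[k - 2]]]);
  g[n]; c]

(* Count calls with caching (minimal graph): each argument is computed once. *)
countMinimal[n_Integer] := Module[{c = 0, g},
  g[k_] := g[k] = (c++; If[k <= 0, 1, 2 g[k - g[k - 2]]]);
  g[n]; c]

Table[{n, countUnrolled[n], countMinimal[n]}, {n, 1, 6}]
(* {{1, 3, 3}, {2, 5, 5}, {3, 7, 4}, {4, 7, 5}, {5, 11, 5}, {6, 15, 6}} *)
```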

Different recursive functions lead to different patterns of behavior. The differences are less obvious in evaluation graphs, but can be quite obvious in the actual sequence of *f*[*n*]’s that are evaluated:

But although looking at evaluation sequences from unrolled evaluation graphs can be helpful as a way of classifying behavior, the exponentially larger number of steps involved in the unrolled graph typically makes this impractical.

## Primitive Recursive or Not?

Recursive functions have a fairly long history, which we’ll be discussing below. And for nearly a hundred years there’s been a distinction made between “primitive recursive functions” and “general recursive functions”. Primitive recursive functions are basically ones where there’s a “known-in-advance” pattern of computation that has to be done; general recursive functions are ones that may in effect make one have to “search arbitrarily far” to get what one needs.

In Wolfram Language terms, primitive recursive functions are roughly ones that can be constructed directly using functions like `Nest` and `Fold` (perhaps nested); general recursive functions can also involve functions like `NestWhile` and `FoldWhile`.

So, for example, with the Fibonacci definition

the function *f*[*n*] is primitive recursive and can be written, say, as:
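One possible way to do this (a sketch, not necessarily the form originally intended here) is to carry a pair of consecutive values through `Nest`, so that the number of steps is known in advance. The name `fibNest` is made up for this example.

```wolfram
(* Hypothetical name fibNest; assumes f[1] = f[2] = 1. *)
(* The state {a, b} holds two consecutive values; each step maps it to {b, a + b}. *)
fibNest[n_Integer?Positive] :=
  First[Nest[{Last[#], Total[#]} &, {1, 1}, n - 1]]

fibNest /@ Range[10]   (* {1, 1, 2, 3, 5, 8, 13, 21, 34, 55} *)
```

The point of writing it this way is that the iteration count `n - 1` is fixed before any computation starts, which is the “known-in-advance” pattern that characterizes primitive recursion.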

Lots of the functions one encounters in practice are similarly primitive recursive—including most “typical mathematical functions” (`Plus`, `Power`, `GCD`, `Prime`, …). And for example functions that give the results of *n* steps in the evolution of a Turing machine, cellular automaton, etc. are also primitive recursive. But functions that for example test whether a Turing machine will ever halt (or give the state that it achieves if and when it does halt) are not in general primitive recursive.

On the face of it, our nestedly recursive functions seem like they must be primitive recursive, since they don’t for example appear to be “searching for anything”. But things like the presence of longer and longer lookbacks raise questions. And then there’s the potential confusion of the very first example (dating from the late 1920s) of a recursively defined function known not to be primitive recursive: the Ackermann function.

The Ackermann function has three (or sometimes two) arguments—and, notably, its definition (here given in its classic form) includes nested recursion:

This is what the evaluation graphs look like for some small cases:

Looking at these graphs we can begin to see a pattern. And in fact there’s a simple interpretation: *f*[*m*, *x*, *y*] for successive *m* is doing progressively more nested iterations of integer successor operations. *f*[0, *x*, *y*] computes *x* + *y*; *f*[1, *x*, *y*] does “repeated addition”, i.e. computes *x* × *y*; *f*[2, *x*, *y*] does “repeated multiplication”, i.e. computes *x*^{*y*}; *f*[3, *x*, *y*] does “tetration”, i.e. computes the “power tower” `Nest[x^# &, 1, y]`; etc.
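This interpretation can be checked with a small sketch. The definition below is an assumed variant of the three-argument function: the name `ack` and the base cases for *y* = 0 are choices made for this illustration, picked so that the results agree with the closed forms just listed, and they are not necessarily identical to the classic form.

```wolfram
(* Assumed variant of the three-argument Ackermann function (hypothetical name ack). *)
ack[0, x_, y_] := x + y
ack[1, x_, 0] := 0                      (* adding x zero times gives 0 *)
ack[m_ /; m >= 2, x_, 0] := 1           (* empty product or empty power tower *)
ack[m_ /; m >= 1, x_, y_ /; y >= 1] := ack[m - 1, x, ack[m, x, y - 1]]

ack[1, 3, 4]   (* 12 = 3 * 4 *)
ack[2, 3, 4]   (* 81 = 3^4 *)
ack[3, 2, 3]   (* 16 = Nest[2^# &, 1, 3] *)
```

Only small arguments are practical here, since both the values and the depth of recursion grow very quickly with *m*.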

Or, alternatively, these can be given explicitly in successively more nested form:

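And at least in this form *f*[*m*, *x*, *y*] involves *m* nestings. But a given primitive recursive function can involve only a fixed number of nestings. It might be conceivable that we could rewrite *f*[*m*, *x*, *y*] for any particular *m* so that it needs only a fixed number of nestings. But for *f*[*m*, *m*, *m*], where the number of nestings grows with the argument, no fixed number suffices, and so the function is not primitive recursive.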

But it turns out that the fact that this can happen depends critically on the Ackermann function having more than one argument—so that one can construct the “diagonal” *f*[*m*, *m*, *m*].

So what about our nestedly recursive functions? Well, at least in the form that we’ve used them, they can all be written in terms of `Fold`. The key idea is to accumulate a list of values so far (conveniently represented as an association)—sampling whichever parts are needed—and then at the end take the last element. So for example the “Summer School function” T311

can be written:

An important feature here is that we’re getting `Lookup` to give 1 if the value it’s trying to look up hasn’t been filled in yet, implementing the fact that *f*[*n*] = 1 for *n* ≤ 0.

So, yes, our recursive definition might look back further and further. But it always just finds value 1—which is easy for us to represent without, for example, any extra nesting, etc.
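Here is a minimal sketch of that construction, assuming that T311 denotes *f*[*n*] = 3 *f*[*n* – *f*[*n* – 1]] with *f*[*n*] = 1 for *n* ≤ 0. The names `t311Values` and `t311Fold` are made up for this example, and the code is one way to do it rather than necessarily the form intended above.

```wolfram
(* Assumed definition of T311: f[n] = 3 f[n - f[n - 1]], f[n] = 1 for n <= 0. *)
(* The association a maps each k already processed to f[k];                   *)
(* Lookup[..., 1] supplies the value 1 for any key not yet filled in.         *)
t311Values[n_Integer?Positive] := Values[Fold[
   Function[{a, k},
    Append[a, k -> 3 Lookup[a, k - Lookup[a, k - 1, 1], 1]]],
   <||>, Range[n]]]

(* f[n] itself is just the last accumulated value. *)
t311Fold[n_Integer?Positive] := Last[t311Values[n]]

t311Values[13]   (* {3, 3, 3, 9, 3, 9, 3, 9, 3, 9, 9, 9, 27} *)
t311Fold[13]     (* 27 *)
```

Since *f*[*k* – 1] is always at least 1, the inner lookup only ever asks for keys below *k*, which have either been filled in already or are non-positive and so default to 1.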

The ultimate (historical) definition of primitive recursion, though, doesn’t involve subsets of the Wolfram Language (the definition was given almost exactly 100 years too early!). Instead, it involves a specific set of simple primitives:

(An alternative, equivalent definition for recursion, explicitly involving `Fold`, is `r[g_, h_] := Fold[Function[{u, v}, h[u, v, ##2]], g[##2], Range[0, #1 - 1]] &`.)

So can our nestedly recursive functions be written purely in terms of these primitives? The answer is yes, though it’s seriously complicated. A simple function like `Plus` can for example be written as `r[p[1], s]`, so that e.g. `r[p[1], s][2, 3]` → `5`. `Times` can be written as `r[z, c[Plus, p[1], p[3]]]` or `r[z, c[r[p[1], s], p[1], p[3]]]`, while `Factorial` can be written as `r[c[s, z], c[Times, p[1], c[s, p[2]]]]`. But even `Fibonacci`, for example, seems to require a very much longer specification.
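To make these expressions concrete, here is a sketch of how the primitives might be set up so that the examples above evaluate. The definitions of `z`, `s`, `p` and `c` are assumptions (zero, successor, projection and composition, as the names suggest), and `r` is written with explicit arguments rather than slots, which should be equivalent to the `Fold`-based form but is easier to read.

```wolfram
(* Assumed definitions of the primitives; only the names come from the text. *)
z = 0 &;                                   (* zero, ignoring any arguments *)
s = # + 1 &;                               (* successor of the first argument *)
p[i_] := {##}[[i]] &                       (* projection: pick the i-th argument *)
c[f_, g___] := f @@ Through[{g}[##]] &     (* composition: f[g1[args], g2[args], ...] *)

(* Primitive recursion: start from g[rest], then apply h[value, index, rest] *)
(* for index = 0, 1, ..., n - 1.                                             *)
r[g_, h_][n_, rest___] := Fold[h[#1, #2, rest] &, g[rest], Range[0, n - 1]]

r[p[1], s][2, 3]                              (* 5 *)
r[z, c[Plus, p[1], p[3]]][4, 5]               (* 20 *)
r[z, c[r[p[1], s], p[1], p[3]]][4, 5]         (* 20 *)
r[c[s, z], c[Times, p[1], c[s, p[2]]]][5]     (* 120 *)
```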

In writing “primitive-recursive-style” definitions in Wolfram Language we accumulated values in lists and associations. But in the ultimate definition of primitive recursion, there are no such constructs; the only form of “data” is positive integers. But for our definitions of nestedly recursive functions we can use a “tupling function” that “packages up” any list of integer values into a single integer (and an untupling function that unpacks it). And we can do this say based on a pairing (2-element-tupling) function like:
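As one concrete possibility, the standard Cantor pairing function would do the job. This is an assumed choice for illustration and may not be the pairing function intended here; the names `pair`, `unpair`, `tuple` and `untuple` are likewise made up.

```wolfram
(* Cantor pairing (an assumed choice): a bijection between pairs of *)
(* non-negative integers and single non-negative integers.          *)
pair[x_Integer, y_Integer] := (x + y) (x + y + 1)/2 + y

(* Inverse: recover w = x + y from the largest triangular number <= n. *)
unpair[n_Integer] := Module[{w, t},
  w = Quotient[Floor[Sqrt[8 n + 1]] - 1, 2];
  t = w (w + 1)/2;
  {w - (n - t), n - t}]

(* Package a whole list by pairing repeatedly; unpacking needs the length. *)
tuple[list_List] := Fold[pair, list]
untuple[n_Integer, 1] := {n}
untuple[n_Integer, len_Integer /; len > 1] :=
  With[{xy = unpair[n]}, Append[untuple[First[xy], len - 1], Last[xy]]]

pair[3, 5]            (* 41 *)
unpair[41]            (* {3, 5} *)
tuple[{3, 5, 2}]      (* 948 *)
untuple[948, 3]       (* {3, 5, 2} *)
```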

But what about the actual `If[n ≤ 0, 1, ...]` lookback test itself? Well, `If` can be written in primitive recursive form too: for example, `r[c[s, z], c[f, c[s, p[2]]]][n]` is equivalent to `If[n ≤ 0, 1, f[n]]`.

So our nestedly recursive functions as we’re using them are indeed primitive recursive. Or, more strictly, finding values *f*[*n*] is primitive recursive. Asking questions like “For what *n* does *f*[*n*] reach 1000?” might not be primitive recursive. (The obvious way of answering them involves a `FoldWhile`-style non-primitive-recursive search, but proving that there’s no primitive recursive way to answer the question is likely very much harder.)
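The “obvious way” can be sketched with `NestWhile`, which here plays the role of the open-ended search. The sketch assumes that T311 denotes *f*[*n*] = 3 *f*[*n* – *f*[*n* – 1]] with *f*[*n*] = 1 for *n* ≤ 0; the names `t311` and `firstReach` are made up, and a small target is used so that the search is known to finish.

```wolfram
(* Assumed definition of T311, with memoization. *)
t311[n_Integer] := t311[n] = If[n <= 0, 1, 3 t311[n - t311[n - 1]]]

(* Step n upward until f[n] reaches the target. Nothing bounds the number *)
(* of steps in advance: if the target were never reached, this would      *)
(* simply not terminate.                                                  *)
firstReach[target_Integer] := NestWhile[# + 1 &, 1, t311[#] < target &]

firstReach[27]   (* 13 *)
```

Each individual call to `t311` is a bounded computation, but the outer loop is not, and that difference is the one between primitive and general recursion.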

By the way, it’s worth commenting that while for primitive recursive functions it’s always possible to compute a value *f*[*n*] for any *n*, that’s not necessarily true for general recursive functions. For example, if we ask “For what *n* does *f*[*n*] reach 1000?” there might simply be no answer to this; *f*[*n*] might never reach 1000. And when we look at the computations going on underneath, the key distinction is that in evaluating primitive recursive functions, the computations always halt, while for general recursive functions, they may not.

So, OK. Our nestedly recursive functions can be represented in “official primitive recursive form”, but they’re very complicated in that form. So that raises the question: what functions can be represented simply in this form? In *A New Kind of Science* I gave some examples, each minimal for the output it produces:

And then there’s the most interesting function I found:

It’s the simplest primitive recursive function whose output has no obvious regularity:

Because it’s primitive recursive, it’s possible to express it in terms of functions like `Fold`—though it’s two deep in those, making it in some ways more complicated (at least as far as the Grzegorczyk hierarchy that counts “Fold levels” is concerned) than our nestedly recursive functions:

But there’s still an issue to address with nestedly recursive functions and primitive recursion. When we have functions (like T212) that “reach back” progressively further as *n* increases, there’s a question of what they’ll find. We’ve simply assumed *f*[*n*] = 1 for *n* ≤ 0. But what if there was something more complicated there? Even if *f*[–*m*] was given by some primitive recursive function, say *p*[*m*], it seems possible that in computing *f*[*n*] one could end up somehow “bouncing back and forth” between positive and negative arguments, and in effect searching for an *m* for which *p*[*m*] has some particular value, and in doing that searching one could find oneself outside the domain of primitive recursive functions.

And this raises yet another question: are all definitions we can give of nestedly recursive functions consistent? Consider for example:

Now ask: what is *f*[1]? We apply the recursive definition. But it gives us *f*[1] = 1 – *f*[*f*[0]] → *f*[1] = 1 – *f*[1], or, in other words, an inconsistency. There are many such inconsistencies that seem to “happen instantly” when we apply definitions. But it seems conceivable that there could be “insidious inconsistencies” that show up only after many applications of a recursive definition. And it’s also conceivable that one could end up with “loops” like *f*[*i*] = *f*[*i*]. And things like this could be reasons that *f*[*n*] might not be a “total function”, defined for all *n*.

We’ve seen all sorts of complex behavior in nestedly recursive functions. And what the Principle of Computational Equivalence suggests is that whenever one sees complex behavior, one must in some sense be dealing with computations that are “as sophisticated as any computation can be”. And in particular one must be dealing with computations that can somehow support computation universality.

So what would it mean for a nestedly recursive function to be universal? For a start, one would need some way to “program” the function. There seem to be a couple of possibilities. First, one could imagine packing both “code” and “data” into the argument *n* of *f*[*n*]. So, for example, one might use some form of tupling function to take a description of a rule and an initial state for a Turing machine, together with a specification of a step number, then package all these things into an integer *n* that one feeds into one’s universal nestedly recursive function *f*. Then the idea would be that the value computed for *f*[*n*] could be decoded to give the state of the Turing machine at the specified step. (Such a computation by definition always halts—but much as one computes with Turing machines by successively asking for the next steps in their evolution, one can imagine setting up a “harness” that just keeps asking for values of *f*[*n*] at an infinite progression of values *n*.)

Another possible approach to making a universal nestedly recursive function is to imagine feeding in a “program” through the initial conditions one gives for the function. There might well need to be decoding involved, but in some sense what one might hope is that just by changing its initial conditions one could get a nestedly recursive function with a specific recursive definition to emulate a nestedly recursive function with any other recursive definition (or, say, for a start, any linear recurrence).

Perhaps one could construct a complicated nestedly recursive function that would have this property. But what the Principle of Computational Equivalence suggests is that it should be possible to find the property even in “naturally occurring cases”—like P312 or T212.

The situation is probably going to be quite analogous to what happened with the rule 110 cellular automaton or the *s* = 2, *k* = 3 596440 Turing machine. By looking at the actual typical behavior of the system one got some intuition about what was likely to be going on. And then later, with great effort, it became possible to actually prove computation universality.

In the case of nestedly recursive functions, we’ve seen here examples of just how diverse the behavior generated by changing initial conditions can be. It’s not clear how to harness this diversity to extract some form of universality. But it seems likely that the “raw material” is there. And that nestedly recursive functions will show themselves as able to join so many other systems in fitting into the framework defined by the Principle of Computational Equivalence.

## Some History

Once one has the concept of functions and the concept of recursion, nestedly recursive functions aren’t in some sense a “complicated idea”. And between this fact and the fact that nestedly recursive functions haven’t historically had a clear place in any major line of mathematical or other development it’s quite difficult to be sure one’s accurately tracing their history. But I’ll describe here at least what I currently know.

The concept of something like recursion is very old. It’s closely related to mathematical induction, which was already being used for proofs by Euclid around 300 BC. And in a quite different vein, around the same time (though not recorded in written form until many centuries later) Fibonacci numbers arose in Indian culture in connection with the enumeration of prosody (“How many different orders are there in which to say the Sanskrit words in this veda?”).

Then in 1202 Leonardo Fibonacci, at the end of his calculational math book *Liber Abaci* (which was notable for popularizing Hindu-Arabic numerals in the West) stated—more or less as a recreational example—his “rabbit problem” in recursive form, and explicitly listed the Fibonacci numbers up to 377. But despite this early appearance, explicit recursively defined sequences remained largely a curiosity until as late as the latter part of the twentieth century.

The concept of an abstract function began to emerge with calculus in the late 1600s, and became more solidified in the 1700s, but basically always in the context of continuous arguments. A variety of specific examples of recurrence relations (for binomial coefficients, Bernoulli numbers, etc.) were in fairly widespread use. But there doesn’t seem to have yet been a sense that there was a general mathematical structure to study.

In the course of the 1800s there had been an increasing emphasis on rigor and abstraction in mathematics, leading by the latter part of the century to a serious effort to axiomatize concepts associated with numbers. Starting with concepts like the recursive definition of integers by repeated application of the successor operation, by the time of Peano’s axioms for arithmetic in 1891 there was a clear general notion (particularly related to the induction axiom) that (integer) functions could be defined recursively. And when David Hilbert’s program of axiomatizing mathematics got underway at the beginning of the 1900s, it was generally assumed that all (integer) functions of interest could actually be defined specifically using primitive recursion.

The notation for recursively specifying functions gradually got cleaner, making it easier to explore more elaborate examples. And in 1927 Wilhelm Ackermann (a student of Hilbert’s) introduced (in completely modern notation) a “reasonable mathematical function” that—as we discussed above—he showed was not primitive recursive. And right there, in his paper, without any particular comment, is a nestedly recursive function definition:

In 1931 Kurt Gödel further streamlined the representation of recursion, and solidified the notion of general recursion. There soon developed a whole field of recursion theory, though most of it was concerned with general issues, not with specific, concrete recursive functions. A notable exception was the work of Rózsa Péter (Politzer), beginning in the 1930s, and leading in 1957 to her book *Recursive Functions*, which contains a chapter on “Nested Recursion” (here in English translation):

But despite the many specific (mostly primitive) recursive functions discussed in the rest of the book, this chapter doesn’t stray far from the particular function Ackermann defined (or at least Péter’s variant of it).

What about the recreational mathematics literature? By the late 1800s there were all sorts of publications involving numbers, games, etc. that at least implicitly involved recursion (an example being Édouard Lucas’s 1883 Tower of Hanoi puzzle). But—perhaps because problems tended to be stated in words rather than mathematical notation—it doesn’t seem as if nestedly recursive functions ever showed up.

In the theoretical mathematics literature, a handful of somewhat abstract papers about “nested recursion” did appear, an example being one in 1961 by William Tait, then at Stanford:

But, meanwhile, the general idea of recursion was slowly beginning to go from purely theoretical to more practical. John McCarthy—who had coined the term “artificial intelligence”—was designing LISP as “the language for AI” and by 1960 was writing papers with titles like “Recursive Functions of Symbolic Expressions and Their Computation by Machine”.

In 1962 McCarthy came to Stanford to found the AI Lab there, bringing with him enthusiasm for both AI and recursive functions. And by 1968 these two topics had come together in an effort to use “AI methods” to prove properties of programs, and in particular programs involving recursive functions. And in doing this, John McCarthy came up with an example he intended to be awkward—that’s exactly a nestedly recursive function:

In our notation, it would be:

And it became known as “McCarthy’s 91-function” because, yes, for many *n*, *f*[*n*] = 91. These days it’s trivial to evaluate this function, and to find out that *f*[*n*] = 91 only up to *n* = 101:
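For reference, here is a sketch using the 91-function as it is usually stated (*n* – 10 above 100, and a doubly nested call otherwise). The name `m91` is made up for this example, and this standard form is assumed to match the definition given above.

```wolfram
(* McCarthy's 91-function in its usual form (hypothetical name m91). *)
m91[n_Integer] := If[n > 100, n - 10, m91[m91[n + 11]]]

(* Every argument from 0 through 100 gives the same value. *)
Union[m91 /@ Range[0, 100]]   (* {91} *)

(* Above 100 the function is just n - 10. *)
m91 /@ {110, 150, 200}        (* {100, 140, 190} *)
```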

But even the evaluation graph is somewhat large

and in pure recursive evaluation the recursion stack can get deep—which back then was a struggle for LISP systems to handle.

There were efforts at theoretical analysis, for example by Zohar Manna, who in 1974 published *Mathematical Theory of Computation*, which (in a section entitled “Fixpoints of Functionals”) presents the 91-function and other nestedly recursive functions, particularly in the context of evaluation-order questions.

In the years that followed, a variety of nestedly recursive functions were considered in connection with proving theorems about programs, and with practical assessments of LISP systems, a notable example being Ikuo Takeuchi’s 1978 triple recursive function:

But in all these cases the focus was on how these functions would be evaluated, not on what their behavior would be (and it was typically very simple).

But now we have to follow another thread in the story. Back in 1961, right on the Stanford campus, a then-16-year-old Douglas Hofstadter was being led towards nestedly recursive functions. As Doug tells it, it all started with him seeing that squares are interspersed with gaps of 1 or 2 between triangular numbers, and then noticing patterns in those gaps (and later realizing that they showed nesting). Meanwhile, at Stanford he had access to a computer running Algol, a language which (like LISP and unlike Fortran) supported recursion (though this wasn’t particularly advertised, since recursion was still generally considered quite obscure).

And as Doug tells it, within a year or two he was using Algol to do things like recursively create trees representing English sentences. Meanwhile—in a kind of imitation of the Eleusis “guess-a-card-rule” game—Doug was apparently challenging his fellow students to a “function game” based on guessing a math function from specified values. And, as he tells it, he found that functions that were defined recursively were the ones people found it hardest to guess.

That was all in the early 1960s, but it wasn’t until the mid-1970s that Doug Hofstadter returned to such pursuits. After various adventures, Doug was back at Stanford, writing what became his book *Gödel, Escher, Bach*. And in 1977 he sent a letter to Neil Sloane, creator of the 1973 *A Handbook of Integer Sequences* (and what’s now the On-Line Encyclopedia of Integer Sequences, or OEIS):

As suggested by the accumulation of “sequence ID” annotations on the letter, Doug’s “eta sequences” had actually been studied in number theory before—in fact, since at least the 1920s (they are now usually called Beatty sequences). But the letter went on, now introducing some related sequences—that had nestedly recursive definitions:

As Doug pointed out, these particular sequences (which were derived from golden ratio versions of his “eta sequences”) have a very regular form—which we would now call nested. And it was the properties of this form that Doug seemed most concerned about in his letter. But actually, as we saw above, just a small change in initial conditions in what I’m calling S1 would have led to much wilder behavior. But that apparently wasn’t something Doug happened to notice. A bit later in the letter, though, there was another nestedly recursive sequence—that Doug described as a “horse of an entirely nother color”: the “absolutely CRAZY” Q sequence:

Two years later, Doug’s *Gödel, Escher, Bach* book was published—and in it, tucked away at the bottom of page 137, a few pages after a discussion of recursive generation of text (with examples such as “the strange bagels that the purple cow without horns gobbled”), there was the Q sequence:

Strangely, though, there was no picture of it, and Doug listed only 17 terms (which, until I was writing this, was all I assumed he had computed):
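For reference, the Q sequence as it is usually defined (each term is the sum of two earlier terms, chosen by looking back by the two preceding values, with initial values 1 and 1) can be computed with a short memoized definition. The name `q` below is just an illustrative choice.

```wolfram
(* Hofstadter's Q sequence in its usual form, with memoization. *)
q[1] = q[2] = 1;
q[n_Integer /; n > 2] := q[n] = q[n - q[n - 1]] + q[n - q[n - 2]]

(* The first 17 terms *)
q /@ Range[17]
(* {1, 1, 2, 3, 3, 4, 5, 5, 6, 6, 6, 8, 8, 8, 10, 9, 10} *)
```

Mapping over `Range` fills the cache from the bottom up, which keeps the recursion shallow even when many more terms are requested.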

So now nestedly recursive sequences were out in the open—in what quickly became a very popular book. But I don’t think many people noticed them there (though, as I’ll discuss, I did). *Gödel, Escher, Bach* is primarily a playful book focused on exposition—and not the kind of place you’d expect to find a new, mathematical-style result.

Still, quite independent of the book, Neil Sloane showed Doug’s 1977 letter to his Bell Labs colleague Ron Graham, who within a year made a small mention of the Q sequence in a staid academic math publication (in a characteristic “state-it-as-a-problem” Erdős form):

There was a small and tight-knit circle of serious mathematicians (essentially all of whom, as it happens, I personally knew) who would chase these kinds of easy-to-state-but-hard-to-solve problems. Another was Richard Guy, who soon included the sequence as part of problem E31 in his *Unsolved Problems in Number Theory*, and mentioned it again a few years later.

But for most of the 1980s little was heard about the sequence. As it later turned out, a senior British applied mathematician named Brian Conolly (who wasn’t part of the aforementioned tight-knit circle) had, presumably as a kind of hobby project, made some progress, and in 1986 had written to Guy about it. Guy apparently misplaced the letter, but later told Conolly that John Conway and Sol Golomb had worked on similar things.

Conway presumably got the idea from Hofstadter’s work (though he had a habit of obfuscating his sources). But in any case, on July 15, 1988, Conway gave a talk at Bell Labs entitled “Some Crazy Sequences” (note the word “crazy”, just like in Hofstadter’s letter to Sloane) in which he discussed the regular-enough-to-be-mathematically-interesting sequence (which we’re calling G_{3}111 here):

Despite its visual regularity, Conway couldn’t mathematically prove certain features of the wiggles in the sequence—and in his talk offered a $10,000 prize for anyone who could. By August a Bell Labs mathematician named Colin Mallows had done it. Conway claimed (later to be contradicted by video evidence) that he’d only offered $1000—and somehow the whole affair landed as a story in the August 30 *New York Times* under the heading “Intellectual Duel: Brash Challenge, Swift Response”. But in any case, this particular nestedly recursive sequence became known as “Conway’s Challenge Sequence”.

So what about Sol Golomb? It turns out he’d started writing a paper—though never finished it:

He’d computed 280 terms of the Q sequence (he wasn’t much of a computer user) and noticed a few coincidences. But he also mentioned another kind of nestedly recursive sequence, no doubt inspired by his work on feedback shift registers:

As he noted, the behavior depends greatly on the initial conditions, though is always eventually periodic—with his student Unjeng Cheng having found long-period examples.

OK, so by 1988 nestedly recursive functions had at least some notoriety. So what happened next? Not so much. There’s a modest academic literature that’s emerged over the last few decades, mostly concentrated very specifically around “Conway’s Challenge Sequence”, Hofstadter’s Q function, or very similar “meta Fibonacci” generalizations of them. And so far as I know, even the first published large-scale picture of the Q sequence only appeared in 1998 (though I had pictures of it many years earlier):

Why wasn’t more done with nestedly recursive functions? At some level it’s because they tend to have too much computational irreducibility—making it pretty difficult to say much about them in traditional mathematical ways. But perhaps more important, studying them broadly is really a matter of ruliology: it requires the idea of exploring spaces of rules, and of expecting the kinds of behavior and phenomena that are characteristic of systems in the computational universe. And that’s something that’s still not nearly as widely understood as it should be.

## My Personal Story with Nestedly Recursive Functions

I think 1979 was the year when I first took recursion seriously. I’d heard about the Fibonacci sequence (though not under that name) as a young child a decade earlier. I’d implicitly (and sometimes explicitly) encountered recursion (sometimes through error messages!) in computer algebra systems I’d used. In science, I’d studied fractals quite extensively (Benoit Mandelbrot’s book having appeared in 1977), and I’d been exposed to things like iterated maps. And I’d quite extensively studied cascade processes, notably of quarks and gluons in QCD.

As I think about it now, I realize that for several years I’d written programs that made use of recursion (and I had quite a lot of exposure to LISP, and the culture around it). But it was in 1979—having just started using C—that I first remember writing a program (for doing percolation theory) where I explicitly thought “this is using recursion”. But then, in late 1979, I began to design SMP (“Symbolic Manipulation Program”), the forerunner of the modern Wolfram Language. And in doing this I quickly solidified my knowledge of mathematical logic and the (then-fledgling) field of theoretical computer science.

My concept of repeated transformations for symbolic expressions—which is still the core of Wolfram Language today—is somehow fundamentally recursive. And by the time we had the first signs of life for our SMP system, Fibonacci was one of our very first tests. We soon tried the Ackermann function too. And in 1980 I became very interested in the problem of evaluation order, particularly for recursive functions—and the best treatment I found of it (though at the time not very useful to me) was in none other than the book by Zohar Manna that I mentioned above. (In a strange twist, I was at that time also studying gauge choices in physics—and it was only last year that I realized that they’re fundamentally the same thing as evaluation orders.)

It was soon after it came out in 1979 that I first saw Douglas Hofstadter’s book. At the time I wasn’t too interested in its Lewis-Carroll-like aspects, or its exposition; I just wanted to know what the “science meat” in it was. And somehow I found the page about the Q sequence, and filed it away as “something interesting”.

I’m not sure when I first implemented the Q sequence in SMP, but by the time we released Version 1.0 in July 1981, there it was: an external package (hence the “X” prefix) for evaluating “Hofstadter’s recursive function”, elegantly using memoization—with the description I gave saying (presumably because that’s what I’d noticed) that its values “have several properties of randomness”:

Firing up a copy of SMP today—running on a virtual machine that still thinks it’s 1986—I can run this code, and easily compute the function:

I can even plot it—though without an emulator for a 1980s-period storage-tube display, only the ASCIIfied rendering works:

So what did I make of the function back in 1981? I was interested in how complexity and randomness could occur in nature. But at the time, I didn’t have enough of a framework to understand the connection. And, as it was, I was just starting to explore cellular automata, which seemed a lot more “nature like”—and which soon led me to things like rule 30 and the phenomenon of computational irreducibility.

Still, I didn’t forget the Q sequence. And when I was building Mathematica I again used it as a test (the .tq file extension came from the brief period in 1987 when we were trying out “Technique” as the name of the system):

When Mathematica 1.0 was released on June 23, 1988, the Q sequence appeared again, this time as an example in the soon-in-every-major-bookstore Mathematica book:

I don’t think I was aware of Conway’s lecture that occurred just 22 days later. And for a couple of years I was consumed with tending to a young product and a young company. But by 1991, I was getting ready to launch into basic science again. Meanwhile, the number theorist (and today horologist) Ilan Vardi (yet again from Stanford) had been working at Wolfram Research and writing a book entitled *Computational Recreations in Mathematica*, which included a long section on the analysis of Takeuchi’s nested recursive function (“TAK”). My email archive records an exchange I had with him:

He suggested a “more symmetrical” nested recursive function. I responded, including a picture that made it fairly clear that this particular function would have only nested behavior, and not seem “random”:

By the summer of 1991 I was in the thick of exploring different kinds of systems with simple rules, discovering the remarkable complexity they could produce, and filling out what became Chapter 3 of *A New Kind of Science*: “The World of Simple Programs”. But then came Chapter 4: “Systems Based on Numbers”. I had known since the mid-1980s about the randomness in things like digit sequences produced by successive arithmetic operations. But what about randomness in pure sequences of integers? I resolved to find out just what it would take to produce randomness there. And so it was that on August 13, 1993, I came to be enumerating possible symbolic forms for recursive functions—and selecting ones that could generate at least 10 terms:

As soon as I plotted the “survivors” I could see that interesting things were going to happen:

Was this complexity somehow going to end? I checked out to 10 million terms. And soon I started collecting my “prize specimens” and making a gallery of them:

I had some one-term recurrences, and some two-term ones. Somewhat shortsightedly I was always using “Fibonacci-like” initial conditions *f*[1] = *f*[2] = 1—and I rejected any recurrence that tried to “look back” to *f*[0], *f*[–1], etc. And at the time, with this constraint, I only found “really interesting” behavior in two-term recurrences.

In 1994 I returned briefly to recursive sequences, adding a note “solving” a few of them, and discussing the evaluation graphs of others:

When I finally finished *A New Kind of Science* in 2002, I included a list of historical “Close approaches” to its core discoveries, one of them being Douglas Hofstadter’s work on recursive sequences:

In retrospect, back in 1981 I should have been able to take the “Q sequence” and recognize in it the essential “rule 30 phenomenon”. But as it was, it took another decade—and many other explorations in the computational universe—for me to build up the right conceptual framework to see this. In *A New Kind of Science* I studied many kinds of systems, probing them far enough, I hoped, to see their most important features. But recursive functions were an example where I always felt there was more to do; I felt I’d only just scratched the surface.

In June 2003—a year after *A New Kind of Science* was published—we held our first summer school. And as a way to introduce methodology—and be sure that people knew I was fallible and approachable—I decided on the first day of the summer school to do a “live experiment”, and try to stumble my way to discovering something new, live and in public.

A few minutes before the session started, I picked the subject: recursive functions. I began with some examples I knew. Then it was time to go exploring. At first lots of functions “didn’t work” because they were looking back too far. But then someone piped up “Why not just say that *f*[*n*] is 1 whenever *n* isn’t a positive integer?” Good idea! And very easy to try.

Soon we had the “obvious” functions written (today `Apply[Plus, ...]` could be just `Total[...]`, but otherwise there’s nothing “out of date” here):

In a typical story of Wolfram-Language-helps-one-think-clearly, the obvious function was also very general, and allowed a recurrence with any number of terms. So why not start with just one term? And immediately, there it was—what we’re now calling T311:

And then a plot (yes, after Version 6 one didn’t need the `Show` or the trailing “;”):

Of course, as is the nature of computational constructions, this is something timeless—that looks the same today as it did 21 years ago (well, except that now our plots display with color by default).

I thought this was a pretty neat discovery. And I just couldn’t believe that years earlier I’d failed to see the obvious generalization of having “infinite” initial conditions.
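For concreteness, here is a minimal sketch of the “infinite initial conditions” idea in modern Wolfram Language. It is not the original 2003 notebook code. It assumes that T311 refers to the one-term recurrence *f*[*n*] = 3 *f*[*n* – *f*[*n* – 1]], and the memoization is an addition made here purely for speed:

```wolfram
(* Hedged sketch, not the 2003 notebook code. *)
ClearAll[f]

(* "Infinite" initial conditions: f[n] is 1 whenever n is not a positive integer *)
f[n_Integer /; n < 1] = 1;

(* Assumed form of T311: a one-term nestedly recursive definition.
   The f[n] = ... idiom caches each value once it has been computed. *)
f[n_Integer] := f[n] = 3 f[n - f[n - 1]]

(* First few terms, computed in increasing n so the recursion stays shallow *)
Table[f[n], {n, 14}]
(* {3, 3, 3, 9, 3, 9, 3, 9, 3, 9, 9, 9, 27, 3} *)
```

Because every value of *f* is positive, the inner argument *n* – *f*[*n* – 1] is always less than *n*, so the evaluation terminates even though the recurrence “looks back” arbitrarily far. A recurrence that lands on zero or a negative argument simply picks up the value 1.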

The next week I did a followup session, this time talking about how one would write up a discovery like this. We started off with possible titles (including audience suggestions):

And, yes, the first title listed is exactly the one I’ve now used here. In the notebook I created back then, there were first some notes (some of which should still be explored!):

Three hours later (on the afternoon of July 11, 2003) there’s another notebook, with the beginnings of a writeup:

By the way, another thing we came up with at the summer school was the (non-nestedly) recursive function:

Plotting *g*[*n* + 1] – *g*[*n*] gives:

And, yes, bizarrely (and reminiscent of McCarthy’s 91-function) for *n* ≥ 396, *g*[*n* + 1] – *g*[*n*] is always 97, and *g*[*n*] = 38606 + 97 (*n* – 396).

But in any case, a week or so after my “writeups” session, the summer school was over. In January 2004 I briefly picked the project up again, and made some pictures that, yes, show interesting structure that perhaps I should investigate now:

In the years that followed, I would occasionally bring nestedly recursive functions out again, particularly in interacting with high school and other students. And at our summer programs I suggested projects on them to a number of students.

In 2008, they seemed like an “obvious interesting thing” to add to our Demonstrations Project:

But mostly, they languished. Until, that is, my burst of “finish this” intellectual energy that followed the launch of our Physics Project in 2020. So here now, finally, after a journey of 43 years, I feel like I’ve been able to do some justice to nestedly recursive functions, and to provide a bit more illumination to yet another corner of the computational universe.

(Needless to say, there are many, many additional questions and issues to explore. Different primitives, e.g. `Mod`, `Floor`, etc. Multiple functions that refer to each other. Multiway cases. Functions based on rational numbers. And endless potential approaches to analysis, identifying pockets of regularity and computational reducibility.)

## Thanks

Thanks to Brad Klee for extensive help. Thanks also to those who’ve worked on nestedly recursive functions as students at our summer programs over the years, including Roberto Martinez (2003), Eric Rowland (2003), Chris Song (2021) and Thomas Adler (2024). I’ve benefitted from interactions about nestedly recursive functions with Ilan Vardi (1991), Tal Kubo (1993), Robby Villegas (2003), Todd Rowland (2003 etc.), Jim Propp (2004), Matthew Szudzik (2005 etc.), Joseph Stocke (2021 etc.), Christopher Gilbert (2024) and Max Niedermann (2024). Thanks to Doug Hofstadter for extensive answers to questions about history for this piece. It’s perhaps worth noting that I’ve personally known many of the people mentioned in the history section here (with the dates I met them indicated): John Conway (1984), Paul Erdős (1986), Sol Golomb (1981), Ron Graham (1983), Benoit Mandelbrot (1986), John McCarthy (1981) and Neil Sloane (1983).