(This post was originally posted on the NKS Forum.)
The following are some remarks that I made on the Foundations of Math (FOM) mailing list in connection with discussion of the Wolfram 2,3 Turing Machine Prize. Though much of what I say is well understood in the NKS community, I thought it might nevertheless be of interest here.
Several people forwarded me the thread on this mailing list about our 2,3 Turing machine prize.
I’m glad to see that it has stimulated discussion. Perhaps I can make a few general remarks.
What do we learn from simple universal Turing machines?
John McCarthy wrote:
In the 1950s I thought that the smallest possible (symbol-state product) universal Turing machine would tell something about the nature of computation. Unfortunately, it didn’t.
I suspect that what was imagined at that time was that by finding the smallest universal machines one would discover some “spark of computation”—some critical ingredient in the rules necessary to make universal computation possible. (At the time, it probably also still seemed that there might be a “spark of life” or a “spark of intelligence” that could be found in systems.)
I remember that when I first heard about universality in the Game of Life in the early 1970s, I didn’t think it particularly significant; it seemed like just a clever hack.
But a few decades later, after building up the whole framework in A New Kind of Science, I have a quite different view.
I now think that it’s extremely significant just how widespread—or not—computational ability is.
Are typical systems that we encounter in nature universal? Or are they computationally much simpler?
Can we expect to find non-trivial computational behavior just by searching a space of possible systems? Or can we only reach such behavior by specifically setting it up, say using traditional engineering methods?
I think these kinds of questions are very important for the future of both science and technology. In science, for example, they tell us to what extent we can expect to find “computationally reduced” models for systems we study. And in technology, they tell us to what extent we can expect to find interesting practical algorithms and other things just by searching the “computational universe”.
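As a rough illustration of what such a search can look like in practice (a toy sketch of my own, not the methodology of the book), here are a few lines of Python that survey elementary cellular automaton rules started from a single black cell, and use compressed size as a crude stand-in for "non-trivial behavior":

```python
import zlib

def ca_step(cells, rule):
    """One step of an elementary (2-color, nearest-neighbor) cellular
    automaton; cells beyond the ends are treated as background 0s."""
    padded = [0, 0] + cells + [0, 0]
    return [
        (rule >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]

def pattern(rule, steps):
    """Evolve from a single black cell on a blank background."""
    rows = [[1]]
    for _ in range(steps):
        rows.append(ca_step(rows[-1], rule))
    return rows

def score(rows):
    """Crude 'interestingness' heuristic: how poorly the pattern compresses.
    Repetitive and nested patterns compress well; complex-looking ones do not."""
    bitmap = "".join("".join(map(str, row)) for row in rows)
    return len(zlib.compress(bitmap.encode()))

# Survey the 128 even-numbered rules (the ones that keep a blank background
# blank, so the 0-padding above is faithful) and list the rules whose
# patterns look most complex under this toy measure.
ranked = sorted(((score(pattern(r, 100)), r) for r in range(0, 256, 2)), reverse=True)
print(ranked[:10])
```

The compression score is of course only a stand-in for a real behavioral classification; the point is just how little machinery it takes to start exploring a whole rule space.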
I don’t think that the details of which particular Turing machine is or is not universal are all that significant. What is significant, though, is how widespread universality is in general. And currently the only way we have of establishing that is to do the hard work of proving the universality or non-universality of particular Turing machines.
Our standard intuition tends to be that systems with simple rules must behave in simple ways. But what we find empirically by studying the computational universe is that that just isn’t true.
So now the problem is to get a scientific understanding of this, and to make precise statements about it.
One thing that’s come out of my efforts in this direction is what I call the Principle of Computational Equivalence. It’s a fairly bold hypothesis, about which there’s much to say (see, for example, Chapter 12 of the book).
But one of its predictions is that universality should be possible even in systems with very simple rules. I’ve been keen to test that prediction. And looking for small universal Turing machines is a good way to do it.
There are other reasons to be interested in small universal Turing machines, too.
Perhaps one can even use them as raw material for creating useful computers out of components like molecules. Alex Smith’s encoding for the 2,3 Turing machine clearly isn’t going to be practical (and wasn’t intended to be). But one of the lessons of modern technology is that once one knows something is in principle possible, it tends to be just a matter of time before it can be reduced to practice.
There are also theoretical computer science reasons to be interested in small universal machines. An example is understanding the tradeoffs between algorithmic complexity and computational complexity.
(For example, I made some progress in empirical studies of traditional computational complexity questions by looking at small Turing machines; see Section 12.8.)
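As a toy version of that kind of empirical survey (my own illustration, not the actual experiments in the book), one can enumerate all 4096 2-state, 2-color Turing machines and simply measure how long each takes to accomplish some fixed small task starting from a blank tape:

```python
from itertools import product

def steps_to_distance(rules, distance, max_steps):
    """Run a Turing machine from a blank (all-0) tape and return the number
    of steps before the head first gets `distance` cells from its start,
    or None if that never happens within `max_steps` steps."""
    tape, state, head = {}, 1, 0
    for t in range(1, max_steps + 1):
        state, write, move = rules[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        if abs(head) >= distance:
            return t
    return None

# Enumerate every 2-state, 2-color machine: one action
# (new state, new color, move) for each of the 4 (state, color) cases.
actions = list(product([1, 2], [0, 1], [-1, 1]))
cases = list(product([1, 2], [0, 1]))
times = [
    t
    for choice in product(actions, repeat=4)
    if (t := steps_to_distance(dict(zip(cases, choice)), 10, 1000)) is not None
]
print(len(times), "of 4096 machines get 10 cells away; the slowest takes", max(times), "steps")
```

The task here (getting the head 10 cells from its start) is arbitrary, but the shape of the experiment is the interesting part: program size on one axis, steps taken on the other.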
But most of all, what I think is important about finding small universal Turing machines is that it helps build our intuition about what kinds of systems are universal—so that when we encounter systems in computer science, mathematics, natural science, or whatever, we have a better chance of knowing whether or not they’re likely to be universal (and thus for example to show undecidability).
How should one set up the definition of universality?
As for any concept, one wants a definition that’s useful and robust, and that one can build on.
Obviously the initial conditions can’t need a universal computer to set them up. But just what can they contain?
I think the “obvious answer” has changed since the 1950s. In the 1950s (which were firmly before my time), my impression is that one imagined that a computer memory—or a Turing machine tape—would somehow always start with 0s in every one of its cells. (Actually, even then, I suspect that when one powered up a computer, there could be random data in its memory, which would need to be explicitly cleared.)
Now, particularly when we look at the correspondence with systems in nature, it doesn’t seem so especially natural to have an infinite sequence specifically of 0s. Why not have 1s? Or 01 blocks? (Like one has digit sequences of integers vs. rationals, etc.)
I must say that I would consider it most elegant to have universality established with an initial condition that is basically just repetitive. But what is so special about repetitiveness? There are lots of other patterns one can imagine. And it seems as if the only robust distinction one can make is whether the pattern itself requires universal computation to set up.
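Whatever pattern one chooses, such repetitive initial conditions are perfectly finite objects to specify. As an illustration (a minimal sketch of my own, unrelated to Alex Smith's construction), here is a toy Turing machine simulator in Python whose tape begins as an infinite repetition of any finite background pattern, with only the cells actually written stored explicitly:

```python
class PeriodicTapeTM:
    """A toy Turing machine on a tape that starts as an infinite repetition
    of a finite background pattern (e.g. all 0s, all 1s, or 01 blocks),
    with the cells that have been written stored sparsely."""

    def __init__(self, rules, background=(0,)):
        self.rules = rules            # {(state, color): (new state, new color, move)}
        self.background = background  # the repeated initial pattern
        self.written = {}             # position -> color, for cells written so far
        self.state, self.head = 1, 0

    def read(self, pos):
        # Unwritten cells still carry the periodic background.
        return self.written.get(pos, self.background[pos % len(self.background)])

    def step(self):
        self.state, color, move = self.rules[(self.state, self.read(self.head))]
        self.written[self.head] = color
        self.head += move

# An arbitrary 2-state, 3-color machine (chosen for illustration, not
# intended to be the prize machine), run on three repetitive backgrounds.
rules = {
    (1, 0): (2, 1, 1), (1, 1): (1, 2, -1), (1, 2): (1, 1, -1),
    (2, 0): (1, 2, -1), (2, 1): (2, 2, 1), (2, 2): (2, 0, 1),
}
for background in [(0,), (1,), (0, 1)]:
    m = PeriodicTapeTM(rules, background)
    for _ in range(100):
        m.step()
    print(background, "-> state", m.state, "head at", m.head)
```

The representation is trivial, which is rather the point: the subtlety is not in writing down such initial conditions, but in deciding which classes of them should count when one defines universality.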
There is no doubt some nice theoretical computer science that can be done in tightening up all these notions.
Perhaps there is really a whole hierarchy of “levels of universality”—with systems requiring different levels of computation in the preparation of their inputs to achieve universality. (One might imagine block-universal vs. finite-automaton-universal vs. context-free-universal, etc.)
My own intuition is that while there may be boundary cases where universality is possible with some forms of input but not others, this will be rare—and that most of the time a system will either be “fully universal” or not universal at all.
There may well be cases where special forms of input prevent universality. It might even be that infinite sequences of 0s prevent universality in the “prize” 2,3 Turing machine (which has rather special behavior in a field of 0s). But I’m guessing that if one considers more robust classes of encodings, it will usually matter very little which class one is using.
I’m guessing that the situation is similar to the one with intermediate degrees: there are in principle systems that show undecidability but not universality, but this is extremely rare.
It would, of course, be quite wonderful to have a simple Turing machine or other system that fundamentally shows undecidability but not universality. Though I wonder slightly whether this is in any robust sense possible. After all, if one looks at the existing examples of intermediate degrees, they always seem to have “universality inside”.
OK, so what about the 2,3 “prize” Turing machine? Alex Smith’s proof is about universality with initial conditions created by fairly complicated (but non-universal) computations. Can it be extended to simpler initial conditions? I certainly expect the initial conditions can be considerably simpler.
Can they be a finite region embedded in an infinite sea of 0s? Obviously I don’t know. Though it would be interesting to find out.
I might mention, by the way, that there are other 2,3 (and 3,2) Turing machines that I suspect are universal too. And some of them definitely have quite different behaviors with respect to infinite sequences of 0s.
Will the encoding always be more complex if the machine is simpler?
Our standard intuition tends to be that the more complex a thing we want to get out, the more complex a thing we have to put in.
But one of the big points of A New Kind of Science (encapsulated, for example, in the Principle of Computational Equivalence) is that this isn’t generally true.
Alex Smith’s encoding for the 2,3 Turing machine is complicated and inefficient. But is this inevitable when one has a simple underlying system?
I don’t think so. But so far I don’t really have strong evidence.
Here’s one slightly related observation, though.
One of the things I did in A New Kind of Science (and thought was rather nice) was finding the very simplest possible (equational) axiom system for Boolean algebra. It turns out to be just one axiom: ((b.c).a).(b.((b.a).b))==a. (See Section 12.9.)
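As a quick sanity check on that formula (my own check, not part of the proof in the book), one can at least verify by brute force that it is a valid identity when the dot is interpreted as Nand:

```python
from itertools import product

def nand(x, y):
    """The '.' operation, interpreted as Nand (the Sheffer stroke)."""
    return 1 - x * y

# Check that ((b.c).a).(b.((b.a).b)) == a holds for every 0/1 assignment,
# i.e. that the single axiom is at least a valid Nand identity. (That it
# also axiomatizes all of Boolean algebra is the much harder claim; this
# only checks the formula itself.)
for a, b, c in product([0, 1], repeat=3):
    lhs = nand(nand(nand(b, c), a), nand(b, nand(nand(b, a), b)))
    assert lhs == a
print("The axiom holds for all 0/1 assignments.")
```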
Now the question is: will proofs that are based on this tiny axiom system inevitably be systematically longer than ones based on bigger (“engineered”) axiom systems? (Do they for example always first have to prove the axioms of the bigger axiom systems as theorems, and then go on from there?)
Well, what I found empirically is that bigger axiom systems aren’t necessarily more efficient. (This may not be a robust result; it could depend on details of theorem-proving algorithms.)
In fact, as http://www.wolframscience.com/nksonline/page-1175d-text shows, even the single-axiom system isn’t terribly inefficient, at least after it proves commutativity. And the minimal axiom system I found that explicitly includes commutativity, {(a.b).(a.(b.c))==a, a.b==b.a}, seems to be pretty much as efficient as anything.
This isn’t directly relevant to encodings for universality, but perhaps it gives some indications.
Another way I tried to get evidence about encodings was to see how far I could get in covering a space of possible finite cellular automaton computations by using encodings based on blocks of different lengths.
It was encouraging that cellular automata I didn’t think were universal didn’t manage to cover much, but ones I did think were universal did.
I looked a little at whether more complicated rules would allow smaller sets of blocks (“smaller encodings”) to cover a given region of cellular automaton computation space, and I didn’t find any evidence that they do.
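For concreteness, here is one simple way to make that kind of block-encoding check precise for elementary cellular automata, together with a brute-force search over block pairs. (This is my own formalization for illustration; the experiments in the book may differ in detail.)

```python
from itertools import product

def step_interior(cells, rule):
    """One CA step computed only where the full neighborhood is known,
    so the row shrinks by one cell at each end."""
    return [
        (rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[i + 1])) & 1
        for i in range(1, len(cells) - 1)
    ]

def emulates(rule, target, b0, b1, k):
    """Does `rule`, run k steps on tapes built from the length-k blocks
    b0/b1, reproduce one step of `target` on the encoded cells? Since a
    nearest-neighbor rule run k steps only reaches k cells away, the middle
    block after k steps depends only on the three blocks laid out here."""
    blocks = {0: b0, 1: b1}
    for x, y, z in product([0, 1], repeat=3):
        cells = list(blocks[x]) + list(blocks[y]) + list(blocks[z])
        for _ in range(k):
            cells = step_interior(cells, rule)
        expected = (target >> (4 * x + 2 * y + z)) & 1
        if cells != list(blocks[expected]):
            return False
    return True

def coverage(rule, k):
    """Which elementary rules can `rule` emulate with some pair of
    length-k blocks? Brute force, so only practical for small k."""
    covered = set()
    for b0, b1 in product(product([0, 1], repeat=k), repeat=2):
        if b0 == b1:
            continue
        for target in range(256):
            if target not in covered and emulates(rule, target, b0, b1, k):
                covered.add(target)
    return covered

print(sorted(coverage(90, 2)))   # rules covered by rule 90 with length-2 blocks
```

With a check like this in hand, "coverage" becomes a concrete quantity: for each rule and block length, simply count how many of the 256 elementary rules it can emulate.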
I’m not quite sure in general how to formulate the problem of complexity of encodings.
After all, there are inevitably computations that are arbitrarily difficult to encode for a given system.
But somehow the question is whether “reasonable” or “interesting” computations can be encoded in a short way in a given system.
It’s a little like asking what it takes for a programming language to let you write the programs you want in a short way.
For a language designer like me, this is an important practical question. And I’d love to have a better theoretical formulation of it.
(Our “open code” Demonstrations Project has some pretty interesting examples of what can and occasionally can’t be done with short Mathematica programs…)
What happens after the prize, etc.?
I’ve obviously spent a huge amount of effort pursuing my interests in NKS and writing up what I’ve done. As well as being fascinating and fun for me, I think it’s important stuff—and I’ve been keen to get other people involved.
We’ve had a lot of success with things like our Summer School. But with the prize we wanted to try stimulating the field in a different way.
And so far, the experiment of the prize seems very successful.
Is the prize the end of the road for this particular 2,3 Turing machine? Definitely not. There is lots more to be studied and established about it. Perhaps it’ll be proved that it can’t be universal in certain senses. Probably there’ll be much simpler proofs of universality found. Perhaps someone will “reduce to number theory” or some such the detailed behavior from a blank tape. Even though it has such simple rules, it’s obviously a rich system to study.
And having seen how this prize has gone, I’m now motivated to think about setting up some other prizes…