Not a lot gets written in the general press about foundational issues in mathematics, but this afternoon I found myself talking to a journalist about the subject of certainty in mathematics.

It turned out to be a pretty interesting conversation, so I thought I’d write here about a few things that came up.

Mathematics likes to think of itself as a very certainty-based business. If you’ve “proved something mathematically”, then it’s supposed to just be true. No ifs or buts. Complete certainty.

But in practice that’s not quite how it works—at least the way mathematics has traditionally been done. Because in reality a mathematical proof of the kind people publish in papers is something much more social. It’s a vehicle for convincing other humans—one’s fellow mathematicians—that something is true.

There’ve been a few million mathematical proofs published over the past century or so. Essentially all of them are written in a kind of human-compatible mixture of mathematical formalism and ordinary natural language.

They’re intended for human consumption. For people to read, and process. The best of them aren’t just arguments for some particular theorem. Instead they’re expositions that illuminate some whole mathematical structure.

But inevitably they require effort to read. It’s not just a mechanical matter. Instead, every reader of every proof has to somehow map what the proof is saying into their particular patterns of thought. So that they can personally be convinced that the proof is true.

And of course, in practice, proofs written by humans have bugs. Somewhere between the imprecision of ordinary language, and the difficulty of really thinking through every possible eventuality, it’s almost inevitable that any long proof that’s been published isn’t perfect. And even with an army of people to check it, not every bug will be found.

So how do computers—and *Mathematica* change this picture?

The first thing we might think is: let’s use computers instead of humans to check all those proofs we have. Well, as such that’s not going to work. Because the proofs we have aren’t in computer-digestible form. They’re written for humans. With human language. And to have computers read them, we’d first have to solve the problem of having computers understand natural language.

But as is so often the case, once we really start thinking in terms of computers and computation, we realize that there’s a whole different way to operate—that’s not just a straight automation of what humans have done before.

In the traditional way of doing mathematics, mathematicians use their imagination and intuition to come up with theorems. Then they do a lot of work to try to fill in the proofs and check what’s true.

But what if we could somehow be more systematic about establishing mathematical facts? Well, that’s something that’s kind of natural with computation. We compute what 2+2 is. We compute the factors of some polynomial. We compute how some iteration behaves. And each of these computations establishes some mathematical fact.

Of course, it takes judgement to see what facts are worth establishing. But with computation the actual process of establishing the facts becomes in a sense mechanical.

The traditional way of doing mathematics is a bit like painting, where every step relies on human effort. But with computation, what we do becomes more like photography. The effort and aesthetic judgement is all in the setup: then the execution is mechanical.

Needless to say, with a new methodology like this, one can do all sorts of interesting new things. In fact, I think there’s what one can think of as a whole generalization of mathematics that becomes possible when one starts just systematically exploring the complete universe of possible abstract mathematical constructs.

But how does computation relate to standard kinds of facts in mathematics?

One might think there’s a difference between a simple fact like 2+2=4, and some theorem that says that such-and-such a thing is true for all x.

It seems like 2+2=4 is a finite question, that one could always answer by a finite number of computational steps. But that somehow once one starts talking about things “true for all x”, one needs something different.

But the whole idea that launched algebra—and that we routinely use with symbolic computation in *Mathematica*—is that one doesn’t. Finite operations on symbolic expressions can give one results that are true for infinite numbers of possible values. Like 2x+2x=4x, for all x.

And indeed, in *Mathematica*, particularly with functions like `Reduce`, we’re able, in effect, to establish some pretty sophisticated theorems about algebra and geometry and so on, just by “doing computations”.

Of course, that’s not to say that computation can somehow systematically decide any question in mathematics. Godel’s Theorem shows that it can’t.

In fact, if instead of having humans come up with mathematical questions, one just lets a computer systematically do it, I don’t think one has to go far before one gets to questions that are formally undecidable.

And what that means is that if one just follows the rules and tries to answer a question from the mathematical axioms one’s set up, there’s no finite procedure that can guarantee to do it. Whether one’s a computer—or a human. (Though sometimes the humans just decide to add new axioms “because they seem reasonable”—but then one’s outside the domain of systematic mathematical reasoning.)

OK, but what about the practicalities?

Let’s say we do a computation—say with *Mathematica*—and we find a mathematical result. How can we be sure it’s correct?

If we were doing a traditional human proof, we’d have to rely on other humans to check it. Each one separately.

But if we’re using *Mathematica* on a computer, it’s a much more systematic thing. From the *Mathematica* input we’ve given, we have a precise definition of what we’re trying to do.

And it’s then perfectly reproducible: we can just run the computation we’ve defined, anywhere, anytime. And see what result it gives.

But let’s say we’ve done a perfect job defining our computation. How can we be sure we get the right answer from it? How do we know there’s isn’t a bug in our computer, or in *Mathematica*?

What if the number 42 that we got out isn’t correct? What if there’s a Pentium-FDIV-like bug, and it really should have been 41?

In practice, interesting mathematical results tend to have lots of structure that builds up around them. They don’t just rely on one computational result. They have lots of interlocking pieces that form a picture that one can understand.

But still, the question remains: can we trust what the computer tells us?

I mean, think of all those everyday times we’re told that such-and-such a thing is wrong “because of a computer error”.

I’ve noticed that there’s a pretty good heuristic for guessing where the error is likely to lie: it’s in the part of the chain that’s been exercised least often.

It’s really really unlikely that the error is in the hardware of the computer (though that does happen, as in the Pentium FDIV bug). It’s a tiny bit less unlikely that it’s in the system software. And still a tiny bit less unlikely that it’s in the application software.

But of course the overwhelming probability if there’s an error is that it’s in the human operation of the software.

That’s not surprising. Because that’s the part that’s being done for the first time.

And it’s the part that one can’t systematically test.

Over the last twenty years, we’ve built up a huge effort in automated testing of *Mathematica* (using *Mathematica* of course).

We run millions and millions of tests on every version of *Mathematica*, trying to exercise every part of the system. And doing that is orders of magnitude more powerful at catching bugs than any kind of pure human testing.

Sometimes we use the symbolic capabilities of *Mathematica* to analyse the raw code of *Mathematica*. But pretty quickly we tend to run right up against undecidability: there’s a theoretical limit to what we can automatically verify.

Why not just read the code, like people read mathematical proofs?

Traditional mathematical proofs are of course written so people can read them. But the code of *Mathematica* is written to give the best answers, independent of how hard it was to write—or read.

Think of almost any mathematical operation. Multiplying numbers. Factoring polynomials. Doing integrals. There are traditional human algorithms for doing these things.

And if the code of *Mathematica* just used those algorithms, it’d probably be quite easy to read.

But it’d also be really slow, and fragile.

And in fact one of the things that’s happened in *Mathematica* over the past decade or so is that almost all its internal algorithms have become very broad “polyalgorithms”—with lots of separate algorithms automatically being selected when they’re needed. And routinely making use of algorithms from very different parts of the system.

So even if there’s some algorithm for doing something that’s written out as a page of pseudocode in a paper or a book, the chances are that the real production algorithm in *Mathematica* is hundreds of pages long—and makes use of perhaps hundreds of other sophisticated algorithms in other parts of the system.

So it quickly becomes completely impractical to imagine “just reading the code” for something.

Actually, it’s even worse than that. You see, in recent years, we’ve increasingly been using NKS methods for doing automated algorithm discovery: finding efficient robust algorithms by systematically searching large spaces of possible programs.

But such algorithms were not constructed to be understandable—and much like things we see in physics or biology, we can see that they work efficiently, but they’re really difficult for us humans to understand.

And in general, the way *Mathematica* works inside just isn’t very “human friendly”.

Like when *Mathematica* does an integral. It doesn’t use all those little tricks people learn in calculus classes. It uses very systematic and powerful methods that involve elaborate higher mathematics.

At our Integrator website, people are always asking (perhaps for their homework) to see the steps that *Mathematica* uses to work out the integrals. But those steps would be quite useless to them.

Getting human understandable steps is really a quite separate problem from actually doing the integral. It’s something one has to set up all sorts of detailed heuristic rules for—as on the web*Mathematica* site calc101.com.

When *Mathematica* was young, we actually used to include as part of our software distribution a lot of source code for doing things like symbolic integration. It happened to be easier to do the software engineering that way. And we had the idea that perhaps occasionally someone would look at the code, and give us a good suggestion about it.

But that turned out to be completely unrealistic. The code was pretty clean. But it was mathematically very sophisticated. And only a very small number of experts (quite a few of whom had actually worked on the code for us) could understand it.

And the only tangible thing that happened were a few tech support calls from people who thought they could modify some piece of the code—and were about to horribly mess things up.

OK, so what *can* one do to help raise the level of certainty in the mathematics people do?

One direction is to set up human-constructed proofs in a way that can readily be checked by computer. This is a difficult thing. Because in effect it requires creating a precise formal language for all the things that people normally write in natural language in their papers.

And when one’s done that, one then has to write every proof with the same level of completeness and precision as one uses to write programs.

Still, there have been some interesting developments along these lines—a notable one being the Theorema *Mathematica*-based proof system.

Another direction is to get actual proofs constructed automatically by computer. Functions like `Reduce` implicitly do it. And in *Mathematica 6*, `FullSimplify` can quite directly use automated theorem proving to construct chains of results from systems of axioms.

In fact, when I was working on the NKS book, I used a precursor of what’s in *Mathematica 6* to prove something I thought was fairly interesting (that ((a.b).c).(a.((a.c).a)) = c is the simplest possible axiom system for logic). And I thought it’d be illuminating to actually give the proof in the book.

So I did. It fills two pages, in 4-point type. It’s all there. But it’s utterly incomprehensible.

It’s the same story over and over again. Non-trivial computer-generated proofs—like “production-grade” algorithms—aren’t naturally human understandable.

Even though they satisfy a formalist’s view of establishing what’s true, they fail on the “anthropomathematics”—on relating what’s true to what humans can instrinsically understand.

I’ve thought quite a bit about the anthropomathematics problem. I’ve tried to determine statistical and other properties of good human-created proofs. To get an idea of how to organize automated proofs so that they are easy for humans to assimilate—just as we do computational aesthetics to make visualizations easy for humans to assimilate.

In the NKS book, I spent a lot of effort creating graphics that would make results visually clear. And in general, I’ve found that creating visualizations of the steps in a process is the single best way to understand what’s going on in the process.

And now, with *Mathematica* 6, we have a lot of new possibilities—for example creating dynamic interfaces on the fly that allow one to explore and drill-down in different aspects of a proof. A good test as far as I’m concerned will be to turn my logic-axiom proof into something that can not only readily be checked by computer, but that I as a human can understand.

Of course, the reality is that most of the time, one doesn’t really care about the proof as such. One just wants to know what’s true. So one can get on and find more things that are true.

And the good news is that with all the automation and everything that’s now in *Mathematica*, it’s becoming easier and easier to do that…