Who Can Understand the Proof? A Window on Formalized Mathematics

Wolfram, Stephen

Contents Who Can Understand the Proof? A Window on Formalized Mathematics

Who Can Understand the Proof? A Window on Formalized Mathematics (25th Anniversary of the Wolfram Axiom) » (VIDEO)

Who Can Understand the Proof? A Window on Formalized Mathematics

January 9, 2025

The Simplest Axiom for Logic

Theorem (Wolfram with Mathematica, 2000):
The single axiom ((a•b)•c)•(a•((a•c)•a))c is a complete axiom system for Boolean algebra (and is the simplest possible)

For more than a century people had wondered how simple the axioms of logic (Boolean algebra) could be. On January 29, 2000, I found the answer—and made the surprising discovery that they could be about twice as simple as anyone knew. (I also showed that what I found was the simplest possible.)

It was an interesting result—that gave new intuition about just how simple the foundations of things can be, and for example helped inspire my efforts to find a simple underlying theory of physics.

But how did I get the result? Well, I used automated theorem proving (specifically, what’s now FindEquationalProof in Wolfram Language). Automated theorem proving is something that’s been around since at least the 1950s, and its core methods haven’t changed in a long time. But in the rare cases it’s been used in mathematics it’s typically been to confirm things that were already believed to be true. And in fact, to my knowledge, my Boolean algebra axiom is actually the only truly unexpected result that’s ever been found for the first time using automated theorem proving.

But, OK, so we know it’s true. And that’s interesting. But what about the proof? Does the proof, for example, show us why the result is true? Well, actually, in a quarter of a century, nobody (including me) has ever made much headway at all in understanding the proof (which, at least in the form we currently know it, is long and complicated). So is that basically inevitable—say as a consequence of computational irreducibility? Or is there some way—perhaps using modern AI—to “humanize” the proof to a point where one can understand it?

It is, I think, an interesting challenge—that gets at the heart of what one can (and can’t) expect to achieve with formalized mathematics. In what follows, I’ll discuss what I’ve been able to figure out—and how it relates to foundational questions about what mathematics is and how it can be done. And while I think I’ve been able to clarify some of the issues, the core problem is still out there—and I’d like to issue it here as a challenge:

Challenge: Understand the proof of the Theorem

What do I mean by “understand”? Inevitably, “understand” has to be defined in human terms. Something like “so a human can follow and reproduce it”—and, with luck, feel like saying “aha!” at some point, the kind of way they might on hearing a proof of the Pythagorean theorem (or, in logic, something like de Morgan’s law Not[And[p, q]]Or[Not[p], Not[q]]).

It should be said that it’s certainly not clear that such an understanding would ever be possible. After all, as we’ll discuss, it’s a basic metamathematical fact that out of all possible theorems almost none have short proofs, at least in terms of any particular way of stating the proofs. But what about an “interesting theorem” like the one we’re considering here? Maybe that’s different. Or maybe, at least, there’s some way of building out a “higher-level mathematical narrative” for a theorem like this that will take one through the proof in human-accessible steps.

In principle one could always imagine a somewhat bizarre scenario in which people would just rote learn chunks of the proof, perhaps giving each chunk some name (a bit like how people learned bArbArA and cElArEnt syllogisms in the Middle Ages). And in terms of these chunks there’d presumably then be a “human way” to talk about the proof. But learning the chunks—other than as some kind of recreational or devotional activity—doesn’t seem to make much sense unless there’s metamathematical structure that somehow connects the chunks to “general concepts” that are widely useful elsewhere.

But of course it’s still conceivable that there might be a “big theory” that would lead us to the theorem in an “understandable way”. And that could be a traditional mathematical theory, built up with precise, if potentially very abstract, constructs. But what about something more like a theory in natural science? In which we might treat our automatically generated proof as an object for empirical study—exploring its characteristics, trying to get intuition about it, and ultimately trying to deduce the analog of “natural laws” that give us a “human-level” way of understanding it.

Of course, for many purposes it doesn’t really matter why the theorem is true. All that matters is that it is true, and that one can deduce things on the basis of it. But as one thinks about the future of mathematics, and the future of doing mathematics, it’s interesting to explore to what extent it might or might not ultimately be possible to understand in a human-accessible way the kind of seemingly alien result that the theorem represents.

The Proof as We Know It

I first presented a version of the proof on two pages of my 2002 book A New Kind of Science, printing it in 4-point type to make it fit:

Today, generating a very similar proof is a one-liner in Wolfram Language (as we’ll discuss below, the · dot here can be thought of as representing the Nand operation):

The proof involves 307 (mostly rather elaborate) steps. And here’s one page of it (out of about 30)—presented in the form of a computable Wolfram Language dataset:

What’s the basic idea of this proof? Essentially it’s to perform a sequence of purely structural symbolic operations that go from our axiom to known axioms of Boolean algebra. And the proof does this by proving a series of lemmas which can be combined to eventually give what we want:

The highlighted “targets” here are the standard Sheffer axioms for Boolean algebra from 1913:

And, yes, even though these are quite short, the intermediate lemmas involved in the proof get quite long—the longest involving 60 symbols (i.e. having LeafCount 60):

It’s as if to get to where it’s going, the proof ends up having to go through the wilds of metamathematical space. And indeed one gets a sense of this if one plots the sizes (i.e. LeafCount) of successive lemmas:

Here’s the distribution of these sizes, showing that while they’re often small, there’s a long tail (note, by the way, that if dot · appears n times in a lemma, the LeafCount will be 2n + 3):

So how are these lemmas related? Here’s a graph of their interdependence (with the size of each dot being proportional to the size of the lemma it represents):

Zooming in on the top we see more detail:

We start from our axiom, then derive a whole sequence of lemmas—as we’ll see later, always combining two lemmas to create a new one. (And, yes, we could equally well call these things theorems—but we generate so many of them it seems more natural to call them “lemmas”.)

So, OK, we’ve got a complicated proof. But how can we check that it’s correct? Well, from the symbolic representation of the proof in the Wolfram Language we can immediately generate a “proof function” that in effect contains executable versions of all the lemmas—implemented using simple structural operations:

And when you run this function, it applies all these lemmas and checks that the result comes out right:

And, yes, this is basically what one would do in a proof assistant system (like Lean or Metamath)—except that here the steps in the proof were generated purely automatically, without any human guidance (or effort). And, by the way, the fact that we can readily translate our symbolic proof representation into a function that we can run provides an operational manifestation of the equivalence between proofs and programs.

But let’s look back at our lemma-interdependence “proof graph”. One notable feature is that we see several nodes with high out-degree—corresponding to what we can think of as “pivotal lemmas” from which many other lemmas end up directly being proved. So here’s a list of the “most pivotal” lemmas in our proof:

Or, more graphically, here are the results for all lemmas that occur:

So what are the “pivotal lemmas”? a · b = b · a we readily recognize as commutativity. But the others—despite their comparative simplicity—don’t seem to correspond to things that have specifically shown up before in the mathematical literature (or, as we’ll discuss later, that’s at least what the current generation of LLMs tell us).

But looking at our proof graph something we can conclude is that a large fraction of the “heavy lifting” needed for the whole proof has already happened by the time we can prove a · b = b · a. So, for the sake of avoiding at least some of hairy detail in the full proof, in most of what follows, we’ll concentrate on the proof of a · b = b · a—which FindEquationalProof tells us we can accomplish in 104 steps, with a proof graph of the form

with the sizes of successive lemmas (in what is basically a breadth-first traversal of the proof graph) being:

The “Machine Code” of the Proof

It’s already obvious from the previous section that the proof as we currently know it is long, complicated, and fiddly—and in many ways reminiscent of something at a “machine-code” level. But to get a grounded sense of what’s going on in the proof, it’s useful to dive into the details—even if, yes, they can be seriously hard to wrap one’s head around.

At a fundamental level, the way the proof—say of a · b = b · a—works is by starting from our axiom, and then progressively deducing new lemmas from pairs of existing lemmas. In the simplest case, that deduction works by straightforward symbolic substitution. So, for example, let’s say we have the lemmas

and

Then it turns out that from these lemmas we can deduce:

Or, in other words, knowing that the first two lemmas hold for any a gives us enough information about · that the third lemma must inevitably also hold. So how do we derive this?

Our lemmas in effect define two-way equivalences: their left-hand sides are defined as equal to their right-hand sides, which means that if we see an expression that (structurally) matches one side of a lemma, we can always replace it by the other side of the lemma. And to implement this, we can write our second lemma explicitly as a rule—where to avoid confusion we’re using x rather than a:

But if we now look at our first lemma, we see that there’s part of it (indicated with a frame) that matches the left-hand side of our rule:

If we replace this part (which is at position {2,2}) using our rule we then get

which is precisely the lemma we wanted to deduce.

We can summarize what happened here as a fragment of our proof graph—in which a “substitution event” node takes our first two lemmas as input, and “outputs” our final lemma:

As always, the symbolic expressions we’re working with here can be represented as trees:

The substitution event then corresponds to a tree rewriting:

The essence of automated theorem proving is to find a particular sequence of substitutions etc. that get us from whatever axioms or lemmas we’re starting with, to whatever lemmas or theorems we want to reach. Or in effect to find a suitable “path” through the multiway graph of all possible substitutions etc. that can be made.

So, for example, in the particular case we’re considering here, this is the graph that represents all possible transformations that can occur through a single substitution event:

The particular transformation (or “path”) we’ve used to prove a · a = a · ((a · a) · a) is highlighted. But as we can see, there are many other possible lemmas that can be generated, or in other words that can be proved from the two lemmas we’ve given as input. Put another way, we can think of our input lemmas as implying or entailing all the other lemmas shown here. And, by analogy to the concept of a light cone in physics, we can view the collection of everything entailed by given lemmas or given events as the (future) “entailment cone” of those lemmas or events. A proof that reaches a particular lemma is then effectively a path in this entailment cone—analogous in physics to a world line that reaches a particular spacetime point.

If we continue building out the entailment cone from our original lemmas, then after two (substitution) events we get:

There are 21 lemmas generated here. But it turns out that beyond the lemma we already discussed there are only three (highlighted here) that appear in the proof we are studying here:

And indeed the main algorithmic challenge of theorem proving is to figure out which lemmas to generate in order to get a path to the theorem one’s trying to prove. And, yes, as we’ll discuss later, there are typically many paths that will work, and different algorithms will yield different paths and therefore different proofs.

But, OK, seeing how new lemmas can be derived from old by substitution is already quite complicated. But actually there’s something even more complicated we need to discuss: deriving lemmas not only by substitution but also by what we’ve called bisubstitution.

We can think of both substitution and bisubstitution as turning one lemma X == Y into a transformation rule (either X Y or Y X), and then applying this rule to another lemma, to derive a new lemma. In ordinary substitution, the left-hand side of the rule directly matches (in a Wolfram Language pattern-matching sense) a subexpression in the lemma we’re transforming. But the key point is that all the variables that appear in both our lemmas are really “pattern variables” (x_ etc. in Wolfram Language). So that means there’s another way that one lemma can transform another, in which in effect replacements are made not only in the lemma being transformed, but also in the lemma that’s doing the transforming.

The net effect, though, is still to take two lemmas and derive another, as in:

But in tracing through the details of our proof, we need to distinguish “substitution events” (shown yellowish) from “bisubstitution” ones (shown reddish). (In FindEquationalProof in Wolfram Language, lemmas produced by ordinary substitution are called “substitution lemmas”, while lemmas produced by bisubstitution are called “critical pair lemmas”.)

OK, so how does bisubstitution work? Let’s look at an example. We’re going to be transforming the lemma

using the lemma (which in this case happens to be our original axiom)

to derive the new lemma:

We start by creating a rule from the second lemma. In this case, the rule we need happens to be reversed relative to the way we wrote the lemma, and this means that (in the canonical form we’re using) it’s convenient to rename the variables that appear:

To do our bisubstitution we’re going to apply this rule to a subterm of our first lemma. We can write that first lemma with explicit pattern variables:

As always, the particular names of those variables don’t matter. And to avoid confusion, we’re going to rename them:

Now look at this subterm of this lemma (which is part {2,1,1,2} of the expression):

It turns out that with appropriate bindings for pattern variables this can be matched (or “unified”) with the left-hand side of our rule. This provides a way to find such bindings:

(Note that in these bindings things like c_ stand only for explicit expressions, like c_, not for expressions that the ordinary Wolfram Language pattern c_ would match.)

Now if we apply the bindings we’ve found to the left-hand side of our rule

and to the subterm we picked out from our lemma

we see that we get the same expression. Which means that with these bindings the subterm matches the left-hand side of our rule, and we can therefore replace this subterm with the right-hand side of the rule. To see all this in operation, we first apply the bindings we’ve found to the lemma we’re going to transform (and, as it happens, the binding for y_ is the only one that matters here):

Now we take this form and apply the rule at the position of the subterm we identified:

Renaming variables

we now finally get exactly the lemma that we were trying to derive:

And, yes, getting here was a pretty complicated process. But with the symbolic character of our lemmas, it’s one that is inevitably possible, and so can be used in our proof. And in the end, out of the 101 lemmas used in the proof, 47 were derived by ordinary substitution, while 54 were derived by bisubstitution.

And indeed the first few steps of the proof turn out to use only bisubstituion. An example is the first step—which effectively applies the original axiom to itself using bisubsitution:

And, yes, even this very first step is pretty difficult to follow.

If we start from the original axiom, there are 16 lemmas that can be derived purely by a single ordinary substitution (effectively of the axiom into itself)—resulting in the following entailment cone:

As it happens, though, none of the 16 new lemmas here actually get used in our proof. On the other hand, in the bisubstitution entailment cone

there are 24 new lemmas, and 4 of them get used in the proof—as we can see from the first level of the proof graph (here rotated for easier rendering):

At the next level of the entailment cone from ordinary substitution, there are 5062 new lemmas—none of which get used in the proof. But of the 31431 new lemmas in the (pure) bisubstitution entailment cone, 13 do get used:

At the next level, lemmas generated by ordinary substitution also start to get used:

Here’s another rendering of these first few levels of the proof graph:

Going to another couple of levels we’re starting to see quite a few independent chains of lemmas developing

which eventually join up when we assemble the whole proof graph:

A notable feature of this proof graph is that it has more bisubstitution events at the top, and more ordinary substitution events at the bottom. So why is that? Essentially it seems to be because bisubstitution events tend to produce larger lemmas, and ordinary substitution events tend to produce smaller ones—as we can see if we plot input and output lemma sizes for all events in the proof:

So in effect what seems to be happening is that the proof first has to “spread out in metamathematical space”, using bisubstitution to generate large lemmas “far out in metamathematical space”. Then later the proof has to “corral things back in”, using ordinary substitution to generate smaller lemmas. And for example, at the very end, it’s a substitution event that yields the final theorem we’re trying to prove:

And earlier in the graph, there’s a similar “collapse” to a small (and rather pivotal) lemma:

As the plot above indicates, ordinary substitution can lead to large lemmas, and indeed bisubstitution can also lead to smaller ones, as in

or slightly more dramatically:

But, OK, so this is some of what’s going on at a “machine-code” level inside the proof we have. Of course, given our axiom and the operations of substitution and bisubstitution there are inevitably a huge number of different possible proofs that could be given. The particular proof we’re considering is what the Wolfram Language FindEquationalProof gives. (In the Appendix, we’ll also look at results from some other automated theorem proving systems. The results will be very comparable, if usually a little lengthier.)

We won’t discuss the detailed (and rather elaborate) algorithms inside FindEquationalProof. But fundamentally what they’re doing is to try constructing certain lemmas, then to find sequences of lemmas that eventually form a “path” to what we’re trying to prove. And as some indication of what’s involved in this, here’s a plot of the number of “candidate lemmas” that are being maintained as possible when different lemmas in the proof are generated:

And, yes, for a while there’s roughly exponential growth, leveling off at just over a million when we get to the “pulling everything together” stage of the proof.

Unrolling the Proof

In what we’ve done so far, we’ve viewed our proof as working by starting from an axiom, then progressively building up lemmas, until eventually we get to the theorem we want. But there’s an alternative view that’s in some ways useful in getting a more direct, “mechanical” intuition about what’s going on in the proof.

Let’s say we’re trying to prove that our axiom implies that p · q = q · p. Well, then there must be some way to start from the expression p · q and just keep on judiciously applying the axiom until eventually we get to the expression q · p. And, yes, the number of axiom application steps required might be very large. But ultimately, if it’s true that the axiom implies p · q = q · p there must be a path that gets from p · q to q · p.

But before considering the case of our full proof, let’s start with something simpler. Let’s assume that we’ve already established the lemmas:

Then we can treat them as axioms, and ask a question like whether they imply the lemma

or, in our current approach, whether they can be used to form a path from a · a to a · (a · (a · a)).

Well, it’s not too hard to see that in fact there is such a path. Apply our second lemma to a · a to get:

But now this subterm

matches the left-hand of the first lemma, so that it can be replaced by the right-hand side of that lemma (i.e. by a · (a · a)), giving in the end the desired a · (a · (a · a)).

So now we can summarize this process as:

In what follows, it’ll be convenient to label lemmas. We’ll call our original axiom A1, we’ll call our successive lemmas generated by ordinary substitution Sn and the ones generated by bisubsitution Bn:

In our proof we’ll also use and to indicate whether we’re going to use the lemma (say X = Y) in the “forward direction” X Y or the “reverse direction” X Y. And with this labeling, the proof we just gave (which is for the lemma S23) becomes:

Each step here is a pure substitution, and requires no replacement in the rule (i.e. “axiom”) being used. But proofs like this can also be done with bisubstitution, where replacements are applied to the rule to get it in a form where it can directly be applied to transform an expression:

OK, so how about the first lemma in our full proof? Here’s a proof that its left-hand side can be transformed to its right-hand side just by judiciously applying the original axiom:

Here’s a corresponding proof for the second lemma:

Both these involve bisubstitution. Here’s a proof of the first lemma derived purely by ordinary substitution:

This proof is using not only the original axiom but also the lemma B5. Meanwhile, B5 can be proved using the original axiom together with B2:

But now, inserting the proof we just gave above for B2, we can give a proof of B5 just in terms of the original axiom:

And recursively continuing this unrolling process, we can then prove S1 purely using the original axiom:

What about the whole proof? Well, at the very end we have:

If we “unroll” one step we have

and after 2 steps:

In principle we could go on with this unrolling, in effect recursively replacing each rule by the sequence of transformations that represents its proof. Typically this process will, however, generate exponentially longer proof sequences. But say for lemma S5

the result is still very easily manageable:

We can summarize this result by in effect plotting the sizes of the intermediate expressions involved—and indicating what part of each expression is replaced at each step (with as above indicating “forward” use of the axiom A1 and “backward” A1 ):

For lemma B33

the unrolled proof is now 30 steps long

while for lemma S11

the unrolled proof is 88 steps long:

But here there is a new subtlety. Doing a direct substitution of the “proof paths” for the lemmas used to prove S11 in our original proof gives a proof of length 104:

But this proof turns out to be repetitive, with the whole gray section going from one copy to another of:

As an example of a larger proof, we can consider lemma B47:

And despite the simplicity of this lemma, our proof for it is 1008 steps long:

If we don’t remove repetitive sections, it’s 6805 steps:

Can we unroll the whole proof of a · b = b · a? We can get closer by considering lemma S36:

Its proof is 27105 steps long:

The distribution of expression sizes follows a roughly exponential distribution, with a maximum of 20107:

Plotting the expression sizes on a log scale one gets:

And what stands out most here is a kind of recursive structure—which is the result of long sequences that basically represent the analog of “subroutine calls” that go back and repeatedly prove lemmas that are needed.

OK, so what about the whole proof of a · b = b · a? Yes, it can be unrolled—in terms of 83,314 applications of the original axiom. The sequence of expression sizes is:

Or on a log scale:

The distribution of expression sizes now shows clear deviation from being exponential:

The maximum is 63245, which occurs just 81 steps after the exact midpoint of the proof. In other words, in the middle, the proof has wandered incredibly far out in metamathematical space (there are altogether CatalanNumber[63245] ≈ 10³⁸⁰⁷⁰ possible expressions of the size it reaches).

The proof returns to small expressions just a few times; here are all the cases in which the size is below 10:

So, yes, it is possible to completely unroll the proof into a sequence of applications of the original axiom. But if one does this, it inevitably involves repeating lots of work. Being able to use intermediate lemmas in effect lets one “share common subparts” in the proof. So that one ends up with just 104 “rule applications”, rather than 83314. Not that it’s easy to understand those 104 steps…

Is There a Better Notation?

Looking at our proof—either in its original “lemma” form, or in its “unrolled” form—the most striking aspect of it is how complicated (and incomprehensible) it seems to be. But one might wonder whether much of that complexity is just the result of not “using the right notation”. In the end, we’ve got a huge number of expressions written in terms of · operations that we can interpret as Nand (or Nor). And maybe it’s a little like seeing the operation of a microprocessor down at the level of individual gates implementing Nands or Nors. And might there perhaps be an analog of a higher-level representation—with higher-level operations (even like arithmetic) that are more accessible to us humans?

It perhaps doesn’t help that Nand itself is a rather non-human construct. For example, not a single natural human language seems to have a word for Nand. But there are combinations of Nands that have more familiar interpretations:

But what combinations actually occur in our proof? Here are the most common subexpressions that appear in lemmas in the proof:

And, yes, we could give the most common of these special names. But it wouldn’t really help in “compressing” the proof—or making it easier to understand.

What about “upgrading” our “laws of inference”, i.e. the way that we can derive new lemmas from old? Perhaps instead of substitution and bisubstitution, which both take two lemmas and produce one more, we could set up more elaborate “tactics” that for example take in more input lemmas. We’ve seen that if we completely unroll the proof, it gets much longer. So perhaps there is a “higher-order” setup that for example dramatically shortens the proof.

One way one might identify this is by seeing commonly repeating structures in the subgraphs that lead to lemmas. But in fact these subgraphs are quite diverse:

What Are the Popular Lemmas?

A typical feature of human-written mathematical proofs is that they’re “anchored” by famous theorems or lemmas. They may have fiddly technical pieces. But usually there’s a backbone of “theorems people know”.

We have the impression that the proof we’re discussing here “spends most of its time wandering around the wilds of metamathematical space”. But perhaps it visits waypoints that are somehow recognizable, or at least should be. Or in other words, perhaps out in the metamathematical space of lemmas there are ones that are somehow sufficiently popular that they’re worth giving names to, and learning—and can then be used as “reference points” in terms of which our proof becomes simpler and more human accessible.

It’s a story very much like what happens with human language. There are things out there in the world, but when there’s a certain category of them that are somehow common or important enough, we make a word for them in our language, which we can then use to “compactly” refer to them. (It’s again the same story when it comes to computational language, and in particular the Wolfram Language, except that in that case it’s been my personal responsibility to come up with the appropriate definitions and names for functions to represent “common lumps of computation”.)

But, OK, so what are the “popular lemmas” of Nand proofs? One way to explore this is to enumerate statements that are “true about Nand”—then to look at proofs of these statements (say found with FindEquationalProof from our axiom) and see what lemmas show up frequently in them.

Enumerating statements “true about Nand”, starting from the smallest, we get

where we have highlighted statements from this list that appear as lemmas in our proof.

Proving each of these statements from our original axiom, here are the lengths of proofs we find (for all 1341 distinct theorems with up to LeafCount 4 on each side):

A histogram shows that it’s basically a bimodal distribution

with the smallest “long-proof” theorem being:

In aggregate, all these proofs use about 200,000 lemmas. But only about 1200 of these are distinct. And we can plot which lemmas are used in which proofs—and we see that there are indeed many lemmas that are used across wide ranges of proofs, while there are a few others that are “special” to each proof (the diagonal stripe is associated with lemmas close to the statement being proved):

If we rank all distinct lemmas from most frequently to least frequently used, we get the following distribution of lemma usage frequencies across all our proofs:

It turns out that there is a “common core” of 49 lemmas that are used in every single one of the proofs. So what are these lemmas? Here’s a plot of the usage frequency of lemmas against their size—with the “common ones” populating the top line:

And at first this might seem surprising. We might have expected that short lemmas would be the most frequent, but instead we’re seeing long lemmas that always appear, the very longest being:

So why is this? Basically it’s that these long lemmas are being used at the beginning of every proof. They’re the result of applying bisubstitution to the original axiom, and in some sense they seem to be laying down a kind of net in metamathematical space that then allows more diverse—and smaller—lemmas to be derived.

But how are these “common core” popular lemmas distributed within proofs? Here are a few examples:

And what we see is that while, yes, the common core lemmas are always at the beginning, they don’t seem to have a uniform way of “plugging into” the rest of the proof. And it doesn’t, for example, seem as if there’s just some small set of (perhaps simple) “waypoint” lemmas that one can introduce that will typically shorten these proofs.

If one effectively allows all the common core lemmas to be used as axioms, then inevitably proofs will be shortened; for example, the proof of a · b = b · a—which only ends up using 5 of the common core lemmas—is now shortened to 51 lemmas:

It doesn’t seem to become easier to understand, though. And if it’s unrolled, it’s still 5013 steps.

Still, one can ask what happens if one just introduces particular “recognizable” lemmas as additional axioms. For example, if we include “commutativity” a · b = b · a then we find that, yes, we do manage to reduce the lengths of some proofs, but certainly not all:

Are there any other “pivotal” lemmas we could add? In particular, what about lemmas that can help with the length-200 or more proofs? It turns out that all of these proofs involve the lemma:

So what happens if we add this? Well, it definitely reduces proof lengths:

And sometimes it even seems like it brings proofs into “human range”. For example, a proof of

from our original axiom has length 56. Adding in commutativity reduces it to length 18. And adding our third lemma reduces it to just length 9—and makes it not even depend directly on the original axiom:

But despite the apparent simplicity here, the steps involved—particularly when bisubstitution is used—are remarkably hard to follow. (Note the use of a = a as a kind of “implicit axiom”—something that has actually also appeared, without comment, in many of our other proofs.)

Can We Get a Shorter Proof?

The proof that we’ve been studying can be seen in some ways as a rather arbitrary artifact. It’s the output of FindEquationalProof, with all its specific detailed internal algorithms and choices. In the Appendix, we’ll see that other automated theorem proving systems give very similar results. But we still might wonder whether actually the complexity of the proof as we’ve been studying it is just a consequence of the details of our automated theorem proving—and that in fact there’s a much shorter (and perhaps easier to understand) proof that exists.

One approach we could take—reminiscent of higher category theory—is to think about just simplifying the proof we have, effectively using proof-to-proof transformations. And, yes, this is technically difficult, though it doesn’t seem impossible. But what if there are “holes” in proof space? Then a “continuous deformation” of one proof into another will get stuck, and even if there is a much shorter proof, we’re liable to get “topologically stuck” before we find it.

One way to be sure we’re getting the shortest proof of a particular lemma is to explicitly find the first place that lemma appears in the (future) entailment cone of our original axiom. For example, as we saw above, a single substitution event leads to the entailment cone:

Every lemma produced here is, by construction, in principle derivable by a proof involving a single substitution event. But if we actually use FindEquationalProof to prove these lemmas, the proofs we get most involve 2 events (and in one case 4):

If we take another step in the entailment cone, we get a total of 5062 lemmas. From the way we generated them, we know that all these lemmas can in principle be reached by proofs of length 2. But if we run FindEquationalProof on them, we find a distribution of proof lengths:

And, yes, there is one lemma (with LeafCount 183) that is found only by a proof of length 15. But most often the proof length is 4—or about double what it could be.

If we generate the entailment cone for lemmas using bisubstitution rather than just ordinary substitution, there are slightly more cases where FindEquationalProof does worse at getting minimal proofs.

For example, the lemma

and 3 others can be generated by a single bisubstitution from the original axiom, but FindEquationalProof gives only proofs of length 4 for all of these.

What about unrolled proofs, in which one can generate an entailment cone by starting from a particular expression, and then applying the original axiom in all possible ways? For example, let’s say we start with:

Then applying bisubstitution with the original axiom once in all possible ways gives:

Applying bisubstitution a second time gives a larger entailment cone:

But now it turns out that—as indicated—one of the expressions in this cone is:

So this shows that the lemma

can in principle be reached with just two steps of “unrolled” proof:

And in this particular case, if we use FindEquationalProof and then unroll the resulting proof we also get a proof of length 3—but it goes through a different intermediate expression:

As it happens, this intermediate expression is also reached in the entailment cone that we get by starting from our “output” expression and then applying two bisubsitutions:

What Actually Is the “·”? Models and the Proof

We can think of logic (or Boolean algebra) as being associated with a certain collection of theorems. And what our axiom does is to provide something from which all theorems of logic (and nothing but theorems of logic) can be derived. At some level, we can think of it as just being about symbolic expressions. But in our effort to understand what’s going on—say with our proof—it’s sometimes useful to ask how we can “concretely” interpret these expressions.

For example, we might ask what the · operator actually is. And what kinds of things can our symbolic variables be? In effect we’re asking for what in model theory are called “models” of our axiom system. And in aligning with logic the most obvious model to discuss is one in which variables can be True or False, and the · represents either the logical operator Nand or the logical operator Nor.

The truth table, say for Nand, is:

And as expected, with this model for ·, we can confirm that our original axiom holds:

In general, though, our original axiom allows two size-2 models (that we can interpret as Nand and Nor):

It allows no size-3 models, and in fact in general allows only models of size 2ⁿ; for example, for size 4 its models are:

So what about a · b = b · a? What models does it allow? For size 2, it’s all 8 possible models with symmetric “multiplication tables”:

But the crucial point is that the 2 models for our original axiom system are part of these. In other words, at least for size-2 models, satisfying the original axiom system implies satisfying a · b = b · a.

And indeed any lemma derived from our axiom system must allow the models associated with our original axiom system. But it may also allow more—and sometimes many more. So here’s a map of our proof, showing how many models (out of 16 possible) each lemma allows:

Here are the results for size-3 models:

And, once again, these look complicated. We can think of models as defining—in some sense—what lemmas are “about”. So, for example, our original axiom is “about” Nand and Nor. The lemma a · b = b · a is “about” symmetric functions. And so on. And we might have hoped that we could gain some understanding of our proof by looking at how different lemmas that occur in it “sculpt” what is being talked about. But in fact we just seem to end up with complicated descriptions of sets that don’t seem to have any obvious relationship with each other.

What about a Higher-Level Abstraction?

If there’s one thing that stands out about our proof—and the analysis we’ve given of it here—it’s how fiddly and “in the weeds” it seems to be. But is that because we’re missing some big picture? Is there actually a more abstract way of discussing things, that gets to our result without having to go through all the details?

In the history of mathematics many of the most important themes have been precisely about finding such higher-level abstractions. We could start from the explicit symbolic axioms

or even

and start building up theorems much as we’ve done here. Or we could recognize that these are axioms for group theory, and then start using the abstract ideas of group theory to derive our theorems.

So is there some higher-level version of what we’re discussing here? Remember that the issue is not about the overall structure of Boolean algebra; rather it’s about the more metamathematical question of how one can prove that all of Boolean algebra can be generated from the axiom:

In the last few sections we’ve tried a few semi-empirical approaches to finding higher-level representations. But they haven’t gotten very far. And to get further we’re probably going to need a serious new idea.

And, if history is a guide, we’re going to need to come up with an abstraction that somehow “goes outside of the system” before “coming back”. It’s like trying to figure out the real roots of a cubic equation, and realizing that the best way to do this is to introduce complex numbers, even though the imaginary parts will cancel at the end.

In the direct exploration of our proof, it feels as if the intermediate lemmas we generate “wander off into the wilds of metamathematical space” before coming back to establish our final result. And if we were using a higher-level abstraction, we’d instead be “wandering off” into the space of that abstraction. But what we might hope is that—at least with the concepts we would use in discussing that abstraction—the path that would be involved would be “short enough to be accessible to human understanding”.

Will we be able to find such an abstraction? It’s a subtle question. Because in effect it asks whether we can reduce the computational effort needed for the proof—or, in other words, whether we can find a pocket of computational reducibility in what in general will be a computationally irreducible process. But it’s not a question that can really be answered just for our specific proof on it own. After all, our “abstraction” could in principle just involve introducing a primitive that represents our whole proof or a large part of it. But to make it what we can think of as a real abstraction we need something that spans many different specific examples—and, in our case, likely many axiomatic systems or symbolic proofs.

So is such an abstraction possible? In the history of mathematics the experience has been that after enough time (often measured in centuries) has passed, abstractions tend to be found. But at some level this has been self fulfilling. Because the areas that are considered to have remained “interesting for mathematics” tend to be just those where general abstractions have in fact been found.

In ruliology, though, the typical experience has been different. Because there it’s been routine to sample the computational universe of possible simple programs and encounter computational irreducibility. In the end it’s still inevitable that among the computational irreducibility there must be pockets of computational reducibility. But the issue is that these pockets of computational reducibility may not involve features of our system that we care about.

So is a proof of the kind we’re discussing here more like ruliology, or more like “typical mathematics”? Insofar as it’s a mathematical-style proof of a mathematical statement it feels more like typical mathematics. But insofar as it’s something found by the computational process of automated theorem proving it perhaps seems more ruliology.

But what might a higher-level abstraction for it look like? Figuring that out is probably tantamount to finding the abstraction. But perhaps one can at least expect that in some ways it will be metamathematical, and more about the structure and character of proofs than about their content. Perhaps it will be something related to the framework of higher category theory, or some form of meta-algebra. But as of now, we really don’t know—and we can’t even say that such an abstraction with any degree of generality is possible.

LLMs to the Rescue?

The unexpected success of LLMs in language generation and related tasks has led to the idea that perhaps eventually systems like LLMs will be able to “do everything”—including for example math. We already know—not least thanks to Wolfram Language—that lots of math can be done computationally. But often the computations are hard—and, as in the example of the proof we’re discussing here, incomprehensible to humans. So the question really is: can LLMs “humanize” what has to be done in math, turning everything into a human-accessible narrative? And here our proof seems like an excellent—if challenging—test case.

But what happens if we just ask a current LLM to generate the proof from scratch? It’s not a good picture. Very often the LLM will eagerly generate a proof, but it’ll be completely wrong, often with the same kind of mistakes that a student somewhat out of their depth might make. Here’s a typical response where an LLM simply assumes that the · operator is associative (which it isn’t in Boolean algebra) then produces a proof that on first blush looks at least vaguely plausible, but is in fact completely wrong:

Inadequate LLM proof

Coming up with an explanation for what went wrong is basically an exercise in “LLM psychology”. But in a first approximation one might say the following. LLMs are trained to “fill in what’s typical”, where “typical” is defined by what appears in the training set. But (absent some recent Wolfram Language and Wolfram|Alpha based technology of ours) what’s been available as a training set has been human-generated mathematical texts, where, yes, operators are often associative, and typical proofs are fairly short. And in the “psychology of LLMs” an LLM is much more likely to “do what’s typical” than to “rigorously follow the rules”.

If you press the LLM harder, then it might just “abdicate”, and suggest using the Wolfram Language as a tool to generate the proof. So what happens if we do that, then feed the finished proof to the LLM and ask it to explain? Well, typically it just does what LLMs do so well, and writes an essay:

LLM proof essay

So, yes, it does fine in “generally framing the problem”. But not on the details. And if you press it for details, it’ll typically eventually just start parroting what it was given as input.

How else might we try to get the LLM to help? One thing I’ve certainly wondered is how the lemmas in the proof relate to known theorems—perhaps in quite different areas of mathematics. It’s something one might imagine one would be able to answer by searching the literature of mathematics. But, for example, textual search won’t be sufficient: it has to be some form of semantic search based on the meaning or symbolic structure of lemmas, not their (fairly arbitrary) textual presentation. A vector database might be all one needs, but one can certainly ask an LLM too:

LLM semantic search results

It’s not extremely helpful, though, charmingly, it correctly identifies the source of our original axiom. I’ve tried similar queries for our whole set of lemmas across a variety of LLMs, with a variety of RAG systems. Often the LLM will talk about an interpretation for some lemma—but the lemma isn’t actual present in our proof. But occasionally the LLM will mention possible connections (“band theory”; “left self-distributive operations in quandles”; “Moufang loops”)—though so far none have seemed to quite hit the mark.

And perhaps this failure is itself actually a result—telling us that the lemmas that show up in our proof really are, in effect, out in the wilds of metamathematical space, probing places that haven’t ever been seriously visited before by human mathematics.

But beyond LLMs, what about more general machine learning and neural net approaches? Could we imagine using a neural net as a probe to find “exploitable regularities” in our proof? It’s certainly possible, but I suspect that the systematic algorithmic methods we’ve already discussed for finding optimal notations, popular lemmas, etc. will tend to do better. I suppose it would be one thing if our systematic methods had failed to even find a proof. Then we might have wanted something like neural nets to try to guess the right paths to follow, etc. But as it is, our systematic methods rather efficiently do manage to successfully find a proof.

Of course, there’s still the issue that we’re discussing here that the proof is very “non-human”. And perhaps we could imagine that neural nets, etc.—especially when trained on existing human knowledge—could be used to “form concepts” that would help us humans to understand the proof.

We can get at least a rough analogy for how this might work by looking at visual images produced by a generative AI system trained from billions of human-selected images. There’s a concept (like “a cube”) that exists somewhere in the feature space of possible images. But “around” that concept are other things—“out in interconcept space”—that we don’t (at least yet) explicitly have words for:

And it’ll presumably be similar for math, though harder to represent in something like a visual way. There’ll be existing math concepts. But these will be embedded in a vast domain of “mathematical interconcept space” that we humans haven’t yet “colonized”. And what we can imagine is that—perhaps with the help of neural nets, etc.—we can identify a limited number of “points in interconcept space” that we can introduce as new concepts that will, for example, provide useful “waypoints” in understanding our proof.

But Why Is the Theorem True?

It’s a common human urge to think that anything that’s true must be true for a reason. But what about our theorem? Why is it true? Well, we’ve seen a proof. But somehow that doesn’t seem satisfactory. We want “an explanation we can understand”. But we know that in general we can’t always expect to get one.

It’s a fundamental implication of computational irreducibility that things can happen where the only way to “see how they happen” is just to “watch them happen”; there’s no way to “compress the explanation”.

Consider the following patterns. They’re all generated by cellular automata. And all live exactly 100 steps before dying out. But why?

In a few cases it seems like we can perhaps at least begin to imagine “narratively describing” a mechanism. But most of the time all we can say is basically that they “live 100 steps because they do”.

It’s a quintessential consequence of computational irreducibility. It might not be what we’d expect, or hope for. But it’s reality in the computational universe. And it seems very likely that our theorem—and its proof—is like this too. The theorem in effect “just happens to be true”—and if you run the steps in the proof (or find the appropriate path in the entailment cone) you’ll find that it is. But there’s no “narrative explanation”. No “understanding of why it’s true”.

Intuition and Automated Theorem Proving

We’ve been talking a lot about the proof of our theorem. But where did the theorem to prove come from in the first place? Its immediate origin was an exhaustive search I did of simple axiom systems, filtering for ones that could conceivably generate Boolean algebra, followed by testing each of the candidates using automated theorem proving.

But how did I even get the idea of searching for a simple axiom system for Boolean algebra? Based on the axiom systems for Boolean algebra known before—and the historical difficulty of finding them—one might have concluded that it was quite hopeless to find an axiom system for Boolean algebra by exhaustive search. But by 2000 I had nearly two decades of experience in exploring the computational universe—and I was well used to the remarkable phenomenon that even very simple computational rules can lead to behavior of great complexity. So the result was that when I came to think about axiom systems and the foundations of mathematics my intuition led me to imagine that perhaps the simplest axiom system for something like Boolean algebra might be simple enough to exhaustively search for.

And indeed discovering the axiom system we’ve discussed here helped further expand and deepen my intuition about the consequences of simple rules. But what about the proof? What intuition might one get from the proof as we now know it, and as we’ve discussed here?

There’s much intuition to be got from observing the world as it is. But for nearly half a century I’ve had another crucial source of intuition: observing the computational universe—and doing computational experiments. I was recently reflecting on how I came to start developing intuition in this way. And what it might mean for intuition I could now develop from things like automated theorem proving and AI.

Back in the mid-1970s my efforts in particle physics led me to start using computers to do not just numerical, but also algebraic computations. In numerical computations it was usual to just get a few numbers out, that perhaps one could plot to make a curve. But in algebraic computations one instead got out formulas—and often very ornate ones full of structure and detail. And for me it was routine to get not just one formula, but many. And looking at these formulas I started to develop intuition about them. What functions would they involve? What algebraic form would they take? What kind of numbers would they involve?

I don’t think I ever consciously realized that I was developing a new kind of computationally based intuition. But I soon began to take it for granted. And when—at the beginning of the 1980s—I started to explore the consequences of simple abstract systems like cellular automata it was natural to expect that I would get intuition from just “seeing” how they behaved. And here there was also another important element. Because part of the reason I concentrated on cellular automata was precisely because one could readily visualize their behavior on a computer.

I don’t think I would have learned much if I’d just been printing out “numerical summaries” of what cellular automata do. But as it was, I was seeing their behavior in full detail. And—surprising though what I saw was—I was soon able to start getting an intuition for what could happen. It wasn’t a matter of knowing what the value of every cell would be. But I started doing things like identifying four general classes of cellular automata, and then recognizing the phenomenon of computational irreducibility.

By the 1990s I was much more broadly exploring the computational universe—always trying to see what could happen there. And in almost all cases it was a story of defining simple rules, then running them, and making an explicit step-by-step visualization of what they do—and thereby in effect “seeing computation in action”.

In recent years—spurred by our Physics Project—I’ve increasingly explored not just computational processes, but also multicomputational ones. And although it’s more difficult I’ve made every effort to visualize the behavior of multiway systems—and to get intuition about what they do.

But what about automated theorem proving? In effect, automated theorem proving is about finding a particular path in a multiway system that leads to a theorem we want. We’re not getting to see “complete behavior”; we’re in effect just seeing one particular “solution” for how to prove a theorem.

And after one’s seen many examples, the challenge once again is to develop intuition. And that’s a large part of what I’ve been trying to do here. It’s crucial, I think, to have some way to visualize what’s happening—in effect because visual input is the most efficient way to get information into our brains. And while the visualizations we’ve developed here aren’t as direct and complete as, say, for cellular automaton evolution, I think they begin to give some overall sense of our proof—and other proofs like it.

In studying simple programs like cellular automata, the intuition I developed led me to things like my classification of cellular automaton behavior, as well as to bigger ideas like the Principle of Computational Equivalence and computational irreducibility. So having now exposed myself to automated theorem proving as I exposed myself to algebraic computation and the running of simple rules in the past, what general principles might I begin to see? And might they, for example, somehow make the fact that our proof works ultimately seem “obvious”?

In some ways yes, but in other ways no. Much as with simple programs, there are axiom systems so simple that, for example, the multiway systems they generate are highly regular. But beyond a low threshold, it’s common to get very complicated—and in many ways seemingly random—multiway system structures. Typically an infinite number of lemmas are generated, with little or no obvious regularity in their forms.

And one can expect that—following the ideas of universal computation—it’ll typically be possible to encode in any one such multiway system the behavior of any other multiway system. In terms of axioms what one’s saying is that if one sets up the right translation between theorems, one will be able to use any one such axiom system to generate the theorems of any other. But the issue is that the translation will often make major changes to the structure of the theorems, and in effect define not just a “mathematical translation” (like between geometry and algebra) but a metamathematical one (as one would need to get from Peano arithmetic to set theory).

And what this means is that it isn’t surprising that even a very simple axiom system can generate a complicated set of possible lemmas. But knowing this doesn’t immediately tell one whether those lemmas will align with some particular existing theory—like Boolean algebra. And in a sense that’s a much more detailed question.

At some metamathematical level it might not be a natural question. But at a “mathematical level” it is. And it’s what we have to address in connection with the theorem—and proof—we’re discussing here. Many aspects of the overall form and properties of the proof will be quite generic, and won’t depend on the particulars of the axiom system we’re using. But some will. And quite what intuition we may be able to get about these isn’t clear. And perhaps it’ll necessarily be fragmented and specific—in effect responding to the presence of computational irreducibility.

It’s perhaps worth commenting that LLMs—and machine learning in general—represent another potential source of intuition. That intuition may well be more about the general features of us as observers and thinkers. But such intuition is potentially critical in framing just what we can experience, not only in the natural world, but also in the mathematical and metamathematical worlds. And perhaps the apparent impotence of LLMs when faced with the proof we’ve been discussing already tells us something significant about the nature of “mathematical observers” like us.

So What Does It Mean for the Future of Mathematics?

Let’s say we never manage to “humanize” the proof we’ve been discussing here. Then in effect we’ll end up with a “black-box theorem”—that we can be sure is true—but we’ll never know quite how or why. So what would that mean for mathematics?

Traditionally, mathematics has tended to operate in a “white box” kind of way, trying to build narrative and understanding along with “facts”. And in this respect it’s very different from natural science. Because in natural science much of our knowledge has traditionally been empirical—derived from observing the world or experimenting on it—and without any certainty that we can “understand its origins”.

Automated theorem proving of the kind we’re discussing here—or, for that matter, pretty much any exploratory computational experimentation—aligns mathematics much more with natural science, deriving what’s true without an expectation of having a narrative explanation of why.

Could one imagine practicing mathematics that way? One’s already to some extent following such a path as soon as one introduces axiom systems to base one’s mathematics on. Where do the axiom systems come from? In the time of Euclid perhaps they were thought of as an idealization of nature. But in more modern times they are realistically much more the result of human choice and human aesthetics.

So let’s say we determine (given a particular axiom system) that some black-box theorem is true. Well, then we can just add it, just as we could another axiom. Maybe one day it’ll be possible to prove P≠NP or the Riemann Hypothesis from existing axioms of mathematics (if they don’t in fact turn out to be independent). And—black box or not—we can expect to add them to what we assume in subsequent mathematics we do, much as they’re routinely added right now, even though their status isn’t yet known.

But it’s one thing to add one or two “black-box theorems”. But what happens when black-box theorems—that we can think of as “experimentally determined”—start to dominate the landscape of mathematics?

Well, then mathematics will take on much more of the character of ruliology—or of an experimental science. When it comes to the applications of mathematics, this probably won’t make much difference, except that in effect mathematics will be able to become much more powerful. But the “inner experience” of mathematics will be quite different—and much less “human”.

If one indeed starts from axioms, it’s not at the outset obvious why everything in mathematics should not be mired in the kind of alien-seeming metamathematical complexity that we’ve encountered in the discussion of our proof here. But what I’ve argued elsewhere is that the fact that in our experience of doing mathematics it’s not is a reflection of how “mathematical observers like us” sample the raw metamathematical structure generated by axioms (or ultimately by the subaxiomatic structure of the ruliad).

The physics analogy I’ve used is that we succeed in doing mathematics at a “fluid dynamics level”, far above the detailed “molecular dynamics level” of things like the proof we’ve discussed here. Yes, we can ask questions—like ones about the structure of our proof—that probe the axiomatic “molecular dynamics level”. But it’s an important fact that in doing what we normally think of as mathematics we almost never have to; there’s a coherent way to operate purely at the “fluid dynamics level”.

Is it useful to “dip down” to the molecular dynamics? Definitely yes, because that’s where we can readily do computations—like those in our proof, or in general those going on in the internals of the Wolfram Language. But a key idea in the design of the Wolfram Language is to provide a computational language that can express concepts at a humanized “fluid dynamics” level—in effect bridging between the way humans can think and understand things, and the way raw computation can be done with them.

And it’s notable that while we’ve had great success over the years in defining “human-accessible” high-level representations for what amount to the “inputs” and “outputs” of computations, that’s been much less true of the “ongoing processes” of computation—or, for example, of the innards of proofs.

Is there a good “human-level” way to represent proofs? If the proofs are short, it’s not too difficult (and the step-by-step solutions technology of Wolfram|Alpha provides a good large-scale example of what can be done). But—as we’ve discussed—computational irreducibility implies that some proofs will inevitably be long.

If they’re not too long, then at least some parts of them might be constructed by human effort, say in a system like a proof assistant. But as soon as there’s much automation (whether with automated theorem proving or with LLMs) it’s basically inevitable that one will end up with things that at least approach what we’ve seen with the proof we’re discussing here.

What can then be done? Well, that’s the challenge. Maybe there is some way to simplify, abstract or otherwise “humanize” the proof we’ve been discussing. But I rather doubt it. I think this is likely one of those cases where we inevitably find ourselves face to face with computational irreducibility.

And, yes, there’s important science (particularly ruliology) to do on the structures we see. But it’s not mathematics as it’s traditionally been practiced. But that’s not to say that the results that come out of things like our proof won’t be useful for mathematics. They will be. But they make mathematics more like an experimental science—where what matters most is in effect the input and output rather than a “publishable” or human-readable derivation in between. And where the key issue in making progress is less in the innards of derivations than in defining clear computational ways to express input and output. Or, in effect, in capturing “human-level mathematics” in the primitives and structure of computational language.

Appendix: What about a Different Theorem Proving System?

The proof we’ve been discussing here was created using FindEquationalProof in the Wolfram Language. But what if we were to use a different automated theorem proving system? How different would the results be? In the spectrum of things that automated theorem proving systems do, our proof here is on the difficult end. And many existing automated theorem proving systems don’t manage to do it all. But some of the stronger ones do. And in the end—despite their different internal algorithms and heuristics—it’s remarkable how similar the results they give are to those from the Wolfram Language FindEquationalProof (differences in the way lemmas vs. inference steps, etc. are identified make detailed quantitative comparisons difficult):

Thanks

Thanks to Nik Murzin of the Wolfram Institute for his extensive help as part of the Wolfram Institute Empirical Metamathematics Project. Also Roger Germundsson, Sergio Sandoval, Adam Strzebonski, Michael Trott, Liubov Tupikina, James Wiles and Carlos Zapata for input. Thanks to Arnim Buch and Thomas Hillenbrand for their work in the 1990s on Waldmeister which is now part of FindEquationalProof (also to Jonathan Gorard for his 2017 work on the interface for FindEquationalProof). I was first seriously introduced to automated theorem proving in the late 1980s by Dana Scott, and have interacted with many people about it over the years, including Richard Assar, Bruno Buchberger, David Hillman, Norm Megill, Todd Rowland and Matthew Szudzik. (I’ve also interacted with many people about proof assistant, proof presentation and proof verification systems, both recently and in the past.)

Stephen Wolfram (2025), "Who Can Understand the Proof? A Window on Formalized Mathematics," Stephen Wolfram Writings. writings.stephenwolfram.com/2025/01/who-can-understand-the-proof-a-window-on-formalized-mathematics.

Text

Stephen Wolfram (2025), "Who Can Understand the Proof? A Window on Formalized Mathematics," Stephen Wolfram Writings. writings.stephenwolfram.com/2025/01/who-can-understand-the-proof-a-window-on-formalized-mathematics.

CMS

Wolfram, Stephen. "Who Can Understand the Proof? A Window on Formalized Mathematics." Stephen Wolfram Writings. January 9, 2025. writings.stephenwolfram.com/2025/01/who-can-understand-the-proof-a-window-on-formalized-mathematics.

APA

Wolfram, S. (2025, January 9). Who can understand the proof? A window on formalized mathematics. Stephen Wolfram Writings. writings.stephenwolfram.com/2025/01/who-can-understand-the-proof-a-window-on-formalized-mathematics.

Posted in: Artificial Intelligence, Computational Science, Mathematics, Ruliology