When we released Version 12.1 in March of this year, I was pleased to be able to say that with its 182 new functions it was the biggest .1 release we’d ever had. But just nine months later, we’ve got an even bigger .1 release! Version 12.2, launching today, has 228 completely new functions!
We always have a portfolio of development projects going on, with any given project taking anywhere from a few months to more than a decade to complete. And of course it’s a tribute to our whole Wolfram Language technology stack that we’re able to develop so much, so quickly. But Version 12.2 is perhaps all the more impressive for the fact that we didn’t concentrate on its final development until mid-June of this year. Because between March and June we were concentrating on 12.1.1, which was a “polishing release”: no new features, but more than a thousand outstanding bugs fixed.
How did we design all those new functions and new features that are now in 12.2? It’s a lot of work! And it’s what I personally spend a lot of my time on (along with other “small items” like physics, etc.). But for the past couple of years we’ve done our language design in a very open way—livestreaming our internal design discussions, and getting all sorts of great feedback in real time. So far we’ve recorded about 550 hours—of which Version 12.2 occupied at least 150 hours.
By the way, in addition to all of the fully integrated new functionality in 12.2, there’s also been significant activity in the Wolfram Function Repository—and even since 12.1 was released 534 new, curated functions for all sorts of specialized purposes have been added there.
There are so many different things in so many areas in Version 12.2 that it’s hard to know where to start. But let’s talk about a completely new area: biosequence computation. Yes, we’ve had gene and protein data in the Wolfram Language for more than a decade. But what’s new in 12.2 is the beginning of the ability to do flexible, general computation with bio sequences. And to do it in a way that fits in with all the chemical computation capabilities we’ve been adding to the Wolfram Language over the past few years.
Here’s how we represent a DNA sequence (and, yes, this works with very long sequences too):
BioSequence["DNA", "CTTTTCGAGATCTCGGCGTCA"] 
This translates the sequence to a peptide (like a “symbolic ribosome”):
BioSequenceTranslate[%] 
Now we can find out what the corresponding molecule is:
Molecule[%] 
And visualize it in 3D (or compute lots of properties):
MoleculePlot3D[%] 
I have to say that I agonized a bit about the “non-universality” of putting the specifics of “our” biology into our core language… but it definitely swayed my thinking that, of course, all our users are (for now) definitively eukaryotes. Needless to say, though, we’re set up to deal with other branches of life too:
Entity["GeneticTranslationTable", "AscidianMitochondrial"]["StartCodons"] 
You might think that handling genome sequences is “just string manipulation”—and indeed our string functions are now set up to work with bio sequences:
StringReverse[BioSequence["DNA", "CTTTTCGAGATCTCGGCGTCA"]] 
But there’s also a lot of biology-specific additional functionality. Like this finds a complementary base-pair sequence:
BioSequenceComplement[BioSequence["DNA", "CTTTTCGAGATCTCGGCGTCA"]] 
Actual, experimental sequences often have base pairs that are somehow uncertain—and there are standard conventions for representing this (e.g. “S” means C or G; “N” means any base). And now our string patterns also understand things like this for bio sequences:
StringMatchQ[BioSequence["DNA", "CTTT"], "STTT"] 
And there are new functions like BioSequenceInstances for resolving degenerate characters:
BioSequenceInstances[BioSequence["DNA", "STTT"]] 
BioSequence is also completely integrated with our built-in genome and protein data. Here’s a gene that we can ask for in natural language “Wolfram|Alpha style”:
BioSequence[CloudGet["https://wolfr.am/ROWvGTNr"]] 
Now we ask to do sequence alignment between these two genes (in this case, both human—which is, needless to say, the default):
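Since (as noted above) string functions now work on bio sequences, an alignment like this can be done with SequenceAlignment. Here’s a minimal sketch of the idea, using the short sequence from earlier together with a made-up variant (not two actual genes):

```wolfram
(* the second sequence is a hypothetical variant, for illustration only *)
SequenceAlignment[
 BioSequence["DNA", "CTTTTCGAGATCTCGGCGTCA"],
 BioSequence["DNA", "CTTTCGAGATTTCGGCGTCA"]]
```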
What’s in 12.2 is really just the beginning of what we’re planning for biosequence computation. But already you can do very flexible things with large datasets. And, for example, it’s now straightforward for me to read my genome in from FASTA files and start exploring it…
BioSequence["DNA", First[Import["Genome/Consensus/c1.fa.consensus.fa"]]] 
Locations of birds’ nests, gold deposits, houses for sale, defects in a material, galaxies…. These are all examples of spatial point datasets. And in Version 12.2 we now have a broad collection of functions for handling such datasets.
Here’s the “spatial point data” for the locations of US state capitals:
SpatialPointData[ GeoPosition[EntityClass["City", "UnitedStatesCapitals"]]] 
Since it’s geo data, it’s plotted on a map:
PointValuePlot[%] 
Let’s restrict our domain to the contiguous US:
capitals = SpatialPointData[ GeoPosition[EntityClass["City", "UnitedStatesCapitals"]], Entity["Country", "UnitedStates"]]; 
PointValuePlot[capitals]
Now we can start computing spatial statistics. Like here’s the mean density of state capitals:
MeanPointDensity[capitals] 
Assume you’re in a state capital. Here’s the probability of finding the nearest other state capital within a certain distance:
NearestNeighborG[capitals] 
Plot[%[Quantity[r, "Miles"]], {r, 0, 400}] 
This tests whether the state capitals are randomly distributed; needless to say, they’re not:
SpatialRandomnessTest[capitals] 
In addition to computing statistics from spatial data, Version 12.2 can also generate spatial data according to a wide range of models. Here’s a model that picks “center points” at random, then has other points clustered around them:
PointValuePlot[ RandomPointConfiguration[MaternPointProcess[.0001, 1, .1, 2], CloudGet["https://wolfr.am/ROWwlIqR"]]] 
You can also go the other way around, and fit a spatial model to data:
EstimatedPointProcess[capitals, MaternPointProcess[\[Mu], \[Lambda], r, 2], {\[Mu], \[Lambda], r}] 
In some ways we’ve been working towards it for 30 years. We first introduced NDSolve back in Version 2.0, and we’ve been steadily enhancing it ever since. But our long-term goal has always been convenient handling of real-world PDEs of the kind that appear throughout high-end engineering. And in Version 12.2 we’ve finally got all the pieces of underlying algorithmic technology to be able to create a truly streamlined PDE-solving experience.
OK, so how do you specify a PDE? In the past, it was always done explicitly in terms of particular derivatives, boundary conditions, etc. But most PDEs used, for example, in engineering consist of higher-level components that “package together” derivatives, boundary conditions, etc. to represent features of physics, materials, etc.
The lowest level of our new PDE framework consists of symbolic “terms”, corresponding to common mathematical constructs that appear in real-world PDEs. For example, here’s a 2D “Laplacian term”:
LaplacianPDETerm[{u[x, y], {x, y}}] 
And now this is all it takes to find the first 5 eigenvalues of the Laplacian in a regular polygon:
NDEigenvalues[LaplacianPDETerm[{u[x, y], {x, y}}], u[x, y], {x, y} \[Element] RegularPolygon[5], 5] 
And the important thing is that you can put this kind of operation into a whole pipeline. Like here we’re getting the region from an image, solving for the 10th eigenmode, and then 3D plotting the result:
NDEigensystem[{LaplacianPDETerm[{u[x, y], {x, y}}]}, u[x, y], {x, y} \[Element] ImageMesh[CloudGet["https://wolfr.am/ROWwBtE7"]], 10][[2, 1]] 
Plot3D[%, {x, y} \[Element] ImageMesh[CloudGet["https://wolfr.am/ROWwGqjg"]]] 
In addition to LaplacianPDETerm, there are things like DiffusionPDETerm and ConvectionPDETerm that represent other terms that arise in real-world PDEs. Here’s a term for isotropic diffusion with unit diffusion coefficient:
DiffusionPDETerm[{\[Phi][x, y, z], {x, y, z}}] 
Beyond individual terms, there are also “components” that combine multiple terms, usually with various parameters. Here’s a Helmholtz PDE component:
HelmholtzPDEComponent[{u[x, y], {x, y}}, <|"HelmholtzEigenvalue" -> k|>]
By the way, it’s worth pointing out that our “terms” and “components” are set up to represent the symbolic structure of PDEs in a form suitable for structural manipulation and for things like numerical analysis. And to ensure that they maintain their structure, they’re normally kept in an inactivated form. But you can always “activate” them if you want to do things like algebraic operations:
Activate[%] 
In real-world PDEs, one’s often dealing with actual, physical processes taking place in actual physical materials. And in Version 12.2 we’ve got immediate ways to deal not only with things like diffusion, but also with acoustics, heat transfer and mass transport—and to feed in properties of actual materials. Typically the structure is that there’s a PDE “component” that represents the bulk behavior of the material, together with a variety of PDE “values” or “conditions” that represent boundary conditions.
Here’s a typical PDE component, using material properties from the Wolfram Knowledgebase:
HeatTransferPDEComponent[{\[CapitalTheta][t, x, y], t, {x, y}}, <|"Material" -> CloudGet["https://wolfr.am/ROWwUQai"]|>]
There’s quite a bit of diversity and complexity to the possible boundary conditions. For example, for heat transfer, there’s HeatFluxValue, HeatInsulationValue and five other symbolic boundary condition specification constructs. In each case, the basic idea is to say where (geometrically) the condition applies, then what it applies to, and what parameters relate to it.
So, for example, here’s a condition that specifies that there’s a fixed “surface temperature” θ₀ everywhere outside the (circular) region defined by x² + y² = 1:
HeatTemperatureCondition[x^2 + y^2 > 1, {\[CapitalTheta][t, x, y], t, {x, y}}, <|"SurfaceTemperature" -> Subscript[\[Theta], 0]|>]
What’s basically happening here is that our high-level “physics” description is being “compiled” into explicit “mathematical” PDE structures—like Dirichlet boundary conditions.
OK, so how does all this fit together in a real-life situation? Let me show an example. But first, let me tell a story. Back in 2009 I was having tea with our lead PDE developer. I picked up a teaspoon and asked “When will we be able to model the stresses in this?” Our lead developer explained that there was quite a bit to build to get to that point. Well, I’m excited to say that after 11 years of work, in Version 12.2 we’re there. And to prove it, our lead developer just gave me… a (computational) spoon!
spoon = CloudGet["https://wolfr.am/ROWx6wKF"]; 
The core of the computation is a 3D diffusion PDE term, with a “diffusion coefficient” given by a rank-4 tensor parametrized by Young’s modulus (here Y) and Poisson ratio (ν):
pdeterm = DiffusionPDETerm[{{u[x, y, z], v[x, y, z], w[x, y, z]}, {x, y, z}},
  Y/(1 + \[Nu]) {
    {{{(1 - \[Nu])/(1 - 2 \[Nu]), 0, 0}, {0, 1/2, 0}, {0, 0, 1/2}},
     {{0, \[Nu]/(1 - 2 \[Nu]), 0}, {1/2, 0, 0}, {0, 0, 0}},
     {{0, 0, \[Nu]/(1 - 2 \[Nu])}, {0, 0, 0}, {1/2, 0, 0}}},
    {{{0, 1/2, 0}, {\[Nu]/(1 - 2 \[Nu]), 0, 0}, {0, 0, 0}},
     {{1/2, 0, 0}, {0, (1 - \[Nu])/(1 - 2 \[Nu]), 0}, {0, 0, 1/2}},
     {{0, 0, 0}, {0, 0, \[Nu]/(1 - 2 \[Nu])}, {0, 1/2, 0}}},
    {{{0, 0, 1/2}, {0, 0, 0}, {\[Nu]/(1 - 2 \[Nu]), 0, 0}},
     {{0, 0, 0}, {0, 0, 1/2}, {0, \[Nu]/(1 - 2 \[Nu]), 0}},
     {{1/2, 0, 0}, {0, 1/2, 0}, {0, 0, (1 - \[Nu])/(1 - 2 \[Nu])}}}},
  <|Y -> 10^9, \[Nu] -> 33/100|>];
There are boundary conditions to specify how the spoon is being held, and pushed. Then solving the PDE (which takes just a few seconds) gives the displacement field for the spoon
deformations = NDSolveValue[{pdeterm == {0, NeumannValue[-1000, x <= -100], 0}, DirichletCondition[{u[x, y, z] == 0., v[x, y, z] == 0., w[x, y, z] == 0.}, x >= 100]}, {u, v, w}, {x, y, z} \[Element] spoon];
which we can then use to find how the spoon would deform:
Show[MeshRegion[Table[Apply[if, m], {m, MeshCoordinates[spoon]}, {if, deformations}] + MeshCoordinates[spoon], MeshCells[spoon, {2, All}]], Graphics3D[Style[spoon, LightGray]]]
PDE modeling is a complicated area, and I consider it to be a major achievement that we’ve now managed to “package” it as cleanly as this. But in Version 12.2, in addition to the actual technology of PDE modeling, something else that’s important is a large collection of computational essays about PDE modeling—altogether about 400 pages of detailed explanation and application examples, currently in acoustics, heat transfer and mass transport, but with many other domains to come.
The Wolfram Language is all about expressing yourself in precise computational language. But in notebooks you can also express yourself with ordinary text in natural language. But what if you want to display math in there as well? For 25 years we’ve had the infrastructure to do the math display—through our box language. But the only convenient way to enter the math is through Wolfram Language math constructs—that in some sense have to have computational meaning.
But what about “math” that’s “for human eyes only”? That has a certain visual layout that you want to specify, but that doesn’t necessarily have any particular underlying computational meaning that’s been defined? Well, for many decades there’s been a good way to specify such math, thanks to my friend Don Knuth: just use TeX. And in Version 12.2 we’re now supporting direct entry of TeX math into Wolfram Notebooks, both on the desktop and in the cloud. Underneath, the TeX is being turned into our box representation, so it structurally interoperates with everything else. But you can just enter it—and edit it—as TeX.
The interface is very much like the ctrl+= interface for Wolfram|Alpha-style natural language input. But for TeX (in a nod to standard TeX delimiters), it’s ctrl+$.
Type ctrl+$ and you get a TeX input box. When you’ve finished the TeX, just press Enter and it’ll be rendered:
Like with ctrl+=, if you click the rendered form, it’ll go back to text and you can edit again, just as TeX.
Entering TeX in text cells is the most common thing to want. But Version 12.2 also supports entering TeX in input cells:
What happens if you evaluate? Your input will be treated as TraditionalForm, and at least an attempt will be made to interpret it. Though, of course, if you wrote “computationally meaningless math” that won’t work.
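Relatedly, TeX strings have long been interpretable programmatically with ToExpression (this is existing functionality, shown here just for comparison with the new notebook entry):

```wolfram
(* interpret a TeX string as a Wolfram Language expression *)
ToExpression["\\frac{x^2+1}{\\sqrt{2}}", TeXForm]
```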
Type Canvas[] and you’ll get a blank canvas to draw whatever you want:
Canvas[] 
We’ve worked hard to make the drawing tools as ergonomic as possible.
Applying Normal gives you graphics that you can then use or manipulate:
GraphicsGrid[Partition[Table[Rasterize[Rotate[Normal[%], \[Theta]], ImageSize -> 50], {\[Theta], 0, 2 Pi, .4}], UpTo[8]], ImageSize -> 500]
When you create a canvas, it can have any graphic as initial content—and it can have any background you want:
Canvas[Graphics[Style[Disk[], Opacity[.4, Red], EdgeForm[{Thick, Red}]]], Background -> GeoGraphics[Entity["MannedSpaceMission", "Apollo16"][EntityProperty["MannedSpaceMission", "LandingPosition"]]]]
On the subject of drawing anything, Version 12.2 has another new function: MoleculeDraw, for drawing (or editing) molecules. Start with the symbolic representation of a molecule:
Molecule[Entity["Chemical", "Caffeine"]] 
Now use MoleculeDraw to bring up the interactive molecule drawing environment, make an edit, and return the result:
It’s another molecule now:
Math has been a core use case for the Wolfram Language (and Mathematica) since the beginning. And it’s been very satisfying over the past third of a century to see how much math we’ve been able to make computational. But the more we do, the more we realize is possible, and the further we can go. It’s become in a sense routine for us. There’ll be some area of math that people have been doing by hand or piecemeal forever. And we’ll figure out: yes, we can make an algorithm for that! We can use the giant tower of capabilities we’ve built over all these years to systematize and automate yet more mathematics; to make yet more math computationally accessible to anyone. And so it has been with Version 12.2. A whole collection of pieces of “math progress”.
Let’s start with something rather cut and dried: special functions. In a sense, every special function is an encapsulation of a certain nugget of mathematics: a way of defining computations and properties for a particular type of mathematical problem or system. Starting from Mathematica 1.0 we’ve achieved excellent coverage of special functions, steadily expanding to more and more complicated functions. And in Version 12.2 we’ve got another class of functions: the Lamé functions.
Lamé functions are part of the complicated world of handling ellipsoidal coordinates; they appear as solutions to the Laplace equation in an ellipsoid. And now we can evaluate them, expand them, transform them, and do all the other kinds of things that are involved in integrating a function into our language:
Plot[Abs[LameS[3/2 + I, 3, z, 0.1 + 0.1 I]], {z, -8 EllipticK[1/3], 8 EllipticK[1/3]}]
Series[LameC[\[Nu], j, z, m], {z, 0, 3}] 
Also in Version 12.2 we’ve done a lot on elliptic functions—dramatically speeding up their numerical evaluation and inventing algorithms for doing this efficiently at arbitrary precision. We’ve also introduced some new elliptic functions, like JacobiEpsilon—which provides a generalization of EllipticE that avoids branch cuts and maintains the analytic structure of elliptic integrals:
ComplexPlot3D[JacobiEpsilon[z, 1/2], {z, 6}] 
We’ve been able to do many symbolic Laplace and inverse Laplace transforms for a couple of decades. But in Version 12.2 we’ve solved the subtle problem of using contour integration to do inverse Laplace transforms. It’s a story of knowing enough about the structure of functions in the complex plane to avoid branch cuts and other nasty singularities. A typical result effectively sums over an infinite number of poles:
InverseLaplaceTransform[Coth[s \[Pi] /2 ]/(1 + s^2), s, t] 
And between contour integration and other methods we’ve also added numerical inverse Laplace transforms. It all looks easy in the end, but there’s a lot of complicated algorithmic work needed to achieve this:
InverseLaplaceTransform[1/(s + Sqrt[s] + 1), s, 1.5] 
Another new algorithm made possible by finer “function understanding” has to do with asymptotic expansion of integrals. Here’s a complex function that becomes increasingly wiggly as λ increases:
Table[ReImPlot[(t^10 + 3) Exp[I \[Lambda] (t^5 + t + 1)], {t, -2, 2}], {\[Lambda], 10, 30, 10}]
And here’s the asymptotic expansion for λ→∞:
AsymptoticIntegrate[(t^10 + 3) Exp[I \[Lambda] (t^5 + t + 1)], {t, -2, 2}, {\[Lambda], Infinity, 2}]
It’s a very common calculus exercise to determine, for example, whether a particular function is injective. And it’s pretty straightforward to do this in easy cases. But a big step forward in Version 12.2 is that we can now systematically figure out these kinds of global properties of functions—not just in easy cases, but also in very hard cases. Often there are whole networks of theorems that depend on functions having such-and-such a property. Well, now we can automatically determine whether a particular function has that property, and so whether the theorems hold for it. And that means that we can create systematic algorithms that automatically use the theorems when they apply.
Here’s an example. Is Tan[x] injective? Not globally:
FunctionInjective[Tan[x], x] 
But over an interval, yes:
FunctionInjective[{Tan[x], 0 < x < Pi/2}, x] 
What about the singularities of Tan[x]? This gives a description of the set:
FunctionSingularities[Tan[x], x] 
You can get explicit values with Reduce:
Reduce[%, x] 
So far, fairly straightforward. But things quickly get more complicated:
FunctionSingularities[ArcTan[x^y], {x, y}, Complexes] 
And there are more sophisticated properties you can ask about as well:
FunctionMeromorphic[Log[z], z] 
FunctionMeromorphic[{Log[z], z > 0}, z] 
We’ve internally used various kinds of function-testing properties for a long time. But with Version 12.2 function properties are much more complete and fully exposed for anyone to use. Want to know if you can interchange the order of two limits? Check FunctionSingularities. Want to know if you can do a multivariate change of variables in an integral? Check FunctionInjective.
And, yes, even in Plot3D we’re routinely using FunctionSingularities to figure out what’s going on:
Plot3D[Re[ArcTan[x^y]], {x, -5, 5}, {y, -5, 5}]
In Version 12.1 we began the process of introducing video as a built-in feature of the Wolfram Language. Version 12.2 continues that process. In 12.1 we could only handle video in desktop notebooks; now it’s extended to cloud notebooks—so when you generate a video in the Wolfram Language it’s immediately deployable to the cloud.
A major new video feature in 12.2 is VideoGenerator. Provide a function that makes images (and/or audio), and VideoGenerator will generate a video from them (here a 4-second video):
VideoGenerator[Graphics3D[AugmentedPolyhedron[Icosahedron[], # - 2], ImageSize -> {200, 200}] &, 4]
To add a sound track, we can just use VideoCombine:
VideoCombine[{%, CloudGet["https://wolfr.am/ROWzckqS"]}]
So how would we edit this video? In Version 12.2 we have programmatic versions of standard video-editing functions. VideoSplit, for example, splits the video at particular times:
VideoSplit[%, {.3, .5, 2}] 
But the real power of the Wolfram Language comes in systematically applying arbitrary functions to videos. VideoMap lets you apply a function to a video to get another video. For example, we could progressively blur the video we just made:
VideoMap[Blur[#Image, 20 #Time] &, %%] 
There are also two new functions for analyzing videos—VideoMapList and VideoMapTimeSeries—which respectively generate a list and a time series by applying a function to the frames in a video, and to its audio track.
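As a sketch of the idea (assuming a Video object video, and the same #Image slot convention that VideoMap uses above), one might extract mean frame brightness as a time series like this:

```wolfram
(* video is assumed to be a Video object, e.g. from the examples above *)
VideoMapTimeSeries[Mean[Flatten[ImageData[#Image]]] &, video]
```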
Another new function—highly relevant for video processing and video editing—is VideoIntervals, which determines the time intervals over which any given criterion applies in a video:
VideoIntervals[%, Length[DominantColors[#Image]] < 3 &] 
Now, for example, we can delete those intervals in the video:
VideoDelete[%, %%] 
A common operation in the practical handling of videos is transcoding. And in Version 12.2 the function VideoTranscode lets you convert a video among any of the over 300 containers and codecs that we support. By the way, 12.2 also has new functions ImageWaveformPlot and ImageVectorscopePlot that are commonly used in video color correction:
ImageVectorscopePlot[CloudGet["https://wolfr.am/ROWzsGFw"]] 
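And as a minimal sketch of the transcoding mentioned above (assuming a Video object video; the target format here is just an example):

```wolfram
(* convert a video to an MP4 container with default codec choices *)
VideoTranscode[video, "MP4"]
```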
One of the main technical issues in handling video is dealing with the large amount of data in a typical video. In Version 12.2 there’s now finer control over where that data is stored. The option GeneratedAssetLocation (with default $GeneratedAssetLocation) lets you pick between different files, directories, local object stores, etc.
But there’s also a new function in Version 12.2 for handling “lightweight video”, in the form of AnimatedImage. AnimatedImage simply takes a list of images and produces an animation that immediately plays in your notebook—and has everything directly stored in your notebook:
AnimatedImage[ Table[Rasterize[Rotate[Style["W", 40], \[Theta]]], {\[Theta], 0, 2 Pi, .1}]] 
It comes up quite frequently for me—especially given our Physics Project. I’ve got a big computation I’d like to do, but I don’t want to (or can’t) do it on my computer. And instead what I’d like to do is run it as a batch job in the cloud.
This has been possible in principle for as long as cloud computation providers have been around. But it’s been very involved and difficult. Well, now, in Version 12.2 it’s finally easy. Given any piece of Wolfram Language code, you can just use RemoteBatchSubmit to send it to be run as a batch job in the cloud.
There’s a little bit of setup required on the batch computation provider side. First, you have to have an account with an appropriate provider—and initially we’re supporting AWS Batch and Charity Engine. Then you have to configure things with that provider (and we’ve got workflows that describe how to do that). But as soon as that’s done, you’ll get a remote batch submission environment that’s basically all you need to start submitting batch jobs:
env = RemoteBatchSubmissionEnvironment["AWSBatch", <|"JobQueue" -> "arn:aws:batch:us-east-1:123456789012:job-queue/MyQueue", "JobDefinition" -> "arn:aws:batch:us-east-1:123456789012:job-definition/MyDefinition:1", "IOBucket" -> "my-job-bucket"|>]
OK, so what would be involved, say, in submitting a neural net training? Here’s how I would run it locally on my machine (and, yes, this is a very simple example):
NetTrain[NetModel["LeNet"], "MNIST"] 
And here’s the minimal way I would send it to run on AWS Batch:
job = RemoteBatchSubmit[env, NetTrain[NetModel["LeNet"], "MNIST"]] 
I get back an object that represents my remote batch job—that I can query to find out what’s happened with my job. At first it’ll just tell me that my job is “runnable”:
job["JobStatus"] 
Later on, it’ll say that it’s “starting”, then “running”, then (if all goes well) “succeeded”. And once the job is finished, you can get back the result like this:
job["EvaluationResult"] 
There’s lots of detail you can retrieve about what actually happened. Like here’s the beginning of the raw job log:
job["JobLog"] 
But the real point of running your computations remotely in a cloud is that they can potentially be bigger and crunchier than the ones you can run on your own machines. Here’s how we could run the same computation as above, but now requesting the use of a GPU:
RemoteBatchSubmit[env, NetTrain[NetModel["LeNet"], "MNIST", TargetDevice -> "GPU"], RemoteProviderSettings -> <|"GPUCount" -> 1|>]
RemoteBatchSubmit can also handle parallel computations. If you request a multicore machine, you can immediately run ParallelMap etc. across its cores. But you can go even further with RemoteBatchMapSubmit—which automatically distributes your computation across a whole collection of separate machines in the cloud.
Here’s an example:
job = RemoteBatchMapSubmit[env, ImageIdentify, WebImageSearch["happy", 100]] 
While it’s running, we can get a dynamic display of the status of each part of the job:
job["DynamicStatusVisualization"] 
About 5 minutes later, the job is finished:
job["JobStatus"] 
And here are our results:
ReverseSort[Counts[job["EvaluationResults"]]] 
RemoteBatchSubmit and RemoteBatchMapSubmit give you high-level access to cloud compute services for general batch computation. But in Version 12.2 there is also a direct lower-level interface available, for example for AWS.
Connect to AWS:
aws = ServiceConnect["AWS"] 
Once you’ve authenticated, you can see all the services that are available:
aws["Services"] 
This gives a handle to the Amazon Translate service:
aws["GetService", "Name" -> "Translate"]
Now you can use this to call the service:
%["TranslateText", "Text" -> "今日は良い一日だった", "SourceLanguageCode" -> "auto", "TargetLanguageCode" -> "en"]
Of course, you can always do language translation directly through the Wolfram Language too:
TextTranslation["今日は良い一日だった"] 
It’s straightforward to plot data that involves one, two or three dimensions. For a few dimensions above that, you can use colors or other styling. But by the time you’re dealing with ten dimensions, that breaks down. And if you’ve got a lot of data in 10D, for example, then you’re probably going to have to use something like DimensionReduce to try to tease out “interesting features”.
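As a reminder of what that looks like, here’s a sketch with random data standing in for real 10-dimensional data:

```wolfram
(* project 100 random 10-dimensional points down to 2D and plot them *)
ListPlot[DimensionReduce[RandomReal[1, {100, 10}], 2]]
```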
But if you’re just dealing with a few “data points”, there are other ways to visualize things like 10-dimensional data. And in Version 12.2 we’re introducing several functions for doing this.
As a first example, let’s look at ParallelAxisPlot. The idea here is that every “dimension” is plotted on a “separate axis”. For a single point it’s not that exciting:
ParallelAxisPlot[{{10, 17, 19, 8, 7, 5, 17, 4, 8, 2}}, PlotRange -> {0, 20}]
Here’s what happens if we plot three random “10D data points”:
ParallelAxisPlot[RandomInteger[20, {3, 10}], PlotRange -> {0, 20}]
But one of the important features of ParallelAxisPlot is that by default it automatically determines the scale on each axis, so there’s no need for the axes to be representing similar kinds of things. So, for example, here are 7 completely different quantities plotted for all the chemical elements:
ParallelAxisPlot[ EntityValue[ "Element", {EntityProperty["Element", "AtomicMass"], EntityProperty["Element", "AtomicRadius"], EntityProperty["Element", "BoilingPoint"], EntityProperty["Element", "ElectricalConductivity"], EntityProperty["Element", "MeltingPoint"], EntityProperty["Element", "NeutronCrossSection"], EntityProperty["Element", "ThermalConductivity"]}]] 
Different kinds of high-dimensional data do best on different kinds of plots. Another new type of plot in Version 12.2 is RadialAxisPlot. (This type of plot also goes by names like radar plot, spider plot and star plot.)
RadialAxisPlot plots each dimension in a different direction:
RadialAxisPlot[ EntityValue[ "Element", {EntityProperty["Element", "AtomicMass"], EntityProperty["Element", "AtomicRadius"], EntityProperty["Element", "BoilingPoint"], EntityProperty["Element", "ElectricalConductivity"], EntityProperty["Element", "MeltingPoint"], EntityProperty["Element", "NeutronCrossSection"], EntityProperty["Element", "ThermalConductivity"]}]] 
It’s typically most informative when there aren’t too many data points:
RadialAxisPlot[EntityValue[{Entity["City", {"Chicago", "Illinois", "UnitedStates"}], Entity["City", {"Dallas", "Texas", "UnitedStates"}], Entity["City", {"NewYork", "NewYork", "UnitedStates"}], Entity["City", {"LosAngeles", "California", "UnitedStates"}]}, {EntityProperty["City", "MedianHomeSalePrice"], EntityProperty["City", "TotalSalesTaxRate"], EntityProperty["City", "MedianHouseholdIncome"], EntityProperty["City", "Population"], EntityProperty["City", "Area"]}, "EntityAssociation"], PlotLegends -> Automatic]
Back in 1984 I used a Cray supercomputer to make 3D pictures of 2D cellular automata evolving in time (yes, captured on 35 mm slides):
I’ve been waiting for 36 years to have a really streamlined way to reproduce these. And now finally in Version 12.2 we have it: ArrayPlot3D. Already in 2012 we introduced Image3D to represent and display 3D images composed of 3D voxels with specified colors and opacities. But its emphasis is on “radiology-style” work, in which there’s a certain assumption of continuity between voxels. And if you’ve really got a discrete array of discrete data (as in cellular automata), that won’t lead to crisp results.
And here is ArrayPlot3D in action, for a 3D cellular automaton:
Table[ArrayPlot3D[ CellularAutomaton[{14, {2, 1}, {1, 1, 1}}, {{{{1}}}, 0}, {{{t}}}]], {t, 20, 40, 10}] 
Another new ArrayPlot-family function in 12.2 is ComplexArrayPlot, for visualizing arrays of complex values, such as the arrays of values that arise from Newton’s method iterations.
One of our objectives in Wolfram Language is to have visualizations that just “automatically look good”—because they’ve got algorithms and heuristics that effectively implement good computational aesthetics. In Version 12.2 we’ve tuned up the computational aesthetics for a variety of types of visualization. For example, in 12.1 this is what a SliceVectorPlot3D looked like by default:
✕
SliceVectorPlot3D[{y + x, z, y}, {x, -2, 2}, {y, -2, 2}, {z, -2, 2}] 
Now it looks like this:
Since Version 10, we’ve also been making increasing use of our PlotTheme option, to “bank switch” detailed options to make visualizations that are suitable for different purposes, and meet different aesthetic goals. So for example in Version 12.2 we’ve added plot themes to GeoRegionValuePlot. Here’s an example of the default (which has been updated, by the way):
✕
GeoRegionValuePlot[CloudGet["https://wolfr.am/ROWDoxAw"] -> "GDP"] 
And here it is with the "Marketing" plot theme:
✕
GeoRegionValuePlot[CloudGet["https://wolfr.am/ROWDoxAw"] -> "GDP", PlotTheme -> "Marketing"] 
Another thing in Version 12.2 is the addition of new primitives and new “raw material” for creating aesthetic visual effects. In Version 12.1 we introduced things like HatchFilling for crosshatching. In Version 12.2 we now also have LinearGradientFilling:
✕
Graphics[Style[Disk[], LinearGradientFilling[{RGBColor[1., 0.71, 0.75], RGBColor[0.64, Rational[182, 255], Rational[244, 255]]}]]] 
And we can now add this kind of effect to the filling in a plot:
✕
Plot[2 Sin[x] + x, {x, 0, 15}, FillingStyle -> LinearGradientFilling[{RGBColor[0.64, Rational[182, 255], Rational[244, 255]], RGBColor[1., 0.71, 0.75]}, Top], Filling -> Bottom] 
To be even more stylish, one can plot random points using the new ConicGradientFilling:
✕
Graphics[Table[ Style[Disk[RandomReal[20, 2]], ConicGradientFilling[RandomColor[3]]], 100]] 
A core goal of the Wolfram Language is to define a coherent computational language that can readily be understood by both computers and humans. We (and I in particular!) put a lot of effort into the design of the language, and into things like picking the right names for functions. But in making the language as easy to read as possible, it’s also important to streamline its “non-verbal” or syntactic aspects. For function names, we’re basically leveraging people’s understanding of words in natural language. For syntactic structure, we want to leverage people’s “ambient understanding”, for example, from areas like math.
More than a decade ago we introduced ↦ as a way to specify functions, so instead of writing
✕
Function[x, x^2] 
(or #^2 &) you could write:
✕
x ↦ x^2 
But to enter ↦ you had to type \[Function] or at least Esc fn Esc, which tended to feel “a bit difficult”.
Well, in Version 12.2, we’re “mainstreaming” ↦ by making it possible to type it just as |->:
✕
x |-> x^2 
You can also do things like
✕
{x, y} |-> x + y 
as well as things like:
✕
SameTest -> ({x, y} |-> Mod[x - y, 2] == 0) 
In Version 12.2, there’s also another new piece of “short syntax”: //=
Imagine you’ve got a result, say called res. Now you want to apply a function to res, and then “update res”. The new function ApplyTo (written //=) makes it easy to do that:
✕
res = 10 
✕
res //= f 
✕
res 
We’re always on the lookout for repeated “lumps of computation” that we can “package” into functions with “easy-to-understand names”. And in Version 12.2 we have a couple of new such functions: FoldWhile and FoldWhileList. FoldList normally just takes a list and “folds” each successive element into the result it’s building up—until it gets to the end of the list:
✕
FoldList[f, {1, 2, 3, 4}] 
But what if you want to “stop early”? FoldWhileList lets you do that. So here we’re successively dividing by 1, 2, 3, …, stopping when the result isn’t an integer anymore:
✕
FoldWhileList[Divide, 5!, Range[10], IntegerQ] 
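FoldWhile works the same way, but returns only the final result rather than the whole list. A minimal sketch (my own example, by direct analogy with the FoldWhileList call above):
✕
FoldWhile[Divide, 5!, Range[10], IntegerQ] 
This should give just the last integer reached in the sequence above.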
Let’s say you’ve got an array, like:
✕
{{a, b, c, d}, {x, y, z, w}} // MatrixForm 
Map lets you map a function over the “rows” of this array:
✕
Map[f, {{a, b, c, d}, {x, y, z, w}}] 
But what if you want to operate on the “columns” of the array, effectively “reducing out” the first dimension of the array? In Version 12.2 the function ArrayReduce lets you do this:
✕
ArrayReduce[f, {{a, b, c, d}, {x, y, z, w}}, 1] 
Here’s what happens if instead we tell ArrayReduce to “reduce out” the second dimension of the array:
✕
ArrayReduce[f, {{a, b, c, d}, {x, y, z, w}}, 2] 
What’s really going on here? The array has dimensions 2×4:
✕
Dimensions[{{a, b, c, d}, {x, y, z, w}}] 
ArrayReduce[f, ..., 1] “reduces out” the first dimension, leaving an array with dimensions {4}. ArrayReduce[f, ..., 2] reduces out the second dimension, leaving an array with dimensions {2}.
Let’s look at a slightly bigger case—a 2×3×4 array:
✕
array = ArrayReshape[Range[24], {2, 3, 4}] 
This now eliminates the “first dimension”, leaving a 3×4 array:
✕
ArrayReduce[f, array, 1] 
✕
Dimensions[%] 
This, on the other hand, eliminates the “second dimension”, leaving a 2×4 array:
✕
ArrayReduce[f, array, 2] 
✕
Dimensions[%] 
Why is this useful? One example is when you have arrays of data where different dimensions correspond to different attributes, and then you want to “ignore” a particular attribute, and aggregate the data with respect to it. Let’s say that the attribute you want to ignore is at level n in your array. Then all you do to “ignore” it is to use ArrayReduce[f, ..., n], where f is the function that aggregates values (often something like Total or Mean).
You can achieve the same results as ArrayReduce by appropriate sequences of Transpose, Apply, etc. But it’s quite messy, and ArrayReduce provides an elegant “packaging” of these kinds of array operations.
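To see why, here’s a sketch of one such equivalent for the matrix case above (my own illustration): mapping f over the transposed array reduces out the first dimension the same way ArrayReduce[f, ..., 1] does:
✕
Map[f, Transpose[{{a, b, c, d}, {x, y, z, w}}]] 
But for higher-dimensional arrays, or for reducing out later dimensions, the Transpose bookkeeping quickly gets messy.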
ArrayReduce is quite general; it lets you not only “reduce out” single dimensions, but whole collections of dimensions:
✕
ArrayReduce[f, array, {2, 3}] 
✕
ArrayReduce[f, array, {{2}, {3}}] 
At the simplest level, ArrayReduce is a convenient way to apply functions “columnwise” on arrays. But in full generality it’s a way to apply functions to subarrays with arbitrary indices. And if you’re thinking in terms of tensors, ArrayReduce is a generalization of contraction, in which more than two indices can be involved, and elements can be “flattened” before the operation (which doesn’t have to be summation) is applied.
It’s an old adage in debugging code: “put in a print statement”. But it’s more elegant in the Wolfram Language, thanks particularly to Echo. It’s a simple idea: Echo[expr] “echoes” (i.e. prints) the value of expr, but then returns that value. So the result is that you can put Echo anywhere into your code (often as Echo@…) without affecting what your code does.
In Version 12.2 there are some new functions that follow the “Echo” pattern. A first example is EchoLabel, which just adds a label to what’s echoed:
✕
EchoLabel["a"]@5! + EchoLabel["b"]@10! 
Aficionados might wonder why EchoLabel is needed. After all, Echo itself allows a second argument that can specify a label. The answer—and yes, it’s a mildly subtle piece of language design—is that if one’s going to just insert Echo as a function to apply (say with @), then it can only have one argument, so no label. EchoLabel is set up to have the operator form EchoLabel[label] so that EchoLabel[label][expr] is equivalent to Echo[expr,label].
Another new “echo function” in 12.2 is EchoTiming, which displays the timing (in seconds) of whatever it evaluates:
✕
Table[Length[EchoTiming[Permutations[Range[n]]]], {n, 8, 10}] 
It’s often helpful to use both Echo and EchoTiming:
✕
Length[EchoTiming[Permutations[Range[Echo@10]]]] 
And, by the way, if you always want to print evaluation time (just like Mathematica 1.0 did by default 32 years ago) you can always globally set $Pre=EchoTiming.
Another new “echo function” in 12.2 is EchoEvaluation which echoes the “before” and “after” for an evaluation:
✕
EchoEvaluation[2 + 2] 
You might wonder what happens with nested EchoEvaluation’s. Here’s an example:
✕
EchoEvaluation[ Accumulate[EchoEvaluation[Reverse[EchoEvaluation[Range[10]]]]]] 
By the way, it’s quite common to want to use both EchoTiming and EchoEvaluation:
✕
Table[EchoTiming@EchoEvaluation@FactorInteger[2^(50 n) - 1], {n, 2}] 
Finally, if you want to leave echo functions in your code, but want your code to “run quiet”, you can use the new QuietEcho to “quiet” all the echoes (like Quiet “quiets” messages):
✕
QuietEcho@ Table[EchoTiming@EchoEvaluation@FactorInteger[2^(50 n) - 1], {n, 2}] 
Did something go wrong inside your program? And if so, what should the program do? It can be possible to write very elegant code if one ignores such things. But as soon as one starts to put in checks, and has logic for unwinding things if something goes wrong, it’s common for the code to get vastly more complicated, and vastly less readable.
What can one do about this? Well, in Version 12.2 we’ve developed a high-level symbolic mechanism for handling things going wrong in code. Basically the idea is that you insert Confirm (or related functions)—a bit like you might insert Echo—to “confirm” that something in your program is doing what it should. If the confirmation works, then your program just keeps going. But if it fails, then the program stops, and exits to the nearest enclosing Enclose. In a sense, Enclose “encloses” regions of your program, not letting anything that goes wrong inside immediately propagate out.
Let’s see how this works in a simple case. Here the Confirm successfully “confirms” y, just returning it, and the Enclose doesn’t really do anything:
✕
Enclose[f[x, Confirm[y], z]] 
But now let’s put $Failed in place of y. $Failed is something that Confirm by default considers to be a problem. So when it sees $Failed, it stops, exiting to the Enclose—which in turn yields a Failure object:
✕
Enclose[f[x, Confirm[$Failed], z]] 
If we put in some echoes, we’ll see that x is successfully reached, but z is not; as soon as the Confirm fails, it stops everything:
✕
Enclose[f[Echo[x], Confirm[$Failed], Echo[z]]] 
A very common thing is to want to use Confirm/Enclose when you define a function:
✕
addtwo[x_] := Enclose[Confirm[x] + 2] 
With argument 5, everything just works:
✕
addtwo[5] 
But if we instead use Missing[]—which Confirm by default considers to be a problem—we get back a Failure object:
✕
addtwo[Missing[]] 
We could achieve the same thing with If, Return, etc. But even in this very simple case, it wouldn’t look as nice.
Confirm has a certain default set of things that it considers “wrong” ($Failed, Failure[...], Missing[...] are examples). But there are related functions that allow you to specify particular tests. For example, ConfirmBy applies a function to test if an expression should be confirmed.
Here, ConfirmBy confirms that 2 is a number:
✕
Enclose[f[1, ConfirmBy[2, NumberQ], 3]] 
But x is not considered so by NumberQ:
✕
Enclose[f[1, ConfirmBy[x, NumberQ], 3]] 
OK, so let’s put these pieces together. Let’s define a function that’s supposed to operate on strings:
✕
world[x_] := Enclose[ConfirmBy[x, StringQ] <> " world!"] 
If we give it a string, all is well:
✕
world["hello"] 
But if we give it a number instead, the ConfirmBy fails:
✕
world[4] 
But here’s where really nice things start to happen. Let’s say we want to map world over a list, always confirming that it gets a good result. Here everything is OK:
✕
Enclose[Confirm[world[#]] & /@ {"a", "b", "c"}] 
But now something has gone wrong:
✕
Enclose[Confirm[world[#]] & /@ {"a", "b", 3}] 
The ConfirmBy inside the definition of world failed, causing its enclosing Enclose to produce a Failure object. Then this Failure object caused the Confirm inside the Map to fail, and the enclosing Enclose gave a Failure object for the whole thing. Once again, we could have achieved the same thing with If, Throw, Catch, etc. But Confirm/Enclose do it more robustly, and more elegantly.
These are all very small examples. But where Confirm/Enclose really show their value is in large programs, and in providing a clear, highlevel framework for handling errors and exceptions, and defining their scope.
In addition to Confirm and ConfirmBy, there’s also ConfirmMatch, which confirms that an expression matches a specified pattern. Then there’s ConfirmQuiet, which confirms that the evaluation of an expression doesn’t generate any messages (or, at least, none that you told it to test for). There’s also ConfirmAssert, which simply takes an “assertion” (like p>0) and confirms that it’s true.
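As a sketch of how these variants can be used (my own examples, patterned on the Confirm and ConfirmBy calls above):
✕
Enclose[{ConfirmMatch[{1, 2, 3}, {__Integer}], ConfirmAssert[2 > 1]}] 
If the pattern fails to match, or the assertion isn’t true, the evaluation exits to the enclosing Enclose just as before.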
When a confirmation fails, the program always exits to the nearest enclosing Enclose, delivering to the Enclose a Failure object with information about the failure that occurred. When you set up the Enclose, you can tell it how to handle failure objects it receives—either just returning them (perhaps to enclosing Confirm’s and Enclose’s), or applying functions to their contents.
Confirm and Enclose provide an elegant mechanism for handling errors, that are easy and clean to insert into programs. But—needless to say—there are definitely some tricky issues around them. Let me mention just one. The question is: which Confirm’s does a given Enclose really enclose? If you’ve written a piece of code that explicitly contains Enclose and Confirm, it’s pretty obvious. But what if there’s a Confirm that’s somehow generated—perhaps dynamically—deep inside some stack of functions? It’s similar to the situation with named variables. Module just looks for the variables directly (“lexically”) inside its body. Block looks for variables (“dynamically”) wherever they may occur. Well, Enclose by default works like Module, “lexically” looking for Confirm’s to enclose. But if you include tags in Confirm and Enclose, you can set them up to “find each other” even if they’re not explicitly “visible” in the same piece of code.
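As a sketch of how such tags might be set up (the exact argument positions for the tag here are my assumption; check the documentation for the precise forms): give the Confirm a tag as an extra argument, and give the same tag to the Enclose that should catch it, even when the Confirm is buried inside another function:
✕
g[] := Confirm[$Failed, "went wrong", "mytag"]; Enclose[g[], Identity, "mytag"] 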
Confirm/Enclose provide a good high-level way to handle the “flow” of things going wrong inside a program or a function. But what if there’s something wrong right at the get-go? In our built-in Wolfram Language functions, there’s a standard set of checks we apply. Are there the correct number of arguments? If there are options, are they allowed options, and are they in the correct place? In Version 12.2 we’ve added two functions that can perform these standard checks for functions you write.
This says that f should have two arguments, which here it doesn’t:
✕
CheckArguments[f[x, y, z], 2] 
Here’s a way to make CheckArguments part of the basic definition of a function:
✕
f[args___] := Null /; CheckArguments[f[args], 2] 
Give it the wrong number of arguments, and it’ll generate a message, and then return unevaluated, just like lots of built-in Wolfram Language functions do:
✕
f[7] 
ArgumentsOptions is another new function in Version 12.2, which separates “positional arguments” from options in a function. First, set up options for a function:
✕
Options[f] = {opt -> Automatic}; 
This expects one positional argument, which it finds:
✕
ArgumentsOptions[f[x, opt -> 7], 1] 
If it doesn’t find exactly one positional argument, it generates a message:
✕
ArgumentsOptions[f[x, y], 1] 
You run a piece of code and it does what it does—and typically you don’t want it to leave anything behind. Often you can use scoping constructs like Module, Block, BlockRandom, etc. to achieve this. But sometimes there’ll be something you set up that needs to be explicitly “cleaned up” when your code finishes.
For example, you might create a file in your piece of code, and want the file removed when that particular piece of code finishes. In Version 12.2 there’s a convenient new function for managing things like this: WithCleanup.
WithCleanup[expr, cleanup] evaluates expr, then cleanup—but returns the result from expr. Here’s a trivial example (which could really be achieved better with Block). You’re assigning a value to x, getting its square—then clearing x before returning the square:
✕
WithCleanup[x = 7; x^2, Clear[x]] 
It’s already convenient just to have a construct that does cleanup while still returning the main expression you were evaluating. But an important detail of WithCleanup is that it also handles the situation where you abort the main evaluation you were doing. Normally, issuing an abort would cause everything to stop. But WithCleanup is set up to make sure that the cleanup happens even if there’s an abort. So if the cleanup involves, for example, deleting a file, the file gets deleted, even if the main operation is aborted.
WithCleanup also allows an initialization to be given. So here the initialization is done, as is the cleanup, but the main evaluation is aborted:
✕
WithCleanup[Echo[1], Abort[]; Echo[2], Echo[3]] 
By the way, WithCleanup can also be used with Confirm/Enclose to ensure that even if a confirmation fails, certain cleanup will be done.
It’s December 16, 2020, today—at least according to the standard Gregorian calendar that’s usually used in the US. But there are many other calendar systems in use for various purposes around the world, and even more that have been used at one time or another historically.
In earlier versions of Wolfram Language we supported a few common calendar systems. But in Version 12.2 we’ve added very broad support for calendar systems—altogether 41 of them. One can think of calendar systems as being a bit like projections in geodesy or coordinate systems in geometry. You have a certain time: now you have to know how it is represented in whatever system you’re using. And much like GeoProjectionData, there’s now CalendarData which can give you a list of available calendar systems:
✕
CalendarData["DateCalendar"] 
So here’s the representation of “now” converted to different calendars:
✕
CalendarConvert[Now, #] & /@ CalendarData["DateCalendar"] 
There are many subtleties here. Some calendars are purely “arithmetic”; others rely on astronomical computations. And then there’s the matter of “leap variants”. With the Gregorian calendar, we’re used to just adding a February 29. But the Chinese calendar, for example, can add whole “leap months” within a year (so that, for example, there can be two “fourth months”). In the Wolfram Language, we now have a symbolic representation for such things, using LeapVariant:
✕
DateObject[{72, 25, LeapVariant[4], 20}, CalendarType -> "Chinese"] 
One reason to deal with different calendar systems is that they’re used to determine holidays and festivals in different cultures. (Another reason, particularly relevant to someone like me who studies history quite a bit, is in the conversion of historical dates: Newton’s birthday was originally recorded as December 25, 1642, but converting it to a Gregorian date it’s January 4, 1643.)
Given a calendar, something one often wants to do is to select dates that satisfy a particular criterion. And in Version 12.2 we’ve introduced the function DateSelect to do this. So, for example, we can select dates within a particular interval that satisfy the criterion that they are Wednesdays:
✕
DateSelect[DateInterval[{{{2020, 4, 1}, {2020, 4, 30}}}, "Day", "Gregorian", 5.], #DayName == Wednesday &] 
As a more complicated example, we can convert the current algorithm for selecting dates of US presidential elections to computable form, and then use it to determine dates for the next 50 years:
✕
DateSelect[DateInterval[{{2020}, {2070}}, "Day"], Divisible[#Year, 4] && #Month == 11 && #DayName == Tuesday && Or[#DayNameInstanceInMonth == 1 && #Day =!= 1, #DayNameInstanceInMonth == 2 && #Day == 8] &] 
By now, the Wolfram Language has strong capabilities in geo computation and geo visualization. But we’re continuing to expand our geo functionality. In Version 12.2 an important addition is spatial statistics (mentioned above)—which is fully integrated with geo. But there are also a couple of new geo primitives. One is GeoBoundary, which computes boundaries of things:
✕
GeoBoundary[CloudGet["https://wolfr.am/ROWGPJ4I"]] 
✕
GeoLength[%] 
There’s also GeoPolygon, which is a full geo generalization of ordinary polygons. One of the tricky issues GeoPolygon has to handle is what counts as the “interior” of a polygon on the Earth. Here it’s picking the larger area (i.e. the one that wraps around the globe):
✕
GeoGraphics[ GeoPolygon[{{50, 70}, {30, 90}, {70, 50}}, "LargerArea"]] 
GeoPolygon can also—like Polygon—handle holes, or in fact arbitrary levels of nesting:
✕
GeoGraphics[ GeoPolygon[ Entity["AdministrativeDivision", {"Illinois", "UnitedStates"}] -> Entity["AdministrativeDivision", {"ChampaignCounty", "Illinois", "UnitedStates"}]]] 
But the biggest “coming attraction” of geo is completely new rendering of geo graphics and maps. It’s still preliminary (and unfinished) in Version 12.2, but there’s at least experimental support for vector-based map rendering. The most obvious payoff from this is maps that look much crisper and sharper at all scales. But another payoff is our ability to introduce new styling for maps, and in Version 12.2 we’re including eight new map styles.
Here’s our “old-style” map:
✕
GeoGraphics[Entity["Building", "EiffelTower::5h9w8"], GeoRange -> Quantity[400, "Meters"]] 
Here’s the new, vector version of this “classic” style:
✕
GeoGraphics[Entity["Building", "EiffelTower::5h9w8"], GeoBackground -> "VectorClassic", GeoRange -> Quantity[400, "Meters"]] 
Here’s a new (vector) style, intended for the web:
✕
GeoGraphics[Entity["Building", "EiffelTower::5h9w8"], GeoBackground -> "VectorWeb", GeoRange -> Quantity[400, "Meters"]] 
And here’s a “dark” style, suitable for having information overlaid on it:
✕
GeoGraphics[Entity["Building", "EiffelTower::5h9w8"], GeoBackground -> "VectorDark", GeoRange -> Quantity[400, "Meters"]] 
Want to analyze a document that’s in PDF? We’ve been able to extract basic content from PDF files for well over a decade. But PDF is a highly complex (and evolving) format, and many documents “in the wild” have complicated structures. In Version 12.2, however, we’ve dramatically expanded our PDF import capabilities, so that it becomes realistic to, for example, take a random paper from arXiv, and import it:
✕
Import["https://arxiv.org/pdf/2011.12174.pdf"] 
By default, what you’ll get is a high-resolution image for each page (in this particular case, all 100 pages).
If you want the text, you can import that with "Plaintext":
✕
Import["https://arxiv.org/pdf/2011.12174.pdf", "Plaintext"] 
Now you can immediately make a word cloud of the words in the paper:
✕
WordCloud[%] 
This picks out all the images from the paper, and makes a collage of them:
✕
ImageCollage[Import["https://arxiv.org/pdf/2011.12174.pdf", "Images"]] 
You can get the URLs from each page:
✕
Import["https://arxiv.org/pdf/2011.12174.pdf", "URLs"] 
Now pick off the last two, and get images of those webpages:
✕
WebImage /@ Take[Flatten[Values[%]], 2] 
Depending on how they’re produced, PDFs can have all sorts of structure. "ContentsGraph" gives a graph representing the overall structure detected for a document:
✕
Import["https://arxiv.org/pdf/2011.12174.pdf", "ContentsGraph"] 
And, yes, it really is a graph:
✕
Graph[EdgeList[%]] 
For PDFs that are fillable forms, there’s more structure to import. Here I grabbed a random unfilled government form from the web. Import gives an association whose keys are the names of the fields—and if the form had been filled in, it would have given their values too, so you could immediately do analysis on them:
✕
Import["https://www.fws.gov/forms/320041.pdf", "FormFieldRules"] 
Starting in Version 12.0, we’ve been adding state-of-the-art capabilities for solving large-scale optimization problems. In Version 12.2 we’ve continued to round out these capabilities.
One new thing is the superfunction ConvexOptimization, which automatically handles the full spectrum of linear, linear-fractional, quadratic, semidefinite and conic optimization—giving both optimal solutions and their dual properties. In 12.1 we added support for integer variables (i.e. combinatorial optimization); in 12.2 we’re also adding support for complex variables.
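As a simple sketch of the superfunction in action (my own example, not from the release notes): minimize a linear function over the unit disk:
✕
ConvexOptimization[x + y, {x^2 + y^2 <= 1}, {x, y}] 
This should return rules for the optimal x and y, here around x = y = -1/Sqrt[2].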
But the biggest new things for optimization in 12.2 are the introduction of robust optimization and of parametric optimization. Robust optimization lets you find an optimum that’s valid across a whole range of values of some of the variables. Parametric optimization lets you get a parametric function that gives the optimum for any possible value of particular parameters. So for example this finds the optimum for x, y for any (positive) value of α:
✕
ParametricConvexOptimization[(x - 1)^2 + Abs[y], {(x + \[Alpha])^2 <= 1, x + y >= \[Alpha]}, {x, y}, {\[Alpha]}] 
Now evaluate the parametric function for a particular α:
✕
%[.76] 
As with everything in the Wolfram Language, we’ve put a lot of effort into making sure that convex optimization integrates seamlessly into the rest of the system—so you can set up models symbolically, and flow their results into other functions. We’ve also included some very powerful convex optimization solvers. But particularly if you’re doing mixed (i.e. real+integer) optimization, or you’re dealing with really huge (e.g. 10 million variables) problems, we’re also giving access to other, external solvers. So, for example, you can set up your problem using Wolfram Language as your “algebraic modeling language”, then (assuming you have the appropriate external licenses) just by setting Method to, say, “Gurobi” or “Mosek” you can immediately run your problem with an external solver. (And, by the way, we now have an open framework for adding more solvers.)
One can say that the whole idea of symbolic expressions (and their transformations) on which we rely so much in the Wolfram Language originated with combinators—which just celebrated their centenary on December 7, 2020. The version of symbolic expressions that we have in Wolfram Language is in many ways vastly more advanced and usable than raw combinators. But in Version 12.2—partly by way of celebrating combinators—we wanted to add a framework for raw combinators.
So now for example we have CombinatorS, CombinatorK, etc., rendered appropriately:
✕
CombinatorS[CombinatorK] 
But how should we represent the application of one combinator to another? Today we write something like:
✕
f@g@h@x 
But in the early days of mathematical logic there was a different convention—that involved left-associative application, in which one expected “combinator style” to generate “functions” not “values” from applying functions to things. So in Version 12.2 we’re introducing a new “application operator” Application, displayed as ∙ (and entered as \[Application] or Esc ap Esc):
✕
Application[f, Application[g, Application[h, x]]] 
✕
Application[Application[Application[f, g], h], x] 
And, by the way, I fully expect Application—as a new, basic “constructor”—to have a variety of uses (not to mention “applications”) in setting up general structures in the Wolfram Language.
The rules for combinators are trivial to specify using pattern transformations in the Wolfram Language:
✕
{CombinatorS\[Application]x_\[Application]y_\[Application]z_ :> x\[Application]z\[Application](y\[Application]z), CombinatorK\[Application]x_\[Application]y_ :> x} 
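Applying these rules repeatedly then actually runs the combinators. For instance, the classic identity S K K applied to x should reduce to x (a sketch of my own, writing the left-associative applications out explicitly):
✕
Application[Application[Application[CombinatorS, CombinatorK], CombinatorK], x] //. {Application[Application[Application[CombinatorS, x_], y_], z_] :> Application[Application[x, z], Application[y, z]], Application[Application[CombinatorK, x_], y_] :> x} 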
But one can also think about combinators more “algebraically” as defining relations between expressions—and there’s now a theory in AxiomaticTheory for that.
And in 12.2 a few more other theories have been added to AxiomaticTheory, as well as several new properties.
One of the major advances in Version 12.0 was the introduction of a symbolic representation for Euclidean geometry: you specify a symbolic GeometricScene, giving a variety of objects and constraints, and the Wolfram Language can “solve” it, and draw a diagram of a random instance that satisfies the constraints. In Version 12.2 we’ve made this interactive, so you can move the points in the diagram around, and everything will (if possible) interactively be rearranged so as to maintain the constraints.
Here’s a random instance of a simple geometric scene:
✕
RandomInstance[ GeometricScene[{a, b, c, d}, {CircleThrough[{a, b, c}, d], Triangle[{a, b, c}], d == Midpoint[{a, c}]}]] 
If you move one of the points, the other points will interactively be rearranged so as to maintain the constraints defined in the symbolic representation of the geometric scene:
✕
RandomInstance[ GeometricScene[{a, b, c, d}, {CircleThrough[{a, b, c}, d], Triangle[{a, b, c}], d == Midpoint[{a, c}]}]] 
What’s really going on inside here? Basically, the geometry is getting converted to algebra. And if you want, you can get the algebraic formulation:
✕
%["AlgebraicFormulation"] 
And, needless to say, you can manipulate this using the many powerful algebraic computation capabilities of the Wolfram Language.
In addition to interactivity, another major new feature in 12.2 is the ability to handle not just complete geometric scenes, but also geometric constructions that involve building up a scene in multiple steps. Here’s an example—that happens to be taken directly from Euclid:
✕
RandomInstance[GeometricScene[{{\[FormalCapitalA], \[FormalCapitalB], \[FormalCapitalC], \[FormalCapitalD], \[FormalCapitalE], \[FormalCapitalF]}, {}}, {GeometricStep[{Line[{\[FormalCapitalA], \[FormalCapitalB]}], Line[{\[FormalCapitalA], \[FormalCapitalC]}]}, "Define an arbitrary angle BAC."], GeometricStep[{\[FormalCapitalD] \[Element] Line[{\[FormalCapitalA], \[FormalCapitalB]}], \[FormalCapitalE] \[Element] Line[{\[FormalCapitalA], \[FormalCapitalC]}], EuclideanDistance[\[FormalCapitalA], \[FormalCapitalD]] == EuclideanDistance[\[FormalCapitalA], \[FormalCapitalE]]}, "Put D and E on AB and AC equidistant from A."], GeometricStep[{Line[{\[FormalCapitalD], \[FormalCapitalE]}], GeometricAssertion[{\[FormalCapitalA], \[FormalCapitalF]}, {"OppositeSides", Line[{\[FormalCapitalD], \[FormalCapitalE]}]}], GeometricAssertion[Triangle[{\[FormalCapitalE], \[FormalCapitalF], \[FormalCapitalD]}], "Equilateral"], Line[{\[FormalCapitalA], \[FormalCapitalF]}]}, "Construct an equilateral triangle on DE."]}]] 
The first image you get is basically the result of the construction. And—like all other geometric scenes—it’s now interactive. But if you mouse over it, you’ll get controls that allow you to move to earlier steps:
Move a point at an earlier step, and you’ll see what consequences that has for later steps in the construction.
Euclid’s geometry is the very first axiomatic system for mathematics that we know about. So—2000+ years later—it’s exciting that we can finally make it computable. (And, yes, it will eventually connect up with AxiomaticTheory, FindEquationalProof, etc.)
But in recognition of the significance of Euclid’s original formulation of geometry, we’ve added computable versions of his propositions (as well as a bunch of other “famous geometric theorems”). The example above turns out to be proposition 9 in Euclid’s book 1. And now, for example, we can get his original statement of it in Greek:
✕
Entity["GeometricScene", "EuclidBook1Proposition9"]["GreekStatement"] 
And here it is in modern Wolfram Language—in a form that can be understood by both computers and humans:
✕
Entity["GeometricScene", "EuclidBook1Proposition9"]["Scene"] 
An important part of the story of Wolfram Language as a fullscale computational language is its access to our vast knowledgebase of data about the world. The knowledgebase is continually being updated and expanded, and indeed in the time since Version 12.1 essentially all domains have had data (and often a substantial amount) updated, or entities added or modified.
But as examples of what’s been done, let me mention a few additions. One area that’s received a lot of attention is food. By now we have data about more than half a million foods (by comparison, a typical large grocery store stocks perhaps 30,000 types of items). Pick a random food:
✕
RandomEntity["Food"] 
Now generate a nutrition label:
✕
%["NutritionLabel"] 
As another example, a new type of entity that’s been added is physical effects. Here are some random ones:
✕
RandomEntity["PhysicalEffect", 10] 
And as an example of something that can be done with all the data in this domain, here’s a histogram of the dates when these effects were discovered:
✕
DateHistogram[EntityValue["PhysicalEffect", "DiscoveryDate"], "Year", PlotRange -> {{DateObject[{1700}, "Year", "Gregorian", 5.`], DateObject[{2000}, "Year", "Gregorian", 5.`]}, Automatic}] 
As another sample of what we’ve been up to, there’s also now what one might (tongue-in-cheek) call a “heavy-lifting” domain—weight-training exercises:
Entity["WeightTrainingExercise", "BenchPress"]["Dataset"] 
An important feature of the Wolfram Knowledgebase is that it contains symbolic objects, which can represent not only “plain data”—like numbers or strings—but full computational content. And as an example of this, Version 12.2 allows one to access the Wolfram Demonstrations Project—with all its active Wolfram Language code and notebooks—directly in the knowledgebase. Here are some random Demonstrations:
RandomEntity["WolframDemonstration", 5] 
The values of properties can be dynamic interactive objects:
Entity["WolframDemonstration", "MooreSpiegelAttractor"]["Manipulate"] 
And because everything is computable, one can for example immediately make an image collage of all Demonstrations on a particular topic:
ImageCollage[ EntityValue[ EntityClass["WolframDemonstration", "ChemicalEngineering"], "Thumbnail"]] 
It’s been nearly 7 years since we first introduced Classify and Predict, and began the process of fully integrating neural networks into the Wolfram Language. There’ve been two major directions: the first is to develop “superfunctions”, like Classify and Predict, that—as automatically as possible—perform machine-learning-based operations. The second direction is to provide a powerful symbolic framework to take advantage of the latest advances with neural nets (notably through the Wolfram Neural Net Repository) and to allow flexible continued development and experimentation.
Version 12.2 has progress in both these areas. An example of a new superfunction is FaceRecognize. Give it a small number of tagged examples of faces, and it will try to identify them in images, videos, etc. Let’s get some training data from web searches (and, yes, it’s somewhat noisy):
faces = Image[#, ImageSize -> 30] & /@ AssociationMap[Flatten[FindFaces[#, "Image"] & /@ WebImageSearch["star trek " <> #]] &, {"Jean-Luc Picard", "William Riker", "Phillipa Louvois", "Data"}]
Now create a face recognizer with this training data:
recognizer = FaceRecognize[faces] 
Now we can use this to find out who’s on screen in each frame of a video:
VideoMapList[recognizer[FindFaces[#Image, "Image"]] &, Video[URLDownload["https://ia802900.us.archive.org/7/items/2000promoforstartrekthenextgeneration/2000%20promo%20for%20Star%20Trek%20%20The%20Next%20Generation.ia.mp4"]]] /. m_Missing :> "Other"
Now plot the results:
ListPlot[Catenate[MapIndexed[{First[#2], #1} &, ArrayComponents[%], {2}]], ColorFunction -> ColorData["Rainbow"], Ticks -> {None, Thread[{Range[Max[ArrayComponents[rec]]], DeleteDuplicates[Flatten[rec]]}]}]
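ArrayComponents here replaces each distinct recognized name with a small integer index so the results can be plotted. Outside the Wolfram Language, that label-to-index step can be sketched in Python (this sketch uses first-appearance order, whereas ArrayComponents chooses its own canonical ordering):

```python
def label_components(frames):
    """Map each distinct label to a small integer index, in order of first appearance."""
    index = {}
    coded = []
    for label in frames:
        if label not in index:
            index[label] = len(index) + 1  # 1-based indices, as in ArrayComponents
        coded.append(index[label])
    return coded, index

frames = ["Data", "Data", "Other", "Picard", "Data"]
coded, index = label_components(frames)
# coded -> [1, 1, 2, 3, 1]
```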
In the Wolfram Neural Net Repository there’s a regular stream of new networks being added. Since Version 12.1 about 20 new kinds of networks have been added—including many new transformer nets, as well as EfficientNet and for example feature extractors like BioBERT and SciBERT specifically trained on text from scientific papers.
In each case, the networks are immediately accessible—and usable—through NetModel. Something that’s updated in Version 12.2 is the visual display of networks:
NetModel["ELMo Contextual Word Representations Trained on 1B Word Benchmark"]
There are lots of new icons, but there’s also now a clear convention that circles represent fixed elements of a net, while squares represent trainable ones. In addition, when there’s a thick border in an icon, it means there’s an additional network inside, that you can see by clicking.
Whether it’s a network that comes from NetModel or one you construct yourself (or a combination of those two), it’s often convenient to extract the “summary graphic” for the network, for example so you can put it in documentation or a publication. Information provides several levels of summary graphics:
Information[ NetModel["CapsNet Trained on MNIST Data"], "SummaryGraphic"] 
There are several important additions to our core neural net framework that broaden the range of neural net functionality we can access. The first is that in Version 12.2 we have native encoders for graphs and for time series. So, here, for example, we’re making a feature space plot of 20 random named graphs:
FeatureSpacePlot[GraphData /@ RandomSample[GraphData[], 20]] 
Another enhancement to the framework has to do with diagnostics for models. We introduced PredictorMeasurements and ClassifierMeasurements many years ago to provide a symbolic representation for the performance of models. In Version 12.2—in response to many requests—we’ve made it possible to feed final predictions, rather than a model, to create a PredictorMeasurements object, and we’ve streamlined the appearance and operation of PredictorMeasurements objects:
PredictorMeasurements[{3.2, 3.5, 4.6, 5}, {3, 4, 5, 6}] 
An important new feature of ClassifierMeasurements is the ability to compute a calibration curve that compares the actual probabilities observed from sampling a test set with the predictions from the classifier. But what’s even more important is that Classify automatically calibrates its probabilities, in effect trying to “sculpt” the calibration curve:
Row[{First@ClassifierMeasurements[Classify[training, Method -> "RandomForest", "Calibration" -> False], test, "CalibrationCurve"], " \[LongRightArrow] ", First@ClassifierMeasurements[Classify[training, Method -> "RandomForest", "Calibration" -> True], test, "CalibrationCurve"]}]
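Behind a calibration curve is a simple computation: bin the classifier’s predicted probabilities, and compare each bin’s mean prediction with the observed fraction of positive examples in that bin. As a language-neutral illustration (not Wolfram Language code, and not how ClassifierMeasurements is implemented internally), here’s a minimal Python sketch assuming binary 0/1 labels:

```python
def calibration_curve(probs, labels, bins=10):
    """For each probability bin, return (mean predicted prob, observed positive rate)."""
    buckets = [[] for _ in range(bins)]
    for p, y in zip(probs, labels):
        i = min(int(p * bins), bins - 1)  # clamp p == 1.0 into the last bin
        buckets[i].append((p, y))
    curve = []
    for bucket in buckets:
        if bucket:  # skip empty bins
            mean_p = sum(p for p, _ in bucket) / len(bucket)
            frac_pos = sum(y for _, y in bucket) / len(bucket)
            curve.append((mean_p, frac_pos))
    return curve
```

A perfectly calibrated classifier puts every point on the diagonal, with the observed positive rate matching the mean predicted probability in each bin.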
Version 12.2 also has the beginning of a major update to the way neural networks can be constructed. The fundamental setup has always been to put together a certain collection of layers that expose what amount to array indices that are connected by explicit edges in a graph. Version 12.2 now introduces FunctionLayer, which allows you to give something much closer to ordinary Wolfram Language code. As an example, here’s a particular function layer:
FunctionLayer[2*(#v . #m . {0.25, 0.75}) . NetArray[<|"Array" -> {0.1, 0.9}|>] &]
And here’s the representation of this function layer as an explicit NetGraph:
NetGraph[%] 
v and m are named “input ports”. The NetArray—indicated by the square icons in the net graph—is a learnable array, here containing just two elements.
There are cases where it’s easier to use the “block-based” (or “graphical”) programming approach of just connecting together layers (and we’ve worked hard to ensure that the connections can be made as automatically as possible). But there are also cases where it’s easier to use the “functional” programming approach of FunctionLayer. For now, FunctionLayer supports only a subset of the constructs available in the Wolfram Language—though this already includes many standard array and functional programming operations, and more will be added in the future.
An important feature of FunctionLayer is that the neural net it produces will be as efficient as any other neural net, and can run on GPUs etc. But what can you do about Wolfram Language constructs that are not yet natively supported by FunctionLayer? In Version 12.2 we’re adding another new experimental function—CompiledLayer—that extends the range of Wolfram Language code that can be handled efficiently.
It’s perhaps worth explaining a bit about what’s happening inside. Our main neural net framework is essentially a symbolic layer that organizes things for optimized low-level implementation, currently using MXNet. FunctionLayer is effectively translating certain Wolfram Language constructs directly to MXNet. CompiledLayer is translating Wolfram Language to LLVM and then to machine code, and inserting this into the execution process within MXNet. CompiledLayer makes use of the new Wolfram Language compiler, and its extensive type inference and type declaration mechanisms.
OK, so let’s say one’s built a magnificent neural net in our Wolfram Language framework. Everything is set up so that the network can immediately be used in a whole range of Wolfram Language superfunctions (Classify, FeatureSpacePlot, AnomalyDetection, FindClusters, …). But what if one wants to use the network “standalone” in an external environment? In Version 12.2 we’re introducing the capability to export essentially any network in the recently developed ONNX standard representation.
And once one has a network in ONNX form, one can use the whole ecosystem of external tools to deploy it in a wide variety of environments. A notable example—that’s now a fairly streamlined process—is to take a full Wolfram Language–created neural net and run it in CoreML on an iPhone, so that it can for example directly be included in a mobile app.
What’s the best way to collect structured material? If you just want to get a few items, an ordinary form created with FormFunction (and for example deployed in the cloud) can work well. But what if you’re trying to collect longer, richer material?
For example, let’s say you’re creating a quiz where you want students to enter a whole sequence of complex responses. Or let’s say you’re creating a template for people to fill in documentation for something. What you need in these cases is a new concept that we’re introducing in Version 12.2: form notebooks.
A form notebook is basically a notebook that is set up to be used as a complex “form”, where the inputs in the form can be all the kinds of things that you’re used to having in a notebook.
The basic workflow for form notebooks is the following. First you author a form notebook, defining the various “form elements” (or areas) that you want the user of the form notebook to fill in. As part of the authoring process, you define what you want to have happen to the material the user of the form notebook enters when they use the form notebook (e.g. put the material in a Wolfram Data Drop databin, send the material to a cloud API, send the material as a symbolic expression by email, etc.).
After you’ve authored the form notebook, you then generate an active version that can be sent to whoever will be using the form notebook. Once someone has filled in their material in their copy of the deployed form notebook, they press a button, typically “Submit”, and their material is then sent as a structured symbolic expression to whatever destination the author of the form notebook specified.
It’s perhaps worth mentioning how form notebooks relate to something that sounds similar: template notebooks. In a sense, a template notebook is doing the reverse of a form notebook. A form notebook is about having a user enter material that will then be processed. A template notebook, on the other hand, is about having the computer generate material which will then be used to populate a notebook whose structure is defined by the template notebook.
OK, so how do you get started with form notebooks? Just go to File > New > Programmatic Notebook > Form Notebook Authoring:
This is just a notebook, where you can enter whatever content you want—say an explanation of what you want people to do when they “fill out” the form notebook. But then there are special cells or sequences of cells in the form notebook that we call “form elements” and “editable notebook areas”. These are what the user of the form notebook “fills out” to enter their “responses”, and the material they provide is what gets sent when they press the “Submit” button (or whatever final action has been defined).
In the authoring notebook, the toolbar gives you a menu of possible form elements that you can insert:
Let’s pick Input Field as an example:
What does all this mean? Basically a form element is represented by a very flexible symbolic Wolfram Language expression, and this is giving you a way to specify the expression you want. You can give a label and a hint to put in the input field. But it’s with the Interpreter that you start to see the power of Wolfram Language. Because the Interpreter is what takes whatever the user of the form notebook enters in this input field, and interprets it as a computable object. The default is just to treat it as a string. But it could for example be a “Country” or a “MathExpression”. And with these choices, the material will automatically be interpreted as a country, math expression, etc., with the user typically being prompted if their input can’t be interpreted as specified.
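The core of the Interpreter idea—take raw text a user typed, coerce it into a typed, computable value, and fail gracefully when it can’t be interpreted—can be illustrated outside the Wolfram Language with a small Python sketch. The kind names here are made up for illustration; they are not Interpreter’s actual type specifications:

```python
def interpret(text, kind="String"):
    """Coerce raw form input into a typed value; return None when it can't be interpreted."""
    text = text.strip()
    if kind == "String":
        return text  # the default: accept the text as-is
    if kind == "Number":
        try:
            return float(text)
        except ValueError:
            return None  # a real form would prompt the user to re-enter
    if kind == "Boolean":
        lowered = text.lower()
        if lowered in ("yes", "true"):
            return True
        if lowered in ("no", "false"):
            return False
        return None
    raise ValueError(f"unknown interpreter kind: {kind}")
```

In the real system the interpretation step is far richer—"Country" or "MathExpression" invoke full natural-language understanding—but the contract is the same: text in, typed object (or a re-prompt) out.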
There are lots of options about the details of how even an input field can work. Some of them are provided in the Add Action menu:
But so what actually “is” this form element? Press the CODE tab on the left to see:
What would a user of the form notebook see here? Press the PREVIEW tab to find out:
Beyond input fields, there are lots of other possible form elements. There are things like checkboxes, radio buttons and sliders. And in general it’s possible to use any of the rich symbolic user interface constructs that exist in the Wolfram Language.
Once you’ve finished authoring, you press Generate to generate a form notebook that is ready to be provided to users to be filled in. The Settings define things like how the “submit” action should be specified, and what should be done when the form notebook is submitted:
So what is the “result” of a submitted form notebook? Basically it’s an association that says what was filled into each area of the form notebook. (The areas are identified by keys in the association that were specified when the areas were first defined in the authoring notebook.)
Let’s see how this works in a simple case. Here’s the authoring notebook for a form notebook:
Here’s the generated form notebook, ready to be filled in (assuming you have 12.2):
Here’s a sample of how the form notebook might be filled in:
And this is what “comes back” when Submit is pressed:
For testing, you can just have this association placed interactively in a notebook. But in practice it’s more common to send the association to a databin, store it in a cloud object, or generally put it in a more “centralized” location.
Notice that at the end of this example we have an editable notebook area—where you can enter freeform notebook content (with cells, headings, code, output, etc.) that will all be captured when the form notebook is submitted.
Form notebooks are a very powerful idea, and you’ll see them used all over the place. As a first example, the various submission notebooks for the Wolfram Function Repository, Wolfram Demonstrations Project, etc. are becoming form notebooks. We’re also expecting a lot of use of form notebooks in educational settings. And as part of that, we’re building a system that leverages Wolfram Language for assessing responses in form notebooks (and elsewhere).
You can see the beginnings of this in Version 12.2 with the experimental function AssessmentFunction—which can be hooked into form notebooks somewhat like Interpreter. But even without the full capabilities planned for AssessmentFunction there’s still an incredible amount that can be done—in educational settings and otherwise—using form notebooks.
It’s worth understanding, by the way, that form notebooks are ultimately very simple to use in any particular case. Yes, they have a lot of depth that allows them to do a very wide range of things. And they’re basically only possible because of the whole symbolic structure of the Wolfram Language, and the fact that Wolfram Notebooks are ultimately represented as symbolic expressions. But when it comes to using them for a particular purpose they’re very streamlined and straightforward, and it’s completely realistic to create a useful form notebook in just a few minutes.
We invented notebooks—with all their basic features of hierarchical cells, etc.—back in 1987. But for a third of a century, we’ve been progressively polishing and streamlining how they work. And in Version 12.2 there are all sorts of useful and convenient new notebook features.
It’s a very simple feature, but it’s very useful. You see something in a notebook, and all you really want to be able to do with it is copy it (or perhaps copy something related to it). Well, then just use ClickToCopy:
ClickToCopy[10!] 
If you want to click-to-copy something unevaluated, use Defer:
ClickToCopy[Plot[Sin[x], {x, 0, 10}], Defer[Plot[Sin[x], {x, 0, 10}]]] 
Ctrl+Shift+H has inserted a hyperlink in a Wolfram Notebook since 1996. But in Version 12.2 there are two important new things with hyperlinks. First, automatic hyperlinking that handles a wide range of different situations. And second, a modernized and streamlined mechanism for hyperlink creation and editing.
In Version 12.2 we’re exposing something that we’ve had internally for a while: the ability to attach a floating fully functional cell to any given cell (or box, or whole notebook). Accessing this feature needs symbolic notebook programming, but it lets you do very powerful things—particularly in introducing contextual and “just-in-time” interfaces. Here’s an example that puts a dynamic counter that counts in primes on the right-bottom part of the cell bracket:
obj = AttachCell[EvaluationCell[], Panel[Dynamic[i]], {"CellBracket", Bottom}, 0, {Right, Bottom}]; Do[PrimeQ[i], {i, 10^7}]; NotebookDelete[obj]
Sometimes it’s useful for what you see not to be what you have. For example, you might want to display something in a notebook as J_{0}(x) but have it really be BesselJ[0, x]. For many years, we’ve had Interpretation as a way to set this up for specific expressions. But we’ve also had a more general mechanism—TemplateBox—that lets you take expressions, and separately specify how they should be displayed, and interpreted.
In Version 12.2 we’ve further generalized—and streamlined—TemplateBox, allowing it to incorporate arbitrary user interface elements, as well as allowing it to specify things like copy behavior. Our new TeX input mechanism, for example, is basically just an application of the new TemplateBox.
In this case, "TeXAssistantTemplate" refers to a piece of functionality defined in the notebook stylesheet—whose parameters are specified by the association given in the TemplateBox:
RawBoxes[TemplateBox[<|"boxes" -> FormBox[FractionBox["1", "2"], TraditionalForm], "errors" -> {}, "input" -> "\\frac{1}{2}", "state" -> "Boxes"|>, "TeXAssistantTemplate"]]
An important feature of Wolfram Notebooks is that they’re set up to operate both on the desktop and in the cloud. And even between versions of Wolfram Language there’s lots of continued enhancement in the way notebooks work in the cloud. But in Version 12.2 there’s been some particular streamlining of the interface for notebooks between desktop and cloud.
A particularly nice mechanism already available for a couple of years in any desktop notebook is the File > Publish to Cloud menu item, which allows you to take the notebook and immediately make it available as a published cloud notebook that can be accessed by anyone with a web browser. In Version 12.2 we’ve streamlined the process of notebook publishing.
When I’m giving a presentation I’ll usually be creating a desktop notebook as I go (or perhaps using one that already exists). And at the end of the presentation, it’s become my practice to publish it to the cloud, so anyone in the audience can interact with it. But how can I give everyone the URL for the notebook? In a virtual setting, you can just use chat. But in an actual physical presentation, that’s not an option. And in Version 12.2 we’ve provided a convenient alternative: the result of Publish to Cloud includes a QR code that people can capture with their phones, then immediately go to the URL and interact with the notebook on their phones.
There’s one other notable new item visible in the result of Publish to Cloud: “Direct JavaScript Embedding”. This is a link to the Wolfram Notebook Embedder which allows cloud notebooks to be directly embedded through JavaScript onto webpages.
It’s always easy to use an iframe to embed one webpage on another. But iframes have many limitations, such as requiring their sizes to be defined in advance. The Wolfram Notebook Embedder allows full-function fluid embedding of cloud notebooks—as well as scriptable control of the notebooks from other elements of a webpage. And since the Wolfram Notebook Embedder is set up to use the oEmbed embedding standard, it can immediately be used in basically all standard web content management systems.
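For the curious, an oEmbed consumer simply makes an HTTP request to a provider’s endpoint, passing the target URL plus format and optional size parameters, and gets back a JSON description of the embeddable content. Here’s a Python sketch of building such a request; the endpoint URL is a hypothetical placeholder, while the url, format and maxwidth parameters come from the oEmbed specification:

```python
from urllib.parse import urlencode

def oembed_request(endpoint, target_url, maxwidth=None):
    """Build an oEmbed consumer request URL for a provider endpoint."""
    params = {"url": target_url, "format": "json"}
    if maxwidth is not None:
        params["maxwidth"] = maxwidth  # optional hint for the embed's width
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoint, for illustration only:
req = oembed_request("https://example.com/oembed",
                     "https://example.com/some-notebook", maxwidth=600)
```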
We’ve talked about sending notebooks from the desktop to the cloud. But another thing that’s new in Version 12.2 is faster and easier browsing of your cloud file system from the desktop—as accessed from File > Open from Cloud and File > Save to Cloud.
One of the things we want to do with Wolfram Language is to make it as easy as possible to connect with pretty much any external system. And in modern times an important part of that is being able to conveniently handle cryptographic protocols. And ever since we started introducing cryptography directly into the Wolfram Language five years ago, I’ve been surprised at just how much the symbolic character of the Wolfram Language has allowed us to clarify and streamline things to do with cryptography.
A particularly dramatic example of this has been how we’ve been able to integrate blockchains into Wolfram Language (and Version 12.2 adds bloxberg with several more on the way). And in successive versions we’re handling different applications of cryptography. In Version 12.2 a major emphasis is symbolic capabilities for key management. Version 12.1 already introduced SystemCredential for dealing with local “keychain” key management (supporting, for example, “remember me” in authentication dialogs). In 12.2 we’re also dealing with PEM files.
If we import a PEM file containing a private key we get a nice, symbolic representation of the private key:
private = First[Import["ExampleData/privatesecp256k1.pem"]] 
Now we can derive a public key:
public = PublicKey[%] 
If we generate a digital signature for a message using the private key
GenerateDigitalSignature["Hello there", private] 
then this verifies the signature using the public key we’ve derived:
VerifyDigitalSignature[{"Hello there", %}, public] 
An important part of modern security infrastructure is the concept of a security certificate—a digital construct that allows a third party to attest to the authenticity of a particular public key. In Version 12.2 we now have a symbolic representation for security certificates—providing what’s needed for programs to establish secure communication channels with outside entities in the same kind of way that https does:
Import["ExampleData/client.pem"] 
In Version 12.0 we introduced powerful functionality for querying relational databases symbolically within the Wolfram Language. Here’s how we connect to a database:
db = DatabaseReference[ FindFile["ExampleData/ecommercedatabase.sqlite"]] 
Here’s how we connect the database so that its tables can be treated just like entity types from the builtin Wolfram Knowledgebase:
EntityRegister[EntityStore[RelationalDatabase[db]]] 
Now we can for example ask for a list of entities of a given type:
EntityList["offices"] 
What’s new in 12.2 is that we can conveniently go “under” this layer, to directly execute SQL queries against the underlying database, getting the complete database table as a Dataset expression:
ExternalEvaluate[db, "SELECT * FROM offices"] 
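For comparison, the same pattern—raw SQL in, structured rows out—looks like this in Python with the standard-library sqlite3 module. The offices table here is a stand-in created on the spot, not the example ecommerce database from the Wolfram documentation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offices (city TEXT, country TEXT)")
conn.executemany("INSERT INTO offices VALUES (?, ?)",
                 [("Boston", "USA"), ("Paris", "France")])

conn.row_factory = sqlite3.Row  # rows behave like associations: row["city"]
rows = conn.execute("SELECT * FROM offices ORDER BY city").fetchall()
dataset = [dict(row) for row in rows]
# dataset -> [{'city': 'Boston', 'country': 'USA'}, {'city': 'Paris', 'country': 'France'}]
```

The Wolfram version goes further in that the result comes back as a full symbolic Dataset, immediately usable by the rest of the language.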
These queries can not only read from the database, but also write to it. And to make things even more convenient, we can effectively treat SQL just like any other “external language” in a notebook.
First we have to register our database, to say what we want our SQL to be run against:
RegisterExternalEvaluator["SQL", db] 
And now we can just type SQL as input—and get back Wolfram Language output, directly in the notebook:
You’ve developed a control system or a signal processing algorithm in the Wolfram Language. Now how do you deploy it to a piece of standalone electronics? In Version 12.0 we introduced the Microcontroller Kit for compiling from symbolic Wolfram Language structures directly to microcontroller code.
We’ve had lots of feedback on this, asking us to expand the range of microcontrollers that we support. So in Version 12.2 I’m happy to say that we’re adding support for 36 new microcontrollers, particularly 32-bit ones:
Here’s an example in which we deploy a symbolically defined digital filter to a particular kind of microcontroller, showing the simplified C source code generated for that particular microcontroller:
Needs["MicrocontrollerKit`"] 
ToDiscreteTimeModel[ButterworthFilterModel[{3, 2}], 0.6] // Chop 
MicrocontrollerEmbedCode[%, <|"Target" -> "AdafruitGrandCentralM4", "Inputs" -> 0 -> "Serial", "Outputs" -> 1 -> "Serial"|>, "/dev/cu.usbmodem14101"]["SourceCode"]
Our long-term goal is to make the Wolfram Language and the computational intelligence it provides as ubiquitous as possible. And part of doing this is to set up the Wolfram Engine which implements the language so that it can be deployed in as broad a range of computational infrastructure settings as possible.
Wolfram Desktop—as well as classic Mathematica—primarily provides a notebook interface to the Wolfram Engine, running on a local desktop system. It’s also possible to run Wolfram Engine directly—as a command-line program (e.g. through WolframScript)—on a local computer system. And, of course, one can run the Wolfram Engine in the cloud, either through the full Wolfram Cloud (public or private), or through more lightweight cloud and server offerings (both existing and forthcoming).
But with Version 12.2 there’s a new deployment of the Wolfram Engine: WSTPServer. If you use Wolfram Engine in the cloud, you’re typically communicating with it through http or related protocols. But for more than thirty years, the Wolfram Language has had its own dedicated protocol for transferring symbolic expressions and everything around them. Originally we called it MathLink, but in more recent years, as it’s progressively been extended, we’ve called it WSTP: the Wolfram Symbolic Transfer Protocol. What WSTPServer does, as its name suggests, is to give you a lightweight server that delivers Wolfram Engines and lets you communicate with them directly in native WSTP.
Why is this important? Basically because it gives you a way to manage pools of persistent Wolfram Language sessions that can operate as services for other applications. For example, normally each time you call WolframScript you get a new, fresh Wolfram Engine. But by using wolframscript wstpserver with a particular “WSTP profile name” you can keep getting the same Wolfram Engine every time you call WolframScript. You can do this directly on your local machine—or on remote machines.
And an important use of WSTPServer is to expose pools of Wolfram Engines that can be accessed through the new RemoteEvaluate function in Version 12.2. It’s also possible to use WSTPServer to expose Wolfram Engines for use by ParallelMap, etc. And finally, since WSTP has (for nearly 30 years!) been the way the notebook front end communicates with the Wolfram Engine kernel, it’s now possible to use WSTPServer to set up a centralized kernel pool to which you can connect the notebook front end, allowing you, for example, to keep running a particular session (or even a particular computation) in the kernel even as you switch to a different notebook front end, on a different computer.
Along the lines of “use Wolfram Language everywhere” another new function in Version 12.2 is RemoteEvaluate. We’ve got CloudEvaluate which does a computation in the Wolfram Cloud, or an Enterprise Private Cloud. We’ve got ParallelEvaluate which does computations on a predefined collection of parallel subkernels. And in Version 12.2 we’ve got RemoteBatchSubmit which submits batch computations to cloud computation providers.
RemoteEvaluate is a general, lightweight “evaluate now” function that lets you do a computation on any specified remote machine that has an accessible Wolfram Engine. You can connect to the remote machine using ssh or wstp (or http with a Wolfram Cloud endpoint).
RemoteEvaluate["ssh://byblis67.wolfram.com", Labeled[Framed[$MachineName], Now]] 
Sometimes you’ll want to use RemoteEvaluate to do things like system administration across a range of machines. Sometimes you might want to collect or send data to remote devices. For example, you might have a network of Raspberry Pi computers which all have Wolfram Engine—and then you can use RemoteEvaluate to do something like retrieve data from these machines. By the way, you can also use ParallelEvaluate from within RemoteEvaluate, so you’re having a remote machine be the master for a collection of parallel subkernels.
Sometimes you’ll want RemoteEvaluate to start a fresh instance of Wolfram Engine whenever you do an evaluation. But with WSTPServer you can also have it use a persistent Wolfram Language session. RemoteEvaluate and WSTPServer are the beginning of a general symbolic framework for representing running Wolfram Engine processes. Version 12.2 already has RemoteKernelObject and $DefaultRemoteKernel which provide symbolic ways to represent remote Wolfram Language instances.
I’ve at least touched on many of the bigger new features of Version 12.2. But there’s a lot more. Additional functions, enhancements, fixes and general rounding out and polishing.
Like in computational geometry, ConvexHullRegion now deals with regions, not just points. And there are functions like CollinearPoints and CoplanarPoints that test for collinearity and coplanarity, or give conditions for achieving them.
There are more import and export formats. Like there’s now support for the archive formats: “7z”, “ISO”, “RAR”, “ZSTD”. There’s also FileFormatQ and ByteArrayFormatQ for testing whether things correspond to particular formats.
In terms of core language, there are things like updates to the complicated-to-define ValueQ. There’s also RandomGeneratorState that gives a symbolic representation of random generator states.
In the desktop package (i.e. .wl file) editor, there’s a new (somewhat experimental) Format Cell button, that reformats code—with a control on how “airy” it should be (i.e. how dense it should be in newlines).
In Wolfram|Alpha-Mode Notebooks (as used by default in Wolfram|Alpha Notebook Edition) there are other new features, like function documentation targeted for particular function usage.
There’s also more in TableView, as well as a large suite of new paclet authoring tools that are included on an experimental basis.
To me it’s rather amazing how much we’ve been able to bring together in Version 12.2, and, as always, I’m excited that it’s now out and available to everyone to use….
On Tuesday, December 7, 1920, the Göttingen Mathematics Society held its regular weekly meeting—at which a 32-year-old local mathematician named Moses Schönfinkel with no known previous mathematical publications gave a talk entitled “Elemente der Logik” (“Elements of Logic”).
A hundred years later what was presented in that talk still seems in many ways alien and futuristic—and for most people almost irreducibly abstract. But we now realize that that talk gave the first complete formalism for what is probably the single most important idea of this past century: the idea of universal computation.
Sixteen years later would come Turing machines (and lambda calculus). But in 1920 Moses Schönfinkel presented what he called “building blocks of logic”—or what we now call “combinators”—and then proceeded to show that by appropriately combining them one could effectively define any function, or, in modern terms, that they could be used to do universal computation.
Looking back a century it’s remarkable enough that Moses Schönfinkel conceptualized a formal system that could effectively capture the abstract notion of computation. And it’s more remarkable still that he formulated what amounts to the idea of universal computation, and showed that his system achieved it.
But for me the most amazing thing is that not only did he invent the first complete formalism for universal computation, but his formalism is probably in some sense minimal. I’ve personally spent years trying to work out just how simple the structure of systems that support universal computation can be—and for example with Turing machines it took from 1936 until 2007 for us to find the minimal case.
But back in his 1920 talk Moses Schönfinkel—presenting a formalism for universal computation for the very first time—gave something that is probably already in his context minimal.
Moses Schönfinkel described the result of his 1920 talk in an 11-page paper published in 1924 entitled “Über die Bausteine der mathematischen Logik” (“On the Building Blocks of Mathematical Logic”). The paper is a model of clarity. It starts by saying that in the “axiomatic method” for mathematics it makes sense to try to keep the number of “fundamental notions” as small as possible. It reports that in 1913 Henry Sheffer managed to show that basic logic requires only one connective, that we now call Nand. But then it begins to go further. And already within a couple of paragraphs it’s saying that “We are led to [an] idea, which at first glance certainly appears extremely bold”. But by the end of the introduction it’s reporting, with surprise, the big news: “It seems to me remarkable in the extreme that the goal we have just set can be realized… [and], as it happens, it can be done by a reduction to three fundamental signs”.
Those “three fundamental signs”, of which he only really needs two, are what we now call the S and K combinators (he called them S and C). In concept they’re remarkably simple, but their actual operation is in many ways brain-twistingly complex. But there they were—already a century ago—just as they are today: minimal elements for universal computation, somehow conjured up from the mind of Moses Schönfinkel.
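To get a concrete sense of how S and K operate, here’s a minimal sketch (in modern notation, not Schönfinkel’s) of the two reduction rules—K x y → x and S x y z → x z (y z)—with combinator expressions represented as nested pairs. The representation and function names here are illustrative choices, not anything from Schönfinkel’s paper:

```python
# A minimal sketch of S and K combinator reduction.
# Terms are the symbols "S"/"K"/variables, or 2-tuples (f, x) meaning
# "apply f to x"; application associates to the left, so S K K v is (((S,K),K),v).

def apply_chain(*terms):
    """Left-associated application: apply_chain(a, b, c) == ((a, b), c)."""
    result = terms[0]
    for t in terms[1:]:
        result = (result, t)
    return result

def reduce_once(term):
    """Apply one reduction step if possible: K x y -> x,  S x y z -> x z (y z)."""
    if isinstance(term, tuple):
        f, a = term
        # K x y -> x
        if isinstance(f, tuple) and f[0] == "K":
            return f[1]
        # S x y z -> (x z) (y z)
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == "S":
            x, y, z = f[0][1], f[1], a
            return ((x, z), (y, z))
        # otherwise try to reduce a subterm
        rf, ra = reduce_once(f), reduce_once(a)
        if rf != f:
            return (rf, a)
        if ra != a:
            return (f, ra)
    return term

def normalize(term, max_steps=100):
    """Reduce repeatedly until no rule applies (or a step limit is hit)."""
    for _ in range(max_steps):
        nxt = reduce_once(term)
        if nxt == term:
            return term
        term = nxt
    return term

# S K K behaves as the identity: S K K v -> K v (K v) -> v
print(normalize(apply_chain("S", "K", "K", "v")))  # -> v
```

Even this tiny fragment hints at why the system is so powerful: the S rule duplicates and redistributes its argument, and from just these two rules one can build up arbitrary functions.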
So who was this person, who managed so long ago to see so far?
The complete known published output of Moses Schönfinkel consists of just two papers: his 1924 “On the Building Blocks of Mathematical Logic”, and another, 31-page paper from 1927, coauthored with Paul Bernays, entitled “Zum Entscheidungsproblem der mathematischen Logik” (“On the Decision Problem of Mathematical Logic”).
And somehow Schönfinkel has always been in the shadows—appearing at best only as a kind of footnote to a footnote. Turing machines have taken the limelight as models of computation—with combinators, hard to understand as they are, being mentioned at most only in obscure footnotes. And even within the study of combinators—often called “combinatory logic”—even as S and K have remained ubiquitous, Schönfinkel’s invention of them typically garners at most a footnote.
About Schönfinkel as a person, three things are commonly said. First, that he was somehow connected with the mathematician David Hilbert in Göttingen. Second, that he spent time in a psychiatric institution. And third, that he died in poverty in Moscow, probably around 1940 or 1942.
But of course there has to be more to the story. And in recognition of the centenary of Schönfinkel’s announcement of combinators, I decided to try to see what I could find out.
I don’t think I’ve got all the answers. But it’s been an interesting, if at times unsettling, trek through the Europe—and mathematics—of a century or so ago. And at the end of it I feel I’ve come to know and understand at least a little more about the triumph and tragedy of Moses Schönfinkel.
It’s a strange and sad resonance with Moses Schönfinkel’s life… but there’s a 1953 song by Tom Lehrer about plagiarism in mathematics—where the protagonist explains his chain of intellectual theft: “I have a friend in Minsk/Who has a friend in Pinsk/Whose friend in Omsk”… “/Whose friend somehow/Is solving now/The problem in Dnepropetrovsk”. Well, Dnepropetrovsk is where Moses Schönfinkel was born.
Except, confusingly, at the time it was called (after Catherine the Great) Ekaterinoslav (Екатеринослáв)—and it’s now called Dnipro. It’s one of the larger cities in Ukraine, roughly in the center of the country, about 250 miles down the river Dnieper from Kiev. And at the time when Schönfinkel was born, Ukraine was part of the Russian Empire.
So what traces are there of Moses Schönfinkel in Ekaterinoslav (AKA Dnipro) today? 132 years later it wasn’t so easy to find (especially during a pandemic)… but here’s a record of his birth: a certificate from the Ekaterinoslav Public Rabbi stating that entry 272 of the Birth Register for Jews from 1888 records that on September 7, 1888, a son Moses was born to the Ekaterinoslav citizen Ilya Schönfinkel and his wife Masha:
This seems straightforward enough. But immediately there’s a subtlety. When exactly was Moses Schönfinkel born? What is that date? At the time the Russian Empire—which had the Russian Orthodox Church, which eschewed Pope Gregory’s 1582 revision of the calendar—was still using the Julian calendar introduced by Julius Caesar. (The calendar was switched in 1918 after the Russian Revolution, although the Orthodox Church plans to go on celebrating Christmas on January 7 until 2100.) So to know a correct modern (i.e. Gregorian calendar) date of birth we have to do a conversion. And from this we’d conclude that Moses Schönfinkel was born on September 19, 1888.
But it turns out that’s not the end of the story. There are several other documents associated with Schönfinkel’s college years that also list his date of birth as September 7, 1888. But the state archives of the Dnepropetrovsk region contain the actual, original register from the synagogue in Ekaterinoslav. And here’s entry 272—and it records the birth of Moses Schönfinkel, but on September 17, not September 7:
So the official certificate is wrong! Someone left a digit out. And there’s a check: the Birth Register also gives the date in the Jewish calendar: 24 Tishrei, which for 1888 is the Julian date September 17. So converting to modern Gregorian form, the correct date of birth for Moses Schönfinkel is September 29, 1888.
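The conversion itself is simple to check: for dates in the 1800s (specifically from March 1800 through February 1900), the Julian calendar runs exactly 12 days behind the Gregorian one, so the conversion is a fixed 12-day shift. A quick sketch (the function name and the restriction to that date range are mine):

```python
from datetime import date, timedelta

def julian_to_gregorian_1800s(julian_date):
    """Convert a Julian-calendar date to Gregorian.

    Valid only for Julian dates from March 1, 1800 through
    February 28, 1900, when the Julian calendar lagged the
    Gregorian calendar by exactly 12 days.
    """
    return julian_date + timedelta(days=12)

# Schönfinkel's corrected Julian birth date, September 17, 1888:
print(julian_to_gregorian_1800s(date(1888, 9, 17)))  # -> 1888-09-29
```

(The offset grows by roughly three days every four centuries: it was 13 days by the time of the 1918 Russian calendar switch, which is why dates from that era need the century checked before converting.)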
OK, now what about his name? In Russian it’s given as Моисей Шейнфинкель (or, including the patronymic, with the most common transliteration from Hebrew, Моисей Эльевич Шейнфинкель). But how should his last name be transliterated? Well, there are several possibilities. We’re using Schönfinkel—but other possibilities are Sheinfinkel and Sheynfinkel—and these show up almost randomly in different documents.
What else can we learn from Moses Schönfinkel’s “birth certificate”? Well, it describes his father Эльева (Ilya) as an Ekaterinoslav мещанина. But what is that word? It’s often translated “bourgeoisie”, but seems to have basically meant “middle-class city dweller”. And in other documents from the time, Ilya Schönfinkel is described as a “merchant of the 2nd guild” (i.e. not the “top 5%” 1st guild, nor the lower 3rd guild).
Apparently, however, his fortunes improved. The 1905 “Index of Active Enterprises Incorporated in the [Russian] Empire” lists him as a “merchant of the 1st guild” and records that in 1894 he cofounded the company of “Lurie & Sheinfinkel” (with a paid-in capital of 10,000 rubles, or about $150k today) that was engaged in the grocery trade:
Lurie & Sheinfinkel seems to have had multiple wine and grocery stores. Between 1901 and 1904 its “store #2” was next to a homeopathic pharmacy in a building that probably looked at the time much like it does today:
And for store #1 there are actually contemporary photographs (note the инкель for the end of “Schönfinkel” visible on the bottom left; this particular building was destroyed in World War II):
There seems to have been a close connection between the Schönfinkels and the Luries—who were a prominent Ekaterinoslav family involved in a variety of enterprises. Moses Schönfinkel’s mother Maria (Masha) was originally a Lurie (actually, she was one of the 8 siblings of Ilya Schönfinkel’s business partner Aron Lurie). Ilya Schönfinkel is listed from 1894 to 1897 as “treasurer of the Lurie Synagogue”. And in 1906 Moses Schönfinkel listed his mailing address in Ekaterinoslav as Lurie House, Ostrozhnaya Square. (By 1906 that square sported an upscale park—though a century earlier it had housed a prison that was referenced in a poem by Pushkin. Now it’s the site of an opera house.)
Accounts of Schönfinkel sometimes describe him as coming from a “village in Ukraine”. In actuality, at the turn of the twentieth century Ekaterinoslav was a bustling metropolis that, for example, had just become the third city in the whole Russian Empire to have electric trams. Schönfinkel’s family also seems to have been quite well-to-do. Some pictures of Ekaterinoslav from the time give a sense of the environment (this building was actually the site of a Lurie candy factory):
As the name “Moses” might suggest, Moses Schönfinkel was Jewish, and at the time he was born there was a large Jewish population in the southern part of Ukraine. Many Jews had come to Ekaterinoslav from Moscow, and in fact 40% of the whole population of the town was identified as Jewish.
Moses Schönfinkel went to the main high school in town (the “Ekaterinoslav classical gymnasium”)—and graduated in 1906, shortly before turning 18. Here’s his diploma:
The diploma shows that he got 5/5 in all subjects—the subjects being theology, Russian, logic, Latin, Greek, mathematics, geodesy (“mathematical geography”), physics, history, geography, French, German and drawing. So, yes, he did well in high school. And in fact the diploma goes on to say: “In view of his excellent behavior and diligence and excellent success in the sciences, especially in mathematics, the Pedagogical Council decided to award him the Gold Medal…”
Having graduated from high school, Moses Schönfinkel wanted to go (“for purely family reasons”, he said) to the University of Kiev. But being told that Ekaterinoslav was in the wrong district for that, he instead asked to enroll at Novorossiysk University in Odessa. He wrote a letter—in rather neat handwriting—to unscramble a bureaucratic issue, giving various excuses along the way:
But in the fall of 1906, there he was: a student in the Faculty of Physics and Mathematics of Novorossiysk University, in the rather upscale and cosmopolitan town of Odessa, on the Black Sea.
The Imperial Novorossiya University, as it was then officially called, had been created out of an earlier institution by Tsar Alexander II in 1865. It was a distinguished university, with for example Dmitri Mendeleev (of periodic table fame) having taught there. In Soviet times it would be renamed after the discoverer of macrophages, Élie Metchnikoff (who worked there). Nowadays it is usually known as Odessa University. And conveniently, it has maintained its archives well—so that, still there, 114 years later, is Moses Schönfinkel’s student file:
It’s amazing how “modern” a lot of what’s in it seems. First, there are documents Moses Schönfinkel sent so he could register (confirming them by telegram on September 1, 1906). There’s his high-school diploma and birth certificate—and there’s a document from the Ekaterinoslav City Council certifying his “citizen rank” (see above). The cover sheet also records a couple of other documents, one of which is presumably some kind of deferment of military service.
And then in the file there are two “photo cards” giving us pictures of the young Moses Schönfinkel, wearing the uniform of the Imperial Russian Army:
(These pictures actually seem to come from 1908; the style of uniform was a standard one issued after 1907; the [presumably] white collar tabs indicate the 3rd regiment of whatever division he was assigned to.)
Nowadays it would all be online, but in his physical file there is a “lecture book” listing courses (yes, every document is numbered, to correspond to a line in a central ledger):
Here are the courses Moses Schönfinkel took in his first semester in college (fall 1906):
Introduction to Analysis (6 hrs), Introduction to Determinant Theory (2 hrs), Analytical Geometry 1 (2 hrs), Chemistry (5 hrs), Physics 1 (3 hrs), Elementary Number Theory (2 hrs): a total of 20 hours. Here’s the bill for these courses: pretty good value at 1 ruble per course-hour, or a total of 20 rubles, which is about $300 today:
Subsequent semesters list many very familiar courses: Differential Calculus, Integrals (parts 1 and 2), and Higher Algebra, as well as “Calculus of Probabilities” (presumably probability theory) and “Determinant Theory” (essentially differently branded “linear algebra”). There are some “distribution” courses, like Astronomy (and Spherical Astronomy) and Physical Geography (or is that Geodesy?). And by 1908, there are also courses like Functions of a Complex Variable, Integro-Differential Equations (yeah, differential equations definitely pulled ahead of integral equations over the past century), Calculus of Variations and Infinite Series. And—perhaps presaging Schönfinkel’s next life move—another course that makes an appearance in 1908 is German (and it’s Schönfinkel’s only non-science course during his whole university career).
In Schönfinkel’s “lecture book” many of the courses also have names of professors listed. For example, there’s “Kagan”, who’s listed as teaching Foundations of Geometry (as well as Higher Algebra, Determinant Theory and Integro-Differential Equations). That’s Benjamin Kagan, who was then a young lecturer, but would later become a leader in differential geometry in Moscow—and also someone who studied the axiomatic foundations of geometry (as well as writing about the somewhat tragic life of Lobachevsky).
Another professor—listed as teaching Schönfinkel Introduction to Analysis and Theory of Algebraic Equation Solving—is “Shatunovsky”. And (at least according to Shatunovsky’s later student Sofya Yanovskaya, of whom we’ll hear more later), Samuil Shatunovsky was basically Schönfinkel’s undergraduate advisor.
Shatunovsky had been the 9th child of a poor Jewish family (actually) from a village in Ukraine. He was never able to enroll at a university, but for some years did manage to go to lectures by people around Pafnuty Chebyshev in Saint Petersburg. For quite a few years he then made a living as an itinerant math tutor (notably in Ekaterinoslav) but papers he wrote were eventually noticed by people at the university in Odessa, and, finally, in 1905, at the age of 46, he ended up as a lecturer at the university—where the following year he taught Schönfinkel.
Shatunovsky (who stayed in Odessa until his death in 1929) was apparently an energetic but precise lecturer. He seems to have been quite axiomatically oriented, creating axiomatic systems for geometry, algebraic fields, and notably, for order relations. (He was also quite a constructivist, opposed to the indiscriminate use of the Law of Excluded Middle.) The lectures from his Introduction to Analysis course (which Schönfinkel took in 1906) were published in 1923 (by the local publishing company Mathesis in which he and Kagan were involved).
Another of Schönfinkel’s professors (from whom he took Differential Calculus and “Calculus of Probabilities”) was a certain Ivan (or Jan) Śleszyński, who had worked with Karl Weierstrass on things like continued fractions, but by 1906 was in his early 50s and increasingly transitioning to working on logic. In 1911 he moved to Poland, where he sowed some of the seeds for the Polish school of mathematical logic, in 1923 writing a book called On the Significance of Logic for Mathematics (notably with no mention of Schönfinkel), and in 1925 one on proof theory.
It’s not clear how much mathematical logic Moses Schönfinkel picked up in college, but in any case, in 1910, he was ready to graduate. Here’s his final student ID (what are those pieces of string for?):
There’s a certificate confirming that on April 6, 1910, Moses Schönfinkel had no books that needed returning to the library. And he sent a letter asking to graduate (with slightly-less-neat handwriting than in 1906):
The letter closes with his signature (Моисей Шейнфинкель):
After Moses Schönfinkel graduated college in 1910 he probably went into four years of military service (perhaps as an engineer) in the Russian Imperial Army. World War I began on July 28, 1914—and Russia mobilized on July 30. But in one of his few pieces of good luck Moses Schönfinkel was not called up, having arrived in Göttingen, Germany on June 1, 1914 (just four weeks before the event that would trigger World War I), to study mathematics.
Göttingen was at the time a top place for mathematics. In fact, it was sufficiently much of a “math town” that around that time postcards of local mathematicians were for sale there. And the biggest star was David Hilbert—which is who Schönfinkel went to Göttingen hoping to work with.
Hilbert had grown up in Prussia and started his career in Königsberg. His big break came in 1888 at age 26 when he got a major result in representation theory (then called “invariant theory”)—using then-shocking nonconstructive techniques. And it was soon after this that Felix Klein recruited Hilbert to Göttingen—where he remained for the rest of his life.
In 1900 Hilbert gave his famous address to the International Congress of Mathematicians where he first listed his (ultimately 23) problems that he thought should be important in the future of mathematics. Almost all the problems are what anyone would call “mathematical”. But problem 6 has always stuck out for me: “Mathematical Treatment of the Axioms of Physics”: Hilbert somehow wanted to axiomatize physics as Euclid had axiomatized geometry. And he didn’t just talk about this; he spent nearly 20 years working on it. He brought in physicists to teach him, and he worked on things like gravitation theory (“Einstein–Hilbert action”) and kinetic theory—and wanted for example to derive the existence of the electron from something like Maxwell’s equations. (He was particularly interested in the way atomistic processes limit to continua—a problem that I now believe is deeply connected to computational irreducibility, in effect implying another appearance of undecidability, like in Hilbert’s 1st, 2nd and 10th problems.)
Hilbert seemed to feel that physics was a crucial source of raw material for mathematics. But yet he developed a whole program of research based on doing mathematics in a completely formalistic way—where one just writes down axioms and somehow “mechanically” generates all true theorems from them. (He seems to have drawn some distinction between “merely mathematical” questions, and questions about physics, apparently noting—in a certain resonance with my life’s work—that in the latter case “the physicist has the great calculating machine, Nature”.)
In 1899 Hilbert had written down more precise and formal axioms for Euclid’s geometry, and he wanted to go on and figure out how to formulate other areas of mathematics in this kind of axiomatic way. But for more than a decade he seems to have spent most of his time on physics—finally returning to questions about the foundations of mathematics around 1917, giving lectures about “logical calculus” in the winter session of 1920.
By 1920, World War I had come and gone, with comparatively little effect on mathematical life in Göttingen (the nearest battle was in Belgium 200 miles to the west). Hilbert was 58 years old, and had apparently lost quite a bit of his earlier energy (not least as a result of having contracted pernicious anemia [autoimmune vitamin B12 deficiency], whose cure was found only a few years later). But Hilbert was still a celebrity around Göttingen, and generating mathematical excitement. (Among “celebrity gossip” mentioned in a letter home by young Russian topologist Pavel Urysohn is that Hilbert was a huge fan of the gramophone, and that even at his advanced age, in the summer, he would sit in a tree to study.)
I have been able to find out almost nothing about Schönfinkel’s interaction with Hilbert. However, from April to August 1920 Hilbert gave weekly lectures entitled “Problems of Mathematical Logic” which summarized the standard formalism of the field—and the official notes for those lectures were put together by Moses Schönfinkel and Paul Bernays (the “N” initial for Schönfinkel is a typo):
A few months after these lectures came, at least from our perspective today, the highlight of Schönfinkel’s time in Göttingen: the talk he gave on December 7, 1920. The venue was the weekly meeting of the Göttingen Mathematics Society, held at 6pm on Tuesdays. The society wasn’t officially part of the university, but it met in the same university “Auditorium Building” that at the time housed the math institute:
The talks at the Göttingen Mathematics Society were listed in the Annual Report of the German Mathematicians Association:
There’s quite a lineup. November 9, Ludwig Neder (student of Edmund Landau): “Trigonometric Series”. November 16, Erich Bessel-Hagen (student of Carathéodory): “Discontinuous Solutions of Variational Problems”. November 23, Carl Runge (of Runge–Kutta fame, then a Göttingen professor): “American Work on Star Clusters in the Milky Way”. November 30, Gottfried Rückle (assistant of van der Waals): “Explanations of Natural Laws Using a Statistical Mechanics Basis”. And then: December 7: Moses Schönfinkel, “Elements of Logic”.
The next week, December 14, Paul Bernays, who worked with Hilbert and interacted with Schönfinkel, spoke about “Probability, the Arrow of Time and Causality” (yes, there was still a lot of interest around Hilbert in the foundations of physics). January 10+11, Joseph Petzoldt (philosopher of science): “The Epistemological Basis of Special and General Relativity”. January 25, Emmy Noether (of Noether’s theorem fame): “Elementary Divisors and General Ideal Theory”. February 1+8, Richard Courant (of PDE etc. fame) & Paul Bernays: “About the New Arithmetic Theories of Weyl and Brouwer”. February 22, David Hilbert: “On a New Basis for the Meaning of a Number” (yes, that’s foundations of math).
What in detail happened at Schönfinkel’s talk, or as a result of it? We don’t know. But he seems to have been close enough to Hilbert that just over a year later he was in a picture taken for David Hilbert’s 60th birthday on January 23, 1922:
There are all sorts of well-known mathematicians in the picture (Richard Courant, Hermann Weyl, Edmund Landau, …) as well as some physicists (Peter Debye, Theodore von Kármán, Ludwig Prandtl, …). And there near the top left is Moses Schönfinkel, sporting a somewhat surprised expression.
For his 60th birthday Hilbert was given a photo album—with 44 pages of pictures of altogether about 200 mathematicians (and physicists). And there on page 22 is Moses Schönfinkel:
Who are the other people on the page with him? Adolf Kratzer (1893–1983) was a student of Arnold Sommerfeld, and a “physics assistant” to Hilbert. Hermann Vermeil (1889–1959) was an assistant to Hermann Weyl, who worked on differential geometry for general relativity. Heinrich Behmann (1891–1970) was a student of Hilbert and worked on mathematical logic, and we’ll encounter him again later. Finally, Carl Ludwig Siegel (1896–1981) had been a student of Landau and would become a well-known number theorist.
There’s a lot that’s still mysterious about Moses Schönfinkel’s time in Göttingen. But we have one (undated) letter written by Nathan Schönfinkel, Moses’s younger brother, presumably in 1921 or 1922 (yes, he romanizes his name “Scheinfinkel” rather than “Schönfinkel”):
Dear Professor!
I received a letter from Rabbi Dr. Behrens in which he wrote that my brother was in need, that he was completely malnourished. It was very difficult for me to read these lines, even more so because I cannot help my brother. I haven’t received any messages or money myself for two years. Thanks to the good people where I live, I am protected from severe hardship. I am able to continue my studies. I hope to finish my PhD in 6 months. A few weeks ago I received a letter from my cousin stating that our parents and relatives are healthy. My cousin is in Kishinev (Bessarabia), now in Romania. He received the letter from our parents who live in Ekaterinoslav. Our parents want to help us but cannot do so because the postal connections are nonexistent. I hope these difficulties will not last long. My brother is helpless and impractical in this material world. He is a victim of his great love for science. Even as a 12-year-old boy he loved mathematics, and all window frames and doors were painted with mathematical formulas by him. As a high school student, he devoted all his free time to mathematics. When he was studying at the university in Odessa, he was not satisfied with the knowledge there, and his striving and ideal was Göttingen and the king of mathematics, Prof. Hilbert. When he was accepted in Göttingen, he once wrote to me the following: “My dear brother, it seems to me as if I am dreaming but this is reality: I am in Göttingen, I saw Prof. Hilbert, I spoke to Prof. Hilbert.” The war came and with it suffering. My brother, who is helpless, has suffered more than anyone else. But he did not write to me so as not to worry me. He has a good heart. I ask you, dear Professor, for a few months until the connections with our city are established, to help him by finding a suitable (not harmful to his health) job for him. I will be very grateful to you, dear Professor, if you will answer me.
Sincerely.
N. Scheinfinkel
We’ll talk more about Nathan Schönfinkel later. But suffice it to say here that when he wrote the letter he was a physiology graduate student at the University of Bern—and he would get his PhD in 1922, and later became a professor. But the letter he wrote is probably our best single surviving source of information about the situation and personality of Moses Schönfinkel. Obviously he was a serious math enthusiast from a young age. And the letter implies that he’d wanted to work with Hilbert for some time (presumably hence the German classes in college).
It also implies that he was financially supported in Göttingen by his parents—until this was disrupted by World War I. (And we learn that his parents were OK in the Russian Revolution.) (By the way, the rabbi mentioned is probably a certain Siegfried Behrens, who left Göttingen in 1922.)
There’s no record of any reply to Nathan Schönfinkel’s letter from Hilbert. But at least by the time of Hilbert’s 60th birthday in 1922 Moses Schönfinkel was (as we saw above) enough in the inner circle to be invited to the birthday party.
What else is there in the university archives in Göttingen about Moses Schönfinkel? There’s just one document, but it’s very telling:
It’s dated 18 March 1924. And it’s a carbon copy of a reference for Schönfinkel. It’s rather cold and formal, and reads:
“The Russian privatdozent [private lecturer] in mathematics, Mr. Scheinfinkel, is hereby certified to have worked in mathematics for ten years with Prof. Hilbert in Göttingen.”
It’s signed (with a stylized “S”) by the “University Secretary”, a certain Ludwig Gossmann, who we’ll be talking about later. And it’s being sent to Ms. Raissa Neuburger, at Bühlplatz 5, Bern. That address is where the Physiology Institute at the University of Bern is now, and also was in 1924. And Raissa Neuburger either was then, or soon would become, Nathan Schönfinkel’s wife.
But there’s one more thing, handwritten in black ink at the bottom of the document. Dated March 20, it’s another note from the University Secretary. It’s annotated “a.a.”, i.e. ad acta—for the records. And in German it reads:
Gott sei Dank, dass Sch weg ist
which translates in English as:
Thank goodness Sch is gone
Hmm. So for some reason at least the university secretary was happy to see Schönfinkel go. (Or perhaps it was a German 1920s version of an HR notation: “not eligible for rehire”.) But let’s analyze this document in a little more detail. It says Schönfinkel worked with Hilbert for 10 years. That agrees with him having arrived in Göttingen in 1914 (which is a date we know for other reasons, as we’ll see below).
But now there’s a mystery. The reference describes Schönfinkel as a “privatdozent”. That’s a definite position at a German university, with definite rules, that in 1924 one would expect to have been rigidly enforced. The basic career track was (and largely still is): first, spend 2–5 years getting a PhD. Then perhaps get recruited for a professorship, or if not, continue doing research, and write a habilitation, after which the university may issue what amounts to an official government “license to teach”, making someone a privatdozent, able to give lectures. Being a privatdozent wasn’t as such a paid gig. But it could be combined with a job like being an assistant to a professor—or something outside the university, like tutoring, teaching high school or working at a company.
So if Schönfinkel was a privatdozent in 1924, where is the record of his PhD, or his habilitation? To get a PhD required “formally publishing” a thesis, and printing (as in, on a printing press) at least 20 or so copies of the thesis. A habilitation was typically a substantial, published research paper. But there’s absolutely no record of any of these things for Schönfinkel. And that’s very surprising. Because there are detailed records for other people (like Paul Bernays) who were around at the time, and were indeed privatdozents.
And what’s more, the Annual Report of the German Mathematicians Association—which listed Schönfinkel’s 1920 talk—seems to have listed mathematical goings-on in meticulous detail. Who gave what talk. Who wrote what paper. And most definitely who got a PhD, did a habilitation or became a privatdozent. (And becoming a privatdozent also required an action of the university senate, which was carefully recorded.) But going through all the annual reports of the German Mathematicians Association we find only four mentions of Schönfinkel. There’s his 1920 talk, and also a 1921 talk with Paul Bernays that we’ll discuss later. There’s the publication of his papers in 1924 and 1927. And there’s a single other entry, which says that on November 4, 1924, Richard Courant gave a report to the Göttingen Mathematical Society about a conference in Innsbruck, where Heinrich Behmann reported on “published work by M. Schönfinkel”. (It describes the work as follows: “It is a continuation of Sheffer’s [1913] idea of replacing the elementary operations of symbolic logic with a single one. By means of a certain function calculus, all logical statements (including the mathematical ones) are represented by three basic signs alone.”)
So, it seems, the university secretary wasn’t telling it straight. Schönfinkel might have worked with Hilbert for 10 years. But he wasn’t a privatdozent. And actually it doesn’t seem as if he had any “official status” at all.
So how do we even know that Schönfinkel was in Göttingen from 1914 to 1924? Well, he was Russian, and so in Germany he was an “alien”, and as such he was required to register his address with the local police (no doubt even more so from 1914 to 1918 when Germany was, after all, at war with Russia). And the remarkable thing is that even after all these years, Schönfinkel’s registration card is still right there in the municipal archives of the city of Göttingen:
So that means we have all Schönfinkel’s addresses during his time in Göttingen. Of course, there are confusions. There’s yet another birthdate for Schönfinkel: September 4, 1889. Wrong year. Perhaps a wrongly done correction from the Julian calendar. Perhaps “adjusted” for some reason of military service obligations. But, in any case, the document says that Moses Schönfinkel from Ekaterinoslav arrived in Göttingen on June 1, 1914, and started living at 6 Lindenstraße (now FelixKleinStrasse).
He moved pretty often (11 times in 10 years), not at particularly systematic times of year. It’s not clear exactly what the setup was in all these places, but at least at the end (and in another document) it lists addresses and “with Frau….”, presumably indicating that he was renting a room in someone’s house.
Where were all those addresses? Well, here’s a map of Göttingen circa 1920, with all of them plotted (along with a red “M” for the location of the math institute):
The last item on the registration card says that on March 18, 1924 he departed Göttingen, and went to Moscow. And the note on the copy of the reference saying “thank goodness [he’s] gone” is dated March 20, so that all ties together.
But let’s come back to the reference. Who was this “University Secretary” who seems to have made up the claim that Schönfinkel was a privatdozent? It was fairly easy to find out that his name was Ludwig Gossmann. But the big surprise was to find out that the university archives in Göttingen have nearly 500 pages about him—primarily in connection with a “criminal investigation”.
Here’s the story. Ludwig Gossmann was born in 1878 (so he was 10 years older than Schönfinkel). He grew up in Göttingen, where his father was a janitor at the university. He finished high school but didn’t go to college and started working for the local government. Then in 1906 (at age 28) he was hired by the university as its “secretary”.
The position of “university secretary” was a high-level one. It reported directly to the vice-rector of the university, and was responsible for “general administrative matters” for the university, including, notably, the supervision of international students (of whom there were many, Schönfinkel being one). Ludwig Gossmann held the position of university secretary for 27 years—even while the university had a different rector (normally a distinguished academic) every year.
But Mr. Gossmann also had a sideline: he was involved in real estate. In the 1910s he started building houses (borrowing money from, among others, various university professors). And by the 1920s he had significant real estate holdings—and a business where he rented to international visitors and students at the university.
Years went by. But then, on January 24, 1933, the newspaper headline announced: “Sensational arrest: senior university official Gossmann arrested on suspicion of treason—communist revolution material [Zersetzungsschrift] confiscated from his apartment”. It was said that perhaps it was a setup, and that he’d been targeted because he was gay (though, a year earlier, at age 54, he did marry a woman named Elfriede).
This was a bad time to be accused of being a communist (Hitler would become chancellor less than a week later, on January 30, 1933, in part propelled by fears of communism). Gossmann was taken to Hanover “for questioning”, but was then allowed back to Göttingen “under house arrest”. He’d had health problems for several years, and died of a heart attack on February 24, 1933.
But none of this really helps us understand why Gossmann would go out on a limb to falsify the reference for Schönfinkel. We can’t specifically find an address match, but perhaps Schönfinkel had at least at some point been a tenant of Gossmann’s. Perhaps he still owed rent. Perhaps he was just difficult in dealing with the university administration. It’s not clear. It’s also not clear why the reference Gossmann wrote was sent to Schönfinkel’s brother in Bern, even though Schönfinkel himself was going to Moscow. Or why it wasn’t just handed to Schönfinkel before he left Göttingen.
Whatever was going on with Schönfinkel in Göttingen in 1924, we know one thing for sure: it was then that he published his remarkable paper about what are now called combinators. Let’s talk in a bit more detail about the paper—though I discuss the technicalities elsewhere.
First, there’s some timing. At the end of the paper, it says it was received by the journal on March 15, 1924, i.e. just three days before the date of Ludwig Gossmann’s reference for Schönfinkel. And then at the top of the paper, there’s something else: under Schönfinkel’s name it says “in Moskau”, i.e. at least as far as the journal was concerned, Schönfinkel was in Moscow, Russia, at the time the article was published:
There’s also a footnote on the first page of the paper:
“The following thoughts were presented by the author to the Mathematical Society in Göttingen on December 7, 1920. Their formal and stylistic processing for this publication was done by H. Behmann in Göttingen.”
The paper itself is written in a nice, clear and mathematically mature way. Its big result (as I’ve discussed elsewhere) is the introduction of what would later be called combinators: two abstract constructs from which arbitrary functions and computations can be built up. Schönfinkel names one of them S, after the German word “Verschmelzung” for “fusion”. The other has become known as K, although Schönfinkel calls it C, even though the German word for “constancy” (which is what would naturally describe it) is “Konstantheit”, which starts with a K.
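In modern terms, S and K can be sketched directly as curried functions. Here is a minimal illustration in Python (my notation, not Schönfinkel’s, and writing K for what his paper calls C):

```python
# Schönfinkel's two basic combinators as curried Python functions.

# K (Schönfinkel's C) makes constant functions: K x y = x
K = lambda x: lambda y: x

# S "fuses" two applications of the same argument: S f g x = (f x)(g x)
S = lambda f: lambda g: lambda x: f(x)(g(x))

print(K("keep")("drop"))  # prints "keep": the second argument is discarded
```

From just these two constructs, any function built from application alone can be assembled—which is the paper’s big point.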
The paper ends with three paragraphs, footnoted with “The considerations that follow are the editor’s” (i.e. Behmann’s). They’re not as clear as the rest of the paper, and contain a confused mistake.
The main part of the paper is “just math” (or computation, or whatever). But here’s the page where S and K (called C here) are first used:
And now there’s something more peopleoriented: a footnote to the combinator equation I = SCC saying “This reduction was communicated to me by Mr. Boskowitz; some time before that, Mr. Bernays had called the somewhat less simple one (SC)(CC) to my attention.” In other words, even if nothing else, Schönfinkel had talked to Boskowitz and Bernays about what he was doing.
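Both constructions of the identity combinator mentioned in that footnote can be checked mechanically. A quick sketch in Python (again writing today’s K for Schönfinkel’s C):

```python
# Curried combinators: S f g x = (f x)(g x), K x y = x
S = lambda f: lambda g: lambda x: f(x)(g(x))
K = lambda x: lambda y: x

# Boskowitz's reduction: S K K x = (K x)(K x) = x
I_boskowitz = S(K)(K)

# Bernays's "somewhat less simple" form: (S K)(K K) x = (K x)((K K) x) = x
I_bernays = S(K)(K(K))

for value in ["a", 17, (1, 2)]:
    assert I_boskowitz(value) == value
    assert I_bernays(value) == value
print("both reduce to the identity")
```

Note that in Boskowitz’s form the second K is never actually used: S K y x = x for any y, which is why his version is the simpler one.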
OK, so we’ve got three people—in addition to David Hilbert—somehow connected to Moses Schönfinkel.
Let’s start with Heinrich Behmann—the person footnoted as “processing” Schönfinkel’s paper for publication:
He was born in Bremen, Germany in 1891, making him a couple of years younger than Schönfinkel. He arrived in Göttingen as a student in 1911, and by 1914 was giving a talk about Whitehead and Russell’s Principia Mathematica (which had been published in 1910). When World War I started he volunteered for military service, and in 1915 he was wounded in action in Poland (receiving an Iron Cross)—but in 1916 he was back in Göttingen studying under Hilbert, and in 1918 he wrote his PhD thesis on “The Antinomy of the Transfinite Number and Its Resolution by the Theory of Russell and Whitehead” (i.e. using the idea of types to deal with paradoxes associated with infinity).
Behmann continued in the standard academic track (i.e. what Schönfinkel apparently didn’t do)—and in 1921 he got his habilitation with the thesis “Contributions to the Algebra of Logic, in Particular to the Entscheidungsproblem [Decision Problem]”. There’d been other decision problems discussed before, but Behmann said what he meant was a “procedure [giving] complete instructions for determining whether a [logical or mathematical] assertion is true or false by a deterministic calculation after finitely many steps”. And, yes, Alan Turing’s 1936 paper “On Computable Numbers, with an Application to the Entscheidungsproblem” was what finally established that the halting problem, and therefore the Entscheidungsproblem, was undecidable. Curiously, in principle, there should have been enough in Schönfinkel’s paper that this could have been figured out back in 1921 if Behmann or others had been thinking about it in the right way (which might have been difficult before Gödel’s work).
So what happened to Behmann? He continued to work on mathematical logic and the philosophy of mathematics. After his habilitation in 1921 he became a privatdozent at Göttingen (with a job as an assistant in the applied math institute), and then in 1925 got a professorship in Halle in applied math—though, having been an active member of the Nazi Party since 1937, he lost this professorship in 1945 and became a librarian. He died in 1970.
(By the way, even though in 1920 “PM” [Principia Mathematica] was hot—and Behmann was promoting it—Schönfinkel had what in my opinion was the good taste to not explicitly mention it in his paper, referring only to Hilbert’s much-less-muddy ideas about the formalization of mathematics.)
OK, so what about Boskovitz, credited in the footnote with having discovered the classic combinator result I = SKK? That was Alfred Boskovitz, in 1920 a 23-year-old Jewish student at Göttingen, who came from Budapest, Hungary, and worked with Paul Bernays on set theory. Boskovitz is notable for having contributed far more corrections (nearly 200) to Principia Mathematica than anyone else, and being acknowledged (along with Behmann) in a footnote in the (1925–27) second edition. (This edition also gives a reference to Schönwinkel’s [sic] paper at the end of a list of 14 “other contributions to mathematical logic” since the first edition.) In the mid-1920s Boskovitz returned to Budapest. In 1936 he wrote to Behmann that anti-Jewish sentiment there made him concerned for his safety. There’s one more known communication from him in 1942, then no further trace.
The third person mentioned in Schönfinkel’s paper is Paul Bernays, who ended up living a long and productive life, mostly in Switzerland. But we’ll come to him later.
So where was Schönfinkel’s paper published? It was in a journal called Mathematische Annalen (Annals of Mathematics)—probably the top math journal of the time. Here’s its rather swank masthead, with quite a collection of famous names (including physicists like Einstein, Born and Sommerfeld):
The “instructions to contributors” on the inside cover of each issue had a statement from the “Editorial Office” about not changing things at the proof stage because “according to a calculation they [cost] 6% of the price of a volume”. The instructions then go on to tell people to submit papers to the editors—at their various home addresses (it seems David Hilbert lived just down the street from Felix Klein…):
Here’s the complete table of contents for the volume in which Schönfinkel’s paper appears:
There are a variety of famous names here. But particularly notable for our purposes are Aleksandr Khintchine (of Khinchin constant fame) and the topologists Pavel Alexandroff and Pavel Urysohn, who were all from Moscow State University, and who are all indicated, like Schönfinkel, as being “in Moscow”.
There’s a little bit of timing information here. Schönfinkel’s paper was indicated as having been received by the journal on March 15, 1924. The “thank goodness [he’s] gone [from Göttingen]” comment is dated March 20. Meanwhile, the actual issue of the journal with Schönfinkel’s article (number 3 of 4) was published September 15, with table of contents:
But note the ominous † next to Urysohn’s name. Turns out his fatal swimming accident was August 17, so—notwithstanding their admonitions—the journal must have added the † quite quickly at the proof stage.
Beyond his 1924 paper on combinators, there’s only one other known piece of published output from Moses Schönfinkel: a paper coauthored with Paul Bernays “On the Decision Problem of Mathematical Logic”:
It’s actually much more widely cited than Schönfinkel’s 1924 combinator paper, but it’s vastly less visionary and ultimately much less significant; it’s really about a technical point in mathematical logic.
About halfway through the paper it has a note:
“The following thoughts were inspired by Hilbert’s lectures on mathematical logic and date back several years. The decision procedure for a single function F(x, y) was derived by M. Schönfinkel, who first tackled the problem; P. Bernays extended the method to several logical functions, and also wrote the current paper.”
The paper was submitted on March 24, 1927. But in the records of the German Mathematicians Association we find a listing of another talk at the Göttingen Mathematical Society: December 6, 1921, P. Bernays and M. Schönfinkel, “Das Entscheidungsproblem im Logikkalkül” (“The Decision Problem in the Logical Calculus”). So the paper had a long gestation period, and (as the note in the paper suggests) it basically seems to have fallen to Bernays to get it written, quite likely with little or no communication with Schönfinkel.
So what else do we know about it? Well, remarkably enough, the Bernays archive contains two notebooks (the paper kind!) by Moses Schönfinkel that are basically an early draft of the paper (with the title already being the same as it finally was, but with Schönfinkel alone listed as the author):
These notebooks are basically our best window into the front lines of Moses Schönfinkel’s work. They aren’t dated as such, but at the end of the second notebook there’s a byline of sorts, that lists his street address in Göttingen—and we know he lived at that address from September 1922 until March 1924:
OK, so what’s in the notebooks? The first page might indicate that the notebooks were originally intended for a different purpose. It’s just a timetable of lectures:
“Hilbert lectures: Monday: Mathematical foundations of quantum theory; Thursday: Hilbert–Bernays: Foundations of arithmetic; Saturday: Hilbert: Knowledge and mathematical thinking”. (There’s also a slightly unreadable note that seems to say “Hoppe. 6–8… electricity”, perhaps referring to Edmund Hoppe, who taught physics in Göttingen, and wrote a history of electricity.)
But then we’re into 15 pages (plus 6 in the other notebook) of content, written in essentially perfect German, but with lots of parentheticals of different possible word choices:
The final paper as coauthored with Bernays begins:
“The central problem of mathematical logic, which is also closely connected to its axiomatic foundations, is the decision problem [Entscheidungsproblem]. And it deals with the following. We have logical formulas which contain logic functions, predicates, …”
Schönfinkel’s version begins considerably more philosophically (here with a little editing for clarity):
“Generality has always been the main goal—the ideal of the mathematician. Generality in the solution, in the method, in the concept and formulation of the theorem, in the problem and question. This tendency is even more pronounced and clearer with modern mathematicians than with earlier ones, and reaches its high point in the work of Hilbert and Ms. Noether. Such an ideal finds its most extreme expression when one faces the problem of “solving all problems”—at least all mathematical problems, because everything else after is easy, as soon as this “Gordian Knot” is cut (because the world is written in “mathematical letters” according to Hilbert).
In just the previous century mathematicians would have been extremely skeptical and even averse to such fantasies… But today’s mathematician has already been trained and tested in the formal achievements of modern mathematics and Hilbert’s axiomatics, and nowadays one has the courage and the boldness to dare to touch this question as well. We owe to mathematical logic the fact that we are able to have such a question at all.
From Leibniz’s bold conjectures, the great logician-mathematicians went step by step in pursuit of this goal, in the systematic structure of mathematical logic: Boole (discoverer of the logical calculus), (Bolzano?), Ernst Schröder, Frege, Peano, Ms. Ladd-Franklin, the two Peirces, Sheffer, Whitehead, Couturat, Huntington, Padoa, Shatunovsky, Sleshinsky, Kagan, Poretsky, Löwenheim, Skolem, … and their numerous students, collaborators and contemporaries … until in 1910–1914 “the system” by Bertrand Russell and Whitehead appeared—the famous “Principia Mathematica”—a mighty titanic work, a large system. Finally came our knowledge of logic from Hilbert’s lectures on (the algebra of) logic (calculus) and, following on from this, the groundbreaking work of Hilbert’s students: Bernays and Behmann.
The investigations of all these scholars and researchers have led (in no uncertain terms) to the fact that it has become clear that actual mathematics represents a branch of logic. … This emerges most clearly from the treatment and conception of mathematical logic that Hilbert has given. And now, thanks to Hilbert’s approach, we can (satisfactorily) formulate the great decision problem of mathematical logic.”
We learn quite a bit about Schönfinkel from this. Perhaps the most obvious thing is that he was a serious fan of Hilbert and his approach to mathematics (with a definite shoutout to “Ms. Noether”). It’s also interesting that he refers to Bernays and Behmann as “students” of Hilbert. That’s pretty much correct for Behmann. But Bernays (as we’ll see soon) was more an assistant or colleague of Hilbert’s than a student.
It gives interesting context to see Schönfinkel rattle off a sequence of contributors to what he saw as the modern view of mathematical logic. He begins—quite rightly I think—mentioning “Leibniz’s bold conjectures”. He’s not sure whether Bernard Bolzano fits (and neither am I). Then he lists Schröder, Frege and Peano—all pretty standard choices, involved in building up the formal structure of mathematical logic.
Next he mentions Christine Ladd-Franklin. At least these days, she’s not particularly well known, but she had been a mathematical logic student of Charles Peirce, and in 1881 she’d written a paper about the “Algebra of Logic” which included a truth table, a solid 40 years before Post or Wittgenstein. (In 1891 she had also worked in Göttingen on color vision with the experimental psychologist Georg Müller—who was still there in 1921.) It’s notable that Schönfinkel mentions Ladd-Franklin ahead of the father-and-son Peirces. Next we see Sheffer, who Schönfinkel quotes in connection with Nand in his combinator paper. (No doubt unbeknownst to Schönfinkel, Henry Sheffer—who spent most of his life in the US—was also born in Ukraine [“near Odessa”, his documents said], and was also Jewish, and was just 6 years older than Schönfinkel.) I’m guessing Schönfinkel mentions Whitehead next in connection with universal algebra, rather than his later collaboration with Russell.
Next comes Louis Couturat, who frankly wouldn’t have made my list for mathematical logic, but was another “algebra of logic” person, as well as a Leibniz fan, and developer of the Ido language offshoot from Esperanto. Huntington was involved in the axiomatization of Boolean algebra; Padoa was connected to Peano’s program. Shatunovsky, Sleshinsky and Kagan were all professors of Schönfinkel’s in Odessa (as mentioned above), concerned in various ways with foundations of mathematics. Platon Poretsky I must say I had never heard of before; he seems to have done fairly technical work on propositional logic. And finally Schönfinkel lists Löwenheim and Skolem, both of whom are well known in mathematical logic today.
I consider it rather wonderful that Schönfinkel refers to Whitehead and Russell’s Principia Mathematica as a “titanic work” (Titanenwerk). The showy and “overconfident” Titanic had come to grief on its iceberg in 1912, somehow reminiscent of Principia Mathematica itself, which would eventually come to grief on Gödel’s theorem.
At first it might just seem charming—particularly in view of his brother’s comment that “[Moses] is helpless and impractical in this material world”—to see Schönfinkel talk about how after one’s solved all mathematical problems, then solving all problems will be easy, explaining that, after all, Hilbert has said that “the world is written in ‘mathematical letters’”. He says that in the previous century mathematicians wouldn’t have seriously considered “solving everything”, but now, because of progress in mathematical logic, “one has the courage and the boldness to dare to touch this question”.
It’s very easy to see this as naive and unworldly—the writing of someone who knew only about mathematics. But though he didn’t have the right way to express it, Schönfinkel was actually onto something, and something very big. He talks at the beginning of his piece about generality, and about how recent advances in mathematical logic embolden one to pursue it. And in a sense he was very right about this. Because mathematical logic—through work like his—is what led us to the modern conception of computation, which really is successful in “talking about everything”. Of course, after Schönfinkel’s time we learned about Gödel’s theorem and computational irreducibility, which tell us that even though we may be able to talk about everything, we can never expect to “solve every problem” about everything.
But back to Schönfinkel’s life and times. The remainder of Schönfinkel’s notebooks give the technical details of his solution to a particular case of the decision problem. Bernays obviously worked through these, adding more examples as well as some generalization. And Bernays cut out Schönfinkel’s philosophical introduction, no doubt on the (probably correct) assumption that it would seem too airyfairy for the paper’s intended technical audience.
So who was Paul Bernays? Here’s a picture of him from 1928:
Bernays was almost exactly the same age as Schönfinkel (he was born on October 17, 1888—in London, where there was no calendar issue to worry about). He came from an international business family, was a Swiss citizen and grew up in Paris and Berlin. He studied math, physics and philosophy with a distinguished roster of professors in Berlin and Göttingen, getting his PhD in 1912 with a thesis on analytic number theory.
After his PhD he went to the University of Zurich, where he wrote a habilitation (on complex analysis), and became a privatdozent (yes, with the usual documentation, that can still be found), and an assistant to Ernst Zermelo (of ZFC set theory fame). But in 1917 Hilbert visited Zurich and soon recruited Bernays to return to Göttingen. In Göttingen, for apparently bureaucratic reasons, Bernays wrote a second habilitation, this time on the axiomatic structure of Principia Mathematica (again, all the documentation can still be found). Bernays was also hired to work as a “foundations of math assistant” to Hilbert. And it was presumably in that capacity that he—along with Moses Schönfinkel—wrote the notes for Hilbert’s 1920 course on mathematical logic.
Unlike Schönfinkel, Bernays followed a fairly standard—and successful—academic track. He became a professor in Göttingen in 1922, staying there until he was dismissed (because of partially Jewish ancestry) in 1933—after which he moved back to Zurich, where he stayed and worked very productively, mostly in mathematical logic (von Neumann–Bernays–Gödel set theory, etc.), until he died in 1977.
Back when he was in Göttingen one of the things Bernays did with Hilbert was to produce the two-volume classic Grundlagen der Mathematik (Foundations of Mathematics). So did the Grundlagen mention Schönfinkel? It has one mention of the Bernays–Schönfinkel paper, but no direct mention of combinators. However, there is one curious footnote:
This starts “A system of axioms that is sufficient to derive all true implicational formulas was first set up by M. Schönfinkel…”, then goes on to discuss work by Alfred Tarski. So do we have evidence of something else Schönfinkel worked on? Probably.
In ordinary logic, one starts from an axiom system that gives relations, say about And, Or and Not. But, as Sheffer established in 1913, it’s also possible to give an axiom system purely in terms of Nand (and, yes, I’m proud to say that I found the very simplest such axiom system in 2000). Well, it’s also possible to use other bases for logic. And this footnote is about using Implies as the basis. Actually, it’s implicational calculus, which isn’t as strong as ordinary logic, in the sense that it only lets you prove some of the theorems. But there’s a question again: what are the possible axioms for implicational calculus?
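That Nand alone can serve as a basis is easy to verify by truth table. Here is an illustrative check in Python, using the standard constructions of Not, And and Or from Nand (these particular constructions are textbook material, not anything specific to the footnote):

```python
from itertools import product

def nand(p, q):
    return not (p and q)

# Standard ways of building Not, And, Or from Nand alone:
def not_(p):    return nand(p, p)
def and_(p, q): return nand(nand(p, q), nand(p, q))
def or_(p, q):  return nand(nand(p, p), nand(q, q))

# Exhaustively compare against Python's built-in Boolean operations:
for p, q in product([False, True], repeat=2):
    assert not_(p) == (not p)
    assert and_(p, q) == (p and q)
    assert or_(p, q) == (p or q)
print("Nand alone reproduces Not, And, Or")
```

Since every Boolean function can be written in terms of And, Or and Not, this confirms that Nand by itself is functionally complete.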
Well, it seems that Schönfinkel found a possible set of such axioms, though we’re not told what they were; only that Tarski later found a simpler set. (And, yes, I looked for the simpler axiom systems for implicational calculus in 2000, but didn’t find any.) So again we see Schönfinkel in effect trying to explore the lowestlevel foundations of mathematical logic, though we don’t know any details.
So what other interactions did Bernays have with Schönfinkel? There seems to be no other information in Bernays’s archives. But I have been able to get a tiny bit more information. In a strange chain of connections, someone who’s worked on Mathematica and Wolfram Language since 1987 is Roman Maeder. And Roman’s thesis advisor (at ETH Zurich) was Erwin Engeler—who was a student of Paul Bernays. Engeler (who is now in his 90s) worked for many years on combinators, so of course I had to ask him what Bernays might have told him about Schönfinkel. He told me he recalled only two conversations. He told me he had the impression that Bernays found Schönfinkel a difficult person. He also said he believed that the last time Bernays saw Schönfinkel it was in Berlin, and that Schönfinkel was somehow in difficult circumstances. Any such meeting in Berlin would have had to be before 1933. But try as we might to track it down, we haven’t succeeded.
In the space of three days in March 1924 Moses Schönfinkel—by then 35 years old—got his paper on combinators submitted to Mathematische Annalen, got a reference for himself sent out, and left for Moscow. But why did he go to Moscow? We simply don’t know.
A few things are clear, though. First, it wasn’t difficult to get to Moscow from Göttingen at that time; there was pretty much a direct train there. Second, Schönfinkel presumably had a valid Russian passport (and, one assumes, didn’t have any difficulties from not having served in the Russian military during World War I).
One also knows that there was a fair amount of intellectual exchange and travel between Göttingen and Moscow. The very same volume of Mathematische Annalen in which Schönfinkel’s paper was published has three authors (out of 19) in addition to Schönfinkel listed as being in Moscow: Pavel Alexandroff, Pavel Urysohn and Aleksandr Khinchin. Interestingly, all of these people were at Moscow State University.
And we know there was more exchange with that university. Nikolai Luzin, for example, got his PhD in Göttingen in 1915, and went on to be a leader in mathematics at Moscow State University (until he was effectively dismissed by Stalin in 1936). And we know that for example in 1930, Andrei Kolmogorov, having just graduated from Moscow State University, came to visit Hilbert.
Did Schönfinkel go to Moscow State University? We don’t know (though we haven’t yet been able to access any archives that may be there).
Did Schönfinkel go to Moscow because he was interested in communism? Again, we don’t know. It’s not uncommon to find mathematicians ideologically sympathetic to at least the theory of communism. But communism doesn’t seem to have particularly been a thing in the mathematics or general university community in Göttingen. And indeed when Ludwig Gossmann was arrested in 1933, investigations of who he might have recruited into communism didn’t find anything of substance.
Still, as I’ll discuss later, there is a tenuous reason to think that Schönfinkel might have had some connection to Leon Trotsky’s circle, so perhaps that had something to do with him going to Moscow—though it would have been a bad time to be involved with Trotsky, since by 1925 he was already out of favor with Stalin.
A final theory is that Schönfinkel might have had relatives in Moscow; at least it looks as if some of his Lurie cousins ended up there.
But realistically we don’t know. And beyond the bylines on the journals, we don’t really have any documentary evidence that Schönfinkel was in Moscow. However, there is one more data point, from November 1927 (8 months after the submission of Schönfinkel’s paper with Bernays). Pavel Alexandroff was visiting Princeton University, and when Haskell Curry (who we’ll meet later) asked him about Schönfinkel he was apparently told that “Schönfinkel has… gone insane and is now in a sanatorium & will probably not be able to work any more.”
Ugh! What happened? Once again, we don’t know. Schönfinkel doesn’t seem to have ever been “in a sanatorium” while he was in Göttingen; after all, we have all his addresses, and none of them were sanatoria. Maybe there’s a hint of something in Schönfinkel’s brother’s letter to Hilbert. But are we really sure that Schönfinkel actually suffered from mental illness? There’s a bunch of hearsay that says he did. But then it’s a common claim that logicians who do highly abstract work are prone to mental illness (and, well, yes, there are a disappointingly large number of historical examples).
Mental illness wasn’t handled very well in the 1920s. Hilbert’s only child, his son Franz (who was about five years younger than Schönfinkel), suffered from mental illness, and after a delusional episode that ended up with him in a clinic, David Hilbert simply said “From now on I have to consider myself as someone who does not have a son”. In Moscow in the 1920s—despite some political rhetoric—conditions in psychiatric institutions were probably quite poor, and there was for example quite a bit of use of primitive shock therapy (though not yet electroshock). It’s notable, by the way, that Curry reports that Alexandroff described Schönfinkel as being “in a sanatorium”. But while at that time the word “sanatorium” was being used in the US as a better term for “insane asylum”, in Russia it still had more the meaning of a place for a rest cure. So this still doesn’t tell us if Schönfinkel was in fact “institutionalized”—or just “resting”. (By the way, if there was mental illness involved, another connection for Schönfinkel that doesn’t seem to have been made is that Paul Bernays’s first cousin once removed was Martha Bernays, wife of Sigmund Freud.)
Whether or not he was mentally ill, what would it have been like for Schönfinkel in what was then the Soviet Union in the 1920s? One thing is that in the Soviet system, everyone was supposed to have a job. So Schönfinkel was presumably employed doing something—though we have no idea what. Schönfinkel had presumably been at least somewhat involved with the synagogue in Göttingen (which is how the rabbi there knew to tell his brother he was in bad shape). There was a large and growing Jewish population in Moscow in the 1920s, complete with things like Yiddish newspapers. But by the mid-1930s it was no longer so comfortable to be Jewish in Moscow, and Jewish cultural organizations were being shut down.
By the way, in the unlikely event that Schönfinkel was involved with Trotsky, there could have been trouble even by 1925, and certainly by 1929. And it’s notable that it was a common tactic for Stalin (and others) to claim that their various opponents were “insane”.
So what else do we know about Schönfinkel in Moscow? It’s said that he died there in 1940 or 1942, aged 52–54. Conditions in Moscow wouldn’t have been good then; the so-called Battle of Moscow occurred in the winter of 1941. And there are various stories told about Schönfinkel’s situation at that time.
The closest to a primary source seems to be a summary of mathematical logic in the Soviet Union, written by Sofya Yanovskaya in 1948. Yanovskaya was born in 1896 (so 8 years after Schönfinkel), and grew up in Odessa. She attended the same university there as Schönfinkel, studying mathematics, though arrived five years after Schönfinkel graduated. She had many of the same professors as Schönfinkel, and, probably like Schönfinkel, was particularly influenced by Shatunovsky. When the Russian Revolution happened, Yanovskaya went “all in”, becoming a serious party operative, but eventually began to teach, first at the Institute of Red Professors, and then from 1925 at Moscow State University—where she became a major figure in mathematical logic, and was eventually awarded the Order of Lenin.
One might perhaps have thought that mathematical logic would be pretty much immune to political issues. But the founders of communism had talked about mathematics, and there was a complex debate about the relationship between Marxist–Leninist ideology and formal ideas in mathematics, notably the Law of Excluded Middle. Sofya Yanovskaya was deeply involved, initially in trying to “bring mathematics to heel”, but later in defending it as a discipline, as well as in editing Karl Marx’s mathematical writings.
It’s not clear to what extent her historical writings were censored or influenced by party considerations, but they certainly contain lots of good information, and in 1948 she wrote a paragraph about Schönfinkel:
“The work of M. I. Sheinfinkel played a substantial role in the further development of mathematical logic. This brilliant student of S. O. Shatunovsky, unfortunately, left us early. (After getting mentally ill [заболев душевно], M. I. Sheinfinkel passed away in Moscow in 1942.) He did the work mentioned here in 1920, but only published it in 1924, edited by Behmann.”
Unless she was hiding things, this quote doesn’t make it sound as if Yanovskaya knew much about Schönfinkel. (By the way, her own son was apparently severely mentally ill.) A student of Jean van Heijenoort (who we’ll encounter later) named Irving Anellis did apparently in the 1990s ask a student of Yanovskaya’s whether Yanovskaya had known Schönfinkel. Apparently he responded that unfortunately nobody had thought to ask her that question before she died in 1966.
What else do we know? Nothing substantial. The most extensively embellished story I’ve seen about Schönfinkel appears in an anonymous comment on the talk page for the Wikipedia entry about Schönfinkel:
“William Hatcher, while spending time in St Petersburg during the 1990s, was told by Soviet mathematicians that Schönfinkel died in wretched poverty, having no job and but one room in a collective apartment. After his death, the rough ordinary people who shared his apartment burned his manuscripts for fuel (WWII was raging). The few Soviet mathematicians around 1940 who had any discussions with Schönfinkel later said that those mss reinvented a great deal of 20th century mathematical logic. Schönfinkel had no way of accessing the work of Turing, Church, and Tarski, but had derived their results for himself. Stalin did not order Schönfinkel shot or deported to Siberia, but blame for Schönfinkel’s death and inability to publish in his final years can be placed on Stalin’s doorstep. 202.36.179.65 06:50, 25 February 2006 (UTC)”
William Hatcher was a mathematician and philosopher who wrote extensively about the Baháʼí Faith and did indeed spend time at the Steklov Institute of Mathematics in Saint Petersburg in the 1990s—and mentioned Schönfinkel’s technical work in his writings. People I’ve asked at the Steklov Institute do remember Hatcher, but don’t know anything about what it’s claimed he was told about Schönfinkel. (Hatcher died in 2005, and I haven’t been successful at getting any material from his archives.)
So are there any other leads? I did notice that the IP address that originated the Wikipedia comment is registered to the University of Canterbury in New Zealand. So I asked people there and in the New Zealand foundations of math scene. But despite a few “maybe so-and-so wrote that” ideas, nobody shed any light.
OK, so what about at least a death certificate for Schönfinkel? Well, there’s some evidence that the registry office in Moscow has one. But they tell us that in Russia only direct relatives can access death certificates….
So far as we know, Moses Schönfinkel never married, and didn’t have children. But he did have a brother, Nathan, who we encountered earlier in connection with the letter he wrote about Moses to David Hilbert. And in fact we know quite a bit about Nathan Scheinfinkel (as he normally styled himself). Here’s a biographical summary from 1932:
The basic story is that he was about five years younger than Moses, and went to study medicine at the University of Bern in Switzerland in April 1914 (i.e. just before World War I began). He got his MD in 1920, then got his PhD on “Gas Exchange and Metamorphosis of Amphibian Larvae after Feeding on the Thyroid Gland or Substances Containing Iodine” in 1922. He did subsequent research on the electrochemistry of the nervous system, and in 1929 became a privatdozent—with official “license to teach” documentation:
(In a piece of bizarre small-worldness, my grandfather, Max Wolfram, also got a PhD in the physiology [veterinary medicine] department at the University of Bern [studying the function of the thymus gland], though that was in 1909, and presumably he had left before Nathan Scheinfinkel arrived.)
But in any case, Nathan Scheinfinkel stayed at Bern, eventually becoming a professor, and publishing extensively, including in English. He became a Swiss citizen in 1932, with the official notice stating:
“Scheinfinkel, Nathan. Son of Ilia Gerschow and Mascha [born] Lurie, born in Yekaterinoslav, Russia, September 13, 1893 (old style). Doctor of medicine, residing in Bern, Neufeldstrasse 5a, husband of Raissa [born] Neuburger.”
In 1947, however, he moved to become a founding professor in a new medical school in Ankara, Turkey. (Note that Turkey, like Switzerland, had been neutral in World War II.) In 1958 he moved again, this time to found the Institute of Physiology at Ege University in Izmir, Turkey, and then at age 67, in 1961, he retired and returned to Switzerland.
Did Nathan Scheinfinkel have children (whose descendants, at least, might know something about “Uncle Moses”)? It doesn’t seem so. We tracked down Nuran Harirî, now an emeritus professor, but in the 1950s a young physiology resident at Ege University responsible for translating Nathan Scheinfinkel’s lectures into Turkish. She said that Nathan Scheinfinkel was at that point living in campus housing with his wife, but she never heard mention of any children, or indeed of any other family members.
What about any other siblings? Amazingly, looking through handwritten birth records from Ekaterinoslav, we found one! Debora Schönfinkel, born December 22, 1889 (i.e. January 3, 1890, in the modern calendar):
So Moses Schönfinkel had a younger sister, as well as a younger brother. And we even know that his sister graduated from 7th grade in June 1907. But we don’t know anything else about her, or about other siblings. We know that Schönfinkel’s mother died in 1936, at the age of 74.
Might there have been other Schönfinkel relatives in Ekaterinoslav? Perhaps, but it’s unlikely they survived World War II—because in one of those shocking and tragic pieces of history, over a four-day period in February 1942 almost the whole Jewish population of 30,000 was killed.
Could there be other Schönfinkels elsewhere? The name is not common, but it does show up (with various spellings and transliterations), both before and after Moses Schönfinkel. There’s a Scheinfinkel Russian revolutionary buried in the Kremlin Wall; there was a Lovers of Zion delegate Scheinfinkel from Ekaterinoslav. There was a Benjamin Scheinfinkel in New York City in the 1940s; a Shlomo Scheinfinkel in Haifa in the 1930s. There was even a certain curiously named Bas Saul Haskell Scheinfinkel born in 1875. But despite quite a bit of effort, I’ve been unable to locate any living relative of Moses Schönfinkel. At least so far.
What happened with combinators after Schönfinkel published his 1924 paper? Initially, so far as one can tell, nothing. That is, until Haskell Curry found Schönfinkel’s paper in the library at Princeton University in November 1927—and launched into a lifetime of work on combinators.
Who was Haskell Curry? And why did he know to care about Schönfinkel’s paper?
Haskell Brooks Curry was born on September 12, 1900, in a small town near Boston, MA. His parents were both elocution educators, who by the time Haskell Curry was born were running the School of Expression (which had evolved from his mother’s Boston-based School of Elocution and Expression). (Many years later, the School of Expression would evolve into Curry College in Milton, Massachusetts—which happens to be where for several years we held our Wolfram Summer School, often noting the “coincidence” of names when combinators came up.)
Haskell Curry went to college at Harvard, graduating in mathematics in 1920. After a couple of years doing electrical engineering, he went back to Harvard, initially working with Percy Bridgman, who was primarily an experimental physicist, but was writing a philosophy of science book entitled The Logic of Modern Physics. And perhaps through this Curry got introduced to Whitehead and Russell’s Principia Mathematica.
But in any case, there’s a note in his archive about Principia Mathematica dated May 20, 1922:
Curry seems—perhaps like an electrical engineer or a “pre-programmer”—to have been very interested in the actual process of mathematical logic, starting his notes with: “No logical process is possible without the phenomenon of substitution.” He continued, trying to break down the process of substitution.
But then his notes end, more philosophically, and perhaps with “expression” influence: “Phylogenetic origin of logic: 1. Sensation; 2. Association: Red hot poker–law of permanence”.
At Harvard Curry started working with George Birkhoff towards a PhD on differential equations. But by 1927–8 he had decided to switch to logic, and was spending a year as an instructor at Princeton. And it was there—in November 1927—that he found Schönfinkel’s paper. Preserved in his archives are the notes he made:
At the top there’s a date stamp of November 28, 1927. Then Curry writes: “This paper anticipates much of what I have done”—then launches into a formal summary of Schönfinkel’s paper (charmingly using f@x to indicate function application—just as we do in Wolfram Language, except his is left associative…).
He ends his “report” with “In criticism I might say that no formal development have been undertaken in the above. Equality is taken intuitively and such things as universality, and proofs of identity are shown on the principle that if for every z, x@z : y@z then x=y ….”
But then there’s another piece:
“On discovery of this paper I saw Prof. Veblen. Schönfinkel’s paper said ‘in Moskau’. Accordingly we sought out Paul Alexandroff. The latter says Schönfinkel has since gone insane and is now in a sanatorium & will probably not be able to work any more. The paper was written with help of Paul Bernays and Behman [sic]; who would presumably be the only people in the world who would write on that subject.”
What was the backstory to this? Oswald Veblen was a math professor at Princeton who had worked on the axiomatization of geometry and was by then working on topology. Pavel Alexandroff (who we encountered earlier) was visiting from Moscow State University for the year, working on topology with Hopf, Lefschetz, Veblen and Alexander. I’m not quite sure why Curry thought Bernays and Behmann “would be the only people in the world who would write on that subject”; I don’t see how he could have known.
Curry continues: “It was suggested I write to Bernays, who is außerord. prof. [roughly, associate professor] at Göttingen.” But then he adds—in depressingly familiar academic form: “Prof. Veblen thought it unwise until I had something definite ready to publish.”
“A footnote to Schönfinkel’s paper said the ideas were presented before Math Gesellschaft in Göttingen on Dec. 7, 1920 and that its formal and elegant [sic] write up was due to H. Behman”. “Elegant” is a peculiar translation of “stilistische” that probably gives Behmann too much credit; a more obvious translation might be “stylistic”.
Curry continues: “Alexandroff’s statements, as I interpret them, are to the effect that Bernays, Behman, Ackermann, von Neumann, Schönfinkel & some others form a small school of math logicians working on this & similar topics in Göttingen.”
And so it was that Curry resolved to study in Göttingen, and do his PhD in logic there. But before he left for Göttingen, Curry wrote a paper (published in 1929):
Already there’s something interesting in the table of contents: the use of the word “combinatory”, which, yes, in Curry’s hands is going to turn into “combinator”.
The paper starts off reading a bit like a student essay, and one’s not encouraged by a footnote a few pages in:
“In the writing the foregoing account I have naturally made use of any ideas I may have gleaned from reading the literature. The writings of Hilbert are fundamental in this connection. I hope that I have added clearness to certain points where the existing treatments are obscure.” [“Clearness” not “clarity”?]
Then, towards the end of the “Preliminary Discussion” is this:
And the footnote says: “See the paper of Schönfinkel cited below”. It’s (so far as I know) the firstever citation to Schönfinkel’s paper!
On the next page Curry starts to give details. Curry starts talking about substitution, then says (in an echo of modern symbolic language design) this relates to the idea of “transformation of functions”:
At first he’s off talking about all the various combinatorial arrangements of variables, etc. But then he introduces Schönfinkel—and starts trying to explain in a formal way what Schönfinkel did. And even though he says he’s talking about what one assumes is structural substitution, he seems very concerned about what equality means, and how Schönfinkel didn’t quite define that. (And, of course, in the end, with universal computation, undecidability, etc. we know that the definition of equality wasn’t really accessible in the 1920s.)
By the next page, here we are, S and K (Curry renamed Schönfinkel’s C):
At first he’s imagining that the combinators have to be applied to something (i.e. f[x] not just f). But by the next page he comes around to what Schönfinkel was doing in looking at “pure combinators”:
The rest of the paper is basically concerned with setting up combinators that can successively represent permutations—and it certainly would have been much easier if Curry had had a computer (and one could imagine minimal “combinator sorters” like minimal sorting networks):
After writing this paper, Curry went to Göttingen—where he worked with Bernays. I must say that I’m curious what Bernays said to Curry about Schönfinkel (was it more than to Erwin Engeler?), and whether other people around Göttingen even remembered Schönfinkel, who by then had been gone for more than four years. In 1928, travel in Europe was open enough that Curry should have had no trouble going, for example, to Moscow, but there’s no evidence he made any effort to reach out to Schönfinkel. But in any case, in Göttingen he worked on combinators, and over the course of a year produced his first official paper on “combinatory logic”:
Strangely, the paper was published in an American journal—as the only paper not in English in that volume. The paper is more straightforward, and in many ways more “Schönfinkel like”. But it was just the first of many papers that Curry wrote about combinators over the course of nearly 50 years.
Curry was particularly concerned with the “mathematicization” of combinators, finding and fixing problems with axioms invented for them, connecting to other formalisms (notably Church’s lambda calculus), and generally trying to prove theorems about what combinators do. But more than that, Curry spread the word about combinators far and wide. And before long most people viewed him as “Mr. Combinator”, with Schönfinkel at most a footnote.
In 1958, when Haskell Curry and Robert Feys wrote their book on Combinatory Logic, there’s a historical footnote—that gives the impression that Curry “almost” had Schönfinkel’s ideas before he saw Schönfinkel’s paper in 1927:
I have to say that I don’t think that’s a correct impression. What Schönfinkel did was much more singular than that. It’s plausible to think that others (and particularly Curry) could have had the idea that there could be a way to go “below the operations of mathematical logic” and find more fundamental building blocks based on understanding things like the process of substitution. But the actuality of how Schönfinkel did it is something quite different—and something quite unique.
And when one sees Schönfinkel’s S combinator: what mind could have come up with such a thing? Even Curry says he didn’t really understand the significance of the S combinator until the 1940s.
I suppose if one’s just thinking of combinatory logic as a formal system with a certain general structure then it might not seem to matter that things as simple as S and K can be the ultimate building blocks. But the whole point of what Schönfinkel was trying to do (as the title of his paper says) was to find the “building blocks of logic”. And the fact that he was able to do it—especially in terms of things as simple as S and K—was a great and unique achievement. And not something that (despite all the good he did for combinators) Curry did.
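Schönfinkel’s S and K are simple enough that they can be stated directly in modern code. Here’s a minimal sketch in Python, using curried one-argument functions (the names `S`, `K`, `I` follow the standard combinator-calculus convention, not any particular source code of Schönfinkel’s or Curry’s):

```python
# Schönfinkel's two basic combinators, as curried Python lambdas:
S = lambda f: lambda g: lambda x: f(x)(g(x))   # "fusion": S f g x = f x (g x)
K = lambda x: lambda y: x                      # "constancy": K x y = x

# The identity combinator need not be taken as primitive:
# I = S K K, since S K K x = K x (K x) = x.
I = S(K)(K)

print(I(42))        # 42
print(K("a")("b"))  # a
```

That the identity function falls out of just S and K is a tiny instance of the reductionism the text describes: everything else in combinatory logic is built the same way, by applying S and K to each other.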
In the decade or so after Schönfinkel’s paper appeared, Curry occasionally referenced it, as did Church and a few other closely connected people. But soon Schönfinkel’s paper—and Schönfinkel himself—disappeared completely from view, and standard databases list no citations.
But in 1967 Schönfinkel’s paper was seen again—now even translated into English. The venue was a book called From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931. And there, sandwiched between von Neumann on transfinite numbers and Hilbert on “the infinite”, is Schönfinkel’s paper, in English, with a couple of pages of introduction by Willard Van Orman Quine. (And indeed it was from this book that I myself first became aware of Schönfinkel and his work.)
But how did Schönfinkel’s paper get into the book? And do we learn anything about Schönfinkel from its appearance there? Maybe. The person who put the book together was a certain Jean van Heijenoort, who himself had a colorful history. Born in 1912, he grew up mostly in France, and went to college to study mathematics—but soon became obsessed with communism, and in 1932 left to spend what ended up being nearly ten years working as a kind of combination PR person and bodyguard for Leon Trotsky, initially in Turkey but eventually in Mexico. Having married an American, van Heijenoort moved to New York City, eventually enrolling in a math PhD program, and becoming a professor doing mathematical logic (though with some colorful papers along the way, with titles like “The Algebra of Revolution”).
Why is this relevant? Well, the question is: how did van Heijenoort know about Schönfinkel? Perhaps it was just through careful scholarship. But just maybe it was through Trotsky. There’s no real evidence, although it is known that during his time in Mexico, Trotsky did request a copy of Principia Mathematica (or was it his “PR person”?). But at least if there was a Trotsky connection it could help explain Schönfinkel’s strange move to Moscow. But in the end we just don’t know.
When one reads about the history of science, there’s a great tendency to get the impression that big ideas come suddenly to people. But my historical research—and my personal experience—suggest that that’s essentially never what happens. Instead, there’s usually a period of many years in which some methodology or conceptual framework gradually develops, and only then can the great idea emerge.
So with Schönfinkel it’s extremely frustrating that we just can’t see that long period of development. The records we have just tell us that Schönfinkel announced combinators on December 7, 1920. But how long had he been working towards them? We just don’t know.
On the face of it, his paper seems simple—the kind of thing that could have been dashed off in a few weeks. But I think it’s much more likely that it was the result of a decade of development—of which, through foibles of history, we now have no trace.
Yes, what Schönfinkel finally came up with is simple to explain. But to get to it, he had to cut through a whole thicket of technicality—and see the essence of what lay beneath. My life as a computational language designer has often involved doing very much this same kind of thing. And at the end of it, what you come up with may seem in retrospect “obvious”. But to get there often requires a lot of hard intellectual work.
And in a sense what Schönfinkel did was the most impressive possible version of this. There were no computers. There was no ambient knowledge of computation as a concept. Yet Schönfinkel managed to come up with a system that captures the core of those ideas. And while he didn’t quite have the language to describe it, I think he did have a sense of what he was doing—and the significance it could have.
What was the personal environment in which Schönfinkel did all this? We just don’t know. We know he was in Göttingen. We don’t think he was involved in any particularly official way with the university. Most likely he was just someone who was “around”. Clearly he had some interaction with people like Hilbert and Bernays. But we don’t know how much. And we don’t really know if they ever thought they understood what Schönfinkel was doing.
Even when Curry picked up the idea of combinators—and did so much with it—I don’t think he really saw the essence of what Schönfinkel was trying to do. Combinators and Schönfinkel are a strange episode in intellectual history. A seed sown far ahead of its time by a person who left surprisingly few traces, and about whom we know personally so little.
But much as combinators represent a way of getting at the essence of computation, perhaps in combinators we have the essence of Moses Schönfinkel: years of a life compressed to two “signs” (as he would call them) S and K. And maybe if the operation we now call currying needs a symbol we should be using the “sha” character Ш from the beginning of Schönfinkel’s name to remind us of a person about whom we know so little, but who planted a seed that gave us so much.
Many people and organizations have helped in doing research and providing material for this piece. Thanks particularly to Hatem Elshatlawy (fieldwork in Göttingen, etc.), Erwin Engeler (first-person history), Unal Goktas (Turkish material), Vitaliy Kaurov (locating Ukraine + Russia material), Anna & Oleg Marichev (interpreting old Russian handwriting), Nik Murzin (fieldwork in Moscow), Eila Stiegler (German translations), Michael Trott (interpreting German). Thanks also for input from Henk Barendregt, Semih Baskan, Metin Baştuğ, Cem Boszahin, Jason Cawley, Jack Copeland, Nuran Hariri, Ersin Koylu, Alexander Kuzichev, Yuri Matiyasevich, Roman Maeder, Volker Peckhaus, Jonathan Seldin, Vladimir Shalack, Matthew Szudzik, Christian Thiel, Richard Zach. Particular thanks to the following archives and staff: Berlin State Library [Gabriele Kaiser], Bern University Archive [Niklaus Bütikofer], ETHZ (Bernays) Archive [Flavia Lanini, Johannes Wahl], Göttingen City Archive [Lena Uffelmann], Göttingen University [Katarzyna Chmielewska, Bärbel Mund, Petra Vintrová, Dietlind Willer].
“In principle you could use combinators,” some footnote might say. But the implication tends to be “But you probably don’t want to.” And, yes, combinators are deeply abstract—and in many ways hard to understand. But tracing their history over the hundred years since they were invented, I’ve come to realize just how critical they’ve actually been to the development of our modern conception of computation—and indeed my own contributions to it.
The idea of representing things in a formal, symbolic way has a long history. In antiquity there was Aristotle’s logic and Euclid’s geometry. By the 1400s there was algebra, and in the 1840s Boolean algebra. Each of these was a formal system that allowed one to make deductions purely within the system. But each, in a sense, ultimately viewed itself as being set up to model something specific. Logic was for modeling the structure of arguments, Euclid’s geometry the properties of space, algebra the properties of numbers; Boolean algebra aspired to model the “laws of thought”.
But was there perhaps some more general and fundamental infrastructure: some kind of abstract system that could ultimately model or represent anything? Today we understand that’s what computation is. And it’s becoming clear that the modern conception of computation is one of the single most powerful ideas in all of intellectual history—whose implications are only just beginning to unfold.
But how did we finally get to it? Combinators had an important role to play, woven into a complex tapestry of ideas stretching across more than a century.
The main part of the story begins in the 1800s. Through the course of the 1700s and 1800s mathematics had developed a more and more elaborate formal structure that seemed to be reaching ever further. But what really was mathematics? Was it a formal way of describing the world, or was it something else—perhaps something that could exist without any reference to the world?
Developments like non-Euclidean geometry, group theory and transfinite numbers made it seem as if meaningful mathematics could indeed be done just by positing abstract axioms from scratch and then following a process of deduction. But could all of mathematics actually just be a story of deduction, perhaps even ultimately derivable from something seemingly lower level—like logic?
But if so, what would things like numbers and arithmetic be? Somehow they would have to be “constructed out of pure logic”. Today we would recognize these efforts as “writing programs” for numbers and arithmetic in a “machine code” based on certain “instructions of logic”. But back then, everything about this and the ideas around it had to be invented.
Before one could really dig into the idea of “building mathematics from logic” one had to have ways to “write mathematics” and “write logic”. At first, everything was just words and ordinary language. But by the end of the 1600s mathematical notation like +, =, > had been established. For a while new concepts—like Boolean algebra—tended to just piggyback on existing notation. By the end of the 1800s, however, there was a clear need to extend and generalize how one wrote mathematics.
In addition to algebraic variables like x, there was the notion of symbolic functions f, as in f(x). In logic, there had long been the idea of letters (p, q, …) standing for propositions (“it is raining now”). But now there needed to be notation for quantifiers (“for all x such-and-such”, or “there exists x such that…”). In addition, in analogy to symbolic functions in mathematics, there were symbolic logical predicates: not just explicit statements like x > y but also ones like p(x, y) for symbolic p.
The first full effort to set up the necessary notation and come up with an actual scheme for constructing arithmetic from logic was Gottlob Frege’s 1879 Begriffsschrift (“concept script”):
And, yes, it was not so easy to read, or to typeset—and at first it didn’t make much of an impression. But the notation got more streamlined with Giuseppe Peano’s Formulario project in the 1890s—which wasn’t so concerned with starting from logic as starting from some specified set of axioms (the “Peano axioms”):
And then in 1910 Alfred Whitehead and Bertrand Russell began publishing their 2000-page Principia Mathematica—which pretty much by its sheer weight and ambition (and notwithstanding what I would today consider grotesque errors of language design)—popularized the possibility of building up “the complexity of mathematics” from “the simplicity of logic”:
It was one thing to try to represent the content of mathematics, but there was also the question of representing the infrastructure and processes of mathematics. Let’s say one picks some axioms. How can one know if they’re consistent? What’s involved in proving everything one can prove from them?
In the 1890s David Hilbert began to develop ideas about this, particularly in the context of tightening up the formalism of Euclid’s geometry and its axioms. And after Principia Mathematica, Hilbert turned more seriously to the use of logicbased ideas to develop “metamathematics”—notably leading to the formulation of things like the “decision problem” (Entscheidungsproblem) of asking whether, given an axiom system, there’s a definite procedure to prove or disprove any statement with respect to it.
But while connections between logic and mathematics were of great interest to people concerned with the philosophy of mathematics, a more obviously mathematical development was universal algebra—in which axioms for different areas of mathematics were specified just by giving appropriate algebra-like relations. (As it happens, universal algebra was launched under that name by the 1898 book A Treatise on Universal Algebra by Alfred Whitehead, later of Principia Mathematica fame.)
But there was one area where ideas about algebra and logic intersected: the tightening up of Boolean algebra, and in particular the finding of simpler foundations for it. Logic had pretty much always been formulated in terms of And, Or and Not. But in 1912 Henry Sheffer—attempting to simplify Principia Mathematica—showed that just Nand (or Nor) was sufficient. (It turned out that Charles Peirce had already noted the same thing in the 1880s.)
So that established that the notation of logic could be made basically as simple as one could imagine. But what about its actual structure, and axioms? Sheffer talked about needing five “algebra-style” axioms. But by going to axioms based on logical inferences Jean Nicod managed in 1917 to get it down to just one axiom. (And, as it happens, I finally finished the job in 2000 by finding the very simplest “algebra-style” axioms for logic—the single axiom: ((p·q)·r)·(p·((p·r)·p)) = r.)
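With · interpreted as Nand, the single axiom is something one can check mechanically, by brute force over all Boolean assignments. A minimal sketch in Python (the function names here are just illustrative):

```python
from itertools import product

def nand(p, q):
    """The Sheffer stroke: true unless both arguments are true."""
    return not (p and q)

def axiom_holds():
    """Check ((p·q)·r)·(p·((p·r)·p)) == r for all 8 Boolean assignments,
    reading · as Nand."""
    for p, q, r in product([False, True], repeat=3):
        lhs = nand(nand(nand(p, q), r), nand(p, nand(nand(p, r), p)))
        if lhs != r:
            return False
    return True

print(axiom_holds())  # True
```

Of course, checking that the identity holds under the Nand interpretation is the easy direction; the hard part (the 2000 result) was showing that this one axiom, used as a rewrite rule, suffices to derive all of Boolean logic.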
The big question had in a sense been “What is mathematics ultimately made of?”. Well, now it was known that ordinary propositional logic could be built up from very simple elements. So what about the other things used in mathematics—like functions and predicates? Was there a simple way of building these up too?
People like Frege, Whitehead and Russell had all been concerned with constructing specific things—like sets or numbers—that would have immediate mathematical meaning. But Hilbert’s work in the late 1910s began to highlight the idea of looking instead at metamathematics and the “mechanism of mathematics”—and in effect at how the pure symbolic infrastructure of mathematics fits together (through proofs, etc.), independent of any immediate “external” mathematical meaning.
Much as Aristotle and subsequent logicians had used (propositional) logic to define a “symbolic structure” for arguments, independent of their subject matter, so too did Hilbert’s program imagine a general “symbolic structure” for mathematics, independent of particular mathematical subject matter.
And this is what finally set the stage for the invention of combinators.
We don’t know how long it took Moses Schönfinkel to come up with combinators. From what we know of his personal history, it could have been as long as a decade. But it could also have been as short as a few weeks.
There’s no advanced math or advanced logic involved in defining combinators. But to drill through the layers of technical detail of mathematical logic to realize that it’s even conceivable that everything can be defined in terms of them is a supreme achievement of a kind of abstract reductionism.
There is much we don’t know about Schönfinkel as a person. But the 11page paper he wrote on the basis of his December 7, 1920, talk in which he introduced combinators is extremely clear.
The paper is entitled “On the Building Blocks of Mathematical Logic” (in the original German, “Über die Bausteine der mathematischen Logik”.) In other words, its goal is to talk about “atoms” from which mathematical logic can be built. Schönfinkel explains that it’s “in the spirit of” Hilbert’s axiomatic method to build everything from as few notions as possible; then he says that what he wants to do is to “seek out those notions from which we shall best be able to construct all other notions of the branch of science in question”.
His first step is to explain that Hilbert, Whitehead, Russell and Frege all set up mathematical logic in terms of standard And, Or, Not, etc. connectives—but that Sheffer had recently been able to show that just a single connective (indicated by a stroke “|”—and what we would now call Nand) was sufficient:
But in addition to the “content” of these relations, I think Schönfinkel was trying to communicate by example something else: that all these logical connectives can ultimately be thought of just as examples of “abstract symbolic structures” with a certain “function of arguments” (i.e. f[x,y]) form.
The next couple of paragraphs talk about how the quantifiers “for all” (∀) and “there exists” (∃) can also be simplified in terms of the Sheffer stroke (i.e. Nand). But then comes the rallying cry: “The successes that we have encountered thus far… encourage us to attempt further progress.” And then he’s ready for the big idea—which he explains “at first glance certainly appears extremely bold”. He proposes to “eliminate by suitable reduction the remaining fundamental concepts of proposition, function and variable”.
He explains that this only makes sense for “arbitrary, logically general propositions”, or, as we’d say now, for purely symbolic constructs without specific meanings yet assigned. In other words, his goal is to create a general framework for operating on arbitrary symbolic expressions independent of their interpretation.
He explains that this is valuable both from a “methodological point of view” in achieving “the greatest possible conceptual uniformity”, but also from a certain philosophical or perhaps aesthetic point of view.
And in a sense what he was explaining—back in 1920—was something that’s been a core part of the computational language design that I’ve done for the past 40 years: that everything can be represented as a symbolic expression, and that there’s tremendous value to this kind of uniformity.
But as a “language designer” Schönfinkel was an ultimate minimalist. He wanted to get rid of as many notions as possible—and in particular he didn’t want variables, which he explained were “nothing but tokens that characterize certain argument places and operators as belonging together”; “mere auxiliary notions”.
Today we have all sorts of mathematical notation that’s at least somewhat “variable free” (think coordinate-free notation, category theory, etc.). But in 1920 mathematics as it was written was full of variables. And it needed a serious idea to see how to get rid of them. And that’s where Schönfinkel starts to go “even more symbolic”.
He explains that he’s going to make a kind of “functional calculus” (Funktionalkalkül). He says that normally functions just define a certain correspondence between the domain of their arguments, and the domain of their values. But he says he’s going to generalize that—and allow (“disembodied”) functions to appear as arguments and values of functions. In other words, he’s inventing what we’d now call higher-order functions, where functions can operate “symbolically” on other functions.
In the context of traditional calculus-and-algebra-style mathematics it’s a bizarre idea. But really it’s an idea about computation and computational structures—that’s more abstract and ultimately much more general than the mathematical objectives that inspired it.
But back to Schönfinkel’s paper. His next step is to explain that once functions can have other functions as arguments, functions only ever need to take a single argument. In modern (Wolfram Language) notation he says that you never need f[x,y]; you can always do everything with f[x][y].
In something of a sleight of hand, he sets up his notation so that fxyz (which might look like a function of three arguments f[x,y,z]) actually means (((fx)y)z) (i.e. f[x][y][z]). (In other words—somewhat confusingly with respect to modern standard functional notation—he takes function application to be left associative.)
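In modern terms, currying and left-associative application are easy to sketch in Python (my rendering, not Schönfinkel’s notation):

```python
# Currying: a two-argument function f(x, y) becomes f(x)(y) --
# a one-argument function that returns another one-argument function
def curry2(f):
    return lambda x: lambda y: f(x, y)

def subtract(x, y):
    return x - y

sub_c = curry2(subtract)
assert sub_c(10)(3) == 7      # f[x][y] in Wolfram Language notation

# Left-associative application: "f x y z" means (((f x) y) z)
def apply_chain(f, *args):
    for a in args:
        f = f(a)
    return f

assert apply_chain(sub_c, 10, 3) == 7
```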
Again, it’s a bizarre idea—though actually Frege had had a similar idea many years earlier (and now the idea is usually called currying, after Haskell Curry, who we’ll be talking about later). But with his “functional calculus” set up, and all functions needing to take only one argument, Schönfinkel is ready for his big result.
He’s effectively going to argue that by combining a small set of particular functions he can construct any possible symbolic function—or at least anything needed for predicate logic. He calls them a “sequence of particular functions of a very general nature”. Initially there are five of them: the identity function (Identitätsfunktion) I, the constancy function (Konstanzfunktion) C (which we now call K), the interchange function (Vertauschungsfunktion) T, the composition function (Zusammensetzungsfunktion) Z, and the fusion function (Verschmelzungsfunktion) S.
And then he’s off and running defining what we now call combinators. The definitions look simple and direct. But to get to them Schönfinkel effectively had to cut away all sorts of conceptual baggage that had come with the historical development of logic and mathematics.
Even talking about the identity combinator isn’t completely straightforward. Schönfinkel carefully explains that in I x = x, equality is direct symbolic or structural equality, or as he puts it “the equal sign is not to be taken to represent logical equivalence as it is ordinarily defined in the propositional calculus of logic but signifies that the expressions on the left and on the right mean the same thing, that is, that the function value Ix is always the same as the argument value x, whatever we may substitute for x.” He then adds parenthetically, “Thus, for instance, I I would be equal to I”. And, yes, to someone used to the mathematical idea that a function takes values like numbers, and gives back numbers, this is a bit mind-blowing.
Next he explains the constancy combinator, that he called C (even though the German word for it starts with K), and that we now call K. He says “let us assume that the argument value is again arbitrary without restriction, while, regardless of what this value is, the function value will always be the fixed value a”. And when he says “arbitrary” he really means it: it’s not just a number or something; it’s what we would now think of as any symbolic expression.
First he writes (C a)y = a, i.e. the value of the “constancy function C a operating on any y is a”, then he says to “let a be variable too”, and defines (C x)y = x or Cxy = x. Helpfully, almost as if he were writing computer documentation, he adds: “In practical applications C serves to permit the introduction of a quantity x as a ‘blind’ variable.”
Then he’s on to T. In modern notation the definition is T[f][x][y] = f[y][x] (i.e. T is essentially ReverseApplied). (He wrote the definition as (Tϕ)xy = ϕyx, explaining that the parentheses can be omitted.) He justifies the idea of T by saying that “The function T makes it possible to alter the order of the terms of an expression, and in this way it compensates to a certain extent for the lack of a commutative law.”
Next comes the composition combinator Z. He explains that “In [mathematical] analysis, as is well known, we speak loosely of a ‘function of a function’...”, by which he meant that it was pretty common then (and now) to write something like f(g(x)). But then he “went symbolic”—and defined a composition function that could symbolically act on any two functions f and g: Z[f][g][x] = f[g[x]]. He explains that Z allows one to “shift parentheses” in an expression: i.e. whatever the objects in an expression might be, Z allows one to transform [][][] to [[]] etc. But in case this might have seemed too abstract and symbolic, he then attempted to explain in a more “algebraic” way that the effect of Z is “somewhat like that of the associative law” (though, he added, the actual associative law is not satisfied).
Finally comes the pièce de résistance: the S combinator (that Schönfinkel calls the “fusion function”):
He doesn’t take too long to define it. He basically says: consider (fx)(gx) (i.e. f[x][g[x]]). This is really just “a function of x”. But what function? It’s not a composition of f and g; he calls it a “fusion”, and he defines the S combinator to create it: S[f][g][x] = f[x][g[x]].
It’s pretty clear Schönfinkel knew this kind of “symbolic gymnastics” would be hard for people to understand. He continues: “It will be advisable to make this function more intelligible by means of a practical example.” He says to take fxy (i.e. f[x][y]) to be log_{x}y (i.e. Log[x,y]), and gz (i.e. g[z]) to be 1 + z. Then Sfgx = (fx)(gx) = log_{x}(1 + x) (i.e. S[f][g][x]=f[x][g[x]]=Log[x,1+x]). And, OK, it’s not obvious why one would want to do that, and I’m not rushing to make S a builtin function in the Wolfram Language.
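Schönfinkel’s five basic functions—and his log example for S—can be sketched as curried Python functions (the rendering is mine, not the paper’s):

```python
import math

# Schönfinkel's five basic functions, as curried Python functions
I = lambda x: x                                  # identity (Identitätsfunktion)
C = lambda x: lambda y: x                        # constancy (modern K)
T = lambda f: lambda x: lambda y: f(y)(x)        # interchange
Z = lambda f: lambda g: lambda x: f(g(x))        # composition
S = lambda f: lambda g: lambda x: f(x)(g(x))     # fusion

assert I(42) == 42
assert C("a")("anything") == "a"

swap = lambda x: lambda y: (x, y)
assert T(swap)(1)(2) == (2, 1)                   # T[f][x][y] = f[y][x]

assert Z(lambda u: u * u)(lambda v: v + 1)(3) == 16   # f[g[x]] = (3+1)^2

# Schönfinkel's own example: fxy = log_x(y), gz = 1 + z,
# so Sfgx = f[x][g[x]] = log_x(1 + x)
f = lambda x: lambda y: math.log(y, x)
g = lambda z: 1 + z
assert S(f)(g)(2) == math.log(3, 2)              # log_2(1 + 2)
```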
But Schönfinkel explains that for him “the practical use of the function S will be to enable us to reduce the number of occurrences of a variable—and to some extent also of a particular function—from several to a single one”.
Setting up everything in terms of five basic objects I, C (now K), T, Z and S might already seem impressive and minimalist enough. But Schönfinkel realized that he could go even further:
First, he says that actually I = SCC (or, in modern notation, s[k][k]). In other words, s[k][k][x] for symbolic x is just equal to x (since s[k][k][x] becomes k[x][k[x]] by using the definition of S, and this becomes x by using the definition of C). He notes that this particular reduction was communicated to him by a certain Alfred Boskowitz (who we know to have been a student at the time); he says that Paul Bernays (who was more of a colleague) had “some time before” noted that I = (SC)(CC) (i.e. s[k][k[k]]). Today, of course, we can use a computer to just enumerate all possible combinator expressions of a particular size, and find what the smallest reduction is. But in Schönfinkel’s day, it would have been more like solving a puzzle by hand.
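Both reductions are easy to check with curried Python functions standing in for symbolic reduction (a sketch, assuming only the S and K rules):

```python
S = lambda x: lambda y: lambda z: x(z)(y(z))   # fusion
K = lambda x: lambda y: x                      # constancy (Schönfinkel's C)

I1 = S(K)(K)          # Boskowitz: I = SCC
assert I1(42) == 42
assert I1("any symbolic thing") == "any symbolic thing"

I2 = S(K)(K(K))       # Bernays: I = (SC)(CC)
assert I2(42) == 42
```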
Schönfinkel goes on, and proves that Z can also be reduced: Z = S(CS)C (i.e. s[k[s]][k]). And, yes, a very simple Wolfram Language program can verify in a few milliseconds that that is the simplest form.
OK, what about T? Schönfinkel gives 8 steps of reduction to prove that T = S(ZZS)(CC) (i.e. s[s[k[s]][k][s[k[s]][k]][s]][k[k]]). But is this the simplest possible form for T? Well, no. But (with the very straightforward 2-line Wolfram Language program I wrote) it did take my modern computer a number of minutes to determine what the simplest form is.
The answer is that it doesn’t have size 12, like Schönfinkel’s, but rather size 9. Actually, there are 6 cases of size 9 that all work: s[s[k[s]][s[k[k]][s]]][k[k]] (S(S(KS)(S(KK)S))(KK)) and five others. And, yes, it takes a few steps of reduction to prove that they work (the other size-9 cases S(SSK(K(SS(KK))))S, S(S(K(S(KS)K))S)(KK), S(K(S(S(KS)K)(KK)))S, S(K(SS(KK)))(S(KK)S), S(K(S(K(SS(KK)))K))S all have more complicated reductions):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"];
CombinatorEvolutionPlot[
 CombinatorFixedPointList[s[s[k[s]][s[k[k]][s]]][k[k]][f][g][x]],
 "StatesDisplay"]
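Schönfinkel’s reduction of Z and the size-9 form of T can both be verified directly with curried Python functions (a sketch in place of the Wolfram Language programs mentioned above):

```python
S = lambda x: lambda y: lambda z: x(z)(y(z))   # fusion
K = lambda x: lambda y: x                      # constancy (Schönfinkel's C)

# Schönfinkel's Z = S(CS)C, i.e. s[k[s]][k]
Z = S(K(S))(K)
f = lambda u: u * 10
g = lambda v: v + 1
assert Z(f)(g)(4) == 50            # f[g[x]] = (4 + 1) * 10

# A size-9 equivalent of T: s[s[k[s]][s[k[k]][s]]][k[k]]
T9 = S(S(K(S))(S(K(K))(S)))(K(K))
h = lambda x: lambda y: (x, y)
assert T9(h)(1)(2) == (2, 1)       # T[f][x][y] = f[y][x]
```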
But, OK, what did Schönfinkel want to do with these objects he’d constructed? As the title of his paper suggests, he wanted to use them as building blocks for mathematical logic. He begins: “Let us now apply our results to a special case, that of the calculus of logic in which the basic elements are individuals and the functions are propositional functions.” I consider this sentence significant. Schönfinkel didn’t have a way to express it (the concept of universal computation hadn’t been invented yet), but he seems to have realized that what he’d done was quite general, and went even beyond being able to represent a particular kind of logic.
Still, he went on to give his example. He’d explained at the beginning of the paper that the quantifiers we now call ∀ and ∃ could both be represented in terms of a kind of “quantified Nand” that he wrote:
But now he wanted to “combinatorify” everything. So he introduced a new combinator U, and defined it to represent his “quantified Nand”: Ufg = fx |ˣ gx (he called U the “incompatibility function”—an interesting linguistic description of Nand):
“It is a remarkable fact”, he says, “that every formula of logic can now be expressed by means... solely of C, S and U.” So he’s saying that any expression from mathematical logic can be written out as some combinator expression in terms of S, C (now K) and U. He says that when there are quantifiers like “for all x...” it’s always possible to use combinators to get rid of the “bound variables” x, etc. He says that he “will not give the complete demonstration here”, but rather content himself with an example. (Unfortunately—for reasons of the trajectory of his life that are still quite unclear—he never published his “complete demonstration”.)
But, OK, so what had he achieved? He’d basically shown that any expression that might appear in predicate logic (with logical connectives, quantifiers, variables, etc.) could be reduced to an expression purely in terms of the combinators S, C (now K) and U.
Did he need the U? Not really. But he had to have some way to represent the thing with mathematical or logical “meaning” on which his combinators would be acting. Today the obvious thing to do would be to have a representation for true and false. And what’s more, to represent these purely in terms of combinators. For example, if we took K to represent true, and SK (s[k]) to represent false, then And can be represented as SSK (s[s][k]), Or as S(SS)S(SK) (s[s[s]][s][s[k]]) and Nand as S(S(S(SS(K(K(KK)))))(KS)) (s[s[s[s[s][k[k[k[k]]]]]][k[s]]]). Schönfinkel got amazingly far in reducing everything to his “building blocks”. But, yes, he missed this final step.
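These boolean encodings can be checked mechanically: with true = K and false = SK, s[s][k][p][q] reduces to p[q][p] (“if p then q else p”, which is And), and s[s[s]][s][s[k]][p][q] reduces to p[p][q] (which is Or). Here’s a small symbolic reducer in Python (my sketch, not anything from the paper):

```python
def step(e):
    """One leftmost-outermost reduction step with the S and K rules.

    Expressions are nested pairs: ('s', x) is s[x], (('s', x), y) is s[x][y].
    Returns None when e is already in normal form."""
    if isinstance(e, str):
        return None
    # unwind the left spine: (((h, a1), a2), a3) -> h and [a1, a2, a3]
    h, args = e, []
    while isinstance(h, tuple):
        h, a = h
        args.insert(0, a)
    if h == 's' and len(args) >= 3:
        new, rest = ((args[0], args[2]), (args[1], args[2])), args[3:]
    elif h == 'k' and len(args) >= 2:
        new, rest = args[0], args[2:]
    else:
        for i, a in enumerate(args):
            r = step(a)
            if r is not None:
                args[i] = r
                new, rest = h, args
                break
        else:
            return None
    for a in rest:
        new = (new, a)
    return new

def nf(e, limit=500):
    for _ in range(limit):
        r = step(e)
        if r is None:
            return e
        e = r
    raise RuntimeError("no normal form found")

def app(f, *xs):
    # left-associative application: app(f, x, y) = f[x][y]
    for x in xs:
        f = (f, x)
    return f

TRUE = 'k'                                       # K
FALSE = ('s', 'k')                               # SK
AND = app('s', 's', 'k')                         # SSK
OR = app('s', ('s', 's'), 's', ('s', 'k'))       # S(SS)S(SK)

for p in (TRUE, FALSE):
    for q in (TRUE, FALSE):
        assert nf(app(AND, p, q)) == (TRUE if (p, q) == (TRUE, TRUE) else FALSE)
        assert nf(app(OR, p, q)) == (FALSE if (p, q) == (FALSE, FALSE) else TRUE)
```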
But given that he’d managed to reduce everything to S, C and U he figured he should try to go further. So he considered an object J that would be a single building block of S and C: JJ = S and J(JJ) = C.
With S and K one can just point to any piece of an expression and see if it reduces. With J it’s a bit more complicated. In modern Wolfram Language terms one can state the rules as {j[j][x_][y_][z_]→x[z][y[z]], j[j[j]][x_][y_]→x} (where order matters) but to apply these requires pattern matching “clusters of J’s” rather than just looking at single S’s and K’s at a time.
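As a sketch, the two J rules can be implemented directly (a Python rendering of the pattern rules above; the j[j[j]] rule is checked first, since order matters), confirming that JJ acts like S and J(JJ) like C (i.e. K):

```python
def step(e):
    """One reduction step using Schönfinkel's two J rules; None if no redex.

    Expressions are nested pairs, with 'j' as the only basic symbol."""
    if isinstance(e, str):
        return None
    h, args = e, []
    while isinstance(h, tuple):
        h, a = h
        args.insert(0, a)
    if h == 'j' and args[:1] == [('j', 'j')] and len(args) >= 3:
        # j[j[j]][x][y] -> x
        new, rest = args[1], args[3:]
    elif h == 'j' and args[:1] == ['j'] and len(args) >= 4:
        # j[j][x][y][z] -> x[z][y[z]]
        new, rest = ((args[1], args[3]), (args[2], args[3])), args[4:]
    else:
        for i, a in enumerate(args):
            r = step(a)
            if r is not None:
                args[i] = r
                new, rest = h, args
                break
        else:
            return None
    for a in rest:
        new = (new, a)
    return new

def nf(e, limit=500):
    for _ in range(limit):
        r = step(e)
        if r is None:
            return e
        e = r
    raise RuntimeError("no normal form found")

def app(f, *xs):
    for x in xs:
        f = (f, x)
    return f

JJ = ('j', 'j')           # should act like S
JJJ = ('j', ('j', 'j'))   # J(JJ), should act like C (i.e. K)

assert nf(app(JJ, 'f', 'g', 'x')) == app(('f', 'x'), ('g', 'x'))  # f[x][g[x]]
assert nf(app(JJJ, 'x', 'y')) == 'x'
```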
But even though—as Schönfinkel observed—this “final reduction” to J didn’t work out, getting everything down to S and K was already amazing. At the beginning of the paper, Schönfinkel had described his objectives. And then he says “It seems to me remarkable in the extreme that the goal we have just set can be realized also; as it happens, it can be done by a reduction to three fundamental signs.” (The paper does say three fundamental signs, presumably counting U as well as S and K.)
I’m sure Schönfinkel expected that to reproduce all the richness of mathematical logic he’d need quite an elaborate set of building blocks. And certainly people like Frege, Whitehead and Russell had used what were eventually very complicated setups. Schönfinkel managed to cut through all the complexity to show that simple building blocks were all that were needed. But then he found something else: that actually just two building blocks (S and K) were enough.
In modern terms, we’d say that Schönfinkel managed to construct a system capable of universal computation. And that’s amazing in itself. But even more amazing is that he found he could do it with such a simple setup.
I’m sure Schönfinkel was extremely surprised. And here I personally feel a certain commonality with him. Because in my own explorations of the computational universe, what I’ve found over and over again is that it takes only remarkably simple systems to be capable of highly complex behavior—and of universal computation. And even after exploring the computational universe for four decades, I’m still continually surprised at just how simple the systems can be.
For me, this has turned into a general principle—the Principle of Computational Equivalence—and a whole conceptual framework around it. Schönfinkel didn’t have anything like that to think in terms of. But he was in a sense a good enough scientist that he still managed to discover what he discovered—something that, many decades later, we can see fits in as another piece of evidence for the Principle of Computational Equivalence.
Looking at Schönfinkel’s paper a century later, it’s remarkable not only for what it discovers, but also for the clarity and simplicity with which it is presented. A little of the notation is now dated (and of course the original paper is written in German, which is no longer the kind of leading language of scholarship it once was). But for the most part, the paper still seems perfectly modern. Except, of course, that now it could be couched in terms of symbolic expressions and computation, rather than mathematical logic.
Combinators are hard to understand, and it’s not clear how many people understood them when they were first introduced—let alone understood their implications. It’s not a good sign that when Schönfinkel’s paper appeared in 1924 the person who helped prepare it for final publication (Heinrich Behmann) added three paragraphs of his own at the end that were quite confused. And Schönfinkel’s sole other published paper—coauthored with Paul Bernays in 1927—didn’t even mention combinators, even though they could have very profitably been used to discuss the subject at hand (decision problems in mathematical logic).
But in 1927 combinators (if not perhaps Schönfinkel’s recognition for them) had a remarkable piece of good fortune. Schönfinkel’s paper was discovered by a certain Haskell Curry—who would then devote more than 50 years to studying what he named “combinators”, and to spreading the word about them.
At some level I think one can view the main thrust of what Curry and his disciples did with combinators as an effort to “mathematicize” them. Schönfinkel had presented combinators in a rather straightforward “structural” way. But what was the mathematical interpretation of what he did, and of how combinators work in general? What mathematical formalism could capture Schönfinkel’s structural idea of substitution? Just what, for example, was the true notion of equality for combinators?
In the end, combinators are fundamentally computational constructs, full of all the phenomena of “unbridled computation”—like undecidability and computational irreducibility. And it’s inevitable that mathematics as normally conceived can only go so far in “cracking” them.
But back in the 1920s and 1930s the concept and power of computation was not yet understood, and it was assumed that the ideas and tools of mathematics would be the ones to use in analyzing a formal system like combinators. And it wasn’t that mathematical methods got absolutely nowhere with combinators.
Unlike cellular automata, or even Turing machines, there’s a certain immediate structural complexity to combinators, with their elaborate tree structures, equivalences and so on. And so there was progress to be made—and years of work to be done—in untangling this, without having to face the raw features of fullscale computation, like computational irreducibility.
In the end, combinators are full of computational irreducibility. But they also have layers of computational reducibility, some of which are aligned with the kinds of things mathematics and mathematical logic have been set up to handle. And in this there’s a curious resonance with our recent Physics Project.
In our models based on hypergraph rewriting there’s also a kind of bedrock of computational irreducibility. But as with combinators, there’s a certain immediate structural complexity to what our models do. And there are layers of computational reducibility associated with this. But the remarkable thing with our models is that some of those layers—and the formalisms one can build to understand them—have an immediate interpretation: they are basically the core theories of twentiethcentury physics, namely general relativity and quantum mechanics.
Combinators work sufficiently differently that they don’t immediately align with that kind of interpretation. But it’s still true that one of the important properties discovered in combinators (namely confluence, related to our idea of causal invariance) turns out to be crucial to our models, their correspondence with physics, and in the end our whole ability to perceive regularity in the universe, even in the face of computational irreducibility.
But let’s get back to the story of combinators as it played out after Schönfinkel’s paper. Schönfinkel had basically set things up in a novel, very direct, structural way. But Curry wanted to connect with more traditional ideas in mathematical logic, and mathematics in general. And after a first paper (published in 1929) which pretty much just recorded his first thoughts, and his efforts to understand what Schönfinkel had done, Curry was by 1930 starting to do things like formulate axioms for combinators, and hoping to prove general theorems about mathematical properties like equality.
Without the understanding of universal computation and their relationship to it, it wasn’t clear yet how complicated it might ultimately be to deal with combinators. And Curry pushed forward, publishing more papers and trying to do things like define set theory using his axioms for combinators. But in 1934 disaster struck. It wasn’t something about computation or undecidability; instead it was that Stephen Kleene and J. Barkley Rosser showed the axioms Curry had come up with to try and “tighten up Schönfinkel” were just plain inconsistent.
To Kleene and Rosser it provided more evidence of the need for Russell’s (originally quite hacky) idea of types—and led them to more complicated axiom systems, and away from combinators. But Curry was undeterred. He revised his axiom system and continued—ultimately for many decades—to see what could be proved about combinators and things like them using mathematical methods.
But already at the beginning of the 1930s there were bigger things afoot around mathematical logic—which would soon intersect with combinators.
How should one represent the fundamental constructs of mathematics? Back in the 1920s nobody thought seriously about using combinators. And instead there were basically three “big brands”: Principia Mathematica, set theory and Hilbert’s program. Relations were being found, details were being filled in, and issues were being found. But there was a general sense that progress was being made.
Quite where the boundaries might lie wasn’t clear. For example, could one specify a way to “construct any function” from lower-level primitives? The basic idea of recursion was very old (think: Fibonacci). But by the early 1920s there was a fairly well-formalized notion of “primitive recursion” in which functions always found their values from earlier values. But could all “mathematical” functions be constructed this way?
By 1926 it was known that this wouldn’t work: the Ackermann function was a reasonable “mathematical” function, but it wasn’t primitive recursive. It meant that definitions had to be generalized (e.g. to “general recursive functions” that didn’t just look back at earlier values, but could “look forward until...” as well). But there didn’t seem to be any fundamental problem with the idea that mathematics could just “mechanistically” be built out forever from appropriate primitives.
But in 1931 came Gödel’s theorem. There’d been a long tradition of identifying paradoxes and inconsistencies, and finding ways to patch them by changing axioms. But Gödel’s theorem was based on Peano’s by-then-standard axioms for arithmetic (branded by Gödel as a fragment of Principia Mathematica). And it showed there was a fundamental problem.
In essence, Gödel took the paradoxical statement “this statement is unprovable” and showed that it could be expressed purely as a statement of arithmetic—roughly a statement about the existence of solutions to appropriate integer equations. And basically what Gödel had to do to achieve this was to create a “compiler” capable of compiling things like “this statement is unprovable” into arithmetic.
In his paper one can basically see him building up different capabilities (e.g. representing arbitrary expressions as numbers through Gödel numbering, checking conditions using general recursion, etc.)—eventually getting to a “high enough level” to represent the statement he wanted:
What did Gödel’s theorem mean? For the foundations of mathematics it meant that the idea of mechanically proving “all true theorems of mathematics” wasn’t going to work. Because it showed that there was at least one statement that by its own admission couldn’t be proved, but was still a “statement about arithmetic”, in the sense that it could be “compiled into arithmetic”.
That was a big deal for the foundations of mathematics. But actually there was something much more significant about Gödel’s theorem, even though it wasn’t recognized at the time. Gödel had used the primitives of number theory and logic to build what amounted to a computational system—in which one could take things like “this statement is unprovable”, and “run them in arithmetic”.
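The kind of “arithmetization” Gödel used can be sketched with the classic prime-exponent trick (a toy Python version; Gödel’s actual encoding is considerably more elaborate). A sequence of symbol codes (each at least 1) becomes a single number, from which the sequence can always be uniquely recovered by factoring:

```python
def first_primes(n):
    # the first n primes, by trial division
    ps, c = [], 2
    while len(ps) < n:
        if all(c % p != 0 for p in ps):
            ps.append(c)
        c += 1
    return ps

def godel_number(codes):
    # [c1, c2, c3, ...] -> 2^c1 * 3^c2 * 5^c3 * ...
    num = 1
    for p, c in zip(first_primes(len(codes)), codes):
        num *= p ** c
    return num

def godel_decode(num):
    # recover the exponent of each successive prime by factoring
    codes, p, ps = [], 2, []
    while num > 1:
        e = 0
        while num % p == 0:
            num //= p
            e += 1
        codes.append(e)
        ps.append(p)
        p += 1
        while any(p % q == 0 for q in ps):
            p += 1
    return codes

assert godel_number([1, 3, 2]) == 2 * 3**3 * 5**2   # 1350
assert godel_decode(1350) == [1, 3, 2]
```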
What Gödel had, though, wasn’t exactly a streamlined general system (after all, it only really needed to handle one statement). But the immediate question then was: if there’s a problem with this statement in arithmetic, what about Hilbert’s general “decision problem” (Entscheidungsproblem) for any axiom system?
To discuss the “general decision problem”, though, one needed some kind of general notion of how one could decide things. What ultimate primitives should one use? Schönfinkel (with Paul Bernays)—in his sole other published paper—wrote about a restricted case of the decision problem in 1927, but doesn’t seem to have had the idea of using combinators to study it.
By 1934 Gödel was talking about general recursiveness (i.e. definability through general recursion). And Alonzo Church and Stephen Kleene were introducing λ definability. Then in 1936 Alan Turing introduced Turing machines. All these approaches involved setting up certain primitives, then showing that a large class of things could be “compiled” to those primitives. And that—in effect by thinking about having it compile itself—Hilbert’s Entscheidungsproblem couldn’t be solved.
Perhaps no single result along these lines would have been so significant. But it was soon established that all three kinds of systems were exactly equivalent: the set of computations they could represent were the same, as established by showing that one system could emulate another. And from that discovery eventually emerged the modern notion of universal computation—and all its implications for technology and science.
In the early days, though, there was actually a fourth equivalent kind of system—based on string rewriting—that had been invented by Emil Post in 1920–1. Oh, and then there were combinators.
What was the right “language” to use for setting up mathematical logic? There’d been gradual improvement since the complexities of Principia Mathematica. But around 1930 Alonzo Church wanted a new and cleaner setup. And he needed to have a way (as Frege and Principia Mathematica had done before him) to represent “pure functions”. And that’s how he came to invent λ.
Today in the Wolfram Language we have Function[x,f[x]] or x ↦ f[x] (or various shorthands). Church originally had λx[M]:
But what’s perhaps most notable is that on the very first page he defines λ, he’s referencing Schönfinkel’s combinator paper. (Well, specifically, he’s referencing it because he wants to use the device Schönfinkel invented that we now call currying—f[x][y] in place of f[x,y]—though ironically he doesn’t mention Curry.) In his 1932 paper (apparently based on work in 1928–9) λ is almost a sideshow—the main event being the introduction of 37 formal postulates for mathematical logic:
By the next year J. Barkley Rosser is trying to retool Curry’s “combinatory logic” with combinators of his own—and showing how they correspond to lambda expressions:
Then in 1935 lambda calculus has its big “coming out” in Church’s “An Unsolvable Problem of Elementary Number Theory”, in which he introduces the idea that any “effectively calculable” function should be “λ definable”, then defines integers in terms of λ’s (“Church numerals”)
and then shows that the problem of determining equivalence for λ expressions is undecidable.
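Church numerals are easy to render as Python lambdas (a sketch): the numeral n is “apply a function n times”, and the successor is Church’s own S[n][f][x] = f[n[f][x]]:

```python
# Church numerals as Python lambdas: the numeral n applies f to x, n times
zero = lambda f: lambda x: x                      # λf. λx. x
succ = lambda n: lambda f: lambda x: f(n(f)(x))   # Church's S[n][f][x] = f[n[f][x]]

def to_int(n):
    # count how many times the numeral applies its function
    return n(lambda k: k + 1)(0)

one = succ(zero)
three = succ(succ(one))
assert to_int(zero) == 0
assert to_int(three) == 3

# addition: applying succ m times to n
plus = lambda m: lambda n: m(succ)(n)
assert to_int(plus(three)(one)) == 4
```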
Very soon thereafter Turing publishes his “On Computable Numbers, with an Application to the Entscheidungsproblem” in which he introduces his much more manifestly mechanistic Turing machine model of computation. In the main part of the paper there are no lambdas—or combinators—to be seen. But by late 1936 Turing had gone to Princeton to be a student with Church—and added a note showing the correspondence between his Turing machines and Church’s lambda calculus.
By the next year, when Turing is writing his rather abstruse “Systems of Logic Based on Ordinals” he’s using lambda calculus all over the place. Early in the document he writes I → λx[x], and soon he’s mixing lambdas and combinators with wild abandon—and in fact he’d already published a one-page paper which introduced the fixed-point combinator Θ (and, yes, the K in the title refers to Schönfinkel’s K combinator):
When Church summarized the state of lambda calculus in 1941 in his “The Calculi of Lambda-Conversion” he again made extensive use of combinators. Schönfinkel’s K is prominent. But Schönfinkel’s S is nowhere to be seen—and in fact Church has his own S combinator S[n][f][x]→f[n[f][x]] which implements successors in Church’s numeral system. And he also has a few other “basic combinators” that he routinely uses.
In the end, combinators and lambda calculus are completely equivalent, and it’s quite easy to convert between them—but there’s a curious tradeoff. In lambda calculus one names variables, which is good for human readability, but can lead to problems at a formal level. In combinators, things are formally much cleaner, but the expressions one gets can be completely incomprehensible to humans.
The point is that in a lambda expression like λx λy x[y] one’s naming the variables (here x and y), but really these names are just placeholders: what they are doesn’t matter; they’re just showing where different arguments go. And in a simple case like this, everything is fine. But what happens if one substitutes for y another lambda expression, say λx f[x]? What is that x? Is it the same x as the one outside, or something different? In practice, there are all sorts of renaming schemes that can be used, but they tend to be quite hacky, and things can quickly get tangled up. And if one wants to make formal proofs about lambda calculus, this can potentially be a big problem, and indeed at the beginning it wasn’t clear it wouldn’t derail the whole idea of lambda calculus.
And that’s part of why the correspondence between lambda calculus and combinators was important. With combinators there are no variables, and so no variable names to get tangled up. So if one can show that something can be converted to combinators—even if one never looks at the potentially very long and ugly combinator expression that’s generated—one knows one’s safe from issues about variable names.
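The lambda-to-combinator conversion itself can be sketched with the classic “bracket abstraction” algorithm (a minimal Python rendering of the standard textbook algorithm; practical versions add extra rules to keep the output from blowing up):

```python
# λ-terms: variables are strings, application is a pair (f, a),
# and a lambda is a triple ('lam', var, body)
def to_sk(e):
    if isinstance(e, str):
        return e
    if e[0] == 'lam':
        _, x, body = e
        return abstract(x, to_sk(body))     # convert the body first
    return (to_sk(e[0]), to_sk(e[1]))

def abstract(x, e):
    # [x]e: an S/K expression that, applied to a, behaves like e with x := a
    # (e is already lambda-free here, so it is a string or a pair)
    if e == x:
        return (('s', 'k'), 'k')            # I = SKK
    if not free_in(x, e):
        return ('k', e)
    return (('s', abstract(x, e[0])), abstract(x, e[1]))

def free_in(x, e):
    if isinstance(e, str):
        return e == x
    return free_in(x, e[0]) or free_in(x, e[1])

# a small S/K reducer to check the results
def step(e):
    if isinstance(e, str):
        return None
    h, args = e, []
    while isinstance(h, tuple):
        h, a = h
        args.insert(0, a)
    if h == 's' and len(args) >= 3:
        new, rest = ((args[0], args[2]), (args[1], args[2])), args[3:]
    elif h == 'k' and len(args) >= 2:
        new, rest = args[0], args[2:]
    else:
        for i, a in enumerate(args):
            r = step(a)
            if r is not None:
                args[i] = r
                new, rest = h, args
                break
        else:
            return None
    for a in rest:
        new = (new, a)
    return new

def nf(e, limit=1000):
    for _ in range(limit):
        r = step(e)
        if r is None:
            return e
        e = r
    raise RuntimeError("no normal form found")

# λx.λy.x should behave like K, and λx.x like I -- even though the
# generated combinator expressions never mention the variable names
k_like = to_sk(('lam', 'x', ('lam', 'y', 'x')))
assert nf(((k_like, 'a'), 'b')) == 'a'

i_like = to_sk(('lam', 'x', 'x'))
assert nf((i_like, 'a')) == 'a'
```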
There are still plenty of other complicated issues, though. Prominent among them are questions about when combinator expressions can be considered equal. Let’s say you have a combinator expression, like s[s[s[s][k]]][k]. Well, you can repeatedly apply the rules for combinators to transform and reduce it. And it’ll often end up at a fixed point, where no rules apply anymore. But a basic question is whether it matters in which order the rules are applied. And in 1936 Church and Rosser proved it doesn’t.
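One can see this concretely with a small experiment (a Python sketch): reduce the same combinator expression with an outermost-first strategy and an innermost-first strategy, and check that the normal forms agree.

```python
def spine(e):
    # unwind the left spine: (((h, a1), a2), a3) -> h and [a1, a2, a3]
    h, args = e, []
    while isinstance(h, tuple):
        h, a = h
        args.insert(0, a)
    return h, args

def contract(e):
    # if the head of e is an S or K redex, contract it; else None
    h, args = spine(e)
    if h == 's' and len(args) >= 3:
        new, rest = ((args[0], args[2]), (args[1], args[2])), args[3:]
    elif h == 'k' and len(args) >= 2:
        new, rest = args[0], args[2:]
    else:
        return None
    for a in rest:
        new = (new, a)
    return new

def step(e, innermost):
    if isinstance(e, str):
        return None
    if not innermost:
        r = contract(e)          # outermost: try the head redex first
        if r is not None:
            return r
    h, args = spine(e)
    for i, a in enumerate(args):
        r = step(a, innermost)
        if r is not None:
            args[i] = r
            new = h
            for b in args:
                new = (new, b)
            return new
    if innermost:
        return contract(e)       # innermost: head redex only as a last resort
    return None

def nf(e, innermost, limit=500):
    for _ in range(limit):
        r = step(e, innermost)
        if r is None:
            return e
        e = r
    raise RuntimeError("no normal form found")

SKK = (('s', 'k'), 'k')              # the identity, I
expr = (SKK, (SKK, (SKK, 'x')))      # nested redexes: I[I[I[x]]]

assert nf(expr, innermost=False) == nf(expr, innermost=True) == 'x'
```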
Actually, what they specifically proved was the analogous result for lambda calculus. They drew a picture to indicate different possible orders in which lambdas could be reduced out, and showed it didn’t matter which path one takes:
This all might seem like a detail. But it turns out that generalizations of their result apply to all sorts of systems. In doing computations (or automatically proving theorems) it’s all about “it doesn’t matter what path you take; you’ll always get the same result”. And that’s important. But recently there’s been another important application that’s shown up. It turns out that a generalization of the “Church–Rosser property” is what we call causal invariance in our Physics Project.
And it’s causal invariance that leads in our models to relativistic invariance, general covariance, objective reality in quantum mechanics, and other central features of physics.
In retrospect, one of the great achievements of the 1930s was the inception of what ended up being the idea of universal computation. But at the time what was done was couched in terms of mathematical logic, and it was far from obvious that any of the theoretical structures being built would have any real application beyond thinking about the foundations of mathematics. But even as people like Hilbert were talking in theoretical terms about the mechanization of mathematics, more and more there were actual machines being built for doing mathematical calculations.
We know that even in antiquity (at least one) simple gear-based mechanical calculational devices existed. In the mid-1600s arithmetic calculators started being constructed, and by the late 1800s they were in widespread use. At first they were mechanical, but by the 1930s most were electromechanical, and there started to be systems where units for carrying out different arithmetic operations could be chained together. And by the end of the 1940s fairly elaborate such systems based on electronics were being built.
Already in the 1830s Charles Babbage had imagined an “analytical engine” which could do different operations depending on a “program” specified by punch cards—and Ada Lovelace had realized that such a machine had broad “computational” potential. But by the 1930s a century had passed and nothing like this was connected to the theoretical developments that were going on—and the actual engineering of computational systems was done without any particular overarching theoretical framework.
Still, as electronic devices got more complicated and scientific interest in psychology intensified, something else happened: there started to be the idea (sometimes associated with the name cybernetics) that somehow electronics might reproduce how things like brains work. In the mid-1930s Claude Shannon had shown that Boolean algebra could represent how switching circuits work, and in 1943 Warren McCulloch and Walter Pitts proposed a model of idealized neural networks formulated in something close to mathematical logic terms.
Meanwhile by the mid-1940s John von Neumann—who had worked extensively on mathematical logic—had started suggesting math-like specifications for practical electronic computers, including the way their programs might be stored electronically. At first he made lots of brain-like references to “organs” and “inhibitory connections”, and essentially no mention of ideas from mathematical logic. But by the end of the 1940s von Neumann was talking at least conceptually about connections to Gödel’s theorem and Turing machines, Alan Turing had become involved with actual electronic computers, and there was the beginning of widespread understanding of the notion of general-purpose computers and universal computation.
In the 1950s there was an explosion of interest in what would now be called the theory of computation—and great optimism about its relevance to artificial intelligence. There was all sorts of “interdisciplinary work” on fairly “concrete” models of computation, like finite automata, Turing machines, cellular automata and idealized neural networks. More “abstract” approaches, like recursive functions, lambda calculus—and combinators—remained, however, pretty much restricted to researchers in mathematical logic.
When early programming languages started to appear in the latter part of the 1950s, thinking about practical computers began to become a bit more abstract. It was understood that the grammars of languages could be specified recursively—and actual recursion (of functions being able to call themselves) just snuck into the specification of ALGOL 60. But what about the structures on which programs operated? Most of the concentration was on arrays (sometimes rather elegantly, as in APL) and, occasionally, character strings.
But a notable exception was LISP, described in John McCarthy’s 1960 paper “Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I” (part 2 was not written). There was lots of optimism about AI at the time, and the idea was to create a language to “implement AI”—and do things like “mechanical theorem proving”. A key idea—that McCarthy described as being based on “recursive function formalism”—was to have tree-structured symbolic expressions (“S expressions”). (In the original paper, what’s now Wolfram Language–style f[g[x]] “M expression” notation, complete with square brackets, was used as part of the specification, but the quintessentially LISP-like (f (g x)) notation won out when LISP was actually implemented.)
An issue in LISP was how to take “expressions” (which were viewed as representing things) and turn them into functions (which do things). And the basic plan was to use Church’s idea of λ notation. But when it came time to implement this, there was, of course, trouble with name collisions, which ended up getting handled in quite hacky ways. So did McCarthy know about combinators? The answer is yes, as his 1960 paper shows:
I actually didn’t know until just now that McCarthy had ever even considered combinators, and in the years I knew him I don’t think I ever personally talked to him about them. But it seems that for McCarthy—as for Church—combinators were a kind of “comforting backstop” that ensured that it was OK to use lambdas, and that if things went too badly wrong with variable naming, there was at least in principle always a way to untangle everything.
In the practical development of computers and computer languages, even lambdas—let alone combinators—weren’t really much heard from again (except in a small AI circle) until the 1980s. And even then it didn’t help that in an effort variously to stay close to hardware and to structure programs there tended to be a desire to give everything a “data type”—which was at odds with the “consume any expression” approach of standard combinators and lambdas. But beginning in the 1980s—particularly with the progressive rise of functional programming—lambdas, at least, have steadily gained in visibility and practical application.
What of combinators? Occasionally as a proof of principle there’ll be a hardware system developed that natively implements Schönfinkel’s combinators. Or—particularly in modern times—there’ll be an esoteric language that uses combinators in some kind of purposeful effort at obfuscation. Still, a remarkable cross-section of notable people concerned with the foundations of computing have—at one time or another—taught about combinators or written a paper about them. And in recent years the term “combinator” has become more popular as a way to describe a “purely applicative” function.
But by and large the important ideas that first arose with combinators ended up being absorbed into practical computing by quite circuitous routes, without direct reference to their origins, or to the specific structure of combinators.
For 100 years combinators have mostly been an obscure academic topic, studied particularly in connection with lambda calculus, at the borders between theoretical computer science, mathematical logic and, to some extent, mathematical formalisms like category theory. Much of the work that’s been done can be traced in one way or another to the influence of Haskell Curry or Alonzo Church—particularly through their students, grand-students, great-grand-students, etc. In the early years, most of the work was centered in the US, but by the 1960s there was a strong migration to Europe and especially the Netherlands.
But even with all their abstractness and obscurity, on a few rare occasions combinators have broken into something closer to the mainstream. One such time was with the popular logic-puzzle book To Mock a Mockingbird, published in 1985 by Raymond Smullyan—a former student of Alonzo Church’s. It begins: “A certain enchanted forest is inhabited by talking birds” and goes on to tell a story that’s basically about combinators “dressed up” as birds calling each other (S is the “starling”, K the “kestrel”)—with a convenient “bird who’s who” at the end. The book is dedicated “To the memory of Haskell Curry—an early pioneer in combinatory logic and an avid birdwatcher”.
And then there’s Y Combinator. The original Y combinator arose out of work that Curry did in the 1930s on the consistency of axiom systems for combinators, and it appeared explicitly in his 1958 classic book:
He called it the “paradoxical combinator” because it was recursively defined in a kind of self-referential way analogous to various paradoxes. Its explicit form is SSK(S(K(SS(S(SSK))))K) and its most immediately notable feature is that under Schönfinkel’s combinator transformation rules it never settles down to a particular “value” but just keeps growing forever.
Well, in 2005 Paul Graham—who had long been an enthusiast of functional programming and LISP—decided to name his new (and now very famous) startup accelerator “Y Combinator”. I remember asking him why he’d called it that. “Because,” he said, “nobody understands the Y combinator”.
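The fixed-point idea behind Y is actually easy to demonstrate in a modern language. Here is a sketch in Python (used purely for illustration): since Python evaluates arguments eagerly, one uses the eta-expanded variant usually called the Z combinator, which delivers recursion without any function ever referring to itself by name:

```python
# Z combinator: the strict-evaluation variant of Curry's Y combinator.
# It computes fixed points of functionals, giving recursion with no
# self-reference by name. (Python used purely as an illustration.)
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# A non-recursive "one step" of factorial; Z ties the knot:
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))

assert fact(0) == 1
assert fact(5) == 120
```

The point is the same one Curry was making: recursion is not a primitive that has to be built into a system; it can be constructed purely from application.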
Looking in my own archives from that time I find an email I sent to a combinator enthusiast who was working with me:
Followed by, basically, “Yes our theorem prover can prove the basic property of the Y combinator” (V6 sounds so ancient; we’re now just about to release V12.2):
I had another unexpected encounter with combinators last year. I had been given a book that was once owned by Alan Turing, and in it I found a piece of paper—that I recognized as being covered with none other than lambdas and combinators (but that’s not the Y combinator):
It took quite a bit of sleuthing (that I wrote extensively about)—but I eventually discovered that the piece of paper was written by Turing’s student Robin Gandy. But I never figured out why he was doing combinators....
I think I first found out about combinators around 1979 by seeing Schönfinkel’s original paper in a book called From Frege to Gödel: A Source Book in Mathematical Logic (by a certain Jean van Heijenoort). How Schönfinkel’s paper ended up being in that book is an interesting question, which I’ll write about elsewhere. The spine of my copy of the book has long been broken at the location of Schönfinkel’s paper, and at different times I’ve come back to the paper, always thinking there was more to understand about it.
But why was I even studying things like this back in 1979? I guess in retrospect I can say I was engaged in an activity that goes back to Frege or even Leibniz: I was trying to find a fundamental framework for representing mathematics and beyond. But my goal wasn’t a philosophical one; it was a very practical one: I was trying to build a computer language that could do general computations in mathematics and beyond.
My immediate applications were in physics, and it was from physics that my main methodological experience came. And the result was that—like trying to understand the world in terms of elementary particles—I wanted to understand computation in terms of its most fundamental elements. But I also had lots of practical experience in using computers to do mathematical computation. And I soon developed a theory about how I thought computation could fundamentally be done.
It started from the practical issue of transformations on algebraic expressions (turn sin(2x) into 2 sin(x) cos(x), etc.). But it soon became a general idea: compute by doing transformations on symbolic expressions. Was this going to work? I wanted to understand as fundamentally as possible what computation really was—and from that I was led to its history in mathematical logic. Much of what I saw in books and papers about mathematical logic I found abstruse and steeped in sometimes horrendous notational complexity. But what were these people really doing? It made it much easier that I had a definite theory, against which I could essentially do reductionist science. That stuff in Principia Mathematica? Those ideas about rewriting systems? Yup, I could see how to represent them as rules for transformations on symbolic expressions.
And so it was that I came to design SMP: “A Symbolic Manipulation Program”—all based on transformation rules for symbolic expressions. It was easy to represent mathematical relations ($x is a pattern variable that would now in the Wolfram Language be x_ on the left-hand side only):
Or basic logic:
Or, for that matter, predicate logic of the kind Schönfinkel wanted to capture:
And, yes, it could emulate a Turing machine (note the tape-as-transformation-rules representation that appears at the end):
But the most important thing I realized is that it really worked to represent basically anything in terms of symbolic expressions, and transformation rules on them. Yes, it was quite often useful to think of “applying functions to things” (and SMP had its version of lambda, for example), but it was much more powerful to think about symbolic expressions as just “being there” (“x doesn’t have to have a value”)—like things in the world—with the language being able to define how things should transform.
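The flavor of this kind of rewriting is easy to sketch in Python (a toy illustration only—SMP itself was of course far more elaborate, and all the names here are hypothetical):

```python
# Toy SMP-style rewriting (illustrative only; not SMP's actual design).
# Expressions are nested tuples; strings beginning with "$" are pattern
# variables that match any subexpression, as $x did in SMP.

def match(pat, expr, binds):
    """Try to match expr against pat, recording bindings for $-variables."""
    if isinstance(pat, str) and pat.startswith("$"):
        binds[pat] = expr
        return True
    if isinstance(pat, tuple) and isinstance(expr, tuple) and len(pat) == len(expr):
        return all(match(p, e, binds) for p, e in zip(pat, expr))
    return pat == expr

def substitute(t, binds):
    """Replace $-variables in t by their bound values."""
    if isinstance(t, str) and t in binds:
        return binds[t]
    if isinstance(t, tuple):
        return tuple(substitute(u, binds) for u in t)
    return t

# The rule sin(2 x) -> 2 sin(x) cos(x):
lhs = ("sin", ("times", 2, "$x"))
rhs = ("times", 2, ("sin", "$x"), ("cos", "$x"))

def rewrite(expr):
    binds = {}
    return substitute(rhs, binds) if match(lhs, expr, binds) else expr

assert rewrite(("sin", ("times", 2, "y"))) == \
    ("times", 2, ("sin", "y"), ("cos", "y"))
```

Everything here is “just” pattern matching and substitution on trees—yet, applied repeatedly, that is enough to carry out arbitrary computations.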
In retrospect this all seems awfully like the core idea of combinators, but with one important exception: that instead of everything being built from “purely structural elements” with names like S and K, there was a whole collection of “primitive objects” that were intended to have direct understandable meanings (like Plus, Times, etc.). And indeed I saw a large part of my task in language design as being to think about computations one might want to do, and then try to “drill down” to find the “elementary particles”—or primitive objects—from which these computations might be built up.
Over time I’ve come to realize that doing this is less about what one can in principle use to construct computations, and more about making a bridge to the way humans think about things. It’s crucial that there’s an underlying structure—symbolic expressions—that can represent anything. But increasingly I’ve come to realize that what we need from a computational language is to have a way to encapsulate in precise computational form the kinds of things we humans think about—in a way that we humans can understand. And a crucial part of being able to do that is to leverage what has ultimately been at the core of making our whole intellectual development as a species possible: the idea of human language.
Human language has given us a way to talk symbolically about the world: to give symbolic names to things, and then to build things up using these. In designing a computational language the goal is to leverage this: to use what humans already know and understand, but be able to represent it in a precise computational way that is amenable to actual computation that can be done automatically by computer.
It’s probably no coincidence that the tree structure of symbolic expressions that I have found to be such a successful foundation for computational language is a bit like an idealized version of the kind of tree structure (think parse trees or sentence diagramming) that one can view human language as following. There are other ways to set up universal computation, but this is the one that seems to fit most directly with our way of thinking about things.
And, yes, in the end all those symbolic expressions could be constructed like combinators from objects—like S and K—with no direct human meaning. But that would be like having a world without nouns—a world where there’s no name for anything—and the representation of everything has to be built from scratch. But the crucial idea that’s central to human language—and now to computational language—is to be able to have layers of abstraction, where one can name things and then refer to them just by name without having to think about how they’re built up “inside”.
In some sense one can see the goal of people like Frege—and Schönfinkel—as being to “reduce out” what exists in mathematics (or the world) and turn it into something like “pure logic”. And the structural part of that is exactly what makes computational language possible. But in my conception of computational language the whole idea is to have content that relates to the world and the way we humans think about it.
And over the decades I’ve continually been amazed at just how strong and successful the idea of representing things in terms of symbolic expressions and transformations on them is. Underneath everything that’s going on in the Wolfram Language—and in all the many systems that now use it—it’s all ultimately just symbolic expressions being transformed according to particular rules, and reaching fixed points that represent results of computations, just like in those examples in Schönfinkel’s original paper.
One important feature of Schönfinkel’s setup is the idea that one doesn’t just have “functions” like f[x], or even just nested functions, like f[g[x]]. Instead one can have constructs where instead of the “name of a function” (like f) one can have a whole complex symbolic structure. And while this was certainly possible in SMP, not too much was built around it. But when I came to start designing what’s now the Wolfram Language in 1986, I made sure that the “head” (as I called it) of an expression could itself be an arbitrary expression.
And when Mathematica was first launched in 1988 I was charmed to see more than one person from mathematical logic immediately think of implementing combinators. Make the definitions:
Clear[s,k]; s[x_][y_][z_] := x[z][y[z]] 
k[x_][y_] := x 
Then combinators “just work” (at least if they reach a fixed point):
s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]][a][b][c] 
But what about the idea of “composite symbolic heads”? Already in SMP I’d used them to do simple things like represent derivatives (and in Wolfram Language f'[x] is Derivative[1][f][x]). But something that’s been interesting to me to see is that as the decades have gone by, more and more gets done with “composite heads”. Sometimes one thinks of them as some kind of nesting of operations, or nesting of modifiers to a symbolic object. But increasingly they end up being a way to represent “higher-order constructs”—in effect things that produce things that produce things etc. that eventually give a concrete object one wants.
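One way to get a feel for composite heads outside the Wolfram Language is a sketch like the following in Python. The derivative function here is hypothetical and uses a crude central difference, purely to mirror the shape of Derivative[1][f][x]: a head that is itself built by applying something to something.

```python
import math

# Hypothetical sketch of a "composite head": derivative(n) is itself a
# function-producing object, so derivative(1)(f)(x) mirrors the Wolfram
# Language form Derivative[1][f][x]. (Crude central differences only.)
def derivative(n, h=1e-5):
    def of(f):
        if n == 0:
            return f
        df = lambda x: (f(x + h) - f(x - h)) / (2 * h)
        return derivative(n - 1, h)(df)
    return of

d_sin = derivative(1)(math.sin)       # plays the role of sin'
assert abs(d_sin(0.0) - 1.0) < 1e-6   # sin'(0) == cos(0) == 1
```

The interesting part is structural: derivative(1) is a meaningful object in its own right, before it has been given any function to act on.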
I don’t think most of us humans are particularly good at following this kind of chain of abstraction, at least without some kind of “guide rails”. And it’s been interesting for me to see over the years how we’ve been able to progressively build up guide rails for longer and longer chains of abstraction. First there were things like Function, Apply, Map. Then Nest, Fold, FixedPoint, MapThread. But only quite recently NestGraph, FoldPair, SubsetMap, etc. Even from the beginning there were direct “head manipulation” functions like Operate and Through. But unlike more “array-like” operations for list manipulation they’ve been slow to catch on.
In a sense combinators are an ultimate story of “symbolic head manipulation”: everything can get applied to everything before it’s applied to anything. And, yes, it’s very hard to keep track of what’s going on—which is why “named guide rails” are so important, and also why they’re challenging to devise. But it seems as if, as we progressively evolve our understanding, we’re slowly able to get a little further, in effect building towards the kind of structure and power that combinators—in their very non-human-relatable way—first showed us was possible a century ago.
Combinators were invented for a definite purpose: to provide building blocks, as Schönfinkel put it, for logic. It was the same kind of thing with other models of what we now know of as computation. All of them were “constructed for a purpose”. But in the end computation—and programs—are abstract things, that can in principle be studied without reference to any particular purpose. One might have some particular reason to be looking at how fast programs of some kind can run, or what can be proved about them. But what about the analog of pure natural science: of studying what programs just “naturally do”?
At the beginning of the 1980s I got very interested in what one can think of as the “natural science of programs”. My interest originally arose out of a question about ordinary natural science. One of the very noticeable features of the natural world is how much in it seems to us highly complex. But where does this complexity really come from? Through what kind of mechanism does nature produce it? I quickly realized that in trying to address that question, I needed as general a foundation for making models of things as possible. And for that I turned to programs, and began to study just what “programs in the wild” might do.
Ever since the time of Galileo and Newton mathematical equations had been the main way that people ultimately imagined making models of nature. And on the face of it—with their real numbers and continuous character—these seemed quite different from the usual setup for computation, with its discrete elements and discrete choices. But perhaps in part through my own experience in doing mathematics symbolically on computers, I didn’t see a real conflict, and I began to think of programs as a kind of generalization of the traditional approach to modeling in science.
But what kind of programs might nature use? I decided to just start exploring all the possibilities: the whole “computational universe” of programs—starting with the simplest. I came up with a particularly simple setup involving a row of cells with values 0 or 1 updated in parallel based on the values of their neighbors. I soon learned that systems like this had actually been studied under the name “cellular automata” in the 1950s (particularly in 2D) as potential models of computation, though they had fallen out of favor mainly through not having seemed very “human programmable”.
My initial assumption was that with simple programs I’d only see simple behavior. But with my cellular automata it was very easy to do actual computer experiments, and to visualize the results. And though in many cases what I saw was simple behavior, I also saw something very surprising: that in some cases—even though the rules were very simple—the behavior that was generated could be immensely complex:
GraphicsRow[
 Labeled[ArrayPlot[CellularAutomaton[#, {{1}, 0}, {80, All}]],
    RulePlot[CellularAutomaton[#]]] & /@ {150, 30, 73},
 ImageSize -> {Full, Automatic}, Spacings -> 0]
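For readers without the Wolfram Language at hand, the same phenomenon can be sketched in a few lines of Python (an illustrative re-implementation, not the code used for the figure):

```python
# Elementary cellular automaton evolved from a single 1 cell on a
# cyclic row. Rule 30's update table is trivial, yet the pattern it
# grows is famously intricate. (Illustrative sketch only.)
def ca_evolve(rule, width=31, steps=15):
    bits = [(rule >> i) & 1 for i in range(8)]    # 8-entry rule lookup table
    row = [0] * width
    row[width // 2] = 1                           # single seed cell
    history = [row]
    for _ in range(steps):
        row = [bits[(row[(i - 1) % width] << 2)   # left, center, right
                    | (row[i] << 1)
                    | row[(i + 1) % width]]
               for i in range(width)]
        history.append(row)
    return history

for r in ca_evolve(30, width=31, steps=12):
    print("".join("#" if c else "." for c in r))
```

Swapping in rule numbers like 150 or 73 (as in the figure above) gives the other characteristic kinds of behavior.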
It took me years to come to terms with this phenomenon, and it’s gradually informed the way I think about science, computation and many other things. At first I studied it almost exclusively in cellular automata. I made connections to actual systems in nature that cellular automata could model. I tried to understand what existing mathematical and other methods could say about what I’d seen. And slowly I began to formulate general ideas to explain what was going on—like computational irreducibility and the Principle of Computational Equivalence.
But at the beginning of the 1990s—now armed with what would become the Wolfram Language—I decided I should try to see just how the phenomenon I had found in cellular automata would play out in other kinds of computational systems. And my archives record that on April 4, 1992, I started looking at combinators.
I seem to have come back to them several times, but in a notebook from July 10, 1994 (which, yes, still runs just fine), there it is:
A randomly chosen combinator made of Schönfinkel’s S’s and K’s starting to show complex behavior. I seem to have a lot of notebooks that start with the simple combinator definitions—and then start exploring:
There are what seem like they could be pages from a “computational naturalist’s field notebook”:
Then there are attempts to visualize combinators in the same kind of way as cellular automata:
But the end result was that, yes, like Turing machines, string substitution systems and all the other systems I explored in the computational universe, combinators did exactly the same kinds of things I’d originally discovered in cellular automata. Combinators weren’t just systems that could be set up to do things. Even “in the wild” they could spontaneously do very interesting and complex things.
I included a few pages on what I called “symbolic systems” (essentially lambdas) at the end of my chapter on “The World of Simple Programs” in A New Kind of Science (and, yes, reading particularly the notes again now, I realize there are still many more things to explore...):
Later in the book I talk specifically about Schönfinkel’s combinators in connection with the threshold of computation universality. But before showing examples of what they do, I remark:
“Originally intended as an idealized way to represent structures of functions defined in logic, combinators were actually first introduced in 1920—sixteen years before Turing machines. But although they have been investigated somewhat over the past eighty years, they have for the most part been viewed as rather obscure and irrelevant constructs”
How “irrelevant” should they be seen as being? Of course it depends on what for. As things to explore in the computational universe, cellular automata have the great advantage of allowing immediate visualization. With combinators it’s a challenge to find any way to translate their behavior at all faithfully into something suitable for human perception. And since the Principle of Computational Equivalence implies that general computational features won’t depend on the particulars of different systems, there’s a tendency to feel that even in studying the computational universe, combinators “aren’t worth the trouble”.
Still, one thing that’s been prominently on display with cellular automata over the past 20 or so years is the idea that any sufficiently simple system will eventually end up being a useful model for something. Mollusc pigmentation. Catalysis processes. Road traffic flow. There are simple cellular automaton models for all of these. What about combinators? Without good visualization it’s harder to say “that looks like combinator behavior”. And even after 100 years they’re still a bit too unfamiliar. But when it comes to capturing some large-scale expression or tree behavior of some system, I won’t be surprised if combinators are a good fit.
When one looks at the computational universe, one of the important ideas is “mining” it not just for programs that can serve as models for things, but also for programs that are somehow useful for some technological purpose. Yes, one can imagine specifically “compiling” some known program to combinators. But the question is whether “naturally occurring combinators” can somehow be identified as useful for some particular purpose. Could they deliver some new kind of distributed cryptographic protocol? Could they be helpful in mapping out distributed computing systems? Could they serve as a base for setting up molecular-scale computation, say with tree-like molecules? I don’t know. But it will be interesting to find out. And as combinators enter their second century they provide a unique kind of “computational raw material” to mine from the computational universe.
What is the universe fundamentally made of? For a long time the assumption was that it must be described by something fundamentally mathematical. And indeed right around the time combinators were being invented the two great theories of general relativity and quantum mechanics were just developing. And in fact it seemed as if both physics and mathematics were going so well that people like David Hilbert imagined that perhaps both might be completely solved—and that there might be a mathematicslike axiomatic basis for physics that could be “mechanically explored” as he imagined mathematics could be.
But it didn’t work out that way. Gödel’s theorem appeared to shatter the idea of a “complete mechanical exploration” of mathematics. And while there was immense technical progress in working out the consequences of general relativity and quantum mechanics, little was discovered about what might lie underneath. Computers (including things like Mathematica) were certainly useful in exploring the existing theories of physics. But physics didn’t show any particular signs of being “fundamentally computational”, and indeed the existing theories seemed structurally not terribly compatible with computational processes.
But as I explored the computational universe and saw just what rich and complex behavior could arise even from very simple rules, I began to wonder whether maybe, far below the level of existing physics, the universe might be fundamentally computational. I began to make specific models in which space and time were formed from an evolving network of discrete points. And I realized that some of the ideas that had arisen in the study of things like combinators and lambda calculus from the 1930s and 1940s might have direct relevance.
Like combinators (or lambda calculus) my models had the feature that they allowed many possible paths of evolution. And like combinators (or lambda calculus) at least some of my models had the remarkable feature that in some sense it didn’t matter what path one took; the final result would always be the same. For combinators this “Church–Rosser” or “confluence” feature was what allowed one to have a definite fixed point that could be considered the result of a computation. In my models of the universe that doesn’t just stop—things are a bit more subtle—but the generalization to what I call causal invariance is precisely what leads to relativistic invariance and the validity of general relativity.
For many years my work on fundamental physics languished—a victim of other priorities and the uphill effort of introducing new paradigms into a well-established field. But just over a year ago—with help from two very talented young physicists—I started again, with unexpectedly spectacular results.
I had never been quite satisfied with my idea of everything in the universe being represented as a particular kind of giant graph. But now I imagined that perhaps it was more like a giant symbolic expression, or, specifically, like an expression consisting of a huge collection of relations between elements—in effect, a certain kind of giant hypergraph. It was, in a way, a very combinatorlike concept.
At a technical level, it’s not the same as a general combinator expression: it’s basically just a single layer, not a tree. And in fact that’s what seems to allow the physical universe to consist of something that approximates uniform (manifold-like) space, rather than showing some kind of hierarchical tree-like structure everywhere.
But when it comes to the progression of the universe through time, it’s basically just like the transformation of combinator expressions. And what’s become clear is that the existence of different paths—and their ultimate equivalences—is exactly what’s responsible not only for the phenomena of relativity, but also for quantum mechanics. And what’s remarkable is that many of the concepts that were first discovered in the context of combinators and lambda calculus now directly inform the theory of physics. Normal forms (basically fixed points) are related to black holes where “time stops”. Critical pair lemmas are related to measurement in quantum mechanics. And so on.
In practical computing, and in the creation of computational language, it was the addition of “meaningful names” to the raw structure of combinators that turned them into the powerful symbolic expressions we use. But in understanding the “data structure of the universe” we’re in a sense going back to something much more like “raw combinators”. Because now all those “atoms of space” that make up the universe don’t have meaningful names; they’re more like S’s and K’s in a giant combinator expression, distinct and yet all the same.
In the traditional, mathematical view of physics, there was always some sense that by “appropriately clever mathematics” it would be possible to “figure out what will happen” in any physical system. But once one imagines that physics is fundamentally computational, that’s not what one can expect.
And just like combinators—with their capability for universal computation—can’t in a sense be “cracked” using mathematics, so also that’ll be true of the universe. And indeed in our model that’s what the progress of time is about: it’s the inexorable, irreducible process of computation, associated with the repeated transformation of the symbolic expression that represents the universe.
When Hilbert first imagined that physics could be reduced to mathematics he probably thought that meant that physics could be “solved”. But with Gödel’s theorem—which is a reflection of universal computation—it became clear that mathematics itself couldn’t just be “solved”. But now in effect we have a theory that “reduces physics to mathematics”, and the result of the Gödel’s theorem phenomenon is something very important in our universe: it’s what leads to a meaningful notion of time.
Moses Schönfinkel imagined that with combinators he was finding “building blocks for logic”. And perhaps the very simplicity of what he came up with makes it almost inevitable that it wasn’t just about logic: it was something much more general. Something that can represent computations. Something that has the germ of how we can represent the “machine code” of the physical universe.
It took in a sense “humanizing” combinators to make them useful for things like computational language whose very purpose is to connect with humans. But there are other places where inevitably we’re dealing with something more like large-scale “combinators in the raw”. Physics is one of them. But there are others. In distributed computing. And perhaps in biology, in economics and in other places.
There are specific issues of whether one’s dealing with trees (like combinators), or hypergraphs (like our model of physics), or something else. But what’s important is that many of the ideas—particularly around what we call multiway systems—show up with combinators. And yes, combinators often aren’t the easiest places for us humans to understand the ideas in. But the remarkable fact is that they exist in combinators—and that combinators are now a century old.
I’m not sure if there’ll ever be a significant area where combinators alone will be the dominant force. But combinators have—for a century—had the essence of many important ideas. Maybe as such they are at some level destined forever to be footnotes. But in a sense they are also seeds or roots—from which remarkable things have grown. And as combinators enter their second century it seems quite certain that there is still much more that will grow from them.
Before Turing machines, before lambda calculus—even before Gödel’s theorem—there were combinators. They were the very first abstract examples ever to be constructed of what we now know as universal computation—and they were first presented on December 7, 1920. In an alternative version of history our whole computing infrastructure might have been built on them. But as it is, for a century, they have remained for the most part a kind of curiosity—and a pinnacle of abstraction, and obscurity.
It’s not hard to see why. In their original form from 1920, there were two basic combinators, s and k, which followed the simple replacement rules (now represented very cleanly in terms of patterns in the Wolfram Language):
✕
s[x_][y_][z_] -> x[z][y[z]]
✕
k[x_][y_] -> x
The idea was that any symbolic structure could be generated from some combination of s’s and k’s. As an example, consider a[b[a][c]]. We’re not saying what a, b and c are; they’re just symbolic objects. But given a, b and c how do we construct a[b[a][c]]? Well, we can do it with the s, k combinators.
Consider the (admittedly obscure) object
✕
s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]] 
(sometimes instead written S(S(KS)(S(KK)(S(KS)K)))(S(K(S(SKK)))K)).
Now treat this like a function and apply it to a, b, c, forming s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]][a][b][c]. Then watch what happens when we repeatedly use the s, k combinator replacement rules:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorFixedPointList[ s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]][a][b][c]], "StatesDisplay"]
Or, a tiny bit less obscurely:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[(CombinatorPlot[#, "FramedMatches"] & /@ CombinatorFixedPointList[ s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]][a][b][c]]), "StatesDisplay"], .9]
After a number of steps, we get a[b[a][c]]! And the point is that whatever symbolic construction we want, we can always set up some combination of s’s and k’s that will eventually do it for us—and ultimately be computation universal. They’re equivalent to Turing machines, lambda calculus and all those other systems we know are universal. But they were discovered before any of these systems.
By the way, here’s the Wolfram Language way to get the result above (//. repeatedly applies the rules until nothing changes anymore):
✕
s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]][a][b][c] //. {s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}
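If you want to experiment with this outside the Wolfram Language, the s, k rewriting is easy to sketch, say in Python. The tuple encoding and helper names here are my own choices for illustration, not anything from the original setup:

```python
# Combinator expressions as nested pairs: ('f', 'x') means f[x]; atoms are strings.
def app(f, *xs):
    for x in xs:
        f = (f, x)
    return f

def step(t):
    """One leftmost-outermost application of s[x][y][z] -> x[z][y[z]] or k[x][y] -> x."""
    if not isinstance(t, tuple):
        return None
    f, z = t
    if isinstance(f, tuple):
        g, y = f
        if isinstance(g, tuple) and g[0] == 's':   # t is s[x][y][z]
            return ((g[1], z), (y, z))
        if g == 'k':                               # t is k[y][z]
            return y
    r = step(f)
    if r is not None:
        return (r, z)
    r = step(z)
    return (f, r) if r is not None else None

def fixed_point(t, limit=100000):
    """Rewrite until no rule applies (the analog of //. in the Wolfram Language)."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no fixed point reached within the step limit")

# s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]], built up piece by piece
S, K = 's', 'k'
a_part = app(S, (K, S), app(S, (K, K), app(S, (K, S), K)))  # s[k[s]][s[k[k]][s[k[s]][k]]]
b_part = app(S, (K, (S, app(S, K, K))), K)                  # s[k[s[s[k][k]]]][k]
expr = app(S, a_part, b_part)

# Apply it to a, b, c and reduce
result = fixed_point(app(expr, 'a', 'b', 'c'))
```

The final result ('a', (('b', 'a'), 'c')) is just the tuple form of a[b[a][c]].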
And, yes, it’s no accident that it’s extremely easy and natural to work with combinators in the Wolfram Language—because in fact combinators were part of the deep ancestry of the core design of the Wolfram Language.
For me, though, combinators also have another profound personal resonance. They’re examples of very simple computational systems that turn out (as we’ll see at length here) to show the same remarkable complexity of behavior that I’ve spent so many years studying across the computational universe.
A century ago—particularly without actual computers on which to do experiments—the conceptual framework that I’ve developed for thinking about the computational universe didn’t exist. But I’ve always thought that of all systems, combinators were perhaps the earliest great “near miss” to what I’ve ended up discovering in the computational universe.
Let’s say we want to use combinators to do a computation on something. The first question is: how should we represent the “something”? Well, the obvious answer is: just use structures built out of combinators!
For example, let’s say we want to represent integers. Here’s an (at first bizarre-seeming) way to do that. Take s[k] and repeatedly apply s[s[k[s]][k]]. Then we’ll get a sequence of combinator expressions:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ Append[NestList[s[s[k[s]][k]], s[k], 5], \[VerticalEllipsis]], "StatesDisplay"] 
On their own, these expressions are inert under the s and k rules. But take each one (say e) and form e[s][k]. Here’s what happens for example to the third case above when you then apply the s and k rules:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorFixedPointList[ Nest[s[s[k[s]][k]], s[k], 2][s][k]], "StatesDisplay"]
To get this in the Wolfram Language, we can use Nest, which nestedly applies functions:
✕
Nest[f, x, 4] 
Then the final result above is obtained as:
✕
Nest[s[s[k[s]][k]], s[k], 2][s][k] //. {s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}
Here’s an example involving nesting 7 times:
✕
Nest[s[s[k[s]][k]], s[k], 7][s][k] //. {s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}
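As a cross-check on this encoding, here is a Python sketch (a homemade tuple representation of combinator expressions; the names are mine, not from the original) confirming that the numeral for n, applied to [s][k], reduces to s wrapped n times around k:

```python
# Combinator expressions as nested pairs: ('f', 'x') means f[x]; atoms are strings.
def app(f, *xs):
    for x in xs:
        f = (f, x)
    return f

def step(t):
    """One leftmost-outermost application of s[x][y][z] -> x[z][y[z]] or k[x][y] -> x."""
    if not isinstance(t, tuple):
        return None
    f, z = t
    if isinstance(f, tuple):
        g, y = f
        if isinstance(g, tuple) and g[0] == 's':
            return ((g[1], z), (y, z))
        if g == 'k':
            return y
    r = step(f)
    if r is not None:
        return (r, z)
    r = step(z)
    return (f, r) if r is not None else None

def fixed_point(t, limit=100000):
    """Rewrite until no rule applies."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no fixed point reached within the step limit")

S, K = 's', 'k'
succ = (S, app(S, (K, S), K))   # s[s[k[s]][k]], the "successor"
zero = (S, K)                   # s[k]

def numeral(n):
    """The combinator representation of the integer n."""
    t = zero
    for _ in range(n):
        t = (succ, t)
    return t

def decode(n):
    """Apply the numeral to [s][k] and reduce."""
    return fixed_point(app(numeral(n), S, K))

def nested_s(n):
    """s[s[...s[k]...]] with n s's: the expected decoded form."""
    t = K
    for _ in range(n):
        t = (S, t)
    return t
```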
So this gives us a (perhaps seemingly obscure) way to represent an integer n. Just form:
✕
Nest[s[s[k[s]][k]], s[k], n] 
This is a combinator representation of n, which we can “decode” by applying it to [s][k]. OK, so given two integers represented this way, how would we add them together? Well, there’s a combinator for that! And here it is:
✕
s[k[s]][s[k[s[k[s]]]][s[k[k]]]] 
If we call this plus, then let’s compute plus[1][2][s][k], where 1 and 2 are represented by combinators:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"];
plus = s[k[s]][s[k[s[k[s]]]][s[k[k]]]];
integer[n_] := Nest[s[s[k[s]][k]], s[k], n]
CombinatorEvolutionPlot[CombinatorFixedPointList[plus[integer[1]][integer[2]][s][k]], "StatesDisplay", Spacings -> .85, BaseStyle -> {GrayLevel[.4], FontSize -> 9.8}]
It takes a while, but there’s the result: 1 + 2 = 3.
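Here is a Python cross-check of the plus combinator (a sketch with my own tuple encoding of combinator expressions, not part of the original): it builds the numerals, applies plus, and confirms that decoding gives the expected sums:

```python
# Combinator expressions as nested pairs: ('f', 'x') means f[x]; atoms are strings.
def app(f, *xs):
    for x in xs:
        f = (f, x)
    return f

def step(t):
    """One leftmost-outermost application of s[x][y][z] -> x[z][y[z]] or k[x][y] -> x."""
    if not isinstance(t, tuple):
        return None
    f, z = t
    if isinstance(f, tuple):
        g, y = f
        if isinstance(g, tuple) and g[0] == 's':
            return ((g[1], z), (y, z))
        if g == 'k':
            return y
    r = step(f)
    if r is not None:
        return (r, z)
    r = step(z)
    return (f, r) if r is not None else None

def fixed_point(t, limit=100000):
    """Rewrite until no rule applies."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no fixed point reached within the step limit")

S, K = 's', 'k'
succ = (S, app(S, (K, S), K))   # s[s[k[s]][k]]
zero = (S, K)                   # s[k]

def numeral(n):
    t = zero
    for _ in range(n):
        t = (succ, t)
    return t

def nested_s(n):
    """The decoded form of n: s applied n times to k."""
    t = K
    for _ in range(n):
        t = (S, t)
    return t

# plus = s[k[s]][s[k[s[k[s]]]][s[k[k]]]]
plus = app(S, (K, S), app(S, (K, (S, (K, S))), (S, (K, K))))
one_plus_two = fixed_point(app(plus, numeral(1), numeral(2), S, K))
```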
Here’s 4 + 3, giving the result s[s[s[s[s[s[s[k]]]]]]] (i.e. 7), albeit after 51 steps:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"];
plus = s[k[s]][s[k[s[k[s]]]][s[k[k]]]];
integer[n_] := Nest[s[s[k[s]][k]], s[k], n]
Magnify[Column[CombinatorFixedPointList[plus[integer[4]][integer[3]][s][k]]], .3]
What about doing multiplication? There’s a combinator for that too, and it’s actually rather simple:
✕
s[k[s]][k] 
Here’s the computation for 3 × 2—giving 6 after 58 steps:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"];
integer[n_] := Nest[s[s[k[s]][k]], s[k], n]
times = s[k[s]][k];
Magnify[Column[CombinatorFixedPointList[times[integer[3]][integer[2]][s][k]]], .3]
Here’s a combinator for power:
✕
s[k[s[s[k][k]]]][k] 
And here’s the computation of 3^2 using it (which takes 116 steps):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"];
power = s[k[s[s[k][k]]]][k];
integer[n_] := Nest[s[s[k[s]][k]], s[k], n]
Magnify[Column[CombinatorFixedPointList[power[integer[3]][integer[2]][s][k]], BaseStyle -> {FontWeight -> "Fat"}], .12]
One might think this is a crazy way to compute things. But what’s important is that it works, and, by the way, the basic idea for it was invented in 1920.
And while it might seem complicated, it’s very elegant. All you need are s and k. Then you can construct everything from them: functions, data, whatever.
So far we’re using what’s essentially a unary representation of numbers. But we can set up combinators to handle binary numbers instead. Or, for example, we can set up combinators to do logic operations.
Imagine having k stand for true, and s[k] stand for false (so, like If[p,x,y], k[x][y] gives x while s[k][x][y] gives y). Then the minimal combinator for And is just
✕
s[s][k] 
and we can check this works by computing a truth table (TT, TF, FT, FF):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ Map[If[LeafCount[#] <= 2, #, Magnify[#, .8]] &, CombinatorFixedPointList /@ Apply[s[s][k][#1][#2] &] /@ Tuples[{s[k], k}, 2], {2}], "StatesDisplay", Spacings -> 2]
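A quick mechanical check of this truth table, sketched in Python (the tuple encoding and names are mine): with k for true and s[k] for false, s[s][k][p][q] reduces to p[q][p], which behaves as And:

```python
# Combinator expressions as nested pairs: ('f', 'x') means f[x]; atoms are strings.
def app(f, *xs):
    for x in xs:
        f = (f, x)
    return f

def step(t):
    """One leftmost-outermost application of s[x][y][z] -> x[z][y[z]] or k[x][y] -> x."""
    if not isinstance(t, tuple):
        return None
    f, z = t
    if isinstance(f, tuple):
        g, y = f
        if isinstance(g, tuple) and g[0] == 's':
            return ((g[1], z), (y, z))
        if g == 'k':
            return y
    r = step(f)
    if r is not None:
        return (r, z)
    r = step(z)
    return (f, r) if r is not None else None

def fixed_point(t, limit=100000):
    """Rewrite until no rule applies."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no fixed point reached within the step limit")

TRUE, FALSE = 'k', ('s', 'k')   # k is true, s[k] is false
AND = app('s', 's', 'k')        # the minimal And combinator, s[s][k]

def truth_table(op):
    """Reduce op[p][q] for the inputs TT, TF, FT, FF."""
    return [fixed_point(app(op, p, q)) for p in (TRUE, FALSE) for q in (TRUE, FALSE)]
```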
A search gives the minimal combinator expressions for the 16 possible 2-input Boolean functions:
✕

And by combining these (or even just copies of the one for Nand) one can make combinators that compute any possible Boolean function. And in fact in general one can—at least in principle—represent any computation by “compiling” it into combinators.
Here’s a more elaborate example, from my book A New Kind of Science. This is a combinator that represents one step in the evolution of the rule 110 cellular automaton:
✕

And, here from the book, are representations of repeatedly applying this combinator to compute—with great effort—three steps in the evolution of rule 110:
There’s a little further to go, involving fixed-point combinators, etc. But basically, since we know that rule 110 is computation universal, this shows that combinators also are.
Now that a century has passed, what should we think of combinators? In some sense, they still might be the purest way to represent computation that we know. But they’re also very hard for us humans to understand.
Still, as computation and the computational paradigm advance, and become more familiar, it seems like on many fronts we’re moving ever closer to core ideas of combinators. And indeed the foundational symbolic structure of the Wolfram Language—and much of what I’ve personally built over the past 40 years—can ultimately be seen as deeply informed by ideas that first arose in combinators.
Computation may be the single most powerful unifying intellectual concept ever. But the actual engineering development of computers and computing has tended to keep different aspects of it apart. There’s data. There are data types. There’s code. There are functions. There are variables. There’s flow of control. And, yes, it may be convenient to keep these things apart in the traditional approach to the engineering of computer systems. But it doesn’t need to be that way. And combinators show us that actually there’s no need to have any of these distinctions: everything can be together, and can be made of the same, dynamic “computational stuff”.
It’s a very powerful idea. But in its raw form, it’s also very disorienting for us humans. Because to understand things, we tend to rely on having “fixed anchors” to which we can attach meaning. And in pure, ever-changing seas of s, k combinators—like the ones we saw above—we just don’t have these.
Still, there’s a compromise—and in a sense that’s exactly what’s made it possible for me to build the full-scale computational language that the Wolfram Language now is. The point is that if we’re going to be able to represent everything in the world computationally we need the kind of unity and flexibility that combinator-like constructs provide. But we don’t just want raw, simple combinators. We need to in effect predefine lots of combinator-like constructs that have particular meanings related to what we’re representing in the world.
At a practical level, the crucial idea is to represent everything as a symbolic expression, and then to say that evaluating these expressions consists in repeatedly applying transformations to them. And, yes, symbolic expressions in the Wolfram Language are just like the expressions we’ve made out of combinators—except that instead of involving only s’s and k’s, they involve thousands of different symbolic constructs that we define to represent molecules, or cities or polynomials. But the key point is that—like with combinators—the things we’re dealing with are always structurally just nested applications of pure symbolic objects.
Something we immediately learn from combinators is that “data” is really no different from “code”; they can both be represented as symbolic expressions. And both can be the raw material for computation. We also learn that “data” doesn’t have to maintain any particular type or structure; not only its content, but also the way it is built up as a symbolic expression can be the dynamic output of a computation.
One might imagine that things like this would just be esoteric matters of principle. But what I’ve learned in building the Wolfram Language is that actually they’re natural and crucially important in having convenient ways to capture computationally how we humans think about things, and the way the world is.
From the early days of practical computing, there was an immediate instinct to imagine that programs should be set up as sequences of instructions saying for example “take a thing, then do this to it, then do that” and so on. The result would be a “procedural” program like:
✕
x = f[x]; x = g[x]; x = h[x]; x 
But as the combinator approach suggests, there’s a conceptually much simpler way to write this in which one’s just successively applying functions, to make a “functional” program:
✕
h[g[f[x]]] 
(In the Wolfram Language, this can also be written h@g@f@x or x//f//g//h.)
Given the notion that everything is a symbolic expression, one’s immediately led to have functions to operate on other functions, like
✕
Nest[f, x, 6] 
or:
✕
ReverseApplied[f][a, b] 
This idea of such “higher-order functions” is quintessentially combinator-informed—and very elegant and powerful. And as the years go by we’re gradually managing to see how to make more and more aspects of it understandable and accessible in the Wolfram Language (think: Fold, MapThread, SubsetMap, FoldPair, …).
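These higher-order constructs are not tied to the Wolfram Language; rough Python analogs of Nest and ReverseApplied (illustrative sketches, with my own names) look like:

```python
def nest(f, x, n):
    """Analog of Nest[f, x, n]: apply f to x, n times over."""
    for _ in range(n):
        x = f(x)
    return x

def reverse_applied(f):
    """Analog of ReverseApplied[f]: f with its two arguments swapped."""
    return lambda a, b: f(b, a)
```

So, for example, nest(lambda t: ('f', t), 'x', 3) builds the analog of f[f[f[x]]].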
OK, but there’s one more thing combinators do—and it’s their most famous: they allow one to set things up so that one never needs to define variables or name things. In typical programming one might write things like:
✕
With[{x = 3}, 1 + x^2] 
✕
f[x_] := 1 + x^2 
✕
Function[x, 1 + x^2] 
✕
x -> 1 + x^2
But in none of these cases does it matter what the actual name x is. The x is just a placeholder that’s standing for something one’s “passing around” in one’s code.
But why can’t one just “do the plumbing” of specifying how something should be passed around, without explicitly naming anything? In a sense a nested sequence of functions like f[g[x]] is doing a simple case of this; we’re not giving a name to the result of g[x]; we’re just feeding it as input to f in a “single pipe”. And by setting up something like Function[x, 1+x^2] we’re constructing a function that doesn’t have a name, but which we can still apply to things:
✕
Function[x, 1 + x^2][4] 
The Wolfram Language gives us an easy way to get rid of the x here too:
✕
(1 + #^2) &[4] 
In a sense the # (“slot”) here acts like a pronoun in a natural language: we’re saying that whatever we’re dealing with (which we’re not going to name), we want to find “one plus the square of it”.
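Most languages now have a version of this; in Python, for example, the rough analog of Function[x, 1 + x^2][4] is an anonymous function applied on the spot (purely illustrative):

```python
# An unnamed function applied immediately, like Function[x, 1 + x^2][4]
result = (lambda x: 1 + x**2)(4)
```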
OK, but so what about the general case? Well, that’s what combinators provide a way to do.
Consider an expression like:
✕
f[g[x][y]][y] 
Imagine this was called q, and that we wanted q[x][y] to give f[g[x][y]][y]. Is there a way to define q without ever mentioning names of variables? Yes, here’s how to do it with s, k combinators:
✕
CombinatorEvolutionPlot[{SKCombinatorCompile[ f[g[x][y]][y], {x, y}]}, "StatesDisplay", "DisplayStyles" -> {s -> Style[s, Black, FontWeight -> "SemiBold"], k -> Style[k, Black, FontWeight -> "SemiBold"], g -> Style[g, Gray], f -> Style[f, Gray]}]
There’s no mention of x and y here; the combinator structure is just defining—without naming anything—how to “flow in” whatever one provides as “arguments”. Let’s watch it happen:
✕
CombinatorEvolutionPlot[ CombinatorFixedPointList[ s[s[k[s]][s[k[s[k[f]]]][g]]][k[s[k][k]]][x][y]], "StatesDisplay", "DisplayStyles" -> {s -> Style[s, Black, FontWeight -> "SemiBold"], k -> Style[k, Black, FontWeight -> "SemiBold"], g -> Style[g, Gray], f -> Style[f, Gray], x -> Style[x, Gray], y -> Style[y, Gray]}]
Yes, it seems utterly obscure. And try as I might over the years to find a usefully humanunderstandable “packaging” of this that we could build into the Wolfram Language, I have so far failed.
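The core algorithm behind this kind of compilation, usually called bracket abstraction, is only a few lines. Here is a Python sketch of the plain textbook rules (the variable becomes s[k][k], applications become s[...][...], other atoms get wrapped in k[...]); to be clear, this is my own unoptimized variant, not the optimized scheme SKCombinatorCompile uses, so it produces a bigger combinator, though one with the same behavior:

```python
# Combinator expressions as nested pairs: ('f', 'x') means f[x]; atoms are strings.
def app(f, *xs):
    for x in xs:
        f = (f, x)
    return f

def step(t):
    """One leftmost-outermost application of s[x][y][z] -> x[z][y[z]] or k[x][y] -> x."""
    if not isinstance(t, tuple):
        return None
    f, z = t
    if isinstance(f, tuple):
        g, y = f
        if isinstance(g, tuple) and g[0] == 's':
            return ((g[1], z), (y, z))
        if g == 'k':
            return y
    r = step(f)
    if r is not None:
        return (r, z)
    r = step(z)
    return (f, r) if r is not None else None

def fixed_point(t, limit=100000):
    """Rewrite until no rule applies."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no fixed point reached within the step limit")

def abstract(v, t):
    """A combinator c, free of v, such that c[u] reduces to t with v replaced by u."""
    if t == v:
        return app('s', 'k', 'k')                       # identity: s[k][k]
    if isinstance(t, tuple):
        return app('s', abstract(v, t[0]), abstract(v, t[1]))
    return ('k', t)                                     # an atom other than v

# Compile q so that q[x][y] reduces to f[g[x][y]][y], with no variables in q
body = app('f', app('g', 'x', 'y'), 'y')
q = abstract('x', abstract('y', body))
```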
But it’s very interesting—and inspirational—that there’s even in principle a way to avoid all named variables. Yes, it’s often not a problem to use named variables in writing programs, and the names may even communicate useful information. But there are all sorts of tangles they can get one into.
It’s particularly bad when a name is somehow global, and assigning a value to it affects (potentially insidiously) everything one’s doing. But even if one keeps the scope of a name localized, there are still plenty of problems that can occur.
Consider for example:
✕
Function[x, Function[y, 2 x + y]] 
It’s two nested anonymous functions (AKA lambdas)—and here the x “gets” a, and y “gets” b:
✕
Function[x, Function[y, 2 x + y]][a][b] 
But what about this:
✕
Function[x, Function[x, 2 x + x]] 
The Wolfram Language conveniently colors things red to indicate that something bad is going on. We’ve got a clash of names, and we don’t know “which x” is supposed to refer to what.
It’s a pretty general problem; it happens even in natural language. If we write “Jane chased Bob. Jane ran fast.” it’s pretty clear what we’re saying. But “Jane chased Jane. Jane ran fast.” is already confused. In natural language, we avoid names with pronouns (which are basically the analog of # in the Wolfram Language). And because of the (traditional) gender setup in English “Jane chased Bob. She ran fast.” happens to work. But “The cat chased the mouse. It ran fast.” again doesn’t.
But combinators solve all this, by in effect giving a symbolic procedure to describe what reference goes where. And, yes, by now computers can easily follow this (at least if they deal with symbolic expressions, like in the Wolfram Language). But the passage of a century—and even our experience with computation—doesn’t seem to have made it much easier for us humans to follow it.
By the way, it’s worth mentioning one more “famous” feature of combinators—that actually had been independently invented before combinators—and that these days, rather ahistorically, usually goes by the name “currying”. It’s pretty common—say in the Wolfram Language—to have functions that naturally take multiple arguments. GeoDistance[a, b] or Plus[a, b, c] (or a+b+c) are examples. But in trying to uniformize as much as possible, combinators just make all “functions” nominally have only one argument.
To set up things that “really” have multiple arguments, one uses structures like f[x][y][z]. From the point of view of standard mathematics, this is very weird: one expects “functions” to just “take an argument and return a result”, and “map one space to another” (say real numbers to complex numbers).
But if one’s thinking “sufficiently symbolically” it’s fine. And in the Wolfram Language—with its fundamentally symbolic character (and distant ancestry in combinator concepts)—one can just as well make a definition like
✕
f[x_][y_] := x + y 
as:
✕
f[x_, y_] := x + y 
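In Python the two styles look like this (a toy illustration with made-up names), with currying spelled out as nested one-argument functions:

```python
def plus_curried(x):
    """Like f[x_][y_] := x + y: one argument at a time."""
    return lambda y: x + y

def plus_flat(x, y):
    """Like f[x_, y_] := x + y: both arguments at once."""
    return x + y
```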
Back in 1980—even though I don’t think I knew about combinators yet at that time—I actually tried, in my SMP system that was a predecessor to the Wolfram Language, the idea of having f[x][y] be equivalent to f[x,y]. But it was a bit like forcing every verb to be intransitive—and there were many situations in which it was quite unnatural, and hard to understand.
So far we’ve been talking about combinators that are set up to compute specific things that we want to compute. But what if we just pick possible combinators “from the wild”, say at random? What will they do?
In the past, that might not have seemed like a question that was worth asking. But I’ve now spent decades studying the abstract computational universe of simple programs—and building a whole “new kind of science” around the discoveries I’ve made about how they behave. And with that conceptual framework it now becomes very interesting to look at combinators “in the wild” and see how they behave.
So let’s begin at the beginning. The simplest s, k combinator expressions that won’t just remain unchanged under the combinator rules have to have size 3. There are a total of 16 such expressions:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[{#}, "StatesDisplay"] & /@ EnumerateCombinators[3] 
And none of them do anything interesting: they either don’t change at all, or, as for example with k[s][s], they immediately give a single symbol (here s).
But what about larger combinator expressions? The total number of possible combinator expressions of size n grows like
✕
Table[2^n CatalanNumber[n - 1], {n, 10}]
or in general
✕
2^n CatalanNumber[n - 1] == (2^n Binomial[2 n - 2, n - 1])/n
or asymptotically:
✕

At size 4, again nothing too interesting happens. Of all 80 possible expressions, the longest it takes to reach a fixed point is 3 steps, and that happens in 4 cases:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorFixedPointList /@ {s[k][s][s], s[k][s][k], s[k][k][s], s[k][k][k]}, "StatesDisplay", Spacings -> 2]
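These counts (16 at size 3, 80 at size 4, and so on) are easy to cross-check outside the Wolfram Language; here is a small Python sketch (function names mine) of both the closed-form count and a brute-force enumeration:

```python
from math import comb

def combinator_count(n):
    """2^n CatalanNumber[n - 1]: the number of s,k expressions with n leaves."""
    return 2**n * comb(2*n - 2, n - 1) // n   # comb(2n-2, n-1)/n is Catalan(n-1)

def enumerate_size(n):
    """All s,k combinator expressions with n leaves, as nested pairs ('f','x') = f[x]."""
    if n == 1:
        return ['s', 'k']
    return [(f, x)
            for i in range(1, n)
            for f in enumerate_size(i)
            for x in enumerate_size(n - i)]
```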
At size 5, the longest it takes to reach a fixed point is 4 steps, and that happens in 10 cases out of 448:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Column[ CombinatorEvolutionPlot[#, "StatesDisplay", ItemSize -> 10] & /@ Partition[ CombinatorFixedPointList /@ Select[EnumerateCombinators[5], Length[CombinatorFixedPointList[#]] == 4 &] /. {s -> Style[s, Black, FontWeight -> "SemiBold"], k -> Style[k, Black, FontWeight -> "SemiBold"], a -> Style[a, Black], b -> Style[b, Black], c -> Style[c, Black]}, 5], Spacings -> 1]
At size 6, there is a slightly broader distribution of “halting times”:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Histogram[ Length /@ CombinatorFixedPointList /@ EnumerateCombinators[6], {1}, Frame -> True, ChartStyle -> $PlotStyles["Histogram", "ChartStyle"], ImageSize -> 200]
The longest halting time is 7, achieved by:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[ CombinatorFixedPointList /@ Select[EnumerateCombinators[6], Length[CombinatorFixedPointList[#]] == 7 &], "StatesDisplay", Spacings -> 2], .9]
Meanwhile, the largest expressions created are of size 10 (in the sense that they contain a total of 10 s’s or k’s):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[ PadRight[CombinatorFixedPointList /@ Select[EnumerateCombinators[6], LeafCount[CombinatorFixedPoint[#]] == 10 &], {Automatic, Automatic}, ""], "StatesDisplay", Spacings -> 2], .75]
The distribution of final sizes is a little odd:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Histogram[ LeafCount[CombinatorFixedPoint[#]] & /@ EnumerateCombinators[6], {1}, Frame -> True, ChartStyle -> $PlotStyles["Histogram", "ChartStyle"], ImageSize -> 200]
For size n ≤ 5, there’s actually a gap with no final states of size n – 1 generated. But at size 6, out of 2688 expressions, there are just 12 that give size 5 (about 0.4%).
OK, so what’s going to happen if we go to size 7? Now there are 16,896 possible expressions. And there’s something new: two of them never stabilize, namely S(SS)SSSS and SSS(SS)SS:
✕
{s[s[s]][s][s][s][s], s[s][s][s[s]][s][s]} 
After one step, the first one of these evolves to the second, but then this is what happens over the next few steps (we’ll see other visualizations of this later):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorEvolveList[s[s][s][s[s]][s][s], 8], "StatesDisplay"] 
The total size (i.e. LeafCount, or “number of s’s”) grows like:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; LeafCount /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 30] 
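You can watch this growth with a few lines of Python (my own tuple encoding; the exact step-by-step sizes depend on reproducing the same leftmost-outermost updating order, so treat this as a qualitative check). Note that since the expression contains only s’s, each rewrite s[x][y][z] -> x[z][y[z]] changes the total size by the size of z minus one, so the size can never decrease:

```python
import sys
sys.setrecursionlimit(20000)   # the trees get fairly deep

# Combinator expressions as nested pairs: ('f', 'x') means f[x]; atoms are strings.
def app(f, *xs):
    for x in xs:
        f = (f, x)
    return f

def step(t):
    """One leftmost-outermost application of s[x][y][z] -> x[z][y[z]] or k[x][y] -> x."""
    if not isinstance(t, tuple):
        return None
    f, z = t
    if isinstance(f, tuple):
        g, y = f
        if isinstance(g, tuple) and g[0] == 's':
            return ((g[1], z), (y, z))
        if g == 'k':
            return y
    r = step(f)
    if r is not None:
        return (r, z)
    r = step(z)
    return (f, r) if r is not None else None

def leaf_count(t):
    """Analog of LeafCount: the number of s's and k's in the expression."""
    return leaf_count(t[0]) + leaf_count(t[1]) if isinstance(t, tuple) else 1

expr = app('s', 's', 's', ('s', 's'), 's', 's')   # s[s][s][s[s]][s][s], size 7

sizes = [leaf_count(expr)]
t = expr
for _ in range(100):
    t = step(t)
    assert t is not None, "this expression never reaches a fixed point"
    sizes.append(leaf_count(t))
```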
A log plot shows that after an initial transient the size grows roughly exponentially:
✕

And looking at successive ratios one sees some elaborate fine structure:
✕

What is this ultimately doing? With a little effort, one finds that the sizes have a length-83 transient, followed by sequences of values of length 23 + 2n, in which the second differences of successive sizes are given by:
✕
Join[38 {0, 0, 0, 12, 17} 2^n + {0, 1, 0, 135, 189}, Table[0, n], 38 {0, 1, 0, 0, 1, 1, 0, 0, 0, 4} 2^n + {12, 13, 0, 6, 7, 1, 0, 1, 0, 27}, Table[0, n + 2], 228 {0, 1, 0, 0, 1, 1} 2^n + 2 {6, 20, 0, 3, 17, 14}] 
The final sequence of sizes is obtained by concatenating these blocks and computing Accumulate[Accumulate[list]]—giving an asymptotic size that appears to be of the form . So, yes, we can ultimately “figure out what’s going on” with this little size-7 combinator (and we’ll see some more details later). But it’s remarkable how complicated it is.
OK, but let’s go back and look at the other size-7 expressions. The halting time distribution (ignoring the 2 cases that don’t halt) basically falls off exponentially, but shows a couple of outliers:
✕

The maximum finite halting time is 16 steps, achieved by s[s[s[s]]][s][s][s] (S(S(SS))SSS):
And the distribution of final sizes is (with the maximum of 41 being achieved by the maximum-halting-time expression we’ve just seen):
OK, so what happens at size 8? There are 109,824 possible combinator expressions. And it’s fairly easy to find out that all but 76 of these go to fixed points within at most 50 steps (the longest survivor is s[s][s][s[s[s]]][k][k] (SSS(S(SS))KK), which halts after 44 steps):
The final fixed points in these cases are mostly quite small; this is the distribution of their sizes:
And here is a comparison between halting times and final sizes:
The outlier for size is s[s][k][s[s[s]][s]][s] (SSK(S(SS)S)S), which evolves in 27 steps to a fixed expression of size 80 (along the way reaching an intermediate expression of size 86):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorEvolveList[s[s][k][s[s[s]][s]][s], 33], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 250]
Among combinator expressions that halt in less than 50 steps, the maximum intermediate expression size of 275 is achieved for s[s][s][s[s[s][k]]][k] (SSS(S(SSK))K) (which ultimately evolves to s[s[s[s][k]]][k] (S(S(SSK))K) after 26 steps):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorEvolveList[s[s][s][s[s[s][k]]][k], 33], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 250]
So what about size-8 expressions that don’t halt after 50 steps? There are altogether 76—with 46 of these being inequivalent (in the sense that they don’t quickly evolve to others in the set).
Here’s how these 46 expressions grow (at least until they reach size 10,000):
Some of these actually end up halting. In fact, s[s][s][s[s]][s][k[k]] (SSS(SS)S(KK)) halts after just 52 steps, with final result k[s[k][k[s[k][k]]]] (K(SK(K(SKK)))), having achieved a maximum expression size of 433:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorEvolveList[s[s][s][s[s]][s][k[k]], 60], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 250]
The next shortest halting time occurs for s[s][s][s[s[s]]][k][s] (SSS(S(SS))KS), which takes 89 steps to produce an expression of size 65:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorEvolveList[s[s][s][s[s[s]]][k][s], 95], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 250]
Then we have s[s][s][s[s[s]]][s][k] (SSS(S(SS))SK), which halts, giving the size-10 s[k][s[s[s[s[s[s]]][s]]][k]] (SK(S(S(S(S(SS))S))K)), but only after 325 steps:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorEvolveList[s[s][s][s[s[s]]][s][k], 350], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 250]
There’s also a still-larger case to be seen: s[s[s][s]][s][s[s]][k] (S(SSS)S(SS)K), which exhibits an interesting “IntegerExponent-like” nested pattern of growth, but finally halts after 1958 steps, having achieved a maximum intermediate expression size of 21,720 along the way:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorFixedPointList[s[s[s[s][s]]][s][s][k]], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 250]
What about the other expressions? s[s][s][s[s]][s][s[k]] shows very regular growth in size:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorEvolveList[s[s][s][s[s]][s][s[k]], 300], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 250]
In the other cases, there’s no such obvious regularity. But one can start to get a sense of what happens by plotting differences between sizes on successive steps:
There are some obvious cases of regularity here. Several show a regular pattern of linearly increasing differences, implying overall t^{2} growth in size:
Others show regular growth in differences, leading to t^{3/2} growth in size:
Others have pure exponential growth:
There are quite a few that have regular but below-exponential growth, much like the size-7 case s[s][s][s[s]][s][s] (SSS(SS)SS) analyzed above:
All the cases we’ve just looked at only involve s. When we allow k as well, there’s for example s[s][s][s[s[s][s]]][k] (SSS(S(SSS))K)—which shows regular, essentially “stair-like” growth:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorFixedPointList[s[s][s][s[s[s][s]]][k], "MaxSize" -> 100000, "MaxSteps" -> 2000], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 330]
There’s also a case like s[s[s]][s][s[s]][s][k] (S(SS)S(SS)SK):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorFixedPointList[s[s[s]][s][s[s]][s][k], "MaxSize" -> 50000, "MaxSteps" -> 4000], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], AspectRatio -> 1/3, ImageSize -> 600]
On a small scale, this appears somewhat regular, but the larger-scale structure, as revealed by taking differences, doesn’t seem so regular (though it does have a certain “IntegerExponent-like” look):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[Differences[LeafCount /@ CombinatorFixedPointList[s[s[s]][s][s[s]][s][k], "MaxSize" -> 50000, "MaxSteps" -> 4000]], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], AspectRatio -> 1/5, PlotRange -> All, ImageSize -> 620]
It’s not clear what will happen in this case. The overall form of the behavior looks a bit similar to examples above that eventually terminate. Continuing for 50,000 steps, though, here’s what’s happened:
And in fact it turns out that the sizedifference peaks continue to get higher—having values of the form 6 (17 × 2^{n} + 1) and occurring at positions of the form 2 (9 × 2^{n+2} + n – 18).
Here’s another example: s[s][s][s[s]][s][k[s]] (SSS(SS)S(KS)). The overall growth in this case—at least for 200 steps—looks somewhat irregular:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[LeafCount /@ CombinatorEvolveList[s[s][s][s[s]][s][k[s]], 200], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], AspectRatio -> 1/3]
And taking differences reveals a fairly complex pattern of behavior:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[Differences[LeafCount /@ CombinatorEvolveList[s[s][s][s[s]][s][k[s]], 200]], Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], PlotRange -> All, AspectRatio -> 1/3, ImageSize -> 570]
But after 1000 steps some regularity begins to appear:
And going to 2000 steps the regularity is even more obvious:
There’s a long transient, but after that there are systematic peaks in the size difference, with the n^{th} peak having height 16487 + 3320 n and occurring at step 14n^{2} + 59n + 284. (And, yes, it’s pretty weird to see all these strange numbers cropping up.)
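(These fits are easy to evaluate; a quick Python check of the first few peak heights and positions, using just the formulas quoted above:)

```python
def peak_height(n):
    # height of the n-th size-difference peak, from the fit above
    return 16487 + 3320 * n

def peak_step(n):
    # step at which the n-th peak occurs, from the fit above
    return 14 * n**2 + 59 * n + 284

heights = [peak_height(n) for n in range(3)]
steps = [peak_step(n) for n in range(3)]
```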
What happens if we look at size-10 combinator expressions? There’s a lot of repeating of behavior that we’ve seen with smaller expressions. But some new things do seem to happen.
For its first 1000 steps, s[s][k][s[s][k][s[s]][s]][k] (SSK(SSK(SS)S)K) seems to be doing something quite complicated when one looks at its size differences:
But it turns out that this is just a transient, and after 1000 steps or so, the system settles into a pattern of continual growth similar to ones we’ve seen before:
Another example is s[s][k][s[s][k][s[s]][s]][s] (SSK(SSK(SS)S)S). After 2000 steps there seems to be some regularity, and some irregularity:
And basically this continues:
s[s][s][s[s[s[k]]]][s][s[k]] (SSS(S(S(SK)))S(SK)) is a fairly rare example of “nested-like” growth that continues forever (after a million steps, the size obtained is 597,871,806):
As a final example, consider s[s[s]][s][s][s][s[s][k[k]]] (S(SS)SSS(SS(KK))). Here’s what this does for the first 1000 steps:
It looks somewhat complicated, but seems to be growing slowly. But then around step 4750 it suddenly jumps up, quickly reaching size 51,462:
Keep going further, and there are more jumps:
After 100,000 steps there’s a definite pattern of jumps—but it’s not quite regular:
So what’s going to happen? Mostly it seems to be maintaining a size of a few thousand or more. But then, after 218,703 steps, it dips down, to size 319. So, one might think, perhaps it’s going to “die out”. Keep going longer, and at step 34,339,093 it gets down to size 27, even though by step 36,536,622 it’s at size 105,723.
Keep going even longer, and one sees it dipping down in size again (here shown in a downsampled log plot):
But, then, suddenly, boom. At step 137,356,329 it stops, reaching a fixed point of size 39. And, yes, it’s totally remarkable that a tiny combinator expression like s[s[s]][s][s][s][s[s][k[k]]] (S(SS)SSS(SS(KK))) can do all this.
If one hasn’t seen it before, this kind of complexity would be quite shocking. But after spending so long exploring the computational universe, I’ve become used to it. And now I just view each new case I see as yet more evidence for my Principle of Computational Equivalence.
A central fact about s, k combinators is that they’re computation universal. And this tells us that whatever computation we want to do, it’ll always be possible to “write a combinator program”—i.e. to create a combinator expression—that’ll do it. And from this it follows that—just like with the halting problem for Turing machines—the problem of whether a combinator will halt is in general undecidable.
But the new thing we’re seeing here is that it’s difficult to figure out what will happen not just “in general” for complicated expressions set up to do particular computations but also for simple combinator expressions that one might “find in the wild”. But the Principle of Computational Equivalence tells us why this happens.
Because it says that even simple programs—and simple combinator expressions—can lead to computations that are as sophisticated as anything. And this means that their behavior can be computationally irreducible, so that the only way to find out what will happen is essentially just to run each step and see what happens. So then if one wants to know what will happen in an infinite time, one may have to do an effectively infinite amount of computation to find out.
Might there be another way to formulate our questions about the behavior of combinators? Ultimately we could use any computation universal system to represent what combinators do. But some formulations may connect more immediately with existing ideas—say mathematical ones. And for example I think it’s conceivable that the sequences of combinator sizes we’ve seen above could be obtained in a more “direct numerical way”, perhaps from something like nestedly recursive functions (I discovered this particular example in 2003):
✕
f[n_] := 3 f[n - f[n - 1]]
✕
f[n_ /; n < 1] = 1 
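(For experimenting outside Wolfram Language, here is a direct memoized Python translation of this definition; the memoization is just my addition to keep repeated evaluations cheap:)

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):
    # f[n_] := 3 f[n - f[n - 1]], with f[n] = 1 for n < 1
    if n < 1:
        return 1
    return 3 * f(n - f(n - 1))

# first few values of the nestedly recursive function
values = [f(n) for n in range(8)]
```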
One of the issues in studying combinators is that it’s so hard to visualize what they’re doing. It’s not like with cellular automata where one can make arrays of black and white cells and readily use our visual system to get an impression of what’s going on. Consider for example the combinator evolution:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorEvolveList[s[s][k][s[s[s]][s]][s], 7], "StatesDisplay"] 
In a cellular automaton the rule would be operating on neighboring elements, and so there’d be locality to everything that’s happening. But here the combinator rules are effectively moving whole chunks around at a time, so it’s really hard to visually trace what’s happening.
But even before we get to this issue, can we make the mass of brackets and letters in something like
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolveList[s[s][k][s[s[s]][s]][s], 6] // Last 
easier to read? For example, do we really need all those brackets? In the Wolfram Language, for example, instead of writing
✕
a[b[c[d[e]]]] 
we can equivalently write
✕
a@b@c@d@e 
thereby avoiding brackets.
But using @ doesn’t avoid all grouping indications. For example, to represent
✕
a[b][c][d][e] 
with @ we’d have to write:
✕
(((a@b)@c)@d)@e 
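(This left-associative convention is what is behind compact strings like SSS(SS)SS used throughout: write applications left to right, and parenthesize only arguments that are themselves applications. A Python sketch, with my own nested-pair encoding of expressions:)

```python
S, K = "S", "K"

def app(*terms):
    """Left-associative application chain: app(S, S, S) stands for s[s][s]."""
    expr = terms[0]
    for t in terms[1:]:
        expr = (expr, t)
    return expr

def to_string(e):
    """Render in the left-associative convention:
    parenthesize only arguments that are themselves applications."""
    if not isinstance(e, tuple):
        return e
    head, arg = e
    rendered = to_string(arg)
    if isinstance(arg, tuple):
        rendered = "(" + rendered + ")"
    return to_string(head) + rendered

# s[s][s][s[s]][s][s] renders as SSS(SS)SS
to_string(app(S, S, S, (S, S), S, S))
```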
In our combinator expression above, we had 24 pairs of brackets. By using @, we can reduce this to 10:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorPlot[s[s[s[s]][k[s[s[s]][s]][s]]][k[s[s[s]][s]][s][s[s[s]][k[s[s[s]][s]][s]]]], "CharactersRightAssociative", "ApplicationGlyph" -> Style["\[NegativeVeryThinSpace]\[NegativeVeryThinSpace]@\[NegativeVeryThinSpace]", 11], "UseCombinatorGlyphs" -> None]
And we don’t really need to show the @, so we can make this smaller:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorPlot[s[s[s[s]][k[s[s[s]][s]][s]]][k[s[s[s]][s]][s][s[s[s]][k[s[s[s]][s]][s]]]], "CharactersRightAssociative", "ApplicationGlyph" -> "\[NegativeVeryThinSpace]", "UseCombinatorGlyphs" -> None]
When combinators were first introduced a century ago, the focus was on “multi-argument-function-like” expressions such as a[b][c] (as appear in the rules for s and k), rather than on “nested-function-like” expressions such as a[b[c]]. So instead of thinking of function application as “right associative”—so that a[b[c]] can be written without parentheses as a@b@c—people instead thought of function application as left associative—so that a[b][c] could be written without parentheses. (Confusingly, people often used @ as the symbol for this left-associative function application.)
As it’s turned out, the f[g[x]] form is much more common in practice than f[g][x], and in 30+ years there hasn’t been much of a call for a notation for left-associative function application in the Wolfram Language. But in celebration of the centenary of combinators, we’ve decided to introduce Application (indicated by •) to represent left-associative function application.
So this means that a[b][c][d][e] can now be written
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; FunctionToApplication[a[b][c][d][e]] 
without parentheses. Of course, now a[b[c[d[e]]]] needs parentheses:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; FunctionToApplication[a[b[c[d[e]]]]] 
In this notation the rules for s and k can be written without brackets as:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; FunctionToApplication[{s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}]
Our combinator expression above becomes
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; FunctionToApplication[ s[s[s[s]][k[s[s[s]][s]][s]]][ k[s[s[s]][s]][s][s[s[s]][k[s[s[s]][s]][s]]]]] 
or without the function application character
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorPlot[s[s[s[s]][k[s[s[s]][s]][s]]][k[s[s[s]][s]][s][s[s[s]][k[s[s[s]][s]][s]]]], "CharactersLeftAssociative", "ApplicationGlyph" -> "\[NegativeVeryThinSpace]", "UseCombinatorGlyphs" -> None]
which now involves 13 pairs of parentheses.
Needless to say, if you consider all possible combinator expressions, left and right associativity on average do just the same in terms of parenthesis counts: for size-n combinator expressions, both on average need (n – 2)/2 pairs; the number of cases needing k pairs is
✕
Binomial[n - 1, k - 1] Binomial[n, k - 1]/k
(the “Catalan triangle”). (Without associativity, we’re dealing with our standard representation of combinator expressions, which always requires n – 1 pairs of brackets.)
By the way, the number of “right-associative” parenthesis pairs is just the number of subparts of the combinator expression that match _[_][_], while for left-associative parenthesis pairs it’s the number that match _[_[_]]. (The number of brackets in the no-associativity case is the number of matches of _[_].)
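(On a nested-pair encoding of expressions, which is my own choice for illustration, both counts are one-line recursions. Note that for a size-n expression the two counts always total n – 2: every application node other than the root sits in either the head slot or the argument slot of exactly one other application.)

```python
def count_left(e):
    """Subparts matching _[_][_]: applications whose head is itself an application."""
    if not isinstance(e, tuple):
        return 0
    hit = 1 if isinstance(e[0], tuple) else 0
    return hit + count_left(e[0]) + count_left(e[1])

def count_right(e):
    """Subparts matching _[_[_]]: applications whose argument is an application."""
    if not isinstance(e, tuple):
        return 0
    hit = 1 if isinstance(e[1], tuple) else 0
    return hit + count_right(e[0]) + count_right(e[1])

def leaves(e):
    """LeafCount analog: the size n of the expression."""
    return leaves(e[0]) + leaves(e[1]) if isinstance(e, tuple) else 1

# s[s][s][s[s]], a size-5 example
e = ((("S", "S"), "S"), ("S", "S"))
```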
If we look at the parenthesis/bracket count in the evolution of the smallest nonterminating combinator expression from above s[s][s][s[s]][s][s] (otherwise known as s•s•s•(s•s)•s•s) we find:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[Callout[#[[1]] /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 50], #[[2]]] & /@ {{LeafCount[#] &, "none"}, {Count[#, _[_][_], {0, Infinity}, Heads -> True] &, "left"}, {Count[#, _[_[_]], {0, Infinity}, Heads -> True] &, "right"}}, Center, Frame -> True, Joined -> True]
Or in other words, in this case, left associativity leads on average to about 62% of the number of parentheses of right associativity. We’ll look at this in more detail later, but for growing combinator expressions, it’ll almost always turn out to be the case that left associativity is the “parenthesisavoidance winner”.
But even with our “best parenthesis avoidance” it’s still very hard to see what’s going on from the textual form:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[Grid[{{Column[{CombinatorEvolutionPlot[CombinatorPlot[#, "CharactersLeftAssociative", "ApplicationGlyph" -> "\[NegativeVeryThinSpace]", "UseCombinatorGlyphs" -> None] & /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 7], "StatesDisplay"], Text[Style["left", Italic, 12]]}, Dividers -> Center, FrameStyle -> LightGray, Alignment -> {1 -> Center}], Column[{CombinatorEvolutionPlot[CombinatorPlot[#, "CharactersRightAssociative", "ApplicationGlyph" -> "\[NegativeVeryThinSpace]", "UseCombinatorGlyphs" -> None] & /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 7], "StatesDisplay"], Text[Style["right", Italic, 12]]}, Dividers -> Center, FrameStyle -> LightGray, Alignment -> {1 -> Center}], Column[{CombinatorEvolutionPlot[CombinatorEvolveList[s[s][s][s[s]][s][s], 7], "StatesDisplay"], Text[Style["none", Italic, 12]]}, Dividers -> Center, FrameStyle -> LightGray, Alignment -> {1 -> Center}]}}, Dividers -> {{{Directive[Thick, Gray]}}, {False}}, Spacings -> 2], .89]
So what about getting rid of parentheses altogether? Well, we can always use so-called Polish (or Łukasiewicz) “prefix” notation—in which we write f[x] as •fx and f[g[x]] as •f•gx. And in this case our combinator expression from above becomes:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorPlot[s[s[s[s]][k[s[s[s]][s]][s]]][k[s[s[s]][s]][s][s[s[s]][k[s[s[s]][s]][s]]]], "CharactersPolishNotation", "UseCombinatorGlyphs" -> None]
Alternatively—like a traditional HP calculator—we can use reverse Polish “postfix” notation, in which f[x] is fx• and f[g[x]] is fgx•• (and • is like the HP ENTER key):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorPlot[s[s[s[s]][k[s[s[s]][s]][s]]][k[s[s[s]][s]][s][s[s[s]][k[s[s[s]][s]][s]]]], "CharactersReversePolishNotation", "UseCombinatorGlyphs" -> None]
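(Both conversions are simple recursions; in this Python sketch, again with expressions as my own nested-pair encoding, each application contributes exactly one •, in prefix or postfix position:)

```python
def polish(e):
    """Prefix (Polish) form: f[x] becomes • f x."""
    if isinstance(e, tuple):
        return "•" + polish(e[0]) + polish(e[1])
    return e.lower()

def reverse_polish(e):
    """Postfix (reverse Polish) form: f[x] becomes f x •."""
    if isinstance(e, tuple):
        return reverse_polish(e[0]) + reverse_polish(e[1]) + "•"
    return e.lower()

# s[s][s][s[s]], a size-5 example
expr = ((("S", "S"), "S"), ("S", "S"))
```

Since each application node produces one •, the number of • characters always equals the size minus one, i.e. the number of bracket pairs in the standard functional form.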
The total number of • symbols is always equal to the number of pairs of brackets in our standard “non-associative” functional form:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[Grid[{{Column[{CombinatorEvolutionPlot[Row[Flatten[# //. x_[y_] -> {\[Bullet], x, y}]] & /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 7], "StatesDisplay"], Text[Style["Polish", Italic, 12]]}, Dividers -> Center, FrameStyle -> LightGray, Alignment -> {1 -> Center}], Column[{CombinatorEvolutionPlot[Row[Flatten[# //. x_[y_] -> {x, y, \[Bullet]}]] & /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 7], "StatesDisplay"], Text[Style["reverse Polish", Italic, 12]]}, Dividers -> Center, FrameStyle -> LightGray, Alignment -> {1 -> Center}]}}, Dividers -> {{{Directive[Thick, Gray]}}, {False}}, Spacings -> 2], 1]
What if we look at this on a larger scale, “cellular automaton style”, with s shown as one color of cell and • as another? Here’s the not-very-enlightening result:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; GraphicsRow[{Labeled[CombinatorEvolutionPlot[CombinatorEvolveList[s[s][s][s[s]][s][s], 7], "ArrayPlotPolishNotation", Mesh -> True, ImageSize -> 300, MeshStyle -> GrayLevel[0, .18]], Text[Style["Polish", Italic, 12]]], Labeled[CombinatorEvolutionPlot[CombinatorEvolveList[s[s][s][s[s]][s][s], 7], "ArrayPlotReversePolishNotation", Mesh -> True, ImageSize -> 300, MeshStyle -> GrayLevel[0, .18]], Text[Style["reverse Polish", Italic, 12]]]}, ImageSize -> 640]
Running for 50 steps, and fixing the aspect ratio, we get (for the Polish case):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorEvolveList[s[s][s][s[s]][s][s], 50], "ArrayPlotPolishNotation", ImageSize -> 630, AspectRatio -> 1/2]
We can make the same kinds of pictures from our bracket representation too. We just take a string like s[s][s][s[s]][s][s] and render each successive character as a cell of some color. (It’s particularly easy if we’ve only got one basic combinator—say s—because then we only need colors for the opening and closing brackets.) We can also make “cellular automaton–style” pictures from parenthesis representations like SSS(SS)SS. Again, all we do is render each successive character as a cell of some color.
The results essentially always tend to look much like the reverse Polish case above. Occasionally, though, they reveal at least something about the “innards of the computation”. Like here’s the terminating combinator expression s[s][s][s[s[s]]][k][s] from above, rendered in right-associative form:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s]]][k][s]], "ArrayPlotRightAssociative", AspectRatio -> .6, "IncludeUpdateHighlighting" -> False, ImageSize -> 530]
Pictures like this in a sense convert all combinator expressions to sequences. But combinator expressions are in fact hierarchical structures, formed by nested invocations of symbolic “functions”. One way to represent the hierarchical structure of
✕
s[s][s][s[s]][s][s] /. {s -> Style[s, Black, FontWeight -> "SemiBold"]}
is through a hierarchy of nested boxes:
✕
MapAll[# /. {a_[b_] -> Framed[Row[{a, " ", b}]], a_Symbol -> Framed[a]} &, s[s][s][s[s]][s][s], Heads -> True]
We can color each box by its depth in the expression:
But now to represent the expression all we really need to do is show each basic combinator in a color representing its depth. And doing this, we can visualize the terminating combinator evolution above as:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Legended[CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s]]][k][s]], "ExpressionDepthPlot", FrameTicks -> {True, True, False, False}], BarLegend[{"Rainbow", {0, 18}}, LegendMarkerSize -> 125]]
We can also render this in 3D (with the height being the “depth” in the expression):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s]]][k][s]], "ExpressionDepthPlot", "IncludeDepthAxis" -> True, BoxRatios -> {1, 1, .1}, PlotRange -> {Automatic, Automatic, {.001, 18}}, ClippingStyle -> Directive[GrayLevel[.5], Opacity[.4]], BoundaryStyle -> None, Boxed -> False, Axes -> False]
To test out visualizations like these, let’s look (as above) at all the size-8 combinator expressions with distinct evolutions that don’t terminate within 50 steps. Here’s the “depth map” for each case:
In these pictures we’re drawing a cell for each element in the “stringified version” of the combinator expression at each step, then coloring it by depth. But given a particular combinator expression, one can consider other ways to indicate the depth of each element. Here are a few possibilities, shown for step 8 in the evolution of s[s][s][s[s]][s][s] (SSS(SS)SS) (note that the first of these is essentially the “indentation level” that might be used if each s, k were “pretty printed” on a separate line):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; MatchedBracketsPlot[CombinatorEvolve[s[s][s][s[s]][s][s], 8], Appearance -> #, AspectRatio -> 1/3, "BracketCharacters" -> {"[", "]"}, ColorRules -> {s -> RGBColor[0.8823529411764706, 0.29411764705882354`, 0.2980392156862745]}, ImageSize -> 300, PlotStyle -> AbsolutePointSize[4], "IncludeTextForm" -> False] & /@ {"Mountain", "Vee", "Bush", "Tree"}
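(The first, “mountain-like” rendering is essentially the bracket-nesting level read straight off the parenthesized string, which is easy to compute directly; a small sketch of that one notion of depth:)

```python
def depth_profile(chars):
    """Bracket-nesting depth of each combinator letter in a
    parenthesized string like 'SSS(SS)SS'."""
    depth, profile = 0, []
    for ch in chars:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            profile.append(depth)
    return profile
```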
And this is what one gets on a series of steps:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Table[Labeled[GraphicsRow[MatchedBracketsPlot[CombinatorEvolve[s[s][s][s[s]][s][s], t], Appearance -> #, "IncludeTextForm" -> False, AspectRatio -> 1/3, PlotStyle -> AbsolutePointSize[0]] & /@ {"Mountain", "Vee", "Bush", "Tree"}, ImageSize -> 600], Text[Style[t, Gray]], Left], {t, 10, 25, 5}]
But in a sense a more direct visualization of combinator expressions is as trees, as for example in:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Grid[{CombinatorEvolutionPlot[{#}, "StatesDisplay"], CombinatorExpressionGraph[#, "MatchHighlighting" -> False, VertexSize -> {"Scaled", 0.08}, ImageSize -> {Automatic, 100}, AspectRatio -> 1/2]} & /@ {s[s[s]], s[s][s], s[s][s][s[s]][s][s]}, Frame -> All, FrameStyle -> LightGray]
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Grid[{{CombinatorEvolutionPlot[{s[s[s[s]][k[s[s[s]][s]][s]]][k[s[s[s]][s]][s][s[s[s]][k[s[s[s]][s]][s]]]]}, "StatesDisplay"]}, {CombinatorExpressionGraph[s[s[s[s]][k[s[s[s]][s]][s]]][k[s[s[s]][s]][s][s[s[s]][k[s[s[s]][s]][s]]]], AspectRatio -> 1/2, "MatchHighlighting" -> False, ImageSize -> 450]}}, Frame -> All, FrameStyle -> LightGray, Alignment -> Center]
Note that these trees can be somewhat simplified by treating them as left or right “associative”, and essentially pulling left or right leaves into the “branch nodes”.
But using the original trees, we can ask for example what the trees for the expressions produced by the evolution of s[s][s][s[s]][s][s] (SSS(SS)SS) are. Here are the results for the first 15 steps:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorExpressionGraph[#, "UpdateHighlighting" -> {}, "MatchHighlighting" -> False, "EvaluationScheme" -> {"Leftmost", "Outermost", 1}, "ShowVertexLabels" -> False, ImageSize -> {Automatic, 80}] & /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 15]
In a different rendering, these become:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; TreePlot[CombinatorExpressionGraph[#, "UpdateHighlighting" -> {}, "EvaluationScheme" -> {"Leftmost", "Outermost", 1}, "ShowVertexLabels" -> False, ImageSize -> {Automatic, 60}], Center] & /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 15]
OK, so these are representations of the combinator expressions on successive steps. But where are the rules being applied at each step? As we’ll discuss in much more detail in the next section, in the way we’ve done things so far we’re always doing just one update at each step. Here’s an example of where the updates are happening in a particular case:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorPlot[#, "FramedMatches"] & /@ NestList[CombinatorStep, s[s][s][s[s[s]]][k][s], 7], "StatesDisplay"] 
Continuing longer we get (note that some lines have wrapped in this display):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[ CombinatorPlot[#, "FramedMatches"] & /@ NestList[CombinatorStep, s[s][s][s[s[s]]][k][s], 20], "StatesDisplay"], .4] 
A feature of the way we’re writing out combinator expressions is that the “input” to any combinator rule always corresponds to a contiguous span within the expression as we display it. So when we show the total size of combinator expressions on each step in an evolution, we can display which part is getting rewritten:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s]]][k][s]], "SizeAndMatches", ImageSize -> 500]
Notice that, as expected, application of the S rule tends to increase size, while the K rule decreases it.
Here is the distribution of rule applications for all the examples we showed above:
We can combine multiple forms of visualization by including depths:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s]]][k][s], "MaxSize" -> 100], "DepthAndUpdatePlot"]
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s]]][k][s]], "DepthAndUpdatePlot", "SpanThickness" -> .6]
We can also do the same in 3D:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s]]][k][s]], "DepthCuboidPlot", Axes -> True]
So what about the underlying trees? Here are the S, K combinator rules in terms of trees:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Column[Map[Row[{#[[1]], Spacer[15], Style["\[LongRightArrow]", FontSize -> 18], Spacer[3], #[[2]]}] &, Map[CombinatorExpressionGraph[#, VertexSize -> .3, "UpdateHighlighting" -> {}, "MatchHighlighting" -> True, ImageSize -> Small] &, {{s[x][y][z], x[z][y[z]]}, {k[x][y], x}}, {2}]], Spacings -> 2]
And here are the updates for the first few steps of the evolution of s[s][s][s[s[s]]][k][s] (SSS(S(SS))KS):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorExpressionGraph[#, "UpdateHighlighting" -> {"Subtrees"}, "EvaluationScheme" -> {"Leftmost", "Outermost", 1}, "MatchHighlighting" -> False, "ShowVertexLabels" -> False, ImageSize -> {Automatic, 100}] & /@ CombinatorEvolveList[s[s][s][s[s[s]]][k][s], 12]
In these pictures we are effectively at each step highlighting the “first” subtree matching s[_][_][_] or k[_][_]. To get a sense of the whole evolution, we can also simply count the number of subtrees with a given general structure (like _[_][_] or _[_[_]]) that occur at a given step (see also below):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[Function[p, Callout[Count[#, p, {0, Infinity}, Heads -> True] & /@ CombinatorFixedPointList[s[s][s][s[s[s]]][k][s]], Quiet[CombinatorExpressionGraph[p, ImageSize -> 30]], Frame -> True]] /@ {_, _[_], _[_][_], _[_[_]], _[_[_[_]]], _[_][_][_]}, Center, Joined -> True, Frame -> True, ImageSize -> 530]
One more indication of the behavior of combinators comes from looking at tree depths. In addition to the total depth (i.e. Wolfram Language Depth) of the combinator tree, one can also look at the depth at which update events happen (here with the total size shown underneath):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Function[{list}, GraphicsColumn[{CombinatorEvolutionPlot[list, "UpdateDepthPlot"], ListStepPlot[LeafCount /@ list, Center, AspectRatio -> 1/4, Frame -> True, PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"]]}]][CombinatorFixedPointList[s[s][s][s[s[s]]][k][s]]]
Here are the depth profiles for the rules shown above:
Not surprisingly, total depth tends to increase when growth continues. But it is notable that—except when termination is near at hand—it seems like (at least with our current updating scheme) updates tend to be made to “high-level” (i.e. low-depth) parts of the expression tree.
When we write out a combinator expression like this size-33 one
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[{CombinatorEvolve[s[s][s][s[s]][s][s], 9]}, "StatesDisplay"]
or show it as a tree
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorExpressionGraph[CombinatorEvolve[s[s][s][s[s]][s][s], 9], "UpdateHighlighting" -> {}, "ShowVertexLabels" -> False]
we’re in a sense being very wasteful, because we’re repeating the same subexpressions many times. In fact, in this particular expression, there are 65 subexpressions altogether—but only 11 distinct ones.
So how can we represent a combinator expression making as much use as possible of the commonality of these subexpressions? Well, instead of using a tree for the combinator expression, we can use a directed acyclic graph (DAG) in which we start from a node representing the whole expression, and then show how it breaks down into shared subexpressions, with each shared subexpression represented by a node.
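The counting behind this can be sketched in a few lines of Python (a minimal stand-in, not the blog's Programs.wl code): represent applications as pair-tuples and compare the total number of subexpressions with the number of distinct ones for the f[x][f[x][f[x]]] example discussed below.

```python
# A minimal sketch (pair-tuples standing in for combinator expressions):
# count total versus distinct subexpressions -- the sharing that the
# DAG representation exploits.

def subexpressions(expr):
    """Yield every subexpression, with multiplicity."""
    yield expr
    if isinstance(expr, tuple):
        head, arg = expr
        yield from subexpressions(head)
        yield from subexpressions(arg)

fx = ("f", "x")                  # f[x]
expr = (fx, (fx, fx))            # f[x][f[x][f[x]]]

all_subs = list(subexpressions(expr))
print(len(all_subs))             # 11 subexpressions in the tree...
print(len(set(all_subs)))        # ...but only 5 distinct ones (5 DAG nodes)
```

Deduplicating the subexpressions is exactly the "hash-consing" that turns the tree into a DAG: one node per distinct subexpression.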
To see how this works, let’s consider first the trivial case of f[x]. We can represent this as a tree—in which the root represents the whole expression f[x], and has one connection to the head f, and another to the argument x:
The expression f[g[x]] is still a tree:
But in f[f[x]] there is a “shared subexpression” (which in this case is just f), and the graph is no longer a tree:
For f[x][f[x][f[x]]], f[x] is a shared subexpression:
For s[s][s][s[s]][s][s] things get a bit more complicated:
For the size-33 expression above, the DAG representation is one whose nodes correspond to the 11 distinct subexpressions of the whole expression that appears at the root.
So what does combinator evolution look like in terms of DAGs? Here are the first 15 steps in the evolution of s[s][s][s[s]][s][s]:
And here are some later steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ParallelTable[Labeled[Graph[CombinatorToDAG[CombinatorEvolve[s[s][s][s[s]][s][s], t]], GraphLayout -> "LayeredDigraphEmbedding", AspectRatio -> 1/2], Text[Style[t, Gray, 12]]], {t, 50, 150, 50}]
Sharing all common subexpressions is in a sense a maximally reduced way to specify a combinator expression. And even when the total size of the expressions is growing roughly exponentially, the number of distinct subexpressions may grow only linearly—here roughly like 1.24 t:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[Length[Union[Level[#, {0, Infinity}, Heads -> True]]] & /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 100], Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], AspectRatio -> 1/4, ImageSize -> 600]
Looking at successive differences suggests a fairly simple pattern:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[Differences[Length[Union[Level[#, {0, Infinity}, Heads -> True]]] & /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 300]], Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], AspectRatio -> 1/6, ImageSize -> 600]
Here are the DAG representations of the result of 50 steps in the evolution of the 46 “growing size-7” combinator expressions above:
It’s notable that some of these show considerable complexity, while others have a rather simple structure.
The world of combinators as we’ve discussed it so far may seem complicated. But we’ve actually been consistently making a big simplification. And it has to do with how the combinator rules are applied.
Consider the combinator expression:
s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]] 
There are 6 places (some overlapping) at which s[_][_][_] or k[_][_] matches some subpart of this expression:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[{CombinatorPlot[s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]], "FramedMatches", "EvaluationScheme" -> {"Leftmost", "Outermost"}]}, "StatesDisplay"]
One can see the same thing in the tree form of the expression (the matches are indicated at the roots of their subtrees):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorExpressionGraph[s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]], "UpdateHighlighting" -> {}, "MatchHighlighting" -> True, AspectRatio -> .8, ImageSize -> 410]
But now the question is: if one’s applying combinator rules, which of these matches should one use?
What we’ve done so far is to follow a particular strategy—usually called leftmost outermost—which can be thought of as looking at the combinator expression as we normally write it out with brackets etc. and applying the first match we encounter in a left-to-right scan, or in this case:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[{CombinatorPlot[s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]], "FramedMatches", "IncludeBackgroundFraming" -> True]}, "StatesDisplay"]
In the Wolfram Language we can find the positions of the matches just using:
expr = s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]] 
pos = Position[expr, s[_][_][_] | k[_][_]]
This shows—as above—where these matches are in the expression:
expr = s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]]; 
pos = Position[expr, s[_][_][_] | k[_][_]];
MapAt[Framed, expr, pos] 
Here are the matches, in the order provided by Position:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; With[{expr = s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]]}, Grid[{Text[#], CombinatorEvolutionPlot[{CombinatorPlot[expr, {"FramedPositions", {#}}, "IncludeArgumentFraming" -> False]}, "StatesDisplay"]} & /@ Position[expr, s[_][_][_] | k[_][_]], Frame -> All]]
The leftmost-outermost match here is the one with position {0}.
In general the series of indices that specify the position of a subexpression say whether to go left or right at each level as one descends the expression tree. An index 0 says to go to the “head”, i.e. the f in f[x], or the f[a][b] in f[a][b][c]; an index 1 says to go to the “first argument”, i.e. the x in f[x], or the c in f[a][b][c]. The length of the list of indices gives the depth of the corresponding subexpression.
We’ll talk in the next section about how leftmost outermost—and other schemes—are defined in terms of indices. But here the thing to notice is that in our example here Position doesn’t give us part {0} first; instead it gives us {0,0,0,1,1,0,1}:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[{CombinatorPlot[s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]], {"FramedPositions", {{0, 0, 0, 1, 1, 0, 1}}}, "IncludeArgumentFraming" -> True, "IncludeBackgroundFraming" -> True]}, "StatesDisplay"]
And what’s happening is that Position is doing a depth-first traversal of the expression tree to look for matches, so it first descends all the way down the left-hand tree branches—and since it finds a match there, that’s what it returns. In the taxonomy we’ll discuss in the next section, this corresponds to a leftmost-innermost scheme, though here we’ll refer to it as “depth first”.
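This position machinery can be sketched in Python (an illustrative stand-in for the Wolfram Language functions, with pair-tuples representing applications and the helper names `a`, `is_redex`, `match_positions` invented here): index 0 descends into the head, 1 into the argument, and matches are collected in the depth-first order just described.

```python
# Find positions of s[_][_][_] and k[_][_] matches in an expression tree,
# using 0 for "head" and 1 for "argument" -- a sketch of what Position does.

def a(head, arg):            # build an application node
    return (head, arg)

def is_redex(e):
    if not isinstance(e, tuple):
        return False
    p = e[0]
    if not isinstance(p, tuple):
        return False
    if p[0] == "k":                                    # k[x][y]
        return True
    return isinstance(p[0], tuple) and p[0][0] == "s"  # s[x][y][z]

def match_positions(e, path=()):
    """Collect redex positions, visiting head then argument then the node
    itself -- so deeper (innermost) matches are listed first."""
    found = []
    if isinstance(e, tuple):
        found += match_positions(e[0], path + (0,))
        found += match_positions(e[1], path + (1,))
    if is_redex(e):
        found.append(path)
    return found

# s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]]
kks = a(a("k", "k"), "s")                 # k[k][s]
Z = a(a(a("s", "s"), kks), "s")           # s[s][k[k][s]][s]
X = a(a(a("s", "s"), "s"), Z)             # s[s][s][s[s][k[k][s]][s]]
ksk = a(a("k", "s"), "k")                 # k[s][k]
Y = a(a(a("s", ksk), "k"), "s")           # s[k[s][k]][k][s]
expr = a(a(a(a("s", X), "s"), "s"), Y)

pos = match_positions(expr)
print(len(pos))     # 6 matches
print(pos[0])       # (0, 0, 0, 1, 1, 0, 1) -- deepest-leftmost comes first
print(min(pos, key=lambda p: (len(p), p)))   # (0,) -- leftmost outermost
```

Taking the first element of the depth-first list gives the "depth first" scheme; taking the minimum by (length, lexicographic order) gives leftmost outermost.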
Now consider the example of s[s][s][k[s][s]]. Here is what it does first with the leftmost-outermost strategy we’ve been using so far, and second with the new depth-first strategy:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Grid[{{Column[{CombinatorEvolutionPlot[CombinatorPlot[#, "FramedMatches"] & /@ CombinatorFixedPointList[s[s][s][k[s][s]]], "StatesDisplay"], Text[Style["standard (leftmost outermost)", Italic, 12]]}, Dividers -> Center, FrameStyle -> Gray, Alignment -> {1 -> Center}], Column[{CombinatorEvolutionPlot[CombinatorPlot[#, "FramedMatches", "EvaluationScheme" -> {"Innermost", "Leftmost", 1}] & /@ CombinatorFixedPointList[s[s][s][k[s][s]], {"Innermost", "Leftmost", 1}], "StatesDisplay"], Text[Style["depth-first (leftmost innermost)", Italic, 12]]}, Dividers -> Center, FrameStyle -> Gray, Alignment -> {1 -> Center}]}}, Dividers -> {{{Directive[Thick, LightGray]}}, {False}}, Alignment -> Top]
There are two important things to notice. First, that in both cases the final result is the same. And second, that the steps taken—and the total number required to get to the final result—are different in the two cases.
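Both observations can be checked with a self-contained Python sketch of S,K reduction (pair-tuples as a stand-in representation, not the blog's Programs.wl; the strategy selectors are defined here): on s[s][s][k[s][s]], both strategies reach the same fixed point, in different numbers of steps.

```python
# Reduce an S,K combinator expression to its fixed point under two
# strategies, and compare the results and step counts.

def is_redex(e):
    """True for subexpressions matching s[_][_][_] or k[_][_]."""
    if not isinstance(e, tuple):
        return False
    p = e[0]
    if not isinstance(p, tuple):
        return False
    if p[0] == "k":                                    # k[x][y]
        return True
    return isinstance(p[0], tuple) and p[0][0] == "s"  # s[x][y][z]

def rewrite(e):
    p, last = e
    if p[0] == "k":                   # k[x][y] -> x
        return p[1]
    (_, x), y = p                     # s[x][y][z] -> x[z][y[z]]
    return ((x, last), (y, last))

def positions(e, path=()):
    """Redex positions in depth-first (innermost-first) order."""
    out = []
    if isinstance(e, tuple):
        out += positions(e[0], path + (0,))
        out += positions(e[1], path + (1,))
    if is_redex(e):
        out.append(path)
    return out

def subexpr(e, path):
    for i in path:
        e = e[i]
    return e

def replace(e, path, new):
    if not path:
        return new
    head, arg = e
    if path[0] == 0:
        return (replace(head, path[1:], new), arg)
    return (head, replace(arg, path[1:], new))

def evolve(e, pick, limit=1000):
    steps = 0
    while steps < limit:
        pos = positions(e)
        if not pos:
            break
        p = pick(pos)
        e = replace(e, p, rewrite(subexpr(e, p)))
        steps += 1
    return e, steps

leftmost_outermost = lambda pos: min(pos, key=lambda p: (len(p), p))
leftmost_innermost = lambda pos: pos[0]   # first in depth-first order

expr = ((("s", "s"), "s"), (("k", "s"), "s"))   # s[s][s][k[s][s]]
fp_out, n_out = evolve(expr, leftmost_outermost)
fp_in, n_in = evolve(expr, leftmost_innermost)
print(fp_out == fp_in)    # True: both strategies give the same fixed point
print((n_out, n_in))      # (3, 2): step counts differ
```

The fixed point in both cases is s[s][s[s]], i.e. (("s","s"),("s","s")) in this representation.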
Let’s consider a larger example: s[s][s][s[s[s]]][k][s] (SSS(S(SS))KS). With our standard strategy we saw above that the evolution of this expression terminates after 89 steps, giving an expression of size 65. With the depth-first strategy the evolution still terminates with the same expression of size 65, but now it takes only 29 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; (Labeled[ListStepPlot[LeafCount /@ CombinatorFixedPointList[s[s][s][s[s[s]]][k][s], First[#]], PlotRange -> All, Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 300], Text[Style[ToLowerCase[#[[2]]], Italic, 12]]]) & /@ {{{"Leftmost", "Outermost"}, "standard (leftmost outermost)"}, {{"Leftmost", "Innermost"}, "depth first (leftmost innermost)"}}
It’s an important feature of combinator expression evolution that when it terminates—whatever strategy one’s used—the result must always be the same. (This “confluence” property—that we’ll discuss more later—is closely related to the concept of causal invariance in our models of physics.)
What happens when the evolution doesn’t terminate? Let’s consider the simplest nonterminating case we found above: s[s][s][s[s]][s][s] (SSS(SS)SS). Here’s how the sizes increase with the two strategies we’ve discussed:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; (Labeled[ListStepPlot[LeafCount /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 60, First@#], PlotRange -> All, Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 300, ScalingFunctions -> "Log"], Text[Style[ToLowerCase[#[[2]]], Italic, 12]]]) & /@ {{{"Leftmost", "Outermost"}, "standard (leftmost outermost)"}, {{"Leftmost", "Innermost"}, "depth first (leftmost innermost)"}}
The difference is more obvious if we plot the ratios of sizes on successive steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; (Labeled[ListStepPlot[Ratios[LeafCount /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 150, First@#]], PlotRange -> All, Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 300, ScalingFunctions -> "Log"], Text[Style[ToLowerCase[#[[2]]], Italic, 12]]]) & /@ {{{"Leftmost", "Outermost"}, "standard (leftmost outermost)"}, {{"Leftmost", "Innermost"}, "depth first (leftmost innermost)"}}
In both these pairs of pictures, we can see that the two strategies start off producing the same results, but soon diverge.
OK, so we’ve looked at two particular strategies for picking which updates to do. But is there a general way to explore all possibilities? It turns out that there is—and it’s to use multiway systems, of exactly the kind that are also important in our Physics Project.
The idea is to make a multiway graph in which there’s an edge to represent each possible update that can be performed from each possible “state” (i.e. combinator expression). Here’s what this looks like for the example of s[s][s][k[s][s]] (SSS(KSS)) above:
Here’s what we get if we include all the “updating events”:
Now each possible sequence of updating events corresponds to a path in the multiway graph. The two particular strategies we used above correspond to these paths:
We see that even at the first step here, there are two possible ways to go. But in addition to branching, there is also merging, and indeed whichever branch one takes, it’s inevitable that one will end up at the same final state—in effect the unique “result” of applying the combinator rules.
Here’s a slightly more complicated case, where there starts out being a unique path, but then after 4 steps, there’s a branch, but after a few more steps, everything converges again to a unique final result:
For combinator expressions of size 4, there’s never any branching in the multiway graph. At size 5 the multiway graphs that occur are:
At size 6 the 2688 possible combinator expressions yield the following multiway graphs, with the one shown above being basically as complicated as it gets:
At size 7, much more starts being able to happen. There are rather regular structures like:
As well as cases like:
This can be summarized by giving just the size of each intermediate expression, here showing the path defined by our standard leftmostoutermost updating strategy:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraph[s[s][k][s[s]][k][k], 15, CombinatorEvolveList], AspectRatio -> 1]
By comparison, here is the path defined by the depth-first strategy above:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraph[s[s][k][s[s]][k][k], 15, CombinatorEvolveList[#1, #2, {"Leftmost", "Innermost"}] &], AspectRatio -> 1]
s[s][s][s[s[k]]][k] (SSS(S(SK))K) is a case where leftmost-outermost evaluation avoids longer paths and larger intermediate expressions
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraph[s[s][s][s[s[k]]][k], 15, "LeftmostOutermost"], AspectRatio -> 1.2]
while depth-first evaluation takes more steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraph[s[s][s][s[s[k]]][k], 15, CombinatorEvolveList[#1, #2, {"Leftmost", "Innermost"}] &], AspectRatio -> 1]
s[s[s]][s][s[s]][s] (S(SS)S(SS)S) gives a larger but more uniform multiway graph (s[s[s[s]]][s][s][s] evolves directly to s[s[s]][s][s[s]][s]):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraph[s[s[s]][s][s[s]][s], 15, "LeftmostOutermost"], AspectRatio -> 1.2, ImageSize -> 480]
Depth-first evaluation gives a slightly shorter path:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraph[s[s[s]][s][s[s]][s], 15, CombinatorEvolveList[#1, #2, {"Leftmost", "Innermost"}] &], AspectRatio -> 1.2, ImageSize -> 450]
Among size-7 expressions, the largest finite multiway graph (with 94 nodes) is for s[s[s[s]]][s][s][k] (S(S(SS))SSK):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraphMinimal[s[s[s[s]]][s][s][k], 18], AspectRatio -> 1.2]
Depending on the path, this can take between 10 and 18 steps to reach its final state:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Histogram[Length /@ FindPath[MWCombinatorGraphMinimal[s[s[s[s]]][s][s][k], 18], "s[s[s[s]]][s][s][k]", "s[k[s[k][s[s[s]][k]]]][k[s[k][s[s[s]][k]]]]", Infinity, All], ChartStyle -> $PlotStyles["Histogram", "ChartStyle"], Frame -> True, FrameTicks -> {True, False}]
Our standard leftmost-outermost strategy takes 12 steps; depth-first takes 13 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Labeled[Graph[MWCombinatorGraphMinimal[s[s[s[s]]][s][s][k], 15, First[#]], AspectRatio -> 1.2, ImageSize -> {Automatic, 300}], Text[Style[#[[2]], Italic, 12]]] & /@ {{CombinatorEvolveList, "standard"}, {CombinatorEvolveList[#1, #2, {"Leftmost", "Innermost"}] &, "depth-first"}}
But among size-7 combinator expressions there are basically two that do not lead to finite multiway systems: s[s[s]][s][s][s][k] (S(SS)SSSK) (which evolves immediately to s[s][s][s[s]][s][k]) and s[s[s]][s][s][s][s] (S(SS)SSSS) (which evolves immediately to s[s][s][s[s]][s][s]).
Let’s consider s[s[s]][s][s][s][k]. For 8 steps there’s a unique path of evolution. But at step 9, the evolution branches
as a result of there being two distinct possible updating events:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[{CombinatorPlot[Last[CombinatorEvolveList[s[s[s]][s][s][s][k], 8]], "FramedMatches", "EvaluationScheme" -> {"Leftmost", "Outermost"}]}, "StatesDisplay"]
Continuing for 14 steps we get a fairly complex multiway system:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraphMinimal[s[s[s]][s][s][s][k], 14], AspectRatio -> 1.2]
But this isn’t “finished”; the nodes circled in red correspond to expressions that are not fixed points, and will evolve further. So what happens with particular evaluation orders?
Here are the results for our two updating schemes:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Labeled[Graph[MWCombinatorGraphMinimal[s[s[s]][s][s][s][k], 14, First[#]], AspectRatio -> 1.2, ImageSize -> {Automatic, 300}], Text[Style[Last[#], Italic, 12]]] & /@ {{CombinatorEvolveList, "standard"}, {CombinatorEvolveList[#1, #2, {"Leftmost", "Innermost"}] &, "depth-first"}}
Something important is visible here: the leftmost-outermost path leads (in 12 steps) to a fixed-point node, while the depth-first path goes to a node that will evolve further. In other words, at least as far as we can see in this multiway graph, leftmost-outermost evaluation terminates while depth-first does not.
There is just a single fixed point visible (s[k]), but there are many “unfinished paths”. What will happen with these? Let’s look at depth-first evaluation. Even though it hasn’t terminated after 14 steps, it does so after 29 steps—yielding the same final result s[k]:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s]]][k][s], {"Leftmost", "Innermost", 1}], "SizeAndMatches", "EvaluationScheme" -> {"Leftmost", "Innermost", 1}, PlotRange -> All]
And indeed it turns out to be a general result (known since the 1940s) that if a combinator evolution path is going to terminate, it must terminate in a unique fixed point, but it’s also possible that the path won’t terminate at all.
Here’s what happens after 17 steps. We see more and more paths leading to the fixed point, but we also see an increasing number of “unfinished paths” being generated:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraphMinimal[s[s[s]][s][s][s][k], 17, CombinatorEvolveList[#1, #2] &, "PathThickness" -> 3.5], AspectRatio -> 1.2]
Let’s now come back to the other case we mentioned above: s[s[s]][s][s][s][s] (S(SS)SSSS). For 12 steps the evolution is unique:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[ CombinatorEvolveList[s[s[s]][s][s][s][s], 12], "StatesDisplay"], .7] 
But at that step there are two possible updating events:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[CombinatorEvolutionPlot[{CombinatorPlot[Last[CombinatorEvolveList[s[s[s]][s][s][s][s], 12]], "FramedMatches", "EvaluationScheme" -> {"Leftmost", "Outermost"}]}, "StatesDisplay"], .7]
And from there on out, there’s rapid growth in the multiway graph:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[MWCombinatorGraphMinimal[s[s[s]][s][s][s][s], 18], AspectRatio -> 1.2]
And what’s important here is that there are no fixed points: there is no possible evaluation strategy that leads to a fixed point. And what we’re seeing here is an example of a general result: if there is a fixed point in a combinator evolution, then leftmost-outermost evaluation will always find it.
In a sense, leftmost-outermost evaluation is the “most conservative” evaluation strategy, with the least propensity for ending up with “runaway evolution”. Its “conservatism” is on display if one compares growth from it and from depth-first evaluation in this case:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Labeled[CombinatorEvolutionPlot[CombinatorEvolveList[s[s[s]][s][s][s][s], 80, Append[#, 1]], "SizeAndMatches", "EvaluationScheme" -> Append[#, 1], PlotRange -> All, ImageSize -> 300], Text[Row[Style[#, Italic, 12] & /@ (ToLowerCase /@ #), Spacer[1]]]] & /@ {{"Leftmost", "Outermost"}, {"Leftmost", "Innermost"}}
Looking at the multiway graph—as well as others—a notable feature is the presence of “long necks”: for many steps every evaluation strategy leads to the same sequence of expressions, and there is just one possible match at each step.
But how long can this go on? For size 8 and below it’s always limited (the longest “neck” at size 7 is for s[s[s]][s][s][s][s] and is of length 13; at size 8 the neck is no longer—again of length 13, for s[s[s[s]][s][s][s][s]] and k[s[s[s]][s][s][s][s]]). But at size 9 there are four cases (3 distinct) for which growth continues forever, but is always unique:
{s[s[s[s]]][s[s[s][s]]][s], s[s[s[s]]][s[s[s]]][s[s]], s[s[s]][s][s[s[s][s]][s]], s[s[s]][k][s[s[s][s]][s]]} 
And as one might expect, all these show rather regular patterns of growth:
The second differences are given in the first and third cases by repeats of (for successive n):
Join[{0, 0, 1}, Table[0, n], {7, 0, 0, 1, 0, 3 (2^(n + 2) - 3)}]
In the second they are given by repeats of
Join[Table[0, n], {2}] 
and in the final case by repeats of
Join[{0, 1}, Table[0, n], {3 2^(n + 3) + 18, 3 2^(n + 3) - 11, 0, 1, 0, 3 2^(n + 3) + 2, 9 2^(n + 2) - 11}]
As a computational language designer, I’ve been chasing this issue for 40 years: what’s the best way to define the order in which one evaluates (i.e. computes) things? The good news is that in a well-designed language (like the Wolfram Language!) it fundamentally doesn’t matter, at least much of the time. But in thinking about combinators—and the way they evolve—evaluation order suddenly becomes a central issue. And in fact it’s also a central issue in our new model of physics—where it corresponds to the choice of reference frame, for relativity, quantum mechanics and beyond.
Let’s talk first about evaluation order as it shows up in the symbolic structure of the Wolfram Language. Imagine you’re doing this computation:
Length[Join[{a, b}, {c, d, e}]] 
The result is unsurprising. But what’s actually going on here? Well, first you’re computing Join[...]:
Join[{a, b}, {c, d, e}] 
Then you’re taking the result, and providing it as argument to Length, which then does its job, and gives the result 5. And in general in the Wolfram Language, if you’re computing f[g[x]] what’ll happen is that x will be evaluated first, followed by g[x], and finally f[g[x]]. (Actually, the head f in f[x] is the very first thing evaluated, and in f[x, y] one evaluates f, then x, then y and then f[x, y].)
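The same innermost-first order shows up in ordinary eager languages; here's a toy Python tracer (a stand-in illustration, not Wolfram Language) that records the order in which nested calls complete:

```python
# With eager evaluation the inner call finishes before the outer function
# ever sees its argument -- the f[g[x]] order described above.
order = []

def traced(name, fn):
    def wrapper(*args):
        result = fn(*args)
        order.append(name)   # record when this call completes
        return result
    return wrapper

g = traced("g", lambda x: x + 1)
f = traced("f", lambda x: x * 2)

print(f(g(3)))   # 8: g runs first (3 -> 4), then f (4 -> 8)
print(order)     # ['g', 'f']
```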
And usually this is exactly what one wants, and what people implicitly expect. But there are cases where it isn’t. For example, let’s say you’ve defined x = 1 (i.e. Set[x,1]). Now you want to say x = 2 (Set[x,2]). If the x evaluated first, you’d get Set[1,2], which doesn’t make any sense. Instead, you want Set to “hold its first argument”, and “consume it” without first evaluating it. And in the Wolfram Language this happens automatically because Set has attribute HoldFirst.
How is this relevant to combinators? Well, basically, the standard evaluation order used by the Wolfram Language is like the depth-first (leftmost-innermost) scheme we described above, while what happens when functions have Hold attributes is like the leftmost-outermost scheme.
But, OK, so if we have something like f[a[x],y] we usually first evaluate a[x], then use the result to compute f[a[x],y]. And that’s pretty easy to understand if a[x], say, immediately evaluates to something like 4 that doesn’t itself need to be evaluated. But what happens when in f[a[x],y], a[x] evaluates to b[x] which then evaluates to c[x] and so on? Do you do the complete chain of “subevaluations” before you “come back up” to evaluate y, and f[...]?
What’s the analog of this for combinators? Basically it’s whether when you do an update based on a particular match in a combinator expression, you then just keep on “updating the update”, or whether instead you go on and find the next match in the expression before doing anything with the result of the update. The “updating the update” scheme is basically what we’ve called our depth-first scheme, and it’s essentially what the Wolfram Language does in its automatic evaluation process.
Imagine we give the combinator rules as Wolfram Language assignments:
s[x_][y_][z_] := x[z][y[z]] 
k[x_][y_] := x 
Then—by virtue of the standard evaluation process in the Wolfram Language—every time we enter a combinator expression these rules will automatically be repeatedly applied, until a fixed point is reached:
s[s][s][s[s[s]]][k][s] 
What exactly is happening “inside” here? If we trace it in a simpler case, we can see that there is repeated evaluation, with a depth-first (aka leftmost-innermost) scheme for deciding what to evaluate:
Dataset[Trace[s[k[k][k]][s][s]]] 
Of course, given the assignment above for s, if one enters a combinator expression—like s[s][s][s[s]][s][s]—whose evaluation doesn’t terminate, there’ll be trouble, much as if we define x = x + 1 (or x = {x}) and ask for x. Back when I was first doing language design people often told me that issues like this meant that a language that used automatic infinite evaluation “just couldn’t work”. But 40+ years later I think I can say with confidence that “programming with infinite evaluation, assuming fixed points” works just great in practice—and in rare cases where there isn’t going to be a fixed point one has to do something more careful anyway.
In the Wolfram Language, though, we can also apply rules explicitly, rather than just having evaluation happen automatically. Let’s say we clear our assignments for s and k:
Clear[s, k] 
Now no transformations associated with s and k will automatically be made:
s[s][s][s[s[s]]][k][s] 
But by using /. (ReplaceAll) we can ask that the s, k transformation rules be applied once:
s[s][s][s[s[s]]][k][s] /. {s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}
With FixedPointList we can go on applying the rule until we reach a fixed point:
FixedPointList[# /. {s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x} &, s[s][s][s[s[s]]][k][s]]
It takes 26 steps—which is different from the 89 steps for our leftmost-outermost evaluation, or the 29 steps for leftmost-innermost (depth-first) evaluation. And, yes, the difference is the result of /. in effect applying rules on the basis of a different scheme than the ones we’ve considered so far.
But, OK, so how can we parametrize possible schemes? Let’s go back to the combinator expression from the beginning of the previous section:
Clear[s,k]; s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]] 
Here are the positions of possible matches in this expression:
Position[s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]], s[_][_][_] | k[_][_]]
An evaluation scheme must define a way to say which of these matches to actually do at each step. In general we can apply pretty much any algorithm to determine this. But a convenient approach is to think about sorting the list of positions by particular criteria, and then for example using the first k positions in the result.
Given a list of positions, there are two obvious potential types of sorting criteria to use: ones based on the lengths of the position specifications, and ones based on their contents. For example, we might choose (as Sort by default does) to sort shorter position specifications first:
Sort[{{0, 0, 0, 1, 1, 0, 1}, {0, 0, 0, 1, 1}, {0, 0, 0, 1}, {0}, {1, 0, 0, 1}, {1}}] 
But what do the shorter position specifications correspond to? They’re the more “outer” parts of the combinator expression, higher on the tree. And when we say we’re using an “outermost” evaluation scheme, what we mean is that we’re considering matches higher on the tree first.
Given two position specifications of the same length, we then need a way to compare these. An obvious one is lexicographic—with 0 sorted before 1. And this corresponds to taking f before x in f[x], or taking the leftmost object first.
We have to decide whether to sort first by length and then by content, or the other way around. But if we enumerate all choices, here’s what we get:
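The two sorting criteria can be sketched directly in Python (an illustrative stand-in, with tuples of 0/1 indices playing the role of the position lists {0,0,0,1,1,0,1} etc. from the example above):

```python
# Sort the six match positions from the earlier example two ways:
# outermost-first versus innermost-first, with left (0 = head) before
# right (1 = argument) breaking ties in both cases.
positions = [(0, 0, 0, 1, 1, 0, 1), (0, 0, 0, 1, 1), (0, 0, 0, 1),
             (0,), (1, 0, 0, 1), (1,)]

# outermost first: shorter position lists (higher on the tree) sort first
outermost = sorted(positions, key=lambda p: (len(p), p))
print(outermost[0])    # (0,) -- the leftmost-outermost match

# innermost first: longer (deeper) position lists sort first
innermost = sorted(positions, key=lambda p: (-len(p), p))
print(innermost[0])    # (0, 0, 0, 1, 1, 0, 1) -- the leftmost-innermost match
```

Taking the first k entries of either sorted list then gives the "up to k matches per step" schemes discussed below.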
And here’s where the first match with each scheme occurs in the expression tree:
So what happens if we use these schemes in our combinator evolution? Here’s the result for the terminating example s[s][s][s[s[s]]][k][s] above, always keeping only the first match with a given sorting criterion, and at each step showing where the matches were applied:
Here now are the results if we allow the first up to 2 matches from each sorted list to be applied:
Here are the results for leftmost outermost, allowing up to k updates at each step, for k from 1 to 8:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Table[CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s]]][k][s], {"Leftmost", "Outermost", k}], "SizeAndMatches", "EvaluationScheme" -> {"Leftmost", "Outermost", k}, PlotRange -> All, ImageSize -> 150], {k, 8}]
And here’s a table of the “time to reach the fixed point” with different evaluation schemes, allowing different numbers of updates at each step:
Not too surprisingly, the time to reach the fixed point always decreases when the number of updates that can be done at each step increases.
For the somewhat simpler terminating example s[s[s[s]]][s][s][s] (S(S(SS))SSS) we can explicitly look at the updates on the trees at each step for each of the different schemes:
OK, so what about a combinator expression that does not terminate? What will these different evaluation schemes do? Here are the results for s[s[s]][s][s][s][s] (S(SS)SSSS) over the course of 50 steps, in each case using only one match at each step:
And here is what happens if we allow successively more matches (selected in leftmostoutermost order) to be used at each step:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Table[CombinatorEvolutionPlot[CombinatorEvolveList[s[s[s]][s][s][s][s], 50, {"Leftmost", "Outermost", k}], "SizeAndMatches", "EvaluationScheme" -> {"Leftmost", "Outermost", k}, PlotRange -> All, ImageSize -> 150], {k, 4}]
Not surprisingly, the more matches allowed, the faster the growth in size (and, yes, looking at pictures like this suggests studying a kind of “continuum limit” or “mean field theory” for combinator evolution):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[Table[Callout[LeafCount /@ CombinatorEvolveList[s[s[s]][s][s][s][s], 50, {"Leftmost", "Outermost", n}], n], {n, 10}], ScalingFunctions -> "Log", Frame -> True]
It’s interesting to look at the ratios of sizes on successive steps for different updating schemes (still for s[s[s]][s][s][s][s]). Some schemes lead to much more “obviously simple” long-term behavior than others:
In fact, just changing the number of allowed matches (here for leftmost outermost) can have similar effects:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Table[ ListStepPlot[ Ratios[LeafCount /@ CombinatorEvolveList[s[s[s]][s][s][s][s], 100, {"Leftmost", "Outermost", k}]], PlotRange -> All, ImageSize -> 150, Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"]], {k, 4}]
What about for other combinator expressions? Different updating schemes can lead to quite different behavior. Here’s s[s[s]][s][s[s[s]]][k] (S(SS)S(S(SS))K):
And here’s s[s[s]][s][s][s][s[k]] (S(SS)SSS(SK))—which for some updating schemes gives purely periodic behavior (something which can’t happen without a k in the original combinator expression):
It’s worth noting that—at least when there are k’s involved—different updating schemes can even change whether the evaluation of a particular combinator expression ever terminates. This doesn’t happen below size 8. But at size 8, here’s what happens for example with s[s][s][s[s]][s][s][k] (SSS(SS)SSK):
For some updating schemes it reaches a fixed point (always just s[k]) but for others it gives unbounded growth. The innermost schemes are the worst in terms of “missing fixed points”; they miss them for 16 size-8 combinator expressions. But (as we mentioned earlier) leftmost outermost has the important feature that it’ll never miss a fixed point if one exists—though sometimes at the risk of taking an overly ponderous route to the fixed point.
So if one’s applying combinator-like transformation rules in practice, what’s the best scheme to use? The Wolfram Language /. (ReplaceAll) operation in effect uses a leftmost-outermost scheme—but with an important wrinkle: instead of just using one match, it uses as many non-overlapping matches as possible.
Consider again the combinator expression:
✕
s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]] 
In leftmost-outermost order the possible matches here are:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Keys[ CombinatorMatches[ s[s[s][s][s[s][k[k][s]][s]]][s][s][s[k[s][k]][k][s]], {"Leftmost", "Outermost"}]] 
But the point is that the match at position {0} overlaps the match at position {0,0,0,1} (i.e. it is a tree ancestor of it). And in general the possible match positions form a partially ordered set, here:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ReverseGraph[ RelationGraph[ ListStrictPrefixQ, {{0, 0, 0, 1, 1, 0, 1}, {0, 0, 0, 1, 1}, {0, 0, 0, 1}, {0}, {1, 0, 0, 1}, {1}}, VertexLabels -> Automatic]]
One possibility is always to use matches at the “bottom” of the partial order—or in other words, the very innermost matches. Inevitably these matches can’t overlap, so they can always be done in parallel, yielding a “parallel innermost” evaluation scheme that is potentially faster (though runs the risk of not finding a fixed point at all).
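As a concrete sketch of how a parallel innermost scheme can work (in Python, for illustration; none of this comes from the post’s Programs.wl package), one can recurse to the bottom of the tree, reduce every redex that has no redex below it, and contract a node itself only when nothing underneath it fired:

```python
# Combinator expressions as nested 2-tuples: "S"/"K" (or variables) at the
# leaves, and (f, x) for the application f[x].
S, K = "S", "K"

def contract(e):
    """Return the contractum if e is an S- or K-redex at its root, else None."""
    if isinstance(e, tuple):
        f, a = e
        if isinstance(f, tuple):
            if f[0] == K:                          # K[x][a] -> x
                return f[1]
            if isinstance(f[0], tuple) and f[0][0] == S:
                return ((f[0][1], a), (f[1], a))   # S[x][y][a] -> x[a][y[a]]
    return None

def parallel_innermost(e):
    """Reduce all innermost redexes at once (they can never overlap).
    Returns (new_expression, reduced_anything)."""
    if not isinstance(e, tuple):
        return e, False
    f2, rf = parallel_innermost(e[0])
    a2, ra = parallel_innermost(e[1])
    if rf or ra:
        return (f2, a2), True    # something below e fired, so e isn't innermost
    c = contract(e)
    if c is not None:
        return c, True           # e itself is an innermost redex
    return e, False
```

For instance, with a K redex nested inside another K redex’s argument, only the inner one fires in a given step, while two disjoint redexes fire together.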
What /. does is effectively to use (in leftmost order) all the matches that appear at the “top” of the partial order. And the result is again typically faster overall updating. In the s[s][s][s[s]][s][s][k] example above, repeatedly applying /. (which is what //. does) finds the fixed point in 23 steps, while it takes ordinary one-replacement-at-a-time leftmost-outermost updating 30 steps—and parallel innermost doesn’t terminate in this case:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Labeled[ListStepPlot[#[[1]], PlotRange -> All, ImageSize -> 150, Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"]], Text[Style[#[[2]], Italic, 12]]] & /@ {{LeafCount /@ NestList[# /. {s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x} &, s[s][s][s[s]][s][s][k], 35], "Wolfram Language /."}, {LeafCount /@ CombinatorEvolveList[s[s][s][s[s]][s][s][k], 35], "leftmost outermost"}}
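Here’s a hedged Python sketch of the distinction (the pair representation and helper names are ours, not the Wolfram Language internals): a single-match leftmost-outermost step alongside a /.-style pass that takes every match at the “top” of the partial order, without rescanning replacements:

```python
# Combinator expressions as nested 2-tuples: "S"/"K" (or variables) at the
# leaves, and (f, x) for the application f[x].
S, K = "S", "K"

def contract(e):
    """Return the contractum if e is an S- or K-redex at its root, else None."""
    if isinstance(e, tuple):
        f, a = e
        if isinstance(f, tuple):
            if f[0] == K:                          # K[x][a] -> x
                return f[1]
            if isinstance(f[0], tuple) and f[0][0] == S:
                return ((f[0][1], a), (f[1], a))   # S[x][y][a] -> x[a][y[a]]
    return None

def lmo_step(e):
    """One single-match leftmost-outermost reduction; None at a fixed point."""
    c = contract(e)
    if c is not None:
        return c
    if isinstance(e, tuple):
        f, a = e
        r = lmo_step(f)
        if r is not None:
            return (r, a)
        r = lmo_step(a)
        if r is not None:
            return (f, r)
    return None

def replace_all(e):
    """One /.-style pass: all non-overlapping outermost matches, leftmost
    first; replacements are not rescanned within the pass."""
    c = contract(e)
    if c is not None:
        return c
    if isinstance(e, tuple):
        return (replace_all(e[0]), replace_all(e[1]))
    return e

def steps_to_fixed_point(e, step):
    """Iterate `step` until nothing changes, counting steps/passes."""
    n = 0
    while True:
        e2 = step(e)
        if e2 is None or e2 == e:
            return e, n
        e, n = e2, n + 1

# Two disjoint K redexes: one /. pass does both; single-match updating needs two.
expr = (((K, "a"), "b"), ((K, "c"), "d"))
```

On this tiny example the /.-style pass reaches the fixed point in one pass versus two single-match steps, the same kind of speedup described above.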
For s[s][s][s[s[s]]][k][s] (SSS(S(SS))KS) parallel innermost does terminate, getting a result in 27 steps compared to 26 for /.—but with somewhat smaller intermediate expressions:
For a case in which there isn’t a fixed point, however, /. will often lead to more rapid growth. For example, with s[s[s]][s][s][s][s] (S(SS)SSSS) it basically gives pure exponential 2^(t/2) growth (and eventually so does parallel innermost):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Labeled[ListStepPlot[#[[1]], PlotRange -> All, ImageSize -> 150, Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"]], Text[Style[#[[2]], Italic, 12]]] & /@ {{LeafCount /@ NestList[# /. {s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x} &, s[s[s]][s][s][s][s], 35], "Wolfram Language /."}, {LeafCount /@ CombinatorEvolveList[s[s[s]][s][s][s][s], 35], "leftmost outermost"}, {LeafCount /@ NestList[ CombinatorStep[Automatic, #, {"Parallel", "Innermost"}] &, s[s[s]][s][s][s][s], 35], "parallel innermost"}}
In A New Kind of Science I gave a bunch of results for combinators with /. updating, finding much of the same kind of behavior for “combinators in the wild” as we’ve seen here.
But, OK, so we’ve got the updating scheme of /. (and its repeated version //.), and we’ve got the updating scheme for automatic evaluation (with and without functions with “hold” attributes). But are there other updating schemes that might also be useful, and if so, how might we parametrize them?
I’ve wondered about this since I was first designing SMP—the forerunner to Mathematica and the Wolfram Language—more than 40 years ago. One place where the issue comes up is in automatic evaluation of recursively defined functions. Say one has a factorial function defined by:
✕
f[1] = 1; f[n_] := n f[n - 1]
What will happen if one asks for f[0]? With the most obvious depth-first evaluation scheme, one will evaluate f[-1], f[-2], etc. forever, never noticing that everything is eventually going to be multiplied by 0, and so the result will be 0. If instead of automatic evaluation one was using //. all would be well—because it’s using a different evaluation order:
✕
f[0] //. f[n_] -> n f[n - 1]
Let’s consider instead the recursive definition of Fibonacci numbers (to make this more obviously “combinator-like” we could for example use Construct instead of Plus):
✕
f[1] = f[2] = 1; f[n_] := f[n - 1] + f[n - 2]
If you ask for f[7] you’re essentially going to be evaluating this tree:
But the question is: how do you do it? The most obvious approach amounts to doing a depth-first scan of the tree—and doing about ϕ^n computations. But if you were to repeatedly use /. instead, you’d be doing more of a breadth-first scan, and it’d take more like O(n^2) computations:
✕
FixedPointList[# /. {f[1] -> 1, f[2] -> 1, f[n_] -> f[n - 1] + f[n - 2]} &, f[7]]
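The breadth-first flavor of repeated /. passes can be sketched in Python (a toy model with assumed term shapes like ("f", n); this is an illustration, not how the Wolfram Language evaluator works): each pass expands one whole level of the call tree, and collapses any sums whose arguments are already numbers:

```python
# Terms: ("f", n) is an unevaluated call f[n]; ("+", a, b) an unevaluated
# sum; plain ints are fully evaluated values.

def rewrite_pass(e):
    """One /.-style pass: top-down; replacements are not rescanned."""
    if isinstance(e, tuple):
        if e[0] == "f":                  # f[1] = f[2] = 1; f[n] -> f[n-1] + f[n-2]
            n = e[1]
            return 1 if n <= 2 else ("+", ("f", n - 1), ("f", n - 2))
        if e[0] == "+":
            a, b = e[1], e[2]
            if isinstance(a, int) and isinstance(b, int):
                return a + b             # collapse a fully evaluated sum
            return ("+", rewrite_pass(a), rewrite_pass(b))
    return e

def evaluate(e):
    """Iterate passes to a fixed point, counting passes: each pass touches
    one 'level' of the tree, so the pass count grows roughly linearly in n
    even though the call tree has exponentially many nodes."""
    passes = 0
    while True:
        e2 = rewrite_pass(e)
        if e2 == e:
            return e, passes
        e, passes = e2, passes + 1

value, passes = evaluate(("f", 7))
```

Here value comes out as the 7th Fibonacci number after a number of passes roughly proportional to the depth of the tree, rather than to its (exponential) node count.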
But how can one parametrize these different kinds of behavior? From our modern perspective in the Wolfram Physics Project, it’s like picking different foliations—or different reference frames—in what amount to causal graphs that describe the dependence of one result on others. In relativity, there are some standard reference frames—like inertial frames parametrized by velocity. But in general it’s not easy to “describe reasonable reference frames”, and we’re typically reduced to just talking about named metrics (Schwarzschild, Kerr, …), much like here we’re talking about “named updating orders” (“leftmost innermost”, “outermost rightmost”, …).
But back in 1980 I did have an idea for at least a partial parametrization of evaluation orders. Here it is from section 3.1 of the SMP documentation:
What I called a “projection” then is what we’d call a function now; a “filter” is what we’d now call an argument. But basically what this is saying is that usually the arguments of a function are evaluated (or “simplified” in SMP parlance) before the function itself is evaluated. (Though note the ahead-of-its-time escape clause about “future parallel-processing implementations” which might evaluate arguments asynchronously.)
But here’s the funky part: functions in SMP also had Smp and Rec properties (roughly, modern “attributes”) that determined how recursive evaluation would be done. And in a first approximation, the concept was that Smp would choose between innermost and outermost, but then in the innermost case, Rec would say how many levels to go before “going outermost” again.
And, yes, nobody (including me) seems to have really understood how to use these things. Perhaps there’s a natural and easy-to-understand way to parametrize evaluation order (beyond the /. vs. automatic evaluation vs. hold attributes mechanism in Wolfram Language), but I’ve never found it. And it’s not encouraging here to see all the complexity associated with different updating schemes for combinators.
By the way, it’s worth mentioning that there is always a way to completely specify evaluation order: just do something like procedural programming, where every “statement” is effectively numbered, and there can be explicit Goto’s that say what statement to execute next. But in practice this quickly gets extremely fiddly and fragile—and one of the great values of functional programming is that it streamlines things by having “execution order” just implicitly determined by the order in which functions get evaluated (yes, with things like Throw and Catch also available).
And as soon as one’s determining “execution order” by function evaluation order, things are immediately much more extensible: without having to specify anything else, there’s automatically a definition of what to do, for example, when one gets a piece of input with more complex structure. If one thinks about it, there are lots of complex issues about when to recurse through different parts of an expression versus when to recurse through reevaluation. But the good news is that at least the way the Wolfram Language is designed, things in practice normally “just work” and one doesn’t have to think about them.
Combinator evaluation is one exception, where, as we have seen, the details of evaluation order can have important effects. And presumably this dependence is in fact connected to why it’s so hard to understand how combinators work. But studying combinator evaluation once again inspires one (or at least me) to try to find convenient parametrizations for evaluation order—perhaps now using ideas and intuition from physics.
In the definitions of the combinators s and k
✕
{s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}
S is basically the one that “builds things up”, while K is the one that “cuts things down”. And historically, in creating and proving things with combinators, it was important to have the balance of both S and K. But what we’ve seen above makes it pretty clear that S alone can already do some pretty complicated things.
So it’s interesting to consider the minimal case of combinators formed solely from s. For size n (i.e. LeafCount equal to n), there are
✕
CatalanNumber[n - 1] == Binomial[2 n - 1, n - 1]/(2 n - 1)
(growing roughly like 4^n for large n) possible such combinators, each of which can be characterized simply in terms of the sequence of bracket openings and closings it involves.
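One can check this count by brute force. Here’s a small Python sketch (independent of the Wolfram Language code in this post) that enumerates the binary trees with n leaves—which are exactly the S-only combinator expressions of size n—and compares against the Catalan numbers:

```python
# All S-only combinator expressions of a given size are just the binary
# trees with that many leaves, so their count is a Catalan number.
from math import comb

def trees(n):
    """All binary trees with n leaves ("S" at each leaf)."""
    if n == 1:
        return ["S"]
    return [(left, right)
            for k in range(1, n)
            for left in trees(k)
            for right in trees(n - k)]

def catalan(m):
    return comb(2 * m, m) // (m + 1)

counts = [len(trees(n)) for n in range(1, 9)]
```

The counts 1, 1, 2, 5, 14, 42, 132, 429, … match CatalanNumber[n - 1].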
Some of these combinators terminate in a limited time, but above size 7 there are ones that do not:
And already there’s something weird: the fraction of nonterminating combinator expressions steadily increases with size, then precipitously drops, then starts climbing again:
But let’s look first at the combinator expressions whose evaluation does terminate. And, by the way, when we’re dealing with S alone, there’s no possibility of some evaluation schemes terminating and others not: they either all terminate, or none do. (This result was established in the 1930s from the fact that the S combinator—unlike K—in effect “conserves variables”, making it an example of the so-called λI-calculus.)
With leftmost-outermost evaluation, here are the halting time distributions, showing roughly exponential falloff with gradual broadening:
And here are the leftmost-outermost “champions”—the combinator expressions that survive longest with leftmost-outermost evaluation before terminating:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Text[ Grid[Prepend[ Append[#, CombinatorTraditionalForm[Last[#]]] & /@ {{2, 0, s[s]}, {3, 0, s[s][s]}, {4, 1, s[s][s][s]}, {5, 2, s[s][s][s][s]}, {6, 4, s[s][s][s][s][s]}, {7, 15, s[s[s[s]]][s][s][s]}, {8, 15, s[s[s[s[s]]][s][s][s]]}, {9, 86, s[s[s]][s[s]][s[s]][s][s]}, {10, 1109, s[s[s][s]][s[s]][s][s][s][s]}, {11, 1109, s[s[s[s][s]][s[s]][s][s][s][s]]}, {12, 1444, s[s[s]][s[s]][s[s][s][s][s][s]][s]}, {13, 6317, s[s[s]][s[s]][s[s][s][s][s][s][s]][s]}, {14, 23679, s[s[s]][s[s]][s[s][s][s][s][s][s][s]][s]}, {15, 131245, s[s[s]][s[s]][s[s][s][s][s][s][s][s][s]][s]}, {16, 454708, s[s[s]][s[s]][s[s][s][s][s][s][s][s][s][s]][s]} }, Style[#, Italic] & /@ {"size", "max steps", "expression", ""}], Frame -> All, Background -> {{GrayLevel[0.9]}, {GrayLevel[0.9]}, None}]]
The survival (AKA halting) times grow roughly exponentially with size—and notably much slower than what we saw in the SK case above:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[ Transpose[{Range[4, 16], {1, 2, 4, 15, 15, 86, 1109, 1109, 1444, 6317, 23679, 131245, 454708}}], Center, ScalingFunctions -> "Log", AspectRatio -> .4, Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 310]
How do the champions actually behave? Here’s what happens for a sequence of sizes:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorFixedPointList[#], "SizeAndMatches", ImageSize -> 250, Epilog -> Text[Style[First[#], Directive[FontSize -> 12, GrayLevel[0.25], FontFamily -> "Source Sans Pro"]], Scaled[{.25, 1}], {1.5, 1.4}]] & /@ {{"size 8", s[s[s[s[s]]][s][s][s]]}, {"size 9", s[s[s]][s[s]][s[s]][s][s]}, {"size 10", s[s[s][s]][s[s]][s][s][s][s]}, {"size 11", s[s[s[s][s]][s[s]][s][s][s][s]]}}
There’s progressive increase in size, and then splat: the evolution terminates. Looking at the detailed behavior (here for size 9 with a “right-associative rendering”) shows that what’s going on is quite systematic:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorFixedPointList[ s[s[s]][s[s]][s[s]][s][s]], "ArrayPlotRightAssociative", AspectRatio -> .3, "IncludeUpdateHighlighting" -> False]
The differences again reflect the systematic character of the behavior:
And it seems that what’s basically happening is that the combinator is acting as a kind of digital counter that’s going through an exponential number of steps—and ultimately building a very regular tree structure:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorExpressionGraph[ CombinatorFixedPoint[s[s[s]][s[s]][s[s][s][s][s][s]][s]], AspectRatio -> .25, "ShowVertexLabels" -> False, VertexSize -> Large]
By the way, even though the final state is the same, the evolution is quite different with different evaluation schemes. And for example our “leftmost-outermost champions” actually terminate much faster with depth-first evaluation:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorFixedPointList[#, {"Leftmost", "Innermost", 1}], "SizeAndMatches", "EvaluationScheme" -> {"Leftmost", "Innermost", 1}, ImageSize -> 190, PlotRange -> All, Epilog -> Text[Style[First[#], Directive[FontSize -> 12, GrayLevel[0.25], FontFamily -> "Source Sans Pro"]], Scaled[{.25, 1}], {1, 1.4}]] & /@ {{"size 8", s[s[s[s[s]]][s][s][s]]}, {"size 9", s[s[s]][s[s]][s[s]][s][s]}, {"size 10", s[s[s][s]][s[s]][s][s][s][s]}}
Needless to say, there can be different depth-first (AKA leftmost-innermost) champions, although—somewhat surprisingly—some turn out to be the same (but not sizes 8, 12, 13):
We can get a sense of what happens with all possible evaluation schemes if we look at the multiway graph. Here is the result for the size-8 leftmost-outermost champion s[s[s[s]]][s][s][s]:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Function[sch, Labeled[Graph[ MWCombinatorGraphMinimal[s[s[s[s]]][s][s][s], 15, CombinatorEvolveList[#1, #2, sch] &, NodeSizeMultiplier -> .4], AspectRatio -> 1, ImageSize -> 250], Text[Style[Row[ToLowerCase /@ sch, Spacer[1]], Italic, 12]]]] /@ {{"Leftmost", "Outermost"}, {"Leftmost", "Innermost"}}
The number of expressions at successive levels in the multiway graph starts off growing roughly exponentially, but after 12 steps it rapidly drops—eventually yielding a finite graph with 74 nodes (leftmost outermost is the “slowest” evaluation scheme—taking the maximum 15 steps possible):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[ Length /@ ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}, s[s[s[s]]][s][s][s], 16], Center, AspectRatio -> .4, Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 280]
Even for the size-9 champion the full multiway graph is too large to construct explicitly. After 15 steps the number of nodes has reached 6598, and seems to be increasing roughly exponentially—even though after at most 86 steps all “dangling ends” must have resolved, and the system must reach its fixed point:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[ MWCombinatorGraphMinimal[s[s[s]][s[s]][s[s]][s][s], 12, NodeSizeMultiplier -> 1.5], AspectRatio -> 1]
What happens with s combinator expressions that do not terminate? We already saw above some examples of the kind of growth in size one observes (say with leftmost-outermost evaluation). Here are examples with roughly exponential behavior, with differences between successive steps shown on a log scale:
And here are examples of differences shown on a linear scale:
Sometimes there are fairly long transients, but what’s notable is that among all the 8629 infinite-growth combinator expressions up to size 11 there are none whose evolution seems to show long-term irregularity in overall size. Of course, something like rule 30 also doesn’t show irregularity in overall size; one has to look “inside” to see complex behavior—and difficulties of visualization make that hard to do systematically in the case of combinators.
But looking at the pictures above there seem to be a “limited number of ways” that combinator expressions grow without bound. Sometimes it’s rather straightforward to see how the infinite growth happens. Here’s a particularly “pure play” example: the size-9 case s[s[s[s]]][s[s[s]]][s[s]] (S(S(SS))(S(SS))(SS)) which evolves the same way with all evaluation schemes (in the pictures, the root of the match at each step is highlighted):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorExpressionGraph[#, "ShowVertexLabels" -> False, "UpdateHighlighting" -> {"Leftmost", "Outermost", 1}, "MatchHighlighting" -> True, ImageSize -> {Automatic, 120}] & /@ CombinatorEvolveList[s[s[s[s]]][s[s[s]]][s[s]], 6]
Looking at the subtree “below” each match we see
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorExpressionGraph[#, "ShowVertexLabels" -> False, "MatchHighlighting" -> True, ImageSize -> {Automatic, 60}] & /@ (First[ Extract[#, {First[Keys[CombinatorMatches[#]]]}]] & /@ CombinatorEvolveList[s[s[s[s]]][s[s[s]]][s[s]], 20])
and it is clear that there is a definite progression which will keep going forever, leading to infinite growth.
But if one looks at the corresponding sequence of subtrees for a case like the smallest infinite-growth combinator expression s[s][s][s[s]][s][s] (SSS(SS)SS), it’s less immediately obvious what’s going on:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorExpressionGraph[#, "ShowVertexLabels" -> False, "UpdateHighlighting" -> "Nodes", "EvaluationScheme" -> {"Leftmost", "Outermost", 1}, ImageSize -> {Automatic, 60}] & /@ (First[ Extract[#, {First[Keys[CombinatorMatches[#]]]}]] & /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 20])
But there’s a rather remarkable result from the end of the 1990s that gives one a way to “evaluate” combinator expressions, and tell whether they’ll lead to infinite growth—and in particular to be able to say directly from an initial combinator expression whether it’ll continue evolving forever, or will reach a fixed point.
One starts by writing a combinator expression like s[s[s[s]]][s[s[s]]][s[s]] (S(S(SS))(S(SS))(SS)) in an explicitly “functional” form:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; FunctionToApplication[s[s[s[s]]][s[s[s]]][s[s]]] /. Application -> f
Then one imagines f[x,y] as being a function with explicit (say, integer) values. One replaces s by some explicit value (say an integer), then defines values for f[1,1], f[1,2], etc.
As a first example, let’s say that we take s = 1 and f[x_,y_]=x+y. Then we can “evaluate” the combinator expression above as
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; SCombinatorAutomatonTreeGeneral[ s[s[s[s]]][s[s[s]]][s[s]], Application[x_, y_] -> x + y, 1, VertexSize -> .6]
and in this case the value at the root just counts the total size (i.e. LeafCount).
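The general scheme is easy to sketch in Python (an illustration of the idea only, not the SCombinatorAutomatonTreeGeneral function used above): evaluate the tree bottom-up, giving each leaf a value and combining children with f:

```python
# A sketch of "evaluating" a combinator tree: give every leaf a value and
# combine child values with a binary function f. With leaf value 1 and
# f = addition, the value at the root is just the leaf count.
S = "S"

def eval_tree(e, leaf, f):
    """Bottom-up evaluation: leaves get `leaf`; each application node
    combines the values of its two children with f."""
    if isinstance(e, tuple):
        return f(eval_tree(e[0], leaf, f), eval_tree(e[1], leaf, f))
    return leaf

# s[s[s[s]]][s[s[s]]][s[s]]  (S(S(SS))(S(SS))(SS)), written as nested pairs
expr = (((S, (S, (S, S))), (S, (S, S))), (S, S))
size = eval_tree(expr, 1, lambda x, y: x + y)
```

For this size-9 expression the root value comes out as 9, the LeafCount; swapping in a different f is what lets one probe other properties of the tree.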
But by changing f one can probe other aspects of the combinator expression tree. And what was discovered in 2000 is that there’s a complete way to test for infinite growth by setting up 39 possible values, and making f[x,y] be a particular (“tree automaton”) “multiplication table” for these values:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; gridColors[x_] := Blend[{Hue[0.1, 0.89, 0.984], Hue[0.16, 0.51, 0.984], Hue[0.04768041237113402, 0, 0.984]}, x]
Grid[MapIndexed[ If[#2[[1]] === 1 || #2[[2]] === 1, Item[Style[#1, 9, Bold, GrayLevel[.35]], Background -> GrayLevel[.9]], If[#1 == 38, Item["", Background -> RGBColor[0.984, 0.43, 0.208], FrameStyle -> Darker[RGBColor[0.984, 0.43, 0.208], .2]], Item[Style[#1, 9, GrayLevel[0, .6]], Background -> gridColors[(38 - #1)/38], FrameStyle -> Darker[RGBColor[0.984, 0.43, 0.208], .2]]]] &, Prepend[MapIndexed[Flatten[Prepend[#1, #2 - 1]] &, Table[i\[Application]j /. maintablesw, {i, 0, 38}, {j, 0, 38}]], Flatten[Prepend[Range[0, 38], "\[Application]"]]], {2}], Spacings -> {.25, 0}, ItemSize -> {1, 1}, Frame -> All, FrameStyle -> GrayLevel[.6], BaseStyle -> "Text"]
Bright red (value 38) represents the presence of an infinite growth seed—and once one exists, f makes it propagate up to the root of the tree. And with this setup, if we replace s by the value 0, the combinator expression above can be “evaluated” as:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; SCombinatorAutomatonTree[s[s[s[s]]][s[s[s]]][s[s]], VertexSize -> .5]
At successive steps in the evolution we get:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; SCombinatorAutomatonTree[#, VertexSize -> .8, ImageSize -> {Automatic, 140}] & /@ CombinatorEvolveList[s[s[s[s]]][s[s[s]]][s[s]], 5]
Or after 8 steps:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; SCombinatorAutomatonTree[ CombinatorEvolve[s[s[s[s]]][s[s[s]]][s[s]], 8], VertexSize -> .8]
The “lowest 38” is always at the top of the subtree where the match occurs, serving as a “witness” of the fact that this subtree is an infinite growth seed.
Here are some sample size-7 combinator expressions, showing how the two that lead to infinite growth are identified:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Labeled[SCombinatorAutomatonTree[#, VertexSize -> .5, ImageSize -> {Automatic, 200}], Style[Text[#], 12]] & /@ {s[s][s][s][s][s][s], s[s[s][s][s][s]][s], s[s[s]][s][s][s][s], s[s][s][s[s]][s][s], s[s[s[s]][s[s]]][s], s[s][s[s[s][s]]][s], s[s[s[s]]][s[s[s]]]}
If we were dealing with combinator expressions involving both S and K we know that it’s in general undecidable whether a particular expression will halt. So what does it mean that there’s a decidable way to determine whether an expression involving only S halts?
One might assume it’s a sign that S alone is somehow computationally trivial. But there’s more to this issue. In the past, it has often been thought that a “computation” must involve starting with some initial (“input”) state, then ending up at a fixed point corresponding to a final result. But that’s certainly not how modern computing in practice works. The computer and its operating system do not completely stop when a particular computation is finished. Instead, the computer keeps running, but the user is given a signal to come and look at something that provides the output for the computation.
There’s nothing fundamentally different about how computation universality works in a setup like this; it’s just a “deployment” issue. And indeed the simplest possible examples of universality in cellular automata and Turing machines have been proved this way.
So how might this work for S combinator expressions? Basically any sophisticated computation has to live on top of an infinite combinator growth process. Or, put another way, the computation has to exist as some kind of “transient” of potentially unbounded length, that in effect “modulates” the infinite growth “carrier”.
One would set up a program by picking an appropriate combinator expression from the infinite collection that lead to infinite growth. Then the evolution of the combinator expression would “run” the program. And one would use some computationally bounded process (perhaps a bounded version of a tree automaton) to identify when the result of the computation is ready—and one would “read it out” by using some computationally bounded “decoder”.
My experience in the computational universe—as captured in the Principle of Computational Equivalence—is that once the behavior of a system is not “obviously simple”, the system will be capable of sophisticated computation, and in particular will be computation universal. The S combinator is a strange and marginal case. At least in the ways we have looked at it here, its behavior is not “obviously simple”. But we have not quite managed to identify things like the kind of seemingly random behavior that occurs in a system like rule 30, that are a hallmark of sophisticated computation, and probably computation universality.
There are really two basic possibilities. Either the S combinator alone is capable of sophisticated computation, and there is, for example, computational irreducibility in determining the outcome of a long S combinator evolution. Or the S combinator is fundamentally computationally reducible—and there is some approach (and maybe some new direction in mathematics) that “cracks it open”, and allows one to readily predict everything that an S combinator expression will do.
I’m not sure which way it’s going to go—although my almost-uniform experience over the last four decades has been that when I think some system is “too simple” to “do anything interesting” or show sophisticated computation, it eventually proves me wrong, often in bizarre and unexpected ways. (In the case of the S combinator, a possibility—like I found for example in register machines—is that sophisticated computation might first reveal itself in very subtle effects, like seemingly random off-by-one patterns.)
But whatever happens, it’s amazing that 100 years after the invention of the S combinator there are still such big mysteries about it. In his original paper, Moses Schönfinkel expressed his surprise that something as simple as S and K were sufficient to achieve what we would now call universal computation. And it will be truly remarkable if in fact one can go even further, and S alone is sufficient: a minimal example of universal computation hiding in plain sight for a hundred years.
(By the way, in addition to ordinary “deterministic” combinator evolution with a particular evaluation scheme, one can also consider the “nondeterministic” case corresponding to all possible paths in the multiway graph. And in that case there’s a question of categorizing infinite graphs obtained by nonterminating S combinator expressions—perhaps in terms of transfinite numbers.)
Not long ago one wouldn’t have had any reason to think that ideas from physics would relate to combinators. But our Wolfram Physics Project has changed that. And in fact it looks as if methods and intuition from our Physics Project—and the connections they make to things like relativity—may give some interesting new insights into combinators, and may in fact make their operation a little less mysterious.
In our Physics Project we imagine that the universe consists of a very large number of abstract elements (“atoms of space”) connected by relations—as represented by a hypergraph. The behavior of the universe—and the progression of time—is then associated with repeated rewriting of this hypergraph according to a certain set of (presumably local) rules.
It’s certainly not the same as the way combinators work, but there are definite similarities. In combinators, the basic “data structure” is not a hypergraph, but a binary tree. But combinator expressions evolve by repeated rewriting of this tree according to rules that are local on the tree.
There’s a kind of intermediate case that we’ve often used as a toy model for aspects of physics (particularly quantum mechanics): string substitution systems. A combinator expression can be written out “linearly” (say as s[s][s][s[s[s]]][k][s]), but really it’s treestructured and hierarchical. In a string substitution system, however, one just has plain strings, consisting of sequences of characters, without any hierarchy. The system then evolves by repeatedly rewriting the string by applying some local string substitution rule.
For example, one could have a rule like {"A" → "BBB","BB" → "A"}. And just like with combinators, given a particular string—like "BBA"—there are different possible choices about where to apply the rule. And—again like with combinators—we can construct a multiway graph to represent all possible sequences of rewritings:
✕
Graph[ResourceFunction["MultiwaySystem"][{"A" -> "BBB", "BB" -> "A"}, "BBA", 5, "StatesGraph"], AspectRatio -> 1]
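For concreteness, here’s a small standalone Python sketch (the function name is ours, not part of the MultiwaySystem resource function) that enumerates the single-step successors used to build such a multiway graph:

```python
def successors(s, rules):
    """All strings reachable by applying one rule at one position in s."""
    out = set()
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

rules = [("A", "BBB"), ("BB", "A")]
```

Starting from "BBA" the one-step successors are "AA" (rewriting the BB) and "BBBBB" (rewriting the A); iterating successors level by level generates the states graph.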
And again as with combinators we can define a particular “evaluation order” that determines which of the possible updates to the string to apply at each step—and that defines a path through the multiway graph.
For strings there aren’t really the same notions of “innermost” and “outermost”, but there are “leftmost” and “rightmost”. Leftmost updating in this case would give the evolution history
✕
NestList[StringReplace[#, {"A" -> "BBB", "BB" -> "A"}, 1] &, "BBA", 10]
which corresponds to the path:
✕
With[{g = Graph[ResourceFunction["MultiwaySystem"][{"A" -> "BBB", "BB" -> "A"}, "BBA", 5, "StatesGraph"], AspectRatio -> 1]}, HighlightGraph[g, Style[Subgraph[g, NestList[StringReplace[#, {"A" -> "BBB", "BB" -> "A"}, 1] &, "BBA", 10]], Thick, RGBColor[0.984, 0.43, 0.208]]]]
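A minimal Python sketch of this leftmost updating scheme (assuming the scan-from-the-left, first-matching-rule semantics of StringReplace; the helper name is ours):

```python
def leftmost_step(s, rules):
    """Apply whichever rule matches leftmost in s (rule order breaks ties),
    replacing only that one occurrence."""
    for i in range(len(s)):
        for lhs, rhs in rules:
            if s.startswith(lhs, i):
                return s[:i] + rhs + s[i + len(lhs):]
    return s

rules = [("A", "BBB"), ("BB", "A")]
path = ["BBA"]
for _ in range(3):
    path.append(leftmost_step(path[-1], rules))
```

From "BBA" this scheme rewrites the BB at position 0 first, giving "AA", then "BBBA", then "ABA"—tracing out one particular path through the multiway graph.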
Here’s the underlying evolution corresponding to that path, with the updating events indicated in yellow:
✕
ResourceFunction["SubstitutionSystemCausalPlot"][ ResourceFunction["SubstitutionSystemCausalEvolution"][{"A" -> "BBB", "BB" -> "A"}, "BBA", 8, "First"], "CellLabels" -> True, "ColorTable" -> {Hue[0.6296304159168616, 0.13, 0.9400000000000001], Hue[0.6296304159168616, 0.07257971950090639, 0.9725480985324374, 1.]}, ImageSize -> 120]
But now we can start tracing the “causal dependence” of one event on another. What characters need to have been produced as “output” from a preceding event in order to provide “input” to a new event? Let’s look at a case where we have a few more events going on:
✕
ResourceFunction["SubstitutionSystemCausalPlot"][BlockRandom[SeedRandom[33242]; ResourceFunction["SubstitutionSystemCausalEvolution"][{"A" -> "BBB", "BB" -> "A"}, "BBA", 8, {"Random", 3}]], "CellLabels" -> True, "ColorTable" -> {Hue[0.6296304159168616, 0.13, 0.9400000000000001], Hue[0.6296304159168616, 0.07257971950090639, 0.9725480985324374, 1.]}, ImageSize -> 180]
But now we can draw a causal graph that shows causal relationships between events, i.e. which events have to have happened in order to enable subsequent events:
✕
With[{gr = ResourceFunction["SubstitutionSystemCausalPlot"][BlockRandom[SeedRandom[33242]; ResourceFunction["SubstitutionSystemCausalEvolution"][{"A" -> "BBB", "BB" -> "A"}, "BBA", 8, {"Random", 3}]], "CausalGraph" -> True, "CausalGraphStyle" -> Directive[Thick, Red], "ColorTable" -> {Hue[0.6296304159168616, 0.13, 0.9400000000000001], Hue[0.6296304159168616, 0.07257971950090639, 0.9725480985324374, 1.]}, ImageSize -> 180]}, Prepend[gr[[2 ;; -1]], Replace[gr[[1]], Arrow[__] -> {}, Infinity]~Join~{RGBColor[0.984, 0.43, 0.208], Thickness[0.01], Cases[gr, Arrow[__], Infinity]}]]
And at a physics level, if we’re an observer embedded in the system, operating according to the rules of the system, all we can ultimately “observe” is the “disembodied” causal graph, where the nodes are events, and the edges represent the causal relationships between these events:
✕
ResourceFunction["SubstitutionSystemCausalGraph"][{"A" -> "BBB", "BB" -> "A"}, "BBA", 5]
So how does this relate to combinators? Well, we can also create causal graphs for those—to get a different view of “what’s going on” during combinator evolution.
There is significant subtlety in exactly how “causal dependence” should be defined for combinator systems (when is a copied subtree “different”?, etc.). Here I’ll use a straightforward definition that’ll give us an indication of how causal relationships in combinators work, but that’s going to require further refinement to fit in with other definitions we want.
Imagine we just write out combinator expressions in a linear way. Then here’s a combinator evolution:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[CombinatorEvolutionPlot[CombinatorPlot[#, "FramedMatches", "EvaluationScheme" -> {"Leftmost", "Innermost", 1}] & /@ CombinatorEvolveList[s[s][s][s[s[s][k]]][k], 36, {"Leftmost", "Innermost", 1}], "StatesDisplay"], .65]
To understand causal relationships we need to trace “what gets rewritten to what”—and which previous rewriting events a given rewriting event “takes its input from”. It’s helpful to look at the rewriting process above in terms of trees:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorExpressionGraph[#, "UpdateHighlighting" -> {"Nodes", "Subtrees"}, "ShowVertexLabels" -> False, "EvaluationScheme" -> {"Leftmost", "Innermost", 1}, ImageSize -> {Automatic, 50}] & /@ CombinatorEvolveList[s[s][s][s[s[s][k]]][k], 36, {"Leftmost", "Innermost", 1}]
Going back to a textual representation, we can show the evolution in terms of “states”, and the “events” that connect them. Then we can trace (in orange) what the causal relationships between the events are:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]]} -> (EvaluationOrderTake[#, {"Leftmost", "Outermost", 3}] &), s[s][s][s[s[s][k]]][k], 6, "EvolutionCausalGraph"]
Continuing this for a few more steps we get:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]]} -> (EvaluationOrderTake[#, {"Leftmost", "Outermost", 3}] &), s[s][s][s[s[s][k]]][k], 20, "EvolutionCausalGraphStructure"]
Now keeping only the causal graph, and continuing until the combinator evolution terminates, we get:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorCausalGraph[s[s][s][s[s[s][k]]][k], 50, {"Leftmost", "Innermost", 1}, AspectRatio -> 3]
It’s interesting to compare this with a plot that summarizes the succession of rewriting events:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[CombinatorFixedPointList[s[s][s][s[s[s][k]]][k], {"Leftmost", "Innermost", 1}], "SizeAndMatches", "EvaluationScheme" -> {"Leftmost", "Innermost", 1}, ImageSize -> 190, PlotRange -> All]
So what are we actually seeing in the causal graph? Basically it’s showing us what “threads of evaluation” occur in the system. When there are different parts of the combinator expression that are in effect getting updated independently, we see multiple causal edges running in parallel. But when there’s a synchronized evaluation that affects the whole system, we just see a single thread—a single causal edge.
The causal graph is in a sense giving us a summary of the structure of the combinator evolution, with many details stripped out. And even when the size of the combinator expression grows rapidly, the causal graph can still stay quite simple. So, for example, the growing combinator s[s][s][s[s]][s][s] has a causal graph that forms a linear chain with simple “side loops” that get systematically further apart:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorCausalGraph[s[s][s][s[s]][s][s], 40, {"Leftmost", "Outermost", 1}, AspectRatio -> 3]
Sometimes it seems that the growth dies out because different parts of the combinator system become causally disconnected from each other:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; {CombinatorCausalGraph[s[s[s]][s[s]][s[s]][s][s], 200, AspectRatio -> 3], Labeled[CombinatorEvolutionPlot[CombinatorEvolveList[#, 100], "SizeAndMatches", ImageSize -> 190, PlotRange -> All], Style[Text[#], 12]] &[s[s[s]][s[s]][s[s]][s][s]]}
Here are a few other examples:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ParallelMap[{CombinatorCausalGraph[#, 30, AspectRatio -> 3], Labeled[CombinatorEvolutionPlot[CombinatorEvolveList[#, 50], "SizeAndMatches", ImageSize -> 190, PlotRange -> All], Style[Text[#], 12]]} &, {s[s][s][s[s[s][s]]][k], s[s][s][s[s[s[k]]]][s][s[k]], s[s][s][s[s]][s][k[s]], s[s][s][s[s]][s][s[k]]}]
But do such causal graphs depend on the evaluation scheme used? This turns out to be a subtle question that depends sensitively on definitions of identity for abstract expressions and their subexpressions.
The first thing to say is that combinators are confluent, in the sense that different evaluation schemes—even if they take different paths—must always give the same final result whenever the evolution of a combinator expression terminates. And closely related to this is the fact that in the multiway graph for a combinator system, any branching must be accompanied by subsequent merging.
For both string and hypergraph rewriting rules, the presence of these properties is associated with another important property that we call causal invariance. And causal invariance is precisely the property that causal graphs produced by different updating orders must always be isomorphic. (And in our model of physics, this is what leads to relativistic invariance, general covariance, objective measurement in quantum mechanics, etc.)
So is the same thing true for combinators? It’s complicated. Both string and hypergraph rewriting systems have an important simplifying feature: when you update something in them, it’s reasonable to think of the thing you update as being “fully consumed” by the updating event, with a “completely new thing” being created as a result of the event.
But with combinators that’s not such a reasonable picture. Because when there’s an updating event, say for s[x][y][z], x can be a giant subtree that you end up “just copying”, without, in a sense, “consuming” and “reconstituting”. In the case of strings and hypergraphs, there’s a clear distinction between elements of the system that are “involved in an update”, and ones that aren’t. But in a combinator system, it’s not so obvious whether nodes buried deep in a subtree that’s “just copied” should be considered “involved” or not.
There’s a complicated interplay with definitions used in constructing multiway graphs. Consider a string rewriting system. Start from a particular state and then apply rewriting rules in all possible ways:
✕
LayeredGraphPlot[ResourceFunction["MultiwaySystem"][{"A" -> "AB", "BB" -> "A"}, "A", 5, "EvolutionGraphUnmerged"], AspectRatio -> .4]
Absent anything else, this will just generate a tree of results. But the crucial idea behind multiway graphs is that when states are identical, they should be merged, in this case giving:
✕
Graph[ResourceFunction["MultiwaySystem"][{"A" -> "AB", "BB" -> "A"}, "A", 5, "StatesGraph"], AspectRatio -> .4]
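The “merge identical states” construction is just breadth-first generation with deduplication. Here’s a minimal Python sketch of it (my own illustrative code, not the MultiwaySystem implementation):

```python
def multiway_graph(s0, rules, steps):
    """Generate the states graph breadth-first, merging syntactically
    identical string states rather than growing a tree."""
    states, edges, frontier = {s0}, set(), {s0}
    for _ in range(steps):
        nxt = set()
        for s in frontier:
            for lhs, rhs in rules:
                i = s.find(lhs)
                while i != -1:
                    t = s[:i] + rhs + s[i + len(lhs):]
                    edges.add((s, t))
                    if t not in states:
                        nxt.add(t)
                    i = s.find(lhs, i + 1)
        states |= nxt
        frontier = nxt
    return states, edges
```

Without the `states` set one would get the unmerged tree; with it, branches that reach the same string (like "AA" here) are identified as a single node.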
For strings it’s very obvious what “being identical” means. For hypergraphs, the natural definition is hypergraph isomorphism. What about for combinators? Is it pure tree isomorphism, or should one take into account the “provenance” of subtrees?
(There are also questions like whether one should define the nodes in the multiway graph in terms of “instantaneous states” at all, or whether instead they should be based on “causal graphs so far”, as obtained with particular event histories.)
These are subtle issues, but it seems pretty clear that with appropriate definitions combinators will show causal invariance, so that (appropriately defined) causal graphs will be independent of evaluation scheme.
By the way, in addition to constructing causal graphs for particular evolution histories, one can also construct multiway causal graphs representing all possible causal relationships both within and between different branches of history. This shows the multiway graph for the (terminating) evolution of s[s][s][s[s[k]]][k], annotated with causal edges:
✕
ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}, s[s][s][s[s[k]]][k], 15, "EvolutionCausalGraphStructure", GraphLayout -> "LayeredDigraphEmbedding", AspectRatio -> 2]
And here’s the multiway causal graph alone in this case:
✕
ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}, s[s][s][s[s[k]]][k], 15, "CausalGraphStructure", AspectRatio -> 1]
(And, yes, the definitions don’t all quite line up here, so the individual instances of causal graphs that can be extracted here aren’t all the same, as causal invariance would imply they should be.)
The multiway causal graph for s[s[s]][s][s][s][s] shows a veritable explosion of causal edges:
✕
ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}, s[s[s]][s][s][s][s], 17, "CausalGraphStructure", AspectRatio -> 1, GraphLayout -> "LayeredDigraphEmbedding"]
In our model of physics, the causal graph can be thought of as a representation of the structure of spacetime. Events that follow from each other are “timelike separated”. Events that can be arranged so that none are timelike separated can be considered to form a “spacelike slice” (or a “surface of simultaneity”), and to be spacelike separated. (Different foliations of the causal graph correspond to different “reference frames” and identify different sets of events as being in the same spacelike slice.)
When we’re dealing with multiway systems it’s also possible for events to be associated with different “threads of history”—and so to be branchlike separated. But in combinator systems, there’s yet another form of separation between events that’s possible—that we can call “treelike separation”.
Consider these two pairs of updating events:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; {CombinatorExpressionGraph[x[s[x][y][z]][y[s[x][y][z]]], "MatchHighlighting" -> True], CombinatorExpressionGraph[s[x][y][s[x][y][z]], "MatchHighlighting" -> True]}
In the first case, the events are effectively “spacelike separated”. They are connected by being in the same combinator expression, but they somehow appear at “distinct places”. But what about the second case? Again the two events are connected by being in the same combinator expression. But now they’re not really “at distinct places”; they’re just “at distinct scales” in the tree.
One feature of hypergraph rewriting systems is that in large-scale limits the hypergraphs they produce can behave like continuous manifolds that potentially represent physical space, with hypergraph distances approximating geometric distances. In combinator systems there is almost inevitably a kind of nested structure that may perhaps be reminiscent of scale-invariant critical phenomena and ideas like scale relativity. But I haven’t yet seen combinator systems whose limiting behavior produces something like finite-dimensional “manifold-like” space.
It’s common to see “event horizons” in combinator causal graphs, in which different parts of the combinator system effectively become causally disconnected. When combinators reach fixed points, it’s as if “time is ending”—much as it does in spacelike singularities in spacetime. But there are no doubt new “treelike” limiting phenomena in combinator systems, that may perhaps be reflected in properties of hyperbolic spaces.
One important feature of both string and hypergraph rewriting systems is that their rules are generally assumed to be somehow local, so that the future effect of any given element must lie within a certain “cone of influence”. Or, in other words, there’s a light cone which defines the maximum spacelike separation of events that can be causally connected when they have a certain timelike separation. In our model of physics, there’s also an “entanglement cone” that defines maximum branchlike separation between events.
But what about in combinator systems? The rules aren’t really “spatially local”, but they are “tree local”. And so they have a limited “tree cone” of influence, associated with a “maximum treelike speed”—or, in a sense, a maximum speed of scale change.
Rewriting systems based on strings, hypergraphs and combinator expressions all have different simplifying and complexifying features. The relation between underlying elements (“characters arranged in sequence”) is simplest for strings. The notion of what counts as the same element is simplest for hypergraphs. But the relation between the “identities of elements” is probably simplest for combinator expressions.
Recall that we can always represent a combinator expression by a DAG in which we “build up from atoms”, sharing common subexpressions all the way up:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[CombinatorToDAG[s[s[s]][s][s[s]][s]], VertexLabels -> Placed[Automatic, Automatic, ToString]]
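The sharing here is essentially hash-consing: identical subexpressions become a single DAG node. A minimal Python sketch of the idea (my own representation: atoms are strings, and a 2-tuple (f, x) means f[x]):

```python
def dag_nodes(expr, table=None):
    """Collect the set of distinct subexpressions of expr. Since identical
    subexpressions collapse to one entry, this is the node set of the
    maximally shared DAG for the expression."""
    if table is None:
        table = set()
    if isinstance(expr, tuple):          # an application (f, x), i.e. f[x]
        dag_nodes(expr[0], table)
        dag_nodes(expr[1], table)
    table.add(expr)
    return table
```

For s[s[s]][s][s[s]][s], the subexpression s[s] is shared, and the whole DAG has just 6 distinct nodes.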
But what does combinator evolution look like in this representation? Let’s start from the extremely simple case of k[x][y], which in one step becomes just x. Here’s how we can represent this evolution process in DAGs:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[#, VertexLabels -> Placed[Automatic, Automatic, ToString], GraphLayout -> "LayeredDigraphEmbedding"] & /@ SKDAGList[k[x][y], 1]
The dotted line in the second DAG indicates an update event, which in this case transforms k[x][y] to the “atom” x.
Now let’s consider s[x][y][z]. Once again there’s a dotted line that signifies the evolution:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[#, VertexLabels -> Placed[Automatic, Automatic, ToString], GraphLayout -> "LayeredDigraphEmbedding"] & /@ SKDAGList[s[x][y][z], 1]
Now let’s add an extra wrinkle: consider not k[x][y] but s[k[x][y]]. The outer s doesn’t really do anything here. But it still has to be accounted for, in the sense that it has to be “wrapped back around” the x that comes from k[x][y]→x. We can represent that “rewrapping” process by a “tree pullback pseudoevent” indicated by the dotted line:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[#, VertexLabels -> Placed[Automatic, Automatic, ToString], GraphLayout -> "LayeredDigraphEmbedding"] & /@ SKDAGList[s[k[x][y]], 1]
If a given event happens deep inside a tree, there’ll be a whole sequence of “pullback pseudoevents” that “reconstitute the tree”.
Things get quite complicated pretty quickly. Here’s the (leftmost-outermost) evolution of s[s[s]][s][k][s] to its fixed point in terms of DAGs:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; SKDAGList[s[s[s]][s][k][s], 5] 
Or with labels:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[Last[SKDAGList[s[s[s]][s][k][s], 5]], VertexLabels -> Placed[Automatic, Automatic, ToString]]
One notable feature is that this final DAG in a sense encodes the complete history of the evolution—in a “maximally shared” way. And from this DAG we can construct a causal graph—whose nodes are derived from the edges in the DAG representing update events and pseudoevents. It’s not clear how to do this in the most consistent way—particularly when it comes to handling pseudoevents. But here’s one possible version of a causal graph for the evolution of s[s[s]][s][k][s] to its fixed point—with the yellow nodes representing events, and the gray ones pseudoevents:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[DAGCausalGraph[s[s[s]][s][k][s], 5], VertexSize -> .3, AspectRatio -> .6]
Start with all possible combinator expressions of a certain size, say involving only s. Some are immediately fixed points. Others only reach fixed points through evolution. So how are the possible fixed points distributed in the set of all possible combinator expressions?
For size 6 there are 42 possible combinator expressions, and all evolve to fixed points—but only 27 distinct ones. Here are results for several combinator sizes:
✕

As the size of the combinator expression goes up, the fraction of distinct fixed points seems to systematically go down:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[Table[N[Length[Union[fps[[n]]]]/Length[fps[[n]]]], {n, Length[fps]}], Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], Frame -> True]
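The size-6 numbers quoted above are easy to check directly. Here’s a Python sketch (my own names and representation; atoms are strings, a 2-tuple (f, x) means f[x]) that enumerates all binary application trees with 6 s leaves—the analog of Groupings—and reduces each one leftmost-outermost with the s rule:

```python
def groupings(n):
    """All binary application trees with n 's' leaves (Catalan(n-1) of them)."""
    if n == 1:
        return ['s']
    out = []
    for i in range(1, n):
        for left in groupings(i):
            for right in groupings(n - i):
                out.append((left, right))
    return out

def step(e):
    """One leftmost-outermost application of s[x][y][z] -> x[z][y[z]]."""
    if isinstance(e, tuple):
        f, z = e
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 's':
            x, y = f[0][1], f[1]
            return ((x, z), (y, z)), True
        nf, done = step(f)
        if done:
            return (nf, z), True
        nz, done = step(z)
        if done:
            return (f, nz), True
    return e, False

def fixed_point(e, cap=10000):
    for _ in range(cap):
        e, changed = step(e)
        if not changed:
            return e
    raise RuntimeError("no fixed point within step cap")
```

For size 6 this reproduces the counts in the text: 42 expressions, all terminating, with 27 distinct fixed points.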
And what this shows is that combinator evolution is in a sense a “contractive” process: starting from all possible expressions, there’s only a certain “attractor” of expressions that survives. Here’s a “state transition graph” for initial expressions of size 9 computed with leftmost-outermost evaluation (we’ll see a more general version in the next section):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[With[{n = 9}, ResourceFunction["ParallelMapMonitored"][# -> CombinatorFixedPoint[#] &, Complement[Groupings[Table[s, n], Construct -> 2], CloudImport[StringTemplate["https://www.wolframcloud.com/obj/swblog/Combinators/Data/SNT1e4``.wxf"][n]]]]]]
This shows the prevalence of different fixedpoint sizes as a function of the size of the initial expression:
✕

What about the cases that don’t reach fixed points? Can we somehow identify different equivalence classes of infinite combinator evolutions (perhaps analogously to the way we can identify different transfinite numbers)? In general we can look at similarities between the multiway systems that are generated, since these are always independent of updating scheme (see the next section).
But something else we can do for both finite and infinite evolutions is to consider the set of subexpressions common to different steps in the evolution—or across different evolutions. Here’s a plot of the number of copies of the ultimately most frequent subexpressions at successive steps in the (leftmost-outermost) evolution of s[s][s][s[s]][s][s] (SSS(SS)SS):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[With[{evol = CombinatorEvolveList[s[s][s][s[s]][s][s], 35]}, Function[c, Callout[Count[#, c, {0, Infinity}, Heads -> True] & /@ evol, CombinatorExpressionGraph[c, "ShowVertexLabels" -> False]]] /@ {s, s[s], s[s[s]], s[s[s]][s], s[s[s[s]][s]], s[s[s[s]][s]][s], s[s[s]][s[s[s[s]][s]][s]], s[s[s[s]][s[s[s[s]][s]][s]]], s[s[s]][s[s[s[s]][s]][s]][s[s[s[s]][s[s[s[s]][s]][s]]]], s[s[s]][s[s[s[s]][s]][s]][s[s[s[s]][s[s[s[s]][s]][s]]]][s[s[s[s]][s[s[s[s]][s]][s]]]]}], Frame -> True, ImageSize -> 800]
The largest subexpression shown here has size 29. And as the picture makes clear, most subexpressions do not appear with substantial frequency; it’s only a thin set that does.
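Counting copies of a subexpression (including inside heads, as Count with Heads → True does) is itself a simple recursion. A Python sketch with my own names, using the tuple representation where (f, x) means f[x]:

```python
def count_subtree(expr, sub):
    """Number of positions at which sub occurs as a subexpression of expr,
    counting the expression itself and both sides of every application."""
    n = 1 if expr == sub else 0
    if isinstance(expr, tuple):
        n += count_subtree(expr[0], sub) + count_subtree(expr[1], sub)
    return n
```

So in s[s][s[s]] the subexpression s[s] occurs twice, and the atom s four times.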
Looking at the evolution of all possible combinator expressions up to size 8, one sees gradual “freezing out” of certain subexpressions (basically as a result of their involvement in halting), and continued growth of others:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[With[{evols = Transpose[CombinatorEvolveList[#, 35] & /@ Flatten[Table[Groupings[Table[s, n], Construct -> 2], {n, 8}]]]}, Function[cb, Callout[Count[#, cb, {1, Infinity}, Heads -> True] & /@ evols, StandardForm[Text[cb]]]] /@ {s[s[s][s]], s[s[s[s]]], s[s][s], s[s[s]][s], s[s[s[s]][s]], s[s][s[s]], s, s[s[s]], s[s[s[s][s]]], s[s]}], ScalingFunctions -> "Log", Frame -> True, ImageSize -> 800]
In an attempt to make contact with traditional dynamical systems theory it’s interesting to try to map combinator expressions to numbers. A straightforward way to do this (particularly when one’s only dealing with expressions involving s) is to use Polish notation, which represents
✕
s[s[s]][s[s[s[s]][s]][s]][s[s[s[s]][s[s[s[s]][s]][s]]]][ s[s[s[s]][s[s[s[s]][s]][s]]]] 
as
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorPlot[ s[s[s]][s[s[s[s]][s]][s]][s[s[s[s]][s[s[s[s]][s]][s]]]][ s[s[s[s]][s[s[s[s]][s]][s]]]], "CharactersPolishNotation", "UseCombinatorGlyphs" > None] 
or the binary number
✕
Row[Flatten[s[s[s]][s[s[s[s]][s]][s]][s[s[s[s]][s[s[s[s]][s]][s]]]][s[s[s[s]][s[s[s[s]][s]][s]]]] //. x_[y_] -> {\[Bullet], x, y}] /. {\[Bullet] -> 1, s -> 0}]
i.e., in decimal:
✕
FromDigits[Flatten[s[s[s]][s[s[s[s]][s]][s]][s[s[s[s]][s[s[s[s]][s]][s]]]][s[s[s[s]][s[s[s[s]][s]][s]]]] //. x_[y_] -> {\[Bullet], x, y}] /. {\[Bullet] -> 1, s -> 0}, 2]
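The encoding is just a preorder traversal of the application tree: 1 for each application node, 0 for each s. A Python sketch (my own names, tuple representation where (f, x) means f[x]):

```python
def polish_bits(expr):
    """Polish-notation bit string: preorder traversal with '1' for an
    application node and '0' for the atom s."""
    if isinstance(expr, tuple):
        return '1' + polish_bits(expr[0]) + polish_bits(expr[1])
    return '0'

def to_number(expr):
    """The binary Polish string read as a decimal integer."""
    return int(polish_bits(expr), 2)
```

For instance s maps to 0, s[s] to 0b100 = 4, and s[s][s] to 0b11000 = 24.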
Represented in terms of numbers like this, we can plot all subexpressions which arise in the evolution of s[s][s][s[s]][s][s] (SSS(SS)SS):
✕

Making a combined picture for all combinator expressions up to size 8, one gets:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListPlot[Union[Catenate[ResourceFunction["ParallelMapMonitored"][Function[expr, Catenate[MapIndexed[(Function[u, {First[#2] - 1, u}] /@ DeleteCases[cbToNumber /@ Level[#, {0, Infinity}, Heads -> True], 0]) &, CombinatorEvolveList[expr, 50]]]], Flatten[Table[Groupings[Table[s, n], Construct -> 2], {n, 8}]]]]], ScalingFunctions -> "Log", Frame -> True]
There’s definitely some structure: one’s not just visiting every possible subexpression. But quite what the limiting form of this might be is not clear.
Another type of question to ask is what effect a small change in a combinator expression has on its evolution. The result will inevitably be somewhat subtle—because there is both spacelike and treelike propagation of effects in the evolution.
As one example, though, consider evolving s[s][s][s[s]][s][s] (SSS(SS)SS) for 20 steps (to get an expression of size 301). Now look at the effect of changing a single s in this expression to s[s], and then evolving the result. Here are the sizes of the expressions that are generated:
✕

How do you tell if two combinator expressions are equal? It depends what you mean by “equal”. The simplest definition—that we’ve implicitly used in constructing multiway graphs—is that expressions are equal only if they’re syntactically exactly the same (say they’re both s[k][s[s]]).
But what about a more semantic definition, that takes into account the fact that one combinator expression can be transformed to another by the combinator rules? The obvious thing to say is that combinator expressions should be considered equal if they can somehow be transformed by the rules into expressions that are syntactically the same.
And so long as the combinators evolve to fixed points this is in principle straightforward to tell. For example, here are four syntactically different combinator expressions that all evolve to the same fixed point, and so in a semantic sense can be considered equal:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[Transpose[Map[If[LeafCount[#] <= 2, #, Magnify[#, .8]] &, Transpose[PadRight[CombinatorFixedPointList[#] & /@ {s[s][k][k][s[k]], s[s[s]][s][k][k], s[s][s][k][s][k], s[k[s]][k[k]][k]}, Automatic, ""]], {2}]], "StatesDisplay", Spacings -> 2]
One can think of the fixed point as representing a canonical form to which combinator expressions that are equal can be transformed. One can also think of the steps in the evolution as corresponding to steps in a proof of equality.
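Reducing to such a canonical form can be sketched in plain Python (my own names and representation: atoms are strings like 's', 'k', or free variables, and a 2-tuple (f, x) means f[x]). Step counts can differ slightly depending on exactly how events are counted, but the fixed point itself is what matters here:

```python
def step(e):
    """One leftmost-outermost rewrite; returns (expression, did_anything)."""
    if isinstance(e, tuple):
        f, z = e
        # s[x][y][z] -> x[z][y[z]]
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 's':
            x, y = f[0][1], f[1]
            return ((x, z), (y, z)), True
        # k[x][y] -> x
        if isinstance(f, tuple) and f[0] == 'k':
            return f[1], True
        # no redex at this node: try the left subtree first, then the right
        nf, done = step(f)
        if done:
            return (nf, z), True
        nz, done = step(z)
        if done:
            return (f, nz), True
    return e, False

def fixed_point(e, cap=10000):
    """Iterate to a fixed point, if one is reached within the step cap."""
    for _ in range(cap):
        e, changed = step(e)
        if not changed:
            return e
    raise RuntimeError("no fixed point within step cap")
```

With this, s[k][k][a] reduces to a (the classic SKK = I identity), and the expression s[s[s]][s][s][s][k] from above indeed reaches the fixed point s[k].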
But there’s already an issue—that’s associated with the fundamental fact that combinators are computation universal. Because in general there’s no upper bound on how many steps it can take for the evolution of a combinator expression to halt (and no general a priori way to even tell if it’ll halt at all). So that means that there’s also no upper bound on the “length of proof” needed to show by explicit computation that two combinators are equal. Yes, it might only take 12 steps to show that this is yet another combinator equal to s[k]:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorFixedPointList[s[s[s]][s][s][s][k]], "StatesDisplay"] 
But it could also take 31 steps (and involve an intermediate expression of size 65):
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[ CombinatorFixedPointList[s[s[s]][s][s][s][s][k]], "StatesDisplay"], .8] 
We know that if we use leftmost-outermost evaluation, then any combinator expression that has a fixed point will eventually evolve to it (even though we can’t in general know how long it will take). But what about combinator expressions that don’t have fixed points? How can we tell if they’re “equal” according to our definition?
Basically we have to be able to tell if there are sequences of transformations under the combinator rules that cause the expressions to wind up syntactically the same. We can think of these sequences of transformations as being like possible paths of evolution. So then in effect what we’re asking is whether there are paths of evolution for different combinators that intersect.
But how can we characterize what possible paths of evolution might exist for all possible evaluation schemes? Well, that’s what the multiway graph does. And in terms of multiway graphs there’s then a concrete way to ask about equality (or, really, equivalence) between combinator expressions. We basically just need to ask whether there is some appropriate path between the expressions in the multiway graph.
There are lots of details, some of which we’ll discuss later. But what we’re basically dealing with is a quintessential example of the problem of theorem proving in a formal system. There are different ways to set things up. But as one example, we could take our system to define certain axioms that transform expressions. Applying these axioms in all possible ways generates a multiway graph with expressions as nodes. But then the statement that there’s a theorem that expression A is equal to expression B (in the sense that it can be transformed to it) becomes the statement that there’s a way to get from A to B in the graph—and giving a path can then be thought of as giving a proof of the theorem.
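The path-finding view can be sketched directly in Python (my own illustrative code, not the MultiwayCombinator implementation; tuple representation where (f, x) means f[x]): generate all one-step rewrites at every position, then breadth-first search for the target expression.

```python
from collections import deque

def successors(e):
    """All expressions reachable by one s- or k-rewrite at any position:
    the out-edges of e in the multiway graph."""
    out = []
    if isinstance(e, tuple):
        f, z = e
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 's':
            out.append(((f[0][1], z), (f[1], z)))   # s[x][y][z] -> x[z][y[z]]
        if isinstance(f, tuple) and f[0] == 'k':
            out.append(f[1])                        # k[x][y] -> x
        out += [(nf, z) for nf in successors(f)]
        out += [(f, nz) for nz in successors(z)]
    return out

def find_path(start, goal, max_depth=10):
    """Breadth-first search in the multiway graph; a returned path is in
    effect a proof that start can be transformed to goal."""
    prev = {start: None}
    queue = deque([(start, 0)])
    while queue:
        e, d = queue.popleft()
        if e == goal:
            path = []
            while e is not None:
                path.append(e)
                e = prev[e]
            return path[::-1]
        if d < max_depth:
            for t in successors(e):
                if t not in prev:
                    prev[t] = e
                    queue.append((t, d + 1))
    return None
```

For example, s[k][k][s] reaches s in two steps, and the returned three-state path is the corresponding proof.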
As an example, consider the combinator expressions:
✕
s[s][s[s][s[s[s]]][k]][k[s[s][s[s[s]]][k]]] 
✕
s[k[s[k][s[s[s]][k]]]][k[s[s][s[s[s]]][k]]] 
Constructing a multiway graph one can then find a path
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; With[{g = ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}, s[s][s[s][s[s[s]]][k]][k[s[s][s[s[s]]][k]]], 6, "StatesGraphStructure", GraphLayout -> "LayeredDigraphEmbedding", AspectRatio -> 1/2]}, HighlightGraph[g, Style[Subgraph[g, {"s[s][s[s][s[s[s]]][k]][k[s[s][s[s[s]]][k]]]", "s[k[s[s][s[s[s]]][k]]][s[s][s[s[s]]][k][k[s[s][s[s[s]]][k]]]]", "s[k[s[k][s[s[s]][k]]]][s[s][s[s[s]]][k][k[s[s][s[s[s]]][k]]]]", "s[k[s[k][s[s[s]][k]]]][s[k][s[s[s]][k]][k[s[s][s[s[s]]][k]]]]", "s[k[s[k][s[s[s]][k]]]][k[k[s[s][s[s[s]]][k]]][s[s[s]][k][k[s[s][s[s[s]]][k]]]]]", "s[k[s[k][s[s[s]][k]]]][k[s[s][s[s[s]]][k]]]"}], Thick, RGBColor[0.984, 0.43, 0.208]]]]
which corresponds to the proof that one can get from one of these expressions to the other:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[{s[s][s[s][s[s[s]]][k]][ k[s[s][s[s[s]]][k]]], s[k[s[s][s[s[s]]][k]]][s[s][s[s[s]]][k][k[s[s][s[s[s]]][k]]]], s[k[s[k][s[s[s]][k]]]][s[s][s[s[s]]][k][k[s[s][s[s[s]]][k]]]], s[k[s[k][s[s[s]][k]]]][s[k][s[s[s]][k]][k[s[s][s[s[s]]][k]]]], s[k[s[k][s[s[s]][k]]]][ k[k[s[s][s[s[s]]][k]]][s[s[s]][k][k[s[s][s[s[s]]][k]]]]], s[k[s[k][s[s[s]][k]]]][k[s[s][s[s[s]]][k]]]}, "StatesDisplay"] 
In this particular case, both expressions eventually reach a fixed point. But consider the expressions:
✕
s[s[s[s[s][s]]][k]][s[s[s[s[s][s]]][k]]][k[s[s[s[s][s]]][k]]] 
✕
s[s[s[s][s]]][k][s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]] 
Neither of these expressions evolves to a fixed point. But there’s still a path in the (ultimately infinite) multiway graph between them
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; With[{g = ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] -> x[z][y[z]], k[x_][y_] -> x}, s[s[s[s[s][s]]][k]][s[s[s[s[s][s]]][k]]][k[s[s[s[s][s]]][k]]], 7, "StatesGraphStructure"]}, HighlightGraph[g, Style[Subgraph[g, {"s[s[s[s[s][s]]][k]][s[s[s[s[s][s]]][k]]][k[s[s[s[s][s]]][k]]]", "s[s[s[s][s]]][k][k[s[s[s[s][s]]][k]]][s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]]", "s[s[s][s]][k[s[s[s[s][s]]][k]]][k[k[s[s[s[s][s]]][k]]]][s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]]", "s[s][s][k[k[s[s[s[s][s]]][k]]]][k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]][s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]]", "s[k[k[s[s[s[s][s]]][k]]]][s[k[k[s[s[s[s][s]]][k]]]]][k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]][s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]]", "k[k[s[s[s[s][s]]][k]]][k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]][s[k[k[s[s[s[s][s]]][k]]]][k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]]][s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]]", "k[s[s[s[s][s]]][k]][s[k[k[s[s[s[s][s]]][k]]]][k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]]][s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]]", "s[s[s[s][s]]][k][s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]]"}], Thick, RGBColor[0.984, 0.43, 0.208]]]]
corresponding to the equivalence proof:
✕
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[{s[s[s[s[s][s]]][k]][s[s[s[s[s][s]]][k]]][ k[s[s[s[s][s]]][k]]], s[s[s[s][s]]][k][k[s[s[s[s][s]]][k]]][ s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]], s[s[s][s]][k[s[s[s[s][s]]][k]]][k[k[s[s[s[s][s]]][k]]]][ s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]], s[s][s][k[k[s[s[s[s][s]]][k]]]][ k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]][ s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]], s[k[k[s[s[s[s][s]]][k]]]][s[k[k[s[s[s[s][s]]][k]]]]][ k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]][ s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]], k[k[s[s[s[s][s]]][k]]][ k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]][ s[k[k[s[s[s[s][s]]][k]]]][ k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]]][ s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]], k[s[s[s[s][s]]][k]][ s[k[k[s[s[s[s][s]]][k]]]][ k[s[s[s[s][s]]][k]][k[k[s[s[s[s][s]]][k]]]]]][ s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]], s[s[s[s][s]]][k][s[s[s[s[s][s]]][k]][k[s[s[s[s][s]]][k]]]]}, "StatesDisplay"], .8] 
But with our definition, two combinator expressions can still be considered equal even if one of them can’t evolve into the other: it can just be that among the possible ancestors (or, equivalently for combinators, successors) of the expressions there’s somewhere an expression in common. (In physics terms, that their light cones somewhere overlap.)
Consider the expressions:
{s[s[s][s]][s][s[s][k]], s[s][k][s[s[s][k]]][k]} 
Neither terminates, but it still turns out that there are paths of evolution for each of them that lead to the same expression:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[ PadRight[FindShortestPath[ ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, #, 12, "StatesGraphStructure"], ToString[#], "s[s[s][k]][s[s[s][k]]][s[s[s][k]]]"] & /@ {s[s[s][s]][s][s[s][k]], s[s][k][s[s[s][k]]][k]}, Automatic, ""], "StatesDisplay", Spacings -> 2], .9]
If we draw a combined multiway graph starting from the two initial expressions, we can see the converging paths:
But is there a more systematic way to think about relations between combinator expressions? Combinators are in a sense fundamentally computational constructs. But one can still try to connect them with traditional mathematics, and in particular with abstract algebra.
And so, for example, it’s common in the literature of combinators to talk about “combinatory algebra”, and to write an expression like
s[k][s[s[k[s[s]][s]][s]][k]][k[s[k]][s[s[k[s[s]][s]][s]][k]]] 
as
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorPlot[ s[k][s[s[k[s[s]][s]][s]][k]][ k[s[k]][s[s[k[s[s]][s]][s]][k]]], "CharactersLeftAssociative"] 
where now one imagines that • (“application”) is like an algebraic operator that “satisfies the relations”
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Map[Row[{CombinatorPlot[#[[1]], "CharactersLeftAssociative"], Spacer[1], "\[LongEqual]", Spacer[1], CombinatorPlot[#[[2]], "CharactersLeftAssociative"]}] &, {s[x][y][z] == x[z][y[z]], k[x][y] == x}] /. {x -> Style[x, Italic], y -> Style[y, Italic], z -> Style[z, Italic]}
with “constants” S and K. To determine whether two combinator expressions are equal one then has to see if there’s a sequence of “algebraic” transformations that can go from one to the other. The setup is very similar to what we’ve discussed above, but the “two-way” character of the rules allows one to directly use standard equational logic theorem-proving methods (although because combinator evolution is confluent one never strictly has to use reversed rules).
So, for example, to prove s[k[s]][k[k]][k] = s[s][s][k][s][k] or
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Row[{CombinatorTraditionalForm[#[[1]]], Spacer[1], "\[LongEqual]", Spacer[1], CombinatorTraditionalForm[#[[2]]]}] &[ s[k[s]][k[k]][k] == s[s][s][k][s][k]] 
one applies a series of transformations based on the S and K “axioms” to parts of the left- and right-hand sides to eventually reduce the original equation to a tautology:
One can give the outline of this proof as a standard FindEquationalProof proof graph:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[ FindCombinatorProof[s[k[s]][k[k]][k] == s[s][s][k][s][k], "SK"]["ProofGraph"], VertexLabels -> {"Axiom 1" -> "\!\(\*TemplateBox[{},\n\"CombinatorK\"]\) axiom", "Axiom 2" -> "S axiom", "Hypothesis 1" -> "hypothesis", "Conclusion 1" -> "tautology", x_ /; (StringTake[x, 1] === "S") -> None}]
The yellowish dots correspond to the “intermediate lemmas” listed above, and the dotted lines indicate which lemmas use which axioms.
One can establish a theorem like
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Row[{CombinatorTraditionalForm[#[[1]]], Spacer[1], "\[LongEqual]", Spacer[1], CombinatorTraditionalForm[#[[2]]]}] &[ s[k[s]][k[k]][k] == s[s[s]][s][s][s][k]] 
with a slightly more complex proof:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Graph[ FindCombinatorProof[s[k[s]][k[k]][k] == s[s[s]][s][s][s][k], "SK"]["ProofGraph"], VertexLabels -> {"Axiom 1" -> "\!\(\*TemplateBox[{},\n\"CombinatorK\"]\) axiom", "Axiom 2" -> "S axiom", "Hypothesis 1" -> "hypothesis", "Conclusion 1" -> "tautology", x_ /; (StringTake[x, 1] === "S") -> None}]
One feature of this proof is that because the combinator rules are confluent—so that different branches in the multiway system always merge—the proof never has to involve critical pair lemmas representing equivalences between branches in the multiway system, and so can consist purely of a sequence of “substitution lemmas”.
There’s another tricky issue, though. And it has to do with taking “everyday” mathematical notions and connecting them with the precise symbolic structure that defines combinators and their evolution. As an example, let’s say you have combinators a and b. It might seem obvious that if a is to be considered equal to b, then it must follow that a[x] = b[x] for all x.
But actually saying this is true is telling us something about what we mean by “equal”, and to specify this precisely we have to add the statement as a new axiom.
In our basic setup for proving anything to do with equality (or, for that matter, any equivalence relation), we’re already assuming the basic features of equivalence relations (reflexivity, symmetry, transitivity):
Column[{Infix[f[x, x], "\[LongEqual]"], Implies[Infix[f[x, y], "\[LongEqual]"], Infix[f[y, x], "\[LongEqual]"]], Implies[Wedge[Infix[f[x, y], "\[LongEqual]"], Infix[f[y, z], "\[LongEqual]"]], Infix[f[x, z], "\[LongEqual]"]]} ] 
In order to allow us to maintain equality while doing substitutions we also need the axiom:
Implies[Wedge[Infix[f[x, y], "\[LongEqual]"], Infix[f[z, u], "\[LongEqual]"]], Infix[f[Application[x, z], Application[y, u]], "\[LongEqual]"]] 
And now to specify that combinator expressions that are considered equal also “do the same thing” when applied to equal expressions, we need the “extensionality” axiom:
Implies[Infix[f[x, y], "\[LongEqual]"], Infix[f[Application[x, z], Application[y, z]], "\[LongEqual]"]] 
The previous axioms all work in pure “equational logic”. But when we add the extensionality axiom we have to explicitly use full first-order logic—with the result that we get more complicated proofs, though the same basic methods apply.
One feature of the proofs we’ve seen above is that each intermediate lemma just involves direct use of one or other of the axioms. But in general, lemmas can use lemmas, and one can “recursively” build up a proof much more efficiently than just by always directly using the axioms.
But which lemmas are best to use? If one’s doing ordinary human mathematics—and trying to make proofs intended for human consumption—one typically wants to use “famous lemmas” that help create a human-relatable narrative. But realistically there isn’t likely to be a “human-relatable narrative” for most combinator equivalence theorems (or at least there won’t be until or unless thinking in terms of combinators somehow becomes commonplace).
So then there’s a more “mechanical” criterion: what lemmas do best at reducing the lengths of as many proofs as much as possible? There’s some trickiness associated with translations between proofs of equalities and proofs that one expression can evolve into another. But roughly the question boils down to this. When we construct a multiway graph of combinator evolution, each event—and thus each edge—is just the application of a single combinator “axiom”.
But if instead we do transformations based on more sophisticated lemmas we can potentially get from one expression to another in fewer steps. In other words, if we “cache” certain combinator transformations, can we make finding paths in combinator multiway graphs systematically more efficient?
To find all possible “combinator theorems” from a multiway system, we should start from all possible combinator expressions, then trace all possible paths to other expressions. It’s a little like what we did in the previous section—except now we want to consider multiway evolution with all possible evaluation orders.
Here’s the complete multiway graph starting from all size-4 combinator expressions:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, EnumerateCombinators[4], 4, "StatesGraph"]
Up to size 6, the graph is still finite (with each disconnected component in effect corresponding to a separate “fixed-point attractor”):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, EnumerateCombinators[6], 12, "StatesGraphStructure"]
For size 7 and above, it becomes infinite. Here’s the beginning of the graph for size-8 expressions involving only s:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, Groupings[Table[s, 8], Construct -> 2], 10, "StatesGraphStructure"]
If one keeps only terminating cases, one gets for size 8:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; With[{n = 8}, ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, Complement[Groupings[Table[s, n], Construct -> 2], Import[CloudObject[ StringTemplate["https://www.wolframcloud.com/obj/swblog/Combinators/Data/SNT1e4``.wxf"][n]]]], 50, "StatesGraphStructure", ImageSize -> 260]]
And for size 10:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; With[{n = 10}, ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, Complement[Groupings[Table[s, n], Construct -> 2], Import[CloudObject[ StringTemplate["https://www.wolframcloud.com/obj/swblog/Combinators/Data/SNT1e4``.wxf"][n]]]], 10, "StatesGraphStructure", ImageSize -> 270]]
To assess the “most useful” transformations for “finding equations” there’s more to do: not only do we need to track what leads to what, but we also need to track causal relationships. And this leads to ideas like using lemmas that have the largest number of causal edges associated with them.
But are there perhaps other ways to find relations between combinator expressions, and combinator theorems? Can we for example figure out what combinator expressions are “close to” what others? In a sense what we need is to define a “space of combinator expressions” with some appropriate notion of nearness.
One approach would just be to look at “raw distances” between trees—say based on asking how many edits have to be made to one tree to get to another. But an approach that more closely reflects actual features of combinators is to think about the concept of branchial graphs and branchial space that comes from our Physics Project.
Consider for example the multiway graph generated from s[s[s]][s][s[s]][s] (S(SS)S(SS)S):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; LayeredGraphPlot[ ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]]}, s[s[s]][s][s[s]][s], 13, "StatesGraphStructure"], AspectRatio -> 2]
Now consider a foliation of this graph (and in general there will be many possible foliations that respect the partial order defined by the multiway graph):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; LayeredGraphPlot[ ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]]}, s[s[s]][s][s[s]][s], 13, "StatesGraphStructure"], AspectRatio -> 2, Epilog -> {ResourceFunction["WolframPhysicsProjectStyleData"]["BranchialGraph", "EdgeStyle"], AbsoluteThickness[1], Table[Line[{{20, i}, {5, i}}], {i, 1.5, 36, 2.6}]}]
In each slice, we can then define—as in our Physics Project—a branchial graph in which nodes are joined when they have an immediate common ancestor in the multiway graph. In the case shown here, the branchial graphs in successive slices are:
Table[Framed[ ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]]}, s[s[s]][s][s[s]][s], t, "BranchialGraphStructure", ImageSize -> Tiny], FrameStyle -> LightGray], {t, 4, 13}]
If we consider a combinator expression like s[s][s][s[s]][s][s] (SSS(SS)SS) that leads to infinite growth, we can ask what the “long-term” structure of the branchial graph will be. Here are the results after 18 and 19 steps:
Table[Framed[ ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]]}, s[s][s][s[s]][s][s], t, "BranchialGraphStructure", ImageSize -> 300], FrameStyle -> LightGray], {t, 18, 19}]
The largest connected components here contain respectively 1879 and 10,693 combinator expressions. But what can we say about their structure? One thing suggested by our Physics Project is to try to “fit them to continuous spaces”. And a first step in doing that is to estimate their effective dimension—which one can do by looking at the growth in the volume of a “geodesic ball” in the graph as a function of its radius:
The result for distances small compared to the diameter of the graph is close to quadratic growth—suggesting that there is some sense in which the space of combinator expressions generated in this way may have a limiting 2D manifold structure.
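The geodesic-ball method itself is easy to try on any graph. As an illustrative stand-in (this is a Python sketch of my own, applied to a 2D grid graph rather than to the actual branchial-graph data above), one counts the nodes within graph distance r of a point and reads off the effective dimension from the growth exponent, which for a grid should come out near 2:

```python
from collections import deque
from math import log

def ball_volumes(neighbors, start, rmax):
    # BFS distances from start, then cumulative node counts within each radius
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbors(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return [sum(1 for d in dist.values() if d <= r) for r in range(1, rmax + 1)]

def grid_neighbors(v):
    # 41x41 grid graph: a stand-in whose limiting dimension is known to be 2
    x, y = v
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx <= 40 and 0 <= y + dy <= 40]

vols = ball_volumes(grid_neighbors, (20, 20), 10)
# effective dimension from the growth of V(r) between r = 5 and r = 10
dim = log(vols[9] / vols[4]) / log(10 / 5)
print(round(dim, 2))  # close to 2 (about 1.86 at these small radii)
```

The same volume-growth fit, applied to the largest connected components of the branchial graphs, is what yields the roughly quadratic growth described next.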
It’s worth pointing out that different foliations of the multiway graph (i.e. using different “reference frames”) will lead to different branchial graphs—but presumably the (suitably defined) causal invariance of combinator evolution will lead to relativistic-like invariance properties of the branchial graphs.
Somewhat complementary to looking at foliations of the multiway graph is the idea of trying to find quantities that can be computed for combinator expressions to determine whether the combinator expressions can be equal. Can we in essence find hash codes for combinator expressions that are equal whenever the combinator expressions are equal?
In general we’ve been looking at “purely symbolic” combinator expressions—like:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorPlot[ k[k[s[k]][s[k]]][s[k][k[s[k]][s[k]]]], "CharactersLeftAssociative"] 
But what if we consider S, K to have definite, say numerical, values, and • to be some kind of generalized multiplication operator that combines these values? We used this kind of approach above in finding a procedure for determining whether S combinator expressions will evolve to fixed points. And in general each possible choice of “multiplication functions” (and S, K “constant values”) can be viewed in mathematical terms as setting up a “model” (in the modeltheoretic sense) for the “combinatory algebra”.
As a simple example, let’s consider a finite model in which there are just 2 possible values, and the “multiplication table” for the • operator is:
Grid[MapIndexed[ If[#2[[1]] === 1 || #2[[2]] === 1, Item[Style[#1, 12, Bold, GrayLevel[.35]], Background -> GrayLevel[.9]], Item[Style[#1, 12], Background -> Blend[{Hue[0.1, 0.89, 0.984], Hue[0.16, 0.51, 0.984], Hue[0.04768041237113402, 0, 0.984]}, (2 - #1)/2], FrameStyle -> Darker[RGBColor[0.984, 0.43, 0.208], .2]]] &, Prepend[MapIndexed[Prepend[#, First[#2]] &, {{2, 1}, {2, 2}}], Prepend[Range[2], "\[Application]"]], {2}], Spacings -> {.25, 0}, ItemSize -> {2, 2}, Frame -> All, FrameStyle -> GrayLevel[.6], BaseStyle -> "Text"]
If we consider S combinator expressions of size 5, there are a total of 14 such expressions, in 10 equivalence classes, that evolve to different fixed points. If we now “evaluate the trees” according to our “model for •” we can see that within each equivalence class the value accumulated at the root of the tree is always the same, but differs between at least some of the equivalence classes:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Framed[Row[#, Spacer[5]]] & /@ Map[SCombinatorAutomatonTreeGeneral[#, Application[x_, y_] :> ({{2, 1}, {2, 2}}[[x, y]]), 1, VertexSize -> .6, ImageSize -> {UpTo[120], UpTo[120]}] &, EquivalenceGroups[5], {2}]
If we look at larger combinator expressions this all keeps working—until we get to two particular size-10 expressions, which have the same fixed point, but different “values”:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; SCombinatorAutomatonTreeGeneral[#, Application[x_, y_] :> ({{2, 1}, {2, 2}}[[x, y]]), 1, VertexSize -> .6, ImageSize -> {UpTo[200], UpTo[200]}] & /@ {s[s[s]][s[s[s]][s][s][s[s]]], s[s][s[s[s]][s[s[s]]]][s[s]]}
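The tree-evaluation scheme itself is simple enough to sketch directly. Here is a minimal Python reconstruction (the nested-pair representation and names are mine, not the post's code): every s leaf gets the value 1, applications are looked up in the {{2,1},{2,2}} table above, and the two size-10 expressions indeed come out with different values even though they evolve to the same fixed point:

```python
# "model" of the application operator: value of i applied to j
TABLE = {(1, 1): 2, (1, 2): 1, (2, 1): 2, (2, 2): 2}

def val(e):
    # s leaves get value 1; applications are looked up in the table
    if e == 's':
        return 1
    return TABLE[(val(e[0]), val(e[1]))]

# the two size-10 expressions above, as nested (function, argument) pairs:
# s[s[s]][s[s[s]][s][s][s[s]]] and s[s][s[s[s]][s[s[s]]]][s[s]]
a = (('s', ('s', 's')), (((('s', ('s', 's')), 's'), 's'), ('s', 's')))
b = ((('s', 's'), (('s', ('s', 's')), ('s', ('s', 's')))), ('s', 's'))
print(val(a), val(b))  # 1 2: the "hash" distinguishes expressions that are in fact equal
```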
Allowing 3 possible values, the longest-surviving models are
Grid[MapIndexed[ If[#2[[1]] === 1 || #2[[2]] === 1, Item[Style[#1, 12, Bold, GrayLevel[.35]], Background -> GrayLevel[.9]], Item[Style[#1, 12], Background -> Blend[{Hue[0.1, 0.89, 0.984], Hue[0.16, 0.51, 0.984], Hue[0.04768041237113402, 0, 0.984]}, (3 - #1)/3], FrameStyle -> Darker[RGBColor[0.984, 0.43, 0.208], .2]]] &, Prepend[MapIndexed[Prepend[#, First[#2]] &, #], Prepend[Range[3], "\[Application]"]], {2}], Spacings -> {.25, 0}, ItemSize -> {2, 2}, Frame -> All, FrameStyle -> GrayLevel[.6], BaseStyle -> "Text"] & /@ {{{2, 3, 2}, {2, 2, 2}, {2, 2, 1}}, {{3, 3, 2}, {3, 1, 3}, {3, 3, 3}}}
but these both fail at size 13 (e.g. for s[s][s[s]][s[s[s[s[s]][s][s]]][s[s]]], s[s[s]][s[s[s[s[s]][s[s[s]]]]]][s[s]]).
The fact that combinator equivalence is in general undecidable means we can’t expect to find a computationally finite “valuation procedure” that will distinguish all inequivalent combinator expressions. But it’s still conceivable that we could have a scheme to distinguish some classes of combinator expressions from others—in essence through the values of a kind of “conserved quantity for combinators”.
Another approach is to consider directly “combinator axioms” like
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Row[{CombinatorTraditionalForm[#[[1]]], Spacer[1], "\[LongEqual]", Spacer[1], CombinatorTraditionalForm[#[[2]]]}] &[#] & /@ {CombinatorS\[Application]x\[Application]y\[Application]z == x\[Application]z\[Application](y\[Application]z), CombinatorK\[Application]x\[Application]y == x} /. {x -> Style[x, Italic], y -> Style[y, Italic], z -> Style[z, Italic]}
and simply ask if there are models of •, S and K that satisfy them. Assuming a finite “multiplication table”, there’s no way to do this for K, and thus for S and K together. For S alone, however, there are already 8 2valued models, and 285 3valued ones.
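These counts are easy to check by brute force. Here is a Python sketch of my own (taking a "model" to mean a multiplication table for • over {1..n} together with a value for S, satisfying ((S•x)•y)•z = (x•z)•(y•z) for all x, y, z):

```python
from itertools import product

def count_models(n):
    # enumerate every n-by-n "multiplication table" and every value for S,
    # and count the pairs that satisfy the S axiom for all x, y, z
    vals = range(n)
    count = 0
    for table in product(vals, repeat=n * n):
        def dot(p, q):
            return table[p * n + q]
        for s in vals:
            if all(dot(dot(dot(s, x), y), z) == dot(dot(x, z), dot(y, z))
                   for x in vals for y in vals for z in vals):
                count += 1
    return count

m2, m3 = count_models(2), count_models(3)
print(m2, m3)  # 8 285
```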
The full story is more complicated, and has been the subject of a fair amount of academic work on combinators over the past half century. The main result is that there are models that are in principle known to exist, though they’re infinite and probably can’t be explicitly constructed.
In the case of something like arithmetic, there are formal axioms (the Peano axioms). But we know that (even though Gödel’s theorem shows that there are inevitably also other, exotic, nonstandard models) there’s a model of these axioms that is the ordinary integers. And our familiarity with these and their properties makes us feel that the Peano axioms aren’t just formal axioms; they’re axioms “about” something, namely integers.
What are the combinator axioms “about”? There’s a perfectly good interpretation of them in terms of computational processes. But there doesn’t seem to be some “static” set of constructs—like the integers—that give one more insight about what combinators “really are”. Instead, it seems, combinators are in the end just through and through computational.
We’ve talked a lot here about what combinators “naturally do”. But what about getting combinators to do something specific—for example to perform a particular computation we want?
As we saw by example at the beginning of this piece, it’s not difficult to take any symbolic structure and “compile it” to combinators. Let’s say we’re given:
f[y[x]][y][x] 
There’s then a recursive procedure that effectively builds “function invocations” out of s’s and “stops computations” with k’s. And using this we can “compile” our symbolic expression to the (slightly complicated) combinator expression:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; SKCombinatorCompile[f[y[x]][y][x], {f, x, y}] 
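One standard such recursive procedure is "bracket abstraction". Here is a minimal Python sketch of the single-variable case (my own reconstruction, not Wolfram's SKCombinatorCompile, which also handles multiple variables): abstract x from an expression by [x]x = s[k][k], [x]e = k[e] when x does not occur in e, and [x](f g) = s[[x]f][[x]g]:

```python
def occurs(v, e):
    if isinstance(e, tuple):
        return occurs(v, e[0]) or occurs(v, e[1])
    return e == v

def abstract(v, e):
    # bracket abstraction of variable v from expression e
    if e == v:
        return (('s', 'k'), 'k')      # s[k][k] acts as the identity
    if not occurs(v, e):
        return ('k', e)               # "stop the computation" with k
    return (('s', abstract(v, e[0])), abstract(v, e[1]))

def size(e):
    return 1 if not isinstance(e, tuple) else size(e[0]) + size(e[1])

prog = abstract('x', ('x', 'x'))      # compile x[x]
print(size(prog))                     # 7: the program s[s[k][k]][s[k][k]]

# for an object built from n x's this scheme always gives size 4n - 1;
# e.g. x[x[x[x]]] (n = 4) gives size 15
print(size(abstract('x', ('x', ('x', ('x', 'x'))))))  # 15
```

For objects involving only x, every leaf costs 3 and every application costs 1, which is where the 4n − 1 program length mentioned later comes from.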
To “compute our original expression” we just have to take this combinator expression (“■”), form ■[f][x][y], then apply the combinator rules and find the fixed point:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[ CombinatorFixedPointList[ SKCombinatorCompile[f[y[x]][y][x], {f, x, y}][f][x][y]], "StatesDisplay"], .5] 
But is this the “best combinator way” to compute this result?
There are various different things we could mean by “best”. Smallest program? Fastest program? Most memory-efficient program? Or said in terms of combinators: Smallest combinator expression? Smallest number of rule applications? Smallest intermediate expression growth?
In computation theory one often talks theoretically about optimal programs and their characteristics. But when one’s used to studying programs “in the wild” one can start to do empirical studies of computationtheoretic questions—as I did, for example, with simple Turing machines in A New Kind of Science.
Traditional computation theory tends to focus on asymptotic results about “all possible programs”. But in empirical computation theory one’s dealing with specific programs—and in practice there’s a limit to how many one can look at. But the crucial and surprising fact that comes from studying the computational universe of “programs in the wild” is that actually even very small programs can show highly complex behavior that’s in some sense typical of all possible programs. And that means that it’s realistic to get intuition—and results—about computationtheoretic questions just by doing empirical investigations of actual, small programs.
So how does this work with combinators? An immediate question to ask is: if one wants a particular expression, what are all the possible combinator expressions that will generate it?
Let’s start with a seemingly trivial case: x[x]. With the compilation procedure we used above we get the size-7 combinator expression
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; SKCombinatorCompile[x[x], {x}] 
which (with leftmost-outermost evaluation) generates x[x] in 6 steps:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorFixedPointList[ SKCombinatorCompile[x[x], {x}][x]], "StatesDisplay"] 
But what happens if we just start enumerating possible combinator expressions? Up to size 5, none compute x[x]. But at size 6, we have:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; CombinatorEvolutionPlot[ CombinatorFixedPointList[s[s[s]][s][s[k]][x]], "StatesDisplay"] 
So we can “save” one unit of program size, but at the “cost” of taking 9 steps, and having an intermediate expression of size 21.
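The leftmost-outermost reduction itself is easy to reconstruct. Here is a Python sketch (representation and names mine: expressions are nested (function, argument) pairs with leaves 's', 'k', 'x') that runs the size-6 program above to its fixed point and confirms the maximum intermediate size of 21:

```python
def ap(f, *args):
    # left-associative application: ap('s', a, b) means s[a][b]
    for a in args:
        f = (f, a)
    return f

def lo_step(e):
    # one leftmost-outermost reduction step; None if e is a fixed point
    if isinstance(e, tuple):
        f, g = e
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 's':
            return ((f[0][1], g), (f[1], g))   # s[x][y][z] -> x[z][y[z]]
        if isinstance(f, tuple) and f[0] == 'k':
            return f[1]                        # k[x][y] -> x
        r = lo_step(f)
        if r is not None:
            return (r, g)
        r = lo_step(g)
        if r is not None:
            return (f, r)
    return None

def size(e):
    return 1 if not isinstance(e, tuple) else size(e[0]) + size(e[1])

# the size-6 program s[s[s]][s][s[k]], applied to x
e = ap('s', ('s', 's'), 's', ('s', 'k'), 'x')
states = [e]
while (e := lo_step(e)) is not None:
    states.append(e)
print(states[-1])                     # ('x', 'x'), i.e. x[x]
print(max(size(t) for t in states))   # 21, the largest intermediate expression
```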
What if we look at size-7 programs? There are a total of 11 that work (including the one from our “compiler”):
{s[s[s[s]]][s][s[k]], s[s[s]][s[k]][s[k]], s[s][s[k]][s[s[k]]], s[s[s[k]]][s[k][s]], s[s][s[k]][s[k][s]], s[s[s[k]]][s[k][k]], s[s][s[k]][s[k][k]], s[s[k][s]][s[k][s]], s[s[k][s]][s[k][k]], s[s[k][k]][s[k][s]], s[s[k][k]][s[k][k]]} 
How do these compare in terms of “time” (i.e. number of steps) and “memory” (i.e. maximum intermediate expression size)? There are 4 distinct programs that all take the same time and memory, there are none that are faster, but there are others that are slower (the slowest taking 12 steps):
What happens with larger programs? Here’s a summary:
Here are the distributions of times (dropping outliers)—implying (as the medians above suggest) that even a randomly picked program is likely to be fairly fast:
And here’s the distribution of time vs. memory on a loglog scale:
At size 10, the slowest and most memoryintensive program is s[s[s][k][s[s[s[s]]]]][s][k] (S(SSK(S(S(SS))))SK):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[ LeafCount /@ CombinatorFixedPointList[s[s[s][k][s[s[s[s]]]]][s][k][1]], AspectRatio -> 1/2, Frame -> True, Joined -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleDark"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 295]
There are so many other questions one can ask. For example: how similar are the various fastest programs? Do they all “work the same way”? At size 7 they pretty much seem to:
At size 8 there are a few “different schemes” that start to appear:
Then one can start to ask questions about how these fastest programs are laid out in the kind of “combinator space” we discussed in the last section—and whether there are good incremental (“evolutionary”) ways to find these fastest programs.
Another type of question has to do with the running of our programs. In everything we’ve done so far in this section, we’ve used a definite evaluation scheme: leftmost outermost. And in using this definite scheme, we can think of ourselves as doing “deterministic combinator computation”. But we can also consider the complete multiway system of all possible updating sequences—which amounts to doing nondeterministic computation.
Here’s the multiway graph for the size-6 case we considered above, highlighting the leftmost-outermost evaluation path:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Module[{g = ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, s[s[s]][s][s[k]][x], 12, "StatesGraphStructure", GraphLayout -> "LayeredDigraphEmbedding", AspectRatio -> 1], max}, max = Max[LeafCount[ToExpression[#]] & /@ VertexList[g]]; g = Graph[g, VertexSize -> ((# -> 0.75*Sqrt[LeafCount[ToExpression[#]]/max]) & /@ VertexList[g])]; HighlightGraph[g, Style[Subgraph[g, ToString /@ CombinatorFixedPointList[s[s[s]][s][s[k]][x]]], Thick, RGBColor[0.984, 0.43, 0.208]]]]
And, yes, in this case leftmost outermost happens to follow a fastest path here. Some other possible schemes are very slow in comparison—with the maximum time being 13 and the maximum intermediate expression size being 21.
At size 7 the multiway graphs for all the leftmost-outermost-fastest programs are the same—and are very simple—among other things making it seem that in retrospect the size-6 case “only just makes it”:
At size 8 there are “two ideas” among the 16 cases:
At size 9 there are “5 ideas” among 80 cases:
And at size 10 things are starting to get more complicated:
But what if we don’t look only at leftmost-outermost-fastest programs? At size 7, here are the multiway graphs for all combinator expressions that compute x[x]:
So if one operates “nondeterministically”—i.e. one can follow any path in the multiway graph, not just the one given by the leftmost-outermost evaluation scheme—can one compute the answer faster? The answer in this particular case is no.
But what about at size 8? Of the 95 programs that compute x[x], in most cases the situation is like the one at size 7, and leftmost outermost gives the fastest result. But there are some wilder things that can happen.
Consider for example
s[s[s[s]]][k[s[k]]][s] 
Here’s the complete multiway graph in this case (with 477 nodes altogether):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; With[{g = ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, #[x], 16, "StatesGraphStructure", GraphLayout -> "LayeredDigraphEmbedding", AspectRatio -> 1]}, HighlightGraph[g, {Style[Subgraph[g, ToString /@ CombinatorFixedPointList[#[x]]], Thick, RGBColor[0.984, 0.43, 0.208]], Style[Subgraph[g, FindShortestPath[g, ToString[#[x]], "x[x]"]], Thick, Red]}]] &[s[s[s[s]]][k[s[k]]][s]]
Two paths are indicated: the one in orange is the leftmost-outermost evaluation—which takes 12 steps in this case. But there’s also another path, shown in red—which has length 11. Here’s a comparison:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; Magnify[ CombinatorEvolutionPlot[{MapIndexed[ Row[{Text[Style[First[#2], Gray]], Spacer[6], #1}] &, CombinatorFixedPointList[s[s[s[s]]][k[s[k]]][s][x]]], MapIndexed[Row[{Text[Style[First[#2], Gray]], Spacer[6], #1}] &, ToExpression /@ With[{g = ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, #[x], 16, "StatesGraphStructure"]}, FindShortestPath[g, ToString[#[x]], "x[x]"]] &[s[s[s[s]]][k[s[k]]][s]]]}, "StatesDisplay"], 0.8]
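The comparison can be reconstructed directly: reduce deterministically with the leftmost-outermost scheme, then breadth-first-search the full multiway graph (applying a rule at every possible position) for the shortest path to x[x]. A Python sketch of my own (representation and names mine):

```python
from collections import deque

def ap(f, *args):
    # left-associative application: ap('s', a, b) means s[a][b]
    for a in args:
        f = (f, a)
    return f

def lo_step(e):
    # deterministic leftmost-outermost step; None at a fixed point
    if isinstance(e, tuple):
        f, g = e
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 's':
            return ((f[0][1], g), (f[1], g))   # s[x][y][z] -> x[z][y[z]]
        if isinstance(f, tuple) and f[0] == 'k':
            return f[1]                        # k[x][y] -> x
        r = lo_step(f)
        if r is not None:
            return (r, g)
        r = lo_step(g)
        if r is not None:
            return (f, r)
    return None

def successors(e):
    # all single-step results, applying a rule at every possible position
    out = []
    if isinstance(e, tuple):
        f, g = e
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 's':
            out.append(((f[0][1], g), (f[1], g)))
        if isinstance(f, tuple) and f[0] == 'k':
            out.append(f[1])
        out += [(f2, g) for f2 in successors(f)]
        out += [(f, g2) for g2 in successors(g)]
    return out

def lo_steps(e):
    n = 0
    while (e2 := lo_step(e)) is not None:
        e, n = e2, n + 1
    return n

def bfs_distance(start, target):
    # shortest number of rewrite events in the multiway graph
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in successors(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None

xx = ('x', 'x')
e1 = ap('s', ('s', ('s', 's')), ('k', ('s', 'k')), 's', 'x')  # s[s[s[s]]][k[s[k]]][s][x]
print(lo_steps(e1), bfs_distance(e1, xx))  # the multiway path is strictly shorter

e2 = ap('s', ('s', 's'), 's', ('s', 'k'), 'x')  # the size-6 program from above
print(lo_steps(e2), bfs_distance(e2, xx))  # here leftmost-outermost is already fastest
```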
To get a sense of the “amount of nondeterminism” that can occur, we can look at the number of nodes in successive layers of the multiway graph—essentially the number of “parallel threads” present at each “nondeterministic step”:
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"]; ListStepPlot[ ResourceFunction["MultiwayCombinator"][{s[x_][y_][z_] :> x[z][y[z]], k[x_][y_] :> x}, s[s[s[s]]][k[s[k]]][s][x], 16, "StatesCountsList"], Center, Frame -> True, Filling -> Axis, FillingStyle -> $PlotStyles["ListPlot", "FillingStyleLight"], PlotStyle -> $PlotStyles["ListPlot", "PlotStyle"], ImageSize -> 190]
What about size-8 programs for x[x]? There are 9 more—similar to this one—where the nondeterministic computation is one step shorter. (Sometimes—as for s[s[s]][s][s[k[k][s]]]—the multiway graph is more complicated, in this case having 1661 nodes.)
But there are some other things that happen. And a dramatic one is that there can be paths that just don’t terminate at all. s[s[s[s][s]]][s][s[k]] gives an example. Leftmost-outermost evaluation reaches a fixed point after 14 steps. But overall the multiway graph grows exponentially (already having size 24,705 after 14 steps)—yielding eventually an infinite number of infinite paths: nondeterministic threads that in a sense get “lost forever”.
So far all we’ve talked about here is the computation of the one—seemingly trivial—object x[x]. But what about computing other things? Imagine we have a combinator expression ■ that we apply to x to form ■[x]. If when we “evaluate” this with the combinator rules it reaches a fixed point we can say this is the result of the computation. But a key point is that most of the time this “result” won’t just contain x; it’ll still have “innards of the computation”—in the form of S’s and K’s—in it.
Out of all 2688 combinator expressions of size 6, 224 compute x. Only one (that we saw above) computes something more complicated: x[x]. At size 7, there are 11 programs that compute x[x], and 4 that compute x[x][x]. At size 8 the things that can be computed are:
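This tally can be reconstructed by brute force. Here is a Python sketch of my own (the reduction is capped rather than run forever, so apparently nonterminating cases are simply discarded): enumerate all S,K expressions of size 6, apply each to x, reduce leftmost-outermost, and count the fixed points:

```python
import sys
sys.setrecursionlimit(20000)

def lo_step(e):
    # one leftmost-outermost step; None at a fixed point
    if isinstance(e, tuple):
        f, g = e
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 's':
            return ((f[0][1], g), (f[1], g))   # s[x][y][z] -> x[z][y[z]]
        if isinstance(f, tuple) and f[0] == 'k':
            return f[1]                        # k[x][y] -> x
        r = lo_step(f)
        if r is not None:
            return (r, g)
        r = lo_step(g)
        if r is not None:
            return (f, r)
    return None

def size(e):
    # iterative leaf count, safe for deep trees
    total, stack = 0, [e]
    while stack:
        t = stack.pop()
        if isinstance(t, tuple):
            stack.extend(t)
        else:
            total += 1
    return total

def enumerate_exprs(n):
    # all S,K expressions with n leaves
    if n == 1:
        return ['s', 'k']
    return [(f, g) for i in range(1, n)
            for f in enumerate_exprs(i) for g in enumerate_exprs(n - i)]

def result(e, max_steps=50, max_size=2000):
    # reduce to a fixed point, giving up on (apparent) nontermination
    for _ in range(max_steps):
        e2 = lo_step(e)
        if e2 is None:
            return e
        if size(e2) > max_size:
            return None
        e = e2
    return None

exprs = enumerate_exprs(6)
results = [result((e, 'x')) for e in exprs]
print(len(exprs), results.count('x'), results.count(('x', 'x')))  # 2688 224 1
```

The unique size-6 program whose fixed point is x[x] is, as above, s[s[s]][s][s[k]].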
At size 9 the result is:
In a sense what we’re seeing here are the expressions (or objects) of “low algorithmic information content” with respect to combinator computation: those for which the shortest combinator program that generates them is just of length 9. In addition to shortest program length, we can also ask about expressions generated within certain time or intermediate-expression-size constraints.
What about the other way around? How large a program does one need to generate a certain object? We know that x[x] can be generated with a program of size 6. It turns out x[x[x]] needs a program of size 8:
Here are the shortest programs for objects of size 4:
Our original “straightforward compiler” generates considerably longer programs: to get an object involving only x’s of size n it produces a program of length 4n – 1 (i.e. 15 in this case).
It’s interesting to compare the different situations here. x[x[x]][x[x]][x[x[x[x]][x[x]]]][x[x[x[x]][x[x]]]] (of size 17) can be generated by the program s[s[s]][s][s[s][s[k]]] (of size 8). But the shortest program that can generate x[x[x[x]]] (size 4) is of length 10. And what we’re seeing is that different objects can have very different levels of “algorithmic redundancy” under combinator computation.
Clearly we could go on to investigate objects that involve not just x, but also y, etc. And in general there’s lots of empirical computation theory that one can expect to do with combinators.
As one last example, one can ask how large a combinator expression is needed to “build to a certain size”, in the sense that the combinator expression evolves to a fixed point with that size. Here is the result for all sizes up to 100, both for S,K expressions, and for expressions with S alone (the dotted line is ):

By the way, we can also ask about programs that involve only S, without K. If one wants ■[x] to evaluate to an expression involving only x this isn’t possible if one only uses S. But as we discussed above, it’s still perfectly possible to imagine “doing a computation” only using S: one just can’t expect to have the result delivered directly on its own. Instead, one must run some kind of procedure to extract the result from a “wrapper” that contains S’s.
What about practical computations? The most obvious implementation of combinators on standard modern computer systems isn’t very efficient because it tends to involve extensive copying of expressions. But by using things like the DAG approach discussed above it’s perfectly possible to make it efficient.
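One way such a DAG approach can work is hash consing: store each distinct subexpression exactly once, so that a reduction which duplicates an argument duplicates only a reference, never a copy. A minimal illustrative Python sketch (not an actual efficient implementation, and all names here are hypothetical):

```python
class Node:
    """An application head[arg] stored as one shared DAG node."""
    __slots__ = ("head", "arg")
    def __init__(self, head, arg):
        self.head = head
        self.arg = arg

_pool = {}

def _key(x):
    # leaves are strings like "S", "K", "x"; internal nodes are unique objects
    return x if isinstance(x, str) else id(x)

def app(head, arg):
    """The unique shared node representing the application head[arg]."""
    key = (_key(head), _key(arg))
    if key not in _pool:
        _pool[key] = Node(head, arg)
    return _pool[key]

# building S[x][x] twice yields the very same object, so a rule like
# S[p][q][r] -> p[r][q[r]] can duplicate r by reference rather than by copying
a = app(app("S", "x"), "x")
b = app(app("S", "x"), "x")
assert a is b
```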
What about physical systems? Is there a way to do “intrinsically combinator” computation? As I discussed above, our model of fundamental physics doesn’t quite align with combinators. But closer would be computations that can be done with molecules. Imagine a molecule with a certain structure. Now imagine that another molecule reacts with it to produce a molecule with a new structure. If the molecules were treelike dendrimers, it’s at least conceivable that one can get something like a combinator transformation process.
I’ve been interested for decades in using ideas gleaned from exploring the computational universe to do molecular-scale computation. Combinators as such probably aren’t the best “raw material”, but understanding how computation works with combinators is likely to be helpful.
And just for fun we can imagine taking actual expressions—say from the evolution of s[s][s][s[s]][s][s]—and converting them to “molecules” just using standard chemical SMILES strings (with C in place of S):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"];
CombinatorTraditionalForm /@ CombinatorEvolveList[s[s][s][s[s]][s][s], 10]

S and K at first seem so simple, so basic. But as we’ve seen here, there’s an immense richness to what they can do. It’s a story I’ve seen played out many times across the computational universe. But in a sense it’s particularly remarkable for combinators because they were invented so early, and they seem so very simple.
There’s little question that even a century after they were invented, combinators are still hard to get one’s head around. Perhaps if computation and computer technology had developed differently, we’d now find combinators easier to understand. Or perhaps the way our brains are made, they’re just intrinsically difficult.
In a sense what makes combinators particularly difficult is the extent to which they’re both featureless and fundamentally dynamic in their structure. When we apply the ideas of combinators in practical “human-oriented” computing—for example in the Wolfram Language—we annotate what’s going on in a variety of ways. But with the Wolfram Physics Project we now have the idea that what happens at the lowest level in the physics of our universe is something much more like “raw combinators”.
The details are different—we’re dealing with hypergraphs, not trees—but many of the concepts are remarkably similar. Yes, a universe made with combinators probably won’t have anything like space in the way we experience it. But a lot of ideas about updating processes and multiway systems are all there in combinators.
For most of their history, combinators have been treated mainly as a kind of backstop for proofs. Yes, it is possible to avoid variables, construct everything symbolically, etc. But a century after they were invented, we can now see that combinators in their own right have much to contribute.
What happens if we don’t just think about combinators in general, but actually look at what specific combinators do? What happens if we do experiments on combinators? In the past some elaborate behavior of a particular combinator expression might have just seemed like a curiosity. But now that we have the whole paradigm that I’ve developed from studying the computational universe we can see how such things fit in, and help build up a coherent story about the ways of computation.
In A New Kind of Science I looked a bit at the behavior of combinators; here I’ve done more. But there’s still vastly more to explore in the combinator universe—and many surprises yet to uncover. Doing it will both advance the general science of the computational universe, and will give us a new palette of phenomena and intuition with which to think about other computational systems.
There are things to learn for physics. There are things to learn for language design. There are things to learn about the theoretical foundations of computer science. There may also be things to learn for models of concrete systems in the natural and artificial world—and for the construction of useful technology.
As we look at different kinds of computational systems, several stand out for their minimalism. Particularly notable in the past have been cellular automata, Turing machines and string substitution systems. And now there are also the systems from our Wolfram Physics Project—that seem destined to have all sorts of implications even far beyond physics. And there are also combinators.
One can think of cellular automata, for example, as minimal systems that are intrinsically organized in space and time. The systems from our Wolfram Physics Project are minimal systems that purely capture relations between things. And combinators are in a sense minimal systems that are intrinsically about programs—and whose fundamental structure and operation revolve around the symbolic representation of programs.
What can be done with such things? How should we think about them?
Despite the passage of a century—and a substantial body of academic work—we’re still just at the beginning of understanding what can be done with combinators. There’s a rich and fertile future ahead, as we begin the second combinator century, now equipped with the ideas of symbolic computational language, the phenomena of the computational universe, and the computational character of fundamental physics.
I’m writing elsewhere about the origin of combinators, and about their interaction with the history of computation. But here let me make some remarks more specific to this piece.
Combinators were invented in 1920 by Moses Schönfinkel (hence the centenary), and since the late 1920s there’s been continuous academic work on them—notably over more than half a century by Haskell Curry.
A classic summary of combinators from a mathematical point of view is the book: Haskell B. Curry and Robert Feys, Combinatory Logic (1958). More recent treatments (also of lambda calculus) include: H. P. Barendregt, The Lambda Calculus (1981) and J. Roger Hindley and Jonathan P. Seldin, Lambda-Calculus and Combinators (1986).
In the combinator literature, what I call “combinator expressions” are often called “terms” (as in “term rewriting systems”). The part of the expression that gets rewritten is often called the “redex”; the parts that get left over are sometimes called the “residuals”. The fixed point to which a combinator expression evolves is often called its “normal form”, and expressions that reach fixed points are called “normalizing”.
Forms like a[b[a][c]] that I “immediately apply to arguments” are basically lambda expressions, written in Wolfram Language using Function. The procedure of “compiling” from lambda expressions to combinators is sometimes called bracket abstraction. As indicated by examples at the end of this piece, there are many possible methods for doing this.
The scheme for doing arithmetic with combinators at the beginning of this piece is based on work by Alonzo Church in the 1930s, and uses so-called “Church numerals”. The idea of encoding logic by combinators was discussed by Schönfinkel in his original paper, though the specific minimal encoding I give was something I found by explicit computational search in just the past few weeks. Note that if one uses s[k] for True and k for False (as in the rule 110 cellular automaton encoding) the minimal forms for the Boolean operators are:

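For reference, the Church-numeral scheme itself is easy to sketch with ordinary lambdas, shown here in Python; the combinator versions in this piece are in effect what one gets after compiling such lambdas down to S,K form:

```python
# The Church numeral n is the function that applies f to x exactly n times.
zero = lambda f: lambda x: x

def succ(n):
    """Successor: one more application of f."""
    return lambda f: lambda x: f(n(f)(x))

def plus(m, n):
    """Addition: m applications of f, then n more."""
    return lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Decode a Church numeral by counting applications."""
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
```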
The uniqueness of the fixed point for combinators is a consequence of the Church–Rosser property for combinators from 1941. It is closely related to the causal invariance property that appears in our model of physics.
There’s been a steady stream of specific combinators defined for particular mathematical purposes. An example is the Y combinator s[s][k][s[k[s[s][s[s[s][k]]]]][k]], which has the property that for any x, Y[x] can be proved to be equivalent to x[Y[x]], and “recurses forever”. Here’s how Y[x] grows if one just runs it with leftmost-outermost evaluation (and it produces expressions of the form Nest[x, _, n] at step n^{2} + 7n):
CloudGet["https://www.wolframcloud.com/obj/swblog/Combinators/Programs.wl"];
CombinatorEvolutionPlot[
 CombinatorEvolveList[s[s][k][s[k[s[s][s[s[s][k]]]]][k]][x], 100],
 "SizeAndMatches"]
The Y combinator was notably used by Paul Graham in 2005 to name his Y Combinator startup accelerator. And perhaps channeling the aspirations of startups, the “actual” Y combinator goes through many ups and downs but (with leftmost-outermost evaluation) reaches size 1 billion (“unicorn”) after 494 steps—and after 1284 steps reaches more-dollars-than-in-the-world size: 508,107,499,710,983.
Empirical studies of the actual behavior of combinators “in the wild” have been pretty sparse. The vast majority of academic work on combinators has been done by hand, and without the overall framework of A New Kind of Science the detailed behavior of actual combinators mostly just seemed like a curiosity.
I did fairly extensive computational exploration of combinators (and in general what I called “symbolic systems”) in the 1990s for A New Kind of Science. Page 712 summarized some combinator behavior I found (with /. evaluation):
I don’t know to what extent the combinator results in A New Kind of Science were anticipated elsewhere. Longtime combinator enthusiast Henk Barendregt for example recently pointed me to a paper of his from 1976 mentioning nontermination in S combinator expressions:
The procedure I describe for determining the termination of S combinator expressions was invented by Johannes Waldmann at the end of the 1990s. (The detailed version that I used here came from Jörg Endrullis.)
What we call multiway systems have been studied in different ways in different fields, under different names. In the case of combinators, they are basically Böhm trees (named after Corrado Böhm).
I’ve concentrated here on the original S, K combinators; in recent livestreams, as in A New Kind of Science, I’ve also been exploring other combinator rules.
Matthew Szudzik has helped me with combinator matters since 1998 (and has given a lecture on combinators almost every year for the past 18 years at our Wolfram Summer School). Roman Maeder did a demo implementation of combinators in Mathematica in 1988, and has now added CombinatorS etc. to Version 12.2 of Wolfram Language.
I’ve had specific help on this piece from Jonathan Gorard, Jose Martin-Garcia, Eric Paul, Ed Pegg, Max Piskunov, and particularly Mano Namuduri, as well as Jeremy Davis, Sushma Kini, Amy Simpson and Jessica Wong. We’ve had recent interactions about combinators with a four-academic-generation sequence of combinator researchers: Henk Barendregt, Jan Willem Klop, Jörg Endrullis and Roy Overbeek.
In preparing my keynote at our 31st annual technology conference, I tried to collect some of my thoughts about our long-term mission and how I view the opportunities it is creating…
I’ve been fortunate to live at a time in history when there’s a transformational intellectual development: the rise of computation and the computational paradigm. And I’ve devoted my adult life to doing what I can to make computation and the computational method achieve their potential, both intellectually and in the world at large. I’ve alternated (about five times so far) between doing this with basic science and with practical technology, each time building on what I’ve been able to do before.
The basic science has shown me the immense power and potential of what’s out there in the computational universe: the capability of even simple programs to generate behavior of immense complexity, including, I now believe, the fundamental physics of our whole universe. But how can we humans harness all that power and potential? How do we use the computational universe to achieve things we want: to take our human objectives and automate achieving them?
I’ve now spent four decades in an effort to build a bridge between what’s possible with computation, and what we humans care about and think about. It’s a story of technology, but it’s also a story of big and deep ideas. And the result has been the creation of the first and only full-scale computational language—that we now call the Wolfram Language.
The goal of our computational language is to define a medium for expressing our thoughts in computational terms—whether they be about abstract things or real things in the actual world. We want a language that both helps us think in a new way, and lets us communicate with actual computers that can automate working out their consequences. It’s a powerful combination, not really like anything seen before in history.
When I began on this path a little more than forty years ago, I only understood a small part of what a full-scale computational language would give us, and just how far it would diverge from the aspirations of programming languages. But with every passing year—particularly as we develop our language ever further—I see yet more of what’s possible. Along the way it’s brought us Mathematica, Wolfram|Alpha, my A New Kind of Science and now our Physics Project. It’s delivered the tools for countless inventions and discoveries, as well as the education of several generations of students. And it’s become a unique part of the technology stack for some of the world’s largest companies.
And, yes, it’s nice to see that validation of the bold vision of computational language. But even after all these years we’re still only at the very beginning of what’s possible. Computation has the potential to change so much for so many people. For every field X there’s going to be a computational X, and it’s going to be dramatically more powerful, more accessible, and more general than anything that came before. We’re seeing a major watershed in intellectual history.
There was a precursor four hundred years ago—when mathematical notation for the first time provided a streamlined way to represent and think about mathematics, and led to algebra, calculus and the mathematical sciences and engineering we have today. But computation is much bigger than mathematics, with much more far-reaching consequences. It affects not just the “technical layer” of understanding the world, but the full spectrum of how we think about the world, what we can create in it and what can happen in it. And now, with our computational language, we have a medium—a notation—for humans and computers to together take advantage of this.
We’re at a moment of great potential. For the first time, we have broad access to the power of the computational paradigm. But just what can be done with this, and by whom? There’s been a trend for the front lines of thinking to become increasingly specialized and inaccessible. But rather like literacy half a millennium ago, computation and computational language provide the potential to open things up: to have a framework in which pretty much anyone can partake in front-level thinking, now with the clarity and concreteness of computation, and with the practical assistance of computers.
The arrival of the computational paradigm—and computational language—is the single largest change in content to have happened since the advent of public education a century or so ago. But whatever practical difficulty it may cause, I view it as a critical responsibility to educate future generations to be able to take advantage of the power of computation—and to make the rise of computation and everything it brings be what our time in history is most remembered for.
In the history of ideas, some things are inexorable. And the rise of the computational paradigm is one of those things. I have seen it myself over the course of nearly half a century. From “computation is just for specialists”, to “computation is useful in lots of places”, to “everyone should know about computation”, to a dawning awareness that “computation is a way of thinking about the world”. But this is just a foretaste of what is to come.
Computation is an incredibly general and powerful concept—which now indeed appears to be fundamental to our whole universe—and it seems inevitable that in time computation will provide the framework for describing and thinking about pretty much everything. But how will this actually work? We already know: computational language is the key.
And there is an inexorability to this as well. In the early days of computing, one programmed directly in the machine code of the computer. But slowly programming languages developed that gave us more convenient ways to describe and organize what we wanted to tell computers to do. Over time the languages gradually got higher and higher level, abstracting further and further away from the details of the operations of the computer.
It’s a pretty big jump to go to our modern conception of computational language, but it’s an inevitable one. Unlike programming languages—which are about describing what computers should do—my concept with the Wolfram Language is to have a way to represent everything in computational terms, for both computers and humans.
Over the past 40 years I’ve gradually understood more and more about how to construct a computer language for everything, and gradually we’ve been covering more and more with the Wolfram Language. But the endpoint is clear: to have a symbolic, computational representation for everything we humans choose to describe and work with in the world.
Some parts of this vision were absorbed quickly after we first delivered them. Mathematica as a “system for doing mathematics by computer” took only a few years to sweep across theoretical science. But even our concept of notebooks (which I always considered quite straightforward) took a solid quarter of a century to be widely absorbed, and copied.
Part of my original motivation for building the Wolfram Language was to have a tool that I could use myself, and it has turned out to be vastly more powerful than I could ever have imagined. It’s always a pleasure to see what people do with the Wolfram Language. Whether they’re distinguished leaders in their fields, or young students, they somehow seem to have a superpower that they can apply.
Yes, at some point in the future, the whole concept of computational language—and what we’ve done with Wolfram Language—will be part of what everyone takes for granted. But even after all these years, pretty much whenever I demo what we can do, many people still seem to think it’s magic. It’s as if I’m bringing an artifact from the future.
For oneself there’s no question that it’s fun—and valuable—to have an artifact from the future to use. But I feel a strong responsibility to try to bring everyone to the future—and to let everyone take advantage of the power of the computational paradigm as soon as possible.
I used to think that it wouldn’t take too long for this to just happen. But I’m realizing that the timescales are much, much longer than I imagined. Our physics project, for example, I first conceptualized 25 years ago, and nearly 20 years ago millions of people were exposed to it. Yet had it not been for a fortunate coincidence a year or so ago, I think the project could easily have languished for 50 years.
What about the whole concept of computational language? Some parts of it are quickly absorbed. But the further and further we go, the longer it’s going to take for the full story to be absorbed, and at this point it seems we’re looking at timescales of at least 50 years and perhaps 100 or more.
I’ve always wanted to build the best engine for innovation that I can. And for the past 34 years that’s been our company—which I’ve worked hard to optimize to consistently develop and deliver the best technology we can. I’ve considered other models, but what we’ve built seems basically unique in its ability to consistently sustain highly innovative R&D over the course of decades.
Over the years, our company has become more and more of an outlier in the technology world. Yes, we’re a company. But our focus is not so much commercial as intellectual. And I view what we’re doing more as a mission than a business. We want to build the computational future, and we want to do that by creating the technology to make that possible.
By now we’ve built a tower that reaches into the distant future, and we’re energetically working to extend it even further. It’s wonderful to see our community of users enabled by what we’re building—and to see the things they’re able to do.
But so far it’s still a comparatively small number of people who can harness artifacts from the future to do magic today. At some level it’s a shame it isn’t more widespread. But of course, it creates some amazing opportunities.
Who will bring computational language to this or that field? Who will write the definitive book or do the definitive research that leverages computational language in some particular way? Who will have the pleasure of seeing all those epiphanies as, one by one, people learn what the computational paradigm can do? Who will really develop the large-scale communities and disciplines enabled by the computational paradigm?
It has been wonderful to plant the seeds to make all these things possible, and I personally look forward to continuing to push further into the computational future. But more than that, I hope to see an increasing number of other people take advantage of all the opportunities there are for bringing what now seem like artifacts from the future to the benefit of the world today.
When the NASA Innovative Advanced Concepts Program asked me to keynote their annual conference I thought it would be a good excuse to spend some time on a question I’ve always wanted to explore…
“So you think you have a fundamental theory of physics. Well, then tell us if warp drive is possible!” Despite the hopes and assumptions of science fiction, real physics has for at least a century almost universally assumed that no genuine effect can ever propagate through physical space any faster than light. But is this actually true? We’re now in a position to analyze this in the context of our model for fundamental physics. And I’ll say at the outset that it’s a subtle and complicated question, and I don’t know the full answer yet.
But I increasingly suspect that going faster than light is not a physical impossibility; instead, in a sense, doing it is “just” an engineering problem. But it may well be an irreducibly hard engineering problem. And one that can’t be solved with the computational resources available to us in our universe. But it’s also conceivable that there may be some clever “engineering solution”, as there have been to so many seemingly insuperable engineering problems in the past. And that in fact there is a way to “move through space” faster than light.
It’s a little tricky even to define what it means to “go faster than light”. Do we allow an existing “space tunnel” (like the wormholes of general relativity)? Perhaps a space tunnel that has been there since the beginning of the universe. Or even if no space tunnel already exists, do we allow the possibility of building one—that we can then travel through? I’ll discuss these possibilities later. But the most dramatic possibility is that even if one’s going where “no one has gone before”, it might still be possible to traverse space faster than light to get there.
To give a preview of why doing this might devolve into an “engineering problem”, let’s consider a loose (but, in the end, not quite so loose) analogy. Imagine you’ve got molecules of gas in a room, all bouncing around and colliding with each other. Now imagine there’s a special molecule—or even a tiny speck of dust or a virus particle—somewhere in the room. Normally the special molecule will be buffeted by the molecules in the air, and will move in some kind of random walk, gradually diffusing across the room. But imagine that the special molecule somehow knows enough about the motion of the air molecules that it can compute exactly where to go to avoid being buffeted. Then that special molecule can travel much faster than diffusion—and effectively make a beeline from one side of the room to the other.
Of course this requires more knowledge and more computation than we currently imagine something like a molecule can muster (though it’s not clear this is true when we start thinking about explicitly constructing moleculescale computers). But the point is that the limit on the speed of the molecule is less a question of what’s physically possible, and more a question of what’s “engineerable”.
And so, I suspect, it is with space, and motion through space. Like our room full of air molecules, space in our theory of physics has a complex structure with many component parts that act in seemingly (but not actually) random ways. And in our theory the question of whether we can “move through space” faster than light can then be thought of as becoming a question of whether there can exist a “space demon” that can find ways to do computations fast enough to be able to successfully “hack space”.
But before we can discuss this further, we have to talk about just what space—and time—are in our models.
In standard physics, space (and the “spacetime continuum”) is just a background on which everything exists. Mathematically, it’s thought of as a manifold, in which every possible position can ultimately be labeled by 3 coordinate values. In our model, space is different. It’s not just a background; it’s got definite, intrinsic structure. And in fact everything in the universe is ultimately defined by that structure; in fact, at some level, everything is just “made of space”.
We might think of something like water as being a continuous fluid. But we know that at a small scale it’s actually made of discrete molecules. And so it is, I suspect, with space. At a small enough scale, there are actually discrete “atoms of space”—and only on a large scale does space appear to be continuous.
In our model, the “atoms of space” correspond to abstract elements whose only property is their relation to other abstract elements. Mathematically the structure can be thought of as a hypergraph, where the atoms of space are nodes, which are related by hyperedges to other nodes. On a very small scale we might have for example:
Graph3D[Rule @@@ ResourceFunction["WolframModel"][
   {{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}},
   {{0, 0}, {0, 0}}, 5, "FinalState"],
 GraphLayout -> "SpringElectricalEmbedding"]
On a slightly larger scale we might have:
Graph3D[Rule @@@ ResourceFunction["WolframModel"][
   {{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}},
   {{0, 0}, {0, 0}}, 12, "FinalState"]]
And in our actual universe we might have a hypergraph with perhaps 10^{400} nodes.
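For intuition about how such rules operate, here is a deliberately simplified, sequential sketch in Python of the rule used above, applying one updating event at a time (the actual WolframModel machinery is far more general; the function names here are hypothetical):

```python
# Hyperedges are pairs. An event consumes two edges {x,y},{x,z} sharing their
# first node and produces {x,z},{x,w},{y,w},{z,w}, with w a fresh node.

def apply_event(edges, fresh):
    """Rewrite the first matching edge pair; return (new_edges, next_fresh)."""
    for i in range(len(edges)):
        for j in range(len(edges)):
            if i != j and edges[i][0] == edges[j][0]:
                (x, y), (_, z) = edges[i], edges[j]
                w = fresh
                rest = [e for k, e in enumerate(edges) if k not in (i, j)]
                return rest + [(x, z), (x, w), (y, w), (z, w)], fresh + 1
    return edges, fresh  # no match: the hypergraph is at a fixed point

# start from the double self-loop {{0,0},{0,0}} and run a few events;
# each event removes 2 edges and adds 4, so the hypergraph grows by 2 per step
edges, fresh = [(0, 0), (0, 0)], 1
for _ in range(5):
    edges, fresh = apply_event(edges, fresh)
```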
How does a giant hypergraph behave like continuous space? In a case like this we can see that the nodes can be thought of as forming a 2D grid on a (curved) surface:
ResourceFunction["WolframModel"][
  {{1, 2, 3}, {4, 2, 5}} -> {{6, 3, 1}, {3, 6, 4}, {1, 2, 6}},
  {{0, 0, 0}, {0, 0, 0}}, 1000, "FinalStatePlot"]
There’s nothing intrinsic about our model of space that determines the effective dimensionality it will have. These are all perfectly good possible (hyper)graphs, but on a large scale they behave like space in different numbers of dimensions:
Table[GridGraph[Table[10, n]], {n, 1, 3}]
It’s convenient to introduce the notion of a “geodesic ball”: the region in a (hyper)graph that one reaches by following at most r connections in the (hyper)graph. A key fact is that in a (hyper)graph that limits to d-dimensional space, the number of nodes in the geodesic ball grows like r^{d}. In a curved space (say, on the surface of a sphere) there’s a correction to r^{d}, proportional to the curvature of the space.
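The r^{d} growth is easy to check in the simplest case of a d-dimensional grid, where the geodesic ball is just the set of lattice points within graph (L1) distance r of a point (an illustrative Python check, not the hypergraph computation itself):

```python
from itertools import product

def ball_size(d, r):
    """Number of nodes of the Z^d grid within graph (L1) distance r of 0."""
    return sum(1 for p in product(range(-r, r + 1), repeat=d)
               if sum(abs(c) for c in p) <= r)

# in 1D the ball is an interval of 2r + 1 nodes;
# in 2D it is a diamond of 2r^2 + 2r + 1 nodes, i.e. growing like r^2
```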
The full story is quite long, but ultimately what happens is that—much as we can derive the properties of a fluid from the largescale aggregate dynamics of lots of discrete molecules—so we can derive the properties of space from the largescale aggregate dynamics of lots of nodes in our hypergraphs. And—excitingly enough—it seems that we get exactly Einstein’s equations from general relativity.
OK, so if space is a collection of elements laid out in a “spatial hypergraph”, what is time? Unlike in standard physics, it’s something initially very different. It’s a reflection of the process of computation by which the spatial hypergraph is progressively updated.
Let’s say our underlying rule for updating the hypergraph is:
RulePlot[ResourceFunction["WolframModel"][
  {{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}]]
Here’s a representation of the results of a sequence of updates according to this:
Flatten[With[{eo = ResourceFunction["WolframModel"][
     {{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}},
     {{0, 0}, {0, 0}}, 4]},
  TakeList[eo["EventsStatesPlotsList", ImageSize -> Tiny],
   eo["GenerationEventsCountList", "IncludeBoundaryEvents" -> "Initial"]]]]
Going further we’ll get for example:
ResourceFunction["WolframModel"][
  {{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}},
  {{1, 1}, {1, 1}}, 10]["StatesPlotsList", "MaxImageSize" -> 100]
But there’s a crucial point here. The underlying rule just defines how a local piece of hypergraph that has a particular form should be updated. If there are several pieces of hypergraph that have that form, it doesn’t say anything about which of them should be updated first. But once we’ve done a particular update, that can affect subsequent updates—and in general there’s a whole “causal graph” of causal relationships between updates.
We can see what’s going on a little more easily if instead of using spatial hypergraphs we just use strings of characters. Here we’re updating a string by repeatedly applying the (“sorting”) rule BA → AB:
evo = (SeedRandom[2424];
   ResourceFunction["SubstitutionSystemCausalEvolution"][{"BA" -> "AB"},
    "BBAAAABAABBABBBBBAAA", 10, {"Random", 4}]);
ResourceFunction["SubstitutionSystemCausalPlot"][evo,
 EventLabels -> False, CellLabels -> True, CausalGraph -> False]
The yellow boxes indicate “updating events”, and we can join them by a causal graph that represents which event affects which other ones:
evo = (SeedRandom[2424];
   ResourceFunction["SubstitutionSystemCausalEvolution"][{"BA" -> "AB"},
    "BBAAAABAABBABBBBBAAA", 10, {"Random", 4}]);
ResourceFunction["SubstitutionSystemCausalPlot"][evo,
 EventLabels -> False, CellLabels -> False, CausalGraph -> True]
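The updating process itself (though not the plots) is easy to reproduce. With a leftmost-first updating order, each BA → AB event removes exactly one B-before-A inversion, so the total number of events equals the number of inversions in the initial string. A Python sketch (the function name is hypothetical):

```python
def evolve(s):
    """Repeatedly rewrite the leftmost 'BA' to 'AB' until a fixed point.
    Returns (fixed_point, number_of_updating_events)."""
    events = 0
    i = s.find("BA")
    while i != -1:
        s = s[:i] + "AB" + s[i + 2:]
        events += 1
        i = s.find("BA")
    return s, events
```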
If we’re an observer inside this system, all we can directly tell is what events are occurring, and how they’re causally connected. But to set up a description of what’s going on, it’s convenient to be able to talk about certain events happening “at a certain time”, and others happening later. Or, in other words, we want to define some kind of “simultaneity surfaces”—or a “reference frame”.
Here are two choices for how to do this:
CloudGet["https://wolfr.am/KVkTxvC5"]; CloudGet["https://wolfr.am/KVl97Tf4"]; Show[regularCausalGraphPlot[10, {1, 0}, {#, 0.0}, lorentz[0]], ImageSize -> 330] & /@ {0., .3}
where the second one can be reinterpreted as:
CloudGet["https://wolfr.am/KVkTxvC5"]; CloudGet["https://wolfr.am/KVl97Tf4"]; regularCausalGraphPlot[10, {1, 0}, {0.3, 0.0}, lorentz[0.3]]
And, yes, this can be thought of as corresponding to a reference frame with a different speed, just like in standard special relativity. But now there’s a crucial point. The particular rule we’ve used here is an example of one with the property of causal invariance—which means that it doesn’t matter “at what time” we do a particular update; we’ll always get the same causal graph. And this is why—even though space and time start out so differently in our models—we end up being able to derive the fact that they follow special relativity.
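Causal invariance is closely related to the confluence of rewriting systems: whatever order the updates are applied in, the eventual outcome is the same. For the sorting rule this is easy to check by brute force. The following Python sketch (an independent illustration, not the post's own test, which compares causal graphs rather than final states) enumerates every possible update order and collects the terminal strings:

```python
def apply_all(s, rule=("BA", "AB")):
    """All strings obtainable from s by one application of the rule."""
    lhs, rhs = rule
    return {s[:i] + rhs + s[i + len(lhs):]
            for i in range(len(s) - len(lhs) + 1) if s[i:i + len(lhs)] == lhs}

def normal_forms(s, memo=None):
    """All terminal strings reachable from s, over every possible update order."""
    if memo is None:
        memo = {}
    if s in memo:
        return memo[s]
    succ = apply_all(s)
    memo[s] = {s} if not succ else set().union(*(normal_forms(t, memo) for t in succ))
    return memo[s]

print(normal_forms("BABAB"))   # a single normal form: every order agrees
```

For BA → AB there is always exactly one normal form (the sorted string), which is the final-state shadow of causal invariance.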
Given a reference frame, we can always “reconstruct” a view of the behavior of the system from the causal graph. In the cases shown here we’d get:
CloudGet["https://wolfr.am/LbaDFVSn"]; GraphicsRow[Show[ResourceFunction["SubstitutionSystemCausalPlot"][boostedEvolution[ResourceFunction["SubstitutionSystemCausalEvolution"][{"BA" -> "AB"}, StringRepeat["BA", 10], 5], #], EventLabels -> False, CellLabels -> True, CausalGraph -> False], ImageSize -> {250, Automatic}] & /@ {0., 0.3}, Alignment -> Top]
And the fact that the system seems to “take longer to do its thing” in the second reference frame is precisely a reflection of relativistic time dilation in that frame.
Just as with strings, we can also draw causal graphs to represent the causal relationships between updating events in spatial hypergraphs. Here’s an example of what we get for the rule shown above:
ResourceFunction["WolframModel"][{{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}}, {{0, 0}, {0, 0}}, 7]["LayeredCausalGraph", AspectRatio -> 1/2]
And once again we can set up reference frames to define what events we want to consider “simultaneous”. The only fundamental constraint on our reference frames is that in each slice of the “foliation” that defines the reference frame there can never be two events in which one follows from the other. Or, in the language of relativity, no events in a given slice can be timelike separated; instead, all of them must be spacelike separated, so that the slice defines a purely spacelike hypersurface.
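In graph terms, the constraint is that each slice of the foliation must be an antichain of the causal partial order. Here is a small Python sketch of that check (an illustration with a toy causal graph, not code from the post), using forward reachability to test whether any event in a proposed slice lies in the causal future of another:

```python
from itertools import combinations

def reachable(adj, src):
    """All events in the causal future of src (forward reachability)."""
    seen, stack = set(), [src]
    while stack:
        for w in adj.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def is_spacelike(adj, events):
    """True if no event in the slice is in the causal future of another --
    the constraint that every foliation slice must satisfy."""
    return all(b not in reachable(adj, a) and a not in reachable(adj, b)
               for a, b in combinations(events, 2))

# toy causal graph: event 1 causes 2 and 3, which both cause 4
adj = {1: [2, 3], 2: [4], 3: [4]}
```

With this graph, {2, 3} is a valid (spacelike) slice, while any slice containing both 1 and 4, or 2 and 4, is not.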
In drawing a causal graph like the one above, we’re picking a particular collection of relative orderings of different possible updating events in the spatial hypergraph. But why one choice and not another? A key feature of our models is that actually we can think of all possible orderings as being done; or, said differently, we can construct a whole multiway graph of possibilities. Here’s what the multiway graph looks like for the string system above:
LayeredGraphPlot[ResourceFunction["MultiwaySystem"][{"BA" -> "AB"}, "BBABBAA", 8, "StatesGraph"], AspectRatio -> 1]
Each node in this multiway graph represents a complete state of our system (in this case, a string), and a path through the multiway system corresponds to a possible history of the system, with a particular corresponding causal graph.
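The states-graph construction itself is a breadth-first search over all possible single updates. A minimal Python version (an illustrative sketch, not the Registry's MultiwaySystem function):

```python
def multiway(start, rule, steps):
    """Breadth-first construction of a multiway 'states graph': every state
    reachable in at most `steps` updates, with one edge per possible update."""
    lhs, rhs = rule

    def succ(s):
        return {s[:i] + rhs + s[i + len(lhs):]
                for i in range(len(s) - len(lhs) + 1) if s[i:i + len(lhs)] == lhs}

    states, edges, frontier = {start}, set(), {start}
    for _ in range(steps):
        nxt = set()
        for s in frontier:
            for t in succ(s):
                edges.add((s, t))        # one edge per applicable update
                if t not in states:
                    nxt.add(t)
        states |= nxt
        frontier = nxt
    return states, edges

states, edges = multiway("BBABBAA", ("BA", "AB"), 10)
```

Every path through the resulting edge set is one possible history; for this rule all of them converge on the fully sorted string.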
But now there’s an important connection with physics: the fact that we get a multiway graph makes quantum mechanics inevitable in our models. And it turns out that just like we can use reference frames to make sense of the evolution of our systems in space and time, so also we can use “quantum observation frames” to make sense of the time evolution of multiway graphs. But now the analog of space is what we call “branchial space”: in effect a space of possible quantum states, with the connections between states defined by their relationship on branches in the multiway system.
And much as we can define a spatial hypergraph representing relationships between “points in space”, so we can define a branchial graph that represents relationships (or “entanglements”) between quantum states, in branchial space:
LayeredGraphPlot[Graph[ResourceFunction["MultiwaySystem"][{"A" -> "AB", "B" -> "A"}, "A", 5, "EvolutionGraph"]], Epilog -> {ResourceFunction["WolframPhysicsProjectStyleData"]["BranchialGraph", "EdgeStyle"], AbsoluteThickness[1.5], Table[Line[{{-10, i}, {9, i}}], {i, .4, 5, 1.05}]}]
ResourceFunction["MultiwaySystem"][{"A" -> "AB", "B" -> "A"}, "A", 5, "BranchialGraph"]
I won’t go into the details here, but one of the beautiful things in our models is that just as we can derive the Einstein equations as a large-scale limiting description of the behavior of our spatial hypergraphs, so also we can figure out the large-scale limiting behavior for multiway systems—and it seems that we get the Feynman path integral for quantum mechanics!
By the way, since we’re talking about faster than light and motion in space, it’s worth mentioning that there’s also a notion of motion in branchial space. And just like we have the speed of light c that defines some kind of limit on how fast we can explore physical space, so also we have a maximal entanglement rate ζ that defines a limit on how fast we can explore (and thus “entangle”) different quantum states in branchial space. And just as we can ask about “faster than c”, we can also talk about “faster than ζ”. But before we get to that, we’ve got a lot of other things to discuss.
Traditional general relativity describes space as a continuous manifold that evolves according to certain partial differential equations. But our models talk about what’s underneath that, and what space actually seems to be made of. And while in appropriate limits they reproduce what general relativity says, they also imply all sorts of new and different phenomena.
Imagine that the hypergraph that represents space has the form of a simple 2D grid:
GridGraph[{15, 15}, EdgeStyle -> ResourceFunction["WolframPhysicsProjectStyleData"]["SpatialGraph", "EdgeLineStyle"], VertexStyle -> ResourceFunction["WolframPhysicsProjectStyleData"]["SpatialGraph", "VertexStyle"]]
In the limit this will be like 2D Euclidean space. But now suppose we add some extra “long-range threads” to the graph:
SeedRandom[243234]; With[{g = GridGraph[{20, 20}]}, EdgeAdd[g, UndirectedEdge @@@ Select[Table[RandomInteger[{1, VertexCount[g]}, 2], 10], GraphDistance[g, #[[1]], #[[2]]] > 8 &], EdgeStyle -> ResourceFunction["WolframPhysicsProjectStyleData"]["SpatialGraph", "EdgeLineStyle"], VertexStyle -> ResourceFunction["WolframPhysicsProjectStyleData"]["SpatialGraph", "VertexStyle"]]]
Here’s a different rendering of the same graph:
Graph3D[EdgeList[%], EdgeStyle -> ResourceFunction["WolframPhysicsProjectStyleData"]["SpatialGraph3D", "EdgeLineStyle"], VertexStyle -> ResourceFunction["WolframPhysicsProjectStyleData"]["SpatialGraph3D", "VertexStyle"]]
Now let’s ask about distances on this graph. Some nodes on the graph will have distances that are just like what one would expect in ordinary 2D space. But some will be “anomalously close”, because one will be able to get from one to another not by going “all the way through 2D space” but by taking a shortcut along one of the long-range threads.
Let’s say that we’re able to move around so that at every elementary interval of time we traverse a single connection in the graph. Then if our view of “what space is like” is based on the general structure of the graph (ignoring the longrange threads) we’ll come to some conclusion about how far we can go in a certain time—and what the maximum speed is at which we can “go through space”. But then what happens if we encounter one of the longrange threads? If we go through it we’ll be able to get from one “place in space” to another much faster than would be implied by the maximum speed we deduced from looking at “ordinary space”.
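The quantitative effect is easy to see with graph distances. The following Python sketch (an independent illustration of the idea, not the Wolfram code above) builds a 20 × 20 grid, measures the corner-to-corner distance, and then adds a single long-range thread between the corners:

```python
from collections import deque

def grid_with_thread(n, thread=None):
    """n x n grid graph as adjacency lists, plus an optional extra edge
    (a 'long-range thread') between two arbitrary nodes."""
    adj = {(i, j): [] for i in range(n) for j in range(n)}
    for i in range(n):
        for j in range(n):
            for di, dj in ((1, 0), (0, 1)):
                if i + di < n and j + dj < n:
                    adj[(i, j)].append((i + di, j + dj))
                    adj[(i + di, j + dj)].append((i, j))
    if thread:
        a, b = thread
        adj[a].append(b)
        adj[b].append(a)
    return adj

def dist(adj, src, dst):
    """Breadth-first graph distance between two nodes."""
    d, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        if v == dst:
            return d[v]
        for w in adj[v]:
            if w not in d:
                d[w] = d[v] + 1
                q.append(w)
    return None

plain = dist(grid_with_thread(20), (0, 0), (19, 19))
threaded = dist(grid_with_thread(20, ((0, 0), (19, 19))), (0, 0), (19, 19))
```

Without the thread the distance is 38 connections; with it, a single connection: anything moving one connection per elementary time traverses the "whole of space" in one step.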
In a graph, there are many ways to end up having “long-range threads”—and we can think of these as defining various kinds of “space tunnels” that provide ways to get around in space evading usual speed-of-light constraints. We can imagine both persistent space tunnels that could be repeatedly used, and spontaneous or “just-in-time” ones that exist only transiently. But—needless to say—there is all sorts of subtlety around the notion of space tunnels. If a tunnel is a pattern in a graph, what actually happens when something “goes through it”? And if a tunnel didn’t always exist, how does it get formed?
Space tunnels are a fairly general concept that can be defined on graphs or hypergraphs. But there’s at least a special case of them that can be defined even in standard general relativity: wormholes. General relativity describes space as a continuum—a manifold—in which there’s no way to have “just a few long-range threads”. The best one can do is to imagine that there’s a kind of “handle in space”, that provides an alternative path from one part of space to another:
How would such a nonsimplyconnected manifold form? Perhaps it’s a bit like the gastrulation that happens in embryonic development. But mathematically one can’t continuously change the topology of something continuous; there has to at least be some kind of singularity. In general relativity it’s been tricky to see how this could work. But of course in our models there’s not the same kind of constraint, because one doesn’t have to “rearrange a whole continuum”; one can do something more like “growing a handle one thread at a time”.
Here’s an example where one can see something a bit like this happening. We’re using the rule:
RulePlot[ResourceFunction["WolframModel"][{{1, 2, 3}, {1, 4, 5}} -> {{3, 3, 6}, {6, 6, 5}, {4, 5, 6}}]]
And what it does is effectively to “knit handles” that provide “shortcuts” between “separated” points in patches of what limits to 2D Euclidean space:
Labeled[ResourceFunction["WolframModel"][{{1, 2, 3}, {1, 4, 5}} -> {{3, 3, 6}, {6, 6, 5}, {4, 5, 6}}, {{0, 0, 0}, {0, 0, 0}}, #, "FinalStatePlot"], Text[#]] & /@ {0, 5, 10, 50, 100, 500, 1000}
In our models—free from the constraints of continuity—space can have all sorts of exotic forms. First of all, there’s no constraint that space has to have an integer number of dimensions (say 3). Dimension is just defined by the asymptotic growth rates of balls, and can have any value. Like here’s a case that approximates 2.3-dimensional space:
ResourceFunction["WolframModel"][{{{1, 2, 3}, {2, 4, 5}} -> {{6, 7, 2}, {5, 7, 8}, {4, 2, 8}, {9, 3, 5}}}, {{0, 0, 0}, {0, 0, 0}}, 20, "FinalStatePlot"]
It’s worth noting that although it’s perfectly possible to define distance—and, in the limit, lots of other geometric concepts—on a graph like this, one doesn’t get to say that nodes are at positions defined by particular sets of coordinates, as one would in integer-dimensional space.
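The "dimension from the growth rates of balls" idea is concrete enough to compute. Since V(r) ~ r^d, a log-log ratio of two ball volumes gives an estimate of d; this Python sketch (an illustration on an ordinary grid, not the hypergraph code of the post) measures it from the center of a large 2D grid:

```python
import math
from collections import deque

def ball_sizes(adj, src, rmax):
    """Number of nodes within graph distance r of src, for r = 0..rmax."""
    d, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        if d[v] < rmax:
            for w in adj[v]:
                if w not in d:
                    d[w] = d[v] + 1
                    q.append(w)
    return [sum(1 for x in d.values() if x <= r) for r in range(rmax + 1)]

def effective_dimension(adj, src, r1, r2):
    """Estimate d from V(r) ~ r^d via a log-log ratio of two ball volumes."""
    v = ball_sizes(adj, src, r2)
    return math.log(v[r2] / v[r1]) / math.log(r2 / r1)

# a large 2D grid, measured from the center: the estimate should approach 2
n = 61
adj = {(i, j): [(i + di, j + dj)
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < n and 0 <= j + dj < n]
       for i in range(n) for j in range(n)}
dim = effective_dimension(adj, (30, 30), 10, 20)
```

For the grid the estimate comes out close to 2 (finite-size corrections pull it slightly below); on a hypergraph with different growth rates the same measurement can give non-integer values like 2.3.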
With a manifold, one basically has to pick a certain (integer) dimension, then stick to it. In our models, dimension can effectively become a dynamical variable, that can change with position (and time). So in our models one possible form of “space tunnel” is a region of space with higher or lower dimension. (Our derivation of general relativity is based on assuming that space has a limiting finite dimension, then asking what curvature and other properties it must have; the derivation is in a sense blind to different-dimensional space tunnels.)
It’s worth noting that both lower- and higher-dimensional space tunnels can be interesting in terms of “getting places quickly”. Lower-dimensional space tunnels (such as bigger versions of the 1D long-range threads in the 2D grid above) potentially connect some specific sparse set of “distant” points. Higher-dimensional space tunnels (which in the infinite-dimensional limit can be trees) are more like “switching stations” that make many points on their boundaries closer.
Let’s say we’ve somehow managed to get a space tunnel. What will happen to it? Traditional general relativity suggests that it’s pretty hard to maintain a wormhole under the evolution of space implied by Einstein’s equations. A wormhole is in effect defined by geodesic paths coming together when they enter the wormhole and diverging again when they exit. In general relativity the presence of mass makes geodesics converge; that’s the “attraction due to gravity”. But what could make the geodesics diverge again? Basically one needs some kind of gravitational repulsion. And the only obvious way to get this in general relativity is to introduce negative mass.
Normally mass is assumed to be a positive quantity. But, for example, dark energy effectively has to have negative mass. And actually there are several mechanisms in traditional physics that effectively lead to negative mass. All of them revolve around the question of where one sets the zero to be. Normally one sets things up so that one can say that “the vacuum” has zero energy (and mass). But actually—even in traditional physics—there’s lots that’s supposed to be going on in “the vacuum”. For example, there’s supposed to be a constant intensity of the Higgs field, that interacts with all massive particles and has the effect of giving them mass. And there are supposed to be vacuum fluctuations associated with all quantum fields, each leading (at least in standard quantum field theory) to an infinite energy density.
But if these things exist everywhere in the universe, then (at least for most purposes) we can just set our zero of energy to include them. So then if there’s anything that can reduce their effects, we’ll effectively see negative mass. And one example of where this can in some sense happen is the Casimir effect. Imagine that instead of having an infinite vacuum, we just have vacuum inside a box. Having the box cuts out some of the possible vacuum fluctuations of quantum fields (basically modes with wavelengths larger than the size of the box)—and so in some sense leads to negative energy density inside the box (at least relative to outside). And, yes, the effect is observable with metal boxes, etc. But what becomes of the Casimir effects in a purely spacetime or gravitational setting isn’t clear.
(This leads to a personal anecdote. Back in 1981 I wrote two papers about the Casimir effect with Jan Ambjørn, titled Properties of the Vacuum: 1. Mechanical and …: 2. Electrodynamic. We had planned a “…: 3. Gravitational” but never wrote it, and now I’m really curious what the results would have been. By the way, our paper #1 computed Casimir effects for boxes of different shapes, and had the surprising implication that by changing shapes in a cycle it would in principle be possible to continuously “mine” energy from the vacuum. This was later suggested as a method for interstellar propulsion, but to make it work requires an infinitely impermeable box, which doesn’t seem physically constructible, except maybe using gravitational effects and event horizons… but we never wrote paper #3 to figure that out….)
In traditional physics there’s been a conflict between what the vacuum is like according to quantum field theory (with infinite energy density from vacuum fluctuations, etc.) and what the vacuum is assumed to be like in general relativity (effectively zero energy density). In our models there isn’t the same kind of conflict, but “the vacuum” is something with even more structure.
In particular, in our models, space isn’t some separate thing that exists; it is just a consequence of the large-scale structure of the spatial hypergraph. And any matter, particles, quantum fields, etc. that exist “in space” must also be features of this same hypergraph. Things like vacuum fluctuations aren’t something that happens in space; they are an integral part of the formation of space itself.
By the way, it’s important to note that in our models the hypergraph isn’t something static—and it’s in the end knitted together only through actual update events that occur. And the energy of some region of the hypergraph is directly related to the amount of updating activity in that region (or, more accurately, to the flux of causal edges through that portion of spacelike hypersurfaces).
So what does this mean for negative mass in our models? Well, if there was a region of the hypergraph where there was somehow less activity, it would have negative energy relative to the zero defined by the “normal vacuum”. It’s tempting to call whatever might reduce activity in the hypergraph a “vacuum cleaner”. And, no, we don’t know if vacuum cleaners can exist. But if they do, then there’s a fairly direct path to seeing how wormholes can be maintained (basically because geodesics almost by definition diverge wherever a vacuum cleaner has operated).
By the way, while a large-scale wormhole-like structure presumably requires negative mass, vacuum cleaners, etc., other space tunnel structures may not have the same requirements. By their very construction, they tend to operate outside the regime described by general relativity and Einstein’s equations. So things like the standard singularity theorems of general relativity can’t be expected to apply. And instead there doesn’t seem to be any choice but to analyze them directly in the context of our models.
One might think: given a particular space tunnel configuration, why not just run a simulation of it, and see what happens? The problem is computational irreducibility. Yes, the simulation might show that the configuration is stable for a million or a billion steps. But that might still be far, far away from human-level timescales. And there may be no way to determine what the outcome for a given number of steps will be except in effect by doing that irreducible amount of computational work—so that if, for example, we want to find out the limiting result after an infinite time, that’ll in general require an infinite amount of computational work, and thus effectively be undecidable.
Or, put another way, even if we can successfully “engineer” a space tunnel, there may be no systematic way to guarantee that it’ll “stay up”; it may require an infinite sequence of “engineering tweaks” to keep it going, and eventually it may not be possible to keep it going. But before that, of course, we have to figure out how to construct a space tunnel in the first place…
In ordinary general relativity one tends to think of everything in terms of spacetime. So if a wormhole connects two different places, one assumes they are places in spacetime. Or, in other words, a wormhole can allow shortcuts between both different parts of space, and different parts of time. But with a shortcut between different parts of time one can potentially have time travel.
More specifically, one can have a situation where the future of something affects its past: in other words there is a causal connection from the future to the past. At some level this isn’t particularly strange. In any system that behaves in a perfectly periodic way one can think of the future as leading to a repetition of the past. But of course it’s not a future that one can freely determine; it’s just a future that’s completely determined by the periodic behavior.
How all this works is rather complicated to see in the standard mathematical treatment of general relativity, although in the end what presumably happens is that in the presence of wormholes the only consistent solutions to the equations are ones for which past and future are locked together with something like purely periodic behavior.
Still, in traditional physics there’s a certain sense that “time is just a coordinate”, so there’s the potential for “motion in time” just like we have motion in space. In our models, however, things work quite differently. Because now space and time are not the same kind of thing at all. Space is defined by the structure of the spatial hypergraph. But time is defined by the computational process of applying updates. And that computational process undoubtedly shows computational irreducibility.
So while we may go backwards and forwards in space, exploring different parts of the spatial hypergraph, the progress of time is associated with the progressive performance of irreducible computation by the universe. One can compute what will happen (or, with certain restrictions, what has happened), but one can only do so effectively by following the actual steps of it happening; one can’t somehow separately “move through it” to see what happens or has happened.
But in our models the whole causality of events is completely tracked, and is represented by the causal graph. And in fact each connection in the causal graph can be thought of as a representation of the very smallest unit of progression in time.
So now let’s look at a causal graph again:
ResourceFunction["WolframModel"][{{x, y}, {z, y}} -> {{x, z}, {y, z}, {w, z}}, {{0, 0}, {0, 0}}, 12, "LayeredCausalGraph"]
There’s a very important feature of this graph: it contains no cycles. In other words, there’s a definite “flow of causality”. There’s a partial ordering of what events can affect what other events, and there’s never any looping back, and having an event affect itself.
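"No cycles" is exactly the condition for a graph to admit a topological order, i.e. a consistent flow of causality. Kahn's algorithm checks this directly; here is a short Python sketch (a standard algorithm, used here purely as an illustration of the acyclicity property):

```python
from collections import defaultdict, deque

def is_acyclic(edges):
    """Kahn's algorithm: a directed graph has a topological order (a consistent
    'flow of causality') exactly when it contains no cycle."""
    adj, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for a, b in edges:
        adj[a].append(b)
        indeg[b] += 1
        nodes |= {a, b}
    q = deque(v for v in nodes if indeg[v] == 0)   # events with no causes
    seen = 0
    while q:
        v = q.popleft()
        seen += 1
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                q.append(w)
    return seen == len(nodes)   # everything ordered <=> no cycle

assert is_acyclic([(1, 2), (1, 3), (2, 4), (3, 4)])   # causal-graph-like DAG
assert not is_acyclic([(1, 2), (2, 3), (3, 1)])       # a closed loop
```

The causal graphs above all pass this test; a graph with a loop, like the one we will see shortly for a rule without causal invariance, fails it.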
There are different ways we can define “simultaneity surfaces”, corresponding to different foliations of this graph:
Show[#, ImageSize -> 400] & /@ {CloudGet["https://wolfr.am/KXgcRNRJ"]; evolution = ResourceFunction["WolframModel"][{{x, y}, {z, y}} -> {{x, z}, {y, z}, {w, z}}, {{0, 0}, {0, 0}}, 12]; gg = Graph[evolution["LayeredCausalGraph"]]; GraphPlot[gg, Epilog -> {Directive[Red], straightFoliationLines[{1/2, 0}, {0, 0}, (# &), {0, 1}]}], CloudGet["https://wolfr.am/KXgcRNRJ"];(*drawFoliation*) gg = Graph[ResourceFunction["WolframModel"][{{x, y}, {z, y}} -> {{x, z}, {y, z}, {w, z}}, {{0, 0}, {0, 0}}, 12, "LayeredCausalGraph"]]; semiRandomWMFoliation = {{1}, {1, 2, 4, 6, 9, 3}, {1, 2, 4, 6, 9, 3, 13, 19, 12, 26, 36, 5, 7, 10, 51, 14, 69, 18, 8, 25, 11, 34, 20, 35, 50, 17}, {1, 2, 4, 6, 9, 3, 13, 19, 12, 26, 36, 5, 7, 10, 51, 14, 69, 18, 8, 25, 11, 34, 20, 35, 50, 17, 24, 68, 47, 15, 92, 27, 48, 37, 21, 28, 42, 22, 30, 16, 32, 23, 33, 46, 64, 90, 94, 65, 88, 49, 67, 91, 66, 89}}; Quiet[drawFoliation[gg, semiRandomWMFoliation, Directive[Red]], FindRoot::cvmit]}
But there’s always a way to do it so that all events in a given slice are “causally before” events in subsequent slices. And indeed whenever the underlying rule has the property of causal invariance, it’s inevitable that things have to work this way.
But if we break causal invariance, other things can happen. Here’s an example of the multiway system for a (string) rule that doesn’t have causal invariance, and in which the same state can repeatedly be visited:
Graph[ResourceFunction["MultiwaySystem"][{"AB" -> "BAB", "BA" -> "A"}, "ABA", 5, "StatesGraph"], GraphLayout -> {"LayeredDigraphEmbedding", "RootVertex" -> "ABA"}]
If we look at the corresponding (multiway) causal graph, it contains a loop:
LayeredGraphPlot[ResourceFunction["MultiwaySystem"][{"AB" -> "BAB", "BA" -> "A"}, "ABA", 4, "CausalGraphStructure"]]
In the language of general relativity, this loop represents a closed timelike curve, where the future can affect the past. And if we try to construct a foliation in which “time systematically moves forward” we won’t be able to do it.
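The state revisiting behind this loop is easy to exhibit concretely. In this Python sketch (again an independent illustration, not the MultiwaySystem code above), we check whether some sequence of updates leads a rule back to its initial state: for {AB → BAB, BA → A} the path "ABA" → "BABA" → "ABA" closes a loop, while for the causally invariant sorting rule no path ever returns:

```python
def successors(s, rules):
    """All strings obtainable from s by one application of any rule."""
    out = set()
    for lhs, rhs in rules:
        for i in range(len(s) - len(lhs) + 1):
            if s[i:i + len(lhs)] == lhs:
                out.add(s[:i] + rhs + s[i + len(lhs):])
    return out

def returns_to_start(start, rules, steps):
    """True if some sequence of <= steps updates leads back to the initial
    state -- a loop in the states graph."""
    frontier = {start}
    for _ in range(steps):
        frontier = {t for s in frontier for t in successors(s, rules)}
        if start in frontier:
            return True
    return False
```

For the looping rule the check succeeds within two steps; for BA → AB it never can, since every update strictly reduces the number of out-of-order pairs.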
But the presence of these kinds of loops is a different phenomenon from the existence of space tunnels. In a space tunnel there’s connectivity in the spatial hypergraph that makes the (graph) distance between two points be shorter than you’d expect from the overall structure of the hypergraph. But it’s just connecting different places in space. An event that happens at one end of the space tunnel can affect events associated with distant places in space, but (assuming causal invariance, etc.) those events have to be “subsequent events” with respect to the partial ordering defined by the causal graph.
Needless to say, there’s all sorts of subtlety about the events involved in maintaining the space tunnel, the definition of distance being “shorter than you’d expect”, etc. But the main point here is that “jumping” between distant places in space doesn’t in any way require or imply “traveling backwards in time”. Yes, if you think about flat, continuum space and you imagine a tachyon going faster than light, then the standard equations of special relativity imply that it must be going backwards in time. But as soon as space itself can have features like space tunnels, nothing like this needs to be going on. Time—and the computational process that corresponds to it—can still progress even as effects propagate, say through space tunnels, faster than light to places that seem distant in space.
OK, now we’re ready to get to the meat of the question of faster-than-light effects in our models. Let’s say some event occurs. This event can affect a cone of subsequent events in the causal graph. When the causal graph is a simple grid, it’s all quite straightforward:
CloudGet["https://wolfr.am/LcADnk1u"]; upTriangleGraph = diamondCausalGraphPlot[11, {0, 0}, {}, # &, "Up", ImageSize -> 450]; HighlightGraph[upTriangleGraph, Style[Subgraph[upTriangleGraph, VertexOutComponent[upTriangleGraph, 8]], Red, Thick]]
But in a more realistic causal graph the story is more complicated:
With[{g = ResourceFunction["WolframModel"][{{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}}, {{0, 0}, {0, 0}}, 8]["LayeredCausalGraph", AspectRatio -> 1/2]}, HighlightGraph[g, Style[Subgraph[g, VertexOutComponent[g, 10]], Red, Thick]]]
The “causal cone” of affected events is very well defined. But now the question is: how does this relate to what happens in space and time?
When one thinks about the propagation of effects in space and time one typically thinks of light cones. Given a source of light somewhere in space and time, where in space and time can this affect?
And one might assume that the causal cone is exactly the light cone. But things are more subtle than that. The light cone is normally defined by the positions in space and time that it reaches. And that makes perfect sense if we’re dealing with a manifold representing continuous spacetime, on which we can, for example, set up numerical coordinates. But in our models there’s not intrinsically anything like that. Yes, we can say what element in a hypergraph is affected after some sequence of events. But there’s no a priori way to say where that element is in space. That’s only defined in some limit, relative to everything else in the whole hypergraph.
And this is the nub of the issue of faster-than-light effects in our models: causal (and, in a sense, temporal) relationships are immediately well defined. But spatial ones are not. One event can affect another through a single connection in the causal graph, but those events might be occurring at different ends of a space tunnel that traverses what we consider to be a large distance in space.
There are several related issues to consider, but they center around the question of what space really is in our models. We started off by talking about space corresponding to a collection of elements and relations, represented by a hypergraph. But the hypergraph is continually being updated. So the first question is: can we define an instantaneous snapshot of space?
Well, that’s what our reference frames, and foliations, and simultaneity surfaces, and so on, are about. They specify which particular collection of events we should consider to have happened at the moment when we “sample the structure of space”. There is arbitrariness to this choice, which corresponds directly to the arbitrariness that we’re used to in the selection of reference frames in relativity.
But can we choose any collection of events consistent with the partial ordering defined by the causal graph (i.e. where no events associated with a “single time slice” follow each other in the causal graph, and thus affect each other)? This is where things begin to get complicated. Let’s imagine we pick a foliation like this, or something even wilder:
CloudGet["https://wolfr.am/LcADnk1u"]; upTriangleGraph = diamondCausalGraphPlot[9, {0, 0}, {}, # &, "Up", ImageSize -> 450]; Show[drawFoliation[Graph[upTriangleGraph, VertexLabelStyle -> Directive[8, Bold], VertexSize -> .45], {{1}, {1, 3, 6, 10, 2, 4, 5}, {1, 3, 6, 10, 2, 4, 5, 8, 9, 15, 13, 14, 19, 20, 26, 7, 12}, {1, 3, 6, 10, 2, 4, 5, 8, 9, 15, 13, 14, 19, 20, 26, 7, 12, 11, 17, 21, 18, 25, 24, 27, 32, 34, 28, 33, 16, 23, 31, 35, 42}}, Directive[AbsoluteThickness[2], Red]], ImageSize -> 550]
We may know what the spatial hypergraph “typically” looks like. But perhaps with a weird enough foliation, it could be very different.
But for now, let’s ignore this (though it will be important later). And let’s just imagine we pick some “reasonable” foliation. Then we want to ask what the “projection” of the causal cone onto the instantaneous structure of space is. Or, in other words, what elements in space are affected by a particular event?
Let’s look at a specific example. Let’s consider the same rule and same causal cone as above, with the “flat” (“cosmological rest frame”) foliation:
CloudGet["https://wolfr.am/KXgcRNRJ"]; With[{g = ResourceFunction["WolframModel"][{{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}}, {{0, 0}, {0, 0}}, 8]["LayeredCausalGraph", AspectRatio -> 1/2, Epilog -> {Directive[Red], straightFoliationLines[{0.22, 0}, {0, 0}, (# &), {0, 2}]}]}, HighlightGraph[g, Style[Subgraph[g, VertexOutComponent[g, 10]], Red, Thick]]]
Here are spatial hypergraphs associated with successive slices in this foliation, with the parts contained in the causal cone highlighted:
EffectiveSpatialBall[wmo_, expr0_] := Module[{t = wmo["CompleteGenerationsCount"], fexprs}, fexprs = wmo["StateEdgeIndicesAfterEvent", -1]; Intersection[Cases[VertexOutComponent[wmo["ExpressionsEventsGraph"], {expr0}], {"Expression", n_} :> n], fexprs]]
EffectiveSpatialAtomBall[wmo_, expr0_] := Module[{t = wmo["CompleteGenerationsCount"], fexprs}, fexprs = wmo["StateEdgeIndicesAfterEvent", -1]; wmo["AllExpressions"][[Intersection[Cases[VertexOutComponent[wmo["ExpressionsEventsGraph"], {expr0}], {"Expression", n_} :> n], fexprs]]]]
EffectiveSpatialBallPlot[wmo_, expr0_] := With[{bb = EffectiveSpatialAtomBall[wmo, expr0]}, wmo["FinalStatePlot", GraphHighlight -> Join[bb, Union[Catenate[bb]]]]]
Table[If[t < 4, ResourceFunction["WolframModel"][{{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}}, {{0, 0}, {0, 0}}, t, "FinalStatePlot"], EffectiveSpatialBallPlot[ResourceFunction["WolframModel"][{{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}}, {{0, 0}, {0, 0}}, t], {"Event", 10}]], {t, 9}]
For the first 3 slices the event that begins the causal cone hasn’t happened yet. But after that we start seeing the effect of the event, gradually spreading across successive spatial hypergraphs.
Yes, there are more subtleties ahead. But basically what we’re seeing here is the expansion of the light cone with time. So now we’ve got to ask the critical question: how fast does the edge of this light cone actually expand? How much space does it traverse in each unit of time? In other words, what is the effective speed of light here?
It is already clear from the pictures above that this is a somewhat subtle question. But let’s begin with an even more basic issue. The speed of light is something we measure in units like meters per second. But what we can potentially get from our model is instead a speed in spatial hypergraph edges per causal edge. We can say that each causal edge corresponds to a certain elementary time elapsing. And as soon as we quote the elementary time in seconds—say 10^{–100} s—we’re basically defining the second. And similarly, we can say that each spatial hypergraph edge corresponds to a distance of a certain elementary length. But now imagine that in t elementary times the light cone in the hypergraph has advanced by α t spatial hypergraph edges, or α t elementary lengths. What is α t in meters? It has to be α c t, where c is the speed of light, because in effect this defines the speed of light.
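The unit bookkeeping here can be made explicit. Writing τ for the elementary time and ℓ for the elementary length (symbols introduced just for this sketch), t elementary times correspond to an elapsed time of t τ seconds, and the light cone’s advance is a distance

```latex
d = \alpha\, t\, \ell .
```

If the elementary length is taken to be $\ell = c\,\tau$, this becomes

```latex
d = \alpha\, t\,(c\,\tau) = \alpha\, c\,(t\,\tau),
```

i.e. the cone advances α c meters per elapsed second, which is just the statement that c is the conversion factor between the chosen time and length units.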
In other words, it’s at some level a tautology to say that the light cone in the spatial hypergraph advances at the speed of light—because this is the definition of the speed of light. But it’s more complicated than that. In continuum space there’s nothing inconsistent about saying that the speed of light is the same in every direction, everywhere. When we’re projecting our causal cone onto the spatial hypergraph, though, we can’t really say that anymore. To know what happens, we have to figure out more about how to characterize space.
In our models it’s clear what causal effects there are, and even how they spread. But what’s far from clear is where in detail these effects show up in what we call space. We know what the causal cones are like; but we still have to figure out how they map into positions in space. And from that we can try to work out whether—relative to the way we set up space—there can be effects that go faster than light.
In a sense speeds are complicated to characterize in our models because positions and times are hard to define. But it’s useful to consider for a moment the much simpler case of cellular automata, where from the outset we just set up a grid in space and time. Given some cellular automaton, say with a random initial condition, we can ask how fast an effect can propagate. For example, if we change one cell in the initial condition, by how many cells per step can the effect of this expand? Here are a couple of typical results:
(SeedRandom[24245];
   With[{u = RandomInteger[1, 160]},
    ArrayPlot[
     Sum[(2 + (-1)^i) CellularAutomaton[#, ReplacePart[u, 80 -> i], 80], {i, 0, 1}],
     ColorRules -> {0 -> White, 4 -> Black, 1 -> Red, 3 -> Red},
     ImageSize -> 330]]) & /@ {22, 30}
The actual speed of expansion can vary, but in both cases the absolute maximum speed is 1 cell/step. And this is very straightforward to understand from the underlying rules for the cellular automata:
RulePlot[CellularAutomaton[#], ImageSize -> 300] & /@ {22, 30}
In both cases, the rule for each step “reaches” one cell away, so 1 cell/step is the maximum rate at which effects can propagate.
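This bound can be checked empirically. Here is a small Python sketch (a stand-in for the Wolfram Language computations above, not the original code; the width, number of steps, and seed are illustrative choices): flip one cell of a random initial condition and track how fast the region of differing cells grows.

```python
import random

def ca_step(cells, rule):
    """One step of an elementary cellular automaton with cyclic boundaries,
    using the standard Wolfram rule numbering."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def max_spread_speed(rule, width=160, steps=40, seed_cell=80):
    """Flip one cell of a fixed random initial condition and return the
    largest one-step growth of the differing region's radius."""
    random.seed(24245)
    u = [random.randint(0, 1) for _ in range(width)]
    v = u[:]
    v[seed_cell] ^= 1  # the single-cell perturbation
    max_growth, prev_radius = 0, 0
    for _ in range(steps):
        u, v = ca_step(u, rule), ca_step(v, rule)
        diff = [i for i in range(width) if u[i] != v[i]]
        if diff:
            radius = max(abs(i - seed_cell) for i in diff)
            max_growth = max(max_growth, radius - prev_radius)
            prev_radius = radius
    return max_growth

print(max_spread_speed(30))  # the difference region grows by at most 1 cell/step
```

For rule 30 the right-hand edge of the difference region in fact moves at exactly 1 cell/step, since the rule’s dependence on its left neighbor is an XOR; for rule 22 the speed can be lower, but never exceeds 1.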
There’s something somewhat analogous that happens in our models. Consider a rule like:
RulePlot[ResourceFunction["WolframModel"][
  {{{1, 2}, {2, 3}} -> {{2, 4}, {2, 4}, {4, 1}, {4, 3}}}]]
A bit like in the cellular automaton, the rule only “reaches” a limited number of connections away. And what this means is that in each updating event only elements within a certain range of connections can “have an effect” on each other. But inevitably this is only a very local statement. Because while the structure of the rule implies that effects can only spread a certain distance in a single update, there is nothing that says what the “relative geometry” of successive updates will be, or what will end up connected to what. Unlike in a cellular automaton, where the global spatial structure is predefined, in our models there is no immediate global consequence of the fact that the rules are fundamentally local with respect to the hypergraph.
It should be mentioned that the rules don’t strictly even have to be local. If the left-hand side is disconnected, as in
RulePlot[ResourceFunction["WolframModel"][{{x}, {y}} -> {{x, y}}]]
then in a sense any individual update can pick up elements from anywhere in the spatial hypergraph—even disconnected parts. And as a result, something anywhere in the universe can immediately affect something anywhere else. But with a rule like this, there doesn’t seem to be a way to build up anything with the kind of locality properties that characterize what we think of as space.
OK, but given a spatial hypergraph, how do we figure out “how far” it is from one node to another? That’s a subtle question. It’s easy to figure out the graph distance: just find the geodesic path from one node to another and see how many connections it involves. But this is just an abstract distance on the hypergraph: now the question is how it relates to a distance we might measure “physically”, say with something like a ruler.
It’s a tricky thing: we have a hypergraph that is supposed to represent everything in the universe. And now we want something—presumably itself part of the hypergraph—to measure a distance in the hypergraph. In traditional treatments of relativity it’s common to think of measuring distances by looking at arrival times of light signals or photons. But this implicitly assumes that there’s an underlying structure of space, and photons are simply being added in to probe it. In our models, however, the photons have to themselves be part of the spatial hypergraph: they’re in a sense just “pieces of space”, albeit presumably with appropriate generalized topological properties.
Or, put another way: when we directly study the spatial hypergraph, we’re operating far below the level of things like photons. But if we’re going to compare what we see in spatial hypergraphs with actual distance measurements in physics we’re going to have to find some way to bridge the gap. Or, in other words, we need to find some adequate proxy for physical distance that we can compute directly on the spatial hypergraph.
A simple possibility that we’ve used a lot in practice in exploring our models is just graph distance, though with one wrinkle. The wrinkle is as follows: our hypergraphs represent collections of relations between elements, and we assume that these relations are ordered—so that the hyperedges in our hypergraphs are directed hyperedges. But in computing “physical-like distances” we ignore the directedness, and treat what we have as an undirected hypergraph. In the limit of sufficiently large hypergraphs, this shouldn’t make much difference, although it seems as if including directedness information may let us look at the analog of spinors, while the undirected case corresponds to ordinary vectors, which are what we’re more familiar with in terms of measuring distances.
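The graph-distance proxy can be sketched in a few lines of Python (a toy illustration, not the project’s actual code; representing hyperedges as tuples and treating any two elements that share a hyperedge as adjacent is one convenient convention assumed here): drop the direction information and take the BFS (geodesic) distance.

```python
from collections import deque
from itertools import combinations

def undirected_adjacency(hyperedges):
    """Ignore ordering: any two elements appearing in the same
    hyperedge are treated as adjacent."""
    adj = {}
    for edge in hyperedges:
        for a, b in combinations(edge, 2):
            if a != b:
                adj.setdefault(a, set()).add(b)
                adj.setdefault(b, set()).add(a)
    return adj

def graph_distance(hyperedges, source, target):
    """BFS geodesic distance on the undirected hypergraph;
    None if the target is unreachable."""
    adj = undirected_adjacency(hyperedges)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None

# Directed hyperedges {1,2},{2,3},{3,4}: once direction is ignored,
# the distance from 1 to 4 equals the distance from 4 to 1.
edges = [(1, 2), (2, 3), (3, 4)]
print(graph_distance(edges, 1, 4), graph_distance(edges, 4, 1))  # 3 3
```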
So is there any other proxy for distance that we could use? Actually, there are several. But one that may be particularly good is directly derived from the causal graph. It’s in some ways the analog of what we might do in traditional discussions of relativity where we imagine a grid of beacons signaling to each other over a limited period of time. In terms of our models we can say that it’s the analog of a branchial distance for the causal graph.
Here’s how it works. Construct a causal graph, say:
ResourceFunction["WolframModel"][
  {{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}},
  {{0, 0}, {0, 0}}, 5]["LayeredCausalGraph",
 AspectRatio -> 1/2, VertexLabels -> Automatic]
Now look at the events in the last slice shown here. For each pair of events look at their ancestry, i.e. at what previous event(s) led to them. If a particular pair of events have a common ancestor on the step before, connect them. The result in this case is the graph:
PacletInstall["SetReplace"]; << SetReplace`;
SpatialReconstruction[wmo_WolframModelEvolutionObject, dt_Integer : 1] :=
 Module[{cg = wmo["CausalGraph"], ceg = wmo["EventGenerations"], ev0, ev1, oc},
  ev0 = First /@ Position[ceg - Max[ceg], -dt];
  ev1 = First /@ Position[ceg - Max[ceg], 0];
  oc = Select[Rest[VertexOutComponent[cg, #]], MemberQ[ev1, #] &] & /@ ev0;
  Graph[
   WolframPhysicsProjectStyleData["SpatialGraph", "Function"][
    Graph[ev1, Flatten[(UndirectedEdge @@@ Subsets[#, {2}]) & /@ oc]]],
   VertexStyle -> WolframPhysicsProjectStyleData["CausalGraph", "VertexStyle"],
   EdgeStyle ->
    Blend[{First[WolframPhysicsProjectStyleData["SpatialGraph", "EdgeLineStyle"]],
      WolframPhysicsProjectStyleData["BranchialGraph", "EdgeStyle"]}]]]

Graph[SpatialReconstruction[
  WolframModel[{{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}},
   {{0, 0}, {0, 0}}, 5], 1], VertexLabels -> Automatic]
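The core of the construction, connecting two last-slice events whenever they share an ancestor on the step before, can be sketched in Python on a toy causal graph (illustrative data only; as a simplifying assumption this sketch uses direct parents, i.e. the dt = 1 case, rather than full out-components):

```python
from itertools import combinations

def reconstruct_space(parents, prev_slice, final_slice):
    """parents maps each event to the events that directly caused it.
    Return undirected edges between final-slice events that share a
    direct ancestor lying in prev_slice."""
    edges = set()
    for a, b in combinations(sorted(final_slice), 2):
        shared = set(parents.get(a, ())) & set(parents.get(b, ())) & set(prev_slice)
        if shared:
            edges.add((a, b))
    return edges

# Toy causal graph: events 4, 5, 6 on the last slice; 1, 2, 3 on the
# slice before. Events 4 and 5 share ancestor 2, so they get connected.
parents = {4: [1, 2], 5: [2], 6: [3]}
print(reconstruct_space(parents, [1, 2, 3], [4, 5, 6]))  # {(4, 5)}
```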
One can think of this as a “reconstruction of space”, based on the causal graph. In an appropriate limit, it should be essentially the same as the structure of space associated with the original hypergraph—though with this small a graph the spatial hypergraph still looks quite different:
ResourceFunction["WolframModel"][
  {{{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}}},
  {{0, 0}, {0, 0}}, 5]["FinalStatePlot"]
It’s slightly complicated, but it’s important to understand the differences between these various graphs. In the underlying spatial hypergraph, the nodes are the fundamental elements in our model—that we’ve dubbed above “atoms of space”. The hyperedges connecting these nodes correspond