I’ve been trying to understand the Second Law now for a bit more than 50 years.
It all started when I was 12 years old. Building on an earlier interest in space and spacecraft, I’d gotten very interested in physics, and was trying to read everything I could about it. There were several shelves of physics books at the local bookstore. But what I coveted most was the largest physics book collection there: a series of five plushly illustrated college textbooks. And as a kind of graduation gift when I finished (British) elementary school in June 1972 I arranged to get those books. And here they are, still on my bookshelf today, just a little faded, more than half a century later:
For a while the first book in the series was my favorite. Then the third. The second. The fourth. The fifth one at first seemed quite mysterious—and somehow more abstract in its goals than the others:
What story was the filmstrip on its cover telling? For a couple of months I didn’t look seriously at the book. And I spent much of the summer of 1972 writing my own (unseen by anyone else for 30+ years) Concise Directory of Physics
that included a rather stiff page about energy, mentioning entropy—along with the heat death of the universe.
But one afternoon late that summer I decided I should really find out what that mysterious fifth book was all about. Memory being what it is, I remember that—very unusually for me—I took the book to read sitting on the grass under some trees. And, yes, my archives almost let me check my recollection: in the distance, there’s the spot, except in 1967 the trees are significantly smaller, and in 1985 they’re bigger:
Of course, by 1972 I was a little bigger than in 1967—and here I am a little later, complete with a book called Planets and Life on the ground, along with a tube of (British) Smarties, and, yes, a pocket protector (but, hey, those were actual ink pens):
But back to the mysterious green book. It wasn’t like anything I’d seen before. It was full of pictures like the one on the cover. And it seemed to be saying that—just by looking at those pictures and thinking—one could figure out fundamental things about physics. The other books I’d read had all basically said “physics works like this”. But here was a book saying “you can figure out how physics has to work”. Back then I definitely hadn’t internalized it, but I think what was so exciting that day was that I got a first taste of the idea that one didn’t have to be told how the world works; one could just figure it out:
I didn’t yet understand quite a bit of the math in the book. But it didn’t seem so relevant to the core phenomenon the book was apparently talking about: the tendency of things to become more random. I remember wondering how this related to stars being organized into galaxies. Why might that be different? The book didn’t seem to say, though I thought maybe somewhere it was buried in the math.
But soon the summer was over, and I was at a new school, mostly away from my books, and doing things like diligently learning more Latin and Greek. But whenever I could I was learning more about physics—and particularly about the hot area of the time: particle physics. The pions. The kaons. The lambda hyperon. They all became my personal friends. During the school vacations I would excitedly bicycle the few miles to the nearby university library to check out the latest journals and the latest news about particle physics.
The school I was at (Eton) had five centuries of history, and I think at first I assumed no particular bridge to the future. But it wasn’t long before I started hearing mentions that somewhere at the school there was a computer. I’d seen a computer in real life only once—when I was 10 years old, and from a distance. But now, tucked away at the edge of the school, above a bicycle repair shed, there was an island of modernity, a “computer room” with a glass partition separating off a loudly humming desk-sized piece of electronics that I could actually touch and use: an Elliott 903C computer with 8 kilowords of 18-bit ferrite core memory (acquired by the school in 1970 for £12,000, or roughly £220,000 today):
At first it was such an unfamiliar novelty that I was satisfied writing little programs to do things like compute primes, print curious patterns on the teleprinter, and play tunes with the built-in diagnostic tone generator. But it wasn’t long before I set my sights on the goal of using the computer to reproduce that interesting picture on the book cover.
I programmed in assembler, with my programs on paper tape. The computer had just 16 machine instructions, which included arithmetic ones, but only for integers. So how was I going to simulate colliding “molecules” with that? Somewhat sheepishly, I decided to put everything on a grid, with everything represented by discrete elements. There was a convention for people to name their programs starting with their own first initial. So I called the program SPART, for “Stephen’s Particle Program”. (Thinking about it today, maybe that name reflected some aspiration of relating this to particle physics.)
It was the most complicated program I had ever written. And it was hard to test, because, after all, I didn’t really know what to expect it to do. Over the course of several months, it went through many versions. Rather often the program would just mysteriously crash before producing any output (and, yes, there weren’t real debugging tools yet). But eventually I got it to systematically produce output. But to my disappointment the output never looked much like the book cover.
I didn’t know why, but I assumed it was because I was simplifying things too much, putting everything on a grid, etc. A decade later I realized that in writing my program I’d actually ended up inventing a form of 2D cellular automaton. And I now rather suspect that this cellular automaton—like rule 30—was actually intrinsically generating randomness, and in some sense showing what I now understand to be the core phenomenon of the Second Law. But at the time I absolutely wasn’t ready for this, and instead I just assumed that what I was seeing was something wrong and irrelevant. (In past years, I had suspected that what went wrong had to do with details of particle behavior on square—as opposed to other—grids. But I now suspect it was instead that the system was in a sense generating too much randomness, making the intended “molecular dynamics” unrecognizable.)
I’d love to “bring SPART back to life”, but I don’t seem to have a copy anymore, and I’m pretty sure the printouts I got as output back in 1973 seemed so “wrong” I didn’t keep them. I do still have quite a few paper tapes from around that time, but as of now I’m not sure what’s on them—not least because I wrote my own “advanced” paper-tape loader, which used what I later learned were error-correcting codes to try to avoid problems with pieces of “confetti” getting stuck in the holes that had been punched in the tape:
I don’t know what would have happened if I’d thought my program was more successful in reproducing “Second Law” behavior back in 1973 when I was 13 years old. But as it was, in the summer of 1973 I was away from “my” computer, and spending all my time on particle physics. And between that summer and early 1974 I wrote a booklength summary of what I called “The Physics of Subatomic Particles”:
I don’t think I’d looked at this in any detail in 48 years. But reading it now I am a bit shocked to find history and explanations that I think are often better than I would immediately give today—even if they do bear definite signs of coming from a British early teenager writing “scientific prose”.
Did I talk about statistical mechanics and the Second Law? Not directly, though there’s a curious passage where I speculate about the possibility of antimatter galaxies, and their (rather un-Second-Law-like) segregation from ordinary-matter galaxies:
By the next summer I was writing the 230-page, much more technical “Introduction to the Weak Interaction”. Lots of quantum mechanics and quantum field theory. No statistical mechanics. The closest it gets is a chapter on CP violation (AKA time-reversal violation)—a longtime favorite topic of mine—but from a very particle-physics point of view. By the next year I was publishing papers about particle physics, with no statistical mechanics in sight—though in a picture of me (as a “lanky youth”) from that time, the Statistical Physics book is right there on my shelf, albeit surrounded by particle physics books:
But despite my focus on particle physics, I still kept thinking about statistical mechanics and the Second Law, and particularly its implications for the large-scale structure of the universe, and things like the possibility of matter-antimatter separation. And in early 1977, now 17 years old, and (briefly) a college student in Oxford, my archives record that I gave a talk to the newly formed (and short-lived) Oxford Natural Science Club entitled “Whither Physics” in which I talked about “large, small, many” as the main frontiers of physics, and presented the visual
with a dash of “unsolved purple” impinging on statistical mechanics, particularly in connection with nonequilibrium situations. Meanwhile, looking at my archives today, I find some “back of the envelope” equilibrium statistical mechanics from that time (though I have no idea now what this was about):
But then, in the fall of 1977 I ended up for the first time really needing to use statistical mechanics “in production”. I had gotten interested in what would later become a hot area: the intersection between particle physics and the early universe. One of my interests was neutrino background radiation (the neutrino analog of the cosmic microwave background); another was early-universe production of stable charged particles heavier than the proton. And it turned out that to study these I needed all three of cosmology, particle physics, and statistical mechanics:
In the couple of years that followed, I worked on all sorts of topics in particle physics and in cosmology. Quite often ideas from statistical mechanics would show up, like when I worked on the hadronization of quarks and gluons, or when I worked on phase transitions in the early universe. But it wasn’t until 1979 that the Second Law made its first explicit appearance by name in my published work.
I was studying how there could be a net excess of matter over antimatter throughout the universe (yes, I’d by then given up on the idea of matter-antimatter separation). It was a subtle story of quantum field theory, time-reversal violation, General Relativity—and nonequilibrium statistical mechanics. And in the paper we wrote we included a detailed appendix about Boltzmann’s H theorem and the Second Law—and the generalization we needed for relativistic quantum time-reversal-violating systems in an expanding universe:
All this got me thinking again about the foundations of the Second Law. The physicists I was around mostly weren’t too interested in such topics—though Richard Feynman was something of an exception. And indeed when I did my PhD thesis defense in November 1979 it ended up devolving into a spirited multi-hour debate with Feynman about the Second Law. He maintained that the Second Law must ultimately cause everything to randomize, and that the order we see in the universe today must be some kind of temporary fluctuation. I took the point of view that there was something else going on, perhaps related to gravity. Today I would have more strongly made the rather Feynmanesque point that if you have a theory that says everything we observe today is an exception to your theory, then the theory you have isn’t terribly useful.
Back in 1973 I never really managed to do much science on the very first computer I used. But by 1976 I had access to much bigger and faster computers (as well as to the ARPANET—forerunner of the internet). And soon I was routinely using computers as powerful tools for physics, and particularly for symbolic manipulation. But by late 1979 I had basically outgrown the software systems that existed, and within weeks of getting my PhD I embarked on the project of building my own computational system.
It’s a story I’ve told elsewhere, but one of the important elements for our purposes here is that in designing the system I called SMP (for “Symbolic Manipulation Program”) I ended up digging deeply into the foundations of computation, and its connections to areas like mathematical logic. But even as I was developing the critical-to-Wolfram-Language-to-this-day paradigm of basing everything on transformations for symbolic expressions, as well as leading the software engineering to actually build SMP, I was also continuing to think about physics and its foundations.
There was often something of a statistical mechanics orientation to what I did. I worked on cosmology where even the collection of possible particle species had to be treated statistically. I worked on the quantum field theory of the vacuum—or effectively the “bulk properties of quantized fields”. I worked on what amounts to the statistical mechanics of cosmological strings. And I started working on the quantum-field-theory-meets-statistical-mechanics problem of “relativistic matter” (where my unfinished notes contain questions like “Does causality forbid relativistic solids?”):
But hovering around all of this was my old interest in the Second Law, and in the seemingly opposing phenomenon of the spontaneous emergence of complex structure.
SMP Version 1.0 was ready in mid-1981. And that fall, as a way to focus my efforts, I taught a “Topics in Theoretical Physics” course at Caltech (supposedly for graduate students but actually almost as many professors came too) on what, for want of a better name, I called “nonequilibrium statistical mechanics”. My notes for the first lecture dived right in:
Echoing what I’d seen on that book cover back in 1972 I talked about the example of the expansion of a gas, noting that even in this case “Many features [are] still far from understood”:
I talked about the Boltzmann transport equation and its elaboration in the BBGKY hierarchy, and explored what might be needed to extend it to things like selfgravitating systems. And then—in what must have been a very overstuffed first lecture—I launched into a discussion of “Possible origins of irreversibility”. I began by talking about things like ergodicity, but soon made it clear that this didn’t go the distance, and there was much more to understand—saying that “with a bit of luck” the material in my later lectures might help:
I continued by noting that some systems can “develop order and considerable organization”—which nonequilibrium statistical mechanics should be able to explain:
I then went quite “cosmological”:
The first candidate explanation I listed was the fluctuation argument Feynman had tried to use:
I discussed the possibility of fundamental microscopic irreversibility—say associated with timereversal violation in gravity—but largely dismissed this. I talked about the possibility that the universe could have started in a special state in which “the matter is in thermal equilibrium, but the gravitational field is not.” And finally I gave what the 22yearold me thought at the time was the most plausible explanation:
All of this was in a sense rooted in a traditional mathematical physics style of thinking. But the second lecture gave a hint of a quite different approach:
In my first lecture, I had summarized my plans for subsequent lectures:
But discovery intervened. People had discussed reaction-diffusion patterns as examples of structure being formed “away from equilibrium”. But I was interested in more dramatic examples, like galaxies, or snowflakes, or turbulent flow patterns, or forms of biological organisms. What kinds of models could realistically be made for these? I started from neural networks, self-gravitating gases and spin systems, and just kept on simplifying and simplifying. It was rather like language design, of the kind I’d done for SMP. What were the simplest primitives from which I could build up what I wanted?
Before long I came up with what I’d soon learn could be called one-dimensional cellular automata. And immediately I started running them on a computer to see what they did:
And, yes, they were “organizing themselves”—even from random initial conditions—to make all sorts of structures. By December I was beginning to frame how I would write about what was going on:
And by May 1982 I had written my first long paper about cellular automata (published in 1983):
The Second Law featured prominently, even in the first sentence:
I made quite a lot out of the fundamentally irreversible character of most cellular automaton rules, pretty much assuming that this was the fundamental origin of their ability to “generate complex structures”—as the opening transparencies of two talks I gave at the time suggested:
It wasn’t that I didn’t know there could be reversible cellular automata. And a footnote in my paper even records the fact that these can generate nested patterns with a certain fractal dimension—as computed in a charmingly manual way on a couple of pages I now find in my archives:
But somehow I hadn’t quite freed myself from the assumption that microscopic irreversibility was what was “causing” structures to be formed. And this was related to another important—and ultimately incorrect—assumption: that all the structure I was seeing was somehow the result of “filtering” random initial conditions. Right there in my paper is a picture of rule 30 starting from a single cell:
And, yes, the printout from which that was made is still in my archives, if now a little worse for wear:
Of course, it probably didn’t help that with my “display” consisting of an array of printed characters I couldn’t see too much of the pattern—though my archives do contain a long “imitation-high-resolution” printout of the conveniently narrow, and ultimately nested, pattern from rule 225:
But I think the more important point was that I just didn’t have the necessary conceptual framework to absorb what I was seeing in rule 30—and I wasn’t ready for the intuitional shock that it takes only simple rules with simple initial conditions to produce highly complex behavior.
My motivation for studying the behavior of cellular automata had come from statistical mechanics. But I soon realized that I could discuss cellular automata without any of the “baggage” of statistical mechanics, or the Second Law. And indeed even as I was finishing my long statistical-mechanics-themed paper on cellular automata, I was also writing a short paper that described cellular automata essentially as purely computational systems (even though I still used the term “mathematical models”) without talking about any kind of Second Law connections:
Through much of 1982 I was alternating between science, technology and the startup of my first company. I left Caltech in October 1982, and after stops at Los Alamos and Bell Labs, started working at the Institute for Advanced Study in Princeton in January 1983, equipped with a newly obtained Sun workstation computer whose (“one megapixel”) bitmap display let me begin to see in more detail how cellular automata behave:
It had very much the flavor of classic observational science—looking not at something like mollusc shells, but instead at images on a screen—and writing down what I saw in a “lab notebook”:
What did all those rules do? Could I somehow find a way to classify their behavior?
Mostly I was looking at random initial conditions. But in a near miss of the rule 30 phenomenon I wrote in my lab notebook: “In irregular cases, appears that patterns starting from small initial states are not self-similar (e.g. code 10)”. I even looked again at asymmetric “elementary” rules (of which rule 30 is an example)—but only from random initial conditions (though noting the presence of “class 4” rules, which would include rule 110):
My technology stack at the time consisted of printing screen dumps of cellular automaton behavior
then using repeated photocopying to shrink them—and finally cutting out the images and assembling arrays of them using Scotch tape:
And looking at these arrays I was indeed able to make an empirical classification, identifying initially five—but in the end four—basic classes of behavior. And although I sometimes made analogies with solids, liquids and gases—and used the mathematical concept of entropy—I was now mostly moving away from thinking in terms of statistical mechanics, and was instead using methods from areas like dynamical systems theory, and computation theory:
Even so, when I summarized the significance of investigating the computational characteristics of cellular automata, I reached back to statistical mechanics, suggesting that much as information theory provided a mathematical basis for equilibrium statistical mechanics, so similarly computation theory might provide a foundation for nonequilibrium statistical mechanics:
My experiments had shown that cellular automata could “spontaneously produce structure” even from randomness. And I had been able to characterize and measure various features of this structure, notably using ideas like entropy. But could I get a more complete picture of what cellular automata could make? I turned to formal language theory, and started to work out the “grammar of possible states”. And, yes, a quarter century before Graph in Wolfram Language, laying out complicated finite state machines wasn’t easy:
But by November 1983 I was writing about “self-organization as a computational process”:
The introduction to my paper again led with the Second Law, though now talked about the idea that computation theory might be what could characterize nonequilibrium and selforganizing phenomena:
The concept of equilibrium in statistical mechanics makes it natural to ask what will happen in a system after an infinite time. But computation theory tells one that the answer to that question can be noncomputable or undecidable. I talked about this in my paper, but then ended by discussing the ultimately much richer finite case, and suggesting (with a reference to NP-completeness) that it might be common for there to be no computational shortcut to cellular automaton evolution. And rather presciently, I made the statement that “One may speculate that [this phenomenon] is widespread in physical systems” so that “the consequences of their evolution could not be predicted, but could effectively be found only by direct simulation or observation.”:
These were the beginnings of powerful ideas, but I was still tying them to somewhat technical things like ensembles of all possible states. But in early 1984, that began to change. In January I’d been asked to write an article for the thentop popular science magazine Scientific American on the subject of “Computers in Science and Mathematics”. I wrote about the general idea of computer experiments and simulation. I wrote about SMP. I wrote about cellular automata. But then I wanted to bring it all together. And that was when I came up with the term “computational irreducibility”.
By May 26, the concept was pretty clearly laid out in my draft text:
But just a few days later something big happened. On June 1 I left Princeton for a trip to Europe. And in order to “have something interesting to look at on the plane” I decided to print out pictures of some cellular automata I hadn’t bothered to look at much before. The first one was rule 30:
And it was then that it all clicked. The complexity I’d been seeing in cellular automata wasn’t the result of some kind of “selforganization” or “filtering” of random initial conditions. Instead, here was an example where it was very obviously being “generated intrinsically” just by the process of evolution of the cellular automaton. This was computational irreducibility up close. No need to think about ensembles of states or statistical mechanics. No need to think about elaborate programming of a universal computer. From just a single black cell rule 30 could produce immense complexity, and showed what seemed very likely to be clear computational irreducibility.
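That evolution is easy to reproduce today. Here is a minimal sketch (in Python, rather than anything available at the time) of rule 30 grown from a single black cell; the update can be written compactly as left XOR (center OR right):

```python
def rule30_rows(steps):
    """Yield successive rows of rule 30, starting from a single black cell."""
    cells = [0] * steps + [1] + [0] * steps  # padded so the pattern never reaches the edges
    for _ in range(steps + 1):
        yield cells
        # rule 30: new cell = left XOR (center OR right); edge cells stay white
        cells = [cells[i - 1] ^ (cells[i] | cells[i + 1])
                 if 0 < i < len(cells) - 1 else 0
                 for i in range(len(cells))]

for row in rule30_rows(15):
    print("".join("X" if c else "." for c in row))
```

Run for more steps, the right-hand part of the pattern shows no sign of settling into anything simple.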
Why hadn’t I figured out before that something like this could happen? After all, I’d even generated a small picture of rule 30 more than two years earlier. But at the time I didn’t have a conceptual framework that made me pay attention to it. And a small picture like that just didn’t have the same in-your-face “complexity from nothing” character as my larger picture of rule 30.
Of course, as is typical in the history of ideas, there’s more to the story. One of the key things that had originally let me start “scientifically investigating” cellular automata is that out of all the infinite number of possible constructible rules, I’d picked a modest number on which I could do exhaustive experiments. I’d started by considering only “elementary” cellular automata, in one dimension, with k = 2 colors, and with rules of range r = 1. There are 256 such “elementary rules”. But many of them had what seemed to me “distracting” features—like backgrounds alternating between black and white on successive steps, or patterns that systematically shifted to the left or right. And to get rid of these “distractions” I decided to focus on what I (somewhat foolishly in retrospect) called “legal rules”: the 32 rules that leave blank states blank, and are left-right symmetric.
When one uses random initial conditions, the legal rules do seem—at least in small pictures—to capture the most obvious behaviors one sees across all the elementary rules. But it turns out that’s not true when one looks at simple initial conditions. Among the “legal” rules, the most complicated behavior one sees with simple initial conditions is nesting.
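The “legal” constraint is simple enough to check mechanically: bit 0 of the rule number must be 0 (so blank stays blank), and the bits for mirror-image neighborhoods must agree. A quick enumeration (a Python sketch, using the standard numbering in which neighborhood (l, c, r) has index 4l + 2c + r) confirms that exactly 32 of the 256 elementary rules qualify, and that the asymmetric rule 30 is not among them:

```python
def is_legal(rule):
    """Check the 1983 'legal' criteria: blank maps to blank, left-right symmetric."""
    bit = lambda i: (rule >> i) & 1
    if bit(0) != 0:          # the all-white neighborhood must stay white
        return False
    # symmetry: neighborhood (l,c,r) must give the same result as (r,c,l)
    for l in range(2):
        for c in range(2):
            for r in range(2):
                if bit(4 * l + 2 * c + r) != bit(4 * r + 2 * c + l):
                    return False
    return True

legal = [n for n in range(256) if is_legal(n)]
print(len(legal))      # 32
print(30 in legal)     # False: rule 30 is asymmetric
```

Rule 90, the classic nested-pattern rule, does pass the test, consistent with nesting being the most complicated behavior the “legal” rules show from simple initial conditions.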
But even though I concentrated on “legal” rules, I still included in my first major paper on cellular automata pictures of a few “illegal” rules starting from simple initial conditions—including rule 30. And what’s more, in a section entitled “Extensions”, I discussed cellular automata with more than 2 colors, and showed—though without comment—the pictures:
These were low-resolution pictures, and I think I imagined that if one ran them further, the behavior would somehow resolve into something simple. But by early 1983, I had some clues that this wouldn’t happen. Because by then I was generating fairly high-resolution pictures—including ones of the k = 2, r = 2 totalistic rule with code 10 starting from a simple initial condition:
In early drafts of my 1983 paper on “Universality and Complexity in Cellular Automata” I noted the generation of “irregularity”, and speculated that it might be associated with class 4 behavior. But later I just stated as an observation without “cause” that some rules—like code 10—generate “irregular patterns”. I elaborated a little, but in a very “statistical mechanics” kind of way, not getting the main point:
In September 1983 I did a little better:
But in the end it wasn’t until June 1, 1984, that I really grokked what was going on. And a little over a week later I was in a scenic area of northern Sweden
at a posh “Nobel Symposium” conference on “The Physics of Chaos and Related Problems”—talking for the first time about the phenomenon I’d seen in rule 30 and code 10. And from June 15 there’s a transcript of a discussion session where I bring up the never-before-mentioned-in-public concept of computational irreducibility—and, unsurprisingly, leave the other participants (who were basically all traditional mathematically oriented physicists) at best slightly bemused:
I think I was still a bit prejudiced against rule 30 and code 10 as specific rules: I didn’t like the asymmetry of rule 30, and I didn’t like the rapid growth of code 10. (Rule 73—while symmetric—I also didn’t like because of its alternating background.) But having now grokked the rule 30 phenomenon I knew it also happened in “more aesthetic” “legal” rules with more than 2 colors. And while even 3 colors led to a rather large total space of rules, it was easy to generate examples of the phenomenon there.
A few days later I was back in the US, working on finishing my article for Scientific American. A photographer came to help get pictures from the color display I now had:
And, yes, those pictures included multicolor rules that showed the rule 30 phenomenon:
The caption I wrote commented: “Even in this case the patterns generated can be complex, and they sometimes appear quite random. The complex patterns formed in such physical processes as the flow of a turbulent fluid may well arise from the same mechanism.”
The article went on to describe computational irreducibility and its implications in quite a lot of detail—illustrating it rather nicely with a diagram, and commenting that “It seems likely that many physical and mathematical systems for which no simple description is now known are in fact computationally irreducible”:
I also included an example—that would show up almost unchanged in A New Kind of Science nearly 20 years later—indicating how computational irreducibility could lead to undecidability (back in 1984 the picture was made by stitching together many screen photographs, yes, with strange artifacts from long-exposure photography of CRTs):
In a rather newspaper-production-like experience, I spent the evening of July 18 at the offices of Scientific American in New York City putting finishing touches to the article, which at the end of the night—with minutes to spare—was dispatched for final layout and printing.
But already by that time, I was talking about computational irreducibility and the rule 30 phenomenon all over the place. In July I finished “Twenty Problems in the Theory of Cellular Automata” for the proceedings of the Swedish conference, including what would become a rather standard kind of picture:
Problem 15 talks specifically about rule 30, and already asks exactly what would—35 years later—become Problem #2 in my 2019 Rule 30 Prizes
while Problem 18 asks the (still largely unresolved) question of what the ultimate frequency of computational irreducibility is:
Very late in putting together the Scientific American article I’d added to the caption of the picture showing rule-30-like behavior the statement “Complex patterns generated by cellular automata can also serve as a source of effectively random numbers, and they can be applied to encrypt messages by converting a text into an apparently random form.” I’d realized both that cellular automata could act as good random generators (we used rule 30 as the default in Wolfram Language for more than 25 years), and that their evolution could effectively encrypt things, much as I’d later describe the Second Law as being about “encrypting” initial conditions to produce effective irreversibility.
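The random-generator idea amounts to reading off the center column of the rule 30 pattern as a stream of bits. Here is a sketch of that idea only (an illustration, not the actual implementation used in Wolfram Language):

```python
def rule30_center_bits(n):
    """First n bits of the center column of rule 30, grown from a single black cell."""
    cells = [0] * n + [1] + [0] * n  # padded so edge effects never reach the center
    bits = []
    for _ in range(n):
        bits.append(cells[n])        # record the center cell
        # rule 30: new cell = left XOR (center OR right)
        cells = [cells[i - 1] ^ (cells[i] | cells[i + 1])
                 if 0 < i < len(cells) - 1 else 0
                 for i in range(len(cells))]
    return bits

print(rule30_center_bits(16))
```

Despite coming from a fully deterministic rule and a trivial initial condition, this bit sequence passes standard statistical randomness tests.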
Back in 1984 it was a surprising claim that something as simple and “science-oriented” as a cellular automaton could be useful for encryption. Because at the time practical encryption was basically always done by what at least seemed like arbitrary and complicated engineering solutions, whose security relied on details or explanations that were often considered military or commercial secrets.
I’m not sure when I first became aware of cryptography. But back in 1973 when I first had access to a computer there were a couple of kids (as well as a teacher who’d been a friend of Alan Turing’s) who were programming Enigma-like encryption systems (perhaps fueled by what were then still officially just rumors of World War II goings-on at Bletchley Park). And by 1980 I knew enough about encryption that I made a point of encrypting the source code of SMP (using a modified version of the Unix crypt program). (As it happens, we lost the password, and it was only in 2015 that we got access to the source again.)
My archives record a curious interaction about encryption in May 1982—right around when I’d first run (though didn’t appreciate) rule 30. A rather colorful physicist I knew named Brosl Hasslacher (who we’ll encounter again later) was trying to start a curiously modern-sounding company named Quantum Encryption Devices (or QED for short)—that was actually trying to market a quite hacky and definitively classical (multiple-shift-register-based) encryption system, ultimately to some rather shady customers (and, yes, the “expected” funding did not materialize):
But it was 1984 before I made a connection between encryption and cellular automata. And the first thing I imagined was giving input as the initial condition of the cellular automaton, then running the cellular automaton rule to produce “encrypted output”. The most straightforward way to make encryption was then to have the cellular automaton rule be reversible, and to run the inverse rule to do the decryption. I’d already done a little bit of investigation of reversible rules, but this led to a big search for reversible rules—which would later come in handy for thinking about microscopically reversible processes and thermodynamics.
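A minimal sketch of how “running the inverse rule to decrypt” can work is the standard second-order construction for making any local rule reversible. This is one well-known construction, shown here as illustrative Python, not necessarily the specific reversible rules being searched for at the time:

```python
# Second-order construction: XORing with the previous row makes any
# local rule invertible, so an inverse rule exists for decryption.
def step(prev, curr, rule):
    # next row = rule applied to current row, XORed with the row before
    n = len(curr)
    new = [rule(curr[(i - 1) % n], curr[i], curr[(i + 1) % n]) ^ prev[i]
           for i in range(n)]
    return curr, new

def inverse_step(curr, next_row, rule):
    # the same combination, run backwards, recovers the previous row
    n = len(curr)
    prev = [rule(curr[(i - 1) % n], curr[i], curr[(i + 1) % n]) ^ next_row[i]
            for i in range(n)]
    return prev, curr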
Just down the hall from me at the Institute for Advanced Study was a distinguished mathematician named John Milnor, who got very interested in what I was doing with cellular automata. My archives contain all sorts of notes from Jack, like:
There’s even a reversible (“one-to-one”) rule, with nice, minimal BASIC code, along with lots of “real math”:
But by the spring of 1984 Jack and I were talking a lot about encryption in cellular automata—and we even began to draft a paper about it
complete with outlines of how encryption schemes could work:
The core of our approach involved reversible rules, and so we did all sorts of searches to find these (and by 1984 Jack was—like me—writing C code):
I wondered how random the output from cellular automata was, and I asked people I knew at Bell Labs about randomness testing (and, yes, email headers haven’t changed much in four decades, though then I was swolf@ias.uucp; research!ken was Ken Thompson of Unix fame):
But then came my internalization of the rule 30 phenomenon, which led to a rather different way of thinking about encryption with cellular automata. Before, we’d basically been assuming that the cellular automaton rule was the encryption key. But rule 30 suggested one could instead have a fixed rule, and have the initial condition define the key. And this is what led me to more physicsoriented thinking about cryptography—and to what I said in Scientific American.
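The scheme can be sketched in a few lines of Python (a toy illustration of the idea, certainly not a secure implementation): run rule 30 from a key-defined initial condition, take the center column as a keystream, and XOR it with the message.

```python
# Toy stream cipher in the style described: a fixed rule (rule 30),
# with the initial condition serving as the key. Illustrative only.
def rule30_step(cells):
    n = len(cells)
    # rule 30: new cell = left XOR (center OR right), cyclic boundaries
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

def keystream(key_bits, nbits):
    cells = list(key_bits)
    center = len(cells) // 2
    out = []
    for _ in range(nbits):
        cells = rule30_step(cells)
        out.append(cells[center])   # center column as the keystream
    return out

def encrypt(message_bits, key_bits):
    # XOR with the keystream; applying it twice decrypts
    ks = keystream(key_bits, len(message_bits))
    return [m ^ k for m, k in zip(message_bits, ks)]
```

Since encryption is just an XOR with the keystream, running `encrypt` again with the same key recovers the original message.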
In July I was making “encryption-friendly” pictures of rule 30:
But what Jack and I were most interested in was doing something more “cryptographically sophisticated”, and in particular inventing a practical public-key cryptosystem based on cellular automata. Pretty much the only public-key cryptosystems known then (or even now) are based on number theory. But we thought maybe one could use something like products of rules instead of products of numbers. Or maybe one didn’t need exact invertibility. Or something. But by the late summer of 1984, things weren’t looking good:
And eventually we decided we just couldn’t figure it out. And it’s basically still not been figured out (and maybe it’s actually impossible). But even though we don’t know how to make a public-key cryptosystem with cellular automata, the whole idea of encrypting initial data and turning it into effective randomness is a crucial part of the whole story of the computational foundations of thermodynamics as I think I now understand them.
Right from when I first formulated it, I thought computational irreducibility was an important idea. And in the late summer of 1984 I decided I’d better write a paper specifically about it. The result was:
It was a pithy paper, arranged to fit in the 4-page limit of Physical Review Letters, with a rather clear description of computational irreducibility and its immediate implications (as well as the relation between physics and computation, which it footnoted as a “physical form of the Church–Turing thesis”). It illustrated computational reducibility and irreducibility in a single picture, here in its original Scotch-taped form:
The paper contains all sorts of interesting tidbits, like this run of footnotes:
In the paper itself I didn’t mention the Second Law, but in my archives I find some notes I made in preparing the paper, about candidate irreducible or undecidable problems (with many still unexplored)
which include “Will a hard sphere gas started from a particular state ever exhibit some specific antithermodynamic behaviour?”
In November 1984 the then-editor of Physics Today asked if I’d write something for them. I never did, but my archives include a summary of a possible article—which among other things promises to use computational ideas to explain “why the Second Law of thermodynamics holds so widely”:
So by November 1984 I was already aware of the connection between computational irreducibility and the Second Law (and also I didn’t believe that the Second Law would necessarily always hold). And my notes—perhaps from a little later—make it clear that actually I was thinking about the Second Law along pretty much the same lines as I do now, except that back then I didn’t yet understand the fundamental significance of the observer:
And spelunking now in my old filesystem (retrieved from a 9-track backup tape) I find from November 17, 1984 (at 2:42am), troff source for a putative paper (which, yes, we even now run through troff):
This is all that’s in my filesystem. So, yes, in effect, I’m finally (more or less) finishing this 38 years later.
But in 1984 one of the hot—if not new—ideas of the time was “chaos theory”, which talked about how “randomness” could “deterministically arise” from progressive “excavation” of higher and higher-order digits in the initial conditions for a system. But having seen rule 30, this whole phenomenon of what was often (misleadingly) called “deterministic chaos” seemed to me at best like a sideshow—and definitely not the main effect leading to most randomness seen in physical systems.
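The most minimal form of this “digit excavation” is the doubling map, where every bit of apparent randomness in the orbit is just transcribed from the binary digits of the initial condition. A sketch:

```python
from fractions import Fraction

# Iterating x -> 2x mod 1 shifts out successive binary digits of the
# initial condition x0: the orbit's "randomness" is all transcribed
# from x0, rather than intrinsically generated.
def doubling_orbit_bits(x0, n):
    x = x0
    bits = []
    for _ in range(n):
        x = (2 * x) % 1
        # whether x >= 1/2 reads off the next binary digit of x0
        bits.append(1 if x >= Fraction(1, 2) else 0)
    return bits
```

For x0 = 5/16, which is 0.0101 in binary, the orbit bits are just the later digits of x0: 1, 0, 1, 0, after which the orbit sits at 0 forever.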
I began to draft a paper about this
including for the first time an anchor picture of rule 30 intrinsically generating randomness—to be contrasted with pictures of randomness being generated (still in cellular automata) from sensitive dependence on random initial conditions:
It was a bit of a challenge to find an appropriate publishing venue for what amounted to a rather “interdisciplinary” piece of physics-meets-math-meets-computation. But Physical Review Letters seemed like the best bet, so on November 19, 1984, I submitted a version of the paper there, shortened to fit in its 4-page limit.
A couple of months later the journal said it was having trouble finding appropriate reviewers. I revised the paper a bit (in retrospect I think not improving it), then on February 1, 1985, sent it in again, with the new title “Origins of Randomness in Physical Systems”:
On March 8 the journal responded, with two reports from reviewers. One of the reviewers completely missed the point (yes, a risk in writing shift-the-paradigm papers). The other sent a very constructive two-page report:
I didn’t know it then, but later I found out that Bob Kraichnan had spent much of his life working on fluid turbulence (as well as that he was a very independent and think-for-oneself physicist who’d been one of Einstein’s last assistants at the Institute for Advanced Study). Looking at his report now it’s a little charming to see his statement that “no one who has looked much at turbulent flows can easily doubt [that they intrinsically generate randomness]” (as opposed to getting randomness from noise, initial conditions, etc.). Even decades later, very few people seem to understand this.
There were several exchanges with the journal, leaving it uncertain whether they would publish the paper. But then in May I visited Los Alamos, and Bob Kraichnan invited me to lunch. He’d also invited a then-young physicist from Los Alamos who I’d known fairly well a few years earlier—and who’d once paid me the unintended compliment that it wasn’t fair for me to work on science because I was “too efficient”. (He told me he’d “intended to work on cellular automata”, but before he’d gotten around to it, I’d basically figured everything out.) Now he was riding the chaos theory bandwagon hard, and insofar as my paper threatened that, he wanted to do anything he could to kill the paper.
I hadn’t seen this kind of “paradigm attack” before. Back when I’d been doing particle physics, it had been a hot and cutthroat area, and I’d had papers plagiarized, sometimes even egregiously. But there wasn’t really any “paradigm divergence”. And cellular automata—being quite far from the fray—were something I could just peacefully work on, without anyone really paying much attention to whatever paradigm I might be developing.
At lunch I was treated to a lecture about why what I was doing was nonsense, or even if it wasn’t, I shouldn’t talk about it, at least now. Eventually I got a chance to respond, I thought rather effectively—causing my “opponent” to leave in a huff, with the parting line “If you publish the paper, I’ll ruin your career”. It was a strange thing to say, given that in the pecking order of physics, he was quite junior to me. (A decade and a half later there were nevertheless a couple of “incidents”.) Bob Kraichnan turned to me, cracked a wry smile and said “OK, I’ll go right now and tell [the journal] to publish your paper”:
Kraichnan was quite right that the paper was much too short for what it was trying to say, and in the end it took a long book—namely A New Kind of Science—to explain things more clearly. But the paper was where a high-resolution picture of rule 30 first appeared in print. And it was the place where I first tried to explain the distinction between “randomness that’s just transcribed from elsewhere” and the fundamental phenomenon one sees in rule 30 where randomness is intrinsically generated by computational processes within a system.
I wanted words to describe these two different cases. And reaching back to my years of learning ancient Greek in school I invented the terms “homoplectic” and “autoplectic”, with the noun “autoplectism” to describe what rule 30 does. In retrospect, I think these terms are perhaps “too Greek” (or too “medical sounding”), and I’ve tended to just talk about “intrinsic randomness generation” instead of autoplectism. (Originally, I’d wanted to avoid the term “intrinsic” to prevent confusion with randomness that’s baked into the rules of a system.)
The paper (as Bob Kraichnan pointed out) talks about many things. And at the end, having talked about fluid turbulence, there’s a final sentence—about the Second Law:
In my archives, I find other mentions of the Second Law too. Like an April 1985 protopaper that was never completed
but included the statement:
My main reason for working on cellular automata was to use them as idealized models for systems in nature, and as a window into foundational issues. But being quite involved in the computer industry, I couldn’t help wondering whether they might be directly useful for practical computation. And I talked about the possibility of building a “metachip” in which—instead of having predefined “meaningful” opcodes like in an ordinary microprocessor—everything would be built up “purely in software” from an underlying universal cellular automaton rule. And various people and companies started sending me possible designs:
But in 1984 I got involved in being a consultant to an MIT-spinoff startup called Thinking Machines Corporation that was trying to build a massively parallel “Connection Machine” computer with 65536 processors. The company had aspirations around AI (hence the name, which I’d actually been involved in suggesting), but their machine could also be put to work simulating cellular automata, like rule 30. In June 1985, hot off my work on the origins of randomness, I went to spend some of the summer at Thinking Machines, and decided it was time to do whatever analysis—or, as I would call it now, ruliology—I could on rule 30.
My filesystem from 1985 records that it was fast work. On June 24 I printed a somewhat-higher-resolution image of rule 30 (my login was “swolf” back then, so that’s how my printer output was labeled):
By July 2 a prototype Connection Machine had generated 2000 steps of rule 30 evolution:
With a large-format printer normally used to print integrated circuit layouts I got an even larger “piece of rule 30”—that I laid out on the floor for analysis, for example trying to measure (with meter rules, etc.) the slope of the border between regularity and irregularity in the pattern.
Richard Feynman was also a consultant at Thinking Machines, and we often timed our visits to coincide:
Feynman and I had talked about randomness quite a bit over the years, most recently in connection with the challenges of making a “quantum randomness chip” as a minimal example of quantum computing. Feynman at first didn’t believe that rule 30 could really be “producing randomness”, and thought there must be some way to “crack” it. He tried, both by hand and with a computer, particularly using statistical mechanics methods to try to compute the slope of the border between regularity and irregularity:
But in the end, he gave up, telling me “OK, Wolfram, I think you’re on to something”.
Meanwhile, I was throwing all the methods I knew at rule 30. Combinatorics. Dynamical systems theory. Logic minimization. Statistical analysis. Computational complexity theory. Number theory. And I was pulling in all sorts of hardware and software too. The Connection Machine. A Cray supercomputer. A now-long-extinct Celerity C1200 (which successfully computed a length-40,114,679,273 repetition period). A LISP machine for graph layout. A circuit-design logic minimization program. As well as my own SMP system. (The Wolfram Language was still a few years in the future.)
But by July 21, there it was: a 50-page “ruliological profile” of rule 30, in a sense showing what one could of the “anatomy” of its randomness:
A month later I attended in quick succession a conference in California about cryptography, and one in Japan about fluid turbulence—with these two fields now firmly connected through what I’d discovered.
From when I first saw it at the age of 14, it was always my favorite page in The Feynman Lectures on Physics. But how did the phenomenon of turbulence that it showed happen, and what really was it?
In late 1984, the first version of the Connection Machine was nearing completion, and there was a question of what could be done with it. I agreed to analyze its potential uses in scientific computation, and in my resulting (never ultimately completed) report
the very first section was about fluid turbulence (other sections were about quantum field theory, n-body problems, number theory, etc.):
The traditional computational approach to studying fluids was to start from known continuum fluid equations, then to try to construct approximations to these suitable for numerical computation. But that wasn’t going to work well for the Connection Machine. Because in optimizing for parallelism, its individual processors were quite simple, and weren’t set up to do fast (e.g. floatingpoint) numerical computation.
I’d been saying for years that cellular automata should be relevant to fluid turbulence. And my recent study of the origins of randomness made me all the more convinced that they would for example be able to capture the fundamental randomness associated with turbulence (which I explained as being a bit like encryption):
I sent a letter to Feynman expressing my enthusiasm:
I had been invited to a conference in Japan that summer on “High Reynolds Number Flow Computation” (i.e. computing turbulent fluid flow), and on May 4 I sent an abstract which explained a little more of my approach:
My basic idea was to start not from continuum equations, but instead from a cellular automaton idealization of molecular dynamics. It was the same kind of underlying model as I’d tried to set up in my SPART program in 1973. But now instead of using it to study thermodynamic phenomena and the microscopic motions associated with heat, my idea was to use it to study the kind of visible motion that occurs in fluid dynamics—and in particular to see whether it could explain the apparent randomness of fluid turbulence.
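A minimal version of this kind of lattice molecular dynamics is the square-lattice gas (in the style of the HPP model from the 1970s), in which particles stream in four directions and head-on pairs scatter at right angles, conserving particle number and momentum. An illustrative Python sketch, not the actual Connection Machine code:

```python
import numpy as np

# Minimal HPP-style lattice gas: 4 boolean channels per site (N, E, S, W).
# Head-on pairs rotate 90 degrees (conserving particle number and
# momentum, both of which are zero for a head-on pair); then every
# particle streams one site in its own direction.
def hpp_step(grid):
    n, e, s, w = grid            # each of shape (height, width), boolean
    ns = n & s & ~e & ~w         # vertical head-on collisions
    ew = e & w & ~n & ~s         # horizontal head-on collisions
    n, s = (n & ~ns) | ew, (s & ~ns) | ew
    e, w = (e & ~ew) | ns, (w & ~ew) | ns
    # streaming, with periodic boundaries
    return np.array([np.roll(n, -1, axis=0), np.roll(e, 1, axis=1),
                     np.roll(s, 1, axis=0), np.roll(w, -1, axis=1)])
```

Because collisions replace one head-on pair with another and streaming just moves particles, the total particle count is exactly conserved, which is the microscopic underpinning of the “Second Law behavior” relied on here.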
I knew from the beginning that I needed to rely on “Second Law behavior” in the underlying cellular automaton—because that’s what would lead to the randomness necessary to “wash out” the simple idealizations I was using in the cellular automaton, and allow standard continuum fluid behavior to emerge. And so it was that I embarked on the project of understanding not only thermodynamics, but also hydrodynamics and fluid turbulence, with cellular automata—on the Connection Machine.
I’ve had the experience many times in my life of entering a field and bringing in new tools and new ideas. Back in 1985 I’d already done that several times, and it had always been a pretty much uniformly positive experience. But, sadly, with fluid turbulence, it was to be, at best, a turbulent experience.
The idea that cellular automata might be useful in studying fluid turbulence definitely wasn’t obvious. The year before, for example, at the Nobel Symposium conference in Sweden, a French physicist named Uriel Frisch had been summarizing the state of turbulence research. Fittingly for the topic of turbulence, he and I first met after a rather bumpy helicopter ride to a conference event—where Frisch told me in no uncertain terms that cellular automata would never be relevant to turbulence, and talked about how turbulence was better thought of as being associated (a bit like in the mathematical theory of phase transitions) with “singularities getting close to the real line”. (Strangely, I just now looked at Frisch’s paper in the proceedings of the conference: “Ou en est la Turbulence Developpée?” [roughly: “Fully Developed Turbulence: Where Do We Stand?”], and was surprised to discover that its last paragraph actually mentions cellular automata, and its acknowledgements thank me for conversations—even though the paper says it was received June 11, 1984, a couple of days before I had met Frisch. And, yes, this is the kind of thing that makes accurately reconstructing history hard.)
Los Alamos had always been a hotbed of computational fluid dynamics (not least because of its importance in simulating nuclear explosions)—and in fact of computing in general—and, starting in the late fall of 1984, on my visits there I talked to many people about using cellular automata to do fluid dynamics on the Connection Machine. Meanwhile, Brosl Hasslacher (mentioned above in connection with his 1982 encryption startup) had—after a rather itinerant career as a physicist—landed at Los Alamos. And in fact I had been asked by the Los Alamos management for a letter about him in December 1984 (yes, even though he was 18 years older than me), and ended what I wrote with: “He has considerable ability in identifying promising areas of research. I think he would be a significant addition to the staff at Los Alamos.”
Well, in early 1985 Brosl identified cellular automaton fluid dynamics as a promising area, and started energetically talking to me about it. Meanwhile, the Connection Machine was just starting to work, and a young software engineer named Jim Salem was assigned to help me get cellular automaton fluid dynamics running on it. I didn’t know it at the time, but Brosl—ever the opportunist—had also made contact with Uriel Frisch, and now I find the curious document in French dated May 10, 1985, with the translated title “A New Concept for Supercomputers: Cellular Automata”, laying out a grand international multi-year plan, and referencing the (so far as I know, nonexistent) B. Hasslacher and U. Frisch (1985), “The Cellular Automaton Turbulence Machine”, Los Alamos:
I visited Los Alamos again in May, but for much of the summer I was at Thinking Machines, and on July 18 Uriel Frisch came to visit there, along with a French physicist named Yves Pomeau, who had done some nice work in the 1970s on applying methods of traditional statistical mechanics to “lattice gases”.
But what about realistic fluid dynamics, and turbulence? I wasn’t sure how easy it would be to “build up from the (idealized) molecules” to get to pictures of recognizable fluid flows. But we were starting to have some success in generating at least basic results. It wasn’t clear how seriously anyone else was taking this (especially given that at the time I hadn’t seen the material Frisch had already written), but insofar as anything was “going on”, it seemed to be a perfectly collegial interaction—where perhaps Los Alamos or the French government or both would buy a Connection Machine computer. But meanwhile, on the technical side, it had become clear that the most obvious square-lattice model (that Pomeau had used in the 1970s, and that was basically what my SPART program from 1973 was supposed to implement) was fine for diffusion processes, but couldn’t really represent proper fluid flow.
When I first started working on cellular automata in 1981 the minimal 1D case in which I was most interested had barely been studied, but there had been quite a bit of work done in previous decades on the 2D case. By the 1980s, however, it had mostly petered out—with the exception of a group at MIT led by Ed Fredkin, who had long had the belief that one might in effect be able to “construct all of physics” using cellular automata. Tom Toffoli and Norm Margolus, who were working with him, had built a hardware 2D cellular automaton simulator—that I happened to photograph in 1982 when visiting Fredkin’s island in the Caribbean:
But while “all of physics” was elusive (and our Physics Project suggests that a cellular automaton with a rigid lattice is not the right place to start), there’d been success in making for example an idealized gas, using essentially a block cellular automaton on a square grid. But mostly the cellular automaton machine was used in a maddeningly “Look at this cool thing!” mode, often accompanied by rapid physical rewiring.
In early 1984 I visited MIT to use the machine to try to do what amounted to natural science, systematically studying 2D cellular automata. The result was a paper (with Norman Packard) on 2D cellular automata. We restricted ourselves to square grids, though mentioned hexagonal ones, and my article in Scientific American in late 1984 opened with a full-page hexagonal cellular automaton simulation of a snowflake made by Packard (and later in 1984 turned into one of a set of cellular automaton cards for sale):
In any case, in the summer of 1985, with square lattices not doing what was needed, it was time to try hexagonal ones. I think Yves Pomeau already had a theoretical argument for this, but as far as I was concerned, it was (at least at first) just a “next thing to try”. Programming the Connection Machine was at that time a rather laborious process (which, almost unprecedentedly for me, I wasn’t doing myself), and mapping a hexagonal grid onto its basically square architecture was a little fiddly, as my notes record:
Meanwhile, at Los Alamos, I’d introduced a young and very computer-savvy protégé of mine named Tsutomu Shimomura (who had a habit of getting himself into computer security scrapes, though would later become famous for taking down a well-known hacker) to Brosl Hasslacher, and now Tsutomu jumped into writing optimized code to implement hexagonal cellular automata on a Cray supercomputer.
In my archives I now find a draft paper from September 7 that starts with a nice (if not entirely correct) discussion of what amounts to computational irreducibility, and then continues by giving theoretical symmetry-based arguments that a hexagonal cellular automaton should be able to reproduce fluid mechanics:
Near the end, the draft says (misspelling Tsutomu Shimomura’s name):
Meanwhile, we (as well as everyone else) were starting to get results that looked at least suggestive:
By November 15 I had drafted a paper
that included some more detailed pictures
and that at the end (I thought, graciously) thanked Frisch, Hasslacher, Pomeau and Shimomura for “discussions and for sharing their unpublished results with us”, which by that point included a bunch of suggestive, if not obviously correct, pictures of fluid-flow-like behavior.
To me, what was important about our paper was that, after all these years, it filled in with more detail just how computational systems like cellular automata could lead to Second-Law-style thermodynamic behavior, and it “proved” the physicality of what was going on by showing easy-to-recognize fluid-dynamics-like behavior.
Just four days later, though, there was a big surprise. The Washington Post ran a front-page story—alongside the day’s characteristic Cold-War-era geopolitical news—about the “Hasslacher–Frisch model”, and about how it might be judged so important that it “should be classified to keep it out of Soviet hands”:
At that point, things went crazy. There was talk of Nobel Prizes (I wasn’t buying it). There were official complaints from the French embassy about French scientists not being adequately recognized. There was upset at Thinking Machines for not even being mentioned. And, yes, as the originator of the idea, I was miffed that nobody seemed to have even suggested contacting me—even if I did view the rather breathless and “geopolitical” tenor of the article as being pretty far from immediate reality.
At the time, everyone involved denied having been responsible for the appearance of the article. But years later it emerged that the source was a certain John Gage, former political operative and longtime marketing operative at Sun Microsystems, who I’d known since 1982, and had at some point introduced to Brosl Hasslacher. Apparently he’d called around various government contacts to help encourage open (international) sharing of scientific code, quoting this as a test case.
But as it was, the article had pretty much exactly the opposite effect, with everyone now out for themselves. In Princeton, I’d interacted with Steve Orszag, whose funding for his new (traditional) computational fluid dynamics company, Nektonics, now seemed at risk, and who pulled me into an emergency effort to prove that cellular automaton fluid dynamics couldn’t be competitive. (The paper he wrote about this seemed interesting, but I demurred on being a coauthor.) Meanwhile, Thinking Machines wanted to file a patent as quickly as possible. Any possibility of the French government getting a Connection Machine evaporated and soon Brosl Hasslacher was claiming that “the French are faking their data”.
And then there was the matter of the various academic papers. I had been sent the Frisch–Hasslacher–Pomeau paper to review, and checking my 1985 calendar for my whereabouts I must have received it the very day I finished my paper. I told the journal they should publish the paper, suggesting some changes to avoid naivete about computing and computer technology, but not mentioning its very thin recognition of my work.
Our paper, on the other hand, triggered a rather indecorous competitive response, with two “anonymous reviewers” claiming that the paper said nothing more than its “reference 5” (the Frisch–Hasslacher–Pomeau paper). I patiently pointed out that that wasn’t the case, not least because our paper had actual simulations, but also that actually I happened to have “been there first” with the overall idea. The journal solicited other opinions, which were mostly supportive. But in the end a certain Leo Kadanoff swooped in to block it, only to publish his own paper a few months later.
It felt corrupt, and distasteful. I was at that point a successful and increasingly established academic. And some of the people involved were even longtime friends. So was this kind of thing what I had to look forward to in a life in academia? That didn’t seem attractive, or necessary. And it was what began the process that led me, a year and a half later, to finally choose to leave academia behind, never to return.
Still, despite the “turbulence”—and in the midst of other activities—I continued to work hard on cellular automaton fluids, and by January 1986 I had the first version of a long (and, I thought, rather good) paper on their basic theory (that was finished and published later that year):
As it turns out, the methods I used in that paper provide some important seeds for our Physics Project, and even in recent times I’ve often found myself referring to the paper, complete with its SMP open-code appendix:
But in addition to developing the theory, I was also getting simulations done on the Connection Machine, and getting actual experimental data (particularly on flow past cylinders) to compare them to. By February 1986, we had quite a few results:
But by this point there was a quite industrial effort, particularly in France, that was churning out papers on cellular automaton fluids at a high rate. I’d called my theory paper “Cellular Automaton Fluid 1: Basic Theory”. But was it really worth finishing part 2? There was a veritable army of perfectly good physicists “competing” with me. And, I thought, “I have other things to do. Just let them do this. This doesn’t need me”.
And so it was that in the middle of 1986 I stopped working on cellular automaton fluids. And, yes, that freed me up to work on lots of other interesting things. But even though methods derived from cellular automaton fluids have become widely used in practical fluid dynamics computations, the key basic science that I thought could be addressed with cellular automaton fluids—about things like the origin of randomness in turbulence—has still, even to this day, not really been further explored.
In June 1986 I was about to launch both a research center (the Center for Complex Systems Research at the University of Illinois) and a journal (Complex Systems)—and I was also organizing a conference called CA ’86 (which was held at MIT). The core of the conference was poster presentations, and a few days before the conference was to start I decided I should find a “nice little project” that I could quickly turn into a poster.
In studying cellular automaton fluids I had found that cellular automata with rules based on idealized physical molecular dynamics could on a large scale approximate the continuum behavior of fluids. But what if one just started from continuum behavior? Could one derive underlying rules that would reproduce it? Or perhaps even find the minimal such rules?
By mid-1985 I felt I’d made decent progress on the science of cellular automata. But what about their engineering? What about constructing cellular automata with particular behavior? In May 1985 I had given a conference talk about “Cellular Automaton Engineering”, which turned into a paper about “Approaches to Complexity Engineering”—that in effect tried to set up “trainable cellular automata” in what might still be a powerful simple-programs-meet-machine-learning scheme that deserves to be explored:
But so it was that a few days before the CA ’86 conference I decided to try to find a minimal “cellular automaton approximation” to a simple continuum process: diffusion in one dimension.
I explained
and described as my objective:
I used block cellular automata, and tried to find rules that were reversible and also conserved something that could serve as “microscopic density” or “particle number”. I quickly determined that there were no such rules with 2 colors and blocks of sizes 2 or 3 that achieved any kind of randomization.
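The block-size-2 case is small enough to check exhaustively: a block rule is reversible if it permutes the set of possible blocks, and conserves “particle number” if each block maps to one with the same digit sum. A quick sketch (in Python here, rather than the SMP used at the time):

```python
from itertools import permutations

# Enumerate block rules on 2-color, size-2 blocks that are reversible
# (i.e. a permutation of the blocks) and conserve "particle number"
# (i.e. each block maps to one with the same digit sum).
blocks = [(0, 0), (0, 1), (1, 0), (1, 1)]
valid = []
for perm in permutations(blocks):
    rule = dict(zip(blocks, perm))
    if all(sum(b) == sum(rule[b]) for b in blocks):
        valid.append(rule)
print(len(valid))  # prints 2: only the identity and the (0,1)<->(1,0) swap
```

Both surviving rules just leave particles in place or swap them within a block, so neither can randomize anything, consistent with what’s said above about the 2-color case.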
To go to 3 colors, I used SMP to generate candidate rules
where for example the function Apper can literally be translated into Wolfram Language as
or, more idiomatically, just
then did what I have done so many times and just printed out pictures of their behavior:
Some clearly did not show randomization, but a couple did. And soon I was studying what I called the “winning rule”, which—like rule 30—went from simple initial conditions to apparent randomness:
I analyzed what the rule was “microscopically doing”
and explored its longer-time behavior:
Then I did things like analyze its cycle structure in a finite-size region by running C programs I’d basically already developed back in 1982 (though now they were modified to automatically generate troff code for typesetting):
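The kind of finite-size cycle analysis described can be sketched by iterating a rule on a cyclic lattice of n cells and measuring the length of the cycle each state eventually falls onto (illustrative Python, using rule 30 rather than the winning rule itself):

```python
# Iterate a CA on a cyclic lattice of n cells (states encoded as n-bit
# integers) and find the lengths of the cycles that states fall onto.
def rule30_global(state, n):
    bits = [(state >> i) & 1 for i in range(n)]
    new = [bits[(i + 1) % n] ^ (bits[i] | bits[(i - 1) % n])
           for i in range(n)]
    return sum(b << i for i, b in enumerate(new))

def cycle_lengths(n):
    step = [rule30_global(s, n) for s in range(2 ** n)]
    lengths = set()
    for s0 in range(2 ** n):
        s = s0
        for _ in range(2 ** n):   # long enough to land on the cycle
            s = step[s]
        start, length, t = s, 1, step[s]
        while t != start:         # walk around the cycle to measure it
            t, length = step[t], length + 1
        lengths.add(length)
    return sorted(lengths)
```

For any rule in this setup the all-zero state of rule 30 is a fixed point, so a cycle of length 1 always appears; the interesting structure is in the longer cycles, whose lengths vary irregularly with the lattice size n.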
And, like rule 30, the “winning rule” that I found back in June 1986 has stayed with me, essentially as a minimal example of reversible, numberconserving randomness. It appeared in A New Kind of Science, and it appears now in my recent work on the Second Law—and, of course, the patterns it makes are always the same:
Back in 1986 I wanted to know just how efficiently a simple rule like this could reproduce continuum behavior. And in a portent of observer theory my notes from the time talk about “optimal coarse graining, where the 2nd law is ‘most true’”, then go on to compare the distributed character of the cellular automaton with traditional “collect information into numerical value” finitedifference approximations:
In a talk I gave I summarized my understanding:
The phenomenon of randomization is generic in computational systems (witness rule 30, the “winning rule”, etc.). This leads to the genericity of thermodynamics. And this in turn leads to the genericity of continuum behavior, with diffusion and fluid behavior being two examples.
It would take another 34 years, but these basic ideas would eventually be what underlies our Physics Project, and our understanding of the emergence of things like spacetime. As well as now being crucial to our whole understanding of the Second Law.
By the end of 1986 I had begun the development of Mathematica, and what would become the Wolfram Language, and for most of the next five years I was submerged in technology development. But in 1991 I started to use the technology I now had, and began the project that became A New Kind of Science.
Much of the first couple of years was spent exploring the computational universe of simple programs, and discovering that the phenomena I’d discovered in cellular automata were actually much more general. And it was seeing that generality that led me to the Principle of Computational Equivalence. In formulating the concept of computational irreducibility I’d in effect been thinking about trying to “reduce” the behavior of systems using an external as-powerful-as-possible universal computer. But now I’d realized I should just be thinking about all systems as somehow computationally equivalent. And in doing that I was pulling the conception of the “observer” and their computational ability closer to the systems they were observing.
But the further development of that idea would have to wait nearly three more decades, until the arrival of our Physics Project. In A New Kind of Science, Chapter 7 on “Mechanisms in Programs and Nature” describes the concept of intrinsic randomness generation, and how it’s distinguished from other sources of randomness. Chapter 8 on “Implications for Everyday Systems” then has a section on fluid flow, where I describe the idea that randomness in turbulence could be intrinsically generated, making it, for example, repeatable, rather than inevitably different every time an experiment is run.
And then there’s Chapter 9, entitled “Fundamental Physics”. The majority of the chapter—and its “most famous” part—is the presentation of the direct precursor to our Physics Project, including the concept of graph-rewriting-based computational models for the lowest-level structure of spacetime and the universe.
But there’s an earlier part of Chapter 9 as well, and it’s about the Second Law. There’s a precursor about “The Notion of Reversibility”, and then we’re on to a section about “Irreversibility and the Second Law of Thermodynamics”, followed by “Conserved Quantities and Continuum Phenomena”, which is where the “winning rule” I discovered in 1986 appears again:
My records show I wrote all of this—and generated all the pictures—between May 2 and July 11, 1995. I felt I already had a pretty good grasp of how the Second Law worked, and just needed to write it down. My emphasis was on explaining how a microscopically reversible rule—through its intrinsic ability to generate randomness—could lead to what appears to be irreversible behavior.
Mostly I used reversible 1D cellular automata as my examples, showing for example randomization both forwards and backwards in time:
I soon got to the nub of the issue with irreversibility and the Second Law:
I talked about how “typical textbook thermodynamics” involves a bunch of details about energy and motion, and to get closer to this I showed a simple example of an “ideal gas” 2D cellular automaton:
But despite my early exposure to hard-sphere gases, I never went as far as to use them as examples in A New Kind of Science. We did actually take some photographs of the mechanics of real-life billiards:
But cellular automata always seemed like a much clearer way to understand what was going on, free from issues like numerical precision, or their physical analogs. And by looking at cellular automata I felt as if I could really see down to the foundations of the Second Law, and why it was true.
And mostly it was a story of computational irreducibility, and intrinsic randomness generation. But then there was rule 37R. I’ve often said that in studying the computational universe we have to remember that the “computational animals” are at least as smart as we are—and they’re always up to tricks we don’t expect.
And so it is with rule 37R. In 1986 I’d published a book of cellular automaton papers, and as an appendix I’d included lots of tables of properties of cellular automata. Almost all the tables were about the ordinary elementary cellular automata. But as a kind of “throwaway” at the very end I gave a table of the behavior of the 256 second-order reversible versions of the elementary rules, including 37R starting both from completely random initial conditions
and from single black cells:
So far, nothing remarkable. And years go by. But then—apparently in the middle of working on the 2D systems section of A New Kind of Science—at 4:38am on February 21, 1994 (according to my filesystem records), I generate pictures of all the reversible elementary rules again, but now from initial conditions that are slightly more complicated than a single black cell. Opening the notebook from that time (and, yes, Wolfram Language and our notebook format have been stable enough that 28 years later that still works) it shows up tiny on a modern screen, but there it is: rule 37R doing something “interesting”:
Clearly I noticed it. Because by 4:47am I’ve generated lots of pictures of rule 37R, like this one evolving from a block of 21 black cells, and showing only every other step
and by 4:54am I’ve got things like:
My guess is that I was looking for class 4 behavior in reversible cellular automata. And with rule 37R I’d found it. And at the time I moved on to other things. (On March 1, 1994, I slipped on some ice and broke my ankle, and was largely out of action for several weeks.)
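The construction behind the “R” rules is worth making explicit: the new value of a cell is the ordinary elementary-rule output for the current neighborhood, XORed with that cell’s value two steps back—which makes the evolution exactly reversible, since the past can be recovered by the same operation. Here’s a minimal Python sketch of that second-order scheme for rule 37R (my own rendering, not original code):

```python
def step_37r(prev, curr, rule_num=37):
    # Second-order reversible elementary CA: the new cell value is the
    # ordinary rule-37 output for the current neighborhood, XORed with
    # the cell's value on the step before last (cyclic boundaries).
    table = [(rule_num >> i) & 1 for i in range(8)]
    n = len(curr)
    return [table[4*curr[(i-1) % n] + 2*curr[i] + curr[(i+1) % n]] ^ prev[i]
            for i in range(n)]

def run_37r(width=31, steps=40):
    rows = [[0]*width, [0]*width]
    rows[1][width // 2] = 1          # start from a single black cell
    for _ in range(steps):
        rows.append(step_37r(rows[-2], rows[-1]))
    return rows

# Reversibility: the same step run "backwards" recovers the past, i.e.
# step_37r(rows[t+1], rows[t]) == rows[t-1]
```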
And that takes us back to May 1995, when I was working on writing about the Second Law. My filesystem records that I did quite a few more experiments on rule 37R then, looking at different initial conditions, and running it as long as I could, to see if its strange neither-simple-nor-randomizing—and not very Second-Law-like—behavior would somehow “resolve”.
Up to that moment, for nearly a quarter of a century, I had always fundamentally believed in the Second Law. Yes, I thought there might be exceptions with things like self-gravitating systems. But I’d always assumed that—perhaps with some pathological exceptions—the Second Law was something quite universal, whose origins I could even now understand through computational irreducibility.
But seeing rule 37R this suddenly didn’t seem right. In A New Kind of Science I included a long run of rule 37R (here colorized to emphasize the structure)
then explained:
How could one describe what was happening in rule 37R? I discussed the idea that it was effectively forming “membranes” which could slowly move, but keep things “modular” and organized inside. I summarized at the time, tagging it as “something I wanted to explore in more detail one day”:
Rounding out the rest of A New Kind of Science took another seven years of intense work. But finally in May 2002 it was published. The book talked about many things. And even within Chapter 9 my discussion of the Second Law was overshadowed by the outline I gave of an approach to finding a truly fundamental theory of physics—and of the ideas that evolved into our Physics Project.
After A New Kind of Science was finished I spent many years working mainly on technology—building Wolfram|Alpha, launching the Wolfram Language and so on. But “follow up on Chapter 9” was always on my long-term to-do list. The biggest—and most difficult—part of that had to do with fundamental physics. But I still had a great intellectual attachment to the Second Law, and I always wanted to use what I’d then understood about the computational paradigm to “tighten up” and “round out” the Second Law.
I’d mention it to people from time to time. Usually the response was the same: “Wasn’t the Second Law understood a century ago? What more is there to say?” Then I’d explain, and it’d be like “Oh, yes, that is interesting”. But somehow it always seemed like people felt the Second Law was “old news”, and that whatever I might do would just be “dotting an i or crossing a t”. And in the end my Second Law project never quite made it onto my active list, despite the fact that it was something I always wanted to do.
Occasionally I would write about my ideas for finding a fundamental theory of physics. And, implicitly, I’d rely on the understanding I’d developed of the foundations and generalization of the Second Law. In 2015, for example, celebrating the centenary of General Relativity, I wrote about what spacetime might really be like “underneath”
and how a perceived spacetime continuum might emerge from discrete underlying structure like fluid behavior emerges from molecular dynamics—in effect through the operation of a generalized Second Law:
It was 17 years after the publication of A New Kind of Science that (as I’ve described elsewhere) circumstances finally aligned to embark on what became our Physics Project. And after all those years, the idea of computational irreducibility—and its immediate implications for the Second Law—had come to seem so obvious to me (and to the young physicists with whom I worked) that they could just be taken for granted as conceptual building blocks in constructing the tower of ideas we needed.
One of the surprising and dramatic implications of our Physics Project is that General Relativity and quantum mechanics are in a sense both manifestations of the same fundamental phenomenon—but played out respectively in physical space and in branchial space. But what really is this phenomenon?
What became clear is that ultimately it’s all about the interplay between underlying computational irreducibility and our nature as observers. It’s a concept that had its origins in my thinking about the Second Law. Because even in 1984 I’d understood that the Second Law is about our inability to “decode” underlying computationally irreducible behavior.
In A New Kind of Science I’d devoted Chapter 10 to “Processes of Perception and Analysis”, and I’d recognized that we should view such processes—like any processes in nature or elsewhere—as being fundamentally computational. But I still thought of processes of perception and analysis as being separated from—and in some sense “outside”—actual processes we might be studying. But in our Physics Project we’re studying the whole universe, so inevitably we as observers are “inside” and part of the system.
And what then became clear is that the emergence of things like General Relativity and quantum mechanics depends on certain characteristics of us as observers. “Alien observers” might perceive quite different laws of physics (or no systematic laws at all). But for “observers like us”, who are computationally bounded and believe we are persistent in time, General Relativity and quantum mechanics are inevitable.
In a sense, therefore, General Relativity and quantum mechanics become “abstractly derivable” given our nature as observers. And the remarkable thing is that at some level the story is exactly the same with the Second Law. To me it’s a surprising and deeply beautiful scientific unification: that all three of the great foundational theories of physics—General Relativity, quantum mechanics and statistical mechanics—are in effect manifestations of the same core phenomenon: an interplay between computational irreducibility and our nature as observers.
Back in the 1970s I had no inkling of all this. And even when I chose to combine my discussions of the Second Law and of my approach to a fundamental theory of physics into a single chapter of A New Kind of Science, I didn’t know how deeply these would be connected. It’s been a long and winding path, that’s needed to pass through many different pieces of science and technology. But in the end the feeling I had when I first studied that book cover when I was 12 years old that “this was something fundamental” has played out on a scale almost incomprehensibly beyond what I had ever imagined.
Most of my journey with the Second Law has had to do with understanding origins of randomness, and their relation to “typical Second-Law behavior”. But there’s another piece—still incompletely worked out—which has to do with surprises like rule 37R, and, more generally, with large-scale versions of class 4 behavior, or what I’ve begun to call the “mechanoidal phase”.
I first identified class 4 behavior as part of my systematic exploration of 1D cellular automata at the beginning of 1983—with the “code 20” k = 2, r = 2 totalistic rule being my first clear example:
Very soon my searches had identified a whole variety of localized structures in this rule:
At the time, the most significant attribute of class 4 cellular automata as far as I was concerned was that they seemed likely to be computation universal—and potentially provably so. But from the beginning I was also interested in what their “thermodynamics” might be. If you start them off from random initial conditions, will their patterns die out, or will some arrangement of localized structures persist, and perhaps even grow?
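As a reminder of the setup, a k = 2, r = 2 totalistic rule determines a cell’s new color purely from the total of the 5 cells in its neighborhood, with the base-2 digits of the code number giving the output for each possible total. A short Python sketch of this (illustrative only; not the code I used in 1983):

```python
def totalistic_step(row, code=20, k=2, r=2):
    # New cell value = the digit of `code` (in base k) at the position
    # given by the total of the 2r+1 neighborhood cells (cyclic boundaries).
    n = len(row)
    return [(code // k ** sum(row[(i + d) % n] for d in range(-r, r + 1))) % k
            for i in range(n)]

# For code 20 (binary 10100) the outputs for totals 0..5 are 0,0,1,0,1,0.
row = [0] * 13
row[5] = row[6] = 1              # two adjacent black cells
row = totalistic_step(row)       # the pair spreads into a block of 4
```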
In most cellular automata—and indeed most systems with local rules—one expects that at least their statistical properties will somehow stabilize when one goes to the limit of infinite size. But, I asked, does that infinitesize limit even “exist” for class 4 systems—or if you progressively increase the size, will the results you get keep on jumping around forever, perhaps as you succeed in sampling progressively more exotic structures?
A paper I wrote in September 1983 talks about the idea that in a sufficiently large class 4 cellular automaton one would eventually get self-reproducing structures, which would end up “taking over everything”:
The idea that one might be able to see “biology-like” self-reproduction in cellular automata has a long history. Indeed, one of the multiple ways that cellular automata were invented (and the one that led to their name) was through John von Neumann’s 1952 effort to construct a complicated cellular automaton in which there could be a complicated configuration capable of self-reproduction.
But could self-reproducing structures ever “occur naturally” in cellular automata? Without the benefit of intuition from things like rule 30, von Neumann assumed that something like self-reproduction would need an incredibly complicated setup, as it seems to have, for example, in biology. But having seen rule 30—and more so class 4 cellular automata—it didn’t seem so implausible to me that even with very simple underlying rules, there could be fairly simple configurations that would show phenomena like self-reproduction.
But for such a configuration to “occur naturally” in a random initial condition might require a system with exponentially many cells. And I wondered if in the oceans of the early Earth there might have been only “just enough” molecules for something like a self-reproducing lifeform to occur.
Back in 1983 I already had pretty efficient code for searching for structures in class 4 cellular automata. But even running for days at a time, I never found anything more complicated than purely periodic (if sometimes moving) structures. And in March 1985, following an article about my work in Scientific American, I appealed to the public to find “interesting structures”—like “glider guns” that would “shoot out” moving structures:
As it happened, right before I made my “public appeal”, a student at Princeton working with a professor I knew had sent me a glider gun he’d found in the k = 2, r = 3 totalistic code 88 rule:
At the time, though, with computer displays only large enough to see behavior like
I wasn’t convinced this was an “ordinary class 4 rule”—even though now, with the benefit of higher display resolution, it seems more convincing:
The “public appeal” generated a lot of interesting feedback—but no glider guns or other exotic structures in the rules I considered “obviously class 4”. And it wasn’t until after I started working on A New Kind of Science that I got back to the question. But then, on the evening of December 31, 1991, using exactly the same code as in 1983, but now with faster computers, there it was: in an ordinary class 4 rule (k = 3, r = 1 code 1329), after finding several localized structures, there was one that grew without bound (albeit not in the most obvious “glider gun” way):
But that wasn’t all. Exemplifying the principle that in the computational universe there are always surprises, searching a little further revealed yet other unexpected structures:
Every few years something else would come up with class 4 rules. In 1994, lots of work on rule 110. In 1995, the surprise of rule 37R. In 1998 efforts to find analogs of particles that might carry over to my graphbased model of space.
After A New Kind of Science was published in 2002, we started our annual Wolfram Summer School (at first called the NKS Summer School)—and in 2010 our High School Summer Camp. Some years we asked students to pick their “favorite cellular automaton”. Often they were class 4:
And occasionally someone would do a project to explore the world of some particular class 4 rule. But beyond those specifics—and statements about computation universality—it’s never been clear quite what one could say about class 4.
Back in 1984 in the series of cellular automaton postcards I’d produced, there were a couple of class 4 examples:
And even then the typical response to these images was that they looked “organic”—like the kind of thing living organisms might produce. A decade later—for A New Kind of Science—I studied “organic forms” quite a bit, trying to understand how organisms get their overall shapes, and surface patterns. Mostly that didn’t end up being a story of class 4 behavior, though.
Since the early 1980s I’ve been interested in molecular computing, and in how computation might be done at the level of molecules. My discoveries in A New Kind of Science (and specifically the Principle of Computational Equivalence) convinced me that it should be possible to get even fairly simple collections of molecules to “do arbitrary computations” or even build more or less arbitrary structures (in a more general and streamlined way than happens with the whole protein synthesis structure in biology). And over the years, I sometimes thought about trying to do practical work in this area. But it didn’t feel as if the ambient technology was quite ready. So I never jumped in.
Meanwhile, I’d long understood the basic correspondence between multiway systems and patterns of possible pathways for chemical reactions. And after our Physics Project was announced in 2020 and we began to develop the general multicomputational paradigm, I immediately considered molecular computing a potential application. But just what might the “choreography” of molecules be like? What causal relationships might there be, for example, between different interactions of the same molecule? That’s not something ordinary chemistry—dealing for example with liquid-phase reactions—tends to consider important.
But what I increasingly started to wonder is whether in molecular biology it might actually be crucial. And even in the 20 years since A New Kind of Science was published, it’s become increasingly clear that in molecular biology things are extremely “orchestrated”. It’s not about molecules randomly moving around, like in a liquid. It’s about molecules being carefully channeled and actively transported from one “event” to another.
Class 3 cellular automata seem to be good “metamodels” for things like liquids, and readily give Second-Law-like behavior. But what about the kind of situation that seems to exist in molecular biology? It’s something I’ve been thinking about only recently, but I think this is a place where class 4 cellular automata can contribute. I’ve started calling the “bulk limit” of class 4 systems the “mechanoidal phase”. It’s a place where the ordinary Second Law doesn’t seem to apply.
Four decades ago when I was trying to understand how structure could arise “in violation of the Second Law” I didn’t yet even know about computational irreducibility. But now we’ve come a lot further, in particular with the development of the multicomputational paradigm, and the recognition of the importance of the characteristics of the observer in defining what perceived overall laws there will be. It’s an inevitable feature of computational irreducibility that there will always be an infinite sequence of new challenges for science, and new pieces of computational reducibility to be found. So, now, yes, a challenge is to understand the mechanoidal phase. And with all the tools and ideas we’ve developed, I’m hoping the process will go faster than it did for the ordinary Second Law.
I began my quest to understand the Second Law a bit more than 50 years ago. And—even though there’s certainly more to say and figure out—it’s very satisfying now to be able to bring a certain amount of closure to what has been the single longest-running piece of intellectual “unfinished business” in my life. It’s been an interesting journey—that’s very much relied on, and at times helped drive, the tower of science and technology that I’ve spent my life building. There are many things that might not have happened as they did. And in the end it’s been a story of long-term intellectual tenacity—stretching across much of my life so far.
For a long time I’ve kept (automatically when possible) quite extensive archives. And now these archives allow one to reconstruct in almost unprecedented detail my journey with the Second Law. One sees the gradual formation of intellectual frameworks over the course of years, then the occasional discovery or realization that allows one to take the next step in what is sometimes mere days. There’s a curious interweaving of computational and essentially philosophical methodologies—with an occasional dash of mathematics.
Sometimes there’s general intuition that’s significantly ahead of specific results. But more often there’s a surprise computational discovery that seeds the development of new intuition. And, yes, it’s a little embarrassing how often I managed to generate in a computer experiment something that I completely failed to interpret or even notice at first because I didn’t have the right intellectual framework or intuition.
And in the end, there’s an air of computational irreducibility to the whole process: there really wasn’t a way to shortcut the intellectual development; one just had to live it. Already in the 1990s I had taken things a fair distance, and I had even written a little about what I had figured out. But for years it hung out there as one of a small collection of unfinished projects: to finally round out the intellectual story of the Second Law, and to write down an exposition of it. But the arrival of our Physics Project just over two years ago brought both a cascade of new ideas, and for me personally a sense that even things that had been out there for a long time could in fact be brought to closure.
And so it is that I’ve returned to the quest I began when I was 12 years old—but now with five decades of new tools and new ideas. The wonder and magic of the Second Law is still there. But now I’m able to see it in a much broader context, and to realize that it’s not just a law about thermodynamics and heat, but instead a window into a very general computational phenomenon. None of this I could know when I was 12 years old. But somehow the quest I was drawn to all those years ago has turned out to be deeply aligned with the whole arc of intellectual development that I have followed in my life. And no doubt it’s no coincidence.
But for now I’m just grateful to have had the quest to understand the Second Law as one of my guiding forces through so much of my life, and now to realize that my quest was part of something so broad and so deep.
What is the backstory of the book cover that launched my long journey with the Second Law? The book was published in 1965, and inside its front flap we find:
On page 7 we then find:
In 2001—as I was putting the finishing touches to the historical notes for A New Kind of Science—I tracked down Berni Alder (who died in 2020 at the age of 94) to ask him the origin of the pictures. It turned out to be a complex story, reaching back to the earliest serious uses of computers for basic science, and even beyond.
The book had been born out of the sense of urgency around science education in the US that followed the launch of Sputnik by the Soviet Union—with a group of professors from Berkeley and Harvard believing that the teaching of freshman college physics was in need of modernization, and that they should write a series of textbooks to enable this. (It was also the time of the “new math”, and a host of other STEM-related educational initiatives.) Fred Reif (who died at the age of 92 in 2019) was asked to write the statistical physics volume. As he explained in the preface to the book
ending with:
Well, it’s taken me 50 years to get to the point where I think I really understand the Second Law that is at the center of the book. And in 2001 I was able to tell Fred Reif that, yes, his book had indeed been useful. He said he was pleased to learn that, adding “It is all too rare that one’s educational efforts seem to bear some fruit.”
He explained to me that when he was writing the book he thought that “the basic ideas of irreversibility and fluctuations might be very vividly illustrated by the behavior of a gas of particles spreading through a box”. He added: “It then occurred to me that Berni Alder might actually show this by a computer generated film since he had worked on molecular dynamics simulations and had also good computer facilities available to him. I was able to enlist Berni’s interest in this project, with the results shown in my book.”
The acknowledgements in the book report:
Berni Alder and Fred Reif did indeed create a “film loop”, which “could be bought separately from the book and viewed in the physics lab”, as Alder told me, adding that “I understand the students liked it very much, but the venture was not a commercial success.” Still, he sent me a copy of a videotape version:
The film (which has no sound) begins:
Soon it’s showing an actual process of “coming to equilibrium”:
“However”, as Alder explained it to me, “if a large number of particles are put in the corner and the velocities of all the particles are reversed after a certain time, the audience laughs or is supposed to after all the particles return to their original positions.” (One suspects that particularly in the 1960s this might have been reminiscent of various cartoon-film gags.)
OK, so how were the pictures (and the film) made? It was done in 1964 at what’s now Lawrence Livermore Lab (that had been created in 1952 as a spin-off of the Berkeley Radiation Lab, which had initiated some key pieces for the Manhattan Project) on a computer called the LARC (“Livermore Advanced Research Computer”), first made in 1960, that was probably the most advanced scientific computer of the time. Alder explained to me, however: “We could not run the problem much longer than about 10 collision times with 64 bits [sic] arithmetic before the roundoff error prevented the particles from returning.”
Why did they start the particles off in a somewhat random configuration? (The randomness, Alder told me, had been created by a middle-square random number generator.) Apparently if they’d been in a regular array—which would have made the whole process of randomization much easier to see—the roundoff errors would have been too obvious. (And it’s issues like this that made it so hard to recognize the rule 30 phenomenon in systems based on real numbers—and without the idea of just studying simple programs not tied to traditional equation-based formulations of physics.)
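For reference, the middle-square method—which goes back to von Neumann—just squares the current value and takes the middle digits as the next one. A minimal Python sketch:

```python
def middle_square(seed, digits=4):
    # Square the current value, pad to 2*digits digits, and extract
    # the middle `digits` digits as the next value in the sequence.
    x = seed
    while True:
        sq = str(x * x).zfill(2 * digits)
        start = (len(sq) - digits) // 2
        x = int(sq[start:start + digits])
        yield x

gen = middle_square(1234)
# 1234 -> 5227 -> 3215 -> ...
```

(The sequences this produces look random for a while, but are notorious for falling into short cycles or reaching zero.)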
The actual code for the molecular dynamics simulation was written in assembler and run by Mary Ann Mansigh (Karlsen), who had a degree in math and chemistry and worked as a programmer at Livermore from 1955 until the 1980s, much of the time specifically with Alder. Here she is at the console of the LARC (yes, computers had built-in desks in those days):
The program that was used was called STEP, and the original version of it had actually been written (by a certain Norm Hardy, who ended up having a long Silicon Valley career) to run on a previous generation of computer. (A still-earlier program was called APE, for “Approach to Equilibrium”.) But it was only with the LARC—and STEP—that things were fast enough to run substantial simulations, at the rate of about 200,000 collisions per hour (the simulation for the book cover involved 40 particles and about 500 collisions). At the time of the book STEP used an n^{2} algorithm where all pairs of particles were tested for collisions; later a neighborhood-based linked list method was used.
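The difference between those two collision-search strategies can be sketched abstractly (in Python, not the original assembler; the function names are mine). The naive method tests all ~n²/2 pairs; the cell (“linked list”) method bins particles into cells the size of the interaction cutoff, so only particles in the same or adjacent cells need comparing:

```python
from itertools import combinations

def near_pairs_naive(positions, cutoff):
    # O(n^2): test every pair of particles for proximity.
    return {(i, j) for (i, p), (j, q) in combinations(enumerate(positions), 2)
            if abs(p[0] - q[0]) < cutoff and abs(p[1] - q[1]) < cutoff}

def near_pairs_cells(positions, cutoff):
    # Cell method: bin particles into square cells of side `cutoff`;
    # candidate partners can then only be in the same or adjacent cells.
    cells = {}
    for i, (x, y) in enumerate(positions):
        cells.setdefault((int(x // cutoff), int(y // cutoff)), []).append(i)
    found = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), []):
                    for i in members:
                        p, q = positions[i], positions[j]
                        if i < j and abs(p[0]-q[0]) < cutoff and abs(p[1]-q[1]) < cutoff:
                            found.add((i, j))
    return found
```

With n particles spread over many cells, the second version does work roughly proportional to n rather than n².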
The standard method of getting output from a computer back in 1964—and basically until the 1980s—was to print characters on paper. But the LARC could also drive an oscilloscope, and it was with this that the graphics for the book were created (capturing them from the oscilloscope screen with a Polaroid instant camera).
But why was Berni Alder studying molecular dynamics and “hard sphere gases” in the first place? Well, that’s another long story. But ultimately it was driven by the effort to develop a microscopic theory of liquids.
The notion that gases might consist of discrete molecules in motion had arisen in the 1700s (and even to some extent in antiquity), but it was only in the mid-1800s that serious development of the “kinetic theory” idea began. Pretty immediately it was clear how to derive the ideal gas law P V = R T for essentially non-interacting molecules. But what analog of this “equation of state” might apply to gases with significant interactions between molecules, or, for that matter, liquids? In 1873 Johannes Diderik van der Waals proposed, on essentially empirical grounds, the formula (P + a/V^{2})(V–b) = RT—where the parameter b represented “excluded volume” taken up by molecules, that were implicitly being viewed as hard spheres. But could such a formula be derived—like the ideal gas law—from a microscopic kinetic theory of molecules? At the time, nobody really knew how to start, and the problem languished for more than half a century.
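Solving the van der Waals formula for P makes the correction to the ideal gas law concrete. In the Python sketch below, the a and b values are standard textbook constants for CO2, used purely for illustration:

```python
def ideal_pressure(V, T, R=0.082057):
    # Ideal gas law P V = R T, per mole (V in L/mol, T in K, P in atm).
    return R * T / V

def vdw_pressure(V, T, a, b, R=0.082057):
    # van der Waals: (P + a/V^2)(V - b) = R T, solved for P.
    # b is the "excluded volume"; the a/V^2 term corrects for attraction.
    return R * T / (V - b) - a / V**2

# For CO2 (a ~ 3.59 L^2 atm/mol^2, b ~ 0.0427 L/mol) at V = 1 L/mol
# and T = 300 K, the attraction term wins out: P_vdw < P_ideal.
```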
(It’s worth pointing out, by the way, that the idea of modeling gases, as opposed to liquids, as collections of hard spheres was extensively pursued in the mid1800s, notably by Maxwell and Boltzmann—though with their traditional mathematical analysis methods, they were limited to studying average properties of what amount to dilute gases.)
Meanwhile, there was increasing interest in the microscopic structure of liquids, particularly among chemists concerned for example with how chemical solutions might work. And at the end of the 1920s the technique of x-ray diffraction, which had originally been used to study the microscopic structure of crystals, was applied to liquids—allowing in particular the experimental determination of the radial distribution function (or pair correlation function) g(r), which gives the probability to find another molecule a distance r from a given one.
But how might this radial distribution function be computed? By the mid1930s there were several proposals based on looking at the statistics of random assemblies of hard spheres:
Some tried to get results by mathematical methods; others did physical experiments with ball bearings and gelatin balls, getting at least rough agreement with actual experiments on liquids:
But then in 1939 a physical chemist named John Kirkwood gave an actual probabilistic derivation (using a variety of simplifying assumptions) that fairly closely reproduced the radial distribution function:
But what about just computing from first principles, on the basis of the mechanics of colliding molecules? Back in 1872 Ludwig Boltzmann had proposed a statistical equation (the “Boltzmann transport equation”) for the behavior of collections of molecules, which was based on the approximation of independent probabilities for individual molecules. By the 1940s the independence assumption had been overcome, but at the cost of introducing an infinite hierarchy of equations (the “BBGKY hierarchy”, where the “K” stood for Kirkwood). And although the full equations were intractable, approximations were suggested that—while themselves mathematically sophisticated—seemed as if they should, at least in principle, be applicable to liquids.
Meanwhile, in 1948, Berni Alder, fresh from a master’s degree in chemical engineering, and already interested in liquids, went to Caltech to work on a PhD with John Kirkwood—who suggested that he look at a couple of approximations to the BBGKY hierarchy for the case of hard spheres. This led to some nasty integro-differential equations which couldn’t be solved by analytical techniques. Caltech didn’t yet have a computer in the modern sense, but in 1949 they acquired an IBM 604 Electronic Calculating Punch, which could be wired to do calculations with input and output specified on punched cards—and it was on this machine that Alder got the calculations he needed done (the paper records that “[this] … was calculated … with the use of IBM equipment and the file of punched cards of sin(ut) employed in these laboratories for electron diffraction calculation”):
Our story now moves to Los Alamos, where in 1947 Stan Ulam had suggested the Monte Carlo method as a way to study neutron diffusion. In 1949 the method was implemented on the ENIAC computer. And in 1952 Los Alamos got its own MANIAC computer. Meanwhile, there was significant interest at Los Alamos in computing equations of state for matter, especially in extreme conditions such as those in a nuclear explosion. And by 1953 the idea had arisen of using the Monte Carlo method to do this.
The concept was to take a collection of hard spheres (or actually 2D disks), and move them randomly in a series of steps with the constraint that they could not overlap—then look at the statistics of the resulting “equilibrium” configurations. This was done on the MANIAC, with the resulting paper now giving “Monte Carlo results” for things like the radial distribution function:
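The essential move of that method is remarkably simple, and can be sketched in a few lines of modern code (a Metropolis-style toy of mine, not the MANIAC program; all parameters are illustrative): displace one disk at random, and accept the move only if no overlap results.

```python
# Minimal hard-disk Monte Carlo in a periodic square box.
import math, random

def run_hard_disk_mc(n_side=5, density=0.5, steps=2000, delta=0.1, seed=1):
    """Monte Carlo for n_side^2 hard disks of diameter 1."""
    rng = random.Random(seed)
    n = n_side * n_side
    box = math.sqrt(n / density)          # number density n/box^2 = density
    spacing = box / n_side
    disks = [((i + 0.5) * spacing, (j + 0.5) * spacing)
             for i in range(n_side) for j in range(n_side)]   # start on a lattice

    def overlap(p, q):
        dx = p[0] - q[0]; dx -= box * round(dx / box)   # minimum image
        dy = p[1] - q[1]; dy -= box * round(dy / box)
        return dx * dx + dy * dy < 1.0                  # closer than one diameter

    accepted = 0
    for _ in range(steps):
        k = rng.randrange(n)
        x, y = disks[k]
        trial = ((x + rng.uniform(-delta, delta)) % box,
                 (y + rng.uniform(-delta, delta)) % box)
        if all(not overlap(trial, disks[j]) for j in range(n) if j != k):
            disks[k] = trial
            accepted += 1
    return disks, accepted / steps, box

def min_pair_distance(disks, box):
    """Smallest minimum-image separation between any two disks."""
    best = float("inf")
    for i in range(len(disks)):
        for j in range(i + 1, len(disks)):
            dx = disks[i][0] - disks[j][0]; dx -= box * round(dx / box)
            dy = disks[i][1] - disks[j][1]; dy -= box * round(dy / box)
            best = min(best, math.hypot(dx, dy))
    return best

disks, acceptance, box = run_hard_disk_mc()
```

The statistics of the “equilibrium” configurations this produces—for instance the pair separations—are exactly what quantities like the radial distribution function are computed from.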
Kirkwood and Alder had been continuing their BBGKY hierarchy work, now using more realistic Lennard-Jones forces between molecules. But by 1954 Alder was also using the Monte Carlo method, implementing it partly (rather painfully) on the IBM Electronic Calculating Punch, and partly on the Manchester Mark II computer in the UK (whose documentation had been written by Alan Turing):
In 1955 Alder started working full-time at Livermore, recruited by Edward Teller. Another Livermore recruit—fresh from a physics PhD—was Thomas Wainwright. And soon Alder and Wainwright came up with an alternative to the Monte Carlo method—that would eventually give the book cover pictures: just explicitly compute the dynamics of colliding hard spheres, with the expectation that after enough collisions the system would come to equilibrium and allow things like equations of state to be obtained.
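The core of this event-driven approach, reduced to its smallest piece, is that for hard spheres the dynamics between collisions is trivial, and the time of the next collision can be found exactly: solving |Δr + Δv t| = σ (a quadratic in t) gives the moment two disks touch, and at contact an equal-mass elastic collision just exchanges the velocity components along the line of centers. A sketch of mine (not Alder and Wainwright’s code):

```python
# Exact two-disk collision time and elastic collision update.
import math

def collision_time(r1, v1, r2, v2, sigma=1.0):
    """Time until two disks of diameter sigma touch, or None if they never do."""
    dx = r2[0] - r1[0]; dy = r2[1] - r1[1]
    dvx = v2[0] - v1[0]; dvy = v2[1] - v1[1]
    b = dx * dvx + dy * dvy            # negative when the disks are approaching
    if b >= 0:
        return None
    a = dvx * dvx + dvy * dvy
    c = dx * dx + dy * dy - sigma * sigma
    disc = b * b - a * c
    if disc < 0:
        return None                    # the disks miss each other
    return (-b - math.sqrt(disc)) / a

def collide(r1, v1, r2, v2, sigma=1.0):
    """Equal-mass elastic collision at contact: swap normal velocity components."""
    nx = (r2[0] - r1[0]) / sigma; ny = (r2[1] - r1[1]) / sigma
    p = (v1[0] - v2[0]) * nx + (v1[1] - v2[1]) * ny
    return ((v1[0] - p * nx, v1[1] - p * ny),
            (v2[0] + p * nx, v2[1] + p * ny))

# Head-on example: disks starting 4 apart and closing at total speed 2
# touch when their separation reaches 1, i.e. after time 1.5.
t = collision_time((0, 0), (1, 0), (4, 0), (-1, 0))
```

A full simulation just repeats this: find the earliest collision among all pairs, advance every sphere freely to that moment, apply the collision update, and continue—which is why the method is exact up to numerical roundoff.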
In 1953 Livermore had obtained its first computer: a Remington Rand Univac I. And it was on this computer that Alder and Wainwright did a first proof of concept of their method, tracing 100 hard spheres with collisions computed at the rate of about 100 per hour. Then in 1955 Livermore got IBM 704 computers, which, with their hardware floating-point capabilities, were able to compute about 2000 collisions per hour.
Alder and Wainwright reported their first results at a statistical mechanics conference in Brussels in August 1956 (organized by Ilya Prigogine). The published version appeared in 1958:
It gives evidence—that they tagged as “provisional”—for the emergence of a Maxwell–Boltzmann velocity distribution “after the system reached equilibrium”
as well as things like the radial distribution function—and the equation of state:
It was notable that there seemed to be a discrepancy between the results for the equation of state computed by explicit molecular dynamics and by the Monte Carlo method. And what is more, there seemed to be evidence of some kind of discontinuous phase-transition-like behavior as the density of spheres changed (an effect which Kirkwood had predicted in 1949).
Given the small system sizes and short runtimes it was all a bit muddy. But by August 1957 Alder and Wainwright announced that they’d found a phase transition, presumably between a high-density phase where the spheres were packed together like in a crystalline solid, and a low-density phase, where they were able to more freely “wander around” like in a liquid or gas. Meanwhile, the group at Los Alamos had redone their Monte Carlo calculations, and they too now claimed a phase transition. Their papers were published back to back:
But at this point no actual pictures of molecular trajectories had yet been published, or, I believe, made. All that existed were traditional plots of aggregated quantities. And in 1958, such plots made their first appearance in a textbook. Tucked into Appendix C of Elementary Statistical Physics by Berkeley physics professor Charles Kittel (who would later be chairman of the group developing the Berkeley Physics Course book series) were two rather confusing plots about the approach to the Maxwell–Boltzmann distribution, taken from a prepublication version of Alder and Wainwright’s paper:
Alder and Wainwright’s phase transition result had created enough of a stir that they were asked to write a Scientific American article about it. And in that article—entitled “Molecular Motions”, from October 1959—there were finally pictures of actual trajectories, with their caption explaining that the “paths of particles … appear as bright lines on the face of a cathode-ray tube hooked to a computer” (the paths are of the centers of the colliding disks):
A technical article published at the same time gave a diagram of the logic for the dynamical computation:
Then in 1960 Livermore (after various delays) took delivery of the LARC computer—arguably the first scientific supercomputer—which allowed molecular dynamics computations to be done perhaps 20 times faster. A 1962 picture shows Berni Alder (left) and Thomas Wainwright (right) looking at outputs from the LARC with Mary Ann Mansigh (yes, in those days it was typical for male physicists to wear ties):
And in 1964, the pictures for the Statistical Physics book (and film loop) got made, with Mary Ann Mansigh painstakingly constructing images of disks on the oscilloscope display.
Work on molecular dynamics continued, though doing it required the most powerful computers, so for many years it was pretty much restricted to places like Livermore. And in 1967, Alder and Wainwright made another discovery about hard spheres. Even in their first paper about molecular dynamics they’d plotted the velocity autocorrelation function, and noted that it decayed roughly exponentially with time. But by 1967 they had much more precise data, and realized that there was a deviation from exponential decay: a definite “long-time tail”. And soon they had figured out that this power-law tail was basically the result of a continuum hydrodynamic effect (essentially a vortex) operating even on the scale of a few molecules. (And—though it didn’t occur to me at the time—this should have suggested that even with fairly small numbers of cells cellular automaton fluid simulations had a good chance of giving recognizable hydrodynamic results.)
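The quantity in question is C(t) = ⟨v(0)·v(t)⟩. Here is a sketch of just the estimator, applied to a synthetic exponentially correlated velocity signal (an AR(1) process, my own stand-in for molecular dynamics data)—the pure exponential decay everyone expected before the power-law ~t^(-d/2) tail was found in real hard-sphere results:

```python
# Estimate the normalized velocity autocorrelation function of a time series.
import math, random

def autocorrelation(v, max_lag):
    """Normalized autocorrelation C(t)/C(0) of a scalar time series."""
    n = len(v)
    c0 = sum(x * x for x in v) / n
    return [sum(v[i] * v[i + t] for i in range(n - t)) / ((n - t) * c0)
            for t in range(max_lag)]

random.seed(0)
phi = 0.9                      # fraction of correlation retained per step
v = [0.0]
for _ in range(20000):
    v.append(phi * v[-1] + random.gauss(0, 1))   # AR(1) "velocity" signal
c = autocorrelation(v, 30)
# c[0] is exactly 1 by construction; c[t] should track the exponential phi**t
```

Applied to actual molecular dynamics velocities, it was precisely the systematic excess of this estimate over an exponential fit at large t that revealed the tail.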
It’s never been entirely easy to do molecular dynamics, even with hard spheres, not least because in standard computations one’s inevitably confronted with things like numerical roundoff errors. And no doubt this is why some of the obvious foundational questions about the Second Law weren’t really explored there, and why intrinsic randomness generation and the rule 30 phenomenon weren’t identified.
Incidentally, even before molecular dynamics emerged, there was already one computer study of what could potentially have been Second Law behavior. Visiting Los Alamos in the early 1950s Enrico Fermi had gotten interested in using computers for physics, and wondered what would happen if one simulated the motion of an array of masses with nonlinear springs between them. The results of running this on the MANIAC computer were reported in 1955 (after Fermi had died)
and it was noted that there wasn’t just exponential approach to equilibrium, but instead something more complicated (later connected to solitons). Strangely, though, instead of plotting actual particle trajectories, what were given were mode energies—but these still exhibited what, if it hadn’t been obscured by continuum issues, might have been recognized as something like the rule 30 phenomenon:
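A miniature of that 1955 experiment is easy to set up today: a chain of unit masses joined by weakly nonlinear springs (the quadratic “alpha” model), started with all energy in the lowest normal mode. The sketch below (my own parameters, integrated with velocity Verlet) tracks the harmonic energy in mode 1—which, as Fermi and coworkers found, doesn’t decay smoothly toward equipartition but sloshes back and forth:

```python
# Fermi-Pasta-Ulam-Tsingou "alpha" chain, tracking the mode-1 energy.
import math

def fput_mode1_energy(n=16, alpha=0.25, dt=0.05, steps=10000):
    x = [math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n)]  # mode-1 shape
    p = [0.0] * n
    norm = math.sqrt(2 / (n + 1))
    w1 = 2 * math.sin(math.pi / (2 * (n + 1)))       # mode-1 frequency

    def accel(x):
        a = []
        for i in range(n):
            d1 = (x[i - 1] if i > 0 else 0.0) - x[i]       # pull from the left
            d2 = (x[i + 1] if i < n - 1 else 0.0) - x[i]   # pull from the right
            a.append(d1 + d2 + alpha * (d2 * d2 - d1 * d1))
        return a

    energies = []
    for _ in range(steps):
        a = accel(x)                                   # velocity Verlet step
        p = [p[i] + 0.5 * dt * a[i] for i in range(n)]
        x = [x[i] + dt * p[i] for i in range(n)]
        a = accel(x)
        p = [p[i] + 0.5 * dt * a[i] for i in range(n)]
        q1 = norm * sum(x[i] * math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n))
        q1dot = norm * sum(p[i] * math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n))
        energies.append(0.5 * (q1dot**2 + (w1 * q1)**2))
    return energies

mode1 = fput_mode1_energy()
```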
But I knew none of this history when I saw the Statistical Physics book cover in 1972. And indeed, for all I knew, it could have been a “standard statistical physics cover picture”. I didn’t know it was the first of its kind—and a leading-edge example of the use of computers for basic science, accessible only with the most powerful computers of the time. Of course, had I known those things, I probably wouldn’t have tried to reproduce the picture myself and I wouldn’t have had that early experience in trying to use a computer to do science. (Curiously enough, looking at the numbers now, I realize that the base speed of the LARC was only 20x the Elliott 903C, though with floating point, etc.—a factor that pales in comparison with the 500x speedup in computers in the 40 years since I started working on cellular automata.)
But now I know the history of that book cover, and where it came from. And what I only just discovered now is that actually there’s a bigger circle than I knew. Because the path from Berni Alder to that book cover to my work on cellular automaton fluids came full circle—when in 1988 Alder wrote a paper based on cellular automaton fluids (though through the vicissitudes of academic behavior I don’t think he knew these had been my idea—and now it’s too late to tell him his role in seeding them):
Notes & Thanks
There are many people who’ve contributed to the 50-year journey I’ve described here. Some I’ve already mentioned by name, but others not—including many who doubtless wouldn’t even be aware that they contributed. The longtime store clerk at Blackwell’s bookstore who in 1972 sold college physics books to a 12-year-old without batting an eye. (I learned his name—Keith Clack—30 years later when he organized a book signing for A New Kind of Science at Blackwell’s.) John Helliwell and Lawrence Wickens who in 1977 invited me to give the first talk where I explicitly discussed the foundations of the Second Law. Douglas Abraham who in 1977 taught a course on mathematical statistical mechanics that I attended. Paul Davies who wrote a book on The Physics of Time Asymmetry that I read around that time. Rocky Kolb who in 1979 and 1980 worked with me on cosmology that used statistical mechanics. The students (including professors like Steve Frautschi and David Politzer) who attended my 1981 class at Caltech about “nonequilibrium statistical mechanics”. David Pines and Elliott Lieb who in 1983 were responsible for publishing my breakout paper on “Statistical Mechanics of Cellular Automata”. Charles Bennett (curiously, a student of Berni Alder’s) with whom in the early 1980s I discussed applying computation theory (notably the ideas of Greg Chaitin) to physics. Brian Hayes who commissioned my 1984 Scientific American article, and Peter Brown who edited it. Danny Hillis and Sheryl Handler who in 1984 got me involved with Thinking Machines. Jim Salem and Bruce Nemnich (Walker) who worked on fluid dynamics on the Connection Machine with me. Then—36 years later—Jonathan Gorard and Max Piskunov, who catalyzed the doing of our Physics Project.
In the last 50 years, there’ve been surprisingly few people with whom I’ve directly discussed the foundations of the Second Law. Perhaps one reason is that back when I was a “professional physicist” statistical mechanics as a whole wasn’t a prominent area. But, more important, as I’ve described elsewhere, for more than a century most physicists have effectively assumed that the foundations of the Second Law are a solved (or at least merely pedantic) problem.
Probably the single person with whom I had the most discussions about the foundations of the Second Law is Richard Feynman. But there are others with whom at one time or another I’ve discussed related issues, including: Bruce Boghosian, Richard Crandall, Roger Dashen, Mitchell Feigenbaum, Nigel Goldenfeld, Theodore Gray, Bill Hayes, Joel Lebowitz, David Levermore, Ed Lorenz, John Maddox, Roger Penrose, Ilya Prigogine, Rudy Rucker, David Ruelle, Rob Shaw, Yakov Sinai, Michael Trott, Léon van Hove and Larry Yaffe. (There are also many others with whom I’ve discussed general issues about origins of randomness.)
Finally, one technical note about the presentation here: in an effort to maintain a clearer timeline, I’ve typically shown the earliest drafts or preprint versions of papers that I have. Their final published versions (if indeed they were ever published) appeared anything from weeks to years later, sometimes with changes.
As I’ll explain elsewhere, I think I now finally understand the Second Law of thermodynamics. But it’s a new understanding, and to get to it I’ve had to overcome a certain amount of conventional wisdom about the Second Law that I at least have long taken for granted. And to check myself I’ve been keen to know just where this conventional wisdom came from, how it’s been validated, and what might have made it go astray.
And from this I’ve been led into a rather detailed examination of the origins and history of thermodynamics. All in all, it’s a fascinating story that both explains what’s been believed about thermodynamics, and provides some powerful examples of the complicated dynamics of the development and acceptance of ideas.
The basic concept of the Second Law was first formulated in the 1850s, and rather rapidly took on something close to its modern form. It began partly as an empirical law, and partly as something abstractly constructed on the basis of the idea of molecules, which nobody at the time knew for sure existed. But by the end of the 1800s, with the existence of molecules increasingly firmly established, the Second Law began to often be treated as an almost-mathematically-proven necessary law of physics. There were still mathematical loose ends, as well as issues such as its application to living systems and to systems involving gravity. But the almost-universal conventional wisdom became that the Second Law must always hold, and if it didn’t seem to in a particular case, then that must just be because there was something one didn’t yet understand about that case.
There was also a sense that regardless of its foundations, the Second Law was successfully used in practice. And indeed particularly in chemistry and engineering it’s often been in the background, justifying all the computations routinely done using entropy. But despite its ubiquitous appearance in textbooks, when it comes to foundational questions, there’s always been a certain air of mystery around the Second Law. Though after 150 years there’s typically an assumption that “somehow it must all have been worked out”. I myself have been interested in the Second Law now for a little more than 50 years, and over that time I’ve had a growing awareness that actually, no, it hasn’t all been worked out. Which is why, now, it’s wonderful to see the computational paradigm—and ideas from our Physics Project—after all these years be able to provide solid foundations for understanding the Second Law, as well as seeing its limitations.
And from the vantage point of the understanding we now have, we can go back and realize that there were precursors of it even from long ago. In some ways it’s all an inspiring tale—of how there were scientists with ideas ahead of their time, blocked only by the lack of a conceptual framework that would take another century to develop. But in other ways it’s also a cautionary tale, of how the forces of “conventional wisdom” can blind people to unanswered questions and—over a surprisingly long time—inhibit the development of new ideas.
But, first and foremost, the story of the Second Law is the story of a great intellectual achievement of the mid-19th century. It’s exciting now, of course, to be able to use the latest 21st-century ideas to take another step. But to appreciate how this fits in with what’s already known we have to go back and study the history of what originally led to the Second Law, and how what emerged as conventional wisdom about it took shape.
Once it became clear what heat is, it actually didn’t take long for the Second Law to be formulated. But for centuries—and indeed until the mid-1800s—there was all sorts of confusion about the nature of heat.
That there’s a distinction between hot and cold is a matter of basic human perception. And seeing fire one might imagine it as a disembodied form of heat. In ancient Greek times Heraclitus (~500 BC) talked about everything somehow being “made of fire”, and also somehow being intrinsically “in motion”. Democritus (~460–~370 BC) and the Epicureans had the important idea (that also arose independently in other cultures) that everything might be made of large numbers of a few types of tiny discrete atoms. They imagined these atoms moving around in the “void” of space. And when it came to heat, they seem to have correctly associated it with the motion of atoms—though they imagined it came from particular spherical “fire” atoms that could slide more quickly between other atoms, and they also thought that souls were the ultimate sources of motion and heat (at least in warm-blooded animals?), and were made of fire atoms.
And for two thousand years that’s pretty much where things stood. And indeed in 1623 Galileo (1564–1642) (in his book The Assayer, about weighing competing world theories) was still saying:
Those materials which produce heat in us and make us feel warmth, which are known by the general name of “fire,” would then be a multitude of minute particles having certain shapes and moving with certain velocities. Meeting with our bodies, they penetrate by means of their extreme subtlety, and their touch as felt by us when they pass through our substance is the sensation we call “heat.”
He goes on:
Since the presence of fire-corpuscles alone does not suffice to excite heat, but their motion is needed also, it seems to me that one may very reasonably say that motion is the cause of heat… But I hold it to be silly to accept that proposition in the ordinary way, as if a stone or piece of iron or a stick must heat up when moved. The rubbing together and friction of two hard bodies, either by resolving their parts into very subtle flying particles or by opening an exit for the tiny fire-corpuscles within, ultimately sets these in motion; and when they meet our bodies and penetrate them, our conscious mind feels those pleasant or unpleasant sensations which we have named heat…
And although he can tell there’s something different about it, he thinks of heat as effectively being associated with a substance or material:
The tenuous material which produces heat is even more subtle than that which causes odor, for the latter cannot leak through a glass container, whereas the material of heat makes its way through any substance.
In 1620, Francis Bacon (1561–1626) (in his “update on Aristotle”, The New Organon) says, a little more abstractly, if obscurely—and without any reference to atoms or substances:
[It is not] that heat generates motion or that motion generates heat (though both are true in certain cases), but that heat itself, its essence and quiddity, is motion and nothing else.
But real progress in understanding the nature of heat had to wait for more understanding about the nature of gases, with air being the prime example. (It was actually only in the 1640s that any kind of general notion of gas began to emerge—with the word “gas” being invented by the “anti-Galen” physician Jan Baptista van Helmont (1580–1644), as a Dutch rendering of the Greek word “chaos”, that meant essentially “void”, or primordial formlessness.) Ever since antiquity there’d been Aristotle-style explanations like “nature abhors a vacuum” about what nature “wants to do”. But by the mid-1600s the idea was emerging that there could be more explicit and mechanical explanations for phenomena in the natural world.
And in 1660 Robert Boyle (1627–1691)—now thoroughly committed to the experimental approach to science—published New Experiments Physico-Mechanicall, Touching the Spring of the Air and its Effects in which he argued that air has an intrinsic pressure associated with it, which pushes it to fill spaces, and for which he effectively found Boyle’s Law PV = constant.
But what was air actually made of? Boyle had two basic hypotheses that he explained in rather flowery terms:
His first hypothesis was that air might be like a “fleece of wool” made of “aerial corpuscles” (gases were later often called “aeriform fluids”) with a “power or principle of self-dilatation” that resulted from there being “hairs” or “little springs” between these corpuscles. But he had a second hypothesis too—based, he said, on the ideas of “that most ingenious gentleman, Monsieur Descartes”: that instead air consists of “flexible particles” that are “so whirled around” that “each corpuscle endeavors to beat off all others”. In this second hypothesis, Boyle’s “spring of the air” was effectively the result of particles bouncing off each other.
And, as it happens, in 1668 there was quite an effort to understand the “laws of impact” (that would for example be applicable to balls in games like croquet and billiards, that had existed since at least the 1300s, and were becoming popular), with John Wallis (1616–1703), Christopher Wren (1632–1723) and Christiaan Huygens (1629–1695) all contributing, and Huygens producing diagrams like:
But while some understanding developed of what amount to impacts between pairs of hard spheres, there wasn’t the mathematical methodology—or probably the idea—to apply this to large collections of spheres.
Meanwhile, in his 1687 Principia Mathematica, Isaac Newton (1642–1727), wanting to analyze the properties of self-gravitating spheres of fluid, discussed the idea that fluids could in effect be made up of arrays of particles held apart by repulsive forces, as in Boyle’s first hypothesis. Newton had of course had great success with his 1/r^2 universal attractive force for gravity. But now he noted (writing originally in Latin) that with a 1/r repulsive force between particles in a fluid, he could essentially reproduce Boyle’s law:
Newton discussed questions like whether one particle would “shield” others from the force, but then concluded:
But whether elastic fluids do really consist of particles so repelling each other, is a physical question. We have here demonstrated mathematically the property of fluids consisting of particles of this kind, that hence philosophers may take occasion to discuss that question.
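In modern terms, Newton’s result amounts to a one-line scaling argument (my reconstruction, not his notation): if each particle repels only its nearest neighbors with force F ∝ 1/r, where r ∝ (V/N)^(1/3) is the interparticle spacing, then the pressure a wall feels is the force per particle times the number of particles per unit area of wall:

```latex
P \;\propto\; \underbrace{\frac{1}{r}}_{\text{force}}\times\underbrace{\frac{1}{r^{2}}}_{\text{particles per area}}
\;=\;\frac{1}{r^{3}}\;\propto\;\frac{N}{V}
\qquad\Longrightarrow\qquad PV = \text{const.}
```

The nearest-neighbors-only assumption is exactly the “shielding” question Newton himself raises—which is why he is careful to call the microscopic picture “a physical question”.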
Well, in fact, particularly given Newton’s authority, for well over a century people pretty much just assumed that this was how gases worked. There was one major exception, however, in 1738, when—as part of his eclectic mathematical career spanning probability theory, elasticity theory, biostatistics, economics and more—Daniel Bernoulli (1700–1782) published his book on hydrodynamics. Mostly he discusses incompressible fluids and their flow, but in one section he considers “elastic fluids”—and along with a whole variety of experimental results about atmospheric pressure in different places—draws the picture
and says
Let the space ECDF contain very small particles in rapid motion; as they strike against the piston EF and hold it up by their impact, they constitute an elastic fluid which expands as the weight P is removed or reduced; but if P is increased it becomes denser and presses on the horizontal case CD just as if it were endowed with no elastic property.
Then—in a direct and clear anticipation of the kinetic theory of heat—he goes on:
The pressure of the air is increased not only by reduction in volume but also by rise in temperature. As it is well known that heat is intensified as the internal motion of the particles increases, it follows that any increase in the pressure of air that has not changed its volume indicates more intense motion of its particles, which is in agreement with our hypothesis…
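Bernoulli’s picture corresponds precisely to the elementary kinetic-theory calculation done today (in modern notation, not his): a particle bouncing between walls a distance L apart delivers momentum 2mv_x per impact at a rate v_x/2L, and summing over N particles gives

```latex
P \;=\; \frac{N m \langle v_x^{2}\rangle}{V} \;=\; \frac{1}{3}\,\frac{N m \langle v^{2}\rangle}{V}
\qquad\Longrightarrow\qquad
PV \;=\; \tfrac{2}{3}\,N\left\langle \tfrac{1}{2} m v^{2}\right\rangle ,
```

so at fixed volume the pressure rises in direct proportion to the kinetic energy of the particles—exactly the temperature dependence Bernoulli describes.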
But at the time, and in fact for more than a century thereafter, this wasn’t followed up.
A large part of the reason seems to have been that people just assumed that heat ultimately had to have some kind of material existence; to think that it was merely a manifestation of microscopic motion was too abstract an idea. And then there was the observation of “radiant heat” (i.e. infrared radiation)—that seemed like it could only work by explicitly transferring some kind of “heat material” from one body to another.
But what was this “heat material”? It was thought of as a fluid—called caloric—that could suffuse matter, and for example flow from a hotter body to a colder. And in an echo of Democritus, it was often assumed that caloric consisted of particles that could slide between ordinary particles of matter. There was some thought that it might be related to the concept of phlogiston from the mid1600s, that was effectively a chemical substance, for example participating in chemical reactions or being generated in combustion (through the “principle of fire”). But the more mainstream view was that there were caloric particles that would collect around ordinary particles of matter (often called “molecules”, after the use of that term by Descartes (1596–1650) in 1620), generating a repulsive force that would for example expand gases—and that in various circumstances these caloric particles would move around, corresponding to the transfer of heat.
To us today it might seem hacky and implausible (perhaps a little like dark matter, cosmological inflation, etc.), but the caloric theory lasted for more than two hundred years and managed to explain plenty of phenomena—and indeed was certainly going strong in 1825 when Laplace wrote his A Treatise of Celestial Mechanics, which included a successful computation of properties of gases like the speed of sound and the ratio of specific heats, on the basis of a somewhat elaborated and mathematicized version of caloric theory (that by then included the concept of “caloric rays” associated with radiant heat).
But even though it wasn’t understood what heat ultimately was, one could still measure its attributes. Already in antiquity there were devices that made use of heat to produce pressure or mechanical motion. And by the beginning of the 1600s—catalyzed by Galileo’s development of the thermoscope (in which heated liquid could be seen to expand up a tube)—the idea quickly caught on of making thermometers, and of quantitatively measuring temperature.
And given a measurement of temperature, one could correlate it with effects one saw. So, for example, in the late 1700s the French balloonist Jacques Charles (1746–1823) noted the linear increase of volume of a gas with temperature. Meanwhile, at the beginning of the 1800s Joseph Fourier (1768–1830) (science advisor to Napoleon) developed what became his 1822 Analytical Theory of Heat, and in it he begins by noting that:
Heat, like gravity, penetrates every substance of the universe, its rays occupy all parts of space. The object of our work is to set forth the mathematical laws which this element obeys. The theory of heat will hereafter form one of the most important branches of general physics.
Later he describes what he calls the “Principle of the Communication of Heat”. He refers to “molecules”—though basically just to indicate a small amount of substance—and says
When two molecules of the same solid are extremely near and at unequal temperatures, the most heated molecule communicates to that which is less heated a quantity of heat exactly expressed by the product of the duration of the instant, of the extremely small difference of the temperatures, and of a certain function of the distance of the molecules.
then goes on to develop what’s now called the heat equation and all sorts of mathematics around it, all the while effectively adopting a caloric theory of heat. (And, yes, if you think of heat as a fluid it does lead you to describe its “motion” in terms of differential equations just like Fourier did. Though it’s then ironic that Bernoulli, even though he studied hydrodynamics, seemed to have a less “fluidbased” view of heat.)
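Fourier’s principle translates directly into a discrete scheme: let every cell exchange heat with its neighbors in proportion to the temperature difference, which is just an explicit finite-difference form of the heat equation ∂T/∂t = κ ∂²T/∂x². A minimal sketch of mine (with insulated ends, so the total quantity of heat is conserved):

```python
def diffuse(T, k=0.2, steps=500):
    """Explicit finite-difference heat flow on a 1D bar with insulated ends."""
    T = list(T)
    for _ in range(steps):
        T = [T[i]
             + k * ((T[i - 1] if i > 0 else T[i])             # insulated left end
                    - 2 * T[i]
                    + (T[i + 1] if i < len(T) - 1 else T[i]))  # insulated right end
             for i in range(len(T))]
    return T

# A hot spot in the middle of a cold bar spreads out and flattens,
# while the total heat stays constant -- whether one thinks of it as a
# conserved caloric fluid or as redistributed molecular motion.
T0 = [0.0] * 20 + [100.0] + [0.0] * 20
T = diffuse(T0)
```

Notably, the scheme doesn’t care what heat “is”: as Fourier found, a conserved-fluid picture and a molecular-motion picture lead to the same macroscopic equation.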
At the beginning of the 1800s the Industrial Revolution was in full swing—driven in no small part by the availability of increasingly efficient steam engines. There had been precursors of steam engines even in antiquity, but it was only in 1712 that the first practical steam engine was developed. And after James Watt (1736–1819) produced a much more efficient version in 1776, the adoption of steam engines began to take off.
Over the years that followed there were all sorts of engineering innovations that increased the efficiency of steam engines. But it wasn’t clear how far it could go—and whether for example there was a limit to how much mechanical work could ever, even in principle, be derived from a given amount of heat. And it was the investigation of this question—in the hands of a young French engineer named Sadi Carnot (1796–1832)—that began the development of an abstract basic science of thermodynamics, and led to the Second Law.
The story really begins with Sadi Carnot’s father, Lazare Carnot (1753–1823), who was trained as an engineer but ascended to the highest levels of French politics, and was involved with both the French Revolution and Napoleon. Particularly in years when he was out of political favor, Lazare Carnot worked on mathematics and mathematical engineering. His first significant work—in 1778—was entitled Memoir on the Theory of Machines. The mathematical and geometrical science of mechanics was by then fairly well developed; Lazare Carnot’s objective was to understand its consequences for actual engineering machines, and to somehow abstract general principles from the mechanical details of the operation of those machines. In 1803 (alongside works on the geometrical theory of fortifications) he published his Fundamental Principles of [Mechanical] Equilibrium and Movement, which argued for what was at one time called (in a strange foreshadowing of reversible thermodynamic processes) “Carnot’s Principle”: that useful work in a machine will be maximized if accelerations and shocks of moving parts are minimized—and that a machine with perpetual motion is impossible.
Sadi Carnot was born in 1796, and was largely educated by his father until he went to college in 1812. It’s notable that during the years when Sadi Carnot was a kid, one of his father’s activities was to give opinions on a whole range of inventions—including many steam engines and their generalizations. Lazare Carnot died in 1823. Sadi Carnot was by that point a welleducated but professionally undistinguished French military engineer. But in 1824, at the age of 28, he produced his one published work, Reflections on the Motive Power of Fire, and on Machines to Develop That Power (where by “fire” he meant what we would call heat):
The style and approach of the younger Carnot’s work is quite similar to his father’s. But the subject matter turned out to be more fruitful. The book begins:
Everyone knows that heat can produce motion. That it possesses vast motive-power none can doubt, in these days when the steam-engine is everywhere so well known… The study of these engines is of the greatest interest, their importance is enormous, their use is continually increasing, and they seem destined to produce a great revolution in the civilized world. Already the steam-engine works our mines, impels our ships, excavates our ports and our rivers, forges iron, fashions wood, grinds grain, spins and weaves our cloths, transports the heaviest burdens, etc. It appears that it must some day serve as a universal motor, and be substituted for animal power, waterfalls, and air currents. …
Notwithstanding the work of all kinds done by steam-engines, notwithstanding the satisfactory condition to which they have been brought today, their theory is very little understood, and the attempts to improve them are still directed almost by chance. …
The question has often been raised whether the motive power of heat is unbounded, whether the possible improvements in steam-engines have an assignable limit, a limit which the nature of things will not allow to be passed by any means whatever; or whether, on the contrary, these improvements may be carried on indefinitely. We propose now to submit these questions to a deliberate examination.
Carnot operated very much within the framework of caloric theory, and indeed his ideas were crucially based on the concept that one could think about “heat itself” (which for him was caloric fluid), independent of the material substance (like steam) that was hot. But—like his father’s efforts with mechanical machines—his goal was to develop an abstract “metamodel” of something like a steam engine, crucially assuming that the generation of unbounded heat or mechanical work (i.e. perpetual motion) in the closed cycle of the operation of the machine was impossible, and noting (again with a reflection of his father’s work) that the system would necessarily maximize efficiency if it operated reversibly. And he then argued that:
The production of motive power is then due in steam-engines not to an actual consumption of caloric, but to its transportation from a warm body to a cold body, that is, to its re-establishment of equilibrium…
In other words, what was important about a steam engine was that it was a “heat engine”, that “moved heat around”. His book is mostly words, with just a few formulas related to the behavior of ideal gases, and some tables of actual parameters for particular materials. But even though his underlying conceptual framework—of caloric theory—was not correct, the abstract arguments that he made (that involved essentially logical consequences of reversibility and of operating in a closed cycle) were robust enough that it didn’t matter, and in particular he was able to successfully show that there was a theoretical maximum efficiency for a heat engine, that depended only on the temperatures of its hot and cold reservoirs of heat. But what’s important for our purposes here is that in the setup Carnot constructed he basically ended up introducing the Second Law.
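Stated in modern notation—an anachronism, since Carnot had no absolute temperature scale, and certainly never wrote it this way—the maximum-efficiency result he established is usually summarized as:

```latex
% Maximum efficiency of a reversible heat engine operating between
% a hot reservoir at absolute temperature T_h and a cold one at T_c
% (modern form of Carnot's result):
\eta_{\max} \;=\; \frac{W}{Q_h} \;=\; 1 - \frac{T_c}{T_h}
```

The key structural point is the one Carnot argued abstractly: the bound depends only on the two reservoir temperatures, not on the working substance.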
At the time it appeared, however, Carnot’s book was basically ignored, and Carnot died in obscurity from cholera in 1832 (about 9 months after Évariste Galois (1811–1832)) at the age of 36. (The Sadi Carnot who would later become president of France was his nephew.) But in 1834, Émile Clapeyron (1799–1864)—a rather distinguished French engineering professor (and steam engine designer)—wrote a paper entitled “Memoir on the Motive Power of Heat”. He starts off by saying about Carnot’s book:
The idea which serves as a basis of his researches seems to me to be both fertile and beyond question; his demonstrations are founded on the absurdity of the possibility of creating motive power or heat out of nothing. …
This new method of demonstration seems to me worthy of the attention of theoreticians; it seems to me to be free of all objection …
I believe that it is of some interest to take up this theory again; S. Carnot, avoiding the use of mathematical analysis, arrives by a chain of difficult and elusive arguments at results which can be deduced easily from a more general law which I shall attempt to prove…
Clapeyron’s paper doesn’t live up to the claims of originality or rigor expressed here, but it served as a more accessible (both in terms of where it was published and how it was written) exposition of Carnot’s work, featuring, for example, for the first time a diagrammatic representation of a Carnot cycle
as well as notations like Q for heat that are still in use today:
One of the implications of Newton’s Laws of Motion is that momentum is conserved. But what else might also be conserved? In the 1680s Gottfried Leibniz (1646–1716) suggested the quantity m v^{2}, which he called, rather grandly, vis viva—or, in English, “living force”. And yes, in things like elastic collisions, this quantity did seem to be conserved. But in plenty of situations it wasn’t. By 1807 the term “energy” had been introduced, but the question remained of whether it could in any sense globally be thought of as conserved.
It had seemed for a long time that heat was something a bit like mechanical energy, but the relation wasn’t clear—and the caloric theory of heat implied that caloric (i.e. the fluid corresponding to heat) was conserved, and so certainly wasn’t something that for example could be interconverted with mechanical energy. But in 1798 Benjamin Thompson (Count Rumford) (1753–1814) measured the heat produced by the mechanical process of boring a cannon, and began to make the argument that, in contradiction to the caloric theory, there was actually some kind of correspondence between mechanical energy and amount of heat.
It wasn’t a very accurate experiment, and it took until the 1840s—with new experiments by the English brewer and “amateur” scientist James Joule (1818–1889) and the German physician Robert Mayer (1814–1878)—before the idea of some kind of equivalence between heat and mechanical work began to look more plausible. And in 1847 this was something William Thomson (1824–1907) (later Lord Kelvin)—a prolific young physicist recently graduated from the Mathematical Tripos in Cambridge and now installed as a professor of “natural philosophy” (i.e. physics) in Glasgow—began to be curious about.
But first we have to go back a bit in the story. In 1845 Kelvin (as we’ll call him) had spent some time in Paris (primarily at a lab that was measuring properties of steam for the French government), and there he’d learned about Carnot’s work from Clapeyron’s paper (at first he couldn’t get a copy of Carnot’s actual book). Meanwhile, one of the issues of the time was a proliferation of different temperature scales, based on different kinds of thermometers using different substances. And in 1848 Kelvin realized that Carnot’s concept of a “pure heat engine”—assumed at the time to be based on caloric—could be used to define an “absolute” scale of temperature in which, for example, at absolute zero all caloric would have been removed from all substances:
Having found Carnot’s ideas useful, Kelvin in 1849 wrote a 33-page summary of them (small world that it was then, the immediately preceding paper in the journal is “On the Theory of Rolling Curves”, written by the then-17-year-old James Clerk Maxwell (1831–1879), while the one that follows is “Theoretical Considerations on the Effect of Pressure in Lowering the Freezing Point of Water” by James Thomson (1822–1892), engineering-oriented older brother of William):
He characterizes Carnot’s work as being based not so much on physics and experiment, but on the “strictest principles of philosophy”:
He doesn’t immediately mention “caloric” (though it does slip in later), referring instead to a vaguer concept of “thermal agency”:
In keeping with the idea that this is more philosophy than experimental science, he refers to “Carnot’s fundamental principle”—that after a complete cycle an engine can be treated as back in the “same state”—while adding the footnote that “this is tacitly assumed as an axiom”:
In actuality, to say that an engine comes back to the same state is a nontrivial statement of the existence of some kind of unique equilibrium in the system, related to the Second Law. But in 1848 Kelvin brushes this off by saying that the “axiom” has “never, so far as I am aware, been questioned by practical engineers”.
His next page is notable for the first-ever use of the term “thermodynamic” (then hyphenated) to discuss systems where what matters is “the dynamics of heat”:
That same page has a curious footnote presaging what will come, and making the statement that “no energy can be destroyed”, and considering it “perplexing” that this seems incompatible with Carnot’s work and its caloric theory framework:
After going through Carnot’s basic arguments, the paper ends with an appendix in which Kelvin basically says that even though the theory seems to just be based on a formal axiom, it should be experimentally tested:
He proceeds to give some tests, which he claims agree with Carnot’s results—and finally ends with a very practical (but probably not correct) table of theoretical efficiencies for steam engines of his day:
But now what of Joule’s and Mayer’s experiments, and their apparent disagreement with the caloric theory of heat? By 1849 a new idea had emerged: that perhaps heat was itself a form of energy, and that, when heat was accounted for, the total energy of a system would always be conserved. And what this suggested was that heat was somehow a dynamical phenomenon, associated with microscopic motion—which in turn suggested that gases might indeed consist just of molecules in motion.
And so it was that in 1850 Kelvin (then still “William Thomson”) wrote a long exposition “On the Dynamical Theory of Heat”, attempting to reconcile Carnot’s ideas with the new concept that heat was dynamical in origin:
He begins by quoting—presumably for some kind of “British-based authority”—an “anti-caloric” experiment apparently done by Humphry Davy (1778–1829) as a teenager, involving melting pieces of ice by rubbing them together, and included anonymously in a 1799 list of pieces of knowledge “principally from the west of England”:
But soon Kelvin is getting to the main point:
And then we have it: a statement of the Second Law (albeit with some hedging to which we’ll come back later):
And there’s immediately a footnote that basically asserts the “absurdity” of a Second-Law-violating perpetual motion machine:
But by the next page we find out that Kelvin admits he’s in some sense been “scooped”—by a certain Rudolf Clausius (1822–1888), who we’ll be discussing soon. But what’s remarkable is that Clausius’s “axiom” turns out to be exactly equivalent to Kelvin’s statement:
And what this suggests is that the underlying concept—the Second Law—is something quite robust. And indeed, as Kelvin implies, it’s the main thing that ultimately underlies Carnot’s results. And so even though Carnot is operating on the now-outmoded idea of caloric theory, his main results are still correct, because in the end all they really depend on is a certain amount of “logical structure”, together with the Second Law (and a version of the First Law, but that’s a slightly trickier story).
Kelvin recognized, though, that Carnot had chosen to look at the particular (“equilibrium thermodynamics”) case of processes that occur reversibly, effectively at an infinitesimal rate. And at the end of the first installment of his exposition, he explains that things will be more complicated if finite rates are considered—and that in particular the results one gets in such cases will depend on things like having a correct model for the nature of heat.
Kelvin’s exposition on the “dynamical nature of heat” runs to four installments, and the next two dive into detailed derivations and attempted comparison with experiment:
But before Kelvin gets to publish part four of his exposition he publishes two other pieces. In the first, he’s talking about sources of energy for human use (now that he believes energy is conserved):
He emphasizes that the Sun is—directly or indirectly—the main source of energy on Earth (later he’ll argue that coal will run out, etc.):
But he wonders how animals actually manage to produce mechanical work, noting that “the animal body does not act as a thermodynamic engine; and [it is] very probable that the chemical forces produce the external mechanical effects through electrical means”:
And then, by April 1852, he’s back to thinking directly about the Second Law, and he’s cut through the technicalities, and is stating the Second Law in everyday (if slightly ponderous) terms:
It’s interesting to see his apparently rather deeply held Presbyterian beliefs manifest themselves here in his mention that “Creative Power” is what must set the total energy of the universe. He ends his piece with:
In (2) the hedging is interesting. He makes the definitive assertion that what amounts to a violation of the Second Law “is impossible in inanimate material processes”. And he’s pretty sure the same is true for “vegetable life” (recognizing that in his previous paper he discussed the harvesting of sunlight by plants). But what about “animal life”, like us humans? Here he says that “by our will” we can’t violate the Second Law—so we can’t, for example, build a machine to do it. But he leaves it open whether we as humans might have some innate (“God-given”?) ability to overcome the Second Law.
And then there’s his (3). It’s worth realizing that his whole paper is less than 3 pages long, and right before his conclusions we’re seeing triple integrals:
So what is (3) about? It’s presumably something like a Second-Law-implies-heat-death-of-the-universe statement (but what’s this stuff about the past?)—but with an added twist that there’s something (God?) beyond the “known operations going on at present in the material world” that might be able to swoop in to save the world for us humans.
It doesn’t take people long to pick up on the “cosmic significance” of all this. But in the fall of 1852, Kelvin’s colleague, the Glasgow engineering professor William Rankine (1820–1872) (who was deeply involved with the First Law of thermodynamics), is writing about a way the universe might save itself:
After touting the increasingly solid evidence for energy conservation and the First Law
he goes on to talk about dissipation of energy and what we now call the Second Law
and the fact that it implies an “end of all physical phenomena”, i.e. heat death of the universe. He continues:
But now he offers a “ray of hope”. He believes that there must exist a “medium capable of transmitting light and heat”, i.e. an aether, “[between] the heavenly bodies”. And if this aether can’t itself acquire heat, he concludes that all energy must be converted into a radiant form:
Now he supposes that the universe is effectively a giant drop of aether, with nothing outside, so that all this radiant energy will get totally internally reflected from its surface, allowing the universe to “[reconcentrate] its physical energies, and [renew] its activity and life”—and save it from heat death:
He ends with the speculation that perhaps “some of the luminous objects which we see in distant regions of space may be, not stars, but foci in the interstellar aether”.
But independent of cosmic speculations, Kelvin himself continues to study the “dynamical theory of gases”. It’s often a bit unclear what’s being assumed. There’s the First Law (energy conservation). And the Second Law. But there’s also reversibility. Equilibrium. And the ideal gas law (P V = R T). But it soon becomes clear that that’s not always correct for real gases—as the Joule–Thomson effect demonstrates:
Kelvin soon returned to more cosmic speculations, suggesting that perhaps gravitation—rather than direct “Creative Power”—might “in reality [be] the ultimate created antecedent of all motion…”:
Not long after these papers Kelvin got involved with the practical “electrical” problem of laying a transatlantic telegraph cable, and in 1858 was on the ship that first succeeded in doing this. (His commercial efforts soon allowed him to buy a 126-ton yacht.) But he continued to write physics papers, which ranged over many different areas, occasionally touching thermodynamics, though most often in the service of answering a “general science” question—like how old the Sun is (he estimated 32,000 years from thermodynamic arguments, though of course without knowledge of nuclear reactions).
Kelvin’s ideas about the inevitable dissipation of “useful energy” spread quickly—by 1854, for example, finding their way into an eloquent public lecture by Hermann von Helmholtz (1821–1894). Helmholtz had trained as a doctor, becoming in 1843 a surgeon to a German military regiment. But he was also doing experiments and developing theories about “animal heat” and how muscles manage to “do mechanical work”, for example publishing an 1845 paper entitled “On Metabolism during Muscular Activity”. And in 1847 he was one of the inventors of the law of conservation of energy—and the First Law of thermodynamics—as well as perhaps its clearest expositor at the time (the word “force” in the title is what we now call “energy”):
By 1854 Helmholtz was a physiology professor, beginning a distinguished career in physics, psychophysics and physiology—and talking about the Second Law and its implications. He began his lecture by saying that “A new conquest of very general interest has been recently made by natural philosophy”—and what he’s referring to here is the Second Law:
Having discussed the inability of “automata” (he uses that word) to reproduce living systems, he starts talking about perpetual motion machines:
First he disposes of the idea that perpetual motion can be achieved by generating energy from nothing (i.e. violating the First Law), charmingly including the anecdote:
And then he’s on to talking about the Second Law
and discussing how it implies the heat death of the universe:
He notes, correctly, that the Second Law hasn’t been “proved”. But he’s impressed at how Kelvin was able to go from a “mathematical formula” to a global fact about the fate of the universe:
He ends the whole lecture quite poetically:
We’ve talked quite a bit about Kelvin and how his ideas spread. But let’s turn now to Rudolf Clausius, who in 1850 at least to some extent “scooped” Kelvin on the Second Law. At that time Clausius was a freshly minted German physics PhD. His thesis had been on an ingenious but ultimately incorrect theory of why the sky is blue. But he’d also worked on elasticity theory, and there he’d been led to start thinking about molecules and their configurations in materials. By 1850 caloric theory had become fairly elaborate, complete with concepts like “latent heat” (bound to molecules) and “free heat” (able to be transferred). Clausius’s experience in elasticity theory made him skeptical, and knowing Mayer’s and Joule’s results he decided to break with the caloric theory—writing his career-launching paper (translated from German in 1851, with Carnot’s puissance motrice [“motive power”] being rendered as “moving force”):
The first installment of the English version of the paper gives a clear description of the ideal gas laws and the Carnot cycle, having started from a statement of the “caloric-busting” First Law:
The general discussion continues in the second installment, but now there’s a critical side comment that describes the “general deportment of heat, which everywhere exhibits the tendency to annul differences of temperature, and therefore to pass from a warmer body to a colder one”:
Clausius “has” the Second Law, as Carnot basically did before him. But when Kelvin quotes Clausius he does so much more forcefully:
But there it is: by 1852 the Second Law is out in the open, in at least two different forms. The path to reach it has been circuitous and quite technical. But in the end, stripped of its technical origins, the law seems somehow unsurprising and even obvious. For it’s a matter of common experience that heat flows from hotter bodies to colder ones, and that motion is dissipated by friction into heat. But the point is that it wasn’t until basically 1850 that the overall scientific framework existed to make it useful—or even really possible—to enunciate such observations as a formal scientific law.
Of course the fact that a law “seems true” based on common experience doesn’t mean it’ll always be true, and that there won’t be some special circumstance or elaborate construction that will evade it. But somehow the very fact that the Second Law had in a sense been “technically hard won”—yet in the end seemed so “obvious”—appears to have given it a sense of inevitability and certainty. And it didn’t hurt that somehow it seemed to have emerged from Carnot’s work, which had a certain air of “logical necessity”. (Of course, in reality, the Second Law entered Carnot’s logical structure as an “axiom”.) But all this helped set the stage for some of the curious confusions about the Second Law that would develop over the century that followed.
In the first half of the 1850s the Second Law had in a sense been presented in two ways. First, as an almost “footnote-style” assumption needed to support the “pure thermodynamics” that had grown out of Carnot’s work. And second, as an explicitly-stated-for-the-first-time—if “obvious”—“everyday” feature of nature, that was now realized as having potentially cosmic significance. But an important feature of the decade that followed was a certain progressive at-least-phenomenological “mathematicization” of the Second Law—pursued most notably by Rudolf Clausius.
In 1854 Clausius was already beginning this process. Perhaps confusingly, he refers to the Second Law as the “second fundamental theorem [Hauptsatz]” in the “mechanical theory of heat”—suggesting it’s something that is proved, even though it’s really introduced just as an empirical law of nature, or perhaps a theoretical axiom:
He starts off by discussing the “first fundamental theorem”, i.e. the First Law. And he emphasizes that this implies that there’s a quantity U (which we now call “internal energy”) that is a pure “function of state”—so that its value depends only on the state of a system, and not the path by which that state was reached. And as an “application” of this, he then points out that the overall change in U in a cyclic process (like the one executed by Carnot’s heat engine) must be zero.
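In modern notation (not Clausius’s 1854 notation), the function-of-state property amounts to saying that dU is an exact differential:

```latex
% First Law: heat absorbed minus work done changes the internal energy
dU = \delta Q - \delta W ,
% and because U is a function of state, its total change around any
% closed cycle (like the one executed by Carnot's engine) vanishes:
\oint dU = 0 .
```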
And now he’s ready to tackle the Second Law. He gives a statement that at first seems somewhat convoluted:
But soon he’s deriving this from a more “everyday” statement of the Second Law (which, notably, is clearly not a “theorem” in any normal sense):
After giving a Carnot-style argument he’s then got a new statement (that he calls “the theorem of the equivalence of transformations”) of the Second Law:
And there it is: basically what we now call entropy (even with the same notation of Q for heat and T for temperature)—together with the statement that this quantity is a function of state, so that its differences are “independent of the nature of the process by which the transformation is effected”.
Pretty soon there’s a familiar expression for entropy change:
And by the next page he’s giving what he describes as “the analytical expression” of the Second Law, for the particular case of reversible cyclic processes:
A bit later he backs out of the assumption of reversibility, concluding that:
(And, yes, with modern mathematical rigor, that should be “non-negative” rather than “positive”.)
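In modern notation—an anachronistic summary rather than Clausius’s own symbols—the reversible-cycle statement and its irreversible generalization read:

```latex
% Reversible cyclic process (Clausius's "analytical expression"):
\oint \frac{\delta Q}{T} = 0
% General cyclic process, with \delta Q the heat absorbed by the system;
% what Clausius calls the "uncompensated transformation" N is then
% non-negative, vanishing only in the reversible case:
N \;=\; -\oint \frac{\delta Q}{T} \;\ge\; 0 .
```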
He goes on to say that if something has changed after going around a cycle, he’ll call that an “uncompensated transformation”—or what we would now refer to as an irreversible change. He lists a few possible (now very familiar) examples:
Earlier in his paper he’s careful to say that T is “a function of temperature”; he doesn’t say it’s actually the quantity we measure as temperature. But now he wants to determine what it is:
He doesn’t talk about the ultimately critical assumption (effectively the Zeroth Law of thermodynamics) that the system is “in equilibrium”, with a uniform temperature. But he uses an ideal gas as a kind of “standard material”, and determines that, yes, in that case T can be simply the absolute temperature.
So there it is: in 1854 Clausius has effectively defined entropy and described its relation to the Second Law, though everything is being done in a very “heat-engine” style. And pretty soon he’s writing about “Theory of the Steam-Engine” and filling actual approximate steam tables into his theoretical formulas:
After a few years “off” (working, as we’ll discuss later, on the kinetic theory of gases) Clausius is back in 1862 talking about the Second Law again, in terms of his “theorem of the equivalence of transformations”:
He’s slightly tightened up his 1854 discussion, but, more importantly, he’s now stating a result not just for reversible cyclic processes, but for general ones:
But what does this result really mean? Clausius claims that this “theorem admits of strict mathematical proof if we start from the fundamental proposition above quoted”—though it’s not particularly clear just what that proposition is. But then he says he wants to find a “physical cause”:
A little earlier in the paper he said:
So what does he think the “physical cause” is? He says that even from his first investigations he’d assumed a general law:
What are these “resistances”? He’s basically saying they are the forces between molecules in a material (which from his work on the kinetic theory of gases he now imagines exist):
He introduces what he calls the “disgregation” to represent the microscopic effect of adding heat:
For ideal gases things are straightforward, including the proportionality of “resistance” to absolute temperature. But in other cases, it’s not so clear what’s going on. A decade later he identifies “disgregation” with average kinetic energy per molecule—which is indeed proportional to absolute temperature. But in 1862 it’s all still quite muddy, with somewhat curious statements like:
And then the main part of the paper ends with what seems to be an anticipation of the Third Law of thermodynamics:
There’s an appendix entitled “On Terminology” which admits that between Clausius’s own work, and other people’s, it’s become rather difficult to follow what’s going on. He agrees that the term “energy” that Kelvin is using makes sense. He suggests “energy of the body” for what he calls U and we now call “internal energy”. He suggests “heat of the body” or “thermal content of the body” for Q. But then he talks about the fact that these are measured in thermal units (say the amount of heat needed to increase the temperature of water by 1°), while mechanical work is measured in units related to kilograms and meters. He proposes therefore to introduce the concept of “ergon” for “work measured in thermal units”:
And pretty soon he’s talking about the “interior ergon” and “exterior ergon”, as well as concepts like “ergonized heat”. (In later work he also tries to introduce the concept of “ergal” to go along with his development of what he called—in a name that did stick—the “virial theorem”.)
But in 1865 he has his biggest success in introducing a term. He’s writing a paper, he says, basically to clarify the Second Law (or, as he calls it, “the second fundamental theorem”)—rather confidently asserting that he will “prove this theorem”:
Part of the issue he’s trying to address is how the calculus is done:
The partial derivative symbol ∂ had been introduced in the late 1700s. He doesn’t use it, but he does introduce the now-standard-in-thermodynamics subscript notation for variables that are kept constant:
A little later, as part of the “notational cleanup”, we see the variable S:
And then—there it is—Clausius introduces the term “entropy”, “Greekifying” his concept of “transformation”:
His paper ends with his famous crisp statements of the First and Second Laws of thermodynamics—manifesting the parallelism he’s been claiming between energy and entropy:
We began above by discussing the history of the question of “What is heat?” Was it like a fluid—the caloric theory? Or was it something more dynamical, and in a sense more abstract? But then we saw how Carnot—followed by Kelvin and Clausius—managed in effect to sidestep the question, and come up with all sorts of “thermodynamic conclusions”, by talking just about “what heat does” without ever really having to seriously address the question of “what heat is”. But to be able to discuss the foundations of the Second Law—and what it says about heat—we have to know more about what heat actually is. And the crucial development that began to clarify the nature of heat was the kinetic theory of gases.
Central to the kinetic theory of gases is the idea that gases are made up of discrete molecules. And it’s important to remember that it wasn’t until the beginning of the 1900s that anyone knew for sure that molecules existed. Yes, something like them had been discussed ever since antiquity, and in the 1800s there was increasing “circumstantial evidence” for them. But nobody had directly “seen a molecule”, or been able—until about 1870—even to guess what the size of molecules might be. Still, by the mid-1800s it had become common for physicists to talk and reason in terms of ordinary matter at least effectively being made up of molecules.
But if a gas was made of molecules bouncing off each other like billiard balls according to the laws of mechanics, what would its overall properties be? Daniel Bernoulli had in 1738 already worked out the basic answer that pressure would vary inversely with volume, or in his notation, π = P/s (and he even also gave formulas for molecules of nonzero size—in a precursor of van der Waals):
Results like Bernoulli’s would be rediscovered several times, for example in 1820 by John Herapath (1790–1868), a math teacher in England, who developed a fairly elaborate theory that purported to describe gravity as well as heat (but for example implied a P V = a T^{2} gas law):
Then there was the case of John Waterston (1811–1883), a naval instructor for the East India Company, who in 1843 published a book called Thoughts on the Mental Functions, which included results on what he called the “vis viva theory of heat”—that he developed in more detail in a paper he wrote in 1846. But when he submitted the paper to the Royal Society it was rejected as “nonsense”, and its manuscript was “lost” until 1891 when it was finally published (with an “explanation” of the “delay”):
The paper had included a perfectly sensible mathematical analysis that included a derivation of the kinetic theory relation between pressure and mean-square molecular velocity:
But with all these pieces of work unknown, it fell to a German high-school chemistry teacher (and sometime professor and philosophical/theological writer) named August Krönig (1822–1879) to publish in 1856 yet another “rediscovery”, that he entitled “Principles of a Theory of Gases”. He said it was going to analyze the “mechanical theory of heat”, and once again he wanted to compute the pressure associated with colliding molecules. But to simplify the math, he assumed that molecules went only along the coordinate directions, at a fixed speed—almost anticipating a cellular automaton fluid:
What ultimately launched the subsequent development of the kinetic theory of gases, however, was the 1857 publication by Rudolf Clausius (by then an increasingly established German physics professor) of a paper entitled rather poetically “On the Nature of the Motion Which We Call Heat” (“Über die Art der Bewegung die wir Wärme nennen”):
It’s a clean and clear paper, with none of the mathematical muddiness around Clausius’s work on the Second Law (which, by the way, isn’t even mentioned in this paper even though Clausius had recently worked on it). Clausius figures out lots of the “obvious” implications of his molecular theory, outlining for example what happens in different phases of matter:
It takes him only a couple of pages of very light mathematics to derive the standard kinetic theory formula for the pressure of an ideal gas:
He’s implicitly assuming a certain randomness to the motions of the molecules, but he barely mentions it (and this particular formula is robust enough that average values are actually all that matter):
But having derived the formula for pressure, he goes on to use the ideal gas law to derive the relation between average molecular kinetic energy (which he still calls “vis viva”) and absolute temperature:
From this he can do things like work out the actual average velocities of molecules in different gases—which he does without any mention of the question of just how real or not molecules might be. By knowing experimental results about specific heats of gases he also manages to determine that not all the energy (“heat”) in a gas is associated with “translatory motion”: he realizes that for molecules involving several atoms there can be energy associated with other (as we would now say) internal degrees of freedom:
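Clausius’s relation between average molecular kinetic energy and temperature makes those average speeds a short computation. Here is a minimal sketch in modern terms (Clausius of course worked from gas densities and molecular weights; the Boltzmann constant used here is a later convenience):

```python
from math import sqrt

K_B = 1.380649e-23           # Boltzmann constant, J/K
AMU = 1.66053906660e-27      # atomic mass unit, kg

def v_rms(mass_amu, temperature):
    """Root-mean-square molecular speed from (3/2) k T = (1/2) m <v^2>."""
    return sqrt(3 * K_B * temperature / (mass_amu * AMU))

# Speeds at 0 degrees C, the kind of case Clausius considered
for gas, mass in [("oxygen (O2)", 32.0), ("nitrogen (N2)", 28.0), ("hydrogen (H2)", 2.016)]:
    print(f"{gas}: {v_rms(mass, 273.15):.0f} m/s")   # ~461, ~493, ~1838
```

The values that come out—hundreds of meters per second for air molecules—are essentially the ones Clausius reported, and they are exactly what prompted Buys Ballot’s objection below.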
Clausius’s paper was widely read. And it didn’t take long before the Dutch meteorologist (and effectively founder of the World Meteorological Organization) Christophorus Buys Ballot (1817–1890) asked why—if molecules were moving as quickly as Clausius suggested—gases didn’t mix much more quickly than they’re observed to do:
Within a few months, Clausius published the answer: the molecules didn’t just keep moving in straight lines; they were constantly being deflected, to follow what we would now call a random walk. He invented the concept of a mean free path to describe how far on average a molecule goes before it hits another molecule:
As a capable theoretical physicist, Clausius quickly brings in the concept of probability
and is soon computing the average number of molecules which will survive undeflected for a certain distance:
Then he works out the mean free path λ (and it’s often still called λ):
And he concludes that actually there’s no conflict between rapid microscopic motion and largescale “diffusive” motion:
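The mean free path estimate itself is equally compact. A sketch, with an assumed molecular diameter (and note that the √2 relative-speed factor in this standard formula is Maxwell’s later refinement of Clausius’s original estimate):

```python
from math import sqrt, pi

K_B = 1.380649e-23  # Boltzmann constant, J/K

def mean_free_path(diameter, pressure, temperature):
    """Mean free path lambda = 1/(sqrt(2) pi d^2 n), with number
    density n = P/(k T) taken from the ideal gas law."""
    n = pressure / (K_B * temperature)
    return 1.0 / (sqrt(2) * pi * diameter**2 * n)

# Rough figure for air at 0 degrees C and 1 atm, taking d ~ 0.37 nm (an assumed value)
lam = mean_free_path(3.7e-10, 101325, 273.15)
print(f"mean free path ~ {lam*1e9:.0f} nm")
```

Tens of nanometers—thousands of times smaller than anything visible—which is why fast molecules nonetheless produce slow diffusion.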
Of course, he could have actually drawn a sample random walk, but drawing diagrams wasn’t his style. And in fact it seems as if the first published drawing of a random walk was something added by John Venn (1834–1923) in the 1888 edition of his Logic of Chance—and, interestingly, in alignment with my computational irreducibility concept from a century later he used the digits of π to generate his “randomness”:
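Venn’s construction is easy to replay. A sketch assuming his reported scheme of mapping the digits 0–7 of π to eight compass directions and discarding 8s and 9s:

```python
import math

# First decimal digits of pi (enough for a short walk)
PI_DIGITS = ("1415926535897932384626433832795028841971"
             "6939937510582097494459230781640628620899")

def venn_walk(digits):
    """Random walk in the style of Venn's 1888 diagram: each digit 0-7
    picks one of eight compass directions; 8s and 9s are skipped."""
    x, y, path = 0.0, 0.0, [(0.0, 0.0)]
    for ch in digits:
        d = int(ch)
        if d > 7:
            continue
        angle = d * math.pi / 4   # eight equally spaced directions
        x += math.cos(angle)
        y += math.sin(angle)
        path.append((x, y))
    return path

path = venn_walk(PI_DIGITS)
print(len(path) - 1, "steps, ending near", path[-1])
```

Plotting `path` reproduces the jagged, aimless wandering of Venn’s figure—“randomness” from a perfectly deterministic digit sequence.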
In 1859, Clausius’s paper came to the attention of the then-28-year-old James Clerk Maxwell, who had grown up in Scotland, done the Mathematical Tripos in Cambridge, and was now back in Scotland as professor of “natural philosophy” at Aberdeen. Maxwell had already worked on things like elasticity theory, color vision, the mechanics of tops, the dynamics of the rings of Saturn and electromagnetism—having published his first paper (on geometry) at age 14. And, by the way, Maxwell was quite a “diagrammist”—and his early papers include all sorts of pictures that he drew:
But in 1859 Maxwell applied his talents to what he called the “dynamical theory of gases”:
He models molecules as hard spheres, and sets about computing the “statistical” results of their collisions:
And pretty soon he’s trying to compute the distribution of their velocities:
It’s a somewhat unconvincing (or, as Maxwell himself later put it, “precarious”) derivation (how does it work in 1D, for example?), but somehow it manages to produce what’s now known as the Maxwell distribution:
Maxwell observes that the distribution is the same as for “errors … in the ‘method of least squares’”:
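The connection Maxwell notes is that each velocity component is Gaussian—the “least squares” error law—which makes the speed distribution proportional to v^{2} e^{–mv^{2}/(2kT)}. A quick numerical check of that form (in units where kT/m = 1):

```python
from math import sqrt, pi, exp

def maxwell_pdf(v):
    """Maxwell speed distribution in units where kT/m = 1:
    f(v) = sqrt(2/pi) * v^2 * exp(-v^2/2)."""
    return sqrt(2 / pi) * v * v * exp(-v * v / 2)

def integrate(f, lo, hi, n=100000):
    # simple trapezoidal quadrature
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

norm = integrate(maxwell_pdf, 0.0, 20.0)
mean_v = integrate(lambda v: v * maxwell_pdf(v), 0.0, 20.0)
print(f"normalization: {norm:.6f}")                       # ~1
print(f"<v>: {mean_v:.4f} vs sqrt(8/pi): {sqrt(8/pi):.4f}")
```

The distribution integrates to 1, and the mean speed comes out as √(8/π) in these units—the standard closed-form result.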
Maxwell didn’t get back to the dynamical theory of gases until 1866, but in the meantime he was making a “dynamical theory” of something else: what he called the electromagnetic field:
Even though he’d worked extensively with the inverse square law of gravity he didn’t like the idea of “action at a distance”, and for example he wanted magnetic field lines to have some underlying “material” manifestation
imagining that they might be associated with arrays of “molecular vortices”:
We now know, of course, that there isn’t this kind of “underlying mechanics” for the electromagnetic field. But—with shades of the story of Carnot—even though the underlying framework isn’t right, Maxwell successfully derives correct equations for the electromagnetic field—that are now known as Maxwell’s equations:
His statement of how the electromagnetic field “works” is highly reminiscent of the dynamical theory of gases:
But he quickly and correctly adds:
And a few sections later he derives the idea of general electromagnetic waves
noting that there’s no evidence that the medium through which he assumes they’re propagating has elasticity:
By the way, when it comes to gravity he can’t figure out how to make his idea of a “mechanical medium” work:
But in any case, after using it as an inspiration for thinking about electromagnetism, Maxwell in 1866 returns to the actual dynamical theory of gases, still feeling that he needs to justify looking at a molecular theory:
And now he gives a recognizable (and correct, so far as it goes) derivation of the Maxwell distribution:
He goes on to try to understand experimental results on gases, about things like diffusion, viscosity and conductivity. For some reason, Maxwell doesn’t want to think of molecules, as he did before, as hard spheres. And instead he imagines that they have “action at a distance” forces, which basically work like hard spheres in the case of an r^{–5} force law:
In the years that followed, Maxwell visited the dynamical theory of gases several more times. In 1871, a few years before he died at age 48, he wrote a textbook entitled Theory of Heat, which begins, in erudite fashion, discussing what “thermodynamics” should even be called:
Most of the book is concerned with the macroscopic “theory of heat”—though, as we’ll discuss later, in the very last chapter Maxwell does talk about the “molecular theory”, if in somewhat tentative terms.
The Second Law was in effect originally introduced as a formalization of everyday observations about heat. But the development of kinetic theory seemed to open up the possibility that the Second Law could actually be proved from the underlying mechanics of molecules. And this was something that Ludwig Boltzmann (1844–1906) embarked on towards the end of his physics PhD at the University of Vienna. In 1865 he’d published his first paper (“On the Movement of Electricity on Curved Surfaces”), and in 1866 he published his second paper, “On the Mechanical Meaning of the Second Law of Thermodynamics”:
The introduction promises “a purely analytical, perfectly general proof of the Second Law”. And what he seemed to imagine was that the equations of mechanics would somehow inevitably lead to motion that would reproduce the Second Law. And in a sense what computational irreducibility, rule 30, etc. now show is that in the end that’s indeed basically how things work. But the methods and conceptual framework that Boltzmann had at his disposal were very far away from being able to see that. And instead what Boltzmann did was to use standard mathematical methods from mechanics to compute average properties of cyclic mechanical motions—and then made the somewhat unconvincing claim that combinations of these averages could be related (e.g. via temperature as average kinetic energy) to “Clausius’s entropy”:
It’s not clear how much this paper was read, but in 1871 Boltzmann (now a professor of mathematical physics in Graz) published another paper entitled simply “On the Priority of Finding the Relationship between the Second Law of Thermodynamics and the Principle of Least Action” that claimed (with some justification) that Clausius’s then-newly-announced virial theorem was already contained in Boltzmann’s 1866 paper.
But back in 1868—instead of trying to get all the way to Clausius’s entropy—Boltzmann instead uses mechanics to get a generalization of Maxwell’s law for the distribution of molecular velocities. His paper “Studies on the Equilibrium of [Kinetic Energy] between [Point Masses] in Motion” opens by saying that while analytical mechanics has in effect successfully studied the evolution of mechanical systems “from a given state to another”, it’s had little to say about what happens when such systems “have been left moving on their own for a long time”. He intends to remedy that, and spends 47 pages—complete with elaborate diagrams and formulas about collisions between hard spheres—in deriving an exponential distribution of energies if one assumes “equilibrium” (or, more specifically, balance between forward and backward processes):
It’s notable that one of the mathematical approaches Boltzmann uses is to discretize (i.e. effectively quantize) things, then look at the “combinatorial” limit. (Based on his later statements, he didn’t want to trust “purely continuous” mathematics—at least in the context of discrete molecular processes—and wanted to explicitly “watch the limits happening”.) But in the end it’s not clear that Boltzmann’s 1868 arguments do more than the fewline functionalequation approach that Maxwell had already used. (Maxwell would later complain about Boltzmann’s “overly long” arguments.)
Boltzmann’s 1868 paper had derived what the distribution of molecular energies should be “in equilibrium”. (In 1871 he was talking about “equipartition” not just of kinetic energy, but also of energies associated with “internal motion” of polyatomic molecules.) But what about the approach to equilibrium? How would an initial distribution of molecular energies evolve over time? And would it always end up at the exponential (“Maxwell–Boltzmann”) distribution? These are questions deeply related to a microscopic understanding of the Second Law. And they’re what Boltzmann addressed in 1872 in his 22nd published paper “Further Studies on the Thermal Equilibrium of Gas Molecules”:
Boltzmann explains that:
Maxwell already found the value Av^{2} e^{–Bv^{2}} [for the distribution of velocities] … so that the probability of different velocities is given by a formula similar to that for the probability of different errors of observation in the theory of the method of least squares. The first proof which Maxwell gave for this formula was recognized to be incorrect even by himself. He later gave a very elegant proof that, if the above distribution has once been established, it will not be changed by collisions. He also tries to prove that it is the only velocity distribution that has this property. But the latter proof appears to me to contain a false inference. It has still not yet been proved that, whatever the initial state of the gas may be, it must always approach the limit found by Maxwell. It is possible that there may be other possible limits. This proof is easily obtained, however, by the method which I am about to explain…
(He gives a long footnote explaining why Maxwell might be wrong, talking about how a sequence of collisions might lead to a “cycle of velocity states”—which Maxwell hasn’t proved will be traversed with equal probability in each direction. Ironically, this is actually already an analog of where things are going to go wrong with Boltzmann’s own argument.)
The main idea of Boltzmann’s paper is not to assume equilibrium, but instead to write down an equation (now called the Boltzmann Transport Equation) that explicitly describes how the velocity (or energy) distribution of molecules will change as a result of collisions. He begins by defining infinitesimal changes in time:
He then goes through a rather elaborate analysis of velocities before and after collisions, and how to integrate over them, and eventually winds up with a partial differential equation for the time variation of the energy distribution (yes, he confusingly uses x to denote energy)—and argues that Maxwell’s exponential distribution is a stationary solution to this equation:
A few paragraphs further on, something important happens: Boltzmann introduces a function that here he calls E, though later he’ll call it H:
Ten pages of computation follow
and finally Boltzmann gets his main result: if the velocity distribution evolves according to his equation, H can never increase with time, becoming zero for the Maxwell distribution. In other words, he is saying that he’s proved that a gas will always (“monotonically”) approach equilibrium—which seems awfully like some kind of microscopic proof of the Second Law.
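The flavor of the H theorem is easy to see in a toy model. What follows is emphatically not Boltzmann’s transport equation—just random pairwise “collisions” that conserve energy, with a discrete H computed from binned energies; all the parameter values are arbitrary choices:

```python
import random, math

random.seed(0)

def collide(energies, steps):
    """Toy 'collision' dynamics: pick two molecules at random and split
    their combined energy uniformly between them (energy is conserved)."""
    e = energies[:]
    n = len(e)
    for _ in range(steps):
        i, j = random.randrange(n), random.randrange(n)
        if i == j:
            continue
        total = e[i] + e[j]
        share = random.random() * total
        e[i], e[j] = share, total - share
    return e

def h_value(energies, nbins=40, emax=8.0):
    """Discrete analog of Boltzmann's H = sum of f log f over energy bins."""
    counts = [0] * nbins
    for x in energies:
        k = min(int(x / emax * nbins), nbins - 1)
        counts[k] += 1
    n = len(energies)
    return sum((c / n) * math.log(c / n) for c in counts if c > 0)

# Start far from equilibrium: every molecule has exactly energy 1
gas = [1.0] * 20000
h0 = h_value(gas)                  # all molecules in one bin: H = 0
h1 = h_value(collide(gas, 200000))
print(f"H before: {h0:.3f}, H after: {h1:.3f}")   # H decreases
```

In this toy setting H falls monotonically (up to fluctuations) toward the value for the exponential equilibrium distribution—mimicking the monotone approach to equilibrium Boltzmann claimed for his equation.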
But then Boltzmann makes a bolder claim:
It has thus been rigorously proved that, whatever the initial distribution of kinetic energy may be, in the course of a very long time it must always necessarily approach the one found by Maxwell. The procedure used so far is of course nothing more than a mathematical artifice employed in order to give a rigorous proof of a theorem whose exact proof has not previously been found. It gains meaning by its applicability to the theory of polyatomic gas molecules. There one can again prove that a certain quantity E can only decrease as a consequence of molecular motion, or in a limiting case can remain constant. One can also prove that for the atomic motion of a system of arbitrarily many material points there always exists a certain quantity which, in consequence of any atomic motion, cannot increase, and this quantity agrees up to a constant factor with the value found for the wellknown integral ∫dQ/T in my [1871] paper on the “Analytical proof of the 2nd law, etc.”. We have therefore prepared the way for an analytical proof of the Second Law in a completely different way from those previously investigated. Up to now the object has been to show that ∫dQ/T = 0 for reversible cyclic processes, but it has not been proved analytically that this quantity is always negative for irreversible processes, which are the only ones that occur in nature. The reversible cyclic process is only an ideal, which one can more or less closely approach but never completely attain. Here, however, we have succeeded in showing that ∫dQ/T is in general negative, and is equal to zero only for the limiting case, which is of course the reversible cyclic process (since if one can go through the process in either direction, ∫dQ/T cannot be negative).
In other words, he’s saying that the quantity H that he’s defined microscopically in terms of velocity distributions can be identified (up to a sign) with the entropy that Clausius defined as dQ/T. He says that he’ll show this in the context of analyzing the mechanics of polyatomic molecules.
But first he’s going to take a break and show that his derivation doesn’t need to assume continuity. In a pre-quantum-mechanics, pre-cellular-automaton-fluid kind of way he replaces all the integrals by limits of sums of discrete quantities (i.e. he’s quantizing kinetic energy, etc.):
He says that this discrete approach makes everything clearer, and quotes Lagrange’s derivation of vibrations of a string as an example of where this has happened before. But then he argues that everything works out fine with the discrete approach, and that H still decreases, with the Maxwell distribution as the only possible end point. As an aside, he makes a jab at Maxwell’s derivation, pointing out that with Maxwell’s functional equation:
… there are infinitely many other solutions, which are not useful however since ƒ(x) comes out negative or imaginary for some values of x. Hence, it follows very clearly that Maxwell’s attempt to prove a priori that his solution is the only one must fail, since it is not the only one but rather it is the only one that gives purely positive probabilities, and therefore the only useful one.
But finally—after another aside about computing thermal conductivities of gases—Boltzmann digs into polyatomic molecules, and his claim about H being related to entropy. There’s another 26 pages of calculations, and then we get to a section entitled “Solution of Equation (81) and Calculation of Entropy”. More pages of calculation about polyatomic molecules ensue. But finally we’re computing H, and, yes, it agrees with the Clausius result—but anticlimactically he’s only dealing with the case of equilibrium for monatomic molecules, where we already knew we got the Maxwell distribution:
And now he decides he’s not talking about polyatomic molecules anymore, and instead:
In order to find the relation of the quantity [H] to the second law of thermodynamics in the form ∫dQ/T < 0, we shall interpret the system of mass points not, as previously, as a gas molecule, but rather as an entire body.
But then, in the last couple of pages of his paper, Boltzmann pulls out another idea. He’s discussed the concept that polyatomic molecules (or, now, whole systems) can be in many different configurations, or “phases”. But now he says: “We shall replace [our] single system by a large number of equivalent systems distributed over many different phases, but which do not interact with each other”. In other words, he’s introducing the idea of an ensemble of states of a system. And now he says that instead of looking at the distribution just for a single velocity, we should do it for all velocities, i.e. for the whole “phase” of the system.
[These distributions] may be discontinuous, so that they have large values when the variables are very close to certain values determined by one or more equations, and otherwise vanishingly small. We may choose these equations to be those that characterize visible external motion of the body and the kinetic energy contained in it. In this connection it should be noted that the kinetic energy of visible motion corresponds to such a large deviation from the final equilibrium distribution of kinetic energy
that it leads to an infinity in H, so that from the point of view of the Second Law of thermodynamics it acts like heat supplied from an infinite temperature.
There are a bunch of ideas swirling around here. Phasespace density (cf. Liouville’s equation). Coarsegrained variables. Microscopic representation of mechanical work. Etc. But the paper is ending. There’s a discussion about H for systems that interact, and how there’s an equilibrium value achieved. And finally there’s a formula for entropy
that Boltzmann said “agrees … with the expression I found in my previous [1871] paper”.
So what exactly did Boltzmann really do in his 1872 paper? He introduced the Boltzmann Transport Equation which allows one to compute at least certain nonequilibrium properties of gases. But is his ƒ log ƒ quantity really what we can call “entropy” in the sense Clausius meant? And is it true that he’s proved that entropy (even in his sense) increases? A century and a half later there’s still a remarkable level of confusion around both these issues.
But in any case, back in 1872 Boltzmann’s “minimum theorem” (now called his “H theorem”) created quite a stir. After some time, though, an objection was raised, which we’ll discuss below. And partly in response to this, Boltzmann (after spending time working on microscopic models of electrical properties of materials—as well as doing some actual experiments) wrote another major paper on entropy and the Second Law in 1877:
The translated title of the paper is “On the Relation between the Second Law of Thermodynamics and Probability Theory with Respect to the Laws of Thermal Equilibrium”. And at the very beginning of the paper Boltzmann makes a statement that was pivotal for future discussions of the Second Law: he says it’s now clear to him that an “analytical proof” of the Second Law is “only possible on the basis of probability calculations”. Now that we know about computational irreducibility and its implications one could say that this was the point where Boltzmann and those who followed him went off track in understanding the true foundations of the Second Law. But Boltzmann’s idea of introducing probability theory was effectively what launched statistical mechanics, with all its rich and varied consequences.
Boltzmann makes his basic claim early in the paper
with the statement (quoting from a comment in a paper he’d written earlier the same year) that “it is clear” (always a dangerous thing to say!) that in thermal equilibrium all possible states of the system—say, spatially uniform and nonuniform alike—are equally probable
… comparable to the situation in the game of Lotto where every single quintet is as improbable as the quintet 12345. The higher probability that the state distribution becomes uniform with time arises only because there are far more uniform than nonuniform state distributions…
He goes on:
[Thus] it is possible to calculate the thermal equilibrium state by finding the probability of the different possible states of the system. The initial state will in most cases be highly improbable but from it the system will always rapidly approach a more probable state until it finally reaches the most probable state, i.e., that of thermal equilibrium. If we apply this to the Second Law we will be able to identify the quantity which is usually called entropy with the probability of the particular state…
He’s talked about thermal equilibrium, even in the title, but now he says:
… our main purpose here is not to limit ourselves to thermal equilibrium, but to explore the relationship of the probabilistic formulation to the [Second Law].
He says his goal is to calculate probability distribution for different states, and he’ll start with
as simple a case as possible, namely a gas of rigid absolutely elastic spherical molecules trapped in a container with absolutely elastic walls. (Which interact with central forces only within a certain small distance, but not otherwise; the latter assumption, which includes the former as a special case, does not change the calculations in the least).
In other words, yet again he’s going to look at hard sphere gases. But, he says:
Even in this case, the application of probability theory is not easy. The number of molecules is not infinite, in a mathematical sense, yet the number of velocities each molecule is capable of is effectively infinite. Given this last condition, the calculations are very difficult; to facilitate understanding, I will, as in earlier work, consider a limiting case.
And this is where he “goes discrete” again—allowing (“cellular-automaton-style”) only discrete possible velocities for each molecule:
He says that upon colliding, two molecules can exchange these discrete velocities, but nothing more. As he explains, though:
Even if, at first sight, this seems a very abstract way of treating the problem, it rapidly leads to the desired objective, and when you consider that in nature all infinities are but limiting cases, one assumes each molecule can behave in this fashion only in the limiting case where each molecule can assume more and more values of the velocity.
But now—much like in an earlier paper—he makes things even simpler, saying he’s going to ignore velocities for now, and just say that the possible energies of molecules are “in an arithmetic progression”:
He plans to look at collisions, but first he just wants to consider the combinatorial problem of distributing these energies among n molecules in all possible ways, subject to the constraint of having a certain fixed total energy. He sets up a specific example, with 7 molecules, total energy 7, and maximum energy per molecule 7—then gives an explicit table of all possible states (up to, as he puts it, “immaterial permutations of molecular labels”):
Tables like this had been common for nearly two centuries in combinatorial mathematics books like Jacob Bernoulli’s (1655–1705) Ars Conjectandi
but this might have been the first place such a table had appeared in a paper about fundamental physics.
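The combinatorics of Boltzmann’s little table are easy to reproduce. A sketch that enumerates the distinct energy distributions (partitions of 7 into at most 7 parts) and counts the labeled “complexions” for each row; the helper names here are mine, not Boltzmann’s:

```python
from math import comb, factorial
from collections import Counter

def partitions(total, max_part, max_parts):
    """All ways to write `total` as a sum of at most `max_parts`
    positive integers, each at most `max_part` (in decreasing order)."""
    if total == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, first, max_parts - 1):
            yield (first,) + rest

def complexions(part, n):
    """Labeled assignments ('complexions') for one row of Boltzmann's
    table: permutations of the n molecules over the energy values."""
    energies = list(part) + [0] * (n - len(part))
    denom = 1
    for c in Counter(energies).values():
        denom *= factorial(c)
    return factorial(n) // denom

n, total_energy = 7, 7   # Boltzmann's example: 7 molecules, 7 energy quanta
rows = list(partitions(total_energy, total_energy, n))
print(len(rows), "rows in the table")                        # 15
print(sum(complexions(p, n) for p in rows), "complexions")   # 1716 = C(13,6)
```

The 15 rows match the partitions of 7, and the complexions sum to C(13, 6) = 1716, the stars-and-bars count of all ways to distribute 7 quanta among 7 labeled molecules.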
And now Boltzmann goes into an analysis of the distribution of states—of the kind that’s now long been standard in textbooks of statistical physics, but would then have been quite unfamiliar to the pure-calculus-based physicists of the time:
He derives the average energy per molecule, as well as the fluctuations:
He says that “of course” the real interest is in the limit of an infinite number of molecules, but he still wants to show that for “moderate values” the formulas remain quite accurate. And then (even without Wolfram Language!) he’s off finding (using Newton’s method it seems) approximate roots of the necessary polynomials:
Just to show how it all works, he considers a slightly larger case as well:
Now he’s computing the probability that a given molecule has a particular energy
and determining that in the limit it’s an exponential
that is, as he says, “consistent with that known from gases in thermal equilibrium”.
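That exponential limit can be checked directly from the combinatorics: if every way of distributing E quanta among n molecules is equally likely, the chance that one particular molecule holds k quanta follows from counting the remaining distributions. A sketch (the closed form is standard stars-and-bars counting, not Boltzmann’s own notation):

```python
from math import comb

def p_energy(k, n, E):
    """Probability that one given molecule holds k of the E energy quanta,
    when every distribution ('complexion') is equally likely."""
    return comb(E - k + n - 2, n - 2) / comb(E + n - 1, n - 1)

# Boltzmann's small case: n = 7 molecules, E = 7 quanta
probs7 = [p_energy(k, 7, 7) for k in range(8)]
print([round(p, 4) for p in probs7])

# For large n and E (fixed mean energy) the law becomes geometric,
# i.e. a discretized exponential: successive ratios approach a constant
probs_big = [p_energy(k, 1000, 1000) for k in range(6)]
ratios = [probs_big[k + 1] / probs_big[k] for k in range(5)]
print([round(r, 4) for r in ratios])   # each close to 1/2 here
```

Even in the 7-molecule case the distribution is already strictly decreasing; as n and E grow with fixed mean energy, the ratio between successive probabilities settles to a constant, which is exactly the discrete shadow of the exponential law.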
He claims that in order to really get a “mechanical theory of heat” it’s necessary to take a continuum limit. And here he concludes that thermal equilibrium is achieved by maximizing the quantity Ω (where the “l” stands for log, so this is basically ƒ log ƒ):
He explains that Ω is basically the log of the number of possible permutations, and that it’s “of special importance”, and he’ll call it the “permutability measure”. He immediately notes that “the total permutability measure of two bodies is equal to the sum of the permutability measures of each body”. (Note that Boltzmann’s Ω isn’t the modern totalnumberofstates Ω; confusingly, that’s essentially the exponential of Boltzmann’s Ω.)
He goes through some discussion of how to handle extra degrees of freedom in polyatomic molecules, but then he’s on to the main event: arguing that Ω is (essentially) the entropy. It doesn’t take long:
Basically he just says that in equilibrium the probability ƒ(…) for a molecule to have a particular velocity is given by the Maxwell distribution, then he substitutes this into the formula for Ω, and shows that indeed, up to a constant, Ω is exactly the “Clausius entropy” ∫dQ/T.
So, yes, in equilibrium Ω seems to be giving the entropy. But then Boltzmann makes a bit of a jump. He says that in processes that aren’t reversible both “Clausius entropy” and Ω will increase, and can still be identified—and enunciates the general principle, printed in his paper in special doubledspaced form:
… [In] any system of bodies that undergoes state changes … even if the initial and final states are not in thermal equilibrium … the total permutability measure for the bodies will continually increase during the state changes, and can remain constant only so long as all the bodies during the state changes remain infinitely close to thermal equilibrium (reversible state changes).
In other words, he’s asserting that Ω behaves the same way entropy is said to behave according to the Second Law. He gives various thought experiments about gases in boxes with dividers, gases under gravity, etc. And finally concludes that, yes, the relationship of entropy to Ω “applies to the general case”.
There’s one final paragraph in the paper, though:
Up to this point, these propositions may be demonstrated exactly using the theory of gases. If one tries, however, to generalize to liquid drops and solid bodies, one must dispense with an exact treatment from the outset, since far too little is known about the nature of the latter states of matter, and the mathematical theory is barely developed. But I have already mentioned reasons in previous papers, in virtue of which it is likely that for these two aggregate states, the thermal equilibrium is achieved when Ω becomes a maximum, and that when thermal equilibrium exists, the entropy is given by the same expression. It can therefore be described as likely that the validity of the principle which I have developed is not just limited to gases, but that the same constitutes a general natural law applicable to solid bodies and liquid droplets, although the exact mathematical treatment of these cases still seems to encounter extraordinary difficulties.
Interestingly, Boltzmann is only saying that it’s “likely” that in thermal equilibrium his permutability measure agrees with Clausius’s entropy, and he’s implying that actually that’s really the only place where Clausius’s entropy is properly defined. But certainly his definition is more general (after all, it doesn’t refer to things like temperature that are only properly defined in equilibrium), and so—even though Boltzmann didn’t explicitly say it—one can imagine basically just using it as the definition of entropy for arbitrary cases. Needless to say, the story is actually more complicated, as we’ll see soon.
But this definition of entropy—crispened up by Max Planck (1858–1947) and with different notation—is what ended up years later “written in stone” at Boltzmann’s grave:
In his 1877 paper Boltzmann had made the claim that in equilibrium all possible microscopic states of a system would be equally probable. But why should this be true? One reason could be that in its pure “mechanical evolution” the system would just successively visit all these states. And this was an idea that Boltzmann seems to have had—with increasing clarity—from the time of his very first paper in 1866 that purported to “prove the Second Law” from mechanics.
In modern times—with our understanding of discrete systems and computational rules—it’s not difficult to describe the idea of “visiting all states”. But in Boltzmann’s time it was considerably more complicated. Did one expect to hit all the infinite possible infinitesimally separated configurations of a system? Or somehow just get close? The fact is that Boltzmann had certainly dipped his toe into thinking about things in terms of discrete quantities. But he didn’t make the jump to imagining discrete rules, even though he certainly did know about discrete iterative processes, like Newton’s method for finding roots.
Boltzmann knew about cases—like circular motion—where everything was purely periodic. But maybe when motion wasn’t periodic, it’d inevitably “visit all states”. Already in 1868 Boltzmann was writing a paper entitled “Solution to a Mechanical Problem” where he studies a single point mass moving in an α/r – β/r^{2} potential and bouncing elastically off a line—and manages to show that it visits every position with equal probability. In this paper he’s just got traditional formulas, but by 1871, in “Some General Theorems about Thermal Equilibrium”—computing motion in the same potential as before—he’s got a picture:
Boltzmann probably knew about Lissajous figures—cataloged in 1857
and the fact that in this case a rational ratio of x and y periods gives a periodic overall curve while an irrational one always gives a curve that visits every position might have led him to suspect that all systems would either be periodic, or would visit every possible configuration (or at least, as he identified in his paper, every configuration that had the same values of “constants of the motion”, like energy).
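The rational-versus-irrational distinction is easy to see numerically. In the sketch below, with a 3:2 frequency ratio the Lissajous curve closes after t = 2π, while with ratio √2:1 sampled points gradually fill out the whole square (the sampling step and grid size are arbitrary choices):

```python
import math

def lissajous(a, b, t):
    """Point on the Lissajous curve x = sin(a t), y = sin(b t)."""
    return (math.sin(a * t), math.sin(b * t))

# Rational frequency ratio (3:2): the curve closes, repeating after t = 2*pi
t0 = 0.7
x0, y0 = lissajous(3, 2, t0)
x1, y1 = lissajous(3, 2, t0 + 2 * math.pi)
offset = abs(x1 - x0) + abs(y1 - y0)
print(f"offset after one period: {offset:.2e}")   # essentially zero

# Irrational ratio (sqrt(2):1): sampled points gradually cover the
# whole square, in the spirit of "visiting every position"
cells = set()
for k in range(20000):
    x, y = lissajous(math.sqrt(2), 1, 0.5 * k)
    cells.add((round(x, 1), round(y, 1)))
print(f"coarse 0.1-grid cells visited: {len(cells)} of 441")
```

With the irrational ratio essentially every coarse cell of the square eventually gets visited, which is the behavior that could suggest “periodic or visits everything” as a dichotomy.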
In early 1877 Boltzmann returned to the same question, including as one section in his “Remarks on Some Problems in the Mechanical Theory of Heat” more analysis of the same potential as before, but now showing a diversity of more complicated pictures that almost seem to justify his rule-30-before-its-time idea that there could be “pure mechanics” that would lead to “Second Law” behavior:
In modern times, of course, it’s easy to solve those equations of motion, and typical results obtained for an array of values of parameters are:
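A minimal modern sketch of such an integration (one sign convention for the α/r – β/r^{2} potential, velocity-Verlet stepping, and ignoring Boltzmann’s elastic wall; the parameter values are arbitrary choices):

```python
import math

ALPHA, BETA = 1.0, 0.1   # potential V(r) = -ALPHA/r + BETA/r^2 (one sign convention)

def accel(x, y):
    """Acceleration (unit mass) from the central potential above."""
    r = math.hypot(x, y)
    f_r = -ALPHA / r**2 + 2 * BETA / r**3   # radial force = -dV/dr
    return f_r * x / r, f_r * y / r

def energy(x, y, vx, vy):
    r = math.hypot(x, y)
    return 0.5 * (vx * vx + vy * vy) - ALPHA / r + BETA / r**2

def orbit(x, y, vx, vy, dt=1e-3, steps=50000):
    """Velocity-Verlet integration; returns the energy at every step."""
    ax, ay = accel(x, y)
    energies = []
    for _ in range(steps):
        vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
        x += dt * vx;        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
        energies.append(energy(x, y, vx, vy))
    return energies

energies = orbit(1.0, 0.0, 0.0, 0.8)   # a bound, rosette-like orbit
drift = max(energies) - min(energies)
print(f"energy drift over the run: {drift:.2e}")
```

Plotting the positions from such a run gives exactly the rosette-like curves of Boltzmann’s figures; the tiny energy drift confirms the integrator is tracking the dynamics faithfully.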
Boltzmann returned to these questions in 1884, responding to Helmholtz’s analysis of what he was calling “monocyclic systems”. Boltzmann used the same potential again, but now with a name for the “visit-all-states” property: isodic. Meanwhile, Boltzmann had introduced the name “ergoden” for the collection of all possible configurations of a system with a given energy (what would now be called the microcanonical ensemble). But somehow, quite a few years later, Boltzmann’s student Paul Ehrenfest (1880–1933) (along with Tatiana Ehrenfest-Afanassjewa (1876–1964)) would introduce the term “ergodic” for Boltzmann’s isodic. And “ergodic” is the term that caught on. And in the twentieth century there was all sorts of development of “ergodic theory”, as we’ll discuss a bit later.
But back in the 1800s people continued to discuss the possibility that what would come to be called ergodicity was somehow generic, and would explain why all states would somehow be equally probable, why the Maxwell distribution of velocities would be obtained, and ultimately why the Second Law was true. Maxwell worked out some examples. So did Kelvin. But it remained unclear how it would all work out, as Kelvin (now with many letters after his name) discussed in a talk he gave in 1900 celebrating the new century:
The dynamical theory of light didn’t work out. And about the dynamical theory of heat, he quotes what Maxwell (following Boltzmann) had said in one of his very last papers, published in 1878, in reference to what amounts to a proof of the Second Law from underlying dynamics:
Kelvin talks about exploring test cases:
When, for example, is the motion of a single particle bouncing around in a fixed region ergodic? He considers first an ellipse, and proves that, no, there isn’t in general ergodicity there:
Then he goes on to the much more complicated case
and now he does an “experiment” (with a rather Monte Carlo flavor):
Kelvin considers a few other examples
but mostly concludes that he can’t tell in general about ergodicity—and that probably something else is needed, or as he puts it (somehow wrapping the theory of light into the story as well):
Had Boltzmann’s 1872 H theorem proved the Second Law? Was the Second Law—with its rather downbeat implication about the heat death of the universe—even true? One skeptic was Boltzmann’s friend and former teacher, the chemist Josef Loschmidt (1821–1895), who in 1866 had used kinetic theory to (rather accurately) estimate the size of air molecules. And in 1876 Loschmidt wrote a paper entitled “On the State of Thermal Equilibrium in a System of Bodies with Consideration of Gravity” in which he claimed to show that when gravity was taken into account, there wouldn’t be uniform thermal equilibrium, the Maxwell distribution, or the Second Law—and thus, as he poetically explained:
The terroristic nimbus of the Second Law is destroyed, a nimbus which makes that Second Law appear as the annihilating principle of all life in the universe—and at the same time we are confronted with the comforting perspective that, as far as the conversion of heat into work is concerned, mankind will not solely be dependent on the intervention of coal or of the Sun, but will have available an inexhaustible resource of convertible heat at all times.
His main argument revolves around a thought experiment involving molecules in a gravitational field:
Over the next couple of years, despite Loschmidt’s progressively more elaborate constructions
Boltzmann and Maxwell will debunk this particular argument—even though to this day the role of gravity in relation to the Second Law remains incompletely resolved.
But what’s more important for our narrative about Loschmidt’s original paper are a couple of paragraphs tucked away at the end of one section (that in fact Kelvin had basically anticipated in 1874):
[Consider what would happen if] after a time t sufficiently long for the stationary state to obtain, we suddenly reversed the velocities of all atoms. Initially we would be in a state that would look like the stationary state. This would be true for some time, but in the long run the stationary state would deteriorate and after the time t we would inevitably return to the initial state…
It is clear that in general in any system one can revert the entire course of events by suddenly inverting the velocities of all the elements of the system. This doesn’t give a solution to the problem of undoing everything that happens [in the universe] but it does give a simple prescription: just suddenly revert the instantaneous velocities of all atoms of the universe.
How did this relate to the H theorem? The underlying molecular equations of motion that Boltzmann had assumed in his proof were reversible in time. Yet Boltzmann claimed that H was always going to a minimum. But why couldn’t one use Loschmidt’s argument to construct an equally possible “reverse evolution” in which H was instead going to a maximum?
It didn’t take Boltzmann long to answer, in print, tucked away in a section of his paper “Remarks on Some Problems in the Mechanical Theory of Heat”. He admits that Loschmidt’s argument “has great seductiveness”. But he claims it is merely “an interesting sophism”—and then says he will “locate the source of the fallacy”. He begins with a classic setup: a collection of hard spheres in a box.
Suppose that at time zero the distribution of spheres in the box is not uniform; for example, suppose that the density of spheres is greater on the right than on the left … The sophism now consists in saying that, without reference to the initial conditions, it cannot be proved that the spheres will become uniformly mixed in the course of time.
But then he rather boldly claims that with the actual initial conditions described, the spheres will “almost always [become] uniform” at a future time t. Now he imagines (following Loschmidt) reversing all the velocities in this state at time t. Then, he says:
… the spheres would sort themselves out as time progresses, and at [the analog of] time 0, they would have a completely nonuniform distribution, even though the [new] initial distribution [one had used] was almost uniform.
But now he says that, yes—given this counterexample—it won’t be possible to prove that the final distribution of spheres will always be uniform.
This is in fact a consequence of probability theory, for any nonuniform distribution, no matter how improbable it may be, is still not absolutely impossible. Indeed it is clear that any individual uniform distribution, which might arise after a certain time from some particular initial state, is just as improbable as an individual nonuniform distribution; just as in the game of Lotto, any individual set of five numbers is as improbable as the set 1, 2, 3, 4, 5. It is only because there are many more uniform distributions than nonuniform ones that the distribution of states will become uniform in the course of time. One therefore cannot prove that, whatever may be the positions and velocities of the spheres at the beginning, the distribution must become uniform after a long time; rather one can only prove that infinitely many more initial states will lead to a uniform one after a definite length of time than to a nonuniform one.
He adds:
One could even calculate, from the relative numbers of the different state distributions, their probabilities, which might lead to an interesting method for the calculation of thermal equilibrium.
And indeed within a few months Boltzmann has followed up on that “interesting method” to produce his classic paper on the probabilistic interpretation of entropy.
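The counting at the heart of Boltzmann's Lotto analogy is easy to make concrete. A small sketch (using a toy 20-sphere system with numbers of my own choosing):

```python
from math import comb

n = 20  # idealized spheres, each in the left or right half of the box
# Number of microscopic arrangements with exactly k spheres on the left:
counts = {k: comb(n, k) for k in range(n + 1)}
print(counts[10])  # near-even split: 184756 arrangements
print(counts[0])   # all spheres on the right: exactly 1 arrangement
# Fraction of all 2^n arrangements within two spheres of an even split:
near_uniform = sum(counts[k] for k in range(8, 13)) / 2 ** n
print(round(near_uniform, 3))  # about 0.737
```

Each individual arrangement is exactly as probable as any other, just as Boltzmann says; it is only because vastly more arrangements are near-uniform that uniformity is what one expects to see, and the disparity grows explosively as n increases.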
But in his earlier paper he goes on to argue:
Since there are infinitely many more uniform than nonuniform distributions of states, the latter case is extraordinarily improbable [to arise] and can be considered impossible for practical purposes; just as it may be considered impossible that if one starts with oxygen and nitrogen mixed in a container, after a month one will find chemically pure oxygen in the lower half and nitrogen in the upper half, although according to probability theory this is merely very improbable but not impossible.
He talks about how interesting it is that the Second Law is intimately connected with probability while the First Law is not. But at the end he does admit:
Perhaps this reduction of the Second Law to the realm of probability makes its application to the entire universe appear dubious, but the laws of probability theory are confirmed by all experiments carried out in the laboratory.
At this point it’s all rather unconvincing. The H theorem had purported to prove the Second Law. But now he’s just talking about probability theory. He seems to have given up on proving the Second Law. And he’s basically just saying that the Second Law is true because it’s observed to be true—like other laws of nature, but not like something that can be “proved”, say from underlying molecular dynamics.
For many years not much attention was paid to these issues, but by the late 1880s there were attempts to clarify things, particularly among the rather active British circle of kinetic theorists. A published 1894 letter from the Irish mathematician Edward Culverwell (1855–1931) (who also wrote about ice ages and Montessori education) summed up some of the confusions that were circulating:
At a lecture in England the next year, Boltzmann countered (conveniently, in English):
He goes on, but doesn’t get much more specific:
He then makes an argument that will be repeated many times in different forms, saying that there will be fluctuations, where H deviates temporarily from its minimum value, but these will be rare:
Later he’s talking about what he calls the “H curve” (a plot of H as a function of time), and he’s trying to describe its limiting form:
And he even refers to Weierstrass’s recent work on nondifferentiable functions:
But he doesn’t pursue this, and instead ends his “rebuttal” with a more philosophical—and in some sense anthropic—argument that he attributes to his former assistant Ignaz Schütz (1867–1927):
It’s an argument that we’ll see in various forms repeated over the century and a half that follows. In essence what it’s saying is that, yes, the Second Law implies that the universe will end up in thermal equilibrium. But there’ll always be fluctuations. And in a big enough universe there’ll be fluctuations somewhere that are large enough to correspond to the world as we experience it, where “visible motion and life exist”.
But regardless of such claims, there’s a purely formal question about the H theorem. How exactly is it that from the Boltzmann transport equation—which is supposed to describe reversible mechanical processes—the H theorem manages to prove that the H function irreversibly decreases? It wasn’t until 1895—fully 25 years after Boltzmann first claimed to prove the H theorem—that this issue was even addressed. And it first came up rather circuitously through Boltzmann’s response to comments in a textbook by Gustav Kirchhoff (1824–1887) that had been completed by Max Planck.
The key point is that Boltzmann’s equation makes an implicit assumption, essentially the same one Maxwell had made back in 1860: that before each collision between molecules, the molecules are statistically uncorrelated, so that the probability for the collision has the factored form ƒ(v_{1}) ƒ(v_{2}). But what about after the collision? Inevitably the collision itself will lead to correlations. So now there’s an asymmetry: there are no correlations before each collision, but there are correlations after. And that’s why the behavior of the system doesn’t have to be symmetrical—and the H theorem can prove that H irreversibly decreases.
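A toy calculation can show how an H function decreases once collisions are treated statistically. The sketch below is not Boltzmann's collision integral: it substitutes a simple relaxation-toward-the-Maxwellian rule (a BGK-style model, my own illustrative stand-in) and checks that H = Σ f log f falls monotonically toward the Maxwellian value:

```python
import math

# Discrete 1D velocity grid, wide enough that Maxwellian tails are negligible
M = 400
vs = [-10 + 20 * i / M for i in range(M + 1)]
dv = vs[1] - vs[0]

def moments(f):
    """Discrete number density and energy density (mean velocity 0 by symmetry)."""
    n = sum(f) * dv
    e = sum(fi * v * v for fi, v in zip(f, vs)) * dv
    return n, e

def maxwellian(n, e):
    """Maxwellian with the same density and energy (units with m = k = 1)."""
    T = e / n
    return [n / math.sqrt(2 * math.pi * T) * math.exp(-v * v / (2 * T)) for v in vs]

def H(f):
    return sum(fi * math.log(fi) for fi in f if fi > 0) * dv

# Far-from-equilibrium start: two narrow counter-streaming beams at v = +/-2
f = [math.exp(-((v - 2) ** 2) / 0.1) + math.exp(-((v + 2) ** 2) / 0.1) for v in vs]

lam = 0.2  # relaxation rate per step
hs = [H(f)]
for _ in range(50):
    feq = maxwellian(*moments(f))
    f = [(1 - lam) * fi + lam * gi for fi, gi in zip(f, feq)]
    hs.append(H(f))

# H falls from its initial value toward the Maxwellian minimum
print(hs[0], hs[-1])
```

The monotone decrease here comes from convexity of f log f plus the fact that the moment-matched Maxwellian minimizes H; it is the statistical replacement of the collision term, not the reversible mechanics, that puts the arrow of time in.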
In 1895 Boltzmann wrote a 3-page paper (about half of it in footnotes) entitled “More about Maxwell’s Distribution Law for Speeds” where he explained what he thought was going on:
[The reversibility of the laws of mechanics] has been recently applied in judging the assumptions necessary for a proof of [the H theorem]. This proof requires the hypothesis that the state of the gas is and remains molecularly disordered, namely, that the molecules of a given class do not always or predominantly collide in a specific manner and that, on the contrary, the number of collisions of a given kind can be found by the laws of probability.
Now, if we assume that in general a state distribution never remains molecularly ordered for an unlimited time and also that for a stationary state distribution every velocity is as probable as the reversed velocity, then it follows that by inversion of all the velocities after an infinitely long time every stationary state distribution remains unchanged. After the reversal, however, there are exactly as many collisions occurring in the reversed way as there were collisions occurring in the direct way. Since the two state distributions are identical, the probability of direct and indirect collisions must be equal for each of them, whence follows Maxwell’s distribution of velocities.
Boltzmann is introducing what we’d now call the “molecular chaos” assumption (and what Ehrenfest would call the Stosszahlansatz)—giving a rather self-fulfilling argument for why the assumption should be true. In Boltzmann’s time there wasn’t really anything better to do. By the 1940s the BBGKY hierarchy at least let one organize the hierarchy of correlations between molecules—though it still didn’t give one a tractable way to assess which correlations should exist in practice, and which not.
Boltzmann knew these were all complicated issues. But he wrote about them at a technical level only a few more times in his life. The last time was in 1898 when, responding to a request from the mathematician Felix Klein (1849–1925), he wrote a paper about the H curve for mathematicians. He begins by saying that although this curve comes from the theory of gases, the essence of it can be reproduced by a process based on accumulating balls randomly picked from an urn. He then goes on to outline what amounts to a story of random walks and fractals. In another paper, he actually sketches the curve
saying that his drawing “should be taken with a large grain of salt”, noting—in a remarkably fractal-reminiscent way—that “a zincographer [i.e. an engraver of printing plates] would not have been able to produce a real figure since the H-curve has a very large number of maxima and minima on each finite segment, and hence defies representation as a line of continuously changing direction.”
Of course, in modern times it’s easy to produce an approximation to the H curve according to his prescription:
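One simple way to get such a curve (here via an Ehrenfest-style two-urn process, a later construction in the same spirit as Boltzmann's urn prescription, not his exact setup):

```python
import random

random.seed(1)  # reproducible run

N = 100      # balls distributed between two urns
left = N     # start far from equilibrium: every ball in the left urn
h_curve = []
for _ in range(20_000):
    # Pick a ball uniformly at random and move it to the other urn
    if random.random() < left / N:
        left -= 1
    else:
        left += 1
    h_curve.append(abs(left - N // 2))  # crude H-like measure of imbalance

# The imbalance collapses quickly, then fluctuates near zero with occasional
# small "humps": the qualitative shape of Boltzmann's H curve
print(h_curve[0], max(h_curve[-5000:]))
```

On any finite segment the curve is a jagged mass of maxima and minima, just as Boltzmann warned his zincographer could never engrave.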
But at the end of his “mathematical” paper he comes back to talking about gases. And first he makes the claim that the effective reversibility seen in the H curve will never be seen in actual physical systems because, in essence, there are always perturbations from outside. But then he ends, in a statement of ultimate reversibility that casts our everyday observation of irreversibility as tautological:
There is no doubt that it is just as conceivable to have a world in which all natural processes take place in the wrong chronological order. But a person living in this upsidedown world would have feelings no different than we do: they would just describe what we call the future as the past and vice versa.
Probably the single most prominent research topic in mathematical physics in the 1800s was the three-body problem—of solving for the motion under gravity of three bodies, such as the Earth, Moon and Sun. And in 1890 the French mathematician Henri Poincaré (1854–1912) (whose breakout work had been on the three-body problem) wrote a paper entitled “On the Three-Body Problem and the Equations of Dynamics” in which, as he said:
It is proved that there are infinitely many ways of choosing the initial conditions such that the system will return infinitely many times as close as one wishes to its initial position. There are also an infinite number of solutions that do not have this property, but it is shown that these unstable solutions can be regarded as “exceptional” and may be said to have zero probability.
This was a mathematical result. But three years later Poincaré wrote what amounted to a philosophy paper entitled “Mechanism and Experience” which expounded on its significance for the Second Law:
In the mechanistic hypothesis, all phenomena must be reversible; for example, the stars might traverse their orbits in the retrograde sense without violating Newton’s law; this would be true for any law of attraction whatever. This is therefore not a fact peculiar to astronomy; reversibility is a necessary consequence of all mechanistic hypotheses.
Experience provides on the contrary a number of irreversible phenomena. For example, if one puts together a warm and a cold body, the former will give up its heat to the latter; the opposite phenomenon never occurs. Not only will the cold body not return to the warm one the heat which it has taken away when it is in direct contact with it; no matter what artifice one may employ, using other intervening bodies, this restitution will be impossible, at least unless the gain thereby realized is compensated by an equivalent or large loss. In other words, if a system of bodies can pass from state A to state B by a certain path, it cannot return from B to A, either by the same path or by a different one. It is this circumstance that one describes by saying that not only is there not direct reversibility, but also there is not even indirect reversibility.
But then he continues:
A theorem, easy to prove, tells us that a bounded world, governed only by the laws of mechanics, will always pass through a state very close to its initial state. On the other hand, according to accepted experimental laws (if one attributes absolute validity to them, and if one is willing to press their consequences to the extreme), the universe tends toward a certain final state, from which it will never depart. In this final state, which will be a kind of death, all bodies will be at rest at the same temperature.
But in fact, he says, the recurrence theorem shows that:
This state will not be the final death of the universe, but a sort of slumber, from which it will awake after millions of millions of centuries. According to this theory, to see heat pass from a cold body to a warm one … it will suffice to have a little patience. [And we may] hope that some day the telescope will show us a world in the process of waking up, where the laws of thermodynamics are reversed.
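Poincaré-style recurrence is easy to exhibit in a finite toy system. The sketch below uses Arnold's cat map on a grid of rational points (a much later standard example, standing in for Poincaré's celestial mechanics): since it is an invertible map of a finite set, every orbit must return exactly to where it started:

```python
def cat_map(x, y, n):
    """Arnold's cat map on an n x n grid of rational points: an invertible,
    measure-preserving toy dynamical system."""
    return (2 * x + y) % n, (x + y) % n

def recurrence_time(x0, y0, n):
    """Steps until the orbit first returns exactly to its starting point.
    Invertibility on a finite set guarantees this happens."""
    x, y = cat_map(x0, y0, n)
    t = 1
    while (x, y) != (x0, y0):
        x, y = cat_map(x, y, n)
        t += 1
    return t

# Recurrence times, typically far smaller than the n*n available states
for n in (5, 101, 1024):
    print(n, "->", recurrence_time(1, 0, n))
```

The continuous version Poincaré proved is subtler (return only arbitrarily close, and only for almost all initial conditions), but the finite caricature captures why "a little patience" is, in principle, all that is needed.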
By 1903, Poincaré was more strident in his critique of the formalism around the Second Law, writing (in English) in a paper entitled “On Entropy”:
But back in 1896, Boltzmann and the H theorem had another critic: Ernst Zermelo (1871–1953), a recent German math PhD who was then working with Max Planck on applied mathematics—though he would soon turn to foundations of mathematics and become the “Z” in ZFC set theory. Zermelo’s attack on the H theorem began with a paper entitled “On a Theorem of Dynamics and the Mechanical Theory of Heat”. After explaining Poincaré’s recurrence theorem, Zermelo gives some “mathematician-style” conditions (the gas must be in a finite region, must have no infinite energies, etc.), then says that even though there must exist states that would be nonrecurrent and could show irreversible behavior, there would necessarily be infinitely more states that “would periodically repeat themselves … with arbitrarily small variations”. And, he argues, such repetition would affect macroscopic quantities discernible by our senses. He continues:
In order to retain the general validity of the Second Law, we therefore would have to assume that just those initial states leading to irreversible processes are realized in nature, their small number notwithstanding, while the other ones, whose probability of existence is higher, mathematically speaking, do not actually occur.
And he concludes that the Poincaré recurrence phenomenon means that:
… it is certainly impossible to carry out a mechanical derivation of the Second Law on the basis of the existing theory without specializing the initial states.
Boltzmann responded promptly but quite impatiently:
I have pointed out particularly often, and as clearly as I possibly could … that the Second Law is but a principle of probability theory as far as the molecular-theoretic point of view is concerned. … While the theorem by Poincaré that Zermelo discusses in the beginning of his paper is of course correct, its application to heat theory is not.
Boltzmann talks about the H curve, and first makes a rather mathematician-style point about the order of limits:
If we first take the number of gas molecules to be infinite, as was clearly done in [my 1896 proof], and only then let the time grow very large, then, in the vast majority of cases, we obtain a curve asymptotically [always close to zero]. Moreover, as can easily be seen, Poincaré’s theorem is not applicable in this case. If, however, we take the time [span] to be infinitely great and, in contrast, the number of molecules to be very great but not absolutely infinite, then the H-curve has a different character. It almost always runs very close to [zero], but in rare cases it rises above that, in what we shall call a “hump” … at which significant deviations from the Maxwell velocity distribution can occur …
Boltzmann then argues that even if you start “at a hump”, you won’t stay there, and “over an enormously long period of time” you’ll see something infinitely close to “equilibrium behavior”. But, he says:
… it is [always] possible to reach again a greater hump of the H-curve by further extending the time … In fact, it is even the case that the original state must return, provided only that we continue to sufficiently extend the time…
He continues:
Mr. Zermelo is therefore right in claiming that, mathematically speaking, the motion is periodic. He has by no means succeeded, however, in refuting my theorems, which, in fact, are entirely consistent with this periodicity.
After giving arguments about the probabilistic character of his results, and (as we would now say it) the fact that a 1D random walk is certain to repeatedly return to the origin, Boltzmann says that:
… we must not conclude that the mechanical approach has to be modified in any way. This conclusion would be justified only if the approach had a consequence that runs contrary to experience. But this would be the case only if Mr. Zermelo were able to prove that the duration of the period within which the old state of the gas must recur in accordance with Poincaré’s theorem has an observable length…
He goes on to imagine “a trillion tiny spheres, each with a [certain initial velocity] … in the one corner of a box” (and by “trillion” he means million million million, or today’s quintillion) and then says that “after a short time the spheres will be distributed fairly evenly in the box”, but the period for a “Poincaré recurrence” in which they all will return to their original corner is “so great that nobody can live to see it happen”. And to make this point more forcefully, Boltzmann has an appendix in which he tries to get an actual approximation to the recurrence time, concluding that its numerical value “has many trillions of digits”.
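A back-of-envelope version of Boltzmann's appendix calculation (under the crude assumption, mine rather than his, that each sphere is independently in one of the 8 octants of the box) already gives a recurrence time whose digit count is itself astronomical:

```python
import math

N = 10 ** 18  # Boltzmann's "trillion" spheres (million million million)
# If each sphere is independently in one of the 8 octants of the box, the
# fraction of configurations with every sphere back in the original corner
# is 8**(-N), so the waiting time for that recurrence is of order 8**N steps.
digits = N * math.log10(8)
print(f"recurrence time ~ 8^N, a number with about {digits:.2e} digits")
```

That is roughly 9 × 10^17 digits, comfortably consistent with Boltzmann's "many trillions of digits".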
He concludes:
If we consider heat as a motion of molecules that occurs in accordance with the general equations of mechanics and assume that the arrangement of bodies that we perceive is currently in a highly improbable state, then a theorem follows that is in agreement with the Second Law for all phenomena so far observed.
Of course, this theorem can no longer hold once we observe bodies of so small a scale that they only contain a few molecules. Since, however, we do not have at hand any experimental results on the behavior of bodies so small, this assumption does not run counter to previous experience. In fact, certain experiments conducted on very small bodies in gases seem rather to support the assumption, although we are still far from being able to assert its correctness on the basis of experimental proof.
But then he gives an important caveat—with a small philosophical flourish:
Of course, we cannot expect natural science to answer the question as to why the bodies surrounding us currently exist in a highly improbable state, just as we cannot expect it to answer the question as to why there are any phenomena at all and why they adhere to certain given principles.
Unsurprisingly—particularly in view of his future efforts in the foundations of mathematics—Zermelo is unconvinced by all of this. And six months later he replies again in print. He admits that a full Poincaré recurrence might take astronomically long, but notes that (where, by “physical state”, he means one that we perceive):
… we are after all always concerned only with the “physical state”, which can be realized by many different combinations, and hence can recur much sooner.
Zermelo zeroes in on many of the weaknesses in Boltzmann’s arguments, saying that the thing he particularly “contests … is the analogy that is supposed to exist between the properties of the H curve and the Second Law”. He claims that irreversibility cannot be explained from “mechanical suppositions” without “new physical assumptions”—and in particular criteria for choosing appropriate initial states. He ends by saying that:
From the great successes of the kinetic theory of gases in explaining the relationships among states we must not deduce its … applicability also to temporal processes. … [For in this case I am] convinced that it necessarily fails in the absence of entirely new assumptions.
Boltzmann replies again—starting off with the strangely weak argument:
The Second Law receives a mechanical explanation by virtue of the assumption, which is of course unprovable, that the universe, when considered as a mechanical system, or at least a very extensive part thereof surrounding us, started out in a highly improbable state and still is in such a state.
And, yes, there’s clearly something missing in the understanding of the Second Law. And even as Zermelo pushes for formal mathematicianstyle clarity, Boltzmann responds with physiciststyle “reasonable arguments”. There’s lots of rhetoric:
The applicability of the calculus of probabilities to a particular case can of course never be proved with precision. If 100 out of 100,000 objects of a particular sort are consumed by fire per year, then we cannot infer with certainty that this will also be the case next year. On the contrary! If the same conditions continue to obtain for 10^{10} years, then it will often be the case during this period that the 100,000 objects are all consumed by fire at once on a single day, and even that not a single object suffers damage over the course of an entire year. Nevertheless, every insurance company places its faith in the calculus of probabilities.
Or, in justification of the idea that we live in a highly improbable “low-entropy” part of the universe:
I refuse to grant the objection that a mental picture requiring so great a number of dead parts of the universe for the explanation of so small a number of animated parts is wasteful, and hence inexpedient. I still vividly remember someone who adamantly refused to believe that the Sun’s distance from the Earth is 20 million miles on the ground that it would simply be foolish to assume so vast a space only containing luminiferous aether alongside so small a space filled with life.
Curiously—given his apparent reliance on “common-sense” arguments—Boltzmann also says:
I myself have repeatedly cautioned against placing excessive trust in the extension of our mental pictures beyond experience and issued reminders that the pictures of contemporary mechanics, and in particular the conception of the smallest particles of bodies as material points, will turn out to be provisional.
In other words, we don’t know that we can think of atoms (even if they exist at all) as points, and we can’t really expect our everyday intuition to tell us about how they work. Which presumably means that we need some kind of solid, “formal” argument if we’re going to explain the Second Law.
Zermelo didn’t respond again, and moved on to other topics. But Boltzmann wrote one more paper in 1897 about “A Mechanical Theorem of Poincaré” ending with two more why-it-doesn’t-apply-in-practice arguments:
Poincaré’s theorem is of course never applicable to terrestrial bodies which we can hold in our hands as none of them is entirely closed. Nor is it applicable to an entirely closed gas of the sort considered by the kinetic theory if first the number of molecules and only then the quotients of the intervals between two neighboring collisions in the observation time is allowed to become infinite.
Boltzmann—and Maxwell before him—had introduced the idea of using probability theory to discuss the emergence of thermodynamics and potentially the Second Law. But it wasn’t until around 1900—with the work of J. Willard Gibbs (1839–1903)—that a principled mathematical framework for thinking about this developed. And while we can now see that this framework distracts in some ways from several of the key issues in understanding the foundations of the Second Law, it’s been important in framing the discussion of what the Second Law really says—as well as being central in defining the foundations for much of what’s been done over the past century or so under the banner of “statistical mechanics”.
Gibbs seems to have first gotten involved with thermodynamics around 1870. He’d finished his PhD at Yale on the geometry of gears in 1863—getting the first engineering PhD awarded in the US. After traveling in Europe and interacting with various leading mathematicians and physicists, he came back to Yale (where he stayed for the remaining 34 years of his life) and in 1871 became professor of mathematical physics there.
His first papers (published in 1873 when he was already 34 years old) were in a sense based on taking seriously the formalism of equilibrium thermodynamics defined by Clausius and Maxwell—treating entropy and internal energy, just like pressure, volume and temperature, as variables that defined properties of materials (and notably whether they were solids, liquids or gases). Gibbs’s main idea was to “geometrize” this setup, and make it essentially a story of multivariate calculus:
Unlike the European developers of thermodynamics, Gibbs didn’t interact deeply with other scientists—with the possible exception of Maxwell, who (a few years before his death in 1879) made a 3D version of Gibbs’s thermodynamic surface out of clay—and supplemented his 2D thermodynamic diagrams after the first edition of his textbook Theory of Heat with renderings of 3D versions:
Three years later, Gibbs began publishing what would be a 300page work defining what has become the standard formalism for equilibrium chemical thermodynamics. He began with a quote from Clausius:
In the years that followed, Gibbs’s work—stimulated by Maxwell—mostly concentrated on electrodynamics, and later quaternions and vector analysis. But Gibbs published a few more small papers on thermodynamics—always in effect taking equilibrium (and the Second Law) for granted.
In 1882 a certain Henry Eddy (1844–1921) (who in 1879 had written a book on thermodynamics, and in 1890 would become president of the University of Cincinnati) claimed that “radiant heat” could be used to violate the Second Law:
Gibbs soon published a 2-page rebuttal (in the 6th-ever issue of Science magazine):
Then in 1889 Clausius died, and Gibbs wrote an obituary—praising Clausius but making it clear he didn’t think the kinetic theory of gases was a solved problem:
That same year Gibbs announced a short course that he would teach at Yale on “The a priori Deduction of Thermodynamic Principles from the Theory of Probabilities”. After a decade of work, this evolved into Gibbs’s last publication—an original and elegant book that’s largely defined how the Second Law has been thought about ever since:
The book begins by explaining that mechanics is about studying the time evolution of single systems:
But Gibbs says he is going to do something different: he is going to look at what he’ll call an ensemble of systems, and see how the distribution of their characteristics changes over time:
He explains that these “inquiries” originally arose in connection with deriving the laws of thermodynamics:
But he argues that this area—which he’s calling statistical mechanics—is worth investigating even independent of its connection to thermodynamics:
Still, he expects this effort will be relevant to the foundations of thermodynamics:
He immediately then goes on to what he’ll claim is the way to think about the relation of “observed thermodynamics” to his exact statistical mechanics:
Soon he makes the interesting—if, in the light of history, very overly optimistic—claim that “the laws of thermodynamics may be easily obtained from the principles of statistical mechanics”:
At first the text of the book reads very much like a typical mathematical work on mechanics:
But soon it’s “going statistical”, talking about the “density” of systems in “phase” (i.e. with respect to the variables defining the configuration of the system). And a few pages in, he’s proving the fundamental result that the density of “phase fluid” satisfies a continuity equation (which we’d now call the Liouville equation):
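In modern notation (a transcription, not Gibbs’s own symbols), the continuity result says that the density ρ(q, p, t) of ensemble members in phase space satisfies

```latex
\frac{\partial \rho}{\partial t}
  + \sum_i \left( \frac{\partial \rho}{\partial q_i}\,\dot{q}_i
               + \frac{\partial \rho}{\partial p_i}\,\dot{p}_i \right) = 0
```

which, using Hamilton’s equations, is equivalent to dρ/dt = 0 along trajectories: the “phase fluid” flows incompressibly.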
It’s all quite elegant, and all very rooted in the calculus-based mathematics of its time. He’s thinking about a collection of instances of a system. But while with our modern computational paradigm we’d readily be able to talk about a discrete list of instances, with his calculus-based approach he has to consider a continuous collection of instances—whose treatment inevitably seems more abstract and less explicit.
He soon makes contact with the “theory of errors”, discussing in effect how probability distributions over the space of possible states evolve. But what probability distributions should one consider? By chapter 4, he’s looking at what he calls (and is still called) the “canonical distribution”:
He gives a now-classic definition for the probability as a function of energy ϵ:
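For reference (rendered here in modern LaTeX rather than Gibbs’s typography), the canonical distribution assigns to a state of energy ϵ the probability density

```latex
P = e^{\frac{\psi - \epsilon}{\Theta}}
```

where Θ is what Gibbs calls the “modulus” of the distribution (in modern terms it plays the role of kT), and ψ is a constant fixed by the normalization ∫P dq dp = 1.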
He observes that this distribution combines nicely when independent parts of a system are brought together, and soon he’s noting that:
But so far he’s careful to just talk about how things are “analogous”, without committing to a true connection:
More than halfway through the book he’s defined certain properties of his probability distributions that “may … correspond to the thermodynamic notions of entropy and temperature”:
Next he’s on to the concept of a “microcanonical ensemble” that includes only states of a given energy. For him—with his continuum-based setup—this is a slightly elaborate thing to define; in our modern computational framework it actually becomes more straightforward than his “canonical ensemble”. Or, as he already says:
But what about the Second Law? Now he’s getting a little closer:
When he says “index of probability” he’s talking about the log of a probability in his ensemble, so this result is about the fact that this quantity is extremized when all the elements of the ensemble have equal probability:
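In modern terms: with η = log P as the “index of probability”, the result concerns the ensemble average

```latex
\overline{\eta} = \int P \log P \; dq \, dp
```

which, subject to the normalization ∫P dq dp = 1, is extremized exactly when P is constant over the allowed region—i.e. when all elements of the ensemble are equally probable. (Up to sign and a constant, this average index is what we would now call the entropy of the distribution.)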
Soon he’s discussing whether he can use his index as a way—like Boltzmann tried to do with his version of entropy—to measure deviations from “statistical equilibrium”:
But now Gibbs has hit one of the classic gotchas of his approach: if you look in perfect detail at the evolution of an ensemble of systems, there’ll never be a change in the value of his index—essentially because of the overall conservation of probability. Gibbs brings in what amounts to a commonsense physics argument to handle this. He says to consider putting “coloring matter” in a liquid that one stirs. And then he says that even though the liquid (like his phase fluid) is microscopically conserved, the coloring matter will still end up being “uniformly mixed” in the liquid:
He talks about how the conclusion about whether mixing happens in effect depends on what order one takes limits in. And while he doesn’t put it quite this way, he’s essentially realized that there’s a competition between the system “mixing things up more and more finely” and the observer being able to track finer and finer details. He realizes, though, that not all systems will show this kind of mixing behavior, noting for example that there are mechanical systems that’ll just keep going in simple cycles forever.
He doesn’t really resolve the question of why “practical systems” should show mixing, more or less ending with a statement that even though his underlying mechanical systems are reversible, it’s somehow “in practice” difficult to go back:
Despite things like this, Gibbs appears to have been keen to keep the majority of his book “purely mathematical”, in effect proving theorems that necessarily followed from the setup he had given. But in the penultimate chapter of the book he makes what he seems to have viewed as a less-than-satisfactory attempt to connect what he’s done with “real thermodynamics”. He doesn’t really commit to the connection, though, characterizing it more as an “analogy”:
But he soon starts to be pretty clear that he actually wants to prove the Second Law:
He quickly backs off a little, in effect bringing in the observer to soften the requirements:
But then he fires his best shot. He says that the quantities he’s defined in connection with his canonical ensemble satisfy the same equations as Clausius originally set up for temperature and entropy:
He adds that fluctuations (or “anomalies”, as he calls them) become imperceptible in the limit of a large system:
But in physical reality, why should one have a whole collection of systems as in the canonical ensemble? Gibbs suggests it would be more natural to look at the microcanonical ensemble—and in fact to look at a “time ensemble”, i.e. an averaging over time rather than an averaging over different possible states of the system:
Gibbs has proved some results (e.g. related to the virial theorem) about the relation between time and ensemble averages. But as the future of the subject amply demonstrates, they’re not nearly strong enough to establish any general equivalence. Still, Gibbs presses on.
In the end, though, as he himself recognized, things weren’t solved—and certainly the canonical ensemble wasn’t the whole story:
He discusses the tradeoff between having a canonical ensemble “heat bath” of a known temperature, and having a microcanonical ensemble with known energy. At one point he admits that it might be better to consider the time evolution of a single state, but basically decides that—at least in his continuous-probability-distribution-based formalism—he can’t really set this up:
Gibbs definitely encourages the idea that his “statistical mechanics” has successfully “derived” thermodynamics. But he’s ultimately quite careful and circumspect in what he actually says. He mentions the Second Law only once in his whole book—and then only to note that he can get the same “mathematical expression” from his canonical ensemble as Clausius’s form of the Second Law. He doesn’t mention Boltzmann’s H theorem anywhere in the book, and—apart from one footnote concerning “difficulties long recognized by physicists”—he mentions only Boltzmann’s work on theoretical mechanics.
One can view the main achievement of Gibbs’s book as having been to define a framework in which precise results about the statistical properties of collections of systems could be stated and in some cases derived. Within the mathematics and other formalism of the time, such ensemble results represented in a sense a distinctly “higher-order” description of things. Within our current computational paradigm, though, there’s much less of a distinction to be made: whether one’s looking at a single path of evolution, or a whole collection, one’s ultimately still just dealing with a computation. And that makes it clearer that—ensembles or not—one’s thrown back into the same kinds of issues about the origin of the Second Law. But even so, Gibbs provided a language in which to talk with some clarity about many of the things that come up.
In late 1867 Peter Tait (1831–1901)—a childhood friend of Maxwell’s who was by then a professor of “natural philosophy” in Edinburgh—was finishing his sixth book. It was entitled Sketch of Thermodynamics and gave a brief, historically oriented and not particularly conceptual outline of what was then known about thermodynamics. He sent a draft to Maxwell, who responded with a fairly long letter:
The letter begins:
I do not know in a controversial manner the history of thermodynamics … [and] I could make no assertions about the priority of authors …
Any contributions I could make … [involve] picking holes here and there to ensure strength and stability.
Then he continues (with “ΘΔcs” being his whimsical Greekified rendering of the word “thermodynamics”):
To pick a hole—say in the 2nd law of ΘΔcs, that if two things are in contact the hotter cannot take heat from the colder without external agency.
Now let A and B be two vessels divided by a diaphragm … Now conceive a finite being who knows the paths and velocities of all the molecules by simple inspection but who can do no work except open and close a hole in the diaphragm by means of a slide without mass. Let him … observe the molecules in A and when he sees one coming … whose velocity is less than the mean [velocity] of the molecules in B let him open the hole and let it go into B [and vice versa].
Then the number of molecules in A and B are the same as at first, but the energy in A is increased and that in B diminished, that is, the hot system has got hotter and the cold colder and yet no work has been done, only the intelligence of a very observant and neat-fingered being has been employed.
Or in short [we can] … restore a uniformly hot system to unequal temperatures… Only we can’t, not being clever enough.
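The sorting rule Maxwell describes is easy to simulate. Here’s a minimal, purely illustrative sketch (all names and parameters are invented for the example): two vessels start in equilibrium at the same temperature, and the “demon” rule alone drives their temperatures apart while conserving total energy:

```python
import random

def demon_sort(a, b, steps=10000, rng=random.Random(0)):
    """Toy Maxwell's-demon sorting: swap a slow molecule from A with a
    fast molecule from B, so A heats up and B cools, with no work done.
    `a`, `b` are lists of molecular kinetic energies (arbitrary units)."""
    a, b = a[:], b[:]
    for _ in range(steps):
        i, j = rng.randrange(len(a)), rng.randrange(len(b))
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        # The demon's rule, after Maxwell's letter: let a slower-than-average
        # molecule pass from A to B only when a faster-than-average one
        # passes from B to A, keeping the particle counts fixed.
        if a[i] < mean_b and b[j] > mean_a:
            a[i], b[j] = b[j], a[i]
    return a, b

rng = random.Random(42)
# Both vessels start in equilibrium at the same "temperature":
# energies drawn from the same exponential (Boltzmann-like) distribution.
A = [rng.expovariate(1.0) for _ in range(500)]
B = [rng.expovariate(1.0) for _ in range(500)]
A2, B2 = demon_sort(A, B)
print(sum(A2) / len(A2), sum(B2) / len(B2))  # A ends hotter than B
```

Of course this sidesteps exactly what later analyses of the demon focus on: the physical cost of the demon’s observations and decisions.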
And so it was that the idea of “Maxwell’s demon” was launched. Tait must at some point have shown Maxwell’s letter to Kelvin, who wrote on it:
Very good. Another way is to reverse the motion of every particle of the Universe and to preside over the unstable motion thus produced.
But the first place Maxwell’s demon idea appeared in print was in Maxwell’s 1871 textbook Theory of Heat:
Much of the book is devoted to what was by then quite traditional, experimentally oriented thermodynamics. But Maxwell included one final chapter:
Even in 1871, after all his work on kinetic theory, Maxwell is quite circumspect in his discussion of molecules:
But Maxwell’s textbook goes through a series of standard kinetic theory results, much as a modern textbook would. The second-to-last section in the whole book sounds a warning, however:
Interestingly, Maxwell continues, somewhat in anticipation of what Gibbs will say 30 years later:
But then there’s a reminder that this is being written in 1871, several decades before any clear observation of molecules was made. Maxwell says:
In other words, if there are water molecules, there must be something other than a law of averages that makes them all appear the same. And, yes, it’s now treated as a fundamental fact of physics that, for example, all electrons have exactly—not just statistically—the same properties such as mass and charge. But back in 1871 it was much less clear what characteristics molecules—if they existed as real entities at all—might have.
Maxwell included one last section in his book that to us today might seem quite wild:
In other words, aware of Darwin’s (1809–1882) 1859 Origin of Species, he’s considering a kind of “speciation” of molecules, along the lines of the discrete species observed in biology. But then he notes that unlike biological organisms, molecules are “permanent”, so their “selection” must come from some kind of pure separation process:
And at the very end he suggests that if molecules really are all identical, that suggests a level of fundamental order in the world that we might even be able to follow through to “exact principles of distributive justice” (presumably for people rather than molecules):
Maxwell has described rather clearly his idea of demons. But the actual name “demon” first appears in print in a paper by Kelvin in 1874:
It’s a British paper, so—in a nod to future nanomachinery—it’s talking about (molecular) cricket bats:
Kelvin’s paper—like his note written on Maxwell’s letter—imagines that the demons don’t just “sort” molecules; they actually reverse their velocities, thus in effect anticipating Loschmidt’s 1876 “reversibility objection” to Boltzmann’s H theorem.
In an undated note, Maxwell discusses demons, attributing the name to Kelvin—and then starts considering the “physicalization” of demons, simplifying what they need to do:
Concerning Demons.
1. Who gave them this name? Thomson.
2. What were they by nature? Very small BUT lively beings incapable of doing work but able to open and shut valves which move without friction or inertia.
3. What was their chief end? To show that the 2nd Law of Thermodynamics has only a statistical certainty.
4. Is the production of an inequality of temperature their only occupation? No, for less intelligent demons can produce a difference in pressure as well as temperature by merely allowing all particles going in one direction while stopping all those going the other way. This reduces the demon to a valve. As such value him. Call him no more a demon but a valve like that of the hydraulic ram, suppose.
It didn’t take long for Maxwell’s demon to become something of a fixture in expositions of thermodynamics, even if it wasn’t clear how it connected to other things people were saying about thermodynamics. And in 1879, for example, Kelvin gave a talk all about Maxwell’s “sorting demon” (like other British people of the time he referred to Maxwell as “Clerk Maxwell”):
Kelvin describes—without much commentary, and without mentioning the Second Law—some of the feats of which the demon would be capable. But he adds:
The description of the lecture ends:
Presumably no actual Maxwell’s demon was shown—or Kelvin wouldn’t have continued for the rest of his life to treat the Second Law as an established principle.
But in any case, Maxwell’s demon has always remained something of a fixture in discussions of the foundations of the Second Law. One might think that the observability of Brownian motion would make something like a Maxwell’s demon possible. And indeed in 1912 Marian Smoluchowski (1872–1917) suggested experiments that one could imagine would “systematically harvest” Brownian motion—but showed that in fact they couldn’t. In later years, a sequence of arguments was advanced that the mechanism of a Maxwell’s demon just couldn’t work in practice—though even today microscopic versions of what amount to Maxwell’s demons are routinely being investigated.
We’ve finally now come to the end of the story of how the original framework for the Second Law came to be set up. And, as we’ve seen, only a fairly small number of key players were involved:
So what became of these people? Carnot lived a generation earlier than the others, never made a living as a scientist, and was all but unknown in his time. But all the others had distinguished careers as academic scientists, and were widely known in their time. Clausius, Boltzmann and Gibbs are today celebrated mainly for their contributions to thermodynamics; Kelvin and Maxwell also for other things. Clausius and Gibbs were in a sense “pure professors”; Boltzmann, Maxwell and especially Kelvin also had engagement with the more general public.
All of them spent the majority of their lives in the countries of their birth—and all (with the exception of Carnot) were able to live out the entirety of their lives without time-consuming disruptions from war or other upheavals:
Almost all of what is known about Sadi Carnot as a person comes from a single biographical note written nearly half a century after his death by his younger brother Hippolyte Carnot (who was a distinguished French politician—and sometime education minister—and father of the Sadi Carnot who would become president of France). Hippolyte Carnot began by saying that:
As the life of Sadi Carnot was not marked by any notable event, his biography would have occupied only a few lines; but a scientific work by him, after remaining long in obscurity, brought again to light many years after his death, has caused his name to be placed among those of great inventors.
The Carnots’ father was close to Napoleon, and Hippolyte explains that when Sadi was a young child he ended up being babysat by “Madame Bonaparte”—but one day wandered off, and was found inspecting the operation of a nearby mill, and quizzing the miller about it. For the most part, however, throughout his life, Sadi Carnot apparently kept very much to himself—while with quiet intensity showing a great appetite for intellectual pursuits from mathematics and science to art, music and literature, as well as practical engineering and the science of various sports.
Even his brother Hippolyte can’t explain quite how Sadi Carnot—at the age of 28—suddenly “came out” and in 1824 published his book on thermodynamics. (As we discussed above, it no doubt had something to do with the work of his father, who had died the year before.) Sadi Carnot funded the publication of the book himself—having 600 copies printed (at least some of which remained unsold a decade later). But after the book was published, Carnot appears to have returned to just privately doing research, living alone, and never publishing again in his lifetime. And indeed he lived only another eight years, dying (apparently after some months of ill health) in the same Paris cholera outbreak that claimed General Lamarque of Les Misérables fame.
Twenty-three pages of unpublished personal notes survive from the period after the publication of Carnot’s book. Some are general aphorisms and life principles:
Speak little of what you know, and not at all of what you do not know.
Why try to be witty? I would rather be thought stupid and modest than witty and pretentious.
God cannot punish man for not believing when he could so easily have enlightened and convinced him.
The belief in an allpowerful Being, who loves us and watches over us, gives to the mind great strength to endure misfortune.
When walking, carry a book, a notebook to preserve ideas, and a piece of bread in order to prolong the walk if need be.
But others are more technical—and in fact reveal that Carnot, despite having based his book on caloric theory, had realized that it probably wasn’t correct:
When a hypothesis no longer suffices to explain phenomena, it should be abandoned. This is the case with the hypothesis which regards caloric as matter, as a subtile fluid.
The experimental facts tending to destroy this theory are as follows: The development of heat by percussion or friction of bodies … The elevation of temperature which takes place [when] air [expands into a] vacuum …
He continues:
At present, light is generally regarded as the result of a vibratory movement of the ethereal fluid. Light produces heat, or at least accompanies radiating heat, and moves with the same velocity as heat. Radiating heat is then a vibratory movement. It would be ridiculous to suppose that it is an emission of matter while the light which accompanies it could be only a movement.
Could a motion (that of radiating heat) produce matter (caloric)? No, undoubtedly; it can only produce a motion. Heat is then the result of a motion.
And then—in a rather clear enunciation of what would become the First Law of thermodynamics:
Heat is simply motive power, or rather motion which has changed form. It is a movement among the particles of bodies. Wherever there is destruction of motive power there is, at the same time, production of heat in quantity exactly proportional to the quantity of motive power destroyed. Reciprocally, wherever there is destruction of heat, there is production of motive power.
Carnot also wonders:
Liquefaction of bodies, solidification of liquids, crystallization—are they not forms of combinations of integrant molecules? Supposing heat due to a vibratory movement, how can the passage from the solid or the liquid to the gaseous state be explained?
There is no indication of how Carnot felt about this emerging rethinking of thermodynamics, or of how it might affect the results in his book. Carnot clearly hoped to do experiments (as outlined in his notes) to test what was really going on. But as it was, he presumably didn’t get around to any of them—and his notes, ahead of their time as they were, did not resurface for many decades, by which time the ideas they contained had already been discovered by others.
Rudolf Clausius was born in what’s now Poland (and was then Prussia), one of more than 14 children of an education administrator and pastor. He went to university in Berlin, and, after considering doing history, eventually specialized in math and physics. After graduating in 1844 he started teaching at a top high school in Berlin (which he did for 6 years), and meanwhile earned his PhD in physics. His career took off after his breakout paper on thermodynamics appeared in 1850. For a while he was a professor in Berlin, then for 12 years in Zürich, then briefly in Würzburg, then—for the remaining 19 years of his life—in Bonn.
He was a diligent—if, one suspects, somewhat stiff—professor, notable for the clarity of his lectures, and his organizational care with students. He seems to have been a competent administrator, and late in his career he spent a couple of years as the president (“rector”) of his university. But first and foremost, he was a researcher, writing about a hundred papers over the course of his career. Most physicists of the time devoted at least some of their efforts to doing actual physics experiments. But Clausius was a pioneer in the idea of being a “pure theoretical physicist”, inspired by experiments and quoting their results, but not doing them himself.
The majority of Clausius’s papers were about thermodynamics, though late in his career his emphasis shifted more to electrodynamics. Clausius’s papers were original, clear, incisive and often fairly mathematically sophisticated. But from his very first paper on thermodynamics in 1850, he very much adopted a macroscopic approach, talking about what he considered to be “bulk” quantities like energy, and later entropy. He did explore some of the potential mechanics of molecules, but he never really made the connection between molecular phenomena and entropy—or the Second Law. He had a number of run-ins about academic credit with Kelvin, Tait, Maxwell and Boltzmann, but he didn’t seem to ever pay much attention to, for example, Boltzmann’s efforts to find molecular-based probabilistic derivations of Clausius’s results.
It probably didn’t help that after two decades of highly productive work, two misfortunes befell Clausius. First, in 1870, he had volunteered to lead an ambulance corps in the Franco-Prussian war, and was wounded in the knee, leading to chronic pain (as well as to his habit of riding to class on horseback). And then, in 1875, Clausius’s wife died in the birth of their sixth child—leaving him to care for six young children (which apparently he did with great conscientiousness). Clausius nevertheless continued to pursue his research—even to the end of his life—receiving many honors along the way (like election to no less than 40 professional societies), but it never again rose to the level of significance of his early work on thermodynamics and the Second Law.
Of the people we’re discussing here, by far the most famous during their lifetime was Kelvin. In his long career he wrote more than 600 scientific papers, received dozens of patents, started several companies and served in many administrative and governmental roles. His father was a math professor, ultimately in Glasgow, who took a great interest in the education of his children. Kelvin himself got an early start, effectively going to college at the age of 10, and becoming a professor in Glasgow at the age of 22—a position in which he continued for 53 years.
Kelvin’s breakout work, done in his twenties, was on thermodynamics. But over the years he also worked on many other areas of physics, and beyond, mixing theory, experiment and engineering. Beginning in 1854 he became involved in a technical megaproject of the time: the attempt to lay a transatlantic telegraph cable. He wound up very much on the front lines, helping out as a just-in-time physicist + engineer on the cable-laying ship. The first few attempts didn’t work out, but finally in 1866—in no small part through Kelvin’s contributions—a cable was successfully laid, and Kelvin (or William Thomson, as he then was) became something of a celebrity. He was made “Sir William Thomson” and—along with two other techies—formed his first company, which had considerable success in exploiting telegraph-cable-related engineering innovations.
Kelvin’s first wife died after a long illness in 1870, and Kelvin, with no children and already enthusiastic about the sea, bought a fairly large yacht, and pursued a number of nautical-related projects. One of these—begun in 1872—was the construction of an analog computer for calculating tides (basically with 10 gears for adding up 10 harmonic tide components), a device that, with progressive refinements, continued to be used for close to a century.
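The computation Kelvin’s machine performed mechanically is just a sum of cosines. Here’s an illustrative sketch (using real constituent periods such as M2’s 12.42 hours, but made-up amplitudes and phases, not the harmonic constants of any actual port):

```python
import math

# Hypothetical harmonic constituents: (amplitude in meters, period in
# hours, phase in radians). The periods are the standard tidal ones;
# the amplitudes and phases are invented for the example.
components = [
    (1.20, 12.42, 0.0),   # M2: principal lunar semidiurnal
    (0.57, 12.00, 1.0),   # S2: principal solar semidiurnal
    (0.26, 12.66, 2.1),   # N2: larger lunar elliptic semidiurnal
    (0.31, 23.93, 0.4),   # K1: lunisolar diurnal
    (0.25, 25.82, 1.7),   # O1: lunar diurnal
]

def tide_height(t_hours):
    """Sum of harmonic constituents -- the kind of computation Kelvin's
    10-component machine performed with gears and pulleys."""
    return sum(a * math.cos(2 * math.pi * t_hours / period + phase)
               for a, period, phase in components)

# Tabulate one day of predicted heights
heights = [tide_height(t) for t in range(24)]
print(min(heights), max(heights))
```

Kelvin’s machine did exactly this summation in hardware: each gear turned at one constituent’s frequency, and a pulley system added the components to drive a pen tracing the predicted tide curve.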
Being rather charmed by Kelvin’s physicist-with-a-big-yacht persona, I once purchased a letter that Kelvin wrote in 1877 on the letterhead of “Yacht Lalla Rookh”:
The letter—in true academic style—promises that Kelvin will soon send an article he’s been asked to write on elasticity theory. And in fact he did write the article, and it was an expository one that appeared in the 9th edition of the Encyclopedia Britannica.
Kelvin was a prolific (if, to modern ears, sometimes rather pompous) writer, who took exposition seriously. And indeed—finding the textbooks available to him as a professor inadequate—he worked over the course of a dozen years (1855–1867) with his (and Maxwell’s) friend Peter Guthrie Tait to produce the influential Treatise on Natural Philosophy.
Kelvin explored many topics and theories, some more immediately successful than others. In the 1870s he suggested that perhaps atoms might be knotted vortices in the (luminiferous) aether (causing Tait to begin developing knot theory)—a hypothesis that’s in some sense a Victorian prelude to modern ideas about particles in our Physics Project.
Throughout his life, Kelvin was a devout Christian, writing that “The more thoroughly I conduct scientific research, the more I believe science excludes atheism.” And indeed this belief seems to make an appearance in his implication that humans—presumably as a result of their special relationship with God—might avoid the Second Law. But more significant at the time was Kelvin’s skepticism about Charles Darwin’s 1859 theory of natural selection, believing that there must in the end be a “continually guiding and controlling intelligence”. Despite being somewhat ridiculed for it, Kelvin talked about the possibility that life might have come to Earth from elsewhere via meteorites, believing that his estimates of the age of the Earth (which didn’t take into account radioactivity) made it too young for the things Darwin described to have occurred.
By the 1870s, Kelvin had become a distinguished man of science, receiving all sorts of honors, assignments and invitations. And in 1876, for example, he was invited to Philadelphia to chair the committee judging electrical inventions at the US Centennial International Exhibition, notably reporting, in the terms of the time:
Then in 1892 a “peerage of the realm” was conferred on him by Queen Victoria. His wife (he had remarried) and various friends (including Charles Darwin’s son George) suggested he pick the title “Kelvin”, after the River Kelvin that flowed by the university in Glasgow. And by the end of his life “Lord Kelvin” had accumulated enough honorifics that they were just summarized with “…” (the MD was an honorary degree conferred by the University of Heidelberg because “it was the only one at their disposal which he did not already possess”):
And when Kelvin died in 1907 he was given a state funeral and buried in Westminster Abbey near Newton and Darwin.
James Clerk Maxwell lived only 48 years but in that time managed to do a remarkable amount of important science. His early years were spent on a 1500-acre family estate (inherited by his father) in a fairly remote part of Scotland—to which he would return later. He was an only child and was homeschooled—initially by his mother, until she died, when he was 8. At 10 he went to an upscale school in Edinburgh, and by the age of 14 had written his first scientific paper. At 16 he went as an undergraduate to the University of Edinburgh, then, effectively as a graduate student, to Cambridge—coming second in the final exams (“Second Wrangler”) to a certain Edward Routh, who would spend most of his life coaching other students on those very same exams.
Within a couple of years, Maxwell was a professor, first in Aberdeen, then in London. In Aberdeen he married the daughter of the university president, who would soon be his “Observer K” (for “Katherine”) in his classic work on color vision. But after nine fairly strenuous years as a professor, Maxwell in 1865 “retired” to his family estate, supervising a house renovation, and in “rural solitude” (recreationally riding around his estate on horseback with his wife) having the most scientifically productive time of his life. In addition to his work on things like the kinetic theory of gases, he also wrote his 2-volume Treatise on Electricity and Magnetism, which ultimately took 7 years to finish, and which, with considerable clarity, described his approach to electromagnetism and what are now called “Maxwell’s Equations”. Occasionally, there were hints of his “country life”—like his 1870 “On Hills and Dales” that in his characteristic mathematicize-everything way gave a kind of “pretopological” analysis of contour maps (perhaps conceived as he walked half a mile every day down to the mailbox at which journals and correspondence would arrive):
As a person, Maxwell was calm, reserved and unassuming, yet cheerful and charming—and given to writing (arguably sometimes sophomoric) poetry:
With a certain sense of the absurd, he would occasionally publish satirical pieces in Nature, signing them dp/dt, which in the thermodynamic notation created by his friend Tait was equal to JCM, which were his initials. Maxwell liked games and tricks, and spinning tops featured prominently in some of his work. He enjoyed children, though never had any of his own. As a lecturer, he prepared diligently, but often got too sophisticated for his audience. In writing, though, he showed both great clarity and great erudition, for example freely quoting Latin and Greek in articles he wrote for the 9th edition of the Encyclopedia Britannica (of which he was scientific coeditor) on topics such as “Atom” and “Ether”.
As we mentioned above, Maxwell was quite an enthusiast of diagrams and visual presentation (even writing an article on “Diagrams” for the Encyclopedia Britannica). He was also a capable experimentalist, making many measurements (sometimes along with his wife), and in 1861 creating the first color photograph.
In 1871 William Cavendish, 7th Duke of Devonshire, who had studied math in Cambridge, and was now chancellor of the university, agreed to put up the money to build what became the Cavendish Laboratory and to endow a new chair of experimental physics. Kelvin having turned down the job, it was offered to the still-rather-obscure Maxwell, who somewhat reluctantly accepted—with the result that for several years he spent much of his time supervising the design and building of the lab.
The lab was finished in 1874, but then William Cavendish dropped on Maxwell a large collection of papers from his great uncle Henry Cavendish, who had been a wealthy “gentleman scientist” of the late 1700s and (among other things) the discoverer of hydrogen. Maxwell liked history (as some of us do!), noticed that Cavendish had discovered Ohm’s law 50 years before Ohm, and in the end spent several years painstakingly editing and annotating the papers into a 500-page book. By 1879 Maxwell was finally ready to energetically concentrate on physics research again, but, sadly, in the fall of that year his health failed, and he died at the age of 48—having succumbed to stomach cancer, as his mother also had at almost the same age.
Gibbs was born near the Yale campus, and died there 64 years later, in the same house where he had lived since he was 7 years old (save for three years spent visiting European universities as a young man, and regular summer “out-in-nature” vacations). His father (who, like “our Gibbs”, was named “Josiah Willard”—making “our Gibbs” be called “Willard”) came from an old and distinguished intellectual and religious New England family, and was a professor of sacred languages at Yale. Willard Gibbs went to college and graduate school at Yale, and then spent his whole career as a professor at Yale.
He was, it seems, a quiet, modest and rather distant person, who radiated a certain serenity, regularly attended church, had a small circle of friends and lived with his two sisters (and the husband and children of one of them). He diligently discharged his teaching responsibilities, though his lectures were very sparsely attended, and he seems not to have been thought forceful enough in dealing with people to have been called on for many administrative tasks—though he became the treasurer of his former high school, and himself was careful enough with money that by the end of his life he had accumulated what would now be several million dollars.
He had begun his academic career in practical engineering, for example patenting an “improved [railway] car brake”, but was soon drawn in more mathematical directions, favoring a certain clarity and minimalism of formulation, and a cleanliness, if not brevity, of exposition. His work on thermodynamics (initially published in the rather obscure Transactions of the Connecticut Academy) was divided into two parts: the first, in the 1870s, concentrating on macroscopic equilibrium properties, and the second, in the 1890s, concentrating on microscopic “statistical mechanics” (as Gibbs called it). Even before he started on thermodynamics, he’d been interested in electromagnetism, and between his two “thermodynamic periods”, he again worked on electromagnetism. He studied Maxwell’s work, and was at first drawn to the then-popular formalism of quaternions—but soon decided to invent his own approach and notation for vector analysis, which at first he presented only in notes for his students, though it later became widely adopted.
And while Gibbs did increasingly mathematical work, he never seems to have identified as a mathematician, modestly stating that “If I have had any success in mathematical physics, it is, I think, because I have been able to dodge mathematical difficulties.” His last work was his book on statistical mechanics, which—with considerable effort and perhaps damage to his health—he finished in time for publication in connection with the Yale bicentennial in 1901 (an event which notably also brought a visit from Kelvin), only to die soon thereafter.
Gibbs had a few graduate students at Yale, a notable one being Lee de Forest, inventor of the vacuum tube (triode) electronic amplifier, and radio entrepreneur. (de Forest’s 1899 PhD thesis was entitled “Reflection of Hertzian Waves from the Ends of Parallel Wires”.) Another student of Gibbs was Lynde Wheeler, who became a government radio scientist, and who wrote a biography of Gibbs, of which I have a copy bought years ago at a used bookstore—that I was now just about to put back on a shelf when I opened its front cover and found an inscription:
And, yes, it’s a small world, and “To Willard” refers to Gibbs’s sister’s son (Willard Gibbs Van Name, who became a naturalist and wrote a 1929 book about national park deforestation).
Of the people we’re discussing, Boltzmann is the one whose career was most focused on the Second Law. Boltzmann grew up in Austria, where his father was a civil servant (who died when Boltzmann was 15) and his mother was something of an heiress. Boltzmann did his PhD at the University of Vienna, where his professor notably gave him a copy of some of Maxwell’s papers, together with an English grammar book. Boltzmann started publishing his own papers near the end of his PhD, and soon landed a position as a professor of mathematical physics in Graz. Four years later he moved to Vienna as a professor of mathematics, soon moving back to Graz as a professor of “general and experimental physics”—a position he would keep for 14 years.
He’d married in 1876, and had 5 children, though a son died in 1889, leaving 3 daughters and another son. Boltzmann was apparently a clear and lively lecturer, as well as a spirited and eager debater. He seems, at least in his younger years, to have been a happy and gregarious person, with a strong taste for music—and some charming do-it-your-own-way tendencies. For example, wanting to provide fresh milk for his children, he decided to just buy a cow, which he then led from the market through the streets—though he had to consult his colleague, the professor of zoology, to find out how to milk it. Boltzmann was a capable experimental physicist, as well as a creator of gadgets, and a technology enthusiast—promoting the idea of airplanes (an application for gas theory!) and noting their potential power as a means of transportation.
Boltzmann had always had mood swings, but by the early 1890s he claimed they were getting worse. It didn’t help that he was worn down by administrative work, and had worsening asthma and increasing nearsightedness (that he’d thought might be a sign of going blind). He moved positions, but then came back to Vienna, where he embarked on writing what would become a 2-volume book on Gas Theory—in effect contextualizing his life’s work. The introduction to the first volume laments that “gas theory has gone out of fashion in Germany”. The introduction to the second volume, written in 1898 when Boltzmann was 54, then says that “attacks on the theory of gases have begun to increase”, and continues:
… it would be a great tragedy for science if the theory of gases were temporarily thrown into oblivion because of a momentary hostile attitude toward it, as, for example, was the wave theory [of light] because of Newton’s authority.
I am conscious of being only an individual struggling weakly against the stream of time. But it still remains in my power to contribute in such a way that, when the theory of gases is again revived, not too much will have to be rediscovered.
But even as he was writing this, Boltzmann had pretty much already wound down his physics research, and had basically switched to exposition, and to philosophy. He moved jobs again, but in 1902 once more came back to Vienna, now also as a professor of philosophy. He gave an inaugural lecture, first quoting his predecessor Ernst Mach (1838–1916) as saying “I do not believe that atoms exist”, then discussing the philosophical relations between reality, perception and models. Elsewhere he discussed things like his view of the different philosophical character of models associated with differential equations and with atomism—and he even wrote an article on the general topic of “Models” for Encyclopedia Britannica (which curiously also talks about “in pure mathematics, especially geometry, models constructed of papier-mâché and plaster”). Sometimes Boltzmann’s philosophy could be quite polemical, like his attack on Schopenhauer, which ends by saying that “men [should] be freed from the spiritual migraine that is called metaphysics”.
Then, in 1904, Boltzmann addressed the Vienna Philosophical Society (a kind of predecessor of the Vienna Circle) on the subject of a “Reply to a Lecture on Happiness by Professor Ostwald”. Wilhelm Ostwald (1853–1932) (a chemist and social reformer, who was a personal friend of Boltzmann’s, but an intellectual adversary) had proposed the concept of “energy of will” to apply mathematical physics ideas to psychology. Boltzmann mocked this, describing its faux formalism as “dangerous for science”. Meanwhile, Boltzmann gave his own Darwinian theory for the origin of happiness, based essentially on the idea that unhappiness is needed as a way to make organisms improve their circumstances in the struggle for survival.
Boltzmann himself was continuing to have problems that he attributed to the then-popular but very vague “diagnosis” of “neurasthenia”, and had even briefly been in a psychiatric hospital. But he continued to do things like travel. He visited the US three times, in 1905 going to California (mainly Berkeley)—which led him to write a witty piece entitled “A German Professor’s Trip to El Dorado” that concluded:
Yes, America will achieve great things. I believe in these people, even after seeing them at work in a setting where they’re not at their best: integrating and differentiating at a theoretical physics seminar…
In 1905 Einstein published his Boltzmann-and-atomism-based results on Brownian motion and on photons. But it’s not clear Boltzmann ever knew about them. For Boltzmann was sinking further. Perhaps he’d overexerted himself in California, but by the spring of 1906 he said he was no longer able to teach. In the summer he went with his family to an Italian seaside resort in an attempt to rejuvenate. But a day before they were to return to Vienna he failed to join his family for a swim, and his youngest daughter found him hanged in his hotel room, dead at the age of 62.
After Gibbs’s 1902 book introducing the idea of ensembles, most of the language used (at least until now!) to discuss the Second Law was basically in place. But in 1912 one additional term—representing a concept already implicit in Gibbs’s work—was added: coarse-graining. Gibbs had discussed how the phase fluid representing possible states of a system could be elaborately mixed by the mechanical time evolution of the system. But realistic practical measurements could not be expected to probe all the details of the distribution of phase fluid; instead one could say that they would only sample “coarse-grained” aspects of it.
The term “coarse-graining” first appeared in a survey article entitled “The Conceptual Foundations of the Statistical Approach in Mechanics”, written for the German-language Encyclopaedia of the Mathematical Sciences by Boltzmann’s former student Paul Ehrenfest, and his wife Tatiana Ehrenfest-Afanassjewa:
The article also introduced all sorts of now-standard notation, and in many ways can be read as a final summary of what was achieved in the original development around the foundations of thermodynamics and the Second Law. (And indeed the article was sufficiently “final” that when it was republished as a book in 1959 it could still be presented as usefully summarizing the state of things.)
Looking at the article now, though, it’s notable how much it recognized was not at all settled about the Second Law and its foundations. It places Boltzmann squarely at the center, stating in its preface:
The section titles are already revealing:
And soon they’re starting to talk about “loose ends”, and lots of them. Ergodicity is something one can talk about, but there’s no known example (and with this definition it was later proved that there couldn’t be):
But, they point out, it’s something Boltzmann needed in order to justify his results:
Soon they’re talking about Boltzmann’s sloppiness in his discussion of the H curve:
And then they’re on to talking about Gibbs, and the gaps in his reasoning:
In the end they conclude:
In other words, even though people now seem to be buying all these results, there are still plenty of issues with their foundations. And despite people’s implicit assumptions, we can in no way say that the Second Law has been “proved”.
It was already realized in the 1600s that when objects get hot they emit “heat radiation”—which can be transferred to other bodies as “radiant heat”. And particularly following Maxwell’s work in the 1860s on electrodynamics it came to be accepted that radiant heat was associated with electromagnetic waves propagating in the “luminiferous aether”. But unlike the molecules from which it was increasingly assumed that one could think of matter as being made, these electromagnetic waves were always treated—particularly on the basis of their mathematical foundations in calculus—as fundamentally continuous.
But how might this relate to the Second Law? Could it be, perhaps, that the Second Law should ultimately be attributed not to some property of the largescale mechanics of discrete molecules, but rather to a feature of continuous radiant heat?
The basic equations assumed for mechanics—originally due to Newton—are reversible. But what about the equations for electrodynamics? Maxwell’s equations are in and of themselves also reversible. But when one thinks about their solutions for actual electromagnetic radiation, there can be fundamental irreversibility. And the reason is that it’s natural to describe the emission of radiation (say from a hot body), but then to assume that, once emitted, the radiation just “escapes to infinity”—rather than ever reversing the process of emission by being absorbed by some other body.
All the various people we’ve discussed above, from Clausius to Gibbs, made occasional remarks about the possibility that the Second Law—whether or not it could be “derived mechanically”—would still ultimately work, if nothing else, because of the irreversible emission of radiant heat.
But the person who would ultimately be most intimately connected to these issues was Max Planck—though in the end the somewhat-confused connection to the Second Law would recede in importance relative to what emerged from it, which was basically the raw material that led to quantum theory.
As a student of Helmholtz’s in Berlin, Max Planck got interested in thermodynamics, and in 1879 wrote a 61-page PhD thesis entitled “On the Second Law of Mechanical Heat Theory”. It was a traditional (if slightly streamlined) discussion of the Second Law, very much based on Clausius’s approach (and even with the same title as Clausius’s 1867 paper)—and without any mention whatsoever of Boltzmann:
For most of the two decades that followed, Planck continued to use similar methods to study the Second Law in various settings (e.g. elastic materials, chemical mixtures, etc.)—and meanwhile ascended the German academic physics hierarchy, ending up as a professor of theoretical physics in Berlin. Planck was in many ways a physics traditionalist, not wanting to commit to things like “newfangled” molecular ideas—and as late as 1897 (with his assistant Zermelo having made his “recurrence objection” to Boltzmann’s work) still saying that he would “abstain completely from any definite assumption about the nature of heat”. But regardless of its foundations, Planck was a true believer in the Second Law, for example in 1891 asserting that it “must extend to all forces of nature … not only thermal and chemical, but also electrical and other”.
And in 1895 he began to investigate how the Second Law applied to electrodynamics—and in particular to the “heat radiation” that it had become clear (particularly through Heinrich Hertz’s (1857–1894) experiments) was of electromagnetic origin. In 1896 Wilhelm Wien (1864–1928) suggested that the heat radiation (or what we now call blackbody radiation) was in effect produced by tiny Hertzian oscillators with velocities following a Maxwell distribution.
Planck, however, had a different viewpoint, instead introducing the concept of “natural radiation”—a kind of intrinsic thermal equilibrium state for radiation, with an associated intrinsic entropy. He imagined “resonators” interacting through Maxwell’s equations with this radiation, and in 1899 invented a (rather arbitrary) formula for the entropy of these resonators, that implied (through the laws of electrodynamics) that overall entropy would increase—just like the Second Law said—and when the entropy was maximized it gave the same result as Wien for the spectrum of blackbody radiation. In early 1900 he sharpened his treatment and began to suggest that with his approach Wien’s form of the blackbody spectrum would emerge as a provable consequence of the universal validity of the Second Law.
But right around that time experimental results arrived that disagreed with Wien’s law. And by the end of 1900 Planck had a new hypothesis, for which he finally began to rely on ideas from Boltzmann. Planck started from the idea that he should treat the behavior of his resonators statistically. But how then could he compute their entropy? He quotes (for the first time ever) his simplification of Boltzmann’s formula for entropy: S = k log W.
As he explains it—claiming now, after years of criticizing Boltzmann, that this is a “theorem”:
We now set the entropy S of the system proportional to the logarithm of its probability W… In my opinion this actually serves as a definition of the probability W, since in the basic assumptions of electromagnetic theory there is no definite evidence for such a probability. The suitability of this expression is evident from the outset, in view of its simplicity and close connection with a theorem from kinetic gas theory.
But how could he figure out the probability for a resonator to have a certain energy, and thus a certain entropy? For this he turns directly to Boltzmann—who, as a matter of convenience in his 1877 paper had introduced discrete values of energy for molecules. Planck simply states that it’s “necessary” (i.e. to get the experimentally right answer) to treat the resonator energy “not as a continuous, infinitely divisible quantity, but as a discrete quantity composed of an integral number of finite equal parts”. As an example of how this works he gives a table just like the one in Boltzmann’s paper from nearly a quarter of a century earlier:
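In modern combinatorial terms, what such a table enumerates is the number of ways to distribute P indistinguishable energy units among N resonators—Planck’s “complexions”, counted by the binomial coefficient C(N + P − 1, P). A minimal sketch of that counting (in Python rather than anything from the period; the function names are just illustrative):

```python
from math import comb
from itertools import product

# Number of ways to distribute P indistinguishable energy units
# among N resonators (Planck's "complexions"): C(N + P - 1, P)
def complexions(N, P):
    return comb(N + P - 1, P)

# Brute-force check: enumerate all occupation-number assignments directly
def brute_force(N, P):
    return sum(1 for occ in product(range(P + 1), repeat=N) if sum(occ) == P)

print(complexions(3, 4))   # 15
print(brute_force(3, 4))   # 15
```

Boltzmann’s 1877 table and Planck’s version of it both amount to enumerations of exactly this kind.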
Pretty soon he’s deriving the entropy of a resonator as a function of its energy U and its discrete energy unit ϵ: S = k[(1 + U/ϵ) log(1 + U/ϵ) − (U/ϵ) log(U/ϵ)].
Connecting this to blackbody radiation he claims that each resonator’s energy unit is connected to its frequency ν according to ϵ = hν, so that its entropy is S = k[(1 + U/(hν)) log(1 + U/(hν)) − (U/(hν)) log(U/(hν))], “[where] h and k are universal constants”.
In a similar situation Boltzmann had effectively taken the limit ϵ→0, because that’s what he believed corresponded to (“calculus-based”) physical reality. But Planck—in what he later described as an “act of desperation” to fit the experimental data—didn’t do that. So in computing things like average energies he’s evaluating Sum[x Exp[a x], {x, 0, ∞}] rather than Integrate[x Exp[a x], {x, 0, ∞}]. And in doing this it takes him only a few lines to derive what’s now called the Planck spectrum for blackbody radiation (i.e. for “radiation in equilibrium”):
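In modern terms, keeping the sum discrete gives an average resonator energy ϵ/(e^(ϵ/kT) − 1), which approaches the classical value kT only in the limit ϵ→0. A quick numeric check of that (a Python sketch of the sum-versus-limit point, not Planck’s actual computation):

```python
import math

# Average energy of a resonator with discrete levels 0, eps, 2*eps, ...
# weighted by Boltzmann factors exp(-n*eps/kT)
def mean_energy_discrete(eps, kT, nmax=2000):
    Z = sum(math.exp(-n * eps / kT) for n in range(nmax))
    return sum(n * eps * math.exp(-n * eps / kT) for n in range(nmax)) / Z

# Closed form of the same sum: eps / (exp(eps/kT) - 1)
def planck_form(eps, kT):
    return eps / math.expm1(eps / kT)

print(mean_energy_discrete(0.5, 1.0))  # agrees with planck_form(0.5, 1.0)
print(planck_form(0.001, 1.0))         # close to kT = 1: the classical limit
```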
And then by fitting this result to the data of the time he gets “Planck’s constant” (the correct result is 6.62 × 10⁻²⁷ erg·s):
And, yes, this was essentially the birth of quantum mechanics—a spin-off from an attempt to extend the domain of the Second Law. Planck himself didn’t seem to internalize what he’d done for at least another decade. And it was really Albert Einstein’s 1905 analysis of the photoelectric effect that made the concept of the quantization of energy that Planck had assumed (more as a calculational hypothesis than anything else) seem to be something of real physical significance—that would lead to the whole development of quantum mechanics, notably in the 1920s.
As we discussed at the very beginning above, already in antiquity there was a notion that at least things like solids and liquids might not ultimately be continuous (as they seemed), but might instead be made of large numbers of discrete “atomic” elements. By the 1600s there was also the idea that light might be “corpuscular”—and, as we discussed above, gases too. But meanwhile, there were opposing theories that espoused continuity—like the caloric theory of heat. And particularly with the success of calculus, there was a strong tendency to develop theories that showed continuity—and to which calculus could be applied.
But in the early 1800s—notably with the work of John Dalton (1766–1844)—there began to be evidence that there were discrete entities participating in chemical reactions. Meanwhile, as we discussed above, the success of the kinetic theory of gases gave increasing evidence for some kind of—at least effectively—discrete elements in gases. But even people like Boltzmann and Maxwell were reluctant to assert that gases really were made of molecules. And there were plenty of wellknown scientists (like Ernst Mach) who “opposed atomism”, often effectively on the grounds that in science one should only talk about things one can actually see or experience—not things like atoms that were too small for that.
But there was something else too: with Newton’s theory of gravitation as a precursor, and then with the investigation of electromagnetic phenomena, there emerged in the 1800s the idea of a “continuous field”. The interpretation of this was fairly clear for something like an elastic solid or a fluid that exhibited continuous deformations.
Mathematically, things like gravity, magnetism—and heat—seemed to work in similar ways. And it was assumed that this meant that in all cases there had to be some fluidlike “carrier” for the field. And this is what led to ideas like the luminiferous aether as the “carrier” of electromagnetic waves. And, by the way, the idea of an aether wasn’t even obviously incompatible with the idea of atoms; Kelvin, for example, had a theory that atoms were vortices (perhaps knotted) in the aether.
But how does this all relate to the Second Law? Well, particularly through the work of Boltzmann there came to be the impression that given atomism, probability theory could essentially “prove” the Second Law. A few people tried to clarify the formal details (as we discussed above), but it seemed like any final conclusion would have to await the validation (or not) of atomism, which in the late 1800s was still a thoroughly controversial theory.
By the first decade of the 1900s, however, the fortunes of atomism began to change. In 1897 J. J. Thomson (1856–1940) discovered the electron, showing that electricity was fundamentally “corpuscular”. And in 1900 Planck had (at least calculationally) introduced discrete quanta of energy. But it was the three classic papers of Albert Einstein in 1905 that—in their different ways—began to secure the ultimate success of atomism.
First there was his paper “On a Heuristic View about the Production and Transformation of Light”, which began:
Maxwell’s theory of electromagnetic [radiation] differs in a profound, essential way from the current theoretical models of gases and other matter. We consider the state of a material body to be completely determined by the positions and velocities of a finite number of atoms and electrons, albeit a very large number. But the electromagnetic state of a region of space is described by continuous functions …
He then points out that optical experiments look only at timeaveraged electromagnetic fields, and continues:
In particular, blackbody radiation, photoluminescence, [the photoelectric effect] and other phenomena associated with the generation and transformation of light seem better modeled by assuming that the energy of light is distributed discontinuously in space. According to this picture, the energy of a light wave emitted from a point source is not spread continuously over ever larger volumes, but consists of a finite number of energy quanta that are spatially localized at points of space, move without dividing and are absorbed or generated only as a whole.
In other words, he’s suggesting that light is “corpuscular”, and that energy is quantized. When he begins to get into details, he’s soon talking about the “entropy of radiation”—and, then, in three core sections of his paper, he’s basing what he’s doing on “Boltzmann’s principle”:
Two months later, Einstein produced another paper: “Investigations on the Theory of Brownian Motion”. Back in 1827 the British botanist Robert Brown (1773–1858) had seen under a microscope tiny grains (ejected by pollen) randomly jiggling around in water. Einstein began his paper:
In this paper it will be shown that according to the molecularkinetic theory of heat, bodies of microscopically visible size suspended in a liquid will perform movements of such magnitude that they can be easily observed in a microscope, on account of the molecular motions of heat.
He doesn’t explicitly mention Boltzmann in this paper, but there’s Boltzmann’s formula again:
And by the next year it’s become clear experimentally that, yes, the jiggling Robert Brown had seen was in fact the result of impacts from discrete, real water molecules.
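The quantitative heart of Einstein’s analysis is that the mean-square displacement of such a grain grows linearly with time. A toy one-dimensional random walk shows the same linear growth (a Python sketch with a fixed seed, not Einstein’s actual calculation):

```python
import random

random.seed(0)  # fixed seed so the results are reproducible

# Mean-square displacement of unbiased +/-1 random walkers after a given
# number of steps; for such a walk it is close to the step count itself
def mean_square_displacement(steps, walkers=2000):
    total = 0.0
    for _ in range(walkers):
        x = sum(random.choice((-1, 1)) for _ in range(steps))
        total += x * x
    return total / walkers

print(mean_square_displacement(100))   # close to 100
print(mean_square_displacement(400))   # close to 400: linear in "time"
```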
Einstein’s third 1905 paper, “On the Electrodynamics of Moving Bodies”—in which he introduced relativity theory—wasn’t so obviously related to atomism. But in showing that the luminiferous aether will (as Einstein put it) “prove superfluous” he was removing what was (almost!) the last remaining example of something continuous in physics.
In the years after 1905, the evidence for atomism mounted rapidly, segueing in the 1920s into the development of quantum mechanics. But what happened with the Second Law? By the time atomism was generally accepted, the generation of physicists that had included Boltzmann and Gibbs was gone. And while the Second Law was routinely invoked in expositions of thermodynamics, questions about its foundations were largely forgotten. Except perhaps for one thing: people remembered that “proofs” of the Second Law had been controversial, and had depended on the controversial hypothesis of atomism. But—they appear to have reasoned—now that atomism isn’t controversial anymore, it follows that the Second Law is indeed “satisfactorily proved”. And, after all, there were all sorts of other things to investigate in physics.
There are a couple of “footnotes” to this story. The first has to do with Einstein. Right before Einstein’s remarkable series of papers in 1905, what was he working on? The answer is: the Second Law! In 1902 he wrote a paper entitled “Kinetic Theory of Thermal Equilibrium and of the Second Law of Thermodynamics”. Then in 1903: “A Theory of the Foundations of Thermodynamics”. And in 1904: “On the General Molecular Theory of Heat”. The latter paper claims:
I derive an expression for the entropy of a system, which is completely analogous to the one found by Boltzmann for ideal gases and assumed by Planck in his theory of radiation. Then I give a simple derivation of the Second Law.
But what’s actually there is not quite what’s advertised:
It’s a short argument—about interactions between a collection of heat reservoirs. But in a sense it already assumes its answer, and certainly doesn’t provide any kind of fundamental “derivation of the Second Law”. And this was the last time Einstein ever explicitly wrote about deriving the Second Law. Yes, in those days it was just too hard, even for Einstein.
There’s another footnote to this story too. As we said, at the beginning of the twentieth century it had become clear that lots of things that had been thought to be continuous were in fact discrete. But there was an important exception: space. Ever since Euclid (~300 BC), space had almost universally been implicitly assumed to be continuous. And, yes, when quantum mechanics was being built, people did wonder about whether space might be discrete too (and even in 1917 Einstein expressed the opinion that eventually it would turn out to be). But over time the idea of continuous space (and time) got so entrenched in the fabric of physics that when I started seriously developing the ideas that became our Physics Project based on space as a discrete network (or what—in homage to the dynamical theory of heat—one might call the “dynamical theory of space”) it seemed to many people quite shocking. And looking back at the controversies of the late 1800s around atomism and its application to the Second Law it’s charming how familiar many of the arguments against atomism seem. Of course it turns out they were wrong—as they seem again to be in the case of space.
The foundations of thermodynamics were a hot topic in physics in the latter half of the nineteenth century—worked on by many of the most prominent physicists of the time. But by the early twentieth century it’d been firmly eclipsed by other areas of physics. And going forward it’d receive precious little attention—with most physicists just assuming it’d “somehow been solved”, or at least “didn’t need to be worried about”.
As a practical matter, thermodynamics in its basic equilibrium form nevertheless became very widely used in engineering and in chemistry. And in physics, there was steadily increasing interest in doing statistical mechanics—typically enumerating states of systems (quantum or otherwise), weighted as they would be in idealized thermal equilibrium. In mathematics, the field of ergodic theory developed, though for the most part it concerned itself with systems (such as ordinary differential equations) involving few variables—making it relevant to the Second Law essentially only by analogy.
There were a few attempts to “axiomatize” the Second Law, but mostly only at a macroscopic level, not asking about its microscopic origins. And there were also attempts to generalize the Second Law to make robust statements not just about equilibrium and the fact that it would be reached, but also about what would happen in systems driven to be in some manner away from equilibrium. The fluctuation-dissipation theorem about small perturbations from equilibrium—established in the mid-1900s, though anticipated in Einstein’s work on Brownian motion—was one example of a widely applicable result. And there were also related ideas of “minimum entropy production”—as well as “maximum entropy production”. But for large deviations from equilibrium there really weren’t convincing general results, and in practice most investigations basically used phenomenological models that didn’t have obvious connections to the foundations of thermodynamics, or derivations of the Second Law.
Meanwhile, through most of the twentieth century there were progressively more elaborate mathematical analyses of Boltzmann’s equation (and the H theorem) and their relation to rigorously derivable but hard-to-manage concepts like the BBGKY hierarchy. But despite occasional claims to the contrary, such approaches ultimately never seem to have been able to make much progress on the core problem of deriving the Second Law.
And then there’s the story of entropy. And in a sense this had three separate threads. The first was the notion of entropy—essentially in the original form defined by Clausius—being used to talk quantitatively about heat in equilibrium situations, usually for either engineering or chemistry. The second—that we’ll discuss a little more below—was entropy as a qualitative characterization of randomness and degradation. And the third was entropy as a general and formal way to measure the “effective number of degrees of freedom” in a system, computed from the log of the number of its achievable states.
There are definitely correspondences between these different threads. But they’re in no sense “obviously equivalent”. And much of the mystery—and confusion—that developed around entropy in the twentieth century came from conflating them.
Another piece of the story was information theory, which arose in the 1940s. And a core question in information theory is how long an “optimally compressed” message will be. And (with various assumptions) the average such length is given by a ∑p log p form that has essentially the same structure as Boltzmann’s expression for entropy. But even though it’s “mathematically like entropy” this has nothing immediately to do with heat—or even physics; it’s just an abstract consequence of needing log Ω bits (i.e. log Ω degrees of freedom) to specify one of Ω possibilities. (Still, the coincidence of definitions led to an “entropy branding” for various essentially information-theoretic methods, with claims sometimes being made that, for example, the thing called entropy must always be maximized “because we know that from physics”.)
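Concretely, for a source emitting symbols with probabilities pᵢ, the optimal average message length is −∑ pᵢ log₂ pᵢ bits—which for Ω equally likely possibilities reduces to log₂ Ω. A minimal Python illustration (the function name is just for this sketch):

```python
import math

# Shannon entropy: average bits per symbol for an optimally
# compressed source, -sum(p * log2(p))
def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform distribution over 8 outcomes: log2(8) = 3 bits
print(shannon_entropy([1/8] * 8))                   # 3.0
# A biased source needs fewer bits on average
print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75
```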
There’d been an initial thought in the 1940s that there’d be an “inevitable Second Law” for systems that “did computation”. The argument was that logical gates (like And and Or) take 2 bits of input (with 4 overall states 11, 10, 01, 00) but give only 1 bit of output (1 or 0), and are therefore fundamentally irreversible. But in the 1970s it became clear that it’s perfectly possible to do computation reversibly (say with 2-input, 2-output gates)—and indeed this is what’s used in the typical formalism for quantum circuits.
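The contrast can be made concrete with a short sketch (in Python; the function names are my own). And collapses its 4 input states to 2 outputs, so the inputs can’t be recovered; a 2-input, 2-output CNOT gate is a bijection on the 4 two-bit states, and is even its own inverse:

```python
from itertools import product

def and_gate(a, b):
    """2 bits in, 1 bit out: irreversible (inputs cannot be recovered)."""
    return a & b

def cnot(a, b):
    """2 bits in, 2 bits out: flips b exactly when a == 1; reversible."""
    return a, a ^ b

states = list(product((0, 1), repeat=2))   # (0,0), (0,1), (1,0), (1,1)

# And maps 4 input states onto only 2 output values:
print(sorted({and_gate(a, b) for a, b in states}))        # [0, 1]

# CNOT permutes the 4 two-bit states, and applying it twice undoes it:
assert sorted(cnot(a, b) for a, b in states) == states     # bijection
assert all(cnot(*cnot(a, b)) == (a, b) for a, b in states) # self-inverse
```

CNOT alone isn’t computationally universal, but 3-bit reversible gates like the Toffoli gate are, which is what makes fully reversible computation possible in principle.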
As I’ll mention elsewhere, there were some computer experiments in the 1950s and beyond on model systems—like hard sphere gases and nonlinear springs—that showed some sign of Second Law behavior (though less than might have been expected). But the analysis of these systems very much concentrated on various regularities, and not on the effective randomness associated with Second Law behavior.
In another direction, the 1970s saw the application of thermodynamic ideas to black holes. At first, it was basically a pure analogy. But then quantum field theory calculations suggested that black holes should produce thermal radiation as if they had a certain effective temperature. By the late 1990s there were more direct ways to “compute entropy” for black holes, by enumerating possible (quantum) configurations consistent with the overall characteristics of the black hole. But such computations in effect assume (time-invariant) equilibrium, and so can’t be expected to shed light directly on the Second Law.
Talking about black holes brings up gravity. And in the course of the twentieth century there were scattered efforts to understand the effect of gravity on the Second Law. Would a self-gravitating gas achieve “equilibrium” in the usual sense? Does gravity violate the Second Law? It’s been difficult to get definitive answers. Many specific simulations of n-body gravitational systems were done, but without global conclusions for the Second Law. And there were cosmological arguments, particularly about the role of gravity in accounting for entropy in the early universe—but not so much about the actual evolution of the universe and the effect of the Second Law on it.
Yet another direction has involved quantum mechanics. The standard formalism of quantum mechanics—like classical mechanics—is fundamentally reversible. But the formalism for measurement introduced in the 1930s—arguably as something of a hack—is fundamentally irreversible, and there’ve been continuing arguments about whether this could perhaps “explain the Second Law”. (I think our Physics Project finally provides more clarity about what’s going on here—but also tells us this isn’t what’s “needed” for the Second Law.)
From the earliest days of the Second Law, there had always been scattered but ultimately unconvincing assertions of exceptions to the Second Law—usually based on elaborately constructed machines that were claimed to be able to achieve perpetual motion “just powered by heat”. Of course, the Second Law is a claim about large numbers of molecules, etc.—and shouldn’t be expected to apply to very small systems. But by the end of the twentieth century it was starting to be possible to make micromachines that could operate on small numbers of molecules (or electrons). And with the right control systems in place, it was argued that such machines could—at least in principle—effectively be used to set up Maxwell’s demons that would systematically violate the Second Law, albeit on a very small scale.
And then there was the question of life. Early formulations of the Second Law had often been stated as applying only to “inanimate matter”—because living systems didn’t seem to follow the same process of inexorable “dissipation to heat” as inanimate, mechanical systems. Quite to the contrary, they seemed able to take disordered input (like food) and generate ordered biological structures from it. And indeed, Erwin Schrödinger (1887–1961), in his 1944 book What Is Life?, talked about “negative entropy” associated with life. But he—and many others since—argued that life doesn’t really violate the Second Law because it’s not operating in a closed environment where one should expect evolution to equilibrium. Instead, it’s constantly being driven away from equilibrium, for example by “organized energy” ultimately coming from the Sun.
Still, the concept of at least locally “antithermodynamic” behavior is often considered to be a potential general signature of life. But already by the early part of the 1900s, with the rise of things like biochemistry, and the decline of concepts like “life force” (which seemed a little like “caloric”), there developed a strong belief that the Second Law must at some level always apply, even to living systems. But, yes, even though the Second Law seemed to say that one can’t “unscramble an egg”, there was still the witty rejoinder: “unless you feed it to a chicken”.
What about biological evolution? Well, Boltzmann had been an enthusiast of Darwin’s idea of natural selection. And—although it’s not clear he made this connection—it was pointed out many times in the twentieth century that just as in the Second Law reversible underlying dynamics generate an irreversible overall effect, so also in Darwinian evolution effectively reversible individual changes aggregate to what at least Darwin thought was an “irreversible” progression to things like the formation of higher organisms.
The Second Law also found its way into the social sciences—sometimes under names like “entropy pessimism”—most often being used to justify the necessity of “Maxwell’s-demon-like” active intervention or control to prevent the collapse of economic or social systems into random or incoherent states.
But despite all these applications of the Second Law, the twentieth century largely passed without significant advances in understanding the origin and foundations of the Second Law. Though even by the early 1980s I was beginning to find results—based on computational ideas—that seemed as if they might finally give a foundational understanding of what’s really happening in the Second Law, and the extent to which the Second Law can in the end be “derived” from underlying “mechanical” rules.
Ask a typical physicist today about the Second Law and they’re likely to be very sure that it’s “just true”. Maybe they’ll consider it “another law of nature” like the conservation of energy, or maybe they’ll think it is something that was “proved long ago” from basic principles of mathematics and mechanics. But as we’ve discussed here, there’s really nowhere in the history of the Second Law that should give us this degree of certainty. So where did all the certainty come from? I think in the end it’s a mixture of a kind of don’t-question-this-it-comes-from-sophisticated-science mystique about the Second Law, together with a century and a half of “increasingly certain” textbooks. So let’s talk about the textbooks.
While early contributions to what we now call thermodynamics (and particularly those from continental Europe) often got published as monographs, the first “actual textbooks” of thermodynamics already started to appear in the 1860s, with three examples (curiously, all in French) being:
And in these early textbooks what one repeatedly sees is that the Second Law is simply cited—without much comment—as a “principle” or “axiom” (variously attributed to Carnot, Kelvin or Clausius, and sometimes called “the Principle of Carnot”), from which theory will be developed. By the 1870s there’s a bit of confusion starting to creep in, because people are talking about the “Theorem of Carnot”. But, at least at first, by this they mean not the Second Law, but the result on the efficiency of heat engines that Carnot derived from this.
Occasionally, there are questions in textbooks about the validity of the Second Law. A notable one, that we discussed above when we talked about Maxwell’s demon, shows up under the title “Limitation of the Second Law of Thermodynamics” at the end of Maxwell’s 1871 Theory of Heat.
Tait’s largely historical 1877 Sketch of Thermodynamics notes that, yes, the Second Law hasn’t successfully been proved from the laws of mechanics:
In 1879, Eddy’s Thermodynamics at first shows even more skepticism
but soon he’s talking about how “Rankine’s theory of molecular vortices” has actually “proved the Second Law”:
He goes on to give some standard “phenomenological” statements of the Second Law, but then talks about “molecular hypotheses from which Carnot’s principle has been derived”:
Pretty soon there’s confusion like the section in Alexandre Gouilly’s (1842–1906) 1877 Mechanical Theory of Heat that’s entitled “Second Fundamental Theorem of Thermodynamics or the Theorem of Carnot”:
More textbooks on thermodynamics follow, but the majority tend to be practical expositions (that are often incredibly similar to each other) with no particular theoretical discussion of the Second Law, its origins or validity.
In 1891 there’s an “official report about the Second Law” commissioned by the British Association for the Advancement of Science (and written by a certain George Bryan (1864–1928) who would later produce a thermodynamics textbook):
There’s an enumeration of approaches so far:
Somewhat confusingly it talks about a “proof of the Second Law”—actually referring to an already-in-equilibrium result:
There’s talk of mechanical instability leading to irreversibility:
The conclusions say that, yes, the Second Law isn’t proved “yet”
but imply that if only we knew more about molecules that might be enough to nail it:
But back to textbooks. In 1895 Boltzmann published his Lectures on Gas Theory, which includes a final chapter about the H theorem and its relation to the Second Law. Boltzmann goes through his mathematical derivations for gases, then (rather overoptimistically) asserts that they’ll also work for solids and liquids:
We have looked mainly at processes in gases and have calculated the function H for this case. Yet the laws of probability that govern atomic motion in the solid and liquid states are clearly not qualitatively different … from those for gases, so that the calculation of the function H corresponding to the entropy would not be more difficult in principle, although to be sure it would involve greater mathematical difficulties.
But soon he’s discussing the more philosophical aspects of things (and by the time Boltzmann wrote this book, he was a professor of philosophy as well as physics). He says that the usual statement of the Second Law is “asserted phenomenologically as an axiom” (just as he says the infinite divisibility of matter also is at that time):
… the Second Law is formulated in such a way that the unconditional irreversibility of all natural processes is asserted as an axiom, just as general physics based on a purely phenomenological standpoint asserts the unconditional divisibility of matter without limit as an axiom.
One might then expect him to say that actually the Second Law is somehow provable from basic physical facts, such as the First Law. But actually his claims about any kind of “general derivation” of the Second Law are rather subdued:
Since however the probability calculus has been verified in so many special cases, I see no reason why it should not also be applied to natural processes of a more general kind. The applicability of the probability calculus to the molecular motion in gases cannot of course be rigorously deduced from the differential equations for the motion of the molecules. It follows rather from the great number of the gas molecules and the length of their paths, by virtue of which the properties of the position in the gas where a molecule undergoes a collision are completely independent of the place where it collided the previous time.
But he still believes in the ultimate applicability of the Second Law, and feels he needs to explain why—in the face of the Second Law—the universe as we perceive it “still has interesting things going on”:
… small isolated regions of the universe will always find themselves “initially” in an improbable state. This method seems to me to be the only way in which one can understand the Second Law—the heat death of each single world—without a unidirectional change of the entire universe from a definite initial state to a final state.
Meanwhile, he talks about the idea that elsewhere in the universe things might be different—and that, for example, entropy might be systematically decreasing, making (he suggests) perceived time run backwards:
In the entire universe, the aggregate of all individual worlds, there will however in fact
occur processes going in the opposite direction. But the beings who observe such processes will simply reckon time as proceeding from the less probable to the more probable states, and it will never be discovered whether they reckon time differently from us, since they are separated from us by eons of time and spatial distances 10^{10^{10}} times the distance of Sirius—and moreover their language has no relation to ours.
Most other textbook discussions of thermodynamics are tamer than this, but the rather anthropicstyle argument that “we live in a fluctuation” comes up over and over again as an ultimate way to explain the fact that the universe as we perceive it isn’t just a featureless maximumentropy place.
It’s worth noting that there are roughly three general streams of textbooks that end up discussing the Second Law. There are books about rather practical thermodynamics (of the type pioneered by Clausius), that typically spend most of their time on the equilibrium case. There are books about kinetic theory (effectively pioneered by Maxwell), that typically spend most of their time talking about the dynamics of gas molecules. And then there are books about statistical mechanics (as pioneered by Gibbs) that discuss with various degrees of mathematical sophistication the statistical characteristics of ensembles.
In each of these streams, many textbooks just treat the Second Law as a starting point that can be taken for granted, then go from there. But particularly when they are written by physicists with broader experience, or when they are intended for a not-totally-specialized audience, textbooks will quite often attempt at least a little justification or explanation for the Second Law—though rather often with a distinct sleight of hand involved.
For example, when Planck in 1903 wrote his Treatise on Thermodynamics he had a chapter in his discussion of the Second Law, misleadingly entitled “Proof”. Still, he explains that:
The second fundamental principle of thermodynamics [Second Law] being, like the first, an empirical law, we can speak of its proof only in so far as its total purport may be deduced from a single selfevident proposition. We, therefore, put forward the following proposition as being given directly by experience. It is impossible to construct an engine which will work in a complete cycle, and produce no effect except the raising of a weight and the cooling of a heat-reservoir.
In other words, his “proof” of the Second Law is that nobody has ever managed to build a perpetual motion machine that violates it. (And, yes, this is more than a little reminiscent of P ≠ NP, which, through computational irreducibility, is related to the Second Law.) But after many pages, he says:
In conclusion, we shall briefly discuss the question of the possible limitations to the Second Law. If there exist any such limitations—a view still held by many scientists and philosophers—then this [implies an error] in our starting point: the impossibility of perpetual motion …
(In the 1905 edition of the book he adds a footnote that frankly seems bizarre in view of his—albeit perhaps initially unwilling—role in the initiation of quantum theory five years earlier: “The following discussion, of course, deals with the meaning of the Second Law only insofar as it can be surveyed from the points of view contained in this work avoiding all atomic hypotheses.”)
He ends by basically saying “maybe one day the Second Law will be considered necessarily true; in the meantime let’s assume it and see if anything goes wrong”:
Presumably the time will come when the principle of the increase of the entropy will be presented without any connection with experiment. Some metaphysicians may even put it forward as being a priori valid. In the meantime, no more effective weapon can be used by both champions and opponents of the Second Law than the indefatigable endeavour to follow the real purport of this law to the utmost consequences, taking the latter one by one to the highest court of appeal, experience. Whatever the decision may be, lasting gain will accrue to us from such a proceeding, since thereby we serve the chief end of natural science, the enlargement of our stock of knowledge.
Planck’s book came in a sense from the Clausius tradition. James Jeans’s (1877–1946) 1904 book The Dynamical Theory of Gases came instead from the Maxwell + Boltzmann tradition. He says at the beginning—reflecting the fact that the existence of molecules had not yet been firmly established in 1904—that the whole notion of the molecular basis of heat “is only a hypothesis”:
Later he argues that molecularscale processes are just too “finegrained” to ever be directly detected:
But soon Jeans is giving a derivation of Boltzmann’s H theorem, though noting some subtleties:
His take on the “reversibility objection” is that, yes, the H function will be symmetric at every maximum, but, he argues, it’ll also be discontinuous there:
And in the timehonored tradition of saying “it is clear” right when an argument is questionable, he then claims that an “obvious averaging” will give irreversibility and the Second Law:
Later in his book Jeans simply quotes Maxwell and mentions his demon:
Then effectively just tells readers to go elsewhere:
In 1907 George Bryan (whose 1891 report we mentioned earlier) published Thermodynamics, an Introductory Treatise Dealing Mainly with First Principles and Their Direct Applications. But despite its title, Bryan has now “walked back” the hopes of his earlier report and is just treating the Second Law as an “axiom”:
And—presumably from his interactions with Boltzmann—is saying that the Second Law is basically an empirical fact of our particular experience of the universe, and thus not something fundamentally derivable:
As the years went by, many thermodynamics textbooks appeared, increasingly with an emphasis on applications, and decreasingly with a mention of foundational issues—typically treating the Second Law essentially just as an absolute empirical “law of nature” analogous to the First Law.
But in other books—including some that were widely read—there were occasional mentions of the foundations of the Second Law. A notable example was in Arthur Eddington’s (1882–1944) 1929 The Nature of the Physical World—where now the Second Law is exalted as having the “supreme position among the laws of Nature”:
Although Eddington does admit that the Second Law is probably not “mathematically derivable”:
And even though in the twentieth century questions about thermodynamics and the Second Law weren’t considered “top physics topics”, some top physicists did end up talking about them, if nothing else in general textbooks they wrote. Thus, for example, in the 1930s and 1940s people like Enrico Fermi (1901–1954) and Wolfgang Pauli (1900–1958) wrote in some detail about the Second Law—though rather strenuously avoided discussing foundational issues about it.
Lev Landau (1908–1968), however, was a different story. In 1933 he wrote a paper “On the Second Law of Thermodynamics and the Universe” which basically argues that our everyday experience is only possible because “the world as a whole does not obey the laws of thermodynamics”—and suggests that perhaps relativistic quantum mechanics (which he says, quoting Niels Bohr (1885–1962), could be crucial in the center of stars) might fundamentally violate the Second Law. (And yes, even today it’s not clear how “relativistic temperature” works.)
But this kind of outright denial of the Second Law had disappeared by the time Lev Landau and Evgeny Lifshitz (1915–1985) wrote the 1951 version of their book Statistical Physics—though they still showed skepticism about its origins:
There is no doubt that the foregoing simple formulations [of the Second Law] accord with reality; they are confirmed by all our everyday observations. But when we consider more closely the problem of the physical nature and origin of these laws of behaviour, substantial difficulties arise, which to some extent have not yet been overcome.
Their book continues, discussing Boltzmann’s fluctuation argument:
Firstly, if we attempt to apply statistical physics to the entire universe … we immediately encounter a glaring contradiction between theory and experiment. According to the results of statistics, the universe ought to be in a state of complete statistical equilibrium. … Everyday experience shows us, however, that the properties of Nature bear no resemblance to those of an equilibrium system; and astronomical results show that the same is true throughout the vast region of the Universe accessible to our observation.
We might try to overcome this contradiction by supposing that the part of the Universe which we observe is just some huge fluctuation in a system which is in equilibrium as a whole. The fact that we have been able to observe this huge fluctuation might be explained by supposing that the existence of such a fluctuation is a necessary condition for the existence of an observer (a condition for the occurrence of biological evolution). This argument, however, is easily disproved, since a fluctuation within, say, the volume of the solar system only would be very much more probable, and would be sufficient to allow the existence of an observer.
What do they think is the way out? The effect of gravity:
… in the general theory of relativity, the Universe as a whole must be regarded not as a closed system but as a system in a variable gravitational field. Consequently the application of the law of increase of entropy does not prove that statistical equilibrium must necessarily exist.
But they say this isn’t the end of the problem, essentially noting the reversibility objection. How should this be overcome? First, they suggest the solution might be that the observer somehow “artificially closes off the history of a system”, but then they add:
Such a dependence of the laws of physics on the nature of an observer is quite inadmissible, of course.
They continue:
At the present time it is not certain whether the law of increase of entropy thus formulated can be derived on the basis of classical mechanics. … It is more reasonable to suppose that the law of increase of entropy in the above general formulation arises from quantum effects.
They talk about the interaction of classical and quantum systems, and what amounts to the explicit irreversibility of the traditional formalism of quantum measurement, then say that if quantum mechanics is in fact the ultimate source of irreversibility:
… there must exist an inequality involving the quantum constant ℏ which ensures the validity of the law and is satisfied in the real world…
What about other textbooks? Joseph Mayer (1904–1983) and Maria Goeppert Mayer’s (1906–1972) 1940 Statistical Mechanics has the rather charming
though in the end they sidestep difficult questions about the Second Law by basically making convenient definitions of what S and Ω mean in S = k log Ω.
For a long time one of the most cited textbooks in the area was Richard Tolman’s (1881–1948) 1938 Principles of Statistical Mechanics. Tolman (basically following Gibbs) begins by explaining that statistical mechanics is about making predictions when all you know are probabilistic statements about initial conditions:
Tolman continues
He notes that, historically, statistical mechanics was developed for studying systems like gases, where (in a vague foreshadowing of the concept of computational irreducibility) “it is evident that we should be quickly lost in the complexities of our computations” if we try to trace every molecule, but where, he claims, statistical mechanics can still accurately tell us “statistically” what will happen:
But where exactly should we get the probability distributions for initial states from? Tolman says he’s going to consider the kinds of mathematically defined ensembles that Gibbs discusses. And tucked away at the end of a chapter he admits that, well, yes, this setup is really all just a postulate—set up so as to make the results of statistical mechanics “merely a matter for computation”:
On this basis Tolman then derives Boltzmann’s H theorem, and his “coarsegrained” generalization (where, yes, the coarsegraining ultimately operates according to his postulate). For 530 pages, there’s not a single mention of the Second Law. But finally, on page 558 Tolman is at least prepared to talk about an “analog of the Second Law”:
And basically what Tolman argues is that his coarse-grained H can reasonably be identified with thermodynamic entropy S. In the end, the argument is very similar to Boltzmann’s, though Tolman seems to feel that it has achieved more:
Very different in character from Tolman’s book, another widely cited book is Percy Bridgman’s (1882–1961) largely philosophical 1943 The Nature of Thermodynamics. His chapter on the Second Law begins:
A decade earlier Bridgman had discussed outright violations of the Second Law, saying that he’d found that the younger generation of physicists at the time seemed to often think that “it may be possible some day to construct a machine which shall violate the Second Law on a scale large enough to be commercially profitable”—perhaps, he said, by harnessing Brownian motion:
At a philosophical level, a notable treatment of the Second Law appeared in Hans Reichenbach’s (1891–1953) (unfinished at his death) 1953 work The Direction of Time. Wanting to make use of the Second Law, but concerned about the reversibility objections, Reichenbach introduces the notion of “branch systems”—essentially parts of the universe that can eventually be considered isolated, but which were once connected to other parts that were responsible for determining their (“nonrandom”) effective initial conditions:
Most textbooks that cover the Second Law use one of the formulations that we’ve already discussed. But there is one more formulation that also sometimes appears, usually associated with the name “Carathéodory” or the term “axiomatic thermodynamics”.
Back in the first decade of the twentieth century—particularly in the circle around David Hilbert (1862–1943)—there was a lot of enthusiasm for axiomatizing things, including physics. And in 1908 the mathematician Constantin Carathéodory (1873–1950) suggested an axiomatization of thermodynamics. His essential idea—that he developed further in the 1920s—was to consider something like Gibbs’s phase fluid and then roughly to assert that it gets (in some measuretheoretic sense) “so mixed up” that there aren’t “experimentally doable” transformations that can unmix it. Or, in his original formulation:
In any arbitrary neighborhood of an arbitrarily given initial point there is a state that cannot be arbitrarily approximated by adiabatic changes of state.
There wasn’t much pickup of this approach—though Max Born (1882–1970) supported it, Max Planck dismissed it, and in 1939 S. Chandrasekhar (1910–1995) based his exposition of stellar structure on it. But in various forms, the approach did make it into a few textbooks. An example is Brian Pippard’s (1920–2008) otherwise rather practical 1957 The Elements of Classical Thermodynamics:
Yet another (loosely related) approach is the “postulatory formulation” on which Herbert Callen’s (1919–1993) 1959 textbook Thermodynamics is based:
In effect this is now “assuming the result” of the Second Law:
Though in an appendix he rather tautologically states:
So what about other textbooks? A famous set are Richard Feynman’s (1918–1988) 1963 Lectures on Physics. Feynman starts his discussion of the Second Law quite carefully, describing it as a “hypothesis”:
Feynman says he’s not going to go very far into thermodynamics, though quotes (and criticizes) Clausius’s statements:
But then he launches into a whole chapter on “Ratchet and pawl”:
His goal, he explains, is to analyze a device (similar to what Marian Smoluchowski had considered in 1912) that one might think by its oneway ratchet action would be able to “harvest random heat” and violate the Second Law. But after a few pages of analysis he claims that, no, if the system is in equilibrium, thermal fluctuations will prevent systematic “oneway” mechanical work from being achieved, so that the Second Law is saved.
But now he applies this to Maxwell’s demon, claiming that the same basic argument shows that the demon can’t work:
But what about reversibility? Feynman first discusses what amounts to Boltzmann’s fluctuation idea:
But then he opts instead for the argument that for some reason—then unknown—the universe started in a “lowentropy” state, and has been “running down” ever since:
By the beginning of the 1960s an immense number of books had appeared that discussed the Second Law. Some were based on macroscopic thermodynamics, some on kinetic theory and some on statistical mechanics. In all three of these cases there was elegant mathematical theory to be described, even if it never really addressed the ultimate origin of the Second Law.
But by the early 1960s there was something new on the scene: computer simulation. And in 1965 that formed the core of Fred Reif’s (1927–2019) textbook Statistical Physics:
In a sense the book is an exploration of what simulated hard sphere gases do—as analyzed using ideas from statistical mechanics. (The simulations had computational limitations, but they could go far enough to meaningfully see most of the basic phenomena of statistical mechanics.)
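The reversibility half of that demonstration can be sketched in a few lines of modern Python. This is emphatically not Reif’s hard-sphere gas—to keep it minimal there are no inter-particle collisions, just particles reflecting off the walls of a unit box—but it shows the basic point: run the dynamics forward, flip every velocity, run forward again, and the “mixed” gas retraces its steps back to its initial configuration (up to floating-point roundoff):

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps, dt = 100, 2000, 0.01
pos = rng.uniform(0.2, 0.8, size=(n, 2))  # start clustered away from the walls
vel = rng.normal(size=(n, 2))             # dt is small enough for at most one
                                          # bounce per step per axis

def step(pos, vel, dt):
    """Free flight plus specular reflection off the walls of the unit box."""
    pos = pos + vel * dt
    over, under = pos > 1.0, pos < 0.0
    pos = np.where(over, 2.0 - pos, pos)              # fold back inside the box
    pos = np.where(under, -pos, pos)
    vel = np.where(over | under, -vel, vel)           # reverse bounced components
    return pos, vel

p0 = pos.copy()
for _ in range(steps):                    # run forward: positions spread out
    pos, vel = step(pos, vel, dt)
vel = -vel                                # now reverse every velocity...
for _ in range(steps):                    # ...and the motion retraces itself
    pos, vel = step(pos, vel, dt)
assert np.max(np.abs(pos - p0)) < 1e-6    # back to the initial configuration
```

With inter-particle collisions added (as on Reif’s covers), the same velocity reversal still works in principle, but the retraced trajectory becomes exquisitely sensitive to roundoff, which is itself a hint of the randomization at the heart of the Second Law.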
Even the front and back covers of the book provide a bold statement of both reversibility and the kind of randomization that’s at the heart of the Second Law:
But inside the book the formal concept of entropy doesn’t appear until page 147—where it’s defined very concretely in terms of states one can explicitly enumerate:
And finally, on page 283—after all necessary definitions have been built up—there’s a rather prosaic statement of the Second Law, almost as a technical footnote:
Looking through many textbooks of thermodynamics and statistical mechanics it’s striking how singular Reif’s “show-don’t-tell” computer-simulation approach is. And, as I’ll describe in detail elsewhere, for me personally it has a particular significance, because this is the book that in 1972, at the age of 12, launched me on what has now been a 50-year journey to understand the Second Law and its origins.
When the first textbooks that described the Second Law were published nearly a century and a half ago they often (though even then not always) expressed uncertainty about the Second Law and just how it was supposed to work. But it wasn’t long before the vast majority of books either just “assumed the Second Law” and got on with whatever they wanted to apply it to, or tried to suggest that the Second Law had been established from underlying principles, but that it was a sophisticated story that was “out of the scope of this book” but to be found elsewhere. And so it was that a strong sense emerged that the Second Law was something whose ultimate character and origins the typical working scientist didn’t need to question—and should just believe (and protect) as part of the standard canon of science.
The Second Law is now more than 150 years old. But—at least until now—I think it’s fair to say that the fundamental ideas used to discuss it haven’t materially changed in more than a century. There’s a lot that’s been written about the Second Law. But it’s always tended to follow lines of development already defined over a century ago—and mostly those from Clausius, or Boltzmann, or Gibbs.
Looking at word clouds of titles of the thousands of publications about the Second Law over the decades we see just a few trends, like the appearance of the “generalized Second Law” in the 1990s relating to black holes:
But with all this activity why hasn’t more been worked out about the Second Law? How come after all this time we still don’t really even understand with clarity the correspondence between the Clausius, Boltzmann and Gibbs approaches—or how their respective definitions of “entropy” are ultimately related?
In the end, I think the answer is that it needs a new paradigm—one that, yes, is fundamentally based on computation and on ideas like computational irreducibility. A little more than a century ago—with people still actively arguing about what Boltzmann was saying—I don’t think anyone would have been too surprised to find out that making progress would require a new way of looking at things. (After all, just a few years earlier Boltzmann and Gibbs had needed to bring in the new idea of using probability theory.)
But as we discussed, by the beginning of the twentieth century—with other areas of physics heating up—interest in the Second Law was waning. And even with many questions unresolved people moved on. And soon several academic generations had passed. And as is typical in the history of science, by that point nobody was questioning the foundations anymore. In the particular case of the Second Law there was some sense that the uncertainties had to do with the assumption of the existence of molecules, which had by then been established. But more important, I think, was just the passage of “academic time” and the fact that what might once have been a matter of discussion had now just become a statement in the textbooks—that future academic generations should learn and didn’t need to question.
One of the unusual features of the Second Law is that at the time it passed into the “standard canon of science” it was still rife with controversy. How did those different approaches relate? What about those “mathematical objections”? What about the thought experiments that seemed to suggest exceptions? It wasn’t that these issues were resolved. It was just that after enough time had passed people came to assume that “somehow that must have all been worked out ages ago”.
And it wasn’t that there was really any pressure to investigate foundational issues. The Second Law—particularly in its implications for thermal equilibrium—seemed to work just fine in all its standard applications. And it even seemed to work in new domains like black holes. Yes, there was always a desire to extend it. But the difficulties encountered in trying to do so didn’t seem in any obvious way related to issues about its foundations.
Of course, there were always a few people who kept wondering about the Second Law. And indeed I’ve been surprised at how much of a Who’s Who of twentieth-century physics this seems to have included. But while many well-known physicists seem to have privately thought about the foundations of the Second Law they managed to make remarkably little progress—and as a result left very few visible records of their efforts.
But—as is so often the case—the issue, I believe, is that a fundamentally new paradigm was needed in order to make real progress. When the “standard canon” of the Second Law was formed in the latter part of the nineteenth century, calculus was the primary tool for physics—with probability theory a newfangled addition introduced specifically for studying the Second Law. And from that time it would be many decades before even the beginnings of the computational paradigm began to emerge, and nearly a century before phenomena like computational irreducibility were finally discovered. Had the sequence been different I have no doubt that what I have now been able to understand about the Second Law would have been worked out by the likes of Boltzmann, Maxwell and Kelvin.
But as it is, we’ve had to wait more than a century to get to this point. And having now studied the history of the Second Law—and seen the tangled manner in which it developed—I believe that we can now be confident that we have indeed successfully been able to resolve many of the core issues and mysteries that have plagued the Second Law and its foundations over the course of nearly 150 years.
Note
Almost all of what I say here is based on my reading of primary literature, assisted by modern tools and by my latest understanding of the Second Law. About some of what I discuss, there is—sometimes quite extensive—existing scholarship; some references are given in the bibliography.
It’s always amazing when things suddenly “just work”. It happened to us with Wolfram|Alpha back in 2009. It happened with our Physics Project in 2020. And it’s happening now with OpenAI’s ChatGPT.
I’ve been tracking neural net technology for a long time (about 43 years, actually). And even having watched developments in the past few years I find the performance of ChatGPT thoroughly remarkable. Finally, and suddenly, here’s a system that can successfully generate text about almost anything—that’s very comparable to what humans might write. It’s impressive, and useful. And, as I’ll discuss elsewhere, I think its success is probably telling us some very fundamental things about the nature of human thinking.
But while ChatGPT is a remarkable achievement in automating the doing of major human-like things, not everything that’s useful to do is quite so “human like”. Some of it is instead more formal and structured. And indeed one of the great achievements of our civilization over the past several centuries has been to build up the paradigms of mathematics, the exact sciences—and, most importantly, now computation—and to create a tower of capabilities quite different from what pure human-like thinking can achieve.
I myself have been deeply involved with the computational paradigm for many decades, in the singular pursuit of building a computational language to represent as many things in the world as possible in formal symbolic ways. And in doing this my goal has been to build a system that can “computationally assist”—and augment—what I and others want to do. I think about things as a human. But I can also immediately call on Wolfram Language and Wolfram|Alpha to tap into a kind of unique “computational superpower” that lets me do all sorts of beyond-human things.
It’s a tremendously powerful way of working. And the point is that it’s not just important for us humans. It’s equally, if not more, important for human-like AIs as well—immediately giving them what we can think of as computational knowledge superpowers, that leverage the non-human-like power of structured computation and structured knowledge.
We’ve just started exploring what this means for ChatGPT. But it’s pretty clear that wonderful things are possible. Wolfram|Alpha does something very different from ChatGPT, in a very different way. But they have a common interface: natural language. And this means that ChatGPT can “talk to” Wolfram|Alpha just like humans do—with Wolfram|Alpha turning the natural language it gets from ChatGPT into precise, symbolic computational language on which it can apply its computational knowledge power.
For decades there’s been a dichotomy in thinking about AI between “statistical approaches” of the kind ChatGPT uses, and “symbolic approaches” that are in effect the starting point for Wolfram|Alpha. But now—thanks to the success of ChatGPT—as well as all the work we’ve done in making Wolfram|Alpha understand natural language—there’s finally the opportunity to combine these to make something much stronger than either could ever achieve on their own.
At its core, ChatGPT is a system for generating linguistic output that “follows the pattern” of what’s out there on the web and in books and other materials that have been used in its training. And what’s remarkable is how human-like the output is, not just at a small scale, but across whole essays. It has coherent things to say, that pull in concepts it’s learned, quite often in interesting and unexpected ways. What it produces is always “statistically plausible”, at least at a linguistic level. But—impressive as that ends up being—it certainly doesn’t mean that all the facts and computations it confidently trots out are necessarily correct.
Here’s an example I just noticed (and, yes, ChatGPT has intrinsic built-in randomness, so if you try this, you probably won’t get the same result):
It sounds pretty convincing. But it turns out that it’s wrong, as Wolfram|Alpha can tell us:
To be fair, of course, this is exactly the kind of thing that Wolfram|Alpha is good at: something that can be turned into a precise computation that can be done on the basis of its structured, curated knowledge.
But the neat thing is that one can think about Wolfram|Alpha automatically helping ChatGPT on this. One can programmatically ask Wolfram|Alpha the question (you can also use a web API, etc.):
Now ask the question again to ChatGPT, appending this result:
ChatGPT very politely takes the correction, and if you ask the question yet again it then gives the correct answer. Obviously there could be a more streamlined way to handle the back and forth with Wolfram|Alpha, but it’s nice to see that even this very straightforward pure-natural-language approach basically already works.
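The basic shape of that back and forth is easy to script. Here’s a minimal Python sketch (the endpoint is the real Wolfram|Alpha Short Answers API; the AppID and the exact prompt wording are just placeholders, and actually fetching the URL is left out):

```python
from urllib.parse import urlencode

WA_SHORT_ANSWERS = "https://api.wolframalpha.com/v1/result"

def build_query_url(question: str, appid: str = "YOUR_APPID") -> str:
    """URL for the Wolfram|Alpha Short Answers API (plain-text answer)."""
    return WA_SHORT_ANSWERS + "?" + urlencode({"appid": appid, "i": question})

def augment_prompt(question: str, fact: str) -> str:
    """Re-ask the question, appending the externally checked fact."""
    return f"{question}\n\nUse this result from Wolfram|Alpha: {fact}"

url = build_query_url("distance from Chicago to Tokyo")
# fetch `url`, then splice whatever answer text comes back into the prompt:
prompt = augment_prompt("How far is it from Chicago to Tokyo?",
                        "ANSWER_TEXT_FROM_THE_API")
```

In a real pipeline one would of course also handle the case where Wolfram|Alpha can’t interpret the question at all.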
But why does ChatGPT get this particular thing wrong in the first place? If it had seen the specific distance between Chicago and Tokyo somewhere in its training (e.g. from the web), it could of course get it right. But this is a case where the kind of generalization a neural net can readily do—say from many examples of distances between cities—won’t be enough; there’s an actual computational algorithm that’s needed.
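What that “actual computational algorithm” amounts to, at its core, is a great-circle distance. Here’s a minimal Python sketch (spherical Earth of radius 6371 km, rounded city coordinates; the real computation uses more refined geodesy):

```python
from math import radians, sin, cos, asin, sqrt

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance on a spherical Earth, in km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# Chicago (41.88 N, 87.63 W) to Tokyo (35.68 N, 139.69 E)
d = great_circle_km(41.88, -87.63, 35.68, 139.69)  # about 10,100 km
```

No amount of memorized city-pair examples substitutes for actually running this formula.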
The way Wolfram|Alpha handles things is quite different. It takes natural language and then—assuming it’s possible—it converts this into precise computational language (i.e. Wolfram Language), in this case:
The coordinates of cities and algorithms to compute distances between them are then part of the built-in computational knowledge in the Wolfram Language. And, yes, the Wolfram Language has a huge amount of built-in computational knowledge—the result of decades of work on our part, carefully curating what’s now a vast amount of continually updated data, implementing (and often inventing) methods and models and algorithms—and systematically building up a whole coherent computational language for everything.
ChatGPT and Wolfram|Alpha work in very different ways, and have very different strengths. But in the interests of understanding where ChatGPT can take advantage of Wolfram|Alpha’s strengths, let’s discuss some cases where on its own ChatGPT doesn’t do quite the right thing. And one area where ChatGPT—like humans—often tends to struggle is math.
It’s an interesting, essay-style response. But the actual result is wrong:
But if ChatGPT “consulted” Wolfram|Alpha it’d of course be able to get it right.
Let’s try something slightly more complex:
At first glance, this result looks great, and I’d be inclined to believe it. It turns out, though, that it’s wrong, as Wolfram|Alpha can tell us:
And, yes, doing math homework with ChatGPT (without it being able to consult Wolfram|Alpha) is probably a bad idea. It can give you a very plausible answer:
But without “really understanding the math” it’s basically impossible for ChatGPT to reliably get the right answer. And in this case, the answer is again wrong:
Still, ChatGPT can even make up a very plausible-looking explanation of “how it got its answer” (not that it’s in any way how it really “did it”). And, rather charmingly (and interestingly), the explanation it gives has mistakes very similar to what a human who didn’t understand the math might also make:
There are all sorts of situations where “not really understanding what things mean” can cause trouble:
That sounds convincing. But it’s not correct:
ChatGPT seemed to have correctly learned this underlying data somewhere—but it doesn’t “understand what it means” enough to be able to correctly rank the numbers:
And, yes, one can imagine finding a way to “fix this particular bug”. But the point is that the fundamental idea of a generative-language-based AI system like ChatGPT just isn’t a good fit in situations where there are structured computational things to do. Put another way, it’d take “fixing” an almost infinite number of “bugs” to patch up what even an almost-infinitesimal corner of Wolfram|Alpha can achieve in its structured way.
And the more complex the “computational chain” gets, the more likely you’ll have to call on Wolfram|Alpha to get it right. Here ChatGPT produces a rather confused answer:
And, as Wolfram|Alpha tells us, its conclusion isn’t correct (as it already in a sense “knew”):
Whenever it comes to specific (e.g. quantitative) data—even in fairly raw form—things very often tend to have to be more of a “Wolfram|Alpha story”. Here’s an example, inspired by a longtime favorite Wolfram|Alpha test query “How many turkeys are there in Turkey?”:
Again, this seems (at first) totally plausible, and it’s even quoting a relevant source. Turns out, though, that this data is basically just “made up”:
Still, what’s very nice is that ChatGPT can easily be made to “ask for facts to check”:
Now feed these through the Wolfram|Alpha API:
Now we can ask ChatGPT to fix its original response, injecting this data (and even showing in bold where it did it):
The ability to “inject facts” is particularly nice when it comes to things involving real-time (or location etc. dependent) data or computation. ChatGPT won’t immediately answer this:
But here’s some relevant Wolfram|Alpha API output:
And if we feed this to ChatGPT, it’ll generate a nice “essay-style” result:
Sometimes there’s an interesting interplay between the computational and the human-like. Here’s a rather whimsical question asked of Wolfram|Alpha (and it even checks if you want “soft-serve” instead):
ChatGPT at first gets a bit confused about the concept of volume:
But then it seems to “realize” that that much ice cream is fairly silly:
Machine learning is a powerful method, and particularly over the past decade, it’s had some remarkable successes—of which ChatGPT is the latest. Image recognition. Speech to text. Language translation. In each of these cases, and many more, a threshold was passed—usually quite suddenly. And some task went from “basically impossible” to “basically doable”.
But the results are essentially never “perfect”. Maybe something works well 95% of the time. But try as one might, the other 5% remains elusive. For some purposes one might consider this a failure. But the key point is that there are often all sorts of important use cases for which 95% is “good enough”. Maybe it’s because the output is something where there isn’t really a “right answer” anyway. Maybe it’s because one’s just trying to surface possibilities that a human—or a systematic algorithm—will then pick from or refine.
It’s completely remarkable that a few-hundred-billion-parameter neural net that generates text a token at a time can do the kinds of things ChatGPT can. And given this dramatic—and unexpected—success, one might think that if one could just go on and “train a big enough network” one would be able to do absolutely anything with it. But it won’t work that way. Fundamental facts about computation—and notably the concept of computational irreducibility—make it clear it ultimately can’t. But what’s more relevant is what we’ve seen in the actual history of machine learning. There’ll be a big breakthrough (like ChatGPT). And improvement won’t stop. But what’s much more important is that there’ll be use cases found that are successful with what can be done, and that aren’t blocked by what can’t.
And yes, there’ll be plenty of cases where “raw ChatGPT” can help with people’s writing, make suggestions, or generate text that’s useful for various kinds of documents or interactions. But when it comes to setting up things that have to be perfect, machine learning just isn’t the way to do it—much as humans aren’t either.
And that’s exactly what we’re seeing in the examples above. ChatGPT does great at the “human-like parts”, where there isn’t a precise “right answer”. But when it’s “put on the spot” for something precise, it often falls down. But the whole point here is that there’s a great way to solve this problem—by connecting ChatGPT to Wolfram|Alpha and all its computational knowledge “superpowers”.
Inside Wolfram|Alpha, everything is being turned into computational language, and into precise Wolfram Language code, that at some level has to be “perfect” to be reliably useful. But the crucial point is that ChatGPT doesn’t have to generate this. It can produce its usual natural language, and then Wolfram|Alpha can use its natural language understanding capabilities to translate that natural language into precise Wolfram Language.
In many ways, one might say that ChatGPT never “truly understands” things; it just “knows how to produce stuff that’s useful”. But it’s a different story with Wolfram|Alpha. Because once Wolfram|Alpha has converted something to Wolfram Language, what it’s got is a complete, precise, formal representation, from which one can reliably compute things. Needless to say, there are plenty of things of “human interest” for which we don’t have formal computational representations—though we can still talk about them, albeit perhaps imprecisely, in natural language. And for these, ChatGPT is on its own, with its very impressive capabilities.
But just like us humans, there are times when ChatGPT needs a more formal and precise “power assist”. But the point is that it doesn’t have to be “formal and precise” in saying what it wants. Because Wolfram|Alpha can communicate with it in what amounts to ChatGPT’s native language—natural language. And Wolfram|Alpha will take care of “adding the formality and precision” when it converts to its native language—Wolfram Language. It’s a very good situation, that I think has great practical potential.
And that potential is not only at the level of typical chatbot or text generation applications. It extends to things like doing data science or other forms of computational work (or programming). In a sense, it’s an immediate way to get the best of both worlds: the human-like world of ChatGPT, and the computationally precise world of Wolfram Language.
What about ChatGPT directly learning Wolfram Language? Well, yes, it could do that, and in fact it’s already started. And in the end I fully expect that something like ChatGPT will be able to operate directly in Wolfram Language, and be very powerful in doing so. It’s an interesting and unique situation, made possible by the character of the Wolfram Language as a full-scale computational language that can talk broadly about things in the world and elsewhere in computational terms.
The whole concept of the Wolfram Language is to take things we humans think about, and be able to represent and work with them computationally. Ordinary programming languages are intended to provide ways to tell computers specifically what to do. The Wolfram Language—in its role as a full-scale computational language—is about something much larger than that. In effect, it’s intended to be a language in which both humans and computers can “think computationally”.
Many centuries ago, when mathematical notation was invented, it provided for the first time a streamlined medium in which to “think mathematically” about things. And its invention soon led to algebra, and calculus, and ultimately all the various mathematical sciences. The goal of the Wolfram Language is to do something similar for computational thinking, though now not just for humans—and to enable all the “computational X” fields that can be opened up by the computational paradigm.
I myself have benefitted greatly from having Wolfram Language as a “language to think in”, and it’s been wonderful to see over the past few decades so many advances being made as a result of people “thinking in computational terms” through the medium of Wolfram Language. So what about ChatGPT? Well, it can get into this too. Quite how it will all work I am not yet sure. But it’s not about ChatGPT learning how to do the computation that the Wolfram Language already knows how to do. It’s about ChatGPT learning how to use the Wolfram Language more like people do. It’s about ChatGPT coming up with the analog of “creative essays”, but now written not in natural language but in computational language.
I’ve long discussed the concept of computational essays written by humans—that communicate in a mixture of natural language and computational language. Now it’s a question of ChatGPT being able to write those—and being able to use Wolfram Language as a way to deliver “meaningful communication”, not just to humans, but also to computers. And, yes, there’s a potentially interesting feedback loop involving actual execution of the Wolfram Language code. But the crucial point is that the richness and flow of “ideas” represented by the Wolfram Language code is—unlike in an ordinary programming language—something much closer to the kind of thing that ChatGPT has “magically” managed to work with in natural language.
Or, put another way, Wolfram Language—like natural language—is something expressive enough that one can imagine writing a meaningful “prompt” for ChatGPT in it. Yes, Wolfram Language can be directly executed on a computer. But as a ChatGPT prompt it can be used to “express an idea” whose “story” could be continued. It might describe some computational structure, leaving ChatGPT to “riff” on what one might computationally say about that structure that would—according to what it’s learned by reading so many things written by humans—be “interesting to humans”.
There are all sorts of exciting possibilities, suddenly opened up by the unexpected success of ChatGPT. But for now there’s the immediate opportunity of giving ChatGPT computational knowledge superpowers through Wolfram|Alpha. So it can not just produce “plausible human-like output”, but output that leverages the whole tower of computation and knowledge that’s encapsulated in Wolfram|Alpha and the Wolfram Language.
In 2020 it was Versions 12.1 and 12.2; in 2021 Versions 12.3 and 13.0. In late June this year it was Version 13.1. And now we’re releasing Version 13.2. We continue to have a huge pipeline of R&D, some short term, some medium term, some long term (like decade-plus). Our goal is to deliver timely snapshots of where we’re at—so people can start using what we’ve built as quickly as possible.
Version 13.2 is—by our standards—a fairly small release, that mostly concentrates on rounding out areas that have been under development for a long time, as well as adding “polish” to a range of existing capabilities. But it’s also got some “surprise” new dramatic efficiency improvements, and it’s got some first hints of major new areas that we have under development—particularly related to astronomy and celestial mechanics.
But even though I’m calling it a “small release”, Version 13.2 still introduces completely new functions into the Wolfram Language, 41 of them—as well as substantially enhancing 64 existing functions. And, as usual, we’ve put a lot of effort into coherently designing those functions, so they fit into the tightly integrated framework we’ve been building for the past 35 years. For the past several years we’ve been following the principle of open code development (does anyone else do this yet?)—opening up our core software design meetings as livestreams. During the Version 13.2 cycle we’ve done about 61 hours of design livestreams—getting all sorts of great realtime feedback from the community (thanks, everyone!). And, yes, we’re holding steady at an overall average of one hour of livestreamed design time per new function, and a little less than half that per enhanced function.
Astronomy has been a driving force for computation for more than 2000 years (from the Antikythera device on)… and in Version 13.2 it’s coming to Wolfram Language in a big way. Yes, the Wolfram Language (and Wolfram|Alpha) have had astronomical data for well over a decade. But what’s new now is astronomical computation fully integrated into the system. In many ways, our astro computation capabilities are modeled on our geo computation ones. But astro is substantially more complicated. Mountains don’t move (at least perceptibly), but planets certainly do. Relativity also isn’t important in geography, but it is in astronomy. And on the Earth, latitude and longitude are good standard ways to describe where things are. But in astronomy—especially with everything moving—describing where things are is much more complicated. Oh, and there’s the question of where things “are,” versus where things appear to be—because of effects ranging from light-propagation delays to refraction in the Earth’s atmosphere.
The key function for representing where astronomical things are is AstroPosition. Here’s where Mars is now:
What does that output mean? It’s very “here and now” oriented. By default, it’s telling me the azimuth (angle from north) and altitude (angle above the horizon) for Mars from where Here says I am, at the time specified by Now. How can I get a less “personal” representation of “where Mars is”? Because even if I just reevaluate my previous input now, I’ll get a slightly different answer, just because of the rotation of the Earth:
One thing to do is to use equatorial coordinates, which are based on a frame centered at the center of the Earth but not rotating with the Earth. (One direction is defined by the rotation axis of the Earth, the other by where the Sun is at the time of the spring equinox.) The result is the “astronomer-friendly” right ascension/declination position of Mars:
And maybe that’s good enough for a terrestrial astronomer. But what if you want to specify the position of Mars in a way that doesn’t refer to the Earth? Then you can use the now-standard ICRS frame, which is centered at the center of mass of the Solar System:
Often in astronomy the question is basically “which direction should I point my telescope in?”, and that’s something one wants to specify in spherical coordinates. But particularly if one’s “out and about in the Solar System” (say thinking about a spacecraft), it’s more useful to be able to give actual Cartesian coordinates for where one is:
And here are the raw coordinates (by default in astronomical units):
AstroPosition is backed by lots of computation, and in particular by ephemeris data that covers all planets and their moons, together with other substantial bodies in the Solar System:
By the way, particularly the first time you ask for the position of an obscure object, there may be some delay while the necessary ephemeris gets downloaded. The main ephemerides we use give data for the period 2000–2050. But we also have access to other ephemerides that cover much longer periods. So, for example, we can tell where Ganymede was when Galileo first observed it:
We also have position data for more than 100,000 stars, galaxies, pulsars and other objects—with many more coming soon:
Things get complicated very quickly. Here’s the position of Venus seen from Mars, using a frame centered at the center of Mars:
If we pick a particular point on Mars, then we can get the result in azimuth-altitude coordinates relative to the Martian horizon:
Another complication is that if you’re looking at something from the surface of the Earth, you’re looking through the atmosphere, and the atmosphere refracts light, making the position of the object look different. By default, AstroPosition takes account of this when you use coordinates based on the horizon. But you can switch it off, and then the results will be different—and, for example, for the Sun at sunset, substantially different:
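To get a feel for the size of the effect, one can use Bennett’s well-known approximation for refraction as a function of apparent altitude (a rough rule for typical atmospheric conditions; the model actually used here is more detailed):

```python
from math import radians, tan

def bennett_refraction_arcmin(apparent_alt_deg):
    """Approximate atmospheric refraction in arcminutes (Bennett's 1982 formula)."""
    h = apparent_alt_deg  # apparent altitude in degrees
    return 1.0 / tan(radians(h + 7.31 / (h + 4.4)))

horizon = bennett_refraction_arcmin(0.0)   # about 34 arcmin -- roughly a solar diameter
high = bennett_refraction_arcmin(45.0)     # about 1 arcmin
```

So at the horizon the shift is comparable to the apparent size of the Sun itself, which is why sunset is where the difference is so visible.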
And then there’s the speed of light, and relativity, to think about. Let’s say we want to know where Neptune “is” now. Well, do we mean where Neptune “actually is”, or do we mean “where we observe Neptune to be” based on light from Neptune coming to us? For frames referring to observations from Earth, we’re normally concerned with the case where we include the “light time” effect—and, yes, it does make a difference:
OK, so AstroPosition—which is the analog of GeoPosition—gives us a way to represent where things are, astronomically. The next important function to discuss is AstroDistance—the analog of GeoDistance.
This gives the current distance between Venus and Mars:
This is the current distance from where we are (according to Here) to the position of the Viking 2 lander on Mars:
This is the distance from Here to the star τ Ceti:
To be more precise, AstroDistance really tells us the distance from a certain object, to an observer, at a certain local time for the observer (and, yes, the fact that it’s local time matters because of light delays):
And, yes, things are quite precise. Here’s the distance to the Apollo 11 landing site on the Moon, computed 5 times with a 1-second pause in between, and shown to 10-digit precision:
This plots the distance to Mars for every day in the next 10 years:
Another function is AstroAngularSeparation, which gives the angular separation between two objects as seen from a given position. Here’s the result for Jupiter and Saturn (seen from the Earth) over a 20-year span:
In addition to being able to compute astronomical things, Version 13.2 includes first steps in visualizing astronomical things. There’ll be more on this in subsequent versions. But Version 13.2 already has some powerful capabilities.
As a first example, here’s a part of the sky around Betelgeuse as seen right now from where I am:
Zooming out, one can see more of the sky:
There are lots of options for how things should be rendered. Here we’re seeing a realistic image of the sky, with grid lines superimposed, aligned with the equator of the Earth:
And here we’re seeing a more whimsical interpretation:
Just like for maps of the Earth, projections matter. Here’s a Lambert azimuthal projection of the whole sky:
The blue line shows the orientation of the Earth’s equator, the yellow line shows the plane of the ecliptic (which is basically the plane of the Solar System), and the red line shows the plane of our galaxy (which is where we see the Milky Way).
If we want to know what we actually “see in the sky” we need a stereographic projection (in this case centered on the south direction):
There’s a lot of detail in the astronomical data and computations we have (and even more will be coming soon). So, for example, if we zoom in on Jupiter we can see the positions of its moons (though their disks are too small to be rendered here):
It’s fun to see how this corresponds to Galileo’s original observation of these moons more than 400 years ago. This is from Galileo:
The old typesetting does cause a little trouble:
But the astronomical computation is more timeless. Here are the computed positions of the moons of Jupiter from when Galileo said he saw them, in Padua:
And, yes, the results agree!
By the way, here’s another computation that could be verified soon. This is the time of maximum eclipse for an upcoming solar eclipse:
And here’s what it will look like from a particular location right at that time:
Dates are complicated. Even without any of the issues of relativity that we have to deal with for astronomy, it’s surprisingly difficult to consistently “name” times. What time zone are you talking about? What calendar system will you use? And so on. Oh, and then what granularity of time are you talking about? A day? A week? A month (whatever that means)? A second? An instantaneous moment (or perhaps a single elementary time from our Physics Project)?
These issues arise in what one might imagine would be trivial functions: the new RandomDate and RandomTime in Version 13.2. If you don’t say otherwise, RandomDate will give an instantaneous moment of time, in your current time zone, with your default calendar system, etc.—randomly picked within the current year:
But let’s say you want a random date in June 1988. You can do that by giving the date object that represents that month:
OK, but let’s say you don’t want an instant of time then, but instead you want a whole day. The new option DateGranularity allows this:
You can ask for a random time in the next 6 hours:
Or 10 random times:
You can also ask for a random date within some interval—or collection of intervals—of dates:
And, needless to say, we correctly sample uniformly over any collection of intervals:
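The essential idea behind uniform sampling over a collection of intervals can be sketched in plain Python: weight each interval by its length, pick an interval accordingly, then pick a point within it. (This sketch uses plain numbers rather than real date objects.)

```python
import random

def random_in_intervals(intervals, rng=random):
    """Pick a point uniformly from a union of disjoint [lo, hi) intervals,
    weighting each interval by its length so the whole union is sampled
    uniformly."""
    lengths = [hi - lo for lo, hi in intervals]
    r = rng.uniform(0, sum(lengths))
    for (lo, hi), length in zip(intervals, lengths):
        if r < length:
            return lo + r
        r -= length
    return intervals[-1][1]  # guard against floating-point edge cases

rng = random.Random(0)
# An interval of length 1 and one of length 3: the second should get
# roughly three times as many samples.
samples = [random_in_intervals([(0, 1), (10, 13)], rng) for _ in range(1000)]
```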
Another area of almost arbitrary complexity is units. And over the course of many years we’ve systematically solved problem after problem in supporting basically every kind of unit that’s in use (now more than 5000 base types). But one holdout has involved temperature. In physics textbooks, it’s traditional to carefully distinguish absolute temperatures, measured in kelvins, from temperature scales, like degrees Celsius or Fahrenheit. And that’s important, because while absolute temperatures can be added, subtracted, multiplied etc. just like other units, temperature scales on their own cannot. (Multiplying by 0° C to get 0 for something like an amount of heat would be very wrong.) On the other hand, differences in temperature—even measured in Celsius—can be multiplied. How can all this be untangled?
In previous versions we had a whole different kind of unit (or, more precisely, different physical quantity dimension) for temperature differences (much as mass and time have different dimensions). But now we’ve got a better solution. We’ve basically introduced new units—but still “temperature-dimensioned” ones—that represent temperature differences. And we’ve introduced a new notation (a little Δ subscript) to indicate them:
If you take a difference between two temperatures, the result will have temperature-difference units:
But if you convert this to an absolute temperature, it’ll just be in ordinary temperature units:
And with this unscrambled, it’s actually possible to do arbitrary arithmetic even on temperatures measured on any temperature scale—though the results also come back as absolute temperatures:
It’s worth understanding that an absolute temperature can be converted either to a temperature scale value, or a temperature scale difference:
All of this means that you can now use temperatures on any scale in formulas, and they’ll just work:
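The algebra being untangled here, where scale temperatures subtract to give differences and differences can be freely scaled while scale temperatures cannot, can be mimicked with two small Python classes. (The class names are illustrative, not any real API.)

```python
class TempDelta:
    """A temperature difference: freely scalable and addable."""
    def __init__(self, celsius):
        self.celsius = celsius
    def __add__(self, other):
        return TempDelta(self.celsius + other.celsius)
    def __mul__(self, k):
        return TempDelta(self.celsius * k)

class Temp:
    """A point on the Celsius scale. Subtracting two points gives a
    TempDelta; multiplication of a scale point is deliberately absent,
    since (say) 2 * 0 °C would be meaningless."""
    def __init__(self, celsius):
        self.celsius = celsius
    def __sub__(self, other):
        if isinstance(other, TempDelta):
            return Temp(self.celsius - other.celsius)
        return TempDelta(self.celsius - other.celsius)
    def to_kelvin(self):
        return self.celsius + 273.15

delta = Temp(25.0) - Temp(10.0)   # a 15 °C *difference*
moved = Temp(20.0) - delta        # a point on the scale again
```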
Almost any algebraic computation ends up somehow involving polynomials. And polynomials have been a well-optimized part of Mathematica and the Wolfram Language since the beginning. And in fact, little has needed to be updated in the fundamental operations we do with them in more than a quarter of a century. But now in Version 13.2—thanks to new algorithms and new data structures, and new ways to use modern computer hardware—we’re updating some core polynomial operations, and making them dramatically faster. And, by the way, we’re getting some new polynomial functionality as well.
Here is a product of two polynomials, expanded out:
Factoring polynomials like this is pretty much instantaneous, and has been ever since Version 1:
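The expand/factor round trip at the heart of this can be sketched with nothing but coefficient lists; here is a stdlib Python illustration of the expansion direction, verifying the textbook factorization of x³ − 1 (the actual polynomials used above aren’t shown here):

```python
def poly_mul(p, q):
    """Multiply univariate polynomials given as coefficient lists,
    lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# (x - 1) * (x^2 + x + 1) expands to x^3 - 1
product = poly_mul([-1, 1], [1, 1, 1])
```

Factoring is the much harder inverse problem, which is exactly where the Version 13.2 speedups apply.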
But now let’s make this bigger:
There are 999 terms in the expanded polynomial:
Factoring this isn’t an easy computation, and in Version 13.1 takes about 19 seconds:
But now, in Version 13.2, the same computation takes 0.3 seconds—nearly 60 times faster:
It’s pretty rare that anything gets 60x faster. But this is one of those cases, and in fact for still larger polynomials, the ratio will steadily increase further. But is this just something that’s only relevant for obscure, big polynomials? Well, no. Not least because it turns out that big polynomials show up “under the hood” in all sorts of important places. For example, the innocuous-seeming object
can be manipulated as an algebraic number, but with minimal polynomial:
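As a concrete stand-in (the actual object above isn’t shown here), the number √2 + √3 has the degree-4 minimal polynomial x⁴ − 10x² + 1, which can be checked numerically in plain Python:

```python
import math

# sqrt(2) + sqrt(3) is a hypothetical example of an "innocuous-seeming"
# algebraic number; its minimal polynomial is x^4 - 10 x^2 + 1.
alpha = math.sqrt(2) + math.sqrt(3)
value = alpha**4 - 10 * alpha**2 + 1   # should be ~0 up to rounding
```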
In addition to factoring, Version 13.2 also dramatically increases the efficiency of polynomial resultants, GCDs, discriminants, etc. And all of this makes possible a transformative update to polynomial linear algebra, i.e. operations on matrices whose elements are (univariate) polynomials.
Here’s a matrix of polynomials:
And here’s a power of the matrix:
And the determinant of this:
In Version 13.1 this didn’t look nearly as nice; the result came out unexpanded as:
Both size and speed are dramatically improved in Version 13.2. Here’s a larger case—where in 13.1 the computation takes more than an hour, and the result has a staggering leaf count of 178 billion
but in Version 13.2 it’s 13,000 times faster, and 60 million times smaller:
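The underlying operation—a determinant of a matrix whose entries are polynomials—can be sketched in stdlib Python for the 2×2 case, using coefficient lists (this illustrates the operation, not the new fast algorithms):

```python
def poly_mul(p, q):
    """Multiply univariate polynomials as coefficient lists, low degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_sub(p, q):
    """Subtract coefficient lists, padding to equal length."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

# The matrix [[x, 1], [1, x]]; its determinant is a*d - b*c = x^2 - 1
m = [[[0, 1], [1]], [[1], [0, 1]]]
det = poly_sub(poly_mul(m[0][0], m[1][1]), poly_mul(m[0][1], m[1][0]))
```

For larger matrices and higher-degree entries, intermediate expression swell is what makes naive approaches blow up—hence the importance of the new algorithms.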
Polynomial linear algebra is used “under the hood” in a remarkable range of areas, particularly in handling linear differential equations, difference equations, and their symbolic solutions. And in Version 13.2, not only polynomial MatrixPower and Det, but also LinearSolve, Inverse, RowReduce, MatrixRank and NullSpace have been dramatically sped up.
In addition to the dramatic speed improvements, Version 13.2 also adds a polynomial feature for which I, for one, happen to have been waiting for more than 30 years: multivariate polynomial factoring over finite fields:
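Part of what makes factoring over finite fields genuinely different from the rational case is identities like the “freshman’s dream”: over GF(2), x² + y² factors as (x + y)², which it doesn’t over the rationals. A tiny stdlib Python sketch of multivariate arithmetic over GF(2) (representing a polynomial as a set of exponent tuples) shows the cancellation that makes this work:

```python
def poly_mul_gf2(p, q):
    """Multiply two multivariate polynomials with coefficients in GF(2).
    A polynomial is a set of exponent tuples (each present term has
    coefficient 1); since 1 + 1 = 0 in GF(2), a term that appears twice
    cancels."""
    result = set()
    for e1 in p:
        for e2 in q:
            term = tuple(a + b for a, b in zip(e1, e2))
            result ^= {term}   # symmetric difference: add-twice removes
    return result

x_plus_y = {(1, 0), (0, 1)}              # x + y, exponents in (x, y)
square = poly_mul_gf2(x_plus_y, x_plus_y)  # the cross terms 2xy vanish
```

This is only an illustration of the arithmetic involved, not of the factoring algorithm itself.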
Indeed, looking in our archives I find many requests stretching back to at least 1990—from quite a range of people—for this capability, even though, charmingly, a 1991 internal note states:
Yup, that was right. But 31 years later, in Version 13.2, it’s done!
The Wolfram Language has had integrated neural net technology since 2015. Sometimes this is automatically used inside other Wolfram Language functions, like ImageIdentify, SpeechRecognize or Classify. But you can also build your own neural nets using the symbolic specification language with functions like NetChain and NetGraph—and the Wolfram Neural Net Repository provides a continually updated source of neural nets that you can immediately use, and modify, in the Wolfram Language.
But what if there’s a neural net out there that you just want to run from within the Wolfram Language, but don’t need to have represented in modifiable (or trainable) symbolic Wolfram Language form—like you might run an external program executable? In Version 13.2 there’s a new construct NetExternalObject that allows you to run trained neural nets “from the wild” in the same integrated framework used for actual Wolfram-Language-specified neural nets.
NetExternalObject so far supports neural nets that have been defined in the ONNX neural net exchange format, which can easily be generated from frameworks like PyTorch, TensorFlow, Keras, etc. (as well as from Wolfram Language). One can get a NetExternalObject just by importing a .onnx file. Here’s an example from the web:
If we “open up” the summary for this object we see what basic tensor structure of input and output it deals with:
But to actually use this network we have to set up encoders and decoders suitable for the actual operation of this particular network—with the particular encoding of images that it expects:
Now we just have to run the encoder, the external network and the decoder—to get (in this case) a cartoonized Mount Rushmore:
Often the “wrapper code” for the NetExternalObject will be a bit more complicated than in this case. But the built-in NetEncoder and NetDecoder functions typically provide a very good start, and in general the symbolic structure of the Wolfram Language (and its integrated ability to represent images, video, audio, etc.) makes the process of importing typical neural nets “from the wild” surprisingly straightforward. And once imported, such neural nets can be used directly, or as components of other functions, anywhere in the Wolfram Language.
We first introduced trees as a fundamental structure in Version 12.3, and we’ve been enhancing them ever since. In Version 13.1 we added many options for determining how trees are displayed, but in Version 13.2 we’re adding another, very important one: the ability to elide large subtrees.
Here’s a size-200 random tree with every branch shown:
And here’s the same tree with every node being told to display a maximum of 3 children:
And, actually, tree elision is convenient enough that in Version 13.2 we’re doing it by default for any node that has more than 10 children—and we’ve introduced the global $MaxDisplayedChildren to determine what that default limit should be.
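The display behavior described here—showing at most a fixed number of children and summarizing the rest—can be sketched in a few lines of Python (the function name and marker format are illustrative, not the real API):

```python
def elide_children(children, max_children=10):
    """Show at most max_children subtrees, replacing the rest with a
    single elision marker saying how many were hidden."""
    if len(children) <= max_children:
        return children
    hidden = len(children) - max_children
    return children[:max_children] + [f"... {hidden} more ..."]

elided = elide_children([f"child{i}" for i in range(12)], max_children=3)
```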
Another new tree feature in Version 13.2 is the ability to create trees from your file system. Here’s a tree that goes down 3 directory levels from my Wolfram Desktop installation directory:
Is there still more to do in calculus? Yes! Sometimes the goal is, for example, to solve more differential equations. And sometimes it’s to solve existing ones better. The point is that there may be many different possible forms that can be given for a symbolic solution. And often the forms that are easiest to generate aren’t the ones that are most useful or convenient for subsequent computation, or the easiest for a human to understand.
In Version 13.2 we’ve made dramatic progress in improving the form of solutions that we give for most kinds of differential equations, and systems of differential equations.
Here’s an example. In Version 13.1 this is an equation we could solve symbolically, but the solution we give is long and complicated:
But now, in 13.2, we immediately give a much more compact and useful form of the solution:
The simplification is often even more dramatic for systems of differential equations. And our new algorithms cover a full range of differential equations with constant coefficients—which are what go by the name LTI (linear time-invariant) systems in engineering, and are used quite universally to represent electrical, mechanical, chemical, etc. systems.
In Version 13.1 we introduced symbolic solutions of fractional differential equations with constant coefficients; now in Version 13.2 we’re extending this to asymptotic solutions of fractional differential equations with both constant and polynomial coefficients. Here’s an Airy-like differential equation, but generalized to the fractional case with a Caputo fractional derivative:
The Wolfram Language has had basic built-in support for cluster analysis since the mid-2000s. But in more recent times—with increased sophistication from machine learning—we’ve been adding more and more sophisticated forms of cluster analysis. But it’s one thing to do cluster analysis; it’s another to analyze the cluster analysis one’s done, to try to better understand what it means, how to optimize it, etc. In Version 13.2 we’re adding the function ClusteringMeasurements to do this, as well as adding more options for cluster analysis, and enhancing the automation we have for method and parameter selection.
Let’s say we do cluster analysis on some data, asking for a sequence of different numbers of clusters:
Which is the “best” number of clusters? One measure of this is to compute the “silhouette score” for each possible clustering, and that’s something that ClusteringMeasurements can now do:
As is fairly typical in statistics-related areas, there are lots of different scores and criteria one can use—ClusteringMeasurements supports a wide variety of them.
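The silhouette score itself has a simple definition: for each point, compare its mean distance a to its own cluster against its mean distance b to the nearest other cluster, scoring (b − a)/max(a, b). Here’s a stdlib Python sketch for 1-D data (real implementations handle higher dimensions and many more options):

```python
def silhouette_score(points, labels):
    """Mean silhouette over all points, for 1-D data and a flat labeling."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    def mean_dist(p, members, exclude_self=False):
        n = len(members) - (1 if exclude_self else 0)
        return sum(abs(p - q) for q in members) / n

    scores = []
    for p, lab in zip(points, labels):
        if len(clusters[lab]) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = mean_dist(p, clusters[lab], exclude_self=True)
        b = min(mean_dist(p, clusters[other])
                for other in clusters if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
good = silhouette_score(data, [0, 0, 0, 1, 1, 1])  # the "natural" clustering
bad = silhouette_score(data, [0, 1, 0, 1, 0, 1])   # a scrambled clustering
```

The “natural” two-cluster labeling scores close to 1, while the scrambled one scores near zero—which is exactly the kind of comparison used above to pick the best number of clusters.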
Our goal with Wolfram Language is to make as much as possible computable. Version 13.2 adds yet another domain—chess—supporting import of the FEN and PGN chess formats:
PGN files typically contain many games, each represented as a list of FEN strings. This counts the number of games in a particular PGN file:
Here’s the first game in the file:
Given this, we can now use Wolfram Language’s video capabilities to make a video of the game:
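The FEN format mentioned above is simple enough to probe by hand: the first whitespace-separated field encodes the board rank by rank, with letters for pieces and digits for runs of empty squares. A small stdlib Python sketch:

```python
def fen_piece_count(fen):
    """Count the pieces on the board encoded by a FEN string: in the
    first (piece-placement) field, letters are pieces and digits are
    runs of empty squares."""
    board = fen.split()[0]
    return sum(1 for ch in board if ch.isalpha())

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
```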
Back in 1979 when I started building SMP—the forerunner to the Wolfram Language—I did something that to some people seemed very bold, perhaps even reckless: I set up the system to fundamentally do “infinite evaluation”, that is, to continue using whatever definitions had been given until nothing more could be done. In other words, the process of evaluation would always go on until a fixed point was reached. “But what happens if x doesn’t have a value, and you say x = x + 1?” people would ask. “Won’t the system blow up in that case?” Well, in some sense yes. But I took a calculated gamble that the benefits of infinite evaluation for ordinary computations that people actually want to do would vastly outweigh any possible issues with what seemed like “pointless corner cases” such as x = x + 1. Well, 43 years later I think I can say with some confidence that that gamble worked out. The concept of infinite evaluation—combined with the symbolic structure of the Wolfram Language—has been a source of tremendous power, and most users simply never run into, and never have to think about, the x = x + 1 “corner case”.
However, if you type x = x + 1 the system clearly has to do something. And in a sense the purest thing to do would just be to continue computing forever. But 34 years ago that led to a rather disastrous problem on actual computers—and in fact still does today. Because in general this kind of repeated evaluation is a recursive process, that ultimately has to be implemented using the call stack set up for every instance of a program by the operating system. But the way operating systems work (still!) is to allocate only a fixed amount of memory for the stack—and if this is overrun, the operating system will simply make your program crash (or, in earlier times, the operating system itself might crash). And this meant that ever since Version 1, we’ve needed to have a limit in place on infinite evaluation. In early versions we tried to give the “result of the computation so far”, wrapped in Hold. Back in Version 10, we started just returning a held version of the original expression:
But even this is in a sense not safe. Because with other infinite definitions in place, one can end up with a situation where even trying to return the held form triggers additional infinite computational processes.
In recent times, particularly with our exploration of multicomputation, we’ve decided to revisit the question of how to limit infinite computations. At some theoretical level, one might imagine explicitly representing infinite computations using things like transfinite numbers. But that’s fraught with difficulty, and manifest undecidability (“Is this infinite computation output really the same as that one?”, etc.) But in Version 13.2, as the beginning of a new, “purely symbolic” approach to “runaway computation” we’re introducing the construct TerminatedEvaluation—that just symbolically represents, as it says, a terminated computation.
So here’s what now happens with x = x + 1:
A notable feature of this is that it’s “independently encapsulated”: the termination of one part of a computation doesn’t affect others, so that, for example, we get:
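A highly simplified model of this behavior—not the actual kernel mechanism—can be written as a fixed-point rewriter in Python: apply rules until nothing changes, but return a symbolic marker when a (possibly infinite) rewriting chain exceeds a depth limit. Crucially, the marker is local, so other parts of an expression still evaluate normally:

```python
TERMINATED = "TerminatedEvaluation"

def evaluate(expr, rules, depth=0, max_depth=200):
    """Evaluate to a fixed point: keep applying rules until nothing
    changes, returning a symbolic marker if rewriting exceeds a depth
    limit. Tuples model compound expressions."""
    if depth > max_depth:
        return TERMINATED
    if not isinstance(expr, tuple) and expr in rules:
        return evaluate(rules[expr], rules, depth + 1, max_depth)
    if isinstance(expr, tuple):
        return tuple(evaluate(e, rules, depth + 1, max_depth) for e in expr)
    return expr

# A self-referential definition, in the spirit of x = x + 1
rules = {"x": "x"}
```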
There’s a complicated relation between terminated evaluations and lazy evaluation, and we’re working on some interesting and potentially powerful new capabilities in this area. But for now, TerminatedEvaluation is an important construct for improving the “safety” of the system in the corner case of runaway computations. And introducing it has allowed us to fix what seemed for many years like “theoretically unfixable” issues around complex runaway computations.
TerminatedEvaluation is what you run into if you hit system-wide “guard rails” like $RecursionLimit. But in Version 13.2 we’ve also tightened up the handling of explicitly requested aborts—by adding the new option PropagateAborts to CheckAbort. Once an abort has been generated—either directly by using Abort[ ], or as the result of something like TimeConstrained[ ] or MemoryConstrained[ ]—there’s a question of how far that abort should propagate. By default, it’ll propagate all the way up, so your whole computation will end up being aborted. But ever since Version 2 (in 1991) we’ve had the function CheckAbort, which checks for aborts in the expression it’s given, then stops further propagation of the abort.
But there was always a lot of trickiness around the question of things like TimeConstrained[ ]. Should aborts generated by these be propagated the same way as Abort[ ] aborts or not? In Version 13.2 we’ve now cleaned all of this up, with an explicit option PropagateAborts for CheckAbort. With PropagateAborts→True all aborts are propagated, whether initiated by Abort[ ] or TimeConstrained[ ] or whatever. PropagateAborts→False propagates no aborts. But there’s also PropagateAborts→Automatic, which propagates aborts from TimeConstrained[ ] etc., but not from Abort[ ].
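A rough Python model of this propagation question uses an exception for the abort signal and a flag for whether the checking function swallows it or re-raises it (names here are illustrative, modeled loosely on CheckAbort and PropagateAborts):

```python
class Abort(Exception):
    """Models an Abort[] signal propagating up through a computation."""

def check_abort(thunk, failexpr="$Aborted", propagate=False):
    """Run thunk; on abort, either swallow it and return failexpr
    (propagate=False) or let it keep propagating (propagate=True)."""
    try:
        return thunk()
    except Abort:
        if propagate:
            raise
        return failexpr

def aborting_computation():
    raise Abort()
```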
In our never-ending process of extending and polishing the Wolfram Language we’re constantly on the lookout for “lumps of computational work” that people repeatedly want to do, and for which we can create functions with easy-to-understand names. These days we often prototype such functions in the Wolfram Function Repository, then further streamline their design, and eventually implement them in the permanent core Wolfram Language. In Version 13.2 just two new basic list-manipulation functions came out of this process: PositionLargest and PositionSmallest.
We’ve had the function Position since Version 1, as well as Max. But something I’ve often found myself needing to do over the years is to combine these to answer the question: “Where is the max of that list?” Of course it’s not hard to do this in the Wolfram Language—Position[list, Max[list]] basically does it. But there are some edge cases and extensions to think about, and it’s convenient just to have one function to do this. And, what’s more, now that we have functions like TakeLargest, there’s an obvious, consistent name for the function: PositionLargest. (And by “obvious”, I mean obvious after you hear it; the archive of our livestreamed design review meetings will reveal that—as is so often the case—it actually took us quite a while to settle on the “obvious”.)
Here’s PositionLargest in action:
And, yes, it has to return a list, to deal with “ties”:
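The “lump of computational work” involved is small but has exactly the tie-handling edge case mentioned; a Python sketch (note that Wolfram Language positions are 1-based, while this sketch is 0-based):

```python
def position_largest(xs):
    """Positions of the maximal elements, returning all ties as a list."""
    if not xs:
        return []
    m = max(xs)
    return [i for i, v in enumerate(xs) if v == m]

positions = position_largest([7, 1, 7, 3])
```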
Everything in the Wolfram Language is a symbolic expression. But different symbolic expressions are displayed differently, which is, of course, very useful. So, for example, a graph isn’t displayed in the raw symbolic form
but rather as a graph:
But let’s say you’ve got a whole collection of visual objects in a notebook. How can you tell what they “really are”? Well, you can click them, and then see what color their borders are. It’s subtle, but I’ve found one quickly gets used to noticing at least the kinds of objects one commonly uses. And in Version 13.2 we’ve made some additional distinctions, notably between images and graphics.
So, yes, the object above is a Graph—and you can tell that because it has a purple border when you click it:
This is a Graphics object, which you can tell because it’s got an orange border:
And here, now, is an Image object, with a light blue border:
For some things, color hints just don’t work, because people can’t remember which color means what. But for some reason, adding color borders to visual objects seems to work very well; it provides the right level of hinting, and the fact that one often sees the color when it’s obvious what the object is helps cement a memory of the color.
In case you’re wondering, there are some others already in use for borders—and more to come. Trees are green (though, yes, ours by default grow down). Meshes are brown:
How do we make it as easy as possible to type correct Wolfram Language code? This is a question we’ve been working on for years, gradually inventing more and more mechanisms and solutions. In Version 13.2 we’ve made some small tweaks to a mechanism that’s actually been in the system for many years, but the changes we’ve made have a substantial effect on the experience of typing code.
One of the big challenges is that code is typed “linearly”—essentially (apart from 2D constructs) from left to right. But (just like in natural languages like English) the meaning is defined by a more hierarchical tree structure. And one of the issues is to know how something you typed fits into the tree structure.
Some of this structure is visually obvious quite locally in the “linear” code you typed. But sometimes what defines the tree structure is quite far away. For example, you might have a function with several arguments that are each large expressions. And when you’re looking at one of the arguments it may not be obvious what the overall function is. And part of what we’re now emphasizing more strongly in Version 13.2 is dynamic highlighting that shows you “what function you’re in”.
It’s highlighting that appears when you click. So, for example, this is the highlighting you get clicking at several different positions in a simple expression:
Here’s an example “from the wild” showing you that if you type at the position of the cursor, you’ll be adding an argument to the ContourPlot function:
But now let’s click in a different place:
Here’s a smaller example:
We first introduced the notebook interface in Version 1 back in 1988. And already in that version we had many of the current features of notebooks—like cells and cell groups, cell styles, etc. But over the past 34 years we’ve been continuing to tweak and polish the notebook interface to make it ever smoother to use.
In Version 13.2 we have some minor but convenient additions. We’ve had the Divide Cell menu item (cmd-shift-D) for more than 30 years. And the way it’s always worked is that you click where you want a cell to be divided. Meanwhile, we’ve always had the ability to put multiple Wolfram Language inputs into a single cell. And while sometimes it’s convenient to type code that way, or import it from elsewhere like that, it makes better use of all our notebook and cell capabilities if each independent input is in its own cell. And now in Version 13.2 Divide Cell can make it like that, analyzing multiline inputs to divide them between complete inputs that occur on different lines:
Similarly, if you’re dealing with text instead of code, Divide Cell will now divide at explicit line breaks—that might correspond to paragraphs.
In a completely different area, Version 13.1 added a new default toolbar for notebooks, and in Version 13.2 we’re beginning the process of steadily adding features to this toolbar. The main obvious feature that’s been added is a new interactive tool for changing frames in cells. It’s part of the Cell Appearance item in the toolbar:
Just click a side of the frame style widget and you’ll get a tool to edit that frame style—and you’ll immediately see any changes reflected in the notebook:
If you want to edit all the sides, you can lock the settings together with:
Cell frames have always been a useful mechanism for delineating, highlighting or otherwise annotating cells in notebooks. But in the past it’s been comparatively difficult to customize them beyond what’s in the stylesheet you’re using. With the new toolbar feature in Version 13.2 we’ve made it very easy to work with cell frames, making it realistic for custom cell frames to become a routine part of notebook content.
We’ve worked hard to have code you write in the Wolfram Language immediately run efficiently. But by taking the extra one-time effort to invoke the Wolfram Language compiler—telling it more details about how you expect to use your code—you can often make your code run more efficiently, and sometimes dramatically so. In Version 13.2 we’ve been continuing the process of streamlining the workflow for using the compiler, and for unifying code that’s set up for compilation, and code that’s not.
The primary work you have to do in order to make the best use of the Wolfram Language compiler is in specifying types. One of the important features of the Wolfram Language in general is that a symbol x can just as well be an integer, a list of complex numbers or a symbolic representation of a graph. But the main way the compiler adds efficiency is by being able to assume that x is, say, always going to be an integer that fits into a 64-bit computer word.
The Wolfram Language compiler has a sophisticated symbolic language for specifying types. Thus, for example
is a symbolic specification for the type of a function that takes two 64bit integers as input, and returns a single one. TypeSpecifier[ ... ] is a symbolic construct that doesn’t evaluate on its own, and can be used and manipulated symbolically. And it’s the same story with Typed[ ... ], which allows you to annotate an expression to say what type it should be assumed to be.
But what if you want to write code which can either be evaluated in the ordinary way, or fed to the compiler? Constructs like Typed[ ... ] are for permanent annotation. In Version 13.2 we’ve added TypeHint which allows you to give a hint that can be used by the compiler, but will be ignored in ordinary evaluation.
This compiles a function assuming that its argument x is an 8-bit integer:
By default, the 100 here is assumed to be represented as a 64-bit integer. But with a type hint, we can say that it too should be represented as an 8-bit integer:
150 doesn’t fit in an 8-bit integer, so the compiled code can’t be used:
But what’s relevant here is that the function we compiled can be used not only for compilation, but also in ordinary evaluation, where the TypeHint effectively just “evaporates”:
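The range check behind the 100-versus-150 behavior above is easy to model in Python (the function name is illustrative; this is just the arithmetic, not the compiler machinery):

```python
INT8_MIN, INT8_MAX = -128, 127

def as_int8(n):
    """Accept n only if it fits the hinted 8-bit signed type, mimicking
    compiled code rejecting arguments outside the declared range."""
    if not (INT8_MIN <= n <= INT8_MAX):
        raise OverflowError(f"{n} does not fit in an 8-bit integer")
    return n

ok = as_int8(100)   # fits; as_int8(150) would raise
```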
As the compiler develops, it’s going to be able to do more and more type inferencing on its own. But it’ll always be able to get further if the user gives it some hints. For example, if x is a 64-bit integer, what type should be assumed for x^x? There are certainly values of x for which x^x won’t fit in a 64-bit integer. But the user might know those won’t show up. And so they can give a type hint that says that the x^x should be assumed to fit in a 64-bit integer, and this will allow the compiler to do much more with it.
It’s worth pointing out that there are always going to be limitations to type inferencing, because, in a sense, inferring types requires proving theorems, and there can be theorems that have arbitrarily long proofs, or no proofs at all in a certain axiomatic system. For example, imagine asking whether the type of a zero of the Riemann zeta function has a certain imaginary part. To answer this, the type inferencer would have to solve the Riemann hypothesis. But if the user just wanted to assume the Riemann hypothesis, they could—at least in principle—use TypeHint.
TypeHint is a wrapper that means something to the compiler, but “evaporates” in ordinary evaluation. Version 13.2 adds IfCompiled, which lets you explicitly delineate code that should be used with the compiler, and code that should be used in ordinary evaluation. This is useful when, for example, ordinary evaluation can use a sophisticated built-in Wolfram Language function, but compiled code will be more efficient if it effectively builds up similar functionality from lower-level primitives.
In its simplest form FunctionCompile lets you take an explicit pure function and make a compiled version of it. But what if you have a function where you’ve already assigned downvalues to it, like:
Now in Version 13.2 you can use the new DownValuesFunction wrapper to give a function like this to FunctionCompile:
This is important because it lets you set up a whole network of definitions using := etc., then have them automatically be fed to the compiler. In general, you can use DownValuesFunction as a wrapper to tag any use of a function you’ve defined elsewhere. It’s somewhat analogous to the KernelFunction wrapper that you can use to tag built-in functions, and specify what types you want to assume for them in code that you’re feeding to the compiler.
Let’s say you’re building a substantial piece of functionality that might include compiled Wolfram Language code, external libraries, etc. In Version 13.2 we’ve added capabilities to make it easy to “package up” such functionality, and for example deploy it as a distributable paclet.
As an example of what can be done, this installs a paclet called GEOSLink that includes the GEOS external library and compilerbased functionality to access this:
Now that the paclet is installed, we can use a file from it to set up a whole collection of functions that are defined in the paclet:
Given the code in the paclet we can now just start calling functions that use the GEOS library:
It’s quite nontrivial that this “just works”. Because for it to work, the system has to have been told to load and initialize the GEOS library, as well as convert the Wolfram Language polygon geometry to a form suitable for GEOS. The returned result is also nontrivial: it’s essentially a handle to data that’s inside the GEOS library, but being memory-managed by the Wolfram Language system. Now we can take this result, and call a GEOS library function on it, using the Wolfram Language binding that’s been defined for that function:
This gets the result “back from GEOS” into pure Wolfram Language form:
How does all this work? This goes to the directory for the installed GEOSLink paclet on my system:
There’s a subdirectory called LibraryResources that contains dynamic libraries suitable for my computer system:
The libgeos libraries are the raw external GEOS libraries “from the wild”. The GEOSLink library was built by the Wolfram Language compiler from Wolfram Language code that defines the “glue” for interfacing between the GEOS library and the Wolfram Language:
What is all this? It’s all based on new functionality in Version 13.2. And ultimately what it does is create a CompiledComponent construct (new in Version 13.2). A CompiledComponent construct represents a bundle of compilable functionality with elements like "Declarations", "InstalledFunctions", "LibraryFunctions", "LoadingEpilogs" and "ExternalLibraries". And in a typical case—like the one shown here—one creates (or adds to) a CompiledComponent using DeclareCompiledComponent.
Here’s an example of part of what’s added by DeclareCompiledComponent:
First there’s a declaration of an external (in this case GEOS) library function, giving its type signature. Then there’s a declaration of a compilable Wolfram Language function GEOSUnion that directly calls the GEOSUnion function in the external library, defining it to take a certain memory-managed data structure as input, and return a similarly memory-managed object as output.
From this source code, all you do to build an actual library is use BuildCompiledComponent. And given this library you can start calling external GEOS functions directly from toplevel Wolfram Language code, as we did above.
But the CompiledComponent object does something else as well. It also sets up everything you need to be able to write compilable code that calls the same functions as you can within the built library.
The bottom line is that with all the new functionality in Version 13.2 it’s become dramatically easier to integrate compiled code, external libraries etc. and to make them conveniently distributable. It’s a fairly remarkable simplification of what was previously a time-consuming and complex software engineering challenge. And it’s a good example of how powerful it can be to set up symbolic specifications in the Wolfram Language and then use our compiler technology to automatically create and deploy code defined by them.
In addition to all the things we’ve discussed, there are other updates and enhancements that have arrived in the six months since Version 13.1 was released. A notable example is that there have been no fewer than 241 new functions added to the Wolfram Function Repository during that time, providing specific add-on functionality in a whole range of areas:
But within the core Wolfram Language itself, Version 13.2 also adds lots of little new capabilities that polish and round out existing functionality. Here are some examples:
Parallelize now supports automatic parallelization of a variety of new functions, particularly related to associations.
Blurring now joins DropShadowing as a 2D graphics effect.
MeshRegion, etc. can now store vertex coloring and vertex normals to allow enhanced visualization of regions.
RandomInstance does much better at quickly finding nondegenerate examples of geometric scenes that satisfy specified constraints.
ImageStitch now supports stitching images onto spherical and cylindrical canvases.
Functions like Definition and Clear that operate on symbols now consistently handle lists and string patterns.
FindShortestTour has a direct way to return individual features of the result, rather than always packaging them together in a list.
PersistentSymbol and LocalSymbol now allow reassignment of parts using functions like AppendTo.
SystemModelMeasurements now gives diagnostics such as rise time and overshoot for SystemModel control systems.
Import of the OSM (OpenStreetMap) and GXF geo formats is now supported.
Last week it was 34 years since the original launch of Mathematica and what’s now the Wolfram Language. And through all those years we’ve energetically continued building further and further, adding ever more capabilities, and steadily extending the domain of the computational paradigm.
In recent years we’ve established something of a rhythm, delivering the fruits of our development efforts roughly twice a year. We released Version 13.0 on December 13, 2021. And now, roughly six months later, we’re releasing Version 13.1. As usual, even though it’s a “.1” release, it’s got a lot of new (and updated) functionality, some of which we’ve worked on for many years but finally now brought to fruition.
For me it’s always exciting to see what we manage to deliver in each new version. And in Version 13.1 we have 90 completely new functions—as well as 203 existing functions with substantial updates. And beyond what appears in specific functions, there’s also major new functionality in Version 13.1 in areas like user interfaces and the compiler.
The Wolfram Language as it exists today encompasses a vast range of functionality. But its great power comes not just from what it contains, but also from how coherently everything in it fits together. And for nearly 36 years I’ve taken it as a personal responsibility to ensure that that coherence is maintained. It’s taken both great focus and lots of deep intellectual work. But as I experience them every day in my use of the Wolfram Language, I’m proud of the results.
And for the past four years I’ve been sharing the “behind the scenes” of how it’s achieved—by livestreaming our Wolfram Language design review meetings. It’s an unprecedented level of openness—and engagement with the community. In designing Version 13.1 we’ve done 90 livestreams—lasting more than 96 hours. And in opening up our process we’re providing visibility not only into what was built for Version 13.1, but also into why it was built, and how decisions about it were made.
But, OK, so what finally is in Version 13.1? Let’s talk about some highlights….
From the very beginning of Mathematica and the Wolfram Language we’ve had the concept of listability: if you add two lists, for example, their corresponding elements will be added:
It’s a very convenient mechanism that typically does exactly what you’d want. And for 35 years we haven’t really considered extending it. But if we look at code that gets written, it often happens that there are parts that basically implement something very much like listability, but slightly more general. And in Version 13.1 we have a new symbolic construct, Threaded, that effectively allows you to easily generalize listability.
Consider:
This uses ordinary listability, effectively computing:
But what if you want instead to “go down a level” and thread {x,y} into the lowest parts of the first list? Well, now you can use Threaded to do that:
On its own, Threaded is just a symbolic wrapper:
But as soon as it appears in a function—like Plus—that has attribute Listable, it specifies that the listability should be applied after what’s specified inside Threaded is “threaded” at the lowest level.
Here’s another example. Create a list:
How should we then multiply each element by {1,–1}? We could do this with:
But now we’ve got Threaded, and so instead we can just say:
You can give Threaded as an argument to any listable function, not just Plus and Times:
You can use Threaded and ordinary listability together:
You can have several Threadeds together as well:
Threaded, by the way, gets its name from the function Thread, which explicitly does “threading”, as in:
By default, Threaded will always thread into the lowest level of a list:
Here’s a “real-life” example of using Threaded like this. The data in a 3D color image consists of a rank-3 array of triples of RGB values:
This multiplies every RGB triple by {0,1,2}:
Most of the time you either want to use ordinary listability that operates at the top level of a list, or you want to use the default form of Threaded, which operates at the lowest level of a list. But Threaded has a more general form, in which you can explicitly say what level you want it to operate at.
Here’s the default case:
Here’s level 1, which is just like ordinary listability:
And here’s threading into level 2:
Threaded provides a very convenient way to do all sorts of array-combining operations. There’s additional complexity when the object being “threaded in” itself has multiple levels. The default in this case is to align the lowest level of the thing being threaded in with the lowest level of the thing into which it’s being threaded:
Here now is “ordinary listability” behavior:
For the arrays we’re looking at here, the default behavior is equivalent to:
Sometimes it’s clearer to write this out in a form like
which says that the first level of the array inside the Threaded is to be aligned with the second level of the outside array. In general, the default case is equivalent to –1 → –1, specifying that the bottom level of the array inside the Threaded should be aligned with the bottom level of the array outside.
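Since the original interactive code cells aren’t reproduced here, the two basic alignment choices can be sketched in ordinary Python (a hedged analog, not Wolfram Language; the helper names thread_lowest and thread_top are invented for illustration, and only the simple rank-2 case is covered):

```python
# A sketch of the two alignment choices for a rank-2 array of numbers.
def thread_lowest(array, vec):
    # Threaded default: combine vec with the lowest level,
    # i.e. add it elementwise to every innermost row
    return [[x + v for x, v in zip(row, vec)] for row in array]

def thread_top(array, vec):
    # "Ordinary listability" (level 1): combine vec with the top level,
    # adding one element of vec to each whole row
    return [[x + v for x in row] for row, v in zip(array, vec)]

a = [[1, 2], [3, 4], [5, 6]]
print(thread_lowest(a, [10, 20]))      # [[11, 22], [13, 24], [15, 26]]
print(thread_top(a, [100, 200, 300]))  # [[101, 102], [203, 204], [305, 306]]
```

NumPy users will recognize the default case as broadcasting against trailing axes; the level-1 case there likewise needs an explicit extra axis.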
In every version of the Wolfram Language we try to add new functions that will make general programs easier to write and easier to read. In Version 13.1 the most important such function is Threaded. But there are quite a few others as well.
First in our collection for Version 13.1 is DeleteElements, which deletes specified elements from a list. It’s like Complement, except that it doesn’t reorder the list (analogous to the way DeleteDuplicates removes duplicate elements, without reordering in the way that Union does):
DeleteElements also allows more detailed control of how many copies of an element can be deleted. Here it deletes up to 2 b’s and 3 c’s:
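The count-limited behavior can be sketched in Python (a hedged analog, not the Wolfram implementation; delete_elements is an invented name):

```python
from collections import Counter

def delete_elements(lst, limits):
    # Delete up to limits[x] copies of each element x, preserving the
    # order of the surviving elements (unlike set-based Complement)
    remaining = Counter(limits)
    out = []
    for x in lst:
        if remaining[x] > 0:   # still allowed to delete a copy of x
            remaining[x] -= 1
        else:
            out.append(x)
    return out

print(delete_elements(["a", "b", "c", "b", "c", "c", "a"], {"b": 2, "c": 3}))
# ['a', 'a']
```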
Talking of DeleteDuplicates, another new function in Version 13.1 is DeleteAdjacentDuplicates:
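In Python terms (a hedged sketch, not the Wolfram implementation), this is what itertools.groupby makes easy: keep one representative of each run of equal adjacent elements:

```python
from itertools import groupby

def delete_adjacent_duplicates(lst):
    # groupby collapses each run of equal adjacent elements into one key
    return [key for key, _ in groupby(lst)]

print(delete_adjacent_duplicates([1, 1, 2, 2, 2, 1, 3, 3]))  # [1, 2, 1, 3]
```

Note that, unlike DeleteDuplicates, the element 1 appears twice in the result, because its two runs are not adjacent.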
We’ve had Union, Intersection and Complement since Version 1.0. In Version 13.1 we’re adding SymmetricDifference: find elements that (in the two-argument case) are in one list or the other, but not both. For example, what countries are in the G20 or the EU, but not both?
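For the two-argument case, the corresponding operation on Python sets is the ^ operator (shown here with small generic sets rather than the country lists above; a hedged analog, not the Wolfram implementation):

```python
# Symmetric difference of two sets: elements in exactly one of them
a = {1, 2, 3, 4}
b = {3, 4, 5}
sym = a ^ b              # same as a.symmetric_difference(b)
print(sorted(sym))       # [1, 2, 5]
```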
Let’s say you have several lists, and you want to know what elements are unique to just one of these lists, and don’t occur in multiple lists. The new function UniqueElements tells you.
As an example, this tells us which letters uniquely occur in various alphabets:
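A Python sketch of the idea (hedged; unique_elements is an invented name, and this ignores details of the actual Wolfram function such as output ordering): count how many of the lists each element appears in, then keep, for each list, the elements whose count is 1:

```python
from collections import Counter

def unique_elements(*lists):
    # Count, for each element, how many of the lists it occurs in
    counts = Counter()
    for lst in lists:
        counts.update(set(lst))   # each list contributes at most once
    # For each list, keep the elements occurring in that list only
    return [sorted(x for x in set(lst) if counts[x] == 1) for lst in lists]

print(unique_elements(["a", "b", "c"], ["b", "c", "d"], ["c", "e"]))
# [['a'], ['d'], ['e']]
```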
We’ve had Map and Apply, with short forms /@ and @@, ever since Version 1.0. In Version 4.0 we added @@@ to represent Apply[f,expr,1]. But we never added a separate function to correspond to @@@. And over the years, there’ve been quite a few occasions where I’ve basically wanted, for example, to do something like “Fold[@@@, ...]”. Obviously Fold[Apply[#1,#2,1]&,...] would work. But it feels as if there’s a “missing” named function. Well, in Version 13.1, we added it: MapApply is equivalent to @@@:
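For readers who think in Python: the closest stdlib analog of applying a function inside each sublist (f @@@ list) is itertools.starmap (a hedged correspondence, not an exact one, since Apply is more general):

```python
from itertools import starmap

# Apply pow to the elements of each pair, i.e. pow @@@ pairs
pairs = [(1, 2), (3, 4), (10, 3)]
print(list(starmap(pow, pairs)))   # [1, 81, 1000]
```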
Another small convenience added in Version 13.1 is SameAs—essentially an operator form of SameQ. Why is such a construct needed? Well, there are always tradeoffs in language design. And back in Version 1.0 we decided to make SameQ work with any number of arguments (so you can test whether a whole sequence of things are the same). But this means that for consistency SameQ[expr] must always return True—so it’s not available as an operator form of SameQ. And that’s why now in Version 13.1 we’re adding SameAs, which joins the family of operator-form functions like EqualTo and GreaterThan:
Procedural programming—often with “variables hanging out”—isn’t the preferred style for most Wolfram Language code. But sometimes it’s the most convenient way to do things. And in Version 13.1 we’ve added a small piece of streamlining by introducing the function Until. Ever since Version 1.0 we’ve had While[test,body], which repeatedly evaluates body while test is True. But if test isn’t True even at first, While won’t ever evaluate body. Until[test,body] does things the other way around: it evaluates body until test becomes True. So if test isn’t True at first, Until will still evaluate body once, in effect only looking at the test after it’s evaluated the body.
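This is the classic “do-while” pattern, which can be sketched in Python (a hedged analog; the helper name until is invented):

```python
# A sketch of Until[test, body]: the body always runs at least once,
# and the test is only checked after each evaluation of the body
def until(test, body):
    while True:
        body()
        if test():
            return

values = []
until(lambda: len(values) >= 3, lambda: values.append(len(values) + 1))
print(values)   # [1, 2, 3]

ran = []
until(lambda: True, lambda: ran.append("once"))
print(ran)      # ['once'] : even with an initially True test, the body ran once
```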
Last but not least in the list of new core language functions in Version 13.1 is ReplaceAt. Replace attempts to apply a replacement rule to a complete expression—or a whole level in an expression. ReplaceAll (/.) does the same thing for all subparts of an expression. But quite often one wants more control over where replacements are done. And that’s what ReplaceAt provides:
An important feature is that it also has an operator form:
Why is this important? The answer is that it gives a symbolic way to specify not just what replacement is made, but also where it is made. And for example this is what’s needed in specifying steps in proofs, say as generated by FindEquationalProof.
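The core idea of position-targeted replacement can be sketched in Python on nested lists (hedged: replace_at is an invented name, positions here are 0-based lists of indices, whereas Wolfram Language positions are 1-based):

```python
def replace_at(expr, value, pos):
    # Replace the subexpression at position pos (a list of 0-based indices)
    if not pos:
        return value
    i, *rest = pos
    return [replace_at(sub, value, rest) if j == i else sub
            for j, sub in enumerate(expr)]

print(replace_at([[1, 2], [3, 4]], 0, [1, 0]))   # [[1, 2], [0, 4]]
```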
What is a character? Back when Version 1.0 was released, characters were represented as 8-bit objects: usually ASCII, but you could pick another “character encoding” (hence the CharacterEncoding option) if you wanted. Then in the early 1990s came Unicode—which we were one of the very first companies to support. Now “characters” could be 16-bit constructs, with nearly 65,536 possible “glyphs” allocated across different languages and uses (including some mathematical symbols that we introduced). Back in the early 1990s Unicode was a newfangled thing that operating systems didn’t yet have built-in support for. But we were betting on Unicode, and so we built our own infrastructure for handling it.
Thirty years later Unicode is indeed the universal standard for representing character-like things. But somewhere along the way, it turned out the world needed more than 16 bits’ worth of character-like things. At first it was about supporting variants and historical writing systems (think: cuneiform or Linear B). But then came emoji. And it became clear that—yes, arguably in a return to the Egyptian hieroglyph style of communication—there was an almost infinite number of possible pictorial emoji that could be made, each encoded with its own Unicode code point.
It’s been a slow expansion. Original 16-bit Unicode is “plane 0”. Now there are up to 16 additional planes. Not quite 32-bit characters, but given the way computers work, the approach now is to allow characters to be represented by 32-bit objects. It’s far from trivial to do that uniformly and efficiently. And for us it’s been a long process to upgrade everything in our system—from string manipulation to notebook rendering—to handle full 32-bit characters. And that’s finally been achieved in Version 13.1.
But that’s far from all. In English we’re pretty much used to being able to treat text as a sequence of letters and other characters, with each character being separate. Things get a bit more complicated when you start to worry about ligatures like æ. But if there are fairly few of these, it works to just introduce them as individual “Unicode characters” with their own code points. But there are plenty of languages—like Hindi or Khmer—where what appears in text as an individual character is really a composite of letter-like constructs, diacritical marks and other things. Such composite characters are normally represented as “grapheme clusters”: runs of Unicode code points. The rules for handling these things can be quite complicated. But after many years of development, major operating systems now successfully do it in most cases. And in Version 13.1 we’re able to make use of this to support such constructs in notebooks.
OK, so what does 32-bit Unicode look like? Using CharacterRange (or FromCharacterCode) we can dive in and just see what’s out there in “character space”. Here’s part of ordinary 16-bit Unicode space:
Here’s some of what happens in “plane 1”, above character code 65535, in this case catering to “legacy computations”:
Plane 0 (below 65535) is pretty much all full. Above that, things are sparser. But around code point 128000, for example, there are lots of emoji:
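You can explore the same territory from Python (a hedged illustration; chr and ord are the analogs of FromCharacterCode and ToCharacterCode):

```python
import unicodedata

# Code points above U+FFFF lie outside the original 16-bit "plane 0"
wolf = "\U0001F43A"
print(ord(wolf) > 0xFFFF)        # True
print(unicodedata.name(wolf))    # WOLF FACE

# An analog of CharacterRange: consecutive code points in the emoji range
animals = [chr(c) for c in range(0x1F400, 0x1F408)]
print(len(animals))              # 8
```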
You can use these in the Wolfram Language, and in notebooks, just like any other characters. So, for example, you can have wolf and ram variables:
The wolf sorts before the ram because it happens to have a numerically smaller character code:
In a notebook, you can enter emoji (and other Unicode characters) using standard operating system tools—like ctrl-cmd-space on macOS:
The world of emoji is rapidly evolving—and that can sometimes lead to problems. Here’s an emoji range that includes some very familiar emoji, but on at least one of my computer systems also includes emoji that display only as placeholder boxes:
The reason that happens is that my default fonts don’t contain glyphs for those emoji. But all is not lost. In Version 13.1 we’re including a font from Twitter that aims to contain glyphs for pretty much all emoji:
Beyond dealing with individual Unicode characters, there’s also the matter of composites, and grapheme clusters. In Hindi, for example, two characters can combine into something that’s rendered (and treated) as one:
The first character here can stand on its own:
But the second one is basically a modifier that extends the first character (in this particular case adding a vowel sound):
But once the composite हि has been formed it acts “textually” just like a single character, in the sense that, for example, the cursor moves through it in one step. When it appears “computationally” in a string, however, it can still be broken into its constituent Unicode elements:
This kind of setup can be used not only for a language like Hindi but also for European languages that have diacritical marks like umlauts:
Even though this looks like one character—and in Version 13.1 it’s treated like that for “textual” purposes, for example in notebooks—it is ultimately made up of two distinct “Unicode characters”:
In this particular case, though, this can be “normalized” to a single character:
It looks the same, but now it really is just one character:
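The same decomposed-versus-composed distinction is visible from Python, where unicodedata.normalize plays the role of character normalization (a hedged illustration, not the Wolfram implementation):

```python
import unicodedata

decomposed = "u\u0308"     # 'u' followed by a combining diaeresis
print(len(decomposed))     # 2 : two code points, rendered as one grapheme

# NFC normalization combines the pair into the single character U+00FC
composed = unicodedata.normalize("NFC", decomposed)
print(composed == "\u00fc", len(composed))   # True 1
```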
Here’s a “combined character” that you can form
but for which there’s no single character to which it normalizes:
The concept of composite characters applies not only to ordinary text, but also to emojis. For example, take the emoji for a woman
together with the emoji for a microscope
and combine them with the “zero-width joiner” character (which, needless to say, doesn’t display as anything)
and you get (yes, somewhat bizarrely) a woman scientist!
Needless to say, you can do this computationally—though the “calculus” of what’s been defined so far in Unicode is fairly bizarre:
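The same zero-width-joiner construction is easy to reproduce in Python (a hedged illustration of the Unicode mechanism, not of any Wolfram function):

```python
import unicodedata

woman = "\U0001F469"         # WOMAN
zwj = "\u200D"               # ZERO WIDTH JOINER (displays as nothing)
microscope = "\U0001F52C"    # MICROSCOPE

# The sequence woman + ZWJ + microscope renders as a "woman scientist",
# but computationally it is still three separate code points
scientist = woman + zwj + microscope
print(len(scientist))        # 3
```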
I’m sort of hoping that the future of semantics doesn’t end up being defined by the way emoji combine.
As one last—arguably hacky—example of combining characters, Unicode defines various “two-letter” combinations of regional-indicator characters to be flags. Type the two letters of a country code, and you get its flag!
Once again, this can be made computational:
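The encoding is simple arithmetic: the regional-indicator symbols start at U+1F1E6 (for “A”). Here’s a Python sketch (hedged: the flag helper is an invented name, and the country code is just an illustrative choice):

```python
# A pair of regional-indicator symbols is rendered as a flag
def flag(country_code):
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in country_code.upper())

jp = flag("JP")
print(len(jp))   # 2 : two code points, rendered as one flag glyph
```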
(And, yes, it’s an interesting question what renders here, and what doesn’t. In some operating systems, no flags are rendered, and we have to pull in a special font to do it.)
It used to be that the only “special key sequence” one absolutely should know in order to use Wolfram Notebooks was shift-enter. But gradually there have started to be more and more high-profile operations that are conveniently done by “pressing a button”. And rather than expecting people to remember all those special key sequences (or think to look in menus for them) we’ve decided to introduce a toolbar that will be displayed by default in every standard notebook. Version 13.1 has the first iteration of this toolbar. Subsequent versions will support an increasing range of capabilities.
It’s not been easy to design the default toolbar (and we hope you’ll like what we came up with!) The main problem is that Wolfram Notebooks are very general, and there are a great many things you can do with them—which it’s challenging to organize into a manageable toolbar. (Some special types of notebooks have had their own specialized toolbars for a while, which were easier to design by virtue of their specialization.)
So what’s in the toolbar? On the left are a couple of evaluation controls:
The first button means “Evaluate”, and is simply equivalent to pressing shift-return (as its tooltip says). The next means “Abort”, and will stop a computation. To the right of these is the menu shown above. The first part of the menu allows you to choose what will be evaluated. (Don’t forget the extremely useful “Evaluate In Place”, which lets you evaluate whatever code you have selected—say to turn RGBColor[1,0,0] in your input into a rendered color swatch.) The bottom part of the menu gives a couple of more detailed (but highly useful) evaluation controls.
Moving along the toolbar, we next have:
If your cursor isn’t already in a cell, the pulldown allows you to select what type of cell you want to insert (it’s similar to the “tongue” that appears within the notebook). (If your cursor is already inside a cell, then, as in a typical word processor, the pulldown will tell you the style that’s being used, and let you reset it.)
The next button gives you a little panel to control the appearance of cells, changing their background colors, frames, dingbats, etc.
Next come cell-related buttons. The first is for cell structure and grouping:
The next button copies input from above (cmd-L). It’s an operation that I, for one, end up doing all the time. I’ll have an input that I evaluate. Then I’ll want to make a modified version of the input to evaluate again, while keeping the original. So I’ll copy the input from above, edit the copy, and evaluate it again.
Another button copies output from above. I don’t find this quite as useful as copying input from above, but it can be helpful if you want to edit output for subsequent input, while leaving the “actual output” unchanged.
The next block of buttons is all about content in cells. The first (which you’ll often press repeatedly) is for extending a selection—in effect going ever upwards in an expression tree. (You can get the same effect by pressing ctrl-. or by multi-clicking, but it’s a lot more convenient to repeatedly press a single button than to have to precisely time your multi-clicks.)
Then there’s a single-button way to get ctrl-= for entering natural language input:
The next button iconizes your selection:
Iconization is something we introduced in Version 11.3, and it’s something that’s proved incredibly useful, particularly for making code easy to read (say by iconizing details of options). (You can also iconize a selection from the right-click menu, or with ctrl-cmd-'.)
Another button is most relevant for code, and toggles commenting (with (* *)) of a selection. Further buttons bring up a palette for math typesetting, let you enter TeX that will be converted to Wolfram Language math typesetting, bring up a drawing canvas, and insert a hyperlink (cmd-shift-H).
If you’re in a text cell, the toolbar will look different, now sporting a text formatting control:
Most of this is fairly standard. One button lets you insert “code voice” material. And the math typesetting buttons are still in the toolbar for inserting math into a text cell.
On the right-hand end of the toolbar are three more buttons. The first gives you a dialog to publish your notebook to the cloud. The second opens documentation, either specifically looking up whatever you have selected in the notebook, or opening the front page (“root guide page”) of the main Wolfram Language documentation. Finally, the third lets you search in your current notebook.
As I mentioned above, what’s in Version 13.1 is just the first iteration of our default toolbar. Expect more features in later versions. One thing that’s notable about the toolbar in general is that it’s 100% implemented in Wolfram Language. And in addition to adding a great deal of flexibility, this also means that the toolbar immediately works on all platforms. (By the way, if you don’t want the toolbar in a particular notebook—or for all your notebooks—just right-click the background of the toolbar to pick that option.)
We first introduced Wolfram Notebooks with Version 1.0 of Mathematica, in 1988. And ever since then, we’ve been progressively polishing the notebook interface, doing more with every new version.
The ctrl-= mechanism for entering natural language (“Wolfram|Alpha-style”) input debuted in Version 10.0—and in Version 13.1 it’s now accessible from a button in the new default notebook toolbar. But what actually is such an input when it’s in a notebook? In the past, it’s been a fairly complex symbolic structure mainly suitable for evaluation. But in Version 13.1 we’ve made it much simpler. And while that doesn’t have any direct effect if you’re just using it purely in a notebook, it does have an effect if you copy it into another application, like plain-text email. In the past this produced something that would work if pasted back into a notebook, but definitely wasn’t particularly readable. In Version 13.1, it’s now simply the Wolfram Language interpretation of your natural language input:
What happens if the computation you do in a notebook generates a huge output? Ever since Version 6.0 we’ve had some form of “output limiter”, but in Version 13.1 it’s become much sleeker and more useful. Here’s a typical example:
Talking of big outputs (as well as other things that keep the notebook interface busy), another change in Version 13.1 is the new asynchronous progress overlay on macOS. This doesn’t affect other platforms where this problem had already been solved, but on the Mac changes in the OS had led to a situation where the notebook front end could mysteriously pop to the front on your desktop—a situation that has now been resolved.
One of the slightly unusual user interface features that’s existed ever since Version 1.0 is the Why the Beep? menu item—that lets you get an explanation of any “error beep” that occurs while you’re running the system. The function Beep lets you generate your own beep. And now in Version 13.1 you can use Beep["string"] to set up an explanation of “your beep”, that users can retrieve through the Why the Beep? menu item.
The basic notebook user interface works as much as possible with standard interface elements on all platforms, so that when these elements are updated, we always automatically get the “most modern” look. But there are parts of the notebook interface that are quite special to Wolfram Notebooks and are always custom designed. One that hadn’t been updated for a while is the Preferences dialog—which now in Version 13.1 gets a full makeover:
When you tell the Wolfram Language to do something, it normally just goes off and does it, without asking you anything (well, unless it explicitly needs input, needs a password, etc.) But what if there’s something that it might be a good idea to do, though it’s not strictly necessary? What should the user interface for this be? It’s tricky, but I think we now have a good solution that we’ve started deploying in Version 13.1.
In particular, in Version 13.1, there’s an example related to the Wolfram Function Repository. Say you use a function for which an update is available. What now happens is that a blue box is generated that tells you about the update—though it still keeps going with the computation, ignoring the update:
If you click the Update Now button in the blue box you can do the update. And then the point is that you can run the computation again (for example, just by pressing shiftenter), and now it’ll use the update. In a sense the core idea is to have an interface where there are potentially multiple passes, and where a computation always runs to completion, but you have an easy way to change how it’s set up, and then run it again.
One of the great things about the Wolfram Language is that it works well for programs of any scale—from less than a line long to millions of lines long. And for the past several years we’ve been working on expanding our support for very large Wolfram Language programs. Using LSP (Language Server Protocol) we’ve provided the capability for most standard external IDEs to automatically do syntax coloring and other customizations for the Wolfram Language.
In Version 13.1 we’re also adding a couple of features that make large-scale code editing in notebooks more convenient. The first—and widely requested—is block indent and outdent of code. Select the lines you want to indent or outdent and simply press tab or shift-tab:
Ever since Version 6.0 we’ve had the ability to work with .wl package files (as well as .wls script files) using our notebook editing system. A new default feature in Version 13.1 is numbering of all code lines that appear in the underlying file (and, yes, we correctly align line numbers accounting for the presence of non-code cells):
So now, for example, if you get a syntax error from Get or a related function, you’ll immediately be able to use the line number it reports to find where it occurs in the underlying file.
In Version 12.2 we introduced Canvas as a convenient interface for interactive drawing in notebooks. In Version 13.1 we’re introducing the notion of toggling a canvas on top of any cell.
Given a cell, just select it and press the canvas toggle in the toolbar, and you’ll get a canvas:
Now you can use the drawing tools in the canvas to create an annotation overlay:
If you evaluate the cell, the overlay will stay. (You can get rid of the “canvas wrapper” by applying Normal.)
In Version 12.3 we introduced Tree as a new fundamental construct in the Wolfram Language. In Version 13.0 we added a variety of styling options for trees, and in Version 13.1 we’re adding more styling as well as a variety of new fundamental features.
An important update to the fundamental Tree construct in Version 13.1 is the ability to name branches at each node, by giving them in an association:
All tree functions now include support for associations:
In many uses of trees the labels of nodes are crucial. But particularly in more abstract applications one often wants to deal with unlabeled trees. In Version 13.1 the function UnlabeledTree (roughly analogous to UndirectedGraph) takes a labeled tree and basically removes all visible labels. Here is a standard labeled tree
and here’s the unlabeled analog:
In Version 12.3 we introduced ExpressionTree for deriving trees from general symbolic expressions. Our plan is to have a wide range of “special trees” appropriate for representing different specific kinds of symbolic expressions. We’re beginning this process in Version 13.1 by, for example, having the concept of “Dataset trees”. Here’s ExpressionTree converting a dataset to a tree:
And now here’s TreeExpression “inverting” that, and producing a dataset:
(Remember the convention that *Tree functions return a tree; while Tree* functions take a tree and return something else.)
Here’s a “graph rendering” of a more complicated dataset tree:
The new function TreeLeafCount lets you count the total number of leaf nodes on a tree (basically the analog of LeafCount for a general symbolic expression):
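Both ideas—named branches and leaf counting—can be sketched in Python (a hedged analog, not the Wolfram implementation; the (label, children) representation and the leaf_count name are invented for illustration):

```python
# A tree as a (label, children) pair, where children is a dict mapping
# branch names to subtrees (mirroring the association form above)
def leaf_count(tree):
    label, children = tree
    if not children:      # no branches: this node is a leaf
        return 1
    return sum(leaf_count(sub) for sub in children.values())

t = ("f", {"a": ("g", {}),
           "b": ("h", {"x": ("1", {}), "y": ("2", {})})})
print(leaf_count(t))   # 3
```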
Another new function in Version 13.1 that’s often useful in getting a sense of the structure of a tree without inspecting every node is RootTree. Here’s a random tree:
RootTree can get a subtree that’s “close to the root”:
It can also get a subtree that’s “far from the leaves”, in this case going down to elements that are at level –2 in the tree:
In some ways the styling of trees is like the styling of graphs—though there are some significant differences as a result of the hierarchical nature of trees. By default, options inserted into a particular tree element affect only that tree element:
But you can give rules that specify how elements in the subtree below that element are affected:
In Version 13.1 there is now detailed control available for styling both nodes and edges in the tree. Here’s an example that gives styling for parent edges of nodes:
Options like TreeElementStyle determine styling from the positions of elements. TreeElementStyleFunction, on the other hand, determines styling by applying a function to the data at each node:
This uses both data and position information for each node:
In analogy with VertexShapeFunction for graphs, TreeElementShapeFunction provides a general mechanism to specify how nodes of a tree should be rendered. This named setting for TreeElementShapeFunction makes every node be displayed as a circle:
We first introduced dates into Wolfram Language in Version 2.0, and we introduced modern date objects in Version 10.0. But to really make dates fully computable, there are many detailed cases to consider. And in Version 13.1 we’re dealing with yet another of them. Let’s say you’ve got the date January 31, 2022. What date is one month later—given that there’s no February 31, 2022?
If we define a month “physically”, it corresponds to a certain fractional number of days:
And, yes, we can use this to decide what is a month after January 31, 2022:
Slightly confusing here is that we’re dealing with date objects of “day” granularity. We can see more if we go down to the level of minutes:
If one’s doing something like astronomy, this kind of “physical” date computation is probably what one wants. But if one’s doing everyday “human” activities, it’s almost certainly not what one wants; instead, one wants to land on some calendar date or another.
Here’s the default in the Wolfram Language:
But now in Version 13.1 we can parametrize more precisely what we want. This default is what we call "RollBackward": wherever we “land” by doing the raw date computation, we “roll backward” to the first valid date. An alternative is "RollForward":
Whatever method one uses, there are going to be weird cases. Let’s say we start with several consecutive dates:
With "RollBackward" we have the weirdness of repeating February 28:
With "RollForward" we have the weirdness of repeating March 1:
Is there any alternative? Yes, we can use "RollOver":
This keeps advancing through days, but then has the weirdness that it goes backwards. And, yes, there’s no “right answer” here. But in Version 13.1 you can now specify exactly what you want the behavior to be.
The same issue arises not just for months, but also, for example, for years. And it affects not just DatePlus, but also DateDifference.
It’s worth mentioning that in Version 13.1, in addition to dealing with the detail we’ve just discussed, the whole framework for doing “date arithmetic” in Wolfram Language has been made vastly more efficient, sometimes by factors of hundreds.
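The three rolling behaviors can be made concrete with a Python sketch (hedged: this is a minimal month-arithmetic illustration, not the Wolfram implementation; the method names are borrowed from the text, and the add_months helper is invented):

```python
import calendar
import datetime

def add_months(date, n, method="RollBackward"):
    # A minimal sketch of calendar month arithmetic with explicit
    # rolling rules for invalid dates like February 31
    months = date.month - 1 + n
    year, month = date.year + months // 12, months % 12 + 1
    last = calendar.monthrange(year, month)[1]   # days in the target month
    if date.day <= last:
        return datetime.date(year, month, date.day)
    if method == "RollBackward":   # roll back to the last valid date
        return datetime.date(year, month, last)
    if method == "RollForward":    # roll forward to the first valid date
        return datetime.date(year, month, last) + datetime.timedelta(days=1)
    if method == "RollOver":       # keep counting the extra days forward
        return datetime.date(year, month, last) + datetime.timedelta(days=date.day - last)
    raise ValueError(method)

jan31 = datetime.date(2022, 1, 31)
print(add_months(jan31, 1, "RollBackward"))  # 2022-02-28
print(add_months(jan31, 1, "RollForward"))   # 2022-03-01
print(add_months(jan31, 1, "RollOver"))      # 2022-03-03
```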
We’ve had ImageCapture since Version 8.0 (in 2010) and AudioCapture since Version 11.1 (in 2017). Now in Version 13.1 we have VideoCapture. By default VideoCapture[] gives you a GUI that lets you record from your camera:
Clicking the down arrow opens up a preview window that shows your current video:
✕

When you’ve finished recording, VideoCapture returns the Video object you created:
✕
VideoCapture[] 
Now you can process or analyze this Video object just like you would any other:
✕

VideoCapture[] is a blocking operation that waits until you’ve finished recording, then returns a result. But VideoCapture can also be used “indirectly” as a dynamic control. Thus, for example
✕

lets you asynchronously start and stop recording, even as you do other things in your Wolfram Language session. But every time you stop recording, the value of video is updated.
VideoCapture records video from your camera (and you can use the ImageDevice option to specify which one if you have several). VideoScreenCapture, on the other hand, records from your computer screen—in effect providing a video analog of CurrentScreenImage.
VideoScreenCapture[], like VideoCapture[], is a blocking operation as far as the Wolfram Language is concerned. But if you want to watch something happening in another application (say, a web browser), it’ll do just fine. And in addition, you can give a screen rectangle to capture a particular region on your screen:
✕
VideoScreenCapture[{{0, 50}, {640, 498}}] 
Then for example you can analyze the time series of RGB color levels in the video that’s produced:
✕

What if you want to screen record from a notebook? Well, then you can use the asynchronous dynamic recording mechanism that exists in VideoScreenCapture just as it does in VideoCapture.
By the way, both VideoCapture and VideoScreenCapture by default capture audio. You can switch off audio recording either from the GUI, or with the option AudioInputDevice→None.
If you want to get fancy, you can screen record a notebook in which you are capturing video from your camera (which in turn shows you capturing a video, etc.):
✕
VideoScreenCapture[EvaluationNotebook[]] 
In addition to capturing video from real-time goings-on, you can also generate video directly from functions like AnimationVideo and SlideShowVideo—as well as by “touring” an image using TourVideo. In Version 13.1 there are some significant enhancements to TourVideo.

Take an animal scene and extract bounding boxes for elephants and zebras:
✕

Now you can make a tour video that visits each animal:
✕


Define a path function of a variable t:
✕

✕

Now we can use the path function to make a “spiralling” tour video:


Transforming college calculus was one of the early achievements of Mathematica. But even now we’re continuing to add functionality to make college calculus ever easier and smoother to do—and more immediately connectable to applications. We’ve always had the function D for taking derivatives at a point. Now in Version 13.1 we’re adding ImplicitD for finding implicit derivatives.
So, for example, it can find the derivative of x^{y} with respect to x, with y determined implicitly by the constraint x^{2} + y^{2} = 1:
✕

Leave out the first argument and you’ll get the standard college calculus “find the slope of the tangent line to a curve”:
✕
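A sketch of the two forms of the function (the argument order here is my reading of ImplicitD, not verbatim from this post; the -x/y slope itself is standard implicit differentiation on the circle):

```wolfram
ImplicitD[x^y, x^2 + y^2 == 1, x]  (* derivative of x^y with y constrained to the circle *)
ImplicitD[x^2 + y^2 == 1, x]       (* slope of the tangent line: dy/dx = -x/y *)
```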

So far all of this is a fairly straightforward repackaging of our longstanding calculus functionality. And indeed these kinds of implicit derivatives have been available for a long time in Wolfram|Alpha. But for Mathematica and the Wolfram Language we want everything to be as general as possible—and to support the kinds of things that show up in differential geometry, and in things like asymptotics and validation of implicit solutions to differential equations. So in addition to ordinary college-level calculus, ImplicitD can do things like finding a second implicit derivative on a curve defined by the intersection of two surfaces:
✕

In Mathematica and the Wolfram Language Integrate is a function that just gets you answers. (In Wolfram|Alpha you can ask for a step-by-step solution too.) But particularly for educational purposes—and sometimes also when pushing boundaries of what’s possible—it can be useful to do integrals in steps. And so in Version 13.1 we’ve added the function IntegrateChangeVariables for changing variables in integrals.
An immediate issue is that when you specify an integral with Integrate[...], Integrate will just go ahead and do the integral:
✕

But for IntegrateChangeVariables you need an “undone” integral. And you can get this using Inactive, as in:
✕

And given this inactive form, we can use IntegrateChangeVariables to do a “trig substitution”:
✕

The result is again an inactive form, now stating the integral differently. Activate goes ahead and actually does the integral:
✕
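Putting the steps together, a sketch of the full workflow (assuming this argument pattern for IntegrateChangeVariables; the integral's value is standard):

```wolfram
i = Inactive[Integrate][Sqrt[1 - x^2], {x, 0, 1}];
j = IntegrateChangeVariables[i, u, x == Sin[u]];  (* the "trig substitution" *)
Activate[j]  (* evaluates the transformed integral; the value here is Pi/4 *)
```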

IntegrateChangeVariables can deal with multiple integrals as well—and with named coordinate systems. Here it’s transforming a double integral to polar coordinates:
✕

Although the basic “structural” transformation of variables in integrals is quite straightforward, the whole story of IntegrateChangeVariables is considerably more complicated. “College-level” changes of variables are usually carefully arranged to come out easily. But in the more general case, IntegrateChangeVariables ends up having to do nontrivial transformations of geometric regions, difficult simplifications of integrands subject to certain constraints, and so on.
In addition to changing variables in integrals, Version 13.1 also introduces DSolveChangeVariables for changing variables in differential equations. Here it’s transforming the Laplace equation to polar coordinates:
✕

Sometimes a change of variables can just be a convenience. But sometimes (think General Relativity) it can lead one to a whole different view of a system. Here, for example, an exponential transformation converts the usual Cauchy–Euler equation to a form with constant coefficients:
✕

The first derivative of x^{2} is 2x; the second derivative is 2. But what is the ½th derivative? It’s a question that was asked (for example by Leibniz) even in the first years of calculus. And by the 1800s Riemann and Liouville had given an answer—which in Version 13.1 can now be computed by the new FractionalD:
✕
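A sketch of the kind of computation involved (FractionalD's {x, α} argument form is my assumption; the closed-form result follows from the standard Riemann–Liouville power rule):

```wolfram
FractionalD[x^2, {x, 1/2}]
(* Riemann-Liouville half derivative: (8 x^(3/2))/(3 Sqrt[Pi]) *)
```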

And, yes, do another ½th derivative and you get back the 1^{st} derivative:
✕

In the more general case we have:
✕
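For power functions the Riemann–Liouville definition reduces to a standard closed form (a textbook identity, stated here for reference, independent of the implementation): d^{α}/dx^{α} x^{n} = Γ(n+1)/Γ(n−α+1) x^{n−α}. Setting n = 2 and α = ½ reproduces the half derivative of x^{2}.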

And this works even for negative derivatives, so that, for example, the (–1)^{st} derivative is an ordinary integral:
✕

It can be at least as difficult to compute a fractional derivative as an integral. But FractionalD can still often do it
✕

though the result can quickly become quite complicated:
✕

Why is FractionalD a separate function, rather than just being part of a generalization of D? We discussed this for quite a while. And the reason we introduced the explicit FractionalD is that there isn’t a unique definition of fractional derivatives. In fact, in Version 13.1 we also support the Caputo fractional derivative (or differintegral) CaputoD.
For the derivative of x^{2}, the answer is still the same:
✕

But as soon as a function isn’t zero at x = 0 the answer can be different:
✕
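A sketch of the contrast (assuming CaputoD shares FractionalD's argument form; the mathematics of the two definitions on a constant is standard):

```wolfram
FractionalD[1, {x, 1/2}]  (* Riemann-Liouville: 1/(Sqrt[Pi] Sqrt[x]) *)
CaputoD[1, {x, 1/2}]      (* Caputo: 0, since the function is constant *)
```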

CaputoD is a particularly convenient definition of fractional differentiation when one’s dealing with Laplace transforms and differential equations. And in Version 13.1 we can now not only compute CaputoD but also do integral transforms and solve equations that involve it.
Here’s a fractional-order differential equation
✕

and another fractional-order one
✕

as well as a π^{th}-order one:
✕

Note the appearance of MittagLefflerE. This function (which we introduced in Version 9.0) plays the same kind of role for fractional derivatives that Exp plays for ordinary derivatives.
In February 1990 an internal bug report was filed against the still-in-development Version 2.0 of Mathematica:
✕

It’s taken a long time (and similar issues have been reported many times), but in Version 13.1 we can finally close this bug!
Consider the differential equation (the Clairaut equation):
✕

What DSolve does by default is to give the generic solution to this equation, in terms of the parameter 𝕔_{1}. But the subtle point (which in optics is associated with caustics) is that the family of solutions for different values of 𝕔_{1} has an envelope which isn’t itself part of the family of solutions, but is also a solution:
✕

In Version 13.1 you can request that solution with the option IncludeSingularSolutions→True:
✕

And here’s a plot of it:
✕

DSolve was a new function (back in 1991) in Version 2.0. Another new function in Version 2.0 was Residue. And in Version 13.1 we’re also adding an extension to Residue: the function ResidueSum. And while Residue finds the residue of a complex function at a specific point, ResidueSum finds a sum of residues.
This computes the sum of all residues for a function, across the whole complex plane:
✕

This computes the sum of residues within a particular region, in this case the unit disk:
✕
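A sketch of the idea (the form I use for restricting to a region is a guess; the residue arithmetic itself is standard, with simple poles at z = 1/2 and z = 3):

```wolfram
f = 1/((z - 1/2) (z - 3));
ResidueSum[f, z]                (* both poles: -2/5 + 2/5 = 0 *)
ResidueSum[{f, Abs[z] < 1}, z]  (* only the pole at z = 1/2: -2/5 *)
```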

An important part of the built-in documentation for the Wolfram Language is what we call “guide pages”—pages like the following that organize functions (and other constructs) to give an overall “cognitive map” and summary of some area:
✕

In Version 13.1 it’s now easy to create your own custom guide pages. You can list built-in functions or other constructs, as well as things from the Wolfram Function Repository and other repositories.
Go to the “root page” of the Documentation Center and press the icon:
✕

You’ll get a blank custom guide page:
✕

Fill in the guide page however you want, then use Deploy to deploy the page either locally, or to your cloud account. Either way, the page will now show up in the menu from the top of the root guide page (and it’ll also show up in search):
✕

You might end up creating just one custom guide page for your favorite functions. Or you might create several, say one for each task or topic you commonly deal with. Guide pages aren’t about putting in the effort to create full-scale documentation; they’re much more lightweight, and aimed more at providing quick (“what was that function called?”) reminders and “big-picture” maps—leveraging all the specific function and other documentation that already exists.
At first it seemed like a minor feature. But once we’d implemented it, we realized it was much more useful than we’d expected. Just as you can style a graphics object with its color (and, as of Version 13.0, its filling pattern), now in Version 13.1 you can style it with its drop shadowing:
✕

Drop shadowing turns out to be a nice way to “bring graphics to life”
✕

or to emphasize one element over others:
✕

It works well in geo graphics as well:

DropShadowing allows detailed control over the shadows: what direction they’re in, how blurred they are and what color they are:
✕

Drop shadowing is more complicated “under the hood” than one might imagine. And when possible it actually works using hardware GPU pixel shaders—the same technology that we’ve used since Version 12.3 to implement material-based surface textures for 3D graphics. In Version 13.1 we’ve explicitly exposed some well-known underlying types of 3D shading. Here’s a geodesic polyhedron (yes, that’s another new function in Version 13.1), with its surface normals added (using the again new function EstimatedPointNormals):
✕

Here’s the most basic form of shading: flat shading of each facet (and the specularity in this case doesn’t “catch” any facets):
✕

Here now is Gouraud shading, with a somewhat-faceted glint:
✕

And then there’s Phong shading, looking somewhat more natural for a sphere:
✕

Ever since Version 1.0, we’ve had an interactive way to rotate—and zoom into—3D graphics. (Yes, the mechanism was a bit primitive 34 years ago, but it rapidly got to more or less its modern form.) But in Version 13.1 we’re adding something new: the ability to “dolly” into a 3D graphic, imitating what would happen if you actually walked into a physical version of the graphic, as opposed to just zooming your camera:
✕

And, yes, things can get a bit surreal (or “trek-y”)—here dollying in and then zooming out:
There are some capabilities that—over the course of years—have been requested over and over again. In the past these have included infinite undo, high-dpi displays, multiple-axis plots, and others. And I’m happy to say that most of these have now been taken care of. But there’s one—seemingly obscure—“straggler” that I’ve heard about for well over 25 years, and that I’ve actually also wanted myself quite a few times: 3D Voronoi diagrams. Well, in Version 13.1, they’re here.
Set up 25 random points in 3D:
✕

✕

Now make a Voronoi mesh for these points:
✕

To “see inside” we can use opacity:
✕
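Schematically, the whole construction is just (the point setup here is illustrative):

```wolfram
pts = RandomReal[{0, 1}, {25, 3}];  (* 25 random points in the unit cube *)
VoronoiMesh[pts]                    (* a full 3D Voronoi mesh, new in 13.1 *)
```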

Why was this so hard? In a Voronoi diagram there’s a cell that surrounds each original point, and includes everywhere that’s closer to that point than to any other. We’ve had 2D Voronoi meshes for a long time:
✕

But there’s something easier about the 2D case. The issue is not so much the algorithm for generating the cells as it is how the cells can be represented in such a way that they’re useful for subsequent computations. In the 2D case each cell is just a polygon.
But in the 3D case the cells are polyhedra, and to make a Voronoi mesh we have to have a polyhedral mesh where all the polyhedra fit together. And it’s taken us many years to build the large tower of computational geometry necessary to support this. There’s a somewhat simpler case based purely on cells that are always either simplices or hexahedra—that we’ve used for finite-element solutions to PDEs for a while. But in a true 3D Voronoi that’s not enough: the cells can be any (convex) polyhedral shape.
Here are the “puzzle piece” cells for the 3D Voronoi mesh we made above:
✕

Pick 500 random points inside an annulus:
✕

✕

Version 13.1 now has ReconstructionMesh, a general function for reconstructing geometry from a cloud of points:
✕

(Of course, given only a finite number of points, the reconstruction can’t be expected to be perfect.)
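A sketch of the 2D case just shown, sampling the annulus with RandomPoint:

```wolfram
pts = RandomPoint[Annulus[{0, 0}, {1, 2}], 500];
ReconstructionMesh[pts]  (* approximately recovers the annulus from the point cloud *)
```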
The function also works in 3D:
✕

✕

ReconstructionMesh is a general superfunction that uses a variety of methods, including extended versions of the functions ConcaveHullMesh and GradientFittedMesh that were introduced in Version 13.0. And in addition to reconstructing “solid objects”, it can also reconstruct lowerdimensional things like curves and surfaces:
✕

A related function new in Version 13.1 is EstimatedPointNormals, which reconstructs not the geometry itself, but normal vectors to each element in the geometry:
✕

In every new version for the past 30 years we’ve steadily expanded our visualization capabilities, and Version 13.1 is no exception. One function we’ve added is TernaryListPlot—an analog of ListPlot that conveniently plots triples of values where what one’s trying to emphasize is their ratios. For example, let’s plot data from our knowledgebase on the sources of electricity for different countries:
✕

The plot shows the “energy mixture” for different countries, with the ones on the bottom axis being those with zero nuclear. Inserting colors for each axis, along with grid lines, helps explain how to read the plot:
✕

Most of the time plots are plotting numbers, or at least quantities. In Version 13.0, we extended functions like ListPlot to also accept dates. In Version 13.1 we’re going much further, and introducing the possibility of plotting what amount to purely symbolic values.
Let’s say our data consists of letters A through C:
✕

How do we plot these? In Version 13.1 we just specify an ordinal scale:
✕

OrdinalScale lets you specify that certain symbolic values are to be treated as if they are in a specified order. There’s also the concept of a nominal scale—represented by NominalScale—in which different symbolic values correspond to different “categories”, but in no particular order.
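A sketch of how this might look (exactly where the scale is attached is my assumption):

```wolfram
data = {"A", "C", "B", "B", "C", "A"};
ListPlot[data, ScalingFunctions -> OrdinalScale[{"A", "B", "C"}]]
```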
Molecule lets one symbolically represent a molecule. Quantity lets one symbolically represent a quantity with units. In Version 13.1 we now have the new construct ChemicalInstance that’s in effect a merger of these, allowing one to represent a certain quantity of a certain chemical.
This gives a symbolic representation of 1 liter of acetone (by default at standard temperature and pressure):
✕

We can ask what the mass of this instance of this chemical is:
✕

ChemicalConvert lets us do a conversion returning particular units:
✕

Here’s instead a conversion to moles:
✕

This directly gives the amount of substance that 1 liter of acetone corresponds to:
✕
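A sketch with water instead of acetone (entity and conversion forms as I understand them; the amount follows from water's roughly 18 g/mol molar mass):

```wolfram
water = ChemicalInstance[Entity["Chemical", "Water"], Quantity[1, "Kilograms"]];
ChemicalConvert[water, "Moles"]  (* about 55.5 moles *)
```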

This generates a sequence of straightchain hydrocarbons:
✕

Here’s the amount of substance corresponding to 1 g of each of these chemicals:
✕

ChemicalInstance lets you specify not just the amount of a substance, but also its state, in particular temperature and pressure. Here we’re converting 1 kg of water at 4° C to be represented in terms of volume:
✕

At the core of the Wolfram Language is the abstract idea of applying transformations to symbolic expressions. And at some level one can view chemistry and chemical reactions as a physical instantiation of this idea, where one’s not dealing with abstract symbolic constructs, but instead with actual molecules and atoms.
In Version 13.1 we’re introducing PatternReaction as a symbolic representation for classes of chemical reactions—in effect providing an analog for chemistry of Rule for general symbolic expressions.
Here’s an example of a “pattern reaction”:
✕

The first argument specifies a pair of “reactant” molecule patterns to be transformed into “product” molecule patterns. The second argument specifies which atoms in which reactant molecules map to which atoms in which product molecules. If you mouse over the resulting pattern reaction, you’ll see corresponding atoms “light up”:
✕

Given a pattern reaction, we can use ApplyReaction to apply the reaction to concrete molecules:
✕

Here are plots of the resulting product molecules:
✕

The molecule patterns in the pattern reaction are matched against subparts of the concrete molecules, then the transformation is done, leaving the other parts of the molecules unchanged. In a sense it’s the direct analog of something like
✕

where the b in the symbolic expression is replaced, and the result is “knitted back” to fill in where the b used to be.
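In ordinary symbolic terms, that's just the familiar behavior of ReplaceAll:

```wolfram
f[a, b, c] /. b -> g[x, y]
(* f[a, g[x, y], c] : b is replaced, and the rest of the expression is untouched *)
```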
You can do what amounts to various kinds of “chemical functional programming” with ApplyReaction and PatternReaction. Here’s an example where we’re essentially building up a polymer by successive nesting of a reaction:
✕

✕

It’s often convenient to build pattern reactions symbolically using Wolfram Language “chemical primitives”. But PatternReaction also lets you specify reactions as SMARTS strings:
✕

It’s been a 25-year journey, steadily increasing our built-in PDE capabilities. And in Version 13.1 we’ve added several (admittedly somewhat technical) features that have been much requested, and are important for solving particular kinds of real-world PDE problems. The first feature is being able to set up a PDE as axisymmetric. Normally a 2D diffusion term would be assumed Cartesian:
✕

But now you can say you’re dealing with an axisymmetric system, with your coordinates being interpreted as radius and height, and everything assumed to be symmetrical in the azimuthal direction:
✕

What’s important about this is not just that it makes it easy to set up certain kinds of equations, but also that in solving equations axial symmetry can be assumed, allowing much more efficient methods to be used:
✕

Also in Version 13.1 is an extension to the solid mechanics modeling framework introduced in Version 13.0. Just as there’s viscosity that damps out motion in fluids, so there’s a similar phenomenon that damps out motion in solids. It’s more of an engineering story, and it’s usually described in terms of two parameters: mass damping and stiffness damping. And now in Version 13.1 we support this kind of so-called Rayleigh damping in our modeling framework.
Another phenomenon included in Version 13.1 is hyperelasticity. If you bend something like metal beyond a certain point (but not so far that it breaks), it’ll stay bent. But materials like rubber and foam (and some biological tissues) can “bounce back” from basically any deformation.
Let’s imagine that we have a square of rubber-like material. We anchor it on the left, and then we pull it on the right with a certain force. What does it do?
This defines the properties of our material:
✕

We define variables for the problem, representing x and y displacements by u and v:
✕

Now we can set up our whole problem, and solve the PDEs for it for each value of the force:
✕

✕

Then one can plot the results, and see the rubber being nonlinearly stretched:
✕

There’s in the end considerable depth in our handling of PDE-based modeling, and our increasing ability to do “multiphysics” computations that span multiple types of physics (mechanical, thermal, electromagnetic, acoustic, …). And by now we’ve got nearly 1000 pages of documentation purely about PDE-based modeling. And for example in Version 13.1 we’ve added a monograph specifically about hyperelasticity, as well as expanded our collection of documented PDE models:
Let’s say you have trained a machine learning model and you apply it to a particular input. It gives you some result. But why? What were the important features in the input that led it to that result? In Version 13.1 we’re introducing several functions that try to answer such questions.
Here’s some simple “training data”:
✕

We can use machine learning to make a predictor for this data:
✕

Applying the predictor to a particular input gives us a prediction:
✕

What was important in making this prediction? The "SHAPValues" property introduced in Version 12.3 tells us what contribution each feature made to the result; in this case v was more important than u in determining the value of the prediction:
✕

But what about in general, for all inputs? The new function FeatureImpactPlot gives a visual representation of the contribution or “impact” of each feature in each input on the output of the predictor:
✕

What does this plot mean? It’s basically showing how often values of each of the two input features make contributions of a given size to the output. And with this particular predictor we see that there’s a wide range of contributions from both features.
If we use a different method to create the predictor, the results can be quite different. Here we’re using linear regression, and it turns out that with this method v never has much impact on predictions:
✕

If we make a predictor using a decision tree, the feature impact plot shows the splitting of impact corresponding to different branches of the tree:
✕

FeatureImpactPlot gives a kind of bird’s-eye view of the impact of different features. FeatureValueImpactPlot gives more detail, showing as a function of the actual values of input features the impact points with those values would have on the final prediction (and, yes, the actual points plotted here are based on data simulated on the basis of the distribution inferred by the predictor; the actual data is usually too big to want to carry around, at least by default):
✕

CumulativeFeatureImpactPlot gives a visual representation of how “successive” features affect the final value for each (simulated) data point:
✕

For predictors, feature impact plots show impact on predicted values. For classifiers, they show impact on (log) probabilities for particular outcomes.
One area that leverages many algorithmic capabilities of the Wolfram Language is control systems. We first started developing control systems functionality more than 25 years ago, and by Version 8.0 ten years ago we started to have built-in functions like StateSpaceModel and BodePlot specifically for working with control systems.
Over the past decade we’ve progressively been adding more builtin control systems capabilities, and in Version 13.1 we’re now introducing model predictive controllers (MPCs). Many simple control systems (like PID controllers) take an ad hoc approach in which they effectively just “watch what a system does” without trying to have a specific model for what’s going on inside the system. Model predictive control is about having a specific model for a system, and then deriving an optimal controller based on that model.
For example, we could have a state-space model for a system:
✕

Then in Version 13.1 we can derive (using our parametric optimization capabilities) an optimal controller that minimizes a certain set of costs while satisfying particular constraints:
✕

The SystemsModelControllerData that we get here contains a variety of elements that allow us to automate the control design and analysis workflow. As an example, we can get a model that represents the controller running in a closed loop with the system it is controlling:
✕

Now let’s imagine that we drive this whole system with the input:
✕

Now we can compute the output response for the system, and we see that both output variables are driven to zero through the operation of the controller:
✕

Within the SystemsModelControllerData object generated by ModelPredictiveController is the actual controller computed in this case—using the new construct DiscreteInputOutputModel:
✕

What actually is this controller? Ultimately it’s a collection of piecewise functions that depends on the values of states x_{1}[t] and x_{2}[t]:
✕

And this shows the different state-space regions in which the controller takes different forms:
✕

In Version 13.0 we introduced our question and assessment framework that allows you to author things like quizzes in notebooks, together with assessment functions, then deploy these for use. In Version 13.1 we’re adding capabilities to let you algorithmically or randomly generate questions.
The two new functions QuestionGenerator and QuestionSelector let you specify questions to be generated according to a template, or randomly selected from a pool. You can either use these functions directly in pure Wolfram Language code, or you can use them through the Question Notebook authoring GUI.
When you select Insert Question in the GUI, you now get a choice between Fixed Question, Randomized Question and Generated Question:
✕

Pick Randomized Question and you’ll get
✕

which then allows you to enter questions, and eventually produce a QuestionSelector—which will select newly randomized questions for every copy of the quiz that’s produced:
✕

Version 13.1 also introduces some enhancements for authoring questions. An example is a pure-GUI “no-code” way to specify multiple-choice questions:
✕

In the Wolfram Language expressions normally have two aspects: they have a structure, and they have a meaning. Thus, for example, Plus[1,1] has both a definite tree structure
✕

and has a value:
✕

In the normal operation of the Wolfram Language, the evaluator is automatically applied to all expressions, and essentially the only way to avoid evaluation by the evaluator is to insert “wrappers” like Hold and Inactive that necessarily change the structure of expressions.
In Version 13.1, however, there’s a new way to handle “unevaluated” expressions: the "ExprStruct" data structure. ExprStructs represent expressions as raw data structures that are never directly seen by the evaluator, but can nevertheless be structurally manipulated.
This creates an ExprStruct corresponding to the expression {1,2,3,4}:
✕

This structurally wraps Total around the list, but does no evaluation:
✕

One can also see this by “visualizing” the data structure:
✕

Normal takes an ExprStruct object and converts it to a normal expression, to which the evaluator is automatically applied:
✕

One can do a variety of essentially structural operations directly on an ExprStruct. This applies Plus, then maps Factorial over the resulting ExprStruct:
✕

The result is an ExprStruct representing an unevaluated expression:
✕

With "MapImmediateEvaluate" there is an evaluation done each time the mapping operation generates an expression:
✕

One powerful use of ExprStruct is in doing code transformations. And in a typical case one might want to import expressions from, say, a .wl file, then manipulate them in ExprStruct form. In Version 13.1 Import now supports an ExprStructs import element:
✕

This selects expressions that correspond to definitions, in the sense that they have SetDelayed as their head:
✕

Here’s a visualization of the first one:
✕

Let’s say you’ve got external code that’s in a compiled C-compatible dynamic library. An important new capability in Version 13.1 is a super-efficient and very streamlined way to call any function in a dynamic library directly from within the Wolfram Language.
It’s one of the accelerating stream of developments that are being made possible by the large-scale infrastructure buildout that we’ve been doing in connection with the new Wolfram Language compiler—and in particular it often leverages our sophisticated new type-handling capabilities.
As a first example, let’s consider the RAND_bytes (“cryptographically secure pseudorandom number generator”) function in OpenSSL. The C declaration for this function is:
In Version 13.1 we now have a symbolic way to represent such a declaration directly in the Wolfram Language:
✕

(In general we’d also have to specify the library that this function is coming from. OpenSSL happens to be a library that’s loaded by default with the Wolfram Language so you don’t need to mention it.)
There are quite a few new things going on in the declaration. First, as part of our collection of compiled types, we’re adding ones like "CInt" and "CChar" that refer to raw C language types (here int and char). There’s also CArray which is for declaring C arrays. Notice the new ::[ ... ] syntax for TypeSpecifier that allows compact specifications for parametrized types, like the char* here, that is described in Wolfram Language as "CArray"::["CChar"].
Having set up the declaration, we now need to create an actual function that can take an argument from Wolfram Language, convert it to something suitable for the library function, then call the library function, and convert the result back to Wolfram Language form. Here’s a way to do that in this case:
✕

What we get back is a compiled code function that we can directly use, and that works by very efficiently calling the library function:
✕

The FunctionCompile above uses several constructs that are new in Version 13.1. What it fundamentally does is to take a Wolfram Language integer (which it assumes to be a machine integer), cast it into a C integer, then pass this to the library function, along with a specification of a C char * into which the library function will put its result, and from which the final Wolfram Language result will be retrieved.
It’s worth emphasizing that most of the complexity here has to do with handling data types and conversions between them—something that the Wolfram Language goes to a lot of trouble to avoid usually exposing the user to. But when we’re connecting to external languages that make fundamental use of types, there’s no choice but to deal with them, and the complexity they involve.
In the FunctionCompile above the first new construct we encounter is
✕

The basic purpose of this is to create the buffer into which the external function will write its results. The buffer is an array of bytes, declared in C as char *, or here as "CArray"::["CChar"]. There’s an actual wrinkle though: who’s going to manage the memory associated with this array? The "Managed":: type specifier says that the Wolfram Language wrapper will do memory management for this object.
The next new construct we see in the FunctionCompile is
✕

Cast is one of a family of new functions that can appear in compilable code, but have no significance outside the compiler. Cast is used to specify that data should be converted to a form consistent with a specified type (here a C int type).
The core of the FunctionCompile is the use of LibraryFunction, which is what actually calls the external library function that we declared with the library function declaration.
The last step in the function compiled by FunctionCompile is to extract data from the C array and return it as a Wolfram Language list. To do this requires the new function FromRawPointer, which actually retrieves data from a specified location in memory. (And, yes, this is a raw dereferencing operation that will cause a crash if it isn’t done correctly.)
All of this may at first seem rather complicated, but for what it’s doing, it’s remarkably simple—and it greatly leverages the whole symbolic structure of the Wolfram Language. It’s also worth realizing that in this particular example, we’re just dipping into compiled code and then returning results. In larger-scale cases we’d be doing many more operations—typically specified directly by top-level Wolfram Language code—within compiled code, and so type declaration and conversion operations would be a smaller fraction of the code we have to write.
One feature of the example we’ve just looked at is that it only uses built-in types. But in Version 13.1 it’s now possible to define custom types, such as analogs of C structs. As an example, consider the function ldiv from the C standard library. This function returns an object of type ldiv_t, a struct that the C standard defines with two long fields: quot (the quotient) and rem (the remainder).
Here’s the Wolfram Language version of this declaration, based on setting up a "Product" type named "CLDivT". (The "ReferenceSemantics" -> False option specifies that this type will actually be passed around as a value, rather than just as a pointer to a value.)
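A sketch of such a type setup, assuming TypeDeclaration with two "CLong" fields matching the C struct:

```wolfram
(* A "Product" type mirroring C's ldiv_t: two long fields, passed by value *)
divDecl = TypeDeclaration["Product", "CLDivT",
   <|"quot" -> "CLong", "rem" -> "CLong"|>,
   "ReferenceSemantics" -> False]
```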
Now the declaration for the ldiv function can use this new custom type, and the final definition of the call to the external ldiv function brings these declarations together. We can then use the resulting function directly. And, yes, it will be as efficient as if we’d directly written everything in C.
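Collecting the declarations together, a sketch of the whole setup might look like the following; the exact argument conventions of LibraryFunctionDeclaration, and the field-access syntax for the returned struct, are assumptions on my part:

```wolfram
declarations = {
   TypeDeclaration["Product", "CLDivT",
    <|"quot" -> "CLong", "rem" -> "CLong"|>,
    "ReferenceSemantics" -> False],
   LibraryFunctionDeclaration["ldiv", {"CLong", "CLong"} -> "CLDivT"]};

cf = FunctionCompile[declarations,
   Function[{Typed[x, "MachineInteger"], Typed[y, "MachineInteger"]},
    Module[{r = LibraryFunction["ldiv"][
        Cast[x, "CLong", "CCast"], Cast[y, "CLong", "CCast"]]},
     (* return quotient and remainder as a Wolfram list *)
     {Cast[r["quot"], "MachineInteger", "CCast"],
      Cast[r["rem"], "MachineInteger", "CCast"]}]]]
```

With such a definition, cf[107, 10] would be expected to give the quotient and remainder {10, 7}.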

The examples we’ve given here are very small ones. But the whole structure for external function calls that’s now in Version 13.1 is set up to handle large and complex situations—and indeed we’ve been using it internally with great success to set up important new built-in pieces of the Wolfram Language.
One of the elements that’s often needed in more complex situations is more sophisticated memory management, and our new "Managed" type provides a convenient and streamlined way to do this.
Consider a compiled function that creates an array of 10,000 machine integers. Running such a function effectively “leaks” memory: every call allocates an array that is never freed. But if we define a version of the function in which the array is “managed”, the memory associated with the array is automatically freed as soon as it is no longer referenced.
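A sketch of the two variants (the array of 10,000 machine integers is from the text; observing the effect with MemoryInUse is my suggestion, not part of the original example):

```wolfram
(* Unmanaged version: each call allocates an array that is never freed *)
leaky = FunctionCompile[Function[{},
    Module[{a = CreateTypeInstance["CArray"::["MachineInteger"], 10000]}, 0]]];

(* Managed version: the array is freed once it is no longer referenced *)
managed = FunctionCompile[Function[{},
    Module[{a = CreateTypeInstance[
        "Managed"::["CArray"::["MachineInteger"]], 10000]}, 0]]];

(* Compare MemoryInUse[] before and after Do[leaky[], 100]
   versus Do[managed[], 100] to see the difference *)
```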

If you have an explicit pure function (Function[...]) you can use FunctionCompile to produce a compiled version of it. But what if you have a function that’s defined using downvalues, say a recursive factorial function fac? In Version 13.1 you can directly compile function definitions like this. But—as is the nature of compilation—you have to declare what types are involved: for fac, that it takes a single machine integer and returns a machine integer.

Now we can create a compiled function that computes fac[n], and the compiled function runs significantly faster than the ordinary symbolic definition.
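A sketch of the whole workflow; the DownValuesFunction wrapper inside FunctionDeclaration is my assumption about the 13.1 mechanism, and fac as a factorial is illustrative:

```wolfram
(* ordinary downvalue definitions for a factorial function *)
fac[1] = 1;
fac[n_] := n fac[n - 1];

(* declare the argument and result types of fac for the compiler *)
decl = FunctionDeclaration[fac,
   Typed[DownValuesFunction[fac], {"MachineInteger"} -> "MachineInteger"]];

(* compile a function that calls the declared fac *)
cfac = FunctionCompile[decl, Function[Typed[n, "MachineInteger"], fac[n]]];

(* compare e.g. RepeatedTiming[cfac[20]] with RepeatedTiming[fac[20]] *)
```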

The ability to declare and use downvalue definitions in compilation has the important feature that it allows you to write a definition just once, and then use it both directly and in compiled code.
An early focus of the Wolfram Language compiler has been handling low-level “machine” types, such as integers or reals of certain lengths. But one of the advances in the Version 13.1 compiler is direct support for an "InertExpression" type for representing any Wolfram Language expression within compiled code.
When you use something like FunctionCompile, it will explicitly try to compile whatever Wolfram Language expressions it’s given. But if you wrap the expressions with InertExpression the compiler will then just treat the expressions as inert structural objects of type "InertExpression". This sets up a compiled function that constructs an expression (implicitly of type "InertExpression"):
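As a sketch (the particular expression being built, with hypothetical head f, is an illustrative assumption):

```wolfram
(* Build the inert expression f[n, n + 1] inside compiled code;
   the head f and the arguments are illustrative *)
cf = FunctionCompile[Function[Typed[n, "MachineInteger"],
    InertExpression[f][n, n + 1]]]
```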