Today marks an important milestone for Wolfram|Alpha, and for computational knowledge in general: for the first time, Wolfram|Alpha is now on average giving complete, successful responses to more than 90% of the queries entered on its website (and with “nearby” interpretations included, the fraction is closer to 95%).
I consider this an impressive achievement—the hard-won result of many years of progressively filling out the knowledge and linguistic capabilities of the system.
The picture below shows how the fraction of successful queries (in green) has increased relative to unsuccessful ones (in red) since Wolfram|Alpha was launched in 2009. And from the log scale in the right-hand panel, we can see that there’s been a roughly exponential decrease in the failure rate, with a half-life of around 18 months. It seems to be a kind of Moore’s law for computational knowledge: the net effect of innumerable individual engineering achievements and new ideas is to give exponential improvement.
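To make the half-life concrete: if the failure rate really does fall off exponentially (an idealization of the measured curve), then

$$ f(t) \;\approx\; f_0 \, 2^{-t/t_{1/2}}, \qquad t_{1/2} \approx 18\ \text{months}, $$

so over the roughly three years since launch, about two half-lives, the failure rate should have dropped by a factor of about $2^{36/18} = 4$.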
But to celebrate reaching our 90% query success rate, I thought it’d be fun to take a look at some of what we’ve left behind. Ever since the early days of Wolfram|Alpha, we’ve been keeping a scrapbook of our favorite examples of “artificial stupidity”: places where Wolfram|Alpha gets the wrong idea, and applies its version of “artificial intelligence” to go off in what seems to us humans as a stupid direction.
Here’s an example, captured over a year ago (and now long since fixed):
When we typed “guinea pigs”, we probably meant those furry little animals (which, as it happens, I once had as a kid). But Wolfram|Alpha somehow got the wrong idea, and thought we were asking about pigs in the country of Guinea, and diligently (if absurdly, in this case) told us that there were 86,431 of those in a 2008 count.
At some level, this wasn’t such a big bug. After all, at the top of the output Wolfram|Alpha perfectly well told us it was assuming “‘guinea’ is a country”, and offered the alternative of taking the input as a “species specification” instead. And indeed, if one tries the query today, the species is the default, and everything is fine, as below. But having the wrong default interpretation a year ago was a simple but quintessential example of artificial stupidity, in which a subtle imperfection can lead to what seems to us laughably stupid behavior.
Here’s what “guinea pigs” does today—a good and sensible result:
Below are some other examples from our scrapbook of artificial stupidity, collected over the past 3 years. I’m happy to say that every single one of these now works nicely; many actually give rather impressive results, which you can see by clicking each image below.
There’s a certain humorous absurdity to many of these examples. In fact, looking at them suggests that this kind of artificial stupidity might actually be a good systematic source of things that we humans find humorous.
But where is the artificial stupidity coming from? And how can we overcome it?
There are two main issues that seem to combine to produce most of the artificial stupidity we see in these scrapbook examples. The first is that Wolfram|Alpha tries too hard to please—valiantly giving a result even if it doesn’t really know what it’s talking about. And the second is that Wolfram|Alpha may simply not know enough—so that it misses the point because it’s completely unaware of some possible meaning for a query.
Curiously enough, these two issues come up all the time for humans too—especially, say, when they’re talking on a bad cellphone connection, and can’t quite hear clearly.
For humans, we don’t yet know the internal story of how these things work. But in Wolfram|Alpha it’s very well defined. It’s millions of lines of Mathematica code, but ultimately what Wolfram|Alpha does is to take the fragment of natural language it’s given as input, and try to map it into some precise symbolic form (in the Mathematica language) that represents in a standard way the meaning of the input—and from which Wolfram|Alpha can compute results.
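As a very rough illustration of that pipeline, here is a toy sketch in the Wolfram Language. To be clear, this is not Wolfram|Alpha’s actual code: the heads and helper names (candidates, interpretation, interpret) and the plausibility scores are all invented for illustration. The point is just the shape of the computation: a raw query string goes in, a ranked set of candidate symbolic interpretations is weighed, and the best one comes out as a symbolic expression from which everything downstream can compute.

    (* Toy sketch of the interpretation step; not Wolfram|Alpha's actual code. *)
    (* Each candidate pairs a hypothetical symbolic form with a made-up plausibility score. *)
    candidates["guinea pigs"] = {
       {interpretation["Species", "Cavia porcellus"], 0.9},  (* the furry animal *)
       {interpretation["Country", "Guinea"], 0.4}            (* pigs in the country Guinea *)
       };

    (* The understanding step: return the highest-scoring symbolic form. *)
    interpret[query_String] := First[First[SortBy[candidates[query], -Last[#] &]]]

    interpret["guinea pigs"]
    (* -> interpretation["Species", "Cavia porcellus"] *)

The real system of course weighs vastly more candidate meanings, with much more elaborate scoring; the essential point is just that the output is a precise symbolic expression rather than text.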
By now—particularly with data from nearly 3 years of actual usage—Wolfram|Alpha knows an immense amount about the detailed structure and foibles of natural language. And of necessity, it has to go far beyond what’s in any grammar book.
When people type input to Wolfram|Alpha, I think we’re seeing a kind of linguistic representation of undigested thoughts. It’s not a random soup of words (as people might feed a search engine). It has structure—often quite complex—but it has scant respect for the niceties of traditional word order or grammar.
And as far as I am concerned, one of the great achievements of Wolfram|Alpha is the creation of a linguistic understanding system that’s robust enough to handle such things, and to convert them successfully into precise, computable symbolic expressions.
One can think of any particular symbolic expression as having a certain “basin of attraction” of linguistic forms that will lead to it. Some of these forms may look perfectly reasonable. Others may look odd—but that doesn’t mean they can’t occur in the “stream of consciousness” of actual Wolfram|Alpha queries made by humans.
And usually it won’t hurt anything to allow even very odd forms, with quite bizarre distortions of common language. Because the worst that will happen is that these forms just won’t ever actually get used as input.
But here’s the problem: what if one of those forms overlaps with something with a quite different meaning? If it’s something that Wolfram|Alpha knows about, Wolfram|Alpha’s linguistic understanding system will recognize the clash, and—if all is working properly—will choose the correct meaning.
But what happens if the overlap is with something Wolfram|Alpha doesn’t know about?
In the last scrapbook example above (from 2 years ago) Wolfram|Alpha was asked “what is a plum”. At the time, it didn’t know about fruits that weren’t explicitly plant types. But it did happen to know about a crater on the moon named “Plum”. The linguistic understanding system certainly noticed the indefinite article “a” in front of “plum”. But knowing nothing with the name “plum” other than a moon crater (and erring—at least on the website—in the direction of giving some response rather than none), it will have concluded that the “a” must be some kind of “linguistic noise”, gone for the moon crater meaning, and done something that looks to us quite stupid.
How can Wolfram|Alpha avoid this? The answer is simple: it just has to know more.
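In terms of the toy sketch above, “knowing more” just means having more candidate interpretations to weigh, which can flip which meaning wins by default (again, the forms and scores here are invented for illustration):

    (* Before: the only "plum" in the toy knowledgebase is the moon crater,
       so it wins by default, despite the indefinite article "a". *)
    candidates["a plum"] = {{interpretation["MoonCrater", "Plum"], 0.3}};

    (* After "knowing more": the fruit is now a candidate meaning too, and since
       the indefinite article suggests a generic noun, it gets the higher score
       and becomes the default. *)
    candidates["a plum"] = {
       {interpretation["Food", "Plum"], 0.8},
       {interpretation["MoonCrater", "Plum"], 0.3}
       };

    interpret["a plum"]
    (* -> interpretation["Food", "Plum"] *)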
One might have thought that doing better at understanding natural language would be about covering a broader range of more grammar-like forms. And certainly this is part of it. But our experience with Wolfram|Alpha is that it is at least as important to add to the knowledgebase of the system.
A lot of artificial stupidity is about failing to have “common sense” about what an input might mean. Within some narrow domain of knowledge an interpretation might seem quite reasonable. But in a more general “common sense” context, the interpretation is obviously absurd. And the point is that as the domains of Wolfram|Alpha knowledge expand, they gradually fill out all the areas that we humans consider common sense, pushing out absurd “artificially stupid” interpretations.
Sometimes Wolfram|Alpha can in a sense overshoot. Consider the query “clever population”. What does it mean? The linguistic construction seems a bit odd, but I’d probably think it was talking about how many clever people there are somewhere. But here’s what Wolfram|Alpha says:
And the point is that Wolfram|Alpha knows something I don’t: that there’s a small city in Missouri named “Clever”. Aha! Now the construction “clever population” makes sense. To people in southwestern Missouri, it would probably always have been obvious. But with typical everyday knowledge and common sense, it’s not. And just like Wolfram|Alpha in the scrapbook examples above, most humans will assume that the query is about something completely different.
There’ve been a number of attempts to create natural-language question-answering systems in the history of work on artificial intelligence. And in terms of immediate user impression, the problem with these systems has usually been not so much a failure to create artificial intelligence but rather the presence of painfully obvious artificial stupidity. In ways much more dramatic than the scrapbook examples above, the system will “grab” a meaning it happens to know about, and robotically insist on using this, even though to a human it will seem stupid.
And what we learn from the Wolfram|Alpha experience is that the problem hasn’t been our failure to discover some particular magic human-thinking-like language understanding algorithm. Rather, it’s in a sense broader and more fundamental: the systems just didn’t know, and couldn’t work out, enough about the world. It’s not good enough to know wonderfully about just some particular domain; you have to cover enough domains at enough depth to achieve common sense about the linguistic forms you see.
I always conceived Wolfram|Alpha as a kind of all-encompassing project. And what’s now clear is that to succeed it’s got to be that way. Solving a part of the problem is not enough.
The fact that as of today we’ve reached a 90% success rate in query understanding is a remarkable achievement, and one that shows we’re definitely on the right track. And indeed, looking at the Wolfram|Alpha query stream, in many domains we’re at least on a par with typical human query-understanding performance. We’re not in the running for the Turing Test, though: Wolfram|Alpha doesn’t currently do conversational exchanges, and, more important, it knows and can compute far too much to pass for a human.
And indeed after all these years perhaps it’s time to upgrade the Turing Test, recognizing that computers should actually be able to do much more than humans. And from the point of view of user experience, probably the single most obvious metric is the banishment of artificial stupidity.
When Wolfram|Alpha was first released, it was quite common to run into artificial stupidity even in casual use. And I for one had no idea how long it would take to overcome it. But now, just 3 years later, I am quite pleased at how far we’ve got. It’s certainly still possible to find artificial stupidity in Wolfram|Alpha (and it’s quite fun to try). But it’s definitely more difficult.
With all the knowledge and computation that we’ve put into Wolfram|Alpha, we’re successfully making Wolfram|Alpha not only smarter but also less stupid. And we’re continuing to progress down the exponential curve toward perfect query understanding.