Does the Turing Test pass the test of time?
Four Amazon scientists weigh in on whether the famed mathematician's definition of artificial intelligence is still applicable, and what might surprise him most today.
On Oct. 1, 1950, the journal Mind published a 27-page paper by Alan Turing. More than 70 years later, that paper — "Computing Machinery and Intelligence" — which posed the question, “Can machines think?” remains foundational in artificial intelligence.
However, while the paper is iconic, the original goal of building a system comparable to human intelligence has proved elusive. In fact, Alexa VP and Head Scientist Rohit Prasad has written, “I believe the goal put forth by Turing is not a useful one for AI scientists like myself to work toward. The Turing Test is fraught with limitations, some of which Turing himself debated in his seminal paper.”
In light of the 2021 AAAI Conference on Artificial Intelligence, we asked scientists and scholars at Amazon how they view that paper today. We spoke with Yoelle Maarek, vice president of research and science for Alexa Shopping; Alex Smola, AWS vice president and distinguished scientist; Nikko Ström, Alexa AI vice president and distinguished scientist; and Gaurav Sukhatme, the USC Fletcher Jones Foundation Endowed Chair in Computer Science and Computer Engineering and an Amazon Scholar.
We asked them whether Turing’s definition of artificial intelligence still applies, what they think Turing would be surprised by today, and which of today’s problems researchers will still be puzzling over 70 years from now.
Q. Does Turing’s definition of AI (essentially “a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human”) still apply, or does it need to be updated?
Smola: “The core of the question remains as relevant as it was 70 years ago. That said, I would argue that rather than seeking binary (yes/no) tests for AI we should have something more gradual. For instance, the argument could be about how long a machine can fool a human. Alexa and others by now do a pretty good job on many single-turn queries, and there are even multi-turn systems that are pretty capable. In fact, you can test out some of them as part of the Alexa Prize (‘Alexa, let’s chat’). Using time, you can measure progress more finely, e.g., by the number of minutes (or turns) it takes to uncover the imposter, rather than applying a fixed time limit.”
Maarek: “It is clear it is not a perfect definition. First, I doubt there exists a universally agreed-upon definition of intelligence, and it is not clear what ‘a human’ refers to. Is that any human? Can a machine be indistinguishable from some humans and not from others? It is, however, a simplification that can still be used for inspiration. And it does bring inspiration — see, for instance, the outstanding progress in chess or Go. There are, of course, so many other areas where machines still have much to learn, and these challenges keep inspiring scientists. Two such areas, among others, on which we are focusing in Alexa Shopping Research are conversational shopping (as a subfield of conversational AI) and computational humor. With even small progress in these hard AI challenges, I am sure we will bring tremendous value to our customers and even make them smile.”
Ström: “Evaluating AI on the basis of being indistinguishable from human intelligence makes as much sense as evaluating airplanes based on being indistinguishable from birds. We may never have a single definition, but a common thread is generalizability, i.e., the ability to be successful in novel situations, not considered during the design of the system. To achieve such generalization, an AI needs the ability to reason and plan, have a representation of world-knowledge, an ability to learn and remember, and an ability to regulate and integrate those cognitive capabilities toward goals.
"The AI also needs to be an active participant in the world, and when evaluating intelligence, one needs to consider not just whether goals are met, but how efficiently goals are reached based on efficacy metrics that depend on the application — e.g., cost, energy use, speed, et cetera. My prediction is that once one or several successful such systems exist, a standard model will emerge that becomes a de facto definition of AI.”
Sukhatme: “I think the idea that we want a machine to have the ‘ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human’ still applies when thinking about AI. However, this idea has over the years been interpreted very narrowly when it comes to the ‘test’ — i.e., people look for human-like performance on some narrow task. I think we need to remind people that intelligence is a very broad set of capabilities, and we need to acknowledge that humans have a deep understanding of the world, are social, have empathy, can and do learn continually, and can do a very broad range of things. If we are to say that we’ve built a machine or system that exhibits AI, I would want to see it exhibit behavior indistinguishable from humans on a similar breadth of abilities.”
Q. In terms of AI, what do you think would surprise Turing today?
Sukhatme: “I think he’d be surprised at how far we’ve come in terms of the technological artifacts we’ve produced. And he’d be disappointed in how un-intelligent they are.”
Maarek: “Hard to answer, as this is pure speculation. But I would like to believe that computational humor would be one of them, simply because it makes us all smile.”
Ström: “The resolution of Moravec's paradox. Machine learning and, in particular, deep learning, is now enabling us to solve sensorimotor tasks in robotics, and sensory tasks such as object recognition and speech recognition. Yet general intelligence is still a hard, largely unsolved, problem. I also think Turing would be fascinated by quantum computers.”
Smola: “The thing that would surprise Turing the most is probably the amount of data and its ready availability. The fact that we can build language models on more than 1 trillion characters of text, or that we have hundreds of millions of images available, is probably the biggest differentiator. It’s only thanks to these mountains of data that we’ve been able to build systems that generate speech (e.g. Amazon Polly), that translate text (e.g. Amazon Translate), that recognize speech (e.g. Transcribe), that recognize images, faces in images, or that are able to analyze poses in video.
"At the same time, it’s unclear whether he would have anticipated the exponential growth in computation. The UNIVAC was capable of performing around 4,000 floating point operators (FLOPS) per second. Our latest P4 servers can carry out around 1-2 PetaFLOPS, so that’s 1,000,000,000,000,000 multiply-adds — and you can rent them for around $30 an hour.”
Q. Which of today’s theoretical questions will scientists still be puzzling about in 2090?
Sukhatme: “How do human brains do what they do in such an energy efficient manner? What is consciousness?”
Maarek: “In terms of theoretical computer science problems, I believe that hard AI problems like the Winograd Schema Challenge will be resolved. But I want to believe that other AI challenges, like giving a true sense of humor to machines, won’t be solved yet. It's humbling to think that in 1534 the French writer François Rabelais said, 'le rire est le propre de l’homme' — which can be translated as 'laughter is unique to humans'. It’s probably why my team is researching computational humor — it’s fun and hard.”
Ström: “In 70 years, I predict that AI will have been solved for practical purposes and will be used for cognitive tasks, small and large. So that is not it. Some long-standing profound questions, like whether P = NP, will still be unsolved. The physics model of time, space, energy, and matter will still not be complete, and the question of how life spontaneously emerges from lifeless building blocks will still puzzle both human and synthetic scientists. Unless we get lucky, 70 years will also not be enough to determine if there is alien intelligent life in our galaxy.”
Smola: “That’s really difficult since most projections don’t hold up well, even for a decade or so. In 2016, when I interviewed for a job and was deciding between Amazon and another major company, I was told at that other company that I was making a mistake in betting on AI in the cloud. Problems that will keep us awake, probably forever, are how to appropriately balance innovation while also protecting individual liberties. Those challenges will require continuous and careful consideration by multiple stakeholders in academia, industry, government, and our society. Likewise, we will never be able to have a full characterization of the empirical power of our statistical tools. In simple terms, we’ll likely always encounter algorithms that work way better than they should in theory. Lastly, there’s the issue of actually gaining causal understanding from data as to how the world works. This is hard and has been vexing (natural) scientists for centuries.
"Areas where we will likely see a lot of progress include autonomous systems. There’s so much economic promise in self-driving vehicles that I think we will eventually deliver something that works. The algorithms used for cars can also be adapted for a wide variety of other problems such as manufacturing, maintenance, et cetera. The next decade or two will be amazing — and we’ll likely also see great progress on the Turing test itself.”