This Robot Debates and Cracks Jokes, but It’s Still a Toaster
The monolithic black rectangle on stage with luminous, bouncing blue dots at eye level was not Project Debater, IBM’s argumentative artificial intelligence. It was just something for an audience to look at while a voice (is it redundant to call an AI’s synthesized voice “disembodied”?) projected over the sound system of the Yerba Buena Center for the Arts, in San Francisco.
Project Debater is, as its name suggests, software that engages in formal, you-go-then-I-go staged debates. Back in June, when my colleague Tom Simonite wrote about its introduction, Project Debater was a halting, infantile thing. It’d get confused about which side it was on, or make mistakes in what evidence made sense to marshal. But Monday night, in front of hundreds of people, it was sophisticated, even polished. Also: kind of creepy.
As moderator John Donvan—of the debate-sponsoring group Intelligence Squared US—put it, the point wasn’t necessarily to win, but to get at some kind of truth, to “raise the level of public discourse” through a civil exchange of ideas. Presumably that suited IBM just fine; the point of Project Debater isn’t to produce a robot that can Well Actually. It’s to have a robot that understands human speech and helps humans understand complex ideas. Clearly, there could be fine robots on both sides.
To that end, neither champion debater Harish Natarajan—an Oxford- and Cambridge-educated financier—nor Project Debater (and its four pale, black-clad coders monitoring things from stage right) knew what the topic would be until 15 minutes before curtain. The subject turned out to be whether governments should subsidize preschool. Project Debater took the “yes” side; Natarajan took the “no.” I should mention, too, that Project Debater’s voice, while more synthetic-sounding than, say, Alexa’s, was also feminine. That’s a trope. Pretty much everyone referred to Project Debater with she/her pronouns. I will refrain.
The IBM reps said that the system works by drawing from a corpus of 10 billion sentences, which it can parse and understand in the context of the topic and what its debate opponent says. It also can concatenate those arguments—the robot equivalent of rhetorical technique. It’s supposed to go from learning to a simulation of actual reasoning, and to model the kind of dilemma contained in any debatable assertion in order to anticipate an opponent’s argument.
Indeed, Project Debater’s facility with statistics was impressive: it quoted from the Organization for Economic Co-operation and Development, from the Centers for Disease Control and Prevention, and elsewhere. It could determine relevance seemingly as well as any human. But its attempts at expressing personality went almost perfectly sideways. It addressed Natarajan by name, and used high-school essay constructions like “There are two issues. I will elaborate.” It even told a joke: In defending the benefits of subsidized preschools for poor families, Project Debater acknowledged that “I cannot experience poverty directly.” It opened a rebuttal by saying, “I sometimes listen to opponents and wonder, what do they want?” That all felt askew, as when a bot uses what Clive Thompson called phatic spackle, the ums and likes of human chitchat.
Even weirder than Project Debater trying to sound human was when Natarajan—perhaps in a rhetorical flourish of his own—seemed to fall for it. He described Project Debater’s argument as a fallacy, and said that a subsidy “doesn’t mean that those individuals who are as poor as Project Debater seems to care about are going to be those who have the ability to send their child to preschool.” That’s an interesting argument, but it also presupposes that the computer cares about something. Which it cannot. It was a rhetorical conceit.
On the other hand, Natarajan’s rhetorical moves generally landed more solidly than Project Debater’s. He concluded by saying, “I think we disagree on far less than it may seem,” a gesture toward agreeability designed to get an audience on one’s side. When Natarajan said “they’ll struggle to send their child to good quality preschools. They’ll struggle to send their child to good quality preschools that they don’t even have the money for … They’ll struggle to send their child to good quality preschools if they don’t value the amount of effort and time they have to put into it,” he was deploying anaphora, repetition of a word or phrase at the beginning of clauses for emphasis. (“We shall fight on the beaches. We shall fight on the landing grounds. We shall fight in the fields, and in the streets.”) It all felt a lot more believable coming from a person than from a toaster.
Formal debates are a little weird if, like me, you’re not used to them. The participants don’t necessarily argue what they believe. The humans involved are supposed to be able to plausibly argue either side as a sign of their skill. A debater is already a little inhuman. So maybe it makes no difference that Project Debater is a lot inhuman. It can argue any position in part because it literally cannot believe anything. No matter what words it uses, it doesn’t “wonder” anything. It doesn’t remember its previous opponents (except in the sense that prior debates helped its programmers hone its skills). It doesn’t “think” or “hope.” It can tell jokes, but it doesn’t think they’re funny—because it doesn’t know what funny is. They are joke-like sounds. It certainly doesn’t know (except in the sense that everyone who programmed it passes on this knowledge implicitly) that humor sets an audience at ease and greases the intellectual chute for harder conceptual work.
All that freaked me right the hell out. There’s something faintly sociopathic about arguing a point of view when whoever or whatever is arguing can make no distinction between fact, opinion, and punchline. Project Debater puts words in an order we listeners recognize. That might be information, but it’s not knowledge. Watching the luminous blue light on the onstage screen pretending to be Project Debater, I kept remembering what Deckard says in Blade Runner when he finds out Rachael is a replicant: How can it not know what it is?
But of course, Deckard didn’t know what he—it?—was, either. It’s entirely possible that the people who sit near me at work don’t think I know what humor is, and wish I would stop with all the puns. One of the central problems of philosophy is how we know what we think we know, and whether we can trust that knowledge. Who am I to accuse Project Debater’s vast, distributed algorithmic system of inauthenticity?
So, the results of the debate: Intelligence Squared polls the audience before and after its debates, and counts the winner as whichever side shifted more people to that perspective. By that count, Natarajan was the clear victor. He’d started deep in the red, with only 13 percent in the “no” column, and ended with 30 percent. A few minutes after it was all over, I huddled with Dario Gil, the director of IBM Research. “I thought it went great,” he said. “One of the issues we’ve struggled with over the past year is getting the polarity right”—that is, making sure that all the evidence Project Debater comes up with is on the same side. That night, everything was.
And that’s how Project Debater isn’t a replicant. More real than real isn’t the goal here—at all. “We really lean into the ability to construct coherent arguments supported by evidence,” Gil said. The point isn’t to debate harder and harder issues—“You control a trolley that will kill one person or five people; what do you do?” or maybe “Resolved: I’ll Tell You About My Mother.” For IBM, debates are just a StarCraft II battlefield to test the soul of a new machine. “In the end, it’s about working with us, and it’s useful to know that this thing is not a human,” Gil said. But whether you’ll ever want a combative little buddy in your phone, making a case for or against some course of action? Well … that’s arguable.