In building and programming the world's most capable natural language processing computer, David Ferrucci and IBM Research's DeepQA Team are shooting far higher than a million-dollar payday on "Jeopardy!".
In 2005, David Ferrucci told his superiors at IBM he could build a machine that could play “Jeopardy!”. From the beginning, however, the computer scientist was building the system for a lot more than a television game show.
In February 2011, the television game show "Jeopardy!" hosted an unusual competitor. So unusual, in fact, that a new set of shows was filmed to both showcase and test this contestant's abilities. Going by the name of Watson, the new player fared well in "his" initial outing, winning two out of three episodes and showing an ability to navigate the puns and mind games that help make "Jeopardy!" such a challenge for even the most knowledgeable human players.
The twist: Watson was a machine, one built specifically to excel in a task that requires encyclopedic knowledge and the ability to find the correct answer quickly. Named after IBM's founder, Thomas Watson, the computerized Watson was perhaps the biggest microprocessing celebrity since IBM's Deep Blue matched chess wits with Garry Kasparov more than a decade earlier. To succeed on "Jeopardy!", however, Watson needed to be both more capable than Deep Blue and more "human". To play the game, the computer had to analyze clues that often involved subtle meanings, irony, riddles, and complexities that suit the abilities of humans, not computers. And it had to accomplish these concurrent tasks nearly instantly, in real time, to beat its human competitors to the buzzer.
Watson won the cumulative points battle, but it wasn't easy. For six years, IBM Fellow David Ferrucci, PhD, a research staff member and Watson team leader, and the members of IBM Research's DeepQA Team labored to build on decades of artificial intelligence know-how to construct the most capable natural language processing computer on the planet. In recognition of their accomplishment in Watson, the editors of R&D Magazine have honored Ferrucci and his team with the 2011 Innovator of the Year Award.
A "crazy" assignment
The Watson project emerged in an unlikely place: a restaurant. In 2005, Charles Lickel, an IBM executive who later became Ferrucci's boss, was eating with colleagues who happened to be discussing what the next Deep Blue might look like. The restaurant suddenly emptied out as diners gravitated to the bar's television to watch Ken Jennings attempt to stretch his stupendous 74-game winning streak on "Jeopardy!".
If ever there was a challenge for a computer, Lickel thought, it would be to square off against a player like Jennings. Lickel broached the idea to IBM's vice president of research, Paul Horn, who was tasked with "trying to find someone crazy enough to take it on," says Ferrucci. "Everyone around the company was saying it is pure folly, it's far too difficult to do."
Ferrucci had a different point of view. His entire career to this point had been spent studying the way in which computers can interpret and process knowledge. His field of open-domain question-and-answering—hence, DeepQA—originated in the 1960s, but both the science and the hardware behind it had advanced considerably in the 1980s and 1990s. As Ferrucci was beginning his career, Moore's law was being maintained to such a predictable degree that computers were now able to exercise some of the knowledge representation theories that had been developed.
As he explains it, Ferrucci was destined to be a medical doctor, not a computer scientist. His parents always urged him to pursue a career in medicine, and he began his college career on that track before, of all things, a programming assignment in a math course at Iona College in 1978 opened his eyes to another possibility.
"I came out of that class with a totally new perspective. We were writing BASIC programs, and I was completely blown away by the concept of programming a computer to do something," says Ferrucci.
In the late 1970s, computing was still new to a host of industries, including the medical community, and personal computers were only just beginning to gain traction. Ferrucci, however, recognized that the computer was something special, something that could change the way medicine was practiced.
"I had so many implications in my head, and immediately got excited about the prospect of medical expert systems," he says.
While his medical studies continued, he wrote software in his free time. After a while, it became his hobby and his joy, and he realized he didn’t want to be a medical doctor.
At Rensselaer Polytechnic Institute, Ferrucci earned a doctorate in a relatively new computer science discipline called knowledge representation and reasoning. In basic terms, the research field involves analysis of how to effectively use a set of symbols to represent a set of facts in a defined knowledge domain.
Essentially, a proxy symbolic language is created and enmeshed with a system of logic to allow a network of inferences to be built about a given set of samples, or elements. The logic system governs how reasoning functions can be applied to the symbols.
Since the early 1990s, IBM had successfully built computer systems based on knowledge representation, but none were particularly powerful, or fast.
But Ferrucci thought that the advent of massively parallel processors could make such a system possible. With department support, he managed to sell the effort to IBM's senior vice president, though he was not certain he would succeed. Even in 2007, when the Watson project became official, IBM's technology wasn't yet up to the task.
"All of the technologies we had fell way short. To compete, we needed to predict accurate probabilities, with confidence, with over 85% precision. The state-of-the-art systems we had at the time were down to 15%," he says.
But the allure was too difficult to ignore. Although "Jeopardy!" is a far cry from chess, the effort seemed achievable to IBM's experts, and success would offer multiple rewards, including positive exposure for the company’s capabilities.
"We had a firm belief that this was one problem to try to solve. From a scientist's position, failure would have taught us a lot. From a career position, I was definitely putting mine at risk," says Ferrucci. He quickly realized that IBM's executives really wanted to win the game.
Managing data, the human way
The notion that software could be used to create an expert system for the medical community was probably the single biggest motivator for Ferrucci. Even early in his career, he recognized a single overarching roadblock to his vision: data volume.
"No matter how much knowledge we put into a computer, the volume of knowledge is changing too quickly and it’s too voluminous. If we are trying to keep that knowledge in computer terms, we would never get there and we would never catch up," he says.
The solution, he realized, would be to develop a machine that would deal with language on human terms—a natural language computer. Instead of trying to conform human language to the needs of the computer, Ferrucci says, the best approach is to create a computer that is able to do the heavy lifting.
"If swallowing is the same thing as food getting caught in your throat, then a computer would have no problem. But human meaning is so nuanced. There are so many things which are expressed in so many different ways that have not been formalized or mathematized," says Ferrucci.
The ability for a machine to interpret the meaning of human speech has been envisioned since before the computer was invented. In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence". He proposed a test by which a computer program would be judged by its ability to impersonate a human in a real-time written conversation so well that it "fools" a human judge.
Early computers showed promise, but even direct machine translation (from English to French, for example) defied programmers until the 1980s. Statistical machine translation systems developed at IBM helped crack the problem, using a combination of probability distributions and a search-limiting decoder to greatly accelerate the translation process.
Such efforts, however, had only knocked at the door of Turing's test. To truly interact with a human, a machine would have to speed up dramatically and converse in real time.
Speed, the key to Watson's success
The format of "Jeopardy!" is well known. Three contestants are pitted against one another as they answer trivia clues drawn from a series of categories. Given a clue, they have milliseconds to decide whether they can answer before pressing a plunger. To complicate matters, the clues delivered by long-time host Alex Trebek are laced with puns and jokes, and all answers must be phrased in the form of a question.
Unlike chess—a game of pure strategy based on a staggering number of possible combinations—"Jeopardy!" is a more finite world, but one that is rich in the most difficult obstacle for programmers: human language. When the DeepQA project was officially launched in 2007, computers couldn't hope to challenge even the least curious or cosmopolitan "Jeopardy!" player. The first problem is access to knowledge: the "Jeopardy!" machine would have to know where to look within a large repository of human knowledge. Even if a database system that could do this within seconds were developed, a more fundamental problem would consistently rear its head: What’s the answer?
Like many of today's best supercomputers, DeepQA is designed to be a massively parallel system.
The underlying architecture—one in which Ferrucci is an expert—is Apache Unstructured Information Management Architecture (UIMA). Designed to support interoperability and scale-out of text and multimodal analysis applications, UIMA speeds searching tasks by annotating text. Over time, the DeepQA team has built out the original system to include hundreds of annotating components that all function in an overlapping manner to analyze natural language, score evidence, find sources, and evaluate hypotheses.
Early on, Watson needed two hours to answer a single question using a single processor. When processors were added, the UIMA approach enabled each processor to execute a different part of the original task. After tasks are assigned, Watson’s Apache Hadoop framework is used to prepare large volumes of preprocessed data to speed run-time processing. DeepQA's annotators then take over, distributing work across the Hadoop framework to the processors.
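The overlapping-annotator idea can be pictured with a minimal sketch. Everything below is invented for illustration — the annotator names and their logic are placeholder stand-ins, not IBM's actual UIMA components — but it shows the core pattern: many independent annotators examine the same text concurrently, and their annotations are merged into one enriched view of the clue.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder annotators -- illustrative stand-ins, not IBM's real components.
# Each one examines the same clue independently and returns its annotations.
def token_annotator(clue):
    return {"tokens": clue.split()}

def entity_annotator(clue):
    # Naive heuristic: treat capitalized words as candidate entities.
    return {"entities": [w for w in clue.split() if w.istitle()]}

def category_annotator(clue):
    return {"is_question": clue.rstrip().endswith("?")}

ANNOTATORS = [token_annotator, entity_annotator, category_annotator]

def annotate(clue):
    """Run every annotator on the clue concurrently and merge the results."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        for result in pool.map(lambda annotator: annotator(clue), ANNOTATORS):
            merged.update(result)
    return merged
```

In the real system, hundreds of such components run across thousands of cores, and each annotation layer feeds the later stages of evidence scoring and hypothesis evaluation.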
Processor-level optimization also helps push the answer through.
Watson leverages 2,880 POWER7 cores in a cluster of 90 servers. Operating at 80 teraflops, the system can sift through 200 million pages of natural language content and produce answers in one to six seconds. The 16 TB of random-access memory stores Watson’s entire repository and serves as the computing equivalent of a fast-responding "brain" that can scan vast amounts of data in fractions of a second.
On top of this infrastructure is the DeepQA strategy. IBM boils its question-answering approach down to four key principles. The first two are obvious: many experts and massive parallelism. The more sources that are available, the greater the chances of finding the answer. In order to access the "experts" quickly, however, a large volume of parallel work—in both hardware and software—must be accomplished.
The actual process of finding the answer relies heavily on the method of questioning, which is why the DeepQA team adopted techniques that in some ways bear an uncanny resemblance to human ways of finding correct answers. Instead of allowing UIMA components to commit to an answer, Watson's elements produce "confidences". Under this approach, called "pervasive confidence estimation", the candidate interpretations are scored, stacked, and combined.
Finally, the designers adhered to a policy of diversity with respect to search methods. A balance of strict and shallow semantics, they felt, is invaluable in arriving at the final answer. With this basis, Watson was able to respond quickly to Trebek's clues by receiving a text version of each clue at the same moment that Brad Rutter and Ken Jennings read it on screen. The system immediately broke down the clue’s components and delivered the answer.
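The score-stacking step lends itself to a small sketch. The weights, scores, and threshold below are hypothetical — in the actual system the combination model was learned from thousands of training questions — but the shape is the same: merge each candidate answer's per-scorer evidence into one calibrated confidence, and buzz in only when the top confidence clears a threshold.

```python
import math

def combine_confidences(scores, weights):
    """Merge per-scorer evidence into a single 0-1 confidence (logistic model)."""
    z = sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))

def choose_answer(candidates, weights, buzz_threshold=0.5):
    """candidates maps answer -> list of evidence scores, one per scorer.
    Returns (best_answer, confidence); best_answer is None when the
    system is not confident enough to buzz in."""
    confidences = {ans: combine_confidences(s, weights)
                   for ans, s in candidates.items()}
    best = max(confidences, key=confidences.get)
    if confidences[best] < buzz_threshold:
        return None, confidences[best]
    return best, confidences[best]
```

A candidate backed by strong evidence from several scorers wins out; when no candidate is convincing, the system stays silent rather than guess — the behavior that kept Watson from buzzing in on clues it could not parse.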
"Watson's success ultimately depends on the relationship between the question being asked and the information available," says Ferrucci. The conceptual space between what an expression literally means and its actual semantic meaning is one that Watson attempts to bridge by breaking the question down into parts.
"If the question has to be broken up into larger parts, there are larger probabilities for error," says Ferrucci.
One question that Watson fumbled was about King Kong, the cinematic gorilla that scaled the Empire State Building. Not only did the system struggle with the fictional context, it was unable to parse the term "APB" for "all-points bulletin". Acronyms and short words can stymie Watson, because those three letters can be expanded to mean many different things.
As part of Watson's development, Ferrucci's team encountered countless such instances of the system failing. Incremental improvements to the annotation scheme drove steady progress. They also helped the team think about what Watson might be capable of in more practical applications. For Ferrucci's dream of a medical expert system, Watson could not only sift through a vast trove of medical information, it could provide multiple treatment recommendations or diagnoses.
"The real value behind it is the generation of evidence that supports the treatment options given. That gives the results real value, and that's what's behind the scenes in 'Jeopardy!'," says Ferrucci.
The drive to build meaning
What’s next for Watson? Less play and more work. "Jeopardy!" was an ideal showcase for Watson's capabilities, but IBM did not spend that much effort on the system without an eye toward practical applications and potential customers.
One of the nation's largest health insurers, WellPoint Inc., will use Watson's speed and health care database to help diagnose medical problems and authorize treatments for its 34.2 million members. The application will be designed to pull data from three different sources: a patient's chart and electronic medical records from doctors or hospitals; the insurance company's history of medicines and treatments; and Watson's library of textbooks and medical journals. As on "Jeopardy!", Watson will sift through the repository of information in seconds. A pilot program is planned to launch next year at several cancer centers, academic medical centers, and oncology practices.
Ferrucci's team is also working on the DARPA Machine Reading Program, which is extending DeepQA to perform deeper understanding of natural language content.
"I feel like with Watson we've reached a milestone in the program. But we have a long way to go," says Ferrucci, and he is realistic about Watson's limitations. People can get confused about artificial intelligence, he says, because they mistakenly believe that computers originate meaning. They can’t. They can predict it, and with inventions like Watson they can now mimic and create a "de facto" meaning. Only humans can originate meaning; computers are there to help.