Monday 21 February 2011

Watson, the computer Behemoth in Jeopardy!

Alex Popescu's excellent blog mentioned the DeepQA project and IBM's supercomputer Watson, following Watson's recent appearance on the US TV show Jeopardy!. Interestingly, DeepQA uses both Apache Hadoop and UIMA to analyse large volumes of documents and build its knowledge base.

As explained in https://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf:
"To preprocess the corpus and create fast run-time indices we used Hadoop. UIMA annotators were easily deployed as mappers in the Hadoop map-reduce framework. Hadoop distributes the content over the cluster to afford high CPU utilization and provides convenient tools for deploying, managing, and monitoring the corpus analysis process."
This is exactly what Behemoth does (how very reassuring!).
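
For readers unfamiliar with the "annotators as mappers" idea in the quote, here is a toy sketch of the pattern in plain Python. It uses neither Hadoop nor UIMA and is not Behemoth's or DeepQA's code: `Document`, `Annotation`, `CapitalisedWordAnnotator`, `annotate_mapper` and `run_map_only_job` are all hypothetical names made up for illustration, and the "cluster" is just a loop over splits of an in-memory dictionary.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed span over the document text (hypothetical, loosely UIMA-like)."""
    type: str
    begin: int
    end: int


@dataclass
class Document:
    """A document plus the annotations accumulated so far."""
    url: str
    text: str
    annotations: list = field(default_factory=list)


class CapitalisedWordAnnotator:
    """Hypothetical annotator: marks every capitalised word as a 'Name' span."""
    pattern = re.compile(r"\b[A-Z][a-z]+\b")

    def process(self, doc):
        for m in self.pattern.finditer(doc.text):
            doc.annotations.append(Annotation("Name", m.start(), m.end()))


def annotate_mapper(key, doc, annotator):
    """Map step: one document in, the same document with annotations out."""
    annotator.process(doc)
    yield key, doc


def run_map_only_job(corpus, annotator, num_splits=2):
    """Stand-in for the framework: split the corpus, then map each split.

    On a real cluster each split would be handled by a different node;
    here the splits are simply processed one after another.
    """
    items = sorted(corpus.items())
    splits = [items[i::num_splits] for i in range(num_splits)]
    output = {}
    for split in splits:
        for key, doc in split:
            for out_key, out_doc in annotate_mapper(key, doc, annotator):
                output[out_key] = out_doc
    return output


corpus = {
    "doc1": Document("doc1", "Watson played Jeopardy against two champions."),
    "doc2": Document("doc2", "the corpus was preprocessed offline."),
}
annotated = run_map_only_job(corpus, CapitalisedWordAnnotator())
```

The point of the sketch is that the map step needs nothing but the document it is given, so there is no reduce step and the work spreads across as many nodes as there are splits.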

The article also mentions UIMA-AS, and it is not entirely clear which part of the system uses which: is UIMA-AS used for the runtime analysis of the questions and Hadoop for the background learning?

It would be interesting to know what sort of UIMA annotators were used internally for the analysis of the text and, more importantly from Behemoth's point of view, whether Behemoth could have been used for this project and/or what features it would have needed in order to work on DeepQA.