Saturday, February 16, 2013

Software Predicts Tomorrow’s News by Analyzing Today's and Yesterday’s

 
 
The other day, while I was searching for news about big data, a story in MIT Technology Review caught my eye. It reports that researchers from Microsoft and the Technion-Israel Institute of Technology have created software that predicts when and where disease outbreaks might occur, based on the past two decades of New York Times articles and other online data.
 
The prediction system performs well, with 70 to 90 percent accuracy in providing warnings of disease, violence and deaths. The article gives one example of a disease warning, involving Angola in 2006. The system warned of possible cholera outbreaks after the country went through droughts that year. According to the historical records, cholera outbreaks are much more likely in the years after a drought.
 
This sounds pretty simple, but a lot of effort went into making it work. Take the case above as an example: to predict cholera outbreaks precisely, the system needs to know in advance things like a city's location, the proportion of its land covered by water, its population density, its GDP and so on. It also has to find the connections between these facts using modern algorithms, and work out general rules for which events tend to follow others.
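To make the idea of "rules for which events follow others" a bit more concrete, here is a tiny Python sketch. It is not the researchers' method, which is far more elaborate. The event log, the place names and the `follow_rate` helper are all made up for illustration. It simply counts how often one kind of event is followed by another in the same place within a given number of years.

```python
from collections import defaultdict

# Hypothetical toy event log: (year, place, event). Invented data, not real records.
EVENTS = [
    (2001, "A", "drought"),
    (2002, "A", "cholera"),
    (2003, "B", "drought"),
    (2004, "B", "cholera"),
    (2005, "C", "drought"),
    (2003, "C", "cholera"),  # happened before the drought in C, so it does not count
    (2006, "D", "storm"),
    (2007, "D", "cholera"),
]


def follow_rate(events, cause, effect, max_gap=1):
    """Fraction of `cause` events that are followed by `effect`
    in the same place within `max_gap` years."""
    # Collect the years in which the effect was seen, per place.
    effect_years = defaultdict(set)
    for year, place, event in events:
        if event == effect:
            effect_years[place].add(year)

    causes = [(year, place) for year, place, event in events if event == cause]
    if not causes:
        return 0.0

    # A cause counts as a hit if the effect occurs strictly after it, within the gap.
    hits = sum(
        any(year < effect_year <= year + max_gap for effect_year in effect_years[place])
        for year, place in causes
    )
    return hits / len(causes)


rate = follow_rate(EVENTS, "drought", "cholera")
print(f"cholera followed drought within a year in {rate:.0%} of cases")
```

In this toy log, two of the three droughts are followed by cholera within a year. A real system would also have to bring in the background facts mentioned above (location, water coverage, population density, GDP) before a rule like this could be trusted.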
 
Besides 22 years of New York Times articles (from 1986 to 2007), the system also draws on data from three main web sources: DBpedia, WordNet and OpenCyc. Each of the three serves a different purpose. DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make it available on the Web. The system uses it to obtain precise information such as “the location of the places in the news articles, how much money people earn there, and even information about politics.” To help the system understand the meaning of the words in the news, the researchers use WordNet. The last one, OpenCyc, provides a database of common knowledge.
 
“Eventually this kind of work will start to have an influence on how things go for people,” says Eric Horvitz of Microsoft Research, who did the research in collaboration with Kira Radinsky, a PhD researcher at the Technion-Israel Institute. However, the system still has to be improved before it can be turned into a real product.
 
News link:
http://www.technologyreview.com/news/510191/software-predicts-tomorrows-news-by-analyzing-todays-and-yesterdays/#comments
 
DBpedia http://dbpedia.org/About
WordNet http://wordnet.princeton.edu/
OpenCyc http://www.cyc.com/platform/opencyc

2 comments:

  1. Shaomao,

    Thank you for sharing. Please edit the post to reduce the use of sentences copied from the original reference.

    Fadel

    1. Dr. Megahed,
      Thanks for your comments.
      I've edited it. Please let me know whether it is acceptable now.
      Thanks

      Shaomao
