Software Predicts Tomorrow’s News by Analyzing Today's and Yesterday’s
The other day, I was searching for news
and information about big data, and one story in MIT Technology Review
caught my eye. It reports that researchers from Microsoft and the Technion-Israel
Institute of Technology created software that predicts when and where disease
outbreaks might occur, based on the past two decades of New York Times articles
and other online data.
The prediction system turns out to perform
well (70-90 percent accuracy) in providing warnings of disease, violence
and deaths. The article gives one example of a disease warning, for Angola
in 2006. The system gave a warning about possible cholera outbreaks after
the country underwent droughts during 2006. According to the records, cholera
outbreaks are much more likely in the years after droughts happen.
It sounds pretty simple at this point; however,
the researchers put in a lot of effort to make it work. Taking the above case as an
example, to predict cholera outbreaks precisely, the system needs to obtain
information about the city's location, proportion of land covered
by water, population density, GDP… in advance. Furthermore, it also needs to capture
the interconnections between them by means of modern algorithms and work out
general rules for what events follow others.
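To make the idea of "general rules for what events follow others" a bit more concrete, here is a minimal Python sketch. It is not the researchers' method, only a toy illustration: the function name `follow_rate`, the event labels, the place names and the timelines are all made up for this post. It simply counts how often one kind of event is followed by another within a few years at the same place.

```python
def follow_rate(timelines, cause, effect, max_gap=2):
    """Fraction of `cause` events followed by `effect` within `max_gap` years.

    `timelines` maps a place name to a list of (year, event_type) tuples.
    This is a hypothetical toy statistic, not the system from the article.
    """
    followed = 0
    total = 0
    for events in timelines.values():
        cause_years = [year for year, kind in events if kind == cause]
        effect_years = {year for year, kind in events if kind == effect}
        for year in cause_years:
            total += 1
            # Look for the effect in any of the next `max_gap` years.
            if any(year + gap in effect_years for gap in range(1, max_gap + 1)):
                followed += 1
    return followed / total if total else 0.0


# Invented toy data (not real records), for illustration only.
toy_timelines = {
    "place_a": [(1990, "drought"), (1991, "cholera"), (1995, "drought")],
    "place_b": [(2000, "drought"), (2002, "cholera")],
    "place_c": [(1998, "storm"), (1999, "cholera")],
}

rate = follow_rate(toy_timelines, "drought", "cholera")
print(f"cholera followed drought in {rate:.0%} of cases")
```

A real system would need far more than this, such as the background facts about each place mentioned above, but the counting step gives a feel for how a "drought, then cholera" pattern could show up in historical data.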
Besides 22 years of New York Times articles (from
1986 to 2007), the system also draws on data from three main websites: DBpedia,
WordNet and OpenCyc. Each of these three websites has its own
function. DBpedia is a crowd-sourced community effort to extract structured
information from Wikipedia and to make this information available on the Web. The
system takes advantage of this website to obtain precise information like “the
location of the places in the news articles, how much money people earn there,
and even information about politics.” To enable the system to understand the meaning
of the words used in the news, the researchers use WordNet. The last one is OpenCyc, which
provides a database of common knowledge.
“Eventually this kind of work will start to
have an influence on how things go for people,” said Horvitz, who did the research in
collaboration with Kira Radinsky, a PhD researcher at the Technion-Israel
Institute. However, the system still has to be improved before it can be turned into
real products.
News links:
http://www.technologyreview.com/news/510191/software-predicts-tomorrows-news-by-analyzing-todays-and-yesterdays/#comments
DBpedia http://dbpedia.org/About
WordNet http://wordnet.princeton.edu/
OpenCyc http://www.cyc.com/platform/opencyc
Shaomao,
Thank you for sharing. Please edit post to reduce the use of same sentences from original reference.
Fadel
Dr.Megahed,
Thanks for your comments.
I've edited it, please let me know if it is qualified or not.
Thanks
Shaomao