Monday, February 25, 2013

Text Mining Twitter

Over the years, many organizations have attempted to rate locations on measures such as education level, income level, or overall happiness.  There are many articles on the best places to retire or to live cheaply.  These lists are generally compiled by data mining relevant sources, such as hospital and insurance data when determining the overall health of a location.  Surveys are also widely used to gauge things that are harder to measure, such as happiness.

Recently, Lewis Mitchell, a mathematician at the University of Vermont, created the “hedonometer” to gauge happiness by location.  By text mining 10 million Twitter feeds from all over the country, Mitchell attempted to determine the overall happiness of each region.  Keywords were established to indicate whether a tweet was happy or sad.
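To make the keyword idea concrete, here is a minimal sketch of how tweets might be scored and averaged by region.  The word lists, the +1/-1 scoring, and the function names are all assumptions made for illustration; they are not the keyword list or the method used in Mitchell's study.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists, chosen only for illustration.
HAPPY_WORDS = {"love", "happy", "great", "awesome"}
SAD_WORDS = {"sad", "hate", "terrible", "awful"}


def tweet_score(text):
    """Count +1 for each happy keyword and -1 for each sad keyword."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in HAPPY_WORDS) - (w in SAD_WORDS) for w in words)


def happiness_by_region(tweets):
    """Average the tweet scores per region.

    `tweets` is an iterable of (region, text) pairs.
    """
    scores = defaultdict(list)
    for region, text in tweets:
        scores[region].append(tweet_score(text))
    return {region: sum(vals) / len(vals) for region, vals in scores.items()}
```

A score above zero suggests a region's tweets lean happy and a score below zero suggests they lean sad.  A real system would need a much larger word list and some handling of negation and sarcasm.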

With the increase in social media users over the last decade, this is an interesting concept that could prove valuable in identifying social issues.  Data mining of social media is already used in some capacity by advertisers on sites like Facebook, where advertisements are targeted to each user by mining status updates and pages viewed.  Unlike surveys, which can be biased or inaccurate for a number of reasons, tweets are generally the honest expression of the user.  This method should lead to more open and honest answers and provide insight into the social status of people in the region.

There are drawbacks to using social media such as Twitter.  Some people feel that Twitter users are only a vocal minority and may not accurately reflect the sentiment of a region.  The author states that only 15% of adults use Twitter, and that the overwhelming demographic is adults ages 18 through 29.  Such a small sample and narrow demographic make it difficult to gauge a population accurately.  However, the study did cross-reference income levels and obesity and found that the results from text mining Twitter correlated positively with income level and negatively with obesity.
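One common way to check a relationship like this is a Pearson correlation between the per-region happiness scores and another per-region measure.  The sketch below is an assumption about how such a cross-reference could be done, not the study's actual procedure, and the function name is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

Given one happiness score and one income figure per region, a result near +1 would mean happier regions tend to have higher incomes, and a result near -1 against obesity rates would mean happier regions tend to have lower obesity.  Correlation alone does not show that one causes the other.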

Below is a visual representation of the results.  For more information and top 10 rankings of saddest and happiest U.S. cities, visit http://www.nbcnews.com/technology/technolog/tweets-reveal-happiest-u-s-cities-1C8502786.

 

1 comment:

  1. Jay,

    Good post. I wish you had related it more to the topic of sentiment analysis that was discussed earlier in the blog and to Brianna's post about other methods to make sense of Twitter data (see http://auburnbigdata.blogspot.com/2013/02/analysis-of-big-data-by-twitter-and.html).

    Fadel
