Thursday, January 31, 2013

Cumulus Clouds



OK, so today’s class (January 31, 2013) seemed, to me, very computer science oriented. I, too, was a little overwhelmed when I first started working with cloud computing. I am very accustomed to working with data stored locally on my computer, so I am well acquainted with plugging in a flash drive, saving the data to some local directory, and opening Excel. This is most familiar to me because, like the rest of you, it was everyday practice growing up. Before we get into cloud computing, let’s think about a few things…

What is the largest flash drive you own?

What is the largest external (or internal) hard drive you are using? 

I’m going to go out on a limb and make a gross assumption that no one here has local file storage larger than 5 terabytes. That is a very large hard drive, which can store a lot of information (for a single person). Now imagine you work for a company that mines (analyzes) Twitter and Instagram. Your company has been hired by the National Football League (NFL) to store all Twitter posts and Instagram photos relating to the playoff games, the commercial spots, and lastly the Super Bowl. The league wants to see how social media is “playing out” during the games. They want to use this social media data to raise the prices of commercial airtime in the future.

All of that data will NOT fit on one, two, or even three computers, so the question becomes: where do you store it all? Your company could spend a tremendous amount of money buying hard drives to hold everything, but that could be very costly if your sales team does not have another great lead on a job. Your data storage should be flexible given the demand you might have. Now, you remember a class you took on data mining in college and recognize a potential solution to the problem! You pitch the idea of cloud computing to your superior, and because of this, your company invests in cloud storage. This allows you to buy space as it is needed.

Imagine the NFL comes to your office the week after the Super Bowl and says that they want to know how often a particular word or combination of words was used. The league wants to show the power of advertising. How can you do this?

*Yikes! In the past, we could easily pull up Excel, but this data is so large that Excel will not help, and it lives on the Amazon cloud. This is where Python code comes into play. By running a few simple commands, you can extract valuable information about what is going on in the data.
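To make this concrete, here is a minimal sketch of pulling such a file down from Amazon’s cloud (S3) using the boto3 library. The library choice, the bucket name nfl-social-media, and the file name tweets.txt are my own assumptions for illustration, not what we actually used in class.

    import boto3  # Amazon's official Python SDK (an assumption; other tools work too)

    # Connect to Amazon S3 and download the (hypothetical) tweet archive
    # to a local file so the word-count script further down can read it.
    s3 = boto3.client("s3")
    s3.download_file("nfl-social-media", "superbowl/tweets.txt", "tweets.txt")
    print("Downloaded tweets.txt")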

In class today we looked at word count. Let’s put this idea to use:
1. We have an extremely large file on the Amazon Cloud.
2. We wish to examine the word counts to see how often certain words are used. (**In the NFL example, this could be touchdown, 49ers, Ravens, and the list goes on.)
3. We can use a simple Python script, which is easy to find with an ordinary Google search (a sketch follows this list).
4. Once we have run the job and saved the results to an output file, we can use “Orange,” “RapidMiner,” or even Excel to visualize the results.
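Here is a minimal word-count sketch of the kind of script step 3 refers to. The input and output file names (tweets.txt and word_counts.txt) are placeholders of my own choosing, and this is just one reasonable way to write it, not the exact code from class.

    import re
    from collections import Counter

    counts = Counter()
    with open("tweets.txt", encoding="utf-8") as f:
        for line in f:
            # Lowercase and split on letters/digits only, so "Touchdown!"
            # and "touchdown" are counted as the same word and "49ers" stays whole.
            counts.update(re.findall(r"[a-z0-9']+", line.lower()))

    # Write a tab-separated file, most frequent words first, that Excel,
    # Orange, or RapidMiner can open directly.
    with open("word_counts.txt", "w", encoding="utf-8") as out:
        for word, n in counts.most_common():
            out.write("%s\t%d\n" % (word, n))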
In the project we outlined in class, we end up with a text file of word counts. Simply opening this in Excel and sorting shows that the most commonly used word is “the,” mentioned 31 times, followed by the word “to.” As you can see, data mining can be quite easy and informative. The data we mine, though, might be extremely large, leaving us unable to perform such tasks on our local hard drives. This is where the advantages of cloud computing come to fruition.
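And to answer the league’s specific question about particular words, a short follow-up sketch (again with hypothetical file and word choices) can look up individual terms in that same output file:

    # Load the word-count output and report a few words the NFL might care about.
    # The terms listed here are only examples; the earlier script lowercased everything.
    counts = {}
    with open("word_counts.txt", encoding="utf-8") as f:
        for line in f:
            word, n = line.rstrip("\n").split("\t")
            counts[word] = int(n)

    for term in ("touchdown", "49ers", "ravens"):
        print(term, counts.get(term, 0))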


