OK, so today’s class (January 31, 2013) seemed to me to be
very computer science oriented. I, too, was a little overwhelmed at first when I
started working with cloud computing. I am accustomed to working with data
stored locally on my computer, so I am well acquainted with plugging
in a flash drive, saving the data to some local directory, and opening Excel. This
is most familiar to me because, like the rest of you, it was everyday
practice growing up. Before we get into cloud computing, let’s think about a few
things…
What is the largest flash drive you own?
What is the largest external (or internal) hard drive you are using?
I’m going to go out on a limb and make a broad
assumption that no one has local file storage larger than 5
terabytes. That is a very large hard drive, which can store lots of information
(for a single person). Now imagine you work for a company that mines (analyzes)
Twitter and Instagram. Your company has been hired by the National Football
League (NFL) to store all Twitter posts and Instagram photos relating to the
playoff games, the commercial spots, and lastly the Super Bowl. The league wants to
see how social media is “playing out” during the games. It wants to use this
social media data to justify higher prices for commercial airtime in the
future.
All of that data will NOT fit on one, two, or even three
computers, so the question becomes: where do you store all of this data? Well,
your company could spend a tremendous amount of money on lots of hard drives
to store all of this data, but this could be very costly if your sales team does
not have another big job lined up afterward. Your data storage should be flexible
enough to match the demand you have. Now,
you remember a class you took on data mining in college and recognize a
potential solution to the problem! You
introduce the idea of cloud computing to your superior, and because of this
your company invests in cloud storage. This allows you to buy space as it is
needed.
Imagine the NFL comes to your office the week after the
Super Bowl and says that they want to know how often a particular word or combination
of words was used. The league wants to show the power of advertising. How can
you do this?
Yikes! In the past, we could easily pull up Excel, but the
data is too large for Excel to handle, and it is located on the Amazon
Cloud. This is where Python code comes into play. With a few
simple commands, you can extract valuable information about what is going on
in the data.
In class today we looked at word count. Let’s put this idea
to use:
1. We have an extremely large file on the Amazon Cloud.
2. We wish to examine the word count to show us how often certain words are
used. (In the NFL example, these could be touchdown, 49ers, Ravens, and the
list goes on.)
3. We can use a simple Python script, found with an ordinary Google search.
4. Once we have run the job and saved the results in an output file, we can
then use “Orange,” “RapidMiner,” and even Excel to visualize the results.
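To make the steps above a little more concrete, here is a minimal sketch of a word count in Python. It is not the script we used in class; the function names (count_words, write_counts), the tab-separated output format, and the sample posts are all made up for illustration. It reads text one line at a time, so in principle the same idea works on a file too large to open in Excel.

```python
import re
from collections import Counter


def count_words(lines):
    """Count case-insensitive word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Lowercase the line, then pull out runs of letters, digits, and apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", line.lower()))
    return counts


def write_counts(counts, path):
    """Write one 'word<TAB>count' row per word, most frequent first (assumed format)."""
    with open(path, "w", encoding="utf-8") as out:
        for word, n in counts.most_common():
            out.write(f"{word}\t{n}\n")


# Hypothetical sample posts standing in for the real (much larger) data set.
sample = [
    "Touchdown Ravens!",
    "What a touchdown by the 49ers",
    "the Ravens win",
]
counts = count_words(sample)
print(counts["touchdown"], counts["ravens"], counts["49ers"])  # prints: 2 2 1
```

For a real file you could pass an open file object instead of the sample list, for example count_words(open("posts.txt", encoding="utf-8")), where posts.txt is a placeholder name.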
In the project we outlined in class, we have a text file
that shows word counts. Simply opening this in Excel and sorting shows that the
most commonly used word is “the,” which appears 31 times,
followed by “to.” As you can see, data mining can be quite easy and
informative. The data we mine, though, might be extremely large, and we may be unable
to perform such tasks using our local hard drives. This is where the advantages
of cloud computing come to fruition.
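If you would rather not open Excel at all, the same sort can be sketched in a few lines of Python. This assumes the output file has one word and its count per line, separated by a tab; that layout, the function name top_words, and every count in the sample except the 31 for “the” are assumptions for illustration.

```python
import csv
import io


def top_words(tsv_file, n=5):
    """Return the n most frequent (word, count) pairs from tab-separated rows."""
    rows = []
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        rows.append((row[0], int(row[1])))
    # Sort by count, largest first, like sorting the column in Excel.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


# Stand-in for the output file; only the count for "the" comes from the class project.
sample_output = io.StringIO("to\t20\nthe\t31\nand\t12\n")
print(top_words(sample_output, n=2))  # prints: [('the', 31), ('to', 20)]
```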