Cloudera: Interview with Mike Olsen
Over the past couple of weeks, we
have talked about the basics of MapReduce and Hadoop. We had a guest lecturer on Tuesday
who also went over these ideas and mentioned “Cloudera” in passing. I probably
wouldn’t have even noticed, but I had already done a little
research on this company. Last week, as I was trying to keep up with what
was going on in class, I came across a YouTube video of an interview
with the CEO of Cloudera, Mike Olsen. The video is pretty helpful if you need
another perspective about how all of this works in general. In fact, its title
is “What is Hadoop? Other big data terms like MapReduce? Cloudera's CEO talks
us through big data trends”. The video might seem kind of long, but it isn’t too
difficult to understand. Olsen definitely knows his stuff and is excited to
share what he knows.
I think it is important to see what
kind of involvement companies have in open-source software so that we are able
to utilize the software better ourselves. Olsen gives a little insight into what
his company does, which is an interesting take on the big data “scene”.
Brief Overview:
Olsen discusses Google’s need to
handle large amounts of data and how that led to the MapReduce programming model. These
days so much data is machine-generated that it is hard to keep up without
large-scale distributed data storage. The complex data that turns up in data
mining doesn’t fit nicely into a table. New data platforms are needed to do new
kinds of data processing for these “nonconforming” types of data. Complex data
at volume is coming to all industries, not only computer software companies.
Hadoop is an open source
implementation of MapReduce. It spreads out the data and pushes the code down to
the data so that analyses can be run in parallel. Hadoop is flexible and allows
for more data to be handled by adding new machines to the cluster. Data
centers are starting to choose technology that is aimed at specific problems, because
not every problem fits the same mold.
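To make the "split the data, run the code next to it, then combine" idea concrete for myself, I wrote a tiny word-count sketch in plain Python. This is not Hadoop's API and nothing here comes from the interview; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample sentences are my own assumptions, and everything runs on one machine. In a real cluster, each chunk would live on a different node and the map step would run there.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk is mapped independently of the others. That independence is
    # what would let a cluster run the map step on many machines in parallel.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: two chunks standing in for data stored on two nodes.
chunks = [
    ["big data is big"],
    ["data is everywhere"],
]
counts = word_count(chunks)
print(counts)
```

The map step never needs to see the other chunk, so only the small (word, count) pairs would have to travel between machines, not the raw data.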
Open source technologies are taking
over the planet. Cloudera is a business that takes full advantage of open
source infrastructure. Programmers and systems people are all thinking about
storing and using data in new ways with new tools. Cloudera is now able to
sell its services because of the acceptance of new tools, such as Hadoop. It’s
unbelievably easy these days to have clusters that run on rack space clouds. It
makes data analytics much easier to be able to essentially buy computing space
online instead of having to own dozens of computing machines.
Cloudera contributes to the open
source project and views itself as a member of a great global community doing
innovative work. Hadoop is an Apache project, and all the IP goes to the Apache Software
Foundation. Cloudera packages the open source core so that companies can easily use
it and builds products that complement the open source core. It also provides training,
consulting, and technical support to companies that use Hadoop. “Ordinary”
companies that are not necessarily computer whizzes are able to use this
software to its potential to solve the problems that they have.
Olsen says, “I don’t actually like
the term ‘Big Data’ because if I go talk to companies about it, they actually
say, ‘I just have medium-sized data’.” It is easy to get a lot of data these
days, unlike in the past when it was hard to accumulate data. Software such as
Hadoop can do valuable processing even on smaller volumes of data, and that can be used to solve
real problems.
There are videos on Cloudera’s
website that help with training on Hadoop. The videos go into more detail
about Hadoop than the interview above. One of the videos in
particular is called “Thinking at Scale” and is worth checking out.
Brianna,
Thanks for sharing. See http://www.networkworld.com/news/2013/012313-hadoop-cloud-266062.html for some related news.
Fadel