Friday, February 1, 2013

Cloudera - Interview with Mike Olson


Over the past couple of weeks, we have talked about the basics of MapReduce and Hadoop. A guest lecturer on Tuesday also went over these ideas and mentioned “Cloudera” in passing. I probably wouldn’t have noticed, except that I had already done a little research on the company. Last week, while trying to keep up with what was going on in class, I came across a YouTube interview with the CEO of Cloudera, Mike Olson. The video is pretty helpful if you need another perspective on how all of this works in general. Its title is “What is Hadoop? Other big data terms like MapReduce? Cloudera's CEO talks us through big data trends”. The video might seem kind of long, but it isn’t too difficult to understand. Olson definitely knows his stuff and is excited to share what he knows.

I think it is important to see how companies are involved in open-source software so that we can make better use of the software ourselves. Olson gives a little insight into what his company does, which is an interesting take on the big data “scene”.



Brief Overview:

Olson discusses Google’s need to handle large amounts of data and how that led to MapReduce. These days so much data is machine-generated that it is hard to keep up without large-scale distributed storage. The complex data that turns up in data mining doesn’t fit nicely into a table, so new data platforms are needed to process these “nonconforming” types of data. Complex data at volume is coming to all industries, not only computer software companies.

Hadoop is an open source implementation of MapReduce. It spreads the data out across many machines and pushes the code down to the data so that analyses can run in parallel. Hadoop is flexible and can take on more data as new machines are added to the cluster. Data centers are starting to choose technology that is aimed at specific problems, because not every problem fits the same mold.
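To make the “push the code to the data” idea a little more concrete, here is a minimal word-count sketch in Python. This is only a toy that runs in memory on one machine, not Hadoop's actual API: the function names (`map_partition`, `reduce_counts`, `word_count`) are my own, and each inner list stands in for a block of data that would live on a separate machine in a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(lines):
    """Map step: count the words inside one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(partitions):
    # Assumption: worker threads stand in for cluster nodes here.
    # The same map function is sent to every partition, and only the
    # small partial counts travel back to be merged.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(reduce_counts, partials, Counter())


# Two made-up partitions of text, as if stored on two different machines.
partitions = [
    ["big data is big"],
    ["data at scale", "scale out"],
]
totals = word_count(partitions)
print(totals.most_common())
```

The point of the sketch is that the map step never needs to see the whole dataset, which is why adding more partitions (and more machines to hold them) lets the same analysis keep running in parallel.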

Open source technologies are taking over the planet, and Cloudera is a business that takes full advantage of open source infrastructure. Programmers and systems people are all thinking about storing and using data in new ways with new tools. Cloudera is able to sell its services now because new tools such as Hadoop have been accepted. It is remarkably easy these days to run clusters on Rackspace clouds. Being able to essentially rent computing space online, instead of having to own dozens of machines, makes data analytics much easier.

Cloudera contributes to the open source project and views itself as a member of a great global community doing innovative work. Hadoop is an Apache project, and all the IP goes to the Apache Software Foundation. Cloudera packages the open source core so that companies can easily use it, and builds products that complement that core. It also provides training, consulting, and technical support to companies that use Hadoop. This lets “ordinary” companies that are not necessarily computer whizzes use the software to its potential to solve the problems they have.

Olson says, “I don’t actually like the term ‘Big Data’ because if I go talk to companies about it, they actually say, ‘I just have medium-sized data’.” It is easy to collect a lot of data these days, unlike in the past, when accumulating data was hard. Software such as Hadoop does valuable processing even at smaller volumes, and that processing can be used to solve real problems.

There are videos on Cloudera’s website that help with training on Hadoop. They go into more detail about Hadoop than the interview above. One of them in particular, “Thinking at Scale”, is worth checking out.


2 comments:

  1. Brianna,

    Thanks for sharing. See http://www.networkworld.com/news/2013/012313-hadoop-cloud-266062.html for some related news.

    Fadel
