Sunday, February 24, 2013

Is Hadoop old news?


We have heard about how Target and other companies use Big Data analytics to pinpoint their customers’ interests. Web services used worldwide have to be sure to provide quick results, or else the customers will bail on the site. That is exactly the motivation of one company as they expand in the Big Data community.

A large German travel company needed help providing quick answers. By quick, the agency meant an answer to the customer in a second or less, because that is how long they figured they had before the customer decided to go to another site. Intuitively, as more time passes, more customers are lost. At the time the company started looking for help with their site, the fastest response they could achieve was 6.5 seconds on 200 million records. At 6.5 seconds, the company was operating far too slowly, not to mention on too many machines. Big Data solutions such as Hadoop, columnar database technology, Oracle, and FAST from Microsoft were not cutting it. Hadoop, which we keep hearing about as the latest and greatest, wasn’t doing the job!

The travel company decided to build their own method of data processing because they couldn’t afford all of the machines needed to run the other systems. The new method started with data structures, algorithms, indexing, and continuous loading of new data. The company that came to represent this product is ParStream. With ParStream’s advancements, the travel company is now able to handle 1,000 queries per second and rummage through 18 billion offers with 20 parameters, all while giving a response within the desired one second. This is achieved by CPUs combined with Nvidia’s Fermi GPU processors. ParStream technology allows for the same amount of processing with a fraction of the machines.
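To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above; the "effective rate" (records covered per second of response time) is my own illustrative metric, not something ParStream or the travel company reports, and it assumes the two workloads are comparable.

```python
# Figures quoted in the post. The "effective rate" below is a derived,
# illustrative metric (an assumption), not a number published by ParStream.
BEFORE_RECORDS = 200_000_000       # records searched by the earlier setup
BEFORE_SECONDS = 6.5               # fastest response time at that point
AFTER_RECORDS = 18_000_000_000     # offers searched with ParStream
AFTER_SECONDS = 1.0                # response-time target


def effective_rate(records: int, seconds: float) -> float:
    """Records covered per second of response time for a single query."""
    return records / seconds


before = effective_rate(BEFORE_RECORDS, BEFORE_SECONDS)
after = effective_rate(AFTER_RECORDS, AFTER_SECONDS)
speedup = after / before

print(f"before: {before:,.0f} records/s")
print(f"after:  {after:,.0f} records/s")
print(f"ratio:  {speedup:.0f}x per query")
```

By this measure, a single query covers roughly 585 times more records per second than before, and that is without counting the 1,000 queries being answered every second. Keep in mind that with indexing the system probably does not touch every record on every query, so this is a comparison of what the customer sees, not of raw scan speed.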

Michael Hummel, who has been involved as a manager along the way, stated, “Nobody wants to wait for results. Most people think big data is billions of records, but static. That is completely wrong. Big data is dynamic. New data is created every second and you have to take this new data and process it together with historical data.”

Hummel, like most consumers, believes that faster is better. Something that can keep up with real-time changes in data is especially innovative, and that is what ParStream is able to do. Data changes constantly, especially on the internet, and this technology is able to keep up with those changes, unlike MapReduce technology.

MapReduce has been a familiar term since Google popularized it in 2005, but now even Google is saying that MapReduce isn’t quite up to par. In fact, Google is now using Caffeine and Dremel for Big Data analytics. Ironic, right? This makes for an interesting perspective on Hadoop. People are obviously still using MapReduce and Hadoop, but from the sound of it, ParStream has a leg up on them. ParStream claims “ParStream has already been used to replace Hadoop clusters, with a better efficiency ratio of 10 to 20 (less nodes) and a better effectiveness ratio of the same order of magnitude (in query response time).” While ParStream would not be able to replace all the functions of MapReduce, it is a Big Data analytics tool to keep our ears open for. Will “ParStream” become a household term, or will something else in the Big Data world be bigger?


***Disregard that this is an ad for AT&T and that these commercials are on TV all the time, but take it from this girl that faster is better.






1 comment:

  1. Brianna,

    As you know, this field has a very short memory, and if you get out-innovated, you are done. There will continue to be advancements in big data analytics, and hopefully we can make not only faster decisions but also smarter ones using the existing platforms.

    Big data is definitely not only about the volume and the velocity of data; there are other dimensions that make such problems more challenging (see http://www.sas.com/big-data/ for a nice introductory discussion on these dimensions).

    Fadel
