Sunday, March 24, 2013

Hadoop for Database Analysis


Now that Hadoop has revolutionized the way web services analyze data, some companies are setting their sights on making it perform more like traditional database software. Hadoop excels at handling large amounts of less relational data that database software struggles with, but it takes significantly longer to answer a query against that data. A straightforward query might take a few minutes in Hadoop, whereas a relational database would answer it in a few seconds. To remedy this, multiple startups are building structured query language (SQL) support into Hadoop, with significantly shorter processing times as a major feature. One major company attacking this problem is Greenplum, which recently announced its own version of Hadoop, Pivotal HD, which improves performance and accepts SQL queries. Database giant Oracle also sells its own version of Hadoop. Cloudera, a startup co-founded by the former Facebook employee who chose Hadoop as Facebook’s preeminent data management and analysis system, has built SQL querying into Hadoop as well. Cloudera’s implementation of SQL queries within Hadoop is significantly faster than Hadoop itself, and Greenplum’s is faster still, but they do have some weaknesses when compared to Hadoop. In Pivotal HD, if a machine fails during a query, the query stops and must be restarted from the beginning, a problem that Hadoop does not share. This could create problems on larger clusters, where machine failures are more frequent. These improvements do showcase a big positive of Hadoop’s open source architecture, as they would not be possible with a closed source system.
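To get a rough feel for why restarting a whole query hurts more as a cluster grows, here is a small back-of-the-envelope sketch in Python. It is not taken from the Wired article or from any vendor's code. It assumes machines fail independently, that each machine has the same chance of failing during one run, and that a rerun is just as likely to fail as the first attempt. The 0.1% failure rate and the function names are made up for illustration.

```python
def full_restart_overhead(machines: int, failure_rate: float) -> float:
    """Expected number of times the whole query runs when any single
    machine failure forces a restart from the beginning.

    A run succeeds only if every machine survives, so the success
    probability is (1 - failure_rate) ** machines, and the expected
    number of attempts is the reciprocal of that probability.
    """
    success = (1.0 - failure_rate) ** machines
    return 1.0 / success


def task_retry_overhead(failure_rate: float) -> float:
    """Expected number of times each task runs when only the failed
    task is rerun. This does not depend on the cluster size."""
    return 1.0 / (1.0 - failure_rate)


# Hypothetical chance that one machine fails during a single run.
FAILURE_RATE = 0.001

for machines in (10, 100, 1000, 5000):
    restart = full_restart_overhead(machines, FAILURE_RATE)
    retry = task_retry_overhead(FAILURE_RATE)
    print(f"{machines:>5} machines: full restart x{restart:.2f}, "
          f"per-task retry x{retry:.3f}")
```

Under these assumptions the per-task retry cost stays flat, while the full-restart cost grows exponentially with the number of machines. Real clusters are messier than this, so treat the numbers as an illustration of the trend rather than a prediction.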


Source: http://www.wired.com/wiredenterprise/2013/02/pivotal-hd-greenplum-emc/
