Sunday, March 31, 2013

Crunching Big Data with Google BigQuery





Ryan Boyd, a developer advocate at Google who focuses on Google BigQuery, presents the first part of this video. In his five years at Google, he has helped build the Google Apps ISV ecosystem. Tomer Shiran, the director of product management at MapR and a founding member of Apache Drill, presents the second part.

Developers have to deal with many kinds of data in very large volumes. Without good analysis software and useful analysis methods, they spend a lot of time collecting large amounts of data and then throw away the data that seems "worthless", even though most of the time that data has potential value of its own. Google knows Big Data well, because every minute countless users are using Google products such as YouTube, Google Search, Google+ and Gmail. With that much data, Google began to offer APIs and other technologies so that developers can focus on their own fields.

Google BigQuery is the externally available service built on Dremel, Google's internal technology for Big Data analysis. Of Apache Drill, Wikipedia says: “Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open source version of Google's Dremel system which is available as an infrastructure service called Google BigQuery. One explicitly stated design goal is that Drill is able to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. Currently, Drill is incubating at Apache.” Apache Drill aims to let users query terabytes of data in seconds and supports the Protocol Buffers, Avro and JSON data formats. As data sources, it can use Hadoop and HBase.

MapR Technologies provides an open, enterprise-grade distribution for Hadoop that is easy, dependable and fast to use, and that is open source with standards-based extensions. MapR is deployed at thousands of companies, from small Internet startups to the world's largest enterprises. MapR customers analyze massive amounts of data, including hundreds of billions of events daily, data from ninety percent of the world's Internet population monthly and data from one trillion dollars in retail purchases annually.
MapR has partnered with Google to provide Hadoop on Google Compute Engine, and it presents itself as a leading Hadoop innovator with strong Big Data processing capabilities.

Drill's execution engine has two layers: an operator layer and an execution layer. The operator layer is serialization-aware and processes individual records. The execution layer is not serialization-aware; it processes batches of records and is responsible for communication, dependencies and fault tolerance.
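
To make the two-layer split of Drill's execution engine more concrete, here is a minimal Python sketch. It is not Drill's code or API: every name in it (`encode_batch`, `decode_batch`, `filter_operator`, `project_operator`, `execute`) is hypothetical, JSON lines are assumed as the batch format only for illustration, and fault tolerance is reduced to a simple retry. It only shows the idea that operators understand the record format while the execution layer moves opaque batches around.

```python
import json
from typing import Callable, Iterable, List

# Hypothetical sketch, not Drill's actual API.
# To the execution layer a batch is just opaque bytes.
Batch = bytes
Operator = Callable[[Batch], Batch]


def encode_batch(records: List[dict]) -> Batch:
    """Serialize records into one batch (JSON lines, assumed for illustration)."""
    return "\n".join(json.dumps(r) for r in records).encode("utf-8")


def decode_batch(batch: Batch) -> List[dict]:
    """Deserialize a batch back into individual records."""
    return [json.loads(line) for line in batch.decode("utf-8").splitlines() if line]


# Operator layer: serialization-aware, works on individual records.
def filter_operator(predicate: Callable[[dict], bool]) -> Operator:
    def run(batch: Batch) -> Batch:
        return encode_batch([r for r in decode_batch(batch) if predicate(r)])
    return run


def project_operator(fields: List[str]) -> Operator:
    def run(batch: Batch) -> Batch:
        return encode_batch([{f: r.get(f) for f in fields} for r in decode_batch(batch)])
    return run


# Execution layer: never looks inside a batch. It only passes batches
# through the operators in dependency order and retries failed steps.
def execute(batches: Iterable[Batch], operators: List[Operator],
            max_retries: int = 2) -> List[Batch]:
    results = []
    for batch in batches:
        for op in operators:
            for attempt in range(max_retries + 1):
                try:
                    batch = op(batch)
                    break
                except Exception:
                    if attempt == max_retries:
                        raise  # give up after the last retry
        results.append(batch)
    return results
```

A caller would build batches with `encode_batch`, pass them to `execute` together with a list of operators such as `filter_operator(lambda r: r["bytes"] > 100)` and `project_operator(["user"])`, and decode the returned batches with `decode_batch`. Because `execute` treats each batch as bytes, the record format could change without touching the execution layer, which is the point of keeping that layer serialization-unaware.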

Sources:
http://en.wikipedia.org/wiki/Apache_Drill
