Ryan Boyd, a developer advocate at Google who focuses on Google BigQuery,
presents the first part of this video. In his five years at Google, he has
also helped build the Google Apps ISV ecosystem. Tomer Shiran, director of
product management at MapR and a founding member of Apache Drill, presents
the second part.
Developers face many kinds of data in very large volumes. Without good
analysis software and useful analysis methods, they spend a lot of time
collecting large amounts of data and then throw away the data that seems
“worthless”, even though most of the time that data has potential value of
its own. Google knows Big Data well: every minute, countless users are using
Google products such as YouTube, Google Search, Google+ or Gmail. With data
at that scale, Google has started using APIs and other technologies so that
developers can focus on their own fields. Google BigQuery is built on Dremel,
a Google-internal technology for Big Data analysis. As for Apache Drill,
Wikipedia says: “Apache
Drill is an open-source software framework that supports data-intensive
distributed applications for interactive analysis of large-scale datasets.
Drill is the open source version of Google's Dremel system which is available
as an infrastructure service called Google BigQuery. One explicitly stated
design goal is that Drill is able to scale to 10,000 servers or more and to be able
to process petabytes of data and trillions of records in seconds. Currently,
Drill is incubating at Apache.” Apache Drill lets users query terabytes of
data in seconds. It supports the Protocol Buffers, Avro and JSON data formats,
and it can use Hadoop and HBase as data sources.

MapR Technologies provides an open, enterprise-grade distribution for Hadoop
that is easy, dependable and fast to use, and that is open source with
standards-based extensions. MapR is deployed at thousands of companies, from
small Internet startups to the world’s largest enterprises. MapR customers
analyze massive amounts of data, including hundreds of billions of events
daily, data from ninety percent of the world’s Internet population monthly
and data from one trillion dollars in retail purchases annually. MapR has
partnered with Google to provide Hadoop on
Google Compute Engine. MapR offers strong Big Data processing capabilities
and is a leading Hadoop innovator.

The Drill execution engine has two layers: the operator layer and the
execution layer. The operator layer is serialization-aware and processes
individual records. The execution layer is not serialization-aware; it
processes batches of records and is responsible for communication,
dependencies and fault tolerance.
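
To make the two-layer split of the Drill execution engine easier to picture,
here is a minimal Python sketch. It is not Drill's code or API (Drill itself
is not written in Python); the names filter_operator and run_pipeline are
hypothetical, and newline-delimited JSON is only an assumed batch encoding
chosen for illustration. The operator knows how to decode records, while the
execution layer only moves opaque batches between operators and retries on
failure.

```python
import json
from typing import Callable, Iterable, List

Operator = Callable[[bytes], bytes]


def filter_operator(batch: bytes, predicate: Callable[[dict], bool]) -> bytes:
    """Operator layer (hypothetical): serialization-aware, record by record.

    Decodes a batch of newline-delimited JSON, keeps the records that match
    the predicate, and re-encodes the survivors.
    """
    records = [json.loads(line) for line in batch.splitlines() if line]
    kept = [record for record in records if predicate(record)]
    return "\n".join(json.dumps(record) for record in kept).encode("utf-8")


def run_pipeline(
    batches: Iterable[bytes],
    operators: List[Operator],
    max_retries: int = 2,
) -> List[bytes]:
    """Execution layer (hypothetical): never looks inside a batch.

    It only passes opaque bytes from one operator to the next (dependencies)
    and retries a failed operator call (a very rough stand-in for fault
    tolerance).
    """
    results = []
    for batch in batches:
        for operator in operators:
            for attempt in range(max_retries + 1):
                try:
                    batch = operator(batch)
                    break
                except Exception:
                    # Give up only after the last allowed attempt.
                    if attempt == max_retries:
                        raise
        results.append(batch)
    return results


if __name__ == "__main__":
    sample_batches = [
        b'{"user": "a", "clicks": 3}\n{"user": "b", "clicks": 12}',
        b'{"user": "c", "clicks": 20}',
    ]
    heavy_users = lambda b: filter_operator(b, lambda r: r["clicks"] > 10)
    for out in run_pipeline(sample_batches, [heavy_users]):
        print(out.decode("utf-8"))
```

Because run_pipeline only ever sees bytes, a different record format (for
example Avro or Protocol Buffers) would only require a different operator,
which is the point of keeping the execution layer unaware of serialization.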
Sources:
http://en.wikipedia.org/wiki/Apache_Drill