Facebook 're-invents' the wheel
We
have discussed Hadoop in our data mining class. Although we did not extensively
work on Hadoop in class, it has numerous applications and can be applied to
various domains in data mining. Hadoop is open source software which makes it
really easy to implement in various platforms.
It enables the distributed processing of
large data sets across clusters of servers. A significant feature of this software
is to detect the failures at an application layer than relying on hardware. The
following few lines summarizes an article which shows that implementation of
Hadoop platform, Facebook will make better use of clusters than MapReduce.
Facebook has used source code for scheduling workloads on Hadoop platform which
they claim to be their own. This also known as ‘Corona’ has achieved superior
results. The Corona scheduler was able to utilize 95 % of a cluster to work on
jobs compared to a 70 % utilization using MapReduce. Facebook’s operations and
users generate more than half a petabyte of data each day. Corona offers a number of additional benefits as well,
including faster loading of workloads and a more flexible way of upgrading the
software. Corona took around 55 seconds to fill an empty
workspace, which is a 17% improvement from MapReduce. Typically,
analysis jobs running on Hadoop are scheduled through the MapReduce framework,
which breaks jobs into multiple parts so they can be executed across many computers in parallel.
The Corona software used overcomes the fact that the cluster
utilization will not drop during peak scheduling overloads. The company is now
deploying Corona for all workloads. Some of the limitations of MapReduce are
that the software used to delay the jobs before executing them. This does not
work very well with the Facebook interface as it relies on quick scheduling and
any downtime will raise havoc and generate negative publicity. Corona is a software
which will scale more easily and make better use of clusters. It would offer lower latency for smaller jobs and could be
upgraded without disrupting the system. Corona was
developed in house with the Facebook team and they evaluated some other alternatives
such as Yarn, which is Apache’s overhaul of MapReduce. But they were unsure
Yarn could process clusters with a social networking application which
processes petabytes of data.
Corona will be run on non-MapReduce jobs, therefore making a
Hadoop cluster more of a general-purpose parallel computing cluster. Facebook
is also trying to incorporate online upgrades, which would mean a cluster
doesn’t have to come down every time part of the management layer undergoes an
update.
Facebook is extremely confident that they will make this in house product to successfully work. They have officially posted a link on the facebook website explaining the the scheduling of MapReduce jobs and the limitations they have overcome by implementing 'Corona'.
http://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
Facebook is extremely confident that they will make this in house product to successfully work. They have officially posted a link on the facebook website explaining the the scheduling of MapReduce jobs and the limitations they have overcome by implementing 'Corona'.
http://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
Reference: By Joab Jackson | IDG News Service
:http://www.infoworld.com/d/business-intelligence/facebook-boosts-hadoop-scheduling-muscle-206731
:http://www.infoworld.com/d/business-intelligence/facebook-boosts-hadoop-scheduling-muscle-206731
The reason for delay and overload are mainly because, what Facebook was doing involved taking a job tracker program along with many task trackers and executing them so that it processes the data. The role of the job tracker was to manage the cluster resources and schedule all the user jobs, thus funneling it all to the individual task tracker programs. But as the number of jobs increased, the job tracker program just couldn't handle its responsibilities adequately enough for Facebook’s needs, the company said that its cluster utilization would drop because of the overload.
ReplyDeleteOther frustrations that it found with MapReduce included the fixed “slot-based resource management model” which divides the cluster into a fixed number of map and reduce slots designed by a specific configuration, it felt this was inefficient because slots become wasted anytime the cluster workload doesn't match up with the configuration. Also, when software upgrades needed to happen, Facebook found that all running jobs needed to be “killed” or cease to operate, resulting in significant downtime.
Sam,
DeletePlease provide references. Also, see my comment to Prashant below.
Fadel
Reference: http://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
DeletePrashant,
ReplyDeleteCan you please provide the URL for the reference? Also, are there any other articles that support this view? I am assuming that this is a controversial topic so it would be good to see both sides of the story :)
Fadel
Thanks for the feedback on this post. It is a new implementation by using their own in house project. Facebook's data warehouse has grown over 2500 times in the past 4-5 years and they are confident that this implementation will result in more efficient processing.
ReplyDeleteFacebook will be making upgrades to Corona as time goes by, since it is now a core part of the company’s infrastructure. This was published in November 2012. It is interesting to see how this turns out as it might not have been fully converted to scheduling with Corona. It could possibly solve their data issues on a 'Facebook' scale. I have not found any adverse comments to this implementation. We could possibly hear about some trouble they faced with 'Corona' in the following few months. This is an interesting piece and I will follow up on it and post if I get any articles hearing the downside of 'Corona'. For now all seems to be good with their latest in-house innovation:)
Thanks
Prashant