Tuesday, April 2, 2013

How Netflix Recommendations Are Made



Netflix uses a wide array of Big Data techniques to generate its above-average recommendations. Machine-learning algorithms are used heavily, before or after almost every other step in generating recommendations. This focus matters because it raises significant processing issues. With online processing, user interactions are answered rapidly, but the amount of data that can be processed and the computational complexity of the processing are limited. Offline processing relaxes both of these limits, but it lowers responsiveness and increases the likelihood that data becomes outdated during processing. Nearline processing is a middle ground: it runs like online processing but is not required to finish in real time. Each of these options comes with complex consequences and side effects. To manage them, Netflix uses a combination of all three methods of processing across Amazon Web Services in an architecture illustrated below.


As you can see, this is an extremely complex setup. Netflix uses offline processing to calculate overarching trends and other results that require no user input, and to train the machine-learning models that are later used to calculate results. Nearline processing is used largely to develop backup plans in case online processing fails to produce results as quickly as required. Nearline is also used in situations where timing matters less than accuracy, for instance updating recommendations to show that a movie has been watched while the user is still watching it. Online computing is used largely in response to user activity, such as searching for a category. Netflix’s hybrid approach is particularly useful when intermediate results can be batch processed and then used to calculate more specific results in real time in response to user activity. Most of Netflix’s model training and machine learning is done offline, and the results are then used online.
Netflix's hybrid approach is particularly important to big data because it produces very strong recommendations, which would be less likely using only online or nearline methods, while still maintaining a fast response time that would not be possible using only offline approaches.
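
To make the hybrid idea a little more concrete, here is a minimal Python sketch of "batch process intermediate results offline, then refine them online." It is not Netflix's code: the function names, the view-count scoring, the sample titles, and the 50 ms time budget are all assumptions made up for illustration. The offline step precomputes popularity per genre, and the online step filters that precomputed ranking for one user, falling back to the unfiltered precomputed list if it runs out of time.

```python
import time


def offline_batch(view_log):
    """Offline step (hypothetical): count views per title within each genre.

    view_log is a list of (genre, title) pairs. In a real system this would
    be a large batch job; here it is a simple loop.
    """
    scores = {}
    for genre, title in view_log:
        per_genre = scores.setdefault(genre, {})
        per_genre[title] = per_genre.get(title, 0) + 1
    return scores


def online_recommend(precomputed, genre, watched, k=2,
                     budget_s=0.05, clock=time.monotonic):
    """Online step (hypothetical): personalize the precomputed ranking.

    Titles the user has already watched are skipped. If the time budget
    (an assumed 50 ms) is exceeded, fall back to the precomputed ranking
    without personalization so the user still gets a fast answer.
    """
    start = clock()
    candidates = precomputed.get(genre, {})
    # Most-viewed first; ties broken alphabetically for a stable order.
    ranked = sorted(candidates, key=lambda t: (-candidates[t], t))
    result = []
    for title in ranked:
        if clock() - start > budget_s:
            return ranked[:k]  # fallback: unpersonalized precomputed result
        if title not in watched:
            result.append(title)
        if len(result) == k:
            break
    return result


# Made-up sample data for illustration only.
view_log = [
    ("comedy", "A"), ("comedy", "A"), ("comedy", "B"),
    ("comedy", "C"), ("comedy", "C"), ("comedy", "C"),
    ("drama", "D"),
]
scores = offline_batch(view_log)
print(online_recommend(scores, "comedy", watched={"C"}))
```

The expensive counting happens once in `offline_batch`, while `online_recommend` only does cheap filtering per request, which is the trade-off the post describes: strong results from batch work, fast responses from the online step.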

Source: http://techblog.netflix.com/2013/03/system-architectures-for.html

1 comment:

  1. After reading this post I was interested in how YouTube runs its recommendation algorithm. What I found is that they have a different outlook on how their algorithm should run. Because the YouTube servers are growing at a ridiculously quick rate, their algorithm takes into account the newness of a video as well. So, in order to keep YouTubers interested, the analysts at YouTube have created an algorithm that recommends a mixture of old and new videos to keep new content from getting buried immediately. This is also good news for YouTubers who create content on the site. It gives them a larger chance for their content to be seen, and I think this is what YouTube was going for.
