Thursday, March 14, 2013

Big Data and March Madness

There is probably no bigger topic in data analytics right now than the NCAA Division I Men's Basketball Tournament, which begins next week, with Selection Sunday just a couple of days away. This sports phenomenon enthralls the United States for the entire month of March. In March 2012, ESPN drew 30.4 million unique visitors to its websites. 42% of IT professionals say that March Madness has affected their networks: 37% report that their network slowed, and 34% say that March Madness effectively crashed their network.

And that is just the online activity surrounding the tournament. The tournament itself features 68 teams playing for the championship. Because every game eliminates one team until only the champion remains, the bracket consists of 67 games, and since each game has two possible winners, there are 2^67, or about 147.57 quintillion, possible brackets. With that many possibilities and the potential to win money, it is no surprise that people are looking for a competitive advantage. One tool currently in use is BracketOdds, which gives the probability of any combination of seeds reaching a particular round of the tournament. The most likely Final Four seed combination this year is 1, 1, 2, 3, with odds of 16.08 to 1.

The big problem with predicting March Madness is the appearance of a Cinderella, or underdog, team. These teams are sometimes called "Bracket Busters" because they wreck a majority of people's brackets. Last year two 15 seeds upset 2 seeds, something that had happened only four times in the tournament's previous history, and a 16 seed has never beaten a 1 seed. Given the enormous public interest and the almost innumerable possible outcomes, March Madness is both an excellent opportunity and a real challenge for people interested in data analytics everywhere.
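The bracket count above follows from a short calculation: a single-elimination field of n teams needs n - 1 games, and each game has two possible winners, so there are 2^(n - 1) possible brackets. The Python sketch below (the function names are illustrative, not from any bracket tool) checks the 147.57 quintillion figure for the 68-team field and, for comparison, the 64-team bracket that remains after the First Four play-in games.

```python
def games_needed(n_teams: int) -> int:
    """Single elimination: every game knocks out one team, and all but the champion go out."""
    return n_teams - 1


def possible_brackets(n_teams: int) -> int:
    """Each game has two possible winners, so the outcomes multiply to 2 ** games."""
    return 2 ** games_needed(n_teams)


full_field = possible_brackets(68)  # 68 teams, counting the First Four play-in games
main_draw = possible_brackets(64)   # the 64-team bracket left after the First Four

print(f"{full_field:,}")                       # 147,573,952,589,676,412,928
print(f"{full_field / 1e18:.2f} quintillion")  # 147.57 quintillion
print(f"{main_draw / 1e18:.2f} quintillion")   # 9.22 quintillion
```

Dropping the four play-in games shrinks the count by a factor of 2^4 = 16, but even the 64-team bracket leaves over nine quintillion possibilities.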

Resources:

http://www.baselinemag.com/networking/slideshows/dont-let-march-madness-shut-down-your-network/
http://spotfire.tibco.com/blog/?p=11028

1 comment:

  1. I agree with you, Patrick. The "bracket challenge" definitely offers the opportunity to gain a statistical advantage by applying big data analytics. It would be interesting to see an algorithm that could pick a winner given two teams (it would have to be run 63 times to complete a full 64-team bracket). A database could be built from all available confidence percentages on specific teams. It could incorporate statistics such as the fact that highly ranked Gonzaga has never made the Final Four (its deepest run was an Elite Eight appearance in 1999) despite earning a seed of 5 or better in multiple years. This would let a user build a bracket based on the teams involved and possibly predict the "Cinderella" team. It could also account for the bracket buster that always seems to appear in a 5-12 matchup. Since 1985, 12 seeds have won their opening games 38 times, and 19 of those teams advanced to the Sweet 16. Also, a 16 seed has never defeated a 1 seed, and a 15 seed has never advanced past the second round. If you incorporated this data, you could strategically build a bracket based solely on the seedings set by the NCAA selection committee, which rigorously evaluates each team and ranks them based on expert judgment. I take a somewhat statistical approach when completing my bracket, and I'm sure most other basketball fans do as well. But as we've studied this semester, there are definitely more ways than the ordinary to gain an advantage through an in-depth statistical approach. I might just have to fill out a new bracket!
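The pick-a-winner idea in the comment above can be sketched as a loop that applies a two-team picker round by round. This is a minimal Python sketch under stated assumptions: teams are hypothetical (name, seed) pairs, pick_by_seed is a placeholder rule that simply advances the better seed, cinderella_picker is a hypothetical override for teams the user expects to bust brackets, and the field is listed in the standard first-round pairing order (1 vs 16, 8 vs 9, and so on). A real picker would draw on the kind of team database the comment describes.

```python
from typing import Callable, List, Set, Tuple

Team = Tuple[str, int]  # hypothetical representation: (name, seed)
Picker = Callable[[Team, Team], Team]


def pick_by_seed(a: Team, b: Team) -> Team:
    """Placeholder rule: the better (numerically lower) seed advances; ties go to `a`."""
    return a if a[1] <= b[1] else b


def cinderella_picker(cinderellas: Set[str]) -> Picker:
    """Hypothetical variant: a team the user flags as a Cinderella beats any unflagged team."""
    def pick(a: Team, b: Team) -> Team:
        a_flag, b_flag = a[0] in cinderellas, b[0] in cinderellas
        if a_flag != b_flag:
            return a if a_flag else b
        return pick_by_seed(a, b)  # both or neither flagged: fall back to seeding
    return pick


def fill_bracket(teams: List[Team], pick: Picker) -> Tuple[Team, int]:
    """Run the picker on adjacent pairs, round by round, until one team remains.

    Returns the champion and the number of games picked."""
    if not teams or len(teams) & (len(teams) - 1):
        raise ValueError("field size must be a power of two")
    games = 0
    current = list(teams)
    while len(current) > 1:
        current = [pick(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        games += len(current)  # one game per surviving team this round
    return current[0], games


# Standard first-round pairing order within one 16-team region.
REGION_ORDER = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]
field = [(f"Region{r}-Seed{s}", s) for r in range(1, 5) for s in REGION_ORDER]

champion, games = fill_bracket(field, pick_by_seed)
print(champion, games)  # ('Region1-Seed1', 1) 63

champion, _ = fill_bracket(field, cinderella_picker({"Region2-Seed12"}))
print(champion)  # ('Region2-Seed12', 12)
```

Because the picker is passed in as a function, a seed-only rule and a data-driven rule (for example, one that consults historical 5-12 upset rates) can be compared on the same field without changing fill_bracket.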