Monday, February 25, 2013

Data mining in sports


Personally, I am a big fan of football, which is called “soccer” in the States. Today, I’d like to share some information about the application of data mining in sports. Before the advent of computer science and data mining techniques, athletes and coaches made their decisions based mainly on human expertise. In recent decades, however, they have come to rely more and more on data collected from many aspects of the game.

As a result, most sports organizations hire statisticians to find better ways to measure performance and to offer better decision-making criteria. Furthermore, they need to extract valuable knowledge from large volumes of data by using data mining techniques. By transforming this data into actionable knowledge, scouts, managers and coaches get a better idea of what to expect from opponents and can use a player draft more effectively.

Data mining techniques used in sports include statistical analysis, pattern discovery and outcome prediction. A variety of non-typical sports data can be monitored in a similar way, including injury likelihood. Here is one example: AC Milan, my favorite Italian football club, employs a biomedical tool that uses software to monitor workouts, which helps to predict and prevent players’ injuries. Apparently, most professional football teams now use this kind of software for the same purpose.

Another example of novel data mining research comes from the discovery that physical aptitude is related to anticipated physical performance. It is reported that the National Football League (NFL) conducts a “Combine” every year, where prospective college draft players are run through a series of physical drills in front of team scouts and coaches. The Combine also includes a mental test called the Wonderlic Personnel Test, which assesses the intellectual capacity of the players. The NFL has developed expected Wonderlic scores based on the amount of intelligence required to play a particular position; e.g., a quarterback should have a higher Wonderlic score (24) than a halfback (16).

Nevertheless, statistics can be misleading if one doesn’t understand what they actually measure, whether because of imprecise measurement of an event or because of the sports community’s misuse of and over-reliance on particular statistics.

All in all, people in sports have been adopting data mining techniques, which have brought about something of a “revolution”. These techniques result in better team performance by matching players to certain situations, identifying individual player contributions, evaluating the tendencies of the opposition, and exploiting any weaknesses.

3 comments:

  1. I was going to blog about something very similar to this, but after reading your post I will just comment and further your argument. Although I am not as familiar with "soccer" as I am with the National Football League, I wonder if we could determine which organizations are at the forefront of statistical decision making using big data. One article that I found, recently posted in the NY Times (http://www.nytimes.com/2012/11/25/sports/football/more-nfl-teams-hire-statisticians-but-their-use-remains-mostly-guarded.html?pagewanted=all&_r=0), suggests that NFL teams are not yet utilizing data mined on future athletes. The article follows your argument about underutilized statistics very closely. One NFL coach is quoted as saying, “We’re still about people here,” when presented with the idea of weighing statistical analysis of prospective athletes’ prior performances. I believe there is a general consensus around professional sports that statistics are just that, "statistics", and should not play a factor in decision making. I agree that if data mining techniques do become a more heavily weighted factor in decision making, we will see a "revolution" at some point.

  2. Baseball is one of the leaders when it comes to mining statistical data. Most are familiar with the movie Moneyball, in which the Oakland Athletics used advanced statistics known as sabermetrics to grade players. They were able to field competitive teams on a very limited payroll. The Boston Red Sox adopted the sabermetrics model in 2002 after witnessing Oakland’s success. Two years later, they won the franchise’s first World Series Championship in 86 years. Terry Francona was the manager of this team and is a staunch supporter of the use of sabermetrics. He is now with a new team, the Cleveland Indians. It will be interesting to see if sabermetrics will be successful in Cleveland.
    The trick to seizing the advantages of mining sports data, or any other type of data, is not only mining the right data but also applying your results in a correct and logical way. For example, in baseball, the two most popular pitching statistics are wins and earned run average (ERA). While these can tell part of the story, they don’t quite tell you everything you need to know. For example, a pitcher can have a record of 20-5 on the season, but he might be on a team with a great offense, so the fact that he gives up six or seven runs a game is hidden. ERA is a somewhat more useful statistic in that it tells you how many earned runs the pitcher gives up, on average, over nine innings; however, one bad game can skew the results greatly. It’s also not very good for comparing the ERAs of players from different eras. One sabermetric statistic designed to compare pitchers more objectively is called pitching runs. The formula for pitching runs is this: Pitching runs = (Total Innings Pitched) x (League ERA / 9) - (Earned Runs). This way each pitcher is compared to the same standard, the league ERA.
    Further information on these sabermetrics can be found at the following websites:
    http://www-math.bgsu.edu/~albert/papers/saber.html
    http://www.baseballprospectus.com/glossary/index.php?context=2&category=true

  3. Sam and Rusty,

    Thank you for your valuable contributions. I agree there is much more room for applying data mining in sports, since the question now is: how can we outsmart other teams?

    Fadel

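
The pitching runs formula quoted in the second comment is easy to try out. Here is a minimal Python sketch of it, together with the standard ERA calculation (earned runs per nine innings). The function names and the sample season numbers are made up for illustration and are not taken from any real pitcher or library.

```python
def era(earned_runs: float, innings_pitched: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9.0 * earned_runs / innings_pitched


def pitching_runs(innings_pitched: float, earned_runs: float,
                  league_era: float) -> float:
    """Pitching runs = IP x (league ERA / 9) - ER, as quoted in the comment.

    A positive value means the pitcher allowed fewer earned runs than a
    league-average pitcher would have over the same number of innings.
    """
    return innings_pitched * (league_era / 9.0) - earned_runs


# Hypothetical season line: 180 innings, 70 earned runs, league ERA of 4.50.
print(era(70, 180))                  # 3.5
print(pitching_runs(180, 70, 4.50))  # 20.0
```

With these made-up numbers, a league-average pitcher would have allowed 90 earned runs in 180 innings, so this pitcher comes out 20 runs better than average, regardless of how many games his team's offense won for him.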