Personally, I am a big fan of football,
which is called “soccer” in the States. Today, I’d like to share some information
about the application of data mining in sports. Before
the advent of computer science and data mining techniques, sportsmen and
coaches made their decisions based mainly on human expertise. In
recent decades, however, they have come to rely more and more on data collected on
different aspects of the game.
As a result, most sports organizations hire
statisticians to find better ways to measure performance and to offer better
decision-making criteria. Furthermore, they have to find a way to extract valuable
knowledge from this large volume of data by using data mining techniques. By transforming the
data into actionable knowledge, scouts, managers and coaches get a
better idea of what to expect from opponents and can use the player draft more
effectively.
Data mining techniques used in sports
include statistical analysis, pattern discovery and outcome prediction. A
variety of non-typical sports data can be monitored in a similar way, including injury
likelihood. Here is one example: AC Milan, my favorite Italian football club,
employs a biomedical tool that uses software to monitor workouts
and helps to predict and prevent players’ injuries. Apparently, most professional
football teams now use this kind of software for the same purpose.
Another example of novel data mining
research comes from the discovery that physical aptitude is related to anticipated physical performance. The
National Football League (NFL) reportedly conducts a “Combine” every
year, where prospective college draft players are run through a series of
physical drills in front of team scouts and coaches. The Combine also includes a
mental test, the Wonderlic Personnel Test, which assesses the intellectual
capacity of the players. The NFL has developed expected Wonderlic scores based
on the amount of intelligence required to play a particular position; e.g., a
quarterback should have a higher Wonderlic score (24) than a halfback (16).
Nevertheless, statistics
can be misleading if one does not understand what they actually measure, whether
because an event is measured imprecisely or because the sports community misuses and
over-relies on particular statistics.
I was going to blog about something very similar to this, but after reading your post I will just comment and further your argument. Although I am not as familiar with "soccer" as I am with the National Football League, I wonder if we could determine which organizations are at the forefront of statistical decision making using big data. One article that I found recently in the NY Times (http://www.nytimes.com/2012/11/25/sports/football/more-nfl-teams-hire-statisticians-but-their-use-remains-mostly-guarded.html?pagewanted=all&_r=0) suggests that NFL teams are not yet utilizing data mined on future athletes. The article follows your argument about underutilized statistics very closely. One NFL coach is quoted as saying, “We’re still about people here,” when presented with the idea of weighing statistical analysis of prospective athletes’ prior performances. I believe there is a general consensus around professional sports that statistics are just that, "statistics," and should not be a factor in decision making. I agree that if data mining techniques do become a more heavily weighted factor in decision making, we will see a "revolution" at some point.
Baseball is one of the leaders when it comes to mining statistical data. Most are familiar with the movie Moneyball, in which the Oakland Athletics relied on advanced statistics called sabermetrics to grade players. They were able to field competitive teams on a very limited payroll. The Boston Red Sox adopted the sabermetrics model in 2002 after witnessing Oakland’s success. Two years later, they won the franchise’s first World Series Championship in 86 years. Terry Francona was the manager of that team and is a staunch supporter of the use of sabermetrics. He is now with a new team, the Cleveland Indians. It will be interesting to see if sabermetrics will be successful in Cleveland.
The trick to gaining an advantage from mining sports data, or any other type of data, is not only mining the right data but also applying your results in a correct and logical way. In baseball, for example, the two most popular pitching statistics are wins and earned run average (ERA). While these can tell part of the story, they don’t tell you everything you need to know. A pitcher can have a record of 20-5 on the season, but he might be on a team with a great offense, so the fact that he gives up six or seven runs a game is hidden. ERA is a somewhat more useful statistic in that it tells you how many earned runs the pitcher gives up, on average, over nine innings; however, one bad game can skew the results greatly. It’s also not very good for comparing the ERAs of players in different eras. One sabermetric statistic designed to compare pitchers more objectively is called pitching runs. The formula is: Pitching runs = (Total Innings Pitched) x (League ERA / 9) - (Earned Runs). This way each pitcher is compared to the same standard, the league ERA.
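To make the arithmetic concrete, here is a minimal Python sketch of ERA and the pitching runs formula above. The function names and the two pitchers' numbers are made up for illustration (they are not real players or seasons), and innings are assumed to be plain decimals (so one third of an inning is 0.333..., not the box-score ".1" notation).

```python
def era(earned_runs: float, innings_pitched: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9.0 * earned_runs / innings_pitched


def pitching_runs(innings_pitched: float, earned_runs: float,
                  league_era: float) -> float:
    """Pitching runs = IP x (league ERA / 9) - ER.

    The first term is the number of earned runs a league-average
    pitcher would allow over the same innings, so a positive result
    means the pitcher allowed fewer runs than average.
    """
    return innings_pitched * (league_era / 9.0) - earned_runs


if __name__ == "__main__":
    LEAGUE_ERA = 4.50  # hypothetical league average

    # Hypothetical pitchers: (name, innings pitched, earned runs)
    pitchers = [("Pitcher A", 200.0, 80.0), ("Pitcher B", 180.0, 100.0)]

    for name, ip, er in pitchers:
        print(f"{name}: ERA {era(er, ip):.2f}, "
              f"pitching runs {pitching_runs(ip, er, LEAGUE_ERA):+.1f}")
```

With these made-up numbers, Pitcher A comes out 20 runs better than a league-average pitcher over the same innings, and Pitcher B comes out 10 runs worse, regardless of either pitcher's win-loss record.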
Further information on these sabermetrics can be found at the following websites:
http://www-math.bgsu.edu/~albert/papers/saber.html
http://www.baseballprospectus.com/glossary/index.php?context=2&category=true
Sam and Rusty,
Thank you for your valuable contributions. I agree there is much more room for applying data mining in sports, since the question now is: how can we outsmart other teams?
Fadel