Friday, February 1, 2013

Money Ball For Super Bowl Sunday?




Well, it's time for the Super Bowl yet again, and if you're wanting to spice up the game with a little betting (legally, of course...) or just want the inside scoop on how the game might turn out, then a little data mining might be just the ticket.

Anyone who has ever seen the movie Money Ball understands that in professional baseball there is a large potential for statistics to give insights on how to manage a team. In Money Ball (and this is used by many professional teams) statistics were used to assist in personnel decisions, including hiring, firing and what position players should fill. One of the reasons that the use of statistics in baseball is so powerful is that the game in many ways supports the achievements of the individual in accomplishing the goal of winning a game. A single player can score a run by hitting a home run or, conversely, save his team by catching a deep fly ball for the out. This allows you to create measures of an individual player and compare them against those of many established players.

If you were to Google the use of Money Ball in professional football, you would find two distinctly different kinds of results. The first is websites that claim they can make you rich by selling you information that they have compiled on teams and players. None of these websites actually lets you see an example of the information it provides, and one might conclude that anyone who had achieved formulas that produce reliable results wouldn't be selling them online and would instead sell them to professional teams.

The other kind of result you would find is articles and forums that discuss the challenge of using Money Ball type statistics in football. One of the hardest problems is that, unlike in baseball, the probability of a successful football play depends not on just a single player but on a group of players, and determining who is most responsible for a play's success or failure is not always clear.



In this blog post I want to look at one statistic that can give some insight into player performance as it relates to the probability of winning a game; in this case the game is the Super Bowl. The quarterback is one of the most important offensive players on a football team. He has many responsibilities, but one of the biggest is reading how a play is progressing and then throwing the ball to one of his teammates. When the ball is caught by the opposing team instead, an interception, this can have a devastating impact on the outcome of a game. Because interceptions can be game changers, I want to look at each team's history of offensive interceptions, meaning their quarterback threw the ball and the opposing team's defense intercepted it.

As with any data mining project, the first step is to find a data set that will allow me to extract the information I want. There is a website that offers free files with every play of every game in the NFL. This website is called AdvancedNFLStats.com (thanks to Patrick for showing us this in class). The only data the website had posted for this year was for the regular season, but beggars can't be choosers. This data set has 42,537 lines, one for each play run during the regular season by all teams in the NFL. I know the two teams that are playing in the Super Bowl, so I ran a macro that deleted all the lines that do not involve Baltimore or San Francisco. This leaves me with 5,353 lines/plays to look at.
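For readers who would rather script this step than use a spreadsheet macro, here is a minimal Python sketch of the same filter. The column names (`off`, `def`, `description`), the team codes `BAL` and `SF`, and the sample rows are all assumptions made for illustration; check them against the header of the file you actually download.

```python
import csv
import io

# Made-up rows in an ASSUMED layout (one row per play). The real file's
# column names and team codes may differ.
SAMPLE = """gameid,off,def,description
g1,BAL,CIN,QB1 pass short left to RB1 for 5 yards
g1,CIN,BAL,QB2 pass deep right incomplete
g2,DAL,NYG,QB3 pass short middle to TE1 for 8 yards
g3,SF,GB,QB4 pass short right to TE2 for 6 yards
"""

TEAMS = {"BAL", "SF"}  # assumed codes for Baltimore and San Francisco


def plays_involving(rows, teams):
    """Keep plays where either the offense or the defense is one of `teams`."""
    return [r for r in rows if r["off"] in teams or r["def"] in teams]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = plays_involving(rows, TEAMS)
print(len(rows), "plays in total,", len(kept), "involving", sorted(TEAMS))
```

To run it on a real file, replace `io.StringIO(SAMPLE)` with an opened CSV file.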

My focus is to get a better understanding of each team's interceptions during the regular season, to see if there is any trend to when they occur. Luckily, the data set includes a description of each play, and if the play had an interception, that is included in the description. So I ran a macro that searched each line's description for the word interception and then placed either true or false in a column I created at the end of the data set. Then all I had to do was delete all the lines that did not contain the word true in that new column. After that there were only a few lines left for each team, and it was easier for me to find the trend information by reading the descriptions than by using an algorithm (after all, time is money).
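The flag-and-filter step can be sketched in a few lines of Python. This is an illustration under assumptions, not the macro that was actually used: the column names, the team codes and the sample rows are hypothetical, and the search matches any form of the word (interception, INTERCEPTED) by looking for "intercept" regardless of case.

```python
import csv
import io

# Made-up rows; `off` is ASSUMED to hold the team with the ball.
SAMPLE = """off,def,description
BAL,CIN,QB1 pass short left intended for WR1 INTERCEPTED by DB1
BAL,CIN,RB1 up the middle for 3 yards
SF,GB,QB2 pass deep right to WR2 for 14 yards
CIN,BAL,QB3 pass short middle intended for TE1 INTERCEPTED by DB2
"""


def is_interception(description):
    """True when the play description mentions an interception (any case)."""
    return "intercept" in description.lower()


def offensive_interceptions(rows, team):
    """Plays where `team` had the ball and the pass was intercepted."""
    return [r for r in rows if r["off"] == team and is_interception(r["description"])]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    # Equivalent of the extra true/false column added at the end of the sheet.
    row["interception"] = is_interception(row["description"])

bal_ints = offensive_interceptions(rows, "BAL")
print(len(bal_ints), "interception(s) thrown by BAL in the sample")
```

Filtering on the offense column matters here: the last sample row is an interception in a Baltimore game, but Baltimore's defense made it, so it is not counted against Baltimore's quarterback.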

Below is all the trend data that I found for each team. I broke out the results by the quarterback who was in the game at the time, for more clarity. Then I looked for trends related to where the ball was thrown. The results are in the table below. By far the biggest trend was that Flacco threw more interceptions when throwing short than when throwing deep. Also, Smith doesn't seem to have a problem with getting intercepted over the middle.
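The location counts were read off the descriptions by hand, but with a longer list the same tally could be scripted. The sketch below assumes the descriptions contain a phrase like "pass short left" or "pass deep middle"; that wording, and the sample strings, are assumptions for illustration rather than something taken from the data set.

```python
import re
from collections import Counter

# ASSUMED phrasing of the pass location inside a play description.
PASS_RE = re.compile(r"pass (short|deep) (left|middle|right)", re.IGNORECASE)

# Made-up interception descriptions for one hypothetical quarterback.
SAMPLE_INTS = [
    "QB1 pass short left intended for WR1 INTERCEPTED by DB1",
    "QB1 pass short middle intended for TE1 INTERCEPTED by LB1",
    "QB1 pass deep right intended for WR2 INTERCEPTED by DB2",
    "QB1 pass intended for WR1 INTERCEPTED by DB3",  # no location given
]


def tally_locations(descriptions):
    """Count interceptions by depth and by direction; report unparsed plays."""
    depth, direction = Counter(), Counter()
    skipped = 0
    for text in descriptions:
        match = PASS_RE.search(text)
        if match is None:
            skipped += 1  # leave these for manual review
            continue
        depth[match.group(1).lower()] += 1
        direction[match.group(2).lower()] += 1
    return depth, direction, skipped


depth, direction, skipped = tally_locations(SAMPLE_INTS)
print(dict(depth), dict(direction), "unparsed:", skipped)
```

Returning the number of unparsed plays is a deliberate choice: with only a handful of interceptions per quarterback, silently dropping one would noticeably change the totals.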

Lastly, I wanted to know if there was any correlation between interceptions and whether the team was losing or not. So I wrote a formula in Excel that would tell me whether the team was losing or winning the game at the time of each interception. The results are in the table below. From just the raw statistics, I don't think there is a discernible difference.
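The winning-or-losing check comes down to comparing two scores at the moment of the play. Here is a minimal Python sketch of that logic; it is not the Excel formula itself, and the idea that each play carries the offense's and the defense's score (called `off_score` and `def_score` here) is an assumption about the data layout.

```python
from collections import Counter


def game_state(off_score, def_score):
    """Classify the offense's situation when the play started."""
    if off_score > def_score:
        return "winning"
    if off_score < def_score:
        return "losing"
    return "tied"


# Made-up (offense score, defense score) pairs for four interceptions.
SAMPLE_SCORES = [(7, 0), (3, 10), (14, 14), (0, 3)]

states = Counter(game_state(o, d) for o, d in SAMPLE_SCORES)
print(dict(states))
```

A tied score gets its own label here so that such plays are not silently folded into either the winning or the losing column.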

Tomorrow I will look at the teams' defenses and their ability to produce turnovers by interception.

Offensive Interception Statistics

Team                        Total  Short  Deep  Left  Middle  Right  Winning  Losing
San Francisco (Smith)           5      3     2     2       0      3        2       3
San Francisco (Kaepernick)      3      1     2     1       2      0        1       2
Baltimore (Flacco)             10      8     2     3       4      3        5       5








Works Cited:
Data set retrieved from: http://www.advancednflstats.com/2010/04/play-by-play-data.html

3 comments:

  1. As a typical Monday morning quarterback would, we can take the great analysis completed by Mr. Joshua Jacks and combine it with the quarterbacks’ interception totals from the playoff games, including the championship game. Mr. Jacks did the hard work and compiled the entire year’s worth of data for all the NFL teams’ passing plays. After his analysis showed which quarterback was more likely to throw an interception and where, I added the playoff results for each team’s quarterback to his results. I researched the box scores to see if the quarterback threw an interception and then found what type of pass play he threw it on within the play-by-play. The scale of each quarterback’s interception totals is different, so I included the number of games that each quarterback played in the table. At one point Colin Kaepernick was the backup quarterback to Alex Smith. That was until Smith was hurt and Kaepernick won the starting job over him. So Kaepernick’s game count could be a little misleading, because in seven of those games he was backing up Smith. Since the 49ers had a bye in the first week of the playoffs, they played one fewer game than the Ravens.

    Since each team didn’t play the same number of games and each quarterback didn’t play the same number of full games, I decided to include pass attempts in the analysis. With pass attempts added, you can look at the ratio of interceptions to pass attempts. Joe Flacco and Colin Kaepernick had similar ratios, although Flacco attempted more than twice as many passes. However, Flacco didn’t record a single interception in the playoffs and Kaepernick had two (one with the score tied and the other when the 49ers were losing). Flacco proves to be equally effective whether winning or losing, while Kaepernick tends to throw more interceptions when either tied or losing. Since the Super Bowl has already been played, it is easy to see why it played out the way it did.

    Offensive Interception Statistics - Regular Season and Post Season

    Team                Games  Pass Att  Total  Int/Attempt  Winning  Tie  Losing
    Ravens (Flacco)        20       657     10      0.01522        5    0       5
    49ers (Smith)          10       218      5      0.02294        2    0       3
    49ers (Kaepernick)     16       298      5      0.01678        1    1       3

  2. Josh and Jason,

    Thank you both for two great posts. Very interesting to look at the NFL from these perspectives.

    It would be very interesting to see if this analysis holds for the last few years. Some additional interesting followup questions:

    1- Is there a difference between CFB and NFL in terms of stats? (To normalize the data, you may want to consider only SEC games, since the SEC is often considered the best conference in CFB).
    2- Can we better predict the success of quarterbacks in the NFL?
    3- How can you change your typical offensive gameplan to account for defensive vulnerabilities for the opposing team? --> How would you measure their vulnerabilities (In basketball it is more straightforward, you can look at the percentage of FGs made vs missed from different areas of the court).
    4- etc

    For CFB stats, please check: http://www.cfbstats.com/blog/college-football-data/ (2005-2012 statistics)

    Fadel
