Well, it's time for the Super Bowl yet again, and if you're
wanting to spice up the game with a little betting (legally, of course...) or
just want the inside scoop on how the game might turn out, then a little data
mining might be just the ticket.
Anyone who has ever seen the movie Moneyball understands
that in professional baseball there is large potential for statistics to give
insights on how to manage a team. In Moneyball (and this is used by many professional teams), statistics were used
to assist in personnel decisions, including hiring, firing and what position
players should fill. One of the reasons
that the use of statistics in baseball is so powerful is that the game
in many ways rewards the achievements of the individual in accomplishing
the goal of winning a game. A single player can score a run by hitting a home
run or, conversely, save his team by catching a deep fly ball for the out. This allows you to create measures of an
individual player and compare them against many established players.
If you were to Google the use of Moneyball in
professional football, you would find two distinctly different kinds of results. The first is websites
that claim they can make you rich by selling you information they have compiled
on teams and players. None of these websites actually lets you see an
example of the information it provides, and one might conclude
that anyone running these websites who had actually worked out formulas that
produce reliable results wouldn't be selling them online and would instead sell
them to professional teams.
The other kind of result you would find in your Google search is
articles and forums that discuss the challenge of using Moneyball-type statistics
in football. One of the hardest problems is that, unlike in baseball, the probability
of a successful football play depends not on just a single player but on a group
of players, and determining who is most responsible for a play's success or failure
is not always clear.
In this blog post I want to look at one statistic
that can give some insight into player performance as it relates to the probability
of winning a game; in this case the game is the Super Bowl. The quarterback is
one of the most important offensive players on a football team. He has many responsibilities,
but one of the biggest is reading how a play is
progressing and then throwing the ball to one of his teammates. When the ball
is caught by the opposing team instead (an interception), it can have a devastating
impact on the outcome of a game. Because interceptions can be game changers, I want to look at each team's history of offensive interceptions, meaning their quarterback threw the ball and the opposing team's defense intercepted it.
As with any data mining project, the first step is to find a
data set that will allow me to extract the information I want. There is a website
that offers free files with every play of every game in the NFL. This website
is called AdvancedNFLStats.com (thanks to Patrick for showing us this in
class). The only data the website had posted for this year was for the regular
season, but beggars can't be choosers. This data set has 42,537 lines, each representing
one play run during the regular season by any team in the NFL. I know the two teams that are
playing in the Super Bowl, so I ran a macro that deleted all the lines that did not
involve Baltimore or San Francisco. This leaves me with 5,353 lines/plays to
look at.
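For anyone who would rather script this step than write a macro, here is a minimal Python sketch of the same filtering idea. The column names "off" and "def" and the team codes "BAL" and "SF" are my assumptions for illustration, not necessarily what the real file uses, and the three sample rows are made up.

```python
import csv
import io

# Assumed team codes; the real file may label the teams differently.
TEAMS = {"BAL", "SF"}

def keep_team_plays(rows, teams=TEAMS):
    """Keep only plays where one of the given teams is on offense or defense.

    The column names "off" and "def" are hypothetical.
    """
    return [r for r in rows if r["off"] in teams or r["def"] in teams]

# Tiny made-up sample standing in for the real play-by-play file.
sample = io.StringIO(
    "off,def,description\n"
    "BAL,CIN,Pass short left complete for 6 yards\n"
    "NE,NYJ,Pass deep right incomplete\n"
    "SEA,SF,Run up the middle for 3 yards\n"
)
plays = keep_team_plays(list(csv.DictReader(sample)))
print(len(plays))  # 2 of the 3 sample plays involve BAL or SF
```

With the real file you would pass an open file handle to csv.DictReader instead of the sample string.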
My focus is to get a better understanding of each team's interceptions
during the regular season, to see if there is any trend in when they occur. Luckily,
the data set includes a description of each play, and if the play had an interception,
that is included in the description. So I ran a macro that searched each line's
description for the word interception and then placed either true or false in a
column I created at the end of the data set. Then all I had to do was delete
all the lines that did not contain the word true in the column that I had created.
After that there were only a few lines left for each team, and it
was easier for me to find the trend information in the descriptions by hand than to use an algorithm (after all, time is money).
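The flag-and-filter step could be sketched in Python roughly as follows. This is not the macro I actually ran: the "description" field name and the sample rows are made up, and I search for the stem "intercept" (case-insensitive) on the assumption that a description might say either "interception" or "INTERCEPTED".

```python
def flag_interceptions(plays):
    """Return copies of the plays with an added True/False 'intercepted' field."""
    flagged = []
    for play in plays:
        text = play["description"].lower()
        # Matching the stem catches both "interception" and "intercepted".
        flagged.append({**play, "intercepted": "intercept" in text})
    return flagged

# Made-up rows standing in for the filtered play-by-play data.
sample_plays = [
    {"off": "BAL", "description": "Pass short left INTERCEPTED by the defense"},
    {"off": "BAL", "description": "Pass deep right complete for 25 yards"},
    {"off": "SF", "description": "Run up the middle for 3 yards"},
]

# Keep only the rows flagged true, as in the delete step described above.
interceptions = [p for p in flag_interceptions(sample_plays) if p["intercepted"]]
print(len(interceptions))  # 1
```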
Below is all the trend data that I found in my results for
each team. I broke out the results by
the quarterback who was in the game at the time, for more clarity. Then I
looked for trends related to where the ball was thrown. The results are in the chart
below. By far the biggest trend was that
Flacco throws more interceptions when throwing short than long. Also, Smith
doesn't seem to have a problem with getting intercepted in the middle.
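I read the pass locations off the descriptions by hand, but the tally could be sketched in Python like this. It assumes each interception description contains plain words such as "short", "deep", "left", "middle" or "right"; the sample descriptions are made up, and real descriptions may phrase things differently.

```python
import re
from collections import Counter

DEPTHS = ("short", "deep")
DIRECTIONS = ("left", "middle", "right")

def tally_locations(descriptions):
    """Count how many descriptions mention each depth and direction word."""
    counts = Counter()
    for text in descriptions:
        # Split into whole lowercase words so "right" does not match "upright".
        words = set(re.findall(r"[a-z]+", text.lower()))
        for label in DEPTHS + DIRECTIONS:
            if label in words:
                counts[label] += 1
    return counts

# Made-up interception descriptions for illustration only.
sample_descriptions = [
    "Pass short left INTERCEPTED",
    "Pass deep middle INTERCEPTED",
    "Pass short right INTERCEPTED",
]
location_counts = tally_locations(sample_descriptions)
print(location_counts["short"], location_counts["deep"])  # 2 1
```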
Lastly, I wanted to know if there was any correlation between interceptions
and whether the team was losing or not. So I wrote a formula in Excel that would
tell me whether the team was losing or winning the game at the time of the play. The results are in the chart
below. From just the raw statistics, I don't think there is a discernible difference.
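The winning-or-losing check is just a comparison of the two scores at the time of the play. Here is a rough Python equivalent of that kind of formula; the score pairs are made up, and the idea that the data carries an offense score and a defense score for each play is my assumption for this sketch.

```python
from collections import Counter

def game_state(offense_score, defense_score):
    """Classify the offense as winning, losing or tied at the time of a play."""
    if offense_score > defense_score:
        return "winning"
    if offense_score < defense_score:
        return "losing"
    return "tied"

# Made-up (offense score, defense score) pairs, one per interception.
sample_scores = [(7, 3), (10, 14), (0, 7), (21, 21)]
states = Counter(game_state(off, dfn) for off, dfn in sample_scores)
print(states["winning"], states["losing"], states["tied"])  # 1 2 1
```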
Tomorrow I will look at the teams' defenses and their abilities
to produce turnovers by interception.
Offensive Interception Statistics
Team | Total | Short | Deep | Left | Middle | Right | Winning | Losing
San Francisco (Smith) | 5 | 3 | 2 | 2 | 0 | 3 | 2 | 3
San Francisco (Kaepernick) | 3 | 1 | 2 | 1 | 2 | 0 | 1 | 2
Baltimore (Flacco) | 10 | 8 | 2 | 3 | 4 | 3 | 5 | 5
Works Cited:
Data set retrieved from: http://www.advancednflstats.com/2010/04/play-by-play-data.html
As a typical Monday morning quarterback would, we can take a look at the great analysis completed by Mr. Joshua Jacks, combined with the quarterbacks' interception totals from the playoff games, including the championship game. Mr. Jacks did the hard work and compiled the entire year's worth of data for all the NFL teams' passing plays. His analysis showed which quarterback was more likely to throw an interception and where; I then added the playoff results for each team's quarterback to his results. I researched the box scores to see if the quarterback threw an interception and then found what type of pass play he threw it on within the play-by-play. The scale of each quarterback's interception totals is different, so I included the number of games that each quarterback played in the table. At one point Colin Kaepernick was the backup quarterback to Alex Smith. That was until Smith was hurt and Kaepernick won the starting job over him. So Kaepernick's game numbers could be a little misleading, because in seven of those games he was backing up Smith. Since the 49ers had a bye the first week of the playoffs, they played one fewer game than the Ravens.
Since each team didn't play the same number of games and each quarterback didn't play the same number of full games, I decided to include pass attempts in the analysis. After adding pass attempts, you can look at the ratio of interceptions to pass attempts. Joe Flacco and Colin Kaepernick had similar ratios, with Flacco making over double the pass attempts. However, Flacco didn't record a single interception in the playoffs and Kaepernick had two (one with the score tied and the other when the 49ers were losing). Flacco proves to be equally effective whether winning or losing, while Kaepernick proves to throw more interceptions when either tied or losing. Since the Super Bowl has already been played, it is easy to see why it played out the way it did.
Offensive Interception Statistics - Regular Season and Post Season
Team | Games | Pass Att | Total | Int/Attempt | Winning | Tie | Losing
Ravens (Flacco) | 20 | 657 | 10 | 0.01522 | 5 | 0 | 5
49ers (Smith) | 10 | 218 | 5 | 0.02294 | 2 | 0 | 3
49ers (Kaepernick) | 16 | 298 | 5 | 0.01678 | 1 | 1 | 3
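The Int/Attempt column is simply total interceptions divided by pass attempts. A short Python sketch that recomputes it from the totals in the table (the function name is just for illustration):

```python
# (interceptions, pass attempts) taken from the table above.
totals = {
    "Ravens (Flacco)": (10, 657),
    "49ers (Smith)": (5, 218),
    "49ers (Kaepernick)": (5, 298),
}

def interception_rate(interceptions, attempts):
    """Interceptions per pass attempt."""
    return interceptions / attempts

rates = {name: interception_rate(i, a) for name, (i, a) in totals.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.5f}")
# Ravens (Flacco): 0.01522
# 49ers (Smith): 0.02294
# 49ers (Kaepernick): 0.01678
```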
Josh and Jason,
Thank you both for two great posts. Very interesting to look at the NFL from these perspectives.
It would be very interesting to see if this analysis holds for the last few years. Some additional interesting followup questions:
1- Is there a difference between CFB and NFL in terms of stats? (To normalize the data, you may want to consider only SEC games, since the SEC is often considered the best conference in CFB.)
2- Can we better predict the success of quarterbacks in the NFL?
3- How can you change your typical offensive gameplan to account for defensive vulnerabilities for the opposing team? --> How would you measure their vulnerabilities (In basketball it is more straightforward, you can look at the percentage of FGs made vs missed from different areas of the court).
4- etc
For CFB stats, please check: http://www.cfbstats.com/blog/college-football-data/ (2005-2012 statistics)
Fadel