This is the second in a series of posts that will review the 2009 season (as well as 2008) from an adjusted stats perspective.
Tentative Publication Schedule
Part I: Which Stats Correlate Best to Winning? - 6/24/10
Part II: Drilling Down and Regression Models - 6/28/10
Part III: Passing Efficiency Formula - 6/30/10
Part IV: Conference Strengths and Pace - 7/2/10
Part V: Testing Conventional Wisdom - 7/16/10
Part VI: Team Matchups - 7/20/10
Part VII: Year-to-Year Changes - 7/22/10
Part VIII: Points per Yard Efficiencies - 7/26/10
Part IX: Data Dump - Team Rankings - 7/28/10
Part X: Data Dump - Team Reports - 7/30/10
Drilling Down and Regression Models
The previous post provided correlation coefficients between individual statistics and adjusted winning percentage, but I did not delve into them further because I believe the best way to attack the question of what contributes most to winning is level by level. The first level consists of the wins and losses themselves. The second level is points scored and points allowed, because these are the statistics that contribute directly to winning and losing. So instead of attempting to correlate everything directly to winning, which statistics correlate best to scoring points and which to allowing (or preventing) points? Let’s start with scoring points on a per game basis. Leaving out the other point scoring calculations, here are the offensive stats and their correlations with points scored per game (note that I have left out conversion percentages and other statistics that are a result of the contributing stats below):
Rank | Statistic | 2009 Coefficient | 2008 Coefficient | Average |
1 | Yards per Play | 0.870 | 0.880 | 0.875 |
2 | Yards per Game | 0.860 | 0.880 | 0.870 |
3 | Passing Efficiency | 0.854 | 0.858 | 0.856 |
4 | Total Passing per Attempt | 0.836 | 0.857 | 0.847 |
5 | Yards per Possession | 0.818 | 0.844 | 0.831 |
6 | Yards per Attempt | 0.778 | 0.832 | 0.805 |
7 | Yards per Carry | 0.665 | 0.604 | 0.635 |
8 | Total Rushing per Carry | 0.603 | 0.558 | 0.580 |
9 | Rushing Yards per Game | 0.545 | 0.493 | 0.519 |
10 | Turnovers per Play | -0.525 | -0.495 | -0.510 |
11 | Total Passing per Game | 0.453 | 0.556 | 0.505 |
12 | Total Rushing per Game | 0.521 | 0.474 | 0.497 |
13 | Sacks Allowed per Game | -0.520 | -0.434 | -0.477 |
14 | Passing Yards per Game | 0.403 | 0.519 | 0.461 |
15 | Interceptions Thrown per Attempt | -0.479 | -0.406 | -0.443 |
16 | Turnovers per Possession | -0.452 | -0.416 | -0.434 |
17 | Turnovers per Game | -0.448 | -0.418 | -0.433 |
18 | Sacks Allowed per Attempt | -0.452 | -0.399 | -0.425 |
19 | Interceptions Thrown per Game | -0.472 | -0.358 | -0.415 |
20 | Punt Return Average | 0.373 | 0.410 | 0.391 |
21 | Kickoff Return Average | 0.529 | 0.147 | 0.338 |
22 | Fumbles per Carry | -0.326 | -0.308 | -0.317 |
23 | Fumbles per Game | -0.172 | -0.148 | -0.160 |
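For readers who want to reproduce this kind of table, a coefficient like the ones above can be computed as a Pearson correlation between a stat and points per game across all teams. The sketch below assumes that is the flavor of correlation in use, which the post does not state; the function name and the five teams are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical teams (made-up numbers, not taken from the data set).
yards_per_play = [4.8, 5.4, 5.9, 6.3, 6.8]
points_per_game = [19.5, 24.0, 27.5, 33.0, 36.5]
r = pearson_r(yards_per_play, points_per_game)
print(f"r = {r:.3f}")
```

Running the same function over each stat column for each season, then sorting by the two-season average, would give a ranking of the same shape as the table.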
It’s immediately apparent that the per game stats are less correlated than the rate, or per play, stats. Yards per play appears at the top of the list, as we should have expected. It may seem that we should treat yards per play as the next level and then determine correlations to that number. However, I am going to skip that level, as yards per play follows directly from the total passing per attempt and total rushing per carry numbers along with a team’s play selection ratio. It is also worth mentioning that passing efficiency, included above, is simply a combination of other stats: Yards per Attempt and Interceptions per Attempt, which are already in the table, plus completion percentage and touchdowns per attempt, which are not. Determining a better Passing Efficiency formula is a study of its own.
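To make the yards per play point concrete: if every play is either a pass attempt or a carry, yards per play is the average of the two rates weighted by play selection. This is a minimal sketch under that assumption, and it also assumes the total passing and total rushing rates already fold in whatever yardage adjustments they are meant to. The function name and the numbers are hypothetical.

```python
def yards_per_play(tpypa, trypc, pass_share):
    """Weighted average of the passing and rushing rates.

    pass_share is the fraction of plays that are pass attempts (0 to 1);
    the remaining plays are assumed to be carries.
    """
    if not 0.0 <= pass_share <= 1.0:
        raise ValueError("pass_share must be between 0 and 1")
    return pass_share * tpypa + (1.0 - pass_share) * trypc

# Hypothetical offense: 7.0 yards per attempt, 4.0 per carry, 45% passes.
ypp = yards_per_play(7.0, 4.0, 0.45)
print(f"{ypp:.2f} yards per play")  # about 5.35
```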
And how would we attempt to improve that formula? Regression analysis is a quick and easy way thanks to available technology. The first place I wanted to use it, though, was to generate a formula to approximate the number of points per game a team should expect to score based on the most important contributing statistics: Total Passing Yards per Attempt, Total Rushing Yards per Carry, and Turnovers per Play. Here are the results for the past two seasons alone and combined:
2009 – Points per Game = 3.69*TPYPA + 2.02*TRYPC - 249.46*TOPP
2008 – Points per Game = 4.10*TPYPA + 1.85*TRYPC - 267.42*TOPP
Combined – Points per Game = 3.88*TPYPA + 1.93*TRYPC - 254.40*TOPP
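As a quick illustration of how the combined formula would be applied, here is a small sketch. The coefficients come straight from the combined line above; the team's stat line and its actual scoring figure are invented, and TOPP is assumed to be expressed as turnovers divided by plays (for example, 0.025 for one turnover every 40 plays).

```python
# Coefficients from the combined 2008-2009 season-level fit quoted above.
COMBINED_OFFENSE = {"tpypa": 3.88, "trypc": 1.93, "topp": -254.40}

def predicted_points_per_game(tpypa, trypc, topp, coef=COMBINED_OFFENSE):
    """Apply the three-term formula: no constant term, as in the post."""
    return coef["tpypa"] * tpypa + coef["trypc"] * trypc + coef["topp"] * topp

# Hypothetical team: 7.5 yards per attempt, 4.5 yards per carry,
# one turnover every 40 plays.
predicted = predicted_points_per_game(7.5, 4.5, 0.025)

# Hypothetical actual scoring; the gap between actual and predicted
# is what the formula leaves unexplained.
actual = 30.0
residual = actual - predicted
print(f"predicted {predicted:.2f}, residual {residual:+.2f}")
```

For these made-up inputs the prediction works out to roughly 31.4 points per game, so a team that actually scored 30.0 would sit about a point and a half below its expected output.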
All of the regression results were highly statistically significant; for the combined version with a population of 240, the p-values for TPYPA, TRYPC, and TOPP were 1.53E-36, 3.87E-15, and 1.91E-35, with t-statistics of 15.10, 8.41, and -14.77, respectively. Using this information, we would expect to find that teams scoring noticeably above or below their predicted number of points per game have extraordinary defensive numbers, pace numbers, special teams numbers, or a combination thereof. Here are the defensive numbers, both correlation and regression:
Rank | Statistic | 2009 Coefficient | 2008 Coefficient | Average |
1 | Yards Allowed per Game | 0.916 | 0.909 | 0.912 |
2 | Yards Allowed per Play | 0.904 | 0.893 | 0.899 |
3 | Passing Efficiency Allowed | 0.869 | 0.866 | 0.867 |
4 | Yards Allowed per Possession | 0.874 | 0.844 | 0.859 |
5 | Total Passing Allowed per Attempt | 0.872 | 0.830 | 0.851 |
6 | Rushing Yards Allowed per Game | 0.821 | 0.873 | 0.847 |
7 | Total Rushing Allowed per Game | 0.810 | 0.867 | 0.838 |
8 | Yards Allowed per Attempt | 0.857 | 0.788 | 0.823 |
9 | Yards Allowed per Carry | 0.800 | 0.845 | 0.822 |
10 | Total Rushing Allowed per Carry | 0.756 | 0.793 | 0.774 |
11 | Sacks per Game | -0.653 | -0.604 | -0.628 |
12 | Interceptions per Game | -0.645 | -0.599 | -0.622 |
13 | Turnovers Forced per Play | -0.581 | -0.588 | -0.585 |
14 | Net Kickoff Average | -0.564 | -0.522 | -0.543 |
15 | Total Passing Allowed per Game | 0.626 | 0.457 | 0.542 |
16 | Sacks per Attempt | -0.585 | -0.497 | -0.541 |
17 | Interceptions per Attempt | -0.549 | -0.520 | -0.535 |
18 | Turnovers Forced per Game | -0.528 | -0.526 | -0.527 |
19 | Turnovers Forced per Possession | -0.502 | -0.513 | -0.507 |
20 | Passing Yards Allowed per Game | 0.573 | 0.375 | 0.474 |
21 | Net Punting Average | -0.277 | -0.320 | -0.299 |
22 | Fumbles Forced per Carry | -0.213 | -0.273 | -0.243 |
23 | Fumbles Forced per Game | -0.010 | -0.075 | -0.042 |
2009 – Points Allowed per Game = 3.68*TPAPA + 2.42*TRAPC - 318.12*TFPP
2008 – Points Allowed per Game = 3.62*TPAPA + 2.89*TRAPC - 348.70*TFPP
Combined – Points Allowed per Game = 3.61*TPAPA + 2.70*TRAPC - 331.02*TFPP
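For anyone curious how a formula of this shape can be fit, here is a rough sketch of ordinary least squares with no constant term. The formulas above show no intercept, so the sketch assumes a fit through the origin; the post does not say which tool produced the published numbers, and this is not a reconstruction of it. The function name and the five stat lines are made up, and the abbreviations are read as total passing allowed per attempt, total rushing allowed per carry, and turnovers forced per play.

```python
def fit_through_origin(rows, targets):
    """Least-squares coefficients b minimizing sum((y - x.b)^2), no intercept.

    rows: list of equal-length feature lists; targets: observed values.
    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.
    """
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Made-up defensive stat lines: [TPAPA, TRAPC, TFPP] for five teams.
rows = [
    [6.0, 3.5, 0.020],
    [7.2, 4.1, 0.035],
    [6.6, 4.8, 0.015],
    [8.0, 3.9, 0.028],
    [5.5, 4.4, 0.040],
]
# Targets generated from the combined formula, so the fit should
# return roughly those same coefficients.
true_coef = [3.61, 2.70, -331.02]
targets = [sum(c * x for c, x in zip(true_coef, row)) for row in rows]
coef = fit_through_origin(rows, targets)
print([round(c, 2) for c in coef])
```

With real data the targets would be each team's actual points allowed per game rather than values generated from the formula.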
The most readily apparent issue is that the offensive and defensive results differ. What’s important to keep in mind is that this is not a game-by-game analysis. What we have done here is attempt to put together a formula that will predict season-long scoring performance on a per game basis. That prediction is based on the strength of a team’s individual passing, running, and ball protection capabilities. Admittedly, though, it does raise the question of what a game-specific correlation and regression analysis would show. Using game data, then, here are the results for each season and both together, keeping in mind that the results for points scored and points allowed will necessarily be identical here:
Rank | Statistic | 2009 Coefficient | 2008 Coefficient | Average |
1 | Yards | 0.756 | 0.762 | 0.759 |
2 | Yards per Play | 0.739 | 0.743 | 0.741 |
3 | Passing Efficiency | 0.671 | 0.684 | 0.678 |
4 | Yards per Possession | 0.650 | 0.644 | 0.647 |
5 | Total Passing per Attempt | 0.641 | 0.637 | 0.639 |
6 | Yards per Attempt | 0.597 | 0.597 | 0.597 |
7 | Rushing Yards | 0.543 | 0.531 | 0.537 |
8 | Total Rushing | 0.525 | 0.514 | 0.520 |
9 | Yards per Carry | 0.524 | 0.514 | 0.519 |
10 | Total Passing | 0.444 | 0.462 | 0.453 |
11 | Total Rushing per Carry | 0.450 | 0.443 | 0.446 |
12 | Passing Yards | 0.408 | 0.426 | 0.417 |
13 | Turnovers per Play | -0.328 | -0.354 | -0.341 |
14 | Sacks | -0.350 | -0.305 | -0.327 |
15 | Turnovers per Possession | -0.309 | -0.334 | -0.322 |
16 | Interceptions | -0.296 | -0.296 | -0.296 |
17 | Turnovers | -0.280 | -0.309 | -0.295 |
18 | Sacks per Attempt | -0.286 | -0.276 | -0.281 |
19 | Interceptions per Attempt | -0.274 | -0.260 | -0.267 |
20 | Fumbles per Carry | -0.196 | -0.185 | -0.190 |
21 | Punt Return Average | 0.112 | 0.188 | 0.150 |
22 | Kickoff Return Average | 0.187 | 0.092 | 0.140 |
23 | Net Kickoff Average | 0.132 | 0.095 | 0.113 |
24 | Fumbles | 0.095 | 0.080 | 0.087 |
25 | Net Punting Average | -0.071 | -0.051 | -0.061 |
2009 – Points = 2.72*TPYPA + 2.65*TRYPC - 127.05*TOPP
2008 – Points = 2.84*TPYPA + 2.78*TRYPC - 142.41*TOPP
Combined – Points = 2.78*TPYPA + 2.71*TRYPC - 134.61*TOPP
Generally speaking, we see lower correlation values across the board when we look at the games individually. My first thought is that a single game shows more variation and more effect from other inputs, such as a team’s defense and its effect on points scored. This includes not only defensive and/or special teams scores but also field position, turnovers forced, etc. These factors may wash out over the course of the year, such that season-long performance stats are more directly related to a unit’s own capability and less correlated with other units’ performance, although the other units clearly will still have an impact.
Another interesting note is that turnovers cost teams more points in their season averages than they do in an individual game. Similarly, efficient passing games create more points over the long haul than in a single matchup. I’m not completely sure what to make of that aspect of the results, but intuitively it seems that perhaps turnovers can be overcome in a single game but teams that have a consistent problem with gifting possession to their opponents find it difficult to always overcome the issue. On the passing game side of the coin, perhaps the efficient passing games statistically lead to fewer points on their own as leading teams revert to shorter passes and their run game to milk the clock. That one is a bit trickier to understand.
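To put rough numbers on that comparison, the sketch below plugs the same change in an input into the combined season-level and game-level formulas and reports the difference in predicted points. The coefficients are the ones quoted in this post; the helper name and the size of each change (one extra turnover per 100 plays, one extra yard per pass attempt) are chosen only for illustration.

```python
SEASON = {"tpypa": 3.88, "trypc": 1.93, "topp": -254.40}  # combined season-level fit
GAME = {"tpypa": 2.78, "trypc": 2.71, "topp": -134.61}    # combined game-level fit

def point_swing(coef, d_tpypa=0.0, d_trypc=0.0, d_topp=0.0):
    """Change in predicted points for a given change in each input."""
    return (coef["tpypa"] * d_tpypa
            + coef["trypc"] * d_trypc
            + coef["topp"] * d_topp)

# One extra turnover per 100 plays, i.e. +0.01 turnovers per play.
season_turnover = point_swing(SEASON, d_topp=0.01)  # about -2.54 points per game
game_turnover = point_swing(GAME, d_topp=0.01)      # about -1.35 points

# One extra yard per pass attempt.
season_pass = point_swing(SEASON, d_tpypa=1.0)      # about +3.88 points per game
game_pass = point_swing(GAME, d_tpypa=1.0)          # about +2.78 points

print(f"turnover swing: season {season_turnover:.2f}, game {game_turnover:.2f}")
print(f"passing swing:  season {season_pass:.2f}, game {game_pass:.2f}")
```

By these formulas the same turnover rate costs nearly twice as much in the season-long fit as in the single-game fit, while the rushing coefficient moves the other way (1.93 per yard per carry for the season against 2.71 for a game).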
Football will remain the most difficult of the popular sports to analyze statistically because it requires simultaneous action among 22 players on the field. All of these actions take place completely in parallel on any given play, leading to a tremendous amount of interdependency as the players accrue their individual statistics. This helped lead me to tackle the team statistics first; they are far simpler to handle because they treat the game as an interaction between two teams in discrete events, be they plays, possessions, or games. Is it impossible to rate football players with better statistical analysis? No, but the amount of data acquisition would be staggering, and even that can come only after careful consideration of what interactions we should track. For now I am resigned to stop at the team level and hope that we can glean at least some useful conclusions there.
Thoughts, comments, requests, and suggestions welcome.