Adjusted Stats 2009 Year in Review Part II

This is the second in a series of posts that will review the 2009 season (as well as 2008) from an adjusted stats perspective.

Tentative Publication Schedule

Part I: Which Stats Correlate Best to Winning? - 6/24/10
Part II: Drilling Down and Regression Models - 6/28/10
Part III: Passing Efficiency Formula - 6/30/10
Part IV: Conference Strengths and Pace - 7/2/10
Part V: Testing Conventional Wisdom - 7/16/10
Part VI: Team Matchups - 7/20/10
Part VII: Year-to-Year Changes - 7/22/10
Part VIII: Points per Yard Efficiencies - 7/26/10
Part IX: Data Dump - Team Rankings - 7/28/10
Part X: Data Dump - Team Reports - 7/30/10

Drilling Down and Regression Models

The previous post provided correlation coefficients between individual statistics and adjusted winning percentage, but I did not examine them in more detail there because I believe the best way to approach the question of what contributes most to winning is level by level. The first level consists of the wins and losses themselves. The second level is points scored and points allowed, because these are the statistics that contribute directly to winning and losing. So instead of attempting to correlate everything directly to winning, we can ask which statistics correlate best with scoring points and which correlate best with allowing (or preventing) points. Let’s start with points scored per game. Leaving out the other point-scoring calculations, here are the offensive stats and their correlations with points scored, ranked by the absolute value of the two-season average (note that I have left out conversion percentages and other statistics that are themselves a result of the contributing stats below):

Rank Statistic 2009 Coefficient 2008 Coefficient Average
1 Yards per Play 0.870 0.880 0.875
2 Yards per Game 0.860 0.880 0.870
3 Passing Efficiency 0.854 0.858 0.856
4 Total Passing per Attempt 0.836 0.857 0.847
5 Yards per Possession 0.818 0.844 0.831
6 Yards per Attempt 0.778 0.832 0.805
7 Yards per Carry 0.665 0.604 0.635
8 Total Rushing per Carry 0.603 0.558 0.580
9 Rushing Yards per Game 0.545 0.493 0.519
10 Turnovers per Play -0.525 -0.495 -0.510
11 Total Passing per Game 0.453 0.556 0.505
12 Total Rushing per Game 0.521 0.474 0.497
13 Sacks Allowed per Game -0.520 -0.434 -0.477
14 Passing Yards per Game 0.403 0.519 0.461
15 Interceptions Thrown per Attempt -0.479 -0.406 -0.443
16 Turnovers per Possession -0.452 -0.416 -0.434
17 Turnovers per Game -0.448 -0.418 -0.433
18 Sacks Allowed per Attempt -0.452 -0.399 -0.425
19 Interceptions Thrown per Game -0.472 -0.358 -0.415
20 Punt Return Average 0.373 0.410 0.391
21 Kickoff Return Average 0.529 0.147 0.338
22 Fumbles per Carry -0.326 -0.308 -0.317
23 Fumbles per Game -0.172 -0.148 -0.160
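
A ranking like the one above takes only a few lines of code to produce. This is a minimal sketch, assuming Pearson correlation (the post does not name the coefficient type, though Pearson is the usual default) and assuming the sort is by the absolute value of the two-season average, as the table's ordering suggests. The data layout, the `rank_stats` helper, and the toy values are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_stats(seasons, target="points_per_game"):
    """Correlate each stat with the target within each season, average the
    per-season coefficients, and sort by the absolute value of that average."""
    names = [k for k in seasons[0] if k != target]
    rows = []
    for name in names:
        coeffs = [pearson(s[name], s[target]) for s in seasons]
        rows.append((name, coeffs, sum(coeffs) / len(coeffs)))
    rows.sort(key=lambda row: abs(row[2]), reverse=True)
    return rows


# Hypothetical toy data: one list entry per team, one dict per season.
season_2008 = {
    "points_per_game": [20, 25, 30, 35],
    "yards_per_play": [5.0, 5.5, 6.0, 6.5],
    "turnovers_per_play": [0.030, 0.020, 0.025, 0.010],
}
season_2009 = {
    "points_per_game": [18, 24, 36, 30],
    "yards_per_play": [4.8, 5.4, 6.6, 6.0],
    "turnovers_per_play": [0.028, 0.022, 0.012, 0.020],
}

for name, coeffs, avg in rank_stats([season_2008, season_2009]):
    print(f"{name}: {coeffs[0]:.3f} {coeffs[1]:.3f} avg {avg:.3f}")
```

Sorting on the absolute value is what lets negative correlations such as turnovers per play sit among the positive ones in the table.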

It’s immediately apparent that the per game stats are generally less correlated than the rate, or per play, stats, with yards per game the notable exception. Yards per play appears at the top of the list, as we should have expected. It may seem that we should treat yards per play as the next level and then determine correlations to that number. However, I am going to skip that level, as it is simply a weighted average of the total passing per attempt and total rushing per carry numbers, weighted by a team’s play selection ratio. Another thing worth mentioning is that passing efficiency, included above, is simply a combination of stats already included (specifically Yards per Attempt and Interceptions Thrown per Attempt) plus the non-included completion percentage and touchdowns per attempt. Determining a better Passing Efficiency formula is a study of its own.
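
Both decompositions in the paragraph above can be written out directly. This is a sketch: the function names are hypothetical, and the `pass_plays` denominator is an assumption (for a "total passing" stat it would plausibly include sacks). The passing efficiency function uses the standard NCAA passer efficiency formula, which per attempt works out to 8.4 x yards per attempt + 330 x touchdowns per attempt + 100 x completion percentage - 200 x interceptions per attempt.

```python
def yards_per_play(pass_yards, pass_plays, rush_yards, rush_plays):
    """Yards per play computed from raw totals."""
    return (pass_yards + rush_yards) / (pass_plays + rush_plays)


def yards_per_play_weighted(pass_rate, rush_rate, pass_share):
    """The same quantity as a weighted average of the passing and rushing
    per-play rates, weighted by the play selection ratio (share of passes)."""
    return pass_share * pass_rate + (1.0 - pass_share) * rush_rate


def ncaa_passing_efficiency(completions, attempts, yards, touchdowns, interceptions):
    """NCAA passing efficiency: a fixed linear combination of per-attempt rates."""
    return (8.4 * yards + 330 * touchdowns + 100 * completions
            - 200 * interceptions) / attempts


# Hypothetical game: 400 passing yards on 50 pass plays, 180 rushing on 40 carries.
direct = yards_per_play(400, 50, 180, 40)
weighted = yards_per_play_weighted(400 / 50, 180 / 40, 50 / 90)
print(round(direct, 3), round(weighted, 3))  # both 6.444

print(ncaa_passing_efficiency(20, 30, 250, 2, 1))  # 152.0
```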

And how would we attempt to improve that formula? Regression analysis offers a quick and easy way, thanks to available technology. The first place I wanted to use it, though, was to generate a formula approximating the number of points per game a team should expect to score based on the most important contributing statistics: Total Passing Yards per Attempt (TPYPA), Total Rushing Yards per Carry (TRYPC), and Turnovers per Play (TOPP). Here are the results for each of the past two seasons and for both combined:

2009 – Points per Game = 3.69*TPYPA + 2.02*TRYPC - 249.46*TOPP
2008 – Points per Game = 4.10*TPYPA + 1.85*TRYPC - 267.42*TOPP
Combined - Points per Game = 3.88*TPYPA + 1.93*TRYPC - 254.40*TOPP
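
Applying one of these formulas is a single weighted sum. A small sketch using the coefficients exactly as written above (no constant term appears in them); the dictionary and function names are hypothetical, as are the team's inputs.

```python
# Season-level coefficients as reported above: (TPYPA, TRYPC, TOPP).
MODELS = {
    "2009": (3.69, 2.02, -249.46),
    "2008": (4.10, 1.85, -267.42),
    "combined": (3.88, 1.93, -254.40),
}


def predicted_points_per_game(tpypa, trypc, topp, model="combined"):
    """Expected points per game from the three contributing rate stats."""
    a, b, c = MODELS[model]
    return a * tpypa + b * trypc + c * topp


# Hypothetical team: 7.0 TPYPA, 4.5 TRYPC, 0.015 turnovers per play.
expected = predicted_points_per_game(7.0, 4.5, 0.015)
print(round(expected, 2))  # 32.03
```

Because TOPP is a small number, its coefficient is large: in the combined model, an increase of 0.01 in turnovers per play lowers the prediction by about 2.54 points per game.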

All of the regression results were highly significant. For the combined version, with a sample of 240, the p-values for TPYPA, TRYPC, and TOPP were 1.53E-36, 3.87E-15, and 1.91E-35, respectively, with t-statistics of 15.10, 8.41, and -14.77, respectively. Given this fit, we would expect teams scoring well above or below their predicted points per game to have extraordinary defensive numbers, pace numbers, special teams numbers, or a combination thereof. Here are the defensive numbers, both correlation and regression:
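
The post does not say which tool produced these fits. As a sketch, assuming ordinary least squares with no intercept (no constant term appears in the reported formulas), the coefficients, t-statistics, and residuals can be computed in plain Python. The data below is synthetic, generated from known coefficients purely to show the mechanics, and the helper names are hypothetical. The residuals are what you would sort to find teams scoring well above or below their prediction; p-values would additionally require the Student's t distribution with n - k degrees of freedom, which is omitted here.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_no_intercept(X, y):
    """Fit y = X @ beta by least squares with no constant term.
    Returns (beta, t_stats, residuals)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    residuals = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in residuals) / (n - k)  # residual variance estimate
    t_stats = [beta[a] / math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, t_stats, residuals


# Synthetic "team seasons" built from known coefficients plus noise.
rng = random.Random(42)
X, y = [], []
for _ in range(240):
    tpypa, trypc, topp = rng.uniform(5, 9), rng.uniform(3, 6), rng.uniform(0.01, 0.03)
    X.append([tpypa, trypc, topp])
    y.append(3.9 * tpypa + 1.9 * trypc - 250.0 * topp + rng.gauss(0.0, 0.5))

beta, t_stats, residuals = ols_no_intercept(X, y)
# Rows scoring furthest above or below their fitted value.
biggest_misses = sorted(range(len(y)), key=lambda i: abs(residuals[i]), reverse=True)[:5]
print([round(b, 2) for b in beta], [round(t, 1) for t in t_stats])
```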

Rank Statistic 2009 Coefficient 2008 Coefficient Average
1 Yards Allowed per Game 0.916 0.909 0.912
2 Yards Allowed per Play 0.904 0.893 0.899
3 Passing Efficiency Allowed 0.869 0.866 0.867
4 Yards Allowed per Possession 0.874 0.844 0.859
5 Total Passing Allowed per Attempt 0.872 0.830 0.851
6 Rushing Yards Allowed per Game 0.821 0.873 0.847
7 Total Rushing Allowed per Game 0.810 0.867 0.838
8 Yards Allowed per Attempt 0.857 0.788 0.823
9 Yards Allowed per Carry 0.800 0.845 0.822
10 Total Rushing Allowed per Carry 0.756 0.793 0.774
11 Sacks per Game -0.653 -0.604 -0.628
12 Interceptions per Game -0.645 -0.599 -0.622
13 Turnovers Forced per Play -0.581 -0.588 -0.585
14 Net Kickoff Average -0.564 -0.522 -0.543
15 Total Passing Allowed per Game 0.626 0.457 0.542
16 Sacks per Attempt -0.585 -0.497 -0.541
17 Interceptions per Attempt -0.549 -0.520 -0.535
18 Turnovers Forced per Game -0.528 -0.526 -0.527
19 Turnovers Forced per Possession -0.502 -0.513 -0.507
20 Passing Yards Allowed per Game 0.573 0.375 0.474
21 Net Punting Average -0.277 -0.320 -0.299
22 Fumbles Forced per Carry -0.213 -0.273 -0.243
23 Fumbles Forced per Game -0.010 -0.075 -0.042

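2009 – Points Allowed per Game = 3.68*TPAPA + 2.42*TRAPC - 318.12*TFPP
2008 – Points Allowed per Game = 3.62*TPAPA + 2.89*TRAPC - 348.70*TFPP
Combined - Points Allowed per Game = 3.61*TPAPA + 2.70*TRAPC - 331.02*TFPP

Here TPAPA is Total Passing Allowed per Attempt, TRAPC is Total Rushing Allowed per Carry, and TFPP is Turnovers Forced per Play.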

The most readily apparent issue is that the offensive and defensive coefficients differ; in the combined models, for example, each additional yard per carry is worth 1.93 points per game on offense but 2.70 points allowed per game on defense. What’s important to keep in mind, then, is that this is not a game-by-game analysis. What we have done here is attempt to put together a formula that predicts season-long scoring performance on a per game basis, based on the strength of a team’s individual passing, running, and ball protection capabilities. Admittedly, though, it does raise the question of what a game-specific correlation and regression analysis would show. Using game data, then, here are the results for each season and both together, keeping in mind that the results for points scored and points allowed will necessarily be identical here, since every team’s points scored in a game are its opponent’s points allowed:

Rank Statistic 2009 Coefficient 2008 Coefficient Average
1 Yards 0.756 0.762 0.759
2 Yards per Play 0.739 0.743 0.741
3 Passing Efficiency 0.671 0.684 0.678
4 Yards per Possession 0.650 0.644 0.647
5 Total Passing per Attempt 0.641 0.637 0.639
6 Yards per Attempt 0.597 0.597 0.597
7 Rushing Yards 0.543 0.531 0.537
8 Total Rushing 0.525 0.514 0.520
9 Yards per Carry 0.524 0.514 0.519
10 Total Passing 0.444 0.462 0.453
11 Total Rushing per Carry 0.450 0.443 0.446
12 Passing Yards 0.408 0.426 0.417
13 Turnovers per Play -0.328 -0.354 -0.341
14 Sacks -0.350 -0.305 -0.327
15 Turnovers per Possession -0.309 -0.334 -0.322
16 Interceptions -0.296 -0.296 -0.296
17 Turnovers -0.280 -0.309 -0.295
18 Sacks per Attempt -0.286 -0.276 -0.281
19 Interceptions per Attempt -0.274 -0.260 -0.267
20 Fumbles per Carry -0.196 -0.185 -0.190
21 Punt Return Average 0.112 0.188 0.150
22 Kickoff Return Average 0.187 0.092 0.140
23 Net Kickoff Average 0.132 0.095 0.113
24 Fumbles 0.095 0.080 0.087
25 Net Punting Average -0.071 -0.051 -0.061

2009 – Points = 2.72*TPYPA + 2.65*TRYPC - 127.05*TOPP
2008 – Points = 2.84*TPYPA + 2.78*TRYPC - 142.41*TOPP
Combined - Points = 2.78*TPYPA + 2.71*TRYPC - 134.61*TOPP
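
Why must the game-level results for points scored and points allowed match? If the game data holds one row per team per game, every game appears twice, once from each side, so the set of (yards, points) pairs for offenses is exactly the set of (yards allowed, points allowed) pairs for defenses. A minimal sketch with hypothetical box scores, assuming that two-rows-per-game layout (the post does not describe its data format):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical box scores: (team_a, yards_a, points_a, team_b, yards_b, points_b)
games = [
    ("A", 410, 31, "B", 290, 17),
    ("C", 350, 24, "D", 380, 27),
    ("A", 300, 14, "C", 420, 35),
    ("B", 455, 38, "D", 260, 10),
]

# Each game yields one row per team: its own offense and the offense it faced.
rows = []
for ta, ya, pa, tb, yb, pb in games:
    rows.append({"team": ta, "yards": ya, "points": pa,
                 "yards_allowed": yb, "points_allowed": pb})
    rows.append({"team": tb, "yards": yb, "points": pb,
                 "yards_allowed": ya, "points_allowed": pa})

r_offense = pearson([r["yards"] for r in rows], [r["points"] for r in rows])
r_defense = pearson([r["yards_allowed"] for r in rows], [r["points_allowed"] for r in rows])
print(round(r_offense, 3), round(r_defense, 3))  # the same value twice
```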

Generally speaking, we see lower correlation values across the board when we look at games individually. My first thought is that a single game shows more variation and more influence from other inputs, such as a team’s defense and its effect on points scored. This includes not only defensive and special teams scores but also field position, turnovers forced, and so on. These factors may wash out over the course of the year, so that season-long performance stats relate more directly to a unit’s own capability and correlate less with other units’ performance, although the other units clearly still have an impact.
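
The wash-out explanation can be illustrated with a toy simulation. Every number below (team count, games per season, noise sizes, seed) is an invented assumption, not an estimate from the post's data. Each team has a fixed true efficiency, each game's stat is that efficiency plus noise, and each game's points equal the stat plus a larger term standing in for defense, field position, and special teams. Averaging over a season shrinks that outside term, so in this model the season-level correlation comes out higher than the game-level one.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_teams=120, n_games=12, stat_noise=1.0, other_noise=3.0, seed=7):
    """Return (game-level r, season-level r) under the toy model described above."""
    rng = random.Random(seed)
    game_stat, game_pts, season_stat, season_pts = [], [], [], []
    for _ in range(n_teams):
        true_eff = rng.gauss(0.0, 1.0)  # hypothetical team strength
        stats, pts = [], []
        for _ in range(n_games):
            s = true_eff + rng.gauss(0.0, stat_noise)
            stats.append(s)
            pts.append(s + rng.gauss(0.0, other_noise))  # "everything else" term
        game_stat.extend(stats)
        game_pts.extend(pts)
        season_stat.append(sum(stats) / n_games)
        season_pts.append(sum(pts) / n_games)
    return pearson(game_stat, game_pts), pearson(season_stat, season_pts)


r_game, r_season = simulate()
print(round(r_game, 2), round(r_season, 2))
```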

Another interesting note is that turnovers cost teams more points in their season averages than they do in an individual game (a TOPP coefficient of -254.40 in the combined season model versus -134.61 in the combined game model). Similarly, efficient passing games create more points over the long haul than in a single matchup (3.88 versus 2.78 for TPYPA). I’m not completely sure what to make of that aspect of the results, but intuitively it seems that turnovers can be overcome in a single game, while teams that consistently gift possession to their opponents find it difficult to overcome the issue every time. On the passing side of the coin, perhaps efficient passing produces fewer points on its own within a single game because teams that build a lead revert to shorter passes and the run game to milk the clock. That one is a bit trickier to understand.
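
Putting the combined season-level and game-level coefficients side by side makes the comparison concrete. The coefficients are copied from the formulas above; the 70 plays per game used to translate TOPP into a per-turnover cost is a hypothetical figure, not from the post. Note that the rushing coefficient moves the other way (1.93 for TRYPC in the season model versus 2.71 in the game model).

```python
# Combined-model coefficients copied from the formulas above.
season = {"TPYPA": 3.88, "TRYPC": 1.93, "TOPP": -254.40}  # season-level model
game = {"TPYPA": 2.78, "TRYPC": 2.71, "TOPP": -134.61}    # game-level model

for term in season:
    ratio = season[term] / game[term]
    print(f"{term}: season {season[term]:+.2f}, game {game[term]:+.2f}, ratio {ratio:.2f}")

# Hypothetical: cost of one extra turnover per game for a team running 70 plays.
plays_per_game = 70  # assumed, not from the source
extra_topp = 1 / plays_per_game
season_cost = season["TOPP"] * extra_topp
game_cost = game["TOPP"] * extra_topp
print(round(season_cost, 2), round(game_cost, 2))
```

Under that assumption, one extra turnover per game costs about 3.63 points per game in the season model but about 1.92 points in the game model.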

Football will remain the most difficult of popular sports to analyze statistically because it involves continuous action among 22 players on the field at the same time. All of these actions take place in parallel on any given play, leading to a tremendous amount of interdependency as the players accrue their individual statistics. This is part of why I chose to tackle team statistics first; they are far simpler to handle because the interaction between two teams occurs in discrete events, be they plays, possessions, or games. Is it impossible to rate football players through better statistical analysis? No, but the data acquisition required would be staggering, and even that could begin only after careful consideration of which interactions we should track. For now I am resigned to stopping at the team level and hope that we can glean at least some useful conclusions there.

Thoughts, comments, requests, and suggestions welcome.