This is the third in a series of posts that will review the 2009 season (as well as 2008) from an adjusted stats perspective.
Tentative Publication Schedule
Part I: Which Stats Correlate Best to Winning? - 6/24/10
Part II: Drilling Down and Regression Models - 6/28/10
Part III: Passing Efficiency Formula - 6/30/10
Part IV: Conference Strengths and Pace - 7/2/10
Part V: Testing Conventional Wisdom - 7/16/10
Part VI: Team Matchups - 7/20/10
Part VII: Year-to-Year Changes - 7/22/10
Part VIII: Points per Yard Efficiencies - 7/26/10
Part IX: Data Dump - Team Rankings - 7/28/10
Part X: Data Dump - Team Reports - 7/30/10
Not that anyone noticed, but Part II was a day late. Hopefully I'm back on track now. I realize that HenryJames was actually ecstatic to get an extra day of recovery after the series premiered.
Passing Efficiency Formula
The first couple of posts have dealt directly with attempting to get a better handle on the big picture and what components are most important to winning and points production. During Part II, though, I briefly mentioned that I was interested in taking a look at the NCAA passing efficiency formula. Today I'll take that look.
The NCAA built its passing efficiency calculation around the average passer in 1979, with the goal that an average passer would have a rating of 100. The passing game has clearly changed over the last 30 years: a passing efficiency of 100 today is well below average, and in fact quite awful, for modern college football.
Recalculating the formula so that the average passer has a 100 rating for the 2009 season is fairly straightforward. In my 2009 database (note again that games against FCS opponents are not included), FBS teams combined for 26,167 completions on 44,589 attempts, good for 315,819 yards with 2,112 touchdowns and 1,373 interceptions. That works out to a completion percentage of about 58.68 and about 7.083 yards per attempt.
We start by making sure that the completion percentage and yards per attempt factors add to 100 for the average passer. This requires a yards per attempt coefficient of (100 - 58.68) / 7.083, or approximately 5.833, which we will round to 5.8. Next we have to determine not only the ratio between the touchdown per attempt and interception per attempt coefficients, so that the two terms cancel out for the average passer, but also the scale for both. I elected to scale touchdowns so that they are worth the same relative to yards per attempt as they are in the standard formula (330/8.4, or about 39.3 yards each). This resulted in a touchdown per attempt coefficient of about 229.16, which I will round to 230. The interception coefficient then follows from the cancellation requirement: 229.16 * 2,112 / 1,373 is about 352.5, now rounded to 350. Doing the same for 2008 yields the following results:
Standard – PEFF = (8.4*Yards + 100*Comps + 330*TDs - 200*INTs)/Attempts
2009 – PEFF = (5.8*Yards + 100*Comps + 230*TDs - 350*INTs)/Attempts
2008 – PEFF = (6.0*Yards + 100*Comps + 235*TDs - 360*INTs)/Attempts
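For anyone who wants to check the arithmetic, here is a minimal Python sketch of the recalculation described above. The league totals are the 2009 figures quoted in this post; the function and variable names are made up for illustration and are not part of any NCAA tool.

```python
# League totals for 2009 as quoted in the post (games against FCS teams excluded).
COMPLETIONS = 26_167
ATTEMPTS = 44_589
YARDS = 315_819
TOUCHDOWNS = 2_112
INTERCEPTIONS = 1_373

# Yards and touchdown weights from the standard NCAA formula.
STD_YARDS, STD_TD = 8.4, 330.0


def year_adjusted_coefficients(comps, atts, yards, tds, ints):
    """Return (yards, td, int) weights that rate the average passer at 100."""
    comp_pct = 100.0 * comps / atts  # completion term keeps its weight of 100
    ypa = yards / atts
    # Completion percentage plus the yards term must total 100.
    yards_coef = (100.0 - comp_pct) / ypa
    # Keep a touchdown worth the same multiple of a yard as in the standard formula.
    td_coef = yards_coef * STD_TD / STD_YARDS
    # Choose the interception weight so the TD and INT terms cancel on average.
    int_coef = td_coef * tds / ints
    return yards_coef, td_coef, int_coef


yards_coef, td_coef, int_coef = year_adjusted_coefficients(
    COMPLETIONS, ATTEMPTS, YARDS, TOUCHDOWNS, INTERCEPTIONS
)
print(f"yards: {yards_coef:.3f}  td: {td_coef:.2f}  int: {int_coef:.2f}")
```

Run against the 2009 totals, this should land on roughly 5.833, 229.16 and 352.5, the unrounded values behind the 2009 formula.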
In 1979 quarterbacks threw more interceptions than touchdowns, at a ratio about equal to the one by which touchdowns outnumber interceptions today. And, as we expected, the yards per attempt coefficient has fallen drastically, since higher completion percentages and higher yards per attempt numbers both push it down. By using a year-adjusted formula we are immediately able to make better comparisons of quarterback performances from one year or era to those from another. A 100 rating is league average whether the player threw his passes in 1979 or 2009. This would, of course, make it more difficult for every school to have its starting QB break the school passing efficiency record twice a decade, but the records would be more meaningful. Additionally, year-adjusting the formula could be done on a weekly basis without much effort as long as all season totals are tracked, which the NCAA does.
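To see what "100 is league average" means in practice, the sketch below feeds the combined 2009 FBS totals through both the standard formula and the rounded 2009 formula. The helper name and the tuple layout for the weights are assumptions made for this example only.

```python
def passer_rating(yards, comps, tds, ints, atts, coefs):
    """Apply a PEFF formula; coefs = (yards, completions, td, int) weights."""
    y_w, c_w, td_w, int_w = coefs
    return (y_w * yards + c_w * comps + td_w * tds - int_w * ints) / atts


# Weights taken from the formulas listed in the post.
STANDARD = (8.4, 100, 330, 200)
ADJ_2009 = (5.8, 100, 230, 350)

# Combined 2009 totals, treated here as one "average passer".
league_2009 = dict(yards=315_819, comps=26_167, tds=2_112, ints=1_373, atts=44_589)

standard_avg = passer_rating(**league_2009, coefs=STANDARD)
adjusted_avg = passer_rating(**league_2009, coefs=ADJ_2009)
print(f"standard: {standard_avg:.2f}  2009-adjusted: {adjusted_avg:.2f}")
```

The league as a whole comes out in the high 120s on the standard formula and just under 100 on the 2009 version. The small gap from exactly 100 comes from rounding the coefficients.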
So clearly this formula has two major advantages. Not only are comparisons between seasons more meaningful, but fans and media alike are better able to ascertain how far above or below average a quarterback’s performance has been. We’ve identified the coefficients necessary to accomplish this for the factors that the NCAA uses in the current formula. But are these factors the right ones?
Passing efficiency is obviously intended to put a value to a quarterback’s effectiveness. There were a couple of questions I had in mind before attempting to put together a new formula.
- Is completion percentage worth including? My hunch was that it is fairly worthless to include as yards per attempt is already included. In fact, including both is slightly illogical because completion percentage is itself a component of yards per attempt (multiply completion percentage by yards per completion and there you go).
- Should touchdowns per attempt be included? It has always struck me as odd to include a stat that contains the answer in the formula where you try to calculate that answer. Touchdowns are the name of the game for an offense, so isn’t it odd to include touchdowns in the efficiency calculation?
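The redundancy raised in the first question can be written out explicitly. With C for completions, A for attempts and Y for yards (symbols introduced here just for illustration), and plugging in the 2009 totals quoted earlier:

```latex
\[
\frac{Y}{A} \;=\; \frac{C}{A} \times \frac{Y}{C}
\qquad\text{e.g.}\qquad
\frac{315{,}819}{44{,}589} \;=\; \frac{26{,}167}{44{,}589} \times \frac{315{,}819}{26{,}167}
\;\approx\; 0.5868 \times 12.069 \;\approx\; 7.08
\]
```

So any formula that includes both terms is counting completion percentage twice, once directly and once inside yards per attempt.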
Consideration of the second question contributed to my plan to answer the first. The second issue really comes down to what we are trying to measure with passing efficiency. Do we intend for the value to correlate with winning or with scoring points for the offense? If we intend to correlate it to winning then it’s somewhat reasonable to try to include touchdowns per attempt. Correlating a single offensive player’s production directly to wins may not be completely wise, as it ignores the contributions of not only his offensive teammates but the other side of the ball completely. However, the desired end result of every statistical analysis should eventually be to determine how much a contribution aids a player’s or team’s quest for their only goal: winning. But if we correlate it to scoring points, then we are essentially including our dependent variable among the independent variables of a regression analysis, and I expected it to dwarf the correlations of the other stats. So my eventual decision was to include the stat in one version of the formula correlated to winning and not to include it in another version correlated to points production.
At this point, having decided to run the correlations and regression analyses for both winning and points production, I ran them with an eye on completion percentage. Here are the correlations for each of the four stats included in the NCAA passing efficiency formula for both 2008 and 2009. The first table is correlation to winning percentage and the second is correlation to points scored per game (these use unadjusted stats as I don’t calculate adjusted completion percentage or touchdowns per attempt).
Correlation to Winning Percentage:
Rank | Statistic | 2009 Coefficient | 2008 Coefficient | Average |
1 | Yards per Attempt | 0.606 | 0.649 | 0.628 |
2 | Touchdowns per Attempt | 0.613 | 0.579 | 0.596 |
3 | Completion Percentage | 0.264 | 0.459 | 0.362 |
4 | Interceptions per Attempt | -0.294 | -0.355 | -0.325 |
Correlation to Points Scored per Game:
Rank | Statistic | 2009 Coefficient | 2008 Coefficient | Average |
1 | Touchdowns per Attempt | 0.756 | 0.809 | 0.782 |
2 | Yards per Attempt | 0.712 | 0.814 | 0.763 |
3 | Completion Percentage | 0.447 | 0.607 | 0.527 |
4 | Interceptions per Attempt | -0.455 | -0.461 | -0.458 |
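For readers who want to reproduce correlations like the ones in these tables from their own data, here is a minimal sketch of a Pearson correlation in plain Python. The six team values are invented purely for illustration and are not from the 2008 or 2009 databases. This post does not say which tool produced the published coefficients, so treat this as one reasonable way to do it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical team-level values, made up for this example.
yards_per_attempt = [8.9, 7.6, 7.1, 6.4, 5.8, 6.9]
win_pct = [0.846, 0.692, 0.500, 0.417, 0.250, 0.583]

r = pearson(yards_per_attempt, win_pct)
print(f"correlation of yards per attempt to winning percentage: {r:.3f}")
```

The same function works for any of the four stats against either winning percentage or points per game.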
Touchdowns per attempt did not end up dwarfing yards per attempt by any means, even in correlation directly to points scored. Completion percentage, while positively and significantly correlated, is not nearly as strong as yards per attempt, and the difference between its 2008 and 2009 numbers is extreme compared to the others. For those reasons I decided to omit completion percentage moving forward. The next step is to run the regression analyses for each relationship. After running the linear regression directly to adjusted winning percentage and points per game, I scaled the coefficients so that the overall average would be 100 in each case. I did this by multiplying all coefficients by 200 for the winning percentage regression (average winning percentage is 0.500) and by 100 divided by average points per game for the second set. The results:
Winning Percentage Regression:
2009 – PEFF = (8.7*Yards + 1220*TDs - 650*INTs)/Attempts
2008 – PEFF = (15.1*Yards + 440*TDs - 770*INTs)/Attempts
Points per Game Regression:
2009 – PEFF = (10.9*Yards + 690*TDs - 350*INTs)/Attempts
2008 – PEFF = (12.5*Yards + 580*TDs - 415*INTs)/Attempts
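The regression and rescaling steps can be sketched as follows. Several things here are assumptions. The fit has no intercept, which matches the shape of the published formulas but is not stated in the post. The team data and the "true" weights are invented so the result can be checked. The helper functions are a plain-Python stand-in for whatever stats package was actually used.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_no_intercept(rows, targets):
    """Least squares for targets ~ rows @ beta via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical per-team rates: (yards, TDs, INTs) per attempt. Not real data.
teams = [
    (8.9, 0.075, 0.020),
    (7.6, 0.040, 0.028),
    (7.1, 0.060, 0.045),
    (6.4, 0.035, 0.020),
    (5.8, 0.030, 0.042),
    (6.9, 0.055, 0.030),
]
# Points per game generated from known made-up weights so the fit is checkable.
TRUE_BETA = (3.0, 180.0, -90.0)
points_per_game = [sum(b * x for b, x in zip(TRUE_BETA, t)) for t in teams]

beta = fit_no_intercept(teams, points_per_game)

# Rescale so the average team rates 100: multiply by 100 / average points per game.
avg_ppg = sum(points_per_game) / len(points_per_game)
scaled = [b * 100.0 / avg_ppg for b in beta]
print("raw weights:   ", [round(b, 3) for b in beta])
print("scaled weights:", [round(s, 1) for s in scaled])
```

For the winning percentage version the only change is the target column and the scale factor, which becomes 200 because the average winning percentage is 0.500.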
The winning percentage regressions show a very high amount of volatility from one year to the next. The points per game versions are more consistent but still somewhat volatile. This shouldn’t be surprising given the additional factors that contribute to winning percentage, as discussed above. As expected, it seems that passing efficiency should be tied to points scored rather than to winning percentage. Because refitting a regression-based formula would be more time-consuming on a weekly basis, and because of the volatility, it would likely be more productive to adjust the existing formula simply by removing completion percentage and then reformulating so that 100 is again the average rating. The same methodology is followed, except that the yards per attempt term is now set to 100 for the average passer on its own, without consideration for completion percentage. Then, as before, the touchdown and interception factors are set to cancel each other out.
2009 – PEFF = (14.1*Yards + 555*TDs - 855*INTs)/Attempts
2008 – PEFF = (14.5*Yards + 570*TDs - 865*INTs)/Attempts
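The same recalculation without the completion term is even shorter. This sketch again uses the 2009 totals quoted earlier in the post, with a hypothetical function name, and keeps a touchdown worth the same multiple of a yard as in the standard formula.

```python
# Yards and touchdown weights from the standard NCAA formula.
STD_YARDS, STD_TD = 8.4, 330.0


def coefficients_without_completions(atts, yards, tds, ints):
    """Return (yards, td, int) weights with no completion percentage term."""
    ypa = yards / atts
    # The yards term alone must equal 100 for the average passer.
    yards_coef = 100.0 / ypa
    # Touchdowns keep their standard-formula value relative to a yard.
    td_coef = yards_coef * STD_TD / STD_YARDS
    # Interceptions are weighted so the TD and INT terms cancel on average.
    int_coef = td_coef * tds / ints
    return yards_coef, td_coef, int_coef


# 2009 FBS totals: attempts, yards, touchdowns, interceptions.
ATTS, YDS, TDS, INTS = 44_589, 315_819, 2_112, 1_373
yards_coef, td_coef, int_coef = coefficients_without_completions(ATTS, YDS, TDS, INTS)

# Sanity check: the league as a whole should rate exactly 100.
league_rating = (yards_coef * YDS + td_coef * TDS - int_coef * INTS) / ATTS
print(f"yards: {yards_coef:.2f}  td: {td_coef:.1f}  int: {int_coef:.1f}")
```

The unrounded values come out near 14.12, 554.7 and 853.2, in line with the rounded 14.1, 555 and 855 in the 2009 formula.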
Here we see much more consistency from year to year, and as mentioned previously the year-adjusted formula can be calculated very quickly at any point. I've put together two tables that compare the standard calculation with the year-adjusted calculation without completion percentage, for 2009 and 2008 respectively. Only the top 40 for each calculation are included. The very best performances show similar values under each calculation, but the year-adjusted values fall off much more rapidly, as they should now that 100 is once again the average rating. For reference, the 100th ranked quarterback in 2009 had a 110.86 rating based on the NCAA formula and a 66.06 rating with the year-adjusted formula.
Friday's Part IV will move back into the overall view as I spit out some data on the different conferences' performances. Then it's a two-week hiatus as I put in for my hard-earned vacation away from Barking Carnival Headquarters and load the family up to head to Disney World. It will be like the Griswolds except with three kids instead of two. And no dead Aunt Edna.