The idea that the park in which a game is played affects baseball statistics, both individual and team, has been accepted for years. There is plenty of literature available on park effects at the major league level, but to my knowledge only Boyd Nation has tackled the idea at the college level.
His methodology, which he describes here, is a theoretically sound system.
What I wanted to do, though, was use all games played in a park instead of only those between home-and-home team pairs. The team pair concept ignores a good percentage of games in a park and also compares teams from different seasons to each other without accounting for their differences. So I adapted my standard power rating system to account for homefield by using the following process:
- Take Nation's scores file and filter out all games played at an unidentifiable park location. To do this I assigned all regional, superregional, and College World Series games to their proper ballpark. All other games described as "@neutral" in his scores file were removed.
- Calculate each team's initial offensive and defensive ratings by dividing their runs scored and allowed by the number of games played. Calculate each park's initial park factor by dividing the number of runs scored per game in that park by the average number of runs scored per game in all parks.
- Run the standard power rating algorithm through one iteration using park-neutral scores. To obtain the park-neutral scores, divide the runs scored by each team by the park factor. For example, if Texas beat Rice 8-4 but the park factor was 0.80, then the park-neutral score used for the power rating calculation is 10-5. The algorithm compares each team's runs scored and allowed per game with the corresponding averages for the opponents it actually faced. This gives a new offensive and defensive rating for every team.
- Determine the expected number of runs for each park based on the offensive and defensive ratings of the teams that played each game in that park. Divide the actual runs in that park by the expected runs for a new park factor.
- Repeat the previous two steps (the power rating pass and the park factor update) as many times as necessary until all ratings, both team and park, stabilize. This required approximately 200 iterations for each season before the sum of the absolute values of all rating changes dropped below 0.0001, which was the threshold I selected.
Note that 1997 was not included in the analysis, as the scores file does not specify game location for enough games.
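For anyone who wants to see the loop in code, here is a minimal Python sketch of the process described above. It is not my actual program. The game tuple layout, the function names, and especially the rating update (league average times park-neutral runs, divided by the sum of the opponents' ratings) are assumptions made for illustration, since the "standard power rating algorithm" is not spelled out in this post.

```python
from collections import defaultdict


def park_neutral(runs, park_factor):
    """Convert actual runs to park-neutral runs (e.g. 8 runs at 0.80 -> 10)."""
    return runs / park_factor


def rate_teams_and_parks(games, tol=1e-4, max_iter=1000):
    """games: list of (park, team1, team2, runs1, runs2) tuples (assumed layout).

    Returns (offense, defense, park_factors, iterations_used).
    """
    scored, allowed, played = defaultdict(float), defaultdict(float), defaultdict(int)
    park_runs, park_games = defaultdict(float), defaultdict(int)
    for park, t1, t2, r1, r2 in games:
        scored[t1] += r1; allowed[t1] += r2; played[t1] += 1
        scored[t2] += r2; allowed[t2] += r1; played[t2] += 1
        park_runs[park] += r1 + r2
        park_games[park] += 1

    # Average runs per team per game, used to scale the ratings (assumption).
    league = sum(scored.values()) / sum(played.values())

    # Initial ratings: runs scored and allowed per game.
    off = {t: scored[t] / played[t] for t in played}
    dfn = {t: allowed[t] / played[t] for t in played}

    # Initial park factor: runs per game in the park / runs per game overall.
    overall = sum(park_runs.values()) / len(games)
    pf = {p: (park_runs[p] / park_games[p]) / overall for p in park_games}

    for iteration in range(1, max_iter + 1):
        n_scored, n_allowed = defaultdict(float), defaultdict(float)
        opp_def, opp_off = defaultdict(float), defaultdict(float)
        for park, t1, t2, r1, r2 in games:
            n1, n2 = park_neutral(r1, pf[park]), park_neutral(r2, pf[park])
            n_scored[t1] += n1; n_allowed[t1] += n2
            n_scored[t2] += n2; n_allowed[t2] += n1
            opp_def[t1] += dfn[t2]; opp_off[t1] += off[t2]
            opp_def[t2] += dfn[t1]; opp_off[t2] += off[t1]

        # Assumed update: compare park-neutral runs to the opponents' ratings.
        new_off = {t: league * n_scored[t] / opp_def[t] for t in played}
        new_def = {t: league * n_allowed[t] / opp_off[t] for t in played}

        # Expected runs in each park from the ratings of the teams that played there.
        expected = defaultdict(float)
        for park, t1, t2, _, _ in games:
            expected[park] += (new_off[t1] * new_def[t2]
                               + new_off[t2] * new_def[t1]) / league
        new_pf = {p: park_runs[p] / expected[p] for p in park_games}

        # Sum of absolute changes across all team and park ratings.
        change = (sum(abs(new_off[t] - off[t]) + abs(new_def[t] - dfn[t]) for t in played)
                  + sum(abs(new_pf[p] - pf[p]) for p in park_games))
        off, dfn, pf = new_off, new_def, new_pf
        if change < tol:
            break

    return off, dfn, pf, iteration
```

The sketch does no normalization between passes and does not guard against a team or park with zero runs, so treat it as an outline of the loop rather than something to run on a full season unchecked.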
To be frank, I find this methodology much more palatable than the one used for major league baseball. While I haven't spent too much time worrying about major league park effects, the system seems to require after-the-fact adjustments based on each team's batting and pitching strength among other factors. The system outlined above constantly adjusts for these team-specific strengths.
But while I shockingly consider my method the best available, it is not without its faults. Sample size is something that can't be overcome with college baseball analyses but can only be mitigated to some extent. In the first link below you will see the effect that limited sample sizes can have. Park factors jump around from year to year by more than can be attributed to varying weather conditions. In order to attempt to control for this I set up a sliding scale based on the number of surrounding years available for each park. To see the effect of sample size, you can consider the 2005 Texas park factor. It is 138 for that season, which is a major outlier within the context of Disch-Falk's historical numbers. While there may be some argument about how the Disch played as a slight hitters' park over the years, there is no doubt that 138 is a major blip. If you'll recall, though, that was the year that Quinnipiac came to town for the regional and promptly gave up 55 runs in two games to Texas and Miami (OH). Texas also won a game 19-8 over Arkansas in that regional. Take out the regional and rerun the calculations and the Texas park factor for that season is merely 104, which is in line with other seasons in that period.
A park with only a single season of data, of course, had that season given 100% weight in the adjusted park factor. A season with only one adjacent year available was given 60% with the other 40% going to the adjacent season. Any season with more than one nearby year available was weighted at 40% with the other 60% being assigned to the surrounding years in descending values as the years became more removed. All years up to three years before and three years after each season were eligible for consideration. This may seem like far too many, but the average college baseball park sees only about 24 games per season. Here are the links for both the single year factors as well as the adjusted:
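The sliding scale can be sketched in a few lines of Python. The 100%, 60/40, and 40/60 splits and the three-year window come from the description above. How the 60% is divided among the surrounding years is not specified beyond "descending," so the inverse-distance split used here is purely an assumption, as are the function name and the sample numbers.

```python
def adjusted_park_factor(factors_by_year, season, window=3):
    """Blend a season's park factor with nearby seasons for the same park.

    factors_by_year: dict mapping year -> single-year park factor.
    """
    own = factors_by_year[season]
    nearby = {y: f for y, f in factors_by_year.items()
              if y != season and abs(y - season) <= window}

    if not nearby:
        # Only one season available: it gets 100% of the weight.
        return own
    if len(nearby) == 1:
        # One other season available: 60% own, 40% other.
        (other,) = nearby.values()
        return 0.6 * own + 0.4 * other

    # Several seasons available: 40% own, 60% spread over the others.
    # ASSUMPTION: shares are proportional to 1 / (years removed).
    raw = {y: 1.0 / abs(y - season) for y in nearby}
    total = sum(raw.values())
    return 0.4 * own + sum(0.6 * raw[y] / total * nearby[y] for y in nearby)


# Hypothetical single-year factors for one park.
example = {2008: 120, 2009: 100, 2010: 90}
adjusted_2010 = adjusted_park_factor(example, 2010)
```

With these made-up numbers, 2010 keeps 40% of its own value, 2009 gets 40%, and 2008 gets 20%. A different descending scheme would shift those last two shares.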
This system could still use improvement. Park renovations or new parks are ignored for now. Looking at LSU, their adjusted values currently use both the old and the new stadium's results. As for Texas, the single-year factors appear to indicate that replacing the old Astroturf with FieldTurf significantly lowered offensive production at Disch-Falk. It would seem that while there are still many fewer home runs than at other parks, the new surface is preventing the extra singles, doubles, and triples that used to make up for the difference in home runs. North Carolina played a season at the USA Baseball facility while their home park was being rebuilt. Full information regarding park renovations and changes isn't available without more research than I can do right now.
If you sort the tables by any particular season (sorry about the unrated teams showing up on top and the screwiness when sorting in reverse) you can see that the methodology passes the smell test. Looking at the adjusted 2010 values, the Top 10 highest park factors belong to:
- New Mexico St. - 158
- Morehead St. - 154
- Air Force - 152
- Northern Colorado - 150
- UNLV - 137
- New Mexico - 135
- Arizona - 135
- East Tennessee St. - 135
- Utah Valley - 132
- South Carolina-Upstate - 129
The Bottom 10:
- Princeton - 67
- Hawaii - 68
- Cornell - 69
- Rhode Island - 72
- Marist - 73
- Rutgers - 73
- Navy - 73
- Wagner - 74
- Columbia - 75
- Army - 75
The Top 10 is littered with high altitude facilities while the Bottom 10 is dominated by lower altitude and cold weather facilities. Position in my list correlates fairly well with position in Nation's list, which I expected. Given BC's audience, I'm sure some of you may be incredulous at Disch-Falk's historical ratings. As stated above, though, I would guess that non-HR hits used to more than make up for the lower home run totals. This theory exists solely because of the immediate change to a much lower park factor starting in 2009. Historically Disch-Falk had park factors over 100 (that is, above average) in 1998, 1999, 2001, 2002, 2003, and 2005. I found that hard to believe at first, enough to double check against actual run totals to verify the methodology wasn't completely out of whack. What I found is that in 1998, 1999, 2001, 2002, and 2005 there were more runs scored per game in Texas games at Disch-Falk than there were in Texas games away from Disch-Falk. 2003 is the only season in which adjusting for the specific teams that played at each park flips Disch-Falk from a pitchers' park by the raw numbers to a hitters' park by the adjusted numbers.
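The raw double check is simple enough to show. This is a sketch of the comparison, assuming games are stored as (park, team1, team2, runs1, runs2) tuples. The function name and the two games in the example are made up for illustration and are not real results.

```python
def raw_home_away_ratio(games, team, home_park):
    """Runs per game (both teams combined) in a team's games at its home
    park, divided by the same figure for its games everywhere else."""
    home_runs = home_games = away_runs = away_games = 0
    for park, t1, t2, r1, r2 in games:
        if team not in (t1, t2):
            continue  # only this team's games count
        if park == home_park:
            home_runs += r1 + r2
            home_games += 1
        else:
            away_runs += r1 + r2
            away_games += 1
    return (home_runs / home_games) / (away_runs / away_games)


# Hypothetical games: one at home (12 total runs), one on the road (8 total runs).
sample = [("Home Park", "Texas", "Rice", 8, 4),
          ("Road Park", "Rice", "Texas", 5, 3)]
ratio = raw_home_away_ratio(sample, "Texas", "Home Park")
```

A ratio above 1 means the park looked like a hitters' park by raw totals alone, before any adjustment for who actually played there.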
Questions? Thoughts? Suggestions?