My latest random project idea was to calculate college baseball win shares. Unfortunately, there are several big obstacles between here and there, not the least of which is the lack of available data.
Another major roadblock is the unbalanced schedules played by every team. Win shares, like most Major League Baseball sabermetric statistics, rely on approximately equal schedule strength. In MLB, schedule strength is routinely ignored, and with good reason.
Adjusting the stats is tedious and doesn't add enough value to justify the effort for MLB numbers. Park and scoring environment adjustments can be done much more easily and offer more value.
So my first decision was to run them for only the Big 12. The schedule is inherently balanced in conference play for all intents and purposes. So I set out to calculate win shares for the Big 12.
That didn't last long. I don't have nearly enough familiarity with all the squads to know offhand who plays what position for every team. So while I knew that I would end up running them only for Texas, I also knew that I needed a park factor and that I might run them for the whole conference over the summer.
Not to mention that you can't really run them without getting a league environment set up. If you have any familiarity with win shares, you know that much of the calculation is based on league norms. This is yet another post that I could make entirely too long, so I'll try to cut it down with a formulaic account of issues and how I handled them.
- Park Factors - I ran my adjusted stats formulas for the Big 12 season to calculate adjusted offensive runs per inning, defensive runs per inning, ballpark runs per inning, and ballpark home runs per inning. These were used where required.
- Team Total Win Shares - 486 times the team's winning percentage in conference. The 486 multiplier was chosen only so that the average team, at a .500 winning percentage, has 243 win shares, the same as in MLB.
- Marginal Runs - Based on the adjusted offensive and defensive numbers from above relative to league average.
- Batting Calculation - Essentially the same as the MLB formula. Runs created, etc., formulas ported pretty much directly.
- Pitching Calculation - Also essentially the same. This could cause issues as component ERA may need to be different, etc.
- Fielding Calculation - This was the tough one. I tweaked the formulas where I thought it made sense, but you'll see that the first round of results has raised some questions regarding my decisions. Because I haven't identified who plays what position on the other teams at this point, I had to use the 2009 AL for league norms. My hope is that this washes out between positions: each player was compared to his major league counterparts, so they should all have been penalized fairly equally. I also changed some constants based on the shorter season. If I run the entire Big 12 in the future (anyone want to help?) I can go back and plug in the accurate league factors. Also, players were assumed to have accumulated all their fielding stats at their primary position.
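To make the team total step concrete, here is a minimal Python sketch of the "486 times conference winning percentage" rule from the list above. The function name and the two sample records are made up for illustration and are not actual Big 12 results; the only thing taken from the post is the 486 multiplier.

```python
def team_win_shares(conf_wins, conf_losses, scale=486.0):
    """Team total win shares: scale times conference winning percentage.

    The 486 default is the multiplier described above; it puts a
    .500 team at 243 win shares.
    """
    games = conf_wins + conf_losses
    if games == 0:
        raise ValueError("team has no conference games")
    return scale * conf_wins / games


# Hypothetical conference records (wins, losses), for illustration only.
records = {"Team A": (18, 9), "Team B": (9, 18)}

totals = {team: team_win_shares(w, l) for team, (w, l) in records.items()}
average = sum(totals.values()) / len(totals)

for team, ws in totals.items():
    print(f"{team}: {ws:.1f} win shares")
print(f"Average: {average:.1f}")
```

Because every conference win is also someone else's conference loss, the average across a full balanced league always comes out to 243, whatever the individual records are.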
So, here are the results through last weekend's games. It will take some time to go through them all after this coming weekend's games and recalculate, but I will try to do so.