

Horrible No-Good Refs, Quantified

A modest attempt to quantify the impact of bad referee calls on the Texas-OSU football matchup. Warning: the language in comments will likely be NSFW.

Holding? What holding?
Brendan Maloney-USA TODAY Sports


So tonight I decided I would try to put the bad refereeing we saw Saturday night in a numerical context.

WARNING: There will be tables of numbers and stuff. I'll do my best to explain.

First step is to get a big picture idea of what refereeing Charlie Strong has been accustomed to dealing with since arriving at Texas.

Here's a table that shows how the penalties break down, game by game:


Let's take a second to understand what we're seeing here. Here's a key:

ref = The head referee for the game. The refs typically serve with the same or very similar crews throughout the year.

PEN = Number of penalties accepted. Does not include declined penalties.

FORC = Number of penalties in which the referee is forced to make a call by events on the field. False Start, Delay of Game, etc. Includes declined penalties.

JUDG = Number of penalties in which a judgment call is required by the referee. Holding, Pass Interference, etc. Includes declined penalties.

YDS = Penalty yards accepted.

1ST = Number of first downs the opposing team obtained via penalty.

Long story short, don't let anyone tell you that Texas has been getting all the lucky breaks from the refs. At least not recently. Big 12 referees have generally been very egalitarian in Texas matchups, giving Texas slightly more penalties, judgment call penalties, penalty yards and opponent first downs over Strong's first 15 games.

* * *

Then the last two weeks happened.

First, let's look at Texas' poor penalty performance in the Cal game. 10 penalties accepted, including 8 judgment calls, for 93 yards total. To that date it was the worst penalty performance of Charlie Strong's tenure.

Was that game rigged? A cursory glance at the tape says no.

Below is my personal grading chart for the Texas penalties in the Cal game. I decided upon a quartile grading scheme, coded as follows:

easy = An easy call. This call is scored 0% incorrect. We'll get into why we need the percentage later.

force = A call forced by events on the field. This call is also considered 0% incorrect.

? = Review of the game film shows no footage of the penalty in question, or no suitable angle to judge whether the call was correct or not. This is fairly rare and usually indicates that a penalty happened in an unexpected area of the field. Due to suspicious oddness, this call is considered 25% likely to be incorrect.

weak = A foul that is so weak or questionable that it isn't called half the time. This call is considered 50% likely to be incorrect.

bad = A call that probably shouldn't be called based on the film. Possibly justified from the limited angle at which the ref saw it but it's still a bad call in light of the tape. Considered 75% incorrect.

awful = A call that is completely inexcusable. 100% incorrect.
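For anyone who wants to play along at home, the grading scale above boils down to a small lookup table. Here's a minimal Python sketch; the names `INCORRECTNESS` and `incorrectness` are my own labels for illustration, not anything official.

```python
# Incorrectness ratio for each rating, taken from the grading key above.
# The dictionary and function names are hypothetical, chosen for illustration.
INCORRECTNESS = {
    "easy": 0.0,    # an easy call
    "force": 0.0,   # forced by events on the field
    "?": 0.25,      # no usable footage
    "weak": 0.5,    # isn't called half the time
    "bad": 0.75,    # probably shouldn't be called
    "awful": 1.0,   # completely inexcusable
}


def incorrectness(rating: str) -> float:
    """Return the share of a call treated as ref error (0.0 to 1.0)."""
    return INCORRECTNESS[rating]
```

Swap in your own percentages if you think my scale is too harsh or too lenient.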

Capiche? Good. Now let's look at the Texas scorecard for the Cal game:

A bad call on the Haines targeting, and a mess of bankshots and layups for the refs. We really did shoot ourselves in the foot with penalties.

What about the OSU game? Different story altogether.

First let's look at OSU's penalties:


Clean shirts. Two delay-of-game calls and one false start on field goal kicking plays. One false start on 3rd and 16. Three obvious judgment calls.

* = the play in which Walsh lost the fumble and mysteriously came up with it. There was an egregious hold on Ridgeway before Walsh scrambled.

Now, behold Texas:


I'll admit, these scores are subjective. They're based on my non-expert viewing of the game tape. But for the exercise I'd like to perform next, it'll do. We can fudge around with these scores and still come out with the same basic outcome.

* * *

What am I talking about? I'm talking about quantifying the impact of these ref calls in terms of Expected Points (EP), and then divvying up the responsibility for that impact between the team being penalized and the (possibly crappy) refs assessing the penalty.

I'm not going to spend a lot of time discussing EP here. I'm going to give you one example; if that's not enough to make sense of it, use your Google. Or ask in comments.

* * *

Let's say you are given a choice: your team can have the ball at:

(a) second and ten, at your opponent's twenty

- or -

(b) third and five, at your opponent's fifteen.

Which do you choose?

There's one simple way to find out, if you have the data - look up every instance in recent college football history where teams got the ball at that exact down, distance, and field position, and average how many points those teams scored.

That's all EP is.

As it turns out, historical data says option (a) produces 3.7 EP on average, while option (b) produces 3.3 EP.
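If you'd rather see the "look it up and average" idea as code, here's a rough Python sketch. The function name `build_ep_table`, the tuple layout, and the sample rows are all made up for illustration; the rows are not real game data and won't reproduce the 3.7 and 3.3 figures.

```python
from collections import defaultdict
from statistics import mean


def build_ep_table(drives):
    """Average the points scored from each (down, distance, yards_to_goal).

    `drives` is an iterable of (down, distance, yards_to_goal, points)
    tuples. This layout is an assumption made for the sketch.
    """
    buckets = defaultdict(list)
    for down, distance, yards_to_goal, points in drives:
        buckets[(down, distance, yards_to_goal)].append(points)
    return {situation: mean(pts) for situation, pts in buckets.items()}


# Made-up sample rows, purely for illustration (not real game data).
sample = [
    (2, 10, 20, 7), (2, 10, 20, 3), (2, 10, 20, 0),
    (3, 5, 15, 3), (3, 5, 15, 3), (3, 5, 15, 0),
]
ep = build_ep_table(sample)
```

With real play-by-play data you'd have thousands of rows per situation instead of three, but the averaging step is the same.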

So let's change things up a bit:

Now picture that it's a real game situation - second and ten, at your opponent's twenty. Your RB rumbles forward for five yards, but flags go flying. Offensive holding + defensive personal foul. Offsetting penalties; replay the down.

But did those penalties really offset, in terms of game impact?

Nope. The chance to retry second down is worth more to the offense than the five yard gain. Before the play, the Expected Points value of the field position was (a) 3.7. After the run, it was (b) 3.3. By securing the offsetting penalties, the offense gained 0.4 EP, on average.

So basically, if you use two Expected Values to paint a "before" and "after" picture around a penalty, you can look at the difference between those two numbers and call that the impact of the penalty.
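Here's that before-and-after subtraction for the offsetting penalties example, as a tiny Python sketch. The function name `penalty_impact` is mine, and the sign convention (positive means the penalty helped the offense) is an assumption for this example.

```python
def penalty_impact(ep_without_penalty, ep_with_penalty):
    """EP gained (positive) or lost (negative) because the flag was thrown."""
    return ep_with_penalty - ep_without_penalty


# Without the flags: third and five at the fifteen, worth 3.3 EP.
# With the flags: replay second and ten at the twenty, worth 3.7 EP.
impact = penalty_impact(ep_without_penalty=3.3, ep_with_penalty=3.7)
print(round(impact, 1))  # 0.4
```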

* * *

So what was the impact of the penalties in the Texas-OSU game?

First, let's look at OSU:


(you might need to open this image in a new tab to see it clearly)

Here's your key:

Q = Quarter

rating = See rating system, above

unit = "O" for offense or "D" for defense

ratio = The incorrectness score for each rating. A 0 means it's "easy" or "force" and incorrect 0% of the time. A 1 means it's an "awful" call, incorrect 100% of the time.

pre = What the situation would have been, had the penalty never been called.

pst = What the situation actually was after the penalty was called.

dwn/yds/ydln = down, distance, and number of yards away from end zone.

pre-EP = Expected points if the penalty had not been called.

pst-EP = Expected points after the penalty was called.

net = Raw impact of penalty. Difference between pre-EP and pst-EP. Positive if it helps Texas, negative if it hurts Texas.

ref EP = portion of Expected Points attributable to refs according to the rating system.

player EP = portion of Expected Points attributable to the team according to the rating system.

Since OSU didn't experience any bad or borderline calls, all of the Expected Points lost by OSU due to penalties accrue to the team and not the refs.
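To make the ref EP and player EP columns concrete, here's a short Python sketch of how I'd expect the split to work: multiply the net by the ratio to get the refs' share, and leave the rest with the team. The function name `split_blame` and the example numbers are hypothetical, not rows from the tables.

```python
def split_blame(net_ep, ratio):
    """Split a penalty's net EP between the refs and the penalized team.

    `ratio` is the incorrectness score (0 to 1) from the rating system.
    """
    ref_ep = net_ep * ratio          # share blamed on the call being wrong
    player_ep = net_ep - ref_ep      # the rest stays with the team
    return ref_ep, player_ep


# Hypothetical penalty costing 2.0 EP, rated "weak" (ratio 0.5).
print(split_blame(-2.0, 0.5))  # (-1.0, -1.0)
```

With a ratio of 0, as on every OSU penalty, the refs' share is zero and the whole net lands on the team.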

Side note: Expected Points for penalties immediately before field goal attempts are calculated by the formula 3 * (1 - 0.0004 * Distance^2), since that's a rough estimate of the expected point value of kicking from any distance under 50 yards.
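The field goal formula is easy to sanity-check in Python. This is just a sketch: the function name `field_goal_ep` is mine, and I'm assuming Distance is measured in yards and that anything outside 0 to 50 should be rejected rather than estimated.

```python
def field_goal_ep(distance):
    """Rough expected points for a field goal try from `distance` yards.

    Only meant for distances under 50 yards, per the side note above.
    """
    if not 0 <= distance < 50:
        raise ValueError("formula is only a rough guide under 50 yards")
    return 3 * (1 - 0.0004 * distance ** 2)


for yards in (20, 30, 40):
    print(yards, round(field_goal_ep(yards), 2))
```

The curve starts at 3 points for a zero-yard kick and falls toward 0 as the distance approaches 50, which is why a five-yard penalty on a chip shot barely moves the needle.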

Conversions of down, distance, and field position to EP values are performed with charts and graphs such as those posted here. They're strictly eyeballing guesstimates but that's good enough for my purposes.

And what do those numbers say? Here's the OSU totals:


Interpretation: OSU had no defensive penalties at all and none of the pain OSU suffered from penalties can be ascribed to ref error. The penalties it had on offense were extremely minor - most of its penalties were on fourth down at chip shot kicking distance, or penalties deep in their own end of the field, or on 3rd-and-long. These are all situations where EP is already low, or a small change in yards isn't going to change very much. Penalties only hurt their offensive production by 2.3 points or so.

* * *

Now, check out Texas:


Here's the totals:


In a nutshell, what these numbers are saying is this: bring in some fair zebras, and Texas wins this game by two touchdowns.

If you hold a thousand games and deal the home team all of these penalties once each over the course of each game, on average you'd get a net score 27 points lower for the home team than games in which all the same plays happened but none of these penalties were called.

Based on the rating system I'm using, Texas players are actually responsible for roughly ten of those lost points.

The bad refs are responsible for the remaining 17.

Try to wrap your head around that.

It makes some intuitive sense: on a per-play basis, Texas hung tightly with OSU, and produced a similar number of explosive plays (if you ignore that most were called back by errant flags), plus the Texas defense produced two touchdowns off turnovers and OSU produced none.

Just going off of that, one should expect Texas to win by roughly 14.

But the cost of the penalties was simply staggering.

In order of egregiousness:

1) The Boyette roughing the passer call alone converted a turnover near midfield (likely to result in 2.6 net points for Texas on average) into a first and ten for OSU on the cusp of the red zone (which would cost us -3.8 net points on average). That's a -6.4 expected point swing on one play, of which -4.8 EP is attributable to the refs due to the Boyette penalty being ranked as "bad".

2) The first inexplicable call on Vahe for the cut block, obliterating a TD run by Foreman. While we later procured the touchdown, it was far from certain we would at the time (thank you, Tyrone). -3.2 EP, all on the refs for a horrific call.

3) Calling the perfectly-sideways Heard lateral a forward pass on the double pass. This penalty isn't "awful" but it's a model description of a "weak"-rated play that could've gone either way. It erased a TD scored from midfield, and left us in second-and-long on our half of the field. -6.3 EP, half of that (-3.15 EP) on the refs.

4) The utterly magical fumble recovery by Walsh, sticking one hand into a Texas dogpile and securing the ball. Garbage call. -2 EP, all on the refs.

5) The late game Poona-Strong penalty combo that got everyone worked up for obvious reasons. -1.1 EP (-0.825 EP on the refs) for the Poona penalty and -0.9 EP for the Strong penalty...

... and yeah, this is a perfect point to note that these EP estimates don't consider game situation, e.g., time remaining.

So if anything, the impact on Texas is underestimated because the negative EP of the last two penalties doesn't take into account that the game was within a field goal and the ball past midfield with time running short at the moment the penalties were called.

Those penalties were in fact worth much more than two points combined. Given the context, they pretty much changed the entire complexion of the game.

But this method will only assign values denominated in points, and in the crux of the late game, a sudden switch of two EP is often enough to be decisive.
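As a quick sanity check on the itemized list, here's a Python tally of the ref shares quoted above. The dictionary labels are my own shorthand. The Strong penalty is left out because I only gave its total (-0.9 EP) in the list, not its ref share.

```python
# Ref-attributed EP lost by Texas on each itemized call (magnitudes),
# copied from the numbered list above. Labels are shorthand for illustration.
ref_shares = {
    "Boyette roughing the passer": 4.8,
    "Vahe cut block": 3.2,
    "Heard lateral": 3.15,
    "Walsh fumble recovery": 2.0,
    "Poona penalty": 0.825,
}

total = sum(ref_shares.values())
print(f"{total:.3f}")  # 13.975
```

So those five calls alone account for roughly 14 of the 17 or so points pinned on the refs; the remainder would have to come from the Strong penalty's share and the calls that didn't make the list.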

tl;dr: This was an epic screw job. Horrible calls by referees cost Texas as much as 17 points Saturday. Bad calls shouldn't be impacting games THAT much, ever.

If you have any questions, feel free to add them in comments and I'll address them as the opportunity arises.