One goal of making all these maps is to get a better idea of how consistent umpires are between games. In other words, how much does an umpire’s strike zone differ from game to game? One way to find out is to look at all the maps I have made for each umpire over the course of the season. That gives us an idea of what an umpire is doing game to game, but it’s rather qualitative. We can describe it: we can see when the umpire decided to be a little more generous on the outer half, or called a pitch a bit lower than at other times. But that isn’t a quantitative result, and finding ways to quantify umpire performance is what this blog is all about. So how do we quantify umpire consistency?
One way is to use a two-sample Kolmogorov-Smirnov test. This test compares two samples and asks whether they could have come from the same distribution. It does this by calculating the maximum difference between the two empirical cumulative distribution functions, which gives the statistic D. Higher values of D mean the two samples have very different distributions, while low D-values indicate more similarity between them.
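To make the definition of D concrete, here is a minimal sketch for two one-dimensional samples, written in plain Python. The function name `ks_statistic` is my own, and the sketch only handles simple lists of numbers. It does not show how a two-dimensional strike zone map gets turned into a sample, which this post does not spell out.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Return D, the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample A <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample B <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Identical samples give D = 0, samples that do not overlap at all give D = 1, and `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` comes out at 0.5.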
The Kolmogorov-Smirnov test compares only two samples, and at this point in the season each umpire has called anywhere from 6 to 12 games behind the plate. To get a cumulative measure of umpire consistency, I applied the K-S test to the strike zone map for each game an umpire called, comparing it to the map for every other game that umpire called in April and May. The results for Dana DeMuth look like this:
|Game||Apr 1, 2013||Apr 5, 2013||Apr 10, 2013||Apr 15, 2013||May 22, 2013||May 27, 2013||May 31, 2013|
|Apr 1, 2013||0||0.050||0.086||0.141||0.062||0.082||0.097|
|Apr 5, 2013||0.050||0||0.052||0.095||0.053||0.080||0.057|
|Apr 10, 2013||0.086||0.052||0||0.075||0.064||0.042||0.036|
|Apr 15, 2013||0.141||0.095||0.075||0||0.123||0.096||0.054|
|May 22, 2013||0.062||0.053||0.064||0.123||0||0.055||0.077|
|May 27, 2013||0.082||0.080||0.042||0.096||0.055||0||0.048|
|May 31, 2013||0.097||0.057||0.036||0.054||0.077||0.048||0|
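A table like the one above can be built by running the comparison on every pair of games and mirroring the result across the diagonal. The sketch below is only an illustration of that bookkeeping: `pairwise_d`, the `games` dictionary and the `statistic` argument are hypothetical names, and the stand-in statistic in the example is not the K-S statistic, just a placeholder so the snippet runs on its own.

```python
from itertools import combinations


def pairwise_d(games, statistic):
    """Build a symmetric table of D-values for every pair of games.

    games: dict mapping a game label to that game's sample.
    statistic: any function taking two samples and returning D.
    """
    labels = list(games)
    # A game compared with itself has D = 0, hence the zero diagonal.
    table = {a: {b: 0.0 for b in labels} for a in labels}
    for a, b in combinations(labels, 2):
        d = statistic(games[a], games[b])
        table[a][b] = d
        table[b][a] = d  # D does not depend on the order of the samples
    return table


# Toy example with made-up numbers and a placeholder statistic.
toy_games = {"game 1": [1.0, 2.0], "game 2": [2.0, 4.0], "game 3": [3.0, 3.0]}
placeholder = lambda x, y: abs(sum(x) / len(x) - sum(y) / len(y))
toy_table = pairwise_d(toy_games, placeholder)
```

With n games this makes n(n-1)/2 comparisons, which is 21 for the seven games in the table.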
I then averaged all the unique D-values (basically, every number above the diagonal of zeros in that table), weighting each by the combined number of pitches called in the two games being compared. The result is an average D-value for each umpire for the season thus far, which is basically the average difference between the ball-strike maps for all the games the umpire has called. A lower average D-value indicates the umpire has been calling a more consistent strike zone over the course of the season. The results are shown below:
|C. B. Bucknor||0.164|
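For anyone who wants to reproduce the averaging step, here is a rough sketch. It assumes the weight for each pair is the sum of the pitches called in the two games, which is my reading of "combined number of pitches". The three D-values are taken from the Dana DeMuth table, but the pitch counts are invented for illustration, so the number this produces is not one of the results reported here.

```python
def weighted_mean_d(d_values, pitches):
    """Average the unique pairwise D-values, weighted by combined pitches.

    d_values: dict mapping a (game_a, game_b) pair to its D-value.
    pitches: dict mapping a game label to the pitches called in it.
    """
    weighted_total = 0.0
    weight_sum = 0.0
    for (game_a, game_b), d in d_values.items():
        weight = pitches[game_a] + pitches[game_b]  # assumed weighting
        weighted_total += weight * d
        weight_sum += weight
    return weighted_total / weight_sum


# D-values from the table above (first three games only).
d_values = {
    ("Apr 1", "Apr 5"): 0.050,
    ("Apr 1", "Apr 10"): 0.086,
    ("Apr 5", "Apr 10"): 0.052,
}
# Hypothetical pitch counts, NOT real data.
pitches = {"Apr 1": 150, "Apr 5": 140, "Apr 10": 160}

example_mean = weighted_mean_d(d_values, pitches)
```

If every game had the same number of called pitches, this would reduce to a plain average of the D-values.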
So, over the course of the season thus far, Dana DeMuth has been the most consistent umpire according to these numbers. An interesting thing to note is that the distribution of mean D-values across umpires is slightly bimodal, with clusters forming around D-values of 0.08 and 0.13. There seems to be a group of umpires that apply the strike zone pretty consistently and then some that struggle a bit more. For the most part, though, I think this shows the umpires apply their own strike zone fairly evenly. These numbers say nothing about the shape of that strike zone, or whether it abides by the MLB rules. But I think it is a step in the right direction for an umpire to have the same strike zone day in and day out.