Umpire Consistency

One of my goals in making all these maps is to get a better idea of how consistent umpires are between games. In other words, how much does an umpire’s strike zone differ from game to game? One way to find out is to look at all the maps I have made for each umpire over the course of the season. That gives us an idea of what an umpire is doing from game to game, but it’s rather qualitative. We can describe it: we can see when the umpire decided to be a little more generous on the outer half, or called a pitch a bit lower than at other times. But that isn’t a quantitative result, and finding ways to quantify umpire performance is what this blog is all about. So how do we quantify umpire consistency?

One way is to use a two-sample Kolmogorov-Smirnov test. This test compares two samples and assesses whether they could have come from the same distribution. It does this by calculating the maximum difference between the two empirical cumulative distribution functions, which gives the statistic D. D ranges from 0 to 1: higher values mean the two samples have very different distributions, while lower values indicate more similarity between them.
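To make the definition of D concrete, here is a minimal sketch of the two-sample statistic in plain Python. It works on one-dimensional samples; how a two-dimensional strike zone map gets reduced to samples is not covered here, so the inputs (say, the heights of called strikes in two games) are purely an assumption for illustration, and the function name is made up.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic D: the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every value that appears in either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up numbers, only to show the call.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

Identical samples give D = 0, and samples that do not overlap at all give D = 1.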

The Kolmogorov-Smirnov test compares only two samples, and at this point in the season each umpire has called anywhere from 6 to 12 games behind the plate. To get a cumulative measure of umpire consistency, I applied the K-S test to the strike zone map for each game an umpire called in April and May, comparing it to every other game that umpire called over the same period. The results for Dana DeMuth look like this:

Game          Apr 1, 2013   Apr 5, 2013   Apr 10, 2013  Apr 15, 2013  May 22, 2013  May 27, 2013  May 31, 2013
Apr 1, 2013   0             0.050         0.086         0.141         0.062         0.082         0.097
Apr 5, 2013   0.050         0             0.052         0.095         0.053         0.080         0.057
Apr 10, 2013  0.086         0.052         0             0.075         0.064         0.042         0.036
Apr 15, 2013  0.141         0.095         0.075         0             0.123         0.096         0.054
May 22, 2013  0.062         0.053         0.064         0.123         0             0.055         0.077
May 27, 2013  0.082         0.080         0.042         0.096         0.055         0             0.048
May 31, 2013  0.097         0.057         0.036         0.054         0.077         0.048         0
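A table like the one above can be built by running the statistic on every pair of games. The sketch below assumes each game has already been reduced to a list of one-dimensional measurements; the game labels and numbers are hypothetical, and `pairwise_d` is an illustrative name, not the code behind the table.

```python
from bisect import bisect_right
from itertools import combinations


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two non-empty samples."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def pairwise_d(games):
    """games: dict mapping a game label to a list of measurements.

    Returns a nested dict so that matrix[g][h] is D for games g and h.
    """
    labels = list(games)
    matrix = {g: {h: 0.0 for h in labels} for g in labels}
    # D is symmetric, so each unordered pair only needs to be computed once.
    for g, h in combinations(labels, 2):
        d = ks_statistic(games[g], games[h])
        matrix[g][h] = matrix[h][g] = d
    return matrix


# Hypothetical data for three games, only to show the shape of the result.
games = {
    "game 1": [1.6, 1.9, 2.4, 3.1],
    "game 2": [1.7, 2.0, 2.5, 3.3],
    "game 3": [2.2, 2.8, 3.4, 3.6],
}
matrix = pairwise_d(games)
for label, row in matrix.items():
    print(label, [round(d, 3) for d in row.values()])
```

The diagonal is always 0 (a game compared with itself) and the matrix is symmetric, which is why only the values on one side of the diagonal are unique.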

I then averaged all the unique D-values (basically, all the numbers above the diagonal of zeros in that table), weighting each one by the combined number of pitches called in the two games being compared. The result is an average D-value for each umpire for the season thus far: roughly, the average difference between the ball-strike maps for all the games the umpire has called. A lower average D-value indicates the umpire has been calling a more consistent strike zone over the course of the season. The results are shown below:
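As a rough sketch of that averaging step, the snippet below takes the D-values from the Dana DeMuth table and weights each unique pair by the combined pitch count of its two games. The pitch counts are invented for illustration (the real counts are not given here), so the output will not reproduce the averages in the table below, and `weighted_average_d` is a hypothetical name.

```python
from itertools import combinations

# D-values from the Dana DeMuth table, in game order.
D = [
    [0, 0.050, 0.086, 0.141, 0.062, 0.082, 0.097],
    [0.050, 0, 0.052, 0.095, 0.053, 0.080, 0.057],
    [0.086, 0.052, 0, 0.075, 0.064, 0.042, 0.036],
    [0.141, 0.095, 0.075, 0, 0.123, 0.096, 0.054],
    [0.062, 0.053, 0.064, 0.123, 0, 0.055, 0.077],
    [0.082, 0.080, 0.042, 0.096, 0.055, 0, 0.048],
    [0.097, 0.057, 0.036, 0.054, 0.077, 0.048, 0],
]

# Hypothetical called-pitch counts per game (NOT real data).
PITCHES = [150, 142, 161, 138, 155, 147, 160]


def weighted_average_d(d_matrix, pitches):
    """Average the unique (above-diagonal) D-values, weighted by combined pitches."""
    total = 0.0
    weight_sum = 0
    for i, j in combinations(range(len(pitches)), 2):
        weight = pitches[i] + pitches[j]  # pitches called across both games
        total += weight * d_matrix[i][j]
        weight_sum += weight
    return total / weight_sum


print(round(weighted_average_d(D, PITCHES), 3))
```

Weighting by pitch count means a comparison between two games with many called pitches counts for more than a comparison between two short samples, where D is noisier.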

Umpire Average D
Dana DeMuth 0.063
Greg Gibson 0.067
Jerry Layne 0.072
Ed Hickox 0.073
Tim McClelland 0.073
Chris Conroy 0.074
Brian O’Nora 0.074
Phil Cuzzi 0.074
Kerwin Danley 0.077
Chris Guccione 0.079
Mark Carlson 0.079
Alan Porter 0.082
Marty Foster 0.083
Adrian Johnson 0.083
Dan Bellino 0.084
James Hoye 0.084
Paul Schrieber 0.084
Mike DiMuro 0.085
Jim Joyce 0.085
Angel Hernandez 0.088
Manny Gonzalez 0.089
Jordan Baker 0.089
Tim Timmons 0.090
Clint Fagan 0.091
Tom Hallion 0.092
Jerry Meals 0.093
Mike Muchlinski 0.093
Mike Estabrook 0.094
Gerry Davis 0.094
Vic Carapazza 0.096
Bill Miller 0.096
Scott Barry 0.097
Bob Davidson 0.098
John Tumpane 0.099
John Hirschbeck 0.100
Laz Diaz 0.100
Sam Holbrook 0.102
Dale Scott 0.102
Jeff Nelson 0.105
Ron Kulpa 0.109
Paul Nauert 0.110
Andy Fletcher 0.110
Paul Emmel 0.115
Wally Bell 0.115
Joe West 0.117
Chad Fairchild 0.124
Lance Barksdale 0.124
Rob Drake 0.128
Jeff Kellogg 0.134
Lance Barrett 0.135
Tony Randazzo 0.139
Tim Welke 0.143
Bill Welke 0.145
Jim Reynolds 0.146
Fieldin Culbreth 0.148
Gary Darling 0.150
Alfonso Marquez 0.156
Cory Blaser 0.159
Mike Winters 0.162
Hunter Wendelstedt 0.162
C. B. Bucknor 0.164
Ted Barrett 0.164
Mike Everitt 0.166
Bruce Dreckman 0.166
Doug Eddings 0.167
Brian Knight 0.172
Marvin Hudson 0.180
Dan Iassogna 0.181
Mark Wegner 0.184
Eric Cooper 0.187
Todd Tichenor 0.193
Gary Cederstrom 0.210
Larry Vanover 0.210

So, over the course of the season thus far, Dana DeMuth has been the most consistent umpire according to these numbers. An interesting thing to note is that the distribution of average D-values across umpires is slightly bimodal, with clusters forming around D-values of 0.08 and 0.13. There seems to be one group of umpires that applies the strike zone pretty consistently and another that struggles a bit more. For the most part, though, I think this shows that umpires apply their own strike zone fairly evenly. This data doesn’t describe the shape of that strike zone at all, or whether it abides by the MLB rules. But I think it is a step in the right direction for an umpire to have the same strike zone day in and day out.
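One simple way to look at the clustering is to bin the average D-values from the table above and print a text histogram. This is only a sketch: the bin width of 0.02 is an arbitrary choice, and the values are stored as integer thousandths (63 means 0.063) to avoid floating-point trouble at the bin edges.

```python
from collections import Counter

# Average D-values from the umpire table, in thousandths (63 means 0.063).
AVG_D = [
    63, 67, 72, 73, 73, 74, 74, 74, 77, 79, 79, 82, 83, 83, 84, 84, 84, 85,
    85, 88, 89, 89, 90, 91, 92, 93, 93, 94, 94, 96, 96, 97, 98, 99, 100, 100,
    102, 102, 105, 109, 110, 110, 115, 115, 117, 124, 124, 128, 134, 135, 139,
    143, 145, 146, 148, 150, 156, 159, 162, 162, 164, 164, 166, 166, 167, 172,
    180, 181, 184, 187, 193, 210, 210,
]


def histogram(values, width=20):
    """Count values per bin; keys are bin starts in thousandths."""
    return Counter(v // width * width for v in values)


bins = histogram(AVG_D)
for start in sorted(bins):
    low, high = start / 1000, (start + 19) / 1000
    print(f"{low:.3f}-{high:.3f} {'#' * bins[start]}")
```

A different bin width will move the apparent peaks around a little, so it is worth trying a couple of widths before reading too much into any one picture.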

