Mapping the strike zone as called by umpires is not a new task; several people have done it before me. In this post, I want to show what has already been done, and then, in the next post, contrast that with the method I have used. To start, let’s look at the strike zone tool available at Brooks Baseball.
As an example, let’s look at the strike zone called by Dana DeMuth on April 5, 2012. This was the home opener for the Cubs, who played the Washington Nationals. The Brooks Baseball strike zone map is here. On this plot, which is drawn from the catcher’s perspective, strikes are plotted in red and balls in green. One box marks the strike zone as defined by the MLB rules, and a second box shows the strike zone as called by Dana DeMuth.
I have two major issues with this plot. First, it only supports a qualitative analysis of the umpire’s performance. We can see a green pitch inside the strike zone and know that he missed the call, but we can’t estimate how often that happens. We also can’t say how often a pitch in a given part of the zone is called a strike. Second, this plot assumes the strike zone is rectangular. That may be true of the strike zone defined by MLB, but umpires do not see the strike zone that way. So this tool imposes an assumed shape on the called strike zone and gives us no way to quantify the umpire’s performance.
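To make “how often that happens” concrete, here is a minimal sketch of the kind of number the Brooks plot doesn’t give us: the share of pitches inside the rulebook zone that were called balls, and the share outside it that were called strikes. The field names px, pz, sz_top and sz_bot are modeled on PITCHf/x conventions (feet, catcher’s view, px = 0 at the middle of the plate), but the Pitch class, the helper names and the sample pitches below are hypothetical. The sketch also assumes the rulebook zone is the rectangle 17 inches wide between sz_bot and sz_top, and it ignores the ball’s radius, so a pitch that only clips the edge counts as outside.

```python
from dataclasses import dataclass

# Half the width of home plate (17 inches), in feet. Assumption: the ball's
# radius is ignored, so only the pitch's center location is tested.
PLATE_HALF_WIDTH_FT = 17 / 2 / 12


@dataclass
class Pitch:
    px: float            # horizontal location in feet, catcher's view, 0 = middle of plate
    pz: float            # height above the ground in feet
    sz_top: float        # top of this batter's rulebook zone, feet (assumed given)
    sz_bot: float        # bottom of this batter's rulebook zone, feet (assumed given)
    called_strike: bool  # True if the umpire called a strike, False for a ball


def in_rulebook_zone(p: Pitch) -> bool:
    """True if the pitch's center lies inside the rectangular rulebook zone."""
    return abs(p.px) <= PLATE_HALF_WIDTH_FT and p.sz_bot <= p.pz <= p.sz_top


def _share(count: int, total: int) -> float:
    """count / total, or NaN when there are no pitches to divide by."""
    return count / total if total else float("nan")


def call_error_rates(pitches):
    """Return (share of in-zone pitches called balls,
               share of out-of-zone pitches called strikes)."""
    inside = [p for p in pitches if in_rulebook_zone(p)]
    outside = [p for p in pitches if not in_rulebook_zone(p)]
    missed_strikes = sum(not p.called_strike for p in inside)
    extra_strikes = sum(p.called_strike for p in outside)
    return _share(missed_strikes, len(inside)), _share(extra_strikes, len(outside))


if __name__ == "__main__":
    # Four made-up pitches for illustration only, not real game data.
    sample = [
        Pitch(0.0, 2.5, 3.5, 1.5, True),    # middle of the zone, called strike
        Pitch(0.5, 2.0, 3.5, 1.5, False),   # inside the zone, called ball
        Pitch(1.2, 2.5, 3.5, 1.5, False),   # well outside, called ball
        Pitch(-0.9, 3.0, 3.5, 1.5, True),   # just outside, called strike
    ]
    print(call_error_rates(sample))
```

With the four made-up pitches, one of the two in-zone pitches was called a ball and one of the two out-of-zone pitches was called a strike, so the function returns (0.5, 0.5). A single summary like this still says nothing about where in the zone the misses happen, which is the second half of the complaint.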
Another way to map the strike zone is available at Baseball Heat Maps. We can again use Dana DeMuth as an example. Their plot for left-handed hitters is here, and the plot for right-handed hitters in the same game is here. Notice that they quantify how often Dana DeMuth calls strikes in different parts of the zone. The shape of the strike zone is also allowed to vary with the umpire’s calls. So Baseball Heat Maps does a pretty good job, but I think it can be improved.
First, the plots aren’t free to follow the distribution of pitches called by the umpire. Notice the straight lines and the sharp angles wherever the boundary changes direction. These are a product of the method used to generate the graph. A more robust method would estimate the strike probability as a continuous function of location, so the edges of the zone could take curved shapes. Second, and this might be nitpicky, their color scheme is, well, ugly and not very informative.
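I don’t know exactly how Baseball Heat Maps builds its plots, so the sketch below is not their method. It is a generic binned estimate that shows the same kind of artifact: chop the plate area into square cells and report the called-strike rate in each one. Every location in a cell gets the same value, so the estimate is a step function, and any boundary drawn from it can only run along cell edges. The function names, the (px, pz, called_strike) tuple format and the 0.5 ft cell size are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def binned_strike_rate(pitches, cell_ft=0.5):
    """Called-strike rate per square grid cell.

    pitches: iterable of (px, pz, called_strike) tuples, locations in feet.
    Returns {(col, row): rate}; cell (col, row) covers
    [col * cell_ft, (col + 1) * cell_ft) horizontally and likewise vertically.
    Cells with no pitches are simply absent from the result.
    """
    strikes = defaultdict(int)
    totals = defaultdict(int)
    for px, pz, called_strike in pitches:
        cell = (math.floor(px / cell_ft), math.floor(pz / cell_ft))
        totals[cell] += 1
        strikes[cell] += int(called_strike)
    return {cell: strikes[cell] / totals[cell] for cell in totals}


def rate_at(rates, px, pz, cell_ft=0.5):
    """Look up the binned estimate at any location; None if its cell is empty."""
    return rates.get((math.floor(px / cell_ft), math.floor(pz / cell_ft)))


if __name__ == "__main__":
    # Three made-up pitches for illustration only.
    rates = binned_strike_rate([(0.1, 2.1, True), (0.4, 2.4, False), (0.6, 2.1, True)])
    print(rate_at(rates, 0.49, 2.2), rate_at(rates, 0.51, 2.2))
```

Two locations only 0.02 ft apart, on either side of px = 0.5, get estimates of 0.5 and 1.0. That jump comes from where the grid line happens to fall, not from the pitches, which is why a smooth, continuous estimate is the better target.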
So if I want to improve on these strike zone maps, I need a method that lets the map be a continuous function, quantifies how often pitches are called strikes or balls, and uses a better color scheme than the two tools shown here. Fortunately, all of this can be done with kernel density estimation, but that’s for the next post.