I have been a baseball fan for my entire life. Some of my earliest memories are of returning from my morning kindergarten class and sitting down with a peanut butter and jelly sandwich to watch Harry Caray broadcast the afternoon Cubs game on WGN. I grew up watching Shawon Dunston, Mark Grace, Greg Maddux, Sammy Sosa, and my favorite player back then, Ryne Sandberg. Since then, I have been a pretty avid Cubs fan, and I have watched a lot more Cubs games, though less often with a PB&J.
I am starting this blog to share some work I have been doing on how umpires call strikes and balls. This is by no means new territory for baseball research. A simple Google search will reveal a lot of discussion about heat maps, called strike percentages, and all sorts of ways of grading umpire performance. Some of the most in-depth analysis of umpiring can be found at Brooks Baseball and Baseball Heat Maps. Most of these researchers rely on MLB’s pitch f/x data in their analyses, and I will do the same. However, I will employ some different methods in my analysis.
My foray into pitch f/x data started during my time at the University of South Florida as a graduate student in geology. There I was introduced to a statistical method called kernel density estimation. This method is used in the field of volcanology to estimate the spatial hazards within distributed volcanic fields. I applied this method to the pitches umpires call balls and strikes. The goal was to develop a way of generating strike zone maps for umpires that can be used to predict whether a pitch will be called a ball or a strike, and I believe I have come close.
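For readers who have not met kernel density estimation before, here is a minimal sketch of the general idea in Python. It is not the code behind this project (that work uses Perl, GMT, and R), and everything in it is an assumption made for illustration: the Gaussian kernel, the fixed bandwidth of 0.25 feet, the function names, and the handful of made-up pitch locations. It builds one density surface from called strikes and one from called balls, then compares the two at a given location to estimate how likely a pitch there is to be called a strike.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function that estimates density at (x, z) from 2-D points.

    Uses an isotropic Gaussian kernel with a single fixed bandwidth,
    which is an illustrative assumption, not a tuned choice.
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, z):
        total = 0.0
        for px, pz in points:
            d2 = (x - px) ** 2 + (z - pz) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def strike_probability(strikes, balls, bandwidth=0.25):
    """Return a function estimating P(called strike) at a location.

    Each density is weighted by its sample count so that the more
    common call is not underrepresented.
    """
    strike_density = gaussian_kde_2d(strikes, bandwidth)
    ball_density = gaussian_kde_2d(balls, bandwidth)
    n_s, n_b = len(strikes), len(balls)

    def prob(x, z):
        s = n_s * strike_density(x, z)
        b = n_b * ball_density(x, z)
        if s + b == 0.0:
            return math.nan  # no nearby data to base an estimate on
        return s / (s + b)

    return prob


# Hypothetical pitch locations in feet: (horizontal offset, height).
# These are invented for the example, not real pitch f/x records.
called_strikes = [(0.0, 2.5), (0.3, 2.2), (-0.4, 2.8), (0.1, 3.0), (-0.2, 2.0)]
called_balls = [(1.5, 2.5), (-1.6, 2.4), (0.2, 4.2), (0.0, 0.8), (1.4, 3.9)]

p_strike = strike_probability(called_strikes, called_balls)
middle = p_strike(0.0, 2.5)    # center of the made-up strike cluster
outside = p_strike(1.5, 2.5)   # well off the edge of the plate
print(f"middle: {middle:.3f}, outside: {outside:.3f}")
```

Evaluating `p_strike` over a grid of locations would give a map of an umpire's zone. The bandwidth controls how smooth that map is, so in practice it would need to be chosen with care rather than fixed by hand as it is here.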
Some acknowledgment is due. This work wouldn’t be possible without the people who have delved into the MLB pitch f/x database before me. In addition to the two sites mentioned above, Mike Fast’s blog was helpful in getting the first generation of this project off the ground. I would like to thank MLB as well for making all the data used in this blog available. Finally, in my actual analysis I have made extensive use of Perl, GMT, and R. This project would certainly not be possible without those tools.