Predicting wOBA Using Process-Based Statistics

When trying to determine a batter’s overall offensive value using a single statistic, one of the most popular metrics to use is weighted on-base average (wOBA). wOBA is calculated as a linear combination of “outcome” statistics (unintentional walks, hit-by-pitches, singles, doubles, triples, and home runs) divided by, essentially, the number of plate appearances.

With that being said, could one predict whether a given player’s wOBA will be above a certain threshold using “process” statistics such as plate discipline and batted ball parameters? In particular, if we know a player’s zone contact rate, chase rate, and average exit velocity, could we predict with any confidence whether that particular player’s wOBA will be above, say, .320?
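The wOBA calculation and the threshold label described above can be sketched in a few lines of Python. This is a minimal sketch, not the notebook's code: the function names are hypothetical, the weights are illustrative placeholders rather than any season's published values, and the denominator (AB + unintentional BB + SF + HBP) is the commonly published form of "essentially, the number of plate appearances."

```python
# Illustrative placeholder weights; real wOBA weights change every season.
EXAMPLE_WEIGHTS = {
    "uBB": 0.70, "HBP": 0.70, "1B": 0.90,
    "2B": 1.25, "3B": 1.60, "HR": 2.00,
}

def woba(counts, weights=EXAMPLE_WEIGHTS):
    """Weighted on-base average from a dict of counting stats.

    `counts` needs the six outcome stats in `weights` plus "AB" and "SF".
    "uBB" is unintentional walks (BB minus IBB).
    """
    numerator = sum(weights[stat] * counts[stat] for stat in weights)
    denominator = counts["AB"] + counts["uBB"] + counts["SF"] + counts["HBP"]
    if denominator == 0:
        raise ValueError("no qualifying plate appearances")
    return numerator / denominator

def above_threshold(counts, threshold=0.320, weights=EXAMPLE_WEIGHTS):
    """Binary label for a classifier: is this player's wOBA above the cutoff?"""
    return woba(counts, weights) > threshold

# Hypothetical season line, for illustration only.
season = {"uBB": 50, "HBP": 5, "1B": 100, "2B": 30, "3B": 3, "HR": 20,
          "AB": 500, "SF": 5}
print(round(woba(season), 3), above_threshold(season))
```

A label built this way is what a model fed with process statistics (zone contact rate, chase rate, average exit velocity) would be trained to predict.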

Using Statcast data and a bit of machine learning, I have decided to train a shallow neural network to try to do just that. I’ll be using snapshots of the Jupyter Notebook throughout the analysis to try and make it a little easier to follow. Read the rest of this entry »

Thinking About My Baby: Does Paternity Leave Affect Performance?

As an ardent follower of the Baltimore Orioles, I’ve experienced a lot of bad baseball over the past few years, and one specific bit of bad baseball caught my eye recently. On April 5th, Shawn Armstrong returned from the paternity list after his wife gave birth just a few days before. He was continually demolished over the next week, giving up six earned runs in two innings of relief. He wasn’t getting too unlucky either, even if his FIP (20.12) was below his ERA (27.00).

As the parent of a three-year-old, I thought back to my first week after work following a month-long paternity leave. I was distracted, tired, and couldn’t wait to get home at the end of the day. Of course Armstrong got lit up; he had just become a dad a few days before! Maybe professional athletes aren’t staying up all night changing diapers, but it still seems like they would perform worse after a trip to the paternity list as they reorient their lives. Is that true, though? Do athletes perform worse after returning from the paternity list?

Fortunately, Baseball Prospectus tracks all paternity leave going back to MLB’s implementation of the policy in 2011. Instead of parsing through every season of data, I just focused on the most recent ones: 2017-2020. This still provided 115 different trips to the paternity leave list, enough to give an idea of trends and differences. I separated these individuals into pitchers (62) and hitters (53) to make for easier comparison. For hitters, I used wRC+ as my key metric and tracked it across 7-day, 14-day, and full season time frames. For pitchers, I used ERA and FIP and tracked those across the same time frames.

There are a few quick caveats. Occasionally players will make an appearance before promptly being demoted or ending up on the injured list. I’ve kept the same number for the 7- and 14-day spans even if the player didn’t make an appearance during that time frame. This only accounts for six players (five pitchers and a hitter), a fairly small share of the sample.

Pitchers Returning From Paternity Leave
Time Frame     ERA Mean       Added ERA to Total    FIP Mean       Added FIP to Total
7 Days         5.7604021      .09293                4.706660804    .07591
14 Days        4.667755914    .07529                4.446168986    .071712
Full Season    3.922911761    .06327                3.99527033     .063349

The impact on hitters isn’t quite as noticeable, without any clear trend. Hitters’ wRC+ is actually higher within seven days of a paternity list visit compared to their full season performance. There is an 8-point gap between performance 14 days after a paternity list visit when compared to the full season numbers, but it’s hard to see how this squares with the 7-day performance. None of the differences are statistically significant at the 95% confidence level.

Hitters Returning From Paternity Leave
Time Frame     wRC+ Mean    Added wRC+ to Total
7 Days         98.58        1.86
14 Days        90.86        1.71
Full Season    96.78        1.83

In summary, a trip to the paternity list doesn’t seem to have much of an impact on players. Maybe Shawn Armstrong was pitching badly just because he’s a bad pitcher; there’s a reason he’s since been DFA’d and passed through waivers. The performance for pitchers does still pique my interest, as there is a consistent trend when looking at 7-day, 14-day, and full-season performance across ERA and FIP. Despite this, the only statistically significant difference is between 7-day ERA and full-season ERA, far from anything conclusive.

The small sample (263.2 innings) and other confounding variables leave the results for pitchers far from conclusive in either direction, especially given the other t-test results. The pattern does appear interesting enough, however, to examine in a larger sample. A future quantitative analysis incorporating additional years of data may be able to provide more comprehensive answers. It may also be an area where qualitative research can provide answers about the impact on pitcher preparation, stamina, and overall performance.
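For readers who want to run this kind of comparison themselves, here is a minimal sketch of a t-statistic calculation. The article does not say which t-test was used; because the same players appear in both time frames, this sketch assumes a paired test, and the function name and sample numbers are hypothetical.

```python
import math
import statistics

def paired_t_statistic(after_leave, full_season):
    """Paired t-statistic and degrees of freedom for two matched samples.

    Each position in the two lists must refer to the same player.
    """
    if len(after_leave) != len(full_season):
        raise ValueError("samples must be the same length")
    diffs = [a - b for a, b in zip(after_leave, full_season)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample std dev
    if std_err == 0:
        raise ValueError("differences have zero variance")
    return mean_diff / std_err, n - 1

# Made-up ERAs for four pitchers, purely to show the call.
t_stat, dof = paired_t_statistic([5.0, 7.0, 9.0, 6.0], [4.0, 5.0, 6.0, 5.0])
print(t_stat, dof)
```

The statistic is then compared against the t distribution with the returned degrees of freedom; with a sample of around 60 players, the two-sided 5% critical value is roughly 2.0.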

Using the Toxicological Prioritization Index To Visualize Baseball

Major League Baseball is awash in advanced statistics that more reliably describe key aspects of players’ offensive and defensive performance. It has been reported that through the use of Statcast, the MLB Advanced Media group can supply teams with 70 fields x 1.5 billion rows of data per season [i]. Yes, billion with a b. This flood of information has supercharged MLB teams’ and the sabermetric community’s development of ever-more useful statistics for describing player performance.

However, this amount of data brings significant challenges. Perhaps chief among them is that while certain individuals may be comfortable with reams of tables and ever-increasing numbers of descriptive statistics, many others prefer or require analyses and visualization tools that convert disparate metrics into informative and readily interpretable graphics.

MLB’s situation has certain similarities to the discipline of safety toxicology, where the use of high-information content assays for characterizing chemicals’ toxicological profiles has exploded [ii]. Drawing conclusions from multiple biomarkers and test systems is challenging, as it requires synthesis of large numbers of dissimilar data sets. One tool that toxicologists have found useful is the Toxicological Prioritization Index, or ToxPi for short [iii]. ToxPi is an analytical software package that was developed to combine multiple sources of evidence by transforming data into integrated, visual profiles. Read the rest of this entry »
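To make the idea of combining dissimilar metrics concrete, here is a minimal sketch of one generic approach: rescale each metric to a 0-to-1 range, then take a weighted average per player. This is not the ToxPi software's code, and the function names, metric names, and weights are all hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread, so no information
    return [(v - lo) / (hi - lo) for v in values]

def composite_scores(metrics, weights):
    """Weighted average of scaled metrics, one score per player.

    `metrics` maps a metric name to a list of values (same player order
    in every list); `weights` maps the same names to relative weights.
    """
    scaled = {name: min_max_scale(vals) for name, vals in metrics.items()}
    total_weight = sum(weights[name] for name in metrics)
    n_players = len(next(iter(metrics.values())))
    return [
        sum(weights[name] * scaled[name][i] for name in metrics) / total_weight
        for i in range(n_players)
    ]

# Three hypothetical players and two hypothetical metrics.
scores = composite_scores(
    {"exit_velocity": [86.0, 89.0, 92.0], "sprint_speed": [26.0, 26.0, 29.0]},
    {"exit_velocity": 1.0, "sprint_speed": 1.0},
)
print(scores)
```

In a radial chart, each scaled metric would become one slice whose length shows the player's standing on that metric.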

What if the Mound Was Moved Back?

Moving the mound back is a proposed solution to the ever-increasing rate of strikeouts in the modern game of baseball. The effect of moving the mound back one foot will be tested in the Atlantic League from August this year. Without the results of this test, we don’t know much about how this rule change could affect the delicate balance between pitchers and hitters. There are many unknowns such as:

  • How much will the perceived velocity decrease benefit hitters?
  • Will the added break on pitches benefit pitchers?
  • Will throwing a further distance add injury risk or cause a loss of pitcher control?
  • Will batters change their approach if it is easier to make contact?
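The first question in the list can be given a rough scale with back-of-the-envelope arithmetic. The sketch below is not the pitch-outcome model used in this article: it assumes constant ball speed (no drag), and the 54.5-foot release distance (60.5 feet minus about 6 feet of extension) is an assumed round number. The function names are hypothetical.

```python
MPH_TO_FPS = 5280 / 3600  # feet per second in one mile per hour

def flight_time_s(velocity_mph, distance_ft):
    """Time for the ball to cover the distance at constant speed."""
    return distance_ft / (velocity_mph * MPH_TO_FPS)

def perceived_velocity_mph(velocity_mph, release_distance_ft, extra_ft):
    """Speed that would give the same flight time over the original distance."""
    return velocity_mph * release_distance_ft / (release_distance_ft + extra_ft)

release_distance = 54.5  # assumed: 60.5 ft minus roughly 6 ft of extension
extra_time = (flight_time_s(95, release_distance + 1)
              - flight_time_s(95, release_distance))
print(extra_time)                                     # added reaction time, seconds
print(perceived_velocity_mph(95, release_distance, 1))  # equivalent velocity, mph
```

Under these assumptions, one extra foot gives the hitter about 7 milliseconds more against a 95 mph fastball, which looks like a pitch a little under 2 mph slower thrown from the current distance.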

In this article I aim to use my model of predicted pitch outcomes to investigate how moving the mound back may change the game. I’ve written previously about modeling the deadened baseball and I shall take a similar approach here. Read the rest of this entry »

How Sticky Are Walk Rates After Velocity Changes?

An increasingly popular strategy for drafting pitchers is taking ones with plus control and underwhelming fastballs, with the idea being that the club’s player development team can coax out a velocity jump. Intuitively, this makes sense since it is relatively easy to develop velocity and relatively difficult to improve control. Get a pitcher from control-only to above-average stuff with plus control, and suddenly you have a solid rotation arm out of an org-depth pitcher.

However, skill improvement does not happen in a vacuum, and there are potential side effects to this strategy, namely that control might end up getting worse as velocity increases. In general, these end up being fine tradeoffs since we are in a power-oriented offensive environment, but investigating these effects is important in evaluating how valuable this strategy is, which is what I do in this article. Read the rest of this entry »

An Examination of Rebuilding Team Timelines

Rebuilding has become the popular way for MLB franchises to construct a World Series contender. Considering the league’s structure of compensating the worst teams with the best draft picks, it seems like a viable strategy to maximize your losses in order to obtain the services of the best amateur talent available. The Astros and Cubs are two of the more recent franchises to successfully cap an extensive rebuilding process with a World Series victory, and both franchises acquired top-10 draft picks for several years before they turned the corner and became champions. But how often does this strategy work, and how long does a rebuild take?

If an organization’s strategy is to not win games right away, when do the fans and ownership realize that the rebuilding process has failed and that their team is in the middle of a downward spiral of ineptitude? I am sure there are fans of the Pittsburgh Pirates and Kansas City Royals from the 1990s and 2000s who know how difficult it is to build a contender and cringe whenever they hear the term rebuild. Hopefully this article can provide a reasonable timeline for contention and an objective overview of how a franchise’s rebuilding effort should be progressing.

For my dataset, I gathered the GM or President of Baseball Operations for each organization since 1998. I chose 1998 because it was the first year the league consisted of 30 teams and it also happened to be the first full season for the current longest-tenured executives, Billy Beane and Brian Cashman. If an executive’s tenure with the team started before the 1998 season, their entire tenure was included in the dataset. This means Braves GM John Schuerholz’s regime is measured in its entirety from 1991-2007 and not just from 1998-2007. Read the rest of this entry »

Why Are so Many Runs Scored in the Bottom of the First Inning?

After starting to look at some inning-by-inning data from my baseball win expectancy finder for another project, I stumbled across something weird that I can’t explain. Here’s a graph of expected runs scored per inning:

Graph of expected runs by inning

Check out how high the bottom of the first inning is. On average, 0.6 runs are scored then compared to 0.5 runs in the top of the first. That’s a huge difference! Let’s look closer:

Graph of home advantage in runs by inning

Holy outlier, Batman! So what’s going on? Here are some ideas:

Read the rest of this entry »

Properly Diving Into Expected Stats

“This player is having a good year, but his xwOBA is slightly lower than his wOBA, therefore he’s going to get worse.”

This is a common concept you’ll hear within the baseball analysis community. With the data made available to us, it’s easy to come to conclusions like this. However, it’s not always about the data made available to us, but the analysis that comes from it.

To better grasp how this “problem” of data analysis came to fruition, let us go back in time.

Starting in 2015, the public was provided with Statcast metrics for MLB players via Baseball Savant. Among those stats were exit velocity, launch angle, hard-hit rate, pitch velocity, sprint speed, and, to be honest, practically anything that can be measured! It’s a fabulous website that provides very useful information we should be exceptionally grateful for.

The most popular metrics on the website, however, are their expected stats: expected batting average (xBA), expected weighted on-base average (xwOBA), expected on-base percentage (xOBP), expected slugging percentage (xSLG), and expected isolated power (xISO). Essentially, these statistics are what you’d expect based on the name; they indicate what a player’s “true talent level” is based on the quality of their contact, frequency of contact, and, depending on the batted ball, sprint speed.

This would appear to be a gold mine on the surface. With the ability to know what numbers a player deserves to have, we should be able to separate their talent level from outside circumstances, and thus better predict future performance. Yet that actually isn’t the case. Read the rest of this entry »

Peer Learning Among MLB Umpires

A growing group of social scientists are researching peer learning, looking to answer the question “does an individual learn from their network?” In this post, I’ll present some evidence that MLB umpires “learn” from their peers in their assigned crews.

To quantify this, I calculate “call quality” for each umpire in each season from 2008 to 2019. Call quality is determined in a similar way to many umpire score card measures: I take PITCHf/x data for each game that a given umpire was assigned to home plate, subset to all called strikes and called balls, and overlay the true strike zone to calculate the proportion of correct calls.
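The call quality measure can be sketched as follows. This is a simplified illustration rather than the code behind the analysis: it assumes each called pitch carries a horizontal location (px) and height (pz) in feet at the front of the plate plus the batter's zone bottom and top, it ignores the radius of the ball, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

HALF_PLATE_FT = 17 / 12 / 2  # half the width of the 17-inch plate, in feet

@dataclass
class CalledPitch:
    px: float      # horizontal location at the plate, feet from center
    pz: float      # height at the plate, feet
    sz_bot: float  # bottom of this batter's strike zone, feet
    sz_top: float  # top of this batter's strike zone, feet
    call: str      # "strike" or "ball" (called pitches only)

def in_zone(pitch):
    """True if the pitch location falls inside the simplified zone."""
    return (abs(pitch.px) <= HALF_PLATE_FT
            and pitch.sz_bot <= pitch.pz <= pitch.sz_top)

def call_quality(pitches):
    """Proportion of called pitches where the call matches the zone."""
    if not pitches:
        raise ValueError("no called pitches")
    correct = sum((p.call == "strike") == in_zone(p) for p in pitches)
    return correct / len(pitches)

# Four made-up pitches: three correct calls and one missed strike.
sample = [
    CalledPitch(0.0, 2.5, 1.6, 3.4, "strike"),
    CalledPitch(1.2, 2.5, 1.6, 3.4, "ball"),
    CalledPitch(0.3, 4.0, 1.6, 3.4, "ball"),
    CalledPitch(0.5, 2.0, 1.6, 3.4, "ball"),
]
print(call_quality(sample))
```

Computing this over every home-plate assignment in a season gives one call quality number per umpire per season.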

I’m specifically interested in whether an umpire’s call quality is driven by the call quality of umpires they have been assigned to work with in the past. Read the rest of this entry »

Learning a Lesson From Basketball Analytics

I read an interesting article here by Brian Woolley that attempted to adjust batter performance for the quality of pitching they face. It’s interesting because we tend to assume when we look at a player’s performance that they faced more or less the same quality of competition as everyone else, even though we know that may not be the case, especially in small samples. This is even more evident when we look at minor league performance, where the quality of competition can vary wildly from one prospect to another. How can we discard the assumption of equal quality of competition and try to get a more accurate picture of a player’s performance? In basketball analytics, this quality of competition piece is an even more pronounced issue because players are selected to play in specific situations by a coach, unlike in baseball, where the lineup card dictates when everyone bats.

There is a metric in basketball called Regularized Adjusted Plus-Minus (RAPM) which attempts to value individual players based on their contribution to the outcome of a game while accounting for the quality of the teammates and opponents when on the court. The initial idea in the public sphere came from Dan T. Rosenbaum in a 2004 article detailing Adjusted Plus-Minus (APM). You can read more about the basketball variant in the linked article, but I’ll describe how I adapted it to a baseball context.

To set up the system, I created a linear regression model that takes each player as an independent variable, every plate appearance as an observation, and the outcome of the plate appearance as the dependent variable we’re trying to predict. Specifically, if a player is not part of a plate appearance, the value of their independent variable is 0; if they are the pitcher, it is -1; and if they are the batter, it is 1. Note that players who appear as both a pitcher and a hitter are given two independent variables so we can measure their impact on both sides of the ball separately. The outcome of the plate appearance is defined in terms of weighted On-Base Average (wOBA). Read the rest of this entry »
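The encoding described above can be sketched in plain Python. This is an illustration under stated assumptions, not the article's code: the function name, the player names, and the wOBA values in the example are hypothetical, and the regression fit itself is left out.

```python
def encode_plate_appearances(plate_appearances):
    """Build the regression inputs from (batter, pitcher, woba_value) tuples.

    Returns (columns, X, y). Each column is a (player, role) pair, so a
    player who both bats and pitches gets two separate columns.
    """
    index = {}  # (player, role) -> column position, in first-seen order
    for batter, pitcher, _ in plate_appearances:
        index.setdefault((batter, "bat"), len(index))
        index.setdefault((pitcher, "pit"), len(index))

    X, y = [], []
    for batter, pitcher, woba_value in plate_appearances:
        row = [0] * len(index)            # 0 for everyone not involved
        row[index[(batter, "bat")]] = 1   # the batter gets a 1
        row[index[(pitcher, "pit")]] = -1  # the pitcher gets a -1
        X.append(row)
        y.append(woba_value)
    return list(index), X, y

# Three made-up plate appearances; "P" appears as both pitcher and batter.
columns, X, y = encode_plate_appearances([
    ("A", "P", 0.9),
    ("B", "P", 0.0),
    ("P", "Q", 2.0),
])
for row in X:
    print(row)
```

The resulting matrix and outcome vector would then go to a regression routine. With far more players than any single player has plate appearances against each opponent, a regularized fit such as ridge regression is the usual choice in the basketball version, which is where the "Regularized" in RAPM comes from.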