When I first learned about a mysterious cabal of smart nerds who were analyzing baseball, I took the words I got from them as though passed down from heaven. I read Moneyball, of course. But I also read about DIPS theory, wOBA, and whatever else I could get my hands on. I read The Book so many times I wore it out and had to buy a new copy. It felt like there were cheat codes just under the surface of the sport that someone was highlighting for me.
Many of those lessons from 15 years ago are still kicking around in my head. I’m skeptical of BABIP-driven hitters, perhaps more skeptical than I should be. I dismiss batters with anomalous platoon splits, even if there’s something about them that really does make them unique. And recently I realized that I might be misunderstanding the signaling value of strikeout rate.
Back in the early 2000s, batters who struck out more hit better. That sounds counterintuitive, because strikeouts are bad. It’s actually not that weird though. Barry Bonds struck out more than Ozzie Smith in his career, just to pick two illustrative examples. Bonds isn’t even a great example, because his batting eye was otherworldly. Alex Rodriguez struck out twice as often as Omar Vizquel.
The popular opinion was that strikeouts weren’t really a negative indicator. A strikeout was bad, sure, but it was often a hidden indicator of some positive process under the hood. No one would say that being sore is good for your health, and yet people in great shape are probably sore more often than sedentary types, what with all the exercising. Amount of time spent being sore very likely has a positive correlation with health.
Take a look at a few correlations from 2000 to 2010. These look at every batter with 300 or more plate appearances in a given year:
[Chart: Various Correlations to wOBA By Year, 2000–2010]
Hey, more strikeouts mean more wOBA. Correlation, as you may have heard, doesn’t equal causation, and the slope was minimal (half a point of wOBA for every percentage point of strikeout rate), but it was there.
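A sketch of that kind of check, using made-up batter seasons rather than real FanGraphs data (the numbers below are invented, with the early-2000s pattern of roughly half a point of wOBA per point of strikeout rate baked in so there's something to find):

```python
import numpy as np

# Hypothetical batter seasons; the real exercise uses every 300+ PA season.
rng = np.random.default_rng(0)
n = 2000
k_pct = rng.normal(16.0, 4.0, n)  # strikeout rate, in percentage points
# Bake in a small *positive* K%-to-wOBA link (~0.5 points of wOBA
# per point of strikeout rate), plus noise
woba = 0.325 + 0.0005 * k_pct + rng.normal(0, 0.010, n)

r = np.corrcoef(k_pct, woba)[0, 1]     # Pearson correlation
slope = np.polyfit(k_pct, woba, 1)[0]  # wOBA per 1% of strikeout rate
```

Run per year on real data, `r` and `slope` are the two columns the charts here summarize.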
Let’s zoom that data out to every year since 2000:
[Chart: Various Correlations to wOBA By Year, 2000–present]
Wuh oh. Walks and production on contact are still correlated with positive outcomes, but higher strikeout rates are now associated with lower wOBA. That is unexpected.
Naturally, no one was ever saying that batters should strike out more to improve their outcomes. There was simply covariance (or correlation if you’re a normalizing sort) between striking out and doing good things like walking or smashing the ball. Maybe that’s the change. Did new training methods, or something else I can’t think of, somehow eliminate the link between strikeouts and positive outcomes?
[Chart: K% Correlation to Positive Outcomes]
Nope! Those linkages are running strong, even if the correlation to walks has been a little bit wiggly of late. Again, this doesn’t say anything about causality, but it’s not hard to imagine that deeper counts lead to both more walks and more strikeouts, while swinging really, really, ridiculously hard produces more strikeouts and more damage when you do hit the ball. Neither of those fundamental relationships seems to have changed much in the last twenty years.
So what gives? I thought I’d approach the problem using an ad hoc method. I predicted each batter’s wOBA in each year using only their wOBA on contact (wOBACON if you’re hungry). From there, I took the “error,” the difference between the prediction and the batter’s actual wOBA, and compared that error term to a batter’s strikeout rate. This should, in theory, handle the covariance issue; a batter with a high strikeout rate but also high production on contact will have a higher predicted wOBA than one who strikes out less and dinks the ball more, which will get rid of that annoying cross-correlation.
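Here's a minimal sketch of that residual approach on invented data. All the numbers are assumptions for illustration; I've built in a penalty of about three points of wOBA per point of strikeout rate so the error term has something to recover:

```python
import numpy as np

# Hypothetical batter seasons; assumed numbers, not real league data.
rng = np.random.default_rng(1)
n = 2000
k_pct = rng.normal(20.0, 5.0, n)           # strikeout rate, percentage points
wobacon = rng.normal(0.370, 0.030, n)      # wOBA on contact
# Construct wOBA so that, on top of contact quality, each point of
# strikeout rate costs about three points of wOBA
woba = 0.150 + 0.7 * wobacon - 0.003 * k_pct + rng.normal(0, 0.005, n)

# Step 1: predict each batter's wOBA from wOBACON alone
b1, b0 = np.polyfit(wobacon, woba, 1)
predicted = b0 + b1 * wobacon

# Step 2: take the "error" and compare it to strikeout rate
error = woba - predicted
r = np.corrcoef(k_pct, error)[0, 1]
slope = np.polyfit(k_pct, error, 1)[0]     # wOBA per 1% of strikeout rate
```

Because strikeout rate and contact quality are drawn independently here, the residual step cleanly isolates the strikeout penalty; on real data, the same two steps handle the covariance instead of assuming it away.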
When we look only at how strikeout rate contributes to this error term, we get a real relationship. Strikeout rate is strongly correlated to the gap between predicted and actual production:
[Table: K% Correlation to wOBA Error (columns: Year, Correlation, wOBA Per 1% K)]
That correlation, and the slope, are strong and consistent over time, which isn’t shocking once you know how well someone does when they put the ball in play. For every one-point increase in strikeout rate, we’d expect a wOBA roughly three points lower on average (assuming contact quality stays constant).
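That slope compounds quickly over realistic gaps in strikeout rate:

```python
# Back-of-the-envelope with the roughly-three-points slope: contact quality
# held constant, compare a batter at a 25% K rate to one at a 15% K rate.
slope = -0.003      # wOBA per 1% of strikeout rate
gap = 25 - 15       # ten points of strikeout rate
delta = gap * slope # about 30 points of wOBA, the gap between a solid
                    # regular and a fringe one
```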
But there’s something weird here. Look at that table again: Both the correlation and the slope are consistent over time. Adding strikeouts seems to hurt exactly as much, after accounting for loudness of contact, as it did in 2000. And yet we saw the evidence above: higher strikeout rates are now a sign of worse batters, not better.
Why is this? It’s because there’s a hidden factor we haven’t yet considered. A single point of strikeout rate might be tied to a similar decline in wOBA, but strikeout rates are far more dispersed now than they’ve ever been. It’s hardly a secret that strikeout rates have crept up over the years; non-pitchers struck out 15.9% of the time in 2000 and 22.4% of the time in 2019. That higher rate comes with wider variation:
[Table: K% Variation Over Time (columns: Year, K%, StDev K%, 3-Year Avg StDev)]
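The dispersion columns in a table like that are straightforward to compute from batter-season data; here's a toy version with made-up strikeout rates, just to show the mechanics:

```python
import pandas as pd

# Toy batter-season data: one row per (year, batter), K% in percentage points.
df = pd.DataFrame({
    "year":  [2000] * 3 + [2001] * 3 + [2002] * 3,
    "k_pct": [14.0, 16.0, 18.0, 13.0, 16.5, 20.0, 12.0, 17.0, 22.0],
})

# League K% and its spread, by year
by_year = df.groupby("year")["k_pct"].agg(["mean", "std"])
# Smooth the spread with a 3-year average of the standard deviation
by_year["std_3yr"] = by_year["std"].rolling(3).mean()
```

In this toy data the league-average strikeout rate barely moves, but the year-to-year standard deviation climbs from 2.0 to 5.0, which is the pattern the table above shows in the real numbers.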
And that wider variation seems to be enough. It was never good to be a higher-strikeout batter; there simply wasn’t as much variation. Every batter was clumped in the middle, and production on contact was more important. But production on contact hasn’t changed much in magnitude, and it hasn’t changed at all in variance:
[Table: wOBACON Variation Over Time (columns: Year, wOBACON, StDev, 3-Year Avg StDev)]
So now normal dispersion in strikeout rate is bigger, which means that a batter one standard deviation below the mean and a batter one standard deviation above the mean are much further apart in strikeout rate. It’s easier, in this day and age, to strike out enough that you just aren’t playable. Variance is wider, which means more players fit that bill. And correlation is a simple thing; it more or less looks at the data points and draws a line. A handful of batters striking out so much they torpedo their stats could flip the observed correlation of strikeout rate and wOBA, and now there’s an intuitive result where once there was a confusing one.
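That flip is easy to reproduce on made-up data: hold the per-point strikeout penalty and the K%-to-contact-quality link fixed, and just widen the spread of strikeout rates. Every number below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
wobacon = rng.normal(0.370, 0.030, n)  # contact quality, fixed across eras
cost = -0.003   # wOBA per 1% of strikeout rate, fixed across eras
link = 1.5      # K% points gained per SD of contact quality (the covariance)

def raw_corr(k_sd):
    """Raw K%-to-wOBA correlation for a given spread of strikeout rates."""
    k = 18 + link * (wobacon - 0.370) / 0.030 + rng.normal(0, k_sd, n)
    woba = 0.150 + 0.7 * wobacon + cost * k
    return np.corrcoef(k, woba)[0, 1]

narrow = raw_corr(k_sd=2.0)  # tightly clumped K rates: early-2000s style
wide = raw_corr(k_sd=6.0)    # widely dispersed K rates: modern style
```

With the narrow spread, the covariance with contact quality dominates and the raw correlation comes out positive; widen the spread and the fixed per-point penalty takes over, flipping the sign, even though nothing about the penalty itself changed.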
I’d like to point out, at this point, that this whole article has arguably been nonsense. There’s no real meaning in these relationships; as we saw, adding strikeouts is as costly as it ever was, and racking up value when you put the ball in play is still king. In fact, it’s always possible that I interpreted these correlations incorrectly; I’m no mathematician, and I’m merely spitballing based on my intuition of how these relationships have changed. The magnitude of everything is tiny, and there’s nothing causal. It’s just fun with numbers.
But the baseball season hasn’t started yet, and in an increasingly grim world, we could all use a little frivolous data entertainment. If, like me, you enjoy a little mathematical tomfoolery, I hope this fits the bill. Strikeouts have always been bad! They just show up that way now, even if you don’t take the time to control for other things.