When I first learned about a mysterious cabal of smart nerds who were analyzing baseball, I took their words as though they had been handed down from heaven. I read Moneyball, of course. But I also read about DIPS theory, wOBA, and whatever else I could get my hands on. I read The Book so many times I wore it out and had to buy a new copy. It felt like there were cheat codes just under the surface of the sport that someone was highlighting for me.

Many of those lessons from 15 years ago are still kicking around in my head. I’m skeptical of BABIP-driven hitters, perhaps more skeptical than I should be. I dismiss batters with anomalous platoon splits, even if there’s something about them that really does make them unique. And recently I realized that I might be misunderstanding the signaling value of strikeout rate.

Back in the early 2000s, batters who struck out more hit better. That sounds counterintuitive, because strikeouts are bad. It’s actually not that weird, though. Over their careers, Barry Bonds struck out more than Ozzie Smith, just to pick two illustrative examples. Bonds isn’t even a great example, because his batting eye was otherworldly. Alex Rodriguez struck out twice as often as Omar Vizquel.

The popular opinion was that strikeouts weren’t really a negative indicator. A strikeout was bad, sure, but it was often a hidden indicator of some positive process under the hood. No one would say that being sore is good for your health, and yet people in great shape are probably sore more often than sedentary types, what with all the exercising. Amount of time spent being sore very likely has a positive correlation with health.

Take a look at a few correlations from 2000 to 2010, covering every batter with 300 or more plate appearances in a given year:

Various Correlations to wOBA By Year

Year  Strikeouts  Walks  wOBACON
2000  0.054       0.568  0.899
2001  0.043       0.606  0.895
2002  0.066       0.685  0.872
2003  0.037       0.581  0.892
2004  0.026       0.646  0.851
2005  0.121       0.597  0.868
2006  0.074       0.575  0.858
2007  0.080       0.530  0.853
2008  0.081       0.514  0.837
2009  0.041       0.536  0.842
2010  0.177       0.537  0.863

Hey, more strikeouts mean more wOBA. Correlation, as you may have heard, doesn’t equal causation, and the slope was minimal (half a point of wOBA for every percentage point of strikeout rate), but it was there.
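
For anyone who wants to reproduce a table like this, here is a minimal sketch of the per-season calculation. It assumes you already have one record per batter-season with plate appearances, strikeout rate (in percentage points) and wOBA; the `BatterSeason` record and the sample numbers are hypothetical, invented for illustration rather than taken from real players. It applies the 300-PA cutoff, then computes the Pearson correlation and the least-squares slope. Both come from the same covariance: the correlation normalizes it by both standard deviations, the slope by the variance of K% alone.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class BatterSeason:
    """Hypothetical per-season record; field names are illustrative only."""
    name: str
    pa: int
    k_pct: float  # strikeout rate in percentage points, e.g. 22.4
    woba: float


def qualified(seasons, min_pa=300):
    """Keep batter-seasons at or above the plate-appearance cutoff."""
    return [s for s in seasons if s.pa >= min_pa]


def correlation_and_slope(xs, ys):
    """Return (Pearson r, least-squares slope of y on x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)  # covariance normalized by both spreads
    slope = sxy / sxx          # covariance normalized by the spread of x only
    return r, slope


# Invented sample: four qualified hitters and one part-timer who gets filtered out.
seasons = [
    BatterSeason("A", 600, 10.0, 0.3200),
    BatterSeason("B", 550, 15.0, 0.3225),
    BatterSeason("C", 500, 20.0, 0.3250),
    BatterSeason("D", 450, 25.0, 0.3275),
    BatterSeason("E", 120, 40.0, 0.2000),
]
pool = qualified(seasons)
r, slope = correlation_and_slope([s.k_pct for s in pool], [s.woba for s in pool])
print(len(pool), round(r, 3), round(slope, 4))  # 4 1.0 0.0005
```

A slope of 0.0005 per point of K% is the "half a point of wOBA for every percentage point of strikeout rate" scale described above; the toy data simply make the line perfect so the numbers are easy to check.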

Let’s zoom that data out to every year since 2000:

Various Correlations to wOBA By Year

Year  Strikeouts  Walks  wOBACON
2000   0.054      0.568  0.899
2001   0.043      0.606  0.895
2002   0.066      0.685  0.872
2003   0.037      0.581  0.892
2004   0.026      0.646  0.851
2005   0.121      0.597  0.868
2006   0.074      0.575  0.858
2007   0.080      0.530  0.853
2008   0.081      0.514  0.837
2009   0.041      0.536  0.842
2010   0.177      0.537  0.863
2011   0.036      0.519  0.869
2012  -0.006      0.391  0.845
2013   0.007      0.447  0.838
2014  -0.098      0.373  0.800
2015   0.051      0.554  0.838
2016  -0.040      0.447  0.789
2017  -0.063      0.504  0.766
2018  -0.104      0.521  0.784
2019  -0.084      0.406  0.804

Wuh oh. Walks and production on contact are still correlated with positive outcomes, but higher strikeout rates are now associated with lower wOBA. That is unexpected.

Naturally, no one was ever saying that batters should strike out more to improve their outcomes. There was simply covariance (or correlation if you’re a normalizing sort) between striking out and doing good things like walking or smashing the ball. Maybe that’s the change. Did new training methods, or something else I can’t think of, somehow eliminate the link between strikeouts and positive outcomes?

K% Correlation to Positive Outcomes

Year  Walks  wOBACON
2000  0.278  0.427
2001  0.285  0.431
2002  0.198  0.501
2003  0.172  0.427
2004  0.147  0.498
2005  0.244  0.550
2006  0.248  0.522
2007  0.203  0.539
2008  0.283  0.558
2009  0.275  0.516
2010  0.270  0.602
2011  0.259  0.475
2012  0.256  0.478
2013  0.267  0.499
2014  0.184  0.467
2015  0.204  0.542
2016  0.204  0.534
2017  0.155  0.541
2018  0.123  0.478
2019  0.159  0.468

Nope! Those linkages are running strong, even if the correlation to walks has been a little bit wiggly of late. Again, this doesn’t say anything about causality, but it’s not hard to imagine that deeper counts lead to both more walks and more strikeouts, while swinging really, really, ridiculously hard produces more strikeouts and more damage when you do hit the ball. Neither of those fundamental relationships seems to have changed much in the last twenty years.

So what gives? I thought I’d approach the problem using an ad hoc method. I predicted each batter’s wOBA in each year using only their wOBA on contact (wOBACON if you’re hungry). From there, I took the “error,” the batter’s actual wOBA minus that prediction, and compared that error term to the batter’s strikeout rate. This should, in theory, handle the covariance issue; a batter with a high strikeout rate but also high production on contact will have a higher predicted wOBA than one who strikes out less and dinks the ball more, which gets rid of that annoying cross-correlation.
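
A minimal sketch of that two-step procedure, as I read the description: fit wOBA on wOBACON by ordinary least squares, take each batter’s actual wOBA minus the fitted value, then correlate those errors with K% and fit their slope. The function names and the four toy batters are hypothetical. The toy data are built so that K% and wOBACON are uncorrelated, in which case the error-term slope recovers the assumed cost of -.003 wOBA per point of K% exactly.

```python
from math import sqrt


def ols(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def k_vs_contact_error(wobacon, woba, k_pct):
    """Correlate K% with the part of wOBA that wOBACON alone fails to predict."""
    a, b = ols(wobacon, woba)  # step 1: predict wOBA from wOBACON only
    # Step 2: error = actual wOBA minus predicted wOBA.
    errors = [w - (a + b * c) for c, w in zip(wobacon, woba)]
    r = pearson(k_pct, errors)     # step 3: how strongly K% tracks the error
    _, slope = ols(k_pct, errors)  # wOBA change per point of K%
    return r, slope


# Toy batters (invented): wOBA = 0.1 + 0.6 * wOBACON - 0.003 * K%.
wobacon = [0.33, 0.33, 0.41, 0.41]
k_pct = [15.0, 25.0, 15.0, 25.0]
woba = [0.1 + 0.6 * c - 0.003 * k for c, k in zip(wobacon, k_pct)]
print(k_vs_contact_error(wobacon, woba, k_pct))  # approximately (-1.0, -0.003)
```

This follows the description rather than a single regression on wOBACON and K% together; when those two inputs are correlated, the two approaches can give somewhat different slopes.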

When we look only at how strikeout rate contributes to this error term, we get a real relationship. Strikeout rate is strongly and negatively correlated with the gap between predicted and actual production:

K% Correlation to wOBA Error

Year  Correlation  wOBA Per 1% K
2000  -0.754       -.0030
2001  -0.769       -.0029
2002  -0.755       -.0029
2003  -0.762       -.0030
2004  -0.759       -.0029
2005  -0.717       -.0025
2006  -0.726       -.0027
2007  -0.728       -.0027
2008  -0.707       -.0024
2009  -0.729       -.0026
2010  -0.680       -.0023
2011  -0.761       -.0026
2012  -0.767       -.0026
2013  -0.752       -.0026
2014  -0.786       -.0028
2015  -0.739       -.0026
2016  -0.750       -.0027
2017  -0.743       -.0027
2018  -0.771       -.0029
2019  -0.775       -.0028

That correlation and the slope are strong and consistent over time. Once you know how well someone does when they put the ball in play, that’s not shocking. For every one-point increase in strikeout rate, we’d expect to see a wOBA roughly three points lower on average (assuming contact quality stays constant).

But there’s something weird here. Look at that table again: Both the correlation and the slope are consistent over time. Adding strikeouts seems to hurt exactly as much, after accounting for loudness of contact, as it did in 2000. And yet we saw the evidence above: higher strikeout rates are now a sign of worse batters, not better.

Why is this? It’s because there’s a hidden factor we haven’t yet considered. A single point of strikeout rate might be tied to a similar decline in wOBA, but strikeout rates are far more dispersed now than they’ve ever been. It’s hardly a secret that strikeout rates have crept up over the years; non-pitchers struck out 15.9% of the time in 2000 and 22.4% of the time in 2019. That higher rate comes with wider variation:

K% Variation Over Time

Year  K%     StDev K%  3-Year Avg StDev
2000  15.9%  4.79%
2001  16.8%  5.08%
2002  16.3%  5.45%     5.11%
2003  15.9%  4.75%     5.10%
2004  16.3%  5.19%     5.13%
2005  16.0%  5.06%     5.00%
2006  16.3%  5.11%     5.12%
2007  16.6%  5.35%     5.17%
2008  17.0%  5.63%     5.36%
2009  17.5%  5.37%     5.45%
2010  18.0%  5.46%     5.49%
2011  18.1%  5.37%     5.40%
2012  19.2%  5.69%     5.51%
2013  19.3%  5.87%     5.64%
2014  19.9%  6.03%     5.86%
2015  19.9%  5.72%     5.87%
2016  20.6%  5.79%     5.85%
2017  21.2%  6.04%     5.85%
2018  21.7%  5.77%     5.87%
2019  22.4%  5.92%     5.91%
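
The “3-Year Avg StDev” column appears to be a trailing average of the current season and the two before it, which would explain why 2000 and 2001 have no entry. That is my reading of the numbers, not something the table states. A sketch of that calculation, with the helper name `trailing_mean` chosen for illustration:

```python
def trailing_mean(values, window=3):
    """Average of each value and the (window - 1) values before it.

    Entries without a full window of history are None, matching the
    blank 2000 and 2001 cells.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            recent = values[i + 1 - window : i + 1]
            out.append(sum(recent) / window)
    return out


# StDev K% for 2000-2004, copied from the table above (percentage points).
k_sd = [4.79, 5.08, 5.45, 4.75, 5.19]
print([None if v is None else round(v, 2) for v in trailing_mean(k_sd)])
# [None, None, 5.11, 5.09, 5.13]
```

The 2003 value comes out as 5.09 here against 5.10 in the table, possibly because the original was computed from unrounded standard deviations; the other years match to the hundredth. The 3-year column in the wOBACON table below also looks consistent with the same trailing calculation, again to within rounding.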

And that wider variation seems to be enough. It was never good to be a higher-strikeout batter; there simply wasn’t as much variation in strikeout rate. Every batter was clumped in the middle, so production on contact mattered more. But production on contact hasn’t changed much in magnitude, and its spread has barely changed at all:
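
A deliberately simplified linear model shows how spread alone can change the sign. The model is my assumption, not the article’s analysis. Suppose wOBA = alpha + beta * wOBACON - gamma * K% + noise, with the noise unrelated to K%, and let rho be the correlation between K% and wOBACON (the earlier table has that link holding roughly steady, between about 0.43 and 0.60). Then Cov(K%, wOBA) = beta * rho * sd_k * sd_con - gamma * sd_k^2. The first term grows linearly with the spread of K% and the second with its square, so the covariance (and therefore the correlation) stays positive only while sd_k is below beta * rho * sd_con / gamma. The parameter values below are invented round numbers that put that threshold at 5 points of K%. They are not estimates from the data above.

```python
def cov_k_woba(sd_k, sd_con, rho, beta, gamma):
    """Cov(K%, wOBA) under the toy model wOBA = alpha + beta*wOBACON - gamma*K% + noise.

    First term: beta * Cov(K%, wOBACON) = beta * rho * sd_k * sd_con.
    Second term: gamma * Var(K%), the direct cost of the strikeouts themselves.
    """
    return beta * rho * sd_k * sd_con - gamma * sd_k ** 2


def sign_flip_sd(sd_con, rho, beta, gamma):
    """K% standard deviation above which the toy covariance turns negative."""
    return beta * rho * sd_con / gamma


# Invented parameters: K% in percentage points, gamma in wOBA per K% point.
params = dict(sd_con=0.05, rho=0.5, beta=1.0, gamma=0.005)
print(sign_flip_sd(**params))  # about 5.0
for sd_k in (4.0, 6.0):
    # Positive below the threshold, negative above it.
    print(sd_k, round(cov_k_woba(sd_k, **params), 4))
```

In this toy model the per-point cost gamma never changes, mirroring the stable slopes in the error-term table; only the spread of strikeout rate moves.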

wOBACON Variation Over Time

Year  wOBACON  StDev  3-Year Avg StDev
2000  .383     .061
2001  .371     .060
2002  .367     .057   .059
2003  .369     .053   .057
2004  .374     .052   .054
2005  .366     .051   .052
2006  .376     .052   .052
2007  .376     .055   .053
2008  .375     .053   .053
2009  .376     .052   .053
2010  .374     .057   .054
2011  .365     .052   .054
2012  .372     .055   .055
2013  .368     .055   .054
2014  .369     .052   .054
2015  .370     .055   .054
2016  .380     .050   .053
2017  .388     .055   .053
2018  .379     .052   .052
2019  .390     .056   .054

So the normal dispersion in strikeout rate is now bigger, which means that a batter one standard deviation below the mean and a batter one standard deviation above it are much farther apart in strikeout rate. It’s easier, in this day and age, to strike out so much that you just aren’t playable, and wider variance means more players fit that bill. And correlation is a simple thing; it more or less looks at the data points and draws a line. A handful of batters striking out so much they torpedo their stats could flip the observed correlation between strikeout rate and wOBA, and now there’s an intuitive result where once there was a confusing one.
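
Two rough checks on this paragraph, offered as a sketch. The first is arithmetic on the article’s own tables. It treats the error-term slope as a constant per-point cost for batters with equal contact quality (an assumption), then asks how far apart in wOBA a batter one standard deviation below the mean K% and one a standard deviation above it would be in 2000 and in 2019. The second uses invented numbers, not real batters, to show that a Pearson correlation can change sign when a couple of extreme high-strikeout, low-wOBA points are added to an otherwise positive trend. The function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def one_sd_gap(sd_k, woba_per_k_point):
    """wOBA difference between batters at mean + 1 SD and mean - 1 SD of K%,
    assuming equal contact quality and a constant per-point cost."""
    return 2 * sd_k * woba_per_k_point


# StDev K% and wOBA Per 1% K, copied from the tables above.
print(round(one_sd_gap(4.79, -0.0030), 4))  # 2000: -0.0287
print(round(one_sd_gap(5.92, -0.0028), 4))  # 2019: -0.0332

# Invented data: four batters on a gently rising line...
k = [10.0, 15.0, 20.0, 25.0]
woba = [0.300, 0.310, 0.320, 0.330]
print(pearson(k, woba) > 0)  # True
# ...plus two extreme strikeout-prone batters with poor production.
k_all = k + [35.0, 38.0]
woba_all = woba + [0.250, 0.240]
print(pearson(k_all, woba_all) < 0)  # True
```

Under those assumptions, the one-SD gap widens from about 29 points of wOBA to about 33, even though the per-point cost barely moved.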

I should point out here that this whole article has arguably been nonsense. There’s no real meaning in these relationships; as we saw, adding strikeouts is as costly as it ever was, and racking up value when you put the ball in play is still king. In fact, it’s always possible that I’ve interpreted these correlations incorrectly; I’m no mathematician, and I’m merely spitballing based on my intuition of how these relationships have changed. The magnitude of everything is tiny, and there’s nothing causal. It’s just fun with numbers.

But the baseball season hasn’t started yet, and in an increasingly grim world, we could all use some frivolous data entertainment. If, like me, you enjoy a little mathematical tomfoolery, I hope this fits the bill. Strikeouts have always been bad! They just show up that way now, even if you don’t take the time to control for other things.
