Alphabet Soup: How To Solve A Problem Like Wonky Stats

We looked at ballpark factors last week, and discussed how wOBA and other rate stats can vary by virtue of the park they were generated in. At the end of the post, I left you waiting with bated breath for the reason behind the big OPS+ discrepancy between Evan Longoria and Adrian Beltre.

The short answer? I was more or less right.

Fangraphs’ wRC+ is based on wOBA, while B-R’s OPS+ is based on OBP and SLG. My first question was whether the discrepancy came from the way OPS simply adds its two components together, where wOBA weights each offensive event individually.

But the problem doesn’t seem to lie in the fundamental components of the number. Beltre’s OPS is .892 and Longoria’s is .850. Not terribly different, and Beltre’s is still higher than Longoria’s. So somewhere in the application of league averages and park factors, the two are flipped, and Longoria comes out with a 10% lead.

The only thing I could think of that would give Longoria such an advantage was an extreme ballpark factor, which makes sense since the two play in parks at complete opposite ends of the spectrum. We also know that B-R uses single-season park factors, which can vary more than multi-year factors.

B-R publishes park factors on each team’s page. I wasn’t able to find a list of Fangraphs’ park factors by year, but I did find something else interesting. Take a look at the page for the 2011 Texas Rangers. It lists both multi-year and single-season park factors.

The Rangers’ multi-year park factors for batting and pitching are 111 and 109, respectively. Above 100 indicates a hitter-friendly park. The single-season numbers are 117 for batters and 115 for pitchers. That’s a pretty significant discrepancy – more than half again as far from average as the multi-year factors.

The Rays, on the other hand, remain relatively consistent. Their multi-year batting/pitching factors are 92/91 respectively, with single-year factors of 92/92.
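The size of that gap is easy to check with some quick arithmetic. Here is a minimal sketch that uses only the four pairs of park factors quoted above; the function name `distance_ratio` is my own invention for illustration, not anything B-R publishes.

```python
# How much farther from average (100) is the single-season factor
# than the multi-year factor? Numbers are the B-R factors quoted above.

def distance_ratio(single_season, multi_year, average=100):
    """Ratio of the single-season factor's distance from average
    to the multi-year factor's distance from average."""
    return abs(single_season - average) / abs(multi_year - average)

# (single-season, multi-year) pairs from the 2011 team pages
factors = {
    "Rangers batting": (117, 111),
    "Rangers pitching": (115, 109),
    "Rays batting": (92, 92),
    "Rays pitching": (92, 91),
}

for label, (single, multi) in factors.items():
    print(f"{label}: {distance_ratio(single, multi):.2f}x as far from average")
```

A ratio near 1.00 means the single-season and multi-year factors agree; the Rangers’ ratios come out well above that, while the Rays’ sit at or just under it.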

I think we’ve spotted our discrepancy. Fangraphs’ multi-year park factors gave both Longoria and Beltre a wRC+ of 134, while B-R’s more extreme single-season factor for Arlington discounts Beltre’s numbers more heavily. When calculated over multiple years, B-R’s and Fangraphs’ park factors probably converge.

» Continue reading “Alphabet Soup: How To Solve A Problem Like Wonky Stats”

Alphabet Soup: Home Sweet Home

Last week, reader bkibbs did an awesome analysis of Carlos Pena’s 2011 contract using OBP and BABIP regression. It was a good example of how we can use advanced stats to get a better picture of a player’s value, and use peripheral stats like BABIP to predict future performance.

This week, I want to talk about ballpark factors. We looked at wRC+, which is a park-adjusted stat, a few weeks ago. Today, we’ll pick apart some examples and see how park adjustments actually play out in the numbers.

First, though, we need to know what our park factors are. There are two ways to calculate them: the easy way, and the hard way. ESPN has a table of park factors calculated the easy way; this is fine for getting a general idea of the trends at various parks.

The ‘easy’ park factor looks at a single season of data. The formula is listed at the bottom of ESPN’s page. It consists of the ratio between the average total scoring (by both teams) at a team’s home games, and the average total scoring at its road games. If more runs are scored, on average, in a team’s home ballpark than on the road, the ratio is greater than 1.00 and the park is considered a hitter’s park.
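As a rough sketch, the ratio described above works out like this. The function name and the run totals in the example are made up purely for illustration; they are not any real team’s numbers.

```python
# The "easy" single-season park factor: average total scoring per home game
# divided by average total scoring per road game.

def simple_park_factor(home_rs, home_ra, home_games,
                       road_rs, road_ra, road_games):
    """Runs scored (rs) and allowed (ra) are the team's own totals,
    so rs + ra is the combined scoring of both teams in those games."""
    home_rate = (home_rs + home_ra) / home_games
    road_rate = (road_rs + road_ra) / road_games
    return home_rate / road_rate

# Hypothetical team: 780 total runs in 81 home games, 690 in 81 road games
pf = simple_park_factor(400, 380, 81, 350, 340, 81)
print(f"Park factor: {pf:.3f}")  # above 1.00, so this would read as a hitter's park
```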

AT&T Park has the lowest ratio at 0.737; across all the games the Giants played, average total scoring (from both teams together) was about 26% lower at home than on the road. Rangers Ballpark in Arlington has the highest, at 1.409, meaning that the Rangers and their opponents in aggregate scored about 41% more runs per game in Arlington than they did on the road.

But wait, I hear you saying. Surely this isn’t the best way to calculate ballpark factors! A single-season number will be skewed by the individual team’s offensive capabilities or an unbalanced schedule. But multiple years of data mean that we aren’t wholly comparing apples to apples, what with weather changes and new parks opening. Well, because this is sabermetrics, nothing is ever simple, and there is in fact plenty of disagreement on how to calculate park factors.

We’ve seen how Baseball-Reference calculates theirs; it’s basically a single-season number much like ESPN’s, but with adjustments for the fact that batters don’t have to face their own team’s pitchers. Fangraphs does theirs a little differently, using multiple years of data and then regressing the results toward the mean to avoid overcorrecting.
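To give a feel for what “regressing toward the mean” does, here is a toy sketch. It is not Fangraphs’ actual method: the function name, the yearly factors, and the `weight` value are all assumptions of mine, chosen only to show the general idea of pulling a multi-year average partway back toward a neutral 1.00.

```python
# Toy illustration of a multi-year, regressed park factor.
# NOT Fangraphs' real formula; the weight is an arbitrary assumption.

def regressed_park_factor(yearly_factors, weight):
    """Average several single-season factors, then keep only `weight`
    (between 0 and 1) of the distance from a neutral park (1.00)."""
    raw = sum(yearly_factors) / len(yearly_factors)
    return 1.0 + weight * (raw - 1.0)

# Hypothetical park with three noisy single-season factors
yearly = [1.20, 1.10, 1.30]
print(f"Raw average: {sum(yearly) / len(yearly):.3f}")
print(f"Regressed:   {regressed_park_factor(yearly, weight=0.5):.3f}")
```

The smaller the weight, the less an extreme season or two can drag the final number away from average, which is the overcorrection the regression is meant to guard against.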

Since we’re using Fangraphs player data, theirs are the park factors going into the calculations. But if we just want a quick and dirty ranking of how the various ballparks play, checking the ESPN list for comparison won’t hurt anything.

Right. Now that we’ve got that straightened out…

» Continue reading “Alphabet Soup: Home Sweet Home”

Alphabet Soup: You Can’t Compare Ruth And Bonds

That title is a lie. But there are three kinds of lies, right? Lies, damn lies, and statistics. So clearly we’re on the right track already.

We talked last week about wOBA, one of the best single-number characterizations of a player’s offensive contributions. But now we have to broaden the context a little, especially if we want to compare players on different teams and/or in different eras.

And so, we add in league and park adjustments. These are essentially just one more step toward making sure we’re measuring every player against the same yardstick.

As you’re keenly aware if you’ve watched baseball for a while – particularly if you were around for the past decade or so – league averages in things like offense and pitching tend to wax and wane with the changing alignment of the planets, or Jose Canseco’s sanity, or something.

That makes it hard to take, say, Mark McGwire’s power numbers from the heart of the steroid era and compare them to, say, Ty Cobb’s achievements in the dead-ball era. So it’s helpful to account for the offensive context around the league if we want everyone to be on a level playing field.

By the same token, ballparks can vary wildly in how easy it is to hit in them. Some are cavernous, with lots of space for a batted ball to fall in; some are tiny, making it easy on the defenders. There are clear trends in offensive statistics at some ballparks, showing an average blow or boost to hitting when compared to league-wide numbers. If it’s your home park, you’re playing half your games there; it’s gonna have an effect.

And there’s also the fact that a team’s hitters don’t have to face its own pitchers. This can make a marked difference if, for example, you play for the Phillies.

To account for these contextual differences, a couple of the statistics we’ve already talked about have been used to develop league- and/or park-adjusted derivatives of themselves. We’ll start with the one you’re most likely to see, and then talk about the one you should probably use instead.

» Continue reading “Alphabet Soup: You Can’t Compare Ruth And Bonds”
