

I Need a Stat, Stat!
Written October 27, 2006


Americans eat a lot of cheese.

That's too vague.  Quantify what you mean by “a lot.”

Americans consumed a whopping 4.4 million pounds of cheese in 2003, according to the California Milk Advisory Board.

That's better.  You've cited an authority and attached a number to your assertion — 4.4 million pounds is indeed “a lot of cheese.”  Now the reader will believe what you say.

Except I just noticed that I made a mistake.  It should be 4.4 million tons, not pounds.

Oh.  Well, the sentence works either way.  True, the correct amount is two thousand times as large as the incorrect one.  But “4.4 million pounds” and “4.4 million tons” sound the same to the average reader who has no concept of how much a million pounds is.

We find many statistics like this in news stories and essays.  The actual numbers don't matter at all.  Their purpose is to give credibility to the writer, by showing that he has data to back up his assertions.

My job often involves inserting statistics into baseball telecasts.  Many of those statistics mean little.  They're traditional decorations, included on our graphics in case a few of our viewers actually care that this batter is hitting .287 with 10 home runs and 40 runs batted in.

Frequently, though, we select the stats to back up our assertions about the players.  In particular, on a telecast aimed at Pittsburgh Pirates fans, we want to use numbers that point out how well the Pirates are doing.

We can't completely avoid “negative” stats that detail the home team's problems.  Perhaps we've lost all but two of the last ten games, and it's mostly because our starting pitchers have given up almost a run per inning on average.  But, when given a choice, we like to use “positive” numbers that we can cheer about.  Perhaps in those ten games our leadoff hitter has reached base in almost half of his plate appearances.

There are so many statistical categories in baseball that we can quickly and easily find a positive or negative stat on almost every player.

Let's look at the most basic stat, batting average.

As an experiment, I set up a spreadsheet on my computer.  It simulates the first 100 games of a season for a batter who gets four at-bats in each game.  His career batting average is .250, so in each at-bat of my simulation, he has a 1-in-4 chance of getting a hit, as determined by a random number generator.  The results are depicted in the chart you see on the left.
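For readers who would like to repeat the experiment without a spreadsheet, here is a minimal Python sketch of the same idea.  It is not the spreadsheet described above; the function names, the seed value, and the use of Python's random module are all assumptions made for illustration.

```python
import random


def simulate_season(games=100, at_bats_per_game=4, hit_prob=0.25, seed=None):
    """Return a list of games; each game is a list of 1 (hit) or 0 (out).

    Every at-bat is an independent draw with the same chance of a hit,
    so any hot or cold streak in the result is pure chance.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < hit_prob else 0 for _ in range(at_bats_per_game)]
        for _ in range(games)
    ]


def batting_average(games):
    """Hits divided by at-bats over the given games."""
    at_bats = sum(len(game) for game in games)
    hits = sum(sum(game) for game in games)
    return hits / at_bats if at_bats else 0.0


# The seed is arbitrary (an assumption); change it to get a different season.
season = simulate_season(seed=2006)
print(f"Season average:   {batting_average(season):.3f}")
print(f"Last 10 games:    {batting_average(season[-10:]):.3f}")
print(f"First 25 games:   {batting_average(season[:25]):.3f}")
```

Each seed produces a different season, so the averages printed will not match the numbers quoted in this essay.  What should carry over is the pattern: stretches well above and well below .250, with nothing behind them but the random number generator.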

By isolating different portions of this 100-game season, we can find either positive or negative statistics.

Positive:  This batter is hot.  He has 3 hits in his last 9 at-bats (a .333 average).  In the last 25 games he's batting .270 (20 points above his career average).  And for the season, he's hitting a career-best .285.

Negative:  This batter is cold.  He went 0 for 4 in his most recent game.  His average for the last ten games is only .175.  And although he started the season with a .304 average in his first 79 games, he's been in a .214 slump since then.
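The hunt for a flattering or unflattering stretch can itself be automated.  The sketch below is a hypothetical illustration, not the procedure used on any telecast: it generates a random .250 hitter, tries every possible "last n at-bats" window, and reports whichever window sounds best and whichever sounds worst.  The seed and the minimum window of 8 at-bats are arbitrary assumptions.

```python
import random

rng = random.Random(2006)  # arbitrary seed (assumption)

# 100 games x 4 at-bats, flattened into one list of 1 (hit) or 0 (out)
at_bats = [1 if rng.random() < 0.25 else 0 for _ in range(400)]


def trailing_average(results, n):
    """Batting average over the most recent n at-bats."""
    recent = results[-n:]
    return sum(recent) / len(recent)


# Compute the average for every "last n at-bats" window of 8 or more at-bats.
candidates = {n: trailing_average(at_bats, n) for n in range(8, len(at_bats) + 1)}

# Pick the window length that makes the batter look best, and the one
# that makes him look worst.
hot_n = max(candidates, key=candidates.get)
cold_n = min(candidates, key=candidates.get)

print(f"Positive: batting {candidates[hot_n]:.3f} in his last {hot_n} at-bats")
print(f"Negative: batting {candidates[cold_n]:.3f} in his last {cold_n} at-bats")
```

Both lines describe the same simulated batter at the same moment.  The only thing that changes is where the window starts.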

“We believe that the more we know, the better our chances of making sense of what is going on,” writes Mary Joan Winn Leith of Stonehill College.  (Quoting a professor is, of course, another way for a writer to seem more credible.)  “Psychologists tell us humans abhor chaos and determinedly resist notions of a random universe.  As instinctive ‘meaning-makers’ we prefer to look for an agent or mechanism at work behind the scenes.”

Therefore, we want to claim significance for this batter's numbers.  We attribute his career-high average to a new hitting instructor, while blaming his current slump on a nagging elbow injury.

Remember, though, that in my experiment there are no reasons for the computerized “batter” to be either hot or cold.  Whether he gets a hit is, like it or not, purely random.

And, more often than we “meaning-makers” would like to admit, that is the case in the chaotic real world!

A cartoon from the website xkcd.com implies that all sports analysis consists of pretending that these quasi-random quantities have meaning.

Additional comments from me are here.

We've only begun sorting through the statistics.

Rather than breaking out the last X games (with X chosen so as to give us either a positive- or negative-sounding result), we could look at only home games, or only day games, or only games against left-handed starting pitchers, or only games against a particular team.

We can break out individual at-bats, isolating those with runners on base, or with a runner at third base and fewer than two out, or when the count reaches 3 balls and 2 strikes.

And we're not limited to looking at batting average.  We can use slugging percentage or on-base percentage or doubles or walks or sacrifices, and so on and so on.

There are so many possibilities that, although most of the stats will be unremarkable, there's a high probability that at least one of them will be way outside the expected range.
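A rough calculation shows how quickly that probability grows.  The sketch below rests on two simplifying assumptions that are not from the essay: each split has a 5 percent chance of landing outside its expected range by luck alone, and the splits are independent of one another (real baseball splits overlap, so this is only an approximation).  Under those assumptions the chance of at least one outlier among k splits is 1 - (1 - 0.05)^k.

```python
def prob_at_least_one_outlier(num_stats, p_outlier=0.05):
    """Chance that at least one of num_stats independent stats is an outlier.

    p_outlier is the assumed chance that any single stat falls outside
    its expected range purely by luck (5 percent here, an assumption).
    """
    return 1 - (1 - p_outlier) ** num_stats


for k in (1, 10, 50, 100):
    print(f"{k:3d} splits: {prob_at_least_one_outlier(k):.0%} chance of an outlier")
```

With only ten splits to choose from, the chance of finding at least one "remarkable" number is already about 40 percent, and with a hundred splits it is better than 99 percent.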

When we've found a particular number that's unusually high or low, we highlight it for our viewers.  “This batter has faced this particular pitcher only four times and has hit three home runs!”

Maybe the batter has the pitcher figured out, and this time he'll homer again.  More likely, his remarkable success in the past has simply resulted from the random workings of chance, and this time he'll strike out.

But that doesn't prevent us from proclaiming the stat as if it means something.  And the audience, perking up in anticipation of another home run, is persuaded.


