Project managers should have a copy of the classic How to Lie With Statistics on their shelf.
Darrell Huff’s book is still relevant over 55 years later — and still a good (and quick!) read. Huff describes ways in which statistics — metrics, in today’s project management parlance — can mislead. More specifically, he helps us spot the ways that others use metrics to guide our beliefs and actions in ways not truly supported by the facts.
I saw a report this morning about housing prices in my neighborhood. Among other things, it said:
The average price of homes sold in January dropped almost 17 percent compared to the previous month, and 14.5 percent compared to the same month last year. Similarly, the median price of homes in fell… 22 percent… from the same month last year.
It’s instructive to parse this statement carefully. (It’s probably even more instructive if you’re looking to buy or sell a house, but that’s not the point here.) Let’s look at three aspects of this seemingly simple statement.
Median v. Average
At least this report gives both median and average prices. Outlier values — values in a set of numbers that are significantly larger or smaller than most of the other numbers — make averages misleading. For example, if Bill Gates and four paupers walk into a room together, the average wealth of the five people in the room is billions of dollars. However, if you picked a person at random from that room, four out of five times you’d get someone whose financial situation is dire.
What’s the median value, the value “in the middle” of the set? Zero, at least as measured on a scale marked off in gradations of billions. Now if you pick a person at random from that room, you have a four in five chance of picking someone reasonably described by the median value.
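To make the gap concrete, here is a minimal sketch of the room using Python's standard `statistics` module. The 100 billion dollar figure and the zeros for the four paupers are illustrative assumptions, not real net worths.

```python
from statistics import mean, median

# Hypothetical net worths in dollars: one billionaire and four people with nothing.
# The 100 billion figure is an illustrative assumption, not a real number.
wealth = [100_000_000_000, 0, 0, 0, 0]

average_wealth = mean(wealth)    # pulled up by the single outlier
median_wealth = median(wealth)   # the middle value of the sorted list

# Count how many people in the room the median describes exactly.
described_by_median = sum(1 for w in wealth if w == median_wealth)

print(f"average: ${average_wealth:,.0f}")
print(f"median:  ${median_wealth:,.0f}")
print(f"{described_by_median} of {len(wealth)} people match the median")
```

With these made-up numbers the average is 20 billion dollars, a figure that describes nobody in the room, while the median of zero describes four of the five.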
The houses in my statistical neighborhood are largely middle- to upper-middle-class dwellings… but there are two strips of very high end homes in the area. One strip has the iconic view of Seattle featured on shows such as Frasier (at left); the other has equally stunning views looking out across Puget Sound to the snow-capped Olympic mountains. Both feature very large urban lots and homes with 5,000 to 10,000 square feet of living space (and a few even larger than that). These homes make up maybe 1% of the available houses, but they sell for around ten times the average price.
Thus, as with the Bill Gates example, they turn the average value into meaningless drivel.1
In this particular case, the average price is a bit more than 10% higher than the median price.
Now for the second aspect, the trend. Average prices dropped almost 17 percent in the past month and 14.5 percent over the year. Plotting the data points yields the graph at left. The report doesn’t give the data that would tell us what happened between a year ago and a month ago, but there are only three real possibilities:
- The graph as shown is correct: prices were relatively flat, perhaps climbing a bit, and then they fell off a cliff this past month.
- The curve should be smoothed: prices climbed through the first half of 2010 and then took a precipitous dive, as reflected in the numbers for December and January, the two data points at the right of the graph.
- Prices were relatively random from month to month over the past year; the three data points are just that, three isolated data points.
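The three data points behind those possibilities can be rebuilt from the two percentages alone. The sketch below sets this January to an index of 100 and works backward; treating "almost 17 percent" as exactly 17 percent is an approximation, and the variable names are mine, not the report's.

```python
# Rebuild the three data points as an index, with January of this year set to 100.
# The two percentages come from the quoted report; "almost 17 percent" is
# treated as exactly 17 percent, which is an approximation.
drop_vs_previous_month = 0.17
drop_vs_year_ago = 0.145

january_now = 100.0
december = january_now / (1 - drop_vs_previous_month)
january_year_ago = january_now / (1 - drop_vs_year_ago)

# Change from a year ago to last month, before the reported drop.
change_year_ago_to_december = december / january_year_ago - 1

print(f"January a year ago: {january_year_ago:.1f}")
print(f"December:           {december:.1f}")
print(f"January now:        {january_now:.1f}")
print(f"Year ago to December: {change_year_ago_to_december:+.1%}")
```

The index comes out at roughly 117 a year ago, 120.5 in December, and 100 now. In other words, the report's own numbers imply that the average was about 3 percent higher in December than a year earlier, and they say nothing at all about the eleven months in between.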
Obviously, we know the reality of the housing market the past year in Seattle doesn’t match either possibility 1 or possibility 2. It’s been declining in fits and starts for the past year (we got hit a lot later than most of the nation with housing woes).
The writer attempted to convey a sky-is-falling shock — housing prices fell off the cliff the past month! Not so, unless they regularly fall off the cliff and then climb back on (possibility 3). It’s bad news, true (or probably true – see below), but it’s not an unexpected skyfall.
Let’s look at the third issue, selection bias. Beware this trap in particular in putting together project metrics.
Go back to the Bill Gates example. It worked because I specifically picked Bill and four anti-Bills. Had I done it with five people off the street, I would probably have gotten a much closer spread. Even had I done it with five ex-Microsoft employees (Bill is Chairman but not really an employee anymore), I’d probably have gotten a much closer spread: a few more folks up near the high end but not very many near the bottom.
Look at the sample again. What does it tell me about the worth of my house? Has my house declined in value 15% or so over the past year?
First, in absolute terms, no. I live in my house and have no intention of selling it this year. Thus its value to me is in its current utility — shelter, warmth, space for us, the kids, and the dog, power for the computers, an expansive view (albeit not quite the Frasier view) to relax in front of, etc. That had the same value to me last year as it does this year and probably as it will next year. Only if I want to sell my house does the year’s change in value matter.2
Clearly, for those who need to sell or want to buy, these changes do matter. But for most of us, they’re scary-sounding numbers that contain more thunder than lightning.
But let’s say I do care. Do the numbers tell me anything about the relative selling price of my house?
Again, no, at least not necessarily. Sure, the sociological effects will play into it; buyers who believe house prices are falling will offer less money or stay away entirely. But let me offer three scenarios that explain the data:
- Houses have uniformly dropped in price ~15% in the past year: mansions, 1,000-square-foot bungalows, and everything in between.
- Houses on certain streets — say, those without views — have suffered a steep decline. Absent the view, one house is as good as another, so why buy in my expensive neighborhood if you’re not getting a view? Houses with views have remained relatively stable. A 20% drop for three-quarters of the homes sold plus no drop for the quarter with views equals a 15% drop overall.
- The only houses that are selling are at the bottom and the top of the market. People at the top still have money to throw around and want to buy those iconic views. People at the bottom are squeezed by the recession and bad mortgages and such and have had to sell at a deep discount. Those of us in the middle are pretty much sitting in our homes and riding it out.
All three scenarios explain the data. Which is true?
You can’t tell from the data alone. None of the scenarios are implausible; #2 is an exaggeration, but it’s not hard to come up with real-world variants that make sense. #3 is also an exaggeration, since at least some houses in the middle must have sold in a neighborhood as large as the one in the survey (maybe six square miles of just-outside-the-city-core housing), but it might be off only in degree, not in kind.3
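The three scenarios can be sketched with a toy neighborhood to show how they produce nearly the same headline number. Every price, tier, and view flag below is invented for illustration, as are the function names and the 18% discount in the third scenario; none of it comes from the report.

```python
from statistics import mean

# Hypothetical sales from a year ago: (price in $ thousands, tier, has_view).
# Every number here is invented for illustration; none comes from the report.
LAST_YEAR = [
    (400, "bottom", True),
    (400, "bottom", False),
    (600, "middle", False),
    (600, "middle", False),
    (600, "middle", False),
    (600, "middle", False),
    (1000, "top", True),
    (1000, "top", False),
]

def uniform_drop(sales):
    """Scenario 1: every house sells again, 15% cheaper."""
    return [price * 0.85 for price, _, _ in sales]

def no_view_drop(sales):
    """Scenario 2: houses without views lose 20%; view houses hold steady."""
    return [price if has_view else price * 0.80 for price, _, has_view in sales]

def only_extremes_sell(sales):
    """Scenario 3: no middle-tier sales; bottom-tier sellers take an 18% discount.

    One of the two top-tier houses sells at last year's price.
    """
    bottom = [price * 0.82 for price, tier, _ in sales if tier == "bottom"]
    top = [price for price, tier, _ in sales if tier == "top"][:1]
    return bottom + top

def average_change(before, after):
    """Fractional change in the average sale price."""
    return mean(after) / mean(before) - 1

baseline = [price for price, _, _ in LAST_YEAR]
changes = {
    scenario.__name__: average_change(baseline, scenario(LAST_YEAR))
    for scenario in (uniform_drop, no_view_drop, only_extremes_sell)
}
for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```

All three land within a point of a 15% drop in the average, yet in the third scenario the middle-tier houses never changed price at all; they simply didn't sell. The second scenario also comes out a little under 15% here, because the "20% of three-quarters equals 15%" arithmetic is exact only when the view houses account for a quarter of the total value as well as a quarter of the sales.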
Bottom Line (So to Speak)
Here’s the point: You cannot draw accurate conclusions from this data alone.
More generally, metrics without a huge amount of context are likely to be bogus. Misleading. Sometimes dead wrong.
Bad metrics are worse than no metrics. They can cause you to move with unwarranted certainty down a poorly chosen path. It’s one thing to start down a wrong path; we all do it. It’s another to remain on that path because the “data” is telling you that’s where you belong.
Don’t fear metrics. Do fear bad metrics.
1. In addition, it’s likely the folks buying houses in the $5MM range felt the recession less than those buying closer to the average or median price, meaning that proportionally more homes sold at the high end of the market, which a realtor confirmed anecdotally to me. Another confirming factor is the median price falling faster than the average price, suggesting the homes at the high end that inflate the average are selling more than those in the middle.
2. The change in value doesn’t even matter in terms of property taxes. Let’s assume my house was worth 1% of the total value of houses in a neighborhood. (That would be a pretty small neighborhood, but let’s keep the example numbers simple.) The state and local government still needs to raise the same number of dollars, so I’m still going to be assessed 1% of that total. Last year it might have been 6% of the putative worth of my house and this year 6.5%, but it’s still the same absolute number, more or less, and it’s still the same in proportion to what my neighbors are paying. I’ll leave the implications of taxing policies to others.
3. Indeed, I think the reality is some combination of all three scenarios. All houses have declined somewhat, but not uniformly 15%. Some subareas and some houses have held their value more than others; those with views (“location, location, location”) and those in the middle of the market haven’t declined as steeply as other houses in the area.