Some Thoughts on the Data-Driven Law Practice
Ron Friedmann has some comments on the so-called data-driven law practice, responding in particular to a Corporate Counsel article on compliance data.
Both are worth reading (the Corporate Counsel one first, and then Ron Friedmann’s take).
First Problem: The Black Swan
The biggest issue with the approach suggested is the you-don’t-know-what-you-don’t-know problem, featuring events that author Nassim Nicholas Taleb calls “black swans.” For example, the Fukushima nuclear power plant’s defenses had been designed against the largest tsunamis in data-recorded history. In other words, the designers looked at what had happened in the past 100 or so years and designed against that. They were disastrously wrong because they hadn’t considered tsunamis outside that data… even though there was strong anecdotal evidence of larger tsunamis going back further in history.
Or consider the Corporate Counsel example of the compliance hotline. Not only will it be subject to sampling error (see the sketch after this list), it will capture only those issues where people know to call the hotline. Consider:
- In a small overseas office, there are only two people dealing with the country in question. Together they conspire to “help” the company in ways that violate the FCPA. Think either of them will call the hotline?
- Compliance training teaches employees to call the hotline on specific issues. An employee sees behavior that isn’t on the list (and isn’t overtly outrageous). The chance that the employee will call is very small. Even if this behavior occurs in ten different offices, you may get no calls because your compliance training omits this particular behavior. I’ve seen too many examples of such training designed to “check the box” rather than pose real ethical dilemmas. That kind of training likely minimizes hotline calls.
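To make the sampling problem concrete, here’s a minimal sketch in Python. All the numbers (office count, incident counts, reporting probabilities) are invented for illustration; none of them come from the Corporate Counsel article. The point is only that hotline data can badly understate what’s actually happening when employees report only the behaviors their training taught them to recognize.

```python
import random

random.seed(0)

# Invented assumptions, for illustration only:
# - 50 offices, each with 0-4 true compliance incidents in a year
# - an incident produces a hotline call only if it matches something the
#   training covered AND the employee actually makes the call
N_OFFICES = 50
P_RECOGNIZED = 0.3            # incident looks like what the training described
P_CALLED_IF_RECOGNIZED = 0.5  # employee actually picks up the phone

true_incidents = 0
hotline_calls = 0
for _ in range(N_OFFICES):
    incidents = random.randint(0, 4)  # crude stand-in for per-office variation
    true_incidents += incidents
    for _ in range(incidents):
        if random.random() < P_RECOGNIZED and random.random() < P_CALLED_IF_RECOGNIZED:
            hotline_calls += 1

print(f"True incidents: {true_incidents}")
print(f"Hotline calls:  {hotline_calls}")
print(f"Share visible in the hotline data: {hotline_calls / true_incidents:.0%}")
```

The hotline data isn’t wrong, exactly; it measures “incidents people recognized and reported,” which is a different (and much smaller) thing than “incidents.”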
Don’t ignore the data you have, but do learn how to interpret it – especially in the realm of the unexpected. It’s hard to expect the unexpected, of course… which is my point.
Second Problem: Misreading Moneyball and the A’s
Key quote from the Corporate Counsel article:
Silver concludes in his chapter on baseball statistical analysis that even sporting organizations that are leaders in statistical analysis, such as the Oakland Athletics, rely heavily on scouts to analyze the human factor.
Moneyball minimized the scouts because Billy-Beane-is-different made a much better story. Oakland scouts not only analyzed the human factor, they had drafted and signed Oakland’s five most effective players three to nine years prior to the 2002 season (Tim Hudson, Barry Zito, Miguel Tejada, Mark Mulder, and Eric Chavez). One can make a solid statistical case (which I understand Nate Silver has done) that the Moneyball aspect played less of a role than random chance in getting Oakland to the playoffs that year.
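To give a sense of what “random chance” means here, consider a toy simulation. The 0.560 win probability below is an arbitrary stand-in, not an estimate of the 2002 A’s true talent; the point is that a team with a fixed underlying win probability still shows a wide spread of season records from game-to-game variance alone.

```python
import random

random.seed(1)

# Toy illustration of season-to-season randomness: simulate many 162-game
# seasons for a team whose per-game win probability never changes.
TRUE_WIN_PROB = 0.560   # invented figure, for illustration only
GAMES = 162
SEASONS = 10_000

wins_per_season = [
    sum(1 for _ in range(GAMES) if random.random() < TRUE_WIN_PROB)
    for _ in range(SEASONS)
]

wins_per_season.sort()
print(f"Median wins: {wins_per_season[SEASONS // 2]}")
print(f"Middle 80% of seasons: {wins_per_season[SEASONS // 10]}"
      f" to {wins_per_season[9 * SEASONS // 10]} wins")
```

That spread easily covers the difference between a playoff berth and an also-ran season, which is the sense in which luck competes with roster-building as an explanation.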
Slight digression: I’m not suggesting the Moneyball approach is bogus. The idea of sabermetrics, the science of baseball analysis, is that you can and should use hard data rather than random, selective data in the analyses you do. For example, RBIs (runs batted in) were a hugely important stat 20 years ago. However, RBIs are heavily influenced (duh!) by how often runners are on base when a batter comes to the plate. If you’re leading off for the Mariners and their #7, #8, and #9 hitters are terrible at getting on base, you’re simply not going to have opportunities to drive in runs.
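Here’s a back-of-the-envelope sketch of that opportunity effect. The on-base percentages are invented, and the “expected runners” calculation is deliberately crude (it ignores outs, baserunning, and lineup turnover); it is only meant to show how much RBI chances depend on the hitters batting in front of you.

```python
# Rough sketch: how many runners a batter can expect on base depends on the
# on-base percentage (OBP) of the hitters immediately ahead of him.
# All numbers are invented for illustration.

def expected_runners(preceding_obps):
    """Crude expectation: sum of the OBPs of the three preceding hitters,
    ignoring outs, steals, double plays, and lineup turnover."""
    return sum(preceding_obps)

# Leadoff hitter coming up behind a weak bottom of the order
leadoff = expected_runners([0.280, 0.270, 0.260])

# Cleanup hitter coming up behind a strong top of the order
cleanup = expected_runners([0.380, 0.370, 0.360])

print(f"Leadoff hitter, expected runners on per PA: {leadoff:.2f}")
print(f"Cleanup hitter, expected runners on per PA: {cleanup:.2f}")
print(f"Cleanup hitter sees about {cleanup / leadoff:.1f}x as many RBI chances")
```

Same swing, very different RBI totals, which is why sabermetrics prefers stats less tied to what teammates do.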
The real lesson of the 2002 Oakland Athletics is that having good data, and using it, can help you make decisions, in particular by pointing out your biases and blind spots. Data can’t make decisions by itself, nor can it be the only factor in a decision – in part because you never have all the data, and informed human brains can make better leaps than a random guess at that point.
Both these articles are well worth reading and considering. But the world of data – and its limitations – is far too complex to be described effectively in their 2,000 or so words.
My take: Gather good data. But engage someone well versed in practical metrics and statistics to evaluate its worth before you rely on it.