Failure Is an Option: Project Management Lessons From TEPCO, Part 3
This article is part 3 of a four-part series on project management lessons from the tsunami that damaged the TEPCO Fukushima Dai-ichi nuclear complex.
On Tuesday, I noted three big project-management errors in TEPCO’s risk analysis. Yesterday I described how they missed a big risk. Today I’ll discuss single-point-of-failure issues. The final installment will look further at reviewing the risk plan.
In the movie Apollo 13, NASA Flight Director Gene Kranz says dramatically, “Failure is not an option.”1 I think society presumes the same holds true for nuclear power plants.
When failure is truly not an option, it is imperative that the project manager seek out single points of failure. A single point of failure occurs when the failure of one particular item in a chain of cause-and-effect will lead to the failure of the whole chain.
In law, there is an expression dating back to Roman times that captures this principle: falsus in uno, falsus in omnibus. Lie once, and the rest of your testimony is compromised.
In the original iPod, the battery could not be replaced when it died.2 When your battery died, so did your iPod. (Apple changed this after a public outcry.)
On today’s commercial airplanes, designers work very hard — after all, their lives too are on the line when the plane is airborne — to eliminate single points of failure. Look at the record of most recent crashes or near-crashes; almost all of them were the result of cascading failures.3
So what happened at the Dai-ichi complex?
The system had in effect a single point of failure — the backup generators.
One could argue that by virtue of their being backup power sources they were the second point of failure, with the failure of normal electrical power being the first point. However, consider the separate project to keep the plant safe in the event of a significant emergency. For this project, which was a real and critical project, the generators were exposed as a single point of failure.
Remember, it wasn’t only an earthquake or tsunami that could knock out power. I lived through both of the big New York City blackouts — in 1965, when power was out for about 12 hours, and in 1977, when it was out for over a day.4 Japan was no stranger to acts of violence (e.g., war), which could certainly cause power to go out. Indeed, the loss of power for a significant period is an event any electricity-dependent facility plans for, from nuclear power plants to food warehouses to data centers.
It is a certainty (or almost a certainty, but it sure feels like a certainty) that if you have a large system that depends on power, the power will fail at some point.5 You’ll need the backup generators to come on.
It is a certainty (or almost a certainty — ask anyone who’s been there) that when you need the backup generators to work, they won’t.
Backup generators fail.
To me as a project manager, that’s one of the inexcusable items in this mess — a single installation of backup generators. If something happened to them, they wouldn’t work when they were needed.
A tsunami was what happened to them, but that was only one of the potential issues. What about, say, water somehow getting into the fuel supply? (It happens.) Only one set of generators, in a single location, quite possibly with a single fuel supply, a single transmission line….
The cost of redundancy here was minimal compared to the cost of failure.
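To make that comparison concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is made up for illustration; none comes from TEPCO or any real plant, and the function names are my own. It also assumes that each generator installation fails independently of the others. A second set in the same location, on the same fuel supply, would break that assumption, which is exactly the problem with putting everything in one place.

```python
def prob_all_fail(p_single, installations):
    """Chance that every installation fails, ASSUMING independent failures."""
    return p_single ** installations


def expected_cost(p_single, installations, cost_per_installation, cost_of_failure):
    """Cost of the installations plus the probability-weighted cost of total failure."""
    build = installations * cost_per_installation
    risk = prob_all_fail(p_single, installations) * cost_of_failure
    return build + risk


# Hypothetical figures, in arbitrary cost units (not real plant data).
P_FAIL = 0.05                 # chance one installation fails when called on
COST_PER_INSTALLATION = 10.0  # cost of one backup installation
COST_OF_FAILURE = 10_000.0    # cost if no backup power is available

if __name__ == "__main__":
    for n in (1, 2, 3):
        total = expected_cost(P_FAIL, n, COST_PER_INSTALLATION, COST_OF_FAILURE)
        print(f"{n} installation(s): expected cost {total:.2f}")
```

With figures anything like these, the second installation pays for itself many times over. The arithmetic only holds, though, if the installations are truly separate: different locations, different fuel supplies, different transmission lines.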
Blame (Not), Lessons, and Examples
Even if it were up to me to point a finger at someone, I wouldn’t know whom. Did the project manager miss it? Was he or she overruled? Was it a cost issue?
In the long run, blame doesn’t matter. What does matter, in the project management world, is to learn lessons from what happened.
Lesson one, as I noted yesterday, was to be both assiduous and wide-ranging in identifying risks.
Lesson two is to be alert for single points of failure.
It’s easy to see how the first lesson maps to Legal Project Management. However, what do single points of failure look like in legal projects?
Here are some examples:
- A part of the project can be handled only by one person in your practice, whether a specialized attorney, an IT forensics expert, a paralegal with specific organizational skills, etc. That person becomes a single point of failure. Traditionally project managers ask the question, What if he gets hit by a bus? Or, to put it in nicer terms, what if he wins the lottery this evening? A project manager needs to account for critical players and identify backup plans (or sometimes simply accept the risk).
- E-discovery is highly software-intensive these days. What if the software fails, whether from an intrinsic bug or because it’s housed in a data center where the backup generators…. Do you have enough time in the schedule to handle restarts and rework? Are there things you can do to maximize the time available… or negotiate a less stressful e-discovery plan?
- You’re traveling and you need to work on a brief from your hotel room. What if the Internet connection in your hotel room isn’t working? (It happened to me on a recent business trip.) Do you know where there’s a Starbucks open late? (No, you can’t look it up on your computer only when you need it. The Internet connection isn’t working, remember? At least most of us have browsers on our smartphones these days.)
- Your lead attorney needs to be in court in another state Wednesday afternoon. Does she fly in Wednesday morning… or Tuesday night? With full flights these days and increasing delays, flight timing can be a single point of failure.6
It is a project manager’s job to identify potential single points of failure and determine whether to do something about them. Not all such items require a pre-planned response, or even an entry on the risk sheet. I wouldn’t make an issue these days of the hotel Internet connection risk, for example, since finding a wireless hotspot using your phone’s browser is so easy. (If I’m sending an attorney who isn’t very technologically savvy, however, I might make sure that he knows how to find a hotspot — in my experience, not all hotel front desk folks are local and know where to find one — and knows how to connect to it with his laptop.)
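One way to picture that sweep is a small script run over the project’s task list. The sketch below is only an illustration: the tasks, the resources, and the function name are all hypothetical, and a real risk sheet would carry far more judgment than a count of who can cover what. It simply flags any task that depends on exactly one person or system.

```python
from collections import defaultdict


def single_points_of_failure(coverage):
    """Map each resource that is the ONLY cover for some task to those tasks."""
    spofs = defaultdict(list)
    for task, resources in coverage.items():
        unique = set(resources)
        if len(unique) == 1:  # exactly one person or system can do this task
            spofs[next(iter(unique))].append(task)
    return dict(spofs)


# Hypothetical project: task -> people or systems able to handle it.
COVERAGE = {
    "forensic review": ["IT forensics expert"],
    "privilege log": ["paralegal A", "paralegal B"],
    "e-discovery processing": ["review software"],
    "court appearance Wednesday": ["lead attorney"],
    "brief drafting on the road": ["hotel Internet", "phone hotspot"],
}

if __name__ == "__main__":
    for resource, tasks in single_points_of_failure(COVERAGE).items():
        print(f"{resource}: sole cover for {', '.join(tasks)}")
```

The script only finds the candidates. Deciding which of them deserve a backup plan, and which risks you simply accept, is still the project manager’s call.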
That’s the second lesson from the TEPCO disaster: Identify potential single points of failure.
In the wrap-up to this series I’ll dig into ongoing risk reviews a bit more than I did yesterday.
1Actually, it’s the character portraying NASA Flight Director Gene Kranz who says, “Failure is not an option.” Kranz never said it. It was invented by the script writers based on a discussion with Jerry Bostick, who had been in the room while NASA Mission Control was trying to come up with a way to get the astronauts back safely. However, Kranz knew a good line when he heard it, and he used it for the title of his book on NASA.
2Supposedly Steve Jobs mandated that a screw for a battery compartment not compromise the smooth, sleek look of the case. The original Macintosh computer was likewise unopenable. (Well, almost. You needed a very long, thin, special screwdriver to get at two tiny, almost invisible screws.)
3We’re still waiting on a determination of the cause of the Air France crash, which might now be possible with the “black boxes” recovered. Preliminary evidence, as interpreted by folks who know what they’re doing behind the controls of a commercial jet, suggests a series of things went wrong. Even the US Airways flight that landed in the Hudson River a few years ago was a two-point failure, although both failure points were caused by the same flock of geese.
4As long as I’m in footnote mode… For a brief time, I was convinced I’d caused the 1977 blackout. I was working alone building a platform in a large loft in New York’s SoHo district, where I was living at the time. I plugged my circular saw into a newly installed outlet, and as I started it up all the lights went out. I stumbled my way in the lumber-strewn darkness toward the circuit breaker panel about 100 feet away. It wasn’t until I got up next to the panel that I realized I shouldn’t have been stumbling in the dark, since there was a streetlight outside the front windows (open on a hot July night) that usually lit up the front part of the main room. “Could I really have shorted out the whole block?” Upon reflection, I realized how unlikely that was, but for a few seconds…. And of course it was a whole lot more than my block that was in the dark. I do recall that downtown we made a pretty good party out of it… and were thankful we weren’t living on the 30th floor of a high-rise apartment building.
5I have no experience with nuclear power plants, but I do have significant experience with data centers… including those caught in a power failure.
6I always arrive the night before I’m scheduled to teach one of my classes, even if the starting time is mid-afternoon. Maybe I’m a worry-wart, but I think it comes with the territory. I always have two backups of my class materials, one in my pocket on a flash drive and one online. Stuff happens. Murphy was an optimist.