Do you remember where you were when the lights went out?
I can tell you exactly where I was… on a conference call reviewing a presentation with a group of colleagues from all over the country. The call was scheduled for 4:00 pm. It was shortly after. The phone went dead. Odd, I thought, it’s a land line. How often do land lines just cut out altogether? Then, I thought, I know these conference calls can be painful, but this is a bit much… “OK, who sabotaged this call??? Raise your hand!”
Along with the phones, the lights and then the computer powered off. A look out the window… no traffic lights. Even odder and, at first, terrifying. All of us in the Northeast, New Yorkers in particular, I suppose, were all too familiar with that feeling of vulnerability and deep-seated dread. Slowly, as we learned about what had occurred and that there was no ominous threat, New York turned into a community, a community of strangers helping each other out and making the most of the darkness… at least for the most part.
What happened?
You can find a summary of the final report on the 2003 blackout here. It is interesting reading, particularly looking back on the recommendations presented by the joint Canadian/US research teams back in 2004. We will get to recommendations, below, and to whether we’re better off now than we were 10 years ago.
Essentially, as Terry Boston, an expert who was asked to participate in the official investigation, explained on NPR this morning, the blackout’s causes came down to three things: “Tools, training and trees.”
Beyond trees that were too close to power lines, the cascading blackout was fueled by failing computer systems and poorly trained operators who didn’t talk with each other.
The immediate, direct cause of the blackout was overgrown trees that interfered with transmission lines in three parts of the Ohio transmission grid. These failures put greater stress on the remaining transmission lines, and as those lines failed in turn, the cascading effect brought power systems down from Ontario, south to Baltimore, and west to parts of Michigan. According to the final report, the immediate cause had a number of other failures linked to it. A failure of this sort should never cause an outage of that magnitude. That’s where the training and tools come in. In addition to the maintenance problems, computer communications and system/software failures delayed alerts to key personnel. By the time alarms and communications were functioning again, the damage was done.

The final report about the many, interlocking causes of the blackout is excellent reading for those of you who are interested in the dynamics of the power grid. For a shorter overview, and also an excellent report that explains how the power grid works, listen to a story that aired on NPR this morning “Complex Networks Make up U.S. Power Grid”.
What has changed?
The recommendations from the task force include:
- Adding enforcement capabilities to the North American Electric Reliability Corporation (NERC).
- Developing a regulator-approved funding mechanism for NERC and the regional reliability councils, to ensure their independence from the parties they oversee.
- Clarifying that prudent expenditures and investments for bulk system reliability (including investments in new technologies) will be recoverable through transmission rates.
- Tracking implementation of recommended actions to improve reliability.
- Not approving the operation of new RTOs or ISOs until they have met minimum functional requirements.
- Shielding operators who initiate load shedding pursuant to approved guidelines from liability or retaliation.
- Integrating a “reliability impact” consideration into the regulatory decision-making process.
- Improving the sensor capability in transmission systems.
- Improving the quality of system modeling data and data exchange practices.
- Tightening communications protocols, especially for communications during alerts and emergencies.
- Developing enforceable standards for transmission line ratings.
- Requiring the use of time-synchronized data recorders.
(See all recommendations, here, in the final report on the blackout.)
According to a number of experts on transmission and grid management, NERC’s responsibilities have expanded. NERC now has enforcement capability it did not have prior to August 14, 2003. There are fines and financial implications for ignoring maintenance issues and not paying close attention to reliability and grid stability.
Many in the industry believe that that day, August 14, 2003, was a turning point for the electric utility industry. The industry has adopted an approach to risk management called a “defense-in-depth approach” — a strategy also used by the airline and nuclear power industries (Brent Barker, “Game Changer,” Power Magazine, July-August 2013 issue, Vol. 71, No. 5). The defense-in-depth approach was developed by the National Security Agency. It essentially builds layers of redundant checks and balances into a system in order to anticipate and correct small problems at a number of levels (personnel errors, procedural, technical, and physical breakdowns) before large, system-wide failures can occur. Experts also report that the utility industry, government experts, and regulators have found common ground on setting and implementing reliability standards and have been able to cooperate in a way that hasn’t occurred in the past.
Most experts in the field believe that we are far better off than we were 10 years ago: personnel are better trained, technology has improved, reasonable standards are in place, and a culture of learning apparently exists. No one in the industry can promise the end of blackouts. The ongoing challenge for the industry is the trade-off between ensuring greater reliability and the cost of implementing these improved measures. What are we willing to pay, and who will pay, to get a reliable and well-maintained electric grid? For many, the recent recoveries from Superstorm Sandy and Hurricane Irene seem to contradict this view, but perhaps it is important to make the distinction between storm recovery and other, non-weather-related grid failures.
What do you think? Are we better off now than we were 10 years ago? Let us know.