reliability | Implementation, detail

They just don't make them like they used to

October 28, 2003 - Reading time: 5 minutes

In the past couple of years, national security has been on everyone's mind; laws have been passed, rules have been enacted, and generally life has been made more miserable so that we as a country can feel more secure. Some of the initiatives that we have seen are very visible; airport security, security at federal buildings, and legislation such as the Patriot Act have been widely discussed, and their relative merits are subject to some debate. There has also been much behind-the-scenes work, such as the Container Security Initiative (CSI), which is designed to protect the transportation of the ubiquitous and increasingly important 40-foot containers that bring us much of what we buy. All discussion of the merits of these security precautions aside, we can still say that people are actively working to keep our critical infrastructure safe from attack. But we have been ignoring an important point in the process of securing our national infrastructure, and that overlooked point presented itself to us recently. The recent massive power outage in the northeast provided us an important lesson: decreasing margins of safety and error in our infrastructure place critical societal functions at greater risk of significant disruptions from rare occurrences -- accidental, malicious, or otherwise unforeseen. This is nothing new; it has been going on for decades now, as a series of decisions by policy makers placed the administration of our national infrastructure in the hands of profit-seeking organizations. This is not necessarily bad, but redefining acceptable levels of risk and protections as the world changes is hard work, and needs to be done carefully.

Cost pressures and tight engineering under benign assumptions over the last few decades have lead to thin margins of error in our current infrastructure. This is to say that certain major failures are assumed to be so unlikely that they are discounted during the design process. This way of thinking creates systems that tend to be less expensive, and are optimized to fit the relatively optimistic world and set of basic assumptions. But while optimized engineering leads to most events being of small consequence (because the systems are engineered to tolerate them), some rare events that might otherwise have been relatively benign (or at least tolerable) can now lead to massive disruption. As the margins of safety designed into the large, complex, and poorly understood systems that make up our critical infrastructure (such as the national power grid) are whittled away in the name of cost-effectiveness, the likelihood of massive, uncontrolled failures increases. But while it seems like this might be just asking for trouble, it is seen as "bad engineering" to overdesign a system to tolerate very rare events, or events whose specific causes are not well understood, if that tolerance is perceived to cost more than the failures it would prevent (in terms of expected value to the customer), or if the likelihood of the failure seems very remote -- fragility to extremely rare events is seen as a good business decision. This is why rare disruptions (like power outages) come as little surprise to insiders of highly optimized or complex infrastructures. Building excess capacity and redundancy into a system such as the electric power grid is essential to safety and reliability, but it has no market incentive -- safety doesn't sell.

What the market calls "excess capacity" (note the connotations of "excess"), others call a safety net. When a critical power line fails, parallel lines must have this "excess" capacity to take over the flow, and this safety net must remain intact when lines are out of service for maintenance. Such safety is not cheap. So while adequate margins of safety generally have the side effect of increasing the overall efficiency and reliability of a system, at some point investments in redundancy are seen as extravagant and wasteful to stakeholders, whether they are private stakeholders (i.e. shareholders) or public (i.e. taxpayers). Those who are out to placate stakeholders tend to favor more visible single-point safety or security measures, which tend to cost more in the long run and are generally less effective.

The invisible hand of economics creates systems designed and optimized under optimistic assumptions of relatively benign environments; these systems are at great risk if new or unexpected threats arise, because the margins that have historically made it possible to work around unexpected problems (think of the Apollo-13 near-disaster) are no longer designed in. The development of our critical infrastructure is subject to these economic motivations, so it is already (and will become more) fragile to rare or unexpected events. That's good business paving the road to future vulnerabilities, because the market will not bear the cost of the level of reliability that it expects. The pace of technological change and societal reliance on these systems amplify the uncertainty, urgency, and magnitude of risk here.

After 9/11, we can point out how scenarios that were previously almost unthinkable are suddenly possible, and thus engineered defenses against potential attacks are more strongly motivated. However, to define and quantify threats and their impact, particularly in combination with coordinated physical and psychological attacks and effects, requires deep contemplative research, development, large-scale experimentation, and the like -- all very costly with little to no visible immediate payoff (which makes them politically unpopular). But given the social and economic consequences that arose from the recent power outage, the national power grid is suddenly a large, inviting target for those who seek to disrupt society because it has demonstrated weaknesses and widespread impact. It is impossible to protect all important points of such a large system using the standard paradigms of physical security, which is generally designed in isolation from the system it is protecting, and therefore offers little real protection. Instead we need to fix the basic problems with the infrastructure -- if we can reduce the potential impact of catastrophic events on the power grid by making it more robust and flexible, it will become a less inviting target for catastrophic terrorism. To achieve this, we must accept that we need non-market investments in the design and implementation of safety, security, and robustness in critical infrastructure.

They just don't make them like they used to

October 28, 2003 - Reading time: 5 minutes

Search

Tags