In the past couple of years, national security has been on everyone's mind: laws have been passed, rules have been enacted, and generally life has been made more miserable so that we as a country can feel more secure. Some of the initiatives we have seen are very visible; airport security, security at federal buildings, and legislation such as the Patriot Act have been widely discussed, and their relative merits are subject to some debate. There has also been much behind-the-scenes work, such as the Container Security Initiative (CSI), which is designed to protect the transportation of the ubiquitous and increasingly important 40-foot containers that bring us much of what we buy. All discussion of the merits of these security precautions aside, we can still say that people are actively working to keep our critical infrastructure safe from attack. But we have been overlooking an important point in the process of securing our national infrastructure, and that point made itself apparent recently. The massive power outage in the Northeast provided an important lesson: shrinking margins of safety and error in our infrastructure place critical societal functions at greater risk of significant disruption from rare occurrences -- accidental, malicious, or otherwise unforeseen. This is nothing new; it has been going on for decades, as a series of decisions by policy makers placed the administration of our national infrastructure in the hands of profit-seeking organizations. That is not necessarily bad, but redefining acceptable levels of risk and protection as the world changes is hard work, and it needs to be done carefully.
Cost pressures and tight engineering under benign assumptions over the last few decades have led to thin margins of error in our current infrastructure. That is to say, certain major failures are assumed to be so unlikely that they are discounted during the design process. This way of thinking creates systems that tend to be less expensive and are optimized to fit a relatively optimistic set of assumptions about the world. But while optimized engineering leads to most events being of small consequence (because the systems are engineered to tolerate them), some rare events that might otherwise have been relatively benign (or at least tolerable) can now lead to massive disruption. As the margins of safety designed into the large, complex, and poorly understood systems that make up our critical infrastructure (such as the national power grid) are whittled away in the name of cost-effectiveness, the likelihood of massive, uncontrolled failures increases. And while it seems like this might be just asking for trouble, it is considered "bad engineering" to overdesign a system to tolerate very rare events, or events whose specific causes are not well understood, if that tolerance is perceived to cost more than the failures it would prevent (in terms of expected value to the customer), or if the likelihood of the failure seems very remote -- fragility to extremely rare events is seen as a good business decision. This is why rare disruptions (like power outages) come as little surprise to insiders of highly optimized or complex infrastructures. Building excess capacity and redundancy into a system such as the electric power grid is essential to safety and reliability, but there is no market incentive to do so -- safety doesn't sell.
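To make that cost-benefit logic concrete (a simplified sketch of the reasoning above, with notation of my own choosing, not a formula from any engineering standard): redundancy gets funded only when its cost is believed to be lower than the expected cost of the failures it would prevent, roughly

    C_redundancy < p_failure * C_failure

When p_failure is believed to be vanishingly small -- or simply cannot be estimated -- the right-hand side rounds to zero, and the investment never clears the bar, no matter how catastrophic C_failure would actually turn out to be.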
What the market calls "excess capacity" (note the connotations of "excess"), others call a safety net. When a critical power line fails, parallel lines must have this "excess" capacity to take over the flow, and this safety net must remain intact even when lines are out of service for maintenance. Such safety is not cheap. So while adequate margins of safety generally have the side effect of increasing the overall efficiency and reliability of a system, at some point investments in redundancy come to be seen by stakeholders as extravagant and wasteful, whether those stakeholders are private (i.e., shareholders) or public (i.e., taxpayers). Those who are out to placate stakeholders tend to favor more visible, single-point safety or security measures, which tend to cost more in the long run and are generally less effective.
The invisible hand of economics creates systems designed and optimized under optimistic assumptions of relatively benign environments; these systems are at great risk if new or unexpected threats arise, because the margins that have historically made it possible to work around unexpected problems (think of the Apollo 13 near-disaster) are no longer designed in. The development of our critical infrastructure is subject to these economic motivations, so it is already fragile to rare or unexpected events, and it will only become more so. That's good business paving the road to future vulnerabilities, because the market will not bear the cost of the level of reliability that it expects. The pace of technological change and our society's reliance on these systems amplify the uncertainty, urgency, and magnitude of the risk here.
After 9/11, scenarios that were previously almost unthinkable are suddenly plausible, and engineered defenses against potential attacks are thus more strongly motivated. However, defining and quantifying threats and their impact, particularly in combination with coordinated physical and psychological attacks and effects, requires deep, contemplative research, development, large-scale experimentation, and the like -- all very costly, with little to no visible immediate payoff (which makes them politically unpopular). But given the social and economic consequences of the recent power outage, the national power grid is suddenly a large, inviting target for those who seek to disrupt society, because it has demonstrated both weaknesses and widespread impact. It is impossible to protect all important points of such a large system using the standard paradigms of physical security, which is generally designed in isolation from the system it is protecting and therefore offers little real protection. Instead we need to fix the basic problems with the infrastructure -- if we can reduce the potential impact of catastrophic events on the power grid by making it more robust and flexible, it will become a less inviting target for catastrophic terrorism. To achieve this, we must accept that we need non-market investments in the design and implementation of safety, security, and robustness in critical infrastructure.
So a friend of mine was on TV today -- specifically, he was a guest on the show Tech Support on People TV, which is broadcast live to whoever is watching in metro Atlanta. Since it's not every day that most people get on TV, and there was supposedly room in the studio for 3 friends to watch, I went with two other people to watch David be on TV. Which was going to be fun.
So we get to People TV, which was highly reminiscent of UHF, but it was really neat to be hanging around the studio, and we were going to be sitting in the control room watching the show.
At least that was the plan until about 40 seconds (literally) before the show started, when the producer of the show asks us "are you three on camera?" We thought that he was asking us if we were going to be on the show, so we said "no" -- to which he said "Well, you are now," and started herding us through the door into the studio. We were trying to tell him that we weren't on the show until we realized that what he wanted us to do was operate the cameras.
So we operated the cameras, which was cool. Since none of us knew what we were doing, it was a bit interesting at first, but we had lots of fun and really got the hang of it by the end. And we got on the credits of the show, which was neat even though they spelled my name wrong. Plus we learned lots of neat things like how to zoom and focus and roll the cameras around, plus some cool TV cameraman phrases like "I need a two shot, left."
After the show, we were hanging around outside the building waiting for David to take care of some paperwork to get a VHS copy of the show, but he was taking too long so we went inside to get some free pizza and escape some weird drunk guy. So we're eating pizza in the hallway, and the producer comes out and says "hey, do you want to run cameras for the next show?"
Of course we said yes, and this time even got a more active role in the production process, which is really hectic by the way -- especially for live broadcasts.
Oh, and they spelled my name wrong on the credits again, but differently this time.
So anyway, definitely an interesting day. I wonder if I can go and operate the cameras more; that was fun. He said we could come back; maybe we can take him up on his offer :)
Note, 1 May 2020: As I look at this post, almost 2 decades after originally posting it, I no longer agree with a lot of what I said here. In the intervening 17 years, I started working at NASA, the Shuttle fleet was retired, and then I stopped working at NASA (those three events are unrelated, I swear!). Anyway, I have a lot to say about the Shuttle program (mostly good things) and about NASA in general (a lot of bad things), but some of it gets said in later posts, and some I will keep to myself for now. Maybe I'll do another NASA post, or set of posts, in the future.
I spent most of this morning and early afternoon glued to the radio, listening to reports and commentary on the loss of the Space Shuttle Columbia. I tried sitting in front of the TV watching CNN as I had done in September of 2001, but CNN's coverage of the event was sickening. NPR ended up having more intelligent coverage than any of the other news sources I tried.
The train of events leading up to the disaster is posted in so many places that I'm not going to bother repeating it here. I'm also going to refrain from speculating on the direct cause of the disaster, because I don't have the requisite competence in this area. However, there is one nagging issue that I feel bears a closer look -- the piece of insulation that fell off of the external tank at launch and apparently struck the left wing of the OV (Orbiter Vehicle).
What bothers me is not that this appears to be a smoking gun -- as I said, I'm in no position to speculate on that. The part that bothers me is the fact that once the Shuttle had launched, NASA had no way of inspecting the wing to see if it was damaged.
In one of the press conferences, we learned that Columbia was not equipped with an arm, that there was no method of getting a view of the sides or bottom of the OV, and that EVA was out of the question: even if one of the astronauts could get to the wing (they couldn't), there would be nothing for them to do, because the astronauts do not have the training or equipment to make repairs of that nature to the Shuttle. Even if there had been visible damage to the OV, the astronauts could have done nothing but float around in space, because Columbia would not be able to (for instance) maneuver itself to rendezvous with the ISS, and even if it could, it is not equipped to dock with the station. Furthermore, NASA's most optimistic estimate of how long it would take to launch a Shuttle to respond to an emergency is 2-3 weeks -- and that assumes there is already a shuttle on the pad, ready to go, and no crew change requirements. Otherwise, your emergency could have to wait 3-4 months while a vehicle and crew are prepared for launch. Hardly a viable option.
Yes, it's true: NASA, which makes backups of backups of backups and contingency plans for contingency plans, has no way of saving astronauts once they are in space. Not only that; they have left themselves a huge blind spot (the physical condition of the bottom of the shuttle).
This blind spot is the cause of much speculation now on the cause of the Columbia disaster -- was there damage to the left wing of the OV from a piece of insulation that fell during launch? We may never know for sure. Any method of showing an image of the Shuttle's wing -- EVA, a camera, whatever -- could have answered many questions, and perhaps saved the lives of seven astronauts. If there are any benefits to be gained from this event, I hope to see:
It seems fitting at this point to refrain from drawing any conclusions. Hopefully, we will know more about what happened and what could have been done to prevent it in the weeks and months to come. Only then can a backseat engineer like myself feel confident in providing direction for the future of the space program...
In our rush to fix the problems that led to the recent election-related debacle, several states are trying to implement electronic voting systems to ensure quick and accurate election results. In theory, this seems like an excellent idea -- after all, an all-electronic system means no hanging chads, no butterfly ballots, and no manual recounts. The problem is that so far, we apparently haven't come across a way to do it right.
After the 2000 presidential elections, everyone had an opinion on how our voting system should be improved. Among the worst ideas were internet voting and voting at ATMs. Thankfully, those ideas weren't implemented, but some of what we've seen in 2002 is just as bad. Why bad? After all, an MIT/CalTech press release bubbled on about the wonderful improvement in Florida's voting technology in 2002. "On average," it says, "2.0 percent of Democratic voters recorded no vote for governor in [Brevard, Broward, Duval, Hillsborough, Miami-Dade, Palm Beach, and Pinellas] counties ... this is a 35 percent improvement in performance. ... These results are very encouraging."
I cannot begin to apprehend the confusion of ideas that could provoke such a statement.
Democracy has failed if even a single voter is not heard in an election. Period. According to the 2000 census, the seven counties represented in the MIT/CalTech study contain 6,260,142 residents over the age of 18. I don't know how many registered voters are in those counties, or how many of those are Democrats, so for the sake of argument I'll say that 5% of those 6,260,142 residents are Democratic voters who went out to vote (this may be unnecessarily conservative, but it will do for this argument). If that were the case, the study's 2.0 percent figure would mean that over 6,200 votes were not counted in those seven counties alone. Over 6,200 votes. This is nothing to be proud of, even if it is an improvement.
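Spelling out the arithmetic behind that number (using my assumed 5% figure and the study's 2.0 percent no-vote rate):

    6,260,142 residents * 0.05 ≈ 313,000 Democratic voters
    313,000 voters * 0.02 ≈ 6,260 uncounted votes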
Problems that did occur in the seven counties were lightly dismissed by Charles Stewart, an MIT professor working on the Voting Technology Project (which released the statement), as "problems encountered preparing for election day, such as training poll workers." Next time you wonder why your computer is so hard to use, keep in mind that Charles Stewart is a professor at one of the nation's most respected engineering schools. Does he tell his students that user interface design and end-user training are unimportant? That engineers design circuits, and problems with the final product can be attributed to incompetent users? Blaming the user is a common fault among engineers who feel that if they understand their product, so should everyone else. But everyone else isn't an engineer or a computer scientist. Not accounting for end users is the biggest mistake an engineer can make.
Blaming hapless poll workers or poorly funded local election commissions, while easy, overlooks two fundamental problems:
Having said all of that, we can't dismiss some of the problems so lightly. Here are some of the more spectacular failures that happened in the 2002 Florida elections:
This list could go on, but there is no point in berating the obvious. The fact is that voting should not require training, but apparently it does. Electronic voting systems should fix this problem, but so far they haven't. An electronic voting system needs to meet several requirements:
None of this is hard, or difficult to fathom. However, I've never accused our elected officials of being competent. I say let's go back to dropping stones in vases.