Cascading failure

A cascading failure is a failure in a system of interconnected parts in which the failure of one or a few parts leads to the failure of other parts, growing progressively as a result of positive feedback. This can occur when a single part fails, increasing the probability that other portions of the system fail. Such a failure may happen in many types of systems, including power transmission, computer networking, finance, transportation systems, organisms, the human body, and ecosystems. When one part of the system fails, other parts must compensate for the failed component. This in turn overloads those parts, causing them to fail as well and prompting additional parts to fail one after another.
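The feedback loop described above can be made concrete with a toy simulation. The sketch below is a simplified, hypothetical model rather than any standard one: each node carries a load and has a fixed capacity, and when a node fails its load is split equally among its still-working neighbors, any of which may then exceed its own capacity and fail in turn. The function and parameter names are illustrative only.

```python
from collections import deque


def simulate_cascade(neighbors, load, capacity, initial_failure):
    """Return the set of nodes that end up failed after one initial failure.

    Hypothetical redistribution rule: a failed node's current load is split
    equally among its still-working neighbors; a neighbor pushed above its
    capacity fails in turn. Load with nowhere to go is treated as lost.
    """
    load = dict(load)  # work on a copy so the caller's data is untouched
    failed = set()
    queue = deque([initial_failure])
    while queue:
        node = queue.popleft()
        if node in failed:
            continue  # already processed via another path
        failed.add(node)
        survivors = [n for n in neighbors[node] if n not in failed]
        if survivors:
            share = load[node] / len(survivors)
            for n in survivors:
                load[n] += share
                if load[n] > capacity[n]:
                    queue.append(n)  # overloaded: this node fails next
        load[node] = 0.0
    return failed


if __name__ == "__main__":
    # A four-node chain A-B-C-D, every node loaded to 80% of capacity.
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    loads = {n: 0.8 for n in chain}
    print(sorted(simulate_cascade(chain, loads, {n: 1.0 for n in chain}, "A")))  # ['A', 'B', 'C', 'D']
    print(sorted(simulate_cascade(chain, loads, {n: 2.0 for n in chain}, "A")))  # ['A']
```

With every node running at 80% of capacity, losing the end node takes down the whole chain; doubling every capacity confines the failure to the node that failed first.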


In power transmission

Cascading failure is common in power grids when one of the elements fails (completely or partially) and shifts its load to nearby elements in the system. Those nearby elements are then pushed beyond their capacity, so they become overloaded and shift their load onto other elements. Cascading failure is a common effect seen in high-voltage systems, where a single point of failure (SPF) on a fully loaded or slightly overloaded system results in a sudden spike across all nodes of the system. This surge current can induce the already overloaded nodes into failure, setting off more overloads and thereby taking down the entire system in a very short time. This failure process cascades through the elements of the system like a ripple on a pond and continues until substantially all of the elements in the system are compromised and/or the system becomes functionally disconnected from the source of its load. For example, under certain conditions a large power grid can collapse after the failure of a single transformer.

Monitoring the operation of a system in real time, and judicious disconnection of parts, can help stop a cascade. Another common technique is to calculate a safety margin for the system by computer simulation of possible failures, to establish safe operating levels below which none of the calculated scenarios is predicted to cause cascading failure, and to identify the parts of the network which are most likely to cause cascading failures. One of the primary problems with preventing electrical grid failures is that the control signal travels no faster than the propagating power overload. Because both move at the same speed, it is not possible to isolate the outage by sending a warning ahead to isolate the element.
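The simulation-based safety margin described above can be sketched as an exhaustive single-failure screen. This is an illustrative sketch under assumed, hypothetical rules, not a power-flow calculation: every element is loaded to the same fraction of its capacity, a failed element's load is shared equally among its surviving neighbors (a crude stand-in for real power-flow redistribution), and each element is tried in turn as the initial failure. The highest loading level at which no single failure spreads is reported as the safe operating level, and elements are ranked by how many others their failure takes down.

```python
def cascade_size(neighbors, load, capacity, start):
    """Count elements lost when `start` fails, using equal-share redistribution."""
    load, failed, stack = dict(load), set(), [start]
    while stack:
        node = stack.pop()
        if node in failed:
            continue
        failed.add(node)
        alive = [n for n in neighbors[node] if n not in failed]
        for n in alive:
            load[n] += load[node] / len(alive)
            if load[n] > capacity[n]:
                stack.append(n)
        load[node] = 0.0
    return len(failed)


def safe_operating_level(neighbors, capacity, levels):
    """Highest loading level (fraction of capacity) at which no single failure cascades.

    Assumes that loading the system more heavily never makes it safer, so the
    scan stops at the first unsafe level.
    """
    safe = None
    for level in sorted(levels):
        load = {n: level * capacity[n] for n in neighbors}
        if all(cascade_size(neighbors, load, capacity, n) == 1 for n in neighbors):
            safe = level
        else:
            break
    return safe


def most_critical(neighbors, capacity, level):
    """Elements ordered by the size of the cascade their failure triggers."""
    load = {n: level * capacity[n] for n in neighbors}
    sizes = {n: cascade_size(neighbors, load, capacity, n) for n in neighbors}
    return sorted(sizes, key=sizes.get, reverse=True)


if __name__ == "__main__":
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    cap = {n: 1.0 for n in chain}
    print(safe_operating_level(chain, cap, [0.3, 0.4, 0.5, 0.6, 0.7]))  # 0.5
    print(most_critical(chain, cap, 0.6))  # ['A', 'D', 'B', 'C']
```

On this small chain the end elements turn out to be the most dangerous to lose, because their load has only one place to go.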


Examples

Cascading failure caused the following power outages:
* Blackout in Northeast America in 1965
* Blackout in Southern Brazil in 1999
* Blackout in Northeast America in 2003
* Blackout in Italy in 2003
* Blackout in London in 2003
* European Blackout in 2006
* Blackout in Northern India in 2012
* Blackout in South Australia in 2016
* Blackout in southeast South America in 2019


In computer networks

Cascading failures can also occur in computer networks (such as the Internet) in which network traffic is severely impaired or halted to or between larger sections of the network, caused by failing or disconnected hardware or software. In this context, the cascading failure is known by the term cascade failure. A cascade failure can affect large groups of people and systems. The cause of a cascade failure is usually the overloading of a single, crucial router or node, which causes the node to go down, even briefly. It can also be caused by taking a node down for maintenance or upgrades. In either case, traffic is routed to or through another (alternative) path. This alternative path, as a result, becomes overloaded, causing it to go down, and so on. The failure also affects systems which depend on the node for regular operation.
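The reroute-and-overload loop described above can be sketched as follows. The sketch assumes a hypothetical model: routers have a fixed capacity in arbitrary traffic units, every traffic demand follows a minimum-hop path found by breadth-first search, and every router on that path, endpoints included, carries the demand's full volume. After the initial failure, all demands are rerouted around the routers that are down, any router pushed over its capacity goes down too, and the process repeats until no router is overloaded. Demands with no remaining path are dropped, which mirrors dependent systems losing service.

```python
from collections import deque


def shortest_path(adj, src, dst, down):
    """Minimum-hop path from src to dst that avoids routers in `down`, or None."""
    if src in down or dst in down:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in adj[node]:
            if nbr not in prev and nbr not in down:
                prev[nbr] = node
                queue.append(nbr)
    return None


def reroute_cascade(adj, capacity, demands, first_failure):
    """Fail one router, reroute all demands, and repeat while routers overload.

    demands: list of (source, destination, traffic) tuples.
    Returns (routers that went down, final per-router load).
    """
    down = {first_failure}
    while True:
        load = {node: 0 for node in adj}
        for src, dst, amount in demands:
            # Demands with no surviving path are dropped (path is None).
            for hop in shortest_path(adj, src, dst, down) or []:
                load[hop] += amount
        overloaded = {n for n in adj if n not in down and load[n] > capacity[n]}
        if not overloaded:
            return down, load
        down |= overloaded


if __name__ == "__main__":
    adj = {
        "S": ["R1", "R2"], "R1": ["S", "D"], "R2": ["S", "R3"],
        "R3": ["R2", "D"], "D": ["R1", "R3"],
    }
    capacity = {"S": 100, "R1": 10, "R2": 4, "R3": 10, "D": 100}
    down, load = reroute_cascade(adj, capacity, [("S", "D", 5)], "R1")
    print(sorted(down))  # ['R1', 'R2']
```

In the example, the primary path S-R1-D is lost, traffic shifts to the longer S-R2-R3-D path, and the undersized router R2 fails as well, leaving S unable to reach D.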


Symptoms

The symptoms of a cascade failure include packet loss and high network latency, not just to single systems, but to whole sections of a network or the Internet. The high latency and packet loss are caused by nodes that fail to operate due to congestion collapse: they are still present in the network, but little or no useful communication passes through them. As a result, routes through them can still be considered valid without actually providing communication. If enough routes go down because of a cascade failure, a complete section of the network or the Internet can become unreachable. Although undesired, this can help speed up recovery from the failure: connections time out, and other nodes give up trying to establish connections to the section(s) that have been cut off, decreasing the load on the involved nodes. A common occurrence during a cascade failure is a walking failure, where sections go down, causing the next section to fail, after which the first section comes back up. This ripple can make several passes through the same sections or connecting nodes before stability is restored.


History

Cascade failures are a relatively recent development, arising from the massive increase in traffic and the high interconnectivity between systems and networks. The term was first applied in this context in the late 1990s by a Dutch IT professional and has slowly become a relatively common term for this kind of large-scale failure.


Example

Network failures typically start when a single network node fails. Initially, the traffic that would normally go through the node is stopped. Systems and users get errors about not being able to reach hosts. Usually, the redundant systems of an ISP respond very quickly, choosing another path through a different backbone. The routing path through this alternative route is longer, with more hops, and subsequently goes through more systems that do not normally process the amount of traffic suddenly offered. This can cause one or more systems along the alternative route to go down, creating similar problems of their own.

Related systems are also affected in this case. For example, DNS resolution might fail, and what would normally keep systems interconnected might instead break connections that are not even directly involved with the systems that went down. This, in turn, may cause seemingly unrelated nodes to develop problems that can cause another cascade failure on their own.

In December 2012, a partial loss (40%) of Gmail service occurred globally for 18 minutes. This loss of service was caused by a routine update of load-balancing software that contained faulty logic; in this case, the error was caused by logic using an inappropriate 'all' instead of the more appropriate 'some'. The cascading error was fixed by fully updating a single node in the network instead of partially updating all nodes at one time.
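The fix described above, fully updating one node before touching the others, is the idea behind a staged (canary) rollout. The sketch below is generic and hypothetical, not a description of Google's tooling: `apply_update` and `is_healthy` are placeholder callables standing in for real deployment and health-check systems. Updating nodes one at a time and halting at the first unhealthy result bounds the damage of a faulty update to a single node, whereas pushing a change to every node at once exposes all of them simultaneously.

```python
from typing import Callable, List, Optional, Tuple


def staged_rollout(
    nodes: List[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Fully update one node at a time, halting at the first unhealthy node.

    Returns (nodes updated successfully, node that failed or None).
    """
    updated: List[str] = []
    for node in nodes:
        apply_update(node)
        if not is_healthy(node):
            # Stop here: the faulty update has reached only this one node.
            return updated, node
        updated.append(node)
    return updated, None


if __name__ == "__main__":
    version = {}

    def faulty_update(node: str) -> None:
        version[node] = "v2"  # hypothetical bad release

    def check(node: str) -> bool:
        return version.get(node) != "v2"

    print(staged_rollout(["n1", "n2", "n3"], faulty_update, check))  # ([], 'n1')
    print(version)  # {'n1': 'v2'}
```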


Cascading structural failure

Certain load-bearing structures with discrete structural components can be subject to the "zipper effect", where the failure of a single structural member increases the load on adjacent members. In the case of the Hyatt Regency walkway collapse, a suspended walkway (which was already overstressed due to an error in construction) failed when a single vertical suspension rod failed, overloading the neighboring rods, which failed sequentially (i.e. like a zipper). A bridge that can have such a failure is called fracture critical, and numerous bridge collapses have been caused by the failure of a single part. Properly designed structures use an adequate factor of safety and/or alternate load paths to prevent this type of mechanical cascade failure.
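The zipper effect and the protective role of a factor of safety can be illustrated with a simplified local load-sharing model. The assumptions are illustrative only: a row of identical rods each carries the same load, each can hold `safety_factor` times that load, and a failed rod's load is split equally between the nearest intact rod on each side (or given entirely to one side at the end of the row). Real structural analysis is far more involved.

```python
def zipper(n_rods, rod_load, safety_factor, first_failure):
    """Return how many rods fail after rod `first_failure` breaks."""
    capacity = safety_factor * rod_load
    load = [rod_load] * n_rods
    intact = [True] * n_rods
    pending = [first_failure]
    while pending:
        i = pending.pop()
        if not intact[i]:
            continue
        intact[i] = False
        # Nearest intact rod on each side picks up the released load.
        left = next((j for j in range(i - 1, -1, -1) if intact[j]), None)
        right = next((j for j in range(i + 1, n_rods) if intact[j]), None)
        receivers = [j for j in (left, right) if j is not None]
        for j in receivers:
            load[j] += load[i] / len(receivers)
            if load[j] > capacity:
                pending.append(j)  # overstressed: this rod fails next
        load[i] = 0.0
    return intact.count(False)


if __name__ == "__main__":
    print(zipper(6, 1.0, 1.4, 2))  # 6: the failure unzips the whole row
    print(zipper(6, 1.0, 2.0, 2))  # 1: neighbors absorb the extra 50%
```

Losing one rod adds half its load to each neighbor, so in this model a safety factor below 1.5 lets the failure run along the row, while a larger factor stops it at the first rod.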


Other examples


Biology

Biochemical cascades exist in biology, where a small reaction can have system-wide implications. One negative example is the ischemic cascade, in which a small ischemic attack releases toxins which kill off far more cells than the initial damage, resulting in more toxins being released. Current research aims to find a way to block this cascade in stroke patients to minimize the damage. In the study of extinction, sometimes the extinction of one species will cause many other extinctions to happen. Such a species is known as a keystone species.


Electronics

Another example is the Cockcroft–Walton generator, which can also experience cascade failures wherein one failed diode can result in all the diodes failing in a fraction of a second. Yet another example of this effect in a scientific experiment was the implosion in 2001 of several thousand fragile glass photomultiplier tubes used in the Super-Kamiokande experiment, where the shock wave caused by the failure of a single detector appears to have triggered the implosion of the other detectors in a chain reaction.


Finance

In finance, the risk of cascading failures of financial institutions is referred to as systemic risk: the failure of one financial institution may cause other financial institutions (its counterparties) to fail, cascading throughout the system. Institutions that are believed to pose systemic risk are deemed either "too big to fail" (TBTF) or "too interconnected to fail" (TICTF), depending on why they appear to pose a threat. Note, however, that systemic risk is not due to individual institutions per se, but to the interconnections between them. Frameworks to study and predict the effects of cascading failures have been developed in the research literature. A related (though distinct) type of cascading failure in finance occurs in the stock market, exemplified by the 2010 Flash Crash.
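Counterparty contagion of this kind is often illustrated with a simple threshold model, sketched below under stated assumptions: each bank has a capital buffer, `claims[a][b]` is the amount bank `a` is owed by bank `b`, a defaulting bank repays only a fraction `recovery_rate` of what it owes, and a bank defaults once its accumulated losses exceed its capital. This is a textbook-style simplification, not any particular framework from the research literature mentioned above.

```python
def default_cascade(capital, claims, initial_defaults, recovery_rate=0.0):
    """Return the set of banks that end up in default.

    capital[a]    : loss-absorbing buffer of bank a
    claims[a][b]  : amount bank a is owed by bank b (a's exposure to b)
    recovery_rate : fraction of a claim still repaid when the debtor defaults
    """
    losses = {bank: 0.0 for bank in capital}
    defaulted = set(initial_defaults)
    newly_defaulted = set(initial_defaults)
    while newly_defaulted:
        next_round = set()
        for creditor, exposures in claims.items():
            if creditor in defaulted:
                continue
            # Write down claims on counterparties that defaulted in the last round.
            for debtor in newly_defaulted:
                losses[creditor] += exposures.get(debtor, 0.0) * (1 - recovery_rate)
            if losses[creditor] > capital[creditor]:
                next_round.add(creditor)
        defaulted |= next_round
        newly_defaulted = next_round
    return defaulted


if __name__ == "__main__":
    capital = {"A": 5.0, "B": 6.0, "C": 4.0}
    claims = {"B": {"A": 10.0}, "C": {"B": 8.0}}  # A owes B; B owes C
    print(sorted(default_cascade(capital, claims, {"A"})))  # ['A', 'B', 'C']
    print(sorted(default_cascade(capital, claims, {"A"}, recovery_rate=0.5)))  # ['A']
```

The chain of obligations, not the size of any single bank, is what carries A's failure to C, which matches the point above that systemic risk comes from the interconnections.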


Interdependent cascading failures

Diverse infrastructures such as water supply, transportation, fuel and power stations are coupled together and depend on each other for functioning (see Fig. 1). Owing to this coupling, interdependent networks are extremely sensitive to random failures, and in particular to targeted attacks, such that a failure of a small fraction of nodes in one network can trigger an iterative cascade of failures in several interdependent networks. Electrical blackouts frequently result from a cascade of failures between interdependent networks, and the problem has been dramatically exemplified by the several large-scale blackouts that have occurred in recent years. Blackouts are a striking demonstration of the important role played by the dependencies between networks. For example, the 2003 Italy blackout resulted in a widespread failure of the railway network, health care systems, and financial services and, in addition, severely affected the telecommunication networks. The partial failure of the communication system in turn further impaired the electrical grid management system, thus producing a positive feedback on the power grid. This example emphasizes how interdependence can significantly magnify the damage in an interacting network system.
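The iterative cascade between coupled networks can be sketched with a mutual-giant-component model of the kind studied by Buldyrev and colleagues. Under the assumptions of this sketch, each node in network A depends on exactly one node in network B and vice versa, and a node keeps functioning only if its partner functions and it still belongs to the largest connected cluster of its own network. Removing nodes from A and alternately pruning both networks until nothing changes yields the surviving mutually connected set. Names and the example graphs are hypothetical.

```python
def largest_component(adj, alive):
    """Largest connected component of the subgraph induced by `alive`."""
    best, seen = set(), set()
    for start in alive:
        if start in seen:
            continue
        component, stack = {start}, [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                if nbr in alive and nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    stack.append(nbr)
        if len(component) > len(best):
            best = component
    return best


def interdependent_cascade(adj_a, adj_b, depends, removed_from_a):
    """depends[a] is the node of B that node a needs (and that needs a)."""
    partner_of_b = {b: a for a, b in depends.items()}
    alive_a = set(adj_a) - set(removed_from_a)
    alive_b = set(adj_b)
    while True:
        # A nodes cut off from A's largest cluster stop working.
        alive_a = largest_component(adj_a, alive_a)
        # B nodes whose partner failed stop working; B is then pruned the same way.
        alive_b = largest_component(adj_b, {b for b in alive_b if partner_of_b[b] in alive_a})
        # A nodes whose partner in B failed stop working.
        pruned_a = {a for a in alive_a if depends[a] in alive_b}
        if pruned_a == alive_a:
            return alive_a, alive_b
        alive_a = pruned_a


if __name__ == "__main__":
    adj_a = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2", "a4"], "a4": ["a3", "a5"], "a5": ["a4"]}
    adj_b = {"b1": ["b2", "b3"], "b2": ["b1"], "b3": ["b1", "b4"], "b4": ["b3", "b5"], "b5": ["b4"]}
    depends = {f"a{i}": f"b{i}" for i in range(1, 6)}
    survivors_a, survivors_b = interdependent_cascade(adj_a, adj_b, depends, {"a1"})
    print(sorted(survivors_a), sorted(survivors_b))  # ['a3', 'a4', 'a5'] ['b3', 'b4', 'b5']
```

Removing a1 leaves network A connected, but its partner b1 is the only link between b2 and the rest of B, so b2 drops out; a2 then loses its partner, and the damage exceeds what either network would suffer alone.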


Model for overload cascading failures

A model for cascading failures due to overload propagation is the Motter–Lai model.
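A compact sketch of the Motter–Lai rule, with two assumptions stated up front: a node's load is taken to be its shortest-path betweenness (computed here with Brandes' algorithm on an unweighted, undirected graph, counting each pair in both directions), and each node's capacity is fixed at (1 + alpha) times its initial load, where alpha is a tolerance parameter. After one node is removed, loads are recomputed on the remaining graph, every node whose load exceeds its capacity is removed, and the step repeats until no node is overloaded.

```python
from collections import deque


def betweenness(adj, alive):
    """Brandes' algorithm: shortest-path betweenness on the subgraph induced by `alive`."""
    score = {v: 0.0 for v in alive}
    for s in alive:
        stack, preds = [], {v: [] for v in alive}
        sigma = dict.fromkeys(alive, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(alive, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in alive:
                    continue
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(alive, 0.0)  # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def motter_lai(adj, alpha, attacked):
    """Return the nodes still working after removing `attacked`."""
    alive = set(adj)
    initial = betweenness(adj, alive)
    capacity = {v: (1 + alpha) * initial[v] for v in adj}
    alive.discard(attacked)
    while True:
        load = betweenness(adj, alive)
        overloaded = {v for v in alive if load[v] > capacity[v]}
        if not overloaded:
            return alive
        alive -= overloaded


if __name__ == "__main__":
    outer, middle = ["l1", "l2", "r1", "r2"], ["M1", "M2", "M3"]
    adj = {**{o: list(middle) for o in outer}, **{m: list(outer) for m in middle}}
    print(sorted(motter_lai(adj, 0.2, "M1")))  # ['l1', 'l2', 'r1', 'r2']
    print(sorted(motter_lai(adj, 0.6, "M1")))  # ['M2', 'M3', 'l1', 'l2', 'r1', 'r2']
```

In the example, three middle nodes relay all traffic between four outer nodes. Removing M1 raises the load on M2 and M3 by 50%; with alpha = 0.2 they fail and the outer nodes are left isolated, while with alpha = 0.6 the network absorbs the loss.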


See also

* Blackouts
* Brittle system
* Butterfly effect
* Byzantine failure
* Cascading rollback
* Chain reaction
* Chaos theory
* Cache stampede
* Congestion collapse
* Domino effect
* For Want of a Nail (proverb)
* Network science
* Network theory
* Interdependent networks
* Kessler Syndrome
* Percolation theory
* Progressive collapse
* Virtuous circle and vicious circle
* Wicked problem


External links


* Space Weather: Blackout — Massive Power Grid Failure
* Cascading failure demo applet (Monash University's Virtual Lab)
* A. E. Motter and Y.-C. Lai, "Cascade-based attacks on complex networks", Physical Review E (Rapid Communications) 66, 065102 (2002).
* P. Crucitti, V. Latora and M. Marchiori, "Model for cascading failures in complex networks", Physical Review E (Rapid Communications) 69, 045104 (2004).
* Protection Strategies for Cascading Grid Failures — A Shortcut Approach
* I. Dobson, B. A. Carreras, and D. E. Newman, "A loading-dependent model of probabilistic cascading failure", Probability in the Engineering and Informational Sciences, vol. 19, no. 1, January 2005, pp. 15–32 (preprint).
* On September 2, 1998, Swissair Flight 111, flying from New York to Geneva, slammed into the Atlantic Ocean off the coast of Nova Scotia with 229 people aboard. The crash was originally believed to be a terrorist act. After a $39 million investigation, an insurance settlement of $1.5 billion and more than four years, investigators unraveled the puzzle: cascading failure. What is the legacy of Swissair 111? "We have a window into the internal structure of design, checks and balances, protection, and safety." (David Evans, Editor-in-Chief of Air Safety Week)
* PhysicsWeb story: Accident grounds neutrino lab
* From Single Network to Network of Networks (ttp://havlin.biu.ac.il/Pdf/Bremen070715a.pdf)