Failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network in a computer network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention.
Systems designers usually provide failover capability in servers, systems or networks requiring near-continuous availability and a high degree of reliability.
At the server level, failover automation usually uses a "heartbeat" system that connects two servers, either through a separate cable (for example, RS-232 serial ports/cable) or a network connection. As long as a regular "pulse" or "heartbeat" continues between the main server and the second server, the second server will not bring its systems online. There may also be a third "spare parts" server that has running spare components for "hot" switching to prevent downtime. The second server takes over the work of the first as soon as it detects an alteration in the "heartbeat" of the first machine. Some systems have the ability to send a notification of failover.
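The heartbeat arrangement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular product: the class name `HeartbeatMonitor`, the 3-second timeout, and the injectable `clock` parameter are all hypothetical, and "an alteration in the heartbeat" is simplified here to "no beat received within the timeout".

```python
import time


class HeartbeatMonitor:
    """Runs on the standby: take over when the primary's heartbeat goes silent."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self.timeout_s = timeout_s   # assumed threshold, not taken from the article
        self.clock = clock           # injectable so the logic can be tested without waiting
        self.last_beat = clock()     # time of the most recent heartbeat
        self.active = False          # standby stays offline while beats keep arriving

    def beat(self):
        """Record a heartbeat received from the primary server."""
        self.last_beat = self.clock()

    def check(self):
        """Bring the standby online if no beat arrived within the timeout."""
        if not self.active and self.clock() - self.last_beat > self.timeout_s:
            self.active = True       # failover: the standby takes over the work
        return self.active
```

In a real deployment `beat()` would be driven by messages arriving over the serial cable or network link, and `check()` would be called periodically. A production system would usually also fence the failed primary before the standby takes over, which this sketch omits.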
Certain systems intentionally do not fail over entirely automatically, but require human intervention. This "automated with manual approval" configuration runs automatically once a human has approved the failover.
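One way to picture the "automated with manual approval" configuration is as a gate between detection and action. The sketch below is an assumption-laden illustration: `ApprovalGatedFailover`, `on_failure_detected`, `approve`, and the `promote` callback are hypothetical names, and a real system would typically add authentication, timeouts, and audit logging.

```python
class ApprovalGatedFailover:
    """Detects failure automatically but waits for a human before failing over."""

    def __init__(self, promote):
        self.promote = promote   # hypothetical callable that performs the failover
        self.pending = False     # True once a failure awaits human approval
        self.done = False        # True after the failover has been carried out

    def on_failure_detected(self):
        """Called by the automated monitor; only queues the failover."""
        if not self.done:
            self.pending = True

    def approve(self):
        """Called by the operator; runs the automated failover steps."""
        if not self.pending:
            return False         # nothing is waiting for approval
        self.pending = False
        self.promote()
        self.done = True
        return True
```

The design choice is that detection and execution stay fully automated, and the human only supplies the decision in between.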
Failback is the process of restoring a system, component, or service previously in a state of failure back to its original, working state, and having the standby system go from functioning back to standby.
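Failover and failback can be viewed as two transitions of one small state machine. The following is a sketch only. The class `FailoverPair` and its method names are hypothetical, and the rule that failback is allowed only after the primary has been repaired is an assumption made for the example.

```python
class FailoverPair:
    """Tracks which of two servers is serving, through failover and failback."""

    def __init__(self):
        self.serving = "primary"        # the primary handles the work at first
        self.primary_healthy = True

    def primary_failed(self):
        """Failover: the standby takes over from the failed primary."""
        self.primary_healthy = False
        self.serving = "standby"

    def primary_repaired(self):
        """Mark the primary as restored to a working state."""
        self.primary_healthy = True

    def fail_back(self):
        """Failback: return work to the primary; the standby resumes standing by."""
        if self.serving == "standby" and self.primary_healthy:
            self.serving = "primary"
            return True
        return False                    # nothing to fail back, or primary still down
```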
The use of virtualization software has allowed failover practices to become less reliant on physical hardware through the process referred to as migration, in which a running virtual machine is moved from one physical host to another, with little or no disruption in service.
History
The term "failover", although probably in use by engineers much earlier, can be found in a 1962 declassified NASA report. The term "switchover" can be found in the 1950s when describing '"Hot" and "Cold" Standby Systems', with the current meaning of immediate switchover to a running system (hot) and delayed switchover to a system that needs starting (cold). A conference proceedings from 1957 describes computer systems with both Emergency Switchover (i.e. failover) and Scheduled Failover (for maintenance).
Proceedings of the Western Joint Computer Conference. Macmillan, 1957.
See also
* Data integrity
* Disaster recovery
* Fault-tolerance
* Fencing (computing)
* High-availability cluster
* Load balancing
* Log shipping
* Safety engineering
* Teleportation (virtualization)
References