
Failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network in a computer network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention.
History
The term "failover", although probably in use by engineers much earlier, can be found in a declassified NASA report from 1962. The term "switchover" can be found in the 1950s in descriptions of '"Hot" and "Cold" Standby Systems', with the current meaning of immediate switchover to a running system (hot) and delayed switchover to a system that needs starting (cold). Conference proceedings from 1957 describe computer systems with both Emergency Switchover (i.e. failover) and Scheduled Failover (for maintenance).
Failover
Systems designers usually provide failover capability in servers, systems or networks requiring high availability and a high degree of reliability.
At the server level, failover automation usually uses a heartbeat system that connects two servers, either through a separate cable (for example, RS-232 serial ports and cable) or through a network connection. In the most common design, as long as a regular "pulse" or heartbeat continues between the main server and the second server, the second server will not bring its systems online. The second server takes over the work of the first as soon as it detects an alteration in the heartbeat of the first machine. A few systems instead actively use all servers and can fail over their work to the remaining servers after a failure. There may also be a third "spare parts" server that has running spare components for "hot" switching to prevent downtime. Some systems have the ability to send a notification of failover.
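The heartbeat arrangement can be illustrated with a minimal sketch. It is not taken from any particular product: the class name `StandbyMonitor`, the 3-second timeout, and the choice to treat prolonged silence as the "alteration" in the heartbeat are all assumptions made for illustration. Time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Standby-side view of the primary's heartbeat (hypothetical example)."""

    timeout: float          # seconds of silence tolerated before taking over
    last_beat: float = 0.0  # time at which the most recent heartbeat arrived
    active: bool = False    # True once the standby has brought itself online

    def on_heartbeat(self, now: float) -> None:
        # A pulse from the primary resets the silence timer.
        self.last_beat = now

    def check(self, now: float) -> bool:
        # Take over only if the primary has been silent for longer than
        # the timeout; once active, the standby stays active.
        if not self.active and now - self.last_beat > self.timeout:
            self.active = True
        return self.active


monitor = StandbyMonitor(timeout=3.0)  # assumed timeout value
monitor.on_heartbeat(now=1.0)
print(monitor.check(now=2.0))  # False: heartbeat seen 1 second ago
print(monitor.check(now=5.5))  # True: 4.5 seconds of silence
```

A real deployment would also need to guard against both servers becoming active at once when only the heartbeat link fails, which is the problem addressed by fencing.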
Certain systems intentionally do not fail over entirely automatically, but require human intervention. This "automated with manual approval" configuration runs automatically once a human has approved the failover.
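One way to picture the "automated with manual approval" configuration is as a small state machine in which detection is automatic but the switch itself waits for an operator. The sketch below is only illustrative; the state names and the `ApprovedFailover` class are hypothetical.

```python
from enum import Enum


class State(Enum):
    PRIMARY_ACTIVE = "primary_active"
    AWAITING_APPROVAL = "awaiting_approval"
    STANDBY_ACTIVE = "standby_active"


class ApprovedFailover:
    """Failover that is prepared automatically but gated on approval."""

    def __init__(self) -> None:
        self.state = State.PRIMARY_ACTIVE

    def report_failure(self) -> None:
        # Automatic detection only requests a failover; nothing switches yet.
        if self.state is State.PRIMARY_ACTIVE:
            self.state = State.AWAITING_APPROVAL

    def approve(self) -> None:
        # After approval, the switch completes with no further human input.
        if self.state is State.AWAITING_APPROVAL:
            self.state = State.STANDBY_ACTIVE


controller = ApprovedFailover()
controller.approve()            # ignored: no failure has been reported
controller.report_failure()     # failure detected, waiting for an operator
controller.approve()            # operator approves, standby takes over
print(controller.state.name)    # STANDBY_ACTIVE
```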
Failback
Failback is the process of restoring a system, component, or service previously in a state of failure back to its original, working state, and having the standby system go from functioning back to standby.
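The relationship between failover and failback can be sketched as two role changes on a pair of systems. This is a simplified model under stated assumptions: the `Pair` class and its method names are hypothetical, and failback is only allowed once the original system has been marked healthy again.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """A primary/standby pair with explicit role tracking (illustrative)."""

    active: str = "primary"
    primary_healthy: bool = True

    def failover(self) -> None:
        # The primary has failed, so the standby starts serving.
        self.primary_healthy = False
        self.active = "standby"

    def repair_primary(self) -> None:
        # The original system has been restored to a working state.
        self.primary_healthy = True

    def failback(self) -> bool:
        # Return service to the primary and put the standby back on standby,
        # but only if the primary has actually been repaired.
        if self.active == "standby" and self.primary_healthy:
            self.active = "primary"
            return True
        return False


pair = Pair()
pair.failover()
print(pair.failback())  # False: primary not yet repaired
pair.repair_primary()
print(pair.failback())  # True: primary serves again
print(pair.active)      # primary
```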
Usage
The use of virtualization software has allowed failover practices to become less reliant on physical hardware through the process referred to as migration, in which a running virtual machine is moved from one physical host to another, with little or no disruption in service.
Failover and failback are also regularly used with the Microsoft SQL Server database, in which a SQL Server Failover Cluster Instance (FCI) is installed and configured on top of a Windows Server Failover Cluster (WSFC). The SQL Server groups and resources running on the WSFC can be failed over manually to the second node [https://www.dbsection.com/how-to-failover-cluster-from-one-node-to-another/] for planned maintenance on the first node, or can fail over automatically to the second node if the first node has a problem. In the same way, a failback to the first node can be performed once the issue is resolved or the maintenance is complete.
See also
* Computer cluster
* Data integrity
* Fail-safe
* Fault-tolerance
* Fencing (computing)
* High-availability cluster
* IT disaster recovery
* Load balancing
* Log shipping
* Safety engineering
* Teleportation (virtualization)