IT service continuity
IT Service Continuity (ITSC) is a subset of business continuity planning (BCP) that focuses on Recovery Point Objective (RPO) and Recovery Time Objective (RTO). It encompasses IT disaster recovery planning and wider IT resilience planning. It also incorporates IT infrastructure and services related to communications, such as telephony and data communications.

Principles of backup sites
Planning includes arranging for backup sites, whether they are "hot" (operating prior to a disaster), "warm" (ready to begin operating), or "cold" (requiring substantial work to begin operating), and standby sites with hardware as needed for continuity. In 2008, the British Standards Institution launched a specific standard supporting Business Continuity Standard

Recovery Time Objective
The Recovery Time Objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disruption in order to avoid a break in business continuity. According to business continuity planning methodology, the RTO is established during the

Recovery Time Actual
Recovery Time Actual (RTA) is the critical metric for business continuity and disaster recovery. The business continuity group conducts timed rehearsals (or actuals), during which the RTA is determined and refined as needed.

Recovery Point Objective
A Recovery Point Objective (RPO) is the maximum acceptable interval during which transactional data is lost from an IT service. For example, if the RPO is measured in minutes, then in practice off-site mirrored backups must be continuously maintained, as a daily off-site backup will not suffice.

Relationship to Recovery Time Objective
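As a rough illustration of how a backup schedule can be checked against an RPO target, the following minimal sketch compares the worst-case data-loss window of a schedule with the target. The function names `worst_case_loss_window` and `meets_rpo`, the optional `transfer_delay` parameter, and the 15-minute RPO are assumptions made for this example, not part of any standard.

```python
from datetime import timedelta


def worst_case_loss_window(backup_interval: timedelta,
                           transfer_delay: timedelta = timedelta(0)) -> timedelta:
    """Longest span of recent data that could be lost under a given schedule.

    Assumption: a failure just before the next backup is secured off-site
    loses everything written since the previous backup, plus any time the
    backup needs to reach the off-site location.
    """
    return backup_interval + transfer_delay


def meets_rpo(backup_interval: timedelta,
              rpo: timedelta,
              transfer_delay: timedelta = timedelta(0)) -> bool:
    """Return True if the schedule's worst-case loss window fits the RPO."""
    return worst_case_loss_window(backup_interval, transfer_delay) <= rpo


# Hypothetical target: lose at most 15 minutes of transactions.
rpo_target = timedelta(minutes=15)

daily_backup_ok = meets_rpo(timedelta(hours=24), rpo_target)
five_minute_mirror_ok = meets_rpo(timedelta(minutes=5), rpo_target)

print(daily_backup_ok)        # False: a daily backup cannot meet a 15-minute RPO
print(five_minute_mirror_ok)  # True
```

The check runs from the target to the schedule: the RPO is set as a business requirement first, and the backup regime is then judged against it.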
A recovery that is not instantaneous restores transactional data over some interval without incurring significant risks or losses. RPO measures the maximum time span over which recent data might have been permanently lost; it is not a direct measure of the quantity of data lost. For instance, if the BC plan is to restore up to the last available backup, then the RPO is the interval between such backups. RPO is not determined by the existing backup regime. Instead

Data synchronization points
A data synchronization point is the point in time at which a backup is completed. Update processing is halted while a disk-to-disk copy is made. The backup copy reflects the time of that copy operation, not the later time at which the data is copied to tape or transmitted elsewhere.

System design
The RTO and the RPO must be balanced, taking business risk into account, along with other system design criteria. RPO is tied to the times at which backups are secured offsite. Sending synchronous copies to an offsite mirror allows for most unforeseen events. The use of physical transportation for tapes (or other transportable media) is common. Recovery can be activated at a predetermined site. Shared offsite space and hardware complete the package. For high volumes of high-value transaction data, hardware can be split across multiple sites.

History
Planning for disaster recovery and information technology (IT) developed in the mid to late 1970s as computer center managers began to recognize the dependence of their organizations on their computer systems. At that time, most systems were batch-oriented

Classification
Disasters can be the result of three broad categories of threats and hazards.
* Natural hazards include acts of nature such as floods, hurricanes, tornadoes, earthquakes, and epidemics.
* Technological hazards include accidents or the failures of systems and structures such as pipeline explosions, transportation accidents, utility disruptions, dam failures, and accidental hazardous material releases.
* Human-caused threats include intentional acts such as active assailant attacks, chemical or biological attacks, cyber attacks against data or infrastructure, sabotage, and war.
Preparedness measures for all categories and types of disasters fall into the five mission areas of prevention, protection, mitigation, response, and recovery.

Planning
Research supports the idea that implementing a more holistic pre-disaster planning approach is more cost-effective. Every $1 spent on hazard mitigation (such as a disaster recovery plan) saves society $4 in response and recovery costs. 2015 disaster recovery statistics suggest that downtime lasting for one hour can cost
* small companies $8,000,
* mid-size organizations $74,000, and
* large enterprises $700,000 or more.
As IT systems have become increasingly critical to the smooth operation of a company, and arguably the economy as a whole, the importance of ensuring the continued operation of those systems, and their rapid recovery, has increased.

Control measures
Control measures are steps or mechanisms that can reduce or eliminate threats. The choice of mechanisms is reflected in a disaster recovery plan (DRP). Control measures can be classified as controls aimed at preventing an event from occurring, controls aimed at detecting or discovering unwanted events, and controls aimed at correcting or restoring the system after a disaster or an event. These controls are documented and exercised regularly using so-called "DR tests".

Strategies
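To make the cost comparison used in choosing a recovery strategy concrete, the following minimal sketch weighs the yearly cost of a strategy against the downtime it is expected to leave. The hourly downtime costs are the 2015 figures quoted under Planning; the strategy names, their annual costs, and their expected downtime hours are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    annual_cost: float              # hypothetical yearly cost of running the strategy
    expected_downtime_hours: float  # hypothetical downtime per year that remains


def total_annual_cost(strategy: Strategy, hourly_downtime_cost: float) -> float:
    """Cost of the strategy plus the cost of the downtime it does not prevent."""
    return strategy.annual_cost + strategy.expected_downtime_hours * hourly_downtime_cost


def cheapest(strategies: list[Strategy], hourly_downtime_cost: float) -> Strategy:
    """Pick the strategy with the lowest combined yearly cost."""
    return min(strategies, key=lambda s: total_annual_cost(s, hourly_downtime_cost))


# Illustrative candidates only; real figures come from the business impact data.
CANDIDATES = [
    Strategy("tape off-site", annual_cost=5_000, expected_downtime_hours=8),
    Strategy("off-site replication", annual_cost=100_000, expected_downtime_hours=1),
]

small_company_choice = cheapest(CANDIDATES, hourly_downtime_cost=8_000)
mid_size_choice = cheapest(CANDIDATES, hourly_downtime_cost=74_000)

print(small_company_choice.name)  # tape off-site
print(mid_size_choice.name)       # off-site replication
```

Under these assumed numbers, the cheaper strategy wins when an hour of downtime costs $8,000, while the more expensive replication pays for itself at $74,000 per hour.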
The disaster recovery strategy derives from the business continuity plan. Metrics for business processes are then mapped to systems and infrastructure. A cost-benefit analysis highlights which disaster recovery measures are appropriate. Different strategies make sense based on the cost of downtime compared to the cost of implementing a particular strategy. Common strategies include:
* backups made to tape and sent off-site
* backups made to disk on-site (and copied to off-site disk) or made directly to off-site disk
* replication of data off-site, so that only the systems need to be restored or synchronized, possibly via storage area network technology
* private cloud solutions that replicate metadata (VMs, templates and disks) into the private cloud. Metadata are configured as an XML representation called Open Virtualization Format, and can be easily restored
* hybrid cloud solutions that replicate both on-site and to off-site data centers. This provides instant fail-over to on-site hardware or to cloud data centers.
* high availability systems which keep both the data and system replicated off-site, enabling continuous access to systems and data, even after a disaster (often associated with

Disaster recovery as a service