
Given organizations' increasing dependency on information technology (IT) to run their operations, business continuity planning (and its subset, IT service continuity planning) covers the entire organization, while disaster recovery focuses on IT. Auditing documents covering an organization's business continuity and disaster recovery (BCDR) plans provides third-party validation to stakeholders that the documentation is complete and does not contain material misrepresentations.


Overview

Although often used together, the terms business continuity (BC) and disaster recovery (DR) are distinct. BC refers to the ability of a business to continue critical functions and business processes after a disaster, whereas DR, a subset of BC, refers specifically to the IT functions of the business.


Metrics

The primary objective is to protect the organization in the event that all or part of its operations and/or computer services are rendered partially or completely unusable.


DR metrics

Minimizing downtime and data loss during disaster recovery is typically measured in terms of two key concepts:
* Recovery time objective (RTO): the time until a system is completely up and running
* Recovery point objective (RPO): a measure of the ability to recover files, specified as the point in time to which the backup copy will restore
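As an illustration of how these two measures can be compared against their targets after an exercise or incident, the following is a minimal sketch. The function names, timestamps, and target values are hypothetical and not taken from any standard; it assumes downtime is measured from the start of the outage to full restoration, and the data-loss window from the last usable backup to the start of the outage.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup, outage_start, service_restored):
    """Return (downtime, data_loss_window) for a single incident."""
    downtime = service_restored - outage_start    # compared against the RTO
    data_loss_window = outage_start - last_backup  # compared against the RPO
    return downtime, data_loss_window


def meets_objectives(downtime, data_loss_window, rto, rpo):
    """Return (rto_met, rpo_met) as booleans."""
    return downtime <= rto, data_loss_window <= rpo


# Hypothetical incident: nightly backup at 02:00, outage at 09:30, restored at 12:30.
downtime, data_loss_window = recovery_metrics(
    last_backup=datetime(2024, 1, 10, 2, 0),
    outage_start=datetime(2024, 1, 10, 9, 30),
    service_restored=datetime(2024, 1, 10, 12, 30),
)

# Hypothetical targets: RTO of 4 hours, RPO of 6 hours.
rto_met, rpo_met = meets_objectives(
    downtime, data_loss_window, rto=timedelta(hours=4), rpo=timedelta(hours=6)
)
print(f"downtime={downtime}, RTO met: {rto_met}")
print(f"data loss window={data_loss_window}, RPO met: {rpo_met}")
```

In this made-up case the system is back within the RTO, but the last backup is older than the RPO allows, which would suggest backing up more frequently.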


The auditor's role

An auditor examines and assesses whether:
* the procedures stated in the BCP and DR plan are actually consistent with real practice;
* a specific individual within the organization, who may be referred to as the disaster recovery officer, the disaster recovery liaison, the DR coordinator, or some other similar title, has the technical skills, training, experience, and abilities to analyze the capabilities of the team members to complete assigned tasks;
* more than one individual is trained and capable of performing a particular function during the DR exercise.
Tests and inquiries of personnel can help achieve this objective.
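The last check above, that more than one person can perform each function, lends itself to a simple tabulation. The sketch below is only illustrative: the function name `single_points_of_failure`, the recovery functions, and the staff names are all hypothetical, and a real audit would draw this data from training records and personnel interviews.

```python
def single_points_of_failure(trained_staff):
    """Return the recovery functions that fewer than two distinct people can perform."""
    return sorted(
        function
        for function, people in trained_staff.items()
        if len(set(people)) < 2
    )


# Hypothetical mapping of DR functions to the people trained to perform them.
trained_staff = {
    "restore database": ["Avery", "Blake"],
    "fail over network": ["Casey"],
    "notify vendors": [],
}

gaps = single_points_of_failure(trained_staff)
print(gaps)
```

Any function reported here depends on a single person (or on nobody) and is a candidate for cross-training.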


Documentation


Disaster recovery plan

A disaster recovery plan (DRP) is a documented process or set of procedures to execute an organization's disaster recovery processes and to recover and protect a business IT infrastructure in the event of a disaster. It is "a comprehensive statement of consistent actions to be taken before, during and after a disaster". The disaster could be natural, environmental or man-made. Man-made disasters could be intentional (for example, an act of a terrorist) or unintentional (that is, accidental, such as the breakage of a man-made dam or even "fat fingers", meaning errant commands entered on a computer system).


Types of plans

Although there is no one-size-fits-all plan, there are three basic strategies:
# prevention, including proper backups, surge protectors and generators
# detection, a byproduct of routine inspections, which may discover new (potential) threats
# correction
The last of these may include securing proper insurance policies and holding a "lessons learned" brainstorming session.


Best practices

DRPs are most effective when updated frequently, and should:
* be an integral part of all business analysis processes;
* be revisited at every major corporate acquisition, every new product launch and every new system development milestone;
* be thoroughly tested, rather than left as unpracticed bureaucratic documentation.
Adequate records need to be retained by the organization. The auditor examines records, billings, and contracts to verify that records are being kept. One such record is a current list of the organization's hardware and software vendors. This list is made and periodically updated to reflect changing business practices, as part of an IT asset management system. Copies of it are stored on and off site and are made available or accessible to those who require them. An auditor tests the procedures used to meet this objective and determines their effectiveness.


Relationship to BCPs

Disaster recovery is a subset of business continuity. Whereas the DRP encompasses the policies, tools and procedures to enable recovery of data following a catastrophic event, the BCP involves keeping all aspects of a business functioning regardless of potential disruptive events. As such, a business continuity plan is a comprehensive organizational strategy that includes the DRP as well as threat prevention, detection, recovery, and resumption of operations should a data breach or other disaster event occur. Therefore, the BCP consists of five component plans:
* Business resumption plan
* Occupant emergency plan
* Continuity of operations plan
* Incident management plan
* Disaster recovery plan
The first three components (business resumption, occupant emergency, and continuity of operations plans) do not deal with the IT infrastructure. The incident management plan (IMP) does deal with the IT infrastructure, but since it establishes structure and procedures to address cyber attacks against an organization's IT systems, it generally does not represent an agent for activating the DRP; thus the DRP is the only BCP component of active interest to IT.


Testing

Tests are broadly categorized as functional or discussion-based. Types of tests include tabletop exercises, checklists, simulations, parallel processing (testing the recovery site while the primary site is in operation), and full interruption (failover) tests. These apply to both BC and DR.


Benefits

As with any insurance plan, proper business continuity planning offers benefits, including:
* Minimizing the risk of delays
* Guaranteeing the reliability of standby systems (even automating failure detection and recovery in certain scenarios)
* Providing a standard for testing the plan
* Minimizing decision-making during a disaster
* Reducing potential legal liabilities
* Reducing unnecessary stress in the work environment
Studies have shown a correlation between higher spending on auditing fees and lower rates of incidents.


Planning and testing methodology

According to Geoffrey H. Wold of the Disaster Recovery Journal, the entire process involved in developing a disaster recovery plan consists of 10 steps, including:
* Performing a risk assessment: The planning committee prepares a risk analysis and a business impact analysis (BIA) that includes a range of possible disasters. Each functional area of the organization is analyzed to determine potential consequences. Traditionally, fire has posed the greatest threat. A thorough plan provides for "worst case" situations, such as destruction of the main building.
* Establishing priorities for processing and operations: Critical needs of each department are evaluated and prioritized. Written agreements for the alternatives selected are prepared, with details specifying duration, termination conditions, system testing, cost, any special security procedures, the procedure for notification of system changes, hours of operation, the specific hardware and other equipment required for processing, personnel requirements, a definition of the circumstances constituting an emergency, the process to negotiate service extensions, guarantee of compatibility, availability, non-mainframe resource requirements, priorities, and other contractual issues.
* Collecting data: This includes various lists (employee backup position listing, critical telephone numbers list, master call list, master vendor list, notification checklist), inventories (communications equipment, documentation, office equipment, forms, insurance policies, workgroup and data center computer hardware, microcomputer hardware and software, office supply, off-site storage location equipment, telephones, etc.), the distribution register, software and data file backup/retention schedules, temporary location specifications, and any other such lists, materials, inventories, and documentation. Pre-formatted forms are often used to facilitate the data gathering process.
* Organizing and documenting a written plan
* Developing testing criteria and procedures: Reasons for testing include
** Determining the feasibility and compatibility of backup facilities and procedures.
** Identifying areas in the plan that need modification.
** Providing training to the team managers and team members.
** Demonstrating the ability of the organization to recover.
** Providing motivation for maintaining and updating the disaster recovery plan.
* Testing the plan: An initial "dry run" of the plan is performed by conducting a structured walk-through test. An actual test run must then be performed, and problems are corrected. Initial testing of the plan is done in sections and after normal business hours to minimize disruptions. Subsequent tests occur during normal business hours.


Caveats/controversies

Due to high cost, various plans are not without critics. Dell has identified five "common mistakes" organizations often make related to BCP/DR planning, including:
* Lack of buy-in: When executive management sees DR planning as "just another fake earthquake drill" or CEOs fail to make DR planning and preparation a priority
* Incomplete RTOs and RPOs: Failure to include each and every important business process or block of data. Ripples can extend a disaster's impact. Payroll may not initially be mission-critical, but left alone for several days, it can become more important than any of your initial problems.
* Systems myopia: A third point of failure involves focusing only on DR without considering the larger business continuity needs. Corporate office space lost to a disaster can result in an instant pool of teleworkers which, in turn, can overload a company's VPN overnight, overwork the IT support staff in the blink of an eye and cause serious bottlenecks and monopolies with the dial-in PBX system.
* Lax security: When there is a disaster, an organization's data and business processes become vulnerable. As such, security can be more important than the raw speed involved in a disaster recovery plan's RTO. The most critical consideration then becomes securing the new data pipelines: from new VPNs to the connection from offsite backup services.
** Planning for post-mortem forensics in disasters
** Locking down or remotely wiping lost handheld devices
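The "incomplete RTOs and RPOs" mistake listed above can be screened for mechanically once business processes are inventoried. The following is a minimal sketch under that assumption; the `ProcessObjectives` record, the `missing_objectives` helper, and the sample processes are hypothetical and serve only to show the idea of flagging processes that lack a defined objective.

```python
from typing import NamedTuple, Optional


class ProcessObjectives(NamedTuple):
    """Hypothetical inventory record for one business process."""
    name: str
    rto_hours: Optional[float]  # None means no RTO has been defined
    rpo_hours: Optional[float]  # None means no RPO has been defined


def missing_objectives(processes):
    """Map each process name to the list of objectives it is missing."""
    gaps = {}
    for process in processes:
        missing = [
            label
            for label, value in (("RTO", process.rto_hours), ("RPO", process.rpo_hours))
            if value is None
        ]
        if missing:
            gaps[process.name] = missing
    return gaps


# Hypothetical inventory: payroll was never assigned objectives.
inventory = [
    ProcessObjectives("order processing", rto_hours=4, rpo_hours=1),
    ProcessObjectives("payroll", rto_hours=None, rpo_hours=None),
    ProcessObjectives("email", rto_hours=8, rpo_hours=None),
]

gaps = missing_objectives(inventory)
print(gaps)
```

A check like this only finds processes that are in the inventory without objectives; it cannot detect a process that was never inventoried in the first place.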


Decisions and strategies

Site designation: The choice of a backup site. A hot site is fully equipped to resume operations, while a cold site does not have that capability. A warm site has the capability to resume some, but not all, operations. A cost-benefit analysis is needed.
* Occasional tests and trials verify the viability and effectiveness of the plan. An auditor looks into the probability that operations of the organization can be sustained at the level that is assumed in the plan, and the ability of the entity to actually establish operations at the site.
* The auditor can verify this through paper and paperless documentation and actual physical observation. The security of the storage site is also confirmed.

Data backup: An audit of backup processes determines whether (a) they are effective and (b) they are actually being implemented by the involved personnel (Berman, Alan. "Constructing a Successful Business Continuity Plan". ''Business Insurance Magazine'', March 9, 2015. http://www.businessinsurance.com/article/20150309/ISSUE0401/303159991/constructing-a-successful-business-continuity-plan). The disaster recovery plan also includes information on how best to recover any data that has not been copied. Controls and protections are put in place to ensure that data is not damaged, altered, or destroyed during this process.

Drills: Practice drills are conducted periodically to determine how effective the plan is and what changes may be necessary. The auditor's primary concern here is verifying that these drills are being conducted properly and that problems uncovered during these drills are addressed.

Backup of key personnel: This includes periodic training, cross-training, and personnel redundancy.
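One way the effectiveness of data backups described above might be spot-checked is by comparing checksums of source files against their backup copies. The sketch below is an assumption about how such a check could look, not a procedure taken from the cited sources: the helper names `sha256_of` and `verify_backup` are hypothetical, and the comparison is only meaningful when the source files have not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return {relative path: "missing" | "mismatch"} for files the backup fails to match."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = {}
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems[str(relative)] = "missing"
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems[str(relative)] = "mismatch"
    return problems
```

An empty result from `verify_backup` indicates that every source file has a byte-identical copy in the backup directory; it says nothing about whether the backup can actually be restored within the RTO, which still requires a restore test.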


Other considerations


Insurance issues

The auditor determines the adequacy of the company's insurance coverage (particularly property and casualty insurance) through a review of the company's insurance policies and other research. Among the items that the auditor needs to verify are: the scope of the policy (including any stated exclusions), that the amount of coverage is sufficient to cover the organization's needs, and that the policy is current and in force. The auditor also ascertains, through a review of the ratings assigned by independent rating agencies, that the insurance company or companies providing the coverage have the financial viability to cover the losses in the event of a disaster. Effective DR plans take into account the extent of a company's responsibilities to other entities and its ability to fulfill those commitments despite a major disaster. A good DR audit will include a review of existing MOAs and contracts to ensure that the organization's legal liability for lack of performance in the event of disaster or any other unusual circumstance is minimized. Agreements pertaining to establishing support and assisting with recovery for the entity are also outlined. Techniques used for evaluating this area include an examination of the reasonableness of the plan, a determination of whether or not the plan takes all factors into account, and a verification of the reasonableness of the contracts and agreements through documentation and outside research.


Communication issues

The auditor must verify that planning ensures that both management and the recovery team have effective communication hardware and contact information for both internal communication and external issues, such as business partners and key customers. Audit techniques include:
* testing of procedures, interviewing employees, and making comparisons against the plans of other companies and against industry standards;
* examining company manuals and other written procedures;
* direct observation that emergency telephone numbers are listed and easily accessible in the event of a disaster.


Emergency procedures

Procedures to sustain staff during a round-the-clock disaster recovery effort are included in any good disaster recovery plan. Procedures for the stocking of food and water, capabilities of administering CPR/first aid, and dealing with family emergencies are clearly written and tested. This can generally be accomplished by the company through good training programs and a clear definition of job responsibilities. A review of the readiness capacity of a plan often includes tasks such as inquiries of personnel, direct physical observation, and examination of training records and any certifications.


Environmental issues

The auditor must review procedures that take into account the possibility of power failures or other situations of a non-IT nature.
* Flashlights and candles may be needed.
* Safety procedures in case of gas leaks, fires or other such phenomena, as well as PPE, may be needed.


See also

* Backup rotation scheme
* Comparison of backup software
* Comparison of online backup services
* Information technology audit
* Vulnerability (computing)

