In information technology, a backup, or data backup, is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup".
Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of IT disaster recovery; however, not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, Active Directory server, or database server.
A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large. An information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that is already in secondary storage onto archive files. [In contrast to everyday use of the term "archive", the data stored in an "archive file" is not necessarily old or of historical interest.] There are also different ways these devices can be arranged to provide geographic dispersion, data security, and portability.
Data is selected, extracted, and manipulated for storage. The process can include methods for dealing with live data, including open files, as well as compression, encryption, and de-duplication. Additional techniques apply to enterprise client-server backup. Backup schemes may include dry runs that validate the reliability of the data being backed up. There are limitations and human factors involved in any backup scheme.
Storage
A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or relational database.
3-2-1 Backup Rule
The backup data needs to be stored, requiring a backup rotation scheme, which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it.
The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include cloud storage). Two or more different media should be used so that a single kind of failure does not destroy every copy (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to head crashes or damaged spindle motors since they do not have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Physically protected hard drives are an alternative to an offsite copy, but they have limitations like only being able to resist fire for a limited period of time, so an offsite copy remains the ideal choice.
Because there is no perfect storage, many backup experts recommend maintaining a second copy on a local physical device, even if the data is also backed up offsite.
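The 3-2-1 rule lends itself to a simple mechanical check. The following is a minimal sketch, not a standard tool: the `DataCopy` record and the medium labels are hypothetical, and it assumes the working copy counts as one of the three copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One stored copy of a data set (hypothetical record, for illustration)."""
    label: str
    medium: str      # assumed labels such as "hdd", "lto", "optical", "cloud"
    offsite: bool


def satisfies_3_2_1(copies):
    """True if there are >= 3 copies, on >= 2 media types, with >= 1 offsite."""
    media_types = {c.medium for c in copies}
    return (
        len(copies) >= 3
        and len(media_types) >= 2
        and any(c.offsite for c in copies)
    )


copies = [
    DataCopy("working copy", "hdd", offsite=False),
    DataCopy("nightly tape", "lto", offsite=False),
    DataCopy("cloud replica", "cloud", offsite=True),
]
print(satisfies_3_2_1(copies))
```

A check like this says nothing about whether the copies are restorable; it only verifies the counting part of the rule.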
Backup methods
Unstructured
An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation.
Full only/System imaging
A repository using this backup method contains complete source data copies taken at one or more specific points in time. Because it copies system images, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems.
Incremental
An incremental backup stores data changed since a reference point in time. Duplicate copies of unchanged data are not copied. Typically a full backup of all files is made once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with the last full backup and then apply the incrementals.
Some backup systems can create a synthetic full backup from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files.
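The restore order described above (last full backup first, then each incremental in sequence) can be sketched as follows. This is an illustrative model only: each backup is assumed to be a dictionary mapping a path to its content, and a value of `None` is an assumed convention for recording a deletion.

```python
def restore(full, incrementals):
    """Rebuild the latest state from a full backup plus ordered incrementals.

    Each backup is modelled as a dict mapping path -> content. In an
    incremental, a value of None records that the file was deleted.
    """
    state = dict(full)
    for inc in incrementals:              # oldest incremental first
        for path, content in inc.items():
            if content is None:
                state.pop(path, None)     # deletion recorded in this incremental
            else:
                state[path] = content     # new or changed file
    return state


full = {"a.txt": "v1", "b.txt": "v1"}
incs = [{"a.txt": "v2"}, {"b.txt": None, "c.txt": "v1"}]
synthetic_full = restore(full, incs)
print(synthetic_full)
```

Saving the merged result as a new reference point is, in this simplified model, what building a full backup out of a chain of incrementals amounts to.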
Near-CDP
Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection.
Near-CDP backup applications, often marketed as "CDP", automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of the data frozen at a particular point in time.
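Restoring "to an interval boundary" means choosing the newest snapshot taken at or before the requested time. A minimal sketch, assuming snapshot times are kept in a sorted list (the function name and the minutes-since-midnight representation are illustrative only):

```python
from bisect import bisect_right


def latest_snapshot_at_or_before(snapshot_times, target):
    """Return the newest snapshot time <= target, or None if none exists.

    snapshot_times must be sorted in ascending order.
    """
    i = bisect_right(snapshot_times, target)
    return snapshot_times[i - 1] if i else None


# Snapshots every 15 minutes, expressed as minutes since midnight.
snapshots = list(range(0, 120, 15))       # 0, 15, 30, ..., 105
print(latest_snapshot_at_or_before(snapshots, 52))
```

A request for minute 52 can only be served from the snapshot taken at minute 45; changes made between minutes 45 and 52 are not recoverable from snapshots alone.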
Near-CDP (except for Apple Time Machine) intent-logs every change on the host system, often by saving byte or block-level differences rather than file-level differences. This backup method differs from simple disk mirroring in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for the consistency of live data, protecting self-consistent files but requiring applications "be quiesced and made ready for backup."
Near-CDP is more practicable for ordinary personal backup applications, as opposed to true CDP, which must be run in conjunction with a virtual machine or equivalent and is therefore generally used in enterprise client-server backups.
Software may create copies of individual files, such as written documents, multimedia projects, or user preferences, so that failed write events caused by power outages, operating system crashes, or exhausted disk space do not cause data loss. A common implementation is a ".bak" extension appended to the file name.
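A minimal sketch of the ".bak" convention follows. The function name `save_with_backup` is hypothetical, and writing to a temporary file before swapping it into place is an added design choice for this example rather than something the convention itself requires.

```python
import os
import shutil
import tempfile


def save_with_backup(path, new_text):
    """Write new_text to path, keeping the previous version as path + '.bak'."""
    if os.path.exists(path):
        shutil.copy2(path, path + ".bak")     # preserve the old version first
    # Write to a temporary file in the same directory, then swap it in, so a
    # crash mid-write does not leave a half-written file at `path`.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as f:
        f.write(new_text)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
```

If the write fails partway through, the previous contents remain available both at the original path and in the ".bak" copy.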
Reverse incremental
A reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using hard links, as Apple Time Machine does, or using binary diffs.
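The hard-link approach can be illustrated with a simplified snapshot routine: every snapshot directory looks like a full copy, but files unchanged since the previous snapshot are hard links to the data already stored. This is a sketch under stated assumptions, not how any particular product is implemented; the `snapshot` function is hypothetical, it compares file contents rather than metadata, and it needs a filesystem that supports hard links.

```python
import filecmp
import os
import shutil


def snapshot(source_dir, prev_snapshot, new_snapshot):
    """Create new_snapshot from source_dir, hard-linking unchanged files.

    prev_snapshot may be None for the first (full) snapshot.
    """
    for root, _dirs, files in os.walk(source_dir):
        rel = os.path.relpath(root, source_dir)
        dst_dir = os.path.normpath(os.path.join(new_snapshot, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dst_dir, name)
            old = None
            if prev_snapshot is not None:
                old = os.path.normpath(os.path.join(prev_snapshot, rel, name))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)         # unchanged: share the stored data
            else:
                shutil.copy2(src, dst)    # new or changed: store a fresh copy
```

Deleting an old snapshot directory then frees only the data that no newer snapshot still links to.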
Differential
A differential backup saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup.
A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time in the last-modification file attribute, and/or through changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files.
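The difference between the two selection rules comes down to which reference time is used. A minimal sketch, assuming modification times are available as plain numbers in a dictionary (the timeline values are invented for illustration):

```python
def changed_since(mtimes, reference_time):
    """Return sorted paths whose last-modification time is later than reference_time."""
    return sorted(p for p, t in mtimes.items() if t > reference_time)


# Hypothetical timeline: full backup at t=100, a later backup at t=200.
mtimes = {"a.txt": 50, "b.txt": 150, "c.txt": 250}
last_full = 100
last_backup_of_any_type = 200

# Differential: everything changed since the last FULL backup.
differential = changed_since(mtimes, last_full)
# Incremental: everything changed since the most recent backup of any type.
incremental = changed_since(mtimes, last_backup_of_any_type)

print(differential, incremental)
```

Here `b.txt` is copied again by the differential backup even though an earlier backup already captured it, which is why differentials grow over time while incrementals stay small.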
Storage media
Regardless of the repository model that is used, the data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination.
Magnetic tape
Magnetic tape was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data.
Tape is a sequential access medium, so the rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per unit of capacity, tape drives are typically dozens of times as expensive as hard disk drives and optical drives.
Many tape formats have been proprietary or specific to certain markets like mainframes or a particular brand of personal computer. By 2014 LTO had become the primary tape technology. The other remaining viable "super" format is the IBM 3592 (also referred to as the TS11xx series). The Oracle StorageTek T10000 was discontinued in 2016.
Hard disk
The use of hard disk storage has increased over time as it has become progressively cheaper. Hard disks are usually easy to use, widely available, and can be accessed quickly.
However, hard disks are close-tolerance mechanical devices and may be more easily damaged than tapes, especially while being transported. In the mid-2000s, several drive manufacturers began to produce portable drives employing ramp loading and accelerometer technology (sometimes termed a "shock sensor"), and by 2010 the industry average in drop tests for drives with that technology showed drives remaining intact and working after a 36-inch non-operating drop onto industrial carpeting. Some manufacturers also offer 'ruggedized' portable hard drives, which include a shock-absorbing case around the hard disk, and claim a range of higher drop specifications. Over a period of years the stability of hard disk backups is shorter than that of tape backups.
External hard disks can be connected via local interfaces like SCSI, USB, FireWire, or eSATA, or via longer-distance technologies like Ethernet, iSCSI, or Fibre Channel. Some disk-based backup systems, via Virtual Tape Libraries or otherwise, support data deduplication, which can reduce the amount of disk storage capacity consumed by daily and weekly backup data.
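Deduplication can be sketched as a store that keeps each distinct chunk of data only once, keyed by a hash of its content. The `DedupStore` class below is a hypothetical illustration using fixed-size chunks and SHA-256; real systems often use variable-size chunking and differ in many details.

```python
import hashlib


class DedupStore:
    """Toy content-addressed chunk store (illustrative only)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}                  # digest -> chunk bytes, stored once

    def put(self, data):
        """Store data; return the list of chunk digests needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # skip chunks already held
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Reassemble the original data from its chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)


store = DedupStore(chunk_size=4)
monday = store.put(b"AAAABBBBCCCC")
tuesday = store.put(b"AAAABBBBDDDD")      # only the last chunk is new
print(len(store.chunks))
```

Two 12-byte backups that share their first eight bytes occupy four stored chunks rather than six, which is the saving the text describes for repeated daily and weekly backups.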
Optical storage
Optical storage uses lasers to store and retrieve data. Recordable CDs, DVDs, and Blu-ray Discs are commonly used with personal computers and are generally cheap. The capacities and speeds of these discs have typically been lower than those of hard disks or tapes. Advances in optical media may shrink that gap in the future.
Potential future data losses caused by gradual media degradation can be predicted by measuring the rate of correctable minor data errors; too many such errors in succession increase the risk of uncorrectable sectors. Support for error scanning varies among optical drive vendors.
Many optical disc formats are of the WORM (write once, read many) type, which makes them useful for archival purposes since the data cannot be changed in any way, including by user error and by malware such as ransomware. Moreover, optical discs are not vulnerable to head crashes, magnetism, imminent water ingress or power surges, and a fault of the drive typically just halts the spinning.
Optical media is modular: the storage controller is not tied to the media itself, as it is with hard drives or flash storage (see flash memory controller), so a disc can be removed and accessed through a different drive. However, recordable media may degrade earlier under long-term exposure to light.
Some optical storage systems allow for cataloged data backups without human contact with the discs, allowing for longer data integrity. A French study in 2008 indicated that the lifespan of typically-sold CD-Rs was 2–10 years, but one manufacturer later estimated the longevity of its CD-Rs with a gold-sputtered layer to be as high as 100 years. As of 2016, Sony's proprietary Optical Disc Archive could reach a read rate of 250 MB/s.
Solid-state drive
Solid-state drives (SSDs) use integrated circuit assemblies to store data. Flash memory, thumb drives, USB flash drives, CompactFlash, SmartMedia, Memory Sticks, and Secure Digital card devices are relatively expensive for their low capacity, but convenient for backing up relatively low data volumes. A solid-state drive does not contain any movable parts, making it less susceptible to physical damage, and can have huge throughput of around 500 Mbit/s up to 6 Gbit/s. Available SSDs have become more capacious and cheaper. [Flash memory backups are stable for fewer years than hard disk backups.]
Remote backup service
Remote backup services or cloud backups involve service providers storing data offsite. This has been used to protect against events such as fires, floods, or earthquakes which could destroy locally stored backups. Cloud-based backup (through services such as Google Drive and Microsoft OneDrive) provides a layer of data protection. However, users must trust the provider to maintain the privacy and integrity of their data, with confidentiality enhanced by the use of encryption. Because speed and availability are limited by a user's online connection, users with large amounts of data may need to use cloud seeding and large-scale recovery.
Management
Various methods can be used to manage backup media, striking a balance between accessibility, security and cost. These media management methods are not mutually exclusive and are frequently combined to meet the user's needs. Using on-line disks for staging data before it is sent to a near-line tape library is a common example.
Online
Online backup storage is typically the most accessible type of data storage, and can begin a restore in milliseconds. An internal hard disk or a disk array (possibly connected to a SAN) is an example of an online backup. This type of storage is convenient and speedy, but is vulnerable to being deleted or overwritten, either by accident, by malevolent action, or in the wake of a data-deleting virus payload.
Near-line
Nearline storage is typically less accessible and less expensive than online storage, but still useful for backup data storage. A mechanical device is usually used to move media units from storage into a drive where the data can be read or written. Generally it has safety properties similar to on-line storage. An example is a tape library with restore times ranging from seconds to a few minutes.
Off-line
Off-line storage requires some direct action to provide access to the storage media: for example, inserting a tape into a tape drive or plugging in a cable. Because the data is not accessible via any computer except during the limited periods in which it is written or read back, it is largely immune to on-line backup failure modes. Access time varies depending on whether the media are on-site or off-site.
Off-site data protection
Backup media may be sent to an off-site vault to protect against a disaster or other site-specific problem. The vault can be as simple as a system administrator's home office or as sophisticated as a disaster-hardened, temperature-controlled, high-security bunker with facilities for backup media storage. A data replica can be off-site but also on-line (e.g., an off-site RAID mirror).
Backup site
A backup site or disaster recovery center is used to store data that can enable computer systems and networks to be restored and properly configured in the event of a disaster. Some organisations have their own data recovery centres, while others contract this out to a third party. Due to high costs, backing up is rarely considered the preferred method of moving data to a DR site. A more typical way would be remote disk mirroring, which keeps the DR data as up to date as possible.
Selection and extraction of data
A backup operation starts with selecting and extracting coherent units of data. Most data on modern computer systems is stored in discrete units, known as files. These files are organized into filesystems. Deciding what to back up at any given time involves tradeoffs. By backing up too much redundant data, the information repository will fill up too quickly. Backing up an insufficient amount of data can eventually lead to the loss of critical information.
Files
* Copying files: Making copies of files is the simplest and most common way to perform a backup. A means to perform this basic function is included in all backup software and all operating systems.
*Partial file copying: A backup may include only the blocks or bytes within a file that have changed in a given period of time. This can substantially reduce needed storage space, but requires higher sophistication to reconstruct files in a restore situation. Some implementations require integration with the source file system.
*Deleted files: To prevent the unintentional restoration of files that have been intentionally deleted, a record of the deletion must be kept.
*Versioning of files: Most backup applications, other than those that do only full backups or system imaging, also back up files that have been modified since the last backup. "That way, you can retrieve many different versions of a given file, and if you delete it on your hard disk, you can still find it in your archive."
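The partial file copying idea above can be sketched as follows. This is a minimal illustration, assuming fixed-size blocks and file contents held in memory as byte strings; the names `BLOCK_SIZE`, `changed_blocks` and `apply_blocks` are hypothetical and do not come from any particular backup product.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny for illustration; real tools use far larger blocks


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into consecutive blocks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block index: new block} for every block that differs from the old version."""
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(old, size)]
    delta = {}
    for i, block in enumerate(split_blocks(new, size)):
        if i >= len(old_hashes) or hashlib.sha256(block).digest() != old_hashes[i]:
            delta[i] = block
    return delta


def apply_blocks(old: bytes, delta: dict[int, bytes], new_length: int,
                 size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new version from the old version plus the stored changed blocks."""
    blocks = split_blocks(old, size)
    for i, block in sorted(delta.items()):
        if i < len(blocks):
            blocks[i] = block
        else:
            blocks.append(block)
    # Truncate in case the new version is shorter than the old one.
    return b"".join(blocks)[:new_length]
```

Only the changed blocks and the new length need to be stored, which is where the space saving comes from; the restore step (`apply_blocks`) is the added sophistication the text refers to.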
Filesystems
*Filesystem dump: A block-level copy of the whole filesystem can be made. This is also known as a "raw partition backup" and is related to disk imaging. The process usually involves unmounting the filesystem and running a program like dd. Because the disk is read sequentially and with large buffers, this type of backup can be faster than reading every file normally, especially when the filesystem contains many small files, is highly fragmented, or is nearly full. But because this method also reads the free disk blocks that contain no useful data, it can also be slower than conventional reading, especially when the filesystem is nearly empty. Some filesystems, such as XFS, provide a "dump" utility that reads the disk sequentially for high performance while skipping unused sections. The corresponding restore utility can selectively restore individual files or the entire volume at the operator's choice.
*Identification of changes: Some filesystems have an archive bit for each file that indicates it was recently changed. Some backup software looks at the date of the file and compares it with the last backup to determine whether the file was changed.
* Versioning file system: A versioning filesystem tracks all changes to a file. The NILFS versioning filesystem for Linux is an example.
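The date-comparison approach to identifying changes can be sketched as below. This is a minimal example, assuming that modification times are trustworthy and that the time of the last backup is recorded as seconds since the epoch; `scan_mtimes` and `changed_since` are hypothetical helper names.

```python
import os


def scan_mtimes(root: str) -> dict[str, float]:
    """Map each file path under root to its last-modification time (seconds since epoch)."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[path] = os.stat(path).st_mtime
    return mtimes


def changed_since(mtimes: dict[str, float], last_backup_time: float) -> list[str]:
    """Return, in sorted order, the paths modified after the last backup."""
    return sorted(path for path, mtime in mtimes.items() if mtime > last_backup_time)
```

A caller would pass the result of `scan_mtimes` for the directory being protected to `changed_since`. Timestamp comparison of this kind can miss files whose clocks or modification times have been altered, which is one reason some systems rely on an archive bit or a versioning filesystem instead.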
Live data
Files that are actively being updated present a challenge to back up. One way to back up live data is to temporarily quiesce them (e.g., close all files), take a "snapshot", and then resume live operations. At this point the snapshot can be backed up through normal methods. A snapshot is an instantaneous function of some filesystems that presents a copy of the filesystem as if it were frozen at a specific point in time, often by a copy-on-write mechanism. Snapshotting a file while it is being changed results in a corrupted file that is unusable. This is also the case across interrelated files, as may be found in a conventional database or in applications such as Microsoft Exchange Server. The term fuzzy backup can be used to describe a backup of live data that looks like it ran correctly, but does not represent the state of the data at a single point in time.
Backup options for data files that cannot be or are not quiesced include:
*Open file backup: Many backup software applications undertake to back up open files in an internally consistent state. Some applications simply check whether open files are in use and try again later. Other applications exclude open files that are updated very frequently. Some low-availability interactive applications can be backed up via natural/induced pausing.
*Interrelated database files backup: Some interrelated database file systems offer a means to generate a "hot backup" of the database while it is online and usable. This may include a snapshot of the data files plus a snapshotted log of changes made while the backup is running. Upon a restore, the changes in the log files are applied to bring the copy of the database up to the point in time at which the initial backup ended. Other low-availability interactive applications can be backed up via coordinated snapshots. However, genuinely high-availability interactive applications can only be backed up via Continuous Data Protection.
Metadata
Not all information stored on the computer is stored in files. Accurately recovering a complete system from scratch requires keeping track of this non-file data too.
*System description: System specifications are needed to procure an exact replacement after a disaster.
*Boot sector: It can sometimes be easier to recreate the boot sector than to save it. It usually isn't a normal file and the system won't boot without it.
* Partition layout: The layout of the original disk, as well as partition tables and filesystem settings, is needed to properly recreate the original system.
*File metadata: Each file's permissions, owner, group, ACLs, and any other metadata need to be backed up for a restore to properly recreate the original environment.
*System metadata: Different operating systems have different ways of storing configuration information. Microsoft Windows keeps a registry of system information that is more difficult to restore than a typical file.
Manipulation of data and dataset optimization
It is frequently useful or required to manipulate the data being backed up to optimize the backup process. These manipulations can improve backup speed, restore speed, data security, and media usage, and can reduce bandwidth requirements.
Automated data grooming
Out-of-date data can be automatically deleted. In enterprise client-server backup applications, this automated data "grooming" can be customized; in personal backup applications, the deletion can at most be globally delayed or disabled. (Some backup applications, notably rsync and CrashPlan, term removing backup data "pruning" instead of "grooming".)
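As a rough illustration of automated grooming, the sketch below separates backups that are still inside a retention window from those that have aged out. It assumes a single age-based rule; the function name `split_by_retention` and the 14-day window in the usage note are hypothetical, and real products typically offer richer policies.

```python
from datetime import datetime, timedelta


def split_by_retention(backup_times: list[datetime], now: datetime,
                       retention: timedelta) -> tuple[list[datetime], list[datetime]]:
    """Return (keep, expired): backups inside the retention window and those older than it."""
    cutoff = now - retention
    keep = sorted(t for t in backup_times if t >= cutoff)
    expired = sorted(t for t in backup_times if t < cutoff)
    return keep, expired
```

For example, with backups taken on 1, 20 and 30 January 2024 and a 14-day window evaluated on 31 January, only the 1 January backup falls in the `expired` list and becomes a candidate for deletion.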
Compression
Various schemes can be employed to shrink the size of the source data to be stored so that it uses less storage space. Compression is frequently a built-in feature of tape drive hardware.
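A minimal sketch of software compression applied to backup data, using the DEFLATE implementation in Python's standard `zlib` module as one example scheme. The function names are hypothetical, and the choice of `zlib` is an assumption made for illustration; the text above does not prescribe a particular algorithm.

```python
import zlib


def compress_for_backup(data: bytes, level: int = 6) -> bytes:
    """Compress source data before it is written to backup media."""
    return zlib.compress(data, level)


def restore_from_backup(blob: bytes) -> bytes:
    """Recover the original bytes from a compressed backup blob."""
    return zlib.decompress(blob)
```

How much space is saved depends on the data: highly repetitive content shrinks substantially, while data that is already compressed or encrypted may not shrink at all.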
Deduplication
Redundancy due to backing up similarly configured workstations can be reduced, thus storing just one copy. This technique can be applied at the file or raw block level. This potentially large reduction is called deduplication. It can occur on a server before any data moves to backup media, sometimes referred to as source/client side deduplication. This approach also reduces bandwidth required to send backup data to its target media. The process can also occur at the target storage device, sometimes referred to as inline or back-end deduplication.
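Block-level deduplication can be sketched with a small content-addressed store, shown below. This is an in-memory illustration only, assuming fixed-size blocks identified by their SHA-256 digest; the class `DedupStore` and its methods are hypothetical names.

```python
import hashlib


class DedupStore:
    """Stores each distinct block once, keyed by its SHA-256 digest."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}          # digest -> block contents
        self.manifests: dict[str, list[str]] = {}   # backup name -> ordered digests

    def put(self, name: str, data: bytes) -> None:
        """Record a backup, storing only blocks that are not already present."""
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.manifests[name] = digests

    def get(self, name: str) -> bytes:
        """Reassemble a backup from its manifest of block digests."""
        return b"".join(self.blocks[d] for d in self.manifests[name])

    def stored_bytes(self) -> int:
        """Total size of the unique blocks actually kept."""
        return sum(len(block) for block in self.blocks.values())
```

If two similarly configured workstations are stored with `put`, the blocks they share are kept once while each backup can still be restored in full with `get`. Computing the digests on the client before sending data corresponds to source-side deduplication; doing so at the storage device corresponds to the target-side variant.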
Duplication
Sometimes backups are duplicated to a second set of storage media. This can be done to rearrange the archive files to optimize restore speed, or to have a second copy at a different location or on a different storage medium—as in the disk-to-disk-to-tape capability of Enterprise client-server backup.
Encryption
High-capacity removable storage media such as backup tapes present a data security risk if they are lost or stolen. Encrypting the data on these media can mitigate this problem; however, encryption is a CPU-intensive process that can slow down backup speeds, and the security of the encrypted backups is only as effective as the security of the key management policy.
Multiplexing
When there are many more computers to be backed up than there are destination storage devices, the ability to use a single storage device with several simultaneous backups can be useful. However, cramming backups into the scheduled backup window via "multiplexed backup" is only used for tape destinations.
Refactoring
The process of rearranging the sets of backups in an archive file is known as refactoring. For example, if a backup system uses a single tape each day to store the incremental backups for all the protected computers, restoring one of the computers could require many tapes. Refactoring could be used to consolidate all the backups for a single computer onto a single tape, creating a "synthetic full backup". This is especially useful for backup systems that do "incrementals forever" style backups.
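The consolidation step can be illustrated with a small sketch that merges one full backup with a series of incrementals. It assumes each backup is modelled as a mapping from file path to contents and that a deleted file is recorded in an incremental as `None`; this representation and the name `synthesize_full` are hypothetical simplifications.

```python
from typing import Optional


def synthesize_full(full: dict[str, bytes],
                    incrementals: list[dict[str, Optional[bytes]]]) -> dict[str, bytes]:
    """Merge a full backup with incrementals (oldest first) into a synthetic full backup."""
    result = dict(full)  # copy so the original full backup is left untouched
    for incremental in incrementals:
        for path, content in incremental.items():
            if content is None:
                result.pop(path, None)   # deletion record: drop the file
            else:
                result[path] = content   # new or modified file: latest version wins
    return result
```

The result holds the latest version of every surviving file, so a restore needs only this one set rather than the full backup plus every incremental.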
Staging
Sometimes backups are copied to a staging disk before being copied to tape. This process is sometimes referred to as D2D2T, an acronym for Disk-to-disk-to-tape. It can be useful if there is a problem matching the speed of the final destination device with the source device, as is frequently faced in network-based backup systems. It can also serve as a centralized location for applying other data manipulation techniques.
Objectives
* Recovery point objective (RPO): The point in time that the restarted infrastructure will reflect, expressed as "the maximum targeted period in which data (transactions) might be lost from an IT service due to a major incident". Essentially, this is the roll-back that will be experienced as a result of the recovery. The most desirable RPO would be the point just prior to the data loss event. Making a more recent recovery point achievable requires increasing the frequency of synchronization between the source data and the backup repository.
*Recovery time objective (RTO): The amount of time elapsed between disaster and restoration of business functions.
*Data security: In addition to preserving access to data for its owners, data must be restricted from unauthorized access. Backups must be performed in a manner that does not compromise the original owner's undertaking. This can be achieved with data encryption and proper media handling policies.
*Data retention period: Regulations and policy can lead to situations where backups are expected to be retained for a particular period, but not any further. Retaining backups after this period can lead to unwanted liability and sub-optimal use of storage media.
*Checksum or hash function validation: Applications that back up to tape archive files need this option to verify that the data was accurately copied.
* Backup process monitoring: Enterprise client-server backup applications need a user interface that allows administrators to monitor the backup process and that proves compliance to regulatory bodies outside the organization; for example, an insurance company in the USA might be required under HIPAA to demonstrate that its client data meet records retention requirements (HIPAA Advisory, retrieved 10 March 2007).
*User-initiated backups and restores: To avoid or recover from minor disasters, such as inadvertently deleting or overwriting the "good" versions of one or more files, the computer user, rather than an administrator, may initiate backups and restores (from not necessarily the most-recent backup) of files or folders.
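The checksum or hash validation objective listed above can be sketched as follows. This is a minimal example, assuming SHA-256 as the hash function and data read in chunks; `read_chunks`, `sha256_of_chunks` and `verify_copy` are hypothetical helper names.

```python
import hashlib
from collections.abc import Iterable, Iterator


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in chunks so large files need not fit in memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal SHA-256 digest of the concatenated chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_chunks: Iterable[bytes], backup_chunks: Iterable[bytes]) -> bool:
    """Report whether the backup copy hashes to the same value as the source."""
    return sha256_of_chunks(source_chunks) == sha256_of_chunks(backup_chunks)
```

A backup job could call `verify_copy(read_chunks(source_path), read_chunks(backup_path))` after writing, or store the source digest alongside the backup and recompute it later to detect degradation of the media.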
See also
About backup
* Backup software and services
** List of backup software
** Comparison of online backup services
** Comparison of backup software
* Backup software
* Glossary of backup terms
* Virtual backup appliance
Related topics
* Data consistency
* Data degradation
* Data portability
* Data proliferation
* Database dump
* Digital preservation
* Disaster recovery and business continuity auditing
* World Backup Day