Storage
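The 3-2-1 rule covered in this section can be checked mechanically against an inventory of copies. The following is a minimal sketch, not a standard tool: the `Copy` record, its field names, and the `satisfies_3_2_1` function are hypothetical, and it assumes every copy of the data (including the primary one) is counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    label: str       # hypothetical name for this copy
    media_type: str  # e.g. "hdd", "tape", "optical"
    offsite: bool    # True if the copy is kept in a remote location


def satisfies_3_2_1(copies):
    """Return True if the inventory meets the 3-2-1 rule."""
    enough_copies = len(copies) >= 3                          # at least 3 copies
    enough_media = len({c.media_type for c in copies}) >= 2   # on 2 media types
    has_offsite = any(c.offsite for c in copies)              # 1 copy offsite
    return enough_copies and enough_media and has_offsite


inventory = [
    Copy("primary", "hdd", offsite=False),
    Copy("local backup", "tape", offsite=False),
    Copy("vault backup", "tape", offsite=True),
]
print(satisfies_3_2_1(inventory))
```

Removing the vault copy from the inventory makes the check fail on two counts: there are then fewer than three copies, and none is offsite.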
A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates they were produced, or could include a computerized index, catalog, or relational database. The backup data needs to be stored, which requires a backup rotation scheme: a system of backing up data to computer media that limits the number of backups of different dates retained separately by overwriting backups that are no longer needed and so re-using the storage media. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and that one copy should be kept offsite, in a remote location.
Backup methods
Unstructured
An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation.
Full only/System imaging
A repository using this backup method contains complete source data copies taken at one or more specific points in time. Computer technicians frequently use this method to copy system images that record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems than as a tool for making ongoing backups of diverse systems.
Incremental
Near-CDP
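The interval-boundary limitation of near-CDP discussed in this section can be illustrated with a short sketch. It assumes snapshot times are plain numbers (minutes since some starting point); the function name `latest_restore_point` is hypothetical and does not belong to any particular backup product.

```python
from bisect import bisect_right


def latest_restore_point(snapshot_times, requested_time):
    """Return the newest snapshot time that is <= requested_time, or None."""
    times = sorted(snapshot_times)
    idx = bisect_right(times, requested_time)  # count of snapshots <= request
    return times[idx - 1] if idx else None


# Snapshots every 15 minutes; a restore requested for minute 37
# can only go back to the boundary at minute 30.
print(latest_restore_point([0, 15, 30, 45], 37))
```

A true CDP system would instead be able to restore the state at minute 37 itself, because it records every change rather than periodic snapshots.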
Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced form of data protection. Near-CDP backup applications, often marketed as "CDP", automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of the data frozen at a particular point in time.
Reverse incremental
A reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. The method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can be done either using hard links, as Apple Time Machine does, or using binary diffs.
Differential
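The difference between differential and incremental selection, as described in this section, comes down to which reference time a file's modification time is compared against. The sketch below assumes files are represented as a mapping from name to modification time (plain numbers) and that change detection relies on modification time alone; `select_changed` and the sample values are hypothetical.

```python
def select_changed(files, since):
    """Return the names of files modified after the reference time `since`."""
    return sorted(name for name, mtime in files.items() if mtime > since)


files = {"a.txt": 5, "b.txt": 12, "c.txt": 18}  # name -> modification time
last_full = 10          # time of the last full backup
last_incremental = 15   # time of the most recent backup of any type

# Differential: everything changed since the last full backup.
print(select_changed(files, last_full))
# Incremental: only what changed since the most recent backup of any type.
print(select_changed(files, last_incremental))
```

Here the differential selection contains `b.txt` and `c.txt`, while the incremental selection contains only `c.txt`, because `b.txt` was already captured by the earlier incremental backup.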
A differential backup saves only the data that has changed since the last full backup, regardless of whether any other differential backups have been made since. Restoring an entire system therefore requires at most two backups from the repository: the most recent full backup and the last differential backup. However, as the time since the last full backup (and thus the accumulated changes in data) increases, so does the time needed to perform the differential backup. By contrast, an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification.
Storage media
Magnetic tape
Hard disk
The use of hard disk storage has increased over time as it has become progressively cheaper. Hard disks are usually easy to use, widely available, and can be accessed quickly. However, hard disks are close-tolerance mechanical devices and may be more easily damaged than tapes, especially while being transported. In the mid-2000s, several drive manufacturers began to produce portable drives employing ramp loading and accelerometer technology (sometimes termed a "shock sensor"), and by 2010 the industry average in drop tests for drives with that technology showed drives remaining intact and working after a 36-inch non-operating drop onto industrial carpeting. Some manufacturers also offer 'ruggedized' portable hard drives, which include a shock-absorbing case around the hard disk, and claim a range of higher drop specifications. Over a period of years, hard disk backups are less stable than tape backups. External hard disks can be connected via local interfaces such as SCSI and USB.
Optical storage
Optical storage uses lasers to store and retrieve data. Recordable CDs, DVDs, and Blu-ray Discs are commonly used with personal computers and are generally cheap. In the past, the capacities and speeds of these discs have been lower than those of hard disks or tapes, although advances in optical media are slowly shrinking that gap. Gradual media degradation is a potential cause of future data loss.
Solid-state drive
Solid-state drives (SSDs) use integrated circuit assemblies to store data.
Remote backup service
Remote backup services or cloud backups involve service providers storing data offsite. This protects against events such as fires, floods, or earthquakes, which could destroy locally stored backups. Cloud-based backup (through services such as Google Drive and Microsoft OneDrive) provides a layer of data protection. However, users must trust the provider to maintain the privacy and integrity of their data, with confidentiality enhanced by the use of encryption. Because speed and availability are limited by a user's online connection, users with large amounts of data may need to use cloud seeding and large-scale recovery.
Management
Various methods can be used to manage backup media, striking a balance between accessibility, security and cost. These media management methods are not mutually exclusive and are frequently combined to meet the user's needs. Using on-line disks for staging data before it is sent to a near-line tape library is a common example.
Online
Near-line
Nearline storage is typically less accessible and less expensive than online storage, but still useful for backup data storage. A mechanical device is usually used to move media units from storage into a drive where the data can be read or written. Generally it has safety properties similar to on-line storage. An example is a tape library with restore times ranging from seconds to a few minutes.
Off-line
Off-line storage requires some direct action to provide access to the storage media: for example, inserting a tape into a tape drive or plugging in a cable. Because the data is not accessible via any computer except during the limited periods in which it is written or read back, it is largely immune to on-line backup failure modes. Access time varies depending on whether the media are on-site or off-site.
Off-site data protection
Backup media may be sent to an off-site vault to protect against a disaster or other site-specific problem. The vault can be as simple as a system administrator's home office or as sophisticated as a disaster-hardened, temperature-controlled, high-security bunker with facilities for backup media storage. A data replica can be off-site but also on-line (e.g., an off-site RAID mirror). Such a replica has fairly limited value as a backup.
Backup site
A backup site or disaster recovery center is used to store data that can enable computer systems and networks to be restored and properly configured in the event of a disaster. Some organisations have their own data recovery centres, while others contract this out to a third party. Due to high costs, backing up is rarely considered the preferred method of moving data to a DR site. A more typical way would be remote disk mirroring, which keeps the DR data as up to date as possible.
Selection and extraction of data
A backup operation starts with selecting and extracting coherent units of data. Most data on modern computer systems is stored in discrete units, known as files.
Files
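The partial file copying approach listed in this section can be illustrated by comparing per-block digests of two versions of a file. This is a simplified sketch under stated assumptions: fixed-size blocks, both versions available in memory, and hypothetical helper names (`block_digests`, `changed_blocks`); real implementations often integrate with the source file system instead.

```python
import hashlib


def block_digests(data, block_size):
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=4):
    """Return indices of blocks in `new` that differ from, or do not exist in, `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or digest != old_digests[i]
    ]


# Block 1 was overwritten and block 3 was appended.
print(changed_blocks(b"AAAABBBBCCCC", b"AAAAXXXXCCCCDDDD"))
```

Only the blocks at the reported indices would need to be stored, at the cost of a more involved reconstruction when the file is restored.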
* Copying files: Making copies of files is the simplest and most common way to perform a backup. A means to perform this basic function is included in all backup software and all operating systems.
* Partial file copying: A backup may include only the blocks or bytes within a file that have changed in a given period of time. This can substantially reduce needed storage space, but requires higher sophistication to reconstruct files in a restore situation. Some implementations require integration with the source file system.
* Deleted files: To prevent the unintentional restoration of files that have been intentionally deleted, a record of the deletion must be kept.
* Versioning of files: Most backup applications, other than those that perform only full/system imaging, also back up files that have been modified since the last backup. "That way, you can retrieve many different versions of a given file, and if you delete it on your hard disk, you can still find it in your information repository archive."
Filesystems
* Filesystem dump: A copy of the whole filesystem at block level can be made. This is also known as a "raw partition backup" and is related to disk imaging. The process usually involves unmounting the filesystem and running a program like dd (Unix). Because the disk is read sequentially and with large buffers, this type of backup can be faster than reading every file normally, especially when the filesystem contains many small files, is highly fragmented, or is nearly full. But because this method also reads the free disk blocks that contain no useful data, it can also be slower than conventional reading, especially when the filesystem is nearly empty. Some filesystems, such as XFS, provide a "dump" utility that reads the disk sequentially for high performance while skipping unused sections. The corresponding restore utility can selectively restore individual files or the entire volume at the operator's choice.
* Identification of changes: Some filesystems have an archive bit for each file that says it was recently changed. Some backup software looks at the date of the file and compares it with the last backup to determine whether the file was changed.
* Versioning file system: A versioning filesystem tracks all changes to a file. The NILFS versioning filesystem for Linux is an example.
Live data
Files that are actively being updated present a challenge to back up. One way to back up live data is to temporarily quiesce it (e.g., close all files), take a "snapshot", and then resume live operations. At this point the snapshot can be backed up through normal methods. A snapshot is an instantaneous function of some filesystems that presents a copy of the filesystem as if it were frozen at a specific point in time, often by a copy-on-write mechanism. Snapshotting a file while it is being changed results in a corrupted file that is unusable. This is also the case across interrelated files, as may be found in a conventional database or in applications such as Microsoft Exchange Server. The term fuzzy backup can be used to describe a backup of live data that looks like it ran correctly, but does not represent the state of the data at a single point in time. Backup options for data files that cannot be or are not quiesced include:
* Open file backup: Many backup software applications undertake to back up open files in an internally consistent state. Some applications simply check whether open files are in use and try again later. Other applications exclude open files that are updated very frequently. Some low-availability interactive applications can be backed up via natural/induced pausing.
* Interrelated database files backup: Some interrelated database file systems offer a means to generate a "hot backup" of the database while it is online and usable. This may include a snapshot of the data files plus a snapshotted log of changes made while the backup is running. Upon a restore, the changes in the log files are applied to bring the copy of the database up to the point in time at which the initial backup ended. Other low-availability interactive applications can be backed up via coordinated snapshots. However, genuinely high-availability interactive applications can only be backed up via Continuous Data Protection.
Metadata
Not all information stored on the computer is stored in files. Accurately recovering a complete system from scratch requires keeping track of this non-file data too.
* System description: System specifications are needed to procure an exact replacement after a disaster.
* Boot sector: The boot sector usually isn't a normal file, and the system won't boot without it. It can sometimes be recreated more easily than it can be saved.
* Partition layout: The layout of the original disk, as well as partition tables and filesystem settings, is needed to properly recreate the original system.
* File metadata: Each file's permissions, owner, group, ACLs, and any other metadata need to be backed up for a restore to properly recreate the original environment.
* System metadata: Different operating systems have different ways of storing configuration information. Microsoft Windows keeps a registry of system information that is more difficult to restore than a typical file.
Manipulation of data and dataset optimization
It is frequently useful or required to manipulate the data being backed up to optimize the backup process. These manipulations can improve backup speed, restore speed, data security, and media usage, and/or reduce bandwidth requirements.
Automated data grooming
Out-of-date data can be automatically deleted. In enterprise client-server backup applications, as opposed to personal backup applications, this automated data "grooming" can be customized.
Compression
Deduplication
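Block-level deduplication, as discussed in this section, can be sketched with a content-addressed store in which each distinct block is kept once and every backup is recorded as a list of block digests. The `DedupStore` class, its method names, and the tiny 4-byte block size are assumptions chosen for illustration only.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store: identical blocks are stored once."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}  # SHA-256 hex digest -> block bytes

    def backup(self, data):
        """Store data block by block and return the list of digests (the recipe)."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # keep only the first copy
            recipe.append(digest)
        return recipe

    def restore(self, recipe):
        """Rebuild the original data from a recipe of digests."""
        return b"".join(self.blocks[digest] for digest in recipe)


store = DedupStore()
recipe_a = store.backup(b"AAAABBBBAAAA")  # workstation A
recipe_b = store.backup(b"AAAACCCC")      # similarly configured workstation B
print(len(store.blocks))                  # distinct blocks actually stored
```

Five blocks are backed up in total, but only three distinct blocks are stored. Running the same digest comparison on the client before sending data corresponds to source-side deduplication, which is what saves bandwidth.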
When similarly configured workstations are backed up, redundant data can be reduced by storing just one copy. This technique can be applied at the file or raw block level. This potentially large reduction is called deduplication. It can occur on a server before any data moves to backup media, sometimes referred to as source/client side deduplication. This approach also reduces the bandwidth required to send backup data to its target media. The process can also occur at the target storage device, sometimes referred to as inline or back-end deduplication.
Duplication
Sometimes backups are duplicated to a second set of storage media. This can be done to rearrange the archive files to optimize restore speed, or to have a second copy at a different location or on a different storage medium, as in the disk-to-disk-to-tape capability of enterprise client-server backup.
Encryption
High-capacity removable storage media such as backup tapes present a data security risk if they are lost or stolen. Encrypting the data on these media can mitigate this problem. However, encryption is a CPU-intensive process that can slow down backup speeds, and the security of the encrypted backups is only as effective as the security of the key management policy.
Multiplexing
When there are many more computers to be backed up than there are destination storage devices, the ability to use a single storage device with several simultaneous backups can be useful. However, cramming the scheduled backup window in this way, known as "multiplexed backup", is only used for tape destinations.
Refactoring
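The "synthetic full backup" described in this section can be sketched as a merge of one full backup with the incrementals that followed it. The representation here is an assumption made for illustration: each backup is a mapping from path to content, incrementals are applied oldest first, and a value of `None` records a deletion. The function name `synthetic_full` is hypothetical.

```python
DELETED = None  # marker used in an incremental to record a deleted file


def synthetic_full(full, incrementals):
    """Merge a full backup with later incrementals (oldest first)."""
    merged = dict(full)  # copy so the original full backup is left untouched
    for incremental in incrementals:
        for path, content in incremental.items():
            if content is DELETED:
                merged.pop(path, None)   # honour the recorded deletion
            else:
                merged[path] = content   # newer version replaces older one
    return merged


full = {"a.txt": "v1", "b.txt": "v1"}
incrementals = [
    {"a.txt": "v2"},                     # day 1: a.txt modified
    {"b.txt": DELETED, "c.txt": "v1"},   # day 2: b.txt deleted, c.txt created
]
print(synthetic_full(full, incrementals))
```

The merged result holds `a.txt` at version `v2` and `c.txt` at `v1`, so a restore needs only this one consolidated set rather than the full backup plus every incremental.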
The process of rearranging the sets of backups in an archive file is known as refactoring. For example, if a backup system uses a single tape each day to store the incremental backups for all the protected computers, restoring one of the computers could require many tapes. Refactoring could be used to consolidate all the backups for a single computer onto a single tape, creating a "synthetic full backup". This is especially useful for backup systems that perform "incrementals forever" style backups.
Staging
Sometimes backups are copied to a staging disk before being copied to tape. This process is sometimes referred to as D2D2T, an acronym for disk-to-disk-to-tape.
Objectives
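The recovery point objective listed in this section can be compared against a backup schedule with simple arithmetic: the data lost in an incident is whatever changed after the last backup taken before it. The sketch below assumes times are plain numbers (minutes) and that a backup is usable from the moment of its timestamp; `data_loss_window` and `meets_rpo` are hypothetical names.

```python
def data_loss_window(backup_times, incident_time):
    """Return the time between the last backup before the incident and the incident."""
    earlier = [t for t in backup_times if t <= incident_time]
    if not earlier:
        return None  # no backup exists yet, so nothing can be recovered
    return incident_time - max(earlier)


def meets_rpo(backup_times, incident_time, rpo):
    """Return True if the data loss window does not exceed the RPO."""
    loss = data_loss_window(backup_times, incident_time)
    return loss is not None and loss <= rpo


hourly_backups = [0, 60, 120]
print(data_loss_window(hourly_backups, 150))   # 30 minutes of changes lost
print(meets_rpo(hourly_backups, 150, rpo=60))  # within a 60-minute RPO
print(meets_rpo(hourly_backups, 150, rpo=15))  # misses a 15-minute RPO
```

With hourly backups the worst case approaches 60 minutes of lost data, so meeting a tighter RPO requires more frequent synchronization with the backup repository.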
* Recovery point objective (RPO): The point in time that the restarted infrastructure will reflect, expressed as "the maximum targeted period in which data (transactions) might be lost from an IT service due to a major incident". Essentially, this is the roll-back that will be experienced as a result of the recovery. The most desirable RPO would be the point just prior to the data loss event. Making a more recent recovery point achievable requires increasing the frequency of synchronization between the source data and the backup repository.
* Recovery time objective (RTO): The amount of time elapsed between disaster and restoration of business functions.
* Data security: In addition to preserving access to data for its owners, data must be restricted from unauthorized access. Backups must be performed in a manner that does not compromise the original owner's undertaking. This can be achieved with data encryption and proper media handling policies.
* Data retention period: Regulations and policy can lead to situations where backups are expected to be retained for a particular period, but not any further. Retaining backups after this period can lead to unwanted liability and sub-optimal use of storage media.
See also
About backup
* Backup software & services
** List of backup software
** List of online backup services
* Glossary of backup terms
* Virtual backup appliance
Related topics
* Data consistency