Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. It is also referred to as data decay, data rot or bit rot.
This results in a decline in data quality over time, even when the data is not being used. The term data degradation is also applied to the progressive minimization of data in interconnected processes, where data is used for multiple purposes at different levels of detail: at specific points in the process chain, data is irreversibly reduced to a level that remains sufficient for the successful completion of the following steps.
Manifestations
Primary storages
Data degradation in dynamic random-access memory (DRAM) can occur when the electric charge of a bit in DRAM disperses, possibly altering program code or stored data. DRAM may also be altered by cosmic rays or other high-energy particles. Such data degradation is known as a soft error. ECC memory can be used to mitigate this type of data degradation.
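The idea behind error-correcting memory can be illustrated with a small Hamming(7,4) code, which stores three parity bits alongside four data bits so that any single flipped bit can be located and corrected. This is only a minimal sketch: the function names are hypothetical, and real ECC memory typically uses wider codes implemented in hardware.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit.

    Returns (data_bits, error_position), where error_position is the
    1-based position of the corrected bit, or 0 if no error was seen.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the bad position
    if position:
        c[position - 1] ^= 1  # flip the damaged bit back
    return [c[2], c[4], c[5], c[6]], position


word = [1, 0, 1, 1]
stored = hamming74_encode(word)

# Simulate a soft error: one stored bit flips (position 5, index 4).
corrupted = list(stored)
corrupted[4] ^= 1

recovered, position = hamming74_decode(corrupted)
print(recovered == word, position)
```

A code of this kind corrects one flipped bit per codeword. Two flips in the same codeword would be miscorrected, which is one reason practical schemes add a further parity bit to detect double errors.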
Secondary storages
Data degradation results from the gradual decay of
storage media over the course of years or longer. Causes vary by medium.
Solid-state media
EPROMs, flash memory and other solid-state storage such as solid-state drives store data using electrical charges, which can slowly leak away due to imperfect insulation. Modern flash controller chips account for this leak by retrying reads at several lower threshold voltages (until ECC passes), extending the time for which data remains readable.
Multi-level cells, which have a much smaller distance between voltage levels, cannot be considered stable without this functionality.
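The retry behaviour described above can be sketched as a toy simulation. Everything here is an illustrative assumption rather than a description of any real controller: a charged cell is taken to mean a 1 bit, voltages are unitless numbers, the threshold list is invented, and a CRC-32 check stands in for the ECC pass.

```python
import zlib


def program(bits):
    """Return (cell_voltages, checksum) for a list of 0/1 bits.

    Toy model: a programmed 1 is a fully charged cell (1.0) and a 0 is
    an empty cell (0.0).
    """
    voltages = [1.0 if b else 0.0 for b in bits]
    return voltages, zlib.crc32(bytes(bits))


def leak(voltages, retained):
    """Model charge leakage: every cell keeps only a fraction of its charge."""
    return [v * retained for v in voltages]


def read_with_retry(voltages, checksum, thresholds=(0.5, 0.4, 0.3, 0.2)):
    """Read at progressively lower thresholds until the check passes."""
    for threshold in thresholds:
        bits = [1 if v >= threshold else 0 for v in voltages]
        if zlib.crc32(bytes(bits)) == checksum:  # stand-in for "ECC passes"
            return bits, threshold
    raise ValueError("data unreadable at every threshold tried")


data = [1, 0, 1, 1, 0, 0, 1, 0]
cells, crc = program(data)

# After ageing, charged cells hold only 45% of their original charge.
aged = leak(cells, retained=0.45)

recovered, used_threshold = read_with_retry(aged, crc)
print(recovered == data, used_threshold)
```

In this model the nominal threshold of 0.5 misreads every aged cell as empty, and the first lower threshold recovers the data. With several voltage levels per cell the margins between levels shrink, which is consistent with the point made above about multi-level cells.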
The chip itself is not affected by this, so reprogramming it approximately once per decade prevents decay. An undamaged copy of the master data is required for the reprogramming. A checksum can be used to verify that the on-chip data is not yet damaged and is ready for reprogramming.
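A checksum check of this kind might look like the following minimal sketch. It assumes a digest was recorded when the chip was originally programmed. The choice of SHA-256, the function names and the sample data are illustrative assumptions, not something the text prescribes.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the data."""
    return hashlib.sha256(data).hexdigest()


def is_intact(image: bytes, recorded_digest: str) -> bool:
    """True if the image read back from the chip still matches the digest."""
    return digest(image) == recorded_digest


# Digest recorded when the chip was first programmed (hypothetical data).
master = bytes(range(256)) * 4
recorded = digest(master)

# A fresh read-back matches the recorded digest.
chip_dump = bytearray(master)
ok_before = is_intact(bytes(chip_dump), recorded)

# A single leaked bit is enough to change the digest.
chip_dump[100] ^= 0x01
ok_after = is_intact(bytes(chip_dump), recorded)

print(ok_before, ok_after)
```

A checksum only detects damage. Restoring the data still requires the undamaged master copy.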
Typical SD cards, USB sticks and M.2 NVMe drives all have limited endurance. Powering the device on can usually recover the data, but error rates will eventually rise until the media becomes unreadable. Writing zeros to a degraded NAND device can restore the storage to close to new condition for further use. The interval between refreshes should be no longer than six months to be sure the device remains readable.
Magnetic media
Magnetic media, such as hard disk drives, floppy disks and magnetic tapes, may experience data decay as bits lose their magnetic orientation. Higher temperatures speed up the rate of magnetic loss. As with solid-state media, re-writing is useful as long as the medium itself is not damaged (see below).
Modern hard drives use giant magnetoresistance and have a longer magnetic lifespan, on the order of decades. They also automatically correct any errors detected by ECC through rewriting. However, their reliance on servo data written by a servowriter can complicate data recovery if that servo data becomes unreadable.
Floppy disks and tapes are poorly protected against ambient air. In warm or humid conditions, they are prone to physical decomposition of the storage medium.
Optical media
Optical media, such as CD-R, DVD-R and BD-R, may experience data decay from the breakdown of the storage medium. This can be mitigated by storing discs in a dark, cool, low-humidity location. "Archival quality" discs are available with an extended lifetime, but are still not permanent. However, data integrity scanning that measures the rates of various types of errors can predict data decay on optical media well before uncorrectable data loss occurs.
Both the disc dye and the disc backing layer are potentially susceptible to breakdown. Early cyanine-based dyes used in CD-R were notorious for their lack of UV stability. Early CDs also suffered from CD bronzing, which is related to a combination of bad lacquer material and failure of the aluminum reflection layer. Later discs use more stable dyes or forgo them in favor of an inorganic mixture. The aluminum layer is also commonly replaced with gold or a silver alloy.
Paper media
Paper media, such as punched cards and punched tape, may literally rot. Mylar punched tape is another approach that does not rely on electromagnetic stability. Degradation of books and printing paper is primarily driven by acid hydrolysis of glycosidic bonds within the cellulose molecule, as well as by oxidation. It is accelerated by high relative humidity and high temperature, and by exposure to acids, oxygen, light, and various pollutants, including volatile organic compounds and nitrogen dioxide.
Streaming media
In streaming media acquisition modules, data degradation takes the form of real-time data quality problems caused by device limitations, which repair algorithms aim to address. The more general form of data degradation, however, is the gradual decay of storage media over extended periods, influenced by factors such as physical wear, environmental conditions, and technological obsolescence. The causes of such degradation vary with the medium: magnetic fields for hard drives, moisture or temperature for tape storage, and electronic failure over time.
Example
One manifestation of data degradation is when one or a few bits are randomly flipped over a long period of time. This is illustrated by several digital images below, all consisting of 326,272 bits. The original photo is displayed first. In the next image, a single bit was changed from 0 to 1. In the next two images, two and three bits were flipped. On Linux systems, the binary difference between files can be revealed using a byte-wise comparison command such as cmp.
[Images: the original photo (0 bits flipped), followed by copies with 1 bit, 2 bits and 3 bits flipped]
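The kind of single-bit change described above, and a byte-wise comparison that reveals it, can be reproduced with a short sketch. The buffer below is a stand-in of the same size as the example images (326,272 bits), not the actual image data. The function names and the chosen bit index are hypothetical.

```python
def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def differing_bytes(a: bytes, b: bytes):
    """List (offset, byte_a, byte_b) wherever two equal-length inputs differ.

    Offsets are zero-based.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Stand-in for a 326,272-bit file: 40,784 zero bytes.
original = bytes(326272 // 8)

# Change a single bit from 0 to 1.
damaged = flip_bit(original, 12345)

for offset, before, after in differing_bytes(original, damaged):
    print(f"offset {offset}: {before:08b} -> {after:08b}")
```

In an uncompressed buffer like this one, a single flipped bit changes exactly one byte. In a compressed format such as JPEG, the same flip can alter how much of the following data is decoded, which is why the visible damage can be far larger than one pixel.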
Causes
This deterioration can be caused by a variety of factors that impact the reliability and integrity of digital information, including physical factors, software errors, security breaches, human error, obsolete technology, and unauthorized access incidents.
Most disks, disk controllers and higher-level systems are subject to a slight chance of unrecoverable failure. With ever-growing disk capacities and file sizes, and with more data stored on each disk, the likelihood of data decay and other forms of uncorrected and undetected data corruption increases.
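The effect of capacity on this likelihood can be made concrete with a simple model. If each bit read has an independent probability p of an unrecoverable error, the chance of at least one such error when reading n bits is 1 - (1 - p)^n. The independence assumption and the error rate used below are illustrative assumptions, not figures from the text or from any drive specification.

```python
import math


def p_at_least_one_error(bits_read: float, per_bit_error_rate: float) -> float:
    """Probability of one or more errors in bits_read independent bit reads.

    Computes 1 - (1 - p)**n in a numerically stable way for very small p.
    """
    return -math.expm1(bits_read * math.log1p(-per_bit_error_rate))


ASSUMED_ERROR_RATE = 1e-15  # hypothetical: one bad bit per 10**15 bits read

for terabytes in (1, 10, 100):
    bits = terabytes * 1e12 * 8  # decimal terabytes converted to bits
    chance = p_at_least_one_error(bits, ASSUMED_ERROR_RATE)
    print(f"{terabytes:>4} TB read: {chance:.2%} chance of at least one error")
```

Under this model the risk grows almost linearly while n times p is small and then approaches certainty, so a fixed per-bit error rate matters more as the amount of stored data grows.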
Low-level disk controllers typically employ error correction codes (ECC) to correct erroneous data.
Higher-level software systems may be employed to mitigate the risk of such underlying failures by increasing redundancy and implementing integrity checking, error correction codes and self-repairing algorithms.
The ZFS file system was designed to address many of these data corruption issues. The Btrfs file system also includes data protection and recovery mechanisms, as does ReFS.
Mitigation
There is no solution that completely eliminates the threat of data degradation, but various measures exist that can stave it off. One of these is to replicate the data as backups. Both the original and the backed-up data are then audited for faults due to storage media errors, either by checksumming the data or by comparing it with other copies. This is the only way to detect latent faults proactively, which might otherwise go unnoticed until the data is actually accessed.
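An audit of this kind can be sketched as follows. Each replica is checksummed, the digests are compared, and any replica that disagrees with the majority is flagged and repaired from a good copy. The majority-vote rule, the function names and the site names are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas):
    """Compare replicas by checksum.

    replicas maps a replica name to its bytes. Returns (consensus_digest,
    names of replicas that deviate from it).
    """
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in replicas.items()}
    consensus, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority: cannot tell which copy is correct")
    faulty = sorted(name for name, d in digests.items() if d != consensus)
    return consensus, faulty


def repair(replicas, faulty, consensus):
    """Overwrite each faulty replica with a copy that matches the consensus."""
    good = next(data for data in replicas.values()
                if hashlib.sha256(data).hexdigest() == consensus)
    for name in faulty:
        replicas[name] = good


payload = b"archived record " * 64
damaged = bytearray(payload)
damaged[10] ^= 0x04  # a latent single-bit fault in one copy

replicas = {
    "site_a": payload,
    "site_b": payload,
    "site_c": bytes(damaged),
}

consensus, faulty = audit_replicas(replicas)
print("faulty replicas:", faulty)

repair(replicas, faulty, consensus)
print("faulty after repair:", audit_replicas(replicas)[1])
```

With only two copies, a mismatch shows that something is wrong but not which copy is damaged. This is one reason to keep at least three replicas, or to store a trusted checksum alongside each copy.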
Current storage systems, such as those based on RAID, already employ such measures internally. Ideally, and especially for data that must be preserved digitally, the replicas should be distributed across multiple administrative sites that function autonomously and use diverse hardware and software, increasing resistance to failure, human error and cyberattacks.
See also
* Cliff effect
* Database integrity
* Data curation
* Data preservation
* Data scrubbing
* Digital permanence
* Disc rot
* Error detection and correction
* Link rot
* Media preservation
* RAR archive file format, which has an optional recovery record
* PAR2 recovery file format