
Memory scrubbing consists of reading from each computer memory location, correcting bit errors (if any) with an error-correcting code (ECC), and writing the corrected data back to the same location.

Due to the high integration density of modern computer memory chips, the individual memory cell structures have become small enough to be vulnerable to cosmic rays and alpha-particle emission. The errors caused by these phenomena are called soft errors. Over 8% of dual in-line memory modules (DIMMs) experience at least one correctable error per year. This can be a problem for both DRAM- and SRAM-based memories. The probability of a soft error at any individual memory bit is very small. However, given the large amount of memory that modern computers (especially servers) are equipped with, and their extended periods of uptime, the probability of soft errors somewhere in the total installed memory is significant.

The information in an ECC memory is stored redundantly enough to correct a single-bit error per memory word. Hence, an ECC memory can support scrubbing of the memory content: if the memory controller scans systematically through the memory, single-bit errors can be detected, the erroneous bit can be identified from the ECC check bits, and the corrected data can be written back to the memory.
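The read, correct, and write-back cycle can be sketched in Python. This is a minimal illustration, assuming 64-bit data words protected by an extended Hamming (SECDED) code with 8 check bits, the (72,64) arrangement used on common ECC DIMMs; real memory controllers do this in hardware and may use different codes. The names `encode`, `decode`, and `scrub`, and the Python list standing in for DRAM, are hypothetical.

```python
DATA_BITS = 64  # assumption: 64-bit data words, as in (72,64) ECC DIMMs


def _check_bit_count(k):
    """Smallest r with 2**r >= k + r + 1 (enough check bits to locate one flip)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


R = _check_bit_count(DATA_BITS)  # 7 Hamming check bits for 64 data bits
N = DATA_BITS + R                # Hamming bits at positions 1..N; bit 0 is overall parity
DATA_POSITIONS = [p for p in range(1, N + 1) if p & (p - 1)]  # skip powers of two


def _syndrome(word):
    """XOR of the positions of all set bits in 1..N; zero for a valid Hamming codeword."""
    s = 0
    for pos in range(1, N + 1):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(data):
    """Return a SECDED codeword (72 bits for 64 data bits) as an int."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    s = _syndrome(word)
    for i in range(R):  # set check bits at positions 1, 2, 4, ... to cancel the syndrome
        if (s >> i) & 1:
            word |= 1 << (1 << i)
    if bin(word).count("1") % 2:  # overall parity bit makes the number of ones even
        word |= 1
    return word


def decode(word):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    s = _syndrome(word)
    odd = bin(word).count("1") % 2 == 1
    if not odd:
        if s:
            return None, "uncorrectable"  # even number of flips (e.g. two): detect only
        status = "ok"
    else:
        if s > N:
            return None, "uncorrectable"  # points outside the word: three or more flips
        word ^= 1 << s                    # single flip; s == 0 means the parity bit itself
        status = "corrected"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


def scrub(memory):
    """One pass over `memory`, a list of stored codewords standing in for DRAM."""
    corrected, uncorrectable = [], []
    for addr, word in enumerate(memory):
        data, status = decode(word)
        if status == "corrected":
            memory[addr] = encode(data)  # write the corrected data back to the same location
            corrected.append(addr)
        elif status == "uncorrectable":
            uncorrectable.append(addr)   # reported only; the code cannot repair it
    return corrected, uncorrectable


mem = [encode(v) for v in (0, 0xDEADBEEF, 2**64 - 1, 12345)]
mem[1] ^= 1 << 10               # one soft error
mem[3] ^= (1 << 5) | (1 << 40)  # two soft errors in the same word
print(scrub(mem))               # → ([1], [3])
```

Word 3 is only reported, not repaired: once a second flip lands in a word before the first one is scrubbed away, SECDED can no longer recover the data.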


Overview

Each memory location must be checked periodically, and often enough that multiple-bit errors within the same word remain unlikely: with the usual (as of 2008) ECC memory modules, single-bit errors can be corrected but multiple-bit errors cannot. To avoid interfering with regular memory requests from the CPU, and thus degrading performance, scrubbing is usually done only during idle periods. Because scrubbing consists of normal read and write operations, it may increase the memory's power consumption compared to non-scrubbing operation. Therefore, scrubbing is performed periodically rather than continuously. On many servers, the scrub period can be configured in the BIOS setup program.

The normal memory reads issued by the CPU or DMA devices are also checked for ECC errors, but because of data locality they can be confined to a small range of addresses, leaving other memory locations untouched for a very long time. Those locations can accumulate more than one soft error, whereas scrubbing guarantees that the whole memory is checked within a bounded time.

On some systems, not only the main memory (DRAM-based) but also the CPU caches (SRAM-based) can be scrubbed. On most such systems, the scrubbing rates for the two can be set independently. Because a cache is much smaller than main memory, it does not need to be scrubbed as frequently. Memory scrubbing increases reliability, so it can be classified as a reliability, availability and serviceability (RAS) feature.
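Why the scrub period matters can be estimated with a simple model. Assume, for illustration only, that bit flips arrive independently at a constant rate λ per bit per hour and that each scrub pass repairs every single-bit error. A word of n bits then sees on average μ = nλT flips during a scrub period T, and it becomes uncorrectable only if it collects two or more, with probability 1 − e^(−μ)(1 + μ) ≈ μ²/2 for small μ. Over W words and a year of 8760 hours, the expected number of uncorrectable words is about W·(nλT)²/2 · 8760/T, which grows linearly with both T and W. The flip rate below is a placeholder, not a measured value, and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 365 * 24


def p_two_or_more(mu):
    """P(X >= 2) for X ~ Poisson(mu): chance a word collects two or more flips between scrubs."""
    if mu < 1e-4:
        # series mu^2/2 - mu^3/3 + O(mu^4) avoids cancellation in 1 - e^-mu * (1 + mu)
        return mu * mu / 2.0 * (1.0 - 2.0 * mu / 3.0)
    return 1.0 - math.exp(-mu) * (1.0 + mu)


def uncorrectable_per_year(words, bits_per_word, flip_rate, scrub_period_h):
    """Expected number of (word, interval) pairs per year in which a word reaches 2+ flips.

    flip_rate is an assumed soft-error rate per bit per hour.
    """
    mu = bits_per_word * flip_rate * scrub_period_h  # expected flips per word per interval
    intervals = HOURS_PER_YEAR / scrub_period_h
    return words * p_two_or_more(mu) * intervals


def max_scrub_period(words, bits_per_word, flip_rate, target_per_year):
    """Longest scrub period (hours) meeting the target, using the small-mu form mu^2/2."""
    # words * (n * rate * T)^2 / 2 * HOURS_PER_YEAR / T = target, solved for T
    return 2.0 * target_per_year / (words * (bits_per_word * flip_rate) ** 2 * HOURS_PER_YEAR)


RATE = 1e-12              # placeholder flips per bit-hour, for illustration only
DRAM_WORDS = 2**30        # 8 GiB of 64-bit words, each stored as 72 bits
CACHE_WORDS = 2**20 // 8  # 1 MiB cache of 64-bit words (hypothetical sizing)

daily = uncorrectable_per_year(DRAM_WORDS, 72, RATE, 24)
twice_daily = uncorrectable_per_year(DRAM_WORDS, 72, RATE, 12)
print(daily / twice_daily)  # close to 2: halving the period roughly halves the rate

ratio = max_scrub_period(CACHE_WORDS, 72, RATE, 1e-6) / max_scrub_period(DRAM_WORDS, 72, RATE, 1e-6)
print(ratio)  # → 8192.0: the smaller cache tolerates a proportionally longer period
```

Because the estimate scales with the number of words W, a cache thousands of times smaller than main memory can meet the same reliability target with a correspondingly longer scrub period, which is consistent with caches being scrubbed less often.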


Variants

There are two common variants, known as patrol scrubbing and demand scrubbing. Both perform memory scrubbing and the associated error correction (where the error is correctable); they differ mainly in how they are initiated and executed. Patrol scrubbing runs automatically while the system is idle, whereas demand scrubbing corrects errors when the data is actually requested from main memory.
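The difference between the two variants can be illustrated with a toy controller model in Python. Demand scrubbing happens inside `read`, while patrol scrubbing walks a pointer through all addresses in `patrol_step`, which a real controller would run during idle cycles. As a simplifying stand-in for a real ECC, each word is stored as three copies and repaired by bitwise majority vote (triple modular redundancy); the class and method names are hypothetical.

```python
class ScrubbingController:
    """Toy memory controller contrasting demand scrubbing and patrol scrubbing.

    Stand-in ECC: each word is kept as three copies and repaired by bitwise
    majority vote (triple modular redundancy), not the SECDED code of ECC DIMMs.
    """

    def __init__(self, size):
        self.cells = [[0, 0, 0] for _ in range(size)]
        self.patrol_ptr = 0   # next address the patrol scrubber will visit
        self.corrections = 0  # number of words repaired so far

    def write(self, addr, value):
        self.cells[addr] = [value, value, value]

    def _check_and_fix(self, addr):
        a, b, c = self.cells[addr]
        value = (a & b) | (a & c) | (b & c)  # bitwise majority of the three copies
        if not (a == b == c):
            self.cells[addr] = [value, value, value]  # write corrected data back
            self.corrections += 1
        return value

    def read(self, addr):
        """Demand scrubbing: a normal access checks, corrects and writes back."""
        return self._check_and_fix(addr)

    def patrol_step(self, budget):
        """Patrol scrubbing: visit `budget` consecutive addresses, e.g. during idle cycles."""
        for _ in range(budget):
            self._check_and_fix(self.patrol_ptr)
            self.patrol_ptr = (self.patrol_ptr + 1) % len(self.cells)


mc = ScrubbingController(8)
for addr in range(8):
    mc.write(addr, addr * 11)
mc.cells[2][0] ^= 0b100  # soft error in a word the CPU reads
mc.cells[6][1] ^= 0b1    # soft error in a word nothing reads
print(mc.read(2))        # → 22 (demand scrubbing repairs word 2 on access)
mc.patrol_step(8)        # one full patrol pass reaches word 6
print(mc.corrections)    # → 2
```

Word 6 is never read, so only the patrol pass repairs it; this is the data-locality gap that patrol scrubbing closes.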


See also

* Data scrubbing, a general category containing memory scrubbing
* Soft error, an important reason for doing memory scrubbing
* Error detection and correction, a general theory used for memory scrubbing
* Memory refresh, which preserves information stored in memory

