
The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed. The RMSD represents the square root of the second sample moment of the differences between predicted values and observed values, or equivalently the quadratic mean of these differences. These deviations are called ''residuals'' when the calculations are performed over the data sample that was used for estimation, and ''errors'' (or prediction errors) when computed out-of-sample. The RMSD aggregates the magnitudes of the errors in predictions for various data points into a single measure of predictive power. RMSD is a measure of accuracy used to compare forecasting errors of different models for a particular dataset, not between datasets, because it is scale-dependent.

RMSD is always non-negative, and a value of 0 (almost never achieved in practice) would indicate a perfect fit to the data. In general, a lower RMSD is better than a higher one. However, comparisons across different types of data would be invalid because the measure depends on the scale of the numbers used.

RMSD is the square root of the average of squared errors. The effect of each error on RMSD is proportional to the size of the squared error; thus larger errors have a disproportionately large effect on RMSD. Consequently, RMSD is sensitive to outliers.


Formula

The RMSD of an estimator \hat{\theta} with respect to an estimated parameter \theta is defined as the square root of the mean squared error:

:\operatorname{RMSD}(\hat{\theta}) = \sqrt{\operatorname{MSE}(\hat{\theta})} = \sqrt{\operatorname{E}((\hat{\theta}-\theta)^2)}.

For an unbiased estimator, the RMSD is the square root of the variance, known as the standard deviation.

The RMSD of predicted values \hat{y}_t for times ''t'' of a regression's dependent variable y_t, with variables observed over ''T'' times, is computed for ''T'' different predictions as the square root of the mean of the squares of the deviations:

:\operatorname{RMSD} = \sqrt{\frac{\sum_{t=1}^{T} (\hat{y}_t - y_t)^2}{T}}.

(For regressions on cross-sectional data, the subscript ''t'' is replaced by ''i'' and ''T'' is replaced by ''n''.)

In some disciplines, the RMSD is used to compare differences between two things that may vary, neither of which is accepted as the "standard". For example, when measuring the average difference between two time series x_{1,t} and x_{2,t}, the formula becomes

:\operatorname{RMSD} = \sqrt{\frac{\sum_{t=1}^{T} (x_{1,t} - x_{2,t})^2}{T}}.
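The sample formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `rmsd` and all data values are made up for the example. Because the differences are squared, the same function covers both the predicted-versus-observed case and the symmetric comparison of two series.

```python
import math

def rmsd(a, b):
    """Square root of the mean of squared differences between paired values."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_diffs = [(x - y) ** 2 for x, y in zip(a, b)]
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))

# Hypothetical predictions y_hat and observations y over T = 4 times.
y_hat = [2.5, 0.0, 2.0, 8.0]
y = [3.0, -0.5, 2.0, 7.0]
prediction_rmsd = rmsd(y_hat, y)

# Two hypothetical time series, neither treated as the "standard".
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.5, 2.0, 2.5, 4.0]
series_rmsd = rmsd(x1, x2)

print(prediction_rmsd, series_rmsd)
```

Swapping the two arguments leaves the result unchanged, which is why the measure can be used when neither series is the reference.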


Normalization

Normalizing the RMSD facilitates the comparison between datasets or models with different scales. Though there is no consistent means of normalization in the literature, common choices are the mean or the range (defined as the maximum value minus the minimum value) of the measured data:

:\mathrm{NRMSD} = \frac{\mathrm{RMSD}}{y_\max - y_\min} or \mathrm{NRMSD} = \frac{\mathrm{RMSD}}{\bar{y}}.

This value is commonly referred to as the ''normalized root-mean-square deviation'' or ''error'' (NRMSD or NRMSE), and is often expressed as a percentage, where lower values indicate less residual variance. In many cases, especially for smaller samples, the sample range is likely to be affected by the size of the sample, which would hamper comparisons.

Another possible method to make the RMSD a more useful comparison measure is to divide the RMSD by the interquartile range (IQR). When the RMSD is divided by the IQR, the normalized value becomes less sensitive to extreme values in the target variable:

:\mathrm{RMSDIQR} = \frac{\mathrm{RMSD}}{\mathrm{IQR}}

where \mathrm{IQR} = Q_3 - Q_1 with Q_1 = \text{CDF}^{-1}(0.25) and Q_3 = \text{CDF}^{-1}(0.75), where \text{CDF}^{-1} is the quantile function.

When normalizing by the mean value of the measurements, the term ''coefficient of variation of the RMSD, CV(RMSD)'' may be used to avoid ambiguity. This is analogous to the coefficient of variation with the RMSD taking the place of the standard deviation:

:\mathrm{CV(RMSD)} = \frac{\mathrm{RMSD}}{\bar{y}}.
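The three normalizations described above can be sketched as follows. The helper names `rmsd` and `normalized_rmsd`, the `method` argument, and the data are assumptions made for illustration. The quartiles use the "inclusive" interpolation rule of Python's `statistics.quantiles`; other quantile conventions would give slightly different IQR values.

```python
import math
import statistics

def rmsd(predicted, observed):
    """Square root of the mean of squared differences between paired values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def normalized_rmsd(predicted, observed, method="range"):
    """Divide the RMSD by the range, mean, or IQR of the observed data."""
    if method == "range":
        scale = max(observed) - min(observed)
    elif method == "mean":  # this variant is the CV(RMSD)
        scale = statistics.fmean(observed)
    elif method == "iqr":
        q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
        scale = q3 - q1
    else:
        raise ValueError(f"unknown method: {method!r}")
    if scale == 0:
        raise ValueError("normalizing quantity is zero")
    return rmsd(predicted, observed) / scale

# Hypothetical measurements and predictions.
observed = [10.0, 12.0, 15.0, 11.0, 14.0, 18.0]
predicted = [11.0, 12.5, 14.0, 10.0, 15.0, 17.0]

# The same data in units a thousand times smaller.
observed_scaled = [1000 * v for v in observed]
predicted_scaled = [1000 * v for v in predicted]

for method in ("range", "mean", "iqr"):
    print(method,
          normalized_rmsd(predicted, observed, method),
          normalized_rmsd(predicted_scaled, observed_scaled, method))
```

In this sketch the plain RMSD grows by a factor of 1000 when the units change, while each normalized value stays the same, which is the property that makes comparison across scales possible.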


Mean absolute error

Some researchers have recommended the use of the mean absolute error (MAE) instead of the root-mean-square deviation. MAE possesses advantages in interpretability over RMSD. MAE is the average of the absolute values of the errors, which is fundamentally easier to understand than the square root of the average of squared errors. Furthermore, each error influences MAE in direct proportion to the absolute value of the error, which is not the case for RMSD. However, MAE is not a substitute, as it accounts only for the systematic errors, while RMSD accounts for both systematic and random errors.
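The difference in how the two measures weight individual errors can be seen in a small sketch. The function names and the data below are hypothetical. Both sets of predictions have the same total absolute error, but in one of them the error is concentrated in a single point.

```python
import math

def mae(predicted, observed):
    """Mean of the absolute differences between paired values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def rmsd(predicted, observed):
    """Square root of the mean of squared differences between paired values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

observed = [10.0, 10.0, 10.0, 10.0, 10.0]
even_errors = [12.0, 8.0, 12.0, 8.0, 12.0]    # every error has magnitude 2
one_outlier = [10.0, 10.0, 10.0, 10.0, 20.0]  # a single error of magnitude 10

for name, predicted in (("even", even_errors), ("outlier", one_outlier)):
    print(name, mae(predicted, observed), rmsd(predicted, observed))
```

MAE is identical for the two cases, while RMSD is larger when the error sits in one point. For any data the RMSD is at least as large as the MAE, with equality only when all errors have the same magnitude.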


Applications

* In meteorology, to see how effectively a mathematical model predicts the behavior of the atmosphere.
* In bioinformatics, the root-mean-square deviation of atomic positions is the measure of the average distance between the atoms of superimposed proteins.
* In structure-based drug design, the RMSD is a measure of the difference between a crystal conformation of the ligand and a docking prediction.
* In economics, the RMSD is used to determine whether an economic model fits economic indicators. Some experts have argued that RMSD is less reliable than Relative Absolute Error.
* In experimental psychology, the RMSD is used to assess how well mathematical or computational models of behavior explain the empirically observed behavior.
* In GIS, the RMSD is one measure used to assess the accuracy of spatial analysis and remote sensing.
* In hydrogeology, RMSD and NRMSD are used to evaluate the calibration of a groundwater model.
* In imaging science, the RMSD is part of the peak signal-to-noise ratio, a measure used to assess how well a method to reconstruct an image performs relative to the original image.
* In computational neuroscience, the RMSD is used to assess how well a system learns a given model.
* In protein nuclear magnetic resonance spectroscopy, the RMSD is used as a measure to estimate the quality of the obtained bundle of structures.
* Submissions for the Netflix Prize were judged using the RMSD from the test dataset's undisclosed "true" values.
* In the simulation of energy consumption of buildings, the RMSE and CV(RMSE) are used to calibrate models to measured building performance.
* In X-ray crystallography, RMSD (and RMSZ) is used to measure the deviation of the molecular internal coordinates from the restraints library values.
* In control theory, the RMSE is used as a quality measure to evaluate the performance of a state observer (https://kalman-filter.com/root-mean-square-error).
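As one concrete instance from the list above, the peak signal-to-noise ratio can be written in terms of the RMSD as PSNR = 20 log10(MAX / RMSD), where MAX is the largest possible pixel value. The sketch below assumes 8-bit pixels (MAX = 255) stored as flat lists; the function names and pixel values are invented for illustration.

```python
import math

def rmsd(a, b):
    """Square root of the mean of squared differences between paired values."""
    squared = [(x - y) ** 2 for x, y in zip(a, b)]
    return math.sqrt(sum(squared) / len(squared))

def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels, computed from the RMSD."""
    error = rmsd(reference, reconstructed)
    if error == 0:
        return math.inf  # identical images: no noise at all
    return 20.0 * math.log10(max_value / error)

# Hypothetical 8-bit pixel intensities of an original and a reconstruction.
reference = [52, 55, 61, 59, 79, 61, 76, 61]
reconstructed = [54, 55, 60, 59, 77, 63, 76, 60]

print(psnr(reference, reconstructed))
```

A smaller RMSD between the images gives a larger PSNR, so higher values indicate a closer reconstruction.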


See also

* Root mean square
* Mean absolute error
* Average absolute deviation
* Mean signed deviation
* Mean squared deviation
* Squared deviations
* Errors and residuals in statistics

