Double descent in statistics and machine learning is the phenomenon where a model with a small number of parameters and a model with an extremely large number of parameters both have a small training error, but a model whose number of parameters is about the same as the number of data points used to train the model will have a much greater test error than one with a much larger number of parameters. This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning.
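A minimal numerical sketch of the phenomenon (illustrative only, not from any cited source; the sample sizes, noise level, and tanh random-feature map are arbitrary choices) fits minimum-norm least squares on random nonlinear features. The test error typically peaks when the number of features p is near the number of training points and falls again for much larger p:

# Illustrative sketch of double descent: minimum-norm least squares on
# random tanh features. Test error tends to peak when the number of
# features p is near the number of training points, then decreases
# again in the heavily overparameterized regime.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 30, 2000, 5
beta = rng.standard_normal(d)                     # ground-truth coefficients

X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ beta + 0.5 * rng.standard_normal(n_train)
y_test = X_test @ beta                            # noiseless test targets

for p in [5, 10, 20, 28, 30, 32, 40, 100, 500]:   # number of random features
    W = rng.standard_normal((d, p)) / np.sqrt(d)  # random projection
    F_train = np.tanh(X_train @ W)                # nonlinear random features
    F_test = np.tanh(X_test @ W)
    # pinv gives the least-squares solution for p <= n_train and the
    # minimum-norm interpolating solution for p > n_train.
    w = np.linalg.pinv(F_train) @ y_train
    mse = float(np.mean((F_test @ w - y_test) ** 2))
    print(f"p = {p:4d}   test MSE = {mse:.3f}")

Exact numbers vary with the random seed, but the qualitative shape (small error for small p, a spike near p = n_train, and a second descent beyond it) is the double descent curve described above.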
History
Early observations of what would later be called double descent in specific models date back to 1989.
The term "double descent" was coined by Belkin et. al.
in 2019,
when the phenomenon gained popularity as a broader concept exhibited by many models. The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in the model result in a significant overfitting error (an extrapolation of the
bias–variance tradeoff
In statistics and machine learning, the bias–variance tradeoff describes the relationship between a model's complexity, the accuracy of its predictions, and how well it can make predictions on previously unseen data that were not used to train ...
),
and the empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models.
Theoretical models
Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.
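A sketch of this setting (the sample size, noise level, and signal normalization below are illustrative assumptions): with isotropic Gaussian covariates, E[xxᵀ] = I, so the excess test risk of an estimate w is exactly ||w − β||², and the spike near p ≈ n followed by a second descent for p ≫ n can be read off directly:

# Double descent in linear regression with isotropic Gaussian covariates
# and noise (illustrative parameters). Since E[xx^T] = I, the excess
# test risk of an estimate w is exactly ||w - beta||^2.
import numpy as np

rng = np.random.default_rng(1)
n, sigma, trials = 100, 0.5, 20                    # samples, noise, averages

for p in [20, 50, 80, 95, 100, 105, 150, 400, 2000]:
    risk = 0.0
    for _ in range(trials):
        beta = rng.standard_normal(p) / np.sqrt(p)     # ||beta|| ~ 1
        X = rng.standard_normal((n, p))                # isotropic covariates
        y = X @ beta + sigma * rng.standard_normal(n)  # isotropic noise
        w = np.linalg.pinv(X) @ y      # OLS if p < n, min-norm if p > n
        risk += float(np.sum((w - beta) ** 2))
    print(f"p/n = {p/n:5.2f}   mean excess risk = {risk / trials:.3f}")

The risk diverges as p/n approaches 1, where the design matrix becomes ill-conditioned, and decreases again in the overparameterized regime.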
A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically.
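For context, the replica trick rests on the identity (with Z a partition function)

\ln Z = \lim_{n \to 0} \frac{Z^{n} - 1}{n}
\qquad \text{or equivalently} \qquad
\ln Z = \lim_{n \to 0} \frac{\partial Z^{n}}{\partial n},

which allows the quenched average of \ln Z over the disorder to be computed from the moments of Z^{n} for integer n, analytically continued to n \to 0.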
Empirical examples
The scaling behavior of double descent has been found to follow a broken neural scaling law functional form.[Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.]
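Up to notational choices, the broken neural scaling law of Caballero et al. models a performance metric y (for example, test loss) as a function of a scale variable x (for example, parameter count or dataset size) as a smoothly broken power law

y = a + b\,x^{-c_0} \prod_{i=1}^{n} \left( 1 + \left( \frac{x}{d_i} \right)^{1/f_i} \right)^{-c_i f_i},

where a is a limiting value, b and c_0 describe the initial power-law segment, and each break i shifts the local exponent by c_i around the scale d_i with smoothness f_i; non-monotonic behavior such as double descent appears when a break changes the sign of the local slope.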
See also
* Grokking (machine learning)
References
External links
* Understanding "Deep Double Descent" at evhub.
Model selection
Machine learning
Statistical classification