In statistics, the one in ten rule is a rule of thumb for how many predictor parameters can be estimated from data when doing regression analysis (in particular proportional hazards models in survival analysis and logistic regression) while keeping the risk of overfitting
and finding spurious correlations low. The rule states that one predictive variable can be studied for every ten events. For logistic regression the number of events is the size of the smallest outcome category; for survival analysis it is the number of uncensored events. In other words, each feature needs about ten events (observations or labels of the rarer outcome). For example, if a sample of 200 patients is studied and 20 patients die during the study (so that 180 patients survive), the one in ten rule implies that two pre-specified predictors can reliably be fitted to the total data. Similarly, if 100 patients die during the study (so that 100 patients survive), ten pre-specified predictors can be fitted reliably. If more are fitted, the rule implies that overfitting is likely and the results will not predict well outside the training data. The 1:10 rule is often violated in fields with many variables (e.g. gene expression studies in cancer), which decreases confidence in the reported findings.
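The arithmetic in the two patient examples can be sketched in a few lines of Python. This is only an illustration for a binary outcome: the function name `max_predictors` and its parameters are hypothetical, and the limiting count is taken as the smaller outcome category, as described above for logistic regression.

```python
def max_predictors(n_events: int, n_non_events: int,
                   events_per_predictor: int = 10) -> int:
    """Largest number of pre-specified predictors a 1-in-k rule allows.

    Hypothetical helper for a binary outcome: the limiting count is the
    size of the smaller outcome category.
    """
    if events_per_predictor <= 0:
        raise ValueError("events_per_predictor must be positive")
    if n_events < 0 or n_non_events < 0:
        raise ValueError("counts must be non-negative")
    limiting_count = min(n_events, n_non_events)
    # Integer division: a partial predictor cannot be fitted.
    return limiting_count // events_per_predictor


# 200 patients, 20 deaths, 180 survivors -> 2 predictors
print(max_predictors(20, 180))
# 200 patients, 100 deaths, 100 survivors -> 10 predictors
print(max_predictors(100, 100))
```

For survival analysis the same division would apply, with the number of uncensored events used as the limiting count.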

Improvements

A "one in 20 rule" has been suggested, indicating the need for shrinkage of regression coefficients, and a "one in 50 rule" for stepwise selection with the default p-value of 5%. Other studies, however, show that the one in ten rule may be too conservative as a general recommendation and that five to nine events per predictor can be enough, depending on the research question.

More recently, a study has shown that the ratio of events per predictive variable is not a reliable statistic for estimating the minimum number of events for estimating a logistic prediction model. Instead, the number of predictor variables, the total sample size (events + non-events) and the events fraction (events / total sample size) can be used to calculate the expected prediction error of the model that is to be developed. One can then estimate the required sample size to achieve an expected prediction error that is smaller than a predetermined allowable prediction error value. Alternatively, three requirements for prediction model estimation have been suggested: the model should have a global shrinkage factor of ≥ 0.9, an absolute difference of ≤ 0.05 in the model's apparent and adjusted Nagelkerke R², and a precise estimation of the overall risk or rate in the target population. The necessary sample size and number of events for model development are then given by the values that meet these requirements.
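The quantities named in this section (events per predictor, total sample size and events fraction) are simple to compute, as the following minimal sketch shows. The names `SampleSummary` and `summarize_sample` are hypothetical. The sketch only reports which of the ratio-based rules of thumb a study design would meet; it does not implement the prediction-error or shrinkage-based calculations, whose formulas are not given here.

```python
from dataclasses import dataclass


@dataclass
class SampleSummary:
    total_sample_size: int
    events_fraction: float
    events_per_variable: float
    rules_met: dict  # rule name -> True if the events-per-variable ratio reaches it


def summarize_sample(n_events: int, n_non_events: int,
                     n_predictors: int) -> SampleSummary:
    """Summarize a binary-outcome design against the 1:10, 1:20 and 1:50 rules."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    if n_events < 0 or n_non_events < 0 or n_events + n_non_events == 0:
        raise ValueError("counts must be non-negative and not both zero")
    total = n_events + n_non_events
    epv = n_events / n_predictors
    # Thresholds taken from the rules of thumb named in the text.
    thresholds = {"one in ten": 10, "one in 20": 20, "one in 50": 50}
    rules_met = {name: epv >= k for name, k in thresholds.items()}
    return SampleSummary(total, n_events / total, epv, rules_met)


summary = summarize_sample(n_events=100, n_non_events=100, n_predictors=10)
print(summary)
```

Here `n_events` is assumed to be the count the rule refers to (the smaller outcome category for logistic regression), so the caller is responsible for passing the right count.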

Other modalities

For highly correlated input data the one-in-10 rule (10 observations or labels needed per feature) may not be directly applicable because of the high correlation between the features. For images there is a rule of thumb that 1000 examples are needed per class (Applications of Machine Learning and Artificial Intelligence in Education. (2022). USA: IGI Global. Page 53, https://books.google.com/books?id=l59lEAAAQBAJ&dq=%22one%20in%20ten%20rule%22%20%20images%20machine%20learning&pg=PA53). For a binary classification of images with a hypothetical size of 1000 x 1000 pixels, i.e. 1 000 000 features per image, this would mean that only 2000 labels / 1 000 000 pixels = 0.002 labels per pixel, or 0.002 labels per feature, are required. Such a low ratio is, however, only possible because of the high (spatial) correlation between pixels.
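The image example can be checked with a short calculation. The sketch below is illustrative only: the function name `labels_per_feature` is hypothetical, and each pixel is treated as one feature, as in the example above.

```python
def labels_per_feature(n_classes: int, examples_per_class: int,
                       width_px: int, height_px: int) -> float:
    """Ratio of labelled examples to input features for an image classifier."""
    n_labels = n_classes * examples_per_class  # one label per example
    n_features = width_px * height_px          # one feature per pixel (assumption)
    return n_labels / n_features


# Binary classification, 1000 examples per class, 1000 x 1000 pixel images
ratio = labels_per_feature(n_classes=2, examples_per_class=1000,
                           width_px=1000, height_px=1000)
print(ratio)

# For comparison: labels needed if 10 labels per feature were applied literally
print(10 * 1000 * 1000)
```

The comparison makes the gap explicit: 2000 labels under the per-class rule of thumb, against ten million if ten labels per pixel were required.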

Literature

* David A. Freedman (1983) "A Note on Screening Regression Equations," ''The American Statistician'', 37:2, 152–155.
