In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the ''L''₁ and ''L''₂ penalties of the lasso and ridge methods. Elastic net regularization is typically more accurate than either method alone with regard to reconstruction.

Specification

The elastic net method overcomes the limitations of the LASSO (least absolute shrinkage and selection operator) method, which uses a penalty function based on

$$\|\beta\|_1 = \textstyle \sum_{j=1}^p |\beta_j|.$$

Use of this penalty function has several limitations. For example, in the "large ''p'', small ''n''" case (high-dimensional data with few examples), the LASSO selects at most ''n'' variables before it saturates. Also, if there is a group of highly correlated variables, the LASSO tends to select one variable from the group and ignore the others. To overcome these limitations, the elastic net adds a quadratic part ($\|\beta\|^2$) to the penalty, which when used alone is ridge regression (also known as Tikhonov regularization). The estimates from the elastic net method are defined by

$$\hat{\beta} \equiv \underset{\beta}{\operatorname{argmin}} \left( \|y - X\beta\|^2 + \lambda_2 \|\beta\|^2 + \lambda_1 \|\beta\|_1 \right).$$
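The objective above can be evaluated and minimised numerically. The following is a minimal sketch in dependency-free Python, assuming small dense inputs stored as lists of rows; the function names and the toy data are illustrative and do not come from any particular library. It uses cyclic coordinate descent, where each coordinate update has a closed form: the soft-thresholded correlation of column $j$ with the partial residual, divided by the squared column norm plus $\lambda_2$. The threshold is $\lambda_1/2$ because the squared error in this objective carries no factor of $1/2$.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_objective(X, y, beta, lam1, lam2):
    """Return ||y - X beta||^2 + lam2 * ||beta||^2 + lam1 * ||beta||_1."""
    rss = sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )
    return rss + lam2 * sum(b * b for b in beta) + lam1 * sum(abs(b) for b in beta)


def elastic_net_cd(X, y, lam1, lam2, sweeps=200):
    """Approximately minimise the objective by cyclic coordinate descent.

    Assumes no column of X is all zeros when lam2 == 0 (avoids division by zero).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the residual that ignores coordinate j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            col_sq = sum(X[i][j] ** 2 for i in range(n))
            # Exact one-dimensional minimiser of the objective in coordinate j.
            beta[j] = soft_threshold(rho, lam1 / 2.0) / (col_sq + lam2)
    return beta


# Toy data (made up for illustration): n = 3 observations, p = 2 variables.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
beta_hat = elastic_net_cd(X, y, lam1=1.0, lam2=1.0)
print(beta_hat, elastic_net_objective(X, y, beta_hat, 1.0, 1.0))
```

A fixed number of sweeps keeps the sketch short; a practical solver would instead stop once the coefficients change by less than a tolerance.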

The quadratic penalty term makes the loss function strongly convex (for $\lambda_2 > 0$), and it therefore has a unique minimum. The elastic net method includes the LASSO and ridge regression as special cases: the LASSO when $\lambda_1 = \lambda, \lambda_2 = 0$, and ridge regression when $\lambda_1 = 0, \lambda_2 = \lambda$. Meanwhile, the naive version of the elastic net method finds an estimator in a two-stage procedure: first, for each fixed $\lambda_2$, it finds the ridge regression coefficients, and then it performs a LASSO-type shrinkage. This kind of estimation incurs a double amount of shrinkage, which leads to increased bias and poor predictions. To improve the prediction performance, the coefficients of the naive version of the elastic net are sometimes rescaled by multiplying the estimated coefficients by $(1 + \lambda_2)$.

Examples of where the elastic net method has been applied are:
* Support vector machine
* Metric learning
* Portfolio optimization
* Cancer prognosis
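The double shrinkage of the naive procedure is easiest to see in the special case where the columns of $X$ are orthonormal ($X^\top X = I$), because every estimator then has a closed form in terms of $z = X^\top y$. The sketch below assumes that special case and the objective as written in this section (squared error without a factor of $1/2$, hence the threshold $\lambda_1/2$); the function name and the sample values of $z$ are illustrative only.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def orthonormal_estimates(z, lam1, lam2):
    """Closed-form estimates when X^T X = I, given z = X^T y."""
    lasso = [soft_threshold(zj, lam1 / 2.0) for zj in z]
    ridge = [zj / (1.0 + lam2) for zj in z]
    # Naive elastic net: soft-thresholding AND ridge-style scaling (double shrinkage).
    naive = [b / (1.0 + lam2) for b in lasso]
    # Rescaling by (1 + lam2) undoes the ridge-style scaling.
    rescaled = [(1.0 + lam2) * b for b in naive]
    return {"lasso": lasso, "ridge": ridge, "naive": naive, "rescaled": rescaled}


# Hypothetical values of z = X^T y, chosen only for illustration.
estimates = orthonormal_estimates([3.0, -1.0, 0.25], lam1=1.0, lam2=1.0)
for name, coefficients in estimates.items():
    print(name, coefficients)
```

In this orthonormal setting the rescaled coefficients coincide with the LASSO coefficients; with correlated columns the two generally differ.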

Reduction to support vector machine

It was proven in 2014 that the elastic net can be reduced to the linear support vector machine. A similar reduction was previously proven for the LASSO in 2014. The authors showed that for every instance of the elastic net, an artificial binary classification problem can be constructed such that the hyper-plane solution of a linear support vector machine (SVM) is identical to the solution $\beta$ (after re-scaling). The reduction immediately enables the use of highly optimized SVM solvers for elastic net problems. It also enables the use of GPU acceleration, which is often already used for large-scale SVM solvers. The reduction is a simple transformation of the original data and regularization constants

$$X \in \mathbb{R}^{n \times p},\; y \in \mathbb{R}^n,\; \lambda_1 \geq 0,\; \lambda_2 \geq 0$$

into new artificial data instances and a regularization constant that specify a binary classification problem and the SVM regularization constant

$$X_2 \in \mathbb{R}^{2p \times n},\; y_2 \in \{-1, 1\}^{2p},\; C \geq 0.$$

Here, $y_2$ consists of binary labels $\{-1, 1\}$. When $2p > n$ it is typically faster to solve the linear SVM in the primal, whereas otherwise the dual formulation is faster. Some authors have referred to the transformation as Support Vector Elastic Net (SVEN), and provided the following MATLAB pseudo-code:

    function β = SVEN(X, y, t, λ2);
      [n, p] = size(X);
      X2 = [bsxfun(@minus, X, y./t); bsxfun(@plus, X, y./t)]';
      Y2 = [ones(p,1); -ones(p,1)];
      if 2p > n then
        w = SVMPrimal(X2, Y2, C = 1/(2*λ2));
        α = C * max(1-Y2.*(X2*w), 0);
      else
        α = SVMDual(X2, Y2, C = 1/(2*λ2));
      end if
      β = t * (α(1:p) - α(p+1:2p)) / sum(α);
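The data construction and the final re-scaling step of the pseudo-code can be sketched in Python without an SVM solver. The sketch rests on two assumptions that the text does not state outright: that the 2p artificial points are stored as rows of length n (so that they line up with the 2p labels in Y2), and that t, the third argument of the pseudo-code, acts as a budget on the ''L''₁ norm of β. The helper names are hypothetical. The last function checks an identity that follows from the construction: for any non-negative α with positive sum, the elastic net residual Xβ − y equals (t / Σα) times the label-weighted combination of the artificial points.

```python
def sven_dataset(X, y, t):
    """Build the 2p artificial points (rows of X2) and their labels Y2.

    X is a list of n rows with p entries each, y has n entries, t > 0.
    """
    n, p = len(X), len(X[0])
    # Column j of X shifted by -y/t (label +1) and by +y/t (label -1).
    minus = [[X[i][j] - y[i] / t for i in range(n)] for j in range(p)]
    plus = [[X[i][j] + y[i] / t for i in range(n)] for j in range(p)]
    return minus + plus, [1] * p + [-1] * p


def beta_from_alpha(alpha, t):
    """Map SVM dual variables (length 2p) back to elastic net coefficients."""
    p = len(alpha) // 2
    total = sum(alpha)
    return [t * (alpha[j] - alpha[p + j]) / total for j in range(p)]


def residual_identity_gap(X, y, t, alpha):
    """Largest absolute difference between X beta - y and the SVM-side expression."""
    X2, Y2 = sven_dataset(X, y, t)
    beta = beta_from_alpha(alpha, t)
    scale = t / sum(alpha)
    gap = 0.0
    for i in range(len(X)):
        lhs = sum(X[i][j] * beta[j] for j in range(len(beta))) - y[i]
        rhs = scale * sum(a * lab * point[i] for a, lab, point in zip(alpha, Y2, X2))
        gap = max(gap, abs(lhs - rhs))
    return gap


# Made-up inputs for illustration; alpha is arbitrary here, not an SVM solution.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 0.0, -1.0]
X2, Y2 = sven_dataset(X, y, t=2.0)
print(len(X2), len(X2[0]), Y2)
print(residual_identity_gap(X, y, 2.0, [1.0, 0.5, 0.25, 0.25]))
```

In a full implementation, α would come from a linear SVM solver applied to X2 and Y2 with C = 1/(2·λ2), as in the pseudo-code.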

Software

* "Glmnet: Lasso and elastic-net regularized generalized linear models" is software implemented as an R source package and as a MATLAB toolbox. It includes fast algorithms for estimation of generalized linear models with ℓ₁ (the lasso), ℓ₂ (ridge regression) and mixtures of the two penalties (the elastic net) using cyclical coordinate descent, computed along a regularization path.
* JMP Pro 11 includes elastic net regularization, using the Generalized Regression personality with Fit Model.
* "pensim: Simulation of high-dimensional data and parallelized repeated penalized regression" implements an alternate, parallelised "2D" tuning method of the ℓ parameters, a method claimed to result in improved prediction accuracy.
* scikit-learn includes linear regression and logistic regression with elastic net regularization.
* SVEN, a MATLAB implementation of Support Vector Elastic Net. This solver reduces the elastic net problem to an instance of SVM binary classification and uses a MATLAB SVM solver to find the solution. Because SVM is easily parallelizable, the code can be faster than Glmnet on modern hardware.
* SpaSM, a MATLAB implementation of sparse regression, classification and principal component analysis, including elastic net regularized regression.
* Apache Spark provides support for elastic net regression in its MLlib machine learning library. The method is available as a parameter of the more general LinearRegression class.
* The SAS procedure Glmselect and the SAS Viya procedure Regselect support the use of elastic net regularization for model selection.
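Packages differ in how they parameterise the penalty, so the constants $\lambda_1$ and $\lambda_2$ used in this article usually need converting before they are passed to a library. As one illustration, scikit-learn documents its `ElasticNet` objective as $\tfrac{1}{2n}\|y - Xw\|^2 + \alpha\,\rho\,\|w\|_1 + \tfrac{1}{2}\alpha(1-\rho)\|w\|^2$, where $\rho$ is the `l1_ratio` parameter. Multiplying that objective by $2n$ and matching terms gives the conversion sketched below. The helper functions are hypothetical and are not part of scikit-learn; the sketch only performs the arithmetic and does not import the library.

```python
def to_sklearn_params(lam1, lam2, n_samples):
    """Convert (lam1, lam2) from this article's objective to (alpha, l1_ratio)."""
    if lam1 < 0 or lam2 < 0 or n_samples <= 0:
        raise ValueError("lam1 and lam2 must be non-negative and n_samples positive")
    # Matching terms: lam1 = 2 * n * alpha * l1_ratio, lam2 = n * alpha * (1 - l1_ratio).
    alpha = lam1 / (2.0 * n_samples) + lam2 / n_samples
    if alpha == 0.0:
        raise ValueError("lam1 = lam2 = 0 is ordinary least squares; l1_ratio is undefined")
    l1_ratio = lam1 / (lam1 + 2.0 * lam2)
    return alpha, l1_ratio


def from_sklearn_params(alpha, l1_ratio, n_samples):
    """Inverse conversion, back to (lam1, lam2)."""
    lam1 = 2.0 * n_samples * alpha * l1_ratio
    lam2 = n_samples * alpha * (1.0 - l1_ratio)
    return lam1, lam2


alpha, l1_ratio = to_sklearn_params(lam1=1.0, lam2=1.0, n_samples=3)
print(alpha, l1_ratio)
print(from_sklearn_params(alpha, l1_ratio, n_samples=3))
```

The objective in this article has no intercept term, so an exact match would also require fitting without an intercept (`fit_intercept=False` in scikit-learn).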

External links

* Regularization and Variable Selection via the Elastic Net (presentation)
