Exponential smoothing is a rule-of-thumb technique for smoothing time series data using the exponential window function. Whereas in the simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time. It is an easily learned and easily applied procedure for making some determination based on prior assumptions by the user, such as seasonality. Exponential smoothing is often used for analysis of time-series data.

Exponential smoothing is one of many window functions commonly applied to smooth data in signal processing, acting as low-pass filters to remove high-frequency noise. This method was preceded by Poisson's use of recursive exponential window functions in convolutions in the 19th century, as well as by Kolmogorov and Zurbenko's use of recursive moving averages in their studies of turbulence in the 1940s.

The raw data sequence is often represented by $\{x_t\}$ beginning at time $t = 0$, and the output of the exponential smoothing algorithm is commonly written as $\{s_t\}$, which may be regarded as a best estimate of what the next value of $x$ will be. When the sequence of observations begins at time $t = 0$, the simplest form of exponential smoothing is given by the formulas:

\begin{aligned}
s_0 &= x_0\\
s_t &= \alpha x_t + (1-\alpha)s_{t-1},\quad t>0
\end{aligned}

where $\alpha$ is the ''smoothing factor'', and $0 < \alpha < 1$.
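As a minimal sketch of the recurrence above, assuming the observations arrive as a plain Python list and that $s_0 = x_0$ as in the definition (the function name `simple_exp_smooth` is illustrative, not from any library):

```python
def simple_exp_smooth(x, alpha):
    """Return [s_0, ..., s_{n-1}] for observations x, using s_0 = x_0."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not x:
        return []
    s = [x[0]]                                   # s_0 = x_0
    for t in range(1, len(x)):
        # s_t = alpha * x_t + (1 - alpha) * s_{t-1}
        s.append(alpha * x[t] + (1.0 - alpha) * s[-1])
    return s


print(simple_exp_smooth([3.0, 5.0, 4.0, 6.0], alpha=0.5))  # [3.0, 4.0, 4.0, 5.0]
```

Each step needs only the previous smoothed value and the newest observation, so the whole history never has to be stored.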

Basic (simple) exponential smoothing

The use of the exponential window function is first attributed to Poisson as an extension of a numerical analysis technique from the 17th century, and was later adopted by the signal processing community in the 1940s. Here, exponential smoothing is the application of the exponential, or Poisson, window function. Exponential smoothing was first suggested in the statistical literature, without citation to previous work, by Robert Goodell Brown in 1956, and then expanded by Charles C. Holt in 1957. The formulation below, which is the one commonly used, is attributed to Brown and is known as "Brown's simple exponential smoothing". All the methods of Holt, Winters and Brown may be seen as a simple application of recursive filtering, first found in the 1940s to convert finite impulse response (FIR) filters to infinite impulse response (IIR) filters.

The simplest form of exponential smoothing is given by the formula:

s_t = \alpha x_t + (1-\alpha) s_{t-1} = s_{t-1} + \alpha (x_t - s_{t-1}).

where $\alpha$ is the ''smoothing factor'', and $0 \le \alpha \le 1$. In other words, the smoothed statistic $s_t$ is a simple weighted average of the current observation $x_t$ and the previous smoothed statistic $s_{t-1}$. Simple exponential smoothing is easily applied, and it produces a smoothed statistic as soon as two observations are available.

The term ''smoothing factor'' applied to $\alpha$ here is something of a misnomer, as larger values of $\alpha$ actually reduce the level of smoothing, and in the limiting case with $\alpha = 1$ the output series is just the current observation. Values of $\alpha$ close to one have less of a smoothing effect and give greater weight to recent changes in the data, while values of $\alpha$ closer to zero have a greater smoothing effect and are less responsive to recent changes.

There is no formally correct procedure for choosing $\alpha$. Sometimes the statistician's judgment is used to choose an appropriate factor. Alternatively, a statistical technique may be used to ''optimize'' the value of $\alpha$. For example, the method of least squares might be used to determine the value of $\alpha$ for which the sum of the quantities $(s_t - x_{t+1})^2$ is minimized.

Unlike some other smoothing methods, such as the simple moving average, this technique does not require any minimum number of observations to be made before it begins to produce results. In practice, however, a "good average" will not be achieved until several samples have been averaged together. For example, a constant signal will take approximately $3/\alpha$ stages for the smoothed value to close 95% of its initial gap to the signal (such as reaching 95% of the actual value when starting from zero), because the remaining gap shrinks by a factor of $(1-\alpha)$ per stage, and $(1-\alpha)^n \approx e^{-n\alpha}$ falls to about 0.05 when $n\alpha \approx 3$. To accurately reconstruct the original signal without information loss, all stages of the exponential moving average must also be available, because older samples decay in weight exponentially. This is in contrast to a simple moving average, in which some samples can be skipped without as much loss of information due to the constant weighting of samples within the average. If a known number of samples will be missed, one can adjust a weighted average for this as well, by giving equal weight to the new sample and all those to be skipped.

This simple form of exponential smoothing is also known as an exponentially weighted moving average (EWMA). Technically it can also be classified as an autoregressive integrated moving average (ARIMA) (0,1,1) model with no constant term.
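One way to carry out the least-squares choice of $\alpha$ described above is a plain grid search over $(0, 1)$, scoring each candidate by the sum of squared one-step-ahead errors $(s_t - x_{t+1})^2$. This is a sketch under assumptions: $s_0 = x_0$, a grid of 99 evenly spaced values, made-up data, and illustrative function names; a real optimizer would search more finely.

```python
def one_step_sse(x, alpha):
    """Sum of (s_t - x_{t+1})^2, using each s_t as the forecast of the next observation."""
    s = x[0]                                   # s_0 = x_0
    total = 0.0
    for t in range(1, len(x)):
        total += (s - x[t]) ** 2               # s_{t-1} is the forecast of x_t
        s = alpha * x[t] + (1.0 - alpha) * s   # then update with the new observation
    return total


def choose_alpha(x, grid_size=99):
    """Grid-search alpha in (0, 1) for the smallest one-step-ahead SSE."""
    candidates = [i / (grid_size + 1) for i in range(1, grid_size + 1)]
    return min(candidates, key=lambda a: one_step_sse(x, a))


data = [12.0, 13.5, 12.8, 14.1, 15.0, 14.6, 15.9, 16.3, 15.8, 17.2]  # made-up series
best = choose_alpha(data)
print(best, one_step_sse(data, best))
```

Because the grid includes 0.5, the selected $\alpha$ can never score worse than that default choice on the same data.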

Time constant

The time constant of an exponential moving average is the amount of time for the smoothed response of a unit step function to reach $1-1/e \approx 63.2\,\%$ of the original signal. The relationship between this time constant, $\tau$, and the smoothing factor, $\alpha$, is given by the formula:

\alpha = 1 - e^{-\Delta T/\tau}

thus

\tau = - \frac{\Delta T}{\ln(1-\alpha)}

where $\Delta T$ is the sampling time interval of the discrete time implementation. If the sampling time is fast compared to the time constant ($\Delta T \ll \tau$), then

\alpha \approx \frac{\Delta T}{\tau}
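The conversions above can be checked with a short sketch (helper names are illustrative). With $\Delta T = 1$ and $\tau = 10$, a unit step fed in from a smoothed value of zero reaches about 63.2% after $\tau/\Delta T = 10$ samples and 95% after 30 samples, i.e. three time constants, which is consistent with the $3/\alpha$ rule of thumb when $\alpha \approx \Delta T/\tau = 0.1$.

```python
import math


def alpha_from_tau(dt, tau):
    """alpha = 1 - exp(-dt / tau)."""
    return 1.0 - math.exp(-dt / tau)


def tau_from_alpha(dt, alpha):
    """tau = -dt / ln(1 - alpha)."""
    return -dt / math.log(1.0 - alpha)


def step_response(alpha, n):
    """Smoothed value after n samples of a unit step, starting from s = 0."""
    s = 0.0
    for _ in range(n):
        s = alpha * 1.0 + (1.0 - alpha) * s
    return s


def steps_to_reach(alpha, fraction):
    """Number of samples before the unit-step response first reaches `fraction` (< 1)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    s, n = 0.0, 0
    while s < fraction:
        s = alpha + (1.0 - alpha) * s
        n += 1
    return n


dt, tau = 1.0, 10.0
alpha = alpha_from_tau(dt, tau)
print(alpha)                                # about 0.0952, close to dt / tau = 0.1
print(step_response(alpha, int(tau / dt)))  # about 0.632, i.e. 1 - 1/e
print(steps_to_reach(alpha, 0.95))          # 30 samples, three time constants
```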

Choosing the initial smoothed value

Note that in the definition above, $s_0$ is initialized to $x_0$. Because exponential smoothing requires that at each stage we have the previous forecast, it is not obvious how to get the method started. We could assume that the initial forecast is equal to the initial value of demand; however, this approach has a serious drawback. Exponential smoothing puts substantial weight on past observations, so the initial value of demand will have an unreasonably large effect on early forecasts. This problem can be overcome by allowing the process to evolve for a reasonable number of periods (10 or more) and using the average of the demand during those periods as the initial forecast. There are many other ways of setting this initial value, but it is important to note that the smaller the value of $\alpha$, the more sensitive your forecast will be to the selection of this initial smoothed value $s_0$.
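The averaging start described above can be sketched as follows. This is a minimal illustration under assumptions: the helper names are hypothetical, the first 10 observations are averaged to form $s_0$, and the recursion then continues from $x_1$ as usual. The made-up series has an unusual first value; with a small $\alpha$ the naive start $s_0 = x_0$ is still far from the typical level after five steps.

```python
def ses(x, alpha, s0):
    """Smoothed values s_0..s_{n-1}, starting the recursion from the given s0."""
    out = [s0]
    for obs in x[1:]:
        out.append(alpha * obs + (1.0 - alpha) * out[-1])
    return out


def average_start(x, n_init=10):
    """Initial smoothed value: the mean of the first n_init observations."""
    window = x[:n_init]
    return sum(window) / len(window)


demand = [30.0] + [10.0] * 19          # made-up series with an unusual first value
alpha = 0.1
naive = ses(demand, alpha, s0=demand[0])
averaged = ses(demand, alpha, s0=average_start(demand))
print(round(naive[5], 2), round(averaged[5], 2))  # 21.81 11.18
```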

Optimization

For every exponential smoothing method we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter ($\alpha$), but for the methods that follow there is usually more than one smoothing parameter. There are cases where the smoothing parameters may be chosen in a subjective manner: the forecaster specifies the value of the smoothing parameters based on previous experience. However, a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.

The unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the sum of squared errors (SSE). The errors are specified as $e_t = y_t - \hat{y}_{t\mid t-1}$ for $t = 1, \ldots, T$ (the one-step-ahead within-sample forecast errors), where $y_t$ denotes the observation at time $t$ (written $x_t$ elsewhere in this article) and $\hat{y}_{t\mid t-1}$ its forecast made at time $t-1$. Hence we find the values of the unknown parameters and the initial values that minimize

\text{SSE} = \sum_{t=1}^{T} (y_t-\hat{y}_{t\mid t-1})^2 = \sum_{t=1}^{T} e_t^2

Unlike the regression case (where we have formulae to directly compute the regression coefficients which minimize the SSE), this involves a non-linear minimization problem, and we need to use an optimization tool to perform this.
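As a sketch of the SSE minimization above for simple exponential smoothing, the code below estimates both $\alpha$ and the initial value (called `level0` here; the name is ours). Following the error definition above, it assumes the first observation $y_1$ is forecast by `level0`. Because each forecast is linear in `level0`, the best `level0` for a fixed $\alpha$ has a closed-form least-squares solution, and a coarse grid over $\alpha$ handles the non-linear part. A general-purpose optimizer could replace the grid; this is one simple choice, not a prescribed method.

```python
def sse(y, alpha, level0):
    """Sum of squared one-step-ahead errors e_t = y_t - yhat_{t|t-1}."""
    level, total = level0, 0.0
    for obs in y:
        err = obs - level              # yhat_{t|t-1} is the current level
        total += err * err
        level += alpha * err           # level_t = level_{t-1} + alpha * e_t
    return total


def best_level0(y, alpha):
    """Least-squares initial value for a fixed alpha.

    Each forecast equals A_t + w_t * level0, where A_t is the forecast obtained
    with level0 = 0 and w_t = (1 - alpha)**(t - 1), so the SSE is quadratic in level0.
    """
    level, w, num, den = 0.0, 1.0, 0.0, 0.0
    for obs in y:
        num += w * (obs - level)       # w_t * (y_t - A_t)
        den += w * w
        level += alpha * (obs - level)
        w *= 1.0 - alpha
    return num / den


def fit_ses(y, grid_size=99):
    """Return (alpha, level0, sse) minimizing the SSE over an alpha grid."""
    best = None
    for i in range(1, grid_size + 1):
        alpha = i / (grid_size + 1)
        level0 = best_level0(y, alpha)
        total = sse(y, alpha, level0)
        if best is None or total < best[2]:
            best = (alpha, level0, total)
    return best


y = [22.0, 24.5, 23.1, 25.8, 26.4, 25.9, 27.7, 28.1, 27.5, 29.0]  # made-up data
alpha, level0, total = fit_ses(y)
print(alpha, level0, total)
```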

"Exponential" naming

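The name 'exponential smoothing' is attributed to the use of the exponential window function during convolution. It is no longer attributed to Holt, Winters & Brown. By direct substitution of the defining equation for simple exponential smoothing back into itself, we find that

\begin{aligned}
s_t &= \alpha x_t + (1-\alpha)s_{t-1}\\
&= \alpha x_t + \alpha (1-\alpha)x_{t-1} + (1-\alpha)^2 s_{t-2}\\
&= \alpha \left[x_t + (1-\alpha)x_{t-1} + (1-\alpha)^2 x_{t-2} + (1-\alpha)^3 x_{t-3} + \cdots + (1-\alpha)^{t-1} x_1\right] + (1-\alpha)^t x_0.
\end{aligned}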

In other words, as time passes the smoothed statistic $s_t$ becomes the weighted average of a greater and greater number of the past observations $x_{t-1}, \ldots, x_{t-n}$, and the weights assigned to previous observations are proportional to the terms of the geometric progression

1, (1-\alpha), (1-\alpha)^2, \ldots, (1-\alpha)^n, \ldots

A geometric progression is the discrete version of an exponential function, so this is where the name for this smoothing method originated, according to statistics lore.
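The expansion above can be checked numerically. This sketch (function names are illustrative) computes $s_t$ both by the recursion and by the explicit weighted sum, and lists the weights on $x_t, x_{t-1}, \ldots, x_0$. Apart from the final weight $(1-\alpha)^t$ on $x_0$, they are $\alpha$ times the geometric progression, and together they sum to one.

```python
def smooth_recursive(x, alpha):
    """s_t from s_t = alpha * x_t + (1 - alpha) * s_{t-1}, with s_0 = x_0."""
    s = x[0]
    for obs in x[1:]:
        s = alpha * obs + (1.0 - alpha) * s
    return s


def expansion_weights(t, alpha):
    """Weights on x_t, x_{t-1}, ..., x_1, x_0 in the expanded formula."""
    return [alpha * (1.0 - alpha) ** k for k in range(t)] + [(1.0 - alpha) ** t]


def smooth_explicit(x, alpha):
    """s_t as an explicit weighted sum of all observations x_0..x_t."""
    t = len(x) - 1
    newest_first = x[::-1]                       # x_t, x_{t-1}, ..., x_0
    return sum(w * v for w, v in zip(expansion_weights(t, alpha), newest_first))


x = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]
print(smooth_recursive(x, 0.3), smooth_explicit(x, 0.3))  # equal up to rounding
print(sum(expansion_weights(5, 0.3)))                      # 1.0 up to rounding
```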

Comparison with moving average

Exponential smoothing and moving average have similar defects of introducing a lag relative to the input data. While this can be corrected by shifting the result by half the window length for a symmetrical kernel, such as a moving average or Gaussian, it is unclear how appropriate this would be for exponential smoothing. They also both have roughly the same distribution of forecast error when $\alpha = 2/(k+1)$. They differ in that exponential smoothing takes into account all past data, whereas moving average only takes into account $k$ past data points. Computationally speaking, they also differ in that moving average requires that the past $k$ data points, or the data point at lag $k+1$ plus the most recent forecast value, be kept, whereas exponential smoothing only needs the most recent forecast value to be kept.

In the signal processing literature, the use of non-causal (symmetric) filters is commonplace, and the exponential window function is broadly used in this fashion, but a different terminology is used: exponential smoothing is equivalent to a first-order infinite impulse response (IIR) filter, and moving average is equivalent to a finite impulse response (FIR) filter with equal weighting factors.
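A short sketch of why $\alpha = 2/(k+1)$ is the matching choice. One common way to see it (an interpretation added here, not stated above) is that both methods then use data of the same average age, $(k-1)/2$ periods, since exponential weights $\alpha(1-\alpha)^j$ on data $j$ periods old have mean age $(1-\alpha)/\alpha$. The two running classes, whose names are illustrative, also show the storage difference: the moving average keeps a window of $k$ values, while exponential smoothing keeps one.

```python
import collections


def sma_mean_age(k):
    """Average age, in periods, of the k equally weighted points (ages 0..k-1)."""
    return sum(range(k)) / k


def ema_mean_age(alpha, horizon=10_000):
    """Average age under weights alpha * (1 - alpha)**age, truncated at `horizon`."""
    weights = [alpha * (1.0 - alpha) ** age for age in range(horizon)]
    return sum(age * w for age, w in enumerate(weights)) / sum(weights)


class RunningSMA:
    """k-point moving average: must keep the last k observations."""

    def __init__(self, k):
        self.window = collections.deque(maxlen=k)

    def update(self, x):
        self.window.append(x)
        return sum(self.window) / len(self.window)


class RunningEMA:
    """Exponential smoothing: keeps only the latest smoothed value."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.value = None

    def update(self, x):
        if self.value is None:
            self.value = x                       # s_0 = x_0
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value


k = 9
print(sma_mean_age(k), ema_mean_age(2 / (k + 1)))  # both about 4.0
```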

Double exponential smoothing (Holt linear)

Simple exponential smoothing does not do well when there is a trend in the data. In such situations, several methods were devised under the name "double exponential smoothing" or "second-order exponential smoothing", referring to the recursive application of an exponential filter twice. This nomenclature is similar to quadruple exponential smoothing, which also references its recursion depth. The basic idea behind double exponential smoothing is to introduce a term to take into account the possibility of a series exhibiting some form of trend. This slope component is itself updated via exponential smoothing.

One method works as follows. Again, the raw data sequence of observations is represented by $x_t$, beginning at time $t=0$. We use $s_t$ to represent the smoothed value for time $t$, and $b_t$ is our best estimate of the trend at time $t$. The output of the algorithm is now written as $F_{t+m}$, an estimate of the value of $x_{t+m}$, $m > 0$ steps after time $t$, based on the raw data up to time $t$. Double exponential smoothing is given by the formulas

\begin{aligned}
s_0 &= x_0\\
b_0 &= x_1 - x_0
\end{aligned}

and for $t > 0$ by

\begin{aligned}
s_t &= \alpha x_t + (1-\alpha)(s_{t-1} + b_{t-1})\\
b_t &= \beta (s_t - s_{t-1}) + (1-\beta)b_{t-1}
\end{aligned}

where $\alpha$ ($0 \le \alpha \le 1$) is the ''data smoothing factor'', and $\beta$ ($0 \le \beta \le 1$) is the ''trend smoothing factor''. The forecast beyond $x_t$ is given by the approximation:

F_{t+m} = s_t + m \cdot b_t
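The Holt recursions above translate directly into code. This is a minimal sketch, assuming $s_0 = x_0$ and $b_0 = x_1 - x_0$ as listed; the function name `holt_linear` is illustrative. On a perfectly linear series the trend estimate stays at the true slope, so the forecasts match the line.

```python
def holt_linear(x, alpha, beta, m=1):
    """Holt's double exponential smoothing; returns (levels, trends, F_{t+m})."""
    if len(x) < 2:
        raise ValueError("need at least two observations to set b_0 = x_1 - x_0")
    s, b = x[0], x[1] - x[0]                 # s_0 = x_0, b_0 = x_1 - x_0
    levels, trends = [s], [b]
    for t in range(1, len(x)):
        s_prev = s
        # s_t = alpha * x_t + (1 - alpha) * (s_{t-1} + b_{t-1})
        s = alpha * x[t] + (1.0 - alpha) * (s_prev + b)
        # b_t = beta * (s_t - s_{t-1}) + (1 - beta) * b_{t-1}
        b = beta * (s - s_prev) + (1.0 - beta) * b
        levels.append(s)
        trends.append(b)
    return levels, trends, s + m * b         # F_{t+m} = s_t + m * b_t


line = [2.0 * t + 3.0 for t in range(10)]    # x_t = 2t + 3
print(holt_linear(line, alpha=0.5, beta=0.3, m=4)[2])  # close to 29.0, the value of x_13
```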

Setting the initial value $b_0$ is a matter of preference. An option other than the one listed above is $(x_n - x_0)/n$ for some $n$. Note that $F_0$ is undefined (there is no estimation for time 0), and according to the definition $F_1 = s_0 + b_0$, which is well defined, so further values can be evaluated.

A second method, referred to as either Brown's linear exponential smoothing (LES) or Brown's double exponential smoothing, works as follows:

\begin{aligned}
s'_0 &= x_0\\
s''_0 &= x_0\\
s'_t &= \alpha x_t + (1-\alpha)s'_{t-1}\\
s''_t &= \alpha s'_t + (1-\alpha)s''_{t-1}\\
F_{t+m} &= a_t + m b_t,
\end{aligned}

where $a_t$, the estimated level at time $t$, and $b_t$, the estimated trend at time $t$, are:

\begin{aligned}
a_t &= 2s'_t - s''_t\\
b_t &= \frac{\alpha}{1-\alpha}(s'_t - s''_t).
\end{aligned}
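Brown's method needs only the two smoothed series. A minimal sketch, assuming $0 < \alpha < 1$ (so that $\alpha/(1-\alpha)$ is finite) and the initialization $s'_0 = s''_0 = x_0$ given above; the function name is illustrative. Unlike Holt's method with $b_0 = x_1 - x_0$, the start-up here is not exact on a linear series, but the start-up error dies out geometrically.

```python
def brown_les(x, alpha, m=1):
    """Brown's linear exponential smoothing; returns F_{t+m} after the last observation."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    s1 = s2 = x[0]                                 # s'_0 = s''_0 = x_0
    for obs in x[1:]:
        s1 = alpha * obs + (1.0 - alpha) * s1      # s'_t: smoothing of the data
        s2 = alpha * s1 + (1.0 - alpha) * s2       # s''_t: smoothing of s'
    a = 2.0 * s1 - s2                              # level a_t
    b = alpha / (1.0 - alpha) * (s1 - s2)          # trend b_t
    return a + m * b                               # F_{t+m} = a_t + m * b_t


line = [2.0 * t + 3.0 for t in range(200)]         # x_t = 2t + 3
print(brown_les(line, alpha=0.3, m=5))             # close to 411.0, the value of x_204
```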

Triple exponential smoothing (Holt Winters)

Triple exponential smoothing applies exponential smoothing three times, which is commonly used when there are three high frequency signals to be removed from a time series under study. There are different types of seasonality: 'multiplicative' and 'additive' in nature, much like addition and multiplication are basic operations in mathematics. If every December we sell 10,000 more apartments than we do in November, the seasonality is ''additive'' in nature. However, if we sell 10% more apartments in the summer months than we do in the winter months, the seasonality is ''multiplicative'' in nature. Multiplicative seasonality can be represented as a constant factor, not an absolute amount.

Triple exponential smoothing was first suggested by Holt's student, Peter Winters, in 1960 after reading a signal processing book from the 1940s on exponential smoothing. Holt's novel idea was to repeat filtering an odd number of times greater than 1 and less than 5, which was popular with scholars of previous eras. While recursive filtering had been used previously, it was applied twice and four times to coincide with the Hadamard conjecture, while triple application required more than double the operations of a single convolution. The use of a triple application is considered a rule-of-thumb technique, rather than one based on theoretical foundations, and has often been over-emphasized by practitioners.

Suppose we have a sequence of observations $x_t$, beginning at time $t=0$, with a cycle of seasonal change of length $L$. The method calculates a trend line for the data as well as seasonal indices that weight the values in the trend line based on where that time point falls in the cycle of length $L$. Let $s_t$ represent the smoothed value of the constant part for time $t$; $b_t$ is the sequence of best estimates of the linear trend that are superimposed on the seasonal changes, and $c_t$ is the sequence of seasonal correction factors. We wish to estimate $c_t$ at every time $t \bmod L$ in the cycle that the observations take on. As a rule of thumb, a minimum of two full seasons (or $2L$ periods) of historical data is needed to initialize a set of seasonal factors.

The output of the algorithm is again written as $F_{t+m}$, an estimate of the value of $x_{t+m}$ at time $t+m>0$ based on the raw data up to time $t$. Triple exponential smoothing with multiplicative seasonality is given by the formulas

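\begin{aligned}
s_0 &= x_0\\
s_t &= \alpha \frac{x_t}{c_{t-L}} + (1-\alpha)(s_{t-1} + b_{t-1})\\
b_t &= \beta (s_t - s_{t-1}) + (1-\beta)b_{t-1}\\
c_t &= \gamma \frac{x_t}{s_t} + (1-\gamma)c_{t-L}\\
F_{t+m} &= (s_t + m b_t)\,c_{t-L+1+(m-1) \bmod L},
\end{aligned}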

where $\alpha$ ($0 \le \alpha \le 1$) is the ''data smoothing factor'', $\beta$ ($0 \le \beta \le 1$) is the ''trend smoothing factor'', and $\gamma$ ($0 \le \gamma \le 1$) is the ''seasonal change smoothing factor''. The general formula for the initial trend estimate $b_0$ is:

\begin{aligned}
b_0 &= \frac{1}{L}\left(\frac{x_{L+1}-x_1}{L} + \frac{x_{L+2}-x_2}{L} + \cdots + \frac{x_{L+L}-x_L}{L}\right)
\end{aligned}

Setting the initial estimates for the seasonal indices $c_i$ for $i = 1,2,\ldots,L$ is a bit more involved. If $N$ is the number of complete cycles present in your data, then:

c_i = \frac{1}{N} \sum_{j=1}^{N} \frac{x_{L(j-1)+i}}{A_j} \quad \text{for } i = 1,2,\ldots,L

where

A_j = \frac{\sum_{i=1}^{L} x_{L(j-1)+i}}{L} \quad \text{for } j = 1,2,\ldots,N

Note that $A_j$ is the average value of $x$ in the $j^\text{th}$ cycle of your data. Triple exponential smoothing with additive seasonality is given by:

\begin{aligned}
s_0 &= x_0\\
s_t &= \alpha (x_t - c_{t-L}) + (1-\alpha)(s_{t-1} + b_{t-1})\\
b_t &= \beta (s_t - s_{t-1}) + (1-\beta)b_{t-1}\\
c_t &= \gamma (x_t - s_{t-1} - b_{t-1}) + (1-\gamma)c_{t-L}\\
F_{t+m} &= s_t + m b_t + c_{t-L+1+(m-1) \bmod L}.
\end{aligned}
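The initialization formulas and both seasonal variants can be combined into one sketch. Several choices here are assumptions rather than part of the description above: positions in the cycle are counted from 0 (so `c[i]` holds the index for times with $t \bmod L = i$); the additive variant initializes its indices with differences $x - A_j$ in place of the ratios $x/A_j$; and the level starts from the deseasonalized first observation instead of $s_0 = x_0$, so that a perfectly seasonal series is reproduced from the first step. Function names are illustrative.

```python
def initial_trend(x, L):
    """b_0: mean of the per-period slopes (x[L+i] - x[i]) / L over the first two seasons."""
    return sum((x[L + i] - x[i]) / L for i in range(L)) / L


def initial_seasonals(x, L, multiplicative=True):
    """Initial index for each cycle position i = 0..L-1, averaged over the N complete cycles."""
    n_cycles = len(x) // L
    cycle_means = [sum(x[L * j:L * j + L]) / L for j in range(n_cycles)]  # A_j
    if multiplicative:
        return [sum(x[L * j + i] / cycle_means[j] for j in range(n_cycles)) / n_cycles
                for i in range(L)]
    # Assumption: additive analogue using differences instead of ratios
    return [sum(x[L * j + i] - cycle_means[j] for j in range(n_cycles)) / n_cycles
            for i in range(L)]


def holt_winters(x, L, alpha, beta, gamma, m, multiplicative=True):
    """Forecasts F_{t+1}, ..., F_{t+m} made after the last observation x_t."""
    if len(x) < 2 * L:
        raise ValueError("need at least two full seasons (2L observations)")
    c = initial_seasonals(x, L, multiplicative)   # c[i]: latest index for position i
    b = initial_trend(x, L)
    # Assumption: start from the deseasonalised first observation
    s = x[0] / c[0] if multiplicative else x[0] - c[0]
    for t in range(1, len(x)):
        i = t % L                  # before updating, c[i] holds c_{t-L} (or the initial estimate)
        s_prev, b_prev = s, b
        if multiplicative:
            s = alpha * x[t] / c[i] + (1 - alpha) * (s_prev + b_prev)
            c[i] = gamma * x[t] / s + (1 - gamma) * c[i]
        else:
            s = alpha * (x[t] - c[i]) + (1 - alpha) * (s_prev + b_prev)
            c[i] = gamma * (x[t] - s_prev - b_prev) + (1 - gamma) * c[i]
        b = beta * (s - s_prev) + (1 - beta) * b_prev
    t = len(x) - 1
    # c[(t + h) % L] is the most recent index for the cycle position of time t + h
    if multiplicative:
        return [(s + h * b) * c[(t + h) % L] for h in range(1, m + 1)]
    return [s + h * b + c[(t + h) % L] for h in range(1, m + 1)]


p = [0.8, 1.2, 1.1, 0.9]                           # made-up seasonal pattern
series = [10.0 * p[t % 4] for t in range(12)]
print(holt_winters(series, 4, 0.5, 0.3, 0.2, m=4))  # close to [8.0, 12.0, 11.0, 9.0]
```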

Implementations in statistics packages

* R: the HoltWinters function in the stats package and the ets function in the forecast package (a more complete implementation, generally resulting in better performance).
* Python: the holtwinters module of the statsmodels package allows for simple, double and triple exponential smoothing.
* IBM SPSS includes Simple, Simple Seasonal, Holt's Linear Trend, Brown's Linear Trend, Damped Trend, Winters' Additive, and Winters' Multiplicative in the Time-Series modeling procedure within its Statistics and Modeler statistical packages. The default Expert Modeler feature evaluates all seven exponential smoothing models and ARIMA models with a range of nonseasonal and seasonal ''p'', ''d'', and ''q'' values, and selects the model with the lowest Bayesian Information Criterion statistic.
* Stata: tssmooth command
* LibreOffice 5.2
* Microsoft Excel 2016


External links

* Lecture notes on exponential smoothing (Robert Nau, Duke University)
* Data Smoothing, by Jon McLoone, The Wolfram Demonstrations Project
* The Holt–Winters Approach to Exponential Smoothing: 50 Years Old and Going Strong, by Paul Goodwin (2010), Foresight: The International Journal of Applied Forecasting
* Algorithms for Unevenly Spaced Time Series: Moving Averages and Other Rolling Operators, by Andreas Eckner
