Trend estimation
Linear trend estimation is a statistical technique to aid interpretation of data. When a series of measurements of a process is treated as, for example, a sequence or time series, trend estimation can be used to make and justify statements about tendencies in the data, by relating the measurements to the times at which they occurred. This model can then be used to describe the behaviour of the observed data, without explaining it. In particular, it may be useful to determine whether measurements exhibit an increasing or decreasing trend that is statistically distinguishable from random behaviour. Some examples are determining the trend of the daily average temperatures at a given location from winter to summer, and determining the trend in a global temperature series over the last 100 years. In the latter case, issues of homogeneity are important (for example, whether the series is equally reliable throughout its length).


Fitting a trend: least-squares

Given a set of data and the desire to produce some kind of model of those data, there are a variety of functions that can be chosen for the fit. If there is no prior understanding of the data, then the simplest function to fit is a straight line with the data values on the ''y'' axis and time (''t'' = 1, 2, 3, ...) on the ''x'' axis. Once it has been decided to fit a straight line, there are various ways to do so, but the most usual choice is a least-squares fit. This method minimizes the sum of the squared errors in the data series ''y''. Given a set of points in time t and data values y_t observed for those points in time, values of \hat{a} and \hat{b} are chosen so that

:\sum_t \left[ y_t - \left( \hat{a} t + \hat{b} \right) \right]^2

is minimized. Here \hat{a} t + \hat{b} is the trend line, so the sum of squared deviations from the trend line is what is being minimized. This can always be done in closed form, since this is a case of simple linear regression. For the rest of this article, "trend" will mean the slope of the least-squares line, since this is a common convention.
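As an illustration, here is a minimal Python sketch of the closed-form least-squares fit just described; the data values are hypothetical and only NumPy is assumed.

```python
# A minimal sketch of fitting a straight-line trend by least squares,
# using the closed-form simple-linear-regression formulas.
import numpy as np

y = np.array([2.1, 2.9, 3.2, 4.8, 5.1, 6.3])  # observed values (hypothetical)
t = np.arange(1, len(y) + 1)                   # time index t = 1, 2, 3, ...

# Closed-form least-squares estimates: slope a_hat and intercept b_hat
t_bar, y_bar = t.mean(), y.mean()
a_hat = np.sum((t - t_bar) * (y - y_bar)) / np.sum((t - t_bar) ** 2)
b_hat = y_bar - a_hat * t_bar

trend = a_hat * t + b_hat                      # fitted trend line
print(f"slope = {a_hat:.3f}, intercept = {b_hat:.3f}")
```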


Trends in random data

Before considering trends in real data, it is useful to understand trends in random data. If a series which is known to be random is analysed – rolls of fair dice, or computer-generated pseudo-random numbers – and a trend line is fitted through the data, the chances of an exactly zero estimated trend are negligible, but the trend would be expected to be small. If an individual series of observations is generated from simulations that employ a given variance of noise that equals the observed variance of our data series of interest, and a given length (say, 100 points), a large number of such simulated series (say, 100,000 series) can be generated. These 100,000 series can then be analysed individually to calculate estimated trends in each series, and these results establish a distribution of estimated trends that are to be expected from such random data. Such a distribution will be normal according to the central limit theorem except in pathological cases. A level of statistical certainty, ''S'', may now be selected – 95% confidence is typical; 99% would be stricter, 90% looser – and the following question can be asked: what is the borderline trend value ''V'' that would result in ''S''% of trends lying between −''V'' and +''V''?

The above procedure can be replaced by a permutation test. For this, the set of 100,000 generated series would be replaced by 100,000 series constructed by randomly shuffling the observed data series; clearly such a constructed series would be trend-free, so, as with the approach of using simulated data, these series can be used to generate borderline trend values ''V'' and −''V''. In the above discussion the distribution of trends was calculated by simulation, from a large number of trials. In simple cases (normally distributed random noise being a classic) the distribution of trends can be calculated exactly, without simulation.

The range (−''V'', ''V'') can be employed in deciding whether a trend estimated from the actual data is unlikely to have come from a data series that truly has a zero trend. If the estimated value of the regression parameter ''a'' lies outside this range, such a result could have occurred in the presence of a true zero trend only rarely: for example, one time in twenty if the confidence value ''S'' = 95% was used. In this case, it can be said that, at degree of certainty ''S'', we reject the null hypothesis that the true underlying trend is zero. However, note that whatever value of ''S'' we choose, a given fraction, 1 − ''S'', of truly random series will be declared (falsely, by construction) to have a significant trend. Conversely, a certain fraction of series that in fact have a non-zero trend will not be declared to have a trend.
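Both procedures can be sketched in a few lines of Python. In this sketch the "observed" series is itself synthetic, and 10,000 replicates are used instead of 100,000 for speed; these are illustrative choices, not part of the original text.

```python
# A minimal sketch of the two procedures above: (1) simulate many random
# series with the observed variance and take the 95% quantile of |trend|
# as the borderline value V; (2) do the same with permutations of the
# observed series, which destroy any real trend.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)                  # stand-in for the observed series
t = np.arange(len(y))

def slope(series):
    """Least-squares trend (slope) of a series against time."""
    return np.polyfit(t, series, 1)[0]

# (1) Monte Carlo null distribution of trends from pure noise
sim_trends = np.array([slope(rng.normal(scale=y.std(), size=y.size))
                       for _ in range(10_000)])
V = np.quantile(np.abs(sim_trends), 0.95)   # 95% of null trends lie in (-V, V)

# (2) Permutation variant: shuffle the observed values instead
perm_trends = np.array([slope(rng.permutation(y)) for _ in range(10_000)])
V_perm = np.quantile(np.abs(perm_trends), 0.95)

print(f"V (simulation) = {V:.4f}, V (permutation) = {V_perm:.4f}")
print("significant" if abs(slope(y)) > V else "consistent with zero trend")
```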


Data as trend plus noise

To analyse a (time) series of data, we assume that it may be represented as trend plus noise:

:y_t = at + b + e_t

where a and b are unknown constants and the e_t are randomly distributed errors. If one can reject the null hypothesis that the errors are non-stationary, then the series y_t is called trend-stationary. The least-squares method assumes the errors to be independently distributed with a normal distribution. If this is not the case, hypothesis tests about the unknown parameters ''a'' and ''b'' may be inaccurate. It is simplest if the e_t all have the same distribution, but if not (if some have higher variance, meaning that those data points are effectively less certain), then this can be taken into account during the least-squares fitting by weighting each point by the inverse of that point's variance.

In most cases, where only a single time series exists to be analysed, the variance of the e_t is estimated by fitting a trend to obtain the estimated parameter values \hat{a} and \hat{b}, thus allowing the predicted values

:\hat{y} = \hat{a} t + \hat{b}

to be subtracted from the data y_t (thus ''detrending'' the data), leaving the residuals \hat{e}_t as the ''detrended data'', and then estimating the variance of the e_t from the residuals; this is often the only way of estimating the variance of the e_t.

Once we know the "noise" of the series, we can then assess the significance of the trend by making the null hypothesis that the trend, ''a'', is not different from 0. From the above discussion of trends in random data with known variance, we know the distribution of calculated trends to be expected from random (trendless) data. If the estimated trend, \hat{a}, is larger than the critical value for a certain significance level, then the estimated trend is deemed significantly different from zero at that significance level, and the null hypothesis of zero underlying trend is rejected.

The use of a linear trend line has been the subject of criticism, leading to a search for alternative approaches to avoid its use in model estimation. One of the alternative approaches involves unit root tests and the cointegration technique in econometric studies. The estimated coefficient associated with a linear trend variable such as time is interpreted as a measure of the impact of a number of unknown or known but unmeasurable factors on the dependent variable over one unit of time. Strictly speaking, that interpretation is applicable for the estimation time frame only; outside that time frame, one does not know how those unmeasurable factors behave, either qualitatively or quantitatively.

Furthermore, the linearity of the time trend poses many questions: (i) Why should it be linear? (ii) If the trend is non-linear, then under what conditions does its inclusion influence the magnitude as well as the statistical significance of the estimates of other parameters in the model? (iii) The inclusion of a linear time trend in a model precludes by assumption the presence of fluctuations in the tendencies of the dependent variable over time; is this necessarily valid in a particular context? (iv) Does a spurious relationship exist in the model because an underlying causative variable is itself time-trending?

Research results of mathematicians, statisticians, econometricians, and economists have been published in response to those questions. For example, detailed notes on the meaning of linear time trends in regression models are given in Cameron (2005); Granger, Engle and many other econometricians have written on stationarity, unit root testing, co-integration and related issues (a summary of some of the work in this area can be found in an information paper by the Royal Swedish Academy of Sciences (2003)); and Ho-Trieu & Tucker (1990) have written on logarithmic time trends, with results indicating that linear time trends are special cases of cycles.
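As a concrete illustration of the detrend-and-test procedure described above, here is a minimal Python sketch, taking the i.i.d.-normal-error assumption at face value; the series and its parameters are synthetic, and SciPy is assumed for the t-distribution.

```python
# A sketch of the trend-plus-noise significance test: fit a trend, detrend
# to estimate the noise variance from the residuals, then test H0: a = 0
# with the usual t-statistic on the slope.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
t = np.arange(100)
y = 0.05 * t + rng.normal(scale=2.0, size=t.size)  # trend plus noise (synthetic)

a_hat, b_hat = np.polyfit(t, y, 1)
residuals = y - (a_hat * t + b_hat)                # the detrended data

n = len(y)
sigma2 = residuals @ residuals / (n - 2)           # noise variance from residuals
se_a = np.sqrt(sigma2 / np.sum((t - t.mean()) ** 2))
t_stat = a_hat / se_a
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"slope = {a_hat:.4f}, t = {t_stat:.2f}, p = {p_value:.4g}")
```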


Example: noisy time series

It is harder to see a trend in a noisy time series. For example, if the true series is 0, 1, 2, 3, all plus some independent normally distributed "noise" ''e'' of standard deviation ''E'', and we have a sample series of length 50, then if ''E'' = 0.1 the trend will be obvious; if ''E'' = 100 the trend will probably be visible; but if ''E'' = 10000 the trend will be buried in the noise. Consider a concrete example, the global surface temperature record of the past 140 years as presented by the IPCC: the interannual variation is about 0.2 °C, and the trend is about 0.6 °C over 140 years, with 95% confidence limits of 0.2 °C (by coincidence, about the same value as the interannual variation). Hence the trend is statistically different from 0. However, as noted elsewhere, this time series does not conform to the assumptions necessary for least squares to be valid.
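A small sketch of this noise-level effect, with an assumed true slope of 1 and series length 50 as in the example above (the seed and printout are illustrative choices):

```python
# How the noise standard deviation E affects trend detectability:
# the same underlying trend, increasingly buried in noise.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(50)
for E in (0.1, 100.0, 10_000.0):
    y = t + rng.normal(scale=E, size=t.size)       # true series 0, 1, 2, ... plus noise
    a_hat = np.polyfit(t, y, 1)[0]
    se = E / np.sqrt(np.sum((t - t.mean()) ** 2))  # theoretical SE of the slope
    print(f"E = {E:>7}: estimated slope = {a_hat:10.3f} (SE ~ {se:.3f})")
```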


Goodness of fit (''r''-squared) and trend

The least-squares fitting process produces a value, ''r''-squared (''r''²), which is 1 minus the ratio of the variance of the residuals to the variance of the dependent variable. It says what fraction of the variance of the data is explained by the fitted trend line. It does not relate to the statistical significance of the trend line; the statistical significance of the trend is determined by its ''t''-statistic. Often, filtering a series increases ''r''² while making little difference to the fitted trend.
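The last point can be demonstrated directly. The sketch below uses a synthetic series and an 11-point moving average as the filter; both are illustrative choices, not part of the original text.

```python
# Smoothing (filtering) a series can raise r^2 substantially while
# leaving the fitted trend almost unchanged.
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(200)
y = 0.02 * t + rng.normal(size=t.size)

def fit_stats(series, times):
    """Return (slope, r-squared) of a least-squares line fit."""
    a, b = np.polyfit(times, series, 1)
    resid = series - (a * times + b)
    return a, 1 - resid.var() / series.var()

# 11-point centered moving average (shortens the series at the ends)
kernel = np.ones(11) / 11
y_smooth = np.convolve(y, kernel, mode="valid")
t_smooth = t[5:-5]

print("raw:      slope = %.4f, r^2 = %.3f" % fit_stats(y, t))
print("smoothed: slope = %.4f, r^2 = %.3f" % fit_stats(y_smooth, t_smooth))
```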


Real data may need more complicated models

Thus far the data have been assumed to consist of the trend plus noise, with the noise at each data point being independent and identically distributed random variables with a normal distribution. Real data (for example, climate data) may not fulfill these criteria. This is important, as it makes an enormous difference to the ease with which the statistics can be analysed so as to extract maximum information from the data series. If there are other non-linear effects that have a correlation to the independent variable (such as cyclic influences), the use of least-squares estimation of the trend is not valid. Also, where the variations are significantly larger than the resulting straight-line trend, the choice of start and end points can significantly change the result. That is, the model is mathematically misspecified. Statistical inferences (tests for the presence of trend, confidence intervals for the trend, etc.) are invalid unless departures from the standard assumptions are properly accounted for, for example as follows:

*Dependence: autocorrelated time series might be modeled using autoregressive moving average models.
*Non-constant variance: in the simplest cases weighted least squares might be used.
*Non-normal distribution for errors: in the simplest cases a generalised linear model might be applicable.
*Unit root: taking first (or occasionally second) differences of the data, with the level of differencing being identified through various unit root tests.

In R, the linear trend in data can be estimated by using the 'tslm' function of the 'forecast' package; a rough Python analogue for the dependent-errors case is sketched below.
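The sketch below is a Python analogue (an assumption; the article itself only names R's tslm) using statsmodels, with Newey-West (HAC) standard errors as one standard way to keep the trend's significance honest under the autocorrelated-errors case listed above.

```python
# Trend estimation with autocorrelated noise: naive OLS standard errors
# overstate significance, while HAC (Newey-West) errors account for
# the dependence. The AR(1) noise series is a synthetic illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
e = np.zeros(n)
for i in range(1, n):                      # AR(1) noise: dependent errors
    e[i] = 0.7 * e[i - 1] + rng.normal()
t = np.arange(n)
y = 0.01 * t + e

X = sm.add_constant(t)                     # columns: intercept, time
ols = sm.OLS(y, X).fit()                   # assumes i.i.d. errors (too optimistic)
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 10})

print("naive slope p-value:", ols.pvalues[1])
print("HAC   slope p-value:", hac.pvalues[1])
```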


Trends in clinical data

Medical and biomedical studies often seek to determine a link in sets of data, such as three different diseases. But data may also be linked in time (such as the change in the effect of a drug from baseline, to month 1, to month 2), or by an external factor that may or may not be determined by the researcher and/or their subject (such as no pain, mild pain, moderate pain, severe pain). In these cases one would expect the effect test statistic (e.g. the influence of a statin on levels of cholesterol, of an analgesic on the degree of pain, or of increasing doses of a drug on a measurable index) to change in direct order as the effect develops.

Suppose the mean level of cholesterol before and after the prescription of a statin falls from 5.6 mmol/L at baseline to 3.4 mmol/L at one month and to 3.7 mmol/L at two months. Given sufficient power, an ANOVA would most likely find a significant fall at one and two months, but the fall is not linear. Furthermore, a post-hoc test may be required. An alternative test may be repeated measures (two-way) ANOVA, or the Friedman test, depending on the nature of the data. Nevertheless, because the groups are ordered, a standard ANOVA is inappropriate. Should the cholesterol fall from 5.4 to 4.1 to 3.7, there is a clear linear trend. The same principle may be applied to the effects of allele/genotype frequency, where it could be argued that SNPs with genotypes XX, XY, YY are in fact a trend of no Y's, one Y, and then two Y's.

The mathematics of linear trend estimation is a variant of the standard ANOVA, giving different information, and would be the most appropriate test if the researchers are hypothesising a trend effect in their test statistic. One example is levels of serum trypsin in six groups of subjects ordered by age decade (10–19 years up to 60–69 years). Levels of trypsin (ng/mL) rise in a direct linear trend of 128, 152, 194, 207, 215, 218. Unsurprisingly, a 'standard' ANOVA gives ''p'' < 0.0001, whereas linear trend estimation gives ''p'' = 0.00006. Incidentally, it could reasonably be argued that, as age is a naturally continuous index, it should not be categorised into decades, and an effect of age on serum trypsin should instead be sought by correlation (assuming the raw data are available).

A further example is of a substance measured at four time points in different groups, with mean (SD) values of 1.6 (0.56), 1.94 (0.75), 2.22 (0.66) and 2.40 (0.79), which is a clear trend. ANOVA gives ''p'' = 0.091, because the variability within the groups is large relative to the differences between the means, whereas linear trend estimation gives ''p'' = 0.012. However, should the data have been collected at four time points in the same individuals, linear trend estimation would be inappropriate, and a two-way (repeated measures) ANOVA should be applied.
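As a sketch of this variant of ANOVA, a linear-trend contrast across ordered groups can be computed directly; the group data below are hypothetical (loosely echoing the cholesterol example), and the centered group scores used as contrast weights are a standard choice.

```python
# A linear-trend contrast across k ordered groups: the contrast weights are
# centered group scores, so the test asks whether the means rise (or fall)
# linearly across the ordering, rather than merely differ.
import numpy as np
from scipy import stats

groups = [
    np.array([5.4, 5.7, 5.1, 5.9]),        # e.g. baseline (hypothetical data)
    np.array([4.0, 4.3, 3.9, 4.2]),        # month 1
    np.array([3.6, 3.9, 3.5, 3.8]),        # month 2
]

k = len(groups)
ns = np.array([len(g) for g in groups])
means = np.array([g.mean() for g in groups])

# Pooled within-group variance (the ANOVA mean-square error)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_error = ns.sum() - k
mse = sse / df_error

c = np.arange(k) - (k - 1) / 2             # centered linear scores, e.g. (-1, 0, 1)

L = c @ means                              # estimated linear contrast
se = np.sqrt(mse * np.sum(c ** 2 / ns))    # standard error of the contrast
t_stat = L / se
p = 2 * stats.t.sf(abs(t_stat), df=df_error)
print(f"contrast = {L:.3f}, t = {t_stat:.2f}, p = {p:.4g}")
```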


See also

*Estimation
*Extrapolation
*Forecasting
*Least squares
*Least-squares spectral analysis
*Line fitting
*Prediction interval
*Regression analysis


References

*Ho-Trieu, N. L.; Tucker, J. (1990). DOI:10.22004/ag.econ.12288.