In statistics, point estimation involves the use of sample data to calculate a single value (known as a point estimate since it identifies a point in some parameter space) which is to serve as a "best guess" or "best estimate" of an unknown population parameter (for example, the population mean). More formally, it is the application of a point estimator to the data to obtain a point estimate.

Point estimation can be contrasted with interval estimation: such interval estimates are typically either confidence intervals, in the case of frequentist inference, or credible intervals, in the case of Bayesian inference. More generally, a point estimator can be contrasted with a set estimator; examples are given by confidence sets or credible sets. A point estimator can also be contrasted with a distribution estimator; examples are given by confidence distributions, randomized estimators, and Bayesian posteriors.


Properties of point estimates


Bias

“Bias” is defined as the difference between the expected value of the estimator and the true value of the population parameter being estimated. Equivalently, the closer the expected value of the estimator is to the true value of the parameter, the smaller the bias. When the expected value of the estimator equals the true value of the parameter, the estimator is called an ''unbiased estimator''. An unbiased estimator is a ''best unbiased estimator'' if it also has minimum variance. However, a biased estimator with a small variance may be more useful than an unbiased estimator with a large variance; most importantly, point estimators with the smallest mean squared error are preferred. If T = h(X1, X2, …, Xn) is an estimator based on a random sample X1, X2, …, Xn, then T is called an unbiased estimator for the parameter θ if E(T) = θ, irrespective of the value of θ. For example, from the same random sample we have E(x̄) = μ (the population mean) and E(s²) = σ² (the population variance), so x̄ and s² are unbiased estimators for μ and σ². The difference E(T) − θ is called the bias of T; if this difference is nonzero, T is called biased.
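
The difference can be illustrated by simulation. The following sketch (plain NumPy, with parameter values assumed purely for illustration, not taken from the text) approximates the bias of the sample mean and of two variance estimators, one dividing by n and one dividing by n − 1, by averaging over many repeated samples:

```python
import numpy as np

# Hypothetical population parameters chosen for illustration.
mu, sigma = 5.0, 2.0
n = 10            # sample size
reps = 100_000    # number of repeated samples
rng = np.random.default_rng(0)

samples = rng.normal(mu, sigma, size=(reps, n))

xbar = samples.mean(axis=1)                  # sample mean of each sample
var_n = samples.var(axis=1, ddof=0)          # divides by n
var_n1 = samples.var(axis=1, ddof=1)         # divides by n - 1

print("bias of sample mean:            ", xbar.mean() - mu)          # close to 0
print("bias of variance (divide by n): ", var_n.mean() - sigma**2)   # close to -sigma^2 / n
print("bias of variance (divide n - 1):", var_n1.mean() - sigma**2)  # close to 0
```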


Consistency

Consistency concerns whether the point estimate stays close to the true value of the parameter as the sample size increases: the larger the sample size, the more accurate the estimate. If a point estimator is consistent, its expected value should approach the true value of the parameter and its variance should approach zero as the sample size grows. In particular, an unbiased estimator T is consistent if the limit of the variance of T equals zero as the sample size tends to infinity.
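
A minimal simulation sketch (NumPy, with an assumed exponential population) showing that the sample mean concentrates around the true mean as the sample size grows, as consistency requires:

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean = 2.0   # assumed mean of the exponential population used for illustration

for n in (10, 100, 1_000, 10_000):
    # 5,000 repeated samples of size n; take the sample mean of each sample
    xbar = rng.exponential(scale=true_mean, size=(5_000, n)).mean(axis=1)
    print(f"n={n:6d}  mean of x-bar ~ {xbar.mean():.3f}  variance of x-bar ~ {xbar.var():.6f}")

# The variance of the sample mean shrinks roughly like 1/n, so the
# sample mean is a consistent estimator of the population mean.
```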


Efficiency

Let ''T''1 and ''T''2 be two unbiased estimators for the same parameter ''θ''. The estimator ''T''2 is called ''more efficient'' than ''T''1 if Var(''T''2) < Var(''T''1), irrespective of the value of ''θ''. In other words, the most efficient estimator is the one whose values vary least from sample to sample; among unbiased estimators, the one with the smallest variance is the most efficient. The notion of efficiency can be extended by saying that ''T''2 is more efficient than ''T''1 (for the same parameter of interest) if the mean squared error (MSE) of ''T''2 is smaller than the MSE of ''T''1. In general, the distribution of the population must be considered when assessing the efficiency of estimators. For example, in a normal distribution the mean is more efficient than the median, but the same does not apply in asymmetrical, or skewed, distributions.
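
This can be checked by a small simulation (NumPy, with an assumed standard normal population): under normality, the sampling variance of the mean is smaller than that of the median, illustrating its greater efficiency.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 25, 50_000

samples = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
var_mean = samples.mean(axis=1).var()          # sampling variance of the sample mean
var_median = np.median(samples, axis=1).var()  # sampling variance of the sample median

print("Var(mean)   ~", var_mean)     # roughly 1/n = 0.04
print("Var(median) ~", var_median)   # roughly (pi/2)/n ~ 0.063
print("relative efficiency of the median ~", var_mean / var_median)
```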


Sufficiency

The job of a statistician is to interpret the data that have been collected and to draw statistically valid conclusions about the population under investigation. In many cases, however, the raw data, which may be too numerous and too costly to store, are not suitable for this purpose. The statistician therefore wants to condense the data by computing some statistics and to base the analysis on these statistics, provided that no relevant information is lost in doing so; that is, the statistician wants to choose statistics that exhaust all the information about the parameter contained in the sample. Sufficient statistics are defined as follows: let X = (X1, X2, …, Xn) be a random sample. A statistic T(X) is said to be sufficient for θ (or for the family of distributions) if the conditional distribution of X given T is free of θ. For example, for a random sample from a Bernoulli(θ) population, the sample total ΣXi is sufficient for θ.


Types of point estimation


Bayesian point estimation

Bayesian inference is typically based on the posterior distribution. Many Bayesian point estimators are statistics of central tendency of the posterior distribution, e.g., its mean, median, or mode:

* Posterior mean, which minimizes the (posterior) ''risk'' (expected loss) for a squared-error loss function; in Bayesian estimation, the risk is defined in terms of the posterior distribution, as observed by Gauss.
* Posterior median, which minimizes the posterior risk for the absolute-value loss function, as observed by Laplace.
* Maximum a posteriori (''MAP''), which finds a maximum of the posterior distribution; for a uniform prior probability, the MAP estimator coincides with the maximum-likelihood estimator. The MAP estimator has good asymptotic properties, even for many difficult problems on which the maximum-likelihood estimator has difficulties; for regular problems, where the maximum-likelihood estimator is consistent, the maximum-likelihood estimator ultimately agrees with the MAP estimator. Bayesian estimators are admissible, by Wald's theorem.

The minimum message length (MML) point estimator is based in Bayesian information theory and is not so directly related to the posterior distribution.

Special cases of Bayesian filters are important:
* Kalman filter
* Wiener filter

Several methods of computational statistics have close connections with Bayesian analysis:
* Particle filter
* Markov chain Monte Carlo (MCMC)
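
As a small illustration (a hypothetical beta–binomial model using SciPy; the prior and data below are assumptions, not taken from the text), the three point estimators above can be computed from the same posterior distribution:

```python
from scipy import stats

# Assumed example: Beta(2, 2) prior on a success probability theta,
# with 7 successes observed in 10 Bernoulli trials.
a0, b0 = 2, 2
successes, trials = 7, 10

# The beta prior is conjugate to the binomial likelihood, so the posterior is beta:
a, b = a0 + successes, b0 + trials - successes
posterior = stats.beta(a, b)

post_mean = posterior.mean()          # minimizes posterior squared-error loss
post_median = posterior.median()      # minimizes posterior absolute-error loss
post_mode = (a - 1) / (a + b - 2)     # MAP estimate (mode of Beta(a, b) when a, b > 1)

print("posterior mean:  ", post_mean)
print("posterior median:", post_median)
print("MAP (mode):      ", post_mode)
```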


Methods of finding point estimates

Below are some commonly used methods of estimating unknown parameters which can be expected to provide estimators having some of these important properties. Depending on the situation and the purpose of the study, any one of the methods of point estimation that is suitable may be applied.


Method of maximum likelihood (MLE)

The method of maximum likelihood, due to R. A. Fisher, is the most important general method of estimation. It seeks the values of the unknown parameters that maximize the likelihood function: assuming a known model (for example, the normal distribution), it finds the parameter values under which the observed data are most probable, and thus the most suitable match for the data. Let X = (X1, X2, …, Xn) denote a random sample with joint p.d.f. or p.m.f. f(x, θ) (θ may be a vector). The function f(x, θ), considered as a function of θ, is called the likelihood function and is denoted by L(θ). The principle of maximum likelihood consists of choosing an estimate, within the admissible range of θ, that maximizes the likelihood; this estimate is called the maximum likelihood estimate (MLE) of θ. To obtain the MLE of θ, one solves d log L(θ)/dθ = 0; if θ = (θ1, …, θk) is a vector, the partial derivatives ∂ log L(θ)/∂θi = 0, i = 1, 2, …, k, give the likelihood equations.
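
A minimal sketch (NumPy/SciPy, with an assumed normal model and synthetic data) that maximizes the log-likelihood numerically and compares the result with the closed-form MLEs, the sample mean and the divide-by-n sample standard deviation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=3.0, size=200)   # assumed data-generating values

def neg_log_likelihood(params):
    mu, log_sigma = params                       # optimize log(sigma) so that sigma stays positive
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_log_likelihood, x0=[x.mean(), 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

print("numerical MLE:  ", mu_hat, sigma_hat)
print("closed-form MLE:", x.mean(), x.std(ddof=0))   # the MLE of sigma divides by n, not n - 1
```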


Method of moments (MOM)

The method of moments was introduced by K. Pearson and P. Chebyshev in 1887, and it is one of the oldest methods of estimation. The method relies on the law of large numbers: the known expressions for the population moments in terms of the unknown parameters are equated with the corresponding sample moments, and the resulting equations are solved for the parameters. Because of its simplicity, however, the method is not always accurate and its estimators can easily be biased. Let (X1, X2, …, Xn) be a random sample from a population having p.d.f. (or p.m.f.) f(x, θ), θ = (θ1, θ2, …, θk). The objective is to estimate the parameters θ1, θ2, …, θk. Further, let the first k population moments about zero exist as explicit functions of θ, i.e. μr = μr(θ1, θ2, …, θk), r = 1, 2, …, k. In the method of moments, we equate k sample moments with the corresponding population moments. Generally, the first k moments are taken because the errors due to sampling increase with the order of the moment. This gives the k equations μr(θ1, θ2, …, θk) = mr, r = 1, 2, …, k, where mr = (1/n) Σ Xi^r is the r-th sample moment. Solving these equations yields the method-of-moments estimators (or estimates). See also the generalized method of moments.
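
A short sketch of the method of moments (NumPy, with an assumed gamma-distributed sample): the population mean kθ and variance kθ² of a Gamma(k, θ) distribution are equated with the sample mean and variance and solved for k and θ.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.gamma(shape=2.5, scale=1.5, size=5_000)   # assumed true values: k = 2.5, theta = 1.5

# Population moments of Gamma(k, theta): E[X] = k * theta, Var(X) = k * theta**2.
m1 = x.mean()   # first sample moment
v = x.var()     # second central sample moment

theta_hat = v / m1        # solve the two moment equations
k_hat = m1**2 / v

print("method-of-moments estimates:", k_hat, theta_hat)
```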


Method of least squares

In the method of least squares, parameters are estimated using a specified form of the expectation and the second moment of the observations. To fit a curve of the form y = f(x; β0, β1, …, βp) to the data (xi, yi), i = 1, 2, …, n, the method of least squares minimizes the sum of squared deviations between the observed yi and the fitted values. When f(x; β0, β1, …, βp) is a linear function of the parameters and the x-values are known, the least squares estimators are best linear unbiased estimators (BLUE). If, in addition, the errors are assumed to be independently and identically normally distributed, the least squares estimator of a linear parameter is the minimum-variance unbiased estimator (MVUE) within the entire class of unbiased estimators. See also minimum mean squared error (MMSE).
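
A minimal ordinary least squares sketch (NumPy, with assumed synthetic data) fitting the straight line y = β0 + β1x by minimizing the sum of squared residuals in closed form:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)   # assumed true line: beta0 = 1, beta1 = 2

# Design matrix with an intercept column; least squares solves min ||y - X b||^2.
X = np.column_stack([np.ones_like(x), x])
beta_hat, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print("estimated intercept and slope:", beta_hat)
```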


Minimum-variance mean-unbiased estimator (MVUE)

A minimum-variance unbiased estimator (MVUE) minimizes the risk (expected loss) of the squared-error loss function among all unbiased estimators.


Median unbiased estimator

A median-unbiased estimator minimizes the risk of the absolute-error loss function.


Best linear unbiased estimator (BLUE)

The best linear unbiased estimator (BLUE) is characterized by the Gauss–Markov theorem, which states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, provided that the errors in the linear regression model are uncorrelated, have equal variances, and have an expected value of zero.


Point estimate vs. confidence interval estimate

There are two major types of estimates: point estimates and confidence interval estimates. In point estimation we try to choose a unique point in the parameter space which can reasonably be considered the true value of the parameter. In interval estimation, instead of a unique estimate of the parameter, we are interested in constructing a family of sets that contain the true (unknown) parameter value with a specified probability. In many problems of statistical inference we are not interested only in estimating the parameter or testing some hypothesis concerning the parameter; we also want a lower bound, an upper bound, or both, for the real-valued parameter. To do this, we construct a confidence interval.

A confidence interval describes how reliable an estimate is; its upper and lower confidence limits are computed from the observed data. Suppose a dataset x1, …, xn is given, modeled as a realization of random variables X1, …, Xn. Let θ be the parameter of interest and γ a number between 0 and 1. If there exist sample statistics Ln = g(X1, …, Xn) and Un = h(X1, …, Xn) such that P(Ln < θ < Un) = γ for every value of θ, then (ln, un), where ln = g(x1, …, xn) and un = h(x1, …, xn), is called a 100γ% confidence interval for θ. The number γ is called the confidence level.

In general, with a normally distributed sample mean x̄ and a known value of the standard deviation σ, a 100(1 − α)% confidence interval for the true mean μ is formed by taking x̄ ± e, with e = z1−α/2 (σ/√n), where z1−α/2 is the 100(1 − α/2)% point of the standard normal distribution and n is the sample size. For example, z1−α/2 equals 1.96 for 95% confidence.

Here two limits, say ln and un, are computed from the set of observations, and it is claimed with a certain degree of confidence (measured in probabilistic terms) that the true value of θ lies between ln and un. Thus we obtain an interval (ln, un) which we expect to include the true value of θ; this type of estimation is therefore called confidence interval estimation. It provides a range of values in which the parameter is expected to lie, generally gives more information than a point estimate, and is preferred when making inferences. In this sense, point estimation can be regarded as the opposite of interval estimation.
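
A short sketch (NumPy/SciPy, with an assumed known σ and synthetic data) computing the 100(1 − α)% interval x̄ ± z1−α/2 (σ/√n) described above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
sigma = 2.0                                    # assumed known population standard deviation
x = rng.normal(loc=7.0, scale=sigma, size=40)  # assumed sample

alpha = 0.05
z = norm.ppf(1 - alpha / 2)                    # about 1.96 for 95% confidence
e = z * sigma / np.sqrt(x.size)

xbar = x.mean()
print("point estimate:         ", xbar)
print("95% confidence interval:", (xbar - e, xbar + e))
```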


See also

* Algorithmic inference
* Binomial distribution
* Confidence distribution
* Induction (philosophy)
* Interval estimation
* Philosophy of statistics
* Predictive inference


Further reading

* Liese, Friedrich; Miescke, Klaus-J. (2008). ''Statistical Decision Theory: Estimation, Testing, and Selection''. Springer.