Univariate is a term commonly used in statistics to describe a type of data which consists of observations on only a single characteristic or attribute. A simple example of univariate data would be the salaries of workers in an industry. Like all other data, univariate data can be visualized using graphs, images or other analysis tools after the data is measured, collected, reported, and analyzed.

Data types

Some univariate data consists of numbers (such as a height of 65 inches or a weight of 100 pounds), while other data is non-numerical (such as eye colors of brown or blue). Generally, the terms categorical univariate data and numerical univariate data are used to distinguish between these types.

Categorical univariate data

Categorical univariate data consists of non-numerical observations that may be placed in categories. It includes labels or names used to identify an attribute of each element. Categorical univariate data usually uses either a nominal or an ordinal scale of measurement.

Numerical univariate data

Numerical univariate data consists of observations that are numbers. They are obtained using either an interval or a ratio scale of measurement. This type of univariate data can be classified further into two subcategories: discrete and continuous. Numerical univariate data is discrete if the set of all possible values is finite or countably infinite. Discrete univariate data is usually associated with counting (such as the number of books read by a person). Numerical univariate data is continuous if the set of all possible values is an interval of numbers. Continuous univariate data is usually associated with measuring (such as the weights of people).

Data analysis and applications

Univariate analysis is the simplest form of analyzing data. ''Uni'' means "one", so the data has only one variable (''univariate''). Univariate data requires each variable to be analyzed separately. Data is gathered for the purpose of answering a question, or more specifically, a research question. Univariate data does not answer research questions about relationships between variables; rather, it is used to describe one characteristic or attribute that varies from observation to observation. A researcher usually has one of two purposes: the first is to answer a research question with a descriptive study, and the second is to learn how an attribute varies with the individual effect of a variable in regression analysis. Patterns found in univariate data can be described with graphical methods, measures of central tendency and measures of variability. Like other forms of statistics, univariate analysis can be inferential or descriptive. The key fact is that only one variable is involved. Univariate analysis can yield misleading results in cases in which multivariate analysis is more appropriate.

Measures of central tendency

Central tendency is one of the most common numerical descriptive measures. It is used to estimate the central location of the univariate data by the calculation of the mean, median and mode. Each of these calculations has its own advantages and limitations. The mean has the advantage that its calculation includes each value of the data set, but it is particularly susceptible to the influence of outliers. The median is a better measure when the data set contains outliers. The mode is simple to locate. One is not restricted to using only one of these measures of central tendency. If the data being analyzed is categorical, then the only measure of central tendency that can be used is the mode. However, if the data is numerical in nature (ordinal or interval/ratio), then the mode, median, or mean can all be used to describe the data. Using more than one of these measures provides a more accurate descriptive summary of central tendency for the univariate data.
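As a minimal sketch of the three measures, the following Python example uses the standard library `statistics` module. The salary figures and eye colors are hypothetical values chosen for illustration; the last salary is deliberately an outlier, to show how it pulls the mean but not the median.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [30, 32, 32, 35, 38, 40, 250]

mean_salary = statistics.mean(salaries)      # uses every value, so the outlier pulls it up
median_salary = statistics.median(salaries)  # middle value of the sorted data
mode_salary = statistics.mode(salaries)      # most frequent value

print(f"mean={mean_salary:.2f} median={median_salary} mode={mode_salary}")

# The mode is also defined for categorical data, where mean and median are not.
eye_colors = ["brown", "blue", "brown", "green", "brown"]  # hypothetical
mode_color = statistics.mode(eye_colors)
print(f"most common eye color: {mode_color}")
```

Here the mean is roughly 65 while the median is 35, which illustrates why the median is often preferred when outliers are present.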

Measures of variability

A measure of variability or dispersion (deviation from the mean) of a univariate data set can reveal the shape of a univariate data distribution more fully. It provides information about the variation among data values. The measures of variability together with the measures of central tendency give a better picture of the data than the measures of central tendency alone. The three most frequently used measures of variability are the range, variance and standard deviation. The appropriateness of each measure depends on the type of data, the shape of the distribution of data and which measure of central tendency is being used. If the data is categorical, then there is no measure of variability to report. For data that is numerical, all three measures are possible. If the distribution of data is symmetrical, then the measures of variability are usually the variance and standard deviation. However, if the data is skewed, then the measure of variability that would be appropriate for that data set is the range.
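The three measures can be computed as in the sketch below. The weights are hypothetical, and the choice of the sample (n − 1) variance is an assumption of this example, since the text does not say whether the sample or population form is meant; Python's `statistics.pvariance` gives the population form.

```python
import statistics

# Hypothetical weights in kilograms.
weights = [60, 62, 65, 68, 70]

data_range = max(weights) - min(weights)         # largest value minus smallest value
sample_variance = statistics.variance(weights)   # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(weights)            # square root of the sample variance
population_variance = statistics.pvariance(weights)  # sum of squared deviations / n

print(f"range={data_range} variance={sample_variance} sd={sample_sd:.3f}")
```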

Descriptive methods

Descriptive statistics describe a sample or population. They can be part of exploratory data analysis. The appropriate statistic depends on the level of measurement. For nominal variables, a frequency table and a listing of the mode(s) is sufficient. For ordinal variables the median can be calculated as a measure of central tendency and the range (and variations of it) as a measure of dispersion. For interval level variables, the arithmetic mean (average) and standard deviation are added to the toolbox and, for ratio level variables, we add the geometric mean and harmonic mean as measures of central tendency and the coefficient of variation as a measure of dispersion. For interval and ratio level data, further descriptors include the variable's skewness and kurtosis.
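The ratio-level descriptors can be sketched in Python as follows. The data values are hypothetical, and the helper names `moment_skewness` and `moment_kurtosis` are introduced here for illustration only. They implement the simple moment-based definitions (third and fourth central moments divided by powers of the second); other conventions, such as bias-corrected or excess kurtosis, give different numbers.

```python
import statistics

# Hypothetical positive ratio-level data.
values = [1, 2, 4, 8]

geometric = statistics.geometric_mean(values)  # n-th root of the product
harmonic = statistics.harmonic_mean(values)    # n divided by the sum of reciprocals
arithmetic = statistics.mean(values)

# Coefficient of variation: standard deviation relative to the mean.
cv = statistics.stdev(values) / arithmetic


def central_moment(xs, k):
    """k-th central moment with divisor n (illustrative helper)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def moment_skewness(xs):
    """Third central moment scaled by the 3/2 power of the second."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def moment_kurtosis(xs):
    """Fourth central moment scaled by the square of the second (not excess)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


print(f"geometric={geometric:.3f} harmonic={harmonic:.3f} cv={cv:.3f}")
print(f"skewness={moment_skewness(values):.3f} kurtosis={moment_kurtosis(values):.3f}")
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean; the example values follow that ordering.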

Inferential methods

Inferential methods allow us to infer from a sample to a population. For a nominal variable a one-way chi-square (goodness of fit) test can help determine whether the distribution in our sample matches that of some population. For interval and ratio level data, a one-sample t-test can let us infer whether the mean in our sample matches some proposed number (typically 0). Other available tests of location include the one-sample sign test and the Wilcoxon signed rank test.
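The test statistics behind the first two tests can be sketched with the standard library alone. The function names and data below are hypothetical, and the sketch stops at the statistic: turning it into a p-value requires the t or chi-square reference distribution, for which a statistics library would normally be used.

```python
import math
import statistics


def one_sample_t(sample, mu0=0.0):
    """t statistic for H0: population mean equals mu0 (illustrative helper)."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical measurements tested against a proposed mean of 2.
t_stat = one_sample_t([2, 4, 6], mu0=2)

# Hypothetical category counts tested against equal expected counts.
chi2 = chi_square_gof(observed=[18, 22, 20], expected=[20, 20, 20])

print(f"t={t_stat:.3f} chi2={chi2:.3f}")
```

The t statistic has n − 1 degrees of freedom, and the goodness-of-fit statistic has one fewer degree of freedom than the number of categories.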

Graphical methods

The most frequently used graphical illustrations for univariate data are:

Frequency distribution tables

The frequency of an observation is the number of times that observation occurs in the data. For example, if the number 9 appears five times in a list of numbers, then the frequency of the number 9 in that data set is 5.
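A frequency distribution table can be built by counting each distinct value, as in this minimal sketch. The list of observations is hypothetical, constructed so that the number 9 occurs five times.

```python
from collections import Counter

# Hypothetical observations in which 9 occurs five times.
observations = [1, 9, 3, 9, 4, 9, 6, 9, 8, 9, 1, 3]

frequency = Counter(observations)  # maps each distinct value to its count

# Print the table in ascending order of value.
print("value  frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```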

Bar charts

A bar chart is a graph consisting of rectangular bars. These bars represent the number or percentage of observations of existing categories in a variable. The length or height of the bars gives a visual representation of the proportional differences among categories.
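The idea can be sketched as a text-only bar chart in which the length of each bar equals the category count. The eye-color data is hypothetical, and a plotting library would normally be used for a real chart.

```python
from collections import Counter

# Hypothetical categorical observations.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(eye_colors)
total = sum(counts.values())

# One line per category, most frequent first; bar length equals the count.
lines = []
for category, count in counts.most_common():
    percent = 100 * count / total
    lines.append(f"{category:<6} {'#' * count} {count} ({percent:.1f}%)")

print("\n".join(lines))
```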

Histograms

Histograms are used to estimate the distribution of the data, with the frequency of values assigned to a value range called a bin.
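Binning can be sketched as below. The function name `histogram_counts`, the use of equal-width bins, and the height data are assumptions of this example. It also assumes that the values are not all identical, since the bin width would otherwise be zero.

```python
def histogram_counts(values, num_bins):
    """Count values in equal-width bins spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins  # assumes high > low
    counts = [0] * num_bins
    for v in values:
        # Bins are half-open [left, right); the maximum goes into the last bin.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical heights in centimeters.
heights = [150, 152, 155, 160, 161, 163, 168, 170, 171, 180]
edges, counts = histogram_counts(heights, 3)

for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {'#' * count}")
```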

Pie charts

A pie chart is a circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
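Each portion's angle is its relative frequency multiplied by 360 degrees. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
def pie_slices(counts):
    """Map each category to its slice angle in degrees (illustrative helper)."""
    total = sum(counts.values())
    return {category: 360 * count / total for category, count in counts.items()}


# Hypothetical category counts.
slices = pie_slices({"brown": 3, "blue": 2, "green": 1})
print(slices)
```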

Distributions

A univariate distribution is the probability distribution of a single random variable, described either with a probability mass function (pmf) for a discrete probability distribution, or a probability density function (pdf) for a continuous probability distribution (Samaniego, Francisco J. (2014). Stochastic Modeling and Mathematical Statistics: A Text for Statisticians and Quantitative Scientists. Boca Raton: CRC Press. p. 167. ISBN 978-1-4665-6046-8). It is not to be confused with a multivariate distribution.
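The difference between a pmf and a pdf can be illustrated with one discrete and one continuous distribution. In this sketch the binomial and normal distributions, the parameter values, and the function names are choices made for illustration. A pmf gives probabilities that sum to 1 over the possible values, whereas a pdf gives densities whose integral is 1; the integral is approximated here by a simple Riemann sum.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# The pmf values over all possible outcomes sum to 1.
pmf_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# The pdf integrates to 1; approximate the integral over [-6, 6].
step = 0.001
pdf_area = sum(normal_pdf(-6 + i * step) * step for i in range(12000))

print(f"pmf total={pmf_total:.6f} pdf area={pdf_area:.6f}")
```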

Common discrete distributions

* Uniform distribution (discrete)
* Bernoulli distribution
* Binomial distribution
* Geometric distribution
* Negative binomial distribution
* Poisson distribution
* Hypergeometric distribution
* Zeta distribution

Common continuous distributions

* Uniform distribution (continuous)
* Normal distribution
* Gamma distribution
* Exponential distribution
* Weibull distribution
* Cauchy distribution
* Beta distribution
