Univariate is a term commonly used in statistics to describe a type of data which consists of observations on only a single characteristic or attribute. A simple example of univariate data would be the salaries of workers in industry. Like all the other data, univariate data can be visualized using graphs, images or other analysis tools after the data is measured, collected, reported, and analyzed.
Data types
Some univariate data consists of numbers (such as the height of 65 inches or the weight of 100 pounds), while others are nonnumerical (such as eye colors of brown or blue). Generally, the terms
categorical univariate data and
numerical univariate data are used to distinguish between these types.
Categorical univariate data
Categorical univariate data consists of non-numerical
observation
Observation in the natural sciences is an act or instance of noticing or perceiving and the acquisition of information from a primary source. In living beings, observation employs the senses. In science, observation can also involve the percep ...
s that may be placed in categories. It includes labels or names used to identify an attribute of each element. Categorical univariate data usually use either
nominal or
ordinal scale of measurement.
Numerical univariate data
Numerical univariate data consists of observations that are numbers. They are obtained using either
interval or
ratio
In mathematics, a ratio () shows how many times one number contains another. For example, if there are eight oranges and six lemons in a bowl of fruit, then the ratio of oranges to lemons is eight to six (that is, 8:6, which is equivalent to the ...
scale of measurement. This type of univariate data can be classified even further into two subcategories:
discrete
Discrete may refer to:
*Discrete particle or quantum in physics, for example in quantum theory
* Discrete device, an electronic component with just one circuit element, either passive or active, other than an integrated circuit
* Discrete group, ...
and
continuous
Continuity or continuous may refer to:
Mathematics
* Continuity (mathematics), the opposing concept to discreteness; common examples include
** Continuous probability distribution or random variable in probability and statistics
** Continuous ...
.
A numerical univariate data is discrete if the set of all possible values is
finite
Finite may refer to:
* Finite set, a set whose cardinality (number of elements) is some natural number
* Finite verb, a verb form that has a subject, usually being inflected or marked for person and/or tense or aspect
* "Finite", a song by Sara Gr ...
or countably
infinite. Discrete univariate data are usually associated with counting (such as the number of books read by a person). A numerical univariate data is continuous if the set of all possible values is an interval of numbers. Continuous univariate data are usually associated with measuring (such as the weights of people).
Data analysis and applications
Univariate analysis is the simplest form of analyzing data. ''Uni'' means "one", so the data has only one variable (''
univariate
In mathematics, a univariate object is an expression (mathematics), expression, equation, function (mathematics), function or polynomial involving only one Variable (mathematics), variable. Objects involving more than one variable are ''wikt:multi ...
'').
Univariate data requires to analyze each
variable separately. Data is gathered for the purpose of answering a question, or more specifically, a research question. Univariate data does not answer research questions about relationships between variables, but rather it is used to describe one characteristic or attribute that varies from observation to observation.
Usually there are two purposes that a researcher can look for. The first one is to answer a research question with descriptive study and the second one is to get knowledge about how
attribute varies with individual effect of a variable in
regression analysis. There are some ways to describe patterns found in univariate data which include graphical methods, measures of central tendency and measures of variability.
Like other forms of statistics, it can be
inferential or
descriptive
In the study of language, description or descriptive linguistics is the work of objectively analyzing and describing how language is actually used (or how it was used in the past) by a speech community. François & Ponsonnet (2013).
All aca ...
. The key fact is that only one variable is involved.
Univariate analysis can yield misleading results in cases in which
multivariate analysis
Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable, i.e., '' multivariate random variables''.
Multivariate statistics concerns understanding the differ ...
is more appropriate.
Measures of central tendency
Central tendency is one of the most common numerical descriptive measures. It is used to estimate the central location of the univariate data by the calculation of
mean
A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statist ...
,
median
The median of a set of numbers is the value separating the higher half from the lower half of a Sample (statistics), data sample, a statistical population, population, or a probability distribution. For a data set, it may be thought of as the “ ...
and
mode. Each of these calculations has its own advantages and limitations. The mean has the advantage that its calculation includes each value of the data set, but it is particularly susceptible to the influence of
outlier
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are ...
s. The median is a better measure when the data set contains outliers. The mode is simple to locate.
One is not restricted to using only one of these measures of central tendency. If the data being analyzed is categorical, then the only measure of central tendency that can be used is the mode. However, if the data is numerical in nature (
ordinal or
interval/
ratio
In mathematics, a ratio () shows how many times one number contains another. For example, if there are eight oranges and six lemons in a bowl of fruit, then the ratio of oranges to lemons is eight to six (that is, 8:6, which is equivalent to the ...
) then the mode, median, or mean can all be used to describe the data. Using more than one of these measures provides a more accurate descriptive summary of central tendency for the univariate.
Measures of variability
A measure of
variability or
dispersion (deviation from the mean) of a univariate data set can reveal the shape of a univariate data distribution more sufficiently. It will provide some information about the variation among data values. The measures of variability together with the measures of central tendency give a better picture of the data than the measures of central tendency alone. The three most frequently used measures of variability are
range
Range may refer to:
Geography
* Range (geographic), a chain of hills or mountains; a somewhat linear, complex mountainous or hilly area (cordillera, sierra)
** Mountain range, a group of mountains bordered by lowlands
* Range, a term used to i ...
,
variance
In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion ...
and
standard deviation
In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...
. The appropriateness of each measure would depend on the type of data, the shape of the distribution of data and which measure of central tendency are being used. If the data is categorical, then there is no measure of variability to report. For data that is numerical, all three measures are possible. If the distribution of data is symmetrical, then the measures of variability are usually the variance and standard deviation. However, if the data are
skewed
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.
For a unimodal ...
, then the measure of variability that would be appropriate for that data set is the range.
Descriptive methods
Descriptive statistics describe a sample or population. They can be part of
exploratory data analysis
In statistics, exploratory data analysis (EDA) is an approach of data analysis, analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. A statistical model can be used or ...
.
The appropriate statistic depends on the
level of measurement
Level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables. Psychologist Stanley Smith Stevens developed the best-known classification with four levels, or scale ...
. For nominal variables, a
frequency table
In statistics, the frequency or absolute frequency of an event i is the number n_i of times the observation has occurred/been recorded in an experiment or study. These frequencies are often depicted graphically or tabular form.
Types
The cumula ...
and a listing of the
mode(s) is sufficient. For ordinal variables the
median
The median of a set of numbers is the value separating the higher half from the lower half of a Sample (statistics), data sample, a statistical population, population, or a probability distribution. For a data set, it may be thought of as the “ ...
can be calculated as a measure of
central tendency
In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution.Weisberg H.F (1992) ''Central Tendency and Variability'', Sage University Paper Series on Quantitative Applications in ...
and the
range
Range may refer to:
Geography
* Range (geographic), a chain of hills or mountains; a somewhat linear, complex mountainous or hilly area (cordillera, sierra)
** Mountain range, a group of mountains bordered by lowlands
* Range, a term used to i ...
(and variations of it) as a measure of dispersion. For interval level variables, the
arithmetic mean
In mathematics and statistics, the arithmetic mean ( ), arithmetic average, or just the ''mean'' or ''average'' is the sum of a collection of numbers divided by the count of numbers in the collection. The collection is often a set of results fr ...
(average) and
standard deviation
In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...
are added to the toolbox and, for ratio level variables, we add the
geometric mean
In mathematics, the geometric mean is a mean or average which indicates a central tendency of a finite collection of positive real numbers by using the product of their values (as opposed to the arithmetic mean which uses their sum). The geometri ...
and
harmonic mean
In mathematics, the harmonic mean is a kind of average, one of the Pythagorean means.
It is the most appropriate average for ratios and rate (mathematics), rates such as speeds, and is normally only used for positive arguments.
The harmonic mean ...
as measures of central tendency and the
coefficient of variation
In probability theory and statistics, the coefficient of variation (CV), also known as normalized root-mean-square deviation (NRMSD), percent RMS, and relative standard deviation (RSD), is a standardized measure of dispersion of a probability ...
as a measure of dispersion.
For interval and ratio level data, further descriptors include the variable's skewness and
kurtosis
In probability theory and statistics, kurtosis (from , ''kyrtos'' or ''kurtos'', meaning "curved, arching") refers to the degree of “tailedness” in the probability distribution of a real-valued random variable. Similar to skewness, kurtos ...
.
Inferential methods
Inferential methods allow us to infer from a sample to a population.
For a nominal variable a one-way chi-square (goodness of fit) test can help determine if our sample matches that of some population. For interval and ratio level data, a
one-sample t-test can let us infer whether the mean in our sample matches some proposed number (typically 0). Other available tests of location include the one-sample
sign test
The sign test is a statistical test for consistent differences between pairs of observations, such as the weight of subjects before and after treatment. Given pairs of observations (such as weight pre- and post-treatment) for each subject, the sign ...
and
Wilcoxon signed rank test.
Graphical methods
The most frequently used graphical illustrations for univariate data are:
Frequency distribution tables
Frequency is how many times a number occurs. The frequency of an observation in statistics tells us the number of times the observation occurs in the data. For example, in the following list of numbers , the frequency of the number 9 is 5 (because it occurs 5 times in this data set).
Bar charts

Bar chart is a
graph
Graph may refer to:
Mathematics
*Graph (discrete mathematics), a structure made of vertices and edges
**Graph theory, the study of such graphs and their properties
*Graph (topology), a topological space resembling a graph in the sense of discret ...
consisting of
rectangular
In Euclidean plane geometry, a rectangle is a rectilinear convex polygon or a quadrilateral with four right angles. It can also be defined as: an equiangular quadrilateral, since equiangular means that all of its angles are equal (360°/4 = 90 ...
bars. These bars actually represents
number
A number is a mathematical object used to count, measure, and label. The most basic examples are the natural numbers 1, 2, 3, 4, and so forth. Numbers can be represented in language with number words. More universally, individual numbers can ...
or percentage of observations of existing categories in a variable. The
length
Length is a measure of distance. In the International System of Quantities, length is a quantity with Dimension (physical quantity), dimension distance. In most systems of measurement a Base unit (measurement), base unit for length is chosen, ...
or
height
Height is measure of vertical distance, either vertical extent (how "tall" something or someone is) or vertical position (how "high" a point is). For an example of vertical extent, "This basketball player is 7 foot 1 inches in height." For an e ...
of bars gives a visual representation of the proportional differences among categories.
Histograms
Histograms
A histogram is a visual representation of the distribution of quantitative data. To construct a histogram, the first step is to "bin" (or "bucket") the range of values— divide the entire range of values into a series of intervals—and then ...
are used to estimate distribution of the data, with the frequency of values assigned to a value range called a
bin.
Pie charts

Pie chart is a circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
Distributions
Univariate distribution In statistics, a univariate distribution is a probability distribution of only one random variable. This is in contrast to a multivariate distribution, the probability distribution of a random vector (consisting of multiple random variables).
Exam ...
is a dispersal type of a single random variable described either with a
probability mass function
In probability and statistics, a probability mass function (sometimes called ''probability function'' or ''frequency function'') is a function that gives the probability that a discrete random variable is exactly equal to some value. Sometimes i ...
(pmf) for
discrete probability distribution
In probability theory and statistics, a probability distribution is a function that gives the probabilities of occurrence of possible events for an experiment. It is a mathematical description of a random phenomenon in terms of its sample spa ...
, or
probability density function
In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a Function (mathematics), function whose value at any given sample (or point) in the sample space (the s ...
(pdf) for
continuous probability distribution
In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
.
[{{cite book, last1=Samaniego, first1=Francisco J., title=Stochastic modeling and mathematical statistics : a text for statisticians and quantitative scientists, date=2014, publisher=CRC Press, location=Boca Raton, isbn=978-1-4665-6046-8, page=167] It is not to be confused with
multivariate distribution
Multivariate is the quality of having multiple variables.
It may also refer to:
In mathematics
* Multivariable calculus
* Multivariate function
* Multivariate polynomial
* Multivariate interpolation
* Multivariate optimization
In computing
* ...
.
Common discrete distributions
*
Uniform distribution (discrete)
In probability theory and statistics, the discrete uniform distribution is a symmetric probability distribution wherein each of some finite whole number ''n'' of outcome values are equally likely to be observed. Thus every one of the ''n'' o ...
*
Bernoulli distribution
In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with pro ...
*
Binomial distribution
In probability theory and statistics, the binomial distribution with parameters and is the discrete probability distribution of the number of successes in a sequence of statistical independence, independent experiment (probability theory) ...
*
Geometric distribution
In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:
* The probability distribution of the number X of Bernoulli trials needed to get one success, supported on \mathbb = \;
* T ...
*
Negative binomial distribution
In probability theory and statistics, the negative binomial distribution, also called a Pascal distribution, is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Berno ...
*
Poisson distribution
In probability theory and statistics, the Poisson distribution () is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known const ...
*
Hypergeometric distribution
In probability theory and statistics, the hypergeometric distribution is a Probability distribution#Discrete probability distribution, discrete probability distribution that describes the probability of k successes (random draws for which the ...
*
Zeta distribution
Common continuous distributions
*
Uniform distribution (continuous)
In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that li ...
*
Normal distribution
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
f(x) = \frac ...
*
Gamma distribution
In probability theory and statistics, the gamma distribution is a versatile two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the g ...
*
Exponential distribution
In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuousl ...
*
Weibull distribution
In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It models a broad range of random variables, largely in the nature of a time to failure or time between events. Examples are maximum on ...
*
Cauchy distribution
The Cauchy distribution, named after Augustin-Louis Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution (after Hendrik Lorentz), Cauchy–Lorentz distribution, Lorentz(ian) ...
*
Beta distribution
In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval , 1
The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...
or (0, 1) in terms of two positive Statistical parameter, parameters, denoted by ''alpha'' (''α'') an ...
See also
*
Univariate
In mathematics, a univariate object is an expression (mathematics), expression, equation, function (mathematics), function or polynomial involving only one Variable (mathematics), variable. Objects involving more than one variable are ''wikt:multi ...
*
Univariate distribution In statistics, a univariate distribution is a probability distribution of only one random variable. This is in contrast to a multivariate distribution, the probability distribution of a random vector (consisting of multiple random variables).
Exam ...
*
Bivariate analysis
Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis. Earl R. Babbie, ''The Practice of Social Research'', 12th edition, Wadsworth Publishing, 2009, , pp. 436–440 It involves the analysis of two variables (oft ...
*
Multivariate analysis
Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable, i.e., '' multivariate random variables''.
Multivariate statistics concerns understanding the differ ...
*
List of probability distributions
References
Mathematical terminology
Statistical data