Bivariate analysis is one of the simplest forms of
quantitative (statistical) analysis.
[Earl R. Babbie, ''The Practice of Social Research'', 12th edition, Wadsworth Publishing, 2009, pp. 436–440] It involves the analysis of two variables (often denoted as ''X'', ''Y''), for the purpose of determining the empirical relationship between them.
Bivariate analysis can be helpful in testing simple hypotheses of association. It can help determine to what extent it becomes easier to know and predict a value for one variable (possibly a dependent variable) if we know the value of the other variable (possibly the independent variable) (see also correlation and simple linear regression).[Bivariate Analysis, Sociology Index]
Bivariate analysis can be contrasted with univariate analysis, in which only one variable is analysed. Like univariate analysis, bivariate analysis can be descriptive or inferential. It is the analysis of the relationship between the two variables. Bivariate analysis is a simple (two-variable) special case of multivariate analysis, where multiple relations between multiple variables are examined simultaneously.
Bivariate Regression
Regression is a statistical technique used to investigate how variation in one or more variables predicts or explains variation in another variable. Bivariate regression aims to identify the equation of the line that best describes the relationship between two variables in a particular data set. This equation can then be used to predict values of the dependent variable that are not present in the original data set. Through regression analysis, one can derive the equation for the curve or straight line and obtain the correlation coefficient.
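As a minimal sketch of this workflow, the following Python snippet fits a least-squares line to a small, hypothetical data set and then uses the fitted equation to predict a dependent-variable value not present in the original data (the data values are invented for illustration):

```python
# Fit a bivariate regression line y = a + b*x by ordinary least squares,
# then use the fitted equation to predict a new, unseen value of y.
# The data below are hypothetical, for illustration only.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 5.9, 8.2, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope: b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)
# Intercept: the fitted line passes through (mean_x, mean_y)
a = mean_y - b * mean_x

# Predict the dependent variable for an x value not in the data set
y_pred = a + b * 6.0
```

For these data the fitted slope is 1.93 and the intercept 0.27, so the prediction at x = 6 is 11.85.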
Simple Linear Regression
Simple linear regression is a statistical method used to model the linear relationship between an independent variable and a dependent variable. It assumes a linear relationship between the variables and is sensitive to outliers. The best-fitting equation is represented as a straight line chosen to minimize the difference between the values predicted by the equation and the actual observed values of the dependent variable.
Equation: Y = a + bX, where:
* X: independent variable (predictor)
* Y: dependent variable (outcome)
* b: slope of the line
* a: Y-intercept
Least Squares Regression Line (LSRL)
The least squares regression line is a method in simple linear regression for modeling the linear relationship between two variables, and it serves as a tool for making predictions based on new values of the independent variable. The calculation is based on the method of least squares criterion: the goal is to minimize the sum of the squared vertical distances (residuals) between the observed Y-values and the corresponding predicted Y-values for each data point.
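The least-squares criterion can be checked numerically with made-up data: the sketch below computes the sum of squared residuals for the fitted line and compares it against nearby lines with perturbed slope or intercept, which can only do worse.

```python
# Illustrate that the least-squares line minimizes the sum of squared
# vertical residuals: perturbing slope or intercept increases the sum.
# Hypothetical data for demonstration.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.8, 4.1, 6.2, 7.9, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

def sse(a_, b_):
    """Sum of squared residuals for the candidate line y = a_ + b_*x."""
    return sum((yi - (a_ + b_ * xi)) ** 2 for xi, yi in zip(x, y))

best = sse(a, b)  # no other (a_, b_) pair yields a smaller value
```

Evaluating `sse` at any perturbed pair, such as `sse(a + 0.1, b)` or `sse(a, b - 0.1)`, returns a value strictly greater than `best`.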
Bivariate Correlation
A bivariate correlation is a measure of whether and how strongly two variables covary linearly, that is, whether the value of one variable changes in a linear fashion as the value of the other changes.
Covariance can be difficult to interpret across studies because it depends on the scale or level of measurement used. For this reason, covariance is standardized by dividing by the product of the standard deviations of the two variables to produce the Pearson product–moment correlation coefficient (also referred to as the Pearson correlation coefficient or simply the correlation coefficient), which is usually denoted by the letter ''r''.
Pearson's correlation coefficient is used when both variables are measured on an interval or ratio scale. Other correlation coefficients or analyses are used when variables are not interval or ratio, or when they are not normally distributed. Examples are Spearman's correlation coefficient, Kendall's tau, biserial correlation, and chi-square analysis.
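As a minimal illustration of this standardization, the snippet below computes the covariance and both standard deviations directly from a small hypothetical data set and divides to obtain ''r'':

```python
# Compute Pearson's r by standardizing the covariance:
# r = cov(x, y) / (sd(x) * sd(y)).  Hypothetical data for illustration.
import math

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance and sample standard deviations (n - 1 denominator)
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
sd_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
sd_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))

r = cov / (sd_x * sd_y)  # dimensionless, always between -1 and 1
```

Unlike the covariance (here 4.0, in mixed units of x times y), the standardized value r = 0.8 is unit-free and comparable across studies.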
Three important notes should be highlighted with regard to correlation:
* The presence of outliers can severely bias the correlation coefficient.
* Large sample sizes can result in statistically significant correlations that may have little or no practical significance.
* It is not possible to draw conclusions about causality based on correlation analyses alone.
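The first note can be demonstrated concretely with invented data: appending a single extreme point to an otherwise perfect linear relationship changes ''r'' dramatically.

```python
# Demonstrate that a single outlier can severely bias Pearson's r.
# Data are invented for illustration.
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]           # perfect positive relationship: r = 1
r_clean = pearson_r(x, y)

x_out = x + [6]
y_out = y + [-10]             # one extreme outlier
r_outlier = pearson_r(x_out, y_out)
```

Here a single outlier flips the sign of the coefficient: `r_clean` is 1.0 while `r_outlier` is negative (about -0.44).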
When there is a dependent variable
If the dependent variable (the one whose value is determined to some extent by the other, independent variable) is a categorical variable, such as the preferred brand of cereal, then probit or logit regression (or multinomial probit or multinomial logit) can be used. If both variables are ordinal, meaning they are ranked in a sequence as first, second, etc., then a rank correlation coefficient can be computed. If just the dependent variable is ordinal, ordered probit or ordered logit can be used. If the dependent variable is continuous, either interval level or ratio level (such as a temperature scale or an income scale), then simple regression can be used.
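For the ordinal case, one common rank correlation, Spearman's rho, can be sketched as follows (hypothetical scores, no tied values assumed): replace each value with its rank, then compute Pearson's ''r'' on the ranks.

```python
# Spearman's rank correlation for two ordinal variables: convert each
# value to its rank, then compute Pearson's r on the ranks.
# Assumes no tied values; data are hypothetical.
import math

def ranks(values):
    """Rank of each value, 1 = smallest (no ties handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two monotonically related but nonlinear score sets
scores_a = [10, 20, 35, 70, 90]
scores_b = [2, 4, 8, 16, 33]

rho = pearson_r(ranks(scores_a), ranks(scores_b))
```

Because the two score sets rank the items identically, rho equals 1.0 even though the raw values are far from linearly related.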
If both variables are time series, a particular type of causality known as Granger causality can be tested for, and vector autoregression can be performed to examine the intertemporal linkages between the variables.
When there is not a dependent variable
When neither variable can be regarded as dependent on the other, regression is not appropriate but some form of correlation analysis may be.
Graphical methods
Graphs that are appropriate for bivariate analysis depend on the type of variable. For two continuous variables, a scatterplot is a common graph. When one variable is categorical and the other continuous, a box plot is common, and when both are categorical a mosaic plot is common. These graphs are part of descriptive statistics.
See also
* Canonical correlation
* Coding (social sciences)
* Descriptive statistics
External links
* Discriminant correlation analysis (DCA) [M. Haghighat, M. Abdel-Mottaleb, & W. Alhalabi (2016). "Discriminant Correlation Analysis: Real-Time Feature Level Fusion for Multimodal Biometric Recognition". IEEE Transactions on Information Forensics and Security, 11(9), 1984–1996.]