Social statistics is the use of statistical measurement systems to study human behavior in a social environment. This can be accomplished by polling a group of people, by evaluating a subset of data obtained about a group of people, or by observing and statistically analyzing a set of data that relates to people and their behaviors.
Statistics in the social sciences
History
Adolphe Quetelet was a proponent of social physics. In his book ''Physique sociale'' he presents distributions of human heights, ages at marriage, and times of birth and death; time series of human marriages, births and deaths; a survival density for humans; and a curve describing fecundity as a function of age. He also developed the Quetelet Index.
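The Quetelet Index is better known today as the body mass index. It is body mass in kilograms divided by the square of height in metres. The following is a minimal Python sketch that assumes inputs in those units. The function name `quetelet_index` is hypothetical.

```python
def quetelet_index(weight_kg: float, height_m: float) -> float:
    """Quetelet Index (body mass index): weight in kg divided by height in m squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Example: a person weighing 70 kg and standing 1.75 m tall.
print(round(quetelet_index(70, 1.75), 2))  # 22.86
```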
In 1885 Francis Ysidro Edgeworth published "On Methods of Ascertaining Variations in the Rate of Births, Deaths, and Marriages", which uses squares of differences to study fluctuations. In 1895 George Udny Yule published "On the Correlation of Total Pauperism with Proportion of Out-Relief".
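A short sketch can illustrate the idea of measuring fluctuations through squared differences. It computes the mean squared deviation of a series of annual rates from their average. This is a modern analogue chosen for illustration, not a reconstruction of Edgeworth's exact procedure. The rates and the function name are hypothetical.

```python
def mean_squared_deviation(rates: list[float]) -> float:
    """Average of the squared differences between each rate and the mean rate."""
    if not rates:
        raise ValueError("need at least one observation")
    mean = sum(rates) / len(rates)
    return sum((r - mean) ** 2 for r in rates) / len(rates)


# Hypothetical annual birth rates per 1,000 population.
birth_rates = [30.0, 32.0, 28.0, 30.0]
print(mean_squared_deviation(birth_rates))  # 2.0
```

Squaring makes positive and negative departures count equally, and it weights large swings more heavily than small ones.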
In 1897 Karl Pearson gave a numerical calibration of the fertility curve in his book ''The Chances of Death, and Other Studies in Evolution''. In this book Pearson also uses standard deviation, correlation and skewness to study humans.
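Each of the three summary measures named above takes only a few lines to compute. This sketch uses the population (divide-by-n) forms and the moment coefficient of skewness. These are common modern definitions chosen for illustration. Pearson himself also proposed other skewness coefficients, such as one based on the difference between the mean and the mode. The function names are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def std_dev(xs):
    """Population standard deviation: square root of the mean squared deviation."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def correlation(xs, ys):
    """Pearson product-moment correlation coefficient."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (std_dev(xs) * std_dev(ys))


def skewness(xs):
    """Moment coefficient of skewness: third central moment over std_dev cubed."""
    m = mean(xs)
    third = sum((x - m) ** 3 for x in xs) / len(xs)
    return third / std_dev(xs) ** 3


print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
print(correlation([1, 2, 3], [2, 4, 6]))  # perfectly linear, so approximately 1.0
print(skewness([1, 2, 3, 4, 5]))  # symmetric data: 0.0
```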
In 1897 Vilfredo Pareto published his analysis of the distribution of income in Great Britain and Ireland. His finding is now known as the Pareto principle.
Louis Guttman proposed that the values of ordinal variables can be represented by a Guttman scale, which is useful if the number of variables is large and allows the use of techniques such as ordinary least squares.
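A Guttman scale is cumulative. A respondent who endorses a harder item is expected to endorse all easier ones, so a total score should reproduce the full response pattern.

One common check is the coefficient of reproducibility. It equals one minus the share of responses that deviate from the ideal pattern implied by each respondent's total score. Values of about 0.90 or higher are conventionally taken as acceptable.

The sketch below assumes yes/no (1/0) items and counts errors against that ideal pattern. This is only one of several error-counting conventions. The function name and data are hypothetical.

```python
def reproducibility(responses: list[list[int]]) -> float:
    """Coefficient of reproducibility for 0/1 responses (rows = respondents)."""
    n_people, n_items = len(responses), len(responses[0])
    # Order items from most to least endorsed (the "easiest" item comes first).
    order = sorted(range(n_items), key=lambda j: -sum(row[j] for row in responses))
    errors = 0
    for row in responses:
        score = sum(row)
        # Ideal cumulative pattern: endorse exactly the `score` easiest items.
        ideal = [1 if rank < score else 0 for rank in range(n_items)]
        observed = [row[j] for j in order]
        errors += sum(o != i for o, i in zip(observed, ideal))
    return 1 - errors / (n_people * n_items)


perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
noisy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
print(reproducibility(perfect))  # 1.0
print(round(reproducibility(noisy), 3))  # 0.833
```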
Macroeconomic statistical research has provided stylized facts, which include:
* Bowley's law (1937) regarding the proportion between wages and national output
* The Phillips curve (1958) regarding the relation between the rate of change of money wages and unemployment
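Stylized facts such as the Phillips curve are usually summarized by fitting a relationship to observed data. The sketch below fits a straight line by ordinary least squares, with wage growth as the response and the unemployment rate as the regressor. The data and the function name are hypothetical. A straight line is a simplification: Phillips' own curve was nonlinear.

```python
def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical observations: unemployment rate (%) and wage growth (% per year).
unemployment = [2.0, 4.0, 6.0, 8.0]
wage_growth = [6.0, 5.0, 4.0, 3.0]
intercept, slope = ols_fit(unemployment, wage_growth)
print(intercept, slope)  # 7.0 -0.5
```

A negative slope corresponds to the inverse relation the Phillips curve describes. Real macroeconomic series are noisy, so the fitted line summarizes a tendency rather than an exact law.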
Statistics and statistical analyses have become a key feature of social science: statistics is employed in economics, psychology, political science, sociology and anthropology.
Statistical methods in social sciences

Methods and concepts used in quantitative social sciences include:
* Research design, survey methodology and survey sampling
* Delphi method
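The simplest survey sampling design is a simple random sample drawn without replacement. The sketch below estimates a population proportion and its standard error. It uses the textbook estimator with a finite population correction, (1 − n/N)·p̂(1 − p̂)/(n − 1). The population and the function name are hypothetical. Real surveys usually add stratification, clustering and weighting.

```python
import math
import random
from typing import Optional


def srs_proportion(population: list[int], n: int,
                   seed: Optional[int] = None) -> tuple[float, float]:
    """Estimate a population proportion from a simple random sample drawn
    without replacement; returns (estimate, standard error)."""
    N = len(population)
    if not 2 <= n <= N:
        raise ValueError("sample size must be between 2 and the population size")
    sample = random.Random(seed).sample(population, n)
    p_hat = sum(sample) / n
    # Finite population correction: sampling a larger share of the population
    # leaves less uncertainty about it.
    fpc = 1 - n / N
    se = math.sqrt(fpc * p_hat * (1 - p_hat) / (n - 1))
    return p_hat, se


# Hypothetical population of 10,000 people, 30% of whom hold some view (coded 1).
population = [1] * 3000 + [0] * 7000
estimate, se = srs_proportion(population, n=500, seed=1)
print(f"estimate {estimate:.3f}, approx. 95% interval ±{1.96 * se:.3f}")
```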
Statistical techniques include:
Covariance-based methods
* Regression analysis
* Canonical correlation
* Causal analysis
* Multilevel models
* Factor analysis
* Linear discriminant analysis
* Path analysis
* Structural equation modeling
Probability-based methods
* Probit and logit
* Item response theory
* Bayesian statistics
* Stochastic processes
* Latent class models
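Probit and logit models relate the probability of a binary outcome, such as voting or not voting, to explanatory variables through a link function. The logit model assumes logit(p) = a + b·x. The probit model assumes Φ⁻¹(p) = a + b·x, where Φ is the standard normal CDF.

The sketch below shows only the two link functions, using Python's standard library. Fitting the coefficients a and b would require maximum likelihood estimation, which is not shown. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def logit(p: float) -> float:
    """Log-odds: the inverse of the standard logistic CDF."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1 - p))


def probit(p: float) -> float:
    """Inverse of the standard normal CDF (raises for p outside (0, 1))."""
    return NormalDist().inv_cdf(p)


for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(probit(p), 3))
```

Both transforms map probabilities onto the whole real line and are symmetric around p = 0.5. They differ mainly in how quickly they grow in the tails.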
Distance-based methods
* Cluster analysis
* Multidimensional scaling
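Cluster analysis covers many algorithms. k-means is one widely used example, chosen here for illustration. It alternates between two steps: assigning each observation to the nearest centroid by Euclidean distance, and moving each centroid to the mean of its assigned observations.

The sketch uses hypothetical two-variable data and naive initialization from the first k points. Practical implementations use better initialization and multiple restarts. The function name is hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm); the first k points serve as the
    initial centroids to keep the sketch deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


# Hypothetical respondents described by two scaled survey measures.
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```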
Methods for categorical data
* Classification analysis
* Cohort analysis
Usage and applications
Social scientists use social statistics for many purposes, including:
* evaluating the quality of services available to a group or organization
* analyzing the behavior of groups of people in their environment and in special situations
* determining people's wants through statistical sampling
* evaluating wage expenditures and savings
* preventing industrial diseases
* preventing industrial accidents
* handling labour disputes, for example by supporting the Anthracite Coal Strike Commission of 1902-1903
* supporting governments in times of peace and war
Reliability
The use of statistics has become so widespread in the social sciences that many universities, such as Harvard, have developed institutes focusing on "quantitative social science". Harvard's Institute for Quantitative Social Science focuses mainly on fields like political science that incorporate the advanced causal statistical models that Bayesian methods provide. However, some experts in causality consider these claims about causal statistics overstated.[J. Pearl, Bayesianism and causality, or, why I am only a half-bayesian http://ftp.cs.ucla.edu/pub/stat_ser/r284-reprint.pdf] There is a debate regarding the uses and value of statistical methods in social science, especially in political science. Some statisticians question practices such as data dredging, which can lead to unreliable policy conclusions when political partisans overestimate the interpretive power of non-robust statistical methods such as simple and multiple linear regression. Indeed, an important axiom that social scientists cite, but often forget, is that "correlation does not imply causation."
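A small simulation makes the risk of data dredging concrete. Suppose one outcome is tested against many candidate predictors that are, by construction, unrelated to it. Roughly 5% of them will still pass a test at the 5% significance level.

The sketch below generates pure noise and uses the Fisher z approximation rather than an exact test. It is illustrative only, and the function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def dredge(n_obs=50, n_predictors=1000, seed=0):
    """Correlate one pure-noise outcome with many pure-noise predictors and
    count how many pass an approximate 5% two-sided significance test."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson_r(predictor, outcome)
        # Fisher z-transform: atanh(r) * sqrt(n - 3) is roughly standard normal
        # when the true correlation is zero.
        if abs(math.atanh(r) * math.sqrt(n_obs - 3)) > 1.96:
            hits += 1
    return hits


hits = dredge()
print(f"{hits} of 1000 unrelated predictors look 'significant'")
```

Reporting only the "hits" from such a search produces exactly the kind of spurious findings described above. Correcting for the number of tests, for example with a Bonferroni adjustment, is one common safeguard.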
Further reading
*Irvine, John; Miles, Ian; Evans, Jeff (editors), "Demystifying Social Statistics", London: Pluto Press, 1979.
External links
* Center for Statistics and Social Sciences, University of Washington
* Center for the Promotion of Research Involving Innovative Statistical Methodology, New York University, NY
* Centre for Research Methods, Faculty of Social Sciences, University of Helsinki, Finland