A Likert scale is a psychometric scale named after its inventor, American social psychologist Rensis Likert, which is commonly used in research questionnaires. It is the most widely used approach to scaling responses in survey research, such that the term (or more fully the Likert-type scale) is often used interchangeably with ''rating scale'', although there are other types of rating scales.

Likert distinguished between a scale proper, which emerges from collective responses to a set of items (usually eight or more), and the format in which responses are scored along a range. Technically speaking, a Likert scale refers only to the former. The difference between these two concepts has to do with the distinction Likert made between the underlying phenomenon being investigated and the means of capturing variation that points to the underlying phenomenon.

When responding to a Likert item, respondents specify their level of agreement or disagreement on a symmetric agree-disagree scale for a series of statements. Thus, the range captures the intensity of their feelings for a given item. A scale can be created as the simple sum or average of questionnaire responses over the set of individual items (questions). In so doing, Likert scaling assumes that the distances between each choice (answer option) are equal. Many researchers employ a set of such items that are highly correlated (that show high internal consistency) but that together also capture the full domain under study (which requires less-than-perfect correlations). Others hold to a standard by which "All items are assumed to be replications of each other or in other words items are considered to be parallel instruments". By contrast, modern test theory treats the difficulty of each item (the item characteristic curves, or ICCs) as information to be incorporated in scaling items.
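Internal consistency is commonly summarized with Cronbach's alpha; the text above does not name a statistic, so that choice is an assumption here. A minimal pure-Python sketch, assuming a list of respondent rows holding one integer code per item and using sample variances throughout:

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for respondent rows of integer item codes.

    responses: list of rows, one row per respondent, one code per item.
    Uses sample variances (n - 1 denominator) for items and totals alike.
    """
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of the summed scale score across respondents.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Three items that move in lockstep across three respondents.
print(cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

Items that rise and fall together push alpha toward 1; the lockstep example above gives 1.0 (up to floating-point rounding). Alpha alone says nothing about whether the items cover the full domain, which is the tension the paragraph above describes.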

Composition

A ''Likert scale'' is the sum of responses on several ''Likert items''. Because many Likert scales pair each constituent Likert item with its own instance of a visual analogue scale (e.g., a horizontal line, on which the subject indicates a response by circling or checking tick-marks), an individual item is itself sometimes erroneously referred to as being or having a scale, and this error has created pervasive confusion in the literature and parlance of the field.

A Likert item is simply a statement that the respondent is asked to evaluate by giving it a quantitative value on any kind of subjective or objective dimension, with level of agreement/disagreement being the dimension most commonly used. Well-designed Likert items exhibit both "symmetry" and "balance". Symmetry means that they contain equal numbers of positive and negative positions whose respective distances apart are bilaterally symmetric about the "neutral"/zero value (whether or not that value is presented as a candidate). Balance means that the distance between each candidate value is the same, allowing for quantitative comparisons such as averaging to be valid across items containing more than two candidate values. The format of a typical five-level Likert item, for example, could be:

1. Strongly disagree
2. Disagree
3. Neither agree nor disagree
4. Agree
5. Strongly agree

Likert scaling is a bipolar scaling method, measuring either positive or negative response to a statement. Sometimes an even-point scale is used, where the middle option of "neither agree nor disagree" is not available. This is sometimes called a "forced choice" method, since the neutral option is removed. The neutral option can be seen as an easy option to take when a respondent is unsure, so whether it is a true neutral option is questionable. A 1987 study found negligible differences between the use of "undecided" and "neutral" as the middle option in a five-point Likert scale.

Likert scales may be subject to distortion from several causes. Respondents may:

* Avoid using extreme response categories (''central tendency bias''), especially out of a desire to avoid being perceived as having extremist views (an instance of social desirability bias). This effect may appear early in a test due to an expectation that questions on which the subject has stronger views may follow, such that on earlier questions one "leaves room" for stronger responses later in the test. This expectation creates bias that is especially pernicious in that its effects are not uniform throughout the test and cannot be corrected for through simple across-the-board normalization;
* Agree with statements as presented (''acquiescence bias''), for example, agreeing with both Statement A and its opposite. This effect is especially strong among children, people with developmental disabilities, elderly people, and individuals who are subjected to a culture of institutionalization that encourages and incentivizes eagerness to please;
* Disagree with statements as presented out of a defensive desire to avoid making erroneous statements and/or avoid negative consequences that respondents may fear will result from their answers being used against them, especially if misinterpreted and/or taken out of context;
* Provide answers that they believe will be evaluated as indicating strength or lack of weakness/dysfunction ("faking good");
* Provide answers that they believe will be evaluated as indicating weakness or presence of impairment/pathology ("faking bad");
* Try to portray themselves or their organization in a light that they believe the examiner or society would consider more favorable than their true beliefs (''social desirability bias'', the intersubjective version of objective "faking good" discussed above);
* Try to portray themselves or their organization in a light that they believe the examiner or society would consider less favorable than their true beliefs (''norm defiance'', the intersubjective version of objective "faking bad" discussed above).

Designing a scale with balanced keying (an equal number of positive and negative statements and, especially, an equal number of positive and negative statements regarding each position or issue in question) can obviate the problem of acquiescence bias, since acquiescence on positively keyed items will balance acquiescence on negatively keyed items, but defensive, central tendency, and social desirability biases are somewhat more problematic.
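The balancing effect of keyed items can be sketched in a few lines. This assumes the 1 to 5 coding of the five-level item above (1 = Strongly disagree); the item names and the `negatively_keyed` set are hypothetical. Negatively keyed items are reverse-coded before summing so that a high code always points the same way:

```python
# Assumed coding for the five-level item: 1 = Strongly disagree ... 5 = Strongly agree.
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_code(code, lo=SCALE_MIN, hi=SCALE_MAX):
    """Flip a response so that a high code always means more of the trait."""
    return lo + hi - code


def scale_score(responses, negatively_keyed):
    """Sum one respondent's item codes after reverse-coding negatively keyed items.

    responses: dict mapping a (hypothetical) item name to its integer code.
    negatively_keyed: set of item names whose statements are worded against the trait.
    """
    return sum(
        reverse_code(code) if item in negatively_keyed else code
        for item, code in responses.items()
    )


# A respondent who agrees strongly with everything on a balanced four-item scale.
acquiescer = {"q1": 5, "q2": 5, "q3": 5, "q4": 5}
print(scale_score(acquiescer, negatively_keyed={"q3", "q4"}))  # 12
```

The blanket "Strongly agree" pattern scores 5 + 5 + 1 + 1 = 12, exactly the midpoint of the 4 to 20 range, so pure acquiescence does not pull the summed score toward either pole.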

Scoring and analysis

After the questionnaire is completed, each item may be analyzed separately, or in some cases item responses may be summed to create a score for a group of items. Hence, Likert scales are often called summative scales. Whether individual Likert items can be considered as interval-level data, or whether they should be treated as ordered-categorical data, is the subject of considerable disagreement in the literature, with strong convictions on which methods are most applicable. This disagreement can be traced back, in many respects, to the extent to which Likert items are interpreted as being ordinal data.

There are two primary considerations in this discussion. First, Likert scales are arbitrary. The value assigned to a Likert item has no objective numerical basis, either in terms of measure theory or scale (from which a distance metric can be determined). The value assigned to each Likert item is simply determined by the researcher designing the survey, who makes the decision based on a desired level of detail. However, by convention Likert items tend to be assigned progressive positive integer values. Likert items typically have between 2 and 10 response categories, with 3, 5, or 7 being the most common. Further, this progressive structure means that each successive response category is treated as indicating a 'better' response than the preceding one (this may differ in cases where reverse ordering of the Likert scale is needed).

The second, and possibly more important, point is whether the "distance" between each successive category is equivalent, which is traditionally inferred. For example, in the five-point Likert item above, the inference is that the 'distance' between categories 1 and 2 is the same as between categories 3 and 4. In terms of good research practice, an equidistant presentation by the researcher is important; otherwise a bias in the analysis may result. For example, a four-point Likert item with categories "Poor", "Average", "Good", and "Very Good" is unlikely to have all equidistant categories, since only one category can receive a below-average rating. This would arguably bias any result in favor of a positive outcome. On the other hand, even if a researcher presents what he or she believes are equidistant categories, respondents may not interpret them as such. A good Likert scale, as above, will present a ''symmetry'' of categories about a midpoint with clearly defined linguistic qualifiers. In such symmetric scaling, equidistant attributes will typically be more clearly observed or, at least, inferred. It is when a Likert scale is symmetric and equidistant that it will behave more like an interval-level measurement. So while a Likert scale is indeed ordinal, if well presented it may nevertheless approximate an interval-level measurement.
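Because the integer codes are a researcher's choice, a short sketch can show which summaries survive an order-preserving recoding. The `stretched` coding below is hypothetical (order kept, spacing changed); the median, which depends only on order, keeps its ranking, while the mean, which assumes equal spacing, can reverse:

```python
from statistics import mean, median_low

LABELS = ["Strongly disagree", "Disagree", "Neither agree nor disagree",
          "Agree", "Strongly agree"]


def summarize(answers, codes):
    """Return (median category label, mean code) for a list of label answers.

    codes: maps each label to a number; any increasing mapping preserves order.
    """
    values = [codes[a] for a in answers]
    # median_low always returns an observed value, so it maps back to a category.
    med = median_low(values)
    med_label = next(label for label, c in codes.items() if c == med)
    return med_label, mean(values)


equal = {label: i + 1 for i, label in enumerate(LABELS)}  # 1..5, equal spacing
stretched = dict(zip(LABELS, [1, 2, 3, 4, 10]))          # hypothetical: same order

group_a = ["Agree"] * 4
group_b = ["Strongly agree", "Disagree"] * 2
for codes in (equal, stretched):
    print(summarize(group_a, codes), summarize(group_b, codes))
```

Under equal spacing group A's mean (4) exceeds group B's (3.5); under the stretched coding B's mean (6) exceeds A's (4). The median categories ("Agree" for A, "Disagree" for B) keep the same order under both codings, which is why the interval-versus-ordinal question matters for analysis.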
This can be beneficial since, if it were treated just as an ordinal scale, some valuable information could be lost because the 'distance' between Likert items would not be available for consideration. The important idea here is that the appropriate type of analysis depends on how the Likert scale has been presented.

The validity of such measures depends on the underlying interval nature of the scale. If interval nature is assumed for a comparison of two groups, the paired-samples t-test is acceptable. If non-parametric tests are to be performed, the Pratt (1959) modification to the Wilcoxon signed-rank test is recommended over the standard Wilcoxon signed-rank test.

Responses to several Likert questions may be summed provided that all questions use the same Likert scale and that the scale is a defensible approximation to an interval scale, in which case the central limit theorem allows treatment of the data as interval data measuring a latent variable. If the summed responses fulfill these assumptions, parametric statistical tests such as the analysis of variance can be applied. Typical cutoffs for considering this approximation acceptable are a minimum of four and preferably eight items in the sum.

To model binary Likert responses directly, they may be represented in a binomial form by summing agree and disagree responses separately. The chi-squared test, Cochran's Q test, or McNemar test are common statistical procedures used after this transformation. Non-parametric tests such as the chi-squared test, Mann–Whitney test, or Kruskal–Wallis test are often used in the analysis of Likert scale data. Alternatively, Likert scale responses can be analyzed with an ordered probit model, preserving the ordering of responses without the assumption of an interval scale. The use of an ordered probit model can prevent errors that arise when treating ordered ratings as interval-level measurements.
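Paired Likert codes produce many zero differences, which is where the Pratt modification differs from the standard Wilcoxon test. The sketch below computes only the W+ statistic (sum of ranks of positive differences), not a p-value; the function names are hypothetical. With Pratt's method the zeros take part in ranking the absolute differences and are then discarded, whereas the standard method drops them before ranking. SciPy's `scipy.stats.wilcoxon` exposes the same choice through its `zero_method` argument (`"wilcox"` or `"pratt"`), if a full test is needed.

```python
def average_ranks(values):
    """1-based ranks, with ties given the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_statistic(before, after, zero_method="pratt"):
    """W+ for paired Likert codes.

    zero_method="wilcox": drop zero differences before ranking (standard test).
    zero_method="pratt": rank |d| including zeros, then discard the zeros' ranks.
    """
    diffs = [a - b for b, a in zip(before, after)]
    if zero_method == "wilcox":
        diffs = [d for d in diffs if d != 0]
    elif zero_method != "pratt":
        raise ValueError("zero_method must be 'pratt' or 'wilcox'")
    ranks = average_ranks([abs(d) for d in diffs])
    return sum(r for d, r in zip(diffs, ranks) if d > 0)


before = [3, 3, 2, 4, 1]
after = [3, 4, 4, 4, 2]
print(signed_rank_statistic(before, after))                        # 12.0
print(signed_rank_statistic(before, after, zero_method="wilcox"))  # 6.0
```

The two zero differences occupy ranks 1 and 2 under Pratt's method, pushing the nonzero differences to higher ranks; the null distribution used for the p-value must account for that, which is why a library implementation is preferable for actual testing.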

Consensus-based assessment (CBA) can be used to create an objective standard for Likert scales in domains where no generally accepted or objective standard exists. CBA can also be used to refine or even validate generally accepted standards.

Latent variable models

A common practice for analyzing responses to collections of Likert scale items is to summarize them via a latent variable model, for example using factor analysis or item response theory.

Rasch model

Likert scale data can, in principle, be used as a basis for obtaining interval-level estimates on a continuum by applying the polytomous Rasch model, when data can be obtained that fit this model. In addition, the polytomous Rasch model permits testing of the hypothesis that the statements reflect increasing levels of an attitude or trait, as intended. For example, application of the model often indicates that the neutral category does not represent a level of attitude or trait between the disagree and agree categories.

Not every set of Likert-scaled items can be used for Rasch measurement. The data must be thoroughly checked to fulfill the strict formal axioms of the model. However, the raw scores are the sufficient statistics for the Rasch measures, a deliberate choice by Georg Rasch, so anyone prepared to accept the raw scores as valid can also accept the Rasch measures as valid.

Visual presentation of Likert-type data

An important part of data analysis and presentation is the visualization (or plotting) of data. The subject of plotting Likert (and other) rating data is discussed at length in two papers by Robbins and Heiberger. In the first they recommend the use of what they call diverging stacked bar charts and compare them to other plotting styles. The second paper describes the use of the likert function in the HH package for R, and gives many examples of its use. Another paper provided Python code to create a clustered diverging stacked bar chart of 5-point Likert scale responses.
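The core of a diverging stacked bar is arithmetic on percentages: negative categories extend left of zero and positive ones right. A minimal sketch, assuming the common convention (not specified above) that an odd-numbered neutral category straddles zero, half on each side; the function name is hypothetical:

```python
def diverging_segments(counts):
    """(start, end) extents, in percent, of each category bar for one item.

    counts: per-category counts ordered from most negative to most positive.
    With an odd number of categories the middle one is treated as neutral and
    split evenly across zero (an assumed convention); with an even number,
    the axis zero falls between the negative and positive halves.
    """
    total = sum(counts)
    pcts = [100 * c / total for c in counts]
    mid = len(pcts) // 2
    if len(pcts) % 2:
        start = -(sum(pcts[:mid]) + pcts[mid] / 2)
    else:
        start = -sum(pcts[:mid])
    segments = []
    for p in pcts:
        segments.append((start, start + p))
        start += p
    return segments


# One five-point item with 100 respondents.
print(diverging_segments([10, 20, 40, 20, 10]))
# [(-50.0, -40.0), (-40.0, -20.0), (-20.0, 20.0), (20.0, 40.0), (40.0, 50.0)]
```

A plotting library such as matplotlib could then draw each segment with `barh(y, end - start, left=start)`, stacking one row per item; clustering by group repeats the rows for each respondent subgroup.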

Level of measurement

The five response categories are often believed to represent an interval level of measurement. However, this can only be the case if the intervals between the scale points correspond to empirical observations in a metric sense. Reips and Funke (2008) show that this criterion is much better met by a visual analogue scale. In fact, phenomena may also appear that call even the ordinal scale level of Likert scales into question. For example, in a set of items A, B, C rated with a Likert scale, circular relations such as A > B, B > C, and C > A can appear. This violates the axiom of transitivity for the ordinal scale. Research by Labovitz and Traylor provides evidence that, even with rather large distortions of perceived distances between scale points, Likert-type items perform closely to scales that are perceived as equal intervals. So these items and other equal-appearing scales in questionnaires are robust to violations of the equal-distance assumption that many researchers believe is required for parametric statistical procedures and tests.
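A circular relation of the kind described above can be detected mechanically once pairwise "rated above" relations are available. How those pairs are derived from the ratings is not specified here, so the input format below (a set of `(x, y)` pairs meaning x was rated above y) is an assumption, and the function name is hypothetical:

```python
from itertools import permutations


def intransitive_triads(prefers):
    """Find triples (a, b, c) with a > b, b > c and c > a.

    prefers: set of (x, y) pairs meaning item x was rated above item y.
    Each cycle is reported once, starting from its alphabetically smallest item.
    """
    items = sorted({x for pair in prefers for x in pair})
    cycles = []
    for a, b, c in permutations(items, 3):
        # Require a to be the smallest label so each 3-cycle appears only once.
        if a < b and a < c and {(a, b), (b, c), (c, a)} <= prefers:
            cycles.append((a, b, c))
    return cycles


print(intransitive_triads({("A", "B"), ("B", "C"), ("C", "A")}))  # [('A', 'B', 'C')]
print(intransitive_triads({("A", "B"), ("B", "C"), ("A", "C")}))  # []
```

Any non-empty result means the ratings cannot be placed on a single ordinal scale without contradiction.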

Pronunciation

Rensis Likert, the developer of the scale, pronounced his name LICK-ert, as opposed to LIKE-ert. Some have claimed that Likert's name "is among the most mispronounced in [the] field", because many people pronounce the name of the scale as LIKE-ert.
