In the social sciences, scaling is the process of measuring or ordering entities with respect to quantitative attributes or traits. For example, a scaling technique might involve estimating individuals' levels of extraversion, or the perceived quality of products. Certain methods of scaling permit estimation of magnitudes on a continuum, while other methods provide only for relative ordering of the entities. The level of measurement describes the type of data being measured. The word scale, including in academic literature, is sometimes used to refer to another composite measure, that of an index. The two concepts are, however, different.

Scale construction decisions

* What level of measurement is involved: nominal, ordinal, interval, or ratio?
* What will the results be used for?
* Should a scale, an index, or a typology be used?
* What types of statistical analysis would be useful?
* Should a comparative or a noncomparative scale be used?
* How many scale divisions or categories should be used (1 to 10; 1 to 7; −3 to +3)?
* Should there be an odd or even number of divisions? (An odd number gives a neutral center value; an even number forces respondents to take a non-neutral position.)
* What should the nature and descriptiveness of the scale labels be?
* What should the physical form or layout of the scale be (graphic, simple linear, vertical, horizontal)?
* Should a response be forced or left optional?

Scale construction method

Something similar to one's scale may already exist, so including those scale(s) and possible dependent variables in one's survey may increase the validity of one's scale.

1. Begin by generating at least ten items to represent each of the scales. Administer the survey; the larger and more representative the sample, the more confidence one will have in the scales.
2. Review the means and standard deviations for the items, dropping any items with skewed means or very low variance.
3. Run a principal components analysis with oblique rotation on one's own items together with the items of the other scales from which one's scale must be differentiated. Request components with eigenvalues greater than 1; a component's eigenvalue equals the sum of its squared unrotated loadings (square the loadings and sum down the column). Grouping the items by targeted scale makes the output easier to read. The more distinct the other items are, the better the chance that one's items will load only on one's own scale.
4. Identify the "cleanly loaded items": those that load at least .40 on one component and more than .10 higher on that component than on any other.
5. "Cross-loaded items" are those that do not meet the above criterion. These are candidates to drop.
6. Identify components with only a few items that do not represent clear concepts; these are "uninterpretable scales." Also identify any components with only one item. These components and their items are candidates to drop.
7. Review the candidate items and the components to be dropped. Is there anything that must be retained because it is critical to one's construct? For example, if a conceptually important item cross-loads only on a component to be dropped, it is good to keep it for the next round.
8. Drop the items and rerun the analysis, requesting only the number of components that remain after dropping the uninterpretable and single-item ones. Go through the process again starting at Step 3.
9. Keep repeating the process until one gets "clean factors" (all components have cleanly loaded items).
10. Run a reliability analysis for coefficient alpha, requesting alpha if each item is dropped. Any scale with insufficient alpha should be dropped and the process repeated from Step 3. The standardized coefficient alpha for k items is $\alpha = k^2 \bar{r} / \sum_{i=1}^{k}\sum_{j=1}^{k} r_{ij}$, where $\bar{r}$ is the average correlation between different items and the denominator sums all entries of the item correlation matrix, including the 1s on the diagonal.
11. As good practice, report the final components and all loadings of one's own and the similar scales in an appendix.
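Steps 3, 4 and 10 involve small calculations that can be sketched in Python with NumPy. This is an illustrative sketch, not the procedure of any particular statistics package. It assumes the component loadings are already available as an items × components array (compared in absolute value), computes eigenvalues as column sums of squared unrotated loadings, applies the .40/.10 rule for clean loadings, and evaluates the standardized alpha formula from the correlation matrix. The function names and example loadings are hypothetical.

```python
import numpy as np


def eigenvalues_from_loadings(loadings: np.ndarray) -> np.ndarray:
    """Square each unrotated loading and sum down the columns (one value per component)."""
    return (np.asarray(loadings, dtype=float) ** 2).sum(axis=0)


def classify_items(loadings: np.ndarray, min_loading: float = 0.40, min_gap: float = 0.10):
    """Label each item as ("clean", component_index) or ("cross", None).

    An item is clean if its largest absolute loading is at least `min_loading`
    and exceeds its next-largest absolute loading by more than `min_gap`.
    """
    labels = []
    for row in np.abs(np.asarray(loadings, dtype=float)):
        order = np.argsort(row)[::-1]  # component indices, largest loading first
        best = row[order[0]]
        runner_up = row[order[1]] if row.size > 1 else 0.0
        if best >= min_loading and best - runner_up > min_gap:
            labels.append(("clean", int(order[0])))
        else:
            labels.append(("cross", None))
    return labels


def standardized_alpha(corr: np.ndarray) -> float:
    """k^2 * (mean correlation between different items) / (sum of all entries, diagonal 1s included)."""
    corr = np.asarray(corr, dtype=float)
    k = corr.shape[0]
    r_bar = corr[~np.eye(k, dtype=bool)].mean()  # off-diagonal entries only
    return k ** 2 * r_bar / corr.sum()


def alpha_if_item_deleted(corr: np.ndarray) -> list:
    """Standardized alpha recomputed with each item removed in turn."""
    corr = np.asarray(corr, dtype=float)
    return [
        standardized_alpha(np.delete(np.delete(corr, i, axis=0), i, axis=1))
        for i in range(corr.shape[0])
    ]


# Hypothetical loadings for four items on two components.
example = np.array([[0.72, 0.10], [0.65, 0.58], [0.15, 0.81], [0.30, 0.35]])
print(classify_items(example))  # [('clean', 0), ('cross', None), ('clean', 1), ('cross', None)]
```

Because the alpha function works from correlations, it reproduces the standardized form given in Step 10; raw-score alpha computed from item variances can differ when items have unequal variances.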

Data types

The type of information collected can influence scale construction. Different types of information are measured in different ways.

1. Some data are measured at the nominal level. That is, any numbers used are mere labels; they express no mathematical properties. Examples are SKU inventory codes and UPC bar codes.
2. Some data are measured at the ordinal level. Numbers indicate the relative position of items, but not the magnitude of difference. An example is a preference ranking.
3. Some data are measured at the interval level. Numbers indicate the magnitude of difference between items, but there is no absolute zero point. Examples are attitude scales and opinion scales.
4. Some data are measured at the ratio level. Numbers indicate the magnitude of difference, and there is a fixed zero point, so ratios can be calculated. Examples include age, income, price, costs, sales revenue, sales volume, and market share.
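The four levels form a cumulative hierarchy: each level keeps the properties of the levels below it and adds one more. A minimal Python sketch of that idea follows. The operation names are hypothetical, and treating equality of labels as meaningful at the nominal level is an assumption (the text only says that nominal numbers express no mathematical properties).

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered so that each adds a property to the one below."""
    NOMINAL = 1   # numbers are labels only
    ORDINAL = 2   # numbers give relative position
    INTERVAL = 3  # numbers give magnitude of difference, no absolute zero
    RATIO = 4     # magnitude of difference plus a fixed zero point


# Lowest level at which each (hypothetical) operation is meaningful.
REQUIRED = {
    "equal": Level.NOMINAL,       # same label or not (assumed meaningful at every level)
    "order": Level.ORDINAL,       # greater / less than
    "difference": Level.INTERVAL,  # how much greater
    "ratio": Level.RATIO,         # how many times greater
}


def is_meaningful(operation: str, level: Level) -> bool:
    """True if `operation` makes sense for data measured at `level`."""
    return level >= REQUIRED[operation]


# A preference ranking is ordinal: ordering is meaningful, differences are not.
print(is_meaningful("order", Level.ORDINAL))       # True
print(is_meaningful("difference", Level.ORDINAL))  # False
```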

Composite measures

Composite measures of variables are created by combining two or more separate empirical indicators into a single measure. Composite measures capture complex concepts more adequately than single indicators, extend the range of scores available, and handle multiple items more efficiently. In addition to scales, there are two other types of composite measures.

Indexes are similar to scales except multiple indicators of a variable are combined into a single measure. The index of consumer confidence, for example, is a combination of several measures of consumer attitudes. A typology is similar to an index except the variable is measured at the nominal level. Indexes are constructed by accumulating scores assigned to individual attributes, while scales are constructed through the assignment of scores to patterns of attributes. While indexes and scales provide measures of a single dimension, typologies are often employed to examine the intersection of two or more dimensions. Typologies are very useful analytical tools and can easily be used as independent variables, although, since they are not unidimensional, they are difficult to use as a dependent variable.
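The contrast between accumulating attribute scores into an index and crossing dimensions into a typology can be illustrated with a toy Python sketch. The dimensions A and B, their cutoffs, and the high/low split are hypothetical; real typologies may cross more dimensions or use more categories per dimension.

```python
from typing import Sequence


def additive_index(indicator_scores: Sequence[float]) -> float:
    """Index: accumulate the scores assigned to individual attributes."""
    return float(sum(indicator_scores))


def typology(score_a: float, score_b: float, cut_a: float, cut_b: float) -> str:
    """Typology: cross two dimensions into unordered (nominal) categories.

    Each dimension is split at a hypothetical cutoff. The returned label is a
    category name, not a quantity, so it carries no order or magnitude.
    """
    level_a = "high A" if score_a >= cut_a else "low A"
    level_b = "high B" if score_b >= cut_b else "low B"
    return f"{level_a} / {level_b}"


# Hypothetical respondent: three indicator scores, plus scores on two dimensions.
print(additive_index([2, 3, 1]))          # 6.0
print(typology(7, 2, cut_a=5, cut_b=5))   # high A / low B
```

The index yields a single number on one dimension, whereas the typology yields one of four labels that describe where two dimensions intersect.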

Comparative and non-comparative scaling

With comparative scaling, the items are directly compared with each other (example: Does one prefer Pepsi or Coke?). In noncomparative scaling, each item is scaled independently of the others (example: How does one feel about Coke?).

Comparative scaling techniques

* Pairwise comparison scale – a respondent is presented with two items at a time and asked to select one (example: does one prefer Pepsi or Coke?). This is an ordinal level technique when a measurement model is not applied. Krus and Kennedy (1977) elaborated paired comparison scaling within their domain-referenced model. The Bradley–Terry–Luce (BTL) model (Bradley and Terry, 1952; Luce, 1959) can be applied to derive measurements, provided the data derived from paired comparisons possess an appropriate structure. Thurstone's law of comparative judgment can also be applied in such contexts.
* Rasch model scaling – respondents interact with items, and comparisons between items are inferred from the responses to obtain scale values. Respondents are subsequently also scaled based on their responses to items, given the item scale values. The Rasch model is closely related to the BTL model.
* Rank-ordering – a respondent is presented with several items simultaneously and asked to rank them (example: Rank the following advertisements from 1 to 10.). This is an ordinal level technique.
* Bogardus social distance scale – measures the degree to which a person is willing to associate with a class or type of people. It asks how willing the respondent is to make various associations. The results are reduced to a single score on a scale. There are also non-comparative versions of this scale.
* Q-sort – up to 140 items are sorted into groups based on a rank-order procedure.
* Guttman scale – a procedure to determine whether a set of items can be rank-ordered on a unidimensional scale. It utilizes the intensity structure among several indicators of a given variable. Statements are listed in order of importance. The rating is scaled by summing all responses until the first negative response in the list. The Guttman scale is related to Rasch measurement; specifically, Rasch models bring the Guttman approach within a probabilistic framework.
* Constant sum scale – a respondent is given a constant sum of money, scrip, credits, or points and asked to allocate these to various items (example: If one had 100 yen to spend on food products, how much would one spend on product A, on product B, on product C, etc.?). This is an ordinal level technique.
* Magnitude estimation scale – in this psychophysical procedure, invented by S. S. Stevens, people simply assign numbers to the dimension of judgment. The geometric mean of those numbers usually produces a power law with a characteristic exponent. In cross-modality matching, instead of assigning numbers, people manipulate another dimension, such as loudness or brightness, to match the items. Typically the exponent of the psychometric function can be predicted from the magnitude estimation exponents of each dimension.
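The BTL model assumes that item i is preferred to item j with probability p_i / (p_i + p_j), where each p is a positive worth parameter. One common way to estimate the p values from a matrix of paired-comparison counts is the minorization-maximization (MM) iteration sketched below in Python with NumPy. It is one fitting method among several, and the function name and example counts are hypothetical.

```python
import numpy as np


def fit_bradley_terry(wins, n_iter: int = 1000, tol: float = 1e-10) -> np.ndarray:
    """Estimate BTL worth parameters p (scaled to sum to 1) from paired-comparison counts.

    wins[i, j] = number of times item i was preferred over item j (diagonal = 0).
    Each pass applies p_i <- W_i / sum_j n_ij / (p_i + p_j), where W_i is the total
    number of wins for item i and n_ij the number of i-versus-j comparisons.
    Assumes the maximum-likelihood estimate exists: in particular, no item may
    win all, or lose all, of its comparisons.
    """
    wins = np.asarray(wins, dtype=float)
    comparisons = wins + wins.T            # n_ij, symmetric, zero on the diagonal
    total_wins = wins.sum(axis=1)          # W_i
    p = np.full(wins.shape[0], 1.0 / wins.shape[0])
    for _ in range(n_iter):
        denom = (comparisons / (p[:, None] + p[None, :])).sum(axis=1)
        new_p = total_wins / denom
        new_p /= new_p.sum()               # the BTL scale is arbitrary; fix it to sum to 1
        if np.max(np.abs(new_p - p)) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical counts for three items (for example, three soft drinks).
counts = np.array([[0, 7, 8],
                   [3, 0, 6],
                   [2, 4, 0]])
worth = fit_bradley_terry(counts)
print(worth)                               # larger worth = more preferred
print(worth[0] / (worth[0] + worth[1]))    # model probability that item 0 beats item 1
```

The fitted worths place the items on a common scale, which is what lets paired comparisons go beyond the purely ordinal result mentioned above, provided the data fit the model.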

Non-comparative scaling techniques

* Visual analogue scale (also called the continuous rating scale and the graphic rating scale) – respondents rate items by placing a mark on a line. The line is usually labeled at each end. There is sometimes a series of numbers, called scale points (say, from 0 to 100), under the line. Scoring and codification are difficult for paper-and-pencil scales, but not for computerized and Internet-based visual analogue scales (U.-D. Reips and F. Funke, 2008, "Interval level measurement with visual analogue scales in Internet-based research: VAS Generator").
* Likert scale – respondents are asked to indicate the amount of agreement or disagreement (from strongly agree to strongly disagree) on a five- to nine-point response scale. A single such item is a Likert item, not to be confused with a Likert scale: the same format is used for multiple questions, and it is the combination of these questions that forms the Likert scale. This categorical scaling procedure can easily be extended to a magnitude estimation procedure that uses the full scale of numbers rather than verbal categories.
* Phrase completion scales – respondents are asked to complete a phrase on an 11-point response scale in which 0 represents the absence of the theoretical construct and 10 represents the theorized maximum amount of the construct being measured. The same basic format is used for multiple questions.
* Semantic differential scale – respondents are asked to rate an item on various attributes, each on a 7-point scale. Each attribute requires a scale with bipolar terminal labels.
* Stapel scale – a unipolar ten-point rating scale. It ranges from +5 to −5 and has no neutral zero point.
* Thurstone scale – a scaling technique that incorporates the intensity structure among indicators.
* Mathematically derived scale – researchers infer respondents' evaluations mathematically. Two examples are multidimensional scaling and conjoint analysis.
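Forming a Likert scale from its items amounts to combining the item responses into one score. A minimal Python sketch follows, assuming responses coded 1 (strongly disagree) through 5 (strongly agree), a simple sum as the combination rule, and optional reverse-coding of negatively worded items. These coding choices are common practice but are assumptions here, not part of the description above; some researchers average the items rather than sum them.

```python
from typing import Iterable, Sequence


def likert_scale_score(
    responses: Sequence[int],
    points: int = 5,
    reverse_keyed: Iterable[int] = (),
) -> int:
    """Sum Likert item responses (coded 1..points) into one scale score.

    Items whose indices appear in `reverse_keyed` are recoded as
    points + 1 - response, so that a high score always means more of the construct.
    """
    reverse = set(reverse_keyed)
    total = 0
    for i, r in enumerate(responses):
        if not 1 <= r <= points:
            raise ValueError(f"item {i}: response {r} outside 1..{points}")
        total += (points + 1 - r) if i in reverse else r
    return total


# Hypothetical three-item scale whose third item is negatively worded.
print(likert_scale_score([5, 4, 2], reverse_keyed=[2]))  # 5 + 4 + (6 - 2) = 13
```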

Scale evaluation

Scales should be tested for reliability, generalizability, and validity. Generalizability is the ability to make inferences from a sample to the population, given the scale one has selected. Reliability is the extent to which a scale will produce consistent results. Test-retest reliability checks how similar the results are if the research is repeated under similar circumstances. Alternative forms reliability checks how similar the results are if the research is repeated using different forms of the scale. Internal consistency reliability checks how well the individual measures included in the scale are converted into a composite measure.

Scales and indexes have to be validated. Internal validation checks the relation between the individual measures included in the scale and the composite scale itself. External validation checks the relation between the composite scale and other indicators of the variable, indicators not included in the scale. Content validation (also called face validity) checks how well the scale measures what it is supposed to measure. Criterion validation checks how meaningful the scale criteria are relative to other possible criteria. Construct validation checks what underlying construct is being measured. There are three variants of construct validity: convergent validity, discriminant validity, and nomological validity (Campbell and Fiske, 1959; Krus and Ney, 1978). The coefficient of reproducibility indicates how well the data from the individual measures included in the scale can be reconstructed from the composite scale.
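For a Guttman scale, the coefficient of reproducibility can be sketched in Python: score each respondent with the rule given for the Guttman scale above (count positive responses up to the first negative one), reproduce the ideal response pattern from that score, and count mismatches. Counting errors this way is one convention among several (others reproduce the pattern from the total number of positive responses), so the hypothetical helpers below may give values that differ from other methods. Responses are assumed to be coded 1 (positive) and 0 (negative), with items already in scale order.

```python
from typing import Sequence


def guttman_score(pattern: Sequence[int]) -> int:
    """Count positive (1) responses up to the first negative (0), items in scale order."""
    score = 0
    for response in pattern:
        if response != 1:
            break
        score += 1
    return score


def coefficient_of_reproducibility(patterns: Sequence[Sequence[int]]) -> float:
    """1 - errors / total responses.

    Errors are mismatches between each observed pattern and the ideal pattern
    reproduced from its score (score positives followed by negatives).
    """
    errors = 0
    total = 0
    for pattern in patterns:
        s = guttman_score(pattern)
        reproduced = [1] * s + [0] * (len(pattern) - s)
        errors += sum(a != b for a, b in zip(pattern, reproduced))
        total += len(pattern)
    return 1 - errors / total


# Hypothetical responses of five people to four items; only the third breaks the pattern.
data = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print([guttman_score(p) for p in data])      # [3, 2, 1, 4, 0]
print(coefficient_of_reproducibility(data))  # one error in 20 responses
```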


External links

* Handbook of Management Scales – Multi-item metrics to be used in research, Wikibooks