Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner. One of the key advantages of using content analysis to analyse social phenomena is its non-invasive nature, in contrast to simulating social experiences or collecting survey answers.
Practices and philosophies of content analysis vary between academic disciplines. They all involve systematic reading or observation of texts or artifacts which are assigned labels (sometimes called codes) to indicate the presence of interesting, meaningful pieces of content.
By systematically labeling the content of a set of texts, researchers can analyse patterns of content quantitatively using statistical methods, or use qualitative methods to analyse meanings of content within texts.
Computers are increasingly used in content analysis to automate the labeling (or coding) of documents. Simple computational techniques can provide descriptive data such as word frequencies and document lengths.
Machine learning classifiers can greatly increase the number of texts that can be labeled, but the scientific utility of doing so is a matter of debate. Further, numerous computer-aided text analysis (CATA) programs are available that analyze text for pre-determined linguistic, semantic, and psychological characteristics.
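As a minimal sketch of the simple descriptive measures mentioned above (word frequencies and document lengths), the following Python snippet tokenizes a small set of invented example documents and tabulates token counts; it is illustrative only, not a specific tool's implementation.
<syntaxhighlight lang="python">
# A minimal sketch of simple descriptive text measures: word frequencies and
# document lengths. The example documents are invented for illustration.
from collections import Counter
import re

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

documents = [
    "Content analysis is the study of documents and communication artifacts.",
    "Researchers label texts systematically and analyse the resulting patterns.",
]

# Document lengths, measured in tokens.
lengths = [len(tokenize(doc)) for doc in documents]

# Corpus-wide word frequencies.
frequencies = Counter(token for doc in documents for token in tokenize(doc))

print("Document lengths:", lengths)
print("Most frequent words:", frequencies.most_common(5))
</syntaxhighlight>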
Goals
Content analysis is best understood as a broad family of techniques. Effective researchers choose techniques that best help them answer their substantive questions. That said, according to
Klaus Krippendorff, six questions must be addressed in every content analysis:
#Which data are analyzed?
#How are the data defined?
#From what population are data drawn?
#What is the relevant context?
#What are the boundaries of the analysis?
#What is to be measured?
The simplest and most objective form of content analysis considers unambiguous characteristics of the text such as word frequencies, the page area taken by a newspaper column, or the duration of a radio or television program. Analysis of simple word frequencies is limited because the meaning of a word depends on surrounding text. Key Word In Context (KWIC) routines address this by placing words in their textual context. This helps resolve ambiguities such as those introduced by synonyms and homonyms.
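A minimal sketch of a KWIC routine is shown below; the window size, formatting, and sample sentence are illustrative choices rather than a standard.
<syntaxhighlight lang="python">
# A minimal sketch of a Key Word In Context (KWIC) routine.
import re

def kwic(text, keyword, window=4):
    """Return concordance lines showing `keyword` with `window` words of context."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left:>35} | {token} | {right}")
    return lines

sample = ("The bank raised interest rates on Monday, "
          "while the river bank flooded after heavy rain.")
for line in kwic(sample, "bank"):
    print(line)
</syntaxhighlight>
Seeing "bank" alongside its neighbouring words lets a coder separate the financial sense from the riverbank sense, the kind of ambiguity that a bare frequency count hides.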
A further step in analysis is the distinction between dictionary-based (quantitative) approaches and qualitative approaches. Dictionary-based approaches set up a list of categories derived from the frequency list of words and track the distribution of words and their respective categories across the texts. Whereas quantitative content analysis in this way transforms observations of found categories into quantitative statistical data, qualitative content analysis focuses more on intentionality and its implications. There are strong parallels between qualitative content analysis and
thematic analysis.
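The sketch below illustrates the dictionary-based idea under simplifying assumptions: the category dictionary and the document are hypothetical, whereas actual studies rely on validated dictionaries and pretested categories.
<syntaxhighlight lang="python">
# A minimal sketch of dictionary-based (quantitative) coding with a
# hypothetical category dictionary; real studies use validated dictionaries.
from collections import Counter
import re

category_dictionary = {
    "economy": {"market", "trade", "inflation", "jobs"},
    "environment": {"climate", "emissions", "pollution", "forest"},
}

def code_document(text):
    """Count how often words from each category occur in a document."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        for category, words in category_dictionary.items():
            if token in words:
                counts[category] += 1
    return counts

doc = "Rising inflation hit the jobs market while emissions from trade kept growing."
print(code_document(doc))  # Counter({'economy': 4, 'environment': 1})
</syntaxhighlight>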
Qualitative and quantitative content analysis
Quantitative content analysis highlights frequency counts and objective analysis of these coded frequencies.
Additionally, quantitative content analysis begins with a framed hypothesis with coding decided on before the analysis begins. These coding categories are strictly relevant to the researcher's hypothesis. Quantitative analysis also takes a deductive approach.
Examples of content-analytical variables and constructs can be found, for example, in the open-access database DOCA. This database compiles, systematizes, and evaluates relevant content-analytical variables of communication and political science research areas and topics.
Siegfried Kracauer provides a critique of quantitative analysis, asserting that it oversimplifies complex communications in order to be more reliable. On the other hand, qualitative analysis deals with the intricacies of latent interpretations, whereas quantitative analysis focuses on manifest meanings. He also acknowledges an "overlap" of qualitative and quantitative content analysis.
Patterns are looked at more closely in qualitative analysis, and based on the latent meanings that the researcher may find, the course of the research could be changed. It is inductive and begins with open research questions, as opposed to a hypothesis.
Codebooks
The data collection instrument used in content analysis is the codebook or coding scheme. In qualitative content analysis the codebook is constructed and improved ''during'' coding, while in quantitative content analysis the codebook needs to be developed and pretested for reliability and validity ''before'' coding.
The codebook includes detailed instructions for human coders, clear definitions of the respective concepts or variables to be coded, and the values to be assigned.
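To illustrate, a codebook entry can be thought of as structured data pairing a variable with its definition, coding instructions, and permitted values; the variables ("tone", "prominence") and values in the sketch below are hypothetical examples, not drawn from any published codebook.
<syntaxhighlight lang="python">
# A minimal sketch of how codebook entries might be represented for coders.
codebook = {
    "tone": {
        "definition": "Overall evaluative tone of the article toward its main actor.",
        "instructions": "Code the dominant tone of the full article, not individual quotes.",
        "values": {1: "negative", 2: "neutral or mixed", 3: "positive"},
    },
    "prominence": {
        "definition": "Placement of the article within the newspaper.",
        "instructions": "Code 1 only if the article appears on the front page.",
        "values": {0: "inside pages", 1: "front page"},
    },
}

# A coder consults the definition and instructions, then records one value per variable.
for variable, entry in codebook.items():
    print(variable, "->", entry["values"])
</syntaxhighlight>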
According to current standards of good scientific practice, each content analysis study should provide its codebook in the appendix or as supplementary material so that reproducibility of the study is ensured. On the Open Science Framework (OSF) server of the Center for Open Science, many codebooks of content analysis studies are freely available via a search for “codebook”.
Furthermore, the ''Database of Variables for Content Analysis'' (DOCA) provides an open access archive of pretested variables and established codebooks for content analyses. Measures from the archive can be adopted in future studies to ensure the use of high-quality and comparable instruments. DOCA covers, among others, measures for the content analysis of fictional media and entertainment (e.g., measures for sexualization in video games), of user-generated media content (e.g., measures for online hate speech), and of news media and journalism (e.g., measures for stock photo use in press reporting on child sexual abuse, and measures of personalization in election campaign coverage).
Computational tools
With the rise of common computing facilities like PCs, computer-based methods of analysis are growing in popularity. Answers to open ended questions, newspaper articles, political party manifestos, medical records or systematic observations in experiments can all be subject to systematic analysis of textual data.
When the contents of communication are available as machine-readable texts, the input can be analyzed for frequencies and coded into categories to build up inferences.
Computer-assisted analysis can help with large, electronic data sets by cutting out time and eliminating the need for multiple human coders to establish inter-coder reliability. However, human coders can still be employed for content analysis, as they are often more able to pick out nuanced and latent meanings in text. A study found that human coders were able to evaluate a broader range and make inferences based on latent meanings.
Reliability and Validity
Robert Weber notes: "To make valid inferences from the text, it is important that the classification procedure be reliable in the sense of being consistent: Different people should code the same text in the same way". Validity, inter-coder reliability and intra-coder reliability have been the subject of intense methodological research efforts over many years.
Neuendorf suggests that when human coders are used in content analysis at least two independent coders should be used.
Reliability of human coding is often measured using a statistical measure of ''inter-coder reliability'' or "the amount of agreement or correspondence among two or more coders".
Lacy and Riffe identify the measurement of inter-coder reliability as a strength of quantitative content analysis, arguing that, if content analysts do not measure inter-coder reliability, their data are no more reliable than the subjective impressions of a single reader.
According to today's reporting standards, quantitative content analyses should be published with complete codebooks, and appropriate inter-coder or inter-rater reliability coefficients, based on empirical pre-tests, should be reported for all variables or measures in the codebook.
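A minimal sketch of two such measures is given below, assuming two coders have assigned categorical codes to the same ten units; the codes themselves are invented for illustration. It computes simple percent agreement and Cohen's kappa, a coefficient also mentioned later in this article.
<syntaxhighlight lang="python">
# A minimal sketch of two inter-coder reliability measures: simple percent
# agreement and Cohen's kappa. The two coders' codes are invented.
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of units on which both coders assigned the same code."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(coder_a) | set(coder_b))
    return (observed - expected) / (1 - expected)

# Hypothetical codes for ten articles (1 = negative, 2 = neutral, 3 = positive).
coder_a = [1, 2, 2, 3, 1, 2, 3, 3, 2, 1]
coder_b = [1, 2, 3, 3, 1, 2, 3, 2, 2, 1]

print(f"Percent agreement: {percent_agreement(coder_a, coder_b):.2f}")  # 0.80
print(f"Cohen's kappa:     {cohens_kappa(coder_a, coder_b):.2f}")       # ~0.70
</syntaxhighlight>
Chance-corrected coefficients such as Cohen's kappa (or alternatives like Krippendorff's alpha) are generally preferred over raw agreement, since raw agreement overstates reliability when one category dominates the data.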
Furthermore, the validity of all variables or measures in the codebook must be ensured. This can be achieved through the use of established measures that have proven their validity in earlier studies. Also, the
content validity of the measures can be checked by experts from the field who scrutinize and then approve or correct coding instructions, definitions and examples in the codebook.
Kinds of text
There are five types of texts in content analysis:
# written text, such as books and papers
# oral text, such as speech and theatrical performance
# iconic text, such as drawings, paintings, and icons
# audio-visual text, such as TV programs, movies, and videos
# hypertexts, which are texts found on the Internet
History
Content analysis is research using the categorization and classification of speech, written text, interviews, images, or other forms of communication. In its beginnings, with the first newspapers at the end of the 19th century, analysis was done manually by measuring the number of columns devoted to a subject. The approach can also be traced back to a university student studying patterns in Shakespeare's literature in 1893.
Over the years, content analysis has been applied to a variety of scopes.
Hermeneutics and philology have long used content analysis to interpret sacred and profane texts and, in many cases, to attribute texts' authorship and authenticity.
In recent times, particularly with the advent of mass communication, content analysis has seen increasing use to analyze and understand media content and media logic in depth.
The political scientist Harold Lasswell formulated the core questions of content analysis in its early-to-mid 20th-century mainstream version: "Who says what, to whom, why, to what extent and with what effect?". The strong emphasis on a quantitative approach begun by Lasswell was carried forward by another "father" of content analysis, Bernard Berelson, who proposed a definition of content analysis which, from this point of view, is emblematic: "a research technique for the objective, systematic and quantitative description of the manifest content of communication".
Quantitative content analysis has enjoyed renewed popularity in recent years thanks to technological advances and fruitful application in mass communication and personal communication research. Content analysis of the textual big data produced by new media, particularly social media and mobile devices, has become popular. These approaches take a simplified view of language that ignores the complexity of semiosis, the process by which meaning is formed out of language. Quantitative content analysts have been criticized for limiting the scope of content analysis to simple counting, and for applying the measurement methodologies of the natural sciences without reflecting critically on their appropriateness to social science.
Conversely, qualitative content analysts have been criticized for being insufficiently systematic and too impressionistic.
Krippendorff argues that quantitative and qualitative approaches to content analysis tend to overlap, and that there can be no generalisable conclusion as to which approach is superior.
Content analysis can also be described as studying traces, which are documents from past times, and artifacts, which are non-linguistic documents. Texts are understood to be produced by communication processes in a broad sense of that phrase, often gaining meaning through abduction.
Latent and manifest content
Manifest content is readily understandable at its face value. Its meaning is direct. Latent content is not as overt, and requires interpretation to uncover the meaning or implication.
Uses
Holsti groups fifteen uses of content analysis into three basic categories:
* make inferences about the antecedents of a communication
* describe and make inferences about characteristics of a communication
* make inferences about the effects of a communication.
He also places these uses into the context of the basic communication paradigm.
The following table shows fifteen uses of content analysis in terms of their general purpose, element of the communication paradigm to which they apply, and the general question they are intended to answer.
As a counterpoint, there are limits to the scope of use for the procedures that characterize content analysis. In particular, if access to the goal of analysis can be obtained by direct means without material interference, then direct measurement techniques yield better data. Content analysis attempts to quantifiably describe communications whose features are primarily categorical (usually limited to a nominal or ordinal scale) via selected conceptual units (the unitization) which are assigned values (the categorization) for enumeration while monitoring intercoder reliability. If instead the target quantity is already directly measurable, typically on an interval or ratio scale and especially as a continuous physical quantity, then such targets usually are not listed among those needing the "subjective" selections and formulations of content analysis.
For example (from mixed research and clinical application), as medical images communicate diagnostic features to physicians, neuroimaging's stroke (infarct) volume scale called ASPECTS is ''unitized'' as 10 qualitatively delineated (unequal) brain regions in the middle cerebral artery territory, which it ''categorizes'' as being at least partly versus not at all infarcted in order to ''enumerate'' the latter, with published series often assessing ''intercoder reliability'' by Cohen's kappa. The foregoing italicized operations impose the uncredited form of content analysis onto an estimation of infarct extent, which instead is easily enough and more accurately measured as a volume directly on the images. ("Accuracy ... is the highest form of reliability.") The concomitant clinical assessment, however, by the National Institutes of Health Stroke Scale (NIHSS) or the modified Rankin Scale (mRS), retains the necessary form of content analysis. Recognizing potential limits of content analysis across the contents of language and images alike, Klaus Krippendorff affirms that "comprehension ... may ... not conform at all to the process of classification and/or counting by which most content analyses proceed," suggesting that content analysis might materially distort a message.
The development of the initial coding scheme
The process of developing the initial coding scheme, or approach to coding, is contingent on the particular content analysis approach selected. In a directed content analysis, scholars draft a preliminary coding scheme from pre-existing theory or assumptions, while in the conventional content analysis approach the initial coding scheme is developed from the data.
The conventional process of coding
With either approach above, researchers are advised to immerse themselves in the data to obtain an overall picture. Furthermore, identifying a consistent and clear unit of coding is vital; researchers' choices range from a single word to several paragraphs, and from texts to iconic symbols. Last, researchers construct relationships between codes by sorting them into specific categories or themes.
See also
* Donald Wayne Foster
* Hermeneutics
* Text mining
* ''The Polish Peasant in Europe and America''
* Transition words
* Video content analysis
References
Further reading
* Budge, Ian (ed.) (2001). ''Mapping Policy Preferences. Estimates for Parties, Electors and Governments 1945–1998''. Oxford, UK: Oxford University Press.
* Krippendorff, Klaus, and Bock, Mary Angela (eds) (2008). ''The Content Analysis Reader''. Thousand Oaks, CA: Sage.
* Neuendorf, Kimberly A. (2017). ''The Content Analysis Guidebook'', 2nd ed. Thousand Oaks, CA: Sage.
* Roberts, Carl W. (ed.) (1997). ''Text Analysis for the Social Sciences: Methods for Drawing Inferences from Texts and Transcripts''. Mahwah, NJ: Lawrence Erlbaum.
* Wimmer, Roger D. and Dominick, Joseph R. (2005). ''Mass Media Research: An Introduction'', 8th ed. Belmont, CA: Wadsworth.