Social sequence analysis

In the social sciences, sequence analysis (SA) is concerned with the analysis of sets of categorical sequences that typically describe longitudinal data. The analyzed sequences are encoded representations of, for example, individual life trajectories such as family formation, school-to-work transitions, or working careers, but they may also describe daily or weekly time use, or represent the evolution of observed or self-reported health, of political behaviors, or of the development stages of organizations. Such sequences are chronologically ordered, unlike words or DNA sequences, for example. SA is a longitudinal analysis approach that is holistic in the sense that it considers each sequence as a whole. SA is essentially exploratory. Broadly, it provides a comprehensible overall picture of sets of sequences, with the objective of characterizing the structure of the set, finding the salient characteristics of groups, identifying typical paths, comparing groups, and more generally studying how the sequences are related to covariates such as sex, birth cohort, or social origin. Introduced in the social sciences in the 1980s by Andrew Abbott, SA gained much popularity after the release of dedicated software such as the SQ and SADI add-ons for Stata and the TraMineR R package with its companions TraMineRextras and WeightedCluster. Despite some connections, the aims and methods of SA in the social sciences differ strongly from those of sequence analysis in bioinformatics.


History

Sequence analysis methods were first imported into the social sciences from the information and biological sciences (see Sequence alignment) by the University of Chicago sociologist Andrew Abbott in the 1980s, and they have since developed in ways that are unique to the social sciences. Scholars in psychology, economics, anthropology, demography, communication, political science, organizational studies, and especially sociology have been using sequence methods ever since.

In sociology, sequence techniques are most commonly employed in studies of patterns of life-course development, cycles, and life histories. There has been a great deal of work on the sequential development of careers, and there is increasing interest in how career trajectories intertwine with life-course sequences. Many scholars have used sequence techniques to model how work and family activities are linked in household divisions of labor and the problem of schedule synchronization within families. The study of interaction patterns is increasingly centered on sequential concepts, such as turn-taking, the predominance of reciprocal utterances, and the strategic solicitation of preferred types of responses (see Conversation analysis). Social network analysts (see Social network analysis) have begun to turn to sequence methods and concepts to understand how social contacts and activities are enacted in real time, and to model and depict how whole networks evolve. Social network epidemiologists have begun to examine social contact sequencing to better understand the spread of disease. Psychologists have used these methods to study how the order of information affects learning, and to identify structure in interactions between individuals (see Sequence learning).

Many of the methodological developments in sequence analysis came on the heels of a special section devoted to the topic in a 2000 issue of ''Sociological Methods & Research'', which hosted a debate over the use of the optimal matching (OM) edit distance for comparing sequences. In particular, sociologists objected to the descriptive and data-reducing orientation of optimal matching, as well as to a lack of fit between bioinformatic sequence methods and uniquely social phenomena. The debate has given rise to several methodological innovations (see Pairwise dissimilarities below) that address limitations of early sequence comparison methods developed in the 20th century. In 2006, David Stark and Balazs Vedres proposed the term "social sequence analysis" to distinguish the approach from bioinformatic sequence analysis. However, with the exception of the book by Benjamin Cornwell, the term has seldom been used, probably because the context prevents any confusion in the SA literature. ''Sociological Methods & Research'' organized a special issue on sequence analysis in 2010, leading to what Aisenbrey and Fasang referred to as the "second wave of sequence analysis", which mainly extended optimal matching and introduced other techniques to compare sequences. Alongside sequence comparison, recent advances in SA have concerned, among others, the visualization of sets of sequence data, the measurement and analysis of the discrepancy of sequences, the identification of representative sequences, and the development of summary indicators of individual sequences. Raab and Struffolino have characterized more recent advances as the third wave of sequence analysis. This wave is largely characterized by the effort to bring together the stochastic and the algorithmic modeling cultures by jointly applying SA with more established methods such as analysis of variance, event history analysis, Markovian modeling, social network analysis, or causal analysis and statistical modeling in general.


Domain-specific theoretical foundation


Sociology

The analysis of sequence patterns has foundations in sociological theories that emerged in the middle of the 20th century. Structural theorists argued that society is a system characterized by regular patterns. Even seemingly trivial social phenomena are ordered in highly predictable ways. This idea serves as an implicit motivation behind social sequence analysts' use of optimal matching, clustering, and related methods to identify common "classes" of sequences at all levels of social organization, a form of pattern search. This focus on regularized patterns of social action has become an increasingly influential framework for understanding microsocial interaction and contact sequences, or "microsequences." It is closely related to Anthony Giddens's theory of structuration, which holds that social actors' behaviors are predominantly structured by routines, which in turn provide predictability and a sense of stability in an otherwise chaotic and rapidly moving social world. The idea is also echoed in Pierre Bourdieu's concept of habitus, which emphasizes the emergence and influence of stable worldviews that guide everyday action and thus produce predictable, orderly sequences of behavior. The role of routine as a structuring influence on social phenomena was first illustrated empirically by Pitirim Sorokin, who led a 1939 study that found that daily life is so routinized that a given person is able to predict with about 75% accuracy how much time they will spend doing certain things the following day. Talcott Parsons's argument that all social actors are mutually oriented to their larger social systems (for example, their family and larger community) through social roles also underlies social sequence analysts' interest in the linkages that exist between different social actors' schedules and ordered experiences, which has given rise to a considerable body of work on synchronization between social actors and their social contacts and larger communities. All of these theoretical orientations together warrant critiques of the general linear model of social reality, which as applied in most work implies that society is either static or highly stochastic in a manner that conforms to Markov processes. This concern inspired the initial framing of social sequence analysis as an antidote to general linear models. It has also motivated recent attempts to model sequences of activities or events as elements that link social actors in non-linear network structures. This work, in turn, is rooted in Georg Simmel's theory that experiencing similar activities, experiences, and statuses serves as a link between social actors.


Demography and historical demography

In demography and historical demography, the rapid appropriation of the life course perspective and methods from the 1980s onward was part of a substantive paradigmatic change that implied a stronger embedding of demographic processes in social science dynamics. After a first phase focused on the occurrence and timing of demographic events, studied separately from each other with a hypothetico-deductive approach, the need to consider the structure of life courses and to do justice to their complexity led, from the early 2000s, to a growing use of sequence analysis with the aim of pursuing a holistic approach. At an inter-individual level, pairwise dissimilarities and clustering appeared as the appropriate tools for revealing the heterogeneity in human development. For example, the meta-narratives contrasting individualized Western societies with collectivist societies in the South (especially in Asia) were challenged by comparative studies revealing the diversity of pathways to legitimate reproduction. At an intra-individual level, sequence analysis integrates the basic life course principle that individuals interpret and make decisions about their lives according to their past experiences and their perception of contingencies. Interest in this perspective was also promoted by the changes in individuals' life courses for cohorts born between the beginning and the end of the 20th century. These changes have been described as de-standardization, de-synchronization, and de-institutionalization. Among the drivers of these dynamics, the transition to adulthood is key: for more recent birth cohorts, this crucial phase of individual life courses involved a larger number of events and more varied lengths of the state spells experienced.
For example, many postponed leaving the parental home and the transition to parenthood; in some contexts cohabitation replaced marriage as a long-lasting living arrangement; and the birth of the first child occurs more frequently while the parents cohabit rather than within wedlock. Such complexity had to be measured in order to compare quantitative indicators across birth cohorts, a line of questioning that has also been extended to populations from low- and middle-income countries. Demography's long-standing ambition to develop a 'family demography' has found in sequence analysis a powerful tool for addressing research questions at the crossroads with other disciplines: for example, multichannel techniques offer valuable opportunities to deal with the issue of compatibility between working and family lives. Similarly, more recent combinations of sequence analysis and event history analysis have been developed and can be applied, for instance, to understanding the link between demographic transitions and health.


Political sciences

The analysis of temporal processes in the domain of political sciences concerns how institutions, that is, systems and organizations (regimes, governments, parties, courts, etc.), crystallize political interactions, formalize legal constraints, and impose a degree of stability or inertia. Special importance is given, first, to the role of contexts, which confer meaning on trends and events, while shared contexts offer shared meanings; second, to changes over time in power relationships and, subsequently, asymmetries, hierarchies, contention, or conflict; and, finally, to historical events that are able to shape trajectories, such as elections, accidents, inaugural speeches, treaties, revolutions, or ceasefires. Empirically, the unit of analysis of political sequences can be individuals, organizations, movements, or institutional processes. Depending on the unit of analysis, the sample size may be limited to a few cases (e.g., regions in a country when considering the turnover of local political parties over time) or include a few hundred (e.g., individuals' voting patterns). Three broad kinds of political sequences may be distinguished. The first and most common is ''careers'', that is, formal, mostly hierarchical positions along which individuals progress in institutional environments, such as parliaments, cabinets, administrations, parties, unions, or business organizations. We may name ''trajectories'' the political sequences that develop in more informal and fluid contexts, such as activists evolving across various causes and social movements (Fillieule, O. and Blanchard, P. (2013). Fighting Together. Assessing Continuity and Change in Social Movement Organizations Through the Study of Constituencies' Heterogeneity. In A Political Sociology of Transnational Europe, chapter 4. ECPR Press, Colchester), or voters navigating a political and ideological landscape across successive polls.
Finally, ''processes'' relate to non-individual entities, such as public policies developing through successive policy stages across distinct arenas; sequences of symbolic or concrete interactions between national and international actors in diplomatic and military contexts; and the development of organizations or institutions, such as pathways of countries towards democracy (Wilson 2014).


Concepts

A sequence ''s'' is an ordered list of elements (''s''_1, ''s''_2, ..., ''s''_''l'') taken from a finite alphabet ''A''. For a set ''S'' of sequences, three sizes matter: the number ''n'' of sequences, the size ''a'' = |''A''| of the alphabet, and the length ''l'' of the sequences (which may differ from one sequence to another). In the social sciences, ''n'' is generally between a few hundred and a few thousand, the alphabet size remains limited (most often less than 20), and sequence length rarely exceeds 100.

We may distinguish between ''state sequences'' and ''event sequences'': states last, while events occur at one time point and do not last but contribute, possibly together with other events, to state changes. For instance, the joint occurrence of the two events leaving home and starting a union provokes a state change from 'living at home with parents' to 'living with a partner'.

When a state sequence is represented as the list of states observed at the successive time points, the position of each element in the sequence conveys this time information and the distance between positions reflects duration. An alternative, more compact representation of a sequence is the list of the successive spells stamped with their duration, where a ''spell'' (also called ''episode'') is a substring in the same state. For example, in ''aabbbc'', ''bbb'' is a spell of length 3 in state ''b'', and the whole sequence can be represented as (''a'',2)-(''b'',3)-(''c'',1).

A crucial point when looking at state sequences is the timing scheme used to time-align the sequences. This could be historical calendar time, or a process time such as age, i.e. time since birth. In event sequences, positions do not convey any time information. Therefore, event occurrence time must be explicitly provided (as a timestamp) when it matters. SA is essentially concerned with state sequences.
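The spell representation amounts to run-length encoding of a state sequence. The following is a minimal sketch in Python, using the toy sequence that corresponds to (''a'',2)-(''b'',3)-(''c'',1); the function names `to_spells` and `from_spells` are illustrative and do not come from any SA package.

```python
from itertools import groupby


def to_spells(states):
    """Collapse a position-wise state sequence into (state, duration) spells."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(states)]


def from_spells(spells):
    """Expand (state, duration) spells back into the position-wise sequence."""
    return [state for state, duration in spells for _ in range(duration)]


sequence = list("aabbbc")      # one state per time point
spells = to_spells(sequence)   # compact spell representation
print(spells)                  # [('a', 2), ('b', 3), ('c', 1)]
```

Both representations carry the same information for state sequences observed at regularly spaced time points, so converting back and forth is lossless.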


Methods

Conventional SA consists essentially of building a typology of the observed trajectories. Abbott and Tsay (2000) describe this typical SA as a three-step program: (1) coding individual narratives as sequences of states; (2) measuring pairwise dissimilarities between sequences; and (3) clustering the sequences from the pairwise dissimilarities. However, SA covers much more and also encompasses, among others, the description and visual rendering of sets of sequences, ANOVA-like analysis and regression trees for sequences, the identification of representative sequences, the study of the relationship between linked sequences (e.g. dyadic, linked-lives, or various life dimensions such as occupation, family, health), and sequence networks.


Describing and rendering state sequences

Given an alignment rule, a set of sequences can be represented in tabular form with sequences in rows and columns corresponding to the positions in the sequences.


Sequences of cross-sectional distributions

To describe such data, we may look at the columns and consider the cross-sectional state distributions at the successive positions. The ''chronogram'' or ''density plot'' of a set of sequences renders these successive cross-sectional distributions. For each (column) distribution we can compute characteristics such as the entropy or the modal state and look at how these values evolve over the positions.
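As an illustration of the column-wise view, the sketch below computes the cross-sectional state distribution at each position, together with its entropy and modal state. It assumes aligned sequences of equal length and uses the unnormalized Shannon entropy with natural logarithms; the toy data and function names are hypothetical.

```python
from collections import Counter
from math import log


def state_distributions(sequences):
    """Return, for each position, the proportion of sequences in each state."""
    length = len(sequences[0])
    distributions = []
    for t in range(length):
        counts = Counter(seq[t] for seq in sequences)
        total = sum(counts.values())
        distributions.append({s: c / total for s, c in counts.items()})
    return distributions


def entropy(distribution):
    """Unnormalized Shannon entropy (natural log) of one state distribution."""
    return sum(p * log(1 / p) for p in distribution.values() if p > 0)


def modal_state(distribution):
    """Most frequent state of one cross-sectional distribution."""
    return max(distribution, key=distribution.get)


# Toy data (an assumption for illustration): four aligned sequences of length 4.
sequences = ["aabb", "abbb", "aabc", "abcc"]
for t, dist in enumerate(state_distributions(sequences), start=1):
    print(t, modal_state(dist), round(entropy(dist), 3))
```

The list returned by `state_distributions` is the data that a chronogram plots; the entropy series summarizes how diverse the states are at each position.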


Characteristics of individual sequences

Alternatively, we can look at the rows. The ''index plot'', where each sequence is represented as a horizontal stacked bar or line, is the basic plot for rendering individual sequences. We can also compute characteristics of the individual sequences and examine the cross-sectional distribution of these characteristics. The main indicators of individual sequences are:
* Basic measures
** Length
** Number of states visited
** Number of transitions (length of the sequence of distinct successive states, DSS, minus one)
** Number of subsequences
** Recurrence
* Diversity
** Within-sequence entropy
** Variance of spell duration
* Complexity of the sequence structure
** Volatility
** Complexity index
** Turbulence
* Measures that take account of the nature of the states
** Normative volatility, i.e. the proportion of positive spells
** Integration index, also known as quality index
** Degradation
** Badness
** Precarity index
** Insecurity
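A few of the basic and diversity indicators listed above can be sketched in a handful of lines. The code below is an illustrative assumption rather than the definition used by any particular package: the number of transitions is taken as the DSS length minus one, and the within-sequence entropy is optionally normalized by the logarithm of the alphabet size.

```python
from collections import Counter
from itertools import groupby
from math import log


def dss(states):
    """Sequence of distinct successive states (consecutive repeats removed)."""
    return [state for state, _ in groupby(states)]


def n_transitions(states):
    """Number of state changes, i.e. DSS length minus one."""
    return max(len(dss(states)) - 1, 0)


def n_states_visited(states):
    """Number of distinct states appearing in the sequence."""
    return len(set(states))


def within_entropy(states, alphabet_size=None):
    """Entropy of the time spent in each state; normalized if alphabet_size > 1."""
    n = len(states)
    h = sum((c / n) * log(n / c) for c in Counter(states).values())
    if alphabet_size and alphabet_size > 1:
        h /= log(alphabet_size)
    return h


seq = "aabbbc"
print(dss(seq), n_transitions(seq), n_states_visited(seq))
print(round(within_entropy(seq, alphabet_size=3), 3))
```

Applying such functions to every row of the sequence table yields one value per individual, whose distribution can then be examined or related to covariates.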


Other overall descriptive measures

* Mean time in the different states (overall state distribution) and their standard errors
* Transition probabilities between states
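Both overall measures can be estimated by simple counting. The sketch below assumes position-wise state sequences and estimates the transition probability from state ''i'' to state ''j'' as the share of positions in state ''i'' (excluding the last position of each sequence) that are followed by state ''j''; the function names and toy data are illustrative.

```python
from collections import Counter


def transition_rates(sequences):
    """Estimate P(next state = j | current state = i) by pooled counts."""
    pair_counts = Counter()
    origin_counts = Counter()
    for seq in sequences:
        for origin, destination in zip(seq, seq[1:]):
            pair_counts[(origin, destination)] += 1
            origin_counts[origin] += 1
    return {pair: c / origin_counts[pair[0]] for pair, c in pair_counts.items()}


def mean_time_in_states(sequences):
    """Average number of positions spent in each state per sequence."""
    totals = Counter()
    for seq in sequences:
        totals.update(seq)
    return {state: total / len(sequences) for state, total in totals.items()}


sequences = ["aab", "abb"]  # toy data, an assumption for illustration
print(transition_rates(sequences))
print(mean_time_in_states(sequences))
```

Note that the diagonal entries (staying in the same state) are included, so the rates out of each origin state sum to one.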


Visualization

State sequences lend themselves well to graphical rendering, and such plots prove useful for interpretation purposes. As shown above, the two basic plots are the index plot, which renders individual sequences, and the chronogram, which renders the evolution of the cross-sectional state distribution along the timeframe. Chronograms (also known as status proportion plots or state distribution plots) completely overlook the diversity of the sequences, while index plots are often too scattered to be readable. Relative frequency plots and plots of representative sequences attempt to increase the readability of index plots without falling into the oversimplification of a chronogram. In addition, there are many plots that focus on specific characteristics of the sequences. Below is a list of plots that have been proposed in the literature for rendering large sets of sequences. For each plot, we give examples of software (details in the Software section) that produce it.
* Index plot: renders the set of individual sequences (SADI, SQ, TraMineR)
* Chronogram (status proportion plot, state distribution plot): renders the sequence of cross-sectional distributions (SADI, SQ, TraMineR)
* Plot of multichannel sequences grouped by channels (seqHMM) or by individuals
* Plot of time series of cross-sectional indicators (entropy, modal state, ...) (SQ, TraMineR)
* Frequency plot (SQ, TraMineR)
* Relative frequency plot (TraMineRextras)
* Representative sequences (TraMineR)
* Mean time in the different states and their standard errors (TraMineR)
* State survival plot (TraMineRextras)
* Transition patterns (SADI)
* Transition plot (SQ; Gmisc) and plot of transition probabilities (seqHMM)
* Parallel coordinate plot (TraMineR, SQ)
* Probabilistic suffix trees (PST)
* Sequence networks (see social network analysis) (software not specified)
* Narrative networks (software not specified)


Pairwise dissimilarities

Pairwise dissimilarities between sequences serve to compare sequences, and many advanced SA methods are based on these dissimilarities. The most popular dissimilarity measure is ''optimal matching'' (OM), i.e. the minimal cost of transforming one sequence into the other by means of indel (insert or delete) and substitution operations, where the costs of these elementary operations may depend on the states involved. SA is so intimately linked with OM that it is sometimes named optimal matching analysis (OMA). There are roughly three categories of dissimilarity measures:
* Optimal matching and other edit distances
** Examples: OM, OMloc (localized OM), OMslen (spell-length sensitive OM), OMspell (OM of spell sequences), OMstran (OM of sequences of transitions), TWED (time-warp edit distance), HAM (Hamming and generalized Hamming), DHD (dynamic Hamming)
** Strategies for setting the substitution and indel costs
*** Constant costs (all substitution costs identical and a single indel cost)
*** Theory-based costs
*** Feature-based costs
*** Data-driven costs: based on transition probabilities or state frequencies
* Measures based on the count of common attributes
** Examples: LCS (derived from the length of the longest common subsequence), LCP (from the length of the longest common prefix), NMS (number of matching subsequences), and NMSMST and SVRspell, two variants of NMS
* Distances between within-sequence state distributions
** Examples: CHI2 and EUCLID, defined as the average of, respectively, the chi-squared and Euclidean distances between state distributions in successive sliding windows
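To make the definition of OM concrete, the following sketch computes it with the standard dynamic-programming recursion for edit distances. The default costs (indel cost 1, constant substitution cost 2) are assumptions chosen for illustration, not values prescribed by the text; state-dependent costs can be passed through the hypothetical `sub_cost` argument. A simple Hamming distance is included for comparison.

```python
def om_distance(x, y, indel=1.0, sub_cost=lambda a, b: 2.0):
    """Minimal total cost of turning x into y with indels and substitutions."""
    n, m = len(x), len(y)
    # d[i][j] = cost of transforming the first i states of x into the first j of y
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = x[i - 1] == y[j - 1]
            substitution = 0.0 if same else sub_cost(x[i - 1], y[j - 1])
            d[i][j] = min(
                d[i - 1][j] + indel,             # delete x[i-1]
                d[i][j - 1] + indel,             # insert y[j-1]
                d[i - 1][j - 1] + substitution,  # match or substitute
            )
    return d[n][m]


def hamming_distance(x, y, sub_cost=lambda a, b: 1.0):
    """Position-wise mismatch cost; defined only for equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(sub_cost(a, b) for a, b in zip(x, y) if a != b)


print(om_distance("aabbbc", "aabbcc"))  # one substitution, or one delete plus one insert
print(om_distance("abab", "baba"))      # shifting by one position costs two indels
```

The second call illustrates why indels matter: a one-position time shift is cheap under OM, whereas a purely position-wise comparison would count every position as a mismatch.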


Dissimilarity-based analysis

Pairwise dissimilarities between sequences give access to a series of techniques for discovering holistic structuring characteristics of the sequence data. In particular, dissimilarities between sequences can serve as input to cluster algorithms and multidimensional scaling, but they also make it possible to identify medoids or other representative sequences, define neighborhoods, measure the discrepancy of a set of sequences, carry out ANOVA-like analyses, and grow regression trees.
* Cluster analysis
** Descriptive: identification of main sequence patterns
** Clusters as dependent or independent variables in regression analysis: study of relationships with other variables of interest
* Multidimensional scaling (principal coordinates): numerical representation of sequences
* Discrepancy (ANOVA-like) analysis
** Sequence of ANOVA-like analyses
* Regression trees
* Representative sequences
* Multiple domains (multichannel analysis)
* Dyadic and polyadic sequence data
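Two of these dissimilarity-based quantities are simple enough to sketch directly from a distance matrix. The code below assumes a symmetric matrix given as a list of lists; the medoid is taken as the sequence with the smallest total dissimilarity to all others, and the discrepancy is computed under one common dissimilarity-based definition, the sum of all pairwise dissimilarities divided by 2''n''², which is stated here as an assumption rather than as the definition used in any specific study.

```python
def medoid_index(dist):
    """Index of the sequence with the smallest sum of dissimilarities to the others."""
    return min(range(len(dist)), key=lambda i: sum(dist[i]))


def discrepancy(dist):
    """Pseudo-variance of a set: sum of all pairwise dissimilarities / (2 * n^2)."""
    n = len(dist)
    return sum(sum(row) for row in dist) / (2 * n * n)


# Toy symmetric dissimilarity matrix for three sequences (an assumption).
dist = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
print(medoid_index(dist))  # 1: the middle sequence is closest to the other two
print(discrepancy(dist))
```

Comparing the discrepancy within groups with the discrepancy of the whole set is the idea behind the ANOVA-like analyses mentioned in the list.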


Other methods of analysis

Although dissimilarity-based methods play a central role in social SA, essentially because of their ability to preserve the holistic perspective, several other approaches also prove useful for analyzing sequence data.
* Non-dissimilarity-based clustering
** Latent class analysis (LCA)
** Markov model mixture and hidden Markov model mixture
** Mixtures of exponential-distance models
* Sequence networks
** Representing a single sequence as a network
** Meta network of sequences
** Sequence network measures
** Life history graph
* Probabilistic approaches
** Markovian and other transition distribution models (see also Markov model)
** Probabilistic suffix tree (PST), also known as variable-order Markov model or variable-length Markov model
* Event sequences
** Event structure models
** Rendering of event sequences (parallel coordinate plots, ...)
** Frequent subsequences
** Discriminant subsequences
** Dissimilarity-based analysis of event sequences


Advances: the third wave of sequence analysis

Some recent advances can be conceived as the ''third wave of SA''. This wave is largely characterized by the effort to bring together the stochastic and the algorithmic modeling cultures by jointly applying SA with more established methods such as analysis of variance, event history analysis, network analysis, or causal analysis and statistical modeling in general. Some examples are given below; see also "Other methods of analysis".
* Effect of past trajectories on the hazard of an event: Sequence History Analysis (SHA)
* Effect of time-varying covariates on trajectories: Competing Trajectories Analysis (CTA) and Sequence Analysis Multistate Model (SAMM)
* Validation of cluster typologies
* Discrepancy analysis to bring time back to qualitative comparative analysis (QCA)


Open issues and limitations

Although SA sees a steady inflow of methodological contributions that address the issues raised two decades ago, some pressing open issues remain. Among the most challenging are:
* Sequences of different lengths, truncated sequences, and missing values.
* Validation of cluster results.
* Sequence length versus importance of recency: for example, when analyzing 40-year-long biographic sequences from age 1 to 40, one can only consider individuals born at least 40 years earlier, and the behavior of younger birth cohorts is therefore disregarded.
Up-to-date information on advances, methodological discussions, and recent relevant publications can be found on the Sequence Analysis Association webpage.


Fields of application

These techniques have proved valuable in a variety of contexts. In life-course research, for example, research has shown that retirement plans are affected not just by the last year or two of one's life, but also by how one's work and family careers unfolded over a period of several decades. People who followed an "orderly" career path (characterized by consistent employment and gradual ladder-climbing within a single organization) retired earlier than others, including people who had intermittent careers, those who entered the labor force late, and those who enjoyed regular employment but made numerous lateral moves across organizations throughout their careers.

In the field of economic sociology, research has shown that firm performance depends not just on a firm's current or recent social network connectedness, but also on the durability or stability of its connections to other firms. Firms that have more "durably cohesive" ownership network structures attract more foreign investment than those with less stable or poorly connected structures. Research has also used data on everyday work activity sequences to identify classes of work schedules, finding that the timing of work during the day significantly affects workers' abilities to maintain connections with the broader community, such as through community events. More recently, social sequence analysis has been proposed as a meaningful approach to studying trajectories in the domain of creative enterprise, allowing comparison among the idiosyncrasies of unique creative careers.

While other methods for constructing and analyzing whole sequence structure have been developed during the past three decades, including event structure analysis, OM and other sequence comparison methods form the backbone of research on whole sequence structures. Some examples of application include:

Sociology
* Labor market entry sequences
* De-standardization of the life course
* Residential trajectories
* Time use
* Actual and idealized relationship scripts
* Basic types of figures in ritual dances
* Pathways of alcohol consumption
Demography and historical demography
* Transition to adulthood
* Partnership biographies
* Family formation life course
* Childbirth histories
Political sciences
* Pathways towards democratization
* Pathways of legislative processes
* Bargaining between actors during national crises
Psychology
* Sequences of adolescents' social interactions
Medical research
* Care trajectory in chronic disease
Survey methodology
* Response in survey collection
Geography
* Mobility studies
* Land use
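
The OM (optimal matching) comparison named above as the backbone of whole-sequence research can be sketched as an edit distance between two state sequences. The version below is a minimal illustration assuming a single constant substitution cost and a single insertion/deletion (indel) cost; the function name `om_distance` and the cost values are assumptions chosen for the example, not the settings of any particular study or package.

```python
def om_distance(seq_a, seq_b, indel=1.0, sub_cost=2.0):
    """Optimal matching distance with constant costs (illustrative sketch).

    The distance is the minimal total cost of insertions, deletions and
    substitutions needed to turn seq_a into seq_b, computed by dynamic
    programming. Costs are assumed values, not empirically derived.
    """
    n, m = len(seq_a), len(seq_b)
    # dist[i][j] = minimal cost of turning seq_a[:i] into seq_b[:j].
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel  # delete the first i states of seq_a
    for j in range(1, m + 1):
        dist[0][j] = j * indel  # insert the first j states of seq_b

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            substitution = 0.0 if same else sub_cost
            dist[i][j] = min(
                dist[i - 1][j] + indel,             # deletion
                dist[i][j - 1] + indel,             # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[n][m]


# Hypothetical yearly states: E = education, W = work.
d_equal_length = om_distance("EEWW", "EWWW")
d_unequal_length = om_distance("EEW", "EEWW")
```

Applying such a function to every pair of sequences gives a dissimilarity matrix that can then be clustered or related to covariates. In practice the substitution costs often vary by pair of states, which this sketch does not cover.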


Software

Two main statistical computing environments offer tools to conduct a sequence analysis in the form of user-written packages: Stata and R.
* Stata: ''SQ'' and ''SADI'' are general SA toolkits. ''MICT'' is dedicated to the imputation of missing elements in sequences.
* R: ''TraMineR'' with its extension ''TraMineRextras'' is probably the most comprehensive SA toolkit; ''ggseqplot'' provides ggplot versions of most TraMineR plots; ''seqhandbook'' provides several specific tools such as heat maps of sequence data and the GIMSA method for measuring dissimilarities between multidomain sequences; ''seqimpute'' provides tools for imputing missing elements in sequences; ''seqHMM'', although specialized in fitting Markov models, provides useful plotting facilities for rendering multichannel sequences and transition probabilities; ''WeightedCluster'' is a versatile clustering package with original tools for grouping identical sequences and rendering hierarchical trees of sequences; ''PST'' fits and renders probabilistic suffix trees of sequences.


Institutional development

The first international conference dedicated to social-scientific research that uses sequence analysis methods, the Lausanne Conference on Sequence Analysis, or LaCOSA, was held in Lausanne, Switzerland, in June 2012. A second conference, LaCOSA II, was held in Lausanne in June 2016. The Sequence Analysis Association (SAA) was founded at the International Symposium on Sequence Analysis and Related Methods in October 2018 at Monte Verità, TI, Switzerland. The SAA is an international organization whose goal is to organize events such as symposia and training courses, and to facilitate scholars' access to sequence analysis resources.


External links


The homepage of the Sequence Analysis Association.


Andrew Abbott's 1995 review of sociological approaches to sequence analysis.
The TraMineR page

Brendan Halpin's sequence analysis page at the University of Limerick.
Laurent Lesnard's Stata plugin for sequence analysis using the dynamic Hamming distance.