Spell Sequences, State Proximities, and Distance Metrics

Cees H. Elzinga and Matthias Studer
Sociological Methods & Research, pp. 3-47
Because optimal matching (OM) distance is not very sensitive to differences in the order of states, we introduce a subsequence-based distance measure that can be adapted to subsequence length, to subsequence duration, and to soft-matching of states. Using a simulation technique developed by Studer, we investigate the sensitivity, relative to OM, of several variants of this metric to variations in order, timing, and duration of states. The results show that the behavior of the metric is as…
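As a rough illustration of the two kinds of comparison the abstract contrasts, the sketch below computes a unit-cost edit distance (used here only as a simplified stand-in for OM, assuming insertion, deletion, and substitution all cost 1) and the set of distinct subsequences that two short sequences share. Neither function is the metric proposed in the paper; the function names are hypothetical, and the brute-force enumeration is only practical for very short sequences.

```python
from itertools import combinations


def edit_distance(x, y):
    """Unit-cost edit distance: insertion, deletion, substitution each cost 1."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, a in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty prefix of y
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def distinct_subsequences(x):
    """All distinct subsequences of x, including the empty one (brute force)."""
    return {
        "".join(x[i] for i in idx)
        for k in range(len(x) + 1)
        for idx in combinations(range(len(x)), k)
    }


def common_subsequences(x, y):
    """Distinct subsequences that occur in both x and y."""
    return distinct_subsequences(x) & distinct_subsequences(y)


if __name__ == "__main__":
    # Each letter stands for one state in a toy sequence.
    for x, y in [("ABC", "ACB"), ("ABC", "ABD")]:
        print(x, y, edit_distance(x, y), len(common_subsequences(x, y)))
```

A subsequence-based measure would typically turn such shared-subsequence information into a distance, possibly weighting subsequences by length or duration; that step is deliberately left out here because the abstract does not specify it.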
A comparative review of sequence dissimilarity measures
This is a comparative study of the multiple ways of measuring dissimilarities between state sequences. For sequences describing life courses, such as family life trajectories or professional careers, …
Three Narratives of Sequence Analysis
How do we relate the distance between two sequences, as given by an algorithm such as optimal matching, to sociologically meaningful notions of similarity and dissimilarity? This has been…
Normalization of Distance and Similarity in Sequence Analysis
The notion of distance is related to a feature set–based concept of similarity; this similarity has a spatial interpretation that is complementary to distance, namely as “direction”, and proper normalization leads to distances that can be directly interpreted as dissimilarity.
What matters in differences between life trajectories: a comparative review of sequence dissimilarity measures
The study shows that there is no universally optimal distance index, and that the choice of a measure depends on which aspect the authors want to focus on, and introduces novel ways of measuring dissimilarities that overcome some flaws in existing measures.
Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R
The seqHMM package in R is designed for the efficient modeling of sequences and other categorical time series data containing one or multiple subjects with one or several interdependent sequences using HMMs and MHMMs.
Unsupervised Learning of the Sequences of Adulthood Transition Trajectories
Unsupervised learning of the sampled sequences of adulthood transitions demonstrates its potential usefulness for displaying and summarizing complex event history data in meaningful and interpretable dimensions, to meet new challenges and to build a policy framework for the adults of a nation.
Divisive Property-Based and Fuzzy Clustering for Sequence Analysis
This paper discusses the usefulness of divisive property-based and fuzzy clustering for sequence analysis, presents several methods for visualizing a fuzzy clustering solution, and analyzes such solutions with regression-like approaches.
Comparing methods of classifying life courses: sequence analysis and latent class analysis
We compare life course typology solutions generated by sequence analysis (SA) and latent class analysis (LCA). First, we construct an analytic protocol to arrive at typology solutions for both…
Validating Sequence Analysis Typologies Using Parametric Bootstrap
  • M. Studer
  • Medicine, Mathematics
  • Sociological methodology
  • 2021
A methodology is proposed for validating sequence analysis typologies on the basis of parametric bootstraps, following the framework proposed by Hennig and Lin (2015); it allows identifying the key structural aspects captured by the observed typology.
Order or chaos? Understanding career mobility using sequence analysis and information-theoretic methods
We examine the careers of a nationally representative US cohort of young adults using sequence analysis and information-theoretic techniques to describe these careers’ structure and how this…


Optimal Matching Analysis and Life-Course Data: The Importance of Duration
The optimal matching (OM) algorithm is widely used for sequence analysis in sociology. It has a natural interpretation for discrete-time sequences but is also widely used for life-history data, which…
Measuring Resemblance in Sequence Data: An Optimal Matching Analysis of Musicians' Careers
This article introduces a method that measures resemblance between sequences using a simple metric based on the insertions, deletions, and substitutions required to transform one sequence into…
The subsequence composition of a string
Words that appear as constrained subsequences in a text-string are considered as possible indicators of the host string structure, hence also as a possible means of sequence comparison and…
Measuring the Agreement between Sequences
The present article proposes a new method to assess distances between sequences of states, belonging to, for instance, event histories. It is based on the number of moves needed to turn one sequence…
Algorithms for subsequence combinatorics
Theorems are presented that lead to efficient dynamic programming algorithms for counting distinct subsequences in a string and for counting the sequences generated by a string when characters may come in runs whose length is bounded from above.
Analyzing and Visualizing State Sequences in R with TraMineR
This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed…
Setting Cost in Optimal Matching to Uncover Contemporaneous Socio-Temporal Patterns
This article addresses the question of the effects of cost setting on the kind of temporal patterns optimal matching (OM) can uncover when applied to social science data. It is argued that the…
On Order Equivalences between Distance and Similarity Measures on Sequences and Trees
It is shown that alignment-orderings by distance can be dualized by similarity, and vice-versa, and that there are categorisation and hierarchical clustering outcomes which can be achieved via similarity but not via distance.
Discrepancy Analysis of State Sequences
In this article, the authors define a methodological framework for analyzing the relationship between state sequences and covariates. Inspired by the principles of analysis of variance, this approach…
This article proposes the method of multichannel sequence analysis (MCSA), which simultaneously extends the usual optimal matching analysis (OMA) to multiple life spheres and finds that MCSA offers an alternative to the sole use of ex-post sum of distance matrices by locally aligning distinct life trajectories simultaneously.