Measuring Resemblance in Sequence Data: An Optimal Matching Analysis of Musicians' Careers

Andrew Abbott and Alexandra Hrycak
American Journal of Sociology, pp. 144-185
This article introduces a method that measures resemblance between sequences using a simple metric based on the insertions, deletions, and substitutions required to transform one sequence into another. The method, called optimal matching, is widely used in natural science. The article reviews the literature on sequence analysis, then discusses the optimal matching algorithm in some detail. Applying this technique to a data set detailing careers of musicians active in Germany in the 18th century… 
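The metric described in the abstract can be illustrated with a minimal dynamic-programming sketch. This is not the authors' implementation: the function name `om_distance`, the default costs (1.0 per insertion or deletion, 1.0 per substitution), and the career state labels in the usage note are assumptions chosen for illustration, since the excerpt does not give the cost scheme used in the article.

```python
from typing import Callable, Hashable, Sequence


def unit_substitution(x: Hashable, y: Hashable) -> float:
    """Assumed substitution cost: 0 for identical states, 1 otherwise."""
    return 0.0 if x == y else 1.0


def om_distance(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    indel: float = 1.0,
    sub: Callable[[Hashable, Hashable], float] = unit_substitution,
) -> float:
    """Minimum total cost of insertions, deletions, and substitutions
    needed to transform sequence `a` into sequence `b`."""
    n, m = len(a), len(b)
    # prev[j] is the cost of turning the first i-1 elements of `a`
    # into the first j elements of `b`; row 0 is pure insertions.
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        # Column 0 is pure deletions.
        curr = [i * indel] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + indel,                       # delete a[i-1]
                curr[j - 1] + indel,                   # insert b[j-1]
                prev[j - 1] + sub(a[i - 1], b[j - 1]), # substitute or match
            )
        prev = curr
    return prev[m]
```

With these assumed unit costs the function reduces to the classic Levenshtein distance. For example, comparing the hypothetical careers `["organist", "organist", "kapellmeister"]` and `["organist", "kapellmeister"]` costs one deletion. Passing a different `sub` function lets substitution costs vary by pair of states.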
A comparative review of sequence dissimilarity measures
This is a comparative study of the multiple ways of measuring dissimilarities between state sequences. For sequences describing life courses, such as family life trajectories or professional careers,
Sequence Similarity
This article reviews objections to optimal-matching (OM) algorithms in sequence analysis and reformulates the concept of sequence similarity in terms of a binary precedence relation. This precedence
A Transition-Oriented Approach to Optimal Matching
This paper introduces a distinction between two sequence types, namely common ancestors and unfolding processes, presents a new way of coding sequences as an extension to conventional OM analyses, and demonstrates its usefulness in simulated and empirical examples.
Optimal Matching Analysis and Life-Course Data: The Importance of Duration
The optimal matching (OM) algorithm is widely used for sequence analysis in sociology. It has a natural interpretation for discrete-time sequences but is also widely used for life-history data, which
Sequence Analysis and Optimal Matching Methods in Sociology
The authors review all known studies applying optimal matching or alignment (OM) techniques to social science sequence data. Issues of data, coding, temporality, cost setting/algorithm design, and
Spell Sequences, State Proximities, and Distance Metrics
This work investigates the sensitivity, relative to OM, of several variants of this metric to variations in order, timing, and duration of states, and shows that the behavior of the metric is as intended.
Three Narratives of Sequence Analysis
How do we relate the distance between two sequences, as given by an algorithm such as optimal matching, to sociologically meaningful notions of similarity and dissimilarity? This has been
A Comment on “Measuring the Agreement between Sequences”
The author discusses the general concept and nature of alignment algorithms for sequence data, and talks about the character and utility of the Dijkstra/Taris algorithm, a particular implementation of the alignment approach to sequence analysis.
Categorizing Event Sequences Using Regular Expressions
IASSIST Quarterly. Researchers who work with large sequential datasets are often limited in the kinds of analytic strategies they can use because of the sheer size of the data. Automated
Analyzing Sequence Data
Optimal matching (OM), an invaluable yet underutilized tool in the analysis of sequence data, is discussed and an illustration of its use in the examination of careers of deans at U.S. business schools is provided.


Properties of Levenshtein metrics on sequences
Levenshtein dissimilarity measures are used to compare sequences in application areas including coding theory, computer science and macromolecular biology. In general, they measure sequence
The data analysis problem of ordering or sequencing a set of objects using an asymmetric proximity function is reviewed, with an emphasis on literature not generally referenced in psychology. In
Rapid and sensitive protein similarity searches.
An algorithm was developed which facilitates the search for similarities between newly determined amino acid sequences and sequences already available in databases and increases sensitivity by giving high scores to those amino acid replacements which occur frequently in evolution.
Rapid similarity searches of nucleic acid and protein data banks.
W. Wilbur, D. Lipman. Proceedings of the National Academy of Sciences of the United States of America, 1983.
An algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k results in substantial reduction in the time required to search a data bank when compared with prior techniques of similarity analysis, with minimal loss in sensitivity.
A convenient and adaptable microcomputer environment for DNA and protein sequence manipulation and analysis
We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen oriented user interface, and
Discovering Patterns in Sequences of Events
A program called SPARC/E is described that implements most of the methodology as applied to discovering sequence-generating rules in the card game Eleusis, and is used as a source of examples for illustrating the performance of SPARC/E.
Wage change in the late career: A model for the outcomes of job sequences
This paper elaborates a model for the outcomes of job sequences and illustrates its utility by an empirical analysis of the determinants of wage change for men in their late careers. We
The Representation of Social Processes by Markov Models
In this paper we consider a class of issues which are central to modeling social phenomena by continuous-time Markov structures. In particular, we discuss (a) embeddability, or how to determine
Methodology of the Social Sciences
PROF. F. KAUFMANN, formerly of Vienna and now at the New School of Social Research in New York, has long been concerned with problems of methodology. Here he attacks the most difficult of them, the
Toward a Stochastic Model of Managerial Careers
This study describes how the career movements of managers and professionals within organizations may be described by a Markov chain model. This allows a formal description of the results of current