# Discrepancy Analysis of State Sequences

@article{Studer2011DiscrepancyAO, title={Discrepancy Analysis of State Sequences}, author={Matthias Studer and Gilbert Ritschard and Alexis Gabadinho and Nicolas S. M{\"u}ller}, journal={Sociological Methods \& Research}, year={2011}, volume={40}, pages={471 - 510} }

In this article, the authors define a methodological framework for analyzing the relationship between state sequences and covariates. Inspired by the principles of analysis of variance, this approach looks at how the covariates explain the discrepancy of the sequences. The authors use the pairwise dissimilarities between sequences to determine the discrepancy, which makes it possible to develop a series of statistical significance–based analysis tools. They introduce generalized simple and…
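The ANOVA-like decomposition described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' TraMineR implementation. It assumes a symmetric dissimilarity matrix `d` given as a list of lists and a total sum of squares SS = (1/n) Σ_{i<j} d_ij, which reduces to the classical sum of squares when d_ij is the squared Euclidean distance. A pseudo-R² and a pseudo-F are derived from the within-group sums, and significance is assessed with a permutation test on the group labels. The function names `sum_of_squares`, `pseudo_anova` and `permutation_test` are hypothetical.

```python
import random


def sum_of_squares(d, idx):
    """SS of a subset: (1/n) * sum of d[i][j] over pairs i < j in idx."""
    n = len(idx)
    if n == 0:
        return 0.0
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            total += d[idx[a]][idx[b]]
    return total / n


def pseudo_anova(d, groups):
    """Return (pseudo-R2, pseudo-F) for a one-way grouping of the objects."""
    n = len(groups)
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    m = len(members)
    ss_total = sum_of_squares(d, list(range(n)))
    ss_within = sum(sum_of_squares(d, idx) for idx in members.values())
    ss_between = ss_total - ss_within
    r2 = ss_between / ss_total
    if ss_within == 0:
        f = float("inf")
    else:
        f = (ss_between / (m - 1)) / (ss_within / (n - m))
    return r2, f


def permutation_test(d, groups, n_perm=999, seed=0):
    """p-value: share of label permutations whose pseudo-F reaches the observed one."""
    rng = random.Random(seed)
    _, f_obs = pseudo_anova(d, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pseudo_anova(d, labels)[1] >= f_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: squared Euclidean distances between four 1-D values.
x = [0.0, 1.0, 10.0, 11.0]
d = [[(a - b) ** 2 for b in x] for a in x]
groups = ["a", "a", "b", "b"]
r2, f = pseudo_anova(d, groups)
p = permutation_test(d, groups)
print(r2, f, p)
```

With the four toy values split into two groups, the sketch gives a pseudo-R² of 100/101 and a pseudo-F of 200. Because four objects admit only three distinct two-by-two splits, the permutation p-value stays around 1/3, however strong the association looks.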

## 173 Citations

### Analyzing and Visualizing State Sequences in R with TraMineR

- Computer Science
- 2011

This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed…

### What matters in differences between life trajectories: a comparative review of sequence dissimilarity measures

- Psychology
- 2016

The study shows that there is no universally optimal distance index and that the choice of a measure depends on which aspect of the sequences one wants to focus on; it also introduces novel ways of measuring dissimilarities that overcome some flaws in existing measures.

### Discrepancy Analysis of Activity Sequences

- Computer Science
- 2014

This paper proposes the use of a new combination of the sequence alignment and discrepancy analysis methodologies instead of the cluster-based approach, which allows the association between activity sequences characterized by a pairwise distance matrix and one or more covariates to be evaluated.

### Data quality challenges with missing values and mixed types in joint sequence analysis

- Computer Science
- 2017 IEEE International Conference on Big Data (Big Data)
- 2017

This paper employs longitudinal sequence data representations, a similarity measure designed for categorical and longitudinal data, and state-of-the-art clustering methods based on hierarchical algorithms to investigate the impact of missing values in categorical time series sequences on common data analysis tasks.

### Comparing Groups of Life-Course Sequences Using the Bayesian Information Criterion and the Likelihood-Ratio Test

- Psychology
- 2020

How can we statistically assess differences in groups of life-course trajectories? The authors address a long-standing inadequacy of social sequence analysis by proposing an adaptation of the Bayesian…

### Use of State Sequence Analysis in Pharmacoepidemiology: A Tutorial

- Computer Science
- International Journal of Environmental Research and Public Health
- 2021

This paper presents an application of SSA to opioid prescription patterns in patients with non-cancer pain, using real-world data from Italy and shows how SSA allows the identification of patterns in prescriptions in these data that might not be evident using standard statistical approaches and how these patterns are associated with future discontinuation of opioid therapy.

### A comparative review of sequence dissimilarity measures

- Psychology
- 2014

This is a comparative study of the multiple ways of measuring dissimilarities between state sequences. For sequences describing life courses, such as family life trajectories or professional careers,…

### Evaluating the Effects of Missing Values and Mixed Data Types on Social Sequence Clustering Using t-SNE Visualization

- Computer Science
- ACM J. Data Inf. Qual.
- 2019

It is found that missing data problems are harder to overcome in the nominal domain than in the binary domain, and t-distributed stochastic neighbor embedding (t-SNE) is shown to visually guide the mitigation of such biases.

### WeightedCluster Library Manual: A practical guide to creating typologies of trajectories in the social sciences with R

- Computer Science
- 2013

This manual presents the WeightedCluster library and offers a step-by-step guide to creating typologies of sequences for the social sciences, and shows that these methods offer an important descriptive point of view on sequences by bringing to light recurrent patterns.

### Network Analysis of Sequence Structures

- Business
- 2018

This chapter identifies several useful network techniques, shows how they correspond to sequence-related concerns, and describes how to compare multiple sequence-network structures to each other.

## References


### Discrepancy Analysis of Complex Objects Using Dissimilarities

- Computer Science
- EGC
- 2009

A generalization of the analysis of variance (ANOVA) to assess the link of complex objects (e.g. sequences) with a given categorical variable, and a new tree method for analyzing the discrepancy of complex objects that exploits the former test as a splitting criterion, are introduced.
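A tree method of this kind grows by repeatedly choosing the split that best separates the objects. The following minimal sketch shows a single splitting step under two assumptions: every binary grouping of a categorical covariate's levels is tried, and each candidate is scored by the pseudo-R² of the resulting two-group partition. For a binary split of a fixed node, pseudo-R² ranks candidates in the same order as the pseudo-F. Stopping rules and significance checks are omitted, and `ss` and `best_binary_split` are hypothetical names.

```python
from itertools import combinations


def ss(d, idx):
    """(1/n) * sum of d[i][j] over unordered pairs in idx."""
    if not idx:
        return 0.0
    return sum(d[i][j] for i, j in combinations(idx, 2)) / len(idx)


def best_binary_split(d, covariate):
    """Return (pseudo-R2, left_levels) of the best two-way split, or None."""
    n = len(covariate)
    ss_total = ss(d, list(range(n)))
    levels = sorted(set(covariate))
    first, rest = levels[0], levels[1:]
    best = None
    # Keep the first level on the left to skip mirror-image splits, and
    # leave at least one level on the right so both children are non-empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left_levels = {first, *extra}
            left = [i for i in range(n) if covariate[i] in left_levels]
            right = [i for i in range(n) if covariate[i] not in left_levels]
            r2 = (ss_total - ss(d, left) - ss(d, right)) / ss_total
            if best is None or r2 > best[0]:
                best = (r2, left_levels)
    return best


x = [0.0, 1.0, 10.0, 11.0, 30.0, 31.0]
d = [[(a - b) ** 2 for b in x] for a in x]
covariate = ["a", "a", "b", "b", "c", "c"]
print(best_binary_split(d, covariate))
```

In the toy data, grouping levels `a` and `b` against `c` leaves the least within-group discrepancy, so that split is selected.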

### Analyzing and Visualizing State Sequences in R with TraMineR

- Computer Science
- 2011

This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed…

### Analysis of distance for structured multivariate data and extensions to multivariate analysis of variance

- Mathematics
- 1999

Many data sets in practice fit a multivariate analysis of variance (MANOVA) structure but are not consonant with MANOVA assumptions. One particular such data set from economics is described. This set…

### Multivariate regression analysis of distance matrices for testing associations between gene expression patterns and related variables

- Computer Science
- Proceedings of the National Academy of Sciences
- 2006

The proposed multivariate method avoids the need for reducing the dimensions of a similarity matrix, can be used to assess relationships between the genes used to construct the matrix and additional information collected on the samples under study, and can be used to analyze individual genes or groups of genes identified in different ways.

### Measuring the Agreement between Sequences

- Computer Science
- 1995

A new method to assess distances between sequences of states, belonging to, for instance, event histories, based on the number of moves needed to turn one sequence into another sequence is proposed.
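Counting the moves needed to turn one sequence into another is the idea behind edit distances such as optimal matching. The following sketch uses the standard dynamic-programming edit distance with an insertion/deletion cost `indel` and a single substitution cost `sub`. It illustrates the general idea and is not necessarily the exact measure proposed in this paper. The name `om_distance`, the default costs and the state letters are assumptions for illustration.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Cheapest total cost of insertions, deletions and substitutions
    turning sequence a into sequence b (dynamic programming)."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * indel] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            cur[j] = min(
                prev[j] + indel,      # delete a[i-1]
                cur[j - 1] + indel,   # insert b[j-1]
                prev[j - 1] + cost,   # match or substitute
            )
        prev = cur
    return prev[len(b)]


# Hypothetical state coding: E = employed, U = unemployed.
print(om_distance("EEU", "EUU"))            # 2.0 (one substitution or two indels)
print(om_distance("EEU", "EUU", sub=1.0))   # 1.0 (substitution is now cheaper)
```

Keeping the substitution cost at twice the indel cost makes a substitution no cheaper than a deletion plus an insertion. Lowering it favors aligning states position by position.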

### Some distance properties of latent root and vector methods used in multivariate analysis

- Mathematics
- 1966

This paper is concerned with the representation of a multivariate sample of size n as points P1, P2, ..., Pn in a Euclidean space. The interpretation of the distance Δ(Pi, Pj) between the ith…
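Representing n objects as points whose Euclidean distances reproduce a given distance matrix is the basis of principal coordinate analysis (classical scaling). The following minimal NumPy sketch assumes the distances are Euclidean-embeddable. Negative eigenvalues, which can arise with non-Euclidean dissimilarities such as sequence edit distances, are clipped to zero, a simplification. `principal_coordinates` is a hypothetical name.

```python
import numpy as np


def principal_coordinates(d, k=2):
    """Embed n objects in k dimensions from an n x n distance matrix."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred matrix of -d^2/2: the Gram matrix of centred coordinates.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigval, eigvec = np.linalg.eigh(gram)          # ascending eigenvalues
    top = np.argsort(eigval)[::-1][:k]             # indices of the k largest
    scale = np.sqrt(np.clip(eigval[top], 0.0, None))
    return eigvec[:, top] * scale


points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = principal_coordinates(d, k=2)
d_back = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(d, d_back))  # True: the configuration is recovered up to rotation
```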

### Extracting and Rendering Representative Sequences

- Computer Science
- IC3K
- 2009

The proposed heuristic for extracting the representative subset requires as main arguments a pairwise distance matrix, a representativeness criterion and a distance threshold under which two sequences are considered as redundant or, identically, in the neighborhood of each other.
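The three ingredients named above (a distance matrix, a representativeness criterion and a redundancy threshold) can be combined greedily. The following minimal sketch makes two assumptions: neighborhood density (how many sequences lie closer than the threshold) is the criterion, and a simple size cap is the stopping rule. The original work may use other criteria and stopping rules, which are not reproduced here. `representative_set` is a hypothetical name.

```python
def representative_set(d, threshold, max_size=None):
    """Greedy pick of non-redundant, high-density objects (indices)."""
    n = len(d)
    # Density criterion: number of objects (self included) closer than threshold.
    density = [sum(1 for j in range(n) if d[i][j] < threshold) for i in range(n)]
    # Best candidates first; ties broken by index for determinism.
    order = sorted(range(n), key=lambda i: (-density[i], i))
    chosen = []
    for i in order:
        # Skip candidates that are redundant with an already chosen one.
        if all(d[i][c] >= threshold for c in chosen):
            chosen.append(i)
            if max_size is not None and len(chosen) >= max_size:
                break
    return chosen


x = [0.0, 0.1, 0.2, 5.0, 5.1]
d = [[abs(a - b) for b in x] for a in x]
print(representative_set(d, threshold=1.0))  # [0, 3]
```

The dense cluster around 0 contributes one representative and the smaller cluster around 5 contributes another. Every other object lies within the threshold of one of them.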

### A Primer on Sequence Methods

- Sociology
- 1990

This paper considers the technical problem of analyzing sequences of social events, including organizational life cycles, patterns of innovation development, and career tracks of individuals, and considers methods for unique event sequences, proposing the use of multidimensional scaling and illustrating it with an analysis of data on medical organizations.

### Clustering work and family trajectories by using a divisive algorithm

- Computer Science
- 2007

This work introduces a new divisive clustering algorithm which has features that are in common with both Ward's agglomerative algorithm and classification and regression trees and analyses British Household Panel Survey data on the employment and family trajectories of women.

### Distance‐Based Tests for Homogeneity of Multivariate Dispersions

- Mathematics
- Biometrics
- 2006

The traditional likelihood-based test for differences in multivariate dispersions is known to be sensitive to nonnormality. It is also impossible to use when the number of variables exceeds…
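A distance-based dispersion test compares how far group members lie from their group centroid. Those distances can be computed from the dissimilarities alone. The following minimal sketch assumes Euclidean-embeddable distances. The squared distance of object i to its group centroid is then (1/n_g) Σ_j d_ij² − (1/(2 n_g²)) Σ_j Σ_k d_jk² over the group's members. The groups' distances are compared with a one-way ANOVA F statistic; a p-value could be obtained by permutation, but that step is omitted. Negative values that non-Euclidean dissimilarities can produce are clipped to zero, a simplification. `distances_to_centroid` and `anova_f` are hypothetical names.

```python
import math


def distances_to_centroid(d, groups):
    """Distance of each object to its own group centroid, from d alone."""
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    z = [0.0] * len(groups)
    for idx in members.values():
        n = len(idx)
        spread = sum(d[j][k] ** 2 for j in idx for k in idx) / (2 * n * n)
        for i in idx:
            sq = sum(d[i][j] ** 2 for j in idx) / n - spread
            z[i] = math.sqrt(max(sq, 0.0))  # clip tiny or negative values
    return z


def anova_f(values, groups):
    """Classical one-way ANOVA F statistic on scalar values."""
    members = {}
    for v, g in zip(values, groups):
        members.setdefault(g, []).append(v)
    n, m = len(values), len(members)
    grand = sum(values) / n
    means = {g: sum(v) / len(v) for g, v in members.items()}
    ss_b = sum(len(v) * (means[g] - grand) ** 2 for g, v in members.items())
    ss_w = sum((val - means[g]) ** 2 for g, v in members.items() for val in v)
    return (ss_b / (m - 1)) / (ss_w / (n - m))


x = [0.0, 1.0, 2.0, 10.0, 13.0, 16.0]
groups = ["a", "a", "a", "b", "b", "b"]
d = [[abs(p - q) for q in x] for p in x]
z = distances_to_centroid(d, groups)   # about [1, 0, 1, 3, 0, 3]
print(z, anova_f(z, groups))           # F about 1.6
```

Group `b` is three times as spread out as group `a`, yet with three objects per group the F statistic stays modest.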