Statistical Models in Forensic Voice Comparison

Authors: Geoffrey Stewart Morrison, Ewald Enzinger, Daniel Ramos, Joaquín González-Rodríguez, Alicia Lozano-Díez
Journal: arXiv: Applications
This chapter describes a number of signal-processing and statistical-modeling techniques that are commonly used to calculate likelihood ratios in human-supervised automatic approaches to forensic voice comparison. Techniques described include mel-frequency cepstral coefficient (MFCC) feature extraction, Gaussian mixture model - universal background model (GMM-UBM) systems, i-vector - probabilistic linear discriminant analysis (i-vector PLDA) systems, deep neural network (DNN) based systems…
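As a rough illustration of the idea behind a likelihood-ratio score, the sketch below compares how well a speaker model and a background (population) model explain the same data. It is a toy under stated assumptions, not the chapter's method: the features are hypothetical one-dimensional values rather than MFCC vectors, each model is a single Gaussian rather than a mixture, and the names `gaussian_log_pdf` and `log_likelihood_ratio` are invented for this example.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a univariate Gaussian evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_likelihood_ratio(frames, speaker_model, background_model):
    """Mean per-frame log p(x | speaker model) - log p(x | background model).

    Each model is a hypothetical (mean, std) pair, an assumption for this sketch.
    """
    total = 0.0
    for x in frames:
        total += gaussian_log_pdf(x, *speaker_model) - gaussian_log_pdf(x, *background_model)
    return total / len(frames)


# Hypothetical models and questioned-recording "features", for illustration only.
speaker_model = (1.0, 1.0)      # narrow distribution centred on the known speaker
background_model = (0.0, 2.0)   # broader distribution for the relevant population
questioned_frames = [0.8, 1.1, 1.3, 0.9]

llr = log_likelihood_ratio(questioned_frames, speaker_model, background_model)
print(f"mean per-frame log likelihood ratio: {llr:.3f}")
```

A positive value means the data are more probable under the speaker model than under the background model, and a negative value means the reverse. In practice such a value is usually treated as an uncalibrated score and would still need to be calibrated before being reported as a likelihood ratio.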
Triplet loss based embeddings for forensic speaker identification in Spanish
This work explores the use of speech embeddings obtained by training a CNN with the triplet loss, focuses on the Spanish language, which has not been extensively studied, and proposes two approaches to calculating the likelihood ratio given our speech embeddings' quality.
Consensus on validation of forensic voice comparison.
Validations of an alpha version of the E3 Forensic Speech Science System (E3FS3) core software tools
Testing for calibration discrepancy of reported likelihood ratios in forensic science
  • Jan Hannig, H. Iyer
  • Computer Science
    Journal of the Royal Statistical Society: Series A (Statistics in Society)
  • 2021
A statistical approach for testing the calibration discrepancy of likelihood ratio systems using ground-truth-known empirical data is provided, along with results from a limited simulation study of the proposed approach's performance.
MAP Adaptation Characteristics in Forensic Long-Term Formant Analysis
Results show that, in terms of overall performance characteristics, there is little difference between the selection and de-selection of MAP, and that application of MAP allows for more symmetric same- and different-speaker distributions and shows more robustness against duration reductions, both of which are forensically important.
In the context of forensic casework, are there meaningful metrics of the degree of calibration?
  • G. Morrison
  • Computer Science
    Forensic science international. Synergy
  • 2021


Voice source features for forensic voice comparison - an evaluation of the GLOTTEX software package
It has been proposed that the output of GLOTTEX® can be used as part of a forensic-voice-comparison system, and manually labeled segments from a database of voice recordings of 60 female Chinese speakers are tested.
Analysis of DNN approaches to speaker identification
This work studies the usage of the Deep Neural Network (DNN) Bottleneck (BN) features together with the traditional MFCC features in the task of i-vector-based speaker recognition. We decouple the
Reliability of voice comparison for forensic applications
It is shown that oral vowels, nasal vowels and nasal consonants bring more speaker-specific information than averaged phonemic content in FVC accuracy, and an approach to predict the LR reliability based only on the pair of voice recordings is investigated.
Bayesian Speaker Verification with Heavy-Tailed Priors
A new approach to speaker verification is described which is based on a generative model of speaker and channel effects but differs from Joint Factor Analysis in several respects, including that each utterance is represented by a low-dimensional feature vector rather than by a high-dimensional set of Baum-Welch statistics.
Forensic speech science
As part of the Expert Evidence series this chapter is intended to be accessible to lawyers, judges, police officers, and potential jury members; however, it is hoped that this chapter will also be of
A novel scheme for speaker recognition using a phonetically-aware deep neural network
We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for
Insights into deep neural networks for speaker recognition
The insights gained by this study indicate that, for the purpose of speaker recognition, not using fMLLR speaker adaptation and early stopping of the DNN training allow significant computational reduction without sacrificing performance.