Automatic classification of speaker characteristics

@article{Nguyen2010AutomaticCO,
  title={Automatic classification of speaker characteristics},
  author={Phuoc Nguyen and Dat T. Tran and Xu Huang and Dharmendra Sharma},
  journal={International Conference on Communications and Electronics 2010},
  year={2010},
  pages={147--152}
}
This paper presents an automatic voice-based system for classifying speaker characteristics, including age, gender and accent. Speakers are grouped according to their characteristics, and their speech features are then extracted to train speaker group models using different classification techniques. Finally, the classification results for those speaker groups are fused to obtain results for each speaker characteristic. The ANDOSL Australian speech database consisting of 108…
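The abstract does not say how the group-level results are fused. As a rough sketch only, one simple possibility is to sum the scores of all speaker groups that share the same value of a characteristic and pick the highest total. Everything in the example is an assumption made for illustration, not the authors' method: the group labels, the scores, and the helper names `fuse` and `decide` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical speaker groups: each key is one (gender, age) combination.
# The scores stand in for per-group model outputs for a single utterance;
# the values are made up for illustration and are not from the paper.
group_scores = {
    ("female", "young"): 0.5,
    ("female", "elderly"): 0.25,
    ("male", "young"): 0.125,
    ("male", "elderly"): 0.125,
}

def fuse(scores, position):
    """Sum group scores over all groups sharing the value at `position`."""
    fused = defaultdict(float)
    for group, score in scores.items():
        fused[group[position]] += score
    return dict(fused)

def decide(fused):
    """Return the characteristic value with the highest fused score."""
    return max(fused, key=fused.get)

gender_scores = fuse(group_scores, 0)  # fuse over age to get gender scores
age_scores = fuse(group_scores, 1)     # fuse over gender to get age scores
gender = decide(gender_scores)
age = decide(age_scores)
print(gender, age)
```

Sum fusion is only one option; a product rule, a weighted sum, or a trained combiner could be substituted in `fuse` without changing the rest of the sketch.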

A Survey Paper on Gender Identification System using Speech Signal
TLDR
This paper surveys automatic human gender identification from speech signal characteristics and classifiers, discussing the selection of speech features, their processing, and the different classifiers used for this purpose.
A Survey of Speaker Recognition: Fundamental Theories, Recognition Methods and Opportunities
TLDR
This literature survey gives a concise introduction to ASR, provides an overview of the general architectures used in speaker recognition technologies, and reviews past, present, and future research trends in this area.
Computational Assessment of Interest in Speech—Facing the Real-Life Challenge
TLDR
A fully automatic combination of brute-forced acoustic features, linguistic analysis, and non-linguistic vocalizations, exploiting cross-entity information in an early feature fusion is introduced.
Semantic Speech Tagging: Towards Combined Analysis of Speaker Traits
TLDR
This paper deals with the question of how further paralinguistic information, such as speaker age, height, or race, can be beneficial when its ground truth is provided within single-task speaker classification.
Study of Word-Level Accent Classification and Gender Factors
TLDR
This work proposes using a stacked ensemble classifier that classifies gender first and then accent to improve accuracy, and results show that HMM-MFCC models give promising performance.
Acoustic correlates for perceived effort levels in male and female acted voices.
TLDR
Perception-grounded male and female acoustic feature sets which tracked the actors' expressive effort levels through the continuum of whispered, breathy, modal, and resonant speech are presented and validated via multiple models.

References

Automatic estimation of one's age with his/her speech based upon acoustic modeling techniques of speakers
TLDR
A technique is proposed that automatically estimates speakers' age using only the acoustic, not linguistic, information in their utterances, showing high correlation between speakers' age as estimated subjectively by humans and the automatically calculated score of ‘agedness’.
Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications
  • Florian Metze, J. Ajmera, B. Littel
  • Computer Science
    2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07
  • 2007
TLDR
This paper presents a comparative study of four different approaches to automatic age and gender classification using seven classes on a telephony speech task, and also compares the results with human performance on the same data.
Voice signatures
  • I. Shafran, M. Riley, M. Mohri
  • Computer Science
    2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)
  • 2003
TLDR
Two approaches for extracting speaker traits are investigated: the first focuses on general acoustic and prosodic features, the second on the choice of words used by the speaker, showing that voice signatures are of practical interest in real-world applications.
Higher-Level Features in Speaker Recognition
TLDR
This article briefly summarizes approaches to using higher-level features for text-independent speaker verification over the last decade in terms of their type, temporal span, and reliance on automatic speech recognition for both feature extraction and feature conditioning.
Robust text-independent speaker identification using Gaussian mixture speaker models
TLDR
The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity, and the GMM is shown to outperform the other speaker modeling techniques on an identical 16-speaker telephone speech task.
Speaker Characteristics
In this chapter, we give a brief introduction to speech-driven applications in order to motivate why it is desirable to automatically recognize particular speaker characteristics from speech. Starting
Performance of Speaker-independent Speech Recognisers for Automatic Recognition of Australian English
This paper investigates the performance of three speaker-independent speech recognisers (SISRs) that support continuous speech and are currently available for speaker-independent recognition of
Fusing high- and low-level features for speaker recognition
TLDR
It is shown how novel features and classifiers provide complementary information and can be fused together to drive down the equal error rate on the 2001 NIST Extended Data Task to 0.2%, a 71% relative reduction in error over the previous state of the art.
Acoustic Analysis of Adult Speaker Age
TLDR
This chapter offers an introduction to the phonetic study of speaker age, with focus on what is known about the acoustic features which vary with age.
Automatic accent classification of foreign accented Australian English speech
  • Karsten Kumpf, R. W. King
  • Linguistics
    Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96
  • 1996
TLDR
An automatic classification system for foreign accents in Australian English (AuE) speech based on accent-dependent parallel phoneme recognition (PPR) has been developed and is novel in that it does not require manually labelled accented data to be trained.
...
...