Voice Quality and Pitch Features in Transformer-Based Speech Recognition

Guillermo Cámbara, Jordi Luque, Mireia Farrús
Jitter and shimmer measurements have been shown to carry voice quality and prosodic information that enhances performance in tasks such as speaker recognition, diarization, and automatic speech recognition (ASR). However, such features have seldom been used in neural-based ASR, where spectral features often prevail. In this work, we study the effects of incorporating voice quality and pitch features, both together and separately, into a Transformer-based ASR model, with the intuition…

