Efficient voice activity detection algorithms using long-term speech information

Javier Ramírez, José C. Segura, M. Carmen Benítez, Ángel de la Torre and Antonio J. Rubio
Speech Communication

Currently, there are technology barriers inhibiting speech processing systems from working under extremely noisy conditions. The emerging applications of speech technology, especially in the fields of wireless communications, digital hearing aids or speech recognition, are examples of such systems and often require a noise reduction technique operating in combination with a precise voice activity detector (VAD). This paper presents a new VAD algorithm for improving speech detection robustness in noisy…
Highly Influential
This paper has highly influenced a number of papers.
Highly Cited
This paper has 343 citations.

13 Figures & Tables

Extracted Numerical Results

  • In AURORA 2, the word error rate was reduced from 8.14% to 7.82% for the multi-condition training experiments and from 13.07% to 11.87% for the clean training experiments. In AURORA 3, the improvements were especially significant in the high-mismatch experiments, where the word error rate was reduced from 13.27% to 10.54%.
  • On the other hand, when the AURORA complex back-end (digit models with 20 Gaussians per state and a silence model with 36 Gaussians per state) is used, replacing the VADs of the original AFE with the proposed LTSE VAD reduces the AURORA 2 word error rate from 12.04% to 11.29% for the clean training experiments and from 6.57% to 6.13% for the multi-condition training experiments.
  • In particular, when the feature extraction algorithm was based on Wiener filtering and frame dropping, and the models were trained using clean speech, the proposed LTSE VAD led to word error rate reductions of up to 58.71%, 49.36%, 17.66% and 15.54% over the G.729, AMR1, AMR2 and AFE VADs, respectively, while the corresponding reductions were up to 35.81%, 35.77%, 16.92% and 15.18% when the models were trained using noisy speech.
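The first two bullets report word error rates before and after the LTSE VAD is substituted. As a minimal sketch, the Python snippet below converts those before/after pairs into absolute (percentage-point) and relative reductions. The helper name `wer_reduction` and the relative convention (baseline − proposed) / baseline are our own assumptions for illustration, not taken from the paper.

```python
# Word error rates (%) quoted in the results above: (baseline, with LTSE VAD).
RESULTS = {
    "AURORA 2, multi-condition training": (8.14, 7.82),
    "AURORA 2, clean training": (13.07, 11.87),
    "AURORA 3, high mismatch": (13.27, 10.54),
    "AURORA 2, complex back-end, clean training": (12.04, 11.29),
    "AURORA 2, complex back-end, multi-condition training": (6.57, 6.13),
}


def wer_reduction(baseline: float, proposed: float) -> tuple[float, float]:
    """Hypothetical helper: return (absolute reduction in points, relative reduction in %)."""
    absolute = baseline - proposed          # difference in percentage points
    relative = 100.0 * absolute / baseline  # share of the baseline errors removed
    return absolute, relative


if __name__ == "__main__":
    for name, (baseline, proposed) in RESULTS.items():
        absolute, relative = wer_reduction(baseline, proposed)
        print(f"{name}: {baseline:.2f}% -> {proposed:.2f}% "
              f"({absolute:.2f} points, {relative:.2f}% relative)")
```

The source does not state which convention the percentages in the last bullet use, so the sketch only processes the explicit before/after pairs and does not reinterpret those figures.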

Semantic Scholar estimates that this publication has 344 citations based on the available data.