Charl Johannes van Heerden

Spoken dialogue systems (SDSs) have great potential for information access in the developing world. However, realising that potential requires solving several challenging problems, including the development of sufficiently accurate speech recognisers for a large and diverse set of languages. We investigate the feasibility of developing…
We investigate the number of speakers and the amount of data required to develop usable speaker-independent speech-recognition systems in resource-scarce languages. Our experiments employ the Lwazi corpus, which contains speech in the eleven official languages of South Africa. We find that a surprisingly small number of speakers (fewer…
Spoken recordings that have been transcribed for human reading (e.g. as captions for audiovisual material, or to provide alternative modes of access to recordings) are widely available in many languages. Such recordings and transcriptions have proven to be a valuable source of ASR data in well-resourced languages, but have not been exploited to a…
In this paper, we describe the "Spoken Web Search" task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources available in each language. The data was taken from "spoken web" material collected over mobile phone connections…
In light of the serious problems of illiteracy and limited information access in the developing world, there is a widespread belief that speech technology can play a significant role in improving the quality of life of developing-world citizens. We review the main reasons why this impact has not occurred to date, and propose that voice-search systems may be…
We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus containing data from the eleven official languages of South Africa. Because of practical constraints, the amount of speech per language is relatively small compared to major corpora in world languages, and we report on our investigation of the stability of…
This paper describes the language modeling architectures and recognition experiments that enabled support for 'what-with-where' queries on GOOG-411. First, we compare the accuracy trade-offs between a single national business LM for business queries and many small models adapted to particular cities. Experimental evaluations show that both approaches lead…
We present a novel approach to automatic speaker age classification that combines regression and classification to achieve competitive classification accuracy on telephone speech. Support vector machine regression is used to generate finer-grained age estimates, which are combined with the posterior probabilities of well-trained discriminative gender classifiers…
We investigate how effectively the accuracy of a prompted speech corpus can be validated when minimal additional speech resources are available, and specifically when no language model in the target language is available. We compare a word-based variant of Goodness of Pronunciation (GOP) with a phone-based dynamic programming (PDP) scoring…
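The abstract above is truncated before it describes how PDP scoring works, so the following is only a minimal sketch under stated assumptions: it assumes the phone string expected from each prompt is aligned by dynamic programming (plain Levenshtein distance with unit costs) against an unconstrained phone recognition of the recording, and that the distance, normalised by prompt length, serves as a per-utterance score. The names `edit_distance`, `pdp_score`, `flag_suspect` and the 0.5 threshold are hypothetical illustrations, not the authors' method.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (unit costs)."""
    n, m = len(ref), len(hyp)
    prev = list(range(m + 1))  # row 0: cost of inserting hyp[:j]
    for i in range(1, n + 1):
        cur = [i] + [0] * m  # column 0: cost of deleting ref[:i]
        for j in range(1, m + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)   # vs deletion, insertion
        prev = cur
    return prev[m]


def pdp_score(prompt_phones, recognised_phones):
    """Hypothetical PDP-style score in [0, 1]; 1.0 means the recognised
    phones match the prompted phones exactly."""
    if not prompt_phones:
        return 0.0
    d = edit_distance(prompt_phones, recognised_phones)
    return max(0.0, 1.0 - d / len(prompt_phones))


def flag_suspect(utterances, threshold=0.5):
    """Return IDs of utterances (id -> (prompt, recognised)) whose score
    falls below a hypothetical threshold, i.e. likely mismatched prompts."""
    return [uid for uid, (prompt, rec) in utterances.items()
            if pdp_score(prompt, rec) < threshold]
```

For example, `pdp_score(["k", "a", "t"], ["k", "a", "p"])` gives about 0.67 (one substitution over three prompted phones). A GOP-style score would instead use acoustic model likelihoods, which this string-level sketch deliberately leaves out.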
Aiming at both speaker independence and robustness to recognition errors in the spoken queries, we have implemented a two-pass system for spoken web search. In the first pass, unconstrained phone recognition of both the query terms and the content audio is used to represent these recordings as phone strings. A dynamic-programming approach…
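The abstract above stops just as it introduces the dynamic-programming step, so the following is a hedged sketch of one common way such a step could work, not the system's actual design: it assumes the query's phone string is aligned against any contiguous span of a content phone string (edit distance with a free start and free end in the content), and that content items are ranked by the resulting cost normalised by query length. The names `best_match` and `rank` are hypothetical, and the second pass of the system is not modelled.

```python
def best_match(query, content):
    """Return (cost, end) for the lowest-cost alignment of `query` against
    any contiguous span of `content`; `end` is the exclusive end index."""
    n, m = len(query), len(content)
    prev = [0] * (m + 1)  # free start: a match may begin at any content position
    for i in range(1, n + 1):
        cur = [i] + [0] * m  # aligning query[:i] before any content costs i deletions
        for j in range(1, m + 1):
            sub = prev[j - 1] + (query[i - 1] != content[j - 1])  # match or substitution
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)         # vs deletion, insertion
        prev = cur
    end = min(range(m + 1), key=lambda j: prev[j])  # free end: best final column
    return prev[end], end


def rank(query, documents):
    """Rank documents (name -> phone list) by length-normalised match cost,
    lowest (best) first."""
    norm = max(len(query), 1)
    return sorted((best_match(query, phones)[0] / norm, name)
                  for name, phones in documents.items())
```

Because recognition errors appear as substitutions, insertions, or deletions in the phone strings, an approximate match of this kind can still find a query whose recognised phones differ slightly from those of the content.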