Annika Hämäläinen

Learn More
Standard automatic speech recognition (ASR) systems use acoustic models typically trained with speech of young adult speakers. Ageing is known to alter speech production in ways that require ASR systems to be adapted, in particular at the level of acoustic modeling. This paper reports ASR experiments that illustrate the impact of speaker age on speech(More)
Articulatory and acoustic reduction can manifest itself in the temporal and spectral domains. This study introduces a measure of spectral reduction, which is based on the speech decoding techniques commonly used in automatic speech recognizers. Using data for four frequent Dutch affixes from a large corpus of spontaneous face-to-face conversations, it(More)
In this paper, we construct multi-path syllable models using phonetic knowledge for initialising the parallel paths, and a data-driven solution for their re-estimation. We hypothesise that the richer topology of multi-path syllable models would be better at accounting for pronunciation variation than context-dependent phone models that can only account for(More)
This paper builds on previous work aimed at unraveling the structure of the speech signal using probabilistic representations. The context of this work is a multi-pass speech recognition system in which a phone lattice is created and used as a basis for a lexical decoding pass (search) that allows symbolic mismatches at certain costs. The focus is on the(More)
Large speech corpora with word-level transcriptions annotated for noises and disfluent speech are necessary for training automatic speech recognisers. Crowdsourcing is a lower-cost, faster-turnaround, highly scalable alternative for expert transcription and annotation. In this paper, we showcase our three-step crowdsourcing approach motivated by the(More)
The PaeLife project is a European industry-academia collaboration whose goal is to provide the elderly with easy access to online services that make their life easier and encourage their continued participation in the society. To reach this goal, the project partners are developing a multimodal virtual personal life assistant (PLA) offering a wide range of(More)
Recent research on the TIMIT corpus suggests that longerlength acoustic units are better suited for modelling coarticulation and long-term temporal dependencies in speech than conventional context-dependent phone models. However, the impressive results achieved on TIMIT [1] are yet to be reproduced on other corpora, such as read speech from the Spoken Dutch(More)
Recent research suggests that modeling coarticulation in speech is more appropriate at the syllable level. However, due to a number of additional factors that affect the way syllables are articulated, creating multiple paths through syllable models might be necessary. Our previous research on longer-length multi-path models in connected digit recognition(More)
Recent research suggests that it is more appropriate to model pronunciation variation with syllable-length acoustic models than with triphones. Due to the large number of factors contributing to pronunciation variation at the syllable level, the creation of multi-path model topologies appears necessary. In this paper, we construct multi-path models using(More)