Hervé Glotin

In this paper, we develop different mathematical models in the framework of the multi-stream paradigm for noise-robust automatic speech recognition (ASR), and discuss their close relationship with human speech perception. Largely inspired by Fletcher's "product-of-errors" rule (PoE rule) in psychoacoustics, multi-band ASR aims for robustness to data…
We report a summary of the Johns Hopkins Summer 2000 Workshop on audio-visual automatic speech recognition (ASR) in the large-vocabulary, continuous speech domain. Two main problems of audio-visual ASR were addressed: visual feature extraction and audio-visual information fusion. First, image transform and model-based visual features were considered…
We describe a new model of CASA labelling which assigns to each time-frequency region the probability of being "clean" enough to feed a multistream recogniser adapted only to clean data. This labelling process is based on the harmonicity of speech. The probability is evaluated from an SNR-feature mapping and the choice of an SNR decision threshold. This…
Many biological monitoring projects rely on acoustic detection of birds. Despite increasingly large datasets, this detection is often manual or semi-automatic, requiring manual tuning or post-processing. We review the state of the art in automatic bird sound detection, and identify a widespread need for tuning-free and species-agnostic approaches. We introduce…
Building accurate knowledge of the identity, the geographic distribution and the evolution of living species is essential for the sustainable development of humanity as well as for biodiversity conservation. In this context, using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap. With the…
Using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap and build accurate knowledge of the identity, the geographic distribution and the evolution of living species. Large and structured communities of nature observers (e.g. eBird, Xeno-canto, Tela Botanica) as well as big…
The LifeCLEF bird identification task provides a testbed for the system-oriented evaluation of bird species identification across 999 species. The main originality of this data is that it was specifically built through a citizen science initiative conducted by Xeno-canto, an international social network of amateur and expert ornithologists. This makes the task closer to the…
This paper addresses the method of multichannel signal separation and its application to cocktail-party speech recognition. First, we present a fundamental principle for multichannel signal separation which describes what the spatial independence criterion results in. Second, for practical implementation of the signal separation filter, we consider a dynamic…
Statistical approaches for Functional Data Analysis concern the paradigm in which individuals are functions or curves rather than finite-dimensional vectors. In this paper, we particularly focus on the modeling and classification of functional data which are temporal curves presenting regime changes over time. More specifically, we propose a new…