Ricard Marxer

The CHiME challenge series aims to advance far-field speech recognition technology by promoting research at the interface of signal processing and automatic speech recognition. This paper presents the design and outcomes of the 3rd CHiME Challenge, which targets the performance of automatic speech recognition in a real-world, commercially motivated…
Model-free reinforcement learning has been shown to be a promising data-driven approach for automatic dialogue policy optimization, but a relatively large number of dialogue interactions is needed before the system reaches reasonable performance. Recently, Gaussian-process-based reinforcement learning methods have been shown to reduce the number of…
We report here on our submissions to different music classification tasks for the MIREX 2010 evaluations. These submissions are similar to the ones submitted to MIREX 2009 (see [1]) in terms of the classifiers and the main audio features. However, we added high-level features (or semantic features), based on Support Vector Machine models of curated…
Traditional intelligibility models are concerned with predicting the average number of words heard correctly in given noise conditions and can be readily tested by comparison with listener data. In contrast, recent ‘microscopic’ intelligibility models, which attempt to make precise predictions about a listener’s perception or misperception of specific…
We present the use of a method based on Tikhonov regularization, as an alternative to the Non-negative Matrix Factorization (NMF) approach, for source separation in professional audio recordings. This method is a direct and computationally less expensive solution to the problem, which makes it interesting in low-latency scenarios. The technique sacrifices the…
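For illustration of why a Tikhonov-based formulation can be direct and cheap: ridge-regularized least squares admits a closed-form solution (a single linear solve), whereas NMF requires iterative updates. The sketch below is a minimal, hypothetical example, not the paper's actual formulation; the basis matrix `W` and the regularization weight `lam` are made up for the demonstration.

```python
import numpy as np

# Hypothetical setup: recover per-source activations h for one observed
# magnitude-spectrum frame v, given fixed basis spectra W (freq x sources).
rng = np.random.default_rng(0)
W = np.abs(rng.normal(size=(64, 4)))   # assumed-known basis spectra
h_true = np.abs(rng.normal(size=4))    # "true" activations (synthetic)
v = W @ h_true                         # observed mixture frame

lam = 1e-3                             # Tikhonov regularization weight
# Closed form: h = (W^T W + lam*I)^{-1} W^T v  -- one linear solve,
# no iterations, hence suitable for low-latency scenarios.
h = np.linalg.solve(W.T @ W + lam * np.eye(W.shape[1]), W.T @ v)
```

Note one trade-off inherent to ridge regression: unlike NMF, the closed-form solution is not constrained to be non-negative.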
We introduce a generic model of the emergence of musical categories during the listening process. The model is based on a preprocessing module and a categorization module. Preprocessing results in a perceptually plausible representation of music events extracted from audio or symbolic input. The categorization module lets a taxonomy of musical entities emerge…
We present a method for lead instrument separation using an available musical score that may not be properly aligned with the polyphonic audio mixture. Improper alignment degrades the performance of existing score-informed source separation algorithms. Several techniques are proposed to manage local and global misalignments, such as a score information…
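As a generic illustration of handling local misalignment between a score and an audio feature sequence (a standard technique, not necessarily the one proposed in this paper), dynamic time warping finds the cost of the best monotonic alignment between two sequences:

```python
import math

def dtw_cost(a, b):
    """Minimal DTW sketch: cumulative cost of the optimal monotonic
    alignment between 1-D feature sequences a and b."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = abs(a[i - 1] - b[j - 1])           # local match cost
            D[i][j] = c + min(D[i - 1][j],          # insertion
                              D[i][j - 1],          # deletion
                              D[i - 1][j - 1])      # match
    return D[n][m]
```

A locally stretched copy of a sequence aligns at zero cost, e.g. `dtw_cost([1, 2, 3], [1, 2, 2, 3])` is `0.0`.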
A causal system that represents a stream of music as musical events, and generates further expected events, is presented. Starting from an auditory front-end which extracts low-level features (e.g. MFCCs) and mid-level features such as onsets and beats, an unsupervised clustering process builds and maintains a set of symbols aimed at representing the musical stream…
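One way such a causal symbol inventory can be built is threshold-based online clustering of incoming feature frames; the sketch below is purely illustrative (the threshold, update rule, and synthetic data are assumptions, not the paper's algorithm).

```python
import numpy as np

def assign_symbol(frame, centroids, threshold=1.0):
    """Return the index of the nearest centroid, creating a new
    cluster (symbol) when no centroid is closer than `threshold`."""
    if centroids:
        d = [np.linalg.norm(frame - c) for c in centroids]
        i = int(np.argmin(d))
        if d[i] < threshold:
            # running-mean update keeps the process causal/online
            centroids[i] = 0.9 * centroids[i] + 0.1 * frame
            return i
    centroids.append(frame.copy())
    return len(centroids) - 1

rng = np.random.default_rng(1)
centroids = []
# two well-separated synthetic "event" types, alternating in the stream
stream = [rng.normal(loc=m, scale=0.05, size=13) for m in (0.0, 5.0) * 10]
symbols = [assign_symbol(f, centroids) for f in stream]
```

On this synthetic stream the inventory stabilizes at two symbols, and the symbol sequence alternates with the two event types.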
We present a review of perception and cognition models designed for, or applicable to, music. Emphasis is placed on computational implementations. We include findings from different disciplines: neuroscience, psychology, cognitive science, artificial intelligence, and musicology. The article summarizes the methodology that these disciplines use to approach…