The Interspeech 2015 Zero Resource Speech Challenge aims at discovering subword and word units from raw speech. The challenge provides the first unified and open source suite of evaluation metrics and data sets to compare and analyse the results of unsupervised linguistic unit discovery algorithms. It consists of two tracks. In the first, a psychophysically(More)
The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or from raw speech has attracted increasing interest in recent years, from both a theoretical and a practical standpoint. Yet there exists no commonly accepted evaluation method for the systems performing term discovery. Here, we propose such an evaluation toolbox,(More)
This paper presents a deep architecture for learning a similarity metric on variable-length character sequences. The model combines a stack of character-level bidirectional LSTMs with a Siamese architecture. It learns to project variable-length strings into a fixed-dimensional embedding space by using only information about the similarity between pairs of(More)
Infants learn language at an incredible speed, and one of the first steps in this voyage is learning the basic sound units of their native languages. It is widely thought that caregivers facilitate this task by hyperarticulating when speaking to their infants. Using state-of-the-art speech technology, we addressed this key theoretical question: Are sound(More)
We report on an architecture for the unsupervised discovery of talker-invariant subword embeddings. It consists of two components: a dynamic time warping-based spoken term discovery (STD) system and a Siamese deep neural network (DNN). The STD system clusters word-sized repeated fragments in the acoustic streams while the DNN is trained to minimize the(More)
This paper reports on the results of the Zero Resource Speech Challenge 2015, the first unified benchmark for zero resource speech technology, which aims at the unsupervised discovery of subword and word units from raw speech. This paper discusses the motivation for the challenge, its data sets, tasks and baseline systems. We outline the ideas behind the(More)
Recent work has explored deep architectures for learning acoustic features in an unsupervised or weakly supervised way for phone recognition. Here we investigate the role of the input features, and in particular we test whether standard mel-scaled filterbanks could be replaced by inherently richer representations, such as those derived from an analytic scattering(More)
This paper investigates the effects of novel words on a cognitively plausible computational model of word learning. The model is first familiarized with a set of words, achieving high recognition scores, and is subsequently offered novel words for training. We show that the model is able to recognize the novel words as different from the previously seen words,(More)
This paper studies the properties of the Histograms of Acoustic Co-occurrences (HAC) approach to acoustic modeling. While HAC-vectors have been predominantly used with matrix decomposition algorithms, we show that the additivity and sparseness constraints inherent in HAC lead to a representational space in which utterances are linearly separable with(More)