Encoding of phonology in a recurrent neural model of grounded speech

@article{Alishahi2017EncodingOP,
  title={Encoding of phonology in a recurrent neural model of grounded speech},
  author={A. Alishahi and Marie Barking and Grzegorz Chrupała},
  journal={ArXiv},
  year={2017},
  volume={abs/1706.03815}
}
We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how information about individual phonemes is encoded in the MFCC features extracted from the speech signal, and the activations of the layers of the model. Via experiments with phoneme decoding and …
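The phoneme-decoding analyses mentioned above can be illustrated with a diagnostic probe: train a simple classifier to predict a phoneme label from a single feature frame, and read its accuracy as a measure of how much phoneme information the representation carries. The sketch below is purely illustrative and uses synthetic stand-ins for MFCC frames, not the paper's data or code; the same probing recipe would apply to real MFCCs or RNN layer activations.

```python
import random

random.seed(0)

# Synthetic "MFCC frames": 13-dimensional vectors, one Gaussian
# cluster per phoneme class (placeholder data, not real speech).
def make_frames(n, dim, centre):
    return [[random.gauss(centre, 1.0) for _ in range(dim)] for _ in range(n)]

n_per_class, dim = 100, 13
classes = [0, 1, 2]
data = [(x, c) for c in classes for x in make_frames(n_per_class, dim, 3.0 * c)]
random.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]

# Nearest-centroid probe: one mean frame per phoneme class.
centroids = {}
for c in classes:
    frames = [x for x, lab in train if lab == c]
    centroids[c] = [sum(col) / len(frames) for col in zip(*frames)]

def predict(x):
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(classes, key=lambda c: dist2(x, centroids[c]))

# Probe accuracy well above chance (1/3) indicates the frames
# carry recoverable phoneme-class information.
acc = sum(predict(x) == lab for x, lab in test) / len(test)
print(f"probe accuracy: {acc:.2f}")
```

A linear or logistic-regression probe is the more common choice in the analysis literature; a nearest-centroid classifier is used here only to keep the sketch dependency-free.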
Citations

Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech
Finds that not all speech frames play an equal role in the final encoded representation of a given word; some frames have a crucial effect on it. Suggests that word representations could be activated through a process of lexical competition.
Analyzing analytical methods: The case of phonology in neural models of spoken language
Concludes that reporting analysis results with randomly initialized models is crucial, and that global-scope methods tend to yield more consistent and interpretable results; recommends their use as a complement to local-scope diagnostic methods.
Encoding of speaker identity in a Neural Network model of Visually Grounded Speech perception
This thesis presents research on how the unique characteristics of a voice are encoded in a recurrent neural network (RNN) trained on visually grounded speech signals. Multiple experiments were …
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition
Analyzes the learned internal representations in an end-to-end ASR model and finds remarkable consistency in how different properties are represented in different layers of the deep neural network.
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
In this paper, we present a method for learning discrete linguistic units by incorporating vector quantization layers into neural models of visually grounded speech. We show that our method is …
On the difficulty of a distributional semantics of spoken language
Conjectures that unsupervised learning of spoken language semantics becomes possible if one abstracts away from surface variability, and suggests possible routes toward transferring these approaches to the domain of unrestricted natural speech.
From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings
A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the World Atlas of Language Structure (WALS). Doing this manually …
On internal language representations in deep learning: an analysis of machine translation and speech recognition
Presents a unified methodology for evaluating internal representations in neural networks, consisting of three steps: training a model on a complex end-to-end task, generating feature representations from different parts of the trained model, and training classifiers on simple supervised learning tasks using those representations.
Understanding Learning Dynamics Of Language Models with SVCCA
Presents a first study of the learning dynamics of neural language models, using a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which makes it possible to compare learned representations across time and across models without evaluating directly on annotated data.
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Analyzes the speech representations learned by a deep end-to-end model based on convolutional and recurrent layers and trained with a connectionist temporal classification (CTC) loss, and evaluates representations from different layers of the deep model.

References

Showing 1–10 of 44 references.
Representation of Linguistic Form and Function in Recurrent Neural Networks
Proposes a method for estimating the contribution of individual input tokens to the networks' final prediction, and shows that the visual pathway pays selective attention to lexical categories and grammatical functions that carry semantic information, learning to treat word types differently depending on their grammatical function and their position in the sequential structure of the sentence.
From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning
Presents a model of visually grounded language learning based on stacked gated recurrent neural networks which learns to predict visual features given an image description in the form of a sequence of phonemes, and shows that it represents linguistic information in a hierarchy of levels.
Detection of phonological features in continuous speech using neural networks
Reports experiments on three phonological feature systems: the Sound Pattern of English (SPE) system, a multi-valued (MV) feature system which uses traditional phonetic categories such as manner and place, and Government Phonology, which uses a set of structured primes.
Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech
Characterizes electroencephalography responses to continuous speech by obtaining the time-locked responses to phoneme instances (phoneme-related potentials), and finds that each instance of a phoneme in continuous speech produces multiple distinguishable neural responses, occurring as early as 50 ms and as late as 400 ms after phoneme onset.
Representations of language in a model of visually grounded speech signal
An in-depth analysis of the representations used by different components of the trained model shows that encoding of semantic aspects tends to become richer higher in the hierarchy of layers, whereas encoding of form-related aspects of the language input tends to initially increase and then plateau or decrease.
Common Neural Basis for Phoneme Processing in Infants and Adults
Argues that infants have access from the beginning of life to phonemic representations, which are modified not by training or explicit instruction but by the statistical distributions of speech input, converging on the native phonemic categories.
Exploiting deep neural networks for detection-based speech recognition
Shows that DNNs can be used to boost the classification accuracy of basic speech units, such as phonetic attributes (phonological features) and phonemes, resulting in improved word recognition accuracy that is better than previously reported word-lattice rescoring results.
Semantics guide infants' vowel learning: Computational and experimental evidence.
Provides computational as well as experimental support for the idea that semantic context plays a role in disambiguating phonetic auditory input.
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
Proposes a framework that facilitates better understanding of the encoded representations in sentence vectors, and demonstrates the potential of the approach by analyzing different sentence representation mechanisms.
Memory for Serial Order : A Network Model of the Phonological Loop and its Timing
A connectionist model of human short-term memory is presented that extends the 'phonological loop' (A. D. Baddeley, 1986) to encompass serial order and learning. Psychological and neuropsychological …