Corpus ID: 234469796

Discrete representations in neural models of spoken language

@article{Higy2021DiscreteRI,
  title={Discrete representations in neural models of spoken language},
  author={Bertrand Higy and Lieke Gelderloos and A. Alishahi and Grzegorz Chrupała},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.05582}
}
The distributed and continuous representations used by neural networks are at odds with representations employed in linguistics, which are typically symbolic. Vector quantization has been proposed as a way to induce discrete neural representations that are closer in nature to their linguistic counterparts. However, it is not clear which metrics are best suited to analyze such discrete representations. We compare the merits of four commonly used metrics in the context of weakly supervised…
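The core mechanism the abstract refers to is vector quantization: each continuous frame representation is snapped to the nearest entry in a learned codebook, so the model's hidden states become sequences of discrete code indices. The sketch below illustrates that nearest-code lookup only; it is not the authors' implementation, and the codebook size and feature dimensionality are arbitrary placeholder values.

```python
# Illustrative sketch of a vector-quantization (codebook lookup) step.
# Not the paper's implementation; all sizes are arbitrary placeholders.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 64))   # 32 discrete codes, each a 64-dim vector (assumed sizes)
frames = rng.normal(size=(100, 64))    # 100 continuous frame representations

# Squared Euclidean distance from every frame to every codebook entry.
dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

codes = dists.argmin(axis=1)           # discrete index assigned to each frame
quantized = codebook[codes]            # frames replaced by their nearest code vectors

print(codes[:10])                      # first ten discrete code indices
```

In a trained model the codebook is learned jointly with the encoder (e.g. via a straight-through gradient as in VQ-VAE), and it is these discrete code sequences that the metrics compared in the paper evaluate.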


References

Showing 1-10 of 43 references
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
In this paper, we present a method for learning discrete linguistic units by incorporating vector quantization layers into neural models of visually grounded speech. We show that our method is…
Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge
Two neural models are proposed to tackle the challenge of discrete representations of speech that separate phonetic content from speaker-specific details, using vector quantization to map continuous features to a finite set of codes.
From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding
This paper formulates audio-to-semantic understanding as a sequence-to-sequence problem, and proposes and compares various encoder-decoder based approaches that optimize both modules jointly, in an end-to-end manner.
Neural Discrete Representation Learning
Pairing these representations with an autoregressive prior, the model can generate high-quality images, videos, and speech, as well as performing high-quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
Discovering discrete subword units with binarized autoencoders and hidden-Markov-model encoders
Binarized autoencoders and hidden-Markov-model encoders can outperform standard AEs when using a larger number of encoding nodes, while HMM encoders may allow more compact subword transcriptions without worsening ABX performance.
Collecting Image Annotations Using Amazon’s Mechanical Turk
It is found that the use of a qualification test provides the greatest improvement in quality, whereas refining the annotations through follow-up tasks works rather poorly.
Analyzing analytical methods: The case of phonology in neural models of spoken language
It is concluded that reporting analysis results with randomly initialized models is crucial, and that global-scope methods tend to yield more consistent and interpretable results; their use is recommended as a complement to local-scope diagnostic methods.
Cross-Modal Discrete Representation Learning
This work presents a self-supervised learning framework that learns representations capturing finer levels of granularity across different modalities, such as concepts or events represented by visual objects or spoken words.
Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques
An overview of the evolution of visually grounded models of spoken language over the last 20 years is provided, discussing the central research questions addressed, the timeline of developments, and the datasets that enabled much of this work.
Discourse structure interacts with reference but not syntax in neural language models
This work uses stimuli from psycholinguistic studies showing that humans can condition both reference and syntactic processing on the same discourse structure, and finds that, contrary to humans, implicit causality only influences LM behavior for reference, not syntax, despite model representations that encode the necessary discourse information.