• Corpus ID: 7684909

Sound-based distributional models

@inproceedings{Lopopolo2015SoundbasedDM,
  title={Sound-based distributional models},
  author={Alessandro Lopopolo and Emiel van Miltenburg},
  booktitle={IWCS},
  year={2015}
}
Following earlier work in multimodal distributional semantics, we present the first results of our efforts to build a perceptually grounded semantic model. Rather than using images, our models are built on sound data collected from freesound.org. We compare three models: a bag-of-words model based on user-provided tags, a model based on audio features using a ‘bag-of-audio-words’ approach, and a model that combines the two. Our results show that the models are able to capture semantic…
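The three models named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-dimensional frame descriptors, the plain k-means codebook, the L1 normalisation, and the combination by concatenation are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math

def nearest(frame, codebook):
    """Index of the codeword closest (squared Euclidean distance) to a frame."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])))

def learn_codebook(frames, k, iterations=10):
    """Plain k-means over frame descriptors; the centroids act as 'audio words'.
    Initialisation picks evenly spaced frames so the result is deterministic."""
    codebook = [list(frames[i * len(frames) // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in frames:
            clusters[nearest(f, codebook)].append(f)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def l1_normalise(counts):
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)

def boaw_vector(frames, codebook):
    """Audio model: normalised histogram of audio words in one sound."""
    counts = [0.0] * len(codebook)
    for f in frames:
        counts[nearest(f, codebook)] += 1.0
    return l1_normalise(counts)

def tag_vector(tags, vocabulary):
    """Tag model: normalised bag-of-words over user-provided tags."""
    return l1_normalise([float(tags.count(t)) for t in vocabulary])

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy data (invented): each sound has frame descriptors and tags.
sounds = {
    "dog":   {"frames": [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05]], "tags": ["dog", "bark"]},
    "rain":  {"frames": [[1.0, 1.1], [0.9, 1.0], [1.0, 0.9]], "tags": ["rain", "water"]},
    "storm": {"frames": [[0.95, 1.0], [1.0, 1.0]], "tags": ["rain"]},
}
vocabulary = ["dog", "bark", "rain", "water"]

# Learn the codebook on two sounds, then encode all three.
training_frames = sounds["dog"]["frames"] + sounds["rain"]["frames"]
codebook = learn_codebook(training_frames, k=2)

# Combined model (assumption): concatenate the audio and tag vectors.
combined = {
    name: boaw_vector(s["frames"], codebook) + tag_vector(s["tags"], vocabulary)
    for name, s in sounds.items()
}

print(cosine(combined["rain"], combined["storm"]))
print(cosine(combined["dog"], combined["storm"]))
```

In this toy setup the storm recording shares both an audio word and a tag with the rain recording, so its combined vector is closer to rain than to dog. Real audio would need actual frame-level descriptors and a much larger codebook.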

Audio-Based Distributional Representations of Meaning Using a Fusion of Feature Encodings
TLDR
This work constructs an ADSM model in order to compute the distance between words (lexical semantic similarity task) and is shown to significantly outperform the state-of-the-art results reported in the literature.
Audio-based Distributional Semantic Models for Music Auto-tagging and Similarity Measurement
TLDR
Acoustic-semantic models are shown to outperform the state-of-the-art for this task and produce high quality tags for audio/music clips.
Sound-Word2Vec: Learning Word Representations Grounded in Sounds
TLDR
This work treats sound as a first-class citizen, studying downstream textual tasks which require aural grounding and proposes sound-word2vec – a new embedding scheme that learns specialized word embeddings grounded in sounds.
The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
TLDR
A collection of annotations for a set of 2,133 environmental sounds taken from the Freesound database is presented, finding that it is not only feasible to perform crowd-labeling for a large collection of sounds, but it is also very useful to highlight different aspects of the sounds that authors may fail to mention.
Learning Neural Audio Embeddings for Grounding Semantics in Auditory Perception
TLDR
This paper examines grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics, and shows how they can be applied to tasks where auditory perception is relevant, including two unsupervised categorization experiments.
Sensory-Aware Multimodal Fusion for Word Semantic Similarity Estimation
TLDR
This work estimates multimodal word representations via the fusion of auditory and visual modalities with the text modality through middle and late fusion of representations with modality weights assigned to each of the unimodal representations.
Semantic memory: A review of methods, models, and current challenges
  • A. Kumar
  • Psychology, Computer Science
    Psychonomic bulletin & review
  • 2020
TLDR
Traditional and modern computational models of semantic memory are reviewed, within the umbrella of network (free association-based), feature (property generation norms-based), and distributional semantic (natural language corpora-based) models, and the contribution of these models to important debates in the literature regarding knowledge representation and learning is discussed.
A Synchronized Word Representation Method With Dual Perceptual Information
TLDR
A language model is proposed that trains on dual perceptual information synchronously to enhance word representation, adopting an attention model to utilize both text and phonetic perceptual information in unsupervised learning tasks.
Exploiting Disagreement Through Open-Ended Tasks for Capturing Interpretation Spaces
TLDR
This research investigates how the complete interpretation space of humans about the content and context of this data can be captured, using open-ended crowdsourcing tasks that optimize the capturing of multiple interpretations combined with disagreement based metrics for evaluation of the results.
Efficient Hybrid Semantic Text Similarity using Wordnet and a Corpus
TLDR
This article evaluates previous word similarity measures on benchmark datasets and then uses a hybrid word similarity in a novel text similarity measure (TSM), based on information content and WordNet semantic relations.

References

SHOWING 1-10 OF 25 REFERENCES
Multimodal Distributional Semantics
TLDR
This work proposes a flexible architecture to integrate text- and image-based distributional information, and shows in a set of empirical tests that the integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter.
Distributional Semantics in Technicolor
TLDR
While visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks, they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words.
Distributional semantics with eyes: using image analysis to improve computational representations of word meaning
TLDR
This work claims that image analysis techniques can "return the favor" to the text processing community and be successfully used for a general-purpose representation of word meaning and shows how distinguishing between a concept and its context in images can improve the quality of the word meaning representations extracted from images.
SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation
TLDR
SimLex-999 is presented, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways, and explicitly quantifies similarity rather than association or relatedness so that pairs of entities that are associated but not actually similar have a low rating.
Coherent bag-of audio words model for efficient large-scale video copy detection
TLDR
This paper attempts to tackle the video copy detection task using audio information, which is as important as visual information in multimedia processing, and proposes a bag-of audio words (BoA) representation to characterize each audio frame.
Spectral vs. spectro-temporal features for acoustic event detection
  • Courtenay V. Cotton, D. Ellis
  • Computer Science, Physics
    2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
  • 2011
TLDR
This work proposes an approach to detecting and modeling acoustic events that directly describes temporal context, using convolutive non-negative matrix factorization (NMF), and discovers a set of spectro-temporal patch bases that best describe the data.
Zero-Shot Learning Through Cross-Modal Transfer
TLDR
This work introduces a model that can recognize objects in images even if no training data is available for the object class, and uses novelty detection methods to differentiate unseen classes from seen classes.
Bag-of-Audio-Words Approach for Multimedia Event Classification
TLDR
Variations of the BoAW method are explored and results on NIST 2011 multimedia event detection (MED) dataset are presented.
From Frequency to Meaning: Vector Space Models of Semantics
TLDR
The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
TLDR
A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.