Finding Non-Arbitrary Form-Meaning Systematicity Using String-Metric Learning for Kernel Regression

@inproceedings{Gutirrez2016FindingNF,
  title={Finding Non-Arbitrary Form-Meaning Systematicity Using String-Metric Learning for Kernel Regression},
  author={E. Dar{\'i}o Guti{\'e}rrez and R. Levy and Benjamin K. Bergen},
  booktitle={ACL},
  year={2016}
}
Arbitrariness of the sign—the notion that the forms of words are unrelated to their meanings—is an underlying assumption of many linguistic theories. […] In the kernel regression formulation we introduce, form-meaning relationships can be used to predict words’ distributional semantic vectors from their forms. Furthermore, we introduce a novel metric learning algorithm that can learn weighted edit distances that minimize kernel regression error. Our results suggest that the English lexicon exhibits…
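A minimal sketch of the kind of kernel regression the abstract describes, assuming a plain (unweighted) Levenshtein distance in place of the learned weighted edit distance; all names and parameters below are illustrative, not the authors' implementation:

import numpy as np

def levenshtein(a, b):
    # Standard dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def predict_vector(word, lexicon, vectors, bandwidth=1.0):
    # Nadaraya-Watson kernel regression: predict a word's semantic vector
    # as a kernel-weighted average of other words' vectors, with weights
    # decaying in edit distance. `lexicon` should exclude `word` itself
    # (leave-one-out) when measuring systematicity.
    weights = np.array([np.exp(-levenshtein(word, w) / bandwidth)
                        for w in lexicon])
    return weights @ vectors / weights.sum()

In the paper's formulation the edit costs themselves are additionally learned so as to minimize the regression error; the fixed kernel above only captures the baseline setup.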

Citations

Meaning to Form: Measuring Systematicity as Information
TLDR
This work offers a holistic quantification of the systematicity of the sign using mutual information and recurrent neural networks, and finds a statistically significant reduction in entropy when modeling a word form conditioned on its semantic representation.
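The quantity behind this summary can be stated compactly: with W a word form and M its semantic representation, MI(W; M) = H(W) - H(W | M), so a statistically significant drop in the (RNN-estimated) entropy of W once M is conditioned on is direct evidence of systematicity. (Notation mine; the paper gives the full estimation details.)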
Finding Concept-specific Biases in Form–Meaning Associations
TLDR
New methods to detect cross-linguistic associations at scale are provided, and it is found that there is a significant effect of non-arbitrariness, but it is unsurprisingly small.
Modelling Form-Meaning Systematicity with Linguistic and Visual Features
TLDR
This paper constructs word meaning representations from linguistic as well as visual data and analyzes the structure and significance of form-meaning systematicity found in English using text-based models, corroborating its existence and showing that this systematicity is concentrated in localized clusters.
What Meaning-Form Correlation Has to Compose With: A Study of MFC on Artificial and Natural Language
TLDR
It is found that linguistic phenomena such as synonymy and ungrounded stop-words weigh on MFC measurements, and that straightforward methods to mitigate their effects have widely varying results depending on the dataset they are applied to.
Wordform Similarity Increases With Semantic Similarity: An Analysis of 100 Languages
TLDR
Evidence is shown in 100 languages from a diverse array of language families that more semantically similar word pairs are also more phonologically similar, which suggests that there is an important statistical trend for lexicons to have semantically similar words be phonologically similar as well, possibly for functional reasons associated with language learning.
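A rough illustration of the kind of pairwise analysis behind such a claim (not the authors' exact procedure; reuses levenshtein() from the sketch above, and scipy is assumed):

from itertools import combinations
import numpy as np
from scipy.stats import pearsonr

def form_meaning_correlation(words, vectors):
    # Correlate pairwise edit distances with pairwise cosine distances of
    # semantic vectors; a positive correlation means similar-sounding
    # words tend to have similar meanings.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    form_d, meaning_d = [], []
    for i, j in combinations(range(len(words)), 2):
        form_d.append(levenshtein(words[i], words[j]))
        meaning_d.append(1.0 - unit[i] @ unit[j])
    return pearsonr(form_d, meaning_d)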
Arbitrariness of Linguistic Sign Questioned: Correlation between Word Form and Meaning in Russian
In this paper, we present the results of preliminary experiments on finding the link between the surface forms of Russian nouns (as represented by their graphic forms) and their meanings (as…
Discovering Phonesthemes with Sparse Regularization
TLDR
A simple method for extracting non-arbitrary form-meaning representations from a collection of semantic vectors and applies this model to the problem of automatically discovering phonesthemes, which are submorphemic sound clusters that appear in words with similar meaning.
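A hypothetical sketch of phonestheme discovery via sparse regression, assuming word-initial character bigrams as features and scikit-learn's multi-output Lasso (the feature scheme and alpha are my assumptions, not necessarily the paper's):

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Lasso

def phonestheme_candidates(words, vectors, alpha=0.01):
    # Represent each word by its initial character bigram ("gl", "sn", ...).
    vec = CountVectorizer(analyzer="char", ngram_range=(2, 2))
    X = vec.fit_transform([w[:2] for w in words]).toarray()
    # L1 regularization zeroes out onsets that do not help predict the
    # semantic vectors; the survivors are phonestheme candidates.
    model = Lasso(alpha=alpha).fit(X, vectors)
    mask = np.abs(model.coef_).max(axis=0) > 0
    return [f for f, keep in zip(vec.get_feature_names_out(), mask) if keep]

On a toy lexicon, onsets like "gl-" (glow, glitter, gleam) would be expected to survive the regularization while semantically incoherent onsets are zeroed out.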
On Homophony and Rényi Entropy
TLDR
A new information-theoretic quantification of a language’s homophony is proposed: the sample Rényi entropy and this quantification is used to revisit Trott and Bergen's claims.
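For reference, the Rényi entropy of order α (α > 0, α ≠ 1), which recovers Shannon entropy in the limit α → 1, is

H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_x p(x)^\alpha

The paper's "sample" variant plugs empirical wordform frequencies into this definition; see the paper itself for the estimation details.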
Proceedings of the Second Workshop on Subword and Character Level Models in NLP (NAACL HLT 2018)
TLDR
It is found that word embeddings utilizing subword information consistently outperform standard word embeddings on a word similarity task and as initialization of the source word embeddings in a low-resource NMT system.
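As an illustration of the subword-aware embeddings this finding concerns, gensim's FastText builds word vectors from character n-grams (the corpus and hyperparameters below are placeholders):

from gensim.models import FastText

sentences = [["glimmer", "of", "light"], ["a", "gleam", "in", "the", "dark"]]
model = FastText(sentences, vector_size=100, min_n=3, max_n=6, min_count=1)
# Because a word's vector is the sum of its character n-gram vectors,
# even unseen forms such as "glimmering" receive an embedding.
vec = model.wv["glimmering"]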

References

(showing 1-10 of 44 references)
Determinants of wordlikeness: Phonotactics or lexical neighborhoods?
Wordlikeness, the extent to which a sound sequence is typical of words in a language, affects language acquisition, language processing, and verbal short-term memory. Wordlikeness has generally been…
The Systematicity of the Sign: Modeling Activation of Semantic Attributes from Nonwords
TLDR
This work tested the extent to which similarities among the sounds of words are sufficient to drive sound-symbolic effects, and whether a computational model that learned to map between the forms and meanings of English words better accounted for the observed behavior.
Distributed Representations of Words and Phrases and their Compositionality
TLDR
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
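The phrase-finding method this summary refers to scores candidate bigrams by how much more often they co-occur than their unigram frequencies would predict:

score(w_i, w_j) = (count(w_i w_j) - δ) / (count(w_i) × count(w_j))

where δ is a discounting coefficient that prevents very infrequent words from forming high-scoring phrases; bigrams above a threshold are merged into single tokens and the pass is repeated.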
Semantic Glimmers: Phonaesthemes Facilitate Access to Sentence Meaning
The association between sound and meaning is commonly thought to be symbolic and arbitrary. While this appears to be mostly correct, there is evidence that specific phonetic groupings can be broad…
Exploring systematicity between phonological and context-cooccurrence representations of the mental lexicon
This paper investigates the existence of systematicity between two similarity-based representations of the lexicon, one focusing on word-form and another based on cooccurrence statistics in…
Automatic Labeling of Phonesthemic Senses
TLDR
This study attempts to advance corpus-based exploration of sound iconicity, i.e. the existence of a non-arbitrary relationship between forms and meanings in language, by examining a number of phonesthemes (phonetic groupings proposed in the literature to be meaningful), with the aim of developing ways to validate their existence and their semantic content.
How arbitrary is language?
TLDR
It is proposed that the vocabulary is structured to enable systematicity in early language learning to promote language acquisition, while also incorporating arbitrariness for later language in order to facilitate communicative expressivity and efficiency.
Software Framework for Topic Modelling with Large Corpora
TLDR
This work describes a Natural Language Processing software framework based on the idea of document streaming, i.e. processing corpora document after document in a memory-independent fashion, and implements several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size.
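The streaming idea is easy to illustrate: in gensim, a corpus is any iterable that yields one bag-of-words document at a time, so memory use stays independent of corpus size ("corpus.txt" is a placeholder path):

from gensim import corpora, models

dictionary = corpora.Dictionary(
    line.lower().split() for line in open("corpus.txt"))

class StreamedCorpus:
    # Re-reads the file on every pass instead of holding documents in RAM.
    def __iter__(self):
        for line in open("corpus.txt"):
            yield dictionary.doc2bow(line.lower().split())

lsi = models.LsiModel(StreamedCorpus(), id2word=dictionary, num_topics=50)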
The Origins of Arbitrariness in Language
TLDR
It is argued that arbitrariness becomes necessary as the number of words increases, and the effectiveness of competitive learning for acquiring lexicons that are arbitrary in this sense is discussed.
Good edit similarity learning by loss minimization
TLDR
This paper proposes an approach to edit similarity learning based on loss minimization, called GESL, driven by the notion of (ϵ,γ,τ)-goodness, a theory that bridges the gap between the properties of a similarity function and its performance in classification.