Word-Embeddings Distinguish Denominal and Root-Derived Verbs in Semitic

Ido Benbaji, Omri Doron, Adèle Hénot-Mortier
Proponents of the Distributed Morphology framework have posited the existence of two levels of morphological word formation: a lower one, leading to loose input-output semantic relationships; and an upper one, leading to tight input-output semantic relationships. In this work, we propose to test the validity of this assumption in the context of Hebrew word embeddings. If the two-level hypothesis is borne out, we expect state-of-the-art Hebrew word embeddings to encode (1) a noun, (2) a… 



The root and word distinction: an experimental study of Hebrew denominal verbs

The morpho-syntactic structure of Semitic languages, traditionally seen as based on abstract root morphemes, has been analysed by some as being fully word-based. Others have proposed a root-based

Locality Constraints on the Interpretation of Roots: The Case of Hebrew Denominal Verbs

This paper argues for a distinction between word formation from roots and word formation from existing words. Focusing on Hebrew, it is shown that roots – and only roots – may be assigned multiple

Decomposing morphologically complex words in a nonlinear morphology.

The authors investigated the decomposition process, focusing on the structural properties of verbal forms perceived and extracted during word recognition, and demonstrated that if one consonant is missing, the parsing system collapses and there is no evidence for morphological priming.

Skip-Gram − Zipf + Uniform = Vector Additivity

This work shows that Skip-Gram embeddings are optimal in the sense of Globerson and Tishby, and implies that the heuristics commonly used to approximately fit Skip-Gram models can also be used to fit SDR models.

On the identity of roots

This paper attempts to articulate the essential nature of the notion ‘root’ in the morphosyntax, and argues that roots must be individuated purely abstractly, as independent indices on the √ node in the syntactic computation that serves as the linkage between a particular set of spell-out instructions and a particular set of interpretive instructions.

A Latent Variable Model Approach to PMI-based Word Embeddings

A new generative model is proposed, a dynamic version of the log-linear topic model of Mnih and Hinton (2007), that uses the prior to compute closed-form expressions for word statistics; it is shown that latent word vectors are fairly uniformly dispersed in space.
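The PMI-based construction this line of work builds on can be sketched in a few lines: form a positive PMI matrix from word-context co-occurrence counts, then factorize it to obtain low-dimensional word vectors. The counts below are made-up toy numbers for illustration, not data from the paper.

```python
import numpy as np

# Toy word-context co-occurrence counts (rows: words, cols: contexts).
# Purely illustrative numbers, not from the paper.
counts = np.array([
    [4.0, 1.0, 0.0],
    [2.0, 3.0, 1.0],
    [0.0, 1.0, 5.0],
])

total = counts.sum()
p_wc = counts / total                      # joint probabilities p(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)      # marginal p(w)
p_c = p_wc.sum(axis=0, keepdims=True)      # marginal p(c)

# Positive PMI: log p(w,c) / (p(w) p(c)), clipped at zero.
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0.0)

# A low-rank factorization of the PPMI matrix yields word vectors.
u, s, _ = np.linalg.svd(ppmi)
word_vectors = u[:, :2] * np.sqrt(s[:2])   # rank-2 embeddings
print(word_vectors.shape)
```

Real systems use large sparse co-occurrence matrices and truncated SVD, but the pipeline is the same shape as this sketch.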

Enriching Word Vectors with Subword Information

A new approach based on the skip-gram model in which each word is represented as a bag of character n-grams and the word vector is the sum of these n-gram representations; it achieves state-of-the-art performance on word similarity and analogy tasks.
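The bag-of-character-n-grams idea can be sketched briefly. The boundary markers `<`/`>`, the n-gram range, and the random embedding table below are illustrative assumptions standing in for trained parameters, not the paper's exact hashing scheme.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=4):
    """All character n-grams of the boundary-marked word, plus the word itself."""
    wrapped = f"<{word}>"
    grams = [wrapped[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(wrapped) - n + 1)]
    grams.append(wrapped)  # the full word is kept as its own unit
    return grams

# Hypothetical trained n-gram vectors (random stand-ins here).
rng = np.random.default_rng(0)
table = {g: rng.normal(size=8) for g in char_ngrams("where")}

# The word vector is the sum of its n-gram vectors.
word_vec = sum(table[g] for g in char_ngrams("where"))
print(len(char_ngrams("where")), word_vec.shape)
```

Because the word vector is assembled from subword pieces, vectors can be produced even for words never seen during training, which matters for morphologically rich languages like Hebrew.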

Analogies Explained: Towards Understanding Word Embeddings

A probabilistically grounded definition of paraphrasing is derived that is re-interpreted as word transformation, a mathematical description of "$w_x$ is to $w_y$".

Learning Word Vectors for 157 Languages

This paper describes how high-quality word representations for 157 languages were trained on the free online encyclopedia Wikipedia and data from the Common Crawl project, and introduces three new word-analogy datasets to evaluate these word vectors.