Data Set Used
Recent work has shown that simple vector subtraction over word embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision. Prior work has evaluated this intriguing result using a word analogy prediction formulation and hand-selected relations, but the generality of the finding over a broader range of…
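As a rough illustration of the vector-subtraction idea mentioned in the abstract above, the sketch below compares difference vectors between word pairs. The embeddings are toy three-dimensional values invented for this example, and the helper names `offset` and `cosine` are hypothetical; none of this comes from the paper itself.

```python
import math

# Toy embeddings, invented purely for illustration (not from any trained model).
TOY_EMBEDDINGS = {
    "king": [0.8, 0.7, 0.1],
    "queen": [0.8, 0.1, 0.7],
    "man": [0.2, 0.9, 0.1],
    "woman": [0.2, 0.3, 0.7],
}


def offset(word_a, word_b):
    """Return the difference vector embedding(word_a) - embedding(word_b)."""
    vec_a = TOY_EMBEDDINGS[word_a]
    vec_b = TOY_EMBEDDINGS[word_b]
    return [x - y for x, y in zip(vec_a, vec_b)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Pairs that share a relation should have similar difference vectors.
same_relation = cosine(offset("king", "man"), offset("queen", "woman"))

# Offsets taken along different relations should be less similar.
different_relation = cosine(offset("king", "man"), offset("king", "queen"))

print(f"same relation:      {same_relation:.3f}")
print(f"different relation: {different_relation:.3f}")
```

With these toy values the two offsets for the shared relation point in nearly the same direction, while the offsets for different relations do not. Real embeddings are far noisier, so this only shows the mechanics of the comparison.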
We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that words in a given text segment, such as a paragraph, that are high-ranking representatives of a common topic of discussion are less likely to be a part of an idiomatic expression. Our additional hypothesis is that contexts in which idioms…
This paper describes an experimental approach to Detection of Minimal Semantic Units and their Meaning (DiMSUM), explored within the framework of SemEval'16 Task 10. The approach is primarily based on a combination of word embeddings and parser-based features, and employs unidirectional incremental computation of compositional embeddings for multiword…
Dealing with the complex word forms in morphologically rich languages is an open problem in language processing, and is particularly important in translation. In contrast to most modern neural systems of translation, which discard the identity for rare words, in this paper we propose several architectures for learning word representations from character…
This paper presents a novel approach to low resource language modeling. Here we propose a model for word prediction which is based on multi-variant ngram abstraction with weighted confidence level. We demonstrate a significant improvement in word recall over the "traditional" Kneser-Ney back-off model for most of the examined low resource languages.