• Publications
SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation
TLDR
Presents SimLex-999, a gold-standard resource for evaluating distributional semantic models that improves on existing resources in several important ways. It explicitly quantifies similarity rather than association or relatedness, so pairs of entities that are associated but not actually similar receive a low rating.
The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing
TLDR
This opinion/theoretical paper proposes a simple, practical protocol for selecting statistical significance tests in NLP setups and accompanies the protocol with a brief survey of the most relevant tests.
Modeling the Detection of Textual Cyberbullying
TLDR
This work decomposes the overall detection problem into the detection of sensitive topics, which lends itself to text classification sub-problems, and shows that textual cyberbullying detection can be tackled by building individual topic-sensitive classifiers.
SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
TLDR
Introduces SimVerb-3500, an evaluation resource that provides human ratings for the similarity of 3,500 verb pairs, in the hope that it will enable a richer understanding of the diversity and complexity of verb semantics and guide the development of systems that can effectively represent and interpret this meaning.
Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
TLDR
The evaluation shows that the Attract-Repel method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones.
Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction
TLDR
Presents a novel word-level vector representation based on symmetric patterns (SPs) that performs exceptionally well on verbs. A simple combination of the word similarity scores generated by this method and by word2vec yields predictive power superior to that of either individual model.
Pivot Based Language Modeling for Improved Neural Domain Adaptation
TLDR
Presents the Pivot Based Language Model, a representation learning model that combines pivot-based and neural network (NN) modeling in a structure-aware manner and can naturally feed structure-aware text classifiers such as LSTMs and CNNs.
Multi-Task Active Learning for Linguistic Annotations
TLDR
Shows that multi-task active learning (MTAL) outperforms both random selection and a stronger baseline, one-sided example selection, in which one task is pursued using active learning (AL) and the selected examples are also provided to the other task.
Separated by an Un-common Language: Towards Judgment Language Informed Vector Space Modeling
TLDR
The analysis reveals that human judgments are strongly impacted by the judgment language, and it is shown that in a large number of setups, multilingual VSM combination results in improved correlations with human judgments, suggesting that multilingualism may partially compensate for the judge language effect on human judgments.