SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation
This work presents SimLex-999, a gold-standard resource for evaluating distributional semantic models that improves on existing resources in several important ways. It explicitly quantifies similarity rather than association or relatedness, so pairs of entities that are associated but not actually similar receive a low rating.
Learning Distributed Representations of Sentences from Unlabelled Data
A systematic comparison of models that learn distributed phrase or sentence representations from unlabelled data finds that the optimal approach depends critically on the intended application.
A large-scale classification of English verbs
The result is a comprehensive Levin-style classification of English verbs that provides over 90% token coverage of the Proposition Bank data and can thus be highly useful for practical applications.
SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
This work introduces SimVerb-3500, an evaluation resource that provides human ratings for the similarity of 3,500 verb pairs, in the hope that it will enable a richer understanding of the diversity and complexity of verb semantics and guide the development of systems that can effectively represent and interpret this meaning.
How to Train good Word Embeddings for Biomedical NLP
This work finds that bigger corpora do not necessarily produce better biomedical-domain word embeddings, and observes one factor that produces contradictory results between intrinsic and extrinsic evaluations.
Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
The evaluation shows that the Attract-Repel method can use existing cross-lingual lexicons to construct high-quality vector spaces for a wide range of languages, facilitating semantic transfer from high-resource to lower-resource ones.
Extending VerbNet with Novel Verb Classes
This work describes the integration into VerbNet of 57 novel classes for verbs not covered (comprehensively) by Levin. The extended VerbNet is the most extensive Levin-style classification of English verbs and can be highly useful for practical applications.
Learning to Understand Phrases by Embedding the Dictionary
This work proposes using the definitions found in everyday dictionaries as a means of bridging the gap between lexical and phrasal semantics, training neural language embedding models to map definitions to representations of the words they define. It presents two applications of these models: reverse dictionaries that return the name of a concept given a definition or description, and general-knowledge crossword question answerers.
HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment
We introduce HyperLex, a data set and evaluation resource that quantifies the extent of semantic category membership, that is, the type-of relation, also known as hyponymy–hypernymy or lexical entailment.
Automatic Linguistic Annotation of Large Scale L2 Databases: The EF-Cambridge Open Language Database (EFCamDat)
This work introduces a new English L2 database, the EF-Cambridge Open Language Database (henceforth EFCamDat), developed by the Department of Theoretical and Applied Linguistics at the University of Cambridge in collaboration with EF Education First, an international educational organization.