Learn More
Multiword expressions are a key problem for the development of large-scale, linguistically sound natural language processing technology. This paper surveys the problem and some currently available analytic techniques. The various kinds of multiword expressions should be analyzed in distinct ways, including listing " words with spaces " , hierarchically(More)
This paper introduces the novel task of topic coherence evaluation, whereby a set of words, as generated by a topic model, is rated for coherence or interpretability. We apply a range of topic scoring models to the evaluation task, drawing on WordNet, Wikipedia and the Google search engine, and existing research on lexical similarity/relatedness. In(More)
This paper describes Task 5 of the Workshop on Semantic Evaluation 2010 (SemEval-2010). Systems are to automatically assign keyphrases or keywords to given scientific articles. The participating systems were evaluated by matching their extracted keyphrases against manually assigned ones. We present the overall ranking of the submitted systems and discuss(More)
We apply topic modelling to automatically induce word senses of a target word, and demonstrate that our word sense induction method can be used to automatically detect words with emergent novel senses, as well as token occurrences of those senses. We start by exploring the utility of standard topic models for word sense induction (WSI), with a predetermined(More)
This paper presents a construction-inspecific model of multiword expression decomposability based on latent semantic analysis. We use latent semantic analysis to determine the similarity between a multiword expression and its constituent words, and claim that higher similarities indicate greater decomposability. We test the model over English noun-noun(More)
Topic models based on latent Dirichlet allocation and related methods are used in a range of user-focused tasks including document navigation and trend analysis, but evaluation of the intrinsic quality of the topic model and topics remains an open research area. In this work, we explore the two tasks of automatic evaluation of single topics and automatic(More)
We propose a method for automatically labelling topics learned via LDA topic models. We generate our label candidate set from the top-ranking topic terms, titles of Wikipedia articles containing the top-ranking topic terms, and sub-phrases extracted from the Wikipedia article titles. We rank the label candidates using a combination of association measures(More)