TextRank: Bringing Order into Text
TextRank, a graph-based ranking model for text processing, is introduced, and it is shown how this model can be used successfully in natural language applications.
Corpus-based and Knowledge-based Measures of Text Semantic Similarity
This paper shows that the semantic similarity method outperforms methods based on simple lexical matching, resulting in up to a 13% error rate reduction with respect to the traditional vector-based similarity metric.
Wikify!: linking documents to encyclopedic knowledge
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art
SemEval-2007 Task 14: Affective Text
The data set used in the evaluation and the results obtained by the participating systems are described; the task is meant as an exploration of the connection between emotions and lexical semantics.
Learning to identify emotions in text
The construction of a large data set annotated for six basic emotions (ANGER, DISGUST, FEAR, JOY, SADNESS and SURPRISE) is described, and several knowledge-based and corpus-based methods for the automatic identification of these emotions in text are proposed.
The Senseval-3 English lexical sample task
The task definition, resources, participating systems, and comparative results for the English lexical sample task, which was organized as part of the SENSEVAL-3 evaluation exercise, are presented.
Measuring the Semantic Similarity of Texts
A method that combines word-to-word similarity metrics into a text-to-text metric is introduced, and it is shown that this method outperforms the traditional text similarity metrics based on lexical matching.
Text-to-Text Semantic Similarity for Automatic Short Answer Grading
This paper compares a number of knowledge-based and corpus-based measures of text similarity, evaluates the effect of domain and size on the corpus-based measures, and introduces a novel technique to improve the performance of the system by integrating automatic feedback from the student answers.
The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language
It is shown that automatic classification is a viable technique to distinguish between truth and falsehood as expressed in language, and a method for class-based feature analysis is introduced, which sheds some light on the features that are characteristic of deceptive text.
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
This year, the participants were challenged with new data sets for English, as well as the introduction of Spanish as a new language in which to assess semantic similarity; the annotations for both tasks leveraged crowdsourcing.