Learn More
We present a method for identifying the positive or negative semantic orientation of foreign words. Identifying the semantic orientation of words has numerous applications in the areas of text classification, analysis of product review, analysis of responses to surveys, and mining online discussions. Identifying the semantic orientation of English words has(More)
The New Yorker publishes a weekly captionless cartoon. More than 5,000 readers submit captions for it. The editors select three of them and ask the readers to pick the funniest one. We describe an experiment that compares a dozen automatic methods for selecting the funniest caption. We show that negative sentiment, human-centeredness, and lexical centrality(More)
We investigate the task of generating coherent survey articles for scientific topics. We introduce an extractive summarization algorithm that combines a content model with a discourse model to generate coherent and readable summaries of scientific topics using text from scientific articles relevant to the topic. Human evaluation on 15 topics in(More)
The Computational Linguistics (CL) Summarization Pilot Task was created to encourage a community effort to address the research problem of summarizing research articles as “faceted summaries” in the domain of computational linguistics. In this pilot stage, a handannotated set of citing papers was provided for ten reference papers to help in automating the(More)
In this paper, we investigate the problem of automatic generation of scientific surveys starting from keywords provided by a user. We present a system that can take a topic query as input and generate a survey of the topic by first selecting a set of relevant documents, and then selecting sentences from those documents. We discuss the issues of robust(More)
We present heterogeneous networks as a way to unify lexical networks with relational data. We build a unified ACL Anthology network, tying together the citation, author collaboration, and termcooccurence networks with affiliation and venue relations. This representation proves to be convenient and allows problems such as name disambiguation, topic modeling,(More)
We present a system for automatically identifying the native language of a writer. We experiment with a large set of features and train them on a corpus of 9,900 essays written in English by speakers of 11 different languages. our system achieved an accuracy of 43% on the test data, improved to 63% with improved feature normalization. In this paper, we(More)
Kathleen McKeown1, Hal Daume III2, Snigdha Chaturvedi2, John Paparrizos1, Kapil Thadani1, Pablo Barrio1, Or Biran1, Suvarna Bothe1, Michael Collins1, Kenneth R. Fleischmann3, Luis Gravano1, Rahul Jha4, Ben King4, Kevin McInerney5, Taesun Moon6, Arvind Neelakantan8, Diarmuid O’Seaghdha7, Dragomir Radev4, Clay Templeton3, Simone Teufel7 1Columbia University,(More)