Rahul Jha

We present heterogeneous networks as a way to unify lexical networks with relational data. We build a unified ACL Anthology network, tying together the citation, author collaboration, and term co-occurrence networks with affiliation and venue relations. This representation proves to be convenient and allows problems such as name disambiguation, topic…
The New Yorker publishes a weekly captionless cartoon. More than 5,000 readers submit captions for it. The editors select three of them and ask the readers to pick the funniest one. We describe an experiment that compares a dozen automatic methods for selecting the funniest caption. We show that negative sentiment, human-centeredness, and lexical centrality…
We present a system for automatically identifying the native language of a writer. We experiment with a large set of features, training on a corpus of 9,900 essays written in English by speakers of 11 different languages. Our system achieved an accuracy of 43% on the test data, which improved to 63% with better feature normalization. In this paper, we…
We present a method for identifying the positive or negative semantic orientation of foreign words. Identifying the semantic orientation of words has numerous applications in text classification, analysis of product reviews, analysis of survey responses, and mining of online discussions. Identifying the semantic orientation of English words has…
New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. Identifying such concepts is a challenging prediction task that would help multiple parties, including researchers and the general public, focus their attention within the vast scientific…
We investigate the task of generating coherent survey articles for scientific topics. We introduce an extractive summarization algorithm that combines a content model with a discourse model to generate coherent and readable summaries of scientific topics using text from scientific articles relevant to the topic. Human evaluation on 15 topics in…
The Computational Linguistics (CL) Summarization Pilot Task was created to encourage a community effort to address the research problem of summarizing research articles as "faceted summaries" in the domain of computational linguistics. In this pilot stage, a hand-annotated set of citing papers was provided for ten reference papers to help in automating…
In this paper, we investigate the problem of automatic generation of scientific surveys starting from keywords provided by a user. We present a system that can take a topic query as input and generate a survey of the topic by first selecting a set of relevant documents, and then selecting sentences from those documents. We discuss the issues of robust…