Rahul Jha

We present heterogeneous networks as a way to unify lexical networks with relational data. We build a unified ACL Anthology network, tying together the citation, author collaboration, and term co-occurrence networks with affiliation and venue relations. This representation proves to be convenient and allows problems such as name disambiguation, topic …
We investigate the task of generating coherent survey articles for scientific topics. We introduce an extractive summarization algorithm that combines a content model with a discourse model to generate coherent and readable summaries of scientific topics using text from scientific articles relevant to the topic. Human evaluation on 15 topics in …
We present a method for identifying the positive or negative semantic orientation of foreign words. Identifying the semantic orientation of words has numerous applications in text classification, analysis of product reviews, analysis of survey responses, and mining of online discussions. Identifying the semantic orientation of English words has …
In this paper, we investigate the problem of automatically generating scientific surveys from keywords provided by a user. We present a system that takes a topic query as input and generates a survey of the topic by first selecting a set of relevant documents, and then selecting sentences from those documents. We discuss the issues of robust …
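The two-stage pipeline described in this abstract (select relevant documents for a topic query, then select sentences from those documents) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the function names `tokenize`, `overlap_score`, and `generate_survey`, the parameters `num_docs` and `num_sentences`, and the use of simple query-term overlap as the relevance score are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query_tokens, text):
    """Count how often the query terms occur in the text (assumed relevance score)."""
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in set(query_tokens))


def generate_survey(query, documents, num_docs=2, num_sentences=3):
    """Hypothetical two-stage sketch: pick documents, then pick sentences."""
    query_tokens = tokenize(query)

    # Stage 1: rank documents by query-term overlap and keep the top few
    # that share at least one term with the query.
    ranked_docs = sorted(
        documents, key=lambda d: overlap_score(query_tokens, d), reverse=True
    )
    selected_docs = [
        d for d in ranked_docs[:num_docs] if overlap_score(query_tokens, d) > 0
    ]

    # Stage 2: split the selected documents into sentences and rank the
    # sentences with the same overlap score.
    sentences = [
        s.strip()
        for d in selected_docs
        for s in re.split(r"(?<=[.!?])\s+", d)
        if s.strip()
    ]
    ranked_sentences = sorted(
        sentences, key=lambda s: overlap_score(query_tokens, s), reverse=True
    )
    return ranked_sentences[:num_sentences]
```

A real system would likely replace the overlap score with a stronger relevance or centrality measure; the sketch only shows how the two selection stages fit together.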
The Computational Linguistics (CL) Summarization Pilot Task was created to encourage a community effort to address the research problem of summarizing research articles as "faceted summaries" in the domain of computational linguistics. In this pilot stage, a hand-annotated set of citing papers was provided for ten reference papers to help in automating …
In this paper, we study the problem of automatically annotating the factoids present in collective discourse. Factoids are information units that are shared between instances of collective discourse and may have many different ways of being realized in words. Our approach divides this problem into two steps, using a graph-based approach for each step: (1) …
The New Yorker publishes a weekly captionless cartoon. More than 5,000 readers submit captions for it. The editors select three of them and ask the readers to pick the funniest one. We describe an experiment that compares a dozen automatic methods for selecting the funniest caption. We show that negative sentiment, human-centeredness, and lexical centrality …
New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties – including researchers and the general public – focus their attention within the vast scientific …