Rahul Jha

We present heterogeneous networks as a way to unify lexical networks with relational data. We build a unified ACL Anthology network, tying together the citation, author collaboration, and term co-occurrence networks with affiliation and venue relations. This representation proves to be convenient and allows problems such as name disambiguation, topic …
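A minimal sketch of one way such a heterogeneous network could be represented, using networkx with typed nodes and typed edges; the node IDs and relation names below are placeholders for illustration, not the actual ACL Anthology schema used in the paper.

```python
import networkx as nx

G = nx.MultiDiGraph()

# Typed nodes drawn from the different sub-networks (placeholder IDs)
G.add_node("paper_1", kind="paper")
G.add_node("paper_2", kind="paper")
G.add_node("author_1", kind="author")
G.add_node("term_1", kind="term")
G.add_node("venue_1", kind="venue")
G.add_node("affiliation_1", kind="affiliation")

# Typed edges tying citation, collaboration, term co-occurrence,
# venue, and affiliation relations into a single graph
G.add_edge("paper_2", "paper_1", rel="cites")
G.add_edge("author_1", "paper_2", rel="writes")
G.add_edge("author_1", "affiliation_1", rel="affiliated_with")
G.add_edge("paper_2", "term_1", rel="mentions")
G.add_edge("paper_2", "venue_1", rel="published_in")

# Example query: which papers cite paper_1?
citing = [u for u, v, d in G.edges(data=True)
          if d["rel"] == "cites" and v == "paper_1"]
```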
We present a system for automatically identifying the native language of a writer. We experiment with a large set of features, trained on a corpus of 9,900 essays written in English by speakers of 11 different languages. Our system achieved an accuracy of 43% on the test data, which improved to 63% with better feature normalization. In this paper, we …
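A rough sketch of a native-language-identification classifier along these lines, assuming scikit-learn; character n-gram features with a linear model and L2 normalization are illustrative choices, not the paper's exact feature set or normalization scheme.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def build_nli_classifier():
    """Text-classification pipeline for guessing a writer's native language."""
    return make_pipeline(
        # Character n-grams pick up L1-specific spelling and morphology cues
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), norm="l2"),
        LogisticRegression(max_iter=1000),
    )

# essays: list of English essays; labels: the writers' native languages
# clf = build_nli_classifier().fit(essays, labels)
# predictions = clf.predict(test_essays)
```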
We present a method for identifying the positive or negative semantic orientation of foreign words. Identifying the semantic orientation of words has numerous applications in the areas of text classification, analysis of product reviews, analysis of responses to surveys, and mining online discussions. Identifying the semantic orientation of English words has …
We investigate the task of generating coherent survey articles for scientific topics. We introduce an extractive summarization algorithm that combines a content model with a discourse model to generate coherent and readable summaries of scientific topics using text from scientific articles relevant to the topic. Human evaluation on 15 topics in …
In this paper, we study the problem of automatically annotating the factoids present in collective discourse. Factoids are information units that are shared between instances of collective discourse and may have many different ways of being realized in words. Our approach divides this problem into two steps, using a graph-based approach for each step: (1) …
In this paper, we investigate the problem of automatic generation of scientific surveys starting from keywords provided by a user. We present a system that can take a topic query as input and generate a survey of the topic by first selecting a set of relevant documents, and then selecting sentences from those documents. We discuss the issues of robust …
The New Yorker publishes a weekly captionless cartoon. More than 5,000 readers submit captions for it. The editors select three of them and ask the readers to pick the funniest one. We describe an experiment that compares a dozen automatic methods for selecting the funniest caption. We show that negative sentiment, human-centeredness, and lexical centrality …
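A minimal sketch of lexical centrality scoring for candidate captions, assuming scikit-learn and networkx; ranking captions by PageRank over a cosine-similarity graph is a LexRank-style illustration, and the TF-IDF setup and threshold are assumptions rather than the paper's exact configuration.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexical_centrality(captions, threshold=0.1):
    """Rank captions by PageRank over their cosine-similarity graph."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(captions)
    sim = cosine_similarity(tfidf)

    graph = nx.Graph()
    graph.add_nodes_from(range(len(captions)))
    for i in range(len(captions)):
        for j in range(i + 1, len(captions)):
            if sim[i, j] >= threshold:
                graph.add_edge(i, j, weight=sim[i, j])

    scores = nx.pagerank(graph, weight="weight")
    # Caption indices, most central first
    return sorted(scores, key=scores.get, reverse=True)
```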
New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties, including researchers and the general public, focus their attention within the vast scientific …
We present a new factoid-annotated dataset for evaluating content models for scientific survey article generation, containing 3,425 sentences from 7 topics in natural language processing. We also introduce a novel HITS-based content model for automated survey article generation called HITSUM that exploits the lexical network structure between sentences from …
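A minimal sketch of a HITS-style content model over a sentence network, assuming networkx and scikit-learn; the way edges are drawn here (cosine similarity above a cutoff from one set of sentences to another) is an illustrative assumption, not necessarily HITSUM's exact graph construction.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def hits_sentence_scores(citing_sents, source_sents, cutoff=0.15):
    """Return (hub, authority) scores; high-authority source sentences
    are candidates for inclusion in the survey."""
    tfidf = TfidfVectorizer(stop_words="english")
    vecs = tfidf.fit_transform(citing_sents + source_sents)
    n_cite = len(citing_sents)
    sim = cosine_similarity(vecs[:n_cite], vecs[n_cite:])

    # Directed edges from citing sentences to lexically similar source sentences
    graph = nx.DiGraph()
    for i in range(n_cite):
        for j in range(len(source_sents)):
            if sim[i, j] >= cutoff:
                graph.add_edge(("cite", i), ("src", j), weight=sim[i, j])

    hubs, authorities = nx.hits(graph, max_iter=500, normalized=True)
    return hubs, authorities
```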