Steven Bethard

Learn More
We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. This toolkit is quite widely used, both in the research NLP community and also among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design,(More)
We identify a new task in the ongoing analysis of opinions: finding propositional opinions, sentential complements which for many verbs contain the actual opinion, rather than full opinion sentences. We propose an extension of semantic parsing techniques, coupled with additional lexical and syntactic features, that can produce labels for propositional(More)
Scientists depend on literature search to find prior work that is relevant to their research ideas. We introduce a retrieval model for literature search that incorporates a wide variety of factors important to researchers, and learns the weights of each of these factors by observing citation patterns. We introduce features like topical similarity and author(More)
We present an overview of the first shared task on language identification on codeswitched data. The shared task included code-switched data from four language pairs: Modern Standard ArabicDialectal Arabic (MSA-DA), MandarinEnglish (MAN-EN), Nepali-English (NEPEN), and Spanish-English (SPA-EN). A total of seven teams participated in the task and submitted(More)
In this paper, we extend current state-of-theart research on unsupervised acquisition of scripts, that is, stereotypical and frequently observed sequences of events. We design, evaluate and compare different methods for constructing models for script event prediction: given a partial chain of events in a script, predict other events that are likely to(More)
The ClearTK-TimeML submission to TempEval 2013 competed in all English tasks: identifying events, identifying times, and identifying temporal relations. The system is a pipeline of machine-learning models, each with a small set of features from a simple morpho-syntactic annotation pipeline, and where temporal relations are only predicted for a small set of(More)
We describe a set of top-performing systems at the SemEval 2015 English Semantic Textual Similarity (STS) task. Given two English sentences, each system outputs the degree of their semantic similarity. Our unsupervised system, which is based on word alignments across the two input sentences, ranked 5th among 73 submitted system runs with a mean correlation(More)
Captchas are designed to be easy for humans but hard for machines. However, most recent research has focused only on making them hard for machines. In this paper, we present what is to the best of our knowledge the first large scale evaluation of captchas from the human perspective, with the goal of assessing how much friction captchas present to the(More)
We present a simple, easy-to-replicate monolingual aligner that demonstrates state-of-the-art performance while relying on almost no supervision and a very small number of external resources. Based on the hypothesis that words with similar meanings represent potential pairs for alignment if located in similar contexts, we propose a system that operates by(More)
We present an algorithm for computing the semantic similarity between two sentences. It adopts the hypothesis that semantic similarity is a monotonically increasing function of the degree to which (1) the two sentences contain similar semantic units, and (2) such units occur in similar semantic contexts. With a simplistic operationalization of the notion of(More)