We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. The toolkit is widely used, both in the research NLP community and among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design, …
Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation captures neither the full syntactic nor the full semantic richness of linguistic phrases, and attempts to improve on it by lexicalizing phrases or splitting categories only partly address the problem, at the cost of huge feature spaces …
This paper describes the design and implementation of the slot filling system prepared by Stanford's natural language processing group for the 2010 Knowledge Base Population (KBP) track at the Text Analysis Conference (TAC). Our system relies on a simple distant supervision approach using mainly resources furnished by the track organizers: we used slot …
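In its general form, distant supervision produces training data automatically by aligning a knowledge base with unlabeled text: a sentence that mentions both an entity and a known slot value is taken as an example of that slot. The sketch below illustrates only this general idea, not the Stanford system's implementation. It assumes exact substring matching for mentions and labels entity/filler pairs absent from the knowledge base as "NONE"; the names `Example` and `label_by_distant_supervision` and the slot label are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    sentence: str
    entity: str
    filler: str
    label: str  # slot name from the KB, or "NONE" for a negative example


def label_by_distant_supervision(kb, sentences, candidate_fillers):
    """Turn raw sentences into labeled slot-filling examples.

    kb maps (entity, filler) pairs to a slot name. Any sentence that mentions
    both an entity and a candidate filler becomes a training example; the
    label comes from the KB, and pairs missing from the KB are labeled "NONE".
    Mentions are found by plain substring matching (a simplification).
    """
    # Preserve first-seen order so the output is deterministic.
    entities = list(dict.fromkeys(entity for entity, _ in kb))
    examples = []
    for sentence in sentences:
        for entity in entities:
            if entity not in sentence:
                continue
            for filler in candidate_fillers:
                if filler == entity or filler not in sentence:
                    continue
                label = kb.get((entity, filler), "NONE")
                examples.append(Example(sentence, entity, filler, label))
    return examples
```

For example, with `kb = {("Alice Smith", "Paris"): "per:city_of_birth"}`, the sentence "Alice Smith was born in Paris." yields a positive example, while "Alice Smith visited London." (with "London" as a candidate) yields a "NONE" example. The labels are noisy by construction, since a sentence that mentions both arguments need not express the relation.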
Multiword expressions (MWEs), a known nuisance for both linguistics and NLP, blur the lines between syntax and semantics. Previous work on MWE identification has relied primarily on surface statistics, which perform poorly for longer MWEs and cannot model discontinuous expressions. To address these problems, we show that even the simplest parsing models can …
We present a gold standard annotation of syntactic dependencies in the English Web Treebank corpus using the Stanford Dependencies standard. This resource addresses the lack of a gold standard dependency treebank for English, as well as the limited availability of gold standard syntactic annotations for informal genres of English text. We also present …
This paper describes the design and implementation of the slot filling system prepared by Stanford's natural language processing group for the 2011 Knowledge Base Population (KBP) track at the Text Analysis Conference (TAC). Our system relies on a simple distant supervision approach using mainly resources furnished by the track's organizers: we used slot …
We present an application-level, asynchronous I/O caching and prefetching system that hides the access latency experienced by HPC applications. Our user-controllable caching and prefetching system maintains a file I/O cache in the application's user space, analyzes I/O access patterns, prefetches requests, and performs write-back of dirty data to …
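The abstract names the main mechanisms (a user-space cache, access-pattern analysis, prefetching, and write-back of dirty data) without detailing them. The following is a minimal, synchronous Python sketch of how those pieces commonly fit together. It is not the authors' system: it omits the asynchronous machinery, uses a `bytearray` in place of a file, and treats "read the block after the previous one" as the only detected pattern. The class `BlockCache` and its parameters are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical user-space block cache: LRU eviction, sequential
    prefetching, and write-back of dirty blocks. Synchronous for clarity."""

    def __init__(self, backing, block_size=4096, capacity=64, prefetch_depth=4):
        self.backing = backing            # bytearray standing in for a file
        self.block_size = block_size
        self.capacity = capacity          # max number of cached blocks (>= 1)
        self.prefetch_depth = prefetch_depth
        self.blocks = OrderedDict()       # block index -> bytearray, oldest first
        self.dirty = set()                # indices modified but not yet written back
        self.last_block = None            # last block read, for pattern detection
        self.backing_reads = 0            # reads that went to the backing store

    def _num_blocks(self):
        return (len(self.backing) + self.block_size - 1) // self.block_size

    def _load(self, idx):
        """Return the cached block, fetching it from the backing store on a miss."""
        if idx in self.blocks:
            self.blocks.move_to_end(idx)  # mark as most recently used
            return self.blocks[idx]
        start = idx * self.block_size
        data = bytearray(self.backing[start:start + self.block_size])
        self.backing_reads += 1
        self.blocks[idx] = data
        self._evict()
        return data

    def _evict(self):
        # Drop least recently used blocks, writing dirty ones back first.
        while len(self.blocks) > self.capacity:
            idx, data = self.blocks.popitem(last=False)
            if idx in self.dirty:
                self._write_back(idx, data)

    def _write_back(self, idx, data):
        start = idx * self.block_size
        self.backing[start:start + len(data)] = data
        self.dirty.discard(idx)

    def read_block(self, idx):
        data = self._load(idx)
        # Minimal access-pattern analysis: reading the block right after the
        # previous one is treated as sequential, so prefetch the next blocks.
        if self.last_block is not None and idx == self.last_block + 1:
            end = min(idx + 1 + self.prefetch_depth, self._num_blocks())
            for nxt in range(idx + 1, end):
                self._load(nxt)
        self.last_block = idx
        return bytes(data)

    def write_block(self, idx, data):
        # Write-back policy: update only the cached copy and mark it dirty.
        block = self._load(idx)
        if len(data) > len(block):
            raise ValueError("data does not fit in the block")
        block[:len(data)] = data
        self.dirty.add(idx)

    def flush(self):
        # Push every dirty block to the backing store.
        for idx in sorted(self.dirty):
            self._write_back(idx, self.blocks[idx])
```

With a 16-byte backing store and 4-byte blocks, reading block 0 and then block 1 triggers a prefetch of blocks 2 and 3, so a subsequent read of block 2 is served from the cache. A write to block 0 reaches the backing store only on `flush()` or eviction, which is the write-back behavior the abstract describes.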
Accurately surveying shark populations is critical to monitoring precipitous ongoing declines in shark abundance and interpreting the effects that these reductions are having on ecosystems. To evaluate the effectiveness of existing survey tools, we used field trials and computer simulations to critically examine the operation of four common methods for …
In this paper, we present an application-level, aggressive I/O caching and prefetching system to hide the I/O access latency experienced by out-of-core applications. Without application-level prefetching and caching, users of I/O-intensive applications need to rewrite them with asynchronous I/O calls or restructure their code with MPI-IO calls to …
We describe the Stanford University NLP Group submission to the 2013 Workshop on Statistical Machine Translation Shared Task. We demonstrate the effectiveness of a new adaptive, online tuning algorithm that scales to large feature and tuning sets. For both English-French and English-German, the algorithm produces feature-rich models that improve over a …