The Stanford CoreNLP Natural Language Processing Toolkit
TLDR
The design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis, is described; its adoption is attributed to a simple, approachable design, straightforward interfaces, the inclusion of robust, high-quality analysis components, and the absence of heavyweight associated baggage.
Effective Self-Training for Parsing
We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible
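The self-training recipe this entry summarizes, training on labeled data, parsing unlabeled text, and retraining on the union, can be sketched generically. This is a toy illustration only: `train` and `parse` below are hypothetical stand-ins (here a trivial majority-label "model"), not the actual two-phase parser-reranker system.

```python
from collections import Counter

def train(examples):
    # Toy "model": just the most frequent label in the training data.
    labels = [label for _, label in examples]
    return Counter(labels).most_common(1)[0][0]

def parse(model, sentence):
    # Toy "parser": predict the model's single stored label.
    return model

def self_train(labeled, unlabeled):
    model = train(labeled)                               # phase 1: supervised training
    pseudo = [(s, parse(model, s)) for s in unlabeled]   # auto-label unlabeled data
    return train(labeled + pseudo)                       # retrain on labeled + pseudo-labeled

labeled = [("a", "NP"), ("b", "NP"), ("c", "VP")]
unlabeled = ["d", "e"]
print(self_train(labeled, unlabeled))  # "NP"
```

The point of the sketch is the data flow, not the model: in the real system both `train` and `parse` would be a full generative parser plus a discriminative reranker.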
Reranking and Self-Training for Parser Adaptation
TLDR
The reranking parser described in Charniak and Johnson (2005) improves the parser's performance on Brown to 85.2%, and use of the self-training techniques described in McClosky et al. (2006) raises this to 87.8% (an error reduction of 28%), again without any use of labeled Brown data.
Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing
TLDR
This paper introduces basic valence frames and lexical information into an unsupervised dependency grammar inducer and shows how this additional information can be leveraged via smoothing to produce state-of-the-art results.
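The smoothing this entry refers to can be illustrated with the simplest variant, additive (Lidstone) smoothing, which reserves probability mass for unseen events. This is a generic sketch, not the paper's actual smoothing scheme; the `alpha` value and function name are illustrative.

```python
def smoothed_prob(count, total, vocab_size, alpha=0.1):
    # Additive (Lidstone) smoothing: add pseudo-count alpha to every
    # event so that unseen events receive nonzero probability.
    return (count + alpha) / (total + alpha * vocab_size)

# An unseen dependency event still gets a small nonzero probability:
print(smoothed_prob(0, 10, 5))   # > 0
# A frequent event is slightly discounted relative to its raw estimate:
print(smoothed_prob(8, 10, 5))   # < 0.8
```

In a grammar inducer, discounting observed attachments this way lets lexicalized or valence-conditioned distributions back off gracefully instead of zeroing out rare configurations.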
The Role of Context Types and Dimensionality in Learning Word Embeddings
We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks.
Event Extraction as Dependency Parsing
TLDR
This work proposes a simple approach for the extraction of nested event structures: the tree of event-argument relations is used directly as the representation in a reranking dependency parser, yielding a framework that captures global properties of both nested and flat event structures.
Improving Statistical MT through Morphological Analysis
TLDR
This work shows that using morphological analysis to modify the Czech input can improve a Czech-English machine translation system; several different methods of incorporating morphological information are investigated, and a system that combines these methods yields the best results.
Automatic Domain Adaptation for Parsing
TLDR
The resulting system proposes linear combinations of parsing models trained on the source corpora that outperform all non-oracle baselines, including the best domain-independent parsing model.
Model Combination for Event Extraction in BioNLP 2011
TLDR
The FAUST system explores several stacking models for combination, using the UMass dual-decomposition and Stanford event-parsing approaches as base models, and finds that stacking is most effective when a small set of stacking features is used and the base models employ slightly different representations of the input data.
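Stacking of the kind the FAUST entry describes, where base-model predictions become inputs to a meta-model, can be sketched roughly as follows. This is a toy illustration under assumed labels and a hypothetical back-off rule, not the actual FAUST features or models.

```python
from collections import Counter, defaultdict

def train_stacker(dev_base_preds, dev_gold):
    # For each tuple of base-model predictions seen on held-out data,
    # remember the gold label that most often co-occurs with it.
    table = defaultdict(Counter)
    for preds, gold in zip(dev_base_preds, dev_gold):
        table[tuple(preds)][gold] += 1
    return {key: c.most_common(1)[0][0] for key, c in table.items()}

def stacked_predict(meta, base_preds):
    # Unseen prediction tuples back off to the first base model
    # (an illustrative choice, not FAUST's actual back-off).
    return meta.get(tuple(base_preds), base_preds[0])

dev_preds = [("EVENT", "EVENT"), ("EVENT", "NONE"),
             ("EVENT", "NONE"), ("NONE", "NONE")]
dev_gold  = ["EVENT", "NONE", "NONE", "NONE"]
meta = train_stacker(dev_preds, dev_gold)
print(stacked_predict(meta, ("EVENT", "NONE")))  # NONE
```

The sketch shows why base models with slightly different input representations help: the meta-model only has something to learn where the base predictions disagree.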
Any domain parsing: automatic domain adaptation for natural language parsing
TLDR
A technique is presented, Any Domain Parsing, which automatically detects useful source domains and mixes them together to produce a customized parsing model which performs almost as well as the best seen parsing models (oracle) for each target domain.
...