Semi-supervised sequence tagging with bidirectional language models
TLDR
A general semi-supervised approach is presented for adding pretrained context embeddings from bidirectional language models to NLP systems. Applied to sequence labeling tasks, it surpasses previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
Many Languages, One Parser
TLDR
This work trains one multilingual model for dependency parsing and uses it to parse sentences in several languages, enabling the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities.
ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing
TLDR
This work describes scispaCy, a new Python library and set of models for practical biomedical/scientific text processing that heavily leverages the spaCy library. It details the performance of two packages of models released in scispaCy and demonstrates their robustness on several tasks and datasets.
Massively Multilingual Word Embeddings
TLDR
New methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space are introduced, and a new evaluation method is shown to correlate better than previous ones with two downstream tasks.
DyNet: The Dynamic Neural Network Toolkit
TLDR
DyNet is a toolkit for implementing neural network models based on dynamic declaration of network structure. It has an optimized C++ backend and a lightweight graph representation, and it is designed to let users implement their models in a way that is idiomatic in their preferred programming language.
Construction of the Literature Graph in Semantic Scholar
TLDR
This paper reduces literature graph construction to familiar NLP tasks, points out research challenges that arise from differences from the standard formulations of these tasks, and reports empirical results for each task.
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications
TLDR
The first public dataset of scientific peer reviews available for research purposes (PeerRead v1) is presented, and simple models are shown to predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline.
Structural Scaffolds for Citation Intent Classification in Scientific Publications
TLDR
This work proposes structural scaffolds, a multitask model that incorporates structural information of scientific papers into citations for effective classification of citation intents. It achieves a new state of the art on an existing ACL Anthology dataset with a 13.3% absolute increase in F1 score.
Content-Based Citation Recommendation
TLDR
This work shows empirically that, although adding metadata improves performance on standard metrics, it favors self-citations, which are less useful in a citation recommendation setup. It also releases an online portal for citation recommendation based on this method.
Conditional Random Field Autoencoders for Unsupervised Structured Prediction
TLDR
Instantiations of the framework for unsupervised learning of structured predictors with overlapping, global features achieve competitive results, and training the proposed model is shown to be substantially more efficient than training a comparable feature-rich baseline.