Natural Language Processing with Python
This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic…
NLTK: The Natural Language Toolkit
NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware. NLTK covers symbolic and…
The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics
This is a post-print of a paper from the Sixth International Conference on Language Resources and Evaluation 2008, where six papers were presented, one of which was new to the literature.
A formal framework for linguistic annotation
A wide variety of existing annotation formats are surveyed and a common conceptual core, the annotation graph, is demonstrated, which provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.
Seven Dimensions of Portability for Language Documentation and Description
This article reviews existing software tools and digital technologies for language documentation and description, and analyzes portability problems in the seven areas of CONTENT, FORMAT, DISCOVERY, ACCESS, CITATION, PRESERVATION, and RIGHTS.
Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser
This work proposes a learning method that needs less data, based on the observation that there are underlying shared structures across languages, and exploits cues from a different source language in order to guide the learning process.
Reconsidering Language Identification for Written Language Resources
A review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.
ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation
A formal model for annotating linguistic artifacts is described, from which an application programming interface (API) to a suite of tools for manipulating these annotations is derived, and the current efforts towards implementing key pieces of this architecture are reviewed.
Cross-lingual Transfer for Unsupervised Dependency Parsing Without Parallel Data
This method learns syntactic word embeddings that generalise over the syntactic contexts of a bilingual vocabulary and incorporates them into a neural network parser, showing empirical improvements over a baseline delexicalised parser on both the CoNLL and Universal Dependency Treebank datasets.
Learning Crosslingual Word Embeddings without Bilingual Corpora
This method uses a high-coverage dictionary in an EM-style training algorithm over monolingual corpora in two languages. It achieves state-of-the-art performance on the bilingual lexicon induction task, exceeding models that use large bilingual corpora, and competitive results on the monolingual word similarity and cross-lingual document classification tasks.