The Stanford CoreNLP Natural Language Processing Toolkit

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, David McClosky
We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. This toolkit is quite widely used, both in the research NLP community and also among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design, straightforward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage. 
Jigg: A Framework for an Easy Natural Language Processing Pipeline
Jigg is a Scala (or JVM-based) NLP annotation pipeline framework that is easy to use and extensible; system developers can integrate a downstream system into an NLP pipeline that starts from raw text simply by preparing a wrapper for it.
A Tidy Data Model for Natural Language Processing using cleanNLP
  • T. Arnold
  • Computer Science, Mathematics
  • R J.
  • 2017
The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables, exposing a number of annotation tasks for text written in English, French, German, and Spanish.
Natural Language Processing on Ambiguous Sentence Using NLP Tools: Core NLP, Apertium and PRAAT
This paper focuses on three NLP tools, Core NLP, Apertium and PRAAT, and applies them to a single ambiguous sentence.
NLPPort: A Pipeline for Portuguese NLP
Although there are tools for some of the most common natural language processing tasks in Portuguese, there is a lack of available cross-platform tools specifically targeted for Portuguese, from end to...
SupWSD: A Flexible Toolkit for Supervised Word Sense Disambiguation
The aim of SupWSD is to provide an easy-to-use tool for the research community, designed to be modular, fast and scalable for training and testing on large datasets.
Improving NLTK for Processing Portuguese
NLPyPort is described, an NLP pipeline in Python, primarily based on NLTK and focused on Portuguese, that improves over the performance of existing alternatives in Python in the tasks of tokenization, PoS tagging, lemmatization and NER.
A Concrete Chinese NLP Pipeline
A Concrete Chinese NLP pipeline is presented: an NLP stack built from a series of open source systems integrated through the CONCRETE data schema, which includes data ingest, word segmentation, part-of-speech tagging, parsing, named entity recognition, relation extraction and cross-document coreference resolution.
Italy goes to Stanford: a collection of CoreNLP modules for Italian
Tint is an easy-to-use set of fast, accurate and extendable Natural Language Processing modules for Italian based on Stanford CoreNLP; it is freely available as standalone software or as a library that can be integrated into an existing project.
Doing Natural Language Processing in A Natural Way: An NLP toolkit based on object-oriented knowledge base and multi-level grammar base
  • Yu Guo
  • Computer Science
  • ArXiv
  • 2021
An NLP toolkit based on an object-oriented knowledge base and a multi-level grammar base that focuses on semantic parsing and can automatically discover new knowledge and grammar, which are then used to update the knowledge base and grammar base.
Yet Another Suite of Multilingual NLP Tools
The experiments performed in Portuguese and English show that the current development of a multilingual suite for Natural Language Processing competes with some well-known tools for NLP.


An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines)
This paper presents Curator, an NLP management framework designed to address some common problems and inefficiencies associated with building NLP process pipelines, and Edison, an NLP data structure library in Java that provides streamlined interactions with Curator and offers a range of useful supporting functionality.
Natural Language Processing with Python
This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic...
Fast Exact Inference with a Factored Model for Natural Language Parsing
A novel generative model for natural language tree structures in which semantic and syntactic structures are scored with separate models that admits an extremely effective A* parsing algorithm, which enables efficient, exact inference.
Book Review: Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper
This book comes with “batteries included” (a reference to the phrase often used to explain the popularity of the Python programming language). It is the companion book to an impressive open-source...
ClearTK 2.0: Design Patterns for Machine Learning in UIMA
ClearTK adds machine learning functionality to the UIMA framework, providing wrappers to popular machine learning libraries, a rich feature extraction library that works across different classifiers, ...
UIMA: an architectural approach to unstructured information processing in the corporate research environment
A general introduction to UIMA is given, focusing on the design points of its analysis engine architecture, and how UIMA is helping to accelerate research and technology transfer is discussed.
SUTime: A library for recognizing and normalizing time expressions
SUTIME is a temporal tagger for recognizing and normalizing temporal expressions in English text and is a deterministic rule-based system designed for extensibility.
GATE: an Architecture for Development of Robust HLT applications
GATE is presented, a framework and graphical development environment which enables users to develop and deploy language engineering components and resources in a robust fashion and can be used to develop applications and resources in multiple languages, based on its thorough Unicode support.
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network
A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional log-linear models.
Generating Typed Dependency Parses from Phrase Structure Parses
A system is described for extracting typed dependency parses of English sentences from phrase structure parses, capturing inherent relations occurring in corpus texts that can be critical in real-world applications.