A broad-coverage collection of portable NLP components for building shareable analysis pipelines

@inproceedings{EckartdeCastilho2014ABC,
  title={A broad-coverage collection of portable NLP components for building shareable analysis pipelines},
  author={Richard Eckart de Castilho and Iryna Gurevych},
  booktitle={OIAF4HLT@COLING},
  year={2014}
}
Due to the diversity of natural language processing (NLP) tools and resources, combining them into processing pipelines is a significant challenge, and sharing these pipelines with others remains a problem. We present DKPro Core, a broad-coverage component collection that integrates a wide range of third-party NLP tools and makes them interoperable. In contrast to other recent endeavors that rely heavily on web services, our collection consists only of portable components distributed via a repository…
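To make the idea of portable components concrete, the following is a minimal sketch of how DKPro Core components are typically assembled with uimaFIT. It is an illustration only: the component and parameter names follow DKPro Core 1.x and may differ across versions, and the input location is invented.

    import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;
    import static org.apache.uima.fit.factory.CollectionReaderFactory.createReaderDescription;
    import static org.apache.uima.fit.pipeline.SimplePipeline.runPipeline;

    import de.tudarmstadt.ukp.dkpro.core.io.text.TextReader;
    import de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpPosTagger;
    import de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpSegmenter;

    public class DkproPipelineSketch {
        public static void main(String[] args) throws Exception {
            // Read plain-text files, split them into sentences and tokens,
            // then assign part-of-speech tags. Each component is an ordinary
            // library dependency, so the pipeline itself is a shareable program.
            runPipeline(
                createReaderDescription(TextReader.class,
                    TextReader.PARAM_SOURCE_LOCATION, "input/*.txt", // invented path
                    TextReader.PARAM_LANGUAGE, "en"),
                createEngineDescription(OpenNlpSegmenter.class),
                createEngineDescription(OpenNlpPosTagger.class));
        }
    }

Because the components are resolved from a repository rather than called as web services, such a pipeline can run offline and be shared as a self-contained project.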
Citations

A Data-Centric Framework for Composable NLP Workflows
TLDR
A unified open-source framework that supports the fast development of sophisticated NLP workflows in a composable manner, introducing a uniform data representation to encode the heterogeneous results of a wide range of NLP tasks.
Towards cross-platform interoperability for machine-assisted annotation
In this paper we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design…
Towards cross-platform interoperability for machine-assisted text annotation
In this paper, we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design…
A Lightweight API-Based Approach for Building Flexible Clinical NLP Systems
TLDR
This research presents a lightweight architecture designed to be composable, extensible, and configurable; it treats NLP as an external component that can be accessed independently and orchestrated into a pipeline via web APIs.
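In contrast to the portable-component approach above, the API-based style amounts to plain HTTP calls. The sketch below illustrates the idea only: the endpoint URL, route, and JSON payload are invented and are not the paper's actual interface.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class NlpApiSketch {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();

            // Hypothetical POS-tagging endpoint; request and response formats
            // are assumptions made for illustration.
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://nlp.example.org/v1/pos"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                    "{\"text\": \"Chest pain radiating to the left arm.\"}"))
                .build();

            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // annotations returned as JSON
        }
    }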
Evaluating and Integrating Databases in the Area of NLP
Since computational power is rapidly increasing, analyzing big data is becoming more popular. This is exemplified by word embeddings producing huge index files of interrelated items. Another example…
Flexible NLP Pipelines for Digital Humanities Research
TLDR
This abstract presents work in progress on NLP Pipeline (nlppln), an open-source tool that improves access to NLP software by making it easier to combine NLP functionality from different software packages, based on the Common Workflow Language (CWL), a standard for describing data analysis workflows and tools.
Interface for Managing NLP-related Text Annotations
NLP and automatic text analysis necessarily involve the annotation of natural language texts. The Apache Unstructured Information Management Architecture (UIMA) framework is used in several projects…
Cost-Efficient Quality Assurance of Natural Language Processing Tools through Continuous Monitoring with Continuous Integration
TLDR
A monitoring system based on principles of Continuous Integration that guides IE or QA application developers in building high-quality NLP pipelines in a cost-efficient way; it is built on many common tools used in software engineering projects.
A Lightweight Modeling Middleware for Corpus Processing
TLDR
This framework for modeling arbitrary multi-modal corpus resources in a unified form for processing tools serves as a middleware system and combines the expressiveness of general graph-based models with a rich metadata schema to preserve linguistic specificity.
Docforia: A Multilayer Document Model
TLDR
Docforia is a multilayer document model and application programming interface (API) for storing formatting, lexical, syntactic, and semantic annotations on Wikipedia and other kinds of text and for visualizing them; it is compatible with cluster computing frameworks such as Hadoop and Spark.

References

Showing 1-10 of 37 references
Development and Analysis of NLP Pipelines in Argo
TLDR
Argo, a Web-based workbench for the development and processing of UIMA-based NLP pipelines/workflows, is demonstrated; it allows users to seamlessly connect their tools to workflows running in Argo and to take advantage of both the available library of components and the analytical tools.
U-Compare: A modular NLP workflow construction and evaluation system
TLDR
This work has collected a large library of interoperable resources, developed several workflow creation utilities, added a customizable comparison and evaluation system, and built visualization utilities that are modularly designed to accommodate various use cases and potential reuse scenarios.
UIMA: an architectural approach to unstructured information processing in the corporate research environment
TLDR
A general introduction to UIMA is given, focusing on the design points of its analysis engine architecture, and how UIMA is helping to accelerate research and technology transfer is discussed.
A Flexible Framework for Integrating Annotations from Different Tools and Tag Sets
TLDR
OLiA, an ontology of linguistic annotations, is introduced; it mediates between alternative tag sets that cover the same class of linguistic phenomena and is tied to a machine learning component for semiautomatic annotation.
Virtual Language Observatory: The Portal to the Language Resources and Technology Universe
TLDR
The Virtual Language Observatory portal was initiated to provide a low-barrier, easy-to-follow entry point to language resources and tools; substantial harmonization and curation efforts are required to provide researchers with metadata-based guidance.
Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium
TLDR
Recent work to update CWB for the new century includes support for multiple character sets, most especially Unicode (in the form of UTF-8), allowing all the world’s writing systems to be utilised within a CWB-indexed corpus, and support for powerful Perl-style regular expressions in CQP queries.
The Stanford CoreNLP Natural Language Processing Toolkit
TLDR
The design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis, is described; its wide adoption is suggested to follow from a simple, approachable design, straightforward interfaces, the inclusion of robust and good-quality analysis components, and the fact that it does not require a large amount of associated baggage.
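For comparison with the wrapped components discussed above, direct use of CoreNLP's own pipeline API looks roughly like the sketch below. It follows the toolkit's long-documented Annotation API; the example text is invented, and annotator availability depends on the models installed.

    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class CoreNlpSketch {
        public static void main(String[] args) {
            // Configure the annotator chain: tokenizer, sentence splitter, POS tagger.
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            // Annotate a small example document.
            Annotation doc = new Annotation("DKPro Core wraps many third-party tools.");
            pipeline.annotate(doc);

            // Print each token together with its part-of-speech tag.
            for (CoreLabel token : doc.get(CoreAnnotations.TokensAnnotation.class)) {
                System.out.println(token.word() + "\t"
                    + token.get(CoreAnnotations.PartOfSpeechAnnotation.class));
            }
        }
    }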
Promoting Interoperability of Resources in META-SHARE
TLDR
U-Compare facilitates the rapid construction and evaluation of NLP applications that make use of interoperable components, and can help to speed up the development of a new generation of European language technology applications.
A common type system for clinical natural language processing
TLDR
A type system for clinical NLP whose end target is deep semantics based on Clinical Element Models (CEMs) is described; it thus interoperates with structured data and accommodates diverse NLP approaches.
WebLicht: Web-based LRT Services in a Distributed eScience Infrastructure
TLDR
WebLicht is an eScience environment for linguistic analysis that makes linguistic tools and resources available network-wide; several kinds of linguistic tools are available, covering the basic functionality of automatic and incremental creation of annotated text corpora.