Corpus ID: 21699175

CogCompNLP: Your Swiss Army Knife for NLP

@inproceedings{Khashabi2018CogCompNLPYS,
  title={CogCompNLP: Your Swiss Army Knife for NLP},
  author={Daniel Khashabi and Mark Sammons and Ben Zhou and Tom Redman and Christos Christodoulopoulos and Vivek Srikumar and Nicholas Rizzolo and Lev-Arie Ratinov and Guanheng Luo and Quang Xuan Do and Chen-Tse Tsai and Subhro Roy and Stephen Mayhew and Zhili Feng and John Wieting and Xiaodong Yu and Yangqiu Song and Shashank Gupta and Shyam Upadhyay and N. Arivazhagan and Qiang Ning and Shaoshi Ling and Dan Roth},
  booktitle={LREC},
  year={2018}
}
Implementing a Natural Language Processing (NLP) system requires considerable engineering effort: creating data-structures to represent language constructs; reading corpora annotations into these data-structures; applying off-the-shelf NLP tools to augment the text representation; extracting features and training machine learning components; conducting experiments and computing performance statistics; and creating the end-user application that integrates the implemented components. While there… 
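To make the idea of a single shared text representation concrete, here is a minimal, hypothetical sketch in Java (the language of the library) of a document container holding tokens plus layered annotation "views". The class and member names (DocAnnotation, SpanView, Constituent) are illustrative stand-ins, not the CogCompNLP API; a toy whitespace tokenizer and hand-filled tags stand in for the off-the-shelf tools the abstract mentions.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DocAnnotationDemo {

    // A labeled span over token indices, e.g. a POS tag or an entity mention.
    static final class Constituent {
        final int start, end;   // token span [start, end)
        final String label;
        Constituent(int start, int end, String label) {
            this.start = start; this.end = end; this.label = label;
        }
        @Override public String toString() { return label + "[" + start + "," + end + ")"; }
    }

    // One annotation layer (a "view"): a named list of labeled spans.
    static final class SpanView {
        final String name;
        final List<Constituent> constituents = new ArrayList<>();
        SpanView(String name) { this.name = name; }
        void add(int start, int end, String label) {
            constituents.add(new Constituent(start, end, label));
        }
    }

    // The shared data structure: raw text, its tokens, and any number of named views.
    static final class DocAnnotation {
        final String text;
        final String[] tokens;
        final Map<String, SpanView> views = new HashMap<>();
        DocAnnotation(String text) {
            this.text = text;
            this.tokens = text.split("\\s+");   // toy whitespace tokenizer, for the sketch only
        }
        SpanView createView(String name) {
            return views.computeIfAbsent(name, SpanView::new);
        }
    }

    public static void main(String[] args) {
        DocAnnotation doc = new DocAnnotation("John visited Chicago yesterday");

        // In practice an off-the-shelf tagger would populate views like these;
        // here they are filled by hand to keep the sketch self-contained.
        SpanView pos = doc.createView("POS");
        String[] tags = {"NNP", "VBD", "NNP", "NN"};
        for (int i = 0; i < tags.length; i++) pos.add(i, i + 1, tags[i]);

        SpanView ner = doc.createView("NER");
        ner.add(0, 1, "PER");
        ner.add(2, 3, "LOC");

        // Downstream components (feature extractors, experiment code) read the same structure.
        for (SpanView view : doc.views.values()) {
            System.out.println(view.name + ": " + view.constituents);
        }
    }
}

The library's own representation is richer (character offsets, sentence boundaries, typed views populated by its annotators), but the idea the sketch illustrates is the same: corpus readers, off-the-shelf tools, and feature extractors all read from and write to one shared container.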

Citations

Reasoning-Driven Question-Answering for Natural Language Understanding
TLDR
This thesis proposes a formulation for abductive reasoning in natural language and shows its effectiveness, especially in domains with limited training data, and presents the first formal framework for multi-step reasoning algorithms, in the presence of a few important properties of language use.
CogCompTime: A Tool for Understanding Time in Natural Language
TLDR
This paper introduces CogCompTime, a system that provides the two functionalities needed for understanding time in text (timex extraction/normalization and temporal relation extraction), incorporates the most recent progress, achieves state-of-the-art performance, and is publicly available at http://cogcomp.org/page/publication_view/844.
TALEN: Tool for Annotation of Low-resource ENtities
TLDR
A small user study is conducted to compare against a popular annotation tool, showing that TALEN achieves higher precision and recall against ground-truth annotations, and that users strongly prefer it over the alternative.
Zero-Shot Open Entity Typing as Type-Compatible Grounding
TLDR
A zero-shot entity typing approach is proposed that requires no annotated data and can flexibly identify newly defined types; it is shown to be competitive with state-of-the-art supervised NER systems and to outperform them on out-of-training datasets.
Named Entity Recognition and Relation Extraction
TLDR
This study presents an overview of approaches for extracting key insights from textual data in a structured way, covering early approaches as well as more recent developments based on machine learning models.
Cross-lingual Entity Alignment for Knowledge Graphs with Incidental Supervision from Free Text
TLDR
A new model, JEANS, is proposed, which jointly represents multilingual KGs and text corpora in a shared embedding scheme, and seeks to improve entity alignment with incidental supervision signals from text.
On the Strength of Character Language Models for Multilingual Named Entity Recognition
TLDR
It is demonstrated that CLMs provide a simple and powerful model for capturing the inherent differences between name and non-name tokens in text, and that by adding very simple CLM-based features the authors can significantly improve the performance of an off-the-shelf NER system for multiple languages.
Named Entity Recognition with Partially Annotated Training Data
TLDR
This work introduces a constraint-driven iterative algorithm that learns to detect false negatives in the noisy set and down-weight them, resulting in a weighted NER model, and evaluates the algorithm with weighted variants of neural and non-neural NER models on data in 8 languages from several language and script families, showing strong ability to learn from partial data.
Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences
TLDR
The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that requires reasoning skills, and finds human solvers to achieve an F1-score of 88.1%.

References

SHOWING 1-10 OF 47 REFERENCES
EDISON: Feature Extraction for NLP, Simplified
TLDR
This paper presents EDISON, a Java library of feature generation functions used in a suite of state-of-the-art NLP tools and built on a set of generic NLP data structures, and shows that it can significantly reduce the time developers spend on feature extraction design for NLP systems.
Learning Based Java for Rapid Development of NLP Systems
TLDR
This paper demonstrates that there exists a theoretical model that describes most NLP approaches adeptly and introduces the concept of data driven compilation, a translation process in which the efficiency of the generated code benefits from the data given as input to the learning algorithms.
The Manually Annotated Sub-Corpus: A Community Resource for and by the People
TLDR
The Manually Annotated Sub-Corpus (MASC) project provides data and annotations to serve as the base for a community-wide annotation effort of a subset of the American National Corpus, the first large-scale, open, community-based effort to create much-needed language resources for NLP.
Exploiting the Wikipedia structure in local and global classification of taxonomic relations
  • Q. Do, D. Roth
  • Natural Language Engineering, 2012
TLDR
An algorithmic approach is described that, given two terms, determines the taxonomic relation between them using a machine learning-based method over existing resources, and that significantly outperforms other systems built upon existing well-known knowledge sources.
CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
TLDR
The OntoNotes annotation (coreference and other layers) and the parameters of the shared task, including the format, pre-processing information, and evaluation criteria, are described, and the results achieved by the participating systems are presented and discussed.
“Ask Not What Textual Entailment Can Do for You...”
TLDR
It is argued that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance, and that more detailed annotation and evaluation are needed to promote research on smaller, related NLP tasks.
The Stanford CoreNLP Natural Language Processing Toolkit
TLDR
The design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis, is described; its wide adoption is attributed to a simple, approachable design, straightforward interfaces, the inclusion of robust and good-quality analysis components, and the lack of a large amount of associated baggage.
The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation
The objective of the ACE program is to develop technology to automatically infer from human language data the entities being mentioned, the relations among these entities that are directly expressed, and the events in which these entities participate.
Illinois Cross-Lingual Wikifier: Grounding Entities in Many Languages to the English Wikipedia
TLDR
A cross-lingual wikification system for all languages in Wikipedia that identifies names of people, locations, and organizations, and grounds these names to the corresponding English Wikipedia entries using a cross-lingual mention grounding model.
The Proposition Bank: An Annotated Corpus of Semantic Roles
TLDR
An automatic system for semantic role tagging trained on the corpus is described and the effect on its performance of various types of information is discussed, including a comparison of full syntactic parsing with a flat representation and the contribution of the empty trace categories of the treebank.