Corpus ID: 51777494

PAL, a tool for Pre-annotation and Active Learning

Authors: Maria Skeppstedt, Carita Paradis, Andreas Kerren
Journal: J. Lang. Technol. Comput. Linguistics
Many natural language processing systems rely on machine learning models that are trained on large amounts of manually annotated text data. The lack of sufficient amounts of annotated data is, however, a common obstacle for such systems, since manual annotation of text is often expensive and time-consuming. The aim of “PAL, a tool for Pre-annotation and Active Learning” is to provide a ready-made package that can be used to simplify annotation and to reduce the amount of annotated data…
A Pipeline for Manual Annotations of Risk Factor Mentions in the COVID-19 Open Research Dataset
A set of tools that are being maintained and further developed within the Språkbanken Sam and SWE-CLARIN infrastructures can be employed to create manually labelled training data in a low-resource setting.
Learning Document-Level Label Propagation and Instance Selection by Deep Q-Network for Interactive Named Entity Annotation
This paper proposes a reinforcement learning-based approach that learns to propagate labels among the instances within a document for interactive named entity annotation, and that trains a deep Q-network to optimize an objective trading off human effort against annotation quality.
Active learning approach using a modified least confidence sampling strategy for named entity recognition
A modified least-confidence query sampling strategy for active learning in named entity recognition is proposed. It takes into account the number of uncertain words in each sentence when computing the sentence's final least-confidence score, which is then used to compare sentences.
End-to-End Active Learning for Computer Security Experts
An end-to-end active learning system, ILAB, tailored to the needs of computer security experts, is introduced; its active learning strategy and user interface are jointly designed to effectively reduce the annotation effort.
Visualising and evaluating the effects of combining active learning with word embedding features
A tool that enables the use of active learning, as well as the incorporation of word embeddings, was evaluated for its ability to decrease the training data set size required for a named entity…
Expert-in-the-Loop Supervised Learning for Computer Security Detection Systems (Supervised Learning and Detection Systems: An End-to-End Approach Involving Security Experts)
The constraints that such methods should meet to be effective in building supervised detection models are defined and three state-of-the-art methods are compared based on these criteria.
StanceVis Prime: visual analysis of sentiment and stance in social media texts
StanceVis Prime, a tool designed for the analysis of sentiment and stance in temporal text data from various social media sources, is described. It gives end users an overview of similarities between data series based on dynamic time warping analysis, as well as detailed visualizations of data series values.
Language Processing Components of the StaViCTA Project
The StaViCTA project is concerned with visualising the expression of stance in written text, and is therefore dependent on components for stance detection. These components are to (i) download and ...
Computational Methods for Text Analysis and Text Classification
This chapter presents computational methods for text analysis and text classification, including both rule-based methods and machine learning-based methods, the latter either unsupervised or supervised.
Clinical Text Mining
  • H. Dalianis
  • Computer Science
  • Springer International Publishing
  • 2018


brat: a Web-based Tool for NLP-Assisted Text Annotation
The brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by natural language processing (NLP) technology, is introduced, together with an evaluation of annotation assisted by semantic class disambiguation on a multi-category entity mention annotation task, which shows a 15% decrease in total annotation time.
Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno
This paper extends WebAnno, an open-source web-based annotation tool, and tightly integrates a generic machine learning component for automatic suggestions of span annotations. It shows that automatic annotation suggestions, combined with the split-pane UI concept, significantly reduce annotation time.
A survey on annotation tools for the biomedical literature
This survey shows that current tools can support many of the tasks in biomedical text annotation satisfactorily, but also that no tool can be considered a truly comprehensive solution.
WordFreak: An Open Tool for Linguistic Annotation
A plug-in architecture has been developed which allows components to be added to WordFreak for customized visualization, annotation specification, and automatic annotation, without re-compilation.
Influence of Pre-Annotation on POS-Tagged Corpus Development
This article details a series of carefully designed experiments aiming at evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, both from the…
Bootstrapping Named Entity Annotation by Means of Active Machine Learning: A Method for Creating Corpora
The results suggest that the recognizer produced in phases one and two is as useful for pre-tagging as a recognizer created from randomly selected documents; however, the applicability of the recognizer created during phase two as a pre-tagger in phase three is best investigated through a user study involving real annotators working on a real named entity recognition task.
An Analysis of Active Learning Strategies for Sequence Labeling Tasks
This paper surveys previously used query selection strategies for sequence models, proposes several novel algorithms to address their shortcomings, and conducts a large-scale empirical comparison.
Annotating named entities in clinical text by combining pre-annotation and active learning
To expand a corpus of clinical text annotated for named entities, a method is proposed that combines pre-tagging with a version of active learning and aims to minimise the instances in which none of the presented pre-taggings is correct.
Visual Analysis of Text Annotations for Stance Classification with ALVA
This work proposes a visual analytics approach called ALVA for text data annotation and visualization that supports the annotation process management and supplies annotators with a clean user interface for labeling utterances with several stance categories.
Multi-Criterion Active Learning in Conditional Random Fields
The empirical results demonstrate that using multi-criterion active learning to identify a small but sufficient set of text samples for training CRFs can reduce manual annotation costs, while also limiting the retraining costs that are often associated with active learning.