CorA: A web-based annotation tool for historical and other non-standard language data

@inproceedings{Bollmann2014CorAAW,
  title={CorA: A web-based annotation tool for historical and other non-standard language data},
  author={Marcel Bollmann and Florian Petran and Stefanie Dipper and Julia Krasselt},
  booktitle={LaTeCH@EACL},
  year={2014}
}
We present CorA, a web-based annotation tool for manual annotation of historical and other non-standard language data. It allows for editing the primary data and modifying token boundaries during the annotation process. Further, it supports immediate retraining of taggers on newly annotated data. 
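The "immediate retraining" workflow described in the abstract can be pictured as a simple annotate-then-retrain loop. The following is a minimal sketch in Python, not CorA's actual implementation (CorA delegates tagging to external tools); the toy `UnigramTagger` and its `train`/`tag` methods are hypothetical stand-ins used only to show the loop:

```python
# Toy illustration of "immediate retraining" on newly annotated data.
# CorA itself delegates tagging to external tools; the UnigramTagger
# below is a hypothetical stand-in used only to show the loop.

class UnigramTagger:
    """Remembers the most frequent tag observed for each token."""

    def __init__(self):
        self.counts = {}  # token -> {tag: count}

    def train(self, annotated_tokens):
        for token, tag in annotated_tokens:
            tag_counts = self.counts.setdefault(token, {})
            tag_counts[tag] = tag_counts.get(tag, 0) + 1

    def tag(self, tokens):
        return [
            (t, max(self.counts[t], key=self.counts[t].get))
            if t in self.counts else (t, "UNKNOWN")
            for t in tokens
        ]

tagger = UnigramTagger()
tagger.train([("in", "APPR"), ("dem", "ART")])   # existing annotations

# An annotator corrects a new batch; it is folded in right away ...
tagger.train([("huse", "NN")])

# ... so the very next document is pre-annotated with the new knowledge.
print(tagger.tag(["in", "dem", "huse"]))
# [('in', 'APPR'), ('dem', 'ART'), ('huse', 'NN')]
```

The point is the interleaving: each batch of manually corrected tokens immediately becomes training data, so pre-annotation quality improves as the corpus grows.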

Citations

SLATE: A Super-Lightweight Annotation Tool for Experts
TLDR
SLATE is a new annotation tool that is designed to fill the niche of a lightweight interface for users with a terminal-based workflow, and has already been used to annotate two corpora.
Annotation Challenges for Reconstructing the Structural Elaboration of Middle Low German
TLDR
The authors plan to extend CorA - a web-based annotation tool for historical and other non-standard language data - to capture elaboration phenomena and annotator uncertainty, and to interactively learn morphological as well as syntactic annotations.
Web-based Annotation Tool for Inflectional Language Resources
TLDR
Wasim is a web-based tool for semi-automatic morphosyntactic annotation of inflectional language resources that aims to speed up annotation by relying entirely on a keyboard interface, with no mouse interaction required.
Overview of Annotation Creation: Processes and Tools
TLDR
This chapter outlines the process of creating end-to-end linguistic annotations, identifies specific tasks that researchers often perform, and focuses on abstract capabilities and problems, since new tools appear continuously while old tools disappear into disuse or disrepair.
Annotation: More Than Just a Scheme
Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines …
Analyzing Middle High German Syntax with RDF and SPARQL
TLDR
The paper presents technological foundations for an empirical study of Middle High German syntax and provides a rule-based shallow parser and an enrichment pipeline with the purpose of quantitative evaluation of a qualitative hypothesis.
Enabling Annotation of Historical Corpora in an Asynchronous Collaborative Environment
TLDR
Current research in Corpus Linguistics and related disciplines within the multi-disciplinary field of Digital Humanities involves computer-aided manual processing of large text corpora, which requires synchronizing interdependent annotations by different researchers.
Analyzing Constructional Change: Linguistic Annotation and Sources of Uncertainty
TLDR
This paper presents the various sources of uncertainty the authors encounter in the investigation of language elaboration processes in Middle Low German and develops an interface that captures all annotators' doubts.
PALMYRA 2.0: A Configurable Multilingual Platform Independent Tool for Morphology and Syntax Annotation
TLDR
PALMYRA 2.0 is designed to be highly configurable to any dependency parsing representation, and to enable the annotation of a multitude of linguistic features in a graphical dependency-tree visualization and editing software.

References

SHOWING 1-10 OF 12 REFERENCES
ANNIS: A Search Tool for Multi-Layer Annotated Corpora
TLDR
The different features of the architecture as well as actual use cases for corpus linguistic research on such diverse areas as information structure, learner language and discourse level phenomena are presented.
Manual and semi-automatic normalization of historical spelling - case studies from Early New High German
TLDR
Norma, a semi-automatic normalization tool, is presented; it integrates different modules (lexicon lookup, rewrite rules) for normalizing words in an interactive way and dynamically updates the set of rule entries given new input.
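The module chaining summarized above can be sketched as follows; this is a minimal illustration assuming a toy lexicon and rule set (Norma's real rewrite rules are learned and context-sensitive, unlike these naive replacements):

```python
# Sketch of Norma-style chained normalization: lexicon lookup first,
# character rewrite rules as fallback. Both resources are invented
# toy examples, not Norma's actual data.

LEXICON = {"vnd": "und", "jn": "in"}   # memorized historical -> modern pairs
RULES = [("v", "u"), ("ſ", "s")]       # naive character replacements

def normalize(word: str) -> str:
    if word in LEXICON:                # module 1: exact lexicon lookup
        return LEXICON[word]
    for old, new in RULES:             # module 2: rewrite rules
        word = word.replace(old, new)
    return word

print(normalize("vnd"))  # -> "und" (lexicon hit)
print(normalize("ſo"))   # -> "so"  (rewrite rule)

# "Dynamic updating": once an annotator confirms a new pair,
# it is added so later occurrences are normalized automatically.
LEXICON["huse"] = "hause"
```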
(Semi-)Automatic Normalization of Historical Texts using Distance Measures and the Norma tool
TLDR
This paper compares several approaches to normalization with a focus on methods based on string distance measures and evaluates them on two different types of historical texts, showing that a combination of normalization methods produces the best results.
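A distance-based normalizer of the kind evaluated here can be sketched in a few lines: choose the modern lexicon entry with the smallest edit distance to the historical spelling. The lexicon below is a made-up toy example:

```python
# Levenshtein-based candidate selection for spelling normalization,
# as a minimal illustration of the distance-measure approach.

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if equal)
            ))
        prev = cur
    return prev[-1]

def normalize(word: str, lexicon: list[str]) -> str:
    return min(lexicon, key=lambda cand: levenshtein(word, cand))

lexicon = ["und", "haus", "jahr"]            # hypothetical modern word list
print(normalize("vnd", lexicon))             # -> "und"
print(normalize("hauss", lexicon))           # -> "haus"
```

Given the paper's finding that combinations of methods produce the best results, such a distance module would plausibly be one component alongside, e.g., a lexicon lookup, rather than used alone.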
Tagging the Bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora
TLDR
Evaluating the accuracy of existing POS taggers, trained on modern English, when they are applied to Early Modern English (EModE) datasets highlights the extent to which the handling of orthographic variants is sufficient for the tagging accuracy on EModE data to approximate the levels attained on modern-day texts.
Evaluating an ‘off-the-shelf’ POS-tagger on Early Modern German text
TLDR
This study assesses the effects of spelling variation on the performance of the tagger, and investigates to what extent tagger performance can be improved by using 'normalised' input, where spelling variants in the corpus are standardised to a modern form.
Is Part-of-Speech Tagging a Solved Task? An Evaluation of POS Taggers for the German Web as Corpus
TLDR
It is found that HMM taggers are more robust and much faster than advanced machine-learning approaches such as MaxEnt; promising directions for future research include unsupervised learning of a tagger lexicon from large unannotated corpora and the development of adaptive tagging models.
#hardtoparse: POS Tagging and Parsing the Twitterverse
TLDR
Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, itself self-trained on Twitter material, results in a significant improvement, which is analysed by examining in detail the effect of the retraining on individual dependency types.
Competing Target Hypotheses in the Falko Corpus: A Flexible Multi-Layer Corpus Architecture
TLDR
Using the German learner corpus Falko as an example, this work argues for a flexible multi-layer standoff corpus architecture where competing target hypotheses can be coded simultaneously.
Estimation of Conditional Probabilities With Decision Trees and an Application to Fine-Grained POS Tagging
TLDR
An HMM part-of-speech tagging method that is particularly suited for POS tagsets with a large number of fine-grained tags: it is based on splitting the POS tags into attribute vectors and decomposing the contextual POS probabilities of the HMM into a product of attribute probabilities.
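The decomposition summarized above can be written schematically as follows; the notation is mine, not the paper's exact formulation. Each fine-grained tag t_i is split into an attribute vector (a_1, …, a_K), e.g. main POS, case, number, gender, and the contextual tag probability of the HMM factorizes into per-attribute probabilities, each estimated with a decision tree:

```latex
P(t_i \mid t_{i-1}, t_{i-2})
  \;=\; \prod_{k=1}^{K} P\!\left(a^{(i)}_k \,\middle|\, a^{(i)}_1, \dots, a^{(i)}_{k-1},\; t_{i-1}, t_{i-2}\right)
```

This keeps estimation tractable for tagsets with hundreds of fine-grained tags, since each factor conditions only on the few context attributes the decision tree selects as informative.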
<tiger2/> as a standardised serialisation for ISO 24615 (2012)