CorA: A web-based annotation tool for historical and other non-standard language data
@inproceedings{Bollmann2014CorAAW, title={CorA: A web-based annotation tool for historical and other non-standard language data}, author={Marcel Bollmann and Florian Petran and Stefanie Dipper and Julia Krasselt}, booktitle={LaTeCH@EACL}, year={2014} }
We present CorA, a web-based annotation tool for manual annotation of historical and other non-standard language data. It allows for editing the primary data and modifying token boundaries during the annotation process. Further, it supports immediate retraining of taggers on newly annotated data.
19 Citations
SLATE: A Super-Lightweight Annotation Tool for Experts
- Computer Science
- ACL
- 2019
SLATE is a new annotation tool that is designed to fill the niche of a lightweight interface for users with a terminal-based workflow, and has already been used to annotate two corpora.
Annotation Challenges for Reconstructing the Structural Elaboration of Middle Low German
- Computer Science, Linguistics
- LaTeCH@ACL
- 2017
The authors plan to extend CorA, a web-based annotation tool for historical and other non-standard language data, to capture elaboration phenomena and annotator unsureness, and seek to interactively learn morphological as well as syntactic annotations.
Web-based Annotation Tool for Inflectional Language Resources
- Computer Science
- LREC
- 2018
Wasim is a web-based tool for semi-automatic morphosyntactic annotation of inflectional language resources that aims to speed up annotation by relying entirely on a keyboard interface, with no mouse interaction required.
Overview of Annotation Creation: Processes and Tools
- Biology
- 2017
This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform, and focuses more on abstract capabilities and problems because new tools appear continuously, while old tools disappear into disuse or disrepair.
Annotation: More Than Just a Scheme
- Business
- 2017
Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines…
Overview of Annotation Creation: Processes & Tools
- Biology
- ArXiv
- 2016
This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform, and focuses more on abstract capabilities and problems because new tools appear continuously, while old tools disappear into disuse or disrepair.
Analyzing Middle High German Syntax with RDF and SPARQL
- Computer Science
- LREC
- 2018
The paper presents technological foundations for an empirical study of Middle High German syntax and provides a rule-based shallow parser and an enrichment pipeline with the purpose of quantitative evaluation of a qualitative hypothesis.
Enabling Annotation of Historical Corpora in an Asynchronous Collaborative Environment
- Computer Science
- DATeCH
- 2017
Current research in Corpus Linguistics and related disciplines within the multi-disciplinary field of Digital Humanities involves computer-aided manual processing of large text corpora, which requires synchronizing interdependent annotations by different researchers.
Analyzing Constructional Change: Linguistic Annotation and Sources of Uncertainty
- Linguistics
- TEEM
- 2018
This paper presents the various sources of uncertainty the authors encounter in the investigation of language elaboration processes in Middle Low German and develops an interface that captures all annotators' doubts.
PALMYRA 2.0: A Configurable Multilingual Platform Independent Tool for Morphology and Syntax Annotation
- Computer Science
- UDW
- 2020
PALMYRA 2.0 is designed to be highly configurable to any dependency parsing representation, and to enable the annotation of a multitude of linguistic features in a graphical dependency-tree visualization and editing software.
References
ANNIS: A Search Tool for Multi-Layer Annotated Corpora
- Computer Science
- 2009
The different features of the architecture as well as actual use cases for corpus linguistic research on such diverse areas as information structure, learner language and discourse level phenomena are presented.
Manual and semi-automatic normalization of historical spelling - case studies from Early New High German
- Computer Science
- KONVENS
- 2012
Norma is presented, a semi-automatic normalization tool that integrates different modules (lexicon lookup, rewrite rules) for normalizing words in an interactive way and dynamically updates the set of rule entries, given new input.
(Semi-)Automatic Normalization of Historical Texts using Distance Measures and the Norma tool
- Computer Science
- 2012
This paper compares several approaches to normalization with a focus on methods based on string distance measures and evaluates them on two different types of historical texts, showing that a combination of normalization methods produces the best results.
Tagging the Bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora
- Linguistics
- 2007
Evaluating the accuracy of existing POS taggers, trained on modern English, when they are applied to Early Modern English (EModE) datasets highlights the extent to which the handling of orthographic variants is sufficient for the tagging accuracy of EModE data to approximate the levels attained on modern-day text(s).
Evaluating an ‘off-the-shelf’ POS-tagger on Early Modern German text
- Business
- LaTeCH@ACL
- 2011
This study assesses the effects of spelling variation on the performance of the tagger, and investigates to what extent tagger performance can be improved by using 'normalised' input, where spelling variants in the corpus are standardised to a modern form.
Is Part-of-Speech Tagging a Solved Task? An Evaluation of POS Taggers for the German Web as Corpus
- Computer Science
- 2009
It is found that HMM taggers are more robust and much faster than advanced machine-learning approaches such as MaxEnt. Promising directions for future research are unsupervised learning of a tagger lexicon from large unannotated corpora and the development of adaptive tagging models.
#hardtoparse: POS Tagging and Parsing the Twitterverse
- Computer Science
- Analyzing Microtext
- 2011
Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on Twitter material, results in a significant improvement, which is analysed by examining in detail the effect of the retraining on individual dependency types.
Competing Target Hypotheses in the Falko Corpus: A Flexible Multi-Layer Corpus Architecture
- Linguistics
- 2011
Using the German learner corpus Falko as an example, this work argues for a flexible multi-layer standoff corpus architecture where competing target hypotheses can be coded simultaneously.
Estimation of Conditional Probabilities With Decision Trees and an Application to Fine-Grained POS Tagging
- Computer Science
- COLING
- 2008
An HMM part-of-speech tagging method is presented that is particularly suited for POS tagsets with a large number of fine-grained tags. It is based on splitting the POS tags into attribute vectors and decomposing the contextual POS probabilities of the HMM into a product of attribute probabilities.
<tiger2/> as a standardised serialisation for ISO 24615
- Pro-
- 2012