The GUM corpus: creating multilayer resources in the classroom
- Amir Zeldes
- Linguistics, Language Resources and Evaluation
- 1 September 2017
The results of this project show that high quality, richly annotated resources can be created effectively as part of a linguistics curriculum, opening new possibilities not just for research, but also for corpora in linguistics pedagogy.
Universal Dependencies 2.1
- Joakim Nivre, Zeljko Agic, Hanzhi Zhu
- Linguistics, Computer Science
- 13 March 2017
The annotation scheme is based on (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
ANNIS3: A new architecture for generic corpus query and visualization
- Thomas Krause, Amir Zeldes
- Computer Science, Digital Scholarship in the Humanities
- 1 April 2016
This article proposes a generic solution for specialized corpus visualizations in a Web interface using annotation-triggered style sheets, which leverage the power of modern browsers and CSS for multiple and highly customizable views of primary data.
Productivity in Argument Selection: From Morphology to Syntax
- Amir Zeldes
- Linguistics
- 28 November 2012
This book centers on the idea that some verbs and other argument structure constructions have an inherently different propensity to realize lexically unfamiliar arguments, independently of lexical…
RIDGES Herbology: designing a diachronic multi-layer corpus
- C. Odebrecht, Malte Belz, Amir Zeldes, Anke Lüdeling, Thomas Krause
- Computer Science, Language Resources and Evaluation
- 1 September 2017
This paper introduces a multi-layer corpus architecture with multiple tokenizations, using the open-source diachronic corpus of German called Register in Diachronic German Science, which is concerned with the development of a German scientific register independent of Latin.
The DISRPT 2019 Shared Task on Elementary Discourse Unit Segmentation and Connective Detection
- Amir Zeldes, Debopam Das, E. Maziero, Juliano D. Antonio, Mikel Iruskieta
- Computer Science
- 1 June 2019
In 2019, we organized the first iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task on Elementary Discourse Unit…
GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection
- Yue Yu, Yilun Zhu, Amir Zeldes
- Computer Science, Proceedings of the Workshop on Discourse Relation…
- 23 April 2019
GumDrop, Georgetown University’s entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection, relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task.
The DISRPT 2021 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification
- Amir Zeldes, Yang Janet Liu, Mikel Iruskieta, Philippe Muller, Chloé Braud, Sonia Badene
- Computer Science, DISRPT
- 2021
This paper reviews the data included in the Shared Task, which covers nearly 3 million manually annotated tokens from 16 datasets in 11 languages, and reports system performance on each task for both annotated and plain-tokenized versions of the data.
A Cross-Genre Ensemble Approach to Robust Reddit Part of Speech Tagging
- Shabnam Behzad, Amir Zeldes
- Computer Science, Workshop on Autonomic Communication
- 1 April 2020
This work studies how a state-of-the-art tagging model trained on different genres performs on Web content from unfiltered Reddit forum discussions, and offers a typology of the most common error types among them, broken down by training corpus.
DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection
- Luke Gessler, Shabnam Behzad, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes
- Computer Science, DISRPT
- 20 September 2021
A partial evaluation of multiple pretrained Transformer-based language models indicates that models pretrained on the Next Sentence Prediction (NSP) task are optimal for relation classification, and the results suggest strong performance on the new 2021 benchmark.