Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank and the BoAT Annotation Tool

@article{Trk2021ResourcesFT,
  title={Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank and the BoAT Annotation Tool},
  author={Utku T{\"u}rk and Furkan Atmaca and Saziye Bet{\"u}l {\"O}zates and G{\"o}zde Berk and Seyyit Talha Bedir and Abdullatif K{\"o}ksal and Balkiz {\"O}zt{\"u}rk Basaran and Tunga G{\"u}ng{\"o}r and Arzucan {\"O}zg{\"u}r},
  journal={ArXiv},
  year={2021},
  volume={abs/2002.10416}
}
In this paper, we describe our contributions and efforts to develop Turkish resources, which include a new treebank (BOUN Treebank) with novel sentences, along with the guidelines we adopted and a new annotation tool we developed (BoAT). The manual annotation process we employed was shaped and implemented by a team of four linguists and five NLP specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies framework, which originated from…
A Language-aware Approach to Code-switched Morphological Tagging
TLDR
Experimental results show that including language IDs in the learning model significantly improves accuracy over other approaches, using an approach that integrates language IDs into a transformer-based framework for code-switched (CS) morphological tagging.
Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP
TLDR
MaChAmp is presented, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings; its benefits are flexible configuration options and support for a variety of natural language processing tasks in a uniform toolkit.

References

Turkish Treebanking: Unifying and Constructing Efforts
TLDR
It is demonstrated that the annotation of the TNC-UD improves the parsing accuracy of Turkish, and a custom annotation software with advanced filtering and morphological editing options is constructed.
Improving the Annotations in the Turkish Universal Dependency Treebank
TLDR
It is observed that the re-annotation of the Turkish IMST-UD treebank improves performance with regard to dependency parsing.
IMST: A Revisited Turkish Dependency Treebank
TLDR
An attempt at reannotating the treebank from the ground up using the proposed schemes is described, and the consistencies of the two versions of the original treebank are compared via cross-validation using a dependency parser.
A Gold Standard Dependency Treebank for Turkish
TLDR
A new treebank for Turkish is presented, consisting of web and Wikipedia sentences annotated for segmentation, morphology, part-of-speech and dependency relations, along with the results of baseline experiments on Turkish dependency parsing with this treebank.
Universal Dependencies for Turkish
TLDR
The findings suggest that the UD framework is at least as viable for Turkish as the original annotation framework of the IMST Treebank.
Swedish-Turkish Parallel Treebank
TLDR
The treebank is a balanced, syntactically annotated corpus containing both fiction and technical documents. It was developed within the project "supporting research environment for minor languages", which aims to create representative language resources for language pairs dissimilar in language structure.
The English-Swedish-Turkish Parallel Treebank
TLDR
A syntactically annotated parallel corpus containing typologically partly different languages, namely English, Swedish and Turkish, is described; it is used in teaching and linguistic research to study the relationship between the structurally different languages.
The TIGER Treebank
This paper reports on the TIGER Treebank, a corpus of currently 35.000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and…
Constructing a Turkish Constituency Parse TreeBank
TLDR
The words are semi-automatically annotated morphologically and a rule-based approach is used for refining the parse trees based on the morphological analyses of the words.