• Corpus ID: 248085107

Resources for Turkish Natural Language Processing: A critical survey

  • Çağrı Çöltekin, A. Seza Doğruöz, Özlem Çetinoğlu
This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available. In addition to providing information about the available linguistic resources, we present a set of recommendations, and identify gaps in the data available for conducting research and building applications in Turkish Linguistics and Natural Language Processing. 
1 Citation

Türkçe Radyoloji Raporlarının Doğal Dil İşlenmesi (Natural Language Processing of Turkish Radiology Reports)

  • Sumeyra Kus Ordu, Oktay Yildiz
  • Computer Science
    2022 International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT)
  • 2022
Natural language processing (NLP; "DDI" in Turkish) is widely used today, and the scarcity of studies on Turkish texts, together with the need for NLP applications in healthcare, makes this work important.



An all-words sense annotated Turkish corpus

  • Sinan Akcakaya, O. T. Yildiz
  • Computer Science
    2018 2nd International Conference on Natural Language and Speech Processing (ICNLSP)
  • 2018
This paper reports our efforts in constructing a sense-labeled Turkish corpus with respect to the Turkish Language Institution's dictionary, using the traditional method of manual tagging. We tagged a

Resources for Turkish morphological processing

We present a set of language resources and tools—a morphological parser, a morphological disambiguator, and a text corpus—for exploiting Turkish morphology in natural language processing

Critical Survey of the Freely Available Arabic Corpora

The results of a recent survey conducted to identify the freely available Arabic corpora and language resources are presented, organized by the categories studied.

Turkish NLP web services in the WebLicht environment

A number of Turkish natural language processing tools that are being integrated into the CLARIN infrastructure are introduced, and particular challenges met during this effort are discussed.

Building a wordnet for Turkish

This paper summarizes the development process of a wordnet for Turkish as part of the Balkanet project. After discussing the basic methodological issues that had to be resolved during the course of

TDB 1.1: Extensions on Turkish Discourse Bank

This paper presents recent developments on the Turkish Discourse Bank (TDB) and describes new annotations for three discourse relation types: implicit relations, entity relations, and alternative lexicalizations.

Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches

This work proposes a fully automated, context-aware machine translation approach with fewer stages of processing in Turkish text normalization, able to surpass the current best-performing system by a large margin.

Comparison of Turkish Proposition Banks by Frame Matching

PropBanks developed for Turkish are compared by checking semantic roles in the frame files of matched verb senses, and the creation of an inclusive lexical resource for Turkish is argued to be of great importance.

Integrating Morphology with Multi-word Expression Processing in Turkish

This paper describes a multi-word expression processor for preprocessing Turkish text for various language engineering applications and presents results from runs over a large corpus and a small gold-standard corpus.

A Gold Standard Dependency Treebank for Turkish

A new treebank for Turkish is presented, consisting of web and Wikipedia sentences annotated for segmentation, morphology, part-of-speech and dependency relations, together with the results of baseline experiments on Turkish dependency parsing with this treebank.