Radovan Garabík

Morphological annotation is an essential, highly useful and very common kind of linguistic information provided in corpora, especially for highly inflectional languages. The morphological tagset used in the Slovak National Corpus was designed with several goals in mind: the tags are compact and easily human-readable, without sacrificing their …
This article provides an overview of the dissemination work carried out in META-NET from 2010 to 2015; we describe its impact at the regional, national and international levels, mainly with regard to politics and the funding situation for LT topics. The article documents the initiative's work throughout Europe to boost progress and innovation in …
The article briefly reviews a bilingual Slovak-Bulgarian/Bulgarian-Slovak parallel and aligned corpus. The corpus has been collected and developed as a result of collaboration within a joint research project between the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, and the Ľ. Štúr Institute of Linguistics, Slovak Academy of …
Error diagnosis is an integral part of improving the quality and robustness of any ASR system, especially for languages with limited resources. This paper explores a semi-automatic approach to error categorization, usable for databases in which the same set of sentences is produced by a sufficiently large number of speakers. We use a matrix created from an …
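The abstract breaks off before describing the matrix, so the following is only a minimal sketch under our own assumption: rows are speakers, columns are word positions of one shared reference sentence, and a cell is 1 when that speaker's recognized output misses that word. Word alignment here uses Python's standard `difflib.SequenceMatcher`, which is a stand-in chosen for brevity rather than the paper's method; the speaker IDs and sentences are hypothetical.

```python
from difflib import SequenceMatcher


def word_error_flags(reference, hypothesis):
    """1 for each reference word not matched in the recognizer output, else 0."""
    ref, hyp = reference.split(), hypothesis.split()
    flags = [1] * len(ref)
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            flags[i] = 0  # word aligned to an identical word in the output
    return flags


def error_matrix(reference, hypotheses):
    """Rows: speakers; columns: word positions of the shared reference sentence."""
    return {speaker: word_error_flags(reference, hyp)
            for speaker, hyp in hypotheses.items()}


def error_rates(matrix):
    """Per-word error rate (across speakers) and per-speaker error rate."""
    rows = list(matrix.values())
    n_speakers, n_words = len(rows), len(rows[0])
    per_word = [sum(row[j] for row in rows) / n_speakers for j in range(n_words)]
    per_speaker = {s: sum(row) / n_words for s, row in matrix.items()}
    return per_word, per_speaker


# Hypothetical recognizer outputs for one sentence read by three speakers.
reference = "dnes je pekný deň"
hypotheses = {
    "spk1": "dnes je pekný deň",
    "spk2": "dnes je pekne deň",
    "spk3": "dnes pekne deň",
}
matrix = error_matrix(reference, hypotheses)
per_word, per_speaker = error_rates(matrix)
```

Because every speaker reads the same sentence, a column with a high error rate (here the word "pekný", missed by two of three speakers) hints at a systematic, word-specific problem, while a row with a high rate hints at a speaker-specific one. How the paper actually categorizes errors is not stated in the truncated abstract.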
The French-Slovak parallel corpus FRASK presented here is a sizeable corpus of European Union legislative texts and fiction in both French and Slovak. The texts are sentence-aligned, lemmatized and morphologically annotated. The search mechanism makes it possible to query single words, phrases, lemmas and morphological tags, using …
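To make the query types concrete, here is a minimal in-memory sketch of searching a sentence-aligned, lemmatized and tagged bilingual corpus by word form, lemma, tag, or phrase. The `Token`/`AlignedPair` structures, the function names and the one-letter tags (`D`, `N`, `S`) are hypothetical illustrations; they do not reproduce FRASK's actual data format, tagsets or query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    lemma: str
    tag: str  # hypothetical tag strings, not FRASK's real tagsets


@dataclass(frozen=True)
class AlignedPair:
    fr: tuple  # French tokens of one aligned sentence
    sk: tuple  # Slovak tokens of the same aligned sentence


def token_matches(token, form=None, lemma=None, tag_prefix=None):
    """True if the token satisfies every criterion that was given."""
    return ((form is None or token.form == form)
            and (lemma is None or token.lemma == lemma)
            and (tag_prefix is None or token.tag.startswith(tag_prefix)))


def search(corpus, side, **criteria):
    """Aligned pairs where some token on `side` ("fr" or "sk") matches."""
    return [pair for pair in corpus
            if any(token_matches(t, **criteria) for t in getattr(pair, side))]


def search_phrase(corpus, side, forms):
    """Aligned pairs whose `side` contains the word forms as a contiguous run."""
    n = len(forms)
    hits = []
    for pair in corpus:
        tokens = getattr(pair, side)
        if any(tuple(t.form for t in tokens[i:i + n]) == tuple(forms)
               for i in range(len(tokens) - n + 1)):
            hits.append(pair)
    return hits


corpus = [
    AlignedPair(fr=(Token("les", "le", "D"), Token("directives", "directive", "N")),
                sk=(Token("smernice", "smernica", "S"),)),
    AlignedPair(fr=(Token("la", "le", "D"), Token("directive", "directive", "N")),
                sk=(Token("smernica", "smernica", "S"),)),
    AlignedPair(fr=(Token("le", "le", "D"), Token("roman", "roman", "N")),
                sk=(Token("román", "román", "S"),)),
]

by_lemma = search(corpus, "sk", lemma="smernica")       # all forms of one lemma
by_tag = search(corpus, "fr", lemma="directive", tag_prefix="N")
by_phrase = search_phrase(corpus, "fr", ["la", "directive"])
```

Since each hit is a whole aligned pair, a lemma query on the Slovak side immediately yields the French translations as well, which is the main practical benefit of sentence alignment. A real system would use an index rather than a linear scan.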
We describe a lexical database of morphologically and phonetically tagged words occurring in the texts primarily used for language arts instruction in the Czech Republic, Poland and Slovakia during the initial period of primary education (up to grade 4 or 5). The database aims to parallel the contents and usage of the British English Children's …
The paper discusses the requirements that must be met for grid computing to be applied successfully to digital lexicography, in particular to corpus processing. We explain the need for grid computing in this context, give an overview of the current state of the grid, and discuss the special aspects of grid-based corpus …
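As a rough illustration of why corpus processing parallelizes well, the sketch below splits a document collection into independent chunks, computes word frequencies per chunk, and merges the partial results. Local threads from `concurrent.futures` merely stand in for grid nodes; this is not grid middleware, and the function names and example documents are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(documents, n_chunks):
    """Round-robin split so each worker gets a similar share of documents."""
    return [documents[i::n_chunks] for i in range(n_chunks)]


def count_chunk(chunk):
    """Word frequencies for one chunk; needs no data from other chunks."""
    counts = Counter()
    for doc in chunk:
        counts.update(doc.lower().split())
    return counts


def parallel_frequency(documents, n_workers=4):
    """Count each chunk in its own worker, then merge the partial counts."""
    chunks = split_into_chunks(documents, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            total += partial  # merging partial results is cheap
    return total


documents = ["Dnes je pekný deň", "Je to pekný deň", "Dnes"]
freqs = parallel_frequency(documents, n_workers=2)
```

The chunk-level work is independent and the merge step is small, which is what makes this kind of task a natural fit for distributed execution. On a real grid, the hard parts are the aspects the abstract alludes to, such as moving large corpora to the nodes.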
Modern developments in computing technology allow us to take part in previously impossible lines of scientific research on natural language. The basic, indispensable database for this research consists of language corpora, including large representative (national) corpora. There are already widely available general-purpose software tools that make it possible to efficiently process …
Unknown named entity recognition in inflected languages faces several specific problems. First and foremost, the entities themselves are inflected (Dvonč et al., 1966), which leads to the problem of identifying word forms as belonging to the same lexeme, as well as the problem of finding the correct lemma. In this article we analyse the distribution of word …
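To illustrate the first problem, the following deliberately naive sketch groups inflected forms of unknown names by stripping a small, hypothetical list of Slovak case endings. It is not the article's method, and its ending list is far from complete. It only shows why inflection splits one entity into many surface forms, and why grouping them is easier than choosing the lemma.

```python
from collections import defaultdict

# Hypothetical, deliberately incomplete list of Slovak case endings,
# tried longest first so that "-ovi" wins over "-i"-like shorter matches.
ENDINGS = sorted(("ovi", "ou", "om", "a", "u", "e", "y"), key=len, reverse=True)


def crude_stem(form, min_stem=3):
    """Strip the longest listed ending, keeping at least `min_stem` characters."""
    for ending in ENDINGS:
        if form.endswith(ending) and len(form) - len(ending) >= min_stem:
            return form[: -len(ending)]
    return form


def group_by_stem(forms):
    """Map each crude stem to the set of surface forms that reduce to it."""
    groups = defaultdict(set)
    for form in forms:
        groups[crude_stem(form)].add(form)
    return dict(groups)


forms = ["Novák", "Nováka", "Novákovi", "Novákom",
         "Bratislava", "Bratislavy", "Bratislavou"]
groups = group_by_stem(forms)
```

The group for "Bratislava" is keyed by the stem "Bratislav", which is not itself a lemma: grouping forms of the same lexeme and finding the correct lemma are separate problems, and this heuristic addresses only the first.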