Analysis of the Quotation Corpus of the Russian Wiktionary

A. Smirnov, T. Levashova, Alexey Karpov, I. Kipyatkova, A. Ronzhin, Andrew Krizhanovsky, Nataly Krizhanovsky
The quantitative evaluation of quotations in the Russian Wiktionary was performed using the developed Wiktionary parser. It was found that the number of quotations in the dictionary is growing fast (51.5 thousand in 2011, 62 thousand in 2012). These quotations were extracted and saved in the relational database of a machine-readable dictionary. For this database, tables related to the quotations were designed. A histogram of distribution of quotations of literary works written in different…


A Quantitative Analysis of the English Lexicon in Wiktionaries and WordNet
A quantitative analysis of the English lexicon shows that the average polysemy, the number and the distribution of word senses follow similar patterns in both expert and collaborative resources with relatively minor differences.
Wiktionary as a source for automatic pronunciation extraction
The paper analyzes whether dictionaries from the World Wide Web that contain phonetic notations may support the rapid creation of pronunciation dictionaries within the speech recognition and speech synthesis system building process.
Transformation of Wiktionary entry structure into tables and relations in a relational database schema
The paper describes how the flat text of the Wiktionary entry was extracted, converted, and stored in the specially designed relational database schema, which is a part of a machine-readable dictionary (MRD).
Automatic Pronunciation Dictionary Generation from Wiktionary and Wikipedia
In this work we show that dictionaries from the World Wide Web which contain phonetic notations may represent a good basis for the rapid pronunciation dictionary creation within the speech…
Wiktionary: a new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography
The variety of encoded lexical, semantic, and cross-lingual knowledge of three different language editions of Wiktionary is studied, and the coverage of terms, lexemes, word senses, domains, and registers is compared to multiple expert-built lexicons.
Multilingual Ontology Matching based on Wiktionary Data Accessible via SPARQL Endpoint
In the case study, the problem entity is a task of multilingual ontology matching based on Wiktionary data accessible via SPARQL endpoint, and ontology matching results obtained using Wiktionary were compared with results based on Google Translate API.
Automatically Linking GermaNet to Wikipedia for Harvesting Corpus Examples for GermaNet Senses
The paper describes the automatic mapping of GermaNet senses to Wikipedia articles, using proven, state-of-the-art word sense disambiguation methods, in particular different versions of word overlap algorithms and PageRank as well as classifiers that combine these methods.
Accessing and standardizing Wiktionary lexical entries for the translation of labels in Cultural Heritage taxonomies
Conversion tools between Wiktionary and TEI are developed, using ISO standards (LMF, MAF), to make such resources available to both the Digital Humanities community and the Language Resources community.
A Syntactically and Semantically Tagged Corpus of Russian: State of the Art and Prospects
We describe a project aimed at creating a deeply annotated corpus of Russian texts. The annotation consists of comprehensive morphological marking, syntactic tagging in the form of a complete…
NULEX: An Open-License Broad Coverage Lexicon
NU-LEX is described: an open-license, feature-based lexicon for general-purpose parsing that combines WordNet, VerbNet, and Wiktionary and contains over 100,000 words. Its shortcomings fell primarily into two categories, suggesting future research directions.