Analysis of the Quotation Corpus of the Russian Wiktionary

Alexander V. Smirnov, Tatiana Levashova, Alexey Karpov, Irina S. Kipyatkova, Andrey Ronzhin, Andrew Krizhanovsky, Nataly Krizhanovsky
A quantitative evaluation of quotations in the Russian Wiktionary was performed using the developed Wiktionary parser. It was found that the number of quotations in the dictionary is growing quickly (51.5 thousand in 2011, 62 thousand in 2012). These quotations were extracted and saved in the relational database of a machine-readable dictionary; tables related to the quotations were designed for this database. A histogram of the distribution of quotations from literary works written in different…
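The abstract does not reproduce the quotation tables themselves. As a rough illustration of the kind of relational schema described (a quotation linked to its dictionary entry, with bibliographic attributes such as author, title, and year), here is a minimal SQLite sketch; all table and column names are hypothetical, not taken from the paper:

```python
import sqlite3

# Hypothetical sketch of quotation-related tables in a machine-readable
# dictionary database; names are illustrative, not the paper's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    id         INTEGER PRIMARY KEY,
    page_title TEXT NOT NULL          -- Wiktionary entry headword
);
CREATE TABLE quote (
    id        INTEGER PRIMARY KEY,
    page_id   INTEGER NOT NULL REFERENCES page(id),
    text      TEXT NOT NULL,          -- quotation sentence
    author    TEXT,                   -- writer of the quoted work
    title     TEXT,                   -- title of the literary work
    year_from INTEGER,                -- publication year (range start)
    year_to   INTEGER                 -- publication year (range end)
);
""")
conn.execute("INSERT INTO page (id, page_title) VALUES (1, 'дом')")
conn.execute(
    "INSERT INTO quote (page_id, text, author, year_from) "
    "VALUES (1, 'Пример цитаты.', 'А. П. Чехов', 1895)"
)
# Counting rows in such a table is how the quantitative evaluation
# (51.5 thousand quotations in 2011, 62 thousand in 2012) would be run.
n = conn.execute("SELECT COUNT(*) FROM quote").fetchone()[0]
print(n)
```

A year range (`year_from`/`year_to`) rather than a single year is one plausible way to store the publication dates behind the histogram of quotations by literary period mentioned above.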




A Quantitative Analysis of the English Lexicon in Wiktionaries and WordNet

A quantitative analysis of the English lexicon shows that the average polysemy and the number and distribution of word senses follow similar patterns in both expert-built and collaborative resources, with relatively minor differences.

Wiktionary as a source for automatic pronunciation extraction

The paper analyzes whether dictionaries from the World Wide Web that contain phonetic notations may support the rapid creation of pronunciation dictionaries within the speech recognition and speech synthesis system building process.

Transformation of Wiktionary entry structure into tables and relations in a relational database schema

The paper describes how the flat text of the Wiktionary entry was extracted, converted, and stored in the specially designed relational database schema, which is a part of a machine-readable dictionary (MRD).

Automatic Pronunciation Dictionary Generation from Wiktionary and Wikipedia

In this work we show that dictionaries from the World Wide Web which contain phonetic notations may represent a good basis for rapid pronunciation dictionary creation within the speech recognition and speech synthesis system building process.

Wiktionary: a new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography

The variety of encoded lexical, semantic, and cross-lingual knowledge in three different language editions of Wiktionary is studied, and the coverage of terms, lexemes, word senses, domains, and registers is compared to multiple expert-built lexicons.

Multilingual Ontology Matching based on Wiktionary Data Accessible via SPARQL Endpoint

In the case study, the problem entity is the task of multilingual ontology matching based on Wiktionary data accessible via a SPARQL endpoint; ontology matching results obtained using Wiktionary were compared with results based on the Google Translate API.

Automatically Linking GermaNet to Wikipedia for Harvesting Corpus Examples for GermaNet Senses

The paper describes the automatic mapping of GermaNet senses to Wikipedia articles using proven, state-of-the-art word sense disambiguation methods, in particular different versions of word overlap algorithms and PageRank, as well as classifiers that combine these methods.

Accessing and standardizing Wiktionary lexical entries for the translation of labels in Cultural Heritage taxonomies

Conversion tools between Wiktionary and TEI are developed, using ISO standards (LMF, MAF), to make such resources available to both the Digital Humanities community and the Language Resources community.

A Syntactically and Semantically Tagged Corpus of Russian: State of the Art and Prospects

We describe a project aimed at creating a deeply annotated corpus of Russian texts. The annotation consists of comprehensive morphological marking and syntactic tagging in the form of a complete…

NULEX: An Open-License Broad Coverage Lexicon

NU-LEX, an open-license feature-based lexicon for general-purpose parsing that combines WordNet, VerbNet, and Wiktionary, is described; it contains over 100,000 words, and its shortcomings fall primarily into two categories, suggesting future research directions.