We introduce an Icelandic corpus of more than 250 million running words and describe the methodology used to build it. The resource is available free of charge. We provide automatically generated monolingual lexicon entries, comprising frequency statistics, samples of usage, co-occurring words and a graphical representation of the word's semantic…
The effects of total parenteral nutrition (TPN) versus enteral nutrition (TEN) were studied in 34 patients following major neurosurgery. Measurements were made of resting energy expenditure (REE), urea production rate (UPR), visceral proteins, parameters of liver and pancreas function, as well as gastrointestinal absorption. To predict nutritional status,…
The Leipzig Corpora Collection offers free online access to 136 monolingual dictionaries enriched with statistical information. In this paper we describe current advances of the project in collecting and processing text data automatically for a large number of languages. Our main interest lies in languages of "low density", where only few text data…
The quality of statistical measurements on corpora is strongly related to a strict definition of the measuring process and to corpus quality. In the case of multiple result inspections, an exact measurement of previously specified parameters ensures compatibility of the different measurements performed by different researchers on possibly different objects. …
Since 2011, the comprehensive, electronically available sources of the Leipzig Corpora Collection have been used consistently for the compilation of high-quality word lists. The underlying corpora include newspaper texts, Wikipedia articles and other randomly collected Web texts. For many of the languages featured in this collection, it is the first…
One way to analyse word relations is to examine their co-occurrence in the same context. This allows for the identification of potential semantic or lexical relationships between words. As previous studies showed, word co-occurrences often reflect human stimulus-response pairs. In this paper, significant sentence co-occurrences at the word level were used to…
The new POS-tagged Icelandic corpus of the Leipzig Corpora Collection is an extensive resource for the analysis of the Icelandic language. As it contains a large share of all Web documents hosted under the .is top-level domain, it is especially valuable for investigations on modern Icelandic and non-standard language varieties. The corpus is accessible via…