Learn More
This paper describes the two systems for determining the semantic similarity of short texts submitted to the SemEval 2012 Task 6. Most of the research on semantic similarity of textual content focuses on large documents. However, a fair amount of information is condensed into short text snippets such as social media posts, image captions, and scientific(More)
An enormous number of documents is being produced that have to be stored, searched and accessed. Document indexing represents an efficient way to tackle this problem. Contributing to the document indexing process, we developed the Computer Aided Document Indexing System (CADIS) that applies controlled vocabulary keywords from the EUROVOC thesaurus. The main(More)
Identifying synonyms is important for many natural language processing and information retrieval applications. In this paper we address the task of automatically identifying synonyms in Croatian language using distributional semantic models (DSM). We build several DSMs using latent semantic analysis (LSA) and random indexing (RI) on the large hrWaC corpus.(More)
In this paper we present CroNER, a named entity recognition and classification system for Croatian language based on supervised sequence labeling with conditional random fields (CRF). We use a rich set of lexical and gazetteer-based features and different methods for enforcing document-level label consistency. Extensive evaluation shows that our method(More)
Derivational models are still an under-researched area in computational morphology. Even for German, a rather resource-rich language, there is a lack of large-coverage derivational knowledge. This paper describes a rule-based framework for inducing derivational families (i.e., clusters of lemmas in derivational relationships) and its application to create a(More)