We present experiments with part-of-speech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from ...
This paper proposes a fast and simple unsupervised word segmentation algorithm that utilizes the local predictability of adjacent character sequences, while searching for a least-effort representation of the data. The model uses branching entropy as a means of constraining the hypothesis space, in order to efficiently obtain a solution that minimizes the ...
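To make the notion of branching entropy concrete, here is a minimal sketch that estimates, for each character context, the entropy of the character that follows it. It illustrates the entropy measure only, not the paper's segmentation algorithm or its search procedure; the function name `branching_entropy` and the `order` parameter are hypothetical and chosen for this example.

```python
import math
from collections import Counter, defaultdict


def branching_entropy(corpus, order=2):
    """Map each context of `order` characters to the entropy (in bits)
    of the character that follows it in `corpus`.

    Illustrative assumption: contexts are fixed-length character n-grams
    and probabilities are plain relative frequencies.
    """
    followers = defaultdict(Counter)
    for i in range(len(corpus) - order):
        context = corpus[i:i + order]
        followers[context][corpus[i + order]] += 1

    entropy = {}
    for context, counts in followers.items():
        total = sum(counts.values())
        entropy[context] = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
    return entropy
```

A context with high entropy is followed by many different characters, which is commonly read as a hint that a word boundary may lie after it; a context with entropy near zero is almost always continued the same way.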
This paper proposes a combined model for POS tagging, dependency parsing and co-reference resolution for Bulgarian, a pro-drop Slavic language with rich morphosyntax. We formulate an extension of the MSTParser algorithm that allows the simultaneous handling of the three tasks in a way that makes it possible for each task to benefit from the information ...
In CLEF 2012, the BulTreeBank Group of LMD, IICT, BAS is participating in the QA4MRE task for Bulgarian. The system presented in the paper exploits an NLP pipeline for Bulgarian to process the questions, answers and supporting texts. We then represent the results of the analysis as a bag of linguistic units: lemmas and dependency relations. These ...
We describe experiments with building a recognizer for disease names in Bulgarian clinical epicrises, where both the language and the domain are different from those in mainstream research, which has focused on PubMed articles in English. We show that using a general framework such as GATE and an appropriate pragmatic methodology can yield significant ...
The performance of NLP classifiers largely depends on the quality of the features considered for prediction (feature engineering). However, as the number of features increases, overfitting becomes more likely and performance decreases. Also, due to the very large number of features, only simple linear classifiers are considered, thus disregarding ...
We describe three language-independent methods for the task of answer validation. All methods are based on a scoring mechanism that reflects the degree of similarity between the question-answer pairs and the supporting text. We evaluate the proposed methods when using various string similarity metrics, such as exact matching, Levenshtein, Jaro and ...
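As a rough illustration of similarity-based answer validation, the sketch below scores a question-answer pair against a supporting text using normalized Levenshtein similarity. It is not one of the three methods the paper describes; the token-level best-match averaging and the names `similarity` and `support_score` are assumptions made for this example.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Levenshtein distance rescaled to [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def support_score(question, answer, supporting_text):
    """Hypothetical scoring: average, over the tokens of the
    question-answer pair, of the best similarity to any text token."""
    text_tokens = supporting_text.lower().split()
    pair_tokens = (question + " " + answer).lower().split()
    if not pair_tokens or not text_tokens:
        return 0.0
    best = [max(similarity(t, s) for s in text_tokens) for t in pair_tokens]
    return sum(best) / len(best)
```

Under this scheme, the candidate answer with the highest `support_score` would be selected; swapping `similarity` for exact matching or another string metric changes only that one function.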