E-Dictor is a tool for encoding ancient texts, applying levels of edition to them, and assigning part-of-speech tags. In short, it works as a WYSIWYG interface for encoding text in XML format. It grew out of experience gained while building the Tycho Brahe Parsed Corpus of Historical Portuguese and from consortium activities with other research groups.
This paper describes an approach used in the 2012 Probabilistic Automata Learning Competition. The main goal of the competition was to gain insight into which techniques and approaches work best for sequence learning based on different kinds of automata generating machines. This paper proposes using variable-length n-gram models. Experiments…
During the development of an ontology, it may be important to know which logic underlies that particular ontology, so that the developer knows the expected complexity of reasoning over it. In this paper, we first present an ontology that describes several description logics and then two different classifiers that were implemented using…
Variable-Length Markov Chains (VLMCs) offer a way of modeling contexts longer than trigrams without suffering from data sparsity and state space complexity. However, in Historical Portuguese, two words show a high degree of ambiguity: que and a. Errors in tagging these words account for a quarter of the total errors made by a VLMC-based…
No current approach to multilingual dependency parsing draws on sister languages. This paper presents a very different approach to data sparsity, language transfer, and related problems. Our approach builds on sister-language theory: by combining resources from such languages, we aim for a model that can parse many different languages.