Aliaksandr Autayeu

Achieving automatic interoperability among systems with diverse data structures and languages expressing different viewpoints has proven difficult. This paper describes S-Match, an open source semantic matching framework that tackles the semantic interoperability problem by transforming several data structures such as business…
We study the use of Natural Language Processing techniques to improve different machine learning approaches (Support Vector Machines (SVM), Local SVM, Random Forests) to the problem of automatic keyphrase extraction from scientific papers. For the assessment we propose a large, high-quality dataset: 2000 ACM papers from the Computer Science domain. …
Identifying semantic correspondences between different vocabularies has been recognized as a fundamental step towards achieving interoperability. Several manual and automatic techniques have recently been proposed. Fully manual approaches are very precise but extremely costly. Conversely, automatic approaches tend to fail when domain-specific background…
As a valid solution to the semantic heterogeneity problem, many matching solutions have been proposed. Given two lightweight ontologies, we compute the minimal mapping between them, namely the smallest subset of all possible correspondences (which we call mapping elements) such that (i) all the others can be computed from them in time linear in the size of the input…
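As a rough illustration of the idea of minimality, the sketch below prunes redundant subsumption correspondences between two tree-shaped lightweight ontologies, given as child-to-parent maps. All names and the quadratic-time pruning loop are hypothetical illustrations of the redundancy principle (a correspondence is derivable if a more general one already links an ancestor on one side to a descendant on the other); the actual algorithm in the paper is more involved and carries a linear-time derivability guarantee.

```python
def ancestors(node, parent):
    # The chain of ancestors of a node in a tree given as a
    # child -> parent map, including the node itself.
    chain = {node}
    while node in parent:
        node = parent[node]
        chain.add(node)
    return chain

def minimal_mapping(elements, parent1, parent2):
    # elements: list of (a, b) pairs meaning "a is subsumed by b".
    # (a, b) is redundant if some other element (a2, b2) links an
    # ancestor a2 of a to a node b2 that lies below b: then
    # a <= a2 <= b2 <= b follows without storing (a, b) explicitly.
    minimal = []
    for (a, b) in elements:
        redundant = any(
            (a2, b2) != (a, b)
            and a2 in ancestors(a, parent1)   # a <= a2 in ontology 1
            and b in ancestors(b2, parent2)   # b2 <= b in ontology 2
            for (a2, b2) in elements
        )
        if not redundant:
            minimal.append((a, b))
    return minimal
```

For example, with "jazz" under "music" on one side and "arts" under "culture" on the other, the element ("jazz", "culture") is derivable from ("music", "arts") and is pruned.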
In this paper we use Natural Language Processing techniques to improve different machine learning approaches (Support Vector Machines (SVM), Local SVM, Random Forests) to the problem of automatic keyphrase extraction from scientific papers. For the evaluation we propose a large and high-quality dataset: 2000 ACM papers from the Computer Science domain. We…
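To make the task concrete, here is a stdlib-only sketch of the classic candidate-generation step in keyphrase extraction: split text on stopwords, keep contiguous runs of content words, and rank the resulting phrases. The function names, the tiny stopword list, and the frequency-times-length scoring are illustrative assumptions; the papers above instead rank candidates with trained models (SVM, Local SVM, Random Forests) over linguistic features.

```python
import re
from collections import Counter

# Small stopword list for illustration; real systems use a fuller list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "we", "is"}

def candidate_phrases(text, max_len=3):
    """Split the text on stopwords and punctuation, then emit every
    contiguous run of content words up to max_len tokens long."""
    tokens = re.findall(r"[a-zA-Z][a-zA-Z-]*", text.lower())
    runs, current = [], []
    for tok in tokens:
        if tok in STOPWORDS:
            if current:
                runs.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        runs.append(current)
    phrases = []
    for run in runs:
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                phrases.append(" ".join(run[i:i + n]))
    return phrases

def top_keyphrases(text, k=5):
    """Rank candidates by frequency weighted by phrase length -- a crude
    stand-in for a learned ranking model."""
    counts = Counter(candidate_phrases(text))
    scored = sorted(counts.items(),
                    key=lambda item: (-item[1] * len(item[0].split()), item[0]))
    return [phrase for phrase, _ in scored[:k]]
```

On a snippet such as "semantic matching and semantic matching frameworks support semantic interoperability", the repeated bigram "semantic matching" outscores its individual words, which is the behavior a learned ranker refines with richer features.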
Evaluating and comparing different ontology matching techniques is a complex, multifaceted problem. Currently, diverse gold standards and various practices are used for evaluations. In this paper we show that, by following certain rules, the quality of the evaluations can be significantly improved, particularly in regard to the accuracy of precision and…
Understanding metadata written in natural language is a prerequisite for successful automated integration of large-scale, language-rich datasets such as digital libraries. In this paper we describe an analysis of the part-of-speech structure of two different datasets of metadata and show how this structure can be used to detect structural patterns that can be parsed…
Controlled vocabularies that power semantic applications allow them to operate with high precision, which comes at the price of having to disambiguate between senses of terms. Fully automatic disambiguation is a largely unsolved problem, and semi-automatic approaches are preferred. These approaches involve users in the disambiguation and require an…
Understanding metadata written in natural language is a prerequisite for successful automated integration of large-scale, language-rich classifications such as the ones used in digital libraries. We analyze the natural language labels within classifications by exploring their syntactic structure; we then show how this structure can be used to detect patterns of…
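The kind of analysis described above can be sketched as follows: tag each token of a classification label with its part of speech and count the resulting POS sequences, so that dominant patterns such as "adjective noun" surface. The tiny hand-made lexicon below is a stand-in assumption; real work would use a trained tagger (e.g. from NLTK or spaCy), and the sample labels are invented for illustration.

```python
from collections import Counter

# Toy lexicon standing in for a trained POS tagger.
# Unknown tokens default to NOUN, which is the common case in labels.
LEXICON = {
    "of": "PREP", "and": "CONJ", "in": "PREP",
    "social": "ADJ", "natural": "ADJ", "digital": "ADJ",
}

def pos_pattern(label):
    """Map a label such as 'History of Art' to its POS sequence,
    e.g. 'NOUN PREP NOUN'."""
    return " ".join(LEXICON.get(tok.lower(), "NOUN") for tok in label.split())

def pattern_frequencies(labels):
    """Count how often each POS sequence occurs across the labels."""
    return Counter(pos_pattern(label) for label in labels)

sample_labels = ["Social Sciences", "Natural History", "History of Art",
                 "Art and Music", "Digital Libraries"]
frequencies = pattern_frequencies(sample_labels)
```

Once the frequent patterns are known, each one can be paired with a small grammar rule, which is what makes lightweight parsing of such labels tractable.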