Algorithms for Similarity Relation Learning from High Dimensional Data

@article{Janusz2014AlgorithmsFS,
  title={Algorithms for Similarity Relation Learning from High Dimensional Data},
  author={Andrzej Janusz},
  journal={Trans. Rough Sets},
  year={2014},
  volume={17},
  pages={174--292}
}
  • Andrzej Janusz
  • Published 6 February 2014
  • Computer Science
  • Trans. Rough Sets
The notion of similarity plays an important role in machine learning and artificial intelligence. It is widely used in tasks related to supervised classification, clustering, outlier detection, and planning. Moreover, in domains such as information retrieval or case-based reasoning, the concept of similarity is essential, as it is used at every phase of the reasoning cycle. Similarity itself, however, is a very complex concept that eludes formal definition. A similarity of two…

Hierarchy-based semantic embeddings for single-valued & multi-valued categorical variables

TLDR
This paper presents a method that uses prior knowledge of the application domain to support machine learning in cases with insufficient data, and proposes two embedding schemes for single-valued and multi-valued categorical data.

Identification of Product's Features Based on Customer Reviews

TLDR
The main aim of this paper is to present a method for mining reviews with respect to products' features: extracting those features and preparing a summary of reviews using a promising new technique, the Rule-Based Similarity Model.

Discernibility Matrix and Rules Acquisition Based Chinese Question Answering System

TLDR
The experimental results show that the proposed representation method of QA patterns has good flexibility to deal with the uncertainty caused by the Chinese word segmentation, and the proposed method has good performance at both MAP and MRR on the test data.

Mining Data from Coal Mines: IJCRS'15 Data Challenge

TLDR
The topic of this data mining competition was related to the problem of active safety monitoring in underground corridors and the task was to design an efficient method of predicting dangerous concentrations of methane in longwalls of a Polish coal mine.

A Resemblance Based Approach for Recognition of Risks at a Fire Ground

TLDR
This research splits the actions into a set of frames that compose a timeline of a firefighting process and applies a comparator framework to evaluate similarities between processes in order to recognize the risks that appear during the actions.

Supervised Discretization of Data Tables Using a Multi-core Graphics Card Processor (GPU)

Abstract. This article describes an algorithm developed for discretizing tables, based on massively parallelizing the computation of the optimal cut by simultaneously examining a very large number of…

References


Similarity Relation in Classification Problems

TLDR
This paper presents a methodology for constructing robust classifiers based on a concept called the Hierarchic Similarity Model (HSM), which can be used to build classifiers that may successfully compete with other popular methods such as boosted decision trees or the k-NN algorithm.

Transactions on Rough Sets XV

TLDR
The Dynamic Rule-Based Similarity (DRBS) model is presented, which is designed to improve the quality of the learned relation for high-dimensional data and can compete successfully with other state-of-the-art algorithms such as Random Forest or SVM.

Rough Sets Similarity-Based Learning from Databases

TLDR
This work presents a similarity-based method for learning from databases in the context of rough set theory; it uses rough set theory to analyse the attributes in the databases and to identify those relevant to the task attributes.

Rule-Based Similarity for Classification

  • Andrzej Janusz
  • Computer Science
    2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology
  • 2009
TLDR
A new model of similarity is presented, called Rule-based Similarity (RBS), in which the similarity is expressed in terms of higher-level binary features of objects, which may be associated with decision rules derived from data and interpreted as arguments for a similarity or for a dissimilarity of the examined objects.

Dynamic Rule-Based Similarity Model for DNA Microarray Data

TLDR
The Dynamic Rule-Based Similarity (DRBS) model is presented, which is designed to improve the quality of the learned relation for high-dimensional data and can compete successfully with other state-of-the-art algorithms such as Random Forest or SVM.

Semi-supervised clustering: probabilistic models, algorithms and experiments

TLDR
This thesis presents probabilistic models for semi-supervised clustering, develops algorithms based on these models, and empirically validates their performance through extensive experiments on data sets from different domains, e.g., text analysis, handwritten character recognition, and bioinformatics.

Utilization of Dynamic Reducts to Improve Performance of the Rule-Based Similarity Model for Highly-Dimensional Data

  • Andrzej Janusz
  • Computer Science
    2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
  • 2010
TLDR
The extended RBS model is presented, a novel rough set approach to the problem of learning a similarity relation from data that is significantly more accurate than the original RBS as well as some other popular classification algorithms, such as random forest or k-NN combined with several attribute selection methods.

Analogy-Based Reasoning in Classifier Construction

  • A. Wojna
  • Computer Science
    Trans. Rough Sets
  • 2005
TLDR
This dissertation introduces two new classification models based on the k-NN algorithm and proposes a method for dealing with such sets based on locally induced metrics that significantly improved the classification accuracy of methods with global models in the hardest tested problems.

Distance Metric Learning with Application to Clustering with Side-Information

TLDR
This paper presents an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in ℝ^n, learns a distance metric over ℝ^n that respects these relationships.

Correlation-based Feature Selection for Machine Learning

TLDR
This thesis addresses the problem of feature selection for machine learning through a correlation-based approach with CFS (Correlation-based Feature Selection), an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.