András Kocsor

Learn More
The core vector machine (CVM) is a recent approach for scaling up kernel methods based on the notion of minimum enclosing ball (MEB). Though conceptually simple, an efficient implementation still requires a sophisticated numerical solver. In this paper, we introduce the enclosing ball (EB) problem where the ball's radius is fixed and thus does not have to(More)
ROC ('receiver operator characteristics') analysis is a visual as well as numerical method used for assessing the performance of classification algorithms, such as those used for predicting structures and functions from sequence data. This review summarizes the fundamental concepts of ROC analysis and the interpretation of results using examples of sequence(More)
Kernel-based nonlinear feature extraction and classification algorithms are a popular new research direction in machine learning. This paper examines their applicability to the classification of phonemes in a phonological awareness drilling software package. We first give a concise overview of the nonlinear feature extraction methods such as kernel(More)
MOTIVATION Distance measures built on the notion of text compression have been used for the comparison and classification of entire genomes and mitochondrial genomes. The present study was undertaken in order to explore their utility in the classification of protein sequences. RESULTS We constructed compression-based distance measures (CBMs) using the(More)
In this paper we introduce a multilingual Named Entity Recognition (NER) system that uses statistical modeling techniques. The system identifies and classifies NEs in the Hungarian and English languages by applying AdaBoostM1 and the C4.5 decision tree learning algorithm. We focused on building as large a feature set as possible, and used a split and(More)
A highly accurate Named Entity (NE) corpus for Hungarian that is publicly available for research purposes is introduced in the paper, along with its main properties. The results of experiments that apply various Machine Learning models and classifier combination schemes are also presented to serve as a benchmark for further research based on the corpus. The(More)
Protein classification by machine learning algorithms is now widely used in structural and functional annotation of proteins. The Protein Classification Benchmark collection (http://hydra.icgeb.trieste.it/benchmark) was created in order to provide standard datasets on which the performance of machine learning methods can be compared. It is primarily meant(More)
Large-margin methods, such as support vector machines (SVMs), have been very successful in classification problems. Recently, maximum margin discriminant analysis (MMDA) was proposed that extends the large-margin idea to feature extraction. It often outperforms traditional methods such as kernel principal component analysis (KPCA) and kernel Fisher(More)