Weighted quality estimates in machine learning

  • Levon Budagyan, Ruben Abagyan
  • Volume 22, issue 21
MOTIVATION: Machine learning methods such as neural networks, support vector machines, and other classification and regression methods rely on iterative optimization of model quality over the space of the method's parameters. Model quality measures (accuracies, correlations, etc.) are frequently overly optimistic because the training sets are dominated by particular families and subfamilies. To overcome the bias, the dataset is usually reduced by filtering out closely related objects…
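The bias described in this abstract, and the usual filtering remedy, can be illustrated with a small sketch. The abstract is truncated here, so the weighting below is not presented as the authors' scheme: it is a generic, hypothetical density-based weight (each object weighted by one over the number of objects within a similarity threshold, itself included), contrasted with greedy redundancy filtering. The names `filter_redundant`, `density_weights`, `weighted_accuracy`, `close`, and the 0.1 threshold are illustrative assumptions.

```python
"""Hypothetical sketch: redundancy filtering vs. density-based weighting.

Neither function is taken from the paper; both are generic illustrations.
"""
from typing import Callable, List, Sequence


def filter_redundant(items: Sequence[float],
                     similar: Callable[[float, float], bool]) -> List[int]:
    """Greedy redundancy reduction: keep an item only if it is not similar
    to any item already kept. The result depends on the input order."""
    kept: List[int] = []
    for i, item in enumerate(items):
        if not any(similar(item, items[j]) for j in kept):
            kept.append(i)
    return kept


def density_weights(items: Sequence[float],
                    similar: Callable[[float, float], bool]) -> List[float]:
    """Assumed weighting: 1 / (number of similar items, counting the item
    itself), so a tight family of n near-duplicates sums to a weight of ~1."""
    return [1.0 / sum(1 for b in items if similar(a, b)) for a in items]


def weighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      weights: Sequence[float]) -> float:
    """Fraction of the total weight carried by correctly predicted objects."""
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / sum(weights)


def close(a: float, b: float) -> bool:
    # Hypothetical similarity criterion on a 1-D descriptor (reflexive).
    return abs(a - b) <= 0.1


# Toy data: one family of 8 near-identical objects (all predicted correctly)
# and 2 unrelated objects (both predicted incorrectly).
descriptors = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 5.0, 9.0]
y_true = [1] * 8 + [0, 1]
y_pred = [1] * 8 + [1, 0]

plain = weighted_accuracy(y_true, y_pred, [1.0] * len(y_true))
weighted = weighted_accuracy(y_true, y_pred, density_weights(descriptors, close))
kept = filter_redundant(descriptors, close)
filtered = weighted_accuracy([y_true[i] for i in kept],
                             [y_pred[i] for i in kept],
                             [1.0] * len(kept))
print(f"unweighted={plain:.3f} weighted={weighted:.3f} filtered={filtered:.3f}")
```

With these toy numbers the unweighted accuracy is 0.8, while both the weighted and the filtered estimates come out at 1/3, because the eight-member family counts roughly once. Unlike filtering, the weighting keeps every object in the set.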

Citing papers

kScore: a novel machine learning approach that is not dependent on the data structure of the training set
The Structural Risk Minimization principle and linear ε-insensitive loss terms are added to the kScore optimization function; the resulting kScore algorithm performs consistently across several datasets, producing results similar to or better than those of the most predictive machine learning algorithms tested.
Exploring classification strategies with the CoEPrA 2006 contest
This work proposes a novel approach that uses the available quantitative information directly for classification, rather than first fitting a regression model, and introduces a new type of loss function called weighted biased regression.
Multi-stage redundancy reduction: effective utilisation of small protein data sets
This work outlines a process of multi-stage redundancy reduction, whereby a small data set can be utilised effectively without compromising the integrity of the model or the testing procedure.
An artificial intelligence-based risk prediction model of myocardial infarction
Compared with traditional models, artificial intelligence-based machine learning models have better accuracy and real-time performance and, by taking a data-driven approach, can reduce the occurrence of in-hospital MI, thereby improving patients' cure rate and prognosis.
Prediction of One-Dimensional Structural Properties of Proteins by Integrated Neural Networks
This book discusses how proteins perform a wide range of biological functions, from molecular signaling and transport, molecular motors, and structural support to catalyzing chemical reactions as enzymes.
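The kScore summary mentions linear ε-insensitive loss terms and the Structural Risk Minimization principle. As a minimal sketch of those two ingredients in their common support-vector-regression form (not kScore's actual objective, which is not given here), the code below computes the mean linear ε-insensitive loss and adds an assumed L2 penalty on the weights of a linear model as the capacity term. The function names and the `epsilon` and `lam` values are illustrative.

```python
from typing import Sequence


def eps_insensitive_loss(y_true: Sequence[float], y_pred: Sequence[float],
                         epsilon: float) -> float:
    """Mean linear epsilon-insensitive loss: residuals within +/- epsilon cost
    nothing; beyond that, the cost grows linearly with the residual."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


def regularized_risk(w: Sequence[float], b: float,
                     X: Sequence[Sequence[float]], y: Sequence[float],
                     epsilon: float, lam: float) -> float:
    """Empirical epsilon-insensitive risk of the linear model f(x) = w.x + b
    plus an assumed L2 capacity penalty lam * ||w||^2."""
    preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]
    return eps_insensitive_loss(y, preds, epsilon) + lam * sum(wi * wi for wi in w)


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0]
    y_pred = [1.05, 2.5, 2.0]
    # Residuals 0.05, 0.5, 1.0 give losses 0, 0.4, 0.9 with epsilon = 0.1.
    print(eps_insensitive_loss(y_true, y_pred, epsilon=0.1))
    # A perfect fit leaves only the penalty term: 0.01 * 1.0**2.
    print(regularized_risk([1.0], 0.0, [[1.0], [2.0], [3.0]], y_true,
                           epsilon=0.1, lam=0.01))
```

The ε band makes small residuals free, which keeps the fit from chasing noise, while the penalty term trades training error against model capacity.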


References

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
The results indicate that, for real-world datasets similar to the authors', the best method for model selection is ten-fold stratified cross-validation, even when computational power allows more folds.
Instance-based learning algorithms
This paper extends the nearest neighbor algorithm, which has large storage requirements, and describes how those requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy.
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
  • A. Atiya
  • IEEE Transactions on Neural Networks
  • 2005
This book is an excellent choice for readers who wish to familiarize themselves with computational intelligence techniques or for an overview/introductory course in the field of computational intelligence.
Beating the hold-out: bounds for K-fold and progressive cross-validation
The paper shows that, for any nontrivial learning problem and any learning algorithm that is insensitive to example ordering, the k-fold estimate is strictly more accurate than a single hold-out estimate on 1/k of the data, for 2 < k < n (where n is the number of examples; k = n is leave-one-out), as measured by its variance and all higher moments.
Cross-validation for binary classification by real-valued functions: theoretical analysis
This paper devises new hold-out and cross-validation estimators for the case where real-valued functions are used as classifiers, and theoretically analyses their accuracy.
Weighting in sequence space: a comparison of methods in terms of generalized sequences
  • M. Vingron, P. R. Sibbald
  • Proceedings of the National Academy of Sciences of the United States of America
  • 1993
The paper presents a geometric analysis, based on a continuous sequence space, that provides a common framework for comparing the methods, and concludes that there are two "best" methods for weighting aligned biological sequences.
The Lack of A Priori Distinctions Between Learning Algorithms
If n ≫ m and π(x) is uniform, though, almost all of the terms in the sum have m′ = m, so the summand does not vary drastically between the d_X's of one m′ and the d_X's of another.
Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles
Ensembles of bidirectional recurrent neural network architectures, PSI-BLAST-derived profiles, and a large nonredundant training set are used to derive two new secondary structure predictors, and confusion matrices are reported.
Prediction of MHC Class I Binding Peptides by a Query Learning Algorithm Based on Hidden Markov Models
After 7 rounds of active learning with 181 peptides in total, the algorithm's predictive performance surpassed that of the best-performing matrix-based prediction to date, and by combining both methods, binder peptides could be predicted with 84% accuracy.
Combinatorial Methods in Density Estimation
  • Springer Series in Statistics
  • 2001
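Several of the references above deal with cross-validation, including the recommendation of ten-fold stratified cross-validation for model selection. A minimal sketch of stratified k-fold splitting using only the Python standard library follows. The function names, the round-robin dealing strategy, and the majority-class stand-in model are illustrative assumptions, not taken from any of the cited papers.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Model = Callable[[List[float], List[int], List[float]], List[int]]


def stratified_k_fold(labels: Sequence[Hashable], k: int,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices into k (train, test) pairs whose test folds keep roughly
    the class proportions of the full dataset."""
    if not 2 <= k <= len(labels):
        raise ValueError("k must be between 2 and the number of examples")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class, key=repr):  # deterministic class order
        members = by_class[y]
        rng.shuffle(members)
        # Deal members round-robin, continuing where the previous class
        # stopped, so fold sizes differ by at most one.
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)
    splits = []
    for fold in folds:
        test = sorted(fold)
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        splits.append((train, test))
    return splits


def cross_validated_accuracy(features: Sequence[float], labels: Sequence[int],
                             train_and_predict: Model, k: int = 10,
                             seed: int = 0) -> float:
    """Mean test-fold accuracy over stratified k-fold splits."""
    scores = []
    for train, test in stratified_k_fold(labels, k, seed):
        preds = train_and_predict([features[i] for i in train],
                                  [labels[i] for i in train],
                                  [features[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_baseline(train_x, train_y, test_x):
    """Hypothetical stand-in model: predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_x)


labels = [0] * 30 + [1] * 10
features = [float(i) for i in range(len(labels))]
splits = stratified_k_fold(labels, k=10)
# Each test fold holds 3 examples of class 0 and 1 of class 1 (the 3:1 ratio).
print([sum(labels[i] for i in test) for _, test in splits])
print(cross_validated_accuracy(features, labels, majority_baseline, k=10))
```

Here the baseline scores 0.75 in every fold, exactly the majority-class rate, because stratification preserves the 3:1 class ratio in each test fold; with unstratified random folds, the per-fold estimates could fluctuate around that value.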