A common problem when using complicated models for prediction and classification is that their complexity makes them hard, or impossible, to interpret. For some scenarios this might not be a limitation, since the priority is the accuracy of the model. In other situations the limitations might be severe, since additional aspects are …
Most highly accurate predictive modeling techniques produce opaque models. When comprehensible models are required, rule extraction is sometimes used to generate a transparent model based on the opaque model. Naturally, the extracted model should be as similar as possible to the opaque model. This criterion, called fidelity, is therefore a key part of the optimization …
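As an illustration of the fidelity criterion described above, the following is a minimal sketch that assumes fidelity is measured as the fraction of instances on which the extracted model and the opaque model make the same prediction. The function name `fidelity` and the list-based inputs are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Sequence


def fidelity(opaque_preds: Sequence[Hashable],
             extracted_preds: Sequence[Hashable]) -> float:
    """Fraction of instances where both models predict the same label.

    Assumption: fidelity is simple prediction agreement; the true class
    labels are deliberately not used.
    """
    if len(opaque_preds) != len(extracted_preds):
        raise ValueError("prediction sequences must have the same length")
    if len(opaque_preds) == 0:
        raise ValueError("at least one prediction is required")
    agreements = sum(o == e for o, e in zip(opaque_preds, extracted_preds))
    return agreements / len(opaque_preds)


# Hypothetical predictions from an opaque model and an extracted rule set.
opaque = ["yes", "no", "yes", "no"]
extracted = ["yes", "no", "no", "no"]
print(fidelity(opaque, extracted))
```

Note that a model can have high fidelity and still be inaccurate, since the comparison is against the opaque model's predictions rather than the true labels.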
To improve protein folding simulations, we investigated a new search strategy in combination with the simple genetic algorithm on a two-dimensional lattice model. This search strategy, which we call systematic crossover, couples the best individuals, tests every possible crossover point, and takes the two best individuals for the next generation. We compared …
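The systematic crossover step described above can be sketched as follows. This is only an illustration under stated assumptions: genomes are equal-length tuples, crossover is one-point, higher fitness is better, and the two survivors are chosen among the offspring only (the abstract does not say whether the parents also compete). The names `systematic_crossover` and `fitness` are hypothetical.

```python
from typing import Callable, List, Tuple

Genome = Tuple[int, ...]


def systematic_crossover(parent_a: Genome,
                         parent_b: Genome,
                         fitness: Callable[[Genome], float]
                         ) -> Tuple[Genome, Genome]:
    """Try every one-point crossover of two parents and keep the two best.

    Assumptions: equal-length genomes, higher fitness is better, and only
    offspring (not the parents) are candidates for the next generation.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2:
        raise ValueError("genomes need at least two positions")

    offspring: List[Genome] = []
    # Every interior cut point yields two complementary children.
    for point in range(1, len(parent_a)):
        offspring.append(parent_a[:point] + parent_b[point:])
        offspring.append(parent_b[:point] + parent_a[point:])

    # Rank all children by fitness, best first, and keep the top two.
    offspring.sort(key=fitness, reverse=True)
    return offspring[0], offspring[1]


# Toy example: fitness is the number of ones in the genome.
best, second = systematic_crossover((1, 1, 0, 0), (0, 0, 1, 1), fitness=sum)
print(best, second)
```

Testing every cut point costs one fitness evaluation per child, so a pair of genomes of length n requires 2(n - 1) evaluations instead of the two needed by a single random crossover.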
This paper addresses the important issue of the tradeoff between accuracy and comprehensibility in data mining. The paper presents results which show that it is, to some extent, possible to bridge this gap. A method for rule extraction from opaque models (Genetic Rule EXtraction – G-REX) is used to show the effects on accuracy when forcing the creation of …
This paper presents G-REX, a versatile data mining framework based on genetic programming. What distinguishes G-REX from other GP frameworks is that it does not strive to be a general-purpose framework. This allows G-REX to include more functionality specific to data mining, like preprocessing, evaluation, and optimization methods, but also a multitude of …
Solvent entropy is a force to consider in protein folding and protein design but is difficult to model. It is investigated here in the context of the HP model: two types of residues, hydrophobic and hydrophilic, are modeled on a lattice. Nine chains and two- and three-dimensional simulations are compared. We show that considering solvent entropy alone, …
In conformal prediction, predictive models output sets of predictions with a bound on the error rate. In classification, this means that the probability of excluding the correct class is, in the long run, lower than a predefined significance level. Since the error rate is guaranteed, the most important criterion for conformal predictors is …
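To make the idea of set-valued predictions concrete, here is a minimal sketch of an inductive (split) conformal classifier. It is a standard textbook variant and not necessarily the setup used in the paper. The assumptions are that the nonconformity score of a label is one minus the underlying model's probability estimate for it, and that `cal_scores` holds the scores of the true labels on a held-out calibration set. All names are hypothetical.

```python
from typing import Dict, Sequence, Set


def p_value(cal_scores: Sequence[float], score: float) -> float:
    """Share of calibration scores at least as nonconforming as `score`.

    The +1 terms count the test example itself among the examples.
    """
    at_least = sum(1 for s in cal_scores if s >= score)
    return (at_least + 1) / (len(cal_scores) + 1)


def prediction_set(cal_scores: Sequence[float],
                   class_probs: Dict[str, float],
                   significance: float) -> Set[str]:
    """Keep every label whose p-value exceeds the significance level.

    Assumption: nonconformity = 1 - estimated probability of the label.
    """
    return {
        label
        for label, prob in class_probs.items()
        if p_value(cal_scores, 1.0 - prob) > significance
    }


# Hypothetical calibration scores and probability estimates for one test case.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
class_probs = {"a": 0.72, "b": 0.23, "c": 0.05}
print(prediction_set(cal_scores, class_probs, significance=0.2))
```

A lower significance level gives a stricter error bound and therefore larger prediction sets; a higher level lets the predictor exclude more labels.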
Although data mining is performed to support decision making, many of the most powerful techniques, like neural networks and ensembles, produce opaque models. This lack of interpretability is an obvious disadvantage, since decision makers normally require some sort of explanation before taking action. To achieve comprehensibility, accuracy is often …