A.-L. Boulesteix

Most genetic diseases are complex, i.e. associated with combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction patterns given as expressions linked by logical operators. Methods for multiple testing in high-dimensional settings can be applied when many SNPs are considered ...
In statistical bioinformatics research, different optimization mechanisms potentially lead to "over-optimism" in published papers. The present empirical study illustrates these mechanisms through a concrete example from an active research field. The investigated sources of over-optimism include the optimization of the data sets, of the settings, of the ...
The first genome sequence assemblies of farm animal species are now accessible through public domain databases, and further sequencing projects are in rapid progress. In addition, large collections of expressed sequences have been obtained, which will aid in constructing annotated transcript maps for many economically important species. Thus, the breeding ...
We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. In particular, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic. Therefore, several standard theorems on properties of U-statistics apply. Notably, it has minimal variance among all unbiased ...
We consider simulation studies on supervised learning which measure the performance of a classification or regression method based on i.i.d. samples randomly drawn from a pre-specified distribution. In a typical setting, a large number of data sets are generated and split into training and test sets used to train and evaluate models, respectively. Here, we ...
For the last eight years, microarray-based class prediction has been the subject of numerous publications in medicine, bioinformatics and statistics journals. However, in many articles, the assessment of classification accuracy is carried out using suboptimal procedures and receives little attention. In this paper, we carefully review various statistical ...
Svitlana Tyekucheva and Francesca Chiaromonte provide an attractive solution to the problem of estimating the inverse covariance matrix with high-dimensional data and small samples, which is an important challenge in modern bioinformatics. 1. Optimizing the noise parameter. Our first comment is on the optimization of the model parameter τ controlling ...