Giuseppe Jurman

Learn More
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung(More)
Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene(More)
The concordance of RNA-sequencing (RNA-seq) with microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed using a range of chemical treatment conditions. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed in triplicate(More)
MOTIVATION We propose a method for studying the stability of biomarker lists obtained from functional genomics studies. It is common to adopt resampling methods to tune and evaluate marker-based diagnostic and prognostic systems in order to prevent selection bias. Such caution promotes honest estimation of class prediction, but leads to alternative sets of(More)
We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the(More)
We describe a new wrapper algorithm for fast feature ranking in classification problems. The Entropy-based Recursive Feature Elimination (E-RFE) method eliminates chunks of uninteresting features according to the entropy of the weights distribution of a SVM classifier. With specific regard to DNA microarray datasets, the method is designed to support(More)
We show that the Confusion Entropy, a measure of performance in multiclass problems has a strong (monotone) relation with the multiclass generalization of a classical metric, the Matthews Correlation Coefficient. Analytical results are provided for the limit cases of general no-information (n-face dice rolling) of the binary classification. Computational(More)
Many functions have been recently defined to assess the similarity among networks as tools for quantitative comparison. They stem from very different frameworks and they are tuned for dealing with different situations. Here we show an overview of the spectral distances, highlighting their behavior in some basic cases of static and dynamic synthetic and real(More)
Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition and several public databases exist. However, not all data are publicly available, and even when available, it is unknown whether the published results are reproducible by(More)
The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data(More)