Breast Cancer Microarray and RNASeq Data Integration Applied to Classification

Daniel Castillo, Juan Manuel Gálvez, Luis Javier Herrera, Ignacio Rojas
Although Next-Generation Sequencing (NGS) now has more impact than microarrays, a huge volume of microarray data has still not been processed. Owing largely to its use over many years, the latter remains the most important source of biological information and a very important potential source of genetic knowledge that deserves appropriate analysis. Thanks to the two techniques, there is now a huge amount of data that allows us to obtain robust results from its…
1 Citation
SGL-SVM: a novel method for tumor classification via support vector machine with sparse group Lasso.
The experimental results show that the proposed method achieves a higher classification accuracy and selects fewer feature genes, which can be widely applied in classification for high-dimensional and small-sample tumor datasets.


virtualArray: a R/bioconductor package to merge raw data from different microarray platforms
The virtualArray software package can combine raw data sets from almost any chip type based on current annotations from NCBI GEO or Bioconductor, so researchers can easily integrate their own microarray data with data from public repositories or other sources that use different microarray chip types.
limma powers differential expression analyses for RNA-sequencing and microarray studies
The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
NOIseq: a RNA-seq differential expression method robust for sequencing depth biases
It is observed that many RNA-seq datasets have not reached saturation for detection of expressed genes and that the relative proportion of different transcript biotypes changes with increasing sequencing depth. A novel differential expression methodology, NOISeq, that is robust to the number of reads is proposed.
Gene Expression Studies Using Affymetrix Microarrays
  • H. Göhlmann, W. Talloen
  • Biology, Computer Science
  • Chapman and Hall / CRC mathematical and computational biology series
  • 2009
Within its book category, Gene Expression Studies Using Affymetrix Microarrays is described as one of the most sought-after titles.
Minimum redundancy feature selection from microarray gene expression data
  • C. Ding, H. Peng
  • Biology, Computer Science
  • Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003
  • 2003
Feature sets obtained through the minimum redundancy-maximum relevance framework represent a broader spectrum of phenotype characteristics than those obtained through standard ranking methods; they are more robust, generalize well to unseen data, and lead to significantly improved classifications in extensive experiments on five gene expression data sets.
A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae
This study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microarrays) for gene expression analysis and addresses the contribution of the different steps involved in the analysis of RNA-seq data.
RNA-Seq: a revolutionary tool for transcriptomics
The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
NCBI GEO: mining tens of millions of expression profiles—database and tools update
A summary of the GEO database structure and user facilities is provided, and recent enhancements to database design, performance, submission format options, data query and retrieval utilities are described.
TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions
TopHat2 is described, which incorporates many significant enhancements to TopHat, and combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes.
HTSeq—a Python framework to work with high-throughput sequencing data
This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and presents htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.