The molecular portraits of breast tumors are conserved across microarray platforms


Validation of a novel gene expression signature in independent data sets is a critical step in the development of a clinically useful test for cancer patient risk stratification. However, validation is often unconvincing because the test set is typically small. To overcome this problem, we used publicly available breast cancer gene expression data sets and a novel approach to data fusion to validate a new breast tumor intrinsic gene list.
A 105-tumor training set containing 26 sample pairs was used to derive the new breast tumor intrinsic gene list. This intrinsic list contained 1300 genes and a proliferation signature that was not present in previous breast intrinsic gene sets. We tested this list as a survival predictor on a data set of 311 tumors compiled from three independent microarray studies that were fused into a single data set using Distance Weighted Discrimination. When the new intrinsic gene set was used to hierarchically cluster this combined test set, tumors were grouped into the LumA, LumB, Basal-like, HER2+/ER-, and Normal Breast-like tumor subtypes that we had demonstrated in previous data sets. These subtypes were associated with significant differences in Relapse-Free and Overall Survival. Multivariate Cox analysis of the combined test set showed that the intrinsic subtype classifications added significant prognostic information that was independent of standard clinical predictors. From the combined test set, we developed an objective and unchanging classifier based upon five intrinsic subtype mean expression profiles (i.e., centroids), which is designed for single sample predictions (SSP). The SSP approach was applied to two additional independent data sets and consistently predicted survival in both systemically treated and untreated patient groups.
This study validates the "breast tumor intrinsic" subtype classification as an objective means of tumor classification that should be translated into a clinical assay for further retrospective and prospective validation. In addition, our method of combining existing data sets can be used to robustly validate the potential clinical value of any new gene expression profile.
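
The single sample predictor described above assigns each new tumor to the subtype whose centroid (mean expression profile) it most resembles. The sketch below illustrates that nearest-centroid idea in plain Python. It is not the authors' implementation: the use of Spearman rank correlation as the similarity measure is an assumption made here for illustration, the function names are hypothetical, and the four-gene expression values are invented toy numbers rather than data from the study.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def build_centroids(profiles, labels):
    """Average the expression profiles of each subtype, gene by gene."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict_subtype(sample, centroids):
    """Assign the subtype whose centroid correlates best with the sample."""
    return max(centroids, key=lambda label: spearman(sample, centroids[label]))


# Toy training data (invented values): rows are tumors, columns are genes.
training_profiles = [
    [2.0, 1.5, -1.0, -2.0],
    [1.8, 1.2, -0.8, -1.5],
    [-2.0, -1.0, 1.5, 2.2],
    [-1.5, -1.2, 1.0, 2.0],
]
training_labels = ["LumA", "LumA", "Basal-like", "Basal-like"]

centroids = build_centroids(training_profiles, training_labels)
print(predict_subtype([1.0, 0.7, -0.2, -0.9], centroids))
```

Because the centroids are computed once and then held fixed, each new sample can be classified on its own, without re-clustering the whole cohort. That property is what makes such a classifier "objective and unchanging" in the sense used in the abstract.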

DOI: 10.1186/1471-2164-7-96

Cite this paper

@article{Hu2006TheMP,
  title={The molecular portraits of breast tumors are conserved across microarray platforms},
  author={Zhiyuan Hu and Cheng Fan and Daniel Sunho Oh and J. S. Marron and Xiaping He and Bahjat F. Qaqish and Chad Livasy and Lisa A. Carey and Evangeline Reynolds and Lynn G. Dressler and Andrew B. Nobel and Joel S. Parker and Matthew G. Ewend and Lynda R. Sawyer and Junyuan Wu and Yudong Liu and Rita Nanda and Maria Tretiakova and Alejandra Ruiz Orrico and Donna Dreher and Juan Pablo Palazzo and Laurent Perreard and Edward W. Nelson and Mary C. Mone and H. J. Hansen and Michael E Mullins and John Quackenbush and Matthew J. Ellis and Olufunmilayo I. Olopade and Philip S. Bernard and Charles M. Perou},
  journal={BMC Genomics},
  year={2006},
  volume={7},
  pages={96}
}