Learn More
Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets,(More)
Type 2 diabetes mellitus (DM2) affects over 110 million people worldwide and is a principal contributor to atherosclerotic vascular disease, blindness, amputation and kidney failure 1. Defects in insulin secretion are observed early in individuals with maturity-onset diabetes of the young, a monogenic form of type 2 diabetes 2 ; insulin resistance at(More)
Prostate tumors are among the most heterogeneous of cancers, both histologically and clinically. Microarray expression analysis was used to determine whether global biological differences underlie common pathological features of prostate cancer and to identify genes that might anticipate the clinical behavior of this disease. While no expression correlates(More)
In this paper we present a new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data. The method can best be thought of as an analysis approach, to guide and assist in the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering, and in(More)
Diffuse large B-cell lymphoma (DLBCL), the most common lymphoid malignancy in adults, is curable in less than 50% of patients. Prognostic models based on pre-treatment characteristics, such as the International Prognostic Index (IPI), are currently used to predict outcome in DLBCL. However, clinical outcome models identify neither the molecular basis of(More)
Embryonal tumours of the central nervous system (CNS) represent a heterogeneous group of tumours about which little is known biologically, and whose diagnosis, on the basis of morphologic appearance alone, is controversial. Medulloblastomas, for example, are the most common malignant brain tumour of childhood, but their pathogenesis is unknown, their(More)
We describe here the use of nonnegative matrix factorization (NMF), an algorithm based on decomposition by parts that can reduce the dimension of expression data from thousands of genes to a handful of metagenes. Coupled with a model selection mechanism, adapted to work for any stochastic clustering algorithm, NMF is an efficient method for identification(More)
MOTIVATION Well-annotated gene sets representing the universe of the biological processes are critical for meaningful and insightful interpretation of large-scale genomic data. The Molecular Signatures Database (MSigDB) is one of the most widely used repositories of such sets. RESULTS We report the availability of a new version of the database, MSigDB(More)
A comprehensive understanding of the molecular vulnerabilities of every type of cancer will provide a powerful roadmap to guide therapeutic approaches. Efforts such as The Cancer Genome Atlas Project will identify genes with aberrant copy number, sequence, or expression in various cancer types, providing a survey of the genes that may have a causal role in(More)
Using gene expression data to classify tumor types is a very promising tool in cancer diagnosis. Previous works show several pairs of tumor types can be successfully distinguished by their gene expression patterns (Golub et al. 1999, Ben-Dor et al. 2000, Alizadeh et al. 2000). However, the simultaneous classification across a heterogeneous set of tumor(More)