Singular-value decomposition (SVD) [and principal component analysis (PCA)] is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted ...
We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for ...
We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is “similar” to a training sample, then the testing error is close to the training error. This provides a novel approach, different from complexity or stability arguments, to study generalization of learning algorithms. One advantage of ...
This paper considers the problem of clustering a partially observed unweighted graph, i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining pairs we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense ...
BACKGROUND: The subtype distribution of lymphoid neoplasms in Southwest China was analyzed according to WHO classifications. This study aims to analyze the subtype distribution of lymphomas in Southwest China. METHODS: Lymphoid neoplasms diagnosed within 9 years in a single institution in Southwest China were analyzed according to the WHO classification. ...
Principal component analysis plays a central role in statistics, engineering, and science. Because of the prevalence of corrupted data in real-world applications, much research has focused on developing robust algorithms. Perhaps surprisingly, these algorithms are unequipped, indeed unable, to deal with outliers in the high-dimensional setting where the ...
The expression of carcino-embryonic antigen by colorectal cancer is an example of oncogenic activation of embryonic gene expression. Hypothesizing that oncogenesis-recapitulating-ontogenesis may represent a broad programmatic commitment, we compared gene expression patterns of human colorectal cancers (CRCs) and mouse colon tumor models to those of mouse ...
The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies, like linkage analysis and gene expression profiling, tend to be most useful for classification and characterization but do not provide sufficient information to ...