Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called… (More)
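The criterion sketched above (maximal relevance to the class, minimal redundancy among already-selected features, both measured by mutual information) can be illustrated with a small greedy selector. This is a hedged sketch of the minimal-redundancy-maximal-relevance idea for discrete features, not the paper's implementation; the function names and the simple plug-in MI estimator are our own.

```python
import numpy as np

def mutual_info(x, y):
    """Plug-in estimate of mutual information (in nats) between two
    discrete 1-D arrays, from empirical joint and marginal frequencies."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def mrmr_select(X, y, k):
    """Greedy max-relevance, min-redundancy feature selection.
    X: (n_samples, n_features) discrete matrix; y: labels; k: count to pick.
    Each step adds the feature maximizing (relevance to y) minus
    (mean MI with features already selected)."""
    n_feat = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

On data where feature 1 duplicates feature 0, the redundancy term steers the second pick away from the duplicate even when its relevance is equally high.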

MOTIVATION
Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system.
RESULTS
Most current discriminative methods for protein fold prediction use the… (More)

We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = FG<sup>T</sup>, we focus on algorithms in which G is restricted to nonnegative entries while the data matrix X is allowed to have mixed signs, thus extending the applicable range of NMF methods. We also consider… (More)
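A minimal sketch of such a mixed-sign ("semi-") factorization: the unconstrained factor F is solved in closed form by least squares, and the nonnegative G is refined by a multiplicative update that splits matrices into their positive and negative parts. The update form follows the semi-NMF literature, but the iteration count and initialization here are illustrative assumptions.

```python
import numpy as np

def seminmf(X, r, n_iter=200, seed=0):
    """Semi-NMF sketch: X ≈ F @ G.T with G >= 0 while X may have mixed signs.
    F: closed-form least squares given G; G: multiplicative update using
    positive/negative parts, which keeps G nonnegative."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = rng.random((m, r)) + 0.1
    pos = lambda A: (np.abs(A) + A) / 2   # elementwise positive part
    neg = lambda A: (np.abs(A) - A) / 2   # elementwise negative part
    for _ in range(n_iter):
        # Best F for the current G (ordinary least squares).
        F = X @ G @ np.linalg.pinv(G.T @ G)
        XtF, FtF = X.T @ F, F.T @ F
        # Multiplicative step: numerator/denominator are both nonnegative.
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + 1e-12))
    return F, G
```

On exactly low-rank data that admits a nonnegative G, the reconstruction error typically drops close to zero while G stays elementwise nonnegative.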

Selecting a small subset of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have a certain redundancy and study methods to… (More)
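The redundancy problem described above can be illustrated with a toy selector: rank genes by a simple differential-expression score, then greedily skip genes that are highly correlated with genes already kept. The score, the `max_corr` threshold, and the function name are hypothetical; this is a sketch of redundancy-aware ranking, not the paper's method.

```python
import numpy as np

def select_nonredundant(X, y, k, max_corr=0.9):
    """Greedy redundancy-aware gene selection for a two-class problem.
    Rank columns of X by a t-like differential-expression score, then keep
    top-ranked genes whose absolute Pearson correlation with every kept
    gene stays at or below max_corr."""
    X0, X1 = X[y == 0], X[y == 1]
    score = np.abs(X0.mean(0) - X1.mean(0)) / (X0.std(0) + X1.std(0) + 1e-12)
    kept = []
    for j in np.argsort(-score):          # best score first
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) <= max_corr
               for s in kept):
            kept.append(int(j))
        if len(kept) == k:
            break
    return kept
```

A near-duplicate of an already-kept gene is skipped, so a weaker but independent gene makes the cut instead; a plain top-k ranking would have returned the redundant pair.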

Currently, most research on nonnegative matrix factorization (NMF) focuses on 2-factor $X=FG^T$ factorization. We provide a systematic analysis of 3-factor $X=FSG^T$ NMF. While unconstrained 3-factor NMF is equivalent to unconstrained 2-factor NMF, constrained 3-factor NMF brings new features to constrained 2-factor NMF. We study the orthogonality… (More)
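A hedged sketch of the 3-factor form with multiplicative updates, in the style of orthogonal tri-factorization updates; the iteration count and random initialization are illustrative assumptions, not tuned settings.

```python
import numpy as np

def tri_nmf(X, r1, r2, n_iter=300, seed=0):
    """3-factor NMF sketch: X ≈ F @ S @ G.T with all factors nonnegative.
    The extra middle factor S absorbs scale, letting F and G be pushed
    toward (near-)orthogonal cluster indicators."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, r1))
    S = rng.random((r1, r2))
    G = rng.random((m, r2))
    eps = 1e-12
    for _ in range(n_iter):
        G *= np.sqrt((X.T @ F @ S) / (G @ (G.T @ X.T @ F @ S) + eps))
        F *= np.sqrt((X @ G @ S.T) / (F @ (F.T @ X @ G @ S.T) + eps))
        S *= np.sqrt((F.T @ X @ G) / (F.T @ F @ S @ (G.T @ G) + eps))
    return F, S, G
```

All three factors remain nonnegative under the multiplicative updates, and on a nonnegative low-rank matrix the reconstruction typically captures most of the norm.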

Current nonnegative matrix factorization (NMF) deals with the X = FG<sup>T</sup> type. We provide a systematic analysis and extensions of NMF to the symmetric W = HH<sup>T</sup> and the weighted W = HSH<sup>T</sup>. We show that (1) W = HH<sup>T</sup> is equivalent to Kernel K-means clustering and the Laplacian-based spectral clustering. (2) X = FG<sup>T</sup> is equivalent to simultaneous clustering of… (More)
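The symmetric case can be sketched with a damped multiplicative update for W ≈ HH<sup>T</sup>; taking the argmax over rows of H then yields a clustering, mirroring the kernel K-means connection stated above. The damping factor 0.5 and the iteration count are assumptions of this sketch.

```python
import numpy as np

def sym_nmf(W, k, n_iter=500, seed=0):
    """Symmetric NMF sketch: W ≈ H @ H.T with H >= 0.
    Damped multiplicative update; rows of H act as soft cluster
    memberships, so H.argmax(1) reads off a hard clustering."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k)) + 0.1
    for _ in range(n_iter):
        H *= 0.5 + 0.5 * (W @ H) / (H @ (H.T @ H) + 1e-12)
    return H
```

On a similarity matrix with two clean blocks, the argmax labels recover the block structure, which is the clustering interpretation the snippet refers to.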

Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. <i>K</i>-means clustering is a commonly used data clustering method for unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for <i>K</i>-means… (More)
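The PCA–K-means connection can be seen numerically: on two well-separated clusters, thresholding the continuous first-principal-component scores at zero recovers the discrete cluster membership. A small self-contained demo (the blob locations and scales are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in the plane.
A = rng.normal(loc=(-5, 0), scale=0.5, size=(20, 2))
B = rng.normal(loc=(+5, 0), scale=0.5, size=(20, 2))
X = np.vstack([A, B])

Xc = X - X.mean(0)                       # center, as PCA requires
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]                         # scores on the first principal component
labels = (pc1 > 0).astype(int)           # threshold the continuous indicator
```

The sign of `pc1` splits the data exactly along the blob boundary, which is the discrete K-means assignment the continuous solution relaxes.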

Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method emphasizing joint ℓ<inf>2,1</inf>-norm minimization on both… (More)
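For reference, the joint ℓ<inf>2,1</inf>-norm of a matrix is the sum of the ℓ<inf>2</inf> norms of its rows; minimizing it drives entire rows to zero, which is what makes it useful for discarding whole features at once. A one-function sketch:

```python
import numpy as np

def l21_norm(W):
    """ℓ2,1 norm: sum over rows of the row-wise ℓ2 norms.
    A zero row contributes nothing, so minimizing this norm
    encourages row-sparsity, i.e. dropping whole features."""
    return np.linalg.norm(W, axis=1).sum()
```

Unlike the squared Frobenius norm, this penalty is not differentiable at zero rows, which is exactly why it produces rows that are exactly zero rather than merely small.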

Principal component analysis (PCA) minimizes the sum of squared errors (<i>L</i><inf>2</inf>-norm) and is sensitive to the presence of outliers. We propose a <i>rotational invariant L</i><inf>1</inf>-norm PCA (<i>R</i><inf>1</inf>-PCA). <i>R</i><inf>1</inf>-PCA is similar to PCA in that (1) it has a unique global solution, (2) the solutions are principal… (More)
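The robustness idea can be sketched with iteratively reweighted PCA: each point is weighted by the inverse norm of its residual, so outliers with large residuals lose influence on the fitted subspace. This is an IRLS sketch of the rotational-invariant L1 objective, not the paper's exact algorithm; the median centering and the small epsilon are assumptions.

```python
import numpy as np

def r1_pca(X, k, n_iter=30):
    """Rotational-invariant L1 PCA sketch via iteratively reweighted
    least squares: minimize sum_i ||x_i - U U^T x_i|| (unsquared) by
    alternating eigen-decomposition of a weighted covariance with
    weights 1/||residual_i||."""
    Xc = X - np.median(X, axis=0)        # robust centering (an assumption)
    w = np.ones(X.shape[0])
    for _ in range(n_iter):
        C = (Xc * w[:, None]).T @ Xc     # weighted covariance
        _, vecs = np.linalg.eigh(C)
        U = vecs[:, -k:]                 # top-k principal directions
        resid = Xc - Xc @ U @ U.T
        w = 1.0 / (np.linalg.norm(resid, axis=1) + 1e-4)
    return U
```

On data lying along one axis plus a single large outlier, plain PCA tilts toward the outlier while the reweighted fit stays aligned with the inliers.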

An important application of graph partitioning is data clustering using a graph model: the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. Here we propose a new algorithm for graph partitioning with an objective function that follows the min-max clustering principle.… (More)
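A spectral sketch of such a bipartitioning: relax the discrete cluster indicator and read the split off the sign of the second-largest eigenvector of D<sup>-1/2</sup>WD<sup>-1/2</sup>. This is the standard spectral relaxation used for such cut objectives; any objective-specific refinement of the min-max cut solution is omitted here.

```python
import numpy as np

def spectral_bipartition(W):
    """Spectral bipartition sketch for a weighted adjacency matrix W.
    Form the degree-normalized matrix D^{-1/2} W D^{-1/2}; its largest
    eigenvector is trivial (Perron), and the sign pattern of the
    second-largest eigenvector is the relaxed two-way cluster indicator."""
    d = W.sum(1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    M = Dinv @ W @ Dinv
    _, vecs = np.linalg.eigh(M)          # eigenvalues ascending
    q = Dinv @ vecs[:, -2]               # second-largest eigenvector, rescaled
    return (q > 0).astype(int)
```

On a graph with two dense blocks joined by one weak edge, the sign split recovers the two blocks, which is the clustering the adjacency matrix encodes.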