• Publications
Nonlinear Dimension Reduction via Local Tangent Space Alignment
TLDR
A new algorithm for manifold learning and nonlinear dimension reduction is presented: given a set of unorganized data points sampled with noise from a manifold, it learns local tangent spaces by fitting an affine subspace in a neighborhood of each data point.
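The tangent-space step can be sketched with local PCA: take a point's nearest neighbors, center them, and read the tangent directions off an SVD. This is a minimal illustration, not the paper's full alignment algorithm; the neighborhood size `k`, target dimension `d`, and the toy curve are illustrative assumptions.

```python
import numpy as np

def local_tangent_space(points, i, k, d):
    """Estimate a d-dimensional tangent space at points[i] by fitting an
    affine subspace (local PCA) to the k nearest neighbors."""
    dists = np.linalg.norm(points - points[i], axis=1)
    nbrs = points[np.argsort(dists)[:k]]        # k nearest neighbors
    centered = nbrs - nbrs.mean(axis=0)         # center on the affine fit's origin
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:d]                               # rows span the estimated tangent space

# toy data: noisy samples from the curve (t, t^2, 0) embedded in R^3
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
pts = np.column_stack([t, t**2, np.zeros_like(t)])
pts += 0.001 * rng.standard_normal(pts.shape)
basis = local_tangent_space(pts, 25, 8, 1)      # tangent near t = 0.5
```

Near t = 0.5 the curve's tangent direction is proportional to (1, 2t, 0) = (1, 1, 0), so the recovered basis vector should have nearly equal x and y components and a negligible z component.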
R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization
TLDR
Experiments on several real-life datasets show that R1-PCA can effectively handle outliers; it is also shown that L1-norm K-means leads to poor results, while R1-K-means outperforms standard K-means.
Learning Social Infectivity in Sparse Low-rank Networks Using Multi-dimensional Hawkes Processes
TLDR
This paper proposes a convex optimization approach to discovering the hidden network of social influence by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes, emphasizing the mutually exciting nature of the event-occurrence dynamics.
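As a concrete illustration of the mutual-excitation idea, a minimal sketch of a multi-dimensional Hawkes intensity with an exponential kernel follows; the base rates `mu`, influence matrix `A`, and decay `beta` are illustrative assumptions, not parameters estimated in the paper.

```python
import numpy as np

def hawkes_intensity(t, events, mu, A, beta=1.0):
    """lambda_i(t) = mu_i + sum over past events (s, j) of A[i, j] * exp(-beta*(t - s)).

    mu[i] is the base rate of dimension i; A[i, j] is how strongly an event
    in dimension j excites dimension i (the hidden influence network)."""
    lam = mu.astype(float).copy()
    for s, j in events:                 # events: list of (time, dimension) pairs
        if s < t:
            lam += A[:, j] * np.exp(-beta * (t - s))
    return lam

mu = np.array([0.1, 0.2])
A = np.array([[0.0, 0.5],               # illustrative influence matrix
              [0.3, 0.0]])
events = [(1.0, 0), (2.0, 1)]
lam = hawkes_intensity(3.0, events, mu, A)
```

In the paper's setting, sparsity and low rank of the matrix playing the role of `A` are what the convex program encourages.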
A min-max cut algorithm for graph partitioning and data clustering
TLDR
This paper proposes a new graph-partitioning algorithm whose objective function follows the min-max clustering principle, and demonstrates that a linearized search order based on linkage differential is better than one based on the Fiedler vector, providing another effective partitioning method.
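The min-max cut objective itself is easy to state: for a two-way partition (A, B), minimize cut(A, B)/W(A, A) + cut(A, B)/W(B, B), favoring small between-cluster weight and large within-cluster weight. A minimal sketch on a toy similarity matrix (the graph and weights are illustrative assumptions):

```python
import numpy as np

def minmax_cut(W, labels):
    """Min-max cut objective for a two-way partition of a weighted graph:
    cut(A, B)/W(A, A) + cut(A, B)/W(B, B), where W(A, A) sums all
    within-A weights and cut(A, B) sums the between-cluster weights."""
    A = labels == 0
    B = ~A
    cut = W[np.ix_(A, B)].sum()
    return cut / W[np.ix_(A, A)].sum() + cut / W[np.ix_(B, B)].sum()

# toy graph: two triangles joined by the single bridge edge (2, 3)
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

good = minmax_cut(W, np.array([0, 0, 0, 1, 1, 1]))  # cut along the bridge
bad = minmax_cut(W, np.array([0, 0, 1, 1, 1, 1]))   # cut inside a triangle
```

Cutting the bridge gives a much smaller objective than splitting a triangle, which is the behavior the min-max principle is designed to reward.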
Two supervised learning approaches for name disambiguation in author citations
TLDR
Two supervised learning approaches to disambiguating authors in citations are investigated: one uses the naive Bayes probability model, a generative model; the other uses support vector machines (SVMs) with a vector-space representation of citations, a discriminative model.
Spectral Relaxation for K-means Clustering
TLDR
It is shown that a relaxed version of the trace-maximization problem possesses global optimal solutions, which can be obtained by computing a partial eigendecomposition of the Gram matrix; the cluster assignment for each data vector can then be found by computing a pivoted QR decomposition of the eigenvector matrix.
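The pipeline in the summary can be sketched as follows: top-k eigenvectors of the Gram matrix, then a pivoted QR of the eigenvector matrix. The representative-row assignment here is a simplified stand-in for the paper's full QR-based procedure, and the toy blobs are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import qr

def spectral_kmeans_labels(X, k):
    """Sketch of the spectral relaxation of k-means: eigenvectors of the
    Gram matrix for the k largest eigenvalues, then a pivoted QR to pick
    k representative rows, assigning each point to its best-aligned one."""
    G = X @ X.T                              # Gram matrix of the data vectors
    _, v = np.linalg.eigh(G)                 # eigenvalues in ascending order
    Y = v[:, -k:]                            # eigenvectors of the k largest
    _, _, piv = qr(Y.T, pivoting=True)       # pivoted QR of the eigenvector matrix
    reps = Y[piv[:k]]                        # k well-separated representative rows
    return np.argmax(Y @ reps.T, axis=1)     # assign by strongest alignment

rng = np.random.default_rng(1)
X = np.vstack([np.array([4.0, 0.0]) + 0.2 * rng.standard_normal((15, 2)),
               np.array([0.0, 4.0]) + 0.2 * rng.standard_normal((15, 2))])
labels = spectral_kmeans_labels(X, 2)
```

For well-separated clusters the rows of the eigenvector matrix are nearly piecewise constant, so the column pivoting picks one representative per cluster.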
On Updating Problems in Latent Semantic Indexing
TLDR
Numerical experiments show that the new SVD-updating algorithms give higher (interpolated) average precisions than the existing algorithms, and the retrieval accuracy is comparable to that obtained using the complete document collection.
Automatic document metadata extraction using support vector machines
TLDR
It is found that discovering and using the structural patterns of the data, together with domain-based word clustering, can improve metadata-extraction performance, and that appropriate feature normalization also greatly improves classification performance.
Scalable Influence Estimation in Continuous-Time Diffusion Networks
TLDR
Experiments on both synthetic and real-world data show that the proposed algorithm easily scales to networks of millions of nodes while significantly improving over the previous state of the art in both the accuracy of the estimated influence and the quality of the nodes selected to maximize influence.
...