• Publications
Self-taught hashing for fast similarity search
TLDR
The ability to perform fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications.
  • 350
  • 72
Question classification using support vector machines
TLDR
We propose to use a special kernel function called the tree kernel to enable the SVM to take advantage of the syntactic structures of questions.
  • 642
  • 71
IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
TLDR
This paper provides a unified account of two schools of thinking in information retrieval modelling: generative retrieval, which focuses on predicting relevant documents given a query, and discriminative retrieval, which focuses on predicting relevancy given the query.
  • 303
  • 63
Semantic, Hierarchical, Online Clustering of Web Search Results
TLDR
We propose a Semantic, Hierarchical, Online Clustering (SHOC) approach to automatically organizing Web search results into groups.
  • 199
  • 14
Extracting key-substring-group features for text classification
TLDR
In many text classification applications, it is appealing to treat every document as a string of characters rather than a bag of words.
  • 60
  • 12
Combining lexicon and learning based approaches for concept-level sentiment analysis
TLDR
We present the anatomy of pSenti, a concept-level sentiment analysis system that seamlessly integrates lexicon-based and learning-based approaches to opinion mining.
  • 171
  • 8
Web taxonomy integration using support vector machines
TLDR
We address the problem of integrating objects from a source taxonomy into a master taxonomy.
  • 70
  • 8
Laplacian Co-hashing of Terms and Documents
TLDR
In this paper, we introduce the novel problem of co-hashing, where both documents and terms are hashed simultaneously according to their semantic similarities.
  • 38
  • 8
Text classification with kernels on the multinomial manifold
TLDR
We prove that the Negative Geodesic Distance (NGD) on the multinomial manifold is conditionally positive definite (cpd), and thus can be used as a kernel in SVMs.
  • 75
  • 7
Understanding user intent in community question answering
TLDR
In this paper, we propose to classify questions into three categories according to their underlying user intent: subjective, objective, and social.
  • 53
  • 5