Semantic hashing using tags and topic modeling

@inproceedings{Wang2013SemanticHU,
  title={Semantic hashing using tags and topic modeling},
  author={Qifan Wang and Dan Zhang and Luo Si},
  booktitle={Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2013}
}
  • Published 28 July 2013
  • Computer Science
It is an important research problem to design efficient and effective solutions for large scale similarity search. One popular strategy is to represent data examples as compact binary codes through semantic hashing, which has produced promising results with fast search speed and low storage cost. Many existing semantic hashing methods generate binary codes for documents by modeling document relationships based on similarity in a keyword feature space. Two major limitations in existing methods… 
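The trade described in the abstract, replacing exact search in a keyword feature space with Hamming-distance search over compact binary codes, can be sketched as follows. The random hyperplane projection here is a hypothetical stand-in for the learned hashing function (the paper learns codes from tags and topic models); all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: 1000 documents in a 128-dimensional keyword feature space.
docs = rng.standard_normal((1000, 128))

# Stand-in hashing function: random hyperplane projections. A learned
# semantic hashing method would fit these projections to document
# similarity and tag information instead.
planes = rng.standard_normal((128, 32))           # 32-bit codes
codes = (docs @ planes > 0).astype(np.uint8)      # compact binary codes

def hamming_search(query_code, codes, k=5):
    """Return indices of the k codes nearest in Hamming distance."""
    dists = (codes != query_code).sum(axis=1)
    return np.argsort(dists, kind="stable")[:k]

# Searching with a document's own code returns that document first.
qcode = (docs[0] @ planes > 0).astype(np.uint8)
top = hamming_search(qcode, codes)
```

Hamming distance over packed bits is what makes the search fast and the storage cheap: 32 bits per document instead of 128 floats.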

Citations

Short Text Hashing Improved by Integrating Multi-granularity Topics and Tags
TLDR
This paper proposes a selection method to choose the optimal multi-granularity topics depending on the type of dataset, and designs two distinct hashing strategies to incorporate multi-granularity topics.
Short Text Hashing Improved by Integrating Topic Features and Tags
TLDR
This work proposes a novel unified hashing approach in which the optimal topic features are selected automatically and integrated with the original features to preserve similarity, while tags are fully utilized to improve hash code learning.
Sparse Semantic Hashing for Efficient Large Scale Similarity Search
TLDR
A unified framework is designed that captures the hidden semantic structure among the documents with a sparse coding model while preserving document similarity via a graph Laplacian, and an iterative coordinate descent procedure is proposed for solving the optimization problem.
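The graph-Laplacian similarity preservation mentioned in this summary can be illustrated on a toy graph. This is a minimal sketch of the general spectral idea, not the paper's sparse-coding formulation; the 6-node graph and its edge weights are invented for illustration.

```python
import numpy as np

# Two 3-node clusters joined by one weak edge; documents connected by
# high-weight edges should receive equal code bits.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1           # weak bridge between the clusters

D = np.diag(W.sum(axis=1))
L = D - W                         # unnormalized graph Laplacian

# The eigenvector of the smallest nonzero eigenvalue (the Fiedler
# vector) varies smoothly over the graph; thresholding its sign gives
# a 1-bit code that keeps each cluster together.
vals, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
bits = (vecs[:, 1] > 0).astype(int)
# Nodes 0-2 receive one bit value and nodes 3-5 the other.
```

Minimizing the Laplacian quadratic form penalizes code vectors that assign different values to strongly connected documents, which is how similarity is preserved in the learned bits.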
Active hashing with joint data example and tag selection
TLDR
This paper proposes a novel active hashing approach, Active Hashing with Joint Data Example and Tag Selection (AH-JDETS), which actively selects the most informative data examples and tags in a joint manner for hashing function learning.
Learning to Hash with Partial Tags: Exploring Correlation between Tags and Hashing Bits for Large Scale Image Retrieval
TLDR
A novel semi-supervised tag hashing (SSTH) approach that fully incorporates tag information into learning an effective hashing function by exploring the correlation between tags and hashing bits, and improves the effectiveness of the hashing function through an orthogonal transformation by minimizing the quantization error.
Learning compact hashing codes with complex objectives from multiple sources for large scale similarity search
TLDR
This dissertation addresses five major problems for utilizing supervised information from multiple sources in hashing with respect to different objectives: incorporating semantic tags by modeling the latent correlations between tags and data examples, preserving the similarities between data examples and ensuring the tag consistency via a latent factor model.
Deep Semantic Text Hashing with Weak Supervision
TLDR
Two deep generative semantic hashing models are introduced to leverage weak signals for text hashing and can generate high-quality binary codes without using hand-labeled training data and significantly outperform the competitive unsupervised semantic hashing baselines.
node2hash: Graph aware deep semantic text hashing
Learning compact hashing codes for efficient tag completion and prediction
TLDR
A novel efficient Hashing approach for Tag Completion and Prediction (HashTCP) is proposed that can achieve similar or even better accuracy than state-of-the-art methods while being much more efficient, which is important for large-scale applications.
Variational Deep Semantic Hashing for Text Documents
TLDR
A series of novel deep document generative models for text hashing is introduced that can be interpreted as encoder-decoder deep neural networks, capable of learning complex nonlinear distributed representations of the original documents.

References

Showing 1-10 of 41 references
Composite hashing with multiple information sources
TLDR
The focus is to incorporate features from multiple information sources into the binary hashing codes efficiently and effectively, and the algorithm CHMIS-AW (CHMIS with Adjusted Weights) is proposed for learning the codes.
Self-taught hashing for fast similarity search
TLDR
This paper proposes a novel Self-Taught Hashing (STH) approach to semantic hashing: it first finds the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, and then trains l classifiers via supervised learning to predict the l-bit code for any query document unseen before.
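The two-stage STH recipe described above (unsupervised codes for the corpus first, then one supervised predictor per bit) can be sketched roughly as follows. Assumptions: PCA projections thresholded at their medians stand in for the Laplacian-eigenmap stage, and a least-squares linear model per bit stands in for the paper's SVM classifiers; all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 64))    # toy corpus features
n_bits = 16                           # code length (the paper's l)

# Stage 1 (unsupervised): l-bit codes for the corpus. STH uses Laplacian
# eigenmaps; thresholding the top principal directions at their medians
# is a dependency-free stand-in that keeps each bit balanced.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:n_bits].T
codes = (proj > np.median(proj, axis=0)).astype(int)

# Stage 2 (supervised): one predictor per bit, so unseen queries can be
# hashed. The paper trains SVM classifiers; a least-squares linear model
# per bit is used here for brevity.
W, *_ = np.linalg.lstsq(Xc, codes * 2 - 1, rcond=None)

def hash_query(q):
    """Predict the n_bits-bit code for query rows q."""
    return ((q - X.mean(axis=0)) @ W > 0).astype(int)

# In-sample sanity check: predicted bits should mostly match stage-1 codes.
acc = (hash_query(X) == codes).mean()
```

The split matters because stage 1 alone only assigns codes to documents seen during training; stage 2 is what lets an arbitrary query be hashed at search time.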
Semi-Supervised Hashing for Large-Scale Search
TLDR
This work proposes a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information-theoretic regularizer over both labeled and unlabeled sets, and presents three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing.
Boosting multi-kernel locality-sensitive hashing for scalable image retrieval
TLDR
A novel Boosting Multi-Kernel Locality-Sensitive Hashing (BMKLSH) scheme is proposed that significantly boosts the retrieval performance of KLSH by making use of multiple kernels and outperforms the state-of-the-art techniques.
Kernelized locality-sensitive hashing for scalable image search
  • B. Kulis, K. Grauman
  • Computer Science
    2009 IEEE 12th International Conference on Computer Vision
  • 2009
TLDR
It is shown how to generalize locality-sensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm's sub-linear time similarity search guarantees for a wide class of useful similarity functions.
Supervised hashing with kernels
TLDR
A novel kernel-based supervised hashing model is proposed that requires only a limited amount of supervised information, i.e., similar and dissimilar data pairs, and a feasible training cost to achieve high-quality hashing, and it significantly outperforms the state of the art in searching both metric distance neighbors and semantically similar neighbors.
Similarity Search in High Dimensions via Hashing
TLDR
Experimental results indicate that the novel scheme for approximate similarity search based on hashing scales well even for a relatively large number of dimensions, and provide experimental evidence that the method improves running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
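The hashing-based approximate search summarized above can be sketched with random-hyperplane bucketing, a common LSH variant for cosine similarity; the paper itself uses a bit-sampling scheme, so this is an illustrative stand-in with invented sizes.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)

# LSH idea: hash points so that near neighbors collide with high
# probability, then scan only the colliding bucket instead of the
# whole dataset.
data = rng.standard_normal((2000, 32))
planes = rng.standard_normal((32, 12))            # 12-bit bucket keys

def bucket_key(x):
    """Random-hyperplane hash: one bit per hyperplane side."""
    return tuple((x @ planes > 0).astype(int))

table = defaultdict(list)
for i, x in enumerate(data):
    table[bucket_key(x)].append(i)

# A query very close to data[7] will usually share data[7]'s bucket,
# so the candidate set is a small fraction of the 2000 points.
query = data[7] + 0.001 * rng.standard_normal(32)
candidates = table[bucket_key(query)]
```

In practice several independent tables are built and their candidate sets are unioned, which raises recall while keeping the per-query cost sublinear.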
Evaluating topic models for information retrieval
TLDR
Experiments to measure topic models' ability to predict held-out likelihood confirm past results on small corpora, but suggest that simple approaches to topic modeling are better for large corpora.
Hashing with Graphs
TLDR
This paper proposes a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes and describes a hierarchical threshold learning procedure in which each eigenfunction yields multiple bits, leading to higher search accuracy.
Sequential Projection Learning for Hashing with Compact Codes
TLDR
This paper proposes a novel data-dependent projection learning method such that each hash function is designed to correct the errors made by the previous one sequentially, and shows significant performance gains over the state-of-the-art methods on two large datasets containing up to 1 million points.