LOH and Behold: Web-Scale Visual Search, Recommendation and Clustering Using Locally Optimized Hashing

Yannis Kalantidis, Lyndon S. Kennedy, Huy Nguyen, Clayton Mellina, David A. Shamma

We propose a novel hashing-based matching scheme, called Locally Optimized Hashing (LOH), based on a state-of-the-art quantization algorithm that can be used for efficient, large-scale search, recommendation, clustering, and deduplication. We show that matching with LOH only requires set intersections and summations to compute and so is easily implemented in generic distributed computing systems. We further show application of LOH to: (a) large-scale search tasks where performance is on par… 
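The abstract states that LOH matching needs only set intersections and summations. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes each item is represented by a set of hashable codes, and the `(sub-quantizer index, centroid id)` pairs, the `match_score` name, and the optional `weights` table are all hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, Optional, Set


def match_score(
    codes_a: Set[Hashable],
    codes_b: Set[Hashable],
    weights: Optional[Dict[Hashable, float]] = None,
) -> float:
    """Score two items by intersecting their code sets and summing.

    With no weights, the score is the number of shared codes.
    With weights (hypothetical), each shared code contributes its
    weight, defaulting to 1.0 when the code has no entry.
    """
    shared = codes_a & codes_b          # set intersection
    if weights is None:
        return float(len(shared))       # summation of ones
    return float(sum(weights.get(code, 1.0) for code in shared))


# Hypothetical codes: (sub-quantizer index, centroid id) pairs.
query = {(0, 17), (1, 4), (2, 250), (3, 9)}
item_x = {(0, 17), (1, 4), (2, 3), (3, 9)}
item_y = {(0, 1), (1, 4), (2, 7), (3, 8)}

# Rank candidates by descending score.
ranked = sorted(
    [("x", item_x), ("y", item_y)],
    key=lambda pair: match_score(query, pair[1]),
    reverse=True,
)
```

Because the score depends only on intersections and sums, it maps naturally onto join-and-aggregate operations in generic distributed systems, which is the property the abstract highlights.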

An Evaluation of Large-scale Methods for Image Instance and Class Discovery

This paper evaluates diffusion methods that have been neglected by the state of the art, such as the Markov Clustering algorithm, demonstrates their value, and shows that descriptions selected for instance search improve the discovery of object classes.

Delving Deep into Personal Photo and Video Search

This paper is the first to study personal media search using large-scale real-world search logs, and it proposes a deep query understanding model that learns a mapping from personal queries to the concepts in the clicked photos.

A Hybrid Semantic Algorithm for Web Image Retrieval Incorporating Ontology Classification and User-Driven Query Expansion

This work proposes a system that facilitates modeling of homonymous and synonymous ontologies and understands users' need for images, together with a Hybrid Semantic Algorithm that computes semantic similarity using APMI.

Clustering and Its Extensions in the Social Media Domain

This chapter summarizes existing clustering and related approaches for the identified challenges as described in Sect. 1.2 and presents the key branches of social media mining applications where

Low-Shot Learning with Large-Scale Diffusion

This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time and considers a semi-supervised setting based on a large collection of images to support label propagation.

Multimodal Classification of Moderated Online Pro-Eating Disorder Content

A deep learning classifier is developed that jointly models textual and visual characteristics of pro-eating disorder content that violates community guidelines and discovers deviant content efficiently while also maintaining high recall.



Web-Scale Image Clustering Revisited

This work revisits recent advances in approximate k-means variants, and designs a dynamic variant that is able to determine the number of clusters k in a single run at nearly zero additional cost, and achieves clustering of a 100 million image collection on a single machine in less than one hour.

Locality sensitive hashing: A comparison of hash function types and querying mechanisms

Supervised Discrete Hashing

This work proposes a new supervised hashing framework in which the learning objective is to generate the optimal binary hash codes for linear classification, and introduces an auxiliary variable to reformulate the objective so that it can be solved very efficiently by employing a regularization algorithm.

Asymmetric Distances for Binary Embeddings

This work proposes two general asymmetric distances that are applicable to a wide variety of embedding techniques, including locality sensitive hashing (LSH), locality sensitive binary codes (LSBC), spectral hashing (SH), PCA embedding (PCAE), PCAE with random rotations (PCAE-RR), and PCA with iterative quantization (PCAE-ITQ).

Locality-sensitive hashing scheme based on p-stable distributions

This paper presents a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the l_p norm, based on p-stable distributions. The scheme improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case p < 1.
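As a minimal sketch of a hash function of this kind for the l_2 case, the snippet below uses the standard form h(v) = floor((a · v + b) / w), where the entries of a are drawn from a Gaussian (which is 2-stable) and b is uniform in [0, w). The function name `make_pstable_hash`, the bucket width `w`, and the seeding scheme are assumptions made for illustration; a full index would concatenate several such functions and use multiple tables.

```python
import math
import random
from typing import Callable, Sequence


def make_pstable_hash(dim: int, w: float, seed: int = 0) -> Callable[[Sequence[float]], int]:
    """Build one hash function h(v) = floor((a . v + b) / w).

    a: random direction with i.i.d. standard Gaussian entries (2-stable).
    b: random offset drawn uniformly from [0, w).
    w: bucket width; larger values make collisions more likely.
    """
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v: Sequence[float]) -> int:
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h


# Two functions built with the same seed agree on every input.
h1 = make_pstable_hash(dim=3, w=4.0, seed=1)
h2 = make_pstable_hash(dim=3, w=4.0, seed=1)
bucket = h1([0.1, 0.2, 0.3])
```

Points that are close in l_2 distance project to nearby values along a, so they tend to fall into the same bucket; distant points tend to be separated.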

Complementary Projection Hashing

This paper proposes a novel algorithm named Complementary Projection Hashing (CPH), which aims at sequentially finding a series of hyperplanes (hashing functions) that cross the sparse region of the data.

Web scale photo hash clustering on a single machine

This paper presents a fast binary k-means algorithm that works directly on the similarity-preserving hashes of images and clusters them into binary centers on which to build hash indexes to speed up computation.

Efficient manifold ranking for image retrieval

The original manifold ranking algorithm is extended and a new framework named Efficient Manifold Ranking (EMR) is proposed, which significantly reduces the computational time and makes it a promising method for large-scale real-world retrieval problems.

Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation

A new method based on pattern mining is proposed, using the minimal description length principle, to derive the most suitable set of patterns to describe the query object, with patterns corresponding to local feature configurations, which results in a powerful object-specific mid-level image representation.

Binary Code Ranking with Weighted Hamming Distance

This paper proposes a weighted Hamming distance ranking algorithm (WhRank) to rank the binary codes of hashing methods by assigning different bit-level weights to different hash bits, so that the returned binary codes are ranked at a finer-grained binary code level.
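To illustrate why bit-level weights give a finer ranking than plain Hamming distance, here is a minimal sketch of a generic weighted Hamming distance: the sum of the weights of the bits on which two codes differ. It does not reproduce how WhRank derives its weights; the `weighted_hamming` name and the weight values below are hypothetical.

```python
from typing import Sequence


def weighted_hamming(
    code_a: Sequence[int],
    code_b: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Sum the weights of the bit positions where the two codes differ."""
    if not (len(code_a) == len(code_b) == len(weights)):
        raise ValueError("codes and weights must have the same length")
    return sum(w for x, y, w in zip(code_a, code_b, weights) if x != y)


# Hypothetical 4-bit codes and per-bit weights.
query = [1, 0, 1, 1]
cand_a = [1, 1, 1, 1]   # differs from the query at bit 1
cand_b = [1, 0, 1, 0]   # differs from the query at bit 3
weights = [0.2, 0.9, 0.5, 0.1]

dist_a = weighted_hamming(query, cand_a, weights)
dist_b = weighted_hamming(query, cand_b, weights)
```

Both candidates are at plain Hamming distance 1 from the query, so an unweighted ranking ties them; with these weights, `cand_b` ranks ahead of `cand_a` because the bit it disagrees on carries less weight.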