Preferences Prediction using a Gallery of Mobile Device based on Scene Recognition and Object Detection

A. Savchenko, Kirill V. Demochkin, Ivan S. Grechikhin
In this paper, the user modeling task is examined by processing the gallery of photos and videos on a mobile device. We propose a novel engine for user preference prediction based on scene recognition, object detection, and facial analysis. First, all faces in the gallery are clustered, and all private photos and videos with faces from large clusters are processed on the embedded system in offline mode. Other photos may be sent to a remote server to be analyzed by very deep models. The visual features…
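The routing logic described in the abstract (cluster the faces, keep photos with faces from large clusters on-device, offload the rest) can be sketched as below. This is a minimal illustration on toy embeddings, not the paper's implementation: `cluster_faces`, `route_photos`, the greedy cosine-distance clustering, and the thresholds are all assumptions.

```python
import numpy as np

def cluster_faces(embeddings, threshold=0.5):
    """Greedy clustering of face embeddings (assumed sketch): each face
    joins the first cluster whose centroid is within `threshold` cosine
    distance, otherwise it starts a new cluster."""
    clusters = []   # lists of embedding indices
    centroids = []  # running L2-normalized centroids
    for i, e in enumerate(embeddings):
        e = e / np.linalg.norm(e)
        for c, mu in enumerate(centroids):
            if 1.0 - float(e @ mu) < threshold:
                clusters[c].append(i)
                mu = mu + (e - mu) / len(clusters[c])  # running mean
                centroids[c] = mu / np.linalg.norm(mu)
                break
        else:
            clusters.append([i])
            centroids.append(e)
    return clusters

def route_photos(photo_faces, clusters, min_cluster_size=2):
    """Photos containing a face from a 'large' cluster stay on-device;
    the rest may be sent to the remote server for deeper analysis."""
    large = {i for c in clusters if len(c) >= min_cluster_size for i in c}
    on_device, to_server = [], []
    for photo, faces in photo_faces.items():
        (on_device if any(f in large for f in faces) else to_server).append(photo)
    return on_device, to_server
```

In practice the embeddings would come from an on-device face descriptor network; here they are synthetic vectors.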


Scene Recognition in User Preference Prediction Based on Classification of Deep Embeddings and Object Detection
Experimental results on a subset of the ImageNet dataset demonstrate that the proposed approach is up to 5% more accurate than conventional fine-tuned models.
User Modeling on Mobile Device Based on Facial Clustering and Object Detection in Photos and Videos
An approach for extracting user preferences from a gallery of photos and videos on a mobile device is proposed; it first applies fast SSD-based methods to detect objects of interest in offline mode directly on the mobile device.
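Turning the detected object labels into a preference profile, as the SSD-based offline pipeline above suggests, might look like the following sketch. The function name `preference_vector`, the fixed category list, and the plain count normalization are hypothetical choices, not the authors' method.

```python
from collections import Counter

def preference_vector(detections, categories):
    """Build a normalized preference histogram over `categories` from
    per-photo lists of detected object labels (e.g. SSD outputs)."""
    counts = Counter(label for photo in detections for label in photo)
    total = sum(counts[c] for c in categories) or 1  # avoid div by zero
    return [counts[c] / total for c in categories]
```

Labels outside the fixed category list are simply ignored, which keeps the profile comparable across users.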
Efficient facial representations for age, gender and identity recognition in organizing photo albums using multi-output ConvNet
  • A. Savchenko
  • Computer Science, Medicine
  • PeerJ Comput. Sci.
  • 2019
It is experimentally demonstrated that the quality of facial clustering for the developed network is competitive with state-of-the-art results achieved by deep neural networks, while the proposed approach is much cheaper computationally.
Recognizing and Curating Photo Albums via Event-Specific Image Importance
This paper proposes a hybrid system consisting of a Siamese-network-based predictor of event-specific image importance, a Convolutional Neural Network that recognizes the event type, and a Long Short-Term Memory (LSTM)-based sequence-level event recognizer.
Representing scenes for real-time context classification on mobile devices
The DCT-GIST image representation model is introduced, which summarizes the context of the scene and closely matches other state-of-the-art methods based on bags of Textons collected on a spatial hierarchy.
Recognize complex events from static images by fusing deep channels
Inspired by the recent success of deep learning, a multi-layer framework is formulated to tackle the problem of event recognition; it takes into account both visual appearance and the interactions among humans and objects, and combines them via semantic fusion.
Privacy-CNH: A Framework to Detect Photo Privacy with Convolutional Neural Network using Hierarchical Features
A new framework called Privacy-CNH is proposed that utilizes hierarchical features, including both object and convolutional features, in a deep learning model to detect privacy-at-risk photos. It provides a richer model for understanding photo privacy from different aspects, thus improving photo privacy detection accuracy.
Neural Aggregation Network for Video Face Recognition
This NAN is trained with a standard classification or verification loss without any extra supervision signal, and it is found to automatically learn to favor high-quality face images while repelling low-quality ones such as blurred, occluded, and improperly exposed faces.
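The aggregation idea (attention weights that favor high-quality face features) can be illustrated with a tiny numpy sketch. The fixed quality vector `q` stands in for NAN's learned attention kernel and is an assumption here; the real network learns it end-to-end.

```python
import numpy as np

def aggregate_faces(features, q):
    """Attention-weighted aggregation sketch: score each per-frame face
    feature against a quality vector q, softmax the scores, and return
    the weighted mean feature (same dimensionality as the inputs)."""
    scores = features @ q                  # one scalar score per frame
    w = np.exp(scores - scores.max())
    w = w / w.sum()                        # softmax attention weights
    return w @ features                    # weighted average feature
```

Frames whose features score highly dominate the aggregate, which is the behavior the paper reports the trained network acquiring automatically.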
DLDR: Deep Linear Discriminative Retrieval for Cultural Event Classification from a Single Image
This paper solves the classification of cultural events from a single image with a deep learning based method, using convolutional neural networks with the VGG-16 architecture, pretrained on ImageNet or the Places205 dataset for image classification and fine-tuned on cultural events data.
Places: A 10 Million Image Database for Scene Recognition
The Places Database is described: a repository of 10 million scene photographs labeled with semantic scene categories, comprising a large and diverse list of the types of environments encountered in the world. State-of-the-art Convolutional Neural Networks used as baselines significantly outperform previous approaches.