Deceiving Google’s Cloud Video Intelligence API Built for Summarizing Videos

Hossein Hosseini, Baicen Xiao, and Radha Poovendran. "Deceiving Google's Cloud Video Intelligence API Built for Summarizing Videos." 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017.
Despite the rapid progress of techniques for image classification, video annotation has remained a challenging task. A demonstration website has also been launched, which allows anyone to select a video for annotation. The API then detects the video labels (objects within the video) as well as shot labels (descriptions of the video events over time). In this paper, we examine the usability of Google's Cloud Video Intelligence API in adversarial environments. In particular, we…


Adversarial Video Captioning

This is the first successful method for targeted attacks against a video captioning model: it injects 'subliminal' perturbations into the video stream and forces the model to output a chosen caption, reaching up to 0.981 cosine similarity with the target caption.

Google's Cloud Vision API is Not Robust to Noise

By adding sufficient noise to an image, the Google Cloud Vision API can be made to generate completely different outputs for the noisy image, while a human observer still perceives its original content; this suggests that the Cloud Vision API could readily benefit from noise filtering, without the need to update its image analysis algorithms.
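The kind of impulse (salt-and-pepper) noise used in that robustness test is easy to reproduce; a minimal sketch with NumPy follows, where the function name and default density are illustrative, not taken from the paper:

```python
import numpy as np

def add_impulse_noise(image, density=0.1, seed=None):
    """Return a copy of `image` with salt-and-pepper (impulse) noise.

    `density` is the fraction of pixels replaced: roughly half are
    set to 0 (pepper) and half to 255 (salt).
    """
    rng = np.random.default_rng(seed)
    noisy = image.copy()
    mask = rng.random(image.shape[:2])
    noisy[mask < density / 2] = 0          # pepper
    noisy[mask > 1 - density / 2] = 255    # salt
    return noisy

# Example: a flat gray 64x64 image with about 10% of pixels corrupted.
img = np.full((64, 64), 128, dtype=np.uint8)
noisy = add_impulse_noise(img, density=0.1, seed=0)
```

Submitting such a noisy image to a vision API and comparing its labels against those for the clean image reproduces the basic robustness check described above.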

Sparse Adversarial Perturbations for Videos

An l2,1-norm based optimization algorithm is proposed to compute sparse adversarial perturbations for videos; action recognition is chosen as the targeted task, and networks with a CNN+RNN architecture serve as the threat models used to verify the method.
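For reference, the l2,1 norm groups the perturbation by frame: it sums the per-frame l2 norms, so penalizing it drives entire frames of the perturbation to zero (temporal sparsity). A minimal sketch, with illustrative shapes and names:

```python
import numpy as np

def l21_norm(perturbation):
    """l2,1 norm of a video perturbation of shape (frames, pixels):
    the sum over frames of each frame's l2 norm. Minimizing this
    encourages whole frames of the perturbation to vanish."""
    return np.linalg.norm(perturbation, axis=1).sum()

# Two 4-pixel frames: the all-zero frame contributes nothing,
# so the norm is just the first frame's l2 norm, sqrt(9 + 16) = 5.
p = np.array([[3.0, 4.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
```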

Adversarial Evasion Noise Attacks Against TensorFlow Object Detection API

A positive effect of low-density additive noise on the performance of ML models is shown, to the extent that such noise could be considered for addition as a new feature vector.

When George Clooney Is Not George Clooney: Using GenAttack to Deceive Amazon's and Naver's Celebrity Recognition APIs

A novel way to generate adversarial example images using an evolutionary genetic algorithm (GA) is presented, demonstrating the practicality of generating adversarial examples and successfully fooling state-of-the-art commercial image recognition systems.
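The black-box GA search described there can be sketched in miniature. In this sketch a toy scoring function stands in for queries to the commercial API, and all names and hyperparameters are illustrative, not taken from GenAttack:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_score(x):
    # Stand-in for a black-box confidence score; the real attack
    # queries a commercial recognition service here instead.
    target = np.linspace(-1, 1, 8)
    return -np.sum((x - target) ** 2)

def genetic_attack(score, dim=8, pop_size=20, generations=200, sigma=0.1):
    """Minimal elitist GA: keep the best half of the population,
    refill it with Gaussian-mutated copies of the survivors."""
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(generations):
        fitness = np.array([score(ind) for ind in pop])
        order = np.argsort(fitness)[::-1]            # best first
        survivors = pop[order[: pop_size // 2]]
        children = survivors + rng.normal(scale=sigma, size=survivors.shape)
        pop = np.vstack([survivors, children])
    return pop[np.argmax([score(ind) for ind in pop])]

best = genetic_attack(toy_score)
```

Because only score queries are needed, this style of search works without any gradient access, which is what makes it applicable to closed commercial APIs.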

Negative Adversarial Example Generation Against Naver's Celebrity Recognition API

This work generates adversarial images against Naver's celebrity recognition API, demonstrates that it is extremely easy to fool online DNN-based APIs using adversarial examples, and discusses possible negative impacts resulting from these adversarial examples.

Image Processing and Location based Image Querier(LBIQ)

An LBIQ takes only a single input, an image, and returns a set of attributes that can be further processed for convenience of service.

Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training

It is shown that the bias can be reduced in the sentence representations by using an ensemble of adversaries, encouraging the model to jointly decrease the accuracy of these different adversaries while fitting the data.

Gone at Last: Removing the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training

It is shown that using an ensemble of adversaries can prevent the bias from being relearned after the model training is completed, further improving how well the model generalises to different NLI datasets.

Enhancing robustness of machine learning systems via data transformations

The use of data transformations as a defense against evasion attacks on ML classifiers is shown to be effective against the best-known evasion attacks in the literature, resulting in a two-fold increase in the resources required by a white-box adversary with knowledge of the defense.
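One concrete family of such data transformations is dimensionality reduction. A minimal PCA sketch follows, under the assumption that the classifier is then trained and queried in the reduced space; the function and variable names are illustrative:

```python
import numpy as np

def pca_transform(X, k):
    """Project the rows of X onto their top-k principal components.
    Discarding low-variance directions removes some of the room an
    adversary has to hide a perturbation."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# 100 samples of 20-dimensional data reduced to 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Z = pca_transform(X, k=5)
```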

A Generic Framework for Video Annotation via Semi-Supervised Learning

A Fast Graph-based Semi-Supervised Multiple Instance Learning (FGSSMIL) algorithm is proposed to jointly explore small-scale expert-labeled videos and large-scale unlabeled videos to train the models; results compared with the state of the art are promising and demonstrate the effectiveness and efficiency of the proposed approach.

Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces

This work creates a public framework for dividing the labeling of video data into micro-tasks that can be completed by the huge labor pools available through crowdsourced marketplaces, and leverages more sophisticated interpolation between key frames to maximize performance for a given budget.

Unified Video Annotation via Multigraph Learning

This paper shows that various crucial factors in video annotation, including multiple modalities, multiple distance functions, and temporal consistency, all correspond to different relationships among video units and hence can be represented by different graphs. It proposes optimized multigraph-based semi-supervised learning (OMG-SSL), which aims to tackle these difficulties simultaneously in a unified scheme.

Interactive Video Indexing With Statistical Active Learning

A novel active learning approach based on optimum experimental design criteria from statistics is proposed; it simultaneously exploits a sample's local structure and its relevance, density, and diversity information, and makes use of both labeled and unlabeled data.

Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation

This work proposes a novel neighborhood similarity measure that explores the local sample and label distributions, and shows that the neighborhood similarity between two samples simultaneously takes into account three characteristics: their distance, the distribution difference of the surrounding samples, and the distribution difference of the surrounding labels.

Deceiving Google's Perspective API Built for Detecting Toxic Comments

It is shown that an adversary can subtly modify a highly toxic phrase so that the system assigns it a significantly lower toxicity score, and that this attack can consistently reduce toxicity scores to the level of non-toxic phrases.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

The Limitations of Deep Learning in Adversarial Settings

This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.

Assistive tagging: A survey of multimedia tagging with human-computer joint exploration

Along with the explosive growth of multimedia data, automatic multimedia tagging has attracted great interest of various research communities, such as computer vision, multimedia, and information