FrameHopper: Selective Processing of Video Frames in Detection-driven Real-Time Video Analytics

@article{Arefeen2022FrameHopperSP,
  title={FrameHopper: Selective Processing of Video Frames in Detection-driven Real-Time Video Analytics},
  author={Md. Adnan Arefeen and Sumaiya Tabassum Nimi and Md. Yusuf Sarwar Uddin},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.11493}
}
Detection-driven real-time video analytics require continuous detection of objects contained in the video frames using deep learning models such as YOLOv3 and EfficientDet. However, running these detectors on every frame on resource-constrained edge devices is computationally intensive. By taking the temporal correlation between consecutive video frames into account, we note that detection outputs tend to overlap in successive frames. Elimination of “similar” consecutive frames…

References


Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics

Reducto is a system that dynamically adapts filtering decisions according to the time-varying correlation between feature type, filtering threshold, query accuracy, and video content; it achieves significant filtering benefits while consistently meeting the desired accuracy.

Watching a Small Portion could be as Good as Watching All: Towards Efficient Video Classification

An end-to-end deep reinforcement learning approach that enables an agent to classify videos by watching only a very small portion of the frames. An adaptive stop network measures a confidence score and generates a timely trigger to stop the agent from watching further, which improves efficiency without loss of accuracy.

NoScope: Optimizing Deep CNN-Based Queries over Video Streams at Scale

NoScope is a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search. It achieves speed-ups of two to three orders of magnitude on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.

Reinventing Video Streaming for Distributed Vision Analytics

The key insight is that existing streaming protocols are essentially client-driven; in contrast, by letting the analytics server decide what/when to stream from the camera, the new protocols can directly optimize the inference accuracy while minimizing bandwidth usage.

Efficient Video Classification Using Fewer Frames

This work focuses on building compute-efficient video classification models that process fewer frames and hence require fewer FLOPs, and shows that in each of these cases a see-it-all teacher can be used to train a compute-efficient see-very-little student.

Temporal Complementary Learning for Video Person Re-Identification

This work proposes a Temporal Complementary Learning Network that extracts complementary features of consecutive video frames for video person re-identification, effectively alleviating the information loss caused by the erasing operation of TSE.

Chameleon: scalable adaptation of video analytics

Chameleon is a controller that dynamically picks the best configurations for existing NN-based video analytics pipelines. Compared to a baseline that picks a single optimal configuration offline, Chameleon can achieve 20-50% higher accuracy with the same amount of resources, or achieve the same accuracy with only 30-50% of the resources.

MIRIS: Fast Object Track Queries in Video

This work proposes a novel query-driven tracking approach that integrates query processing with object tracking to efficiently process object track queries and address the computational complexity of object detection methods.

NoScope: Optimizing Neural Network Queries over Video at Scale

NoScope is a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search. It achieves speed-ups of two to three orders of magnitude on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.

Distream: scaling live video analytics with workload-adaptive distributed edge intelligence

This work presents Distream, a distributed live video analytics system based on the smart camera-edge cluster architecture that adapts to workload dynamics to achieve low-latency, high-throughput, and scalable live video analytics.