WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

@inproceedings{Zhu2021WebFace260MAB,
  title={WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition},
  author={Zheng Zhu and Guan Huang and Jiankang Deng and Yun Ye and Junjie Huang and Xinze Chen and Jiagang Zhu and Tian Yang and Jiwen Lu and Dalong Du and Jie Zhou},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={10487--10497}
}
  • Zheng Zhu, Guan Huang, +8 authors Jie Zhou
  • Published 6 March 2021
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) training data, as well as an elaborately designed time-constrained evaluation protocol. First, we collect a list of 4M names and download 260M faces from the Internet. Then, an efficient and scalable Cleaning Automatically utilizing Self-Training (CAST) pipeline is devised to purify the enormous WebFace260M. To the best… 
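The abstract is truncated before it explains how CAST works, so the sketch below does not reproduce it. It illustrates one generic intra-identity filtering step that a self-training cleaner could apply once a teacher model has produced face embeddings. Faces whose cosine similarity to their identity's centre falls below a threshold are flagged as label noise, and the centre is then re-estimated from the faces that remain. The function name `clean_identity`, the 0.5 threshold, and the iteration count are all hypothetical.

```python
import numpy as np


def clean_identity(embeddings: np.ndarray, threshold: float = 0.5,
                   iterations: int = 3) -> np.ndarray:
    """Hypothetical intra-identity cleaning step (not the CAST pipeline).

    Returns a boolean mask marking which faces of one identity to keep.
    """
    # L2-normalise rows so that dot products are cosine similarities.
    feats = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    keep = np.ones(len(feats), dtype=bool)
    for _ in range(iterations):
        if not keep.any():
            break  # everything was rejected; nothing left to re-centre on
        # Re-estimate the identity centre from the faces currently kept.
        center = feats[keep].mean(axis=0)
        center /= np.linalg.norm(center)
        # Faces too far from the centre are treated as label noise.
        keep = feats @ center >= threshold
    return keep


faces = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.9, 0.0], [0.0, 1.0]])
print(clean_identity(faces))  # the last face is flagged as noise
```

Re-centring on the survivors keeps a few outliers from dragging the centre toward themselves. A full self-training pipeline would also retrain the model on the cleaned data and repeat.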
A realistic approach to generate masked faces applied on two novel masked face recognition data sets
TLDR
A method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on faces in the original images, which produces significantly more realistic training examples of masks overlaid on faces by asking volunteers to qualitatively compare it to other methods or data sets designed for the same task.
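The TLDR does not say how the synthetic masks are overlaid. A common building block for this kind of augmentation is alpha compositing an RGBA mask image onto the face. The NumPy sketch below shows only that step, under these assumptions:

- The helper name `overlay_mask` is hypothetical.
- The explicit `top`/`left` placement is hypothetical. A realistic pipeline would place and warp the mask using facial landmarks.
- The mask is assumed to fit entirely inside the face image.

```python
import numpy as np


def overlay_mask(face: np.ndarray, mask_rgba: np.ndarray,
                 top: int, left: int) -> np.ndarray:
    """Alpha-composite an RGBA mask onto an RGB face image (uint8 arrays).

    Assumes the mask fits entirely inside the face at (top, left).
    """
    out = face.astype(np.float32)  # astype returns a new array
    h, w = mask_rgba.shape[:2]
    region = out[top:top + h, left:left + w]  # a view into `out`
    rgb = mask_rgba[..., :3].astype(np.float32)
    alpha = mask_rgba[..., 3:4].astype(np.float32) / 255.0  # shape (h, w, 1)
    # Standard "over" compositing: mask colour where opaque, face elsewhere.
    region[...] = alpha * rgb + (1.0 - alpha) * region
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Fully transparent mask pixels leave the face untouched, so irregular mask shapes can be stored as rectangular RGBA images.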
Face.evoLVe: A High-Performance Face Recognition Library
While face recognition has drawn much attention, a large number of algorithms and models have been proposed, with applications in daily life such as authentication for mobile payments. Recently,
An Efficient Training Approach for Very Large Scale Face Recognition
TLDR
This work proposes a novel training approach for ultra-large-scale face datasets, termed Faster Face Classification (F²C). It designs Dual Loaders, namely Identity-based and Instance-based Loaders, which load identities and instances to generate training batches.
Face-NMS: A Core-set Selection Approach for Efficient Face Recognition
TLDR
As the first attempt to study core-set selection for face recognition, this work finds that existing methods are limited in both performance and efficiency. It contributes a novel filtering strategy dubbed Face-NMS, which accelerates the whole pipeline by using a smaller but sufficient proxy dataset to train the proxy model.
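The TLDR names Face-NMS after non-maximum suppression but gives no details. As a rough, hypothetical analogue, the sketch below applies greedy NMS to the face embeddings of one identity:

- Faces are visited in descending order of a per-face score, assumed here to be some quality score.
- A face is dropped when its cosine similarity to an already kept face reaches a threshold.
- Cosine similarity plays the role that IoU plays in detection.

`face_nms`, the scores, and the 0.8 threshold are assumptions, not the paper's procedure.

```python
import numpy as np


def face_nms(embeddings: np.ndarray, scores: np.ndarray,
             sim_threshold: float = 0.8) -> list[int]:
    """Hypothetical greedy NMS over face embeddings of one identity.

    Returns the indices of kept faces, highest score first.
    """
    feats = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept: list[int] = []
    for idx in np.argsort(-scores):  # visit faces by descending score
        # Keep a face only if it is not a near-duplicate of a kept face;
        # cosine similarity plays the role IoU plays in detection NMS.
        if all(feats[idx] @ feats[k] < sim_threshold for k in kept):
            kept.append(int(idx))
    return kept
```

Near-duplicate faces collapse to their best-scoring representative, which shrinks the training set while keeping diverse samples.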
YOLO5Face: Why Reinventing a Face Detector
TLDR
A face detector based on the YOLOv5 object detector is implemented and called YOLO5Face. It achieves state-of-the-art performance on almost all of the Easy, Medium, and Hard subsets of WIDER FACE, exceeding more complex dedicated face detectors.
An Optically-encoded Loss-predictive Framework for Face Recognition Using Nonlinear Adaptive Margin
TLDR
A novel paradigm pairing an optical image encoder with a DNN decoder for improved face recognition. It introduces a covariance loss prediction module, attached to the network backbone, that dynamically adjusts the loss objective.
Masked Face Recognition Challenge: The WebFace260M Track Report
TLDR
The Face Bio-metrics under COVID Workshop and Masked Face Recognition Challenge were organized at ICCV 2021. A new test set of 2,478 carefully curated celebrities and 60,926 faces was gathered; it is the world's largest real-world masked face test set.
Structure-Aware Face Clustering on a Large-Scale Graph with 10⁷ Nodes
  • Shuai Shen, Wanhua Li, +4 authors Jie Zhou
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
TLDR
The proposed STructure-AwaRe Face Clustering (STAR-FC) method is the first to train on a very large-scale graph with 20M nodes, and it achieves superior inference results on 12M test data.
A Survey on Face Recognition Systems
TLDR
This paper gives an overview of a general face recognition system, and covers various network architectures and training losses that have had a substantial impact.
PrintsGAN: Synthetic Fingerprint Generator
TLDR
This work proposes PrintsGAN, a synthetic fingerprint generator capable of generating unique fingerprints along with multiple impressions of each fingerprint. It shows the utility of the PrintsGAN-generated dataset by training a deep network to extract a fixed-length embedding from a fingerprint.

References

Showing references 1-10 of 86
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
TLDR
DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.
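Below is a compact NumPy sketch of DBSCAN, written from the standard description of the algorithm rather than from the paper's pseudocode. It uses the usual two parameters: the neighbourhood radius `eps` and the core-point threshold `min_pts`, which counts the point itself. The paper's heuristic for choosing a parameter value is not shown. The brute-force O(n²) distance matrix is a simplification for brevity; large spatial databases would use a spatial index instead.

```python
import numpy as np


def dbscan(points: np.ndarray, eps: float, min_pts: int) -> np.ndarray:
    """Return cluster ids (0, 1, ...) per point, or -1 for noise."""
    n = len(points)
    # Pairwise Euclidean distances (brute force, for illustration only).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_pts:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is core: expand through it
        cluster += 1
    return labels


pts = np.array([[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, 10]])
print(dbscan(pts, eps=0.5, min_pts=3))  # two clusters and one noise point
```

Clusters grow only through core points, which is what lets DBSCAN find arbitrarily shaped clusters while leaving sparse points labelled as noise.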
Designing Network Design Spaces
TLDR
The RegNet design space provides simple and fast networks that work well across a wide range of FLOP regimes, and outperform the popular EfficientNet models while being up to 5x faster on GPUs.
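RegNet parameterizes per-block widths with a quantized linear function:

1. Compute continuous widths u_j = w_0 + w_a·j for 0 ≤ j < d.
2. Solve u_j = w_0·w_m^s_j for s_j.
3. Snap each width to w_0·w_m^round(s_j).

Below is a minimal sketch of that recipe. It assumes two further steps: widths are rounded to a multiple of q = 8, and equal widths are grouped into stages. The function name and the example parameter values are illustrative, not a published RegNet configuration.

```python
import numpy as np


def regnet_widths(w_a: float, w_0: int, w_m: float, d: int, q: int = 8):
    """Per-stage widths and depths from a quantized linear width rule."""
    # Continuous linear widths u_j = w_0 + w_a * j for j = 0..d-1.
    u = w_0 + w_a * np.arange(d)
    # Solve u_j = w_0 * w_m**s_j for s_j and round to the nearest integer.
    s = np.round(np.log(u / w_0) / np.log(w_m))
    # Snap to the geometric grid, then to a multiple of q.
    widths = np.round(w_0 * np.power(w_m, s) / q).astype(int) * q
    # Widths are non-decreasing (w_a >= 0, w_m > 1), so sorting by np.unique
    # groups consecutive equal-width blocks into stages.
    stage_widths, stage_depths = np.unique(widths, return_counts=True)
    return stage_widths.tolist(), stage_depths.tolist()


print(regnet_widths(w_a=8, w_0=16, w_m=2.0, d=4))
```

With these toy values the four blocks collapse into two stages: one block of width 16 and three of width 32.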
Learning to Cluster Faces via Confidence and Connectivity Estimation
TLDR
This paper proposes a fully learnable clustering framework that does not require a large number of overlapping subgraphs. It transforms the clustering problem into two sub-problems: estimating the confidence of vertices and the connectivity of edges.
Learning to Cluster Faces on an Affinity Graph
TLDR
This work explores a novel approach, namely learning to cluster instead of relying on hand-crafted criteria. It proposes a framework based on a graph convolutional network, which combines a detection module and a segmentation module to pinpoint face clusters.
IARPA Janus Benchmark - C: Face Dataset and Protocol
  • 2018 International Conference on Biometrics (ICB)
  • 2018
The Devil of Face Recognition is in the Noise
TLDR
This work contributes cleaned subsets of popular face databases, i.e., the MegaFace and MS-Celeb-1M datasets, and builds a new large-scale noise-controlled IMDb-Face dataset. It also investigates ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies on annotation accuracy.
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
TLDR
A benchmark task to recognize one million celebrities from their face images, using as training data all face images of each individual that can be collected from the web. This could lead to one of the largest classification problems in computer vision.
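To make the scale of such a classification problem concrete, the arithmetic below counts the weights of a bias-free linear softmax classifier on top of the face embedding. It assumes a 512-dimensional embedding and 4-byte (fp32) weights. Gradients and optimizer state, which add further copies of these weights during training, are excluded. The helper name is hypothetical.

```python
def classifier_footprint(num_classes: int, embedding_dim: int = 512,
                         bytes_per_weight: int = 4) -> tuple[int, float]:
    """Weights and GiB of a bias-free linear classifier over embeddings."""
    params = num_classes * embedding_dim
    gib = params * bytes_per_weight / 2**30
    return params, gib


for n in (1_000_000, 4_000_000):  # MS-Celeb-1M scale, WebFace260M identities
    params, gib = classifier_footprint(n)
    print(f"{n:,} classes: {params:,} weights, {gib:.2f} GiB")
```

One million classes already need 512M weights, about 1.91 GiB. The 4M identities of WebFace260M need about 2.05B weights, about 7.63 GiB, before counting any training state.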
FaceNet: A unified embedding for face recognition and clustering
TLDR
A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. It achieves state-of-the-art face recognition performance using only 128 bytes per face.
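FaceNet trains its embedding with a triplet loss on L2-normalized embeddings. For an anchor a, a positive p (same identity) and a negative n, the loss is max(‖a − p‖² − ‖a − n‖² + α, 0). The NumPy sketch below computes that loss for a batch and verifies a pair by thresholding the squared distance. The margin α = 0.2 follows the FaceNet paper. The 1.1 verification threshold and the function names are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Project embeddings onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Mean triplet loss over a batch of (anchor, positive, negative) rows."""
    a, p, n = (l2_normalize(np.asarray(v, dtype=float))
               for v in (anchor, positive, negative))
    d_ap = np.sum((a - p) ** 2, axis=-1)  # squared distance, same identity
    d_an = np.sum((a - n) ** 2, axis=-1)  # squared distance, other identity
    # Penalize triplets where the negative is not at least `margin` farther.
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))


def same_person(e1, e2, threshold: float = 1.1) -> bool:
    """Verify a pair by thresholding squared L2 distance (illustrative)."""
    diff = (l2_normalize(np.asarray(e1, dtype=float))
            - l2_normalize(np.asarray(e2, dtype=float)))
    return bool(np.sum(diff ** 2) < threshold)
```

Because embeddings lie on the unit sphere, squared distances range from 0 to 4. A single global threshold can therefore serve for verification, and the same distances can drive clustering.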
Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments
TLDR
The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life, and exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
TLDR
A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. Its effectiveness is demonstrated on scaling up MobileNets and ResNet.
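EfficientNet's compound scaling uses a user-chosen coefficient φ:

- depth d = α^φ
- width w = β^φ
- resolution r = γ^φ

The constants satisfy α·β²·γ² ≈ 2, so FLOPs grow roughly by 2^φ. The sketch below uses α = 1.2, β = 1.1 and γ = 1.15, the values reported for EfficientNet-B0. The function name is hypothetical.

```python
def compound_scale(phi: float, alpha: float = 1.2, beta: float = 1.1,
                   gamma: float = 1.15) -> dict[str, float]:
    """Depth, width and resolution multipliers for compound coefficient phi."""
    return {
        "depth": alpha ** phi,
        "width": beta ** phi,
        "resolution": gamma ** phi,
        # FLOPs scale with depth * width**2 * resolution**2.
        "flops": (alpha * beta ** 2 * gamma ** 2) ** phi,
    }


print(compound_scale(1))
```

For φ = 1 the FLOPs multiplier is about 1.92, close to the intended doubling. Width and resolution enter squared because convolution cost grows with both channel count and spatial area.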