VGGFace2: A Dataset for Recognising Faces across Pose and Age
- Qiong Cao, Li Shen, Weidi Xie, O. Parkhi, Andrew Zisserman
- Computer Science, IEEE International Conference on Automatic Face…
- 23 October 2017
A new large-scale face dataset named VGGFace2 is introduced, which contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject, and the automated and manual filtering stages to ensure a high accuracy for the images of each identity are described.
Vggsound: A Large-Scale Audio-Visual Dataset
- Honglie Chen, Weidi Xie, A. Vedaldi, Andrew Zisserman
- Computer Science, IEEE International Conference on Acoustics…
- 29 April 2020
The goal is to collect a large-scale audio-visual dataset with low label noise from videos ‘in the wild’ using computer vision techniques; the paper also investigates various Convolutional Neural Network architectures and aggregation approaches to establish audio recognition baselines for this new dataset.
Voxceleb: Large-scale speaker verification in the wild
- Arsha Nagrani, Joon Son Chung, Weidi Xie, Andrew Zisserman
- Computer Science, Computer Speech and Language
- 1 March 2020
Utterance-level Aggregation for Speaker Recognition in the Wild
- Weidi Xie, Arsha Nagrani, Joon Son Chung, Andrew Zisserman
- Computer Science, IEEE International Conference on Acoustics…
- 26 February 2019
This paper proposes a powerful speaker recognition deep network, using a ‘thin-ResNet’ trunk architecture, and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time, that can be trained end-to-end.
Video Representation Learning by Dense Predictive Coding
- Tengda Han, Weidi Xie, Andrew Zisserman
- Computer Science, IEEE/CVF International Conference on Computer…
- 10 September 2019
With single stream (RGB only), DPC pretrained representations achieve state-of-the-art self-supervised performance on both UCF101 and HMDB51, outperforming all previous learning methods by a significant margin, and approaching the performance of a baseline pre-trained on ImageNet.
Self-supervised Co-training for Video Representation Learning
- Tengda Han, Weidi Xie, Andrew Zisserman
- Computer Science, Neural Information Processing Systems
- 19 October 2020
This paper investigates the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation (InfoNCE) training, and proposes a novel self-supervised co-training scheme to improve the popular InfoNCE loss.
Memory-augmented Dense Predictive Coding for Video Representation Learning
- Tengda Han, Weidi Xie, Andrew Zisserman
- Computer Science, European Conference on Computer Vision
- 3 August 2020
A new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), is proposed for self-supervised learning from video, in particular for representations for action recognition, trained with a predictive attention mechanism over the set of compressed memories.
Microscopy cell counting and detection with fully convolutional regression networks
- Weidi Xie, J. Noble, Andrew Zisserman
- Computer Science, Comput. methods Biomech. Biomed. Eng. Imaging Vis…
- 4 May 2018
A new state-of-the-art performance for cell counting on standard synthetic image benchmarks is set, and it is shown that FCRNs trained entirely on synthetic data can generalise well to real microscopy images, for both cell counting and detection in the case of overlapping cells.
NeRF--: Neural Radiance Fields Without Known Camera Parameters
- Zirui Wang, Shangzhe Wu, Weidi Xie, Min Chen, V. Prisacariu
- Computer Science, ArXiv
- 14 February 2021
It is shown that the camera parameters can be jointly optimised as learnable parameters with NeRF training, through a photometric reconstruction, and that the joint optimisation pipeline can recover accurate camera parameters and achieve novel view synthesis quality comparable to that of models trained with COLMAP pre-computed camera parameters.
Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval
- A. Brown, Weidi Xie, Vicky S. Kalogeiton, Andrew Zisserman
- Computer Science, European Conference on Computer Vision
- 23 July 2020
Smooth-AP is a plug-and-play objective function that allows for end-to-end training of deep networks with a simple and elegant implementation and improves the performance over the state-of-the-art, especially for larger-scale datasets, thus demonstrating the effectiveness and scalability of Smooth-AP to real-world scenarios.