BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

@inproceedings{Albanie2020BSL1KSU,
  title={BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues},
  author={Samuel Albanie and G{\"u}l Varol and Liliane Momeni and Triantafyllos Afouras and Joon Son Chung and Neil Fox and Andrew Zisserman},
  booktitle={ECCV},
  year={2020}
}
Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use…
Read and Attend: Temporal Localisation in Sign Language Videos
The Transformer model is trained to ingest a continuous signing stream and output a sequence of written tokens on a large-scale collection of signing footage with weakly-aligned subtitles, and it is shown that through this training it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation.
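To make the high-level description above concrete, here is a minimal sketch in PyTorch (feature dimension, vocabulary size and layer counts are illustrative assumptions, not the paper's implementation) of a seq2seq Transformer that ingests a continuous stream of per-frame video features and emits written tokens:

import torch.nn as nn

class SignStreamTransformer(nn.Module):
    def __init__(self, feat_dim=1024, vocab_size=30000, d_model=512):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, d_model)       # project frame features
        self.tok_emb = nn.Embedding(vocab_size, d_model)  # embed target tokens
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, frame_feats, tgt_tokens):
        # frame_feats: (B, T_frames, feat_dim); tgt_tokens: (B, T_tokens)
        memory = self.transformer.encoder(self.in_proj(frame_feats))
        tgt = self.tok_emb(tgt_tokens)
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        dec = self.transformer.decoder(tgt, memory, tgt_mask=causal)
        # Localising signs would additionally require exposing the decoder's
        # cross-attention weights (e.g. via forward hooks); omitted here.
        return self.out_proj(dec)  # (B, T_tokens, vocab_size) logits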
Watch, read and lookup: learning to spot signs from multiple supervisors
This work trains a model using multiple types of available supervision to identify whether and where a given sign occurs in a continuous, co-articulated sign language video, and validates the effectiveness of this approach on low-shot sign spotting benchmarks.
Hand-Model-Aware Sign Language Recognition
The hand prior is introduced and a new hand-model-aware framework for isolated SLR is proposed, with the modeled hand as the intermediate representation and multiple weakly-supervised losses to constrain its spatial and temporal consistency.
Looking for the Signs: Identifying Isolated Sign Instances in Continuous Video Footage
The proposed transformer-based network, called Sign-Lookup, achieves state-of-the-art performance on the sign spotting task, with accuracy as high as 96% on challenging benchmark datasets, significantly outperforming other approaches.
Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention
Automatic sign language recognition lies at the intersection of natural language processing (NLP) and computer vision. The highly successful transformer architectures, based on multi-head attention,…
Improving Sign Language Translation with Monolingual Data by Sign Back-Translation
The proposed sign back-translation (SignBT) approach incorporates massive spoken-language texts into SLT training and obtains a substantial improvement over previous state-of-the-art SLT methods.
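As a rough illustration of the general back-translation idea (the function names and the reverse text-to-gloss model below are assumptions, not the SignBT implementation), monolingual spoken-language sentences are mapped to pseudo source sequences and mixed with the real parallel data:

def back_translate(monolingual_texts, text_to_gloss, real_pairs):
    # monolingual_texts: spoken-language sentences without paired sign videos
    # text_to_gloss: hypothetical reverse model mapping text -> pseudo gloss sequence
    # real_pairs: list of (source_sequence, target_sentence) from parallel data
    pseudo_pairs = [(text_to_gloss(sentence), sentence)
                    for sentence in monolingual_texts]
    # A real system would typically tag or down-weight the synthetic examples.
    return real_pairs + pseudo_pairs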
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
This paper presents a novel sign video segment representation which takes into account multiple temporal granularities, thus alleviating the need for accurate video segmentation, and develops a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet.
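An illustrative sketch of multi-granularity temporal windowing (the window sizes and average pooling are assumptions, not TSPNet's actual configuration): the same frame-feature sequence is sliced into overlapping windows at several temporal scales, so later layers do not depend on a single, possibly inaccurate segmentation.

import numpy as np

def multi_granularity_windows(frame_feats, window_sizes=(8, 12, 16), stride=4):
    # frame_feats: (T, D) array of per-frame features; assumes T >= max window size
    pyramid = []
    for w in window_sizes:
        pooled = [frame_feats[s:s + w].mean(axis=0)   # average-pool each window
                  for s in range(0, len(frame_feats) - w + 1, stride)]
        pyramid.append(np.stack(pooled))
    return pyramid  # one (num_windows, D) array per temporal granularity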
Signer Diarisation in the Wild
In this work, we propose a framework that enables collection of large-scale, diverse sign language datasets that can be used to train automatic sign language recognition models. The first…
Sign Segmentation with Changepoint-Modulated Pseudo-Labelling
A simple yet effective algorithm is presented to improve segmentation performance on unlabelled signing footage from a domain of interest: the proposed Changepoint-Modulated Pseudo-Labelling algorithm leverages cues from abrupt changes in a motion-sensitive feature space to improve pseudo-label quality for adaptation.
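A rough sketch of the general mechanism (the changepoint detector and snapping rule are assumptions, not the paper's exact algorithm): abrupt changes in a motion-sensitive feature sequence are detected as changepoints, and pseudo-label boundaries are snapped to the nearest changepoint before being used for adaptation.

import numpy as np

def detect_changepoints(motion_feats, threshold):
    # motion_feats: (T, D) motion-sensitive features; flag frames with large jumps
    diffs = np.linalg.norm(np.diff(motion_feats, axis=0), axis=1)
    return np.where(diffs > threshold)[0] + 1

def snap_boundaries(pseudo_boundaries, changepoints, max_shift=5):
    # move each pseudo-label boundary to the nearest changepoint, if close enough
    snapped = []
    for b in pseudo_boundaries:
        if len(changepoints) == 0:
            snapped.append(b)
            continue
        nearest = int(changepoints[np.argmin(np.abs(changepoints - b))])
        snapped.append(nearest if abs(nearest - b) <= max_shift else b)
    return snapped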
Mutual Support of Data Modalities in the Task of Sign Language Recognition
This paper presents a method for automatic sign language recognition that was utilized in the CVPR 2021 ChaLearn Challenge (RGB track). Our method is composed of several approaches combined in an…

References

Showing 1-10 of 72 references
MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language
This work proposes the first real-life large-scale sign language data set comprising over 25,000 annotated videos, which it thoroughly evaluates with state-of-the-art methods from sign and related action recognition, outperforming the current state of the art by a large margin.
Zero-Shot Sign Language Recognition: Can Textual Data Uncover Sign Languages?
By leveraging the descriptive text embeddings along with these spatio-temporal representations within a zero-shot learning framework, it is shown that textual data can indeed be useful in uncovering sign languages.
Transferring Cross-Domain Knowledge for Video Sign Language Recognition
A novel method is proposed that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them, and outperforms previous state-of-the-art methods significantly.
Large-scale Learning of Sign Language by Watching TV (Using Co-occurrences)
It is shown that, somewhat counter-intuitively, mouth patterns are highly informative for isolating words in a language for the Deaf, and their co-occurrence with signing can be used to significantly reduce the correspondence search space.
Learning sign language by watching TV (using weakly aligned subtitles)
This work proposes a distance function to match signing sequences which includes the trajectory of both hands, the hand shape and orientation, and properly models the case of hands touching. It shows that by optimizing a scoring function based on multiple instance learning, the sign of interest can be extracted from hours of signing footage, despite the very weak and noisy supervision.
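A toy sketch of the multiple-instance-learning view used by this line of work (the function names and scoring model are assumptions, not the paper's formulation): each subtitle yields a "bag" of candidate temporal windows, and the sign of interest is taken to be the highest-scoring window inside a positive bag.

def spot_sign(candidate_windows, score_fn):
    # candidate_windows: list of (start_frame, end_frame, features) drawn from
    # footage whose subtitle contains the target word (a "positive bag")
    best = max(candidate_windows, key=lambda w: score_fn(w[2]))
    return (best[0], best[1]), score_fn(best[2])  # best window and its score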
Video-based Sign Language Recognition without Temporal Segmentation
A novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), is proposed, which eliminates the preprocessing step of temporal segmentation.
Word Spotting in Silent Lip Videos
A pipeline for recognition-free retrieval is developed, and a query expansion technique using pseudo-relevance feedback and a novel re-ranking method based on maximizing the correlation between spatio-temporal landmarks of the query and the top retrieval candidates are proposed.
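For readers unfamiliar with pseudo-relevance feedback, a generic sketch of the technique follows (the cosine-similarity formulation is an assumption, not the paper's exact method): the query embedding is expanded with its top-k retrieved candidates and the gallery is re-scored with the expanded query.

import numpy as np

def pseudo_relevance_feedback(query_emb, gallery_embs, k=5):
    # query_emb: (D,), gallery_embs: (N, D)
    def cosine(q):
        return gallery_embs @ q / (
            np.linalg.norm(gallery_embs, axis=1) * np.linalg.norm(q) + 1e-8)
    topk = np.argsort(-cosine(query_emb))[:k]          # initial top-k candidates
    expanded = (query_emb + gallery_embs[topk].sum(axis=0)) / (k + 1)
    return np.argsort(-cosine(expanded))               # gallery indices, re-ranked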
Learning signs from subtitles: A weakly supervised approach to sign language recognition
A fully automated, unsupervised method to recognise signs from subtitles is introduced, using data mining to align correspondences in sections of videos, together with head and hand tracking and a proposed contextual negative selection method.
Weakly Supervised Automatic Transcription of Mouthings for Gloss-Based Sign Language Corpora
In this work we propose a method to automatically annotate mouthings in sign language corpora, requiring no more than a simple gloss annotation and a source of weak supervision, such as automatic…
The American Sign Language Lexicon Video Dataset
The ASL lexicon video dataset is introduced, a large and expanding public dataset containing video sequences of thousands of distinct ASL signs, as well as annotations of those sequences, including start/end frames and class label of every sign.