Corpus ID: 54446047

MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language

@inproceedings{Joze2019MSASLAL,
  title={MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language},
  author={Hamid Reza Vaezi Joze and Oscar Koller},
  booktitle={BMVC},
  year={2019}
}
Computer vision has improved significantly in the past few decades. [...] Key Method: We propose I3D, known from video classification, as a powerful and suitable architecture for sign language recognition. We also propose a new pre-trained model more appropriate for sign language recognition. Finally, we estimate the effect of the number of classes and the number of training samples on recognition accuracy.
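The key method applies I3D, a 3D convolutional architecture from video classification, to isolated sign recognition. The following PyTorch sketch is only an illustration of that kind of clip classifier, not the authors' released model; the layer widths and clip shape are assumed, and the 1,000-class output mirrors the largest MS-ASL subset (ASL1000).

```python
import torch
import torch.nn as nn

class TinySignClip3DCNN(nn.Module):
    """Minimal 3D-CNN clip classifier in the spirit of I3D (illustrative only)."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            # Input: (batch, 3, frames, height, width), e.g. (B, 3, 32, 112, 112)
            nn.Conv3d(3, 32, kernel_size=(3, 7, 7), stride=(1, 2, 2), padding=(1, 3, 3)),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            nn.Conv3d(32, 64, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(128),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)      # global space-time pooling
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip)
        x = self.pool(x).flatten(1)
        return self.classifier(x)                # per-class logits for the isolated sign

if __name__ == "__main__":
    model = TinySignClip3DCNN(num_classes=1000)  # 1,000 classes as in the MS-ASL1000 subset
    clip = torch.randn(2, 3, 32, 112, 112)       # 2 clips, 32 RGB frames each (assumed shape)
    print(model(clip).shape)                     # torch.Size([2, 1000])
```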
Read and Attend: Temporal Localisation in Sign Language Videos
TLDR
A Transformer model is trained to ingest a continuous signing stream and output a sequence of written tokens on a large-scale collection of signing footage with weakly-aligned subtitles, and it is shown that through this training it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation.
INCLUDE: A Large Scale Dataset for Indian Sign Language Recognition
TLDR
This work presents the Indian Lexicon Sign Language Dataset - INCLUDE - an ISL dataset that contains 0.27 million frames across 4,287 videos over 263 word signs from 15 different word categories and evaluates several deep neural networks combining different methods for augmentation, feature extraction, encoding and decoding.
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison
TLDR
This paper introduces a new large-scale Word-Level American Sign Language (WLASL) video dataset, containing more than 2,000 words performed by over 100 signers, and proposes a novel pose-based temporal graph convolutional network (Pose-TGCN) that models spatial and temporal dependencies in human pose trajectories simultaneously, further boosting the performance of the pose-based method.
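Pose-based approaches such as Pose-TGCN treat the signer's body joints as a graph and convolve over it in space and time. The snippet below is only a schematic sketch of a single graph-convolution layer over pose keypoints, not the authors' code; the toy skeleton, its connectivity, and the feature sizes are assumed for illustration.

```python
import torch
import torch.nn as nn

class PoseGraphConv(nn.Module):
    """One graph-convolution layer over pose keypoints (illustrative sketch)."""

    def __init__(self, in_feats: int, out_feats: int, adjacency: torch.Tensor):
        super().__init__()
        # Symmetrically normalised adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}
        a_hat = adjacency + torch.eye(adjacency.size(0))
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        self.register_buffer("a_norm", deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :])
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, features), e.g. 2-D keypoint coordinates per joint
        x = torch.einsum("ij,btjf->btif", self.a_norm, x)   # mix features of connected joints
        return torch.relu(self.linear(x))

if __name__ == "__main__":
    num_joints = 5                                # toy skeleton; real pipelines use far more keypoints
    edges = [(0, 1), (1, 2), (1, 3), (1, 4)]      # assumed toy connectivity
    adj = torch.zeros(num_joints, num_joints)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    layer = PoseGraphConv(in_feats=2, out_feats=16, adjacency=adj)
    poses = torch.randn(8, 30, num_joints, 2)     # 8 clips, 30 frames, (x, y) per joint
    print(layer(poses).shape)                     # torch.Size([8, 30, 5, 16])
```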
Transferring Cross-Domain Knowledge for Video Sign Language Recognition
TLDR
A novel method is proposed that learns domain-invariant visual concepts and improves WSLR models by transferring knowledge from subtitled news sign language, significantly outperforming previous state-of-the-art methods.
Understanding Motion in Sign Language: A New Structured Translation Dataset
TLDR
An encoder-decoder deep strategy is herein introduced to support automatic translation, including attention modules that capture short, long, and structural kinematic dependencies and their respective relationships with sign recognition.
Watch, read and lookup: learning to spot signs from multiple supervisors
TLDR
This work trains a model using multiple types of available supervision to identify whether and where a given sign occurs in a continuous, co-articulated sign language video, and validates the effectiveness of this approach on low-shot sign-spotting benchmarks.
Hand-Model-Aware Sign Language Recognition
TLDR
A hand prior is introduced and a new hand-model-aware framework for isolated SLR is proposed, using the modeled hand as the intermediate representation together with multiple weakly-supervised losses that constrain its spatial and temporal consistency.
Pose-based Sign Language Recognition using GCN and BERT
TLDR
This work tackles the problem of WSLR using a novel pose-based approach that captures spatial and temporal information separately and performs late fusion, explicitly modeling spatial interactions in the video with a Graph Convolutional Network (GCN) and temporal dependencies with Bidirectional Encoder Representations from Transformers (BERT).
BBC-Oxford British Sign Language Dataset
TLDR
This work describes several strengths and limitations of the data from the perspectives of machine learning and linguistics, notes sources of bias present in the dataset, and discusses potential applications of BOBSL in the context of sign language technology.
Recognition of Non-Manual Content in Continuous Japanese Sign Language
TLDR
A two-stage pipeline based on two-dimensional body joint positions extracted from RGB camera data, which can distinguish word segments of specific non-manual intonations with 86% accuracy from the underlying body joint movement data, constitutes an important contribution to a better understanding of mixed manual and non-manual content in signed communication.

References

Showing 1-10 of 90 references
Neural Sign Language Translation
TLDR
This work formalizes SLT in the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge), allowing the spatial representations, the underlying language model, and the mapping between sign and spoken language to be learned jointly.
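Framing sign language translation as sequence-to-sequence learning means mapping a stream of per-frame visual features to a sequence of spoken-language tokens. The sketch below shows that framing with a standard encoder-decoder Transformer; it is not the paper's architecture, and the feature dimension, vocabulary size, and sequence lengths are assumed.

```python
import torch
import torch.nn as nn

class SignTranslationSketch(nn.Module):
    """Encoder-decoder sketch: per-frame visual features -> spoken-language tokens."""

    def __init__(self, feat_dim: int = 1024, vocab_size: int = 3000, d_model: int = 256):
        super().__init__()
        self.frame_proj = nn.Linear(feat_dim, d_model)        # project CNN frame features
        self.token_emb = nn.Embedding(vocab_size, d_model)    # target word embeddings
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, frame_feats: torch.Tensor, tgt_tokens: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, frames, feat_dim); tgt_tokens: (batch, words)
        src = self.frame_proj(frame_feats)
        tgt = self.token_emb(tgt_tokens)
        causal = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=causal)
        return self.out(hidden)                                # (batch, words, vocab_size)

if __name__ == "__main__":
    model = SignTranslationSketch()
    feats = torch.randn(2, 120, 1024)          # 2 videos, 120 frames of pooled CNN features (assumed)
    tokens = torch.randint(0, 3000, (2, 12))   # 2 target sentences, 12 tokens each (assumed)
    print(model(feats, tokens).shape)          # torch.Size([2, 12, 3000])
```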
The American Sign Language Lexicon Video Dataset
TLDR
The ASL lexicon video dataset is introduced, a large and expanding public dataset containing video sequences of thousands of distinct ASL signs, as well as annotations of those sequences, including start/end frames and class label of every sign.
Video-based Sign Language Recognition without Temporal Segmentation
TLDR
A novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the preprocessing of temporal segmentation.
Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers
TLDR
This work presents a statistical recognition approach performing large-vocabulary continuous sign language recognition across different signers, and for the first time thoroughly presents a system design on a large data set with a true focus on real-life applicability.
Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos
TLDR
This work applies the approach to the domain of sign language recognition, exploiting sequential parallelism to learn sign language, mouth shape and hand shape classifiers; it clearly outperforms the state of the art on all data sets and shows significantly faster convergence with the parallel alignment approach.
Modality Combination Techniques for Continuous Sign Language Recognition
TLDR
Early combination of features, late fusion of decisions, synchronous combination at the hidden Markov model state level, and asynchronous combination at the gloss level are investigated for five modalities on two publicly available benchmark databases consisting of challenging real-life data and less complex lab data.
Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs
TLDR
This work proposes an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on the fly in a weakly supervised fashion; embedded into an HMM, the resulting deep model continuously improves its performance over several re-alignments.
Extensions of the Sign Language Recognition and Translation Corpus RWTH-PHOENIX-Weather
This paper introduces RWTH-PHOENIX-Weather 2014, a video-based, large-vocabulary German sign language corpus which has been extended over the last two years, tripling the size of the original corpus.
ASL-LEX: A lexical database of American Sign Language
TLDR
ASL-LEX is a lexical database that catalogues information about nearly 1,000 signs in ASL, including subjective frequency ratings from 25–31 deaf signers, iconicity ratings from 21–37 hearing non-signers, videoclip duration, sign length, grammatical class, and whether the sign is initialized, a fingerspelled loan sign, or a compound.
Dynamic affine-invariant shape-appearance handshape features and classification in sign language videos
We propose the novel approach of the dynamic affine-invariant shape-appearance model (Aff-SAM) and employ it for handshape classification and sign recognition in sign language (SL) videos. [...]