Transferring Cross-Domain Knowledge for Video Sign Language Recognition

@article{Li2020TransferringCK,
  title={Transferring Cross-Domain Knowledge for Video Sign Language Recognition},
  author={Dongxu Li and Xin Yu and Chenchen Xu and Lars Petersson and Hongdong Li},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={6204-6213}
}
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation. It requires models to recognize isolated sign words from videos. However, annotating WSLR data requires expert knowledge, which limits WSLR dataset acquisition. In contrast, there are abundant subtitled sign news videos on the internet. Since these videos have no word-level annotations and exhibit a large domain gap from isolated signs, they cannot be directly used for training WSLR models. We…
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
TLDR
This paper presents a novel sign video segment representation that takes multiple temporal granularities into account, alleviating the need for accurate video segmentation, and develops a hierarchical sign video feature learning method built on a temporal semantic pyramid network, called TSPNet.
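To make the multi-granularity idea concrete, here is a minimal sketch (not the authors' code) that pools pre-extracted frame features over sliding windows of several sizes; the window sizes, stride, and average pooling are illustrative assumptions.

```python
# A minimal sketch of multi-granularity video segment representations in the
# spirit of TSPNet, assuming per-frame features have already been extracted.
import torch

def pyramid_segments(frame_feats: torch.Tensor, window_sizes=(8, 12, 16), stride=2):
    """frame_feats: (T, D) per-frame features -> one (N_w, D) tensor per
    temporal granularity, obtained by average-pooling sliding windows."""
    segments = []
    for w in window_sizes:
        windows = frame_feats.unfold(0, w, stride)   # (N_w, D, w)
        segments.append(windows.mean(dim=-1))        # pool each window -> (N_w, D)
    return segments

feats = torch.randn(64, 512)           # e.g., 64 frames of 512-d video features
for granularity in pyramid_segments(feats):
    print(granularity.shape)
```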
SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
  • Hezhen Hu, Weichao Zhao, Wengang Zhou, Yuechen Wang, Houqiang Li
  • Computer Science
  • 2021
TLDR
This paper introduces SignBERT, the first self-supervised pre-trainable model for SLR with an incorporated hand prior, which is injected in a model-aware way to better capture hierarchical context over the hand sequence.
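As a rough illustration of the self-supervised pretext behind such pre-training, the sketch below (my simplification, which omits the hand prior and SignBERT's actual tokenization) masks random frames of a hand-pose sequence and trains an encoder to reconstruct them.

```python
# A hedged sketch of masked modeling over hand poses; all sizes are illustrative.
import torch
import torch.nn as nn

class MaskedPoseModel(nn.Module):
    def __init__(self, dim=42, hidden_layers=2):   # 21 hand joints x 2 coords
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=2, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=hidden_layers)
        self.decode = nn.Linear(dim, dim)

    def forward(self, poses, mask):                # poses: (B, T, 42), mask: (B, T) bool
        x = poses.masked_fill(mask.unsqueeze(-1), 0.0)  # zero out masked frames
        return self.decode(self.encoder(x))

poses = torch.randn(2, 30, 42)
mask = torch.rand(2, 30) < 0.15                    # mask ~15% of frames
recon = MaskedPoseModel()(poses, mask)
loss = ((recon - poses)[mask] ** 2).mean()         # reconstruct masked frames only
print(loss.item())
```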
Read and Attend: Temporal Localisation in Sign Language Videos
TLDR
The Transformer model is trained to ingest a continuous signing stream and output a sequence of written tokens, using a large-scale collection of signing footage with weakly aligned subtitles; through this training it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation.
Pose-based Sign Language Recognition using GCN and BERT
TLDR
This work tackles the problem of WSLR with a novel pose-based approach that captures spatial and temporal information separately and performs late fusion: it explicitly captures the spatial interactions in the video using a Graph Convolutional Network (GCN) and models temporal context with Bidirectional Encoder Representations from Transformers (BERT).
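Below is a minimal sketch of such a two-stream, late-fusion design, assuming 2D pose keypoints as input. The layer sizes, the identity-matrix adjacency placeholder, and the use of a generic transformer encoder in place of BERT are all my assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyGCNLayer(nn.Module):
    """One graph convolution over J joints: X' = relu((A_hat @ X) W)."""
    def __init__(self, in_dim, out_dim, adj: torch.Tensor):
        super().__init__()
        self.register_buffer("adj", adj)           # (J, J) normalized adjacency
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                          # x: (B, T, J, C)
        return torch.relu(self.proj(torch.einsum("ij,btjc->btic", self.adj, x)))

class TwoStreamSLR(nn.Module):
    def __init__(self, joints=27, coord_dim=2, hid=64, num_classes=100):
        super().__init__()
        adj = torch.eye(joints)                    # placeholder skeleton graph
        self.gcn = TinyGCNLayer(coord_dim, hid, adj)
        enc = nn.TransformerEncoderLayer(d_model=joints * coord_dim,
                                         nhead=2, batch_first=True)
        self.temporal = nn.TransformerEncoder(enc, num_layers=2)
        self.head_spatial = nn.Linear(hid, num_classes)
        self.head_temporal = nn.Linear(joints * coord_dim, num_classes)

    def forward(self, pose):                       # pose: (B, T, J, 2)
        spatial = self.gcn(pose).mean(dim=(1, 2))  # pool joints and time
        temporal = self.temporal(pose.flatten(2)).mean(dim=1)
        # Late fusion: average the two streams' class scores.
        return 0.5 * (self.head_spatial(spatial) + self.head_temporal(temporal))

logits = TwoStreamSLR()(torch.randn(2, 32, 27, 2))
print(logits.shape)                                # (2, 100)
```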
Visual Alignment Constraint for Continuous Sign Language Recognition
TLDR
This work revisits the overfitting problem in recent CTC-based CSLR works and proposes a Visual Alignment Constraint (VAC) that enhances the feature extractor with more alignment supervision; it also proposes two metrics to evaluate the contributions of the feature extractor and the alignment model, which provide evidence for the overfitting problem.
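The sketch below shows my assumption of the general recipe, not the paper's exact VAC losses: an auxiliary CTC loss applied directly to the visual feature extractor's frame-level predictions, alongside the main CTC loss computed after the temporal/alignment model.

```python
import torch
import torch.nn.functional as F

def vac_style_loss(visual_logits, sequence_logits, targets,
                   input_lens, target_lens, aux_weight=0.25):
    """visual_logits:   (T, B, V) frame-level classifier on visual features
       sequence_logits: (T, B, V) classifier after the alignment model.
       Both get a CTC loss against the gloss targets; the visual branch acts
       as the auxiliary alignment constraint (aux_weight is illustrative)."""
    main = F.ctc_loss(sequence_logits.log_softmax(-1), targets,
                      input_lens, target_lens)
    aux = F.ctc_loss(visual_logits.log_softmax(-1), targets,
                     input_lens, target_lens)
    return main + aux_weight * aux

T, B, V = 40, 2, 30
targets = torch.randint(1, V, (B, 8))              # gloss label sequences
loss = vac_style_loss(torch.randn(T, B, V), torch.randn(T, B, V), targets,
                      torch.full((B,), T, dtype=torch.long),
                      torch.full((B,), 8, dtype=torch.long))
print(loss.item())
```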
Watch, read and lookup: learning to spot signs from multiple supervisors
TLDR
This work trains a model using multiple types of available supervision to identify whether, and where, a given sign has been signed in a continuous, co-articulated sign language video, and validates the effectiveness of this approach on low-shot sign spotting benchmarks.
Improving Sign Language Translation with Monolingual Data by Sign Back-Translation
TLDR
The proposed sign back-translation (SignBT) approach incorporates massive amounts of spoken language text into SLT training and obtains a substantial improvement over previous state-of-the-art SLT methods.
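The core back-translation recipe can be sketched in a few lines; everything below (function names, the toy reverse model) is an illustrative placeholder rather than the paper's actual pipeline.

```python
# Back-translation for translation training: monolingual target-side text is
# translated "backwards" into pseudo source sequences, and the synthetic
# pairs are mixed with the real parallel data to train the forward model.
from typing import Callable, List, Tuple

def build_training_pairs(
    parallel: List[Tuple[str, str]],          # real (sign_gloss, text) pairs
    monolingual_text: List[str],              # abundant spoken-language text
    text_to_gloss: Callable[[str], str],      # a reverse model trained on `parallel`
) -> List[Tuple[str, str]]:
    synthetic = [(text_to_gloss(t), t) for t in monolingual_text]
    return parallel + synthetic               # train the forward model on the union

pairs = build_training_pairs(
    parallel=[("BUCH ICH LESEN", "I am reading a book")],
    monolingual_text=["The weather is nice today"],
    text_to_gloss=lambda t: t.upper(),        # stand-in for a learned model
)
print(pairs)
```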
Phonology Recognition in American Sign Language
TLDR
This paper introduces the idea of exploiting the phonological properties manually assigned by sign language users to classify videos of people performing signs, regressing a 3D mesh from the videos and applying statistical and deep learning algorithms.
Mutual Support of Data Modalities in the Task of Sign Language Recognition
TLDR
This paper presents a method for automatic sign language recognition that was used in the CVPR 2021 ChaLearn Challenge (RGB track) and achieves 95.46% accuracy.
Quantitative Survey of the State of the Art in Sign Language Recognition
TLDR
This study concisely compiles the state of the art to help advance the field and reveal open questions; it identifies shifts in the field from intrusive to non-intrusive capturing and from local to global features, as well as the lack of non-manual parameters in medium- and larger-vocabulary recognition systems.

References

SHOWING 1-10 OF 48 REFERENCES
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison
TLDR
This paper introduces a new large-scale Word-Level American Sign Language (WLASL) video dataset containing more than 2,000 words performed by over 100 signers, and proposes a novel pose-based temporal graph convolutional network (Pose-TGCN) that models spatial and temporal dependencies in human pose trajectories simultaneously, further boosting the performance of the pose-based method.
MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language
TLDR
This work proposes the first real-life, large-scale sign language dataset comprising over 25,000 annotated videos and thoroughly evaluates it with state-of-the-art methods from sign and related action recognition, outperforming the current state of the art by a large margin.
Sign language recognition with recurrent neural network using human keypoint detection
TLDR
This work develops a sign language recognition system utilizing human keypoints extracted from the face, hands, and body, and shows that the system is robust even when the training data is limited in size.
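A minimal sketch of this style of system follows, assuming per-frame 2D keypoints as input and a recurrent network for classification; the keypoint count, hidden size, and temporal pooling are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class KeypointGRU(nn.Module):
    def __init__(self, num_keypoints=130, coords=2, hidden=128, num_classes=100):
        super().__init__()
        self.gru = nn.GRU(num_keypoints * coords, hidden, batch_first=True,
                          bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, kp):                    # kp: (B, T, K, 2) keypoint coords
        out, _ = self.gru(kp.flatten(2))      # (B, T, 2*hidden)
        return self.head(out.mean(dim=1))     # temporal average pooling

logits = KeypointGRU()(torch.randn(4, 60, 130, 2))
print(logits.shape)                           # (4, 100)
```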
Neural Sign Language Translation based on Human Keypoint Estimation
TLDR
This paper introduces the KETI sign language dataset and develops a neural network model for translating sign videos into natural language sentences by utilizing human keypoints extracted from the face, hands, and body.
Learning sign language by watching TV (using weakly aligned subtitles)
TLDR
This work proposes a distance function for matching signing sequences that includes the trajectories of both hands, hand shape and orientation, and properly models the case of hands touching; by optimizing a scoring function based on multiple-instance learning, it extracts the sign of interest from hours of signing footage despite very weak and noisy supervision.
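The multiple-instance-learning intuition can be stated very compactly: a weakly aligned subtitle only says that a sign occurs somewhere in a stretch of video, so each video is a bag of candidate windows scored by its best-matching instance. The sketch below is my distillation of that idea, not the paper's scoring function.

```python
import torch

def mil_bag_score(window_scores: torch.Tensor) -> torch.Tensor:
    """window_scores: (N,) similarity of each candidate window to the query
    sign; under MIL the bag is scored by its best instance."""
    return window_scores.max()

scores = torch.tensor([0.1, 0.7, 0.3])   # three candidate windows
print(mil_bag_score(scores))             # 0.7 -> the bag likely contains the sign
```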
Sign Language Recognition using 3D convolutional neural networks
TLDR
A novel 3D convolutional neural network (CNN) is proposed that automatically extracts discriminative spatio-temporal features from the raw video stream without any prior knowledge, avoiding hand-designed features.
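For illustration, a tiny 3D CNN that classifies a short clip by convolving jointly over time and space can be written as below; the layer sizes and class count are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=3, padding=1),   # (B, 3, T, H, W) -> (B, 16, T, H, W)
    nn.ReLU(),
    nn.MaxPool3d(2),                              # halve time and space
    nn.Conv3d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),                      # global spatio-temporal pooling
    nn.Flatten(),
    nn.Linear(32, 100),                           # 100 sign classes, illustrative
)

clip = torch.randn(2, 3, 16, 112, 112)            # batch of 16-frame RGB clips
print(model(clip).shape)                          # (2, 100)
```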
Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs
TLDR
This work proposes an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on-the-fly in a weakly supervised fashion; embedded into an HMM, the resulting deep model continuously improves its performance over several re-alignments.
Selfie Sign Language Recognition with Convolutional Neural Networks
TLDR
This paper proposes recognizing Indian sign language gestures using convolutional neural networks (CNNs) and achieves a 92.88% recognition rate, compared with other classifier models reported on the same dataset.
Gesture and Sign Language Recognition with Temporal Residual Networks
TLDR
This work approaches gesture and sign language recognition in a continuous video stream as a framewise classification problem, using temporal convolutions and recent advances in deep learning such as residual networks, batch normalization, and exponential linear units.
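A residual temporal-convolution block combining these ingredients can be sketched as follows; this is my illustration under assumed channel and kernel sizes, not the paper's network.

```python
import torch
import torch.nn as nn

class TemporalResBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=5):
        super().__init__()
        pad = kernel_size // 2                    # keep the sequence length
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.bn2 = nn.BatchNorm1d(channels)
        self.act = nn.ELU()                       # exponential linear units

    def forward(self, x):                         # x: (B, C, T) frame features
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + x)                  # residual connection

x = torch.randn(2, 64, 100)                       # 100 frames, 64 channels
print(TemporalResBlock()(x).shape)                # (2, 64, 100) framewise output
```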
Chinese sign language recognition based on video sequence appearance modeling
  • Quan-Xi Yang
  • Computer Science
  • 2010 5th IEEE Conference on Industrial Electronics and Applications
  • 2010
TLDR
Experiments show that this spatio-temporal appearance modeling method is simple, efficient, and effective for characterizing hand gestures, and that the SVM method has excellent classification and generalization ability when learning from small training sets in sign language recognition.
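As an illustration of the classification stage, the sketch below trains an SVM on fixed-length appearance features; the feature dimensionality, RBF kernel, and random stand-in data are my assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 128))   # 40 videos, 128-d appearance features
y_train = rng.integers(0, 5, size=40)  # 5 sign classes
X_test = rng.normal(size=(10, 128))

# SVMs tend to generalize well from small training sets like this one.
clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print(clf.predict(X_test))
```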