Open-Domain Sign Language Translation Learned from Online Video

@article{Shi2022OpenDomainSL,
  title={Open-Domain Sign Language Translation Learned from Online Video},
  author={Bowen Shi and Diane Brentari and Greg Shakhnarovich and Karen Livescu},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.12870}
}
Existing work on sign language translation (that is, translation from sign language videos into sentences in a written language) has focused mainly on (1) data collected in a controlled environment or (2) data in a specific domain, which limits the applicability to real-world settings. In this paper, we introduce OpenASL, a large-scale ASL-English dataset collected from online video sites (e.g., YouTube). OpenASL contains 288 hours of ASL videos in various domains (news, vlogs, etc.) from over…

Topic Detection in Continuous Sign Language Videos

TLDR
This work introduces the novel task of topic detection in continuous sign language videos, provides strong baselines for the task, and presents a comparison of visual features commonly used in the sign language domain.

References


TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

TLDR
This paper presents a novel sign video segment representation that takes into account multiple temporal granularities, alleviating the need for accurate video segmentation, and develops a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet.
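
A toy sketch (Python/PyTorch, not the authors' code) of the multi-granularity idea: pool per-frame features over sliding windows of several widths, so the model sees candidate segments at several temporal scales instead of relying on one hard segmentation. The window sizes, stride, and average pooling here are illustrative assumptions.

import torch

def multi_granularity_features(frames, window_sizes=(8, 12, 16), stride=4):
    # frames: (T, D) tensor of per-frame feature vectors.
    # For each window width, slide over time and average-pool the frames
    # in each window, yielding one segment-level feature per window.
    pyramid = []
    for w in window_sizes:
        windows = frames.unfold(0, w, stride)  # (num_windows, D, w)
        pyramid.append(windows.mean(dim=-1))   # (num_windows, D)
    return pyramid  # one tensor per temporal granularity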

Neural Sign Language Translation

TLDR
This work formalizes SLT in the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge), making it possible to jointly learn the spatial representations, the underlying language model, and the mapping between sign and spoken language.

Multi-channel Transformers for Multi-articulatory Sign Language Translation

TLDR
This paper tackles the multi-articulatory sign language translation task and proposes a novel multi-channel transformer architecture that overcomes the reliance on gloss annotations underpinning other state-of-the-art approaches, thereby removing the future need for expensive curated datasets.

Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation

TLDR
A novel transformer-based architecture is introduced that jointly learns continuous sign language recognition and translation while being trainable end-to-end, using a Connectionist Temporal Classification (CTC) loss to bind the recognition and translation problems into a single unified architecture.
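
As a rough illustration of how a CTC loss can bind the two problems, the sketch below (PyTorch; all names and the weighting scheme are hypothetical, not the paper's code) combines a CTC loss over frame-level gloss predictions with a cross-entropy loss over the translation decoder's outputs.

import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)   # recognition loss
xent = nn.CrossEntropyLoss(ignore_index=0)      # translation loss (0 = pad)

def joint_loss(gloss_log_probs, glosses, frame_lens, gloss_lens,
               word_logits, words, ctc_weight=1.0):
    # gloss_log_probs: (T, B, num_glosses), log-softmaxed encoder outputs
    # word_logits:     (B, L, vocab), decoder outputs before softmax
    recognition = ctc(gloss_log_probs, glosses, frame_lens, gloss_lens)
    translation = xent(word_logits.reshape(-1, word_logits.size(-1)),
                       words.reshape(-1))
    return ctc_weight * recognition + translation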

Neural Sign Language Translation based on Human Keypoint Estimation

TLDR
This paper introduces the KETI sign language dataset and develops a neural network model for translating sign videos into natural language sentences by utilizing human keypoints extracted from the face, hands, and body.

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

TLDR
A new Two-Stream Inflated 3D ConvNet (I3D) based on 2D ConvNet inflation is introduced; after pre-training on Kinetics, I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101.
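
The inflation trick itself fits in a few lines: each pretrained 2D filter is repeated along a new temporal axis and rescaled by 1/T, so the inflated 3D filter reproduces the 2D filter's activations on a temporally constant ("boring") video. A minimal PyTorch sketch, assuming weights in the standard (out, in, kH, kW) layout:

import torch

def inflate_conv2d_weight(w2d, t):
    # w2d: (out_ch, in_ch, kH, kW) pretrained 2D convolution weights.
    # Repeat along a new time axis of length t and divide by t so the
    # filter's response to a frame repeated t times matches the 2D case.
    return w2d.unsqueeze(2).repeat(1, 1, t, 1, 1) / t  # (out, in, t, kH, kW)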

Attention is All you Need

TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, applying successfully to English constituency parsing with both large and limited training data.
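
The paper's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal Python/PyTorch sketch (the masking convention is an assumption):

import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (..., seq_len, d_k) query, key, and value tensors.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    return F.softmax(scores, dim=-1) @ v  # attention-weighted values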

ROUGE: A Package for Automatic Evaluation of Summaries

TLDR
Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, along with their evaluations.
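
For concreteness, ROUGE-N is the recall of n-gram overlap against the reference: the number of n-grams shared by candidate and reference (clipped to reference counts), divided by the total number of n-grams in the reference. A small self-contained Python sketch (whitespace tokenization is a simplifying assumption):

from collections import Counter

def rouge_n(candidate, reference, n=2):
    def ngrams(text):
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    return overlap / max(sum(ref.values()), 1)  # recall w.r.t. reference

# e.g., rouge_n("the cat sat on the mat", "the cat is on the mat", n=2) == 0.6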

American Sign Language Fingerspelling Recognition in the Wild

TLDR
This work introduces the largest dataset available so far for fingerspelling recognition (and the first using naturally occurring video data) and presents the first attempt to recognize fingerspelling sequences in this challenging setting.

A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation

TLDR
A simple transfer learning baseline for sign language translation surpasses the previous state-of-the-art results on two sign language translation benchmarks, demonstrating the effectiveness of transfer learning.