• Computer Science
  • Published in ICCV 2019

VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

@article{Wang2019VATEXAL,
  title={VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research},
  author={Xin Wang and Jiawei Wu and Junkun Chen and Lei Li and Yuan-fang Wang and William Yang Wang},
  journal={ArXiv},
  year={2019},
  volume={abs/1904.03493}
}
We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely-used MSR-VTT dataset, VATEX is multilingual, larger, linguistically complex, and more diverse in terms of both video and natural language descriptions. We also introduce two tasks for video-and-language research based on VATEX…

Citations

Publications citing this paper.

Imperial College London Submission to VATEX Video Captioning Task

Cross-Lingual Vision-Language Navigation

References

Publications referenced by this paper.

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

Collecting Highly Parallel Data for Paraphrase Evaluation

Bidirectional recurrent neural networks

Long Short-Term Memory

nocaps: novel object captioning at scale

COCO-CN for Cross-Lingual Image Tagging, Captioning, and Retrieval
