A Large Dataset of Historical Japanese Documents with Complex Layouts

Zejiang Shen, Kaixuan Zhang, Melissa Dell
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Deep learning-based approaches for automatic document layout analysis and content extraction have the potential to unlock rich information trapped in historical documents on a large scale. One major hurdle is the lack of large datasets for training robust models. In particular, little training data exists for Asian languages. To this end, we present HJDataset, a Large Dataset of Historical Japanese Documents with Complex Layouts. It contains over 250,000 layout element annotations of seven types…

