COCO_TS Dataset: Pixel-level Annotations Based on Weak Supervision for Scene Text Segmentation

Authors: Simone Bonechi, Paolo Andreini, Monica Bianchini, Franco Scarselli
Venue: arXiv: Computer Vision and Pattern Recognition
The absence of large-scale datasets with pixel-level supervision is a significant obstacle to training deep convolutional networks for scene text segmentation. For this reason, synthetic data generation is normally employed to enlarge the training dataset. Nonetheless, synthetic data cannot reproduce the complexity and variability of natural images. In this paper, a weakly supervised learning approach is used to reduce the shift between training on real and synthetic data. Pixel-level…
2 Citations
Syllabification of the Divine Comedy
An online vocabulary provides, for each word, information about its syllabification, the location of the tonic accent, and its synalephe propensity on the left and right sides. This opens a wide range of perspectives, from automatically learning syllabified words and verses to improving generative models.
A Two Stage GAN for High Resolution Retinal Image Generation and Segmentation
This paper uses Generative Adversarial Networks to synthesize high-quality retinal images, along with the corresponding semantic label maps, to be used instead of real images during training. The generated realistic high-resolution images can be effectively used to enlarge small available datasets.


Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks
In this method, a CNN-based text-aware candidate text region (CTR) extraction model is designed and trained using both the edges and the whole regions of text; this model is then used to detect coarse CTRs.
Synthetic Data for Text Localisation in Natural Images
The relation of FCRN to the recently introduced YOLO detector, as well as to other end-to-end object detection systems based on deep learning, is discussed.
A Deep Learning Approach to Bacterial Colony Segmentation
A scalable alternative to human ground-truth supervision, which is often difficult to obtain in medical imaging due to privacy issues and scarcity of data, is provided: synthetically generated data, together with a few annotated images, were used to train a Fully Convolutional Network.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
This work addresses the task of semantic image segmentation with deep learning. It proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
Generating Bounding Box Supervision for Semantic Segmentation with Deep Learning
This paper proposes a deep learning approach that uses bounding box annotations to train a semantic segmentation network, obtaining competitive results compared to a fully supervised setting.
Pyramid Scene Parsing Network
This paper exploits global context information through different-region-based context aggregation, using the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet), to produce good quality results on the scene parsing task.
Simple Does It: Weakly Supervised Instance and Semantic Segmentation
This work proposes a new approach that does not require modification of the segmentation training procedure and shows that, when the input labels are carefully designed from given bounding boxes, even a single round of training is enough to improve over previously reported weakly supervised results.
Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
DeconvNet is fine-tuned and benchmarked on Total-Text to evaluate its robustness against curved text and to facilitate a new research direction for the scene text community.
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
This paper describes the COCO-Text dataset, which contains over 173k text annotations in over 63k images, and presents an analysis of three leading state-of-the-art photo Optical Character Recognition (OCR) approaches on the dataset.
The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes
This paper generates a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations, and conducts experiments with DCNNs that show how the inclusion of SYNTHIA in the training stage significantly improves performance on the semantic segmentation task.