Corpus ID: 233181548

1st Place Solution to ICDAR 2021 RRC-ICTEXT End-to-end Text Spotting and Aesthetic Assessment on Integrated Circuit

@article{Wang20211stPS,
  title={1st Place Solution to ICDAR 2021 RRC-ICTEXT End-to-end Text Spotting and Aesthetic Assessment on Integrated Circuit},
  author={Qiyao Wang and Pengfei Li and Li Zhu and Yi Niu},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.03544}
}
This paper presents our methods for the ICDAR 2021 Robust Reading Challenge on Integrated Circuit Text Spotting and Aesthetic Assessment (ICDAR RRC-ICTEXT 2021). For the text spotting task, we detect and classify the characters on integrated circuits with a YOLOv5 detection model. We balance lowercase and non-lowercase characters using SynthText, generated data, and a data sampler. We adopt a semi-supervised algorithm and distillation to further improve the model’s accuracy. For the aesthetic…

ICDAR 2021 Competition on Integrated Circuit Text Spotting and Aesthetic Assessment
TLDR
A text-on-chips dataset, ICText, is used as the main target of the proposed Robust Reading Challenge on Integrated Circuit Text Spotting and Aesthetic Assessment (RRC-ICText) 2021, to encourage research on this problem.

References

YOLOv4: Optimal Speed and Accuracy of Object Detection
TLDR
This work uses new features (WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss) and combines some of them to achieve state-of-the-art results: 43.5% AP on the MS COCO dataset at a real-time speed of ~65 FPS on a Tesla V100.
Weighted Boxes Fusion: ensembling boxes for object detection models
TLDR
A novel Weighted Boxes Fusion (WBF) ensembling algorithm that boosts performance by ensembling predictions from different object detection models, evaluated on predictions of different models trained on the large Open Images Dataset.
Editing Text in the Wild
TLDR
This work proposes an end-to-end trainable style retention network (SRNet) that consists of three modules: text conversion module, background inpainting module and fusion module, which is the first attempt to edit text in natural images at the word level.
LVIS: A Dataset for Large Vocabulary Instance Segmentation
TLDR
This work introduces LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation, which has a long tail of categories with few training samples due to the Zipfian distribution of categories in natural images.
Synthetic Data for Text Localisation in Natural Images
TLDR
The relation of FCRN to the recently introduced YOLO detector, as well as to other end-to-end object detection systems based on deep learning, is discussed.
Distilling the Knowledge in a Neural Network
TLDR
This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
ICDAR RRC-ICTEXT