Line Segment Detection Using Transformers without Edges

@article{Xu2021LineSD,
  title={Line Segment Detection Using Transformers without Edges},
  author={Yifan Xu and Weijian Xu and David Cheung and Zhuowen Tu},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={4255-4264}
}
  • Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu
  • Published 6 January 2021
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we present a joint end-to-end line segment detection algorithm using Transformers that is free of post-processing and of heuristics-guided intermediate processing (edge/junction/region detection). Our method, named LinE segment TRansformers (LETR), takes advantage of integrated tokenized queries, a self-attention mechanism, and an encoding-decoding strategy within Transformers by skipping standard heuristic designs for the edge element detection and perceptual grouping processes…
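
As a rough sketch of the set-prediction formulation described in the abstract (not the authors' released implementation; the module names, dimensions, and decoder configuration below are illustrative assumptions), a minimal DETR-style line segment head in PyTorch could look as follows: a fixed set of learned line queries cross-attends to encoded image features, and each query regresses two endpoints plus a line/no-line score.

    import torch
    import torch.nn as nn

    class MinimalLineSegmentHead(nn.Module):
        """Toy DETR-style head: learned queries attend to image features and each
        query predicts one candidate line segment (x1, y1, x2, y2) plus a score."""
        def __init__(self, d_model=256, num_queries=100, num_decoder_layers=6):
            super().__init__()
            self.queries = nn.Embedding(num_queries, d_model)   # tokenized line queries
            layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_decoder_layers)
            self.endpoint_head = nn.Linear(d_model, 4)           # two endpoints in [0, 1]
            self.score_head = nn.Linear(d_model, 2)              # line vs. no-line logits

        def forward(self, image_features):
            # image_features: (batch, num_tokens, d_model) from a backbone + transformer encoder
            b = image_features.size(0)
            q = self.queries.weight.unsqueeze(0).repeat(b, 1, 1)
            h = self.decoder(q, image_features)                  # cross-attention over features
            endpoints = self.endpoint_head(h).sigmoid()          # (b, num_queries, 4)
            logits = self.score_head(h)                          # (b, num_queries, 2)
            return endpoints, logits

    # usage sketch
    feats = torch.randn(2, 49, 256)      # e.g. a 7x7 feature map flattened to 49 tokens
    endpoints, logits = MinimalLineSegmentHead()(feats)

During training, detectors of this kind pair the per-query predictions with ground-truth segments via bipartite matching and apply endpoint regression and classification losses; the sketch above only covers the forward pass.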

Citations

Hole-robust Wireframe Detection
TLDR
This work is the first to incorporate a GAN into the model so that it can better predict underlying scene structure even across large holes, and it introduces pseudo labeling to further enlarge model capacity and overcome small-scale labeled data.
CTRL-C: Camera calibration TRansformer with Line-Classification
TLDR
This work proposes Camera calibration TRansformer with Line-Classification (CTRL-C), an end-to-end neural network-based approach to single image camera calibration, which directly estimates the camera parameters from an image and a set of line segments.
HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
TLDR
A novel attention-based neural network for structured reconstruction that takes a 2D raster image as input and reconstructs a planar graph depicting the underlying geometric structure, demonstrating the superiority of this approach over the state of the art.
ELSED: Enhanced Line SEgment Drawing
Glass Segmentation with RGB-Thermal Image Pairs
This paper proposes a new glass segmentation method utilizing paired RGB and thermal images. Due to the large difference between the transmission property of visible light and that of the thermal…
RNGDet: Road Network Graph Detection by Transformer in Aerial Images
TLDR
A novel approach based on transformers and imitation learning, named RNGDet (Road Network Graph Detection by Transformer), that can handle complicated intersection points with varying numbers of road segments and is superior to existing segmentation-based approaches.
Text Spotting Transformers
TLDR
This paper presents TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild, and designs a bounding-box guided polygon detection process.
Conditional DETR for Fast Training Convergence
TLDR
The approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention, which narrows down the spatial range for localizing the distinct regions for object classification and box regression, thus relaxing the dependence on the content embeddings and easing the training.
ELSD: Efficient Line Segment Detector and Descriptor
TLDR
The novel Efficient Line Segment Detector and Descriptor (ELSD) simultaneously detects line segments and extracts their descriptors in an image, achieving state-of-the-art performance on the Wireframe and YorkUrban datasets in both accuracy and efficiency.
Fully Convolutional Line Parsing
TLDR
This work presents a one-stage Fully Convolutional Line Parsing network (F-Clip) that detects line segments from images and achieves a significantly better trade-off between efficiency and accuracy, resulting in a real-time line detector at up to 73 FPS on a single GPU.

References

Showing 1-10 of 41 references
End-to-End Object Detection with Transformers
TLDR
This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster R-CNN baseline on the challenging COCO object detection dataset.
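
To make the direct set-prediction view in this entry concrete, the snippet below sketches the one-to-one bipartite matching step using SciPy's Hungarian solver, with a plain L1 cost over boxes standing in for DETR's full matching cost (which also mixes in classification and generalized IoU terms); the function and variable names are illustrative.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_predictions_to_targets(pred_boxes, gt_boxes):
        """Toy bipartite matching: cost is the L1 distance between predicted and
        ground-truth boxes; DETR's actual cost also includes class and GIoU terms."""
        # pred_boxes: (num_queries, 4), gt_boxes: (num_gt, 4)
        cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
        pred_idx, gt_idx = linear_sum_assignment(cost)   # one prediction per ground-truth object
        return list(zip(pred_idx.tolist(), gt_idx.tolist()))

    # usage sketch
    preds = np.random.rand(100, 4)
    gts = np.random.rand(5, 4)
    print(match_predictions_to_targets(preds, gts))      # 5 matched (prediction, target) pairs

Unmatched queries are treated as "no object" and only incur a classification loss, which is what lets the model emit a variable number of detections without post-processing.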
Holistically-Attracted Wireframe Parsing
TLDR
This paper presents a fast and parsimonious parsing method to accurately and robustly detect a vectorized wireframe in an input image with a single forward pass, and is thus called Holistically-Attracted Wireframe Parser (HAWP).
Learning Attraction Field Representation for Robust Line Segment Detection
TLDR
A region-partition based attraction field dual representation for line segment maps, which poses the problem of line segment detection (LSD) as a region coloring problem and harnesses best practices developed in ConvNet-based semantic segmentation methods, such as the encoder-decoder architecture and atrous convolution.
Learning to Parse Wireframes in Images of Man-Made Environments
TLDR
A learning-based approach to the task of automatically extracting a "wireframe" representation for images of cluttered man-made environments and two convolutional neural networks that are suitable for extracting junctions and lines with large spatial support are proposed.
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as demonstrated by successfully applying it to English constituency parsing with both large and limited training data.
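
The Transformer's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V; a small self-contained PyTorch sketch of that computation (illustrative shapes, single head, no masking) is given below.

    import math
    import torch

    def scaled_dot_product_attention(q, k, v):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (..., seq_q, seq_k)
        weights = scores.softmax(dim=-1)                     # attention weights sum to 1 per query
        return weights @ v                                   # (..., seq_q, d_v)

    # usage sketch: 1 batch, 4 tokens, 8-dimensional head
    q = k = v = torch.randn(1, 4, 8)
    out = scaled_dot_product_attention(q, k, v)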
Efficient Edge-Based Methods for Estimating Manhattan Frames in Urban Imagery
TLDR
It is argued that in a sense, less can be more: that basing estimation on sparse, accurately localized edges, rather than dense gradient maps, permits the derivation of more accurate statistical models and leads to more efficient estimation.
End-to-End Wireframe Parsing
TLDR
This work presents a conceptually simple yet effective algorithm that significantly outperforms the previous state-of-the-art wireframe and line extraction algorithms and proposes a new metric for wireframe evaluation that penalizes overlapped line segments and incorrect line connectivities.
Holistically-Nested Edge Detection
  • Saining Xie, Z. Tu
  • Computer Science
  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
TLDR
HED performs image-to-image prediction by means of a deep learning model that leverages fully convolutional neural networks and deeply-supervised nets, and automatically learns rich hierarchical representations that are important in order to resolve the challenging ambiguity in edge and object boundary detection.
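
As a toy illustration of the deep supervision idea mentioned in this entry (the actual HED uses a VGG backbone with class-balanced cross-entropy; the tiny network, layer sizes, and plain BCE loss below are assumptions made for brevity), each stage of a fully convolutional network emits a side edge map that receives its own loss, alongside a learned fusion of all side outputs.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyDeeplySupervisedEdgeNet(nn.Module):
        """Toy deep supervision: every backbone stage emits a side edge map,
        and a 1x1 convolution fuses the upsampled side outputs."""
        def __init__(self):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
            self.stage2 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
            self.side1 = nn.Conv2d(16, 1, 1)   # side output at full resolution
            self.side2 = nn.Conv2d(32, 1, 1)   # side output at half resolution
            self.fuse = nn.Conv2d(2, 1, 1)     # learned fusion of side outputs

        def forward(self, x):
            f1 = self.stage1(x)
            f2 = self.stage2(f1)
            s1 = self.side1(f1)
            s2 = F.interpolate(self.side2(f2), size=x.shape[-2:], mode="bilinear", align_corners=False)
            fused = self.fuse(torch.cat([s1, s2], dim=1))
            return [s1, s2, fused]             # every output gets its own edge loss

    # usage sketch: deep supervision = a loss on every side output plus the fused map
    img = torch.randn(1, 3, 64, 64)
    edges = torch.randint(0, 2, (1, 1, 64, 64)).float()
    outputs = TinyDeeplySupervisedEdgeNet()(img)
    loss = sum(F.binary_cross_entropy_with_logits(o, edges) for o in outputs)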
Focal Loss for Dense Object Detection
TLDR
This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross-entropy loss so that it down-weights the loss assigned to well-classified examples. The resulting Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
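
The reshaped cross-entropy described here is the focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); a minimal binary version is sketched below, with alpha = 0.25 and gamma = 2 as commonly cited defaults (treat the exact code as illustrative rather than the paper's reference implementation).

    import torch
    import torch.nn.functional as F

    def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t), for binary targets in {0, 1}."""
        p = torch.sigmoid(logits)
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")  # -log(p_t)
        p_t = p * targets + (1 - p) * (1 - targets)
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * ce).mean()

    # usage sketch
    logits = torch.randn(8)
    targets = torch.randint(0, 2, (8,)).float()
    loss = binary_focal_loss(logits, targets)

The (1 - p_t)^gamma factor shrinks the contribution of confidently classified examples, which is what keeps easy negatives from dominating the gradient.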
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.