Corpus ID: 239050540

Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data

@article{Howe2021WeaklyST,
  title={Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data},
  author={Matthew Howe and Ian D. Reid and Jamie Mackenzie},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.10966}
}
Accurate 7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users. In principle, this could be achieved by a single-camera system capable of detecting the pose of each vehicle, but this would require a large, accurately labelled dataset from which to train the detector. Although large vehicle pose datasets exist (ostensibly developed for autonomous vehicles), we find training on these datasets inadequate. These datasets…
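For concreteness, the 7DoF pose referred to above is, in the usual convention for 3D vehicle detection, a 3D centre, a 3D extent, and a heading (yaw) angle. The sketch below is purely illustrative (it assumes a z-up world frame and is not code from the paper); it shows one way to parameterise such a box and recover its corners.

# Illustrative sketch only (not from the paper): a 7-DoF box parameterisation.
from dataclasses import dataclass
import numpy as np

@dataclass
class VehiclePose7DoF:
    # 3D centre in a shared world frame (metres); z-up axis convention assumed
    x: float
    y: float
    z: float
    # box extent (metres)
    length: float
    width: float
    height: float
    # heading about the vertical axis (radians)
    yaw: float

    def corners(self) -> np.ndarray:
        """Return the 8 box corners as an (8, 3) array in the world frame."""
        l, w, h = self.length / 2, self.width / 2, self.height / 2
        offsets = np.array([[sx * l, sy * w, sz * h]
                            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
        c, s = np.cos(self.yaw), np.sin(self.yaw)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # yaw about z
        return offsets @ rot.T + np.array([self.x, self.y, self.z])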


References

Showing 1-10 of 42 references
Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss
TLDR: It is argued that the SS3D architecture provides a solid framework upon which high performing detection systems can be built, with autonomous driving being the main application in mind.
Monocular 3D Object Detection via Geometric Reasoning on Keypoints
TLDR: This paper proposes a novel keypoint-based approach for 3D object detection and localization from a single RGB image, building a multi-branch model around 2D keypoint detection in images and complementing it with a conceptually simple geometric reasoning method.
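For fixed traffic cameras, a common geometric building block behind keypoint-based reasoning of this kind is intersecting a keypoint's viewing ray with the known road surface. The snippet below is a generic illustration of that single step under an assumed pinhole model and a z = 0 ground plane; it is not the method of the cited paper.

# Generic ground-plane back-projection (illustrative, not the cited method).
# K: 3x3 intrinsics; R, t: world-to-camera rotation and translation.
import numpy as np

def backproject_to_ground(u, v, K, R, t):
    """Intersect the viewing ray of pixel (u, v) with the world plane z = 0."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction, camera frame
    ray_world = R.T @ ray_cam                           # rotate into the world frame
    cam_centre = -R.T @ t                               # camera centre in world coords
    s = -cam_centre[2] / ray_world[2]                   # scale that reaches z = 0
    return cam_centre + s * ray_world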
WILDTRACK: A Multi-camera HD Dataset for Dense Unscripted Pedestrian Detection
TLDR: A new large-scale, high-resolution dataset captured with seven static cameras in a public open area, featuring unscripted dense groups of pedestrians standing and walking; it provides an accurate joint (extrinsic and intrinsic) calibration, as well as 7 series of 400 annotated frames for detection at a rate of 2 frames per second.
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
G. Brazil, Xiaoming Liu. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
TLDR: M3D-RPN is able to significantly improve the performance of both the monocular 3D object detection and bird's eye view tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.
MonoGRNet: A General Framework for Monocular 3D Object Detection
TLDR: The task decomposition significantly facilitates monocular 3D object detection, allowing the target 3D bounding boxes to be efficiently predicted in a single forward pass, without using object proposals, post-processing, or the computationally expensive pixel-level depth estimation utilized by previous methods.
Learning Monocular 3D Human Pose Estimation from Multi-view Images
TLDR: This paper trains the system to predict the same pose in all views and proposes a method to estimate camera pose jointly with human pose, which allows the use of multi-view footage where calibration is difficult, e.g., from pan-tilt or moving handheld cameras.
Hand Keypoint Detection in Single Images Using Multiview Bootstrapping
TLDR: An approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand, and derives a result analytically relating the minimum number of views to achieve target true and false positive rates for a given detector.
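The bootstrapping loop summarised above can be sketched generically: run the current detector in every calibrated view, triangulate, and keep only the detections that reproject consistently as new training labels. The outline below is an assumption-laden sketch (a single object, one 2D detection per view, user-supplied projection matrices and a `detect` callable), not the authors' implementation.

# Schematic multiview bootstrapping for pseudo-label generation (illustrative).
import numpy as np

def triangulate(points_2d, proj_mats):
    """Linear (DLT) triangulation of one point seen in several calibrated views."""
    A = []
    for (u, v), P in zip(points_2d, proj_mats):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(A))
    X = vt[-1]
    return X[:3] / X[3]

def bootstrap_labels(images, proj_mats, detect, reproj_thresh=3.0):
    """Keep only multi-view-consistent detections as new (pseudo) labels."""
    dets = [detect(img) for img in images]        # one (u, v) detection per view
    X = triangulate(dets, proj_mats)              # fused 3D estimate
    keep = []
    for (u, v), P in zip(dets, proj_mats):
        x = P @ np.append(X, 1.0)                 # reproject into this view
        err = np.hypot(x[0] / x[2] - u, x[1] / x[2] - v)
        if err < reproj_thresh:
            keep.append(((u, v), X))              # accept as a pseudo-label
    return keep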
CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles
TLDR: This work develops a framework that fuses single-view feature tracks and multi-view part detections to significantly improve the detection, localization, and reconstruction of moving vehicles, even in the presence of strong occlusions.
Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild
TLDR: A novel end-to-end learning framework that enables weakly supervised training using multi-view consistency, with an objective function that can only be minimized when the predictions of the trained model are consistent and plausible across all camera views.
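A consistency objective of the kind summarised above can be illustrated, very roughly, as follows: per-view 3D predictions are mapped into a shared world frame and penalised for disagreeing with their consensus. The variable names and the specific penalty below are assumptions for illustration, not the paper's formulation.

# Rough sketch of a multi-view consistency penalty (illustrative assumptions).
import numpy as np

def multiview_consistency_loss(preds_cam, rotations, translations):
    """preds_cam: list of (N, 3) arrays, one per view, in each camera's frame.
    rotations/translations: camera-to-world transforms, one pair per view."""
    preds_world = [p @ R.T + t for p, R, t in zip(preds_cam, rotations, translations)]
    stacked = np.stack(preds_world)                  # (num_views, N, 3)
    consensus = stacked.mean(axis=0, keepdims=True)  # mean prediction across views
    return float(np.mean(np.linalg.norm(stacked - consensus, axis=-1)))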
nuScenes: A Multimodal Dataset for Autonomous Driving
Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image-based benchmark datasets have driven development in computer vision tasks such as object…