Learning by Watching

  • Jimuyang Zhang, Eshed Ohn-Bar
  • Published 1 June 2021
  • Computer Science
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
When in a new situation or geographical location, human drivers have an extraordinary ability to watch others and learn maneuvers that they themselves may have never performed. In contrast, existing techniques for learning to drive preclude such a possibility as they assume direct access to an instrumented ego-vehicle with fully known observations and expert driver actions. However, such measurements cannot be directly accessed for the non-ego vehicles when learning by watching others… 

Figures and Tables from this paper

Citations

Causal Imitative Model for Autonomous Driving
This paper proposes the Causal Imitative Model (CIM), a model that explicitly discovers the causal structure and uses it to train the policy; by shrinking the input dimension to only two, it can adapt to new environments in a few-shot setting.
Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline
Current end-to-end autonomous driving methods either run a controller based on a planned trajectory or perform control prediction directly, two approaches that have so far been studied as separate lines of research.
Learn-to-Race Challenge 2022: Benchmarking Safe Learning and Cross-domain Generalisation in Autonomous Racing
The main goal of the challenge is to evaluate the joint safety, performance, and generalisation capabilities of reinforcement learning agents on multi-modal perception, through a two-stage process.
SelfD: Self-Learning Large-Scale Driving Policies From the Web
This work introduces SelfD, a framework for learning scalable driving by utilizing large amounts of online monocular images and proposes a pseudo-labeling step which enables making full use of highly diverse demonstration data through “hypothetical” planning-based data augmentation.
Learning from All Vehicles
A system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes, which outperforms all prior methods on the public CARLA Leaderboard by a wide margin.
X-World: Accessibility, Vision, and Autonomy Meet
X-World, an accessibility-centered development environment for vision-based autonomous systems, is introduced, and its contributions provide an initial step toward widespread deployment of vision-based agents that can perceive and model the interaction needs of diverse people with disabilities.
Safe Autonomous Racing via Approximate Reachability on Ego-vision
This work proposes to incorporate Hamilton-Jacobi (HJ) reachability theory, a safety verification method for general non-linear systems, into the constrained Markov decision process (CMDP) framework, and demonstrates that with neural approximation, the HJ safety value can be learned directly on vision context, the highest-dimensional problem studied via the method to date.
Designed to Cooperate: A Kant-Inspired Ethic of Machine-to-Machine Cooperation
This position paper argues that machines capable of autonomous sensing, decision-making, and action, such as automated vehicles and urban robots, owned and used by different self-interested parties with their own agendas, should be designed and built to behave cooperatively, especially when they share public spaces.

References

Learning by Cheating
This work shows that this challenging learning problem can be simplified by decomposing it into two stages and uses the presented approach to train a vision-based autonomous driving system that substantially outperforms the state of the art on the CARLA benchmark and the recent NoCrash benchmark.
Exploring the Limitations of Behavior Cloning for Autonomous Driving
It is shown that behavior cloning leads to state-of-the-art results, executing complex lateral and longitudinal maneuvers even in unseen environments without being explicitly programmed to do so; several limitations of the behavior cloning approach are also confirmed.
End-to-End Driving Via Conditional Imitation Learning
This work evaluates different architectures for conditional imitation learning in vision-based driving and conducts experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area.
Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving
The proposed multi-view fusion approach improves the state of the art on proprietary large-scale real-world data collected by a fleet of self-driving vehicles, as well as on the public nuScenes dataset, with a minimal increase in computational cost.
Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
In pursuit of the goal of learning dense representations for motion planning, it is shown that the representations inferred by the model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by the network.
One Thousand and One Hours: Self-driving Motion Prediction Dataset
This dataset was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California over a four-month period, and forms the largest, most complete, and most detailed dataset to date for the development of self-driving machine learning tasks such as motion forecasting, planning, and simulation.
FISHING Net: Future Inference of Semantic Heatmaps In Grids
This work presents an end-to-end pipeline that performs semantic segmentation and short term prediction using a top-down semantic grid representation and finds this representation favorable as it is agnostic to sensor-specific reference frames and captures both the semantic and geometric information for the surrounding scene.
Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts?
This work proposes an epistemic uncertainty-aware planning method, called robust imitative planning (RIP), which can detect and recover from some distribution shifts, reducing overconfident and catastrophic extrapolations in out-of-distribution (OOD) scenes.
PnPNet: End-to-End Perception and Prediction With Tracking in the Loop
This work proposes PnPNet, an end-to-end model that takes as input sequential sensor data, and outputs at each time step object tracks and their future trajectories, and shows significant improvements over the state-of-the-art with better occlusion recovery and more accurate future prediction.
Label Efficient Visual Abstractions for Autonomous Driving
This work seeks to quantify the impact of reducing segmentation annotation costs on learned behavior cloning agents, and finds that state-of-the-art driving performance can be achieved with orders of magnitude reduction in annotation cost.