• Publications
Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors
TLDR
Improvements in facial landmark detection on both images and video and significant reduction of jittering in video detections are demonstrated.
3D Multi-Object Tracking: A Baseline and New Evaluation Metrics
TLDR
Surprisingly, although the proposed system does not use any 2D data as inputs, it achieves competitive performance on the KITTI 2D MOT leaderboard and runs at a rate of 207.4 FPS, achieving the fastest speed among all modern MOT systems.
A Baseline for 3D Multi-Object Tracking
TLDR
This work proposes a simple yet accurate real-time baseline 3D MOT system, using an off-the-shelf 3D object detector to obtain oriented 3D bounding boxes from the LiDAR point cloud and using a combination of 3D Kalman filter and Hungarian algorithm for state estimation and data association.
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud
  • Xinshuo Weng, Kris Kitani
  • Computer Science, Environmental Science
    IEEE/CVF International Conference on Computer…
  • 23 March 2019
TLDR
This work aims to bridge the performance gap between 3D sensing and 2D sensing for 3D object detection by enhancing LiDAR-based algorithms to work with a single input image, building on end-to-end pseudo-LiDAR methods.
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting
TLDR
A stochastic multi-agent trajectory prediction model that can attend to features of any agent at any previous timestep when inferring an agent’s future position is proposed and significantly improves the state of the art on well-established pedestrian and autonomous driving datasets.
GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning
TLDR
This work proposes two techniques to improve discriminative feature learning for MOT: a novel feature interaction mechanism based on a Graph Neural Network, and a novel joint feature extractor that learns appearance and motion features from 2D and 3D space simultaneously.
Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting
TLDR
This work proposes to first forecast 3D sensor data and then detect and track objects on the predicted point cloud sequences to obtain future poses, i.e., a forecast-then-detect pipeline. It shows that SPFNet is effective for the SPF task and that pose forecasting performance improves with the addition of unlabeled data.
Deep Reinforcement Learning for Autonomous Driving
TLDR
This work adopts the deep deterministic policy gradient (DDPG) algorithm, which can handle complex state and action spaces in continuous domains, and designs a network architecture for both the actor and the critic within the DDPG paradigm.
Joint Object Detection and Multi-Object Tracking with Graph Neural Networks
TLDR
This work proposes a new joint MOT approach based on Graph Neural Networks (GNNs), which can model relations between variable-sized objects in both the spatial and temporal domains; modeling these relations is essential for learning discriminative features for detection and data association.
Learning Spatio-Temporal Features with Two-Stream Deep 3D CNNs for Lipreading
TLDR
The experiments show that a deep 3D CNN front-end pre-trained on large-scale image and video datasets (e.g., ImageNet and Kinetics) can improve classification accuracy, and that using optical flow input alone can achieve performance comparable to using grayscale video as input.
...