The Visual Object Tracking VOT2017 Challenge Results
- M. Kristan, A. Leonardis, Zhiqun He
- Computer Science · IEEE International Conference on Computer Vision…
- 1 October 2017
The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art…
3D Multi-Object Tracking: A Baseline and New Evaluation Metrics
- Xinshuo Weng, Jianren Wang, David Held, Kris Kitani
- Computer Science · IEEE/RSJ International Conference on Intelligent…
- 9 July 2019
Surprisingly, although the proposed system does not use any 2D data as inputs, it achieves competitive performance on the KITTI 2D MOT leaderboard and runs at a rate of 207.4 FPS, achieving the fastest speed among all modern MOT systems.
PRECOG: PREdiction Conditioned on Goals in Visual Multi-Agent Settings
- Nicholas Rhinehart, Rowan McAllister, Kris Kitani, S. Levine
- Computer Science · IEEE International Conference on Computer Vision
- 3 May 2019
A probabilistic forecasting model of future interactions between a variable number of agents that performs both standard forecasting and the novel task of conditional forecasting, which reasons about how all agents will likely respond to the goal of a controlled agent.
A Baseline for 3D Multi-Object Tracking
- Xinshuo Weng, Kris Kitani
- Computer Science · ArXiv
- 9 July 2019
This work proposes a simple yet accurate real-time baseline 3D MOT system, using an off-the-shelf 3D object detector to obtain oriented 3D bounding boxes from the LiDAR point cloud and using a combination of 3D Kalman filter and Hungarian algorithm for state estimation and data association.
Ego4D: Around the World in 3,000 Hours of Egocentric Video
- K. Grauman, Andrew Westbury, J. Malik
- Computer Science · Computer Vision and Pattern Recognition
- 13 October 2021
Ego4D, a massive-scale egocentric video dataset and benchmark suite, is introduced and a host of new benchmark challenges centered around understanding the first-person visual experience in the past, present, and future are presented.
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting
- Ye Yuan, Xinshuo Weng, Yanglan Ou, Kris Kitani
- Computer Science · IEEE International Conference on Computer Vision
- 25 March 2021
A stochastic multi-agent trajectory prediction model that can attend to features of any agent at any previous timestep when inferring an agent’s future position is proposed and significantly improves the state of the art on well-established pedestrian and autonomous driving datasets.
Fast unsupervised ego-action learning for first-person sports videos
- Kris Kitani, Takahiro Okabe, Yoichi Sato, A. Sugimoto
- Computer Science · Computer Vision and Pattern Recognition
- 20 June 2011
This work addresses the novel task of discovering first-person action categories (which it calls ego-actions) that can be useful for tasks such as video indexing and retrieval, and investigates the use of motion-based histograms and unsupervised learning algorithms to quickly cluster video content.
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud
- Xinshuo Weng, Kris Kitani
- Computer Science, Environmental Science · IEEE/CVF International Conference on Computer…
- 23 March 2019
This work aims to bridge the performance gap between 3D sensing and 2D sensing for 3D object detection by adapting LiDAR-based algorithms to work with single-image input and by enhancing pseudo-LiDAR end-to-end methods.
Rethinking Transformer-based Set Prediction for Object Detection
- Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani
- Computer Science · IEEE International Conference on Computer Vision
- 21 November 2020
Experimental results show that the proposed methods not only converge much faster than the original DETR, but also significantly outperform DETR and other baselines in terms of detection accuracy.
GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning
- Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
- Computer Science · Computer Vision and Pattern Recognition
- 1 June 2020
This work proposes two techniques to improve discriminative feature learning for MOT: a novel feature interaction mechanism based on a Graph Neural Network, and a novel joint feature extractor that learns appearance and motion features from 2D and 3D space simultaneously.