SoundSpaces: Audio-Visual Navigation in 3D Environments
- Changan Chen, Unnat Jain, K. Grauman
- European Conference on Computer Vision
- 24 December 2019
This work proposes a multi-modal deep reinforcement learning approach that trains navigation policies end-to-end from a stream of egocentric audio-visual observations, allowing the agent to discover elements of the physical space's geometry from reverberating audio and to detect and follow sound-emitting targets.
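As a rough illustration of the multi-modal input described above, a minimal late-fusion step might look as follows; the embedding names and sizes are assumptions for this sketch, not the paper's architecture:

```python
import numpy as np

def fuse_egocentric_obs(rgb_embed, audio_embed):
    """Concatenate per-step visual and audio embeddings into one state
    vector for a navigation policy. The upstream encoders (e.g. CNNs
    over RGB frames and audio spectrograms) are assumed, not shown."""
    return np.concatenate([rgb_embed, audio_embed], axis=-1)

# One simulated step: a 512-d visual embedding and a 128-d audio embedding
state = fuse_egocentric_obs(np.zeros(512), np.ones(128))
```

In practice the fused vector would feed a recurrent policy that outputs navigation actions; this sketch only shows how the two modalities combine into a single observation.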
Creativity: Generating Diverse Questions Using Variational Autoencoders
- Unnat Jain, Ziyu Zhang, A. Schwing
- Computer Vision and Pattern Recognition
- 11 April 2017
This paper proposes a creative algorithm for visual question generation that combines the advantages of variational autoencoders with long short-term memory networks, and demonstrates that the framework can generate a large set of varied questions from a single input image.
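The diversity in such a framework comes from sampling many latent codes from the VAE's posterior; below is a minimal sketch of the reparameterization step, where all names and dimensions are illustrative assumptions rather than the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latents(mu, logvar, n):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    Decoding each sampled z with an LSTM (not shown) would yield a
    different question for the same input image."""
    sigma = np.exp(0.5 * logvar)
    eps = rng.standard_normal((n, mu.shape[0]))
    return mu + sigma * eps

# Five distinct latent codes from one image's (assumed) posterior parameters
zs = sample_latents(np.zeros(16), np.zeros(16), n=5)
```

Each sampled latent decodes to a different question, which is how a single image yields a varied question set.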
Two Can Play This Game: Visual Dialog with Discriminative Question Generation and Answering
- Unnat Jain, S. Lazebnik, A. Schwing
- IEEE/CVF Conference on Computer Vision and…
- 29 March 2018
A simple symmetric discriminative baseline is demonstrated that applies to both predicting an answer and predicting a question, and this method is shown to perform on par with the state of the art, including memory-network-based methods.
MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
- Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang, M. Savva
- Neural Information Processing Systems
- 7 December 2020
This work proposes the multiON task, which requires navigating to an episode-specific sequence of objects in a realistic environment; it generalizes the ObjectGoal navigation task and explicitly tests agents' ability to locate previously observed goal objects.
Cooperative Exploration for Multi-Agent Deep Reinforcement Learning
- Iou-Jen Liu, Unnat Jain, Raymond A. Yeh, A. Schwing
- International Conference on Machine Learning
- 23 July 2021
Cooperative multi-agent exploration (CMAE) is proposed, where the goal is selected from multiple projected state spaces via a normalized entropy-based technique and agents are trained to reach this goal in a coordinated manner.
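The "normalized entropy-based technique" can be sketched as scoring each projected state space by the Shannon entropy of its visit counts, normalized by the maximum possible entropy; the selection rule below (pick the most unevenly visited space) is an illustrative assumption about the mechanism, not the paper's exact procedure:

```python
import numpy as np

def normalized_entropy(counts):
    """Shannon entropy of a visit-count distribution, divided by the
    maximum entropy log(K) so spaces of different sizes are comparable."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(counts)))

def select_goal_space(count_tables):
    """Pick the projected state space with the lowest normalized
    entropy, i.e. the most unevenly visited one (an assumed selection
    rule for this sketch); a goal state is then drawn from that space
    and all agents are trained to reach it together."""
    return int(np.argmin([normalized_entropy(c) for c in count_tables]))
```

Normalizing by log(K) keeps the score in [0, 1], so a small projected space and a large one can be compared on equal footing.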
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
- Unnat Jain, Luca Weihs, A. Schwing
- European Conference on Computer Vision
- 9 July 2020
The novel task FurnMove is introduced, in which agents work together to move a piece of furniture through a living room to a goal, and SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss) are introduced.
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
- Jingxiang Lin, Unnat Jain, A. Schwing
- Neural Information Processing Systems
- 31 October 2019
This work shows that a much simpler model, obtained by ablating and pruning the existing intricate baseline, performs better with half the number of trainable parameters, yielding TAB-VCR, a simple and effective baseline for the visual commonsense reasoning (VCR) task.
AllenAct: A Framework for Embodied AI Research
- Luca Weihs, Jordi Salvador, Aniruddha Kembhavi
- arXiv
- 28 August 2020
AllenAct is introduced, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research that provides first-class support for a growing collection of embodied environments, tasks and algorithms.
...
...