AI2-THOR: An Interactive 3D Environment for Visual AI
TLDR
AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks, facilitating the building of visually intelligent models.
Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations
TLDR
This work exploits the synergies between grasp affordance and 3D reconstruction through multi-task learning of a shared representation. The representation is built on deep implicit functions, a continuous and memory-efficient representation that enables differentiable training of both tasks.
Discovering Generalizable Skills via Automated Generation of Diverse Tasks
TLDR
This work proposes Skill Learning In Diversified Environments (SLIDE), a method that discovers generalizable skills by automatically generating a diverse set of tasks. Results suggest that the learned skills effectively improve the robot’s performance on various unseen target tasks compared with existing reinforcement learning and skill learning methods.
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
TLDR
This work releases MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. It paves the way toward a better understanding of the capabilities and limitations of multimodal models while ensuring ease of use, accessibility, and reproducibility.
Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
TLDR
This work considers a fundamental hurdle affecting both value-based and policy-gradient approaches: an exponential blowup of the action space with the number of agents. It proposes a novel tensorised formulation of the Bellman equation, which gives rise to TESSERACT, a method that views the Q-function as a tensor whose modes correspond to the action spaces of different agents.
SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
  • Linxi Fan, Guanzhi Wang, +4 authors Anima Anandkumar
  • Computer Science
  • ICML
  • 17 June 2021
TLDR
This work considers robust policy learning that targets zero-shot generalization to unseen visual environments with large distributional shift. It proposes SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales
TLDR
This work introduces a novel computer vision task, Dynamic Metric Learning, which aims to learn a scalable metric space that accommodates visual concepts across multiple semantic scales, and proposes Cross-Scale Learning (CSL) to alleviate the conflict among these scales.
Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition
TLDR
COPA, a coach-player framework for coordinating teams with dynamic composition, adopts the attention mechanism for both the coach and the players, proposes a variational objective to regularize learning, and designs an adaptive communication method that lets the coach decide when to communicate with the players.
DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision
TLDR
This work proposes a self-ensembling framework in which instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to bounding box supervision, revealing a symbiotic relationship in which the two tasks mutually benefit from each other.