Distributed cosegmentation via submodular optimization on anisotropic diffusion
This paper proposes CoSand, a distributed cosegmentation approach for highly variable, large-scale image collections. It exploits a strong theoretical property: the temperature under linear anisotropic diffusion is a submodular function, so a greedy algorithm guarantees at least a constant-factor approximation to the optimal solution for temperature maximization.
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
This paper proposes three new tasks designed specifically for video VQA, which require spatio-temporal reasoning from videos to answer questions correctly, and introduces a new large-scale dataset for video VQA named TGIF-QA that extends existing VQA work with its new tasks.
On multiple foreground cosegmentation
  • Gunhee Kim, E. Xing
  • Mathematics, Computer Science
  • IEEE Conference on Computer Vision and Pattern…
  • 16 June 2012
This paper proposes a novel optimization method for multiple foreground cosegmentation (MFC), which makes no assumption on foreground configurations and does not suffer from the aforementioned limitation, while still leveraging all the benefits of having co-occurring or (partially) recurring contents across images.
A Joint Sequence Fusion Model for Video Question Answering and Retrieval
This work focuses on video-language tasks including multimodal retrieval and video QA, and evaluates the JSFusion model in three retrieval and VQA tasks in LSMDC, for which the model achieves the best performance reported so far.
Unsupervised modeling of object categories using link analysis techniques
This paper presents an approach for learning visual models of object categories in an unsupervised manner: a large-scale complex network is built that captures the interactions of all unit visual features across the entire training set, and information is inferred directly from the graph using link analysis techniques.
Big/little deep neural network for ultra low power inference
This paper proposes a novel concept called big/LITTLE DNN (BL-DNN), which significantly reduces the energy consumption required for DNN execution at a negligible loss of inference accuracy, and presents design-time and runtime methods to control the execution of the big DNN under a trade-off between energy consumption and inference accuracy.
A Hierarchical Latent Structure for Variational Conversation Modeling
This paper proposes a novel model named Variational Hierarchical Conversation RNNs (VHCR), built on two key ideas: using a hierarchical structure of latent variables and exploiting an utterance drop regularization. The model successfully utilizes latent variables and outperforms state-of-the-art models for conversation generation.
A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning
This work proposes an expansion-based approach for task-free continual learning for the first time and presents a model, named Continual Neural Dirichlet Process Mixture (CN-DPM), which expands the number of experts in a principled way under the Bayesian nonparametric framework.
Abstractive Summarization of Reddit Posts with Multi-level Memory Networks
This work collects the Reddit TIFU dataset, consisting of 120K posts from the online discussion forum Reddit, and proposes a novel abstractive summarization model named multi-level memory networks (MMN), equipped with multi-level memory to store the information of text from different levels of abstraction.
A Read-Write Memory Network for Movie Story Understanding
A novel memory network model named Read-Write Memory Network (RWMN) is proposed to perform question answering tasks for large-scale, multimodal movie story understanding. It shows potential to better understand not only the content of the story but also more abstract information, such as relationships between characters and the reasons for their actions.