Hierarchically-Attentive RNN for Album Summarization and Storytelling

@inproceedings{Yu2017HierarchicallyAttentiveRF,
  title={Hierarchically-Attentive RNN for Album Summarization and Storytelling},
  author={Licheng Yu and Mohit Bansal and Tamara L. Berg},
  booktitle={EMNLP},
  year={2017}
}
We address the problem of end-to-end visual storytelling. Given a photo album, our model first selects the most representative (summary) photos, and then composes a natural language story for the album. For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story. Automatic and human evaluations show our model achieves…
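The abstract only sketches the architecture at a high level. As an illustration of the overall shape (an album-level RNN encoder, an attention-based photo selector, and an RNN story decoder), a minimal PyTorch sketch might look like the following; all module names, dimensions, and design choices are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of a hierarchically-attentive album-to-story pipeline (hypothetical;
# dimensions, module names, and the single-layer GRUs are illustrative assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlbumStoryteller(nn.Module):
    def __init__(self, img_dim=2048, hid_dim=512, emb_dim=256, vocab_size=10000):
        super().__init__()
        self.album_enc = nn.GRU(img_dim, hid_dim, batch_first=True)  # encode the photo sequence
        self.select_attn = nn.Linear(hid_dim, 1)                     # score each photo for the summary
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.decoder = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, photos, story_tokens):
        # photos: (B, N, img_dim) pre-extracted CNN features; story_tokens: (B, T) word ids
        enc, _ = self.album_enc(photos)                              # (B, N, hid_dim)
        scores = self.select_attn(enc).squeeze(-1)                   # (B, N) photo-selection scores
        attn = F.softmax(scores, dim=-1)
        summary_ctx = torch.bmm(attn.unsqueeze(1), enc)              # (B, 1, hid_dim) soft album summary
        emb = self.word_emb(story_tokens)                            # (B, T, emb_dim)
        ctx = summary_ctx.expand(-1, emb.size(1), -1)                # condition each word on the summary
        dec, _ = self.decoder(torch.cat([emb, ctx], dim=-1))
        return self.out(dec), scores                                 # word logits + selection scores
```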
A Hierarchical Approach for Visual Storytelling Using Image Description
TLDR
This paper proposes a hierarchical deep learning architecture based on encoder-decoder networks that incorporates natural language image descriptions along with the images themselves to generate each story sentence, and demonstrates the effectiveness of the architecture for visual storytelling.
Contextualize, Show and Tell: A Neural Visual Storyteller
TLDR
A neural model for generating short stories from image sequences, which extends the image description model by Vinyals et al., 2015, showed competitive results with the METEOR metric and human ratings in the internal track of the Visual Storytelling Challenge 2018.
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
TLDR
Experimental results demonstrate that the proposed novel hierarchical visual storytelling framework outperforms most closely related baselines under the automatic evaluation metrics BLEU and CIDEr, and also show the effectiveness of the method with human evaluation.
Incorporating Textual Evidence in Visual Storytelling
TLDR
This work exploits textual evidence from similar images to help generate coherent and meaningful stories, using an extended Seq2Seq model with a two-channel encoder and attention, and proposes a two-step ranking method based on image object recognition techniques.
Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling
TLDR
This paper applies the gradient-based meta-learning algorithm to multi-modal seq2seq models to endow the VIST model with the ability to adapt quickly from topic to topic, and proposes a topic adaptive storyteller to model the ability of inter-topic generalization.
Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling
TLDR
This paper proposes a hide-and-tell model, which is designed to learn non-local relations across the photo streams and to refine and improve conventional RNN-based models, and qualitatively shows the learned ability to interpolate a storyline over visual gaps.
What Makes A Good Story? Designing Composite Rewards for Visual Storytelling
TLDR
This paper proposes a reinforcement learning framework, ReCo-RL, with reward functions designed to capture the essence of three assessment criteria: relevance, coherence and expressiveness, which empirical analysis suggests constitute a "high-quality" story to the human eye.
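As a rough illustration of the composite-reward idea described in this entry, the sketch below combines placeholder relevance, coherence, and expressiveness scores into one scalar and plugs it into a standard REINFORCE-style loss; the reward definitions and weights here are hypothetical, not ReCo-RL's.

```python
# Hypothetical sketch: weighted mixture of reward components feeding a REINFORCE loss.
# The component scores and weights are placeholders, not ReCo-RL's actual definitions.
import torch

def composite_reward(r_relevance, r_coherence, r_expressiveness,
                     w_rel=1.0, w_coh=1.0, w_exp=1.0):
    """Weighted sum of per-story reward components."""
    return w_rel * r_relevance + w_coh * r_coherence + w_exp * r_expressiveness

def reinforce_loss(log_probs, reward, baseline=0.0):
    """Scale the sampled story's log-likelihood by its advantage (reward minus baseline)."""
    return -(reward - baseline) * log_probs.sum()

# Toy usage: token log-probabilities of one sampled story plus three scalar scores.
log_probs = torch.log(torch.tensor([0.4, 0.3, 0.5]))
loss = reinforce_loss(log_probs, composite_reward(0.8, 0.6, 0.7), baseline=0.5)
```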
Towards Coherent Visual Storytelling with Ordered Image Attention
TLDR
Ordered image attention (OIA) models interactions between the sentence-corresponding image and important regions in other images of the sequence, and introduces an adaptive prior to alleviate common linguistic mistakes like repetitiveness.
Informative Visual Storytelling with Cross-modal Rules
TLDR
This work proposes a method to mine cross-modal rules to help the model infer informative concepts given certain visual input, and leverages these concepts in an encoder-decoder framework with an attention mechanism.
Diverse and Relevant Visual Storytelling with Scene Graph Embeddings
TLDR
This work applies metrics that account for the diversity of words and phrases in generated stories as well as for reference to narratively-salient image features, and shows that this approach outperforms previous systems on reference-based metrics.

References

Showing 1–10 of 32 references
Sort Story: Sorting Jumbled Images and Captions into Stories
TLDR
This work proposes the task of sequencing: given a jumbled set of aligned image-caption pairs that belong to a story, sort them so that the output sequence forms a coherent story. It presents multiple approaches, via unary and pairwise predictions, as well as their ensemble-based combinations, achieving strong results.
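To make the pairwise-prediction idea in the entry above concrete, the toy sketch below aggregates hypothetical "comes-before" scores over all ordered pairs and ranks items by how strongly they precede the rest; the actual paper combines unary and pairwise models in ensembles.

```python
# Toy sketch: order items by aggregating pairwise "comes-before" scores (hypothetical scorer).
from itertools import permutations

def order_by_pairwise(items, before_score):
    # before_score(a, b) -> higher means a is predicted to precede b
    totals = [0.0] * len(items)
    for i, j in permutations(range(len(items)), 2):
        totals[i] += before_score(items[i], items[j])
    ranking = sorted(range(len(items)), key=lambda i: totals[i], reverse=True)
    return [items[i] for i in ranking]

# Toy usage with a trivial scorer that prefers lexicographically earlier captions.
captions = ["step 3", "step 1", "step 2"]
print(order_by_pairwise(captions, lambda a, b: 1.0 if a < b else 0.0))
# -> ['step 1', 'step 2', 'step 3']
```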
Visual Storytelling
TLDR
Modelling concrete description as well as figurative and social language, as provided in this dataset and the storytelling task, has the potential to move artificial intelligence from basic understandings of typical visual scenes towards more and more human-like understanding of grounded event structure and subjective expression.
Learning Visual Storylines with Skipping Recurrent Neural Networks
TLDR
The novel Skipping Recurrent Neural Network model, which uses a framework that skips through the images in the photo stream to explore the space of all ordered subsets of the albums via an efficient sampling procedure, reduces the negative impact of strong short-term correlations, and recovers the latent story more accurately.
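As a loose illustration of exploring ordered subsets of a photo stream, the sketch below samples temporally ordered index subsets and keeps the best-scoring one under a placeholder objective; the actual S-RNN instead learns the scoring with a recurrent model and a more efficient sampling procedure.

```python
# Hypothetical sketch: sample temporally ordered subsets of album indices and keep the
# best one under a placeholder score; the learned recurrent scoring of S-RNN is not shown.
import random

def sample_ordered_subset(n_photos, k, rng):
    return sorted(rng.sample(range(n_photos), k))

def best_subset(n_photos, k, score_fn, n_samples=100, seed=0):
    rng = random.Random(seed)
    candidates = [sample_ordered_subset(n_photos, k, rng) for _ in range(n_samples)]
    return max(candidates, key=score_fn)

# Toy usage: prefer subsets spread across the whole album.
print(best_subset(n_photos=50, k=5, score_fn=lambda idx: idx[-1] - idx[0]))
```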
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
TLDR
An approach that exploits hierarchical Recurrent Neural Networks to tackle the video captioning problem, i.e., generating one or multiple sentences to describe a realistic video, significantly outperforms the current state-of-the-art methods.
A Neural Attention Model for Abstractive Sentence Summarization
TLDR
This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.
Video summarization by learning submodular mixtures of objectives
TLDR
A new method is introduced that uses a supervised approach to learn the importance of global characteristics of a summary, and jointly optimizes for multiple objectives, thus creating summaries that possess multiple properties of a good summary.
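The entry above describes learning a weighted mixture of summary objectives; a minimal greedy-selection sketch under such a mixture might look like the following, where the objective functions and weights are placeholders rather than the learned ones from the paper.

```python
# Hypothetical sketch: greedy keyframe selection under a weighted mixture of objectives.
# The objectives and weights are placeholders; the paper learns the weights from
# human-created summaries.
def greedy_summary(items, objectives, weights, budget):
    def mixture(subset):
        return sum(w * f(subset) for w, f in zip(weights, objectives))
    selected = []
    while len(selected) < budget:
        remaining = [x for x in items if x not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda x: mixture(selected + [x]) - mixture(selected))
        selected.append(best)
    return selected

# Toy usage on frame indices with two objectives: temporal coverage and summary size.
coverage = lambda s: (max(s) - min(s)) if s else 0
size = lambda s: len(s)
print(greedy_summary(list(range(10)), [coverage, size], [1.0, 0.5], budget=3))
```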
Textually Customized Video Summaries
TLDR
This paper trains a deep architecture to effectively learn semantic embeddings of video frames by leveraging the abundance of image-caption data in a progressive and residual manner, and shows that the method is able to generate semantically diverse video summaries by utilizing only the learned visual embeddings.
Story-Driven Summarization for Egocentric Video
  Zheng Lu, K. Grauman. 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
TLDR
A video summarization approach that discovers the story of an egocentric video, and defines a random-walk-based metric of influence between subshots that reflects how visual objects contribute to the progression of events.
Summary Transfer: Exemplar-Based Subset Selection for Video Summarization
TLDR
A novel subset selection technique that leverages supervision in the form of human-created summaries to perform automatic keyframe-based video summarization, and shows how to extend the method to exploit semantic side information about the video's category/genre to guide the transfer process using those training videos semantically consistent with the test input.
Neural Summarization by Extracting Sentences and Words
TLDR
This work develops a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor that allows for different classes of summarization models which can extract sentences or words.