# Tensor Product Generation Networks

@article{Huang2017TensorPG,
title={Tensor Product Generation Networks},
author={Qiuyuan Huang and Paul Smolensky and Xiaodong He and Li Deng and Dapeng Oliver Wu},
journal={ArXiv},
year={2017},
volume={abs/1709.09118}
}
We present a new tensor product generation network (TPGN) that generates natural language descriptions for images. The model has a novel architecture that instantiates a general framework for encoding and processing symbolic structure through neural network computation. This framework is built on Tensor Product Representations (TPRs). We evaluated the proposed TPGN on the MS COCO image captioning task. The experimental results show that the TPGN outperforms the LSTM-based state-of-the-art…
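The core mechanism of a TPR, on which the abstract says this framework is built, can be sketched as binding filler vectors (symbols) to role vectors (structural positions) via outer products. The dimensions, variable names, and use of orthonormal roles below are illustrative assumptions, not details from the paper:

```python
import numpy as np

# Tensor Product Representation (TPR) sketch: bind each filler (symbol)
# vector to a role (position) vector with an outer product, then sum the
# bindings into one tensor.  All sizes here are made up for illustration.
d_f, d_r = 4, 3                       # filler and role dimensions
rng = np.random.default_rng(0)

fillers = rng.normal(size=(2, d_f))   # two symbols, e.g. word embeddings
roles = np.eye(d_r)[:2]               # orthonormal roles => exact unbinding

# Binding: T = sum_i  f_i  (outer)  r_i
T = sum(np.outer(f, r) for f, r in zip(fillers, roles))

# Unbinding: contract T with a role's dual vector to recover its filler
# (for orthonormal roles, the dual of a role is the role itself).
recovered = T @ roles[0]
assert np.allclose(recovered, fillers[0])
```

With orthonormal roles the unbinding is exact; with merely linearly independent roles one would contract with the dual basis instead.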

#### 4 Citations

Grammatically-Interpretable Learned Representations in Deep NLP Models
• Computer Science
• 2018
In the TPRN application, internal representations learned by end-to-end optimization in a deep neural network performing a textual QA task are interpretable using basic concepts from linguistic theory, and this interpretability is achieved without paying a performance penalty.
Learning to Reason with Third-Order Tensor Products
• Computer Science, Mathematics
• NeurIPS
• 2018
This work combines Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data, improving symbolic interpretation and systematic generalisation; a subset of the data is augmented so that training and test data exhibit large systematic differences.
Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization
• Mathematics, Computer Science
• ACL
• 2019
A regularization method based on tensor rank minimization is presented, motivated by the observation that high-dimensional multimodal time-series data often exhibit correlations across time and modalities, which lead to low-rank tensor representations.
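The low-rank observation behind this cited work can be illustrated with a small sketch: a tensor built from a few outer products has low CP rank, which shows up as low matrix rank in its unfoldings. The shapes and rank below are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Low-rank tensor sketch: a sum of r rank-1 (outer-product) tensors has
# CP rank at most r, so any mode unfolding has matrix rank at most r.
rng = np.random.default_rng(1)
t_steps, d1, d2 = 5, 4, 3            # e.g. time x modality-1 x modality-2

# Build a rank-2 tensor as a sum of two outer products a ⊗ b ⊗ c.
factors = [(rng.normal(size=t_steps),
            rng.normal(size=d1),
            rng.normal(size=d2)) for _ in range(2)]
X = sum(np.einsum('i,j,k->ijk', a, b, c) for a, b, c in factors)

# Unfolding X along the time mode exposes the low rank.
rank = np.linalg.matrix_rank(X.reshape(t_steps, -1))
assert rank <= 2
```

A rank penalty during training would push learned representation tensors toward this kind of few-outer-products structure.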
Natasha 2: Faster Non-Convex Optimization Than SGD
A stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, and more broadly to find approximate local minima of any smooth nonconvex function, using only oracle access to stochastic gradients.

#### References

Showing 1-10 of 32 references
Question-Answering with Grammatically-Interpretable Representations
• Computer Science
• AAAI
• 2018
Support is found for the initial hypothesis that symbols can be interpreted as lexical-semantic word meanings, while roles can be interpreted as approximations of grammatical roles (or categories) such as subject, wh-word, determiner, etc.
Deep visual-semantic alignments for generating image descriptions
• Computer Science
• CVPR
• 2015
A model is presented that generates natural language descriptions of images and their regions, based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding.
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
• Computer Science
• ArXiv
• 2014
This work introduces the structure-content neural language model, which disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder, and shows that with linear encoders the learned embedding space captures multimodal regularities in terms of vector space arithmetic.
Show and tell: A neural image caption generator
• Computer Science
• 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
• 2015
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Semantic Compositional Networks for Visual Captioning
• Zhe Gan, +5 authors L. Deng
• Computer Science
• 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
• 2017
Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
The m-RNN model directly models the probability distribution of generating a word given previous words and an image, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
Language Models for Image Captioning: The Quirks and What Works
By combining key aspects of the ME and RNN methods, this paper achieves a new record performance over previously published results on the benchmark COCO dataset; however, the gains the authors see in BLEU do not translate to human judgments.
Mind's eye: A recurrent visual representation for image caption generation
• Computer Science
• 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
• 2015
This paper explores the bi-directional mapping between images and their sentence-based descriptions with a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read.
Multimodal Neural Language Models
• Computer Science
• ICML
• 2014
This work introduces two multimodal neural language models (models of natural language that can be conditioned on other modalities) and applies them to image-text modelling, generating sentence descriptions for images without the use of templates, structured prediction, or syntactic trees.
Jointly learning sentence embeddings and syntax with unsupervised Tree-LSTMs
• Computer Science
• Natural Language Engineering
• 2019
This paper introduces a model based on the CKY chart parser, evaluates its downstream performance on a natural language inference task and a reverse dictionary task, and finds that the approach is competitive with similar models of comparable size and outperforms Tree-LSTMs that use trees produced by a parser.