Corpus ID: 195346132

Tensor Product Generation Networks

Qiuyuan Huang, Paul Smolensky, Xiaodong He, Li Deng, Dapeng Oliver Wu
We present a new tensor product generation network (TPGN) that generates natural language descriptions for images. The model has a novel architecture that instantiates a general framework for encoding and processing symbolic structure through neural network computation. This framework is built on Tensor Product Representations (TPRs). We evaluated the proposed TPGN on the MS COCO image captioning task. The experimental results show that the TPGN outperforms the LSTM-based state-of-the-art…
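The core TPR idea underlying the framework can be sketched in a few lines: a symbolic structure is encoded as a sum of outer products of filler vectors (symbols) and role vectors (positions), and a filler is recovered by contracting with its role. The dimensions and vectors below are illustrative assumptions, not values from the TPGN paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_f, d_r = 4, 3                      # filler / role dimensions (illustrative)

fillers = rng.normal(size=(3, d_f))  # e.g. three symbol embeddings
roles = np.eye(d_r)                  # orthonormal roles -> exact unbinding

# Bind: T = sum_i outer(f_i, r_i), a rank-<=3 matrix of shape (d_f, d_r)
T = sum(np.outer(f, r) for f, r in zip(fillers, roles))

# Unbind the filler bound to role j: T @ r_j
# (exact recovery because the roles are orthonormal)
recovered = T @ roles[1]
assert np.allclose(recovered, fillers[1])
```

With non-orthogonal roles, unbinding uses the dual (pseudo-inverse) role vectors instead; the orthonormal case above keeps the sketch minimal.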
Grammatically-Interpretable Learned Representations in Deep NLP Models
In TPRN, internal representations, learned by end-to-end optimization in a deep neural network performing a textual QA task, are interpretable using basic concepts from linguistic theory, and this interpretability is achieved without paying a performance penalty.
Learning to Reason with Third-Order Tensor Products
This work combines Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data that improve symbolic interpretation and systematic generalisation, and augments a subset of the data so that training and test data exhibit large systematic differences.
Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization
A regularization method based on tensor rank minimization is presented, motivated by the observation that high-dimensional multimodal time-series data often exhibit correlations across time and modalities, which lead to low-rank tensor representations.
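Since tensor rank itself is hard to minimize directly, a common convex surrogate is the sum of nuclear norms of the tensor's mode-n unfoldings. The sketch below illustrates that surrogate; it is an assumption-level illustration, not the paper's exact penalty.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tensor_trace_norm(T):
    """Sum of nuclear norms of all mode-n unfoldings,
    a standard convex surrogate for tensor rank."""
    return sum(np.linalg.norm(unfold(T, m), 'nuc') for m in range(T.ndim))

# e.g. a (time, modality, feature) tensor of multimodal time-series features
X = np.random.default_rng(1).normal(size=(5, 3, 4))
penalty = tensor_trace_norm(X)   # added to the training loss as a regularizer
```

In training, this penalty (scaled by a hyperparameter) would be added to the task loss, pushing the learned representation tensor toward low rank.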
Natasha 2: Faster Non-Convex Optimization Than SGD
A stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima using backpropagations, with only oracle access to stochastic gradients of the smooth nonconvex objective.


Question-Answering with Grammatically-Interpretable Representations
Support is found for the initial hypothesis that symbols can be interpreted as lexical-semantic word meanings, while roles can be interpreted as approximations of grammatical roles (or categories) such as subject, wh-word, determiner, etc.
Deep visual-semantic alignments for generating image descriptions
A model that generates natural language descriptions of images and their regions based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding is presented.
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
This work introduces the structure-content neural language model that disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder, and shows that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic.
Show and tell: A neural image caption generator
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Semantic Compositional Networks for Visual Captioning
  • Zhe Gan, Chuang Gan, +5 authors, L. Deng
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
The m-RNN model directly models the probability distribution of generating a word given previous words and an image, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
Language Models for Image Captioning: The Quirks and What Works
By combining key aspects of the ME and RNN methods, this paper achieves a new record performance over previously published results on the benchmark COCO dataset; however, the gains the authors see in BLEU do not translate to human judgments.
Mind's eye: A recurrent visual representation for image caption generation
This paper explores the bi-directional mapping between images and their sentence-based descriptions with a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read.
Multimodal Neural Language Models
This work introduces two multimodal neural language models, models of natural language that can be conditioned on other modalities, for image-text modelling; these can generate sentence descriptions for images without the use of templates, structured prediction, and/or syntactic trees.
Jointly learning sentence embeddings and syntax with unsupervised Tree-LSTMs
This paper introduces a model based on the CKY chart parser, and evaluates its downstream performance on a natural language inference task and a reverse dictionary task, and finds that its approach is competitive against similar models of comparable size and outperforms Tree-LSTMs that use trees produced by a parser.