Corpus Construction and Semantic Analysis of Indonesian Image Description

  Khumaisa Nur'Aini, Johanes Effendi, Sakriani Sakti, Mirna Adriani, Satoshi Nakamura
Understanding language grounded in visual content is a challenging problem that has raised interest in both the computer vision and natural language processing communities. Flickr30k, which is one of the corpora that have become a standard benchmark to study sentence-based image description, was initially limited to English descriptions, but it has been extended to German, French, and Czech. This paper describes our construction of an image description dataset in the Indonesian language. We… 
Automatic Myanmar Image Captioning using CNN and LSTM-Based Language Model
A generative merge model based on a Convolutional Neural Network and Long Short-Term Memory is applied to Myanmar image captioning, and two conventional feature extraction models, the Visual Geometry Group (VGG) OxfordNet 16-layer and 19-layer networks, are compared.
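In a "merge" captioning architecture of the kind described above, the image features enter the model only at the output stage, where they are combined with the language model's state before predicting the next word. A minimal pure-Python sketch of one decoding step under that assumption (all dimensions and variable names are illustrative, not taken from the paper):

```python
import math
import random

random.seed(0)
IMG_DIM, TXT_DIM, VOCAB = 4, 3, 5  # illustrative sizes only

# Stand-ins for a VGG image feature vector and an LSTM hidden state.
image_feat = [random.gauss(0, 1) for _ in range(IMG_DIM)]
lstm_state = [random.gauss(0, 1) for _ in range(TXT_DIM)]

# "Merge" step: the image is injected only here, at the output,
# concatenated with the language-model state.
merged = image_feat + lstm_state
W = [[random.gauss(0, 1) for _ in range(IMG_DIM + TXT_DIM)]
     for _ in range(VOCAB)]

# Linear projection followed by a softmax over the caption vocabulary.
logits = [sum(w * x for w, x in zip(row, merged)) for row in W]
peak = max(logits)
exps = [math.exp(l - peak) for l in logits]
probs = [e / sum(exps) for e in exps]  # distribution over the next word
```

The alternative "inject" design feeds the image into the LSTM at every step; the merge design keeps the language model purely textual, which is why it pairs naturally with a fixed pretrained encoder such as VGG-16 or VGG-19.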


Image Caption Generation with Part of Speech Guidance
Experimental results on the most popular benchmark datasets, e.g., Flickr30k and MS COCO, consistently demonstrate that the method significantly enhances the performance of a standard image caption generation model and achieves competitive results.
Deep visual-semantic alignments for generating image descriptions
A model that generates natural language descriptions of images and their regions based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding is presented.
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description
This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language.
Multi30K: Multilingual English-German Image Descriptions
This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) descriptions crowdsourced independently of the original English descriptions.
Add English to image Chinese captioning
  • Xiangfang Zeng, Xiaodong Wang
  • Computer Science
    2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA)
  • 2017
A method of adding English information to Chinese image captioning by exploiting abundant English datasets is proposed, and the use of English information is validated with state-of-the-art performance on the Flickr8K-CN dataset.
Topic Models for Image Annotation and Text Illustration
A probabilistic model based on the assumption that images and their co-occurring textual data are generated by mixtures of latent topics is described; it outperforms previously proposed approaches on image annotation and the related task of text illustration, despite the noisy nature of the dataset.
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
The results from the second shared task on multimodal machine translation and multilingual image description show that multimodal systems improved, but text-only systems remain competitive.
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
It is shown that a neural network trained using STAIR Captions can generate more natural and better Japanese captions, compared to those generated using English-Japanese machine translation after generating English captions.
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
This work proposes a framework that facilitates better understanding of the encoded representations of sentence vectors and demonstrates the potential contribution of the approach by analyzing different sentence representation mechanisms.
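The framework above probes what a fixed sentence vector encodes by training lightweight classifiers on auxiliary prediction tasks (e.g., recovering sentence length or word content from the vector alone). A toy pure-Python sketch of the idea, using an assumed sum-of-random-word-vectors encoder rather than any model from the paper: even this trivial encoder leaks sentence length, since the norm of a sum of k near-orthogonal vectors grows like sqrt(k), so a simple norm-based "probe" can read length off the embedding.

```python
import math
import random

random.seed(0)
DIM = 100  # illustrative embedding size

# Toy encoder (an assumption for illustration): each word gets a random
# vector, and a sentence vector is the sum of its word vectors.
word_vecs = {}

def vec(word):
    if word not in word_vecs:
        word_vecs[word] = [random.gauss(0, 1) for _ in range(DIM)]
    return word_vecs[word]

def encode(sentence):
    vectors = [vec(w) for w in sentence.split()]
    return [sum(v[i] for v in vectors) for i in range(DIM)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Auxiliary prediction task: is sentence length recoverable from the
# vector alone? Here, comparing norms acts as a trivial probe.
short = encode(" ".join(f"s{i}" for i in range(2)))    # 2 words
long_ = encode(" ".join(f"l{i}" for i in range(20)))   # 20 words
```

In the paper's actual setting the probe is a trained classifier and the encoders are learned models; the point of the sketch is only the methodology of interrogating a frozen representation through an auxiliary task.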
Enriching Word Vectors with Subword Information
A new approach based on the skipgram model, where each word is represented as a bag of character n-grams, with words being represented as the sum of these representations, which achieves state-of-the-art performance on word similarity and analogy tasks.
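The subword representation described above decomposes each word into character n-grams delimited by boundary markers, and represents the word as the sum of its n-gram vectors, so rare and unseen words still receive representations. A short sketch of the n-gram extraction step (function and parameter names are illustrative, not the library's API):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams with boundary markers, in the spirit of the
    subword skip-gram model (names and defaults are illustrative)."""
    token = f"<{word}>"  # '<' and '>' mark the word's boundaries
    grams = {token}      # the whole word is also kept as its own unit
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.add(token[i:i + n])
    return grams

# A word's vector is then the sum of the vectors of its n-grams, which
# is what lets morphologically related words share parameters.
```

For a morphologically rich language like Indonesian, with productive affixation (e.g., "makan" / "makanan"), such shared subword units are a natural fit.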