Emiel van Miltenburg

Following earlier work in multimodal distributional semantics, we present the first results of our efforts to build a perceptually grounded semantic model. Rather than using images, our models are built on sound data collected from freesound.org. We compare three models: one bag-of-words model based on user-provided tags, a model based on audio features, …
This paper presents a pattern-based method that can be used to infer adjectival scales, such as 〈lukewarm, warm, hot〉, from a corpus. Specifically, the proposed method uses lexical patterns to automatically identify and order pairs of scalemates, followed by a filtering phase in which unrelated pairs are discarded. For the filtering phase, several different …
In: Proceedings of the Workshop on Multimodal Corpora: Computer vision and language processing (MMC-2016), pages 1–4. Workshop held: 24 May 2016, collocated with LREC 2016, Portorož, Slovenia. Proceedings available at: http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-MCC-2016-proceedings.pdf An untested assumption behind the …
We provide a qualitative analysis of the descriptions containing negations (no, not, n’t, nobody, etc.) in the Flickr30K corpus, and a categorization of negation uses. Based on this analysis, we provide a set of requirements that an image description system should meet in order to generate negation sentences. As a pilot experiment, we used our categorization …
This paper presents a collection of annotations (tags or keywords) for a set of 2,133 environmental sounds taken from the Freesound database (www.freesound.org). The annotations are acquired through an open-ended crowd-labeling task, in which participants were asked to provide keywords for each of three sounds. The main goal of this study is to find out (i) …
This research proposal discusses pragmatic factors in image description, arguing that current automatic image description systems do not take these factors into account. I present a general model of the human image description process, and propose to study this process using corpus analysis, experiments, and computational modeling. This will lead to a …
In recent years we have seen rapid and significant progress in automatic image description, but what are the open problems in this area? Most work has been evaluated using text-based similarity metrics, which only indicate that there have been improvements, without explaining what has improved. In this paper, we present a detailed error analysis of the …
Automatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such datasets for languages other than English. An unexplored question is how different these datasets are from English and, if there are any differences, what causes them to differ. This paper provides a …