Hamed R.-Tavakoli

Learn More
This paper presents a framework for image captioning by exploiting the scene context. To date, most of the captioning models have been relying on the combination of Convolutional Neural Networks (CNN) and the Long-Short Term Memory (LSTM) model, trained in an end-to-end fashion. Recently, there has been extensive research towards improving the language(More)
Multimedia analysis benefits from understanding the emotional content of a scene in a variety of tasks such as video genre classification and content-based image retrieval. Recently, there has been an increasing interest in applying human bio-signals, particularly eye movements, to recognize the emotional gist of a scene such as its valence. In order to(More)
There are errors in the Funding section. The correct funding information is as follows: Scholarships, and The Finnish center of excellence in computational inference research (COIN). The funders had no role in study design, data collection and analysis, decision to publish , or preparation of the manuscript. Additionally, there is an error in the Competing(More)
  • 1