Mateusz Malinowski

We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining the latest advances in image representation and natural language processing, we propose Neural-Image-QA, an end-to-end formulation of this problem for which all parts are trained jointly. In contrast to previous efforts, we are facing a multi-modal...
We propose a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision. We combine discrete reasoning with uncertain predictions by a multiworld approach that represents uncertainty about the perceived world in a Bayesian framework. Our approach can handle human...
Scaling up visual category recognition to large numbers of classes remains challenging. A promising research direction is zero-shot learning, which does not require any training data to recognize new classes, but rather relies on some form of auxiliary information describing the new classes. Ultimately, this may make it possible to use textbook knowledge that humans...
As language and visual understanding by machines progresses rapidly, we are observing an increasing interest in holistic architectures that tightly interlink both modalities in a joint learning and inference process. This trend has allowed the community to progress towards more challenging and open tasks and rekindled the hope of achieving the old AI dream...
One of the difficulties in interactive music and entertainment is creating environments that reflect and react to the collective activity of groups with tens, hundreds, or even thousands of participants. Generating content on this scale involves many challenges. For example, how is the individual granted low latency control and a sense of causality, while...
Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question...
PURPOSE: Volumetric assessment of the liver regularly yields discrepant results between pre- and intraoperatively determined volumes. Nevertheless, the main factor responsible for this discrepancy remains unclear. The aim of this study was to systematically determine the difference between in vivo CT-volumetry and ex vivo volumetry in a pig animal...
This paper describes CargoNet, a system of low-cost, micropower active sensor tags that seeks to bridge the current gap between wireless sensor networks and radio-frequency identification (RFID). CargoNet was aimed at applications in environmental monitoring at the crate and case level for supply-chain management and asset security. Custom-designed circuits...
We propose a Deep Learning approach to the visual question answering task, where machines answer questions about real-world images. By combining the latest advances in image representation and natural language processing, we propose Ask Your Neurons, a scalable, jointly trained, end-to-end formulation of this problem. In contrast to previous efforts, we are...
Non-invasive breath tests can serve as valuable diagnostic tools in medicine as they can determine particular enzymatic and metabolic functions in vivo. However, methodological pitfalls have limited the actual clinical application of those tests to date. A major challenge of non-invasive breath tests has remained the provision of individually reliable...