See the Glass Half Full: Reasoning About Liquid Containers, Their Volume and Content

Roozbeh Mottaghi, Connor Schenck, Dieter Fox, Ali Farhadi. 2017 IEEE International Conference on Computer Vision (ICCV).
Humans have a rich understanding of liquid containers and their contents; for example, we can effortlessly pour water from a pitcher into a cup. Doing so requires estimating the volume of the cup, approximating the amount of water in the pitcher, and predicting the behavior of the water when we tilt the pitcher. Very little attention in computer vision has been paid to liquids and their containers. In this paper, we study liquid containers and their contents, and propose methods to estimate the volume…


Multi-modal estimation of the properties of containers and their content: survey and evaluation
An open benchmarking framework and an in-depth comparative analysis of recent methods that estimate the capacity of a container, as well as the type, mass, and amount of its content are presented.
The CORSMAL Benchmark for the Prediction of the Properties of Containers
It is concluded that audio-only and audio-visual classifiers are suitable for the estimation of the type and amount of the content using different types of convolutional neural networks, combined with either recurrent neural networks or a majority voting strategy, whereas computer vision methods are suitable to determine the capacity of the container using regression and geometric approaches.
Predicting 3D shapes, masks, and properties of materials, liquids, and objects inside transparent containers, using the TransProteus CGI dataset
This work supplies a new procedurally generated dataset consisting of 50k images of liquids and solid objects inside transparent containers, and proposes a camera agnostic method that predicts 3D models from an image as an XYZ map.
Audio-Visual Object Classification for Human-Robot Collaboration
The CORSMAL challenge and an accompanying dataset are presented to assess the performance of algorithms through a set of well-defined performance scores; a novel feature of the challenge is the real-to-simulation framework for visualising and assessing the impact of estimation errors in human-to-robot handovers.
Precision Pouring into Unknown Containers by Service Robots
Two approaches are proposed for controlling the motion of a service robot as it pours liquid precisely from one unknown container into another, without the need for any external tools.
Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring
This paper proposes to make use of audio vibration sensing, designing a deep neural network, PouringNet, to predict the liquid height from the audio fragment during the robotic pouring task, facilitating more robust and accurate audio-based perception for robotic pouring.
Autonomous Precision Pouring from Unknown Symmetric Containers
We autonomously pour from unknown symmetric containers found in a traditional wet-lab towards the development of a robot-assisted rapid experiment preparation system. The robot estimates the pouring
Pouring from Deformable Containers Based on Tactile Information Using a Dual-Arm Robot
This paper considers household robots that pour various contents from deformable containers and carefully designs the grasping strategy: the palm of one hand supports the deformable container from the bottom while the other hand pulls the container up from the top.
Computer Vision for Recognition of Materials and Vessels in Chemistry Lab Settings and the Vector-LabPics Dataset
This work presents the Vector-LabPics data set, which consists of 2187 images of materials within mostly transparent vessels in a chemistry lab and other general settings, and trained neural networks achieved good accuracy in detecting and segmenting vessels and material phases, and in classifying liquids and solids.
Liquid Pouring Monitoring via Rich Sensory Inputs
This work trains a hierarchical LSTM with late fusion for monitoring and proposes two auxiliary tasks during training: inferring the initial state of containers and forecasting the one-step future 3D trajectory of the hand with an adversarial training procedure to improve the robustness of the system.


Fill and Transfer: A Simple Physics-Based Approach for Containability Reasoning
A novel approach to reasoning about liquid containability (the affordance of containing liquid) is introduced, based on two simple physical processes: the Fill and Transfer of liquid.
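The Fill idea above can be illustrated with a classic 2D proxy: water poured from above onto a height-field profile is trapped between walls, and the trapped amount measures containability. This is a minimal sketch (the standard trapping-rain-water computation), not the paper's actual 3D simulation:

```python
def trapped_water(heights):
    """Amount of water a 2D height profile can hold when liquid is
    poured from above: each column holds water up to the shorter of
    the tallest walls on its left and right, minus its own height."""
    n = len(heights)
    if n == 0:
        return 0
    left = [0] * n   # left[i]: tallest wall in heights[0..i]
    right = [0] * n  # right[i]: tallest wall in heights[i..n-1]
    m = 0
    for i in range(n):
        m = max(m, heights[i])
        left[i] = m
    m = 0
    for i in range(n - 1, -1, -1):
        m = max(m, heights[i])
        right[i] = m
    return sum(min(left[i], right[i]) - heights[i] for i in range(n))
```

For example, the profile [3, 0, 3] traps 3 units, while a flat or monotonic profile traps none, i.e. it affords no containment.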
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
"What Happens If..." Learning to Predict the Effect of Forces in Images
A deep neural network model is designed that learns long-term sequential dependencies of object movements while taking into account the geometry and appearance of the scene by combining Convolutional and Recurrent Neural Networks.
What Is Where: Inferring Containment Relations from Videos
A dynamic programming algorithm is adopted to find both the optimal sequence of containment relations across the video and the containment relation changes between adjacent frames, by reasoning about containment relations over time.
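A DP of this flavor can be sketched as a Viterbi-style pass: choose one containment relation per frame so as to maximize per-frame evidence while penalizing relation changes between adjacent frames. The inputs here (per-frame relation scores, a single change penalty) are illustrative assumptions, not the paper's actual formulation:

```python
def best_relation_sequence(frame_scores, change_penalty):
    """Viterbi-style DP over containment relations. frame_scores is a
    list of dicts mapping each candidate relation to its score in that
    frame; change_penalty is subtracted whenever the relation changes
    between adjacent frames."""
    labels = list(frame_scores[0].keys())
    # best[t][r] = (best cumulative score ending in relation r, backpointer)
    best = [{r: (frame_scores[0][r], None) for r in labels}]
    for t in range(1, len(frame_scores)):
        cur = {}
        for r in labels:
            prev, score = max(
                ((p, best[-1][p][0] - (change_penalty if p != r else 0.0))
                 for p in labels),
                key=lambda pair: pair[1],
            )
            cur[r] = (score + frame_scores[t][r], prev)
        best.append(cur)
    # Backtrack from the best final relation.
    r = max(best[-1], key=lambda k: best[-1][k][0])
    seq = [r]
    for t in range(len(frame_scores) - 1, 0, -1):
        r = best[t][r][1]
        seq.append(r)
    return list(reversed(seq))
```

With a nonzero change penalty, a single noisy frame is smoothed over in favor of a temporally consistent relation sequence; with a zero penalty, the per-frame argmax is returned.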
Humans predict liquid dynamics using probabilistic simulation
This work finds evidence that people's reasoning about how liquids move is consistent with a computational cognitive model based on approximate probabilistic simulation, extending this account to the more complex and unexplored domain of liquids.
Evaluating Human Cognition of Containing Relations with Physical Simulation
Physical simulation is shown to be a good approximation of human cognition of containers and containing relations, with human judgments agreeing with the results of physical simulation under different scenarios.
Seeing Glassware: from Edge Detection to Pose Estimation and Shape Recovery
A new approach is introduced that combines recent advances in learnt object detectors with perceptual grouping in 2D and the projective geometry of apparent contours in 3D; results comparable to category-based detection and localization of opaque objects are shown, without any training on the object shape.
Inferring Forces and Learning Human Utilities from Videos
We propose a notion of affordance that takes into account physical quantities generated when the human body interacts with real-world objects, and introduce a learning framework that incorporates the
Are Elephants Bigger than Butterflies? Reasoning about Sizes of Objects
This paper introduces a method to automatically infer object sizes, leveraging visual and textual information from the Web, and shows that the method outperforms competitive textual and visual baselines in reasoning about size comparisons.
Understanding tools: Task-oriented object modeling, learning and recognition
A new framework is presented, task-oriented modeling, learning and recognition, which aims at understanding the underlying functions, physics and causality in using objects as "tools"; under this view, any object can be viewed as a hammer or a shovel.