Understanding Contexts Inside Robot and Human Manipulation Tasks through Vision-Language Model and Ontology System in Video Streams

Chen Jiang, Masood Dehghan, Martin Jägersand
2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Manipulation tasks in daily life, such as pouring water, unfold through human intentions. Being able to process contextual knowledge from these Activities of Daily Living (ADLs) over time can help us understand manipulation intentions, which are essential for an intelligent robot to transition smoothly between various manipulation actions. In this paper, to model the intended concepts of manipulation, we present a vision dataset under a strictly constrained knowledge domain for both robot and… 


Bridging Visual Perception with Contextual Semantics for Understanding Robot Manipulation Tasks

A framework is proposed that generates high-level conceptual dynamic knowledge graphs from video clips, combining a Vision-Language model with an ontology system to represent robot manipulation knowledge as Entity-Relation-Entity and Entity-Attribute-Value tuples.
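As a rough illustration of the representation named in this summary, the sketch below stores Entity-Relation-Entity and Entity-Attribute-Value tuples in one small graph per video clip, so that a sequence of graphs forms a "dynamic" knowledge graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the class names, the `graph_from_tuples` helper, and the pouring example data are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """Entity-Relation-Entity tuple, e.g. (hand, hold, bottle)."""
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class Attribute:
    """Entity-Attribute-Value tuple, e.g. (cup, state, empty)."""
    entity: str
    attribute: str
    value: str


@dataclass
class KnowledgeGraph:
    """Knowledge extracted from a single video clip (hypothetical layout)."""
    relations: set = field(default_factory=set)
    attributes: set = field(default_factory=set)

    def entities(self):
        """Return every entity name mentioned by any tuple in the graph."""
        names = set()
        for r in self.relations:
            names.update((r.subject, r.obj))
        for a in self.attributes:
            names.add(a.entity)
        return names


def graph_from_tuples(ere, eav):
    """Build one graph from plain 3-tuples (hypothetical helper)."""
    return KnowledgeGraph(
        relations={Relation(*t) for t in ere},
        attributes={Attribute(*t) for t in eav},
    )


# Invented example: two consecutive clips of a pouring task.
clips = [
    {"ere": [("hand", "hold", "bottle")],
     "eav": [("cup", "state", "empty")]},
    {"ere": [("hand", "hold", "bottle"), ("bottle", "pour_into", "cup")],
     "eav": [("cup", "state", "full")]},
]

# The list of per-clip graphs, kept in temporal order, is the dynamic graph.
graphs = [graph_from_tuples(c["ere"], c["eav"]) for c in clips]

for step, g in enumerate(graphs):
    print(step, sorted(g.entities()))
```

Keeping the two tuple kinds in separate frozen dataclasses makes them hashable, so duplicate facts within a clip collapse naturally when stored in sets.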

Adding Commonsense to Robotic Application Using Ontology-Based Model Retraining

The objective of this paper is to provide an improved retrained model for robotics that gives robots the ability to act more human-like when performing tasks; using the proposed model, robots are able to answer incomplete commands or inquiries related to a given context.



An Object Attribute Guided Framework for Robot Learning Manipulations from Human Demonstration Videos

A framework is proposed that generates robotic manipulation plans by observing human demonstration videos, without special marks or unnatural demonstrated behaviors, and that learns manipulation plans from demonstration videos with high accuracy.

Prediction of Manipulation Action Classes Using Semantic Spatial Reasoning

A novel prediction algorithm for manipulation action classes in video sequences, based on the Enriched Semantic Event Chain framework, is presented; it is observed that manipulations can be correctly predicted after only (on average) 45% of the action's total time and that the algorithm is almost twice as fast as the HMM-based method.

Manipulation action tree bank: A knowledge resource for humanoids

It is believed that tree banks are an effective and practical way to organize semantic structures of manipulation actions for humanoid applications, and that they could be used as a basis for automatic manipulation action understanding and execution, as well as for reasoning and prediction during both observation and execution.

Robot Learning Manipulation Action Plans by "Watching" Unconstrained Videos from the World Wide Web

A system is presented that learns manipulation action plans by processing unconstrained videos from the World Wide Web, robustly generating the sequence of atomic actions that make up longer actions seen in video in order to acquire knowledge for robots.

Robot Learning and Execution of Collaborative Manipulation Plans from YouTube Videos

This work proposes a framework for understanding and executing demonstrated action sequences from full-length, unconstrained cooking videos on the web, and proposes an open-source platform for executing the learned plans in a simulation environment as well as with an actual robotic arm.

Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction

INGRESS, a robot system that follows human natural language instructions to pick and place everyday objects, is presented; a two-stage neural network model for grounding is proposed, which outperformed a state-of-the-art method on the RefCOCO dataset and in robot experiments with humans.

Know Rob 2.0 — A 2nd Generation Knowledge Processing Framework for Cognition-Enabled Robotic Agents

Novel features and extensions of KnowRob2 substantially increase the capabilities of robotic agents of acquiring open-ended manipulation skills and competence, reasoning about how to perform manipulation actions more realistically, and acquiring commonsense knowledge.

What can I do around here? Deep functional scene understanding for cognitive robots

This work addresses the problem of localization and recognition of functional areas in an arbitrary indoor scene, formulated as a two-stage deep learning based detection pipeline, using a new scene functionality testing-bed compiled from two publicly available indoor scene datasets.

Combined Task and Action Learning from Human Demonstrations for Mobile Manipulation Applications

An approach is presented for learning flexible mobile manipulation action models and task goal representations from teacher demonstrations, using a probabilistic framework based on Monte Carlo tree search to compute sequences of feasible actions that imitate the teacher's intention in new settings without requiring the teacher to specify an explicit goal state.