Task Learning Through Imitation and Human-Robot Interaction

Abstract

behaviors embed representations of goals in the form of abstracted environmental states. This is a key feature, critical for learning from experience. To learn a task, the robot must create a mapping between its perception (observations) and its own behaviors that achieve the observed effects. This process is enabled by abstract behaviors, the perceptual component of a behavior, which activate each time the robot’s observations match the goal(s) of a primitive behavior. This correlation enables the robot to identify its own behaviors that are relevant for the task being learned.
Primitive behaviors execute the robot’s actions and achieve its goals. They are also used for communication and interaction. Acting in the environment is a form of implicit communication: by using evocative actions, people and other animals convey emotions, desires, interests, and intentions. Action-based communication has the advantage that it need not be restricted to robots or agents with a humanoid body or face, because structural similarities between interacting agents are not required for successful interaction. Even if there is no direct mapping between the physical characteristics of the robot and its user, the robot can still use communication through action to convey certain types of messages, drawing on human common sense [16].
1.4 Communication by Acting: a Means for Robot-Human Interaction
Consider a prelinguistic child who wants an out-of-reach toy. The child will try to bring a grown-up to the toy and will then point and reach, indicating his desires. Similarly, a dog will run back and forth to induce its owner to come to a place where it has found something it desires. The ability of the child and the dog to demonstrate their desires and intentions by calling a helper and mock-executing actions is an expressive and natural way to communicate a problem and a need for help.
The human capacity to understand these intentions is also natural and inherent. We apply the same strategy in enabling robots to communicate their desires and intentions to people. The action-based communication approach we propose is general and can be applied to a variety of tasks and physical bodies/platforms. The robot performs its task independently. If it fails in a cognizant fashion (i.e., it detects its own failure), it searches for a human, attempts to induce him to follow it to the place where the failure occurred, and then demonstrates its intentions in hopes of obtaining help. Attracting a human to help is achieved through movement, using back-and-forth, cyclic actions. After capturing the human’s attention, the robot leads the human helper to the site of the task and attempts to resume its work from the point where it failed. To communicate the nature of the problem, the robot repeatedly tries to execute the failed behavior in front of its helper. Because the previously failed behavior will likely fail again, this effectively expresses the robot’s problem to the human observer. The strategy is general and can be employed for a wide variety of failures, though notably not for all.
1.4.1 Experiments in Communication by Acting
We implemented and tested our concepts on a Pioneer 2-DX mobile robot. It was equipped with two sonar rings (8 front and 8 rear), a SICK laser range-finder, a pan-tilt-zoom color camera, a gripper, and on-board computation on a PC104 stack. The robot had a behavior set that allowed it to track cylindrical colored targets (Track(ColorOfTarget, GoalAngle, GoalDistance)) and to pick up (PickUp(ColorOfObject)) and drop (Drop) small colored objects. These behaviors were implemented in AYLLU [17]. In the validation experiments, we asked a person who had not worked with the robot before to be nearby during task execution and to expect to be engaged in an interaction. We made no initial assumption that people would be helpful or motivated to assist the robot.
The robot is able to deal with unhelpful or misleading humans by monitoring their presence along with its progress in the task. The following main categories of interactions emerged from the experiments:
Uninterested: the human was not interested in, did not react to, or did not understand the robot’s need for help. As a result, the robot searched for another helper.
Interested but unhelpful: the human was interested and followed the robot for a while, but then abandoned it. As above, the robot searched for another helper.
Helpful: the human was interested, followed the robot to the location of the problem, and assisted the robot. In these cases, the robot was able to finish the task.
Fig. 1.3. The human-robot interaction experiments setup: (a) going through a blocked gate; (b) picking up an inaccessible box; (c) visiting a missing target.
We purposefully constrained the environment used in the experiments to encourage human-robot interaction, as follows:
Traversing blocked gates: the robot’s task was to pass through a gate formed by two closely placed colored targets (Figure 1.3(a)), but a large box blocked its path. The robot expressed its intentions by executing the Track behavior, making its way around one of the targets. Trying to reach the desired distance and angle to the target while hindered by the box clearly showed the direction it wanted to pursue and that the obstacle blocked it.
Moving inaccessibly located objects: the robot’s task was to pick up a small object that had been made inaccessible by placing it in a narrow space between two large boxes (Figure 1.3(b)). The robot expressed its intentions by attempting to execute the PickUp behavior. It lowered and opened its gripper and tilted its camera downward while approaching the object, and then moved backwards to avoid the boxes.
Visiting non-existing targets: the robot’s task was to visit a number of targets in a specific order (Green, Orange, Blue, Yellow, Orange, Green), in an environment where one of the targets had been removed (Figure 1.3(c)). After some time, the robot gave up searching for the missing target and sought out a human helper. The robot expressed its intentions by searching for a target, which appeared as aimless wandering. This behavior did not help the human infer the robot’s goal and problem. In this and similar situations, our framework would benefit from more explicit communication.
1.4.2 Discussion
From the experimental results [18] and the interviews and report of the human subject who interacted with the robot, we derived the following conclusions about the robot’s social behavior:
Capturing a human’s attention by approaching and then going back-and-forth is typically easily recognized and interpreted as soliciting help.
Getting a human to follow requires multiple trials before the human follows the robot the entire way. Here the robot turns around and starts to go to the place where the problem occurred, after capturing the human’s attention. Even if interested and aware that the robot wants something from him, the human may have trouble understanding that following is the desired behavior. Also, after choosing to follow the robot, the human gives up if wandering in search of the place with the problem takes too long, not knowing whether the robot still needs him.
Conveying intentions by repeating a failing behavior in front of a helper is effective for tasks in which the task components requiring help are observable to the human (such as the blocked gate). However, if some part of the task is not observable (such as the missing target), the human cannot infer it from the robot’s behavior and thus is not able to help (at least not without trial and error).
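The help-seeking strategy of Section 1.4 and the interaction categories observed in the experiments can be summarized as a small state machine. The sketch is illustrative only and is not the AYLLU implementation. The mode names, the `Percept` fields, and the transition conditions are hypothetical simplifications of the perception the real robot used. In every interaction mode, losing track of the human sends the robot back to searching. This one rule covers both the uninterested and the interested-but-unhelpful cases.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    WORKING = auto()        # executing the task autonomously
    SEEKING = auto()        # searching for a potential helper
    ATTRACTING = auto()     # back-and-forth motion to capture attention
    LEADING = auto()        # heading to the place where the failure occurred
    DEMONSTRATING = auto()  # repeatedly re-executing the failed behavior
    DONE = auto()


@dataclass(frozen=True)
class Percept:
    """Hypothetical per-cycle perception summary; every field is an assumption."""
    failed: bool = False           # the robot detected its own failure
    task_done: bool = False
    human_near: bool = False       # a person is within interaction range
    human_attending: bool = False  # the person is reacting to the robot
    at_failure_site: bool = False
    resolved: bool = False         # the obstacle to the failed behavior is gone


def next_mode(mode: Mode, p: Percept) -> Mode:
    if mode is Mode.DONE:
        return Mode.DONE
    if mode is Mode.WORKING:
        if p.task_done:
            return Mode.DONE
        return Mode.SEEKING if p.failed else Mode.WORKING
    if mode is Mode.SEEKING:
        return Mode.ATTRACTING if p.human_near else Mode.SEEKING
    # In every interaction mode, losing the person means finding another
    # helper: this covers both "uninterested" and "interested but unhelpful".
    if not p.human_near:
        return Mode.SEEKING
    if mode is Mode.ATTRACTING:
        return Mode.LEADING if p.human_attending else Mode.ATTRACTING
    if mode is Mode.LEADING:
        return Mode.DEMONSTRATING if p.at_failure_site else Mode.LEADING
    # DEMONSTRATING: keep repeating the failed behavior until the helper
    # removes the problem, then resume the task from where it failed.
    return Mode.WORKING if p.resolved else Mode.DEMONSTRATING


def run(percepts, mode=Mode.WORKING):
    """Return the sequence of modes visited for a sequence of percepts."""
    trace = [mode]
    for p in percepts:
        mode = next_mode(mode, p)
        trace.append(mode)
    return trace


# Example percept sequences (hypothetical).
helpful = [
    Percept(failed=True),
    Percept(human_near=True),
    Percept(human_near=True, human_attending=True),
    Percept(human_near=True, at_failure_site=True),
    Percept(human_near=True, resolved=True),
    Percept(task_done=True),
]
unhelpful = [
    Percept(failed=True),
    Percept(human_near=True),
    Percept(human_near=True, human_attending=True),
    Percept(),  # the person stops following
]
```

A helpful interaction passes through every mode and ends in `DONE`. An interaction in which the person abandons the robot returns it to `SEEKING`.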
1.5 Learning from Imitation and Additional Cues
Learning by observation and imitation are especially effective means of human skill acquisition. As skill or task complexity increases, however, teaching typically involves increased concurrent use of multiple instructional modalities, including demonstration, verbal instruction, attentional cues, and gestures. Students/learners are typically given one or a few demonstrations of the task, followed by a set of supervised practice trials. During those trials, the teacher provides feedback cues indicating needed corrections. The teacher may also provide additional demonstrations that could be used for generalization. Most of these learning and teaching tools are overlooked in the majority of robot teaching approaches, yet considering them collectively improves the imitation learning process considerably.
Toward this end, we developed a method for learning representations of high-level tasks. Specifically, we augmented imitation learning by allowing the demonstrator to employ additional instructive activities (verbal commands and attentional cues). We also refined the learned representations through generalization from multiple learning experiences and through direct feedback from the teacher.
In our work, the robot is equipped with a set of skills in the form of behaviors [13, 14]. We focus on a strategy that enables it to use those behaviors to construct a high-level task representation of a novel, complex, sequentially structured task. We use learning by experienced demonstrations, in which the robot actively participates in the demonstration provided by the teacher and experiences the task through its own sensors. This is an essential characteristic of our approach. We assume that the teacher knows what behaviors the robot has, and also by what means (sensors) the robot can perceive demonstrations.
The advantage of putting the robot through the task during the demonstration is that the robot can adjust its behaviors (via their parameters) using the information gathered through its own sensors. The values of all behaviors’ parameters are learned directly from sensory information obtained from the environment. In addition, executing the task during the demonstration provides observations that contain temporal information for proper behavior sequencing, which would otherwise be tedious to design by hand. During the demonstration, the robot follows the teacher while all of its behaviors continuously monitor the status of their postconditions, without executing any of their actions. Whenever the robot’s observations match the goals of one or more primitive behaviors, the robot has observed something it is also able to perform. The corresponding abstract behavior then activates, allowing the robot to learn which of its own behaviors are relevant for the particular portion of the task being demonstrated. Feedback cues received from the teacher are used in conjunction with these observations to eliminate any irrelevant observations.
The general idea of our task learning algorithm is to add instances of behaviors to the robot’s behavior network, in the order of their occurrence (on-line stage). A behavior is added if its postconditions were detected as true during the demonstration and the teacher gave relevance signals while they held. At the end of the teaching experience, the time intervals during which the effects of each behavior were true are known. These intervals are used to determine whether the effects were active in overlapping intervals or in sequence. Based on that information, the algorithm generates the proper dependency links (permanent, enabling, or ordering) between behaviors (off-line stage). This one-step learning process is described in detail in [18].
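The two stages described above can be illustrated with a minimal sketch. It assumes the demonstration has been reduced to one set per time step, listing the behaviors whose postconditions held at that step (that is, whose abstract behaviors were active). It also assumes relevance signals are given as time steps. The `Instance` structure and the behavior labels are our own assumptions. So is the rule that maps interval relations to permanent, enabling, and ordering links; it is not necessarily the exact criterion used in [18].

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One continuous run during which a behavior's postcondition held."""
    behavior: str
    start: int  # first time step the postcondition was observed true
    end: int    # last consecutive time step it was still true


def online_stage(timeline, relevant_times):
    """Build the ordered network of behavior instances.

    timeline[t] is the set of behaviors whose postconditions hold at step t
    (their abstract behaviors are active); relevant_times contains the steps
    at which the teacher gave a relevance signal.
    """
    open_runs, closed = {}, []
    for t, true_now in enumerate(timeline):
        for name in true_now:
            if name in open_runs:
                open_runs[name].end = t                 # effect still holds
            else:
                open_runs[name] = Instance(name, t, t)  # effect just observed
        for name in [n for n in open_runs if n not in true_now]:
            closed.append(open_runs.pop(name))          # effect stopped holding
    closed.extend(open_runs.values())                   # still true at the end
    # Keep only instances during which the teacher signalled relevance.
    kept = [i for i in closed
            if any(i.start <= r <= i.end for r in relevant_times)]
    return sorted(kept, key=lambda i: (i.start, i.behavior))  # order of occurrence


def offline_stage(network):
    """Label each earlier/later pair with an assumed dependency type."""
    links = []
    for k, a in enumerate(network):
        for b in network[k + 1:]:
            if a.end >= b.end:
                kind = "permanent"  # a's effect held throughout b's interval
            elif a.end >= b.start - 1:
                kind = "enabling"   # a's effect held until b's effect appeared
            else:
                kind = "ordering"   # a's effect ended earlier: sequence only
            links.append((a.behavior, b.behavior, kind))
    return links


# Hypothetical demonstration: the labels stand in for behaviors such as
# Track(ColorOfTarget, ...) and PickUp(ColorOfObject).
timeline = [
    {"Track(Green)"},
    {"Track(Green)", "PickUp(Orange)"},
    {"PickUp(Orange)"},
    {"PickUp(Orange)", "Track(Pink)"},
    {"Track(Pink)", "Drop"},
    {"Track(Yellow)"},  # passed by, but the teacher gave no relevance signal
]
network = online_stage(timeline, relevant_times={0, 1, 3, 4})
links = {(a, b): kind for a, b, kind in offline_stage(network)}
```

In this example, the Track(Yellow) observation is discarded because no relevance signal occurred while it held. This mirrors how teacher cues filter irrelevant observations. Staying at the Pink target throughout the drop yields a permanent link under the assumed rule.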
Through this process, the robot may learn a correct but over-specialized version of the task. Two types of errors can occur in the learning process: learning irrelevant steps (false positives) and omitting relevant steps (false negatives). It is thus important that the robot can generalize over multiple demonstrations and incorporate additional feedback from the demonstrator.
To enable a robot to generalize from multiple demonstrations of the same task (presented in similar or different environments), we build a task representation that encodes the specifics of each of the given examples and also incorporates their common components [19]. However, repeated observations of irrelevant steps will also inadvertently lead the learner to include them in the learned representation. Also, limitations in robot sensing and challenging structures in the environment may prevent the robot from observing some relevant steps. To address these issues, we allow the teacher to provide feedback while observing the robot’s execution of the learned task during practice experiments. The teacher signals any detected errors as they occur, through appropriate feedback cues (spoken commands). The provided feedback allows the robot to eliminate irrelevant observations. By re-demonstrating relevant steps that were previously missed, the demonstrator also enables the robot to make its learned task representation more complete [19].
1.5.1 Experiments in Learning from Multiple Cues
We implemented and tested our approach to learning with additional cues on the Pioneer 2-DX mobile robot described in Section 1.4.1. For the voice commands and feedback, we used an off-the-shelf Logitech cordless headset and the IBM ViaVoice speech recognition engine. We performed two sets of robot teaching experiments to validate the key features of the proposed approach.
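The generalization step can be sketched in the same hedged spirit. Here we assume each demonstration has already been reduced to a sequence of behavior labels, and we merge two sequences around a longest common subsequence. Shared steps become the backbone of the task. The differing segments become alternate paths, and an empty segment acts as a direct (bypass) link. This merge procedure is our own illustration and is not necessarily the one used in [19]. The labels are hypothetical, and merging a third demonstration would require aligning it with the already merged structure. The `plan` function applies the execution rule reported in the experiments below: take a direct link when one exists, otherwise pick an alternative whose first target is currently visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alt:
    """Alternate paths between two common steps; () is a direct (bypass) link."""
    options: tuple


def lcs_pairs(a, b):
    """Index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j] = LCS length of a[i:], b[j:]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def merge(a, b):
    """Keep steps common to both demonstrations; turn the rest into Alt slots."""
    slots, pi, pj = [], 0, 0
    for i, j in lcs_pairs(a, b) + [(len(a), len(b))]:  # sentinel flushes the tails
        seg_a, seg_b = tuple(a[pi:i]), tuple(b[pj:j])
        if seg_a or seg_b:
            slots.append(Alt((seg_a, seg_b)))
        if i < len(a):
            slots.append(a[i])  # a step shared by both demonstrations
        pi, pj = i + 1, j + 1
    return slots


def plan(slots, visible):
    """Linearize: take a direct link if one exists, else an alternative seen first."""
    steps = []
    for slot in slots:
        if not isinstance(slot, Alt):
            steps.append(slot)
        elif () in slot.options:
            continue  # direct link: shortcut the alternate sequence
        else:
            seen = [s for s in slot.options if s[0] in visible]
            steps.extend(seen[0] if seen else slot.options[0])
    return steps


# Hypothetical, simplified demonstrations of the object transport task.
demo1 = ["Green", "PickUp", "Red", "LOrange", "Pink", "Drop"]
demo2 = ["LGreen", "PickUp", "Red", "Pink", "Drop"]
task = merge(demo1, demo2)
```

With these inputs, the Green/Light Green choice stays open and is resolved by what the robot sees. The Light Orange visit appears in only one demonstration, so it ends up on a bypassed path.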
1.5.1.1 Generalization from a Small Number of Examples
In the first experiment set, we demonstrated the robot’s generalization capabilities by teaching it an object transport task in three consecutive demonstrations. These were performed in different environments (Figure 1.4) and designed to contain incorrect steps and inconsistencies. The environment consisted of a set of colored cylindrical targets. The teacher led the robot around them, instructing it when to pick up or drop a small orange box. The task to be learned was as follows: go to either the Green (G) or the Light Green (LG) target, pick up an Orange (O) box, go between the Yellow (Y) and Red (R) targets, go to the Pink (P) target, drop the box there, then go to the Light Orange (LO) target, and come back to the Light Green target.
Fig. 1.4. Structure of the environment and course of demonstration: (a) first demonstration; (b) second demonstration; (c) third demonstration.
The courses of the three demonstrations show that none corresponded exactly to the intended task description. Some contained unnecessary steps (such as a final visit to a Green target in the first trial), and some had inconsistencies (such as the visits to the Light Orange target at various demonstration stages). Figure 1.5 shows the task representations (in their topological form) obtained after each learning demonstration, followed by generalization. The topological representation of a task network is obtained by applying a topological sort to the behavior network graph; this representation shows the succession of behavior execution for the task. With generalization, the following types of alternative paths can be obtained:
Both paths contain actual behaviors. For example, Figure 1.5(c) encodes the fact that going to either the Green or the Light Green target is acceptable for the task.
Given such alternate paths, the robot chooses opportunistically, as induced by the state of the environment (e.g., it goes to the target seen first).
One path is a direct link to the end of the other alternate sequence. In Figure 1.5(c), there is a direct link from MT5(Red,...) to MT7(Pink,...), bypassing the behavior MT6(LOrange,...). For such paths, the robot automatically chooses the direct path, shortcutting the alternate sequence.
The generalized representation captures the main structure of the task while correctly treating the irrelevant and inconsistent components: they are captured as parts of a bypassed alternate path that will never be executed. While irrelevant actions are thus effectively pruned, any necessary but inconsistently demonstrated steps would have to be included by different means. This is to be expected, since generalization alone is not sufficient for learning a correct representation when the examples are inconsistent. The next section shows how practice and teacher feedback can be used to solve this problem.
1.5.1.2 Learning from Practice and Teacher Feedback
We allowed the robot to refine the previously learned task representation through practice (Figure 1.5(e)) in a different environment (Figure 1.6(a)). Figure 1.6(b) shows the robot’s trajectory and the teacher’s intervention (dotted). After dropping



@inproceedings{Nicolescu2004TaskLT, title={Task Learning Through Imitation and Human-Robot Interaction}, author={Monica N. Nicolescu and Maja J. Mataric}, year={2004} }