TY - GEN
T1 - Directing policy search with interactively taught via-points
AU - Schroecker, Yannick
AU - Ben Amor, Heni
AU - Thomaz, Andrea
N1 - Funding Information:
This work was conducted as a part of the OpenLabs project 1436618 sponsored by PSA Peugeot and partially funded under ONR grant number N000141410003.
Publisher Copyright:
Copyright © 2016, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
PY - 2016
Y1 - 2016
N2 - Policy search has been successfully applied to robot motor learning problems. However, moderately complex tasks still require good heuristics or a good initialization. One method that alleviates this problem is to use demonstrations obtained from a human teacher as a starting point for policy search in the space of trajectories. In this paper, we describe an alternative way of giving demonstrations as soft via-points and show how they can be used for initialization as well as for active corrections during the learning process. With this approach, we restrict the search space to trajectories that stay close to the taught via-points at the taught times, thereby significantly reducing the number of samples necessary to learn a good policy. Using a simulated robot arm, we show that our method can efficiently learn to insert an object into a hole from just a minimal demonstration, and we further evaluate our method on a synthetic letter-reproduction task.
AB - Policy search has been successfully applied to robot motor learning problems. However, moderately complex tasks still require good heuristics or a good initialization. One method that alleviates this problem is to use demonstrations obtained from a human teacher as a starting point for policy search in the space of trajectories. In this paper, we describe an alternative way of giving demonstrations as soft via-points and show how they can be used for initialization as well as for active corrections during the learning process. With this approach, we restrict the search space to trajectories that stay close to the taught via-points at the taught times, thereby significantly reducing the number of samples necessary to learn a good policy. Using a simulated robot arm, we show that our method can efficiently learn to insert an object into a hole from just a minimal demonstration, and we further evaluate our method on a synthetic letter-reproduction task.
KW - Dynamic movement primitives
KW - Keyframe demonstrations
KW - Learning from demonstration
KW - Reinforcement learning
KW - Reinforcement learning for motor skills
UR - http://www.scopus.com/inward/record.url?scp=85014153645&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014153645&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85014153645
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 1052
EP - 1059
BT - AAMAS 2016 - Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems
PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
T2 - 15th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2016
Y2 - 9 May 2016 through 13 May 2016
ER -