Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

Kevin Sebastian Luck; Mel Vecerik; Simon Stepputtis; Heni Ben Amor; Jonathan Scholz

doi:10.1109/IROS40897.2019.8967896

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially when the actor is deterministic. This work evaluates model-based trajectory optimization methods as an exploration strategy for DDPG trained on a latent image embedding. In addition, it derives an extension of DDPG that uses a value function as the critic and a learned deep dynamics model to compute the policy gradient. The result is a symbiotic relationship between the deep reinforcement learning algorithm and the latent trajectory optimizer: the trajectory optimizer benefits from the critic learned by the RL algorithm, and the RL algorithm benefits from the enhanced exploration generated by the planner. The methods are evaluated on two continuous control tasks, one in simulation and one in the real world. In particular, a Baxter robot is trained to perform an insertion task while receiving only sparse rewards and images as observations from the environment.
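In standard DDPG the actor is updated by back-propagating through an action-value critic Q(z, a). A state-value critic V(z) has no action input, so the policy gradient has to reach the action through a dynamics model instead. One plausible form of the resulting actor objective is sketched below; the notation (latent state z, learned latent dynamics model f_ψ, reward term r, discount γ, replay distribution D) is introduced here for illustration and is not taken from the paper.

```latex
J(\theta) = \mathbb{E}_{z \sim \mathcal{D}}\Big[\, r\big(z, \pi_\theta(z)\big) + \gamma\, V_\phi\big(f_\psi(z, \pi_\theta(z))\big) \Big]

\nabla_\theta J(\theta) = \mathbb{E}_{z \sim \mathcal{D}}\Big[ \Big( \nabla_a r(z, a) + \gamma \Big(\frac{\partial f_\psi(z, a)}{\partial a}\Big)^{\!\top} \nabla_{z'} V_\phi(z') \Big)^{\!\top} \frac{\partial \pi_\theta(z)}{\partial \theta} \Big],
\qquad a = \pi_\theta(z),\quad z' = f_\psi(z, a)
```

For deterministic dynamics, Q(z, a) = r(z, a) + γ V(f(z, a)), so the bracketed term plays the role of DDPG's ∇_a Q(z, a): the reward gradient plus the critic gradient pulled back through the model Jacobian. With sparse rewards that carry no gradient in a, only the second term contributes; whether the paper's objective includes a reward term at all is an assumption here.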

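The planner side of the symbiosis described in the abstract can be illustrated with a receding-horizon trajectory optimizer in the latent space that scores rollouts with the RL critic at the end of the horizon. Random shooting is swapped in here as the simplest such optimizer; this record does not say which trajectory optimizer the authors use. The function `plan_exploration_action` and its parameters (`dynamics`, `reward`, `value`, `horizon`, `num_candidates`) are hypothetical stand-ins for the learned latent dynamics model, a reward model, and the learned critic.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def plan_exploration_action(
    z0: Vector,
    dynamics: Callable[[Vector, Vector], Vector],  # hypothetical learned model z' = f(z, a)
    reward: Callable[[Vector, Vector], float],     # hypothetical reward model r(z, a)
    value: Callable[[Vector], float],              # critic V(z) learned by the RL algorithm
    action_dim: int,
    horizon: int = 5,
    num_candidates: int = 256,
    gamma: float = 0.99,
    action_bound: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Random-shooting trajectory optimization in latent space (illustrative sketch).

    Samples uniform random action sequences, rolls each one out through the
    latent dynamics model, scores it by discounted reward plus the discounted
    critic value of the final latent state, and returns the first action of
    the best-scoring sequence (receding-horizon style).
    """
    if horizon < 1 or num_candidates < 1:
        raise ValueError("horizon and num_candidates must be >= 1")
    if rng is None:
        rng = random.Random()
    best_score = float("-inf")
    best_first_action: Vector = [0.0] * action_dim
    for _ in range(num_candidates):
        z = list(z0)
        score = 0.0
        first_action: Vector = []
        for t in range(horizon):
            a = [rng.uniform(-action_bound, action_bound) for _ in range(action_dim)]
            if t == 0:
                first_action = a
            score += (gamma ** t) * reward(z, a)
            z = dynamics(z, a)
        # The critic bootstraps everything beyond the planning horizon.
        score += (gamma ** horizon) * value(z)
        if score > best_score:
            best_score, best_first_action = score, first_action
    return best_first_action


# Toy usage: 1-D latent state with a goal at z = 10 (all functions hypothetical).
if __name__ == "__main__":
    goal = 10.0
    action = plan_exploration_action(
        z0=[0.0],
        dynamics=lambda z, a: [z[0] + a[0]],
        reward=lambda z, a: -abs(z[0] - goal),
        value=lambda z: -abs(z[0] - goal),
        action_dim=1,
        num_candidates=1000,
        rng=random.Random(0),
    )
    print(action)  # first action of the best sampled sequence
```

Bootstrapping with the critic lets a short horizon still account for long-term return, which is what allows the planner to profit from the RL algorithm. How planned actions are mixed with the actor's own actions during data collection is left open here and would be a further design assumption.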
Original language	English (US)
Title of host publication	2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	3704-3711
Number of pages	8
ISBN (Electronic)	9781728140049
DOI	https://doi.org/10.1109/IROS40897.2019.8967896
State	Published - Nov 2019
Event	2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019, Macau, China; Duration: Nov 3 2019 → Nov 8 2019

Publication series

Name	IEEE International Conference on Intelligent Robots and Systems
ISSN (Print)	2153-0858
ISSN (Electronic)	2153-0866

Conference

Conference	2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
Country/Territory	China
City	Macau
Period	11/3/19 → 11/8/19

ASJC Scopus subject areas

Control and Systems Engineering
Software
Computer Vision and Pattern Recognition
Computer Science Applications

Cite this

Luck, K. S., Vecerik, M., Stepputtis, S., Ben Amor, H., & Scholz, J. (2019). Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019 (pp. 3704-3711). Article 8967896 (IEEE International Conference on Intelligent Robots and Systems). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IROS40897.2019.8967896

Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient. / Luck, Kevin Sebastian; Vecerik, Mel; Stepputtis, Simon et al.
2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019. Institute of Electrical and Electronics Engineers Inc., 2019. p. 3704-3711 8967896 (IEEE International Conference on Intelligent Robots and Systems).

Luck, KS, Vecerik, M, Stepputtis, S, Ben Amor, H & Scholz, J 2019, Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient. in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019, 8967896, IEEE International Conference on Intelligent Robots and Systems, Institute of Electrical and Electronics Engineers Inc., pp. 3704-3711, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019, Macau, China, 11/3/19. https://doi.org/10.1109/IROS40897.2019.8967896

Luck KS, Vecerik M, Stepputtis S, Ben Amor H, Scholz J. Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 3704-3711. 8967896. (IEEE International Conference on Intelligent Robots and Systems). doi: 10.1109/IROS40897.2019.8967896

Luck, Kevin Sebastian ; Vecerik, Mel ; Stepputtis, Simon et al. / Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 3704-3711 (IEEE International Conference on Intelligent Robots and Systems).

@inproceedings{a535c53ec457488ab467540ee40c2585,
  title = "Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient",
  abstract = "Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is of deterministic nature. This work evaluates the use of model-based trajectory optimization methods used for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived using a value function as critic, making use of a learned deep dynamics model to compute the policy gradient. This approach leads to a symbiotic relationship between the deep reinforcement learning algorithm and the latent trajectory optimizer. The trajectory optimizer benefits from the critic learned by the RL algorithm and the latter from the enhanced exploration generated by the planner. The developed methods are evaluated on two continuous control tasks, one in simulation and one in the real world. In particular, a Baxter robot is trained to perform an insertion task, while only receiving sparse rewards and images as observations from the environment.",
  author = "Luck, {Kevin Sebastian} and Mel Vecerik and Simon Stepputtis and {Ben Amor}, Heni and Jonathan Scholz",
  note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019 ; Conference date: 03-11-2019 Through 08-11-2019",
  year = "2019",
  month = nov,
  doi = "10.1109/IROS40897.2019.8967896",
  language = "English (US)",
  series = "IEEE International Conference on Intelligent Robots and Systems",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "3704--3711",
  booktitle = "2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019",
}

TY  - GEN
T1  - Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient
AU  - Luck, Kevin Sebastian
AU  - Vecerik, Mel
AU  - Stepputtis, Simon
AU  - Ben Amor, Heni
AU  - Scholz, Jonathan
PY  - 2019/11
Y1  - 2019/11
N2  - Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is of deterministic nature. This work evaluates the use of model-based trajectory optimization methods used for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived using a value function as critic, making use of a learned deep dynamics model to compute the policy gradient. This approach leads to a symbiotic relationship between the deep reinforcement learning algorithm and the latent trajectory optimizer. The trajectory optimizer benefits from the critic learned by the RL algorithm and the latter from the enhanced exploration generated by the planner. The developed methods are evaluated on two continuous control tasks, one in simulation and one in the real world. In particular, a Baxter robot is trained to perform an insertion task, while only receiving sparse rewards and images as observations from the environment.
AB  - Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is of deterministic nature. This work evaluates the use of model-based trajectory optimization methods used for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived using a value function as critic, making use of a learned deep dynamics model to compute the policy gradient. This approach leads to a symbiotic relationship between the deep reinforcement learning algorithm and the latent trajectory optimizer. The trajectory optimizer benefits from the critic learned by the RL algorithm and the latter from the enhanced exploration generated by the planner. The developed methods are evaluated on two continuous control tasks, one in simulation and one in the real world. In particular, a Baxter robot is trained to perform an insertion task, while only receiving sparse rewards and images as observations from the environment.
UR  - http://www.scopus.com/inward/record.url?scp=85081165540&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85081165540&partnerID=8YFLogxK
U2  - 10.1109/IROS40897.2019.8967896
DO  - 10.1109/IROS40897.2019.8967896
M3  - Conference contribution
AN  - SCOPUS:85081165540
T3  - IEEE International Conference on Intelligent Robots and Systems
SP  - 3704
EP  - 3711
BT  - 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
Y2  - 3 November 2019 through 8 November 2019
ER  - 