On-line learning control by association and reinforcement

Jennie Si; Yu Tsung Wang

doi:10.1109/72.914523

Electrical Engineering

Research output: Contribution to journal › Article › peer-review

715 Scopus citations

Abstract

This paper focuses on a systematic treatment for developing a generic on-line learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming. This on-line learning system improves its performance over time in two aspects. First, it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its action to improve future performance. Second, system states associated with positive reinforcement are memorized through a network learning process so that, in the future, similar states will be more positively associated with a control action leading to a positive reinforcement. A successful candidate of on-line learning control design will be introduced. Real-time learning algorithms will be derived for individual components in the learning system. Some analytical insight will be provided to give guidelines on the learning process that takes place in each module of the on-line learning control system. The performance of the on-line learning controller is measured by its learning speed, its success rate of learning, and the degree to which it meets the learning control objective. The overall learning control system performance will be tested on a single cart-pole balancing problem, a pendulum swing-up and balancing task, and a more complex problem of balancing a triple-link inverted pendulum.

Original language	English (US)
Pages (from-to)	264-276
Number of pages	13
Journal	IEEE Transactions on Neural Networks
Volume	12
Issue number	2
DOIs	https://doi.org/10.1109/72.914523
State	Published - Mar 2001

Keywords

Neural dynamic programming (NDP)
On-line learning
Reinforcement learning

ASJC Scopus subject areas

Software
Computer Science Applications
Computer Networks and Communications
Artificial Intelligence

Cite this

@article{ad7c195525a541ab89feffddd1d78734,

title = "On-line learning control by association and reinforcement",

abstract = "This paper focuses on a systematic treatment for developing a generic on-line learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming. This on-line learning system improves its performance over time in two aspects. First, it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its action to improve future performance. Second, system states associated with positive reinforcement are memorized through a network learning process so that, in the future, similar states will be more positively associated with a control action leading to a positive reinforcement. A successful candidate of on-line learning control design will be introduced. Real-time learning algorithms will be derived for individual components in the learning system. Some analytical insight will be provided to give guidelines on the learning process that takes place in each module of the on-line learning control system. The performance of the on-line learning controller is measured by its learning speed, its success rate of learning, and the degree to which it meets the learning control objective. The overall learning control system performance will be tested on a single cart-pole balancing problem, a pendulum swing-up and balancing task, and a more complex problem of balancing a triple-link inverted pendulum.",

keywords = "Neural dynamic programming (NDP), On-line learning, Reinforcement learning",

author = "Si, Jennie and Wang, {Yu Tsung}",

note = "Funding Information: Manuscript received October 7, 1999; revised March 20, 2000 and November 20, 2000. This work was supported by NSF under Grants ECS-9553202 and ECS-0002098 and in part by EPRI-DOD under Grant WO8333-01, by DARPA under Grant MDA 972-00-1-0027, and by Motorola. The authors are with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-7606 USA (e-mail: si@asu.edu). Publisher Item Identifier S 1045-9227(01)01404-7.",

year = "2001",

month = mar,

doi = "10.1109/72.914523",

language = "English (US)",

volume = "12",

pages = "264--276",

journal = "IEEE Transactions on Neural Networks",

issn = "1045-9227",

publisher = "IEEE Computational Intelligence Society",

number = "2",

}

TY - JOUR

T1 - On-line learning control by association and reinforcement

AU - Si, Jennie

AU - Wang, Yu Tsung

N1 - Funding Information: Manuscript received October 7, 1999; revised March 20, 2000 and November 20, 2000. This work was supported by NSF under Grants ECS-9553202 and ECS-0002098 and in part by EPRI-DOD under Grant WO8333-01, by DARPA under Grant MDA 972-00-1-0027, and by Motorola. The authors are with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-7606 USA (e-mail: si@asu.edu). Publisher Item Identifier S 1045-9227(01)01404-7.

PY - 2001/3

Y1 - 2001/3

N2 - This paper focuses on a systematic treatment for developing a generic on-line learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming. This on-line learning system improves its performance over time in two aspects. First, it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its action to improve future performance. Second, system states associated with positive reinforcement are memorized through a network learning process so that, in the future, similar states will be more positively associated with a control action leading to a positive reinforcement. A successful candidate of on-line learning control design will be introduced. Real-time learning algorithms will be derived for individual components in the learning system. Some analytical insight will be provided to give guidelines on the learning process that takes place in each module of the on-line learning control system. The performance of the on-line learning controller is measured by its learning speed, its success rate of learning, and the degree to which it meets the learning control objective. The overall learning control system performance will be tested on a single cart-pole balancing problem, a pendulum swing-up and balancing task, and a more complex problem of balancing a triple-link inverted pendulum.

KW - Neural dynamic programming (NDP)

KW - On-line learning

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=0035273403&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035273403&partnerID=8YFLogxK

U2 - 10.1109/72.914523

DO - 10.1109/72.914523

M3 - Article

C2 - 18244383

AN - SCOPUS:0035273403

SN - 1045-9227

VL - 12

SP - 264

EP - 276

JO - IEEE Transactions on Neural Networks

JF - IEEE Transactions on Neural Networks

IS - 2

ER -
