Online Reinforcement Learning Control for the Personalization of a Robotic Knee Prosthesis

Yue Wen; Jennie Si; Andrea Brandt; Xiang Gao; He Helen Huang

doi:10.1109/TCYB.2019.2890974

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Contribution to journal › Article › peer-review

97 Scopus citations

Abstract

Robotic prostheses deliver greater function than passive prostheses, but we face the challenge of tuning a large number of control parameters in order to personalize the device for individual amputee users. This problem is not easily solved by traditional control designs or the latest robotic technology. Reinforcement learning (RL) is naturally appealing. The recent, unprecedented success of AlphaZero demonstrated RL as a feasible, large-scale problem solver. However, the prosthesis-tuning problem is associated with several unaddressed issues such as that it does not have a known and stable model, the continuous states and controls of the problem may result in a curse of dimensionality, and the human-prosthesis system is constantly subject to measurement noise, environmental change and human-body-caused variations. In this paper, we demonstrated the feasibility of direct heuristic dynamic programming, an approximate dynamic programming (ADP) approach, to automatically tune the 12 robotic knee prosthesis parameters to meet individual human users' needs. We tested the ADP-tuner on two subjects (one able-bodied subject and one amputee subject) walking at a fixed speed on a treadmill. The ADP-tuner learned to reach target gait kinematics in an average of 300 gait cycles or 10 min of walking. We observed improved ADP tuning performance when we transferred a previously learned ADP controller to a new learning session with the same subject. To the best of our knowledge, our approach to personalize robotic prostheses is the first implementation of online ADP learning control to a clinical problem involving human subjects.

Original language	English (US)
Article number	8613842
Pages (from-to)	2346-2356
Number of pages	11
Journal	IEEE Transactions on Cybernetics
Volume	50
Issue number	6
DOIs	https://doi.org/10.1109/TCYB.2019.2890974
State	Published - Jun 2020

Keywords

Approximate dynamic programming (ADP)
direct heuristic dynamic programming (dHDP)
reinforcement learning (RL)
robotic knee prosthesis

ASJC Scopus subject areas

Software
Control and Systems Engineering
Information Systems
Human-Computer Interaction
Computer Science Applications
Electrical and Electronic Engineering

Cite this

@article{e5d4d2a4b7e343ce9da9cd92be12569b,

title = "Online Reinforcement Learning Control for the Personalization of a Robotic Knee Prosthesis",

abstract = "Robotic prostheses deliver greater function than passive prostheses, but we face the challenge of tuning a large number of control parameters in order to personalize the device for individual amputee users. This problem is not easily solved by traditional control designs or the latest robotic technology. Reinforcement learning (RL) is naturally appealing. The recent, unprecedented success of AlphaZero demonstrated RL as a feasible, large-scale problem solver. However, the prosthesis-tuning problem is associated with several unaddressed issues such as that it does not have a known and stable model, the continuous states and controls of the problem may result in a curse of dimensionality, and the human-prosthesis system is constantly subject to measurement noise, environmental change and human-body-caused variations. In this paper, we demonstrated the feasibility of direct heuristic dynamic programming, an approximate dynamic programming (ADP) approach, to automatically tune the 12 robotic knee prosthesis parameters to meet individual human users' needs. We tested the ADP-tuner on two subjects (one able-bodied subject and one amputee subject) walking at a fixed speed on a treadmill. The ADP-tuner learned to reach target gait kinematics in an average of 300 gait cycles or 10 min of walking. We observed improved ADP tuning performance when we transferred a previously learned ADP controller to a new learning session with the same subject. To the best of our knowledge, our approach to personalize robotic prostheses is the first implementation of online ADP learning control to a clinical problem involving human subjects.",

keywords = "Approximate dynamic programming (ADP), direct heuristic dynamic programming (dHDP), reinforcement learning (RL), robotic knee prosthesis",

author = "Yue Wen and Jennie Si and Andrea Brandt and Xiang Gao and Huang, {He Helen}",

note = "Funding Information: Manuscript received May 1, 2018; revised September 26, 2018; accepted December 19, 2018. Date of publication January 16, 2019; date of current version May 7, 2020. This work was supported in part by the National Science Foundation under Grant 1563454, Grant 1563921, Grant 1808752, and Grant 1808898. This paper was recommended by Associate Editor H. Zhang. (Corresponding authors: He (Helen) Huang; Jennie Si.) Y. Wen, A. Brandt, and H. Huang are with the UNC/NCSU Joint Department of Biomedical Engineering, North Carolina State University, Raleigh, NC 27695 USA, and the University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA (e-mail: hhuang11@ncsu.edu). Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2020",

month = jun,

doi = "10.1109/TCYB.2019.2890974",

language = "English (US)",

volume = "50",

pages = "2346--2356",

journal = "IEEE Transactions on Cybernetics",

issn = "2168-2267",

publisher = "IEEE Advancing Technology for Humanity",

number = "6",

}

TY - JOUR

T1 - Online Reinforcement Learning Control for the Personalization of a Robotic Knee Prosthesis

AU - Wen, Yue

AU - Si, Jennie

AU - Brandt, Andrea

AU - Gao, Xiang

AU - Huang, He Helen

N1 - Funding Information: Manuscript received May 1, 2018; revised September 26, 2018; accepted December 19, 2018. Date of publication January 16, 2019; date of current version May 7, 2020. This work was supported in part by the National Science Foundation under Grant 1563454, Grant 1563921, Grant 1808752, and Grant 1808898. This paper was recommended by Associate Editor H. Zhang. (Corresponding authors: He (Helen) Huang; Jennie Si.) Y. Wen, A. Brandt, and H. Huang are with the UNC/NCSU Joint Department of Biomedical Engineering, North Carolina State University, Raleigh, NC 27695 USA, and the University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA (e-mail: hhuang11@ncsu.edu). Publisher Copyright: © 2013 IEEE.

PY - 2020/6

Y1 - 2020/6

N2 - Robotic prostheses deliver greater function than passive prostheses, but we face the challenge of tuning a large number of control parameters in order to personalize the device for individual amputee users. This problem is not easily solved by traditional control designs or the latest robotic technology. Reinforcement learning (RL) is naturally appealing. The recent, unprecedented success of AlphaZero demonstrated RL as a feasible, large-scale problem solver. However, the prosthesis-tuning problem is associated with several unaddressed issues such as that it does not have a known and stable model, the continuous states and controls of the problem may result in a curse of dimensionality, and the human-prosthesis system is constantly subject to measurement noise, environmental change and human-body-caused variations. In this paper, we demonstrated the feasibility of direct heuristic dynamic programming, an approximate dynamic programming (ADP) approach, to automatically tune the 12 robotic knee prosthesis parameters to meet individual human users' needs. We tested the ADP-tuner on two subjects (one able-bodied subject and one amputee subject) walking at a fixed speed on a treadmill. The ADP-tuner learned to reach target gait kinematics in an average of 300 gait cycles or 10 min of walking. We observed improved ADP tuning performance when we transferred a previously learned ADP controller to a new learning session with the same subject. To the best of our knowledge, our approach to personalize robotic prostheses is the first implementation of online ADP learning control to a clinical problem involving human subjects.

KW - Approximate dynamic programming (ADP)

KW - direct heuristic dynamic programming (dHDP)

KW - reinforcement learning (RL)

KW - robotic knee prosthesis

UR - http://www.scopus.com/inward/record.url?scp=85084695560&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85084695560&partnerID=8YFLogxK

U2 - 10.1109/TCYB.2019.2890974

DO - 10.1109/TCYB.2019.2890974

M3 - Article

C2 - 30668514

AN - SCOPUS:85084695560

SN - 2168-2267

VL - 50

SP - 2346

EP - 2356

JO - IEEE Transactions on Cybernetics

JF - IEEE Transactions on Cybernetics

IS - 6

M1 - 8613842

ER -
