TY - JOUR
T1 - Online Reinforcement Learning Control for the Personalization of a Robotic Knee Prosthesis
AU - Wen, Yue
AU - Si, Jennie
AU - Brandt, Andrea
AU - Gao, Xiang
AU - Huang, He Helen
N1 - Funding Information:
Manuscript received May 1, 2018; revised September 26, 2018; accepted December 19, 2018. Date of publication January 16, 2019; date of current version May 7, 2020. This work was supported in part by the National Science Foundation under Grant 1563454, Grant 1563921, Grant 1808752, and Grant 1808898. This paper was recommended by Associate Editor H. Zhang. (Corresponding authors: He (Helen) Huang; Jennie Si.) Y. Wen, A. Brandt, and H. Huang are with the UNC/NCSU Joint Department of Biomedical Engineering, North Carolina State University, Raleigh, NC 27695 USA, and the University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA (e-mail: hhuang11@ncsu.edu).
Publisher Copyright:
© 2013 IEEE.
PY - 2020/6
Y1 - 2020/6
N2 - Robotic prostheses deliver greater function than passive prostheses, but we face the challenge of tuning a large number of control parameters in order to personalize the device for individual amputee users. This problem is not easily solved by traditional control designs or the latest robotic technology. Reinforcement learning (RL) is naturally appealing. The recent, unprecedented success of AlphaZero demonstrated RL as a feasible, large-scale problem solver. However, the prosthesis-tuning problem is associated with several unaddressed issues such as that it does not have a known and stable model, the continuous states and controls of the problem may result in a curse of dimensionality, and the human-prosthesis system is constantly subject to measurement noise, environmental change and human-body-caused variations. In this paper, we demonstrated the feasibility of direct heuristic dynamic programming, an approximate dynamic programming (ADP) approach, to automatically tune the 12 robotic knee prosthesis parameters to meet individual human users' needs. We tested the ADP-tuner on two subjects (one able-bodied subject and one amputee subject) walking at a fixed speed on a treadmill. The ADP-tuner learned to reach target gait kinematics in an average of 300 gait cycles or 10 min of walking. We observed improved ADP tuning performance when we transferred a previously learned ADP controller to a new learning session with the same subject. To the best of our knowledge, our approach to personalize robotic prostheses is the first implementation of online ADP learning control to a clinical problem involving human subjects.
KW - Approximate dynamic programming (ADP)
KW - direct heuristic dynamic programming (dHDP)
KW - reinforcement learning (RL)
KW - robotic knee prosthesis
UR - http://www.scopus.com/inward/record.url?scp=85084695560&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084695560&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2019.2890974
DO - 10.1109/TCYB.2019.2890974
M3 - Article
C2 - 30668514
AN - SCOPUS:85084695560
SN - 2168-2267
VL - 50
SP - 2346
EP - 2356
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 6
M1 - 8613842
ER -