TY - JOUR
T1 - HiFlash
T2 - Communication-Efficient Hierarchical Federated Learning With Adaptive Staleness Control and Heterogeneity-Aware Client-Edge Association
AU - Wu, Qiong
AU - Chen, Xu
AU - Ouyang, Tao
AU - Zhou, Zhi
AU - Zhang, Xiaoxi
AU - Yang, Shusen
AU - Zhang, Junshan
N1 - Funding Information:
This work was supported in part by the National Science Foundation of China under Grants U20A20159 and 61972432; in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2021B151520008; and in part by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant 2017ZT07X355.
Publisher Copyright:
© 1990-2012 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - Federated learning (FL) is a promising paradigm that enables collaborative learning of a shared model across massive clients while keeping the training data local. However, in many existing FL systems, clients must frequently exchange large model parameters with a remote cloud server directly over wide-area networks (WAN), incurring significant communication overhead and long transmission times. To mitigate this communication bottleneck, we resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing and combines synchronous client-edge model aggregation with asynchronous edge-cloud model aggregation to greatly reduce the traffic volume of WAN transmissions. Specifically, we first analyze the convergence bound of HiFL theoretically and identify the key controllable factors for improving model performance. We then advocate an enhanced design, HiFlash, which innovatively integrates deep reinforcement learning-based adaptive staleness control and a heterogeneity-aware client-edge association strategy to boost system efficiency and mitigate the staleness effect without compromising model accuracy. Extensive experiments corroborate the superior performance of HiFlash in model accuracy, communication reduction, and system efficiency.
AB - Federated learning (FL) is a promising paradigm that enables collaborative learning of a shared model across massive clients while keeping the training data local. However, in many existing FL systems, clients must frequently exchange large model parameters with a remote cloud server directly over wide-area networks (WAN), incurring significant communication overhead and long transmission times. To mitigate this communication bottleneck, we resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing and combines synchronous client-edge model aggregation with asynchronous edge-cloud model aggregation to greatly reduce the traffic volume of WAN transmissions. Specifically, we first analyze the convergence bound of HiFL theoretically and identify the key controllable factors for improving model performance. We then advocate an enhanced design, HiFlash, which innovatively integrates deep reinforcement learning-based adaptive staleness control and a heterogeneity-aware client-edge association strategy to boost system efficiency and mitigate the staleness effect without compromising model accuracy. Extensive experiments corroborate the superior performance of HiFlash in model accuracy, communication reduction, and system efficiency.
KW - Client-edge association
KW - federated learning
KW - hierarchical mechanism
KW - staleness control
UR - http://www.scopus.com/inward/record.url?scp=85147300592&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147300592&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2023.3238049
DO - 10.1109/TPDS.2023.3238049
M3 - Article
AN - SCOPUS:85147300592
SN - 1045-9219
VL - 34
SP - 1560
EP - 1579
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 5
ER -