TY - JOUR
T1 - HiFlash
T2 - Communication-Efficient Hierarchical Federated Learning With Adaptive Staleness Control and Heterogeneity-Aware Client-Edge Association
AU - Wu, Qiong
AU - Chen, Xu
AU - Ouyang, Tao
AU - Zhou, Zhi
AU - Zhang, Xiaoxi
AU - Yang, Shusen
AU - Zhang, Junshan
N1 - Funding Information:
This work was supported in part by the National Science Foundation of China under Grants U20A20159 and 61972432; in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2021B151520008; and in part by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant 2017ZT07X355.
Publisher Copyright:
© 1990-2012 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - Federated learning (FL) is a promising paradigm that enables collaborative learning of a shared model across massive clients while keeping the training data local. However, in many existing FL systems, clients must frequently exchange large model parameters with a remote cloud server directly over wide-area networks (WAN), incurring significant communication overhead and long transmission times. To mitigate this communication bottleneck, we resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing and combines synchronous client-edge model aggregation with asynchronous edge-cloud model aggregation to greatly reduce the traffic volume of WAN transmissions. Specifically, we first analyze the convergence bound of HiFL theoretically and identify the key controllable factors for improving model performance. We then advocate an enhanced design, HiFlash, which innovatively integrates deep reinforcement learning-based adaptive staleness control and a heterogeneity-aware client-edge association strategy to boost system efficiency and mitigate the staleness effect without compromising model accuracy. Extensive experiments corroborate the superior performance of HiFlash in model accuracy, communication reduction, and system efficiency.
AB - Federated learning (FL) is a promising paradigm that enables collaborative learning of a shared model across massive clients while keeping the training data local. However, in many existing FL systems, clients must frequently exchange large model parameters with a remote cloud server directly over wide-area networks (WAN), incurring significant communication overhead and long transmission times. To mitigate this communication bottleneck, we resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing and combines synchronous client-edge model aggregation with asynchronous edge-cloud model aggregation to greatly reduce the traffic volume of WAN transmissions. Specifically, we first analyze the convergence bound of HiFL theoretically and identify the key controllable factors for improving model performance. We then advocate an enhanced design, HiFlash, which innovatively integrates deep reinforcement learning-based adaptive staleness control and a heterogeneity-aware client-edge association strategy to boost system efficiency and mitigate the staleness effect without compromising model accuracy. Extensive experiments corroborate the superior performance of HiFlash in model accuracy, communication reduction, and system efficiency.
KW - Client-edge association
KW - federated learning
KW - hierarchical mechanism
KW - staleness control
UR - http://www.scopus.com/inward/record.url?scp=85147300592&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147300592&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2023.3238049
DO - 10.1109/TPDS.2023.3238049
M3 - Article
AN - SCOPUS:85147300592
SN - 1045-9219
VL - 34
SP - 1560
EP - 1579
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 5
ER -