DA3: Dynamic Additive Attention Adaption for Memory-Efficient On-Device Multi-Domain Learning

Li Yang; Adnan Siraj Rakin; Deliang Fan

doi:10.1109/CVPRW56347.2022.00295

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

One practical limitation of deep neural networks (DNNs) is their high degree of specialization to a single task or domain (e.g., one visual domain). This motivates researchers to develop algorithms that can adapt a DNN model to multiple domains sequentially while still performing well on past domains, a setting known as multi-domain learning. Almost all conventional methods focus only on improving accuracy with minimal parameter updates and ignore the high computing and memory cost of training, which makes it difficult to deploy multi-domain learning on increasingly common resource-limited edge devices such as mobile phones, IoT devices, and embedded systems. In our study of the multi-domain training process, we observe that the large memory used for activation storage is the bottleneck that largely limits training time and cost on edge devices. To reduce training memory usage while maintaining domain adaption accuracy, we propose Dynamic Additive Attention Adaption (DA3), a novel memory-efficient on-device multi-domain learning method. For each domain, DA3 learns a novel additive attention adaptor module while freezing the weights of the pre-trained backbone model. Unlike prior works, the proposed DA3 module not only mitigates activation memory buffering to reduce memory usage during training, but also serves as a dynamic gating mechanism that reduces computation cost for fast inference. We validate DA3 on multiple datasets against state-of-the-art methods and observe substantial improvements in both accuracy and training time. Moreover, we deploy DA3 on the popular NVIDIA Jetson Nano edge GPU, where measured results show that DA3 reduces on-device training memory consumption by 5-37× and training time by 2× in comparison to the baseline methods (e.g., standard fine-tuning, Parallel and Series Res. adaptor, Piggyback, and TinyTL).
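
The abstract's point about activation memory can be illustrated with a toy scalar layer. The sketch below is not the authors' implementation: it assumes a single frozen weight `w` and contrasts a multiplicative adaptation (a trainable mask on the weight) with an additive one (a trainable term added to the output). All function and variable names are hypothetical. It shows only the general calculus fact that the gradient of a multiplicative parameter depends on the layer input, which therefore has to be buffered for the backward pass, while the gradient of an additive parameter does not.

```python
# Toy scalar "layer" with a frozen backbone weight w.
# All names here are hypothetical and chosen for illustration only.


def forward_multiplicative(w, mask, x):
    """Multiplicative adaptation: a trainable mask scales the frozen weight."""
    return (w * mask) * x


def forward_additive(w, shift, x):
    """Additive adaptation: a trainable term is added to the frozen layer's output."""
    return w * x + shift


def grad_wrt_mask(w, x, grad_y):
    """dL/dmask = dL/dy * w * x, so the input activation x must be kept for backward."""
    return grad_y * w * x


def grad_wrt_shift(grad_y):
    """dL/dshift = dL/dy, so the input activation x is not needed for backward."""
    return grad_y


if __name__ == "__main__":
    w, grad_y = 2.0, 0.5
    for x in (1.0, 3.0):
        # The mask gradient changes with x; the shift gradient does not.
        print(x, grad_wrt_mask(w, x, grad_y), grad_wrt_shift(grad_y))
```

In this toy setting the multiplicative gradient varies with the input while the additive one is constant. A real adaptor module would have its own internal activations, so the saving in practice depends on the module's design and is not captured by this scalar example.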

Original language	English (US)
Title of host publication	Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
Publisher	IEEE Computer Society
Pages	2618-2626
Number of pages	9
ISBN (Electronic)	9781665487399
DOIs	https://doi.org/10.1109/CVPRW56347.2022.00295
State	Published - 2022
Event	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022 - New Orleans, United States Duration: Jun 19 2022 → Jun 20 2022

Publication series

Name	IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume	2022-June
ISSN (Print)	2160-7508
ISSN (Electronic)	2160-7516

Conference

Conference	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
Country/Territory	United States
City	New Orleans
Period	6/19/22 → 6/20/22

ASJC Scopus subject areas

Computer Vision and Pattern Recognition
Electrical and Electronic Engineering

Access to Document

10.1109/CVPRW56347.2022.00295

Cite this

Yang, L., Rakin, A. S., & Fan, D. (2022). DA³: Dynamic Additive Attention Adaption for Memory-Efficient On-Device Multi-Domain Learning. In Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022 (pp. 2618-2626). (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2022-June). IEEE Computer Society. https://doi.org/10.1109/CVPRW56347.2022.00295

@inproceedings{ffc67a47bb3f4885bbce6578b1beff83,

title = "DA3: Dynamic Additive Attention Adaption for Memory-Efficient On-Device Multi-Domain Learning",

author = "Li Yang and Rakin, {Adnan Siraj} and Deliang Fan",

note = "Funding Information: This work is supported in part by the National Science Foundation under Grant No. 1931871 and No. 2144751. Publisher Copyright: {\textcopyright} 2022 IEEE.; 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022 ; Conference date: 19-06-2022 Through 20-06-2022",

year = "2022",

doi = "10.1109/CVPRW56347.2022.00295",

language = "English (US)",

series = "IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops",

publisher = "IEEE Computer Society",

pages = "2618--2626",

booktitle = "Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022",

}

TY - GEN

T1 - DA3

T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022

AU - Yang, Li

AU - Rakin, Adnan Siraj

AU - Fan, Deliang

PY - 2022

Y1 - 2022

UR - http://www.scopus.com/inward/record.url?scp=85137750081&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85137750081&partnerID=8YFLogxK

U2 - 10.1109/CVPRW56347.2022.00295

DO - 10.1109/CVPRW56347.2022.00295

M3 - Conference contribution

AN - SCOPUS:85137750081

T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

SP - 2618

EP - 2626

BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022

PB - IEEE Computer Society

Y2 - 19 June 2022 through 20 June 2022

ER -