TY - GEN
T1 - Improved knowledge distillation via teacher assistant
AU - Mirzadeh, Seyed Iman
AU - Farajtabar, Mehrdad
AU - Li, Ang
AU - Levine, Nir
AU - Matsukawa, Akihiro
AU - Ghasemzadeh, Hassan
N1 - Funding Information:
Authors Mirzadeh and Ghasemzadeh were supported in part through grant CNS-1750679 from the United States National Science Foundation. The authors would like to thank Luke Metz, Rohan Anil, Sepehr Sameni, Hooman Shahrokhi, Janardhan Rao Doppa, and Hung Bui for their review and feedback.
Publisher Copyright:
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2020
Y1 - 2020
AB - Although deep neural networks are powerful models that achieve appealing results on many tasks, they are too large to be deployed on edge devices such as smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, in which a large pre-trained network (teacher) is used to train a smaller network (student). However, in this paper we show that the student network's performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher; in other words, a teacher can effectively transfer its knowledge to students only down to a certain size, not smaller. To alleviate this shortcoming, we introduce knowledge distillation via a teacher assistant, which employs an intermediate-sized network (teacher assistant) to bridge the gap between the student and the teacher. Moreover, we study the effect of teacher assistant size and extend the framework to multi-step distillation with a chain of teacher assistants. Theoretical analysis and extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets, with CNN and ResNet architectures, substantiate the effectiveness of the proposed approach.
UR - http://www.scopus.com/inward/record.url?scp=85106586399&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106586399&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85106586399
T3 - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
SP - 5191
EP - 5198
BT - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PB - AAAI Press
T2 - 34th AAAI Conference on Artificial Intelligence, AAAI 2020
Y2 - 7 February 2020 through 12 February 2020
ER -