Abstract
State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in such feature-rich frameworks, however, involves a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This article takes a deep dive into the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that practitioners can use to achieve substantially higher training and inference speedups. Across a diverse set of real-world deep learning models, the evaluation shows that the proposed performance tuning guidelines outperform the Intel- and TensorFlow-recommended settings by 1.30× and 1.38×, respectively.
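The parallelism knobs the abstract alludes to are, in TensorFlow's case, the intra-op and inter-op thread-pool sizes and (for oneDNN/MKL builds) the OpenMP environment variables that Intel's guidance tunes. As a minimal sketch of how such settings are applied, not the article's derived guidelines, the thread counts below are placeholder assumptions that would normally come from profiling:

```python
import os

# Intel's published guidance sets OpenMP variables before TensorFlow is
# imported; they take effect only in oneDNN/MKL-enabled builds. The
# values here are illustrative assumptions, not the article's results.
os.environ["OMP_NUM_THREADS"] = "16"    # assumed physical-core count
os.environ["KMP_BLOCKTIME"] = "1"       # ms a thread spins before sleeping
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

import tensorflow as tf

# TensorFlow's two parallelism knobs: intra-op threads parallelize a
# single operator (e.g., one matmul); inter-op threads run independent
# operators concurrently. Both must be set before any op executes.
tf.config.threading.set_intra_op_parallelism_threads(16)  # assumption
tf.config.threading.set_inter_op_parallelism_threads(2)   # assumption

print(tf.config.threading.get_intra_op_parallelism_threads())
print(tf.config.threading.get_inter_op_parallelism_threads())
```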
| Original language | English (US) |
| --- | --- |
| Article number | 9 |
| Journal | ACM Transactions on Architecture and Code Optimization |
| Volume | 18 |
| Issue number | 1 |
| DOIs | |
| State | Published - Jan 2021 |
| Externally published | Yes |
Keywords
- Machine learning frameworks
- parallel computing
- performance analysis
ASJC Scopus subject areas
- Software
- Information Systems
- Hardware and Architecture