TY - GEN
T1 - RecPipe
T2 - 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2021
AU - Gupta, Udit
AU - Hsia, Samuel
AU - Zhang, Jeff Jun
AU - Wilkening, Mark
AU - Pombra, Javin
AU - Lee, Hsien-Hsin S.
AU - Wei, Gu-Yeon
AU - Wu, Carole-Jean
AU - Brooks, David
N1 - Funding Information:
We would like to thank the anonymous reviewers and the artifact evaluation committee for providing valuable feedback that improved this work and the corresponding artifact. The academic authors of this work, at Harvard University, were also supported in part by the NSF Graduate Research Fellowship, the SRC JUMP Applications Driven Architecture (ADA) center, and Intel Corporation.
Publisher Copyright:
© 2021 Association for Computing Machinery.
PY - 2021/10/18
Y1 - 2021/10/18
N2 - Deep learning recommendation systems must provide high-quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system that jointly optimizes recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler that maps multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While hardware-aware scheduling improves ranking efficiency, commodity platforms suffer from limitations that motivate specialized hardware. Thus, we design RecPipeAccel (RPAccel), a custom accelerator that jointly optimizes quality, tail latency, and system throughput. RPAccel is designed specifically to exploit the distinct design space opened by RecPipe. In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, and implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic array. Compared to previously proposed specialized recommendation accelerators, and at iso-quality, we demonstrate that RPAccel improves latency and throughput by 3× and 6×, respectively.
AB - Deep learning recommendation systems must provide high-quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system that jointly optimizes recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler that maps multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While hardware-aware scheduling improves ranking efficiency, commodity platforms suffer from limitations that motivate specialized hardware. Thus, we design RecPipeAccel (RPAccel), a custom accelerator that jointly optimizes quality, tail latency, and system throughput. RPAccel is designed specifically to exploit the distinct design space opened by RecPipe. In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, and implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic array. Compared to previously proposed specialized recommendation accelerators, and at iso-quality, we demonstrate that RPAccel improves latency and throughput by 3× and 6×, respectively.
KW - Datacenter
KW - Deep Learning
KW - Hardware accelerator
KW - Personalized recommendation
UR - http://www.scopus.com/inward/record.url?scp=85118866145&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118866145&partnerID=8YFLogxK
U2 - 10.1145/3466752.3480127
DO - 10.1145/3466752.3480127
M3 - Conference contribution
AN - SCOPUS:85118866145
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 870
EP - 884
BT - MICRO 2021 - 54th Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings
PB - IEEE Computer Society
Y2 - 18 October 2021 through 22 October 2021
ER -