TY - GEN
T1 - DeepRecSys
T2 - 47th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2020
AU - Gupta, Udit
AU - Hsia, Samuel
AU - Saraph, Vikram
AU - Wang, Xiaodong
AU - Reagen, Brandon
AU - Wei, Gu Yeon
AU - Lee, Hsien Hsin S.
AU - Brooks, David
AU - Wu, Carole Jean
N1 - Funding Information:
IX. ACKNOWLEDGEMENTS We would like to thank Cong Chen and Ashish Shenoy for the valuable feedback and numerous discussions on the at-scale execution of personalized recommendation systems in Facebook’s datacenter fleets. The collaboration led to insights which we used to refine the proposed design presented in this paper. It also resulted in design implementation, testing, and evaluationof the proposed idea for production use cases. This work is also supported in part by NSF XPS-1533737 and an NSF Graduate Research Fellowship.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting significant compute demand of cloud infrastructure. Thus, improving the execution efficiency of recommendation directly translates into infrastructure capacity saving. In this paper, we propose DeepRecSched, a recommendation inference scheduler that maximizes latency-bounded throughput by taking into account characteristics of inference query size and arrival patterns, model architectures, and underlying hardware systems. By carefully optimizing task versus data-level parallelism, DeepRecSched improves system throughput on server class CPUs by 2 × across eight industry-representative models. Next, we deploy and evaluate this optimization in an at-scale production datacenter which reduces end-to-end tail latency across a wide variety of recommendation models by 30%. Finally, DeepRecSched demonstrates the role and impact of specialized AI hardware in optimizing system level performance (QPS) and power efficiency (QPS/watt) of recommendation inference. In order to enable the design space exploration of customized recommendation systems shown in this paper, we design and validate an end-to-end modeling infrastructure, DeepRecInfra. DeepRecInfra enables studies over a variety of recommendation use cases, taking into account at-scale effects, such as query arrival patterns and recommendation query sizes, observed from a production datacenter, as well as industry-representative models and tail latency targets.
AB - Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting significant compute demand of cloud infrastructure. Thus, improving the execution efficiency of recommendation directly translates into infrastructure capacity saving. In this paper, we propose DeepRecSched, a recommendation inference scheduler that maximizes latency-bounded throughput by taking into account characteristics of inference query size and arrival patterns, model architectures, and underlying hardware systems. By carefully optimizing task versus data-level parallelism, DeepRecSched improves system throughput on server class CPUs by 2 × across eight industry-representative models. Next, we deploy and evaluate this optimization in an at-scale production datacenter which reduces end-to-end tail latency across a wide variety of recommendation models by 30%. Finally, DeepRecSched demonstrates the role and impact of specialized AI hardware in optimizing system level performance (QPS) and power efficiency (QPS/watt) of recommendation inference. In order to enable the design space exploration of customized recommendation systems shown in this paper, we design and validate an end-to-end modeling infrastructure, DeepRecInfra. DeepRecInfra enables studies over a variety of recommendation use cases, taking into account at-scale effects, such as query arrival patterns and recommendation query sizes, observed from a production datacenter, as well as industry-representative models and tail latency targets.
UR - http://www.scopus.com/inward/record.url?scp=85091974412&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091974412&partnerID=8YFLogxK
U2 - 10.1109/ISCA45697.2020.00084
DO - 10.1109/ISCA45697.2020.00084
M3 - Conference contribution
AN - SCOPUS:85091974412
T3 - Proceedings - International Symposium on Computer Architecture
SP - 982
EP - 995
BT - Proceedings - 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, ISCA 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 May 2020 through 3 June 2020
ER -