RecPipe: Co-designing models and hardware to jointly optimize recommendation quality and performance

Udit Gupta, Samuel Hsia, Jeff Jun Zhang, Mark Wilkening, Javin Pombra, Hsien-Hsin S. Lee, Gu-Yeon Wei, Carole-Jean Wu, David Brooks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Deep learning recommendation systems must provide high-quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware scheduling improves ranking efficiency, the commodity platforms suffer from many limitations that call for specialized hardware. Thus, we design RecPipeAccel (RPAccel), a custom accelerator that jointly optimizes quality, tail-latency, and system throughput. RPAccel is designed specifically to exploit the distinct design space opened via RecPipe. In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, and it implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic array. Compared to previously proposed specialized recommendation accelerators and at iso-quality, we demonstrate that RPAccel improves latency and throughput by 3× and 6×, respectively.
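
The multi-stage idea in the abstract can be illustrated with a minimal sketch: a cheap front-end stage scores every candidate, a top-k filter keeps only the best few, and a more expensive back-end stage re-ranks the survivors. This is not RecPipe's implementation. The names multi_stage_rank, top_k, cheap_score, and heavy_score, the toy scoring functions, and the stage sizes are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Scorer = Callable[[int], float]

# Counts how many items each (hypothetical) model had to score.
calls: Counter = Counter()


def top_k(items: Iterable[int], scorer: Scorer, k: int) -> List[int]:
    """Score every item once and return the k highest-scoring, best first."""
    scored = [(scorer(item), item) for item in items]
    return [item for _, item in heapq.nlargest(k, scored)]


def multi_stage_rank(candidates: Iterable[int],
                     stages: List[Tuple[Scorer, int]]) -> List[int]:
    """Run the candidates through each (scorer, k) stage in order.

    Each stage only sees the items that survived the previous stage's
    top-k filter, so later (costlier) scorers run on fewer items.
    """
    survivors = list(candidates)
    for scorer, k in stages:
        survivors = top_k(survivors, scorer, k)
    return survivors


def cheap_score(item: int) -> float:
    """Toy stand-in for a small, fast front-end model (assumption)."""
    calls["cheap"] += 1
    return -abs(item - 50.4)


def heavy_score(item: int) -> float:
    """Toy stand-in for a large, accurate back-end model (assumption)."""
    calls["heavy"] += 1
    return -abs(item - 47.2)


# 100 candidate item IDs; keep 10 after the cheap stage, 3 after the heavy one.
result = multi_stage_rank(range(100), [(cheap_score, 10), (heavy_score, 3)])
print(result, dict(calls))
```

In this toy setup the heavy scorer runs on only the 10 items that pass the first filter instead of all 100, which is the kind of compute reduction the abstract attributes to multi-stage pipelines. Whether quality is preserved depends on how well the front-end stage approximates the back-end ranking.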

Original language: English (US)
Title of host publication: MICRO 2021 - 54th Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings
Publisher: IEEE Computer Society
Pages: 870-884
Number of pages: 15
ISBN (Electronic): 9781450385572
State: Published - Oct 18 2021
Externally published: Yes
Event: 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2021 - Virtual, Online, Greece
Duration: Oct 18 2021 → Oct 22 2021

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
ISSN (Print): 1072-4451

Conference

Conference: 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2021
Country/Territory: Greece
City: Virtual, Online
Period: 10/18/21 → 10/22/21

Keywords

  • Datacenter
  • Deep Learning
  • Hardware accelerator
  • Personalized recommendation

ASJC Scopus subject areas

  • Hardware and Architecture
