Although recent resistive random access memory (ReRAM)-based accelerators for convolutional neural networks (CNNs) achieve superior timing performance and area efficiency compared with CMOS-based accelerators, they have high energy consumption due to low inter-layer data reuse. In this work, we propose a multi-tile ReRAM accelerator that supports multiple CNN topologies, where each tile processes one or more layers in a pipelined fashion. Building on the fact that a filter with a large receptive field can be replaced by a stack of smaller (3×3) filters, we design every tile with 9 processing elements that operate in a systolic fashion. The systolic dataflow maximizes input feature map reuse and minimizes interconnection cost. We show that 1-bit weights and 4-bit activations achieve good accuracy for both AlexNet and VGGNet, and we design our ReRAM-based accelerator to support this configuration. System-level simulation results at the 32 nm node show that the proposed architecture for AlexNet with stacked small filters can achieve a computation efficiency of 8.42 TOPs/s/mm², an energy efficiency of 4.08 TOPs/s/W, and a storage efficiency of 0.18 MB/mm² for inference of one image from the CIFAR-100 dataset.
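The design premise that a large filter can be replaced by stacked 3×3 filters rests on simple receptive-field arithmetic. The sketch below illustrates it under stated assumptions: every layer is a stride-1, undilated convolution, stride and padding of the original large filter (e.g. AlexNet's stride-4 11×11 first layer) are ignored, and each stacked layer keeps the same channel count. The function names are hypothetical helpers for illustration, not part of the proposed accelerator.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of num_layers stacked stride-1, undilated
    kernel x kernel convolutions: each extra layer adds (kernel - 1)."""
    return 1 + num_layers * (kernel - 1)


def layers_needed(target_rf: int, kernel: int = 3) -> int:
    """Smallest number of stacked stride-1 kernel x kernel layers whose
    receptive field is at least target_rf (integer ceiling division)."""
    return -(-(target_rf - 1) // (kernel - 1))


def weights_per_channel_pair(kernel: int, num_layers: int = 1) -> int:
    """Weights per (input channel, output channel) pair, assuming (hypothetically)
    that every stacked layer keeps the same number of channels."""
    return num_layers * kernel * kernel


if __name__ == "__main__":
    for large in (5, 7, 11):
        n = layers_needed(large)
        print(f"{large}x{large}: {n} stacked 3x3 layers, "
              f"RF={stacked_receptive_field(n)}, "
              f"weights {weights_per_channel_pair(large)} -> "
              f"{weights_per_channel_pair(3, n)}")
```

Running it shows, for example, that an 11×11 receptive field needs five stacked 3×3 layers using 45 weights per channel pair instead of 121. Because every stacked layer is 3×3, a tile built from 9 processing elements (one per filter tap) can serve all of them, which is consistent with the uniform tile design described above.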