In recent years, stream applications have become prevalent in many embedded domains. Stream applications distinguish themselves from traditional sequential programs through well-defined independent actors, explicit data communication, and stable code/data access patterns. To achieve high performance and low power, scratch pad memory (SPM) has been introduced in today's embedded multicore processors. Programming SPM-based architectures, however, is both challenging and time-consuming. In this paper, we address the problem of automatically compiling stream applications onto SPM-based embedded multicore processors through unrolling and retiming. In our technique, code overlay and data overlay are implemented to overcome the limited SPM capacity, and smart double buffering and code prefetching are introduced to amortize memory access delays. We evaluate the efficiency of our technique by compiling several stream applications onto the IBM Cell processor and comparing their performance with that achieved by existing approaches.