TY - GEN
T1 - A lightweight run-time scheduler for multitasking multicore stream applications
AU - Baker, Michael A.
AU - Chatha, Karam S.
PY - 2010/12/1
Y1 - 2010/12/1
N2 - Stream programming models promise dramatic improvements in developers' ability to express parallelism in their applications while enabling extremely efficient implementations on modern many-core processors. Unfortunately, the wide variation in the architectural features of available multi-core processors implies that a single compiler may be incapable of generating general solutions that can run on many target systems, or even on different configurations of the same system. In particular, off-line approaches for finding optimal mappings and schedules for a stream program on a specific processor are limited by their lack of portability across different processors, and by a lack of flexibility for run-time variations in resource availability in typical multi-tasking environments. The paper presents a scheme that includes a lightweight compile-time sequencer and a dynamic scheduler capable of mapping stream programs onto available cores in a multi-core processor at run-time. Unlike previous implementations, our scheme requires limited knowledge of the target architecture's resources at compile time. The off-line portion of the scheme generates canonical scheduling information about the stream program. This information is utilized by the lightweight run-time scheduling algorithm to generate application mappings in linear time based on available resources, giving near-optimal throughput. Evaluations of schedules generated for twelve streaming benchmarks give an average of 96% and 93% of the theoretical optimum throughput for schedules with up to 4 and 128 cores, respectively.
AB - Stream programming models promise dramatic improvements in developers' ability to express parallelism in their applications while enabling extremely efficient implementations on modern many-core processors. Unfortunately, the wide variation in the architectural features of available multi-core processors implies that a single compiler may be incapable of generating general solutions that can run on many target systems, or even on different configurations of the same system. In particular, off-line approaches for finding optimal mappings and schedules for a stream program on a specific processor are limited by their lack of portability across different processors, and by a lack of flexibility for run-time variations in resource availability in typical multi-tasking environments. The paper presents a scheme that includes a lightweight compile-time sequencer and a dynamic scheduler capable of mapping stream programs onto available cores in a multi-core processor at run-time. Unlike previous implementations, our scheme requires limited knowledge of the target architecture's resources at compile time. The off-line portion of the scheme generates canonical scheduling information about the stream program. This information is utilized by the lightweight run-time scheduling algorithm to generate application mappings in linear time based on available resources, giving near-optimal throughput. Evaluations of schedules generated for twelve streaming benchmarks give an average of 96% and 93% of the theoretical optimum throughput for schedules with up to 4 and 128 cores, respectively.
UR - http://www.scopus.com/inward/record.url?scp=78650728642&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650728642&partnerID=8YFLogxK
U2 - 10.1109/ICCD.2010.5647732
DO - 10.1109/ICCD.2010.5647732
M3 - Conference contribution
AN - SCOPUS:78650728642
SN - 9781424489350
T3 - Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors
SP - 297
EP - 304
BT - 2010 IEEE International Conference on Computer Design, ICCD 2010
T2 - 28th IEEE International Conference on Computer Design, ICCD 2010
Y2 - 3 October 2010 through 6 October 2010
ER -