TY - GEN
T1 - Branch penalty reduction on IBM Cell SPUs via software branch hinting
AU - Lu, Jing
AU - Kim, Yooseong
AU - Shrivastava, Aviral
AU - Huang, Chuan
PY - 2011
Y1 - 2011
N2 - As power efficiency becomes a paramount concern in processor design, architectures are emerging that do away with hardware branch prediction entirely and rely solely on software branch hinting. A popular example is the Synergistic Processing Unit (SPU) in the IBM Cell processor. To minimize the branch penalty using branch hint instructions, it is important not only to estimate branch probabilities (which has been studied before [6, 25, 24]) but also to insert branch hints carefully. To this end, we i) construct a branch penalty model for the compiler, ii) formulate the problem of minimizing branch penalty using branch hinting, and iii) propose a heuristic to solve this problem. The heuristic is based on three basic techniques that we introduce in this paper: NOP padding, hint pipelining, and nested loop restructuring. Experimental results on several benchmarks show that our solution can reduce the branch penalty by as much as 35.4% over the previous approach.
KW - Branch hint
KW - Cell processor
KW - Compiler optimization
UR - http://www.scopus.com/inward/record.url?scp=81355124046&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=81355124046&partnerID=8YFLogxK
U2 - 10.1145/2039370.2039425
DO - 10.1145/2039370.2039425
M3 - Conference contribution
AN - SCOPUS:81355124046
SN - 9781450307154
T3 - Embedded Systems Week 2011, ESWEEK 2011 - Proceedings of the 9th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS'11
SP - 355
EP - 364
BT - Embedded Systems Week 2011, ESWEEK 2011 - Proceedings of the 9th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS'11
T2 - Embedded Systems Week 2011, ESWEEK 2011 - 9th IEEE/ACM International Conference on Hardware/Software-Codesign and System Synthesis, CODES+ISSS'11
Y2 - 9 October 2011 through 14 October 2011
ER -