TY - JOUR
T1 - Retargetable pipeline hazard detection for partially bypassed processors
AU - Shrivastava, Aviral
AU - Earlie, Eugene
AU - Dutt, Nikil D.
AU - Nicolau, Alex
N1 - Funding Information: Manuscript received June 28, 2005; revised January 4, 2006. This work was funded in part by grants from Intel Corporation, UC Micro (03-028), and by the Semiconductor Research Corporation under Contract 2003-HJ-1111.
PY - 2006/8
Y1 - 2006/8
N2 - Register bypassing is a widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, it has a significant impact on the cycle time, area, and power consumption of the processor. Owing to the strict design constraints on the performance, cost, and power consumption of embedded processor systems, architects seek a compromise between these design parameters by implementing partial bypassing in processors. However, partial bypassing presents challenges for compilation. Traditional data hazard detection and/or avoidance techniques used in retargetable compilers that assume a constant value of operation latency break down in the presence of partial bypassing. In this article, we present the concept of operation tables (OTs) that can be used to accurately detect data hazards, even in the presence of incomplete bypassing. OTs integrate the detection of all kinds of pipeline hazards in a unified framework and can therefore be easily deployed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor running embedded applications from the MiBench suite demonstrate that accurate pipeline hazard detection by OTs can result in up to 20% performance improvement over the best-performing GCC-generated code. Finally, we demonstrate the usefulness of OTs over various bypass configurations of the Intel XScale.
KW - Bypasses
KW - Forwarding path
KW - Operation table
KW - Partial bypassing
KW - Partially bypassed processor
KW - Pipeline hazard detection
KW - Processor pipeline
UR - http://www.scopus.com/inward/record.url?scp=33747448287&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33747448287&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2006.878468
DO - 10.1109/TVLSI.2006.878468
M3 - Article
AN - SCOPUS:33747448287
SN - 1063-8210
VL - 14
SP - 791
EP - 801
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 8
M1 - 1664901
ER -