Retargetable pipeline hazard detection for partially bypassed processors

Aviral Shrivastava, Eugene Earlie, Nikil D. Dutt, Alex Nicolau

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Register bypassing is a widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, it has a significant impact on the cycle time, area, and power consumption of the processor. Owing to the strict design constraints on the performance, cost, and power consumption of embedded processor systems, architects seek a compromise between the design parameters by implementing partial bypassing in processors. However, partial bypassing presents challenges for compilation. Traditional data hazard detection and/or avoidance techniques used in retargetable compilers, which assume a constant value of operation latency, break down in the presence of partial bypassing. In this article, we present the concept of operation tables (OTs) that can be used to accurately detect data hazards, even in the presence of incomplete bypassing. OTs integrate the detection of all kinds of pipeline hazards in a unified framework and can, therefore, be easily deployed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor running embedded applications from the MiBench suite demonstrate that accurate pipeline hazard detection by OTs can result in up to 20% performance improvement over the best-performing GCC-generated code. Finally, we demonstrate the usefulness of OTs over various bypass configurations of the Intel XScale.

Original language: English (US)
Article number: 1664901
Pages (from-to): 791-801
Number of pages: 11
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Volume: 14
Issue number: 8
DOI: 10.1109/TVLSI.2006.878468
State: Published - Aug 2006

Fingerprint

Hazards
Pipelines
Electric power utilization
Costs

Keywords

  • Bypasses
  • Forwarding path
  • Operation table
  • Partial bypassing
  • Partially bypassed processor
  • Pipeline hazard detection
  • Processor pipeline

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Hardware and Architecture

Cite this

Retargetable pipeline hazard detection for partially bypassed processors. / Shrivastava, Aviral; Earlie, Eugene; Dutt, Nikil D.; Nicolau, Alex.

In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 14, No. 8, 1664901, 08.2006, p. 791-801.

@article{e2be61c17479451ba4044742d8f6f1fe,
title = "Retargetable pipeline hazard detection for partially bypassed processors",
abstract = "Register bypassing is a widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, it has significant impact on the cycle time, area, and power consumption of the processor. Owing to the strict design constraints on the performance, cost, and the power consumption of embedded processor systems, architects seek a compromise between the design parameters by implementing partial bypassing in processors. However, partial bypassing in processors presents challenges for compilation. Traditional data hazard detection and/or avoidance techniques used in retargetable compilers that assume a constant value of operation latency, break down in the presence of partial bypassing. In this article, we present the concept of operation tables (OTs) that can be used to accurately detect data hazards, even in the presence of incomplete bypassing. OTs integrate the detection of all kinds of pipeline hazards in a unified framework, and can, therefore, be easily deployed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor running embedded applications from the MiBench suite, demonstrate that accurate pipeline hazard detection by OTs can result in up to 20{\%} performance improvement over the best performing GCC generated code. Finally, we demonstrate the usefulness of OTs over various bypass configurations of the Intel XScale.",
keywords = "Bypasses, Forwarding path, Operation table, Partial bypassing, Partially bypassed processor, Pipeline hazard detection, Processor pipeline",
author = "Aviral Shrivastava and Eugene Earlie and Dutt, {Nikil D.} and Alex Nicolau",
year = "2006",
month = "8",
doi = "10.1109/TVLSI.2006.878468",
language = "English (US)",
volume = "14",
pages = "791--801",
journal = "IEEE Transactions on Very Large Scale Integration (VLSI) Systems",
issn = "1063-8210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "8",
}

TY - JOUR

T1 - Retargetable pipeline hazard detection for partially bypassed processors

AU - Shrivastava, Aviral

AU - Earlie, Eugene

AU - Dutt, Nikil D.

AU - Nicolau, Alex

PY - 2006/8

Y1 - 2006/8

AB - Register bypassing is a widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, it has significant impact on the cycle time, area, and power consumption of the processor. Owing to the strict design constraints on the performance, cost, and the power consumption of embedded processor systems, architects seek a compromise between the design parameters by implementing partial bypassing in processors. However, partial bypassing in processors presents challenges for compilation. Traditional data hazard detection and/or avoidance techniques used in retargetable compilers that assume a constant value of operation latency, break down in the presence of partial bypassing. In this article, we present the concept of operation tables (OTs) that can be used to accurately detect data hazards, even in the presence of incomplete bypassing. OTs integrate the detection of all kinds of pipeline hazards in a unified framework, and can, therefore, be easily deployed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor running embedded applications from the MiBench suite, demonstrate that accurate pipeline hazard detection by OTs can result in up to 20% performance improvement over the best performing GCC generated code. Finally, we demonstrate the usefulness of OTs over various bypass configurations of the Intel XScale.

KW - Bypasses

KW - Forwarding path

KW - Operation table

KW - Partial bypassing

KW - Partially bypassed processor

KW - Pipeline hazard detection

KW - Processor pipeline

UR - http://www.scopus.com/inward/record.url?scp=33747448287&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33747448287&partnerID=8YFLogxK

U2 - 10.1109/TVLSI.2006.878468

DO - 10.1109/TVLSI.2006.878468

M3 - Article

VL - 14

SP - 791

EP - 801

JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems

JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems

SN - 1063-8210

IS - 8

M1 - 1664901

ER -