Aggregating processor free time for energy reduction

Aviral Shrivastava, Eugene Earlie, Nikil Dutt, Alex Nicolau

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

Even after carefully tuning the memory characteristics to the application properties and the processor speed, there are times during the execution of real applications when the processor stalls, waiting for data from memory. Processor stalls can be used to increase throughput by temporarily switching to a different thread of execution, or to reduce power and energy consumption by temporarily switching the processor to a low-power mode. However, any such technique has a performance overhead in terms of switching time. Even though the processor is stalled for a considerable amount of time over the execution of an application, each individual stall is too short to profitably perform any state switch. In this paper, we present code transformations to aggregate processor free time. Our experiments on the Intel XScale and Stream kernels show that up to 50,000 processor cycles can be aggregated and used to profitably switch the processor to low-power mode. We further show that our code transformations can switch the processor to low-power mode for up to 75% of kernel runtime, achieving up to 18% processor energy savings on multimedia applications. Our technique requires minimal architectural modifications and incurs negligible (<1%) performance loss.
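
The profitability argument in the abstract can be illustrated with a toy model. This is only a sketch and not the authors' code transformation: the function name, the 200-cycle switching overhead, and the stall lengths are hypothetical values chosen for illustration. It assumes a stall window is usable only when it is longer than the round-trip cost of entering and leaving low-power mode.

```python
def profitable_sleep_cycles(stalls, switch_overhead):
    """Total cycles that could be spent in low-power mode.

    A stall window is usable only if it exceeds the round-trip
    switching overhead; the overhead itself is not counted as sleep time.
    """
    return sum(s - switch_overhead for s in stalls if s > switch_overhead)


# Hypothetical numbers, not taken from the paper.
SWITCH_OVERHEAD = 200            # assumed cycles to enter and leave low-power mode
scattered = [30] * 1000          # 1000 short stalls of 30 cycles each
aggregated = [sum(scattered)]    # the same 30,000 stall cycles as one window

print(profitable_sleep_cycles(scattered, SWITCH_OVERHEAD))   # 0
print(profitable_sleep_cycles(aggregated, SWITCH_OVERHEAD))  # 29800
```

Under these assumptions the total stall time is identical in both cases, but only the aggregated window is long enough to amortize the switching overhead.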

Original language: English (US)
Title of host publication: CODES+ISSS 2005 - International Conference on Hardware/Software Codesign and System Synthesis
Pages: 154-159
Number of pages: 6
State: Published - 2005
Externally published: Yes
Event: 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and Systems Synthesis CODES+ISSS 2005 - Jersey City, NJ, United States
Duration: Sep 18 2005 → Sep 21 2005

Other

Other: 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and Systems Synthesis CODES+ISSS 2005
Country: United States
City: Jersey City, NJ
Period: 9/18/05 → 9/21/05

Fingerprint

Switches
Data storage equipment
Energy conservation
Electric power utilization
Energy utilization
Tuning
Throughput
Experiments

Keywords

  • Aggregation
  • Clock Gating
  • Code Transformation
  • Embedded Systems
  • Energy Reduction
  • Processor Free Time

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Shrivastava, A., Earlie, E., Dutt, N., & Nicolau, A. (2005). Aggregating processor free time for energy reduction. In CODES+ISSS 2005 - International Conference on Hardware/Software Codesign and System Synthesis (pp. 154-159)

@inproceedings{f7afd0260558464bb8f9f75d42d24673,
title = "Aggregating processor free time for energy reduction",
keywords = "Aggregation, Clock Gating, Code Transformation, Embedded Systems, Energy Reduction, Processor Free Time",
author = "Aviral Shrivastava and Eugene Earlie and Nikil Dutt and Alex Nicolau",
year = "2005",
language = "English (US)",
isbn = "1595931619",
pages = "154--159",
booktitle = "CODES+ISSS 2005 - International Conference on Hardware/Software Codesign and System Synthesis",

}

TY - GEN

T1 - Aggregating processor free time for energy reduction

AU - Shrivastava, Aviral

AU - Earlie, Eugene

AU - Dutt, Nikil

AU - Nicolau, Alex

PY - 2005

Y1 - 2005

KW - Aggregation

KW - Clock Gating

KW - Code Transformation

KW - Embedded Systems

KW - Energy Reduction

KW - Processor Free Time

UR - http://www.scopus.com/inward/record.url?scp=27644561753&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27644561753&partnerID=8YFLogxK

M3 - Conference contribution

SN - 1595931619

SN - 9781595931610

SP - 154

EP - 159

BT - CODES+ISSS 2005 - International Conference on Hardware/Software Codesign and System Synthesis

ER -