TY - JOUR
T1 - Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers
T2 - A cyber-physical approach
AU - Tang, Qinghui
AU - Gupta, Sandeep
AU - Varsamopoulos, Georgios
N1 - Funding Information: The authors thank Dan Stanzione for granting access to the ASU Fulton HPCI facility and its logs, Michael Jonas for processing the Fulton HPCI data, Tridib Mukherjee for performing and documenting the power measurements, and Ayan Banerjee for assisting with the simulations. They also thank Sanjay Rungta from Intel Corp. for his collaboration in power profiling the equipment, and the anonymous reviewers for their insightful comments and suggestions. This work was supported in part by grants from Intel Corp., Science Foundation Arizona, and the National Science Foundation (CNS#0649868). A preliminary version of this paper appeared in IEEE Cluster 2007 [1].
PY - 2008
Y1 - 2008
AB - High-performance computing data centers have been growing rapidly, both in number and in size. Thermal management of data centers can address dominant cooling problems such as the recirculation of hot air from equipment outlets to their inlets and the appearance of hot spots. In this paper, we study how to assign incoming tasks to the machines of a data center so as to affect heat recirculation and make cooling more efficient. Using a low-complexity linear heat recirculation model, we formulate the problem of minimizing the peak inlet temperature within a data center through task assignment, which in turn minimizes cooling power consumption. We provide two methods to solve the formulation: one uses a genetic algorithm and the other uses sequential quadratic programming. We show formally that minimizing the peak inlet temperature yields the lowest cooling power needs. Results from a simulated, small-scale data center show that solving the formulation leads to an inlet temperature distribution that is 2 °C to 5 °C lower than with other approaches and achieves about 20% to 30% cooling energy savings at moderate data center utilization rates. Moreover, our algorithms consistently outperform MinHR, a recirculation-reducing placement algorithm from the literature.
KW - Energy-aware systems
KW - Evaluation
KW - Measurement
KW - Modeling
KW - Modeling techniques
KW - Performance analysis and design
KW - Simulation of multiple-processor systems
UR - http://www.scopus.com/inward/record.url?scp=54249125817&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=54249125817&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2008.111
DO - 10.1109/TPDS.2008.111
M3 - Article
AN - SCOPUS:54249125817
SN - 1045-9219
VL - 19
SP - 1458
EP - 1472
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 11
ER -