Model-driven coordinated management of data centers

Tridib Mukherjee, Ayan Banerjee, Georgios Varsamopoulos, Sandeep Gupta

Research output: Contribution to journalArticle

18 Scopus citations

Abstract

Management of computing infrastructure in data centers is an important and challenging problem, that needs to: (i) ensure availability of services conforming to the Service Level Agreements (SLAs); and (ii) reduce the Power Usage Effectiveness (PUE), i.e. the ratio of total power, up to half of which is attributed to data center cooling, over the computing power to service the workloads. The cooling energy consumption can be reduced by allowing higher-than-usual thermostat set temperatures while maintaining the ambient temperature in the data center room within manufacturer-specified server redline temperatures for their reliable operations. This paper proposes: (i) a Coordinated Job, Power, and Cooling Management (JPCM) policy, which performs: (a) job management so as to allow for an increase in the thermostat setting of the cooling unit while meeting the SLA requirements, (b) power management to reduce the produced thermal load, and (c) cooling management to dynamically adjust the thermostat setting; and (ii) a Model-driven coordinated Management Architecture (MMA), which uses a state-based model to dynamically decide the correct management policy to handle events, such as new workload arrival or failure of a cooling unit, that can trigger an increase in the ambient temperature. Each event is associated with a time window, referred to as the window-of-opportunity, after which the temperature at the inlet of one or more servers can go beyond the redline temperature if proper management policies are not enforced. This window-of-opportunity monotonically decreases with increase in the incoming workload. The selection of the management policy depends on their potential energy benefits and the conformance of the delays in their actuation to the window-of-opportunity. Simulations based on actual job traces from the ASU HPC data center show that the JPCM can achieve up to 18% energy-savings over separated power or job management policies. However, high delay to reach a stable ambient temperature (in case of cooling management through dynamic thermostat setting) can violate the server redline temperatures. A management decision chart is developed as part of MMA to autonomically employ the management policy with maximum energy-savings without violating the window-of-opportunity, and hence the redline temperatures. Further, a prototype of the JPCM is developed by configuring the widely used Moab cluster manager to dynamically change the server priorities for job assignment.

Original languageEnglish (US)
Pages (from-to)2869-2886
Number of pages18
JournalComputer Networks
Volume54
Issue number16
DOIs
StatePublished - Nov 15 2010

Keywords

  • Cooling management
  • Coordinated management
  • Data Center
  • Job management
  • Power management

ASJC Scopus subject areas

  • Computer Networks and Communications

Fingerprint Dive into the research topics of 'Model-driven coordinated management of data centers'. Together they form a unique fingerprint.

  • Cite this