Technical challenges of supporting interactive HPC

Albert Reuther, Jeremy Kepner, Andy McCabe, Julie Mullen, Nadya Bliss, Hahn Kim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Citations (Scopus)

Abstract

Users' demand for interactive, on-demand access to a large pool of high performance computing (HPC) resources is increasing. The majority of users at Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) are involved in the interactive development of sensor processing algorithms. This development often requires a large amount of computation due to the complexity of the algorithms being explored and/or the size of the data set being analyzed. These researchers also require rapid turnaround of their jobs because each iteration directly influences code changes made for the following iteration. Historically, batch queue systems have not been a good match for this kind of user. The Lincoln Laboratory Grid (LLGrid) system at MIT LL is the largest dedicated interactive, on-demand HPC system in the world. While the system also accommodates some batch queue jobs, the vast majority of jobs submitted are interactive, on-demand jobs. Choosing between running a system with a batch queue or in an interactive, on-demand manner involves tradeoffs. This paper discusses the tradeoffs between operating a cluster as a batch system, an interactive, on-demand system, or a hybrid system. The LLGrid system has been operational for over three years, and now serves over 200 users from across Lincoln. The system has run over 100,000 interactive jobs. It has become an integral part of many researchers' algorithm development workflows. For instance, in batch queue systems, an individual user can commonly gain access to 25% of the processors in the system after the job has waited in the queue; in our experience with on-demand, interactive operation, individual users can often also gain access to 20-25% of the cluster processors. This paper shares a variety of new data on our experiences with running an interactive, on-demand system that also provides some batch queue access.

Original language: English (US)
Title of host publication: Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC
Pages: 403-409
Number of pages: 7
DOIs: https://doi.org/10.1109/HPCMP-UGC.2007.72
State: Published - 2007
Externally published: Yes
Event: Department of Defense - HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC - Pittsburgh, PA, United States
Duration: Jun 18 2007 to Jun 21 2007

Other

Other: Department of Defense - HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC
Country: United States
City: Pittsburgh, PA
Period: 6/18/07 to 6/21/07

Fingerprint

Hybrid systems
Sensors
Processing

Keywords

  • Cluster computing
  • Grid computing
  • Interactive high performance computing
  • On-demand
  • Parallel MATLAB

ASJC Scopus subject areas

  • Computer Science(all)
  • Software

Cite this

Reuther, A., Kepner, J., McCabe, A., Mullen, J., Bliss, N., & Kim, H. (2007). Technical challenges of supporting interactive HPC. In Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC (pp. 403-409). [4438018] https://doi.org/10.1109/HPCMP-UGC.2007.72

Technical challenges of supporting interactive HPC. / Reuther, Albert; Kepner, Jeremy; McCabe, Andy; Mullen, Julie; Bliss, Nadya; Kim, Hahn.

Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC. 2007. p. 403-409 4438018.

Reuther, A, Kepner, J, McCabe, A, Mullen, J, Bliss, N & Kim, H 2007, Technical challenges of supporting interactive HPC. in Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC., 4438018, pp. 403-409, Department of Defense - HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC, Pittsburgh, PA, United States, 6/18/07. https://doi.org/10.1109/HPCMP-UGC.2007.72
Reuther A, Kepner J, McCabe A, Mullen J, Bliss N, Kim H. Technical challenges of supporting interactive HPC. In Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC. 2007. p. 403-409. 4438018 https://doi.org/10.1109/HPCMP-UGC.2007.72
Reuther, Albert ; Kepner, Jeremy ; McCabe, Andy ; Mullen, Julie ; Bliss, Nadya ; Kim, Hahn. / Technical challenges of supporting interactive HPC. Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC. 2007. pp. 403-409
@inproceedings{68b1235cc3e7407783917bde84c69443,
title = "Technical challenges of supporting interactive HPC",
keywords = "Cluster computing, Grid computing, Interactive high performance computing, On-demand, Parallel MATLAB",
author = "Albert Reuther and Jeremy Kepner and Andy McCabe and Julie Mullen and Nadya Bliss and Hahn Kim",
year = "2007",
doi = "10.1109/HPCMP-UGC.2007.72",
language = "English (US)",
isbn = "0769530885",
pages = "403--409",
booktitle = "Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC",

}

TY - GEN

T1 - Technical challenges of supporting interactive HPC

AU - Reuther, Albert

AU - Kepner, Jeremy

AU - McCabe, Andy

AU - Mullen, Julie

AU - Bliss, Nadya

AU - Kim, Hahn

PY - 2007

Y1 - 2007

KW - Cluster computing

KW - Grid computing

KW - Interactive high performance computing

KW - On-demand

KW - Parallel MATLAB

UR - http://www.scopus.com/inward/record.url?scp=49949110340&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=49949110340&partnerID=8YFLogxK

U2 - 10.1109/HPCMP-UGC.2007.72

DO - 10.1109/HPCMP-UGC.2007.72

M3 - Conference contribution

AN - SCOPUS:49949110340

SN - 0769530885

SN - 9780769530888

SP - 403

EP - 409

BT - Department of Defense - Proceedings of the HPCMP Users Group Conference 2007; High Performance Computing Modernization Program: A Bridge to Future Defense, DoD HPCMP UGC

ER -