Data handling inefficiencies between CUDA, 3D rendering, and system memory

Brian Gordon, Sohum Sohoni, Damon Chandler

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

While GPGPU programming offers faster computation of highly parallelized code, the memory bandwidth between the system and the GPU can create a bottleneck that reduces the potential gains. CUDA is a prominent GPGPU API which can transfer data to and from system code, and which can also access data used by 3D rendering APIs. In an application that relies on both GPU programming APIs to accelerate 3D modeling and an easily parallelized algorithm, the hidden inefficiencies of nVidia's data handling with CUDA become apparent. First, CUDA uses the CPU's store units to copy data between the graphics card and system memory instead of using a more efficient method like DMA. Second, data exchanged between the two GPU-based APIs travels through the main processor instead of staying on the GPU. As a result, a non-GPGPU implementation of a program runs faster than the same program using GPGPU.
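The first inefficiency described in the abstract can be made concrete with a short CUDA sketch. This is an illustrative example, not code from the paper: it contrasts a copy from ordinary pageable host memory (the path the authors observed being serviced by the CPU's store units) with a copy from pinned, page-locked memory, which the CUDA runtime can hand to the GPU's DMA engine, including asynchronously. All API names (`cudaMalloc`, `cudaHostAlloc`, `cudaMemcpy`, `cudaMemcpyAsync`) are standard CUDA runtime calls; the buffer size is arbitrary.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

int main(void) {
    const size_t bytes = 64 << 20;  // 64 MiB payload, arbitrary size
    float *d_buf = nullptr, *h_pageable = nullptr, *h_pinned = nullptr;

    cudaMalloc(&d_buf, bytes);

    // Path 1: pageable host memory. The driver must stage this copy;
    // the paper observes it running through the CPU's store units
    // rather than as a direct DMA transfer.
    h_pageable = (float *)malloc(bytes);
    cudaMemcpy(d_buf, h_pageable, bytes, cudaMemcpyHostToDevice);

    // Path 2: pinned (page-locked) host memory. Eligible for true DMA,
    // and for asynchronous copies overlapped with kernel execution.
    cudaHostAlloc((void **)&h_pinned, bytes, cudaHostAllocDefault);
    cudaMemcpyAsync(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice, 0);
    cudaDeviceSynchronize();

    free(h_pageable);
    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
```

Timing these two paths (for instance with the `bandwidthTest` sample shipped in the CUDA SDK of that era) exposes the gap the paper measures. The second inefficiency, data exchanged between CUDA and a 3D rendering API detouring through the host, involves the graphics-interop entry points (`cudaGraphicsGLRegisterBuffer`, `cudaGraphicsMapResources`); a runnable example of that path is omitted here since it additionally requires an OpenGL context.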

Original language: English (US)
Title of host publication: IEEE International Symposium on Workload Characterization, IISWC'10
DOIs: 10.1109/IISWC.2010.5648828
State: Published - 2010
Externally published: Yes
Event: 2010 IEEE International Symposium on Workload Characterization, IISWC'10 - Atlanta, GA, United States
Duration: Dec 2, 2010 - Dec 4, 2010

Other

Other: 2010 IEEE International Symposium on Workload Characterization, IISWC'10
Country: United States
City: Atlanta, GA
Period: 12/2/10 - 12/4/10

ASJC Scopus subject areas

  • Computer Science Applications
  • Electrical and Electronic Engineering

Cite this

Gordon, B., Sohoni, S., & Chandler, D. (2010). Data handling inefficiencies between CUDA, 3D rendering, and system memory. In IEEE International Symposium on Workload Characterization, IISWC'10 [5648828] https://doi.org/10.1109/IISWC.2010.5648828

@inproceedings{1970b650b1bf4c06b73183d302f85664,
title = "Data handling inefficiencies between CUDA, 3D rendering, and system memory",
abstract = "While GPGPU programming offers faster computation of highly parallelized code, the memory bandwidth between the system and the GPU can create a bottleneck that reduces the potential gains. CUDA is a prominent GPGPU API which can transfer data to and from system code, and which can also access data used by 3D rendering APIs. In an application that relies on both GPU programming APIs to accelerate 3D modeling and an easily parallelized algorithm, the hidden inefficiencies of nVidia's data handling with CUDA become apparent. First, CUDA uses the CPU's store units to copy data between the graphics card and system memory instead of using a more efficient method like DMA. Second, data exchanged between the two GPU-based APIs travels through the main processor instead of staying on the GPU. As a result, a non-GPGPU implementation of a program runs faster than the same program using GPGPU.",
author = "Brian Gordon and Sohum Sohoni and Damon Chandler",
year = "2010",
doi = "10.1109/IISWC.2010.5648828",
language = "English (US)",
isbn = "9781424492978",
booktitle = "IEEE International Symposium on Workload Characterization, IISWC'10",

}

TY - GEN

T1 - Data handling inefficiencies between CUDA, 3D rendering, and system memory

AU - Gordon, Brian

AU - Sohoni, Sohum

AU - Chandler, Damon

PY - 2010

Y1 - 2010

N2 - While GPGPU programming offers faster computation of highly parallelized code, the memory bandwidth between the system and the GPU can create a bottleneck that reduces the potential gains. CUDA is a prominent GPGPU API which can transfer data to and from system code, and which can also access data used by 3D rendering APIs. In an application that relies on both GPU programming APIs to accelerate 3D modeling and an easily parallelized algorithm, the hidden inefficiencies of nVidia's data handling with CUDA become apparent. First, CUDA uses the CPU's store units to copy data between the graphics card and system memory instead of using a more efficient method like DMA. Second, data exchanged between the two GPU-based APIs travels through the main processor instead of staying on the GPU. As a result, a non-GPGPU implementation of a program runs faster than the same program using GPGPU.

UR - http://www.scopus.com/inward/record.url?scp=78751496641&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78751496641&partnerID=8YFLogxK

U2 - 10.1109/IISWC.2010.5648828

DO - 10.1109/IISWC.2010.5648828

M3 - Conference contribution

SN - 9781424492978

BT - IEEE International Symposium on Workload Characterization, IISWC'10

ER -