TY - GEN
T1 - Data handling inefficiencies between CUDA, 3D rendering, and system memory
AU - Gordon, Brian
AU - Sohoni, Sohum
AU - Chandler, Damon
PY - 2010/12/1
Y1 - 2010/12/1
N2 - While GPGPU programming offers faster computation of highly parallelized code, the memory bandwidth between the system and the GPU can create a bottleneck that reduces the potential gains. CUDA is a prominent GPGPU API which can transfer data to and from system code, and which can also access data used by 3D rendering APIs. In an application that relies on both GPU programming APIs to accelerate 3D modeling and an easily parallelized algorithm, the hidden inefficiencies of nVidia's data handling with CUDA become apparent. First, CUDA uses the CPU's store units to copy data between the graphics card and system memory instead of using a more efficient method like DMA. Second, data exchanged between the two GPU-based APIs travels through the main processor instead of staying on the GPU. As a result, a non-GPGPU implementation of a program runs faster than the same program using GPGPU.
UR - http://www.scopus.com/inward/record.url?scp=78751496641&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78751496641&partnerID=8YFLogxK
U2 - 10.1109/IISWC.2010.5648828
DO - 10.1109/IISWC.2010.5648828
M3 - Conference contribution
AN - SCOPUS:78751496641
SN - 9781424492978
T3 - IEEE International Symposium on Workload Characterization, IISWC'10
BT - IEEE International Symposium on Workload Characterization, IISWC'10
T2 - 2010 IEEE International Symposium on Workload Characterization, IISWC'10
Y2 - 2 December 2010 through 4 December 2010
ER -