Parallelization techniques for implementing trellis algorithms on graphics processors

Q. Zheng; Y. Chen; R. Dreslinski; Chaitali Chakrabarti; A. Anastasopoulos; S. Mahlke; T. Mudge

doi:10.1109/ISCAS.2013.6572072

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

In this paper, we study different schemes to parallelize trellis algorithms for efficient implementation on a GPU. We consider parallelization schemes at the packet-level, subblock-level and trellis-level to increase the number of threads in a GPU implementation. At the trellis-level, we consider state-level, forward-backward traversal and branch-metric parallelism. To evaluate the performance of the different schemes, an LTE uplink Turbo decoder is implemented on an NVIDIA GTX470 GPU. Tradeoffs between throughput, latency and bit error rate are presented. Our most balanced configuration is simultaneously processing multiple subblocks in a packet in conjunction with recovery schemes and trellis-level parallelism, which can achieve a throughput of 19.65 Mbps with a latency of 0.56 ms at bit error rate of 10^-5 for 1.3 dB channel SNR. We also show how different combinations of parallelization schemes can be used to satisfy systems with widely varying requirements of throughput, latency and bit error rate.
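The state-level parallelism named in the abstract can be illustrated with a generic max-log forward recursion on a toy trellis. This is a minimal sketch under stated assumptions, not the decoder from the paper: the function names, the two-state trellis and the branch metrics are all hypothetical, and plain Python stands in for GPU threads.

```python
# Illustrative sketch only: a generic max-log forward recursion on a toy
# trellis. All names and values here are hypothetical, not from the paper.
NEG_INF = float("-inf")


def forward_step(alpha_prev, gamma, predecessors):
    """Compute the metric of every state for one trellis stage.

    alpha_prev   -- list of state metrics at the previous stage
    gamma        -- gamma[p][s] is the branch metric of the edge p -> s
    predecessors -- predecessors[s] lists the states with an edge into s

    Each output entry depends only on alpha_prev, so on a GPU every state
    could be handled by its own thread (state-level parallelism).
    """
    return [
        max(alpha_prev[p] + gamma[p][s] for p in predecessors[s])
        for s in range(len(alpha_prev))
    ]


def forward_recursion(gammas, predecessors, num_states):
    """Run the recursion over all stages, starting in state 0."""
    alpha = [0.0] + [NEG_INF] * (num_states - 1)
    history = [alpha]
    for gamma in gammas:
        # Stages are processed one after another; only the states within a
        # stage are independent of each other.
        alpha = forward_step(alpha, gamma, predecessors)
        history.append(alpha)
    return history


# Toy two-state, fully connected trellis with the same metrics at each stage.
toy_predecessors = [[0, 1], [0, 1]]
toy_gamma = [[1.0, 2.0], [0.5, 0.0]]
toy_history = forward_recursion([toy_gamma, toy_gamma], toy_predecessors, 2)
```

The recursion across stages stays sequential in this sketch; the abstract's packet-level and subblock-level schemes are the ones that add parallelism beyond a single trellis stage.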

Original language	English (US)
Title of host publication	2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013
Pages	1220-1223
Number of pages	4
DOIs	https://doi.org/10.1109/ISCAS.2013.6572072
State	Published - 2013
Event	2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013 - Beijing, China
Duration	May 19 2013 → May 23 2013

Publication series

Name	Proceedings - IEEE International Symposium on Circuits and Systems
ISSN (Print)	0271-4310

Other

Other	2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013
Country/Territory	China
City	Beijing
Period	5/19/13 → 5/23/13

ASJC Scopus subject areas

Electrical and Electronic Engineering

Cite this

Zheng, Q., Chen, Y., Dreslinski, R., Chakrabarti, C., Anastasopoulos, A., Mahlke, S., & Mudge, T. (2013). Parallelization techniques for implementing trellis algorithms on graphics processors. In 2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013 (pp. 1220-1223). Article 6572072 (Proceedings - IEEE International Symposium on Circuits and Systems). https://doi.org/10.1109/ISCAS.2013.6572072

Parallelization techniques for implementing trellis algorithms on graphics processors. / Zheng, Q.; Chen, Y.; Dreslinski, R. et al.
2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013. 2013. p. 1220-1223 6572072 (Proceedings - IEEE International Symposium on Circuits and Systems).

Zheng, Q, Chen, Y, Dreslinski, R, Chakrabarti, C, Anastasopoulos, A, Mahlke, S & Mudge, T 2013, Parallelization techniques for implementing trellis algorithms on graphics processors. in 2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013., 6572072, Proceedings - IEEE International Symposium on Circuits and Systems, pp. 1220-1223, 2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013, Beijing, China, 5/19/13. https://doi.org/10.1109/ISCAS.2013.6572072

Zheng Q, Chen Y, Dreslinski R, Chakrabarti C, Anastasopoulos A, Mahlke S et al. Parallelization techniques for implementing trellis algorithms on graphics processors. In 2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013. 2013. p. 1220-1223. 6572072. (Proceedings - IEEE International Symposium on Circuits and Systems). doi: 10.1109/ISCAS.2013.6572072

@inproceedings{431cd4567a8a423e9a22f7629be38500,

title = "Parallelization techniques for implementing trellis algorithms on graphics processors",

abstract = "In this paper, we study different schemes to parallelize trellis algorithms for efficient implementation on a GPU. We consider parallelization schemes at the packet-level, subblock-level and trellis-level to increase the number of threads in a GPU implementation. At the trellis-level, we consider state-level, forward-backward traversal and branch-metric parallelism. To evaluate the performance of the different schemes, an LTE uplink Turbo decoder is implemented on an NVIDIA GTX470 GPU. Tradeoffs between throughput, latency and bit error rate are presented. Our most balanced configuration is simultaneously processing multiple subblocks in a packet in conjunction with recovery schemes and trellis-level parallelism, which can achieve a throughput of 19.65 Mbps with a latency of 0.56 ms at bit error rate of $10^{-5}$ for 1.3 dB channel SNR. We also show how different combinations of parallelization schemes can be used to satisfy systems with widely varying requirements of throughput, latency and bit error rate.",

author = "Q. Zheng and Y. Chen and R. Dreslinski and Chaitali Chakrabarti and A. Anastasopoulos and S. Mahlke and T. Mudge",

year = "2013",

doi = "10.1109/ISCAS.2013.6572072",

language = "English (US)",

isbn = "9781467357609",

series = "Proceedings - IEEE International Symposium on Circuits and Systems",

pages = "1220--1223",

booktitle = "2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013",

note = "2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013 ; Conference date: 19-05-2013 Through 23-05-2013",

}

TY - GEN

T1 - Parallelization techniques for implementing trellis algorithms on graphics processors

AU - Zheng, Q.

AU - Chen, Y.

AU - Dreslinski, R.

AU - Chakrabarti, Chaitali

AU - Anastasopoulos, A.

AU - Mahlke, S.

AU - Mudge, T.

PY - 2013

Y1 - 2013

N2 - In this paper, we study different schemes to parallelize trellis algorithms for efficient implementation on a GPU. We consider parallelization schemes at the packet-level, subblock-level and trellis-level to increase the number of threads in a GPU implementation. At the trellis-level, we consider state-level, forward-backward traversal and branch-metric parallelism. To evaluate the performance of the different schemes, an LTE uplink Turbo decoder is implemented on an NVIDIA GTX470 GPU. Tradeoffs between throughput, latency and bit error rate are presented. Our most balanced configuration is simultaneously processing multiple subblocks in a packet in conjunction with recovery schemes and trellis-level parallelism, which can achieve a throughput of 19.65 Mbps with a latency of 0.56 ms at bit error rate of 10^-5 for 1.3 dB channel SNR. We also show how different combinations of parallelization schemes can be used to satisfy systems with widely varying requirements of throughput, latency and bit error rate.

AB - In this paper, we study different schemes to parallelize trellis algorithms for efficient implementation on a GPU. We consider parallelization schemes at the packet-level, subblock-level and trellis-level to increase the number of threads in a GPU implementation. At the trellis-level, we consider state-level, forward-backward traversal and branch-metric parallelism. To evaluate the performance of the different schemes, an LTE uplink Turbo decoder is implemented on an NVIDIA GTX470 GPU. Tradeoffs between throughput, latency and bit error rate are presented. Our most balanced configuration is simultaneously processing multiple subblocks in a packet in conjunction with recovery schemes and trellis-level parallelism, which can achieve a throughput of 19.65 Mbps with a latency of 0.56 ms at bit error rate of 10^-5 for 1.3 dB channel SNR. We also show how different combinations of parallelization schemes can be used to satisfy systems with widely varying requirements of throughput, latency and bit error rate.

UR - http://www.scopus.com/inward/record.url?scp=84883326393&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883326393&partnerID=8YFLogxK

U2 - 10.1109/ISCAS.2013.6572072

DO - 10.1109/ISCAS.2013.6572072

M3 - Conference contribution

AN - SCOPUS:84883326393

SN - 9781467357609

T3 - Proceedings - IEEE International Symposium on Circuits and Systems

SP - 1220

EP - 1223

BT - 2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013

T2 - 2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013

Y2 - 19 May 2013 through 23 May 2013

ER -