TY - GEN
T1 - Spatial-temporal Data Compression of Dynamic Vision Sensor Output with High Pixel-level Saliency using Low-precision Sparse Autoencoder
AU - Hasssan, Ahmed
AU - Meng, Jian
AU - Cao, Yu
AU - Seo, Jae Sun
N1 - Funding Information:
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001121C0134. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA).
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Imaging innovations such as the dynamic vision sensor (DVS) can significantly reduce image data volume by tracking only changes in events. However, when the DVS camera itself moves (e.g., in self-driving cars), the DVS output stream is not sparse enough to achieve the desired hardware efficiency. In this work, we investigate designing a compact sparse autoencoder model to largely compress event-based DVS output. The proposed encoder-decoder-based autoencoder is a shallow convolutional neural network (CNN) architecture with two convolution and two inverse-convolution layers and only ~10k parameters. We apply quantization-aware training to our proposed model to achieve 2-bit and 4-bit precision. Moreover, we apply unstructured pruning to the encoder module to achieve >90% active-pixel compression at the latent space. The proposed autoencoder design has been validated against multiple benchmark DVS-based datasets, including DVS-MNIST, N-Cars, DVS-IBM Gesture, and the Prophesee Automotive Gen1 dataset. We achieve low accuracy drops of 2%, 3%, and 3.8% compared to the uncompressed baseline, with 7.08%, 1.36%, and 5.53% active pixels in the images from the decoder (compression ratios of 13.1×, 29.1×, and 18.1×) for the DVS-MNIST, N-Cars, and DVS-IBM Gesture datasets, respectively. For the Prophesee Automotive Gen1 dataset, we achieve a minimal mAP drop of 0.07 from the baseline with 9% active pixels in the images from the decoder (compression ratio of 11.9×).
AB - Imaging innovations such as the dynamic vision sensor (DVS) can significantly reduce image data volume by tracking only changes in events. However, when the DVS camera itself moves (e.g., in self-driving cars), the DVS output stream is not sparse enough to achieve the desired hardware efficiency. In this work, we investigate designing a compact sparse autoencoder model to largely compress event-based DVS output. The proposed encoder-decoder-based autoencoder is a shallow convolutional neural network (CNN) architecture with two convolution and two inverse-convolution layers and only ~10k parameters. We apply quantization-aware training to our proposed model to achieve 2-bit and 4-bit precision. Moreover, we apply unstructured pruning to the encoder module to achieve >90% active-pixel compression at the latent space. The proposed autoencoder design has been validated against multiple benchmark DVS-based datasets, including DVS-MNIST, N-Cars, DVS-IBM Gesture, and the Prophesee Automotive Gen1 dataset. We achieve low accuracy drops of 2%, 3%, and 3.8% compared to the uncompressed baseline, with 7.08%, 1.36%, and 5.53% active pixels in the images from the decoder (compression ratios of 13.1×, 29.1×, and 18.1×) for the DVS-MNIST, N-Cars, and DVS-IBM Gesture datasets, respectively. For the Prophesee Automotive Gen1 dataset, we achieve a minimal mAP drop of 0.07 from the baseline with 9% active pixels in the images from the decoder (compression ratio of 11.9×).
KW - Detection
KW - Dynamic vision sensor
KW - Neural network pruning
KW - Neural network training
KW - Object recognition
KW - Quantization
KW - Sparse autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85150197099&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150197099&partnerID=8YFLogxK
U2 - 10.1109/IEEECONF56349.2022.10051946
DO - 10.1109/IEEECONF56349.2022.10051946
M3 - Conference contribution
AN - SCOPUS:85150197099
T3 - Conference Record - Asilomar Conference on Signals, Systems and Computers
SP - 344
EP - 348
BT - 56th Asilomar Conference on Signals, Systems and Computers, ACSSC 2022
A2 - Matthews, Michael B.
PB - IEEE Computer Society
T2 - 56th Asilomar Conference on Signals, Systems and Computers, ACSSC 2022
Y2 - 31 October 2022 through 2 November 2022
ER -