The study of human reaction time (RT) is valuable not only for understanding sensory-motor functions but also for translating brain signals into machine-comprehensible commands that can facilitate augmentative and alternative communication through brain-computer interfaces (BCIs). Recent developments in sensor technologies, hardware computational capabilities, and neural network models have significantly advanced biomedical signal processing research. This study uses these state-of-the-art resources to explore the relationship between human behavioral responses during perceptual decision-making and the corresponding brain signals, recorded as electroencephalograms (EEG). In this paper, a generalized 3D convolutional neural network (CNN) architecture is introduced to estimate RT for a simple visual task from single-trial multi-channel EEG. Earlier comparable studies have employed a number of machine learning and deep learning-based models, but none of them considered inter-channel relationships while estimating RT. In contrast, the use of 3D convolutional layers enabled us to consider the spatial relationship among adjacent channels while simultaneously utilizing spectral information from individual channels. Our model predicts RT with a root mean square error of 91.5 ms and a correlation coefficient of 0.83. These results surpass those reported in previous studies.

Clinical relevance: Novel approaches to decoding brain signals can facilitate research on BCIs, psychology, and neuroscience, enabling people to use assistive devices by identifying the root causes of psychological or neuromuscular disorders.
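The two figures of merit quoted above, root mean square error and correlation coefficient, can be illustrated with a minimal sketch. It assumes the correlation coefficient is Pearson's r, which the text does not state. The function names `rmse` and `pearson_r` and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted RTs (same unit as the inputs)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed here; the abstract only says 'correlation coefficient')."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Hypothetical reaction times in milliseconds, for illustration only.
    observed_rt = [300.0, 400.0, 500.0, 600.0]
    predicted_rt = [310.0, 390.0, 520.0, 580.0]
    print(f"RMSE: {rmse(observed_rt, predicted_rt):.1f} ms")
    print(f"r: {pearson_r(observed_rt, predicted_rt):.2f}")
```

RMSE is expressed in the same unit as RT (milliseconds here), so it reflects the typical size of a single-trial prediction error, while the correlation coefficient measures how well predictions track trial-to-trial variation regardless of scale.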