TY - JOUR
T1 - Learning in the frequency domain
AU - Xu, Kai
AU - Qin, Minghai
AU - Sun, Fei
AU - Wang, Yuhao
AU - Chen, Yen Kuang
AU - Ren, Fengbo
N1 - Funding Information:
Acknowledgement. The work by Arizona State University is supported by an NSF grant (IIS/CPS-1652038).
Publisher Copyright:
© 2020 IEEE
PY - 2020
Y1 - 2020
N2 - Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.60% and 0.63% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1.42%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.
AB - Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.60% and 0.63% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1.42%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.
UR - http://www.scopus.com/inward/record.url?scp=85094659578&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094659578&partnerID=8YFLogxK
U2 - 10.1109/CVPR42600.2020.00181
DO - 10.1109/CVPR42600.2020.00181
M3 - Conference article
AN - SCOPUS:85094659578
SN - 1063-6919
SP - 1737
EP - 1746
JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
M1 - 9157408
T2 - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020
Y2 - 14 June 2020 through 19 June 2020
ER -