TY - GEN
T1 - Benchmarking and Boosting Transformers for Medical Image Classification
AU - Ma, DongAo
AU - Hosseinzadeh Taher, Mohammad Reza
AU - Pang, Jiaxuan
AU - Islam, Nahid Ul
AU - Haghighi, Fatemeh
AU - Gotway, Michael B.
AU - Liang, Jianming
N1 - Funding Information:
Acknowledgments. This research has been supported in part by ASU and Mayo Clinic through a Seed Grant and an Innovation Grant, and in part by the NIH under Award Number R01HL128785. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This work utilized GPUs provided in part by ASU Research Computing and in part by the Extreme Science and Engineering Discovery Environment (XSEDE), which is funded by the National Science Foundation (NSF) under grant numbers ACI-1548562, ACI-1928147, and ACI-2005632. We thank Manas Chetan Valia for evaluating the pre-trained ResNet50 models on the five chest X-ray tasks, and Haozhe Luo for evaluating the pre-trained transformer models on the VinDr-CXR target tasks. The content of this paper is covered by patents pending.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - Visual transformers have recently gained popularity in the computer vision community as they began to outrank convolutional neural networks (CNNs) in one representative visual benchmark after another. However, the competition between visual transformers and CNNs in medical imaging is rarely studied, leaving many important questions unanswered. As the first step, we benchmark how well existing transformer variants that use various (supervised and self-supervised) pre-training methods perform against CNNs on a variety of medical classification tasks. Furthermore, given the data-hungry nature of transformers and the annotation-deficiency challenge of medical imaging, we present a practical approach for bridging the domain gap between photographic and medical images by utilizing unlabeled large-scale in-domain data. Our extensive empirical evaluations reveal the following insights in medical imaging: (1) good initialization is more crucial for transformer-based models than for CNNs, (2) self-supervised learning based on masked image modeling captures more generalizable representations than supervised models, and (3) assembling a larger-scale domain-specific dataset can better bridge the domain gap between photographic and medical images via self-supervised continuous pre-training. We hope this benchmark study can direct future research on applying transformers to medical imaging analysis. All codes and pre-trained models are available on our GitHub page https://github.com/JLiangLab/BenchmarkTransformers.
KW - Benchmarking
KW - Domain-adaptive pre-training
KW - Transfer learning
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85140442747&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140442747&partnerID=8YFLogxK
DO - 10.1007/978-3-031-16852-9_2
M3 - Conference contribution
AN - SCOPUS:85140442747
SN - 978-3-031-16851-2
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 12
EP - 22
BT - Domain Adaptation and Representation Transfer - 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Proceedings
A2 - Kamnitsas, Konstantinos
A2 - Koch, Lisa
A2 - Islam, Mobarakol
A2 - Xu, Ziyue
A2 - Cardoso, Jorge
A2 - Dou, Qi
A2 - Rieke, Nicola
A2 - Tsaftaris, Sotirios
PB - Springer Science and Business Media Deutschland GmbH
T2 - 4th MICCAI Workshop on Domain Adaptation and Representation Transfer, DART 2022, held in conjunction with the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2022
Y2 - 22 September 2022 through 22 September 2022
ER -