Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu

Research output: Contribution to conference › Paper › peer-review


This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical studies showing that, while the widely used contrastive self-supervised learning method has made great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), in which we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNetV3-Large on the ImageNet-1k dataset.
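
For readers who want the gist of the objective in code, the following is a minimal sketch under stated assumptions (PyTorch; hypothetical teacher, student, and queue objects; illustrative temperature values), not the authors' released implementation.

    # Minimal sketch of a SEED-style distillation objective, assuming PyTorch.
    # `teacher` (frozen, self-supervised pre-trained) and `student` (trainable)
    # are hypothetical encoders mapping images to d-dimensional embeddings;
    # `queue` is a memory bank of K past teacher embeddings (K, d).
    # Temperature values are illustrative, not the paper's exact settings.
    import torch
    import torch.nn.functional as F

    def seed_loss(student, teacher, images, queue, t_s=0.2, t_t=0.05):
        with torch.no_grad():
            z_t = F.normalize(teacher(images), dim=1)       # (B, d) teacher embeddings
        z_s = F.normalize(student(images), dim=1)           # (B, d) student embeddings

        # Score each embedding against the current teacher embedding plus the
        # instance queue, so both distributions are over the same 1+K instances.
        bank = torch.cat([z_t.unsqueeze(1),
                          queue.unsqueeze(0).expand(z_t.size(0), -1, -1)], dim=1)  # (B, 1+K, d)
        sim_t = torch.bmm(bank, z_t.unsqueeze(2)).squeeze(2)    # (B, 1+K)
        sim_s = torch.bmm(bank, z_s.unsqueeze(2)).squeeze(2)    # (B, 1+K)

        # The student mimics the teacher's softened similarity distribution.
        p_t = F.softmax(sim_t / t_t, dim=1)
        log_p_s = F.log_softmax(sim_s / t_s, dim=1)
        return -(p_t * log_p_s).sum(dim=1).mean()

In practice, the queue would be refreshed first-in-first-out with teacher embeddings from recent batches, and only the student encoder is kept for downstream tasks.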

Original language: English (US)
State: Published - 2021
Event: 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online
Duration: May 3, 2021 - May 7, 2021


Conference: 9th International Conference on Learning Representations, ICLR 2021
City: Virtual, Online

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language


