Literature Reading (10): AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
- ABSTRACT
- 1 INTRODUCTION
- 2 RELATED WORK
- 3 METHOD
  - 3.1 VISION TRANSFORMER (VIT)
  - 3.2 FINE-TUNING AND HIGHER RESOLUTION
- 4 EXPERIMENTS
  - 4.1 SETUP
  - 4.2 COMPARISON TO STATE OF THE ART
  - 4.3 PRE-TRAINING DATA REQUIREMENTS
  - 4.4 SCALING STUDY
  - 4.5 INSPECTING VISION TRANSFORMER