
Masked Autoencoders are Secretly Efficient Learners

category AI 2024. 11. 6. 02:22

CVPR 2024 Workshop (ECV) Accepted Paper

Link: https://openaccess.thecvf.com/content/CVPR2024W/ECV24/html/Wei_Masked_Autoencoders_are_Secretly_Efficient_Learners__CVPRW_2024_paper.html

 

Method

Reduce the number of decoder layers

  • To improve the efficiency of our model, we use a one-layer decoder as the default setting.
  • This design choice is motivated by the fact that using an eight-layer decoder, while providing only marginal improvements, consumes over 50% of the total FLOPs in the MAE model.
  • In practice, switching to a one-layer decoder is over 60% faster than the eight-layer decoder (a decoder sketch follows this list).
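To make the decoder-depth knob concrete, here is a minimal PyTorch sketch (not the authors' code) of an MAE-style decoder whose depth is a constructor argument; names such as TinyMAEDecoder and decoder_dim are illustrative, and positional embeddings / normalization are omitted for brevity.

```python
# Minimal sketch of an MAE-style decoder with configurable depth (assumed names/shapes).
import torch
import torch.nn as nn

class TinyMAEDecoder(nn.Module):
    def __init__(self, encoder_dim=768, decoder_dim=512, depth=1,
                 num_heads=16, patch_dim=16 * 16 * 3):
        super().__init__()
        self.proj = nn.Linear(encoder_dim, decoder_dim)            # embed visible encoder tokens
        self.mask_token = nn.Parameter(torch.zeros(1, 1, decoder_dim))
        block = nn.TransformerEncoderLayer(
            d_model=decoder_dim, nhead=num_heads,
            dim_feedforward=4 * decoder_dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=depth)  # depth=1 instead of 8
        self.head = nn.Linear(decoder_dim, patch_dim)               # predict raw pixels per patch

    def forward(self, visible_tokens, num_masked):
        x = self.proj(visible_tokens)
        mask = self.mask_token.expand(x.shape[0], num_masked, -1)
        x = torch.cat([x, mask], dim=1)                             # append mask tokens
        x = self.blocks(x)
        return self.head(x[:, -num_masked:])                        # reconstruct masked patches only

decoder = TinyMAEDecoder(depth=1)                 # one-layer decoder as in the paper's default
out = decoder(torch.randn(2, 20, 768), num_masked=180)  # 196 patches, ~90% masked
print(out.shape)                                  # torch.Size([2, 180, 768])
```

With depth=1 nearly all of the FLOPs that an eight-layer decoder stack would spend are removed, which is where the reported speedup comes from.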

Reduce the number of pre-training epochs

  • Reducing the pre-training epochs from 1600 to 100 and raising the masking ratio from 75% to 90% (a masking sketch follows this list)
    • further speeds up MAE training by 23×
    • however, the final performance is 3.2% worse than the original MAE model (80.4% vs 83.6%)
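As a rough illustration of the more aggressive masking, below is a minimal sketch of MAE-style per-sample random masking with the 90% ratio; the function name random_masking and the tensor shapes are assumptions, not taken from the paper's code.

```python
# Minimal sketch of per-sample random masking at a 90% ratio (assumed shapes/names).
import torch

def random_masking(tokens, mask_ratio=0.9):
    """Keep a random (1 - mask_ratio) subset of patch tokens per sample."""
    B, N, D = tokens.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                           # one random score per patch
    keep_idx = noise.argsort(dim=1)[:, :num_keep]      # lowest-scored patches are kept
    keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, D)
    return torch.gather(tokens, 1, keep_idx)           # (B, num_keep, D)

visible = random_masking(torch.randn(4, 196, 768), mask_ratio=0.9)
print(visible.shape)   # torch.Size([4, 19, 768]) -> only ~10% of patches reach the encoder
```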

Reduce the pre-training batch size

  • The conventional pre-training batch size is 4096.
  • Reducing the pre-training batch size from 4096 to 1024 improves performance by 1.2% (the usual learning-rate scaling is sketched below).
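The batch size mostly interacts with the learning rate: MAE-style recipes commonly scale the learning rate linearly with batch size (lr = base_lr × batch_size / 256). The sketch below only illustrates that common rule; the base_lr value is an assumption, not a number from the paper.

```python
# Minimal sketch of linear LR scaling with batch size (base_lr is an assumed value).
batch_size = 1024              # reduced from the conventional 4096
base_lr = 1.5e-4               # assumed per-256-samples base learning rate
lr = base_lr * batch_size / 256
print(f"effective lr for batch {batch_size}: {lr:.2e}")   # 6.00e-04, vs 2.40e-03 at batch 4096
```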

Layer-wise learning rate decay

  • The original MAE assumes that lower-level features are learned well during pre-training, so they need only small updates during fine-tuning.
  • In the reduced-cost MAE, however, this assumption can break down.
  • Accordingly, raising the LLRD rate improves fine-tuning performance (a parameter-group sketch follows this list).
    • At a 90% mask ratio, the gain is 1.2%.
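A minimal sketch of how LLRD is typically wired into the fine-tuning optimizer, assuming a ViT-like stack of blocks: layer i of L gets lr × decay^(L − i), so a decay value closer to 1.0 gives the lower layers larger updates. The helper name llrd_param_groups and the stand-in model are illustrative.

```python
# Minimal sketch of layer-wise learning rate decay (LLRD) parameter groups.
import torch
import torch.nn as nn

def llrd_param_groups(blocks, head, base_lr=1e-3, decay=0.85):
    """Earlier blocks get geometrically smaller learning rates; the head keeps base_lr."""
    groups = []
    num_layers = len(blocks)
    for i, block in enumerate(blocks):
        groups.append({"params": block.parameters(),
                       "lr": base_lr * decay ** (num_layers - i)})
    groups.append({"params": head.parameters(), "lr": base_lr})
    return groups

blocks = nn.ModuleList([nn.Linear(768, 768) for _ in range(12)])  # stand-in for 12 ViT blocks
head = nn.Linear(768, 1000)
optimizer = torch.optim.AdamW(llrd_param_groups(blocks, head, decay=0.85))
print([round(g["lr"], 5) for g in optimizer.param_groups])        # smallest LR at the lowest block
```

Raising the decay value (e.g. from 0.65 toward 0.85) shrinks the gap between the head's and the lowest block's learning rates, which is what lets the under-trained lower layers keep learning during fine-tuning.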

Optimal LLRD of different pre-train recipes

  • With the training recipe tuned to these optimal values, the gap from the original 8-layer-decoder MAE is only 0.1%.
  • Training, however, is about 50× faster.

Low-cost parameter searching

  • Final performance varies greatly depending on how the training recipe is written, so the recipe is searched at low cost (a minimal grid-search sketch follows).
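One cheap way to search such a recipe is a small grid over the sensitive knobs (e.g. masking ratio and LLRD decay) evaluated with shortened runs. The sketch below only illustrates that idea; short_pretrain_and_finetune is a hypothetical placeholder for one's own reduced-cost train-and-evaluate loop, not anything from the paper.

```python
# Minimal sketch of a low-cost recipe search over a small hyperparameter grid.
import itertools

def short_pretrain_and_finetune(mask_ratio, llrd_decay):
    # Hypothetical placeholder: run a shortened pre-train + fine-tune cycle
    # and return validation accuracy. The formula below is a dummy score.
    return 80.0 + 0.5 * llrd_decay - abs(mask_ratio - 0.9)

grid = itertools.product([0.75, 0.9], [0.65, 0.75, 0.85])   # (mask_ratio, llrd_decay)
best = max(grid, key=lambda cfg: short_pretrain_and_finetune(*cfg))
print(f"best recipe found: mask_ratio={best[0]}, llrd_decay={best[1]}")
```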

Experiment

