Rather than proposing a new methodology, the paper seems to have been recognized mainly for applying LoRA to remote sensing (RS) and conducting a broad set of experiments.
Summary
Background:
Foundation models for geospatial and satellite remote sensing applications are commonly trained on large optical RGB or multi-spectral datasets.
Limitations:
Pre-training focuses on optical data, although data from a wide variety of heterogeneous sensors are available in the remote sensing domain.
This leads to significant discrepancies between pre-training and downstream target data distributions for many important applications.
Fine-tuning large foundation models to bridge that gap incurs high computational cost and can be infeasible when target datasets are small.
Goal:
Address the question of how large, pre-trained foundational transformer models can be efficiently adapted to downstream remote sensing tasks involving different data modalities or limited dataset size.
Introduction or Motivation
Computer vision approaches for remote sensing data are highly fragmented into specialized sub-fields defined by the different modalities or applications of interest.
ex) RGB, NIR, hyperspectral data, or SAR
Existing models lack zero- or few-shot capabilities on modalities other than optical data, so large foundation models must be re-trained for datasets involving new modalities.
Expensive fine-tuning protocols have to be employed
This requires large amounts of labeled samples to adapt the model and comes with high computational cost.
Method
Scaled Low-Rank (SLR) adapters
Add a small number of parameters to a pre-trained foundation model to support new data modalities.
These additional parameters allow the model to adapt to the characteristics of the new data modality, while the pre-trained parameters are kept fixed.
This helps generalize remote sensing foundation models beyond their pre-training data modalities while fully leveraging their existing capabilities.
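The adapter idea above can be sketched as a frozen linear layer augmented with a scaled low-rank update, in the spirit of LoRA. This is a minimal NumPy illustration under my own assumptions; the dimensions, rank `r`, scale `s`, and initialization are illustrative and not the paper's exact SLR configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 16, 16, 4           # low rank r << d
W = rng.normal(size=(d_out, d_in))   # pre-trained weight, kept frozen

# Trainable adapter parameters: only A, B, and the scale s are updated.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))             # zero init: adapter starts as a no-op
s = 1.0                              # scaling factor on the low-rank update

def adapted_forward(x):
    """y = W x + s * B (A x): frozen backbone plus scaled low-rank update."""
    return W @ x + s * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0, the adapted model reproduces the frozen backbone exactly.
assert np.allclose(adapted_forward(x), W @ x)
```

Because only `A`, `B`, and `s` receive gradients, the number of trainable parameters is r*(d_in + d_out) + 1 instead of d_in*d_out, which is what makes adaptation to small target datasets computationally cheap.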