When experimenting with changes to a model, few errors are as hard to get to the bottom of as "gradient computation has been modified by an inplace operation". In some cases detach() was enough to fix it, but this time the error showed up while using nn.parallel.DistributedDataParallel:
```python
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], broadcast_buffers=True, find_unused_parameters=False)
```
With this configuration, changing the broadcast_buffers option from True to False resolved the error. The likely reason: with broadcast_buffers=True, DDP broadcasts every module buffer (e.g., BatchNorm's running_mean/running_var) from rank 0 at the start of each forward pass, overwriting them in place, so if any of those buffers end up in the autograd graph, the backward pass sees an in-place modification.
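For context, here is a minimal sketch of the working setup. The torchrun/LOCAL_RANK bootstrapping and the toy BatchNorm model are assumptions for illustration, not the original code; only the DDP arguments match the post.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn


def main():
    # torchrun sets LOCAL_RANK for each spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Toy model; BatchNorm2d holds the buffers (running_mean/running_var)
    # that broadcast_buffers=True would overwrite in place each forward pass.
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1),
        nn.BatchNorm2d(16),
        nn.ReLU(),
    ).cuda(local_rank)

    model = nn.parallel.DistributedDataParallel(
        model,
        device_ids=[local_rank],
        broadcast_buffers=False,  # the change that resolved the error
        find_unused_parameters=False,
    )

    x = torch.randn(8, 3, 32, 32, device=f"cuda:{local_rank}")
    loss = model(x).mean()
    loss.backward()  # backward completes without the in-place error

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launch with e.g. `torchrun --nproc_per_node=2 script.py`. Note that with broadcast_buffers=False each rank keeps its own buffer values, which is usually acceptable but worth knowing about if you rely on identical BatchNorm statistics across ranks.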
https://github.com/pytorch/pytorch/issues/62474
https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html
