Hi everyone,
I’m using the OWLViTForObjectDetection model and I want to run inference on the GPU. What I’m doing is something like this:
model = model.to(device='cuda')
with torch.no_grad():
    model.eval()
    data = data.to(device='cuda')
    # inference code
It seems that including torch.no_grad() is causing some of the model’s parameters not to be copied to GPU memory: I get an error saying that all tensors should be on the same device, but at least two different devices were found (cuda and cpu). If I remove torch.no_grad(), that error goes away, but then I get an out-of-memory error because all of the model’s activations are kept in GPU memory for gradient calculation.
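To narrow down which tensors are actually left on the CPU, a check along these lines might help. It is only a sketch: tensors_off_device is a hypothetical helper name (not part of PyTorch or Transformers), and it assumes the inputs are a dict-like mapping of names to tensors. It relies only on named_parameters(), named_buffers(), and each tensor's .device attribute, so it needs no imports of its own.

```python
def tensors_off_device(model, inputs=None, expected="cuda"):
    """Return descriptions of parameters, buffers, and inputs not on `expected`.

    `expected` is a device type string such as "cuda" or "cpu".
    """
    off = []
    # Check parameters and buffers separately so the report says which kind
    # of tensor was left behind.
    for kind, named in (("param", model.named_parameters()),
                        ("buffer", model.named_buffers())):
        for name, tensor in named:
            if tensor.device.type != expected:
                off.append(f"{kind}:{name} -> {tensor.device}")
    # Assumption: `inputs` is a dict-like mapping of names to tensors.
    # Values without a .device attribute (ints, strings, ...) are skipped.
    for name, value in (inputs or {}).items():
        device = getattr(value, "device", None)
        if device is not None and device.type != expected:
            off.append(f"input:{name} -> {device}")
    return off
```

Calling tensors_off_device(model, data) right before the forward pass should return an empty list if everything is on the GPU. Any entries it does return would show whether the stray tensor is a parameter, a buffer, or one of the inputs.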
This has never happened to me before with the various other models I’ve used, so I’m wondering whether it is specific to HuggingFace models. Has this occurred to anyone else? Are there any known workarounds?
Thank you!