"OS Error: No space left on device" when trying to load a trained model from S3

Hello all!

I have been stuck on this for weeks and am genuinely beyond confused. For some context, I was able to successfully finetune CodeLlama-7B and CodeLlama-13B on SageMaker using ml.g5.2xlarge and ml.g5.8xlarge instances and store the resulting models in my S3 bucket. I was then able to successfully deploy my CodeLlama-7B model to a SageMaker inference endpoint using the following code in my SageMaker Notebook Instance:

from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://...model.tar.gz",
    entry_point="inference.py",
    source_dir="scripts",
    ... # some versioning parameters
)

predictor = model.deploy(
    endpoint_name="CodeLlama-7B",
    instance_type="ml.g5.2xlarge",
    ...
)

In this code, model_data points to a file (model.tar.gz) containing my finetuned model, and inference.py is a script that holds the inference functions (model_fn(), predict_fn(), etc.). Everything works beautifully when I deploy my CodeLlama-7B. However, when I replace it with the S3 file containing CodeLlama-13B, I get the "OS Error: No space left on device" error. I have tried several things, all of which resulted in the same error:

  1. Scaling up the instance_type to a very powerful instance, such as ml.p4d.24xlarge (which is weird, because I’ve seen tutorials hosting Llama 2-70B on this instance).
  2. Adding a volume_size parameter to my model.deploy() call on other large instances, because ml.g5.* instances don’t support attaching extra volume storage.
  3. Using multiple GPUs and setting device_map='auto' when calling .from_pretrained().
  4. Setting the SM_NUM_GPUS variable.
  5. Scaling up my Notebook Instance.

Any pointers and guidance would be very much appreciated! @philschmid, just wanted to say I’ve been following a lot of your tutorials and they have been super helpful. Thank you so much for all the materials you’ve put out : )

Cheers!

The issue has been solved!

My solution was to download the model.tar.gz to my local machine, unpack it, add my custom inference script to it, and repack it. I then created the HuggingFaceModel and called .deploy() without passing the entry script and source directory (entry_point and source_dir).
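
The repacking step can be sketched roughly as follows. This is a minimal sketch, not the exact script used: it assumes the Hugging Face inference toolkit convention of picking up a custom script from code/inference.py inside model.tar.gz, and the function name and file paths are placeholders.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def repack_with_inference_script(src_tar, script_path, out_tar):
    """Unpack src_tar, add script_path as code/inference.py, and write out_tar.

    Hypothetical helper for illustration; out_tar must be outside the temp dir.
    """
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)

        # Unpack the trained model artifacts (weights, config, tokenizer files).
        with tarfile.open(src_tar, "r:gz") as tar:
            tar.extractall(workdir)

        # Assumed convention: the custom script lives at code/inference.py.
        code_dir = workdir / "code"
        code_dir.mkdir(exist_ok=True)
        shutil.copy(script_path, code_dir / "inference.py")

        # Re-archive with the files at the root of the tarball (no wrapping folder).
        with tarfile.open(out_tar, "w:gz") as tar:
            for item in sorted(workdir.iterdir()):
                tar.add(item, arcname=item.name)

    return out_tar
```

After uploading the repacked archive back to S3, the model would be created with model_data pointing at the new file and with entry_point and source_dir left out, so the SDK has no script to bundle into the archive at deploy time.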