RuntimeError with Mixed Precision during LoRA Fine-Tuning in LLAVA on Small GPU Machine

Hi everyone,

I’m facing an issue while fine-tuning the LLAVA model using LoRA on a machine with limited GPU resources. To accommodate the small GPU, I’ve been experimenting with 4-bit precision. However, I consistently encounter the following error:

RuntimeError: expected scalar type BFloat16 but found Float

This occurs specifically in the vision model, particularly during the LayerNorm operation in the forward pass.

Key Configuration:

  • Model: liuhaotian/llava-v1.6-vicuna-7b
  • Vision Tower: openai/clip-vit-large-patch14-336
  • LoRA: Enabled with lora_r=128, lora_alpha=256
  • Precision: 4-bit (bits=4)
  • Other Settings: bf16=True, gradient_checkpointing=True

Problem:

I’m running into a data type mismatch where some layers (e.g., LayerNorm) expect BFloat16, but are instead using Float32, which triggers the error. When I inspect the model, I find a mix of data types across the layers:

  • 166 layers in float32
  • 744 layers in bfloat16
  • 369 layers in uint8
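
For reference, a breakdown like the one above can be produced with a short helper. This is only a minimal sketch: the function names are made up for illustration, it assumes the model is a regular PyTorch module exposing named_parameters(), and it counts parameter tensors rather than layers, so the numbers may not match the ones listed exactly.

```python
from collections import Counter


def count_param_dtypes(model):
    """Return a Counter mapping dtype name -> number of parameter tensors."""
    counts = Counter()
    for _name, param in model.named_parameters():
        counts[str(param.dtype)] += 1
    return counts


def names_with_dtype(model, dtype_name, prefix=""):
    """List parameter names starting with `prefix` whose dtype string matches."""
    return [
        name
        for name, param in model.named_parameters()
        if name.startswith(prefix) and str(param.dtype) == dtype_name
    ]


# Usage, assuming `model` is the already loaded LLaVA model:
#   print(count_param_dtypes(model))
#   print(names_with_dtype(model, "torch.float32"))
```

Printing the float32 names should show whether the leftover float32 tensors sit in the vision tower, in the norm layers, or somewhere else.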

My Situation:

I’m trying to modify LLAVA for my own use case and need to run it in a “debug mode” to test and tweak the code. Since I have limited GPU resources, I’m using low precision (4-bit) to make debugging feasible. However, this data type mismatch is proving to be a roadblock.

My Questions:

  • How can I debug or fine-tune LLAVA with LoRA on a small GPU without running into these precision-related errors?
  • Should I be manually converting specific layers to avoid the mismatch between bfloat16 and float32?
  • Is there a general approach to running LoRA fine-tuning in a lightweight “debug mode” for code experimentation without worrying about outputs or precision mismatches?

Any guidance or suggestions would be greatly appreciated!

Thanks in advance!

It looks like either torch’s autocast is misbehaving or there is a CUDA version mismatch; those are the most common causes.
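
If it turns out to be a plain dtype mismatch, one thing that might be worth trying is casting the leftover float32 parameters in the vision tower to bfloat16 before training starts. The helper below is a rough sketch, not something taken from the LLaVA code: the function name is hypothetical, and the "vision_tower" substring is an assumption about how the parameters are named, so check the actual names first. Whether casting norm layers is appropriate also depends on the rest of the training setup.

```python
def cast_matching_params(model, source_dtype, target_dtype, name_filter=""):
    """Cast parameters of `source_dtype` whose name contains `name_filter`.

    Returns the names of the parameters that were cast.
    """
    changed = []
    for name, param in model.named_parameters():
        if name_filter in name and param.dtype == source_dtype:
            # Swap the underlying tensor; the Parameter object stays the same.
            param.data = param.data.to(target_dtype)
            changed.append(name)
    return changed


# Usage (assumes `import torch` and an already loaded `model`;
# "vision_tower" is an assumed name fragment, not a confirmed one):
#   cast_matching_params(model, torch.float32, torch.bfloat16, "vision_tower")
```

The image tensor passed into the vision tower may need the same dtype as its weights, so it is worth checking that as well.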