Tokenizer deprecation error in ORPO

Dear all,
I was training an LLM using ORPO based on the guides by Maxime Labonne.

Unfortunately, I encountered the following error when running the code below:
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.

I tried using processing_class instead, as suggested by Gemini/ChatGPT, but that did not resolve the problem. Does anyone have an idea how to fix this?

Kind regards,
Ben

from trl import ORPOConfig, ORPOTrainer

orpo_args = ORPOConfig(
    learning_rate=8e-6,
    beta=0.1,
    lr_scheduler_type="linear",
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    # Ideally train 3-5 epochs
    num_train_epochs=1,
    eval_strategy="steps",  # called evaluation_strategy in older transformers releases
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    report_to="wandb",
    output_dir="./results/",
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model(new_model)
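Which keyword works depends on the installed TRL release: recent versions of ORPOTrainer take processing_class, while older ones take tokenizer. Below is a minimal sketch for a script that has to run against either, assuming the constructor signature reflects what the trainer accepts. tokenizer_kwargs is a hypothetical helper, not part of TRL. It only avoids passing an unsupported keyword; if an older TRL release is itself incompatible with a newer transformers, this does not fix that.

```python
import inspect
from typing import Any, Dict


def tokenizer_kwargs(trainer_cls: type, tokenizer: Any) -> Dict[str, Any]:
    """Hypothetical helper: pick the tokenizer keyword a trainer class accepts.

    Newer trainers expect `processing_class`, older ones `tokenizer`;
    the constructor signature tells us which one this class takes.
    """
    params = inspect.signature(trainer_cls.__init__).parameters
    if "processing_class" in params:
        return {"processing_class": tokenizer}
    if "tokenizer" in params:
        return {"tokenizer": tokenizer}
    raise TypeError(
        f"{trainer_cls.__name__} accepts neither processing_class nor tokenizer"
    )


# Usage with the objects from the post (model, orpo_args, dataset, peft_config, tokenizer):
# from trl import ORPOTrainer
# trainer = ORPOTrainer(
#     model=model,
#     args=orpo_args,
#     train_dataset=dataset["train"],
#     eval_dataset=dataset["test"],
#     peft_config=peft_config,
#     **tokenizer_kwargs(ORPOTrainer, tokenizer),
# )
```

Checking the signature, rather than the package version, keeps the sketch independent of exactly which release introduced the rename.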

It appears that an issue was opened on GitHub and has already been fixed. The fix is dated yesterday, though, so I'm not sure whether it has made it into a release yet.

Hi John,
thanks for pointing this out. Apparently the fix hasn't been released yet, so I'll probably keep trying over the next few days.

In a local environment, downgrading transformers as follows should work around the problem on its own, but it would be best to wait until the fix lands in the latest version.

pip uninstall transformers
pip install transformers==4.45.2
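If you go with the pinned version, a small check at the start of the training script can flag an environment that has been upgraded again, for example by a later pip install. This is a sketch, assuming (based on this thread) that 4.45.2 is the newest transformers release that works with the trainer code above. parse_release, is_known_good and check_transformers_pin are illustrative names, not transformers APIs. It uses only the standard library, so the version parsing is deliberately simple and ignores pre-release ordering.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Return the leading numeric release parts, e.g. '4.46.0.dev0' -> (4, 46, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric segment such as 'dev0'.
            break
        parts.append(int(digits))
    return tuple(parts)


def is_known_good(installed: str, max_ok: str = "4.45.2") -> bool:
    """True if the installed version is not newer than the pinned workaround version."""
    return parse_release(installed) <= parse_release(max_ok)


def check_transformers_pin(max_ok: str = "4.45.2") -> None:
    """Warn (without stopping) when the installed transformers is newer than max_ok."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        warnings.warn("transformers is not installed")
        return
    if not is_known_good(installed, max_ok):
        warnings.warn(
            f"transformers {installed} is newer than {max_ok}; ORPOTrainer may hit "
            f"the Trainer.tokenizer deprecation error. "
            f"Workaround: pip install transformers=={max_ok}"
        )
```

Call check_transformers_pin() before building the ORPOTrainer. It only emits a warning, so training still proceeds once the fix is released and you move to a newer version.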

Dear John,
thanks for helping a newbie! That literally fixed all my problems, and stupid me didn't think of just using the previous transformers version.

It’s not a solution, it’s a workaround, but, well, better to have it work than not to have it work!