How to run validation on multiple evaluation datasets simultaneously during Qwen2.5-VL-7B-Instruct fine-tuning?

I’m trying to fine-tune Qwen2.5-VL-7B-Instruct and want to use two evaluation datasets (eval_dataset_A, eval_dataset_B) to compute validation loss during training.
I referred to the official fine-tuning parameters here:
https://github.com/QwenLM/Qwen2.5-VL/tree/main/qwen-vl-finetune

I added the following parameters:

--eval_dataset_use eval_dataset_A,eval_dataset_B
--eval_strategy steps
--eval_steps 2

My goal: compute the validation loss on both eval_dataset_A and eval_dataset_B every eval_steps training steps.

In qwen-vl-finetune/qwenvl/train/trainer.py, there is the following code:

if data_args.data_packing:
    data_module = make_supervised_data_module_packed(tokenizer=tokenizer, data_args=data_args)
else:
    data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)

trainer = Trainer(
    model=model, processing_class=tokenizer, args=training_args, **data_module
)

How should I modify this so that Trainer evaluates on two different evaluation datasets?

If anyone has done something similar, could you share example code?

It seems that relatively recent versions of Trainer natively evaluate each dataset when eval_dataset is a dict, and prefix each metric name with the dict key, like this:

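eval_datasets = {"A": eval_dataset_A, "B": eval_dataset_B}
...
# data_module may already carry an "eval_dataset" entry; drop it so the keyword is not passed twice
data_module.pop("eval_dataset", None)
trainer = Trainer(
    model=model, processing_class=tokenizer, args=training_args, eval_dataset=eval_datasets, **data_module
)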
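
To make the metric naming concrete, here is a minimal pure-Python sketch. It assumes the prefix has the form eval_<key>, which is my understanding of how Trainer builds it; please check the logs of your installed version. The helper prefixed_metrics is hypothetical (it is not part of transformers), and the loss values are made up purely to show the key names.

```python
def prefixed_metrics(per_dataset_metrics, base_prefix="eval"):
    """Flatten {dataset_key: {metric: value}} into {"<prefix>_<key>_<metric>": value}."""
    flat = {}
    for key, metrics in per_dataset_metrics.items():
        for name, value in metrics.items():
            flat[f"{base_prefix}_{key}_{name}"] = value
    return flat


# Made-up numbers, only to illustrate the resulting key names.
example = prefixed_metrics({"A": {"loss": 0.5}, "B": {"loss": 0.75}})
print(sorted(example))
```

If the keys really do come out this way, a setting such as metric_for_best_model would presumably have to name one specific dataset (for example eval_A_loss) rather than plain eval_loss.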

Related: "Train with multiple eval datasets raises an Exception"

Thanks to your advice, I was able to create the following code. I really appreciate your help.

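import copy  # needed for copy.deepcopy below; belongs with the other imports at the top of trainer.py

if data_args.data_packing: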
    data_module = make_supervised_data_module_packed(tokenizer=tokenizer, data_args=data_args)
else:
    data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)

if training_args.do_eval:
    if data_args.eval_dataset_use is None:
        raise ValueError("do_eval is True but eval_dataset_use is not set")
    
    eval_datasets = {}
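    # Tolerate spaces and stray commas in the comma-separated list
    eval_dataset_names = [n.strip() for n in data_args.eval_dataset_use.split(",") if n.strip()]
    for name in eval_dataset_names:
        # Reuse the training data settings, but point this copy at a single eval dataset
        eval_data_args = copy.deepcopy(data_args)
        eval_data_args.dataset_use = name
        eval_datasets[name] = LazySupervisedDataset(tokenizer=tokenizer, data_args=eval_data_args)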
else:
    eval_datasets = None

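# data_module may already contain an "eval_dataset" key; remove it so that the
# explicit eval_dataset=... below does not collide with **data_module.
data_module.pop("eval_dataset", None)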

trainer = Trainer(
    model=model,
    processing_class=tokenizer,  # newer transformers releases deprecate the tokenizer= argument
    args=training_args,
    eval_dataset=eval_datasets,
    **data_module,
)
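
In case it helps anyone adapting this, the name parsing and dict construction can be pulled into two small helpers and tested without loading the model. This is only a sketch: parse_dataset_names and build_eval_datasets are hypothetical names of mine, not functions from the Qwen repo, and the lambda at the bottom is a stand-in for the deepcopy + LazySupervisedDataset construction shown above.

```python
def parse_dataset_names(spec):
    """Split a comma-separated spec into unique, non-empty names, keeping order."""
    names = []
    for raw in spec.split(","):
        name = raw.strip()
        if name and name not in names:
            names.append(name)
    if not names:
        raise ValueError(f"no dataset names found in {spec!r}")
    return names


def build_eval_datasets(spec, make_dataset):
    """Map each parsed name to whatever make_dataset(name) returns."""
    return {name: make_dataset(name) for name in parse_dataset_names(spec)}


# Stand-in factory so the sketch runs on its own; in trainer.py it would
# build a real dataset object for the given name instead of a string.
demo = build_eval_datasets(" eval_dataset_A, eval_dataset_B ,", lambda name: f"<dataset {name}>")
print(demo)
```

Raising on an empty list means a typo such as --eval_dataset_use "," fails at startup rather than silently training with no evaluation.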