How to successfully ONNX pretrained models

MH-BS · January 26, 2025, 11:53am

Hi!

I was looking for a way to use multithreading (How to use multithreading on a CPU - #6 by MH-BS) and was directed towards ONNX. As I understand it, ONNX converts existing models so that they can then use multiple threads on my CPU. Unfortunately, I always run into problems. I know we cannot solve all of them here, but I thought I would share two or three of them; maybe I can learn from your advice what I am doing wrong.
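For context on the multithreading goal: in ONNX Runtime the CPU thread count is set on the session rather than in the exported file. The following is only a minimal sketch, assuming `onnxruntime` is installed; the helper names `pick_thread_count` and `make_cpu_session` are hypothetical and not part of any library.

```python
import os


def pick_thread_count(cpu_count=None, reserve=1):
    """Return a thread count for ONNX Runtime, keeping `reserve` cores free."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count - reserve)


def make_cpu_session(model_path, threads):
    """Open a CPU session that may use up to `threads` threads inside one operator."""
    import onnxruntime as ort  # imported lazily so the helper above works without it

    options = ort.SessionOptions()
    options.intra_op_num_threads = threads  # parallelism within a single operator
    return ort.InferenceSession(
        model_path, sess_options=options, providers=["CPUExecutionProvider"]
    )
```

A call such as `make_cpu_session("test.onnx", pick_thread_count())` would then open the exported file with all but one core available; whether that actually speeds up a given model has to be measured.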

MH-BS · January 26, 2025, 11:56am

The first one is about llava-llama-3-8b-v1_1. The script looks like this:

cat conv.py 
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import onnx
from onnxruntime import InferenceSession
from PIL import Image

def convert_to_onnx(model, tokenizer, model_name='test.onnx'):
    dummy_input = tokenizer("Inserting some dummy text here since I don't have your actual input.",
                            return_tensors='pt').input_ids
    torch.onnx.export(
        model,                   # model being run
        dummy_input,             # model input (or a tuple for multiple inputs)
        model_name,              # where to save the model (can be a file or file-like object)
        export_params=True,      # store the trained parameter weights inside the model file
        opset_version=11,        # the ONNX opset version to export the model to
        do_constant_folding=True # whether to execute constant folding for optimization
    )
    print(f"Model converted to {model_name}.")

def process_model(onnx_model, image, question):
    print("\nQuestion: " + question)
    # Placeholder for actual InferenceSession and processing
    session = InferenceSession(onnx_model)
    input_name = session.get_inputs()[0].name
    #image_flags = torch.Tensor([0, 1, 2])  # Example initialization
    #image_flags = image_flags.squeeze(-1)
    # Placeholder: Convert your image into the required input format
    # result = session.run(None, {input_name: image}) [ adjust this according to your input type ]
    # print(result)

path = "xtuner/llava-llama-3-8b-v1_1"
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

convert_to_onnx(model, tokenizer)
image = Image.open("tmp2.jpg")
image.show()

# The parameter was previously named "frage" (German for "question"),
# which left the name passed in the call below undefined.
question = "Please describe the image."
process_model("test.onnx", image, question)

The output looks like this:
[…]
model-00009-of-00009.safetensors: 100%|██████████| 1.05G/1.05G [02:28<00:00, 7.08MB/s]
Downloading shards: 100%|██████████| 9/9 [37:50<00:00, 252.25s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [00:31<00:00, 3.50s/it]
generation_config.json: 100%|██████████| 126/126 [00:00<00:00, 715kB/s]
tokenizer_config.json: 100%|██████████| 51.0k/51.0k [00:00<00:00, 909kB/s]
tokenizer.json: 100%|██████████| 9.08M/9.08M [00:01<00:00, 6.14MB/s]
special_tokens_map.json: 100%|██████████| 301/301 [00:00<00:00, 1.78MB/s]
/home/martin/esn_vqa/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py:726: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if sequence_length != 1:
Traceback (most recent call last):
  File "/home/martin/esn_vqa/conv_moon.py", line 35, in <module>
    convert_to_onnx(model, tokenizer)
  File "/home/martin/esn_vqa/conv_moon.py", line 10, in convert_to_onnx
    torch.onnx.export(
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/__init__.py", line 375, in export
    export(
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 502, in export
    _export(
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 997, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 904, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/jit/_trace.py", line 1500, in _get_trace_graph
    outs = ONNXTracedModule(
           ^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/jit/_trace.py", line 139, in forward
    graph, out = torch._C._create_graph_by_tracing(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/jit/_trace.py", line 133, in wrapper
    out_vars, _ = _flatten(outs)
                  ^^^^^^^^^^^^^^
RuntimeError: Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted, but their usage is not recommended. Here, received an input of unsupported type: DynamicCache
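The last line of the traceback suggests that the traced forward pass returned a `DynamicCache` (the key/value cache object from transformers), which the tracer cannot flatten into tensors. One possible workaround, shown here only as an untested sketch, is to wrap the model so that tracing sees a single logits tensor. The names `make_logits_only` and `export_kwargs` are hypothetical, and the default opset of 14 is an assumption rather than a value taken from the script above.

```python
def make_logits_only(model):
    """Wrap a causal LM so that tracing sees a single tensor output."""
    import torch  # lazy import: only needed when a real model is wrapped

    class LogitsOnly(torch.nn.Module):
        def __init__(self, inner):
            super().__init__()
            self.inner = inner

        def forward(self, input_ids):
            # use_cache=False asks the model not to return a KV cache object;
            # return_dict=False yields a plain tuple whose first item is the logits.
            outputs = self.inner(input_ids=input_ids, use_cache=False, return_dict=False)
            return outputs[0]

    return LogitsOnly(model).eval()


def export_kwargs(opset_version=14):
    """Keyword arguments for torch.onnx.export with named, resizable axes."""
    return {
        "input_names": ["input_ids"],
        "output_names": ["logits"],
        "dynamic_axes": {
            "input_ids": {0: "batch", 1: "sequence"},
            "logits": {0: "batch", 1: "sequence"},
        },
        "opset_version": opset_version,  # assumed default, adjust as needed
        "do_constant_folding": True,
    }
```

With these helpers, the export call in the script would become `torch.onnx.export(make_logits_only(model), (dummy_input,), model_name, **export_kwargs())`. This only covers the text input; it says nothing about how the image would be fed to the model.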

John6666 · January 26, 2025, 11:56am

If you have any questions about ONNX, it’s a good idea to ask them on the ONNX Community. I might be able to help you with some basic mistakes…

MH-BS · January 26, 2025, 11:58am

Ah okay, do you think I should not continue here? I was about to post my experience with the next model, but of course I can also switch to the ONNX community.

John6666 · January 26, 2025, 12:01pm

No, it’s genuinely helpful that you’re sharing your experiences with everyone.
I was only pointing it out. By the way, from what I've seen so far, many errors occur when the ONNX library is outdated. Is this happening with the latest version of ONNX?

Edit:
Also, I think exporting with Optimum is probably the recommended method now.

MH-BS · January 26, 2025, 12:05pm

Thank you, I will take a look at the optimum approach.

MH-BS · January 26, 2025, 12:22pm

It may be that the Optimum approach is not suitable. I just got this response:

ValueError: Asked to export a llama model for the task image-text-to-text (auto-detected), but the Optimum ONNX exporter only supports the tasks feature-extraction, feature-extraction-with-past, text-generation, text-generation-with-past, text-classification for llama. Please use a supported task. Please open an issue at https://github.com/huggingface/optimum/issues if you would like the task image-text-to-text to be supported in the ONNX export for llama.

And since all the models I am looking at right now are from the VQA/VL area, I guess that Optimum is not suited for that task.

John6666 · January 26, 2025, 1:03pm

It seems that in some cases, this can be avoided by directly specifying the task. If that doesn’t work, let’s look for something other than ONNX and Optimum.

github.com/huggingface/optimum

OPTIMUM Onnx Exporter for openai/clip-vit-large-patch14 model

opened 10:24AM - 12 Jul 24 UTC

antje2233

onnx

### Feature request

I wonder if the task text-classification can be supported in the ONNX export for clip? I want to use the openai/clip-vit-large-patch14 model for zero-shot image classification (classification of images without pretraining, based on given candidate labels), but I get the following error:

ValueError Traceback (most recent call last)
File /home/danne00a/ZablageBlazeG/ZeroShotClassification/zeroshotclassifier.py:2
      1 #%%
----> 2 ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)

File ~/mambaforge/envs/ZeroShot_Mamba_env/lib/python3.11/site-packages/optimum/onnxruntime/modeling_ort.py:669, in ORTModel.from_pretrained(cls, model_id, export, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, provider, session_options, provider_options, use_io_binding, **kwargs)
    620 @classmethod
    621 @add_start_docstrings(FROM_PRETRAINED_START_DOCSTRING)
    622 def from_pretrained(
   (...)
    636 **kwargs,
    637 ):
    638 """
    639 provider (`str`, defaults to `"CPUExecutionProvider"`):
    640 ONNX Runtime provider to use for loading the model. See https://onnxruntime.ai/docs/execution-providers/ for
   (...)
    667 `ORTModel`: The loaded ORTModel model.
    668 """
--> 669 return super().from_pretrained(
    670 model_id,
    671 export=export,
    672 force_download=force_download,
    673 use_auth_token=use_auth_token,
    674 cache_dir=cache_dir,
...
File ~/mambaforge/envs/ZeroShot_Mamba_env/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py
    274 )
    276 # TODO: Fix in Transformers so that SdpaAttention class can be exported to ONNX. `attn_implementation` is introduced in Transformers 4.36.
    277 if model_type in SDPA_ARCHS_ONNX_EXPORT_NOT_SUPPORTED and _transformers_version >= version.parse("4.35.99"):

ValueError: Asked to export a clip model for the task text-classification, but the Optimum ONNX exporter only supports the tasks feature-extraction, zero-shot-image-classification for clip. Please use a supported task. Please open an issue at https://github.com/huggingface/optimum/issues if you would like the task text-classification to be supported in the ONNX export for clip.

### Motivation

I'm struggling with the size of the openai/clip-vit-large-patch14 model, thus I want to convert it to Optimum ONNX!

### Your contribution

No ideas so far.
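To illustrate what "directly specifying the task" could look like for the llama case discussed in this thread, here is a small sketch that builds an `optimum-cli export onnx` command with an explicit `--task`. The helper `build_export_command` and the output directory `llava_onnx` are hypothetical; the task list is copied from the ValueError quoted earlier in the thread. Whether the resulting export is usable for image input is not established here; at best it would cover the text side of the model.

```python
import shlex

# Tasks listed for llama in the ValueError quoted earlier in the thread.
LLAMA_TASKS = (
    "feature-extraction",
    "feature-extraction-with-past",
    "text-generation",
    "text-generation-with-past",
    "text-classification",
)


def build_export_command(model_id, output_dir, task="text-generation-with-past"):
    """Return the optimum-cli argument list for an export with an explicit task."""
    if task not in LLAMA_TASKS:
        raise ValueError(f"unsupported task for llama: {task}")
    return ["optimum-cli", "export", "onnx", "--model", model_id, "--task", task, output_dir]


# Hypothetical output directory; the model id is the one used in the thread.
command = build_export_command("xtuner/llava-llama-3-8b-v1_1", "llava_onnx")
print(shlex.join(command))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(command, check=True)`.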
