This guide provides detailed instructions on how to export models for ExecuTorch and execute them on the OpenVINO backend. The examples demonstrate how to export a model, load it, prepare input tensors, run inference, and save the output results.
Below is the layout of the examples/openvino directory, which includes the necessary files for the example applications:
```
examples/openvino
├── README.md                     # Documentation for examples (this file)
├── aot_optimize_and_infer.py     # Example script to export and execute models
├── llama
│   ├── README.md                 # Documentation for the Llama example
│   └── llama3_2_ov_4wo.yaml      # Configuration file for exporting Llama3.2 with the OpenVINO backend
├── qwen2_5
│   ├── README.md                 # Documentation for the Qwen2.5 example
│   └── qwen2_5_1_5b_ov_4wo.yaml  # Configuration file for exporting Qwen2.5-1.5B with the OpenVINO backend
└── stable_diffusion
    ├── README.md                 # Documentation for the Stable Diffusion example
    ├── export_lcm.py             # Script for exporting models
    ├── openvino_lcm.py           # Script for inference execution
    └── requirements.txt          # Requirements file for the Stable Diffusion example
```
Follow the Prerequisites and Setup instructions in `backends/openvino/README.md` to set up the OpenVINO backend.
The Python script `aot_optimize_and_infer.py` allows users to export deep learning models from various model suites (TIMM, Torchvision, Hugging Face) to the OpenVINO backend using ExecuTorch. Users can dynamically specify the model, input shape, and target device.
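Under the hood, the script follows the standard ExecuTorch ahead-of-time (AOT) flow. The sketch below is a minimal illustration of that flow, not a copy of the script; in particular, the `OpenvinoPartitioner` constructor argument is an assumption (a list of `CompileSpec` entries derived from `--device`), so check `backends/openvino` for the exact signature:

```python
import timm
import torch

from executorch.backends.openvino.partitioner import OpenvinoPartitioner
from executorch.exir import to_edge_transform_and_lower
from executorch.exir.backend.backend_details import CompileSpec

# Instantiate a model from one of the supported suites (TIMM here).
model = timm.create_model("vgg16", pretrained=True).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Capture the model graph with torch.export.
exported_program = torch.export.export(model, example_inputs)

# Lower the captured graph to the OpenVINO backend. The CompileSpec
# selecting the target device is an assumption based on the --device flag.
edge_program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[OpenvinoPartitioner([CompileSpec("device", b"CPU")])],
)

# Serialize to a .pte file that executor_runner can load.
executorch_program = edge_program.to_executorch()
with open("vgg16.pte", "wb") as f:
    f.write(executorch_program.buffer)
```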
The script supports the following command-line arguments:

- `--suite` (required): Specifies the model suite to use. Supported values:
  - `timm` (e.g., VGG16, ResNet50)
  - `torchvision` (e.g., resnet18, mobilenet_v2)
  - `huggingface` (e.g., bert-base-uncased). NB: Quantization and validation are not supported for this suite yet.
- `--model` (required): Name of the model to export. Examples:
  - For `timm`: `vgg16`, `resnet50`
  - For `torchvision`: `resnet18`, `mobilenet_v2`
  - For `huggingface`: `bert-base-uncased`, `distilbert-base-uncased`
- `--input_shape` (optional): Input shape for the model, provided as a list or tuple. Examples: `[1, 3, 224, 224]` (Zsh users: wrap in quotes) or `(1, 3, 224, 224)`.
- `--export` (optional): Save the exported model as a `.pte` file.
- `--model_file_name` (optional): Specify a custom file name to save the exported model.
- `--batch_size`: Batch size for validation. Default is `1`. The dataset length must be evenly divisible by the batch size.
- `--quantize` (optional): Enable model quantization. The `--dataset` argument is required for quantization. The `huggingface` suite is not supported yet.
- `--validate` (optional): Enable model validation. The `--dataset` argument is required for validation. The `huggingface` suite is not supported yet.
- `--dataset` (optional): Path to the ImageNet-like calibration dataset.
- `--infer` (optional): Execute inference with the compiled model and report the average inference time.
- `--num_iter` (optional): Number of iterations to execute inference. Default is `1`.
- `--warmup_iter` (optional): Number of warmup iterations to execute before timing begins. Default is `0`.
- `--input_tensor_path` (optional): Path to a raw tensor file to be used as input for inference. If this argument is not provided, a random input tensor is generated.
- `--output_tensor_path` (optional): Path to a raw tensor file in which to save the inference output.
- `--device` (optional): Target device for the compiled model. Default is `CPU`. Examples: `CPU`, `GPU`.
Example commands:

```bash
# Export a TIMM VGG16 model for CPU
python aot_optimize_and_infer.py --export --suite timm --model vgg16 --input_shape "[1, 3, 224, 224]" --device CPU

# Export a Torchvision ResNet50 model for GPU
python aot_optimize_and_infer.py --export --suite torchvision --model resnet50 --input_shape "(1, 3, 256, 256)" --device GPU

# Export a Hugging Face BERT model for CPU
python aot_optimize_and_infer.py --export --suite huggingface --model bert-base-uncased --input_shape "(1, 512)" --device CPU

# Export and validate a TIMM VGG16 model
python aot_optimize_and_infer.py --export --suite timm --model vgg16 --input_shape [1, 3, 224, 224] --device CPU --validate --dataset /path/to/dataset

# Export, quantize, and validate a TIMM VGG16 model
python aot_optimize_and_infer.py --export --suite timm --model vgg16 --input_shape [1, 3, 224, 224] --device CPU --validate --dataset /path/to/dataset --quantize

# Benchmark inference with a Torchvision Inception V3 model
python aot_optimize_and_infer.py --suite torchvision --model inception_v3 --infer --warmup_iter 10 --num_iter 100 --input_shape "(1, 3, 256, 256)" --device CPU
```
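After exporting, the resulting `.pte` file can be sanity-checked directly from Python before moving to the native runner. The following is a minimal sketch; the `Runtime` API shown is an assumption based on recent ExecuTorch releases, so verify it against your installed version:

```python
import torch
from executorch.runtime import Runtime

# Load the exported program and its "forward" method.
runtime = Runtime.get()
program = runtime.load_program("vgg16.pte")
method = program.load_method("forward")

# Run one inference with a random input matching the export shape.
# execute() is expected to return a list of output tensors.
outputs = method.execute([torch.randn(1, 3, 224, 224)])
print(outputs[0].shape)
```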
- Input Shape in Zsh: If you are using Zsh, wrap `--input_shape` in quotes or use a tuple: `--input_shape '[1, 3, 224, 224]'` or `--input_shape "(1, 3, 224, 224)"`.
- Model Compatibility: Ensure the specified model name exists in the selected suite. Use the corresponding library's documentation to verify model availability.
- Output File: The exported model will be saved as `<MODEL_NAME>.pte` in the current directory.
- Dependencies:
  - Python 3.8+
  - PyTorch
  - ExecuTorch
  - TIMM (`pip install timm`)
  - Torchvision
  - Transformers (`pip install transformers`)
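If the optional suite libraries are not already installed, they can be added in one step (package names as published on PyPI):

```bash
pip install timm torchvision transformers
```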
- Model Not Found: If the script raises an error such as `ValueError: Model <MODEL_NAME> not found`, verify that the model name is correct for the chosen suite (see the snippet after this list for ways to enumerate available models).
- Unsupported Input Shape: Ensure `--input_shape` is provided as a valid list or tuple.
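For the Model Not Found case, the suite libraries expose their own model registries, which can be queried to find valid names. A short sketch using `timm.list_models` and `torchvision.models.list_models` (available in torchvision >= 0.14); the exact output depends on your installed versions:

```python
import timm
import torchvision.models

# List TIMM models whose names match a wildcard pattern.
print(timm.list_models("vgg*"))

# List all model builders registered with Torchvision.
print(torchvision.models.list_models())
```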
Build the backend libraries and executor runner by executing the script below in the `<executorch_root>/backends/openvino/scripts` folder:

```bash
./openvino_build.sh
```

The executable is saved in `<executorch_root>/cmake-out/`.
Now, run the example using the executable generated in the step above. The executable requires a model file (the `.pte` file generated in the AOT step) and, optionally, the number of inference executions.
```bash
cd ../../cmake-out
./executor_runner \
    --model_path=<path_to_model> \
    --num_executions=<iterations>
```
- `--model_path`: (Required) Path to the model serialized in `.pte` format.
- `--num_executions`: (Optional) Number of times to run inference (default: 1).
Run inference with a given model for 10 iterations:
```bash
./executor_runner \
    --model_path=model.pte \
    --num_executions=10
```