Ready-to-use compressed LLMs can be found on the OpenVINO Hugging Face page. Each model card lists the NNCF parameters that were used to compress the model.
INT8 Post-Training Quantization (PTQ) results for public Vision, NLP, and GenAI models can be found on the OpenVINO Performance Benchmarks page. PTQ results for ONNX models are listed in the tables below.

| ONNX Model | Compression algorithm | Dataset | Accuracy (drop) % |
|---|---|---|---|
| DenseNet-121 | PTQ | ImageNet | 60.16 (0.8) |
| GoogleNet | PTQ | ImageNet | 66.36 (0.3) |
| MobileNet V2 | PTQ | ImageNet | 71.38 (0.49) |
| ResNet-50 | PTQ | ImageNet | 74.63 (0.21) |
| ShuffleNet | PTQ | ImageNet | 47.25 (0.18) |
| SqueezeNet V1.0 | PTQ | ImageNet | 54.3 (0.54) |
| VGG-16 | PTQ | ImageNet | 72.02 (0.0) |

| ONNX Model | Compression algorithm | Dataset | mAP (drop) % |
|---|---|---|---|
| SSD1200 | PTQ | COCO2017 | 20.17 (0.17) |
| Tiny-YOLOv2 | PTQ | VOC12 | 29.03 (0.23) |
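The tables above report accuracy after INT8 PTQ. Conceptually, PTQ observes the value range of weights and activations on a small calibration set, then maps floats to 8-bit integers via a scale and zero point. The sketch below illustrates that affine quantization arithmetic in plain Python; it is an illustrative simplification, not NNCF's actual implementation (function names are hypothetical).

```python
# Illustrative affine INT8 quantization (NOT NNCF's implementation).
def calibrate(values, qmin=-128, qmax=127):
    """Derive scale and zero point from the observed min/max range."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must include zero
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to a clamped INT8 value."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover the (approximate) float from an INT8 value."""
    return (q - zero_point) * scale
```

The "drop" column in the tables is the accuracy lost to exactly this rounding and clamping error, accumulated over all quantized layers.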