
# NNCF Compressed Model Zoo

Ready-to-use compressed LLMs can be found on the OpenVINO Hugging Face page. Each model card lists the NNCF parameters that were used to compress the model.

INT8 Post-Training Quantization (PTQ) results for public Vision, NLP, and GenAI models can be found on the OpenVINO Performance Benchmarks page. PTQ results for ONNX models are available in the ONNX section below.

## PyTorch

### PyTorch NLP (HuggingFace Transformers-powered models)

| PyTorch Model | Compression algorithm | Dataset | Accuracy (drop) % |
| --- | --- | --- | --- |
| BERT-base-cased | QAT: INT8 | CoNLL2003 | 99.18 (-0.01) |
| BERT-base-cased | QAT: INT8 | MRPC | 84.8 (-0.24) |
| BERT-base-chinese | QAT: INT8 | XNLI | 77.22 (0.46) |
| BERT-large (Whole Word Masking) | QAT: INT8 | SQuAD v1.1 | F1: 92.68 (0.53) |
| DistilBERT-base | QAT: INT8 | SST-2 | 90.3 (0.8) |
| GPT-2 | QAT: INT8 | WikiText-2 (raw) | perplexity: 20.9 (-1.17) |
| MobileBERT | QAT: INT8 | SQuAD v1.1 | F1: 89.4 (0.58) |
| RoBERTa-large | QAT: INT8 | MNLI | matched: 89.25 (1.35) |

## ONNX

### ONNX Classification

| ONNX Model | Compression algorithm | Dataset | Accuracy (drop) % |
| --- | --- | --- | --- |
| DenseNet-121 | PTQ | ImageNet | 60.16 (0.8) |
| GoogleNet | PTQ | ImageNet | 66.36 (0.3) |
| MobileNet V2 | PTQ | ImageNet | 71.38 (0.49) |
| ResNet-50 | PTQ | ImageNet | 74.63 (0.21) |
| ShuffleNet | PTQ | ImageNet | 47.25 (0.18) |
| SqueezeNet V1.0 | PTQ | ImageNet | 54.3 (0.54) |
| VGG-16 | PTQ | ImageNet | 72.02 (0.0) |

### ONNX Object Detection

| ONNX Model | Compression algorithm | Dataset | mAP (drop) % |
| --- | --- | --- | --- |
| SSD1200 | PTQ | COCO2017 | 20.17 (0.17) |
| Tiny-YOLOv2 | PTQ | VOC12 | 29.03 (0.23) |