Post-training Quantization:
- Breaking changes:
  - Renamed `nncf.CompressWeightsMode.CB4_F8E4M3` mode option to `nncf.CompressWeightsMode.CB4`.
  - Renamed
- General:
  - Added `nncf.prune` API function, which provides a unified interface for pruning algorithms. It is currently available for the PyTorch backend and supports Magnitude Pruning. More details about the new API can be found in the documentation.
  - Added `nncf.build_graph` API function for building `NNCFGraph` from a model. This API can be used to inspect and define the ignored scope.
  - Added documentation about using `nncf.IgnoredScope`.
  - Reworked `HWConfig`, which now uses a Python-style definition of the hardware configuration instead of JSON files.
  - Added
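As background on the new `nncf.prune` API above, which currently supports Magnitude Pruning: magnitude pruning zeroes the fraction of weights with the smallest absolute values. A minimal pure-Python sketch of the idea (illustrative only, not NNCF's implementation; `magnitude_prune` is a hypothetical helper):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |value|.

    Conceptual sketch of magnitude pruning, not the NNCF implementation.
    """
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * sparsity)
    # Magnitude at the sparsity cutoff; everything at or below it is pruned.
    threshold = flat[k - 1] if k > 0 else float("-inf")
    return [w if abs(w) > threshold else 0.0 for w in weights]

# Prune half the weights: the three smallest magnitudes become zero.
pruned = magnitude_prune([0.05, -0.8, 0.3, -0.01, 1.2, 0.2], sparsity=0.5)
```

See the NNCF documentation for the actual `nncf.prune` signature and options.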
- Features:
  - Added support for models containing MatMul operations with transposed activation inputs in the data-free Weight Compression and data-aware AWQ algorithms.
  - (OpenVINO) Introduced a new experimental compression data type, ADAPTIVE_CODEBOOK. This compression type calculates a unique codebook for each MatMul or block of identical MatMuls (for example, all down_proj layers could share the same codebook). This approach reduces quality degradation in the case of per-channel weight compression. See example.
  - (TorchFX) Introduced preview support for the new `compress_pt2e` API, enabling quantization of `torch.fx.GraphModule` models with the `OpenVINOQuantizer`. Users can now quantize their models in ExecuTorch for the OpenVINO backend via the nncf `compress_pt2e` API, employing Scale Estimation and AWQ.
  - (PyTorch) Added support for linear functions in the Fast Bias Correction algorithm to improve the accuracy of such models after quantization.
  - (OpenVINO) Added an activation profiler tool to collect and visualize tensor statistics.
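As background on the ADAPTIVE_CODEBOOK feature above: codebook compression stores, for each weight, an index into a small table of representative values, and the decompressed weight is the looked-up table entry. A minimal sketch of nearest-entry codebook quantization (illustrative only; NNCF computes a separate codebook per MatMul, and `quantize_to_codebook` is a hypothetical helper):

```python
def quantize_to_codebook(weights, codebook):
    """Map each weight to the index of its nearest codebook entry.

    Conceptual sketch: ADAPTIVE_CODEBOOK additionally derives the codebook
    itself per MatMul; here the codebook is simply given as input.
    """
    indexes = [
        min(range(len(codebook)), key=lambda i: abs(w - codebook[i]))
        for w in weights
    ]
    # Dequantization is a plain table lookup.
    dequantized = [codebook[i] for i in indexes]
    return indexes, dequantized

idx, deq = quantize_to_codebook([0.11, -0.52, 0.9], codebook=[-0.5, 0.0, 0.5, 1.0])
```

A per-MatMul codebook lets the representative values track that layer's weight distribution, which is why it helps in the per-channel compression case mentioned above.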
- Fixes:
  - (ONNX) Fixed the `compress_quantize_weights_transformation()` method by removing names of deleted initializers from graph inputs.
  - (ONNX) Fixed incorrect insertion of MatMulNBits nodes.
  - (ONNX) Fixed
- Improvements:
  - Added support for the compression of 3D weights in the AWQ, Scale Estimation, and GPTQ algorithms. Models with MoE (Mixture of Experts) layers, such as GPT-OSS-20B and Qwen3-30B-A3B, can now be compressed with data-aware methods.
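For context on the 3D weight support above: in MoE models, per-expert weight matrices are typically stacked along a leading dimension, giving a `[num_experts, out, in]` tensor. A hedged sketch of computing one symmetric quantization scale per expert slice (illustrative only; NNCF's actual grouping and data-aware algorithms are more fine-grained, and `per_expert_scales` is a hypothetical helper):

```python
def per_expert_scales(weights_3d, max_int=127):
    """Compute one symmetric int8 scale per expert for a 3D weight tensor.

    `weights_3d` is a nested list shaped [num_experts][out][in].
    Conceptual sketch only, not NNCF's actual compression scheme.
    """
    scales = []
    for expert in weights_3d:
        # Absolute maximum over this expert's 2D weight matrix.
        amax = max(abs(v) for row in expert for v in row)
        scales.append(amax / max_int)
    return scales

# Two experts, each with a 1x2 weight matrix.
scales = per_expert_scales([[[0.5, -1.27]], [[2.54, 0.1]]])
```

Treating each expert slice as its own 2D matrix is what lets data-aware methods such as AWQ operate on MoE weights without flattening the experts together.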
- Tutorials:
- Post-Training Quantization of YOLO26 OpenVINO Model
- Post-Training Optimization of Wan2.2 Model
- Post-Training Optimization of DeepSeek-OCR Model
- Post-Training Optimization of Z-Image-Turbo Model
- Post-Training Optimization of Qwen-Image Model
- Post-Training Optimization of Qwen3-TTS Model
- Post-Training Optimization of Qwen3-ASR Model
- Post-Training Optimization of Fun-ASR-Nano Model
- Post-Training Optimization of Fun-CosyVoice 3.0 Model
Deprecations/Removals:
- (TensorFlow) Removed support for the TensorFlow backend.
- (PyTorch) Removed the legacy `create_compressed_model` API for the PyTorch backend, which was previously marked as deprecated.
- (PyTorch) Removed legacy algorithms for PyTorch that were based on `NNCFNetwork`: NAS, Structural Pruning, AutoML, Knowledge Distillation, Mixed-Precision Quantization, and Movement Sparsity.
Requirements:
- Dropped `jsonschema`, `natsort`, and `pymoo` from dependencies as they are no longer required.
- Updated `numpy` to `>=1.24.0, <2.5.0`.
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@avolkov-intel @Shehrozkashif @ruro @mostafafaheem