Is your feature request related to a problem? Please describe.
Not a problem; this is an idea for improving TensorRT performance.
I have measured the effect of Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on multiple projects. The results are available here. According to these tests, both optimizations can improve performance in many cases across many kinds of applications: compilers and interpreters, static analyzers, databases, networking software, etc. Given this, I think optimizing TensorRT (its C++ part) with PGO and PLO would be a good idea.
Describe the solution you'd like
I suggest the following:
- Benchmark TensorRT with PGO. If the benchmarks show improvements, add a note to the documentation about the possible performance gains from building TensorRT with PGO.
- Provide an easier way to build TensorRT with PGO (e.g., an option in the build scripts). This would help end users and maintainers, since they could then optimize TensorRT for their own workloads.
- Optimize the pre-built TensorRT binaries with PGO.
Additional context
As an additional optimization step after PGO, I suggest Post-Link Optimization (PLO) with a tool such as LLVM BOLT. However, I think it is worth evaluating only after PGO has been integrated into TensorRT.
Here are several PGO-related links (more PGO-related materials are available at https://github.com/zamazan4ik/awesome-pgo/).
Examples of how PGO is integrated into other projects:
Examples of how PGO is described in other projects' documentation:
Regarding LLVM BOLT integration, I have the following examples: