# Performance best practices

Mobile and embedded devices have limited computational resources, so it is
important to keep your application resource efficient. We have compiled a list
of best practices and strategies that you can use to improve your TensorFlow
Lite model performance.

## Choose the best model for the task

Depending on the task, you will need to make a tradeoff between model complexity
and size. If your task requires high accuracy, then you may need a large and
complex model. For tasks that require less precision, it is better to use a
smaller model: smaller models not only use less disk space and memory, but are
also generally faster and more energy efficient. For example, the graphs below
show accuracy and latency tradeoffs for some common image classification models.

[MobileNets](https://arxiv.org/abs/1704.04861) are one example of models
optimized for mobile devices; they target mobile vision applications.
[TensorFlow Hub](https://tfhub.dev/s?deployment-format=lite) lists several other
models that have been optimized specifically for mobile and embedded devices.

You can retrain the listed models on your own dataset by using transfer
learning. Check out the transfer learning tutorials using TensorFlow Lite
[Model Maker](../models/modify/model_maker/).

## Profile your model

Once you have selected a candidate model that is right for your task, it is
good practice to profile and benchmark your model. The TensorFlow Lite
[benchmarking tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark)
has a built-in profiler that shows per-operator profiling statistics. This can
help you understand performance bottlenecks and see which operators dominate the
computation time.
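
The benchmarking tool is the right choice for per-operator statistics. For a quick end-to-end latency check, a small timing harness can help. The following is a minimal sketch and not part of TensorFlow Lite: the names `benchmark`, `run_once`, `warmup_runs`, and `timed_runs` are hypothetical, and the workload in the example is a stand-in that you would replace with your model's invocation.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object],
              warmup_runs: int = 5,
              timed_runs: int = 50) -> Dict[str, float]:
    """Time a zero-argument callable; return latency statistics in milliseconds.

    Hypothetical helper for illustration; not part of TensorFlow Lite.
    """
    if timed_runs < 1:
        raise ValueError("timed_runs must be at least 1")

    # Untimed warm-up runs, so one-time setup costs do not skew the samples.
    for _ in range(warmup_runs):
        run_once()

    samples_ms = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "min_ms": min(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "mean_ms": statistics.mean(samples_ms),
        "max_ms": max(samples_ms),
    }


if __name__ == "__main__":
    # Stand-in workload (an assumption); replace with your model's invocation.
    stats = benchmark(lambda: sum(i * i for i in range(10_000)))
    print(stats)
```

With the TensorFlow Python package installed (an assumption), you could pass `interpreter.invoke` from a `tf.lite.Interpreter` as `run_once`, after calling `allocate_tensors()` and setting the input tensors. Reporting the median alongside the minimum and maximum makes run-to-run variability visible, which a single average would hide.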

You can also use
[TensorFlow Lite tracing](measurement#trace_tensorflow_lite_internals_in_android)
to profile the model in your Android application, using standard Android system
tracing, and to visualize the operator invocations over time with GUI-based
profiling tools.

## Profile and optimize operators in the graph

If a particular operator appears frequently in the model and, based on
profiling, you find that the operator consumes the most time, you can look into
optimizing that operator. This scenario should be rare, as TensorFlow Lite has
optimized versions of most operators. However, you may be able to write a faster
version of a custom op if you know the constraints under which the operator is
executed. Check out the [custom operators guide](../guide/ops_custom).

## Optimize your model

Model optimization aims to create smaller models that are generally faster and
more energy efficient, so that they can be deployed on mobile devices.
TensorFlow Lite supports multiple optimization techniques, such as quantization.

Check out the [model optimization docs](model_optimization) for details.

## Tweak the number of threads

TensorFlow Lite supports multi-threaded kernels for many operators. You can
increase the number of threads to speed up execution of those operators.
Increasing the number of threads will, however, make your model use more
resources and power.

For some applications, latency may be more important than energy efficiency. You
can increase the number of threads by setting the number of interpreter
[threads](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/interpreter.h#L346).
Multi-threaded execution, however, comes at the cost of increased performance
variability, depending on what else is executing concurrently. This is
particularly the case for mobile apps.
For example, isolated tests may show a 2x speed-up over single-threaded
execution, but if another app is executing at the same time, performance may be
worse than single-threaded.

## Eliminate redundant copies

If your application is not carefully designed, there can be redundant copies
when feeding the input to and reading the output from the model. Make sure to
eliminate redundant copies. If you are using higher-level APIs, such as the Java
API, make sure to carefully check the documentation for performance caveats. For
example, the Java API is a lot faster if `ByteBuffers` are used as
[inputs](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/java/src/main/java/org/tensorflow/lite/Interpreter.java#L175).

## Profile your application with platform specific tools

Platform-specific tools like
[Android profiler](https://developer.android.com/studio/profile/android-profiler)
and [Instruments](https://help.apple.com/instruments/mac/current/) provide a
wealth of profiling information that can be used to debug your app. Sometimes
the performance bug may not be in the model but in the parts of the application
code that interact with the model. Make sure to familiarize yourself with the
profiling tools and best practices for your platform.

## Evaluate whether your model benefits from using hardware accelerators available on the device

TensorFlow Lite has added new ways to accelerate models with faster hardware
like GPUs, DSPs, and neural accelerators. Typically, these accelerators are
exposed through [delegate](delegates) submodules that take over parts of the
interpreter execution. TensorFlow Lite can use delegates by:

* Using Android's
  [Neural Networks API](https://developer.android.com/ndk/guides/neuralnetworks/).
  You can utilize these hardware accelerator backends to improve the speed and
  efficiency of your model.
  To enable the Neural Networks API, check out the
  [NNAPI delegate](https://www.tensorflow.org/lite/android/delegates/nnapi)
  guide.
* GPU delegate is available on Android and iOS, using OpenGL/OpenCL and Metal,
  respectively. To try them out, see the [GPU delegate tutorial](gpu) and
  [documentation](gpu_advanced).
* Hexagon delegate is available on Android. It leverages the Qualcomm Hexagon
  DSP if it is available on the device. See the
  [Hexagon delegate tutorial](https://www.tensorflow.org/lite/android/delegates/hexagon)
  for more information.
* It is possible to create your own delegate if you have access to
  non-standard hardware. See [TensorFlow Lite delegates](delegates) for more
  information.

Be aware that different accelerators work better for different types of models.
Some delegates only support float models or models optimized in a specific way.
It is important to [benchmark](measurement) each delegate to see if it is a good
choice for your application. For example, if you have a very small model, it may
not be worth delegating the model to either NNAPI or the GPU. Conversely,
accelerators are a great choice for large models that have high arithmetic
intensity.

## Need more help

The TensorFlow team is happy to help diagnose and address specific performance
issues you may be facing. Please file an issue on
[GitHub](https://github.com/tensorflow/tensorflow/issues) with details of the
issue.