.. _docs-size-optimizations:

==================
Size optimizations
==================
This page contains recommendations for optimizing the size of embedded software
including its memory and code footprints.

These recommendations are subject to change as the C++ standard and compilers
evolve, and as the authors continue to gain more knowledge and experience in
this area. If you disagree with recommendations, please discuss them with the
Pigweed team, as we're always looking to improve the guide or correct any
inaccuracies.

---------------------------------
Compile Time Constant Expressions
---------------------------------
Using `constexpr <https://en.cppreference.com/w/cpp/language/constexpr>`_
and, as of C++20,
`consteval <https://en.cppreference.com/w/cpp/language/consteval>`_ lets you
evaluate more functions and variables at compile time rather than only at run
time. This often results not only in smaller binaries but also in faster
execution.

We highly encourage using this aspect of C++. However, there is one caveat: be
careful when marking functions ``constexpr`` in APIs which cannot be easily
changed in the future, unless you can prove that for all time and all
platforms, the computation can actually be done at compile time. This is
because there is no "mutable" escape hatch for ``constexpr``.

See the :doc:`embedded_cpp_guide` for more detail.

---------
Templates
---------
The compiler implements templates by generating a separate version of the
function for each set of types it is instantiated with. This can increase code
size significantly.

Be careful when instantiating non-trivial template functions with multiple
types.

Consider splitting templated interfaces into multiple layers so that more of the
implementation can be shared between different instantiations.
A more advanced
form is to share common logic internally by using a default sentinel template
argument value, and ergo a shared instantiation, such as ``pw::Vector``'s
``size_t kMaxSize = vector_impl::kGeneric`` or ``pw::span``'s
``size_t Extent = dynamic_extent``.

-----------------
Virtual Functions
-----------------
Virtual functions provide for runtime polymorphism. Unless runtime polymorphism
is required, virtual functions should be avoided. Virtual functions require a
virtual table and a pointer to it in each instance, which increases RAM usage,
and they require extra instructions at each call site. Virtual functions can
also inhibit compiler optimizations, since the compiler may not be able to tell
which functions will actually be invoked. This can prevent linker garbage
collection, resulting in unused functions being linked into a binary.

When runtime polymorphism is required, virtual functions should be considered.
C alternatives, such as a struct of function pointers, could be used instead,
but these approaches may offer no performance advantage while sacrificing
flexibility and ease of use.

Only use virtual functions when runtime polymorphism is needed. Lastly, try to
avoid templated virtual interfaces, which can compound the cost by instantiating
many virtual tables.

Devirtualization
================
When you do use virtual functions, try to keep devirtualization in mind. You can
make it easier on the compiler and linker by declaring classes as ``final`` to
improve the odds. This can help significantly depending on your toolchain.

If you're interested in more details,
`this is an interesting deep dive <https://quuxplusone.github.io/blog/2021/02/15/devirtualization/>`_.
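
As a minimal sketch of the ``final`` suggestion above, consider the following.
The ``Sensor`` and ``FakeSensor`` names are hypothetical and exist only for this
illustration; whether a given call is actually devirtualized depends on your
toolchain and optimization settings.

.. code-block:: cpp

   // Hypothetical interface, for illustration only.
   class Sensor {
    public:
     virtual ~Sensor() = default;
     virtual int Read() = 0;
   };

   // `final` promises that no class derives from FakeSensor, so a call made
   // through a FakeSensor reference has exactly one possible target.
   class FakeSensor final : public Sensor {
    public:
     int Read() override { return 42; }
   };

   // The static type is FakeSensor: this call is a devirtualization candidate.
   int ReadDirect(FakeSensor& sensor) { return sensor.Read(); }

   // The static type is Sensor: this call generally stays virtual unless the
   // optimizer can prove the dynamic type some other way.
   int ReadViaBase(Sensor& sensor) { return sensor.Read(); }

Both functions return the same value; the difference is only in how much the
compiler knows about the call target.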

---------------------------------------------------------
Initialization, Constructors, Finalizers, and Destructors
---------------------------------------------------------
Constructors
============
Where possible, consider making your constructors ``constexpr`` to reduce their
costs. This also makes global instances eligible for placement in the ``.data``
section or, if they are all zeros, in the ``.bss`` section.

Static Destructors And Finalizers
=================================
For many embedded projects, cleaning up after the program is not a requirement,
meaning the exit functions including any finalizers registered through
``atexit``, ``at_quick_exit``, and static destructors can all be removed to
reduce the size.

The exact mechanics for disabling static destructors depend on your toolchain.

See the `Ignored Finalizer and Destructor Registration`_ section below for
further details regarding disabling registration of functions to be run at exit
via ``atexit`` and ``at_quick_exit``.

Clang
-----
With modern versions of Clang you can simply use ``-fno-c++-static-destructors``
and you are done.

GCC with newlib-nano
--------------------
With GCC this is more complicated. For example, with GCC for ARM Cortex M
devices using ``newlib-nano`` you are forced to tackle the problem in two
stages.

First, there are the destructors for the static global objects. These can be
placed in the ``.fini_array`` and ``.fini`` input sections through the use of
the ``-fno-use-cxa-atexit`` GCC flag, assuming ``newlib-nano`` was configured
with ``HAVE_INITFINI_ARRAY_SUPPORT``. The two input sections can then be
explicitly discarded in the linker script through the use of the special
``/DISCARD/`` output section:

.. code-block:: text

   /DISCARD/ : {
     /* The finalizers are never invoked when the target shuts down and ergo
      * can be discarded. These include C++ global static destructors and C
      * designated finalizers. */
     *(.fini_array);
     *(.fini);
   }

Second, there are the destructors for the scoped static objects, frequently
referred to as Meyers Singletons. With the Itanium ABI these use
``__cxa_atexit`` to register destruction on the fly. However, if
``-fno-use-cxa-atexit`` is used with GCC and ``newlib-nano`` these will appear
as ``__tcf_`` prefixed symbols, for example ``__tcf_0``.

There's `an interesting proposal (P1247R0) <http://wg21.link/p1247r0>`_ to
add a ``[[no_destroy]]`` attribute to C++ which would be tempting to use here.
Alas this is not an option yet. As mentioned in the proposal, one way to remove
the destructors from these scoped statics is to wrap them in a templated wrapper
which uses placement new.

.. code-block:: cpp

   #include <new>
   #include <type_traits>
   #include <utility>

   template <class T>
   class NoDestroy {
    public:
     // Constructs the T in place; its destructor is never run.
     template <class... Ts>
     NoDestroy(Ts&&... ts) {
       new (&static_) T(std::forward<Ts>(ts)...);
     }

     T& get() { return reinterpret_cast<T&>(static_); }

    private:
     // Raw storage with the size and alignment of T.
     std::aligned_storage_t<sizeof(T), alignof(T)> static_;
   };

This can then be used as follows to instantiate scoped statics where the
destructor will never be invoked and ergo will not be linked in.

.. code-block:: cpp

   Foo& GetFoo() {
     static NoDestroy<Foo> foo(foo_args);
     return foo.get();
   }

-------
Strings
-------

Tokenization
============
Instead of directly using strings and printf, consider using
:ref:`module-pw_tokenizer` to replace strings and printf-style formatted strings
with binary tokens during compilation. This can reduce the code size, memory
usage, I/O traffic, and even CPU utilization by replacing snprintf calls with
simple tokenization code.

Be careful when using string arguments with tokenization as these still result
in a string in your binary which is appended to your token at run time.

String Formatting
=================
The formatted output family of printf functions in ``<cstdio>`` is quite
expensive from a code size point of view, and these functions often rely on
malloc. Instead, where tokenization cannot be used, consider using
:ref:`module-pw_string`'s utilities.

Removing all printf functions often saves more than 5KiB of code size on ARM
Cortex M devices using ``newlib-nano``.

Logging & Asserting
===================
Using tokenized backends for logging and asserting such as
:ref:`module-pw_log_tokenized` coupled with :ref:`module-pw_assert_log` can
drastically reduce the costs. However, even with this approach there remains a
call site cost which can add up due to arguments and included metadata.

Try to avoid string arguments and reduce unnecessary extra arguments where
possible. Also consider adjusting log levels to compile out debug or even info
logs as code stabilizes and matures.

Future Plans
------------
Going forward, Pigweed is evaluating extra configuration options to do things
such as dropping log arguments for certain log levels and modules to give users
finer grained control in trading off diagnostic value and the size cost.

----------------------------------
Threading and Synchronization Cost
----------------------------------

Lighter-weight Signaling Primitives
===================================
Consider using ``pw::sync::ThreadNotification`` instead of semaphores as they
can be implemented using more efficient RTOS specific signaling primitives. For
example on FreeRTOS they can be backed by direct task notifications which are
more than 10x smaller than semaphores while also being faster.

Threads and their stack sizes
=============================
Although synchronous APIs are incredibly portable and often easier to reason
about, it is easy to forget the large stack cost this design paradigm comes
with. We highly recommend watermarking your stacks to reduce wasted memory.

Our snapshot integrations for RTOSes such as :ref:`module-pw_thread_freertos`
and :ref:`module-pw_thread_embos` come with built-in support to report stack
watermarks for threads if enabled in the kernel.

In addition, consider using asynchronous design patterns such as Active Objects
which can use :ref:`module-pw_work_queue` or similar asynchronous dispatch work
queues to effectively permit the sharing of stack allocations.

Buffer Sizing
=============
We'd be remiss not to mention the sizing of the various buffers that may exist
in your application. You could consider watermarking them with
:ref:`module-pw_metric`. You may also be able to adjust their servicing interval
and priority, but do not forget to take the ingress burst sizes and scheduling
jitter into account.

----------------------------
Standard C and C++ libraries
----------------------------
Toolchains are typically distributed with their preferred standard C library and
standard C++ library of choice for the target platform.

Although you do not always have a choice in what standard C library and what
standard C++ library is used or even how it's compiled, stay vigilant for common
sources of bloat.

Assert
======
The standard C library should provide the ``assert`` function or macro which
may be internally used even if your application does not invoke it directly.
Although this can be disabled through ``NDEBUG``, there typically is not a
portable way of replacing the ``assert(condition)`` implementation without
configuring and recompiling your standard C library.

However, you can consider replacing the implementation at link time with a
cheaper implementation. For example ``newlib-nano``, which comes with the
``GNU Arm Embedded Toolchain``, often has an expensive ``__assert_func``
implementation which uses ``fiprintf`` to print to ``stderr`` before invoking
``abort()``. This can be replaced with a simple ``PW_CRASH`` invocation which
can save several kilobytes in case ``fiprintf`` isn't used elsewhere.

One option to remove this bloat is to use ``--wrap`` at link time to replace
these implementations. As an example in GN you could replace it with the
following ``BUILD.gn`` file:

.. code-block:: text

   import("//build_overrides/pigweed.gni")

   import("$dir_pw_build/target_types.gni")

   # Wraps the function called by newlib's implementation of assert from
   # assert.h.
   #
   # When using this, we suggest injecting :wrapped_newlib_assert via
   # pw_build_LINK_DEPS.
   config("wrap_newlib_assert") {
     ldflags = [ "-Wl,--wrap=__assert_func" ]
   }

   # Implements the function called by newlib's implementation of assert from
   # assert.h which invokes __assert_func unless NDEBUG is defined.
   pw_source_set("wrapped_newlib_assert") {
     sources = [ "wrapped_newlib_assert.cc" ]
     deps = [
       "$dir_pw_assert:check",
       "$dir_pw_preprocessor",
     ]
   }

And a ``wrapped_newlib_assert.cc`` source file implementing the wrapped assert
function:

.. code-block:: cpp

   #include "pw_assert/check.h"
   #include "pw_preprocessor/compiler.h"

   // This is defined by <cassert>
   extern "C" PW_NO_RETURN void __wrap___assert_func(const char*,
                                                     int,
                                                     const char*,
                                                     const char*) {
     PW_CRASH("libc assert() failure");
   }


Ignored Finalizer and Destructor Registration
=============================================
Even if no cleanup is done during shutdown for your target, shutdown functions
such as ``atexit``, ``at_quick_exit``, and ``__cxa_atexit`` can sometimes not be
linked out. This may be due to vendor code or perhaps using scoped statics, also
known as Meyers Singletons.

The registration of these destructors and finalizers may include locks, malloc,
and more depending on your standard C library and its configuration.

One option to remove this bloat is to use ``--wrap`` at link time to replace
these implementations with ones which do nothing. As an example in GN you could
replace it with the following ``BUILD.gn`` file:

.. code-block:: text

   import("//build_overrides/pigweed.gni")

   import("$dir_pw_build/target_types.gni")

   config("wrap_atexit") {
     ldflags = [
       "-Wl,--wrap=atexit",
       "-Wl,--wrap=at_quick_exit",
       "-Wl,--wrap=__cxa_atexit",
     ]
   }

   # Implements atexit, at_quick_exit, and __cxa_atexit from stdlib.h with noop
   # versions for targets which do not cleanup during exit and quick_exit.
   #
   # This removes any dependencies which may exist in your existing libc.
   # Although this removes the ability for things such as Meyers Singletons,
   # i.e. non-global statics, to register destruction functions, it does not
   # permit them to be garbage collected by the linker.
   pw_source_set("wrapped_noop_atexit") {
     sources = [ "wrapped_noop_atexit.cc" ]
   }

And a ``wrapped_noop_atexit.cc`` source file implementing the noop functions:

.. code-block:: cpp

   // These two are defined by <cstdlib>.
   extern "C" int __wrap_atexit(void (*)(void)) { return 0; }
   extern "C" int __wrap_at_quick_exit(void (*)(void)) { return 0; }

   // This function is part of the Itanium C++ ABI, there is no header which
   // provides this.
   extern "C" int __wrap___cxa_atexit(void (*)(void*), void*, void*) { return 0; }

Unexpected Bloat in Disabled STL Exceptions
===========================================
The GCC
`manual <https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_exceptions.html>`_
recommends using ``-fno-exceptions`` along with ``-fno-unwind-tables`` to
disable exceptions and any associated overhead. This should replace all throw
statements with calls to ``abort()``.

However, what we've noticed with GCC and ``libstdc++`` is that there is a
risk that the STL will still throw exceptions when the application is compiled
with ``-fno-exceptions`` and there is no way for you to catch them. In theory,
this is not unsafe because the unhandled exception will invoke ``abort()`` via
``std::terminate()``. This can occur because libraries such as
``libstdc++.a`` may not have been compiled with ``-fno-exceptions`` even though
your application is linked against them.

See
`this <https://blog.mozilla.org/nnethercote/2011/01/18/the-dangers-of-fno-exceptions/>`_
for more information.

Unfortunately there can be significant overhead surrounding these throw call
sites in the ``std::__throw_*`` helper functions. Implementations such as
``std::__throw_out_of_range_fmt(const char*, ...)`` and
their snprintf and ergo malloc dependencies can very quickly add up to many
kilobytes of unnecessary overhead.
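
As one illustration of where these helpers can come from: in ``libstdc++``,
bounds-checked accessors such as ``std::array::at()`` report failures through
``std::__throw_out_of_range_fmt``. The sketch below contrasts ``at()`` with a
hand-written check. The names ``kTable``, ``LookupChecked``, and
``LookupOrDefault`` are hypothetical, and whether the helper ends up in your
binary depends on your toolchain and optimization settings.

.. code-block:: cpp

   #include <array>
   #include <cstddef>

   constexpr std::array<int, 4> kTable = {10, 20, 30, 40};

   // at() performs a bounds check and, on failure, calls into the standard
   // library's throw helper, which can pull in its formatting dependencies.
   int LookupChecked(std::size_t index) { return kTable.at(index); }

   // An explicit check with a fallback value does not reference the helper.
   int LookupOrDefault(std::size_t index, int fallback) {
     return index < kTable.size() ? kTable[index] : fallback;
   }

Inspecting the linker map for ``__throw_`` symbols is one way to find call
sites like ``LookupChecked`` in your own code.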

One option to remove this bloat while also making sure that the exceptions will
actually result in an effective ``abort()`` is to use ``--wrap`` at link time to
replace these implementations with ones which simply call ``PW_CRASH``.

As an example in GN you could replace it with the following ``BUILD.gn`` file.
Note that the mangled names must be used:

.. code-block:: text

   import("//build_overrides/pigweed.gni")

   import("$dir_pw_build/target_types.gni")

   # Wraps the std::__throw_* functions called by GNU ISO C++ Library regardless
   # of whether "-fno-exceptions" is specified.
   #
   # When using this, we suggest injecting :wrapped_libstdc++_functexcept via
   # pw_build_LINK_DEPS.
   config("wrap_libstdc++_functexcept") {
     ldflags = [
       "-Wl,--wrap=_ZSt21__throw_bad_exceptionv",
       "-Wl,--wrap=_ZSt17__throw_bad_allocv",
       "-Wl,--wrap=_ZSt16__throw_bad_castv",
       "-Wl,--wrap=_ZSt18__throw_bad_typeidv",
       "-Wl,--wrap=_ZSt19__throw_logic_errorPKc",
       "-Wl,--wrap=_ZSt20__throw_domain_errorPKc",
       "-Wl,--wrap=_ZSt24__throw_invalid_argumentPKc",
       "-Wl,--wrap=_ZSt20__throw_length_errorPKc",
       "-Wl,--wrap=_ZSt20__throw_out_of_rangePKc",
       "-Wl,--wrap=_ZSt24__throw_out_of_range_fmtPKcz",
       "-Wl,--wrap=_ZSt21__throw_runtime_errorPKc",
       "-Wl,--wrap=_ZSt19__throw_range_errorPKc",
       "-Wl,--wrap=_ZSt22__throw_overflow_errorPKc",
       "-Wl,--wrap=_ZSt23__throw_underflow_errorPKc",
       "-Wl,--wrap=_ZSt19__throw_ios_failurePKc",
       "-Wl,--wrap=_ZSt19__throw_ios_failurePKci",
       "-Wl,--wrap=_ZSt20__throw_system_errori",
       "-Wl,--wrap=_ZSt20__throw_future_errori",
       "-Wl,--wrap=_ZSt25__throw_bad_function_callv",
     ]
   }

   # Implements the std::__throw_* functions called by GNU ISO C++ Library
   # regardless of whether "-fno-exceptions" is specified with PW_CRASH.
   pw_source_set("wrapped_libstdc++_functexcept") {
     sources = [ "wrapped_libstdc++_functexcept.cc" ]
     deps = [
       "$dir_pw_assert:check",
       "$dir_pw_preprocessor",
     ]
   }

And a ``wrapped_libstdc++_functexcept.cc`` source file implementing each
wrapped and mangled ``std::__throw_*`` function:

.. code-block:: cpp

   #include "pw_assert/check.h"
   #include "pw_preprocessor/compiler.h"

   // These are all wrapped implementations of the throw functions provided by
   // libstdc++'s bits/functexcept.h which are not needed when "-fno-exceptions"
   // is used.

   // std::__throw_bad_exception(void)
   extern "C" PW_NO_RETURN void __wrap__ZSt21__throw_bad_exceptionv() {
     PW_CRASH("std::throw_bad_exception");
   }

   // std::__throw_bad_alloc(void)
   extern "C" PW_NO_RETURN void __wrap__ZSt17__throw_bad_allocv() {
     PW_CRASH("std::throw_bad_alloc");
   }

   // std::__throw_bad_cast(void)
   extern "C" PW_NO_RETURN void __wrap__ZSt16__throw_bad_castv() {
     PW_CRASH("std::throw_bad_cast");
   }

   // std::__throw_bad_typeid(void)
   extern "C" PW_NO_RETURN void __wrap__ZSt18__throw_bad_typeidv() {
     PW_CRASH("std::throw_bad_typeid");
   }

   // std::__throw_logic_error(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt19__throw_logic_errorPKc(const char*) {
     PW_CRASH("std::throw_logic_error");
   }

   // std::__throw_domain_error(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt20__throw_domain_errorPKc(const char*) {
     PW_CRASH("std::throw_domain_error");
   }

   // std::__throw_invalid_argument(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt24__throw_invalid_argumentPKc(
       const char*) {
     PW_CRASH("std::throw_invalid_argument");
   }

   // std::__throw_length_error(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt20__throw_length_errorPKc(const char*) {
     PW_CRASH("std::throw_length_error");
   }

   // std::__throw_out_of_range(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt20__throw_out_of_rangePKc(const char*) {
     PW_CRASH("std::throw_out_of_range");
   }

   // std::__throw_out_of_range_fmt(const char*, ...)
   extern "C" PW_NO_RETURN void __wrap__ZSt24__throw_out_of_range_fmtPKcz(
       const char*, ...) {
     PW_CRASH("std::throw_out_of_range");
   }

   // std::__throw_runtime_error(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt21__throw_runtime_errorPKc(
       const char*) {
     PW_CRASH("std::throw_runtime_error");
   }

   // std::__throw_range_error(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt19__throw_range_errorPKc(const char*) {
     PW_CRASH("std::throw_range_error");
   }

   // std::__throw_overflow_error(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt22__throw_overflow_errorPKc(
       const char*) {
     PW_CRASH("std::throw_overflow_error");
   }

   // std::__throw_underflow_error(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt23__throw_underflow_errorPKc(
       const char*) {
     PW_CRASH("std::throw_underflow_error");
   }

   // std::__throw_ios_failure(const char*)
   extern "C" PW_NO_RETURN void __wrap__ZSt19__throw_ios_failurePKc(const char*) {
     PW_CRASH("std::throw_ios_failure");
   }

   // std::__throw_ios_failure(const char*, int)
   extern "C" PW_NO_RETURN void __wrap__ZSt19__throw_ios_failurePKci(const char*,
                                                                     int) {
     PW_CRASH("std::throw_ios_failure");
   }

   // std::__throw_system_error(int)
   extern "C" PW_NO_RETURN void __wrap__ZSt20__throw_system_errori(int) {
     PW_CRASH("std::throw_system_error");
   }

   // std::__throw_future_error(int)
   extern "C" PW_NO_RETURN void __wrap__ZSt20__throw_future_errori(int) {
     PW_CRASH("std::throw_future_error");
   }

   // std::__throw_bad_function_call(void)
   extern "C" PW_NO_RETURN void __wrap__ZSt25__throw_bad_function_callv() {
     PW_CRASH("std::throw_bad_function_call");
   }

---------------------------------
Compiler and Linker Optimizations
---------------------------------

Compiler Optimization Options
=============================
Don't forget to configure your compiler to optimize for size if needed. With
Clang this is ``-Oz`` and with GCC this can be done via ``-Os``. The GN
toolchains provided through :ref:`module-pw_toolchain` which are optimized for
size are suffixed with ``*_size_optimized``.

Garbage collect function and data sections
==========================================
By default, all functions in an object file are placed within the same
linker "section" (e.g. ``.text``). With Clang and GCC you can use
``-ffunction-sections`` and ``-fdata-sections`` to use a unique "section" for
each function and data object (e.g. ``.text.do_foo_function``). This permits you
to pass ``--gc-sections`` to the linker to cull any sections which are not
referenced.

To see what sections were garbage collected you can pass ``--print-gc-sections``
to the linker so it prints out what was removed.

The GN toolchains provided through :ref:`module-pw_toolchain` are configured to
do this by default.

Function Inlining
=================
Don't forget to expose trivial functions such as member accessors as inline
definitions in the header. The compiler and linker can make the trade-off on
whether the function should actually be inlined or not based on your
optimization settings; however, this at least gives them the option. Note that
LTO can inline functions which are not defined in headers.

We stand by the
`Google style guide <https://google.github.io/styleguide/cppguide.html#Inline_Functions>`_
to recommend considering this for simple functions which are 10 lines or less.
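
A minimal sketch of what this looks like in practice, assuming a hypothetical
``Counter`` class declared in a header:

.. code-block:: cpp

   #include <cstdint>

   // counter.h (hypothetical header, for illustration only)
   class Counter {
    public:
     // Defined inside the class body, so these members are implicitly inline
     // and visible to every translation unit that includes the header.
     std::uint32_t value() const { return value_; }
     void Increment() { ++value_; }

    private:
     std::uint32_t value_ = 0;
   };

Had ``value()`` only been declared here and defined in a ``.cc`` file, callers
in other translation units could not inline it without LTO.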

Link Time Optimization (LTO)
============================
**Summary: LTO can decrease your binary size, at a cost: LTO makes debugging
harder, interacts poorly with linker scripts, and makes crash reports less
informative. We advise only enabling LTO when absolutely necessary.**

Link time optimization (LTO) moves some optimizations from the individual
compile steps to the final link step, to enable optimizing across translation
unit boundaries.

LTO can both increase performance and reduce binary size for embedded projects.
This appears to be a strict improvement, and one might think enabling LTO at
all times is the best approach. However, this is not the case; in practice, LTO
is a trade-off.

**LTO benefits**

* **Reduces binary size** - When compiling with size-shrinking flags like
  ``-Oz``, some function call overhead can be eliminated, and code paths might
  be eliminated by the optimizer after inlining. This can include critical
  abstraction removal like devirtualization.
* **Improves performance** - When code is inlined, the optimizer can better
  reduce the number of instructions. When code is smaller, the instruction
  cache has a better hit ratio, leading to better performance. In some cases,
  entire function calls are eliminated.

**LTO costs**

* **LTO interacts poorly with linker scripts** - Production embedded projects
  often have complicated linker scripts to control the physical layout of code
  and data on the device. For example, you may want to put performance critical
  audio codec functions into the fast tightly coupled memory (TCM) region.
  However, LTO can interact with linker script requirements in strange ways,
  such as inlining code that was manually placed in one region into functions
  located in another region, leading to hard-to-understand bugs.
* **Debugging LTO binaries is harder** - LTO increases the differences between
  the machine code and the source code. This makes stepping through source code
  in a debugger confusing, since the instruction pointer can hop around in
  unexpected ways.
* **Crash reports for LTO binaries can be misleading** - Just as with
  debugging, LTO'd binaries can produce confusing stacks in crash reports.
* **LTO significantly increases build times** - The compilation model is
  different when LTO is enabled, since individual translation unit compilations
  (``.cc`` --> ``.o``) now produce LLVM or GCC IR instead of native machine
  code; machine code is only generated at the link phase. This makes the final
  link step take significantly longer. Since any source change will result in
  a link step, developer velocity is reduced due to the slow build time.

How to enable LTO
-----------------
On GCC and Clang, LTO is enabled by passing ``-flto`` to both the compiler
and the linker. On GCC, ``-fdevirtualize-at-ltrans`` enables more aggressive
devirtualization.

Our recommendation
------------------
* Disable LTO unless absolutely necessary; e.g. due to lack of space.
* When enabling LTO, carefully and thoroughly test the resulting binary.
* Check that crash reports are still useful under LTO for your product.

Disabling Scoped Static Initialization Locks
============================================
C++11 requires that scoped static objects are initialized in a thread-safe
manner. This also means that scoped statics, i.e. Meyers Singletons, must be
thread-safe. Unfortunately this rarely is the case on embedded targets. For
example, with GCC on an ARM Cortex M device, if you test for this you will
discover that the program instead crashes as reentrant initialization is
detected through the use of guard variables.
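
For readers unfamiliar with the term, the following is a minimal sketch of a
scoped static. The ``Registry`` type and ``GetRegistry`` function are
hypothetical names used only for this illustration.

.. code-block:: cpp

   // Hypothetical type which counts how often it is constructed.
   struct Registry {
     Registry() { ++construction_count; }
     static int construction_count;
   };
   int Registry::construction_count = 0;

   // `instance` is a scoped static: it is constructed the first time control
   // passes through its declaration. The compiler emits a guard variable to
   // track whether construction has happened and, by default, also emits
   // locking around the first construction so concurrent callers are safe.
   Registry& GetRegistry() {
     static Registry instance;
     return instance;
   }

Every call returns the same object, and the constructor runs only once, on the
first call.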

With GCC and Clang, ``-fno-threadsafe-statics`` can be used to remove the global
lock, which often does not work for embedded targets anyway. Note that this
leaves the guard variables in place, which ensures that reentrant initialization
continues to crash.

Be careful when using this option in case you are relying on threadsafe
initialization of statics and the global locks were functional for your target.

Triaging Unexpectedly Linked In Functions
=========================================
Lastly, as a tip, if you cannot figure out why a function is being linked in you
can consider:

* Using ``--wrap`` with the linker to remove the implementation, resulting in a
  link failure which typically calls out which calling function can no longer be
  linked.
* With GCC, you can use ``-fcallgraph-info`` to visualize or otherwise inspect
  the callgraph to figure out who is calling what.
* Sometimes symbolizing the address can resolve what a function is for. For
  example, if you are using ``newlib-nano`` along with ``-fno-use-cxa-atexit``,
  scoped static destructors are prefixed ``__tcf_*``. To figure out which object
  these destructor functions are associated with, you can use
  ``llvm-symbolizer`` or ``addr2line`` and these will often print out the
  related object's name.

Sorting input sections by alignment
===================================

Linker scripts often contain input section wildcard patterns to specify which
input sections should be placed in each output section. For example, say a
linker script contains a sections command like the following:

.. code-block:: text

   .text : { *(.init*) *(.text*) }

By default, the GCC and Clang linkers will place input sections matched by each
wildcard pattern in the order they are seen at link time. The linker will insert
padding bytes as necessary to satisfy the alignment requirements of each
section.

The GCC and Clang linkers allow one to first sort the input sections matched by
each wildcard pattern by alignment with the ``SORT_BY_ALIGNMENT`` keyword, which
can reduce the number of padding bytes needed and save memory. This can be used
to enable alignment sorting on a per-pattern basis like so:

.. code-block:: text

   .text : { *(SORT_BY_ALIGNMENT(.init*)) *(SORT_BY_ALIGNMENT(.text*)) }

Alignment sorting can instead be applied globally to all wildcard matches in
your linker script by passing the ``--sort-section=alignment`` option to the
linker.

See the `ld manual <https://sourceware.org/binutils/docs/ld/Input-Section-Wildcards.html>`_
for more information.
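
To make the padding effect concrete, here is a small host-side sketch which
models the placement arithmetic. It is not how any linker is implemented; the
``Section`` type and both helper functions are hypothetical, and the model only
assumes that sections are placed back to back with each start address rounded up
to the section's alignment.

.. code-block:: cpp

   #include <algorithm>
   #include <cstddef>
   #include <vector>

   // Hypothetical description of an input section: its size and required
   // alignment in bytes. The alignment must be a power of two.
   struct Section {
     std::size_t size;
     std::size_t alignment;
   };

   // Returns the padding bytes needed to place the sections back to back in
   // the given order, starting from address 0.
   std::size_t PaddingFor(const std::vector<Section>& sections) {
     std::size_t address = 0;
     std::size_t padding = 0;
     for (const Section& s : sections) {
       // Round the current address up to the next multiple of the alignment.
       const std::size_t aligned =
           (address + s.alignment - 1) & ~(s.alignment - 1);
       padding += aligned - address;
       address = aligned + s.size;
     }
     return padding;
   }

   // Mimics an alignment sort: largest alignment first, ties keep their order.
   std::vector<Section> SortedByAlignment(std::vector<Section> sections) {
     std::stable_sort(sections.begin(), sections.end(),
                      [](const Section& a, const Section& b) {
                        return a.alignment > b.alignment;
                      });
     return sections;
   }

For four sections with (size, alignment) of (1, 1), (8, 8), (2, 2), and (4, 4),
``PaddingFor`` reports 9 bytes of padding in the order given and 0 bytes after
``SortedByAlignment``.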