
Optimize uClinux Kernel and Application

Remember the following strategy when writing an application:

  1. Choose an algorithm suited to the architecture being targeted. For example, a trade-off may exist between memory usage and algorithm complexity that may be influenced by the target architecture.
  2. Code the algorithm in a simple, high-level generic form. Keep the target in mind, especially regarding choices of data types (see the sketch after this list).
  3. Emphasize code tuning. For critical code sections, carefully consider the strengths of the target platform and make non-portable changes where necessary.
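
As a small illustration of point 2, the Blackfin has no floating-point unit, so expressing an inner loop in 16-bit fixed-point arithmetic rather than float keeps it in native instructions. The sketch below is illustrative only; the Q15 scaling macro and the function name are assumptions, not part of any library.

  #include <stdint.h>

  /* Assumed Q15 fixed-point convention: 1.0 is represented as 32767. */
  #define Q15(x) ((int16_t)((x) * 32767))

  /* First-order low-pass filter written with 16-bit data and a 32-bit
   * accumulator -- types the Blackfin core handles natively.  The same
   * code written with float would fall back to software emulation. */
  int16_t lowpass_q15(int16_t in, int16_t *state, int16_t alpha)
  {
      int32_t acc = (int32_t)alpha * in
                  + (int32_t)(Q15(1.0) - alpha) * (*state);
      *state = (int16_t)(acc >> 15);   /* rescale the product back to Q15 */
      return *state;
  }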

Your first decision is whether to implement your application in C, C++, or assembly. This decision may be influenced by performance considerations. C++ code using only C features has very similar performance to pure C source. Many higher-level C++ features (for example, those resolved at compile time, such as namespaces, overloaded functions, and inheritance) have no performance cost. However, use of some other features may degrade performance. Carefully weigh the performance loss against the richness of expression available in C++. Examples of features that may degrade performance include virtual functions and classes used to implement basic data types.

There is a vast difference in performance between code compiled optimized and code compiled non-optimized. In some cases, optimized code can run ten or twenty times faster. Always enable optimization when measuring performance or shipping code as a product.

The optimizer in the C/C++ compiler is designed to generate efficient code from source that has been written in a straightforward manner. The basic strategy for tuning a program is to present the algorithm in a way that gives the optimizer excellent visibility of the operations and data, and hence the greatest freedom to safely manipulate the code. Future releases of the compiler will continue to enhance the optimizer. Expressing algorithms simply will provide the best chance of benefiting from such enhancements.

Note that the default setting is for non-optimized compilation, to assist programmers in diagnosing problems with their initial coding.

GCC optimization for Blackfin

There are many features of the C and C++ languages that, while legal, indicate programming errors. There are also aspects that are valid but may be relatively expensive in an embedded environment. The compiler can provide the following diagnostics, which can save the time and effort of characterizing source-related problems:

  • Warnings and remarks
  • Source and assembly annotations

These diagnostics are particularly important for obtaining high-performance code, since the optimizer aggressively transforms the application to get the best performance, discarding unused or redundant code. If this code is redundant because of a programming error (such as omitting an essential volatile qualifier from a declaration), then the code will behave differently from a non-optimized version. Using the compiler’s diagnostics may help you identify such situations before they become problems.
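
The classic case is the volatile example mentioned above: a flag written by an interrupt handler and polled from C code. The names below are illustrative, not taken from any particular driver.

  /* Sketch: a flag set by an interrupt handler and polled from
   * main-line code.  Without the volatile qualifier, the optimizer may
   * legitimately read the flag once and spin on the cached value. */
  volatile int data_ready;     /* correct: re-read on every access */
  /* int data_ready; */        /* wrong: at -O2 the wait loop below may
                                  become an infinite loop            */

  void wait_for_data(void)
  {
      while (!data_ready)
          ;                    /* spin until the handler sets the flag */
  }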

By default, the compiler emits warnings to the standard error stream at compile time when it detects a problem with the source code. Disabling warnings, while possible, is inadvisable until each instance has been investigated. A typical warning involves a variable being used before its value has been set.
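
A minimal sketch of code that draws this kind of warning (the function and variable names are made up for illustration; the diagnostic is typically reported with -Wall and optimization enabled):

  int scale(int x)
  {
      int factor;              /* never assigned on every path */

      if (x > 100)
          factor = 2;          /* only set in one branch       */

      return x * factor;       /* warning: 'factor' may be used
                                  uninitialized in this function */
  }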

Using the GCC optimization flag

Without any optimization option, the compiler’s goal is to reduce the cost of compilation and to make debugging produce the expected results. Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.

  • -O0: Do not optimize. This is the default.
  • -O, -O1: The compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
  • -O2: GCC performs nearly all supported optimizations that do not involve a space-speed trade-off. This option increases both compilation time and the performance of the generated code.
  • -O3: In addition to the optimizations performed at -O2, the compiler performs optimizations such as function inlining and loop unrolling, which can lead to larger code size.
  • -Os: Optimize for size.
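
As a concrete sketch, the same source file can simply be rebuilt at different levels; the command lines in the comment assume the bfin-uclinux-gcc cross compiler and are illustrative only.

  /* Assumed command lines:
   *   bfin-uclinux-gcc -O0 -g -c dot.c    # default: easiest to debug
   *   bfin-uclinux-gcc -O2    -c dot.c    # optimize for speed
   *   bfin-uclinux-gcc -Os    -c dot.c    # optimize for size
   * At -O2 a simple counted loop like this one is a good candidate for
   * a Blackfin hardware loop; at -O0 it remains straightforward code. */
  int dot(const short *a, const short *b, int n)
  {
      int i, acc = 0;
      for (i = 0; i < n; i++)
          acc += a[i] * b[i];
      return acc;
  }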

Fast Floating point

Blackfin has no FPU. GCC provides a soft floating-point library for emulation (http://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html#Soft-float-library-routines). The soft floating-point library in Blackfin GCC has been optimized for Blackfin. To enable it, add "-mfast-fp" to the compilation flags.

An application compiled with this option calls the floating-point functions provided by the fast floating-point library libbffastfp, which is written in assembly and optimized for Blackfin, instead of the ones provided by libgcc. The floating-point functions in libbffastfp are several times faster than those in libgcc. To gain this performance, libbffastfp relaxes some rules of the IEEE floating-point standard, such as checking for NaN.
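
The source itself does not change; only the build flag does. A sketch, with an assumed command line:

  /* Ordinary single-precision code; no source change is needed.
   * Assumed build line:
   *   bfin-uclinux-gcc -O2 -mfast-fp -c energy.c
   * With -mfast-fp the float multiply/add below is routed to the
   * optimized libbffastfp routines instead of the libgcc soft-float ones. */
  float energy(const float *x, int n)
  {
      float sum = 0.0f;
      int i;
      for (i = 0; i < n; i++)
          sum += x[i] * x[i];
      return sum;
  }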

Here are the Whetstone results showing the effect of the optimization flags.

Unsafe optimizations

There are a few unsafe optimizations that can provide an overall performance benefit, but they should only be used once you have confirmed they are safe for your code, either through analysis or through testing.

  • -funsafe-loop-optimizations - If given, the loop optimizer will assume that loop indices do not overflow and that loops with a nontrivial exit condition are not infinite. This enables a greater number of C loops to generate hardware loops on the Blackfin (see the sketch after this list). This option can result in incorrect output for loops whose indices do overflow or which are infinite.
  • -funsafe-math-optimizations - This allows optimizations for floating-point arithmetic that assume arguments and results are valid, and it may violate IEEE or ANSI standards. This option can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
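
To illustrate the loop case, here is a sketch of a loop the compiler cannot normally prove finite (in a corner case the unsigned index could wrap past the bound); the build line is an assumption.

  /* Assumed build line:
   *   bfin-uclinux-gcc -O2 -funsafe-loop-optimizations -c gain.c
   * With an unsigned index and a non-unit stride, the compiler must
   * normally allow for the index wrapping and so cannot prove the loop
   * terminates.  The flag lets it assume the loop is finite, making the
   * body a candidate for a Blackfin hardware loop. */
  void apply_gain(short *buf, unsigned int n, short k)
  {
      unsigned int i;
      for (i = 0; i < n; i += 2)
          buf[i] = (short)((buf[i] * k) >> 8);
  }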

Using Blackfin GCC Built-in Functions

Blackfin GCC implements built-in functions to make efficient use of hardware resources. Knowledge of these functions is built into the compiler. Refer to built-in_functions for a list of the built-ins implemented.
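
For example, saturating fractional arithmetic can be expressed directly with the fractional built-ins. The sketch below uses __builtin_bfin_mult_fr1x16 and __builtin_bfin_add_fr1x16; treat the exact names, argument types, and the fract16 typedef as assumptions to be checked against the built-in_functions page.

  typedef short fract16;   /* assumed 1.15 fractional type */

  /* Saturating fractional dot product using the Blackfin built-ins,
   * which map to single multiply/add instructions on the core. */
  fract16 dot_fr16(const fract16 *a, const fract16 *b, int n)
  {
      fract16 acc = 0;
      int i;
      for (i = 0; i < n; i++)
          acc = __builtin_bfin_add_fr1x16(acc,
                    __builtin_bfin_mult_fr1x16(a[i], b[i]));
      return acc;
  }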

Using L1 Memory

The Level 1 (L1) memories on the Blackfin are closely coupled to the core and give the best performance. Using the L1 memory blocks is key to running the Blackfin effectively and efficiently. See on-chip_sram for details.
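
A sketch of placing a hot routine and its coefficient table into L1 with the l1_text and l1_data attributes of Blackfin GCC; whether these attributes and the matching linker sections are available depends on your toolchain and linker script, so treat the details as assumptions.

  /* Place a hot coefficient table in L1 data SRAM and the inner-loop
   * routine in L1 instruction SRAM.  The linker script must map the
   * corresponding sections. */
  static short coeffs[64] __attribute__((l1_data));

  int __attribute__((l1_text)) filter_sample(const short *x)
  {
      int acc = 0;
      int i;
      for (i = 0; i < 64; i++)
          acc += coeffs[i] * x[i];
      return acc;
  }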

Using Blackfin DSP library

The DSP run-time library contains a broad collection of functions that are commonly required by signal processing applications. Many of the functions are optimized using built-in functions. See libbfdsp for detailed descriptions.
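
For example, a block average can be delegated to the library rather than hand-coded. The header names and the prototype below are assumptions based on the ADI DSP run-time library conventions and should be checked against the libbfdsp documentation.

  #include <fract.h>    /* assumed: defines fract16          */
  #include <stats.h>    /* assumed: declares mean_fr16()     */

  /* Average a block of 1.15 fractional samples with the optimized
   * library routine instead of a hand-written loop.
   * Assumed prototype: fract16 mean_fr16(const fract16 x[], int n); */
  fract16 block_average(const fract16 *samples, int n)
  {
      return mean_fr16(samples, n);
  }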

Related Links

making_the_blackfin_perform: A guide on optimizing your code, with examples.
