Remember the following strategy when writing an application:
Your first decision is whether to implement your application in C, C++, or assembly. This decision may be influenced by performance considerations. C++ code that uses only C features performs very similarly to pure C source. Many higher-level C++ features (for example, those resolved at compilation, such as namespaces, overloaded functions and inheritance) have no performance cost. However, some other features may degrade performance; examples include virtual functions and classes used to implement basic data types. Carefully weigh any performance loss against the richness of expression available in C++.
There is a vast difference in performance between optimized and non-optimized builds of the same code. In some cases, optimized code can run ten or twenty times faster. Always enable optimization when measuring performance or shipping code as a product.
The optimizer in the C/C++ compiler is designed to generate efficient code from source that has been written in a straightforward manner. The basic strategy for tuning a program is to present the algorithm in a way that gives the optimizer excellent visibility of the operations and data, and hence the greatest freedom to safely manipulate the code. Future releases of the compiler will continue to enhance the optimizer. Expressing algorithms simply will provide the best chance of benefiting from such enhancements.
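As an illustration of "straightforward" source, the sketch below writes the same dot product twice. The function names are hypothetical and not taken from any Blackfin library. Whether the second form actually costs anything depends on the compiler version; the point is only that the first form states the trip count and the data accesses directly.

```c
/* Plain array indexing and a simple counted loop: the trip count (n)
 * and the access pattern are visible at a glance. */
int dot_simple(const short *a, const short *b, int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Same result, written with hand-advanced pointers and a pointer
 * comparison as the exit test. The trip count now has to be derived
 * by the compiler instead of being stated in the source. */
int dot_pointer(const short *a, const short *b, int n)
{
    int sum = 0;
    const short *end = a + n;

    while (a != end)
        sum += *a++ * *b++;
    return sum;
}
```

Both functions return the same value for the same inputs; prefer the first style unless measurement shows a reason not to.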
There are many features of the C and C++ languages that, while legal, indicate programming errors. There are also aspects that are valid but may be relatively expensive for an embedded environment. The compiler can provide the following diagnostics, which may save time and effort in characterizing source-related problems:
These diagnostics are particularly important for obtaining high-performance code, since the optimizer aggressively transforms the application to get the best performance, discarding unused or redundant code. If this code is redundant because of a programming error (such as omitting an essential volatile qualifier from a declaration), then the code will behave differently from a non-optimized version. Using the compiler’s diagnostics may help you identify such situations before they become problems.
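The volatile case mentioned above can be sketched as follows. The flag and function names are hypothetical, and fake_isr() merely stands in for a real interrupt handler. The flag is re-read on every poll only because it is declared volatile; without the qualifier, an optimizing compiler is allowed to read it once and treat the loop's test as constant.

```c
#include <stdint.h>

/* Hypothetical flag that an interrupt handler sets when data arrives.
 * "volatile" tells the compiler the value can change outside the
 * normal flow of the program, so every read must really happen. */
static volatile uint32_t data_ready = 0;

/* Stand-in for an interrupt service routine (illustrative only). */
void fake_isr(void)
{
    data_ready = 1;
}

/* Poll the flag up to max_spins times. Returns the number of polls
 * that saw the flag clear before it was seen set, or -1 on timeout. */
int wait_for_data(int max_spins)
{
    int i;

    for (i = 0; i < max_spins; i++) {
        if (data_ready)
            return i;
    }
    return -1;
}
```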
By default, the compiler emits warnings to the standard error stream at compile time when it detects a problem with the source code. Disabling warnings, while possible, is inadvisable until each instance has been investigated for problems. A typical warning involves a variable being used before its value has been set.
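A minimal sketch of the used-before-set situation, with a hypothetical function name. The version shown is the corrected one; the comment marks the initializer whose omission would be the bug. Building with GCC's -Wall option turns on a larger set of warnings, and whether this particular warning is reported can depend on the compiler version and optimization level.

```c
/* Returns the largest value in v[0..n-1], or fallback when n <= 0. */
int max_or(const int *v, int n, int fallback)
{
    /* If this initializer were omitted, "best" would be read before
     * being set whenever n <= 0: the loop body never runs, yet the
     * function still returns "best". */
    int best = fallback;
    int i;

    for (i = 0; i < n; i++) {
        if (i == 0 || v[i] > best)
            best = v[i];
    }
    return best;
}
```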
Without any optimization option, the compiler’s goal is to reduce the cost of compilation and to make debugging produce the expected results. Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.
-O0: Do not optimize. This is the default.
-O1: The compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
-O2: GCC performs nearly all supported optimizations that do not involve a space-speed trade‐off. This option increases both compilation time and the performance of the generated code.
-O3: In addition to the other optimizations, the compiler performs loop unrolling and function inlining, which may increase code size.
-Os: Optimize for size.
Blackfin has no FPU. GCC provides a soft floating-point library for emulation (http://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html#Soft-float-library-routines). Blackfin GCC also provides a soft floating-point library that has been optimized for Blackfin. To enable it, add "-mfast-fp" to the compilation flags.
An application compiled with this option calls the floating-point functions provided by the fast floating-point library libbffastfp, which is written in assembly and optimized for Blackfin, instead of the ones provided by libgcc. The floating-point functions in libbffastfp are several times faster than the ones in libgcc. For better performance, libbffastfp relaxes some of the IEEE floating-point standard's rules for checking NaN.
Here are the Whetstone results showing the effect of the optimization flags.
There are a few unsafe optimizations that can provide an overall performance benefit, but they should only be used when you have time to verify their effect, either through analysis or through testing.
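Exactly which NaN checks libbffastfp relaxes is not spelled out here, so code that depends on NaN handling may want to validate its inputs explicitly. The following is a defensive sketch under two assumptions: float is a 32-bit IEEE-754 single, and all helper names are hypothetical. The NaN test uses only integer operations on the bit pattern, so it does not rely on any floating-point library routine.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes IEEE-754 single). */
float float_from_bits(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof f);
    return f;
}

/* NaN test done purely on the bit pattern: exponent field all ones
 * and a non-zero mantissa. No floating-point comparison is used. */
int is_nan_bits(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7F800000u) == 0x7F800000u
        && (bits & 0x007FFFFFu) != 0;
}

/* Hypothetical helper: scale a sample, rejecting NaN inputs up front.
 * Returns 0 on success and stores the product in *out, or returns -1
 * and leaves *out untouched when either input is NaN. */
int scale_checked(float x, float gain, float *out)
{
    if (is_nan_bits(x) || is_nan_bits(gain))
        return -1;
    *out = x * gain;
    return 0;
}
```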
-funsafe-loop-optimizations: If given, the loop optimizer will assume that loop indices do not overflow, and that loops with a nontrivial exit condition are not infinite. This allows a greater number of C loops to be turned into hardware loops on the Blackfin. This option can result in incorrect output for loops that are in fact infinite or whose indices do overflow.
-funsafe-math-optimizations: This allows optimizations for floating-point arithmetic that assume that arguments and results are valid and that may violate IEEE or ANSI standards. This option can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
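To make the loop assumption concrete, the sketch below contrasts a loop whose trip count is obvious with one that has a nontrivial exit condition. All names are hypothetical. The second loop terminates only when the index lands exactly on end; the stride_is_safe() helper shows the kind of precondition you would need to establish, by analysis or testing, before letting the optimizer assume termination.

```c
/* The index counts up by one and the exit test is "<",
 * so the trip count is plainly n. */
int sum_counted(const short *v, int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Nontrivial exit condition: "!=" with a caller-supplied step.
 * If (end - start) is not a multiple of step, i steps over end and
 * keeps going, so the loop is only correct when the caller
 * guarantees the precondition checked by stride_is_safe(). */
int sum_strided(const short *v, unsigned start, unsigned end, unsigned step)
{
    int sum = 0;
    unsigned i;

    for (i = start; i != end; i += step)
        sum += v[i];
    return sum;
}

/* Precondition for sum_strided(): the index reaches end exactly. */
int stride_is_safe(unsigned start, unsigned end, unsigned step)
{
    return step != 0 && end >= start && (end - start) % step == 0;
}
```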
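One way a program can depend on exact IEEE behavior is through the order of additions. The sketch below uses hypothetical function names and shows two groupings that are algebraically equal but round differently in single precision. Re-association of this kind is an example of a transformation that relaxed math options may permit; whether a given compiler version applies it to this code is not guaranteed either way.

```c
/* (a + b) + c: with a = 1e20f, b = -1e20f, c = 1.0f the large terms
 * cancel first, so the small term survives. */
float sum_left(float a, float b, float c)
{
    return (a + b) + c;
}

/* a + (b + c): the small term is absorbed into the large one by
 * rounding before the cancellation happens. */
float sum_right(float a, float b, float c)
{
    return a + (b + c);
}
```

Under strict IEEE evaluation, sum_left(1e20f, -1e20f, 1.0f) yields 1.0f while sum_right(1e20f, -1e20f, 1.0f) yields 0.0f. Code that relies on one particular grouping should be tested with the exact flags used for the shipped build.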
Blackfin GCC implements built-in functions to make efficient use of hardware resources. Knowledge of these functions is built into the compiler. Please refer to built-in_functions for a list of the built-ins implemented.
Level 1 (L1) memories in Blackfin are closely and efficiently connected to the core for best performance. Using the L1 memory blocks is key to running code on the Blackfin effectively and efficiently. See on-chip_sram for details.
The DSP run-time library contains a broad collection of functions that are commonly required by signal processing applications. Many of the functions are optimized to use built-in functions. See libbfdsp for detailed descriptions.
making_the_blackfin_perform: A guide to optimizing your code, with examples.
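As a rough sketch of what placing code and data in L1 can look like from C, the example below uses the l1_text and l1_data attributes described in the GCC manual for Blackfin. The wrapper macros, the __bfin__ guard, and the function and array names are assumptions made for illustration. Whether the placement takes effect also depends on your linker or loader setup, so treat on-chip_sram as the authoritative description.

```c
/* Assumption: __bfin__ is predefined when targeting Blackfin. On any
 * other target the macros expand to nothing, so the code still builds. */
#ifdef __bfin__
#define IN_L1_TEXT __attribute__((l1_text))
#define IN_L1_DATA __attribute__((l1_data))
#else
#define IN_L1_TEXT
#define IN_L1_DATA
#endif

#define NUM_TAPS 4

/* Hypothetical filter coefficients, requested in L1 data SRAM. */
static short coeffs[NUM_TAPS] IN_L1_DATA = { 1, 2, 3, 4 };

/* Hypothetical hot function, requested in L1 instruction SRAM. */
int fir_tap_sum(const short *x) IN_L1_TEXT;

/* Multiply-accumulate over NUM_TAPS samples. */
int fir_tap_sum(const short *x)
{
    int acc = 0;
    int i;

    for (i = 0; i < NUM_TAPS; i++)
        acc += coeffs[i] * x[i];
    return acc;
}
```

L1 is small, so reserve it for the inner loops and tables that profiling shows to be hot.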