Tuning an application begins with understanding which areas of the application are executed most frequently, and therefore where improvements would provide the largest gains. Profiling lets you learn where your program spent its time while it was executing. This information can show you which pieces of your program are slower than you expected and might be candidates for rewriting to make your program execute faster. It can also tell you which functions are being called more or less often than you expected, which may help you spot bugs that would otherwise have gone unnoticed. Depending on the profiler, a call graph (a record of which functions called which other functions) can sometimes be generated.
Since the profiler uses information collected during the actual execution of your program, it can be used on programs, including the entire kernel, that are too large or too complex to analyze by reading the source. However, how your program is run affects the information that shows up in the profile data. If you don't use some feature of your program while it is being profiled, no profile information will be generated for that feature.
The following profilers are currently supported:
OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead. It consists of a kernel driver and a daemon for collecting sample data, and several post-profiling tools for turning that data into information. For details, see oprofile.
VDSP++'s statistical profiling feature can also be used for Linux applications. The particular advantage of statistical profiling is that, because it works over JTAG, it is nearly 100% unobtrusive to the application. Other forms of profiling insert instrumentation into the code, disturbing the original optimization, code size, and register allocation to some degree. More details about how to use it can be found in the VisualDSP++ User's Guide, and there is also a short guide to statistical profiling.
Kernel-level profiling uses the kernel's built-in profiling facilities. For details, see kernel profiling.
You can also use the hardware performance counters (CYCLES) to profile individual blocks of code. That process is documented in the Making the Blackfin Perform article.
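As a rough illustration of timing a block of code with the cycle counter, here is a minimal sketch. It is not the procedure from the Making the Blackfin Perform article. It assumes a GCC toolchain that defines `__bfin__` when targeting Blackfin, and reads the 64-bit count from the CYCLES and CYCLES2 registers with inline assembly. On any other target it falls back to `clock_gettime` so that the sketch still builds. The names `read_counter`, `work`, and `measure_ticks` are hypothetical helpers introduced only for this example.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Return a 64-bit, monotonically increasing count.
 * Assumption: __bfin__ is defined by the Blackfin GCC toolchain. */
static inline uint64_t read_counter(void)
{
#if defined(__bfin__)
    uint32_t lo, hi;
    /* Read CYCLES first: that read latches the upper 32 bits into CYCLES2. */
    __asm__ __volatile__("%0 = CYCLES; %1 = CYCLES2;" : "=d"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Fallback for non-Blackfin hosts: nanoseconds, not core cycles. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical block of code to measure: sums 0 .. n-1. */
static uint32_t work(uint32_t n)
{
    volatile uint32_t sum = 0; /* volatile keeps the loop from being optimized away */
    for (uint32_t i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Run fn(arg), store its result, and return the elapsed count. */
static uint64_t measure_ticks(uint32_t (*fn)(uint32_t), uint32_t arg,
                              uint32_t *result)
{
    assert(fn != NULL && result != NULL);
    uint64_t start = read_counter();
    *result = fn(arg);
    uint64_t stop = read_counter();
    return stop - start;
}
```

Calling `measure_ticks(work, 100000, &result)` returns the elapsed count for one run of `work`. On Blackfin the unit is core clock cycles, while the fallback reports nanoseconds. The figure includes the cost of the call itself and of any interrupts taken during the run, so repeating the measurement and keeping the minimum usually gives a more stable number.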