
Simulator

The binutils portion of the GNU toolchain includes a simulation environment simply referred to as the Simulator. It is often also called the GDB simulator because of how well it integrates with GDB.

Different architectures have very different feature sets, so here we'll focus on the Blackfin feature set. Currently, it supports:

  • Instruction Set Architecture (ISA) Simulation
  • Core Simulation
    • Core Devices (CEC/MMU/etc…)
      • Supervisor (NMI/Exception/etc…) and Usermode
    • Internal Memory (L1 SRAM)
  • System Simulation
    • System Peripherals (SIC/UART/etc…)
    • External Memory (SDRAM/DDR)

Performance

The performance of the simulator depends on the type of Blackfin instructions you are running and on how fast your host processor is. The simulator is single-threaded, so it cannot take advantage of multi-core hosts. For example, on a 1.83 GHz Intel processor:

$ bfin-elf-run -v --env user ./user/mp3play/mp3play.gdb -w out.raw in.mp3
run ./user/mp3play/mp3play.gdb
bfin-sim: unimplemented syscall 97
in.mp3: MPEG1-III (121238 ms)
bfin-sim: unimplemented syscall 78
bfin-sim: unimplemented syscall 78
Simulator Execution Speed

  Total instructions:      2,583,456,559
  Total execution time   : 929.75 seconds
  Simulator speed:         2,778,657 insns/second

On this host, with this application, the simulator runs at about 2.8 MIPS (2,778,657 instructions per second).
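The reported speed is simply the instruction count divided by the wall-clock time. As a quick sanity check, the sketch below redoes that division for the figures above. The helper name sim_mips is made up for illustration and is not part of the toolchain.

```shell
# sim_mips INSNS SECONDS: print simulator speed in millions of insns/second.
# (Hypothetical helper, not part of the toolchain.)
sim_mips() {
    awk -v insns="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", insns / secs / 1e6 }'
}

sim_mips 2583456559 929.75    # figures from the run above; prints 2.78
```

The same helper can be used to compare runs made with different simulator options, since the instruction count for a given program and input stays the same while the execution time changes.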

Environment

The simulator has different environment modes, selected with the --environment sim option (the examples on this page use the shorter --env spelling). You should also pick a CPU model with the --model sim option.

  • Virtual (the default)
    • Focus on ISA simulation
    • Supervisor mode-only
    • No core or system devices
    • Exceptions are processed virtually (by the simulator)
  • Operating (like the hardware)
    • Full core and system simulation support
    • Starts in supervisor mode, but the CEC may change modes (user/etc…)
    • Exceptions are processed by the simulated code
  • User (like userspace)
    • Focus on userspace applications
    • Usermode-only
    • No core or system devices
    • Exceptions (i.e. system calls) are processed virtually (by the simulator)

Usage

The simulator may be used directly via the run program, or via the GDB sim target.

Run

Options to the simulator must come before the program. Any options after the program are interpreted as arguments to the program itself. Running something like the following will execute the Blackfin program ls with the --help option. It will not pass the --help option to the bfin-elf-run program itself!

$ bfin-elf-run ./ls --help

This example will pass --help to the bfin-elf-run program and not the Blackfin ls program.

$ bfin-elf-run --help ./ls
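One way to avoid mixing up the two kinds of options is to keep the simulator options somewhere other than the command line. The sketch below is only an illustration of that idea: the function bfin_sim_cmd and the variable BFIN_SIM_OPTS are hypothetical names, and the function prints the resulting command instead of running it.

```shell
# Hypothetical wrapper: simulator options come from $BFIN_SIM_OPTS, so every
# argument given to the wrapper ends up after the program name and is
# therefore seen by the Blackfin program, not by bfin-elf-run.
bfin_sim_cmd() {
    printf '%s\n' "bfin-elf-run ${BFIN_SIM_OPTS:+$BFIN_SIM_OPTS }$*"
}

BFIN_SIM_OPTS="--env user"
bfin_sim_cmd ./ls --help    # prints: bfin-elf-run --env user ./ls --help
```

Because the command is only printed, arguments containing spaces lose their quoting; treat the output as something to read, not to paste back blindly.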

User Mode

Only basic Linux userspace support exists in the simulator. Enough syscalls are supported to do basic I/O (stdin/out) and file I/O, but don't expect too much. For FLAT binaries, you must run the statically linked FLAT .gdb ELF file (rather than the raw FLAT file itself). Static FDPIC ELF binaries work fine, but dynamic FDPIC ELF binaries require you to pass the --sysroot option (set it to the bfin-linux-uclibc/runtime/ subdir in the toolchain).

For example, the flat application mp3play:

$ bfin-elf-run --env user ./user/mp3play/mp3play.gdb -w output.raw input.mp3
run ./user/mp3play/mp3play.gdb
bfin-sim: unimplemented syscall 97
/nfs_export/mp3/fpm-calm-river.mp3: MPEG1-III (121238 ms)
bfin-sim: unimplemented syscall 78
bfin-sim: unimplemented syscall 78
$
The files output.raw and input.mp3 will be in the current host directory.

Virtual mode

To run a simple Blackfin ELF in Virtual mode:

$ bfin-elf-run ./some-blackfin-elf

Operating mode

To run a Blackfin ELF of an operating system (like U-Boot or Linux) in Operating mode:

$ bfin-elf-run --env operating --model bf537 ./u-boot

U-Boot 2009.11.1-00099-g1c038a0-dirty (ADI-2010R1-pre) (Apr 02 2010 - 15:05:34)

CPU:   ADSP bf537-0.2 (Detected Rev: 0.0) (bypass boot)
Board: ADI BF537 stamp board
       Support: http://blackfin.uclinux.org/
Clock: VCO: 0.250 MHz, Core: 0.250 MHz, System: 0.050 MHz
RAM:   64 MB
Using default environment

In:    serial
Out:   serial
Err:   serial
KGDB:  [on serial] ready
Hit any key to stop autoboot:  0
## Error: "ramboot" not defined
bfin> bdinfo
bdinfo
U-Boot      = U-Boot 2009.11.1-00099-g1c038a0-dirty (ADI-2010R1-pre) (Apr 02 2010 - 15:05:34)
CPU         = bf537-0.2
Board       = bf537-stamp
VCO         =  0.250 MHz
CCLK        =  0.250 MHz
SCLK        =  0.050 MHz
boot_params = 0x00000000
memstart    = 0x00000000
memsize     = 0x04000000
flashstart  = 0x00000000
flashsize   = 0x00000000
flashoffset = 0x00000000
ethaddr     = (not set)
ip_addr     = 03f1ffa8
baudrate    = 57600 bps
bfin>

GDB

The sim target with GDB is like any other remote GDB target. You give GDB an ELF to debug, connect to the “remote” simulator, load up the ELF, and then execute things.

The sim target takes the same options as the run binary. The format is target sim [options].

To load a simple Blackfin ELF in Virtual mode in GDB and run it:

$ bfin-elf-gdb ./lsetup.s.x
(gdb) target sim
Connected to the simulator.

(gdb) load
Loading section .text, size 0x17c lma 0x0
Loading section .data, size 0x22c lma 0x117c
Start address 0xa8
Transfer rate: 7488 bits in <1 sec.

(gdb) break *_start
Breakpoint 1 at 0xa8: file lsetup.s, line 8.

(gdb) disassemble _start
Dump of assembler code for function _start:
0x000000a8 <_start+0>:  R0 = 0x123 (X);
0x000000ac <_start+4>:  P0 = R0;
0x000000ae <_start+6>:  LSETUP(0x0xb2 <_start+10>, 0x0xb2 <_start+10>) LC0 = P0;
0x000000b2 <_start+10>: R0 += -0x1;
0x000000b4 <_start+12>: R1 = 0x0 (X);
0x000000b6 <_start+14>: CC = R1 == R0;
0x000000b8 <_start+16>: IF CC JUMP 0x0xbe <_start+22>;
0x000000ba <_start+18>: CALL 0x0x52 <_fail>;
0x000000be <_start+22>: P0 = 0xa (X);

(gdb) run
Starting program: svn/toolchain/trunk/binutils-2.17/sim/testsuite/sim/bfin/lsetup.s.x

Breakpoint 1, _start () at lsetup.s:8
8               R0 = 0x123;
Current language:  auto; currently asm

(gdb) stepi
_start () at lsetup.s:9
9               P0 = R0;

(gdb) stepi
_start () at lsetup.s:10
10              LSETUP (.L1, .L1) LC0 = P0;

(gdb) regs
R0: 00000123 291          P0: 00000123  RETS: 00000000  LC0: 00000000 0
R1: 00000000 0            P1: 00000000  RETI: 00000000  LT0: 00000000
R2: 00000000 0            P2: 00000000  RETX: 00000000  LB0: 00000000
R3: 00000000 0            P3: 00000000  RETE: 00000000  LC1: 00000000 0
R4: 00000000 0            P4: 00000000  RETN: 00000000  LT1: 00000000
R5: 00000000 0            P5: 00000000 ASTAT: 00000000  LB1: 00000000
R6: 00000000 0            SP: 01000000    CC: 00000000
R7: 00000000 0           USP: 00000000  CYC1: 00000000  SEQSTAT: 00000001
PC: 000000ae              FP: 00000000  CYC2: 00000000   SYSCFG: 00000030

(gdb) stepi
_start () at lsetup.s:12
12              R0 += -1;

(gdb) regs
R0: 00000123 291          P0: 00000123  RETS: 00000000  LC0: 00000123 291
R1: 00000000 0            P1: 00000000  RETI: 00000000  LT0: 000000b2
R2: 00000000 0            P2: 00000000  RETX: 00000000  LB0: 000000b2
R3: 00000000 0            P3: 00000000  RETE: 00000000  LC1: 00000000 0
R4: 00000000 0            P4: 00000000  RETN: 00000000  LT1: 00000000
R5: 00000000 0            P5: 00000000 ASTAT: 00000000  LB1: 00000000
R6: 00000000 0            SP: 01000000    CC: 00000000
R7: 00000000 0           USP: 00000000  CYC1: 00000000  SEQSTAT: 00000001
PC: 000000b2              FP: 00000000  CYC2: 00000000   SYSCFG: 00000030

(gdb) continue
Continuing.
pass

Program exited normally.
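
The interactive session above can also be scripted. The sketch below writes the same commands into a GDB command file and prints an invocation that would replay them. The file name sim-session.gdb is an arbitrary choice, and the sketch assumes bfin-elf-gdb accepts the standard GDB -batch and -x options.

```shell
# Write the commands from the session above into a GDB command file.
# The file name sim-session.gdb is an arbitrary choice for this sketch.
cat > sim-session.gdb <<'EOF'
target sim
load
break *_start
run
stepi
stepi
continue
EOF

# Print (rather than run) the invocation; -batch and -x are standard GDB options.
echo "bfin-elf-gdb -batch -x sim-session.gdb ./lsetup.s.x"
```

Simulator options can be appended to the target sim line in the command file, in the same way as on the interactive prompt.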

Peripherals

While the majority of the core should be handled, the different Blackfin peripherals may vary greatly in terms of functional completeness. While IRQ routing should be fully working from peripherals through the DMAC, to the SIC, and up to the CEC, not all peripherals simulate interrupt generation yet.

Peripheral  % Complete  Notes
acm                     Nonexistent
atapi       10          Reads/writes to MMR range not checked
can                     Nonexistent
cec         100         Simulation isn't exactly complete, but expected EVT0 behavior is unknown (since it needs an ICE hooked up)
ctimer      100         Hardware & HRM don't seem to match; everything the hardware does should work
dma         90          Striding isn't supported
dmac        70          Only primary peripheral works when multiple are muxed to one channel
ebiu_amc    70          Basic bank control works, and can be mapped to flash (CFI) or raw memory; extended flash MMRs are stubs
ebiu_ddrc   20          Stubs only; status bits clear/stay cleared; external memory fixed at start
ebiu_sdc    20          Stubs only; status bits clear/stay cleared; external memory fixed at start
emac        50          Simple sending/receiving works; MDIO works
eppi        50          Output only; no error detection; data visualized via SDL
gpio/pmux   10          Reads/writes to MMR range not checked
gptimer     20          Stubs only; status bits clear/stay cleared; unified control MMRs not checked
kpad                    Nonexistent
lockbox                 Nonexistent
mmu         90          ITEST not implemented; no cache simulation; everything else works
musb        10          Reads/writes to MMR range not checked
mxvr                    Nonexistent
nfc         20          Stubs only; status bits clear/stay cleared
otp         70          Reads/writes work; no CRC handling; no Lock support; no interrupts
pixc                    Nonexistent
pll         80          MMRs all function as expected, but no plans to bother simulating SCLK/CCLK domains
ppi         50          Output only; no error detection; data visualized via SDL
pwm                     Nonexistent
rsi/sdh     10          Reads/writes to MMR range not checked
rtc         30          Reading time returns host system time; otherwise only basic MMR checks
sic         90          Wakeup sources not supported, but not sure if we should bother; everything else works
spi         20          Stubs only; status bits clear/stay cleared
sport       10          Reads/writes to MMR range not checked
twi         20          Stubs only; status bits clear/stay cleared
trace       100         Hardware behavior should be matched; trace buffer extends even further than hardware
uart        80          Sending/receiving works (DMA+PIO); exotic things (IRDA/CTS/RTS) not simulated
uart2       80          Sending/receiving works (DMA+PIO); exotic things (IRDA/CTS/RTS) not simulated
cnt                     Nonexistent
wdog        20          Stubs only; status bits clear/stay cleared

bfin_emac

The Blackfin on-chip MAC driver can be simulated with the help of the TUN/TAP driver under an operating environment simulation. On your host Linux system, you can set up the host side of the network by doing:

$ sudo tunctl -u ${USER} -t tap-gdb
$ sudo ifconfig tap-gdb 10.1.1.1 up

This should provide the tap-gdb ethernet device.

$ sudo ifconfig tap-gdb
tap-gdb   Link encap:Ethernet  HWaddr f2:f4:c6:fb:0b:4c  
          inet addr:10.1.1.1  Bcast:10.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::f0f4:c6ff:fefb:b4c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:28 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Now the host side has an IP of 10.1.1.1. Have the simulated code use an IP address on the same subnet (10.1.1.2 in the example here) and the two sides can talk to each other. For example, this uses the bfin MAC driver in U-Boot to talk to the host Ethernet device.

$ bin/bfin-elf-run --env operating --model bf537 ./u-boot

U-Boot 2010.06-svn2356 (ADI-2010R1-pre) (Jul 16 2010 - 14:31:48)

CPU:   ADSP bf537-0.2 (Detected Rev: 0.3) (bypass boot)
Board: ADI BF537 stamp board
       Support: http://blackfin.uclinux.org/
Clock: VCO: 500 MHz, Core: 500 MHz, System: 125 MHz
RAM:   64 MiB
Flash: ## Unknown FLASH on Bank 1 - Size = 0x00000000 = 0 MB
0 Bytes
In:    serial
Out:   serial
Err:   serial
KGDB:  [on serial] ready
Warning: Generating 'random' MAC address
Net:   bfin_mac
Hit any key to stop autoboot:  5  
bfin> set ipaddr 10.1.1.2
bfin> set serverip 10.1.1.1
bfin> ping $(serverip)
Using bfin_mac device
host 10.1.1.1 is alive
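
On hosts where tunctl and ifconfig are not installed, the iproute2 ip tool can usually create the same TAP device. The sketch below prints a set of ip commands intended to mirror the tunctl/ifconfig setup above. It assumes an iproute2 build with tuntap support, it has not been checked against the simulator, and the helper name tap_setup_cmds is made up for illustration. The /8 prefix matches the 255.0.0.0 mask shown in the ifconfig output.

```shell
# tap_setup_cmds DEV ADDR/PREFIX [USER]: print iproute2 commands that create
# a TAP device owned by USER (default: the current user) and bring it up.
# Hypothetical helper; it only prints, so review the output and then run
# the commands with root privileges.
tap_setup_cmds() {
    dev=$1
    addr=$2
    owner=${3:-$(id -un)}
    echo "ip tuntap add dev $dev mode tap user $owner"
    echo "ip addr add $addr dev $dev"
    echo "ip link set dev $dev up"
}

tap_setup_cmds tap-gdb 10.1.1.1/8
```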

Tracing

The simulator supports a variety of tracing methods. Not all are implemented, so don't be surprised if you use one and see no output related to it.

$ bfin-elf-run --trace-insn --trace-branch --trace-core --trace-linenum ./simple0.s.x
reg:      wrote SP = 0x4000000
reg:      wrote SYSCFG = 0x30
reg:      wrote PC = 0xa8
core:     0x0000a8 #7    __start                           -IBUS FETCH 2 bytes @ 0x000000a8: 0x6028
insn:     0x0000a8 #7    __start                           -R0 = 0x5 (X);
reg:      0x0000a8 #7    __start                           -wrote R0 = 0x5
reg:      0x0000a8 #7    __start                           -wrote PC = 0xaa
core:     0x0000aa #8    __start                           -IBUS FETCH 2 bytes @ 0x000000aa: 0x67f8
insn:     0x0000aa #8    __start                           -R0 += -0x1;
reg:      0x0000aa #8    __start                           -wrote ASTAT[an] = 0
reg:      0x0000aa #8    __start                           -wrote ASTAT[v] = 0
reg:      0x0000aa #8    __start                           -wrote ASTAT[az] = 0
reg:      0x0000aa #8    __start                           -wrote ASTAT[ac0] = 1
reg:      0x0000aa #8    __start                           -wrote R0 = 0x4
reg:      0x0000aa #8    __start                           -wrote PC = 0xac
core:     0x0000ac #9    __start                           -IBUS FETCH 2 bytes @ 0x000000ac: 0xf000
core:     0x0000ac #9    __start                           -IBUS FETCH 2 bytes @ 0x000000ae: 0x0004
insn:     0x0000ac #9    __start                           -DBGA (R0.L, 0x4);
reg:      0x0000ac #9    __start                           -wrote PC = 0xb0
core:     0x0000b0 #10   __start                           -IBUS FETCH 2 bytes @ 0x000000b0: 0xe3ff
core:     0x0000b0 #10   __start                           -IBUS FETCH 2 bytes @ 0x000000b2: 0xffb8
insn:     0x0000b0 #10   __start                           -CALL 0xffffff70;
branch:   0x0000b0 #10   __start                           -CALL to 0x20
reg:      0x0000b0 #10   __start                           -wrote RETS = 0xb4
reg:      0x0000b0 #10   __start                           -wrote PC = 0x20
core:     0x000020 #4    __pass                            -IBUS FETCH 2 bytes @ 0x00000020: 0xe148
core:     0x000020 #4    __pass                            -IBUS FETCH 2 bytes @ 0x00000022: 0x0000
insn:     0x000020 #4    __pass                            -P0.H = 0;
reg:      0x000020 #4    __pass                            -wrote P0 = 0
reg:      0x000020 #4    __pass                            -wrote PC = 0x24
...
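
Trace output gets large quickly, so it usually helps to filter it. The sketch below assumes the trace text has already been captured and is fed in on standard input. It keeps only the insn: lines and counts how many instructions were executed in each symbol. The helper name insn_counts is made up for illustration, and the field positions are taken from the sample output above.

```shell
# insn_counts: read trace text on stdin and print "<count> <symbol>" for
# each symbol, counting only the "insn:" lines. Hypothetical helper.
insn_counts() {
    awk '$1 == "insn:" { n[$4]++ } END { for (s in n) print n[s], s }' |
        sort -k1,1nr -k2
}

# A few lines copied from the trace above.
insn_counts <<'EOF'
insn:     0x0000a8 #7    __start                           -R0 = 0x5 (X);
reg:      0x0000a8 #7    __start                           -wrote R0 = 0x5
insn:     0x0000aa #8    __start                           -R0 += -0x1;
insn:     0x0000ac #9    __start                           -DBGA (R0.L, 0x4);
insn:     0x0000b0 #10   __start                           -CALL 0xffffff70;
insn:     0x000020 #4    __pass                            -P0.H = 0;
EOF
```

For the sample lines this prints 4 __start followed by 1 __pass. Changing the "insn:" match to "branch:" or "reg:" gives the same kind of summary for the other trace categories.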

Profiling

Various profiling options can be given. For example, --profile-pc samples the program counter:

$ bfin-elf-run --profile-pc -v --env user ./user/mp3play/mp3play.gdb -w output.raw input.mp3
run ./user/mp3play/mp3play.gdb
bfin-sim: unimplemented syscall 97
input.mp3: MPEG1-III (121238 ms)
bfin-sim: unimplemented syscall 78
bfin-sim: unimplemented syscall 78
Summary profiling results:

Program Counter Statistics:

  Total samples: 10,052,360
  Granularity: 8,192 bytes per bucket
  Size: 16 buckets
  Frequency: 257 cycles per sample
  Range: 0x0 0x20000

  0x00000000:      6,334  0.1:
  0x00002000:  1,682,325 16.7: ***************
  0x00004000:    901,640  9.0: ********
  0x00006000:  4,437,908 44.1: ****************************************
  0x00008000:  2,999,642 29.8: ***************************
  0x0000a000:     24,440  0.2:
  0x0000c000:         63  0.0:
  0x0000e000:          8  0.0:

Simulator Execution Speed

  Total instructions:      2,583,456,559
  Total execution time   : 932.65 seconds
  Simulator speed:         2,770,017 insns/second
A histogram of the program counter is provided.
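
The fields in the summary are related by simple arithmetic, which is useful when checking that a run was sampled the way you intended. The sketch below recomputes two of them from the figures above. That the sample count equals the instruction count divided by the sampling frequency is an observation from this run's numbers, not a documented guarantee.

```shell
# Figures copied from the summary above.
insns=2583456559      # Total instructions
freq=257              # Frequency: cycles per sample
range=$((0x20000))    # Upper bound of the reported Range
gran=8192             # Granularity: bytes per bucket

echo "samples: $((insns / freq))"    # prints: samples: 10052360
echo "buckets: $((range / gran))"    # prints: buckets: 16
```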

The number of PC samples can be changed with --profile-pc-frequency (the number of cycles that pass between PC samples), and --profile-pc-granularity sets the size of the bins in the histogram. Sampling more often or using smaller bins slows the simulator down. For example:

$ bfin-elf-run --profile-pc --profile-pc-granularity 1024 --profile-pc-frequency 1 -v --env user ./user/mp3play/mp3play.gdb -w output.raw in.mp3
run ./user/mp3play/mp3play.gdb
bfin-sim: unimplemented syscall 97
in.mp3: MPEG1-III (121238 ms)
bfin-sim: unimplemented syscall 78
bfin-sim: unimplemented syscall 78
Summary profiling results:

Program Counter Statistics:

  Total samples: 2,583,456,559
  Granularity: 1,024 bytes per bucket
  Size: 70 buckets
  Frequency: 1 cycles per sample
  Range: 0x0 0x11800

  0x00000000:      12,957  0.0:
  0x00000400:     232,225  0.0:
  0x00000800:          99  0.0:
  0x00000c00:         157  0.0:
  0x00001000:      92,934  0.0:
  0x00001400:     372,446  0.0:
  0x00001800:     455,003  0.0:
  0x00001c00:     454,504  0.0:
  0x00002000:   2,397,115  0.1:
  0x00002400: 145,451,550  5.6: ************************
  0x00002800: 153,068,129  5.9: *************************
  0x00002c00:     129,948  0.0:
  0x00003000:  56,338,159  2.2: *********
  0x00003800:  70,734,046  2.7: ***********
  0x00003c00:   4,273,825  0.2:
  0x00004000:   4,433,501  0.2:
  0x00004400:   8,708,839  0.3: *
  0x00004c00:   3,693,004  0.1:
  0x00005000:   2,358,656  0.1:
  0x00005400: 207,597,745  8.0: **********************************
  0x00005800:   3,080,881  0.1:
  0x00005c00:   1,812,873  0.1:
  0x00006000: 119,947,811  4.6: ********************
  0x00006400: 172,422,432  6.7: ****************************
  0x00006800: 136,668,168  5.3: **********************
  0x00006c00:   1,670,760  0.1:
  0x00007000:   4,740,794  0.2:
  0x00007400: 236,056,788  9.1: ***************************************
  0x00007800: 239,330,640  9.3: ****************************************
  0x00007c00: 229,662,084  8.9: **************************************
  0x00008000: 158,777,161  6.1: **************************
  0x00008400:  19,785,449  0.8: ***
  0x00008800:  22,187,520  0.9: ***
  0x00008c00: 157,194,704  6.1: **************************
  0x00009000: 161,369,152  6.2: **************************
  0x00009400: 148,607,816  5.8: ************************
  0x00009800: 103,023,715  4.0: *****************
  0x0000a000:          60  0.0:
  0x0000a400:      67,023  0.0:
  0x0000a800:          66  0.0:
  0x0000ac00:         218  0.0:
  0x0000b000:         848  0.0:
  0x0000b400:       3,665  0.0:
  0x0000b800:         152  0.0:
  0x0000bc00:   6,222,694  0.2: *
  0x0000c000:         334  0.0:
  0x0000c800:       3,215  0.0:
  0x0000cc00:      11,637  0.0:
  0x0000d000:         290  0.0:
  0x0000d400:          89  0.0:
  0x0000d800:         778  0.0:
  0x0000dc00:         710  0.0:
  0x0000e000:         204  0.0:
  0x0000e400:         119  0.0:
  0x0000e800:         767  0.0:
  0x0000f000:          38  0.0:
  0x0000f400:          43  0.0:
  0x00011400:          19  0.0:

Simulator Execution Speed

  Total instructions:      2,583,456,559
  Total execution time   : 1458.86 seconds
  Simulator speed:         1,770,873 insns/second

The overhead of sampling the PC on every cycle slows the simulator down to about 1.8 MIPS. However, the hot spots in the code are now much easier to find, and tools like bfin-elf-nm or bfin-elf-objdump can identify the functions in those address ranges that are worth optimizing.
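
As a starting point for that search, the sketch below picks the busiest buckets out of a saved histogram. It assumes the histogram lines have been captured and are fed in on standard input in the format shown above. The helper name top_buckets is made up for illustration.

```shell
# top_buckets N: read a --profile-pc histogram on stdin and print the N
# buckets with the most samples as "<start address> <samples> <percent>".
# Hypothetical helper for this sketch.
top_buckets() {
    awk '$1 ~ /^0x[0-9a-f]+:$/ {
             addr = $1; count = $2; pct = $3
             sub(/:$/, "", addr)      # drop the trailing colon
             gsub(/,/, "", count)     # drop thousands separators
             sub(/:$/, "", pct)
             print addr, count, pct
         }' | sort -k2,2nr | head -n "$1"
}

# A few lines copied from the histogram above.
top_buckets 3 <<'EOF'
  0x00007000:   4,740,794  0.2:
  0x00007400: 236,056,788  9.1: ***************************************
  0x00007800: 239,330,640  9.3: ****************************************
  0x00007c00: 229,662,084  8.9: **************************************
  0x00008000: 158,777,161  6.1: **************************
EOF
```

For the sample lines this lists 0x00007800, 0x00007400, and 0x00007c00, in that order. Each address is the start of a 1,024-byte bucket, so a symbol listing sorted by address (for example from bfin-elf-nm -n on the same .gdb file) shows which functions fall inside it.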