The SMP support in certain Blackfin processors is described as “SMP Like” rather than just “SMP” because the hardware lacks cache coherency. A true SMP system would support cache coherency in hardware.
On all “SMP Like” setups, cache coherency is maintained via software mechanisms. The result has a few significant implications:
On systems where the L1 SRAM cannot be accessed directly from another core (such as the BF561), dedicated L1 SRAM cannot be used in the kernel. Care must be taken if using L1 SRAM from userspace (making sure applications have their affinity set to a specific core).
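Pinning an application to one core can be sketched with the standard Linux `sched_setaffinity()` call. This is a minimal illustration, not code taken from the Blackfin tree; the helper names `first_allowed_cpu()`, `pin_to_cpu()` and `is_pinned_to()` are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Return the lowest-numbered CPU this process may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling process to a single core (0 on success, -1 on error). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return 1 if the calling process is restricted to exactly `cpu`. */
static int is_pinned_to(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

An application that uses L1 SRAM would call `pin_to_cpu()` once at startup, before allocating or touching any L1 memory, so that it never migrates to the other core afterwards.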
Enabling SMP on the BF561 is straightforward. In the kernel configuration, select:
Linux Kernel Configuration
  Blackfin Processor Options --->
    CPU (BF561)
    [*] Symmetric multi-processing support
Several places indicate that the SMP kernel is working:
root:/> dmesg | grep SMP
Linux version 2.6.28-rc2-ADI-2009R1-pre (ymm@gyang) (gcc version 4.1.2 (ADI svn)) #752 SMP Mon Dec 22 13:38:05 CST 2008
SMP: Total of 2 processors activated (1179.64 BogoMIPS).
root:/> cat /proc/cpuinfo
processor : 0
vendor_id : Analog Devices
cpu family : 0x27bb
model name : ADSP-BF561 600(MHz CCLK) 100(MHz SCLK) (mpu off)
stepping : 3
cpu MHz : 600.000/100.000
bogomips : 1179.64
Calibration : 589824000 loops
cache size : 16 KB(L1 icache) 32 KB(L1 dcache-wt) 0 KB(L2 cache)
dbank-A/B : cache/cache
icache setup : 4 Sub-banks/4 Ways, 32 Lines/Way
dcache setup : 2 Super-banks/4 Sub-banks/2 Ways, 64 Lines/Way
SMP Dcache Flushes : 31241

processor : 1
vendor_id : Analog Devices
cpu family : 0x27bb
model name : ADSP-BF561 600(MHz CCLK) 100(MHz SCLK) (mpu off)
stepping : 3
cpu MHz : 600.000/100.000
bogomips : 1179.64
Calibration : 589824000 loops
cache size : 16 KB(L1 icache) 32 KB(L1 dcache-wt) 0 KB(L2 cache)
dbank-A/B : cache/cache
icache setup : 4 Sub-banks/4 Ways, 32 Lines/Way
dcache setup : 2 Super-banks/4 Sub-banks/2 Ways, 64 Lines/Way
SMP Dcache Flushes : 28537

L2 SRAM : 128KB
board name : ADI BF561-EZKIT
board memory : 65536 kB (0x00000000 -> 0x04000000)
kernel memory : 57336 kB (0x00001000 -> 0x037ff000)
root:/> cat /proc/interrupts
 35:          0          0   INTN  BFIN_UART_RX
 36:        835        145   INTN  BFIN_UART_TX
 42:      14163      14086   INTN  Blackfin Timer Tick
 69:        198        178   INTN  SMP interrupt
 82:          1          0   GPIO  eth0
Err:          0
The second and third columns are the interrupt counts on CoreA and CoreB, respectively.
While the BF561 has one atomic instruction (TESTSET), it has significant restrictions. Basically, it can only be used on L2 regions of memory. So all inter-core locks are stored in L2 memory.
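To illustrate the kind of lock a single test-and-set primitive allows, here is a minimal sketch in portable C11. It uses `atomic_flag` as a stand-in for TESTSET and is not the Blackfin assembly; the names `l2_lock`, `sketch_testset()`, `sketch_lock()` and `sketch_unlock()` are hypothetical, and on real hardware the lock word would have to live in L2 memory.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for a lock word that would be placed in L2 memory. */
static atomic_flag l2_lock = ATOMIC_FLAG_INIT;

/*
 * Atomically test the flag and set it.
 * Returns true if the flag was clear, i.e. the caller now owns the lock.
 */
static bool sketch_testset(atomic_flag *flag)
{
    return !atomic_flag_test_and_set_explicit(flag, memory_order_acquire);
}

/* Spin until the test-and-set succeeds. */
static void sketch_lock(atomic_flag *flag)
{
    while (!sketch_testset(flag)) {
        /* busy-wait until the owner releases the lock */
    }
}

/* Release the lock by clearing the flag. */
static void sketch_unlock(atomic_flag *flag)
{
    atomic_flag_clear_explicit(flag, memory_order_release);
}
```

Because the primitive only offers "test and set", anything richer (counters, read/write locks, atomic arithmetic) has to be built on top of a lock like this one.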
Here is a comparison between the BF561 and a typical X86:
|Feature||BF561||x86|
|Cache Coherency||N/A||Cache coherency protocols|
|Atomic instruction||TESTSET||Lock# signal/Lock Prefix|
|Local interrupt controller||CEC||LAPIC|
|System interrupt controller||SIC(SICA,SICB)||IOAPIC|
|Local timer||Core timer||LAPIC timer|
The overhead of keeping write-back caches coherent in software is so significant that write-back mode is unusable, so only write-through cache mode is supported.
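All spin lock, atomic, and memory barrier operations must first obtain the Core Lock, a single inter-core lock that, like all inter-core locks, is stored in L2 memory. Holding it protects the atomic data while it is being modified.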
We will cover the spin lock and unlock operations as all other spin lock operations can be trivially extrapolated from here.
A call to spin_lock() does:
A call to spin_unlock() does:
All atomic operations do:
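The pattern shared by these three operations, taking the Core Lock before touching the lock word or the atomic variable, can be sketched in portable C as follows. This is an assumption-laden illustration rather than the kernel's code: `core_lock`, `sketch_spinlock_t` and every `sketch_*` function are hypothetical names, C11 `atomic_flag` stands in for TESTSET, and the sketch omits the cache maintenance that a system without hardware coherency would also need.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the global Core Lock (hypothetical name). */
static atomic_flag core_lock = ATOMIC_FLAG_INIT;

static void get_core_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&core_lock, memory_order_acquire)) {
        /* spin until the other core drops the Core Lock */
    }
}

static void put_core_lock(void)
{
    atomic_flag_clear_explicit(&core_lock, memory_order_release);
}

/* A spin lock is just a word that is only examined under the Core Lock. */
typedef struct {
    volatile int locked;
} sketch_spinlock_t;

/* One acquisition attempt: returns true if the spin lock was taken. */
static bool sketch_spin_trylock(sketch_spinlock_t *lock)
{
    bool got;

    get_core_lock();
    got = (lock->locked == 0);
    if (got)
        lock->locked = 1;
    put_core_lock();
    return got;
}

/* Retry until the spin lock is free; the Core Lock is dropped between tries. */
static void sketch_spin_lock(sketch_spinlock_t *lock)
{
    while (!sketch_spin_trylock(lock)) {
        /* busy-wait */
    }
}

static void sketch_spin_unlock(sketch_spinlock_t *lock)
{
    get_core_lock();
    lock->locked = 0;
    put_core_lock();
}

/* Atomic read-modify-write: the whole update happens under the Core Lock. */
static int sketch_atomic_add_return(volatile int *value, int delta)
{
    int result;

    get_core_lock();
    *value += delta;
    result = *value;
    put_core_lock();
    return result;
}
```

Note that the Core Lock is held only for the few instructions that inspect or update the protected word, never while spinning on a contended spin lock, so one core waiting on a lock does not block the other core's atomic operations.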
We defined a global variable, barrier_mask, located in L2 SRAM, to record whether a barrier operation has crossed cores. When it has, smp_rmb() and smp_mb() invalidate the entire data cache.
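One way such a mask could work is sketched below. The text does not spell out the encoding, so the one-bit-per-core layout, the split into a "mark" and a "check" step, and all `sketch_*` names are assumptions made for illustration; the counter `dcache_invalidations` merely stands in for the real cache invalidation.

```c
#include <stdatomic.h>

#define SKETCH_NR_CPUS 2

/* Stand-in for the barrier_mask variable kept in L2 SRAM. */
static atomic_uint barrier_mask;

/* Counts how often each core would have invalidated its data cache. */
static unsigned int dcache_invalidations[SKETCH_NR_CPUS];

/*
 * Write side: flag every other core so that its next read barrier
 * knows it may be holding stale cache lines.
 */
static void sketch_mark_barrier(unsigned int cpu)
{
    unsigned int all = (1u << SKETCH_NR_CPUS) - 1u;
    unsigned int others = all & ~(1u << cpu);

    atomic_fetch_or(&barrier_mask, others);
}

/*
 * Read side: if this core has been flagged, clear its bit and
 * invalidate its data cache. Returns 1 if an invalidation happened.
 */
static int sketch_check_barrier(unsigned int cpu)
{
    unsigned int bit = 1u << cpu;
    unsigned int old = atomic_fetch_and(&barrier_mask, ~bit);

    if (old & bit) {
        dcache_invalidations[cpu]++;   /* real code would invalidate here */
        return 1;
    }
    return 0;
}
```

The point of the mask is to pay for a full data cache invalidation only when another core has actually issued a barrier since the last check, rather than on every smp_rmb() or smp_mb().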
Whetstone test platform,
|whetstone & whetstone||30s & 30s||15~18s & 18~24s|
The “SMP Like” kernel supports POSIX IPC semaphores and message queues. Memory shared between two processes running simultaneously on different cores has cache coherency issues. If you intend to use POSIX shm, you can:
If you do not need POSIX compliance, you can: