Parallel processing refers to speeding up the execution of a program by dividing it into multiple fragments that can execute simultaneously, each on its own processor. A program executed across n processors might run up to n times faster than it would on a single processor.
Traditionally, multiple processors were provided within a specially designed “parallel computer”; along these lines, Linux now supports SMP systems in which multiple processors share a single memory and bus interface within a single computer. It is also possible for a group of computers (for example, a group of PCs each running Linux) to be interconnected by a network to form a parallel-processing cluster. The third alternative for parallel computing using Linux is to use the multimedia instruction extensions (i.e., MMX) to operate in parallel on vectors of integer data. Finally, it is also possible to use a Linux system as a “host” for a specialized attached parallel-processing compute engine. All these approaches are discussed in detail in the Parallel-Processing HOWTO; the fourth (a specialized attached parallel-processing compute engine) is described here.
The classical trade-off between system performance and ease of programming is one of the primary differentiators between general purpose operating systems (GPOS) and real-time operating systems (RTOS).
GPOSes tend to provide a higher degree of resource abstraction. This improves application portability and ease of development, and increases system robustness through software modularity and isolation of resources. This makes a GPOS ideal for addressing general purpose system components such as networking, user interface and display management.
However, this abstraction sacrifices the fine-grained control of system resources required to meet the performance goals of computationally intensive algorithms such as signal processing code. For this level of control, developers typically turn to a real-time operating system (RTOS), or program directly on bare metal.
There are various use cases for loading bare metal applications, or applications running under a real-time OS, onto Core B and using it like a hardware accelerator.
There are various things that can be done to accelerate some task which normally runs under the Linux kernel.
Running an optimized H.264 or MPEG or WMV video codec on Core B, with mplayer running on Core A.
The decoder does nothing except decode the video stream into frame buffers. Mplayer opens an H.264 bit stream from either a file on disk or a network connection.
The Linux Crypto API offers hardware acceleration support.
There are times when the hard real-time performance offered by the Linux kernel or by ADEOS is not enough for the application. In those cases, you can still run a thin RTOS (VDK, uCos, etc.) on Core B and Linux on Core A.
This section is intended to define the communication well enough that different implementations can successfully communicate.
There are a fixed number of shared variables with sizes and addresses known to each processor.
There is a shared variable for the basic message queue. Protocols that use this queue may require additional shared variables or may require individual processors to have a pool of shareable memory from which buffers can be allocated.
If processors have different word sizes and address maps then addresses of shared buffers and the size of addressable units could differ, and the protocol would need to define a common address representation and addressable unit. We also define types that have at least 16 bits and 32 bits with size larger than or equal to the smallest addressable unit.
We will use the data types:
typedef 'some unsigned integer' sm_unit_t; // defined in specifics
typedef 'some unsigned integer' sm_uint16_t; // defined in specifics
typedef 'some unsigned integer' sm_uint32_t; // defined in specifics
typedef 'some integral type' sm_address_t; // defined in specifics
Both cores on BF561 and BF60x have the same address space and are byte addressed.
typedef uint8_t sm_unit_t;
typedef uint16_t sm_uint16_t;
typedef uint32_t sm_uint32_t;
typedef void *sm_address_t;
One of the assumptions of the MCAPI/ICC protocol is that the payload buffer received on one core is located in a memory region managed (owned) by the other core.
Core 0 should set up write-through CPLB entries for the memory region managed by core 1. That way, the invalidate instruction on core 0 neither flushes dummy data in its cache back over the MCAPI payload buffer sent by core 1, nor drops unrelated data that shares a cache line with the MCAPI payload boundary. Which CPLB entries (WT/WB) core 1 sets up for the same memory region does not matter, because core 1 should flush the MCAPI payload buffer before sending.
For example:
| BF609 mem addr | Owner | Core0 cache | Core1 cache |
|---|---|---|---|
| 0~0x3FFFFF | Core0 | WB | WT |
| 0x400000~0x800000 | Core1 | WT | WB |
The specific part defines a type which may be read and written atomically. Two operations are defined on the type: Read and Write, and it may hold values of sm_uint16_t.
// defined at specifics
typedef 'some type' sm_atomic_t;
sm_uint16_t sm_read_atomic(volatile sm_atomic_t *);
void sm_write_atomic(volatile sm_atomic_t *, sm_uint16_t);
Atomic means that if one core writes a variable and another core reads it, the value read is either the value before the write or the value after it and not some third value because the write had only half completed when the read occurred.
Atomic operations on a single core must also be ordered with respect to each other so the following logic holds.
// initial values
sm_atomic_t a = 0, b = 0;
on processor 0:
sm_write_atomic(&a, 1);
while (sm_read_atomic(&b) == 0)
    ;
on processor 1:
sm_write_atomic(&b, 1);
x = sm_read_atomic(&a);
assert(x == 1); // because read(b) must follow write(a) on processor 0
On BF561 and BF60x both cores use the same bus to L2 and the EBIU. L2 memory is 64 bits wide and memory attached to the EBIU is at least 16 bits wide, so an uncached 16-bit write to L2 or L3 is atomic.
typedef uint16_t sm_atomic_t;
// names match the sm_write_atomic/sm_read_atomic declarations above
inline void sm_write_atomic(volatile sm_atomic_t *a, sm_uint16_t v) {
    *a = v;
}
inline sm_uint16_t sm_read_atomic(volatile sm_atomic_t *a) {
    return *a;
}
Each processor must be able to raise interrupts on the other processor. We use one interrupt on each core which indicates some action is required. The interrupt handler works out what the action is from the channel state. So all modifications to shared data should be visible to both processors by the time the interrupt handler is entered.
The mechanism for initialising interrupt handlers and clearing the interrupt source is necessarily processor and environment specific. The initialisation sequence described below requires the interrupt to be initially masked, which is usually the case.
| CPU | master core ICC interrupt | slave core ICC interrupt |
|---|---|---|
| BF561 | core supplemental interrupt 0 | core supplemental interrupt 0 |
| BF60x | SEC soft interrupt 0 | SEC soft interrupt 1 |
The protocol does not define the core interrupt vectors used to handle these interrupts, or whether they are shared with other interrupt sources, as this is a decision local to the environment running on the core.
Modifications to shared data are made visible to the other core before raising the interrupt by
The interrupted core is responsible for ensuring the initial reads of shared data are not from cache.
The protocol is for two way communication between two processors. If there are more processors in the system then the protocol could be used for separate two way channels between each pair of processors.
There are four message queues. Two in each direction, one for high priority messages and the other for standard priority.
Message queues are circular buffers containing SM_MSGQ_LEN fixed size messages.
typedef struct {
sm_atomic_t sent;
sm_atomic_t received;
sm_msg_t buf[SM_MSGQ_LEN];
} sm_msgq_t;
The size and content of sm_msg_t is defined in Part 2 below.
SM_MSGQ_LEN is a constant. For efficiency it should be a power of two.
The message queue uses a lockless protocol. The sender always writes a message at sent % SM_MSGQ_LEN and then increments sent, and the receiver always reads from received % SM_MSGQ_LEN and then increments received. The number of messages in the queue is (sm_uint16_t)(sent - received). The counters are unsigned so, due to the wonders of modulo arithmetic, this is true even if received > sent because sent has wrapped round.
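The wrap-around behaviour described above can be checked with a small sketch. The helper names `sm_msgq_count` and `sm_msgq_slot` and the value chosen for `SM_MSGQ_LEN` are assumptions made for illustration; they are not part of the protocol.

```c
#include <stdint.h>

#define SM_MSGQ_LEN 16u /* assumed value; the protocol only recommends a power of two */

/* Number of messages in the queue. The cast back to 16 bits keeps the
 * result correct even after 'sent' has wrapped past 65535. */
static uint16_t sm_msgq_count(uint16_t sent, uint16_t received)
{
    return (uint16_t)(sent - received);
}

/* Buffer slot for a counter value. With a power-of-two length the
 * modulo reduces to a mask. */
static uint16_t sm_msgq_slot(uint16_t counter)
{
    return (uint16_t)(counter & (SM_MSGQ_LEN - 1u));
}
```

For example, with sent = 2 (after wrapping) and received = 65534, `sm_msgq_count` returns 4. A power-of-two length also keeps the slot sequence continuous when a 16-bit counter wraps, because the length then divides 65536 evenly.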
Before sending a message the sender checks that there is space available in the buffer. If space is available the sender writes the message to buf[sent % SM_MSGQ_LEN], increments sent, then raises the 'Action Required' interrupt on the other processor. If no space is available in the buffer the calling process must block.
The handler for the 'Action Required' interrupt causes a receiver for both high and standard priority queues to run. Whether the receivers execute within the handler or are just scheduled to run once it returns is environment dependent.
The receiver checks the number of messages in the buffer. If there are any it reads the message at buf[received % SM_MSGQ_LEN] and tries to deliver it. If successful it increments received and raises the 'Action Required' interrupt on the sending core.
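The send and receive steps can be sketched as follows for a single queue. This is a minimal illustration under stated assumptions: `demo_msg_t`, `demo_msgq_t`, `demo_send` and `demo_receive` are hypothetical names, plain `volatile` counters stand in for `sm_atomic_t`, and cache maintenance and the 'Action Required' interrupt are only marked by comments.

```c
#include <stdbool.h>
#include <stdint.h>

#define SM_MSGQ_LEN 4u /* assumed; should be a power of two */

/* Stand-in for sm_msg_t (hypothetical, reduced for the example). */
typedef struct {
    uint32_t type;
    uint32_t length;
    void *payload;
} demo_msg_t;

typedef struct {
    volatile uint16_t sent;     /* written only by the sender */
    volatile uint16_t received; /* written only by the receiver */
    demo_msg_t buf[SM_MSGQ_LEN];
} demo_msgq_t;

/* Sender: returns false when the queue is full, in which case the
 * calling process would have to block. */
static bool demo_send(demo_msgq_t *q, const demo_msg_t *m)
{
    uint16_t sent = q->sent;
    if ((uint16_t)(sent - q->received) >= SM_MSGQ_LEN)
        return false;
    q->buf[sent % SM_MSGQ_LEN] = *m;
    q->sent = (uint16_t)(sent + 1u); /* publish after the message is written */
    /* a real sender would now raise 'Action Required' on the other core */
    return true;
}

/* Receiver: returns false when the queue is empty. */
static bool demo_receive(demo_msgq_t *q, demo_msg_t *out)
{
    uint16_t received = q->received;
    if ((uint16_t)(q->sent - received) == 0u)
        return false;
    *out = q->buf[received % SM_MSGQ_LEN];
    q->received = (uint16_t)(received + 1u); /* frees the slot for the sender */
    /* a real receiver would now raise 'Action Required' on the sending core */
    return true;
}
```

Because each counter has exactly one writer, neither side needs a lock; the only ordering requirement is that the message is written before `sent` is incremented.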
The interrupt handler also checks whether space has become available in any queue on which processes are blocked.
The definition of a process, the mechanism for blocking a process, and the method of dealing with the race condition between the sender blocking and the receiver raising the interrupt is processor and environment specific and outside the scope of the protocol. For example in a bare metal environment there is only one “process” and it can block by spinning on a variable that is set by the interrupt handler whereas other environments would use operating system primitives.
A message channel between a pair of processors is composed of 4 message queues.
typedef struct {
volatile sm_msgq_t msgq[2][2];
} sm_channel_t;
The processor with the smaller cpu_id receives on msgq[priority][0] and sends on msgq[priority][1]. The processor with the larger cpu_id does the opposite: it receives on msgq[priority][1] and sends on msgq[priority][0]. The message queue id can be determined from the current cpu, the destination cpu and the cpu that sent the inter-processor interrupt.
if (cur_cpuid == ipi_src_cpuid || cur_cpuid == des_cpuid)
    BUG();
recv_msgq_id = cur_cpuid < ipi_src_cpuid ? 0 : 1;
send_msgq_id = cur_cpuid < des_cpuid ? 1 : 0;
Each processor receives high priority messages on the queue msgq[0][recv_msgq_id] and standard priority messages on msgq[1][recv_msgq_id]. Likewise, it sends high priority messages on msgq[0][send_msgq_id] and standard priority messages on msgq[1][send_msgq_id].
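The queue selection rule can be wrapped in two small helpers to show that the two ends of a channel always agree on which queue carries which direction. The function names are hypothetical and chosen only for this sketch.

```c
#include <stdint.h>

/* Second index of msgq[priority][..] on which this core receives
 * messages coming from 'peer_cpuid'. */
static int sm_recv_msgq_id(uint32_t cur_cpuid, uint32_t peer_cpuid)
{
    return cur_cpuid < peer_cpuid ? 0 : 1;
}

/* Second index of msgq[priority][..] on which this core sends
 * messages to 'peer_cpuid'. */
static int sm_send_msgq_id(uint32_t cur_cpuid, uint32_t peer_cpuid)
{
    return cur_cpuid < peer_cpuid ? 1 : 0;
}
```

For cores 0 and 1, core 0 sends on index 1 and core 1 receives on index 1, while the opposite direction uses index 0, so the two directions never share a queue.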
If there are N processors in the architecture, there should be a channel array of (N - 1) * N / 2 channels. The channel arrays exist at a known location of the shared memory. The channel id can be identified according to current cpu and remote cpu.
Message Channel ID Table
| Processor ID | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | NA | 0 | 1 | 2 |
| 1 | NA | NA | 3 | 4 |
| 2 | NA | NA | NA | 5 |
| 3 | NA | NA | NA | NA |
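#define CPU_NUM 4
#define CHANNEL_NUM ((CPU_NUM - 1) * CPU_NUM / 2)
sm_channel_t channels[CHANNEL_NUM];
// channel id for each (smaller cpu id, larger cpu id) pair; -1 where undefined
int8_t channel_table[CPU_NUM][CPU_NUM] = {
    {-1,  0,  1,  2},
    {-1, -1,  3,  4},
    {-1, -1, -1,  5},
    {-1, -1, -1, -1},
};
// the table is upper triangular, so index it with the smaller cpu id first
if (cur_cpuid < remote_cpuid)
    channel_id = channel_table[cur_cpuid][remote_cpuid];
else
    channel_id = channel_table[remote_cpuid][cur_cpuid];
if (channel_id < 0 || channel_id >= CHANNEL_NUM)
    BUG();
channel = &channels[channel_id];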
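As an alternative to a lookup table, the channel id for a pair of processors can be computed directly. The sketch below is not part of the protocol; `sm_channel_id` is a hypothetical helper that reproduces the numbering in the Message Channel ID Table for `CPU_NUM` processors and treats the pair as unordered.

```c
#define CPU_NUM 4

/* Channel index for the unordered pair (a, b), or -1 if the pair is
 * invalid. Row 'lo' of the upper-triangular table starts at
 * lo * (2 * CPU_NUM - lo - 1) / 2, and (hi - lo - 1) is the offset
 * within that row. */
static int sm_channel_id(unsigned a, unsigned b)
{
    if (a == b || a >= CPU_NUM || b >= CPU_NUM)
        return -1;
    unsigned lo = a < b ? a : b;
    unsigned hi = a < b ? b : a;
    return (int)(lo * (2u * CPU_NUM - lo - 1u) / 2u + (hi - lo - 1u));
}
```

With four processors this yields ids 0 to 5, matching the (N - 1) * N / 2 = 6 channels in the table, and gives the same id whichever of the two processors performs the lookup.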
Each message queue is statically initialised with received and sent containing the value 0. When a processor starts running, the 'Action Required' interrupt is masked. Before attempting to send the first message, a handler is installed and the interrupt is unmasked. A message queue can be written to before the receiver has initialized its interrupts. If it fills up, the 'Action Required' signal is raised but not serviced until the receiver unmasks the interrupt.
A single block of four message queues is held in the shared variable at a known address msgq.
Its start address should be at a fixed position known to code running on all processors.
#define MSGQ_START_ADDR 0xFEB00000 // in BF561 and BF60x L2 SRAM
typedef struct {
volatile sm_msgq_t msgq[2][2];
} sm_channel_t;
static sm_channel_t *sm_ch = (sm_channel_t *)MSGQ_START_ADDR;
typedef sm_uint16_t sm_endpoint_t;
typedef struct {
sm_endpoint_t dst_ep, src_ep;
sm_uint32_t type;
sm_uint32_t length;
sm_address_t payload;
} sm_msg_t;
The fields dst_ep and src_ep denote endpoints.
The meaning of an endpoint is application dependent.
The receiver should inspect the dst_ep field to decide how to process the message.
The src_ep field indicates the sender which may be meaningful to the receiving endpoint.
type is an unsigned 32-bit integer value that indicates a message type defined in one of the higher level protocols and is mainly interpreted by the endpoint.
The top eight bits of the value indicate the protocol and the low 24 bits the subtype.
// compose type enumeration value from protocol & subtype
#define SM_MSG_TYPE(protocol, subtype) (((protocol)<<24)|(subtype))
// extract subtype from type enumeration value
#define SM_MSG_SUBTYPE(type) ((type)&0xffffff)
// extract protocol from type enumeration value
#define SM_MSG_PROTOCOL(type) (((type)>>24)&0xff)
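A short round trip shows how the 8-bit protocol and 24-bit subtype share the type field. The variants below are a sketch, not the protocol's definitions: the `uint32_t` casts and the subtype mask are additions, made so that protocol values of 128 and above do not shift into the sign bit of a signed int.

```c
#include <stdint.h>

/* Variant of the macros above with explicit unsigned 32-bit arithmetic
 * (the casts and the subtype mask are assumptions of this sketch). */
#define SM_MSG_TYPE(protocol, subtype) \
    ((((uint32_t)(protocol)) << 24) | ((uint32_t)(subtype) & 0xffffffu))
#define SM_MSG_SUBTYPE(type)  ((uint32_t)(type) & 0xffffffu)
#define SM_MSG_PROTOCOL(type) (((uint32_t)(type) >> 24) & 0xffu)

/* Hypothetical helper: does this type value belong to 'protocol'? */
static int sm_msg_is_protocol(uint32_t type, uint32_t protocol)
{
    return SM_MSG_PROTOCOL(type) == protocol;
}
```

For example, protocol 3 with subtype 7 composes to 0x03000007, and extracting the fields from that value gives 3 and 7 back.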
An endpoint may recognise more than one protocol.
The receiver must know the protocols recognised by each endpoint.
When dst_ep has the value 0xffff the message is broadcast to every endpoint
which recognises the protocol encoded in the type field.
The meaning of length and payload is dependent on the value of type
and interpreted by the endpoint.
On BF561, message reads from and writes to a queue in L2 are more efficient if 'sm_msg_t' is aligned on a 64-bit boundary. An aligned access should take 14 rather than 21 cycles.
The type declaration for the VisualDSP compiler should use pragma align:
typedef struct {
#pragma align 8
...
} sm_msg_t;
All protocol types
enum {
SP_GENERAL = 0,
SP_CORE_CONTROL,
SP_TASK_MANAGER,
SP_RES_MANAGER,
SP_PACKET,
SP_SESSION_PACKET,
SP_SCALAR,
SP_SESSION_SCALAR,
SP_MAX,
};
All protocols should recognise the standard message types.
A couple of common error conditions are covered by standard messages.
These are sent with the same priority as the message to which they are responding.
All endpoints should recognise the message:
SM_BAD_ENDPOINT = SM_MSG_TYPE(0, 0)
This may be sent in response to a message sent by this endpoint to
indicate the dst_ep field was invalid.
The SM_BAD_ENDPOINT message has its src_ep field set to the invalid endpoint id, its length to 0, and its payload to the type value of the original message.
The SM_BAD_ENDPOINT message may not be sent in all environments. If endpoints can be created dynamically it may be more appropriate to queue the message until the endpoint is created.
All endpoints should recognise and be able to send the message:
SM_BAD_MSG = SM_MSG_TYPE(0, 1)
When an endpoint receives a message with a type field it does not expect it should return an SM_BAD_MSG message with the payload set to the type value it did not recognise and length field set to 0.
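Building the SM_BAD_MSG reply can be sketched as below. The payload and length values follow the text above; addressing the reply back to the original sender by swapping the endpoint fields is an assumption of this sketch, as is the helper name `sm_make_bad_msg`. The types are repeated so the example stands alone.

```c
#include <stdint.h>

typedef uint16_t sm_endpoint_t;

typedef struct {
    sm_endpoint_t dst_ep, src_ep;
    uint32_t type;
    uint32_t length;
    void *payload; /* sm_address_t on BF561 and BF60x */
} sm_msg_t;

#define SM_MSG_TYPE(protocol, subtype) \
    ((((uint32_t)(protocol)) << 24) | (uint32_t)(subtype))
#define SM_BAD_MSG SM_MSG_TYPE(0, 1)

/* Hypothetical helper: build the reply for a message whose type the
 * receiving endpoint did not recognise. */
static sm_msg_t sm_make_bad_msg(const sm_msg_t *in)
{
    sm_msg_t reply;
    reply.dst_ep = in->src_ep; /* assumption: reply goes back to the sender */
    reply.src_ep = in->dst_ep;
    reply.type = SM_BAD_MSG;
    reply.length = 0;
    reply.payload = (void *)(uintptr_t)in->type; /* the unrecognised type value */
    return reply;
}
```

The reply would then be queued with the same priority as the message it answers.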
The message queue layer should also return SM_BAD_MSG if a message with an invalid protocol value is sent to an endpoint. Either 0 or the endpoint's known protocol is valid.
All endpoints should recognise and be able to send the message:
SM_QUERY_MSG = SM_MSG_TYPE(0, 2)
SM_QUERY_MSG and SM_QUERY_ACK_MSG messages are used to query the status of a remote endpoint. A query message should set the dst_ep field and the type field.
When an endpoint receives an SM_QUERY_MSG, it should return an SM_QUERY_ACK_MSG. The message queue layer should return SM_QUERY_NOEP_MSG if the endpoint hasn't been created.
All endpoints should recognise and be able to send the message:
SM_QUERY_ACK_MSG = SM_MSG_TYPE(0, 3)
An SM_QUERY_ACK_MSG message should set the src_ep field and the type field.
All endpoints should recognise and be able to send the message:
SM_QUERY_NOEP_MSG = SM_MSG_TYPE(0, 4)
All endpoints should recognise and be able to send the message:
SM_NOTIFY_EP_CREATE_MSG = SM_MSG_TYPE(0, 5)
When a new endpoint is created, it should send an SM_NOTIFY_EP_CREATE_MSG to notify the remote message queue layer. The SM_NOTIFY_EP_CREATE_MSG should set the src_ep field.
The communication protocols defined in the DSP bridge framework are as follows.
| Protocol type | value | Protocol Name |
|---|---|---|
| SP_CORE_CONTROL | 1 | Core Control Protocol |
| SP_TASK_MANAGER | 2 | Task Manager Protocol |
| SP_RES_MANAGER | 3 | Resource Manager Protocol |
| SP_PACKET | 4 | Connectionless Packet Transfer Protocol |
| SP_SESSION_PACKET | 5 | Connection based Packet Transfer Protocol |
| SP_SCALAR | 6 | Connectionless Scalar Transfer Protocol |
| SP_SESSION_SCALAR | 7 | Connection based Scalar Transfer Protocol |
The core control protocol is a simple set of messages for controlling a slave core.
| message | value | sent by | meaning |
|---|---|---|---|
| SM_CORE_START | SM_MSG_TYPE(SP_CORE_CONTROL, 0) | Master | Change slave state from stopped to started |
| SM_CORE_STARTED | SM_MSG_TYPE(SP_CORE_CONTROL, 1) | Slave | in response to SM_CORE_START once started |
| SM_CORE_STOP | SM_MSG_TYPE(SP_CORE_CONTROL, 2) | Master | Change slave state from started to stopped |
| SM_CORE_STOPPED | SM_MSG_TYPE(SP_CORE_CONTROL, 3) | Slave | in response to SM_CORE_STOP once stopped |
| SM_CORE_RESET | SM_MSG_TYPE(SP_CORE_CONTROL, 4) | Master | Put slave in stopped state if not already stopped and reset state including PC. |
| SM_CORE_RESETED | SM_MSG_TYPE(SP_CORE_CONTROL, 5) | Slave | in response to SM_CORE_RESET once reset |
All messages are sent with high priority.
The task manager protocol is a simple set of messages to run and kill a task on the slave cores.
| message | value | sent by | meaning |
|---|---|---|---|
| SM_TASK_RUN | SM_MSG_TYPE(SP_TASK_MANAGER, 0) | Master | Ask the slave core to execute a task, giving the function addresses and parameters of init and exit. The addresses and parameters are stored in a payload buffer allocated by the master. |
| SM_TASK_RUNNING | SM_MSG_TYPE(SP_TASK_MANAGER, 1) | Slave | In response to SM_TASK_RUN. The task id, or 0, is stored in payload. The master can free the payload buffer after receiving this response. |
| SM_TASK_KILL | SM_MSG_TYPE(SP_TASK_MANAGER, 2) | Master | Ask the slave core to stop running the task whose id is given in payload. |
| SM_TASK_KILLED | SM_MSG_TYPE(SP_TASK_MANAGER, 3) | Slave | In response to SM_TASK_KILL once the core returns to idle. The task id, or 0, is stored in payload. |
All messages are sent with high priority.
How an application uses the resource manager protocol depends on how the shared resources have been partitioned among the cores in advance. A predefined partition of shared resources may be more suitable for systems that do not need to allocate and free resources dynamically. Different implementations can make their own decision.
| message | value | sent by | meaning |
|---|---|---|---|
| SM_RES_MGR_REQUEST | SM_MSG_TYPE(SP_RES_MANAGER, 0) | slave | request shared resources |
| SM_RES_MGR_REQUEST_OK | SM_MSG_TYPE(SP_RES_MANAGER, 1) | master | request succeeds for all resources in the slave's request list |
| SM_RES_MGR_REQUEST_FAIL | SM_MSG_TYPE(SP_RES_MANAGER, 2) | master | request fails for at least one resource in the slave's request list |
| SM_RES_MGR_FREE | SM_MSG_TYPE(SP_RES_MANAGER, 3) | slave | free reserved resources |
| SM_RES_MGR_FREE_DONE | SM_MSG_TYPE(SP_RES_MANAGER, 4) | master | free done |
| SM_RES_MGR_EXPIRE | SM_MSG_TYPE(SP_RES_MANAGER, 5) | master | ask slave to stop using the resources |
| SM_RES_MGR_EXPIRE_DONE | SM_MSG_TYPE(SP_RES_MANAGER, 6) | slave | |
| SM_RES_MGR_LIST | SM_MSG_TYPE(SP_RES_MANAGER, 7) | slave | request a list of all shared resources of a type, no payload |
| SM_RES_MGR_LIST_OK | SM_MSG_TYPE(SP_RES_MANAGER, 8) | master | reply a list of all available shared resources of a resource type in payload buffer |
| SM_RES_MGR_LIST_DONE | SM_MSG_TYPE(SP_RES_MANAGER, 9) | slave | finished accessing the list buffer |
The same payload address should be returned in all reply messages; the list message has no payload. All messages are sent with normal priority.
enum {
SM_RES_MGR_REQUEST = SM_MSG_TYPE(SP_RES_MANAGER, 0),
SM_RES_MGR_REQUEST_OK,
SM_RES_MGR_REQUEST_FAIL,
SM_RES_MGR_FREE,
SM_RES_MGR_FREE_DONE,
SM_RES_MGR_EXPIRE,
SM_RES_MGR_EXPIRE_DONE,
SM_RES_MGR_LIST,
SM_RES_MGR_LIST_OK,
SM_RES_MGR_LIST_DONE,
SM_RES_MGR_MAX,
};
The resource manager service should bind to endpoint 0 on each processor. Slave applications and OS should always request all types of shared resources from this endpoint in master OS.
// resource manager service endpoint
#define EP_RESMGR_SERVICE 0
The ID of a shared resource is unique among all kinds of resources. The upper 4 bits indicate the type of the shared resource, and the remaining 12 bits are the index within that type's group. There can be at most 16 (2^4) types, of which only 4 are defined so far. Each type can have at most 4096 (2^12) individual resources. SM_RES_MGR messages pass the resource ID in payload and, if the resource type is RESMGR_TYPE_PERIPHERAL, use length to hold the 32-bit address of the resource description data.
// resource types
enum {
RESMGR_TYPE_PERIPHERAL = 0,
RESMGR_TYPE_GPIO,
RESMGR_TYPE_SYS_IRQ,
RESMGR_TYPE_DMA,
RESMGR_TYPE_MAX,
};
#define RES_TYPE_OFFSET 12
#define RES_TYPE_MASK 0xF
#define RES_SUBID_MASK 0xFFF
// compose resource id from resource type & sub id
#define RESMGR_ID(type, subid) (((type) << RES_TYPE_OFFSET) | ((subid) & RES_SUBID_MASK))
// extract resource subid from resource id
#define RESMGR_SUBID(id) ((id) & RES_SUBID_MASK)
// extract resource type from resource id
#define RESMGR_TYPE(id) (((id) >> RES_TYPE_OFFSET) & RES_TYPE_MASK)
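The 4-bit type and 12-bit index packing can be illustrated with function equivalents of the macros. The function names are hypothetical; masking the type to 4 bits and returning a 16-bit value are assumptions of this sketch.

```c
#include <stdint.h>

enum {
    RESMGR_TYPE_PERIPHERAL = 0,
    RESMGR_TYPE_GPIO,
    RESMGR_TYPE_SYS_IRQ,
    RESMGR_TYPE_DMA,
};

/* Pack a resource type (upper 4 bits) and sub id (lower 12 bits). */
static uint16_t resmgr_id(unsigned type, unsigned subid)
{
    return (uint16_t)(((type & 0xFu) << 12) | (subid & 0xFFFu));
}

/* Extract the resource type from a resource id. */
static unsigned resmgr_type(uint16_t id)
{
    return (id >> 12) & 0xFu;
}

/* Extract the index within the type group from a resource id. */
static unsigned resmgr_subid(uint16_t id)
{
    return id & 0xFFFu;
}
```

For example, GPIO 40 packs to 0x1028 and DMA channel 20 packs to 0x3014.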
The address of the resource description data should be placed in the length field of the message. The data has the following format.
typedef struct {
uint8_t label[32]; // resource device owner name
uint16_t count; // resource number in next array
uint32_t resources_array; // address of the resource ID array
} resources_t;
Resource manager APIs declaration:
int sm_request_resource(uint32_t dst_cpu, uint32_t resource_id, resources_t *data);
int sm_free_resource(uint32_t dst_cpu, uint32_t resource_id, resources_t *data);
For the peripheral type, the peripheral name and list are passed in the resource description data.
unsigned short bfin_peripheral_list[] = {P_SPI1_SCK, P_SPI1_MISO, P_SPI1_MOSI, 0};
resources_t bfin_peri_res = {
.label = "bfin-spi1",
};
bfin_peri_res.count = 3;
bfin_peri_res.resources_array = (uint32_t)bfin_peripheral_list;
COREB_DEBUG(1, "request resource id %s\n", bfin_peri_res.label);
ret = sm_request_resource(EP_RESMGR_SERVICE, RESMGR_ID(RESMGR_TYPE_PERIPHERAL, 0), &bfin_peri_res);
if (ret) {
COREB_DEBUG(1, "request peri resource failed\n");
}
ret = sm_free_resource(EP_RESMGR_SERVICE, RESMGR_ID(RESMGR_TYPE_PERIPHERAL, 0), &bfin_peri_res);
if (ret) {
COREB_DEBUG(1, "free peri resource failed\n");
}
The generic map of the GPIOs, system IRQs and DMA channels to their ID should be defined for each arch. The resource sequence in the HRM can be one reference for the generic map.
| GPIO ID | GPIO in bf561 HRM |
|---|---|
| 0 | PF0 |
| 1 | PF1 |
| … | … |
| 47 | PF47 |
| System IRQ ID | System IRQ in bf561 HRM |
|---|---|
| 0 | PLL_WAKEUP |
| 1 | DMA1_ERROR |
| 2 | DMA2_ERROR |
| 3 | IMDMA_ERROR |
| 4 | PPI0_ERROR |
| 5 | PPI1_ERROR |
| 6 | SPORT0_ERROR |
| 7 | SPORT1_ERROR |
| 8 | SPI0_ERROR |
| 9 | UART0_ERROR |
| 10 | RESERVED |
| 11 | DMA1_CH0 |
| 12 | DMA1_CH1 |
| 13 | DMA1_CH2 |
| 14 | DMA1_CH3 |
| 15 | DMA1_CH4 |
| 16 | DMA1_CH5 |
| 17 | DMA1_CH6 |
| 18 | DMA1_CH7 |
| 19 | DMA1_CH8 |
| 20 | DMA1_CH9 |
| 21 | DMA1_CH10 |
| 22 | DMA1_CH11 |
| 23 | DMA2_CH0 |
| 24 | DMA2_CH1 |
| 25 | DMA2_CH2 |
| 26 | DMA2_CH3 |
| 27 | DMA2_CH4 |
| 28 | DMA2_CH5 |
| 29 | DMA2_CH6 |
| 30 | DMA2_CH7 |
| 31 | DMA2_CH8 |
| 32 | DMA2_CH9 |
| 33 | DMA2_CH10 |
| 34 | DMA2_CH11 |
| 35 | TIMER0 |
| 36 | TIMER1 |
| 37 | TIMER2 |
| 38 | TIMER3 |
| 39 | TIMER4 |
| 40 | TIMER5 |
| 41 | TIMER6 |
| 42 | TIMER7 |
| 43 | TIMER8 |
| 44 | TIMER9 |
| 45 | TIMER10 |
| 46 | TIMER11 |
| 47 | PF0_PF15_A |
| 48 | PF0_PF15_B |
| 49 | PF16_PF31_A |
| 50 | PF16_PF31_B |
| 51 | PF32_PF47_A |
| 52 | PF32_PF47_B |
| 53 | DMA1_MDMA_STREAM0 |
| 54 | DMA1_MDMA_STREAM1 |
| 55 | DMA2_MDMA_STREAM0 |
| 56 | DMA2_MDMA_STREAM1 |
| 57 | IMDMA_STREAM0 |
| 58 | IMDMA_STREAM1 |
| 59 | WATCHDOG |
| 60 | RESERVED |
| 61 | RESERVED |
| 62 | RESERVED (SUPPLE_0 is reserved by DSP bridge framework) |
| 63 | SUPPLE_1 |
| DMA ID | DMA in bf561 HRM |
|---|---|
| 0 | DMA1_PPI0 |
| 1 | DMA1_PPI1 |
| 2 | RESERVED |
| 3 | RESERVED |
| 4 | RESERVED |
| 5 | RESERVED |
| 6 | RESERVED |
| 7 | RESERVED |
| 8 | RESERVED |
| 9 | RESERVED |
| 10 | RESERVED |
| 11 | RESERVED |
| 12 | DMA1_MEM_STREAM0_DES |
| 13 | DMA1_MEM_STREAM0_SRC |
| 14 | DMA1_MEM_STREAM1_DES |
| 15 | DMA1_MEM_STREAM1_SRC |
| 16 | DMA2_SPORT0_RX |
| 17 | DMA2_SPORT0_TX |
| 18 | DMA2_SPORT1_RX |
| 19 | DMA2_SPORT1_TX |
| 20 | DMA2_SPI0 |
| 21 | DMA2_UART0_RX |
| 22 | DMA2_UART0_TX |
| 23 | RESERVED |
| 24 | RESERVED |
| 25 | RESERVED |
| 26 | RESERVED |
| 27 | RESERVED |
| 28 | DMA2_MEM_STREAM0_DES |
| 29 | DMA2_MEM_STREAM0_SRC |
| 30 | DMA2_MEM_STREAM1_DES |
| 31 | DMA2_MEM_STREAM1_SRC |
| 32 | IMDMA_MEM_STREAM0_DES |
| 33 | IMDMA_MEM_STREAM0_SRC |
| 34 | IMDMA_MEM_STREAM1_DES |
| 35 | IMDMA_MEM_STREAM1_SRC |
| GPIO ID | GPIO in bf609 HRM |
|---|---|
| 0 | GPIO0 |
| 1 | GPIO1 |
| … | … |
| 112 | GPIO112 |
| System IRQ ID | System IRQ in bf609 HRM |
|---|---|
| 0 | IRQ_SEC_ERR |
| 1 | IRQ_CGU_EVT |
| 2 | IRQ_WATCH0 |
| 3 | IRQ_WATCH1 |
| 4 | IRQ_L2CTL0_ECC_ERR |
| 5 | IRQ_L2CTL0_ECC_WARN |
| 6 | IRQ_C0_DBL_FAULT |
| 7 | IRQ_C1_DBL_FAULT |
| 8 | IRQ_C0_HW_ERR |
| 9 | IRQ_C1_HW_ERR |
| 10 | IRQ_C0_NMI_L1_PARITY_ERR |
| 11 | IRQ_C1_NMI_L1_PARITY_ERR |
| 12 | IRQ_TIMER0 |
| 13 | IRQ_TIMER1 |
| 14 | IRQ_TIMER2 |
| 15 | IRQ_TIMER3 |
| 16 | IRQ_TIMER4 |
| 17 | IRQ_TIMER5 |
| 18 | IRQ_TIMER6 |
| 19 | IRQ_TIMER7 |
| 20 | IRQ_TIMER_STAT |
| 21 | IRQ_PINT0 |
| 22 | IRQ_PINT1 |
| 23 | IRQ_PINT2 |
| 24 | IRQ_PINT3 |
| 25 | IRQ_PINT4 |
| 26 | IRQ_PINT5 |
| 27 | IRQ_CNT |
| 28 | IRQ_PWM0_TRIP |
| 29 | IRQ_PWM0_SYNC |
| 30 | IRQ_PWM1_TRIP |
| 31 | IRQ_PWM1_SYNC |
| 32 | IRQ_TWI0 |
| 33 | IRQ_TWI1 |
| 34 | IRQ_SOFT0 |
| 35 | IRQ_SOFT1 |
| 36 | IRQ_SOFT2 |
| 37 | IRQ_SOFT3 |
| 38 | IRQ_ACM_EVT_MISS |
| 39 | IRQ_ACM_EVT_COMPLETE |
| 40 | IRQ_CAN0_RX |
| 41 | IRQ_CAN0_TX |
| 42 | IRQ_CAN0_STAT |
| 43 | IRQ_SPORT0_TX |
| 44 | IRQ_SPORT0_TX_STAT |
| 45 | IRQ_SPORT0_RX |
| 46 | IRQ_SPORT0_RX_STAT |
| 47 | IRQ_SPORT1_TX |
| 48 | IRQ_SPORT1_TX_STAT |
| 49 | IRQ_SPORT1_RX |
| 50 | IRQ_SPORT1_RX_STAT |
| 51 | IRQ_SPORT2_TX |
| 52 | IRQ_SPORT2_TX_STAT |
| 53 | IRQ_SPORT2_RX |
| 54 | IRQ_SPORT2_RX_STAT |
| 55 | IRQ_SPI0_TX |
| 56 | IRQ_SPI0_RX |
| 57 | IRQ_SPI0_STAT |
| 58 | IRQ_SPI1_TX |
| 59 | IRQ_SPI1_RX |
| 60 | IRQ_SPI1_STAT |
| 61 | IRQ_RSI |
| 62 | IRQ_RSI_INT0 |
| 63 | IRQ_RSI_INT1 |
| 64 | IRQ_SDU |
| 65 | DMA12 Data Reserved |
| 66 | Reserved |
| 67 | Reserved |
| 68 | IRQ_EMAC0_STAT |
| 69 | EMAC0 Power Reserved |
| 70 | IRQ_EMAC1_STAT |
| 71 | EMAC1 Power Reserved |
| 72 | IRQ_LP0 |
| 73 | IRQ_LP0_STAT |
| 74 | IRQ_LP1 |
| 75 | IRQ_LP1_STAT |
| 76 | IRQ_LP2 |
| 77 | IRQ_LP2_STAT |
| 78 | IRQ_LP3 |
| 79 | IRQ_LP3_STAT |
| 80 | IRQ_UART0_TX |
| 81 | IRQ_UART0_RX |
| 82 | IRQ_UART0_STAT |
| 83 | IRQ_UART1_TX |
| 84 | IRQ_UART1_RX |
| 85 | IRQ_UART1_STAT |
| 86 | IRQ_MDMA0_SRC_CRC0 |
| 87 | IRQ_MDMA0_DEST_CRC0/ IRQ_MDMAS0 |
| 88 | IRQ_CRC0_DCNTEXP |
| 89 | IRQ_CRC0_ERR |
| 90 | IRQ_MDMA1_SRC_CRC1 |
| 91 | IRQ_MDMA1_DEST_CRC1/IRQ_MDMAS1 |
| 92 | IRQ_CRC1_DCNTEXP |
| 93 | IRQ_CRC1_ERR |
| 94 | IRQ_MDMA2_SRC |
| 95 | IRQ_MDMA2_DEST/IRQ_MDMAS2 |
| 96 | IRQ_MDMA3_SRC |
| 97 | IRQ_MDMA3_DEST/IRQ_MDMAS3 |
| 98 | IRQ_EPPI0_CH0 |
| 99 | IRQ_EPPI0_CH1 |
| 100 | IRQ_EPPI0_STAT |
| 101 | IRQ_EPPI2_CH0 |
| 102 | IRQ_EPPI2_CH1 |
| 103 | IRQ_EPPI2_STAT |
| 104 | IRQ_EPPI1_CH0 |
| 105 | IRQ_EPPI1_CH1 |
| 106 | IRQ_EPPI1_STAT |
| 107 | IRQ_PIXC_CH0 |
| 108 | IRQ_PIXC_CH1 |
| 109 | IRQ_PIXC_CH2 |
| 110 | IRQ_PIXC_STAT |
| 111 | IRQ_PVP_CPDOB |
| 112 | IRQ_PVP_CPDOC |
| 113 | IRQ_PVP_CPSTAT |
| 114 | IRQ_PVP_CPCI |
| 115 | IRQ_PVP_STAT0 |
| 116 | IRQ_PVP_MPDO |
| 117 | IRQ_PVP_MPDI |
| 118 | IRQ_PVP_MPSTAT |
| 119 | IRQ_PVP_MPCI |
| 120 | IRQ_PVP_CPDOA |
| 121 | IRQ_PVP_STAT1 |
| 122 | IRQ_USB_STAT |
| 123 | IRQ_USB_DMA |
| 124 | IRQ_TRU_INT0 |
| 125 | IRQ_TRU_INT1 |
| 126 | IRQ_TRU_INT2 |
| 127 | IRQ_TRU_INT3 |
| 128 | IRQ_DMAC0_ERROR |
| 129 | IRQ_CGU0_ERROR |
| 130 | Reserved |
| 131 | IRQ_DPM |
| 132 | Reserved |
| 133 | IRQ_SWU0 |
| 134 | IRQ_SWU1 |
| 135 | IRQ_SWU2 |
| 136 | IRQ_SWU3 |
| 137 | IRQ_SWU4 |
| 138 | IRQ_SWU5 |
| 139 | IRQ_SWU6 |
| DMA ID | DMA in bf609 HRM |
|---|---|
| 0 | CH_SPORT0_TX |
| 1 | CH_SPORT0_RX |
| 2 | CH_SPORT1_TX |
| 3 | CH_SPORT1_RX |
| 4 | CH_SPORT2_TX |
| 5 | CH_SPORT2_RX |
| 6 | CH_SPI0_TX |
| 7 | CH_SPI0_RX |
| 8 | CH_SPI1_TX |
| 9 | CH_SPI1_RX |
| 10 | CH_RSI |
| 11 | CH_SDU |
| 13 | CH_LP0 |
| 14 | CH_LP1 |
| 15 | CH_LP2 |
| 16 | CH_LP3 |
| 17 | CH_UART0_TX |
| 18 | CH_UART0_RX |
| 19 | CH_UART1_TX |
| 20 | CH_UART1_RX |
| 21 | CH_MEM_STREAM0_SRC_CRC0/CH_MEM_STREAM0_SRC |
| 22 | CH_MEM_STREAM0_DEST_CRC0/CH_MEM_STREAM0_DEST |
| 23 | CH_MEM_STREAM1_SRC_CRC1/CH_MEM_STREAM1_SRC |
| 24 | CH_MEM_STREAM1_DEST_CRC1/CH_MEM_STREAM1_DEST |
| 25 | CH_MEM_STREAM2_SRC |
| 26 | CH_MEM_STREAM2_DEST |
| 27 | CH_MEM_STREAM3_SRC |
| 28 | CH_MEM_STREAM3_DEST |
| 29 | CH_EPPI0_CH0 |
| 30 | CH_EPPI0_CH1 |
| 31 | CH_EPPI2_CH0 |
| 32 | CH_EPPI2_CH1 |
| 33 | CH_EPPI1_CH0 |
| 34 | CH_EPPI1_CH1 |
| 35 | CH_PIXC_CH0 |
| 36 | CH_PIXC_CH1 |
| 37 | CH_PIXC_CH2 |
| 38 | CH_PVP_CPDOB |
| 39 | CH_PVP_CPDOC |
| 40 | CH_PVP_CPSTAT |
| 41 | CH_PVP_CPCI |
| 42 | CH_PVP_MPDO |
| 43 | CH_PVP_MPDI |
| 44 | CH_PVP_MPSTAT |
| 45 | CH_PVP_MPCI |
| 46 | CH_PVP_CPDOA |
ret = sm_request_resource(EP_RESMGR_SERVICE, RESMGR_ID(RESMGR_TYPE_GPIO, 40), 0);
if (ret)
COREB_DEBUG(1, "request resource failed\n");
ret = sm_request_resource(EP_RESMGR_SERVICE, RESMGR_ID(RESMGR_TYPE_SYS_IRQ, 52), 0);
if (ret)
COREB_DEBUG(1, "request resource failed\n");
ret = sm_request_resource(EP_RESMGR_SERVICE, RESMGR_ID(RESMGR_TYPE_DMA, 20), 0);
if (ret)
COREB_DEBUG(1, "request resource failed\n");
sm_free_resource(EP_RESMGR_SERVICE, RESMGR_ID(RESMGR_TYPE_GPIO, 40), 0);
sm_free_resource(EP_RESMGR_SERVICE, RESMGR_ID(RESMGR_TYPE_SYS_IRQ, 52), 0);
sm_free_resource(EP_RESMGR_SERVICE, RESMGR_ID(RESMGR_TYPE_DMA, 20), 0);
The packet transfer protocol transfers data between processors via locally allocated buffers. It is built on top of the message protocol described above. Each processor should be able to access the other processors' local memory pools via proper CPLB configuration.
This protocol is connectionless. An endpoint registered on one processor may receive packets sent from any src_ep on the other processors.
To send a packet, the packet protocol:
To receive a packet in icc for a bare metal application:
To receive a packet in icc for an OS:
Messages that deliver packets are sent with normal priority.
| message type | value | meaning |
|---|---|---|
| SM_PACKET_READY | SM_MSG_TYPE(SP_PACKET, 0) | The sender allocates memory for the packet. len = packet length; payload = packet address of buffer allocated by sender |
| SM_PACKET_CONSUMED | SM_MSG_TYPE(SP_PACKET, 1) | The receiver finishes processing the arriving packet and the sender can free its memory. len = packet length; payload = packet address of buffer allocated by sender |
| SM_PACKET_ERROR | SM_MSG_TYPE(SP_PACKET, 2) | Signal an error, payload field is an error code, len=0. Both sides free local buffers in the received packet list. |
| SM_PACKET_ERROR_ACK | SM_MSG_TYPE(SP_PACKET, 3) | In response to ERROR received. |
Endpoint reserved for broadcast packet.
/*
 * Protocol layer should dispatch packets with dst_ep 0xFFFF to all receivers.
 * Receivers should not bind to this endpoint.
 */
#define EP_PACKET_BORADCAST 0xFFFF
Endpoint reserved for debug information service.
/*
 * Debug information service should bind to endpoint 0 on each processor.
 * Senders should not bind to this endpoint.
 */
#define EP_PACKET_DEBUG_INFO 0
The session packet transfer protocol establishes a connection between two endpoints on different processors and transfers data through locally allocated buffers. It is built on top of the message protocol. Each processor must be able to access the other processors' local memory pools through a proper CPLB configuration.
In this protocol, a connection must be established before packets can be delivered. The server binds to a listening endpoint in advance. After receiving a connection request message, the server creates a session whose endpoint pair consists of the src_enp in the connection request and a new free local endpoint. The application can then deliver packets over this session, while the server goes back to monitoring the listening endpoint. The session is closed only after a connection close request and its ACK have been received by either party.
Broadcast data is not supported in this protocol.
Messages for the session packet protocol are sent with normal priority.
| message type | value | meaning |
|---|---|---|
| SM_SESSION_PACKET_CONNECT | SM_MSG_TYPE(SP_SESSION_PACKET, 0) | After allocating a new session and binding to a local endpoint, the client sends a connection request to the server. |
| SM_SESSION_PACKET_CONNECT_ACK | SM_MSG_TYPE(SP_SESSION_PACKET, 1) | The server allocates a new session and responds to the connection request. After the client receives the ACK, it considers the connection established and starts to transfer data over this session. No payload. |
| SM_SESSION_PACKET_CONNECT_DONE | SM_MSG_TYPE(SP_SESSION_PACKET, 2) | After receiving the ACK and before any real data transfer, the client sends the connection-established status back to the server. No payload. After the server receives DONE, it considers the connection established and wakes up the application or thread to transfer data on the new session. |
| SM_SESSION_PACKET_ACTIVE | SM_MSG_TYPE(SP_SESSION_PACKET, 3) | After the connection succeeds, the client sends this message at a minute-level interval and waits for the ACK to keep the connection active. No payload. |
| SM_SESSION_PACKET_ACTIVE_ACK | SM_MSG_TYPE(SP_SESSION_PACKET, 4) | The server should answer the active tick message to keep the connection active. No payload. |
| SM_SESSION_PACKET_CLOSE | SM_MSG_TYPE(SP_SESSION_PACKET, 5) | Either party in the session can send a connection close request to the other. No payload. After receiving CLOSE, free the session. |
| SM_SESSION_PACKET_CLOSE_ACK | SM_MSG_TYPE(SP_SESSION_PACKET, 6) | Response to the connection close request. No payload. After receiving ACK, free the session. |
| SM_SESSION_PACKET_READY | SM_MSG_TYPE(SP_SESSION_PACKET, 7) | The sender allocates memory for the packet. len = packet length; payload = packet address of buffer allocated by sender |
| SM_SESSION_PACKET_COMSUMED | SM_MSG_TYPE(SP_SESSION_PACKET, 8) | The receiver finishes processing the arriving packet and the sender can free its memory. len = packet length; payload = packet address of buffer allocated by sender |
| SM_SESSION_PACKET_ERROR | SM_MSG_TYPE(SP_SESSION_PACKET, 9) | Signal an error, payload field is an error code, len=0. Both sides free local buffers in the connection received data list. |
| SM_SESSION_PACKET_ERROR_ACK | SM_MSG_TYPE(SP_SESSION_PACKET, 10) | In response to ERROR received. |
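The connection handshake in the table can be read as a small state machine on the server side. The sketch below is a simplified illustration under stated assumptions: the state names and `demo_server_step()` are hypothetical, only the connect and close messages are modeled, and the enum values simply mirror the second argument of `SM_MSG_TYPE` in the table.

```c
/* Message subtypes, numbered as in the table above. */
enum demo_session_msg {
    DEMO_CONNECT = 0,
    DEMO_CONNECT_ACK = 1,
    DEMO_CONNECT_DONE = 2,
    DEMO_CLOSE = 5,
    DEMO_CLOSE_ACK = 6
};

/* Hypothetical server-side session states. */
enum demo_state { DEMO_IDLE, DEMO_ACK_SENT, DEMO_ESTABLISHED, DEMO_CLOSED };

/*
 * Server-side transition: returns the next state and stores the reply to
 * send in *reply (-1 when no reply is due). Unexpected messages leave the
 * state unchanged.
 */
static enum demo_state demo_server_step(enum demo_state s, int msg, int *reply)
{
    *reply = -1;

    /* CLOSE is accepted once a session exists; answer and drop the session. */
    if (msg == DEMO_CLOSE && (s == DEMO_ACK_SENT || s == DEMO_ESTABLISHED)) {
        *reply = DEMO_CLOSE_ACK;
        return DEMO_CLOSED;
    }
    /* CONNECT on the listening endpoint: allocate a session and acknowledge. */
    if (s == DEMO_IDLE && msg == DEMO_CONNECT) {
        *reply = DEMO_CONNECT_ACK;
        return DEMO_ACK_SENT;
    }
    /* DONE from the client: the server now treats the connection as established. */
    if (s == DEMO_ACK_SENT && msg == DEMO_CONNECT_DONE)
        return DEMO_ESTABLISHED;

    return s;
}
```

Note that the server only treats the connection as established after CONNECT_DONE, while the client already does so after CONNECT_ACK, which matches the table.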
To enable the session packet protocol without a standard socket stack, you need at least a simple stack library (API) to:
This library may differ on cores with different DSP bridge implementations.
Scalar transfer provides an efficient method to transmit scalars (8-bit, 16-bit, 32-bit and 64-bit variants) between endpoints. It is built on top of the message protocol described earlier. The packet transfer protocol passes a reference to a locally allocated buffer through the ICC message (payload, length). To transmit scalars efficiently, the payload and length fields of the ICC sm_msg instead carry two 32-bit scalar values directly.
| message type | value | meaning |
|---|---|---|
| SM_SCALAR_READY_8 | SM_MSG_TYPE(SP_SCALAR, 0) | |
| SM_SCALAR_READY_16 | SM_MSG_TYPE(SP_SCALAR, 1) | |
| SM_SCALAR_READY_32 | SM_MSG_TYPE(SP_SCALAR, 2) | |
| SM_SCALAR_READY_64 | SM_MSG_TYPE(SP_SCALAR, 3) | |
| SM_SCALAR_CONSUMED | SM_MSG_TYPE(SP_SCALAR, 4) | |
| SM_SCALAR_ERROR | SM_MSG_TYPE(SP_SCALAR, 5) | |
| SM_SCALAR_ERROR_ACK | SM_MSG_TYPE(SP_SCALAR, 6) | |
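Since the payload and length fields together offer two 32-bit slots, a 64-bit scalar has to be split across them. The sketch below shows one way this might be done. The struct, the `demo_*` helpers, and in particular the choice of putting the low word in payload and the high word in length are assumptions; the text does not specify the word order.

```c
#include <stdint.h>

/* Hypothetical view of the two 32-bit message fields reused for scalar data. */
struct demo_scalar_msg {
    uint32_t payload;
    uint32_t length;
};

/* Assumption: low word travels in payload, high word in length. */
static struct demo_scalar_msg demo_pack_u64(uint64_t value)
{
    struct demo_scalar_msg m;

    m.payload = (uint32_t)(value & 0xFFFFFFFFu);
    m.length = (uint32_t)(value >> 32);
    return m;
}

/* Reassemble the 64-bit value on the receiving side. */
static uint64_t demo_unpack_u64(struct demo_scalar_msg m)
{
    return ((uint64_t)m.length << 32) | m.payload;
}

/* Narrower scalars fit in payload alone; length is left at zero here. */
static struct demo_scalar_msg demo_pack_u16(uint16_t value)
{
    struct demo_scalar_msg m = { value, 0 };

    return m;
}
```

No buffer is allocated and no CONSUMED round trip is needed to release memory, which is where the efficiency over the packet protocol comes from.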
Like scalar transfer, session scalar transfer also transmits scalars (8-bit, 16-bit, 32-bit and 64-bit variants) between endpoints. It is built on top of the message protocol described earlier. In this protocol, a connection must be established before scalar data can be delivered.
| message type | value | meaning |
|---|---|---|
| SM_SESSION_SCALAR_READY_8 | SM_MSG_TYPE(SP_SESSION_SCALAR, 0) | |
| SM_SESSION_SCALAR_READY_16 | SM_MSG_TYPE(SP_SESSION_SCALAR, 1) | |
| SM_SESSION_SCALAR_READY_32 | SM_MSG_TYPE(SP_SESSION_SCALAR, 2) | |
| SM_SESSION_SCALAR_READY_64 | SM_MSG_TYPE(SP_SESSION_SCALAR, 3) | |
| SM_SESSION_SCALAR_COMSUMED | SM_MSG_TYPE(SP_SESSION_SCALAR, 4) | |
| SM_SESSION_SCALAR_ERROR | SM_MSG_TYPE(SP_SESSION_SCALAR, 5) | |
| SM_SESSION_SCALAR_ERROR_ACK | SM_MSG_TYPE(SP_SESSION_SCALAR, 6) | |
| SM_SESSION_SCALAR_CONNECT | SM_MSG_TYPE(SP_SESSION_SCALAR, 7) | |
| SM_SESSION_SCALAR_CONNECT_ACK | SM_MSG_TYPE(SP_SESSION_SCALAR, 8) | |
| SM_SESSION_SCALAR_CONNECT_DONE | SM_MSG_TYPE(SP_SESSION_SCALAR, 9) | |
| SM_SESSION_SCALAR_ACTIVE | SM_MSG_TYPE(SP_SESSION_SCALAR, 10) | |
| SM_SESSION_SCALAR_ACTIVE_ACK | SM_MSG_TYPE(SP_SESSION_SCALAR, 11) | |
| SM_SESSION_SCALAR_CLOSE | SM_MSG_TYPE(SP_SESSION_SCALAR, 12) | |
| SM_SESSION_SCALAR_CLOSE_ACK | SM_MSG_TYPE(SP_SESSION_SCALAR, 13) | |
This section describes a framework to be implemented on Linux that will use the above communication protocols.
The design goal is to control Core B from both userspace and the kernel in as generic a way as possible, so that a user can load, start, stop, or reload any acceleration or RTOS task.
To accomplish this, we lean on the OSI network model, which we review here, to provide a little context.
The OSI model was developed by the International Organization for Standardization (ISO) as a guideline for developing standards to enable the interconnection of dissimilar computing devices. It is important to understand that the OSI model is not itself a communication standard. In other words, it is not an agreed-on method that governs how data is sent and received; it is only a guideline for developing such standards.
It would be difficult to overstate the importance of the OSI model. Virtually all vendors and users of products which must communicate over the network understand how important it is that their products adhere to and fully support the networking standards this model has generated.
When a vendor's products adhere to the standards the OSI model has generated, connecting those products to other vendors' products is relatively simple. Conversely, the further a vendor departs from those standards, the more difficult it becomes to connect that vendor's products to those of other vendors.
In addition, if a vendor were to depart from the communication standards the model has engendered, software development efforts would be very difficult because the vendor would have to build every part of all necessary software, rather than being able to build on the existing work of other vendors.
In the “Core B” scenario, the implications are the same. Providing standard communication methods, and allowing people to build on them, raises interoperability while lowering development costs.
The basic communication framework is intended to allow both message-based and stream-based communication, in either a synchronous or an asynchronous way. A simple API is defined and libraries are provided for both:
At this time, only the layer 1 to layer 3 protocols are defined -- anything higher than just passing raw data back and forth is implementation and user application dependent.
There are two kinds of interface available for both the Linux application and the bare metal application.
From the point of view of a Linux user, icc is a device driver that controls the DSP devices and bridges the programs running on the DSPs and the Linux user applications. The program running on a DSP is an ELF non-relocatable binary. It can be loaded by the icc driver at the request of the Linux user application.
The kernel icc driver builds a packet list for each registered endpoint. Packets from the DSP side are copied and added to this list, where they wait for the user application to fetch them.
If the DSP device is opened in non-blocking mode, poll with the select system call or register the signal SIG_DSP_PACKET_ARRIVE, then do the actual message receive in the application.
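Polling a non-blocking descriptor with select() follows the usual POSIX pattern, sketched below. The helper name `demo_wait_readable()` is made up for illustration, and the sketch assumes the descriptor reports readability when a packet is queued; it works on any file descriptor, so it can be tried with a pipe before using it on the icc device.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int demo_wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int ret;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ret = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ret < 0)
        return -1;
    return (ret > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

An application would call this in its main loop and issue the receive request only when the helper returns 1.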
DSP bridge ioctl commands are executed through the combined efforts of the main CPU and the DSP device.
char *pathname[];
ioctl(fd, CMD_DSP_LOAD, pathname);
ioctl(fd, CMD_DSP_START, NULL);
ioctl(fd, CMD_DSP_STOP, NULL);
ioctl(fd, CMD_DSP_RESET, NULL);
The network layer interface transfers buffers among the Linux user application, the kernel driver, and the program running on the DSP core.
/*
* remote_ep - destination end point in sending operation, local endpoint which receiver binds to
* local_ep - sender's endpoint in sending operation, should be 0 in receiving operation
* buf_len is used to indicate the actual data size to send or have been received.
* type - packet protocol type, connectionless or connection packet, SP_PACKET or SP_SESSION_PACKET
*/
struct sm_packet {
sm_uint32_t session_idx;
sm_uint32_t local_ep;
sm_uint32_t remote_ep;
sm_uint32_t type;
sm_uint32_t dst_cpu;
sm_uint32_t src_cpu;
sm_uint32_t buf_len;
void *buf;
};
ioctl commands:
struct sm_packet pkt;
char buf[64] = "1234567890abcdef";
memset(&pkt, 0, sizeof(struct sm_packet));
pkt.local_ep = 9;
pkt.remote_ep = 5;
pkt.type = SP_PACKET;
pkt.dst_cpu = 1;
pkt.buf_len = 16;
pkt.buf = buf;
ioctl(fd, CMD_SM_CREATE, &pkt);
ioctl(fd, CMD_SM_SEND, &pkt);
ioctl(fd, CMD_SM_SHUTDOWN, &pkt);
struct sm_packet pkt;
char buf[64] = "1234567890abcdef";
memset(&pkt, 0, sizeof(struct sm_packet));
pkt.local_ep = 9;
pkt.remote_ep = 6;
pkt.type = SP_SESSION_PACKET;
pkt.dst_cpu = 1;
pkt.buf_len = 16;
pkt.buf = buf;
printf("sp packet %d\n", pkt.type);
printf("begin create ep\n");
ioctl(fd, CMD_SM_CREATE, &pkt);
printf("finish create ep session index = %d\n", pkt.session_idx);
ioctl(fd, CMD_SM_CONNECT, &pkt);
ioctl(fd, CMD_SM_SEND, &pkt);
ioctl(fd, CMD_SM_RECV, &pkt);
/* get buffer from pkt.buf */
ioctl(fd, CMD_SM_SHUTDOWN, &pkt);
The icc device node is /dev/icc. When the core B DSP binary is loaded by the icc driver, the driver starts each DSP so that it initializes its CPLB and event controller properly. The IPI interrupt is configured specifically for DSP bridge message and control notification. After initialization is done, the DSP devices sleep in an idle loop at IRQ level 15. The DSP initialization and idle loop code and data reside in memory shared by all DSPs and the main CPU.
Each DSP application should implement two entry points (icc_task_init, icc_task_exit). icc_task_init lets the DSP application register its endpoint and its protocol-based packet dispatch functions. The DSP runs this entry point in EVT7 mode when it is asked to start by a task run message. The DSP application should either call icc_wait() to wait for incoming messages or register session handler callbacks through the registration API sm_registe_session_handler(). After icc_task_init returns, the DSP exits to EVT15 and waits for new messages to handle. icc_task_exit lets the DSP application stop running and exit with cleanup.
sample1
sm_uint32_t __icc_task_data session_index;

void icc_task_init(int argc, char *argv[])
{
    void *buf;
    int len;
    int ret;
    int src_ep, src_cpu;

    session_index = sm_create_session(LOCAL_SESSION, SP_PACKET);
    coreb_msg("%s() %s %s index %d\n", __func__, argv[0], argv[1], session_index);
    if (session_index >= 32) {
        coreb_msg("create session failed\n");
        return;
    }

    while (1) {
        coreb_msg("task loop\n");
        if (icc_wait()) {
            /* len is assumed to be an output parameter, so it is passed by address */
            ret = sm_recv_packet(session_index, &src_ep, &src_cpu, &buf, &len);
            if (ret <= 0) {
                coreb_msg("recv packet failed\n");
                continue;
            }
            /* handle payload */
            coreb_msg("processing msg %s\n", buf);
            if (*(char *)buf == '1') {
                /* reply to the sender of the packet just received */
                int send_len = 64;
                int dst_ep = src_ep;
                int dst_cpu = src_cpu;
                void *send_buf = sm_send_request(send_len, session_index);

                coreb_msg("coreb send buf %x\n", send_buf);
                if (!send_buf) {
                    coreb_msg("NO MEM\n");
                } else {
                    memset(send_buf, 0, send_len);
                    strcpy(send_buf, "finish");
                    sm_send_packet(session_index, dst_ep, dst_cpu, send_buf, send_len);
                }
            } else {
                coreb_msg("msg payload %s \n", buf);
            }
            /* hand the received buffer back once it has been processed */
            sm_recv_release(buf, len, session_index);
        }
    }
    coreb_msg("%s() end\n", __func__);
}

void icc_task_exit(void)
{
    sm_destroy_session(session_index);
}
sample2
/* declaration assumed, mirroring session_index in sample1 */
sm_uint32_t __icc_task_data index;

int default_session_handle(struct sm_message *msg, struct sm_session *session);

void icc_task_init(int argc, char *argv[])
{
    struct sm_session *session;

    index = sm_create_session(LOCAL_SESSION, SP_PACKET);
    coreb_msg("%s() %s %s index %d\n", __func__, argv[0], argv[1], index);
    if (index >= 32) {
        coreb_msg("create session failed\n");
        return;
    }
    session = &coreb_info.icc_info.sessions_table[index];
    /* incoming messages are handled by the callback instead of an icc_wait() loop */
    sm_registe_session_handler(index, default_session_handle);
    coreb_msg("%s() end\n", __func__);
}

void icc_task_exit(void)
{
    sm_destroy_session(index);
}

int default_session_handle(struct sm_message *msg, struct sm_session *session)
{
    void *buf;
    sm_uint32_t len;
    int ret;

    coreb_msg(" %s session %d msg %s \n",__func__, session->local_ep, msg->payload);
    coreb_msg("dst %d dstep %d, src %d, srcep %d\n", msg->dst, msg->dst_ep, msg->src, msg->src_ep);
    /* len is assumed to be an output parameter, so it is passed by address */
    ret = sm_recv_packet(index, &buf, &len);
    if (ret <= 0) {
        coreb_msg("recv packet failed\n");
        return ret;
    }
    /* handle payload */
    coreb_msg("processing msg %s\n", buf);
    if (*(char *)buf == '1') {
        int send_len = 64;
        int dst_ep = msg->src_ep;
        int dst_cpu = msg->src;
        void *send_buf = sm_send_request(send_len, session);

        coreb_msg("coreb send buf %x\n", send_buf);
        if (!send_buf) {
            coreb_msg("NO MEM\n");
        } else {
            memset(send_buf, 0, send_len);
            *(char *)send_buf = 'f';
            sm_send_packet(index, dst_ep, dst_cpu, send_buf, send_len);
        }
    } else {
        coreb_msg("msg payload %s \n", buf);
    }
    sm_recv_release(buf, len, session);
    return 0;
}
The DSP application calls register_packet_dispatch_callback in its main entry point to register its packet dispatch function and the sender's cleanup function. The registered packet receive callback functions are invoked in EVT15 mode (IPEND = 0x8000) as well.
/*
* endpoint - bind to a local endpoint to receive incoming packet.
* src_cpuid - processor who sends the incoming packet.
* src_enp - source endpoint of the incoming packet.
* len - the length of the buffer.
* packet - the buffer pointer.
*/
int sm_register_session_handler(sm_uint32_t session_idx,
        void (*handle)(struct sm_message *message, struct sm_session *session));
After a session is connected, sending and receiving data works the same way as in packet transfer, using sm_send_packet() and sm_recv_packet().
This example is based on network layer communication APIs defined for both Linux application and the bare metal DSP application.
simple packet sample packet.c at: http://blackfin.uclinux.org/gf/project/uclinux-dist/scmsvn/?action=browse&path=%2Ftrunk%2Fuser%2Fblkfin-apps%2Ficc_utils%2Fexample%2Ftest_app%2Fpacket.c&view=markup&revision=10273
dsp side sample task1.c at:
The bare metal application should be linked with the DSP bridge library in order to interact with the Linux application properly. The offset address of the entry main() can be discovered by the DSP bridge kernel module when loading.
The compile command, when compiling on a Linux host, looks like this:
$ bfin-elf-gcc -T coreb.lds -mcpu=bf561 -D__DSP__ coreb.c dsp_bridge.a -o coreb.bin
The linker script coreb.lds looks like this:
MEMORY
{
MEM_L1_CODE : ORIGIN = 0xFF600000, LENGTH = 0x4000
MEM_L1_CODE_CACHE : ORIGIN = 0xFF610000, LENGTH = 0x4000
MEM_L1_SCRATCH : ORIGIN = 0xFF700000, LENGTH = 0x1000
MEM_L1_DATA_B : ORIGIN = 0xFF500000, LENGTH = 0x8000
MEM_L1_DATA_A : ORIGIN = 0xFF400000, LENGTH = 0x8000
MEM_L2 : ORIGIN = 0xFEB00000, LENGTH = 0x20000
}
OUTPUT_FORMAT("elf32-bfin", "elf32-bfin",
"elf32-bfin")
OUTPUT_ARCH(bfin)
ENTRY(_main)
SECTIONS
{
.text_l1 :
{
/*
* Here is the reserved jump instruction to jump to the Linux
* dsp device driver core B init code.
*/
. = MEM_L1_CODE + 0x10;
*(.l1.text)
} >MEM_L1_CODE =0
.text :
{
/*
* Here is the static shared message queues between core A and B.
*/
. = MEM_L2 + 0x40;
*(.text.*)
} >MEM_L2 =0
.l2 :
{
*(.l2 .l2.*)
} >MEM_L2 =0
.data_l1 :
{
*(.l1.data)
} >MEM_L1_DATA_A =0
.data :
{
*(.data .data.*)
} >MEM_L2
.bss :
{
__bss_start = .;
*(.bss .bss.*)
__bss_end = .;
} >MEM_L2
__stack_end = ORIGIN(MEM_L1_SCRATCH) + LENGTH(MEM_L1_SCRATCH);
}
The following aspects are described:
The DSP bridge relies on endpoint 0 to control the DSP application status through the core control protocol. The message dispatch loop on the DSP core can react to the core control commands.
enum {
    EP_CORE_CONTROL = 0,
};
The bare metal application is loaded by the user space loader into the core B memory space. The main and dispatch entries in the application and its dsp_bridge library are located by the loader. The loader informs the dsp_driver of these entry addresses.
On core running Linux
On core running bare metal application
The local system interrupt Supp0 and the core interrupt EVT15 are always reserved for the DSP bridge library.
The following APIs are to be implemented in the DSP bridge library:
ICC framework is now enabled for both BF561 and BF609.
Steps to run and test the initial ICC implementation for Linux and bare metal can be found at test_icc.
You should follow the examples under the folder icc_utils/. The ICC stub (main event loop) for core B should be loaded by icc_utility from the Linux filesystem before loading any further core B ICC applications. This can be done in /etc/rc or at any time later. You can also build the ICC stub and the applications for core B into one ELF binary and load it at once after kernel bootup.
You can either boot the RTOS in the same way as the ICC stub or boot it directly from the proper address in NOR flash with the help of u-boot/kernel.
GDB and gdbserver over Ethernet/UART is the only way to debug a Linux application on core A. For an RTOS on core B, debugging depends on the application debugging tools available in that RTOS. For a bare metal application on core B, only JTAG tools are applicable, such as GDB and gdbproxy over JTAG. To debug two ICC applications on two cores, you have to run two debugging instances concurrently and step through each application individually.
Process to debug ICC applications:
To better address the issue of proprietary Inter-Processor Communication (IPC), the Multicore Association (MCA) created an API-based standard called the Multicore Communication API (MCAPI). MCAPI is used in AMP configurations that require communication and synchronization between multiple operating system instances. MCAPI defines three fundamental communication types: 1. messages - connection-less datagrams; 2. packet channels - connection-oriented, uni-directional, FIFO packet streams; 3. scalar channels - connection-oriented, single-word, uni-directional, FIFO packet streams.
An MCAPI domain comprises one or more MCAPI nodes in a multicore topology, and it is used for routing purposes. Potential uses for domains: separation between different transports.
An MCAPI node is a logical abstraction that can be mapped to many entities, including but not limited to: a process, a thread, an instance of an operating system, a hardware accelerator, or a processor core.
MCAPI endpoints are socket-like communication termination points.
Channels provide point-to-point FIFO connections between a pair of endpoints. MCAPI channels are unidirectional.
MCAPI implementation concerns Link management
The MCAPI specification is both an API and a communications semantics specification. It does not define which link management, device model or wire protocol lies underneath it.
In our use case, we will implement the MCAPI 2.0 APIs on top of the ICC protocol. The domain field in our implementation will be used to separate different transport types (0 for the ICC protocol in our case). In the long term there will be transport types other than ICC, depending on the new multi-core architecture. The node ID will be used to identify processor cores (e.g. 0 for coreA and 1 for coreB on BF561). The port ID will be used to map to the ICC session that communicates with the other end on another core.
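The mapping from an MCAPI endpoint to an ICC destination described above can be sketched as a simple lookup. All names here are hypothetical (`demo_port_map`, `demo_resolve()` and so on), and the table size of 32 is an assumption borrowed from the session index check in the samples; the sketch only shows the idea that domain 0 selects ICC, the node ID selects the core, and the port ID selects a session.

```c
#include <stdint.h>

#define DEMO_DOMAIN_ICC 0 /* domain 0 selects the ICC transport, per the text */
#define DEMO_MAX_PORTS 32 /* assumed table size */

/* MCAPI endpoint tuple <domain_id, node_id, port_id>. */
struct demo_mcapi_endpoint {
    uint32_t domain_id;
    uint32_t node_id;
    uint32_t port_id;
};

/* Hypothetical per-node table: port ID -> ICC session index, -1 when unbound. */
struct demo_port_map {
    int session[DEMO_MAX_PORTS];
};

static void demo_map_init(struct demo_port_map *m)
{
    for (int i = 0; i < DEMO_MAX_PORTS; i++)
        m->session[i] = -1;
}

/* Bind a port to an ICC session. Returns 0 on success, -1 on a bad port or session. */
static int demo_map_bind(struct demo_port_map *m, uint32_t port_id, int session)
{
    if (port_id >= DEMO_MAX_PORTS || session < 0)
        return -1;
    m->session[port_id] = session;
    return 0;
}

/*
 * Resolve an MCAPI endpoint to the ICC pair <node, session>.
 * Returns 0 on success, -1 if the domain is not ICC or the port is unbound.
 */
static int demo_resolve(const struct demo_port_map *m, struct demo_mcapi_endpoint ep,
                        uint32_t *node, int *session)
{
    if (ep.domain_id != DEMO_DOMAIN_ICC || ep.port_id >= DEMO_MAX_PORTS)
        return -1;
    if (m->session[ep.port_id] < 0)
        return -1;
    *node = ep.node_id;
    *session = m->session[ep.port_id];
    return 0;
}
```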
MCAPI application interfaces: initialize and finalize MCAPI, create endpoints, and manage MCAPI data communication between two endpoints. We can implement MCAPI on top of ICC by modifying the transport layer. The CoreA node and the CoreB node will be statically created on coreA and coreB; each is a logical abstraction instance of a core node (or OS node). MCAPI ports can then be implemented on top of ICC sessions, with each endpoint mapping to an ICC session. MCAPI endpoints identified by a <domain_id, node_id, port_id> tuple map to <node, session> in the ICC layer, so data delivery between a pair of MCAPI endpoints can be implemented on top of ICC.
OS-specific resource management layer: manages shared memory and semaphore synchronization.
Device node implementation on top of ICC.
Inter-core communication layer based on shared memory and the inter-core interrupt.
physical data delivery layer, can be L2 shared memory, link port, etc.
Steps to run and test the MCAPI 2.0 implementation for Linux and bare metal on BF561 or BF609 can be found at test_mcapi.