|
Overview
The hArtes Hardware Platform (hAP from now on) is the workhorse for all demonstration activities associated with the project. As such, it plays a key role in the structure of the project, serving two complementary purposes:
- It must provide the computing muscle needed for the demanding applications that we plan to demonstrate within the project.
- It must be a friendly target architecture for the hArtes suite of tools.
Key features of the hAP structure are the following:
- integrate several elements of a heterogeneous computer architecture and provide a sufficient interconnection harness between these elements (at both the data and control levels) to support computational tasks closely integrated between those elements;
- configure the hAP as a modular system, starting from a simple and cheap entry-level configuration and extending to top-of-the-range systems, offering high computational power (at the level of some tens of Gflops);
- define an architecture in which new processing elements (that is, additional heterogeneous components) can be incorporated in a compatible way at a later stage of the design process;
- provide adequate input-output channels, both at the general purpose level (e.g., USB connections) and at the level of the specific standards applicable in the realm of audio signal processing.
The overall structure of the hAP is sketched in the following figure.
Figure 1: the Hardware platform.
The system contains a certain number (2 in the original design, to be later upgraded to 4) of independent blocks, each of which contains a RISC processor, a DSP processor and an application-specific configurable block that also contains a RISC processor. We call the basic building block (RISC processor + configurable element) the Basic Configurable Element (BCE).
One may ask why such a complex structure has been chosen. The simple reason is that real-life applications have diverse and conflicting requirements that cannot be met with high efficiency by a single architecture. For instance, traditional computer architectures (like IA-32) are not particularly efficient for DSP processing. On the other hand, an FPGA block is very good at handling streaming applications, but neither of them is a good choice to support a friendly human interface. In our case, we try to provide the best structure for each part of a complex application.
The price to pay for this approach is that target applications have to be neatly divided into specific threads, each running on the most appropriate sub-system and communicating with the others whenever necessary. This is by no means a simple task. Even once this is done, each thread still has to be developed specifically for its target sub-system. This is why heterogeneous systems have not been popular, in spite of their potential advantages.
hArtes plans to change this situation by creating a set of tools able to partition the application, map it to the appropriate hardware, and optimize it for performance in an almost automatic way. In this context, the hAP bears responsibility for providing the basic capabilities that the hArtes software tools will then exploit. Details on the system are given in the following sections.
Architecture
The top-level architecture of the hAP is shown in the following diagram (for the case of a system with two BCEs):
| The following points should be noted:
- The two BCEs have the same structure and are able to run any selected thread of a large application
- The two BCEs are connected by a direct data link with high bandwidth (~200 Mbyte/sec) and low latency (just a few system clocks). The link is used to implement a shared memory system between the two BCEs, so application data can be easily shared between the units. However, private (distributed) memory is also available on each BCE, for faster access to local data. In this way, complex applications sharing global data segments can run on the platform with minimal porting effort. Tuning for performance is still possible, as data segments that do not need to be shared can be moved to distributed memories within the appropriate BCE.
- Dedicated hardware for massive data streaming is available on the system. Up to 64 input and up to 64 output audio channels are available on 8+8 ADAT interfaces. Audio streams are moved to/from memory buffers with full hardware support (implemented in the FPGA blocks), so no programming overhead is associated with audio streaming.
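The placement rule described above (shared data stays in shared memory, data used by a single BCE moves to that BCE's private memory) can be illustrated with a minimal sketch. The `Segment` class, the `place_segments` function and the segment names are hypothetical and are not part of the hArtes tools; the sketch only assumes that, for each data segment, the set of BCEs accessing it is known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical model: a data segment and the BCE indices that access it."""
    name: str
    accessed_by: frozenset


def place_segments(segments):
    """Map each segment name to 'shared' or to 'private:<bce index>'.

    A segment accessed by exactly one BCE goes to that BCE's private
    (distributed) memory; every other segment stays in shared memory.
    """
    placement = {}
    for seg in segments:
        if len(seg.accessed_by) == 1:
            (bce,) = seg.accessed_by
            placement[seg.name] = f"private:{bce}"
        else:
            placement[seg.name] = "shared"
    return placement


# Illustrative segments for a two-BCE system (names are made up).
segments = [
    Segment("filter_coefficients", frozenset({0})),
    Segment("audio_frames", frozenset({0, 1})),
    Segment("ui_state", frozenset({1})),
]
placement = place_segments(segments)
print(placement)
```

Starting with everything in shared memory and then moving single-owner segments to private memory mirrors the "port first, tune later" approach the text describes.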
|
Figure 2: the Hardware platform with two BCEs and external I/O.
The next picture provides a more detailed description of a BCE. The picture shows the two main hardware blocks, corresponding to the RISC/DSP processor (the D940HF chip by Atmel) and the reconfigurable processor based on a recent-generation Field Programmable Gate Array (FPGA), the Xilinx Virtex4-FX140 or Virtex4-FX100.
Figure 3: schematic of the Basic Configurable Element.
Several comments are needed here:
- Each of the two processors has a private memory bank, independent of the other, boosting the overall memory bandwidth needed to sustain a high computational throughput.
- The FPGA-based processor also has a shared memory bank. This bank can be shared at several levels, i.e. i) by the two processors of the BCE itself, ii) by the processors of any other BCE available on the system. Memory sharing at this level is fully supported in hardware.
- Memory sharing (or any other program-controlled pattern of BCE-to-BCE communication) is handled by the high-speed links connecting the BCEs. These links not only have high bandwidth (of the order of 200 Mbyte/sec) but also a very small communication latency (just a few clock cycles), making transfers of even short data packets very efficient.
- Dedicated data links (independent of the inter-BCE communication harness) are provided between the BCE and all streaming audio interfaces (there are 8 input and 8 output ADAT audio interfaces, catering for up to 64 high-quality audio streams in both directions). Hardware supports data streaming between the interfaces and designated memory buffers.
- Each BCE has several standard input/output interfaces (Ethernet, USB, UARTs) to be used for connecting the system to other systems (typically under operating system control).
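To see why a low link latency matters for short packets, a simple first-order model (startup latency plus payload size divided by bandwidth) can be sketched as follows. Only the 200 Mbyte/sec figure comes from the text; the 5-cycle latency and the 100 MHz clock are assumptions chosen for illustration, and the function names are hypothetical.

```python
BANDWIDTH_BYTES_PER_S = 200e6  # "of the order of 200 Mbyte/sec" (from the text)
LATENCY_CYCLES = 5             # assumption: stands in for "a few clock cycles"
CLOCK_HZ = 100e6               # assumption: the link clock is not given in the text


def transfer_time_s(payload_bytes,
                    latency_cycles=LATENCY_CYCLES,
                    clock_hz=CLOCK_HZ,
                    bandwidth=BANDWIDTH_BYTES_PER_S):
    """First-order model: fixed startup latency plus streaming time."""
    return latency_cycles / clock_hz + payload_bytes / bandwidth


def efficiency(payload_bytes):
    """Fraction of the total transfer time spent actually moving payload."""
    streaming = payload_bytes / BANDWIDTH_BYTES_PER_S
    return streaming / transfer_time_s(payload_bytes)


for size in (16, 64, 1024):
    t_ns = transfer_time_s(size) * 1e9
    print(f"{size:5d} bytes: {t_ns:8.1f} ns, efficiency {efficiency(size):.2f}")
```

Under these assumed numbers the fixed latency is of the same order as the streaming time of a 16-byte packet, so even very short transfers keep most of the link bandwidth; with a latency of hundreds of cycles the same packets would be dominated by startup cost.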
The physical layout of the system is shown in the following picture: 
Figure 4: drawing of the main board.
The role of the hAP within hArtes
The hAP is basically a prototype system that focuses much more on its role as an effective playground for exercising the hArtes suite of tools than on the development of a hardware product with a commercial value of its own. The hAP has several specific features associated with its role within the project:
- Each BCE is a parallel system on its own, able to adapt to different computing requirements: just one BCE may be enough for simple applications, while the more demanding applications that we have in mind (e.g., immersive audio) may require that all four BCEs work together.
- Each BCE has several independent processors on board: the Atmel processor contains a well-known RISC processor (the ARM) tightly coupled to a high-performance DSP core, while the Xilinx FPGA itself contains yet another RISC CPU (the IBM PowerPC) as well as a large amount of uncommitted logic, to be configured on a program-by-program basis. Each of the two processors is by itself an example of a heterogeneous architecture, which makes them an ideal test ground for the hArtes tools. Audio-centered applications might for instance run only on the Atmel D940 processor, which is able to provide, e.g., a friendly user interface on the ARM as well as powerful computing muscle for audio processing on its DSP core. A more complex application might transcode (or reformat) streaming data by appropriate reconfiguration of the FPGA and pass this data on to the DSP core for processing, with all of this processing being coordinated by an ARM-based thread.
- The hAP has unusual and very powerful audio streaming capabilities. As already remarked, up to 64 input and 64 output audio channels can be handled by the system (such a large number of channels is necessary for high-end immersive audio applications). Even more important, from the point of view of performance, there is powerful hardware support for these data streams. This means that input audio streams are moved automatically (without software overhead) to memory buffers (designated by the application software) and that, at the same time, audio data processed by the system is also automatically sent to the appropriate output channels.
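As a rough, back-of-the-envelope illustration of what 64 input and 64 output channels imply for memory traffic and buffer sizes, the following sketch computes the aggregate stream rate. The channel and interface counts come from the text; the 48 kHz sample rate, the 24-bit sample size and the 256-frame buffer are assumptions typical of ADAT-style audio, not figures stated for the platform, and the function names are hypothetical.

```python
ADAT_INTERFACES_PER_DIRECTION = 8  # from the text: 8 input and 8 output interfaces
CHANNELS_PER_DIRECTION = 64        # from the text: up to 64 channels each way
SAMPLE_RATE_HZ = 48_000            # assumption: not stated in the text
BYTES_PER_SAMPLE = 3               # assumption: 24-bit samples

# Channels carried by each interface, derived from the two figures in the text.
channels_per_interface = CHANNELS_PER_DIRECTION // ADAT_INTERFACES_PER_DIRECTION


def stream_rate_bytes_per_s(channels,
                            sample_rate_hz=SAMPLE_RATE_HZ,
                            bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw audio payload rate for the given number of channels."""
    return channels * sample_rate_hz * bytes_per_sample


def buffer_bytes(channels, frames, bytes_per_sample=BYTES_PER_SAMPLE):
    """Size of a memory buffer holding `frames` samples for every channel."""
    return channels * frames * bytes_per_sample


rate_one_way = stream_rate_bytes_per_s(CHANNELS_PER_DIRECTION)
rate_both_ways = 2 * rate_one_way
block_buffer = buffer_bytes(CHANNELS_PER_DIRECTION, frames=256)

print(f"channels per ADAT interface: {channels_per_interface}")
print(f"one direction: {rate_one_way / 1e6:.3f} Mbyte/s")
print(f"both directions: {rate_both_ways / 1e6:.3f} Mbyte/s")
print(f"256-frame buffer, 64 channels: {block_buffer} bytes")
```

Under these assumptions the full audio load is a small fraction of the ~200 Mbyte/sec inter-BCE link bandwidth quoted earlier, which suggests the benefit of the hardware streaming support lies mainly in removing per-sample software overhead rather than in raw throughput.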
Public documentation
This section contains all available public documentation about the hArtes Application Platform. More will be added as the project evolves.
[1] - Presentation of the Hardware Platform at the CASTNESS'08 conference (Computing Architectures and Software Tools for Numerical Embedded Scalable Systems), held in Rome in January 2008: castness2008hardPlatform.pdf, PDF, 1.90 MB
[2] - This document contains more detailed technical specifications for the hArtes Hardware platform: boards_spec-hArtes-UniFE_12d.pdf, PDF, 1.85 MB
[3] - Two presentations describing the architecture of the Hardware Platform, the shared memory issues and some estimates of memory access delay: rome_20070906.pdf, PDF, 3.65 MB; arch_and_shmem.pdf, PDF, 2.20 MB (update of the previous presentation)
Related links
|