Paul Stravers, Chief Architect, and Marco Jacobs, Director of Marketing, both at Vector Fabrics, discuss the benefits of an advanced EDA tool for parallel hardware programming that gets right to the core of designs

Silicon platform vendors are pushing multicore architectures as the dominant paradigm for improving performance as Moore’s Law continues to increase transistor density. The result is that many consumer electronics devices include multicore silicon. Amazon’s £180 Kindle Fire tablet includes a dual-core processor. The heart of Sony’s announced PlayStation Vita handheld game console is a quad-core processor.

Another trend, which is perhaps not as visible, is the growing role of GPUs in these systems. Where in the past these graphics processing units were dedicated to rendering graphics, they have now become generic, massively parallel compute units that can be used to offload the multicore host and accelerate applications, resulting in a snappier user interface, faster internet browsing, or even enabling applications that previously weren’t possible.

While the multicore option solves the hardware problem, it only aggravates the software problem.

Software designers now face the considerable challenge of adapting their software to match the parallel structure of the target silicon platform. Without this match there can be no efficient system, and hence no incentive for system architects to design multicore platforms into their products.

Silicon platform owners are increasingly aware of the need for tools that help their customers program their parallel hardware. These tools not only need to analyse the concurrency potential of applications; they also need a strong awareness of the opportunities offered by the target hardware architecture.

A common problem that programmers have to deal with is estimating the performance of a potential application partitioning. On the one hand, partitioning introduces concurrency into the application, which may increase its performance, and mapping code onto a specific hard-wired accelerator can increase performance further. On the other hand, partitioning introduces overhead, which in turn decreases application performance. This overhead comes from thread creation and cleanup, from synchronisation, and from the increased latency and bandwidth demands of communication across cores.

For example, the vfEmbedded tool from Vector Fabrics models target architectures such as GPGPUs and SMP x86 or ARM Cortex-A9 CPUs, and estimates the performance of the parallelised application. This way, the developer gets insight into parallel application performance and can decide on the best parallelisation strategy.

The code is parallelised by the tool using a point-and-click user interface, enabling quick evaluation of different design choices.

Availability of core architectures

Within a single platform, different core architectures may be available, e.g. a multicore CPU, a GPU and a DSP cluster. Tools should quickly point out which parts of the application are best suited to each of the available cores, and calculate what the overall speedup will be.

Weighing up these factors to see whether a proposed partitioning contributes positively to the performance bottom line requires deep insight into the operation of the target hardware, as well as into the structure of the application.

Programmers have neither the time nor the hardware expertise to oversee the cost and performance impact of all possible partitionings and mappings.

Platform vendors, on the other hand, know exactly what their platforms are capable of. The challenge for these platform owners is to share this technical know-how with their customers, preferably without disclosing trade secrets, and in a way that is easy for their customers to digest.

The solution is to encapsulate the vendor’s platform knowledge in tools capable of predicting the cost and performance impact of partitioning and parallelisation options. The tools give software developers insight into how their application will perform on the target architecture, and allow them to see the performance impact of their chosen partitioning and mapping.

In addition to predicting the cost and performance impact of parallelisation options, the tool also points the platform user to the relevant platform software APIs that help users implement parallelisation choices.

Often, some software restructuring is involved, e.g. a loop body needs to be outlined and wrapped in a function body so it can be passed to a thread constructor. The details often differ per platform, and often there are different ways of accomplishing similar goals on a single platform.

With the growing complexity of parallel hardware architectures, it is becoming increasingly difficult for the hardware architects to transfer application integration knowledge to system integrators. With every new silicon generation the problem is aggravated.

At the same time, the availability of an ever-growing number of implementation APIs makes it even harder for application integrators to make the best partitioning and mapping choices.

Tools that allow integrators to efficiently use even the most advanced silicon platforms are a necessity if silicon vendors are to continue on the path set out by Moore’s Law.

Vector Fabrics

www.vectorfabrics.com