HOME

TheInfoList



OR:

Software development for the
Cell microprocessor The Cell Broadband Engine (Cell/B.E.) is a 64-bit multi-core processor and microarchitecture developed by Sony, Toshiba, and IBM—an alliance known as "STI". It combines a general-purpose PowerPC core, called the Power Processing Element (PPE), ...
involves a mixture of conventional development practices for the
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
-compatible PPU core, and novel software development challenges with regard to the functionally reduced SPU coprocessors.


Linux on Cell

An open source software-based strategy was adopted to accelerate the development of a Cell BE ecosystem and to provide an environment to develop Cell applications, including a GCC-based Cell compiler, binutils and a port of the Linux operating system.


Octopiler

Octopiler is
IBM International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...
's prototype
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
to allow
software developer Software development is the process of designing and Implementation, implementing a software solution to Computer user satisfaction, satisfy a User (computing), user. The process is more encompassing than Computer programming, programming, wri ...
s to write
code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communicati ...
for Cell processors.


Software portability


Adapting VMX for SPU


Differences between VMX and SPU

The VMX (Vector Multimedia Extensions) technology is conceptually similar to the vector model provided by the SPU processors, but there are many significant differences. The VMX ''
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
mode'' conforms to the Java Language Specification 1 subset of the default
IEEE Standard The Institute of Electrical and Electronics Engineers Standards Association (IEEE SA) is an operating unit within IEEE that develops global standards in a broad range of industries, including: power and energy, artificial intelligence systems, ...
, extended to include IEEE and C9X compliance where the Java standard falls silent. In a typical implementation, non-Java mode converts denormal values to zero but Java mode traps into an emulator when the processor encounters such a value. The IBM PPE Vector/SIMD manual does not define operations for double-precision floating point, though IBM has published material implying certain double-precision performance numbers associated with the Cell PPE VMX technology.


Intrinsics

Compilers for Cell provide
intrinsic In science and engineering, an intrinsic property is a property of a specified subject that exists itself or within the subject. An extrinsic property is not essential or inherent to the subject that is being characterized. For example, mass i ...
s to expose useful SPU instructions in C and C++. Instructions that differ only in the type of operand (such as a, ai, ah, ahi, fa, and dfa for addition) are typically represented by a single C/C++ intrinsic which selects the proper instruction based on the type of the operand.


Porting VMX code for SPU

A substantial amount of VMX ( Altivec) code developed for
IBM Power microprocessors Power microprocessors (originally POWER prior to Power10) are designed and sold by IBM for Server (computing), servers and supercomputers. The name "POWER" was originally presented as an acronym for "Performance Optimization With Enhanced RISC ...
, particularly under the
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
version of
macOS macOS, previously OS X and originally Mac OS X, is a Unix, Unix-based operating system developed and marketed by Apple Inc., Apple since 2001. It is the current operating system for Apple's Mac (computer), Mac computers. With ...
, can potentially be adapted for use on the SPU. The feasibility of porting depends on the extent of VMX-specific features used—ranging from straightforward to impractical. However, key workloads typically map well to the SPU architecture. In some cases, existing VMX code can be ported directly. If the VMX code is highly generic (makes few assumptions about the execution environment) the translation can be relatively straightforward. The two processors specify a different binary code format, so recompilation is required at a minimum. Even where instructions exist with the same behaviors, they do not have the same instruction names, so this must be mapped as well. IBM's development toolkit includes compiler intrinsics that automate much of this mapping. In many cases, however, a directly equivalent instruction does not exist. The workaround might be obvious or it might not. For example, if saturation behavior is required on the SPU, it can be coded by adding additional SPU instructions to accomplish this (with some loss of efficiency). At the other extreme, if Java floating-point semantics are required, this is almost impossible to achieve on the SPU processor. To achieve the same computation on the SPU might require that an entirely different
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
be written from scratch. The most important conceptual similarity between VMX and the SPU architecture is supporting the same
vectorization model Vectorization may refer to: Computing * Array programming, a style of computer programming where operations are applied to whole arrays instead of individual elements * Automatic vectorization, a compiler optimization that transforms loops to vec ...
. For this reason, most algorithms adapted to Altivec will usually adapt successfully to the SPU architecture as well.


Local store exploitation

Transferring data between the local stores of different SPUs can have a large performance cost. The local stores of individual SPUs can be exploited using a variety of strategies. Applications with high locality, such as dense matrix computations, represent an ideal workload class for the local stores in Cell BE. Streaming computations can be efficiently accommodated using
software pipelining In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler (or in the ...
of memory block transfers using a multi-buffering strategy. The software cache offers a solution for random accesses. More sophisticated applications can use multiple strategies for different data types.


References


The Cell Project at IBM Research

Optimizing Compiler for a CELL Processor

Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture


{{DEFAULTSORT:Cell Software Development Cell BE architecture Compilers Vaporware