VLIW
   HOME

TheInfoList



OR:

Very long instruction word (VLIW) refers to
instruction set architecture In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ...
s designed to exploit
instruction level parallelism Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution. Disc ...
(ILP). Whereas conventional
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, a ...
s (CPU, processor) mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in
parallel Parallel is a geometric term of location which may refer to: Computing * Parallel algorithm * Parallel computing * Parallel metaheuristic * Parallel (software), a UNIX utility for running programs in parallel * Parallel Sysplex, a cluster o ...
. This design is intended to allow higher performance without the complexity inherent in some other designs.


Overview

The traditional means to improve performance in processors include dividing instructions into substeps so the instructions can be executed partly at the same time (termed ''pipelining''), dispatching individual instructions to be executed independently, in different parts of the processor (''
superscalar A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a sup ...
architectures''), and even executing instructions in an order different from the program (''
out-of-order execution In computer engineering, out-of-order execution (or more formally dynamic execution) is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a proces ...
''). These methods all complicate hardware (larger circuits, higher cost and energy use) because the processor must make all of the decisions internally for these methods to work. In contrast, the VLIW method depends on the programs providing all the decisions regarding which instructions to execute simultaneously and how to resolve conflicts. As a practical matter, this means that the
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
(software used to create the final programs) becomes more complex, but the hardware is simpler than in many other means of parallelism.


Motivation

A processor that executes every instruction one after the other (i.e., a non- pipelined scalar architecture) may use processor resources inefficiently, yielding potential poor performance. The performance can be improved by executing different substeps of sequential instructions simultaneously (termed ''pipelining''), or even executing multiple instructions entirely simultaneously as in
superscalar A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a sup ...
architectures. Further improvement can be achieved by executing instructions in an order different from that in which they occur in a program, termed
out-of-order execution In computer engineering, out-of-order execution (or more formally dynamic execution) is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a proces ...
. These three methods all raise hardware complexity. Before executing any operations in parallel, the processor must verify that the instructions have no interdependencies. For example, if a first instruction's result is used as a second instruction's input, then they cannot execute at the same time and the second instruction cannot execute before the first. Modern out-of-order processors have increased the hardware resources which schedule instructions and determine interdependencies. In contrast, VLIW executes operations in parallel, based on a fixed schedule, determined when programs are
compiled In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
. Since determining the order of execution of operations (including which operations can execute simultaneously) is handled by the compiler, the processor does not need the scheduling hardware that the three methods described above require. Thus, VLIW CPUs offer more computing with less hardware complexity (but greater compiler complexity) than do most superscalar CPUs. This is also complimentary to the idea that as many computations as possible should be done before the program is executed, at compile time.


Design

In superscalar designs, the number of execution units is invisible to the instruction set. Each instruction encodes one operation only. For most superscalar designs, the instruction width is 32 bits or fewer. In contrast, one VLIW instruction encodes multiple operations, at least one operation for each execution unit of a device. For example, if a VLIW device has five execution units, then a VLIW instruction for the device has five operation fields, each field specifying what operation should be done on that corresponding execution unit. To accommodate these operation fields, VLIW instructions are usually at least 64 bits wide, and far wider on some architectures. For example, the following is an instruction for the
Super Harvard Architecture Single-Chip Computer {{Distinguish, SuperH The Super Harvard Architecture Single-Chip Computer (SHARC) is a high performance floating-point and fixed-point DSP from Analog Devices. SHARC is used in a variety of signal processing applications ranging from audio pro ...
(SHARC). In one cycle, it does a floating-point multiply, a floating-point add, and two autoincrement loads. All of this fits in one 48-bit instruction: f12 = f0 * f4, f8 = f8 + f12, f0 = dm(i0, m3), f4 = pm(i8, m9); Since the earliest days of computer architecture, some CPUs have added several
arithmetic logic unit In computing, an arithmetic logic unit (ALU) is a combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on floating point num ...
s (ALUs) to run in parallel.
Superscalar A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a sup ...
CPUs use hardware to decide which operations can run in parallel at runtime, while VLIW CPUs use software (the compiler) to decide which operations can run in parallel in advance. Because the complexity of instruction scheduling is moved into the compiler, complexity of hardware can be reduced substantially. A similar problem occurs when the result of a parallelizable instruction is used as input for a branch. Most modern CPUs ''guess'' which branch will be taken even before the calculation is complete, so that they can load the instructions for the branch, or (in some architectures) even start to compute them speculatively. If the CPU guesses wrong, all of these instructions and their context need to be '' flushed'' and the correct ones loaded, which takes time. This has led to increasingly complex instruction-dispatch logic that attempts to guess correctly, and the simplicity of the original
reduced instruction set computing In computer engineering, a reduced instruction set computer (RISC) is a computer designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set compu ...
(RISC) designs has been eroded. VLIW lacks this logic, and thus lacks its energy use, possible design defects, and other negative aspects. In a VLIW, the compiler uses heuristics or profile information to guess the direction of a branch. This allows it to move and preschedule operations speculatively before the branch is taken, favoring the most likely path it expects through the branch. If the branch takes an unexpected way, the compiler has already generated compensating code to discard speculative results to preserve program semantics.
Vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data calle ...
cores (designed for large one-dimensional arrays of data called ''vectors'') can be combined with the VLIW architecture such as in the Fujitsu
FR-V The Fujitsu FR-V (Fujitsu RISC- VLIW) is one of the very few processors ever able to process both a very long instruction word (VLIW) and vector processor instructions at the same time, increasing throughput with high parallel computing while incre ...
microprocessor, further increasing
throughput Network throughput (or just throughput, when in context) refers to the rate of message delivery over a communication channel, such as Ethernet or packet radio, in a communication network. The data that these messages contain may be delivered ove ...
and
speed In everyday use and in kinematics, the speed (commonly referred to as ''v'') of an object is the magnitude of the change of its position over time or the magnitude of the change of its position per unit of time; it is thus a scalar quant ...
.


History

The concept of VLIW architecture, and the term ''VLIW'', were invented by
Josh Fisher Joseph A "Josh" Fisher is an American and Spanish computer scientist noted for his work on VLIW architectures, compiling, and instruction-level parallelism, and for the founding of Multiflow Computer. He is a Hewlett-Packard Senior Fellow ...
in his research group at
Yale University Yale University is a private research university in New Haven, Connecticut. Established in 1701 as the Collegiate School, it is the third-oldest institution of higher education in the United States and among the most prestigious in the w ...
in the early 1980s. His original development of
trace scheduling Trace scheduling is an optimization technique developed by Josh Fisher used in compilers for computer programs. A compiler often can, by rearranging its generated machine instructions for faster execution, improve program performance. It incre ...
as a compiling method for VLIW was developed when he was a graduate student at
New York University New York University (NYU) is a private research university in New York City. Chartered in 1831 by the New York State Legislature, NYU was founded by a group of New Yorkers led by then- Secretary of the Treasury Albert Gallatin. In 1832, th ...
. Before VLIW, the notion of prescheduling
execution unit In computer engineering, an execution unit (E-unit or EU) is a part of the central processing unit (CPU) that performs the operations and calculations as instructed by the computer program. It may have its own internal control sequence unit (not ...
s and instruction-level parallelism in software was well established in the practice of developing
horizontal microcode In processor design, microcode (μcode) is a technique that interposes a layer of computer organization between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer. Microcode is a la ...
. Before Fisher the theoretical aspects of what would be later called VLIW were developed by the Soviet computer scientist Mikhail Kartsev based on his Sixties work on military-oriented M-9 and M-10 computers. His ideas were later developed and published as a part of a textbook two years before Fisher's seminal paper, but because of the
Iron Curtain The Iron Curtain was the political boundary dividing Europe into two separate areas from the end of World War II in 1945 until the end of the Cold War in 1991. The term symbolizes the efforts by the Soviet Union (USSR) to block itself and its ...
and because Kartsev's work was mostly military-related it remained largely unknown in the West. Fisher's innovations involved developing a compiler that could target horizontal microcode from programs written in an ordinary
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
. He realized that to get good performance and target a
wide-issue A wide-issue architecture is a computer processor that issues more than one instruction per clock cycle. They can be considered in three broad types: * Statically-scheduled superscalar architectures execute instructions in the order presented; ...
machine, it would be necessary to find parallelism beyond that generally within a
basic block In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. This restricted form makes a basic block highly amenable to analysis. Compilers usually deco ...
. He also developed region scheduling methods to identify parallelism beyond basic blocks. Trace scheduling is such a method, and involves scheduling the most likely path of basic blocks first, inserting compensating code to deal with speculative motions, scheduling the second most likely trace, and so on, until the schedule is complete. Fisher's second innovation was the notion that the target CPU architecture should be designed to be a reasonable target for a compiler; that the compiler and the architecture for a VLIW processor must be codesigned. This was inspired partly by the difficulty Fisher observed at Yale of compiling for architectures like
Floating Point Systems Floating Point Systems, Inc. (FPS), was a Beaverton, Oregon vendor of attached array processors and minisupercomputers. The company was founded in 1970 by former Tektronix engineer Norm Winningstad, with partners Tom Prince, Frank Bouton and Rob ...
' FPS164, which had a
complex instruction set computing A complex instruction set computer (CISC ) is a computer architecture in which single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) or are capable of multi-step o ...
(CISC) architecture that separated instruction initiation from the instructions that saved the result, needing very complex scheduling algorithms. Fisher developed a set of principles characterizing a proper VLIW design, such as self-draining pipelines, wide multi-port
register file A register file is an array of processor registers in a central processing unit (CPU). Register banking is the method of using a single name to access multiple different physical registers depending on the operating mode. Modern integrated circuit- ...
s, and
memory architecture Memory architecture describes the methods used to implement electronic computer data storage in a manner that is a combination of the fastest, most reliable, most durable, and least expensive way to store and retrieve information. Depending on the ...
s. These principles made it easier for compilers to emit fast code. The first VLIW compiler was described in a Ph.D. thesis by John Ellis, supervised by Fisher. The compiler was named Bulldog, after Yale's mascot. Fisher left Yale in 1984 to found a startup company,
Multiflow {{no references, date=June 2019 Multiflow Computer, Inc., founded in April, 1984 near New Haven, Connecticut, USA, was a manufacturer and seller of minisupercomputer hardware and software embodying the VLIW design style. Multiflow, incorporated in ...
, along with cofounders John O'Donnell and John Ruttenberg. Multiflow produced the TRACE series of VLIW
minisupercomputer Minisupercomputers constituted a short-lived class of computers that emerged in the mid-1980s, characterized by the combination of vector processing and small-scale multiprocessing. As scientific computing using vector processors became more po ...
s, shipping their first machines in 1987. Multiflow's VLIW could issue 28 operations in parallel per instruction. The TRACE system was implemented in a mix of medium-scale integration (MSI), large-scale integration (LSI), and very large-scale integration (VLSI), packaged in cabinets, a technology obsoleted as it grew more cost-effective to integrate all of the components of a processor (excluding memory) on one chip. Multiflow was too early to catch the following wave, when chip architectures began to allow multiple-issue CPUs. The major semiconductor companies recognized the value of Multiflow technology in this context, so the compiler and architecture were subsequently licensed to most of these firms.


Implementations

Cydrome Cydrome (1984−1988) was a computer company established in San Jose of the Silicon Valley region in California. Its mission was to develop a numeric processor. The founders were David Yen, Wei Yen, Ross Towle, Arun Kumar, and Bob Rau (the chie ...
was a company producing VLIW numeric processors using
emitter-coupled logic In electronics, emitter-coupled logic (ECL) is a high-speed integrated circuit bipolar transistor logic family. ECL uses an overdriven bipolar junction transistor (BJT) differential amplifier with single-ended input and limited emitter current to ...
(ECL) integrated circuits in the same timeframe (late 1980s). This company, like Multiflow, failed after a few years. One of the licensees of the Multiflow technology is
Hewlett-Packard The Hewlett-Packard Company, commonly shortened to Hewlett-Packard ( ) or HP, was an American multinational information technology company headquartered in Palo Alto, California. HP developed and provided a wide variety of hardware components ...
, which
Josh Fisher Joseph A "Josh" Fisher is an American and Spanish computer scientist noted for his work on VLIW architectures, compiling, and instruction-level parallelism, and for the founding of Multiflow Computer. He is a Hewlett-Packard Senior Fellow ...
joined after Multiflow's demise.
Bob Rau Bantwal Ramakrishna "Bob" Rau (1951 – December 10, 2002) was a computer engineer and HP Fellow. Rau was a founder and chief architect of Cydrome, where he helped develop the Very long instruction word technology that is now common in modern com ...
, founder of Cydrome, also joined HP after Cydrome failed. These two would lead computer architecture research at Hewlett-Packard during the 1990s. Along with the above systems, during the same time (1989–1990), Intel implemented VLIW in the
Intel i860 The Intel i860 (also known as 80860) is a RISC microprocessor design introduced by Intel in 1989. It is one of Intel's first attempts at an entirely new, high-end instruction set architecture since the failed Intel iAPX 432 from the beginning of ...
, their first 64-bit microprocessor, and the first processor to implement VLIW on one chip. This processor could operate in both simple RISC mode and VLIW mode:
In the early 1990s, Intel introduced the i860 RISC microprocessor. This simple chip had two modes of operation: a scalar mode and a VLIW mode. In the VLIW mode, the processor always fetched two instructions and assumed that one was an integer instruction and the other floating-point.
The i860's VLIW mode was used extensively in embedded
digital signal processor A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on MOS integrated circuit chips. They are widely used in audio s ...
(DSP) applications since the application execution and datasets were simple, well ordered and predictable, allowing designers to fully exploit the parallel execution advantages enabled by VLIW. In VLIW mode, the i860 could maintain floating-point performance in the range of 20-40 double-precision MFLOPS; a very high value for its time and for a processor running at 25-50Mhz. In the 1990s, Hewlett-Packard researched this problem as a side effect of ongoing work on their
PA-RISC PA-RISC is an instruction set architecture (ISA) developed by Hewlett-Packard. As the name implies, it is a reduced instruction set computer (RISC) architecture, where the PA stands for Precision Architecture. The design is also referred to as ...
processor family. They found that the CPU could be greatly simplified by removing the complex dispatch logic from the CPU and placing it in the compiler. Compilers of the day were far more complex than those of the 1980s, so the added complexity in the compiler was considered to be a small cost. VLIW CPUs are usually made of multiple RISC-like
execution unit In computer engineering, an execution unit (E-unit or EU) is a part of the central processing unit (CPU) that performs the operations and calculations as instructed by the computer program. It may have its own internal control sequence unit (not ...
s that operate independently. Contemporary VLIWs usually have four to eight main execution units. Compilers generate initial instruction sequences for the VLIW CPU in roughly the same manner as for traditional CPUs, generating a sequence of RISC-like instructions. The compiler analyzes this code for dependence relationships and resource requirements. It then schedules the instructions according to those constraints. In this process, independent instructions can be scheduled in parallel. Because VLIWs typically represent instructions scheduled in parallel with a longer instruction word that incorporates the individual instructions, this results in a much longer
opcode In computing, an opcode (abbreviated from operation code, also known as instruction machine code, instruction code, instruction syllable, instruction parcel or opstring) is the portion of a machine language instruction that specifies the operat ...
(termed ''very long'') to specify what executes on a given cycle. Examples of contemporary VLIW CPUs include the TriMedia media processors by
NXP NXP Semiconductors N.V. (NXP) is a Dutch semiconductor designer and manufacturer with headquarters in Eindhoven, Netherlands. The company employs approximately 31,000 people in more than 30 countries. NXP reported revenue of $11.06 billion in 2 ...
(formerly Philips Semiconductors), the
Super Harvard Architecture Single-Chip Computer {{Distinguish, SuperH The Super Harvard Architecture Single-Chip Computer (SHARC) is a high performance floating-point and fixed-point DSP from Analog Devices. SHARC is used in a variety of signal processing applications ranging from audio pro ...
(SHARC) DSP by Analog Devices, the
ST200 family The ST200 is a family of very long instruction word (VLIW) processor cores based on technology jointly developed by Hewlett-Packard Laboratories and STMicroelectronics under the name Lx. The main application of the ST200 family is embedded media ...
by STMicroelectronics based on the Lx architecture (designed in Josh Fisher's HP lab by Paolo Faraboschi), the
FR-V The Fujitsu FR-V (Fujitsu RISC- VLIW) is one of the very few processors ever able to process both a very long instruction word (VLIW) and vector processor instructions at the same time, increasing throughput with high parallel computing while incre ...
from
Fujitsu is a Japanese multinational information and communications technology equipment and services corporation, established in 1935 and headquartered in Tokyo. Fujitsu is the world's sixth-largest IT services provider by annual revenue, and the la ...
, the BSP15/16 from Pixelworks, the CEVA-X DSP from CEVA, the Jazz DSP from Improv Systems, the HiveFlex series from Silicon Hive, and the MPPA Manycore family by Kalray. The Texas Instruments TMS320 DSP line has evolved, in its C6000 family, to look more like a VLIW, in contrast to the earlier C5000 family. These contemporary VLIW CPUs are mainly successful as embedded media processors for consumer electronic devices. VLIW features have also been added to configurable processor cores for
system-on-a-chip A system on a chip or system-on-chip (SoC ; pl. ''SoCs'' ) is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include a central processing unit (CPU), memory ...
(SoC) designs. For example, Tensilica's
Xtensa Tensilica was a company based in Silicon Valley in the semiconductor intellectual property core business. It is now a part of Cadence Design Systems. Tensilica is known for its customizable Xtensa microprocessor core. Other products include: HiF ...
LX2 processor incorporates a technology named Flexible Length Instruction eXtensions (FLIX) that allows multi-operation instructions. The Xtensa C/C++ compiler can freely intermix 32- or 64-bit FLIX instructions with the Xtensa processor's one-operation RISC instructions, which are 16 or 24 bits wide. By packing multiple operations into a wide 32- or 64-bit instruction word and allowing these multi-operation instructions to intermix with shorter RISC instructions, FLIX allows SoC designers to realize VLIW's performance advantages while eliminating the
code bloat In computer programming, code bloat is the production of program code (source code or machine code) that is perceived as unnecessarily long, slow, or otherwise wasteful of resources. Code bloat can be caused by inadequacies in the programming la ...
of early VLIW architectures. The Infineon Carmel DSP is another VLIW processor core intended for SoC. It uses a similar code density improvement method called ''configurable long instruction word'' (CLIW). Outside embedded processing markets, Intel's
Itanium Itanium ( ) is a discontinued family of 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). Launched in June 2001, Intel marketed the processors for enterprise servers and high-performance comput ...
IA-64
explicitly parallel instruction computing Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called ''Independence'' a ...
(EPIC) and
Elbrus 2000 The Elbrus 2000, E2K (russian: Эльбрус 2000) is a Russian 512-bit wide VLIW microprocessor developed by Moscow Center of SPARC Technologies (MCST) and fabricated by TSMC. It supports two instruction set architectures (ISA): * Elbrus VLI ...
appear as the only examples of a widely used VLIW CPU architectures. However, EPIC architecture is sometimes distinguished from a pure VLIW architecture, since EPIC advocates full instruction predication, rotating register files, and a very long instruction word that can encode non-parallel instruction groups. VLIWs also gained significant consumer penetration in the
graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, m ...
(GPU) market, though both
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
and AMD have since moved to RISC architectures to improve performance on non-graphics workloads.
ATI Technologies ATI Technologies Inc. (commonly called ATI) was a Canadian semiconductor technology corporation based in Markham, Ontario, that specialized in the development of graphics processing units and chipsets. Founded in 1985 as Array Technology Inc., ...
' (ATI) and
Advanced Micro Devices Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufact ...
' (AMD) TeraScale microarchitecture for
graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, m ...
s (GPUs) is a VLIW microarchitecture. In December 2015, the first shipment of PCs based on VLIW CPU Elbrus-4s was made in Russia. The Neo by REX Computing is a processor consisting of a 2D mesh of VLIW cores aimed at power efficiency. The
Elbrus 2000 The Elbrus 2000, E2K (russian: Эльбрус 2000) is a Russian 512-bit wide VLIW microprocessor developed by Moscow Center of SPARC Technologies (MCST) and fabricated by TSMC. It supports two instruction set architectures (ISA): * Elbrus VLI ...
(russian: Эльбрус 2000) and its successors are Russian 512-bit wide VLIW microprocessors developed by
Moscow Center of SPARC Technologies MCST (russian: МЦСТ, acronym for Moscow Center of SPARC Technologies) is a Russian microprocessor company that was set up in 1992. Different types of processors made by MCST were used in personal computers, servers and computing systems. MCS ...
(MCST) and fabricated by
TSMC Taiwan Semiconductor Manufacturing Company Limited (TSMC; also called Taiwan Semiconductor) is a Taiwanese multinational semiconductor contract manufacturing and design company. It is the world's most valuable semiconductor company, the world' ...
.


Backward compatibility

When silicon technology allowed for wider implementations (with more execution units) to be built, the compiled programs for the earlier generation would not run on the wider implementations, as the encoding of binary instructions depended on the number of execution units of the machine.
Transmeta Transmeta Corporation was an American fabless semiconductor company based in Santa Clara, California. It developed low power x86 compatible microprocessors based on a VLIW core and a software layer called Code Morphing Software. Code Morphing S ...
addressed this issue by including a binary-to-binary software compiler layer (termed '' code morphing'') in their
Crusoe Crusoe may refer to: Art, entertainment, and media * ''Crusoe'' (film), a 1989 film by Caleb Deschanel based on the novel ''Robinson Crusoe'' * ''Crusoe'' (TV series), a 2008 television series based on the novel ''Robinson Crusoe'' * Crusoe the ...
implementation of the x86 architecture. This mechanism was advertised to basically recompile, optimize, and translate x86 opcodes at runtime into the CPU's internal machine code. Thus, the Transmeta chip is ''internally'' a VLIW processor, effectively decoupled from the x86 CISC
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ...
that it executes. Intel's
Itanium Itanium ( ) is a discontinued family of 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). Launched in June 2001, Intel marketed the processors for enterprise servers and high-performance comput ...
architecture (among others) solved the backward-compatibility problem with a more general mechanism. Within each of the multiple-opcode instructions, a bit field is allocated to denote dependency on the prior VLIW instruction within the program instruction stream. These bits are set at
compile time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concep ...
, thus relieving the hardware from calculating this dependency information. Having this dependency information encoded in the instruction stream allows wider implementations to issue multiple non-dependent VLIW instructions in parallel per cycle, while narrower implementations would issue a smaller number of VLIW instructions per cycle. Another perceived deficiency of VLIW designs is the
code bloat In computer programming, code bloat is the production of program code (source code or machine code) that is perceived as unnecessarily long, slow, or otherwise wasteful of resources. Code bloat can be caused by inadequacies in the programming la ...
that occurs when one or more execution unit(s) have no useful work to do and thus must execute ''No Operation'' NOP instructions. This occurs when there are dependencies in the code and the instruction pipelines must be allowed to drain before later operations can proceed. Since the number of transistors on a chip has grown, the perceived disadvantages of the VLIW have diminished in importance. VLIW architectures are growing in popularity, especially in the
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded ...
market, where it is possible to customize a processor for an application in a
system-on-a-chip A system on a chip or system-on-chip (SoC ; pl. ''SoCs'' ) is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include a central processing unit (CPU), memory ...
.


See also

*
Explicitly parallel instruction computing Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called ''Independence'' a ...
(EPIC) *
Elbrus (computer) The Elbrus (russian: Эльбрус) is a line of Soviet and Russian computer systems developed by the Lebedev Institute of Precision Mechanics and Computer Engineering. These computers are used in the space program, nuclear weapons research, ...
*
Itanium Itanium ( ) is a discontinued family of 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). Launched in June 2001, Intel marketed the processors for enterprise servers and high-performance comput ...
* Movidius (SHAVE core) *
Single instruction, multiple data Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
*
Single instruction, multiple threads Single instruction, multiple threads (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with multithreading. It is different from SPMD in that all instructions in all "threads" are exe ...
*
Transport triggered architecture In computer architecture, a transport triggered architecture (TTA) is a kind of processor_(computing), processor design in which programs directly control the internal transport bus (computing), buses of a processor. Computation happens as a side e ...
(TTA)


References


External links


Paper That Introduced VLIWs

Book on the history of Multiflow Computer, VLIW pioneering company

ISCA "Best Papers" Retrospective On Paper That Introduced VLIWs

VLIW and Embedded Processing

FR500 VLIW-architecture High-performance Embedded Microprocessor


*[http://opac.lib.rpi.edu/search~S6?/Xyerazunis&m=t&m=r&m=g&m=a&m=o&SORT=D&searchscope=6/Xyerazunis&m=t&m=r&m=g&m=a&m=o&SORT=D&searchscope=6&SUBKEY=yerazunis/1%2C14%2C14%2CB/frameset&FF=Xyerazunis&m=t&m=r&m=g&m=a&m=o&SORT=D&searchscope=6&1%2C1%2C DIS: an Architecture for fast LISP execution.] A similar VLIW architecture, with a parallelizing compiler directed toward LISP. {{DEFAULTSORT:Very Long Instruction Word Very long instruction word computing, Digital signal processing Instruction processing * Parallel computing