The AMD Am29000, commonly shortened to 29k, is a family of 32-bit RISC microprocessors and microcontrollers developed and fabricated by Advanced Micro Devices (AMD). Based on the seminal Berkeley RISC, the 29k added a number of significant improvements. They were commonly used in laser printers from several manufacturers of the era and are well documented as being used in the high-end HP Color LaserJet series, from the first model Color LaserJet (Am29030) up to and including the HP Color LaserJet 5, which uses an Am29040.
In development since 1984–1985, announced in March 1987 and released in May 1988, the initial Am29000 was followed by several versions, ending with the Am29040 in 1995. The 29050 was notable as an early design to feature a floating point unit capable of executing one multiply–add operation per cycle.
AMD was designing a superscalar version until late 1995, when it dropped development of the 29k because the design team was transferred to support the PC (x86) side of the business. What remained of AMD's embedded business was realigned towards the embedded 186 family of 80186 derivatives. By then the majority of AMD's resources were concentrated on its high-performance x86 processors for desktop PCs, and many of the ideas and individual parts of the 29k designs were used to produce the AMD K5.
Design
The 29k evolved from the same Berkeley RISC design that also led to the Sun SPARC, Intel i960, ARM and RISC-V.
One design element used in some of the Berkeley RISC-derived designs is the concept of register windows, a technique used to speed up procedure calls significantly. The idea is to use a large set of registers as a stack, loading local data into a set of registers during a call and marking them "dead" when the procedure returns. Values returned from a routine are placed in the "global page", for instance the top eight registers in the SPARC. The competing early RISC design from Stanford University, the Stanford MIPS, also considered this concept, but its designers decided that improved compilers could make more efficient use of general purpose registers than a hard-wired window.
In the original Berkeley design, SPARC, and i960, the windows were fixed in size. A routine using only one local variable would still use up eight registers on the SPARC, wasting this expensive resource. It was here that the 29000 differed from these earlier designs, using a variable window size. In this example only two registers would be used, one for the local variable and another for the return address. The 29000 also added more registers: it kept the same 128 registers for the procedure stack but added another 64 for global access. In comparison, the SPARC had 128 registers in total, and the global set was a standard window of eight. This change resulted in much better register use in the 29000 under a wide variety of workloads.
The 29000 also extended the register window stack with an in-memory (and in theory, in-cache) stack. When the window filled, older calls would be pushed off the end of the register stack into memory and restored as required when the routine returned. Generally, the 29000's register usage was considerably more advanced than that of competing designs based on the Berkeley concepts.
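The difference between fixed and variable window sizes can be illustrated with a small simulation. The following Python sketch is a toy model, not a description of the actual hardware: the class and function names are hypothetical, frames are spilled and restored whole (a simplifying assumption), and only the 128-register figure comes from the text above.

```python
class RegisterStack:
    """Toy model of a register stack holding variable-size call frames."""

    def __init__(self, num_local_regs=128):
        self.capacity = num_local_regs  # registers available for call frames
        self.frames = []                # sizes of frames held in registers, oldest first
        self.spilled = []               # sizes of frames pushed out to memory, oldest first
        self.spill_count = 0
        self.fill_count = 0

    def in_use(self):
        return sum(self.frames)

    def call(self, regs_needed):
        """Allocate a frame, spilling the oldest frames to memory if required."""
        if regs_needed > self.capacity:
            raise ValueError("frame larger than the register stack")
        while self.in_use() + regs_needed > self.capacity:
            self.spilled.append(self.frames.pop(0))
            self.spill_count += 1
        self.frames.append(regs_needed)

    def ret(self):
        """Release the newest frame and restore the caller's frame if it was spilled."""
        self.frames.pop()
        if not self.frames and self.spilled:
            self.frames.append(self.spilled.pop())
            self.fill_count += 1


def max_depth_without_spill(regs_per_call, capacity=128):
    """Count how many nested calls fit in registers before the first spill."""
    stack = RegisterStack(capacity)
    depth = 0
    while True:
        stack.call(regs_per_call)
        if stack.spill_count:
            return depth
        depth += 1
```

Under these assumptions, `max_depth_without_spill(8)` gives 16 nested calls for a fixed allocation of eight registers per call, while `max_depth_without_spill(2)` gives 64 for the two-register routine used as the example above.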
Another difference from the Berkeley design is that the 29000 included no special-purpose condition code register. Any register could be used for this purpose, allowing the conditions to be easily saved at the expense of complicating some code. A Branch Target Cache (512 bytes on the 29000 and 1024 bytes on the 29050) stored sets of 4 or 2 sequential instructions found at the branch target address, reducing the instruction fetch latency during taken branches. The 29000 did not include any branch prediction system, so there was a delay if a branch was taken. As a result, the 29000 has a single branch delay slot. The Branch Target Cache mitigated the delay by storing four or two instructions from the target address of the branch, which could be run instantly while the fetch buffer was refilled with new instructions from memory.
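The behaviour of a branch target cache can be sketched in a few lines of Python. This is an illustrative model only: the direct-mapped placement, the class and method names, and the 4-byte instruction size are assumptions made for the example, while the cache sizes and the sets of four or two instructions are taken from the text above.

```python
INSTRUCTION_BYTES = 4  # assumption: fixed 32-bit instructions


def btc_entries(cache_bytes, instructions_per_entry):
    """Number of branch targets a cache of the given size can hold."""
    return cache_bytes // (instructions_per_entry * INSTRUCTION_BYTES)


class BranchTargetCache:
    """Toy direct-mapped cache of the first instructions at each branch target."""

    def __init__(self, cache_bytes=512, instructions_per_entry=4):
        self.per_entry = instructions_per_entry
        self.num_entries = btc_entries(cache_bytes, instructions_per_entry)
        self.entries = {}  # slot index -> (target address, tuple of instructions)
        self.hits = 0
        self.misses = 0

    def _slot(self, target):
        # Hypothetical placement rule: index by instruction address.
        return (target // INSTRUCTION_BYTES) % self.num_entries

    def taken_branch(self, target, memory):
        """Return the instructions at a branch target, from the cache when possible.

        `memory` is a list of instruction words indexed by word address.
        """
        slot = self._slot(target)
        cached = self.entries.get(slot)
        if cached is not None and cached[0] == target:
            self.hits += 1      # instructions are available immediately
            return cached[1]
        self.misses += 1        # must wait for memory, then remember the result
        first = target // INSTRUCTION_BYTES
        block = tuple(memory[first:first + self.per_entry])
        self.entries[slot] = (target, block)
        return block
```

With these assumptions, a 512-byte cache holding four instructions per target has room for `btc_entries(512, 4)` = 32 targets, and a 1024-byte cache has room for 64 targets of four instructions or 128 targets of two. A loop that branches repeatedly to the same target misses once and hits on every later iteration.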
Support for virtual address translation followed a similar approach to that of the MIPS architecture. A 64-entry translation lookaside buffer (TLB) retained mappings from virtual to physical addresses, and upon an untranslated address being encountered, the resulting TLB "miss" would cause the processor to trap to a software routine responsible for providing any appropriate mapping to physical memory. In contrast to the MIPS approach, which employed a ''random'' register to select the TLB entry to be replaced upon a TLB miss event, the 29000 provided a dedicated ''lru'' (least recently used) register.
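A software-managed TLB with least-recently-used replacement can be modelled as follows. This Python sketch is a simplification under stated assumptions: the class name, the `miss_handler` callback standing in for the trap routine, and the fully associative organisation are all hypothetical, and the eviction that the real software routine would perform using the ''lru'' register is folded into the model itself.

```python
from collections import OrderedDict


class SoftwareManagedTLB:
    """Toy TLB: misses call a software handler, and the LRU entry is replaced."""

    def __init__(self, entries=64, page_size=8 * 1024):
        self.capacity = entries
        self.page_size = page_size
        # Virtual page number -> physical page number, least recently used first.
        self.map = OrderedDict()
        self.misses = 0

    def translate(self, vaddr, miss_handler):
        """Translate a virtual address, invoking miss_handler(vpn) on a miss."""
        vpn, offset = divmod(vaddr, self.page_size)
        if vpn in self.map:
            self.map.move_to_end(vpn)         # mark as most recently used
        else:
            self.misses += 1
            ppn = miss_handler(vpn)           # stands in for the trap to software
            if len(self.map) >= self.capacity:
                self.map.popitem(last=False)  # evict the least recently used entry
            self.map[vpn] = ppn
        return self.map[vpn] * self.page_size + offset
```

A handler as simple as `lambda vpn: vpn + 100` is enough to exercise the model; a real routine would walk whatever page table structure the operating system maintains.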
Some products in the 29000 family provided only 16 TLB entries to be able to dedicate part of the silicon to peripherals. To compensate, the maximum page size employed by a mapping was increased from 8 KB to 16 MB.
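The effect of this trade-off can be checked with simple arithmetic on the amount of memory a TLB can map at once. The sketch below assumes that every entry uses the maximum page size quoted above; the function name is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, page_size_bytes):
    """Total memory covered when every TLB entry maps one page of the given size."""
    return entries * page_size_bytes


full_tlb = tlb_reach(64, 8 * KB)      # 64 entries with 8 KB pages
reduced_tlb = tlb_reach(16, 16 * MB)  # 16 entries with 16 MB pages
```

Under that assumption the 64-entry TLB covers 512 KB and the 16-entry TLB covers 256 MB, so the larger pages more than make up for the smaller number of entries.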
Versions
The first Am29000 was released in 1988. It included a built-in MMU, but floating point support was offloaded to the Am29027 FPU. Units with a failed MMU or Branch Target Cache were sold as the Am29005.
In 1991 the line was extended with the Am29030 and Am29035, which included 8 KB and 4 KB of instruction cache, respectively. By then the Am29050 had also become available, without on-chip cache but featuring a floating-point unit with fully pipelined multiply–accumulate operations, a larger 1 KB Branch Target Cache with a claimed 80% hit rate, and better-pipelined load operations sped up by a 4-entry TLB-like Physical Address Cache. Though it is not a superscalar processor, it permits a floating-point operation and an integer operation to complete in the same cycle. The integer and floating-point sides each have their own write port to the registers.
It contained 428,000 transistors on a 1-micron process with a 0.8-micron effective channel length and was available at 20, 25, 33, and 40 MHz. Later the Am29040 was released at 33, 40, and 50 MHz. It was similar to the Am29030 but added a 4 KB data cache, a multiplication unit, and a few other enhancements. The 119 mm² Am29040 contained 1.2 million transistors on a 0.7-micron process.
A superscalar version of the 29K was being designed, but was canceled in favor of x86. It was codenamed ''Jaguar'', and was described in November 1994 and August 1995. It was an advanced design, capable of four-way dispatch into six reservation stations and speculative out-of-order execution of instructions, with four-way retire. The register file permitted four reads and two writes at once. The caches for instructions and data were 8 KB each. Loads from cache could bypass stores. It had no on-chip FPU because of cost and the target market. It was expected to attain a 100 MHz clock frequency on a 0.4-micron process.
AMD used the unreleased 29K microarchitecture as the basis of the K5 series of x86-compatible processors. The ALUs were carried over, as was the re-order buffer with a slight modification. The FPU was taken from the 29050, but extended to 80-bit precision. The K5 translated the x86 instructions into "RISC-OPs" upon decoding, aided by the predecode information held with the cached instructions. AMD claimed that the superscalar 29K would have only slightly lower performance than the K5, but much lower cost due to the size difference.
The Honeywell 29KII is a CPU based on the AMD 29050, and it was extensively used in real-time avionics.
Die photographs: Am29000, Am29030, Am29040, Am29050.
Products and applications
Positioned as a product for "medium- to high-performance embedded applications" with potential for use in Unix workstations, the 29000 was used in a variety of products such as X terminals, laser printer controller cards, graphics accelerator cards, optical character recognition solutions, and network bridges. The memory architecture of the 29000 was a particular attraction for product designers, allowing them to forgo external cache memory and to employ dynamic RAM directly while maintaining acceptable performance, permitting a degree of flexibility in the choice of memory technologies used to retain program instructions and data.
The 29k saw some use as a computational accelerator or coprocessor, particularly on the Macintosh and IBM PC-compatible platforms. For instance, Yarc Systems Corporation produced 29k-based "RISC coprocessor" cards for Macintosh II and PC AT systems, alongside other "CISC coprocessor" cards featuring Motorola 68020 and 68030 processors, and "parallel coprocessor" cards featuring T800 transputer processors. Its ''NuSuper'' (originally named the ''McCray'') and ''AT-Super'' cards, employing the Am29000 CPU and Am29027 floating-point accelerator, were followed by the ''MacRageous'', upgrading the CPU to the Am29050.
Such accelerator cards offered performance several times that of the Macintosh II itself and benchmarked competitively with RISC workstations such as the DECstation 3100. Multiple cards could also be fitted to a system. However, the cost of a Macintosh II system combined with such a card approached that of established RISC workstations running Unix.
The AT-Super was priced at around $4,600 and was reported as running Unix, competing with similar products employing Intel's i860 processor.
One notable product utilising the 29k was Apple's ''Macintosh Display Card 8·24 GC'' for its Macintosh IIfx, featuring a 30 MHz Am29000 processor, 64 KB static RAM cache, and 2 MB of video RAM, with the option of an additional 2 MB of dynamic RAM for use by the QuickDraw graphical toolkit. The inclusion of the 29k differentiated this particular version of the card from other versions sold by Apple, significantly improving performance when handling 24-bits-per-pixel images.
See also
* List of AMD Am2900 and Am29000 families