TeraScale is the codename for a family of graphics processing unit (GPU) microarchitectures developed by ATI Technologies/AMD. It is their second microarchitecture implementing the unified shader model, following Xenos. TeraScale replaced the old fixed-pipeline microarchitectures and competed directly with Nvidia's first unified shader microarchitecture, Tesla. TeraScale was used in the Radeon HD 2000 series (manufactured in 80 nm and 65 nm), HD 3000 (65 nm and 55 nm), HD 4000 (55 nm and 40 nm), and HD 5000 and HD 6000 (40 nm). TeraScale was also used in the AMD Accelerated Processing Units code-named "Brazos", "Llano", "Trinity" and "Richland", and is even found in some of the succeeding graphics card brands. TeraScale is a VLIW SIMD architecture, while Tesla is a RISC SIMD architecture, similar to TeraScale's successor Graphics Core Next. TeraScale implements HyperZ. An LLVM code generator (i.e. a compiler back-end) is available for TeraScale, but it seems to be missing from LLVM's matrix; Mesa 3D, for example, makes use of it.

TeraScale 1 (VLIW)

At SIGGRAPH 08 in December 2008, AMD employee Mike Houston described some of the TeraScale microarchitecture. At FOSDEM09, Matthias Hopf from AMD's technology partner SUSE Linux presented a slide regarding the programming of the open-source driver for the R600.

Unified shaders

Previous GPU architectures implemented fixed pipelines, i.e. there were distinct shader processors for each type of shader. TeraScale leverages many flexible shader processors which can be scheduled to process a variety of shader types, thereby significantly increasing GPU throughput (dependent on the application's instruction mix, as noted below). The R600 core processes vertex, geometry, and pixel shaders as outlined by the Direct3D 10.0 specification for Shader Model 4.0, in addition to full OpenGL 3.0 support (AMD OpenGL 3.0 driver release on Jan 28, 2009).

The new unified shader functionality is based upon a very long instruction word (VLIW) architecture in which the core executes operations in parallel (Wasson, Scott, "AMD Radeon HD 2900 XT graphics processor: R600 revealed", Tech Report, May 14, 2007). A shader cluster is organized into 5 stream processing units. Each stream processing unit can retire one finished single-precision floating-point MAD (or ADD or MUL) instruction per clock, as well as dot product (DP, special-cased by combining ALUs) and integer ADD (Beyond3D review, "AMD R600 Architecture and GPU Analysis", retrieved June 2, 2007). The 5th unit is more complex and can additionally handle special transcendental functions such as sine and cosine. Each shader cluster can execute 6 instructions per clock cycle (peak), consisting of 5 shading instructions plus 1 branch.

Notably, the VLIW architecture brings with it some classic challenges inherent to VLIW designs, namely that of maintaining optimal instruction flow. Additionally, the chip cannot co-issue instructions when one is dependent on the results of the other. Performance of the GPU is highly dependent on the mixture of instructions being used by the application and on how well the real-time compiler in the driver can organize said instructions. The R600 core includes 64 shader clusters, while the RV610 and RV630 cores have 8 and 24 shader clusters respectively.
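The co-issue restriction can be illustrated with a toy scheduler. The Python sketch below is not AMD's shader compiler: the operation format, the names `pack_bundles` and `utilization`, and the greedy in-order strategy are all assumptions made for illustration, and only read-after-write dependencies are modelled. It packs operations into bundles of up to five slots and starts a new bundle whenever an operation reads a result produced in the current one.

```python
from typing import List, Set, Tuple

# Each operation is (destination register, source registers).
# This format is hypothetical and is not the R600 instruction encoding.
Op = Tuple[str, Tuple[str, ...]]

SLOTS_PER_BUNDLE = 5  # five stream processing units per shader cluster


def pack_bundles(ops: List[Op], slots: int = SLOTS_PER_BUNDLE) -> List[List[Op]]:
    """Greedily pack in-order operations into VLIW bundles.

    An operation cannot share a bundle with an earlier operation whose
    result it reads, because both would issue in the same clock.
    """
    bundles: List[List[Op]] = []
    current: List[Op] = []
    written: Set[str] = set()  # registers written inside the current bundle
    for dest, srcs in ops:
        depends = any(src in written for src in srcs)
        if depends or len(current) == slots:
            # Close the current bundle and start a new one.
            bundles.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles


def utilization(bundles: List[List[Op]], slots: int = SLOTS_PER_BUNDLE) -> float:
    """Fraction of issue slots that hold an operation."""
    if not bundles:
        return 0.0
    used = sum(len(bundle) for bundle in bundles)
    return used / (slots * len(bundles))
```

In this model, five independent operations fill a single bundle, whereas a chain of three operations that each read the previous result needs three bundles with one slot used in each, a utilization of 0.2. This is the sense in which throughput depends on the instruction mix and on how well the compiler can find independent work.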

Hardware tessellation

TeraScale includes multiple units capable of carrying out tessellation. These are similar to the programmable units of the Xenos GPU, which is used in the Xbox 360. Tessellation was officially specified in the major APIs starting with DirectX 11 and OpenGL 4. TeraScale 1 based GPUs (HD 2000, 3000 and 4000 series) conform only to Direct3D 10 and OpenGL 3.3 and therefore implement a different tessellation principle that uses vendor-specific API extensions. The TeraScale 2 based GPUs (starting with the Radeon HD 5000 series) were the first to conform to both the Direct3D 11 and OpenGL 4.0 tessellation techniques. Although the TeraScale 1 tessellator is simpler in design, it is described by AMD as a subset of the later tessellation standard.

The TeraScale tessellator units allow developers to take a simple polygon mesh and subdivide it using a curved surface evaluation function. There are different tessellation forms, such as Bézier surfaces with N-patches, B-splines and NURBS, and also some subdivision techniques of the surface, which usually include displacement mapping with some kind of texture (ExtremeTech review). Essentially, this allows a simple, low-polygon model to be increased dramatically in polygon density in real time with very small impact on performance. Scott Wasson of Tech Report noted during an AMD demo that the resulting model was so dense with millions of polygons that it appeared to be solid.

The TeraScale tessellator is reminiscent of ATI TruForm, the brand name of an early hardware tessellation unit used initially in the Radeon 8500. TruForm received little attention from software developers. A few games (such as Madden NFL 2004, Serious Sam, Unreal Tournament 2003 and 2004, and unofficially Morrowind) included support for ATI's tessellation technology. Such slow adoption has to do with the fact that it was not a feature shared with NVIDIA GPUs, since those had implemented a competing tessellation solution using Quintic-RT patches, which had achieved even less support from the major game developers. Since the Xbox 360's GPU is based on ATI's architecture, Microsoft saw hardware-accelerated surface tessellation as a major GPU feature. A couple of years later, the tessellation feature became mandatory with the release of DirectX 11 in 2009.

While the tessellation principle introduced with TeraScale was not part of the OpenGL 3.3 or Direct3D 10.0 requirements, and competitors such as the GeForce 8 series lacked similar hardware, Microsoft added the tessellation feature as part of its DirectX 10.1 future plans ("The Future of DirectX" presentation, slides 24-29). Finally, Microsoft introduced tessellation as a required capability not with DirectX 10.1 but with DirectX 11. The GCN geometric processor is the most current solution from AMD (which acquired ATI's GPU business) for carrying out tessellation using the GPU.
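The idea of subdividing a simple mesh and passing the new vertices through a surface evaluation function can be sketched in a few lines. The following Python example is a software illustration only and does not reproduce the subdivision pattern of the TeraScale hardware tessellator; the function names `tessellate_triangle` and `project_to_unit_sphere`, and the choice of a sphere as the curved surface, are assumptions. It splits one triangle uniformly so that each edge has `level` segments, which produces `level**2` triangles.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def project_to_unit_sphere(p: Vec3) -> Vec3:
    """Example surface evaluation function: push a point onto the unit sphere."""
    length = math.sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2])
    return (p[0] / length, p[1] / length, p[2] / length)


def tessellate_triangle(
    p0: Vec3,
    p1: Vec3,
    p2: Vec3,
    level: int,
    evaluate: Optional[Callable[[Vec3], Vec3]] = None,
) -> Tuple[List[Vec3], List[Tuple[int, int, int]]]:
    """Uniformly subdivide one triangle into level**2 smaller triangles.

    Returns (vertices, triangles); each triangle is a triple of vertex indices.
    """
    if level < 1:
        raise ValueError("level must be at least 1")
    index: Dict[Tuple[int, int], int] = {}
    vertices: List[Vec3] = []
    # Generate vertices on a regular barycentric grid.
    for i in range(level + 1):
        for j in range(level + 1 - i):
            u, v = i / level, j / level
            w = 1.0 - u - v
            point = tuple(w * a + u * b + v * c for a, b, c in zip(p0, p1, p2))
            if evaluate is not None:
                point = evaluate(point)  # move the vertex onto the curved surface
            index[(i, j)] = len(vertices)
            vertices.append(point)
    # Connect neighbouring grid points into triangles.
    triangles: List[Tuple[int, int, int]] = []
    for i in range(level):
        for j in range(level - i):
            triangles.append((index[(i, j)], index[(i + 1, j)], index[(i, j + 1)]))
            if j < level - i - 1:
                triangles.append(
                    (index[(i + 1, j)], index[(i + 1, j + 1)], index[(i, j + 1)])
                )
    return vertices, triangles
```

Because the triangle count grows with the square of the subdivision level, a coarse input mesh can become very dense after only a few levels, which is why performing this step on dedicated hardware rather than on the CPU is attractive.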

Ultra-threaded dispatch processor

Although the R600 is a significant departure from previous designs, it still shares many features with its predecessor, the Radeon R520. The Ultra-Threaded Dispatch Processor is a major architectural component of the R600 core, just as it was with the Radeon X1000 GPUs. This processor manages a large number of in-flight threads of three distinct types (vertex, geometry, and pixel shaders) and switches among them as needed. With a large number of threads being managed simultaneously, it is possible to reorganize thread order to utilize the shaders optimally. In other words, the dispatch processor evaluates what goes on in the other parts of the R600 and attempts to keep processing efficiency as high as possible. There are lower levels of management as well: each SIMD array of 80 stream processors has its own sequencer and arbiter. The arbiter decides which thread to process next, while the sequencer attempts to reorder instructions for the best possible performance within each thread.
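Why keeping many threads in flight helps utilization can be shown with a very small simulation. The model below is an assumption made for illustration and is not a description of the R600 arbiter: every thread alternates between a burst of ALU work and a fetch that stalls it for a fixed number of cycles, and a hypothetical arbiter simply issues the first thread that is ready. The names `alu_utilization`, `alu_cycles` and `fetch_latency` are invented for this sketch.

```python
def alu_utilization(
    num_threads: int, alu_cycles: int, fetch_latency: int, total_cycles: int
) -> float:
    """Return the fraction of cycles in which some thread issues ALU work.

    Each thread runs alu_cycles of ALU work, then waits fetch_latency cycles
    before it becomes ready again (a stand-in for a memory or texture fetch).
    """
    ready_at = [0] * num_threads            # first cycle each thread may issue
    remaining = [alu_cycles] * num_threads  # ALU cycles left in the current burst
    busy = 0
    for cycle in range(total_cycles):
        # Hypothetical arbiter: pick the first thread that is ready this cycle.
        for thread in range(num_threads):
            if ready_at[thread] <= cycle:
                busy += 1
                remaining[thread] -= 1
                if remaining[thread] == 0:
                    # Burst finished: the thread stalls on its fetch.
                    remaining[thread] = alu_cycles
                    ready_at[thread] = cycle + 1 + fetch_latency
                break
    return busy / total_cycles
```

With a single thread, 4 ALU cycles per burst and a 12-cycle fetch, the ALU in this model is busy a quarter of the time. With four such threads the stalls of one thread are covered by the work of the others and the ALU is busy in every cycle.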

Texturing and anti-aliasing

Texturing and final output aboard the R600 core is similar to, but also distinct from, R580. R600 is equipped with 4 texture units that are decoupled (independent) from the shader core, as in the R520 and R580 GPUs. The render output units (ROPs) of the Radeon HD 2000 series now perform multisample anti-aliasing (MSAA) with programmable sample grids and a maximum of 8 sample points, instead of using pixel shaders as in the Radeon X1000 series. Also new is the capability to filter FP16 textures, popular with HDR lighting, at full speed. The ROPs can also perform trilinear and anisotropic filtering on all texture formats. On R600, this totals 16 pixels per clock for FP16 textures, while higher-precision FP32 textures filter at half speed (8 pixels per clock).

Anti-aliasing capabilities are more robust on R600 than on the R520 series. In addition to the ability to perform 8× MSAA, up from 6× MSAA on the R300 through R580, R600 has a new custom filter anti-aliasing (CFAA) mode. CFAA refers to an implementation of non-box filters that look at pixels around the particular pixel being processed in order to calculate the final color and anti-alias the image. CFAA is performed by the shaders instead of in the ROPs. This brings greatly enhanced programmability because the filters can be customized, but may also bring potential performance issues because of the use of shader resources. As of the launch of R600, CFAA utilizes wide and narrow tent filters. With these, samples from outside the pixel being processed are weighted linearly based upon their distance from the centroid of that pixel, with the linear function adjusted based on the wide or narrow filter chosen.
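A tent filter of this kind can be written down directly. The Python sketch below is an illustration under stated assumptions, not AMD's CFAA implementation: the exact falloff, the radii that correspond to the "wide" and "narrow" modes, and the names `tent_weight` and `resolve_pixel` are hypothetical. It weights each sample linearly by its distance from the pixel centroid and returns the normalized weighted average.

```python
import math
from typing import List, Tuple

# A sample is ((x, y) position, scalar color value).
Sample = Tuple[Tuple[float, float], float]


def tent_weight(distance: float, radius: float) -> float:
    """Linear falloff: 1 at the centroid, 0 at or beyond the filter radius."""
    return max(0.0, 1.0 - distance / radius)


def resolve_pixel(samples: List[Sample], center: Tuple[float, float], radius: float) -> float:
    """Weighted average of samples, including samples from neighbouring pixels.

    A larger radius (a "wide" tent) lets neighbouring samples contribute more;
    a smaller radius (a "narrow" tent) stays closer to the pixel's own samples.
    """
    total_weight = 0.0
    accumulated = 0.0
    for (x, y), color in samples:
        distance = math.hypot(x - center[0], y - center[1])
        weight = tent_weight(distance, radius)
        accumulated += weight * color
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no sample lies inside the filter radius")
    return accumulated / total_weight
```

For example, with one sample of value 1.0 at the centroid and one sample of value 0.0 one pixel away, a radius of 2.0 gives weights 1.0 and 0.5 and a resolved value of two thirds, while a radius of 1.0 excludes the neighbour and returns 1.0.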

Memory controllers

Memory controllers are connected via an internal bi-directional ring bus wrapped around the processor. In the Radeon HD 2900, it is a 1,024-bit bi-directional ring bus (512-bit read and 512-bit write), with 8 64-bit memory channels for a total bus width of 512 bits on the 2900 XT; in the Radeon HD 3800, it is a 512-bit ring bus; in the Radeon HD 2600 and HD 3600, it is a 256-bit ring bus; in the Radeon HD 2400 and HD 3400, there is no ring bus.

Half-generation update

The series saw a half-generation update with die-shrink (55 nm) variants: RV670, RV635 and RV620. All variants support PCI Express 2.0, DirectX 10.1 with Shader Model 4.1 features, a dedicated ATI Unified Video Decoder (UVD) for all models, and PowerPlay technology for desktop video cards. Except for the Radeon HD 3800 series, all variants supported 2 integrated DisplayPort outputs, supporting 24- and 30-bit displays for resolutions up to 2,560×1,600. Each output included 1, 2, or 4 lanes, with a data rate of up to 2.7 Gbit/s per lane. ATI claimed that support for DirectX 10.1 can bring improved performance and processing efficiency with reduced rounding error (0.5 ULP, compared with an average error of 1.0 ULP as tolerable error), better image details and quality, global illumination (a technique used in animated films), and more improvements to consumer gaming systems, therefore giving a more realistic gaming experience.

Video cards

* Radeon HD 2000 series
* Radeon HD 3000 series
* Radeon HD 4000 series

(see list of chips in those pages)

TeraScale 2 (VLIW5)

TeraScale 2 (VLIW5) was introduced with the Radeon HD 5000 series GPUs in the "Evergreen" generation. At HPG10, Mark Fowler presented "Evergreen" and stated that, for example, the 5870 (Cypress), 5770 (Juniper) and 5670 (Redwood) support a maximum resolution of 6 times 2560×1600 pixels, while the 5470 (Cedar) supports 4 times 2560×1600 pixels, which is important for AMD Eyefinity multi-monitor support.

With the release of Cypress, the TeraScale graphics engine architecture was upgraded with twice the number of stream cores, texture units and ROP units compared to the RV770. The architecture of the stream cores is largely unchanged, but adds support for DirectX 11/DirectCompute 11 capabilities with new instructions ("DirectX 11 in the Open: ATI Radeon HD 5870 Review"). Also similar to RV770, four texture units are tied to 16 stream cores (each has five processing elements, making a total of 80 processing elements). This combination is referred to as a SIMD core.

Unlike the predecessor Radeon R700, as DirectX 11 mandates full developer control over interpolation, dedicated interpolators were removed, relying instead on the SIMD cores. The stream cores can handle the higher-rounding-precision fused multiply–add (FMA) instruction in both single and double precision, which increases precision over multiply–add (MAD) and is compliant with the IEEE 754-2008 standard ("Report: AMD Radeon HD 5870 and 5850"). The sum of absolute differences (SAD) instruction has been natively added to the processors. This instruction can be used to greatly improve the performance of some processes, such as video encoding and transcoding on the 3D engine. Each SIMD core is equipped with 32 KiB of local data share and 8 KiB of L1 cache, while all SIMD cores share a 64 KiB global data share.
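The relevance of SAD to video encoding can be shown with a short example. The Python sketch below defines the operation in software and uses it for a simple block-matching search of the kind found in motion estimation; it says nothing about how the hardware instruction is encoded or how wide it is, and the helper names `sad` and `best_match` are assumptions made for illustration.

```python
from typing import List, Sequence


def sad(block_a: Sequence[int], block_b: Sequence[int]) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_match(reference: Sequence[int], candidates: List[Sequence[int]]) -> int:
    """Index of the candidate block most similar to the reference block.

    A lower SAD means a closer match; an encoder repeats this comparison for
    many candidate positions, which is why a native instruction helps.
    """
    return min(range(len(candidates)), key=lambda i: sad(reference, candidates[i]))
```

For the blocks `[10, 20, 30, 40]` and `[12, 18, 30, 45]` the differences are 2, 2, 0 and 5, so the SAD is 9.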

Memory controller

Each memory controller ties to two quad ROPs, one per 64-bit channel, and a dedicated 512 KiB L2 cache.

Power saving

AMD PowerPlay is supported; see the AMD PowerPlay article for details.

Chips

* Evergreen chips:
** Cedar RV810
** Cypress RV870
** Hemlock R800
** Juniper RV840
** Redwood RV830
* Northern Islands chips:
** Barts RV940
** Caicos RV910
** Turks RV930
* APUs that include a TeraScale 2 IGP:
** Llano
** Ontario
** Zacate

TeraScale 3 (VLIW4)

TeraScale 3 (VLIW4) replaces the previous 5-way VLIW designs with a 4-way VLIW design. The new design also incorporates an additional tessellation unit to improve Direct3D 11 performance. TeraScale 3 was introduced in the Radeon HD 6900-branded graphics cards and is also implemented in the Trinity and Richland APUs.

Power saving

AMD PowerTune, dynamic frequency scaling for GPUs, was introduced with the Radeon HD 6900 series on December 15, 2010 and has seen continued development, as documented in some reviews by AnandTech.

Chips

* Northern Islands chips:
** Cayman RV970
** Antilles R900
* Trinity and Richland include a TeraScale 3 IGP

Successor

At HPG11 in August 2011, AMD employees Michael Mantor (Senior Fellow Architect) and Mike Houston (Fellow Architect) presented Graphics Core Next, the microarchitecture succeeding TeraScale.

Comparison of TeraScale chips

¹ Duo chips such as R680 (2x RV670) and R700 (2x RV770) are not listed.

