RDNA 3 is a GPU microarchitecture designed by

AMD Advanced Micro Devices, Inc. (AMD) is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas. AMD is a hardware and fabless company that de ...

, released with the Radeon RX 7000 series on December 13, 2022. Alongside powering the RX 7000 series, RDNA 3 is also featured in the SoCs designed by AMD for the Asus ROG Ally, Lenovo Legion Go, and the PlayStation 5 Pro consoles.

Background

On June 9, 2022, AMD held their Financial Analyst Day where they presented a client GPU roadmap which contained mention of RDNA 3 coming in 2022 and RDNA 4 coming in 2024. AMD announced to investors their intention to achieve a performance-per-watt uplift of over 50% with RDNA 3 and that the upcoming architecture would be built using chiplet packaging on a 5 nm process. A sneak preview for RDNA 3 was included towards the end of AMD's Ryzen 7000 unveiling event on August 29, 2022. The preview included RDNA 3 running gameplay of '' Lies of P'', AMD CEO Lisa Su confirming that a chiplet design would be used, and a partial look at AMD's reference design for an RDNA 3 GPU. Full details for the RDNA 3 architecture were unveiled on November 3, 2022 at an event in

Las Vegas Las Vegas, colloquially referred to as Vegas, is the most populous city in the U.S. state of Nevada and the county seat of Clark County. The Las Vegas Valley metropolitan area is the largest within the greater Mojave Desert, and second-l ...

Architecture

Chiplet packaging

For the first time ever in a consumer GPU, RDNA 3 utilizes modular chiplets rather than a single large monolithic die. AMD previously had great success with its use of chiplets in its

Ryzen Ryzen ( ) is a brand of multi-core x86-64 microprocessors, designed and marketed by AMD for desktop, mobile, server, and embedded platforms, based on the Zen microarchitecture. It consists of central processing units (CPUs) marketed for mai ...

desktop and

Epyc Epyc (stylized as EPYC) is a brand of multi-core x86-64 microprocessors designed and sold by AMD, based on the company's Zen microarchitecture. Introduced in June 2017, they are specifically targeted for the server and embedded system market ...

server processors. The decision to move to a chiplet-based GPU microarchitecture was led by AMD Senior Vice President Sam Naffziger who had also lead the chiplet initiative with Ryzen and Epyc. The development of RDNA 3's chiplet architecture began towards the end of 2017 with Naffziger leading the AMD graphics team in the effort. The benefit of using chiplets is that dies can be fabricated on different process nodes depending on their functions and intended purpose. According to Naffziger, cache and SRAM do not scale as linearly as logic does on advanced nodes like N5 in terms of density and power consumption so they can instead be fabricated on the cheaper, more mature N6 node. The use of smaller dies rather than one large monolithic die is beneficial for maximizing wafer yields as more dies can be fitted onto a single wafer. Alternatively, a large monolithic RDNA 3 die built on N5 would be more expensive to produce with lower yields. RDNA 3 uses two types of chiplets: the Graphics Compute Die (GCD) and Memory Cache Dies (MCDs). On Ryzen and Epyc processors, AMD used its

PCIe PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe, is a high-speed standard used to connect hardware components inside computers. It is designed to replace older expansion bus standards such as Peripher ...

-based Infinity Fabric protocol with the package's dies connected via traces on an organic substrate. This approach is easily scalable in a cost-effective manner but has the drawbacks of increased latency, increased power consumption when moving data between dies at around 1.5 picojoules per bit, and it cannot achieve the connection density needed for high-bandwidth GPUs. An organic package could not host the number of wires that would be needed to connect multiple dies in a GPU. RDNA 3's dies are instead connected using

TSMC Taiwan Semiconductor Manufacturing Company Limited (TSMC or Taiwan Semiconductor) is a Taiwanese multinational semiconductor contract manufacturing and design company. It is one of the world's most valuable semiconductor companies, the world' ...

's Integrated Fan-Out Re-Distribution Layer (InFO-RDL) packaging technique which provides a silicon bridge for high bandwidth and high density die-to-die communication. InFO allows dies to be connected without the use of a more costly silicon interposer such as the one used in AMD's Instinct MI200 and MI300 datacenter accelerators. Each Infinity Fanout link has 9.2 Gbps in bandwidth. Naffziger explains that "The bandwidth density that we achieve is almost 10x" with the Infinity Fanout rather than the wires used by Ryzen and Epyc processors. The chiplet interconnects in RDNA achieve cumulative bandwidth of 5.3TB/s.

Memory Cache Dies (MCDs)

With a respective 2.05 billion transistors, each Memory Cache Die (MCD) contains 16MB of L3 cache. Theoretically, additional L3 cache could be added to the MCDs via AMD's 3D V-Cache die stacking technology as the MCDs contain unused TSV connection points. Also present on each MCD are two physical 32-bit GDDR6 memory interfaces for a combined 64-bit interface per MCD. The Radeon RX 7900 XTX has a 384-bit memory bus through the use of six MCDs while the RX 7900 XT has a 320-bit bus due to its five MCDs.

Graphics Compute Die (GCD)

Compute Units

RDNA 3's Compute Units (CUs) for graphics processing are organized in dual CU Work Group Processors (WGPs). Rather than including a very large number of WGPs in RDNA 3 GPUs, AMD instead focused on improving per-WGP throughput. This is done with improved dual-issue shader ALUs with the ability to execute two instructions per cycle. It can contain up to 96 graphics Compute Units that can provide up to 61 TFLOPS of compute. While RDNA 3 doesn't include dedicated execution units for AI acceleration like the Matrix Cores found in AMD's compute-focused

CDNA In genetics, complementary DNA (cDNA) is DNA that was reverse transcribed (via reverse transcriptase) from an RNA (e.g., messenger RNA or microRNA). cDNA exists in both single-stranded and double-stranded forms and in both natural and engin ...

architectures, the efficiency of running inference tasks on

FP16 In computing, half precision (sometimes called FP16 or float16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory. It is intended for storage of floating-point values in a ...

execution resources is improved with Wave MMA (

matrix Matrix (: matrices or matrixes) or MATRIX may refer to: Science and mathematics * Matrix (mathematics), a rectangular array of numbers, symbols or expressions * Matrix (logic), part of a formula in prenex normal form * Matrix (biology), the m ...

multiply–accumulate) instructions. This results in increased inference performance compared to RDNA 2. WMMA supports FP16, BF16, INT8, and INT4 data types. ''

Tom's Hardware ''Tom's Hardware'' is an online publication owned by Future plc and focused on technology. It was founded in 1996 by Thomas Pabst. It provides articles, news, price comparisons, videos and reviews on computer hardware and high technology. The s ...

'' found that AMD's fastest RDNA 3 GPU, the RX 7900 XTX, was capable of generating 26 images per minute with

Stable Diffusion Stable Diffusion is a deep learning, text-to-image model released in 2022 based on Diffusion model, diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of ...

, compared to only 6.6 images per minute of the RX 6950 XT, the fastest RDNA 2 GPU.

Ray tracing

RDNA 3 features second generation ray-tracing accelerators. Each Compute Unit contains one ray tracing accelerator. The overall number of ray tracing accelerators is increased due to the higher number of Compute Units, though the number of ray tracing accelerators per Compute Unit has not increased over RDNA 2.

Clock speeds

RDNA 3 was designed to support high clock speeds. On RDNA 3, clock speeds have been decoupled with the front end operating at a 2.5GHz frequency while the shaders operate at 2.3GHz. The shaders operating at a lower clock speed gives up to 25% power savings according to AMD and RDNA 3's shader clock speed is still 15% faster than RDNA 2.

Cache and memory subsystem

RDNA 3 increased the capacity of L1 and L2 caches. The 16-way associative L1 cache shared across a shader array is doubled in RDNA 3 to 256KB. The L2 cache increased from 4MB on RDNA 2 to 6MB on RDNA 3. The L3 Infinity Cache has been lowered in capacity from 128MB to 96MB and latency has increased as it is physically present on the MCDs rather than being closer to the WGPs within the GCD. The Infinity Cache capacity was decreased due to RDNA 3 having wider a memory interface up to 384-bit whereas RDNA 2 used memory interfaces up to 256-bit. RDNA 3 having a wider 384-bit memory means that its cache hitrate does not have to be as high to still avoid bandwidth bottlenecks as there is higher memory bandwidth. RDNA 3 GPUs use

GDDR6 Graphics Double Data Rate 6 Synchronous Dynamic Random-Access Memory (GDDR6 SDRAM) is a type of synchronous graphics random-access memory (SGRAM) with a high bandwidth, "double data rate" interface, designed for use in graphics cards, game con ...

memory rather than faster

GDDR6X Graphics Double Data Rate 6 Synchronous Dynamic Random-Access Memory (GDDR6 SDRAM) is a type of Synchronous dynamic random-access memory#Synchronous graphics RAM .28SGRAM.29, synchronous graphics random-access memory (SGRAM) with a high Bandwidth ...

due to the latter's increased power consumption.

Media engine

RDNA 3 is the first RDNA architecture to have a dedicated media engine. It is built into the GCD and is based on VCN 4.0 encoding and decoding core. AMD's AMF AV1 encoder is comparable in quality to Nvidia's NVENC AV1 encoder but can handle a higher number of simultaneous encoding streams compared to the limit of 3 on the GeForce RTX 40 series.

Display engine

RDNA 3 GPUs feature a new display engine called the "Radiance Display Engine". AMD touted its support for DisplayPort 2.1 UHBR 13.5, delivering up to 54Gbps bandwidth for high refresh rates at 4K and 8K resolutions. The Radeon Pro W7900 and W7800 support the 80Gbps UHBR20 standard. DisplayPort 2.1 can support 4K at 480Hz and 8K at 165Hz with

Display Stream Compression Display Stream Compression (DSC) is a VESA-developed video compression algorithm designed to enable increased display resolutions and frame rates over existing physical interfaces, and make devices smaller and lighter, with longer battery life. ...

(DSC). The previous DisplayPort 1.4 standard with DSC was limited to 4K at 240Hz and 8K at 60Hz.

Power efficiency

AMD claims that RDNA 3 achieves a 54% increase in performance-per-watt which is in line with their previous claims of 50% performance-per-watt increases for both RDNA and RDNA 2.

Navi 3x dies

Products

Gaming

Desktop

Mobile

Workstation

Desktop workstation

Integrated graphics processing units (iGPUs)

References

{{AMD graphics AMD microarchitectures Computer-related introductions in 2022 Graphics microarchitectures