HOME

TheInfoList



OR:

The ARM Cortex-A78 is a
central processing unit A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...
implementing the ARMv8.2-A 64-bit
instruction set In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...


Design

The ARM Cortex-A78 is the successor to the
ARM Cortex-A77 The ARM Cortex-A77 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set designed by ARM Holdings' Austin design centre. Released in 2019, ARM claimed an increase of 23% and 35% in integer and floating point performance ...
. It can be paired with the
ARM Cortex-X1 The ARM Cortex-X1 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set designed by ARM Holdings' Austin design centre as part of ARM's Cortex-X Custom (CXC) program. Design The Cortex-X1 design is based on the ARM ...
and/or
ARM Cortex-A55 The ARM Cortex-A55 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set designed by ARM Holdings' Cambridge design centre. The Cortex-A55 is a 2-wide decode in-order superscalar pipeline. Design The Cortex-A55 serves ...
CPUs in a
DynamIQ ARM big.LITTLE is a heterogeneous computing architecture developed by Arm Holdings, coupling relatively battery-saving and slower processor cores (''LITTLE'') with relatively more powerful and power-hungry ones (''big''). The intention is to cr ...
configuration to deliver both performance and efficiency. The processor also claims as much as 50% energy savings over its predecessor. The Cortex-A78 is a 4-wide decode out-of-order
superscalar A superscalar processor (or multiple-issue processor) is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single in ...
design with a 1.5K macro-OP (MOPs) cache. It can fetch 4 instructions and 6 Mops per cycle, and rename and dispatch 6 Mops, and 12 μops per cycle. The out-of-order window size is 160 entries and the backend has 13 execution ports with a pipeline depth of 14 stages, and the execution latencies consist of 10 stages. The processor is built on a standard Cortex-A roadmap and offers a 2.1 GHz (
5 nm In semiconductor manufacturing Semiconductor device fabrication is the process used to manufacture semiconductor devices, typically integrated circuits (ICs) such as microprocessors, microcontrollers, and memories (such as Random-access memor ...
) chipset which makes it better than its predecessor in the following ways: * 7% better performance * 4% lower power consumption * 5% smaller, meaning 15% more area serving for a quad-core cluster, extra
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal ...
, NPU There is also extended scalability with extra support from Dynamic Shared Unit for
DynamIQ ARM big.LITTLE is a heterogeneous computing architecture developed by Arm Holdings, coupling relatively battery-saving and slower processor cores (''LITTLE'') with relatively more powerful and power-hungry ones (''big''). The intention is to cr ...
on the chipset. A smaller 32 KB L1
cache Cache, caching, or caché may refer to: Science and technology * Cache (computing), a technique used in computer storage for easier data access * Cache (biology) or hoarding, a food storing behavior of animals * Cache (archaeology), artifacts p ...
from the 64 KB L1 cache configuration is optional. To offset this smaller L1 memory, the
branch predictor In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow ...
is better at covering irregular search patterns and is capable of following two taken branches per cycle, which results in fewer L1 cache misses and helps hide pipeline bubbles to keep the core well supplied. The pipeline is one cycle longer compared to the A77, which ensures that the A78 hits a
clock frequency Clock rate or clock speed in computing typically refers to the frequency at which the clock generator of a Microprocessor, processor can generate Clock signal, pulses used to Synchronization (computer science), synchronize the operations of it ...
target of around 3 GHz. The A78 is a 6 instruction per cycle design. ARM also introduced a second integer multiply unit in the execution unit and an additional load Address Generation Unit (AGU) to increase both the data load and bandwidth by 50%. Other optimizations of the chipset include fused instructions and efficiency improvements to instruction schedulers,
register renaming In computer architecture, register renaming is a technique that abstracts logical processor register, registers from physical registers. Every logical register has a set of physical registers associated with it. When a machine language instructio ...
structures, and the
re-order buffer A re-order buffer (ROB) is a hardware unit used in an extension to Tomasulo's algorithm to support out-of-order and speculative instruction execution. The extension forces instructions to be committed in-order. The buffer is a circular buffer ...
. L2 cache is available up to 512 KB and has double the bandwidth to maximize the performance, while the shared L3 cache is available up to 4 MB, double that of previous generations. A Dynamic Shared Unit (DSU) also allows for an 8 MB configuration with the
ARM Cortex-X1 The ARM Cortex-X1 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set designed by ARM Holdings' Austin design centre as part of ARM's Cortex-X Custom (CXC) program. Design The Cortex-X1 design is based on the ARM ...
.


Variants


Cortex-A78C

The Cortex-A78C is targeted for productivity and gaming applications, it increases the max core support from 4 to 8 cores and from 4MB to 8MB of L3 cache.


Cortex-A78AE

The Cortex-A78AE is targeted for security/safety applications.


Licensing

The Cortex-A78 is available as a SIP core to licensees whilst its design makes it suitable for integration with other SIP cores (e.g.
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal ...
,
display controller A video display controller (VDC), also called a display engine or display interface, is an integrated circuit which is the main component in a video-signal generator, a device responsible for the production of a TV video signal in a computing ...
,
DSP DSP may refer to: Computing * Digital signal processing, the mathematical manipulation of an information signal * Digital signal processor, a microprocessor designed for digital signal processing * Dynamic Reconfiguration port * Yamaha DSP-1 ...
,
image processor An image processor, also known as an image processing engine, image processing unit (IPU), or image signal processor (ISP), is a type of media processor or specialized digital signal processor (DSP) used for image processing, in digital cameras o ...
, etc.) into one
die Die, as a verb, refers to death, the cessation of life. Die may also refer to: Games * Die, singular of dice, small throwable objects used for producing random numbers Manufacturing * Die (integrated circuit), a rectangular piece of a semicondu ...
constituting a
system on a chip A system on a chip (SoC) is an integrated circuit that combines most or all key components of a computer or Electronics, electronic system onto a single microchip. Typically, an SoC includes a central processing unit (CPU) with computer memory, ...
(SoC).


Usage

The Cortex-A78 was first used in Samsung
Exynos The Samsung Exynos (stylized as SΛMSUNG Exynos), formerly Hummingbird (), is a series of ARM architecture, Arm-based System on a chip, system-on-chips developed by Samsung Electronics' System LSI division and manufactured by Samsung Foundry. I ...
2100 SoC, introduced in November and December 2020 respectively. The custom Kryo 680 Gold core used in the Snapdragon 888 SoC is based on the Cortex-A78 microarchitecture. The Cortex-A78 is also used in the
MediaTek MediaTek Inc. (), sometimes informally abbreviated as MTK, is a Taiwanese fabless semiconductor company that designs and manufactures a range of semiconductor products, providing chips for wireless communications, high-definition television, h ...
Dimensity 1200 and 8000 series. The device is also used in
Nvidia Nvidia Corporation ( ) is an American multinational corporation and technology company headquartered in Santa Clara, California, and incorporated in Delaware. Founded in 1993 by Jensen Huang (president and CEO), Chris Malachowsky, and Curti ...
's BlueField-3 and 3X DPUs, and in the HiSilicon Kirin 9000s, released in August 2023.


See also

*
ARM Cortex-X1 The ARM Cortex-X1 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set designed by ARM Holdings' Austin design centre as part of ARM's Cortex-X Custom (CXC) program. Design The Cortex-X1 design is based on the ARM ...
, related high-performance microarchitecture *
ARM Cortex-A77 The ARM Cortex-A77 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set designed by ARM Holdings' Austin design centre. Released in 2019, ARM claimed an increase of 23% and 35% in integer and floating point performance ...
, predecessor *
Comparison of ARMv8-A cores This is a comparison of ARM instruction set architecture application processor cores designed by Arm Holdings (ARM Cortex-A) and 3rd parties. It does not include ARM Cortex-R, ARM Cortex-M, or legacy ARM cores. ARMv7-A This is a table com ...
, ARMv8 family


References

{{Application ARM-based chips ARM processors