Zero-copy

"Zero-copy" describes computer operations in which the CPU does not perform the task of copying data from one memory area to another, or in which unnecessary data copies are avoided. This is frequently used to save CPU cycles and memory bandwidth in time-consuming tasks, such as transmitting a file at high speed over a network, thus improving the performance of programs (processes) executed by a computer.


Principle

Zero-copy programming techniques can be used when exchanging data within a user space process (e.g. between two or more threads), between two or more processes (see also producer–consumer problem), and when data has to be accessed, copied, or moved inside kernel space or between a user space process and the kernel space portions of an operating system (OS).

Usually, when a user space process has to execute system operations such as reading or writing data from or to a device (e.g. a disk or a NIC) through its high-level software interfaces, or moving data from one device to another, it has to perform one or more system calls, which are then executed in kernel space by the operating system. If data has to be copied or moved from a source to a destination that are both located inside kernel space (e.g. two files, or a file and a network card), then unnecessary data copies from kernel space to user space and back can be avoided by using special (zero-copy) system calls, which are available in recent versions of most popular operating systems.

Zero-copy versions of operating system elements, such as device drivers, file systems, and network protocol stacks, greatly increase the performance of certain application programs (which become processes when executed) and use system resources more efficiently. Performance is enhanced by allowing the CPU to move on to other tasks while data copying or processing proceeds in parallel in another part of the machine. Zero-copy operations also reduce the number of time-consuming context switches between user space and kernel space. System resources are used more efficiently because having a sophisticated CPU perform extensive data copying, a relatively simple task, is wasteful if other, simpler system components can do the copying.

As an example, reading a file and then sending it over a network the traditional way requires 2 extra data copies (1 to read from kernel space to user space, plus 1 to write from user space to kernel space) and 4 context switches per read/write cycle. Those extra data copies use the CPU. Sending that file by mapping it with mmap and issuing a series of write calls reduces the context switches to 2 per write call and avoids the 2 extra user-space data copies. Sending the same file via zero-copy reduces the context switches to 2 per sendfile call and eliminates all extra CPU data copies (both in user and in kernel space).

Zero-copy protocols are especially important for very high-speed networks in which the capacity of a network link approaches or exceeds the CPU's processing capacity. In such a case the CPU may spend nearly all of its time copying transferred data, and thus becomes a bottleneck that limits the communication rate to below the link's capacity. A rule of thumb used in the industry is that roughly one CPU clock cycle is needed to process one bit of incoming data.
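
The difference between the traditional read/write cycle and a sendfile call can be sketched in Python. This is only an illustration under stated assumptions, not a benchmark: it assumes a Unix-like system on which os.sendfile is available, it uses a local socket pair in place of a real network connection, and the names send_with_copies, send_zero_copy and demo are hypothetical helpers introduced here.

```python
import os
import socket
import tempfile


def send_with_copies(path, sock, chunk_size=65536):
    """Traditional loop: every chunk passes through a user space buffer."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)   # kernel space -> user space copy
            if not chunk:
                break
            sock.sendall(chunk)          # user space -> kernel space copy
            sent += len(chunk)
    return sent


def send_zero_copy(path, sock):
    """sendfile loop: the data never enters a user space buffer."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # os.sendfile(out_fd, in_fd, offset, count) may send fewer bytes
            # than requested, so keep calling it until the file is done.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return sent


def demo():
    """Send a small temporary file both ways and check what arrives."""
    payload = b"zero-copy demo\n" * 256   # small enough to fit the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        results = []
        for sender in (send_with_copies, send_zero_copy):
            a, b = socket.socketpair()
            with a, b:
                sender(path, a)
                a.shutdown(socket.SHUT_WR)   # signal end of data to the reader
                received = b""
                while True:
                    chunk = b.recv(65536)
                    if not chunk:
                        break
                    received += chunk
                results.append(received == payload)
        return results
    finally:
        os.unlink(path)
```

Both senders deliver the same bytes; what differs is where the copying happens. The payload is kept small on purpose so that a single-threaded demonstration does not block on a full socket buffer.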


Hardware implementations

An early implementation was IBM OS/360, where a program can instruct the channel subsystem to read blocks of data from one file or device into a buffer and write to another from the same buffer without moving the data.

Techniques for creating zero-copy software include the use of direct memory access (DMA)-based copying and memory-mapping through a memory management unit (MMU). These features require specific hardware support and usually involve particular memory alignment requirements.

A newer approach used by the Heterogeneous System Architecture (HSA) facilitates the passing of pointers between the CPU and the GPU and also other processors. This requires a unified address space for the CPU and the GPU.
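
From a program's point of view, memory mapping through the MMU is usually reached with a call such as mmap. The following sketch is an illustration only and is not tied to any of the hardware named above. It assumes Python's standard mmap module on a system with a unified page cache, and mapped_slice_demo is a hypothetical name. It shows that a slice of a mapped region refers to the same pages as the mapping instead of holding its own copy.

```python
import mmap
import tempfile


def mapped_slice_demo():
    """Map a file, take a slice of the mapping, and modify the mapping."""
    with tempfile.TemporaryFile() as f:
        f.write(b"header--payload-bytes")
        f.flush()
        # Length 0 maps the whole file; the default mapping is shared and writable.
        with mmap.mmap(f.fileno(), 0) as mapped:
            view = memoryview(mapped)
            payload = view[8:]            # slicing a memoryview copies no bytes
            mapped[8:15] = b"PAYLOAD"     # write through the mapping
            result = bytes(payload)       # explicit copy, only to return the value
            # Views must be released before the mapping can be closed.
            payload.release()
            view.release()
            mapped.flush()
        f.seek(0)
        on_disk = f.read()
    return result, on_disk
```

The slice sees the change made through the mapping, and the file itself reflects it, because all three refer to the same underlying pages.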


Program interfaces

Several operating systems support zero-copying of user data and file contents through specific APIs. Only a few well-known system calls and APIs available in the most popular operating systems are listed here.

Novell NetWare supports a form of zero-copy through Event Control Blocks (ECBs); see NCOPY.

The internal COPY command in some versions of DR-DOS since 1992 initiates this as well when COMMAND.COM detects that the files to be copied are stored on a NetWare file server; otherwise it falls back to normal file copying. The external MOVE command since DR DOS 6.0 (1991) and MS-DOS 6.0 (1993) internally performs a RENAME (causing only the directory entries to be modified in the file system instead of physically copying the file data) when the source and destination are located on the same logical volume.

The Linux kernel supports zero-copy through various system calls, such as:
* sendfile, sendfile64;
* splice;
* tee;
* vmsplice;
* process_vm_readv;
* process_vm_writev;
* copy_file_range;
* raw sockets with packet mmap or AF_XDP.

Some of them are specified in POSIX and thus also present in the BSD kernels or IBM AIX; some are unique to the Linux kernel API.

FreeBSD, NetBSD, OpenBSD, DragonFly BSD, etc. support zero-copy through at least these system calls:
* sendfile;
* write, writev + mmap when writing data to a network socket.

macOS should support zero-copy through the FreeBSD portion of the kernel, because it offers the same system calls (and its manual pages are still tagged BSD), such as:
* sendfile.

Oracle Solaris supports zero-copy through at least these system calls:
* sendfile;
* sendfilev;
* write, writev + mmap.

Microsoft Windows supports zero-copy through at least this API function:
* TransmitFile.

Java can support zero-copy through the transferTo() method of java.nio.channels.FileChannel if the underlying operating system also supports zero-copy.

RDMA (Remote Direct Memory Access) protocols rely heavily on zero-copy techniques.
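
As an illustration of one of the Linux calls listed above, the sketch below copies a file with copy_file_range through Python's os.copy_file_range wrapper. It makes several assumptions: a Linux system and a Python build that expose the call, and a fallback to shutil.copyfile when the call is missing or refused by the file system. The names copy_in_kernel, copy_file and demo_copy are hypothetical.

```python
import os
import shutil
import tempfile


def copy_in_kernel(src_path, dst_path):
    """Copy a file with copy_file_range; the data stays inside the kernel."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        copied = 0
        while remaining > 0:
            # The call may copy fewer bytes than requested, so loop until done.
            n = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
            if n == 0:
                break
            copied += n
            remaining -= n
        return copied


def copy_file(src_path, dst_path):
    """Try the in-kernel copy first; fall back to an ordinary copy."""
    try:
        return copy_in_kernel(src_path, dst_path)
    except (AttributeError, OSError):
        # Assumption: the call is unavailable or unsupported on this system.
        shutil.copyfile(src_path, dst_path)
        return os.path.getsize(dst_path)


def demo_copy():
    """Copy a temporary file and confirm that the contents match."""
    data = os.urandom(100_000)
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(data)
        copied = copy_file(src, dst)
        with open(dst, "rb") as f:
            return copied == len(data) and f.read() == data
```

The fallback keeps the example usable on systems without the call, at the cost of no longer guaranteeing an in-kernel copy there.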


See also

* AF_XDP
* Call by reference
* Device driver
* Embedded system
* InfiniBand
* Locality of reference
* NCOPY
* netsniff-ng
* Programmed input/output
* Sockets Direct Protocol
* Scatter/gather I/O

