
In
information security
Information security, sometimes shortened to InfoSec, is the practice of protecting information by mitigating information risks. It is part of information risk management. It typically involves preventing or reducing the probability of unauthori ...
and
programming, a buffer overflow, or buffer overrun, is an
anomaly
Anomaly may refer to:
Science
Natural
*Anomaly (natural sciences)
** Atmospheric anomaly
** Geophysical anomaly
Medical
* Congenital anomaly (birth defect), a disorder present at birth
** Physical anomaly, a deformation of an anatomical struct ...
whereby a
program
Program, programme, programmer, or programming may refer to:
Business and management
* Program management, the process of managing several related projects
* Time management
* Program, a part of planning
Arts and entertainment Audio
* Programm ...
, while writing
data
In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpret ...
to a
buffer
Buffer may refer to:
Science
* Buffer gas, an inert or nonflammable gas
* Buffer solution, a solution used to prevent changes in pH
* Buffering agent, the weak acid or base in a buffer solution
* Lysis buffer, in cell biology
* Metal ion buffer
* ...
, overruns the buffer's boundary and overwrites adjacent
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...
locations.
Buffers are areas of memory set aside to hold data, often while moving it from one section of a program to another, or between programs. Buffer overflows can often be triggered by malformed inputs; if one assumes all inputs will be smaller than a certain size and the buffer is created to be that size, then an anomalous transaction that produces more data could cause it to write past the end of the buffer. If this overwrites adjacent data or executable code, this may result in erratic program behavior, including memory access errors, incorrect results, and
crashes.
Exploiting the behavior of a buffer overflow is a well-known
security exploit
An exploit (from the English verb ''to exploit'', meaning "to use something to one’s own advantage") is a piece of software, a chunk of data, or a sequence of commands that takes advantage of a bug or vulnerability to cause unintended or unanti ...
. On many systems, the memory layout of a program, or the system as a whole, is well defined. By sending in data designed to cause a buffer overflow, it is possible to write into areas known to hold
executable code
In computing, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a data file ...
and replace it with
malicious code
Malware (a portmanteau for ''malicious software'') is any software intentionally designed to cause disruption to a computer, server, client, or computer network, leak private information, gain unauthorized access to information or systems, depr ...
, or to selectively overwrite data pertaining to the program's state, therefore causing behavior that was not intended by the original programmer. Buffers are widespread in
operating system
An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
(OS) code, so it is possible to make attacks that perform
privilege escalation
Privilege escalation is the act of exploiting a bug, a design flaw, or a configuration oversight in an operating system or software application to gain elevated access to resources that are normally protected from an application or user. The re ...
and gain unlimited access to the computer's resources. The famed
Morris worm in 1988 used this as one of its attack techniques.
Programming language
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming l ...
s commonly associated with buffer overflows include
C and
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significa ...
, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an
array
An array is a systematic arrangement of similar objects, usually in rows and columns.
Things called an array include:
{{TOC right
Music
* In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
(the built-in buffer type) is within the boundaries of that array.
Bounds checking
In computer programming, bounds checking is any method of detecting whether a variable is within some bounds before it is used. It is usually used to ensure that a number fits into a given type (range checking), or that a variable being used as ...
can prevent buffer overflows, but requires additional code and processing time. Modern operating systems use a variety of techniques to combat malicious buffer overflows, notably by
randomizing the layout of memory, or deliberately leaving space between buffers and looking for actions that write into those areas ("canaries").
Technical description
A buffer overflow occurs when
data
In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpret ...
written to a buffer also corrupts data values in
memory address
In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. ...
es adjacent to the destination buffer due to insufficient
bounds checking
In computer programming, bounds checking is any method of detecting whether a variable is within some bounds before it is used. It is usually used to ensure that a number fits into a given type (range checking), or that a variable being used as ...
. This can occur when copying data from one buffer to another without first checking that the data fits within the destination buffer.
Example
In the following example expressed in
C, a program has two variables which are adjacent in memory: an 8-byte-long string buffer, A, and a two-byte
big-endian
In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most si ...
integer, B.
char A = "";
unsigned short B = 1979;
Initially, A contains nothing but zero bytes, and B contains the number 1979.
Now, the program attempts to store the
null-terminated string
In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character (a character with a value of zero, called NUL in this article). Alternative names are C str ...
with
ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
encoding in the A buffer.
strcpy(A, "excessive");
is 9 characters long and encodes to 10 bytes including the null terminator, but A can take only 8 bytes. By failing to check the length of the string, it also overwrites the value of B:
B's value has now been inadvertently replaced by a number formed from part of the character string. In this example "e" followed by a zero byte would become 25856.
Writing data past the end of allocated memory can sometimes be detected by the operating system to generate a
segmentation fault
In computing, a segmentation fault (often shortened to segfault) or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restrict ...
error that terminates the process.
To prevent the buffer overflow from happening in this example, the call to
strcpy
The C programming language has a set of functions implementing operations on strings (character strings and byte strings) in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. ...
could be replaced with
strlcpy
The C programming language has a set of functions implementing operations on strings (character strings and byte strings) in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. ...
, which takes the maximum capacity of A (including a null-termination character) as an additional parameter and ensures that no more than this amount of data is written to A:
strlcpy(A, "excessive", sizeof(A));
When available, the
strlcpy
The C programming language has a set of functions implementing operations on strings (character strings and byte strings) in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. ...
library function is preferred over
strncpy
The C programming language has a set of functions implementing operations on strings (character strings and byte strings) in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. ...
which does not null-terminate the destination buffer if the source string's length is greater than or equal to the size of the buffer (the third argument passed to the function), therefore
A may not be null-terminated and cannot be treated as a valid C-style string.
Exploitation
The techniques to
exploit
Exploit means to take advantage of something (a person, situation, etc.) for one's own end, especially unethically or unjustifiably.
Exploit can mean:
* Exploitation of natural resources
*Exploit (computer security)
* Video game exploit
*Exploita ...
a buffer overflow vulnerability vary by
architecture
Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing buildings ...
, by
operating system
An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
and by memory region. For example, exploitation on the
heap
Heap or HEAP may refer to:
Computing and mathematics
* Heap (data structure), a data structure commonly used to implement a priority queue
* Heap (mathematics), a generalization of a group
* Heap (programming) (or free store), an area of memory f ...
(used for dynamically allocated memory), differs markedly from exploitation on the
call stack
In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program. This kind of stack is also known as an execution stack, program stack, control stack, run-time stack, or mach ...
. In general, heap exploitation is dependent on the heap manager used on the target system, stack exploitation is dependent on the calling convention used by the architecture and compiler.
Stack-based exploitation
A technically inclined user may exploit stack-based buffer overflows to manipulate the program to their advantage in one of several ways:
* By overwriting a local variable that is located near the vulnerable buffer on the stack, in order to change the behavior of the program
* By overwriting the return address in a
stack frame
In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program. This kind of stack is also known as an execution stack, program stack, control stack, run-time stack, or mach ...
to point to code selected by the attacker, usually called the
shellcode
In hacking, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised mac ...
. Once the function returns, execution will resume at the attacker's shellcode.
* By overwriting a
function pointer
A function pointer, also called a subroutine pointer or procedure pointer, is a pointer that points to a function. As opposed to referencing a data value, a function pointer points to executable code within memory. Dereferencing the function point ...
or
exception handler
In computing and computer programming, exception handling is the process of responding to the occurrence of ''exceptions'' – anomalous or exceptional conditions requiring special processing – during the execution of a program. In general, an ...
to point to the shellcode, which is subsequently executed
* By overwriting a local variable (or pointer) of a different stack frame, which will be used by the function which owns that frame later.
The attacker designs data to cause one of these exploits, then places this data in a buffer supplied to users by the vulnerable code. If the address of the user-supplied data used to affect the stack buffer overflow is unpredictable, exploiting a stack buffer overflow to cause remote code execution becomes much more difficult. One technique that can be used to exploit such a buffer overflow is called "
trampolining
Trampolining or trampoline gymnastics is a competitive Olympic sport in which athletes perform acrobatics while bouncing on a trampoline. In competition, these can include simple jumps in the straight, pike, tuck, or straddle position to more c ...
". In that technique, an attacker will find a pointer to the vulnerable stack buffer, and compute the location of their
shellcode
In hacking, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised mac ...
relative to that pointer. Then, they will use the overwrite to jump to an
instruction already in memory which will make a second jump, this time relative to the pointer; that second jump will branch execution into the shellcode. Suitable instructions are often present in large code. The
Metasploit Project
The Metasploit Project is a computer security project that provides information about security vulnerabilities and aids in penetration testing and IDS signature development. It is owned by Boston, Massachusetts-based security company Rapid7.
I ...
, for example, maintains a database of suitable opcodes, though it lists only those found in the
Windows
Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for ...
operating system.
Heap-based exploitation
A buffer overflow occurring in the heap data area is referred to as a heap overflow and is exploitable in a manner different from that of stack-based overflows. Memory on the heap is dynamically allocated by the application at run-time and typically contains program data. Exploitation is performed by corrupting this data in specific ways to cause the application to overwrite internal structures such as linked list pointers. The canonical heap overflow technique overwrites dynamic memory allocation linkage (such as
malloc
C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely , , , and .
The C++ programming language includ ...
meta data) and uses the resulting pointer exchange to overwrite a program function pointer.
Microsoft
Microsoft Corporation is an American multinational corporation, multinational technology company, technology corporation producing Software, computer software, consumer electronics, personal computers, and related services headquartered at th ...
's
GDI+
The Graphics Device Interface (GDI) is a legacy component of Microsoft Windows responsible for representing graphical objects and transmitting them to output devices such as monitors and printers. Windows apps use Windows API to interact with G ...
vulnerability in handling
JPEG
JPEG ( ) is a commonly used method of lossy compression for digital images, particularly for those images produced by digital photography. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and im ...
s is an example of the danger a heap overflow can present.
Barriers to exploitation
Manipulation of the buffer, which occurs before it is read or executed, may lead to the failure of an exploitation attempt. These manipulations can mitigate the threat of exploitation, but may not make it impossible. Manipulations could include conversion to upper or lower case, removal of
metacharacter
A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine.
In POSIX extended regular expressions, there are 14 metacharacters that must be ''escaped'' (prec ...
s and filtering out of non-
alphanumeric
Alphanumericals or alphanumeric characters are a combination of alphabetical and numerical characters. More specifically, they are the collection of Latin letters and Arabic digits. An alphanumeric code is an identifier made of alphanumeric ...
strings. However, techniques exist to bypass these filters and manipulations;
alphanumeric shellcode
In hacker (computer security), hacking, a shellcode is a small piece of code used as the Payload (computing), payload in the exploit (computer security), exploitation of a software Vulnerability (computing), vulnerability. It is called "shellc ...
,
polymorphic code
In computing, polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact - that is, the ''code'' changes itself every time it runs, but the ''function'' of the code (its semantics) will not chang ...
,
self-modifying code
In computer science, self-modifying code (SMC) is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code ...
and
return-to-libc attack
A "return-to-libc" attack is a computer security attack usually starting with a buffer overflow in which a subroutine return address on a call stack is replaced by an address of a subroutine that is already present in the process executable memory, ...
s. The same methods can be used to avoid detection by
intrusion detection system
An intrusion detection system (IDS; also intrusion prevention system or IPS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Any intrusion activity or violation is typically rep ...
s. In some cases, including where code is converted into
Unicode
Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...
, the threat of the vulnerability has been misrepresented by the disclosers as only Denial of Service when in fact the remote execution of arbitrary code is possible.
Practicalities of exploitation
In real-world exploits there are a variety of challenges which need to be overcome for exploits to operate reliably. These factors include null bytes in addresses, variability in the location of shellcode, differences between environments and various counter-measures in operation.
NOP sled technique

A NOP-sled is the oldest and most widely known technique for exploiting stack buffer overflows.
It solves the problem of finding the exact address of the buffer by effectively increasing the size of the target area. To do this, much larger sections of the stack are corrupted with the
no-op
In computer science, a NOP, no-op, or NOOP (pronounced "no op"; short for no operation) is a machine language instruction and its assembly language mnemonic, programming language statement, or computer protocol command that does nothing.
...
machine instruction. At the end of the attacker-supplied data, after the no-op instructions, the attacker places an instruction to perform a relative jump to the top of the buffer where the
shellcode
In hacking, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised mac ...
is located. This collection of no-ops is referred to as the "NOP-sled" because if the return address is overwritten with any address within the no-op region of the buffer, the execution will "slide" down the no-ops until it is redirected to the actual malicious code by the jump at the end. This technique requires the attacker to guess where on the stack the NOP-sled is instead of the comparatively small shellcode.
Because of the popularity of this technique, many vendors of
intrusion prevention system
An intrusion detection system (IDS; also intrusion prevention system or IPS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Any intrusion activity or violation is typically rep ...
s will search for this pattern of no-op machine instructions in an attempt to detect shellcode in use. It is important to note that a NOP-sled does not necessarily contain only traditional no-op machine instructions; any instruction that does not corrupt the machine state to a point where the shellcode will not run can be used in place of the hardware assisted no-op. As a result, it has become common practice for exploit writers to compose the no-op sled with randomly chosen instructions which will have no real effect on the shellcode execution.
While this method greatly improves the chances that an attack will be successful, it is not without problems. Exploits using this technique still must rely on some amount of luck that they will guess offsets on the stack that are within the NOP-sled region.
An incorrect guess will usually result in the target program crashing and could alert the
system administrator
A system administrator, or sysadmin, or admin is a person who is responsible for the upkeep, configuration, and reliable operation of computer systems, especially multi-user computers, such as servers. The system administrator seeks to en ...
to the attacker's activities. Another problem is that the NOP-sled requires a much larger amount of memory in which to hold a NOP-sled large enough to be of any use. This can be a problem when the allocated size of the affected buffer is too small and the current depth of the stack is shallow (i.e., there is not much space from the end of the current stack frame to the start of the stack). Despite its problems, the NOP-sled is often the only method that will work for a given platform, environment, or situation, and as such it is still an important technique.
The jump to address stored in a register technique
The "jump to register" technique allows for reliable exploitation of stack buffer overflows without the need for extra room for a NOP-sled and without having to guess stack offsets. The strategy is to overwrite the return pointer with something that will cause the program to jump to a known pointer stored within a register which points to the controlled buffer and thus the shellcode. For example, if register A contains a pointer to the start of a buffer then any jump or call taking that register as an operand can be used to gain control of the flow of execution.

In practice a program may not intentionally contain instructions to jump to a particular register. The traditional solution is to find an unintentional instance of a suitable
opcode
In computing, an opcode (abbreviated from operation code, also known as instruction machine code, instruction code, instruction syllable, instruction parcel or opstring) is the portion of a machine language instruction that specifies the opera ...
at a fixed location somewhere within the program memory. In figure
E on the left is an example of such an unintentional instance of the i386
jmp esp
instruction. The opcode for this instruction is
FF E4
.
This two-byte sequence can be found at a one-byte offset from the start of the instruction
call DbgPrint
at address
0x7C941EED
. If an attacker overwrites the program return address with this address the program will first jump to
0x7C941EED
, interpret the opcode
FF E4
as the
jmp esp
instruction, and will then jump to the top of the stack and execute the attacker's code.
When this technique is possible the severity of the vulnerability increases considerably. This is because exploitation will work reliably enough to automate an attack with a virtual guarantee of success when it is run. For this reason, this is the technique most commonly used in
Internet worm
A computer worm is a standalone malware computer program that replicates itself in order to spread to other computers. It often uses a computer network to spread itself, relying on security failures on the target computer to access it. It wil ...
s that exploit stack buffer overflow vulnerabilities.
This method also allows shellcode to be placed after the overwritten return address on the Windows platform. Since executables are mostly based at address
0x00400000
and x86 is a
Little Endian
In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most si ...
architecture, the last byte of the return address must be a null, which terminates the buffer copy and nothing is written beyond that. This limits the size of the shellcode to the size of the buffer, which may be overly restrictive. DLLs are located in high memory (above
0x01000000
) and so have addresses containing no null bytes, so this method can remove null bytes (or other disallowed characters) from the overwritten return address. Used in this way, the method is often referred to as "DLL trampolining".
Protective countermeasures
Various techniques have been used to detect or prevent buffer overflows, with various tradeoffs. The following sections describe the choices and implementations available.
Choice of programming language
Assembly and C/C++ are popular programming languages that are vulnerable to buffer overflow, in part because they allow direct access to memory and are not
strongly typed
In computer programming, one of the many ways that programming languages are colloquially classified is whether the language's type system makes it strongly typed or weakly typed (loosely typed). However, there is no precise technical definition o ...
.
[https://www.owasp.org/index.php/Buffer_OverflowsBuffer Overflows article on OWASP ] C provides no built-in protection against accessing or overwriting data in any part of memory; more specifically, it does not check that data written to a buffer is within the boundaries of that buffer. The standard C++ libraries provide many ways of safely buffering data, and C++'s
Standard Template Library
The Standard Template Library (STL) is a software library originally designed by Alexander Stepanov for the C++ programming language that influenced many parts of the C++ Standard Library. It provides four components called ''algorithms'', ''co ...
(STL) provides containers that can optionally perform bounds checking if the programmer explicitly calls for checks while accessing data. For example, a
vector
's member function
at()
performs a bounds check and throws an
out_of_range
exception if the bounds check fails. However, C++ behaves just like C if the bounds check is not explicitly called. Techniques to avoid buffer overflows also exist for C.
Languages that are strongly typed and do not allow direct memory access, such as COBOL, Java, Python, and others, prevent buffer overflow from occurring in most cases.
Many programming languages other than C/C++ provide runtime checking and in some cases even compile-time checking which might send a warning or raise an exception when C or C++ would overwrite data and continue to execute further instructions until erroneous results are obtained which might or might not cause the program to crash. Examples of such languages include
Ada
Ada may refer to:
Places
Africa
* Ada Foah, a town in Ghana
* Ada (Ghana parliament constituency)
* Ada, Osun, a town in Nigeria
Asia
* Ada, Urmia, a village in West Azerbaijan Province, Iran
* Ada, Karaman, a village in Karaman Province, Tu ...
,
Eiffel
Eiffel may refer to:
Places
* Eiffel Peak, a summit in Alberta, Canada
* Champ de Mars – Tour Eiffel station, Paris, France; a transit station
Structures
* Eiffel Tower, in Paris, France, designed by Gustave Eiffel
* Eiffel Bridge, Ungheni, ...
,
Lisp
A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech.
Types
* A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lispi ...
,
Modula-2
Modula-2 is a structured, procedural programming language developed between 1977 and 1985/8 by Niklaus Wirth at ETH Zurich. It was created as the language for the operating system and application software of the Lilith personal workstation. It w ...
,
Smalltalk
Smalltalk is an object-oriented, dynamically typed reflective programming language. It was designed and created in part for educational use, specifically for constructionist learning, at the Learning Research Group (LRG) of Xerox PARC by ...
,
OCaml
OCaml ( , formerly Objective Caml) is a general-purpose, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, D ...
and such C-derivatives as
Cyclone,
Rust
Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH), ...
and
D. The
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
and
.NET Framework
The .NET Framework (pronounced as "''dot net"'') is a proprietary software framework developed by Microsoft that runs primarily on Microsoft Windows. It was the predominant implementation of the Common Language Infrastructure (CLI) until bein ...
bytecode environments also require bounds checking on all arrays. Nearly every
interpreted language
In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpre ...
will protect against buffer overflows, signaling a well-defined error condition. Often where a language provides enough type information to do bounds checking an option is provided to enable or disable it.
Static code analysis
In computer science, static program analysis (or static analysis) is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution.
The term ...
can remove many dynamic bound and type checks, but poor implementations and awkward cases can significantly decrease performance. Software engineers must carefully consider the tradeoffs of safety versus performance costs when deciding which language and compiler setting to use.
Use of safe libraries
The problem of buffer overflows is common in the C and C++ languages because they expose low level representational details of buffers as containers for data types. Buffer overflows must thus be avoided by maintaining a high degree of correctness in code which performs buffer management. It has also long been recommended to avoid standard library functions which are not bounds checked, such as
gets
,
scanf
A scanf format string (''scan f''ormatted) is a control parameter used in various functions to specify the layout of an input string. The functions can then divide the string and translate into values of appropriate data types. String scanning ...
and
strcpy
The C programming language has a set of functions implementing operations on strings (character strings and byte strings) in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. ...
. The
Morris worm exploited a
gets
call in
fingerd.
Well-written and tested abstract data type libraries which centralize and automatically perform buffer management, including bounds checking, can reduce the occurrence and impact of buffer overflows. The two main building-block data types in these languages in which buffer overflows commonly occur are strings and arrays; thus, libraries preventing buffer overflows in these data types can provide the vast majority of the necessary coverage. Still, failure to use these safe libraries correctly can result in buffer overflows and other vulnerabilities; and naturally, any
bug in the library itself is a potential vulnerability. "Safe" library implementations include "The Better String Library", Vstr and Erwin. The
OpenBSD
OpenBSD is a security-focused operating system, security-focused, free and open-source, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by fork (software development), forking N ...
operating system's
C library
The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard. ISO/ IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the original ANSI C standard, it wa ...
provides the
strlcpy
The C programming language has a set of functions implementing operations on strings (character strings and byte strings) in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. ...
and
strlcat
The C programming language has a set of functions implementing operations on strings (character strings and byte strings) in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. ...
functions, but these are more limited than full safe library implementations.
In September 2007, Technical Report 24731, prepared by the C standards committee, was published; it specifies a set of functions which are based on the standard C library's string and I/O functions, with additional buffer-size parameters. However, the efficacy of these functions for the purpose of reducing buffer overflows is disputable; it requires programmer intervention on a per function call basis that is equivalent to intervention that could make the analogous older standard library functions buffer overflow safe.
Buffer overflow protection
Buffer overflow protection is used to detect the most common buffer overflows by checking that the
stack
Stack may refer to:
Places
* Stack Island, an island game reserve in Bass Strait, south-eastern Australia, in Tasmania’s Hunter Island Group
* Blue Stack Mountains, in Co. Donegal, Ireland
People
* Stack (surname) (including a list of people ...
has not been altered when a function returns. If it has been altered, the program exits with a
segmentation fault
In computing, a segmentation fault (often shortened to segfault) or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restrict ...
. Three such systems are Libsafe, and the ''
StackGuard Buffer overflow protection is any of various techniques used during software development to enhance the security of executable programs by detecting buffer overflows on stack-allocated variables, and preventing them from causing program misbehavior ...
'' and ''
ProPolice Buffer overflow protection is any of various techniques used during software development to enhance the security of executable programs by detecting buffer overflows on stack-allocated variables, and preventing them from causing program misbehavior ...
''
gcc patches.
Microsoft's implementation of
Data Execution Prevention
In computer security, executable-space protection marks memory regions as non-executable, such that an attempt to execute machine code in these regions will cause an exception. It makes use of hardware features such as the NX bit (no-execute bit ...
(DEP) mode explicitly protects the pointer to the
Structured Exception Handler The Microsoft Windows family of operating systems employ some specific exception handling mechanisms.
Structured Exception Handling
Microsoft Structured Exception Handling is the native exception handling mechanism for Windows and a forerunner t ...
(SEH) from being overwritten.
Stronger stack protection is possible by splitting the stack in two: one for data and one for function returns. This split is present in the
Forth language, though it was not a security-based design decision. Regardless, this is not a complete solution to buffer overflows, as sensitive data other than the return address may still be overwritten.
Pointer protection
Buffer overflows work by manipulating
pointers
Pointer may refer to:
Places
* Pointer, Kentucky
* Pointers, New Jersey
* Pointers Airport, Wasco County, Oregon, United States
* The Pointers, a pair of rocks off Antarctica
People with the name
* Pointer (surname), a surname (including a li ...
, including stored addresses. PointGuard was proposed as a compiler-extension to prevent attackers from being able to reliably manipulate pointers and addresses. The approach works by having the compiler add code to automatically XOR-encode pointers before and after they are used. Theoretically, because the attacker does not know what value will be used to encode/decode the pointer, he cannot predict what it will point to if he overwrites it with a new value. PointGuard was never released, but Microsoft implemented a similar approach beginning in
Windows XP
Windows XP is a major release of Microsoft's Windows NT operating system. It was release to manufacturing, released to manufacturing on August 24, 2001, and later to retail on October 25, 2001. It is a direct upgrade to its predecessors, Wind ...
SP2 and
Windows Server 2003
Windows Server 2003 is the sixth version of Windows Server operating system produced by Microsoft. It is part of the Windows NT family of operating systems and was released to manufacturing on March 28, 2003 and generally available on April 24, ...
SP1. Rather than implement pointer protection as an automatic feature, Microsoft added an API routine that can be called. This allows for better performance (because it is not used all of the time), but places the burden on the programmer to know when it is necessary.
Because XOR is linear, an attacker may be able to manipulate an encoded pointer by overwriting only the lower bytes of an address. This can allow an attack to succeed if the attacker is able to attempt the exploit multiple times or is able to complete an attack by causing a pointer to point to one of several locations (such as any location within a NOP sled). Microsoft added a random rotation to their encoding scheme to address this weakness to partial overwrites.
Executable space protection
Executable space protection is an approach to buffer overflow protection which prevents execution of code on the stack or the heap. An attacker may use buffer overflows to insert arbitrary code into the memory of a program, but with executable space protection, any attempt to execute that code will cause an exception.
Some CPUs support a feature called
NX ("No eXecute") or
XD ("eXecute Disabled") bit, which in conjunction with software, can be used to mark
pages of data (such as those containing the stack and the heap) as readable and writable but not executable.
Some Unix operating systems (e.g.
OpenBSD
OpenBSD is a security-focused operating system, security-focused, free and open-source, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by fork (software development), forking N ...
,
macOS
macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac (computer), Mac computers. Within the market of ...
) ship with executable space protection (e.g.
W^X
W^X ("write xor execute", pronounced ''W xor X'') is a security feature in operating systems and virtual machines. It is a memory protection policy whereby every page in a process's or kernel's address space may be either writable or executable, ...
). Some optional packages include:
*
PaX
Pax or PAX may refer to:
Peace
* Peace (Latin: ''pax'')
** Pax (goddess), the Roman goddess of peace
** Pax, a truce term
* Pax (liturgy), a salutation in Catholic and Lutheran religious services
* Pax (liturgical object), an object formerly kis ...
*
Exec Shield
Exec Shield is a project started at Red Hat, Inc in late 2002 with the aim of reducing the risk of worm or other automated remote attacks on Linux systems. The first result of the project was a security patch for the Linux kernel that emulates an ...
*
Openwall
The Openwall Project is a source for various software, including Openwall GNU/*/Linux (Owl), a security-enhanced Linux distribution designed for servers. Openwall patches and security extensions have been included into many major Linux distribut ...
Newer variants of Microsoft Windows also support executable space protection, called
Data Execution Prevention
In computer security, executable-space protection marks memory regions as non-executable, such that an attempt to execute machine code in these regions will cause an exception. It makes use of hardware features such as the NX bit (no-execute bit ...
.