American Fuzzy Lop (software)
   HOME

TheInfoList



OR:

American Fuzzy Lop (AFL), stylized in
all lowercase Letter case is the distinction between the letters that are in larger uppercase or capitals (more formally ''majuscule'') and smaller lowercase (more formally ''minuscule'') in the written representation of certain languages. The writing systems ...
as , is a
free software Free software, libre software, libreware sometimes known as freedom-respecting software is computer software distributed open-source license, under terms that allow users to run the software for any purpose as well as to study, change, distribut ...
fuzzer In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exception ...
that employs
genetic algorithm In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to g ...
s in order to efficiently increase
code coverage In software engineering, code coverage, also called test coverage, is a percentage measure of the degree to which the source code of a program is executed when a particular test suite is run. A program with high code coverage has more of its ...
of the
test case In software engineering, a test case is a specification of the inputs, execution conditions, testing procedure, and expected results that define a single test to be executed to achieve a particular software testing objective, such as to exercise ...
s. So far it has detected hundreds of significant
software bug A software bug is a design defect ( bug) in computer software. A computer program with many or serious bugs may be described as ''buggy''. The effects of a software bug range from minor (such as a misspelled word in the user interface) to sev ...
s in major free software projects, including
X.Org Server X.Org Server is the free and open-source implementation of the X Window System (X11) display server stewarded by the X.Org Foundation. Implementations of the client-side X Window System protocol exist in the form of ''X11 libraries'', which ...
,
PHP PHP is a general-purpose scripting language geared towards web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by the PHP Group. ...
,
OpenSSL OpenSSL is a software library for applications that provide secure communications over computer networks against eavesdropping, and identify the party at the other end. It is widely used by Internet servers, including the majority of HTTPS web ...
, pngcrush, bash,
Firefox Mozilla Firefox, or simply Firefox, is a free and open-source web browser developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. It uses the Gecko rendering engine to display web pages, which implements curr ...
,
BIND BIND () is a suite of software for interacting with the Domain Name System (DNS). Its most prominent component, named (pronounced ''name-dee'': , short for ''name Daemon (computing), daemon''), performs both of the main DNS server roles, acting ...
, Qt, and
SQLite SQLite ( "S-Q-L-ite", "sequel-ite") is a free and open-source relational database engine written in the C programming language. It is not a standalone app; rather, it is a library that software developers embed in their apps. As such, it ...
. Initially released in November 2013, AFL quickly became one of the most widely used fuzzers in security research. For many years after its release, AFL has been considered a "state of the art" fuzzer. AFL is considered "a de-facto standard for fuzzing", and the release of AFL contributed significantly to the development of fuzzing as a research area. AFL is widely used in academia; academic fuzzers are often
forks In cutlery or kitchenware, a fork (from 'pitchfork') is a Eating utensil, utensil, now usually made of metal, whose long handle terminates in a head that branches into several narrow and often slightly curved tine (structural), tines with whic ...
of AFL, and AFL is commonly used as a baseline to evaluate new techniques. The
source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...
of American fuzzy lop is published on
GitHub GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...
. Its name is a reference to a breed of rabbit, the American Fuzzy Lop.


Overview

AFL requires the user to provide a sample command that runs the tested application and at least one small example input. The input can be fed to the tested program either via standard input or as an input file specified in the process command line. Fuzzing networked programs is currently not directly supported, although in some cases there are feasible solutions to this problem. For example, in case of an audio player, American fuzzy lop can be instructed to open a short sound file with it. Then, the fuzzer attempts to actually execute the specified command and if that succeeds, it tries to reduce the input file to the smallest one that triggers the same behavior. After this initial phase, AFL begins the actual process of fuzzing by applying various modifications to the input file. When the tested program crashes or hangs, this usually implies the discovery of a new bug, possibly a
security vulnerability Vulnerabilities are flaws or weaknesses in a system's design, implementation, or management that can be exploited by a malicious actor to compromise its security. Despite a system administrator's best efforts to achieve complete correctness, vir ...
. In this case, the modified input file is saved for further user inspection. In order to maximize the fuzzing performance, American fuzzy lop expects the tested program to be
compiled In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
with the aid of a
utility program Utility software is a program specifically designed to help manage and tune system or application software. It is used to support the computer infrastructure - in contrast to application software, which is aimed at directly performing tasks that b ...
that
instruments Instrument may refer to: Science and technology * Flight instruments, the devices used to measure the speed, altitude, and pertinent flight angles of various kinds of aircraft * Laboratory equipment, the measuring tools used in a scientific lab ...
the code with helper functions which track
control flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an '' ...
. This allows the fuzzer to detect when the target's behavior changes in response to the input. In cases when this is not possible,
black-box testing Black-box testing, sometimes referred to as specification-based testing, is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. This method of test can be applie ...
is supported as well.


Fuzzing algorithm

Fuzzers attempt to find unexpected behaviors (i.e., bugs) in a target program by repeatedly executing the program on various inputs. As described above, AFL is a gray-box fuzzer, meaning it expects instrumentation to measure
code coverage In software engineering, code coverage, also called test coverage, is a percentage measure of the degree to which the source code of a program is executed when a particular test suite is run. A program with high code coverage has more of its ...
to have been injected into the target program at compile time and uses the coverage metric to direct the generation of new inputs. AFL's fuzzing algorithm has influenced many subsequent gray-box fuzzers. The inputs to AFL are an instrumented target program (the
system under test System under test (SUT) refers to a system that is being tested for correct operation. According to ISTQB it is the test object. From a unit testing perspective, the system under test represents all of the classes in a test that are not predefined ...
) and corpus, that is, a collection of inputs to the target. Inputs are also known as test cases. The algorithm maintains a queue of inputs, which is initialized to the input corpus. The overall algorithm works as follows: # Load the next input from the queue # Minimize the test case #
Mutate In biology, a mutation is an alteration in the nucleic acid sequence of the genome of an organism, virus, or extrachromosomal DNA. Viral genomes contain either DNA or RNA. Mutations result from errors during DNA or viral replication, mitosis ...
the test case. If any mutant results in additional code coverage, add it to the queue. If the mutant results in a crash or hang, save it to disk for later inspection. # Go to step 1


Mutation

To generate new inputs, AFL applies various mutations to existing inputs. These mutations are mostly agnostic to the input format of the target program; they generally treat the input as simple blob of
binary Binary may refer to: Science and technology Mathematics * Binary number, a representation of numbers using only two values (0 and 1) for each digit * Binary function, a function that takes two arguments * Binary operation, a mathematical op ...
data. At first, AFL applies a
deterministic Determinism is the metaphysical view that all events within the universe (or multiverse) can occur only in one possible way. Deterministic theories throughout the history of philosophy have developed from diverse and sometimes overlapping mo ...
sequence of mutations to each input. These are applied at various offsets in the input. They include: * Flipping (i.e., negating or inverting) 1-32 bits * Incrementing and decrementing 8-, 16-, and 32-bit
integers An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
, in both little- and big- endian encodings * Overwriting parts of the input with "approximately two dozen 'interesting' values", including zero and maximum and minimum signed and unsigned integers of various widths, again in both little- and big- endian encodings. * Replacing parts of the input with data drawn from a "dictionary" of user-specified or auto-detected tokens (e.g., magic bytes, or keywords in a text-based format) After applying all available deterministic mutations, AFL moves on to havoc, a stage where between 2 and 128 mutations are applied in a row. These mutations are any of: * The deterministic mutations described above * Overwriting bytes with random values * Operations over multi-byte "blocks": ** Deleting blocks ** Duplicating blocks ** Setting each byte in a block to a single value If AFL cycles through the entire queue without generating any input that achieves new code coverage, it begins splicing. Splicing takes two inputs from the queue, truncates them at arbitrary positions, concatenates them together, and applies the havoc stage to the result.


Measuring coverage

AFL pioneered the use of binned hitcounts for measuring code coverage. The author claims that this technique mitigates path explosion. Conceptually, AFL counts the number of times a given execution of the target traverses each edge in the target's
control-flow graph In computer science, a control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control-flow graph was conceived by Frances E. Allen, who noted that ...
; the documentation refers to these edges as tuples and the counts as hitcounts. At the end of the execution, the hitcounts are binned or bucketed into the following eight buckets: 1, 2, 3, 4–7, 8–15, 16–31, 32–127, and 128 and greater. AFL maintains a global set of (tuple, binned count) pairs that have been produced by any execution thus far. An input is considered "interesting" and is added to the queue if it produces a (tuple, binned count) pair that is not yet in the global set. In practice, the hitcounts are collected and processed using an efficient but
lossy In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size ...
scheme. The compile-time instrumentation injects code that is conceptually similar to the following at each branch in the control-flow graph of the target program: cur_location = ; shared_mem ur_location ^ prev_location+; prev_location = cur_location >> 1; where is a random integer and shared_mem is a 64
kilobyte The kilobyte is a multiple of the unit byte for Computer data storage, digital information. The International System of Units (SI) defines the prefix ''kilo-, kilo'' as a multiplication factor of 1000 (103); therefore, one kilobyte is 1000&nbs ...
region of memory shared between the fuzzer and the target. This representation is more fine-grained (distinguishes between more executions) than simple block or statement coverage, but still allows for a linear-time "interestingness" test.


Minimization

On the assumption that smaller inputs take less time to execute, AFL attempts to minimize or trim the test cases in the queue. Trimming works by removing blocks from the input; if the trimmed input still results in the same coverage (see #Measuring coverage), then the original input is discarded and the trimmed input is saved in the queue.


Scheduling

AFL selects a subset of favored inputs from the queue, non-favored inputs are skipped with some probability.


Features


Performance features

One of the challenges American fuzzy lop had to solve involved an efficient
spawning Spawn is the Egg cell, eggs and Spermatozoa, sperm released or deposited into water by aquatic animals. As a verb, ''to spawn'' refers to the process of freely releasing eggs and sperm into a body of water (fresh or marine); the physical act is ...
of hundreds of processes per second. Apart from the original engine that spawned every process from scratch, American fuzzy lop offers the default engine that relies heavily on the
fork In cutlery or kitchenware, a fork (from 'pitchfork') is a utensil, now usually made of metal, whose long handle terminates in a head that branches into several narrow and often slightly curved tines with which one can spear foods either to h ...
system call. This can further be sped up by leveraging
LLVM LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
deferred fork server mode or the similar persistent mode, but this comes at the cost of having to modify the tested program. Also, American fuzzy lop supports fuzzing the same program over the network.


User interface

American fuzzy lop features a colorful
command line interface A command-line interface (CLI) is a means of interacting with software via commands each formatted as a line of text. Command-line interfaces emerged in the mid-1960s, on computer terminals, as an interactive and more user-friendly alternati ...
that displays real-time statistics about the fuzzing process. Various settings may be triggered by either command line options or
environment variable An environment variable is a user-definable value that can affect the way running processes will behave on a computer. Environment variables are part of the environment in which a process runs. For example, a running process can query the va ...
s. Apart from that, programs may read runtime statistics from files in a machine-readable format.


Utility programs

In addition to afl-fuzz and tools that can be used for binary instrumentation, American fuzzy lop features utility programs meant for monitoring of the fuzzing process. Apart from that, there is afl-cmin and afl-tmin, which can be used for test case and test corpus minimization. This can be useful when the test cases generated by afl-fuzz would be used by other fuzzers.


Forks

AFL has been forked many times in order to examine new fuzzing techniques, or to apply fuzzing to different kinds of programs. A few notable forks include: * AFL++ * MOPT-AFL * AFLFast * AFLSmart * AFLGo * SymCC-AFL * WinAFL, "a fork of AFL for fuzzing Windows binaries"


AFL++

AFL++ (AFLplusplus) is a community-maintained
fork In cutlery or kitchenware, a fork (from 'pitchfork') is a utensil, now usually made of metal, whose long handle terminates in a head that branches into several narrow and often slightly curved tines with which one can spear foods either to h ...
of AFL created due to the relative inactivity of
Google Google LLC (, ) is an American multinational corporation and technology company focusing on online advertising, search engine technology, cloud computing, computer software, quantum computing, e-commerce, consumer electronics, and artificial ...
's upstream AFL development since September 2017. It includes new features and speedups. Google's OSS-Fuzz initiative, which provides free fuzzing services to open source software, replaced its AFL option with AFL++ in January 2021.


References


Notes


Sources

* *


Further reading


The AFL technical "whitepaper"
written by the original author
Michał Zalewski Michał Zalewski (born 19 January 1981), also known by the user name lcamtuf, is a computer security expert and " white hat" hacker from Poland. He is a former Google Inc. employee (until 2018), and currently the VP of Security Engineering at ...
* Multi-System and Internet Security Cookbook, Hors-Serie No. 11 "Outils de sécurité", p. 36, "American Fuzzy Lop", Kevin Denis, June 2015


"Fuzz and strings (lwn.net)"

"Fuzzing (on) FreeBSD - (Mostly) automated bug discovery with security/afl"
- a presentation at
FOSDEM Free and Open source Software Developers' European Meeting (FOSDEM) is an annual software engineering conference. It is non-commercial and volunteer-organized with a focus on free and open-source software. Initiated in 2000, it is usually held d ...

"Testing with two failure seeking missiles: fuzzing and property based testing"
- a presentation at EuroPython 2015.
"The Fuzzing Project"
* "Fuzzing Code with AFL", Peter Gutmann,;login, Vol. 41, No. 2, Summer 2016


External links


Official AFL documentation

The blog of Michał Zalewski
author of AFL. The blog contains several posts diving in to the technical details of AFL. {{Google LLC Security testing tools Free and open-source software Software using the Apache license Free software programmed in C