Automatic bug-fixing is the automatic
repair
The technical meaning of maintenance involves functional checks, servicing, repairing or replacing of necessary devices, equipment, machinery, building infrastructure and supporting utilities in industrial, business, and residential installat ...
of
software bug
A software bug is a design defect ( bug) in computer software. A computer program with many or serious bugs may be described as ''buggy''.
The effects of a software bug range from minor (such as a misspelled word in the user interface) to sev ...
s without the intervention of a human programmer.
It is also commonly referred to as ''automatic patch generation'', ''automatic bug repair'', or ''automatic program repair''.
The typical goal of such techniques is to automatically generate correct
patches to eliminate bugs in
software program
A computer program is a sequence or set of instructions in a programming language for a computer to execute. It is one component of software, which also includes documentation and other intangible components.
A ''computer program'' in its h ...
s without causing
software regression.
Specification
Automatic bug fixing is made according to a specification of the expected behavior which can be for instance a
formal specification
In computer science, formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verify ...
or a
test suite
In software development, a test suite, less commonly known as a validation suite, is a collection of test cases that are intended to be used to test a software program to show that it has some specified set of behaviors. A test suite often conta ...
.
A test-suite – the input/output pairs specify the functionality of the program, possibly captured in
assertions can be used as a
test oracle
In software testing, a test oracle (or just oracle) is a provider of information that describes correct output based on the input of a test case. Testing with an oracle involves comparing actual results of the system under test (SUT) with the ex ...
to drive the search. This oracle can in fact be divided between the ''bug oracle'' that exposes the faulty behavior, and the ''regression oracle'', which encapsulates the functionality any program repair method must preserve. Note that a test suite is typically incomplete and does not cover all possible cases. Therefore, it is often possible for a validated patch to produce expected outputs for all inputs in the test suite but incorrect outputs for other inputs.
The existence of such validated but incorrect patches is a major challenge for generate-and-validate techniques.
Recent successful automatic bug-fixing techniques often rely on additional information other than the test suite, such as information learned from previous human patches, to further identify correct patches among validated patches.
Another way to specify the expected behavior is to use
formal specification
In computer science, formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verify ...
s
Verification against full specifications that specify the whole program behavior including functionalities is less common because such specifications are typically not available in practice and the computation cost of such
verification is prohibitive. For specific classes of errors, however, implicit partial specifications are often available. For example, there are targeted bug-fixing techniques validating that the patched program can no longer trigger overflow errors in the same execution path.
Techniques
Generate-and-validate
Generate-and-validate approaches compile and test each candidate patch to collect all validated patches that produce expected outputs for all inputs in the test suite.
Such a technique typically starts with a test suite of the program, i.e., a set of
test cases, at least one of which exposes the bug.
An early generate-and-validate bug-fixing systems is GenProg.
The effectiveness of generate-and-validate techniques remains controversial, because they typically do not provide
patch correctness guarantees.
Nevertheless, the reported results of recent state-of-the-art techniques are generally promising. For example, on systematically collected 69 real world bugs in eight large
C software programs, the state-of-the-art bug-fixing system Prophet generates correct patches for 18 out of the 69 bugs.
One way to generate candidate patches is to apply
mutation operators on the original program. Mutation operators manipulate the original program, potentially via its
abstract syntax tree
An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...
representation, or a more coarse-grained representation such as operating at the
statement-level or
block
Block or blocked may refer to:
Arts, entertainment and media Broadcasting
* Block programming, the result of a programming strategy in broadcasting
* W242BX, a radio station licensed to Greenville, South Carolina, United States known as ''96.3 ...
-level. Earlier
genetic improvement approaches operate at the statement level and carry out simple delete/replace operations such as deleting an existing statement or replacing an existing statement with another statement in the same source file.
Recent approaches use more fine-grained operators at the
abstract syntax tree
An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...
level to generate more diverse set of candidate patches.
Notably, the statement deletion mutation operator, and more generally removing code, is a reasonable repair strategy, or at least a good fault localization strategy.
Another way to generate candidate patches consists of using fix templates. Fix templates are typically predefined changes for fixing specific classes of bugs.
Examples of fix templates include inserting a
conditional statement to check whether the value of a variable is null to fix null pointer exception, or changing an integer constant by one to fix off-by-one errors.
Synthesis-based
Repair techniques exist that are based on symbolic execution. For example, Semfix
uses symbolic execution to extract a repair constraint. Angelix
introduced the concept of angelic forest in order to deal with multiline patches.
Under certain assumptions, it is possible to state the repair problem as a synthesis problem.
SemFix
uses component-based synthesis.
Dynamoth uses dynamic synthesis.
S3 is based on
syntax-guided synthesis.
SearchRepair
converts potential patches into an SMT formula and queries candidate patches that allow the patched program to pass all supplied test cases.
Data-driven
Machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
techniques can improve the effectiveness of automatic bug-fixing systems.
One example of such techniques learns from past successful patches from human developers collected from
open source
Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
repositories in
GitHub
GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...
and
SourceForge
SourceForge is a web service founded by Geoffrey B. Jeffery, Tim Perdue, and Drew Streib in November 1999. SourceForge provides a centralized software discovery platform, including an online platform for managing and hosting open-source soft ...
.
It then use the learned information to recognize and prioritize potentially correct patches among all generated candidate patches.
Alternatively, patches can be directly mined from existing sources. Example approaches include mining patches from donor applications
or from QA web sites.
Getafix
is a language-agnostic approach developed and used in production at
Facebook
Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms, Meta. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andre ...
. Given a sample of
code commits where engineers fixed a certain kind of bug, it learns human-like fix patterns that apply to future bugs of the same kind. Besides using Facebook's own
code repositories as training data, Getafix learnt some fixes from
open source
Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
Java repositories. When new bugs get detected, Getafix applies its previously learnt patterns to produce candidate fixes and ranks them within seconds. It presents only the top-ranked fix for final validation by tools or an engineer, in order to save resources and ideally be so fast that no human time was spent on fixing the same bug, yet.
Template-based repair
For specific classes of errors, targeted automatic bug-fixing techniques use specialized templates:
*
null pointer exception repair
with insertion of a
conditional statement to check whether the value of a variable is null.
*
integer overflow
In computer programming, an integer overflow occurs when an arithmetic operation on integers attempts to create a numeric value that is outside of the range that can be represented with a given number of digits – either higher than the maximu ...
repair
*
buffer overflow repair
*
memory leak
In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that memory which is no longer needed is not released. A memory leak may also happen when an objec ...
repair,
with automated insertion of missing memory deallocation statements.
Comparing to generate-and-validate techniques, template-based techniques tend to have better bug-fixing accuracy but a much narrowed scope.
Use
There are multiple uses of automatic bug fixing:
* In a development environment: When encountering a bug the developer activates a feature to search for a patch (for instance by clicking on a button). This search can also happen in the background, when the IDE proactively searches for solutions to potential problems, without waiting for explicit action from the developer.
* At runtime: When a failure happens at runtime, a binary patch can be searched for and
applied online. An example of such a repair system is ClearView,
which does repair on x86 code, with x86 binary patches.
Search space
In essence, automatic bug fixing is a search activity, whether deductive-based or heuristic-based. The search space of automatic bug fixing is composed of all edits that can be possibly made to a program. There have been studies to understand the structure of this search space. Qi et al. showed that the original fitness function of Genprog is not better than random search to drive the search. Long et al.'s
study indicated that correct patches can be considered as sparse in the search space and that incorrect overfitting patches are vastly more abundant (see also discussion about overfitting below).
Overfitting
Sometimes, in test-suite based program repair, tools generate patches that pass the test suite, yet are actually incorrect, this is known as the "overfitting" problem.
"Overfitting" in this context refers to the fact that the patch overfits to the test inputs. There are different kinds of overfitting: incomplete fixing means that only some buggy inputs are fixed, regression introduction means some previously working features are broken after the patch (because they were poorly tested). Early prototypes for automatic repair suffered a lot from overfitting: on the Manybugs C benchmark, Qi et al.
reported that 104/110 of plausible GenProg patches were overfitting. In the context of synthesis-based repair, Le et al. obtained more than 80% of overfitting patches.
One way to avoid overfitting is to filter out the generated patches. This can be done based on dynamic analysis.
Alternatively, Tian et al. propose heuristic approaches to assess patch correctness.
Limitations of automatic bug-fixing
Automatic bug-fixing techniques that rely on a test suite do not provide patch correctness guarantees, because the test suite is incomplete and does not cover all cases.
A weak test suite may cause generate-and-validate techniques to produce validated but incorrect patches that have negative effects such as eliminating desirable functionalities, causing memory leaks, and introducing security vulnerabilities.
One possible approach is to amplify the failing test suite by automatically generating further test cases that are then labelled as passing or failing. To minimize the human labelling effort, an automatic
test oracle
In software testing, a test oracle (or just oracle) is a provider of information that describes correct output based on the input of a test case. Testing with an oracle involves comparing actual results of the system under test (SUT) with the ex ...
can be trained that gradually learns to automatically classify test cases as passing or failing and only engages the bug-reporting user for uncertain cases.
A limitation of generate-and-validate repair systems is the search space explosion.
For a program, there are a large number of statements to change and for each statement there are a large number of possible modifications. State-of-the-art systems address this problem by assuming that a small modification is enough for fixing a bug, resulting in a search space reduction.
The limitation of approaches based on symbolic analysis
is that real world programs are often converted to intractably large formulas especially for modifying statements with
side effects
In medicine, a side effect is an effect of the use of a medicinal drug or other treatment, usually adverse but sometimes beneficial, that is unintended. Herbal and traditional medicines also have side effects.
A drug or procedure usually used ...
.
Benchmarks
Benchmarks of bugs typically focus on one specific programming language.
In C, the Manybugs benchmark collected by GenProg authors contains 69 real world defects and it is widely used to evaluate many other bug-fixing tools for C.
In
Java
Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
, the main benchmark is Defects4J now extensively used in most research papers on program repair for Java.
Alternative benchmarks exist, such as the Quixbugs benchmark, which contains original bugs for program repair. Other benchmarks of Java bugs include Bugs.jar, based on past commits.
Example tools
Automatic bug-fixing is an active research topic in computer science. There are many implementations of various bug-fixing techniques especially for C and Java programs. Note that most of these implementations are research prototypes for demonstrating their techniques, i.e., it is unclear whether their current implementations are ready for industrial usage or not.
C
* ClearView:
A generate-and-validate tool of generating binary patches for deployed systems. It is evaluated on 10 security vulnerability cases. A later study shows that it generates correct patches for at least 4 of the 10 cases.
* GenProg:
A seminal generate-and-validate bug-fixing tool. It has been extensively studied in the context of the ManyBugs benchmark.
* SemFix:
The first solver-based bug-fixing tool for C.
* CodePhage:
The first bug-fixing tool that directly transfer code across programs to generate patch for C program. Note that although it generates C patches, it can extract code from
binary programs without source code.
* LeakFix:
A tool that automatically fixes memory leaks in C programs.
* Prophet:
The first generate-and-validate tool that uses machine learning techniques to learn useful knowledge from past human patches to recognize correct patches. It is evaluated on the same benchmark as GenProg and generate correct patches (i.e., equivalent to human patches) for 18 out of 69 cases.
* SearchRepair:
A tool for replacing buggy code using snippets of code from elsewhere. It is evaluated on the IntroClass benchmark
and generates much higher quality patches on that benchmark than GenProg, RSRepair, and AE.
* Angelix:
An improved solver-based bug-fixing tool. It is evaluated on the GenProg benchmark. For 10 out of the 69 cases, it generate patches that is equivalent to human patches.
* Learn2Fix:
The first human-in-the-loop semi-automatic repair tool. Extends GenProg to learn the condition under which a semantic bug is observed by systematic queries to the user who is reporting the bug. Only works for programs that take and produce integers.
Java
* PAR:
A generate-and-validate tool that uses a set of manually defined fix templates.
* QACrashFix:
A tool that fixes Java crash bugs by mining fixes from Q&A web site.
* ARJA:
A repair tool for Java based on multi-objective genetic programming.
* NpeFix:
An automatic repair tool for NullPointerException in Java, availabl
on Github
Other languages
* AutoFixE:
A bug-fixing tool for
Eiffel language. It relies the contracts (i.e., a form of formal specification) in Eiffel programs to validate generated patches.
*Getafix:
Operates purely on
AST transformations and thus requires only a
parser
Parsing, syntax analysis, or syntactic analysis is a process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar by breaking it into parts. The term '' ...
and formatter. At Facebook it has been applied to
Hack
Hack may refer to:
Arts, entertainment, and media Games
* Hack (Unix video game), ''Hack'' (Unix video game), a 1984 roguelike video game
* .hack (video game series), ''.hack'' (video game series), a series of video games by the multimedia fran ...
,
Java
Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
and
Objective-C
Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style message passing (messaging) to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was ...
.
Proprietary
* DeepCode integrates public and private
GitHub
GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...
,
GitLab
GitLab is a software forge primarily developed by GitLab Inc. It is available as a community edition and a commercial edition.
History
GitLab was created in 2011 by Ukrainian programmer Dmitriy Zaporozhets as a side project written in Rub ...
and
Bitbucket
Bitbucket is a Git-based source code repository hosting service owned by Atlassian. Bitbucket offers both commercial plans and free accounts with an unlimited number of private repositories.
Services Bitbucket Cloud
Bitbucket Cloud (pre ...
repositories to identify code-fixes and improve software.
*
Kodezi utilizes opensource data from
GitHub
GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...
repositories,
Stack Overflow
In software, a stack overflow occurs if the call stack pointer exceeds the stack bound. The call stack may consist of a limited amount of address space, often determined at the start of the program. The size of the call stack depends on many fa ...
, and private trained models to analyze code, provide solutions, and descriptions about the coding bugs instantly.
References
External links
* {{URL, http://program-repair.org/ datasets, tools, etc., related to automated program repair research.
Debugging
Software development