HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, garbage collection (GC) is a form of automatic
memory management Memory management (also dynamic memory management, dynamic storage allocation, or dynamic memory allocation) is a form of Resource management (computing), resource management applied to computer memory. The essential requirement of memory manag ...
. The ''garbage collector'' attempts to reclaim memory that was allocated by the program, but is no longer referenced; such memory is called ''
garbage Garbage, trash (American English), rubbish (British English), or refuse is waste material that is discarded by humans, usually due to a perceived lack of utility. The term generally does not encompass bodily waste products, purely liquid or ...
''. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in
Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...
. Garbage collection relieves the programmer from doing manual memory management, where the programmer specifies what objects to de-allocate and return to the memory system and when to do so. Other, similar techniques include stack allocation, region inference, and memory ownership, and combinations thereof. Garbage collection may take a significant proportion of a program's total processing time, and affect
performance A performance is an act or process of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function. Performance has evolved glo ...
as a result. Resources other than memory, such as
network socket A network socket is a software structure within a network node of a computer network that serves as an endpoint for sending and receiving data across the network. The structure and properties of a socket are defined by an application programming ...
s, database
handle A handle is a part of, or an attachment to, an object that allows it to be grasped and object manipulation, manipulated by hand. The design of each type of handle involves substantial ergonomics, ergonomic issues, even where these are dealt wi ...
s,
window A window is an opening in a wall, door, roof, or vehicle that allows the exchange of light and may also allow the passage of sound and sometimes air. Modern windows are usually glazed or covered in some other transparent or translucent ma ...
s, file descriptors, and device descriptors, are not typically handled by garbage collection, but rather by other
method Method (, methodos, from μετά/meta "in pursuit or quest of" + ὁδός/hodos "a method, system; a way or manner" of doing, saying, etc.), literally means a pursuit of knowledge, investigation, mode of prosecuting such inquiry, or system. In re ...
s (e.g. destructors). Some such methods de-allocate memory also.


Overview

Many
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s require garbage collection, either as part of the language specification (e.g., RPL,
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
, C#, D, Go, and most
scripting language In computing, a script is a relatively short and simple set of instructions that typically automation, automate an otherwise manual process. The act of writing a script is called scripting. A scripting language or script language is a programming ...
s) or effectively for practical implementation (e.g., formal languages like
lambda calculus In mathematical logic, the lambda calculus (also written as ''λ''-calculus) is a formal system for expressing computability, computation based on function Abstraction (computer science), abstraction and function application, application using var ...
). These are said to be ''garbage-collected languages''. Other languages, such as C and C++, were designed for use with manual memory management, but have garbage-collected implementations available. Some languages, like Ada,
Modula-3 Modula-3 is a programming language conceived as a successor to an upgraded version of Modula-2 known as Modula-2+. It has been influential in research circles (influencing the designs of languages such as Java, C#, Python and Nim), but it ha ...
, and C++/CLI, allow both garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects. Still others, like D, are garbage-collected but allow the user to manually delete objects or even disable garbage collection entirely when speed is required. Although many languages integrate GC into their
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
and
runtime system In computer programming, a runtime system or runtime environment is a sub-system that exists in the computer where a program is created, as well as in the computers where the program is intended to be run. The name comes from the compile time ...
, ''post-hoc'' GC systems also exist, such as
Automatic Reference Counting Automatic Reference Counting (ARC) is a memory management feature of the Clang compiler providing automatic reference counting for the Objective-C and Swift (programming language), Swift programming languages. At compile time, it inserts into the o ...
(ARC). Some of these ''post-hoc'' GC systems do not require recompilation.


Advantages

GC frees the programmer from manually de-allocating memory. This helps avoid some kinds of
error An error (from the Latin , meaning 'to wander'Oxford English Dictionary, s.v. “error (n.), Etymology,” September 2023, .) is an inaccurate or incorrect action, thought, or judgement. In statistics, "error" refers to the difference between t ...
s: * '' Dangling pointers'', which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is dereferenced. By then the memory may have been reassigned to another use, with unpredictable results. * ''Double free bugs'', which occur when the program tries to free a region of memory that has already been freed, and perhaps already been allocated again. * Certain kinds of '' memory leaks'', in which a program fails to free memory occupied by objects that have become unreachable, which can lead to memory exhaustion.


Disadvantages

GC uses computing resources to decide which memory to free. Therefore, the penalty for the convenience of not annotating object lifetime manually in the source code is overhead, which can impair program performance. A peer-reviewed paper from 2005 concluded that GC needs five times the memory to compensate for this overhead and to perform as fast as the same program using idealized explicit memory management. The comparison however is made to a program generated by inserting deallocation calls using an
oracle An oracle is a person or thing considered to provide insight, wise counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. If done through occultic means, it is a form of divination. Descript ...
, implemented by collecting traces from programs run under a profiler, and the program is only correct for one particular execution of the program. Interaction with
memory hierarchy In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and contr ...
effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing. The impact on performance was given by Apple as a reason for not adopting garbage collection in iOS, despite it being the most desired feature. The moment when the garbage is actually collected can be unpredictable, resulting in stalls (pauses to shift/free memory) scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in
transaction processing In computer science, transaction processing is information processing that is divided into individual, indivisible operations called ''transactions''. Each transaction must succeed or fail as a complete unit; it can never be only partially c ...
, or in interactive programs. Incremental, concurrent, and real-time garbage collectors address these problems, with varying trade-offs.


Strategies


Tracing

Tracing garbage collection is the most common type of garbage collection, so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting. The overall strategy consists of determining which objects should be garbage collected by tracing which objects are ''reachable'' by a chain of references from certain root objects, and considering the rest as garbage and collecting them. However, there are a large number of algorithms used in implementation, with widely varying complexity and performance characteristics.


Reference counting

Reference counting garbage collection is where each object has a count of the number of references to it. Garbage is identified by having a reference count of zero. An object's reference count is incremented when a reference to it is created and decremented when a reference is destroyed. When the count reaches zero, the object's memory is reclaimed. As with manual memory management, and unlike tracing garbage collection, reference counting guarantees that objects are destroyed as soon as their last reference is destroyed, and usually only accesses memory which is either in
CPU cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, whi ...
s, in objects to be freed, or directly pointed to by those, and thus tends to not have significant negative side effects on CPU cache and
virtual memory In computing, virtual memory, or virtual storage, is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a ver ...
operation. There are a number of disadvantages to reference counting; this can generally be solved or mitigated by more sophisticated algorithms: ; Cycles: If two or more objects refer to each other, they can create a cycle whereby neither will be collected as their mutual references never let their reference counts become zero. Some garbage collection systems using reference counting (like the one in
CPython CPython is the reference implementation of the Python programming language. Written in C and Python, CPython is the default and most widely used implementation of the Python language. CPython can be defined as both an interpreter and a comp ...
) use specific cycle-detecting algorithms to deal with this issue. Another strategy is to use weak references for the "backpointers" which create cycles. Under reference counting, a weak reference is similar to a weak reference under a tracing garbage collector. It is a special reference object whose existence does not increment the reference count of the referent object. Furthermore, a weak reference is safe in that when the referent object becomes garbage, any weak reference to it ''lapses'', rather than being permitted to remain dangling, meaning that it turns into a predictable value, such as a null reference. ; Space overhead (reference count): Reference counting requires space to be allocated for each object to store its reference count. The count may be stored adjacent to the object's memory or in a side table somewhere else, but in either case, every single reference-counted object requires additional storage for its reference count. Memory space with the size of an unsigned pointer is commonly used for this task, meaning that 32 or 64 bits of reference count storage must be allocated for each object. On some systems, it may be possible to mitigate this overhead by using a tagged pointer to store the reference count in unused areas of the object's memory. Often, an architecture does not actually allow programs to access the full range of memory addresses that could be stored in its native pointer size; a certain number of high bits in the address is either ignored or required to be zero. If an object reliably has a pointer at a certain location, the reference count can be stored in the unused bits of the pointer. For example, each object in
Objective-C Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style message passing (messaging) to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was ...
has a pointer to its
class Class, Classes, or The Class may refer to: Common uses not otherwise categorized * Class (biology), a taxonomic rank * Class (knowledge representation), a collection of individuals or objects * Class (philosophy), an analytical concept used d ...
at the beginning of its memory; on the ARM64 architecture using
iOS 7 iOS 7 is the seventh major release of the iOS mobile operating system developed by Apple Inc., being the successor to iOS 6. It was announced at the company's Worldwide Developers Conference on June 10, 2013, and was released on September 18 ...
, 19 unused bits of this class pointer are used to store the object's reference count. ; Speed overhead (increment/decrement): In naive implementations, each assignment of a reference and each reference falling out of scope often require modifications of one or more reference counters. However, in a common case when a reference is copied from an outer scope variable into an inner scope variable, such that the lifetime of the inner variable is bounded by the lifetime of the outer one, the reference incrementing can be eliminated. The outer variable "owns" the reference. In the programming language C++, this technique is readily implemented and demonstrated with the use of const references. Reference counting in C++ is usually implemented using " smart pointers" whose constructors, destructors, and assignment operators manage the references. A smart pointer can be passed by reference to a function, which avoids the need to copy-construct a new smart pointer (which would increase the reference count on entry into the function and decrease it on exit). Instead, the function receives a reference to the smart pointer which is produced inexpensively. The Deutsch-Bobrow method of reference counting capitalizes on the fact that most reference count updates are in fact generated by references stored in local variables. It ignores these references, only counting references in the heap, but before an object with reference count zero can be deleted, the system must verify with a scan of the stack and register that no other reference to it still exists. A further substantial decrease in the overhead on counter updates can be obtained by update coalescing introduced by Levanoni and Petrank. Consider a pointer that in a given interval of the execution is updated several times. It first points to an object O1, then to an object O2, and so forth until at the end of the interval it points to some object On. A reference counting algorithm would typically execute rc(O1)--, rc(O2)++, rc(O2)--, rc(O3)++, rc(O3)--, ..., rc(On)++. But most of these updates are redundant. In order to have the reference count properly evaluated at the end of the interval it is enough to perform rc(O1)-- and rc(On)++. Levanoni and Petrank measured an elimination of more than 99% of the counter updates in typical Java benchmarks. ; Requires atomicity: When used in a multithreaded environment, these modifications (increment and decrement) may need to be atomic operations such as compare-and-swap, at least for any objects which are shared, or potentially shared among multiple threads. Atomic operations are expensive on a multiprocessor, and even more expensive if they have to be emulated with software algorithms. It is possible to avoid this issue by adding per-thread or per-CPU reference counts and only accessing the global reference count when the local reference counts become or are no longer zero (or, alternatively, using a binary tree of reference counts, or even giving up deterministic destruction in exchange for not having a global reference count at all), but this adds significant memory overhead and thus tends to be only useful in special cases (it is used, for example, in the reference counting of Linux kernel modules). Update coalescing by Levanoni and Petrank can be used to eliminate all atomic operations from the write-barrier. Counters are never updated by the program threads in the course of program execution. They are only modified by the collector which executes as a single additional thread with no synchronization. This method can be used as a stop-the-world mechanism for parallel programs, and also with a concurrent reference counting collector. ; Not real-time: Naive implementations of reference counting do not generally provide real-time behavior, because any pointer assignment can potentially cause a number of objects bounded only by total allocated memory size to be recursively freed while the thread is unable to perform other work. It is possible to avoid this issue by delegating the freeing of unreferenced objects to other threads, at the cost of extra overhead.


Escape analysis

Escape analysis is a compile-time technique that can convert heap allocations to stack allocations, thereby reducing the amount of garbage collection to be done. This analysis determines whether an object allocated inside a function is accessible outside of it. If a function-local allocation is found to be accessible to another function or thread, the allocation is said to "escape" and cannot be done on the stack. Otherwise, the object may be allocated directly on the stack and released when the function returns, bypassing the heap and associated memory management costs.


Availability

Generally speaking, higher-level programming languages are more likely to have garbage collection as a standard feature. In some languages lacking built-in garbage collection, it can be added through a library, as with the Boehm garbage collector for C and C++. Most functional programming languages, such as ML,
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
, and APL, have garbage collection built in.
Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...
is especially notable as both the first functional programming language and the first language to introduce garbage collection. Other dynamic languages, such as
Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
and Julia (but not
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
 5 or PHP before version 5.3, which both use reference counting),
JavaScript JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior. Web browsers have ...
and
ECMAScript ECMAScript (; ES) is a standard for scripting languages, including JavaScript, JScript, and ActionScript. It is best known as a JavaScript standard intended to ensure the interoperability of web pages across different web browsers. It is stan ...
also tend to use GC.
Object-oriented programming Object-oriented programming (OOP) is a programming paradigm based on the concept of '' objects''. Objects can contain data (called fields, attributes or properties) and have actions they can perform (called procedures or methods and impl ...
languages such as
Smalltalk Smalltalk is a purely object oriented programming language (OOP) that was originally created in the 1970s for educational use, specifically for constructionist learning, but later found use in business. It was created at Xerox PARC by Learni ...
,
ooRexx Object REXX is a High-level programming language, high-level, General-purpose programming language, general-purpose, Interpreter (computing), interpreted, Object-oriented programming, object-oriented (Class-based programming, class-based) program ...
, RPL and
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
usually provide integrated garbage collection. Notable exceptions are C++ and
Delphi Delphi (; ), in legend previously called Pytho (Πυθώ), was an ancient sacred precinct and the seat of Pythia, the major oracle who was consulted about important decisions throughout the ancient Classical antiquity, classical world. The A ...
, which have destructors.


BASIC

BASIC Basic or BASIC may refer to: Science and technology * BASIC, a computer programming language * Basic (chemistry), having the properties of a base * Basic access authentication, in HTTP Entertainment * Basic (film), ''Basic'' (film), a 2003 film ...
and
Logo A logo (abbreviation of logotype; ) is a graphic mark, emblem, or symbol used to aid and promote public identification and recognition. It may be of an abstract or figurative design or include the text of the name that it represents, as in ...
have often used garbage collection for variable-length data types, such as strings and lists, so as not to burden programmers with memory management details. On the Altair 8800, programs with many string variables and little string space could cause long pauses due to garbage collection. Similarly the Applesoft BASIC interpreter's garbage collection algorithm repeatedly scans the string descriptors for the string having the highest address in order to compact it toward high memory, resulting in O(n^2) performance and pauses anywhere from a few seconds to a few minutes. A replacement garbage collector for Applesoft BASIC by Randy Wigginton identifies a group of strings in every pass over the heap, reducing collection time dramatically. BASIC.SYSTEM, released with ProDOS in 1983, provides a windowing garbage collector for BASIC that is many times faster.


Objective-C

While the
Objective-C Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style message passing (messaging) to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was ...
traditionally had no garbage collection, with the release of OS X 10.5 in 2007
Apple An apple is a round, edible fruit produced by an apple tree (''Malus'' spp.). Fruit trees of the orchard or domestic apple (''Malus domestica''), the most widely grown in the genus, are agriculture, cultivated worldwide. The tree originated ...
introduced garbage collection for
Objective-C Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style message passing (messaging) to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was ...
 2.0, using an in-house developed runtime collector. However, with the 2012 release of OS X 10.8, garbage collection was deprecated in favor of
LLVM LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
's automatic reference counter (ARC) that was introduced with OS X 10.7. Furthermore, since May 2015 Apple even forbade the usage of garbage collection for new OS X applications in the
App Store An app store, also called an app marketplace or app catalog, is a type of digital distribution platform for computer software called applications, often in a mobile context. Apps provide a specific set of functions which, by definition, do not i ...
. For iOS, garbage collection has never been introduced due to problems in application responsivity and performance; instead, iOS uses ARC.


Limited environments

Garbage collection is rarely used on embedded or real-time systems because of the usual need for very tight control over the use of limited resources. However, garbage collectors compatible with many limited environments have been developed. The Microsoft .NET Micro Framework, .NET nanoFramework and Java Platform, Micro Edition are embedded software platforms that, like their larger cousins, include garbage collection.


Java

Garbage collectors available in
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
OpenJDK OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE). It is the result of an effort Sun Microsystems began in 2006, four years before the company was acquired by Oracle Corp ...
s virtual machine (JVM) include: * Serial * Parallel * CMS (Concurrent Mark Sweep) * G1 (Garbage-First) * ZGC (Z Garbage Collector) * Epsilon * Shenandoah * GenZGC (Generational ZGC) * GenShen (Generational Shenandoah) * IBM Metronome (only in
IBM International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...
OpenJDK) * SAP (only in SAP OpenJDK) * Azul C4 (Continuously Concurrent Compacting Collector) (only in Azul Systems OpenJDK)


Compile-time use

Compile-time garbage collection is a form of static analysis allowing memory to be reused and reclaimed based on invariants known during compilation. This form of garbage collection has been studied in the Mercury programming language, and it saw greater usage with the introduction of
LLVM LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
's automatic reference counter (ARC) into Apple's ecosystem (iOS and OS X) in 2011.


Real-time systems

Incremental, concurrent, and real-time garbage collectors have been developed, for example by Henry Baker and by Henry Lieberman. In Baker's algorithm, the allocation is done in either half of a single region of memory. When it becomes half full, a garbage collection is performed which moves the live objects into the other half and the remaining objects are implicitly deallocated. The running program (the 'mutator') has to check that any object it references is in the correct half, and if not move it across, while a background task is finding all of the objects. Generational garbage collection schemes are based on the empirical observation that most objects die young. In generational garbage collection, two or more allocation regions (generations) are kept, which are kept separate based on the object's age. New objects are created in the "young" generation that is regularly collected, and when a generation is full, the objects that are still referenced from older regions are copied into the next oldest generation. Occasionally a full scan is performed. Some high-level language computer architectures include hardware support for real-time garbage collection. Most implementations of real-time garbage collectors use tracing. Such real-time garbage collectors meet hard real-time constraints when used with a real-time operating system.


See also

* Destructor (computer programming) * Dynamic dead-code elimination * Smart pointer * Virtual memory compression


References


Further reading

* (511 pages) * (404 pages) * * *


External links


The Memory Management Reference

The Very Basics of Garbage Collection



TinyGC - an independent implementation of the BoehmGC API

Conservative Garbage Collection Implementation for C Language

MeixnerGC - an incremental mark and sweep garbage collector for C++ using smart pointers
{{Authority control Memory management * Solid-state computer storage