In computer storage, fragmentation is a phenomenon in which storage space, whether main storage or secondary storage, is used inefficiently, reducing capacity or performance and often both. The exact consequences of fragmentation depend on the specific system of storage allocation in use and the particular form of fragmentation. In many cases, fragmentation leads to storage space being "wasted", and in that case the term also refers to the wasted space itself.


Basic principle

When a computer program requests blocks of memory from the computer system, the blocks are allocated in chunks. When the program is finished with a chunk, it can free it back to the system, making it available to be allocated again later, to the same or another program. The size of a chunk and the length of time a program holds it vary. During its lifespan, a computer program can request and free many chunks of memory. When a program starts, the free memory areas are long and contiguous. Over time and with use, the long contiguous regions become fragmented into smaller and smaller contiguous areas. Eventually, it may become impossible for the program to obtain large contiguous chunks of memory.
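This lifecycle can be sketched in C (a minimal illustration with hypothetical sizes; a modern allocator may satisfy the final request with fresh memory from the operating system, so the sketch shows the hole pattern rather than a guaranteed failure):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        enum { N = 1000, CHUNK = 4096 };
        static void *chunks[N];

        /* Early in the program's life, these requests can be carved
           from one long contiguous free region. */
        for (int i = 0; i < N; i++)
            chunks[i] = malloc(CHUNK);

        /* Free every other chunk: half the space is free again, but it
           is scattered in CHUNK-sized holes between live allocations. */
        for (int i = 0; i < N; i += 2) {
            free(chunks[i]);
            chunks[i] = NULL;
        }

        /* A single request for all the freed space now needs one
           contiguous region, which the scattered holes cannot provide. */
        void *big = malloc((size_t)N / 2 * CHUNK);
        printf("large request %s\n", big ? "satisfied" : "failed");

        free(big);
        for (int i = 1; i < N; i += 2)
            free(chunks[i]);
        return 0;
    }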


Types

There are three different but related forms of fragmentation: external fragmentation, internal fragmentation, and data fragmentation, which can be present in isolation or in conjunction. Fragmentation is often accepted in return for improvements in speed or simplicity. Analogous phenomena occur for other resources such as processors; see below.


Internal fragmentation

Memory paging creates internal fragmentation because an entire page frame will be allocated whether or not that much storage is needed. Due to the rules governing memory allocation, more computer memory is sometimes allocated than is needed. For example, memory can only be provided to programs in chunks (usually a multiple of 4 bytes), and as a result if a program requests perhaps 29 bytes, it will actually get a chunk of 32 bytes; the excess memory goes to waste. In this scenario, the unusable memory is contained within an allocated region. Likewise, in a scheme of fixed partitions, any process, no matter how small, occupies an entire partition. This waste is called internal fragmentation. Unlike other forms of fragmentation, internal fragmentation is difficult to reclaim; usually the best way to remove it is with a design change. For example, in dynamic memory allocation, memory pools drastically cut internal fragmentation by spreading the space overhead over a larger number of objects.
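The 29-byte example above can be expressed as a small C sketch (the 4-byte granularity is taken from the example; real allocators commonly round to 8- or 16-byte boundaries):

    #include <stddef.h>
    #include <stdio.h>

    /* Round a request up to the allocator's chunk granularity. */
    static size_t round_up(size_t request, size_t granularity) {
        return (request + granularity - 1) / granularity * granularity;
    }

    int main(void) {
        size_t requested = 29;                    /* bytes the program asked for */
        size_t granted = round_up(requested, 4);  /* bytes actually allocated: 32 */
        printf("requested %zu, granted %zu, internal fragmentation %zu bytes\n",
               requested, granted, granted - requested);
        return 0;
    }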


External fragmentation

External fragmentation arises when free memory is separated into small blocks and is interspersed with allocated memory. It is a weakness of certain storage allocation algorithms, which fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are individually too small to satisfy the demands of the application. The term "external" refers to the fact that the unusable storage is outside the allocated regions. For example, consider a situation in which a program allocates three contiguous blocks of memory and then frees the middle block. The memory allocator can use this free block for future allocations; however, it cannot do so if the memory to be allocated is larger than the free block. External fragmentation also occurs in file systems as many files of different sizes are created, change size, and are deleted. The effect is even worse if a file that is divided into many small pieces is deleted, because this leaves similarly small regions of free space.
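The three-block scenario can be sketched in C (adjacency of the three allocations is likely in a fresh heap but not guaranteed by malloc, so this illustrates the situation rather than forcing it):

    #include <stdlib.h>

    int main(void) {
        /* Three allocations that a fresh heap will typically place
           next to one another. */
        char *a = malloc(1024);
        char *b = malloc(1024);
        char *c = malloc(1024);

        /* Free the middle block: a 1 KiB hole now sits between two
           live allocations, outside any allocated region. */
        free(b);

        char *small = malloc(512);   /* can reuse part of the hole */
        char *large = malloc(4096);  /* too big for the hole, so it must
                                        be satisfied from elsewhere */

        free(a); free(c); free(small); free(large);
        return 0;
    }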


Data fragmentation

Data fragmentation occurs when a collection of data in memory is broken up into many pieces that are not close together. It is typically the result of attempting to insert a large object into storage that has already suffered external fragmentation. For example, files in a file system are usually managed in units called ''blocks'' or ''clusters''. When a file system is created, there is free space to store file blocks together contiguously, allowing rapid sequential file reads and writes. However, as files are added, removed, and changed in size, the free space becomes externally fragmented, leaving only small holes in which to place new data. When a new file is written, or when an existing file is extended, the operating system puts the new data in new non-contiguous data blocks to fit into the available holes. The new data blocks are necessarily scattered, slowing access due to seek time and rotational latency of the read/write head, and incurring additional overhead to manage additional locations. This is called file system fragmentation.

When writing a new file of a known size, if there are any empty holes that are larger than that file, the operating system can avoid data fragmentation by putting the file into any one of those holes. There is a variety of algorithms for selecting which of those potential holes to put the file into; each of them is a heuristic approximate solution to the bin packing problem (a sketch of two of them follows the list below). The "best fit" algorithm chooses the smallest hole that is big enough. The "worst fit" algorithm chooses the largest hole. The "first fit" algorithm chooses the first hole that is big enough. The "next fit" algorithm keeps track of where each file was written. The "next fit" algorithm is faster than "first fit", which is in turn faster than "best fit", which is the same speed as "worst fit".

Just as compaction can eliminate external fragmentation, data fragmentation can be eliminated by rearranging data storage so that related pieces are close together. For example, the primary job of a defragmentation tool is to rearrange blocks on disk so that the blocks of each file are contiguous. Most defragmenting utilities also attempt to reduce or eliminate free-space fragmentation. Some moving garbage collectors, utilities that perform automatic memory management, will also move related objects close together (this is called ''compacting'') to improve cache performance.

There are four kinds of systems that never experience data fragmentation: they always store every file contiguously. All four kinds have significant disadvantages compared to systems that allow at least some temporary data fragmentation:
# Simply write each file contiguously. If there isn't already enough contiguous free space to hold the file, the system immediately fails to store the file, even when there are many little bits of free space from deleted files that add up to more than enough to store it.
# If there isn't already enough contiguous free space to hold the file, use a copying collector to convert many little bits of free space into one contiguous free region big enough to hold the file. This takes far more time than breaking the file up into fragments and putting those fragments into the available free space.
# Write the file into any free block, using fixed-size-blocks storage. If a programmer picks a block size that is too small, the system immediately fails to store some files (those larger than the block size), even when there are many free blocks that add up to more than enough to store the file. If a programmer picks a block size that is too big, much space is wasted on internal fragmentation.
# Avoid dynamic allocation entirely, pre-storing (contiguous) space for all possible files that will be needed; for example, MultiFinder pre-allocated a chunk of RAM to each application as it was started, according to how much RAM that application's programmer claimed it would need.
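The first-fit and best-fit policies can be sketched in a few lines of C, assuming a hypothetical holes array holding the sizes of the free extents (a real file system would track its extents on disk):

    #include <stddef.h>
    #include <stdio.h>

    /* First fit: index of the first hole large enough, or -1. */
    int first_fit(const size_t *holes, int n, size_t need) {
        for (int i = 0; i < n; i++)
            if (holes[i] >= need)
                return i;
        return -1;
    }

    /* Best fit: index of the smallest hole that is still large enough,
       or -1. Every hole must be examined, hence its higher cost. */
    int best_fit(const size_t *holes, int n, size_t need) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (holes[i] >= need && (best < 0 || holes[i] < holes[best]))
                best = i;
        return best;
    }

    int main(void) {
        size_t holes[] = { 12, 50, 20, 70 };  /* free-extent sizes in blocks */
        printf("first fit: %d\n", first_fit(holes, 4, 18));  /* 1 (size 50) */
        printf("best fit:  %d\n", best_fit(holes, 4, 18));   /* 2 (size 20) */
        return 0;
    }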


Comparison

Compared to external fragmentation, overhead and internal fragmentation account for little loss in terms of wasted memory and reduced performance. The degree of external fragmentation is commonly defined as
: 1 - \frac{\text{largest block of free memory}}{\text{total free memory}}
Fragmentation of 0% means that all the free memory is in a single large block; fragmentation is 90% (for example) when 100 MB of free memory is present but the largest free block is only 10 MB.

External fragmentation tends to be less of a problem in file systems than in primary memory (RAM) storage systems, because programs usually require their RAM storage requests to be fulfilled with contiguous blocks, whereas file systems are typically designed to be able to use any collection of available blocks (fragments) to assemble a file that logically appears contiguous. Therefore, if a highly fragmented file or many small files are deleted from a full volume and a new file with size equal to the newly freed space is then created, the new file will simply reuse the same fragments that were freed by the deletion. If what was deleted was one file, the new file will be just as fragmented as that old file was, but in any case there will be no barrier to using all the (highly fragmented) free space to create the new file. In RAM, on the other hand, the storage systems used often cannot assemble a large block to meet a request from small noncontiguous free blocks, so the request cannot be fulfilled and the program cannot proceed to do whatever it needed that memory for (unless it can reissue the request as a number of smaller separate requests).
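The metric can be computed directly, as in this C sketch (the holes array of free-block sizes is hypothetical):

    #include <stddef.h>
    #include <stdio.h>

    /* Fragmentation metric from above: 1 - largest_free / total_free,
       as a percentage. Returns 0 when there is no free memory at all. */
    double fragmentation_pct(const size_t *holes, int n) {
        size_t total = 0, largest = 0;
        for (int i = 0; i < n; i++) {
            total += holes[i];
            if (holes[i] > largest)
                largest = holes[i];
        }
        return total ? 100.0 * (1.0 - (double)largest / (double)total) : 0.0;
    }

    int main(void) {
        /* 100 MB free split into ten 10 MB holes, as in the example above. */
        size_t holes[10];
        for (int i = 0; i < 10; i++)
            holes[i] = 10u * 1000 * 1000;
        printf("fragmentation: %.0f%%\n", fragmentation_pct(holes, 10));  /* 90% */
        return 0;
    }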


Problems


Storage failure

The most severe problem caused by fragmentation is making a process or system fail due to premature resource exhaustion: if a contiguous block must be stored and cannot be, failure occurs. Fragmentation causes this to happen even when there is enough of the resource in total, just not a ''contiguous'' amount. For example, if a computer has 4 GiB of memory and 2 GiB are free, but the memory is fragmented into an alternating sequence of 1 MiB used, 1 MiB free, then a request for 1 contiguous GiB of memory cannot be satisfied even though 2 GiB in total are free. In order to avoid this, the allocator may, instead of failing, trigger a defragmentation (or memory compaction) cycle or other resource reclamation, such as a major garbage collection cycle, in the hope that it will then be able to satisfy the request. This allows the process to proceed, but can severely impact performance.
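In outline, such a fallback might look like the following C sketch; try_alloc, collect_garbage, and compact are illustrative stand-ins for an allocator's internals, not a real API:

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative stubs standing in for an allocator's internals. */
    static void *try_alloc(size_t size) { return malloc(size); }
    static void collect_garbage(void) { /* reclaim unreachable blocks */ }
    static void compact(void) { /* slide live blocks together, merging holes */ }

    /* Instead of failing outright on a fragmented heap, retry the request
       after progressively more expensive reclamation passes. */
    static void *alloc_with_fallback(size_t size) {
        void *p = try_alloc(size);
        if (p) return p;

        collect_garbage();            /* cheaper: may free whole blocks */
        if ((p = try_alloc(size)))
            return p;

        compact();                    /* expensive: moves live memory */
        return try_alloc(size);       /* NULL here means true exhaustion */
    }

    int main(void) {
        void *p = alloc_with_fallback(1 << 20);
        printf("1 MiB request %s\n", p ? "satisfied" : "failed");
        free(p);
        return 0;
    }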


Performance degradation

Fragmentation causes performance degradation for a number of reasons. Most basically, fragmentation increases the work required to allocate and access a resource. For example, on a hard drive or tape drive, sequential data reads are very fast, but seeking to a different address is slow, so reading or writing a fragmented file requires numerous seeks and is thus much slower, in addition to causing greater wear on the device. Further, if a resource is not fragmented, allocation requests can simply be satisfied by returning a single block from the start of the free area. However, if it is fragmented, the request requires either searching for a large enough free block, which may take a long time, or fulfilling the request with several smaller blocks (if this is possible), which results in this allocation itself being fragmented and requires additional overhead to manage the several pieces.

A subtler problem is that fragmentation may prematurely exhaust a cache, causing thrashing, because caches hold blocks, not individual data. For example, suppose a program has a working set of 256 KiB and is running on a computer with a 256 KiB cache (say, an L2 instruction+data cache), so the entire working set fits in cache and thus executes quickly, at least in terms of cache hits. Suppose further that it has 64 translation lookaside buffer (TLB) entries, each for a 4 KiB page: each memory access requires a virtual-to-physical translation, which is fast if the page is in cache (here, the TLB). If the working set is unfragmented, it fits onto exactly 64 pages (the ''page'' working set is 64 pages), and all memory lookups can be served from cache. However, if the working set is fragmented, it will not fit into 64 pages, and execution will slow due to thrashing: pages will be repeatedly added to and removed from the TLB during operation. Thus cache sizing in system design must include margin to account for fragmentation (a short sketch of this arithmetic appears at the end of this section).

Memory fragmentation is one of the most severe problems faced by system managers. Over time, it leads to degradation of system performance. Eventually, memory fragmentation may lead to complete loss of (application-usable) free memory. Memory fragmentation is a kernel-level programming problem. In real-time computing applications, fragmentation levels can reach as high as 99%, and may lead to system crashes or other instabilities. This type of system crash can be difficult to avoid, as it is impossible to anticipate the critical rise in levels of memory fragmentation. However, while it may not be possible for a system to continue running all programs in the case of excessive memory fragmentation, a well-designed system should be able to recover from the critical fragmentation condition by moving some memory blocks used by the system itself in order to enable consolidation of free memory into fewer, larger blocks, or, in the worst case, by terminating some programs to free their memory and then defragmenting the resulting sum total of free memory. This will at least avoid a true crash in the sense of system failure and allow the system to continue running some programs, save program data, etc. It is also important to note that fragmentation is a phenomenon of system software design; different software will be susceptible to fragmentation to different degrees, and it is possible to design a system that will never be forced to shut down or kill processes as a result of memory fragmentation.
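To make the page-working-set arithmetic above concrete, here is a small C sketch; the figure of half-useful pages is hypothetical, chosen only to show the effect:

    #include <stdio.h>

    enum { PAGE = 4096, TLB_ENTRIES = 64 };

    /* Pages spanned by a working set when each touched page holds only
       useful_per_page bytes of the working set's data. */
    static unsigned pages_spanned(unsigned bytes, unsigned useful_per_page) {
        return (bytes + useful_per_page - 1) / useful_per_page;
    }

    int main(void) {
        unsigned ws = 256 * 1024;   /* 256 KiB working set, as above */

        /* Packed: every byte of every touched page belongs to the set,
           so the set fits exactly in the TLB's 64 entries. */
        printf("packed:     %u pages (TLB holds %u)\n",
               pages_spanned(ws, PAGE), TLB_ENTRIES);

        /* Fragmented: only half of each touched page is useful, so the
           same 256 KiB spans twice as many pages and exceeds TLB reach. */
        printf("fragmented: %u pages (TLB holds %u)\n",
               pages_spanned(ws, PAGE / 2), TLB_ENTRIES);
        return 0;
    }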


Analogous phenomena

While fragmentation is best known as a problem in memory allocation, analogous phenomena occur for other resources, notably processors. For example, in a system that uses time-sharing for preemptive multitasking, but that does not check whether a process is blocked, a process that executes for part of its time slice but then blocks and cannot proceed for the remainder of its time slice wastes time because of the resulting ''internal'' fragmentation of time slices. More fundamentally, time-sharing itself causes ''external'' fragmentation of processes by running them in fragmented time slices, rather than in a single unbroken run. The resulting cost of process switching and increased cache pressure from multiple processes using the same caches can result in degraded performance.

In concurrent systems, particularly distributed systems, when a group of processes must interact in order to progress, if the processes are scheduled at separate times or on separate machines (fragmented across time or machines), the time spent waiting for each other or in communicating with each other can severely degrade performance. Instead, performant systems require coscheduling of the group.

Some flash file systems have several different kinds of internal fragmentation involving "dead space" and "dark space" (Adrian Hunter, "A Brief Introduction to the Design of UBIFS", 2008, p. 8).


See also

* Defragmentation
* File system fragmentation
* Memory management
* Memory management (operating systems)
* Block (data storage)
* Data cluster




Sources

* http://www.edn.com/design/systems-design/4333346/Handling-memory-fragmentation
* http://www.sqlservercentral.com/articles/performance+tuning/performancemonitoringbyinternalfragmentationmeasur/2014/
* R. Alexander, G. Bensley, ''C++ Footprint and Performance Optimization'', Sams, first edition, pp. 128–129, ISBN 9780672319044