ZFS

ZFS (previously Zettabyte File System) is a file system with volume management capabilities. It began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris, including ZFS, were published under an open source license as OpenSolaris for around five years from 2005, before being placed under a closed source license when Oracle Corporation acquired Sun in 2009–2010. During 2005 to 2010, the open source version of ZFS was ported to Linux, Mac OS X (continued as MacZFS) and FreeBSD. In 2010, the illumos project forked a recent version of OpenSolaris, including ZFS, to continue its development as an open source project. In 2013, OpenZFS was founded to coordinate the development of open source ZFS. OpenZFS maintains and manages the core ZFS code, while organizations using ZFS maintain the specific code and validation processes required for ZFS to integrate within their systems. OpenZFS is widely used in Unix-like systems.


Overview

The management of stored data generally involves two aspects: the physical volume management of one or more block storage devices (such as hard drives and SD cards), including their organization into logical block devices as VDEVs (ZFS Virtual Devices) as seen by the operating system (often involving a volume manager, RAID controller, array manager, or suitable device driver); and the management of data and files that are stored on these logical block devices (a file system or other data storage).

Example: A RAID array of 2 hard drives and an SSD caching disk is controlled by Intel's RST system, part of the chipset and firmware built into a desktop computer. The Windows user sees this as a single volume, containing an NTFS-formatted drive of their data, and NTFS is not necessarily aware of the manipulations that may be required (such as reading from/writing to the cache drive or rebuilding the RAID array if a disk fails). The management of the individual devices and their presentation as a single device is distinct from the management of the files held on that apparent device.

ZFS is unusual because, unlike most other storage systems, it unifies both of these roles and ''acts as both the volume manager and the file system''. Therefore, it has complete knowledge of both the physical disks and volumes (including their status, condition, and logical arrangement into volumes), as well as of all the files stored on them. ZFS is designed to ensure (subject to sufficient data redundancy) that data stored on disks cannot be lost due to physical errors, misprocessing by the hardware or operating system, or bit rot events and data corruption that may happen over time. Its complete control of the storage system is used to ensure that every step, whether related to file management or disk management, is verified, confirmed, corrected if needed, and optimized, in a way that storage controller cards and separate volume and file systems cannot achieve.

ZFS also includes a mechanism for dataset- and pool-level snapshots and replication, including snapshot cloning, which is described by the FreeBSD documentation as one of its "most powerful features", with functionality that "even other file systems with snapshot functionality lack". Very large numbers of snapshots can be taken without degrading performance, allowing snapshots to be used prior to risky system operations and software changes, or an entire production ("live") file system to be fully snapshotted several times an hour in order to mitigate data loss due to user error or malicious activity. Snapshots can be rolled back "live", or previous file system states can be viewed, even on very large file systems, leading to savings in comparison to formal backup and restore processes. Snapshots can also be cloned to form new independent file systems. ZFS also has the ability to take a pool-level snapshot (known as a "checkpoint"), which allows rollback of operations that may affect the entire pool's structure or that add or remove entire datasets.
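
As a rough illustration of the difference between dataset-level snapshots and a pool-level checkpoint, the following OpenZFS commands could be used; the pool name "tank" and the snapshot name are hypothetical, and option support varies between ZFS versions:

  # Recursively snapshot every dataset in the pool before a risky change
  zfs snapshot -r tank@before-upgrade

  # Take a pool-level checkpoint, which also preserves pool structure
  # (vdev layout, dataset creation/destruction) for a later rewind
  zpool checkpoint tank

  # Discard the checkpoint once the change is known to be good
  zpool checkpoint -d tank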


History


2004–2010: Development at Sun Microsystems

In 1987, AT&T Corporation and Sun announced that they were collaborating on a project to merge the most popular Unix variants on the market at that time: Berkeley Software Distribution, UNIX System V, and Xenix. This became Unix System V Release 4 (SVR4). The project was released under the name Solaris, which became the successor to SunOS 4 (although SunOS 4.1.''x'' micro releases were retroactively named ''Solaris 1'').

ZFS was designed and implemented by a team at Sun led by Jeff Bonwick, Bill Moore, and Matthew Ahrens. It was announced on September 14, 2004, but development started in 2001. Source code for ZFS was integrated into the main trunk of Solaris development on October 31, 2005, and released for developers as part of build 27 of OpenSolaris on November 16, 2005. In June 2006, Sun announced that ZFS was included in the mainstream 6/06 update to Solaris 10.

Solaris was originally developed as proprietary software, but Sun Microsystems was an early commercial proponent of open source software, and in June 2005 released most of the Solaris codebase under the CDDL license and founded the OpenSolaris open-source project. In Solaris 10 6/06 ("U2"), Sun added the ZFS file system and frequently updated ZFS with new features during the next 5 years. ZFS was ported to Linux, Mac OS X (continued as MacZFS), and FreeBSD under this open source license.

The name at one point was said to stand for "Zettabyte File System", but by 2006 the name was no longer considered to be an abbreviation. A ZFS file system can store up to 256 quadrillion zettabytes (ZB).

In September 2007, NetApp sued Sun, claiming that ZFS infringed some of NetApp's patents on Write Anywhere File Layout. Sun counter-sued in October the same year, claiming the opposite. The lawsuits were ended in 2010 with an undisclosed settlement.


2010–current: Development at Oracle, OpenZFS

Ported versions of ZFS began to appear in 2005. After the Sun acquisition by Oracle in 2010, Oracle's version of ZFS became closed source, and the development of open-source versions proceeded independently, coordinated by OpenZFS from 2013.


Features


Summary

Examples of features specific to ZFS include:
* Designed for long-term storage of data, and indefinitely scaled datastore sizes with zero data loss, and high configurability.
* Hierarchical checksumming of all data and metadata, ensuring that the entire storage system can be verified on use, and confirmed to be correctly stored, or remedied if corrupt. Checksums are stored with a block's parent block, rather than with the block itself. This contrasts with many file systems where checksums (if held) are stored with the data, so that if the data is lost or corrupt, the checksum is also likely to be lost or incorrect.
* Can store a user-specified number of copies of data or metadata, or selected types of data, to improve the ability to recover from data corruption of important files and structures.
* Automatic rollback of recent changes to the file system and data, in some circumstances, in the event of an error or inconsistency.
* Automated and (usually) silent self-healing of data inconsistencies and write failure when detected, for all errors where the data is capable of reconstruction. Data can be reconstructed using all of the following: error detection and correction checksums stored in each block's parent block; multiple copies of data (including checksums) held on the disk; write intentions logged on the SLOG (ZIL) for writes that should have occurred but did not occur (after a power failure); parity data from RAID/RAID-Z disks and volumes; copies of data from mirrored disks and volumes.
* Native handling of standard RAID levels and additional ZFS RAID layouts ("RAID-Z"). The RAID-Z levels stripe data across only the disks required, for efficiency (many RAID systems stripe indiscriminately across all devices), and checksumming allows rebuilding of inconsistent or corrupted data to be minimized to those blocks with defects.
* Native handling of tiered storage and caching devices, which is usually a volume-related task. Because ZFS also understands the file system, it can use file-related knowledge to inform, integrate, and optimize its tiered storage handling, which a separate device cannot.
* Native handling of snapshots and backup/replication, which can be made efficient by integrating the volume and file handling. Relevant tools are provided at a low level and require external scripts and software for utilization.
* Native data compression and deduplication, although the latter is largely handled in RAM and is memory hungry.
* Efficient rebuilding of RAID arrays—a RAID controller often has to rebuild an entire disk, but ZFS can combine disk and file knowledge to limit any rebuilding to data which is actually missing or corrupt, greatly speeding up rebuilding.
* Unaffected by RAID hardware changes which affect many other systems. On many systems, if self-contained RAID hardware such as a RAID card fails, or the data is moved to another RAID system, the file system will lack information that was on the original RAID hardware, which is needed to manage data on the RAID array. This can lead to a total loss of data unless near-identical hardware can be acquired and used as a "stepping stone". Since ZFS manages RAID itself, a ZFS pool can be migrated to other hardware, or the operating system can be reinstalled, and the RAID-Z structures and data will be recognized and immediately accessible by ZFS again.
* Ability to identify data that would have been found in a cache but has been discarded recently instead; this allows ZFS to reassess its caching decisions in light of later use and facilitates very high cache-hit levels (ZFS cache hit rates are typically over 80%).
* Alternative caching strategies can be used for data that would otherwise cause delays in data handling. For example, synchronous writes which are capable of slowing down the storage system can be converted to asynchronous writes by being written to a fast separate caching device, known as the SLOG (sometimes called the ZIL—ZFS Intent Log).
* Highly tunable—many internal parameters can be configured for optimal functionality.
* Can be used for high availability clusters and computing, although not fully designed for this use.


Data integrity

One major feature that distinguishes ZFS from other file systems is that it is designed with a focus on data integrity by protecting the user's data on disk against silent data corruption caused by data degradation, power surges (voltage spikes), bugs in disk firmware, phantom writes (the previous write did not make it to disk), misdirected reads/writes (the disk accesses the wrong block), DMA parity errors between the array and server memory or from the driver (since the checksum validates data inside the array), driver errors (data winds up in the wrong buffer inside the kernel), accidental overwrites (such as swapping to a live file system), etc.

A 1999 study showed that none of the then-major and widespread filesystems (such as UFS, Ext, XFS, JFS, or NTFS), nor hardware RAID (which has some issues with data integrity), provided sufficient protection against data corruption problems. Initial research indicates that ZFS protects data better than earlier efforts. It is also faster than UFS and can be seen as its replacement.

Within ZFS, data integrity is achieved by using a Fletcher-based checksum or a SHA-256 hash throughout the file system tree. Each block of data is checksummed and the checksum value is then saved in the pointer to that block—rather than at the actual block itself. Next, the block pointer is checksummed, with the value being saved at ''its'' pointer. This checksumming continues all the way up the file system's data hierarchy to the root node, which is also checksummed, thus creating a Merkle tree. In-flight data corruption or phantom reads/writes (the data written/read checksums correctly but is actually wrong) are undetectable by most filesystems, as they store the checksum with the data. ZFS stores the checksum of each block in its parent block pointer so that the entire pool self-validates.

When a block is accessed, regardless of whether it is data or metadata, its checksum is calculated and compared with the stored checksum value of what it "should" be. If the checksums match, the data is passed up the programming stack to the process that asked for it; if the values do not match, then ZFS can heal the data if the storage pool provides data redundancy (such as with internal mirroring), assuming that the copy of data is undamaged and with matching checksums. It is optionally possible to provide additional in-pool redundancy by specifying copies=2 (or copies=3), which means that data will be stored twice (or three times) on the disk, effectively halving (or, for copies=3, reducing to one-third) the storage capacity of the disk. Additionally, some kinds of data used by ZFS to manage the pool are stored multiple times by default for safety, even with the default copies=1 setting.

If other copies of the damaged data exist or can be reconstructed from checksums and parity data, ZFS will use a copy of the data (or recreate it via a RAID recovery mechanism) and recalculate the checksum—ideally resulting in the reproduction of the originally expected value. If the data passes this integrity check, the system can then update all faulty copies with known-good data and redundancy will be restored. If there are no copies of the damaged data, ZFS puts the pool in a faulted state, preventing its future use and providing no documented ways to recover pool contents.

Consistency of data held in memory, such as cached data in the ARC, is not checked by default, as ZFS is expected to run on enterprise-quality hardware with error-correcting RAM. However, the capability to check in-memory data exists and can be enabled using "debug flags".
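
A sketch of how these integrity features are exposed administratively; the pool and dataset names are hypothetical, while the property names are those used by OpenZFS:

  # Use SHA-256 checksums instead of the default Fletcher-based algorithm
  zfs set checksum=sha256 tank/important

  # Keep two copies of every block in this dataset, in addition to any
  # pool-level redundancy (halves usable capacity for this dataset)
  zfs set copies=2 tank/important

  # Show per-device read, write, and checksum error counters
  zpool status -v tank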


RAID

For ZFS to be able to guarantee data integrity, it needs multiple copies of the data or parity information, usually spread across multiple disks. This is typically achieved by using either a RAID controller or so-called "soft" RAID (built into a file system).


Avoidance of hardware RAID controllers

While ZFS ''can'' work with hardware RAID devices, it will usually work more efficiently and with greater data protection if it has raw access to all storage devices. ZFS relies on the disk for an honest view to determine the moment data is confirmed as safely written, and it has numerous algorithms designed to optimize its use of caching, cache flushing, and disk handling. Disks connected to the system using a hardware, firmware, or other "soft" RAID, or any other controller that modifies the ZFS-to-disk I/O path, will affect ZFS performance and data integrity. If a third-party device performs caching or presents drives to ZFS as a single system without the low-level view ZFS relies upon, there is a much greater chance that the system will perform ''less'' optimally and that ZFS will be less able to prevent failures, slower to recover from failures, or more likely to lose data due to a write failure. For example, if a hardware RAID card is used, ZFS may not be able to determine the condition of disks, determine if the RAID array is degraded or rebuilding, detect all data corruption, place data optimally across the disks, make selective repairs, control how repairs are balanced with ongoing use, or make repairs that ZFS could usually undertake. The hardware RAID card will interfere with ZFS's algorithms.

RAID controllers also usually add controller-dependent data to the drives, which prevents software RAID from accessing the user data. In the case of a hardware RAID controller failure, it may be possible to read the data with another compatible controller, but this is not always possible, and a replacement may not be available. Alternate hardware RAID controllers may not understand the original manufacturer's custom data required to manage and restore an array.

Unlike most other systems, where RAID cards or similar hardware can offload resources and processing to enhance performance and reliability, with ZFS it is strongly recommended that these methods ''not'' be used, as they typically ''reduce'' the system's performance and reliability. If disks must be attached through a RAID or other controller, it is recommended to minimize the amount of processing done in the controller by using a plain HBA (host adapter) or a simple fanout card, or to configure the card in JBOD mode (i.e. turn off RAID and caching functions), to allow devices to be attached with minimal changes in the ZFS-to-disk I/O pathway. A RAID card in JBOD mode may still interfere if it has a cache or, depending upon its design, may detach drives that do not respond in time (as has been seen with many energy-efficient consumer-grade hard drives), and as such may require Time-Limited Error Recovery (TLER)/CCTL/ERC-enabled drives to prevent drive dropouts, so not all cards are suitable even with RAID functions disabled.


ZFS's approach: RAID-Z and mirroring

Instead of hardware RAID, ZFS employs "soft" RAID, offering ''RAID-Z'' (parity based, like RAID 5 and similar) and disk mirroring (similar to RAID 1). The schemes are highly flexible.

RAID-Z is a data/parity distribution scheme like RAID 5, but uses dynamic stripe width: every block is its own RAID stripe, regardless of blocksize, resulting in every RAID-Z write being a full-stripe write. This, when combined with the copy-on-write transactional semantics of ZFS, eliminates the write hole error. RAID-Z is also faster than traditional RAID 5 because it does not need to perform the usual read-modify-write sequence. As all stripes are of different sizes, RAID-Z reconstruction has to traverse the filesystem metadata to determine the actual RAID-Z geometry. This would be impossible if the filesystem and the RAID array were separate products, whereas it becomes feasible when there is an integrated view of the logical and physical structure of the data. Going through the metadata means that ZFS can validate every block against its 256-bit checksum as it goes, whereas traditional RAID products usually cannot do this.

In addition to handling whole-disk failures, RAID-Z can also detect and correct silent data corruption, offering "self-healing data": when reading a RAID-Z block, ZFS compares it against its checksum, and if the data disks did not return the right answer, ZFS reads the parity and then figures out which disk returned bad data. Then, it repairs the damaged data and returns good data to the requestor. RAID-Z and mirroring do not require any special hardware: they do not need NVRAM for reliability, and they do not need write buffering for good performance or data protection. With RAID-Z, ZFS provides fast, reliable storage using cheap, commodity disks.

There are five different RAID-Z modes: ''striping'' (similar to RAID 0, offers no redundancy), ''RAID-Z1'' (similar to RAID 5, allows one disk to fail), ''RAID-Z2'' (similar to RAID 6, allows two disks to fail), ''RAID-Z3'' (a RAID 7 configuration, allows three disks to fail), and mirroring (similar to RAID 1, allows all but one disk to fail).

The need for RAID-Z3 arose in the early 2000s as multi-terabyte capacity drives became more common. This increase in capacity—without a corresponding increase in throughput speeds—meant that rebuilding an array due to a failed drive could "easily take weeks or months" to complete. During this time, the older disks in the array will be stressed by the additional workload, which could result in data corruption or drive failure. By increasing parity, RAID-Z3 reduces the chance of data loss by simply increasing redundancy.


Resilvering and scrub (array syncing and integrity checking)

ZFS has no tool equivalent to fsck (the standard Unix and Linux data checking and repair tool for file systems). Instead, ZFS has a built-in scrub function which regularly examines all data and repairs silent corruption and other problems. Some differences are:
* fsck must be run on an offline filesystem, which means the filesystem must be unmounted and is not usable while being repaired, while scrub is designed to be used on a mounted, live filesystem, and does not need the ZFS filesystem to be taken offline.
* fsck usually only checks metadata (such as the journal log) but never checks the data itself. This means that, after an fsck, the data might still not match the original data as stored.
* fsck cannot always validate and repair data when checksums are stored with the data (often the case in many file systems), because the checksums may also be corrupted or unreadable. ZFS always stores checksums separately from the data they verify, improving reliability and the ability of scrub to repair the volume. ZFS also stores multiple copies of data—metadata, in particular, may have upwards of 4 or 6 copies (multiple copies per disk and multiple disk mirrors per volume), greatly improving the ability of scrub to detect and repair extensive damage to the volume, compared to fsck.
* scrub checks everything, including metadata and the data. The effect can be observed by comparing fsck to scrub times—sometimes a fsck on a large RAID completes in a few minutes, which means only the metadata was checked. Traversing all metadata and data on a large RAID takes many hours, which is exactly what scrub does.
* While fsck detects and tries to fix errors using available filesystem data, scrub relies on redundancy to recover from issues. While fsck offers to fix the file system with partial data loss, scrub puts it into a faulted state if there is no redundancy.
The official recommendation from Sun/Oracle is to scrub enterprise-level disks once a month, and cheaper commodity disks once a week: "You should also run a scrub prior to replacing devices or temporarily reducing a pool's redundancy to ensure that all devices are currently operational."
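
A scrub is started and monitored with ordinary pool commands, for example (the pool name is hypothetical):

  # Start a scrub of the live, mounted pool; it runs in the background
  zpool scrub tank

  # Report scrub progress, repaired bytes, and per-device error counters
  zpool status tank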


Capacity

ZFS is a 128-bit file system, so it can address 1.84 × 10^19 times more data than 64-bit systems such as Btrfs. The maximum limits of ZFS are designed to be so large that they should never be encountered in practice. For instance, fully populating a single zpool with 2^128 bits of data would require 3×10^24 TB hard disk drives.

Some theoretical limits in ZFS are:
* 16 exbibytes (2^64 bytes): maximum size of a single file
* 2^48: number of entries in any individual directory
* 16 exbibytes: maximum size of any attribute
* 2^56: number of attributes of a file (actually constrained to 2^48 for the number of files in a directory)
* 256 quadrillion zebibytes (2^128 bytes): maximum size of any zpool
* 2^64: number of devices in any zpool
* 2^64: number of file systems in a zpool
* 2^64: number of zpools in a system


Encryption

With Oracle Solaris, the encryption capability in ZFS is embedded into the I/O pipeline. During writes, a block may be compressed, encrypted, checksummed and then deduplicated, in that order. The policy for encryption is set at the dataset level when datasets (file systems or ZVOLs) are created. The wrapping keys provided by the user/administrator can be changed at any time without taking the file system offline. The default behaviour is for the wrapping key to be inherited by any child datasets. The data encryption keys are randomly generated at dataset creation time. Only descendant datasets (snapshots and clones) share data encryption keys. A command to switch to a new data encryption key for the clone or at any time is provided—this does not re-encrypt already existing data, instead utilising an encrypted master-key mechanism. The encryption feature is also fully integrated into OpenZFS 0.8.0, available for Debian and Ubuntu Linux distributions. There have been anecdotal end-user reports of failures when using ZFS native encryption; an exact cause has not been established.
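
With OpenZFS 0.8 and later, native encryption is enabled per dataset at creation time; a minimal sketch follows, in which the dataset name and key handling are illustrative only:

  # Create an encrypted dataset protected by a passphrase wrapping key
  zfs create -o encryption=on -o keyformat=passphrase tank/secure

  # Change the wrapping key later without re-encrypting data on disk
  zfs change-key tank/secure

  # Load keys and mount encrypted datasets after a reboot or pool import
  zfs load-key -a
  zfs mount -a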


Read/write efficiency

ZFS will automatically allocate data storage across all vdevs in a pool (and all devices in each vdev) in a way that generally maximises the performance of the pool. ZFS will also update its write strategy to take account of new disks added to a pool, when they are added. As a general rule, ZFS allocates writes across vdevs based on the free space in each vdev, so that vdevs which already hold proportionately less data are given more writes when new data is to be stored. This helps to ensure that, as the pool becomes more used, the situation does not develop that some vdevs become full, forcing writes to occur on a limited number of devices. It also means that when data is read (and reads are much more frequent than writes in most uses), different parts of the data can be read from as many disks as possible at the same time, giving much higher read performance. Therefore, as a general rule, pools and vdevs should be managed, and new storage added, so that the situation does not arise that some vdevs in a pool are almost full and others almost empty, as this will make the pool less efficient.

Free space in ZFS tends to become fragmented with usage. ZFS does not have a mechanism for defragmenting free space. There are anecdotal end-user reports of diminished performance when high free-space fragmentation is coupled with disk space over-utilization.
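
Per-vdev utilization, which drives this allocation behaviour, can be inspected and adjusted by growing the pool; for example (pool and device names are hypothetical):

  # Show capacity and free space broken down per vdev and per device
  zpool list -v tank

  # Add another mirror vdev; ZFS favours it for new writes until space
  # usage across the vdevs evens out
  zpool add tank mirror da6 da7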


Other features


Storage devices, spares, and quotas

Pools can have hot spares to compensate for failing disks. When mirroring, block devices can be grouped according to physical chassis, so that the filesystem can continue in the case of the failure of an entire chassis. Storage pool composition is not limited to similar devices, but can consist of ad-hoc, heterogeneous collections of devices, which ZFS seamlessly pools together, subsequently doling out space to datasets (file system instances or ZVOLs) as needed. Arbitrary storage device types can be added to existing pools to expand their size. The storage capacity of all vdevs is available to all of the file system instances in the zpool. A quota can be set to limit the amount of space a file system instance can occupy, and a reservation can be set to guarantee that space will be available to a file system instance.
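
These facilities are exposed as simple pool and dataset operations, for example (pool, dataset, and device names are placeholders):

  # Attach a hot spare that ZFS can use automatically if a disk fails
  zpool add tank spare da8

  # Cap the space a file system instance may occupy
  zfs set quota=100G tank/projects

  # Guarantee that space remains available to another file system
  zfs set reservation=50G tank/home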


Caching mechanisms: ARC, L2ARC, Transaction groups, ZIL, SLOG, Special VDEV

ZFS uses different layers of disk cache to speed up read and write operations. Ideally, all data should be stored in RAM, but that is usually too expensive. Therefore, data is automatically cached in a hierarchy to optimize performance versus cost; these are often called "hybrid storage pools". Frequently accessed data will be stored in RAM, and less frequently accessed data can be stored on slower media, such as solid-state drives (SSDs). Data that is not often accessed is not cached and left on the slow hard drives. If old data is suddenly read a lot, ZFS will automatically move it to SSDs or to RAM.

ZFS caching mechanisms include one each for reads and writes, and in each case, two levels of caching can exist, one in computer memory (RAM) and one on fast storage (usually solid-state drives (SSDs)), for a total of four caches. A number of other caches, cache divisions, and queues also exist within ZFS. For example, each VDEV has its own data cache, and the ARC cache is divided between data stored by the user and metadata used by ZFS, with control over the balance between these.
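
The on-disk halves of these caches are ordinary vdevs added to the pool; a sketch using hypothetical NVMe device names:

  # Add an L2ARC device (second-level read cache on fast storage)
  zpool add tank cache nvme0n1

  # Add a mirrored SLOG so synchronous writes land on fast, redundant media
  zpool add tank log mirror nvme1n1 nvme2n1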


Special VDEV Class

In OpenZFS 0.8 and later, it is possible to configure a Special VDEV class to preferentially store filesystem metadata, and optionally the Data Deduplication Table (DDT) and small filesystem blocks. This allows, for example, a Special VDEV to be created on fast solid-state storage to store the metadata, while the regular file data is stored on spinning disks. This speeds up metadata-intensive operations such as filesystem traversal, scrub, and resilver, without the expense of storing the entire filesystem on solid-state storage.
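
A special allocation class is added like any other vdev, and the size threshold for "small" file blocks is a per-dataset property; illustrative commands with assumed device and dataset names:

  # Mirrored special vdev on SSDs to hold metadata (and optionally the DDT)
  zpool add tank special mirror nvme0n1 nvme1n1

  # Also send file data blocks of 32K or smaller to the special vdev
  zfs set special_small_blocks=32K tank/mail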


Copy-on-write transactional model

ZFS uses a copy-on-write transactional object model. All block pointers within the filesystem contain a 256-bit checksum or 256-bit hash (currently a choice between Fletcher-2, Fletcher-4, or SHA-256) of the target block, which is verified when the block is read. Blocks containing active data are never overwritten in place; instead, a new block is allocated, modified data is written to it, then any metadata blocks referencing it are similarly read, reallocated, and written. To reduce the overhead of this process, multiple updates are grouped into transaction groups, and a ZIL (intent log) write cache is used when synchronous write semantics are required. The blocks are arranged in a tree, as are their checksums (see Merkle tree).


Snapshots and clones

An advantage of copy-on-write is that, when ZFS writes new data, the blocks containing the old data can be retained, allowing a snapshot version of the file system to be maintained. ZFS snapshots are consistent (they reflect the entire data as it existed at a single point in time), and can be created extremely quickly, since all the data composing the snapshot is already stored, with the entire storage pool often snapshotted several times per hour. They are also space efficient, since any unchanged data is shared among the file system and its snapshots. Snapshots are inherently read-only, ensuring they will not be modified after creation, although they should not be relied on as a sole means of backup. Entire snapshots can be restored, and also files and directories within snapshots.

Writeable snapshots ("clones") can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist. This is an implementation of the copy-on-write principle.
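
Typical snapshot operations look like the following (dataset and snapshot names are hypothetical):

  # Create a read-only, point-in-time snapshot of a dataset
  zfs snapshot tank/home@2024-01-01

  # Roll the live dataset back to that snapshot
  zfs rollback tank/home@2024-01-01

  # Create a writable clone that initially shares all blocks with the snapshot
  zfs clone tank/home@2024-01-01 tank/home-test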


Sending and receiving snapshots

ZFS file systems can be moved to other pools, including pools on remote hosts over the network, as the send command creates a stream representation of the file system's state. This stream can either describe the complete contents of the file system at a given snapshot, or it can be a delta between snapshots. Computing the delta stream is very efficient, and its size depends on the number of blocks changed between the snapshots. This provides an efficient strategy, e.g., for synchronizing offsite backups or high availability mirrors of a pool.
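
For example, a full stream and an incremental delta between two snapshots might be replicated to another host roughly as follows; the host, pool, and snapshot names are placeholders:

  # Send the full contents of a snapshot to a pool on a remote host
  zfs send tank/home@monday | ssh backuphost zfs receive backup/home

  # Later, send only the blocks changed between the two snapshots
  zfs send -i tank/home@monday tank/home@tuesday | ssh backuphost zfs receive backup/home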


Dynamic striping

Dynamic striping across all devices to maximize throughput means that as additional devices are added to the zpool, the stripe width automatically expands to include them; thus, all disks in a pool are used, which balances the write load across them.


Variable block sizes

ZFS uses variable-sized blocks, with 128 KB as the default size. Available features allow the administrator to tune the maximum block size which is used, as certain workloads do not perform well with large blocks. If data compression is enabled, variable block sizes are used. If a block can be compressed to fit into a smaller block size, the smaller size is used on the disk to use less storage and improve IO throughput (though at the cost of increased CPU use for the compression and decompression operations).
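
Block size and compression are ordinary dataset properties; for instance (dataset names and values are illustrative):

  # Cap the record (block) size for a database workload that does small I/O
  zfs set recordsize=16K tank/db

  # Enable transparent LZ4 compression; compressed blocks occupy smaller
  # on-disk block sizes
  zfs set compression=lz4 tank/db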


Lightweight filesystem creation

In ZFS, filesystem manipulation within a storage pool is easier than volume manipulation within a traditional filesystem; the time and effort required to create or expand a ZFS filesystem is closer to that of making a new directory than it is to volume manipulation in some other systems.
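
For example, creating a new file system is a single, near-instant command, and properties can be set in the same step (dataset names are hypothetical):

  # Create a new file system within the pool; no sizing or formatting step
  zfs create tank/home/alice

  # Properties such as quotas can be applied at creation time
  zfs create -o quota=20G tank/home/bob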


Adaptive endianness

Pools and their associated ZFS file systems can be moved between different platform architectures, including systems implementing different byte orders. The ZFS block pointer format stores filesystem metadata in an endian-adaptive way; individual metadata blocks are written with the native byte order of the system writing the block. When reading, if the stored endianness does not match the endianness of the system, the metadata is byte-swapped in memory. This does not affect the stored data; as is usual in POSIX systems, files appear to applications as simple arrays of bytes, so applications creating and reading data remain responsible for doing so in a way independent of the underlying system's endianness.


Deduplication

Data deduplication capabilities were added to the ZFS source repository at the end of October 2009, and relevant OpenSolaris ZFS development packages have been available since December 3, 2009 (build 128).

Effective use of deduplication may require large RAM capacity; recommendations range between 1 and 5 GB of RAM for every TB of storage. An accurate assessment of the memory required for deduplication is made by referring to the number of unique blocks in the pool, and the number of bytes on disk and in RAM ("core") required to store each record—these figures are reported by inbuilt commands such as zpool and zdb. Insufficient physical memory or lack of ZFS cache can result in virtual memory thrashing when using deduplication, which can cause performance to plummet, or result in complete memory starvation. Because deduplication occurs at write-time, it is also very CPU-intensive, and this can also significantly slow down a system.

Other storage vendors use modified versions of ZFS to achieve very high data compression ratios. Two examples in 2012 were GreenBytes and Tegile. In May 2014, Oracle bought GreenBytes for its ZFS deduplication and replication technology.

As described above, deduplication is usually ''not'' recommended due to its heavy resource requirements (especially RAM) and impact on performance (especially when writing), other than in specific circumstances where the system and data are well-suited to this space-saving technique.
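
Deduplication is enabled per dataset, and the built-in tools report the size and memory cost of the deduplication table; a sketch with hypothetical names:

  # Enable block-level deduplication for one dataset only
  zfs set dedup=on tank/vmimages

  # Report deduplication table statistics, including estimated in-core size
  zdb -D tank

  # The pool-wide dedup ratio is also shown in the pool summary
  zpool list tank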


Additional capabilities

* Explicit I/O priority with deadline scheduling.
* Claimed globally optimal I/O sorting and aggregation.
* Multiple independent prefetch streams with automatic length and stride detection.
* Parallel, constant-time directory operations.
* End-to-end checksumming, using a kind of "Data Integrity Field", allowing data corruption detection (and recovery if the pool has redundancy). A choice of 3 hashes can be used, optimized for speed (fletcher), standardization and security (SHA256), and salted hashes (Skein).
* Transparent filesystem compression. Supports LZJB, gzip, LZ4 and Zstd.
* Intelligent scrubbing and resilvering (resyncing).
* Load and space usage sharing among disks in the pool.
* Ditto blocks: configurable data replication per filesystem, with zero, one or two extra copies requested per write for user data, and with that same base number of copies plus one or two for metadata (according to metadata importance). If the pool has several devices, ZFS tries to replicate over different devices. Ditto blocks are primarily an additional protection against corrupted sectors, not against total disk failure.
* The ZFS design (copy-on-write + superblocks) is safe when using disks with write cache enabled, if they honor the write barriers. This feature provides safety and a performance boost compared with some other filesystems.
* On Solaris, when entire disks are added to a ZFS pool, ZFS automatically enables their write cache. This is not done when ZFS only manages discrete slices of the disk, since it does not know if other slices are managed by non-write-cache-safe filesystems, like UFS. The FreeBSD implementation can handle disk flushes for partitions thanks to its GEOM framework, and therefore does not suffer from this limitation.
* Per-user, per-group, per-project, and per-dataset quota limits (example commands for this and several other capabilities are sketched after this list).
* Filesystem encryption since Solaris 11 Express, and OpenZFS (ZoL) 0.8 (on some other systems ZFS can utilize encrypted disks for a similar effect; GELI on FreeBSD can be used this way to create fully encrypted ZFS storage).
* Pools can be imported in read-only mode.
* It is possible to recover data by rolling back entire transactions at the time of importing the zpool.
* Snapshots can be taken manually or automatically. The older versions of the stored data that they contain can be exposed as full read-only file systems. They can also be exposed as historic versions of files and folders when used with CIFS (also known as SMB, Samba or file shares); this is known as "Previous versions", "VSS shadow copies", or "File history" on Windows, or AFP and "Apple Time Machine" on Apple devices.
* Disks can be marked as 'spare'. A data pool can be set to automatically and transparently handle disk faults by activating a spare disk and beginning to resilver the data that was on the suspect disk onto it, when needed.
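The following commands sketch how several of the capabilities above are exercised in practice; all pool, dataset, and device names are hypothetical, and the availability of particular property values varies by platform and ZFS version.

  # Checksums, compression and ditto blocks are per-dataset properties.
  zfs set checksum=sha256 tank/data
  zfs set compression=zstd tank/data
  zfs set copies=2 tank/data          # extra "ditto" copies of user data

  # Quotas.
  zfs set quota=100G tank/home/alice

  # Native encryption (OpenZFS 0.8+ / Solaris 11 Express+), set at creation
  # time; this example prompts interactively for a passphrase.
  zfs create -o encryption=on -o keyformat=passphrase tank/secret

  # Scrubbing, snapshots, and read-only import.
  zpool scrub tank
  zfs snapshot tank/data@before-upgrade
  zpool import -o readonly=on tank

  # Mark a disk as a hot spare.
  zpool add tank spare /dev/da4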


Limitations

* As of Solaris 10 Update 11 and Solaris 11.2, it was neither possible to reduce the number of top-level vdevs in a pool (except hot spares, cache, and log devices) nor to otherwise reduce pool capacity. This functionality was said to be in development in 2007, and enhancements to allow removal of vdevs were later developed in OpenZFS (Chris Siebenman, "Information on future vdev removal", Univ Toronto blog, quoting an informal Twitter announcement by Alex Reece). Online shrinking by removing non-redundant top-level vdevs has been supported since Solaris 11.4, released in August 2018, and OpenZFS (ZoL) 0.8, released in May 2019.
* It was not possible to add a disk as a column to a RAID Z, RAID Z2 or RAID Z3 vdev. However, a new RAID Z vdev can be created instead and added to the zpool.
* Some traditional nested RAID configurations, such as RAID 51 (a mirror of RAID 5 groups), are not configurable in ZFS without third-party tools. Vdevs can only be composed of raw disks or files, not other vdevs, using the default ZFS management commands. However, a ZFS pool effectively creates a stripe (RAID 0) across its vdevs, so the equivalent of a RAID 50 or RAID 60 is common.
* Reconfiguring the number of devices in a top-level vdev requires copying data offline, destroying the pool, and recreating the pool with the new top-level vdev configuration. There are two exceptions: extra redundancy can be added to an existing mirror at any time, and if all top-level vdevs are mirrors with sufficient redundancy, the zpool split command can be used to remove a device from each top-level vdev in the pool, creating a second pool with identical data (sketched after this list).
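Two of the points above can be illustrated with hypothetical commands (pool tank, disks da4, da5 and da6); the split example assumes all top-level vdevs are mirrors.

  # A new RAID-Z vdev can be added alongside existing vdevs (the pool then
  # stripes across them), but disks cannot be added to an existing RAID-Z vdev.
  zpool add tank raidz /dev/da4 /dev/da5 /dev/da6

  # If every top-level vdev is a mirror, one device from each mirror can be
  # split off to form a second, independent pool with identical data.
  zpool split tank tank2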


Data recovery

ZFS does not ship with tools such as fsck, because the file system itself was designed to self-repair. So long as a storage pool had been built with sufficient attention to the design of storage and redundancy of data, basic tools like fsck were never required. However, if the pool was compromised because of poor hardware, inadequate design or redundancy, or unfortunate mishap, to the point that ZFS was unable to mount the pool, traditionally there were no other, more advanced tools which allowed an end-user to attempt partial salvage of the stored data from a badly corrupted pool. Modern ZFS has improved considerably on this situation over time, and continues to do so:
:* Removal or abrupt failure of caching devices no longer causes pool loss. (At worst, loss of the ZIL may lose very recent transactions, but the ZIL does not usually store more than a few seconds' worth of recent transactions. Loss of the L2ARC cache does not affect data.)
:* If the pool is unmountable, modern versions of ZFS will attempt to identify the most recent consistent point at which the pool can be recovered, at the cost of losing some of the most recent changes to the contents. Copy on write means that older versions of data, including top-level records and metadata, may still exist even though they are superseded, and if so, the pool can be wound back to a consistent state based on them. The older the data, the more likely it is that at least some blocks have been overwritten and that some data will be irrecoverable, so there is a limit at some point on the ability of the pool to be wound back.
:* Informally, tools exist to probe the reason why ZFS is unable to mount a pool, and to guide the user or a developer as to the manual changes required to force the pool to mount. These include using ''zdb'' (ZFS debug) to find a valid importable point in the pool, using dtrace or similar to identify the issue causing mount failure, or manually bypassing health checks that cause the mount process to abort and allowing the damaged pool to be mounted (typical commands are sketched after this list).
:* In addition, a range of significantly enhanced methods is gradually being rolled out within OpenZFS. These include:
::* Code refactoring, and more detailed diagnostic and debug information on mount failures, to simplify diagnosis and fixing of corrupt pool issues;
::* The ability to trust or distrust the stored pool configuration. This is particularly powerful, as it allows a pool to be mounted even when top-level vdevs are missing or faulty, when top-level data is suspect, and also to rewind ''beyond'' a pool configuration change if that change was connected to the problem. Once the corrupt pool is mounted, readable files can be copied for safety, and it may turn out that data can be rebuilt even for missing vdevs, by using copies stored elsewhere in the pool.
::* The ability to fix the situation where a disk needed in one pool was accidentally removed and added to a different pool, causing it to lose metadata related to the first pool, which becomes unreadable.
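A sketch of the kind of commands involved, assuming a hypothetical damaged pool named tank; the exact options and their behaviour differ between ZFS implementations and versions.

  # Import read-only so that no further writes can make the damage worse.
  zpool import -o readonly=on tank

  # Ask ZFS to rewind to an earlier consistent transaction group if the
  # current one is damaged; -n is a dry run that reports what would happen.
  zpool import -F -n tank
  zpool import -F tank

  # Inspect an exported or unimportable pool with the ZFS debugger, e.g. to
  # list its datasets and examine metadata without importing it normally.
  zdb -e -d tank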


OpenZFS and ZFS

Oracle Corporation
ceased the public development of both ZFS and OpenSolaris after the acquisition of Sun in 2010. Some developers forked the last public release of OpenSolaris as the Illumos project. Because of the significant advantages present in ZFS, it has been ported to several different platforms with different features and commands. To coordinate development efforts and avoid fragmentation,
OpenZFS
was founded in 2013. According to Matt Ahrens, one of the main architects of ZFS, over 50% of the original OpenSolaris ZFS code has been replaced in OpenZFS with community contributions as of 2019, making “Oracle ZFS” and “OpenZFS” politically and technologically incompatible.


Commercial and open source products

* 2008: Sun shipped a line of ZFS-based 7000-series storage appliances.
* 2013: Oracle shipped the ZS3 series of ZFS-based filers and seized first place in the SPC-2 benchmark with one of them.
* 2013: iXsystems ships ZFS-based NAS devices called FreeNAS (now named TrueNAS CORE) for SOHO, and TrueNAS for the enterprise.
* 2014: Netgear ships a line of ZFS-based NAS devices called ReadyDATA, designed to be used in the enterprise.
* 2015: rsync.net announces a cloud storage platform that allows customers to provision their own zpool and import and export data using zfs send and zfs receive (a sketch of this workflow follows the list).
* 2020: iXsystems begins development of ZFS-based hyperconverged software called TrueNAS SCALE for SOHO, and TrueNAS for the enterprise.
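A minimal sketch of the zfs send / zfs receive replication workflow referred to above; the hostname, pool, dataset, and snapshot names are hypothetical.

  # Take a snapshot and send it to a remote machine over SSH, where it is
  # received into another pool as a new dataset.
  zfs snapshot tank/data@backup-1
  zfs send tank/data@backup-1 | ssh user@backuphost zfs receive backuppool/data

  # Later snapshots can be sent incrementally, transferring only changed blocks.
  zfs snapshot tank/data@backup-2
  zfs send -i tank/data@backup-1 tank/data@backup-2 | ssh user@backuphost zfs receive backuppool/data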


Oracle Corporation, closed source, and forking (from 2010)

In January 2010,
Oracle Corporation
acquired Sun Microsystems, and quickly discontinued the OpenSolaris distribution and the open source development model. In August 2010, Oracle discontinued providing public updates to the source code of the Solaris OS/Networking repository, effectively turning Solaris 11 back into a
closed source proprietary operating system.