In computing, an archive file is a computer file that is composed of one or more files along with metadata. Archive files are used to collect multiple data files together into a single file for easier portability and storage, or simply to compress files to use less storage space. Archive files often store directory structures, error detection and correction information, and arbitrary comments, and sometimes use built-in encryption.


Applications


Portability

Archive files are particularly useful in that they store file system data and metadata within the contents of a single file, and thus can be stored on systems or sent over channels that do not support the file system in question, only file contents. Examples include sending a directory structure over email, transferring files whose names are unsupported on the target file system due to length or characters, and retaining files' date and time information. Archives also facilitate transferring large numbers of small files, such as the resources of saved web pages: a container file is transferred using a single file operation, whereas transferring many small files requires the computer to modify the file system structure for each file individually, which is considerably slower.
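
As a minimal sketch of the idea above, the following Python example uses the standard `tarfile` module to fold a small directory structure, including each file's modification time, into one byte string and then recover it. The helper names `pack` and `unpack`, the sample paths, and the timestamps are assumptions made for illustration.

```python
import io
import tarfile

def pack(files):
    """Pack {path: (data, mtime)} into one tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, (data, mtime) in files.items():
            info = tarfile.TarInfo(name=path)  # path carries the directory structure
            info.size = len(data)
            info.mtime = mtime                 # date and time travel inside the archive
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack(blob):
    """Recover {path: (data, mtime)} from the bytes produced by pack()."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = (tar.extractfile(member).read(), member.mtime)
    return out

# Hypothetical saved web page: two small files in nested directories.
files = {
    "site/index.html": (b"<html></html>", 1_000_000_000),
    "site/img/logo.png": (b"\x89PNG", 1_000_000_500),
}
blob = pack(files)        # a single value that any byte-oriented channel can carry
restored = unpack(blob)
```

Because `blob` is plain file content, it can pass through a channel that knows nothing about directories or timestamps, and the receiving side still recovers both.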


Software distribution

Beyond archival purposes, archive files are frequently used for packaging software for distribution, as software contents are often naturally spread across several files; the archive is then known as a "package". While the archival file format is the same, there are additional conventions about contents, such as requiring a manifest file, and the resulting format is known as a package format. Examples include deb for Debian, JAR for Java, APK for Android, and self-extracting Windows Installer executables.
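
The relationship between an archive format and a package format can be illustrated with a small sketch. The Python example below layers a made-up convention on top of an ordinary zip archive: a `manifest.json` entry listing the package name, version, and files. This layout and the helpers `build_package` and `read_manifest` are hypothetical and do not correspond to deb, JAR, APK, or any other real package format.

```python
import io
import json
import zipfile

MANIFEST_NAME = "manifest.json"  # hypothetical convention for this sketch only

def build_package(name, version, payload):
    """Write payload files plus a manifest into an ordinary zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        manifest = {"name": name, "version": version, "files": sorted(payload)}
        zf.writestr(MANIFEST_NAME, json.dumps(manifest))
        for path, data in payload.items():
            zf.writestr(path, data)
    return buf.getvalue()

def read_manifest(blob):
    """Return the manifest, rejecting archives that break the convention."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        names = set(zf.namelist())
        if MANIFEST_NAME not in names:
            raise ValueError("not a package: manifest missing")
        manifest = json.loads(zf.read(MANIFEST_NAME))
        missing = set(manifest["files"]) - names
        if missing:
            raise ValueError(f"listed in manifest but absent: {sorted(missing)}")
        return manifest

pkg = build_package("demo", "1.0", {"bin/demo": b"#!/bin/sh\n", "share/readme.txt": b"hi"})
manifest = read_manifest(pkg)
```

Any zip tool can still open `pkg`; only the extra rule about the manifest distinguishes it from a plain archive.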


Features

Features supported by various kinds of archives include:
* converting metadata into data stored inside a file (e.g., file name, permissions, etc.)
* checksums to detect errors
* data compression
* file concatenation to store multiple files in a single file
* file patches / updates (when recording changes since a previous archive)
* encryption
* error correction code to fix errors
* splitting a large file into many equal-sized files for storage or transmission

Some archive programs have self-extraction, self-installation, source volume and medium information, and package notes/description.

The file extension or file header of the archive file are indicators of the file format used. Computer archive files are created by file archiver software, optical disc authoring software, and disk image software.
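
To illustrate how a file header can indicate the format, here is a minimal Python sketch that compares the leading bytes of a file against a few well-known signatures. The table is deliberately short and not exhaustive, and the function name `sniff_format` and the sample helpers are assumptions introduced for this example.

```python
import gzip
import io
import tarfile
import zipfile

# (offset, magic bytes, format name); illustrative, not exhaustive.
SIGNATURES = [
    (0, b"PK\x03\x04", "zip"),         # zip local file header
    (0, b"PK\x05\x06", "zip"),         # empty zip (end-of-central-directory only)
    (0, b"\x1f\x8b", "gzip"),
    (0, b"7z\xbc\xaf\x27\x1c", "7z"),
    (0, b"Rar!\x1a\x07", "rar"),
    (257, b"ustar", "tar"),            # tar magic sits inside the first header block
]

def sniff_format(data: bytes):
    """Return a format name guessed from the header bytes, or None if unknown."""
    for offset, magic, name in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return None

def sample_zip() -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()

def sample_tar() -> bytes:
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = 5
        tar.addfile(info, io.BytesIO(b"hello"))
    return buf.getvalue()

detected = {
    "zip": sniff_format(sample_zip()),
    "gzip": sniff_format(gzip.compress(b"hello")),
    "tar": sniff_format(sample_tar()),
    "text": sniff_format(b"just some plain text"),
}
```

Checking the header is generally more reliable than trusting the extension, since a file can be renamed without its contents changing.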


Archive formats

An archive format is the file format of an archive file. Some formats are well-defined by their authors and have become conventions supported by multiple vendors and communities.


Types

* Archiving-only formats store metadata and concatenate files.
* Compression-only formats only compress files.
* Multi-function formats can store metadata, concatenate, compress, encrypt, create error detection and recovery information, and package the archive into self-extracting and self-expanding files.
* Software packaging formats are used to create software packages that may be self-installing files.
* Disk image formats are used to create disk images of mass storage volumes.
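
The difference between the first two types can be sketched with Python's standard library, taking tar as an example of an archiving-only format and gzip as an example of a compression-only format; layering gzip over tar then provides both functions. The file names and contents below are made up for illustration, and the helper `tar_bytes` is an assumption of this sketch.

```python
import gzip
import io
import tarfile

def tar_bytes(files, mode="w"):
    """Build a tar archive in memory; mode "w:gz" layers gzip compression on top."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Highly repetitive sample data so that compression has an obvious effect.
files = {"a.txt": b"hello " * 2000, "b.txt": b"world " * 2000}
raw_size = sum(len(data) for data in files.values())

archived = tar_bytes(files)                  # archiving only: headers plus concatenated data
compressed = gzip.compress(files["a.txt"])   # compression only: one data stream
both = tar_bytes(files, mode="w:gz")         # archiving and compression combined

with tarfile.open(fileobj=io.BytesIO(both), mode="r:gz") as tar:
    names = tar.getnames()
```

Here `archived` is larger than the raw data because tar adds headers and padding without compressing, while `both` keeps the file names and is much smaller than the raw data.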


Examples

Filename extensions used to distinguish different types of archives include zip, rar, 7z, and tar, the first of which is the most widely implemented. Java also introduced a whole family of archive extensions such as jar and war ("j" is for Java and "w" is for web). They are used to exchange entire bytecode deployments, and sometimes also source code and other text, HTML, and XML files. By default they are all compressed.


Error detection and recovery

Archive files often include parity checks and other checksums for error detection; for instance, zip files use a cyclic redundancy check (CRC). RAR archives may include additional error correction data (called recovery records). Archive files that do not natively support recovery records can use separate parchive (PAR) files that allow for additional error correction and recovery of missing files in a multi-file archive.
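
The CRC check used by zip files can be observed with Python's standard `zipfile` and `zlib` modules. The sketch below stores one member uncompressed, confirms that the recorded CRC-32 matches the value computed over the data, then flips a single bit and lets `ZipFile.testzip()` report the damaged member. The member name and payload are made up for illustration. Note that this demonstrates detection only; recovery, as with RAR recovery records or PAR files, requires additional redundancy that is not shown here.

```python
import io
import zipfile
import zlib

payload = b"archive files carry checksums"

# Store the member uncompressed so its bytes appear verbatim in the archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("data.bin", payload)
blob = buf.getvalue()

with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    stored_crc = zf.getinfo("data.bin").CRC  # CRC-32 recorded in the archive
    intact_result = zf.testzip()             # None when every member passes

# Simulate storage or transmission damage: flip one bit of the member's data.
damaged = bytearray(blob)
damaged[blob.index(payload)] ^= 0x01
with zipfile.ZipFile(io.BytesIO(bytes(damaged))) as zf:
    damaged_result = zf.testzip()            # name of the first member that fails its CRC
```

The checksum tells the reader that `data.bin` is corrupt but carries too little information to say which bit changed, which is why separate error correction data is needed to repair the file.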


See also

* File archiver
* Disk image
* Digital container format, a similar concept in media files



