In computing, an archive file is a computer file that is composed of one or more files along with metadata. Archive files are used to collect multiple data files together into a single file for easier portability and storage, or simply to compress files to use less storage space. Archive files often store directory structures, error detection and correction information, and arbitrary comments, and some use built-in encryption.
Applications
Portability
Archive files are particularly useful in that they store file system data and metadata within the contents of a particular file, and thus can be stored on systems or sent over channels that do not support the file system in question, only file contents. Examples include sending a directory structure over email, storing files whose names are unsupported on the target file system because of their length or characters, and retaining files' date and time information.
Additionally, an archive facilitates transferring large numbers of small files, such as the resources of saved web pages, since a container file is transferred using a single file operation, whereas transferring many small files requires the computer to modify the file system structure for each file individually, making it considerably slower.
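As a minimal sketch of the point above, the following Python example uses the standard-library `tarfile` module to pack a small directory tree, including modification times, into a single byte string and then read it back. The helper names `build_archive` and `list_archive` and the sample paths are illustrative assumptions, not part of any archive format.

```python
import io
import tarfile


def build_archive(files):
    """Pack {path: (content_bytes, mtime)} into an in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, (content, mtime) in files.items():
            info = tarfile.TarInfo(name=path)  # the path is stored as data
            info.size = len(content)
            info.mtime = mtime  # the timestamp travels inside the archive
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_archive(data):
    """Return [(path, mtime, content)] recovered from archive bytes."""
    entries = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as archive:
        for member in archive.getmembers():
            content = archive.extractfile(member).read()
            entries.append((member.name, member.mtime, content))
    return entries


# Hypothetical sample tree: nested paths and explicit timestamps.
sample = {
    "site/index.html": (b"<html></html>", 1_000_000_000),
    "site/img/logo.txt": (b"logo", 1_000_000_500),
}
blob = build_archive(sample)      # one byte string, easy to send or store
recovered = list_archive(blob)    # paths and timestamps come back intact
```

Because the paths and timestamps are ordinary bytes inside `blob`, they survive any channel that can carry file contents, whether or not the channel understands directories.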
Software distribution
Beyond archival purposes, archive files are frequently used for packaging software for distribution, as software contents are often naturally spread across several files; the archive is then known as a ''package''. While the archival file format is the same, there are additional conventions about contents, such as requiring a manifest file, and the resulting format is known as a package format. Examples include deb for Debian, JAR for Java, APK for Android, and self-extracting Windows Installer executables.
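The idea of a package format as "an archive plus conventions" can be sketched in Python. The example below invents a toy convention in which a ZIP archive must contain a `manifest.json` entry; this convention, the manifest fields, and the helper names are hypothetical and do not describe deb, JAR, APK, or any other real package format.

```python
import io
import json
import zipfile

MANIFEST_NAME = "manifest.json"  # hypothetical convention for this sketch


def build_package(name, version, files):
    """Write a ZIP archive whose first entry is a JSON manifest."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        manifest = {"name": name, "version": version, "files": sorted(files)}
        package.writestr(MANIFEST_NAME, json.dumps(manifest))
        for path, content in files.items():
            package.writestr(path, content)
    return buffer.getvalue()


def read_manifest(data):
    """Return the manifest, or raise ValueError if the archive has none."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        if MANIFEST_NAME not in package.namelist():
            raise ValueError("not a package: manifest missing")
        return json.loads(package.read(MANIFEST_NAME))


package_bytes = build_package(
    "demo", "1.0", {"bin/demo": b"#!/bin/sh\n", "README": b"hi"}
)
manifest = read_manifest(package_bytes)
```

Any ZIP tool can still open `package_bytes`; only the extra rule about the manifest distinguishes a package from a plain archive in this sketch.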
Features
Features supported by various kinds of archives include:
* converting metadata into data stored inside a file (e.g., file name, permissions, etc.)
* checksums to detect errors
* data compression
* file concatenation to store multiple files in a single file
* file patches / updates (when recording changes since a previous archive)
* encryption
* error correction code to fix errors
* splitting a large file into many equal sized files for storage or transmission
Some archive programs have self-extraction, self-installation, source volume and medium information, and package notes/description.
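The splitting feature listed above can be illustrated with a short Python sketch. It assumes the simplest possible scheme, cutting a byte string into fixed-size volumes with a shorter final volume; real archivers typically add per-volume headers, which are omitted here. The function names are hypothetical.

```python
def split_volumes(data, volume_size):
    """Cut data into consecutive chunks of volume_size bytes (last may be shorter)."""
    if volume_size <= 0:
        raise ValueError("volume_size must be positive")
    return [data[i:i + volume_size] for i in range(0, len(data), volume_size)]


def join_volumes(volumes):
    """Reassemble the original byte string from its volumes, in order."""
    return b"".join(volumes)


archive_bytes = bytes(range(256)) * 10        # 2560 bytes standing in for an archive
volumes = split_volumes(archive_bytes, 1000)  # three volumes: 1000, 1000, 560 bytes
rejoined = join_volumes(volumes)
```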
The file extension or the file header of an archive file indicates the file format used. Computer archive files are created by file archiver software, optical disc authoring software, and disk image software.
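To illustrate how a file header can identify the format independently of the file extension, the Python sketch below compares the leading bytes of a file against a few well-known signatures. The table is deliberately short and the function name `sniff_format` is an assumption of this sketch; production tools check many more signatures and edge cases (for example, an empty ZIP archive starts with a different record).

```python
import io
import tarfile
import zipfile

# (offset, signature bytes, format name); a small, non-exhaustive table.
SIGNATURES = [
    (0, b"PK\x03\x04", "zip"),         # ZIP local file header
    (0, b"Rar!\x1a\x07", "rar"),       # common prefix of RAR signatures
    (0, b"7z\xbc\xaf\x27\x1c", "7z"),  # 7z signature
    (257, b"ustar", "tar"),            # POSIX tar magic inside the first header
]


def sniff_format(data):
    """Guess the archive format from header bytes; return None if unknown."""
    for offset, magic, name in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return None


# Build two small sample archives in memory to try the function on.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("hello.txt", b"hello")

tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as archive:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = 5
    archive.addfile(info, io.BytesIO(b"hello"))

zip_guess = sniff_format(zip_buffer.getvalue())
tar_guess = sniff_format(tar_buffer.getvalue())
```

Checking the header is more robust than trusting the extension, since a file can be renamed without its contents changing.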
Archive formats
An archive format is the file format of an archive file. Some formats are well-defined by their authors and have become conventions supported by multiple vendors and communities.
Types
* Archiving only formats store metadata and concatenate files.
* Compression only formats only compress files.
* Multi-function formats can store metadata, concatenate, compress, encrypt, create error detection and recovery information, and package the archive into self-extracting and self-expanding files.
* Software packaging formats are used to create software packages that may be self-installing files.
* Disk image formats are used to create disk images of mass storage volumes.
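The difference between the first two types in the list above can be sketched with Python's standard library, taking tar as an example of an archiving-only format and gzip as an example of a compression-only format. The helper name `tar_bytes` and the sample file contents are illustrative assumptions.

```python
import gzip
import io
import tarfile


def tar_bytes(files):
    """Concatenate {name: content} into an uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


files = {"a.txt": b"A" * 5000, "b.txt": b"B" * 5000}

archived = tar_bytes(files)           # archiving only: names + data, no size reduction
compressed = gzip.compress(archived)  # compression only: one byte stream in, one out
restored = gzip.decompress(compressed)

with tarfile.open(fileobj=io.BytesIO(restored), mode="r") as archive:
    names = archive.getnames()
```

Chaining the two steps, as the common `.tar.gz` pairing does, gives roughly what a multi-function format provides in a single step.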
Examples
Filename extensions used to distinguish different types of archives include zip, rar, 7z, and tar, the first of which is the most widely implemented.
Java also introduced a whole family of archive extensions such as jar and war (''j'' is for Java and ''w'' is for web). They are used to exchange entire byte-code deployments. Sometimes they are also used to exchange source code and other text, HTML and XML files. By default they are all compressed.
Error detection and recovery
Archive files often include parity checks and other checksums for error detection; for instance, zip files use a cyclic redundancy check (CRC). RAR archives may include additional error correction data (called recovery records). Archive files that do not natively support recovery records can use separate parchive (PAR) files that allow for additional error correction and recovery of missing files in a multi-file archive.
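The CRC-based error detection described above can be observed with Python's `zipfile` and `zlib` modules. This sketch stores one entry without compression (so that the payload bytes are easy to corrupt on purpose), confirms that the recorded CRC matches `zlib.crc32`, and then shows that `ZipFile.testzip()` reports the damaged entry. The entry name and payload are made up for the example.

```python
import io
import zipfile
import zlib

payload = b"archive payload"

# Write a one-entry ZIP archive; ZIP_STORED keeps the payload bytes verbatim.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
    archive.writestr("data.txt", payload)
good = buffer.getvalue()

with zipfile.ZipFile(io.BytesIO(good)) as archive:
    stored_crc = archive.getinfo("data.txt").CRC  # CRC-32 recorded in the archive
    intact_result = archive.testzip()             # None means every entry passed

# Simulate corruption: change one byte of the stored data, same length.
damaged = good.replace(b"archive payload", b"archive pAyload")

with zipfile.ZipFile(io.BytesIO(damaged)) as archive:
    damaged_result = archive.testzip()            # name of the first failing entry

crc_matches = stored_crc == zlib.crc32(payload)
```

A CRC only detects the damage; restoring the original bytes requires redundant data such as the recovery records or PAR files mentioned above.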
See also
* File archiver
* Disk image
* Digital container format, a similar concept in media files