In computing, an archive file stores the content of one or more files, possibly compressed, with associated metadata such as file name, directory structure, error detection and correction information, commentary, and sometimes encryption. An archive file is often used to facilitate portability, distribution and backup, and to reduce storage use.
Applications
Portability
Because an archive file stores file system information, including file content and metadata, it can be used to move file system content across heterogeneous systems. For example, a directory tree can be sent via email, files whose names are unsupported on the target system can be renamed during extraction, and timestamps can be retained rather than lost during data transmission. Also, transferring a single archive file may be faster than processing multiple files because of per-file overhead, and faster still if the archive is compressed.
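The portability points above can be illustrated with a minimal sketch using Python's standard `tarfile` module. The helper names `pack_tree`, `unpack_tree` and `safe_name` are hypothetical, and the renaming rule (replacing characters that some file systems reject with `_`) is an assumption chosen for illustration, not a rule taken from any archive format.

```python
import io
import os
import tarfile


def pack_tree(root: str) -> bytes:
    """Pack the directory tree under `root` into one gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # arcname="." stores member paths relative to `root`.
        tar.add(root, arcname=".")
    return buf.getvalue()


def safe_name(name: str) -> str:
    """Hypothetical rule: replace characters that some file systems reject."""
    return "".join("_" if ch in '<>:"|?*' else ch for ch in name)


def unpack_tree(data: bytes, dest: str) -> None:
    """Extract regular files into `dest`, renaming parts and restoring timestamps."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # this sketch skips directories, links and devices
            parts = [safe_name(p) for p in member.name.split("/")
                     if p not in ("", ".", "..")]
            target = os.path.join(dest, *parts)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as out:
                out.write(src.read())
            # Restore the modification time recorded in the archive metadata.
            os.utime(target, (member.mtime, member.mtime))
```

Calling `unpack_tree(pack_tree(source_dir), target_dir)` round-trips a tree through a single byte string, which is the property that makes an archive convenient to attach to an email or copy between systems.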
Software distribution
Beyond archiving, archive files are often used for software distribution. When used in connection with a package manager, an archive must conform to a package format and is called a ''package''. In particular, the format usually requires a manifest file. Examples include deb for Debian, JAR for Java, APK for Android, and self-extracting Windows Installer executables.
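As a rough illustration of a package carrying a manifest, the sketch below builds a JAR-like zip archive in memory and reads its manifest back. A JAR keeps its manifest at `META-INF/MANIFEST.MF`; the helper names `build_package` and `read_manifest` are hypothetical, and the parser is deliberately simplified (it ignores the line-continuation and per-entry section rules of real JAR manifests).

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"


def build_package(files: dict, attributes: dict) -> bytes:
    """Write a manifest plus payload files into one zip-based package."""
    manifest = "".join(f"{key}: {value}\r\n" for key, value in attributes.items())
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(MANIFEST_PATH, manifest)
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


def read_manifest(package: bytes) -> dict:
    """Return the manifest's 'Key: value' lines as a dictionary (simplified)."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        text = zf.read(MANIFEST_PATH).decode("utf-8")
    attributes = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            attributes[key] = value
    return attributes
```

A package manager could use such metadata to decide what to install and how to start it without unpacking the payload first.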
Features
Notable features supported for various archives include:
* Concatenate multiple files in a single file
* Store file metadata as data, including file name, timestamps, permissions, source storage, notes and description
* Compression
* Encryption
* Error detection via checksums
* Error correction code to fix errors
* Splitting a large file into multiple, smaller files
* File patches/updates (when recording changes since a previous archive)
* Self-extraction
* Self-installation
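Of the features listed, splitting is the simplest to sketch. The example below cuts a byte string into fixed-size parts and joins them again; the function names and the in-memory representation are assumptions for illustration, since real archivers typically write numbered part files and add headers to each part.

```python
def split_bytes(data: bytes, part_size: int) -> list:
    """Split `data` into consecutive parts of at most `part_size` bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def join_parts(parts: list) -> bytes:
    """Reassemble the original data from its parts, in order."""
    return b"".join(parts)
```

For instance, `split_bytes(b"abcdefghij", 4)` yields three parts, the last one shorter than the others, and `join_parts` restores the original ten bytes.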
Error detection and recovery
Archive files often include parity checks and other checksums for error detection; for instance, zip files use a cyclic redundancy check (CRC). RAR archives may include additional error correction data (called recovery records). Archive files that do not natively support recovery records can use separate Parchive (PAR) files, which allow for additional error correction and recovery of missing files in a multi-file archive.
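The CRC check used by zip files can be observed with Python's standard `zipfile` and `zlib` modules. This is a minimal sketch: the helper names are hypothetical, the archive is stored uncompressed so that a payload byte can be damaged directly, and it demonstrates detection only, not the recovery that RAR recovery records or PAR files provide.

```python
import io
import zipfile
import zlib


def make_zip(name: str, payload: bytes) -> bytes:
    """Create a one-member zip archive, stored without compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def stored_crc(archive: bytes, name: str) -> int:
    """Return the CRC-32 recorded in the archive for member `name`."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.getinfo(name).CRC


def first_corrupt_member(archive: bytes):
    """Return the first member whose data fails its CRC check, or None."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.testzip()


def flip_payload_byte(archive: bytes, payload: bytes) -> bytes:
    """Simulate damage by inverting the first byte of the stored payload."""
    offset = archive.index(payload)
    damaged = bytearray(archive)
    damaged[offset] ^= 0xFF
    return bytes(damaged)
```

The recorded value matches `zlib.crc32(payload)` for an intact archive, while `first_corrupt_member` reports the member name once a payload byte has been altered.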
Format
The format of an archive file is its archive format. Some formats are well-defined and some have become conventions supported by multiple vendors and communities. As is common for all files, the format of an archive is generally indicated by file name extension and/or file header. Commonly used formats include zip, rar, 7z, and tar.
Java introduced archive formats including jar (''j'' for Java) and war (''w'' for web) that store an entire runnable deployment, usually compressed.
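Identifying a format from its file header, rather than from the file name extension, can be sketched by comparing the leading bytes against known signatures. The table below covers only the four formats named above, and `detect_format` is a hypothetical helper. Because jar and war files are zip-based, they would be reported as zip, and an empty zip archive (which begins with a different record) would not be recognized by this simplified check.

```python
# (offset, signature bytes, format name)
SIGNATURES = [
    (0, b"PK\x03\x04", "zip"),           # zip local file header
    (0, b"Rar!\x1a\x07", "rar"),         # RAR marker block
    (0, b"7z\xbc\xaf\x27\x1c", "7z"),    # 7z signature header
    (257, b"ustar", "tar"),              # ustar magic inside the first tar header
]


def detect_format(header: bytes):
    """Return the archive format suggested by `header`, or None if unknown.

    `header` should hold at least the first 262 bytes of the file so that
    the tar signature at offset 257 can be checked.
    """
    for offset, magic, name in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return name
    return None
```

A caller would typically read the first few hundred bytes of a file in binary mode and pass them to `detect_format`, falling back to the file name extension when the result is `None`.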