Compressed Archive

In computing, an archive file stores the content of one or more files, possibly compressed, with associated metadata such as file name, directory structure, error detection and correction information, commentary, compressed data archives, storage, and sometimes encryption. An archive file is often used to facilitate portability, distribution and backup, and to reduce storage use.


Applications


Portability

As an archive file stores file system information, including file content and metadata, it can be used to move file system content across heterogeneous systems. For example, a directory tree can be sent via email, files whose names are unsupported on the target system can be renamed during extraction, and timestamps can be retained rather than lost during data transmission. Also, transferring a single archive file may be faster than transferring multiple files because of per-file overhead, and faster still if the archive is compressed.
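As a minimal sketch of the portability point, the following Python code packs a directory tree into a single compressed tar archive and restores it elsewhere using the standard `tarfile` module. The function names `pack_tree` and `unpack_tree` are illustrative, not part of any library, and the sketch assumes the archive comes from a trusted source.

```python
import os
import tarfile


def pack_tree(src_dir: str, archive_path: str) -> None:
    """Store a whole directory tree in one gzip-compressed tar archive."""
    root_name = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps stored paths relative to the tree's own root,
        # so the archive does not depend on where the tree lived.
        tar.add(src_dir, arcname=root_name)


def unpack_tree(archive_path: str, dest_dir: str) -> None:
    """Extract the archive; modification times recorded in the
    archive headers are applied to the extracted files."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Assumption: the archive is trusted. Untrusted archives need
        # path checks before extraction.
        tar.extractall(dest_dir)
```

Because the modification time travels inside the archive, a file restored with `unpack_tree` keeps the timestamp it had when `pack_tree` ran, regardless of how the archive itself was transmitted.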


Software distribution

Beyond archiving, archive files are often used for software distribution. When used in connection with a package manager, an archive must conform to a package format and is called a package. In particular, the format usually requires a manifest file. Examples include deb for Debian, JAR for Java, APK for Android, and self-extracting Windows Installer executables.
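To illustrate the role of a manifest, the sketch below reads the main section of a JAR-style manifest with Python's `zipfile` module, relying on the fact that a JAR is a ZIP archive with its manifest stored at `META-INF/MANIFEST.MF`. The helper names `read_manifest` and `build_demo_package` and the demo contents are assumptions made for this example; the parser is deliberately simplified and is not a full implementation of the manifest syntax.

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"  # manifest location inside a JAR


def read_manifest(archive) -> dict:
    """Return the 'Key: value' pairs from the manifest's main section."""
    with zipfile.ZipFile(archive) as zf:
        text = zf.read(MANIFEST_PATH).decode("utf-8")
    entries = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            # A leading space marks the continuation of the previous value.
            entries[last_key] += line[1:]
            continue
        key, _, value = line.partition(": ")
        entries[key] = value
        last_key = key
    return entries


def build_demo_package() -> io.BytesIO:
    """Build a tiny in-memory package (hypothetical contents) for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(MANIFEST_PATH,
                    "Manifest-Version: 1.0\r\nMain-Class: demo.Main\r\n\r\n")
        zf.writestr("demo/Main.class", b"\xca\xfe\xba\xbe")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_manifest(build_demo_package()))
```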


Features

Notable features supported for various archives include:
* Concatenate multiple files in a single file
* Store file metadata as data, including file name, timestamps, permissions, source storage, notes and description
* Compression
* Encryption
* Error detection via checksums
* Error correction code to fix errors
* Splitting a large file into multiple, smaller files
* File patches/updates (when recording changes since a previous archive)
* Self-extraction
* Self-installation
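As a rough illustration of the splitting feature listed above, this Python sketch cuts a byte string into fixed-size parts and joins them back. The function names are hypothetical, and real archivers typically add per-volume headers and numbering that this sketch omits.

```python
def split_bytes(data: bytes, part_size: int) -> list:
    """Cut data into consecutive parts of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def join_parts(parts) -> bytes:
    """Reassemble the original data from its parts, in order."""
    return b"".join(parts)
```

Joining the parts in their original order always reproduces the input, which is why split archives depend on every volume being present and correctly ordered.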


Error detection and recovery

Archive files often include parity checks and other checksums for error detection; for instance, zip files use a cyclic redundancy check (CRC). RAR archives may include additional error correction data (called recovery records). Archive files that do not natively support recovery records can use separate parchive (PAR) files, which allow additional error correction and recovery of missing files in a multi-file archive.
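The CRC check described for zip files can be sketched in Python with the standard `zipfile` and `zlib` modules. The function name `verify_zip` is an assumption for this example. Note that `zipfile` already verifies the CRC-32 while reading a member and raises `BadZipFile` on a mismatch, so the explicit comparison below only spells the check out.

```python
import zipfile
import zlib


def verify_zip(archive) -> list:
    """Return the names of members whose data fails the CRC-32 check."""
    bad = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            try:
                data = zf.read(info)
            except (zipfile.BadZipFile, zlib.error):
                # Raised when the stored CRC does not match the data,
                # or when the compressed stream itself is damaged.
                bad.append(info.filename)
                continue
            # info.CRC is the CRC-32 recorded in the archive for this member.
            if zlib.crc32(data) != info.CRC:
                bad.append(info.filename)
    return bad
```

An empty result means every member matched its recorded checksum. A CRC only detects damage; repairing it requires separate redundancy such as recovery records or PAR files.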


Format

The format of an archive file is its archive format. Some formats are well-defined and some have become conventions supported by multiple vendors and communities. As is common for all files, the format of an archive is generally indicated by file name extension and/or file header. Commonly used formats include zip, rar, 7z, and tar. Java introduced archive formats including jar ("j" for Java) and war ("w" for web) that store an entire runnable deployment, usually compressed.
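Identifying a format from the file header rather than the extension can be sketched as follows. The signature table covers only the four formats named above, and the function name `sniff_format` is hypothetical; a complete detector would handle more cases (for example, an empty zip archive starts with a different signature, and older tar variants carry no magic string).

```python
# (format name, byte offset of the signature, signature bytes)
SIGNATURES = [
    ("zip", 0, b"PK\x03\x04"),
    ("rar", 0, b"Rar!\x1a\x07"),
    ("7z", 0, b"7z\xbc\xaf\x27\x1c"),
    ("tar", 257, b"ustar"),  # POSIX tar stores its magic inside the first header block
]


def sniff_format(path: str):
    """Guess an archive format from its leading bytes, or return None."""
    with open(path, "rb") as fh:
        head = fh.read(512)  # one tar header block is enough for all checks
    for name, offset, signature in SIGNATURES:
        if head[offset:offset + len(signature)] == signature:
            return name
    return None
```

Checking the header guards against misnamed files, which is why many tools consult it even when an extension is present.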





