Compound File Binary Format (CFBF), also called Compound File, Compound Document format, or Composite Document File V2 (CDF), is a compound document file format for storing numerous files and streams within a single file on a disk. CFBF was developed by Microsoft and is an implementation of Microsoft COM Structured Storage.
Microsoft has opened the format for use by others, and it is now used in a variety of programs, from Microsoft Word and Microsoft Access to Business Objects. It also forms the basis of the Advanced Authoring Format (AAF) of the AMW Association (formerly AAF Association).
Overview
At its simplest, the Compound File Binary Format is a container, with little restriction on what can be stored within it.
A CFBF file's structure loosely resembles a FAT filesystem. The file is partitioned into ''Sectors'', which are chained together by a ''File Allocation Table'' (not to be confused with the file system of the same name) that records the chain of sectors belonging to each file. A ''Directory'' holds information about the contained files, including a Sector ID (SID) for the starting sector of each file's chain.
Structure
The CFBF file consists of a 512-byte header record followed by a number of sectors whose size is defined in the header. The literature defines sectors to be either 512 or 4096 bytes in length, although the format is potentially capable of supporting sector sizes from 128 bytes upwards in powers of 2 (128, 256, 512, 1024, etc.). The lower limit of 128 bytes is the minimum required to fit a single directory entry in a Directory Sector.
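The relationship between the header's sector-size field and file offsets can be sketched in C as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the sector size is assumed to be stored as a power-of-two exponent (a "sector shift"), and the offset calculation assumes the header region occupies exactly one sector-sized slot at the start of the file, which holds for the 512- and 4096-byte sizes the literature defines.

```c
#include <assert.h>
#include <stdint.h>

/* Sector size in bytes for a given power-of-two exponent
   (9 -> 512 bytes, 12 -> 4096 bytes). Hypothetical helper name. */
uint32_t cfb_sector_size(unsigned sector_shift)
{
    /* 2^7 = 128 bytes is the smallest size that fits one directory entry. */
    assert(sector_shift >= 7 && sector_shift < 32);
    return (uint32_t)1 << sector_shift;
}

/* Byte offset of sector number `sect` from the start of the file.
   Assumption: the header takes up one sector-sized slot, so sector 0
   begins one sector size into the file. */
uint64_t cfb_sector_offset(uint32_t sect, unsigned sector_shift)
{
    assert(sector_shift >= 9 && sector_shift < 32);
    return ((uint64_t)sect + 1) << sector_shift;
}
```

With 512-byte sectors, sector 0 starts at byte 512 and sector 3 at byte 2048; with 4096-byte sectors, sector 0 starts at byte 4096.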
There are several types of sector that may be present in a CFBF:
* File Allocation Table (FAT) Sector – contains chains of sector indices much as a FAT does in the FAT/FAT32 filesystems
* MiniFAT Sector – similar to the FAT but storing chains of mini-sectors within the Mini-Stream
* Double-Indirect FAT (DIFAT) Sector – contains chains of FAT sector indices
* Directory Sector – contains directory entries
* Stream Sector – contains arbitrary file data
* Range Lock Sector – contains the byte-range locking area of a large file
More detail is given below for the header and each sector type.
CFBF Header format
The CFBF Header occupies the first 512 bytes of the file and contains the information required to interpret the rest of the file. The C-style structure declaration below (extracted from the AAFA's Low-Level Container Specification) shows the members of the CFBF header and their purpose:
typedef unsigned long ULONG; // 4 Bytes
typedef unsigned short USHORT; // 2 Bytes
typedef short OFFSET; // 2 Bytes
typedef ULONG SECT; // 4 Bytes
typedef ULONG FSINDEX; // 4 Bytes
typedef USHORT FSOFFSET; // 2 Bytes
typedef USHORT WCHAR; // 2 Bytes
typedef ULONG DFSIGNATURE; // 4 Bytes
typedef unsigned char BYTE; // 1 Byte
typedef unsigned short WORD; // 2 Bytes
typedef unsigned long DWORD; // 4 Bytes
typedef ULONG SID; // 4 Bytes
typedef GUID CLSID; // 16 Bytes
struct StructuredStorageHeader ;
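As an illustration of how a reader might pull the main fields out of the 512-byte header, here is a minimal C sketch. It is not the specification's own declaration: the struct and field names are hypothetical, only a subset of fields is read, and the byte offsets reflect the published header layout as commonly documented, so they should be checked against the specification before use. Fields are decoded byte by byte as little-endian values so the code does not depend on the host's byte order or struct padding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for a subset of the header fields. */
struct cfb_header_info {
    uint16_t minor_version, major_version, byte_order;
    uint16_t sector_shift, mini_sector_shift;
    uint32_t num_fat_sectors, first_dir_sector, mini_stream_cutoff;
    uint32_t first_minifat_sector, num_minifat_sectors;
    uint32_t first_difat_sector, num_difat_sectors;
};

/* Little-endian readers. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the signature or byte-order mark is wrong. */
int cfb_parse_header(const unsigned char buf[512], struct cfb_header_info *out)
{
    static const unsigned char sig[8] =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    assert(buf != NULL && out != NULL);
    if (memcmp(buf, sig, sizeof sig) != 0)
        return -1;

    out->minor_version        = rd16(buf + 0x18);
    out->major_version        = rd16(buf + 0x1A);
    out->byte_order           = rd16(buf + 0x1C);
    if (out->byte_order != 0xFFFE)          /* little-endian marker */
        return -1;
    out->sector_shift         = rd16(buf + 0x1E);
    out->mini_sector_shift    = rd16(buf + 0x20);
    out->num_fat_sectors      = rd32(buf + 0x2C);
    out->first_dir_sector     = rd32(buf + 0x30);
    out->mini_stream_cutoff   = rd32(buf + 0x38);
    out->first_minifat_sector = rd32(buf + 0x3C);
    out->num_minifat_sectors  = rd32(buf + 0x40);
    out->first_difat_sector   = rd32(buf + 0x44);
    out->num_difat_sectors    = rd32(buf + 0x48);
    return 0;
}
```

A caller would read the first 512 bytes of the file into a buffer, pass it to `cfb_parse_header`, and reject the file if the function returns -1. The remaining 436 bytes of the header (from offset 0x4C) hold the first entries of the DIFAT and are not decoded here.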
File Allocation Table (FAT) Sectors
When taken together as a single stream, the collection of FAT sectors defines the status and linkage of every sector in the file. Each entry in the FAT is 4 bytes in length and contains the sector number of the next sector in a FAT chain or one of the following special values:
* FREESECT (0xFFFFFFFF) – denotes an unused sector
* ENDOFCHAIN (0xFFFFFFFE) – marks the last sector in a FAT chain
* FATSECT (0xFFFFFFFD) – marks a sector used to store part of the FAT
* DIFSECT (0xFFFFFFFC) – marks a sector used to store part of the DIFAT
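Following a FAT chain amounts to repeatedly looking up the current sector number in the FAT until ENDOFCHAIN is reached. The C sketch below assumes the FAT has already been loaded into memory as one array of 32-bit entries; the function name and its error convention are hypothetical. Because a damaged file could contain a looping chain, the walk stops after as many steps as there are FAT entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FREESECT   0xFFFFFFFFu  /* unused sector */
#define ENDOFCHAIN 0xFFFFFFFEu  /* last sector in a chain */
#define FATSECT    0xFFFFFFFDu  /* sector holds part of the FAT */
#define DIFSECT    0xFFFFFFFCu  /* sector holds part of the DIFAT */

/* Walks the chain that begins at sector `start`.
   Up to `max_out` sector numbers are written to `out` (which may be NULL).
   Returns the number of sectors in the chain, or -1 if the chain is
   corrupt: a link outside the FAT, a special value in mid-chain, or a loop. */
long cfb_follow_chain(const uint32_t *fat, size_t fat_len, uint32_t start,
                      uint32_t *out, size_t max_out)
{
    size_t n = 0;
    uint32_t sect = start;

    assert(fat != NULL);
    while (sect != ENDOFCHAIN) {
        if (sect >= fat_len)   /* out of range; for any realistic FAT length */
            return -1;         /* this also rejects FREESECT, FATSECT, DIFSECT */
        if (n >= fat_len)      /* more steps than entries: must be a loop */
            return -1;
        if (out != NULL && n < max_out)
            out[n] = sect;
        n++;
        sect = fat[sect];      /* each entry names the next sector */
    }
    return (long)n;
}
```

For a six-entry FAT `{FATSECT, 3, ENDOFCHAIN, 4, ENDOFCHAIN, FREESECT}`, the chain starting at sector 1 visits sectors 1, 3 and 4, and the chain starting at sector 2 consists of sector 2 alone.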
Range Lock Sector
The Range Lock Sector must exist in files greater than 2 GB in size, and must not exist in files smaller than 2 GB. The Range Lock Sector must contain the byte range 0x7FFFFF00 to 0x7FFFFFFF of the file. This area is reserved by Microsoft's COM implementation for storing byte-range locking information for concurrent access.
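The sector number that covers this byte range follows from the sector size. The C sketch below is an illustration under stated assumptions: the function name is hypothetical, and sector n is assumed to begin at byte (n + 1) × sector size, that is, the header occupies one sector-sized slot at the start of the file.

```c
#include <assert.h>
#include <stdint.h>

#define RANGE_LOCK_FIRST 0x7FFFFF00u  /* first reserved byte offset */
#define RANGE_LOCK_LAST  0x7FFFFFFFu  /* last reserved byte offset */

/* Number of the sector that contains the range-lock bytes, given the
   sector size as a power-of-two exponent (9 -> 512, 12 -> 4096).
   Assumption: sector n starts at byte offset (n + 1) << sector_shift. */
uint32_t cfb_range_lock_sector(unsigned sector_shift)
{
    /* The 256-byte range is 256-byte aligned, so it lies inside a single
       sector for every sector size accepted here. */
    assert(sector_shift >= 9 && sector_shift < 31);
    return (RANGE_LOCK_FIRST >> sector_shift) - 1;
}
```

Under these assumptions, a file with 4096-byte sectors has its range-lock bytes in sector 0x7FFFE, which spans file offsets 0x7FFFF000 to 0x7FFFFFFF.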
Glossary
* ''FAT'' – File Allocation Table, also known as: ''SAT'' – Sector Allocation Table
* ''DIFAT'' – Double-Indirect File Allocation Table
* ''FAT Chain'' – a group of FAT entries which indicate the sectors allocated to a Stream in the file
* ''Stream'' – a virtual file which occupies a number of sectors within the CFBF
* ''Sector'' – the unit of allocation within the CFBF, usually 512 or 4096 Bytes in length
See also
* Structured Storage
* Advanced Authoring Format (AAF)
* Cabinet (file format)
External links
* Microsoft Compound File Binary File Format, Version 3. Library of Congress, Digital Formats web site. https://www.loc.gov/preservation/digital/formats/fdd/fdd000380.shtml (accessed 6 July 2019)