The BAM file format (Binary Alignment Map) stores the comprehensive raw data of genome sequencing. It consists of the lossless, compressed binary representation of Sequence Alignment Map (SAM) files.
Schema
BAM is the compressed binary representation of SAM (Sequence Alignment Map), a compact and indexable representation of nucleotide sequence alignments. The goal of indexing is to quickly retrieve the alignments that overlap a specific location without having to scan all of them. Before it can be indexed, a BAM file must be sorted by reference ID and then by leftmost coordinate.
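The sort order described above can be illustrated with a minimal Python sketch. The `Alignment` record and both helper functions are hypothetical names chosen for this example, and the linear scan only shows why coordinate sorting helps; it is not the index structure used by real BAM tools.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record (not the real BAM layout)."""
    ref_id: int  # index of the reference sequence
    pos: int     # 0-based leftmost coordinate
    end: int     # 0-based end coordinate, exclusive
    name: str    # read name


def coordinate_sort(records):
    """Sort by reference ID, then by leftmost coordinate."""
    return sorted(records, key=lambda r: (r.ref_id, r.pos))


def overlapping(sorted_records, ref_id, start, end):
    """Return records overlapping [start, end) on one reference.

    Because the input is coordinate-sorted, the scan can stop as soon as
    a record starts at or beyond the end of the query region.
    """
    hits = []
    for r in sorted_records:
        if r.ref_id < ref_id:
            continue
        if r.ref_id > ref_id or r.pos >= end:
            break
        if r.end > start:
            hits.append(r)
    return hits


records = [
    Alignment(1, 50, 100, "c"),
    Alignment(0, 200, 250, "b"),
    Alignment(0, 10, 60, "a"),
]
ordered = coordinate_sort(records)
print([r.name for r in overlapping(ordered, 0, 40, 210)])
```

A real index avoids even this partial scan by recording where in the compressed file each genomic region begins.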
BAM is stored in the compressed BGZF format.
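As a rough illustration of what BGZF looks like on disk, the sketch below builds and inspects BGZF-style blocks using only the Python standard library. It assumes the block layout described in the SAM format specification (a gzip member carrying a `BC` extra subfield that holds the block size minus one); `make_bgzf_block` and `bgzf_block_size` are hypothetical helper names, not part of any library.

```python
import gzip
import struct
import zlib


def make_bgzf_block(payload: bytes) -> bytes:
    """Wrap a small payload in a single BGZF-style gzip member."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw DEFLATE stream
    cdata = compressor.compress(payload) + compressor.flush()
    total = 12 + 6 + len(cdata) + 8  # header + extra field + data + trailer
    # gzip header: magic bytes, CM=8 (deflate), FLG=4 (extra field), XLEN=6
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    # 'B','C' subfield of length 2 holding the total block size minus 1
    extra = struct.pack("<BBHH", 66, 67, 2, total - 1)
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + extra + cdata + trailer


def bgzf_block_size(block: bytes) -> int:
    """Read the total block size from the 'BC' extra subfield."""
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
        "<BBBBIBBH", block, 0
    )
    if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
        raise ValueError("not a gzip member with an extra field")
    offset, end = 12, 12 + xlen
    while offset < end:
        si1, si2, slen = struct.unpack_from("<BBH", block, offset)
        if (si1, si2, slen) == (66, 67, 2):
            (bsize,) = struct.unpack_from("<H", block, offset + 4)
            return bsize + 1
        offset += 4 + slen
    raise ValueError("no BC subfield: not a BGZF block")


first = make_bgzf_block(b"BAM\x01")
second = make_bgzf_block(b"more data")
print(bgzf_block_size(first) == len(first))
# Concatenated blocks still form a valid multi-member gzip stream.
print(gzip.decompress(first + second))
```

Because every block records its own size, a reader can jump from block to block without decompressing the data in between, which is what makes random access into the file practical.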

The structure of a BAM file includes a header section and an alignments section:
* Header: The sample name, sample length, and alignment method are all included in this section. Alignments in the alignments section are linked to specific information in the header section.
* Alignments: The read name, read sequence, read quality, alignment information, and custom tags are all included in this section. The alignment information includes the chromosome, start coordinate, alignment quality, and match descriptor string.
** The alignments section includes the following tags:
*** Read Group (RG)
*** Barcode Tag (BC)
*** Single-end alignment quality (SM)
*** Paired-end alignment quality (AS)
*** Edit distance tag (NM)
*** Amplicon name tag (XN)
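In the text (SAM) form of a record, tags such as those listed above appear as `TAG:TYPE:VALUE` fields. The following sketch parses a few of them into a dictionary; the function names and all tag values are made up for illustration, and only the integer, float, and string type codes are handled.

```python
def parse_tag(field: str):
    """Split one SAM-style optional field of the form TAG:TYPE:VALUE."""
    tag, type_code, value = field.split(":", 2)
    if type_code == "i":   # integer
        return tag, int(value)
    if type_code == "f":   # floating point
        return tag, float(value)
    return tag, value      # other types (for example Z, a string) kept as text


def parse_tags(fields):
    """Parse a list of optional fields into a {tag: value} dictionary."""
    return dict(parse_tag(f) for f in fields)


# Hypothetical values for some of the tags listed above
example = ["RG:Z:sample1_lane1", "NM:i:2", "SM:i:37", "XN:Z:amplicon_7"]
tags = parse_tags(example)
print(tags["RG"], tags["NM"])
```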
The BAM format uses a 0-based coordinate system, whereas SAM uses a 1-based coordinate system. BAM can represent integer values in the range [−2^31, 2^32).
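The difference between the two coordinate systems amounts to an offset of one. A minimal sketch of the conversion, together with a check against the integer range stated above, might look like this (all names are hypothetical):

```python
BAM_INT_MIN = -2**31  # inclusive lower bound
BAM_INT_MAX = 2**32   # exclusive upper bound


def bam_to_sam_pos(pos0: int) -> int:
    """Convert a 0-based BAM position to a 1-based SAM position."""
    return pos0 + 1


def sam_to_bam_pos(pos1: int) -> int:
    """Convert a 1-based SAM position to a 0-based BAM position."""
    return pos1 - 1


def fits_in_bam_integer(value: int) -> bool:
    """Check whether a value lies in the half-open range [-2^31, 2^32)."""
    return BAM_INT_MIN <= value < BAM_INT_MAX


# The first base of a reference is position 0 in BAM and position 1 in SAM.
print(bam_to_sam_pos(0), sam_to_bam_pos(1))
```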
See also
* FASTQ format
* SAM format
* SAMtools
* CRAM format
* List of file formats for molecular biology
* Compression of Genomic Sequencing Data
External links
SAM format specification