
In
molecular biology
, the TATA box (also called the Goldberg–Hogness box)
is a
sequence
of
DNA
found in the
core promoter region of
gene
s in
archaea
and
eukaryote
s.
The
bacteria
l
homolog
of the TATA box is called the
Pribnow box
which has a shorter
consensus sequence
.
The TATA box is considered a
non-coding DNA
sequence
(also known as a
cis-regulatory element
). It was termed the "TATA box" as it contains a consensus sequence characterized by repeating T and A
base pair
s.
How the term "box" originated is unclear. In the 1980s, researchers investigating nucleotide sequences in mouse genome loci found the Hogness box sequence and "boxed in" the motif at the -31 position.
When
consensus nucleotide
s and alternative ones were compared, homologous regions were "boxed" by the researchers.
The boxing in of sequences sheds light on the origin of the term "box".
The TATA box was first identified in 1978
as a component of eukaryotic promoters.
Transcription is initiated at the TATA box in TATA-containing genes. The TATA box is the binding site of the
TATA-binding protein (TBP) and other
transcription factor
s in some eukaryotic genes. Gene transcription by RNA polymerase II depends on the regulation of the core promoter by long-range regulatory elements such as enhancers and silencers.
Without proper regulation of transcription, eukaryotic organisms would not be able to properly respond to their environment.
Based on the sequence and mechanism of TATA box initiation,
mutation
s such as
insertions,
deletions, and
point mutation
s to this
consensus sequence
can result in
phenotypic
changes. These phenotypic changes can then turn into a
disease
phenotype. Some diseases associated with
mutation
s in the TATA box include
gastric cancer
,
spinocerebellar ataxia,
Huntington's disease
,
blindness
,
β-thalassemia,
immunosuppression
,
Gilbert's syndrome
, and
HIV-1
. The TATA-binding protein (TBP) could also be targeted by
virus
es as a means of viral transcription.
History
Discovery
The TATA box was the first eukaryotic core promoter motif to be identified, in 1978, by American biochemist
David Hogness, while he and his graduate student Michael Goldberg were on sabbatical at the
University of Basel
in Switzerland. They first discovered the TATA sequence while analyzing 5'
DNA
promoter sequences in ''
Drosophila
,''
mammal
ian, and
viral genes.
The TATA box was found in
protein
coding
gene
s transcribed by
RNA polymerase II
.
Evolutionary history
Most research on the TATA box has been conducted on yeast, human, and ''Drosophila'' genomes; however, similar elements have been found in
archaea
and ancient
eukaryote
s.
In archaeal species, the promoter contains an 8 bp AT-rich sequence located ~24 bp upstream of the transcription start site. This sequence was originally called Box A and is now known to be the sequence that interacts with the archaeal homologue of the
TATA-binding protein (TBP). Although some studies have uncovered several similarities, others have detected notable differences between archaeal and eukaryotic TBP. The archaeal protein exhibits greater symmetry in its primary sequence and in the distribution of
electrostatic
charge, which is important because the higher symmetry lowers the protein's ability to bind the TATA box in a polar manner.
Even though the TATA box is present in many eukaryotic promoters, it is not contained in the majority of promoters. One study found less than 30% of 1031 potential promoter regions contain a putative TATA box motif in humans. In ''Drosophila,'' less than 40% of 205 core promoters contain a TATA box.
When the TATA box is absent and TBP is not present, the
downstream promoter element
(DPE) in cooperation with the
initiator element
(Inr) binds to transcription factor II D (
TFIID), initiating transcription in TATA-less promoters. The DPE has been identified in three ''Drosophila'' TATA-less promoters and in the TATA-less human
IRF-1 promoter.
Features
Location
Promoter sequences vary between
bacteria
and
eukaryote
s. In eukaryotes, the TATA box is located 25
base pair
s
upstream of the start site that
Rpb4/Rpb7 use to initiate
transcription. In
metazoans, the TATA box is located 30 base pairs upstream of the transcription start site.
In the yeast ''
S. cerevisiae'', the TATA box has a variable position which can range from 40 to 100 bp upstream of the start site. The TATA box is also found in 40% of the
core promoters of genes that code for the
actin cytoskeleton and
contractile apparatus in cells.
The type of core promoter affects the level of transcription and expression of a
gene
.
TATA-binding protein (TBP) can be recruited in two ways: by SAGA, a cofactor for
RNA polymerase II
, or by
TFIID.
When
promoters use the SAGA/TATA box complex to recruit RNA polymerase II, they are more highly regulated and display higher expression levels than promoters using the TFIID/TBP mode of recruitment.
Analogous sequences
In bacteria, promoter regions may contain a
Pribnow box
, which serves an analogous purpose to the eukaryotic TATA box. The Pribnow box has a 6 bp region centered around the -10 position and an 8-12 bp sequence around the -35 region that are both conserved.
A
CAAT box (also CAT box) is a region of nucleotides with the following consensus sequence: 5'-GGCCAATCT-3'. The CAAT box is located about 75-80 bases upstream of the transcription initiation site and about 150 bases upstream of the TATA box. It binds
transcription factor
s (CAAT TF or CTFs) and thereby stabilizes the nearby
preinitiation complex for easier binding of
RNA polymerase
s. CAAT boxes are rarely found in genes that express proteins ubiquitous in all cell types.
Structure
Sequence and prevalence

The TATA box is a component of the eukaryotic
core promoter and generally contains the
consensus sequence
5'-TATA(A/T)A(A/T)-3'.
In yeast, for example, one study found that various ''
Saccharomyces
'' genomes had the consensus sequence 5'-TATA(A/T)A(A/T)(A/G)-3', yet only about 20% of yeast genes even contained the TATA sequence.
Similarly, in humans only 24% of genes have
promoter regions containing the TATA box.
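The consensus quoted above can be expressed as a simple pattern. The following is a minimal sketch, assuming that "(A/T)" means "either A or T at that position" and that an exact consensus match is an acceptable stand-in for a putative TATA box; the function name `find_tata_boxes` is hypothetical, and real promoter annotation typically relies on position weight matrices rather than exact matching.

```python
import re

# Assumption: 5'-TATA(A/T)A(A/T)-3' is read as TATA, then A or T, then A, then A or T.
# The lookahead form lets overlapping candidate sites be reported as well.
TATA_PATTERN = re.compile(r"(?=TATA[AT]A[AT])")


def find_tata_boxes(seq: str) -> list[int]:
    """Return the 0-based start index of every exact consensus match in seq."""
    return [m.start() for m in TATA_PATTERN.finditer(seq.upper())]


# The adenovirus model sequence cited later in this article.
print(find_tata_boxes("CGCTATAAAAGGGC"))  # [3]
```

Because only a minority of genes carry the motif, an empty result for a given promoter is the common case rather than an error.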
Genes containing the TATA-box tend to be involved in
stress-responses and certain types of
metabolism
and are more highly regulated when compared to TATA-less genes.
Generally, TATA-containing genes are not involved in essential cellular functions such as
cell growth
,
DNA replication
,
transcription, and
translation
because of their highly regulated nature.
The TATA box is usually located 25-35 base pairs upstream of the transcription start site. Genes containing the TATA box usually require additional promoter elements, including an
initiator site located just
upstream of the transcription start site and a
downstream core element (DCE).
These additional promoter regions work in conjunction with the TATA box to regulate initiation of transcription in eukaryotes.
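The positional constraint described above (a motif roughly 25-35 base pairs upstream of the transcription start site) can be checked in a few lines. This is an illustrative sketch only: the function name, the choice to measure from the first base of the motif, and the convention that the base immediately upstream of the start site is position -1 are all assumptions made for the example.

```python
import re

# Same reading of the consensus as 5'-TATA(A/T)A(A/T)-3'; lookahead allows overlaps.
TATA_PATTERN = re.compile(r"(?=TATA[AT]A[AT])")


def tata_offsets_upstream(seq, tss_index, min_up=25, max_up=35):
    """Return offsets (relative to the start site) of consensus matches in the window.

    tss_index is the 0-based index of the transcription start site in seq.
    An offset of -30 means the motif begins 30 bases upstream of that index.
    """
    offsets = []
    for m in TATA_PATTERN.finditer(seq.upper()):
        offset = m.start() - tss_index
        if -max_up <= offset <= -min_up:
            offsets.append(offset)
    return offsets


# Synthetic promoter: motif starts at index 10, start site placed at index 40.
promoter = "G" * 10 + "TATAAAA" + "G" * 23 + "A" + "C" * 5
print(tata_offsets_upstream(promoter, 40))           # [-30]
print(tata_offsets_upstream(promoter, 40, 40, 100))  # [] (yeast-like window)
```

The `min_up` and `max_up` parameters are exposed so that the wider 40-100 bp range reported for ''S. cerevisiae'' can be tried without changing the function.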
Function
Role in transcription initiation
The TATA-box is the site of
preinitiation complex formation, which is the first step in transcription initiation in eukaryotes. Formation of the preinitiation complex begins when the multi-subunit transcription factor II D (
TFIID) binds to the TATA box at its
TATA-binding protein (TBP) subunit.
TBP binds to the
minor groove of the TATA box via a region of antiparallel
β sheets in the protein.
Three types of molecular interactions contribute to
TBP binding to the TATA box:
# Four
phenylalanine
residues (Phe57, Phe74, Phe148, Phe165) on TBP bind to DNA and form kinks in the DNA, forcing the DNA minor groove open.
# Four
hydrogen bond
s form between polar side chains on TBP
amino acids (Asn27, Asn117, Thr82, Thr173) and
bases in the
minor groove.
# Numerous
hydrophobic
interactions (~15) form between TBP residues (notably Ile152 and Leu163) and DNA
bases, including
van der Waals forces.
Additionally, binding of
TBP is facilitated by stabilizing interactions with DNA flanking the TATA box, which consists of G-C rich sequences. These secondary interactions induce bending of the DNA and helical unwinding. The degree of DNA bending is species and sequence dependent. For example, one study used the adenovirus TATA promoter sequence (5'-CGCTATAAAAGGGC-3') as a model binding sequence and found that human TBP binding to the TATA box induced a 97° bend toward the
major groove while the yeast TBP protein only induced an 82° bend.
X-ray crystallography
studies of TBP/TATA-box complexes generally agree that the DNA goes through an ~80° bend during the process of TBP-binding.
The conformational changes induced by
TBP binding to the TATA box allow for additional
transcription factor
s and
RNA polymerase II
to bind to the
promoter region. TFIID first binds to the TATA box, facilitated by
TFIIA binding to the upstream part of the
TFIID complex.
TFIIB then binds to the TFIID-
TFIIA-DNA complex through interactions both upstream and downstream of the TATA box.
RNA polymerase II
is then recruited to this multi-protein complex with the help of
TFIIF.
Additional transcription factors then bind, first
TFIIE and then
TFIIH.
This completes the assembly of the
preinitiation complex for eukaryotic transcription.
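The stepwise assembly just described can be summarized as an ordered list. The sketch below is a deliberately simplified model, not a simulation of the biochemistry: it assumes that each factor can join only after all earlier ones are bound and that assembly stalls at the first missing factor. The names `PIC_ASSEMBLY_ORDER` and `assemble_pic` are hypothetical.

```python
# Order taken from the description above; the strict dependency between
# consecutive steps is a simplifying assumption of this sketch.
PIC_ASSEMBLY_ORDER = [
    "TFIID",                      # binds the TATA box through its TBP subunit
    "TFIIA",                      # facilitates TFIID binding
    "TFIIB",                      # contacts DNA on both sides of the TATA box
    "RNA polymerase II + TFIIF",  # polymerase recruited with the help of TFIIF
    "TFIIE",
    "TFIIH",
]


def assemble_pic(available):
    """Return (factors bound in order, whether the complex is complete)."""
    bound = []
    for factor in PIC_ASSEMBLY_ORDER:
        if factor not in available:
            break  # assembly stalls at the first missing factor
        bound.append(factor)
    return bound, len(bound) == len(PIC_ASSEMBLY_ORDER)


print(assemble_pic(set(PIC_ASSEMBLY_ORDER))[1])        # True
print(assemble_pic({"TFIID", "TFIIA", "TFIIB"})[0])    # ['TFIID', 'TFIIA', 'TFIIB']
```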
Generally, the TATA box is found at RNA polymerase II promoter regions, although some ''
in vitro
'' studies have demonstrated that
RNA polymerase III
can recognize TATA sequences.
This cluster of RNA polymerase II and various transcription factors is known as the basal transcriptional complex (BTC). In this state, it only gives a low level of transcription. Other factors must stimulate the BTC to increase transcription levels.
One such example of a BTC stimulating region of DNA is the
CAAT box. Additional factors, including the
Mediator complex, transcriptional regulatory proteins, and
nucleosome
-modifying
enzyme
s also enhance
transcription ''
in vivo
''.
Interactions
In specific cell types or on specific promoters TBP can be replaced by one of several TBP-related factors (TRF1 in
Drosophila
, TBPL1/TRF2 in
metazoans, TBPL2/TRF3 in
vertebrate
s), some of which interact with the TATA box similarly to
TBP. Interaction of TATA boxes with a variety of
activators or
repressor
s can influence the
transcription of
gene
s in many ways.
Enhancers
are long-range regulatory elements that increase promoter activity while
silencers repress promoter activity.
Mutations
Mutation
s to the TATA box can range from a
deletion or
insertion to a
point mutation
with varying effects based on the gene that has been mutated. The
mutation
s change the binding of the
TATA-binding protein (TBP) for
transcription initiation. Thus, there is a resulting change in
phenotype
based on the gene that is not being
expressed (Figure 3).
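The three mutation classes named above can be illustrated on a short sequence. This is a hedged sketch: it uses an exact match to the consensus 5'-TATA(A/T)A(A/T)-3' as a crude proxy for whether TBP could still recognize the site, which is an assumption of the example rather than a claim about binding affinity. All function names are hypothetical.

```python
import re

CONSENSUS = re.compile(r"TATA[AT]A[AT]")


def point_mutation(seq, pos, base):
    """Replace the single base at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insertion(seq, pos, bases):
    """Insert bases immediately before 0-based position pos."""
    return seq[:pos] + bases + seq[pos:]


def deletion(seq, start, length):
    """Remove length bases beginning at 0-based position start."""
    return seq[:start] + seq[start + length:]


def has_tata(seq):
    """True if the sequence still contains an exact consensus match."""
    return CONSENSUS.search(seq) is not None


wild_type = "CGCTATAAAAGGGC"  # adenovirus model sequence cited in this article
print(has_tata(wild_type))                          # True
print(has_tata(point_mutation(wild_type, 4, "G")))  # False: TATA becomes TGTA
print(has_tata(deletion(wild_type, 3, 7)))          # False: motif removed
print(has_tata(insertion(wild_type, 10, "GG")))     # True: motif intact, spacing changed
```

The insertion case keeps the motif but lengthens the sequence downstream of it, which mirrors the start-site shift described for insertions between a TATA box and the transcription start site.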
Insertions or deletions
One of the first studies of TATA box
mutation
s looked at a sequence of DNA from ''
Agrobacterium tumefaciens
'' for the octopine type
cytokinin gene.
This specific gene has three TATA boxes. A
phenotype
change was only observed when all three TATA boxes were deleted. An
insertion of extra base pairs between the last TATA box and the transcription start site resulted in a shift in the start site, which in turn produced a phenotypic change. From this original
mutation
study, a change in transcription can be seen when there is no TATA box to promote transcription, but transcription of a gene will occur when there is an
insertion into the sequence, although the insertion may affect the nature of the resulting phenotype.
Mutation
s in
maize
Maize (; ''Zea mays''), also known as corn in North American English, is a tall stout grass that produces cereal grain. It was domesticated by indigenous peoples in southern Mexico about 9,000 years ago from wild teosinte. Native American ...
promoters affect the expression of the
promoter gene
In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protei ...
s in a plant-organ-specific manner.
A
duplication of the TATA box leads to a significant decrease in
enzymatic activity in the
scutellum and
root
In vascular plants, the roots are the plant organ, organs of a plant that are modified to provide anchorage for the plant and take in water and nutrients into the plant body, which allows plants to grow taller and faster. They are most often bel ...
s, leaving
pollen
Pollen is a powdery substance produced by most types of flowers of seed plants for the purpose of sexual reproduction. It consists of pollen grains (highly reduced Gametophyte#Heterospory, microgametophytes), which produce male gametes (sperm ...
enzymatic levels unaffected. A
deletion of the TATA box leads to a small decrease in
enzymatic activity in the
scutellum and
root
In vascular plants, the roots are the plant organ, organs of a plant that are modified to provide anchorage for the plant and take in water and nutrients into the plant body, which allows plants to grow taller and faster. They are most often bel ...
s, but a large decrease in
enzymatic levels in
pollen
Pollen is a powdery substance produced by most types of flowers of seed plants for the purpose of sexual reproduction. It consists of pollen grains (highly reduced Gametophyte#Heterospory, microgametophytes), which produce male gametes (sperm ...
.
Point mutations
Point mutations in the TATA box likewise produce varying phenotypic changes, depending on the gene affected. Studies also show that the position of the mutation within the TATA box sequence influences how strongly the binding of TBP is hindered. For example, a mutation from TATAAAA to CATAAAA does not necessarily hinder binding enough to change transcription; the neighboring sequences can determine whether a change occurs. However, in HeLa cells a mutation from TATAAAA to TATACAA leads to a 20-fold decrease in transcription. Diseases that can result from such insufficient transcription of specific genes include thalassemia, lung cancer, chronic hemolytic anemia, immunosuppression, hemophilia B Leyden, thrombophlebitis, and myocardial infarction.
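The two example mutations differ in where they fall within the TATA box. A short Python sketch can make the positions explicit; the function name `substitutions` and the choice of 1-based positions are illustrative assumptions, not part of the cited studies.

```python
from typing import List, Tuple

REFERENCE = "TATAAAA"  # reference TATA box sequence used in the examples above


def substitutions(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """List point substitutions as (1-based position, reference base, variant base)."""
    if len(reference) != len(variant):
        raise ValueError("point-mutation comparison needs sequences of equal length")
    return [
        (pos, ref, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Report where each example variant differs from the reference sequence.
for variant in ("CATAAAA", "TATACAA"):
    for pos, ref, var in substitutions(REFERENCE, variant):
        print(f"{variant}: position {pos}, {ref} -> {var}")
```

The comparison only locates the substitution (position 1 for CATAAAA, position 5 for TATACAA). It says nothing about the size of the effect on transcription, which depends on the gene and on the neighboring sequences.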
Savinkova et al. have written a simulation to predict the dissociation constant (''KD'') for a selected TATA box sequence and TBP. This can be used to directly predict the phenotypic traits resulting from a selected mutation, based on how tightly TBP binds to the TATA box.
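How a change in ''KD'' might translate into a change in TBP binding can be sketched with the standard single-site binding isotherm, occupancy = [TBP] / ([TBP] + KD). This is not the model of Savinkova et al.; it is a generic equilibrium approximation, the concentration and ''KD'' values below are invented, and the function names are hypothetical.

```python
def fractional_occupancy(tbp_conc: float, kd: float) -> float:
    """Fraction of TATA boxes bound by TBP at equilibrium (single-site model).

    `tbp_conc` and `kd` must be given in the same concentration units.
    """
    if tbp_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return tbp_conc / (tbp_conc + kd)


def occupancy_ratio(tbp_conc: float, kd_reference: float, kd_variant: float) -> float:
    """Occupancy of a variant TATA box relative to the reference sequence."""
    if tbp_conc <= 0:
        raise ValueError("a positive TBP concentration is needed to form a ratio")
    return fractional_occupancy(tbp_conc, kd_variant) / fractional_occupancy(tbp_conc, kd_reference)


# Invented numbers: a variant that binds TBP ten times more weakly.
TBP_NM = 1.0
KD_REFERENCE_NM = 1.0
KD_VARIANT_NM = 10.0

print(fractional_occupancy(TBP_NM, KD_REFERENCE_NM))
print(occupancy_ratio(TBP_NM, KD_REFERENCE_NM, KD_VARIANT_NM))
```

A higher ''KD'' means weaker binding and lower occupancy. Linking occupancy to a phenotype would require further, gene-specific assumptions that this sketch does not attempt.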
Diseases
Mutations in the TATA box region affect the binding of the TATA-binding protein (TBP) for transcription initiation, which may cause carriers to have a disease phenotype.
Gastric cancer is correlated with TATA box polymorphism. The TATA box has a binding site for the transcription factor of the PG2 gene. This gene produces PG2, whose serum level is used as a biomarker for tumours in gastric cancer. Longer TATA box sequences correlate with higher serum PG2 levels, which indicate gastric cancer conditions. Carriers with shorter TATA box sequences may produce lower levels of serum PG2.
Several neurodegenerative disorders are associated with TATA box mutations. Two disorders have been highlighted: spinocerebellar ataxia and Huntington's disease. In spinocerebellar ataxia, the disease phenotype is caused by expansion of the polyglutamine repeat in the TATA-binding protein (TBP). Polyglutamine-TBP accumulates in cells, as shown by protein aggregates in brain sections of patients, resulting in a loss of neuronal cells.
Blindness can be caused by excessive cataract formation when the TATA box is targeted by microRNAs that increase the expression of oxidative stress genes. MicroRNAs can target the 3'-untranslated region and bind to the TATA box to activate the transcription of oxidative stress-related genes.
SNPs in TATA boxes are associated with β-thalassemia, immunosuppression, and neurological disorders. SNPs destabilize the TBP/TATA complex, which significantly decreases the rate at which the TATA-binding protein (TBP) binds to the TATA box. This leads to lower levels of transcription, which affects the severity of the disease. So far, studies have demonstrated this interaction only in vitro, but the results may be comparable in vivo.
Gilbert's syndrome is correlated with UGT1A1 TATA box polymorphism. This poses a risk for developing jaundice in newborns.
MicroRNAs also play a role in the replication of viruses such as HIV-1. A novel HIV-1-encoded microRNA has been found to enhance production of the virus and to activate latent HIV-1 by targeting the TATA box region.
Clinical significance
Technology
Many of the studies so far have been performed in vitro, providing only a prediction of what may happen rather than a real-time representation of what is happening in cells. Studies published in 2016 demonstrated TATA-binding activity in vivo. Core promoter-specific mechanisms for transcription initiation by the canonical TBP/TFIID-dependent basal transcription machinery have recently been documented in vivo, showing activation by the SRF-dependent upstream activating sequence (UAS) of the human ACTB gene involved in TATA-binding.
Cancer therapy
Pharmaceutical companies have for years designed cancer therapy drugs that target DNA by traditional methods, and these drugs have proven successful. However, the toxicity of these drugs has pushed scientists to explore other DNA-related processes that could be targeted instead. In recent years, a collective effort has been made to find cancer-specific molecular targets, such as protein-DNA complexes, which include the TATA binding motif. Compounds that trap the protein-DNA intermediate can be toxic to the cell once the trapped complex encounters a DNA processing event. Examples of such compounds include topotecan and SN-38 (which target topoisomerase I) and doxorubicin and mitoxantrone (which target topoisomerase II).
Cisplatin is a compound that binds covalently to adjacent guanines in the major groove of DNA, distorting the DNA so that DNA-binding proteins can access the minor groove. This destabilizes the interaction between the TATA-binding protein (TBP) and the TATA box. As a result, TBP is immobilized on the DNA, which down-regulates transcription initiation.
Genetic engineering
TATA box modification
Evolutionary pressures have pushed plants to adapt to changing environmental conditions. In the history of Earth, the development of an aerobic atmosphere resulted in an iron deficiency in plants.
Compared to other members of the same species, ''Malus baccata'' var. ''xiaojinensis'' has a TATA box inserted in the promoter of the iron-regulated transporter 1 (IRT1) gene. As a result, promoter activity is enhanced, increasing TFIID activity and subsequently transcription initiation, which results in a more iron-efficient phenotype. With genetic engineering, a similar modification can be made in other plants, such as the model species tobacco and ''Arabidopsis thaliana''.
See also
* Pribnow box
* Initiator element
* Kozak consensus sequence