
In molecular biology, a CCAAT box (also sometimes abbreviated a CAAT box or CAT box) is a distinct pattern of nucleotides with the consensus sequence GGCCAATCT that occurs 60–100 bases upstream of the initial transcription site. The CAAT box signals the binding site for the RNA transcription factor, and is typically accompanied by a conserved consensus sequence. It is an invariant DNA sequence at about minus 70 base pairs from the origin of transcription in many eukaryotic promoters. Genes that have this element seem to require it for the gene to be transcribed in sufficient quantities. It is frequently absent from genes that encode proteins used in virtually all cells. This box, along with the GC box, is known for binding general transcription factors. Both of these consensus sequences belong to the regulatory promoter. Full gene expression occurs when transcription activator proteins bind to each module within the regulatory promoter. Activation of the CCAAT box requires the binding of specific proteins, known as CCAAT box binding proteins or CCAAT box binding factors.
A CCAAT box is a feature frequently found before eukaryote coding regions, but is not found in prokaryotes.
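As a rough illustration of what looking for this element in a sequence involves, the following minimal sketch scans a DNA string for the CCAAT pentamer on either strand. The function names and the example sequence are made up for illustration and do not come from any real promoter or library.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq: str, motif: str = "CCAAT"):
    """Return sorted (start, strand) pairs for every motif occurrence.

    start is a 0-based index on the strand that was passed in.
    strand is '+' when the motif itself is found and '-' when its
    reverse complement is found (the motif lies on the other strand).
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical toy sequence, not a real promoter.
example = "GGGCCAATCTAAAATTGGCC"
hits = find_motif_both_strands(example)
print(hits)
```

A plain substring search is enough here because the pentamer itself is fixed; the flanking positions, which vary, would need a degenerate pattern instead.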
Consensus sequence
In the direction of transcription of the template strand, the consensus sequence, or the calculated order of the most frequent residues, for the CAAT box was 3'-TG ATTGG (T/C)(T/C)(A/G)-5'. Parentheses denote that either base may be present, without specifying their relative frequencies. For example, "(T/C)" means that either thymine or cytosine is preferentially selected for at that position.
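The parenthesized notation maps naturally onto a regular-expression character class. The sketch below is one possible way to do that conversion, assuming the notation only ever uses single bases separated by slashes; `consensus_to_regex` is a hypothetical helper name. The string is handled in the order it is printed in the text, and the code makes no claim about strand orientation.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert notation such as 'TG ATTGG (T/C)(T/C)(A/G)' to a regex string."""
    compact = consensus.replace(" ", "")
    # Turn each '(X/Y)' group into the character class '[XY]'.
    return re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        compact,
    )


pattern = consensus_to_regex("TG ATTGG (T/C)(T/C)(A/G)")
print(pattern)

# A string matches only if every degenerate position holds an allowed base.
print(bool(re.fullmatch(pattern, "TGATTGGCTA")))
print(bool(re.fullmatch(pattern, "TGATTGGAAA")))
```

Because the notation carries no frequency information, the resulting pattern treats both alternatives at a position as equally acceptable.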
Within metazoa (the animal kingdom), the core binding factor (CBF)-DNA complex retains a high degree of conservation within the CCAAT binding motif, as well as in the sequences flanking this pentameric motif. The CCAAT motif in plants (spinach was used in one experiment) differs slightly from that of metazoa in that it is actually a CAAT binding motif: the promoter lacks one of the two C residues of the pentameric motif, and the artificial addition of the second C has no significant effect on binding activity. Some sequences lack the CAAT box completely. In addition, the surrounding nucleotides in plants do not match the consensus sequence above, determined by Bi ''et al.''
Core promoter
The CAAT box is what is known as a core promoter element. The core promoter, also known as the basal promoter or simply the promoter, is a region of DNA that initiates transcription of a particular gene. For the CAAT box in particular, this region is located about 60–100 bases upstream (towards the 5' end) of the initial transcription site of a eukaryote gene, and no less than 27 base pairs away from it. At this site, a complex of general transcription factors binds with RNA polymerase II prior to the initiation of transcription.
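The positional constraints can be expressed as a small check. This is only a sketch: the text does not say whether distance is measured from the start or the end of the motif, so measuring from the motif's first base is an assumption here, as are the function and key names.

```python
def describe_position(motif_start: int, tss: int) -> dict:
    """Summarize where a motif sits relative to a transcription start site.

    Both arguments are 0-based indices on the same strand, with the motif
    upstream of (before) the transcription start site.
    Assumption: distance is counted from the first base of the motif.
    """
    distance = tss - motif_start
    return {
        "distance": distance,
        # Typical window given in the text: about 60-100 bases upstream.
        "in_typical_window": 60 <= distance <= 100,
        # Lower bound given in the text: no less than 27 base pairs away.
        "beyond_minimum": distance >= 27,
    }


print(describe_position(motif_start=130, tss=200))
print(describe_position(motif_start=180, tss=200))
```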
It is essential to transcription that these core binding factors (also referred to as nuclear factor Y or NF-Y) are able to bind to the CCAAT motif. Experiments in many laboratories have shown that mutations to the CCAAT motif that cause a loss of CBF binding also decrease transcriptional activity in these promoters, suggesting that CBF-CCAAT complexes are essential for optimum transcriptional activity.
Binding
In an experiment done with core binding factor (CBF)-DNA complexes, researchers were able to determine the preferred sequences of the promoter in a region over and immediately adjacent to the CAAT box, and in two regions on either side of the CAAT box. By using a PCR-mediated random binding selection process, researchers were able to show that the sequence "3' - (T/C)G ATTGG (T/C)(T/C)(A/G) - 5'", comprising the ATTGG region (CCAAT in the complementary strand) and its immediately flanking residues, was preferentially selected on the coding strand (opposite of the template strand).
This was shown using an oligonucleotide sequence (R1) that contained 27 random nucleotides flanked by a defined 20-nucleotide sequence on each side. While no single nucleotide was selected in every clone on either side of the ATTGG motif (CCAAT in the complementary strand), several nucleotides were selected at particular positions with high frequency. Most notable in the sequence above was the G residue towards the 5' end of the ATTGG. The other residues listed were also notable, but at each of those positions the selection was split between two residues. The same experiment yielded the same sequence when using a different oligonucleotide (R2) that contained an ATTGG core flanked by 12 random nucleotides on the 5' side and 10 random nucleotides on the 3' side. The two sequences are very similar and were confirmed in multiple experiments. Sequences in which the ATTGG motif was flanked by two adenine residues (AA) on its 5' end and G(A/G) on its 3' end seem to have inhibited formation of the CBF-DNA complex, and such sequences occurred in only 1% of the promoter sequences.
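The step from a set of selected clones to a degenerate consensus can be sketched as a per-position count. The clone sequences below are invented toy data, not results from the experiment, and the 75% threshold for calling a single base is an arbitrary assumption; the sketch only shows the kind of tally that produces notation such as "(T/C)".

```python
from collections import Counter


def position_counts(sequences):
    """Count bases in each column of a set of equal-length, aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [Counter(column) for column in zip(*sequences)]


def consensus(sequences, threshold=0.75):
    """Build a consensus string from aligned sequences.

    A position is written as one base when that base reaches the threshold
    frequency; otherwise the two most common bases are written as '(X/Y)',
    in alphabetical order.
    """
    total = len(sequences)
    parts = []
    for counts in position_counts(sequences):
        ranked = counts.most_common(2)
        top_base, top_count = ranked[0]
        if len(ranked) == 1 or top_count / total >= threshold:
            parts.append(top_base)
        else:
            first, second = sorted(base for base, _ in ranked)
            parts.append(f"({first}/{second})")
    return "".join(parts)


# Hypothetical aligned clones sharing an ATTGG core (made-up data).
clones = [
    "TGATTGGCTA",
    "CGATTGGTTG",
    "TGATTGGTCA",
    "CGATTGGCCG",
]
print(consensus(clones))
```

Note that this sketch orders the alternatives alphabetically, so it prints "(C/T)" where the text writes "(T/C)"; the two mean the same thing.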
In another experiment, performed with the major late promoter (MLP) of adenoviruses from a variety of host species, researchers examined mutation of the CAAT box, whose CCAAT sequence is thought to play a pivotal role in the MLP of subgroup C human adenoviruses, as well as species with a deficient CAAT sequence. Transcription initiation at mutant MLP species was significantly reduced compared with that of the wild type or species in which there was a CAAT mutant. The failure to restore normal function in adenoviruses without a functional CAAT box is consistent with the idea that the CAAT box plays a vital role in the adenovirus MLP and is preferred over other transcriptional elements.
CCAAT in plants
These core binding factors, or nuclear factors (NF-Y), are composed of three subunits: NF-YA, NF-YB, and NF-YC. Whereas in animals each NF-Y subunit is encoded by a single gene, in plants the subunits have diversified in both structure and function. Plant NF-Y families consist of between eight and 39 members per subunit. A major reason for this diversification is gene duplications and tandem duplications, which have contributed to the larger family sizes of NF-Y in plants compared with the animal nuclear factors, each encoded by a single gene.
Each subunit contains an evolutionarily conserved part: the C-terminal of NF-YA, the central part of NF-YB, and the N-terminal of NF-YC. Greater than 70% of each of these regions remains conserved across species. Neighboring regions, however, are generally not conserved.
NF-YA subunit
The NF-YA family encodes transcription factors that are variable in length (between 207 and 347 amino acids for ''M. truncatula''). The NF-YA proteins are generally characterized by two domains that are strongly conserved in all higher eukaryotes investigated to date. The first domain (A1) contains 20 amino acids that form an alpha helix that appears significant in its interactions with NF-YB and NF-YC. The second domain (A2), joined to the A1 domain by a conserved linker sequence, is a sequence of 21 amino acids vital for the specific binding of DNA at the CCAAT box. In mammals, the A1 and A2 domains lie towards the C-terminus, but they occupy a more central region in plant NF-YA subunits. In plants, the NF-YA subunit has evolved to regulate the development of a facultative root organ that is present only in leguminous plants, and it has been shown to be expressed in root tissue. It has also been linked to drought resistance, becoming upregulated during drought stress in the roots and leaves of ''Arabidopsis''. NF-YA mutants have shown a loss of function and a hypersensitivity to drought-like conditions, whereas overexpression of NF-YA has resulted in drought resistance.
NF-YB subunit
The NF-YB family is, like the NF-YA family, variable in length, but its members are on average much smaller than NF-YA proteins (90–240 amino acids in ''M. truncatula''). They have been characterized as having a structure and amino acid composition similar to the histone fold motif (HFM), which is composed of three alpha-helices separated by two beta strand-loop domains. Like NF-YA, NF-YB has been shown to improve drought resistance when overexpressed, and it also promotes flowering in ''Arabidopsis''.
NF-YC subunit
The NF-YC proteins are intermediate in size between the NF-YA and NF-YB proteins (117–292 amino acids in ''M. truncatula'') and also contain the HFM that is prevalent in NF-YB proteins. NF-YC has also been shown to be involved in flowering time in certain plants (overexpression leads to earlier flowering), where its influence is potentially regulated by the binding of the protein CONSTANS (CO) to the NF-YC subunit.
NF-Y complexes
Because of the evolutionary change in NF-Y encoding genes in plants, plants have a large range of potential trimeric complexes. For example, in ''Arabidopsis'', 36 NF-Y transcription factor subunits (10 NF-YA, 13 NF-YB, and 13 NF-YC subunits) have been identified, which could theoretically form 1690 unique complexes, each containing one subunit of each type. This number is, of course, higher than the number of complexes that actually form, since some subunits have specific binding patterns. Functional analyses of NF-Y encoding genes in plants have shown that, as a result of their evolutionary diversification relative to their animal counterparts, these genes have acquired diverse specific functions in areas such as embryo development, flowering time control, ER stress, drought stress, and nodule and root development. This may be only a small portion of their capabilities, since the number of theoretical combinations of NF-Y complexes is so large and only a small portion can actually be created (less than 10% of all possible interactions were confirmed in both directions in yeast).
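The figure of 1690 is simply the product of the three family sizes (10 × 13 × 13). A short sketch that enumerates the combinations makes the arithmetic explicit; the numbered subunit labels are placeholders generated for illustration, not a claim about which subunits actually interact.

```python
from itertools import product

# Family sizes reported for Arabidopsis in the text.
n_a, n_b, n_c = 10, 13, 13

# Placeholder labels, one per subunit in each family.
subunits_a = [f"NF-YA{i}" for i in range(1, n_a + 1)]
subunits_b = [f"NF-YB{i}" for i in range(1, n_b + 1)]
subunits_c = [f"NF-YC{i}" for i in range(1, n_c + 1)]

# Every trimer takes exactly one subunit from each family.
complexes = list(product(subunits_a, subunits_b, subunits_c))
total = len(complexes)

print(total)         # 1690
print(complexes[0])  # ('NF-YA1', 'NF-YB1', 'NF-YC1')
```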
CCAAT enhancer binding proteins (C/EBPs)
Another aspect of the CCAAT binding motif is the CCAAT/enhancer binding proteins (C/EBPs). They are a group of six transcription factors (α-ζ) that are highly conserved and bind to the CCAAT motif. While research on these binding proteins is relatively recent, they have been shown to have vital roles in cellular proliferation and differentiation, metabolism, inflammation, and immunity in various cells, but specifically hepatocytes, adipocytes, and hematopoietic cells.
For example, in adipocytes, this has been shown in a variety of experiments with mice: ectopic expression of C/EBPα and C/EBPβ was able to initiate the differentiation program of the cell, that is, the differentiation of preadipocytes to adipocytes (fat cells), even in the absence of adipogenic hormones. In addition, an overabundance of these C/EBPs (specifically, C/EBPδ) causes an accelerated response. Furthermore, cells lacking C/EBP and C/EBP-deficient mice are both unable to undergo adipogenesis. This results in the mice dying from hypoglycemia or showing reduced lipid accumulation in adipose tissue.
The C/EBPs share a general basic-leucine zipper (bZIP) domain at the C-terminus and are able to form dimers with other C/EBPs or other transcription factors. This dimerization allows the C/EBPs to bind specifically to DNA through a palindromic sequence in the major groove of DNA. They are regulated through various means, including hormones, mitogens, cytokines, nutrients, and various other factors.
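In the DNA sense, a palindromic sequence is one that reads the same 5' to 3' on both strands, which is equivalent to the sequence being equal to its own reverse complement. The sketch below checks that property; `is_palindromic` is a hypothetical helper name, and the test strings are generic examples rather than known C/EBP binding sites.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindromic(seq: str) -> bool:
    """Return True if seq equals its own reverse complement."""
    seq = seq.upper()
    return seq == seq.translate(COMPLEMENT)[::-1]


print(is_palindromic("GAATTC"))  # True: reads the same on both strands
print(is_palindromic("CCAAT"))   # False: its reverse complement is ATTGG
```

A symmetric site of this kind fits binding by a dimer, since each half of the site presents the same sequence to one partner.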
See also
* Promoter