In genetics, an expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and were instrumental in gene discovery and in gene-sequence determination.
The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs available in public databases (e.g. GenBank, as of 1 January 2013, all species). EST approaches have largely been superseded by whole genome and transcriptome sequencing and metagenome sequencing.
An EST results from one-shot sequencing of a cloned cDNA. The cDNAs used for EST generation are typically individual clones from a cDNA library. The resulting sequence is a relatively low-quality fragment whose length is limited by current technology to approximately 500 to 800 nucleotides. Because these clones consist of DNA that is complementary to mRNA, the ESTs represent portions of expressed genes. They may be represented in databases as either cDNA/mRNA sequence or as the reverse complement of the mRNA, the template strand.
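Because a database entry may hold either strand, it can help to see how one representation is converted into the other. The following is a minimal sketch in Python; the function name `reverse_complement` and the example read are illustrative assumptions, not part of any EST database tooling.

```python
# Map each DNA base to its Watson-Crick partner.
# N (an unknown base, common in low-quality reads) maps to itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]


est_read = "ATGGCCAAGT"  # made-up read, for illustration only
print(reverse_complement(est_read))  # ACTTGGCCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check when comparing entries stored in opposite orientations.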
One can map ESTs to specific chromosome locations using physical mapping techniques, such as radiation hybrid mapping, HAPPY mapping, or FISH (fluorescence in situ hybridization). Alternatively, if the genome of the organism that originated the EST has been sequenced, one can align the EST sequence to that genome using a computer.
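As a rough illustration of computational placement, the sketch below searches a genome string for an exact copy of an EST on either strand. It is a deliberately simplified assumption-laden example: the names `locate_est` and `reverse_complement` and the toy sequences are hypothetical, and real alignment programs must also tolerate sequencing errors and introns, which exact matching cannot do.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def locate_est(genome, est):
    """Return (strand, 0-based start) of the first exact match, or None.

    The forward strand is tried first, then the reverse strand.
    Positions are always reported on the forward strand of `genome`.
    """
    pos = genome.find(est)
    if pos != -1:
        return ("+", pos)
    pos = genome.find(reverse_complement(est))
    if pos != -1:
        return ("-", pos)
    return None


genome = "TTGACCGTAAGCTTGGCATCC"  # toy sequence, not real data
print(locate_est(genome, "GTAAGC"))  # ('+', 6)
print(locate_est(genome, "GCCAAG"))  # ('-', 11)
print(locate_est(genome, "AAAAAA"))  # None
```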
The current understanding of the human set of genes includes thousands of genes whose existence is supported solely by EST evidence. In this respect, ESTs have become a tool to refine the predicted transcripts for those genes, which leads to the prediction of their protein products and ultimately of their function. Moreover, the situation in which those ESTs are obtained (tissue, organ, or disease state such as cancer) gives information on the conditions in which the corresponding gene is acting. ESTs contain enough information to permit the design of precise probes for DNA microarrays, which can then be used to determine gene expression profiles.
Some authors use the term "EST" to describe genes for which little or no further information exists besides the tag.
History
In 1979, teams at Harvard and Caltech extended the basic idea of making DNA copies of mRNAs in vitro to amplifying a library of such copies in bacterial plasmids.
In 1982, the idea of selecting random or semi-random clones from such a cDNA library for sequencing was explored by Greg Sutcliffe and coworkers.
In 1983, Putney et al. sequenced 178 clones from a rabbit muscle cDNA library.
In 1991, Adams and co-workers coined the term EST and initiated more systematic sequencing as a project (starting with 600 brain cDNAs).
Sources of data and annotations
dbEST
dbEST is a division of GenBank established in 1992. As with GenBank, data in dbEST are submitted directly by laboratories worldwide and are not curated.
EST contigs
Because of the way ESTs are sequenced, many distinct expressed sequence tags are often partial sequences that correspond to the same mRNA of an organism. In an effort to reduce the number of expressed sequence tags for downstream gene discovery analyses, several groups assembled expressed sequence tags into EST contigs. Examples of resources that provide EST contigs include the TIGR gene indices, UniGene, and STACK.
Constructing EST contigs is not trivial and may yield artifacts (contigs that contain two distinct gene products). When the complete genome sequence of an organism is available and transcripts are annotated, it is possible to bypass contig assembly and directly match transcripts with ESTs. This approach is used in the TissueInfo system (see below) and makes it easy to link annotations in the genomic database to tissue information provided by EST data.
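The idea of merging overlapping ESTs into contigs can be sketched with a greedy suffix-prefix merge. This is a toy illustration only, not the method used by any of the resources named above: the function names, the `min_len` threshold, and the reads are assumptions, and the sketch ignores sequencing errors and reverse-strand reads.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_contigs(reads, min_len=4):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


reads = ["ATGGCCAAGT", "CAAGTTTGAC", "GGGGCCCCTT"]  # made-up reads
print(greedy_contigs(reads))  # ['GGGGCCCCTT', 'ATGGCCAAGTTTGAC']
```

A small `min_len` makes chance overlaps between unrelated transcripts more likely, which is one way a contig can end up containing two distinct gene products.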
Tissue information
High-throughput analyses of ESTs often encounter similar data management challenges. A first challenge is that tissue provenance of EST libraries is described in plain English in dbEST. This makes it difficult to write programs that can unambiguously determine that two EST libraries were sequenced from the same tissue. Similarly, disease conditions for the tissue are not annotated in a computationally friendly manner. For instance, cancer origin of a library is often mixed with the tissue name (e.g., the tissue name "glioblastoma" indicates that the EST library was sequenced from brain tissue and the disease condition is cancer). With the notable exception of cancer, the disease condition is often not recorded in dbEST entries. The TissueInfo project was started in 2000 to help with these challenges. The project provides curated data (updated daily) to disambiguate tissue origin and disease state (cancer/non cancer), offers a tissue ontology that links tissues and organs by "is part of" relationships (e.g., it formalizes the knowledge that hypothalamus is part of brain, and that brain is part of the central nervous system), and distributes open-source software for linking transcript annotations from sequenced genomes to tissue expression profiles calculated with data in dbEST.
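The two ideas described here, a curated mapping from free-text labels to tissue and disease state and an "is part of" hierarchy, can be sketched as follows. This is a hypothetical illustration and does not reflect TissueInfo's actual data format or software: the tables `CURATED` and `PART_OF` and all function names are assumptions, populated only with the examples given in the text.

```python
# Hypothetical curated table: free-text library label -> (tissue, is_cancer).
CURATED = {
    "glioblastoma": ("brain", True),
    "hypothalamus": ("hypothalamus", False),
}

# Hypothetical "is part of" edges: part -> whole.
PART_OF = {
    "hypothalamus": "brain",
    "brain": "central nervous system",
}


def ancestors(tissue):
    """Follow "is part of" edges upward and return the enclosing structures."""
    chain, seen = [], {tissue}
    while tissue in PART_OF:
        tissue = PART_OF[tissue]
        if tissue in seen:  # guard against an accidental cycle
            break
        seen.add(tissue)
        chain.append(tissue)
    return chain


def same_or_part_of(tissue, whole):
    """True if `tissue` is `whole` or lies somewhere inside it."""
    return tissue == whole or whole in ancestors(tissue)


def libraries_from(whole, labels):
    """Select library labels whose curated tissue is `whole` or a part of it."""
    return [label for label in labels
            if label in CURATED and same_or_part_of(CURATED[label][0], whole)]


print(ancestors("hypothalamus"))
# ['brain', 'central nervous system']
print(libraries_from("brain", ["glioblastoma", "hypothalamus", "liver"]))
# ['glioblastoma', 'hypothalamus']
```

Labels absent from the curated table, such as "liver" in this example, are skipped rather than guessed, which mirrors the need for curation when the source annotation is free text.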
See also
* Gene expression
* Complementary DNA (cDNA)
* Transcriptomics
* IMAGE cDNA clones
* Whole genome sequencing (WGS)