STARR-seq

STARR-seq (short for self-transcribing active regulatory region sequencing) is a method for assaying the enhancer activity of millions of candidate sequences from arbitrary sources of DNA. It is used to identify sequences that act as transcriptional enhancers in a direct, quantitative, and genome-wide manner. In eukaryotes, transcription is regulated by sequence-specific DNA-binding proteins (transcription factors) associated with a gene's promoter and also by distant control sequences, including enhancers. Enhancers are non-coding DNA sequences that contain several binding sites for a variety of transcription factors. They typically recruit transcription factors that modulate chromatin structure and interact directly with the transcription machinery positioned at the promoter of a gene. Enhancers can regulate transcription of target genes in a cell type-specific manner, independent of their location or distance from the promoter. In certain contexts (see Transvection (genetics)), they can even regulate transcription of genes located on a different chromosome. However, knowledge about enhancers has so far been limited to studies of a small number of them, because they have been difficult to identify accurately at a genome-wide scale. Moreover, many regulatory elements function only in certain cell types and under specific conditions.


Enhancer detection

Enhancer detection in Drosophila originally relied on the random insertion of a transposon-derived vector that encodes a reporter protein downstream of a minimal promoter. This approach makes it possible to observe reporter expression in transgenic animals and provides information about nearby genes that are regulated by these sequences. The technique has significantly improved the discovery and characterization of cell types and of the genes involved in their determination. More recently, post-genomic technologies have revealed specific features of poised and active enhancers, which has improved enhancer discovery. Methods such as deep sequencing of DNase I hypersensitive sites (DNase-seq), formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq), chromatin immunoprecipitation followed by deep sequencing (ChIP-seq), and MNase-defined cistrome-occupancy analysis (MOA-seq) provide genome-wide enhancer predictions based on enhancer-associated chromatin features.


Application

However, DNase-seq and FAIRE-seq alone do not provide a direct functional or quantitative readout of enhancer activity. Assessing enhancer activity quantitatively requires reporter assays, which deduce enhancer strength from the quantitative enrichment of reporter transcripts. Yet these assays are not high-throughput, because it is not feasible to conduct the millions of tests required to identify enhancers genome-wide. STARR-seq was developed to circumvent this barrier. Taking advantage of the fact that enhancers can work independently of their relative location, candidate sequences are placed downstream of a minimal promoter, so that active enhancers transcribe themselves. The strength of each enhancer is then reflected by its relative enrichment among cellular RNAs. This direct coupling of candidate sequences to enhancer activity enables the parallel evaluation of millions of DNA fragments from arbitrary sources.
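The idea that enhancer strength is read out as relative enrichment among cellular RNAs can be illustrated with a minimal sketch. It assumes that each candidate fragment has a read count in the RNA (cDNA) library and in the input plasmid library, and that strength is summarized as a log2 ratio after normalizing for library size. The function name, the pseudocount, and the toy counts are illustrative assumptions, not part of any published STARR-seq pipeline.

```python
import math


def enrichment(rna_counts, input_counts, pseudocount=1.0):
    """Return log2(RNA / input) per fragment after library-size normalization.

    rna_counts and input_counts map a fragment ID to its read count.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    rna_total = sum(rna_counts.values())
    input_total = sum(input_counts.values())
    scores = {}
    for frag, n_input in input_counts.items():
        n_rna = rna_counts.get(frag, 0)
        # Counts per million reads in each library.
        rna_cpm = (n_rna + pseudocount) / rna_total * 1e6
        input_cpm = (n_input + pseudocount) / input_total * 1e6
        scores[frag] = math.log2(rna_cpm / input_cpm)
    return scores


# Hypothetical counts for three candidate fragments.
rna = {"fragA": 400, "fragB": 50, "fragC": 50}
inp = {"fragA": 100, "fragB": 100, "fragC": 300}
scores = enrichment(rna, inp)
for frag in sorted(scores, key=scores.get, reverse=True):
    print(frag, round(scores[frag], 2))
```

In this toy example fragA is over-represented in the RNA library relative to the input and would be ranked as the strongest candidate, while fragC is depleted.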


Methodology

Genomic DNA is randomly sheared into small fragments. Adaptors are ligated to size-selected DNA fragments. The adaptor-linked fragments are then amplified, the PCR products are purified, and the candidate sequences are placed downstream of a minimal promoter in screening vectors, which gives them the opportunity to transcribe themselves. Candidate cells are transfected with the reporter library and cultured. Total RNA is then extracted and poly-A RNA is isolated. cDNAs are produced by reverse transcription and amplified, and the candidate fragments are subjected to high-throughput paired-end sequencing. Sequence reads are mapped to the reference genome and the data are processed computationally.
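One early step of the computational processing can be sketched as follows: once paired-end reads have been mapped and each pair collapsed into a fragment, the fragments are counted along the genome. The sketch assumes fragments are given as (chromosome, start, end) tuples with 0-based, half-open coordinates and counts how many fragments overlap each fixed-width bin. The bin size, the function name, and the example coordinates are hypothetical; real pipelines typically work from aligner output files rather than in-memory tuples.

```python
from collections import Counter


def bin_fragment_counts(fragments, bin_size=100):
    """Count fragments overlapping each fixed-width genomic bin.

    fragments: iterable of (chrom, start, end), 0-based and half-open.
    Returns a Counter keyed by (chrom, bin_index).
    """
    counts = Counter()
    for chrom, start, end in fragments:
        if end <= start:
            continue  # skip empty or malformed intervals
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        for b in range(first_bin, last_bin + 1):
            counts[(chrom, b)] += 1
    return counts


# Hypothetical mapped fragments.
frags = [
    ("chr2L", 0, 150),
    ("chr2L", 120, 260),
    ("chr2L", 500, 600),
    ("chr3R", 90, 110),
]
counts = bin_fragment_counts(frags, bin_size=100)
for key in sorted(counts):
    print(key, counts[key])
```

Running the same counting on the input library and on the cDNA library gives the two count tables needed to compare representation before and after transcription.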


Identification of enhancers

Applying this technology to the Drosophila genome, Arnold et al. covered 96% of the non-repetitive genome at least 10-fold. The authors found that most identified enhancers (55.6%) were located within introns, particularly in the first intron, and in intergenic regions. Of the enhancers, 4.5% were located at transcription start sites (TSS), suggesting that these enhancers can initiate transcription and also enhance transcription from a distant TSS. The strongest enhancers were near housekeeping genes, such as those encoding enzymes or components of the cytoskeleton, and near developmental regulators such as transcription factors. The strongest enhancer was located within an intron of the transcription factor gene zfh1. This transcription factor regulates neuropeptide expression and the growth of larval neuromuscular junctions in Drosophila. Ribosomal protein genes were the only class of genes whose enhancers ranked poorly. Moreover, the authors demonstrated that many genes are regulated by several independent active enhancers even in a single cell type. Furthermore, gene expression levels were on average correlated with the sum of the enhancer strengths per gene, supporting a direct link between gene expression and enhancer activity.
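The relationship between expression and the summed enhancer strength per gene can be illustrated with a small sketch. It assumes each enhancer has already been assigned to one gene and given a numeric strength, sums the strengths per gene, and computes a Pearson correlation with expression. The gene names, strengths, and expression values are invented for illustration and are not data from Arnold et al.; the assignment rule and the choice of correlation measure are also assumptions of this sketch.

```python
import math
from collections import defaultdict


def summed_strength(enhancers):
    """Sum enhancer strengths per gene from (gene, strength) pairs."""
    totals = defaultdict(float)
    for gene, strength in enhancers:
        totals[gene] += strength
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical enhancer-to-gene assignments and expression levels.
enhancers = [
    ("geneA", 2.0), ("geneA", 3.0),
    ("geneB", 1.0),
    ("geneC", 4.0), ("geneC", 4.0),
]
expression = {"geneA": 9.0, "geneB": 3.0, "geneC": 16.0}

totals = summed_strength(enhancers)
genes = sorted(totals)
r = pearson([totals[g] for g in genes], [expression[g] for g in genes])
print(totals, round(r, 3))
```

With only three invented genes the coefficient itself is not meaningful; the sketch only shows the shape of the calculation.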


Characterization of variant alleles

Applying this technology to the discovery and characterization of regulatory variant alleles, Vockley et al. characterized the effects of human genetic variation on non-coding regulatory element function by measuring the activity of 100 putative enhancers captured directly from the genomes of 95 members of a study cohort. This approach enables the functional fine-mapping of causal regulatory variants in regions of high linkage disequilibrium identified by eQTL analyses. It provides a general path forward for identifying perturbations in gene regulatory elements that contribute to complex phenotypes.


Quantifying enhancer activity

STARR-seq has been used to measure the regulatory activity of DNA fragments that have been enriched for sites occupied by specific transcription factors. Cloning ChIP DNA libraries generated from chromatin immunoprecipitation of the glucocorticoid receptor into STARR-seq enabled genome-scale quantification of glucocorticoid-induced enhancer activity. This approach is useful for measuring differences in enhancer activity between sites that are bound by the same transcription factor.
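Comparing sites bound by the same transcription factor can be sketched as a per-site difference between two conditions. The example assumes that a log2 enrichment score is already available for each site with and without hormone treatment, and ranks the sites by induced activity. The site names and scores are hypothetical, and treating induction as a simple difference of log2 scores is an assumption of this sketch rather than the published analysis.

```python
def induction(treated, control):
    """Per-site difference in log2 enrichment: treated minus control.

    Only sites measured in both conditions are returned.
    """
    shared = treated.keys() & control.keys()
    return {site: treated[site] - control[site] for site in shared}


# Hypothetical log2 enrichment scores for three bound sites.
treated = {"site1": 3.0, "site2": 0.5, "site3": 1.0}
control = {"site1": 0.5, "site2": 0.4, "site3": 1.0}

delta = induction(treated, control)
ranked = sorted(delta, key=delta.get, reverse=True)
print(ranked)
```

Here all three sites are assumed to be bound, yet only site1 shows a marked change in activity, which is the kind of difference between bound sites that this application is meant to measure.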


Future directions

By combining a traditional approach with high-throughput sequencing technology and highly specialized bio-computing methods, STARR-seq is able to detect enhancers in a quantitative and genome-wide manner. Studying gene regulation and the pathways responsible for it, both during normal development and in disease, can be very demanding. Applying STARR-seq to many cell types across organisms therefore helps identify cell type-specific gene regulatory elements and allows practical assessment of non-coding mutations that cause disease. Recently, a related approach that couples capture of regions of interest to the STARR-seq technique has been developed and extensively validated in mammalian cell lines.

