
Long non-coding RNAs (long ncRNAs, lncRNAs) are a type of RNA, generally defined as transcripts more than 200 nucleotides long that are not translated into protein. This arbitrary limit distinguishes long ncRNAs from small non-coding RNAs, such as microRNAs (miRNAs), small interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs. Because some lncRNAs have been reported to have the potential to encode small proteins or micro-peptides, a more recent definition describes lncRNAs as a class of transcripts of over 200 nucleotides that have no or limited coding capacity. However, John S. Mattick and colleagues have suggested changing the definition of long non-coding RNAs to transcripts of more than 500 nt, which are mostly generated by Pol II. The exact definition of lncRNA is therefore still under discussion in the field. Long intervening/intergenic noncoding RNAs (lincRNAs) are transcripts that do not overlap protein-coding genes. Long non-coding RNAs include intergenic lincRNAs, intronic ncRNAs, and sense and antisense lncRNAs, each type showing different genomic positions in relation to genes and exons. The definition of lncRNAs differs from that of other RNAs such as siRNAs, mRNAs, miRNAs, and snoRNAs because it is not connected to the function of the RNA. A lncRNA is any transcript that is not one of the other well-characterized RNAs and is longer than 200-500 nucleotides. Some scientists think that most lncRNAs do not have a biologically relevant function because they are transcripts of junk DNA.
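The length-based definition above can be illustrated with a minimal Python sketch. It is not a published classifier: the function names, the labels, and the 100-codon open reading frame cutoff standing in for "no or limited coding capacity" are assumptions made here for illustration, and only the forward strand is scanned. The `min_length` parameter can be set to 200 or 500 to reflect the two length thresholds discussed in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-to-stop open reading frame on the given strand, in codons.

    The count includes the start codon and excludes the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def classify_transcript(seq: str, min_length: int = 200,
                        max_orf_codons: int = 100) -> str:
    """Label a transcript using the length rule from the text.

    max_orf_codons is a hypothetical proxy for 'limited coding capacity';
    it is an assumption of this sketch, not a value taken from the article.
    """
    if len(seq) <= min_length:
        return "short RNA"
    if longest_orf_codons(seq) >= max_orf_codons:
        return "putative coding"
    return "candidate lncRNA"


if __name__ == "__main__":
    example = "GCU" * 100  # 300 nt with no start codon in any frame
    print(classify_transcript(example))                  # 200 nt threshold
    print(classify_transcript(example, min_length=500))  # 500 nt threshold
```

The same 300-nt transcript changes label depending on the threshold chosen, which shows why the choice between 200 and 500 nt matters for annotation.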


Abundance

Long non-coding transcripts are found in many species. Large-scale complementary DNA (cDNA) sequencing projects such as FANTOM reveal the complexity of these transcripts in humans. The FANTOM3 project identified ~35,000 non-coding transcripts that bear many signatures of messenger RNAs, including 5' capping, splicing, and polyadenylation, but have little or no open reading frame (ORF). This number represents a conservative lower estimate, since it omitted many singleton transcripts and non-polyadenylated transcripts (tiling array data show that more than 40% of transcripts are non-polyadenylated). Identifying ncRNAs within these cDNA libraries is challenging, since it can be difficult to distinguish protein-coding transcripts from non-coding transcripts. Multiple studies have suggested that testis and neural tissues express the greatest amount of long non-coding RNAs of any tissue type. Using FANTOM5, 27,919 long ncRNAs have been identified in various human sources. Quantitatively, lncRNAs demonstrate ~10-fold lower abundance than mRNAs, which is explained by higher cell-to-cell variation in the expression levels of lncRNA genes in individual cells when compared to protein-coding genes. In general, the majority (~78%) of lncRNAs are characterized as tissue-specific, as opposed to only ~19% of mRNAs. Only 3.6% of human lncRNA genes are expressed in various biological contexts, and 34% of lncRNA genes are expressed at a high level (top 25% of both lncRNAs and mRNAs) in at least one biological context. In addition to higher tissue specificity, lncRNAs are characterized by higher developmental stage specificity and cell subtype specificity in tissues such as the human neocortex and other parts of the brain, regulating correct brain development and function. In 2022, a comprehensive integration of lncRNAs from existing databases revealed that there are 95,243 lncRNA genes and 323,950 transcripts in humans. In comparison to mammals, relatively few studies have focused on the prevalence of lncRNAs in plants. However, an extensive study considering 37 higher plant species and six algae identified ~200,000 non-coding transcripts using an in silico approach, which also established the associated Green Non-Coding Database (GreeNC), a repository of plant lncRNAs.


Genomic organization

In 2005 the landscape of the mammalian genome was described as numerous 'foci' of transcription that are separated by long stretches of intergenic space. While some long ncRNAs are located within the intergenic stretches, the majority are overlapping sense and antisense transcripts that often include protein-coding genes, giving rise to a complex hierarchy of overlapping isoforms. Genomic sequences within these transcriptional foci are often shared among a number of coding and non-coding transcripts in the sense and antisense directions. For example, 3012 out of 8961 cDNAs previously annotated as truncated coding sequences within FANTOM2 were later designated as genuine ncRNA variants of protein-coding cDNAs. While the abundance and conservation of these arrangements suggest they have biological relevance, the complexity of these foci frustrates easy evaluation. The GENCODE consortium has collated and analysed a comprehensive set of human lncRNA annotations and their genomic organisation, modifications, cellular locations and tissue expression profiles. Their analysis indicates that human lncRNAs show a bias toward two-exon transcripts.


Identification software


Translation

There has been considerable debate about whether lncRNAs have been misannotated and do in fact encode proteins. Several lncRNAs have been found to encode peptides with biologically significant function. Ribosome profiling studies have suggested that anywhere from 40% to 90% of annotated lncRNAs are in fact translated, although there is disagreement about the correct method for analyzing ribosome profiling data. Additionally, it is thought that many of the peptides produced by lncRNAs may be highly unstable and without biological function.


Conservation

Unlike protein-coding genes, the sequences of long non-coding RNAs show a lower level of conservation. Initial studies into lncRNA conservation noted that, as a class, they were enriched for conserved sequence elements, depleted in substitution and insertion/deletion rates, and depleted in rare frequency variants, indicative of purifying selection maintaining lncRNA function. However, further investigations into vertebrate lncRNAs revealed that while lncRNAs are conserved in sequence, they are not conserved in transcription. In other words, even when the sequence of a human lncRNA is conserved in another vertebrate species, there is often no transcription of a lncRNA in the orthologous genomic region. Some argue that these observations suggest non-functionality of the majority of lncRNAs, while others argue that they may be indicative of rapid species-specific adaptive selection. While the turnover of lncRNA transcription is much higher than initially expected, hundreds of lncRNAs are still conserved at the sequence level. There have been several attempts to delineate the different categories of selection signatures seen amongst lncRNAs, including: lncRNAs with strong sequence conservation across the entire length of the gene, lncRNAs in which only a portion of the transcript (e.g. 5′ end, splice sites) is conserved, and lncRNAs that are transcribed from syntenic regions of the genome but have no recognizable sequence similarity. Additionally, there have been attempts to identify conserved secondary structures in lncRNAs, though these studies have so far produced conflicting results.


Functions

Some groups have claimed that the majority of long noncoding RNAs in mammals are likely to be functional, but other groups have claimed the opposite. This is an active area of research. Some lncRNAs have been functionally annotated in LncRNAdb (a database of lncRNAs described in the literature), with the majority of these being described in humans. Over 2600 human lncRNAs with experimental evidence have been community-curated in LncRNAWiki (a wiki-based, publicly editable and open-content platform for community curation of human lncRNAs). According to literature-based curation of the functional mechanisms of lncRNAs, lncRNAs are extensively reported to be involved in ceRNA regulation, transcriptional regulation, and epigenetic regulation. A further large-scale sequencing study provides evidence that many transcripts thought to be lncRNAs may, in fact, be translated into proteins.


In the regulation of gene transcription


In gene-specific transcription

In eukaryotes, RNA transcription is a tightly regulated process. Noncoding RNAs act upon different aspects of this process, targeting transcriptional modulators, RNA polymerase (RNAP) II and even the DNA duplex to regulate gene expression. NcRNAs modulate transcription by several mechanisms, including functioning themselves as co-regulators, modifying transcription factor activity, or regulating the association and activity of co-regulators. For example, the noncoding RNA Evf-2 functions as a co-activator for the homeobox transcription factor Dlx2, which plays important roles in forebrain development and neurogenesis. Sonic hedgehog induces transcription of Evf-2 from an ultra-conserved element located between the Dlx5 and Dlx6 genes during forebrain development. Evf-2 then recruits the Dlx2 transcription factor to the same ultra-conserved element, whereby Dlx2 subsequently induces expression of Dlx5. The existence of other similar ultra- or highly conserved elements within the mammalian genome that are both transcribed and fulfill enhancer functions suggests that Evf-2 may be illustrative of a generalised mechanism that regulates developmental genes with complex expression patterns during vertebrate growth. Indeed, the transcription and expression of similar non-coding ultraconserved elements was shown to be abnormal in human leukaemia and to contribute to apoptosis in colon cancer cells, suggesting their involvement in tumorigenesis in like fashion to protein-coding RNA.

Local ncRNAs can also recruit transcriptional programmes to regulate adjacent protein-coding gene expression. The RNA binding protein TLS binds and inhibits the CREB binding protein and p300 histone acetyltransferase activities on a repressed gene target, cyclin D1. The recruitment of TLS to the promoter of cyclin D1 is directed by long ncRNAs expressed at low levels and tethered to 5' regulatory regions in response to DNA damage signals. Moreover, these local ncRNAs act cooperatively as ligands to modulate the activities of TLS. In the broad sense, this mechanism allows the cell to harness RNA-binding proteins, which make up one of the largest classes within the mammalian proteome, and integrate their function in transcriptional programs. Nascent long ncRNAs have been shown to increase the activity of CREB binding protein, which in turn increases the transcription of that ncRNA. A study found that a lncRNA in the antisense direction of the Apolipoprotein A1 (APOA1) gene regulates the transcription of APOA1 through epigenetic modifications. Recent evidence has raised the possibility that transcription of genes that escape from X-inactivation might be mediated by expression of long non-coding RNA within the escaping chromosomal domains.


Regulating basal transcription machinery

NcRNAs also target general transcription factors required for the RNAP II transcription of all genes. These general factors include components of the initiation complex that assemble on promoters or are involved in transcription elongation. A ncRNA transcribed from an upstream minor promoter of the dihydrofolate reductase (DHFR) gene forms a stable RNA-DNA triplex within the major promoter of DHFR to prevent the binding of the transcriptional co-factor TFIIB. This novel mechanism of regulating gene expression may represent a widespread method of controlling promoter usage, as thousands of RNA-DNA triplexes exist in eukaryotic chromosomes. The U1 ncRNA can induce transcription by binding to and stimulating TFIIH to phosphorylate the C-terminal domain of RNAP II. In contrast, the ncRNA 7SK is able to repress transcription elongation by forming, in combination with HEXIM1/2, an inactive complex that prevents PTEFb from phosphorylating the C-terminal domain of RNAP II, repressing global elongation under stressful conditions. These examples, which bypass specific modes of regulation at individual promoters, provide a means of quickly effecting global changes in gene expression.

The ability to quickly mediate global changes is also apparent in the rapid expression of non-coding repetitive sequences. The short interspersed nuclear elements (SINEs), namely Alu elements in humans and the analogous B1 and B2 elements in mice, have succeeded in becoming the most abundant mobile elements within the genomes, comprising ~10% of the human and ~6% of the mouse genome, respectively. These elements are transcribed as ncRNAs by RNAP III in response to environmental stresses such as heat shock, where they then bind to RNAP II with high affinity and prevent the formation of active pre-initiation complexes. This allows for the broad and rapid repression of gene expression in response to stress. A dissection of the functional sequences within Alu RNA transcripts has outlined a modular structure analogous to the organization of domains in protein transcription factors. The Alu RNA contains two 'arms', each of which may bind one RNAP II molecule, as well as two regulatory domains that are responsible for RNAP II transcriptional repression in vitro. These two loosely structured domains may even be concatenated to other ncRNAs such as B1 elements to impart their repressive role. The abundance and distribution of Alu elements and similar repetitive elements throughout the mammalian genome may be partly due to these functional domains being co-opted into other long ncRNAs during evolution, with the presence of functional repeat sequence domains being a common characteristic of several known long ncRNAs including Kcnq1ot1, Xlsirt and Xist. In addition to heat shock, the expression of SINE elements (including Alu, B1, and B2 RNAs) increases during cellular stress such as viral infection and in some cancer cells, where they may similarly regulate global changes to gene expression.

The ability of Alu and B2 RNA to bind directly to RNAP II provides a broad mechanism to repress transcription. Nevertheless, there are specific exceptions to this global response, where Alu or B2 RNAs are not found at activated promoters of genes undergoing induction, such as the heat shock genes. This additional hierarchy of regulation that exempts individual genes from the generalised repression also involves a long ncRNA, heat shock RNA-1 (HSR-1). It was argued that HSR-1 is present in mammalian cells in an inactive state, but upon stress is activated to induce the expression of heat shock genes. This activation involves a conformational alteration of HSR-1 in response to rising temperatures, permitting its interaction with the transcriptional activator HSF-1, which trimerizes and induces the expression of heat shock genes. In the broad sense, these examples illustrate a regulatory circuit nested within ncRNAs whereby Alu or B2 RNAs repress general gene expression, while other ncRNAs activate the expression of specific genes.


Transcribed by RNA polymerase III

Many of the ncRNAs that interact with general transcription factors or RNAP II itself (including 7SK, Alu and B1 and B2 RNAs) are transcribed by RNAP III, uncoupling their expression from RNAP II, which they regulate. RNAP III also transcribes other ncRNAs, such as BC2, BC200 and some microRNAs and snoRNAs, in addition to housekeeping ncRNA genes such as tRNAs, 5S rRNAs and snRNAs. The existence of an RNAP III-dependent ncRNA transcriptome that regulates its RNAP II-dependent counterpart is supported by the finding of a set of ncRNAs transcribed by RNAP III with sequence homology to protein-coding genes. This prompted the authors to posit a 'cogene/gene' functional regulatory network, showing that one of these ncRNAs, 21A, regulates the expression of its antisense partner gene, CENP-F, in trans.


In post-transcriptional regulation

In addition to regulating transcription, ncRNAs also control various aspects of post-transcriptional mRNA processing. Similar to small regulatory RNAs such as microRNAs and snoRNAs, these functions often involve complementary base pairing with the target mRNA. The formation of RNA duplexes between complementary ncRNA and mRNA may mask key elements within the mRNA required to bind trans-acting factors, potentially affecting any step in post-transcriptional gene expression, including pre-mRNA processing and splicing, transport, translation, and degradation.
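
The masking idea can be illustrated with a toy model. The following is a minimal sketch, assuming perfect Watson-Crick pairing only and ignoring RNA secondary structure, G-U wobble pairs, and thermodynamics; the sequences, the element coordinates, and the function names are all hypothetical and chosen purely for illustration.

```python
# Toy model of an antisense ncRNA masking an element within an mRNA.
# All sequences and names here are hypothetical illustrations.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def duplex_sites(mrna: str, antisense: str) -> list[tuple[int, int]]:
    """Return (start, end) intervals of the mRNA that pair perfectly with the antisense RNA."""
    target = reverse_complement(antisense)
    sites = []
    start = mrna.find(target)
    while start != -1:
        sites.append((start, start + len(target)))
        start = mrna.find(target, start + 1)
    return sites


def is_masked(element: tuple[int, int], sites: list[tuple[int, int]]) -> bool:
    """True if the element interval overlaps any duplexed interval."""
    e_start, e_end = element
    return any(s < e_end and e_start < e for s, e in sites)


mrna = "AAGGUAAGUCCCAUGGC"       # hypothetical mRNA fragment
antisense = "GGACUUACC"          # hypothetical antisense ncRNA
sites = duplex_sites(mrna, antisense)
element = (3, 9)                 # hypothetical factor-binding element on the mRNA
print(sites, is_masked(element, sites))
```

In this simplified picture, an element that falls inside a duplexed interval is treated as inaccessible to trans-acting factors, while elements outside the interval remain available.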


In splicing

The splicing of mRNA can induce its translation and functionally diversify the repertoire of proteins it encodes. The Zeb2 mRNA requires the retention of a 5'UTR intron that contains an internal ribosome entry site for efficient translation. The retention of the intron depends on the expression of an antisense transcript that complements the intronic 5' splice site. Therefore, the ectopic expression of the antisense transcript represses splicing and induces translation of the Zeb2 mRNA during mesenchymal development. Likewise, the expression of an overlapping antisense Rev-ErbAα2 transcript controls the alternative splicing of the thyroid hormone receptor ErbAα2 mRNA to form two antagonistic isoforms.


In translation

Non-coding RNAs may also apply additional regulatory pressures during translation, a property particularly exploited in neurons, where the dendritic or axonal translation of mRNA in response to synaptic activity contributes to changes in synaptic plasticity and the remodelling of neuronal networks. The RNAP III-transcribed BC1 and BC200 ncRNAs, which originally derived from tRNAs, are expressed in the mouse and human central nervous system, respectively. BC1 expression is induced in response to synaptic activity and synaptogenesis and is specifically targeted to dendrites in neurons. Sequence complementarity between BC1 and regions of various neuron-specific mRNAs also suggests a role for BC1 in targeted translational repression. Indeed, BC1 has been shown to be associated with translational repression in dendrites to control the efficiency of dopamine D2 receptor-mediated transmission in the striatum, and BC1 RNA-deleted mice exhibit behavioural changes with reduced exploration and increased anxiety.


In siRNA-directed gene regulation

In addition to masking key elements within single-stranded RNA, the formation of double-stranded RNA duplexes can also provide a substrate for the generation of endogenous siRNAs (endo-siRNAs) in ''Drosophila'' and mouse oocytes. The annealing of complementary sequences, such as antisense or repetitive regions between transcripts, forms an RNA duplex that may be processed by Dicer-2 into endo-siRNAs. Long ncRNAs that form extended intramolecular hairpins may also be processed into siRNAs, as illustrated by the esi-1 and esi-2 transcripts. Endo-siRNAs generated from these transcripts seem particularly useful in suppressing the spread of mobile transposon elements within the genome in the germline. However, the generation of endo-siRNAs from antisense transcripts or pseudogenes may also silence the expression of their functional counterparts via RISC effector complexes, acting as an important node that integrates various modes of long and short RNA regulation, as exemplified by Xist and Tsix (see below).
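
As a rough illustration of how a long duplex can give rise to many short RNAs, the sketch below cuts one strand of a duplex into consecutive fixed-length fragments. It is a deliberately simplified model: the 21-nt product length is an assumed default, not a value taken from this section, and real Dicer processing (end recognition, 3' overhangs, strand selection) is not represented. The function name and the input sequence are hypothetical.

```python
# Simplified "dicing" of one strand of an RNA duplex into siRNA-sized fragments.
# product_len=21 is an assumed illustrative value; the input sequence is made up.

def dice(strand: str, product_len: int = 21) -> list[str]:
    """Cut a strand into consecutive fragments of product_len nucleotides.

    A trailing remainder shorter than product_len is discarded.
    """
    if product_len <= 0:
        raise ValueError("product_len must be positive")
    last_start = len(strand) - product_len
    return [strand[i:i + product_len] for i in range(0, last_start + 1, product_len)]


hairpin_arm = "ACGU" * 12  # hypothetical 48-nt duplexed region
fragments = dice(hairpin_arm)
print(len(fragments), [len(f) for f in fragments])
```

Each fragment stands in for one candidate endo-siRNA; whether it would silence a given transcript depends on its complementarity to that transcript, which this sketch does not evaluate.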


In epigenetic regulation

Epigenetic modifications, including histone and DNA methylation, histone acetylation and sumoylation, affect many aspects of chromosomal biology, primarily including regulation of large numbers of genes by remodeling broad chromatin domains. While it has been known for some time that RNA is an integral component of chromatin, the means by which RNA is involved in pathways of chromatin modification have only recently begun to be appreciated. For example, Oplr16 epigenetically induces the activation of stem cell core factors by coordinating intrachromosomal looping and recruitment of the DNA demethylase TET2. In ''Drosophila'', long ncRNAs induce the expression of the homeotic gene Ubx by recruiting and directing the chromatin-modifying functions of the trithorax protein Ash1 to Hox regulatory elements. Similar models have been proposed in mammals, where strong epigenetic mechanisms are thought to underlie the embryonic expression profiles of the Hox genes that persist throughout human development. Indeed, the human Hox genes are associated with hundreds of ncRNAs that are sequentially expressed along both the spatial and temporal axes of human development and define chromatin domains of differential histone methylation and RNA polymerase accessibility. One ncRNA, termed HOTAIR, which originates from the HOXC locus, represses transcription across 40 kb of the HOXD locus by altering chromatin trimethylation state. HOTAIR is thought to achieve this by directing the action of Polycomb chromatin remodeling complexes in trans to govern the cells' epigenetic state and subsequent gene expression. Components of the Polycomb complex, including Suz12, EZH2 and EED, contain RNA binding domains that may potentially bind HOTAIR and probably other similar ncRNAs. This example illustrates a broader theme whereby ncRNAs recruit the function of a generic suite of chromatin-modifying proteins to specific genomic loci, underscoring the complexity of recently published genomic maps.

Indeed, the prevalence of long ncRNAs associated with protein-coding genes may contribute to localised patterns of chromatin modifications that regulate gene expression during development. For example, the majority of protein-coding genes have antisense partners, including many tumour suppressor genes that are frequently silenced by epigenetic mechanisms in cancer. One study observed an inverse expression profile of the p15 gene and an antisense ncRNA in leukaemia. A detailed analysis showed that the p15 antisense ncRNA (CDKN2BAS) was able to induce changes to the heterochromatin and DNA methylation status of p15 by an unknown mechanism, thereby regulating p15 expression. Therefore, misexpression of the associated antisense ncRNAs may subsequently silence the tumour suppressor gene, contributing towards cancer.


Imprinting

Many emergent themes of ncRNA-directed chromatin modification were first apparent within the phenomenon of imprinting, whereby only one allele of a gene is expressed from either the maternal or the paternal chromosome. In general, imprinted genes are clustered together on chromosomes, suggesting the imprinting mechanism acts upon local chromosome domains rather than individual genes. These clusters are also often associated with long ncRNAs whose expression is correlated with the repression of the linked protein-coding gene on the same allele. Indeed, detailed analysis has revealed a crucial role for the ncRNAs Kcnq1ot1 and Igf2r/Air in directing imprinting. Almost all the genes at the Kcnq1 locus are maternally expressed, except the paternally expressed antisense ncRNA Kcnq1ot1. Transgenic mice with truncated Kcnq1ot1 fail to silence the adjacent genes, suggesting that Kcnq1ot1 is crucial to the imprinting of genes on the paternal chromosome. It appears that Kcnq1ot1 is able to direct the trimethylation of lysine 9 (H3K9me3) and lysine 27 (H3K27me3) of histone 3 to an imprinting centre that overlaps the Kcnq1ot1 promoter and resides within a Kcnq1 sense exon. Similar to HOTAIR (see above), Eed-Ezh2 Polycomb complexes are recruited to the Kcnq1 locus on the paternal chromosome, possibly by Kcnq1ot1, where they may mediate gene silencing through repressive histone methylation. A differentially methylated imprinting centre also overlaps the promoter of a long antisense ncRNA, Air, that is responsible for the silencing of neighbouring genes at the Igf2r locus on the paternal chromosome. The presence of allele-specific histone methylation at the Igf2r locus suggests Air also mediates silencing via chromatin modification.


Xist and X-chromosome inactivation

The inactivation of an X-chromosome in female placental mammals is directed by one of the earliest and best characterized long ncRNAs, Xist. The expression of Xist from the future inactive X-chromosome, and its subsequent coating of the inactive X-chromosome, occurs during early embryonic stem cell differentiation. Xist expression is followed by irreversible layers of chromatin modifications that include the loss of the histone (H3K9) acetylation and H3K4 methylation that are associated with active chromatin, and the induction of repressive chromatin modifications including H4 hypoacetylation, H3K27 trimethylation, H3K9 hypermethylation and H4K20 monomethylation as well as H2AK119 monoubiquitylation. These modifications coincide with the transcriptional silencing of the X-linked genes. Xist RNA also localises the histone variant macroH2A to the inactive X-chromosome. Additional ncRNAs are also present at the Xist locus, including an antisense transcript, Tsix, which is expressed from the future active chromosome and is able to repress Xist expression by the generation of endogenous siRNA. Together these ncRNAs ensure that only one X-chromosome is active in female mammals.


Telomeric non-coding RNAs

Telomeres form the terminal region of mammalian chromosomes, are essential for chromosome stability and aging, and play central roles in diseases such as cancer. Telomeres had long been considered transcriptionally inert DNA-protein complexes until it was shown in the late 2000s that telomeric repeats may be transcribed as telomeric RNAs (TelRNAs) or telomeric repeat-containing RNAs. These ncRNAs are heterogeneous in length, are transcribed from several sub-telomeric loci, and physically localise to telomeres. Their association with chromatin, which suggests an involvement in regulating telomere-specific heterochromatin modifications, is repressed by SMG proteins that protect chromosome ends from telomere loss. In addition, TelRNAs block telomerase activity in vitro and may therefore regulate telomerase activity. Although early, these studies suggest an involvement for telomeric ncRNAs in various aspects of telomere biology.


In regulation of DNA replication timing and chromosome stability

Asynchronously replicating autosomal RNAs (ASARs) are very long (~200 kb) non-coding RNAs that are non-spliced and non-polyadenylated, and are required for normal DNA replication timing and chromosome stability. Deletion of any one of the genetic loci containing ASAR6, ASAR15, or ASAR6-141 results in the same phenotype of delayed replication timing and delayed mitotic condensation (DRT/DMC) of the entire chromosome. DRT/DMC results in chromosomal segregation errors that lead to an increased frequency of secondary rearrangements and an unstable chromosome. Similar to Xist, ASARs show random monoallelic expression and exist in asynchronous DNA replication domains. Although the mechanism of ASAR function is still under investigation, it is hypothesized that they work via mechanisms similar to those of the Xist lncRNA, but on smaller autosomal domains, resulting in allele-specific changes in gene expression.

Incorrect repair of DNA double-strand breaks (DSBs), leading to chromosomal rearrangements, is one of the primary causes of oncogenesis. A number of lncRNAs are crucial at different stages of the main pathways of DSB repair in eukaryotic cells: nonhomologous end joining (NHEJ) and homology-directed repair (HDR). Gene mutations or variation in expression levels of such RNAs can lead to local DNA repair defects, increasing the frequency of chromosome aberrations. Moreover, it was demonstrated that some RNAs could stimulate long-range chromosomal rearrangements.


Structure

It took over two decades after the discovery of the first human long non-coding transcripts for the functional significance of lncRNA structures to be fully recognized. Early structural studies led to the proposal of several hypotheses for classifying lncRNA architectures. One hypothesis suggests that lncRNAs may feature a compact tertiary structure, similar to ribozymes like the ribosome or self-splicing introns. Another possibility is that lncRNAs could have structured protein-binding sites arranged in a decentralized scaffold, lacking a compact core. A third hypothesis posits that lncRNAs might exhibit a largely unstructured architecture, with loosely organized protein-binding domains interspersed with long regions of disordered single-stranded RNA. Studying the tertiary structure of lncRNAs by conventional methods such as X-ray crystallography, cryo-EM and nuclear magnetic resonance (NMR) is still hampered by their size and conformational dynamics, and by the fact that too little is known about their mechanisms to reconstruct stable and functionally active lncRNA-ribonucleoprotein complexes. However, some pioneering studies showed that lncRNAs can already be studied by low-resolution single-particle and in-solution methods, such as atomic force microscopy (AFM) and small-angle X-ray scattering (SAXS), in some cases even in complexes with small-molecule modulators. For instance, lncRNA MEG3 was shown to regulate the transcription factor p53 thanks to its compact structured core. Moreover, lncRNA Braveheart (Bvht) was shown to have a well-defined, albeit flexible, 3D structure that is remodeled upon binding CNBP (Cellular Nucleic-acid Binding Protein), which recognizes distal domains in the RNA. Finally, Xist, a master regulator of X-chromosome inactivation, was shown to specifically bind a small-molecule compound, which alters the conformation of the Xist RepA motif and displaces two known interacting protein factors (PRC2 and SPEN) from the RNA. By this mechanism of action, the compound abrogates the initiation of X-chromosome inactivation.


See also

* List of long non-coding RNA databases
* NONCODE
* Pinc
* Sphinx (gene)
* VIS1
* ZNRD1-AS1
* Noncoding RNA Activated by DNA Damage

