Ridge (biology)

Ridges (regions of increased gene expression) are domains of the genome with high gene expression; the opposite of ridges are antiridges. The term was first used by Caron et al. in 2001. Characteristics of ridges are:
* Gene dense
* Contain many C and G nucleobases
* Genes have short introns
* High SINE repeat density
* Low LINE repeat density


Discovery

Clustering of genes in prokaryotes has been known for a long time. Their genes are grouped in operons, and the genes within an operon share a common promoter unit. These genes are mostly functionally related. The prokaryotic genome is relatively simple and compact. In eukaryotes the genome is huge and only a small part of it consists of functional genes. Furthermore, eukaryotic genes are not arranged in operons, except in nematodes and trypanosomes, whose operons differ from prokaryotic operons. In eukaryotes each gene has its own transcription regulation site, so genes do not have to be in close proximity to be co-expressed. It was therefore long assumed that eukaryotic genes were randomly distributed across the genome because of the high rate of chromosome rearrangements. Once complete genome sequences became available, however, it became possible to determine the exact location of a gene and measure its distance to other genes. The first eukaryotic genome ever sequenced was that of ''Saccharomyces cerevisiae'', or budding yeast, in 1996. Half a year later, Velculescu et al. (1997) published a study in which they integrated SAGE (serial analysis of gene expression) data with the newly available genome map. Different genes are active at different stages of the cell cycle, so they used SAGE data from three points in the cell cycle (log phase, S phase-arrested and G2/M phase-arrested cells). Because every yeast gene has its own promoter unit, genes were not expected to cluster near each other, but they did. Clusters were present on all 16 yeast chromosomes. A year later, Cho et al. reported in more detail that certain genes in yeast are located near each other.


Characteristics and function


Co-expression

Cho et al. were the first to determine that clustered genes are expressed at the same levels. They identified transcripts that show cell-cycle-dependent periodicity. Of those genes, 25% were located in close proximity to other genes transcribed at the same point in the cell cycle. Cohen et al. (2000) also identified clusters of co-expressed genes. Caron et al. (2001) made a human transcriptome map of 12 different tissues (cancer cells) and concluded that genes are not randomly distributed across the chromosomes. Instead, genes tend to cluster in groups, sometimes of as many as 39 genes in close proximity. The clusters were not only gene dense: Caron et al. identified 27 clusters of genes with very high expression levels and called them RIDGEs. A typical RIDGE contains 6 to 30 genes per centiray. However, there were major exceptions. Between 40 and 50% of the RIDGEs were less gene dense, and, just as in yeast, these RIDGEs were located in telomeric regions.

Lercher et al. (2002) pointed out some weaknesses in Caron's approach. Tandem duplicates can easily produce clusters of closely spaced genes with high transcription levels. Genes can generate duplicates of themselves that are incorporated into their neighborhood. These duplicates either become a functional part of their parent gene's pathway, or, because they are no longer favored by natural selection, they accumulate deleterious mutations and turn into pseudogenes. Such duplicates are false positives in the search for gene clusters, so they have to be excluded. Lercher et al. excluded neighboring genes that closely resembled each other and then used a sliding window to search for regions of 15 neighboring genes. Gene-dense regions clearly existed, and there was a striking correlation between gene density and high GC content. Some clusters indeed had high expression levels. However, most of the highly expressed regions consisted of housekeeping genes, which are highly expressed in all tissues because they code for basal mechanisms. Only a minority of the clusters contained genes restricted to specific tissues.

Versteeg et al. (2003) used an improved human genome map and better SAGE data to determine the characteristics of RIDGEs more precisely.
Overlapping genes were treated as one gene, and genes without introns were rejected as pseudogenes. Versteeg et al. determined that RIDGEs are very gene dense and have high gene expression, short introns, high SINE repeat density and low LINE repeat density. Clusters of genes with very low transcription levels had the opposite characteristics, so these clusters were called antiridges. LINE repeats are junk DNA that contains an endonuclease cleavage site (TTTTA). Their scarcity in RIDGEs can be explained by natural selection against LINE repeats in ORFs, because their endonuclease sites can cause deleterious mutations in the genes. Why SINE repeats are abundant is not yet understood. Contrary to Lercher's analysis, Versteeg et al. also concluded that the transcription levels of many genes in RIDGEs (for example, a cluster on chromosome 9) can vary strongly between tissues. Lee et al. (2003) analyzed trends in gene clustering across species. They compared ''Saccharomyces cerevisiae'', ''Homo sapiens'', ''Caenorhabditis elegans'', ''Arabidopsis thaliana'' and ''Drosophila melanogaster''. The degree of clustering, measured as the fraction of genes in loose clusters, was 37%, 50%, 74%, 52% and 68%, respectively. They concluded that pathways whose genes are clustered across many species are rare. They found seven universally clustered pathways:
glycolysis, aminoacyl-tRNA biosynthesis, ATP synthase, DNA polymerase, hexachlorocyclohexane degradation, cyanoamino acid metabolism, and photosynthesis (ATP synthesis in non-plant species). Not surprisingly, these are basic cellular pathways. Lee et al. used very diverse groups of organisms. Within these groups clustering is conserved; for example, the clustering motifs of ''Homo sapiens'' and ''Mus musculus'' are more or less the same.

Spellman and Rubin (2002) made a transcriptome map of ''Drosophila''. Of all assayed genes, 20% were clustered. Clusters consisted of 10 to 30 genes spanning about 100 kilobases. The members of a cluster were not functionally related, and the locations of clusters did not correlate with known chromatin structures. This study also showed that, within clusters, the expression levels of on average 15 genes were much the same across the many experimental conditions used. These similarities were so striking that the authors reasoned that changes in chromatin structure were involved, rather than each gene being individually regulated by its own promoter. Roy et al. (2002) published a similar co-regulation pattern in ''C. elegans'' in the same year. In human invasive ductal breast carcinomas, many clustered genes show the same expression profiles. Roughly 20% of the genes show a correlation with their neighbors. Clusters of co-expressed genes were separated by regions with weaker correlation between genes. These clusters could cover an entire chromosome arm.

Contrary to the previously discussed reports, Johnidis et al. (2005) found that at least some genes within clusters are not co-regulated. Aire is a transcription factor that up-regulates some genes and down-regulates others. It functions in the negative selection of thymocytes that respond to the organism's own epitopes, a process carried out by medullary cells. The genes controlled by Aire were clustered.
Of the genes most activated by Aire, 53 had an Aire-activated neighbor within 200 kb or less. Of the genes most repressed by Aire, 32 had an Aire-repressed neighbor within 200 kb. This degree of clustering is unlikely to have arisen by chance. The authors repeated the screen for the transcriptional regulator CIITA. Neither regulator had the same effect on all genes in a cluster: activated, repressed and unaffected genes were sometimes present in the same cluster. In such cases, the clustering of Aire-regulated genes cannot be explained by all of them being co-regulated. It is therefore not clear whether domains are co-regulated or not.

An effective way to test this is to insert synthetic genes into RIDGEs, antiridges and/or random places in the genome and compare their expression levels. Gierman et al. (2007) were the first to demonstrate co-regulation with this approach. As an insertion construct they used a fluorescent GFP gene driven by the ubiquitously expressed human phosphoglycerate kinase (PGK) promoter. They integrated this construct at 90 different positions in the genome of human HEK293 cells. Constructs inserted in RIDGEs were indeed expressed more highly than those inserted in antiridges, even though all constructs had the same promoter. The authors investigated whether these differences were due to genes in the direct neighborhood of the constructs or to the domain as a whole. Constructs next to highly expressed genes were slightly more expressed than others. When the window was enlarged to the surrounding 49 genes (domain level), however, constructs located in domains with high overall expression showed more than 2-fold higher expression than those in domains with low expression. They also checked whether the construct was expressed at levels similar to those of neighboring genes, and whether such tight co-expression occurred only within RIDGEs.
They found that expression levels were highly correlated within RIDGEs, and that the correlation was almost absent near the ends of RIDGEs and outside them. Earlier observations and the work of Gierman et al. showed that the activity of a domain has a large impact on the expression of the genes located in it, and that the genes within a RIDGE are co-expressed. However, the constructs used by Gierman et al. were regulated by a constitutively active promoter, whereas the genes studied by Johnidis et al. depended on the presence of the Aire transcription factor. The unusual expression of the Aire-regulated genes could partly have been caused by differences in the expression and conformation of the Aire transcription factor itself.
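The window-based analyses described in this section both come down to averaging expression over nearby genes along a chromosome. Lercher et al. scanned runs of 15 neighboring genes, and Gierman et al. made their comparison at the level of the surrounding 49 genes. The Python sketch below illustrates only that general idea and is not the pipeline of either study. It assumes, hypothetically, that a chromosome is given as (position, expression) pairs sorted by position and that tandem duplicates have already been filtered out. It also assumes the `threshold` for calling a window highly expressed is chosen by the user. All function names are illustrative.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical input: one chromosome as (position_bp, expression) pairs,
# sorted by position, with tandem duplicates already removed.
Gene = Tuple[int, float]


def sliding_window_means(expr: List[float], size: int = 15) -> List[float]:
    """Mean expression of every run of `size` consecutive genes."""
    if size <= 0 or size > len(expr):
        return []
    total = sum(expr[:size])
    means = [total / size]
    for i in range(size, len(expr)):
        # Slide by one gene: add the gene entering the window,
        # subtract the gene leaving it.
        total += expr[i] - expr[i - size]
        means.append(total / size)
    return means


def high_windows(expr: List[float], size: int, threshold: float) -> List[int]:
    """Start indices of windows whose mean exceeds a user-chosen threshold."""
    return [i for i, m in enumerate(sliding_window_means(expr, size)) if m > threshold]


def domain_expression(genes: List[Gene], site: int, k: int = 49) -> float:
    """Mean expression of the k genes closest to an insertion site
    (a domain-level context; raises StatisticsError if `genes` is empty)."""
    nearest = sorted(genes, key=lambda g: abs(g[0] - site))[:k]
    return mean(e for _, e in nearest)


# Toy profile: 20 weak genes, 15 strong genes, 20 weak genes.
expr = [1.0] * 20 + [10.0] * 15 + [1.0] * 20
print(high_windows(expr, 15, 9.0))  # [19, 20, 21]

genes = [(i * 1000, float(i)) for i in range(100)]
print(domain_expression(genes, 50_000, k=3))  # 50.0
```

The three window starts returned all overlap the strongly expressed block. `domain_expression` sorts every gene by distance on each call, which is simple but O(n log n) per query. A genome-wide scan would more likely locate the site with `bisect` and expand outward.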


Functional relation

It was known before the genomic era that clustered genes tend to be functionally related. Abderrahim et al. (1994) had shown that all the genes of the major histocompatibility complex are clustered on chromosome 6p21. Roy et al. (2002) showed that, in the nematode ''C. elegans'', genes expressed solely in muscle tissue during the larval stage tend to cluster in small groups of 2–5 genes. They identified 13 such clusters. Yamashita et al. (2004) showed that genes related to specific organ functions tend to cluster. Six liver-related domains contained genes for xenobiotic, lipid and alcohol metabolism. Five colon-related domains contained genes for apoptosis, cell proliferation, ion transport and mucin production. These clusters were very small and their expression levels were low. Brain- and breast-related genes did not cluster. This shows that at least some clusters consist of functionally related genes. However, there are major exceptions: Spellman and Rubin showed that some clusters of co-expressed genes are not functionally related. Clusters thus appear to take very different forms.


Regulation

Cohen et al. found that, in a pair of co-expressed genes, only one promoter has an upstream activating sequence (UAS) associated with that expression pattern. They suggested that a UAS can activate genes that are not immediately adjacent to it. This could explain the co-expression of small clusters, but many clusters contain too many genes to be regulated by a single UAS.

Chromatin changes are a plausible explanation for the co-regulation seen in clusters. Chromatin consists of the DNA strand and the histones attached to it. Regions where chromatin is very tightly packed are called heterochromatin. Heterochromatin very often consists of remnants of viral genomes, transposons and other junk DNA. Because of the tight packing, this DNA is almost inaccessible to the transcription machinery; covering deleterious DNA with proteins is one way in which the cell protects itself. Chromatin containing functional genes often has an open structure in which the DNA is accessible. However, most genes do not need to be expressed all the time, and DNA containing genes that are not needed can be covered with histones. When a gene must be expressed, special proteins can alter the chemical groups attached to the histones (histone modifications), which opens the chromatin structure. When the chromatin of one gene is opened, the chromatin of the adjacent genes is opened as well, until the modification meets a boundary element. In this way genes in close proximity are expressed at the same time, so genes are clustered in “expression hubs”. Consistent with this model, Gilbert et al. (2004) showed that RIDGEs are mostly located in open chromatin structures. However, Johnidis et al. (2005) showed that genes in the same cluster can be expressed very differently. Exactly how eukaryotic gene regulation and the associated chromatin changes work is still very unclear, and there is no consensus on it. A clear picture of the mechanism behind gene clusters first requires a better understanding of how chromatin and gene regulation work. Furthermore, most papers that identified clusters of co-regulated genes focused on transcription levels, and few focused on clusters regulated by the same transcription factors. Johnidis et al. discovered unexpected phenomena when they did.
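The spreading model described above can be illustrated with a deliberately simplified toy simulation. In that model, a histone modification opens the chromatin of one gene and propagates to its neighbors until it meets a boundary element. Everything in this Python sketch is an assumption made for illustration. The chromosome is treated as a row of slots, each holding either a gene (`"G"`) or a hypothetical boundary element (`"B"`), and opening is all-or-nothing. Real chromatin regulation is far less tidy, as the text notes.

```python
from typing import List

GENE = "G"
BOUNDARY = "B"  # hypothetical boundary element that stops spreading


def open_domain(layout: List[str], activated: int) -> List[bool]:
    """Toy model: open chromatin spreads from an activated gene in both
    directions along the chromosome until it reaches a boundary element."""
    if layout[activated] != GENE:
        raise ValueError("activation must start at a gene")
    is_open = [False] * len(layout)
    is_open[activated] = True
    for step in (-1, 1):  # spread to the left, then to the right
        i = activated + step
        while 0 <= i < len(layout) and layout[i] != BOUNDARY:
            is_open[i] = True
            i += step
    return is_open


def co_expressed_genes(layout: List[str], activated: int) -> List[int]:
    """Indices of genes that end up in open chromatin, i.e. the genes
    this toy model predicts to be expressed together."""
    state = open_domain(layout, activated)
    return [i for i, slot in enumerate(layout) if slot == GENE and state[i]]


print(co_expressed_genes(list("GGBGGGBG"), 4))  # [3, 4, 5]
```

With the layout G G B G G G B G, activating the gene at index 4 opens only indices 3 to 5. Those three genes form one co-expressed cluster, while the genes beyond the two boundary elements stay closed.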


Origins

The first models that tried to explain gene clustering naturally focused on operons, because operons were discovered before eukaryotic gene clusters. In 1999 Lawrence proposed a model for the origin of operons. This selfish operon model suggests that individual genes were grouped together by vertical and horizontal transfer. They were then preserved as a single unit because that was beneficial for the genes, not necessarily for the organism. The model predicts that gene clusters should be conserved between species. This is not the case for many operons or for the gene clusters seen in eukaryotes. According to Eichler and Sankoff, the two main processes in eukaryotic chromosome evolution are 1) rearrangement of chromosomal segments and 2) localized duplication of genes. Clustering could be explained if all genes in a cluster originated from tandem duplicates of a common ancestor. If all co-expressed genes in a cluster had evolved from a common ancestral gene, they would be expected to be co-expressed because they would all have comparable promoters. However, gene clustering is a very common trend in genomes, and it is not clear how this duplication model could explain all of it. Furthermore, many clustered genes are not homologous. How did evolutionarily unrelated genes come into close proximity in the first place? Either some force brings functionally related genes near each other, or the genes came near each other by chance. Singer et al. proposed that genes came into close proximity through random recombination of genome segments, and that this proximity was conserved whenever functionally related genes ended up next to each other. They determined all possible recombination sites between genes of human and mouse. They then compared the clustering of the mouse and human genomes and checked whether recombination had occurred at the potential recombination sites.
Recombination between genes of the same cluster turned out to be very rare. So, once a functional cluster has formed, recombination within it appears to be suppressed. On the sex chromosomes the number of clusters is very low in both human and mouse. The authors attributed this to the low rate of chromosomal rearrangement on sex chromosomes. Open chromatin regions are active regions, and genes are more likely to be transferred into them; genes from organelle and viral genomes are inserted into these regions more often. In this way non-homologous genes can be packed together into a small domain. Some regions of the genome may also be better suited to important genes. It is important for the cell that genes responsible for basal functions are protected from recombination. In yeast and worms, essential genes have been observed to cluster in regions with a low recombination rate. Alternatively, genes may have come into close proximity by chance. Other models have been proposed, but none of them explains all observed phenomena. It is clear that once clusters have formed, they are conserved by natural selection. However, a precise model of how genes came into close proximity is still lacking. Most present-day clusters must have formed relatively recently, because only seven clusters of functionally related genes are conserved between phyla. Some of these differences can be explained by the fact that gene expression is regulated very differently in different phyla. For example, vertebrates and plants use DNA methylation, whereas it is absent in yeast and flies.
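The comparison by Singer et al. rests on a simple question: for each pair of adjacent genes in one genome, is the pair still adjacent in the other? A broken adjacency marks a site where a rearrangement has separated two genes. The Python sketch below shows only that adjacency check and is not the authors' actual pipeline. It assumes, as a hypothetical simplification, that each genome is a single ordered list of shared orthologous gene names and that gene orientation can be ignored. Real genomes have many chromosomes and strands and contain unshared genes. The gene names and the cluster assignment are made up for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def adjacencies(order: List[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of neighboring genes (orientation ignored)."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoints(ref: List[str], other: List[str]) -> List[Tuple[str, str]]:
    """Neighboring pairs in `ref` that are not neighbors in `other`:
    candidate sites where a rearrangement separated two genes."""
    kept = adjacencies(other)
    return [(a, b) for a, b in zip(ref, ref[1:]) if frozenset((a, b)) not in kept]


def breaks_within_clusters(ref: List[str], other: List[str],
                           cluster_of: Dict[str, int]) -> int:
    """Number of breakpoints that separate two genes of the same cluster."""
    return sum(
        1
        for a, b in breakpoints(ref, other)
        if a in cluster_of and cluster_of[a] == cluster_of.get(b)
    )


# Toy example with hypothetical gene names A-F shared by both genomes.
human = ["A", "B", "C", "D", "E", "F"]
mouse = ["A", "B", "C", "F", "E", "D"]
clusters = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
print(breakpoints(human, mouse))                       # [('C', 'D')]
print(breaks_within_clusters(human, mouse, clusters))  # 0
```

In this toy example the only breakpoint falls between two different clusters, while genes within each cluster keep their neighbors. This mirrors the pattern described above, although with six genes it is of course only an illustration.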


See also

* Chromatin
* DNA sequence
* Transcription factor

