Cancer Genome Atlas

The Cancer Genome Atlas (TCGA) is a project to catalogue the genomic alterations responsible for cancer using genome sequencing and bioinformatics. The overarching goal was to apply high-throughput genome analysis techniques to improve the ability to diagnose, treat, and prevent cancer through a better understanding of the genetic basis of the disease. TCGA was supervised by the National Cancer Institute's Center for Cancer Genomics and the National Human Genome Research Institute, and was funded by the US government. A three-year pilot project, begun in 2006, focused on characterization of three types of human cancers: glioblastoma multiforme, lung squamous carcinoma, and ovarian serous adenocarcinoma. In 2009, it expanded into Phase II, which planned to complete the genomic characterization and sequence analysis of 20–25 different tumor types by 2014. Ultimately, TCGA surpassed that goal, characterizing 33 cancer types including 10 rare cancers. The project initially set out to collect and characterize 500 patient samples, more than most genomics studies of its time, and used a variety of different molecular techniques. Techniques included gene expression profiling, copy number variation profiling, SNP genotyping, genome-wide DNA methylation profiling, microRNA profiling, and exon sequencing. Because of the constraints of nascent technology and cost at the start of the project, the early work relied on array-based technologies and limited targeted gene sequencing. During Phase II, TCGA was able to begin performing whole exome and whole transcriptome sequencing on all cases and whole genome sequencing on 10% of the cases used in the project.


Goals

The goal of TCGA's pilot project was to establish an infrastructure to collect, molecularly characterize, and analyze 500 cancers and matched controls. The work required extensive cooperation among a team of scientists from various institutions and assessment of multiple burgeoning high-throughput technologies. TCGA wanted not only to generate high-quality and biologically meaningful genomic data, but also to make that data freely available to the cancer research community. Three tumor types were explored during the pilot phase: glioblastoma multiforme (GBM), high-grade serous ovarian adenocarcinoma, and lung squamous carcinoma. Following the success of the pilot phase, TCGA expanded its effort to characterize additional cancer types and provide a rich and large genomic data set for further cancer research discovery.


Management

TCGA was co-managed by scientists and managers from the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). With the expansion of TCGA from the pilot phase to Phase II in October 2009, NCI created a TCGA Program Office to help manage the project. Dr. Jean Claude Zenklusen has been the director of the office since August 2013. The TCGA Program Office was responsible for the operation of six Genome Characterization Centers, seven Genome Data Analysis Centers, the Biospecimen Core Resource, the Data Coordinating Center, and approximately one third of the sequencing done for the project by the three Genome Sequencing Centers. In addition, the TCGA Program Office was responsible for coordinating the accrual of tissues for TCGA. Dr. Carolyn Hutter, project manager for NHGRI, directed two thirds of the sequencing at the Genome Sequencing Centers. Members of the NCI and NHGRI teams, along with principal investigators funded by the project, made up the Steering Committee. The Steering Committee was tasked with overseeing the scientific validity of the project, while the NCI/NHGRI administrative team ensured that the scientific progress and goals of the project were met, that the project was completed on time and on budget, and that the various components of the project worked together.


Tissue accrual

Tissue requirements varied from tissue type to tissue type and from cancer type to cancer type. Disease experts from the project's Disease Working Groups helped to define the characteristics of the typical tissue samples accrued as "standard of care" in the United States and how TCGA could best utilize the tissue. For example, the Brain Disease Working Group determined that samples containing more than 50% necrosis would not be suitable for TCGA and that 80% tumor nuclei were required in the viable portion of the tumor. TCGA followed some general guidelines as a starting point for collecting samples from any type of tumor, including a minimum of 200 mg in size, no less than 80% tumor nuclei, and a matched source of germline DNA (such as blood or purified DNA). In addition, institutions submitting tissues to TCGA were required to include a minimal clinical data set as defined by the Disease Working Group, signed consents approved by their institution's IRB, and a material transfer agreement with TCGA.

In 2009, NCI removed approximately $130 million of American Recovery and Reinvestment Act (ARRA) funds from the NCI's "Prime Contract" with Science Applications International Corporation (SAIC) to fund tissue accrual and a variety of other activities through the NCI Office of Acquisition. Of this, $42 million was available for tissue accrual through NCI using "Requests for Quotations" (RFQs) and "Requests for Proposals" (RFPs) to generate purchase orders and contracts, respectively. RFQs were primarily used for the collection of retrospective samples from established banks, while RFPs were used for the prospective collection of samples.

TCGA finalized sample collection in December 2013, with nearly 20,000 biospecimens. Institutions that contributed samples to TCGA were paid and gained advance access to molecular data generated on their samples, while maintaining a link between the TCGA unique identifier and their own unique identifier. This permitted contributing institutions to link back to the clinical data for their samples and to enter into collaborations with other institutions that had similar data on TCGA samples, thus increasing the power of outcome analysis.
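
The general sample guidelines in this section (at least 200 mg of tissue, no less than 80% tumor nuclei, a matched germline DNA source) can be read as a simple eligibility check. The Python sketch below is illustrative only: the `TissueSample` record, its field names, and the `eligibility_problems` function are hypothetical and do not reflect any actual TCGA software or schema. It also assumes that the 50% necrosis limit, which the text attributes specifically to the Brain Disease Working Group, is applied to every sample.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the text above.
MIN_MASS_MG = 200
MIN_TUMOR_NUCLEI_PCT = 80
# Assumption: the Brain Disease Working Group's necrosis limit is applied to all samples.
MAX_NECROSIS_PCT = 50


@dataclass
class TissueSample:
    """Hypothetical sample record; field names are illustrative, not TCGA's schema."""
    sample_id: str
    mass_mg: float
    tumor_nuclei_pct: float
    necrosis_pct: float
    has_matched_germline: bool


def eligibility_problems(sample: TissueSample) -> List[str]:
    """Return the reasons a sample misses the guidelines (an empty list means it passes)."""
    problems = []
    if sample.mass_mg < MIN_MASS_MG:
        problems.append("mass below 200 mg")
    if sample.tumor_nuclei_pct < MIN_TUMOR_NUCLEI_PCT:
        problems.append("tumor nuclei below 80%")
    if sample.necrosis_pct > MAX_NECROSIS_PCT:
        problems.append("necrosis above 50%")
    if not sample.has_matched_germline:
        problems.append("no matched germline DNA")
    return problems


if __name__ == "__main__":
    # Made-up example values, not real TCGA data.
    candidate = TissueSample("EXAMPLE-001", mass_mg=250, tumor_nuclei_pct=85,
                             necrosis_pct=60, has_matched_germline=True)
    print(eligibility_problems(candidate))
```

Returning a list of reasons rather than a single boolean keeps each criterion visible, which matters when a sample fails more than one check. The clinical data, consent, and material transfer agreement requirements are administrative and are left out of this sketch.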


Organization

TCGA managed a number of different types of centers that were funded to generate and analyze data. TCGA was the first large-scale genomics project funded by the NIH to devote significant resources to bioinformatic discovery. The NCI devoted 50% of TCGA appropriated funds, approximately $12M/year, to fund bioinformatic discovery. Genome Characterization Centers and Genome Sequencing Centers generated data. Two separate Genome Data Analysis Centers utilized the data for bioinformatic discovery. Two centers were funded to isolate biomolecules from patient samples and one center was funded to store the data. This workflow has evolved over the years and is now known as NCI's Genome Characterization Pipeline.


Biospecimen Core Resource

The Biospecimen Core Resource (BCR) was responsible for verifying the quality and quantity of tissue shipped by tissue source sites, isolating DNA and RNA from the samples, performing quality control of these biomolecules, and shipping processed samples to the Genome Sequencing Centers (GSCs) and Genome Characterization Centers (GCCs). The International Genomics Consortium was awarded the contract to initiate the BCR for the pilot project. There were two BCRs funded by NCI at the start of the full project: Nationwide Children's Hospital and the International Genomics Consortium. The BCRs were recompeted in 2010 and Nationwide Children's Hospital was awarded the contract.


Genome Sequencing Centers

Three Genome Sequencing Centers (GSCs) were co-funded by NCI and NHGRI: the Broad Institute, the McDonnell Genome Institute at Washington University, and Baylor College of Medicine. All three of these sequencing centers shifted from Sanger sequencing to next-generation sequencing (NGS). A variety of NGS technologies were tested and implemented simultaneously.


Genome Characterization Centers

The NCI funded seven Genome Characterization Centers (GCCs): the Broad Institute, Harvard, University of North Carolina, MD Anderson Cancer Center, Van Andel Institute, Baylor College of Medicine, and the British Columbia Cancer Center.


Data Coordinating Center

The Data Coordinating Center (DCC) was the central repository for TCGA data. It was also responsible for the quality control of data entering the TCGA database. The DCC also maintained the TCGA Data Portal, where users could access processed TCGA data. This work was performed under contract by bioinformatics scientists and developers from SRA International, Inc. The DCC did not host raw sequencing data, however. NCI's Cancer Genomics Hub (CGHub) was the secure repository for storing, cataloging, and accessing sequence-related data. This work was performed by scientists and staff at the University of California, Santa Cruz Genomics Institute. Since 2017, all types of data have been hosted by NCI's Genomic Data Commons.


Genome Data Analysis Centers

Seven Genome Data Analysis Centers (GDACs) funded by the NCI/NHGRI were responsible for the integration of data across all characterization and sequencing centers as well as biological interpretation of TCGA data. The GDACs included the Broad Institute, University of North Carolina, Oregon Health and Science University, University of California, Santa Cruz, MD Anderson Cancer Center, Memorial Sloan Kettering Cancer Center, and the Institute for Systems Biology. All seven GDACs worked together to develop an integrated data analysis pipeline.


Cancer Types Selected for Study

A preliminary list of tumors for TCGA to study was generated by compiling incidence and survival statistics from the SEER Cancer Statistics website. In addition, the current U.S. “Standard of Care” was considered when choosing the top 25 tumor types, as TCGA was targeting tumor types where resection prior to adjunct therapy was the standard of care. Availability of samples also played a critical role in determining which tumor types to study and the order in which tumor projects were started; the more common the cancer type, the more likely that samples would be accrued quickly for study. As a result, common tumor types, such as colon, lung, and breast cancer, became the first tumor types entered into the project, before rare tumor types.

Cancer types selected for study by TCGA included: lung squamous cell carcinoma, kidney papillary carcinoma, clear cell kidney carcinoma, breast ductal carcinoma, renal cell carcinoma, cervical cancer (squamous), colon adenocarcinoma, stomach adenocarcinoma, rectal carcinoma, hepatocellular carcinoma, head and neck (oral) squamous cell carcinoma, thyroid carcinoma, bladder urothelial carcinoma (nonpapillary), uterine corpus (endometrial carcinoma), pancreatic ductal adenocarcinoma, acute myeloid leukemia, prostate adenocarcinoma, lung adenocarcinoma, cutaneous melanoma, breast lobular carcinoma, lower grade glioma, esophageal carcinoma, ovarian serous cystadenocarcinoma, adrenocortical carcinoma, diffuse large B-cell lymphoma, paraganglioma and pheochromocytoma, cholangiocarcinoma, uterine carcinosarcoma, uveal melanoma, thymoma, sarcoma, mesothelioma, and testicular germ cell cancer.

TCGA began accruing samples for all of these tumor types simultaneously. The tumor types with the most samples accrued were entered into the characterization pipeline first. The rarer tumor types, which were more difficult to accrue, and tumor types for which TCGA could not identify a source of high-quality samples were entered into the TCGA production pipeline in the second year of the project. This gave the TCGA Program Office additional time to accrue sufficient samples for the project.

Data from TCGA and the Mouse Organogenesis Cell Atlas (MOCA) have been analyzed with machine learning and deep learning to compare cancer cells with embryonic cells in early development and differentiation and to find correlations between them. These methods were also applied to distinguish changes in gene expression patterns between various types of tumors from an unknown source.


TCGA Publications


Glioblastoma multiforme

In 2008, TCGA published its first results on glioblastoma multiforme (GBM) in ''Nature''. These first results characterized and analyzed 91 tumor-normal matched pairs. While 587 biospecimens were collected for the study, most were rejected during quality control: the tumor samples needed to contain at least 80% tumor nuclei and no more than 50% necrosis, and a secondary pathology assessment had to agree that the original diagnosis of GBM was accurate. A last batch of samples was excluded because the DNA or RNA collected was not of sufficient quality or quantity to be analyzed by all of the different platforms used in the study.

All of the data from this study, as well as data collected since the publication, was made publicly available through TCGA's Data Coordinating Center (DCC) and later moved to the Genomic Data Commons. Most of the processed TCGA data is completely open access. For data that could potentially identify specific patients, users apply for controlled-data access to the Data Access Committee (DAC), which evaluates whether the end user is a bona fide researcher and is asking a legitimate scientific question that merits access to individual-level data. Data access credentials are now managed through NIH's dbGaP.

Since the publication of the first marker paper, several analysis groups within the TCGA Network have presented more detailed analyses of the glioblastoma data. An analysis group led by Roel Verhaak, PhD, Katherine A. Hoadley, PhD, and D. Neil Hayes, MD, successfully correlated glioma gene expression subtypes with genomic abnormalities. The DNA methylation data analysis team, led by Houtan Noushmehr, PhD, and Peter Laird, PhD, identified a distinct subset of glioma samples that displayed concerted hypermethylation at a large number of loci, indicating the existence of a glioma-CpG island methylator phenotype (G-CIMP). G-CIMP tumors belong to the proneural subgroup and were tightly associated with IDH1 somatic mutations.


Serous ovarian adenocarcinoma

TCGA reported on mRNA expression, microRNA expression, promoter methylation, DNA copy number, and exome sequencing of 316 tumor samples of high-grade serous ovarian cancer in ''Nature'' in June 2011. The researchers found mutations of the gene ''TP53'' in an overwhelming 96% of the cases analyzed. Recurrent mutations at lower frequency were found in a handful of other genes, including ''NF1'', ''BRCA1'', ''BRCA2'', ''RB1'' and ''CDK12''. TCGA researchers were also able to identify gene expression patterns that correlated with patient survival. They defined four subtypes of the cancer according to gene expression and DNA methylation patterns: immunoreactive, differentiated, proliferative, and mesenchymal. They also identified 68 genes as potential drug targets.
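
A figure such as "mutations in 96% of the cases" is a per-gene mutation frequency: the fraction of analyzed tumors that carry at least one mutation in that gene. The Python sketch below shows that arithmetic on toy data. The sample identifiers and mutation calls are invented for illustration and are not TCGA results, and `mutation_frequency` is a hypothetical helper rather than part of any TCGA analysis pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def mutation_frequency(calls: Iterable[Tuple[str, str]], n_samples: int) -> Dict[str, float]:
    """Fraction of samples with at least one mutation in each gene.

    `calls` holds (sample_id, gene) pairs; several mutations of the same gene
    in the same sample count once.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    unique_pairs: Set[Tuple[str, str]] = set(calls)  # collapse repeat calls per sample
    samples_per_gene = Counter(gene for _, gene in unique_pairs)
    return {gene: count / n_samples for gene, count in samples_per_gene.items()}


if __name__ == "__main__":
    # Toy data for four hypothetical tumors (T1..T4); not real TCGA calls.
    toy_calls = [
        ("T1", "TP53"), ("T2", "TP53"), ("T3", "TP53"), ("T3", "TP53"),
        ("T1", "BRCA1"), ("T4", "NF1"),
    ]
    for gene, freq in sorted(mutation_frequency(toy_calls, n_samples=4).items()):
        print(f"{gene}: {freq:.0%}")
```

The denominator is the number of tumors analyzed, not the number of mutation calls, so a sample with no calls still counts toward `n_samples`.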


Colorectal carcinoma

TCGA reported on the exome sequence, DNA copy number, promoter methylation, and messenger RNA characterization of 276 tumor samples of colon and rectal cancers in ''Nature'' in July 2012. Of these samples, 97 also underwent ultra-low coverage whole genome sequencing. TCGA researchers discovered the same type of alterations in colon and rectal tumors, indicating that they are a single type of cancer. Some differences, such as hypermethylation, were apparent in tumors originating in the right colon. A subset of the tumors was found to be hypermutated; a majority of those also had high microsatellite instability. Two dozen significantly mutated genes and recurring copy number alterations were found. The study suggested new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.


Phase II: Expanding TCGA to 33 Cancer Types

Fueled by the American Recovery and Reinvestment Act of 2009, NIH extended TCGA to cover 20 types of cancer. This included an effort to study rare cancers, which was enabled with support from patients, patient advocacy groups, and doctors. Starting in 2011, TCGA began holding Annual Scientific Symposiums to discuss and share novel biological discoveries on cancer, analytical methods, and translational approaches using the data. In December 2013, TCGA concluded sample collection, having shipped and processed over 20,000 specimens. By the project’s completion, TCGA had published “marker papers” describing the characterization and basic analyses covering 33 cancer types. For several cancer types, such as bladder urothelial carcinoma and GBM, additional cases were collected and a second analysis was performed.


Pan-Cancer Atlas Analyses

In 2013, TCGA published an initial Pan-Cancer analysis describing the "mutational landscape", defined as frequently recurring mutations identified from exome sequencing of 3,281 tumors from 12 commonly occurring cancer subtypes. The twelve subtypes studied were breast adenocarcinoma, lung adenocarcinoma, lung squamous cell carcinoma, endometrial carcinoma, glioblastoma multiforme, squamous cell carcinoma of the head and neck, colon cancer, rectal cancer, bladder cancer, kidney clear cell carcinoma, ovarian carcinoma and acute myeloid leukemia. In 2018, the TCGA Research Network published what is collectively known as the Pan-Cancer Atlas: a collection of 35 papers summarizing the work accomplished by TCGA and describing overarching themes of cancer biology elucidated by analyzing all TCGA data as a whole. The main topics are (1) cell-of-origin patterns, in which tumors are grouped and analyzed by biological system or histological subtype; (2) oncogenic processes, which cover the complex downstream impacts that alterations may have on molecular pathways and the microenvironment; and (3) signaling pathways, which cover the roles different pathways play in different cancers and their potential vulnerabilities. The completion of the Pan-Cancer Atlas marked the official end of TCGA as a program, though the data, analysis methods, and other resources produced by TCGA continue to serve researchers. For example, TCGA’s whole-genome data were analyzed as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG), an international effort to analyze 2,600 cancer whole genomes to understand somatic and germline variations in both coding and non-coding regions.


Analysis of Non-coding Regions

TCGA researchers also set out to systematically study the non-coding regions of the genome in multiple cancers. To gain insights into gene dysregulation in cancer, the team applied the assay for transposase-accessible chromatin using sequencing (ATAC-seq) to 410 TCGA tumor samples covering 23 primary cancers. ATAC-seq is a low-cost method for identifying regions of open or active chromatin and the positions of DNA-binding proteins. Through ATAC-seq, researchers identified tens of thousands of potential DNA regulatory elements specific to different cancers and cell types. This provided insights into how gene dysregulation could help drive cancer initiation and progression. Understanding the chromatin accessibility of known immune cell-specific regulatory elements also provided clues about the immune microenvironment and the availability of immunotherapy targets. The study, “The chromatin landscape of primary human cancers,” was published in 2018 in Science.


See also

* Cancer Genome Project at the Wellcome Trust Sanger Institute
* International Cancer Genome Consortium
* List of biological databases



External links


* The Cancer Genome Atlas
* NCI Office of Cancer Genomics
* Cancer Genomics Hub at UC Santa Cruz
* NCI Wiki