
Linked-read sequencing, a type of DNA sequencing technology, uses a specialized technique that tags DNA molecules with unique barcodes before fragmenting them. In traditional short-read sequencing, DNA is broken into small fragments that are sequenced individually, and the resulting short reads make it difficult to accurately reconstruct the original DNA sequence. The unique barcodes of linked-read sequencing instead allow scientists to link together DNA fragments that come from the same DNA molecule. A pivotal benefit of this technology lies in the small quantities of DNA required for a large genome information output, effectively combining the advantages of long-read and short-read technologies.
History
This sequencing method was originally developed by 10x Genomics in 2015, and was launched under the name 'GemCode' or 'Chromium'. GemCode employed a method of gel bead-based barcoding to amalgamate short DNA fragments.
The longer fragments produced by this could then be sequenced using validated technology such as Illumina next-generation sequencing.
An updated version of linked-read sequencing was introduced by the same company in 2018, termed 'Linked-Reads V2'. While GemCode uses a single barcode for tagging of both the gel bead and the DNA fragment, Linked-Reads V2 uses separate barcodes for improved detection of genetic variants.
The group that developed the linked-read sequencing technology published its first paper on the technology in 2016. The authors initially developed linked-read sequencing to sequence the genomes of both healthy individuals and cancer patients in order to determine somatic mutations, copy number variations, and structural variations in cancer genomes.
Later that year, another research group combined linked-read sequencing technology with long-read sequencing technology to assemble a human genome.
Both studies demonstrated the utility of linked-read sequencing in comprehensive genome analysis and in understanding genetic diseases. However, in 2019, a lawsuit relating to patent infringement resulted in 10x Genomics discontinuing their line of linked-read products.
Method
Overview
Linked-read sequencing is microfluidic-based, and only needs nanograms of input DNA.
One nanogram of DNA can be distributed across more than 100,000 droplet partitions, where DNA fragments are barcoded and subjected to polymerase chain reactions (PCR).
As a result, DNA fragments (or reads) that share the same barcode can be grouped as coming from one single long input DNA molecule.
In this way, long-range information can be assembled from short reads.
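The reasoning behind barcode grouping can be illustrated with a rough back-of-the-envelope estimate. The sketch below assumes a human sample, an approximate haploid genome size of 3.1 Gb weighing about 3.3 pg, and input molecules of 100 kb; these constants and the function name `per_partition_load` are illustrative assumptions, not figures from any specific protocol.

```python
# Rough estimate of how much DNA lands in each droplet partition.
# All constants are illustrative assumptions, not measured values.

HAPLOID_GENOME_BP = 3.1e9      # approximate human haploid genome size (bp)
PG_PER_HAPLOID_GENOME = 3.3    # approximate mass of one haploid human genome (pg)


def per_partition_load(input_ng, n_partitions, molecule_len_bp):
    """Return (molecules per partition, fraction of a haploid genome per partition)."""
    genome_copies = input_ng * 1000.0 / PG_PER_HAPLOID_GENOME   # ng -> pg -> copies
    total_bp = genome_copies * HAPLOID_GENOME_BP
    molecules = total_bp / molecule_len_bp
    molecules_per_partition = molecules / n_partitions
    genome_fraction = molecules_per_partition * molecule_len_bp / HAPLOID_GENOME_BP
    return molecules_per_partition, genome_fraction


mols, frac = per_partition_load(input_ng=1.0, n_partitions=100_000,
                                molecule_len_bp=100_000)
print(f"~{mols:.0f} molecules per partition, "
      f"covering ~{frac:.2%} of a haploid genome")
```

Under these assumptions each partition holds well under 1% of a haploid genome, so two reads that carry the same barcode and map close together are very likely to come from the same input molecule rather than from two unrelated ones.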
Steps of linked-read sequencing:
# Sample Preparation: DNA is extracted from a sample (e.g., blood) and cut into fragments 50 to 200 kilobase pairs long.
# Barcode Sequencing: each DNA fragment is labelled with a unique barcode through a process known as "Gel Bead-In Emulsion" (GEM).
# Library Preparation: barcoded DNA fragments are amplified with PCR to generate sequencing libraries.
# Sequencing: Illumina next-generation sequencing technology is used to generate millions to billions of short sequence reads that represent fragments of the original DNA molecules.
# Barcode Processing: short reads are grouped into longer fragments based on their barcodes.
# Downstream Analysis: processed reads are aligned to a reference genome, or used for de novo assembly of complex genomes, haplotype phasing, or identification of structural variations.
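The barcode processing step can be sketched in a few lines of Python. This is a minimal illustration, assuming each read has already been reduced to a `(barcode, sequence)` pair; the function name `group_reads_by_barcode` and the toy reads are hypothetical.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Collect read sequences under the barcode they were tagged with."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Toy input: three short reads, two of which share a barcode.
reads = [
    ("ACGT", "TTGACC"),
    ("GGCA", "CCATGA"),
    ("ACGT", "GGATCA"),
]
groups = group_reads_by_barcode(reads)
for barcode, sequences in groups.items():
    print(barcode, sequences)
```

Reads in the same group are candidates for having originated from the same long input fragment.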
Barcode Sequencing
During barcode sequencing, high-molecular-weight DNA samples that contain the targeted DNA sequence, ranging from fifty to several hundred kilobases in size, are combined with gel beads containing unique barcodes, enzymes, and sequencing reagents.
A microfluidic device partitions the input DNA molecules into individual nanoliter-sized droplets of water-in-oil emulsion, called GEMs.
Each GEM contains gel beads coated with the same barcode and primers, and a small amount of DNA.
The primers are complementary to specific regions of the DNA molecule, allowing for amplification of the DNA in the droplets through PCR.
The barcodes enable the identification and grouping of sequencing reads that originate from the same long fragment, which is crucial for downstream analysis.
Library Preparation and Sequencing
The barcoded DNA fragments are amplified using PCR to create a library of DNA fragments with identical barcodes. All the fragments derived from a given DNA molecule are tagged with the same barcode.
This step increases the quantity of DNA for sequencing and reduces the chances of losing unique DNA fragments during sequencing. The droplets (GEMs) are later collected in a tube, and the emulsion is broken, releasing the amplified, barcoded DNA sequences.
Standard Illumina next-generation sequencing technology can be used to sequence libraries.
During sequencing, the barcodes are read along with the DNA sequences, allowing researchers and scientists to group together DNA fragments that originate from the same DNA molecule.
Even though each DNA fragment is typically not fully sequenced, the information from many overlapping fragments in the same genomic region can be combined to reconstruct long stretches of the genome.
Therefore, a genome can be assembled from scratch without any prior reference.
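When reads are aligned to a reference, the extent of each original long fragment can be approximated by clustering reads that share a barcode and lie close together. The following is a minimal sketch under stated assumptions: alignments are given as `(barcode, chromosome, start, end)` tuples, and the 50 kb gap threshold (`max_gap`) is an arbitrary illustrative value, not a setting from any particular tool.

```python
from collections import defaultdict


def infer_molecules(alignments, max_gap=50_000):
    """Cluster same-barcode reads into inferred long molecules.

    Returns a list of (barcode, chrom, start, end, read_count) tuples.
    """
    by_key = defaultdict(list)
    for barcode, chrom, start, end in alignments:
        by_key[(barcode, chrom)].append((start, end))

    molecules = []
    for (barcode, chrom), spans in by_key.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        count = 1
        for start, end in spans[1:]:
            if start - cur_end <= max_gap:
                # Close enough: treat as part of the same molecule.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Large gap: the same barcode tagged a second molecule.
                molecules.append((barcode, chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        molecules.append((barcode, chrom, cur_start, cur_end, count))
    return molecules


alignments = [
    ("BC1", "chr1", 1_000, 1_150),
    ("BC1", "chr1", 30_000, 30_150),
    ("BC1", "chr1", 900_000, 900_150),
    ("BC2", "chr1", 5_000, 5_150),
]
molecules = infer_molecules(alignments)
for molecule in molecules:
    print(molecule)
```

Here the first two `BC1` reads are merged into one inferred molecule, while the distant third read is treated as a separate molecule that happened to receive the same barcode.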
Processing
The raw sequencing data is then processed with bioinformatics software (e.g., the GemCode analysis software developed by 10x Genomics) to remove low-quality reads and to assign reads to their respective barcodes.
Reads can be aligned to a reference genome or assembled de novo to generate long-range contigs. The read alignment step is important for determining the order and orientation of the long DNA fragments, and for identifying genomic variations, such as insertions or deletions.
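The two processing tasks described above, quality filtering and barcode assignment, can be sketched as follows. This is not the GemCode software; it is a simplified illustration that assumes the barcode occupies the first 16 bases of each read and that quality strings use the common Phred+33 encoding. The read layout, the threshold of 20, and the function names are all assumptions made for the example.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_and_tag(records, min_mean_q=20, barcode_len=16):
    """Drop low-quality reads and split the barcode from the remaining sequence.

    records: iterable of (sequence, quality) pairs.
    Returns a list of (barcode, insert_sequence, insert_quality) tuples.
    """
    kept = []
    for sequence, quality in records:
        # Skip reads too short to hold a barcode plus some insert.
        if len(sequence) <= barcode_len:
            continue
        if mean_phred(quality) < min_mean_q:
            continue
        kept.append((sequence[:barcode_len],
                     sequence[barcode_len:],
                     quality[barcode_len:]))
    return kept


records = [
    ("ACGTACGTACGTACGT" + "TTGACCA", "I" * 23),   # high quality (Q40)
    ("ACGTACGTACGTACGT" + "GGATC", "#" * 21),     # low quality (Q2)
]
tagged = filter_and_tag(records)
print(tagged)
```

Only the first record survives the filter, and its barcode is separated from the genomic insert so that downstream steps can group reads by barcode.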
Applications
De Novo Genome Assembly
Linked-read sequencing can facilitate de novo genome assembly, which involves reconstructing a genome from scratch without any prior reference. Linked-read sequencing enables assembly of large genomic regions, and helps improve the completeness and contiguity of the resulting genome. This can be particularly useful for studying organisms that lack a high-quality reference genome, such as non-model organisms or organisms with complex genomes.
Linked-read sequencing has been used for de novo genome assembly in a variety of organisms, including humans, plants, and animals.
For example, Dr. Evan Eichler and his research group used linked-read sequencing to assemble the genome of the orangutan, which had previously been difficult to study due to its complex genome.
The resulting genome assembly helped scientists to gain new insights into the evolutionary history of primates and the genetic basis of human diseases.
Also, the aligned or assembled reads can be used for other genetic investigations or downstream analysis, such as haplotype phasing.
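Assembly contiguity is commonly summarised with the N50 statistic: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. The sketch below computes it for two made-up sets of contig lengths; the numbers are purely illustrative and do not come from any real assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical assemblies of the same 300 kb region (lengths in kb).
fragmented = [100, 80, 50, 30, 20, 20]
scaffolded = [230, 50, 20]

print(n50(fragmented))
print(n50(scaffolded))
```

Joining contigs with long-range information leaves the total assembly size unchanged but raises the N50, which is the sense in which linked reads can improve contiguity.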
Haplotype Phasing
A haplotype is a group of genetic variants inherited together on a chromosome from one parent due to their genetic linkage. Haplotype phasing (also called haplotype estimation) refers to the process of reconstructing individual haplotypes, which is important for determining the genetic basis of diseases. Linked-read sequencing allows consistent coverage of genes related to different diseases, helping scientists to obtain all the regions carrying mutations from targeted genes.
For example, in 2018, a group of researchers used linked-read sequencing technology to sequence genetic information from a pregnant woman who was a carrier of a Duchenne muscular dystrophy (DMD) mutation.
Linked-read sequencing allowed them to identify the maternal haplotypes and determine the presence of the mutant alleles in the foetal DNA.
This non-invasive prenatal diagnosis of DMD demonstrates the clinical applicability of linked-read sequencing.
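The idea behind barcode-based phasing is that all reads from one barcoded molecule come from a single parental chromosome, so the alleles they carry at different heterozygous sites belong to the same haplotype. The following is a deliberately simplified greedy sketch, not the algorithm of any published tool. It assumes alleles are coded 0 or 1, that the input maps each barcode to the alleles it was observed to carry, and that there are no sequencing errors; `phase_sites` and the toy data are hypothetical.

```python
def phase_sites(molecule_alleles):
    """Greedy phasing of heterozygous sites from barcoded molecules.

    molecule_alleles: dict barcode -> dict site_position -> allele (0 or 1).
    Returns dict site_position -> allele on haplotype A; haplotype B
    carries the complementary allele at every site.
    """
    sites = sorted({site for alleles in molecule_alleles.values()
                    for site in alleles})
    hap_a = {}
    for site in sites:
        vote = 0
        for alleles in molecule_alleles.values():
            if site not in alleles:
                continue
            for other, other_allele in alleles.items():
                if other == site or other not in hap_a:
                    continue
                # Does this molecule match haplotype A at a phased site?
                on_a = other_allele == hap_a[other]
                # If so, its allele here is A's allele; otherwise A has
                # the complement.
                implied = alleles[site] if on_a else 1 - alleles[site]
                vote += 1 if implied == 1 else -1
        # With no linking evidence (first site, or a tie) the site is set
        # to 0, which effectively starts a new, arbitrarily oriented block.
        hap_a[site] = 1 if vote > 0 else 0
    return hap_a


molecules = {
    "BC1": {100: 0, 250: 1, 400: 0},
    "BC2": {100: 1, 250: 0},
    "BC3": {250: 1, 400: 0, 900: 1},
    "BC4": {400: 1, 900: 0},
}
haplotype_a = phase_sites(molecules)
print(haplotype_a)
```

In this toy data no single molecule covers both site 100 and site 900, yet they end up phased relative to each other because overlapping molecules chain the intermediate sites together.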
Structural Variation Analysis
Structural variations, such as deletions, duplications, inversions, translocations, and other rearrangements, are common in human genomes.
These variations can have significant impacts on genome functions, and have been implicated in many diseases. Linked-read sequencing technology labels all reads that originate from the same long DNA fragment with the same barcode, so it enables the detection of a large number of structural variants.
Linked-read sequencing can also resolve complex structural variants and provide a more complete picture of the genomic landscape. It has been used to identify and characterise structural variants in diverse populations, including people with genetic disorders or cancers.
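One way to see why shared barcodes help here: two genomic windows that are far apart in the reference should rarely share barcodes, so an unusually high overlap suggests that the windows are adjacent in the sample, as happens after a rearrangement. The sketch below scores window pairs by the Jaccard overlap of their barcode sets. The thresholds, window coordinates, and function names are illustrative assumptions rather than parameters of any specific variant caller.

```python
from itertools import combinations


def barcode_overlap(barcodes_a, barcodes_b):
    """Jaccard overlap between two barcode collections."""
    a, b = set(barcodes_a), set(barcodes_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_rearrangements(window_barcodes, min_overlap=0.3,
                             min_distance=1_000_000):
    """Report distant window pairs that share many barcodes.

    window_barcodes: dict (chrom, window_start) -> set of barcodes.
    Returns a list of (window_a, window_b, overlap) tuples.
    """
    hits = []
    for (win_a, bc_a), (win_b, bc_b) in combinations(
            sorted(window_barcodes.items()), 2):
        same_chrom = win_a[0] == win_b[0]
        if same_chrom and abs(win_a[1] - win_b[1]) < min_distance:
            # Nearby windows share barcodes simply because one molecule
            # can span both, so they are not informative.
            continue
        score = barcode_overlap(bc_a, bc_b)
        if score >= min_overlap:
            hits.append((win_a, win_b, score))
    return hits


windows = {
    ("chr1", 0): {"BC1", "BC2", "BC3"},
    ("chr1", 10_000): {"BC1", "BC2", "BC4"},
    ("chr1", 5_000_000): {"BC2", "BC3", "BC5"},
    ("chr2", 0): {"BC6", "BC7"},
}
hits = candidate_rearrangements(windows)
print(hits)
```

In the toy data only the pair of windows 5 Mb apart on `chr1` is reported, since they share half of their barcodes despite being too far apart for a single input molecule to span.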
Transcriptome Analysis
Transcriptome analysis is the study of all the RNA transcripts that are produced by the genome of an organism. Linked-read sequencing has been used by researchers to assemble transcript isoforms and identify alternative splicing events.
Information regarding alternative splicing events can provide insights into the regulation of gene expression in the human transcriptome.
Epigenetic Analysis
Epigenetics refers to the study of heritable changes in genetic activities that are distinct from changes in DNA sequences. Epigenetic analysis involves studying DNA-protein interactions, histone modifications, and DNA methylation. Many studies have used linked-read sequencing to study DNA methylation patterns.
For example, in 2021, a study investigated the DNA methylation differences in peripheral blood cells between twins, in which one twin had Alzheimer's disease and the other was cognitively normal.
Linked-read sequencing technology allowed researchers to identify more than 3000 differentially methylated regions between these twins discordant for Alzheimer's disease, and investigation of these regions eventually led to the identification of genes enriched in neurodevelopmental processes, neuronal signalling, and immune system functions.
Use
Advantages
* Wide range of genomic applications and scientific questions, including de novo genome assembly, haplotype phasing, structural variant analysis, and transcriptome and epigenetic analysis.
* Accuracy and scalability.
* Method requires small quantities of input DNA, which can be beneficial for small samples or single cell studies.
* More cost-effective per sample in comparison with long-read technologies such as Oxford Nanopore sequencing.
* Libraries produced by linked-read methods can be sequenced using Illumina short-read sequencing, increasing accessibility.
Limitations
* Complexity of library construction: this technology requires high-molecular-weight DNA preparation in order to produce long enough DNA molecules for sequencing.
* Limitations in read length may result in limited haplotype resolution, which could reduce the efficacy of this technology in highly complex genomic regions.
Controversy
In 2018, Bio-Rad Laboratories filed a lawsuit against 10x Genomics stating that their linked-read technology infringed on three patents which had been licensed to Bio-Rad by the University of Chicago.
Bio-Rad was awarded a sum of $23,930,716 by a jury. 10x Genomics filed a motion for judgement as a matter of law (JMOL), which was denied in 2019, and the court proceedings concluded in 2020. Following this lawsuit, 10x Genomics discontinued their linked-read assay.
An exception was made for linked-read products which had already been sold by the company prior to the lawsuit, allowing 10x Genomics to continue to provide those researchers with services such as support and warranty maintenance for this technology.