Computational biology refers to the use of techniques in computer science, data analysis, mathematical modeling, and computational simulations to understand biological systems and relationships.
An intersection of computer science, biology, and data science, the field also has foundations in applied mathematics, molecular biology, cell biology, chemistry, and genetics.
History
Bioinformatics, the analysis of informatics processes in biological systems, began in the early 1970s. At this time, research in artificial intelligence was using network models of the human brain in order to generate new algorithms. This use of biological data pushed biological researchers to use computers to evaluate and compare large data sets in their own field.
By 1982, researchers shared information via punch cards. The amount of data grew exponentially by the end of the 1980s, requiring new computational methods for quickly interpreting relevant information.
Perhaps the best-known example of computational biology, the Human Genome Project, officially began in 1990. By 2003, the project had mapped around 85% of the human genome, satisfying its initial goals. Work continued, however, and by 2021 a "complete genome" level was reached, with only 0.3% of bases still covered by potential issues. The missing Y chromosome was added in January 2022.
Since the late 1990s, computational biology has become an important part of biology, leading to numerous subfields.
Today, the International Society for Computational Biology recognizes 21 different 'Communities of Special Interest', each representing a slice of the larger field. In addition to helping sequence the human genome, computational biology has helped create accurate models of the human brain, map the 3D structure of genomes, and model biological systems.
Global contributions
Colombia
In 2000, despite a lack of initial expertise in programming and data management, Colombia began applying computational biology from an industrial perspective, focusing on plant diseases. This research has contributed to understanding how to counteract diseases in crops like potatoes and studying the genetic diversity of coffee plants.
By 2007, concerns about alternative energy sources and global climate change prompted biologists to collaborate with systems and computer engineers. Together, they developed a robust computational network and database to address these challenges. In 2009, in partnership with the University of Los Angeles, Colombia also created a Virtual Learning Environment (VLE) to improve the integration of computational biology and bioinformatics.
Poland
In Poland, computational biology is closely linked to mathematics and computational science, serving as a foundation for bioinformatics and biological physics. The field is divided into two main areas: one focusing on physics and simulation and the other on biological sequences.
The application of statistical models in Poland has advanced techniques for studying proteins and RNA, contributing to global scientific progress. Polish scientists have also been instrumental in evaluating protein prediction methods, significantly enhancing the field of computational biology. Over time, they have expanded their research to cover topics such as protein-coding analysis and hybrid structures, further solidifying Poland's influence on the development of bioinformatics worldwide.
Applications
Anatomy
Computational anatomy is the study of anatomical shape and form at the visible or gross anatomical scale of morphology. It involves the development of computational mathematical and data-analytical methods for modeling and simulating biological structures. It focuses on the anatomical structures being imaged, rather than the medical imaging devices. Due to the availability of dense 3D measurements via technologies such as magnetic resonance imaging, computational anatomy has emerged as a subfield of medical imaging and bioengineering for extracting anatomical coordinate systems at the morphome scale in 3D.
The original formulation of computational anatomy is as a generative model of shape and form from exemplars acted upon via transformations. The diffeomorphism group is used to study different coordinate systems via coordinate transformations as generated via the Lagrangian and Eulerian velocities of flow from one anatomical configuration in 3D space to another. It relates to shape statistics and morphometrics, with the distinction that diffeomorphisms are used to map coordinate systems, whose study is known as diffeomorphometry.
Data and modeling
Mathematical biology is the use of mathematical models of living organisms to examine the systems that govern structure, development, and behavior in biological systems. This entails a more theoretical approach to problems, rather than its more empirically-minded counterpart of experimental biology. Mathematical biology draws on discrete mathematics, topology (also useful for computational modeling), Bayesian statistics, linear algebra, and Boolean algebra.
These mathematical approaches have enabled the creation of databases and other methods for storing, retrieving, and analyzing biological data, a field known as bioinformatics. Usually, this process involves genetics and analyzing genes.
Gathering and analyzing large datasets has made room for growing research fields such as data mining and computational biomodeling, which refers to building computer models and visual simulations of biological systems. This allows researchers to predict how such systems will react to different environments, which is useful for determining if a system can "maintain their state and functions against external and internal perturbations".
While current techniques focus on small biological systems, researchers are working on approaches that will allow for larger networks to be analyzed and modeled. A majority of researchers believe this will be essential in developing modern medical approaches to creating new drugs and gene therapy.
A useful modeling approach is to use Petri nets via tools such as esyN.
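The idea behind a Petri net can be illustrated with a small sketch. The code below is not the esyN tool or its interface; it is a hypothetical, minimal place/transition net written for illustration, in which places hold token counts and a transition fires only when each of its input places holds enough tokens. The enzyme reaction (E + S -> ES -> E + P) and all names are assumptions chosen for the example.

```python
from typing import Dict, Tuple

Marking = Dict[str, int]

# Hypothetical toy net: each transition maps to (input arcs, output arcs),
# where an arc is "place name -> number of tokens consumed or produced".
TRANSITIONS: Dict[str, Tuple[Marking, Marking]] = {
    "bind": ({"E": 1, "S": 1}, {"ES": 1}),      # enzyme binds substrate
    "convert": ({"ES": 1}, {"E": 1, "P": 1}),   # complex releases product
}


def enabled(marking: Marking, name: str) -> bool:
    """A transition is enabled when every input place has enough tokens."""
    inputs, _ = TRANSITIONS[name]
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking: Marking, name: str) -> Marking:
    """Return the marking reached by firing one enabled transition."""
    if not enabled(marking, name):
        raise ValueError(f"transition {name!r} is not enabled")
    inputs, outputs = TRANSITIONS[name]
    new_marking = dict(marking)
    for place, n in inputs.items():
        new_marking[place] -= n
    for place, n in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


def run(marking: Marking) -> Tuple[Marking, int]:
    """Fire the first enabled transition repeatedly until none is enabled."""
    steps = 0
    while True:
        for name in TRANSITIONS:
            if enabled(marking, name):
                marking = fire(marking, name)
                steps += 1
                break
        else:
            return marking, steps


if __name__ == "__main__":
    final, steps = run({"E": 1, "S": 3, "ES": 0, "P": 0})
    print(final, steps)
```

Firing transitions in a fixed order keeps the sketch deterministic; Petri net simulators more commonly choose among enabled transitions at random or by rate.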
Along similar lines, until recent decades theoretical ecology has largely dealt with analytic models that were detached from the statistical models used by empirical ecologists. However, computational methods have aided in developing ecological theory via simulation of ecological systems, in addition to increasing application of methods from computational statistics in ecological analyses.
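As a hedged illustration of what simulating an ecological system can look like, the sketch below numerically integrates the logistic growth model, dN/dt = rN(1 - N/K), with a simple forward Euler step. The choice of model, the function name, and all parameter values are assumptions made for the example, not something prescribed by the text above.

```python
from typing import List


def logistic_growth(n0: float, rate: float, capacity: float,
                    duration: float, dt: float = 0.01) -> List[float]:
    """Integrate dN/dt = rate * N * (1 - N / capacity) with forward Euler.

    Returns the population size at every time step, starting with n0.
    """
    steps = int(round(duration / dt))
    sizes = [n0]
    n = n0
    for _ in range(steps):
        # Growth slows as the population approaches the carrying capacity.
        n += rate * n * (1.0 - n / capacity) * dt
        sizes.append(n)
    return sizes


if __name__ == "__main__":
    trajectory = logistic_growth(n0=10.0, rate=0.5, capacity=1000.0, duration=50.0)
    print(f"start={trajectory[0]:.1f}, end={trajectory[-1]:.1f}")
```

Forward Euler is used only because it is short; for stiff or long-running models a dedicated ODE solver would usually be a safer choice.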
Systems biology
Systems biology consists of computing the interactions between various biological systems ranging from the cellular level to entire populations with the goal of discovering emergent properties. This process usually involves networking cell signaling and metabolic pathways. Systems biology often uses computational techniques from biological modeling and graph theory to study these complex interactions at cellular levels.
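A minimal sketch of the graph-theoretic view: a signaling pathway can be stored as a directed graph and queried with standard graph algorithms. The pathway below is entirely hypothetical (the node names are placeholders, not real molecules), and breadth-first search is just one simple choice of algorithm.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy pathway: each key activates the nodes in its list.
PATHWAY: Dict[str, List[str]] = {
    "ligand": ["receptor"],
    "receptor": ["kinase_A", "kinase_B"],
    "kinase_A": ["tf"],
    "kinase_B": ["kinase_A"],
    "tf": ["target_gene"],
    "target_gene": [],
}


def shortest_path(graph: Dict[str, List[str]], source: str,
                  target: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest directed path, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def out_degree(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of direct downstream targets of each node."""
    return {node: len(targets) for node, targets in graph.items()}


if __name__ == "__main__":
    print(shortest_path(PATHWAY, "ligand", "target_gene"))
    print(out_degree(PATHWAY))
```

Nodes with a high out-degree are candidate hubs, and the shortest path gives the most direct signaling route between two components of the toy network.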
Evolutionary biology
Computational biology has assisted evolutionary biology by:
* Using DNA data to reconstruct the tree of life with computational phylogenetics
* Fitting population genetics models (either forward time or backward time) to DNA data to make inferences about demographic or selective history
* Building population genetics models of evolutionary systems from first principles in order to predict what is likely to evolve
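A forward-time population genetics model can be sketched with a Wright-Fisher simulation of genetic drift. The example below is a toy under stated assumptions: a single locus with two alleles, a diploid population of constant size, and no selection, mutation, or migration. The function name, parameters, and seed handling are illustrative choices.

```python
import random
from typing import List


def wright_fisher(pop_size: int, p0: float, generations: int,
                  seed: int = 0) -> List[float]:
    """Simulate allele-frequency drift forward in time.

    Each generation, 2 * pop_size gene copies are drawn at random from the
    previous generation, so the count of the focal allele is binomial.
    Returns the allele frequency in every generation, starting with p0.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    copies_total = 2 * pop_size
    freqs = [p0]
    p = p0
    for _ in range(generations):
        copies = sum(1 for _ in range(copies_total) if rng.random() < p)
        p = copies / copies_total
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    trajectory = wright_fisher(pop_size=50, p0=0.5, generations=100, seed=1)
    print(f"final frequency: {trajectory[-1]:.2f}")
```

Fitting such a model to DNA data would mean comparing simulated trajectories or summary statistics with observed ones, which is beyond this sketch.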
Genomics

Computational genomics is the study of the genomes of cells and organisms. The Human Genome Project is one example of computational genomics. This project looks to sequence the entire human genome into a set of data. Once fully implemented, this could allow for doctors to analyze the genome of an individual patient. This opens the possibility of personalized medicine, prescribing treatments based on an individual's pre-existing genetic patterns. Researchers are looking to sequence the genomes of animals, plants, bacteria, and all other types of life.
One of the main ways that genomes are compared is by sequence homology. Homology is the study of biological structures and nucleotide sequences in different organisms that come from a common ancestor. Research suggests that between 80 and 90% of genes in newly sequenced prokaryotic genomes can be identified this way.
Sequence alignment is another process for comparing and detecting similarities between biological sequences or genes. Sequence alignment is useful in a number of bioinformatics applications, such as computing the longest common subsequence of two genes or comparing variants of certain diseases.
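The longest common subsequence mentioned above can be computed with a standard dynamic-programming table. The sketch below is a textbook version for two short strings, not a production aligner; the function name and the example sequences are illustrative assumptions.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the subsequence.
    i, j = len(a), len(b)
    result = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


if __name__ == "__main__":
    print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

The table needs time and memory proportional to the product of the two sequence lengths, which is why whole-genome comparisons rely on heuristics rather than this exact method.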
A largely open problem in computational genomics is the analysis of intergenic regions, which comprise roughly 97% of the human genome. Researchers are working to understand the functions of non-coding regions of the human genome through the development of computational and statistical methods and via large consortia projects such as ENCODE and the Roadmap Epigenomics Project.
Understanding how individual genes contribute to the biology of an organism at the molecular, cellular, and organism levels is known as gene ontology. The Gene Ontology Consortium's mission is to develop an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular, and organism-level systems. The Gene Ontology resource provides a computational representation of current scientific knowledge about the functions of genes (or, more properly, the protein and non-coding RNA molecules produced by genes) from many different organisms, from humans to bacteria.
3D genomics is a subsection in computational biology that focuses on the organization and interaction of genes within a eukaryotic cell. One method used to gather 3D genomic data is Genome Architecture Mapping (GAM). GAM measures 3D distances of chromatin and DNA in the genome by combining cryosectioning, the process of cutting a strip from the nucleus to examine the DNA, with laser microdissection. A nuclear profile is simply this strip or slice that is taken from the nucleus. Each nuclear profile contains genomic windows, which are certain sequences of nucleotides, the base unit of DNA. GAM captures a genome network of complex, multi-enhancer chromatin contacts throughout a cell.
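One simple way to reason about such data is to count how often two genomic windows appear together in the same nuclear profile. The sketch below computes this co-segregation frequency for hypothetical profiles; the window names and data are invented, and real analyses typically apply additional normalization that is omitted here.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def cosegregation(profiles: Sequence[Set[str]],
                  windows: List[str]) -> Dict[Tuple[str, str], float]:
    """Fraction of nuclear profiles in which both windows were detected.

    profiles: one set of detected window names per nuclear profile.
    windows:  the window names to compare pairwise.
    """
    if not profiles:
        raise ValueError("at least one nuclear profile is required")
    total = len(profiles)
    result = {}
    for first, second in combinations(windows, 2):
        together = sum(1 for p in profiles if first in p and second in p)
        result[(first, second)] = together / total
    return result


if __name__ == "__main__":
    # Hypothetical data: four nuclear profiles over three genomic windows.
    example_profiles = [{"w1", "w2"}, {"w1", "w2", "w3"}, {"w3"}, {"w1"}]
    print(cosegregation(example_profiles, ["w1", "w2", "w3"]))
```

Windows that co-segregate more often than expected are candidates for being close together in 3D space, which is the intuition the sketch is meant to convey.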
Biomarker discovery
Computational biology also plays a pivotal role in identifying biomarkers for diseases such as cardiovascular conditions. By integrating various 'omic' data, such as genomics, proteomics, and metabolomics, researchers can uncover potential biomarkers that aid in disease diagnosis, prognosis, and treatment strategies. For instance, metabolomic analyses have identified specific metabolites capable of distinguishing between coronary artery disease and myocardial infarction, thereby enhancing diagnostic precision.
Neuroscience
Computational neuroscience is the study of brain function in terms of the information processing properties of the nervous system. A subset of neuroscience, it looks to model the brain to examine specific aspects of the neurological system. Models of the brain include:
* Realistic Brain Models: These models look to represent every aspect of the brain, including as much detail at the cellular level as possible. Realistic models provide the most information about the brain, but also have the largest margin for error. More variables in a brain model create the possibility for more error to occur. These models do not account for parts of the cellular structure that scientists do not know about. Realistic brain models are the most computationally heavy and the most expensive to implement.
* Simplifying Brain Models: These models look to limit the scope of a model in order to assess a specific physical property of the neurological system. This allows for the intensive computational problems to be solved, and reduces the amount of potential error from a realistic brain model.
It is the work of computational neuroscientists to improve the algorithms and data structures currently used to increase the speed of such calculations.
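A leaky integrate-and-fire neuron is a common example of a simplifying model: it tracks only the membrane potential and ignores most cellular detail. The sketch below is a minimal version integrated with a forward Euler step; the parameter values are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
from typing import List


def simulate_lif(current: float, duration_ms: float = 100.0, dt: float = 0.1,
                 tau: float = 10.0, v_rest: float = -65.0,
                 v_reset: float = -65.0, v_thresh: float = -50.0,
                 resistance: float = 10.0) -> List[float]:
    """Simulate a leaky integrate-and-fire neuron and return spike times (ms).

    Membrane equation: tau * dV/dt = -(V - v_rest) + resistance * current.
    When V reaches v_thresh a spike is recorded and V is set to v_reset.
    """
    v = v_rest
    spike_times = []
    steps = int(round(duration_ms / dt))
    for step in range(steps):
        dv = (-(v - v_rest) + resistance * current) / tau
        v += dv * dt
        if v >= v_thresh:
            spike_times.append((step + 1) * dt)
            v = v_reset
    return spike_times


if __name__ == "__main__":
    for injected in (1.0, 2.0, 3.0):
        print(injected, len(simulate_lif(injected)))
```

With these assumed parameters a weak input never reaches threshold, while stronger inputs produce spikes at an increasing rate, which is the qualitative behavior the model is meant to capture.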
Computational neuropsychiatry is an emerging field that uses mathematical and computer-assisted modeling of brain mechanisms involved in mental disorders. Several initiatives have demonstrated that computational modeling makes an important contribution to understanding neuronal circuits that could generate mental functions and dysfunctions.
Pharmacology
Computational pharmacology is "the study of the effects of genomic data to find links between specific genotypes and diseases and then screening drug data". The pharmaceutical industry requires a shift in methods to analyze drug data. Pharmacologists were able to use Microsoft Excel to compare chemical and genomic data related to the effectiveness of drugs. However, the industry has reached what is referred to as the Excel barricade. This arises from the limited number of cells accessible on a spreadsheet. This development led to the need for computational pharmacology. Scientists and researchers develop computational methods to analyze these massive data sets. This allows for an efficient comparison between the notable data points and allows for more accurate drugs to be developed.
Analysts project that if major medications fail due to patents, computational biology will be necessary to replace current drugs on the market. Doctoral students in computational biology are being encouraged to pursue careers in industry rather than take postdoctoral positions. This is a direct result of major pharmaceutical companies needing more qualified analysts of the large data sets required for producing new drugs.
Oncology
Computational biology plays a crucial role in discovering signs of new, previously unknown living creatures and in cancer research. This field involves large-scale measurements of cellular processes, including RNA, DNA, and proteins, which pose significant computational challenges. To overcome these, biologists rely on computational tools to accurately measure and analyze biological data.
In cancer research, computational biology aids in the complex analysis of tumor samples, helping researchers develop new ways to characterize tumors and understand various cellular properties. The use of high-throughput measurements, involving millions of data points from DNA, RNA, and other biological structures, helps in diagnosing cancer at early stages and in understanding the key factors that contribute to cancer development. Areas of focus include analyzing molecules that are deterministic in causing cancer and understanding how the human genome relates to tumor causation.
Toxicology
Computational toxicology is a multidisciplinary area of study, which is employed in the early stages of drug discovery and development to predict the safety and potential toxicity of drug candidates.
Drug Discovery
Computational biology has become instrumental in revolutionizing drug discovery processes. By integrating computational systems biology approaches, researchers can model complex biological systems, facilitating the identification of novel drug targets and the prediction of drug responses. These methodologies enable the simulation of intracellular and intercellular signaling events using data from genomic, proteomic, or metabolomic experiments, thereby streamlining the drug development pipeline and reducing associated costs.
Moreover, the convergence of computational biology with artificial intelligence (AI) has further accelerated drug design. AI-driven models can analyze vast datasets to predict molecular behavior, optimize lead compounds, and anticipate potential side effects, thereby enhancing the efficiency and effectiveness of drug discovery.
Techniques
Computational biologists use a wide range of software and algorithms to carry out their research.
Unsupervised Learning
Unsupervised learning is a type of algorithm that finds patterns in unlabeled data. One example is k-means clustering, which aims to partition ''n'' data points into ''k'' clusters, in which each data point belongs to the cluster with the nearest mean. Another version is the k-medoids algorithm, which, when selecting a cluster center, picks one of the data points in the cluster (a medoid) rather than the average of the cluster.

The k-medoids algorithm follows these steps:
# Randomly select ''k'' distinct data points. These are the initial cluster centers (medoids).
# Measure the distance between each point and each of the ''k'' medoids.
# Assign each point to the cluster of its nearest medoid.
# Find the new center (medoid) of each cluster.
# Repeat steps 2 to 4 until the clusters no longer change.
# Assess the quality of the clustering by adding up the variation within each cluster.
# Repeat the process with different values of ''k''.
# Pick the best value for ''k'' by finding the "elbow" in the plot of within-cluster variation against ''k'', the point beyond which increasing ''k'' yields only small reductions in variation.
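The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the Euclidean distance, the fixed random seed, and the toy 2D points are all assumptions made for the example, and it presumes the data points are distinct.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, medoids, dist):
    """Steps 2-3: label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
            for p in points]


def k_medoids(points, k, dist=euclidean, seed=0, max_iter=100):
    """Cluster `points` around k medoids; returns (medoids, labels, cost)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points (by index) as the initial medoids.
    medoids = rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        # Step 4: the new medoid of a cluster is the member with the
        # smallest total distance to all members of that cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # only possible with duplicate points
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        # Step 5: stop once the medoids (and so the clusters) are stable.
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = assign(points, medoids, dist)
    # Step 6: total within-cluster distance as the quality measure.
    cost = sum(dist(p, points[medoids[lab]]) for p, lab in zip(points, labels))
    return [points[m] for m in medoids], labels, cost


if __name__ == "__main__":
    # Made-up points forming three loose groups.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
            (20, 0), (20, 1), (21, 0)]
    # Steps 7-8: compare the cost across several k and look for the elbow.
    for k in range(1, 6):
        _, _, cost = k_medoids(data, k)
        print(k, round(cost, 2))
```

Because the starting medoids are random, a single run can settle in a poor local optimum; in practice the procedure is often repeated with several seeds and the lowest-cost result kept.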
One application in biology is the 3D mapping of a genome. Data on the HIST1 region of mouse chromosome 13 is gathered from the Gene Expression Omnibus. The data records which nuclear profiles are detected in which genomic regions. From this, the Jaccard distance can be used to compute a normalized distance between every pair of loci.
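As a rough illustration of the distance itself, the sketch below treats each locus as the set of nuclear profiles in which it was detected and computes the Jaccard distance, one minus the size of the intersection divided by the size of the union. The locus names and profile identifiers are invented for the example and are not taken from the Gene Expression Omnibus data.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two collections of profile IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(profiles_by_locus):
    """Map each unordered pair of loci to its Jaccard distance."""
    return {
        (x, y): jaccard_distance(profiles_by_locus[x], profiles_by_locus[y])
        for x, y in combinations(sorted(profiles_by_locus), 2)
    }


# Hypothetical detection data: locus -> nuclear profiles containing it.
detections = {
    "locus_1": {"NP1", "NP2", "NP3"},
    "locus_2": {"NP2", "NP3", "NP4"},
    "locus_3": {"NP5"},
}

for pair, d in pairwise_distances(detections).items():
    print(pair, d)
```

The result always lies between 0 (the two loci appear in exactly the same profiles) and 1 (they never appear together), which is what makes it usable as a normalized distance for clustering.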
Graph Analytics
Graph analytics, or network analysis, is the study of graphs that represent connections between different objects. Graphs can represent many kinds of networks in biology, such as protein-protein interaction networks, regulatory networks, and metabolic and biochemical networks. There are many ways to analyze these networks, one of which is measuring centrality. A centrality measure ranks the nodes of a graph by how central, or well connected, they are, which can help identify the most important nodes. For example, given data on the activity of genes over a time period, degree centrality can be used to see which genes are most active throughout the network, or which genes interact with others the most. This contributes to the understanding of the roles certain genes play in the network.
There are many ways to calculate centrality in graphs, each of which gives a different kind of information. Centrality measures can be applied in many biological settings, including gene regulatory, protein interaction, and metabolic networks.
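Degree centrality is the simplest of these measures and can be sketched directly. The example below assumes an undirected network given as a list of interacting pairs and uses the common normalization of dividing each node's degree by the number of other nodes; the gene names are placeholders.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list.

    Each node's score is its number of distinct neighbors divided by
    (n - 1), where n is the number of nodes that appear in `edges`.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical interaction network between four genes.
interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
]

scores = degree_centrality(interactions)
# List genes from most to least connected.
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 2))
```

Here geneA interacts with every other gene and so receives the highest possible score. Other measures, such as betweenness or closeness centrality, would rank nodes by different notions of importance.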
Supervised Learning
Supervised learning is a type of algorithm that learns from labeled data and then assigns labels to future, unlabeled data. In biology, supervised learning is helpful when some data has already been categorized and further data needs to be sorted into the same categories.

A common supervised learning algorithm is the random forest, which uses numerous decision trees to train a model to classify a dataset. Forming the basis of the random forest, a decision tree is a structure that aims to classify, or label, some set of data using certain known features of that data. A practical biological example would be taking an individual's genetic data and predicting whether that individual is predisposed to develop a certain disease or cancer. At each internal node, the algorithm checks the data for exactly one feature (a specific gene, in the previous example) and then branches left or right based on the result. At each leaf node, the decision tree assigns a class label. In practice, the algorithm walks a specific root-to-leaf path through the decision tree based on the input, which results in its classification. Commonly, decision trees have target variables that take on discrete values, like yes/no, in which case the tree is referred to as a classification tree; if the target variable is continuous, it is called a regression tree. To construct a decision tree, it must first be trained using a training set to identify which features are the best predictors of the target variable.
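The root-to-leaf walk and the voting step of a random forest can be illustrated with a small sketch. The trees below are written by hand purely for illustration, whereas a real random forest would learn them from a training set; the variant names and risk labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class label assigned when a walk ends here


@dataclass
class Node:
    feature: str                      # the single feature tested here
    if_absent: Union["Node", Leaf]    # left branch
    if_present: Union["Node", Leaf]   # right branch


def classify(tree, sample):
    """Walk one root-to-leaf path, testing one feature per internal node."""
    node = tree
    while isinstance(node, Node):
        present = sample.get(node.feature, False)
        node = node.if_present if present else node.if_absent
    return node.label


def forest_vote(trees, sample):
    """Classify with every tree and return the most common label."""
    votes = Counter(classify(tree, sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hand-written classification trees over hypothetical gene variants.
tree_1 = Node("variant_A", Leaf("low risk"),
              Node("variant_B", Leaf("low risk"), Leaf("high risk")))
tree_2 = Node("variant_B", Leaf("low risk"), Leaf("high risk"))
tree_3 = Node("variant_C", Leaf("low risk"), Leaf("high risk"))

# One individual's data: which variants they carry.
person = {"variant_A": True, "variant_B": True, "variant_C": False}
print(forest_vote([tree_1, tree_2, tree_3], person))
```

Using an odd number of trees avoids ties between two classes. A regression tree would follow the same walk but store a number at each leaf, with the forest averaging the values instead of voting.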
Open source software
Open source software provides a platform for computational biology where everyone can access and benefit from software developed in research.
PLOS cites four main reasons for the use of open source software:
* Reproducibility: Researchers can use the exact methods that were used to calculate the relations between biological data.
* Faster development: Developers and researchers do not have to reinvent existing code for minor tasks. Instead, they can use pre-existing programs to save time on the development and implementation of larger projects.
* Increased quality: Input from multiple researchers studying the same topic makes it less likely that errors remain in the code.
* Long-term availability: Open source programs are not tied to any businesses or patents. This allows them to be posted to multiple web pages, which helps ensure that they remain available in the future.
Research
There are several large conferences concerned with computational biology. Notable examples are Intelligent Systems for Molecular Biology, the European Conference on Computational Biology, and Research in Computational Molecular Biology.
There are also numerous journals dedicated to computational biology. Notable examples include the Journal of Computational Biology and PLOS Computational Biology, a peer-reviewed open access journal that has many notable research projects in the field of computational biology. It provides reviews of software and tutorials for open source software, and displays information on upcoming computational biology conferences. Other journals relevant to this field include Bioinformatics, Computers in Biology and Medicine, BMC Bioinformatics, Nature Methods, Nature Communications, Scientific Reports, and PLOS One.
Related fields
Computational biology, bioinformatics, and mathematical biology are all interdisciplinary approaches to the life sciences that draw from quantitative disciplines such as mathematics and information science. The NIH describes computational/mathematical biology as the use of computational/mathematical approaches to address theoretical and experimental questions in biology and, by contrast, bioinformatics as the application of information science to understand complex life-sciences data.
Specifically, the NIH defines
While each field is distinct, there may be significant overlap at their interface, so much so that to many, bioinformatics and computational biology are terms that are used interchangeably.
The terms computational biology and evolutionary computation have similar names but are not to be confused. Unlike computational biology, evolutionary computation is not concerned with modeling and analyzing biological data. It instead creates algorithms based on the ideas of evolution across species. Sometimes referred to as genetic algorithms, the research of this field can be applied to computational biology. While evolutionary computation is not inherently a part of computational biology, computational evolutionary biology is a subfield of it.
External links
bioinformatics.org