Cheminformatics (also known as chemoinformatics) refers to the use of physical chemistry theory with computer and information science techniques (so-called ''in silico'' techniques) applied to a range of descriptive and prescriptive problems in the field of chemistry, including its applications to biology and related molecular fields. Such ''in silico'' techniques are used, for example, by pharmaceutical companies and in academic settings to aid and inform the process of drug discovery, for instance in the design of well-defined combinatorial libraries of synthetic compounds, or to assist in structure-based drug design. The methods can also be used in chemical and allied industries, and in fields such as environmental science and pharmacology, where chemical processes are involved or studied.
History
Cheminformatics has been an active field in various guises since the 1970s and earlier, with activity in academic departments and commercial pharmaceutical research and development departments. The term chemoinformatics was defined in its application to drug discovery by F.K. Brown in 1998:
Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.
Since then, both terms, cheminformatics and chemoinformatics, have been used, although cheminformatics appears to be the more frequently used form, despite academics in Europe declaring for the variant chemoinformatics in 2006. In 2009, a prominent Springer journal in the field, the ''Journal of Cheminformatics'', was founded by transatlantic executive editors.
Background
Cheminformatics combines the scientific working fields of chemistry, computer science, and information science, for example in the areas of topology, chemical graph theory, information retrieval, and data mining in the chemical space.
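Chemical graph theory treats a molecule as a graph whose vertices are atoms and whose edges are bonds. As a minimal sketch (standard-library Python; the hand-built, hydrogen-suppressed adjacency lists are illustrative assumptions, not a standard file format), the Wiener index, one of the classic topological descriptors, sums the shortest-path bond counts over all pairs of atoms:

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: bond-count distance from source to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def wiener_index(adjacency):
    """Sum of topological distances over all unordered pairs of atoms."""
    total = 0
    for atom in adjacency:
        total += sum(shortest_path_lengths(adjacency, atom).values())
    return total // 2  # every pair was counted once from each end


# Hydrogen-suppressed graphs: atoms are integers, each list holds bonded neighbours.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two C4H10 isomers share a formula but get different index values, which is why such graph invariants are useful as structure descriptors.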
Cheminformatics can also be applied to data analysis in various industries, such as paper and pulp, dyes, and other allied industries.
Applications
Storage and retrieval
A primary application of cheminformatics is the storage, indexing, and search of information relating to chemical compounds. The efficient search of such stored information involves topics dealt with in computer science, such as data mining, information retrieval, information extraction, and machine learning. Related research topics include:
* Digital libraries
* Unstructured data
* Structured data mining
** Database mining
** Graph mining
** Molecule mining
** Sequence mining
** Tree mining
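Indexing is what keeps such searches fast as a collection grows. A minimal sketch, assuming each compound has already been reduced to a set of structural feature keys (the keys below are hand-written, hypothetical stand-ins for real fingerprint bits, and `FeatureIndex` is an illustrative name, not a library class): an inverted index returns only the compounds that contain every query feature, the kind of cheap pre-screen that can precede an exact substructure match.

```python
from collections import defaultdict


class FeatureIndex:
    """Hypothetical inverted index from structural feature keys to compound IDs."""

    def __init__(self):
        self._postings = defaultdict(set)  # feature key -> set of compound IDs
        self._records = {}                 # compound ID -> stored SMILES string

    def add(self, compound_id, smiles, features):
        self._records[compound_id] = smiles
        for key in features:
            self._postings[key].add(compound_id)

    def candidates(self, query_features):
        """IDs of compounds whose feature sets contain every query feature."""
        query = list(query_features)
        if not query:
            return set(self._records)
        hits = set(self._postings.get(query[0], set()))
        for key in query[1:]:
            hits &= self._postings.get(key, set())
        return hits

    def record(self, compound_id):
        return self._records[compound_id]


# Feature keys here are made-up atom and bond labels for illustration only.
index = FeatureIndex()
index.add("ethanol", "CCO", {"C", "O", "C-C", "C-O"})
index.add("ethylamine", "CCN", {"C", "N", "C-C", "C-N"})
index.add("acetic acid", "CC(=O)O", {"C", "O", "C-C", "C=O", "C-O"})

print(sorted(index.candidates({"C-O"})))  # ['acetic acid', 'ethanol']
```

Intersecting posting sets touches only compounds that share the query's features, rather than scanning every stored record.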
File formats
The ''in silico'' representation of chemical structures uses specialized formats such as the Simplified Molecular Input Line Entry System (SMILES) or the XML-based Chemical Markup Language (CML). These representations are often used for storage in large chemical databases. While some formats are suited to visual representation in two or three dimensions, others are better suited to studying physical interactions, modeling, and docking.
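SMILES writes a structure as a single line of ASCII text, for example `CC(=O)O` for acetic acid. A minimal sketch that splits a SMILES string into tokens and tallies atoms per element follows. The token pattern is an assumption covering common cases only: it does not validate syntax, perceive aromaticity, or add implicit hydrogens, which are jobs for a full cheminformatics toolkit.

```python
import re
from collections import Counter

# Assumed token pattern: bracket atoms, two-letter organic-subset halogens,
# %nn ring labels, one-letter organic-subset atoms (upper case aliphatic,
# lower case aromatic), ring-closure digits, and bond/branch/dot symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFI]|[bcnops]|\d|[()=#\-\\/:.~*$])"
)
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def tokenize(smiles):
    """Split a SMILES string into tokens; reject characters the pattern misses."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def element_counts(smiles):
    """Count explicit atoms per element (aromatic 'c' is counted as 'C')."""
    counts = Counter()
    for tok in tokenize(smiles):
        if tok.startswith("["):
            match = BRACKET_ELEMENT.match(tok)
            if match is None:
                raise ValueError(f"cannot read element from {tok!r}")
            counts[match.group(1).capitalize()] += 1
        elif tok[0].isalpha():
            counts[tok.capitalize()] += 1
    return counts


print(tokenize("CC(=O)O"))                   # ['C', 'C', '(', '=', 'O', ')', 'O']
print(dict(element_counts("c1ccccc1Cl")))    # {'C': 6, 'Cl': 1}
```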
Virtual libraries
Chemical data can pertain to real or virtual molecules. Virtual libraries of compounds may be generated in various ways to explore chemical space and hypothesize novel compounds with desired properties. Virtual libraries of classes of compounds (drugs, natural products, diversity-oriented synthetic products) have been generated using the FOG (fragment optimized growth) algorithm. This was done by using cheminformatic tools to train the transition probabilities of a Markov chain on authentic classes of compounds, and then using the Markov chain to generate novel compounds that were similar to the training database.
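The Markov-chain idea can be illustrated in miniature. This is not the FOG algorithm itself: it is a toy, first-order chain over single SMILES characters, trained on a handful of made-up strings, with `<start>` and `<end>` as assumed sentinel states. Training counts which symbol follows which and normalizes the counts into transition probabilities; generation then walks the chain from the start state until it reaches the end state.

```python
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # assumed sentinel states marking string boundaries


def train_transitions(training_strings):
    """Estimate P(next symbol | current symbol) from a list of strings."""
    counts = defaultdict(Counter)
    for s in training_strings:
        symbols = [START, *s, END]
        for current, nxt in zip(symbols, symbols[1:]):
            counts[current][nxt] += 1
    probabilities = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        probabilities[state] = {nxt: n / total for nxt, n in successors.items()}
    return probabilities


def sample(transitions, rng, max_length=40):
    """Walk the chain from START until END (or max_length symbols)."""
    state, out = START, []
    while len(out) < max_length:
        options = transitions[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == END:
            break
        out.append(state)
    return "".join(out)


training = ["CCO", "CCCO", "CCN", "CCCN", "OCCO"]  # made-up training set
model = train_transitions(training)
rng = random.Random(0)
print([sample(model, rng) for _ in range(5)])
```

With this C/N/O-only alphabet every output happens to be a valid SMILES string, but once branches and ring-closure digits appear, a character-level chain can emit invalid strings that a real pipeline would have to filter out.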
Virtual screening
In contrast to high-throughput screening, virtual screening involves computationally screening ''in silico'' libraries of compounds, by means of methods such as docking, to identify members likely to possess desired properties, such as biological activity against a given target. In some cases, combinatorial chemistry is used to develop the library so that chemical space can be mined more efficiently. More commonly, a diverse library of small molecules or natural products is screened.
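Docking scores candidates against a three-dimensional target structure, which is too involved for a short example. A simpler ligand-based method, swapped in here purely for illustration, ranks library members by the Tanimoto similarity of their fingerprints to a known active: the number of shared features divided by the number of features present in either. The bit sets and compound names below are made-up placeholders.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(reference, library, threshold=0.0):
    """Return (score, name) pairs at or above threshold, best first."""
    scored = [(tanimoto(reference, bits), name) for name, bits in library.items()]
    kept = [(score, name) for score, name in scored if score >= threshold]
    return sorted(kept, key=lambda item: (-item[0], item[1]))


reference = {1, 4, 7, 9}  # hypothetical fingerprint of a known active
library = {
    "cmpd_A": {1, 4, 7, 10},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12},
}
print(rank_by_similarity(reference, library, threshold=0.5))
# [(0.8, 'cmpd_C'), (0.6, 'cmpd_A')]
```

The threshold plays the role of the "likely to possess desired properties" cut: only the top-ranked members would go on to experimental testing.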
Quantitative structure-activity relationship (QSAR)
This is the calculation of quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) values, used to predict the activity or properties of compounds from their structures. In this context there is also a strong relationship to chemometrics. Chemical expert systems are also relevant, since they represent parts of chemical knowledge in ''in silico'' form. A relatively new concept is matched molecular pair analysis (MMPA), or prediction-driven MMPA, which is coupled with QSAR models to identify activity cliffs.
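In its simplest form a QSAR model is a regression from a numeric descriptor of each structure to a measured activity. A minimal sketch, assuming a single descriptor and ordinary least squares; real QSAR work typically uses many descriptors, variable selection, and external validation. The descriptor (a logP-like value) and the activities are made-up numbers, not measured data.

```python
def fit_linear_qsar(descriptors, activities):
    """Ordinary least-squares fit of activity = intercept + slope * descriptor."""
    n = len(descriptors)
    if n != len(activities) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(descriptors) / n
    mean_y = sum(activities) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptors)
    if sxx == 0:
        raise ValueError("descriptor has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptors, activities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, descriptor):
    """Predicted activity for a new compound's descriptor value."""
    intercept, slope = model
    return intercept + slope * descriptor


# Hypothetical training set: one descriptor value and one activity per compound.
logp = [1.0, 2.0, 3.0, 4.0]
activity = [5.1, 5.9, 7.1, 7.9]

intercept, slope = fit_linear_qsar(logp, activity)
print(round(intercept, 2), round(slope, 2))  # 4.1 0.96
print(round(predict((intercept, slope), 2.5), 2))
```

Once fitted, the model predicts the activity of an untested structure from its descriptor alone, which is the sense in which QSAR "predicts activity from structure".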
See also
* Bioinformatics
* Chemical file format
* Chemicalize.org
* Cheminformatics toolkits
* Chemogenomics
* Computational chemistry
* Information engineering
* Journal of Chemical Information and Modeling
* Journal of Cheminformatics
* Materials informatics
* Molecular Conceptor
* Molecular design software
* Molecular graphics
* Molecular Informatics
* Molecular modelling
* Nanoinformatics
* Software for molecular modeling
* WorldWide Molecular Matrix
* Molecular descriptor