chemo-informatics

Cheminformatics (also known as chemoinformatics) refers to the use of physical chemistry theory with computer and information science techniques (so-called "''in silico''" techniques) in application to a range of descriptive and prescriptive problems in the field of chemistry, including in its applications to biology and related molecular fields. Such ''in silico'' techniques are used, for example, by pharmaceutical companies and in academic settings to aid and inform the process of drug discovery, for instance in the design of well-defined combinatorial libraries of synthetic compounds, or to assist in structure-based drug design. The methods can also be used in chemical and allied industries, and in such fields as environmental science and pharmacology, where chemical processes are involved or studied.


History

Cheminformatics has been an active field in various guises since the 1970s and earlier, with activity in academic departments and commercial pharmaceutical research and development departments. The term chemoinformatics was defined in its application to drug discovery by F.K. Brown in 1998:
"Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization."
Since then, both terms, cheminformatics and chemoinformatics, have been used, although, lexicographically, cheminformatics appears to be more frequently used, despite academics in Europe declaring for the variant chemoinformatics in 2006. In 2009, a prominent Springer journal in the field, the Journal of Cheminformatics, was founded by transatlantic executive editors.


Background

Cheminformatics combines the scientific working fields of chemistry, computer science, and information science, for example in the areas of topology, chemical graph theory, information retrieval and data mining in the chemical space. Cheminformatics can also be applied to data analysis for various industries such as paper and pulp, dyes and allied industries.


Applications


Storage and retrieval

A primary application of cheminformatics is the storage, indexing, and search of information relating to chemical compounds. The efficient search of such stored information includes topics that are dealt with in computer science, such as data mining, information retrieval, information extraction, and machine learning. Related research topics include:
* Digital libraries
* Unstructured data
* Structured data mining and mining of structured data
** Database mining
** Graph mining
** Molecule mining
** Sequence mining
** Tree mining
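
As a minimal sketch of indexing and search, assume each compound has already been reduced to a set of structural feature keys. The key names, the compound records, and the `build_index` and `search` helpers below are hypothetical and are not taken from any particular chemical database. An inverted index maps each key to the compounds that carry it, so a query for several keys becomes a set intersection:

```python
from collections import defaultdict

# Hypothetical records: compound name -> set of precomputed feature keys.
COMPOUNDS = {
    "ethanol": {"hydroxyl", "alkyl"},
    "acetic acid": {"carboxyl", "hydroxyl", "carbonyl", "alkyl"},
    "acetone": {"carbonyl", "alkyl"},
    "phenol": {"hydroxyl", "aromatic ring"},
}


def build_index(compounds):
    """Map each feature key to the set of compound names that carry it."""
    index = defaultdict(set)
    for name, keys in compounds.items():
        for key in keys:
            index[key].add(name)
    return index


def search(index, required_keys):
    """Return the sorted names of compounds that carry every required key."""
    required = list(required_keys)
    if not required:
        return []
    # Copy the first posting set so the index itself is never mutated.
    hits = set(index.get(required[0], set()))
    for key in required[1:]:
        hits &= index.get(key, set())
    return sorted(hits)


INDEX = build_index(COMPOUNDS)
```

For instance, `search(INDEX, ["hydroxyl", "carbonyl"])` intersects two posting sets instead of scanning every record. Real systems typically derive such keys from the structure itself rather than assigning them by hand.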


File formats

The ''in silico'' representation of chemical structures uses specialized formats such as the simplified molecular input line entry specification (SMILES) or the XML-based Chemical Markup Language. These representations are often used for storage in large chemical databases. While some formats are suited for visual representations in two or three dimensions, others are more suited for studying physical interactions, modeling and docking studies.
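
To illustrate how a line notation encodes a molecular graph, the sketch below parses a deliberately small subset of SMILES: unbracketed organic atoms, the bond symbols `-`, `=` and `#`, parenthesised branches, and single-digit ring closures. The `parse_smiles` function is a hypothetical helper written for this example, not part of any toolkit. It does not handle bracket atoms, charges, stereochemistry or implicit hydrogens, and it performs little validation.

```python
ORGANIC_TWO_LETTER = ("Cl", "Br")
ORGANIC_ONE_LETTER = set("BCNOPSFI") | set("bcnops")
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def parse_smiles(smiles):
    """Parse a small SMILES subset into (atoms, bonds).

    atoms is a list of element symbols; bonds is a list of
    (atom_index, atom_index, order) tuples. Implicit bonds are recorded
    as order 1, even between aromatic atoms: in this sketch aromaticity
    is carried only by the lowercase atom symbols.
    """
    atoms, bonds = [], []
    branch_stack = []   # atoms to return to when a branch closes
    open_rings = {}     # ring digit -> (atom index, bond order or None)
    previous = None     # index of the atom the next atom bonds to
    pending = None      # explicit bond order seen before the next atom
    i = 0
    while i < len(smiles):
        two, ch = smiles[i:i + 2], smiles[i]
        if two in ORGANIC_TWO_LETTER or ch in ORGANIC_ONE_LETTER:
            symbol = two if two in ORGANIC_TWO_LETTER else ch
            atoms.append(symbol)
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current, pending or 1))
            previous, pending = current, None
            i += len(symbol)
        elif ch in BOND_ORDERS:
            pending = BOND_ORDERS[ch]
            i += 1
        elif ch == "(":
            branch_stack.append(previous)
            i += 1
        elif ch == ")":
            if not branch_stack:
                raise ValueError("unmatched ')'")
            previous = branch_stack.pop()
            i += 1
        elif ch.isdigit():
            if previous is None:
                raise ValueError("ring digit before any atom")
            if ch in open_rings:
                start, start_order = open_rings.pop(ch)
                bonds.append((start, previous, pending or start_order or 1))
            else:
                open_rings[ch] = (previous, pending)
            pending = None
            i += 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    if open_rings or branch_stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds
```

Calling `parse_smiles("CC(=O)O")` (acetic acid) yields four heavy atoms and three bonds, one of them a double bond. For `"c1ccccc1"` (benzene) the ring-closure digit adds the sixth bond that closes the ring.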


Virtual libraries

Chemical data can pertain to real or virtual molecules. Virtual libraries of compounds may be generated in various ways to explore chemical space and hypothesize novel compounds with desired properties. Virtual libraries of classes of compounds (drugs, natural products, diversity-oriented synthetic products) were recently generated using the FOG (fragment optimized growth) algorithm. This was done by using cheminformatic tools to train transition probabilities of a Markov chain on authentic classes of compounds, and then using the Markov chain to generate novel compounds that were similar to the training database.
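
The general idea of training a Markov chain's transition probabilities on known compounds and then sampling from it can be sketched with a toy model. This is not the FOG algorithm itself. It assumes, purely for illustration, that each molecule is written as a linear sequence of fragment labels, and the labels, the training set, and the `train` and `generate` helpers are all hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"

# Hypothetical training set: each molecule is a sequence of fragment
# labels. The labels are illustrative, not real fragment codes.
TRAINING = [
    ["phenyl", "amide", "methyl"],
    ["phenyl", "ester", "ethyl"],
    ["pyridyl", "amide", "ethyl"],
    ["phenyl", "amide", "ethyl"],
]


def train(sequences):
    """Estimate first-order transition probabilities from fragment sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        path = [START, *seq, END]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def generate(transitions, rng, max_length=10):
    """Sample one fragment sequence by walking the chain from START to END."""
    state, result = START, []
    while len(result) < max_length:
        options = transitions[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == END:
            break
        result.append(state)
    return result


TRANSITIONS = train(TRAINING)
rng = random.Random(0)  # fixed seed so repeated runs give the same library
VIRTUAL_LIBRARY = [generate(TRANSITIONS, rng) for _ in range(5)]
```

Every sampled sequence uses only transitions seen in training, yet the chain can still emit combinations absent from the training set, such as pyridyl, amide, methyl. In that limited sense the generated compounds are novel but similar to the training data.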


Virtual screening

In contrast to high-throughput screening, virtual screening involves computationally screening ''in silico'' libraries of compounds, by means of various methods such as docking, to identify members likely to possess desired properties such as biological activity against a given target. In some cases, combinatorial chemistry is used in the development of the library to increase the efficiency in mining the chemical space. More commonly, a diverse library of small molecules or natural products is screened.
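
Docking requires a three-dimensional model of the target, so the sketch below uses a simpler and different technique: similarity-based screening, in which library members are ranked by the Tanimoto coefficient between their fingerprint and that of a known active compound. The fingerprints, the compound identifiers, the 0.5 threshold, and the `screen` helper are assumptions made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(library, query, threshold=0.5):
    """Return (name, score) pairs with score >= threshold, best match first."""
    scored = [(name, tanimoto(fp, query)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    # Sort by descending score, breaking ties alphabetically by name.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical fingerprints: sets of on-bit positions, made up for illustration.
LIBRARY = {
    "cmpd-1": {1, 2, 3, 4},
    "cmpd-2": {1, 2, 5},
    "cmpd-3": {7, 8, 9},
    "cmpd-4": {1, 2, 3, 4, 6},
}
QUERY = {1, 2, 3, 4}  # fingerprint of a hypothetical known active
HITS = screen(LIBRARY, QUERY)
```

With these made-up fingerprints only `cmpd-1` and `cmpd-4` pass the threshold. The others share too few bits with the query and would be deprioritised before any experimental testing.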


Quantitative structure-activity relationship (QSAR)

This is the calculation of quantitative structure–activity relationship and quantitative structure–property relationship values, used to predict the activity of compounds from their structures. In this context there is also a strong relationship to chemometrics. Chemical expert systems are also relevant, since they represent parts of chemical knowledge as an ''in silico'' representation. There is a relatively new concept of matched molecular pair analysis (MMPA), or prediction-driven MMPA, which is coupled with a QSAR model in order to identify activity cliffs.
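
In its simplest form, a QSAR regression model relates a numerical descriptor computed from the structure to a measured activity. The sketch below fits a one-descriptor linear model by ordinary least squares. The descriptor values and activities are made-up numbers, and `fit_line` and `predict` are hypothetical helpers. Practical QSAR models typically use many descriptors and are validated on held-out compounds.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data: one descriptor value (for example a computed
# logP) and one measured activity per compound. The numbers are made up.
DESCRIPTOR = [1.0, 2.0, 3.0, 4.0]
ACTIVITY = [5.1, 5.9, 7.1, 7.9]

SLOPE, INTERCEPT = fit_line(DESCRIPTOR, ACTIVITY)


def predict(descriptor_value):
    """Predict the activity of a new compound from its descriptor value."""
    return SLOPE * descriptor_value + INTERCEPT
```

Once fitted, `predict` estimates the activity of a compound that has not been measured, given only its descriptor value. Such estimates are generally less reliable outside the descriptor range covered by the training compounds.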


See also

* Bioinformatics
* Chemical file format
* Chemicalize.org
* Cheminformatics toolkits
* Chemogenomics
* Computational chemistry
* Information engineering
* Journal of Chemical Information and Modeling
* Journal of Cheminformatics
* Materials informatics
* Molecular Conceptor
* Molecular design software
* Molecular graphics
* Molecular Informatics
* Molecular modelling
* Nanoinformatics
* Software for molecular modeling
* WorldWide Molecular Matrix
* Molecular descriptor

