
A chemical database is a database specifically designed to store chemical information. This information is about chemical and crystal structures, spectra, reactions and syntheses, and thermophysical data.


Types of chemical databases


Bioactivity database

Bioactivity databases correlate structures or other chemical information to bioactivity results taken from bioassays in literature, patents, and screening programs.


Chemical structures

Chemical structures are traditionally represented using lines indicating chemical bonds between atoms and drawn on paper (2D structural formulae). While these are ideal visual representations for the chemist, they are unsuitable for computational use and especially for search and storage. Small molecules (also called ligands in drug design applications) are usually represented using lists of atoms and their connections. Large molecules such as proteins are, however, more compactly represented using the sequences of their amino acid building blocks. Radioactive isotopes are also represented, which is an important attribute for some applications. Large chemical databases for structures are expected to handle the storage and searching of information on millions of molecules taking terabytes of physical memory.


Literature database

Chemical literature databases correlate structures or other chemical information to relevant references such as academic papers or patents. This type of database includes STN, SciFinder, and Reaxys. Links to literature are also included in many databases that focus on chemical characterization.


Crystallographic database

Crystallographic databases store X-ray crystal structure data. Common examples include the Protein Data Bank and the Cambridge Structural Database.


NMR spectra database

NMR spectra databases correlate chemical structure with NMR data. These databases often include other characterization data such as FTIR and mass spectrometry.


Reactions database

Most chemical databases store information on stable molecules, but databases for reactions also store intermediates and temporarily created unstable molecules. Reaction databases contain information about products, educts, and reaction mechanisms. A popular example that lists chemical reaction data, among others, is the Beilstein database, whose content is now available through Reaxys.


Thermophysical database

Thermophysical data are information about
* phase equilibria including vapor–liquid equilibrium, solubility of gases in liquids, liquids in solids (SLE), heats of mixing, vaporization, and fusion,
* caloric data like heat capacity, heat of formation and combustion,
* transport properties like viscosity and thermal conductivity.


Chemical structure representation

There are two principal techniques for representing chemical structures in digital databases:
* As connection tables / adjacency matrices / lists with additional information on bond (edges) and atom attributes (nodes), such as MDL Molfile, PDB, CML
* As a linear string notation based on depth-first or breadth-first traversal, such as SMILES/SMARTS, SLN, WLN, InChI

These approaches have been refined to allow representation of stereochemical differences and charges as well as special kinds of bonding such as those seen in organo-metallic compounds. The principal advantage of a computer representation is the possibility for increased storage and fast, flexible search.
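The two techniques can be illustrated with a minimal Python sketch. It stores a molecule as a small connection table and then writes it out as a string by depth-first traversal. The names `adjacency_list` and `linear_string` are hypothetical, hydrogens are left implicit, and rings, charges and stereochemistry are ignored, so this is a simplified illustration in the spirit of SMILES rather than an implementation of any of the formats named above.

```python
from collections import defaultdict

# Ethanol with implicit hydrogens: atoms are nodes, bonds are edges.
atoms = {0: "C", 1: "C", 2: "O"}   # atom index -> element symbol (node attribute)
bonds = [(0, 1, 1), (1, 2, 1)]     # (atom a, atom b, bond order) (edge attribute)


def adjacency_list(bonds):
    """Build {atom: [(neighbour, bond_order), ...]} from a bond table."""
    adj = defaultdict(list)
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))
    return adj


def linear_string(atoms, bonds, start=0):
    """Write an acyclic molecule as a string by depth-first traversal.

    Branches are put in parentheses; double and triple bonds are written
    as '=' and '#'. Rings, charges and stereochemistry are not handled.
    """
    adj = adjacency_list(bonds)
    bond_symbol = {1: "", 2: "=", 3: "#"}
    visited = set()

    def visit(atom):
        visited.add(atom)
        out = atoms[atom]
        children = [(n, o) for n, o in adj[atom] if n not in visited]
        for i, (nbr, order) in enumerate(children):
            piece = bond_symbol[order] + visit(nbr)
            # every branch except the last one is parenthesised
            out += piece if i == len(children) - 1 else "(" + piece + ")"
        return out

    return visit(start)


# Acetic acid: CH3-C(=O)-OH
acid_atoms = {0: "C", 1: "C", 2: "O", 3: "O"}
acid_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

print(linear_string(atoms, bonds))
print(linear_string(acid_atoms, acid_bonds))
```

The connection table keeps atom and bond attributes separately addressable, which suits graph algorithms, while the string form is compact and easy to store in a text column.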


Search


Substructure

Chemists can search databases using parts of structures, parts of their IUPAC names as well as based on constraints on properties. Chemical databases are different from other general purpose databases in their support for substructure search, a method to retrieve chemicals matching a pattern of atoms and bonds which a user specifies. This kind of search is achieved by looking for subgraph isomorphism (sometimes also called a monomorphism) and is a widely studied application of graph theory. Query structures may contain bonding patterns such as "single/aromatic" or "any" to provide flexibility. Similarly, the vertices which in an actual compound would be a specific atom may be replaced with an atom list in the query. Cis–trans isomerism at double bonds is catered for by giving a choice of retrieving only the E form, the Z form, or both.
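As a rough illustration of substructure search, the following Python sketch looks for a subgraph match by plain backtracking. The function name `substructure_match` and the data layout are assumptions made for this example. Production systems typically add screening and much faster matching algorithms, and stereochemistry (E/Z) is not handled here. The query may use an atom list (a set of allowed elements) and the bond wildcard "any", as described above.

```python
def substructure_match(query, target):
    """Return one mapping {query atom -> target atom}, or None if there is no match.

    A molecule is a pair (atoms, bonds). `atoms` maps index -> element symbol,
    or, in a query, a set of allowed symbols (an atom list). `bonds` maps
    frozenset({a, b}) -> bond type; a query bond may be the wildcard "any".
    """
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    order = list(q_atoms)

    def atom_ok(q, t):
        allowed = q_atoms[q]
        if isinstance(allowed, (set, frozenset)):
            return t_atoms[t] in allowed
        return t_atoms[t] == allowed

    def bonds_ok(q, t, mapping):
        # every query bond from q to an already mapped atom must exist in the target
        for q2, t2 in mapping.items():
            q_bond = q_bonds.get(frozenset((q, q2)))
            if q_bond is None:
                continue
            t_bond = t_bonds.get(frozenset((t, t2)))
            if t_bond is None or (q_bond != "any" and q_bond != t_bond):
                return False
        return True

    def extend(i, mapping):
        if i == len(order):
            return dict(mapping)
        q = order[i]
        for t in t_atoms:
            if t in mapping.values():
                continue  # each target atom may be used only once
            if atom_ok(q, t) and bonds_ok(q, t, mapping):
                mapping[q] = t
                found = extend(i + 1, mapping)
                if found is not None:
                    return found
                del mapping[q]  # backtrack
        return None

    return extend(0, {})


def bond(a, b):
    return frozenset((a, b))


# Query: C=O where the carbon also carries an N or O through any kind of bond.
query = ({0: "C", 1: "O", 2: {"N", "O"}},
         {bond(0, 1): "double", bond(0, 2): "any"})

acetic_acid = ({0: "C", 1: "C", 2: "O", 3: "O"},
               {bond(0, 1): "single", bond(1, 2): "double", bond(1, 3): "single"})
ethanol = ({0: "C", 1: "C", 2: "O"},
           {bond(0, 1): "single", bond(1, 2): "single"})

print(substructure_match(query, acetic_acid))
print(substructure_match(query, ethanol))
```

Extra atoms and bonds in the target are allowed, which is what distinguishes a substructure (monomorphism) match from a test for identical structures.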


Conformation

Search by matching 3D conformation of molecules or by specifying spatial constraints is another feature that is particularly of use in drug design. Searches of this kind can be computationally very expensive. Many approximate methods have been proposed, for instance BCUTS, special function representations, moments of inertia, ray-tracing histograms, maximum distance histograms, and shape multipoles.


Examples

Large databases, such as PubChem and ChemSpider, have graphical interfaces for search. The Chemical Abstracts Service provides tools to search the chemical literature, and Reaxys, supplied by Elsevier, covers both chemicals and reaction information, including that originally held in the Beilstein database. PATENTSCOPE makes chemical patents accessible by substructure, and Wikipedia's articles describing individual chemicals can also be searched that way. Suppliers of chemicals as synthesis intermediates or for high-throughput screening routinely provide search interfaces. Currently, the largest database that can be freely searched by the public is the ZINC database, which is claimed to contain over 37 billion commercially available molecules.


Descriptors

All properties of molecules beyond their structure can be split up into either physico-chemical or pharmacological attributes, also called descriptors. On top of that, there exist various artificial and more or less standardized naming systems for molecules that supply more or less ambiguous names and synonyms. The IUPAC name is usually a good choice for representing a molecule's structure in a string that is both human-readable and unique, although it becomes unwieldy for larger molecules. Trivial names, on the other hand, abound with homonyms and synonyms and are therefore a bad choice as a defining database key. While physico-chemical descriptors like molecular weight, (partial) charge, solubility, etc. can mostly be computed directly based on the molecule's structure, pharmacological descriptors can be derived only indirectly using involved multivariate statistics or experimental (screening, bioassay) results. All of those descriptors can, for reasons of computational effort, be stored along with the molecule's representation, and usually are.
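A minimal Python sketch of a structure-derived descriptor: the molecular weight is summed from an explicit atom list and stored next to the structure, so it does not have to be recomputed at query time. The helper names `molecular_weight` and `make_record` and the record layout are hypothetical, and the weight table covers only a few elements.

```python
from collections import Counter

# Standard atomic weights in g/mol, rounded to three decimals (only a few elements).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(atom_symbols):
    """Sum atomic weights over an explicit list of atoms (hydrogens included)."""
    counts = Counter(atom_symbols)
    return sum(ATOMIC_WEIGHT[element] * n for element, n in counts.items())


def make_record(name, atom_symbols):
    """Bundle a precomputed descriptor with the structure, as a database row might."""
    return {
        "name": name,
        "atoms": list(atom_symbols),
        "molecular_weight": round(molecular_weight(atom_symbols), 3),
    }


ethanol = make_record("ethanol", ["C", "C", "O"] + ["H"] * 6)
water = make_record("water", ["H", "H", "O"])
print(ethanol["molecular_weight"], water["molecular_weight"])
```

Descriptors that need experimental input, such as bioassay results, cannot be derived this way and would be loaded into the record from measured data instead.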


Similarity

There is no single definition of molecular similarity; however, the concept may be defined according to the application and is often described as an inverse of a measure of distance in descriptor space. Two molecules might be considered more similar, for instance, if their difference in molecular weights is lower than when compared with others. A variety of other measures could be combined to produce a multi-variate distance measure. Distance measures are often classified into Euclidean measures and non-Euclidean measures depending on whether the triangle inequality holds. Maximum Common Subgraph (MCS) based substructure search (similarity or distance measure) is also very common. MCS is also used for screening drug-like compounds by finding molecules that share a common subgraph (substructure). Chemicals in the databases may be clustered into groups of 'similar' molecules based on similarities. Both hierarchical and non-hierarchical clustering approaches can be applied to chemical entities with multiple attributes. These attributes or molecular properties may be either empirically determined or computationally derived descriptors. One of the most popular clustering approaches is the Jarvis-Patrick algorithm. In pharmacologically oriented chemical repositories, similarity is usually defined in terms of the biological effects of compounds (ADME/tox) that can in turn be semiautomatically inferred from similar combinations of physico-chemical descriptors using QSAR methods.
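The idea of similarity as an inverse of distance in descriptor space can be sketched in Python as follows. The descriptor vector used here (molecular weight, number of non-hydrogen atoms) and the mapping 1 / (1 + d) are choices made for this example only; other descriptors, scalings and distance measures are equally possible.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Map a distance to a similarity in (0, 1]; identical vectors score 1."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def most_similar(query, library):
    """Return the name of the library entry closest to the query vector."""
    return max(library, key=lambda name: similarity(query, library[name]))


# Descriptor vectors: (molecular weight in g/mol, number of non-hydrogen atoms)
library = {
    "methanol": (32.04, 2),
    "ethanol": (46.07, 3),
    "benzene": (78.11, 6),
}
propan_1_ol = (60.10, 4)

print(most_similar(propan_1_ol, library))
```

Because the two descriptors have very different numeric ranges, the molecular weight dominates the distance in this sketch. In practice descriptors are usually scaled before they are combined.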


Registration systems

Database systems for maintaining unique records on chemical compounds are termed registration systems. These are often used for chemical indexing, patent systems and industrial databases. Registration systems usually enforce uniqueness of the chemical represented in the database through the use of unique representations. By applying rules of precedence for the generation of stringified notations, one can obtain unique ('canonical') string representations such as 'canonical SMILES'. Some registration systems such as the CAS system make use of algorithms to generate unique hash codes to achieve the same objective. A key difference between a registration system and a simple chemical database is the ability to accurately represent that which is known, unknown, and partially known. For example, a chemical database might store a molecule with stereochemistry unspecified, whereas a chemical registry system requires the registrar to specify whether the stereo configuration is unknown, a specific (known) mixture, or racemic. Each of these would be considered a different record in a chemical registry system. Registration systems also preprocess molecules to avoid considering trivial differences such as differences in halogen ions in chemicals. An example is the Chemical Abstracts Service (CAS) registration system. See also CAS registry number.
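The uniqueness check can be sketched in Python under strong simplifying assumptions. The function `canonical_form` below tries every renumbering of the atoms and keeps the smallest description, which works only for very small molecules and is not the algorithm used by CAS or by canonical SMILES generators. The names `canonical_form`, `registry_key` and `Registry` are hypothetical, and stereochemistry is ignored.

```python
import hashlib
from itertools import permutations


def canonical_form(atoms, bonds):
    """Return a description of a small molecule that ignores atom numbering.

    atoms: list of element symbols; bonds: list of (i, j, order) over atom indices.
    Every renumbering is tried and the lexicographically smallest description
    is kept, so inputs that differ only in atom order give the same result.
    """
    best = None
    n = len(atoms)
    for perm in permutations(range(n)):  # perm[old_index] = new_index
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best


def registry_key(atoms, bonds):
    """Hash the canonical form into a fixed-length key."""
    return hashlib.sha256(repr(canonical_form(atoms, bonds)).encode()).hexdigest()


class Registry:
    """Assigns one record number per distinct structure."""

    def __init__(self):
        self._records = {}

    def register(self, atoms, bonds):
        """Return (record number, True if the structure was new)."""
        key = registry_key(atoms, bonds)
        if key in self._records:
            return self._records[key], False
        self._records[key] = len(self._records) + 1
        return self._records[key], True


registry = Registry()
first = registry.register(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])   # ethanol
again = registry.register(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])   # ethanol, renumbered
ether = registry.register(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])   # dimethyl ether
print(first, again, ether)
```

A real registry would extend the key with the declared stereochemistry status (unknown, known mixture, racemic) so that each of those cases becomes a separate record.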


List of chemical cartridges

* Accord
* Direct
* J Chem
* CambridgeSoft
* Bingo
* Pinpoint


List of chemical registration systems

* ChemReg
* Register
* RegMol
* Compound-Registration
* Ensemble


Web-based


Tools

The computational representations are usually made transparent to chemists by graphical display of the data. Data entry is also simplified through the use of chemical structure editors. These editors internally convert the graphical data into computational representations. There are also numerous algorithms for the interconversion of various formats of representation. An open-source utility for conversion is OpenBabel. These search and conversion algorithms are implemented either within the database system itself or, as is now the trend, as external components that fit into standard relational database systems. Both Oracle and PostgreSQL based systems make use of cartridge technology that allows user-defined datatypes. These allow the user to make SQL queries with chemical search conditions. For example, a query to search for records having a phenyl ring in their structure represented as a SMILES string in a SMILESCOL column could be SELECT * FROM CHEMTABLE WHERE SMILESCOL.CONTAINS('c1ccccc1').

Algorithms for the conversion of IUPAC names to structure representations and vice versa are also used for extracting structural information from text. However, there are difficulties due to the existence of multiple dialects of IUPAC. Work is under way to establish a unique IUPAC standard (see InChI).
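How an external component can add a chemical search condition to SQL can be imitated with Python's built-in `sqlite3` module, which lets an application register its own SQL functions. This is a toy stand-in and not how Oracle or PostgreSQL cartridges work internally. The function `SMILES_CONTAINS` is a hypothetical name, and it only tests whether one string occurs inside another, so it misses molecules whose SMILES spells the ring differently (for example `c1ccc(O)cc1`). A real cartridge would perform graph-based substructure matching.

```python
import sqlite3


def naive_contains(smiles, fragment):
    """Toy substitute for a substructure operator: plain substring test."""
    return int(fragment in smiles)


conn = sqlite3.connect(":memory:")
# Register the Python function so that it can be called from SQL.
conn.create_function("SMILES_CONTAINS", 2, naive_contains)

conn.execute("CREATE TABLE chemtable (name TEXT, smilescol TEXT)")
conn.executemany(
    "INSERT INTO chemtable VALUES (?, ?)",
    [("phenol", "Oc1ccccc1"), ("toluene", "Cc1ccccc1"), ("ethanol", "CCO")],
)

rows = conn.execute(
    "SELECT name FROM chemtable "
    "WHERE SMILES_CONTAINS(smilescol, 'c1ccccc1') ORDER BY name"
).fetchall()
print(rows)
```

The point of the design is that the chemical condition sits in the WHERE clause next to ordinary relational conditions, so it can be combined with filters on other columns in a single query.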


See also

* Reaxys - chemical and drug development database from Elsevier
* Embiology - biological relationship and target database
* Pharmapendium - drug information




External links


* Wikipedia Chemical Structure Explorer, to search Wikipedia chemistry articles by substructure