PubChem

PubChem is a database of chemical molecules and their activities against biological assays. The system is maintained by the National Center for Biotechnology Information (NCBI), a component of the National Library of Medicine, which is part of the United States National Institutes of Health (NIH). PubChem can be accessed for free through a web user interface. Millions of compound structures and descriptive datasets can be freely downloaded via FTP. PubChem contains multiple substance descriptions and small molecules with fewer than 100 atoms and 1,000 bonds. More than 80 database vendors contribute to the growing PubChem database.


History

PubChem was released in 2004 as a component of the Molecular Libraries Program (MLP) of the NIH. As of November 2015, PubChem contained more than 150 million depositor-provided substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results (from over 1 million assay experiments performed on more than 2 million small molecules, covering almost 10,000 unique protein target sequences that correspond to more than 5,000 genes). It also contained RNA interference (RNAi) screening assays targeting over 15,000 genes. As of August 2018, PubChem contained 247.3 million substance descriptions and 96.5 million unique chemical structures, contributed by 629 data sources from 40 countries, along with 237 million bioactivity test results from 1.25 million biological assays, covering more than 10,000 target protein sequences. As of 2020, with data integrated from over 100 new sources, PubChem contained more than 293 million depositor-provided substance descriptions, 111 million unique chemical structures, and 271 million bioactivity data points from 1.2 million biological assay experiments.


Databases

PubChem consists of three dynamically growing primary databases. As of 5 November 2020 (number of BioAssays is unchanged):
* Compounds, 111 million entries (up from 94 million entries in 2017), contains pure and characterized chemical compounds.
* Substances, 293 million entries (up from 236 million entries in 2017 and 163 million in Sept. 2014), also contains mixtures, extracts, complexes and uncharacterized substances.
* BioAssay, bioactivity results from 1.25 million (up from 6,000 in Sept. 2014) high-throughput screening programs with several million values.


Searching

Searching the databases is possible for a broad range of properties including chemical structure, name fragments, chemical formula, molecular weight, XLogP, and hydrogen bond donor and acceptor count. PubChem contains its own online molecule editor with SMILES/SMARTS and InChI support that allows the import and export of all common chemical file formats to search for structures and fragments. Each hit provides information about synonyms, chemical properties, chemical structure including SMILES and InChI strings, bioactivity, and links to structurally related compounds and other NCBI databases like PubMed.

In the text search form the database fields can be searched by adding the field name in square brackets to the search term. A numeric range is represented by two numbers separated by a colon. The search terms and field names are case-insensitive. Parentheses and the logical operators AND, OR, and NOT can be used. AND is assumed if no operator is used. Example (Lipinski's Rule of Five): 0:500[mw] 0:5[hbdc] 0:10[hbac] -5:5[logp]
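
The text-search syntax described above (a numeric range written as two numbers separated by a colon, followed by the field name in square brackets, with AND assumed between terms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `range_term`, `build_query` and `matches` are hypothetical, the field abbreviations `mw`, `hbdc`, `hbac` and `logp` are assumed to be the ones used in the Rule of Five example, and the inclusive bounds in `matches` are an assumption rather than documented PubChem behavior. The code only composes a query string and mirrors its logic locally; it does not contact PubChem.

```python
from typing import Mapping, Tuple

Range = Tuple[float, float]


def range_term(low: float, high: float, field: str) -> str:
    """Format one numeric-range term as low:high[field]."""
    # The 'g' format drops a trailing ".0", so 500.0 is written as 500.
    return f"{low:g}:{high:g}[{field}]"


def build_query(ranges: Mapping[str, Range]) -> str:
    """Join range terms with spaces; AND is assumed when no operator is given."""
    return " ".join(
        range_term(low, high, field) for field, (low, high) in ranges.items()
    )


# Assumed field abbreviations: molecular weight, hydrogen bond donor count,
# hydrogen bond acceptor count, and logP.
RULE_OF_FIVE = {
    "mw": (0, 500),
    "hbdc": (0, 5),
    "hbac": (0, 10),
    "logp": (-5, 5),
}


def matches(properties: Mapping[str, float], ranges: Mapping[str, Range]) -> bool:
    """Local mirror of the implicit AND: every range must hold.

    Inclusive bounds are an assumption made for this sketch.
    """
    return all(
        low <= properties[field] <= high for field, (low, high) in ranges.items()
    )


if __name__ == "__main__":
    print(build_query(RULE_OF_FIVE))
    # Property values for a hypothetical compound, not taken from PubChem.
    hypothetical = {"mw": 320.4, "hbdc": 2, "hbac": 5, "logp": 3.1}
    print(matches(hypothetical, RULE_OF_FIVE))
```

The resulting string can be pasted into the text search form. The `matches` function is included only to show that space-separated terms behave like a conjunction of range checks.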


Database fields


See also

* Chemical database
** CAS Common Chemistry - run by the American Chemical Society
** Comparative Toxicogenomics Database - run by North Carolina State University
** ChEMBL - run by the European Bioinformatics Institute
** ChemSpider - run by UK's Royal Society of Chemistry
** DrugBank - run by the University of Alberta
** IUPAC - run by Swiss-based International Union of Pure and Applied Chemistry (IUPAC)
** Moltable - run by India's National Chemical Laboratory
** PubChem - run by the National Institutes of Health, USA
** BindingDB - run by the University of California, San Diego
** SCRIPDB - run by the University of Toronto, Canada
** National Center for Biotechnology Information (NCBI) - run by the National Institutes of Health, USA
** Entrez - run by the National Institutes of Health, USA
** GenBank - run by the National Institutes of Health, USA

