PubChem Logo
   HOME

TheInfoList



OR:

PubChem is a
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
of
chemical A chemical substance is a unique form of matter with constant chemical composition and characteristic properties. Chemical substances may take the form of a single element or chemical compounds. If two or more chemical substances can be combin ...
molecule A molecule is a group of two or more atoms that are held together by Force, attractive forces known as chemical bonds; depending on context, the term may or may not include ions that satisfy this criterion. In quantum physics, organic chemi ...
s and their activities against
biological assays An assay is an investigative (analytic) procedure in laboratory medicine, mining, pharmacology, environmental biology and molecular biology for qualitatively assessing or quantitatively measuring the presence, amount, or functional activity of ...
. The system is maintained by the
National Center for Biotechnology Information The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is lo ...
(NCBI), a component of the
National Library of Medicine The United States National Library of Medicine (NLM), operated by the United States federal government, is the world's largest medical library. Located in Bethesda, Maryland, the NLM is an institute within the National Institutes of Health. I ...
, which is part of the United States
National Institutes of Health The National Institutes of Health (NIH) is the primary agency of the United States government responsible for biomedical and public health research. It was founded in 1887 and is part of the United States Department of Health and Human Service ...
(NIH). PubChem can be accessed for free through a
web user interface A web application (or web app) is application software that is created with web technologies and runs via a web browser. Web applications emerged during the late 1990s and allowed for the server to dynamically build a response to the request, ...
. Millions of compound structures and descriptive datasets can be freely downloaded via
FTP The File Transfer Protocol (FTP) is a standard communication protocol used for the transfer of computer files from a server to a client on a computer network. FTP is built on a client–server model architecture using separate control and dat ...
. PubChem contains multiple substance descriptions and small molecules with fewer than 100 atoms and 1,000 bonds. More than 80 database vendors contribute to the growing PubChem database.


History

PubChem was released in 2004 as a component of the Molecular Libraries Program (MLP) of the NIH. As of November 2015, PubChem contains more than 150 million depositor-provided substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results (from over 1 million assay experiments performed on more than 2 million small-molecules covering almost 10,000 unique protein target sequences that correspond to more than 5,000 genes). It also contains
RNA interference RNA interference (RNAi) is a biological process in which RNA molecules are involved in sequence-specific suppression of gene expression by double-stranded RNA, through translational or transcriptional repression. Historically, RNAi was known by ...
(RNAi) screening assays that target over 15,000 genes. As of August 2018, PubChem contains 247.3 million substance descriptions, 96.5 million unique chemical structures, contributed by 629 data sources from 40 countries. It also contains 237 million bioactivity test results from 1.25 million biological assays, covering >10,000 target protein sequences. As of 2020, with data integration from over 100 new sources, PubChem contains more than 293 million depositor-provided substance descriptions, 111 million unique chemical structures, and 271 million bioactivity data points from 1.2 million biological assays experiments.


Databases

PubChem consists of three dynamically growing primary databases. As of 5 November 2020 (number of BioAssays is unchanged): * Compounds, 111 million entries (up from 94 million entries in 2017), contains pure and characterized chemical compounds. * Substances, 293 million entries (up from 236 million entries in 2017 and 163 million in Sept. 2014), contains also mixtures,
extract An extract (essence) is a substance made by extracting a part of a raw material, often by using a solvent such as ethanol, oil or water. Extracts may be sold as tinctures or absolutes or dried and powdered. The aromatic principles of ma ...
s, complexes and uncharacterized substances. * BioAssay,
bioactivity In pharmacology, biological activity or pharmacological activity describes the beneficial or adverse effects of a drug on living matter. When a drug is a complex chemical mixture, this activity is exerted by the substance's active ingredient or ...
results from 1.25 million (up from 6,000 in Sept. 2014)
high-throughput screening High-throughput screening (HTS) is a method for scientific discovery especially used in drug discovery and relevant to the fields of biology, materials science and chemistry. Using robotics, data processing/control software, liquid handling device ...
programs with several million values.


Searching

Searching the databases is possible for a broad range of properties including chemical structure, name fragments,
chemical formula A chemical formula is a way of presenting information about the chemical proportions of atoms that constitute a particular chemical compound or molecule, using chemical element symbols, numbers, and sometimes also other symbols, such as pare ...
,
molecular weight A molecule is a group of two or more atoms that are held together by Force, attractive forces known as chemical bonds; depending on context, the term may or may not include ions that satisfy this criterion. In quantum physics, organic chemi ...
,
XLogP In the physical sciences, a partition coefficient (''P'') or distribution coefficient (''D'') is the ratio of concentrations of a compound in a mixture of two immiscible solvents at equilibrium. This ratio is therefore a comparison of the solubi ...
, and
hydrogen bond In chemistry, a hydrogen bond (H-bond) is a specific type of molecular interaction that exhibits partial covalent character and cannot be described as a purely electrostatic force. It occurs when a hydrogen (H) atom, Covalent bond, covalently b ...
donor and acceptor count. PubChem contains its own online
molecule editor A notable molecule editor is a computer program for creating and modifying representations of chemical structures. Molecule editors can manipulate chemical structure representations in either a simulated two-dimensional space or three-dimensional ...
with
SMILES A smile is a facial expression formed primarily by flexing the muscles at the sides of the mouth. Some smiles include a contraction of the muscles at the corner of the eyes, an action known as a Duchenne smile. Among humans, a smile expresses d ...
/SMARTS and
InChI The International Chemical Identifier (InChI, pronounced ) is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on ...
support that allows the import and export of all common
chemical file format A chemical file format is a type of data file which is used specifically for depicting molecular data. One of the most widely used is the chemical table file format, which is similar to ''Structure Data Format'' (SDF) files. They are text files ...
s to search for structures and fragments. Each hit provides information about synonyms, chemical properties, chemical structure including SMILES and InChI strings, bioactivity, and links to structurally related compounds and other NCBI databases like
PubMed PubMed is an openly accessible, free database which includes primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institute ...
. In the text search form the database fields can be searched by adding the field name in square brackets to the search term. A numeric range is represented by two numbers separated by a colon. The search terms and field names are case-insensitive. Parentheses and the
logical operator In logic, a logical connective (also called a logical operator, sentential connective, or sentential operator) is a logical constant. Connectives can be used to connect logical formulas. For instance in the syntax of propositional logic, the ...
s AND, OR, and NOT can be used. AND is assumed if no operator is used. Example ( Lipinski's Rule of Five): 0:500 w0:5 bdc0:10
bac BAC or Bac may refer to: Arts and entertainment Arts centres and arts councils * Balochistan Arts Council, in Quetta, Pakistan * Baryshnikov Arts Center, in Manhattan, New York City * Battersea Arts Centre, London, England * Beirut Art Cente ...
-5:5 ogp


Database fields


See also

*
Chemical database A chemical database is a database specifically designed to store chemical information. This information is about chemical and crystal structures, spectra, reactions and syntheses, and thermophysical data. Types of chemical databases Bioactiv ...
** CAS Common Chemistry - run by the American Chemical Society **
Comparative Toxicogenomics Database The Comparative Toxicogenomics Database (CTD) is a public website and research tool launched in November 2004 that curates scientific data describing relationships between chemicals/drugs, genes/proteins, diseases, taxa, phenotypes, GO annotations, ...
- run by North Carolina State University **
ChEMBL ChEMBL or ChEMBLdb is a manually curated chemical database of bioactive molecules with drug inducing properties. It is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory ( EMBL), based at the ...
- run by European Bioinformatics Institute **
ChemSpider ChemSpider is a freely accessible online chemical database, database of chemicals owned by the Royal Society of Chemistry. It contains information on more than 100 million molecules from over 270 data sources, each of them receiving a unique ...
- run by UK's Royal Society of Chemistry **
DrugBank The DrugBank database is a comprehensive, freely accessible, online database containing information on drugs and drug targets created and maintained by the University of Alberta and The Metabolomics Innovation Centre located in Alberta, Canada. A ...
- run by the University of Alberta **
IUPAC The International Union of Pure and Applied Chemistry (IUPAC ) is an international federation of National Adhering Organizations working for the advancement of the chemical sciences, especially by developing nomenclature and terminology. It is ...
- run by Swiss-based International Union of Pure and Applied Chemistry (IUPAC) ** Moltable - run by India's National Chemical Laboratory ** PubChem - run by the National Institute of Health, USA **
BindingDB BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be candidate drug-targets with ligands that are small, drug-like molecules. As of March, 2011, Bindi ...
- run by the University of California, San Diego ** SCRIPDB - run by the University of Toronto, Canada **
National Center for Biotechnology Information The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is lo ...
(NCBI) - run by the National Institute of Health, USA **
Entrez The Entrez () Global Query Cross-Database Search System is a federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. The NCB ...
- run by the National Institute of Health, USA **
GenBank The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations. It is produced and maintained by the National Center for Biotechnology Information (NCBI; a par ...
- run by the National Institute of Health, USA


References


External links

* {{DEFAULTSORT:Pubchem Chemical databases Biological databases National Institutes of Health Public-domain software with source code