The Human Metabolome Database (HMDB) is a comprehensive, high-quality, freely accessible, online database of small molecule metabolites found in the human body. It was created by the Human Metabolome Project, funded by Genome Canada, and is one of the first dedicated metabolomics databases. The HMDB facilitates human metabolomics research, including the identification and characterization of human metabolites using NMR spectroscopy, GC-MS spectrometry and LC/MS spectrometry. To aid in this discovery process, the HMDB contains three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data (Fig. 1–3). The chemical data includes 41,514 metabolite structures with detailed descriptions along with nearly 10,000 NMR, GC-MS and LC/MS spectra.
The clinical data includes information on >10,000 metabolite-biofluid concentrations and metabolite concentration information on more than 600 different human diseases. The biochemical data includes 5,688 protein (and DNA) sequences and more than 5,000 biochemical reactions that are linked to these metabolite entries.
Each metabolite entry in the HMDB contains more than 110 data fields, with two-thirds of the information devoted to chemical/clinical data and the other third devoted to enzymatic or biochemical data. Many data fields are hyperlinked to other databases (KEGG, MetaCyc, PubChem, Protein Data Bank, ChEBI, Swiss-Prot, and GenBank) and a variety of structure and pathway viewing applets. The HMDB supports extensive text, sequence, spectral, chemical structure and relational query searches. It has been widely used in metabolomics, clinical chemistry, biomarker discovery and general biochemistry education.
Four additional databases, DrugBank, T3DB, SMPDB and FooDB, are also part of the HMDB suite of databases. DrugBank contains equivalent information on ~1,600 drugs and drug metabolites, T3DB contains information on 3,100 common toxins and environmental pollutants, SMPDB contains pathway diagrams for 700 human metabolic and disease pathways, and FooDB contains equivalent information on ~28,000 food components and food additives.
Version history
The first version of HMDB was released on January 1, 2007, followed by version 2.0 (January 1, 2009), version 2.5 (August 1, 2009), version 3.0 (September 18, 2012), version 3.5 (January 1, 2013), version 4.0 (2017) and version 5.0 (2022). Details for each of the major HMDB versions (up to version 5.0) are provided in Table 1.
Scope and access
All data in HMDB is non-proprietary or is derived from a non-proprietary source. It is freely accessible and available to anyone. In addition, nearly every data item is fully traceable and explicitly referenced to the original source. HMDB data is available through a public web interface and downloads.
See also
* KEGG
* DrugBank
* SMPDB
* MetaCyc
* Bovine Metabolome Database
* Metabolome
* Metabolomics
* List of biological databases