KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.
The KEGG database project was initiated in 1995 by Minoru Kanehisa, professor at the Institute for Chemical Research, Kyoto University, under the then ongoing Japanese Human Genome Program. Foreseeing the need for a computerized resource that can be used for biological interpretation of genome sequence data, he started developing the KEGG PATHWAY database. It is a collection of manually drawn KEGG pathway maps representing experimental knowledge on metabolism and various other functions of the cell and the organism. Each pathway map contains a network of molecular interactions and reactions and is designed to link genes in the genome to gene products (mostly proteins) in the pathway. This has enabled the analysis called KEGG pathway mapping, whereby the gene content in the genome is compared with the KEGG PATHWAY database to examine which pathways and associated functions are likely to be encoded in the genome.
According to the developers, KEGG is a "computer representation" of the biological system. It integrates building blocks and wiring diagrams of the system: more specifically, genetic building blocks of genes and proteins, chemical building blocks of small molecules and reactions, and wiring diagrams of molecular interaction and reaction networks. This concept is realized in the following databases of KEGG, which are categorized into systems, genomic, chemical, and health information.
* Systems information
** PATHWAY: pathway maps for cellular and organismal functions
** MODULE: modules or functional units of genes
** BRITE: hierarchical classifications of biological entities
* Genomic information
** GENOME: complete genomes
** GENES: genes and proteins in the complete genomes
** ORTHOLOGY: ortholog groups of genes in the complete genomes
* Chemical information
** COMPOUND, GLYCAN: chemical compounds and glycans
** REACTION, RPAIR, RCLASS: chemical reactions
** ENZYME: enzyme nomenclature
* Health information
** DISEASE: human diseases
** DRUG: approved drugs
** ENVIRON: crude drugs and health-related substances
Databases
Systems information
The KEGG PATHWAY database, the wiring diagram database, is the core of the KEGG resource. It is a collection of pathway maps integrating many entities including genes, proteins, RNAs, chemical compounds, glycans, and chemical reactions, as well as disease genes and drug targets, which are stored as individual entries in the other databases of KEGG. The pathway maps are classified into the following sections:
* Metabolism
* Genetic information processing (transcription, translation, replication and repair, etc.)
* Environmental information processing (membrane transport, signal transduction, etc.)
* Cellular processes (cell growth, cell death, cell membrane functions, etc.)
* Organismal systems (immune system, endocrine system, nervous system, etc.)
* Human diseases
* Drug development
The metabolism section contains aesthetically drawn global maps showing an overall picture of metabolism, in addition to regular metabolic pathway maps. The low-resolution global maps can be used, for example, to compare metabolic capacities of different organisms in genomics studies and different environmental samples in metagenomics studies. In contrast, KEGG modules in the KEGG MODULE database are higher-resolution, localized wiring diagrams, representing tighter functional units within a pathway map, such as subpathways conserved among specific organism groups and molecular complexes. KEGG modules are defined as characteristic gene sets that can be linked to specific metabolic capacities and other phenotypic features, so that they can be used for automatic interpretation of genome and metagenome data.
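The idea of a module as a characteristic gene set can be illustrated with a minimal sketch. It assumes a simplified representation in which a module is a list of steps and each step is satisfied by any one of several alternative ortholog identifiers; the identifiers (`KX0001` and so on), the `TOY_MODULE` data, and the function names are hypothetical placeholders, not KEGG content or a KEGG API.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical module: an ordered list of steps. Each step is satisfied
# by any one of the listed ortholog identifiers (toy placeholders only).
TOY_MODULE: List[Set[str]] = [
    {"KX0001", "KX0002"},  # step 1: either ortholog is sufficient
    {"KX0003"},            # step 2: a single required ortholog
    {"KX0004", "KX0005"},  # step 3: either ortholog is sufficient
]


def module_completeness(module: List[Set[str]],
                        genome_orthologs: Iterable[str]) -> Tuple[int, int]:
    """Return (steps_present, total_steps) for a genome's ortholog set."""
    present = set(genome_orthologs)
    # A step counts as present if it shares at least one ortholog
    # with the genome.
    steps_present = sum(1 for step in module if step & present)
    return steps_present, len(module)


def is_complete(module: List[Set[str]],
                genome_orthologs: Iterable[str]) -> bool:
    """True when every step of the module is covered by the genome."""
    steps_present, total = module_completeness(module, genome_orthologs)
    return steps_present == total
```

A genome (or metagenome sample) that covers every step would be interpreted as likely having the capacity the module represents; a partially covered module suggests a gap or an incomplete annotation.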
Another database that supplements KEGG PATHWAY is the KEGG BRITE database. It is an ontology database containing hierarchical classifications of various entities including genes, proteins, organisms, diseases, drugs, and chemical compounds. While KEGG PATHWAY is limited to molecular interactions and reactions of these entities, KEGG BRITE incorporates many different types of relationships.
Genomic information
Several months after the KEGG project was initiated in 1995, the first report of a completely sequenced bacterial genome was published. Since then, all published complete genomes have been accumulated in KEGG for both eukaryotes and prokaryotes. The KEGG GENES database contains gene/protein-level information and the KEGG GENOME database contains organism-level information for these genomes. The KEGG GENES database consists of gene sets for the complete genomes, and genes in each set are given annotations in the form of establishing correspondences to the wiring diagrams of KEGG pathway maps, KEGG modules, and BRITE hierarchies.
These correspondences are made using the concept of orthologs. The KEGG pathway maps are drawn based on experimental evidence in specific organisms but they are designed to be applicable to other organisms as well, because different organisms, such as human and mouse, often share identical pathways consisting of functionally identical genes, called orthologous genes or orthologs. All the genes in the KEGG GENES database are being grouped into such orthologs in the KEGG ORTHOLOGY (KO) database. Because the nodes (gene products) of KEGG pathway maps, as well as KEGG modules and BRITE hierarchies, are given KO identifiers, the correspondences are established once genes in the genome are annotated with KO identifiers by the genome annotation procedure in KEGG.
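The role of KO identifiers as the link between genes and pathway nodes can be sketched as a simple lookup. This is an illustrative sketch only: the gene names, KO-style identifiers, map names, and the function `map_genome_to_pathways` are hypothetical placeholders, and real KEGG annotation and mapping involve considerably more than a set membership test.

```python
from typing import Dict, Set

# Toy genome annotation: gene -> KO identifier assigned by annotation
# (placeholder values, not real KEGG data).
GENE_TO_KO: Dict[str, str] = {
    "geneA": "KX0001",
    "geneB": "KX0002",
    "geneC": "KX0009",
}

# Toy pathway maps: map name -> KO identifiers attached to its nodes.
PATHWAY_NODES: Dict[str, Set[str]] = {
    "mapX1": {"KX0001", "KX0002", "KX0003"},
    "mapX2": {"KX0007", "KX0008"},
}


def map_genome_to_pathways(gene_to_ko: Dict[str, str],
                           pathway_nodes: Dict[str, Set[str]]
                           ) -> Dict[str, Set[str]]:
    """For each pathway, collect the genes whose KO matches a node."""
    hits: Dict[str, Set[str]] = {}
    for pathway, node_kos in pathway_nodes.items():
        genes = {gene for gene, ko in gene_to_ko.items() if ko in node_kos}
        if genes:  # keep only pathways with at least one matching gene
            hits[pathway] = genes
    return hits
```

Because the pathway side refers only to KO identifiers, the same map can be reused for any organism once its genes carry KO assignments.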
Chemical information
The KEGG metabolic pathway maps are drawn to represent the dual aspects of the metabolic network: the genomic network of how genome-encoded enzymes are connected to catalyze consecutive reactions and the chemical network of how chemical structures of substrates and products are transformed by these reactions.
A set of enzyme genes in the genome will identify enzyme relation networks when superimposed on the KEGG pathway maps, which in turn characterize chemical structure transformation networks allowing interpretation of biosynthetic and biodegradation potentials of the organism. Alternatively, a set of metabolites identified in the metabolome will lead to the understanding of enzymatic pathways and enzyme genes involved.
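The reverse direction, from observed metabolites to candidate enzyme genes, can be sketched under a strong simplifying assumption: a reaction is considered supported when at least one of its substrates and one of its products were observed. The reaction table, compound and gene names, and both functions below are hypothetical placeholders rather than KEGG data or a KEGG method.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Set


class Reaction(NamedTuple):
    substrates: FrozenSet[str]
    products: FrozenSet[str]
    enzyme_genes: FrozenSet[str]


# Toy reaction table (placeholder identifiers only).
TOY_REACTIONS: Dict[str, Reaction] = {
    "RX1": Reaction(frozenset({"cpdA"}), frozenset({"cpdB"}),
                    frozenset({"gene1"})),
    "RX2": Reaction(frozenset({"cpdB"}), frozenset({"cpdC"}),
                    frozenset({"gene2", "gene3"})),
    "RX3": Reaction(frozenset({"cpdD"}), frozenset({"cpdE"}),
                    frozenset({"gene4"})),
}


def reactions_supported(reactions: Dict[str, Reaction],
                        observed: Iterable[str]) -> List[str]:
    """Reactions with an observed substrate and an observed product."""
    seen = set(observed)
    return sorted(
        rid for rid, rx in reactions.items()
        if rx.substrates & seen and rx.products & seen
    )


def candidate_enzyme_genes(reactions: Dict[str, Reaction],
                           observed: Iterable[str]) -> Set[str]:
    """Union of enzyme genes over all supported reactions."""
    genes: Set[str] = set()
    for rid in reactions_supported(reactions, observed):
        genes |= reactions[rid].enzyme_genes
    return genes
```

With the toy table, observing `cpdA`, `cpdB`, and `cpdC` supports the consecutive reactions `RX1` and `RX2` and points to the genes associated with them, while `RX3` is left out because neither of its compounds was seen.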
The databases in the chemical information category, which are collectively called KEGG LIGAND, are organized by capturing knowledge of the chemical network. At the beginning of the KEGG project, KEGG LIGAND consisted of three databases: KEGG COMPOUND for chemical compounds, KEGG REACTION for chemical reactions, and KEGG ENZYME for reactions in the enzyme nomenclature. Currently, there are additional databases: KEGG GLYCAN for glycans and two auxiliary reaction databases called RPAIR (reactant pair alignments) and RCLASS (reaction class). KEGG COMPOUND has also been expanded to contain various compounds such as xenobiotics, in addition to metabolites.
Health information
In KEGG, diseases are viewed as perturbed states of the biological system caused by perturbants of genetic factors and environmental factors, and drugs are viewed as different types of perturbants.
The KEGG PATHWAY database includes not only the normal states but also the perturbed states of the biological systems. However, disease pathway maps cannot be drawn for most diseases because molecular mechanisms are not well understood. An alternative approach is taken in the KEGG DISEASE database, which simply catalogs known genetic factors and environmental factors of diseases. These catalogs may eventually lead to more complete wiring diagrams of diseases.
The KEGG DRUG database contains active ingredients of approved drugs in Japan, the US, and Europe. They are distinguished by chemical structures and/or chemical components and associated with target molecules, metabolizing enzymes, and other molecular interaction network information in the KEGG pathway maps and the BRITE hierarchies. This enables an integrated analysis of drug interactions with genomic information.
Crude drugs and other health-related substances, which are outside the category of approved drugs, are stored in the KEGG ENVIRON database. The databases in the health information category are collectively called KEGG MEDICUS, which also includes package inserts of all marketed drugs in Japan.
Subscription model
In July 2011, KEGG introduced a subscription model for FTP download because of a significant cutback in government funding. KEGG continues to be freely available through its website, but the subscription model has prompted discussion about the sustainability of bioinformatics databases.
See also
* Comparative Toxicogenomics Database - CTD integrates KEGG pathways with toxicogenomic and disease data
* ConsensusPathDB, a molecular functional interaction database, integrating information from KEGG
* Gene Ontology (GO)
* PubMed
* UniProt
* Gene Disease Database
External links
* KEGG website
* GenomeNet mirror site
* Entry for KEGG in MetaBase