Metascape is a free gene annotation and analysis resource that helps biologists make sense of one or multiple gene lists. Metascape provides automated meta-analysis tools to understand either common or unique pathways and protein networks within a group of orthogonal target-discovery studies.
History
In the "OMICs" age, it is important to gain biological insights into a list of genes. Although a number of bioinformatics sources exist for this purpose, such as DAVID, they are not all free, easy to use, and well maintained. To analyze multiple lists of genes originating from orthogonal but complementary "OMICs" studies, tools often require computational skills that are beyond the reach of many biologists. According to the Metascape blog, a team of scientists self-organized to address this challenge. The team includes core members Yingyao Zhou, Bin Zhou, Lars Pache, Max Chang, Christopher Benner, and Sumit Chanda, as well as other contributors over time. Metascape was first released as a beta version on Oct 8, 2015. The first Metascape application was published on Dec 9, 2015. Metascape has gone through multiple releases since then. It currently supports key model organisms, pathway enrichment analysis, protein-protein interaction network and component analysis, and automatic presentation of the results as a publication-ready web report and as Excel and PowerPoint presentations.
The paper titled "Metascape provides a biologist-oriented resource for the analysis of systems-level datasets" was published on Apr 3, 2019 in Nature Communications.
Analysis workflow
Metascape implements a CAME analysis workflow:
* Conversion: Convert gene identifiers from popular types (such as Symbol, RefSeq, Ensembl, UniProt, UCSC) into human Entrez gene IDs and vice versa.
* Annotation: Extract from dozens of function-relevant gene annotations, including protein families, transmembrane/secreted predictions, disease associations, compound associations, etc.
* Membership: Flag gene memberships based on a custom keyword search within selected ontologies, e.g., highlight known "invasion" genes.
* Enrichment: Identify enriched biological themes, particularly GO terms, KEGG, Reactome, BioCarta, WikiPathways, as well as other pathways and data sets collected in MSigDB, etc. In addition, enriched ontology terms are automatically clustered to reduce redundancy for easier interpretation. Protein-protein interaction networks are constructed based on STRING, BioGRID, OmniPath, etc. Dense components are identified and biologically interpreted.
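The Enrichment step can be illustrated with a standard over-representation test. The following is a minimal sketch, assuming a one-sided hypergeometric test; the source does not state which statistic Metascape uses, and the function name `enrichment_p_value` and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, term_size, list_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    background_size: number of genes in the background set
    term_size: number of background genes annotated with the term
    list_size: number of genes in the input list
    overlap: number of input genes annotated with the term
    """
    max_overlap = min(term_size, list_size)
    total = comb(background_size, list_size)
    # Sum the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Illustrative numbers only: 20,000 background genes, 100 annotated with
# the term, 200 genes in the input list, 10 of them annotated.
p = enrichment_p_value(20000, 100, 200, 10)
print(f"p = {p:.3g}")
```

A small p-value suggests the term is over-represented in the list. In practice a correction for testing many terms at once would normally follow; that step is omitted here.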
Metascape integrates over 40 bioinformatics knowledgebases into a seamless user interface, where experimental biologists can use a single-click Express Analysis feature to turn multiple gene lists into interpretable results.
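As a rough illustration of what comparing multiple gene lists involves, the sketch below separates genes shared by all lists from genes found in only one list. It is not Metascape's implementation; the function name `common_and_unique` and the gene lists are hypothetical.

```python
def common_and_unique(gene_lists):
    """Return (genes in every list, {list name: genes found only in that list})."""
    sets = {name: set(genes) for name, genes in gene_lists.items()}
    common = set.intersection(*sets.values()) if sets else set()
    unique = {}
    for name, genes in sets.items():
        # Union of all genes that appear in any other list.
        others = set().union(*(s for n, s in sets.items() if n != name))
        unique[name] = genes - others
    return common, unique


# Hypothetical gene lists from three studies.
lists = {
    "study_a": ["TP53", "EGFR", "MYC"],
    "study_b": ["TP53", "MYC", "BRCA1"],
    "study_c": ["TP53", "KRAS"],
}
common, unique = common_and_unique(lists)
print(sorted(common))
print({name: sorted(genes) for name, genes in unique.items()})
```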
Analysis report
All analysis results are presented in a web report, which contains Excel annotation and enrichment sheets, PowerPoint slides, and custom analysis files (e.g., .cys file by Cytoscape, .svg by Circos) for further offline analysis or processing.
One noticeable strength of Metascape is its visualization capability. As of December 2021, Metascape had aided in the interpretation of 2,600 published studies, two-thirds of which made use of graphs or sheets prepared by Metascape.
MSBio
Metascape for Bioinformaticians (MSBio) was released in 2021 to meet the growing needs of computational biologists to automate Metascape batch analyses for large-scale gene lists.
MSBio leverages the power of container technology to encapsulate the computational platform in Docker containers. Academic users can conduct offline analyses, which are limited only by the hardware they have access to. Commercial users can add proprietary knowledgebases and conduct secure computations using internal computational assets. MSBio databases are updated in synchronization with the Metascape website.
External links
* Official website: http://metascape.org