PRIDE (PRoteomics IDEntifications database) is a public data repository of mass spectrometry-based proteomics data, and is maintained by the European Bioinformatics Institute as part of the Proteomics Team. Originally designed by Lennart Martens in 2003 during a stay at the European Bioinformatics Institute as a Marie Curie fellow of the European Commission in the "Quality of Life" Programme (Contract number: QLRI-1999-50595), PRIDE was established as a production service in 2005. The original grant application document from June 2003 to start construction of PRIDE has since been published in a viewpoint article. Several similar proteomics databases have been built, including the GPMDB, PeptideAtlas, Proteinpedia and the NCBI Peptidome. The PRIDE database constitutes a structured data repository, and stores the original experimental data from the researchers without editorial control over the submitted data. In total, PRIDE contains data from about 60 species, the biggest fraction of it coming from human samples (including the data from the two draft human proteomes), followed by the fruit fly ''Drosophila melanogaster'' and mouse.

Formats and the submission process

Since detailed proteomics data currently cannot be curated from the existing literature, the source of PRIDE data is solely submissions by academic researchers. PRIDE is a standards-compliant public repository, meaning that its own XML-based data exchange format for submissions, PRIDE XML, was built around the Proteomics Standards Initiative mzData standard for mass spectrometry. More recently, PRIDE has been adapted to work with the modern mzML and mzIdentML standards of the Proteomics Standards Initiative. An additional format, dubbed mzTab, can be used as a simplified way to submit quantitative proteomics data. Because many different types of mass spectrometry instruments and software formats are on the market, wet-lab scientists without a strong bioinformatics background or informatics support had problems converting their data to PRIDE XML. The development of PRIDE Converter helped to tackle this situation. PRIDE Converter is a tool, written in the Java programming language, that converts 15 different input mass spectrometry data formats into PRIDE XML via a wizard-like graphical user interface. It is freely available and is open source under the permissive Apache License. A new version of PRIDE Converter was released in 2012 as PRIDE Converter 2. This new version constituted a complete rewrite, focused on easy adaptability to different (and evolving) data sources.
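As an illustration of why mzTab counts as a simplified format, the sketch below reads a small mzTab-style text in Python. It relies only on the general layout of mzTab: tab-separated lines whose first field is a line-type prefix (MTD for metadata, PRH/PRT for the protein header and rows, and so on). The function name `parse_mztab`, the sample content and the reduced column set are assumptions made for this example. The sample is not a complete, valid mzTab file, and the code is not part of any PRIDE tool.

```python
# Header prefix -> row prefix for the table sections of an mzTab file.
SECTION_ROWS = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text):
    """Split mzTab-style text into a metadata dict and per-section row lists."""
    metadata = {}
    headers = {}
    tables = {row_prefix: [] for row_prefix in SECTION_ROWS.values()}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator lines between sections
        prefix, *values = line.split("\t")
        if prefix == "COM":
            continue  # free-text comment line
        if prefix == "MTD" and len(values) >= 2:
            metadata[values[0]] = values[1]
        elif prefix in SECTION_ROWS:
            # Remember the column names for the rows that follow.
            headers[SECTION_ROWS[prefix]] = values
        elif prefix in tables and prefix in headers:
            tables[prefix].append(dict(zip(headers[prefix], values)))
    return metadata, tables


# Hypothetical, abbreviated sample; real files carry many more columns.
SAMPLE = "\n".join([
    "COM\tToy example for illustration only",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tToy quantitative result",
    "",
    "PRH\taccession\tdescription\tprotein_abundance_assay[1]",
    "PRT\tP12345\tExample protein A\t1500.0",
    "PRT\tQ67890\tExample protein B\t320.5",
])

metadata, tables = parse_mztab(SAMPLE)
print(metadata["mzTab-version"])
print([row["accession"] for row in tables["PRT"]])
```

Because every line declares its own type in the first field, a few lines of string handling are enough to recover the tables. No XML parser or schema is needed, which is the main practical difference from PRIDE XML, mzML and mzIdentML.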

Browsing, searching and data mining PRIDE

Currently, data can be queried from PRIDE via the PRIDE web interface, through the stand-alone Java client PRIDE Inspector, or coupled directly to several search engines through PeptideShaker. Moreover, a new RESTful API allows convenient programmatic access to the PRIDE archive. The extensive use of controlled vocabularies (CVs) and ontologies for flexible yet context-sensitive annotation of data, and the ability to perform intelligent queries by these annotations, are key features of PRIDE.
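The idea of querying by controlled-vocabulary annotation can be sketched in a few lines of Python. Everything here is hypothetical and in-memory: the `CvParam` and `Experiment` classes, the `find_by_cv` helper, the experiment accessions and the "NCBITaxon" label are assumptions made for illustration, and none of them reflect PRIDE's actual schema or API. Only the taxonomy identifiers (9606 for human, 7227 for ''Drosophila melanogaster'', 10090 for mouse) are real NCBI Taxonomy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CvParam:
    """One controlled-vocabulary term attached to an experiment."""
    cv_label: str   # which vocabulary the term comes from
    accession: str  # stable identifier of the term within that vocabulary
    name: str       # human-readable term name


@dataclass(frozen=True)
class Experiment:
    accession: str      # hypothetical identifier, not a real PRIDE accession
    title: str
    annotations: tuple  # tuple of CvParam


def find_by_cv(experiments, cv_label, accession):
    """Return the experiments annotated with the given CV term."""
    return [
        exp for exp in experiments
        if any(p.cv_label == cv_label and p.accession == accession
               for p in exp.annotations)
    ]


HUMAN = CvParam("NCBITaxon", "9606", "Homo sapiens")
FLY = CvParam("NCBITaxon", "7227", "Drosophila melanogaster")
MOUSE = CvParam("NCBITaxon", "10090", "Mus musculus")

EXPERIMENTS = [
    Experiment("EXP-1", "Plasma proteome", (HUMAN,)),
    Experiment("EXP-2", "Embryo time course", (FLY,)),
    Experiment("EXP-3", "Xenograft study", (HUMAN, MOUSE)),
]

hits = find_by_cv(EXPERIMENTS, "NCBITaxon", "9606")
print([exp.accession for exp in hits])
```

Matching on the term accession rather than on free text means a query for human data does not depend on whether a submitter wrote "human" or "Homo sapiens" in a title or description.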

Involvement in ProteomeXchange

The ProteomeXchange consortium has been set up to provide a coordinated submission of MS proteomics data to the main existing proteomics repositories, and to encourage optimal data dissemination. The consortium contains several member databases, including PRIDE and PeptideAtlas. The earliest conception of ProteomeXchange stems from a meeting at the HUPO 2005 conference in Munich, where the main proteomics data repositories at the time agreed in principle to exchange their data, and thus provide a means for the user to find public proteomics data at any of the participating databases. Due to the rapid development of the field, and the need to first develop suitable standards for data exchange, it took almost ten years from that meeting to actually implement this system, an effort that was funded by the 'ProteomeXchange' Coordination Action grant of the European Commission's Seventh Framework Programme.

Data recovery after the discontinuation of Peptidome

The NCBI Peptidome database was discontinued in 2011, yet a joint effort by the PRIDE and Peptidome teams resulted in the transfer of all Peptidome data to PRIDE.

External links

PRIDE homepage