
The BioCompute Object (BCO) Project is a community-driven initiative to build a framework for standardizing and sharing computations and analyses generated from high-throughput sequencing (HTS, also referred to as next-generation sequencing or massively parallel sequencing). The project has since been standardized as IEEE 2791-2020, and the project files are maintained in an open source repository. The July 22nd, 2020 edition of the Federal Register announced that the FDA now supports the use of BioCompute (officially known as IEEE 2791-2020) in regulatory submissions and has included the standard in the Data Standards Catalog for the submission of HTS data in NDAs, ANDAs, BLAs, and INDs to CBER, CDER, and CFSAN.

Originally started as a collaborative contract between George Washington University and the Food and Drug Administration (FDA), the project has grown to include over 20 universities, biotechnology companies, public-private partnerships, and pharmaceutical companies, including Seven Bridges and Harvard Medical School. The BCO aims to ease the exchange of HTS workflows between organizations such as the FDA, pharmaceutical companies, contract research organizations, bioinformatics platform providers, and academic researchers. Due to the sensitive nature of regulatory filings, few direct references to such material can be published. However, the project is currently funded to train FDA reviewers and administrators to read and interpret BCOs, and it currently has four publications either submitted or nearly submitted.


Background

One of the biggest challenges in bioinformatics is documenting and sharing scientific workflows in such a way that the computation and its results can be peer-reviewed or reliably reproduced. Bioinformatic pipelines typically use multiple pieces of software, each of which often has multiple versions available, multiple input parameters, multiple outputs, and possibly platform-specific configurations. As with experimental parameters in a laboratory protocol, small changes in computational parameters may have a large impact on the scientific validity of the results. The BioCompute Framework provides an object-oriented design from which a BCO containing the details of a pipeline and how it was used can be constructed, digitally signed, and shared. The BioCompute concept was originally developed to satisfy FDA regulatory research and review needs for the evaluation, validation, and verification of genomics data. However, the BioCompute Framework follows the FAIR Data Principles and can be used broadly to provide communication and interoperability between different platforms, industries, scientists, and regulators.
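A BCO is meant to be digitally signed, which only works if the same content always serializes to the same bytes. As a rough illustration of that idea, the Python sketch below computes a SHA-256 digest over a canonical JSON encoding (sorted keys, no insignificant whitespace). This is an assumption-laden sketch and not the signing or etag procedure defined by IEEE 2791-2020; the function name `bco_digest` and the sample fields are hypothetical.

```python
import hashlib
import json


def bco_digest(bco: dict) -> str:
    """Return a hex SHA-256 digest of a canonical JSON encoding of `bco`.

    Hypothetical helper: keys are sorted (recursively) and separators carry no
    extra whitespace, so equivalent objects yield identical bytes. This is a
    generic integrity check, not the IEEE 2791-2020 etag or signing rule.
    """
    canonical = json.dumps(bco, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical pipeline record, then the same content with keys reordered.
    a = {"tool": "aligner", "version": "1.0", "params": {"threads": "4"}}
    b = {"params": {"threads": "4"}, "version": "1.0", "tool": "aligner"}
    # The same record with a single parameter changed.
    c = {"tool": "aligner", "version": "1.0", "params": {"threads": "8"}}

    print(bco_digest(a) == bco_digest(b))  # True: key order does not matter
    print(bco_digest(a) == bco_digest(c))  # False: a parameter change alters the digest
```

Because any edit to a recorded parameter changes the digest, a recipient who obtains the digest through a trusted channel (or with a real digital signature over it) can detect whether the pipeline description was altered in transit.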


Utility

As a standard for genomic data, BioCompute Objects are mostly useful to three groups of users: 1) academic researchers carrying out new genetic experiments, 2) pharma/biotech companies that wish to submit work to the FDA for regulatory review, and 3) clinical settings (hospitals and labs) that offer genetic tests and personalized medicine. For academic researchers, the benefit is the ability to reproduce experimental data more accurately and with less uncertainty. For entities submitting work to the FDA, BCOs offer a streamlined approach, again with less uncertainty and with the ability to reproduce work more accurately. In clinical settings, it is critical that HTS data and clinical metadata be transmitted accurately, ideally in a standardized way that any stakeholder, including regulatory partners, can read.


Format

The BioCompute Object is in JSON format and, at a minimum, contains all the software versions and parameters necessary to evaluate or verify a computational pipeline. It may also contain input data as files or links, reference genomes, or executable Docker components. A BioCompute Object can be integrated with HL7 FHIR as a Provenance Resource. Multiple joint implementations that leverage the BCO's report-centric format are also under development, including ones with the Common Workflow Language (CWL) and Research Objects (RO). One of these, a joint BCO-CWL implementation, is part of an active government-funded public contract with a cofounder of CWL to pilot the approach and to generate documentation and examples for it.
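To make the JSON format concrete, the sketch below builds a skeletal BCO as a Python dictionary and reports which top-level keys are absent. The domain names follow the IEEE 2791-2020 layout as we understand it (provenance, usability, description, execution, parametric, and I/O domains, among others), but the `REQUIRED_KEYS` tuple, the nested field names, the helper `missing_keys`, and all values are illustrative assumptions. The official BioCompute JSON Schema is authoritative on which fields are mandatory and how they nest.

```python
import json

# Assumed top-level keys to check. Which fields are mandatory is defined by the
# official IEEE 2791-2020 JSON Schema; confirm against it before relying on this.
REQUIRED_KEYS = (
    "object_id",
    "spec_version",
    "etag",
    "provenance_domain",
    "usability_domain",
    "description_domain",
    "execution_domain",
    "io_domain",
)


def missing_keys(bco: dict) -> list[str]:
    """Return the assumed top-level keys absent from `bco`, in REQUIRED_KEYS order."""
    return [key for key in REQUIRED_KEYS if key not in bco]


# Hypothetical, heavily abbreviated BCO; every value is a placeholder.
# Real objects carry far more detail (contributors, licenses, input and output files).
bco = {
    "object_id": "https://example.org/BCO_000001",
    "spec_version": "https://example.org/ieee-2791-schema",  # placeholder, not the real schema URL
    "etag": "placeholder",
    "provenance_domain": {"name": "Example alignment pipeline", "version": "1.0"},
    "usability_domain": ["Illustrative statement of what the pipeline is for."],
    "description_domain": {
        "keywords": ["HTS", "alignment"],
        "pipeline_steps": [{"step_number": 1, "name": "align", "description": "Align reads."}],
    },
    "execution_domain": {
        # Software versions needed to re-run the pipeline.
        "software_prerequisites": [{"name": "example-aligner", "version": "1.0"}],
    },
    # Parameters recorded per pipeline step.
    "parametric_domain": [{"param": "threads", "value": "4", "step": "1"}],
    "io_domain": {"input_subdomain": [], "output_subdomain": []},
}

if __name__ == "__main__":
    print(missing_keys(bco))  # [] because every assumed key is present
    partial = {k: v for k, v in bco.items() if k != "io_domain"}
    print(missing_keys(partial))  # ['io_domain']
    print(json.dumps(bco, indent=2))  # the JSON text that would be exchanged
```

A real validator would check nested fields and value types against the published schema, for example with a JSON Schema library, rather than only testing for the presence of top-level keys.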


BCO Consortium

The BioCompute Object working group gives different stakeholders a means to provide input on current practices for the BCO. This working group was formed during preparation for the 2017 HTS Computational Standards for Regulatory Sciences Workshop and was initially made up of the workshop participants. The working group has grown continually as a direct result of interaction among stakeholders from all communities interested in standardizing computational HTS data processing. The public-private partnerships formed between universities, private genomic data companies, software platforms, and government and regulatory institutions have been an easy point of entry for new individuals or institutions to join the BCO project and participate in discussions of best practices for the objects.


Implementations

The simple R package biocompute can create, validate, and export BioCompute Objects. The Genomics Compliance Suite is a Shiny app that offers features similar to the regular-expression search found in modern text editors. There are several internally developed open source software packages and web applications that implement the BioCompute specification, three of which have been deployed in a publicly accessible AWS EC2 cloud. These include an instance of the High-performance Integrated Virtual Environment (HIVE), the BioCompute Portal (a form-based web application that can create and edit BioCompute Objects based on the IEEE 2791-2020 standard), and a BioCompute-compliant instance of Galaxy.




External links


Official Website
IEEE 2791-2020 open source project