The BioCompute Object (BCO) Project is a community-driven initiative to build a framework for standardizing and sharing computations and analyses generated from high-throughput sequencing (HTS, also referred to as next-generation sequencing or massively parallel sequencing). The project has since been standardized as IEEE 2791-2020, and the project files are maintained in an open source repository. The July 22nd, 2020 edition of the Federal Register announced that the FDA now supports the use of BioCompute (officially known as IEEE 2791-2020) in regulatory submissions, and the inclusion of the standard in the Data Standards Catalog for the submission of HTS data in NDAs, ANDAs, BLAs, and INDs to CBER, CDER, and CFSAN.
Originally started as a collaborative contract between the George Washington University and the Food and Drug Administration, the project has grown to include over 20 universities, biotechnology companies, public-private partnerships and pharmaceutical companies, including Seven Bridges and Harvard Medical School. The BCO aims to ease the exchange of HTS workflows between various organizations, such as the FDA, pharmaceutical companies, contract research organizations, bioinformatic platform providers, and academic researchers. Due to the sensitive nature of regulatory filings, few direct references to material can be published. However, the project is currently funded to train FDA reviewers and administrators to read and interpret BCOs, and currently has 4 publications either submitted or nearly submitted.
Background
One of the biggest challenges in bioinformatics is documenting and sharing scientific workflows in such a way that the computation and its results can be peer-reviewed or reliably reproduced. Bioinformatic pipelines typically use multiple pieces of software, each of which typically has multiple versions available, multiple input parameters, multiple outputs, and possibly platform-specific configurations. As with experimental parameters in a laboratory protocol, small changes in computational parameters may have a large impact on the scientific validity of the results. The BioCompute Framework provides an object-oriented design from which a BCO that contains details of a pipeline and how it was used can be constructed, digitally signed, and shared. The BioCompute concept was originally developed to satisfy FDA regulatory research and review needs for evaluation, validation, and verification of genomics data. However, the BioCompute Framework follows FAIR Data Principles and can be used broadly to provide communication and interoperability between different platforms, industries, scientists and regulators.
Utility
As a standard for genomic data, BioCompute Objects are mostly useful to three groups of users: 1) academic researchers carrying out new genetic experiments, 2) pharma/biotech companies that wish to submit work to the FDA for regulatory review, and 3) clinical settings (hospitals and labs) that offer genetic tests and personalized medicine. The utility to academic researchers is the ability to reproduce experimental data more accurately and with less uncertainty. The utility to entities wishing to submit work to the FDA is a streamlined approach, again with less uncertainty and with the ability to more accurately reproduce work. For clinical settings, it is critical that HTS data and clinical metadata be transmitted accurately, ideally in a standardized way that is readable by any stakeholder, including regulatory partners.
Format
The BioCompute Object is in JSON format and, at a minimum, contains all the software versions and parameters necessary to evaluate or verify a computational pipeline. It may also contain input data as files or links, reference genomes, or executable Docker components. A BioCompute Object can be integrated with HL7 FHIR as a Provenance Resource. Multiple joint implementations that leverage BCO's report-centric format are also under development, including implementations with the Common Workflow Language (CWL) and Research Object (RO). One of the CWL efforts is part of an active government-funded public contract with a cofounder of CWL to pilot a joint BCO-CWL and to generate documentation and examples for it.
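The idea of a JSON record that captures software versions, parameters, and input and output links can be illustrated with a minimal Python sketch. The field names below are assumptions loosely modelled on the standard's domain layout and should be checked against the IEEE 2791-2020 schema before use; the URLs, tool names, versions, and the `REQUIRED_KEYS` list are hypothetical placeholders, and the SHA-256 digest is only a simple stand-in for an integrity check, not a digital signature.

```python
import hashlib
import json

# Hypothetical minimal record. Field names are illustrative assumptions,
# not a validated IEEE 2791-2020 object; all values are placeholders.
bco = {
    "object_id": "https://example.org/BCO_000001/1.0",
    "spec_version": "IEEE-2791-2020 (placeholder)",
    "description_domain": {
        "pipeline_steps": [
            {"step_number": 1, "name": "align_reads", "version": "1.2.3"},
            {"step_number": 2, "name": "call_variants", "version": "0.9.0"},
        ]
    },
    "parametric_domain": [
        {"step": "1", "param": "seed_length", "value": "19"},
    ],
    "io_domain": {
        "input_subdomain": [{"uri": "https://example.org/reads.fastq"}],
        "output_subdomain": [{"uri": "https://example.org/variants.vcf"}],
    },
}

# Keys this sketch treats as mandatory (an assumption made for illustration).
REQUIRED_KEYS = ("object_id", "spec_version", "description_domain", "io_domain")


def missing_keys(record):
    """Return the required top-level keys that are absent from the record."""
    return [key for key in REQUIRED_KEYS if key not in record]


def canonical_json(record):
    """Serialize with sorted keys and fixed separators so output is stable."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def content_digest(record):
    """Return the SHA-256 hex digest of the canonical JSON serialization."""
    return hashlib.sha256(canonical_json(record).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print("missing keys:", missing_keys(bco))
    print("digest:", content_digest(bco))
```

Because the digest is computed over a canonical serialization, changing a single parameter value or software version yields a different digest, which mirrors the point that small changes in computational parameters need to be visible to a reviewer.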
BCO Consortium
The BioCompute Object working group provides a means for different stakeholders to give input on current practices for the BCO. This working group was formed during preparation for the 2017 HTS Computational Standards for Regulatory Sciences Workshop and was initially made up of the workshop participants. The working group has grown continually as a direct result of the interaction between stakeholders from all communities interested in the standardization of computational HTS data processing. The public-private partnerships formed between universities, private genomic data companies, software platforms, and government and regulatory institutions have been an easy point of entry for new individuals or institutions to join the BCO project and participate in the discussion of best practices for the objects.
Implementations
The simple R package biocompute can create, validate, and export BioCompute Objects. The Genomics Compliance Suite is a Shiny app that offers features similar to the regular expressions found in all modern text editors. There are several internally developed open source software packages and web applications that implement the BioCompute specification, three of which have been deployed in a publicly accessible AWS EC2 cloud. These include an instance of the High-performance Integrated Virtual Environment, the BioCompute Portal (a form-based web application that can create and edit BioCompute Objects based on the IEEE 2791-2020 standard), and a BioCompute-compliant instance of Galaxy.
External links
Official Website
IEEE 2791-2020 open source project