
The BioCompute Object (BCO) project is a community-driven initiative to build a framework for standardizing and sharing computations and analyses generated from high-throughput sequencing (HTS, also referred to as next-generation sequencing or massively parallel sequencing). The project has since been standardized as IEEE 2791-2020, and the project files are maintained in an open source repository.

The July 22nd, 2020 edition of the Federal Register announced that the FDA now supports the use of BioCompute (officially known as IEEE 2791-2020) in regulatory submissions, and the inclusion of the standard in the Data Standards Catalog for the submission of HTS data in NDAs, ANDAs, BLAs, and INDs to CBER, CDER, and CFSAN. Originally started as a collaborative contract between the George Washington University and the Food and Drug Administration, the project has grown to include over 20 universities, biotechnology companies, public-private partnerships and pharmaceutical companies, including Seven Bridges and Harvard Medical School. The BCO aims to ease the exchange of HTS workflows between various organizations, such as the FDA, pharmaceutical companies, contract research organizations, bioinformatic platform providers, and academic researchers. Due to the sensitive nature of regulatory filings, few direct references to material can be published. However, the project is currently funded to train FDA reviewers and administrators to read and interpret BCOs, and currently has 4 publications either submitted or nearly submitted.


Background

One of the biggest challenges in bioinformatics is documenting and sharing scientific workflows in such a way that the computation and its results can be peer-reviewed or reliably reproduced. Bioinformatic pipelines typically use multiple pieces of software, each of which typically has multiple versions available, multiple input parameters, multiple outputs, and possibly platform-specific configurations. As with experimental parameters in a laboratory protocol, small changes in computational parameters may have a large impact on the scientific validity of the results. The BioCompute Framework provides an object-oriented design from which a BCO that contains details of a pipeline and how it was used can be constructed, digitally signed, and shared. The BioCompute concept was originally developed to satisfy FDA regulatory research and review needs for evaluation, validation, and verification of genomics data. However, the BioCompute Framework follows FAIR Data Principles and can be used broadly to provide communication and interoperability between different platforms, industries, scientists and regulators.
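To illustrate why recording every software version and parameter matters, the following minimal sketch compares two runs of a pipeline and lists the settings that differ. The tool names, versions, parameter names, and the `diff_parameters` helper are hypothetical and are not part of the BioCompute specification; a real comparison would read these values from two BCOs.

```python
def diff_parameters(run_a, run_b):
    """Return {key: (value_in_a, value_in_b)} for every setting that differs.

    Keys are (tool, version, parameter) tuples. A key that is absent from
    one run is reported with None on that side.
    """
    changed = {}
    for key in sorted(set(run_a) | set(run_b)):
        a, b = run_a.get(key), run_b.get(key)
        if a != b:
            changed[key] = (a, b)
    return changed


# Hypothetical settings for two runs of the same pipeline.
run_a = {
    ("aligner", "1.2.0", "seed_length"): 19,
    ("caller", "0.9.1", "min_quality"): 30,
}
run_b = {
    ("aligner", "1.2.1", "seed_length"): 19,  # tool version bumped
    ("caller", "0.9.1", "min_quality"): 20,   # parameter changed
}

for key, (a, b) in diff_parameters(run_a, run_b).items():
    print(key, a, "->", b)
```

Because the tool version is part of the key, a version bump shows up as a difference even when the parameter value itself is unchanged.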


Utility

As a standard for genomic data, BioCompute Objects are mostly useful to three groups of users: 1) academic researchers carrying out new genetic experiments, 2) pharma/biotech companies that wish to submit work to the FDA for regulatory review, and 3) clinical settings (hospitals and labs) that offer genetic tests and personalized medicine. For academic researchers, the benefit is the ability to reproduce experimental data more accurately and with less uncertainty. For entities wishing to submit work to the FDA, the benefit is a streamlined approach, again with less uncertainty and with the ability to reproduce work more accurately. For clinical settings, it is critical that HTS data and clinical metadata be transmitted accurately, ideally in a standardized form that is readable by any stakeholder, including regulatory partners.


Format

The BioCompute Object is in JSON format and, at a minimum, contains all the software versions and parameters necessary to evaluate or verify a computational pipeline. It may also contain input data as files or links, reference genomes, or executable Docker components. A BioCompute Object can be integrated with HL7 FHIR as a Provenance Resource. Multiple joint implementations that leverage the BCO's report-centric format are also under development, including implementations with CWL and RO. One of the CWL efforts is part of an active government-funded public contract with a cofounder of CWL to pilot a joint BCO-CWL and to generate documentation and examples for it.
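As a rough illustration of the format, the sketch below assembles a small JSON record of software versions and parameters and computes a SHA-256 digest over a canonical serialization, so that any later change to the record is detectable. The field names are loosely modeled on the domain naming used by the standard, but the layout is an assumption made for illustration and has not been validated against the IEEE 2791-2020 schema; the helper functions and the example tools are hypothetical.

```python
import hashlib
import json


def build_record(name, steps, parameters):
    """Assemble a BCO-like dictionary (illustrative layout, not schema-validated)."""
    return {
        "provenance_domain": {"name": name},
        "description_domain": {
            "pipeline_steps": [
                {"step_number": n, "name": s["name"], "version": s["version"]}
                for n, s in enumerate(steps, start=1)
            ]
        },
        "parametric_domain": [
            {"step": str(p["step"]), "param": p["param"], "value": str(p["value"])}
            for p in parameters
        ],
    }


def canonical_json(record):
    """Serialize with sorted keys and fixed separators so equal records give equal text."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def digest(record):
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_json(record).encode("utf-8")).hexdigest()


def missing_versions(record):
    """Names of pipeline steps that do not declare a software version."""
    steps = record["description_domain"]["pipeline_steps"]
    return [s["name"] for s in steps if not s["version"]]


# Hypothetical two-step pipeline.
record = build_record(
    "example variant-calling run",
    steps=[
        {"name": "aligner", "version": "1.2.0"},
        {"name": "caller", "version": "0.9.1"},
    ],
    parameters=[{"step": 1, "param": "seed_length", "value": 19}],
)

print(canonical_json(record))
print(digest(record))
print(missing_versions(record))
```

A digest only detects changes. It is not a digital signature, which would additionally require a signing key and is outside the scope of this sketch.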


BCO Consortium

The BioCompute Object working group gave different stakeholders a means to provide input on current practices for the BCO. This working group was formed during preparation for the 2017 HTS Computational Standards for Regulatory Sciences Workshop and was initially made up of the workshop participants. The growth and work of the BCO working group, as a direct result of the interaction between a variety of stakeholders from all interested communities, culminated in the official standard IEEE 2791-2020, which was approved in January 2020. A public-private partnership was formed between GWU and CBER and has become an easy point of entry for new individuals or institutions into the BCO project to participate in the discussion of best practices for the objects.


Implementations

The simple R package biocompute can create, validate, and export BioCompute Objects. The Genomics Compliance Suite is a Shiny app that offers features similar to the regular expressions found in all modern text editors. There are several internally developed open source software packages and web applications that implement the BioCompute specification, three of which have been deployed in a publicly accessible AWS EC2 cloud. These include an instance of the High-performance Integrated Virtual Environment, the BioCompute Portal (a form-based web application that can create and edit BioCompute Objects based on the IEEE 2791-2020 standard), and a BioCompute-compliant instance of Galaxy.

Some bioinformatics platforms have built-in support for BioCompute, which lets a user automatically create a BCO from a workflow and edit the descriptive content.
* DNAnexus and PrecisionFDA facilitate the generation of BCOs by importing workflows, allowing users to edit descriptive content. The platform supports metadata import and export of WDL and CWL scripts, and offers the BCOnexus tool, which is a high-level, platform-free tool with a graphical user interface that lets a user merge BCOs.
* Velsera's Seven Bridges Genomics and Cancer Genomics Cloud also have support for BioCompute by enabling direct pre-population of BCO fields from workflows.
* BioCompute has also been integrated into HIVE and the main Galaxy instance, both of which similarly enable users to automatically generate BCOs and edit content within these platforms.
* BioCompute has also been implemented in the Common Fund Data Elements Playbook Partnership project. This implementation lets a user save a workflow when they are satisfied with the results, which aids traceability through the network of independently versioned resources. Users can also save queries and annotate them for future use, sharing, or repeatability.

Integration into platforms is meant to improve data handling and collaboration and to give users effective ways to execute a workflow. Graphical representations of BCOs are often a more intuitive way of browsing or reading them.
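The pre-population that these platforms offer can be pictured with a minimal sketch: machine-derivable fields are filled from a workflow description, and descriptive fields are left empty for the user to edit. The workflow dictionary layout, the field names, and both helper functions are assumptions made for illustration; they do not reflect the API of any platform named above.

```python
def prepopulate(workflow):
    """Fill the machine-derivable parts of a BCO-like record from a workflow.

    `workflow` is a hypothetical dict: {"steps": [{"tool", "version", "params"}]}.
    """
    steps = list(enumerate(workflow["steps"], start=1))
    return {
        "description_domain": {
            "pipeline_steps": [
                {"step_number": n, "name": s["tool"], "version": s["version"]}
                for n, s in steps
            ]
        },
        "parametric_domain": [
            {"step": str(n), "param": k, "value": str(v)}
            for n, s in steps
            for k, v in sorted(s.get("params", {}).items())
        ],
        # Descriptive content cannot be derived from the workflow itself,
        # so it is left empty for the user to edit afterwards.
        "provenance_domain": {},
        "usability_domain": [],
    }


def unfilled_fields(record):
    """Top-level fields that still need user-supplied content."""
    return sorted(k for k, v in record.items() if v in ("", [], {}, None))


# Hypothetical workflow description.
workflow = {
    "steps": [
        {"tool": "aligner", "version": "1.2.0", "params": {"seed_length": 19}},
        {"tool": "caller", "version": "0.9.1"},
    ]
}

record = prepopulate(workflow)
print(unfilled_fields(record))
```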




External links


* Official Website
* IEEE 2791-2020 open source project